
JP4976641B2 - Method and apparatus for recommending target items based on third party stereotype preferences

Info

Publication number: JP4976641B2
Application number: JP2003545038A
Authority: JP
Inventors: Gutta, Srinivas V. R.; Kurapati, Kaushal
Original assignee: Koninklijke Philips NV; Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2001-11-13
Filing date: 2002-11-05
Publication date: 2012-07-18
Anticipated expiration: 2022-11-05
Also published as: EP1449380B1; EP1449380A2; JP2005509967A; KR20040054767A; US20030097300A1; ATE419712T1; DE60230664D1; AU2002363685A1; CN1611074A; CN1276661C; KR100972557B1; WO2003043337A3; WO2003043337A2

Abstract

A method and apparatus are disclosed for recommending items of interest to a user, such as television program recommendations, before a viewing history or purchase history of the user is available. A third party viewing or purchase history is processed to generate stereotype profiles that reflect the typical patterns of items selected by representative viewers. A user can select the most relevant stereotype(s) from the generated stereotype profiles and thereby initialize his or her profile with the items that are closest to his or her own interests. A clustering routine is disclosed to partition the third party viewing or purchase history (the data set) into clusters, such that points (e.g., television programs) in one cluster are closer to the mean of that cluster than to the mean of any other cluster. A mean computation routine is also disclosed to compute the symbolic mean of a cluster.

Description

The present invention relates to a method and apparatus for recommending items of interest, such as television programs, and more particularly, to techniques for recommending programs and other items of interest before the user's purchase or viewing history is available.

As the number of channels available to television viewers has increased, along with the diversity of the programming content available on such channels, it has become increasingly difficult for television viewers to identify television programs of interest. Electronic program guides (EPGs) identify available television programs, for example, by title, time, date and channel, and make it easier to identify programs of interest by allowing the available television programs to be searched or sorted according to personal preferences.

A number of recommendation tools have been proposed or suggested for recommending television programs and other items of interest. Television program recommendation tools, for example, apply viewer preferences to an EPG to obtain a set of recommended programs that may be of interest to a particular viewer. Generally, television program recommendation tools obtain viewer preferences using explicit or implicit techniques, or some combination of the two. Implicit television program recommendation tools generate television program recommendations unobtrusively, based on information derived from the viewing history of the viewer. Explicit television program recommendation tools, on the other hand, explicitly ask viewers about their preferences for program attributes, such as title, genre, actors, channel and date/time, to derive a viewer profile and generate recommendations.

Patent Document 1: U.S. Patent Application No. 09/819286
Patent Document 2: U.S. Patent Application No. 09/466406
Patent Document 3: U.S. Patent Application No. 09/498271
Patent Document 4: U.S. Patent Application No. 09/627139
Non-Patent Document 1: Stanfill and Waltz, "Toward Memory-Based Reasoning," Communications of the ACM, 1986, vol. 29, no. 12, pp. 1213-1228
Non-Patent Document 2: Cost and Salzberg, "A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features," Machine Learning, vol. 10, Kluwer Publishers (Boston, MA), 1993, pp. 57-58
Non-Patent Document 3: J. Kittler et al., "Combining Classifiers," Proc. of the 13th Int'l Conf. on Pattern Recognition, Vienna, Austria, 1996, vol. II, pp. 897-901

While currently available recommendation tools help users identify items of interest, they suffer from a number of limitations which, if overcome, could greatly improve their convenience and performance. For example, to be thorough, explicit recommendation tools are very tedious to initialize, requiring each new user to answer a very detailed survey that specifies his or her preferences at a coarse level of granularity. Implicit television program recommendation tools derive a profile unobtrusively by observing viewing behavior, but they take a long time to become accurate. In addition, such implicit tools require at least a minimal amount of viewing history before they can begin making any recommendations. Thus, an implicit television program recommendation tool cannot make any recommendations when it is first obtained.

Accordingly, there is a need for a method and apparatus that can unobtrusively recommend items, such as television programs, before sufficient personal viewing history is available. A further need exists for a method and apparatus that generate program recommendations for a particular user based on the viewing behavior of third parties.

Generally, a method and apparatus are disclosed for recommending items of interest, such as television programs, to a user. According to one aspect of the invention, recommendations can be generated before the user's viewing history or purchase history is available, for example, when the user first obtains the recommendation tool. Initially, viewing or purchase histories from one or more third parties are used to recommend items of interest to a particular user. The third party viewing or purchase history is obtained from a sample population whose demographics are representative of a larger population.

The third party viewing or purchase history is processed to generate stereotype profiles that reflect the typical patterns of items selected by representative viewers. As used herein, a stereotype profile is a cluster of items (data points) that are similar to one another in some way. Thus, a given cluster corresponds to a particular segment of the items selected in the third party viewing or purchase history that exhibits a specific pattern. Once the stereotype profiles have been generated by the present invention, a user can select the most relevant stereotype(s), thereby initializing his or her profile with the items closest to his or her own interests. The stereotype profiles then adapt and evolve toward each individual user's specific personal viewing behavior, based on the user's selection patterns and any feedback the user provides.

A clustering routine is disclosed that partitions the third party viewing or purchase history (the data set) into clusters, such that the points (e.g., television programs) in one cluster are closer to the mean of that cluster than to the mean of any other cluster. A mean computation routine is also disclosed to compute the symbolic mean of a cluster. A given data point, such as a television program, is assigned to a cluster based on the distance between the data point and the mean of each cluster. A clustering performance evaluation routine determines when predefined criteria for stopping cluster generation have been satisfied.

FIG. 1 illustrates a television program recommendation tool 100 in accordance with the present invention. As shown in FIG. 1, the exemplary television program recommendation tool 100 evaluates programs in a program database 200, discussed below in conjunction with FIG. 2, to identify programs of interest to a particular viewer. The set of recommended programs can be presented to the viewer, for example, on a set-top terminal/television (not shown) using well-known on-screen presentation techniques. While the present invention is illustrated herein in the context of television program recommendations, it can be applied to any automatically generated recommendations that are based on an evaluation of user behavior, such as a viewing history or a purchase history.

According to one aspect of the present invention, the television program recommendation tool 100 can generate television program recommendations before the user's viewing history 140 is available, for example, when the user first obtains the television program recommendation tool 100. As shown in FIG. 1, the television program recommendation tool 100 initially uses viewing histories 130 from one or more third parties to recommend programs of interest to a particular user. Generally, the third party viewing histories 130 are based on the viewing habits of one or more sample populations whose demographics, such as age, income, gender and education, are representative of a larger population.

As shown in FIG. 1, the third party viewing history 130 comprises sets of watched and not-watched programs for a given population. The set of watched programs is obtained by observing the programs actually watched by that population. The set of not-watched programs is obtained, for example, by randomly sampling the programs in the program database 200. In an alternative embodiment, the set of not-watched programs is obtained in accordance with the teachings of the patent application entitled "Adaptive Sampling Technique for Selecting Negative Examples for Artificial Intelligence Applications," filed March 28, 2001 (see Patent Document 1).
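The random-sampling option for the not-watched set could be sketched as follows. This is a minimal illustration, not the method of the patent: programs are identified by hypothetical title strings, already-watched programs are excluded from the sample (an assumption), and the adaptive technique of Patent Document 1 is not implemented.

```python
import random
from typing import List, Sequence


def sample_not_watched(program_db: Sequence[str], watched: Sequence[str],
                       n: int, seed: int = 0) -> List[str]:
    """Randomly draw up to n EPG programs that the sample population did not watch."""
    watched_set = set(watched)
    # Candidate negative examples: every EPG program nobody in the panel watched.
    candidates = [p for p in program_db if p not in watched_set]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(candidates, min(n, len(candidates)))


epg = ["News at Ten", "Quiz Night", "Cooking Live", "Late Movie", "Match of the Day"]
negatives = sample_not_watched(epg, watched=["News at Ten", "Late Movie"], n=2)
```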

According to another aspect of the present invention, the television program recommendation tool 100 processes the third party viewing history 130 to generate stereotype profiles that reflect the typical patterns of television programs watched by representative viewers. As discussed further below, a stereotype profile is a cluster of television programs (data points) that are similar to one another in some way. Thus, a given cluster corresponds to a particular segment of television programs from the third party viewing history 130 that exhibits a specific pattern.

The present invention processes the third party viewing history 130 to produce clusters of programs that each exhibit a particular pattern. Thereafter, a user can select the most relevant stereotype(s), thereby initializing his or her profile with the programs closest to his or her own interests. The stereotype profiles then adapt and evolve toward each individual user's specific personal viewing behavior, based on the user's recording patterns and feedback on programs. In one embodiment, programs from the user's own viewing history 140 can be given a higher weight than programs from the third party viewing history 130 when determining program scores.
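One way to picture the weighting in the last embodiment is a normalized genre-preference table in which the user's own programs count more than stereotype programs. The genre-frequency score and the 2:1 weight ratio below are illustrative assumptions only; the document does not specify how program scores are computed.

```python
from collections import Counter
from typing import Dict, Iterable


def weighted_genre_scores(user_history: Iterable[str],
                          stereotype_history: Iterable[str],
                          user_weight: float = 2.0,
                          third_party_weight: float = 1.0) -> Dict[str, float]:
    """Normalized genre preferences in which the user's own programs weigh more."""
    totals: Counter = Counter()
    for genre in user_history:          # genres from the user's own viewing history 140
        totals[genre] += user_weight
    for genre in stereotype_history:    # genres from the selected stereotype (history 130)
        totals[genre] += third_party_weight
    norm = sum(totals.values())
    return {genre: w / norm for genre, w in totals.items()} if norm else {}


scores = weighted_genre_scores(["drama", "drama"], ["news"])
```

With these hypothetical weights, two user-watched dramas outweigh one stereotype news program 4:1, giving drama 0.8 and news 0.2.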

The television program recommendation tool 100 may be embodied as any computing device, such as a personal computer or workstation, that contains a processor 115, such as a central processing unit (CPU), and memory 120, such as RAM (read/write memory) and/or ROM (read-only memory). The television program recommendation tool 100 may also be embodied as an application specific integrated circuit (ASIC), for example, in a set-top terminal or display (not shown). In addition, the television program recommendation tool 100 may be embodied as any available television program recommender, such as the TiVo™ system commercially available from TiVo, Inc. of Sunnyvale, California, or the television program recommenders described in the patent applications entitled "Method and Apparatus for Recommending Television Programming Using Decision Trees," filed December 17, 1999 (see Patent Document 2), "Bayesian TV Show Recommender," filed February 4, 2000 (see Patent Document 3), and "Three-Way Media Recommendation Method and System," filed July 27, 2000 (see Patent Document 4), or any combination thereof, as modified herein to carry out the features and functions of the present invention.

As shown in FIG. 1, and discussed further below in conjunction with FIGS. 2 through 8, the television program recommendation tool 100 contains a program database 200, a stereotype profile process 300, a clustering routine 400, a mean computation routine 500, a distance computation routine 600 and a clustering performance evaluation routine 800. Generally, the program database 200 may be embodied as a well-known electronic program guide and records information for each program that is available in a given time interval. The stereotype profile process 300 (i) processes the third party viewing history 130 to generate stereotype profiles that reflect the typical patterns of television programs watched by representative viewers; (ii) allows the user to select the most relevant stereotype(s); and (iii) generates recommendations based on the selected stereotype(s).

The clustering routine 400 is invoked by the stereotype profile process 300 to partition the third party viewing history 130 (the data set) into clusters, such that the points (television programs) in one cluster are closer to the mean (centroid) of that cluster than to the mean of any other cluster. The clustering routine 400 calls the mean computation routine 500 to compute the symbolic mean of a cluster. The clustering routine 400 calls the distance computation routine 600 to evaluate how close a television program is to each cluster, based on the distance between the program and the mean of that cluster. Finally, the clustering routine 400 calls the clustering performance evaluation routine 800 to determine when the criteria for stopping cluster generation have been satisfied.

FIG. 2 is a sample table from the program database (EPG) 200 of FIG. 1. As previously indicated, the program database 200 records information for each program that is available in a given time interval. As shown in FIG. 2, the program database 200 contains a plurality of records, such as records 205 through 220, each associated with a given program. For each program, the program database 200 indicates the date/time and channel associated with the program in fields 240 and 245, respectively. In addition, the title, genre and actors of each program are identified in fields 250, 255 and 270, respectively. Additional well-known features (not shown), such as the duration and description of the program, can also be included in the program database 200.
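A record of the kind shown in FIG. 2 might be modelled as below. The class and field names are hypothetical, chosen only to mirror fields 240 through 270, and the half-open time-interval filter is an illustrative assumption about how "programs available in a given time interval" could be selected.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class ProgramRecord:
    """One EPG record mirroring FIG. 2 (hypothetical field names)."""
    date_time: datetime            # field 240
    channel: str                   # field 245
    title: str                     # field 250
    genre: str                     # field 255
    actors: Tuple[str, ...] = ()   # field 270


def programs_in_interval(db: List[ProgramRecord], start: datetime,
                         end: datetime) -> List[ProgramRecord]:
    """Return the records whose start time falls in the half-open interval [start, end)."""
    return [r for r in db if start <= r.date_time < end]


db = [
    ProgramRecord(datetime(2001, 11, 13, 20, 0), "EBC", "Evening News", "news"),
    ProgramRecord(datetime(2001, 11, 13, 22, 0), "BBC", "Late Movie", "film"),
]
evening = programs_in_interval(db, datetime(2001, 11, 13, 19), datetime(2001, 11, 13, 21))
```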

FIG. 3 is a flow chart describing an exemplary implementation of a stereotype profile process 300 incorporating features of the present invention. As previously indicated, the stereotype profile process 300 (i) processes the third party viewing history 130 to generate stereotype profiles that reflect the typical patterns of television programs watched by representative viewers; (ii) allows the user to select the most relevant stereotype(s) and thereby initialize his or her profile; and (iii) generates recommendations based on the selected stereotype(s). It is noted that the processing of the third party viewing history 130 can be performed offline, for example, at the factory, so that the television program recommendation tool 100 can be delivered to the user with the generated stereotype profiles already installed and ready for selection.

Thus, as shown in FIG. 3, the stereotype profile process 300 initially collects the third party viewing history 130 during step 310. Thereafter, during step 320, the stereotype profile process 300 executes the clustering routine 400, discussed below in conjunction with FIG. 4, to generate clusters of programs corresponding to the stereotype profiles. As discussed further below, the exemplary clustering routine 400 can apply an unsupervised data clustering algorithm, such as a k-means clustering routine, to the viewing history data set 130. As previously indicated, the clustering routine 400 partitions the third party viewing history 130 (the data set) into clusters, such that the points (television programs) in one cluster are closer to the mean (centroid) of that cluster than to the mean of any other cluster.

During step 330, the stereotype profile process 300 assigns each cluster a label that characterizes the corresponding stereotype profile. In one exemplary embodiment, the mean of a cluster serves as the representative television program for the entire cluster, and the features of this mean program can be used to label the cluster. For example, the television program recommendation tool 100 can be configured so that genre is the dominant, or characterizing, feature of each cluster.
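Labelling step 330 could be sketched as below, assuming each program is a dictionary of symbolic features and assuming a simple 0/1 mismatch distance (a stand-in, not the distance computation routine 600). Under that assumption the genre of the cluster mean, i.e. the value minimizing the within-cluster variance, is simply the most frequent genre in the cluster.

```python
from collections import Counter
from typing import Dict, List

Program = Dict[str, str]  # hypothetical: feature name -> symbolic value


def label_cluster(cluster: List[Program], feature: str = "genre") -> str:
    """Label a cluster by the value its mean takes for the characterizing feature.

    With a 0/1 mismatch distance the variance-minimizing value is the mode.
    """
    counts = Counter(p[feature] for p in cluster)
    return counts.most_common(1)[0][0]


cluster = [
    {"title": "Match of the Day", "genre": "sports"},
    {"title": "Evening News", "genre": "news"},
    {"title": "Grand Prix Live", "genre": "sports"},
]
```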

During step 340, the labeled stereotype profiles are presented to each user, who selects the stereotype profile(s) closest to his or her interests. The programs that make up each selected cluster can be thought of as the "typical viewing history" of that stereotype and can be used to build the stereotype profile for each cluster. Thus, during step 350, a viewing history containing the programs from the selected stereotype profile(s) is generated for the user. Finally, during step 360, the viewing history generated in the previous step is applied to a program recommender to obtain program recommendations. As would be apparent to a person of ordinary skill in the art, the program recommender may be embodied as any conventional program recommender, such as those referenced above, as modified herein. Program control terminates during step 370.
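Step 350 can be pictured as merging the programs of the selected clusters into one initial viewing history, which would then be handed to whichever recommender is used in step 360. In this sketch the stereotypes are a hypothetical mapping from label to program titles, and dropping a program that appears in two selected stereotypes is an assumption rather than something the document specifies.

```python
from typing import Dict, Iterable, List


def initial_viewing_history(stereotypes: Dict[str, List[str]],
                            selected: Iterable[str]) -> List[str]:
    """Merge the programs of the user-selected stereotype clusters (step 350)."""
    history: List[str] = []
    seen = set()
    for label in selected:
        for program in stereotypes[label]:
            if program not in seen:  # keep a program shared by two stereotypes only once
                seen.add(program)
                history.append(program)
    return history


stereotypes = {
    "sports": ["Match of the Day", "Grand Prix Live"],
    "news": ["Evening News", "Newsnight"],
    "films": ["Late Movie"],
}
history = initial_viewing_history(stereotypes, ["sports", "news"])
```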

FIG. 4 is a flow chart describing an exemplary implementation of a clustering routine 400 incorporating features of the present invention. As previously indicated, the clustering routine 400 is invoked by the stereotype profile process 300 during step 320 to partition the third party viewing history 130 (the data set) into clusters, such that the points (television programs) in one cluster are closer to the mean (centroid) of that cluster than to the mean of any other cluster. Generally, clustering addresses the unsupervised task of finding groupings of examples in a sample data set. The present invention partitions the data set into k clusters using a k-means clustering algorithm. As discussed below, the two main parameters of the clustering routine 400 are (i) the distance metric used to find the closest cluster, discussed below in conjunction with FIG. 6; and (ii) the number of clusters to generate, k.

The exemplary clustering routine 400 uses a dynamic value of k; a stable value of k is reached when further clustering of the example data yields no improvement in classification accuracy. In addition, the number of clusters, k, is increased until an empty cluster is observed. Thus, clustering stops when the natural level of clustering has been reached.

As shown in FIG. 4, the clustering routine 400 initially specifies k clusters during step 410. The exemplary clustering routine 400 begins by selecting the minimum number of clusters, for example, two. For this fixed number, the clustering routine 400 processes the entire viewing history data 130 and, after several iterations, arrives at two clusters that can be considered stable (i.e., no program would move from one cluster to another even if the algorithm were iterated once more). The current k clusters are initialized with one or more programs during step 420.

In one exemplary embodiment, the clusters are initialized during step 420 with a number of seed programs selected from the third party viewing history 130. The programs used to initialize the clusters can be selected randomly or sequentially. In a sequential implementation, the clusters can be initialized with programs starting from the first program in the viewing history 130, or starting at any other point in the viewing history 130. In yet another variation, the number of programs used to initialize each cluster can be varied. Finally, the clusters can be initialized with one or more "virtual" programs whose feature values are randomly selected from the programs in the third party viewing history 130.
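The three seeding options of step 420 (random, sequential, and virtual programs) could be sketched as follows. The function names are hypothetical, programs are assumed to be dictionaries of symbolic features, and the wrap-around in the sequential variant, used when the start offset runs past the end of the history, is an assumption of this sketch.

```python
import random
from typing import Dict, List, Sequence

Program = Dict[str, str]  # hypothetical: feature name -> symbolic value


def seed_random(history: Sequence[Program], k: int,
                rng: random.Random) -> List[List[Program]]:
    """One randomly chosen seed program per cluster."""
    return [[p] for p in rng.sample(list(history), k)]


def seed_sequential(history: Sequence[Program], k: int, per_cluster: int = 1,
                    start: int = 0) -> List[List[Program]]:
    """Consecutive programs from position `start`, wrapping around (assumption)."""
    n = len(history)
    return [[history[(start + c * per_cluster + j) % n] for j in range(per_cluster)]
            for c in range(k)]


def seed_virtual(history: Sequence[Program], k: int,
                 rng: random.Random) -> List[List[Program]]:
    """Virtual seed programs: each feature value drawn from a randomly chosen program."""
    features = list(history[0].keys())
    return [[{f: rng.choice(history)[f] for f in features}] for _ in range(k)]


history = [
    {"channel": "EBC", "genre": "news"},
    {"channel": "BBC", "genre": "sports"},
    {"channel": "EBC", "genre": "film"},
]
rng = random.Random(1)
```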

Thereafter, during step 430, the clustering routine 400 invokes the mean computation routine 500, discussed below in conjunction with FIG. 5, to compute the current mean of each cluster. During step 440, the clustering routine 400 executes the distance computation routine 600, discussed below in conjunction with FIG. 6, to determine the distance from each program in the third party viewing history 130 to each cluster. Each program in the viewing history 130 is then assigned to the closest cluster during step 460.

A test is performed during step 470 to determine whether any program has moved from one cluster to another. If it is determined during step 470 that a program has moved, program control returns to step 430 and continues in the manner described above until a stable set of clusters is identified. If, however, it is determined during step 470 that no program has moved from one cluster to another, program control proceeds to step 480.

A further test is performed during step 480 to determine whether a given performance criterion has been satisfied or an empty cluster has been identified (collectively referred to as the "stopping criteria"). If it is determined during step 480 that the stopping criteria have not been satisfied, the value of k is incremented during step 485, and program control returns to step 420 and continues in the manner described above. If, however, it is determined during step 480 that the stopping criteria have been satisfied, program control terminates. The evaluation of the stopping criteria is discussed further below in conjunction with FIG. 8.
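The control flow of FIG. 4 (steps 410 through 485) can be sketched as a k-means loop over symbolic programs with a growing k. This is an illustrative sketch under stated assumptions, not the patented implementation: the distance is a plain count of mismatched features standing in for the distance computation routine 600, the performance criterion of routine 800 is omitted, only the empty-cluster criterion is checked, the cap `k_max` is hypothetical, and the clustering returned is the last one found before an empty cluster appeared.

```python
import random
from typing import Dict, List, Optional, Sequence

Program = Dict[str, str]  # hypothetical: feature name -> symbolic value


def distance(p: Program, q: Program) -> int:
    """Stand-in metric (not routine 600): number of features on which p and q differ."""
    return sum(p[f] != q[f] for f in p)


def symbolic_mean(cluster: Sequence[Program]) -> Program:
    """Per feature, the value seen in the cluster with the smallest summed squared distance."""
    mean = {}
    for f in cluster[0]:
        values = [p[f] for p in cluster]
        mean[f] = min(sorted(set(values)),
                      key=lambda v: sum((x != v) ** 2 for x in values))
    return mean


def cluster_fixed_k(data: Sequence[Program], k: int, rng: random.Random,
                    max_iter: int = 100) -> List[List[Program]]:
    """Steps 420-470: seed, then alternate mean computation and reassignment."""
    clusters = [[p] for p in rng.sample(list(data), k)]   # step 420: one seed each
    previous: Optional[List[int]] = None
    for _ in range(max_iter):
        if any(not c for c in clusters):                   # empty cluster: stop iterating
            break
        means = [symbolic_mean(c) for c in clusters]       # step 430
        labels = [min(range(k), key=lambda j: distance(p, means[j]))
                  for p in data]                            # steps 440 and 460
        clusters = [[p for p, lab in zip(data, labels) if lab == j] for j in range(k)]
        if labels == previous:                             # step 470: no program moved
            break
        previous = labels
    return clusters


def stereotype_clusters(data: Sequence[Program], k_min: int = 2, k_max: int = 8,
                        seed: int = 0) -> List[List[Program]]:
    """Steps 410-485: increase k until a clustering contains an empty cluster."""
    rng = random.Random(seed)
    best: List[List[Program]] = []
    for k in range(k_min, min(k_max, len(data)) + 1):
        clusters = cluster_fixed_k(data, k, rng)
        if best and any(not c for c in clusters):          # step 480: stopping criterion
            break
        best = clusters                                     # step 485: try k + 1
    return best


data = [
    {"channel": "BBC", "genre": "news", "slot": "evening"},
    {"channel": "BBC", "genre": "news", "slot": "late"},
    {"channel": "CNN", "genre": "news", "slot": "evening"},
    {"channel": "ESPN", "genre": "sports", "slot": "weekend"},
    {"channel": "ESPN", "genre": "sports", "slot": "evening"},
    {"channel": "SKY", "genre": "sports", "slot": "weekend"},
]
result = stereotype_clusters(data)
```

Note that without the performance criterion of routine 800, the empty-cluster test alone may let k grow until it reaches the hypothetical `k_max` cap, which is one reason the document combines both tests.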

Because the exemplary clustering routine 400 places each program in exactly one cluster, it generates what are referred to as crisp clusters. A further variation uses fuzzy clustering, which allows a given example (television program) to belong partially to several clusters. In a fuzzy clustering implementation, each television program is assigned a weight for each cluster that indicates how close the program is to that cluster's mean. The weight can depend on the inverse square of the distance of the television program from the cluster mean. The cluster weights associated with a single television program should sum to 100%.
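A minimal sketch of such fuzzy weights follows, assuming the inverse-square distances are normalized so that one program's weights sum to 1 (i.e., 100%). The normalization scheme and the handling of a zero distance (full membership shared among the coinciding clusters) are assumptions of this sketch.

```python
from typing import List, Sequence


def fuzzy_memberships(distances: Sequence[float]) -> List[float]:
    """Membership of one program in each cluster, proportional to 1/d**2, summing to 1."""
    zero = [i for i, d in enumerate(distances) if d == 0]
    if zero:
        # The program coincides with one or more cluster means: share full membership.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(distances))]
    inverse = [1.0 / (d * d) for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


weights = fuzzy_memberships([1.0, 2.0])
```

For distances 1 and 2 the inverse squares are 1 and 0.25, so the program belongs 80% to the nearer cluster and 20% to the farther one.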

Computing the Symbolic Mean of a Cluster
FIG. 5 is a flow chart describing an exemplary implementation of a mean computation routine 500 incorporating features of the present invention. As previously indicated, the mean computation routine 500 is invoked by the clustering routine 400 to compute the symbolic mean of a cluster. For numeric data, the mean is the value that minimizes the variance. Extending this notion to symbolic data, the mean of a cluster can be found by identifying the value of x_μ that minimizes the within-cluster variance (and hence the radius, or extent, of the cluster):

$$\mathrm{Var}(J) = \sum_{i \in J} (x_i - x_\mu)^2 \qquad (1)$$

where J is a cluster of television programs from the same class (watched or not watched), x_i is the symbolic feature value for program i, and x_μ is the feature value, taken from one of the television programs in J, that minimizes Var(J).
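For numeric features, the statement that the mean minimizes the variance in Equation (1) can be checked by treating x_μ as a free variable c and setting the derivative to zero:

```latex
\frac{d}{dc}\sum_{i \in J}(x_i - c)^2 = -2\sum_{i \in J}(x_i - c) = 0
\;\Longrightarrow\;
c = \frac{1}{|J|}\sum_{i \in J} x_i,
\qquad
\frac{d^2}{dc^2}\sum_{i \in J}(x_i - c)^2 = 2|J| > 0
```

The positive second derivative confirms that the arithmetic mean is the minimizer; for symbolic features no such closed form exists, which is why candidate values are searched instead.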

Thus, as shown in FIG. 5, the mean computation routine 500 initially identifies, during step 510, the programs currently in a given cluster J. For the current symbolic attribute of interest, the variance of the cluster is computed during step 520 for each possible symbolic value x_μ using Equation (1). The symbolic value x_μ that minimizes the variance is selected as the mean during step 530.

A test is performed during step 540 to determine whether there are additional symbolic attributes to consider. If it is determined during step 540 that there are additional symbolic attributes to consider, program control returns to step 520 and continues in the manner described above. If, however, it is determined during step 540 that there are no additional symbolic attributes to consider, program control returns to the clustering routine 400.

Computationally, each symbolic feature value in J is tried as x_μ, and the symbolic value that minimizes the variance becomes the mean of the symbolic attribute of interest in cluster J. Two types of mean can be computed: a feature-based mean and a program-based mean.

Feature-Based Symbolic Mean
The exemplary mean computation routine 500 described here is feature-based: the resulting cluster mean has feature values taken from the examples (programs) in cluster J, because the mean of a symbolic attribute must be one of that attribute's possible values. Note, however, that the cluster mean can be a "virtual" television program. The feature values of this virtual program may include a channel value taken from one example (e.g., "EBC") and a title value taken from another example (e.g., "BBC World News", which in reality is not broadcast on "EBC"). In other words, whichever feature value yields the minimum variance is selected to represent the mean of that feature. The mean computation routine 500 repeats until it is determined during step 540 that all features (i.e., symbolic attributes) have been considered for all feature positions. The resulting virtual program is used to represent the mean of the cluster.
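The feature-based computation of steps 510 to 540 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: programs are assumed to be dictionaries of symbolic feature values, equation (1) is read as a sum of squared distances, and the hypothetical helper `mismatch` (0 if two values are equal, 1 otherwise) stands in for the VDM-based feature distance described later in this section.

```python
from typing import Callable, Dict, List

Program = Dict[str, str]                      # feature name -> symbolic value
FeatureDistance = Callable[[str, str, str], float]


def mismatch(feature: str, a: str, b: str) -> float:
    """Placeholder 0/1 feature distance (hypothetical stand-in for MVDM)."""
    return 0.0 if a == b else 1.0


def feature_based_mean(cluster: List[Program], dist: FeatureDistance = mismatch) -> Program:
    """Return a (possibly virtual) mean program for a non-empty cluster."""
    mean: Program = {}
    for feature in cluster[0]:                # loop over symbolic attributes (step 540)
        # Each value observed in the cluster is tried as x_mu (sorted for determinism).
        candidates = sorted({p[feature] for p in cluster})
        # Step 520: variance of equation (1) per candidate; step 530: keep the minimum.
        mean[feature] = min(
            candidates,
            key=lambda mu: sum(dist(feature, p[feature], mu) ** 2 for p in cluster),
        )
    return mean
```

For a cluster containing ("Devil", "EBC"), ("BBC World News", "BBC"), ("BBC World News", "FEX") and ("Simons", "EBC") as (title, channel) pairs, this sketch returns a virtual program with title "BBC World News" and channel "EBC", a combination that appears in none of the four programs.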

Program-Based Symbolic Mean
In another variation, x_i in equation (1) for the variance can be the television program i itself, and x_μ is likewise the program in cluster J that minimizes the variance over the programs in cluster J. In this case, the appropriate quantity to minimize is the distance between programs rather than between individual feature values. Furthermore, the resulting mean is not a virtual program but simply a program selected from J. Whichever program in cluster J minimizes the variance over all programs in cluster J can be used to represent the mean of the cluster.
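A corresponding Python sketch of the program-based mean, under the same assumptions (dictionary programs, hypothetical `mismatch` placeholder distance), returns an actual member of the cluster. The program-to-program distance is taken as the sum of per-feature distances, as described for the distance computation routine 600; ties go to the first program listed.

```python
from typing import Callable, Dict, List

Program = Dict[str, str]
FeatureDistance = Callable[[str, str, str], float]


def mismatch(feature: str, a: str, b: str) -> float:
    """Placeholder 0/1 feature distance (hypothetical stand-in for MVDM)."""
    return 0.0 if a == b else 1.0


def program_distance(p: Program, q: Program, dist: FeatureDistance = mismatch) -> float:
    """Program-to-program distance: sum of per-feature distances."""
    return sum(dist(f, p[f], q[f]) for f in p)


def program_based_mean(cluster: List[Program], dist: FeatureDistance = mismatch) -> Program:
    """Return the cluster member that minimizes equation (1) over the whole cluster."""
    return min(
        cluster,
        key=lambda mu: sum(program_distance(p, mu, dist) ** 2 for p in cluster),
    )
```

For instance, in a cluster of ("News", "EBC"), ("News", "FEX") and ("Sports", "EBC"), the first program differs from each of the others in only one feature and is returned as the mean.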

Symbolic Mean Using Multiple Programs
The exemplary mean computation routine 500 described above characterizes the mean of a cluster by a single value for each possible feature (in both the feature-based and program-based embodiments). It has been found, however, that relying on only one feature value per feature during the mean computation often leads to poor clustering, because the mean is then no longer a typical center of the cluster. That is, it may be undesirable to represent a cluster by a single program; instead, several programs, representing a mean or several means, may be used to represent the cluster. Thus, in another variation, a cluster may be represented by multiple means, or by multiple feature values for each possible feature. In this case, the N features (for the feature-based symbolic mean) or the N programs (for the program-based symbolic mean) that minimize the variance are selected during step 530, where N is the number of programs used to represent the mean of the cluster.
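For the program-based case, selecting the N programs with the smallest variance can be sketched with Python's `heapq.nsmallest`. The name `top_n_program_means` and the placeholder `mismatch` distance are assumptions for illustration; the feature-based case would apply the same selection to the candidate values of each feature.

```python
import heapq
from typing import Callable, Dict, List

Program = Dict[str, str]
FeatureDistance = Callable[[str, str, str], float]


def mismatch(feature: str, a: str, b: str) -> float:
    """Placeholder 0/1 feature distance (hypothetical stand-in for MVDM)."""
    return 0.0 if a == b else 1.0


def program_distance(p: Program, q: Program, dist: FeatureDistance = mismatch) -> float:
    """Program-to-program distance: sum of per-feature distances."""
    return sum(dist(f, p[f], q[f]) for f in p)


def top_n_program_means(
    cluster: List[Program], n: int, dist: FeatureDistance = mismatch
) -> List[Program]:
    """Step 530 variant: the n cluster members with the smallest variance (equation (1))."""
    def variance(mu: Program) -> float:
        return sum(program_distance(p, mu, dist) ** 2 for p in cluster)
    return heapq.nsmallest(n, cluster, key=variance)
```

Because `heapq.nsmallest` is documented as equivalent to sorting by the key and taking the first n items, the returned representatives are ordered from most to least central.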

Computing the Distance Between a Program and a Cluster
As described above, the distance computation routine 600 is invoked by the clustering routine 400 to evaluate how close each television program is to each cluster, based on the distance between a given television program and the mean of a given cluster. The distance metric quantifies the differences between the various examples in the sample data set and thereby determines the extent of each cluster. To cluster user profiles, it must be possible to compute the distance between any two television programs in the viewing history. In general, television programs that are close to each other tend to fall into the same cluster. There are a number of relatively simple techniques for computing the distance between numerical vectors, such as the Euclidean distance, the Manhattan distance, and the Mahalanobis distance. These existing techniques cannot, however, be applied to television program vectors, because television programs have predominantly symbolic feature values. For example, two television programs, such as an episode of "Devil" broadcast on "EBC" at 8 p.m. on March 22, 2001, and an episode of "Simons" broadcast on "FEX" at 8 p.m. on March 25, 2001, may be represented by the following feature vectors:
Program 1: Title: Devil; Channel: EBC; Broadcast date: March 22, 2001; Broadcast time: 20:00
Program 2: Title: Simons; Channel: FEX; Broadcast date: March 25, 2001; Broadcast time: 20:00
Clearly, no known numerical distance metric can be used to compute the distance between the feature value "EBC" and the feature value "FEX". The Value Difference Metric (VDM) is an existing technique for measuring the distance between values in a symbolic feature space. VDM considers how similarly all instances are classified for each possible value of each feature. Using this method, a matrix specifying the distance between every pair of values of a feature is derived statistically from the examples in the training set. The VDM technique for computing distances between symbolic feature values is described in more detail elsewhere (see, for example, Non-Patent Document 1).
The present invention uses the VDM technique, or a variation of it, to compute the distances between the feature values of two television programs or other items of interest. The original VDM includes a weighting term in the distance between two feature values, which makes the distance metric asymmetric. The Modified VDM (MVDM) omits this weighting term, making the distance matrix symmetric. The MVDM technique for computing distances between symbolic feature values is described in more detail elsewhere (see, for example, Non-Patent Document 2).
According to MVDM, the distance δ between two values V1 and V2 of a given feature is given by
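$$\delta(V_1, V_2) = \sum_{i=1}^{n} \left| \frac{C_{1i}}{C_1} - \frac{C_{2i}}{C_2} \right|^{r} \qquad (3)$$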
In the program recommendation environment of the present invention, MVDM equation (3) is adapted specifically to handle the classes "watched" and "not watched":
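$$\delta(V_1, V_2) = \left| \frac{C_{1,\text{watched}}}{C_{1,\text{total}}} - \frac{C_{2,\text{watched}}}{C_{2,\text{total}}} \right|^{r} + \left| \frac{C_{1,\text{not watched}}}{C_{1,\text{total}}} - \frac{C_{2,\text{not watched}}}{C_{2,\text{total}}} \right|^{r} \qquad (4)$$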
In equation (4), V1 and V2 are two possible values of the feature of interest. Continuing the example above, for the feature "channel", the first value V1 is "EBC" and the second value V2 is "FEX". The distance between the values is a sum over all of the classes into which the examples are classified. The relevant classes for the exemplary program recommendation tool embodiment of the present invention are "watched" and "not watched". C1i is the number of times V1 ("EBC") was classified into class i (i equal to 1 denotes the class "watched"), and C1 ("C1_total") is the total number of times V1 occurred in the data set; C2i and C2 are defined in the same way for V2. The value r is a constant, usually set to 1.

The metric defined by equation (4) identifies two values as similar if they occur with similar relative frequencies in all classes. The term C1i/C1 represents the likelihood that an instance is classified as class i, given that the feature of interest has the value V1. Thus, two values are similar if they give similar likelihoods for all possible classes. Equation (4) computes the overall similarity between two values by summing the differences between their likelihoods over all classes. The distance between two television programs is the sum of the distances between the corresponding feature values of the two television program vectors.
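The following Python sketch computes equation (4) from labeled viewing-history records and sums it over features to obtain a program-to-program distance. The record layout (a program dictionary paired with a "watched" or "not watched" label) and the function names are assumptions made for this example; the per-value class counts it builds play the role of a table such as the one in FIG. 7A.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Program = Dict[str, str]                  # feature name -> symbolic value
Labeled = Tuple[Program, str]             # (program, class label)
CLASSES = ("watched", "not watched")


def class_counts(history: List[Labeled], feature: str) -> Dict[str, Counter]:
    """For each value of `feature`, count its occurrences in each class."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for program, label in history:
        counts[program[feature]][label] += 1
    return counts


def mvdm(counts: Dict[str, Counter], v1: str, v2: str, r: float = 1.0) -> float:
    """Equation (4): sum over classes of |C1i/C1 - C2i/C2| ** r.

    Assumes both values occur at least once in the history (C1, C2 > 0).
    """
    c1_total = sum(counts[v1].values())
    c2_total = sum(counts[v2].values())
    return sum(
        abs(counts[v1][cls] / c1_total - counts[v2][cls] / c2_total) ** r
        for cls in CLASSES
    )


def program_distance(
    p: Program, q: Program, tables: Dict[str, Dict[str, Counter]], r: float = 1.0
) -> float:
    """Distance between two programs: sum of per-feature MVDM distances."""
    return sum(mvdm(tables[f], p[f], q[f], r) for f in p)
```

With hypothetical counts in which "EBC" is watched 9 times out of 10 and "FEX" 2 times out of 10, `mvdm` returns approximately |0.9 − 0.2| + |0.1 − 0.8| = 1.4, and the distance from either value to itself is 0.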

FIG. 7A is a portion of a table for the feature values of the feature "channel". FIG. 7A tabulates the number of occurrences of each channel value for each class. The values shown in FIG. 7A are taken from an exemplary third-party viewing history 130.

FIG. 7B shows the distance between each pair of feature values, computed with MVDM equation (4) from the exemplary statistics shown in FIG. 7A. Intuitively, "EBC" and "ABS" should be "close" to each other, because both occur mostly in the class "watched" and not in the class "not watched" ("ABS" has a small "not watched" component). FIG. 7B confirms this intuition: the distance between "EBC" and "ABS" is a small, non-zero value. "ASPN", on the other hand, occurs mostly in the class "not watched" and should therefore, for this data set, be "far" from both "EBC" and "ABS". FIG. 7B shows the distance between "EBC" and "ASPN" to be 1.895, out of a maximum possible distance of 2.0. Similarly, the distance between "ABS" and "ASPN" is high, at 1.828.
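Under the assumption that "watched" and "not watched" are the only two classes and r = 1, equation (4) simplifies in a way that explains the maximum distance of 2.0. Write p_1 = C_{1,watched}/C_{1,total} and p_2 = C_{2,watched}/C_{2,total}; since every occurrence of a value falls in exactly one of the two classes, the "not watched" fractions are 1 − p_1 and 1 − p_2:

$$\delta(V_1, V_2) = \left| p_1 - p_2 \right| + \left| (1 - p_1) - (1 - p_2) \right| = 2\,\left| p_1 - p_2 \right| \le 2$$

On this reading, the distance of 1.895 between "EBC" and "ASPN" corresponds to their "watched" fractions differing by 1.895 / 2 = 0.9475.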

Thus, as shown in FIG. 6, the distance computation routine 600 first identifies a program in the third-party viewing history 130 during step 610. For the current program of interest, the distance computation routine 600 uses equation (4) during step 620 to compute the distance from each symbolic feature value to the corresponding feature of each cluster mean (as determined by the mean computation routine 500).

The distance between the current program and each cluster mean is computed during step 630 by summing the distances between the corresponding feature values. A test is performed during step 640 to determine whether there are additional programs of interest in the third-party viewing history 130. If it is determined during step 640 that there are additional programs of interest, the next program is identified during step 650, and program control proceeds to step 620 and continues in the manner described above.

If, however, it is determined during step 640 that there are no additional programs of interest in the third-party viewing history 130, program control returns to the clustering routine 400.

As with the "Symbolic Mean Using Multiple Programs" described above, the mean of a cluster can be characterized by several feature values for each possible feature (in both the feature-based and program-based embodiments). The results obtained from the multiple means can then be pooled by a variation of the distance computation routine 600 to reach a consensus decision through voting. For example, during step 620 the distance is now computed between a given feature value of the program and each of the feature values corresponding to the various means. The minimum-distance results are pooled and used for voting, for example by majority voting or by combining experts to reach a consensus decision. Such techniques are described in more detail elsewhere (see, for example, Non-Patent Document 3).
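One possible reading of this voting variant is sketched below in Python, under explicit assumptions: each cluster is represented by a list of mean programs, each feature of the program votes for the cluster whose closest mean value is nearest for that feature, and the cluster with the most votes wins (simple majority). The function name `vote_nearest_cluster`, the per-feature voting scheme, and the `mismatch` placeholder distance are illustrative choices, not details given in the text.

```python
from collections import Counter
from typing import Callable, Dict, List

Program = Dict[str, str]
FeatureDistance = Callable[[str, str, str], float]


def mismatch(feature: str, a: str, b: str) -> float:
    """Placeholder 0/1 feature distance (hypothetical stand-in for MVDM)."""
    return 0.0 if a == b else 1.0


def vote_nearest_cluster(
    program: Program,
    cluster_means: List[List[Program]],
    feature_dist: FeatureDistance = mismatch,
) -> int:
    """cluster_means[c] lists the mean programs that represent cluster c."""
    votes: Counter = Counter()
    for feature, value in program.items():
        def closest(c: int) -> float:
            # Minimum distance from this feature value to any mean of cluster c.
            return min(feature_dist(feature, value, m[feature]) for m in cluster_means[c])
        # This feature votes for the cluster with the smallest such distance.
        votes[min(range(len(cluster_means)), key=closest)] += 1
    # Majority vote over features.
    return votes.most_common(1)[0][0]
```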

Stopping Criteria
As described above, the clustering routine 400 invokes the clustering performance evaluation routine 800, shown in FIG. 8, to determine when the criteria for stopping the generation of clusters have been met. The exemplary clustering routine 400 uses a dynamic value of k, which reaches a stable value when further clustering of the example data no longer yields any improvement in classification accuracy. In addition, the number of clusters may be increased until an empty cluster is recorded. Clustering thus stops when the natural level of clustering in the data has been reached.
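The stopping logic can be sketched as a loop over increasing k. This is a hypothetical outline in Python: `cluster_fn(k)` stands in for the clustering routine 400 run with k clusters, `accuracy_fn` stands in for the evaluation routine 800, and the starting value and upper bound for k are arbitrary assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Clusters = List[Sequence[object]]


def choose_k(
    cluster_fn: Callable[[int], Clusters],
    accuracy_fn: Callable[[Clusters], float],
    k_start: int = 2,
    k_max: int = 50,
) -> Tuple[int, Clusters, float]:
    """Grow k until accuracy stops improving or an empty cluster appears."""
    best_k = k_start
    best_clusters = cluster_fn(k_start)
    best_acc = accuracy_fn(best_clusters)
    for k in range(k_start + 1, k_max + 1):
        clusters = cluster_fn(k)
        if any(len(c) == 0 for c in clusters):
            break                             # an empty cluster was recorded
        acc = accuracy_fn(clusters)
        if acc <= best_acc:
            break                             # no further gain in accuracy
        best_k, best_clusters, best_acc = k, clusters, acc
    return best_k, best_clusters, best_acc
```

The design keeps the last k that improved accuracy, so a plateau (equal accuracy) is treated as "no improvement" and stops the search.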

The exemplary clustering performance evaluation routine 800 tests the classification accuracy of the clustering routine 400 using a subset of the programs (a test data set) from the third-party viewing history 130. For each program in the test set, the clustering performance evaluation routine 800 determines the cluster closest to the program (i.e., the cluster whose mean is closest) and compares the class labels of that cluster and of the program. The percentage of matching class labels is interpreted as the accuracy of the clustering routine 400.

Thus, as shown in FIG. 8, the clustering performance evaluation routine 800 first collects, during step 810, a subset of programs from the third-party viewing history 130 to serve as a test data set. A class label is then assigned to each cluster during step 820, based on the proportions of watched and not-watched programs in the cluster. For example, if the majority of the programs in a cluster were watched, the cluster is assigned the label "watched".

During step 830, the cluster closest to each program in the test set is identified, and the class label of that cluster is compared with whether the program was actually watched. In embodiments that use multiple programs to represent the mean of a cluster, the average distance (to each of those programs) or a voting technique may be used. The percentage of matching class labels is determined during step 840, before program control returns to the clustering routine 400. The clustering routine 400 terminates when the classification accuracy reaches a predefined threshold.
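Steps 820 to 840 of routine 800 can be sketched in Python as follows, assuming each cluster is a non-empty list of (program, label) pairs, `means[c]` is the mean of cluster c, `program_dist` is a program-to-program distance (a hypothetical feature-mismatch count by default), and the test set of step 810 has already been drawn by the caller. Tie handling in the majority label of step 820 is not specified in the text; here it follows `Counter.most_common`.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Program = Dict[str, str]
Labeled = Tuple[Program, str]                 # (program, "watched" / "not watched")
ProgramDistance = Callable[[Program, Program], float]


def mismatch_distance(p: Program, q: Program) -> float:
    """Placeholder program distance: number of differing features."""
    return float(sum(p[f] != q[f] for f in p))


def label_clusters(clusters: List[List[Labeled]]) -> List[str]:
    """Step 820: label each non-empty cluster with its majority class."""
    return [Counter(label for _, label in c).most_common(1)[0][0] for c in clusters]


def clustering_accuracy(
    test_set: List[Labeled],
    clusters: List[List[Labeled]],
    means: List[Program],
    program_dist: ProgramDistance = mismatch_distance,
) -> float:
    """Steps 830-840: fraction of test programs whose nearest cluster label matches."""
    labels = label_clusters(clusters)
    matches = 0
    for program, actual in test_set:
        nearest = min(range(len(means)), key=lambda c: program_dist(program, means[c]))
        matches += labels[nearest] == actual
    return matches / len(test_set)
```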

It should be noted that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention, and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.

FIG. 1 is a schematic block diagram of a television program recommendation tool in accordance with the present invention.
FIG. 2 is a sample table from the exemplary program database of FIG. 1.
FIG. 3 is a flow diagram describing the stereotype profiling process of FIG. 1, embodying principles of the present invention.
FIG. 4 is a flow diagram describing the clustering routine of FIG. 1, embodying principles of the present invention.
FIG. 5 is a flow diagram describing the mean computation routine of FIG. 1, embodying principles of the present invention.
FIG. 6 is a flow diagram describing the distance computation routine of FIG. 1, embodying principles of the present invention.
FIG. 7A is a sample table from an exemplary channel feature value occurrence table, indicating the number of occurrences of each channel feature value for each class.
FIG. 7B is a sample table from an exemplary feature value pair distance table, indicating the distance between each pair of feature values computed from the exemplary statistics shown in FIG. 7A.
FIG. 8 is a flow diagram describing the clustering performance evaluation routine of FIG. 1, embodying principles of the present invention.

Claims

1. A method for generating a user profile indicative of user preferences in a system having a processing device, the method comprising:
the processing device obtaining a third-party selection history indicating television programs selected by at least one third party;
the processing device dividing the third-party selection history into clusters of television programs; and
the processing device receiving from the user a selection of at least one of the clusters;
wherein the processing device generates for the user, as the user profile, a viewing history containing the television programs from the at least one selected cluster.

2. The method of claim 1, further comprising:
the processing device recommending a television program based on the user profile.

3. The method of claim 1, further comprising:
the processing device assigning a label to each of the clusters.

4. The method of claim 3, wherein the user selects the at least one cluster based on the assigned label.

5. The method of claim 1, wherein the dividing step further comprises:
the processing device using a k-means clustering routine.

6. A system for generating a user profile indicative of user preferences, comprising:
a storage device storing computer-readable code; and
a processing device operatively coupled to the storage device, the processing device being configured to:
obtain a third-party selection history indicating television programs selected by at least one third party;
divide the third-party selection history into clusters of television programs; and
receive from the user a selection of at least one of the clusters;
wherein the processing device generates for the user, as the user profile, a viewing history containing the television programs from the at least one selected cluster.

7. The system of claim 6, wherein the processing device is further configured to:
recommend television programs based on the selected clusters.

8. An apparatus for generating a user profile indicative of user preferences, comprising:
a computer-readable recording medium having computer-readable code means embodied thereon, the computer-readable code means comprising:
a function of obtaining a third-party selection history indicating television programs selected by at least one third party;
a function of dividing the third-party selection history into clusters of television programs; and
a function of receiving from the user a selection of at least one of the clusters;
wherein the computer-readable code means generates for the user, as the user profile, a viewing history containing the television programs from the at least one selected cluster.

9. The apparatus of claim 8, wherein the computer-readable code means further comprises:
a function of recommending television programs based on the selected cluster.