JP2009251957A

JP2009251957A - Interest information specification system, interest information specification method, and program for interest information specification

Info

Publication number: JP2009251957A
Application number: JP2008099613A
Authority: JP
Inventors: Yoji Miyazaki; 陽司宮崎
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-04-07
Filing date: 2008-04-07
Publication date: 2009-10-29
Anticipated expiration: 2028-04-07
Also published as: JP5228584B2

Abstract

PROBLEM TO BE SOLVED: To provide an interest information specifying system that specifies, as interest information, feature words that appear at random within a certain period.

SOLUTION: An appearance frequency calculation means 971 refers to a feature word history. This history contains the feature words representing features of the content used by a person or group, together with the times at which that person or group used each content item characterized by those feature words. From it, the means 971 calculates for each feature word a feature word appearance frequency, i.e., the frequency of use of the content the feature word represents. An appearance interval calculation means 972 refers to the feature word history and calculates for each feature word the appearance time intervals, i.e., the intervals between uses of the content the feature word represents. A feature word evaluation means 973 refers to the feature word appearance frequency and the appearance time intervals and calculates, for each feature word, an evaluation value according to the deviation between the distribution of the appearance time intervals and a model probability distribution. A feature word specifying means 974 specifies feature words based on the evaluation value.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an interest information specifying system, an interest information specifying method, and an interest information specifying program, and more particularly to a system, method, and program for specifying the steady interests of a person or group.

Among systems that recommend content such as documents, music, and videos, some capture a user's interests from the user's past browsing and viewing history and use them to recommend content suited to the user or to support content search.

Such a system specifies information representing the user's interests from the number of times the user browsed each content item and the feature words that represent the features of the browsed content.

For example, consider document 1, whose feature words are “proposal” and “network”, and document 2, whose feature words are “proposal” and “security”. If user A views document 1 twice and document 2 three times, “proposal” is regarded as appearing five times, “network” twice, and “security” three times. Taking the number of appearances as the strength of user A's interest in each feature word, the interest strengths for “proposal”, “network”, and “security” can be expressed as 5, 2, and 3. These values show that user A is more interested in “proposal” than in “network” or “security”.
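A minimal Python sketch of this naive counting scheme, assuming the document-to-feature-word mapping and the per-document view counts are already available as dictionaries (the function and key names are illustrative):

```python
from collections import Counter

def count_feature_words(feature_words_by_doc, views_by_doc):
    """Naive interest strength: every view of a document counts one
    appearance of each feature word attached to that document."""
    counts = Counter()
    for doc_id, views in views_by_doc.items():
        for word in feature_words_by_doc[doc_id]:
            counts[word] += views
    return counts

feature_words = {
    "document 1": ["proposal", "network"],
    "document 2": ["proposal", "security"],
}
views_by_user_a = {"document 1": 2, "document 2": 3}

strength = count_feature_words(feature_words, views_by_user_a)
print(strength.most_common())  # [('proposal', 5), ('security', 3), ('network', 2)]
```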

Furthermore, an information recommendation method has been proposed that recommends content in consideration of transitions in a user's interests (see, for example, Patent Document 1). In the method of Patent Document 1, short-term per-item preference information of the user is calculated from the user's operation history and content metadata, and per-item preference information that accounts for temporal interest transitions is then calculated from it. Content is then recommended based on that information and the content metadata, taking the user's temporal interest transitions into account.

Patent Document 2 describes an information processing apparatus that generates short-term and long-term user preference information. In that apparatus, a preference element parameter determines whether the preference information data representing the user's preferences is updated quickly or slowly. The apparatus calculates a preference addition value as the product of an operation preference value parameter and the preference element parameter, and adds it to the preference value. If the preference element parameter is set small, a single viewing or operation increases the preference value only slightly, so the preference value represents long-term preferences. Conversely, if the preference element parameter is set large, a single viewing or operation increases the preference value substantially, so it represents short-term preferences.
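The update rule attributed to Patent Document 2 can be sketched as follows. The function name, the operation value of 1.0, and the two parameter settings are hypothetical; the sketch only shows how the size of the preference element parameter controls how fast the preference value grows.

```python
def update_preference(preference, operation_value, element_param):
    """Add the preference addition value (operation value x element parameter)."""
    return preference + operation_value * element_param

long_term = short_term = 0.0
for _ in range(5):  # five identical viewing operations
    long_term = update_preference(long_term, 1.0, 0.1)    # small element parameter
    short_term = update_preference(short_term, 1.0, 2.0)  # large element parameter

print(f"long-term: {long_term:.1f}, short-term: {short_term:.1f}")
```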

Patent Document 3 describes a latent needs inference device that uses keywords obtained from user operations to provide, as recommendation information, information in which the user is potentially interested. The device identifies keywords that, taken together, strongly exhibit the properties “high occurrence frequency”, “large difference between the maximum and minimum occurrence times”, and “small standard deviation of occurrence intervals”.
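As a rough illustration, the three properties named for Patent Document 3 can be computed from a keyword's occurrence times as below. How Patent Document 3 combines them into one judgment is not described here, so the sketch stops at the raw statistics; the function name and the use of the population standard deviation are assumptions.

```python
from statistics import pstdev

def occurrence_statistics(times):
    """Return (frequency, span, interval standard deviation) for one keyword."""
    times = sorted(times)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    frequency = len(times)
    span = times[-1] - times[0]  # maximum minus minimum occurrence time
    interval_sd = pstdev(intervals) if intervals else 0.0
    return frequency, span, interval_sd

print(occurrence_statistics([1, 4, 7, 10, 13]))  # regular: every 3 days
print(occurrence_statistics([1, 1, 2, 2, 3]))    # short burst
```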

Non-Patent Document 1 states that the exponential distribution is known as the distribution of the time (waiting time) until an event defined under certain conditions occurs.
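For reference, the exponential distribution with rate $\lambda > 0$ has the density and moments below (standard results, not specific to this document). When events occur at random at a constant rate (a Poisson process), the waiting times between successive events follow this distribution, and its mean and standard deviation coincide, which makes it a natural candidate for a model distribution of the intervals of randomly occurring events.

```latex
\begin{aligned}
f(t) &= \lambda e^{-\lambda t}, \qquad t \ge 0,\\
\mathrm{E}[t] &= \frac{1}{\lambda}, \qquad \sqrt{\mathrm{Var}[t]} = \frac{1}{\lambda}.
\end{aligned}
```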

Patent Document 4 describes an image search system that calculates a similarity score between an input document image and each registered image by comparing the distribution of image feature values of the input document image with that of the registered image.

Patent Document 1: JP 2005-208896 A (paragraph 0028)
Patent Document 2: JP 2005-86472 A (paragraphs 0060, 0061, 0074-0080)
Patent Document 3: Republished patent WO 01/039118 (pages 6 and 13)
Patent Document 4: JP 2007-172077 A (paragraphs 0021, 0023)
Non-Patent Document 1: Nozomi Matsubara, “Introductory Stochastic Processes” (入門確率過程), 5th printing, Tokyo Tosho Co., Ltd., May 10, 2007, p. 46

Simply counting how often the feature words of content appear, weighted by the number of times the content was browsed, cannot identify the steady interests of a person or group. Suppose, for example, that a user has a steady interest in the subject represented by feature word “B”. Suppose further that, during a certain period, the user browses many documents with feature word “A” purely for research purposes and then stops browsing such documents. In this case, even though the user browsed documents related to “A” only temporarily for research, the user is judged to have a strong interest in “A” until the number of appearances of “B” exceeds that of “A”.
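The effect described above can be made concrete with a small simulation. The numbers and the function name are hypothetical: “A” is browsed three times a day on days 1 to 5 only, and “B” once every other day throughout. With pure cumulative counts, “B” does not overtake “A” until day 31, long after the user stopped browsing “A”.

```python
def first_day_steady_overtakes(burst_days, steady_days, horizon):
    """Return the first day on which the cumulative count of the steady word
    exceeds that of the burst word, or None if it never does within horizon."""
    burst = steady = 0
    for day in range(1, horizon + 1):
        burst += burst_days.count(day)
        steady += steady_days.count(day)
        if steady > burst:
            return day
    return None

days_a = [day for day in range(1, 6) for _ in range(3)]  # "A": 3 views/day, days 1-5
days_b = list(range(1, 61, 2))                             # "B": every other day
print(first_day_steady_overtakes(days_a, days_b, horizon=60))  # 31
```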

Here, a steady interest means an interest that a person or group holds over a relatively long period, as opposed to one that arises only for a short time.

The invention described in Patent Document 3 estimates a user's latent interests by identifying keywords that, taken together, strongly exhibit the properties “high occurrence frequency”, “large difference between the maximum and minimum occurrence times”, and “small standard deviation of occurrence intervals”. However, a typical appearance pattern of feature words that indicate a steady interest is that they “occur at random within a certain period”, and it is desirable to be able to extract such feature words more effectively.

Furthermore, when extracting feature words with properties such as “large difference between the maximum and minimum occurrence times” and “small standard deviation of occurrence intervals”, it is more desirable to take the length of the appearance interval into account. For example, it is desirable to be able either to extract specifically those feature words that appear every day over a long period, or to extract, as information indicating a steady interest, feature words that repeatedly appear every few days over a long period even if not daily.

Accordingly, an object of the present invention is to provide an interest information specifying system, an interest information specifying method, and an interest information specifying program that can specify, as interest information, feature words that appear at random within a certain period.

Another object of the present invention is to provide an interest information specifying system, an interest information specifying method, and an interest information specifying program that, when specifying feature words with a large difference between the maximum and minimum occurrence times and a small standard deviation of occurrence intervals, can take the length of the appearance interval into account.

An interest information specifying system of the present invention specifies interest information representing the interests of a person or group, and includes: appearance frequency calculation means that refers to a feature word history, which contains the feature words representing features of the content used by the person or group and the use times at which the person or group used each content item characterized by those feature words, and obtains for each feature word a feature word appearance frequency, i.e., the frequency of use of the content the feature word represents; appearance interval calculation means that refers to the feature word history and obtains for each feature word the appearance time intervals, i.e., the intervals between use times of the content the feature word represents; feature word evaluation means that refers to the feature word appearance frequency and the appearance time intervals and obtains, for each feature word, an evaluation value according to the deviation between the distribution of the appearance time intervals and a model probability distribution; and feature word specifying means that specifies feature words based on the evaluation values.
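This passage leaves open how the deviation between the interval distribution and the model distribution is measured. One possible reading, sketched below purely as an assumption, fits an exponential distribution (the waiting-time model mentioned for Non-Patent Document 1) to each feature word's intervals by matching the mean, measures the Kolmogorov-Smirnov distance between the empirical and model CDFs, and maps a smaller distance to a higher evaluation value. The minimum interval count, the "1 minus distance" mapping, and the function names are hypothetical choices, not taken from the source.

```python
import math

def ks_distance_to_exponential(intervals):
    """Kolmogorov-Smirnov distance between the empirical CDF of the intervals
    and an exponential CDF whose mean equals the sample mean."""
    xs = sorted(intervals)
    n = len(xs)
    mean = sum(xs) / n
    if mean == 0:
        return 1.0  # all intervals are zero; treat as maximal deviation (assumption)
    distance = 0.0
    for i, x in enumerate(xs):
        model_cdf = 1.0 - math.exp(-x / mean)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        distance = max(distance, abs((i + 1) / n - model_cdf), abs(model_cdf - i / n))
    return distance

def evaluation_from_deviation(intervals, min_intervals=3):
    """Hypothetical evaluation: 1 - KS distance, or 0.0 if there are too few
    intervals (a stand-in for using the appearance frequency)."""
    if len(intervals) < min_intervals:
        return 0.0
    return 1.0 - ks_distance_to_exponential(intervals)

regular = [3, 3, 3, 3]  # perfectly periodic: far from exponential
print(evaluation_from_deviation(regular))  # about 0.368 (= e**-1)
```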

Another interest information specifying system of the present invention specifies interest information representing the interests of a person or group, and includes: appearance frequency calculation means that refers to a feature word history, which contains the feature words representing features of the content used by the person or group and the use times at which the person or group used each content item characterized by those feature words, and obtains for each feature word the feature word appearance frequency, i.e., the frequency of use of the content the feature word represents; appearance interval calculation means that refers to the feature word history and obtains for each feature word the appearance time intervals, i.e., the intervals between use times of that content; feature word evaluation means; and feature word specifying means that specifies feature words based on the evaluation values. For each feature word, the feature word evaluation means calculates the standard deviation STDEV and the average AVE of the appearance time intervals. Letting $T_0$ and $T_{\mathrm{last}}$ be the first and last use times of the content represented by the feature word and $T$ be the period from which the feature word history is derived, it obtains the evaluation value of the feature word with a parameter $\beta$ as

$$\frac{T_{\mathrm{last}} - T_0}{T} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}},$$

and sets the evaluation value of any feature word whose appearance frequency is equal to or less than a predetermined number of times to a predetermined value.
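A minimal Python sketch of this evaluation value follows. Times are plain numbers in one chosen unit (for example, day numbers); the population standard deviation, the function name, and the concrete values standing in for the “predetermined number of times” (min_count) and “predetermined value” (default) are all assumptions.

```python
import math
from statistics import mean, pstdev

def evaluation_value(use_times, period, beta, min_count=2, default=0.0):
    """Evaluation value {(T_last - T_0) / T} * exp(-beta * STDEV * AVE).

    use_times: use times of the content the feature word represents
    period:    T, the length of the period the feature word history covers
    beta:      the tuning parameter beta
    min_count, default: hypothetical values for the "predetermined number of
        times" and the "predetermined value" in the text
    """
    if len(use_times) <= min_count:  # appearance frequency too low
        return default
    times = sorted(use_times)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    stdev = pstdev(intervals)  # STDEV (population form, an assumption)
    ave = mean(intervals)      # AVE
    t_0, t_last = times[0], times[-1]
    return (t_last - t_0) / period * math.exp(-beta * stdev * ave)

daily = list(range(30))  # used on every day 0..29 of a 30-day period
burst = [0, 0, 1, 1, 2]  # used only during the first three days
print(evaluation_value(daily, period=30, beta=1.0))
print(evaluation_value(burst, period=30, beta=1.0))
```

With these definitions, a feature word used every day across a 30-day period scores 29/30, while the short burst scores far lower, mainly because its span $T_{\mathrm{last}} - T_0$ is small.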

An interest information specifying method of the present invention specifies interest information representing the interests of a person or group, and includes: an appearance frequency calculation step of referring to a feature word history, which contains the feature words representing features of the content used by the person or group and the use times at which the person or group used each content item characterized by those feature words, and obtaining for each feature word the feature word appearance frequency, i.e., the frequency of use of the content the feature word represents; an appearance interval calculation step of referring to the feature word history and obtaining for each feature word the appearance time intervals, i.e., the intervals between use times of that content; a feature word evaluation step of referring to the feature word appearance frequency and the appearance time intervals and obtaining, for each feature word, an evaluation value according to the deviation between the distribution of the appearance time intervals and a model probability distribution; and a feature word specifying step of specifying feature words based on the evaluation values.

Another interest information specifying method of the present invention specifies interest information representing the interests of a person or group, and includes: an appearance frequency calculation step of referring to a feature word history, which contains the feature words representing features of the content used by the person or group and the use times at which the person or group used each content item characterized by those feature words, and obtaining for each feature word the feature word appearance frequency; an appearance interval calculation step of referring to the feature word history and obtaining for each feature word the appearance time intervals; a feature word evaluation step; and a feature word specifying step of specifying feature words based on the evaluation values. In the feature word evaluation step, the standard deviation STDEV and the average AVE of the appearance time intervals are calculated for each feature word. Letting $T_0$ and $T_{\mathrm{last}}$ be the first and last use times of the content represented by the feature word and $T$ be the period from which the feature word history is derived, the evaluation value of the feature word is obtained with a parameter $\beta$ as

$$\frac{T_{\mathrm{last}} - T_0}{T} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}},$$

and the evaluation value of any feature word whose appearance frequency is equal to or less than a predetermined number of times is set to a predetermined value.

An interest information specifying program of the present invention is installed on a computer that specifies interest information representing the interests of a person or group, and causes the computer to execute: appearance frequency calculation processing that refers to a feature word history, which contains the feature words representing features of the content used by the person or group and the use times at which the person or group used each content item characterized by those feature words, and obtains for each feature word the feature word appearance frequency, i.e., the frequency of use of the content the feature word represents; appearance interval calculation processing that refers to the feature word history and obtains for each feature word the appearance time intervals, i.e., the intervals between use times of that content; feature word evaluation processing that refers to the feature word appearance frequency and the appearance time intervals and obtains, for each feature word, an evaluation value according to the deviation between the distribution of the appearance time intervals and a model probability distribution; and feature word specifying processing that specifies feature words based on the evaluation values.

Another interest information specifying program of the present invention is installed on a computer that specifies interest information representing the interests of a person or group, and causes the computer to execute: appearance frequency calculation processing that refers to a feature word history, which contains the feature words representing features of the content used by the person or group and the use times at which the person or group used each content item characterized by those feature words, and obtains for each feature word the feature word appearance frequency; appearance interval calculation processing that refers to the feature word history and obtains for each feature word the appearance time intervals; feature word evaluation processing; and feature word specifying processing that specifies feature words based on the evaluation values. The feature word evaluation processing calculates, for each feature word, the standard deviation STDEV and the average AVE of the appearance time intervals. Letting $T_0$ and $T_{\mathrm{last}}$ be the first and last use times of the content represented by the feature word and $T$ be the period from which the feature word history is derived, it obtains the evaluation value of the feature word with a parameter $\beta$ as

$$\frac{T_{\mathrm{last}} - T_0}{T} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}},$$

and sets the evaluation value of any feature word whose appearance frequency is equal to or less than a predetermined number of times to a predetermined value.

According to the present invention, feature words that appear at random within a certain period can be specified as interest information.

Furthermore, according to the present invention, the evaluation value can be defined so that it becomes larger as the difference between the maximum and minimum occurrence times becomes larger and as the standard deviation and average of the appearance time intervals become smaller. In addition, it is possible to adjust whether feature words with short appearance time intervals are extracted with particular priority.
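The role of $\beta$ can be seen by scoring two hypothetical feature words with the same span: one used every day and one used at irregular gaps of 2 to 6 days. Because the daily word has STDEV = 0, its score does not depend on $\beta$, whereas the other word's score is multiplied by $e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}}$. A $\beta$ near 0 therefore keeps every-few-days words competitive, while a large $\beta$ favors words with short, regular intervals. The sketch uses the evaluation formula with the population standard deviation (an assumption) and omits the low-frequency rule.

```python
import math
from statistics import mean, pstdev

def score(use_times, period, beta):
    """{(T_last - T_0) / T} * exp(-beta * STDEV * AVE), population STDEV assumed."""
    times = sorted(use_times)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    penalty = math.exp(-beta * pstdev(intervals) * mean(intervals))
    return (times[-1] - times[0]) / period * penalty

daily = list(range(21))                 # every day, days 0..20
every_few_days = [0, 2, 7, 10, 16, 20]  # same span, gaps of 2 to 6 days

for beta in (0.0, 0.05, 0.5):
    print(beta, round(score(daily, 30, beta), 3), round(score(every_few_days, 30, beta), 3))
```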

Embodiments of the present invention will be described below with reference to the drawings.

Embodiment 1.
FIG. 1 is a block diagram showing an example of the interest information specifying system according to the first embodiment of the present invention. The interest information specifying system of the first embodiment includes extraction target setting means 100, access history storage means 200, content management means 300, feature word history generation means 400, appearance frequency calculation means 500, appearance interval calculation means 600, feature word evaluation means 700, feature word selection means 800, and interest information presentation means 900.

First, the access history and meta information that the interest information specifying system of the present invention stores in advance, and the feature word history generated from them, will be described.

The interest information specifying system stores the access history and the meta information of each content item in advance.

The access history is a usage history containing content identification information, the use time of the content, and identification information of the person (hereinafter, user) or group that used the content; these items are associated with one another. The manner of use is not particularly limited: the use time may be the time at which the user or group browsed, viewed, or downloaded the content, among others. Nor is the unit of the use time limited: it may be expressed to the second in hours, minutes, and seconds, by date at day granularity, or in units such as hours or weeks. The content identification information and use time may also be associated with both the identification information of a user and that of the group to which the user belongs. The following description uses user names as user identification information and group names (for example, department names) as group identification information.
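Because appearance time intervals are differences between use times, the chosen time unit directly determines their values. A sketch of mapping timestamps to integer indices in a selectable unit follows; the unit table, the function name, and the choice to count whole units elapsed since a fixed origin (rather than calendar weeks, for example) are assumptions.

```python
from datetime import datetime

# Length of each supported time unit in seconds (illustrative set).
UNIT_SECONDS = {"second": 1, "hour": 3600, "day": 86400, "week": 7 * 86400}

def to_time_index(ts: datetime, unit: str, origin: datetime) -> int:
    """Number of whole units elapsed between origin and ts."""
    return int((ts - origin).total_seconds() // UNIT_SECONDS[unit])

origin = datetime(2007, 9, 1)
used_at = datetime(2007, 9, 3, 13, 0)
for unit in ("hour", "day", "week"):
    print(unit, to_time_index(used_at, unit, origin))
```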

FIG. 2 is an explanatory diagram showing an example of the access history. In the access history of FIG. 2, a date (use time), a document ID (content identification information), a user name, and the user's department name are associated with one another. For example, the first row indicates that on September 1, 2007, user “USER1”, who belongs to department “SECTION1”, downloaded the document with document ID “ID001”.

The meta information of content is additional information prepared for each content item. It contains the content identification information and the feature words representing features of the content, associated with each other, and may also contain other information (for example, the content name, creator, and creation date and time). FIG. 3 is an explanatory diagram showing an example of meta information, in which a document ID (content identification information), a document name (content name), and feature words are associated. For example, the meta information for document ID “ID001” in FIG. 3 indicates that the document's name is “○○ proposal material” and that “security, ubiquitous, network” are defined as its feature words.

The feature word history is information containing the feature words representing features of the content used by a user (or group) and the use times at which that user (or group) used each content item characterized by those feature words; in it, feature words and use times are associated. FIG. 4 is an explanatory diagram showing an example of the feature word history. In the example of FIG. 4, content with the feature word “security” was used by a certain user (or group) on “2007/09/01”, “2007/09/01”, and “2007/09/02”, among other dates. The feature word history is generated from the access history and the meta information.
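The join that produces the feature word history can be sketched as follows. The record type and function names are hypothetical. The first access row mirrors the first row of FIG. 2 and the feature words of ID001 follow FIG. 3; document ID002 and the remaining rows are invented so that “security” receives the three dates quoted for FIG. 4.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class AccessRecord:
    used_on: date  # use time (a date, as in FIG. 2)
    doc_id: str
    user: str
    group: str

def build_feature_word_history(access_history, meta_info):
    """Return {feature word: sorted use times}. A document with several
    feature words contributes its use time to each of them."""
    history = defaultdict(list)
    for record in access_history:
        for word in meta_info.get(record.doc_id, []):
            history[word].append(record.used_on)
    return {word: sorted(times) for word, times in history.items()}

meta_info = {
    "ID001": ["security", "ubiquitous", "network"],  # as in FIG. 3
    "ID002": ["security"],                           # hypothetical
}
access_history = [
    AccessRecord(date(2007, 9, 1), "ID001", "USER1", "SECTION1"),  # first row of FIG. 2
    AccessRecord(date(2007, 9, 1), "ID002", "USER1", "SECTION1"),  # hypothetical
    AccessRecord(date(2007, 9, 2), "ID002", "USER1", "SECTION1"),  # hypothetical
]
history = build_feature_word_history(access_history, meta_info)
print(history["security"])
```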

The interest information specifying system of the present invention generates the feature word history from the access history and meta information, and calculates, for each feature word, an evaluation value representing the degree of the user's or group's steady interest. FIG. 5 is an explanatory diagram showing an example of the evaluation value of each feature word. In the example of FIG. 5, the evaluation value for the feature word “security” is 0.1 and that for “ubiquitous” is 1.0. If a larger evaluation value indicates a stronger interest, the example shows that the user (or group) is most interested in “network”.
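This part of the description does not say how feature words are chosen from their evaluation values. A sketch with two hypothetical selection rules, top-k and a minimum score, both applied to a descending ranking; the values for “security” and “ubiquitous” follow FIG. 5, and “proposal” is invented.

```python
def select_feature_words(scores, top_k=None, threshold=None):
    """Rank feature words by evaluation value (descending) and keep those
    passing the optional threshold and top-k cut."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(word, value) for word, value in ranked if value >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [word for word, _ in ranked]

scores = {"security": 0.1, "ubiquitous": 1.0, "proposal": 0.4}
print(select_feature_words(scores, top_k=2))        # ['ubiquitous', 'proposal']
print(select_feature_words(scores, threshold=0.5))  # ['ubiquitous']
```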

Documents (electronic documents) are an example of content, but content is not limited to documents. Content may also be, for example, web pages such as home pages and electronic bulletin boards, location-related information such as store or sightseeing information, television or radio program information, video or music content, or book information. The following description takes documents (electronic documents) as the content.

Next, each component of this embodiment is described.

The extraction target setting unit 100 sets the user or group for which feature words are to be identified as interest information representing steady interest. For example, the extraction target setting unit 100 may output a screen prompting entry of a user name or group name. When a user name or group name is entered on that screen, the unit may determine the user or group identified by that name as the target whose steady interest is to be identified. FIG. 6 is an example of a user name or group name input screen. The screen illustrated in FIG. 6 includes a radio button 401 for choosing between user name input and group name input, and an input field 402 in which the user name or group name is entered. The extraction target setting unit 100 displays the input screen illustrated in FIG. 6. When either user name input or group name input is selected and a name is entered in the input field 402, the unit adopts the entered name as the user name or group name.

Alternatively, the extraction target setting unit 100 may use a cookie or the like to determine the user or group identified by a previously entered user name or group name as the target of steady interest. It may also automatically take over a user name or group name from another system and determine the user or group identified by that name as the target.

The access history storage unit 200 stores the access history. For example, whenever a document (content) stored in the content management unit 300 is accessed (that is, used), the access history storage unit 200 may append an entry to the access history. The entry associates the identification information of the document, the use time, and the user name of the user who used the document. As already described, the name of the group to which the user belongs may also be included in the access history. The manner in which the access history storage unit 200 acquires the access history is not particularly limited. For example, an externally created access history may be input to the access history storage unit 200 and stored there.

When a user name or group name is specified, the access history storage unit 200 searches the access history for the document identification information (hereinafter, document ID) and use times corresponding to the specified user name or group name.

The content management unit 300 stores each document (content) in association with the meta information of that document. The meta information includes the document ID and feature words representing the features of the document. The meta information may also include other information such as the document name, and the document name may serve as the document ID. The content management unit 300 may also search for documents based on the document ID, the document name, or the like.

The feature word history generation unit 400 generates the feature word history by referring to the access history stored in the access history storage unit 200 and the meta information of the documents stored in the content management unit 300. Specifically, it identifies the feature words representing the features of the content used by the user or group set by the extraction target setting unit 100. It then associates each of those feature words with the use time of the content.

For example, assume that the access history storage unit 200 stores the access history illustrated in FIG. 2 and that the content management unit 300 stores the meta information illustrated in FIG. 3. The access history illustrated in FIG. 2 covers the period from September 1, 2007 to September 7, 2007. Further assume that the extraction target setting unit 100 has set "USER1". An example of feature word history generation in this case is described below.

First, the feature word history generation unit 400 acquires the document IDs and use times corresponding to the user name or group name set in the extraction target setting unit 100. For example, it may have the access history storage unit 200 perform this search. In this example, the feature word history generation unit 400 acquires "ID001", "ID002", "ID003", and "ID005" as the document IDs corresponding to "USER1" (see FIG. 2). It also acquires "2007/09/01" as the use time of document "ID001", and acquires the use times of the other documents in the same way.

Next, for each document ID, the feature word history generation unit 400 extracts the feature words associated with that document ID from the meta information. It then associates those feature words with the use time associated with the same document ID. For example, for document ID "ID001", the unit extracts the feature words "security, ubiquitous, network" corresponding to "ID001" from the meta information stored in the content management unit 300. It associates each of these words with the use time "2007/09/01" associated with "ID001", so that "2007/09/01" is associated with each of "security", "ubiquitous", and "network". The same processing is performed for the other document IDs "ID002", "ID003", and "ID005". FIG. 4 shows the feature word history obtained as a result of this processing. Because the access history covers the period from September 1, 2007 to September 7, 2007, the derivation target period of the feature word history is likewise September 1, 2007 to September 7, 2007.
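The generation of the feature word history described above amounts to joining the access history with the meta information on the document ID, filtered to the extraction target. The following is a minimal Python sketch under stated assumptions. The record layout `(user, document ID, use date)`, the dictionary form of the meta information, and the function name `build_feature_word_history` are hypothetical. The sample data is illustrative and is not the contents of FIGS. 2 and 3.

```python
from collections import defaultdict
from datetime import date


def build_feature_word_history(access_history, meta_info, target):
    """Return {feature word: sorted list of use dates} for one user or group.

    access_history: iterable of (user_or_group, document_id, use_date) rows
    meta_info: dict mapping document_id -> list of feature words
    target: the user or group name set as the extraction target
    """
    history = defaultdict(list)
    for owner, document_id, used_on in access_history:
        if owner != target:
            continue  # keep only accesses by the extraction target
        # Every feature word of the accessed document inherits its use date
        for word in meta_info.get(document_id, []):
            history[word].append(used_on)
    return {word: sorted(dates) for word, dates in history.items()}


# Hypothetical sample data (illustrative, not the contents of FIGS. 2 and 3)
ACCESS_HISTORY = [
    ("USER1", "ID001", date(2007, 9, 1)),
    ("USER2", "ID001", date(2007, 9, 3)),
    ("USER1", "ID002", date(2007, 9, 2)),
]
META_INFO = {
    "ID001": ["security", "ubiquitous", "network"],
    "ID002": ["security"],
}

history = build_feature_word_history(ACCESS_HISTORY, META_INFO, "USER1")
# history["security"] == [date(2007, 9, 1), date(2007, 9, 2)]
```

The access by "USER2" is skipped, so each feature word collects only the dates on which "USER1" used a document carrying that word.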

The appearance frequency calculation unit 500 refers to the feature word history generated by the feature word history generation unit 400 and obtains the feature word appearance frequency of each feature word. The feature word appearance frequency is the use frequency (number of uses) of the content whose features are represented by the feature word. For each feature word, the appearance frequency calculation unit 500 may count the number of use times associated with that word in the feature word history and take the count as its feature word appearance frequency. Hereinafter, the feature word appearance frequency is simply called the appearance frequency. For example, for the feature word history illustrated in FIG. 4, the appearance frequency of "security" is 3 and that of "ubiquitous" is 2. FIG. 7 shows the appearance frequencies derived from the feature word history illustrated in FIG. 4.
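Counting the appearance frequency is a per-word tally over the rows of the feature word history. This sketch assumes the history is a list of `(feature_word, use_date)` rows, one row per use; the function name is hypothetical. The three "security" rows follow the example of FIG. 4, while the "ubiquitous" dates are invented for illustration.

```python
from collections import Counter
from datetime import date


def appearance_frequency(rows):
    """Count how many use times are associated with each feature word.

    rows: list of (feature_word, use_date), one row per use.
    """
    return Counter(word for word, _used_on in rows)


rows = [
    # "security" rows follow the example in the text (FIG. 4)
    ("security", date(2007, 9, 1)),
    ("security", date(2007, 9, 1)),
    ("security", date(2007, 9, 2)),
    # hypothetical dates for "ubiquitous"
    ("ubiquitous", date(2007, 9, 1)),
    ("ubiquitous", date(2007, 9, 4)),
]
frequency = appearance_frequency(rows)
print(frequency["security"], frequency["ubiquitous"])  # 3 2
```

A `Counter` returns 0 for a word that never appears, so lookups for absent words need no special case.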

The appearance interval calculation unit 600 refers to the feature word history generated by the feature word history generation unit 400 and obtains the appearance time intervals of each feature word. An appearance time interval is the interval between use times of the content whose features are represented by the feature word. For each feature word, the appearance interval calculation unit 600 may calculate the differences between successive use times associated with that word in the feature word history. It also counts how many times each appearance time interval occurs.

For example, in the feature word history illustrated in FIG. 4, "security" appears twice on "2007/09/01" and once on "2007/09/02". In this case, the appearance interval calculation unit 600 performs two calculations:

- It calculates an appearance time interval of "0 days" as the difference between the two "2007/09/01" entries and counts its occurrences as 1.
- It calculates an appearance time interval of "1 day" as the difference between "2007/09/01" and "2007/09/02" and counts its occurrences as 1.

Although "security" is used here as the example, the appearance time intervals of the other feature words are obtained in the same way. When only one use time is associated with a feature word, the appearance interval calculation unit 600 sets the count to 0 for every appearance time interval. For example, for "server" in FIG. 4, the count is "0" for every appearance time interval, such as "0 days" and "1 day". FIG. 8 shows the appearance time intervals derived from the feature word history illustrated in FIG. 4.
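The interval calculation can be sketched as follows. This sketch assumes that intervals are measured in whole days between chronologically adjacent use times. That is the reading which reproduces the "security" example: three use times give one 0-day interval and one 1-day interval, not every pairwise difference. The function name is hypothetical.

```python
from collections import Counter
from datetime import date


def appearance_intervals(use_dates):
    """Count appearance time intervals, in days, for one feature word.

    Intervals are taken between chronologically adjacent use dates.
    A word with a single use date yields an empty Counter, i.e. a count
    of 0 for every interval.
    """
    ordered = sorted(use_dates)
    return Counter(
        (later - earlier).days for earlier, later in zip(ordered, ordered[1:])
    )


security = [date(2007, 9, 1), date(2007, 9, 1), date(2007, 9, 2)]
print(dict(appearance_intervals(security)))  # {0: 1, 1: 1}
```

With adjacent differences, a word used $K$ times always yields $K - 1$ intervals. This matches the total of 8 intervals for the 9 uses of feature word A in FIG. 9.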

The feature word evaluation unit 700 calculates an evaluation value for each feature word. It does so by referring to the feature word history generated by the feature word history generation unit 400 and the appearance time intervals obtained by the appearance interval calculation unit 600. This evaluation value represents the degree of steady interest of the set user or group. The feature word evaluation unit 700 calculates the difference between the distribution of appearance time intervals and a model probability distribution, and calculates the evaluation value of the feature word according to that difference. Here, the model probability distribution is the probability distribution of appearance time intervals under the assumption that the feature word appears uniformly. In other words, it assumes that documents whose features are represented by the feature word are used at random. The feature word evaluation unit 700 selects each feature word in turn and calculates its evaluation value.

It is known that when events occur at random, the time from one event to the next follows an exponential distribution. Therefore, if a feature word is assumed to appear uniformly, its appearance time intervals also follow an exponential distribution. For this reason, this embodiment uses the exponential distribution as the model probability distribution.
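The statement that waiting times between randomly occurring events follow an exponential distribution can be checked numerically. The sketch below is an illustration only and is not part of the described system. It scatters many event times uniformly over a period and compares the empirical distribution of the gaps between adjacent events with the exponential CDF $1 - e^{-\lambda x}$. The rate $\lambda$ is the number of events divided by the period, which plays the role of $K/T$. The event count, period, and seed are arbitrary choices.

```python
import math
import random


def simulated_gap_cdf(num_events, period, x, seed=0):
    """Fraction of gaps between adjacent uniformly random event times that are <= x."""
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, period) for _ in range(num_events))
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gap <= x for gap in gaps) / len(gaps)


def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)


num_events, period = 100_000, 1000.0
rate = num_events / period  # events per unit time, analogous to K/T
for x in (0.005, 0.01, 0.02):
    print(x, round(simulated_gap_cdf(num_events, period, x), 3),
          round(exponential_cdf(x, rate), 3))
```

For a large number of events, the two columns should agree closely. The agreement degrades for small counts, where uniformly placed points have gaps that are only approximately exponential.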

The probability that a feature word appears at a given appearance time interval is the number of times the feature word actually appeared at that interval, divided by the sum of the numbers of times it appeared at all appearance time intervals. Equivalently, it is the probability that a document whose features are represented by the feature word is used at that interval. Letting $t$ be the appearance time interval, the probability that the feature word appears at interval $t$, denoted $P'(t)$, is given by the following equation (1).

$$P'(t) = \frac{f(t)}{\sum_{t'} f(t')} \qquad (1)$$

In equation (1), $f(t)$ is the number of times the feature word appeared at appearance time interval $t$. The denominator of the right-hand side is the sum of the numbers of times the feature word appeared at all appearance time intervals.
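Equation (1) normalizes the interval counts. The following is a minimal sketch, assuming the counts for one feature word are held in a dictionary that maps interval $t$ (in days) to $f(t)$. The function name is hypothetical. Returning an empty result for a word with no intervals is a choice made here to fit the rule, stated below, that single-use words receive an evaluation value of 0.

```python
def empirical_probability(interval_counts):
    """Equation (1): P'(t) = f(t) / (sum of f over all intervals).

    interval_counts: dict mapping interval t (days) -> f(t).
    Returns {} when no interval occurred.
    """
    total = sum(interval_counts.values())
    if total == 0:
        return {}
    return {t: f / total for t, f in interval_counts.items()}


# f(0) = 3 and f(1) = 2 follow FIG. 9(c) as described in the text; the
# three single-occurrence intervals 2, 3 and 4 are hypothetical placeholders.
counts = {0: 3, 1: 2, 2: 1, 3: 1, 4: 1}
p_emp = empirical_probability(counts)
print(p_emp[0], p_emp[1])  # 0.375 0.25
```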

In the model probability distribution (exponential distribution), let $P(t)$ be the probability that the feature word appears at appearance time interval $t$. $P(t)$ is given by the following equation (2).

$$P(t) = \frac{K}{T}\, e^{-\frac{K}{T} t} \qquad (2)$$

Here, $K$ is the appearance frequency of the selected feature word. $T$ is the derivation target period of the feature word history, that is, the period over which the access history underlying the feature word history was collected.
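Equation (2) can be evaluated directly. The sketch below assumes that $t$ and $T$ are both measured in days, as in the worked example. Like the text, it evaluates the exponential density at the interval $t$. The function name is hypothetical.

```python
import math


def model_probability(t, k, period):
    """Equation (2): P(t) = (K/T) * exp(-(K/T) * t).

    t: appearance time interval (days)
    k: appearance frequency K of the selected feature word
    period: derivation target period T of the feature word history (days)
    """
    rate = k / period
    return rate * math.exp(-rate * t)


# Worked example: K = 9, T = 14
print(round(model_probability(0, 9, 14), 6))  # 0.642857
```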

For each appearance time interval of the selected feature word, the feature word evaluation unit 700 performs three calculations:

1. $P'(t)$ by equation (1);
2. $P(t)$ by equation (2);
3. the absolute value of their difference, $|P(t) - P'(t)|$.

It then sums these absolute differences over the appearance time intervals and calculates an evaluation value from the sum. In this embodiment, the feature word evaluation unit 700 calculates the evaluation value of the selected feature word by the following equation (3).

$$V = e^{-\sum_{t} \lvert P(t) - P'(t) \rvert} \qquad (3)$$

$V$ on the left-hand side of equation (3) is the evaluation value of the selected feature word. The exponent on the right-hand side of equation (3) is the sum of the absolute differences $|P(t) - P'(t)|$ over the appearance time intervals, multiplied by $-1$. When the evaluation value is calculated in this way, the smaller the sum of $|P(t) - P'(t)|$, the larger $V$ becomes. In other words, the smaller the difference between the distribution of appearance time intervals and the model probability distribution, the larger the evaluation value $V$.

For a feature word whose appearance frequency is 1, no appearance time interval can be obtained, so its evaluation value is set to 0. That is, when the count is 0 for every $t$, $V = 0$.
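Combining equations (1) to (3) with the $V = 0$ rule gives the following sketch of the per-word evaluation. It assumes the sum in equation (3) runs over the intervals that actually occurred, i.e. those with $f(t) > 0$. The text does not say explicitly whether intervals with a count of 0 are included. The function name is hypothetical. Applied to the "security" history of FIG. 4 ($K = 3$, $T = 7$ days, one 0-day and one 1-day interval), it gives $V \approx 0.75$. This value is an illustration under the stated assumption and is not read from FIG. 11.

```python
import math


def evaluation_value(interval_counts, k, period):
    """Equation (3): V = exp(-sum_t |P(t) - P'(t)|), with V = 0 if no interval.

    interval_counts: dict t (days) -> f(t) for intervals that occurred.
    Assumption: the sum runs only over these occurring intervals.
    """
    total = sum(interval_counts.values())
    if total == 0:
        return 0.0  # appearance frequency of 1: no interval, so V = 0
    rate = k / period
    deviation = sum(
        abs(rate * math.exp(-rate * t) - f / total)  # |P(t) - P'(t)|
        for t, f in interval_counts.items()
    )
    return math.exp(-deviation)


# "security" in FIG. 4: K = 3 uses over T = 7 days, one 0-day and one 1-day interval
print(round(evaluation_value({0: 1, 1: 1}, 3, 7), 2))  # 0.75
```

Because the exponent is never positive, $V$ always lies in $(0, 1]$ for words with at least one interval. It reaches 1 only when the observed interval distribution matches the model exactly at every occurring interval.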

A specific example of the processing of the feature word evaluation unit 700 is shown below. Assume that, for a feature word A, the feature word history generation unit 400 has generated the feature word history illustrated in FIG. 9(a), covering September 1 to September 14. In this case, the appearance frequency calculation unit 500 obtains an appearance frequency of "9" for feature word A, as shown in FIG. 9(b). The appearance interval calculation unit 600 calculates the appearance time intervals and counts how many times each occurs. For example, in the feature word history shown in FIG. 9(a), an appearance time interval of 0 days occurs three times and an appearance time interval of 1 day occurs twice. In this example, the appearance time intervals shown in FIG. 9(c) are obtained.

For each appearance time interval, the feature word evaluation unit 700 obtains the absolute value of the difference between the actual probability and the probability under the model probability distribution. It then calculates the evaluation value from the sum of these values. FIG. 10 is an explanatory diagram showing this calculation process. In this example, the appearance frequency of feature word A is $K = 9$. Because the feature word history covers September 1 to September 14, the period in equation (2) is $T = 14$. Therefore, $K/T = 0.642857$.

Taking the appearance time interval of 0 days as an example, the probability that the feature word actually appears at that interval is obtained. That is, $P'(0)$ is calculated by equation (1) with $t = 0$. As shown in FIG. 9(c), the 0-day interval occurs $f(0) = 3$ times. The sum of the numbers of occurrences over all appearance time intervals is $3 + 2 + 1 + 1 + 1 = 8$ (see FIG. 9(c)). Hence $P'(0) = 3/8 = 0.375$.

In the model probability distribution (exponential distribution), the probability $P(0)$ of an appearance time interval of 0 days follows from the above $K/T$: $P(0) = 0.642857 \times e^{-0.642857 \times 0} = 0.642857$. Hence the absolute value of the difference between the actual probability $P'(0)$ and the model probability $P(0)$ at the 0-day interval is $|0.642857 - 0.375| = 0.268$. Calculating $|P(t) - P'(t)|$ in the same way for the other appearance time intervals $t$ gives the values shown in FIG. 10, whose sum is 0.570. FIG. 10 also shows the values of $P'(t)$ and $P(t)$ calculated along the way.

Using this sum, the feature word evaluation unit 700 calculates the evaluation value $V$ of feature word A by equation (3). In this example, $V = e^{-0.570} = 0.565$. Although feature word A is used as the example here, the evaluation values of the other feature words are calculated in the same way.
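The arithmetic of the worked example can be reproduced with a short script. It uses only the quantities stated in the text: $K = 9$, $T = 14$, $f(0) = 3$, the total of 8 intervals, and the reported sum 0.570. The individual terms of FIG. 10 are not given in the text, so the sum is entered as a constant. The computed $V = e^{-0.570}$ agrees with the value in the text to within rounding in the third decimal place.

```python
import math

# Quantities stated in the worked example for feature word A
K = 9                          # appearance frequency, FIG. 9(b)
T = 14                         # September 1 to September 14, in days
f0 = 3                         # occurrences of the 0-day interval, FIG. 9(c)
total = 3 + 2 + 1 + 1 + 1      # occurrences over all intervals, FIG. 9(c)
reported_sum = 0.570           # sum of |P(t) - P'(t)| reported for FIG. 10

rate = K / T                               # K/T
p_emp_0 = f0 / total                       # P'(0), equation (1)
p_model_0 = rate * math.exp(-rate * 0)     # P(0), equation (2)
diff_0 = abs(p_model_0 - p_emp_0)          # |P(0) - P'(0)|
V = math.exp(-reported_sum)                # equation (3)

print(f"K/T = {rate:.6f}")               # K/T = 0.642857
print(f"P'(0) = {p_emp_0:.3f}")          # P'(0) = 0.375
print(f"P(0) = {p_model_0:.6f}")         # P(0) = 0.642857
print(f"|P(0) - P'(0)| = {diff_0:.3f}")  # |P(0) - P'(0)| = 0.268
print(f"V = {V:.4f}")
```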

The explanation above used the example of FIG. 9. Applying the same method to the feature word history of FIG. 4 yields the appearance frequencies of FIG. 7, the appearance time intervals of FIG. 8, and the evaluation values of FIG. 11. "Server" and "storage" each have an appearance frequency of 1, so no appearance time interval can be obtained and their evaluation values are set to 0.

This embodiment uses the exponential distribution as the model probability distribution, but the model is not limited to the exponential distribution. Any other probability distribution may be used as the model, as long as it represents the distribution of appearance time intervals when feature words appear at random.

The feature word selection unit 800 identifies feature words based on the evaluation values obtained by the feature word evaluation unit 700. In this embodiment, it selects feature words whose evaluation value is equal to or greater than a threshold. The feature words identified in this way are interest information indicating the steady interest of the specified user or group. Thus, by identifying feature words from the evaluation values obtained as described above, the feature word selection unit 800 identifies interest information indicating steady interest. For example, suppose the threshold is 0.3 and the evaluation values of the feature words are as shown in FIG. 11. The feature word selection unit 800 then selects "security" and "network" as the feature words representing steady interest.

The case of identifying feature words with a threshold has been described here, but the feature word selection unit 800 may identify feature words by other methods. For example, it may sort the feature words in descending order of evaluation value and select a predetermined number of those with the highest evaluation values.
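Both selection rules are simple to express. This sketch assumes the evaluation values are held in a dictionary that maps each feature word to $V$. The function names and the sample scores are hypothetical; the values of FIG. 11 are not reproduced in the text.

```python
def select_by_threshold(scores, threshold):
    """Feature words whose evaluation value is >= threshold, highest first."""
    chosen = [word for word, value in scores.items() if value >= threshold]
    return sorted(chosen, key=scores.get, reverse=True)


def select_top_n(scores, n):
    """The n feature words with the largest evaluation values."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical evaluation values (not those of FIG. 11)
scores = {"security": 0.75, "network": 0.5, "ubiquitous": 0.2,
          "server": 0.0, "storage": 0.0}
print(select_by_threshold(scores, 0.3))  # ['security', 'network']
print(select_top_n(scores, 1))           # ['security']
```

A threshold can return any number of words, including none. A fixed count always returns the same number of words, even when their evaluation values are low.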

The interest information presentation unit 900 presents the feature words identified by the feature word selection unit 800 to the user. For example, the interest information presentation unit 900 may display the feature words on a portal site, presenting the words selected by the feature word selection unit 800 as "recommended search keywords" on the portal's top page. As shown in FIG. 12, the interest information identification system of the present invention may also include a search unit 950 in addition to the units 100 to 900 described above. The search unit 950 searches for content using feature words. It may search for content using the feature words identified by the feature word selection unit 800 as search terms, and the interest information presentation unit 900 may display the search results as well.

FIG. 13 is an explanatory diagram illustrating examples of screens output by the interest information presentation unit 900. As shown in FIG. 13(a), a portal site may display the feature words selected by the feature word selection unit 800 as "recommended search keywords". It may also display the results retrieved by the search unit 950, labeled "recommended news" in the example of FIG. 13(a). The portal site illustrated in FIG. 13(a) is used, for example, to present a logged-in person's own steady interests at login time. The present invention may also be used to investigate the steady interests of other people or other groups (for example, other departments) rather than one's own. For example, a person can enter another person's name, "USER1", in the extraction target setting unit 100 and examine the steady interests of "USER1". FIG. 13(b) shows an example of the feature word output screen in this case. Other information may also be displayed on each screen shown in FIG. 13.

When searching for content using feature words as search terms, the search unit 950 shown in FIG. 12 may search the content stored in the content management unit 300. Alternatively, it may search a content database outside the interest information identification system, or various Web pages.

The extraction target setting unit 100 may set a group (for example, a department) as the target for which feature words are to be identified as interest information. In that case, the interest information identification system may calculate the evaluation value of each feature word for each user belonging to the group. It may then sum the users' evaluation values for each feature word and take the result as the group's evaluation value for that word. Alternatively, it may generate a single feature word history for the group from the access histories of all users belonging to the group and calculate the group's evaluation values from that history.
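The first group aggregation option sums per-user evaluation values for each feature word and can be sketched as follows. The nested-dictionary input and the function name are assumptions. A word that a user never used contributes nothing for that user.

```python
from collections import defaultdict


def group_evaluation(per_user_scores):
    """Sum each feature word's evaluation value over all members of a group.

    per_user_scores: dict user -> dict feature_word -> evaluation value.
    """
    totals = defaultdict(float)
    for scores in per_user_scores.values():
        for word, value in scores.items():
            totals[word] += value
    return dict(totals)


# Hypothetical per-user evaluation values for a two-member group
members = {
    "USER1": {"security": 0.5, "network": 0.25},
    "USER2": {"security": 0.25},
}
print(group_evaluation(members))  # {'security': 0.75, 'network': 0.25}
```

Summing favors words that many members use steadily. The alternative of merging all members' access histories into a single history would instead treat the group as one user.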

The following units may be implemented, for example, by a CPU operating according to a program (an interest information identification program): the extraction target setting unit 100, access history storage unit 200, content management unit 300, feature word history generation unit 400, appearance frequency calculation unit 500, appearance interval calculation unit 600, feature word evaluation unit 700, feature word selection unit 800, interest information presentation unit 900, and search unit 950. All of these units may be implemented by the same CPU. For example, the program may be stored in a storage device of the interest information identification system. The CPU may read the program and, in accordance with it, operate as each of the units listed above. The extraction target setting unit 100 may be implemented by that CPU together with an input device such as a keyboard. The access history storage unit 200 and the content management unit 300 are implemented by the CPU and a storage device. The interest information presentation unit 900 is implemented by the CPU and a display device.

The case where all units are implemented by the same computer has been illustrated here. However, the configuration of the interest information identification system of the present invention is not limited to implementation on a single computer. Examples are given below.

The extraction target setting unit 100 and the interest information presentation unit 900 may be implemented by an information processing device, such as a PDA (Personal Data Assistant), a personal computer, or a mobile phone, that includes a display device and an input device and operates according to a program.

The access history storage unit 200 may be implemented by a personal computer or a server computer that includes a storage device storing the access history and operates according to a database program. Likewise, the content management unit 300 may be implemented by a personal computer or a server computer that includes a storage device storing documents in association with their meta information and operates according to a database program.

The feature word history generation unit 400, appearance frequency calculation unit 500, appearance interval calculation unit 600, feature word evaluation unit 700, feature word selection unit 800, and search unit 950 may be implemented by the same computer. Alternatively, they may be implemented by different computers that communicate with one another using a communication protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol) to carry out the processing.

Next, the operation will be described. FIG. 14 is a flowchart showing an example of the processing flow of the interest information identification system according to the first embodiment.

The extraction target setting unit 100 sets the user or group whose steady interest information is to be identified (step S1). The feature word history generation unit 400 then generates a feature word history for the user or group set in step S1 (step S2). It does so using the access history stored in the access history storage unit 200 and the meta information stored in the content management unit 300. The appearance frequency calculation unit 500 refers to the feature word history to obtain the appearance frequency of each feature word (step S3). The appearance interval calculation unit 600 refers to the same history to obtain the appearance time intervals of each feature word (step S4).

Next, the feature word evaluation unit 700 refers to the appearance time intervals obtained in step S4. From them it calculates the actual probability that the feature word appears at each appearance time interval (step S5). That is, it evaluates Equation (1) for each appearance time interval to obtain the probability P'(t) for that interval.

The feature word evaluation unit 700 also refers to the appearance frequency obtained in step S3 and the appearance time intervals obtained in step S4. From these it calculates the probability that the feature word appears at each appearance time interval under the model probability distribution, an exponential distribution (step S6). For example, it computes K/T from the appearance frequency K and the derivation target period T of the feature word history. It then evaluates Equation (2) for each appearance time interval to obtain the probability P(t) for that interval.

Next, the feature word evaluation unit 700 calculates a deviation for each appearance time interval (step S7). The deviation is the absolute value of the difference between the actual appearance probability obtained in step S5 and the appearance probability under the exponential distribution obtained in step S6. The unit then sums the deviations calculated for the individual appearance time intervals (step S8). It uses this sum to calculate the evaluation value of the feature word (step S9). In step S9, the evaluation value V may be obtained by evaluating Equation (3).
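Equations (1) to (3) are not reproduced in this excerpt. The sketch below assumes the following, which agrees with the worked example for "security":

- P'(t) is the share of interval t among all observed intervals.
- P(t) = (K/T)·e^{−(K/T)·t}.
- V = e^{−Σ|P(t) − P'(t)|}.

The parameter `max_interval` is hypothetical. The quoted sum of 0.670 is reproduced when t runs from 0 to 4 days, but the text does not state that range.

```python
import math


def evaluate_randomness(interval_counts, frequency, period, max_interval):
    """Steps S5-S9 (first embodiment), sketched under the assumptions in the lead-in.

    interval_counts: {interval t in days: number of occurrences} from step S4
    frequency:       appearance frequency K from step S3
    period:          derivation target period T of the feature word history (days)
    max_interval:    largest interval t included in the sum (assumed, not from the source)
    Returns (sum of deviations, evaluation value V).
    """
    total = sum(interval_counts.values())
    rate = frequency / period  # K / T
    deviation_sum = 0.0
    for t in range(max_interval + 1):
        # Step S5: actual probability P'(t); 0 if interval t never occurred
        p_actual = interval_counts.get(t, 0) / total if total else 0.0
        # Step S6: exponential model P(t) = (K/T) * exp(-(K/T) * t)
        p_model = rate * math.exp(-rate * t)
        # Steps S7-S8: accumulate |P(t) - P'(t)|
        deviation_sum += abs(p_model - p_actual)
    # Step S9: V = exp(-sum); V approaches 1 as the appearances look random
    return deviation_sum, math.exp(-deviation_sum)


# "security": one 0-day and one 1-day interval, K = 3, T = 7
deviation, v = evaluate_randomness({0: 1, 1: 1}, frequency=3, period=7, max_interval=4)
print(round(deviation, 3), round(v, 3))
```

With these inputs the deviation sum comes out at about 0.670 and V at about 0.512.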

The feature word evaluation unit 700 performs steps S5 to S9 for every feature word, in either of two orders. In the first, each of steps S5 to S9 selects the feature words one after another and processes each one within that step. In the second, a feature word is selected before step S5 and taken through steps S5 to S9. The next feature word is then selected and processed through steps S5 to S9 in the same way, and so on.

After evaluation values have been calculated for all feature words, the feature word selection unit 800 identifies feature words based on those values (step S10). This identifies the interest information indicating the steady interest of the user or group set in step S1. In step S10, for example, the feature words whose evaluation value is equal to or greater than a predetermined threshold may be selected.

The interest information presentation unit 900 displays the feature words identified in step S10 (step S11). As a result, the operator of the interest information identification system can learn the steady interests of the selected user or group.

A concrete example of the above processing follows, under these assumptions:

- The interest information identification system is implemented on a personal computer. The computer has input devices such as a mouse and keyboard, and a display device that shows a user interface (buttons and the like) and text.
- The access history storage unit 200 and the content management unit 300 run on a database program.
- The access history storage unit 200 stores the access history shown in FIG. 2. Each entry associates the date on which a user viewed or downloaded a document (the usage time), the document ID, the user name, and the department name (group name).
- The content management unit 300 stores the meta information shown in FIG. 3, which associates a document ID, a document name, and feature words.

In step S1 (see FIG. 14), the extraction target setting unit 100 displays the input screen illustrated in FIG. 6 and prompts for a user name or department name. Suppose "USER1" is entered. The extraction target setting unit 100 then sets "USER1" as the target for identifying interest information.

In step S2, the feature word history generation unit 400 generates the feature word history:

1. It retrieves the dates and document IDs associated with "USER1" from the access history storage unit 200.
2. It retrieves the feature words associated with each document ID from the content management unit 300.
3. It associates those feature words with the dates.

For example, the access history contains the date "2007/09/01" and the document ID "ID001" for "USER1" (see FIG. 2). The unit retrieves the feature words "security", "ubiquitous", and "network" for "ID001" from the content management unit 300 and associates each of them with "2007/09/01". It repeats this for every document ID associated with "USER1" in the access history, producing the feature word history shown in FIG. 4.

In the next step, S3, the appearance frequency calculation unit 500 uses the generated feature word history to obtain the appearance frequency of each feature word. For example, "security" appears on "2007/09/01", "2007/09/01", and "2007/09/02", so its appearance frequency is 3. Appearance frequencies are obtained for the other feature words in the same way, giving the frequencies shown in FIG. 7.

In step S4, the appearance interval calculation unit 600 uses the generated feature word history to obtain the appearance time intervals of each feature word. It then counts how many times each interval occurs. For example, "security" appears on "2007/09/01", "2007/09/01", and "2007/09/02", so the 0-day interval occurs once and the 1-day interval occurs once. The same processing is performed for the other feature words, giving the results shown in FIG. 8. If the documents carrying a feature word were used on only one date, the occurrence count of every appearance time interval is set to 0.
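Steps S2 to S4 can be sketched as a join of the access history with the meta information, followed by a count and an interval histogram.

- Record layouts follow the column descriptions of FIG. 2 and FIG. 3.
- The data are hypothetical except for "ID001" and its three feature words, which come from the text.
- Function names and the department name "DEPT-A" are illustrative.

```python
from collections import Counter, defaultdict
from datetime import date


def build_feature_word_history(access_history, meta_info, target):
    """Step S2: pair each usage date of the target user/group with the used document's feature words."""
    history = []
    for used_on, doc_id, user, department in access_history:
        if target in (user, department):
            for word in meta_info.get(doc_id, []):
                history.append((used_on, word))
    return history


def appearance_frequency(history):
    """Step S3: number of appearances of each feature word."""
    return Counter(word for _, word in history)


def appearance_intervals(history):
    """Step S4: histogram of gaps (days) between consecutive appearances of each word."""
    dates_by_word = defaultdict(list)
    for used_on, word in history:
        dates_by_word[word].append(used_on)
    intervals = {}
    for word, dates in dates_by_word.items():
        dates.sort()
        if len(set(dates)) < 2:
            # Used on only one date: every interval count is 0
            intervals[word] = Counter()
        else:
            intervals[word] = Counter((b - a).days for a, b in zip(dates, dates[1:]))
    return intervals


# Hypothetical data shaped like FIG. 2 (date, document ID, user, department) and FIG. 3
meta_info = {
    "ID001": ["security", "ubiquitous", "network"],  # from the text
    "ID002": ["security"],                           # hypothetical
    "ID003": ["security", "network"],                # hypothetical
}
access_history = [
    (date(2007, 9, 1), "ID001", "USER1", "DEPT-A"),
    (date(2007, 9, 1), "ID002", "USER1", "DEPT-A"),
    (date(2007, 9, 2), "ID003", "USER1", "DEPT-A"),
    (date(2007, 9, 3), "ID001", "USER2", "DEPT-A"),
]

history = build_feature_word_history(access_history, meta_info, "USER1")
freq = appearance_frequency(history)
intervals = appearance_intervals(history)
```

Here "security" gets frequency 3 and one 0-day plus one 1-day interval, matching the example.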

In steps S5 to S9, the feature word evaluation unit 700 calculates the evaluation value of each feature word in the feature word history shown in FIG. 4. It uses the appearance frequencies obtained in step S3 and the appearance time intervals obtained in step S4. The calculation is illustrated below for the feature word "security". FIG. 15 is an explanatory diagram showing this calculation.

In step S5, the feature word evaluation unit 700 uses the appearance time intervals shown in FIG. 8 to calculate, for each appearance time interval t, the actual probability P'(t) that the feature word appears at interval t. If interval t occurred 0 times, no document was used at that interval, and P'(t) = 0. For "security", FIG. 8 shows one 0-day interval (t = 0) and one 1-day interval (t = 1), so P'(0) and P'(1) are each 0.5.

In step S6, the feature word evaluation unit 700 uses the target period T of the feature word history and the appearance frequency K. For each appearance time interval t, it calculates the probability P(t) that the feature word appears at interval t under the model probability distribution (an exponential distribution).

In this example, the feature word history was generated from the access history for September 1 to September 7, 2007, so T = 7. The period T may be obtained in either of two ways:

- The access history storage unit 200 records when compilation of the access history started and ended, and the feature word evaluation unit 700 calculates the period between those two times.
- T is input externally.

The appearance frequency K of "security" is 3 and T = 7, so the feature word evaluation unit 700 calculates K/T = 0.4286. It uses this value to evaluate Equation (2) and obtain P(t). For example, for the 0-day interval (t = 0), P(0) = 0.4286 × e^{−0.4286×0} = 0.4286.

In step S7, the feature word evaluation unit 700 calculates the deviation |P(t) − P'(t)| for each appearance time interval. In the following step S8, it sums |P(t) − P'(t)| over the appearance time intervals. In this example, as shown in FIG. 15, the sum of the deviations is 0.670.
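The 0.670 figure can be checked by hand from Equation (2) with K/T = 3/7. The check assumes that t runs from 0 to 4 days, with P'(2) = P'(3) = P'(4) = 0. That range is inferred (presumably the interval columns tabulated in FIG. 8 and FIG. 15) and is not stated in the text.

$$\sum_{t=0}^{4}\left|P(t)-P'(t)\right| \approx |0.4286-0.5| + |0.2792-0.5| + |0.1819-0| + |0.1185-0| + |0.0772-0| \approx 0.670$$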

In step S9, the feature word evaluation unit 700 calculates the evaluation value V from the sum of deviations obtained in step S8. Evaluating Equation (3) gives V = e^{−0.670} = 0.512. Only "security" is shown here; the evaluation values of the other feature words are calculated in the same way, giving the values shown in FIG. 11.

In step S10, the feature word selection unit 800 selects the feature words whose evaluation value is equal to or greater than the threshold. Here the threshold is assumed to be preset to 0.3. The feature word selection unit 800 therefore selects "security" and "network", whose evaluation values are 0.3 or more (see FIG. 11).
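Step S10, together with the optional descending sort described later for screen switching, amounts to a threshold filter followed by ranking.

- The value for "security" (0.512) comes from the text.
- The values for "network" and "ubiquitous" are hypothetical, since FIG. 11 is not reproduced here.
- The `higher_is_better` switch is an illustrative addition. It covers the variant in which the raw deviation sum is used as the evaluation value.

```python
def select_feature_words(scores, threshold=0.3, higher_is_better=True):
    """Step S10: keep feature words by evaluation value and rank them best-first.

    higher_is_better=True : keep V >= threshold, sort descending (Equation (3) style).
    higher_is_better=False: keep V <= threshold, sort ascending (raw deviation sum as V).
    """
    if higher_is_better:
        kept = [w for w, v in scores.items() if v >= threshold]
    else:
        kept = [w for w, v in scores.items() if v <= threshold]
    return sorted(kept, key=lambda w: scores[w], reverse=higher_is_better)


# "security" from the worked example; the other two values are hypothetical
scores = {"security": 0.512, "network": 0.45, "ubiquitous": 0.12}
selected = select_feature_words(scores)
```

The resulting order can also drive screen switching, with higher-ranked words shown on earlier screens.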

In step S11, the interest information presentation unit 900 displays the selected feature words "security" and "network" as words indicating the steady interests of the user or group. They may be shown, for example, under the label "Recommended search keywords".

As shown in FIG. 12, the interest information identification system may also include a search unit 950. The search unit 950 uses the feature words selected in step S10 as search terms and searches for matching documents, Web pages, news articles, and the like. The interest information presentation unit 900 then displays the search results together with the feature words.

Alternatively, the search unit 950 may not search when the interest information presentation unit 900 displays the feature words in step S11. Instead, after the feature words are displayed, the operator of the system may select one of them. The search unit 950 then searches various content (documents, Web pages, and so on) using the selected feature word as the search term. The interest information presentation unit 900 may, for example, display these search results. In this case as well, the search may cover the content stored in the content management unit 300. It may also cover a content database outside the interest information identification system, or Web pages in general.

FIG. 16 is an explanatory diagram showing example screens for a search by feature word. In step S11, the interest information presentation unit 900 displays screen 1401, illustrated in FIG. 16(a). This screen contains the identified feature words 1402, a search term input field 1403, and a search button.

A feature word can be specified by clicking a displayed feature word, or by entering it in the input field 1403 and clicking the search button. The search unit 950 then searches documents and the like using the specified feature word as the search term, and the interest information presentation unit 900 displays the results.

Screen 1411, illustrated in FIG. 16(b), is an example of a search result screen. It contains, for example, the feature words 1412 identified in step S10 and the search results 1414. FIG. 16(b) also shows the specified feature word (here, "security") displayed in the input field 1413.

The search result screen 1411 may display feature words different from those on the initial screen 1401. For example, it may display the feature words identified in step S10 that were not shown on screen 1401. Alternatively, the feature word selection unit 800 may sort the feature words in descending order of evaluation value in step S10. The interest information presentation unit 900 then changes the displayed feature words as it moves from screen 1401 to screen 1411, showing higher-ranked feature words on earlier screens.

The interest information identification system may also obtain and list the feature words representing steady interests for each user and for each department.

The system may also identify and display both the feature words for a user and those for the group the user belongs to. Furthermore, steps S2 to S10 may be performed for each of several periods rather than a single one, with feature words presented for each period.

For example, the screen illustrated in FIG. 17 may be displayed. It shows three sets of feature words:

- feature words 1502, indicating steady interests over one month;
- feature words 1503, indicating steady interests over one year;
- feature words 1504, indicating the steady interests of the group the selected user belongs to.

When any of these feature words is specified, the search unit 950 may search using it as the search term and display the search results 1505, as shown in FIG. 17.

The example in FIG. 17 shows steady interests over one month and over one year. Feature words may instead be identified and displayed for other periods, such as the first and second halves of a year, or individual months (January, February, March, and so on).

To identify the feature words indicating steady interests for each period, the feature word history generation unit 400 may select the periods one at a time. For each selected period, it extracts only the access history within that period and performs the processing from step S2 onward on it. The selected period is used as the period T in step S6. Repeating this for every period yields the feature words for each period.
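Per-period processing reduces to filtering the access history by date and deriving T from each period. Each filtered slice would then go through steps S2 to S10.

- Counting T inclusively (T = end − start + 1 days) is an assumption. It matches the example, where September 1 to 7 gives T = 7.
- The record layout (date, document ID, user, department) and all data are hypothetical.

```python
from datetime import date


def access_history_by_period(access_history, periods):
    """For each (start, end) period, return (T, records) restricted to that period.

    T counts days inclusively, so 2007-09-01..2007-09-07 gives T = 7.
    """
    result = []
    for start, end in periods:
        records = [rec for rec in access_history if start <= rec[0] <= end]
        result.append(((end - start).days + 1, records))
    return result


# Hypothetical records: (date, document ID, user, department)
access_history = [
    (date(2007, 9, 1), "ID001", "USER1", "DEPT-A"),
    (date(2007, 9, 2), "ID003", "USER1", "DEPT-A"),
    (date(2007, 9, 10), "ID002", "USER1", "DEPT-A"),
]
periods = [
    (date(2007, 9, 1), date(2007, 9, 7)),
    (date(2007, 9, 8), date(2007, 9, 14)),
]
by_period = access_history_by_period(access_history, periods)
```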

In step S10, the feature word selection unit 800 first identifies feature words based on their evaluation values. The search unit 950 may then search content using each selected feature word as a search term. Any feature word that returns zero results may be excluded from the selection.
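Excluding zero-hit feature words is a simple post-filter on the selection.

- The `search` callable stands in for the search unit 950 and is hypothetical.
- The naive case-insensitive substring search over a small in-memory corpus is only an assumption for illustration.

```python
def drop_zero_hit_words(selected_words, search):
    """Keep only selected feature words for which `search` returns at least one hit."""
    return [word for word in selected_words if len(search(word)) > 0]


# Hypothetical corpus and naive substring search standing in for the search unit 950
corpus = ["Network security basics", "Ubiquitous computing survey"]


def naive_search(word):
    return [doc for doc in corpus if word.lower() in doc.lower()]


kept = drop_zero_hit_words(["security", "network", "blockchain"], naive_search)
```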

The content set searched by the search unit 950 may be the same as the content set stored in the content management unit 300. In that case, after the search unit 950 searches content with a feature word specified by the operator, the processing may be repeated from step S1 using the search results. Feature words indicating steady interest are then identified again and displayed.

In this embodiment, the evaluation proceeds as follows:

1. A feature word history is generated from the accumulated access history.
2. For each appearance time interval, the probability P'(t) that a feature word appears at that interval is calculated.
3. For each appearance time interval, the probability P(t) that the feature word appears at that interval under the model probability distribution is calculated.
4. The absolute difference |P(t) − P'(t)| is calculated for each interval, and the evaluation value of the feature word is calculated from the sum of these differences.

The model distribution is the one that arises when a feature word appears at random, so the evaluation value measures how random the appearances of the feature word are. Because feature words are identified based on this value, feature words that appear randomly and evenly throughout a period are preferred over feature words that appear many times only temporarily. This embodiment can therefore identify feature words that appear randomly within a given period as interest information.

In the first embodiment, the evaluation value need not be calculated from the sum of deviations using Equation (3). For example, the sum of the deviations |P(t) − P'(t)| over the appearance time intervals may itself be used as the evaluation value. In that case, the more randomly a feature word appears, the smaller its evaluation value. The feature word selection unit 800 may then select, for example, the feature words whose evaluation value is equal to or less than a threshold.

The processing example above calculates the deviation |P(t) − P'(t)| for each appearance time interval and sums the results. However, the calculation is not limited to this. Any method may be used that calculates the evaluation value according to the divergence between the distribution of appearance time intervals and the model probability distribution.

For example, the evaluation value V of a feature word may be calculated using the ratio of P(t) to P'(t). In this approach, the feature word evaluation unit 700 calculates P(t)/P'(t) as the deviation for each appearance time interval in step S7. In the following step S8, it multiplies together the values P(t)/P'(t) calculated for the individual intervals. That is, it evaluates Equation (4) below.
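The formula for Equation (4) does not survive in this text. Reconstructed from the description above, it reads as follows. The index t runs over the appearance time intervals, and P'(t) > 0 is implicitly assumed so that each ratio is defined.

$$\prod_{t} \frac{P(t)}{P'(t)} \qquad (4)$$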

The absolute difference between the result of Equation (4) and 1 indicates how far the distribution of appearance time intervals deviates from the model probability distribution. The larger this absolute difference, the larger the deviation. The feature word evaluation unit 700 may use the result of Equation (4) to evaluate Equation (5) below and obtain the evaluation value V of the feature word.
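The formula for Equation (5) is also missing. Reconstructed from the description of its exponent, which is −1 times the absolute difference between the product of the ratios and 1, and following the V = e^{…} form of Equation (3), it reads:

$$V = e^{-\left|\prod_{t} \frac{P(t)}{P'(t)} - 1\right|} \qquad (5)$$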

The exponent on the right-hand side of Equation (5) is −1 times the absolute difference between 1 and the product of the values P(t)/P'(t) calculated for the individual appearance time intervals. Other ways of calculating V from the ratio of P(t) to P'(t) are possible. For example, {log(P(t)/P'(t))}² may be calculated for each appearance time interval, and the product of these values used in Equation (6) below to obtain the evaluation value V.
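Equation (6), reconstructed from the description of its exponent (−1 times the product of the squared log ratios), reads:

$$V = e^{-\prod_{t} \left\{\log \frac{P(t)}{P'(t)}\right\}^{2}} \qquad (6)$$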

The exponent on the right-hand side of Equation (6) is −1 times the product of the values {log(P(t)/P'(t))}² calculated for the individual appearance time intervals.

Equations (5) and (6) combine the per-interval values derived from the ratio of P(t) to P'(t) by taking their product. The evaluation value may instead be obtained by adding those values. For example, |1 − P(t)/P'(t)| may be calculated for each appearance time interval, and their sum used in Equation (7) below to obtain the evaluation value V.
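Equation (7), reconstructed from the description of its exponent (−1 times the sum of |1 − P(t)/P'(t)| over the intervals), reads:

$$V = e^{-\sum_{t} \left|1 - \frac{P(t)}{P'(t)}\right|} \qquad (7)$$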

The exponent on the right-hand side of Equation (7) is −1 times the sum of the values |1 − P(t)/P'(t)| calculated for the individual appearance time intervals.

When V is obtained by the calculations in Equations (5) to (7), V becomes larger as the divergence between the distribution of appearance time intervals and the model probability distribution becomes smaller.
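The alternative evaluation values can be computed side by side from per-interval lists of P(t) and P'(t).

- The formulas for Equations (5) to (7) are reconstructions from their textual descriptions.
- Skipping intervals where P'(t) = 0 in the ratio-based variants is an assumption made to avoid division by zero. The text does not say how such intervals are handled.

```python
import math


def divergence_scores(p_model, p_actual):
    """Evaluation values from Equations (3), (5), (6), (7) given per-interval P(t) and P'(t).

    Ratio-based variants skip intervals with P'(t) == 0 (an assumption, not from the source).
    """
    abs_sum = sum(abs(p - q) for p, q in zip(p_model, p_actual))
    ratios = [p / q for p, q in zip(p_model, p_actual) if q > 0]
    return {
        "eq3": math.exp(-abs_sum),
        # Raw deviation sum used directly as V: smaller means more random
        "plain_sum": abs_sum,
        "eq5": math.exp(-abs(math.prod(ratios) - 1)),
        "eq6": math.exp(-math.prod(math.log(r) ** 2 for r in ratios)),
        "eq7": math.exp(-sum(abs(1 - r) for r in ratios)),
    }


# Identical distributions: every exp-based variant yields V = 1
same = divergence_scores([0.4, 0.3, 0.2], [0.4, 0.3, 0.2])
```

For identical distributions every exp-based score equals 1 and the plain sum is 0, consistent with V growing as the divergence shrinks.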

式（３）における｜Ｐ（ｔ）−Ｐ’（ｔ）｜は、出現時刻間隔の分布とモデルとなる確率分布との乖離の程度を示す値の一例である。同様に、式（５）におけるＰ（ｔ）／Ｐ’（ｔ）、式（６）における｛ｌｏｇ（Ｐ（ｔ）／Ｐ’（ｔ））｝^２、および式（７）における｜１−（Ｐ（ｔ）／Ｐ’（ｔ））｜も、乖離の程度を示す値の例である。 | P (t) −P ′ (t) | in Expression (3) is an example of a value indicating the degree of deviation between the distribution of the appearance time intervals and the model probability distribution. Similarly, P (t) / P ′ (t) in equation (5), {log (P (t) / P ′ (t))} ² in equation (6), and | 1− in equation (7). (P (t) / P ′ (t)) | is also an example of a value indicating the degree of deviation.

Embodiment 2.
Next, a second embodiment of the present invention will be described. Like the first embodiment, the interest information identification system of the second embodiment includes:

- the extraction target setting unit 100;
- the access history storage unit 200;
- the content management unit 300;
- the feature word history generation unit 400;
- the appearance frequency calculation unit 500;
- the appearance interval calculation unit 600;
- the feature word evaluation unit 700;
- the feature word selection unit 800;
- the interest information presentation unit 900.

It may also include the search unit 950. The second embodiment is described below with reference to FIG. 1.

The second embodiment differs from the first in how the feature word evaluation unit 700 calculates the evaluation value. The other components operate as in the first embodiment, and their description is omitted.

In the second embodiment, the feature word evaluation unit 700 calculates the evaluation value so that it is larger the more strongly a feature word tends to appear periodically. For example, a feature word that appears once a day for seven days receives a larger evaluation value than one that appears seven times on a single day. A feature word that keeps reappearing periodically reflects the steady interest of a user or group better than one that appears intensively during a short time. This embodiment therefore assigns such feature words a high evaluation value.

Specifically, the evaluation value is larger when the appearance period is longer and when the standard deviation and mean of the appearance time intervals are smaller. The appearance period runs from the first to the last appearance of the feature word of interest. In other words, it is the time from the first to the last use of content whose features are represented by that feature word.

Once the appearance interval calculation means 600 has obtained the appearance time intervals of each feature word, the feature word evaluation means 700 calculates, for each feature word, the standard deviation and the mean of its appearance time intervals. The standard deviation may be calculated from either the sample variance or the unbiased variance.

The following notation is used:
- STDEV is the standard deviation and AVE is the mean.
- $T_0$ is the first appearance time of each feature word and $T_{\mathrm{last}}$ is its last appearance time.
- As in the first embodiment, $T$ is the derivation target period of the feature word history, that is, the period over which the access history underlying the feature word history was collected.

Using a parameter $\beta$, the feature word evaluation means 700 obtains the evaluation value $V$ of the feature word from Equation (8):

$$V = \frac{T_{\mathrm{last}} - T_0}{T} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}} \qquad (8)$$

The parameter β adjusts how strongly feature words with short appearance time intervals are favored for extraction. A larger β gives a feature word with short appearance time intervals a much larger evaluation value than one with long intervals, widening the gap between the two.

β may be fixed in advance. Alternatively, the user of the interest information identification system may enter it, for example via the extraction target setting means 100. It only needs to be entered before Equation (8) is calculated, for example together with a user name or group name.
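
The following is a minimal Python sketch of Equation (8), not the patent's own implementation. It computes V from one feature word's appearance time intervals. The function and parameter names are hypothetical. It assumes all times are plain numbers in one common unit, such as days. It also exposes the choice between the unbiased and the sample variance, which the text leaves open.

```python
import math
import statistics


def evaluation_value_eq8(intervals, t_first, t_last, period, beta, unbiased=True):
    """Evaluation value V of Equation (8) for one feature word (illustrative sketch).

    intervals: appearance time intervals of the feature word (e.g. in days)
    t_first:   first appearance time T_0
    t_last:    last appearance time T_last
    period:    derivation target period T of the feature word history
    beta:      weighting parameter
    unbiased:  True uses the unbiased (n - 1) variance for STDEV,
               False uses the sample (n) variance
    """
    ave = statistics.mean(intervals)
    if unbiased:
        stdev = statistics.stdev(intervals)   # needs at least two intervals
    else:
        stdev = statistics.pstdev(intervals)
    coverage = (t_last - t_first) / period    # share of T spanned by the appearances
    return coverage * math.exp(-beta * stdev * ave)
```

With the unbiased option, `statistics.stdev` raises `StatisticsError` when fewer than two intervals are available. This is consistent with the text's rule of assigning a fixed lowest value to rarely appearing feature words instead of evaluating Equation (8).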

For a feature word whose appearance frequency is at most a predetermined number of times, the feature word evaluation means 700 does not calculate Equation (8). Instead, it sets the evaluation value to a predetermined value. This value may be any value denoting the lowest evaluation; in the following description it is 0.

The predetermined number of times for the appearance frequency is, for example, two, but it need not be two. The reason for the threshold is as follows:
- A feature word that appears two times or fewer yields at most one appearance time interval.
- With no interval, no standard deviation can be obtained.
- With exactly two appearances, and hence one interval, the standard deviation based on the unbiased variance is undefined.
- With one interval, the standard deviation based on the sample variance is always 0. That case cannot be distinguished from a feature word appearing at perfectly uniform intervals.

Therefore, when the appearance frequency is, for example, two or fewer, the evaluation value is set to 0.

Even when a feature word appears more than twice, the standard deviation may still be 0 if the appearance frequency, and hence the number of intervals, is small. The predetermined number may therefore be larger than two. It may be a fixed value, or the user of the interest information identification system may enter it. The following description uses the example in which feature words appearing two times or fewer receive an evaluation value of 0.
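
The short Python check below illustrates the reasoning above using the standard library's `statistics` module. With a single interval, the sample-variance standard deviation is always 0 and the unbiased one is undefined. The helper name `interval_stdevs` is hypothetical.

```python
import statistics


def interval_stdevs(intervals):
    """Return (sample_stdev, unbiased_stdev) for a list of appearance time intervals.

    Either value is None when it cannot be computed: the sample version needs
    at least one interval, the unbiased (n - 1) version at least two.
    """
    sample = statistics.pstdev(intervals) if intervals else None
    unbiased = statistics.stdev(intervals) if len(intervals) >= 2 else None
    return sample, unbiased
```

For example, a feature word seen on two days three days apart gives `interval_stdevs([3])`. This returns a sample standard deviation of 0 and no unbiased value, so the word looks perfectly regular under the sample formula.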

The feature word evaluation means 700 selects each feature word in the feature word history in turn and calculates its evaluation value.

The following is a concrete example of the processing of the feature word evaluation means 700 in the second embodiment. Suppose the feature word history generation means 400 generates, for a feature word A, the feature word history shown in FIG. 9(a) for the period from September 1 to September 14.

The appearance frequency calculation means 500 then obtains an appearance frequency of 9 for feature word A, as shown in FIG. 9(b). The appearance interval calculation means 600 obtains the appearance time intervals shown in FIG. 9(c). Because the appearance frequency exceeds two, the feature word evaluation means 700 calculates Equation (8) to obtain the evaluation value. Here β = 0.1.

Because the feature word history is generated from the access history from September 1 to September 14, T = 14. As shown in FIG. 9(a), the first appearance of feature word A is on September 1 and the last is on September 12, so $T_{\mathrm{last}} - T_0 = 11$.

The feature word evaluation means 700 also calculates the standard deviation STDEV and the mean AVE of the appearance time intervals shown in FIG. 9(c). In this example, AVE = 1.375 and STDEV = 1.506.

The feature word evaluation means 700 therefore obtains the evaluation value of feature word A by calculating $V = (11/14) \cdot e^{-0.1 \times 1.375 \times 1.506}$, which gives V = 0.639.
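
As an arithmetic check, the Python lines below reproduce the feature word A example using only the values stated in the text.

```python
import math

# Values stated in the worked example for feature word A (FIG. 9)
T = 14          # derivation target period: September 1 to September 14, in days
span = 11       # T_last - T_0: September 12 minus September 1
AVE = 1.375     # mean of the 8 appearance time intervals (9 appearances)
STDEV = 1.506   # standard deviation of those intervals, as given
beta = 0.1

# Consistency check: 8 intervals averaging 1.375 days sum to 11 days = T_last - T_0
assert 8 * AVE == span

V = (span / T) * math.exp(-beta * STDEV * AVE)
print(round(V, 3))  # 0.639
```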

The above uses the example of FIG. 9. FIGS. 7 and 8 show the appearance frequencies and appearance time intervals obtained from the feature word history shown in FIG. 4. Calculating the evaluation value of each feature word from these yields the values shown in FIG. 18.

The feature words "ubiquitous", "PC", "server", and "storage" appear only once or twice. For these words, either no appearance time interval or only one can be obtained, so their evaluation values are set to 0.0.

In this example, high evaluation values go to feature words that appear regularly and can be regarded as expressing steady interest. The feature word selection means 800 may therefore, for example, select the feature words whose evaluation value is at or above a threshold. With a threshold of 0.1, the feature word selection means 800 selects "security" and "network" from the feature words shown in FIG. 18.
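
The following is a minimal sketch of threshold-based selection by the feature word selection means. The function name is hypothetical. Ordering the result by descending value is an illustrative choice, not something the text requires. The usage values below are made up and are not the FIG. 18 values.

```python
def select_feature_words(evaluations, threshold):
    """Return the feature words whose evaluation value is at or above threshold.

    evaluations: dict mapping feature word -> evaluation value
    The result is ordered from highest to lowest evaluation value.
    """
    chosen = [(word, value) for word, value in evaluations.items() if value >= threshold]
    chosen.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in chosen]
```

For example, with hypothetical values `{"word_x": 0.25, "word_y": 0.05, "word_z": 0.10}` and a threshold of 0.1, the sketch returns `["word_x", "word_z"]`.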

Next, the operation is described. FIG. 19 is a flowchart showing an example of the processing flow of the interest information identification system of the second embodiment. Steps S1 to S4 and steps S10 and S11 in FIG. 19 are the same as in the first embodiment, and their description is omitted.

The processing up to step S4 obtains the appearance frequency and appearance time intervals of each feature word. The feature word evaluation means 700 then calculates the mean AVE and the standard deviation STDEV of the appearance time intervals (step S4a).

Next, it evaluates Equation (8) with the AVE and STDEV from step S4a to obtain the evaluation value V of the feature word (step S4b). In step S4b, the feature word evaluation means 700 refers to the feature word history. It takes the first appearance time of the feature word under consideration as $T_0$ and its last appearance time as $T_{\mathrm{last}}$.

The feature word evaluation means 700 performs steps S4a and S4b for every feature word, in either of two orders:
- Iterate over all feature words within step S4a, and then again within step S4b.
- Select one feature word and perform steps S4a and S4b for it, then select the next feature word and repeat.
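
The following is a minimal sketch of the per-feature-word processing, using the second ordering described above. It performs steps S4a and S4b for one word, then moves on to the next.

It makes these assumptions:
- The feature word history has already been grouped into a dict mapping each feature word to its appearance days, numbered within the derivation period.
- The function name is hypothetical.
- The `min_count=2` default follows the example threshold in the text.
- STDEV uses the unbiased variance.

```python
import math
import statistics


def evaluate_all(history, period, beta, min_count=2, floor=0.0):
    """Evaluate every feature word with Equation (8) (illustrative sketch).

    history: dict mapping feature word -> list of appearance days (numbers)
    period:  derivation target period T, in the same unit as the days
    Returns a dict mapping feature word -> evaluation value V.
    """
    values = {}
    for word, days in history.items():
        days = sorted(days)
        if len(days) <= min_count:                       # too few appearances
            values[word] = floor                         # fixed lowest value
            continue
        intervals = [b - a for a, b in zip(days, days[1:])]
        ave = statistics.mean(intervals)                 # step S4a
        stdev = statistics.stdev(intervals)              # step S4a (unbiased)
        coverage = (days[-1] - days[0]) / period         # (T_last - T_0) / T
        values[word] = coverage * math.exp(-beta * stdev * ave)  # step S4b
    return values
```

For example, `evaluate_all({"security": [1, 1, 2], "rare": [3, 5]}, period=7, beta=0.1)` gives about 0.138 for "security" and 0.0 for "rare".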

The subsequent steps S10 and S11 are the same as in the first embodiment. The various modifications described for the first embodiment may also be applied to the second embodiment.

A concrete example of this processing follows. The processing up to step S4 is the same as in the first embodiment and is not described again. Suppose that:
- step S2 generates the feature word history shown in FIG. 4,
- step S3 calculates the appearance frequencies shown in FIG. 7, and
- step S4 obtains the appearance time intervals shown in FIG. 8.

The description below illustrates the calculation of the evaluation value for "security". FIG. 20 is an explanatory diagram of this calculation.

In step S4a, the feature word evaluation means 700 uses the appearance time intervals to calculate the mean AVE and standard deviation STDEV of the intervals for each feature word. For example, the feature word "security" has one appearance time interval of 0 days and one of 1 day, so AVE = 0.5 and STDEV = 0.707.
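
The lines below recompute the step S4a values for "security" with Python's `statistics` module. They show that the stated STDEV = 0.707 matches the unbiased (n - 1) standard deviation. The sample (n) version would give 0.5.

```python
import statistics

intervals = [0, 1]                     # "security": one 0-day and one 1-day interval
AVE = statistics.mean(intervals)
STDEV = statistics.stdev(intervals)    # unbiased (n - 1) standard deviation
print(AVE, round(STDEV, 3))            # 0.5 0.707
print(statistics.pstdev(intervals))    # 0.5 (sample-variance version, for comparison)
```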

Next, in step S4b, the feature word evaluation means 700 evaluates Equation (8) to obtain the evaluation value V. For "security", $T_{\mathrm{last}}$ is 2007/09/02 and $T_0$ is 2007/09/01, so $T_{\mathrm{last}} - T_0 = 1$. Since the derivation target period of the feature word history is T = 7, $V = (1/7) \cdot e^{-0.1 \times 0.5 \times 0.707} = 0.138$.
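
The step S4b arithmetic for "security" can be checked in Python. The code takes $T_{\mathrm{last}} - T_0$ as a whole-day difference between the two dates in the text.

```python
import math
from datetime import date

T = 7                                                   # derivation target period, in days
span = (date(2007, 9, 2) - date(2007, 9, 1)).days       # T_last - T_0 = 1 day
AVE, STDEV = 0.5, 0.707                                 # values from step S4a
beta = 0.1

V = (span / T) * math.exp(-beta * AVE * STDEV)
print(round(V, 3))  # 0.138
```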

Calculating the evaluation values of the other feature words in the same way gives the values shown in FIG. 18. "Ubiquitous", "PC", "server", and "storage" appear only once or twice. For these words, no appearance time interval or only one can be obtained, so their evaluation values are set to 0.0.

After the evaluation value of each feature word has been calculated, steps S10 and S11 are performed as in the first embodiment.

In this embodiment, the evaluation value of each feature word is calculated by Equation (8). Feature words that keep appearing over a long period, at intervals that are generally short and evenly spaced, therefore receive higher evaluation values.

Because feature words are selected on the basis of these values, words that appear regularly over a long period are extracted in preference to words that appear many times within a short period. The extracted words can then serve as feature words representing the steady interest of a user or group.

The parameter β adjusts how strongly feature words with short appearance time intervals are favored for extraction. Tuning β allows either of two behaviors:
- Feature words with short appearance time intervals are extracted preferentially.
- Feature words that appear regularly but at longer intervals receive relatively high evaluation values, so that they too are readily extracted.

In every case, the evaluation value grows as the appearance period lengthens and as the standard deviation and mean of the appearance time intervals shrink. Within that rule, a larger β widens the gap between the evaluation values of feature words with short and long appearance time intervals, which makes the former easier to extract.

FIG. 21 compares the evaluation values for β = 0.1 and β = 1.0. The horizontal axis is the appearance time interval and the vertical axis is the evaluation value.

For both values of β, the evaluation value decreases as the appearance time interval increases. It falls much more steeply for β = 1.0, so the gap between short and long intervals is larger. For example, with β = 1.0 the evaluation value approaches 0 once the appearance time interval reaches about 5, while short intervals still receive values well above 0. As a result, feature words with short appearance time intervals are extracted more readily.
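
The text does not state how the single appearance time interval on the horizontal axis of FIG. 21 maps to the product STDEV × AVE in Equation (8). The sketch below therefore does not reproduce the figure. It only compares the exponential factor $e^{-\beta x}$ of Equation (8) for β = 0.1 and β = 1.0, assuming x stands in for STDEV × AVE. The function name is hypothetical.

```python
import math


def decay_factor(x, beta):
    """Exponential factor exp(-beta * x) of Equation (8); x stands in for STDEV * AVE."""
    return math.exp(-beta * x)


# Compare how quickly the factor falls for the two beta values in FIG. 21
for x in (0.5, 1, 2, 5):
    print(x, round(decay_factor(x, 0.1), 3), round(decay_factor(x, 1.0), 3))
```

At x = 5 the factor is still about 0.61 for β = 0.1 but below 0.01 for β = 1.0. This illustrates the steeper drop described above.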

Next, an overview of the present invention is described. FIG. 22 is a block diagram showing an overview of the present invention. The interest information identification system of the present invention includes:
- appearance frequency calculation means 971,
- appearance interval calculation means 972,
- feature word evaluation means 973, and
- feature word identification means 974.

The appearance frequency calculation means 971 (for example, the appearance frequency calculation means 500 shown in FIG. 1) refers to a feature word history. This history contains the feature words representing features of the content used by a person or group, together with the times at which the person or group used each content item characterized by those feature words. From it, the means obtains each feature word's appearance frequency, that is, how often the content represented by the feature word was used.

The appearance interval calculation means 972 (for example, the appearance interval calculation means 600 shown in FIG. 1) refers to the feature word history and obtains, for each feature word, its appearance time intervals. These are the intervals between the use times of the content represented by the feature word.
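
The following is a minimal sketch of what the appearance frequency and appearance interval calculation means compute.

It makes these assumptions:
- The feature word history is an iterable of (feature word, `datetime.date`) pairs.
- Intervals are measured in whole days between consecutive uses after sorting, so two uses on the same day give an interval of 0.
- The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def frequencies_and_intervals(history):
    """history: iterable of (feature_word, use_date) pairs.

    Returns two dicts: feature word -> appearance frequency, and
    feature word -> list of appearance time intervals in days.
    """
    dates = defaultdict(list)
    for word, used_on in history:
        dates[word].append(used_on)
    freq, intervals = {}, {}
    for word, used in dates.items():
        used.sort()                               # chronological order
        freq[word] = len(used)                    # appearance frequency
        intervals[word] = [(b - a).days for a, b in zip(used, used[1:])]
    return freq, intervals
```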

The feature word evaluation means 973 (for example, the feature word evaluation means 700 shown in FIG. 1) refers to the feature word appearance frequencies and the appearance time intervals. For each feature word, it obtains an evaluation value according to the deviation between the distribution of the appearance time intervals and a model probability distribution.

The feature word identification means 974 (for example, the feature word selection means 800 shown in FIG. 1) identifies feature words on the basis of the evaluation values.

With this configuration, the evaluation value of each feature word is calculated according to the deviation between the actual distribution of its appearance time intervals and the model probability distribution. Feature words are then identified from these values. This makes it possible to identify feature words that appear at random over a certain period, and therefore to identify the steady interests of a person or group.

The first embodiment describes a configuration in which the feature word evaluation means selects each feature word in turn and then:
1. for each appearance time interval of the selected feature word, calculates a value indicating how far the probability that the feature word appears at that interval deviates from the probability of that interval under the model probability distribution; and
2. obtains the evaluation value of the selected feature word from the values calculated for the individual intervals.

The first embodiment also describes a configuration in which the feature word evaluation means selects each feature word in turn and then:
1. for each appearance time interval of the selected feature word, obtains the absolute value of the difference between the probability that the feature word appears at that interval and the probability of that interval under the model probability distribution; and
2. obtains the evaluation value of the selected feature word from the sum of these absolute differences over all intervals.

The first embodiment further describes a configuration that uses the following notation: t is an appearance time interval, T is the derivation target period of the feature word history, and K is the appearance frequency of the feature word. In this configuration, the feature word evaluation means:
- calculates the probability that the feature word appears at interval t by dividing the number of times it appeared at interval t by the total number of times it appeared at all intervals; and
- obtains the probability of interval t under the model probability distribution by calculating $(K/T)\, e^{-(K/T)\, t}$.
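
The following is a minimal sketch of the first-embodiment comparison described in the three configurations above. The function name is hypothetical.

It does the following:
- Takes the observed probability of each distinct interval t as its share of all intervals.
- Computes the model value $(K/T)\, e^{-(K/T)\, t}$, which has the form of an exponential-distribution density with rate K/T.
- Sums the absolute differences over the observed intervals.

How this sum is turned into the final evaluation value is not covered in this section, so the sketch returns only the raw sum.

```python
import math
from collections import Counter


def deviation_sum(intervals, frequency, period):
    """Sum of |P_obs(t) - P_model(t)| over the distinct observed intervals t.

    intervals: appearance time intervals of one feature word (e.g. in days)
    frequency: appearance frequency K of the feature word
    period:    derivation target period T of the feature word history
    """
    counts = Counter(intervals)             # how often each interval t occurred
    total = sum(counts.values())            # number of intervals overall
    rate = frequency / period               # K / T
    score = 0.0
    for t, count in counts.items():
        p_obs = count / total                        # observed probability of interval t
        p_model = rate * math.exp(-rate * t)         # (K/T) * e^{-(K/T) t}
        score += abs(p_obs - p_model)
    return score
```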

The feature word evaluation means 973 may also work as follows:
- For each feature word, it calculates the standard deviation STDEV and the mean AVE of the appearance time intervals.
- $T_0$ and $T_{\mathrm{last}}$ are the first and last use times of the content represented by the feature word, and T is the derivation target period of the feature word history.
- Using a parameter β, it obtains the evaluation value of the feature word by calculating $\{(T_{\mathrm{last}} - T_0)/T\} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}}$.
- For any feature word whose appearance frequency is at most a predetermined number of times, it sets the evaluation value to a predetermined value.

In this case, a feature word receives a large evaluation value, and can be identified, when its appearance period is long and the standard deviation and mean of its appearance time intervals are small. Identifying regularly appearing feature words in this way identifies the steady interests of a person or group.

In addition, adjusting the parameter β allows either of two behaviors:
- Feature words with short appearance time intervals are extracted preferentially.
- Feature words that appear regularly, even at longer intervals, receive relatively high evaluation values so that they too are readily extracted.

The second embodiment discloses a configuration that includes parameter input means through which the value of the parameter β is entered. This means is realized, for example, by the extraction target setting means 100.

Each embodiment discloses a configuration that includes:
- usage history storage means (for example, the access history storage means 200) that stores a usage history containing content identification information, content use times, and the person or group that used the content;
- meta information storage means (for example, the content management means 300) that stores meta information containing content identification information and feature words representing features of the content; and
- feature word history generation means (for example, the feature word history generation means 400) that refers to the usage history and the meta information, identifies the feature words representing features of the content used by the person or group whose interests are to be identified, and generates the feature word history by associating those feature words with the use times of the content.
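
The following is a minimal sketch of feature word history generation as a join of the usage history with the meta information.

It makes these assumptions:
- The usage history is a list of (content ID, use time, user or group) tuples.
- The meta information is a dict mapping content ID to a list of feature words.
- The target person or group is matched by simple equality. Resolving group membership, if needed, is outside this sketch.
- The function name and record layouts are hypothetical.

```python
def build_feature_word_history(usage_history, meta_info, target):
    """Return (feature_word, use_time) pairs for one target, sorted by use time.

    usage_history: iterable of (content_id, use_time, user_or_group)
    meta_info:     dict mapping content_id -> list of feature words
    target:        the person or group whose interests are being identified
    """
    history = []
    for content_id, use_time, who in usage_history:
        if who != target:                           # keep only the target's usage
            continue
        for word in meta_info.get(content_id, []):  # every feature word of the content
            history.append((word, use_time))
    history.sort(key=lambda pair: pair[1])          # chronological (stable) order
    return history
```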

Each embodiment also discloses a configuration that includes target setting means (for example, the extraction target setting means 100) for setting the person or group whose interest information is to be identified. In this configuration, the feature word history generation means refers to the usage history and the meta information and identifies the feature words representing features of the content used by the person or group set in the target setting means. It then generates the feature word history by associating those feature words with the use times of the content.

Each embodiment also discloses a configuration that includes search means (for example, the search means 950 shown in FIG. 12) for searching for content using the feature words identified by the feature word identification means.

Each embodiment also discloses a configuration that includes display means (for example, the interest information presentation means 900) for displaying the feature words identified by the feature word identification means.

Each embodiment also discloses a configuration that includes search means (for example, the search means 950 shown in FIG. 12) for searching for content using a feature word designated from among the feature words shown by the display means.

The present invention is suitably applied, for example, to an interest information identification system used in either of the following:
- an information search system that searches data stored in a database; or
- an information recommendation system that recommends suitable information from data stored in a database.

It is also suitably applicable to an interest information identification system used for either of the following:
- displaying information on an Internet or intranet portal site according to the user's interests; or
- a directory service for searching the affiliations, activities, and interests of users and employees.

FIG. 1 is a block diagram showing an example of the interest information identification system of the first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of an access history.
FIG. 3 is an explanatory diagram showing an example of meta information.
FIG. 4 is an explanatory diagram showing an example of a feature word history.
FIG. 5 is an explanatory diagram showing an example of the evaluation value of each feature word.
FIG. 6 is an explanatory diagram showing an example of an input screen for a user name or group name.
FIG. 7 is an explanatory diagram showing an example of appearance frequencies.
FIG. 8 is an explanatory diagram showing an example of appearance time intervals.
FIG. 9 is an explanatory diagram showing an example of a feature word history, appearance frequency, and appearance time intervals used for calculating an evaluation value.
FIG. 10 is an explanatory diagram showing the process of calculating an evaluation value.
FIG. 11 is an explanatory diagram showing an example of calculated evaluation values.
FIG. 12 is a block diagram showing an example of an interest information identification system that includes search means.
FIG. 13 is an explanatory diagram showing an example of a screen output by the interest information presentation means.
FIG. 14 is a flowchart showing an example of the processing flow of the interest information identification system of the first embodiment.
FIG. 15 is an explanatory diagram showing the process of calculating an evaluation value.
FIG. 16 is an explanatory diagram showing an example of a screen for searching by feature word.
FIG. 17 is an explanatory diagram showing an example of a screen output by the interest information presentation means.
FIG. 18 is an explanatory diagram showing an example of calculated evaluation values.
FIG. 19 is a flowchart showing an example of the processing flow of the interest information identification system of the second embodiment.
FIG. 20 is an explanatory diagram showing the process of calculating an evaluation value in the second embodiment.
FIG. 21 is an explanatory diagram comparing evaluation values for β = 0.1 and β = 1.0.
FIG. 22 is a block diagram showing an overview of the present invention.

Explanation of symbols

100 Extraction target setting means
200 Access history storage means
300 Content management means
400 Feature word history generation means
500 Appearance frequency calculation means
600 Appearance interval calculation means
700 Feature word evaluation means
800 Feature word selection means
900 Interest information presentation means
950 Search means
971 Appearance frequency calculation means
972 Appearance interval calculation means
973 Feature word evaluation means
974 Feature word identification means

Claims

An interest information identification system that identifies interest information representing the interest of a person or group, comprising:
appearance frequency calculating means that refers to a feature word history, which includes feature words representing features of content used by the person or group and the use time at which the person or group used each content item whose features those words represent, and obtains, for each feature word, a feature word appearance frequency, that is, the use frequency of the content represented by the feature word;
appearance interval calculating means that refers to the feature word history and obtains, for each feature word, the appearance time intervals, that is, the intervals between uses of the content represented by the feature word;
feature word evaluation means that refers to the feature word appearance frequency and the appearance time intervals and obtains, for each feature word, an evaluation value of the feature word according to the deviation between the distribution of its appearance time intervals and a model probability distribution; and
feature word specifying means that specifies feature words based on the evaluation values.
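The two counting steps of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the feature word history is a list of `(feature_word, use_time)` pairs with numeric use times (for example, day indices), and that an appearance time interval is the gap between consecutive use times after sorting. The function name `frequencies_and_intervals` is hypothetical.

```python
from collections import defaultdict


def frequencies_and_intervals(history):
    """Return (appearance frequency, appearance time intervals) per feature word.

    history: iterable of (feature_word, use_time) pairs; use_time is numeric.
    Assumption: an interval is the difference between consecutive sorted use times.
    """
    times = defaultdict(list)
    for word, use_time in history:
        times[word].append(use_time)

    freq = {}
    intervals = {}
    for word, ts in times.items():
        ts.sort()
        freq[word] = len(ts)  # feature word appearance frequency K
        intervals[word] = [b - a for a, b in zip(ts, ts[1:])]
    return freq, intervals


history = [("jazz", 1), ("jazz", 4), ("soccer", 2), ("jazz", 6), ("soccer", 30)]
freq, intervals = frequencies_and_intervals(history)
print(freq)       # {'jazz': 3, 'soccer': 2}
print(intervals)  # {'jazz': [3, 2], 'soccer': [28]}
```

A word used K times yields K − 1 intervals, so a word used only once has no interval at all; claims 5, 15 and 20 handle such low-frequency words separately.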

The interest information identification system according to claim 1, wherein the feature word evaluation means
selects each feature word in turn; calculates, for each appearance time interval of the selected feature word, a value indicating the degree of deviation between the probability that the feature word appears at that appearance time interval and the probability that the feature word appears at that appearance time interval under the model probability distribution; and obtains the evaluation value of the selected feature word based on the values indicating the degree of deviation calculated for the respective appearance time intervals.

The interest information identification system according to claim 1, wherein the feature word evaluation means
selects each feature word in turn; calculates, for each appearance time interval of the selected feature word, the absolute value of the difference between the probability that the feature word appears at that appearance time interval and the probability that the feature word appears at that appearance time interval under the model probability distribution; and obtains the evaluation value of the selected feature word based on the sum of the absolute differences calculated for the respective appearance time intervals.

The interest information identification system according to claim 3, wherein the feature word evaluation means
obtains, where the appearance time interval is $t$, the probability that the feature word appears at the appearance time interval $t$ by dividing the number of times the feature word appears at the appearance time interval $t$ by the sum of the numbers of times it appears; and
obtains, where the period over which the feature word history is derived is $T$ and the feature word appearance frequency is $K$, the probability that the feature word appears at the appearance time interval $t$ under the model probability distribution by calculating $\frac{K}{T}\, e^{-\frac{K}{T} t}$.
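The deviation measure of claims 3 and 4 compares the observed interval distribution with an exponential model of rate $K/T$. The sketch below is illustrative only. It assumes the observed probability of interval $t$ is its count divided by the total number of observed intervals (so the observed probabilities sum to 1), and it sums over the distinct interval values that actually occur. The name `interval_deviation` is hypothetical, and the claims do not say whether a smaller or larger sum should rank a word higher.

```python
import math
from collections import Counter


def interval_deviation(intervals, K, T):
    """Sum over distinct intervals t of |p_observed(t) - (K/T) * exp(-(K/T) * t)|.

    intervals: appearance time intervals of one feature word.
    K: feature word appearance frequency; T: length of the history period.
    """
    if not intervals:
        raise ValueError("a feature word needs at least two uses to have an interval")
    counts = Counter(intervals)   # how often each interval value t occurs
    total = sum(counts.values())  # assumption: normalise by the number of intervals
    rate = K / T                  # rate of the exponential model
    deviation = 0.0
    for t, n in counts.items():
        p_observed = n / total
        p_model = rate * math.exp(-rate * t)
        deviation += abs(p_observed - p_model)
    return deviation


d = interval_deviation([1, 1, 2, 2], K=5, T=10)
print(round(d, 4))  # 0.5128
```

With rate 0.5, the model gives about 0.303 at $t = 1$ and 0.184 at $t = 2$, against an observed 0.5 for each, which accounts for the total of about 0.513.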

An interest information identification system that identifies interest information representing the interest of a person or group, comprising:
appearance frequency calculating means that refers to a feature word history, which includes feature words representing features of content used by the person or group and the use time at which the person or group used each content item whose features those words represent, and obtains, for each feature word, a feature word appearance frequency, that is, the use frequency of the content represented by the feature word;
appearance interval calculating means that refers to the feature word history and obtains, for each feature word, the appearance time intervals, that is, the intervals between uses of the content represented by the feature word;
feature word evaluation means that, for each feature word, calculates the standard deviation STDEV and the average value AVE of the appearance time intervals and, denoting the first and last use times of the content represented by the feature word by $T_0$ and $T_{\mathrm{last}}$ and the period over which the feature word history is derived by $T$, obtains the evaluation value of the feature word by calculating $\frac{T_{\mathrm{last}} - T_0}{T} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}}$ using a parameter $\beta$, and that sets the evaluation value of any feature word whose feature word appearance frequency is equal to or less than a predetermined number of times; and
feature word specifying means that specifies feature words based on the evaluation values.
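The claim 5 evaluation value can be sketched directly from the formula, applying the product STDEV · AVE exactly as the claim writes it. Several details are assumptions not fixed by the claim: the population standard deviation is used, and words at or below `min_count` receive a placeholder value of 0.0 (the claim does not state which value they are given). The name `steadiness_evaluation` is hypothetical.

```python
import math
import statistics


def steadiness_evaluation(use_times, T, beta, min_count):
    """Evaluation value {(T_last - T_0) / T} * exp(-beta * STDEV * AVE) for one feature word.

    use_times: numeric use times of the content represented by the feature word.
    T: length of the period over which the feature word history is derived.
    min_count: words used this many times or fewer get a placeholder value.
    """
    ts = sorted(use_times)
    # Hypothetical placeholder 0.0; at least two uses are needed to form an
    # interval, hence max(min_count, 1).
    if len(ts) <= max(min_count, 1):
        return 0.0
    intervals = [b - a for a, b in zip(ts, ts[1:])]
    stdev = statistics.pstdev(intervals)  # population std. dev. (assumption)
    ave = statistics.mean(intervals)
    return ((ts[-1] - ts[0]) / T) * math.exp(-beta * stdev * ave)


regular = steadiness_evaluation([0, 10, 20, 30], T=100, beta=0.1, min_count=2)
irregular = steadiness_evaluation([0, 5, 20, 30], T=100, beta=0.1, min_count=2)
print(regular)              # 0.3
print(irregular < regular)  # True
```

Both words span the same 30 units, so $(T_{\mathrm{last}} - T_0)/T = 0.3$ in each case. The evenly spaced word keeps that value because its STDEV is 0, while the irregular one is pushed down by the exponential factor. A larger $\beta$, such as the 1.0 compared against 0.1 in the drawings, penalises irregularity more strongly.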

The interest information specifying system according to claim 5, further comprising parameter input means for inputting a value of the parameter β.

The interest information identification system according to any one of claims 1 to 6, further comprising:
use history storage means for storing a use history including content identification information, the use time of the content, and identification information of the person or group that used the content;
meta-information storage means for storing meta-information including content identification information and feature words representing features of the content; and
feature word history generation means that refers to the use history and the meta-information, specifies feature words representing features of the content used by the person or group of interest, and generates the feature word history by associating the use time of the content with those feature words.

The interest information identification system according to claim 7, further comprising specification target setting means for setting the person or group whose interest information is to be identified,
wherein the feature word history generation means refers to the use history and the meta-information, specifies feature words representing features of the content used by the person or group set in the specification target setting means, and generates the feature word history by associating the use time of the content with those feature words.
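Feature word history generation (claims 7 and 8) joins the use history with the meta-information and keeps only the target person or group. The sketch assumes simple in-memory records: use history entries as `(content_id, use_time, user_id)` tuples, meta-information as a dict from content ID to a list of feature words, and the target as a set of user IDs (one ID for a person, several for a group). These data shapes and the name `build_feature_word_history` are hypothetical.

```python
def build_feature_word_history(use_history, meta_info, targets):
    """Return (feature_word, use_time) pairs for content used by `targets`, sorted by time.

    use_history: list of (content_id, use_time, user_id) records.
    meta_info: dict mapping content_id to a list of feature words.
    targets: set of user ids; one id for a person, several for a group.
    """
    history = []
    for content_id, use_time, user_id in use_history:
        if user_id not in targets:
            continue  # not the person or group being analysed
        for word in meta_info.get(content_id, []):
            history.append((word, use_time))
    history.sort(key=lambda pair: pair[1])  # stable sort: ties keep input order
    return history


use_history = [("c1", 3, "alice"), ("c2", 1, "alice"), ("c1", 2, "bob")]
meta_info = {"c1": ["jazz"], "c2": ["jazz", "piano"]}
print(build_feature_word_history(use_history, meta_info, {"alice"}))
# [('jazz', 1), ('piano', 1), ('jazz', 3)]
```

Passing `{"alice", "bob"}` treats the two users as a single group, so both users' uses contribute to the same feature word history.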

The interest information specifying system according to any one of claims 1 to 8, further comprising search means for searching for content using the feature word specified by the feature word specifying means.

The interest information specifying system according to any one of claims 1 to 9, further comprising display means for displaying the feature words specified by the feature word specifying means.

The interest information identification system according to claim 10, further comprising search means for searching for content using a specified feature word among the feature words displayed by the display means.
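Claims 9 to 11 connect the specified feature words to display and content search. The sketch below makes two assumptions: feature words are specified by taking the `n` highest evaluation values, which fits the claim 5 formula where larger is better, and the search returns content whose meta-information lists the chosen word. Both function names are hypothetical, and the claims do not fix either rule.

```python
def specify_feature_words(evaluations, n):
    """Top-n feature words by evaluation value (ties broken alphabetically)."""
    return sorted(evaluations, key=lambda w: (-evaluations[w], w))[:n]


def search_content(meta_info, feature_word):
    """IDs of content whose meta-information lists the given feature word."""
    return sorted(cid for cid, words in meta_info.items() if feature_word in words)


evaluations = {"jazz": 0.3, "soccer": 0.01, "piano": 0.2}
meta_info = {"c1": ["jazz"], "c2": ["jazz", "piano"], "c3": ["soccer"]}
shown = specify_feature_words(evaluations, 2)  # words a display means might show
print(shown)                                   # ['jazz', 'piano']
print(search_content(meta_info, shown[1]))     # ['c2']
```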

An interest information identification method for identifying interest information representing the interest of a person or group, comprising:
an appearance frequency calculating step of referring to a feature word history, which includes feature words representing features of content used by the person or group and the use time at which the person or group used each content item whose features those words represent, and obtaining, for each feature word, a feature word appearance frequency, that is, the use frequency of the content represented by the feature word;
an appearance interval calculating step of referring to the feature word history and obtaining, for each feature word, the appearance time intervals, that is, the intervals between uses of the content represented by the feature word;
a feature word evaluation step of referring to the feature word appearance frequency and the appearance time intervals and obtaining, for each feature word, an evaluation value of the feature word according to the deviation between the distribution of its appearance time intervals and a model probability distribution; and
a feature word specifying step of specifying feature words based on the evaluation values.

The interest information identification method according to claim 12, wherein, in the feature word evaluation step,
each feature word is selected in turn; for each appearance time interval of the selected feature word, a value indicating the degree of deviation between the probability that the feature word appears at that appearance time interval and the probability that the feature word appears at that appearance time interval under the model probability distribution is calculated; and the evaluation value of the selected feature word is obtained based on the values indicating the degree of deviation calculated for the respective appearance time intervals.

The interest information identification method according to claim 12 or 13, wherein, in the feature word evaluation step,
each feature word is selected in turn; for each appearance time interval of the selected feature word, the absolute value of the difference between the probability that the feature word appears at that appearance time interval and the probability that the feature word appears at that appearance time interval under the model probability distribution is calculated; and the evaluation value of the selected feature word is obtained based on the sum of the absolute differences calculated for the respective appearance time intervals.

An interest information identification method for identifying interest information representing the interest of a person or group, comprising:
an appearance frequency calculating step of referring to a feature word history, which includes feature words representing features of content used by the person or group and the use time at which the person or group used each content item whose features those words represent, and obtaining, for each feature word, a feature word appearance frequency, that is, the use frequency of the content represented by the feature word;
an appearance interval calculating step of referring to the feature word history and obtaining, for each feature word, the appearance time intervals, that is, the intervals between uses of the content represented by the feature word;
a feature word evaluation step of calculating, for each feature word, the standard deviation STDEV and the average value AVE of the appearance time intervals and, denoting the first and last use times of the content represented by the feature word by $T_0$ and $T_{\mathrm{last}}$ and the period over which the feature word history is derived by $T$, obtaining the evaluation value of the feature word by calculating $\frac{T_{\mathrm{last}} - T_0}{T} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}}$ using a parameter $\beta$, and setting the evaluation value of any feature word whose feature word appearance frequency is equal to or less than a predetermined number of times; and
a feature word specifying step of specifying feature words based on the evaluation values.

The interest information specifying method according to claim 15, further comprising a parameter input step in which a value of the parameter β is input.

An interest information identification program installed in a computer for identifying interest information representing the interest of a person or group, the program causing the computer to execute:
an appearance frequency calculation process of referring to a feature word history, which includes feature words representing features of content used by the person or group and the use time at which the person or group used each content item whose features those words represent, and obtaining, for each feature word, a feature word appearance frequency, that is, the use frequency of the content represented by the feature word;
an appearance interval calculation process of referring to the feature word history and obtaining, for each feature word, the appearance time intervals, that is, the intervals between uses of the content represented by the feature word;
a feature word evaluation process of referring to the feature word appearance frequency and the appearance time intervals and obtaining, for each feature word, an evaluation value of the feature word according to the deviation between the distribution of its appearance time intervals and a model probability distribution; and
a feature word specifying process of specifying feature words based on the evaluation values.

The interest information identification program according to claim 17, causing the computer, in the feature word evaluation process,
to select each feature word in turn; to calculate, for each appearance time interval of the selected feature word, a value indicating the degree of deviation between the probability that the feature word appears at that appearance time interval and the probability that the feature word appears at that appearance time interval under the model probability distribution; and to obtain the evaluation value of the selected feature word based on the values indicating the degree of deviation calculated for the respective appearance time intervals.

The interest information identification program according to claim 17 or 18, causing the computer, in the feature word evaluation process,
to select each feature word in turn; to calculate, for each appearance time interval of the selected feature word, the absolute value of the difference between the probability that the feature word appears at that appearance time interval and the probability that the feature word appears at that appearance time interval under the model probability distribution; and to obtain the evaluation value of the selected feature word based on the sum of the absolute differences calculated for the respective appearance time intervals.

An interest information identification program installed in a computer for identifying interest information representing the interest of a person or group, the program causing the computer to execute:
an appearance frequency calculation process of referring to a feature word history, which includes feature words representing features of content used by the person or group and the use time at which the person or group used each content item whose features those words represent, and obtaining, for each feature word, a feature word appearance frequency, that is, the use frequency of the content represented by the feature word;
an appearance interval calculation process of referring to the feature word history and obtaining, for each feature word, the appearance time intervals, that is, the intervals between uses of the content represented by the feature word;
a feature word evaluation process of calculating, for each feature word, the standard deviation STDEV and the average value AVE of the appearance time intervals and, denoting the first and last use times of the content represented by the feature word by $T_0$ and $T_{\mathrm{last}}$ and the period over which the feature word history is derived by $T$, obtaining the evaluation value of the feature word by calculating $\frac{T_{\mathrm{last}} - T_0}{T} \cdot e^{-\beta \cdot \mathrm{STDEV} \cdot \mathrm{AVE}}$ using a parameter $\beta$, and setting the evaluation value of any feature word whose feature word appearance frequency is equal to or less than a predetermined number of times; and
a feature word specifying process of specifying feature words based on the evaluation values.

The interest information identification program according to claim 20, further causing the computer to execute a parameter input process in which the value of the parameter β is input.