JP5234836B2

JP5234836B2 - Content management apparatus, information relevance calculation method, and information relevance calculation program

Info

Publication number: JP5234836B2
Application number: JP2010095578A
Authority: JP
Inventors: 大我吉田; 豪入江; 隆佐藤; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-04-19
Filing date: 2010-04-19
Publication date: 2013-07-10
Anticipated expiration: 2030-04-19
Also published as: JP2011227633A

Description

The present invention relates to an information relevance calculation technique that, in a system for managing content such as documents, images, music, and video, calculates the relevance between annotations attached to content and the relevance between annotations and content.

In services that handle digitized content such as documents, images, music, and video, words or short phrases called tags are attached to content as annotations and used for classification and search.

Unlike the conventional approach of assigning each content item to a single category, classification by annotation uses tags and the like to attach multiple pieces of attribute information to a single content item. Content with multiple attributes or characteristics can thus carry multiple pieces of classification information, and a user can search for and narrow down content by specifying several attributes or characteristics.

Services that use tags for content classification and search include Hatena (registered trademark) Bookmark and Nico Nico Douga (registered trademark) in Japan, and YouTube (registered trademark), Flickr (registered trademark), and delicious outside Japan. In these services, users can freely view the tags attached to each content item. The tags are generally presented to the user in the order in which they were attached to the content, earliest first.

Furthermore, delicious displays the tags attached to a web page (the content) grouped by the user who attached them, and also displays all the tags attached to the web page in descending order of the number of users who attached them. That is, a tag attached by more users is placed higher.

An existing search technique for calculating the relevance between a tag and content is one that ranks tags using image features (Non-Patent Document 1).

D. Liu, X. S. Hua, L. J. Yang, M. Wang and H. J. Zhang, "Tag Ranking", In Proceedings of ACM International World Wide Web Conference, pages 351-360, 2009.

However, in the prior art, annotations carry no value indicating their relevance to the content, nor are they ordered by that relevance. Consequently, users cannot judge which annotations typically represent the content, and system developers cannot build systems that rank content or recommend related content using the relevance between annotations and content.

Conventional techniques such as that of delicious described above can be applied only to systems in which the tags attached to content are classified by user. Moreover, in a ranking by tag count, general tags that many users attach in common, such as category names, tend to rank high. Tags whose names allow the characteristics of the content to be narrowed down and inferred are therefore not necessarily ranked high.

In Non-Patent Document 1, the relevance between a tag and an image is calculated using image features, and the relevance is then corrected using the image similarity for tag t_i and tag t_j and the distance between tags defined by the Google (registered trademark) distance of formula (1) below. The method is therefore applicable only to images, and it incurs higher costs, such as analysis time and computation, than using annotation information alone.

d(t_i, t_j) = {max(log f(t_i), log f(t_j)) - log f(t_i, t_j)} / {log G - min(log f(t_i), log f(t_j))} ... Formula (1)
Here, f(t_i) and f(t_j) are the numbers of images to which tag t_i and tag t_j are attached, respectively; f(t_i, t_j) is the number of images to which both t_i and t_j are attached; and G is the total number of images.
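
As an illustration only, formula (1) can be evaluated directly from the four counts. The following is a minimal Python sketch; the function name `google_distance` and its argument names are hypothetical and do not come from Non-Patent Document 1, and returning infinity for tags that never co-occur is an assumption made here because log 0 is undefined.

```python
import math


def google_distance(f_i: int, f_j: int, f_ij: int, total: int) -> float:
    """Evaluate formula (1) for tags t_i and t_j.

    f_i, f_j: number of images carrying t_i and t_j, respectively
    f_ij:     number of images carrying both tags
    total:    total number of images (G in formula (1))
    """
    if f_i <= 0 or f_j <= 0 or total <= min(f_i, f_j):
        raise ValueError("need f_i, f_j > 0 and min(f_i, f_j) < total")
    if f_ij == 0:
        # Assumption: tags that never co-occur are treated as infinitely distant.
        return math.inf
    log_i, log_j = math.log(f_i), math.log(f_j)
    numerator = max(log_i, log_j) - math.log(f_ij)
    denominator = math.log(total) - min(log_i, log_j)
    return numerator / denominator
```

Two tags that always appear together yield a distance of 0, and the distance grows as the co-occurrence count shrinks relative to the more frequent tag.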

The present invention is intended to solve the problems described above. Its characteristic feature is that, for annotations attached for purposes such as content classification, the relevance between an annotation and the content can be calculated using annotation information alone. The calculated relevance can be used for content ranking and recommendation.

To solve the above problems, the present invention is a method by which a content management apparatus, having information storage means that stores the annotations accompanying content, calculates, for a plurality of annotations attached to a single content item, a relevance according to the degree to which each annotation relates to the content. The method acquires the annotation group attached to each content item; calculates the relevance between arbitrary annotations (hereinafter, A-A relevance); uses the calculated A-A relevance to calculate the relevance between a given annotation and the group of other annotations; and calculates and ranks the relevance between annotations and content (hereinafter, A-C relevance).

The relevance between annotations can be calculated using, for example, the inclusion rate of the set of content to which another annotation is also attached within the set of content to which a given annotation is attached; the co-occurrence frequency of the annotations in the same content; or the value of an independence test on the occurrence frequencies of the annotations.

The relevance between a given annotation and the group of other annotations attached to the same content can be calculated using, for example, the sum of its A-A relevance values with the other annotations; the median of those A-A relevance values; or a random surfer model in which the ratios of the A-A relevance values with the other annotations serve as link strengths.

The invention described in Non-Patent Document 1 and the present invention are alike in that both determine the relationship between annotations and content. They differ significantly, however, in the following respect. The technique of Non-Patent Document 1 analyzes the relationships among annotations and, in addition, extracts image features, which are characteristics of the content itself, to calculate the relationship between an annotation and the content. In contrast, the present invention can calculate the relationship between an annotation and the content based solely on the relationships among annotations.

The reason the present invention can determine the relationship between annotations and content from the relationships among annotations alone is explained with reference to FIG. 1. FIG. 1 is a conceptual diagram of the content described by annotations.

An annotation is attached to capture what the content is about. Each annotation can therefore be said to express at least part of what the content contains. The example of FIG. 1 concerns the video content titled "Tourist Attractions of Kanagawa Prefecture" shown in FIG. 7, to which six annotations (tags) are attached: "travel", "Kanagawa Prefecture", "train", "Enoden", "Enoshima", and "Kamakura". The figure illustrates the portion of the whole video content that each annotation expresses.

The present invention is based on the following finding.
[Finding 1]: Subject matter that is expressed redundantly by more annotations is the main subject matter of the content.

Within the whole content, many annotations are likely to express the parts that are particularly important. In FIG. 1, the region where many of the circles representing what each annotation expresses overlap corresponds to the "main content" region.

Furthermore, the present invention determines the relevance between each annotation and the content based on the following finding.
[Finding 2]: An annotation that expresses the main subject matter is highly relevant to the content.

In terms of FIG. 1, the larger the region in which an annotation's circle overlaps many other circles, the more relevant that annotation is to the content.

Therefore, by estimating how much the subject matter of the annotations overlaps, the main subject matter of the content can be estimated.

In the present invention, the relevance between annotations is calculated as a measure of how much information an annotation shares with the other annotations attached to the same content. Calculating this relevance from the inclusion rate of the set of content to which another annotation is also attached within the set of content to which a given annotation is attached makes it possible to define an asymmetric relationship between annotations. Alternatively, calculating it from the co-occurrence frequency of the annotations in the same content yields a relevance that is less affected by differences in the number of content items to which each annotation is attached. Alternatively, calculating it from the value of an independence test on the occurrence frequencies of the annotations yields a highly accurate relevance based on statistical information.
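
The two symmetric alternatives can be sketched from simple counts. This passage does not give their formulas, so the sketch below swaps in two standard measures as assumptions: the Jaccard coefficient as one possible co-occurrence measure, and the chi-square statistic of a 2x2 contingency table as one possible independence test. All function and parameter names are hypothetical.

```python
def cooccurrence_jaccard(n_both: int, n_a: int, n_b: int) -> float:
    """Jaccard coefficient: items with both annotations / items with either."""
    union = n_a + n_b - n_both
    return n_both / union if union else 0.0


def chi_square_2x2(n_both: int, n_a: int, n_b: int, total: int) -> float:
    """Chi-square statistic for the 2x2 table of annotation A versus annotation B.

    n_both: content items carrying both annotations
    n_a:    content items carrying A
    n_b:    content items carrying B
    total:  all content items in the store
    """
    a = n_both                        # A present, B present
    b = n_a - n_both                  # A present, B absent
    c = n_b - n_both                  # A absent,  B present
    d = total - n_a - n_b + n_both    # A absent,  B absent
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an annotation on all or none of the items carries no signal
    return total * (a * d - b * c) ** 2 / denom
```

Both measures return the same value when A and B are swapped, unlike the inclusion rate. Note that the chi-square statistic is also large when two annotations avoid each other, so a practical use would likely check the direction of the association as well.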

In the present invention, the relevance between an annotation and the content is calculated according to how many of the other annotations attached to the same content share information with that annotation. Calculating it as the sum of the A-A relevance values with the other annotations gives the A-C relevance at low computational cost. Alternatively, calculating it as the median of those A-A relevance values gives a relevance that is less affected by outliers in the A-A relevance. Alternatively, calculating it with a random surfer model in which the ratios of the A-A relevance values with the other annotations serve as link strengths gives a highly accurate relevance that takes into account the relationships with all the tags attached to the same content.
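
The sum and median variants can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ac_by_sum`, `ac_by_median`, `rank`, and `example_aa` are hypothetical, the numbers are made up, and aggregating R(a, b) over the other annotations b (rather than R(b, a)) is an assumption, since this passage does not fix the direction. The random surfer variant is omitted because its transition rule is not given here.

```python
from statistics import median
from typing import Dict, List, Tuple

AA = Dict[Tuple[str, str], float]  # (a, b) -> A-A relevance R(a, b)


def _values_with_others(aa: AA, annotations: List[str], a: str) -> List[float]:
    # Assumption: annotation a is scored by R(a, b) over every other annotation b.
    return [aa[(a, b)] for b in annotations if b != a]


def ac_by_sum(aa: AA, annotations: List[str]) -> Dict[str, float]:
    """A-C relevance as the sum of A-A relevance with the other annotations."""
    return {a: sum(_values_with_others(aa, annotations, a)) for a in annotations}


def ac_by_median(aa: AA, annotations: List[str]) -> Dict[str, float]:
    """A-C relevance as the median of A-A relevance with the other annotations."""
    scores = {}
    for a in annotations:
        values = _values_with_others(aa, annotations, a)
        scores[a] = median(values) if values else 0.0
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    """Annotations in descending order of A-C relevance."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up A-A relevance values for four annotations on one content item.
example_annotations = ["A", "B", "C", "D"]
example_aa: AA = {
    ("A", "B"): 0.9, ("A", "C"): 0.1, ("A", "D"): 0.2,
    ("B", "A"): 0.4, ("B", "C"): 0.5, ("B", "D"): 0.6,
    ("C", "A"): 0.1, ("C", "B"): 0.1, ("C", "D"): 0.1,
    ("D", "A"): 0.3, ("D", "B"): 0.3, ("D", "C"): 0.3,
}
```

With this data, annotation A has one unusually high value (0.9). The sum ranks A second, while the median ranks it third, which illustrates the reduced sensitivity to outliers described above.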

Accordingly, the method of the present invention uses only annotation information and no information that depends on the type of content, such as image features. It is therefore applicable to any annotated content and requires less computation than methods that use image features. In these respects it differs from the prior art.

Conventionally, the annotations attached to content have generally been presented in the order in which they were attached. It has therefore been difficult to know which annotations express the characteristics of the content more appropriately and in more detail, and the outline of the content could not be grasped at a glance.

In contrast, according to the present invention, the relevance to the content is calculated for each annotation attached to the content, and the annotations can be arranged in descending order of that relevance. The outline of the content can therefore be grasped simply by glancing at the highly relevant annotations.

In addition, the result list of a conventional search that specifies annotations could be sorted only by indicators unrelated to the relevance between annotations and content, such as the number of views or the posting date and time. With the present invention, the content in the search results can be presented in descending order of relevance to the annotation specified as the search condition.

Furthermore, by finding other content that includes the main annotations of a given content item, the present invention can obtain related content with higher accuracy than the conventional approach of finding content with many overlapping annotations. The conventional approach treated an overlap in annotations that do not typically represent the characteristics of the content the same as an overlap in annotations that are typical of the content. With the present invention, content that shares more of the more typical annotations can be obtained.

As described above, by calculating the relevance between content and the annotations attached to it, the present invention can provide information that is highly useful for content search, content recommendation, and the like, and that could not be obtained with the prior art.

FIG. 1 is a conceptual diagram of the content described by annotations.
FIG. 2 is a diagram showing a configuration example of the content management apparatus according to an embodiment of the present invention.
FIG. 3 is a processing flowchart of the information relevance calculation unit.
FIG. 4 is a diagram showing an example of the annotation information stored in the annotation storage device.
FIG. 5 is a diagram showing an example of the annotation-content relation management table stored in the A-C relevance storage device.
FIG. 6 is a diagram showing an example of the A-A relevance for annotations attached to content.
FIG. 7 is a diagram showing an example of content to which annotations are attached.
FIG. 8 is a schematic diagram showing an example of a method of calculating the A-A relevance.
FIG. 9 is a schematic diagram showing an example of a method of calculating the A-C relevance from the A-A relevance.
FIG. 10 is a diagram showing a configuration example of a content management apparatus that reorders a content set.
FIG. 11 is a diagram showing a configuration example of a content management apparatus that acquires a related content set.

Embodiments of the present invention are described below with reference to the drawings. FIG. 2 is a configuration diagram schematically showing the content management apparatus according to an embodiment of the present invention.

The content management apparatus 10 comprises an annotation storage device 100 that holds content and annotations, an information relevance calculation unit 110, and an A-C relevance storage device 120 that stores the relevance between annotations and content (A-C relevance) calculated by the information relevance calculation unit 110. The information relevance calculation unit 110 consists of a content selection unit 111, an A-A relevance calculation unit 112, and an A-C relevance calculation unit 113.

In this example, the annotation storage device 100 stores a content information management table 101 and an annotation information management table 102. The A-C relevance storage device 120 stores an A-C relation management table 121, which holds the strength of the relationship (relevance) between annotations and content. For clarity, this example shows the annotation storage device 100 and the A-C relevance storage device 120 as separate devices, but they may be the same storage device. The description here also assumes that the content itself is stored in the annotation storage device 100. Alternatively, the annotation storage device 100 may manage only the content ID that identifies each content item and its annotation group, with the content itself stored in another device.

The input/output device 20 is a display, keyboard, or other peripheral used by a person who performs the relevance calculation operation, but it may instead be a system that uses the relevance. Hereinafter, a person or system that uses the content management apparatus 10 is referred to as a "user".

The content management apparatus 10 includes, for example, a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), an HDD (Hard Disk Drive), and a program loaded into storage means such as the RAM. A program describing the operation of the components shown in FIG. 2 can be executed on a computer used as the content management apparatus 10, or executed as a service over a network or the like. The annotation storage device 100 may be a program that acquires content and annotations through an API (Application Program Interface) or the like.

For a plurality of annotations attached to a content item, the information relevance calculation unit 110 calculates the A-C relevance, a numerical value representing the strength of the relationship between each annotation and the content. Here, content includes, for example, digitized documents, images, music, and video, and annotations include tags, keywords, metadata, user information, and the like.

That is, in response to a request from a user such as the input/output device 20 or another system, the information relevance calculation unit 110 determines the content for which relevance is to be calculated in the content selection unit 111, calculates the A-A relevance in the A-A relevance calculation unit 112, calculates the A-C relevance in the A-C relevance calculation unit 113, and then either outputs the A-C relevance results for the annotation group or stores the calculated A-C relevance in the A-C relevance storage device 120.

FIG. 3 is a flowchart outlining the processing of content and annotation information in the information relevance calculation unit 110.

As shown in FIG. 3, the information relevance calculation unit 110 first uses the content selection unit 111 to select, from the content managed by the annotation storage device 100, the set of content that satisfies the conditions input from the input/output device 20 or the like (step S1). Any condition may be used to select content, such as selection from a menu, selection by creation date and time, or selection by search keyword.

Next, the A-A relevance calculation unit 112 acquires from the annotation storage device 100 the annotation group attached to the content, within the selected content set, for which relevance is to be calculated (step S2). The A-A relevance calculation unit 112 then calculates the A-A relevance for arbitrary combinations of the acquired annotations (step S3).

Next, the A-C relevance calculation unit 113 calculates the relevance between each annotation and the content (A-C relevance) from the A-A relevance calculated in step S3 (step S4). The calculated A-C relevance of the annotation group is output to the input/output device 20 or the like and presented to the user, or stored in the A-C relevance storage device 120 (step S5). The annotation group for which the A-C relevance was calculated may also be sorted in descending or ascending order of A-C relevance and output as an ordered annotation group. When multiple content items have been selected, steps S2 to S5 are repeated for each content item.
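
Steps S1 to S5 can be sketched as a small pipeline with pluggable A-A and A-C calculations. This is a minimal illustration under stated assumptions, not the apparatus itself: a plain dictionary stands in for the annotation storage device 100 and for the A-C relevance storage device 120, and every name (`run_steps`, `share_of_b_with_a`, `sum_over_others`) is hypothetical. The two helper functions are simple stand-ins for whichever A-A and A-C methods are chosen.

```python
from itertools import permutations
from typing import Callable, Dict, List, Set, Tuple

Store = Dict[str, Set[str]]              # content ID -> annotation names
AA = Dict[Tuple[str, str], float]        # (a, b) -> A-A relevance
ACTable = Dict[Tuple[str, str], float]   # (content ID, annotation) -> A-C relevance


def run_steps(store: Store,
              condition: Callable[[str], bool],
              aa_func: Callable[[Store, str, str], float],
              ac_func: Callable[[AA, List[str]], Dict[str, float]]) -> ACTable:
    table: ACTable = {}
    selected = [cid for cid in store if condition(cid)]       # S1: select content
    for cid in selected:                                      # S2-S5 per content item
        annotations = sorted(store[cid])                      # S2: get annotation group
        aa = {(a, b): aa_func(store, a, b)                    # S3: A-A relevance for
              for a, b in permutations(annotations, 2)}       #     each ordered pair
        ac = ac_func(aa, annotations)                         # S4: A-C relevance
        for a in annotations:                                 # S5: store the results
            table[(cid, a)] = ac[a]
    return table


def share_of_b_with_a(store: Store, a: str, b: str) -> float:
    """Stand-in A-A relevance: share of the items carrying b that also carry a."""
    with_b = [tags for tags in store.values() if b in tags]
    return sum(a in tags for tags in with_b) / len(with_b) if with_b else 0.0


def sum_over_others(aa: AA, annotations: List[str]) -> Dict[str, float]:
    """Stand-in A-C relevance: sum of R(a, b) over the other annotations b."""
    return {a: sum(aa[(a, b)] for b in annotations if b != a) for a in annotations}
```

Passing the calculations in as functions keeps the flow of FIG. 3 separate from the choice of relevance measure, so another A-A or A-C method can be substituted without changing the loop.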

FIG. 4 is a diagram showing an example of the annotation information stored in the annotation storage device 100. As shown in FIG. 4, the annotation storage device 100 stores a content information management table 101 and an annotation information management table 102, which hold data on the content for which relevance is to be calculated and on the annotations attached to that content. When a search condition for selecting content is specified, the annotation storage device 100 outputs data on the content and on the annotations attached to it.

As shown in FIG. 4(A), the content information management table 101 stores a content ID that identifies each content item, the content name, descriptive information about the content, and the like. As shown in FIG. 4(B), the annotation information management table 102 stores information such as an annotation ID that identifies each annotation and the annotation name.

The content information management table 101 and the annotation information management table 102 shown in FIG. 4 are examples. Attributes such as the name and description need not be stored, and other attributes may be stored.

FIG. 5 is a diagram showing an example of the annotation-content relation management table (A-C relation management table) 121 stored in the A-C relevance storage device 120. As shown in FIG. 5, the A-C relation management table 121 stores the A-C relevance calculated by the information relevance calculation unit 110 for each combination of content ID and annotation ID. The A-C relation management table 121 shown in FIG. 5 is also an example, and information such as the A-C relevance may be stored in another data format.
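
One way to picture such a table is as a mapping from (content ID, annotation ID) pairs to A-C relevance values. The sketch below is an assumption about representation only, with hypothetical names and IDs. It shows the two lookups the table supports: annotations of a content item ordered by relevance, and content items carrying a given annotation ordered by relevance.

```python
from typing import Dict, List, Tuple

ACTable = Dict[Tuple[str, str], float]  # (content ID, annotation ID) -> A-C relevance


def annotations_for_content(table: ACTable, content_id: str) -> List[Tuple[str, float]]:
    """Annotations of one content item, most relevant first."""
    rows = [(aid, score) for (cid, aid), score in table.items() if cid == content_id]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def contents_for_annotation(table: ACTable, annotation_id: str) -> List[Tuple[str, float]]:
    """Content items carrying one annotation, most relevant first."""
    rows = [(cid, score) for (cid, aid), score in table.items() if aid == annotation_id]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A production system would more likely keep this in an indexed database table. The linear scans here only keep the example short.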

Based on the information stored in the annotation storage device 100 as described above, the content selection unit 111 searches the annotation storage device 100 for the content for which relevance is to be calculated, acquires the annotation group attached to each matching content item, and passes the acquired annotation group to the A-A relevance calculation unit 112. For each content item that satisfies the condition, the A-A relevance calculation unit 112 calculates the A-A relevance, that is, the relevance between arbitrary annotations attached to the same content.

FIG. 6 shows an example of the A-A relevance for annotations attached to content in one embodiment of the present invention. In the figure, annotations A, B, C, and D are annotations attached to the same content a. The relevance between annotations is expressed as a relationship between two or more annotations. In this embodiment, it is described as a relationship between two annotations. For two annotations A and B, the relevance to B as seen from A is not necessarily the same as the relevance to A as seen from B, so the relationship between annotations may be defined as a one-way relationship in order to distinguish the two.

[First Example of A-A Relevance]
As a first example, the A-A relevance R(A, B) between annotation A and annotation B attached to the same content a may be defined by the inclusion rate of the set of content to which another annotation is also attached within the set of content to which a given annotation is attached. In this case, the A-A relevance R(A, B) is given by formula (2) below.

Ｒ（Ａ，Ｂ）＝｛Ａ，Ｂを共に含むコンテンツ数｝／｛Ｂを含むコンテンツ数｝
…式(2)
ここで，Ａ，Ｂを共に含むコンテンツ数とは，アノテーション蓄積装置１００において蓄積されているコンテンツ集合の中で，アノテーションＡおよびアノテーションＢが共に付加されたコンテンツの総数のことである。また，Ｂを含むコンテンツ数とは，アノテーション蓄積装置１００において蓄積されているコンテンツ集合の中で，アノテーションＢが付加されたコンテンツの総数のことである。 R (A, B) = {number of contents including both A and B} / {number of contents including B}
... Formula (2)
Here, the number of contents including both A and B is the total number of contents to which both annotation A and annotation B are added in the contents set stored in the annotation storage apparatus 100. The number of contents including B is the total number of contents to which annotation B is added in the contents set stored in the annotation storage apparatus 100.

With the method of the first example of A-A relevance, when the A-A relevance is calculated for annotations that differ greatly in the total number of contents they are added to, an asymmetric relationship can be defined: the A-A relevance is calculated low for an annotation attached to many contents and high for an annotation attached to few contents.

FIG. 7 shows an example of content to which annotations are added, and FIG. 8 schematically shows how the A-A relevance is calculated. Consider, for example, content annotated as shown in FIG. 7. The relevances R(Enoden, train) and R(train, Enoden) between the two annotations "Enoden" and "train" are then calculated as shown in FIG. 8. That is, when the annotation "train" is added to 100 contents, "Enoden" is added to 25 contents, and both "train" and "Enoden" are added to 20 contents, the A-A relevance is calculated as follows.

R(Enoden, train) = 20/100 = 0.2
R(train, Enoden) = 20/25 = 0.8
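
As a hedged illustration of equation (2), the following Python sketch computes the inclusion rate from a mapping of content IDs to tag sets. The function name `inclusion_relevance`, the data layout, and the synthetic content IDs are assumptions made for this example; only the counts (100, 25, and 20) come from the text.

```python
def inclusion_relevance(tagged, a, b):
    """Equation (2): R(a, b) = |contents with both a and b| / |contents with b|."""
    with_b = [tags for tags in tagged.values() if b in tags]
    if not with_b:
        return 0.0  # assumption: an undefined ratio is treated as zero
    both = sum(1 for tags in with_b if a in tags)
    return both / len(with_b)


# Synthetic data matching the counts in the text:
# 20 contents carry both tags, 80 carry only "train", 5 carry only "Enoden".
tagged = {}
for i in range(20):
    tagged[i] = {"train", "Enoden"}
for i in range(20, 100):
    tagged[i] = {"train"}
for i in range(100, 105):
    tagged[i] = {"Enoden"}

print(inclusion_relevance(tagged, "Enoden", "train"))  # 0.2
print(inclusion_relevance(tagged, "train", "Enoden"))  # 0.8
```

The two calls differ only in argument order, which makes the asymmetry of this measure visible.
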
[Second Example of A-A Relevance]
As a second example, the A-A relevance R(A, B) may be defined by the co-occurrence frequency of the annotations in the same content. In this case, the A-A relevance R(A, B) is given by the following equation (3).

R(A, B) = {number of contents including both A and B} / {number of contents including A or B} ... Equation (3)
Here, the number of contents including A or B is the total number of contents, within the content set stored in the annotation storage device 100, to which annotation A or annotation B is added.

With the method of the second example of A-A relevance, the A-A relevance does not become high when the annotations differ in the number of contents they are added to. This prevents the A-A relevance from being calculated high merely because one annotation, such as a category name, is added to many contents, or because an annotation is added to only a few contents.
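
A minimal sketch of equation (3), assuming the same hypothetical data layout (content ID mapped to a set of tags) and reusing the "train"/"Enoden" counts from the first example. The function name `cooccurrence_relevance` is an assumption, and "A or B" is read here as "at least one of A and B".

```python
def cooccurrence_relevance(tagged, a, b):
    """Equation (3): |contents with both a and b| / |contents with a or b|."""
    both = sum(1 for tags in tagged.values() if a in tags and b in tags)
    either = sum(1 for tags in tagged.values() if a in tags or b in tags)
    return both / either if either else 0.0


# 20 contents with both tags, 80 with only "train", 5 with only "Enoden".
tagged = {
    i: ({"train", "Enoden"} if i < 20 else {"train"} if i < 100 else {"Enoden"})
    for i in range(105)
}

r = cooccurrence_relevance(tagged, "Enoden", "train")
print(round(r, 3))  # 20 / 105
```

Unlike the inclusion rate, this value is the same in both directions and stays low when one tag is far more common than the other.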

[Third Example of A-A Relevance]
As a third example, the A-A relevance R(A, B) may be defined by the chi-square value for the appearance frequencies of the annotations. In this case, the A-A relevance R(A, B) is given by the following equation (4).

Here, n is the total number of contents in the content set stored in the annotation storage device 100, and [A, B], [￣A, ￣B], [￣A, B], and [A, ￣B] are, respectively, the number of contents including both annotations A and B, the number of contents including neither annotation A nor annotation B, the number of contents including annotation B but not annotation A, and the number of contents including annotation A but not annotation B. The "￣" in "￣A" denotes a bar placed over A (the same applies to B).

Similarly, [A], [￣A], [B], and [￣B] are, respectively, the number of contents including annotation A, the number of contents not including annotation A, the number of contents including annotation B, and the number of contents not including annotation B. When the number of contents obtained is small, a correction such as "Yates's correction" described in Reference 1 below may be applied.

[Reference 1]: F. Yates, "Contingency Tables Involving Small Numbers and the χ² Test", Supplement to the Journal of the Royal Statistical Society, pp. 217-235, 1934.
Because the method of the third example of A-A relevance uses statistical information, it yields a highly accurate A-A relevance that reflects whether the tendencies with which the annotations are added are correlated.
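
Equation (4) itself does not appear in this text. As a hedged illustration only, the sketch below assumes it is the standard chi-square statistic for a 2×2 contingency table, n([A, B][￣A, ￣B] − [￣A, B][A, ￣B])² / ([A][￣A][B][￣B]), which uses exactly the symbols defined above. The function name, the argument names, the handling of a degenerate table, and the form of the optional Yates term are assumptions, not taken from the patent.

```python
def chi_square_relevance(n_ab, n_a_only, n_b_only, n_neither, yates=False):
    """Chi-square statistic of a 2x2 table of annotation counts.

    n_ab      : contents with both A and B      -> [A, B]
    n_a_only  : contents with A but not B       -> [A, not B]
    n_b_only  : contents with B but not A       -> [not A, B]
    n_neither : contents with neither A nor B   -> [not A, not B]
    """
    n = n_ab + n_a_only + n_b_only + n_neither
    n_a, n_not_a = n_ab + n_a_only, n_b_only + n_neither
    n_b, n_not_b = n_ab + n_b_only, n_a_only + n_neither
    denom = n_a * n_not_a * n_b * n_not_b
    if denom == 0:
        return 0.0  # assumption: a degenerate table carries no evidence
    diff = abs(n_ab * n_neither - n_b_only * n_a_only)
    if yates:
        # Continuity correction, clamped at zero.
        diff = max(0.0, diff - n / 2.0)
    return n * diff ** 2 / denom


# Hypothetical collection of 1000 contents with the "train"/"Enoden" counts.
print(chi_square_relevance(20, 5, 80, 895))
print(chi_square_relevance(20, 5, 80, 895, yates=True))
```

The corrected value is never larger than the uncorrected one, which is why the correction matters most when the counts are small.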

The A-C relevance calculation unit 113 takes as input the relevance values between annotations calculated by the A-A relevance calculation unit 112 and calculates the A-C relevance, which is the relevance between an annotation and a content.

In a system that adds annotations to content, the main attributes and features of a content tend to be expressed by many of the annotations added to it. In this embodiment, therefore, the relevance between an annotation and a content is taken to be higher when the annotation has higher relevance with a larger number of the other annotations added to the same content. In other words, within the annotation group added to a content, an annotation that has high relevance with every other annotation is given a high A-C relevance.

[First Example of A-C Relevance]
For the annotation group of the content shown in FIG. 6, for example, as a first example, the A-C relevance S(a, A) of annotation A with respect to content a may be defined as the sum of the A-A relevances between annotation A and the other annotations added to the same content. In this case, the A-C relevance S(a, A) is given by the following equation (5).

S(a, A) = ΣR(t, A) [where Σ is the sum over t ∈ T] ... Equation (5)
Here, T is the set of annotations added to content a other than annotation A.

With the method of the first example of A-C relevance, it suffices to sum the A-A relevances already calculated, so the A-C relevance can be calculated at low computational cost.

FIG. 7 shows an example of the display screen of annotated content, and FIG. 9 shows an example of calculating the A-C relevance from the A-A relevances of the annotations added to this content. FIG. 9 shows an A-A relevance table whose elements are the values of R(A, B) calculated by the A-A relevance calculation unit 112. For example, the A-C relevance S(Tourist attractions in Kanagawa Prefecture, travel) of the annotation "travel" with respect to the content "Tourist attractions in Kanagawa Prefecture" is calculated as follows.

S(Tourist attractions in Kanagawa Prefecture, travel)
= R(Kanagawa Prefecture, travel) + R(train, travel) + R(Enoden, travel) + R(Enoshima, travel) + R(Kamakura, travel)
= 0.1 + 0.3 + 0.01 + 0.01 + 0.03 = 0.45
The A-C relevance of each of the other annotations is calculated in the same way. From the A-C relevances calculated in this way, it can be seen that, for the A-A relevance table shown in FIG. 9, the tag "Enoshima" has the highest A-C relevance, "3.7".
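
The sum in equation (5) can be sketched as follows. The dictionary layout `r[(t, A)]` and the function name `ac_relevance_sum` are assumptions for illustration; the five values are the ones quoted above for the annotation "travel", and the rest of the FIG. 9 table is not reproduced here.

```python
def ac_relevance_sum(annotation, annotations, r):
    """Equation (5): S(a, A) = sum of R(t, A) over the other annotations t."""
    return sum(r[(t, annotation)] for t in annotations if t != annotation)


tags = ["Kanagawa Prefecture", "train", "Enoden", "Enoshima", "Kamakura", "travel"]

# Only the "travel" column of the A-A relevance table is quoted in the text.
r = {
    ("Kanagawa Prefecture", "travel"): 0.1,
    ("train", "travel"): 0.3,
    ("Enoden", "travel"): 0.01,
    ("Enoshima", "travel"): 0.01,
    ("Kamakura", "travel"): 0.03,
}

s_travel = ac_relevance_sum("travel", tags, r)
print(round(s_travel, 2))  # 0.45
```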

[Second Example of A-C Relevance]
As a second example, the A-C relevance S(a, A) may be defined as the median of the A-A relevances with the other annotations. In this case, the A-C relevance S(a, A) is given by the following equation (6), where n is the total number of other annotations added to the same content.

・When n is odd:
S(a, A) = R(t'_{(n+1)/2}, A)
・When n is even: ... Equation (6)
S(a, A) = (R(t'_{n/2}, A) + R(t'_{n/2+1}, A)) / 2
Here, t'_i is the annotation that comes i-th when all the other annotations added to the same content are sorted in ascending order of their A-A relevance.

With the method of the second example of A-C relevance, when the A-A relevances include outliers that are very large or very small compared with the other values, their effect is mitigated by the median, which prevents the A-C relevance from being calculated unduly high or unduly low.
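
A minimal sketch of equation (6), assuming the A-A relevances are held in a dictionary keyed by annotation pairs (a hypothetical layout). Python's `statistics.median` returns the middle value for an odd count and the mean of the two middle values for an even count, which matches the two cases above. The example values are the "travel" relevances quoted in the first example.

```python
from statistics import median


def ac_relevance_median(annotation, annotations, r):
    """Equation (6): median of R(t, A) over the other annotations t."""
    others = [r[(t, annotation)] for t in annotations if t != annotation]
    return median(others)


r = {
    ("Kanagawa Prefecture", "travel"): 0.1,
    ("train", "travel"): 0.3,
    ("Enoden", "travel"): 0.01,
    ("Enoshima", "travel"): 0.01,
    ("Kamakura", "travel"): 0.03,
}
tags = ["Kanagawa Prefecture", "train", "Enoden", "Enoshima", "Kamakura", "travel"]

odd_case = ac_relevance_median("travel", tags, r)        # n = 5 other annotations
even_case = ac_relevance_median("travel", tags[1:], r)   # n = 4 after dropping one
print(odd_case, even_case)
```

In the odd case the result is the third-smallest value, so the comparatively large 0.3 for "train" does not pull the score up as it does in the sum.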

[Third Example of A-C Relevance]
As a third example, the A-C relevance S(a, A) may be defined by a random surfer model (see Reference 2) in which each annotation is a node and the ratio of the A-A relevance with each of the other annotations is the edge strength. In this case, the A-C relevance S(a, A) is given by the expressions shown below.

[Reference 2]: S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine", In Proceedings of the Seventh International Conference on World Wide Web, pp. 107-117, 1998.
In the following expressions, t_i is the i-th annotation added to content a, and T_j is the set of annotations added to content a other than annotation t_j.
1. Define the matrix M_a whose (i, j) component is M_a(i, j).

・When i = j: M_a(i, j) = 0
・When i ≠ j: M_a(i, j) = R(t_j, t_i) / ΣR(t_j, t)
[where Σ is the sum over t ∈ T_j]
2. Compute all eigenvalue-eigenvector pairs of M_a, and obtain the vector V by normalizing to length 1 the eigenvector associated with the eigenvalue of largest absolute value.
3. Take the value in the i-th row of the vector V as S(a, t_i).

In the random surfer model, even when two edges have the same strength, the edge connected to the node with the higher value carries more weight. That is, the method of the third example of A-C relevance decides which A-A relevances to emphasize by taking into account whether the annotation at the other end itself has high A-A relevance with many other annotations. This makes it possible to calculate the A-C relevance with high accuracy.
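
The three steps above can be sketched as follows. Note the substitution: instead of computing every eigenvalue-eigenvector pair, this sketch approximates the dominant eigenvector by power iteration, which is a common shortcut and not the procedure stated in the text. It assumes every pair of annotations has a positive A-A relevance, so that each column of M_a sums to 1 and the iteration settles. The function name, the dictionary layout `r[(x, y)]`, and the example values are all hypothetical.

```python
import math


def ac_relevance_random_surfer(annotations, r, iterations=200):
    """Approximate the dominant eigenvector of M_a by power iteration."""
    n = len(annotations)
    # Step 1: M_a(i, j) = R(t_j, t_i) / sum over t != t_j of R(t_j, t); diagonal is 0.
    m = [[0.0] * n for _ in range(n)]
    for j, tj in enumerate(annotations):
        col_sum = sum(r[(tj, t)] for t in annotations if t != tj)
        for i, ti in enumerate(annotations):
            if i != j:
                m[i][j] = r[(tj, ti)] / col_sum
    # Step 2 (approximated): repeatedly apply M_a to a uniform start vector.
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))  # normalize to length 1
    # Step 3: the i-th entry is S(a, t_i).
    return {t: x / norm for t, x in zip(annotations, v)}


# Made-up symmetric relevances: "a" is strongly tied to both "b" and "c".
pairs = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "c"): 0.1}
r = {}
for (x, y), value in pairs.items():
    r[(x, y)] = r[(y, x)] = value

scores = ac_relevance_random_surfer(["a", "b", "c"], r)
print(scores)
```

In this toy input the annotation "a" ends up with the highest score because both other annotations direct most of their edge weight to it.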

[Content Ranking Processing]
By using content whose annotations already have a calculated A-C relevance, the content set returned when a user searches by specifying an annotation can be presented in descending order of the strength of its relationship with the specified annotation. This allows the user who performed the search to find and browse, even among contents carrying the same annotation, those that more strongly exhibit the feature the annotation expresses.

In this case, for example, the search results may be sorted according to the rank that the annotation specified in the search condition holds, by A-C relevance, among the tags of each content.

FIG. 10 shows a configuration example of a content management apparatus that rearranges a content set. In FIG. 10, the annotation storage device 100, the information relevance calculation unit 110, and the A-C relevance storage device 120 correspond to the elements with the same reference numerals in FIG. 2.

The content ranking unit 130 rearranges a content set according to the A-C relevance and consists of a content selection unit 131, an annotation rearrangement unit 132, and a content rearrangement unit 133.

The processing procedure executed by the content ranking unit 130 shown in FIG. 10 is described below. The content ranking unit 130 determines the order of the contents in the search result list obtained by a content search through the following procedure.

(1) In response to a search request from a user, the content selection unit 131 acquires from the annotation storage device 100 the set of contents to which the specified annotation is added.

(2) The content selection unit 131 reads the A-C relevances of the annotations added to the acquired contents from the A-C relevance storage device 120 and passes the annotations added to each content, together with their A-C relevances, to the annotation rearrangement unit 132. If the A-C relevances of an annotation group have not yet been calculated, they are calculated using the processing function of the information relevance calculation unit 110 described above, and the result is passed to the annotation rearrangement unit 132.

(3) For each acquired content, the annotation rearrangement unit 132 sorts the annotations added to that content in descending order of A-C relevance and passes each content and its annotations to the content rearrangement unit 133.

(4) The content rearrangement unit 133 sorts the contents so that contents in which the specified annotation ranks higher after the sorting in step (3) come first. If the specified annotation is at the same position, contents with fewer annotations come first. If the number of annotations is also the same, contents with a higher A-C relevance value for the specified annotation come first. If the A-C relevance values are also the same, contents with smaller content IDs come first.

(5) The content rearrangement unit 133 outputs the rearranged content set to the user.
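
Steps (1) to (5) can be sketched as a single sort with a composite key. The function name `rank_contents`, the in-memory layout (content ID mapped to a dictionary of annotation to A-C relevance), and the example scores are assumptions for illustration, not the storage format of the apparatus.

```python
def rank_contents(contents, query):
    """Order the contents carrying `query` by the tie-breaking rules of step (4)."""

    def sort_key(item):
        content_id, scores = item
        # Step (3): this content's annotations in descending A-C relevance.
        ordered = sorted(scores, key=lambda t: scores[t], reverse=True)
        return (
            ordered.index(query),  # position of the specified annotation
            len(scores),           # fewer annotations first
            -scores[query],        # higher A-C relevance first
            content_id,            # smaller content ID first
        )

    # Step (1): keep only contents to which the specified annotation is added.
    hits = [(cid, s) for cid, s in contents.items() if query in s]
    return [cid for cid, _ in sorted(hits, key=sort_key)]


contents = {
    1: {"travel": 2.0, "train": 1.0, "Kamakura": 0.5},  # "travel" ranks 1st of 3
    2: {"travel": 1.5, "train": 0.4},                   # "travel" ranks 1st of 2
    3: {"Enoshima": 3.7, "travel": 0.45},               # "travel" ranks 2nd
    4: {"train": 1.0},                                  # no "travel" tag
}
print(rank_contents(contents, "travel"))  # [2, 1, 3]
```

Content 2 precedes content 1 because both have "travel" in first position and content 2 carries fewer annotations.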

[Related Content Acquisition Processing]
Furthermore, by using content whose annotations have a calculated A-C relevance, other content related to a given content can be found. As a result, when a user views a content, information recommending other content related to it can be obtained automatically, and the user can go on to view the related content.

In this case, for example, the similarity between contents may be calculated using the cosine similarity as the similarity between the annotation groups attached to the contents.

FIG. 11 shows a configuration example of a content management apparatus that acquires a related content set. In FIG. 11, the annotation storage device 100, the information relevance calculation unit 110, and the A-C relevance storage device 120 correspond to the elements with the same reference numerals in FIG. 2.

The related content acquisition unit 140 acquires content related to the content currently of interest by using content whose annotations have a calculated A-C relevance, and consists of a vector creation unit 141, a vector similarity calculation unit 142, and a related content presentation unit 143.

The processing procedure executed by the related content acquisition unit 140 shown in FIG. 11 is described below. The related content acquisition unit 140 selects content related to the content currently of interest through the following procedure.

(1) For each content C_i in the content set C = {C_i | i = 1 to n_C} (where n_C is the total number of contents), the vector creation unit 141 defines an A-C relevance vector V_i whose elements correspond to the annotation set T_i = {T_ij | j = 1 to n_Ti} added to C_i (where n_Ti is the total number of annotations added to the content) and whose element values are the A-C relevances between annotation T_ij and content C_i, and passes it to the vector similarity calculation unit 142. If the A-C relevances of an annotation group have not yet been calculated, they are calculated using the processing function of the information relevance calculation unit 110 described above, and the A-C relevance vector is then created and passed to the vector similarity calculation unit 142.

(2) The vector similarity calculation unit 142 calculates the cosine similarity cos(V_x, V_i) between the A-C relevance vector V_x and each A-C relevance vector V_i in the A-C relevance vector set V = {V_i | i = 1 to n_C, i ≠ x}, and passes the results to the related content presentation unit 143.

(3) The related content presentation unit 143 obtains the set of A-C relevance vectors V_i whose cosine similarity cos(V_x, V_i) is equal to or greater than a predetermined threshold, and takes the set of contents C_i corresponding to those vectors as the set of contents related to content C_x.

(4) The related content presentation unit 143 outputs the related content set to the user.
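
The procedure above can be sketched with sparse vectors, where each content's A-C relevance vector is a dictionary from annotation to A-C relevance and absent annotations count as zero. The function names, the content IDs, the scores, and the threshold of 0.5 are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {annotation: A-C relevance}."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty or all-zero vector is related to nothing
    return dot / (norm_u * norm_v)


def related_contents(vectors, x, threshold):
    """IDs of contents whose vector has cosine similarity >= threshold with content x."""
    return [cid for cid, v in vectors.items()
            if cid != x and cosine(vectors[x], v) >= threshold]


vectors = {
    "c1": {"Enoshima": 3.0, "travel": 1.0},
    "c2": {"Enoshima": 3.0, "travel": 1.0, "train": 0.5},
    "c3": {"train": 2.0},
}
print(related_contents(vectors, "c1", threshold=0.5))  # ['c2']
```

Content "c3" shares no annotation with "c1", so its similarity is zero and it falls below any positive threshold.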

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments and may be modified and applied within the technical scope set forth in the claims.

Description of Symbols
10 Content management apparatus
20 Input/output device
100 Annotation storage device
101 Content information management table
102 Annotation information management table
110 Information relevance calculation unit
111 Content selection unit
112 A-A relevance calculation unit
113 A-C relevance calculation unit
120 A-C relevance storage device
121 A-C relationship management table
130 Content ranking unit
131 Content selection unit
132 Annotation rearrangement unit
133 Content rearrangement unit
140 Related content acquisition unit
141 Vector creation unit
142 Vector similarity calculation unit
143 Related content presentation unit

Claims

A content management apparatus that stores and manages annotations added to content, comprising:
annotation storage means storing annotations added to the content;
content selection means for selecting designated content and acquiring the annotation group added to that content from the annotation storage means;
inter-annotation relevance calculating means for calculating an inter-annotation relevance that numerically expresses the strength of the relationship between annotations in the annotation group;
annotation-content relevance calculating means for calculating, from the inter-annotation relevance, an annotation-content relevance that expresses the strength of the relationship between each annotation and the content; and
annotation-content relevance output means for storing or outputting the calculated annotation-content relevance.

The content management apparatus according to claim 1, wherein the inter-annotation relevance calculating means comprises at least one of:
means for calculating the relevance by the inclusion rate of the content set to which another annotation is also added within the content set to which a certain annotation is added;
means for calculating the relevance by the co-occurrence frequency of annotations in the same content; and
means for calculating the relevance by an independence test value for the appearance frequency of annotations.

The content management apparatus according to claim 1, wherein the annotation-content relevance calculating means comprises at least one of:
means for calculating the relevance by the sum of the inter-annotation relevances with the other annotations;
means for calculating the relevance by the median of the inter-annotation relevances with the other annotations; and
means for calculating the relevance by a random surfer model in which the link strength is the ratio of the inter-annotation relevance with the other annotations.

The content management apparatus according to claim 1, further comprising output processing means for sorting the annotation group for which the annotation-content relevance has been calculated in descending order of that relevance, based on the annotation group and the annotation-content relevance, and for outputting the sorted annotation group.

A content management apparatus that stores and manages annotations added to content, comprising:
annotation storage means storing annotations added to the content;
annotation-content relevance storage means for storing the annotation-content relevance calculated by the content selection means, the inter-annotation relevance calculating means, and the annotation-content relevance calculating means according to any one of claims 1 to 4;
annotation sorting means for rearranging the annotations added to a certain content in descending order of the annotation-content relevance, based on the information on the annotation group added to each content stored in the annotation storage means and the information on the annotation-content relevance stored in the annotation-content relevance storage means; and
content sorting means for rearranging the order of a content set based on the annotations added to each content in the content set and the annotation-content relevance.

A content management apparatus that stores and manages annotations added to content, comprising:
annotation storage means storing annotations added to the content;
annotation-content relevance storage means for storing the annotation-content relevance calculated by the content selection means, the inter-annotation relevance calculating means, and the annotation-content relevance calculating means according to any one of claims 1 to 4;
vector creation means for creating, for each content, an annotation-content relevance vector whose values are the relevances between the content and each annotation of the annotation group added to that content and stored in the annotation storage means;
vector similarity calculation means for calculating the similarity between the annotation-content relevance vectors created for the respective contents; and
related content presenting means for acquiring and presenting other content related to a certain content based on the calculated similarity between the annotation-content relevance vectors.

An information relevance calculation method executed by a content management apparatus having annotation storage means storing annotations added to content, the method comprising:
a content selection step of selecting designated content and acquiring the annotation group added to that content from the annotation storage means;
an inter-annotation relevance calculation step of calculating an inter-annotation relevance that numerically expresses the strength of the relationship between annotations in the annotation group;
an annotation-content relevance calculation step of calculating, from the inter-annotation relevance, an annotation-content relevance that expresses the strength of the relationship between each annotation and the content; and
an annotation-content relevance output step of storing or outputting the calculated annotation-content relevance.

An information relevance calculation program for causing a computer to execute the information relevance calculation method according to claim 7.