JP3996470B2

JP3996470B2 - Visual information classification method, visual information classification apparatus, visual information classification program, and recording medium recording the program

Info

Publication number: JP3996470B2
Application number: JP2002242775A
Authority: JP
Inventors: 悦郎藤田; 伸二宮原; 伸治安部; 林　　泰仁
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-08-23
Filing date: 2002-08-23
Publication date: 2007-10-24
Anticipated expiration: 2022-08-23
Also published as: JP2004086262A

Description

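[0001]
[Technical field of the invention]
The present invention relates to a visual information classification method and apparatus that classify and arrange a large amount of information on a two-dimensional plane based on the content similarity between items of information, and to a visual information classification program used to implement the method and a recording medium on which the program is recorded. In particular, it relates to a method, apparatus, program, and recording medium that allow the classification and arrangement to be carried out in a short time even when the number of items to be arranged in two dimensions grows or when new items are added.
[0002]
[Prior art]
Techniques for visually classifying and arranging a large amount of content in two dimensions have been proposed, for example, in the following document.
[Reference 1] James A. Wise, et al. Visualizing the non-visual: Spatial analysis and interaction with information from text documents. Proc. of IEEE Information Visualization '95, pp. 51-58 (1995).
[0003]
That document deals with content in the form of text documents. It quantifies the concepts of each text document to extract a concept vector, applies multidimensional scaling to these vectors to obtain a two-dimensional arrangement of the content, and builds a browsing interface on that arrangement.
[0004]
[Problems to be solved by the invention]
When meta information is attached to content, using it to search for content is effective. When searching for web pages, for example, the directory services offered by many portal sites make it possible to narrow down the target pages efficiently.
[0005]
With this in mind, the present inventors, in the earlier-filed Japanese Patent Application Nos. 2001-352056 and 2002-55461, applied for an invention that calculates the similarity, that is, the distance, between items of information from the classification categories assigned to them in advance and from text such as summary descriptions, and that classifies and arranges the information on a two-dimensional plane using multidimensional scaling.
[0006]
A feature of that invention is that content is clustered and arranged in units of classification categories.
[0007]
However, whether content is arranged in two dimensions in the conventional way or according to Japanese Patent Application Nos. 2001-352056 and 2002-55461, the time required to perform multidimensional scaling increases as the number of items to be arranged grows.
[0008]
Moreover, in either case newly input information cannot be added incrementally to the two-dimensional plane, so whenever new information is added, the classification and arrangement must be redone from the beginning with the new information included.
[0009]
The present invention was made in view of these circumstances. Its object is to provide a new visual information classification technique in which, after a large amount of information has been classified and arranged on a two-dimensional plane based on content similarity, individual items can be classified and placed sequentially on that map, so that the arrangement can be completed in a short time even when the number of items grows or new information is added.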
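[0010]
[Means for solving the problems]
To achieve this object, the visual information classification apparatus of the present invention arranges a large amount of information on a two-dimensional plane based on the content similarity between items of information, where meta information with a hierarchical structure, such as classification categories, is attached to the information to be classified. The apparatus comprises:
(1) creating means for creating a reference map on which the items are arranged, by correcting the distance between the concept vectors of the items according to the degree of agreement of the hierarchical meta information attached to them in advance, so that the distance becomes smaller as the degree of agreement becomes larger, and by calculating the two-dimensional arrangement coordinates of each item from the corrected distances;
(2) selecting means for, when an unplaced item is given, identifying among the items that make up the reference map those having the same meta information as the unplaced item, and selecting from them the item whose concept vector is closest to that of the unplaced item;
(3) setting means for setting the initial coordinates of the unplaced item on the reference map; and
(4) calculating means for calculating the final arrangement coordinates of the unplaced item on the reference map. The calculating means initially sets, on the reference map, a region of a certain size centered on the arrangement coordinates of the item selected by the selecting means. It then repeats two operations: it updates the coordinates of the unplaced item so that they move toward the arrangement coordinates of each item inside the region, by an amount given by a monotonically decreasing function of the distance between that item's concept vector and the unplaced item's concept vector (the distance being corrected according to the degree of agreement of the meta information), and it then shrinks the region.
In this configuration, the calculating means may update the arrangement coordinates using the monotonically decreasing function multiplied by a coefficient that becomes smaller as the number of iterations increases.
The setting means may set as the initial coordinates the arrangement coordinates of the item selected by the selecting means, or coordinates in their vicinity.
The calculating means may set as the initial region a region that contains only items having the same meta information as the unplaced item.
The calculating means may calculate the arrangement coordinates of a newly given unplaced item while including, among the items that make up the reference map, the unplaced items whose coordinates have already been calculated.
[0016]
Each of the above processing means can be realized by a computer program, and this program can be provided recorded on a recording medium such as a semiconductor memory.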
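[0017]
In the visual information classification apparatus configured in this way, the distance between the concept vectors of the items is corrected according to the meta information attached to them, so that the distance becomes smaller as the degree of agreement of the meta information becomes larger, and a reference map is created from the corrected distances using multidimensional scaling or a similar method. When an unplaced item is then given, the apparatus identifies the items in the reference map that have the same meta information as the unplaced item, selects from them the item whose concept vector is closest to that of the unplaced item, and sets the initial coordinates of the unplaced item on the reference map.
Next, the apparatus initially sets, on the reference map, a region of a certain size centered on the arrangement coordinates of the selected item. It then repeatedly updates the coordinates of the unplaced item so that they move toward the arrangement coordinates of each item inside the region, by an amount given by a monotonically decreasing function of the corrected distance between the two concept vectors, and shrinks the region after each pass. The final coordinates obtained in this way place the item, which was not on the reference map, sequentially onto it.
[0018]
Because an unplaced item is expected to be placed near items that have the same meta information, the coordinates of the unplaced item may be calculated using only those items in the reference map that have the same meta information as the unplaced item, rather than all the items in the reference map.
[0020]
In addition, the arrangement coordinates of a newly given unplaced item may be calculated while including, among the items that make up the reference map, the unplaced items whose coordinates have already been calculated.
[0021]
Thus, according to the present invention, after a large amount of information has been classified and arranged on a two-dimensional plane based on content similarity, individual items can be placed on that map sequentially or additionally, so the arrangement can be completed in a short time even when the number of items grows or new information is added.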
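[0022]
[Embodiments of the invention]
The present invention is described in detail below according to an embodiment applied to the visual classification of content.
[0023]
FIG. 1 shows the configuration of a system that carries out a visual content classification method according to an embodiment of the present invention.
[0024]
The visual content classification system shown in FIG. 1 consists of a computer 10 and, connected to the computer 10 via a network 30, a content database (content DB) 20, a meta information database (meta information DB) 21, a concept vector database (concept vector DB) 22, and an arrangement coordinate database (arrangement coordinate DB) 23.
[0025]
The computer 10 consists of memory (RAM, ROM, magnetic disk, and the like), a CPU, a display unit 11, and an instruction input unit 12 such as a mouse and keyboard. It includes a reference map creation unit 40 and a content placement unit 41, both realized by software programs executed by the CPU.
[0026]
The content DB 20 stores the content to be processed together with text (a summary description) representing each item's content.
[0027]
The meta information DB 21 stores the classification category information assigned to each content item stored in the content DB 20 (in practice, the lowest-level classification category).
[0028]
This classification category information is assigned to each content item according to a content classification system given in advance. In this embodiment, the classification category information is assumed to have a hierarchical structure of depth N (N is a positive integer).
[0029]
FIG. 2 shows an example of a classification category system for classifying content. Under this system, each content item stored in the content DB 20 is assigned in advance an appropriate classification category from among the Lij (i, j = 1, 2, 3) shown in FIG. 2, and the assigned category information is stored in the meta information DB 21.
[0030]
The concept vector DB 22 stores the concept vector of each content item stored in the content DB 20, produced by the processing described below.
[0031]
The arrangement coordinate DB 23 stores the two-dimensional arrangement coordinates of each content item, produced by the processing described below.
[0032]
Within the visual content classification system configured in this way, the reference map creation unit 40 and the content placement unit 41 realize the present invention by executing the processing described below.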
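[0033]
[1] Processing of the reference map creation unit 40
The reference map creation unit 40 takes the content stored in the content DB 20 as its input, calculates the distance between content items for every pair of items, and calculates from those distances the two-dimensional arrangement coordinates of each item. It then creates a scatter plot image of the content, such as those shown in FIGS. 3 and 4, in which items in the same classification category are placed close together so as to form a group, and presents it to the user.
[0034]
The scatter plot image in FIG. 4 is an example of an enlarged display of the rectangular region 50 of the scatter plot image in FIG. 3. It can be displayed by specifying the rectangular region 50 and operating the zoom control knob 51 with a mouse or similar device.
[0035]
Because the scatter plot image created in this way is built from the content stored in the content DB 20, it is referred to below as the reference map.
[0036]
FIG. 5 shows an example of the processing flow executed by the reference map creation unit 40.
[0037]
As shown in the flow of FIG. 5, the reference map creation unit 40 first, in step 10, reads into memory the summary description of each content item stored in the content DB 20. In step 11 it reads into memory the classification category information assigned to each content item and stored in the meta information DB 21 (in practice, the lowest-level classification category).
[0038]
Next, in step 12, it calculates one or more concept vectors from each summary description and stores them in the concept vector DB 22 in association with the lowest-level classification category information that was read, as shown in FIG. 6.
[0039]
Although not shown in FIG. 6, link information indicating which content item each stored concept vector belongs to is also stored in the concept vector DB 22.
[0040]
A concept vector is expressed as a multidimensional real-valued vector. The method of calculating a concept vector from a summary description (the vector is given as a weight vector over a predetermined vocabulary) is described in detail in the following document and is not explained here.
[Reference 2] 熊本睦他，概念ベースの情報検索への適用−概念ベースを用いた検索の特性評価−，信学技報 AI98-63 (1999).
[0041]
Another method of calculating concept vectors has also been proposed: when a representative word of a classification category is input to the concept base, a concept vector is calculated from the vocabulary and explanatory text associated with that word. This method may be used instead. It is described in detail in Reference 2 and in the following document and is not explained here.
[Reference 3] 笠原要他，国語辞書を利用した日常語の類似性判別，情処論，Vol.38, No.7, pp.1272-1283 (1997).
[0042]
Next, in step 13, the concept vector of each content item stored in the concept vector DB 22 and the associated lowest-level classification category information are read into memory. In step 14, the distance between content items is calculated for every pair of items among the content to be displayed.
[0043]
"Every pair" here does not necessarily cover all the content stored in the content DB 20. When the content to be displayed has been narrowed down in advance, for example by a search condition, it means every pair that can be drawn from the content to be displayed.
[0044]
The calculation of the distance between content items is described in detail later with the processing flow of FIGS. 10 and 11.
[0045]
Next, in step 15, the calculated distances are used with multidimensional scaling to calculate the arrangement coordinates of each content item on the two-dimensional plane, and the coordinates are stored in the arrangement coordinate DB 23.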
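[0046]
Multidimensional scaling is an algorithm that compresses a high-dimensional vector space into a low-dimensional space, and it is solved as the minimization of the objective function shown below.
[0047]
[Equation 1]

[0048]
That is, the set of (x_a, y_a) (a = 1, 2, ..., n) that minimizes this objective function gives the two-dimensional arrangement coordinates of each content item a. In this objective function, d*_ab denotes the distance between content a and content b (the distance calculated by the processing flow of FIGS. 10 and 11 described later), d_ab denotes
d_ab = {(x_a - x_b)^2 + (y_a - y_b)^2}^(1/2),
and n denotes the total number of content items to be displayed.
[0049]
This minimization problem is solved with the steepest descent method, which is described in detail in the following document and is not explained here.
[Reference 4] J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401-409 (1969).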
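The objective function of Equation 1 is not reproduced in this text. As a minimal sketch, assuming it is the stress function of Reference 4 (Sammon), E = (1/c) Σ_{a<b} (d*_ab - d_ab)^2 / d*_ab with c = Σ_{a<b} d*_ab, the minimization could look as follows. The function names, the learning rate, and the use of plain gradient descent are illustrative assumptions, not the patent's stated procedure.

```python
import math
import random


def sammon_stress(target, coords):
    """Sammon stress between target distances d*_ab and 2-D distances d_ab."""
    n = len(coords)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    scale = sum(target[a][b] for a, b in pairs)
    error = sum(
        (target[a][b] - math.dist(coords[a], coords[b])) ** 2 / target[a][b]
        for a, b in pairs
    )
    return error / scale


def layout_2d(target, steps=500, rate=0.1, seed=0):
    """Plain gradient descent on the stress (assumes every d*_ab > 0 for a != b)."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    scale = sum(target[a][b] for a in range(n) for b in range(a + 1, n))
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a == b:
                    continue
                d = max(math.dist(coords[a], coords[b]), 1e-9)  # guard against d = 0
                g = -2.0 / scale * (target[a][b] - d) / (target[a][b] * d)
                grads[a][0] += g * (coords[a][0] - coords[b][0])
                grads[a][1] += g * (coords[a][1] - coords[b][1])
        for a in range(n):
            coords[a][0] -= rate * grads[a][0]
            coords[a][1] -= rate * grads[a][1]
    return coords
```

Here `target` plays the role of the corrected distance matrix d*_ab, and the returned list holds the (x_a, y_a) pairs. The cost of each step grows with the square of the number of items, which is the scaling problem the description sets out to avoid for newly added content.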
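[0050]
Next, in step 16, the two-dimensional coordinates of each content item stored in the arrangement coordinate DB 23 are read into memory and used to create the scatter plot image to be presented to the user. In step 17, the scatter plot image is output to the display unit 11 of the computer 10.
[0051]
As explained later with the processing flow of FIGS. 10 and 11, the scatter plot image created in this way does not define the distance between content items simply as the distance between concept vectors. It also takes into account the similarity between the classification categories into which the items are classified, so that the classification category information is incorporated into the result of multidimensional scaling.
[0052]
As a result, as shown in FIGS. 3 and 4, content items in the same classification category are placed close together so as to form a group in two dimensions.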
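[0053]
Next, the calculation of the distance between content items executed in step 14 of the flow of FIG. 5 is described.
[0054]
As illustrated in FIG. 7, the classification category information has a hierarchical structure of depth N. At the first level, each content item is classified into one of
Li1: first-level classification category
where i1 = 1, ..., M
(M is a positive integer). Content classified into category Li1 is classified at the second level into one of
Li1i2: second-level classification category whose parent category is Li1
where i2 = i2(i1) = 1, ..., Mi1
(Mi1 is a positive integer).
[0055]
Similarly, content classified into a category Li1i2i3....i(k-1) at level k-1 is classified at level k into one of
Li1i2i3....ik: level-k classification category whose parent category is Li1i2i3....i(k-1)
where ik = ik(i1, i2, ...., i(k-1)) = 1, ..., Mi1i2i3....i(k-1)
(Mi1i2i3....i(k-1) is a positive integer), and this continues up to k = N.
[0056]
The name of the level-N classification category Li1i2i3....iN is stored in the meta information DB 21 as the classification category information described above.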
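[0057]
To describe the distance calculation of step 14 of FIG. 5 concretely on the basis of the classification system shown in FIG. 7: because the first level of the classification categories consists of three classes, the visual content classification system uses as the first-level distance coefficient matrix one whose diagonal entries are a value A₁ smaller than 1 (A₁ < 1) and whose off-diagonal entries are a value B₁ of 1 or greater (B₁ ≥ 1), as shown for example in FIG. 8(a).
[0058]
In other words, the first-level distance coefficient matrix assigns A₁ (A₁ < 1) when the first-level categories of two content items are the same, and B₁ (B₁ ≥ 1) when they are not.
[0059]
Likewise, because the second level consists of three classes, the second-level distance coefficient matrix has diagonal entries A₂ smaller than 1 (A₂ < 1) and off-diagonal entries B₂ of 1 or greater (B₂ ≥ 1), as shown for example in FIG. 8(b).
[0060]
In other words, the second-level distance coefficient matrix assigns A₂ (A₂ < 1) when the second-level categories of two content items are the same, and B₂ (B₂ ≥ 1) when they are not.
[0061]
Likewise, because the third level consists of three classes, the third-level distance coefficient matrix has diagonal entries A₃ smaller than 1 (A₃ < 1) and off-diagonal entries B₃ of 1 or greater (B₃ ≥ 1), as shown for example in FIG. 8(c).
[0062]
In other words, the third-level distance coefficient matrix assigns A₃ (A₃ < 1) when the third-level categories of two content items are the same, and B₃ (B₃ ≥ 1) when they are not.
[0063]
Using the correction coefficient w determined from these distance coefficient matrices, the visual content classification system corrects the distance "dist(v_i, v_j)" between content c_i and content c_j, calculated from the concept vector v_i of c_i and the concept vector v_j of c_j, according to the expression "w × dist(v_i, v_j)". In this way the classification category information is reflected in the distance between concept vectors.
[0064]
As shown for example in FIG. 9, when the classification categories of the two content items agree down to the first and second levels, the correction coefficient w is calculated, depending on whether the third level also agrees, as
w = A₁ × A₂ × A₃ (agreement down to the third level)
w = A₁ × A₂ × B₃ (third level differs).
When the categories agree at the first level but differ at the second level, it is calculated as
w = A₁ × B₂,
and when the categories differ at the first level, it is calculated as
w = B₁.
[0065]
With the correction coefficient w calculated in this way, the deeper the level down to which the classification categories of two content items agree, the smaller the distance between them calculated by the expression "w × dist(v_i, v_j)" becomes.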
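The rule for w in FIG. 9 can be sketched as follows. This is an illustrative sketch: the function names, the representation of a category as a tuple of per-level labels, and the numeric values used for A_k and B_k are assumptions, since the text only requires A_k < 1 and B_k ≥ 1.

```python
import math


def correction_coefficient(path_i, path_j, same, diff):
    """Return w for two category paths.

    same[k] is A_(k+1) (< 1), used while the levels agree;
    diff[k] is B_(k+1) (>= 1), used at the first level that differs.
    """
    w = 1.0
    for level, (ci, cj) in enumerate(zip(path_i, path_j)):
        if ci == cj:
            w *= same[level]
        else:
            w *= diff[level]
            break  # deeper levels are ignored once the paths diverge
    return w


def corrected_distance(v_i, v_j, path_i, path_j, same, diff):
    """w x dist(v_i, v_j), with dist taken as the Euclidean distance."""
    return correction_coefficient(path_i, path_j, same, diff) * math.dist(v_i, v_j)
```

With, say, `same = [0.5, 0.6, 0.7]` and `diff = [1.0, 1.2, 1.5]`, two items in the same lowest-level category get w = 0.5 × 0.6 × 0.7, while items that already differ at the first level get w = 1.0, so within-category distances shrink relative to cross-category ones.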
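[0066]
FIGS. 10 and 11 show the detailed processing flow of the distance calculation executed in step 14 of the flow of FIG. 5.
[0067]
In executing this flow, the Euclidean distance "dist(v_i, v_j)" between content c_i and content c_j is calculated first.
[0068]
Next, the distance matrices representing the distances between classification categories (derived from the distance coefficient matrices described above) are constructed as variables in memory, as follows.
[0069]
First, the distance matrix (wpq) for the first-level classification categories Li1 (i1 = 1, ...., M) is constructed. (wpq) is a non-negative symmetric matrix of order M.
[0070]
Next, for every Li1, the distance matrix (w[Li1]pq) for the classification categories Li1i2 directly below Li1 (i2 = i2(i1) = 1, ...., Mi1) is constructed as
w[Li1]pq := wi1i1 * s[Li1]pq
wi1i1: the (i1, i1) entry of (wpq) above
(s[Li1]pq): a non-negative symmetric matrix of order Mi1.
(w[Li1]pq) is a non-negative symmetric matrix of order Mi1.
[0071]
Next, for every Li1i2, the distance matrix (w[Li1i2]pq) for the classification categories Li1i2i3 directly below Li1i2 (i3 = i3(i1, i2) = 1, ...., Mi1i2) is constructed as
w[Li1i2]pq := w[Li1]i2i2 * s[Li1i2]pq
w[Li1]i2i2: the (i2, i2) entry of (w[Li1]pq) above
(s[Li1i2]pq): a non-negative symmetric matrix of order Mi1i2.
(w[Li1i2]pq) is a non-negative symmetric matrix of order Mi1i2.
[0072]
In the same way, for every Li1i2....ik at level k, the distance matrix (w[Li1i2....ik]pq) for the classification categories Li1i2....iki(k+1) directly below Li1i2....ik (i(k+1) = i(k+1)(i1, i2, ...., ik) = 1, ...., Mi1i2....ik) is constructed as
w[Li1i2..ik]pq := w[Li1i2..i(k-1)]ikik * s[Li1i2..ik]pq
w[Li1i2..i(k-1)]ikik: the (ik, ik) entry of (w[Li1i2..i(k-1)]pq)
(s[Li1i2..ik]pq): a non-negative symmetric matrix of order Mi1i2..ik.
(w[Li1i2....ik]pq) is defined as a non-negative symmetric matrix of order Mi1i2....ik.
[0073]
These distance matrices (w[Li1i2....ik]pq) are constructed up to k = N-1.
[0074]
Then the entries of (wpq) and of (s[Li1]pq), (s[Li1i2]pq), ..., (s[Li1i2....i(N-2)]pq), (s[Li1i2....i(N-1)]pq) are set to an arbitrary value smaller than 1 on the diagonal and to 1 or an arbitrary value greater than 1 off the diagonal. This initializes all the variables of the distance matrices (wpq), (w[Li1]pq), (w[Li1i2]pq), ..., (w[Li1i2....i(N-2)]pq), (w[Li1i2....i(N-1)]pq).
[0075]
To calculate the distance between content c_i and content c_j, these distance matrices are used to compute a distance dist*(v_i, v_j) that takes the distance between classification categories into account, according to
dist*(v_i, v_j) = w * dist(v_i, v_j),
and the result is recorded in memory.
[0076]
Here, when the level-N classification category of content c_i is Li1i2....iN and that of content c_j is Lj1j2....jN, w is given by
(1) w = wi1j1  if i1 != j1
(2) w = w[Li1....i(k-1)]ikjk  if i1 = j1, ...., i(k-1) = j(k-1), ik != jk, where 2 <= k <= N
(3) w = w[Li1....i(N-1)]iNjN  if i1 = j1, ..., iN = jN.
[0077]
The distance dist*(v_i, v_j) is calculated for every pair of content items, and a distance matrix for the data read above is constructed. This is the distance matrix used in the multidimensional scaling described earlier.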
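The nested construction of the distance matrices and rules (1) to (3) can be sketched as below. The dictionary keyed by category prefix, the 0-based indices, and the function names are assumptions made for illustration; the text itself only fixes the recurrence w[L...ik]pq := w[L...i(k-1)]ikik * s[L...ik]pq.

```python
def build_distance_matrices(top, s):
    """Build every w matrix from the level-1 matrix and the s matrices.

    top: level-1 matrix (wpq) as a list of rows.
    s:   dict mapping a category prefix such as (i1,) or (i1, i2) to the
         s matrix for the categories directly below that prefix.
    Returns a dict mapping each prefix (and the empty prefix) to its w matrix.
    """
    w = {(): top}
    for prefix in sorted(s, key=len):  # parents are built before their children
        parent, last = prefix[:-1], prefix[-1]
        scale = w[parent][last][last]  # diagonal entry of the parent matrix
        w[prefix] = [[scale * x for x in row] for row in s[prefix]]
    return w


def lookup_w(w, path_i, path_j):
    """Apply rules (1)-(3): use the matrix at the first level where the paths differ,
    or the deepest matrix when the two paths are identical."""
    depth = len(path_i)
    for k in range(depth):
        if path_i[k] != path_j[k] or k == depth - 1:
            return w[tuple(path_i[:k])][path_i[k]][path_j[k]]
```

For a two-level hierarchy with `top = [[0.5, 1.0], [1.0, 0.5]]` and `s[(0,)] = [[0.8, 1.2], [1.2, 0.8]]`, items in the same lowest-level category under the first top-level class get w = 0.5 × 0.8, and siblings under that class get w = 0.5 × 1.2.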
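[0078]
Next, the distance calculation executed in step 14 of the flow of FIG. 5 is described concretely, following the processing flow of FIGS. 10 and 11.
[0079]
When the visual content classification system calculates the distance between content c_i and content c_j for the multidimensional scaling described above, it first, in step 20, calculates the Euclidean distance "dist(v_i, v_j)" between the concept vector v_i of c_i and the concept vector v_j of c_j, as shown in FIGS. 10 and 11.
[0080]
Next, in step 21, the variable k indicating the level of the classification category is set to "1", the first level.
[0081]
Next, in step 22, the first-level category value of the classification category Lp of content c_i and the first-level category value of the classification category Lq of content c_j are identified.
[0082]
Next, in step 23, the entry of the first-level distance coefficient matrix indicated by those category values is identified. That is, by referring to the distance coefficient matrix prepared for the first level, defined as a matrix such as the one in FIG. 8(a), the entry indicated by the identified category values is obtained (A₁ or B₁ in the example of FIG. 8(a)).
[0083]
Next, in step 24, that entry is assigned to the variable w. In step 25, it is determined whether the two category values identified in step 22 agree. If they do not, the flow proceeds to step 26, where the distance between content c_i and content c_j is calculated by multiplying w by the distance "dist(v_i, v_j)" calculated in step 20, and the processing ends.
[0084]
If step 25 determines that the two category values identified in step 22 agree, the flow proceeds to step 27, where k is incremented by one, and then to step 28, where it is determined whether k has become greater than the depth N of the classification categories.
[0085]
If k is not greater than N, the flow proceeds to step 30, where the level-k category value of the classification category Lp of content c_i and the level-k category value of the classification category Lq of content c_j are identified.
[0086]
Next, in step 31, the entry of the level-k distance coefficient matrix indicated by those category values is identified. When k = 2, for example, the distance coefficient matrix prepared for the second level, defined as a matrix such as the one in FIG. 8(b), is referred to and the entry indicated by the identified category values is obtained (A₂ or B₂ in the example of FIG. 8(b)).
[0087]
Next, in step 32, that entry is multiplied by w and the product is assigned to w as its new value. In step 33, it is determined whether the two category values identified in step 30 agree. If they do not, the flow proceeds to step 34, where the distance between content c_i and content c_j is calculated by multiplying w by the distance "dist(v_i, v_j)" calculated in step 20, and the processing ends.
[0088]
If step 33 determines that the two category values identified in step 30 agree, the flow returns to step 27 in order to process the classification category one level down.
[0089]
Steps 27 to 33 are repeated in this way. When step 28 determines that k has become greater than the depth N, the flow proceeds to step 29, where the distance between content c_i and content c_j is calculated by multiplying w by the distance "dist(v_i, v_j)" calculated in step 20, and the processing ends.
[0090]
In this way, as shown for example in FIG. 9, when the classification categories of the two content items agree down to the first and second levels, the visual content classification system calculates the correction coefficient w, depending on whether the third level also agrees, as
w = A₁ × A₂ × A₃ (agreement down to the third level)
w = A₁ × A₂ × B₃ (third level differs).
When the categories agree at the first level but differ at the second level, it calculates
w = A₁ × B₂,
and when the categories differ at the first level, it calculates
w = B₁.
It then multiplies this correction coefficient w by the Euclidean distance "dist(v_i, v_j)" between the concept vectors to obtain the distance between content c_i and content c_j.
[0091]
With the correction coefficient w calculated in this way, the deeper the level down to which the classification categories of two content items agree, the smaller the distance between them calculated by the expression "w × dist(v_i, v_j)" becomes.
[0092]
As described above, the reference map creation unit 40 takes the content stored in the content DB 20 as its input, calculates the distance between content items for every pair, calculates from those distances the two-dimensional arrangement coordinates of each item, creates a scatter plot image (reference map) such as those in FIGS. 3 and 4, in which items in the same classification category are placed close together so as to form a group, and presents it to the user.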
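[0093]
[2] Processing of the content placement unit 41
After the reference map creation unit 40 has created the reference map (the scatter plot image of the content), when a content item that is not on the reference map is given, the content placement unit 41 calculates the item's arrangement coordinates on the reference map, places it there, and registers the information about the item in the content DB 20, meta information DB 21, concept vector DB 22, and arrangement coordinate DB 23.
[0094]
FIGS. 12 and 13 show an example of the processing flow executed by the content placement unit 41.
[0095]
The processing executed by the content placement unit 41 is described in detail below, following this flow.
[0096]
When a request is issued to place a content item that is not on the reference map, the content placement unit 41 first, in step 40, obtains the classification category information of the item to be placed (in practice, its lowest-level classification category), as shown in FIGS. 12 and 13.
[0097]
Next, in step 41, it obtains the summary description of the item to be placed and, by the same processing as in the reference map creation unit 40, calculates from it the item's concept vector (denoted V below).
[0098]
Next, in step 42, it reads from the arrangement coordinate DB 23 and the concept vector DB 22 the two-dimensional coordinates {xi} of each content item on the reference map, that is, of each concept vector {Xi}, together with the classification category information (lowest-level classification category) to which each concept vector {Xi} belongs.
[0099]
Next, in step 43, among the concept vectors that were read, it identifies the concept vector under the lowest-level classification category of the item to be placed that is closest to the item's concept vector V (denoted Y below).
[0100]
Because the vectors compared here belong to the same classification category, the distance calculated at this point does not need the correction described above.
[0101]
Next, in step 44, the initial value of the neighborhood region Ny(t), centered on the two-dimensional coordinates y of the concept vector Y found in step 43, is set to a region that contains the two-dimensional coordinates {xi} of all the concept vectors {Xi}. As the following description shows, the variable t represents the number of iterations.
[0102]
Next, in step 45, suitable two-dimensional coordinates are set as the initial value of v(t), the two-dimensional coordinates obtained when the concept vector V of the item to be placed is projected onto the reference map.
[0103]
It is preferable, though not required, to set the initial value of v(t) to the two-dimensional coordinates y of the concept vector Y or to coordinates in their vicinity.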
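Steps 43 and 45 amount to a nearest-neighbor search restricted to one category, followed by reuse of that neighbor's map position. A minimal sketch follows; the dictionary field names and the function name are hypothetical, and the uncorrected Euclidean distance is used as paragraph [0100] allows.

```python
import math


def select_anchor(V, category, items):
    """Step 43: among map items in the same lowest-level category as the new item,
    return the one whose concept vector is closest to V.

    items: list of dicts with keys 'vector', 'category', and 'xy'.
    """
    same_category = [it for it in items if it["category"] == category]
    if not same_category:
        raise ValueError("no item on the reference map shares this category")
    return min(same_category, key=lambda it: math.dist(V, it["vector"]))


def initial_coordinates(anchor):
    """Step 45 (preferred choice): start the new item at the anchor's position y."""
    return tuple(anchor["xy"])
```

Starting from y rather than from an arbitrary point means the iteration that follows begins inside the cluster where the new item is expected to end up.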
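[0104]
Next, in step 46, it is determined whether the two-dimensional coordinates {xi} of all the concept vectors {Xi} belonging to the neighborhood region Ny(t) have been processed. If not, the flow proceeds to step 47, where one unprocessed two-dimensional coordinate {xi} (one unprocessed concept vector {Xi}) is selected.
[0105]
Next, in step 48, using the concept vector V of the item to be placed and its two-dimensional coordinates v(t), together with the selected concept vector {Xi} and its two-dimensional coordinates {xi}, the two-dimensional coordinates v(t) of the item to be placed are corrected according to
v2(t) = v(t) + a(t) * h(d*(V, Xi)) * [xi - v(t)]  ... (i)
v(t) = v2(t)  ... (ii)
and the flow returns to step 46.
[0106]
That is, v2(t) is first calculated from expression (i) using V, v(t), the selected {Xi}, and its {xi}, and v2(t) then replaces v(t). The two-dimensional coordinates v(t) of the item to be placed are corrected in this way before the flow returns to step 46.
[0107]
Here, a(t) is a positive-valued function that decreases monotonically with t, and d*(V, Xi) is the distance between the concept vector V and the concept vector {Xi} selected in step 47. This distance is the one corrected by the classification category information according to the algorithm described above, although the uncorrected distance may also be used.
[0108]
h(·) is a positive-valued, monotonically decreasing function that does not depend on t. Its value becomes smaller as the distance between the concept vector V and the concept vector {Xi} becomes larger.
[0109]
Expression (i) corrects the two-dimensional coordinates v(t) of the concept vector V of the item to be placed by moving them toward the two-dimensional coordinates {xi} of the concept vector {Xi} selected in step 47. When the distance between V and {Xi} is large, the value of h(·) is small, so the amount of movement is small.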
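A single application of expressions (i) and (ii) can be sketched as follows. The text only requires h to be positive and monotonically decreasing; the choice h(d) = exp(-d) and the function name are assumptions for illustration.

```python
import math


def update_step(v_xy, x_xy, d_star, a_t, h=lambda d: math.exp(-d)):
    """One application of (i)/(ii): move v(t) toward xi by the fraction a(t) * h(d*).

    v_xy:   current 2-D coordinates v(t) of the item being placed
    x_xy:   2-D coordinates xi of the selected map item
    d_star: distance d*(V, Xi) between the two concept vectors
    a_t:    current value of the gain a(t)
    """
    gain = a_t * h(d_star)
    return (
        v_xy[0] + gain * (x_xy[0] - v_xy[0]),
        v_xy[1] + gain * (x_xy[1] - v_xy[1]),
    )
```

As long as a(t) * h(d*) stays between 0 and 1, each step is a convex combination of v(t) and xi, so the new item is pulled strongly toward conceptually close items and only slightly toward distant ones.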
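[0110]
Steps 46 to 48 are repeated in this way. In terms of FIG. 14: a concept vector {Xi} with two-dimensional coordinates {xi} is selected, as indicated by (1) in the figure; v2(t) is calculated from it, as indicated by (2); and v(t) is corrected by taking that v2(t) as the new v(t), as indicated by (3). Another concept vector {Xi} with coordinates {xi} is then selected, as indicated by (4); v2(t) is calculated from it, as indicated by (5); and v(t) is again corrected by taking that v2(t) as the new v(t), as indicated by (6). This processing is repeated.
[0111]
While steps 46 to 48 are being repeated, when step 46 determines that the two-dimensional coordinates {xi} of all the concept vectors {Xi} belonging to the neighborhood region Ny(t) have been processed, the flow proceeds to step 49, where the two-dimensional coordinates v(t) of the item to be placed are updated to v(t+1).
[0112]
Next, in step 50, the neighborhood region Ny(t), centered on the two-dimensional coordinates y of the concept vector Y and set in step 44, is updated to Ny(t+1) by reducing its size, for example according to a prescribed reduction ratio.
[0113]
Next, in step 51, it is determined whether the reduced neighborhood region Ny(t+1) now contains only the two-dimensional coordinates y of the concept vector Y. If not, the flow proceeds to step 54, where the coefficient a(t) of expression (i) is updated to a smaller value a(t+1), for example according to a prescribed reduction ratio. In step 55, v(t+1) becomes the new v(t), Ny(t+1) the new Ny(t), and a(t+1) the new a(t), and the flow returns to step 46.
[0114]
If step 51 determines that the neighborhood region Ny(t+1) contains only the two-dimensional coordinates y of the concept vector Y, the flow proceeds to step 52, where the v(t+1) updated in step 49 is fixed as the position of the item to be placed. In step 53, the information about the item is registered in the content DB 20, meta information DB 21, concept vector DB 22, and arrangement coordinate DB 23, and the processing ends.
[0115]
Because the position v(t+1) determined in this way preserves the distance structure between the concept vectors of the content, the item can be placed on the reference map without disturbing the layout of the reference map created by the reference map creation unit 40.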
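Steps 44 to 55 can be put together as in the sketch below. Several details are assumptions made for illustration: the neighborhood Ny(t) is modeled as a disk around y, the region and the gain a(t) shrink by fixed ratios, h(d) = exp(-d), and the uncorrected Euclidean distance is the default for d* (which paragraph [0107] permits). An iteration cap is added so the loop ends even if another item coincides with y.

```python
import math


def place_item(V, items, y_index, a0=0.5, region_shrink=0.7, gain_decay=0.9,
               h=lambda d: math.exp(-d), dist=math.dist, max_iters=1000):
    """Compute map coordinates for a new item with concept vector V.

    items:   list of (concept_vector, xy) pairs already on the reference map
    y_index: index of the anchor item Y chosen in step 43
    """
    y = items[y_index][1]
    radius = max(math.dist(y, xy) for _, xy in items)  # step 44: region covers every item
    v = [y[0], y[1]]                                   # step 45: start at y
    a = a0
    for _ in range(max_iters):
        for X, xy in items:                            # steps 46-48: expression (i)
            if math.dist(y, xy) <= radius:
                gain = a * h(dist(V, X))
                v[0] += gain * (xy[0] - v[0])
                v[1] += gain * (xy[1] - v[1])
        radius *= region_shrink                        # step 50: shrink Ny(t)
        inside = sum(1 for _, xy in items if math.dist(y, xy) <= radius)
        if inside <= 1:                                # step 51: only y remains
            break
        a *= gain_decay                                # step 54: shrink a(t)
    return (v[0], v[1])
```

Each call touches at most every map item once per pass and needs only a handful of passes, so placing one new item avoids recomputing the full pairwise layout.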
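[0116]
In this way, after the reference map creation unit 40 has created the reference map (the scatter plot image of the content), when a content item that is not on the reference map is given, the content placement unit 41 calculates the item's arrangement coordinates on the reference map and places it there (the triangle mark in FIG. 15), and registers the information about the item in the content DB 20, meta information DB 21, concept vector DB 22, and arrangement coordinate DB 23.
[0117]
In the flow of FIGS. 12 and 13, the two-dimensional coordinates v(t) of the item to be placed are corrected one step at a time, as in expression (i), while the two-dimensional coordinates {xi} belonging to the neighborhood region Ny(t) are selected in turn. Instead of expression (i), v(t) may be corrected in a single step as
v(t) ← v(t) + Σ a(t) * h(d*(V, Xi)) * [xi - v(t)]
where Σ is the sum over all the two-dimensional coordinates {xi}.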
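The single-step variant sums the contributions of all items in the region before moving v(t). A sketch under the same assumptions as before (h(d) = exp(-d) and the function name are illustrative):

```python
import math


def batch_update(v_xy, neighbors, a_t, h=lambda d: math.exp(-d)):
    """Apply v(t) <- v(t) + sum_i a(t) * h(d*(V, Xi)) * [xi - v(t)] in one step.

    neighbors: list of (xi, d_star) pairs for the items inside Ny(t), where xi is
               the item's 2-D position and d_star its distance d*(V, Xi).
    """
    dx = sum(a_t * h(d) * (x[0] - v_xy[0]) for x, d in neighbors)
    dy = sum(a_t * h(d) * (x[1] - v_xy[1]) for x, d in neighbors)
    return (v_xy[0] + dx, v_xy[1] + dy)
```

Unlike the sequential form, every term here is evaluated at the same v(t), so the result does not depend on the order of the items. When the summed gains exceed 1 the step can overshoot, so a(t) may need to be scaled down as the region contains more items.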
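[0118]
In the flow of FIGS. 12 and 13, all the concept vectors {Xi} on the reference map are used to correct the two-dimensional coordinates v(t) of the concept vector V of the item to be placed. However, the concept vectors {Xi} under the item's lowest-level classification category are the closest to V and therefore have the greatest influence. The correction may therefore use only the concept vectors {Xi} under the lowest-level classification category of the item to be placed. This reduces the amount of computation and speeds up the processing.
[0119]
Although not described in the flow of FIGS. 12 and 13, when items to be placed are given one after another, the coordinates of a newly given item may be determined using only the information about the content from which the reference map creation unit 40 built the reference map, without the information about the items placed so far. Alternatively, the information about the items placed so far may be included with that content when the coordinates of the newly given item are determined.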
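[0120]
The present invention has been described according to the illustrated embodiment, but it is not limited to that embodiment. For example, the embodiment creates the reference map by clustering and arranging content in units of classification categories and then adds unplaced content to it, but the present invention applies equally when the reference map is created without using classification categories as the unit.
[0121]
The embodiment creates the reference map by reflecting classification categories in the calculation of the distance between concept vectors, but the reference map may instead be created by reflecting other meta information in that calculation.
[0122]
The embodiment uses the classification of content as a concrete example, but the application of the present invention is not limited to the classification of content.
[0123]
[Effects of the invention]
As described above, according to the present invention, after a large amount of information has been classified and arranged on a two-dimensional plane based on content similarity, individual items can be placed on that map sequentially or additionally, so the arrangement can be completed in a short time even when the number of items grows or new information is added.
[0124]
Compared with the conventional approach of arranging all content at once by multidimensional scaling, this shortens the processing time and makes it possible to visually classify digital content, such as Internet content, that is added and updated daily.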
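[Brief description of the drawings]
[FIG. 1] An example embodiment of the present invention.
[FIG. 2] A diagram showing an example of a classification category system.
[FIG. 3] A diagram showing an example of a scatter plot image.
[FIG. 4] A diagram showing an example of a scatter plot image.
[FIG. 5] An example of the processing flow executed by the reference map creation unit.
[FIG. 6] An explanatory diagram of the concept vector DB.
[FIG. 7] A diagram showing an example of a classification category system.
[FIG. 8] An explanatory diagram of the distance coefficient matrices.
[FIG. 9] An explanatory diagram of the correction coefficient.
[FIG. 10] An example of the processing flow executed by the reference map creation unit.
[FIG. 11] An example of the processing flow executed by the reference map creation unit.
[FIG. 12] An example of the processing flow executed by the content placement unit.
[FIG. 13] An example of the processing flow executed by the content placement unit.
[FIG. 14] An explanatory diagram of the processing executed by the content placement unit.
[FIG. 15] An explanatory diagram of the processing executed by the content placement unit.
[Explanation of reference numerals]
10 Computer
11 Display unit
12 Instruction input unit
20 Content DB
21 Meta information DB
22 Concept vector DB
23 Arrangement coordinate DB
30 Network
40 Reference map creation unit
41 Content placement unit
[0001]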
BACKGROUND OF THE INVENTION
The present invention relates to a visual information classification method and apparatus that classify and arrange a large amount of information on a two-dimensional plane based on the content similarity between items of information, and to a visual information classification program used to implement the method and a recording medium on which the program is recorded. In particular, it relates to a method, apparatus, program, and recording medium that allow the classification and arrangement to be carried out in a short time even when the number of items to be arranged in two dimensions grows or when new items are added.
[0002]
[Prior art]
Conventionally, techniques for visually classifying and arranging a large amount of content in two dimensions have been proposed in, for example, the following documents.
[Reference 1] James A. Wise, et al. Visualizing the non-visual: Spatial analysis and interaction with information from text documents. Proc. of IEEE Information Visualization '95, pp. 51-58 (1995).
[0003]
That document deals with content in the form of text documents. It quantifies the concepts of each text document to extract a concept vector, applies multidimensional scaling to these vectors to obtain a two-dimensional arrangement of the content, and builds a browsing interface on that arrangement.
[0004]
[Problems to be solved by the invention]
When meta information is given to content, it is effective to use it for content search. For example, in the case of searching for a web page, a target web page can be efficiently narrowed down by using a directory service provided by many portal sites.
[0005]
With this in mind, the present inventors, in the earlier-filed Japanese Patent Application Nos. 2001-352056 and 2002-55461, applied for an invention that calculates the similarity, that is, the distance, between items of information from the classification categories assigned to them in advance and from text such as summary descriptions, and that classifies and arranges the information on a two-dimensional plane using multidimensional scaling.
[0006]
A feature of that invention is that content is clustered and arranged in units of classification categories.
[0007]
However, whether content is arranged in two dimensions in the conventional way or according to Japanese Patent Application Nos. 2001-352056 and 2002-55461, the time required to perform multidimensional scaling increases as the number of items to be arranged grows.
[0008]
Moreover, in either case newly input information cannot be added incrementally to the two-dimensional plane, so whenever new information is added, the classification and arrangement must be redone from the beginning with the new information included.
[0009]
The present invention was made in view of these circumstances. Its object is to provide a new visual information classification technique in which, after a large amount of information has been classified and arranged on a two-dimensional plane based on content similarity, individual items can be classified and placed sequentially on that map, so that the arrangement can be completed in a short time even when the number of items grows or new information is added.
[0010]
[Means for Solving the Problems]
To achieve this object, the visual information classification apparatus of the present invention arranges a large amount of information on a two-dimensional plane based on the content similarity between items of information, where meta information with a hierarchical structure, such as classification categories, is attached to the information to be classified. The apparatus comprises:
(1) creating means for creating a reference map on which the items are arranged, by correcting the distance between the concept vectors of the items according to the degree of agreement of the hierarchical meta information attached to them in advance, so that the distance becomes smaller as the degree of agreement becomes larger, and by calculating the two-dimensional arrangement coordinates of each item from the corrected distances;
(2) selecting means for, when an unplaced item is given, identifying among the items that make up the reference map those having the same meta information as the unplaced item, and selecting from them the item whose concept vector is closest to that of the unplaced item;
(3) setting means for setting the initial coordinates of the unplaced item on the reference map; and
(4) calculating means for calculating the final arrangement coordinates of the unplaced item on the reference map. The calculating means initially sets, on the reference map, a region of a certain size centered on the arrangement coordinates of the item selected by the selecting means. It then repeats two operations: it updates the coordinates of the unplaced item so that they move toward the arrangement coordinates of each item inside the region, by an amount given by a monotonically decreasing function of the distance between that item's concept vector and the unplaced item's concept vector (the distance being corrected according to the degree of agreement of the meta information), and it then shrinks the region.
In this configuration, the calculating means may update the arrangement coordinates using the monotonically decreasing function multiplied by a coefficient that becomes smaller as the number of iterations increases.
The setting means may set as the initial coordinates the arrangement coordinates of the item selected by the selecting means, or coordinates in their vicinity.
The calculating means may set as the initial region a region that contains only items having the same meta information as the unplaced item.
The calculating means may calculate the arrangement coordinates of a newly given unplaced item while including, among the items that make up the reference map, the unplaced items whose coordinates have already been calculated.
[0016]
Each of the above processing means can be realized by a computer program, and the computer program can be provided by being recorded on a recording medium such as a semiconductor memory.
[0017]
In the visual information classification apparatus configured in this way, the distance between the concept vectors of the items is corrected according to the meta information attached to them, so that the distance becomes smaller as the degree of agreement of the meta information becomes larger, and a reference map is created from the corrected distances using multidimensional scaling or a similar method. When an unplaced item is then given, the apparatus identifies the items in the reference map that have the same meta information as the unplaced item, selects from them the item whose concept vector is closest to that of the unplaced item, and sets the initial coordinates of the unplaced item on the reference map.
Next, the apparatus initially sets, on the reference map, a region of a certain size centered on the arrangement coordinates of the selected item. It then repeatedly updates the coordinates of the unplaced item so that they move toward the arrangement coordinates of each item inside the region, by an amount given by a monotonically decreasing function of the corrected distance between the two concept vectors, and shrinks the region after each pass. The final coordinates obtained in this way place the item, which was not on the reference map, sequentially onto it.
[0018]
Because an unplaced item is expected to be placed near items that have the same meta information, the coordinates of the unplaced item may be calculated using only those items in the reference map that have the same meta information as the unplaced item, rather than all the items in the reference map.
[0020]
Then, in realizing this classification and arrangement, the arrangement coordinates on the reference map of the newly provided unarrangement information are calculated in such a manner that the unarrangement information for which the arrangement coordinates have been calculated is included in the information constituting the reference map. May be processed.
[0021]
Thus, according to the present invention, after a large amount of information is classified and arranged on a two-dimensional plane based on the content similarity between the information, each information is sequentially or additionally added to the classification and arrangement map. Therefore, even when the number of information to be two-dimensionally arranged increases or when new information is added, the classification and arrangement can be performed in a short time.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described in detail according to an embodiment applied to visual classification of contents.
[0023]
FIG. 1 is a diagram showing a configuration of a system that implements a visual content classification method according to an embodiment of the present invention.
[0024]
The visual content classification system shown in FIG. 1 consists of a computer 10 and, connected to the computer 10 via a network 30, a content database (content DB) 20, a meta information database (meta information DB) 21, a concept vector database (concept vector DB) 22, and an arrangement coordinate database (arrangement coordinate DB) 23.
[0025]
The computer 10 consists of memory (RAM, ROM, magnetic disk, and the like), a CPU, a display unit 11, and an instruction input unit 12 such as a mouse and keyboard. It includes a reference map creation unit 40 and a content placement unit 41, both realized by software programs executed by the CPU.
[0026]
The content DB 20 stores content to be processed and text (summary explanation) representing the content.
[0027]
Further, the meta information DB 21 stores information on classification categories assigned to the respective contents stored in the content DB 20 (actually information on the classification categories in the lowest layer).
[0028]
This classification category information is given for each content in accordance with a content classification system given in advance. In this example embodiment, it is assumed that the classification category information has a hierarchical structure having a depth N (N is a positive integer).
[0029]
FIG. 2 shows an example of a classification category system for classifying content. Under this classification system, each content stored in the content DB 20 is assigned in advance one appropriate classification category among Lij (i, j = 1, 2, 3) shown in the figure, and information on the assigned classification category is stored in the meta information DB 21.
[0030]
The concept vector DB 22 stores concept vectors related to each content stored in the content DB 20 by processing described below.
[0031]
Further, the two-dimensional arrangement coordinates of each content are stored in the arrangement coordinate DB 23 by the process described below.
[0032]
The reference map creation unit 40 and the content placement unit 41 operate to realize the present invention by executing the processing described below under the visual content classification system configured as described above.
[0033]
[1] Processing of reference map creation unit 40
The reference map creation unit 40 takes the contents stored in the content DB 20 as the processing target, calculates the distance between contents for every combination of two contents, calculates the two-dimensional arrangement coordinates of each content based on the calculated distances, creates a scatter diagram image of the contents as shown in FIG. 3 and FIG. 4, in which contents included in the same classification category are arranged close to each other so as to form a group in two dimensions, and presents it to the user.
[0034]
Here, the scatter diagram image shown in FIG. 4 is an example in which the rectangular region 50 of the scatter diagram image shown in FIG. 3 is enlarged. It can be displayed by designating the rectangular region 50 and operating the enlargement/reduction operation knob 51 with a mouse or the like.
[0035]
Since the scatter diagram image of the content created in this way is created using the content stored in the content DB 20, it is hereinafter referred to as a reference map.
[0036]
FIG. 5 illustrates an example of a processing flow executed by the reference map creation unit 40.
[0037]
As shown in the processing flow of FIG. 5, the reference map creation unit 40 first, in step 10, reads the summary description of each content stored in the content DB 20 into the memory, and then, in step 11, reads into the memory the classification category information assigned to each content stored in the meta information DB 21 (actually, information on the classification category of the lowest layer).
[0038]
Subsequently, in step 12, one or more concept vectors are calculated from the read summary description and, as shown in FIG. 6, stored in the concept vector DB 22 in association with the read information on the lowest-layer classification category.
[0039]
Here, although not shown in FIG. 6, link information indicating what content the concept vector stored in the concept vector DB 22 belongs to is also stored in the concept vector DB 22.
[0040]
This concept vector is represented as a multidimensional real-valued vector. Note that a method for calculating a concept vector (given as a weight vector related to a predetermined vocabulary) from the summary explanatory text is described in detail in the following document, and will not be described here.
[Reference 2] Kumamoto, et al., Application to concept-based information retrieval-Characteristic evaluation of retrieval using concept base-, IEICE Technical Report AI98-63 (1999).
[0041]
In addition, as a method for calculating a concept vector, a proposed method may be used in which a representative word of a classification category is input to the concept base and a concept vector is calculated from the vocabulary or explanatory sentences associated with that representative word. Since this method is described in detail in Reference 2 and the following document, its description is omitted here.
[Reference 3] Kaname Kasahara et al., Similarity discrimination of everyday words using a Japanese dictionary, Transactions of the Information Processing Society of Japan, Vol. 38, No. 7, pp. 1272-1283 (1997).
[0042]
Subsequently, in step 13, the concept vector of each content stored in the concept vector DB 22 and the lowest-layer classification category information associated with it are read into the memory, and in step 14, the distance between contents is calculated for every combination of two contents among the contents to be displayed.
[0043]
Note that "every combination of two contents" does not necessarily cover all the contents stored in the content DB 20. For example, when the display target has been narrowed down in advance by a search condition or the like, it means every combination of two contents that can be drawn from the narrowed-down contents.
[0044]
The processing for calculating the distance between contents will be described in detail in the processing flow of FIGS. 10 and 11 described later.
[0045]
Subsequently, in step 15, using the calculated distances, the arrangement coordinates of each content on the two-dimensional plane are calculated by multidimensional scaling and stored in the arrangement coordinate DB 23.
[0046]
The multidimensional scaling method is a compression algorithm from a high-dimensional vector space to a low-dimensional space, and can be solved as the objective function minimization problem described below.
[0047]
[Expression 1]

[0048]
That is, the set of $(x_a, y_a)$ $(a = 1, 2, \ldots, N)$ that gives the minimum value of the objective function becomes the two-dimensional arrangement coordinates of each content a. In this objective function, $d_{ab}^{*}$ denotes the distance between content a and content b (the distance calculated in the processing flow of FIGS. 10 and 11 described later), $d_{ab}$ is
$d_{ab} = \{(x_a - x_b)^2 + (y_a - y_b)^2\}^{1/2}$
and N denotes the total number of contents to be displayed.
[0049]
This objective function minimization problem can be solved using the so-called steepest descent method, which is described in detail in the following document and is therefore not described here.
[Reference 4] J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18 (5): 401-409 (1969).
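Since Expression 1 is not reproduced in this text, the following is only a minimal sketch under an assumed objective, the raw stress E = Σ_{a<b} (d*_ab − d_ab)², minimized by plain steepest (gradient) descent. Reference 4 describes a normalized variant. The function names `stress` and `mds_steepest_descent`, the step size, the iteration count, and the random initialization are illustrative assumptions and are not taken from the source.

```python
import math
import random

def stress(coords, target):
    """Assumed objective: sum over pairs a<b of (d*_ab - d_ab)^2."""
    n = len(coords)
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += (target[a][b] - math.dist(coords[a], coords[b])) ** 2
    return total

def mds_steepest_descent(target, steps=500, lr=0.05, seed=0):
    """Place n items in 2-D so pairwise distances approximate target[a][b]."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a == b:
                    continue
                d = math.dist(coords[a], coords[b]) or 1e-12  # avoid 0-division
                # dE/dx_a = -2 * (d*_ab - d_ab) * (x_a - x_b) / d_ab
                coef = -2.0 * (target[a][b] - d) / d
                grads[a][0] += coef * (coords[a][0] - coords[b][0])
                grads[a][1] += coef * (coords[a][1] - coords[b][1])
        # Move every point a small step against its gradient.
        for a in range(n):
            coords[a][0] -= lr * grads[a][0]
            coords[a][1] -= lr * grads[a][1]
    return coords
```

Each iteration costs O(N²) distance evaluations, which illustrates why recomputing the whole map becomes slow as the number of contents grows.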
[0050]
Subsequently, in step 16, the two-dimensional coordinate information of each content stored in the arrangement coordinate DB 23 is read into the memory, a scatter diagram image of the contents to be presented to the user is created based on it, and the scatter diagram image is output to the display unit 11 of the computer 10.
[0051]
As described in the processing flow of FIG. 10 and FIG. 11 to be described later, the scatter diagram image of the contents created in this way is characterized in that the distance between contents is not simply given by the distance between concept vectors but is defined in consideration of the similarity between the classification categories into which the contents are classified, so that the classification category information of the contents is also incorporated into the result of multidimensional scaling.
[0052]
As a result, as shown in FIG. 3 and FIG. 4, there is an effect that the contents included in the same classification category are arranged close to each other so as to form a group in two dimensions.
[0053]
Next, the process for calculating the distance between contents executed in step 14 of the process flow of FIG. 5 will be described.
[0054]
As illustrated in FIG. 7, the classification category information has a hierarchical structure of depth N. That is, each content is classified, at the first hierarchy, into one of
Li1: classification category of the first hierarchy
where i1 = 1, ..., M
(M is a positive integer), and content classified into the classification category Li1 is classified, at the second hierarchy, into one of
Li1i2: classification category of the second hierarchy with Li1 as the parent category
where i2 = i2(i1) = 1, ..., Mi1
(Mi1 is a positive integer).
[0055]
Similarly, content classified into a classification category Li1i2i3....i(k-1) of the (k-1)-th hierarchy is classified, at the k-th hierarchy, into one of
Li1i2i3....ik: classification category of the k-th hierarchy with Li1i2i3....i(k-1) as the parent category
where ik = ik(i1, i2, ..., i(k-1)) = 1, ..., Mi1i2i3....i(k-1)
(Mi1i2i3....i(k-1) is a positive integer), and this continues until k = N.
[0056]
The names of the classification categories Li1i2i3....iN of the Nth hierarchy are stored in the meta information DB 21 as the above-described classification category information.
[0057]
Describing the processing for calculating the distance between contents executed in step 14 of the processing flow of FIG. 5 specifically, on the premise of the classification system shown in FIG. 7: as the distance coefficient matrix of the first hierarchy of the classification categories, the visual content classification system uses a matrix such as that shown in FIG. 8(a), corresponding to the first hierarchy being composed of three classes, in which the diagonal components take a value A₁ smaller than 1 (A₁ < 1) and the off-diagonal components take a value B₁ of 1 or more (B₁ ≥ 1).
[0058]
That is, a first-hierarchy distance coefficient matrix is used that assigns A₁, a value smaller than 1 (A₁ < 1), when the first-hierarchy classification categories of two contents are the same, and B₁, a value of 1 or more (B₁ ≥ 1), when they are not.
[0059]
Further, as the distance coefficient matrix of the second hierarchy of the classification categories, a matrix such as that shown in FIG. 8 is used, corresponding to the second hierarchy being composed of three classes, in which the diagonal components take a value A₂ smaller than 1 (A₂ < 1) and the off-diagonal components take a value B₂ of 1 or more (B₂ ≥ 1).
[0060]
That is, a second-hierarchy distance coefficient matrix is used that assigns A₂, a value smaller than 1 (A₂ < 1), when the second-hierarchy classification categories of two contents are the same, and B₂, a value of 1 or more (B₂ ≥ 1), when they are not.
[0061]
Further, as the distance coefficient matrix of the third hierarchy of the classification categories, a matrix such as that shown in FIG. 8 is used, corresponding to the third hierarchy being composed of three classes, in which the diagonal components take a value A₃ smaller than 1 (A₃ < 1) and the off-diagonal components take a value B₃ of 1 or more (B₃ ≥ 1).
[0062]
That is, a third-hierarchy distance coefficient matrix is used that assigns A₃, a value smaller than 1 (A₃ < 1), when the third-hierarchy classification categories of two contents are the same, and B₃, a value of 1 or more (B₃ ≥ 1), when they are not.
[0063]
Using the correction coefficient w specified by these distance coefficient matrices, the visual content classification system corrects the distance dist(v_i, v_j) between content c_i and content c_j, which is calculated from the concept vector v_i of content c_i and the concept vector v_j of content c_j, according to the formula w × dist(v_i, v_j), so that the classification category information is reflected in the calculation of the distance between concept vectors.
[0064]
The correction coefficient w used at this time is calculated, for example, as shown in FIG. 9. When the classification categories of the two contents match up to the first and second hierarchies, then depending on whether the third hierarchy also matches,
w = A₁ × A₂ × A₃ (match up to the third hierarchy)
w = A₁ × A₂ × B₃ (mismatch at the third hierarchy)
When the classification categories of the two contents match at the first hierarchy and do not match at the second hierarchy,
w = A₁ × B₂
When the classification categories of the two contents do not match at the first hierarchy,
w = B₁
[0065]
When the correction coefficient w calculated in this way is used, the distance calculated according to the formula w × dist(v_i, v_j) is corrected so that it becomes smaller for two contents whose classification categories match.
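As a purely illustrative worked example of this correction, suppose A₁ = A₂ = A₃ = 0.5, B₁ = B₂ = B₃ = 2, and an uncorrected distance dist(v_i, v_j) = 4. These numbers are assumptions chosen for readability and do not come from the source. The four cases then evaluate to:

```latex
\begin{aligned}
w &= A_1 A_2 A_3 = 0.5 \times 0.5 \times 0.5 = 0.125, & w \cdot \mathrm{dist}(v_i, v_j) &= 0.5 \\
w &= A_1 A_2 B_3 = 0.5 \times 0.5 \times 2 = 0.5,     & w \cdot \mathrm{dist}(v_i, v_j) &= 2 \\
w &= A_1 B_2 = 0.5 \times 2 = 1,                      & w \cdot \mathrm{dist}(v_i, v_j) &= 4 \\
w &= B_1 = 2,                                         & w \cdot \mathrm{dist}(v_i, v_j) &= 8
\end{aligned}
```

With these assumed values, the corrected distance shrinks as more hierarchy levels match and grows when the categories already differ at the first hierarchy.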
[0066]
FIGS. 10 and 11 show a detailed processing flow of the distance calculation processing executed in step 14 of the processing flow of FIG. 5.
[0067]
In executing this processing flow, first, the Euclidean distance dist(v_i, v_j) between content c_i and content c_j is calculated.
[0068]
Next, a distance matrix (derived from the above-described distance coefficient matrix) representing the distance between classification categories is configured as a variable on the memory as follows.
[0069]
First, a distance matrix (wpq) relating to the classification category Li1 (i1 = 1,..., M) of the first hierarchy is constructed. Where (wpq) is an M-order non-negative symmetric matrix.
[0070]
Next, for every Li1, the distance matrix (w[Li1]pq) for the classification categories Li1i2 (i2 = i2(i1) = 1, ..., Mi1) immediately below Li1 is constructed as
w[Li1]pq := wi1i1 * s[Li1]pq
wi1i1: the (i1, i1) component of (wpq) above
(s[Li1]pq): a non-negative symmetric matrix of order Mi1
where (w[Li1]pq) is a non-negative symmetric matrix of order Mi1.
[0071]
Next, for every Li1i2, the distance matrix (w[Li1i2]pq) for the classification categories Li1i2i3 (i3 = i3(i1, i2) = 1, ..., Mi1i2) immediately below Li1i2 is constructed as
w[Li1i2]pq := w[Li1]i2i2 * s[Li1i2]pq
w[Li1]i2i2: the (i2, i2) component of (w[Li1]pq) above
(s[Li1i2]pq): a non-negative symmetric matrix of order Mi1i2
where (w[Li1i2]pq) is a non-negative symmetric matrix of order Mi1i2.
[0072]
Similarly, for every Li1i2....ik of the k-th hierarchy, the distance matrix (w[Li1i2....ik]pq) for the classification categories Li1i2....iki(k+1) (i(k+1) = i(k+1)(i1, i2, ...., ik) = 1, ..., Mi1i2....ik) immediately below Li1i2....ik is constructed as
w[Li1i2..ik]pq := w[Li1i2..i(k-1)]ikik * s[Li1i2..ik]pq
w[Li1i2..i(k-1)]ikik: the (ik, ik) component of (w[Li1i2..i(k-1)]pq)
(s[Li1i2..ik]pq): a non-negative symmetric matrix of order Mi1i2..ik
where (w[Li1i2....ik]pq) is a non-negative symmetric matrix of order Mi1i2....ik.
[0073]
This distance matrix (w [Li1i2 .... ik] pq) is constructed up to k = N-1.
[0074]
For (wpq) and (s[Li1]pq), (s[Li1i2]pq), ..., (s[Li1i2....i(N-2)]pq), (s[Li1i2....i(N-1)]pq), the diagonal components are set to any value smaller than 1 and the off-diagonal components are set to 1 or any value greater than 1, and all the variables of the above distance matrices (wpq), (w[Li1]pq), (w[Li1i2]pq), ..., (w[Li1i2....i(N-2)]pq), (w[Li1i2....i(N-1)]pq) are initialized accordingly.
[0075]
In calculating the distance between content c_i and content c_j, the distance dist*(v_i, v_j), which takes the distance between classification categories into account using the distance matrices described above, is calculated according to
$\mathrm{dist}^{*}(v_i, v_j) = w \cdot \mathrm{dist}(v_i, v_j)$
and recorded in the memory.
[0076]
Here, when the Nth-hierarchy classification category of content c_i is Li1i2....iN and the Nth-hierarchy classification category of content c_j is Lj1j2....jN, w is given as follows.
(1) w = wi1j1 if i1 ≠ j1
(2) w = w[Li1...i(k-1)]ikjk
if i1 = j1, ...., i(k-1) = j(k-1), ik ≠ jk
where 2 <= k <= N
(3) w = w[Li1...i(N-1)]iNjN if i1 = j1, ..., iN = jN
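The three cases above can be sketched as a single walk down the category hierarchy. This is a minimal illustration under stated assumptions: categories are represented as tuples of 0-based indices (the text uses 1-based indices), the coefficient matrices are held in a dictionary `s` keyed by the parent path with `s[()]` standing in for (wpq), and the names `category_weight` and `corrected_distance` as well as the numerical values are hypothetical.

```python
import math

def category_weight(path_i, path_j, s):
    """Return w for two lowest-layer categories given as index tuples.

    s maps a parent path (a tuple, () for the root) to the coefficient
    matrix of the categories directly below that parent.
    """
    w = 1.0
    prefix = ()
    for p, q in zip(path_i, path_j):
        w *= s[prefix][p][q]   # diagonal entry while the paths still agree
        if p != q:             # first mismatch: off-diagonal entry, then stop
            break
        prefix += (p,)
    return w

def corrected_distance(v_i, v_j, path_i, path_j, s):
    """dist*(v_i, v_j) = w * dist(v_i, v_j), with Euclidean dist."""
    return category_weight(path_i, path_j, s) * math.dist(v_i, v_j)

# Hypothetical two-level hierarchy with two categories per level.
s = {
    (): [[0.5, 2.0], [2.0, 0.5]],
    (0,): [[0.5, 1.5], [1.5, 0.5]],
    (1,): [[0.5, 1.5], [1.5, 0.5]],
}
```

Multiplying the diagonal entries along the shared prefix reproduces the recursive definition w[Li1..ik]pq := w[Li1..i(k-1)]ikik * s[Li1..ik]pq without materializing every intermediate matrix.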
[0077]
The distance dist*(v_i, v_j), which takes the distance between classification categories into account, is calculated for all combinations of contents, and a distance matrix for the read data is constructed. This is the distance matrix used in the multidimensional scaling processing described above.
[0078]
Next, according to the processing flow of FIG. 10 and FIG. 11, the processing for calculating the distance between contents to be executed in step 14 of the processing flow of FIG. 5 will be specifically described.
[0079]
When calculating the distance between content c_i and content c_j for the multidimensional scaling processing described above, the visual content classification system first, in step 20, as shown in the processing flow of FIG. 10 and FIG. 11, calculates the Euclidean distance dist(v_i, v_j) between the concept vector v_i of content c_i and the concept vector v_j of content c_j.
[0080]
Subsequently, in step 21, “1” indicating the first hierarchy is set to the variable k indicating the hierarchy level of the classification category.
[0081]
Subsequently, in step 22, the first-hierarchy classification category value Lp of content c_i and the first-hierarchy classification category value Lq of content c_j are identified.
[0082]
Subsequently, in step 23, the component value of the first-hierarchy distance coefficient matrix indicated by the identified classification category values is specified. That is, by referring to the distance coefficient matrix prepared for the first hierarchy of the classification categories, defined as a matrix such as that shown in FIG. 8(a), the component value indicated by the identified classification category values (A₁ or B₁ in the example of FIG. 8(a)) is specified.
[0083]
Subsequently, in step 24, the specified component value is assigned to the variable w. Then, in step 25, it is determined whether the two classification category values identified in step 22 match. When it is determined that they do not match, the process proceeds to step 26, where the distance between content c_i and content c_j is calculated by multiplying the value of the variable w by the distance dist(v_i, v_j) calculated in step 20, and the process ends.
[0084]
On the other hand, when it is determined in step 25 that the two classification category values identified in step 22 match, the process proceeds to step 27, where the value of the variable k is incremented by 1, and in step 28 it is determined whether the value of the variable k has become greater than the depth N of the classification categories.
[0085]
When it is determined by this determination processing that the value of the variable k is not greater than the depth N of the classification categories, the process proceeds to step 30, where the k-th hierarchy classification category value Lp of content c_i and the k-th hierarchy classification category value Lq of content c_j are identified.
[0086]
Subsequently, in step 31, the component value of the k-th hierarchy distance coefficient matrix indicated by the identified classification category values is specified. For example, when k = 2, by referring to the distance coefficient matrix prepared for the second hierarchy of the classification categories, defined as a matrix such as that shown in FIG. 8, the component value indicated by the identified classification category values (A₂ or B₂ in that example) is specified.
[0087]
Subsequently, in step 32, the specified component value is multiplied by the value of the variable w, and the product is assigned to the variable w as its new value. Then, in step 33, it is determined whether the two classification category values identified in step 30 match. When it is determined that they do not match, the process proceeds to step 34, where the distance between content c_i and content c_j is calculated by multiplying the value of the variable w by the distance dist(v_i, v_j) calculated in step 20, and the process ends.
[0088]
On the other hand, when it is determined in step 33 that the two classification category values identified in step 30 match, the process returns to step 27 in order to proceed to the classification category of the next lower hierarchy.
[0089]
When, by repeating steps 27 to 33, it is determined in step 28 that the value of the variable k has become greater than the depth N of the classification categories, the process proceeds to step 29, where the distance between content c_i and content c_j is calculated by multiplying the value of the variable w by the distance dist(v_i, v_j) calculated in step 20, and the process ends.
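The flow of steps 20 to 34 can be traced with a short loop. The sketch below assumes the simplified case in which each hierarchy k has one diagonal value A[k] and one off-diagonal value B[k]. The function name `flow_distance` and the list-based arguments are hypothetical, and the step numbers in the comments map the code back to the flow described above.

```python
import math

def flow_distance(v_i, v_j, cat_i, cat_j, A, B):
    """Category-corrected distance following the flow of steps 20 to 34.

    cat_i and cat_j are sequences of category values, one per hierarchy level;
    A[k] (< 1) and B[k] (>= 1) are the coefficients assumed for level k.
    """
    dist = math.dist(v_i, v_j)             # step 20: Euclidean distance
    w = 1.0
    for k in range(len(cat_i)):            # steps 21, 27, 28: walk the levels
        same = cat_i[k] == cat_j[k]        # steps 22 / 30: compare Lp and Lq
        w *= A[k] if same else B[k]        # steps 23-24 / 31-32: accumulate w
        if not same:                       # steps 25 / 33: stop at a mismatch
            break
    return w * dist                        # steps 26 / 34 / 29
```

For example, with A = [0.5, 0.5, 0.5] and B = [2.0, 2.0, 2.0] (illustrative values only), two contents whose categories first differ at the third hierarchy receive w = 0.5 × 0.5 × 2.0 = 0.5.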
[0090]
In this way, the visual content classification system calculates the correction coefficient w, for example, as shown in FIG. 9. When the classification categories of the two contents match up to the first and second hierarchies, then depending on whether the third hierarchy also matches,
w = A₁ × A₂ × A₃ (match up to the third hierarchy)
w = A₁ × A₂ × B₃ (mismatch at the third hierarchy)
When the classification categories of the two contents match at the first hierarchy and do not match at the second hierarchy,
w = A₁ × B₂
When the classification categories of the two contents do not match at the first hierarchy,
w = B₁
The distance between content c_i and content c_j is then calculated by multiplying the correction coefficient w thus obtained by the Euclidean distance dist(v_i, v_j).
[0091]
When the correction coefficient w calculated in this way is used, the distance calculated according to the formula w × dist(v_i, v_j) is corrected so that it becomes smaller for two contents whose classification categories match.
[0092]
As described above, the reference map creation unit 40 takes the contents stored in the content DB 20 as the processing target, calculates the distance between contents for every combination of two contents, calculates the two-dimensional arrangement coordinates of each content based on the calculated distances, creates a scatter diagram image of the contents (reference map) as shown in FIG. 3 and FIG. 4, in which contents included in the same classification category are arranged close to each other so as to form a group in two dimensions, and presents it to the user.
[0093]
[2] Processing of content placement unit 41
When content that is not placed on the reference map is given after the reference map (scatter diagram image of the contents) has been created by the reference map creation unit 40, the content placement unit 41 calculates the arrangement coordinates of that content on the reference map and arranges it on the reference map, and also registers information on the content in the content DB 20, the meta information DB 21, the concept vector DB 22, and the arrangement coordinate DB 23.
[0094]
FIGS. 12 and 13 show an embodiment of the processing flow executed by the content placement unit 41.
[0095]
Next, processing executed by the content placement unit 41 will be described in detail according to this processing flow.
[0096]
When a placement request is issued for content that is not placed on the reference map, the content placement unit 41 first, in step 40, as shown in this processing flow, acquires the classification category information of the arrangement target content (actually, information on the classification category of the lowest layer).
[0097]
Subsequently, in step 41, the summary description of the arrangement target content is acquired, and the concept vector of the arrangement target content (hereinafter denoted by V) is calculated by the same processing as that of the reference map creation unit 40 described above.
[0098]
Subsequently, in step 42, for the contents arranged on the reference map, the concept vectors {Xi} and their two-dimensional coordinates {xi} are read from the concept vector DB 22 and the arrangement coordinate DB 23, and the classification category information (lowest-layer classification category information) of each concept vector {Xi} is also read.
[0099]
Subsequently, in step 43, with the read concept vectors as the processing target, the concept vector closest to the concept vector V of the arrangement target content is identified among the concept vectors under the lowest-layer classification category to which the arrangement target content belongs (hereinafter, this concept vector is denoted by Y).
[0100]
Note that the distances calculated at this time need not be corrected as described above, because the concept vectors involved belong to the same classification category.
[0101]
Subsequently, in step 44, a region that includes the two-dimensional coordinates {xi} of all concept vectors {Xi} is set as the initial value of the neighborhood region Ny(t) centered on the two-dimensional coordinate y of the concept vector Y obtained in step 43. As will be understood from the following description, the variable t represents the number of repetitions of the processing.
[0102]
Subsequently, in step 45, an appropriate two-dimensional coordinate is set as an initial value of the two-dimensional coordinate v (t) when the concept vector V of the arrangement target content is projected onto the reference map.
[0103]
At this time, it is preferable to set the two-dimensional coordinate y of the concept vector Y as an initial value of the two-dimensional coordinate v (t) or set the two-dimensional coordinates in the vicinity thereof, but the present invention is not limited to this.
[0104]
Subsequently, in step 46, it is determined whether the two-dimensional coordinates {xi} of all the concept vectors {Xi} belonging to the neighborhood region Ny(t) have been processed. When it is determined that not all of them have been processed, the process proceeds to step 47, where one unprocessed two-dimensional coordinate {xi} (unprocessed concept vector {Xi}) is selected.
[0105]
Subsequently, in step 48, using the concept vector V of the arrangement target content and its two-dimensional coordinate v(t), together with the selected concept vector {Xi} and its two-dimensional coordinate {xi}, the two-dimensional coordinate v(t) of the arrangement target content is corrected according to the formulas
$v_2(t) = v(t) + a(t) \cdot h(d^{*}(V, X_i)) \cdot [x_i - v(t)]$ ... (i)
$v(t) = v_2(t)$ ... (ii)
[0106]
That is, v2(t) is first calculated by formula (i) using the concept vector V of the arrangement target content and its two-dimensional coordinate v(t), together with the selected concept vector {Xi} and its two-dimensional coordinate {xi}. The calculated v2(t) is then taken as the new v(t), thereby correcting the two-dimensional coordinate v(t) of the arrangement target content, and the process returns to step 46.
[0107]
Here, “a(t)” represents a positive-valued function that decreases monotonically with t. “d*(V, Xi)” represents the distance between the concept vector V and the concept vector {Xi} selected in step 47. As this distance, the distance corrected by the classification category information according to the algorithm described above is used, but a distance without that correction may also be used.
[0108]
“h(•)” represents a positive, monotonically decreasing function that does not depend on t; it takes a smaller value as the distance between the concept vector V and the concept vector {Xi} increases.
[0109]
Equation (i) means that the two-dimensional coordinate v(t) of the concept vector V of the arrangement target content is corrected so as to approach the two-dimensional coordinate {xi} of the concept vector {Xi} selected in step 47. In this correction, when the distance between the concept vector V and the selected concept vector {Xi} is large, the value of h(•) becomes small, so that the amount of approach also becomes small.
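A single correction by equation (i) can be sketched as follows. The source does not specify h(•) or a(t), so the exponential form h(d) = exp(−d) used as the default here is an assumption that merely satisfies the stated properties (positive, monotonically decreasing), and the function name `update_step` is hypothetical.

```python
import math

def update_step(v, x_i, d, a_t, h=lambda d: math.exp(-d)):
    """One application of equation (i): pull v toward x_i.

    v, x_i : 2-D coordinates (tuples) of the target content and of Xi
    d      : distance d*(V, Xi) between the concept vectors
    a_t    : current value of the decreasing coefficient a(t)
    h      : assumed positive, monotonically decreasing function
    """
    gain = a_t * h(d)
    return (v[0] + gain * (x_i[0] - v[0]),
            v[1] + gain * (x_i[1] - v[1]))
```

With a gain between 0 and 1 the new position is a point on the segment between v(t) and xi, so concept vectors that are far from V in concept space move v(t) only slightly.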
[0110]
Describing the repetition of steps 46 to 48 with reference to FIG. 14: a certain concept vector {Xi} and its two-dimensional coordinate {xi} are selected as shown by (1) in the figure, v2(t) is calculated using them as shown by (2), and v(t) is corrected by taking the calculated v2(t) as the new v(t) as shown by (3). Then another concept vector {Xi} and its two-dimensional coordinate {xi} are selected as shown by (4), v2(t) is calculated using them as shown by (5), and v(t) is corrected by taking the calculated v2(t) as the new v(t) as shown by (6). This correction of v(t) is repeated.
[0111]
When, by repeating steps 46 to 48 in this way, it is determined in step 46 that the two-dimensional coordinates {xi} of all concept vectors {Xi} belonging to the neighborhood region Ny(t) have been processed, the process proceeds to step 49, where the two-dimensional coordinate v(t) of the arrangement target content is updated to v(t+1).
[0112]
Subsequently, in step 50, the neighborhood region Ny(t) centered on the two-dimensional coordinate y of the concept vector Y, which was set in step 44, is updated to Ny(t+1) by reducing its size according to, for example, a specified reduction ratio.
[0113]
Subsequently, in step 51, it is determined whether a state has been reached in which only the two-dimensional coordinate y of the concept vector Y exists in the reduced neighborhood region Ny(t+1). When it is determined that such a state has not been reached, the process proceeds to step 54, where the coefficient a(t) in equation (i) is updated to a(t+1) by reducing it according to, for example, a specified reduction ratio. Then, in step 55, v(t+1) is set as the new v(t), Ny(t+1) as the new Ny(t), and a(t+1) as the new a(t), and the process returns to step 46.
[0114]
On the other hand, when it is determined in step 51 that a state has been reached in which only the two-dimensional coordinate y of the concept vector Y exists in the neighborhood region Ny(t+1), the process proceeds to step 52, where v(t+1) updated in step 49 is determined as the arrangement position of the arrangement target content. Then, in step 53, information on the arrangement target content is registered in the content DB 20, the meta information DB 21, the concept vector DB 22, and the arrangement coordinate DB 23, and the process ends.
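Steps 43 to 55 can be combined into one loop, sketched below under several assumptions that the source leaves open: the neighborhood region Ny(t) is modeled as a disc around y whose radius shrinks geometrically, a(t) shrinks by the same ratio, h(d) = exp(−d), d*(V, Xi) is taken as the uncorrected Euclidean distance (which the text permits), and `same_category` is a non-empty list of indices of the reference contents sharing the lowest-layer category of the target. The function name `place_content` and the `max_iter` guard are hypothetical, and the database registration of step 53 is omitted.

```python
import math

def place_content(V, vectors, coords, same_category, a0=0.5, shrink=0.8,
                  h=lambda d: math.exp(-d), max_iter=1000):
    """Return 2-D coordinates for a new concept vector V on an existing map.

    vectors[i] is concept vector Xi; coords[i] is its 2-D position xi.
    """
    # Step 43: nearest concept vector Y within the target's own category.
    y_idx = min(same_category, key=lambda i: math.dist(V, vectors[i]))
    y = coords[y_idx]
    # Step 44: initial neighborhood (assumed: a disc) covering every xi.
    radius = max(math.dist(y, x) for x in coords)
    # Step 45: start from y itself.
    v = [y[0], y[1]]
    a_t = a0
    for _ in range(max_iter):
        # Steps 46-48: sequentially pull v toward each xi in the neighborhood.
        for X, x in zip(vectors, coords):
            if math.dist(y, x) <= radius:
                gain = a_t * h(math.dist(V, X))
                v = [v[0] + gain * (x[0] - v[0]), v[1] + gain * (x[1] - v[1])]
        # Step 50: shrink the neighborhood.
        radius *= shrink
        # Step 51: stop once only y is left inside it (step 52: v is final).
        if sum(1 for x in coords if math.dist(y, x) <= radius) <= 1:
            break
        # Step 54: reduce the coefficient a(t).
        a_t *= shrink
    return (v[0], v[1])
```

The cost of placing one content is proportional to the number of reference contents times the number of shrink iterations, with no all-pairs recomputation of the map. The `max_iter` guard covers the corner case where another content sits exactly on y, in which the stop condition of step 51 would never be met under the disc assumption.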
[0115]
Since the arrangement position v(t+1) of the arrangement target content determined in this way is determined so as to preserve the distance structure between the concept vectors of the contents, the arrangement target content can be arranged on the reference map without destroying the form of the reference map created by the reference map creation unit 40.
[0116]
In this way, as shown in FIG. 15, when content that is not placed on the reference map is given after the reference map (scatter diagram image of the contents) has been created by the reference map creation unit 40, the content placement unit 41 calculates the arrangement coordinates of that content on the reference map, arranges it on the reference map (marked with ▲ in the figure), and registers information on the content in the content DB 20, the meta information DB 21, the concept vector DB 22, and the arrangement coordinate DB 23.
[0117]
Here, in the processing flows of FIGS. 12 and 13, as shown in equation (i), the two-dimensional coordinates v(t) of the arrangement target content are corrected sequentially while all the two-dimensional coordinates {xi} belonging to the neighboring region Ny(t) are selected in order. Instead of equation (i), however, the update
v(t) ← v(t) + Σ a(t) * h(d*(V, Xi)) * [xi - v(t)]
where Σ is the sum over all the two-dimensional coordinates {xi},
may be used, so that the two-dimensional coordinates v(t) of the arrangement target content are corrected at one time.
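The difference between the sequential correction and the summed correction can be illustrated as follows. This is a sketch under assumptions: h is taken to be exp(-d), each neighbor is given as a pair of its map coordinate and its corrected distance d*(V, Xi), and the function names are hypothetical.

```python
import math

def h(d):
    """Assumed monotonically decreasing function of the corrected distance."""
    return math.exp(-d)

def sequential_update(v, neighbors, a):
    """Correct v once per neighbor, in order; each step sees the updated v."""
    vx, vy = v
    for (x, y), d in neighbors:
        w = a * h(d)
        vx += w * (x - vx)
        vy += w * (y - vy)
    return vx, vy

def batch_update(v, neighbors, a):
    """Sum all corrections against the same v, then apply them at one time."""
    vx, vy = v
    dx = sum(a * h(d) * (x - vx) for (x, y), d in neighbors)
    dy = sum(a * h(d) * (y - vy) for (x, y), d in neighbors)
    return vx + dx, vy + dy
```

The two forms agree for a single neighbor but generally differ for several. In the summed form the weights add up, so with many neighbors the total step can exceed the distance to them, and a smaller a(t) may be needed than in the sequential form.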
[0118]
Also, in the processing flows of FIGS. 12 and 13, the two-dimensional coordinates v(t) of the concept vector V of the arrangement target content are corrected using all the concept vectors {Xi} arranged on the reference map as processing targets. However, the concept vectors {Xi} under the lowest-level classification category to which the arrangement target content belongs are at a small distance from V and therefore have a large influence. In view of this, the two-dimensional coordinates v(t) may be corrected using only the concept vectors {Xi} under that lowest-level classification category as processing targets. This reduces the amount of calculation and thus speeds up the processing.
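Restricting the processing targets to the same lowest-level classification category amounts to a filter applied before the correction loop. The sketch below assumes, purely for illustration, that each map entry is a dictionary with a `category_path` tuple running from the top category down to the lowest one; that layout and the function name are hypothetical and are not the schema of the meta information DB 21.

```python
def restrict_to_lowest_category(target_path, map_items):
    """Keep only map entries whose full category path equals the target's."""
    return [item for item in map_items if item["category_path"] == target_path]

# Hypothetical reference-map entries.
map_items = [
    {"id": 1, "category_path": ("Sports", "Soccer"), "xy": (0.1, 0.2)},
    {"id": 2, "category_path": ("Sports", "Tennis"), "xy": (0.4, 0.1)},
    {"id": 3, "category_path": ("Sports", "Soccer"), "xy": (0.2, 0.3)},
]
targets = restrict_to_lowest_category(("Sports", "Soccer"), map_items)
```

Only the entries returned by the filter would then take part in the coordinate correction, so the cost per placement depends on the size of one category rather than on the size of the whole map.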
[0119]
Also, although not described in the processing flows of FIGS. 12 and 13, when arrangement target contents are given one after another, the arrangement coordinates of newly given arrangement target content may be determined using only the information about the contents that the reference map creation unit 40 processed, without including information about the arrangement target contents whose arrangement coordinates have already been calculated. Alternatively, they may be determined with the information about the arrangement target contents whose arrangement coordinates have been calculated so far included in the information about the contents that the reference map creation unit 40 processed.
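The two alternatives differ only in whether already placed contents join the pool used for later placements. A minimal sketch, assuming a hypothetical `place_fn(item, pool)` that returns the arrangement coordinates of one content given the pool of (content, coordinates) pairs:

```python
def place_stream(new_items, reference_items, place_fn, include_placed=True):
    """Place contents one after another against a pool of map entries.

    include_placed=False: every new content sees only the original map.
    include_placed=True: each placed content is added to the pool.
    """
    pool = list(reference_items)
    placed = []
    for item in new_items:
        coord = place_fn(item, pool)
        placed.append((item, coord))
        if include_placed:
            pool.append((item, coord))
    return placed
```

With `include_placed=False` the result does not depend on the order in which contents arrive; with `include_placed=True` later contents can be influenced by earlier ones.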
[0120]
Although the present invention has been described according to the illustrated embodiment, the present invention is not limited to this. For example, the embodiment has been described using a processing example in which a reference map is created by clustering and arranging contents in units of classification categories, and unplaced content is then added to it. However, the present invention can also be applied to a case where the reference map is created without clustering in units of classification categories.
[0121]
Further, in the embodiment, the reference map is created by reflecting the classification category in the calculation of the distance between concept vectors, but the reference map may instead be created by reflecting other meta information in the calculation of the distance between concept vectors.
[0122]
Further, in the embodiment, the present invention has been described by taking the content classification as a specific example, but the present invention is not limited to the content classification.
[0123]
[Effects of the invention]
As described above, according to the present invention, after a large amount of information has been classified and arranged on a two-dimensional plane based on the content similarity between the pieces of information, individual pieces of information can be added sequentially to the classification and arrangement map. Therefore, even when the number of pieces of information to be two-dimensionally arranged increases or when new information is added, the classification and arrangement can be performed in a short time.
[0124]
As a result, compared with the conventional method of arranging contents in a batch using multidimensional scaling, the processing time can be shortened, and digital contents that are added and updated daily, such as those on the Internet, can be visually classified.
[Brief description of the drawings]
FIG. 1 is an example of an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of a classification category system.
FIG. 3 is a diagram showing an example of a scatter diagram image.
FIG. 4 is a diagram illustrating an example of a scatter diagram image.
FIG. 5 is an example of a processing flow executed by a reference map creation unit.
FIG. 6 is an explanatory diagram of a concept vector DB.
FIG. 7 is a diagram illustrating an example of a classification category system.
FIG. 8 is an explanatory diagram of a distance coefficient matrix.
FIG. 9 is an explanatory diagram of correction coefficients.
FIG. 10 is an example of a processing flow executed by a reference map creation unit.
FIG. 11 is an example of a processing flow executed by a reference map creation unit.
FIG. 12 is an example of a processing flow executed by a content placement unit.
FIG. 13 is an example of a processing flow executed by a content placement unit.
FIG. 14 is an explanatory diagram of processing executed by a content placement unit.
FIG. 15 is an explanatory diagram of processing executed by a content placement unit.
[Explanation of symbols]
10 Computer
11 Display
12 Instruction input part
20 Content DB
21 Meta Information DB
22 Concept vector DB
23 Placement coordinate DB
30 Network
40 Reference map creation unit
41 Content placement unit

Claims

A visual information classification method executed by a visual information classification device that comprises creation means, selection means, setting means, and calculation means and that arranges a large amount of information on a two-dimensional plane based on the content similarity between the pieces of information, wherein:
the creation means corrects the distance between the concept vectors of the respective pieces of information according to the degree of coincidence of hierarchically structured meta information assigned to the information in advance, such that the distance becomes smaller as the degree of coincidence increases, and, based on the corrected distances, calculates two-dimensional arrangement coordinates of each piece of information, thereby creating a reference map on which the information is arranged;
when unplaced information is given, the selection means specifies, from among the information constituting the reference map, information having the same meta information as the meta information assigned to the unplaced information, and selects, from the specified information, the information whose concept vector is at the smallest distance from that of the unplaced information;
the setting means sets initial coordinates for the arrangement coordinates of the unplaced information on the reference map; and
the calculation means initially sets, on the reference map, a region of a certain size centered on the arrangement coordinates of the information selected by the selection means, and calculates the final arrangement coordinates of the unplaced information on the reference map by repeating a process of updating the arrangement coordinates of the unplaced information so that they approach the arrangement coordinates of the information falling within the region, in accordance with the value of a monotonically decreasing function whose variable is the distance obtained by correcting the distance between the concept vector of that information and the concept vector of the unplaced information according to the degree of coincidence of the meta information, and then setting the region smaller.

The visual information classification method according to claim 1,
wherein the calculation means updates the arrangement coordinates using the monotonically decreasing function multiplied by a coefficient that takes a smaller value as the number of repetitions increases.

The visual information classification method according to claim 1 or 2,
wherein the setting means sets, as the initial coordinates, the arrangement coordinates of the information selected by the selection means or arrangement coordinates in the vicinity of those arrangement coordinates.

The visual information classification method according to any one of claims 1 to 3,
wherein the calculation means initially sets, as the region, a region that contains only information having the same meta information as the meta information assigned to the unplaced information.

The visual information classification method according to any one of claims 1 to 4,
wherein the calculation means calculates the arrangement coordinates on the reference map of newly given unplaced information while including, in the information constituting the reference map, the unplaced information whose arrangement coordinates have been calculated so far.

A visual information classification device that arranges a large amount of information on a two-dimensional plane based on the content similarity between the pieces of information, comprising:
creation means for correcting the distance between the concept vectors of the respective pieces of information according to the degree of coincidence of hierarchically structured meta information assigned to the information in advance, such that the distance becomes smaller as the degree of coincidence increases, and for calculating, based on the corrected distances, two-dimensional arrangement coordinates of each piece of information, thereby creating a reference map on which the information is arranged;
selection means for, when unplaced information is given, specifying, from among the information constituting the reference map, information having the same meta information as the meta information assigned to the unplaced information, and selecting, from the specified information, the information whose concept vector is at the smallest distance from that of the unplaced information;
setting means for setting initial coordinates for the arrangement coordinates of the unplaced information on the reference map; and
calculation means for initially setting, on the reference map, a region of a certain size centered on the arrangement coordinates of the information selected by the selection means, and for calculating the final arrangement coordinates of the unplaced information on the reference map by repeating a process of updating the arrangement coordinates of the unplaced information so that they approach the arrangement coordinates of the information falling within the region, in accordance with the value of a monotonically decreasing function whose variable is the distance obtained by correcting the distance between the concept vector of that information and the concept vector of the unplaced information according to the degree of coincidence of the meta information, and then setting the region smaller.

The visual information classification device according to claim 6,
wherein the calculation means updates the arrangement coordinates using the monotonically decreasing function multiplied by a coefficient that takes a smaller value as the number of repetitions increases.

The visual information classification device according to claim 6 or 7,
wherein the setting means sets, as the initial coordinates, the arrangement coordinates of the information selected by the selection means or arrangement coordinates in the vicinity of those arrangement coordinates.

The visual information classification device according to any one of claims 6 to 8,
wherein the calculation means initially sets, as the region, a region that contains only information having the same meta information as the meta information assigned to the unplaced information.

The visual information classification device according to any one of claims 6 to 9,
wherein the calculation means calculates the arrangement coordinates on the reference map of newly given unplaced information while including, in the information constituting the reference map, the unplaced information whose arrangement coordinates have been calculated so far.

A visual information classification program for causing a computer to execute the processing used to realize the visual information classification method according to any one of claims 1 to 5.

A recording medium on which is recorded a visual information classification program for causing a computer to execute the processing used to realize the visual information classification method according to any one of claims 1 to 5.