JP3444223B2 - Database registration device

Info

Publication number: JP3444223B2
Application number: JP07506099A
Authority: JP
Inventors: 政行中江; 英彦岡田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-03-19
Filing date: 1999-03-19
Publication date: 2003-09-08
Anticipated expiration: 2016-12-28
Also published as: JPH11312115A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a database registration device suited to a database search device that enables efficient, highly accurate retrieval of related data from atypical data.

[0002]

[Prior Art] Conventionally, databases that enable search using the semantic information of data, such as databases of technical documents or of library collections, have generally extracted keywords automatically from the accumulated data and calculated the degree of relevance between data items using, for example, a synonym dictionary built empirically in advance.

[0003] For data that is not language-based, such as images and audio, a known approach represents the features of each data item as a vector and expresses the relevance between two items as the distance between their vectors. Examples include the methods of JP-A-58-147799, JP-A-60-189585, JP-A-64-17182, and JP-A-4-15777, which take the Euclidean distance between the feature vectors of speech patterns or character patterns as the similarity. Other methods, such as those of JP-A-60-202491 and JP-A-63-780, take the sum of the absolute values of the component-wise differences between feature vectors as the similarity. In either case, the feature vector is a sequence of physical quantities characterizing the audio or image data, and every component of the vector is assumed to be determined in advance.
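As a minimal illustration of the two similarity measures mentioned above, the following Python sketch computes the Euclidean distance and the sum of absolute component differences for two fully defined feature vectors. The function names and the sample vectors are illustrative assumptions and are not taken from the cited publications.

```python
import math


def euclidean_distance(u, v):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def absolute_difference_sum(u, v):
    """Sum of the absolute values of the component differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical two-component feature vectors (not from the source).
u = [0.0, 3.0]
v = [4.0, 0.0]
print(euclidean_distance(u, v))       # 5.0
print(absolute_difference_sum(u, v))  # 7.0
```

Both functions assume that every component of both vectors is defined, which is exactly the premise of the conventional methods.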

[0004] In recent years, hypertext has attracted attention as a data format that handles documents, images, audio, and the like in a unified way. Hypertext consists of information units called nodes, each composed of documents, images, audio, and so on, and nodes can be associated with one another by links. In the database field as well, targeting data expressed as hypertext has led to its use as a new medium that draws no distinction between data types such as documents, images, and audio.

[0005] When a new node is added to a hypertext system, a technique for automatically adding links from the new node to related nodes, or from related nodes to the new node, is desirable. One such technique, described in JP-A-3-278247, retains either the difference information or the full node information as a new version of the node whenever the node's content is changed, and adds a link from the new node to the node as it was before the change. In another technique, described in JP-A-4-317172, nodes used for searching (search nodes) are prepared in addition to nodes holding ordinary information (search target nodes) and are linked to the corresponding search target nodes; when a search target node is changed, the links of the search nodes that need changing are updated or added. Finally, in the technique of JP-A-5-20363, qualitative search conditions are recorded in advance for each piece of information and displayed together with the information, and other information is retrieved by fuzzy search based on the condition selected there. A qualitative search condition here means a condition, such as the darkness of an image, that can be judged from a physical quantity such as image contrast. In hypertext databases, search results have conventionally been displayed as a list of links to the nodes that match the search expression.

[0006]

[Problems to Be Solved by the Invention] The first problem concerns data registration. Conventional techniques define the meaning of data either by extracting keywords from the data automatically or by letting the registrant register keywords freely. When the data must be characterized from many viewpoints, such as user characteristics, task characteristics, and system characteristics, as is the case for data about user interfaces, and when the vocabulary suitable as keywords is not clear-cut, it is difficult to define keywords that properly represent the meaning of the data, and therefore difficult to characterize the data appropriately.

[0007] The second problem concerns hypertext databases. Conventional methods that generate links automatically when a new node is registered, using qualitative information attached to each node, (1) cannot handle atypical data that allows arbitrary combinations of documents, images, audio, and the like, and (2) cannot dynamically determine the direction of the link for a pair of nodes to be linked.

[0008] An object of the present invention is therefore to provide a database registration device that can characterize, by a single method, atypical data in which documents, images, audio, and the like are freely combined, and that automatically links newly registered data with related data already stored.

[0009]

[Means for Solving the Problems] The database registration device of the present invention comprises: content registration means with which a data registrant inputs the content of new data; keyword selection means with which the data registrant selects, from a predetermined keyword set, keywords suitable for expressing the features of the new data; keyword set storage means that stores the keyword set used by the keyword selection means; weighting input means with which the data registrant inputs, for each keyword selected by the keyword selection means, a weight given as a real number in [0, 1]; feature vector generation means that generates a feature vector of the new data from the keywords selected by the keyword selection means and the weights input by the weighting input means; data accumulation means that accumulates data; position group extraction means that obtains, as neighborhood data of the new data, the data stored in the data accumulation means for which the distance between the feature vector of the new data generated by the feature vector generation means and the feature vector attached to that stored data is at or below a predetermined threshold; link direction determination means that determines, from the category of the neighborhood data obtained by the position group extraction means and the category of the new data and using predetermined linking rules, the direction of the link to be newly added between the neighborhood data and the new data; and node addition means that adds a link between the neighborhood data and the new data in the direction determined by the link direction determination means and accumulates the new data, the feature vector of the new data, and information about the link in the data accumulation means.

The position group extraction means comprises: inter-component correlation table storage means that stores a table of the degrees of correlation between components of the feature vector; undefined component complementing means that complements the undefined components of the feature vector of the new data generated by the feature vector generation means and of the feature vectors attached to the data stored in the data accumulation means, using the components already defined in those vectors and the table of inter-component correlation degrees stored in the inter-component correlation table storage means; inter-vector distance calculation means that obtains the distance between the feature vector of the new data, with its undefined parts complemented by the undefined component complementing means, and the feature vector of the data stored in the data accumulation means; neighborhood data determination means that determines the data to be neighborhood data of the new data if the distance between feature vectors calculated by the inter-vector distance calculation means is at or below a predetermined threshold; and threshold storage means that stores the predetermined threshold used by the neighborhood data determination means to determine neighborhood data.

[0010] A database registration device configured in this way can assign appropriate keywords, by a single method, to atypical data made up of any combination of text, images, audio, and the like. It also makes it possible, when a new hypertext node is created, to link it automatically to other semantically related nodes.

[0011] Furthermore, when there is no defined component correlated with an undefined component, so that the complement value of that undefined component in the feature vector generated by the feature vector generation means or in the feature vector of previously stored data cannot be obtained, the undefined component complementing means replaces the undefined component with a special symbol whose difference from any value is always 0. Consequently, even when an undefined part cannot be complemented, the distance between vectors can still be obtained and the relevance measured.

[0012]

[Embodiments of the Invention] An embodiment of the present invention will now be described in detail with reference to the drawings. Before that, a device that searches data registered using the database registration device of the present invention (a database search device) is described.

[0013] Referring to FIG. 1, an example of the database search device includes: query input means 1 with which a user inputs the features of the desired data; query vector generation means 2 that converts the input query into a feature vector; data storage means 3 that stores data and the feature vectors of the data; undefined component complementing means 4 that complements undefined components contained in a feature vector; inter-component correlation table storage means 5 that stores the inter-component correlation table used by the undefined component complementing means 4; inter-vector distance calculation means 6 that obtains the distance between the feature vectors of the query and the stored data; neighborhood data determination means 7 that identifies stored data whose feature vector lies within a certain threshold distance of the query's feature vector; threshold storage means 8 that stores that threshold; search result display means 9 that displays a list of the neighborhood data obtained by the neighborhood data determination means 7; data designation means 10 with which the user designates any data in the list of neighborhood data displayed by the search result display means 9; and data display means 11 that displays the content of the designated data. The undefined component complementing means 4, inter-component correlation table storage means 5, inter-vector distance calculation means 6, neighborhood data determination means 7, and threshold storage means 8 together constitute a position group extraction unit 12.

[0014] The query input means 1 and the data designation means 10 are implemented by information input devices such as a mouse and a keyboard, and the search result display means 9 and the data display means 11 by an information display device such as a VDT. The data storage means 3 is implemented by an external storage device such as a fixed disk, and the query vector generation means 2 and the position group extraction unit 12 by programs running on, for example, a PC (personal computer) or WS (workstation).

[0015] The data stored in the data storage means 3 consists of data items, each an arbitrary combination of documents, images, audio, and the like that has zero or more relations to other data items, together with the feature vector corresponding to each item. For example, given data A and data B with feature vectors v[A] and v[B], where one of the relations held by data A is a relation to data B, the data storage means 3 stores them in the form shown in FIG. 2.
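FIG. 2 is not reproduced here, so the following Python sketch only suggests one way the stored form could look: each data item keeps its content, its feature vector, and a list of zero or more relations to other items. The class name `StoredData`, its field names, and the sample vectors are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredData:
    content: str          # stands in for any mix of documents, images, audio
    feature_vector: list  # v[A], v[B], ...
    relations: list = field(default_factory=list)  # keys of related data (zero or more)


# Data A holds one relation, pointing to data B; data B holds none.
store = {
    "A": StoredData("content of A", [0.8, 0.2], relations=["B"]),
    "B": StoredData("content of B", [0.1, 0.9]),
}

print(store["A"].relations)  # ['B']
```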

[0016] Next, the operation of the database search device configured as in FIG. 1 is described with reference to the drawings.

[0017] Referring to FIG. 1, suppose that, at the query input means 1, the user selects keywords using, for example, a keyboard or mouse, and weights each selected keyword with a real number in [0, 1]. Suppose, for example, that a keyword set expressing the features of data such as user interface design examples for arbitrary systems has been defined in advance as shown in FIG. 3(a), and that the user has selected the keywords shown in FIG. 3(b) and weighted them, in order from the top, as "unselected, 0.8, 0.2, unselected, 0.8, 0.7, unselected, 0.1, 0.2". The query vector generation means 2 then generates the feature vector corresponding to the input query (the query vector) as follows:

(x[1], 0.8, 0.2, x[2], 0.8, 0.7, x[3], 0.1, 0.2) ... (a)

Here x[1], x[2], and x[3] denote undefined components.
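The construction of query vector (a) can be sketched in Python as follows. The keyword set of FIG. 3(a) is not reproduced in the text, so the keyword order below is an assumption: only the quoted attribute names appear in the description, their positions are inferred, and "keyword 5" and similar entries are placeholders. Representing an undefined component as `None` is likewise an illustrative choice.

```python
UNDEFINED = None  # marks an unselected keyword, i.e. an undefined component x[i]

# Assumed keyword order (hypothetical; FIG. 3(a) is not available).
KEYWORDS = [
    "for infants", "for the general public", "for the elderly",
    "for beginners", "keyword 5", "keyword 6",
    "visual impairment support", "keyword 8", "keyword 9",
]


def make_query_vector(weights):
    """Return one component per keyword; unselected keywords stay UNDEFINED."""
    vector = []
    for keyword in KEYWORDS:
        weight = weights.get(keyword, UNDEFINED)
        if weight is not UNDEFINED and not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {keyword!r} must lie in [0, 1]")
        vector.append(weight)
    return vector


# Weights from the example: unselected, 0.8, 0.2, unselected, 0.8, 0.7, unselected, 0.1, 0.2
selected = {
    "for the general public": 0.8, "for the elderly": 0.2,
    "keyword 5": 0.8, "keyword 6": 0.7,
    "keyword 8": 0.1, "keyword 9": 0.2,
}
query_a = make_query_vector(selected)
print(query_a)
```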

[0018] Next, for the query vector and for the feature vector of each data item in the data storage means 3 to be searched, if the vector contains undefined components, the undefined component complementing means 4 determines complement values for them from the defined components of that feature vector and the correlation table between attribute values stored in advance in the inter-component correlation table storage means 5.

[0019] FIG. 4 shows an example of the correlation table between attribute values stored in the inter-component correlation table storage means 5. The table lists the attribute values along both the rows and the columns, and each row-column intersection records whether a correlation exists and, if so, its degree. To find the attribute values correlated with a given attribute value α, select α from the attribute values listed along the top and examine the entries where its column meets each row. If a number p is entered, α is correlated with the attribute value shown at the left of that row with correlation degree p; an entry of "-" means there is no correlation. For example, to find the attribute values correlated with "for beginners", examine the entries where the "for beginners" column meets each row. In the figure, "for beginners" is correlated with "for infants", "for the general public", and "for the elderly", with correlation degrees of 0.9, 0.3, and 0.8, respectively.
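The column lookup described above might be sketched in Python as a nested dictionary keyed first by column and then by row. Only the "for beginners" column is filled in, because it is the only one whose entries are stated in this paragraph; the remaining columns of FIG. 4 are omitted. The names `CORRELATION` and `correlated_attributes` are assumptions for illustration.

```python
# CORRELATION[column][row] = correlation degree p.
# A missing row corresponds to a "-" entry (no correlation).
CORRELATION = {
    "for beginners": {
        "for infants": 0.9,
        "for the general public": 0.3,
        "for the elderly": 0.8,
    },
    # Other columns of FIG. 4 are not reproduced here.
}


def correlated_attributes(attribute):
    """Return {row attribute: degree} for the column of the given attribute."""
    return dict(CORRELATION.get(attribute, {}))


for name, degree in correlated_attributes("for beginners").items():
    print(name, degree)
```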

[0020] For the query vector (a), for example, which contains the three undefined components x[1], x[2], and x[3], the undefined component complementing means 4 performs the following processing.

[0021] According to the correlation table of FIG. 4, the attribute value "for beginners", which corresponds to x[2], is correlated with three attribute values: "for infants", "for the general public", and "for the elderly". In the query vector (a), however, "for infants" is the undefined component x[1] and is therefore excluded. Of the remaining two, "for the elderly", which has the higher correlation degree, is selected. This correlation value means that "an elderly user is quite likely to be a beginner." The correlation table gives a correlation degree of 0.8, and the weight assigned to "for the elderly" in the query vector (a) is 0.2, so x[2] is complemented as x[2] = 0.2 × 0.8 = 0.16.

[0022] Similarly, according to the correlation table of FIG. 4, "visual impairment support", which corresponds to x[3], is correlated only with "for the elderly", with a correlation degree of 0.7. Since the weight of "for the elderly" in the query vector (a) is 0.2, x[3] is complemented as x[3] = 0.2 × 0.7 = 0.14.

[0023] The attribute value "for infants", which corresponds to x[1], is correlated in the table of FIG. 4 only with "for beginners", but in the query vector (a) "for beginners" is the undefined component x[2]. A complement value therefore cannot be obtained. In such a case, ⊥ is assigned. ⊥ indicates that the component is not to be used in the subsequent inter-vector distance calculation: when at least one of the two vector components whose difference is to be taken is ⊥, the difference is always taken to be 0. An undefined component that could not be complemented thus has no effect on the subsequent distance calculation, and the inter-vector distance can still serve as an index of the semantic relevance between two data items. As a result of this complementing, the query vector (a) becomes:

(⊥, 0.8, 0.2, 0.16, 0.8, 0.7, 0.14, 0.1, 0.2) ... (b)
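The complementing of query vector (a) into (b) can be sketched in Python as below. This is a sketch under stated assumptions, not the patented implementation: the keyword order is inferred, "keyword 5" and similar names are placeholders, the degree 0.9 for the "for infants" column is an assumed value (the text only says that a correlation with "for beginners" exists), and products are rounded to suppress floating-point noise.

```python
UNDEFINED = None  # undefined component x[i]
BOTTOM = "⊥"      # component that could not be complemented

# Assumed keyword order; placeholder names are hypothetical.
KEYWORDS = [
    "for infants", "for the general public", "for the elderly",
    "for beginners", "keyword 5", "keyword 6",
    "visual impairment support", "keyword 8", "keyword 9",
]

# CORRELATION[undefined attribute][correlated attribute] = degree.
CORRELATION = {
    "for beginners": {"for infants": 0.9, "for the general public": 0.3,
                      "for the elderly": 0.8},
    "visual impairment support": {"for the elderly": 0.7},
    "for infants": {"for beginners": 0.9},  # degree assumed, not from the text
}


def complement(vector):
    """Fill each undefined component from its most strongly correlated defined one."""
    defined = {k: w for k, w in zip(KEYWORDS, vector) if w is not UNDEFINED}
    result = []
    for keyword, weight in zip(KEYWORDS, vector):
        if weight is not UNDEFINED:
            result.append(weight)
            continue
        # Only components defined in the original vector may be used.
        candidates = {s: d for s, d in CORRELATION.get(keyword, {}).items()
                      if s in defined}
        if not candidates:
            result.append(BOTTOM)
            continue
        source = max(candidates, key=candidates.get)  # highest correlation degree
        result.append(round(defined[source] * candidates[source], 10))
    return result


query_a = [UNDEFINED, 0.8, 0.2, UNDEFINED, 0.8, 0.7, UNDEFINED, 0.1, 0.2]
query_b = complement(query_a)
print(query_b)
```

Note that the source attribute is chosen by the highest correlation degree, as in the description, and not by the largest product of weight and degree.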

[0024] In the correlation table of FIG. 4, every attribute value is correlated with at least one other attribute value, but an attribute value with no correlation to any other is also conceivable. If such an attribute value appears as an undefined component, a complement value again cannot be obtained, and ⊥ is assigned in that case as well.

[0025] The inter-vector distance calculation means 6 then calculates the distance between the query vector (b), with its undefined components complemented as above, and the feature vector of each data item. Two definitions of this distance are conceivable: (1) the sum of the absolute values of the component-wise differences, or (2) the Euclidean distance. Either is equally effective as an index of the semantic relevance between the query and each data item to be searched, so the distance is defined here as the sum of the absolute values of the component-wise differences. The neighborhood data determination means 7 then compares this inter-vector distance with the threshold registered in advance in the threshold storage means 8, and if the distance is at or below the threshold, the data item is added to the position group Gp.
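The distance calculation and threshold test can be sketched in Python as follows. The stored vectors, their names, and the threshold of 1.0 are hypothetical values chosen for illustration; only the query vector (b) and the rule that a ⊥ component contributes a difference of 0 come from the description.

```python
BOTTOM = "⊥"  # component excluded from the distance calculation


def distance(u, v):
    """Sum of absolute component differences; a pair containing ⊥ contributes 0."""
    total = 0.0
    for a, b in zip(u, v):
        if a == BOTTOM or b == BOTTOM:
            continue
        total += abs(a - b)
    return total


def position_group(query, stored_vectors, threshold):
    """Names of stored data whose distance from the query is at or below the threshold."""
    return [name for name, vec in stored_vectors.items()
            if distance(query, vec) <= threshold]


query_b = [BOTTOM, 0.8, 0.2, 0.16, 0.8, 0.7, 0.14, 0.1, 0.2]

# Hypothetical stored feature vectors.
stored_vectors = {
    "near": [0.5, 0.8, 0.2, 0.16, 0.8, 0.7, 0.14, 0.1, 0.2],
    "far": [0.0] * 9,
}

gp = position_group(query_b, stored_vectors, threshold=1.0)
print(gp)  # ['near']
```

The first component of "near" differs from nothing, because the query holds ⊥ there, so its distance is 0; "far" differs in every remaining component and falls outside the threshold.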

[0026] The set of data in the position group Gp obtained in this way is the search result for the user's query, and a list of the data it contains and of the relations between them is displayed by the search result display means 9, such as a display device. In the case of hypertext, the list includes items such as the title of each data item. When the user views the list and designates any data item in it with the data designation means 10, the detailed content of the designated item is displayed by the data display means 11.

[0027] As a result, the user can enter a query intuitively without using search expressions, relevance can be calculated efficiently and accurately using feature vectors, and, even when a feature vector contains undefined components, the distance between feature vectors can be calculated without losing its meaning as the semantic relevance between data.

[0028] Next, another example of the database search device is described with reference to the drawings.

[0029] Referring to FIG. 5, the other example of the database search device differs from the example of FIG. 1 in that it includes relational data determination means 13. Where relations between data are defined in advance and the data are classified into several categories, the relational data determination means 13 decides whether data reachable by following a relation from a neighborhood data item should be included in the search result. It makes this decision from the category of each neighborhood data item obtained by the position group extraction unit 12 and the category of the data reachable from it, using predetermined inter-category rules stored in inter-category rule storage means 14. The relational data determination means 13 and the inter-category rule storage means 14 together constitute a relation group extraction unit 15, which is implemented by a program running on, for example, a PC or WS.

[0030] The data stored in the data storage means 3 consists of data items, each an arbitrary combination of documents, images, audio, and the like that has zero or more relations to other data items, together with the feature vector corresponding to each item and the category to which the item belongs. For example, given data A and data B with feature vectors v[A] and v[B] and categories C1 and C2, where one of the relations held by data A is a relation to data B, the data storage means 3 stores them in the form shown in FIG. 6.

[0031] Next, the operation of the database search device shown in FIG. 5 is described in detail with reference to the drawings, focusing on the differences from the database search device shown in FIG. 1.

[0032] Referring to FIG. 5, for any data item A belonging to the position group Gp obtained by the neighborhood data determination means 7, the relational data determination means 13 follows the relations that can be followed from that item and obtains any data item B at the other end. Using the categories of data A and data B and the predetermined inter-category rules stored in the inter-category rule storage means 14, it then determines whether to include data B in the data set Gr. For example, suppose that three categories, "guideline", "design case", and "evaluation case", are defined in advance and that Gp contains data A belonging to "design case". Suppose also that data A has relations to data B, data C, and data D, whose categories are "guideline", "design case", and "evaluation case", respectively, and that the inter-category rule storage means 14 stores the inter-category rules shown in FIG. 8. Referring to FIG. 7, the relational data determination means 13 first initializes Gr as Gr = Gp and then consults the inter-category rules for data A belonging to Gp. The rules specify that relations from "design case" to "guideline" and from "design case" to "evaluation case" are both to be included in Gr, so data B and data D are added to Gr. Relations from "design case" to "design case", on the other hand, are not to be included in Gr, so data C is not added. For data A, therefore, data B and data D are newly added to Gr. This processing is performed for every data item belonging to Gp.
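The worked example above can be sketched in Python as follows. FIG. 8 is not reproduced, so the rule set contains only the two inclusion rules stated in the text, and treating every unlisted category pair as "do not include" is an assumption. The function and variable names are illustrative.

```python
# Category pairs (from, to) whose relations are included in Gr.
# Only the rules stated in the description; other pairs default to exclusion (assumption).
INCLUDE_RULES = {
    ("design case", "guideline"),
    ("design case", "evaluation case"),
}

category = {
    "A": "design case",
    "B": "guideline",
    "C": "design case",
    "D": "evaluation case",
}

relations = {"A": ["B", "C", "D"]}  # data A has relations to B, C, and D


def relation_group(gp, relations, category, include_rules):
    """Start from Gr = Gp and add data one relation away when the rules allow it."""
    gr = list(gp)
    for a in gp:
        for b in relations.get(a, []):
            if (category[a], category[b]) in include_rules and b not in gr:
                gr.append(b)
    return gr


gr = relation_group(["A"], relations, category, INCLUDE_RULES)
print(gr)  # ['A', 'B', 'D']
```

Data C is left out because the pair ("design case", "design case") is not among the inclusion rules.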

【００３３】[0033] The data set Gr obtained in this way is the search result for the user's inquiry. The search result display means 9 displays a list of the data contained in Gr and of the relations between those data. When the user designates any data item with the data instructing means 10, the content of the designated data is displayed by the data display means 11.

【００３４】[0034] As a result, when judging whether each data item in the search result is the data being sought, the user can rely not only on its degree of relevance to the inquiry but also on information about what further data can be reached from it, and can therefore judge from multiple viewpoints.

【００３５】[0035] Next, an embodiment of the database registration device according to the present invention will be described with reference to the drawings.

【００３６】[0036] Referring to FIG. 9, the embodiment of the database registration device according to the present invention includes: content registration means 16 with which a data registrant registers the content of data (a node); keyword set storage means 17 in which a predetermined keyword set is stored; keyword selection means 18 with which the data registrant selects, from that keyword set, keywords that appropriately represent the content of the node; weighting input means 19 with which the data registrant weights each selected keyword with a real number in [0, 1]; feature vector generation means 20 for generating a feature vector from the set of weighted selected keywords; data storage means 21 for storing data (nodes); a position group extraction unit 12 for obtaining a set of related nodes; link direction determination means 22 for determining the direction of a link when a link is added between the registered node and a neighboring node; linking rule storage means 23 in which the rules used for that determination are stored; and node addition means 24 for adding the registered node to the data storage means 21 and adding links between the registered node and the neighboring nodes in the directions determined by the link direction determination means 22.

【００３７】[0037] The data stored in the data storage means 21 is the same as that shown in FIG. 6 described above. Each entry consists of a data item, which is an arbitrary combination of documents, images, audio, and the like and has zero or more relations to other data, together with the feature vector corresponding to that data item and the category to which it belongs.

【００３８】[0038] The internal structure of the position group extraction unit 12 is the same as that of the position group extraction unit 12 forming the main part of the database search device shown in FIG. 1. As shown in FIG. 10, it comprises: inter-component correlation table storage means 5, which stores a table of the degrees of correlation between components of the feature vectors; undefined component complementing means 4, which complements the undefined components of the feature vector of the new node generated by the feature vector generation means 20, and of the feature vectors attached to the nodes stored in the data storage means 21, by using the components already defined in those vectors and the correlation table stored in the inter-component correlation table storage means 5; inter-vector distance calculation means 6, which obtains the distance between the complemented feature vector of the new node and the feature vector of each node stored in the data storage means 21; neighborhood data determination means 7, which determines a node to be neighborhood data for the new node if the distance calculated by the inter-vector distance calculation means 6 is equal to or less than a predetermined threshold; and threshold storage means 8, which stores the predetermined threshold used by the neighborhood data determination means 7. In addition, when no defined component is correlated with an undefined component, so that no complementary value can be obtained for it, the undefined component complementing means 4 replaces that undefined component with a special symbol whose difference from any value is always 0.
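
The behavior of the undefined component complementing means 4 and the inter-vector distance calculation means 6 can be illustrated with a short sketch. This passage does not give the complementing formula, so the correlation-weighted average below is purely an assumption, as are the function names, the use of `None` for the special symbol, and the choice of Euclidean distance. What the sketch takes from the text is that a component with no correlated defined component stays undefined and contributes 0 to the distance.

```python
import math

UNDEF = None  # stands in for the undefined mark and for the special symbol


def complement(vec, corr):
    """Fill undefined components from the defined ones.

    Assumed rule (not from the source): correlation-weighted average of the
    defined components. A component with no correlated defined component
    stays None and later acts as the special symbol.
    """
    out = list(vec)
    for i, value in enumerate(vec):
        if value is not UNDEF:
            continue
        pairs = [
            (corr[i][j], vec[j])
            for j in range(len(vec))
            if j != i and vec[j] is not UNDEF and corr[i][j] != 0
        ]
        if pairs:
            total = sum(abs(c) for c, _ in pairs)
            out[i] = sum(c * w for c, w in pairs) / total
    return out


def distance(u, v):
    """Euclidean distance in which a None on either side contributes 0."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(u, v) if a is not None and b is not None)
    )


# Hypothetical 3-component example: component 1 correlates with component 0,
# component 2 correlates with nothing and therefore stays undefined.
corr = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
filled = complement([0.8, UNDEF, UNDEF], corr)
d = distance(filled, [0.8, 0.5, 0.9])
```

Here the third component is ignored in the distance, so only the second component contributes.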

【００３９】[0039] The content registration means 16, keyword selection means 18, and weighting input means 19 shown in FIG. 9 are information input devices such as a mouse and a keyboard. The keyword set storage means 17, feature vector generation means 20, position group extraction unit 12, link direction determination means 22, linking rule storage means 23, and node addition means 24 are realized, for example, by a program running on a PC or workstation (WS). The data storage means 21 is an external storage device such as a fixed disk.

【００４０】[0040] The operation of the database registration device configured in this way will now be described with reference to the drawings.

【００４１】[0041] Referring to FIG. 9, the registrant first creates the content of the node to be registered as an arbitrary combination of documents, images, and audio, and inputs it with the content registration means 16. The registrant also inputs the category of the registered node (guideline, design case, evaluation case, and so on, described later) with the content registration means 16. Then, using the keyword selection means 18, the registrant selects appropriate keywords from the keyword set stored in advance in the keyword set storage means 17, according to the content of the node that was input. Some keywords therefore remain unselected. The keyword set may be divided into several categories to make it easier for the user to understand. For example, referring to FIG. 11, keywords (attribute values) such as "general", "beginner", and "visual" are classified into four categories (attributes): "age group", "system experience", "task knowledge", and "disability". The registrant then uses the weighting input means 19 to assign each selected keyword a real number in [0, 1] as its importance. A real number may also be assigned to a category. In that case, the real number assigned to the category is assigned to every keyword belonging to it. This amounts to selecting keywords in bulk and reduces the user's effort when entering weights. It may also be permitted to assign a real number to each keyword and, in addition, a real number to a category. In that case, for every keyword belonging to a category that has been given a real number, the product of the real number assigned to the keyword and the real number given to the category is computed and reassigned to the keyword. This reduces the effort of adjusting the weights when searches are performed repeatedly.
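
The two category-weighting options described above can be sketched as follows. The function name, the dictionary layout, and the assignment of keywords to categories in the example are hypothetical. Combining both options in one function is also an assumption: the text does not say what happens to an unweighted keyword inside a weighted category when both kinds of weight are in use, so the sketch lets it inherit the category weight.

```python
def apply_category_weights(keyword_weights, category_weights, category_of):
    """Combine per-keyword and per-category weights into final keyword weights.

    - Keyword weighted, category weighted: product of the two (from the text).
    - Keyword unweighted, category weighted: inherits the category weight
      (from the text for category-only input; assumed for the mixed case).
    - Category unweighted: the keyword keeps its own weight, if it has one.
    """
    result = {}
    for keyword, cat in category_of.items():
        if cat in category_weights:
            if keyword in keyword_weights:
                result[keyword] = keyword_weights[keyword] * category_weights[cat]
            else:
                result[keyword] = category_weights[cat]
        elif keyword in keyword_weights:
            result[keyword] = keyword_weights[keyword]
    return result


# Hypothetical keyword-to-category mapping; FIG. 11 is not reproduced here.
category_of = {
    "beginner": "system experience",
    "expert": "system experience",
    "visual": "disability",
}
final = apply_category_weights(
    keyword_weights={"beginner": 0.8, "visual": 0.5},
    category_weights={"system experience": 0.5},
    category_of=category_of,
)
```

In this example "beginner" ends at 0.8 × 0.5, "expert" inherits 0.5 from its category, and "visual" keeps its own 0.5.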

【００４２】[0042] The feature vector generation means 20 then generates the feature vector corresponding to the registered node from the results obtained so far. Each keyword that was not selected with the keyword selection means 18 is assigned a special symbol x indicating that it is undefined. The symbol x may be a real number outside [0, 1], such as -1, or a character such as "x". The feature vector corresponding to the registered node is the vector obtained by arranging the real numbers assigned to the keywords in a fixed order. For example, for the keyword set given above, suppose that the eight keywords "general", "beginner", "expert", "poor", "normal", "abundant", "visual", and "limb" are selected with the keyword selection means 18 and that the user then sets their importance with the weighting input means 19 as shown in FIG. 12. The feature vector corresponding to the registered data is then (x, 0.8, x, 0.9, x, 0.7, 0.3, 0.8, 0.7, 0.7, x, 0.1), where x denotes an undefined component.
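
A minimal sketch of this vector construction is shown below. The function name, the validation, and the four-keyword ordering in the example are assumptions for illustration. The text itself only requires a fixed keyword order, weights in [0, 1], and an undefined mark for unselected keywords.

```python
UNDEFINED = "x"  # the text allows either a character or a real number outside [0, 1]


def make_feature_vector(keyword_order, weights):
    """Arrange keyword weights in a fixed order, marking unselected keywords."""
    for keyword, weight in weights.items():
        if keyword not in keyword_order:
            raise ValueError(f"unknown keyword: {keyword}")
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {keyword} is outside [0, 1]")
    return [weights.get(keyword, UNDEFINED) for keyword in keyword_order]


# Hypothetical ordering of four keywords; only two of them are selected.
order = ["general", "beginner", "expert", "visual"]
vector = make_feature_vector(order, {"beginner": 0.8, "visual": 0.3})
```

The resulting list has one entry per keyword in the stored set, so vectors of different nodes are directly comparable component by component.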

【００４３】[0043] Next, the position group extraction unit 12 calculates the distance between the feature vector of the registered node determined by the feature vector generation means 20 and the feature vector of each node stored in the data storage means 21, and obtains the set Gp of nodes whose distance is equal to or less than the predetermined threshold. That is, referring to FIG. 10, the undefined component complementing means 4 complements the undefined components of the feature vector of the new node generated by the feature vector generation means 20, and of the feature vectors attached to the nodes stored in the data storage means 21, by using the components already defined in those vectors and the correlation table stored in the inter-component correlation table storage means 5. The inter-vector distance calculation means 6 obtains the distance between the complemented feature vector of the new node and the feature vector of each node stored in the data storage means 21. The neighborhood data determination means 7 determines a node to be neighborhood data for the new node, and includes it in the set Gp, if the distance calculated by the inter-vector distance calculation means 6 is equal to or less than the predetermined threshold.
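
The threshold test that forms Gp can be sketched as below. The sketch assumes that the vectors have already been complemented and are fully numeric, and it uses Euclidean distance (`math.dist`, available in Python 3.8 and later) only as a stand-in. The function name, node identifiers, and the threshold value are hypothetical.

```python
import math


def neighbors(new_vec, stored, threshold, dist=math.dist):
    """Return Gp: ids of stored nodes within `threshold` of the new node."""
    return {
        node_id
        for node_id, vec in stored.items()
        if dist(new_vec, vec) <= threshold
    }


# Hypothetical stored nodes with two-component feature vectors.
stored = {"n1": [0.8, 0.9], "n2": [0.1, 0.2], "n3": [0.7, 0.7]}
gp = neighbors([0.8, 0.8], stored, threshold=0.2)
```

Passing the distance function as a parameter keeps the threshold test separate from the choice of metric, so a distance that ignores undefined components could be substituted without changing this step.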

【００４４】[0044] Next, for each node in the set Gp obtained by the position group extraction unit 12, the link direction determination means 22 determines the direction of the link with respect to the registered node. The linking rules stored in advance in the linking rule storage means 23 of FIG. 9 are applied for this purpose. The linking rules are defined on the basis of node categories. For example, for the three categories "guideline", "design case", and "evaluation case", the linking rules are as shown in FIG. 13. Suppose that the registered node belongs to "guideline" and that nodes A, B, and C belong to "guideline", "design case", and "evaluation case", respectively. Then, according to the linking rules of FIG. 13, the link directions between the registered node and each of nodes A, B, and C are determined as shown in FIG. 14.
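
A category-based rule lookup of this kind can be sketched as follows. The contents of FIG. 13 and FIG. 14 are not reproduced in this text, so every direction in `LINK_RULES` is an invented placeholder and does not represent the patent's actual rules. The function name and the direction labels are likewise assumptions.

```python
# (category of new node, category of neighbor) -> link direction.
# All values are hypothetical placeholders, not the rules of FIG. 13.
LINK_RULES = {
    ("guideline", "guideline"): "both",
    ("guideline", "design case"): "new->neighbor",
    ("guideline", "evaluation case"): "neighbor->new",
}


def link_directions(new_category, gp, category, rules):
    """Map each neighbor in Gp to a link direction, or None if no rule applies."""
    return {node: rules.get((new_category, category[node])) for node in gp}


category = {"A": "guideline", "B": "design case", "C": "evaluation case"}
directions = link_directions("guideline", {"A", "B", "C"}, category, LINK_RULES)
```

Because the lookup depends only on the pair of categories, registering a node never requires the registrant to inspect individual neighbors.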

【００４５】[0045] Finally, the node addition means 24 adds the links to (or from) each node belonging to the position group Gp in the directions determined by the link direction determination means 22, and stores the registered node, its feature vector and category, and those links in the data storage means 21.
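
The final storage step can be sketched as follows, using a plain dictionary as a stand-in for the data storage means 21. The store layout, the function name `add_node`, and the direction labels ("new->neighbor", "neighbor->new", "both") are all hypothetical.

```python
def add_node(store, node_id, content, vector, category, directions):
    """Store the new node and add directed links to or from its neighbors."""
    store["nodes"][node_id] = {
        "content": content,
        "vector": vector,
        "category": category,
    }
    for neighbor, direction in directions.items():
        if direction in ("new->neighbor", "both"):
            store["links"].add((node_id, neighbor))
        if direction in ("neighbor->new", "both"):
            store["links"].add((neighbor, node_id))
        # a direction of None means no link is added for that neighbor


store = {"nodes": {}, "links": set()}
add_node(
    store,
    "new",
    content="example content",
    vector=[0.8, "x", 0.3],
    category="guideline",
    directions={"A": "both", "B": "new->neighbor", "C": "neighbor->new", "D": None},
)
```

Links are kept as (source, target) pairs, so a bidirectional link is simply stored as two pairs.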

【００４６】[0046] As described above, when new data is registered, the direction of the link to be added between the new data and each neighborhood data item is determined from the categories to which the new data and the neighborhood data belong and from the category-based linking rules. Links are then added automatically in those directions, and the new data and its link information are stored. This reduces the effort required to search for related nodes and add links when a new hypertext node is created.

【００４７】[0047]

【発明の効果】[Effects of the Invention] As described above, the present invention provides the following effects.

【００４８】[0048] According to the inventions of claims 1 and 2, when data is registered, the user freely selects keywords for the input data from a predetermined keyword set, and the feature vector of the input data is generated from the result of weighting each selected keyword. Atypical data consisting of any combination of documents, images, audio, and the like can therefore be characterized by a single method that is easy for the data registrant to understand, and the effort of assigning appropriate keywords is reduced. In addition, the direction of the link to be added between the new data and the neighborhood data is determined from the categories to which the new data and the neighborhood data belong and from the category-based linking rules, links are added automatically in those directions, and the new data and its link information are stored. This reduces the effort required to search for related nodes and add links when a new hypertext node is created.

【００４９】[0049] According to the invention of claim 3, when there is no defined component needed to obtain the complementary value of an undefined part of a feature vector, that undefined part is replaced with a special symbol whose difference from any value is 0, so the inter-vector distance calculation means in the subsequent stage ignores the positions where undefined parts remain. Consequently, the degree of relevance can be calculated efficiently from the feature vectors even when they contain many undefined parts.

[Brief description of drawings]

【図１】FIG. 1 is a block diagram showing an example of a database search device.

【図２】FIG. 2 is an explanatory diagram of the data stored in the data storage means in FIG. 1.

【図３】FIG. 3 is a diagram showing an example of the keyword set presented to the user when an inquiry is input and an example of the keywords actually selected by the user.

【図４】FIG. 4 is a diagram showing an example of an inter-component correlation table.

【図５】FIG. 5 is a block diagram showing another example of a database search device.

【図６】FIG. 6 is an explanatory diagram of the data stored in the data storage means in FIG. 5.

【図７】FIG. 7 is a diagram explaining the operation of the relational data determination means.

【図８】FIG. 8 is a diagram showing an example of inter-category rules.

【図９】FIG. 9 is a block diagram of an embodiment of the database registration device according to the present invention.

【図１０】FIG. 10 is a block diagram showing a configuration example of the position group extraction unit in the database registration device.

【図１１】FIG. 11 is a diagram showing an example of the keyword set (attribute values) stored in the keyword set storage means and an example of the categories.

【図１２】FIG. 12 is a diagram showing an example of the keywords (attribute values) selected by a data registrant for a new node at registration and their importance (weights).

【図１３】FIG. 13 is a diagram showing an example of linking rules.

【図１４】FIG. 14 is a diagram showing an example of the links given to a registered new node.

[Explanation of symbols]

1 Inquiry input means
2 Inquiry vector generation means
3 Data storage means
4 Undefined component complementing means
5 Inter-component correlation table storage means
6 Inter-vector distance calculation means
7 Neighborhood data determination means
8 Threshold storage means
9 Search result display means
10 Data instructing means
11 Data display means
12 Position group extraction unit
13 Relational data determination means
14 Inter-category rule storage means
15 Relation group extraction unit
16 Content registration means
17 Keyword set storage means
18 Keyword selection means
19 Weighting input means
20 Feature vector generation means
21 Data storage means
22 Link direction determination means
23 Linking rule storage means
24 Node addition means

Continuation of front page (58) Fields searched (Int. Cl.7, DB name): G06F 17/30, G06F 12/00, JICST file (JOIS)

Claims

(57) [Claims]

1. A database registration device comprising: content registration means with which a data registrant inputs the content of new data; keyword selection means with which the data registrant selects, from a predetermined keyword set, keywords suited to expressing the characteristics of the new data; keyword set storage means for storing the keyword set used by the keyword selection means; weighting input means with which the data registrant inputs, for each keyword selected by the keyword selection means, a weight that is a real number in [0, 1]; feature vector generation means for generating a feature vector of the new data from the keywords selected by the keyword selection means and the weights input by the weighting input means; data storage means for storing data; position group extraction means for obtaining, as neighborhood data for the new data, those data stored in the data storage means for which the distance between the feature vector of the new data generated by the feature vector generation means and the feature vector attached to the stored data is equal to or less than a predetermined threshold; link direction determination means for determining, from the category of the neighborhood data obtained by the position group extraction means and the category of the new data and by using predetermined linking rules, the direction of a link to be newly added between the neighborhood data and the new data; and node addition means for adding a link between the neighborhood data and the new data in the direction determined by the link direction determination means, and for storing the new data, the feature vector of the new data, and information on the link in the data storage means.

2. A database registration device comprising: content registration means with which a data registrant inputs the content of new data; keyword selection means with which the data registrant selects, from a predetermined keyword set, keywords suited to expressing the characteristics of the new data; keyword set storage means for storing the keyword set used by the keyword selection means; weighting input means with which the data registrant inputs, for each keyword selected by the keyword selection means, a weight that is a real number in [0, 1]; feature vector generation means for generating a feature vector of the new data from the keywords selected by the keyword selection means and the weights input by the weighting input means; data storage means for storing data; inter-component correlation table storage means for storing a table of the degrees of correlation between components of the feature vectors; undefined component complementing means for complementing the undefined components of the feature vector of the new data generated by the feature vector generation means, and of the feature vector attached to each data item stored in the data storage means, by using the components defined in those vectors and the correlation table stored in the inter-component correlation table storage means; inter-vector distance calculation means for obtaining the distance between the feature vector of the new data whose undefined parts have been complemented by the undefined component complementing means and the feature vector of data stored in the data storage means; neighborhood data determination means for determining that data to be neighborhood data for the new data if the distance between the feature vectors calculated by the inter-vector distance calculation means is equal to or less than a predetermined threshold; threshold storage means for storing the predetermined threshold used by the neighborhood data determination means in determining neighborhood data; link direction determination means for determining, from the category of the neighborhood data determined by the neighborhood data determination means and the category of the new data and by using predetermined linking rules, the direction of a link to be newly added between the neighborhood data and the new data; and node addition means for adding a link between the neighborhood data and the new data in the direction determined by the link direction determination means, and for storing the new data, the feature vector of the new data, and information on the link in the data storage means.

3. The database registration device, wherein, when there is no defined component that is correlated with an undefined component and is needed to obtain a complementary value for that undefined component in the feature vector generated by the feature vector generation means or in the feature vector of data stored in advance, the undefined component complementing means replaces the undefined component with a special symbol whose difference from any value is always 0.