JPH10198682A

JPH10198682A - Database retrieving device and database registering device

Info

Publication number: JPH10198682A
Application number: JP8358902A
Authority: JP
Inventors: Masayuki Nakae; 政行中江; Hidehiko Okada; 英彦岡田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-12-28
Filing date: 1996-12-28
Publication date: 1998-07-31
Anticipated expiration: 2016-12-28
Also published as: JP2943748B2

Abstract

PROBLEM TO BE SOLVED: To enable a database retrieving device that calculates the degree of association between an inquiry and previously stored data by using feature vectors to calculate that degree of association even when a feature vector includes an undefined component. SOLUTION: Before the distance between the inquiry vector and the feature vector of each data item is calculated, if an undefined component exists in the inquiry vector generated by an inquiry vector generation means 2 or in the feature vector of a data item stored in a data storage means 3, an undefined component interpolating means 4 interpolates the undefined component. It does so by using a correlation table, stored in an inter-component correlation table storing means 5, that indicates the degree of correlation among the components of a feature vector, together with the other defined components of the feature vector that includes the undefined component. When no correlated defined component exists, the undefined component cannot be interpolated, so a special code whose difference from any value is treated as '0' is substituted for the undefined component so that it does not influence the distance calculation between feature vectors.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of the Invention] The present invention relates to a database search device and a database registration device that enable efficient and highly accurate retrieval of related data from atypical data.

【０００２】[0002]

[Description of the Related Art] Conventionally, databases that enable searches using the semantic information of data, such as databases of technical documents or of library collections, have generally extracted keywords automatically from the stored data and calculated the degree of association between data items using, for example, a thesaurus compiled empirically in advance.

[0003] On the other hand, for data that does not depend on language, such as images and audio, a known method represents the features of the data as a vector and expresses the degree of association between two data items as the distance between their vectors. Examples include methods that take the Euclidean distance between the feature vectors of speech patterns or character patterns as the similarity, as in JP-A-58-147799, JP-A-60-189585, JP-A-64-17182, and JP-A-4-15777. Other methods take the sum of the absolute values of the differences between corresponding components of the feature vectors as the similarity, as in JP-A-60-202491 and JP-A-63-780. In either method, the feature vector is a sequence of physical quantities that characterize the audio or image data, and it is assumed that every component of the feature vector has been determined in advance.
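The two similarity measures mentioned above can be sketched as follows. This is only an illustrative Python sketch, not code from the cited publications; the function names `euclidean_distance` and `absolute_difference_sum` are assumptions. It also shows why both measures presuppose fully determined vectors: a missing component, modeled here as `None`, makes the arithmetic fail.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two fully defined feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def absolute_difference_sum(u, v):
    """Sum of the absolute values of the component-wise differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


# Both measures work only when every component is a number.
# A vector such as [None, 0.5] (None standing for an undefined
# component) raises TypeError in either function.
```

A smaller value means the two data items are more similar under either measure.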

[0004] In recent years, hypertext has attracted attention as a data format for handling documents, images, audio, and the like in a unified manner. Hypertext consists of information units called nodes, each composed of documents, images, audio, and so on, and nodes can be associated with one another by links. In the database field as well, handling data expressed as hypertext has led to its use as a new medium that makes no distinction between data types such as documents, images, and audio.

[0005] When a new node is added to a hypertext system, a technique is desired for automatically adding links from the new node to related nodes, or from related nodes to the new node. In one such technique, described in JP-A-3-278247, whenever the content of a node is changed, either the difference information or the full information of the node is retained as a new version of the node, and a link is added from the new node to the node as it was before the change. In another technique, described in JP-A-4-317172, nodes used for searching (search nodes) are prepared in addition to nodes holding ordinary information (search target nodes) and are linked to the corresponding search target nodes; when a search target node changes, the links of the search nodes that need changing are updated or added. Finally, in the technique of JP-A-5-20363, qualitative search conditions are recorded in advance for each piece of information, the search conditions are displayed together with the information, and other information is retrieved by fuzzy search based on the search condition selected there. A qualitative search condition here means a condition, such as the darkness of an image, that can be judged from a physical quantity such as image contrast. As for the display of search results, hypertext databases have conventionally displayed a list of links to the nodes that match the search expression.

【０００６】[0006]

[Problems to Be Solved by the Invention] The first problem concerns data search. The conventional means of calculating the degree of association between a query entered by a user and already stored data represents the features of the query and of the data as vectors and takes the distance between the vectors as the degree of association. Because this method assumes that every component of the feature vectors has been determined, the distance cannot be calculated when some vector components are undefined.

[0007] The second problem concerns searching a hypertext database. With conventional methods, the search result is merely a list of nodes related to the query entered by the user, and the link information for each node in the result is not shown. It is therefore difficult for the user to judge whether each node in the search result is what he or she is looking for.

[0008] The third problem concerns data registration. In the prior art, the meaning of data is defined either by automatically extracting keywords from the data or by letting the registrant register keywords freely. When the data must be characterized from various viewpoints, such as user characteristics, task characteristics, and system characteristics, as is the case for data about user interfaces, and when the appropriate vocabulary for keywords is unclear, it is difficult to define keywords that properly represent the meaning of the data, and therefore difficult to characterize the data appropriately.

[0009] The fourth problem concerns hypertext databases. Conventional methods that automatically generate links when a new node is registered, using qualitative information attached to each node, (1) cannot handle atypical data that allows arbitrary combinations of documents, images, audio, and the like, and (2) cannot dynamically determine the direction of the link between a pair of nodes that are to be linked.

[0010] Accordingly, an object of the present invention is to provide a database registration device that can characterize, by a single method, atypical data formed by freely combining documents, images, audio, and the like, and that automatically links newly registered data with related data already stored, and to provide a database search device that can retrieve related data accurately and efficiently and whose search results are easy to survey.

【００１１】[0011]

[Means for Solving the Problems] The database search device of the present invention comprises: query input means for inputting a query from a user; data storage means for storing data and the feature vectors of the data; query vector generating means for converting the query input through the query input means into a query vector; inter-component correlation table storage means for storing a table of the degrees of correlation between components of a feature vector; undefined component complementing means for complementing undefined components of the query vector generated by the query vector generating means and of the feature vector attached to each data item stored in the data storage means, using the defined components of those vectors and the table of inter-component correlation degrees stored in the inter-component correlation table storage means; inter-vector distance calculating means for obtaining the distance between the query vector whose undefined parts have been complemented by the undefined component complementing means and the feature vector of the data stored in the data storage means; neighborhood data determination means for determining that the data is neighborhood data for the query if the distance between the feature vectors calculated by the inter-vector distance calculating means is equal to or less than a predetermined threshold; threshold storage means for storing the predetermined threshold used by the neighborhood data determination means to determine neighborhood data; search result display means for displaying a list of the data determined to be neighborhood data by the neighborhood data determination means; data indicating means with which the user indicates, from the list of data displayed by the search result display means, the data whose content is to be displayed; and display means for displaying the content of the data indicated through the data indicating means.

[0012] In a database search device configured in this way, the undefined component complementing means complements the undefined parts of the query vector and of the feature vectors of the already stored data. Consequently, even when a query vector or a feature vector contains undefined components, the distance between the vectors can be calculated and the degree of association measured.

[0013] Furthermore, when there is no defined component correlated with an undefined component of the query vector or of the feature vector of previously stored data, so that the complement value for that undefined component cannot be obtained, the undefined component complementing means replaces the undefined component with a special symbol whose difference from any value is always 0. As a result, even when an undefined component cannot be complemented, the distance between the vectors can still be obtained and the degree of association measured.
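The special symbol described above can be modeled as a sentinel value whose difference from anything is defined to be 0. The following Python sketch is one possible way to express this; the names `Bottom`, `BOTTOM`, and `component_difference` are hypothetical and do not appear in the source.

```python
class Bottom:
    """Sentinel for an undefined component that could not be complemented."""

    def __repr__(self):
        return "BOTTOM"


# Single shared instance used as the special symbol (hypothetical name).
BOTTOM = Bottom()


def component_difference(a, b):
    """Difference between two components; always 0 if either is BOTTOM."""
    if a is BOTTOM or b is BOTTOM:
        return 0.0
    return a - b
```

Because the difference is forced to 0, a component marked this way contributes nothing to a distance built from component-wise differences.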

[0014] Furthermore, for the case where relations between data items are defined in advance and the data is classified into several categories, the database search device of the present invention comprises relational data determination means that determines whether data reachable by following relations from a neighborhood data item is to be included in the search result. The determination is made from the category of each neighborhood data item obtained by the neighborhood data determination means and the category of the data reachable from it, using predetermined inter-category rules stored in inter-category rule storage means.

[0015] The database registration device of the present invention, on the other hand, comprises: content registration means with which a data registrant inputs the content of new data; keyword selection means with which the data registrant selects, from a predetermined keyword set, keywords suitable for expressing the features of the new data; keyword set storage means for storing the keyword set used by the keyword selection means; weighting input means with which the data registrant inputs, for each keyword selected through the keyword selection means, a weight given as a real number in [0, 1]; feature vector generating means for generating the feature vector of the new data from the keywords selected through the keyword selection means and the weights input through the weighting input means; and data accumulation means for accumulating the new data and the feature vector of the new data.

[0016] With a database registration device configured in this way, appropriate keywords can be assigned by a single method to atypical data consisting of any combination of text, images, audio, and the like.

[0017] The database registration device of the present invention further comprises: inter-component correlation table storage means for storing a table of the degrees of correlation between components of a feature vector; undefined component complementing means for complementing undefined components of the feature vector of the new data generated by the feature vector generating means and of the feature vector attached to each data item stored in the data accumulation means, using the defined components of those vectors and the table of inter-component correlation degrees stored in the inter-component correlation table storage means; inter-vector distance calculating means for obtaining the distance between the feature vector of the new data whose undefined parts have been complemented by the undefined component complementing means and the feature vector of the data stored in the data accumulation means; neighborhood data determination means for determining that the data is neighborhood data for the new data if the distance between the feature vectors calculated by the inter-vector distance calculating means is equal to or less than a predetermined threshold; threshold storage means for storing the predetermined threshold used by the neighborhood data determination means to determine neighborhood data; link direction determination means for determining, from the category of the neighborhood data determined by the neighborhood data determination means and the category of the new data and using predetermined linking rules, the direction of the link to be newly added between the neighborhood data and the new data; and node addition means for adding a link between the neighborhood data and the new data in the direction determined by the link direction determination means and accumulating the new data and information about the link in the data accumulation means.

[0018] This makes it possible, when a new hypertext node is created, to link it automatically to other semantically related nodes.

【００１９】[0019]

[Embodiments of the Invention] Next, examples of embodiments of the present invention will be described in detail with reference to the drawings.

[0020] First, a first embodiment of the database search device according to the present invention will be described in detail with reference to the drawings.

[0021] Referring to FIG. 1, the first embodiment of the database search device according to the present invention includes: query input means 1 with which a user inputs the features of the desired data; query vector generating means 2 for converting the input query into a feature vector; data storage means 3 in which data and the feature vectors of the data are stored; undefined component complementing means 4 for complementing undefined components contained in a feature vector; inter-component correlation table storage means 5 in which the inter-component correlation table used by the undefined component complementing means 4 is stored; inter-vector distance calculating means 6 for obtaining the distance between the feature vectors of the query and of the stored data; neighborhood data determination means 7 for determining the stored data whose feature vectors lie at or within a certain threshold distance of the feature vector of the query; threshold storage means 8 in which that threshold is stored; search result display means 9 for displaying a list of the neighborhood data obtained by the neighborhood data determination means 7; data indicating means 10 with which the user indicates any data item in the list of neighborhood data displayed by the search result display means 9; and data display means 11 for displaying the content of the indicated data. The undefined component complementing means 4, the inter-component correlation table storage means 5, the inter-vector distance calculating means 6, the neighborhood data determination means 7, and the threshold storage means 8 together constitute a position group extraction unit 12.

[0022] The query input means 1 and the data indicating means 10 are implemented by information input devices such as a mouse and a keyboard, and the search result display means 9 and the data display means 11 by an information display device such as a VDT. The data storage means 3 is implemented by an external storage device such as a fixed disk, and the query vector generating means 2 and the position group extraction unit 12 by programs running on, for example, a PC (personal computer) or a WS (workstation).

[0023] The data stored in the data storage means 3 consists of data items, each an arbitrary combination of documents, images, audio, and the like with zero or more relations to other data items, and the feature vector corresponding to each data item. For example, suppose there are data A and data B with feature vectors v[A] and v[B], respectively, and that one of the relations held by data A is a relation to data B. The data storage means 3 then stores them in the form shown in FIG. 2.

[0024] Next, the operation of the database search device configured as shown in FIG. 1 will be described with reference to the drawings.

[0025] Referring to FIG. 1, suppose that, at the query input means 1, the user selects keywords using, for example, a keyboard or a mouse and assigns each a weight given as a real number in [0, 1]. For example, suppose that a keyword set representing the features of data such as user interface design examples for arbitrary systems has been defined in advance as shown in FIG. 3(a), and that the user has now selected the keywords shown in FIG. 3(b) and weighted them, in order from the top, as "unselected, 0.8, 0.2, unselected, 0.8, 0.7, unselected, 0.1, 0.2". The query vector generating means 2 then generates the feature vector corresponding to the input query (the query vector) as follows.
(x[1], 0.8, 0.2, x[2], 0.8, 0.7, x[3], 0.1, 0.2) …
Here, x[1], x[2], and x[3] denote undefined components.
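The query vector generation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: FIG. 3 is not reproduced in the text, so the keyword names "keyword 2", "keyword 3", and so on are hypothetical placeholders; only "for infants", "for beginners", and "visual impairment support" are taken from the description of x[1], x[2], and x[3]. Undefined components are represented by `None`, and the function name `make_query_vector` is also an assumption.

```python
# Keyword set in display order. Names other than the three attribute
# values mentioned in the description are hypothetical placeholders.
KEYWORDS = [
    "for infants", "keyword 2", "keyword 3",
    "for beginners", "keyword 5", "keyword 6",
    "visual impairment support", "keyword 8", "keyword 9",
]


def make_query_vector(keywords, weights):
    """Build a query vector; unselected keywords become None (undefined)."""
    for name, weight in weights.items():
        if name not in keywords:
            raise KeyError(f"unknown keyword: {name}")
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {name} must lie in [0, 1]")
    return [weights.get(name) for name in keywords]


# Weights entered by the user in the example; unselected keywords are omitted.
selected = {
    "keyword 2": 0.8, "keyword 3": 0.2, "keyword 5": 0.8,
    "keyword 6": 0.7, "keyword 8": 0.1, "keyword 9": 0.2,
}
query = make_query_vector(KEYWORDS, selected)
```

With these inputs, the three `None` entries in `query` sit at the positions of x[1], x[2], and x[3].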

[0026] Next, for the query vector and for the feature vector of each data item in the data storage means 3 that is subject to the search, if the vector contains undefined components, the undefined component complementing means 4 determines complement values for them from the defined components of that vector and the correlation table between attribute values stored in advance in the inter-component correlation table storage means 5.

[0027] FIG. 4 shows an example of the correlation table between attribute values stored in the inter-component correlation table storage means 5. In this table the attribute values are listed along both the rows and the columns, and each row-column intersection records whether a correlation exists and, if so, its degree. To find the other attribute values correlated with a given attribute value α, select α from the attribute values listed along the top and examine the values at the intersections of that column with each row. If a number p is entered, α is correlated with the attribute value shown at the left of that row, with correlation degree p; a "−" means there is no correlation. For example, to find the attribute values correlated with "for beginners", examine the intersections of the "for beginners" column at the top of the table with each row. In the figure, "for beginners" is correlated with "for infants", "for the general public", and "for the elderly", with correlation degrees 0.9, 0.3, and 0.8, respectively.
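The column lookup described above can be sketched as follows. FIG. 4 itself is not reproduced in the text, so this sketch contains only the entries stated in the description; the dictionary layout (keys of the form column attribute, row attribute) and the name `correlated_attributes` are assumptions made for illustration. Pairs absent from the dictionary play the role of the "−" entries.

```python
# Correlation degrees keyed by (column attribute, row attribute).
# Only the entries stated in the description are included.
CORRELATION_TABLE = {
    ("for beginners", "for infants"): 0.9,
    ("for beginners", "for the general public"): 0.3,
    ("for beginners", "for the elderly"): 0.8,
}


def correlated_attributes(table, attribute):
    """Return {row attribute: degree} for every entry in the given column."""
    return {
        row: degree
        for (column, row), degree in table.items()
        if column == attribute
    }
```

Looking up "for beginners" yields the three correlated attribute values with their degrees, while an attribute with no entries yields an empty dictionary.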

[0028] In the case of the query vector above, for example, which contains the three undefined components x[1], x[2], and x[3], the undefined component complementing means 4 performs the following processing.

[0029] According to the correlation table of FIG. 4, the attribute value "for beginners", which corresponds to x[2], is correlated with three attribute values: "for infants", "for the general public", and "for the elderly". In the query vector, however, "for infants" is the undefined component x[1] and is therefore excluded. Of the remaining two, "for the general public" and "for the elderly", the one with the highest correlation degree, "for the elderly", is selected. This correlation value means that "an elderly user is quite likely to be a beginner". The correlation degree in the table is 0.8 and the weight assigned to "for the elderly" in the query vector is 0.2, so the component is complemented as x[2] = 0.2 · 0.8 = 0.16.

[0030] Similarly, according to the correlation table of FIG. 4, "visual impairment support", which corresponds to x[3], is correlated only with "for the elderly", with a correlation degree of 0.7. Since the weight of "for the elderly" in the query vector is 0.2, the component is complemented as x[3] = 0.2 · 0.7 = 0.14.

[0031] The attribute value "for infants", which corresponds to x[1], is correlated only with "for beginners" in the correlation table of FIG. 4, but in the query vector "for beginners" is the undefined component x[2]. A complement value therefore cannot be obtained. In such a case, ⊥ is assigned. ⊥ indicates that the component is not to be used in the subsequent inter-vector distance calculation: when at least one of the two vector components whose difference is to be taken is ⊥, the difference is always taken to be 0. In this way, undefined components that could not be complemented do not affect the subsequent distance calculation, so the inter-vector distance can still serve as an index of the semantic association between two data items. As a result of this complementing, the query vector becomes the following.
(⊥, 0.8, 0.2, 0.16, 0.8, 0.7, 0.14, 0.1, 0.2) …
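The complementing procedure worked through above can be sketched in Python. This is a sketch under stated assumptions, not the patent's implementation: vectors are dictionaries keyed by attribute value, `None` marks an undefined component, the string "⊥" marks a component that could not be complemented, and the names `complement` and `is_defined` are hypothetical. The correlation degree in the "for infants" column and the weight of "for the general public" are not stated in the text and are assumed here; neither affects the result.

```python
BOTTOM = "⊥"  # marker for a component that could not be complemented

# Correlation degrees as {column attribute: {row attribute: degree}}.
CORRELATION = {
    "for beginners": {"for infants": 0.9,
                      "for the general public": 0.3,
                      "for the elderly": 0.8},
    "visual impairment support": {"for the elderly": 0.7},
    "for infants": {"for beginners": 0.9},  # degree assumed, not in the text
}


def is_defined(value):
    return value is not None and value != BOTTOM


def complement(vector, correlation):
    """Fill each undefined component from its most strongly correlated
    component that was defined in the ORIGINAL vector, or mark it BOTTOM."""
    completed = {}
    for name, weight in vector.items():
        if weight is not None:
            completed[name] = weight
            continue
        usable = {other: degree
                  for other, degree in correlation.get(name, {}).items()
                  if is_defined(vector.get(other))}
        if not usable:
            completed[name] = BOTTOM
            continue
        best = max(usable, key=usable.get)
        completed[name] = vector[best] * usable[best]
    return completed


query = {
    "for infants": None,                # x[1]
    "for beginners": None,              # x[2]
    "visual impairment support": None,  # x[3]
    "for the elderly": 0.2,
    "for the general public": 0.8,      # weight assumed, not in the text
}
completed = complement(query, CORRELATION)
```

Note that the lookup consults the original vector rather than values complemented earlier in the loop, which is why "for infants" is marked ⊥ even though "for beginners" receives a complement value of about 0.16.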

[0032] In the correlation table of FIG. 4, every attribute value is correlated with at least one other attribute value, but an attribute value with no correlation to any other attribute value is also conceivable. If such an attribute value appears as an undefined component, a complement value cannot be obtained, just as above. In that case too, ⊥ is assigned.

[0033] After this, the inter-vector distance calculating means 6 calculates the distance between the query vector, whose undefined components have been complemented as described above, and the feature vector of each data item. Two definitions of this distance are conceivable: (1) the sum of the absolute values of the component-wise differences, or (2) the Euclidean distance. Either serves equally well as an index of the semantic association between the query and each data item subject to the search, so the distance is defined here as the sum of the absolute values of the differences between components. The neighborhood data determination means 7 then compares this inter-vector distance with the threshold registered in advance in the threshold storage means 8, and if the distance is equal to or less than the threshold, the data item is added to the position group Gp.
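The distance calculation and threshold test described above can be sketched as follows. The query vector is the complemented one from the description; the stored vectors "A" and "B", the threshold 0.5, and the names `distance` and `position_group` are hypothetical values chosen only for illustration. A component marked "⊥" on either side contributes 0 to the distance.

```python
BOTTOM = "⊥"  # component excluded from the distance calculation


def distance(u, v):
    """Sum of absolute component-wise differences; BOTTOM contributes 0."""
    total = 0.0
    for a, b in zip(u, v):
        if a == BOTTOM or b == BOTTOM:
            continue
        total += abs(a - b)
    return total


def position_group(query, stored, threshold):
    """Names of stored data whose distance from the query is <= threshold."""
    return [name for name, vector in stored.items()
            if distance(query, vector) <= threshold]


# Complemented query vector from the description.
query = [BOTTOM, 0.8, 0.2, 0.16, 0.8, 0.7, 0.14, 0.1, 0.2]

# Hypothetical stored feature vectors and threshold.
stored = {
    "A": [0.5, 0.8, 0.2, 0.2, 0.8, 0.7, 0.1, 0.1, 0.2],
    "B": [0.0, 0.1, 0.9, 0.9, 0.1, 0.0, 0.9, 0.9, 0.9],
}
group = position_group(query, stored, threshold=0.5)
```

With these illustrative values, "A" differs from the query only in the two complemented components and falls within the threshold, while "B" does not; the first component of each stored vector is ignored because the query holds ⊥ there.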

[0034] The set of data in the position group Gp obtained in this way is the search result for the user's query, and a list of the data it contains and of the relations among them is displayed by the search result display means 9, such as a display device. In the case of hypertext, this list includes the title of each data item and the like. When the user looks at the list and indicates any data item in it using the data indicating means 10, the detailed content of the indicated data is displayed by the data display means 11.

[0035] As described above, the user can enter a query intuitively without using a search expression or the like. The degree of relevance can be calculated efficiently and accurately using feature vectors. Even when a feature vector contains undefined components, the distance between feature vectors can be calculated without losing its meaning as the degree of semantic relevance between data.

[0036] Next, a second embodiment of the database search device according to the present invention will be described with reference to the drawings.

[0037] Referring to FIG. 5, the second embodiment of the database search device according to the present invention differs from the database search device of the first embodiment shown in FIG. 1 in that it includes relational data determination means 13. This means handles the case where relations between data are defined in advance and the data are classified into several categories. It determines whether data reachable by following a relation from neighborhood data is to be included in the search result. The determination is made from the category to which each neighborhood data obtained by the position group extracting unit 12 belongs and the category to which the reachable data belongs, using predetermined inter-category rules stored in the inter-category rule storage means 14. The relational data determination means 13 and the inter-category rule storage means 14 together constitute a relation group extracting unit 15. The relation group extracting unit 15 is realized, for example, by a program running on a PC or WS.

[0038] The data stored in the data storage means 3 consists of data that is an arbitrary combination of documents, images, sounds, and the like and that has zero or more relations to other data, the feature vector corresponding to that data, and the category to which that data belongs. For example, suppose there are data A and data B whose feature vectors are v[A] and v[B] and whose categories are C1 and C2, respectively, and that one of the relations held by data A is a relation to data B. The data is then stored in the data storage means 3 in the form shown in FIG. 6.

[0039] Next, the operation of the database search device shown in FIG. 5 will be described in detail with reference to the drawings, focusing on the differences from the database search device shown in FIG. 1.

[0040] Referring to FIG. 5, for any data A belonging to the position group Gp obtained by the neighborhood data determination means 7, the relational data determination means 13 follows the relations that can be followed from that data and obtains any data B at the far end. Then, using the categories to which data A and data B belong and the predetermined inter-category rules stored in the inter-category rule storage means 14, it determines whether data B is to be included in the data set Gr. For example, suppose that three categories, "guideline", "design case", and "evaluation case", are defined in advance, and that Gp contains data A belonging to "design case". Suppose further that data A has relations to data B, data C, and data D, whose categories are "guideline", "design case", and "evaluation case", respectively, and that the inter-category rules shown in FIG. 8 are stored in the inter-category rule storage means 14. In this case, referring to FIG. 7, the relational data determination means 13 first initializes Gr as Gr = Gp and then consults the inter-category rules for data A belonging to Gp. The inter-category rules specify that the relation from "design case" to "guideline" and the relation from "design case" to "evaluation case" are both to be included in Gr, so data B and data D are added to Gr. On the other hand, the relation from "design case" to "design case" is specified as not to be included in Gr, so data C is not added to Gr. Thus, for data A, data B and data D are newly added to Gr. This processing is performed for all data belonging to Gp.
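The construction of Gr in paragraph [0040] can be sketched as follows. This is a minimal sketch under assumptions: the function name `relation_group` and the dictionary-based representation of relations, categories, and rules are hypothetical, and the rule table contains only the three entries that the text states explicitly (the remaining entries of FIG. 8 are not reproduced here and default to "do not include").

```python
from typing import Dict, List, Tuple


def relation_group(gp: List[str],
                   links: Dict[str, List[str]],
                   category: Dict[str, str],
                   include_rule: Dict[Tuple[str, str], bool]) -> List[str]:
    """Expand the position group Gp into Gr using inter-category rules."""
    gr = list(gp)  # Gr is initialised as Gr = Gp
    for a in gp:  # processing is performed for every data in Gp
        for b in links.get(a, []):  # data reachable by following a relation
            rule_key = (category[a], category[b])
            # Pairs missing from the table are excluded (assumption).
            if include_rule.get(rule_key, False) and b not in gr:
                gr.append(b)
    return gr


# The example from the text: A is a design case with relations to B, C and D.
categories = {"A": "design case", "B": "guideline",
              "C": "design case", "D": "evaluation case"}
rules = {
    ("design case", "guideline"): True,
    ("design case", "evaluation case"): True,
    ("design case", "design case"): False,
}
gr = relation_group(["A"], {"A": ["B", "C", "D"]}, categories, rules)
```

Only relations starting at members of Gp are followed, so the expansion is a single hop, matching the description in which Gr is built from Gp by one pass over its members.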

[0041] The data set Gr thus obtained is the search result for the user's query, and a list of the data contained in it and of the relations between those data is displayed by the search result display means 9. When the user designates any data using the data designating means 10, the contents of the designated data are displayed by the data display means 11.

[0042] As a result, when judging whether each data included in the search result is the data being sought, the user can make a multifaceted judgment using not only the degree of relevance to the query but also information about what data can be reached from that data.

[0043] Next, an embodiment of the database registration device according to the present invention will be described with reference to the drawings.

[0044] Referring to FIG. 9, the embodiment of the database registration device according to the present invention includes: content registration means 16 with which a data registrant registers the contents of data (a node); keyword set storage means 17 storing a predetermined keyword set; keyword selecting means 18 with which the data registrant selects, from that keyword set, keywords that appropriately represent the contents of the node; weighting input means 19 with which the data registrant weights each selected keyword with a real number in [0, 1]; feature vector generating means 20 that generates a feature vector from the set of weighted selected keywords; data storage means 21 that stores data (nodes); a position group extracting unit 12 that obtains a set of related nodes; link direction determining means 22 that determines the direction of a link when a link is added between the registered node and a neighboring node; linking rule storage means 23 storing the rules used for that determination; and node adding means 24 that adds the registered node to the data storage means 21 and adds a link between the registered node and each neighboring node in the direction determined by the link direction determining means 22.

[0045] The data stored in the data storage means 21 is the same as that shown in FIG. 6 described above. It consists of data that is an arbitrary combination of documents, images, sounds, and the like and that has zero or more relations to other data, the feature vector corresponding to that data, and the category to which that data belongs.

[0046] The internal structure of the position group extracting unit 12 is the same as that of the position group extracting unit 12 forming the main part of the database search device shown in FIG. 1. As shown in FIG. 10, it consists of: inter-component correlation table storage means 5 that stores a table of the degrees of correlation between components of the feature vector; undefined component complementing means 4 that complements the undefined components of the feature vector of the new node generated by the feature vector generating means 20 and of the feature vector attached to each node stored in the data storage means 21, using the components already defined in those vectors and the table of inter-component correlation degrees stored in the inter-component correlation table storage means 5; inter-vector distance calculating means 6 that obtains the distance between the feature vector of the new node, whose undefined parts have been complemented by the undefined component complementing means 4, and the feature vector of a node stored in the data storage means 21; neighborhood data determination means 7 that determines that node to be neighborhood data of the new node if the distance between the feature vectors calculated by the inter-vector distance calculating means 6 is less than or equal to a predetermined threshold value; and threshold value storage means 8 that stores the predetermined threshold value used by the neighborhood data determination means 7 to determine neighborhood data. In addition, a complementary value for an undefined component of the feature vector generated by the feature vector generating means 20, or of the feature vector of previously stored data, can be obtained only from a defined component correlated with that undefined component. When no such defined component exists, the undefined component complementing means 4 replaces the undefined component with a special symbol whose difference from any value is always 0.

[0047] The content registration means 16, the keyword selecting means 18, and the weighting input means 19 are information input devices such as a mouse and a keyboard. The keyword set storage means 17, the feature vector generating means 20, the position group extracting unit 12, the link direction determining means 22, the linking rule storage means 23, and the node adding means 24 are realized, for example, by a program running on a PC or WS. The data storage means 21 is an external storage device such as a fixed disk.

[0048] The operation of the database registration device configured as above will be described with reference to the drawings.

[0049] Referring to FIG. 9, the registrant first creates the contents of the node to be registered in advance as an arbitrary combination of documents, images, and sounds, and inputs them with the content registration means 16. The registrant also inputs the category of this node (guideline, design case, evaluation case, and so on, described later) with the content registration means 16. Then, with the keyword selecting means 18, the registrant selects appropriate keywords, according to the node contents input with the content registration means 16, from the keyword set stored in advance in the keyword set storage means 17. Some keywords therefore remain unselected. The keyword set may be classified into several categories so that it is easy for the user to understand. For example, referring to FIG. 11, keywords (attribute values) such as "general", "beginner", and "visual" are classified into four categories (attributes): "age group", "system experience", "task knowledge", and "disability". The registrant then assigns to each selected keyword, with the weighting input means 19, a real number in [0, 1] as the importance of that keyword. At this point, assigning a real number to a category may also be permitted. In that case, the real number assigned to the category is assigned to every keyword belonging to that category. This amounts to selecting keywords in bulk and reduces the user's effort in entering weights. Assigning a real number to each keyword and also to a category may be permitted as well. In that case, for every keyword belonging to a category that has been given a real number, the product of the real number assigned to the keyword and the real number given to the category is computed, and that value is reassigned to the keyword. This reduces the effort of adjusting the weights when searches are repeated.
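The two category-weighting options in paragraph [0049] can be sketched as follows. This is an illustration under assumptions rather than the device's actual logic: the function name `effective_weights` is hypothetical, the assignment of sample keywords to categories is invented for the example (FIG. 11 is not reproduced here), and merging both options into one function, so that a keyword without its own weight inherits the category weight, is a design choice of this sketch.

```python
from typing import Dict


def effective_weights(keyword_weights: Dict[str, float],
                      category_weights: Dict[str, float],
                      category_of: Dict[str, str]) -> Dict[str, float]:
    """Combine per-keyword and per-category weights into final weights."""
    for w in list(keyword_weights.values()) + list(category_weights.values()):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must be real numbers in [0, 1]")
    result: Dict[str, float] = {}
    for keyword, category in category_of.items():
        kw_w = keyword_weights.get(keyword)
        cat_w = category_weights.get(category)
        if kw_w is not None and cat_w is not None:
            result[keyword] = kw_w * cat_w  # both given: use the product
        elif kw_w is not None:
            result[keyword] = kw_w          # keyword weight only
        elif cat_w is not None:
            result[keyword] = cat_w         # bulk selection via the category
        # neither given: the keyword stays unselected (undefined)
    return result


# Hypothetical keyword-to-category assignment, for illustration only.
sample_categories = {"general": "age group",
                     "beginner": "system experience",
                     "expert": "system experience",
                     "visual": "disability"}
sample = effective_weights({"beginner": 0.8, "visual": 0.3},
                           {"system experience": 0.5},
                           sample_categories)
```

In this sample, "beginner" ends up with 0.8 × 0.5, "expert" inherits the category weight 0.5, "visual" keeps 0.3, and "general" remains unselected.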

[0050] The feature vector generating means 20 then generates the feature vector corresponding to the node being registered from the results so far. In doing so, it assigns a special symbol x, indicating "undefined", to each keyword that was not selected with the keyword selecting means 18. A real number outside [0, 1], such as -1, may be used as x, or a character such as "x" may be used. The vector obtained by arranging the real numbers assigned to the keywords as described above in a fixed order is the feature vector corresponding to the registered node. For example, for the keyword set given as an example above, suppose the eight keywords "general", "beginner", "expert", "poor", "normal", "rich", "visual", and "limb" are selected with the keyword selecting means 18 and the user then sets their importance with the weighting input means 19 as shown in FIG. 12. The feature vector corresponding to the registered data is then (x, 0.8, x, 0.9, x, 0.7, 0.3, 0.8, 0.7, 0.7, x, 0.1), where x denotes an undefined component.
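The vector generation step of paragraph [0050] can be sketched as follows. The names `UNDEFINED` and `make_feature_vector` are hypothetical, and the keyword names `kw0` to `kw11` are placeholders: the text gives the resulting vector but not which keyword occupies which position, so no real keyword-to-position mapping is claimed here.

```python
from typing import Dict, List, Sequence, Union

UNDEFINED = "x"  # the text allows a character such as "x" or a value like -1


def make_feature_vector(keyword_order: Sequence[str],
                        weights: Dict[str, float]) -> List[Union[float, str]]:
    """Arrange keyword weights in a fixed order; unselected keywords get x."""
    vector: List[Union[float, str]] = []
    for keyword in keyword_order:
        if keyword in weights:
            w = weights[keyword]
            if not 0.0 <= w <= 1.0:
                raise ValueError("importance must be a real number in [0, 1]")
            vector.append(w)
        else:
            vector.append(UNDEFINED)  # keyword was not selected
    return vector


# Placeholder keyword names; eight of twelve are selected, as in the example.
order = [f"kw{i}" for i in range(12)]
selected = {"kw1": 0.8, "kw3": 0.9, "kw5": 0.7, "kw6": 0.3,
            "kw7": 0.8, "kw8": 0.7, "kw9": 0.7, "kw11": 0.1}
vector = make_feature_vector(order, selected)
```

The resulting `vector` has the same shape as the example in the text: twelve components, of which positions 0, 2, 4, and 10 are undefined.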

[0051] Next, the position group extracting unit 12 calculates the distance between the feature vector of the registered node determined by the feature vector generating means 20 and the feature vector of each node stored in the data storage means 21, and obtains the set Gp of nodes whose distance is less than or equal to a predetermined threshold value. That is, referring to FIG. 10, the undefined component complementing means 4 complements the undefined components of the feature vector of the new node generated by the feature vector generating means 20 and of the feature vector attached to each node stored in the data storage means 21, using the components already defined in those vectors and the table of inter-component correlation degrees stored in the inter-component correlation table storage means 5. The inter-vector distance calculating means 6 obtains the distance between the complemented feature vector of the new node and the feature vector of each node stored in the data storage means 21. The neighborhood data determination means 7 determines a node to be neighborhood data of the new node, and includes it in the set Gp, if the distance between the feature vectors calculated by the inter-vector distance calculating means 6 is less than or equal to the predetermined threshold value.

[0052] Next, for each node in Gp obtained by the position group extracting unit 12, the link direction determining means 22 determines the direction of the link with respect to the registered node. The linking rules stored in advance in the linking rule storage means 23 are applied for this purpose. The linking rules are defined in terms of node categories. For the three categories "guideline", "design case", and "evaluation case", for example, they are as shown in FIG. 13. Suppose now that the registered node belongs to "guideline" and that nodes A, B, and C belong to "guideline", "design case", and "evaluation case", respectively. Then, according to the linking rules of FIG. 13, the link directions between the registered node and each of nodes A, B, and C are determined as shown in FIG. 14.
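The category-based link direction lookup of paragraph [0052] can be sketched as follows. The entries in `HYPOTHETICAL_RULES` are invented purely for illustration, because the actual contents of FIG. 13 and FIG. 14 are not reproduced in the text. The function name `link_directions`, the direction labels, and the `"NEW"` node name are likewise assumptions of this sketch.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL rule table keyed by (category of new node, category of
# neighbor). These values do NOT reproduce FIG. 13.
HYPOTHETICAL_RULES: Dict[Tuple[str, str], str] = {
    ("guideline", "guideline"): "both",
    ("guideline", "design case"): "to_neighbor",
    ("guideline", "evaluation case"): "none",
}


def link_directions(new_category: str,
                    neighbors: Dict[str, str],
                    rules: Dict[Tuple[str, str], str]) -> List[Tuple[str, str]]:
    """Return directed links (source, target) between "NEW" and each neighbor."""
    edges: List[Tuple[str, str]] = []
    for node, category in neighbors.items():
        direction = rules.get((new_category, category), "none")
        if direction in ("to_neighbor", "both"):
            edges.append(("NEW", node))   # link from the registered node
        if direction in ("from_neighbor", "both"):
            edges.append((node, "NEW"))   # link to the registered node
    return edges


# Nodes A, B, C with the categories used in the text's example.
gp_categories = {"A": "guideline", "B": "design case", "C": "evaluation case"}
edges = link_directions("guideline", gp_categories, HYPOTHETICAL_RULES)
```

Keeping the rules in a table keyed by category pairs means that the direction policy can be changed without touching the registration logic, which fits the description of rules being stored separately in the linking rule storage means 23.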

[0053] Finally, the node adding means 24 adds the registered node, its feature vector and category, and the links to (or from) each node belonging to the position group Gp, in the directions determined by the link direction determining means 22, and stores them in the data storage means 21.

[0054] As described above, when new data is registered, the direction of the link to be added between the new data and each neighborhood data is determined from the categories of the new data and the neighborhood data and from category-based linking rules. The link is automatically added in that direction, and the new data and its link information are stored. This reduces the effort of searching for related nodes and adding links when a new hypertext node is created.

[0055]

[Effects of the Invention] As described above, the present invention provides the following effects.

[0056] According to the inventions of claims 1 to 3, complementary values for undefined components contained in the feature vector of a query or of stored data are determined from the defined components on the basis of the degrees of correlation between components. Efficient and accurate relevance calculation using feature vectors is therefore possible even when not all components of the feature vectors of the query and the stored data are defined.

[0057] According to the invention of claim 2, when there is no defined component from which a complementary value for an undefined part of a feature vector could be obtained, that undefined part is replaced with a special symbol whose difference from any value is 0, so the inter-vector distance calculating means in the subsequent stage ignores the positions where undefined parts remain. Efficient relevance calculation using feature vectors is therefore possible even when a feature vector contains many undefined parts.

[0058] According to the invention of claim 3, not only is the set of data highly relevant to the query obtained, but data related to each data belonging to that set is also included in the displayed search result. The user can therefore make a multifaceted judgment about each data in the search result from additional information other than the degree of relevance, and can more easily foresee whether the search result is what is being sought.

[0059] According to the invention of claim 4, when data is registered, the user freely selects keywords for the input data from a predetermined keyword set, and the feature vector of the input data is generated from the result of weighting each selected keyword. Atypical data consisting of an arbitrary combination of documents, images, sounds, and the like can therefore be characterized by a single method that is easy for the data registrant to understand, and the effort of assigning appropriate keywords is reduced.

[0060] According to the invention of claim 5, when new data is registered, the direction of the link to be added between the new data and each neighborhood data is determined from the categories of the new data and the neighborhood data and from category-based linking rules. The link is automatically added in that direction, and the new data and its link information are stored. This reduces the effort of searching for related nodes and adding links when a new hypertext node is created.

[Brief description of the drawings]

FIG. 1 is a block diagram of a first embodiment of the database search device according to the present invention.

FIG. 2 is an explanatory diagram of the data stored in the data storage means in FIG. 1.

FIG. 3 is a diagram showing an example of the keyword set presented to the user at query input and an example of the keywords actually selected by the user.

FIG. 4 is a diagram showing an example of the inter-component correlation table.

FIG. 5 is a block diagram of a second embodiment of the database search device according to the present invention.

FIG. 6 is an explanatory diagram of the data stored in the data storage means in FIG. 5.

FIG. 7 is an explanatory diagram of the operation of the relational data determination means.

FIG. 8 is a diagram showing an example of the inter-category rules.

FIG. 9 is a block diagram of an embodiment of the database registration device according to the present invention.

FIG. 10 is a block diagram showing a configuration example of the position group extracting unit in the database registration device.

FIG. 11 is a diagram showing an example of the keyword set (attribute values) stored in the keyword set storage means and an example of the categories.

FIG. 12 is a diagram showing an example of the keywords (attribute values) that a data registrant selected for a new node when registering it, together with their importance (weights).

FIG. 13 is a diagram showing an example of the linking rules.

FIG. 14 is a diagram showing an example of the links attached to a registered new node.

[Explanation of symbols]

1 Query input means
2 Query vector generating means
3 Data storage means
4 Undefined component complementing means
5 Inter-component correlation table storage means
6 Inter-vector distance calculating means
7 Neighborhood data determination means
8 Threshold value storage means
9 Search result display means
10 Data designating means
11 Data display means
12 Position group extracting unit
13 Relational data determination means
14 Inter-category rule storage means
15 Relation group extracting unit
16 Content registration means
17 Keyword set storage means
18 Keyword selecting means
19 Weighting input means
20 Feature vector generating means
21 Data storage means
22 Link direction determining means
23 Linking rule storage means
24 Node adding means

Continued from the front page: (51) Int. Cl.⁶, identification code, FI: G06F 15/403 380E; 15/419 320

Claims

[Claims]

1. A database search device comprising: query input means with which a user inputs a query; data storage means for storing data and the feature vectors of the data; query vector generating means for converting the query input with the query input means into a query vector; inter-component correlation table storage means for storing a table of the degrees of correlation between components of the feature vector; undefined component complementing means for complementing the undefined components of the query vector generated by the query vector generating means and of the feature vector attached to each data stored in the data storage means, using the components already defined in those vectors and the table of inter-component correlation degrees stored in the inter-component correlation table storage means; inter-vector distance calculating means for obtaining the distance between the query vector whose undefined parts have been complemented by the undefined component complementing means and the feature vector of data stored in the data storage means; neighborhood data determination means for determining the data to be neighborhood data of the query if the distance between the vectors calculated by the inter-vector distance calculating means is less than or equal to a predetermined threshold value; threshold value storage means for storing the predetermined threshold value used by the neighborhood data determination means to determine neighborhood data; search result display means for displaying a list of the data determined to be neighborhood data by the neighborhood data determination means; data designating means with which the user designates, from the data list displayed by the search result display means, the data to be displayed; and data display means for displaying the contents of the data designated with the data designating means.

2. The database search device according to claim 1, wherein, when there is no defined component correlated with an undefined component of the query vector or of the feature vector of previously stored data, such a component being needed to obtain a complementary value for that undefined component, the undefined component complementing means replaces the undefined component with a special symbol whose difference from any value is always 0.

3. The database search device according to claim 1 or 2, further comprising relational data determination means which, when relations between data are defined in advance and the data are classified into several categories, determines whether data reachable by following a relation from the neighborhood data is to be included in the search result, from the category to which each neighborhood data obtained by the neighborhood data determination means belongs and the category to which the data reachable by following a relation from that neighborhood data belongs, using predetermined inter-category rules stored in inter-category rule storage means.

4. A database registration device comprising: content registration means with which a data registrant inputs the contents of new data; keyword selecting means with which the data registrant selects, from a predetermined keyword set, keywords suitable for representing the characteristics of the new data; keyword set storage means for storing the keyword set used by the keyword selecting means; weighting input means for weighting each keyword selected with the keyword selecting means with a real number in [0, 1]; feature vector generating means for generating the feature vector of the new data from the keywords selected with the keyword selecting means and the weights input with the weighting input means; and data storage means for storing the new data and the feature vector of the new data.

5. The database registration device according to claim 4, further comprising: inter-component correlation table storage means for storing a table of the degrees of correlation between components of a feature vector; undefined component complementing means for complementing undefined components of the feature vector of the new data generated by the feature vector generating means and of the feature vector attached to each piece of data stored in the data storage means, using the defined components of those vectors and the table of degrees of correlation between components stored in the inter-component correlation table storage means; inter-vector distance calculating means for obtaining the distance between the feature vector of the new data whose undefined components have been complemented by the undefined component complementing means and the feature vector of each piece of data stored in the data storage means; neighborhood data determination means for determining that a piece of data is neighborhood data for the new data if the distance between the feature vectors calculated by the inter-vector distance calculating means is equal to or less than a predetermined threshold value; threshold value storage means for storing the predetermined threshold value used by the neighborhood data determination means to determine neighborhood data; link direction determining means for determining, using a predetermined linking rule, the direction of a link to be newly added between the neighborhood data and the new data from the category of the neighborhood data determined by the neighborhood data determination means and the category of the new data; and means for adding a link between the neighborhood data and the new data in the direction determined by the link direction determining means and storing the new data and information on the link in the data storage means.

6. The database registration device according to claim 5, wherein, when there is no defined component that is correlated with the undefined component and is needed to obtain a complement value for an undefined component of the feature vector generated by the feature vector generating means or of the feature vector of data stored in advance, the undefined component complementing means replaces the undefined component with a special symbol whose difference from any value is always 0.
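
For illustration only, the following is a minimal, non-limiting sketch of the complementing, distance calculation, and threshold determination recited in the claims above. The claims do not fix the complementing formula or the distance measure, so the correlation-weighted average in `complement` and the Euclidean distance in `distance` are assumptions, as are all names (`UNDEFINED`, `NEUTRAL`, `complement`, `difference`, `distance`, `neighborhood`) and the representation of the correlation table as a nested list.

```python
import math

UNDEFINED = None      # assumed marker for an undefined component
NEUTRAL = object()    # hypothetical special symbol: its difference from any value is 0


def complement(vector, correlation):
    """Complement undefined components from correlated defined components.

    Assumption: the complement value is a correlation-weighted average.
    """
    result = list(vector)
    for i, value in enumerate(vector):
        if value is not UNDEFINED:
            continue
        # Defined components having a nonzero correlation with component i.
        pairs = [(correlation[i][j], v) for j, v in enumerate(vector)
                 if j != i and v is not UNDEFINED and correlation[i][j] != 0]
        if pairs:
            total = sum(abs(c) for c, _ in pairs)
            result[i] = sum(c * v for c, v in pairs) / total
        else:
            # No correlated defined component exists: substitute the symbol.
            result[i] = NEUTRAL
    return result


def difference(a, b):
    """Component difference; always 0 when either side is the special symbol."""
    if a is NEUTRAL or b is NEUTRAL:
        return 0.0
    return a - b


def distance(u, v):
    """Distance between two complemented vectors (Euclidean, by assumption)."""
    return math.sqrt(sum(difference(a, b) ** 2 for a, b in zip(u, v)))


def neighborhood(query, records, correlation, threshold):
    """Return (name, distance) pairs whose distance is at or below the threshold."""
    q = complement(query, correlation)
    hits = []
    for name, vec in records.items():
        d = distance(q, complement(vec, correlation))
        if d <= threshold:
            hits.append((name, d))
    return sorted(hits, key=lambda h: h[1])
```

In this sketch a component that cannot be complemented contributes nothing to the distance, which corresponds to the special symbol of claims 2 and 6; the same `complement` and `distance` functions could serve both the search device and the registration device.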