JP3024045B2

JP3024045B2 - Data retrieval device based on natural language

Info

Publication number: JP3024045B2
Application number: JP6144979A
Authority: JP
Inventors: 洋池内; 育雄芥子
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1994-06-27
Filing date: 1994-06-27
Publication date: 2000-03-21
Anticipated expiration: 2015-03-21
Also published as: JPH0816611A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a data retrieval device based on natural language, and in particular to a device that uses natural language to search text data such as electronic books and electronic dictionaries, image databases, or multimedia databases that combine them.

[0002]

[Prior Art] Conventional natural-language data retrieval devices include those based on full-text search, those based on keyword search, and those that search according to the degree of match between vectors.

[0003] Japanese Patent Application Laid-Open No. 4-340677 discloses a similar-pattern retrieval device in which a learning function is built into an image retrieval device that searches for images based on an input image. This device retrieves pattern images similar to an input pattern image on the basis of the physical features of the pattern. Because the means for calculating similarity from the physical features is implemented as a neural network, the device can learn so as to reduce the difference between the similarity output by the neural network and a given expected value of the similarity.

[0004] One natural-language image retrieval method is keyword search, in which keywords are attached to images. A plurality of keywords representing the features of an image are attached to it, and images are retrieved according to the degree of match between the keywords given as the input sentence and the keywords attached to the images.

[0005] Japanese Patent Application Laid-Open No. 5-233707 discloses an information database device based on keyword search with a learning function. This device determines the class to which an input sentence or keyword belongs and, based on learning results, associates other keywords from that class, thereby supporting document classification and keyword association. To determine the class of a sentence, keywords are extracted from the input sentence and converted into a feature vector, from which the class is determined. Because a neural network is used as the means of determining the class, the device can also learn to adapt to a classification supplied by the user. Note that the feature vector used in this device is based simply on the frequency of occurrence of keywords, so as a retrieval method it can be regarded as a kind of keyword search.

[0006] Japanese Patent Application Laid-Open No. 5-342255 discloses a document retrieval method and document retrieval system that have a learning function and perform full-text search. When determining the degree of fit between an input sentence and a search target sentence, this method and system execute the following in order: syntactic analysis of the input sentence, calculation of the degree of connection between words within the input sentence, calculation of the degree of connection between words within the search target sentence, calculation of the degree of connection with words in a synonym dictionary, and calculation of the degree of connection between the words of the input sentence and those of the search target sentence. Because the means for calculating the degree of connection at each stage is implemented as a neural network, the system can learn so as to reduce the difference between the degree of fit obtained as output and the degree of fit given by the user.

[0007] IEICE Technical Report AI92-99 (1993-1), "Associative Search from a Large-Scale Document Database," published by the Institute of Electronics, Information and Communication Engineers, proposes a document retrieval method based on context vectors. A "context vector" as used there is a distributed representation of the meaning of a word, relating each word to 100 or more feature words. The context vector of a document or question sentence is obtained by normalizing the context vectors of the important words extracted from the text. Retrieval is performed by calculating the inner product of the context vector of the question sentence and the context vector of each document, which makes it possible to rank retrieval results and to retrieve similar example sentences.

[0008]

[Problems to Be Solved by the Invention] When searching image data, the similar-pattern retrieval device described above searches by the similarity of patterns represented as images, and therefore cannot search by the content that an image depicts.

[0009] When data is retrieved by attached keywords, the information database device based on keyword search described above can search only by the words attached to the data as keywords. It is also practically impossible to attach to each image, in advance, every word a user might be expected to enter, including impression words and associated words. Retrieval is therefore possible only with the limited set of input sentences covered by the attached keywords. Similarly, the document retrieval method and document retrieval system that perform full-text search can search only by words or sentences that appear in the search target documents.

[0010] As described above, every retrieval method based on keywords or patterns can search only within the range of natural language explicitly expressed in the search target data; retrieval that reflects human knowledge, sensibility, and the like is impossible.

[0011] The retrieval method based on context vectors can retrieve similar sentences, but it requires high-speed hardware, which raises the cost of the device and increases its size.

[0012] Likewise, when a neural network is used to give the device a learning function, high-speed, large-scale hardware is required, which raises the cost of the device and increases its size.

[0013] The present invention has been made to solve the above problems. Its object is to provide a natural-language data retrieval device that creates the data vectors representing the features of the data and the data vector representing the features of a question sentence from word vectors characterized according to human knowledge, sensibility, and the like, and that can thereby retrieve, from an input word or sentence, not only text data but also image data and other data in accordance with human knowledge and sensibility.

[0014]

[Means for Solving the Problems] According to the present invention, the above object is achieved by the data retrieval device of claim 1, which comprises: input means for inputting search target data as a question sentence; a word dictionary that stores word vectors representing the features of words; vector generation means for creating, based on the word vectors, a data vector representing the features of the question sentence; a database that stores search target data and data vectors representing the features of the search target data; retrieval means for retrieving search target data in the database according to the degree of match between the data vector of the question sentence and the data vectors of the search target data; output means for outputting a retrieval result; judgment input means for inputting a user's judgment on the retrieval result; and learning means for updating the data vector of the search target data by adding to it, or subtracting from it, the value obtained by multiplying the data vector of the question sentence by a predetermined value, in accordance with the user's judgment; wherein the vector generation means creates, based on the word vectors, a data vector representing the features of a description attached to image data.

[0015]

[0016]

[Operation] In the data retrieval device of claim 1, search target data is input as a question sentence by the input means; the vector generation means creates a data vector representing the features of the question sentence based on the word vectors; the retrieval means retrieves search target data in the database according to the degree of match between the data vector of the question sentence and the data vectors of the search target data; and the output means outputs the retrieval result. Because the data vectors are created from word vectors characterized according to human knowledge, sensibility, and the like, and the search target data is retrieved according to the degree of match between data vectors, the device can retrieve not only data that matches the words contained in the question sentence but also data that accords with human knowledge and sensibility. Furthermore, because the vector generation means creates, based on the word vectors, a data vector representing the features of the description attached to image data, a multimedia database that includes image data and the like can be searched in natural language.

[0017]

[0018]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

[0019] FIG. 1 is a block diagram of this embodiment. The device comprises input means 1, such as a keyboard, for entering a question sentence; a database 2, the search target, which stores each data item together with a data vector representing its features; output means 3, such as a display, for outputting retrieval results; a word dictionary 4, which stores a word vector representing the features of each word; vector generation means 5, which generates a data vector from natural language text; retrieval means 6, which executes the search; judgment input means 7, for entering the user's judgment on a retrieval result; and learning means 8, which performs learning according to the entered judgment.

[0020] The specific configurations of the word dictionary 4, the vector generation means 5, and the retrieval means 6 in this embodiment are the same as the corresponding components of the system proposed in the above-mentioned "Associative Search from a Large-Scale Document Database"; the "data vector" of this embodiment corresponds to the "context vector."

[0021] The operation of this embodiment is described below with reference to the drawings.

[0022] First, the operation of the vector generation means 5 is described with reference to FIG. 2. Suppose, for example, that the input sentence is the natural language text "lava released by a volcanic explosion." In step S21, the four words "volcano," "explosion," "release," and "lava" are extracted. In step S22, the word vector stored in the word dictionary for each of these words is retrieved, as follows.

[0023]

volcano = (0, 2, 0, 0, 1, 0, 2, 0, 1)
explosion = (0, 1, 0, 0, 0, 1, 1, 0, 0)
release = (1, 0, 0, 0, 0, 1, 0, 0, 1)
lava = (0, 2, 0, 0, 0, 2, 1, 0, 0)

In step S23, the word vectors are added to obtain CV0, and CV0 is then normalized to length 10 to obtain CV1, as follows.

[0024]

CV0 = (1, 5, 0, 0, 1, 4, 4, 0, 2)
|CV0| = (1² + 5² + ... + 2²)^(1/2) = 7.937
CV1 = 10 × CV0 / |CV0| = (1, 6, 0, 0, 1, 5, 5, 0, 3)

In step S24, CV1 is output as the data vector corresponding to the input sentence. In this way, the weighted sum of the word vectors of the words extracted from a natural language text is normalized and used as the data vector of that text. Each data vector corresponding to natural language text data stored in the database is generated by the vector generation means 5 using the method described above. The data vector corresponding to a question sentence is generated in the same way by the vector generation means 5, with the question sentence as input.
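As an illustration only, steps S22 to S24 might be sketched in Python as follows. The sketch assumes that word extraction (step S21) has already been done, that every word has weight 1 in the sum, and that components are rounded to the nearest integer after normalization; the rounding is inferred from the worked numbers rather than stated in the text. The English dictionary keys and the function names are labels chosen for this sketch.

```python
import math

# Word vectors copied from paragraph [0023]. The English keys are labels
# chosen for this sketch, not part of the described device.
WORD_DICTIONARY = {
    "volcano":   (0, 2, 0, 0, 1, 0, 2, 0, 1),
    "explosion": (0, 1, 0, 0, 0, 1, 1, 0, 0),
    "release":   (1, 0, 0, 0, 0, 1, 0, 0, 1),
    "lava":      (0, 2, 0, 0, 0, 2, 1, 0, 0),
}


def sum_word_vectors(words, dictionary=WORD_DICTIONARY):
    """Steps S22 and S23 (first half): look up and add the word vectors (CV0).

    Assumption: every word has weight 1; unknown words are skipped.
    """
    vectors = [dictionary[w] for w in words if w in dictionary]
    return tuple(sum(column) for column in zip(*vectors))


def normalize(vector, length=10):
    """Step S23 (second half): scale to the given length (CV1).

    Assumption: components are rounded to the nearest integer.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return tuple(0 for _ in vector)
    return tuple(round(length * x / norm) for x in vector)


def make_data_vector(words):
    """Step S24: return the data vector for a list of extracted words."""
    return normalize(sum_word_vectors(words))


words = ["volcano", "explosion", "release", "lava"]
cv0 = sum_word_vectors(words)
cv1 = make_data_vector(words)
```

With the four example words, `cv0` and `cv1` reproduce the CV0 and CV1 values given in the text.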

[0025] FIG. 3 shows an example of how natural language texts and their corresponding data vectors are stored. An index that associates each data vector with its natural language text is stored together with the vector. Here, the index means the address of the corresponding text data.

[0026] Next, the method by which this embodiment generates data vectors for image data is described with reference to FIG. 5.

[0027] As shown in FIG. 4, image data in this embodiment consists of a bit image that can be displayed by the output means 3, with a description attached to it. First, in step S51, i, which represents the image number, is initialized to 1. In step S52, the index of the image data of the i-th image is stored in buffer IDXi. In step S53, the description corresponding to the i-th image is extracted, and in step S54 the vector generation means 5 takes that description as input and obtains the data vector CV. In step S55, the resulting data vector CV is stored in data vector buffer DCVi. In step S56, 1 is added to i so that it points to the next image; if there is no more image data the flow ends, and otherwise it returns to step S52 and repeats the above operations. FIG. 6 illustrates the data vectors and indexes obtained by this method in a form comparable to FIG. 3; a corresponding data vector is stored for image data as well. Here, the index means the address of the corresponding image data.
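The loop of steps S51 to S56 might be sketched as follows. This is a minimal sketch under stated assumptions: each image is represented as an (address, description) pair, the buffers IDXi and DCVi are modeled as two parallel lists, and `toy_vector` is a hypothetical stand-in for the vector generation means 5 that exists only so the example runs on its own.

```python
def index_images(images, make_vector):
    """Build the index and data-vector buffers for a list of images.

    images      -- list of (address, description) pairs (assumed layout)
    make_vector -- callable that turns a description into a data vector
    """
    idx = []  # plays the role of the buffers IDX1, IDX2, ...
    dcv = []  # plays the role of the buffers DCV1, DCV2, ...
    i = 0     # S51: start at the first image (0-based in this sketch)
    while i < len(images):             # S56: stop when no image data remains
        address, description = images[i]
        idx.append(address)            # S52: store the image's index
        cv = make_vector(description)  # S53, S54: description -> data vector
        dcv.append(cv)                 # S55: store the data vector
        i += 1                         # S56: move to the next image
    return idx, dcv


def toy_vector(text):
    """Hypothetical stand-in for vector generation: word count only."""
    return (len(text.split()),)


idx, dcv = index_images(
    [(0x1000, "lava flow at night"), (0x2000, "crater")],
    toy_vector,
)
```

Keeping the index and the data vector at the same list position mirrors the pairing of index and vector that FIG. 3 and FIG. 6 are described as showing.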

[0028] The retrieval means 6 calculates the distance between the data vector of the question sentence and each data vector in the database, and outputs as the retrieval result the natural language text and image data corresponding to the indexes of the data vectors that are closest.
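One way the retrieval step could be sketched is shown below. It assumes that closeness is scored by the inner product, as in the scoring example of paragraph [0034], and that results are returned in descending order of score. The names `search` and `top_k` and the placeholder addresses are hypothetical. The second stored vector is the "volcaniclastic rock" vector from paragraph [0034]; the first is an all-zero filler used only for the demonstration.

```python
def inner_product(a, b):
    """Score used here as the measure of closeness between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(query_vector, idx, dcv, top_k=5):
    """Return up to top_k (score, index) pairs, highest score first.

    idx and dcv are parallel lists of indexes and data vectors.
    """
    scored = [
        (inner_product(query_vector, vector), address)
        for address, vector in zip(idx, dcv)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


Q = (1, 6, 0, 0, 1, 5, 5, 0, 3)      # question vector from the text
stored_idx = ["addr_a", "addr_b"]    # placeholder addresses (assumption)
stored_dcv = [
    (0, 0, 0, 0, 0, 0, 0, 0, 0),     # filler vector for the demo only
    (5, 3, 2, 0, 0, 4, 1, 1, 6),     # "volcaniclastic rock" from [0034]
]
results = search(Q, stored_idx, stored_dcv)
```

Because only the returned indexes are used to fetch the data, the same routine serves text and image data alike.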

[0029] The learning function of this embodiment is described below with reference to the drawings.

[0030] FIG. 7 is a flowchart showing the overall operation of the learning function of this embodiment. In step S71, a question sentence is entered through the input means 1, and in step S72 the retrieval means 6 retrieves data from the database 2. In step S73, the retrieval result is output to the output means 3. In step S74, the user enters a judgment on the retrieval result through the judgment input means 7, and in step S75 the learning means 8 performs learning according to that judgment. In step S76, the flow ends if the search is to be terminated; otherwise it returns to step S71.

[0031] In this embodiment, the judgment in step S74 is entered as follows: for one of the retrieved data items, the user enters ○ through the input means 1 if the item is judged correct and × if it is judged incorrect. In this embodiment, therefore, the input means 1 also serves as the judgment input means 7.

[0032] The operation of the learning means in step S75 is described in detail with reference to FIG. 8.

[0033] Let Q = (q1, q2, ..., qn) be the data vector of the question sentence, obtained from the entered question sentence, and let V = (v1, v2, ..., vn) be the data vector of the data item that the user has judged ○ or ×. The learning means 8 updates V to V + εQ when the user enters ○ and to V − εQ when the user enters ×. Here ε is a positive constant determined from the speed and accuracy of learning.

[0034] FIG. 8 shows a concrete example of the results produced by the learning means. FIG. 8(a) is an example in which retrieval results for a question sentence are displayed in descending order of score. The search target in this example is natural language text, but results are displayed in the same way for image retrieval. The score here is the inner product of the data vector of the question sentence and the data vector of the retrieved data item, and can be regarded as a numerical measure of how close the two vectors are. In the first retrieval result, for the question sentence "lava released by a volcanic explosion," the text data "original topography" was retrieved with 82 points and "volcaniclastic rock" with 66 points. Here the data vector of the question sentence is Q = (1, 6, 0, 0, 1, 5, 5, 0, 3) and the data vector of "volcaniclastic rock" is V = (5, 3, 2, 0, 0, 4, 1, 1, 6); their inner product gives the 66 points. If the user selects "volcaniclastic rock," judges it to be correct data for the question sentence, and enters ○, the learning means 8 obtains the updated data vector from the above formula, with ε = 0.2, as follows.

[0035]

V + 0.2 × Q = (5, 4, 2, 0, 0, 5, 2, 0, 7)

If the same question sentence is then entered again, the score of "volcaniclastic rock" becomes 85 points by the inner product calculation, as shown in FIG. 8(b), and the item is ranked higher in the retrieval result. In this way, learning that reflects the user's judgment is performed correctly. Conversely, learning is likewise performed correctly when the user enters ×.
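The update and the resulting change in score can be checked with a short sketch. It assumes that the updated components are rounded to the nearest integer, which is inferred from the integer vector shown in the text; the function names are chosen for this sketch. Under that rounding the eighth component comes out as 1 where the text lists 0. This does not affect the score, because the eighth component of Q is 0.

```python
def inner_product(a, b):
    """Score: inner product of two data vectors."""
    return sum(x * y for x, y in zip(a, b))


def update_vector(v, q, judged_correct, eps=0.2):
    """Return V + eps*Q for a correct judgment, V - eps*Q for an incorrect one.

    Assumption: the result is rounded to integers, as the worked example
    in the text appears to do.
    """
    sign = 1 if judged_correct else -1
    return tuple(round(x + sign * eps * y) for x, y in zip(v, q))


Q = (1, 6, 0, 0, 1, 5, 5, 0, 3)  # question vector from the text
V = (5, 3, 2, 0, 0, 4, 1, 1, 6)  # "volcaniclastic rock" vector from the text

score_before = inner_product(Q, V)
V_correct = update_vector(V, Q, judged_correct=True)
score_after = inner_product(Q, V_correct)

# The incorrect-judgment case moves V away from Q, lowering the score.
V_incorrect = update_vector(V, Q, judged_correct=False)
score_after_incorrect = inner_product(Q, V_incorrect)
```

Here `score_before` is 66 and `score_after` is 85, matching the scores in the text, while `score_after_incorrect` falls below 66.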

[0036] As described above, because the user's judgment is given simply as ○ or ×, the burden on the user is reduced compared with conventional learning means.

[0037] In this embodiment the description of an image is used to create the data vector of the image, but other methods are conceivable, such as creating the data vector from a few words, supplied by the user, that express the impression the image gives. The learning method of the present invention is not limited to the formula described above. Furthermore, the structure of the word dictionary, the method of constructing data vectors, the retrieval method, and the method by which the user enters judgments are not limited to those of the embodiment described above.

[0038]

[Effects of the Invention] In the data retrieval device of claim 1, the data vectors are created from word vectors characterized according to human knowledge, sensibility, and the like, and the search target data is retrieved according to the degree of match between data vectors. The device can therefore retrieve not only data that matches the words contained in the question sentence but also data that accords with human knowledge and sensibility. Furthermore, because the vector generation means creates, based on the word vectors, a data vector representing the features of the description attached to image data, a multimedia database that includes image data and the like can be searched in natural language.

[0039]

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the data retrieval device of the present invention.

FIG. 2 is a flowchart showing how the vector generation means of the device of FIG. 1 generates a data vector.

FIG. 3 is an example of how natural language text data is stored in the database of the device of FIG. 1.

FIG. 4 is an example of the structure of image data in the device of FIG. 1.

FIG. 5 is a flowchart showing how the vector generation means of the device of FIG. 1 generates data vectors for image data.

FIG. 6 is an example of how the image data and data vectors created by the method of FIG. 5 are stored.

FIG. 7 is a flowchart showing the learning method of the device of FIG. 1.

FIG. 8 is a diagram showing a display example of retrieval results and learning results of the device of FIG. 1.

Continuation of the front page

(56) References: JP-A-4-98463 (JP, A); JP-A-4-352279 (JP, A); JP-A-2-125363 (JP, A); 芥子, 池内, 小渕, "Construction of an Encyclopedia Text Database Using Semantic Vectors," IPSJ Symposium Proceedings, Vol. 93, No. 9, pp. 227-234, 1993 (Heisei 5-12-2)

(58) Fields searched (Int. Cl.7, DB name): G06F 17/30; JICST file (JOIS)

Claims

(57) [Claims]

1. A data retrieval device comprising: input means for inputting search target data as a question sentence; a word dictionary that stores word vectors representing the features of words; vector generation means for creating, based on the word vectors, a data vector representing the features of the question sentence; a database that stores search target data and data vectors representing the features of the search target data; retrieval means for retrieving search target data in the database according to the degree of match between the data vector of the question sentence and the data vectors of the search target data; output means for outputting a retrieval result; judgment input means for inputting a user's judgment on the retrieval result; and learning means for updating the data vector of the search target data by adding to it, or subtracting from it, the value obtained by multiplying the data vector of the question sentence by a predetermined value, in accordance with the user's judgment; characterized in that the vector generation means creates, based on the word vectors, a data vector representing the features of a description attached to image data.