JP2003030238A

JP2003030238A - Parallel information retrieval device, method and program, and recording medium on which the program is recorded

Info

Publication number: JP2003030238A
Application number: JP2001217559A
Authority: JP
Inventors: Takashi Yugawa; 高志湯川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-07-18
Filing date: 2001-07-18
Publication date: 2003-01-31

Abstract

PROBLEM TO BE SOLVED: To provide a parallel information retrieval device and method, a parallel information retrieval program, and a recording medium on which the program is recorded, which speed up retrieval by dividing a document set and performing partial retrieval in parallel on a plurality of partial retrieval processors, and which reduce storage capacity by storing only one subset of the divided documents in each partial retrieval processor.

SOLUTION: The document set to be retrieved is divided into a plurality of document subsets. A plurality of partial retrieval processors 100, one for each divided document subset, retrieve their respective document subsets with the retrieval key in parallel. A result integrating processor 150 collects from the partial retrieval processors 100 the quasi-retrieval results, i.e. the partial retrieval results they produce, and then retrieves the union of the collected quasi-retrieval results with the retrieval key.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an information retrieval technique for accurately and quickly retrieving the documents a user wants from a large number of digitized documents written in natural language. More specifically, it relates to a parallel information search device and method for searching a set of documents for the documents that match a search key, a parallel information search program, and a recording medium on which the program is recorded.

[0002]

2. Description of the Related Art

In information retrieval, where a computer finds the documents a user wants among a large number of digitized documents written in natural language, many techniques known as full-text search have been developed. In full-text search, the user enters a word or a set of words (hereinafter called a search key), and the system finds, among the documents stored in advance, those that contain the word or set of words specified as the search key. To speed up processing, techniques have been developed in which an index is built in advance and consulted when a search key is entered, so that results can be obtained without scanning the content of the documents themselves for the words.

[0003]

However, in full-text search a document is never returned as a search result unless a word it contains matches a word used in the search key. Consequently, documents that are semantically subsumed by the search key, or whose content is similar to it, may be missing from the results. For example, if "disaster" is specified as the search key, a document containing "earthquake" or "flood" very likely describes a disaster, but it will not be returned unless the word "disaster" itself appears in it.

[0004]

To avoid this problem, a technique called concept search has been developed. In concept search, the words in documents and search keys are not handled as character strings; instead, the concept each word represents is held in a form the computer can process, and the search is performed on the basis of the relationships between concepts. With this approach the computer can be made to recognize that "disaster" is closely related to "earthquake" and "flood", so that when "disaster" is specified as the search key, documents that do not contain the word "disaster" but do contain "earthquake" or "flood" are also returned.

[0005]

A store of the concepts of the words used in the document set to be searched is called a concept base. The concept base represents each word in the document set as a real-valued vector of length 1 and stores it. The vector of each word is computed by a statistical analysis over all the words in the document set, based on the occurrences of the word and its relationships with other words. For the specific calculation method, see H. Schutze and J. O. Pedersen, "Information Retrieval Based on Word Sense", Proceedings of the 4th Annual Symposium on Document Analysis and Information Retrieval, pp. 161-176, 1995. All word vectors have the same dimension.
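The exact statistical analysis is delegated to the cited paper. As a rough illustration only, the sketch below builds a toy concept base in which each word's vector counts how often it co-occurs, within the same document, with every vocabulary word, and is then scaled to length 1. The function name, the whitespace tokenization, and the omission of any dimensionality reduction are assumptions of this sketch, not the cited method.

```python
import math
from collections import Counter


def build_concept_base(documents):
    """Toy concept base: unit-length word vectors from in-document co-occurrence.

    Hypothetical simplification: each word is represented by its co-occurrence
    counts with every vocabulary word (no dimensionality reduction).
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    counts = {w: [0.0] * len(vocab) for w in vocab}
    for toks in tokenized:
        tf = Counter(toks)
        for w, cw in tf.items():
            for u, cu in tf.items():
                counts[w][index[u]] += cw * cu
    base = {}
    for w, vec in counts.items():
        # Never zero: every word co-occurs at least with itself.
        norm = math.sqrt(sum(x * x for x in vec))
        base[w] = [x / norm for x in vec]  # length 1, same dimension for all words
    return base
```

With the documents "earthquake disaster", "flood disaster" and "cat dog", the vectors of "earthquake" and "flood" have inner product 0.5 because both co-occur with "disaster", while "earthquake" and "cat" have inner product 0.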

[0006]

Each document in the document set is likewise represented and stored as a real-valued vector of length 1. The vector of an individual document is computed as follows. First, all the words contained in the document are extracted. Next, the vector of each extracted word is obtained by consulting the concept base. Once the vectors of the extracted words have been obtained, they are summed over all the words contained in the document. Finally, every element of the sum is multiplied by the same constant so that the vector has length 1 while keeping its direction, and the resulting unit vector is taken as the vector of the document. The vector corresponding to a document is called the document vector of that document. A concept search system thus stores in advance, as its search target, a set of pairs, each consisting of a document and its document vector.
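A minimal sketch of the document-vector procedure above, assuming a tiny hand-written two-dimensional concept base (the words and values are hypothetical). Two choices are assumptions rather than part of the description: every occurrence of a word contributes its vector, and words absent from the concept base are skipped.

```python
import math


def to_unit_vector(words, concept_base):
    """Sum the concept-base vectors of the given words and rescale to length 1."""
    dim = len(next(iter(concept_base.values())))
    total = [0.0] * dim
    for w in words:
        vec = concept_base.get(w)
        if vec is None:
            continue  # assumption: words missing from the concept base are ignored
        total = [t + x for t, x in zip(total, vec)]  # each occurrence is added
    norm = math.sqrt(sum(t * t for t in total))
    if norm == 0.0:
        raise ValueError("no known words; vector is undefined")
    # Multiplying every element by 1/norm keeps the direction and gives length 1.
    return [t / norm for t in total]


# Hypothetical 2-dimensional concept base (vectors already have length 1).
CONCEPT_BASE = {
    "earthquake": [0.6, 0.8],
    "flood": [0.8, 0.6],
    "cat": [0.0, 1.0],
}

doc_vector = to_unit_vector("earthquake flood".split(), CONCEPT_BASE)
```

Here the sum is (1.4, 1.4), so the document vector is approximately (0.707, 0.707).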

[0007]

A vector can be computed for a search key in the same way as for a document. Specifically, all the words contained in the search key are first extracted. Next, the vector of each extracted word is obtained by consulting the concept base. Once these vectors have been obtained, they are summed over all the words contained in the search key. Finally, every element of the sum is multiplied by the same constant so that the vector has length 1 while keeping its direction, and the resulting unit vector is taken as the vector of the search key. This vector is called the search key vector of the search key.
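Written as a formula, the procedure (which is the same for documents) can be summarized as below, where W(k) denotes the words extracted from search key k and v_w the length-1 concept-base vector of word w; these symbols are introduced here for illustration. Dividing by the norm is the "multiply every element by a constant" step. The numbers use hypothetical two-dimensional word vectors.

```latex
\mathbf{q}_k = \frac{\sum_{w \in W(k)} \mathbf{v}_w}{\left\lVert \sum_{w \in W(k)} \mathbf{v}_w \right\rVert}

% Example: v_earthquake = (0.6, 0.8), v_flood = (0.8, 0.6)
\sum_{w} \mathbf{v}_w = (1.4,\ 1.4), \qquad
\left\lVert (1.4,\ 1.4) \right\rVert = 1.4\sqrt{2} \approx 1.980, \qquad
\mathbf{q}_k = \left(\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right) \approx (0.707,\ 0.707)
```

The formula assumes at least one word of the key is in the concept base; otherwise the sum is the zero vector and cannot be normalized.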

[0008]

As described above, both document vectors and search key vectors are computed as sums of word vectors, so they have the same dimension as the word vectors. The inner product of a document vector and a search key vector is defined as the degree of association between that document and that search key. The degree of association takes a real value between 0 and 1.
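As a worked form of this definition, let d be a document vector and q_k a search key vector, both of length 1 and dimension m (symbols introduced here). Their inner product equals the cosine of the angle between them. In general it lies in [-1, 1]; it stays within [0, 1] when no vector has negative elements, for example when word vectors are built from co-occurrence counts. The example values are hypothetical.

```latex
\operatorname{rel}(d, k) = \mathbf{d} \cdot \mathbf{q}_k = \sum_{i=1}^{m} d_i \, q_{k,i} = \cos\theta
\qquad (\lVert \mathbf{d} \rVert = \lVert \mathbf{q}_k \rVert = 1)

% Example: d = (0.6, 0.8), q_k = (1/\sqrt{2}, 1/\sqrt{2})
\operatorname{rel}(d, k) = \frac{0.6 + 0.8}{\sqrt{2}} = \frac{1.4}{1.4142\ldots} \approx 0.990
```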

[0009]

In a concept search system, a search with a given search key proceeds as follows. First, the search key vector is computed from the search key. Next, for every stored document, the degree of association between its document vector and the search key vector is computed. The documents are then sorted in descending order of the computed degree of association, and the specified number of documents is taken from the top and output as the result.
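The brute-force search described above can be sketched as follows, assuming document vectors are already stored in a dictionary keyed by a hypothetical document id. `heapq.nlargest(n, iterable)` returns the same result as sorting in descending order and taking the first n items, without sorting the whole collection; every document is still scored, which is the cost the next paragraph points out.

```python
import heapq


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def concept_search(key_vector, doc_vectors, top_n):
    """Score every stored document against the key and return the top_n.

    doc_vectors maps a document id to its unit-length document vector.
    Returns (degree_of_association, doc_id) pairs in descending order.
    """
    scored = ((dot(vec, key_vector), doc_id) for doc_id, vec in doc_vectors.items())
    return heapq.nlargest(top_n, scored)


# Hypothetical stored documents and search key vector.
docs = {"d1": [1.0, 0.0], "d2": [0.6, 0.8], "d3": [0.0, 1.0]}
result = concept_search([0.0, 1.0], docs, 2)  # ranks d3 (1.0), then d2 (0.8)
```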

[0010]

Through this operation, a search system based on concept search can also return documents that do not contain the words specified in the search key but that describe related content. On the other hand, because the inner product of the document vector and the search key vector must be computed for every stored document, an index-based search like that of full-text search is difficult to perform, and searching is slow.

[0011]

To speed up concept search, parallel processing with a plurality of processors has been considered. Parallel processing becomes possible by dividing the document set to be searched into a plurality of subsets and having a processor perform concept search on each subset.
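A minimal sketch of this conventional scheme, assuming all document vectors were computed from one concept base built over the whole document set, so scores from different subsets are directly comparable. Under that assumption, merging each subset's local top n yields exactly the global top n. The round-robin split and all names are hypothetical; `ThreadPoolExecutor` only illustrates the structure, since CPython threads do not speed up pure-Python arithmetic because of the global interpreter lock, and real deployments would use separate processes or machines.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_subset(key_vector, subset, top_n):
    """One processor: score only its own subset and keep its local top_n."""
    scored = ((dot(vec, key_vector), doc_id) for doc_id, vec in subset.items())
    return heapq.nlargest(top_n, scored)


def split(doc_vectors, num_parts):
    """Round-robin partition of the document set into num_parts subsets."""
    items = sorted(doc_vectors.items())
    return [dict(items[i::num_parts]) for i in range(num_parts)]


def parallel_concept_search(key_vector, doc_vectors, top_n, num_parts=4):
    """Search all subsets concurrently, then merge the local top_n lists."""
    subsets = split(doc_vectors, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partial = list(pool.map(lambda s: search_subset(key_vector, s, top_n), subsets))
    return heapq.nlargest(top_n, (hit for hits in partial for hit in hits))
```

The merge is exact only because every document vector comes from the same global concept base, which is why each processor in this scheme must have access to the whole document set.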

[0012]

Problem to be Solved by the Invention

As described above, a conventional search system based on concept search must compute the inner product of the document vector and the search key vector for every stored document. It is therefore difficult to perform an index-based search like that of full-text search, and searching is slow.

[0013]

Furthermore, in the conventional method of parallel processing with a plurality of processors to speed up concept search, the concept base must be built from the entire document set to be searched. Thus, even though each processor searches only a subset of the documents, it must store every document in the set in order to build the concept base. All processors therefore store duplicate copies of the document set, which wastes storage capacity when the number of documents is large.

[0014]

The present invention has been made in view of the above. Its object is to provide a parallel information search device and method, a parallel information search program, and a recording medium on which the program is recorded, in which a document set is divided so that partial searches run in parallel on a plurality of partial search processors for higher speed, while each partial search processor stores only its own subset of the divided documents so as to reduce storage capacity.
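A back-of-the-envelope comparison of storage, counted in stored document copies, under these assumptions introduced for illustration: P processors, N documents divided into disjoint subsets of sizes N_1, ..., N_P, each partial search processor returning n documents, and U the union of those results held by the result integration processor. In the conventional scheme every processor holds all N documents; in the invention each holds only its subset. The numbers in the example are purely illustrative.

```latex
S_{\text{conv}} = P \cdot N,
\qquad
S_{\text{inv}} = \sum_{p=1}^{P} N_p + \lvert U \rvert \le N + P\,n

% Example: P = 8, N = 10^6, n = 100
S_{\text{conv}} = 8 \cdot 10^6,
\qquad
S_{\text{inv}} \le 10^6 + 8 \cdot 100 = 1\,000\,800
```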

[0015]

Means for Solving the Problem

To achieve the above object, the present invention according to claim 1 is a parallel information search device that searches a set of documents for the documents matching a search key. The device comprises a plurality of partial search processors, provided one for each of a plurality of document subsets obtained by dividing the document set, each of which searches its document subset with the search key; and a result integration processor that collects from the partial search processors the plurality of quasi-search results (the partial search results they produce) and searches the union of the collected quasi-search results with the search key.

Each of the partial search processors comprises: document subset storage means for storing its divided document subset; quasi-document vector calculation and storage means for calculating and storing, as a quasi-document vector, the vector of each document in the stored subset, that vector being a real-valued vector of length 1 derived from the statistical properties of the words appearing in all documents of the subset and of the words appearing in that document; quasi-search key vector calculation means for calculating, as a quasi-search key vector, a real-valued vector of length 1 derived from the statistical properties of the words appearing in all documents of the subset and of the words appearing in the search key; partial concept search execution means for calculating, for each document in the subset stored in the document subset storage means, the inner product of its quasi-document vector stored in the quasi-document vector calculation and storage means and the calculated quasi-search key vector as the degree of association, and retrieving a predetermined number of documents in descending order of degree of association as the quasi-search result; and quasi-search result output means for outputting the retrieved predetermined number of documents as the quasi-search result.

The result integration processor comprises: quasi-search result collection means for collecting the quasi-search results output by the partial search processors; quasi-search result union storage means for storing the collected quasi-search results as a quasi-search result union; approximate document vector calculation and storage means for calculating and storing, as an approximate document vector, the vector of each document in the stored union, that vector being a real-valued vector of length 1 derived from the statistical properties of the words appearing in all documents of the union and of the words appearing in that document; approximate search key vector calculation means for calculating, as an approximate search key vector, a real-valued vector of length 1 derived from the statistical properties of the words appearing in all documents of the union and of the words appearing in the search key; concept search execution means for calculating, for each document in the union, the inner product of its approximate document vector stored in the approximate document vector calculation and storage means and the calculated approximate search key vector as the degree of association, and retrieving a predetermined number of documents in descending order of degree of association as the search result; and search result output means for outputting the retrieved predetermined number of documents as the search result.
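A compact, sequential simulation of the two-stage scheme of claim 1, offered as a sketch under loud assumptions: the "statistical properties of words" are replaced by a toy concept base of in-document co-occurrence counts, tokenization is whitespace splitting, the processors run one after another rather than in parallel, and every function name is hypothetical. Each partial search processor builds its concept base from its own subset only (quasi vectors); the result integration processor builds a new one from the union of the quasi-search results only (approximate vectors). The final ranking is therefore an approximation of a search over a concept base built on the whole collection.

```python
import heapq
import math
from collections import Counter


def normalize(vec):
    """Rescale to length 1, keeping direction (a zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_concept_base(docs):
    """Toy concept base built only from `docs` (id -> text): each word's vector
    counts its in-document co-occurrence with every vocabulary word."""
    tokenized = [text.lower().split() for text in docs.values()]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    counts = {w: [0.0] * len(vocab) for w in vocab}
    for toks in tokenized:
        tf = Counter(toks)
        for w, cw in tf.items():
            for u, cu in tf.items():
                counts[w][index[u]] += cw * cu
    return {w: normalize(v) for w, v in counts.items()}, len(vocab)


def text_vector(text, base, dim):
    """Sum the vectors of the known words in `text`, then normalize."""
    total = [0.0] * dim
    for w in text.lower().split():
        if w in base:  # words unknown to this concept base are ignored
            total = [t + x for t, x in zip(total, base[w])]
    return normalize(total)


def concept_search(docs, key, top_n):
    """Rank `docs` against `key` using a concept base built from `docs` alone."""
    if not docs:
        return []
    base, dim = build_concept_base(docs)
    q = text_vector(key, base, dim)
    scored = []
    for doc_id, text in docs.items():
        d = text_vector(text, base, dim)
        scored.append((sum(a * b for a, b in zip(d, q)), doc_id))
    return [doc_id for _, doc_id in heapq.nlargest(top_n, scored)]


def partial_search(subset, key, top_n):
    """Partial search processor: quasi-document and quasi-search-key vectors
    come from this processor's own subset; returns its top_n documents."""
    return {doc_id: subset[doc_id] for doc_id in concept_search(subset, key, top_n)}


def integrate(quasi_results, key, top_n):
    """Result integration processor: approximate vectors come from a concept
    base built on the union of the quasi-search results only."""
    union = {}
    for result in quasi_results:
        union.update(result)
    return concept_search(union, key, top_n)


# Hypothetical example: two processors, each holding half of the collection.
subsets = [
    {"d1": "earthquake damage disaster relief", "d2": "stock market report"},
    {"d3": "flood disaster warning", "d4": "cat dog pets"},
]
quasi = [partial_search(s, "disaster", 1) for s in subsets]  # one result per processor
final = integrate(quasi, "disaster", 2)
```

The partial search processors return the full text of their hits, not just ids, because the result integration processor needs the documents themselves to build its own concept base. In this example the quasi-search results are d1 and d3, and the integrator re-ranks both.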

[0016]

In the present invention according to claim 1, the document set is divided into a plurality of document subsets, each of which is assigned to one of the partial search processors and stored in that processor's document subset storage means. Each partial search processor calculates a quasi-document vector for every document in its stored subset, calculates for each document the inner product of its quasi-document vector and the quasi-search key vector of the search key as the degree of association, retrieves a predetermined number of documents in descending order of degree of association, and outputs them to the result integration processor as its quasi-search result. The result integration processor collects the quasi-search results output by the partial search processors, stores them in the quasi-search result union storage means as the quasi-search result union, calculates an approximate document vector for every document in the stored union, calculates for each document the inner product of its approximate document vector and the approximate search key vector of the search key as the degree of association, and outputs a predetermined number of documents in descending order of degree of association as the search result. Parallel searching on the plurality of partial search processors can therefore speed up concept search. At the same time, each partial search processor stores only its own divided document subset and builds its concept base from that subset alone, which reduces storage capacity and cost.

[0017]

The present invention according to claim 2 is a parallel information search method for searching a set of documents for the documents matching a search key. In this method, the document set is divided into a plurality of document subsets, and each divided subset is assigned to one of a plurality of partial search processors and stored in that processor's document subset storage means. In each partial search processor, for each document in the subset stored in the document subset storage means, a real-valued vector of length 1 representing the document is calculated from the statistical properties of the words appearing in all documents of the subset and of the words appearing in that document, and is stored as its quasi-document vector; a real-valued vector of length 1 is calculated from the statistical properties of the words appearing in all documents of the subset and of the words appearing in the search key, as the quasi-search key vector; for each document in the stored subset, the inner product of its quasi-document vector and the quasi-search key vector is calculated as the degree of association; and a predetermined number of documents is retrieved in descending order of degree of association and output as the quasi-search result. The quasi-search results output by the partial search processors are collected by the result integration processor and stored in its quasi-search result union storage means as the quasi-search result union. For each document in the stored union, a real-valued vector of length 1 representing the document is calculated from the statistical properties of the words appearing in all documents of the union and of the words appearing in that document, and is stored as its approximate document vector; a real-valued vector of length 1 is calculated from the statistical properties of the words appearing in all documents of the union and of the words appearing in the search key, as the approximate search key vector; for each document in the union, the inner product of its approximate document vector and the approximate search key vector is calculated as the degree of association; and a predetermined number of documents is retrieved in descending order of degree of association and output as the search result.

[0018]

In the present invention according to claim 2, the document set is divided into a plurality of document subsets, each of which is assigned to one of the partial search processors and stored in that processor's document subset storage means. Each partial search processor calculates a quasi-document vector for every document in its stored subset, calculates for each document the inner product of its quasi-document vector and the quasi-search key vector of the search key as the degree of association, retrieves a predetermined number of documents in descending order of degree of association, and outputs them to the result integration processor as its quasi-search result. The result integration processor collects the quasi-search results output by the partial search processors, stores them in the quasi-search result union storage means as the quasi-search result union, calculates an approximate document vector for every document in the stored union, calculates for each document the inner product of its approximate document vector and the approximate search key vector of the search key as the degree of association, and outputs a predetermined number of documents in descending order of degree of association as the search result. Parallel searching on the plurality of partial search processors can therefore speed up concept search. At the same time, each partial search processor stores only its own divided document subset and builds its concept base from that subset alone, which reduces storage capacity and cost.

[0019]

Further, the present invention according to claim 3 is a parallel information search program for searching a set of documents for the documents matching a search key. Under this program, the document set is divided into a plurality of document subsets, and each divided subset is assigned to one of a plurality of partial search processors and stored in that processor's document subset storage means. In each partial search processor, for each document in the subset stored in the document subset storage means, a real-valued vector of length 1 representing the document is calculated from the statistical properties of the words appearing in all documents of the subset and of the words appearing in that document, and is stored as its quasi-document vector; a real-valued vector of length 1 is calculated from the statistical properties of the words appearing in all documents of the subset and of the words appearing in the search key, as the quasi-search key vector; for each document in the stored subset, the inner product of its quasi-document vector and the quasi-search key vector is calculated as the degree of association; and a predetermined number of documents is retrieved in descending order of degree of association and output as the quasi-search result. The quasi-search results output by the partial search processors are collected by the result integration processor and stored in its quasi-search result union storage means as the quasi-search result union. For each document in the stored union, a real-valued vector of length 1 representing the document is calculated from the statistical properties of the words appearing in all documents of the union and of the words appearing in that document, and is stored as its approximate document vector; a real-valued vector of length 1 is calculated from the statistical properties of the words appearing in all documents of the union and of the words appearing in the search key, as the approximate search key vector; for each document in the union, the inner product of its approximate document vector and the approximate search key vector is calculated as the degree of association; and a predetermined number of documents is retrieved in descending order of degree of association and output as the search result.

[0020] In the invention according to claim 3, the document set is divided into a plurality of document subsets. Each subset is assigned to one of a plurality of partial search processors and stored in that processor's document subset storage means. A quasi-document vector is calculated for each document in the stored subset, and for each of those documents the inner product of its quasi-document vector and the quasi-search key vector of the search key is calculated as the degree of relevance. A predetermined number of documents are retrieved in descending order of relevance, and each partial search processor outputs them to the result integration processor as its quasi-search result. The result integration processor collects the quasi-search results output by the partial search processors and stores them in the quasi-search result union storage means as a quasi-search result union. It then calculates an approximate document vector for each document in the stored union. For each of those documents, it calculates the inner product of the approximate document vector and the approximate search key vector of the search key as the degree of relevance, and outputs a predetermined number of documents in descending order of relevance as the search result. Parallel searching on the plurality of partial search processors therefore accelerates the concept search. In addition, each partial search processor stores only its assigned document subset and builds its concept base from that subset alone, which reduces the required storage capacity and hence the cost.

[0021] The present invention according to claim 4 is a recording medium on which a parallel information retrieval program is recorded. The program retrieves the documents that match a search key from a document set. It divides the document set into a plurality of document subsets, assigns each subset to one of a plurality of partial search processors, and stores it in that processor's document subset storage means. In each partial search processor, for every document in the stored subset, a vector whose elements are real numbers and whose length is 1 is calculated from the statistical properties of the words appearing in all documents of the subset and those of the words appearing in that document. This vector is stored as the document's quasi-document vector. Likewise, a real-valued vector of length 1 is calculated as the quasi-search key vector, from the statistical properties of the words appearing in all documents of the subset and those of the words appearing in the search key. Each partial search processor then calculates, for every document in the subset, the inner product of the document's quasi-document vector and the quasi-search key vector as the degree of relevance. It retrieves a predetermined number of documents in descending order of relevance and outputs them as its quasi-search result. The result integration processor collects the quasi-search results output by the partial search processors and stores them in the quasi-search result union storage means as a quasi-search result union. For every document in the stored union, a real-valued vector of length 1 is calculated from the statistical properties of the words appearing in all documents of the union and those of the words appearing in that document, and is stored as the approximate document vector. A real-valued vector of length 1 is also calculated, from the statistical properties of the words appearing in all documents of the union and those of the words appearing in the search key, and serves as the approximate search key vector. For every document in the union, the inner product of its approximate document vector and the approximate search key vector is calculated as the degree of relevance. A predetermined number of documents are retrieved in descending order of relevance and output as the search result. The gist of the invention is that this parallel information retrieval program is recorded on a recording medium.

[0022] In the invention according to claim 4, the document set is divided into a plurality of document subsets. Each subset is assigned to one of a plurality of partial search processors and stored in that processor's document subset storage means. A quasi-document vector is calculated for each document in the stored subset, and for each of those documents the inner product of its quasi-document vector and the quasi-search key vector of the search key is calculated as the degree of relevance. A predetermined number of documents are retrieved in descending order of relevance, and each partial search processor outputs them to the result integration processor as its quasi-search result. The result integration processor collects the quasi-search results output by the partial search processors and stores them in the quasi-search result union storage means as a quasi-search result union. It then calculates an approximate document vector for each document in the stored union. For each of those documents, it calculates the inner product of the approximate document vector and the approximate search key vector of the search key as the degree of relevance, and outputs a predetermined number of documents in descending order of relevance as the search result. Because this parallel information retrieval program is recorded on a recording medium, the program can be readily distributed by means of that medium.

[0023]

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a parallel information retrieval device according to an embodiment of the present invention. The device represents concepts and documents as vectors in the same vector space, computes relevance as the inner product of those vectors, and retrieves the documents that match a search key from a document set. It consists of two kinds of processor. The first is a plurality of partial search processors 100. Each is provided for one of the document subsets obtained by dividing the document set and searches that subset with the search key. The second is a single result integration processor 150. It collects from the partial search processors 100 the quasi-search results, which are their partial search results, and searches the union of the collected quasi-search results with the search key.

[0024] Each partial search processor 100 comprises the following units:
- a document subset storage unit 110, constituting the document subset storage means, which stores the document subset obtained by dividing the document set into a plurality of subsets;
- a partial concept base construction unit 120, which builds a concept base from the documents of the stored subset;
- a partial concept base storage unit 121, which stores the partial concept base built by the partial concept base construction unit 120;
- a quasi-document vector calculation unit 111, which calculates a quasi-document vector for each document from the subset documents stored in the document subset storage unit 110 and the concept base stored in the partial concept base storage unit 121;
- a quasi-document vector storage unit 112, which stores the quasi-document vectors calculated by the quasi-document vector calculation unit 111;
- a quasi-search key vector calculation unit 130, constituting the quasi-search key vector calculation means, which calculates a quasi-search key vector from the search key 200 specified by the user and the concept base stored in the partial concept base storage unit 121;
- a partial concept search execution unit 140, constituting the partial concept search execution means, which refers to the document subset storage unit 110 and the quasi-document vector storage unit 112, calculates the degree of relevance between the quasi-search key vector and each document of the subset stored in the document subset storage unit 110 as the inner product of the quasi-search key vector and that document's quasi-document vector, and retrieves the specified number of documents in descending order of relevance as the quasi-search result;
- a quasi-search result output unit 141, constituting the quasi-search result output means, which outputs the retrieved documents to the result integration processor 150 as the quasi-search result.

[0025] The result integration processor 150 comprises the following units:
- a quasi-search result collection unit 160, constituting the quasi-search result collection means, which collects the quasi-search results output by the plurality of partial search processors 100;
- a quasi-search result union storage unit 161, constituting the quasi-search result union storage means, which stores the collected quasi-search results from the partial search processors 100 as a quasi-search result union;
- an approximate concept base construction unit 170, which builds a concept base from the documents of the stored union;
- an approximate concept base storage unit 171, which stores the approximate concept base built by the approximate concept base construction unit 170;
- an approximate document vector calculation unit 162, which calculates an approximate document vector for each document from the union documents stored in the quasi-search result union storage unit 161 and the concept base stored in the approximate concept base storage unit 171;
- an approximate document vector storage unit 163, which stores the approximate document vectors calculated by the approximate document vector calculation unit 162;
- an approximate search key vector calculation unit 180, constituting the approximate search key vector calculation means, which calculates an approximate search key vector from the user-specified search key 200 and the concept base stored in the approximate concept base storage unit 171;
- a concept search execution unit 190, constituting the concept search execution means, which refers to the quasi-search result union storage unit 161 and the approximate document vector storage unit 163, calculates the degree of relevance between the approximate search key vector and each document of the union stored in the quasi-search result union storage unit 161 as the inner product of the approximate search key vector and that document's approximate document vector, and retrieves the specified number of documents in descending order of relevance as the search result;
- a search result output unit 191, constituting the search result output means, which outputs the retrieved documents as the search result.

[0026] In each partial search processor 100, four units together constitute the quasi-document vector calculation and storage means: the quasi-document vector calculation unit 111, the quasi-document vector storage unit 112, the partial concept base construction unit 120, and the partial concept base storage unit 121. As described above, the quasi-document vector of each document is calculated from two inputs: the partial concept base, which is built by the partial concept base construction unit 120 and stored in the partial concept base storage unit 121, and the subset documents stored in the document subset storage unit 110. Specifically, a vector whose elements are real numbers and whose length is 1 is calculated for each document in the document subset stored in the document subset storage unit 110. It is derived from the statistical properties of the words appearing in all documents of the subset and those of the words appearing in that document, and it is the document's quasi-document vector.
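The patent does not state which word statistics the concept base uses. The sketch below assumes smoothed TF-IDF with the IDF computed over one processor's subset only. With that assumption, quasi-document vectors reduce to L2-normalised sparse dictionaries. The helpers `idf_table` and `unit_vector` are hypothetical names, not components of the patent.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency of each word over a document subset.

    Assumption: IDF stands in for the 'statistical properties of the words
    appearing in all documents of the subset'; it is not the patent's concept base.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    # +1.0 keeps words that occur in every document from getting zero weight.
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}


def unit_vector(text, idf):
    """Sparse TF-IDF vector of `text`, scaled to length 1 (empty if no known words)."""
    tf = Counter(w for w in text.lower().split() if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


# One partial search processor's subset (step S5 computes these once, before any query).
subset = ["parallel search of documents", "concept base for documents", "vector space retrieval"]
idf = idf_table(subset)
quasi_doc_vectors = [unit_vector(d, idf) for d in subset]
```

Sparse dictionaries keep each vector proportional to the words a document actually contains. A dense matrix or a dimensionality-reduced concept space would serve equally well under the same unit-length constraint.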

[0027] In the result integration processor 150, four units together constitute the approximate document vector calculation and storage means: the approximate document vector calculation unit 162, the approximate concept base construction unit 170, the approximate concept base storage unit 171, and the approximate document vector storage unit 163. As described above, the approximate document vector of each document is calculated from two inputs: the approximate concept base, which is built by the approximate concept base construction unit 170 and stored in the approximate concept base storage unit 171, and the documents of the quasi-search result union stored in the quasi-search result union storage unit 161. Specifically, a vector whose elements are real numbers and whose length is 1 is calculated for each document in the quasi-search result union stored in the quasi-search result union storage unit 161. It is derived from the statistical properties of the words appearing in all documents of the union and those of the words appearing in that document, and it is the document's approximate document vector.

[0028] In each partial search processor 100, the quasi-search key vector calculation unit 130 calculates the quasi-search key vector from the user-specified search key 200 and the concept base stored in the partial concept base storage unit 121, as described above. Specifically, it calculates a vector whose elements are real numbers and whose length is 1, and uses it as the quasi-search key vector. The vector is derived from the statistical properties of the words appearing in all documents of the document subset stored in the document subset storage unit 110 and those of the words appearing in the user-specified search key 200.
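The key vector is built from the same subset statistics as the documents. A consequence worth seeing is that key words which never occur in a processor's subset contribute nothing there. The sketch below again assumes smoothed TF-IDF as the unspecified word statistics. `subset_idf` and `quasi_search_key_vector` are hypothetical names.

```python
import math
from collections import Counter


def subset_idf(subset_docs):
    # Assumed statistic: smoothed IDF over the documents held by this processor only.
    n = len(subset_docs)
    df = Counter()
    for doc in subset_docs:
        df.update(set(doc.lower().split()))
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}


def quasi_search_key_vector(key, idf):
    # Key words that never occur in the subset carry no statistics and are dropped.
    tf = Counter(w for w in key.lower().split() if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    # Length-1 vector, so its inner product with a unit document vector is a cosine.
    return {w: v / norm for w, v in vec.items()} if norm else {}


idf = subset_idf(["parallel search", "concept search", "vector space"])
key_vec = quasi_search_key_vector("parallel quantum search", idf)
```

Here "quantum" is absent from the subset and disappears from the key vector. "parallel" occurs in fewer documents than "search", so it receives the larger weight.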

[0029] Similarly, in the result integration processor 150, the approximate search key vector calculation unit 180 calculates the approximate search key vector from the user-specified search key 200 and the concept base stored in the approximate concept base storage unit 171, as described above. Specifically, it calculates a vector whose elements are real numbers and whose length is 1, and uses it as the approximate search key vector. The vector is derived from the statistical properties of the words appearing in all documents of the quasi-search result union stored in the quasi-search result union storage unit 161 and those of the words appearing in the user-specified search key 200.

[0030] Next, the operation of the parallel information retrieval device of this embodiment, configured as described above, is explained with reference to the flowchart shown in FIG. 2.

[0031] First, the set of documents to be searched is divided into document subsets. Each subset is stored in the document subset storage unit 110 of a partial search processor 100 (step S1). The partial concept base construction unit 120 of each partial search processor 100 builds a concept base from the documents stored in the document subset storage unit 110 and stores it in the partial concept base storage unit 121 (step S3). The quasi-document vector calculation unit 111 then calculates a quasi-document vector for each document from the documents stored in the document subset storage unit 110 and the concept base stored in the partial concept base storage unit 121, and stores the vectors in the quasi-document vector storage unit 112 (step S5).
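Step S1 does not say how the document set is divided. The sketch below assumes a round-robin assignment, which keeps subset sizes within one document of each other. Any disjoint cover of the document set would fit the description. `partition` is a hypothetical name.

```python
def partition(doc_ids, n_processors):
    """Split the document set into n_processors disjoint subsets (step S1).

    Round-robin assignment is an assumption; the patent does not fix a policy.
    """
    if n_processors < 1:
        raise ValueError("need at least one partial search processor")
    subsets = [[] for _ in range(n_processors)]
    for i, doc_id in enumerate(doc_ids):
        subsets[i % n_processors].append(doc_id)
    return subsets


print(partition(list(range(10)), 3))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Balanced subsets matter because the slowest partial search processor determines when step S15 can begin.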

[0032] When the user specifies a search key 200, the user-specified search key 200 is input to all the partial search processors 100 and to the result integration processor 150, and the search operation starts (step S7).

[0033] First, in each partial search processor 100, the quasi-search key vector calculation unit 130 calculates a quasi-search key vector from the user-specified search key 200 and the concept base stored in the partial concept base storage unit 121, and inputs it to the partial concept search execution unit 140 (step S9). The partial concept search execution unit 140 then refers to the document subset storage unit 110 and the quasi-document vector storage unit 112. For each document stored in the document subset storage unit 110, it calculates the degree of relevance to the user-specified search key as the inner product of the quasi-search key vector and that document's quasi-document vector stored in the quasi-document vector storage unit 112 (step S11). It then selects the specified number of documents in descending order of relevance. The selected documents are output as the quasi-search result from the quasi-search result output unit 141 to the result integration processor 150 (step S13).
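Steps S11 and S13 score every document by an inner product and keep the specified number of best documents. Using `heapq.nlargest` for the selection is an implementation choice, not something the patent prescribes. It avoids fully sorting a large subset. `quasi_search` is a hypothetical name, and the vectors are assumed to be sparse dictionaries of unit length.

```python
import heapq


def quasi_search(key_vec, doc_vecs, k):
    """Return the k (doc_id, relevance) pairs with the largest inner products.

    key_vec and each doc vector are sparse dicts of unit length, so the
    inner product is their cosine similarity (steps S11 and S13).
    """
    scored = (
        (doc_id, sum(x * key_vec.get(w, 0.0) for w, x in vec.items()))
        for doc_id, vec in doc_vecs.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


key_vec = {"a": 1.0}
doc_vecs = {"d1": {"a": 0.6, "b": 0.8}, "d2": {"a": 1.0}, "d3": {"b": 1.0}}
print(quasi_search(key_vec, doc_vecs, 2))  # [('d2', 1.0), ('d1', 0.6)]
```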

[0034] After all the partial search processors 100 have obtained their quasi-search results, the quasi-search result collection unit 160 of the result integration processor 150 starts operating. It collects the outputs of the quasi-search result output units 141 of the partial search processors 100, forms their union, and stores it in the quasi-search result union storage unit 161 (step S15). Once the quasi-search result union has been stored, the approximate concept base construction unit 170 builds a concept base from the documents stored in the quasi-search result union storage unit 161 and stores it in the approximate concept base storage unit 171 (step S17). The approximate document vector calculation unit 162 then calculates an approximate document vector for each document stored in the quasi-search result union storage unit 161. The calculation is based on the concept base stored in the approximate concept base storage unit 171, and the resulting vectors are stored in the approximate document vector storage unit 163 (step S19).
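Step S15 needs only the documents themselves. The integration processor rebuilds its own statistics in steps S17 to S23, so the partial search processors' relevance scores are not needed and are not carried into the union in this sketch. The `(doc_id, text)` pair format and the name `collect_union` are assumptions.

```python
def collect_union(quasi_results):
    """Merge per-processor quasi-search results into one union keyed by document id.

    Each result is a list of (doc_id, text) pairs; relevance scores are not
    kept because the result integration processor recomputes them.
    """
    union = {}
    for result in quasi_results:
        for doc_id, text in result:
            union.setdefault(doc_id, text)  # a set union: each document once
    return union


results = [[("d1", "parallel concept search"), ("d2", "cooking recipes")],
           [("d4", "parallel processors for search")]]
union = collect_union(results)
print(sorted(union))  # ['d1', 'd2', 'd4']
```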

[0035] The user-specified search key 200 is input to the approximate search key vector calculation unit 180, which calculates an approximate search key vector based on the concept base stored in the approximate concept base storage unit 171 (step S21). The concept search execution unit 190 then refers to the documents stored in the quasi-search result union storage unit 161 and to their approximate document vectors stored in the approximate document vector storage unit 163. For each document, it calculates the degree of relevance to the user-specified search key as the inner product of the approximate search key vector and that document's approximate document vector (step S23). It then selects the specified number of documents in descending order of relevance. Finally, the selected documents are output from the search result output unit 191 (step S25).
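Steps S1 to S25 can be put together in one short, runnable sketch. It rests on several assumptions:
- Smoothed TF-IDF replaces the unspecified concept base.
- Vectors are computed inside `top_k` for brevity, rather than precomputed in step S5.
- A thread pool models the fan-out to the partial search processors.

CPython threads do not run this pure-Python scoring in parallel, so a real deployment would use separate processes or machines, as in FIG. 1. All function names are hypothetical.

```python
import heapq
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def idf_table(texts):
    # Assumed stand-in for a concept base: smoothed IDF over exactly these texts.
    n = len(texts)
    df = Counter()
    for text in texts:
        df.update(set(text.lower().split()))
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}


def unit_vector(text, idf):
    # Real-valued, length-1 sparse vector; words unknown to `idf` are ignored.
    tf = Counter(w for w in text.lower().split() if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


def top_k(docs, key, k):
    # Build statistics from `docs` alone, then rank by inner product with the key.
    idf = idf_table(list(docs.values()))
    key_vec = unit_vector(key, idf)
    scored = []
    for doc_id, text in docs.items():
        doc_vec = unit_vector(text, idf)
        score = sum(x * key_vec.get(w, 0.0) for w, x in doc_vec.items())
        scored.append((score, doc_id))
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


def partial_search(subset, key, k):
    # Partial search processor 100: quasi-search result as (doc_id, text) pairs.
    return [(doc_id, subset[doc_id]) for doc_id in top_k(subset, key, k)]


def parallel_search(subsets, key, k):
    # Fan the key out to every partial search processor (steps S7 to S13).
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        quasi_results = list(pool.map(lambda s: partial_search(s, key, k), subsets))
    # Result integration processor 150: union, new statistics, re-rank (S15 to S25).
    union = {doc_id: text for result in quasi_results for doc_id, text in result}
    return top_k(union, key, k)


subsets = [
    {"d1": "parallel concept search", "d2": "cooking recipes"},
    {"d3": "concept base construction", "d4": "parallel processors for search"},
]
print(parallel_search(subsets, "parallel search", 2))  # ['d1', 'd4']
```

Each subset contributes at most k documents to the union. The second ranking therefore touches at most k times the number of processors documents, whatever the size of the collection.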

[0036] In this embodiment the partial search processors 100 and the result integration processor 150 are shown as separate entities, but the present invention does not require this. One of the partial search processors 100 may also serve as the result integration processor 150. Likewise, each functional unit of the partial search processors 100 and the result integration processor 150 is described as an independent entity, but the present invention does not require this either. For example, all the functional units may be implemented as software running on an entity consisting of a microprocessor and memory.

[0037] As described above, in this embodiment each partial search processor 100 performs its partial search using a concept base built only from the document subset it stores. The quasi-search results, that is, the partial search results obtained by the partial search processors 100, are collected and integrated by the result integration processor 150. During this integration, the result integration processor 150 builds a concept base from the union of the quasi-search results. It then recalculates the approximate document vector of each document in the union, the approximate search key vector, and the degree of relevance of each document, and determines the search result from these recalculated degrees of relevance.

[0038] That is, in the above embodiment the set of documents to be searched is divided into subsets, and the partial search processors search these subsets in parallel, so the concept search is performed at high speed. Producing the final search result requires a concept search over the quasi-search result union in the result integration processor, and this step cannot be parallelized. Two conditions make this acceptable: the number of documents to be searched remains large even after division into subsets for the partial search processors, and the number of documents to be returned (the specified number) is relatively small. When both hold, the processing in the result integration processor is far smaller than that in the partial search processors, so the overall search is still substantially accelerated. For example, suppose 10 partial search processors are used to retrieve the 100 documents most relevant to a search key from 1,000,000 documents. Each partial search processor performs a concept search over 100,000 documents. The result integration processor collects all 100 results from each partial search processor, forms their union, and performs a concept search over it, so it searches 100 × 10 = 1,000 documents. Because the concept search workload of the result integration processor is far smaller than that of each partial search processor, the concept search as a whole runs sufficiently fast.
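The workload arithmetic in the example above can be written as a small function. It assumes the documents are split evenly. The union size is an upper bound, reached when every subset returns a full k documents. `concept_search_workload` is a hypothetical name.

```python
def concept_search_workload(total_docs, n_processors, k):
    """Documents scored by each partial search processor and by the integrator.

    Assumes the document set is split evenly across the processors.
    """
    per_partial = total_docs // n_processors  # searched in parallel
    per_integrator = k * n_processors         # quasi-search result union, at most
    return per_partial, per_integrator


print(concept_search_workload(1_000_000, 10, 100))  # (100000, 1000)
```

With these figures, the serial integration step scores one hundredth as many documents as each parallel partial search. This is why the non-parallelizable step does not dominate.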

[0039] Further, each partial search processor generates its concept base from the documents allocated to it, and the result integration processor generates its concept base from the quasi-search result union it obtains. Each processor therefore needs to store only the documents it searches, so no storage area is wasted.
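To illustrate why each processor needs only its own documents, here is a minimal sketch of a quasi-document vector calculation. The patent states only that the vectors are unit-length and are derived from the "statistical properties" of words in the whole subset and in each document; TF-IDF weighting with L2 normalization is an assumption used here for illustration, and the function name `unit_vectors` is hypothetical.

```python
import math
from collections import Counter


def unit_vectors(docs: list[list[str]]) -> tuple[dict[str, float], list[dict[str, float]]]:
    """Build IDF weights and unit-length TF-IDF vectors from `docs` alone.

    Every statistic is derived from the tokenized documents passed in, so a
    processor holding only its own subset never needs the other subsets.
    (TF-IDF is an assumed stand-in for the patent's "statistical properties".)
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency within this subset
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}  # +1 keeps very common terms nonzero
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return idf, vectors


subset = [["parallel", "search"], ["concept", "search"], ["parallel", "processor"]]
idf, vecs = unit_vectors(subset)
print(round(sum(w * w for w in vecs[0].values()), 6))  # 1.0
```

Because `idf` depends on which documents are in the subset, the same document can receive different vectors on different processors, which is the source of the differences between concept bases described next.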

[0040] The concept bases constructed on the partial search processors and on the result integration processor differ from processor to processor because each is built from a different set of documents, and they also differ from the concept base constructed by the prior-art concept search device. Consequently, the search results also differ from those of the prior-art concept search. Nevertheless, an approximate result is obtained, because (1) the quasi-document vectors and the quasi-search key vector are both generated from the concept base of the same processor, so the degree of relevance does not differ greatly from that in the prior-art concept search, and (2) the result integration processor re-searches the search results of all the partial search processors using a single concept base.
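The two-stage procedure can be sketched end to end as a sequential simulation. This is a minimal sketch under stated assumptions: TF-IDF with L2 normalization stands in for the patent's unspecified "statistical properties" and concept base, documents are pre-tokenized word lists, and the names `vectorize`, `top_k`, and `parallel_search` are hypothetical. In an actual device the per-subset searches would run on separate processors; here they run in a loop.

```python
import math
from collections import Counter

Doc = tuple[str, list[str]]  # (document id, tokens)


def vectorize(tokens_list: list[list[str]], key: list[str]):
    """Unit-length TF-IDF vectors for the documents and for the key, using
    statistics drawn only from tokens_list (the local document collection)."""
    n = len(tokens_list)
    df = Counter(t for toks in tokens_list for t in set(toks))
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}

    def unit(tokens: list[str]) -> dict[str, float]:
        tf = Counter(t for t in tokens if t in idf)  # terms unseen locally get no weight
        vec = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    return [unit(toks) for toks in tokens_list], unit(key)


def top_k(docs: list[Doc], key: list[str], k: int) -> list[Doc]:
    """Rank docs by inner product with the key vector and keep the k best."""
    doc_vecs, key_vec = vectorize([toks for _, toks in docs], key)
    scores = [sum(w * key_vec.get(t, 0.0) for t, w in v.items()) for v in doc_vecs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order[:k]]


def parallel_search(subsets: list[list[Doc]], key: list[str], k: int) -> list[str]:
    # Stage 1: each "partial search processor" searches only its own subset.
    union: list[Doc] = []
    for subset in subsets:
        union.extend(top_k(subset, key, k))  # quasi-search results
    # Stage 2: the "result integration processor" re-vectorizes the union
    # with its own statistics (a single concept base) and re-ranks it.
    return [doc_id for doc_id, _ in top_k(union, key, k)]


subsets = [
    [("a1", ["parallel", "search", "engine"]), ("a2", ["cooking", "recipe"])],
    [("b1", ["concept", "search"]), ("b2", ["parallel", "processor", "search"])],
]
print(parallel_search(subsets, ["parallel", "search"], k=2))
```

Stage 2 recomputes every vector from the union's own statistics rather than reusing the stage-1 scores, mirroring the re-search over a single concept base; the final ranking is therefore an approximation of a search over the whole collection, not an exact reproduction of it.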

[0041] The present invention differs essentially from conventional methods in that the set of documents to be searched is divided into subsets that are allocated to the partial search processors, and each partial search processor constructs its concept base solely from the subset of documents allocated to it.

[0042] It goes without saying that the processing procedure of the parallel information search method of the above embodiment can be recorded as a program on a recording medium such as a CD or FD (floppy disk). By incorporating this recording medium into a computer system, downloading the program recorded on it to a computer system over a communication line, or installing the program from the recording medium, and then running the computer system with the program, the computer system can be made to function as a parallel information search device that carries out the parallel information search method. Using such a recording medium also makes the program easier to distribute.

[0043]

EFFECTS OF THE INVENTION: As described above, according to the present invention, the set of documents to be searched is divided into a plurality of document subsets, each document subset is assigned to one of a plurality of partial search processors, and the partial search processors search in parallel, so the concept search can be accelerated. In addition, each partial search processor stores only its own divided document subset and constructs its concept base from that subset alone, so the required storage capacity can be reduced and the system can be made more economical.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a parallel information search device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the parallel information search device shown in FIG. 1.

[Explanation of symbols]

100 Partial search processor
110 Document subset storage unit
111 Quasi-document vector calculation unit
112 Quasi-document vector storage unit
120 Partial concept base construction unit
121 Partial concept base storage unit
130 Quasi-search key vector calculation unit
140 Partial concept search execution unit
141 Quasi-search result output unit
150 Result integration processor
160 Quasi-search result collection unit
161 Quasi-search result union storage unit
162 Approximate document vector calculation unit
163 Approximate document vector storage unit
170 Approximate concept base construction unit
171 Approximate concept base storage unit
180 Approximate search key vector calculation unit
190 Concept search execution unit
191 Search result output unit
200 User-specified search key

Claims

1. A parallel information search device for searching a set of documents for documents corresponding to a search key, comprising: a plurality of partial search processors, provided one for each of a plurality of document subsets obtained by dividing the set of documents, each of which searches its document subset with the search key; and a result integration processor that collects from each partial search processor a quasi-search result, being the partial search result obtained by that processor, and searches the union of the collected quasi-search results with the search key; wherein each of the plurality of partial search processors comprises: document subset storage means for storing its divided document subset; quasi-document vector calculation and storage means for calculating and storing, for each individual document in the stored document subset, a quasi-document vector, which is a vector of real-valued elements with length 1 representing that document, from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in that document; quasi-search key vector calculation means for calculating a quasi-search key vector, which is a vector of real-valued elements with length 1, from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in the search key; partial concept search execution means for calculating, for each individual document in the document subset stored in the document subset storage means, the inner product of the quasi-document vector of that document stored by the quasi-document vector calculation and storage means and the calculated quasi-search key vector as the degree of relevance, and retrieving a predetermined number of documents in descending order of relevance as the quasi-search result; and quasi-search result output means for outputting the retrieved predetermined number of documents as the quasi-search result; and wherein the result integration processor comprises: quasi-search result collection means for collecting the quasi-search results output by each of the plurality of partial search processors; quasi-search result union storage means for storing the quasi-search results collected from the plurality of partial search processors as a quasi-search result union; approximate document vector calculation and storage means for calculating and storing, for each individual document in the stored quasi-search result union, an approximate document vector, which is a vector of real-valued elements with length 1 representing that document, from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in that document; approximate search key vector calculation means for calculating an approximate search key vector, which is a vector of real-valued elements with length 1, from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in the search key; concept search execution means for calculating, for each individual document in the quasi-search result union, the inner product of the approximate document vector of that document stored by the approximate document vector calculation and storage means and the calculated approximate search key vector as the degree of relevance, and retrieving a predetermined number of documents in descending order of relevance as the search result; and search result output means for outputting the retrieved predetermined number of documents as the search result.

2. A parallel information search method for searching a set of documents for documents corresponding to a search key, comprising: dividing the set of documents into a plurality of document subsets; assigning each of the divided document subsets to one of a plurality of partial search processors and storing it in the document subset storage means of that partial search processor; in each partial search processor, calculating and storing, for each individual document in the document subset stored in the document subset storage means, a quasi-document vector, which is a vector of real-valued elements with length 1 representing that document, from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in that document; in each partial search processor, calculating a quasi-search key vector, which is a vector of real-valued elements with length 1, from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in the search key; in each partial search processor, calculating, for each individual document in the document subset stored in the document subset storage means, the inner product of the quasi-document vector of that document and the quasi-search key vector as the degree of relevance, retrieving a predetermined number of documents in descending order of relevance as the quasi-search result, and outputting the retrieved predetermined number of documents as the quasi-search result; collecting, in a result integration processor, the quasi-search results output from the plurality of partial search processors; storing the quasi-search results collected from the plurality of partial search processors in the result integration processor in quasi-search result union storage means as a quasi-search result union; calculating and storing, for each individual document in the stored quasi-search result union, an approximate document vector, which is a vector of real-valued elements with length 1 representing that document, from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in that document; calculating an approximate search key vector, which is a vector of real-valued elements with length 1, from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in the search key; calculating, for each individual document in the quasi-search result union, the inner product of the approximate document vector of that document and the approximate search key vector as the degree of relevance, and retrieving a predetermined number of documents in descending order of relevance as the search result; and outputting the retrieved predetermined number of documents as the search result.

3. A parallel information search program for searching a set of documents for documents corresponding to a search key, wherein: the set of documents is divided into a plurality of document subsets; each of the divided document subsets is assigned to one of a plurality of partial search processors and stored in the document subset storage means of that partial search processor; in each partial search processor, for each individual document in the document subset stored in the document subset storage means, a quasi-document vector, which is a vector of real-valued elements with length 1 representing that document, is calculated from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in that document, and is stored; in each partial search processor, a quasi-search key vector, which is a vector of real-valued elements with length 1, is calculated from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in the search key; in each partial search processor, for each individual document in the document subset stored in the document subset storage means, the inner product of the quasi-document vector of that document and the quasi-search key vector is calculated as the degree of relevance, a predetermined number of documents are retrieved in descending order of relevance as the quasi-search result, and the retrieved predetermined number of documents are output as the quasi-search result; the quasi-search results output from the plurality of partial search processors are collected in a result integration processor; the quasi-search results collected from the plurality of partial search processors in the result integration processor are stored in quasi-search result union storage means as a quasi-search result union; for each individual document in the stored quasi-search result union, an approximate document vector, which is a vector of real-valued elements with length 1 representing that document, is calculated from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in that document, and is stored; an approximate search key vector, which is a vector of real-valued elements with length 1, is calculated from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in the search key; for each individual document in the quasi-search result union, the inner product of the approximate document vector of that document and the approximate search key vector is calculated as the degree of relevance, and a predetermined number of documents are retrieved in descending order of relevance as the search result; and the retrieved predetermined number of documents are output as the search result.

4. A recording medium on which is recorded a parallel information search program for searching a set of documents for documents corresponding to a search key, wherein: the set of documents is divided into a plurality of document subsets; each of the divided document subsets is assigned to one of a plurality of partial search processors and stored in the document subset storage means of that partial search processor; in each partial search processor, for each individual document in the document subset stored in the document subset storage means, a quasi-document vector, which is a vector of real-valued elements with length 1 representing that document, is calculated from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in that document, and is stored; in each partial search processor, a quasi-search key vector, which is a vector of real-valued elements with length 1, is calculated from the statistical properties of the words appearing in all documents of the document subset and the statistical properties of the words appearing in the search key; in each partial search processor, for each individual document in the document subset stored in the document subset storage means, the inner product of the quasi-document vector of that document and the quasi-search key vector is calculated as the degree of relevance, a predetermined number of documents are retrieved in descending order of relevance as the quasi-search result, and the retrieved predetermined number of documents are output as the quasi-search result; the quasi-search results output from the plurality of partial search processors are collected in a result integration processor; the quasi-search results collected from the plurality of partial search processors in the result integration processor are stored in quasi-search result union storage means as a quasi-search result union; for each individual document in the stored quasi-search result union, an approximate document vector, which is a vector of real-valued elements with length 1 representing that document, is calculated from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in that document, and is stored; an approximate search key vector, which is a vector of real-valued elements with length 1, is calculated from the statistical properties of the words appearing in all documents of the quasi-search result union and the statistical properties of the words appearing in the search key; for each individual document in the quasi-search result union, the inner product of the approximate document vector of that document and the approximate search key vector is calculated as the degree of relevance, and a predetermined number of documents are retrieved in descending order of relevance as the search result; and the retrieved predetermined number of documents are output as the search result.