JP2749790B2

JP2749790B2 - Parallel text search system

Info

Publication number: JP2749790B2
Application number: JP7069471A
Authority: JP
Inventors: Ali M. Alhaj (アリ・エム・アルハッジ); Eiichiro Sumita (隅田英一郎); Hitoshi Iida (飯田仁)
Original assignee: Ei Tei Aaru Onsei Honyaku Tsushin Kenkyusho Kk
Current assignee: Ei Tei Aaru Onsei Honyaku Tsushin Kenkyusho Kk
Priority date: 1995-03-28
Filing date: 1995-03-28
Publication date: 1998-05-13
Anticipated expiration: 2013-05-13
Also published as: JPH08263517A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a parallel text search system that executes text search processing on a parallel processing computer having a plurality of processors.

[0002]

[Prior Art] A text search system is a computer-based system whose function is to search a database for the texts or documents a user requests. Text stored in electronic form typically includes newspaper articles, technical abstracts, office memos, and e-mail messages. Because such electronic documents have recently become widespread, text search systems have grown markedly in importance and have become an essential component of modern information systems. Many kinds of operational text search systems already exist, but none achieves both retrieval effectiveness and retrieval speed.

[0003] Most commercial text search systems are Boolean systems based on a text search algorithm that uses an inverted index and Boolean queries, as disclosed, for example, in Reference 1 (Salton, G. and McGill, M., "Introduction to Modern Information Retrieval," McGraw-Hill, New York, 1993). An inverted index (also called an inverted file) consists of the keywords that occur in the texts, each with a list of pointers to the texts that contain it. A user-supplied Boolean query consists of search terms joined by Boolean operators such as AND, OR, and NOT. In processing, the system performs a retrieval operation and returns to the user pointers to the texts that match the query. Boolean systems perform poorly: their retrieval characteristics are inferior and, because they usually run on sequential computers, their retrieval speed is low.
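To make the contrast with ranked retrieval concrete, here is a minimal Python sketch of a Boolean system of the kind described above: an inverted index mapping each keyword to the IDs of the texts that contain it, plus AND, OR, and NOT over those posting lists. Whitespace tokenization and the function names are assumptions for illustration. Note that the result is an unranked set of matches.

```python
from collections import defaultdict

def build_inverted_index(texts):
    """Map each keyword to the sorted list of text IDs (the 'pointers') containing it."""
    index = defaultdict(set)
    for text_id, text in enumerate(texts):
        for word in text.lower().split():
            index[word].add(text_id)
    return {word: sorted(ids) for word, ids in index.items()}

def boolean_and(index, a, b):
    return sorted(set(index.get(a, [])) & set(index.get(b, [])))

def boolean_or(index, a, b):
    return sorted(set(index.get(a, [])) | set(index.get(b, [])))

def boolean_not(index, a, n_texts):
    """All texts that do NOT contain term a."""
    return sorted(set(range(n_texts)) - set(index.get(a, [])))

texts = ["parallel text search", "boolean text query", "parallel computer"]
idx = build_inverted_index(texts)
print(boolean_and(idx, "parallel", "text"))   # [0]
```

Every matching text is returned with equal standing; nothing in the index says which match is better.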

[0004] With advances in computer technology and text search algorithms, text-ranking retrieval has been proposed as an alternative to conventional Boolean systems. Text-ranking retrieval is based on a vector-processing search algorithm in which texts and user queries are modeled as weighted vectors. The retrieval operation computes a score for how well each text vector matches the given query vector and then returns the top-ranked texts to the user. Because search systems using text ranking employ weighted vectors and ranking, their retrieval effectiveness is generally high. Moreover, these systems can run in parallel on large-scale computers that can store many vectors and process them in parallel, so retrieval can be performed at relatively high speed.

[0005] Reference 2 (Stanfill, C. and Kahle, B., "Parallel Free-Text Search on the Connection Machine," Communications of the ACM, Vol. 29, No. 12, pp. 1229-1239, December 1988) reports parallel execution of a vector-processing text search algorithm on a SIMD (Single Instruction stream, Multiple Data stream) Connection Machine. This search process uses binary vectors of the texts and queries, so-called signatures.
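As an illustration of signature-based matching, the sketch below builds signatures by superimposed coding: each term sets K bits of a fixed-width bit vector, and a text's signature is the OR of its terms' bits. The width, K, the use of MD5 to choose bit positions, and the term-counting score are all assumptions; the scheme of Reference 2 may differ in detail. Because every matched term contributes equally, it shows why binary signatures cannot express term weights.

```python
import hashlib

WIDTH = 64  # signature length in bits (illustrative choice)
K = 2       # bits set per term (illustrative choice)

def term_bits(term):
    """Deterministically map a term to a mask with up to K bits set."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    mask = 0
    for j in range(K):
        mask |= 1 << (digest[j] % WIDTH)
    return mask

def signature(terms):
    """Superimpose (bitwise OR) the masks of all terms into one binary vector."""
    sig = 0
    for t in terms:
        sig |= term_bits(t)
    return sig

def signature_score(query_terms, text_sig):
    """Count query terms whose bits are all present in the text signature.
    Every term counts equally: a binary vector carries no term weights."""
    score = 0
    for t in query_terms:
        m = term_bits(t)
        if (m & text_sig) == m:
            score += 1
    return score

sig = signature(["parallel", "text", "search"])
print(signature_score(["parallel", "text"], sig))   # 2
```

Hash collisions can make an absent term appear to match (a "false drop"), another source of lost precision.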

[0006] FIG. 2 shows a conventional search system that uses the above Connection Machine as a parallel query processor 2a. This conventional system comprises a text database memory 11 that stores text data, a binary vector generator 1a, a binary vector database memory 13, and the parallel query processor 2a. The binary vector generator 1a is the sequential host computer of the Connection Machine, and the parallel query processor 2a consists of the Connection Machine's parallel processors. The binary vector generator 1a directly accesses the text database in the text database memory 11, reads the text data, generates binary vectors, and stores them in the binary vector database memory 13. In response to an input query, the parallel query processor 2a searches by referring to the binary vectors in the binary vector database memory 13 and outputs the search results.

[0007]

[Problems to Be Solved by the Invention] In the search system of Reference 2, however, the storage attached to each of the Connection Machine's small processing units evidently had too little capacity to hold term-weight data, so processing based on weighted term-vector representations of the texts could not be performed. It is true that a higher retrieval speed than before was achieved, as reported in Reference 3 (Waltz, D., "Applications of the Connection Machine," Computer, Vol. 20, No. 1, pp. 58-79, January 1988). However, because the binary vector representation did not support weighting of text terms, retrieval effectiveness remained unsatisfactory, as reported in Reference 4 (Salton, G. and Buckley, C., "Parallel Text Search Methods," Communications of the ACM, Vol. 32, No. 2, pp. 202-215, February 1988).

[0008] An object of the present invention is to solve the above problems and to provide a text search system that achieves higher retrieval effectiveness and faster text retrieval than the conventional system.

[0009]

[Means for Solving the Problems] The parallel text search system according to the present invention comprises:
generating means for generating, from a database of texts consisting of natural-language sentences and from a query consisting of natural-language sentences, a weighted vector of terms and weights for each text and for the query, by calculating for each term a weight that indicates its importance according to its frequency of occurrence;
a plurality X of storage means for storing, in distributed fashion, the weighted vectors of the texts generated by the generating means;
a plurality X of processor elements, connected to the X storage means and operating in parallel, which, in response to the weighted vector of an input query, calculate and output scores indicating the similarity between the weighted vector of each text and the weighted vector of the query;
search processor means for sorting the scores output by the X processor elements in descending order and outputting the search results corresponding to a predetermined number of the highest scores; and
feedback means for updating the query vector according to the user's relevance and non-relevance judgments on the search results, and repeating the search by parallel computation.

[0010]

[Operation] In the parallel text search system configured as above, the generating means calculates, from the database of natural-language texts and from the natural-language query, a weight for each term of each text and of the query that indicates its importance according to its frequency of occurrence, thereby generating weighted vectors composed of terms and weights. The X storage means store the weighted vectors of the texts generated by the generating means in distributed fashion. The X processor elements, connected to the X storage means and operating in parallel, respond to the weighted vector of the input query by calculating and outputting scores indicating the similarity between the weighted vector of each text and that of the query. Next, the search processor means sorts the scores output by the X processor elements in descending order and outputs the search results corresponding to a predetermined number of the highest scores. Finally, the feedback means updates the query vector according to the user's judgments of whether each search result is relevant, and repeats the search by parallel computation.

[0011]

[Embodiment] An embodiment of the present invention for English text is described below with reference to the drawings. FIG. 1 is a block diagram of a parallel text search system according to one embodiment of the present invention. The parallel text search system of this embodiment comprises a text database memory 11 that stores text data, a weight vector generator 1, a weight vector database memory 12, and a parallel query processor 2. This embodiment differs from the conventional system shown in FIG. 2 in the configuration of the weight vector generator 1 and the parallel query processor 2.

[0012] In FIG. 1, the weight vector generator 1 directly accesses the text database in the text database memory 11, reads the text data, generates weight vectors, and stores them in the weight vector database memory 12. In response to an input query, the parallel query processor 2 searches for the content of the query by referring to the weight vectors in the weight vector database memory 12, and outputs the search results.

[0013] Each of the units 1, 2, 11, and 12 is now described in detail. The weight vector generator 1 and the text database memory 11 are implemented on the system's host computer (a Sun SPARC workstation), and the weight vector database memory 12 and the parallel query processor 2 are implemented on a KSR parallel computer.

[0014] The text database memory 11 stores in advance the texts (or documents) to be searched. The weight vector generator 1 directly accesses the text database memory 11, generates the weight vector database, and stores it in the weight vector database memory 12. The parallel query processor 2 comprises X processor elements P1 to PX, X memories connected to the respective processor elements, and a search processor element that executes the search processing. The parallel query processor 2 accesses the weight vector database directly, computing the similarity between the input query vector and all text vectors in parallel. The parallel query processor 2 also performs relevance feedback by formulating a new query from the relevance information for the current query.

[0015] The KSR parallel computer used for the parallel query processor 2 is a MIMD (Multiple Instruction stream, Multiple Data stream) parallel computing system.

[0016] Next, the configuration and operation of the weight vector generator 1 are described. The weight vector generator 1 consists of three main components: a stoplist filter, a suffix-stripping stemmer, and a term-weight assignment function calculator. Each component is described below.

[0017] (a) The stoplist filter removes the most frequent English terms, such as and, of, or, but, and the, from the texts and queries. Removing these terms does not affect retrieval effectiveness; moreover, the filtering reduces storage requirements and speeds up query processing. The filter is applied using a stoplist of 425 words obtained from the well-known Brown Corpus.
(b) The suffix-stripping stemmer replaces each word not removed by the stoplist filter with its stem. For example, the stemmer replaces different words such as "analysis," "analyzing," "analyzes," and "analyzed" with the common stem "analy." Because stemming replaces many terms with a single stem, it greatly reduces storage requirements. Furthermore, a stem occurs more frequently than the terms it replaces, which improves retrieval effectiveness. This processing can be performed, for example, with the well-known Porter stemming algorithm.
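A minimal sketch of steps (a) and (b), assuming a tiny illustrative stoplist (the 425-word Brown Corpus list is not reproduced) and a toy longest-suffix-first stripper. This is not the Porter algorithm, and unlike the example in the text it does not merge "analysis" with "analyzing."

```python
# A few high-frequency words; a stand-in for the 425-word Brown Corpus stoplist.
STOPLIST = {"and", "of", "or", "but", "the", "a", "in", "to", "is"}

# Toy suffix list, longest first. This is NOT the Porter stemming algorithm.
SUFFIXES = ("ing", "es", "ed", "s")

def strip_suffix(word, min_stem=3):
    """Remove the first matching suffix, keeping a stem of at least min_stem letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, trim punctuation, drop stoplist words, then stem what remains."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [strip_suffix(w) for w in tokens if w and w not in STOPLIST]

print(preprocess("The analysis of texts and queries"))   # ['analysi', 'text', 'queri']
```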

[0018] (c) The term-weight assignment function calculator uses a weight assignment function to assign a real-valued weight to each stem produced by the stemmer. Because these weights distinguish the importance of terms within a text or query, they greatly improve retrieval effectiveness. The weights also make it possible to rank the retrieved texts, which makes the system easier to use. The system of this embodiment uses an elaborate weight assignment function for both texts and queries. The weight w_i of the i-th term (hereinafter, term i) in a text or query is given by the following Equation 1.

[0019]

[Equation 1]

[0020] Here, tf_i is the frequency of occurrence of term i in the text or query, n_i is the number of texts or queries in which term i appears, N is the number of texts or queries in the text database memory 11, and W is the total number of terms in the text or query. The denominator of the function in Equation 1 is a term-weight normalization component that guarantees that all text and query vectors have equal length. The function assigns weights ranging from 0 to 1.0.
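Because the body of Equation 1 is not available here, the sketch below assumes cosine-normalized tf·idf, a widely used weight that fits the definitions above: a numerator built from tf_i, N, and n_i, and a denominator that normalizes each vector to unit length over its W terms, so that weights fall between 0 and 1:

$$w_i = \frac{tf_i \log(N/n_i)}{\sqrt{\sum_{j=1}^{W} \left(tf_j \log(N/n_j)\right)^2}}$$

The patent's exact formula may differ; the function name and the dropping of terms absent from the collection are assumptions.

```python
import math
from collections import Counter

def weight_vector(terms, doc_freq, n_docs):
    """Cosine-normalized tf*idf weights (an assumed form of Equation 1).

    terms    : stemmed terms of one text or query; tf_i is each term's count
    doc_freq : mapping term -> n_i, the number of texts containing the term
    n_docs   : N, the number of texts in the collection
    """
    tf = Counter(terms)
    # Numerator tf_i * log(N / n_i); terms unseen in the collection (n_i = 0) are dropped.
    raw = {t: f * math.log(n_docs / doc_freq[t])
           for t, f in tf.items() if doc_freq.get(t, 0) > 0}
    # Denominator: Euclidean length over the vector's terms, so every vector has length 1.
    norm = math.sqrt(sum(v * v for v in raw.values()))
    if norm == 0.0:
        return {t: 0.0 for t in raw}
    return {t: v / norm for t, v in raw.items()}

docs = [["parallel", "text", "search"], ["boolean", "text"], ["parallel", "computer"]]
df = Counter(t for d in docs for t in set(d))
print(weight_vector(docs[0], df, len(docs)))
```

A rarer term ("search", in one text) receives a larger weight than a common one ("text", in two).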

[0021] Next, the configuration and operation of the parallel query processor 2 are described. The X processor elements P1 to PX of the parallel query processor 2 are connected to the X memories and operate in parallel: in response to the weight vector of an input query, they calculate and output scores indicating the similarity between the weight vector of each text and the weight vector of the query. The search processor element of the parallel query processor 2 then sorts the scores output by the X processor elements in descending order and outputs the search results corresponding to a predetermined number of the highest scores. The processor also performs relevance feedback to improve retrieval effectiveness. In relevance feedback, the query vector is updated according to the user's judgments of whether each search result is relevant, and the search is repeated by parallel computation. The user's relevance judgments may be entered in real time through an input means such as a keyboard, or may be stored in advance in memory as a search information file 20. The parallel query processing executed by the parallel query processor 2 is described in detail below with reference to the control flow shown in FIG. 3.

[0022] (1) In step S1, the weight vectors of all texts and the weight vector of one actual query are read into the main memory of the KSR computer. Each weight vector is stored as a structure consisting of: (a) an integer variable holding the size of the vector; (b) an array of strings holding the terms of the vector; and (c) an array of floating-point values holding the term weights.
(2) In step S2, the similarity computation for the query vector is distributed over the processor elements P1 to PX of the KSR parallel computer by assigning the text vectors to different processor elements. As shown in FIG. 4, each processor element (a single processor element is denoted Px) is assigned Dx text vectors (x = 1, 2, ..., X). The number Dx of text vectors for each processor element Px is determined by the assignment function of Equation 2 below.
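For illustration, the record layout of step S1 can be mirrored in Python as parallel term and weight arrays plus a size field. The class and method names are hypothetical; the text does not state the KSR implementation language.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class WeightVector:
    """One weight vector as stored in step S1 (field names are illustrative)."""
    terms: List[str]                 # (b) the vector's terms
    weights: List[float]             # (c) the corresponding term weights
    size: int = field(init=False)    # (a) number of entries, derived from the arrays

    def __post_init__(self) -> None:
        if len(self.terms) != len(self.weights):
            raise ValueError("terms and weights must have equal length")
        self.size = len(self.terms)

    @classmethod
    def from_mapping(cls, vec: Dict[str, float]) -> "WeightVector":
        """Build the parallel arrays from a {term: weight} mapping, sorted by term."""
        terms = sorted(vec)
        return cls(terms, [vec[t] for t in terms])

    def as_dict(self) -> Dict[str, float]:
        return dict(zip(self.terms, self.weights))

v = WeightVector.from_mapping({"text": 0.3, "search": 0.9})
print(v.size, v.terms)   # 2 ['search', 'text']
```

Deriving size from the arrays keeps field (a) consistent with fields (b) and (c).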

[0023]

[Equation 2]

$$D_x = \begin{cases} N/X, & \text{if } x > N \bmod X \\ N/X + 1, & \text{otherwise} \end{cases}$$
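A sketch of the assignment function in Equation 2, with N/X taken as the integer quotient and N mod X as the remainder. Giving each element a contiguous block of texts is an assumption; the text only fixes the counts D_x. Function names are hypothetical.

```python
def allocation_counts(n_texts, n_pes):
    """D_x for x = 1..X: the first (N mod X) elements receive one extra text."""
    q, r = divmod(n_texts, n_pes)
    return [q if x > r else q + 1 for x in range(1, n_pes + 1)]

def partition(items, n_pes):
    """Split items into contiguous blocks whose sizes follow allocation_counts."""
    blocks, start = [], 0
    for count in allocation_counts(len(items), n_pes):
        blocks.append(items[start:start + count])
        start += count
    return blocks

print(allocation_counts(10, 3))   # [4, 3, 3]
```

For the 6,004 LISA texts on 25 elements, this gives 241 texts to each of the first 4 elements and 240 to the other 21, so no element holds more than one text above any other.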

[0024] Here, N is the total number of text vectors, and X is the total number of processor elements P1 to PX of the KSR parallel computer. N mod X denotes the remainder of N divided by X, and N/X denotes the quotient. This assignment spreads the processing evenly over the machine resources of the parallel computer.
(3) Next, in step S3, the similarity computation for the query vector is executed in parallel: each processor element P1 to PX computes the inner product between the query vector Q and each of its text vectors Dx.
(4) In step S4, the texts are ranked in descending order of similarity score, and a predetermined number n of top-ranked texts are output so that their relevance to the actual query can be judged. Because the experiments described below were run in batch mode, relevance was judged automatically by referring to the relevance information file 20, which contains the names of the texts relevant to each actual query. In interactive mode, however, the user judges relevance in real time and enters the judgments through an input means such as a keyboard. Once the judgments are made, if all (or enough) relevant texts have been retrieved, the search operation is exited and the full texts of the relevant documents are output from the disk files of the host computer.
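Steps S2 to S4 can be sketched with a thread pool standing in for the processor elements: the texts are split according to Equation 2, each worker computes inner products for its block, and the merged scores are sorted in descending order. The {term: weight} dictionaries and function names are assumptions. Python threads show the partitioning, not real speedup, because the GIL serializes this pure-Python work; a process pool or a compiled language would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor

def inner_product(a, b):
    """Sparse inner product of two {term: weight} vectors, looping over the smaller one."""
    if len(b) < len(a):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def score_block(query, block):
    """Work done by one processor element: score its assigned texts (step S3)."""
    return [(text_id, inner_product(query, vec)) for text_id, vec in block]

def parallel_search(query, texts, n_pes, top_n):
    """Steps S2-S4: partition texts by Equation 2, score blocks concurrently, rank."""
    items = list(texts.items())
    q, r = divmod(len(items), n_pes)
    blocks, start = [], 0
    for x in range(1, n_pes + 1):
        count = q if x > r else q + 1      # D_x from Equation 2
        blocks.append(items[start:start + count])
        start += count
    with ThreadPoolExecutor(max_workers=n_pes) as pool:
        partial = pool.map(lambda block: score_block(query, block), blocks)
        scores = [pair for block_scores in partial for pair in block_scores]
    scores.sort(key=lambda pair: pair[1], reverse=True)   # step S4: descending score
    return scores[:top_n]

texts = {"d1": {"parallel": 0.6, "text": 0.8},
         "d2": {"boolean": 1.0},
         "d3": {"text": 0.5, "search": 0.5}}
print(parallel_search({"text": 1.0, "search": 0.5}, texts, n_pes=2, top_n=2))
```

Each worker touches only its own block and the shared query, so no worker needs to communicate with another until the final merge.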

[0025] (5) In step S5, it is determined whether any texts remain to be searched. If so, the process proceeds to step S6; otherwise, the parallel query processing ends.
(6) If texts remain to be searched in step S5, the process proceeds to step S6 in order to retrieve more (or the remaining) relevant texts using relevance feedback.
(7) The query vector is reformulated by reweighting it according to Ide's dec-hi relevance feedback method, shown below.

[0026]

[Equation 3]

$$Q_{i+1} = Q_i + \sum (\text{relevant text vectors}) - (\text{top-ranked non-relevant text vector})$$

[0027] That is, the reformulated query vector Q_{i+1} is obtained as follows.
(i) All relevant text vectors are added to the previous query vector Q_i. Adding a text vector means adding all of its terms and weights.
(ii) The top-ranked (that is, highest-scoring) non-relevant text vector is subtracted from the vector obtained in (i), giving the reformulated query vector Q_{i+1}. Subtracting a text vector means subtracting all of its terms and weights.
(8) Steps S2 to S6 are then repeated until all relevant texts have been retrieved or a satisfactory retrieval result is obtained.
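Equation 3 and steps (i) and (ii) translate directly into operations on sparse {term: weight} dictionaries. This is a sketch under that representation; the function name is hypothetical, and because the text does not say how negative weights are handled, they are kept as-is.

```python
def ide_dec_hi(query, relevant, top_nonrelevant):
    """Q_{i+1} = Q_i + sum(relevant vectors) - top-ranked non-relevant vector.

    Adding or subtracting a vector adds or subtracts every one of its terms
    and weights; terms new to the query enter with weight 0 first."""
    new_q = dict(query)
    for vec in relevant:                               # step (i)
        for term, w in vec.items():
            new_q[term] = new_q.get(term, 0.0) + w
    if top_nonrelevant is not None:                    # step (ii)
        for term, w in top_nonrelevant.items():
            new_q[term] = new_q.get(term, 0.0) - w
    return new_q

q1 = ide_dec_hi({"text": 1.0},
                [{"text": 0.5, "search": 0.5}, {"parallel": 0.2}],
                {"boolean": 0.4, "text": 0.1})
print(q1)
```

Each iteration can introduce every term of every relevant text, which is why query size grows so quickly across feedback iterations.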

[0028] The differences between the parallel text search system of this embodiment, built by the present inventors for the present applicant, and the conventional text search system using the Connection Machine are described below. Both systems aim at effective, high-speed text retrieval by executing a vector-processing text search algorithm in parallel. However, the two systems are implemented in completely different ways and therefore perform differently. For the reasons explained below, the system of this embodiment is effective and achieves excellent speed.

[0029] Processing on the conventional Connection Machine is based on bit-string representations of texts and queries and does not support term weighting, which lowers retrieval quality. The system of this embodiment, by contrast, uses elaborate term weighting, so high-quality, effective retrieval is achieved.

[0030] In the conventional Connection Machine system, the query vector is matched against the text vectors in parallel, but the query terms are compared with the terms of each text one at a time. This sequential processing of query terms increases the communication between the host computer and the processor elements and lowers retrieval speed. In the system of this embodiment, by contrast, the query terms are processed against the text terms by an inner-product computation, which requires no communication between the host computer and the processor elements. Retrieval speed therefore increases substantially.

[0031] Finally, the performance of this parallel text search system is described. The inventors evaluated the system using six experimental text collections covering different subject areas. The performance data reported here, however, are limited to those obtained with the following two collections:
(a) LISA: library-science data consisting of 6,004 texts and 35 queries;
(b) ADI: information-science data consisting of 82 texts and 35 queries.
The weight vectors were read and processed in parallel on a KSR computer configuration with 25 processor elements connected to 800 megabytes (0.8 gigabytes) of distributed memory. The system was evaluated for retrieval effectiveness and retrieval time.

[0032] The retrieval effectiveness of a text search system is usually evaluated by recall and precision. Recall is defined as the ratio of relevant texts retrieved to the total number of relevant texts, and precision as the ratio of relevant documents retrieved to the total number of documents retrieved. Several of the above retrieval experiments were run on LISA; the resulting recall and precision values are shown in Table 1.
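The two definitions can be computed directly from the set of retrieved texts and the set of relevant texts. The function name and the IDs in the usage line are made up for illustration.

```python
def recall_precision(retrieved, relevant):
    """Recall = |retrieved & relevant| / |relevant|; precision = |retrieved & relevant| / |retrieved|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

r, p = recall_precision(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
print(f"recall={r:.3f} precision={p:.3f}")   # recall=0.667 precision=0.500
```

Retrieving more texts can only raise recall, but usually lowers precision, the trade-off visible in Table 1.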

[0033]

[Table 1]
────────────────────────────────────────────────
Iteration  Query size (terms)  Recall  Precision
────────────────────────────────────────────────
0          18                  0.264   0.799
1          477                 0.358   0.475
2          609                 0.509   0.450
3          823                 0.584   0.387
4          969                 0.716   0.380
5          1228                0.735   0.325
6          1268                0.811   0.307
7          1396                0.849   0.281
────────────────────────────────────────────────

[0034] In Table 1, iteration 0 corresponds to processing of the initial query, and subsequent iterations correspond to processing of queries formulated by relevance feedback. The number of feedback iterations can be set in advance, but in this experiment it was determined dynamically: a new feedback iteration was allowed only if the previous iteration had retrieved relevant texts.
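The batch-mode loop of steps S2 to S8, with relevance judged from a relevance file and the dynamic stopping rule just described, might look like the sketch below. Assumptions: texts already shown are excluded from later rankings, judgments come from a set of relevant IDs, vectors are {term: weight} dictionaries, and the collection is scored serially for brevity. The names are hypothetical.

```python
def dot(q, d):
    """Sparse inner product of two {term: weight} vectors."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def feedback_search(query, texts, relevant_ids, top_n, max_iters=10):
    """Return relevant text IDs in the order they were retrieved."""
    seen, found = set(), []
    for _ in range(max_iters + 1):               # iteration 0 is the initial query
        remaining = [d for d in texts if d not in seen]
        if not remaining:                         # S5: nothing left to search
            break
        # S3/S4: score the remaining texts and show the top n
        ranked = sorted(remaining, key=lambda d: dot(query, texts[d]), reverse=True)
        shown = ranked[:top_n]
        seen.update(shown)
        rel = [d for d in shown if d in relevant_ids]     # judged via the relevance file
        found.extend(rel)
        if not rel or len(found) == len(relevant_ids):    # stopping rule, or all found
            break
        nonrel = [d for d in shown if d not in relevant_ids]
        # S7: Ide dec-hi (Equation 3): add relevant vectors, subtract top non-relevant one
        new_q = dict(query)
        for d in rel:
            for t, w in texts[d].items():
                new_q[t] = new_q.get(t, 0.0) + w
        if nonrel:
            for t, w in texts[nonrel[0]].items():
                new_q[t] = new_q.get(t, 0.0) - w
        query = new_q
    return found

texts = {"d1": {"text": 1.0}, "d2": {"text": 0.5, "search": 0.5},
         "d3": {"search": 1.0}, "d4": {"boolean": 1.0}}
print(feedback_search({"text": 1.0}, texts, {"d1", "d3"}, top_n=2))   # ['d1', 'd3']
```

With top_n=1 on the same data the loop stops after the first feedback iteration retrieves no relevant text, illustrating how the stopping rule bounds the number of iterations.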

【００３５】[0035]

The recall figures in Table 1 show that retrieval quality improves as more feedback iterations are performed. FIG. 6 shows the recall improvement obtained with the relevance feedback method described above.

【００３６】[0036]

The search time of the system was examined in two ways:

- as a function of the number of terms in the query formulated at each relevance feedback iteration, and
- as a function of the number of texts in the collection.

In both cases, the measured search time is the sum of two parts: the time spent computing the similarity between the query vector and every text vector, and the time required to rank the similarity scores in decreasing order.
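The two timed phases map onto a scatter/merge structure. In the sketch below, each processor element scores and locally sorts the texts in its own storage unit, and a k-way merge then produces the global ranking. Several details are assumptions, and all function names are hypothetical:

- The inner product stands in for the similarity measure.
- Texts are distributed round-robin across the storage units.
- Threads stand in for processor elements. In CPython they show the data flow but do not give real CPU parallelism.

Because part of the ranking is done inside each element, the timing is split into a scoring phase (with local sorts) and a merge phase.

```python
import heapq
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def inner_product(query, vec):
    """Assumed similarity measure: inner product of sparse {term: weight} vectors."""
    return sum(w * vec.get(t, 0.0) for t, w in query.items())


def partition(items, x):
    """Distribute items round-robin over x storage units (one per processor element)."""
    parts = [[] for _ in range(x)]
    for i, item in enumerate(items):
        parts[i % x].append(item)
    return parts


def score_partition(query, part):
    """Work of one processor element: score its own texts, best first."""
    return sorted(((inner_product(query, vec), tid) for tid, vec in part), reverse=True)


def parallel_search(query, parts, top_n):
    """Score all partitions concurrently, then merge them into one ranking."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        local = list(pool.map(lambda part: score_partition(query, part), parts))
    t1 = time.perf_counter()
    # Each local list is already in decreasing order, so a k-way merge suffices.
    merged = heapq.merge(*local, reverse=True)
    ranking = [(tid, score) for score, tid in itertools.islice(merged, top_n)]
    t2 = time.perf_counter()
    return ranking, {"scoring_s": t1 - t0, "merge_s": t2 - t1}


texts = [("a", {"x": 1.0}), ("b", {"x": 2.0}), ("c", {"y": 1.0}), ("d", {"x": 0.5, "y": 1.0})]
ranking, timing = parallel_search({"x": 1.0}, partition(texts, 2), top_n=3)
print(ranking)  # → [('b', 2.0), ('a', 1.0), ('d', 0.5)]
```

Merging pre-sorted lists avoids re-sorting all scores centrally. Only the top `top_n` entries are pulled from the merge.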

【００３７】[0037]

The query sizes in Table 1 make it evident that the number of terms in the reformulated query grows as more relevance feedback iterations are performed. As the query grows, the search time naturally increases in proportion. In this experiment, processing the initial 18-term query took about 0.23 ms, and processing the final 1396-term query vector took 13.93 ms.

FIG. 7 plots search time as a function of the number of terms in the query vector. It also plots the search times obtained when the same retrieval experiment was run on a single processor element.

The speedup S was calculated as (processing time on one processor element PE) / (processing time on 25 processor elements PE), giving S = 22. Dividing S by the total number of processor elements PE gives a system utilization of 22/25 = 0.88.
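The speedup and utilization figures follow from two one-line formulas, restated below. The function names and the sample timings are hypothetical. The only values taken from the text are S = 22 and the 25 processor elements.

```python
def speedup(t_single, t_parallel):
    """S = processing time on one processor element / time on X processor elements."""
    return t_single / t_parallel


def utilization(s, x):
    """Fraction of ideal linear speedup achieved with x processor elements."""
    return s / x


print(speedup(44.0, 2.0))            # hypothetical timings in ms → 22.0
print(round(utilization(22, 25), 2))  # value reported in the text → 0.88
```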

【００３８】[0038]

To measure the search time of the system as a function of the number of texts, the speedup achieved on LISA was compared with that achieved on ADI. Table 2 compares the average speedup and utilization obtained for the two collections.

【００３９】[0039]

[Table 2]
──────────────────────────────────────────────────────────────────────────────
Collection   Number of texts   Maximum query size (terms)   Speedup   Utilization
──────────────────────────────────────────────────────────────────────────────
ADI                82                   860                  14.03       0.60
LISA             6004                  1396                  22.68       0.88
──────────────────────────────────────────────────────────────────────────────

【００４０】[0040]

As Table 2 shows, LISA gave the better result, with a higher speedup. This suggests that the system is particularly effective when the text database is large.

【００４１】[0041]

As described above, using the weight vector generator 1 together with the parallel query processor 2 makes it possible to perform text searches that are both more effective and faster than in the conventional example. Large text databases can also be processed.

【００４２】[0042]

The weight vector generator 1 can also be readily implemented on a parallel computer.
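A parallel weight vector generator needs one global reduction step. Term counts can be computed independently on each shard of texts, but the document frequencies from all shards must be summed before any weight that depends on the whole collection can be computed.

The sketch below illustrates this under stated assumptions:

- tf-idf stands in for the frequency-based importance weight; the source does not fix the formula in this section.
- The tokenizer is a hypothetical simplification with no stemming or stop list.
- Threads stand in for parallel workers.
- All names are hypothetical.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def term_counts(text):
    """Hypothetical tokenizer: lowercase alphabetic words, no stemming or stop list."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def count_shard(shard):
    """Phase 1 on one worker: term counts per text and the shard's document frequencies."""
    counts = {tid: term_counts(txt) for tid, txt in shard}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # each distinct term counts once per text
    return counts, df


def weight_vectors(shards):
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(count_shard, shards))
    # Phase 2 (global reduction): sum shard document frequencies.
    df, n_texts = Counter(), 0
    for counts, shard_df in results:
        df.update(shard_df)
        n_texts += len(counts)
    # Phase 3: assumed tf-idf weight = term frequency * log(N / document frequency).
    vectors = {}
    for counts, _ in results:
        for tid, c in counts.items():
            vectors[tid] = {t: tf * math.log(n_texts / df[t]) for t, tf in c.items()}
    return vectors


v = weight_vectors([[("d1", "cat dog"), ("d2", "cat")], [("d3", "fish")]])
print(round(v["d1"]["dog"], 4))  # → 1.0986
```

Phases 1 and 3 are independent per text and parallelize naturally. Phase 2 is the only point where the workers must exchange data.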

【００４３】[0043]

[Effects of the Invention] As described in detail above, the parallel text search system according to the present invention comprises the following elements:

- generating means that, from a database of texts consisting of natural-language sentences and from a query consisting of natural-language sentences, generates a weight vector of terms and weights for each text and for the query, by calculating for each term a weight that indicates its importance according to its frequency of occurrence;
- a plurality X of storage means that store, in distributed fashion, the weight vectors of the texts generated by the generating means;
- a plurality X of processor elements, connected to the X storage means and operating in parallel, that, in response to the weight vector of an input query, calculate and output scores indicating the similarity between the weight vector of each text and the weight vector of the query;
- search processor means that sorts the scores output from the X processor elements in descending order and outputs the search results corresponding to a predetermined number of the highest scores; and
- feedback means that updates the query vector according to the user's judgment of whether each search result is relevant, and repeats the search by parallel computation.

It is therefore possible to perform text searches that are both more effective and faster than in the conventional example. Large text databases can also be processed.

[Brief description of the drawings]

FIG. 1 is a block diagram of a parallel text search system according to an embodiment of the present invention.

FIG. 2 is a block diagram of a conventional text search system.

FIG. 3 is a flowchart showing the parallel query processing executed by the parallel query processor of FIG. 1.

FIG. 4 is a diagram showing the weight vectors processed by the processor elements P1 to PX in the parallel query processor of FIG. 1.

FIG. 5 is a diagram showing the flow of processing by the processor elements P1 to PX in the parallel query processor of FIG. 1.

FIG. 6 is a graph showing the improvement in recall against the number of feedback iterations in the performance evaluation experiment on the parallel text search system of FIG. 1.

FIG. 7 is a graph showing execution time against query size in the performance evaluation experiment on the parallel text search system of FIG. 1.

[Explanation of symbols]

1 ... Weight vector generator; 2 ... Parallel query processor; 11 ... Text database memory; 12 ... Weight vector database memory; P1 to PX ... Processor elements.

Continuation of the front page

(72) Inventor: Hitoshi Iida, c/o ATR Interpreting Telecommunications Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto

(56) References cited: JP-A-58-117045 (JP, A); JP-A-63-101967 (JP, A); JP-A-3-25675 (JP, A); JP-A-3-177972 (JP, A); JP-A-4-133173 (JP, A); JP-A-5-266082 (JP, A); JP-A-5-266087 (JP, A); JP-A-6-4584 (JP, A)

Claims

(57) [Claims]

1. A parallel text search system comprising:

- generating means for generating, from a text database consisting of natural-language sentences and from a query consisting of natural-language sentences, a weight vector composed of terms and weights for each text and for the query, by calculating for each term a weight indicating the importance of the term according to its frequency of occurrence;
- a plurality X of storage means for storing, in distributed fashion, the weight vectors of the respective texts generated by the generating means;
- a plurality X of processor elements, connected to the plurality X of storage means and operating in parallel, for calculating and outputting, in response to the weight vector corresponding to an input query, scores indicating the similarity between the weight vector of each text and the weight vector corresponding to the query;
- search processor means for sorting the scores output from the plurality X of processor elements in descending order and outputting search results corresponding to a predetermined number of the highest scores; and
- feedback means for updating the query vector according to the user's judgment of whether each search result is relevant, and repeating the search by parallel computation.