JPH0660121A

JPH0660121A - Information retrieval device

Info

Publication number: JPH0660121A
Application number: JP5058941A
Authority: JP
Inventors: Atsushi Hatakeyama; 敦畠山; Kanji Kato; 寛次加藤; Satoshi Asakawa; 悟志浅川; Hisamitsu Kawaguchi; 川口久光
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-03-19
Filing date: 1993-03-18
Publication date: 1994-03-04
Anticipated expiration: 2018-06-09
Also published as: JP3413866B2

Abstract

PURPOSE: To return results to each terminal with a response time no different from the single-user case, by providing means that, when a plurality of retrieval requests have been stored, applies retrieval processing to all of the stored requests at once.
CONSTITUTION: The system consists of a retrieval terminal 100, a CPU 110 that executes retrieval processing, a magnetic disk 120 that stores the database, and a memory 130 that stores various programs and a data buffer. Other areas of the memory 130 serve as output buffers for session n, used to output retrieval results to each request source, and as a waiting-queue buffer that temporarily stores retrieval requests arriving simultaneously from a plurality of terminals. When a retrieval request is taken from the buffer, the number of retrieval requests already stored in the buffer is checked, and retrieval-request unification processing merges the pending requests into one. The retrieval results are thus sent to all request sources after a single retrieval pass, regardless of the order in which the requests were registered.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information retrieval device in which a plurality of users share a function for searching a document database by specifying a character string and scanning the full text of the documents, and more particularly to a device suited to minimizing waiting time when retrieval requests from the individual users arrive simultaneously.

[0002]

[Prior Art] Japanese Patent Laid-Open No. 03-174652 has proposed a full-text search method that does not require keywords to be assigned when a document is registered. This method aims to make full-text search fast enough for practical use by means of a condensed text, in which each document is compressed in word units, and a character component table, in which the characters used are registered one character at a time.

[0003] However, although the text is condensed, the method still searches the text data character by character. The CPU is therefore busy with character-matching operations for the entire duration of a data scan and has no time to spare for other processing. This makes it difficult to serve multiple users on a time-sharing basis.

[0004] That is, when a plurality of terminals are connected to a retrieval device that performs full-text search and each terminal issues retrieval requests frequently, the CPU cannot do any other processing while a text scan is in progress, so other requests must wait until the CPU has finished its character-matching pass.

[0005] Moreover, when a new request arrives while the device is still waiting for processing to finish, or when retrieval requests arrive from the terminals one after another, the retrieval device processes the requests in order of arrival, so the later a request arrives, the longer it waits.
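The growth of waiting time under first-come-first-served processing can be put in rough numbers. The Python sketch below assumes, only for illustration, that every full-text scan takes the same fixed time and that all requests are already queued at time zero; neither figure comes from the source. The i-th request then finishes after i scans, whereas a single shared scan (ignoring the extra cost of matching several terms in one pass) returns every result after about one scan time.

```python
from __future__ import annotations


def fifo_completion_times(num_requests: int, scan_time: float) -> list[float]:
    """Finish time of each queued request when scans run one after another.

    Assumes a fixed scan_time per full-text scan and that all requests
    are already waiting at time 0 (an illustrative simplification).
    """
    return [(i + 1) * scan_time for i in range(num_requests)]


def batched_completion_times(num_requests: int, scan_time: float) -> list[float]:
    """Finish time of each request when all pending requests share one scan.

    Ignores the extra cost of matching several search terms in one pass.
    """
    return [scan_time] * num_requests


if __name__ == "__main__":
    print(fifo_completion_times(5, 2.0))     # [2.0, 4.0, 6.0, 8.0, 10.0]
    print(batched_completion_times(5, 2.0))  # [2.0, 2.0, 2.0, 2.0, 2.0]
```

With five requests the average wait under sequential processing is three scan times, against one scan time when the requests share a pass.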

[0006]

[Problem to be Solved by the Invention] An object of the present invention is to provide a full-text retrieval device with a plurality of terminals that, even when retrieval requests are sent from the terminals one after another, can return results to each terminal with a response time that appears no different from that of the single-user case.

[0007]

[Means for Solving the Problem] To solve the above problem, the invention provides means for storing a retrieval request received from another retrieval terminal while retrieval processing is in progress, and means for monitoring the number of retrieval requests in the retrieval-request storage means and, when a preset number n or more of retrieval requests have been stored, processing the stored retrieval requests collectively.
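A minimal sketch of the storage and monitoring means of [0007], assuming an in-memory double-ended queue of (requester id, search term) pairs. The class name `RetrievalRequestStore`, its methods and the string identifiers are hypothetical; only the threshold n comes from the text.

```python
from __future__ import annotations

from collections import deque


class RetrievalRequestStore:
    """Retrieval-request storage means holding (requester id, search term) pairs.

    `n` is the preset threshold from [0007]; the class and method names are
    illustrative, not taken from the patent.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self._pending: deque[tuple[str, str]] = deque()

    def store(self, requester_id: str, term: str) -> None:
        """Accept a request that arrived while a scan was running."""
        self._pending.append((requester_id, term))

    def count(self) -> int:
        """Number of requests currently waiting (the monitored quantity)."""
        return len(self._pending)

    def ready(self) -> bool:
        """True once n or more requests are waiting."""
        return self.count() >= self.n

    def take_batch(self) -> list[tuple[str, str]]:
        """Remove every waiting request so they can be processed together."""
        batch = list(self._pending)
        self._pending.clear()
        return batch
```

The source does not say what happens when fewer than n requests are waiting (for example, whether a lone request is eventually processed on its own after a timeout), so the sketch leaves that decision to the caller.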

[0008] Specifically, the device is constructed with the following means:
(1) character-string matching means;
(2) means for storing and reading text data;
(3) means for accepting requests from a plurality of retrieval terminals;
(4) a waiting-queue buffer for retrieval requests (corresponding to the retrieval-request storage means above);
(5) request-source identification means;
(6) retrieval-request unification means that merges the retrieval requests accumulated in the queue buffer into one, attaching the identifier of each request source;
(7) means for storing, for each request source, the document set of its retrieval results;
(8) means for temporarily holding result information for each request source and transferring it to that request source;
(9) means for executing retrieval processing once for the unified retrieval request and distributing the results by request source.
An outline of retrieval processing using these means is given below.

[0009] One way to accept requests from other terminals even during retrieval processing is to provide a queue buffer, as shown in FIG. 1, and register the requests received during a retrieval one after another as a waiting queue. With this method, however, a retrieval request registered later must wait until all retrieval requests registered before it have been processed. Therefore, as shown in FIG. 2, when retrieval requests are taken out of the buffer, the number of requests already in the buffer is checked, and retrieval-request unification processing merges the accumulated requests into one. In this way the retrieval results can be sent to all request sources after a single retrieval pass, regardless of the order in which the requests were registered. The retrieval-request unification processing and the method of sending retrieval results back to the request sources are outlined below.

[0010] For example, suppose the queue buffer holds the request "find documents in which the word 'computer' appears", the request "find documents in which the word 'biotechnology' appears", and the request "find documents in which the word 'learning user interface' appears". The unification processing combines them and executes retrieval as the single new request "find documents in which any of the words 'computer', 'biotechnology' or 'learning user interface' appears".

[0011] The waiting queue that manages these retrieval requests stores the identifier of the requesting terminal together with the conditional expression. The retrieval requests in the above example can thus be written as follows.

[0012] If, at the outset, the queue buffer contains
queue 1 = u1: "computer"
queue 2 = u2: "biotechnology"
queue 3 = u3: "learning user interface"
these retrieval requests are unified and processed as one new retrieval request:
queue 1 = u1: "computer" or u2: "biotechnology" or u3: "learning user interface"
Here u1, u2 and u3 are the identifiers of the request sources.
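Unification can be sketched as keeping every queued (requester, term) pair in one list that is read as a disjunction, so that each term still carries its requester identifier. The function names `unify` and `describe` are hypothetical, and dropping exact duplicate pairs is an addition of this sketch rather than something the text specifies.

```python
from __future__ import annotations


def unify(queue_buffer: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge queued (requester id, search term) requests into one OR request.

    Every term keeps its requester identifier so that hits can later be
    routed back. Exact duplicates are dropped (an addition of this sketch).
    """
    return list(dict.fromkeys(queue_buffer))


def describe(unified: list[tuple[str, str]]) -> str:
    """Render the unified request in the notation of the text."""
    return " or ".join(f'{uid}: "{term}"' for uid, term in unified)


if __name__ == "__main__":
    queued = [("u1", "computer"), ("u2", "biotechnology"),
              ("u3", "learning user interface")]
    print(describe(unify(queued)))
```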

[0013] The unified retrieval request is then processed to find the documents that satisfy any of the requests' conditional expressions. In the above example, documents containing any of the words "computer", "biotechnology" or "learning user interface" are searched for, and for each hit the document ID identifying the document is output together with the identifiers of the request sources. For example, with d.. denoting document IDs, the output looks like this:
d1, u1, u3, u5
d3, u2
d10, u3, u4
d25, u2, u5
d37, u5
In this example, the first line shows that document d1 satisfies the conditional expressions requested by u1, u3 and u5.
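The single tagged scan of [0013] might look like the Python below, assuming documents are held as plain strings keyed by document ID and that the unified request is a list of (requester, term) pairs. Python's `in` operator stands in for the patent's character-string matching means; a practical matcher would test all terms in one left-to-right pass, for example with an Aho-Corasick automaton. The function name `scan_once` and the sample texts are hypothetical.

```python
from __future__ import annotations


def scan_once(documents: dict[str, str],
              unified: list[tuple[str, str]]) -> dict[str, list[str]]:
    """One pass over the documents for a unified request.

    Returns doc id -> identifiers of the requesters whose term occurs in
    that document; documents with no hit are omitted.
    """
    hits: dict[str, list[str]] = {}
    for doc_id, text in documents.items():
        # dict.fromkeys keeps requester order and drops repeats.
        matched = list(dict.fromkeys(uid for uid, term in unified if term in text))
        if matched:
            hits[doc_id] = matched
    return hits


if __name__ == "__main__":
    docs = {"d1": "a computer with a learning user interface",
            "d2": "weather report",
            "d3": "advances in biotechnology"}
    unified = [("u1", "computer"), ("u2", "biotechnology"),
               ("u3", "learning user interface")]
    print(scan_once(docs, unified))
```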

[0014] Next, the output is distributed to the request sources. This distribution is performed on the basis of the request-source identifiers: in the above example, the hits are stored as a retrieval-result set for each request source according to u.. and output to that source.

[0015] The result is, for example:
u1: (d1)
u2: (d3, d25)
u3: (d1, d10)
u4: (d10)
u5: (d1, d25, d37)
This shows, for instance, that the document matching the condition requested by u1 was d1. Finally, the retrieval results obtained for each request source in this way are output to that source, and processing ends.
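Distribution by request-source identifier is a simple inversion of the scan output. This hypothetical `distribute` function, fed the scan output of [0013], reproduces the per-requester lists of [0015].

```python
from __future__ import annotations


def distribute(hits: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn doc id -> requester ids into requester id -> doc ids."""
    per_requester: dict[str, list[str]] = {}
    for doc_id, requesters in hits.items():
        for uid in requesters:
            per_requester.setdefault(uid, []).append(doc_id)
    return per_requester


if __name__ == "__main__":
    # Scan output listed in [0013]
    hits = {"d1": ["u1", "u3", "u5"], "d3": ["u2"], "d10": ["u3", "u4"],
            "d25": ["u2", "u5"], "d37": ["u5"]}
    for uid, docs in sorted(distribute(hits).items()):
        print(uid, docs)
```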

[0016] Combining the requests of a plurality of retrieval terminals into one and processing them together in this way improves the response of the retrieval device. In the above example, five retrieval requests are unified, so processing that would otherwise require five scans of the text is completed with a single scan.

[0017] Next, the processing when the retrieval device is given a more advanced function is outlined. Because full-text search works by scanning the text, reducing the amount of text scanned as far as possible shortens the processing time. Furthermore, in practice a retrieval is overwhelmingly often a narrowing search that adds a condition to the previous retrieval-result set. That is, in typical retrieval, users repeatedly narrow down a specific set that has already been narrowed, changing the keyword each time, rather than always searching the whole database (the original set). For a narrowing search, therefore, scanning only the data of the previous retrieval-result set takes less processing time than scanning all the data. A retrieval method in which the limited document set resulting from the previous retrieval becomes the target of the next retrieval, so that the search range narrows with each retrieval, is referred to here as hierarchical search.
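One narrowing step of hierarchical search can be sketched as a scan restricted to the previous result set. The sketch assumes documents are strings keyed by ID and returns the number of documents scanned so that the saving is visible; `narrow` and the sample texts are illustrative.

```python
from __future__ import annotations


def narrow(documents: dict[str, str], previous: set[str] | None,
           term: str) -> tuple[set[str], int]:
    """One hierarchical-search step: find documents containing `term`.

    Only the documents in `previous` (the last result set) are scanned;
    previous=None means the whole database, i.e. the original set.
    Returns (hits, number_of_documents_scanned).
    """
    candidates = list(documents) if previous is None else sorted(previous)
    hits = {doc_id for doc_id in candidates if term in documents[doc_id]}
    return hits, len(candidates)


if __name__ == "__main__":
    docs = {"d1": "speech synthesis", "d2": "speech recognition",
            "d3": "image recognition", "d4": "image synthesis"}
    base_a1, scanned_1 = narrow(docs, None, "speech")        # whole database
    base_a2, scanned_2 = narrow(docs, base_a1, "synthesis")  # only BASE_A1
    print(sorted(base_a2), scanned_1, scanned_2)             # ['d1'] 4 2
```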

[0018] To implement the hierarchical search function in a system with a plurality of connected terminals, the first problem is how to manage retrieval targets and how to set them when several requests are received. This problem is explained concretely with reference to FIG. 3. In FIG. 3, at time t1 user A creates the document set BASE_A1 of documents containing the search term "speech", and at time t3 narrows BASE_A1 further with the search term "synthesis". Meanwhile, at time t2 user B creates the document set BASE_B1 of documents containing the search term "image", and at time t4 narrows BASE_B1 further with the search term "recognition". Each user's second retrieval must be processed separately against a different document set, BASE_A1 or BASE_B1, which makes it difficult to process several retrieval requests at once; that is, the character-string retrievals at times t3 and t4 cannot be performed simultaneously. As explained above, character-string retrieval is very time-consuming, so the retrieval service ends up taking correspondingly longer.

[0019] To solve this problem, the document sets that are the separate retrieval targets must be combined into one so that several retrieval requests are handled by a single retrieval pass. One way, shown in FIG. 4, is always to search the entire database and, after the retrieval, take the intersection with each request source's retrieval target. Because this method always searches the whole database (the original set) BASE_0, the second retrieval requests of user A and user B can both be completed by a single character-string retrieval at time t3. That is, at time t3 BASE_0 is searched for documents containing either "synthesis" or "recognition" (the retrieval requests of user A and user B); after the retrieval, the hits are distributed into retrieval-result sets for user A and user B, and their intersections with BASE_A1 and BASE_B1 respectively yield the desired retrieval-result sets BASE_A2 and BASE_B2. This method allows the character-string retrievals of t3 and t4 in FIG. 3 to be performed at once, but it has the drawback that the text data of the entire database BASE_0 must always be scanned, so the amount searched grows and processing takes a long time.
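The FIG. 4 approach can be sketched as one scan over every document for all pending terms, followed by an intersection with each requester's previous set. The data layout (a mapping from requester to a pair of previous result set and new term) and the name `whole_db_batch` are assumptions of this sketch; the returned scan count makes the drawback explicit, since it always equals the size of BASE_0.

```python
from __future__ import annotations


def whole_db_batch(
    documents: dict[str, str],
    requests: dict[str, tuple[set[str], str]],
) -> tuple[dict[str, set[str]], int]:
    """FIG. 4 style batch: one scan of the whole database (BASE_0) for all terms.

    requests maps a requester id to (that requester's previous result set,
    new search term). Each requester's hits are intersected with its previous
    set afterwards. Returns (results per requester, documents scanned).
    """
    hits: dict[str, set[str]] = {uid: set() for uid in requests}
    for doc_id, text in documents.items():          # always every document
        for uid, (_, term) in requests.items():
            if term in text:
                hits[uid].add(doc_id)
    results = {uid: hits[uid] & previous for uid, (previous, _) in requests.items()}
    return results, len(documents)
```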

[0020] Alternatively, as in FIG. 5, a collective retrieval could target the union of BASE_A1 and BASE_B1. This does shrink the retrieval target and hence the amount searched, but simply distributing the retrieval results for BASE_AB1, the union of BASE_A1 and BASE_B1, to user A and user B does not yield the desired result sets: the cross-hatched portion in the figure (in user A's case, documents that are in BASE_B1 but not in BASE_A1 and that contain "synthesis") becomes noise. In general, when the retrieval device is operated in a multi-user environment, even if all users search the same database, each user's search strings differ and so does the number of narrowing steps, so at any given moment the users' retrieval-target sets normally differ from one another. Because the target sets differ, even if one tries to retrieve all users' requests (search strings) collectively, simply taking the logical sum (OR) of the individual target sets and searching it does not produce correct retrieval-result sets.

[0021] To solve this, after the collective retrieval, the intersection with each request source's retrieval-result set is computed and used as the retrieval result, as shown in FIG. 6. That is, the retrieval-result set BASE_AB2 obtained at time t3 is distributed to user A and user B, and each user's share is intersected with that user's retrieval target, BASE_A1 or BASE_B1 respectively. This gives a correct retrieval result with no retrieval noise for each request source.
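The contrast between FIG. 5 and FIG. 6 can be sketched with one hypothetical function that scans only the union BASE_AB1 and optionally skips the final AND. With the five-document sample used here, the noisy variant hands user A the document "image synthesis", which lies in BASE_B1 but not in BASE_A1 (the cross-hatched noise of [0020]); the corrected variant removes it while scanning four documents instead of all five.

```python
from __future__ import annotations


def union_batch(
    documents: dict[str, str],
    requests: dict[str, tuple[set[str], str]],
    intersect: bool = True,
) -> tuple[dict[str, set[str]], int]:
    """Scan only the union of the requesters' previous result sets (BASE_AB1).

    With intersect=False the hits are merely routed by search term, which is
    the noisy scheme of FIG. 5; with intersect=True each requester's hits are
    ANDed with that requester's previous set, as in FIG. 6.
    Returns (results per requester, documents scanned).
    """
    union: set[str] = set().union(*(previous for previous, _ in requests.values()))
    hits: dict[str, set[str]] = {uid: set() for uid in requests}
    for doc_id in union:
        text = documents[doc_id]
        for uid, (_, term) in requests.items():
            if term in text:
                hits[uid].add(doc_id)
    if intersect:
        hits = {uid: hits[uid] & previous for uid, (previous, _) in requests.items()}
    return hits, len(union)


if __name__ == "__main__":
    docs = {"d1": "speech synthesis", "d2": "speech recognition",
            "d3": "image recognition", "d4": "image synthesis",
            "d5": "weather report"}
    reqs = {"A": ({"d1", "d2"}, "synthesis"), "B": ({"d3", "d4"}, "recognition")}
    print(union_batch(docs, reqs, intersect=False))
    print(union_batch(docs, reqs))
```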

[0022] With reference to FIG. 7, we now show that user A's retrieval result BASE_AB3A obtained by the method of FIG. 6 equals the desired set BASE_A2. Here, the result of a character-string retrieval over a given retrieval target is written BASE_xx("string"), denoting the set of documents in BASE_xx that contain "string". For example, BASE_0("speech") is the result of searching BASE_0 for "speech" and is therefore the same set as BASE_A1. Using this notation, the retrieval-result set BASE_AB3A of FIG. 6 can be expanded as follows.

[0023]

$$
\begin{aligned}
\text{BASE\_AB3A} &= \text{BASE\_A1 AND BASE\_AB2A} \\
&= \text{BASE\_A1 AND BASE\_AB2("synthesis")} \\
&= \text{BASE\_A1 AND BASE\_AB1(("synthesis" OR "recognition") AND "synthesis")} \\
&= \text{BASE\_A1 AND BASE\_AB1("synthesis")} \\
&= \text{BASE\_A1 AND (BASE\_A1("synthesis") OR BASE\_B1("synthesis"))} \\
&= \text{(BASE\_A1 AND BASE\_A1("synthesis")) OR (BASE\_A1 AND BASE\_B1("synthesis"))} \\
&= \text{BASE\_A1("synthesis") OR (BASE\_A1 AND BASE\_B1)("synthesis")}
\end{aligned}
$$

FIG. 7 shows the relationships among these retrieval-result sets. As the figure shows, the intersection of BASE_A1 and BASE_B1, (BASE_A1 AND BASE_B1), is entirely contained in BASE_A1. Hence (BASE_A1 AND BASE_B1)("synthesis") is also entirely contained in BASE_A1("synthesis"), and therefore

$$
\text{BASE\_AB3A} = \text{BASE\_A1("synthesis")} = \text{BASE\_A2}
$$

holds.
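The identity derived above can also be checked mechanically. This Python sketch models each document as the set of search terms it contains, builds BASE_A1, BASE_B1 and BASE_A2 from random data, runs the FIG. 6 procedure, and compares the results. The random model and the function names are illustrative assumptions, and a passing check only supports, rather than replaces, the set-algebra argument.

```python
from __future__ import annotations

import random


def select(base: set[int], term: str, terms_of: dict[int, set[str]]) -> set[int]:
    """BASE_xx("term"): the documents in `base` whose text contains `term`."""
    return {d for d in base if term in terms_of[d]}


def identity_holds(rng: random.Random, num_docs: int = 50) -> bool:
    """Build one random database and compare BASE_AB3A with BASE_A2."""
    vocab = ["speech", "image", "synthesis", "recognition"]
    # Each document "contains" a random subset of the four terms.
    terms_of = {d: {t for t in vocab if rng.random() < 0.5} for d in range(num_docs)}
    base_0 = set(terms_of)
    base_a1 = select(base_0, "speech", terms_of)
    base_b1 = select(base_0, "image", terms_of)
    base_a2 = select(base_a1, "synthesis", terms_of)       # what user A wants
    base_ab1 = base_a1 | base_b1                           # union of both targets
    base_ab2 = {d for d in base_ab1                        # one batch scan for
                if terms_of[d] & {"synthesis", "recognition"}}  # either term
    base_ab2a = select(base_ab2, "synthesis", terms_of)    # user A's share of the hits
    base_ab3a = base_a1 & base_ab2a                        # AND with A's target
    return base_ab3a == base_a2


if __name__ == "__main__":
    rng = random.Random(0)
    print(all(identity_holds(rng) for _ in range(1000)))   # True
```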

[0024] Such logical operations between sets can be performed in far less time than a character-string retrieval, so this method, which keeps the target of character-string retrieval as small as possible, is effective: the retrieval requests of several users can be processed collectively in a short time.

[0025] The processing in a retrieval device with this hierarchical search function when several requests have accumulated in the queue buffer is now described more concretely.

[0026] Suppose, for example, that the retrieval requests from the retrieval terminals have so far produced the following result sets.

[0027]
u1: (d1, d5, d12, d15, d18, d27)
u2: (d2, d5, d7, d12, d18)
u3: (d1, d5, d18, d30)
u4: (d3, d7, d12, d30, d42, d50, d52)
u5: (d2, d5, d8, d12, d18, d42, d52)
Suppose further that, at this point, the queue holds the following requests to be processed.

[0028]
queue 1 = u1: "computer"
queue 2 = u2: "biotechnology"
queue 3 = u3: "learning user interface"
Since the retrieval requests come from terminals u1, u2 and u3, the OR of the result sets of u1, u2 and u3 is taken. Only the text data of the documents in this set is scanned for documents containing any of the words "computer", "biotechnology" or "learning user interface", and for each hit the document ID is output together with the request-source identifiers. That is, the text data of the documents in the result set
u1 or u2 or u3: (d1, d2, d5, d7, d12, d15, d18, d27, d30)
is scanned, yielding, for example, the following retrieval result.

[0029]
d1, u1, u3
d5, u2
d12, u2, u3
d15, u1
d30, u2
The output is then distributed by request source. Distribution alone, however, would output documents that are not in the request source's own retrieval-result set, so the AND of each distributed set with that source's result set is taken as the final result. That is, the retrieval results are first distributed by request source as
u1: (d1, d15)
u2: (d5, d12, d30)
u3: (d1, d12)
and taking the AND of each with the corresponding result set used to form the OR set above gives
u1: (d1, d15)
u2: (d5, d12)
u3: (d1)
which is the correct result.
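The worked example of [0027] to [0029] can be replayed in a few lines of Python. The document texts are not given in the source, so the scan hits are copied from the text rather than computed; the variable names are illustrative.

```python
from __future__ import annotations

# Result sets obtained so far ([0027]); u4 and u5 have no pending request.
previous = {
    "u1": {"d1", "d5", "d12", "d15", "d18", "d27"},
    "u2": {"d2", "d5", "d7", "d12", "d18"},
    "u3": {"d1", "d5", "d18", "d30"},
}

# Documents to scan: the OR of the requesting sources' result sets.
scan_target = set().union(*previous.values())

# Hits reported by the single scan in [0029]: doc id -> matching requesters.
scan_hits = {
    "d1": ["u1", "u3"],
    "d5": ["u2"],
    "d12": ["u2", "u3"],
    "d15": ["u1"],
    "d30": ["u2"],
}

# Step 1: distribute the hits by requester.
distributed: dict[str, set[str]] = {uid: set() for uid in previous}
for doc_id, uids in scan_hits.items():
    for uid in uids:
        distributed[uid].add(doc_id)

# Step 2: AND each requester's hits with that requester's previous result set.
final = {uid: distributed[uid] & previous[uid] for uid in previous}


def by_number(ids: set[str]) -> list[str]:
    """Sort ids like 'd12' numerically rather than lexically."""
    return sorted(ids, key=lambda s: int(s[1:]))


if __name__ == "__main__":
    print(by_number(scan_target))
    for uid in previous:
        print(uid, by_number(distributed[uid]), "->", by_number(final[uid]))
```

The AND step is what removes d30 from u2's list and d12 from u3's list: neither document belongs to that requester's previous result set.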

[0030] Thus, by first restricting the documents to be scanned on the basis of the retrieval-result sets, processing the retrieval requests combined by OR, and then taking the AND with each request source's result set, it becomes possible to provide the hierarchical search function with good response.

[0031]

[Operation] With the above means, retrieval requests from a plurality of retrieval terminals are accumulated in the queue buffer, and when a certain number or more of requests are in the buffer they are unified into one and processed, providing a retrieval device with short waiting times. In the unified retrieval request, the request-source identifier is attached to each individual request, and the request-source identifiers are also output with the character-string matching results, so after a single retrieval pass the result set can be distributed by request source.

[0032] Further, the device has means for storing past retrieval-result sets for each request source. In narrowing-search mode, a unified retrieval request is processed by first taking the OR of the result sets of the requesting sources as the retrieval target and scanning only that text data, which reduces the amount of data scanned and shortens the retrieval processing time. The AND of the retrieval result with the result sets used to form that OR set is then taken once more, so a correct narrowed result is obtained for each request source.

[0033]

[Embodiment] An embodiment of the present invention is described in detail below.

[0034] FIG. 8 shows the configuration of this embodiment. The embodiment consists of a retrieval terminal 100, a CPU 110 that executes retrieval processing, a magnetic disk 120 that stores the database, and a memory 130 that stores various programs and data buffers. The memory 130 holds the following programs: a control program that controls the retrieval processing as a whole; a character-string search program that searches the database for specific character strings; a file read program that reads data from the magnetic disk and passes it to the character-string search program; a logical-condition determination program that evaluates AND and OR conditions between search strings from the output of the character-string search program; and a matching-result distribution program that, when requests come from a plurality of terminals, distributes the retrieval results to the request sources and sends them out. Other parts of the same memory are used as a matching-result output buffer that temporarily holds the results of character-string search and logical-condition determination, a result-set management table for session n that manages and accumulates past retrieval-result sets for each request source, an output buffer for session n that outputs retrieval results to each request source, and a waiting-queue buffer that temporarily stores retrieval requests when requests arrive simultaneously from several terminals.

[0035] The processing of each program is now described in detail, following the flow of retrieval processing. First, the flow on the terminal side is described. FIG. 9 is a PAD (problem analysis diagram) showing the flow of work executed on the retrieval terminal in this embodiment. The retrieval terminal first issues a connection request to the retrieval device, then repeatedly reads a conditional expression from the terminal user, sends the retrieval request to the device and receives the result. Depending on the retrieval result received, the terminal may also obtain document data from the retrieval device and display it. When all retrieval work is finished, the connection to the retrieval device is released.

[0036] FIG. 10 is a PAD summarizing the processing of the control program.

[0037] First, when a retrieval terminal issues a connection request, the control program allocates a result-set management table for session n and an output buffer for session n. Here n is a serial number, starting from 1, that is advanced for each terminal that requests a connection; this number is hereafter called the terminal identifier. For example, if connection requests have already arrived from two terminals and a new request arrives from a third terminal, that terminal's identifier is set to 3, and areas are allocated for the session-3 result-set management table and the session-3 output buffer.
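Session bookkeeping in [0037] and [0038] might be sketched as follows. The class and field names are hypothetical, and the source does not say whether a released identifier is ever reused; this sketch simply never reuses one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-terminal state allocated on connection (names are illustrative)."""
    terminal_id: int
    result_sets: dict[str, set[str]] = field(default_factory=dict)  # result-set management table
    output_buffer: list[str] = field(default_factory=list)          # output buffer for session n


class SessionManager:
    def __init__(self) -> None:
        self._next_id = 1                      # serial numbers start from 1
        self.sessions: dict[int, Session] = {}

    def connect(self) -> int:
        """Assign the next terminal identifier and allocate its session areas."""
        terminal_id = self._next_id
        self._next_id += 1
        self.sessions[terminal_id] = Session(terminal_id)
        return terminal_id

    def release(self, terminal_id: int) -> None:
        """Free the table and buffer of a terminal that disconnects ([0038])."""
        del self.sessions[terminal_id]
```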

[0038] When a request to release a connection is received, the memory areas of the result-set management table and the output buffer for that terminal identifier are released.

[0039] For a document-data display request, the control program sends the file read program an instruction to read the requested document from the file; the file read program outputs the document data to the output buffer for session n, and the control program sends that data to the request source.

[0040] To conclude the description of the control program, the processing performed when a terminal sends a retrieval request, which is the characteristic feature of the present invention, is described in detail.

[0041] When a search request is accepted, the control program creates a new wait queue entry and pushes it onto the queue buffer. As shown in FIG. 11, the queue buffer is a stack of entries, each pairing the terminal identifier of the request source with a search conditional expression. The queue counter is incremented each time an entry is added. The character string search program, described later, takes search requests out of this queue buffer, performs search processing according to their conditional expressions, and sends the results to the request sources. Thus the control program only accumulates search requests in the queue buffer, and the character string search program only takes them out. If the control program accepts search requests one after another while a character string search is still in progress, wait queue entries keep piling up in the queue buffer.
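A minimal sketch of the queue buffer of FIG. 11, assuming Python and a `deque` of (terminal identifier, conditional expression) pairs; the class and method names are hypothetical. The control program only calls `push`, and the character string search program only removes entries.

```python
from collections import deque


class QueueBuffer:
    """Wait queue buffer holding (terminal identifier, conditional expression) pairs."""

    def __init__(self) -> None:
        self._entries: deque[tuple[int, str]] = deque()
        self.counter = 0  # queue counter: number of entries currently waiting

    def push(self, terminal_id: int, condition: str) -> None:
        """Called by the control program for each accepted search request."""
        self._entries.append((terminal_id, condition))
        self.counter += 1

    def take_all(self) -> list[tuple[int, str]]:
        """Called by the search program: remove every waiting entry at once."""
        batch = list(self._entries)
        self._entries.clear()
        self.counter = 0  # assumption: the counter tracks entries still waiting
        return batch
```

Keeping the two roles on separate methods mirrors the division of labor described above: one side only accumulates, the other only drains.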

[0042] Next, the character string search processing is described. The character string search program is assumed to run on a time-sharing basis with the control program described above. That is, the control program monitors for search requests from terminals, while the character string search program waits for the control program to enqueue requests. When a search request enters the queue buffer, the character string search program starts search processing; because it does not refer to the queue buffer while a search is running, requests that arrive one after another keep accumulating in the buffer. These accumulated requests are processed after the current search finishes.

[0043] FIG. 12 is a PAD diagram showing the processing flow of the character string search program. The program first waits for a search request to arrive and executes search processing when one does. At that point it checks how many search requests are in the wait queue buffer; if there are several, it takes them all out at once and processes them as a batch. Next, it parses the conditional expressions in the queue entries and assigns a term ID to each search term; subsequent processing uses these term IDs. That is, in the character string search, whenever a search term appears in the text, its term ID is output. Then, in the inter-term logical condition determination, only results that satisfy the logical conditions between search terms, that is, the AND and OR conditions, are output. Finally, the results are distributed and sent to each requesting terminal, completing one round of search processing, and the program returns to waiting for search requests.
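The central idea of [0043], that one pass over the text serves every batched request, can be illustrated with the simplified Python sketch below. It is a stand-in under stated assumptions, not the patent's algorithm: conditions are modeled as hypothetical `("AND" | "OR", [terms])` pairs, substring tests replace the real string matcher, and term ID bookkeeping is left out.

```python
def batch_search(batch, documents):
    """Evaluate all batched requests in a single scan of the documents.

    batch: list of (terminal_id, op, terms) with op "AND" or "OR" (hypothetical format).
    documents: dict mapping document ID -> text, scanned once in order.
    Returns a dict mapping terminal_id -> list of matching document IDs.
    """
    # Every distinct term from every request, so each is looked up once per document.
    all_terms = {term for _, _, terms in batch for term in terms}
    results = {terminal_id: [] for terminal_id, _, _ in batch}
    for doc_id, text in documents.items():  # the single pass over the text data
        hits = {term for term in all_terms if term in text}
        for terminal_id, op, terms in batch:
            combine = all if op == "AND" else any
            if combine(term in hits for term in terms) and doc_id not in results[terminal_id]:
                results[terminal_id].append(doc_id)
    return results
```

The design point is that the per-document cost of scanning is paid once, while only the cheap condition checks are repeated per request.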

[0044] Within the character string search program described above, the term ID assignment, logical condition determination, and result distribution processes are now described in more detail.

[0045] As shown in FIG. 13, the main task of term ID assignment is to give a unique number to each search term in the conditional expressions stored in one or more queue entries. The process also determines whether the same search term is used in more than one conditional expression, whether AND conditions exist between terms, and which search terminal requested each term. As shown in FIG. 13, term ID assignment processes the search requests, each holding a terminal identifier and a conditional expression, and generates four tables or lists: a term ID table, a multiple matching term ID table, an AND condition determination table, and an output ID list. The term ID table gives the term ID corresponding to each actual character string. If there is an AND condition between terms, the condition ID of the AND condition determination table (described later) is registered in the AND condition field. This condition ID is an integer starting from 1.

[0046] Registration in the term ID table is performed for each search request, giving a unique number to each search string specified in the request. In the example of FIG. 13, the search condition of queue 1 gives term ID 1 to “computer” (計算機) and term ID 2 to “office work” (事務処理). Thereafter, unique ID numbers are assigned sequentially as each search term is registered.

[0047] When IDs are assigned in this way and the same character string has already been used as a term in another conditional expression, as in the processing of queue 3 in FIG. 13, a special ID called a multiple matching term ID is registered in the term ID table. In the example of FIG. 13, multiple matching term IDs are integers starting from 1001. The two term IDs involved are then registered in the multiple matching term ID table. Concretely, in the example of FIG. 13, ID number 1, originally registered in the term ID table for “computer”, is replaced with 1001, and the two term IDs 1 and 6 are registered in the multiple matching term ID table. When a character string carrying a multiple matching term ID is hit during the character string search, the multiple matching term ID table is consulted and all term IDs registered there are output as the search result. In the example of FIG. 13, when the string “computer” occurs in a document, term ID 1001 is output; since this is a multiple matching term ID, the multiple matching term ID table is consulted, and term IDs 1 and 6 are output in the end.
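The lookup-time expansion of a multiple matching term ID might look like the following Python sketch. The two dictionaries stand in for the term ID table and the multiple matching term ID table; the function name and the `>= 1001` test (which assumes ordinary term IDs never reach 1001) are illustrative assumptions.

```python
MULTI_BASE = 1001  # multiple matching term IDs start at 1001, as in FIG. 13


def ids_for_hit(term_string, term_id_table, multi_table):
    """Return the term IDs to output when term_string is found in a document."""
    term_id = term_id_table[term_string]
    if term_id >= MULTI_BASE:
        # A multiple matching term ID: expand it to every registered term ID.
        return list(multi_table[term_id])
    return [term_id]
```

For the FIG. 13 state, a hit on “computer” yields `[1, 6]`, while a hit on “office work” yields `[2]`.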

[0048] The AND condition determination table is populated when there is an AND condition between terms. For example, when the search request of queue 1 in FIG. 13 is processed, the term IDs and term strings are registered in the term ID table, and condition ID 1 is registered in the AND condition field. This condition ID is likewise an integer starting from 1, unique to each AND condition. The condition ID, term IDs, and output ID are registered in the AND condition table. Output IDs are drawn from the same numbering as term IDs, so each is unique among both. In the example of FIG. 13, by the time queue 1 is processed, term IDs 1 and 2 have already been assigned, so 3 is registered as the output ID in the AND condition table. Because ID number 3 is consumed here, the search term “environmental problem” (環境問題) in queue 2 receives term ID 4.

[0049] As shown in the figure, the AND condition determination table has as its fields the term IDs that carry an AND condition, a hit flag indicating that each term occurred in the document, and the output ID emitted when the condition is satisfied. Hit flags are initialized to 0 and set to 1 when the term is found in the document during the character string search. When all hit flags for one condition ID are 1, the output ID is emitted. The hit flags are reset to 0 for each document. In the example of FIG. 13, when both “computer” (term IDs 1 and 6 via multiple matching) and “office work” (term ID 2) occur in a document, all hit flags for condition ID 1 become 1, so ID 3 is output.
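Per-document evaluation of the AND condition determination table can be sketched in Python as below. The table layout (condition ID mapped to its term IDs and output ID) is an assumed simplification of the figure, and the hit flags are rebuilt for each document to model the per-document reset.

```python
def evaluate_and_conditions(and_table, hit_term_ids):
    """Return the output IDs whose AND conditions hold for one document.

    and_table: condition ID -> (term IDs of the condition, output ID).
    hit_term_ids: term IDs emitted by the string search for this document,
    after multiple matching term IDs have been expanded.
    """
    outputs = []
    for term_ids, output_id in and_table.values():
        hit_flags = dict.fromkeys(term_ids, 0)  # every flag starts at 0 per document
        for term_id in term_ids:
            if term_id in hit_term_ids:
                hit_flags[term_id] = 1
        if all(flag == 1 for flag in hit_flags.values()):
            outputs.append(output_id)
    return outputs
```

With condition 1 defined over term IDs 1 and 2 and output ID 3, a document that yields term IDs {1, 6, 2} produces `[3]`, while one that yields only {1, 6} produces nothing.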

[0050] The output ID list registers the term IDs or conditional expression IDs to be output as final results; IDs not registered here are not output as search results. The list also associates each output ID with the identifier of the terminal that requested that term or condition, and the collation result distribution program uses this association to distribute the search results to the output buffer of each terminal.

[0051] The algorithm of the term ID assignment process is now described in detail with reference to a PAD diagram. As shown in FIG. 14, the same processing is repeated once for each search request taken out of the wait queue buffer. First, the identifier of the terminal that issued the search request is obtained from the queue entry. Next, processing branches on the type of the conditional expression. The type indicates the logical condition connecting the terms; this embodiment uses three types: AND, OR, and a simple condition, meaning a single search term used alone. For an AND condition, term ID registration (described below) is performed for each term in the expression, the AND condition ID is set in the term ID table, and the AND condition table is set up; then the terminal identifier and output ID are registered in the output ID list. For an OR condition, term ID registration is performed for each term, and each term ID is then registered in the output ID list together with the terminal identifier. For a simple condition, the term ID is registered and then entered in the output ID list.

[0052] Next, the term ID registration algorithm is described. Its main task is to register the character string of a search term in the term ID table; in doing so, it checks whether the string has already been registered and, if so, sets up an entry in the multiple matching term ID table. That is, as shown in FIG. 15, it first checks whether a string identical to that of the search term being registered is already in the term ID table. If the string is new, it is registered in the term ID table with a new term ID. If the same string already exists, the term ID of the existing term is changed to a multiple matching term ID, and the term IDs, including that of the new term, are set in the multiple matching term ID table. If an AND condition had been set for the already registered term, the AND condition ID set in the term ID table is cleared to 0 and moved to the multiple matching term ID table. In the example of FIG. 13, “computer” was initially registered in the term ID table together with an AND condition ID; when queue 3 is processed, its ID is changed to multiple matching term ID 1001, and at the same time the AND condition ID is moved to the multiple matching term ID table.
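The two algorithms of FIG. 14 and FIG. 15 can be combined into the Python sketch below. It is a hypothetical reading of the text, not the patent's code: the table layouts, names, and the `"AND"`/`"OR"`/`"SIMPLE"` labels are invented, and the handling of a string used a third time is an assumption the text does not cover.

```python
from dataclasses import dataclass, field

MULTI_BASE = 1001  # first multiple matching term ID (FIG. 13 convention)


@dataclass
class TermTables:
    term_ids: dict = field(default_factory=dict)    # string -> [term or multi ID, AND cond ID]
    multi: dict = field(default_factory=dict)       # multi ID -> [(term ID, AND cond ID), ...]
    and_table: dict = field(default_factory=dict)   # AND cond ID -> (term IDs, output ID)
    output_ids: list = field(default_factory=list)  # (output ID, terminal identifier)
    next_id: int = 1  # shared by term IDs and output IDs
    next_multi: int = MULTI_BASE
    next_cond: int = 1

    def register_term(self, string, and_cond=0):
        """FIG. 15: give the term a new ID, merging duplicates into a multi ID."""
        term_id = self.next_id
        self.next_id += 1
        entry = self.term_ids.get(string)
        if entry is None:
            self.term_ids[string] = [term_id, and_cond]
        elif entry[0] >= MULTI_BASE:
            # Assumption: a third use of the string joins the existing multi entry.
            self.multi[entry[0]].append((term_id, and_cond))
        else:
            multi_id = self.next_multi
            self.next_multi += 1
            # Move the existing ID and its AND condition ID into the multi table.
            self.multi[multi_id] = [(entry[0], entry[1]), (term_id, and_cond)]
            self.term_ids[string] = [multi_id, 0]  # AND condition ID cleared to 0
        return term_id

    def assign(self, batch):
        """FIG. 14: process each (terminal identifier, type, terms) request."""
        for terminal_id, kind, terms in batch:
            if kind == "AND":
                cond = self.next_cond
                self.next_cond += 1
                ids = [self.register_term(t, cond) for t in terms]
                output_id = self.next_id  # output IDs share the term ID numbering
                self.next_id += 1
                self.and_table[cond] = (ids, output_id)
                self.output_ids.append((output_id, terminal_id))
            else:  # "OR" or a simple (single-term) condition
                for t in terms:
                    self.output_ids.append((self.register_term(t), terminal_id))
```

For a batch whose first request is `computer AND office work`, followed by a two-term OR request and then a single-term `computer` request, the numbering follows [0046] to [0048] (terms 1 and 2, output ID 3, term ID 4 next), and “computer” ends up under ID 1001 with term IDs 1 and 6, as in [0047].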

[0053] Finally, the search result distribution process is described in detail. FIG. 16 shows the format of the text data used in this embodiment, the format of the data after logical condition determination, and the format of the data stored in the session n output buffers. The text data consists of multiple documents stored in sequence, each beginning with a document ID of the form dxxx that identifies the document. The example of FIG. 16 shows text data for d001 through d004 in sequence. This text data undergoes character string determination and logical condition determination, which output the numbers of the documents containing matching strings together with the term IDs that were hit. At this stage, the results for the conditions from all request sources are output together. Based on the output ID list shown in FIG. 13, the resulting document IDs are then sorted into the session n output buffers corresponding to each request source. That is, the IDs of documents in which IDs 2 and 3 (output by logical condition determination) appear go to the session 3 output buffer, those for ID 5 go to the session 5 output buffer, and those for ID 6 go to the session 1 output buffer.
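The sorting step of [0053] can be sketched in Python as follows, using the output ID list mapping quoted above (IDs 2 and 3 to session 3, ID 5 to session 5, ID 6 to session 1). The function name and the (document ID, output ID) pair format are assumptions, and the document IDs in the usage are invented.

```python
from collections import defaultdict


def distribute(matches, output_id_list):
    """Sort logical-condition output into per-session output buffers.

    matches: (document ID, output ID) pairs from logical condition determination.
    output_id_list: (output ID, terminal identifier) pairs.
    """
    owner = dict(output_id_list)  # assumes each output ID belongs to one terminal
    buffers = defaultdict(list)
    for doc_id, output_id in matches:
        terminal_id = owner.get(output_id)
        if terminal_id is None:
            continue  # IDs missing from the output ID list are not results
        if doc_id not in buffers[terminal_id]:
            buffers[terminal_id].append(doc_id)
    return dict(buffers)
```

A pair such as `("d003", 1)`, where ID 1 is only an intermediate term of an AND condition, is dropped because ID 1 is not in the output ID list.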

[0054] The processing of the CPU 110 has been described above in detail. Next, processing when multiple databases are stored on the magnetic disk 120 is described. As shown in FIG. 17, the data of each database is assumed to be stored together as one group on the magnetic disk 120.

[0055] When multiple databases are stored in this way and each request source specifies its target database in its search request, batch processing of the search requests accumulated in the queue buffer does not always shorten processing time. This is the case when each request targets a different database. For example, suppose the queue buffer holds a search request from request source 1 specifying database 1 as the search target and one from request source 2 specifying database 2. Whether these requests are unified and databases 1 and 2 are searched together or they are searched separately, the total scan volume is the same: in the example of FIG. 17, the scan covers logical addresses 0 through 3500 on the magnetic disk either way.

[0056] By contrast, when request sources 1 and 2 both specify the same database 2, unifying the search requests and searching them together halves the scan volume. Processing the requests one at a time scans logical addresses 2100 through 3500 twice, whereas unified processing completes the search in a single scan.
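The scan-volume comparison of [0055] and [0056] reduces to simple arithmetic, sketched below in Python. The extents assume database 1 occupies addresses 0 to 2100 and database 2 occupies 2100 to 3500, inferred from the two ranges quoted in the text; FIG. 17 itself may differ.

```python
def scan_volume(target_dbs, extents, batched):
    """Total logical addresses scanned for the requested databases.

    batched=True scans each distinct database once for the whole batch;
    batched=False scans once per request.
    """
    dbs = set(target_dbs) if batched else target_dbs
    return sum(extents[db][1] - extents[db][0] for db in dbs)


EXTENTS = {"db1": (0, 2100), "db2": (2100, 3500)}  # assumed layout
```

Requests for db1 and db2 scan 3500 addresses whether batched or not, while two requests for db2 scan 2800 separately but only 1400 when batched, which is the halving described above.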

[0057] Thus, when the information retrieval device stores multiple databases, taking each request's target database into account and batch-processing only requests with the same search range reduces the scan volume and achieves processing with shorter waiting times.

[0058] For this purpose, when the search device stores a search request sent from a search terminal in the queue buffer, it also stores in the queue entry a database id identifying the target database. The structure of the queue buffer in this case is shown in FIG. 18: each queue entry stores a terminal identifier, a database id, and a conditional expression. When accumulated search requests are taken out of the queue buffer for unified processing, selecting only requests with the same database id enables the efficient batch search processing described above.

[0059] This database-selective search processing is now described concretely. FIG. 19 is a PAD diagram showing the processing of the character string search program of FIG. 12 when there are multiple databases. The only difference lies in how entries are taken from the queue buffer; the rest of the processing has already been described and is omitted. Suppose search requests have accumulated in the queue buffer as shown in FIG. 18. First, queue entry 1 is taken from the buffer and its database id is examined; then only the entries whose database id is 2, that is, whose target is the same as that of queue entry 1, are taken from the queue buffer. In this example queue entries 3 and 4 qualify, so only queue entry 2 remains in the buffer, and the other search requests are unified and processed as a batch. In this way the search requests are unified, database 2 is scanned once, and the search results are distributed to terminal identifiers 1, 2, and 5, ending the search processing. Because queue entry 2 is still in the queue buffer, this search request is taken out next and search processing of database 3 begins.
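The extraction step of FIG. 19 can be sketched in Python as follows, assuming queue entries are (terminal identifier, database id, conditional expression) tuples held in a `deque`. The function name is hypothetical; in the FIG. 18 style example used for checking, the terminal identifier and condition strings of queue entry 2 are placeholders, since the text gives only its database id.

```python
from collections import deque


def take_same_database(queue):
    """Remove the head entry and every entry with the same database id.

    queue: deque of (terminal identifier, database id, conditional expression).
    Entries for other databases stay in the queue in their original order.
    """
    if not queue:
        return []
    head = queue.popleft()
    batch, remaining = [head], []
    while queue:
        entry = queue.popleft()
        (batch if entry[1] == head[1] else remaining).append(entry)
    queue.extend(remaining)
    return batch
```

On the FIG. 18 example, the first call returns entries 1, 3, and 4 (database 2, terminals 1, 2, and 5) and leaves entry 2 (database 3) for the next round.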

[0060] As described above, even when requests are sent for multiple databases, multiple search requests can be batch-processed, providing an information retrieval device with short waiting times.

[0061] Character string search processing for one or more search requests has been described above. The set of documents obtained as a search result is stored as shown in FIG. 20 and is used in later searches as a base set or in OR operations between sets. In the example of FIG. 20, the search result set, a sequence of document IDs, is represented as a search result bit list in which the position of each hit document ID is 1 and the position of each document ID not hit is 0. A bit list is used because AND and OR operations between sets can then be carried out as bitwise AND and OR operations. The result set management table stores and manages one such bit list per search performed. One result set management table is provided for each connected terminal, and each stores the result set of every request from that terminal. Hierarchical search using the result set management table is described below.
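A Python integer can serve as the search result bit list, so set AND and OR become the `&` and `|` operators. The sketch below assumes bit i corresponds to the i-th document in storage order; the function names are hypothetical.

```python
def to_bit_list(hit_doc_ids, all_doc_ids):
    """Encode a result set as an int: bit i is 1 if all_doc_ids[i] was hit."""
    bits = 0
    for i, doc_id in enumerate(all_doc_ids):
        if doc_id in hit_doc_ids:
            bits |= 1 << i
    return bits


def from_bit_list(bits, all_doc_ids):
    """Decode a bit list back into the sequence of hit document IDs."""
    return [doc_id for i, doc_id in enumerate(all_doc_ids) if (bits >> i) & 1]
```

For documents d001 to d004, the sets {d001, d003} and {d003, d004} become 0b0101 and 0b1100; their AND decodes to `["d003"]` and their OR to `["d001", "d003", "d004"]`.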

[0062] As shown in FIG. 21, hierarchical search, also called narrowing search, is a search whose range is a past result set managed in the result set management table. That is, the sequence of document IDs from a past search result (normally the immediately preceding one) is obtained from the bit list in the result set management table (this is called the base), and the text data is searched with a new conditional expression (“learning-type user interface” in the example of FIG. 21). Text data outside the base is skipped. In the example of FIG. 21, the text data outside the base, such as d001 to d002 and d004 to d014, is treated as unnecessary, and no character string search is performed on it. This is the narrowing process. As a result, the document IDs d008, d016, ... whose base text data contains the search term of the condition are output. Next, a method of realizing this hierarchical search when multiple search requests have accumulated in the wait queue buffer is described. If the requesting terminals differ, their bases naturally differ as well, so the questions are how to set the base and how to distribute the result set. Because character string matching by its nature scans the text data, the number of matching passes should be kept as small as possible. Therefore, as shown in FIG. 22, the base bit lists are read from the result set management tables of the request sources, and their bitwise OR is used as the base for the multiple accepted requests. The result set after narrowing is converted back into a bit list and ANDed with each request source's base bit list. This final AND is needed to remove noise, that is, hits produced by matching a request's conditional expression against the bases of other request sources rather than its own.
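The multi-request narrowing of FIG. 22 can be sketched in Python using integers as bit lists. Single-term substring conditions stand in for full conditional expressions, and all names are illustrative assumptions rather than the patent's implementation.

```python
def batched_narrowing(requests, documents):
    """Narrowing search for several requests with one scan (FIG. 22 idea).

    requests: terminal identifier -> (base bit list, search term).
    documents: list of (document ID, text); bit i refers to documents[i].
    Returns terminal identifier -> narrowed result bit list.
    """
    combined_base = 0
    for base, _ in requests.values():
        combined_base |= base  # OR of every requester's base
    hits = dict.fromkeys(requests, 0)
    for i, (_, text) in enumerate(documents):
        if not (combined_base >> i) & 1:
            continue  # outside every base: skipped without string matching
        for terminal_id, (_, term) in requests.items():
            if term in text:
                hits[terminal_id] |= 1 << i
    # AND with each requester's own base removes hits found only in other bases.
    return {tid: hits[tid] & base for tid, (base, _) in requests.items()}
```

In a four-document example where terminal 1 narrows base 0b0011 by “x” and terminal 2 narrows base 0b0100 by “y”, the “x” hit in the third document and the “y” hit in the second are noise from the other base, and the final AND removes them, leaving 0b0011 and 0b0100.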

[0063] The first embodiment, including the processing that realizes the hierarchical search function, has been described above. This embodiment requires no special hardware and is effective in processing multiple full-text search requests with a single CPU.

[0064] Although this embodiment has been described on the assumption that users issue search requests online from each terminal, search requests specified offline can likewise be batch-processed by the same processing. This means that when many search requests are processed on a single search device, as in a batch job, all of them can be handled in one scan, whereas conventionally the text data of the DB was scanned and the results output once per request. That is, according to this embodiment, search requests can be processed at high speed whether online or offline.

[0065] The second embodiment is now described. This embodiment is characterized by being realized with multiple CPUs that have special hardware for character string search processing or a CPU dedicated to search processing.

[0066] FIG. 23 shows the configuration of the second embodiment. This embodiment consists of a search terminal 1300; a CPU 1310 that executes search processing; a memory 1320 that stores a control program and various data buffers; a CPU 1330, connected to the CPU 1310, that controls the actual character string search processing; a character string collation unit 1340 dedicated to character string collation; a disk control unit 1350 that controls writing to and reading from the magnetic disk storing the database; a magnetic disk 1360; and a memory 1370 that stores a program controlling the character string collation processing and various data areas. Components 1330 to 1370 form one processing block called a TSM, and multiple TSMs are connected to the CPU 1310. The memory 1320 stores a control program that controls the TSMs and the terminals. It is also used for a DB-TSM correspondence table that manages the databases held by each TSM, a wait queue buffer for each TSM, and a session n output buffer for each connected terminal. The memory 1370 stores a control program that controls the character string collation processing, a logical condition determination program that evaluates AND and OR conditions between terms, and a collation result distribution program that distributes the search result set when search requests from multiple terminals are received at the same time. It is also used for a collation result output buffer that temporarily holds collation results, and as the storage area for the result set management table of each connected terminal. The processing of each program is now described in detail, following the flow of search processing.

[0067] The flow of search processing on the terminal side is the same as in the first embodiment. What differs from the first embodiment is the flow of the control programs of the CPUs 1310 and 1330, which are described in detail below. FIG. 24 is a PAD diagram showing the processing flow of the CPU 1310. The CPU 1310 basically waits for requests from terminals, but it also handles search-completion signals from the TSMs via interrupts. Processing of requests from terminals is described first, followed by processing of completion signals from the TSMs.

[0068] When a terminal issues a connection request, CPU 1310 looks up the database to be searched in the DB-TSM correspondence table and issues a connection request to the corresponding TSMs. It then allocates an output buffer area for session n in memory 1320. For a release request, it issues a release request to the TSMs and frees the session n output buffer area that was allocated at connection time. For a search request, it registers the search request, together with the terminal identifier, in the wait queue buffer of each corresponding TSM. For a document data display request, it issues a data display request to the corresponding TSM, waits until the data has been stored in the output buffer, and then sends the data to the requesting terminal. When CPU 1310 accepts a search-completion interrupt from a TSM, it determines whether all TSMs in charge of the database specified by the requesting terminal have finished, and it sends the search data from the output buffer only when the search processing of every one of those TSMs has finished. Otherwise, it ignores the completion interrupt and waits until all the TSMs have finished.

[0069] The DB-TSM correspondence table used here has the structure shown in FIG. 25. It is a correspondence table between DB names and the numbers of the TSMs that store each DB. On a connection request, the table is consulted with the name of the database the terminal wants to search, and connection requests are issued to the corresponding TSMs. For example, when a connection request arrives from a terminal that searches newspaper articles, connection requests are issued to TSM1 and TSM2, and all subsequent search requests from that terminal are directed to TSM1 and TSM2.
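The table lookup performed on a connection request can be sketched as follows. This is a sketch only. The table contents are assumptions: only the newspaper-article row (TSM1, TSM2) comes from the example above, and the "patent gazette" row is invented for illustration. The function `connect` and the `sessions` dictionary are hypothetical.

```python
# Hypothetical table contents: only the newspaper-article row (TSM1, TSM2)
# comes from the example in the text; the second row is made up.
DB_TSM_TABLE = {
    "newspaper articles": (1, 2),
    "patent gazette": (3,),
}


def connect(terminal_id, db_name, sessions):
    """Look up the TSMs storing db_name and remember them for this terminal."""
    try:
        tsms = DB_TSM_TABLE[db_name]
    except KeyError:
        raise ValueError(f"no TSM stores database {db_name!r}") from None
    sessions[terminal_id] = tsms   # later search requests from this terminal go here
    return [f"connect -> TSM{n}" for n in tsms]
```

Calling `connect("T1", "newspaper articles", sessions)` yields connection requests for TSM1 and TSM2. It also records `(1, 2)` as the target TSMs for that terminal's later searches.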

[0070] The structure of the queue buffer is the same as in the first embodiment, but one queue buffer per TSM is prepared in memory 1320. Each search request from a terminal is registered, together with the terminal identifier, in the queue buffer of every corresponding TSM. For example, a search request from a terminal that searches the newspaper article DB is registered in both the TSM1 and the TSM2 queue buffers.
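One wait queue buffer per TSM, with each request tagged by its terminal identifier, can be modelled with `collections.deque`. This is a minimal sketch. The function names and the string form of the condition are hypothetical placeholders.

```python
from collections import deque


def make_queue_buffers(num_tsms):
    """One wait queue buffer per TSM, as prepared in memory 1320."""
    return {n: deque() for n in range(1, num_tsms + 1)}


def register_search(queues, terminal_id, tsms, condition):
    """Enqueue the request, tagged with the terminal identifier, for every TSM involved."""
    for n in tsms:
        queues[n].append((terminal_id, condition))
```

A request for the newspaper DB (TSMs 1 and 2) appears in both of those queues and leaves any other TSM's queue untouched.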

[0071] Next, the processing of CPU 1330 is described. This CPU, provided in each TSM, waits for requests from the upper CPU 1310. As shown in FIG. 26, when a connection is requested, it allocates a result set management table area for the connected terminal. For a release request, it frees the result set management table area for the corresponding terminal. For a document data display request, it drives the disk control unit to read the text data corresponding to the received document ID from the magnetic disk, and then sends the data to the session n output buffer managed by the upper CPU 1310. The destination session n output buffer is, of course, the output buffer corresponding to the terminal that issued the data display request.
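The three non-search requests handled by the TSM's CPU can be sketched as follows. This is a sketch under stated assumptions. The class `LowerCpu` and its methods are hypothetical. A dictionary stands in for the magnetic disk and disk control unit, and `output_buffers` stands in for the session n output buffers owned by CPU 1310.

```python
class LowerCpu:
    """Toy model of a TSM's CPU 1330 handling non-search requests (cf. FIG. 26)."""

    def __init__(self, disk):
        self.disk = disk          # doc_id -> text; stands in for the magnetic disk
        self.result_sets = {}     # terminal_id -> stored result sets (management table)

    def connect(self, terminal_id):
        self.result_sets[terminal_id] = []        # allocate the table area

    def release(self, terminal_id):
        self.result_sets.pop(terminal_id, None)   # free the table area

    def display(self, terminal_id, doc_id, output_buffers):
        # Send the text to the output buffer of the terminal that asked for it.
        output_buffers.setdefault(terminal_id, []).append(self.disk[doc_id])
```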

[0072] For a search request, as in the first embodiment, the CPU first checks the number of search requests stored in the wait queue buffer. If there is one, it takes out that single entry; if there are several, it takes them all out at once. It then performs the term ID assignment process. After that, it drives the character string matching unit and the disk control unit to perform a character string search for the search terms in the conditional expressions. It then evaluates the logical conditions between the terms and outputs the IDs of the documents that satisfy each conditional expression to the collation result output buffer. Next, it distributes the collation results to each request source based on the terminal identifier and outputs them to the session n output buffers managed by the upper CPU. At this time, each result set is also stored in the result set management table for its request source, for use in subsequent searches such as hierarchy searches. Finally, it reports to the upper CPU 1310 that its share of the search processing has finished, which completes the search processing, and it again waits for a request from the upper CPU.
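The batch step (drain the queue, give each distinct term one ID, scan the text once, evaluate every conditional expression, distribute by terminal identifier) can be sketched as below. This is a sketch under several assumptions and is not the patent's algorithm. Conditions are assumed to be nested tuples using only `"and"` and `"or"`, such as `("and", "disk", "cache")`. Plain substring tests replace the hardware character string matching unit. Each terminal is assumed to have at most one request in the batch. Storing each result set in the result set management table is omitted. All function names are hypothetical.

```python
from collections import deque


def collect_terms(cond, out):
    """Append every search term in a nested ("and"|"or", ...) condition."""
    if isinstance(cond, str):
        out.append(cond)
    else:
        for sub in cond[1:]:
            collect_terms(sub, out)


def evaluate(cond, found, term_ids):
    """Check a condition against the set of term IDs found in one document."""
    if isinstance(cond, str):
        return term_ids[cond] in found
    op, *args = cond
    results = (evaluate(a, found, term_ids) for a in args)
    return all(results) if op == "and" else any(results)


def batch_search(queue, documents):
    batch = []
    while queue:                        # one request or several: take them all out
        batch.append(queue.popleft())
    term_ids = {}                       # term ID assignment: shared terms get one ID
    for _, cond in batch:
        terms = []
        collect_terms(cond, terms)
        for t in terms:
            term_ids.setdefault(t, len(term_ids))
    results = {tid: [] for tid, _ in batch}
    for doc_id, text in documents.items():          # a single pass over the text
        found = {i for t, i in term_ids.items() if t in text}
        for terminal_id, cond in batch:             # distribute by terminal identifier
            if evaluate(cond, found, term_ids):
                results[terminal_id].append(doc_id)
    return term_ids, results
```

For example, requests `("and", "disk", "cache")` from T1 and `("or", "cache", "tape")` from T2 share the term "cache". That term receives a single ID, so it is scanned for only once.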

[0073] This concludes the description of the second embodiment. According to this embodiment, each TSM has special hardware for character string search processing or a CPU dedicated to search processing, so multiple TSMs can execute full-text searches in parallel.
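The fan-out and merge pattern behind parallel TSMs can be sketched with `concurrent.futures`. This sketch assumes that each TSM holds one partition of the database as a `doc_id -> text` dictionary. The function names are hypothetical. Under CPython's GIL these threads only illustrate the structure. The patent's speed-up comes from separate search hardware working at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, term):
    """One TSM scans only the documents it stores."""
    return [doc_id for doc_id, text in partition.items() if term in text]


def parallel_search(partitions, term):
    """Issue the same search to every TSM and merge their hits."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        parts = pool.map(scan_partition, partitions, [term] * len(partitions))
        return sorted(doc_id for part in parts for doc_id in part)
```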

[0074] In the second embodiment, multiple processing blocks (TSMs) are connected to a CPU, and that CPU is connected to the LAN on which the search terminals reside. However, the same processing can also be performed with a configuration in which the TSMs are connected to the same LAN as the search terminals, as shown in FIG. 27.

[0075] Furthermore, the session n result set management table and the session n output buffer are allocated on each connection request from a search terminal. If they are allocated separately for separate connection requests from the same terminal, different search sessions on that terminal can also be managed individually.

[0076]

[Effects of the Invention] According to the present invention, full-text search processing, which occupies the CPU for a long time, can be shared by multiple connected terminals. That is, even when search requests arrive from multiple terminals at the same time, multiple conditional expressions can be processed together in a single text scan, without making the terminals that sent later requests wait.

[0077] A wait queue buffer is provided, and requests from the terminals are registered in it as they arrive. Requests received during a search can therefore be processed together as a group after the current search finishes.

[0078] The results of a batch search are distributed by request source through the collation result distribution program and the session n output buffers, and they can be sent to each request source.

[0079] When multiple search requests are processed offline, the same processing allows them to be handled as a batch. Search requests can therefore be processed more efficiently than when they are handled one at a time.

[0080] In addition, the device has a result set management table that stores a result set for each request source. A hierarchy search is then processed in three steps. First, the OR set of the request sources' base sets is searched. Second, the resulting matches are distributed by request source. Third, each distributed result set is ANDed with that request source's base set. This makes it possible to process search requests from multiple request sources as a batch.
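The three-step batched hierarchy search (search the OR of the base sets, distribute, AND with each base set) can be expressed directly with Python sets. This is a minimal sketch under stated assumptions. Each requester narrows its base set with a single additional term, and a substring test stands in for the character string search. The function name `hierarchy_batch` is hypothetical.

```python
def hierarchy_batch(base_sets, terms, documents):
    """
    base_sets: requester -> previous result set (doc IDs) to be narrowed
    terms:     requester -> new term that must also appear (assumed single term)
    documents: doc_id -> text
    """
    # Step 1: scan only the OR (union) of all requesters' base sets.
    union = set().union(*base_sets.values())
    hits = {r: set() for r in terms}
    for doc_id in union:
        text = documents[doc_id]
        # Step 2: distribute each match to every requester whose term it satisfies.
        for r, term in terms.items():
            if term in text:
                hits[r].add(doc_id)
    # Step 3: AND with each requester's own base set to drop search noise.
    return {r: hits[r] & base_sets[r] for r in terms}
```

With base sets A = {1, 2} and B = {3} and the term "x" for both, step 2 alone would give A the document 3, which comes from B's base set (the search noise of FIG. 5). Step 3 reduces the results to A = {1} and B = {3}. Document 4 lies outside both base sets and is never scanned.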

[0081] Furthermore, the device has an upper CPU that accepts requests from the terminals and multiple lower CPUs, connected to it, that handle character string matching. The text scan can therefore be performed in parallel on multiple units, which enables high-speed full-text search.

[Brief Description of the Drawings]

FIG. 1 is a conceptual diagram showing the process of accepting multiple requests using a queue buffer.

FIG. 2 is a conceptual diagram showing batch processing of multiple search requests.

FIG. 3 is an explanatory diagram showing hierarchy search processing with multiple users.

FIG. 4 is an explanatory diagram showing efficient hierarchy search processing with multiple users.

FIG. 5 is an explanatory diagram showing that search noise occurs in batch search processing.

FIG. 6 is a conceptual diagram showing the distribution process after a batch search.

FIG. 7 is a set relationship diagram showing that the distribution process after a batch search outputs correct results.

FIG. 8 is a configuration diagram of the first embodiment.

FIG. 9 is a PAD diagram showing the flow of search operations on the search terminal side in this embodiment.

FIG. 10 is a PAD diagram showing the flow of the control program of the search device.

FIG. 11 is a conceptual diagram showing the structure of the wait queue buffer.

FIG. 12 is a PAD diagram showing the processing of the character string search program.

FIG. 13 is a conceptual diagram showing the term ID assignment process.

FIG. 14 is a PAD diagram showing the term ID assignment process.

FIG. 15 is a PAD diagram showing the term ID registration process.

FIG. 16 is a conceptual diagram showing the distribution process for search results.

FIG. 17 is a conceptual diagram showing the storage format of a database.

FIG. 18 is a conceptual diagram showing the structure of the wait queue buffer.

FIG. 19 is a PAD diagram showing the processing of the character string search program when multiple databases are present.

FIG. 20 is a conceptual diagram showing the method of storing search results.

FIG. 21 is a conceptual diagram showing the principle of hierarchy search.

FIG. 22 is a conceptual diagram showing the hierarchy search method when multiple requests are accepted.

FIG. 23 is a configuration diagram of the second embodiment.

FIG. 24 is a PAD diagram showing the processing of the control program of the upper CPU.

FIG. 25 is a diagram showing the format of the DB-TSM correspondence table.

FIG. 26 is a PAD diagram showing the processing of the control program of the lower CPU.

FIG. 27 is a configuration diagram in which the search terminals and the search processing blocks are connected to the same LAN.

[Explanation of symbols]

100 ... search terminal, 110 ... CPU, 120 ... magnetic disk, 130 ... memory.

Continuation of the front page. (72) Inventor: Hisamitsu Kawaguchi, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo.

Claims

1. An information retrieval device for retrieving documents containing a specific character string, to which a plurality of search terminals are connected, the device comprising: search request storage means for storing a search request when another search request is received during search processing; and means for collectively processing a plurality of stored search requests when a plurality of search requests are stored in the search request storage means.

2. The information retrieval device according to claim 1, wherein the search request storage means comprises a wait queue buffer.

3. The information retrieval device according to claim 1, wherein the search request storage means stores identification information indicating the source of each search request, and, after the character string search processing is completed, the search results are distributed to each search request source based on the identification information and output.

4. The information retrieval device according to claim 3, further comprising result set management storage means for storing past search results as a set for each search request source, wherein a hierarchy search is performed that narrows down a result set by adding further conditions to it.

5. The information retrieval device according to claim 4, wherein, when search processing is performed for a plurality of search requests at once, search processing is first performed on all the data and the results are distributed to the search request sources, and narrowing down is then realized by performing an inter-set AND between the past result set corresponding to each search request source and the new result set.

6. The information retrieval device according to claim 4, wherein, when a plurality of search requests are searched collectively, search processing is performed on the OR set of the result sets corresponding to the respective search request sources and the results are distributed to the search request sources, and narrowing down is then realized by performing an inter-set AND between the result set corresponding to each search request source used to form the OR set and the new result set.

7. The information retrieval device according to claim 1, comprising a plurality of databases, wherein, when a search request is stored in the search request storage means, a database id indicating which database is to be searched is stored with it, and, when a plurality of search requests are processed collectively, only the search requests having the same database id are selectively taken out of the search request storage means and processed.

8. An information retrieval device to which a plurality of search terminals are connected and to which a plurality of dedicated character string search devices are connected, each character string search device storing data and searching for documents containing a specific character string, the information retrieval device comprising: a correspondence table between the types of data stored in the character string search devices and the character string search devices; and, for each character string search device, search request storage means for storing the search requests to be sent to that device; wherein, when another search request that uses the same character string search device is received during search processing by that device, the search request is stored in the search request storage means for that device, and, when a plurality of search requests are stored, the plurality of search requests are searched collectively.

9. The information retrieval device according to claim 8, wherein the search request storage means comprises a wait queue buffer.

10. The information retrieval device according to claim 8, wherein the search request storage means stores identification information indicating the source of each search request, and, after the character string search processing is completed, the search results are sorted by search request source based on the identification information and output.

11. The information retrieval device according to claim 10, wherein each character string search device has result set management storage means for storing past search results as a set for each search request source, and a hierarchy search is performed that narrows down a result set by adding further conditions to it.

12. The information retrieval device according to claim 11, wherein, when a plurality of search requests are searched collectively, each character string search device first performs search processing on all the data it holds and distributes the results by search request source, and then realizes narrowing down by performing an inter-set AND between the past result set corresponding to each search request source and the new result set and returning the result to the information retrieval device.

13. The information retrieval device according to claim 11, wherein, when a plurality of search requests are processed collectively, each character string search device first performs search processing on the OR set of the result sets corresponding to the respective search request sources and sorts the results by search request source, and then realizes narrowing down by performing an inter-set AND between the result set corresponding to each search request source used to form the OR set and the new result set and returning the result to the information retrieval device.

14. An information retrieval device for retrieving documents containing a specific character string, comprising: search request storage means for storing a plurality of search requests; and means for collectively processing the plurality of search requests when a plurality of search requests are stored in the search request storage means.

15. An information retrieval device for retrieving documents containing a specific character string, comprising: search request storage means for storing a plurality of search requests in association with identification information indicating each request source; means for collectively processing the stored search requests when a plurality of search requests are stored in the search request storage means; and means for distributing the search results to each search request source based on the identification information after the search processing is completed and outputting them.