JPH0546664A

JPH0546664A - Document processor

Info

Publication number: JPH0546664A
Application number: JP3207820A
Authority: JP
Inventors: Toshiyuki Noguchi; 利之野口; Shiro Ito; 史朗伊藤; Takanari Ueda; 隆也上田; Minoru Fujita; 稔藤田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1991-08-20
Filing date: 1991-08-20
Publication date: 1993-02-26

Abstract

PURPOSE: To simplify processing when the matching phrase extraction processing would take a long time, and to perform search processing efficiently. CONSTITUTION: The device is provided with a search processing unit 3 that searches the documents in a search target document holding unit 2 using a search word, holds the character strings found in a search result holding unit 4, and holds the number of places in the search target documents that matched the search word in a matching place number holding unit 5. The device is also provided with a matching phrase extraction processing control unit 6 that decides whether to execute the processing of a matching phrase extraction processing unit 9, which extracts from the search results held in the search result holding unit 4 only those character strings that truly match the search word as a phrase. The decision is made by comparing the required time, obtained from the number of matching places via a matching phrase extraction processing required time prediction table 7, with a threshold value.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document processing apparatus that performs searches by computer.

[0002]

[Prior Art] Recently, search devices that extract phrases matching a given keyword from a large volume of documents have come into use. In general, the larger the volume of documents to be searched, the longer the search takes. In addition, devices of this kind often first perform a search based on character string pattern matching and then extract, from the character strings found, the phrases that match the search word as a phrase. Because this extraction involves sentence analysis and the like, it usually requires even more processing time.

[0003]

[Problem to be Solved by the Invention] The processing described above for extracting phrases that match the search word performs sentence analysis and the like, so it improves search accuracy. However, when a large volume of documents is searched, it has the drawback of requiring even more time than the search processing that precedes it. An object of the present invention is to solve this drawback and to provide a document processing apparatus that, once the search processing has finished, can simplify the subsequent processing in view of the time that processing would require.

[0004]

[Means for Solving the Problem] To achieve the above object, the present invention is characterized by comprising: search processing means for searching a search target document for character strings that match a search word; search result holding means for holding the character strings found by the search processing means; extraction means for extracting, from the character strings held in the search result holding means, those that match the search word as a phrase; matching place number holding means for holding the number of places in the search target document that matched the search word in the search processing means; prediction means for predicting, from the number of matching places obtained from the matching place number holding means, the time required for the extraction processing by the extraction means; and means for deciding whether to perform the extraction processing by the extraction means on the search results held in the search result holding means, by comparing the required time predicted by the prediction means with a predetermined threshold value.

[0005]

[Operation] According to the present invention, once the search processing has finished, the time the subsequent processing will take is predicted, and if the predicted value exceeds the threshold value the processing can be simplified, for example by stopping the subsequent processing.

[0006]

[Embodiment] The present invention is described in detail below with reference to the drawings.

[0007] FIG. 1 is a block diagram showing the configuration of an apparatus according to an embodiment of the present invention. In the figure, 1 is a search word holding unit in which a search word is set; 2 is a search target document holding unit that holds the documents to be searched; 3 is a search processing unit that searches the documents in the search target document holding unit 2 using the search word held in the search word holding unit 1; 4 is a search result holding unit that holds the character strings found by the search processing unit 3; 5 is a matching place number holding unit that holds the number of places that matched the search word in the search processing unit 3; and 6 is a matching phrase extraction processing control unit that judges whether to perform the matching phrase extraction processing, based on the number of matching places held in the matching place number holding unit 5 and on the matching phrase extraction processing required time prediction table 7. The matching phrase extraction processing required time prediction table 7 stores information for predicting, from the number of matching places, the time the matching phrase extraction processing will take. 8 is a threshold time holding unit for matching phrase extraction processing control; it holds the threshold time that the matching phrase extraction processing control unit 6 compares with the required time obtained from the prediction table 7 in order to decide whether to perform the matching phrase extraction processing. 9 is a matching phrase extraction processing unit that extracts, from the found character strings held in the search result holding unit 4, only those that truly match, as a phrase, the search word held in the search word holding unit 1. 10 is an extraction result holding unit that holds the phrases extracted by the matching phrase extraction processing unit 9.

[0008] FIG. 2 is a flowchart showing the processing procedure of the apparatus shown in FIG. 1. The operation of the embodiment is described below with reference to this figure.

[0009] First, in step S1, 0 is stored in the matching place number holding unit 5. The apparatus then waits until a search word is held in the search word holding unit 1 (step S2). Once a search word is held in the search word holding unit 1, the process proceeds to step S3. Step S3 checks whether all the search target documents in the search target document holding unit 2 have been searched; if they have, the process proceeds to step S8. If not, step S4 takes one sentence from the search target document holding unit 2. The process then proceeds to step S5, which searches the sentence taken in step S4 for the search word held in the search word holding unit 1. The search result is held in the search result holding unit 4. Next, step S6 checks whether step S5 found the search word in the sentence. If it did not, the process returns to step S3. If it did, step S7 increments the number held in the matching place number holding unit 5 by 1, and the process returns to step S3.
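
The loop of steps S1 to S7 can be sketched as follows. This is a minimal illustration, not the apparatus itself: the names `SearchState` and `run_search` are hypothetical, the documents are assumed to be already split into sentences, and the search in step S5 is assumed to be a plain substring test.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Hypothetical stand-ins for holding units 4 and 5 of FIG. 1."""
    results: list = field(default_factory=list)  # search result holding unit 4
    match_count: int = 0                         # matching place number holding unit 5


def run_search(sentences, search_word):
    """Steps S1 to S7: scan sentence by sentence and count matching sentences."""
    state = SearchState()            # S1: the counter starts at 0
    for sentence in sentences:       # S3/S4: take one sentence until none remain
        if search_word in sentence:  # S5/S6: plain character string match
            state.results.append(sentence)
            state.match_count += 1   # S7: increment the counter by 1
    return state


docs = ["the cat sat", "concatenate two strings", "a dog barked"]
state = run_search(docs, "cat")
```

In this example the substring test also accepts "concatenate", which is the kind of hit that the later matching phrase extraction processing is meant to filter out.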

[0010] In step S8, the time required for the matching phrase extraction processing is obtained from the number of matching places held in the matching place number holding unit 5 by referring to the matching phrase extraction processing required time prediction table 7. FIG. 3 shows an example of this table. To predict the required time from the table, it suffices to find the processing time that corresponds to the number of matching places. For example, when the number of matching places is 100, the required time is found to be 180 seconds. In the next step S9, the required time obtained in step S8 is compared with the threshold time held in the threshold time holding unit 8 for matching phrase extraction processing control. If the required time is greater than the threshold time, all processing ends at this point. If the required time is less than or equal to the threshold time, the process proceeds to step S10. In step S10, the matching phrase extraction processing is performed on the search results held in the search result holding unit 4, extracting only those in which the found word truly matches, as a word, the search word held in the search word holding unit 1. The extraction result is held in the extraction result holding unit 10.
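
Steps S8 and S9 might look like the sketch below. The contents of FIG. 3 are not reproduced in the text, so the table layout (rows of upper bound on match count and seconds) and every row except 100 matches giving 180 seconds are assumptions made for illustration. The names `PREDICTION_TABLE`, `predict_seconds` and `should_extract` are likewise hypothetical.

```python
from bisect import bisect_left

# Hypothetical prediction table: (upper bound on match count, seconds).
# Only the pair 100 -> 180 s comes from the text; the other rows are invented.
PREDICTION_TABLE = [(10, 20), (50, 90), (100, 180), (500, 900)]


def predict_seconds(match_count, table=PREDICTION_TABLE):
    """Step S8: look up the predicted extraction time for a match count."""
    bounds = [upper for upper, _ in table]
    # Counts above the last bound fall back to the last row (an assumption).
    index = min(bisect_left(bounds, match_count), len(table) - 1)
    return table[index][1]


def should_extract(match_count, threshold_seconds):
    """Step S9: extract only if the prediction does not exceed the threshold."""
    return predict_seconds(match_count) <= threshold_seconds
```

With a threshold of 180 seconds, `should_extract(100, 180)` is true and the process would go on to step S10; with a threshold of 120 seconds it is false and processing would end.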

[0011] (Other Embodiments) 1. In the above embodiment, when it is judged that the matching phrase extraction processing is not to be performed, all processing simply ends without it. However, guidance for the user may be displayed, such as a message stating that the matching phrase extraction processing will not be performed.

[0012] 2. The above embodiment was described for the case in which the apparatus holds the threshold time for matching phrase extraction processing control in advance, but the threshold time may be held in another way. For example, the user may set it in the holding unit.

[0013] 3. The above embodiment was described for the case in which the apparatus holds the matching phrase extraction processing required time prediction table in advance, but the table may be held in another way. For example, a table prepared by the user may be used.

[0014] 4. The above embodiment was described for the case in which the matching phrase extraction processing required time prediction table predicts from the number of matching places, but the prediction is not limited to this and may take other factors into account. For example, the process load of the computer in use, the congestion of the communication line, or the frequency of file access may be used.
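
One way such extra factors could enter the prediction is sketched below. The text does not say how the factors are combined, so the multiplicative model, the function name `adjusted_seconds`, and the convention that each factor is a value between 0.0 (idle) and 1.0 (saturated) are all assumptions.

```python
def adjusted_seconds(base_seconds, cpu_load=0.0, line_congestion=0.0):
    """Scale a table-based prediction by load factors in the range 0.0 to 1.0.

    The multiplicative model is an assumption made for this sketch.
    """
    for factor in (cpu_load, line_congestion):
        if not 0.0 <= factor <= 1.0:
            raise ValueError("load factors must lie between 0.0 and 1.0")
    return base_seconds * (1.0 + cpu_load) * (1.0 + line_congestion)
```

Under this model a prediction of 180 seconds on a half-loaded machine becomes 270 seconds, which may push it past a threshold that an idle machine would have met.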

[0015] 5. The above embodiment was described for the case in which the time required for the matching phrase extraction processing is predicted using the required time prediction table, but another means of prediction may be used. For example, a relational expression between the number of matching places and the required time may be used.
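
A relational expression of this kind could be as simple as a linear function. The sketch below assumes a linear form and a rate of 1.8 seconds per match, chosen only so that 100 matches give the 180 seconds quoted for the table example; neither the form nor the coefficients come from the text, and `predict_seconds_linear` is a hypothetical name.

```python
def predict_seconds_linear(match_count, seconds_per_match=1.8, overhead_seconds=0.0):
    """Predicted extraction time as a linear function of the match count.

    Both coefficients are illustrative assumptions, not values from the patent.
    """
    if match_count < 0:
        raise ValueError("match_count must be non-negative")
    return overhead_seconds + seconds_per_match * match_count
```

Compared with a table, a formula needs no choice of row boundaries and gives a prediction for any match count, at the cost of assuming a particular functional form.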

[0016] 6. The above embodiment does not mention displaying the required time once it has been predicted using the matching phrase extraction processing required time prediction table. However, a timer may be provided during the matching phrase extraction processing to display the required processing time, the start time, the expected end time, the remaining time, and so on.
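
The values such a display would need can be derived from the start time and the predicted duration. The following is a sketch only: `progress_report` is a hypothetical helper, and the current time is passed in explicitly so that the example is deterministic.

```python
from datetime import datetime, timedelta


def progress_report(start, predicted_seconds, now):
    """Return the times a display could show while extraction is running."""
    expected_end = start + timedelta(seconds=predicted_seconds)
    # Never report a negative remaining time if the prediction was too short.
    remaining = max(expected_end - now, timedelta(0))
    return {
        "required": timedelta(seconds=predicted_seconds),
        "start": start,
        "expected_end": expected_end,
        "remaining": remaining,
    }


start = datetime(1991, 8, 20, 9, 0, 0)
report = progress_report(start, 180, now=start + timedelta(seconds=60))
```

In a running system the `now` argument would come from a clock such as `datetime.now()` each time the display is refreshed.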

[0017] 7. The above embodiment was described with step S1 processed before step S2, but the order may be reversed.

[0018] 8. The above embodiment was described with the search processing and the counting of matching places performed one sentence at a time, but the unit may be several sentences, or all the documents may be searched at once. For example, the unit may be every 10 sentences.
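
Processing in units of several sentences could be sketched as below. The function name `count_matches_in_batches` and the assumption that the counter is updated once per batch are illustrative; the default of 10 follows the example in the text.

```python
def count_matches_in_batches(sentences, search_word, batch_size=10):
    """Search batch_size sentences at a time and update the count per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    match_count = 0
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # One counter update per batch instead of one per sentence.
        match_count += sum(1 for sentence in batch if search_word in sentence)
    return match_count
```

Setting `batch_size=1` reproduces the sentence-by-sentence behaviour of the main embodiment, and a `batch_size` at least as large as the input searches everything in one pass.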

[0019]

[Effect of the Invention] As described above, according to the present invention, processing can be simplified when the matching phrase extraction processing would take a long time, and search processing can be performed efficiently.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of an apparatus according to an embodiment of the present invention.

[FIG. 2] A flowchart showing the processing procedure according to an embodiment of the present invention.

[FIG. 3] A diagram showing a matching phrase extraction processing required time prediction table, for explaining an embodiment of the present invention.

[Explanation of symbols]

1 Search word holding unit
2 Search target document holding unit
3 Search processing unit
4 Search result holding unit
5 Matching place number holding unit
6 Matching phrase extraction processing control unit
7 Matching phrase extraction processing required time prediction table
8 Threshold time holding unit for matching phrase extraction processing control
9 Matching phrase extraction processing unit
10 Extraction result holding unit

Continuation of front page: (72) Inventor: Minoru Fujita, c/o Canon Inc., 3-30-2 Shimomaruko, Ota-ku, Tokyo

Claims

[Claims]

1. A document processing apparatus comprising: search processing means for searching a search target document for character strings that match a search word; search result holding means for holding the character strings found by the search processing means; extraction means for extracting, from the character strings held in the search result holding means, those that match the search word as a phrase; matching place number holding means for holding the number of places in the search target document that matched the search word in the search processing means; prediction means for predicting, from the number of matching places obtained from the matching place number holding means, the time required for the extraction processing by the extraction means; and means for deciding whether to perform the extraction processing by the extraction means on the search results held in the search result holding means, by comparing the required time predicted by the prediction means with a predetermined threshold value.