JPH05204993A

JPH05204993A - Retrieving device

Info

Publication number: JPH05204993A
Application number: JP4010353A
Authority: JP
Inventors: Katsumi Murai; 克己村井; Kenji Hashimoto; 賢治橋本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-01-23
Filing date: 1992-01-23
Publication date: 1993-08-13

Abstract

PURPOSE: To reduce the size of the summary file used for the pre-search that precedes a full-text search, and to improve retrieval efficiency, in a retrieval device based on a full-text search method that can extract requested document data from a secondary storage device holding a large amount of document data without using index information. CONSTITUTION: The size of the summary file 10 is reduced by arranging the independent word candidates extracted without duplication from the text file 11 and storing them contiguously in a file, grouped by the number of characters in each independent word. The search range is then limited using the information on the boundary positions between these groups, so that the full-text search is performed with high efficiency.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a retrieval device based on a full-text search method, capable of retrieving requested document data from a secondary storage device that stores a large amount of document data.

[0002]

[Prior Art] In recent years, with the spread of word processors and personal computers, large amounts of document data have come to be distributed and used in workplaces and homes. To organize and make effective use of this document data, large-capacity databases and high-speed retrieval devices have been researched and developed.

[0003] However, conventional search machines require index (keyword) information to be attached for searching, and as the amount of data grows, this indexing work has come to require a great deal of labor. In contrast, retrieval devices based on a full-text search method have been announced that need no indexing work and can accurately and quickly find desired document data in a large amount of document data without index information. For example, the logic-per-track disk proposed by Slotnick (Slotnick, D. L.) in 1970 attempts to speed up searching by attaching a dedicated search processor to each head of a disk, which is a kind of secondary storage device, and transferring to the host computer only the information that satisfies the search conditions; concrete devices such as RAP at the University of Toronto have been realized. Meanwhile, the text search machine for full-text search (IEICE Technical Report, Data Engineering 89-38) attempts to speed up searching by using multiple secondary storage devices and a hierarchical pre-search method that uses two kinds of summary files, a character component table and a condensed text.

[0004]

[Problems to be Solved by the Invention] However, with the logic-per-track disk, performing a full-text search requires searching the entire disk, which is a kind of secondary storage device. The time needed for data input/output to and from the secondary storage device therefore increases, and it becomes difficult to speed up the search.

[0005] The text search machine for full-text search, for its part, uses multiple secondary storage devices and a pre-search method based on a summary file that condenses the full text, with the aim of reducing the time needed for data input/output to and from the secondary storage device. However, the summary file inevitably becomes large.

[0006] In view of these problems of conventional retrieval devices, an object of the present invention is to provide a retrieval device whose summary file is smaller than that of the text search machine for full-text search and can be used with high efficiency during a search.

[0007]

[Means for Solving the Problems] The invention of claim 1 is a retrieval device that pre-searches a previously created summary file in response to a search request and uses the result to obtain the corresponding text file, wherein the summary file is composed of (a) an independent word candidate group, obtained by extracting independent word candidates from the corresponding text file based on Japanese grammar rules and grouping and classifying the resulting candidates by the number of characters in each independent word, and (b) boundary position information for each character count.

[0008] The invention of claim 2 is a retrieval device that pre-searches a previously created summary file in response to a search request and uses the result to obtain a text file, wherein the summary file is composed of (a) an independent word candidate group, obtained by extracting independent word candidates from the text file based on Japanese grammar rules, deleting any candidate that is wholly contained within another candidate of the group, and grouping and classifying the remaining candidates by the number of characters in each independent word, and (b) boundary position information for each character count.

[0009]

[Operation] According to the present invention, as described above, the summary file that condenses the text file consists of an independent word candidate group, in which the candidates extracted from the full text are recorded grouped by character count, together with the boundary position information obtained when the candidates are grouped by character count. Delimiters between independent words therefore become unnecessary, so the recording capacity of the summary file can be reduced, and an efficient search based on the boundary position information can be performed. An effective full-text search is thus possible.

[0010]

[Embodiments] Embodiments of the present invention will be described below with reference to the drawings.

[0011] FIG. 1 is a configuration diagram of an embodiment of the retrieval device of the present invention.

[0012] In FIG. 1, the secondary storage device 1 is a means for storing a large amount of document data. The recording medium 2, for example an optical disk, records the text files of the document data and the summary file used at search time. The character string search circuit 3 performs the search operation. The data memory circuit 4 temporarily stores data. The control circuit 5 controls the recording medium 2, the data memory circuit 4, and the character string search circuit 3 in response to a search request from the host computer. The host computer 6 accepts search requests from the user and outputs search results to the user; it sends search requests to the secondary storage device 1 and receives search results from it. Reference numeral 7 denotes a search request sent from the host computer 6 to the secondary storage device 1, and 8 denotes a search result sent from the secondary storage device 1 to the host computer 6.

[0013] Next, the search operation of the retrieval device configured as above is described with reference to FIG. 2. In step 1, a search request entered by the user into the host computer 6 is sent to the secondary storage device 1. In step 2, the control circuit 5 causes the character string search circuit 3 to search the summary file recorded on the recording medium 2 based on the received search request. In step 3, the control circuit 5 causes the character string search circuit 3 to search the text files based on the result of that pre-search. In step 4, the search result is sent from the secondary storage device 1 to the host computer 6 and displayed to the user.
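The two-stage flow of steps 2 and 3 can be sketched in software as follows. This is only an illustration under stated assumptions: each document is assumed to carry its own summary (here simply a set of candidate words), the query is assumed to be a single independent word, and the names `search` and `documents` are hypothetical rather than taken from the patent.

```python
def search(documents, query):
    """Two-stage search sketch.

    documents: dict mapping a document name to (summary_words, text), where
    summary_words is a set of independent word candidates (assumed layout).
    Returns the names of documents whose text contains the query.
    """
    # Step 2 (pre-search): look only at the small summaries and shortlist
    # documents whose summary holds a candidate containing the query.
    shortlisted = [
        name
        for name, (summary_words, _text) in documents.items()
        if any(query in word for word in summary_words)
    ]
    # Step 3 (main search): scan the full text of shortlisted documents only.
    return [name for name in shortlisted if query in documents[name][1]]


if __name__ == "__main__":
    docs = {
        "a": ({"瀬戸大橋", "海"}, "瀬戸大橋を渡り、海を見る。"),
        "b": ({"山", "川"}, "山と川。"),
    }
    print(search(docs, "瀬戸"))
```

The point of the design is that the text of document "b" is never read for this query, because its summary already rules it out.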

[0014] FIG. 3 shows the structure of the data files recorded on the recording medium 2. In FIG. 3, the data file 9 is the information recorded on the recording medium 2; the summary file 10 is the file used for the pre-search and consists of a boundary position information file and an independent word candidate group file; and the text file 11 is the file that contains the document data as is. The capacity ratio of the summary file 10 to the text file 11 varies with the content of the document data. In the present invention it is about 1:4 for newspaper data and about 1:10 for special documents, such as patent documents, in which the same independent words recur frequently in the text.

[0015] How the summary file 10 is created is described with reference to FIG. 4. In step 1, independent word candidates are extracted from the full-text file 11 based on Japanese grammar rules. One conceivable extraction method is to take out only kanji and katakana, exploiting the fact that Japanese has almost no independent words composed of hiragana. In step 2, independent words that are duplicated within a fixed document range of the full-text file are deleted from the extracted candidates to obtain the first-stage independent word candidate group. In step 3, the independent words in the first-stage group are compared with one another, and any independent word wholly contained in another is deleted to obtain the second-stage independent word candidate group. In step 4, the second-stage group is grouped, classified, and arranged by the number of characters in each independent word to obtain the third-stage independent word candidate group. At this point, the maximum character count of the independent words and the boundaries between the per-character-count groups become known, and this information is taken as the boundary position information. The third-stage independent word candidate group together with the boundary position information forms the summary file 10.
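The four steps above can be sketched as a short Python function. This is a minimal sketch under assumptions, not the patented implementation: candidates are approximated as runs of kanji or katakana (the heuristic the text mentions), the Unicode ranges in `CANDIDATE_RE` are illustrative, boundaries are expressed as character offsets rather than storage addresses, and the names `build_summary` and `drop_contained` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a run of kanji (U+4E00..U+9FFF) or katakana (U+30A0..U+30FF)
# stands in for one independent word candidate.
CANDIDATE_RE = re.compile(r"[\u4e00-\u9fff\u30a0-\u30ff]+")


def build_summary(text, drop_contained=True):
    """Return (blob, max_len, boundaries) for one document range.

    blob       : candidates stored back to back with no delimiters
    max_len    : N, the largest character count among the candidates
    boundaries : boundaries[k-1] is the end offset of the k-character group
    """
    # Step 1: extract independent word candidates.
    candidates = CANDIDATE_RE.findall(text)
    # Step 2: delete duplicates (first-stage group).
    unique = set(candidates)
    # Step 3: delete candidates wholly contained in another (second-stage group).
    if drop_contained:
        kept = {w for w in unique if not any(w != o and w in o for o in unique)}
    else:
        kept = unique
    # Step 4: arrange by character count (third-stage group).
    ordered = sorted(kept, key=lambda w: (len(w), w))
    max_len = max((len(w) for w in ordered), default=0)
    counts = Counter(len(w) for w in ordered)
    boundaries, end = [], 0
    for k in range(1, max_len + 1):
        end += k * counts[k]  # an empty group repeats the previous end offset
        boundaries.append(end)
    return "".join(ordered), max_len, boundaries


if __name__ == "__main__":
    blob, n, ends = build_summary("瀬戸大橋を渡り、瀬戸物を買う。瀬戸の海とテレビ。")
    print(blob, n, ends)
```

In this sample, "瀬戸" is removed in step 3 because it is contained in "瀬戸大橋", so the two-character group is empty and its end offset equals that of the one-character group.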

[0016] The summary file 10 is now described more concretely. FIG. 5 shows the structure of the summary file 10 used in the pre-search that precedes the full-text search.

[0017] In FIG. 5, 12 is a schematic diagram of the file composed of the third-stage independent word candidate group, which is obtained by grouping, arranging, and storing by character count the second-stage independent word candidate group extracted from the text data held in the secondary storage device. 13 is an example of the group of one-character independent word candidates in the file, and 14 is the end address of the position where the one-character candidates are stored. 15 is an example of the group of two-character candidates, and 16 is the end address of the position where the two-character candidates are stored. 17 is an example of the group of three-character candidates, and 18 is the end address of the position where the three-character candidates are stored. The independent word candidate group file as a whole consists of the structure of 13 and 14, 15 and 16, and 17 and 18 repeated up to N, the maximum character count in the third-stage independent word candidate group extracted from the text data.

[0018] 19 is a schematic diagram of the boundary position information file. This file consists of the boundary position information, that is, the boundary addresses at which the per-character-count groups obtained when the second-stage independent word candidate group is grouped and arranged by character count are stored in the independent word candidate group file, and the maximum character count N in the third-stage independent word candidate group. 20 stores the maximum character count N. 21 stores the end address of the position where the one-character candidates are stored in the independent word candidate group file, 22 stores the end address for the two-character candidates, 23 stores the end address for the three-character candidates, and 24 stores the end address for the candidates of the maximum character count N. As described above, the summary file 10 consists of two files, the independent word candidate group file 12 and the boundary position information file 19.
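One possible byte layout for the boundary position information file (items 20 to 24) is sketched below. The patent does not specify field widths or byte order, so the little-endian 32-bit unsigned integers used here are an assumption, as are the function names `pack_boundary_file`, `unpack_boundary_file`, and `group_range`.

```python
import struct


def pack_boundary_file(end_addresses):
    """Serialize [N, end_1, ..., end_N] as little-endian 32-bit unsigned ints.

    end_addresses[k-1] is the end address of the k-character group;
    N (item 20 in FIG. 5) is the number of groups.
    """
    n = len(end_addresses)
    return struct.pack(f"<{n + 1}I", n, *end_addresses)


def unpack_boundary_file(data):
    """Inverse of pack_boundary_file: return the list of end addresses."""
    (n,) = struct.unpack_from("<I", data, 0)
    return list(struct.unpack_from(f"<{n}I", data, 4))


def group_range(end_addresses, k):
    """Return (start, end) of the k-character group, or (0, 0) if k is out of range.

    The start of group k is the end of group k-1 (0 for the first group).
    """
    if not 1 <= k <= len(end_addresses):
        return (0, 0)
    start = end_addresses[k - 2] if k >= 2 else 0
    return (start, end_addresses[k - 1])


if __name__ == "__main__":
    data = pack_boundary_file([3, 3, 9, 13])
    print(len(data), unpack_boundary_file(data), group_range([3, 3, 9, 13], 3))
```

Because only N + 1 integers are stored regardless of how many candidates the file holds, the overhead of this file stays small compared with a delimiter after every word.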

[0019] In this embodiment, the summary file 10 is divided into two files, the independent word candidate group file 12 and the boundary position information file 19. However, the boundary position information file 19 may be attached before or after the independent word candidate group file 12 to form a single file, or the per-character-count candidate groups may be split into multiple files, one for each character count.

[0020] In this embodiment, the independent word candidate group file is arranged and stored in ascending order of character count, but it may instead be arranged and stored in descending order.

[0021] The independent word candidate group file 12 is constructed as follows. From the independent word candidates extracted from the full-text file based on Japanese grammar rules, candidates are selected so that none is duplicated; the selected candidates are classified, grouped, and sorted by character count; and the sorted per-character-count groups are stored contiguously without any delimiter between independent words. Because the groups are stored contiguously in this way, no delimiter information such as spaces between independent words, nor any other extra information, is included, so the summary file can be made smaller than with the conventional method.

[0022] In addition, the end storage address of each per-character-count candidate group, obtained when the independent word candidate group is sorted, is stored as the boundary position information file. Using this boundary position information, if for example a three-character word is given as the search request, the requested word need only be matched against the independent word candidate group file 12 between the two-character end address and the three-character end address. Needless matching is thereby eliminated, and the pre-search can be performed more efficiently.
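The range-restricted matching described here can be sketched as follows, assuming the candidate file is held as a Python string `blob` and the end addresses as a list `boundaries` of character offsets (both representations, and the name `presearch_exact`, are assumptions made for illustration).

```python
def presearch_exact(blob, boundaries, query):
    """Check whether query is stored in the group matching its character count.

    blob       : candidates stored back to back, grouped by character count
    boundaries : boundaries[k-1] is the end offset of the k-character group
    """
    k = len(query)
    if k == 0 or k > len(boundaries):
        return False  # no group holds words of this length
    start = boundaries[k - 2] if k >= 2 else 0
    end = boundaries[k - 1]
    # Every word in this group is exactly k characters long, so stepping
    # in strides of k visits each stored word once and never straddles two.
    for pos in range(start, end, k):
        if blob[pos:pos + k] == query:
            return True
    return False


if __name__ == "__main__":
    blob = "海渡買テレビ瀬戸物瀬戸大橋"
    boundaries = [3, 3, 9, 13]
    print(presearch_exact(blob, boundaries, "瀬戸物"))
```

For a three-character query only offsets 3 to 9 of the sample are examined; the one-character and four-character groups are never touched.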

[0023] In this embodiment, search efficiency is improved by using the boundary position information file to match only the specific range of the independent word candidate group file that corresponds to the character count of the search request. However, it is also possible to match the range of the file that holds candidates whose character count is equal to or greater than that of the search request. In that case, if the search request is, for example, "瀬戸" (Seto), compound words such as "瀬戸物" (Setomono) and "瀬戸大橋" (Seto Ohashi) can also be handled, and "瀬戸" can be deleted from the independent word candidate group file as being covered by "瀬戸物" or "瀬戸大橋", which reduces the capacity. Here, the boundary position information can be used to determine whether the position at which the search word is found in the independent word candidate group file straddles a group boundary or the boundary between two independent words.
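The compound-word variant, including the straddling check, might look like the sketch below. As before, `blob` as a string, `boundaries` as character offsets, and the name `presearch_compound` are illustrative assumptions; the patent describes the check only in prose.

```python
from bisect import bisect_right


def presearch_compound(blob, boundaries, query):
    """Return stored candidates of length >= len(query) that contain query.

    A raw substring hit is accepted only if it lies inside a single stored
    word; hits that straddle two words or two groups are rejected.
    One entry is returned per accepted match position.
    """
    k = len(query)
    if k == 0 or k > len(boundaries):
        return []
    # Skip all groups whose words are shorter than the query.
    start = boundaries[k - 2] if k >= 2 else 0
    hits = []
    pos = blob.find(query, start)
    while pos != -1:
        # Word length m of the group that contains offset pos.
        m = bisect_right(boundaries, pos) + 1
        group_start = boundaries[m - 2] if m >= 2 else 0
        # Words in the group start at group_start, group_start + m, ...
        word_start = pos - (pos - group_start) % m
        if pos + k <= word_start + m:
            hits.append(blob[word_start:word_start + m])
        pos = blob.find(query, pos + 1)
    return hits


if __name__ == "__main__":
    blob = "海渡買テレビ瀬戸物瀬戸大橋"
    boundaries = [3, 3, 9, 13]
    print(presearch_compound(blob, boundaries, "瀬戸"))
```

In the sample, the query "物瀬" does occur as a raw substring (the end of "瀬戸物" followed by the start of "瀬戸大橋"), but the alignment check rejects it because the hit crosses a word boundary.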

[0024] The present invention may be implemented in software using a computer, or with dedicated hardware circuits.

[0025]

[Effects of the Invention] As described in detail above, the present invention provides the following effects.

[0026] (1) The independent word candidate group file, which is the summary file used for the pre-search, is constructed by extracting independent word candidates from the text file, grouping and arranging them by the number of characters in each independent word, and storing the resulting per-character-count groups contiguously. Delimiter symbols such as spaces, which are needed between independent words in a conventional summary file, therefore become unnecessary, and the summary file can be created with a small capacity.

[0027] If the search region is not restricted to a single character-count group, the summary file can be created with an even smaller capacity by consolidating compound words.

[0028] (2) To locate the search position in the independent word candidate group file at the pre-search stage, the boundary position information file obtained when the independent word candidate group file is created is used. Only the specific region of the file that corresponds to the character count of the word given as the search request needs to be searched, so the pre-search can be performed efficiently.

[Brief description of drawings]

FIG. 1 is a configuration diagram of the retrieval device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the search procedure in the retrieval device.

FIG. 3 is a configuration diagram of the files in the retrieval device.

FIG. 4 is a flowchart showing the procedure for creating the summary file in the retrieval device.

FIG. 5 is a schematic diagram of the summary file in the retrieval device.

[Explanation of symbols]

1 Secondary storage device
2 Recording medium (for example, an optical disk)
3 Character string search circuit
4 Data memory circuit
5 Control circuit
6 Host computer
7 Search request
8 Search result
9 File structure recorded in the secondary storage device
10 Summary file
11 Text file
12 Independent word candidate group file
13 Example of the one-character independent word candidate group
14 File storage end address of the one-character independent word candidate group
15 Example of the two-character independent word candidate group
16 File storage end address of the two-character independent word candidate group
17 Example of the three-character independent word candidate group
18 File storage end address of the three-character independent word candidate group
19 Boundary position information file
20 Maximum independent word character count information
21 End address of the one-character independent word candidate group
22 End address of the two-character independent word candidate group
23 End address of the three-character independent word candidate group
24 End address of the maximum-character-count independent word candidate group

Claims

[Claims]

1. A retrieval device that pre-searches a previously created summary file in response to a search request and uses the result to obtain a corresponding text file, characterized in that the summary file is composed of an independent word candidate group, obtained by extracting independent word candidates from the corresponding text file based on Japanese grammar rules and grouping and classifying the resulting candidates by the number of characters in each independent word, and boundary position information for each character count.

2. A retrieval device that pre-searches a previously created summary file in response to a search request and uses the result to obtain a text file, characterized in that the summary file is composed of an independent word candidate group, obtained by extracting independent word candidates from the text file based on Japanese grammar rules, deleting any candidate that is wholly contained within another candidate of the group, and grouping and classifying the remaining candidates by the number of characters in each independent word, and boundary position information for each character count.