JP3415214B2 - Document search device

Info

Publication number: JP3415214B2
Application number: JP22436393A
Authority: JP
Inventors: 康雄田野崎; 紀子小山
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-09-09
Filing date: 1993-09-09
Publication date: 2003-06-09
Anticipated expiration: 2018-06-09
Also published as: JPH0785033A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document search device for searching documents containing text data quickly and accurately.

[0002]

[Prior Art] Conventionally, when documents containing text data are registered in a database, words characteristic of each document are assigned as keywords and registered together with the document data, so that a desired document can be identified easily at search time. To retrieve a required document from the large number of documents registered in the database, one or more words are given as search keywords, and the documents to which matching keywords have been assigned are extracted as candidates.

[0003] With this method the search itself is comparatively fast, but assigning keywords to every document to be registered is laborious. Moreover, it is difficult to assign appropriate keywords, so a search may fail to return the document the searcher is looking for.

[0004] In recent years, by contrast, full-text search has been coming into practical use as computers have become faster and storage devices larger. In this method every character string (text data) contained in a document is subject to the search, so keywords need not be assigned to each document in advance, and documents containing the word or phrase (keyword) specified by the searcher can be extracted without omission.

[0005] With full-text search, however, the entire contents of the text data must be read at search time, so processing time grows with the amount of registered document data. When the number of registered documents becomes large, searching for a specified character string takes an enormous amount of processing time even with the processing power of current computers, and the method becomes impractical.

[0006] Furthermore, a typical full-text search method has difficulty handling the concept of a word, and retrieves every document that contains the input keyword as a substring. For example, a search for documents containing the word 「マイク」 ("microphone") also retrieves documents containing 「マイクロコンピュータ」 ("microcomputer"), a word unrelated to 「マイク」. This happens because the character string 「マイクロコンピュータ」 contains the substring 「マイク」, and the match is made against that substring.
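The false hit described in paragraph [0006] is easy to reproduce. The following is a minimal sketch of a naive substring search; the two sample documents and the function name `naive_full_text_search` are illustrative assumptions, not part of the patent.

```python
# Naive full-text search: a plain substring test with no notion of word boundaries.
# The sample documents below are invented for illustration.
documents = {
    0: "マイクの音量を調整する",          # contains the word マイク (microphone)
    1: "マイクロコンピュータを設計する",  # contains only マイクロコンピュータ (microcomputer)
}


def naive_full_text_search(keyword, docs):
    """Return the numbers of all documents that contain `keyword` as a substring."""
    return [doc_id for doc_id, text in docs.items() if keyword in text]


hits = naive_full_text_search("マイク", documents)
print(hits)
```

Both documents are returned, even though document 1 never uses 「マイク」 as a word.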

[0007]

[Problems to Be Solved by the Invention] As described above, a conventional document search device that uses full-text search has the advantages that it always finds the documents containing the character string specified by the searcher and that no keywords need to be assigned. On the other hand, processing time increases with the amount of data, and search performance degrades markedly when the amount of data becomes large.

[0008] In addition, because full-text search does not distinguish a word, which is a single unit of meaning, from a mere character string, it retrieves documents containing character strings that merely include the target word as an internal substring. The word the searcher intended is therefore not always retrieved correctly.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide a document search device that can retrieve, quickly and accurately, the appropriate documents containing a specified word from among a large number of documents.

[0010]

[Means for Solving the Problems] The present invention is a document search device that searches a plurality of documents containing text data made up of character code strings for candidate documents containing a keyword consisting of an arbitrarily given character string. The device comprises: table management file storage means for storing table management files, each of which manages, per character code and for all documents, character appearance position tables that store, for each character code contained in a document, the position at which the character appears in the document and information indicating its position within the word that contains it; keyword input means for inputting a character string serving as a keyword; table management file selection means for selecting, from the table management file storage means, the table management files corresponding to the individual characters of the keyword input by the keyword input means; character string search means for determining, on the basis of the character appearance position tables of the table management files selected by the table management file selection means, whether the text data of a given document contains the keyword input by the keyword input means; and search result display means for displaying the contents of a candidate document determined by the character string search means to contain the keyword.

[0011]

[Operation] With this configuration, the character string search means uses the appearance position information and the position-in-word information stored in the character appearance position tables, so that whether the character string specified as the keyword exists in each document's data can be determined at high speed. Furthermore, by using word existence determination means, a word, which is a single unit of meaning, can be distinguished from a mere character string during a search. This prevents a document from being retrieved as a candidate merely because it contains a character string that has the keyword as an internal substring.
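One way the two kinds of stored information could be combined is sketched below. This is not the procedure of the patent's flowcharts; it is an illustrative assumption that a keyword counts as a word when its characters occur at consecutive positions, the first character starts a word, and the last character ends one. The position-in-word codes follow the definition given in paragraph [0037] (1 = first character, 2 = last, 3 = both, 0 = neither). Positions are assumed to be character indices, and the function name and sample tables are hypothetical.

```python
# Position-in-word codes as defined in paragraph [0037].
STARTS_WORD = {1, 3}
ENDS_WORD = {2, 3}


def keyword_is_word_in_document(keyword, tables):
    """`tables` maps a character to a list of (position, code) pairs for ONE document."""
    if not keyword:
        return False
    per_char = []
    for ch in keyword:
        if ch not in tables:
            return False          # some keyword character never occurs in this document
        per_char.append(dict(tables[ch]))
    last = len(keyword) - 1
    for start, code in per_char[0].items():
        if code not in STARTS_WORD:
            continue              # the match would begin in the middle of a word
        # Every later keyword character must sit at the next consecutive position.
        if all(start + k in per_char[k] for k in range(1, len(keyword))):
            if per_char[last][start + last] in ENDS_WORD:
                return True
    return False


# Invented sample tables: "マイク/を/使う" and "マイクロコンピュータ/を/使う".
mike_doc = {"マ": [(0, 1)], "イ": [(1, 0)], "ク": [(2, 2)]}
micro_doc = {"マ": [(0, 1)], "イ": [(1, 0)], "ク": [(2, 0)]}

print(keyword_is_word_in_document("マイク", mike_doc))
print(keyword_is_word_in_document("マイク", micro_doc))
```

Only the table lookups are needed here; the document text itself is never scanned, which is the source of the speed-up the paragraph describes.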

[0012]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of the document search device according to this embodiment. As shown in FIG. 1, the document search device comprises a control device 10, an input device 11, a display device 12, a memory device 13, and an external storage device 14.

[0013] The control device 10 comprises a CPU, memory, and the like, controls the document search device as a whole, and has the functional configuration shown in FIG. 2. The control device 10 uses, for example, UNIX as its OS (operating system), serving as the main control unit 21, and operates according to programs stored in the memory device 13.

[0014] The input device 11 is used to input commands to the device, keywords, and the like, and comprises a keyboard, a mouse, and so on. The display device 12 displays the prompt messages that the device gives to the user, search results, and the like, and comprises, for example, a VRAM, a CRT, and a controller.

[0015] The memory device 13 comprises RAM and ROM and stores the programs that control the CPU of the control device 10, as well as various data. A work area for the various kinds of processing is also reserved in it.

[0016] The external storage device 14 is, for example, a magnetic disk device, and stores the character appearance position tables and the various information used to manage them. It also stores the text data, chart data, and so on needed to display search results. Data is managed as files, which can be accessed randomly within the file system.

[0017] Next, the functional configuration realized by the control device 10 and the memory device 13 (on the basis of the programs stored in it) is described in detail. As shown in FIG. 2, the functional configuration consists of a processing unit 20 and a work area unit 30.

[0018] The processing unit 20 comprises a main control unit 21, an initialization unit 22, a keyword input unit 23, a character appearance position table selection unit 24, a character string search unit 25, a document selection unit 26, and a search result display unit 27. The work area unit 30 comprises a keyword character count storage buffer 31, a keyword storage buffer 32, a table management file storage buffer 33, a candidate document count storage buffer 34, a candidate document number storage buffer 35, and a display document number storage buffer 36.

[0019] The main control unit 21 controls the processing of the device as a whole, and instructs each functional unit to display prompt messages to the user, to branch the processing, and so on.

[0020] When a database search is executed, the initialization unit 22 initializes each hardware device required for the search and each buffer of the work area unit 30. The keyword input unit 23 accepts from the input device 11 the character string that serves as the search keyword for retrieving a desired document from the large number of documents in the database stored in the external storage device 14. The keyword input unit 23 stores the input keyword in the keyword storage buffer 32 and the number of characters in the keyword in the keyword character count storage buffer 31.

[0021] In accordance with the keyword that was input through the keyword input unit 23 and stored in the keyword storage buffer 32, the character appearance position table selection unit 24 selects the table management file corresponding to each character of the keyword from the table management file storage buffer 34 of the work area unit 30, and stores it in the table management file storage buffer 33.

[0022] The character string search unit 25 refers to the contents of the table management files stored in the table management file storage buffer 33 and searches the database stored in the external storage device 14 for candidate documents that contain the keyword word stored in the keyword storage buffer 32. It stores the numbers of the matching documents in the candidate document number storage buffer 35 and the number of candidate documents in the candidate document count storage buffer 34.

[0023] The document selection unit 26 displays a list of the candidate documents indicated by the candidate document numbers that the character string search unit 25 found and stored in the candidate document number storage buffer 35. When there are multiple candidates, it has the user select the document being sought, and stores the display document number indicating the selected document in the display document number storage buffer 36.

[0024] Of the candidate documents found by the character string search unit 25, the search result display unit 27 displays on the screen of the display device 12 the contents of the document that was specified by the document selection unit 26, that is, the document indicated by the display document number stored in the display document number storage buffer 36.

[0025] The keyword character count storage buffer 31 stores the number of characters in the keyword input from the input device 11 through the keyword input unit 23. The keyword storage buffer 32 stores the input keyword (character string) as an array (character buffers 0, 1, ...) as shown in FIG. 3.

[0026] The table management file storage buffer 33 stores the table management files that manage the character appearance position tables, and is an array of structures as shown in FIG. 4(a). That is, table management file storage structures 0, 1, 2, ... are arranged in sequence.

[0027] Each table management file storage structure is organized as shown in FIG. 4(b). One structure contains a file size storage area that holds the size of the file; a table count storage area that holds the number of character appearance position tables stored in the file; character appearance position table storage areas that hold the bodies of the character appearance position tables (0 to N-1); pointer storage areas that hold a pointer to each character appearance position table (0 to N-1); and document number storage areas that hold the document number (0 to N-1) corresponding to each pointer storage area.

[0028] The table management file storage structure also has areas for the variables needed by the various processes during a character string search. These are a reference table number storage area, which records which of the internal character appearance position tables is currently being referenced, and an appearance position storage area, which records which appearance position entry within that character appearance position table is currently being referenced.

[0029] As shown in FIG. 5(b), each character appearance position table storage area contains a total appearance count storage area that holds the total number of appearances, and appearance position data storage areas that hold the appearance position data (0, 1, ...).

[0030] In the rest of this embodiment, the i-th structure in the table management file storage buffer 33 is written TMF[i]. The file size stored in its file size storage area is written TMF[i].fsize; the number of character appearance position tables contained in the structure, stored in the table count storage area, is TMF[i].nCLT; the reference table number in the reference table number storage area is TMF[i].iCLT; and the appearance position storage area is TMF[i].iLOC. The j-th character appearance position table is written TMF[i].CLT[j], and the document number corresponding to the j-th character appearance position table is TMF[i].ID[j].

[0031] Further, the total number of appearances stored in the total appearance count storage area of the j-th character appearance position table storage area is written TMF[i].CLT[j].N, the k-th appearance position in that character appearance position table is TMF[i].CLT[j].LOC[k], and the corresponding position-in-word information is TMF[i].CLT[j].PIW[k].
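The notation of paragraphs [0030] and [0031] maps naturally onto nested records. The following is a minimal sketch using Python dataclasses; the field names are taken from the text, while the class names, the use of Python lists in place of the pointer storage areas, and the sample values (document number 12, the codes 1 and 0) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CharLocationTable:
    """One character appearance position table, TMF[i].CLT[j]."""
    N: int = 0                                      # total number of appearances
    LOC: List[int] = field(default_factory=list)    # LOC[k]: k-th appearance position
    PIW: List[int] = field(default_factory=list)    # PIW[k]: position-in-word code


@dataclass
class TableManagementFile:
    """One table management file storage structure, TMF[i]."""
    fsize: int = 0                                  # file size
    nCLT: int = 0                                   # number of tables in this file
    ID: List[int] = field(default_factory=list)     # ID[j]: document number of CLT[j]
    CLT: List[CharLocationTable] = field(default_factory=list)
    iCLT: int = 0                                   # table currently referenced
    iLOC: int = 0                                   # position entry currently referenced


# Hypothetical contents: one character that occurs twice in document 12.
tmf = TableManagementFile(
    nCLT=1,
    ID=[12],
    CLT=[CharLocationTable(N=2, LOC=[56, 123], PIW=[1, 0])],
)
print(tmf.CLT[0].LOC[1], tmf.ID[tmf.iCLT])
```

In this sketch the list index j plays the role of the pointer storage area, so no explicit pointers are kept.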

[0032] The candidate document count storage buffer 34 stores the number of candidate documents obtained by the search performed by the character string search unit 25. The candidate document number storage buffer 35 stores, in order, the ID numbers of the candidate documents obtained by that search, and is an array as shown in FIG. 6. The display document number storage buffer 36 stores the ID number of the document that the user chose for display, through the document selection unit 26, from among the candidate documents.

[0033] Next, the structure of each file stored in the external storage device 14 is described. As shown in FIG. 11(a), the external storage device 14 stores the multiple sets of document data to be searched, each as a document file. Each set of document data is given an integer document number (0, 1, ...), and the name of each document file corresponds to that number. As shown in FIG. 11(b), one document file contains the text data that makes up the document and the data for the charts and images included in the document.

[0034] In addition to the document data shown in FIG. 11(a), the external storage device 14 stores the files (table management files) that contain the character appearance position tables and are used to manage them, as shown in FIG. 11(c). The contents of the character appearance position tables contained in a table management file are described first.

[0035] A character appearance position table is structured as shown in FIG. 7(a). For the text data of a given document stored in the external storage device 14, it records every position at which a given character appears, together with the total number of appearances. In association with each appearance position, it also records information about the position of the character within the word to which it belongs.

[0036] In FIG. 7(a), the head of the character appearance position table holds the total number N of appearances of a given character in the text data of a given document. It is followed by the appearance positions within the document (0 to N-1), stored as a sequence of N integers. In association with each appearance position, the table stores the position of the character within the word to which it belongs, that is, the position-in-word information (0 to N-1). This information is obtained by applying morphological analysis to the text data of the document in advance and segmenting it into words.

[0037] A concrete example of the data stored in a character appearance position table is now described. The position-in-word information is expressed as a number, defined as shown in FIG. 8: "1" if the character is the first character of a word, "2" if it is the last character of a word, "3" if it is both the first and the last character of a word, and "0" if it is neither the first nor the last character of a word.

[0038] For example, when morphological analysis is applied to the text data 「太郎は中学校に行く」 ("Taro goes to junior high school") and it is segmented into words, the result is 「太郎／は／中学校／に／行く」, as shown in FIG. 9 (where ／ marks a word boundary).

[0039] Accordingly, the position-in-word information for each character in the text data shown in FIG. 9 is as shown in FIG. 10. For example, the character 「太」 is at the head of the word 「太郎」, so its position-in-word information is "1". The position-in-word information for the other characters is likewise set according to their positions within their words, as shown in FIG. 10.
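The coding rule of paragraph [0037] can be applied mechanically once the text has been segmented. The sketch below takes the segmentation from paragraph [0038] as given (no morphological analyzer is invoked) and derives a code for each character. Treating the codes as two bit flags (1 = first, 2 = last) is an observation about the definition, not something the source states, and the function name is hypothetical.

```python
def position_in_word_codes(words):
    """Return (character, code) pairs for an already segmented text."""
    codes = []
    for word in words:
        for index, ch in enumerate(word):
            code = 0
            if index == 0:
                code |= 1          # first character of the word
            if index == len(word) - 1:
                code |= 2          # last character of the word
            codes.append((ch, code))
    return codes


# Segmentation taken from paragraph [0038]: 太郎／は／中学校／に／行く
codes = position_in_word_codes(["太郎", "は", "中学校", "に", "行く"])
for ch, code in codes:
    print(ch, code)
```

As in the text, 「太」 receives 1; the single-character words 「は」 and 「に」 receive 3, and 「学」, in the middle of 「中学校」, receives 0.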

[0040] FIG. 7(b) shows a concrete example that records the appearance positions of a particular character, for example 「文」, in a particular set of text data. The character 「文」 appears 7 times in the text data, at positions 56, 123, ..., 6011, and for each appearance position the position-in-word information determined as described above is recorded.
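A table of this kind can be built in a single pass over the segmented text. The sketch below is an illustration under stated assumptions: positions are counted as character indices from 0 (the source does not say whether positions are character or byte offsets), the input is already segmented into words, and the sample sentence and function name are invented.

```python
from collections import defaultdict


def build_appearance_tables(words):
    """Build, for one document, a table per character with N, LOC and PIW."""
    tables = defaultdict(lambda: {"N": 0, "LOC": [], "PIW": []})
    position = 0
    for word in words:
        for index, ch in enumerate(word):
            # 1 = first character of a word, 2 = last, 3 = both, 0 = neither
            code = (1 if index == 0 else 0) | (2 if index == len(word) - 1 else 0)
            table = tables[ch]
            table["LOC"].append(position)
            table["PIW"].append(code)
            table["N"] += 1
            position += 1
    return dict(tables)


# Invented sample: 文書／の／文／を／検索／する
tables = build_appearance_tables(["文書", "の", "文", "を", "検索", "する"])
print(tables["文"])
```

Here 「文」 occurs twice: at position 0 as the first character of 「文書」 (code 1) and at position 3 as a one-character word (code 3).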

[0041] Next, the table management file as a whole is described. A table management file contains the character appearance position tables for multiple documents and is used to manage those tables. That is, it gathers together the character appearance position tables, across multiple documents, for the character with a given character code (for example, a JIS code). The external storage device 14 stores multiple table management files, one for each character code, and each is given a file name that corresponds to its character code.

[0042] FIG. 11(c) shows the general structure of a table management file. Starting from the head of the file, it stores the file size and the number N of documents that contain the target character. These are followed by the document numbers of the documents containing the target character, each stored together with the position, measured from the head of the file, of the corresponding character appearance position table (the table position). After these come the bodies of the character appearance position tables (0 to N-1) for the individual documents, each with the structure shown in FIG. 7.

[0043] Next, the operation of this embodiment is described with reference to the flowcharts shown in FIGS. 12 to 16. First, the initialization unit 22 is started by the program stored in the memory device 13. The initialization unit 22 initializes the control device 10, the input device 11, the display device 12, and the external storage device 14, initializes the work area (work area unit 30) in the memory device 13, displays the initial screen, and so on (step A1).

[0044] Next, the keyword input unit 23 is started. The keyword input unit 23 has the user enter, from the input device 11, the keyword (character string) used to retrieve a desired document from the document files stored in the external storage device 14 (step A2).

[0045] In this embodiment, each character of the word that forms the keyword is assumed to be represented by a 2-byte JIS code. The keyword input unit 23 stores the character code of each character in the keyword storage buffer 32 in the order entered, and stores the number of characters entered in the keyword character count storage buffer 31.

[0046] Next, the character appearance position table selection unit 24 is started (step A3). The character appearance position table selection unit 24 loads from the external storage device 14 the table management files corresponding to the character codes stored in the keyword storage buffer 32, and stores them, in order, in the structures of the table management file storage buffer 33. The character codes stored in the keyword storage buffer 32 are written, in order, C0, C1, ..., Cn-1, where n is the value stored in the keyword character count storage buffer 31 (the number of characters in the keyword).

[0047] In this embodiment, the file name of a table management file stored in the external storage device 14 is the corresponding character code expressed in ASCII form with the file identifier ".tmf" appended. For example, the table management file for the character 「愛」, whose JIS code is hexadecimal "3026", is named "3026.tmf".

[0048] Therefore, to load a table management file, a character string is first generated by appending the file identifier ".tmf" to the hexadecimal representation of the character code of the target character. Using the generated character string as the file name, the character appearance position table selection unit 24 reads the corresponding file from the external storage device 14 via the main control unit 21. The character appearance position table selection unit 24 then stores the file read by the main control unit 21 in the table management file storage buffer 33, in the arrangement shown in FIGS. 4 and 5.
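The file naming rule can be sketched as follows. Python has no codec that returns the bare 2-byte JIS code, so this sketch derives it from the EUC-JP encoding, whose two bytes for a JIS X 0208 character are the JIS bytes with the high bit set. The function name is hypothetical, and the use of lowercase hexadecimal digits is an assumption: the source's only example, "3026.tmf", contains no letters.

```python
def table_management_file_name(ch):
    """Return the table management file name for one JIS X 0208 character."""
    euc = ch.encode("euc_jp")
    # A JIS X 0208 character is two bytes in EUC-JP, both 0xA1 or above.
    if len(euc) != 2 or euc[0] < 0xA1:
        raise ValueError("expected a two-byte JIS X 0208 character")
    jis_code = ((euc[0] & 0x7F) << 8) | (euc[1] & 0x7F)
    return f"{jis_code:04x}.tmf"


print(table_management_file_name("愛"))
```

For the character 「愛」 this yields the name given in paragraph [0047], "3026.tmf".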

[0049] Once the contents of the table management files corresponding to the target characters have been stored in the table management file storage buffer 33, the character string search unit 25 is started. Referring to the data in the table management file structures stored in the table management file storage buffer 33, the character string search unit 25 searches for documents that contain the keyword and stores the document numbers of the matching documents, in order, in the candidate document number storage buffer 35.

【００５０】The processing of the character string search unit 25 is described with reference to the flowchart in FIG. 13, which outlines the search for candidate documents. First, the character string search unit 25 initializes the variables used in the search: i, which identifies a table management file storage structure, and j, which identifies a character appearance position table (step B1). Both i and j are integer variables. The character string search unit 25 also sets the candidate document count storage buffer 34 (nCand) to 0, and then sets to 0 the reference table number storage area (TMF[i].iCLT for 0 ≤ i < n) of every structure stored in the table management file storage buffer 33 (one structure per character of the keyword).
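As a rough aid to reading the notation, the structures can be sketched in Python as follows. The field layout is inferred from the identifiers used in this description (ID, CLT, nCLT, iCLT, LOC, PIW, N, iLOC). The class names, and where exactly iLOC lives, are assumptions and are not taken from FIGS. 4 and 5.

```python
from dataclasses import dataclass, field


@dataclass
class CharTable:
    """One character appearance position table (one character, one document)."""
    loc: list[int] = field(default_factory=list)  # LOC[k]: positions in the document
    piw: list[int] = field(default_factory=list)  # PIW[k]: in-word position code

    @property
    def n(self) -> int:
        """N: number of recorded positions."""
        return len(self.loc)


@dataclass
class TableManagementFile:
    """In-memory structure TMF[i] for the i-th keyword character."""
    ids: list[int] = field(default_factory=list)        # ID[j]: document numbers
    clt: list[CharTable] = field(default_factory=list)  # CLT[j]: one table per document
    i_clt: int = 0  # iCLT: table currently referenced
    i_loc: int = 0  # iLOC: position entry currently referenced

    @property
    def n_clt(self) -> int:
        """nCLT: number of character appearance position tables."""
        return len(self.ids)


def initialize_search(tmf: list[TableManagementFile]) -> int:
    """Step B1: reset every reference table number and return nCand = 0."""
    for t in tmf:
        t.i_clt = 0
    return 0
```

With this layout, TMF[0].ID[TMF[0].iCLT] corresponds to `tmf[0].ids[tmf[0].i_clt]`.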

【００５１】Next, the character string search unit 25 determines whether all of the table management files (all structures) stored in the table management file storage buffer 33, from the 0th to the (n-1)-th, contain a character appearance position table generated from a common document (step B2). In other words, it determines whether a document exists that contains every character of the keyword string (character existence determination). The character existence determination performed by the character string search unit 25 in step B2 is described with reference to the flowchart in FIG. 14.

【００５２】Here, the reference table number TMF[0].iCLT indicates which character appearance position table in the 0th table management file is currently being referenced. Its value is 0 in the initial state and is updated in later processing. The document number of the document corresponding to the character appearance position table indicated by TMF[0].iCLT can be written as TMF[0].ID[TMF[0].iCLT].

【００５３】The character string search unit 25 first sets i = 1 (step C1). Then, for values of j from the reference table number TMF[i].iCLT up to but not including the table count TMF[i].nCLT (each j identifies a character appearance position table in the i-th table management file), it checks whether there is a j for which the document number TMF[i].ID[j] equals the document number TMF[0].ID[TMF[0].iCLT] corresponding to the reference table number TMF[0].iCLT in the 0th table management file (structure 0) (step C2).

【００５４】The reference table number TMF[i].iCLT is a variable whose initial value is 0 and which is updated in step C3. The table count TMF[i].nCLT is a fixed value that gives the total number of character appearance position tables contained in one set of table management data.

【００５５】At this time, to speed up processing, the character string search unit 25 performs a binary search over the range of j from a minimum of TMF[i].iCLT to a maximum of TMF[i].nCLT-1. That is, as i is incremented in step C6, step C2 determines, for each of the 1st through (n-1)-th table management files, whether any document number TMF[i].ID[j] in that file equals the document number TMF[0].ID[TMF[0].iCLT] corresponding to the reference table number of the 0th table management file (that is, whether a matching value of j exists).

【００５６】After step C2, the character string search unit 25 updates TMF[i].iCLT (step C3). If a value of j satisfying the condition was found in step C2 (that is, a character appearance position table derived from the common document exists), that value of j is stored as the reference table number TMF[i].iCLT. If j equals TMF[i].nCLT-1, j is likewise used as the value of TMF[i].iCLT.

【００５７】If, on the other hand, no value of j satisfies the condition, then among the document numbers TMF[i].ID[j] that exceed TMF[0].ID[TMF[0].iCLT], the index j of the smallest one is used as the reference table number TMF[i].iCLT. If no document number TMF[i].ID[j] satisfies this condition, TMF[i].nCLT-1 is used as the value of TMF[i].iCLT.

【００５８】In step C4, processing branches according to the result of step C2. If the search in step C2 failed, the character string search unit 25 sets the determination failure flag and ends the processing of step B2 (step C5).

【００５９】If the search succeeded, the character string search unit 25 increments i (step C6) and executes step C2 again as described above.

【００６０】When step C2 has been executed with i incremented each time and the search has succeeded for every table management file, that is, when a document containing every character of the keyword string exists, the determination success flag is set and the processing of step B2 ends (step C8).
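Steps C1 through C8 can be sketched in Python under a few assumptions: each file's document numbers are held in a non-empty list sorted in ascending order (which the use of binary search implies), and the standard `bisect` module stands in for the binary search. The function names and the flat-list representation are hypothetical.

```python
from bisect import bisect_left


def search_ids(ids, i_clt, target):
    """Steps C2-C3 for one table management file.

    Searches ids[i_clt:] for target and returns (found, new_i_clt).
    On a miss, new_i_clt is the index of the smallest document number
    exceeding target, or len(ids) - 1 if there is none.
    Assumes ids is non-empty and sorted ascending.
    """
    n = len(ids)
    j = bisect_left(ids, target, i_clt, n)
    found = j < n and ids[j] == target
    return found, min(j, n - 1)


def all_characters_present(id_lists, i_clt):
    """Step B2: does the document currently referenced in file 0 appear in
    every other file? The list i_clt is updated in place."""
    target = id_lists[0][i_clt[0]]
    for i in range(1, len(id_lists)):  # steps C1 and C6
        found, i_clt[i] = search_ids(id_lists[i], i_clt[i], target)
        if not found:
            return False  # step C5: determination failure
    return True  # step C8: determination success
```

Because each lower bound only moves forward, later calls for larger document numbers in file 0 search a progressively smaller range.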

【００６１】When the character existence determination of step B2 ends, the character string search unit 25 checks the result. If the determination failure flag is set, it moves to the post-processing of step B7 (step B3). If the determination success flag is set, it performs the character connection determination and the word existence determination (step B4).

【００６２】The character existence determination of step B2 only determines whether a document exists that contains every character of the keyword string. It does not check how the characters are connected or whether the search string exists as a word. These checks are performed in step B4.

【００６３】The details of step B4 are described with reference to the flowchart in FIG. 15. This step uses the reference table numbers TMF[i].iCLT determined in step B2. That is, it refers to the character appearance position data inside the character appearance position table for document number TMF[0].ID[TMF[0].iCLT], which corresponds to the reference table number of the 0th table management file (the file for the first character of the keyword).

【００６４】Then, for 0 ≤ i < n-1, it is checked whether there exists a set of values TMF[i].iLOC (0 ≤ i < n) that satisfies

TMF[i+1].CLT[TMF[i+1].iLOC] = TMF[i].CLT[TMF[i].iLOC] + 1

【００６５】This determines whether the document contains a place where all the characters of the keyword string occur adjacent to one another, that is, whether the specified keyword is contained in the document as a substring.

【００６６】First, the character string search unit 25 initializes the variables that hold the current reference number within each set of character appearance position data (step D1). These variables are used mainly in step D4 to make the binary search more efficient.

【００６７】Next, the character string search unit 25 initializes the variable i, which indicates which table management file is being referenced (i = 0) (step D2), and sets the variable P to the position of the first character of the keyword string, TMF[0].CLT[TMF[0].iCLT].LOC[TMF[0].iLOC] (step D3).

【００６８】Here, every appearance position stored in the appearance position data of the 0th table management data is a position of the first character of the keyword string. The variable TMF[0].iLOC indicates which of these positions is currently being referenced.

【００６９】Next, the character string search unit 25 uses a binary search to check whether the appearance position data in each appearance position table contains an entry k whose value is P+i, that is, the position i characters after the first character of the keyword. This binary search is restricted to the range of the character appearance position table whose lower limit is TMF[i].iLOC and whose upper limit is the TMF[i].CLT[TMF[i].iCLT].N-th appearance position entry (step D4).

【００７０】Next, the character string search unit 25 updates the value of TMF[i].iLOC to be used in the next execution of step D4 (step D5). In step D5, if the search succeeded, the value of k is assigned to TMF[i].iLOC. If the search failed, then among the values TMF[i].CLT[j].LOC[k] examined in step D4 that exceed P+i, the smallest one is used to set TMF[i].iLOC. If none satisfies this condition, TMF[i].CLT[j].N-1, the number of elements in the corresponding appearance position table minus one, is used.
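The adjacency check can be sketched in Python for a single document. This is a simplified illustration rather than a transcription of FIG. 15: the failure path simply advances to the next occurrence of the first character, and the per-character lower bounds play the role of TMF[i].iLOC. The function name and the list-of-lists input are hypothetical, and each position list is assumed to be sorted in ascending order.

```python
from bisect import bisect_left


def find_adjacent(locs):
    """Return the first position P such that P + i occurs in locs[i] for
    every keyword character i, or None if the characters never occur
    adjacently. locs[i] holds the sorted positions of character i.
    """
    if not locs or any(len(positions) == 0 for positions in locs):
        return None
    lower = [0] * len(locs)  # per-character lower bound (role of iLOC)
    for p in locs[0]:        # each occurrence of the first character
        matched = True
        for i in range(1, len(locs)):
            positions = locs[i]
            # Binary search for P + i, restricted to [lower[i], N).
            k = bisect_left(positions, p + i, lower[i])
            hit = k < len(positions) and positions[k] == p + i
            # Hit: keep k. Miss: smallest larger entry, or N - 1 if none.
            lower[i] = min(k, len(positions) - 1)
            if not hit:
                matched = False
                break
        if matched:
            return p
    return None
```

For the text "abcabd" and the keyword "abd", the position lists are [0, 3], [1, 4] and [5], and the sketch reports position 3.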

【００７１】In step D6, processing branches according to the result of step D4. If the binary search succeeded, processing moves to step D7. If it failed, processing moves to step D12.

【００７２】If the binary search succeeds, the character string search unit 25 increments i (step D7). It then compares the incremented i with n, the number of characters in the keyword string. If i ≥ n, the condition has been satisfied for every character, which confirms that a character string matching the keyword exists. In that case the character string search unit 25 goes on to the word existence determination, checking whether the string found is one that morphological analysis treated as a single word (step D9).

【００７３】The details of the word determination in step D9 are described with reference to the flowchart in FIG. 16. In FIG. 16, the in-word position information of the first character of the string found by the processing up to step D7 is referenced first, to check whether that character is at the beginning of a word (step E1).

【００７４】Here, the in-word position information of the first character of the string found by the processing described above is written as TMF[0].CLT[TMF[0].iCLT].PIW[TMF[0].iLOC], and it is determined whether this value is the in-word position information "1", which indicates the beginning of a word.

【００７５】If this determination succeeds, processing branches to step E3. If it fails, processing branches to step E7. When the determination succeeds, the character string search unit 25 refers to the in-word position information of the 1st through (n-1)-th characters of the string found by the processing up to step D7, and determines whether all of them are the in-word position information "0", which indicates a character that is neither at the beginning nor at the end of a word (step E3).

【００７６】Here, each item of in-word position information is written as TMF[k].CLT[TMF[k].iCLT].PIW[TMF[k].iLOC] (k = 1, 2, …, n-1). If this determination succeeds, processing branches to step E5. If it fails, processing branches to step E7. When the determination succeeds, the character string search unit 25 refers to the in-word position information of the last character of the string found by the processing up to step D7, and checks whether that character is at the end of a word (step E5).

【００７７】Here, the in-word position information of the last character of the string found is written as TMF[n-1].CLT[TMF[n-1].iCLT].PIW[TMF[n-1].iLOC], and it is determined whether this value is the in-word position information "2", which indicates the end of a word. If this determination succeeds, processing branches to step E8. If it fails, processing branches to step E7.

【００７８】If the determination succeeds, the character string search unit 25 sets the determination success flag and returns (step E8). If the determination fails, the character string search unit 25 sets the determination failure flag and returns (step E7).

【００７９】The above is the flow of processing within step D9. The description so far covers the case where a word of three or more characters is entered as the keyword. When the keyword is a two-character word, only the first and last characters are checked in the processing described above.

【００８０】When the keyword is a one-character word, only the value TMF[0].CLT[TMF[0].iCLT].PIW[TMF[0].iLOC] is referenced, and the only determination made is whether it is the in-word position information "3", which indicates a character that is both the beginning and the end of a word.
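The word determination of steps E1 through E8, together with the two-character and one-character cases, can be sketched in Python as a check on the in-word position codes of the matched characters. The codes 1, 0, 2 and 3 are those given in the description. The function and constant names are hypothetical, and reading the range checked in step E3 as the interior characters only is an interpretation.

```python
HEAD, MIDDLE, TAIL, SINGLE = 1, 0, 2, 3  # in-word position codes


def is_whole_word(piw):
    """Return True if the matched characters form exactly one word.

    piw[i] is the in-word position code of the i-th matched character.
    """
    n = len(piw)
    if n == 0:
        return False
    if n == 1:
        # One-character keyword: must be both beginning and end of a word.
        return piw[0] == SINGLE
    # Steps E1, E3, E5. For n == 2 the interior slice is empty, so only
    # the first and last characters are checked.
    return (
        piw[0] == HEAD
        and all(code == MIDDLE for code in piw[1:-1])
        and piw[-1] == TAIL
    )
```

A match that fails this check is a substring of some longer word, or spans a word boundary, and is therefore not treated as a hit for the keyword.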

【００８１】After the word existence determination, if the determination failure flag is set as a result of step D9, the character string search unit 25 returns to step D4 and repeats the processing described above. If the determination success flag is set, it sets the determination success flag once more and returns (step D11).

【００８２】If, on the other hand, the search in step D4 failed (step D6), the character string search unit 25 increments the variable TMF[0].iLOC, which indicates which entry of the character appearance position table in the 0th table management data is being referenced (step D12).

【００８３】The character string search unit 25 first compares the value of TMF[0].iLOC incremented in step D12 with TMF[0].CLT[TMF[0].iCLT].N, the number of elements in the corresponding appearance position table of the 0th table management data. If the former is greater than or equal to the latter, it sets the determination failure flag and returns (step D13).

【００８４】Otherwise, the character string search unit 25 increments the binary search variables TMF[i].iLOC for 1 ≤ i < n. If even one of these values is greater than or equal to the number of elements in the corresponding appearance position table, it sets the determination failure flag and returns (step D14).

【００８５】If none of the above conditions is met, the processing from step D2 is repeated. In other words, the series of processes is applied to the element of the character appearance position table in the 0th table management data that follows the one used previously.

【００８６】The above is the flow of processing in step B4 (FIG. 13). In FIG. 13, step B5 examines the result of step B4: if the determination failure flag is set, the intermediate processing is skipped and processing moves to step B7. If the determination success flag is set, the character string search unit 25 registers the corresponding candidate document number in the candidate document number storage buffer 35 and stores the number of candidate documents in the candidate document count storage buffer 34 (step B6).

【００８７】That is, in step B6, the ID number of the document currently referenced in step C2, namely TMF[0].ID[TMF[0].iCLT], is stored in the candidate document number storage buffer 35 at the position indicated by the candidate document count storage buffer 34. The content of the candidate document count storage buffer 34 is then incremented, and processing moves to step B7.

【００８８】In step B7, as post-processing for the series of processes shown in FIG. 13, the character string search unit 25 increments the variable TMF[0].iCLT, which indicates which set of appearance position data in the 0th table management data is being referenced.

【００８９】In the following step B8, the value of TMF[0].iCLT incremented in step B7 is compared with the variable TMF[0].nCLT, which gives the total number of appearance position tables contained in the 0th table management data. If the former is greater than or equal to the latter, the processing of the character string search unit 25 ends and control returns (step B9).

【００９０】Otherwise, the character string search unit 25 increments the values of TMF[i].iCLT for 1 ≤ i < n. If even one of these values is greater than or equal to the total number of corresponding appearance position tables, TMF[i].nCLT, control returns to the main processing (step B9). If none of the above conditions is met, the series of processes from step B2 is repeated.

【００９１】When the character string search in step A4 has been completed in this way, the document selection unit 26 is activated. First, the document selection unit 26 displays on the screen of the display device 12 a list of the titles of the documents corresponding to the document numbers stored in the candidate document number storage buffer 35. FIG. 17 shows an example of the screen with the list of titles displayed.

【００９２】After displaying the list of titles, the document selection unit 26 accepts the user's designation of a title through the input device 11 and stores the document number corresponding to the designated title in the display document number storage buffer 36 (step A5).

【００９３】Next, the search result display unit 27 is activated (step A6). The search result display unit 27 reads from the external storage device 14 the document file corresponding to the document number stored in the display document number storage buffer 36. It expands the text data in the document file into character fonts, expands figures, tables and image data into bit images, and displays them on the screen of the display device 12. FIG. 18 shows an example of the screen with text data, figures and tables, and image data displayed.

【００９４】After step A6, the search result display unit 27 displays two icons on the screen, an "End" icon and a "Search again" icon, as shown in FIG. 19 (step A7).

【００９５】If the user then selects another title with the input device 11, the document corresponding to the selected title is displayed as described above. If the "End" icon is designated with the input device 11, all processing of the device ends. If the "Search again" icon is designated, processing returns to step A2 in FIG. 12 and the series of document search processes is executed with another keyword.

【００９６】In the embodiment above, each table management file stored in the external storage device 14 is loaded in its entirety into the table management file storage buffer 33. Alternatively, only the character appearance position tables that are needed may be loaded and stored as processing proceeds.

【００９７】Furthermore, none of the data in the table management files need be loaded into memory. If the external storage device 14 is, for example, a magnetic disk device, the character string search described above may be carried out solely by seeking the magnetic head.

【００９８】Also, in the embodiment above, the result of morphological analysis is used so that only documents containing the keyword as a word are retrieved, and documents that contain the entered string merely as a substring are not retrieved. Documents containing the substring may nevertheless be displayed, for example by taking priority into account at display time.

【００９９】Furthermore, the format of the in-word position information stored for each character may differ from the one used in this example, as long as it reflects the result of morphological analysis. In short, the present invention can be implemented with various modifications as needed without departing from its gist.

【０１００】[0100]

[Effects of the Invention] As described above, according to the present invention, a target document can be obtained at high speed when a large document database is searched by full-text search. In doing so, documents that contain the entered character string merely as a substring of a word are taken into account, so that the documents actually wanted can be selected accurately. As a result, the searcher no longer needs to read the contents of irrelevant documents, which is one of several substantial practical benefits.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a document search device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the functional configuration realized by the control device 10 and the memory device 13 in this embodiment.

FIG. 3 is a diagram for explaining the structure of the keyword storage buffer 31 in this embodiment.

FIG. 4 is a diagram for explaining the structure of part of the table management file storage buffer 33 in this embodiment.

FIG. 5 is a diagram for explaining the structure of part of the table management file storage buffer 33 in this embodiment.

FIG. 6 is a diagram for explaining the structure of the candidate document number storage buffer 34 in this embodiment.

FIG. 7 is a diagram for explaining the structure of the character appearance position table in this embodiment.

FIG. 8 is a diagram showing the definition of the in-word position information in this embodiment.

FIG. 9 is a diagram showing an example of word segmentation of text data by morphological analysis in this embodiment.

FIG. 10 is a diagram showing an example of the in-word position information assigned to each character in text data in this embodiment.

FIG. 11 is a diagram for explaining the structure of the document files and table management files stored in the external storage device 14 in this embodiment.

FIG. 12 is a flowchart showing the overall flow of processing in this embodiment.

FIG. 13 is a flowchart outlining the flow of processing by the character string search unit 25 in this embodiment.

FIG. 14 is a flowchart showing the flow of the character existence determination in this embodiment.

FIG. 15 is a flowchart showing the flow of the character connection and word existence determination in this embodiment.

FIG. 16 is a flowchart showing the flow of the word existence determination in this embodiment.

FIG. 17 is a diagram showing an example of the screen after the list of titles is displayed in this embodiment.

FIG. 18 is a diagram showing an example of the screen after document data is displayed in this embodiment.

FIG. 19 is a diagram showing an example of the screen on which the icons for the end processing are displayed in this embodiment.

[Explanation of symbols]

10: control device, 11: input device, 12: display device, 13: memory device, 14: external storage device, 21: main control unit, 22: initialization unit, 23: keyword input unit, 24: character appearance position table selection unit, 25: character string search unit, 26: document selection unit, 27: search result display unit, 31: keyword character count storage buffer, 32: keyword storage buffer, 33: table management file storage buffer, 34: candidate document count storage buffer, 35: candidate document number storage buffer, 36: display document number storage buffer.

Continuation of the front page

(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/30; JICST file (JOIS)

Claims

(57) [Claims]

1. A document search device for retrieving, from among a plurality of documents containing text data consisting of character code strings, a candidate document containing a keyword consisting of an arbitrarily given character string, the device comprising: table management file storage means for storing, for each character code contained in the documents, a table management file that manages character appearance position tables, each of which stores, for each document, information indicating the position of the character in the document and its position within the word that contains the character; keyword input means for inputting a character string as a keyword; table management file selection means for selecting, from the table management file storage means, the table management file corresponding to each character of the keyword input by the keyword input means; character string search means for determining, based on the character appearance position tables of the table management files selected by the table management file selection means, whether the keyword input by the keyword input means is contained in the text data of a document; and search result display means for displaying the content of a candidate document determined by the character string search means to contain the keyword.

2. The document search device according to claim 1, wherein the information indicating the position of each character within the word in the character appearance position table is information indicating whether the character is located at the beginning or at the end of the corresponding word.

3. The document search device according to claim 1, wherein the information indicating the position of each character within the word stored in the character appearance position table is set based on the result of performing morphological analysis on the text data of each document to be searched and dividing the text into word units.

4. The document search device according to claim 1, wherein the character string search means includes word presence determination means that refers to the within-word position information of each character in the character appearance position table and determines whether the input keyword exists in the text of the document not merely as a partial character string but as a word.
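
The arrangement recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the specification: the function names `build_tables` and `search`, the in-memory dictionary layout, the pre-segmented word lists standing in for the output of morphological analysis, and the strict rule that a word match must not cross a word boundary are all assumptions made for illustration.

```python
from collections import defaultdict


def build_tables(docs):
    """Build one character appearance position table per character.

    docs maps a document number to its text already split into words
    (a stand-in for the morphological analysis of claim 3).
    Result: {char: {doc_id: [(position, is_word_start, is_word_end), ...]}}
    """
    tables = defaultdict(lambda: defaultdict(list))
    for doc_id, words in docs.items():
        pos = 0  # character position within the whole document text
        for word in words:
            for i, ch in enumerate(word):
                tables[ch][doc_id].append((pos, i == 0, i == len(word) - 1))
                pos += 1
    return tables


def search(tables, keyword, whole_word=False):
    """Return sorted document numbers whose text contains the keyword.

    With whole_word=True the keyword must start at a word start, finish at
    a word end, and cross no word boundary in between (an assumed rule).
    """
    if not keyword or any(ch not in tables for ch in keyword):
        return []
    # Select the table of each keyword character, then keep only the
    # documents in which every one of those characters appears.
    candidates = set(tables[keyword[0]])
    for ch in keyword[1:]:
        candidates &= set(tables[ch])
    hits = []
    for doc_id in sorted(candidates):
        # Per keyword character: position -> (is_word_start, is_word_end)
        entries = [{p: (s, e) for p, s, e in tables[ch][doc_id]}
                   for ch in keyword]
        for start in entries[0]:
            # Character connection: character k must sit at position start + k.
            if not all(start + k in entries[k] for k in range(1, len(keyword))):
                continue
            flags = [entries[k][start + k] for k in range(len(keyword))]
            is_word = (flags[0][0] and flags[-1][1]
                       and not any(end for _, end in flags[:-1]))
            if not whole_word or is_word:
                hits.append(doc_id)
                break
    return hits


# Hypothetical sample data: two documents, already divided into words.
docs = {1: ["document", "search", "device"], 2: ["research", "documents"]}
tables = build_tables(docs)
print(search(tables, "search"))                   # substring match
print(search(tables, "search", whole_word=True))  # word match only
```

In this sample, document 2 contains "search" only inside "research", so it would be a candidate for the substring search but not for the word search, which is the distinction drawn in claim 4.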