JPH1083404A - Full text retrieving system and recording medium recorded with program

Info

Publication number: JPH1083404A
Application number: JP9185124A
Authority: JP
Inventors: Kenjiro Naemura; 健二郎苗村; Tadahiro Shirai; 直裕白井; Ryoko Kitagawa; 良子北川; Yoshiaki Suzuki; 善昭鈴木; Shinya Sugiyama; 晋也杉山; Tsunaki Hamamoto; 綱樹濱本; Naoya Wada; 直也和田; Tomohide Sugaya; 友秀菅谷
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-07-15
Filing date: 1997-07-10
Publication date: 1998-03-31

Abstract

PROBLEM TO BE SOLVED: To allow an operator to easily grasp where in a document a character string corresponding to a keyword exists. SOLUTION: The system comprises an image file 6 that stores the document image data of each document, a database 14 that stores each character string included in each document, and a character position table that stores, for every document, the relationship between each character and its position. The document designation information of a document that includes a character string corresponding to an input keyword is retrieved from the database 14, the document image data corresponding to the retrieved document designation information is read from the image file 6, and the read document image data and the retrieved document designation information are output for display. Furthermore, the position of each character constituting the character string corresponding to the keyword in the displayed document image data is read from the character position table, and the image data at the read character positions in the displayed document image data is highlighted.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to an information retrieval system that retrieves, from a large number of documents, a document in which required information is described. In particular, it relates to a full-text search system that makes it possible to grasp at which position in a document the information being searched for exists, and to a recording medium on which the program for this system is recorded.

【０００２】[0002]

[Prior Art] When a new document (publication) is registered in the database of a general information retrieval system, a plurality of keywords contained in that document are registered in the database. To look up a document in which required information is described, the operator searches the database with a keyword related to that information. Document designation information that identifies each document for which the keyword is registered, such as the document number, document name, publisher, author, date of issue, and holding location, is then output as the search result.

[0003] Among such information retrieval systems, full-text search systems have been put into practical use in which all characters, words, phrases, and the like contained in each document are registered in the database as keywords, so that the information contained in each document can be searched without omission.

[0004] Full-text search systems that also store and hold the text data of each document registered in the database in a document file have likewise been put into practical use. In such a system, the operator searches the database with a keyword related to the required information and obtains document designation information identifying the documents that contain the character string of that keyword.

[0005] The text data of the document identified by the document designation information is then called up from the document file onto, for example, the display screen of a CRT display device, and the operator judges whether the document contains the information he or she requires.

【０００６】[0006]

[Problems to Be Solved by the Invention] However, the full-text search system described above still has the following problems to be improved. In general, in a full-text search system every character, word, and phrase contained in the documents registered in the database is searchable as a keyword. Characters, words, and phrases that have no direct bearing on the essential content of a document are therefore also subject to search.

[0007] Consequently, a single search operation retrieves a large number of documents. Checking where the keyword occurs in the text data of each of these documents, and determining whether each document has the intended content, becomes very laborious.

[0008] When a document is divided into paragraphs, as in an ordinary paper, its content is relatively easy to grasp. In general, however, text data does not include ruled lines, tables, or figures. In the text data of forms such as invoices, receipts, and slips, the position at which each character was written on the form is therefore ignored entirely, and all characters and words are stored packed together. As far as this text data is concerned, it is impossible to tell where on the form each character or word is located.

[0009] As a result, even if a character string (word) corresponding to the keyword is found in the text data, it cannot be determined what that keyword means on the form. For example, for the keyword [total], the position of the total on the form determines whether it is, say, a sales total or a payment total.

[0010] Thus, even when a document has been identified by a keyword, there is the problem that the operator cannot easily judge whether it is the document he or she needs. The present invention has been made in view of these circumstances. Its object is to provide a full-text search system, and a recording medium on which the program is recorded, in which the image data of each document registered in the database is stored and held as an image file, so that the document designation information obtained as the search result and the image data of the corresponding document can be displayed on the display screen at the same time. Even for a document in form layout, the operator can then easily grasp where in the document the character string corresponding to the keyword exists and can easily judge whether the document is the one required, which greatly improves the efficiency of search work.

【００１１】[0011]

[Means for Solving the Problems] To solve the above problems, the full-text search system of claim 1 comprises: an image file that stores, for each document, the document image data of that document; a database that stores each character string contained in each document in association with document designation information; a character position table that stores, for each document, the relationship between each character contained in the document and the position of that character within the document image data; search means for retrieving from the database the document designation information of a document that contains a character string corresponding to an input keyword; document image data reading means for reading from the image file the document image data corresponding to the retrieved document designation information; search result display means for displaying the read document image data and the retrieved document designation information; and highlighting means for reading from the character position table the position of each character constituting the character string corresponding to the keyword within the displayed document image data, and highlighting the image data at the read character positions within the displayed document image data.

[0012] In the invention of claim 2, when the search means in the full-text search system of claim 1 retrieves a plurality of pieces of document designation information, the document image data reading means reads the document image data corresponding to each piece of document designation information, and the search result display means displays the plurality of read document image data and the plurality of retrieved pieces of document designation information on a single display screen.

[0013] The full-text search system of claim 3 comprises: an image file that stores, for each document, the document image data of that document; a database that stores each character string contained in each document in association with document designation information; a character position table that stores, for each document, the relationship between each character contained in the document and the position of that character within the document image data; search means for retrieving from the database the document designation information of a document that contains a character string corresponding to an input keyword; document image data reading means for reading from the image file the document image data corresponding to the retrieved document designation information; region image data extraction means for reading from the character position table the position of each character constituting the character string corresponding to the keyword within the read document image data, and extracting from the read document image data the region image data of a predetermined region that includes the read character positions; and search result display means for displaying the extracted region image data and the retrieved document designation information.

[0014] The invention of claim 4 adds to the full-text search system of claim 3 highlighting means for highlighting the image data at the read character positions within the region image data displayed by the search result display means.

[0015] The full-text search system of claim 5 comprises: an image file that stores, for each document, the document image data of that document; a database that stores each character string contained in each document in association with document designation information; a character position table that stores, for each document, the relationship between each character contained in the document and the position of that character within the document image data; input means through which a keyword and a designated region within the document image data are input; search means for retrieving from the database the document designation information of a document that contains a character string corresponding to the keyword input via the input means; document image data reading means for reading from the image file the document image data corresponding to the retrieved document designation information; character position reading means for reading from the character position table the position of each character constituting the character string corresponding to the keyword within the read document image data; and search result display means for displaying the read document image data and the retrieved document designation information when the read character positions lie within the input designated region.

[0016] The invention of claim 6 adds to the full-text search system of claim 5 highlighting means for highlighting the image data at the read character positions within the designated region in the document image data displayed by the search result display means.

[0017] In the full-text search system of claim 1 configured in this way, the document image data of the documents registered in the database is stored and held separately in the image file. When the operator inputs a keyword, the document designation information of the documents containing the character string corresponding to that keyword is retrieved from the database and displayed together with the document image data of the corresponding documents.

[0018] In addition, within the displayed document image data, the image data at the position of the character string corresponding to the keyword is highlighted. The operator can therefore see at a glance where in the displayed document image data the character string corresponding to the keyword is located, can more easily judge whether the document is the one he or she needs, and search work becomes more efficient.

[0019] In the invention of claim 2, when a plurality of pieces of document designation information are retrieved as a result of searching the database, the retrieved pieces of document designation information and the corresponding plurality of document image data are displayed on a single display screen.

[0020] The operator can therefore compare the displayed documents with one another and judge more efficiently which document he or she needs. Displaying a plurality of document image data at the same time also shortens the time needed to look through all of the search results.

[0021] In the invention of claim 3, only a predetermined region that includes the character positions corresponding to the keyword is extracted from the document image data of the retrieved document designation information and displayed together with the document designation information.

[0022] As a result, the unnecessary parts of the document image data are not displayed, so the operator can easily find the characters corresponding to the keyword within the displayed region image data. Because the keyword portion is displayed large, the characters are also easier to read.
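The region extraction of claim 3 can be sketched as a bounding-box computation. The following is a minimal illustration only: the text says just that a "predetermined region" containing the character positions is extracted, so the fixed `margin`, the function names `keyword_region` and `crop`, and the representation of an image as a list of pixel rows are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom are exclusive


def keyword_region(positions: Iterable[Tuple[int, int]], margin: int,
                   image_width: int, image_height: int) -> Box:
    """Box covering every keyword character position plus a margin, clamped to the image."""
    pts = list(positions)
    if not pts:
        raise ValueError("no character positions given")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    left = max(0, min(xs) - margin)
    top = max(0, min(ys) - margin)
    right = min(image_width, max(xs) + margin + 1)
    bottom = min(image_height, max(ys) + margin + 1)
    return (left, top, right, bottom)


def crop(image: List[list], box: Box) -> List[list]:
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

A real system would probably choose the margin from the form layout (for example, enough to include the label next to a total), which this sketch does not attempt.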

[0023] In claim 4, the image data of the character string portion corresponding to the keyword within the displayed region image data is additionally highlighted, so the operator can confirm the character string corresponding to the keyword at a glance.

[0024] In the invention of claim 5, when the operator designates a region within the document image data together with the keyword, the retrieved document designation information and the corresponding document image data are displayed only for those documents, among the document designation information retrieved from the database, in which the character string corresponding to the keyword lies within the designated region.

[0025] That is, when the position of the keyword in the document can be predicted to some extent, as with forms, designating a region limits the number of pieces of document designation information that are retrieved. The range from which the operator must select the required document becomes narrower, and search work becomes more efficient as a result.
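The narrowing effect of a designated region can be sketched as a filter over the keyword hits. This is an illustrative sketch, not the patented implementation: the `hits` mapping, the inclusive rectangle format, and the rule that every character of at least one occurrence must lie inside the region are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Region = Tuple[int, int, int, int]  # (left, top, right, bottom), all inclusive


def in_region(point: Point, region: Region) -> bool:
    """True if the (x, y) point lies inside the rectangle."""
    x, y = point
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom


def filter_by_region(hits: Dict[int, List[List[Point]]], region: Region) -> List[int]:
    """Keep document numbers with at least one keyword occurrence wholly inside the region.

    hits maps a document number to its keyword occurrences; each occurrence
    is the list of (x, y) positions of the keyword's characters.
    """
    kept = []
    for doc_number in sorted(hits):
        for occurrence in hits[doc_number]:
            if occurrence and all(in_region(p, region) for p in occurrence):
                kept.append(doc_number)
                break
    return kept
```

With a region covering, say, the top-left corner of a form, only documents where the keyword appears in that corner remain in the result list.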

[0026] In claim 6, the image data of the character string portion corresponding to the keyword within the displayed document image data is additionally highlighted, so the operator can confirm the character string corresponding to the keyword at a glance.

[0027] The computer-readable recording medium of claim 7 records a program that realizes: a function of storing, for each document, the document image data of that document in first storage means; a function of storing each character string contained in each document in second storage means in association with document designation information; a function of storing in third storage means, for each document, the relationship between each character contained in the document and the position of that character within the document image data; a function of retrieving from the second storage means the document designation information of a document that contains the character string corresponding to an input keyword; a function of reading from the first storage means the document image data corresponding to the retrieved document designation information; a function of displaying the read document image data and the retrieved document designation information; and a function of reading from the third storage means the position of each character constituting the character string corresponding to the keyword within the displayed document image data, and highlighting the image data at the read character positions within the displayed document image data.

[0028] The computer-readable recording medium of claim 8 records a program that realizes: a function of storing, for each document, the document image data of that document in first storage means; a function of storing each character string contained in each document in second storage means in association with document designation information; a function of storing in third storage means, for each document, the relationship between each character contained in the document and the position of that character within the document image data; a function of retrieving from the second storage means the document designation information of a document that contains the character string corresponding to an input keyword; a function of reading from the first storage means the document image data corresponding to the retrieved document designation information; a function of reading from the third storage means the position of each character constituting the character string corresponding to the keyword within the read document image data, and extracting from the read document image data the region image data of a predetermined region that includes the read character positions; and a function of displaying the extracted region image data and the retrieved document designation information.

[0029] Further, the computer-readable recording medium of claim 9 records a program that realizes: a function of storing, for each document, the document image data of that document in first storage means; a function of storing each character string contained in each document in second storage means in association with document designation information; a function of storing in third storage means, for each document, the relationship between each character contained in the document and the position of that character within the document image data; a function of accepting input of a keyword and of a designated region within the document image data; a function of retrieving from the second storage means the document designation information of a document that contains the character string corresponding to the input keyword; a function of reading from the first storage means the document image data corresponding to the retrieved document designation information; and a function of displaying the read document image data and the retrieved document designation information when the read document image data lies within the input designated region.

【００３０】[0030]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the overall configuration of the full-text search system of the embodiment. The administrator of this full-text search system inserts into the image reading device 2 a form 1 on which a document 1a having, for example, the print format shown in FIG. 2 is written. The image reading device 2 reads the image data of the document 1a written on the inserted form 1 in the format shown in FIG. 2 and sends it, as document image data, to the character recognition device 3 and the image management unit 4. When inserting the form 1 into the image reading device 2, the administrator also inputs a document number, as document designation information identifying that form 1, into the document number input unit 5.

[0031] The image management unit 4 attaches the document number input from the document number input unit 5 to the document image data of the document 1a of the form 1 input from the image reading device 2 and writes it to the image file 6.

[0032] The character recognition device 3 reads each character from the document image data of the document 1a of the form 1 input from the image reading device 2, converts it into a character code, and sends the result to the text management unit 11 as text data. Naturally, this text data contains no information about the ruled lines, tables, or writing positions on the form 1. The text management unit 11 attaches the document number input from the document number input unit 5 to the input text data and writes it to the document file 12.

[0033] The character position detection unit 7 detects where in the document image data of the document 1a each character contained in the text data input from the character recognition device 3 is located, and creates, for each document 1a, the character position table 8a shown in FIG. 3. Each character position table 8a thus created is written to the character position file 8 together with the document number input from the document number input unit 5.

[0034] Specifically, as shown in FIG. 3, the position of each character constituting the text data of the document 1a is expressed as (x, y) coordinates on the document image data. The correspondence setting unit 9 creates the management table 10a shown in FIG. 4 on the basis of the information from the image management unit 4, the character position detection unit 7, and the document number input unit 5. In this management table 10a, the number of the document image data of the document 1a and the number of its character position table 8a are registered for each document number identifying a document 1a. The correspondence setting unit 9 writes the created management table 10a to the management file 10.
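The two tables described here can be pictured as simple in-memory structures. The sketch below is only one possible reading: the class names `ManagementEntry` and `Registry`, the field names, and the use of dictionaries in place of the character position file 8 and the management file 10 are assumptions; FIG. 3 and FIG. 4 are not reproduced in this text, so the exact layouts are not known.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CharPosition = Tuple[str, int, int]  # (character, x, y) on the document image


@dataclass(frozen=True)
class ManagementEntry:
    """One row of the management table: where a document's image and positions live."""
    image_number: int
    position_table_number: int


@dataclass
class Registry:
    """Stand-in for the character position file and the management file."""
    position_file: Dict[int, List[CharPosition]] = field(default_factory=dict)
    management_table: Dict[int, ManagementEntry] = field(default_factory=dict)

    def register(self, document_number: int, image_number: int,
                 table_number: int, char_positions: List[CharPosition]) -> None:
        # Store the per-document character position table under its own number,
        # then record both numbers against the document number.
        self.position_file[table_number] = list(char_positions)
        self.management_table[document_number] = ManagementEntry(image_number, table_number)

    def position_table_for(self, document_number: int) -> List[CharPosition]:
        entry = self.management_table[document_number]
        return self.position_file[entry.position_table_number]
```

Keeping the management table separate means the image number and the position table number need not equal the document number, which matches the indirection the text describes.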

[0035] The database registration unit 13 registers each character and each word contained in the text data of each document 1a stored in the document file 12 in the full-text search database 14, which has the format shown in FIG. 5.

[0036] Specifically, as shown in FIG. 5, for each single character (hiragana, katakana, kanji, alphanumeric, and so on), for each two-character word, for each three-character word, for each four-character word, and so on, information is registered indicating whether that character string exists in the text data of each document. [1] is set if it exists and [0] if it does not.

[0037] For example, the two-character word [本日] ("today") is shown to appear in the document 1a with document number 1 and in the document 1a with document number 3. Likewise, the three-character word [請求書] ("invoice") is shown to appear only in the document 1a with document number 2.
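The presence table of FIG. 5 can be approximated by an index from character strings to the documents that contain them. In this sketch every substring of one to four characters is indexed, which is an assumption: the text speaks of "words" and does not say how they are delimited or what the maximum length is. The function names and the sample document texts are also invented for illustration.

```python
from typing import Dict, List, Set


def char_ngrams(text: str, max_n: int = 4) -> Set[str]:
    """All substrings of length 1..max_n that occur in the text."""
    return {text[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(text) - n + 1)}


def build_index(documents: Dict[int, str], max_n: int = 4) -> Dict[str, Set[int]]:
    """Map each character string to the set of document numbers containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_number, text in documents.items():
        for gram in char_ngrams(text, max_n):
            index.setdefault(gram, set()).add(doc_number)
    return index


def presence_row(index: Dict[str, Set[int]], doc_numbers: List[int], term: str) -> List[int]:
    """One row of the presence table: 1 if the document contains the term, else 0."""
    found = index.get(term, set())
    return [1 if d in found else 0 for d in doc_numbers]


def search(index: Dict[str, Set[int]], keyword: str) -> List[int]:
    """Document numbers whose flag for the keyword is 1, in ascending order."""
    return sorted(index.get(keyword, set()))
```

With three sample documents in which [本日] occurs in documents 1 and 3 and [請求書] only in document 2, `presence_row` yields the flag rows of the example above. Keywords longer than `max_n` are not found by this simplified index.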

[0038] The search input unit 15 consists of, for example, a keyboard, through which the operator inputs a keyword. The operator also inputs a designated region as needed. The database search unit 16 searches the full-text search database 14 with the keyword input via the search input unit 15 and reads the document numbers, serving as document designation information, for which the flag is set to [1], that is, the documents that contain the character string (word) corresponding to the keyword. The database search unit 16 sends the one or more document numbers found in the full-text search database 14 to the search result editing unit 17.

[0039] The search result editing unit 17 reads, from the management table 10a in the management file 10, the number of the document image data of the document 1a corresponding to each retrieved document number and the number of its character position table 8a. It then reads the document image data with that number from the image file 6. It also reads, from the character position table 8a with that number in the character position file 8, the (x, y) coordinates indicating the position within the document image data of the character string (word) corresponding to the input keyword.
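Reading the keyword's coordinates from a character position table can be sketched as a substring search over the ordered characters. The sketch assumes the table lists exactly one character per entry, in reading order, as (character, x, y) tuples; the function name `keyword_positions` and that tuple layout are hypothetical.

```python
from typing import List, Tuple

CharPosition = Tuple[str, int, int]  # (single character, x, y)


def keyword_positions(table: List[CharPosition], keyword: str) -> List[List[Tuple[int, int]]]:
    """Return the (x, y) coordinates of every occurrence of the keyword.

    Each occurrence is a list with one coordinate pair per keyword character.
    """
    if not keyword:
        raise ValueError("keyword must not be empty")
    # Because every entry holds one character, an index into the joined
    # string is also an index into the table.
    chars = "".join(c for c, _, _ in table)
    occurrences = []
    start = chars.find(keyword)
    while start != -1:
        occurrences.append([(x, y) for _, x, y in table[start:start + len(keyword)]])
        start = chars.find(keyword, start + 1)
    return occurrences
```

Returning every occurrence, rather than only the first, matters for forms: the same word, such as a total, may appear at several positions with different meanings.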

[0040] The search result editing unit 17 then displays the retrieved document number, the keyword, and the document image data for that document number on the display screen 18a of the display unit 18 as the search result.

[0041] In addition, the search result editing unit 17 highlights the image data of the character string (word) corresponding to the keyword, at the positions read as described above, within the document image data displayed on the display screen 18a of the display unit 18, using a color that differs from the other parts, such as [yellow] or [green].
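Highlighting in a different color can be sketched on a small RGB pixel grid. Several things here are assumptions rather than statements from the text: each (x, y) is treated as the top-left corner of a character cell, the cell has a fixed size (`cell_w`, `cell_h`), and only light background pixels are recolored so that the dark character strokes stay legible.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def highlight(image: List[List[Pixel]], positions: Sequence[Tuple[int, int]],
              cell_w: int, cell_h: int,
              color: Pixel = (255, 255, 0), ink_threshold: int = 128) -> List[List[Pixel]]:
    """Return a copy of the image with the background of each character cell recolored."""
    out = [list(row) for row in image]  # leave the caller's image untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for x, y in positions:
        for py in range(max(0, y), min(height, y + cell_h)):
            for px in range(max(0, x), min(width, x + cell_w)):
                r, g, b = out[py][px]
                # Pixels at or above the brightness threshold count as background.
                if (r + g + b) // 3 >= ink_threshold:
                    out[py][px] = color
    return out
```

Passing `color=(0, 255, 0)` would give the green variant mentioned above.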

[0042] Several [display modes] are provided for presenting the search result on the display screen 18a of the display unit 18, and the operator can freely select among them.

[0043] Specifically, four [display modes] are provided: the [reduced display mode] shown in FIG. 10, the [highlighted display mode] shown in FIG. 11, the [area extraction display mode] shown in FIG. 12, and the [area designation display mode], which is not illustrated.

[0044] Next, the specific operations of the database search unit 16 and the search result editing unit 17 for each [display mode] are described with reference to FIGS. 6 to 9. FIG. 6 is a flowchart showing the search processing when the display mode is set to [reduced display mode].

[0045] When a keyword is entered from the search input unit 15 at P1, the database search unit 16 searches the full-text search database 14 for the one or more document numbers of documents 1a that contain the character string (word) of the keyword (P2). Next, the document image data of the document 1a corresponding to a retrieved document number is read from the image file 6 (P3). The read document image data is reduced to, for example, about 1/8 of the area of the display screen 18a of the display unit 18 (P4).

[0046] Then, as shown in FIG. 10(a), the reduced document image data 19, the keyword 20, and the document number 21 are displayed on the display screen 18a of the display unit 18 as the search result.

[0047] At P6, if any retrieved document number has not yet been displayed, the process returns to P3 and display processing starts for that document number. For example, when two document numbers are retrieved, the display screen 18a shows two reduced document image data 19, one keyword 20, and two document numbers 21, as shown in FIG. 10(b).
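The P1 to P6 flow of the reduced display mode can be sketched roughly as follows. The sketch makes several assumptions that are not in the text: the full-text search database is modeled as a dictionary from word to a set of document numbers, images are represented only by their (width, height), and the reduction keeps the aspect ratio while targeting about 1/8 of the screen area. The function names are hypothetical.

```python
import math


def reduced_size(image_size, screen_size, area_fraction=1 / 8):
    """Scale an image so its area is about area_fraction of the screen area."""
    image_w, image_h = image_size
    screen_w, screen_h = screen_size
    target_area = area_fraction * screen_w * screen_h
    scale = math.sqrt(target_area / (image_w * image_h))
    return max(1, round(image_w * scale)), max(1, round(image_h * scale))


def reduced_display_mode(keyword, index, image_sizes, screen_size):
    """Return one (document number, keyword, reduced size) entry per hit."""
    results = []
    for doc_number in sorted(index.get(keyword, ())):       # P2, then the P6 loop
        image_size = image_sizes[doc_number]                # P3: stand-in for reading the image
        thumbnail = reduced_size(image_size, screen_size)   # P4: reduce to about 1/8 of the screen
        results.append((doc_number, keyword, thumbnail))    # entry to be displayed
    return results
```

Because the target is an area ratio, the linear scale factor is the square root of the area ratio. An image already smaller than the target would be enlarged by this sketch; a real implementation might cap the scale at 1.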

[0048] In this way, the display screen 18a shows the retrieved document number 21 together with the reduced document image data 19 of the document 1a corresponding to that document number, so the operator can check on the display screen 18a the content of the document 1a that contains the character string (word) corresponding to the keyword 20. The operator can therefore judge quickly and easily whether the displayed document 1a is the one needed. As a result, search efficiency improves greatly.

[0049] FIG. 7 is a flowchart showing the search processing when the display mode is set to [highlighted display mode]. When a keyword is entered from the search input unit 15 at Q1, the database search unit 16 searches the full-text search database 14 for the one or more document numbers of documents 1a that contain the character string (word) of the keyword (Q2). Next, the document image data of the document 1a corresponding to a retrieved document number is read from the image file 6 (Q3).

[0050] Next, at Q4, the (x, y) coordinates indicating the character positions of the character string (word) corresponding to the keyword are read from the character position table 8a for the corresponding document number in the character position file 8. Then, at Q5, the previously read document image data 22, the keyword 20, and the document number 21 are displayed on the display screen 18a of the display unit 18, as shown in FIG. 11(a).

[0051] Furthermore, the image data 23 at the read character positions within the displayed document image data 19 is highlighted in a different color (Q6). At Q7, if any retrieved document number has not yet been displayed, the process returns to Q3 and display processing starts for that document number.
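The highlighting at Q6 can be illustrated with a small sketch. It assumes, beyond what the text states, that each (x, y) coordinate is the top-left corner of a character cell of fixed width and height, that the image is a list of rows of RGB tuples, and that only background pixels are recolored so the character strokes stay legible. The name `highlight` and the constants are hypothetical.

```python
WHITE = (255, 255, 255)
YELLOW = (255, 255, 0)


def highlight(image, positions, char_w, char_h, color=YELLOW, background=WHITE):
    """Return a copy of image with the background of each character cell recolored.

    Assumption: each (x, y) is the top-left corner of a char_w x char_h cell.
    """
    out = [row[:] for row in image]          # leave the stored image untouched
    height = len(out)
    width = len(out[0]) if out else 0
    for x, y in positions:
        for yy in range(max(0, y), min(height, y + char_h)):
            for xx in range(max(0, x), min(width, x + char_w)):
                if out[yy][xx] == background:
                    out[yy][xx] = color      # tint background, keep text pixels
    return out
```

Working on a copy keeps the document image data in the image file unchanged, so the same image can be shown again later with a different keyword highlighted.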

[0052] For example, when two document numbers are retrieved, the display screen 18a shows two document image data 22, one keyword 20, and two document numbers 21, as shown in FIG. 11(b).

[0053] In this way, the display screen 18a shows the retrieved document number 21 together with the document image data 22 of the document 1a corresponding to that document number, and the image data 23 of the character string (word) corresponding to the keyword 20 within the document image data 22 is highlighted.

[0054] The operator can therefore check on the display screen 18a the content of the document 1a that contains the character string (word) corresponding to the keyword 20, and can also immediately see the character positions corresponding to the keyword within that document 1a. The operator can thus judge quickly and easily whether the displayed document 1a is the one needed. As a result, search efficiency improves greatly.

[0055] In addition to the keyword 20 and the document number 21, the document image data 22 of the document 1a is displayed at the same time on the display screen 18a of the display unit 18 as the search result. Therefore, even if the character strings (words) in the document 1a are separated by ruled lines or tables, as in the form 1, the position at which the character string (word) corresponding to the keyword appears in the document 1a can be confirmed at a glance.

[0056] The operator can therefore grasp in a short time whether the character string corresponding to the keyword in the document 1a relates to the information needed. FIG. 8 is a flowchart showing the search processing when the display mode is set to [area extraction display mode].

[0057] When a keyword is entered from the search input unit 15 at R1, the database search unit 16 searches the full-text search database 14 for the one or more document numbers of documents 1a that contain the character string (word) of the keyword (R2). Next, the document image data of the document 1a corresponding to a retrieved document number is read from the image file 6 (R3).

[0058] Next, at R4, the (x, y) coordinates indicating the character positions of the character string (word) corresponding to the keyword are read from the character position table 8a for the corresponding document number in the character position file 8. Then, at R5, a predetermined area surrounding the read character positions is set, and the image data of that area is extracted from the read document image data as area image data (R6). Then, at R7, the extracted area image data 24, the keyword 20, and the document number 21 are displayed on the display screen 18a of the display unit 18, as shown in FIG. 12(a).
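Steps R5 and R6 amount to computing a box around the keyword's character positions and cropping the image to it. The sketch below is one possible reading: the text does not say how the predetermined area is sized, so the fixed `margin`, the fixed character cell size, and the function names are all assumptions made for illustration.

```python
def surrounding_area(positions, char_w, char_h, margin, image_w, image_h):
    """R5: box (left, top, right, bottom) around the characters, clamped to the image."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    left = max(0, min(xs) - margin)
    top = max(0, min(ys) - margin)
    right = min(image_w, max(xs) + char_w + margin)
    bottom = min(image_h, max(ys) + char_h + margin)
    return left, top, right, bottom


def extract_area(image, box):
    """R6: crop a list-of-rows image to the box; right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

Clamping to the image bounds matters for keywords near the edge of a form, where the margin would otherwise extend outside the scanned page.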

[0059] At R8, if any retrieved document number has not yet been displayed, the process returns to R3 and display processing starts for that document number. For example, when two document numbers are retrieved, the display screen 18a shows two area image data 24, one keyword 20, and two document numbers 21.

[0060] In this way, the display screen 18a shows only the area image data 24 of the predetermined area surrounding the character string (word) corresponding to the keyword, and image data of portions that do not contain that character string is not displayed. The operator can therefore easily find the character string (word) corresponding to the keyword in the displayed area image data 24 and, as a result, can quickly judge whether the document 1a is needed.

[0061] Furthermore, as shown in FIG. 12(b), the technique used in the [highlighted display mode] described above can be applied to highlight the image data 23 of the character string (word) corresponding to the keyword within the area image data 24 displayed on the display screen 18a.

[0062] In this case, the operator can immediately identify the character string (word) corresponding to the keyword within the area image data 24, which further improves the search efficiency described above.

[0063] FIG. 9 is a flowchart showing the search processing when the display mode is set to [area designation display mode]. When a keyword is entered from the search input unit 15 at S1 and a designated area in the document 1a is then entered from the search input unit 15 at S2, the database search unit 16 searches the full-text search database 14 for the one or more document numbers of documents 1a that contain the character string (word) of the keyword (S3).

[0064] The designated area may be information indicating an approximate position in the document 1a, such as the upper half, lower half, right half, left half, or central part, or information indicating a specific position, such as the first line or the last line.

[0065] Next, at S4, the document image data of the document 1a corresponding to a retrieved document number is read from the image file 6. Next, at S5, the (x, y) coordinates indicating the character positions of the character string (word) corresponding to the keyword are read from the character position table 8a for the corresponding document number in the character position file 8.

[0066] Then, at S6, the read character positions are compared with the previously entered designated area. If a read character position falls within the designated area (S7), the process proceeds to S8, where the previously read document image data, the keyword, and the document number are displayed on the display screen 18a of the display unit 18 (S8).
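The comparison at S6 and S7 can be sketched as a point-in-rectangle test. The sketch covers only the approximate designated areas (upper, lower, left, and right halves, and the center), not line-based ones such as the first or last line. The fractional bounds, in particular the extent of `"center"`, the assumption that y grows downward, and all names are illustrative choices rather than details from the text.

```python
# Designated areas as (left, top, right, bottom) fractions of the page.
# The bounds chosen for "center" are an assumption made for this sketch.
AREAS = {
    "upper_half": (0.0, 0.0, 1.0, 0.5),
    "lower_half": (0.0, 0.5, 1.0, 1.0),
    "left_half": (0.0, 0.0, 0.5, 1.0),
    "right_half": (0.5, 0.0, 1.0, 1.0),
    "center": (0.25, 0.25, 0.75, 0.75),
}


def in_designated_area(position, area, image_w, image_h):
    """S6/S7: test whether one character position lies in the designated area."""
    x, y = position
    left, top, right, bottom = AREAS[area]
    return (left * image_w <= x < right * image_w
            and top * image_h <= y < bottom * image_h)


def documents_to_display(hits, area, image_sizes):
    """Keep the document numbers whose keyword occurs inside the designated area.

    hits maps document number -> list of (x, y) keyword character positions.
    """
    return [
        doc_number
        for doc_number, positions in sorted(hits.items())
        if any(in_designated_area(p, area, *image_sizes[doc_number]) for p in positions)
    ]
```

Expressing the areas as fractions lets the same designation apply to scanned pages of different pixel sizes.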

[0067] Next, at S9, the technique used in the [highlighted display mode] described above is applied to highlight the image data of the character string (word) corresponding to the keyword within the document image data displayed on the display screen 18a.

[0068] At S10, if any retrieved document number has not yet been displayed, the process returns to S4 and display processing starts for that document number. For example, when two document numbers are retrieved, the display screen 18a shows two document image data, one keyword 20, and two document numbers 21.

[0069] When the position in the document of the keyword being searched for can be identified to some extent, as with the form 1, designating an area limits the number of document numbers retrieved. This narrows the range from which the operator must select the needed document and, as a result, improves search efficiency.

[0070] In the full-text search system configured in this way, the display screen 18a of the display unit 18 shows, in addition to the usual retrieved document number 21 and keyword 20, the document image data 19, 22 of the document 1a corresponding to that document number.

[0071] The operator can therefore check on the display screen 18a the content of the document 1a that contains the character string (word) corresponding to the keyword 20, and can judge quickly and easily whether the displayed document 1a is the one needed. As a result, search efficiency improves greatly.

[0072] Furthermore, the [reduced display mode] shown in FIG. 10, the [highlighted display mode] shown in FIG. 11, the [area extraction display mode] shown in FIG. 12, and the [area designation display mode] (not illustrated) are provided as [display modes] for presenting the search result on the display screen 18a of the display unit 18. By designating the most suitable [display mode] as needed and then performing the search, the operator can carry out search processing efficiently in the way that suits them best.

[0073] The present invention is not limited to the embodiment described above. For example, the functions of the units constituting the full-text search system shown in FIG. 1 (the image management unit, character position detection unit, correspondence setting unit, text management unit, database registration unit, search result editing unit, and database search unit) may be implemented as a program and written in advance to a recording medium such as a CD-ROM. When this CD-ROM is loaded into a computer equipped with a CD-ROM drive and the computer loads the program from the CD-ROM, the same functions as in the above embodiment can be realized.

[0074] Besides the CD-ROM mentioned above, the recording medium may be a magnetic tape, DVD-ROM, floppy disk, MO, MD, CD-R, memory card, or the like.

[0075]

[Effects of the Invention] As described above, the full-text search system and the recording medium storing the program according to the present invention hold the image data of each document registered in the database as an image file. The document designation information obtained as the search result and the image data of the corresponding document are displayed together on the display screen.

[0076] Therefore, even for a document in form layout, the operator can easily grasp where in the document the character string corresponding to the keyword is located and can easily judge whether the retrieved document is the one needed, so search efficiency improves greatly.

[0077] Furthermore, the image data of the character string corresponding to the keyword within the document image data displayed on the display screen is highlighted, for example in a different color. The operator can therefore identify the character string corresponding to the keyword immediately, at a glance.

[0078] In addition, a search area for the keyword within the document can be designated, which limits the number of document numbers retrieved. This narrows the range from which the operator must select the needed document and, as a result, improves search efficiency.

[0079] Furthermore, only the image data of a predetermined area containing the character string corresponding to the keyword is displayed out of the document image data for the retrieved document number. The operator can therefore find the character string (word) corresponding to the keyword in the displayed image data in a shorter time, and search efficiency improves greatly.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a full-text search system according to an embodiment of the present invention.

FIG. 2 is a diagram showing a form on which a document to be registered in the full-text search system is written.

FIG. 3 is a diagram showing the contents stored in the character position table formed in the full-text search system.

FIG. 4 is a diagram showing the contents stored in the management table formed in the full-text search system.

FIG. 5 is a diagram showing the contents stored in the full-text search database formed in the full-text search system.

FIG. 6 is a flowchart showing the search processing of the full-text search system when the reduced display mode is set.

FIG. 7 is a flowchart showing the search processing of the full-text search system when the highlighted display mode is set.

FIG. 8 is a flowchart showing the search processing of the full-text search system when the area extraction display mode is set.

FIG. 9 is a flowchart showing the search processing of the full-text search system when the area designation display mode is set.

FIG. 10 is a diagram showing the search results displayed by the full-text search system when the reduced display mode is set.

FIG. 11 is a diagram showing the search results displayed by the full-text search system when the highlighted display mode is set.

FIG. 12 is a diagram showing the search results displayed by the full-text search system when the area extraction display mode is set.

[Explanation of symbols]

1: form, 1a: document, 2: image reading device, 3: character recognition device, 5: document number input unit, 6: image file, 8: character position file, 8a: character position table, 10: management file, 10a: management table, 12: document file, 14: full-text search database, 15: search input unit, 16: database search unit, 17: search result editing unit, 18: display unit, 18a: display screen, 19: reduced document image data, 20: keyword, 21: document number, 22: document image data, 24: area image data

Continued from the front page
(72) Inventor: Yoshiaki Suzuki, c/o Toshiba Corporation Fuchu Plant, 1 Toshiba-cho, Fuchu-shi, Tokyo
(72) Inventor: Shinya Sugiyama, c/o Toshiba Corporation Fuchu Plant, 1 Toshiba-cho, Fuchu-shi, Tokyo
(72) Inventor: Tsunaki Hamamoto, c/o Toshiba Corporation Fuchu Plant, 1 Toshiba-cho, Fuchu-shi, Tokyo
(72) Inventor: Naoya Wada, c/o Toshiba Corporation Fuchu Plant, 1 Toshiba-cho, Fuchu-shi, Tokyo
(72) Inventor: Tomohide Sugaya, c/o Toshiba Corporation Fuchu Plant, 1 Toshiba-cho, Fuchu-shi, Tokyo

Claims

[Claims]

1. A full-text search system comprising: an image file that stores, for each document, document image data of the document; a database that stores each character string included in each document in association with document designation information; a character position table that stores, for each document, the relationship between each character included in the document and the position of that character in the document image data; search means for searching the database for the document designation information of a document that includes a character string corresponding to an input keyword; document image data reading means for reading, from the image file, the document image data corresponding to the retrieved document designation information; search result display means for displaying the read document image data and the retrieved document designation information; and highlighting means for reading, from the character position table, the position of each character constituting the character string corresponding to the keyword in the displayed document image data, and highlighting the image data at the read character positions in the displayed document image data.

2. The full-text search system according to claim 1, wherein, when the search means retrieves a plurality of pieces of document designation information, the document image data reading means reads the document image data corresponding to each piece of document designation information, and the search result display means displays the plurality of read document image data and the plurality of retrieved pieces of document designation information on a single display screen.

3. A full-text search system comprising: an image file that stores, for each document, document image data of the document; a database that stores each character string included in each document in association with document designation information; a character position table that stores, for each document, the relationship between each character included in the document and the position of that character in the document image data; search means for searching the database for the document designation information of a document that includes a character string corresponding to an input keyword; document image data reading means for reading, from the image file, the document image data corresponding to the retrieved document designation information; area image data extracting means for reading, from the character position table, the position of each character constituting the character string corresponding to the keyword in the read document image data, and extracting, from the read document image data, area image data of a predetermined area including the read character positions; and search result display means for displaying the extracted area image data and the retrieved document designation information.

4. The full-text search system according to claim 3, further comprising highlighting means for highlighting the image data at the read character position in the area image data displayed by the search result display means.

5. A full-text search system comprising: an image file that stores, for each document, document image data of the document; a database that stores each character string included in each document in association with document designation information; a character position table that stores, for each document, the relationship between each character included in the document and the position of that character in the document image data; input means for inputting a keyword and a designated area in the document image data; search means for searching the database for the document designation information of a document that includes a character string corresponding to the keyword input via the input means; document image data reading means for reading, from the image file, the document image data corresponding to the retrieved document designation information; character position reading means for reading, from the character position table, the position of each character constituting the character string corresponding to the keyword in the read document image data; and search result display means for displaying the read document image data and the retrieved document designation information when the read character position is located within the input designated area.

6. The full-text search system according to claim 5, further comprising highlighting means for highlighting the image data at the read character position within the designated area in the document image data displayed by the search result display means.

7. A computer-readable recording medium on which is recorded a program for realizing: a function of storing, for each document, document image data of the document in first storage means; a function of storing each character string included in each document in second storage means in association with document designation information; a function of storing, for each document, the relationship between each character included in the document and the position of that character in the document image data in third storage means; a function of searching the second storage means for the document designation information of a document that includes the character string corresponding to an input keyword; a function of reading, from the first storage means, the document image data corresponding to the retrieved document designation information; a function of displaying the read document image data and the retrieved document designation information; and a function of reading, from the third storage means, the position of each character constituting the character string corresponding to the keyword in the displayed document image data, and highlighting the image data at the read character positions in the displayed document image data.

8. A computer-readable recording medium on which is recorded a program for realizing: a function of storing, for each document, document image data of the document in first storage means; a function of storing each character string included in each document in second storage means in association with document designation information; a function of storing, for each document, the relationship between each character included in the document and the position of that character in the document image data in third storage means; a function of searching the second storage means for the document designation information of a document that includes the character string corresponding to an input keyword; a function of reading, from the first storage means, the document image data corresponding to the retrieved document designation information; a function of reading, from the third storage means, the position of each character constituting the character string corresponding to the keyword in the read document image data, and extracting, from the read document image data, area image data of a predetermined area including the read character positions; and a function of displaying the extracted area image data and the retrieved document designation information.

9. A computer-readable recording medium on which is recorded a program for realizing: a function of storing, for each document, document image data of the document in first storage means; a function of storing each character string included in each document in second storage means in association with document designation information; a function of storing, for each document, the relationship between each character included in the document and the position of that character in the document image data in third storage means; a function of receiving input of a keyword and a designated area in the document image data; a function of searching the second storage means for the document designation information of a document that includes the character string corresponding to the input keyword; a function of reading, from the first storage means, the document image data corresponding to the retrieved document designation information; and a function of displaying the read document image data and the retrieved document designation information when the position, in the read document image data, of the character string corresponding to the keyword is located within the input designated area.