JPH0785080A

JPH0785080A - System for retrieving all document

Info

Publication number: JPH0785080A
Application number: JP5186925A
Authority: JP
Inventors: Mayumi Uchida; 眞由美内田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-06-30
Filing date: 1993-06-30
Publication date: 1995-03-31

Abstract

PURPOSE: To provide a full document search system that can preferentially retrieve characters whose size, thickness, or font has been changed, and in which the priority order is easy to set. CONSTITUTION: The system has a character type discriminating unit 11 that discriminates character types, such as characters whose size, thickness, or font has been changed and underlined characters, adds the result to the character information as character type information, and stores it in a document text file 9. It also has a character type extraction unit 15 that extracts the character types discriminated by unit 11, a priority setting unit 21 that sets a priority order for the extracted character types, and a search unit 23 that, following the set priority order, searches the character information to which the character type information of each priority has been added and retrieves the documents that include the required sentences, words, characters, and symbols.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to an apparatus that searches text read in by a character recognition device or the like, or documents created on a computer or word processor, for documents containing a desired sentence, word, or the like. In particular, it relates to a full document search system that can change the order of the search according to character size, thickness, font, and so on.

[0002]

[Prior Art] A full document search system is conventionally known as a system that searches the text files of a large number of documents for documents containing a desired sentence, word, character, or symbol and outputs them to the user. These text files are of two kinds. The first is produced by having a character recognition device read a document containing sentences, words, characters, symbols, and the like written on paper or a similar medium, determining what each character and symbol in the resulting image data is, and storing each result as character information. The second stores text created on a computer or word processor.

[0003]

[Problems to Be Solved by the Invention] A sentence, word, or the like in a text is sometimes set in a different size, thickness, or font from the other characters, or underlined, because it is important. In a conventional full document search system, however, the only way to have documents containing such text retrieved preferentially is to attach keywords to it when the document is registered, which is laborious.

[0004] The present invention has been made in view of the above circumstances. Its object is to provide a full document search system that can preferentially and quickly retrieve documents containing sentences, words, and the like whose size, thickness, or font differs from that of the other characters, or that are underlined, and in which the priority order is easy to set.

[0005]

[Means for Solving the Problems] To achieve the above object, the present invention is a full document search system that searches a plurality of documents stored in a text memory for documents containing a desired sentence, word, character, or symbol, and is characterized by comprising: character type discriminating means that discriminates character types, such as the size, thickness, and font of characters and underlined characters, and stores the discrimination result in the text memory as character type information in association with the corresponding character information; priority setting means that sets a search priority for each character type discriminated by the character type discriminating means; and search means that, when a sentence, word, character, or symbol serving as the search criterion is specified, searches, in accordance with the set priority order, the character information associated with the character type information of each priority and, when character information of the specified sentence, word, character, or symbol is found, retrieves the document containing it.

[0006]

[Operation] With the above configuration, the character type discriminating means discriminates character types, such as the size, thickness, and font of characters and underlined characters, and stores the discrimination result as character type information in association with the corresponding character information. The priority setting means then sets a search priority for each discriminated character type. When a sentence, word, character, or symbol serving as the search criterion is specified, the search means searches, in accordance with that priority order, the character information associated with the character type information of each priority and retrieves the documents containing the desired sentence, word, character, or symbol.

[0007]

[Embodiment] FIG. 1 is a block diagram showing one embodiment of the full document search system of the present invention.

[0008] As shown in FIG. 1, the full document search system 1 of this embodiment comprises: a character recognition device 13 made up of an image scanner 5 that reads text such as sentences, words, characters, and symbols written on paper or the like and stores the resulting image data in a document image file 3, a character discriminating unit 7 that determines what each character and symbol stored in the document image file 3 is and stores the result as character information in a document text file 9, and a character type discriminating unit 11 that discriminates the character type, such as the size, thickness, and font, of each character stored in the document image file 3 and adds it as character type information to the character information for that character; a character type extraction unit 15 that extracts the character types discriminated by the character type discriminating unit 11; a data output unit 17 that outputs the character types obtained by the character type extraction unit 15 and the search results to the user; a data input unit 19 used to enter the priority order and the like; a priority setting unit 21 that sets the search priority for each item of character type information based on the priority entered for each character type from the data input unit 19; and a search unit 23 that searches the plurality of documents stored in the document text file 9, in accordance with the priority order set in the priority setting unit 21, through the character information to which the character type information of each priority has been added, and retrieves documents containing the desired sentence, word, character, or symbol.

[0009] Next, the operation of this embodiment will be described with reference to the flowchart of FIG. 2.

[0010] First, the user has the image scanner 5 of the character recognition device 13 read the text to be searched, and the text is stored in the document image file 3 (step ST1).

[0011] The character discriminating unit 7 then determines what each character and symbol stored in the document image file 3 is, and stores the result as character information in the document text file 9 (step ST3).

[0012] This character discrimination is performed, for example, by a pattern matching method, which identifies a character by matching it against character patterns registered in advance, or by a structural analysis method, which analyzes the line elements of a character, extracts features such as loops, end points, and concavities and convexities, and identifies the character from them.
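The pattern matching method named above can be illustrated with a minimal sketch. It assumes each character is a small binary bitmap of fixed size and that "matching" means choosing the registered pattern with the fewest differing pixels. The bitmap representation, the distance measure, and all names (`Bitmap`, `hamming_distance`, `match_character`, `TEMPLATES`) are assumptions made for illustration and are not taken from the patent.

```python
from typing import Dict, List

# Hypothetical representation: rows of 0 (background) and 1 (ink) pixels.
Bitmap = List[List[int]]


def hamming_distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def match_character(glyph: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Return the label of the registered pattern closest to the glyph."""
    return min(templates, key=lambda label: hamming_distance(glyph, templates[label]))


# Two toy 3x3 patterns registered in advance (illustrative only).
TEMPLATES: Dict[str, Bitmap] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

# A scanned "L" with one pixel missing.
noisy_l: Bitmap = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
recognized = match_character(noisy_l, TEMPLATES)
```

A practical recognizer would normalize glyph size and position before comparing; that step is omitted here.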

[0013] The character type discriminating unit 11 then discriminates the character type, such as the size, thickness, and font, of each character stored in the document image file 3, and adds the result as character type information to the character information for that character (step ST5).

[0014] The size of a character is determined, for example, from the image data obtained by the image scanner 5, using the height and width of the smallest rectangle or square that encloses the character, as shown in FIG. 3. The font and thickness are determined by, for example, matching the image data obtained by the image scanner 5 against patterns registered in advance. An underline is detected from the image data obtained by the image scanner 5 according to whether a line exists below the character. These character types need not be determined by the methods above; other methods may be used.
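The size and underline checks described above can be sketched as follows. This is only an illustration under stated assumptions: each character cell is a binary bitmap, the scan resolution (`pixels_per_mm`) is known, and an underline is taken to be a nearly full row of ink at the bottom of the cell separated from the glyph by a blank row. The threshold `min_fill` and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: rows of 0 (background) and 1 (ink) pixels.
Bitmap = List[List[int]]


def bounding_box(bitmap: Bitmap) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the smallest rectangle enclosing all ink."""
    if not bitmap:
        return None
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(bitmap[0])) if any(row[c] for row in bitmap)]
    return rows[0], cols[0], rows[-1], cols[-1]


def character_size_mm(bitmap: Bitmap, pixels_per_mm: float) -> Optional[Tuple[float, float]]:
    """Return (height, width) of the enclosing rectangle in millimetres."""
    box = bounding_box(bitmap)
    if box is None:
        return None
    top, left, bottom, right = box
    return (bottom - top + 1) / pixels_per_mm, (right - left + 1) / pixels_per_mm


def has_underline(cell: Bitmap, min_fill: float = 0.9) -> bool:
    """Assumed heuristic: the lowest inked row is nearly full and has a blank row above it."""
    inked = [r for r, row in enumerate(cell) if any(row)]
    if not inked:
        return False
    last = inked[-1]
    nearly_full = sum(cell[last]) >= min_fill * len(cell[last])
    gap_above = last > 0 and not any(cell[last - 1])
    return nearly_full and gap_above


PLAIN: Bitmap = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
UNDERLINED: Bitmap = PLAIN + [[1, 1, 1, 1, 1, 1]]
```

In this sketch the size is measured on the glyph without its underline; how the two are separated in a real scan is left open.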

[0015] One way to add the character type information to the character information is shown in FIG. 5: a character type information storage unit 9b for holding character type information is provided in advance at the head of the character information storage unit 9a, in which the character information of the document text file 9 is stored, and the character type information corresponding to the character type discriminated by the character type discriminating unit 11 is stored in this character type information storage unit 9b.
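One possible in-memory analogue of this layout is a record whose first field holds the character type information and whose second field holds the character information. The sketch below is illustrative only; the class name `CharRecord`, the helper `add_type_info`, and the string labels used for character types (`"underline"`, `"size:8mm"`) are assumptions, and the patent does not specify an encoding.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CharRecord:
    # Field order mirrors the described layout: type information first, text second.
    char_types: FrozenSet[str]  # plays the role of storage unit 9b
    text: str                   # plays the role of storage unit 9a


def add_type_info(text: str, *char_types: str) -> CharRecord:
    """Attach the discriminated character types to a run of character information."""
    return CharRecord(frozenset(char_types), text)


# A document stored as a sequence of typed runs (hypothetical example data).
document: List[CharRecord] = [
    add_type_info("Search ", "underline"),
    add_type_info("system", "underline", "size:8mm"),
    add_type_info(" overview"),
]
plain_text = "".join(record.text for record in document)
```

Grouping characters into runs that share the same types is a design choice of this sketch; the source attaches type information to character information without stating the granularity.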

[0016] When the plurality of documents stored in the document text file 9 in this way is searched for documents containing a desired sentence, word, character, or symbol, the character type extraction unit 15 searches the character type information storage unit 9b to extract the character types present in the document text file 9 (step ST7) and outputs them to the user as shown in FIG. 6 (step ST9). Although the character type extraction unit 15 here extracts the character types by searching the character type information storage unit 9b, the system may instead retain each character type as the character type discriminating unit 11 discriminates it and output the retained character types to the user when a search is performed.
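The extraction step (ST7) amounts to collecting the distinct character types found in the stored type information. A minimal sketch, assuming each document is a list of `(character types, text)` pairs; this layout, the labels, and the function name `extract_char_types` are hypothetical:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: each record pairs its character types with its text.
Record = Tuple[Set[str], str]


def extract_char_types(documents: Dict[str, List[Record]]) -> List[str]:
    """Collect every distinct character type present in the stored type information."""
    found: Set[str] = set()
    for records in documents.values():
        for char_types, _text in records:
            found |= char_types
    return sorted(found)  # sorted only so that the listing shown to the user is stable


DOCUMENTS: Dict[str, List[Record]] = {
    "doc_a": [({"underline"}, "system"), (set(), " manual")],
    "doc_b": [({"gothic", "size:8mm"}, "Overview")],
}
available_types = extract_char_types(DOCUMENTS)
```

The alternative mentioned in the text, retaining each type as it is discriminated, would replace this scan with a set that is updated during step ST5.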

[0017] Once the character types have been output, the user decides the search priority for the output character types, as shown in FIG. 7, and enters it from the data input unit 19.

[0018] The priority setting unit 21 sets the search priority order based on the entered priorities (step ST11). After that, when the user specifies the sentence, word, character, or symbol that serves as the search criterion, the search unit 23 performs the search in accordance with the priority order set in the priority setting unit 21 (step ST13).

[0019] Suppose, for example, that priority 1 is set to "underlined characters", priority 2 to "characters 8 mm in size", and priority 3 to "Gothic characters", as shown in FIG. 7 (no priority is set for the other character types, which are all searched on an equal footing). When a search for documents containing the word "system" is started in this state, the search unit 23 first looks for documents containing "system" in priority 1 "underlined characters". To do so, it searches only the character information for which the character type information "underlined characters" is stored in the character type information storage unit 9b, and only when "system" is found does it output the entire document to the data output unit 17. Next, it searches in the same way for documents containing "system" in priority 2 "characters 8 mm in size", and then for documents containing "system" in priority 3 "Gothic characters". Once the searches for documents containing "system" at priorities 1, 2, and 3 are finished, it searches for documents containing "system" regardless of character type, as in a conventional system.
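The search order in this example can be sketched as one pass per priority followed by a final pass that ignores character type. The sketch rests on several assumptions that the source does not state: documents are lists of `(character types, text)` runs, a priority pass matches only when the query lies within a single run, and a document already output in an earlier pass is not output again. The names `prioritized_search` and `CORPUS` and the type labels are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Record = Tuple[Set[str], str]      # (character types, text), hypothetical layout
Corpus = Dict[str, List[Record]]


def prioritized_search(corpus: Corpus, query: str, priorities: Sequence[str]) -> List[str]:
    """Return document ids in the order they would be output."""
    results: List[str] = []

    def emit(doc_id: str) -> None:
        # Assumption: each document is reported once, at its best priority.
        if doc_id not in results:
            results.append(doc_id)

    # One pass per priority: look only at text carrying that character type.
    for char_type in priorities:
        for doc_id, records in corpus.items():
            if any(char_type in types and query in text for types, text in records):
                emit(doc_id)

    # Final pass: conventional search regardless of character type.
    for doc_id, records in corpus.items():
        if query in "".join(text for _types, text in records):
            emit(doc_id)
    return results


CORPUS: Corpus = {
    "doc_a": [(set(), "A plain note about the system.")],
    "doc_b": [({"gothic"}, "system"), (set(), " design notes")],
    "doc_c": [(set(), "The "), ({"underline"}, "system"), (set(), " manual")],
    "doc_d": [({"size:8mm"}, "system overview")],
    "doc_e": [({"underline"}, "unrelated heading")],
}
hits = prioritized_search(CORPUS, "system", ["underline", "size:8mm", "gothic"])
```

With this toy corpus the underlined occurrence is reported first and the plain occurrence last, while `doc_e` is never reported because its underlined text does not contain the query.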

[0020] As described above, in this embodiment the character discriminating unit 7 and the character type discriminating unit 11 discriminate the characters and character types from the document image data obtained by the image scanner 5 to obtain character information and character type information, and the character type information is added to the corresponding character information and stored in the document text file 9. When a sentence, word, character, or symbol serving as the search criterion is specified, the character information to which the character type information of each priority has been added is searched in accordance with the priority order set in the priority setting unit 21, and documents containing the specified sentence, word, character, or symbol are retrieved. Documents containing sentences, words, and the like whose size, thickness, or font differs from that of the other characters, or that are underlined, can therefore be retrieved preferentially, quickly, and without omission. In addition, the user can decide this priority order freely and easily with the data input unit 19, based on the character types output from the data output unit 17.

[0021] In this embodiment, sentences, words, characters, and symbols written on paper or the like are read by the image scanner 5 and stored in the document image file 3, after which the character discriminating unit 7 and the character type discriminating unit 11 discriminate each character and each character type to obtain the character information and character type information. For text whose character information already exists, such as text created on a computer, processing starts from the character type discrimination (step ST5). For text for which both character information and character type information exist, such as text created on a word processor, processing starts from the extraction of character types by the character type extraction unit 15 (step ST7). Furthermore, if this full document search system 1 has previously performed a search, so that the document text file 9 holds both the character information and the character type information and the character types have also been stored, processing starts from the output of the character types to the user (step ST9).

[0022] Furthermore, in this embodiment, the sentences, words, characters, and symbols written on paper or the like are read by the image scanner 5 and stored in the document image file 3, each character is then discriminated by the character discriminating unit 7, and each character type is discriminated afterwards by the character type discriminating unit 11. The character types may, however, be discriminated at the same time as the characters. Moreover, each character, or each character together with its character type, may be recognized at the same time as the image scanner 5 reads the sentences, words, characters, and symbols written on the paper or the like.

[0023] Furthermore, in this embodiment the character information and the character type information are stored in a single document text file 9 as shown in FIG. 5, but they may instead be stored in separate files. In that case, the character type information and the character information are kept in correspondence by, for example, storing the address of the corresponding character information alongside each item of character type information.
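The separate-file variant can be sketched by keeping the character information as plain text and the character type information as a list of entries that each carry the address of the text they describe. The source mentions only an address; the `length` field, the entry layout, and the names `TypeEntry` and `text_with_type` are assumptions added so the example is complete.

```python
from typing import List, NamedTuple, Set


class TypeEntry(NamedTuple):
    address: int          # offset of the first character in the character information
    length: int           # assumed extra field: number of characters covered
    char_types: Set[str]  # character types of that span


def text_with_type(char_info: str, type_info: List[TypeEntry], char_type: str) -> List[str]:
    """Follow the stored addresses to fetch the text that carries a given character type."""
    return [
        char_info[entry.address:entry.address + entry.length]
        for entry in type_info
        if char_type in entry.char_types
    ]


# Contents of the two hypothetical files.
char_info = "The system manual"
type_info = [
    TypeEntry(address=4, length=6, char_types={"underline"}),
    TypeEntry(address=11, length=6, char_types={"gothic"}),
]
underlined = text_with_type(char_info, type_info, "underline")
```

Keeping the type information separate leaves the character information usable by a conventional search that ignores character type.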

[0024] Furthermore, in this embodiment the character types extracted by the character type extraction unit 15 are output to the user from the data output unit 17, and the user decides the search priority based on them. The invention is not limited to this: a table listing the character types that this full document search system 1 may search for, together with their corresponding priorities, may be set in the search unit 23 in advance, and the priority of each character type extracted by the character type extraction unit 15 may be determined from that table.
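A preset table of this kind can be sketched as a mapping from character type to priority number, applied to whatever types were extracted. The table contents, the type labels, and the name `order_by_table` are hypothetical; extracted types missing from the table are simply left without a priority here, which is an assumption.

```python
from typing import Dict, Iterable, List

# Hypothetical preset table: character type -> priority (1 is searched first).
PRESET_PRIORITY: Dict[str, int] = {"underline": 1, "size:8mm": 2, "gothic": 3}


def order_by_table(extracted: Iterable[str], table: Dict[str, int]) -> List[str]:
    """Order the extracted character types by the priorities listed in the table."""
    return sorted((t for t in extracted if t in table), key=table.__getitem__)


search_order = order_by_table(["gothic", "italic", "underline"], PRESET_PRIORITY)
```

The resulting list can then be used in place of priorities entered by the user.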

[0025] Furthermore, in this embodiment a search priority is set for each individual character type, but the invention is not limited to this: a priority can also be set for a combination of character types, such as "characters that are 8 mm in size and underlined".
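A combined priority can be sketched by treating each priority level as a set of character types that must all be present on the matching text. The record layout, the type labels, and the name `matches_at_priority` are assumptions for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

Record = Tuple[Set[str], str]  # (character types, text), hypothetical layout


def matches_at_priority(records: List[Record], query: str, required: FrozenSet[str]) -> bool:
    """True if some run contains the query and carries every required character type."""
    return any(required <= types and query in text for types, text in records)


RECORDS: List[Record] = [
    ({"underline"}, "system"),
    ({"underline", "size:8mm"}, "index"),
]
BIG_AND_UNDERLINED = frozenset({"underline", "size:8mm"})
```

Here "index" satisfies the combined condition, while "system" is underlined only and would be found at a lower, single-type priority.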

[0026]

[Effects of the Invention] As described above, according to the present invention, character types such as character size, thickness, and font are discriminated, the discrimination result is stored as character type information in association with the character information, and a priority is set for each discriminated character type. When a sentence, word, character, or symbol serving as the search criterion is specified, the character information associated with the character type information of each priority is searched in accordance with that priority order, and documents containing the specified sentence, word, character, or symbol are retrieved. Documents containing sentences, words, and the like whose size, thickness, or font differs from that of the other characters, or that are underlined, can therefore be retrieved preferentially, quickly, and without omission, and the priority order can be set easily.

[Brief description of drawings]

FIG. 1 is a block diagram showing one embodiment of the full document search system of the present invention.

FIG. 2 is a flowchart showing the operation of the embodiment shown in FIG. 1.

FIG. 3 is an explanatory diagram showing the method for determining character size.

FIG. 4 is a diagram showing an example of a method of adding character type information to character information.

FIG. 5 is a diagram showing an example of the character types output to the user.

FIG. 6 is a diagram showing an example of the priority order entered by the user.

[Explanation of symbols]

1 full document search system
3 document image file
5 image scanner
7 character discriminating unit
9 document text file
11 character type discriminating unit
13 character recognition device
15 character type extraction unit
17 data output unit
19 data input unit
21 priority setting unit
23 search unit

Claims

[Claims]

1. A full document search system for searching a plurality of documents stored in a text memory for a document including a desired sentence, word, character, or symbol, the system comprising: character type discriminating means for discriminating character types, such as the size, thickness, and font of characters and underlined characters, and storing the discrimination result as character type information in the text memory in association with the corresponding character information; priority setting means for setting a search priority for each character type discriminated by the character type discriminating means; and search means for, when a sentence, word, character, or symbol serving as the search criterion is specified, searching, in accordance with the set priority order, the character information corresponding to the character type information of each priority and, when character information of the specified sentence, word, character, or symbol exists, retrieving the document including it.