JPH09237317A

JPH09237317A - General document reader

Info

Publication number: JPH09237317A
Application number: JP8043053A
Authority: JP
Inventors: Yukiko Chiba; 由紀子千葉
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-02-29
Filing date: 1996-02-29
Publication date: 1997-09-09

Abstract

PROBLEM TO BE SOLVED: To make correction of a character recognition result easy by improving the recognition rate of a general document reader. SOLUTION: The reader consists of a scanning unit 4 which scans and photoelectrically converts the character areas, and the character portions of the table areas, obtained by dividing a document image into character, table, ruled-line and graphic areas, and outputs an image signal; a character recognition unit 5 which expresses the similarity between the image and previously registered standard character shapes as a distance value and selects the characters whose distance values fall within a prescribed range as candidate characters in ascending order of distance value; and an output forming unit 6 which lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit. The character recognition result is output based on an arbitrary character string specified by an operator and the character type of that character string.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a general document reader that recognizes the characters making up a general document, and more particularly to a general document reader that can recognize a specified range using a specified character type.

[0002]

[Prior Art] In a conventional general document reader, the document image to be read is generally divided, automatically or manually, into character, table, ruled-line, figure, and photograph areas, and character recognition is performed on the character areas and on the character portions of the table areas. In character recognition, the standard character shapes, which are the character patterns registered in a recognition dictionary, are compared with the image data cut out as a character, and a character recognition result is selected. A recognition dictionary for reading Japanese text, for example, contains kanji, hiragana, katakana, numerals, alphabetic characters, symbols, and so on.

[0003]

[Problems to Be Solved by the Invention] In the character recognition of a conventional general document reader, pairs such as the lowercase letter "l" and the numeral "1", the letters "o" and "O" and the numeral "0", and the katakana "メ" and the symbol "×" belong to different character types but have very similar shapes, so they are often misrecognized. As a result, numerals end up mixed into alphabetic strings, for example. Because the correct character and the misrecognized character look alike, errors are easy to overlook during correction, which makes correction difficult.

[0004] As described below, the present invention improves the recognition rate and simplifies correction of the recognition result. First, for a portion that the general document reader has identified as characters, the recognition rate is improved by recognizing a character string designated by the operator using the character type designated by the operator. Second, when the character recognition result of the general document reader is corrected, correction is made easier by recognizing the operator-designated character string in that result again using the operator-designated character type. Third, when the character recognition result is corrected, the operator designates a range and a character type in that result, and correction is made easier by selecting, from the candidate characters at each character position in the range, only those of the designated character type.

[0005]

[Means for Solving the Problems] The present invention is a general document reader comprising: a scanning unit that scans and photoelectrically converts the character areas, and the character portions of the table areas, obtained by dividing a document image into areas such as characters, tables, ruled lines, figures, and photographs, and outputs an image signal; a character recognition unit that expresses the degree of similarity between the image and pre-registered standard character shapes as a distance value and selects the characters whose distance values fall within a predetermined range as candidate characters in ascending order of distance value; and an output forming unit that lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit. The reader is characterized in that it outputs the character recognition result based on an arbitrary character string designated by an operator and information on the character type of that character string.

[0006]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment] FIG. 1 is a block diagram showing the configuration of the embodiment. In the figure, reference numeral 1 denotes a form to be read. Reference numeral 2 denotes an image input device, which captures the form 1 as a document image. Reference numeral 3 denotes a layout analysis unit, which divides the document image, automatically or manually, into areas such as characters, tables, ruled lines, figures, and photographs.

[0007] Reference numeral 4 denotes a scanning unit, which scans the areas that the layout analysis unit 3 has determined to be character areas, together with the character portions of the table areas, and transfers the image signal obtained by photoelectric conversion to the character recognition unit 5. The character recognition unit 5 forms a set of candidate characters and their distance values for each input character image and outputs the set to the output forming unit 6. The standard character shapes used to select candidate characters are registered in a recognition dictionary (not shown).

[0008] The output forming unit 6 lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit 5. Next, the processing procedure of the general document reader configured as above is described using a specific recognition target. First, the document image input from the image input device 2 is divided by the layout analysis unit 3, automatically or manually, into character, table, ruled-line, figure, and photograph areas. FIG. 2 is an explanatory diagram showing an example of an area recognized as characters.

[0009] The scanning unit 4 scans the areas that the layout analysis unit 3 has determined to be character areas, together with the character portions of the table areas, and transfers the image signal obtained by photoelectric conversion to the character recognition unit 5. The character recognition unit 5 calculates the distance between the shape of the input character and the standard shape of each character, forms a set of candidate characters and distance values arranged in ascending order of distance (that is, in order of shape similarity), and outputs the set to the output forming unit 6. The standard character shapes are registered in the recognition dictionary. A recognition dictionary for reading Japanese text, for example, contains kanji, hiragana, katakana, numerals, alphabetic characters, symbols, and so on. FIG. 3 is an explanatory diagram showing a recognition target and an example of character recognition.

[0010] The output forming unit 6 lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit 5. FIG. 4 is an explanatory diagram showing an output example containing misrecognition. In this example, "o" (lowercase letter) and "0" (numeral) belong to different character types, but it is difficult to judge which is correct, so the judgment made at character recognition time takes priority. A conventional device therefore outputs a result like the one shown in FIG. 4.

[0011] In this embodiment, after an image is captured and layout analysis has identified the character portions, the operator designates the image of any character string in which a recognition error is likely, and designates the character type of that string at the same time. The designated string is then recognized using only the designated character type, which improves the recognition rate.

[0012] FIG. 5 is an explanatory diagram showing an example of character string and character type designation on a document image; the character recognition of this embodiment is described further with reference to it. First, on the displayed image of the portion determined to be a character area, the operator designates the image of an arbitrary character string together with the character type of that image. In the example of FIG. 5, alphabetic, numeric, and katakana types are designated for the respective designated areas.

[0013] Next, character recognition is performed based on the designations in FIG. 5. For the image of a designated character string, character recognition uses only the standard character shapes of the designated character type. For character images outside the designations, character recognition uses all the standard character shapes registered in the dictionary. FIG. 6 is an explanatory diagram showing an example of character recognition according to this embodiment, performed based on these designations.

[0014] FIG. 7 is an explanatory diagram showing an output example of the embodiment, in which the output character string is selected from the character recognition result shown in FIG. 6. Because the character type of an arbitrary character string is designated before character recognition, recognition errors can be prevented. As described above, in this embodiment the recognition rate can be improved by recognizing the image of the operator-designated character string using the operator-designated character type.

[Second Embodiment] This embodiment uses the configuration of the first embodiment and is characterized in that, after character recognition, the operator can designate a character string in a portion where a recognition error has occurred, and the designated string is recognized again using the designated character type.

[0015] First, layout analysis is applied to the document image input from the image input device 2 to obtain the document image of FIG. 2. Character recognition is performed on this image using the scanning unit 4 and the character recognition unit 5, yielding the character recognition result of FIG. 3. The output forming unit 6 lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit 5, producing, for example, the output shown in FIG. 4. For example, "o" (lowercase letter) and "0" (numeral) belong to different character types, but it is difficult to judge which is correct. As a result, the judgment made at character recognition time takes priority, as in the example of FIG. 4.

[0016] In this embodiment, an image is captured and the portions determined by layout analysis to be characters are recognized. On the resulting recognition result, the operator designates an arbitrary character string whose character type should be made uniform, together with the character type of that string, and the string is recognized again using the designated character type. This makes the character recognition result easy to correct.

[0017] FIG. 8 is an explanatory diagram showing an example of character string and character type designation on a character recognition result. The character recognition of this embodiment is described below. First, on the character recognition result shown in FIG. 4, the operator designates an arbitrary character string together with the character type of that string. FIG. 8 shows an example in which arbitrary character strings are designated on the character recognition result of FIG. 4 and assigned the alphabetic, numeric, and katakana types respectively.

[0018] Next, character recognition is executed again based on the designations in FIG. 8. Only the character strings designated in FIG. 8 are recognized again, using the standard character shapes of the designated character type; characters outside the designations are not recognized again. FIG. 9 is an explanatory diagram showing the character recognition result of this embodiment, obtained by performing character recognition again based on the designations in FIG. 8. As a result, a correct output such as that shown in FIG. 7 can be obtained.

[0019] As described above, in this embodiment the operator designates a character string in a portion where a recognition error has occurred after character recognition, and the string is recognized again using the character type designated at the same time. This saves the operator the trouble of correcting characters one by one and makes the character recognition result easy to correct.

[Third Embodiment] This embodiment uses the configuration of the first embodiment and is characterized in that, after character recognition, the operator can designate a character string in a portion where a recognition error has occurred, and only the candidate characters of the designated character type are selected for the designated string.

[0020] First, layout analysis is applied to the document image input from the image input device 2 to obtain the document image of FIG. 2. Character recognition is performed on this image using the scanning unit 4 and the character recognition unit 5, yielding the character recognition result of FIG. 3. The output forming unit 6 lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit 5, producing, for example, the output shown in FIG. 4. "o" (lowercase letter) and "0" (numeral) belong to different character types, but it is difficult to judge which is correct. As a result, the judgment made at character recognition time takes priority, as in the example of FIG. 4.

[0021] In this embodiment, an image is captured and the portions determined by layout analysis to be characters are recognized. On the resulting recognition result, the operator designates an arbitrary character string whose character type should be made uniform, together with the character type of that string. Only the candidate characters of the designated character type are then selected for the string and displayed again as the character recognition result. This makes the character recognition result easy to correct.

[0022] The character recognition of this embodiment is described below. First, on the character recognition result shown in FIG. 4, the operator designates an arbitrary character string together with the character type of that string. FIG. 8 shows an example in which arbitrary character strings are designated on the character recognition result of FIG. 4 and assigned the alphabetic, numeric, and katakana types respectively. Next, based on the designations in FIG. 8, only the candidate characters that match the designated character type are selected and output as a new candidate character group. For characters outside the designations, no candidate selection by character type is performed.

[0023] FIG. 10 is an explanatory diagram of the candidate character selection of this embodiment, showing an example in which candidate characters are selected based on the designations in FIG. 8. As a result, a correct output such as that shown in FIG. 7 can be obtained. As described above, in this embodiment the operator designates a character string in a portion where a recognition error has occurred after character recognition, and only the candidate characters of the character type designated at the same time are selected. This saves the operator the trouble of correcting characters one by one and makes the character recognition result easy to correct.

[0024] The first to third embodiments above were described using the designation of kanji, hiragana, katakana, numerals, alphabetic characters, symbols, and so on as examples, but the present invention is not limited to these and can be applied in the same way to the recognition of documents in other languages and writing systems. For alphabetic characters, the designation can also be made in more detail by specifying uppercase or lowercase. For example, it is possible to specify all uppercase, all lowercase, or an uppercase first letter followed by lowercase letters.

[0025] Although the description above assumed that a document image is obtained by reading a form or the like, the present invention is not limited to this and can be applied in the same way to, for example, a device that recognizes handwritten characters entered with a pen.

[0026]

[Effects of the Invention] As described in detail above, the operator designates the character type of an arbitrary target and the character recognition result is output based on this designation. This improves the recognition rate and makes the character recognition result easy to correct.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the embodiment.

FIG. 2 is an explanatory diagram showing an example of an area recognized as characters.

FIG. 3 is an explanatory diagram showing a recognition target and an example of character recognition.

FIG. 4 is an explanatory diagram showing an output example containing misrecognition.

FIG. 5 is an explanatory diagram showing an example of character string and character type designation on a document image.

FIG. 6 is an explanatory diagram showing an example of character recognition according to the first embodiment.

FIG. 7 is an explanatory diagram showing an output example of the embodiment.

FIG. 8 is an explanatory diagram showing an example of character string and character type designation on a character recognition result.

FIG. 9 is an explanatory diagram showing the character recognition result of the second embodiment.

FIG. 10 is an explanatory diagram of the candidate character selection of the third embodiment.

[Explanation of symbols]

4 Scanning unit
5 Character recognition unit
6 Output forming unit

Claims

[Claims]

1. A general document reader comprising: a scanning unit that scans and photoelectrically converts the character areas, and the character portions of the table areas, obtained by dividing a document image into areas such as characters, tables, ruled lines, figures, and photographs, and outputs an image signal; a character recognition unit that expresses the degree of similarity between the image and pre-registered standard character shapes as a distance value and selects the characters whose distance values fall within a predetermined range as candidate characters in ascending order of distance value; and an output forming unit that lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit, characterized in that the character recognition unit, based on the image of an arbitrary character string in the character portion designated by an operator and information on the character type of the image of that character string, performs character recognition on the range designated by the operator using only the standard character shapes of the designated character type, and performs character recognition outside the designated range using all the registered standard character shapes.

2. A general document reader comprising: a scanning unit that scans and photoelectrically converts the character areas, and the character portions of the table areas, obtained by dividing a document image into areas such as characters, tables, ruled lines, figures, and photographs, and outputs an image signal; a character recognition unit that expresses the degree of similarity between the image and pre-registered standard character shapes as a distance value and selects the characters whose distance values fall within a predetermined range as candidate characters in ascending order of distance value; and an output forming unit that lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit, characterized in that, for the output result from the output forming unit, the character recognition unit recognizes an arbitrary character string designated by an operator again using the designated character type.

3. A general document reader comprising: a scanning unit that scans and photoelectrically converts the character areas, and the character portions of the table areas, obtained by dividing a document image into areas such as characters, tables, ruled lines, figures, and photographs, and outputs an image signal; a character recognition unit that expresses the degree of similarity between the image and pre-registered standard character shapes as a distance value and selects the characters whose distance values fall within a predetermined range as candidate characters in ascending order of distance value; and an output forming unit that lines up and outputs the characters with the smallest distance values from the recognition result of the character recognition unit, characterized in that, for the output result from the output forming unit, the output forming unit selects only the candidate characters of the designated character type for an arbitrary character string designated by an operator and outputs the result again.