JPH06223219A

JPH06223219A - Character recognizing device

Info

Publication number: JPH06223219A
Application number: JP5032664A
Authority: JP
Inventors: Shiori Ooaku; 志緒理大阿久
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1993-01-27
Filing date: 1993-01-27
Publication date: 1994-08-12

Abstract

PURPOSE: To facilitate the work of correcting a recognition result and to improve reliability performance. CONSTITUTION: An image input part 1 inputs character image data. A character recognition processing part 2 recognizes characters based on the character image data input by the image input part 1. A post-processing part 3 performs language processing on the character string recognized by the character recognition processing part 2. A dictionary part 4 is a language dictionary having a word dictionary and a grammar dictionary. A display part 5 displays the character string finally determined by the language processing of the post-processing part 3. A correction part 6 allows a user to correct the characters displayed on the display part 5. An output part 7 outputs the determined character string (final output). When the recognition result is output, a delimiter code is displayed for each word constituting the recognized character string, together with the character string.

Description

Detailed Description of the Invention

[0001]

TECHNICAL FIELD The present invention relates to a character recognition device, and more particularly to a character recognition device applied to character recognition, natural language processing, and human interfaces.

[0002]

PRIOR ART A known document describing prior art related to the present invention is, for example, the "optical character reading system" of Japanese Patent Laid-Open No. 4-17085. In that publication, so that corrections can be made easily by selecting a candidate word and the time required for correction can be reduced, the image information of the entire reading field is displayed, a means is provided for storing the in-field character position of the first character of each candidate word obtained by matching, and only those candidate words whose first character lies at the character position where the cursor is placed are displayed for selection. That is, the cursor used to correct misrecognized characters is controlled in units of words, and only the candidate words whose first character lies at the cursor's character position are displayed for selection. The character recognition result is output with the recognized character string displayed uniformly as running text, with no breaks between words.

[0003] Further, the "form reading device" of Japanese Patent Laid-Open No. 3-148786 visualizes those parts of the output result in which knowledge processing is reflected, so that the input document can be checked against the output result efficiently. That is, the character recognition result is output with the recognized character string displayed uniformly as running text, and characters presumed to be misrecognized are displayed in different colors according to the recognition confidence.

[0004] Current character recognition technology is not yet adequate. It is therefore essential for the user to correct the recognition result to the right characters. To support this work, two methods have generally been adopted: outputting the recognition target document one line at a time, and calculating the recognition reliability and displaying in a different color any character string that can be presumed misrecognized from that result. However, the reliability measure itself does not yet perform well enough, so in the end the user must recheck every recognized character. Moreover, with current display methods the recognized characters are displayed at equal intervals, which makes the text very hard to read when the user corrects it while looking at the display.

[0005]

OBJECT The present invention has been made in view of the above circumstances, and its object is to provide a character recognition device that facilitates the correction of recognition results and improves reliability performance.

[0006]

CONSTITUTION To achieve the above object, the present invention is characterized in that:
(1) the device comprises a recognition unit that recognizes characters based on character image data, a language processing unit that performs language processing on the character string recognized by the recognition unit, a dictionary unit holding language information such as a word dictionary, a display unit that displays the character string finally determined by the language processing unit, and a correction unit with which the characters displayed by the display unit can be corrected, and, when the recognition result is output, a delimiter code is displayed for each word constituting the character string, together with the recognized character string;
(2) when the recognition result is output, the position at which each delimiter code is added is determined from the notation length of the words constituting the character string;
(3) when the recognition result is output, the type of delimiter code can be changed based on the part-of-speech information of the words constituting the character string; and
(4) in any one of (1) to (3) above, the delimiter codes are deleted and only the corrected and confirmed recognized character string is produced as the final output.
The invention is described below on the basis of an embodiment.

[0007] FIG. 1 is a block diagram for explaining an embodiment of the character recognition device according to the present invention. In the figure, 1 is an image input unit, 2 is a character recognition processing unit, 3 is a post-processing unit (language processing unit), 4 is a dictionary unit, 5 is a display unit, 6 is a correction unit, and 7 is an output unit. The image input unit 1 inputs character image data. The character recognition processing unit 2 recognizes characters based on the character image data input by the image input unit 1. The post-processing unit 3 performs language processing on the character string recognized by the character recognition processing unit 2. The dictionary unit 4 is a language dictionary comprising a word dictionary and a grammar dictionary. The display unit 5 displays the character string finally determined by the language processing of the post-processing unit 3. The correction unit 6 lets the user correct the characters displayed on the display unit 5. The output unit 7 outputs the determined character string (final output).
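The data flow between the units of FIG. 1 can be sketched roughly as follows. This is only an illustrative outline under stated assumptions: the function name `run_device`, the type aliases, and the toy stage functions in the usage example are all hypothetical and do not appear in the publication, and the "|" delimiter is an arbitrary choice.

```python
from typing import Callable, List, Tuple

# Hypothetical types, not taken from the publication.
Candidates = List[List[str]]   # candidate characters for each character position
Word = Tuple[str, str]         # (written form, part-of-speech name)


def run_device(
    image: bytes,                                      # data supplied by image input unit 1
    recognize: Callable[[bytes], Candidates],          # character recognition processing unit 2
    post_process: Callable[[Candidates], List[Word]],  # post-processing unit 3 (would consult dictionary unit 4)
    display: Callable[[List[Word]], str],              # display unit 5
    correct: Callable[[str], str],                     # correction unit 6 (user edits)
    output: Callable[[str], str],                      # output unit 7
) -> str:
    """Pass the data through the units in the order described for FIG. 1."""
    candidates = recognize(image)
    words = post_process(candidates)
    shown = display(words)
    fixed = correct(shown)
    return output(fixed)


# Toy stages, for illustration only.
result = run_device(
    b"",
    recognize=lambda img: [["文", "丈"], ["字", "宇"]],
    post_process=lambda cands: [("".join(c[0] for c in cands), "noun")],
    display=lambda words: "".join(surface + "|" for surface, _ in words),
    correct=lambda text: text,               # the user makes no change here
    output=lambda text: text.replace("|", ""),
)
print(result)  # prints 文字
```

Passing each unit in as a function keeps the sketch independent of any particular recognizer or dictionary, neither of which the publication specifies in detail.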

[0008] Next, the operation of the character recognition device according to the present invention will be described. As in a conventional character recognition device, an image is input, character recognition is performed, and candidate characters are extracted. The post-processing unit 3 performs language processing using language dictionaries such as the word dictionary of the dictionary unit 4, determines the correct characters from among the candidate characters, and passes the result to the display unit 5. At that time, the notation length and part-of-speech information of each word determined by the language processing are passed to the display unit 5 as well. The notation length is a number giving the count of characters in the word's written form. The part-of-speech information consists of the part-of-speech names of the words in order, such as noun, particle, sa-hen noun (verbal noun), and noun. FIG. 2 shows an example of notation lengths and part-of-speech information. For example, suppose the determined character string is "不特定韓者の装直", a string that contains misrecognized characters. The words that make up this string have been determined by the language processing.
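As a minimal sketch of the information handed to the display unit, the following Python shows how per-word notation lengths are enough to recover the word boundaries of a determined character string. The record type `WordInfo`, the function `split_by_lengths`, and the sample string with its segmentation are hypothetical illustrations; they are not the contents of FIG. 2.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List


@dataclass(frozen=True)
class WordInfo:
    length: int  # notation length: number of characters in the word's written form
    pos: str     # part-of-speech name


def split_by_lengths(text: str, infos: List[WordInfo]) -> List[str]:
    """Cut the determined string into words using only the notation lengths."""
    ends = list(accumulate(info.length for info in infos))  # offset just past each word
    if (ends[-1] if ends else 0) != len(text):
        raise ValueError("notation lengths do not cover the whole string")
    starts = [0] + ends[:-1]
    return [text[s:e] for s, e in zip(starts, ends)]


# Hypothetical example following the order noun, particle, sa-hen noun, noun.
infos = [
    WordInfo(2, "noun"),
    WordInfo(1, "particle"),
    WordInfo(2, "sa-hen noun"),
    WordInfo(2, "noun"),
]
print(split_by_lengths("文字の認識装置", infos))
# prints ['文字', 'の', '認識', '装置']
```

The running totals of the lengths (2, 3, 5, 7 in this example) are the character offsets after which a delimiter code would be placed.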

[0009] The display unit 5 displays the text with a word delimiter code appended to the end of each word, based on the notation length supplied as output information above. For each word it displays as many recognized characters (of the determined character string) as the notation length indicates, followed by a word delimiter code. Here, the part-of-speech information is used to change the type of delimiter code displayed: when a word of a particular part of speech is displayed, the delimiter code that follows it is changed. For example, it is advisable to distinguish the symbol that follows particles and auxiliary verbs from the symbol that follows all other parts of speech. FIG. 3 shows an output example of the display unit. Here the symbol following particles and auxiliary verbs is set to “ ＿” and the symbol for everything else to “ _”, but in practice the developer may choose them freely. A space code or the like may also be used. The user checks the displayed character string and corrects any errors in the recognized characters. FIG. 4 shows an output example of the display unit after the user's correction work. The output unit 7 deletes the word delimiter codes and, as shown in FIG. 5, produces as the final output only the character string that the user has corrected and confirmed.
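The display and final-output steps described above might look like the following in Python. This is a sketch under assumptions, not the device's actual implementation: the function names, the part-of-speech labels, the sample words, and the delimiter symbols "/" and "|" are all chosen here for illustration (the publication leaves the symbols to the developer), and stripping by simple replacement assumes the delimiter symbols never occur in the recognized text itself.

```python
from typing import List, Tuple

# Hypothetical settings; the publication lets the developer choose the symbols.
FUNCTION_WORD_POS = {"particle", "auxiliary verb"}
DELIM_AFTER_FUNCTION_WORD = "/"
DELIM_AFTER_OTHER = "|"


def render(words: List[Tuple[str, str]]) -> str:
    """Append a delimiter after every word, chosen by its part of speech."""
    parts = []
    for surface, pos in words:
        if pos in FUNCTION_WORD_POS:
            delim = DELIM_AFTER_FUNCTION_WORD
        else:
            delim = DELIM_AFTER_OTHER
        parts.append(surface + delim)
    return "".join(parts)


def strip_delimiters(text: str) -> str:
    """Remove both delimiter symbols to obtain the final output string."""
    return text.replace(DELIM_AFTER_FUNCTION_WORD, "").replace(DELIM_AFTER_OTHER, "")


# Hypothetical word list: (written form, part-of-speech name).
words = [("文字", "noun"), ("の", "particle"), ("認識", "sa-hen noun"), ("装置", "noun")]
shown = render(words)
print(shown)                    # prints 文字|の/認識|装置|
print(strip_delimiters(shown))  # prints 文字の認識装置
```

Because a different symbol follows particles and auxiliary verbs, a reader can treat "/" as a phrase boundary and "|" as a word boundary inside a phrase, which matches the phrase-level reading the description attributes to part-of-speech-dependent delimiters.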

[0010]

EFFECTS As is apparent from the above description, the present invention has the following effects.
(1) Effect corresponding to claim 1: Because a word delimiter code is added to each word in the display, the user doing the correction work can read the recognized sentence word by word. This makes the text easier to read and also makes word segmentation failures caused by misrecognized characters easier to spot.
(2) Effect corresponding to claim 2: This is a component for realizing claim 1. Using the notation length of each word makes it easy to calculate the position at which a delimiter code is added.
(3) Effect corresponding to claim 3: Because several delimiter codes are provided and the type used can be changed according to a word's part of speech, the user can read the recognized sentence not only in units of words but also in units of phrases (bunsetsu), which further enhances the effect of claim 1.
(4) Effect corresponding to claim 4: This is a means for outputting only the recognized character string corrected by the user as the final output. As a result, after efficiently correcting misrecognized characters by means of claims 1 to 3, the user can go on to use other systems such as databases just as before.

[Brief description of drawings]

FIG. 1 is a block diagram for explaining an embodiment of the character recognition device according to the present invention.

FIG. 2 is a diagram showing an example of notation lengths and part-of-speech information in the character recognition device according to the present invention.

FIG. 3 is a diagram showing an output example of the display unit of the present invention.

FIG. 4 is a diagram showing a display example of the display unit of the present invention after the user's correction work.

FIG. 5 is a diagram showing an output example of the output unit of the present invention.

[Explanation of symbols]

1 ... image input unit, 2 ... character recognition processing unit, 3 ... post-processing unit (language processing unit), 4 ... dictionary unit, 5 ... display unit, 6 ... correction unit, 7 ... output unit.

Claims

[Claims]

1. A character recognition device comprising: a recognition unit that recognizes characters based on character image data; a language processing unit that performs language processing on the character string recognized by the recognition unit; a dictionary unit holding language information such as a word dictionary; a display unit that displays the character string finally determined by the language processing unit; and a correction unit with which the characters displayed by the display unit can be corrected, wherein, when the recognition result is output, a delimiter code is displayed for each word constituting the character string, together with the recognized character string.

2. The character recognition device according to claim 1, wherein, when the recognition result is output, the position to which the delimiter code is added is determined based on the notation length of the words forming the character string.

3. The character recognition device according to claim 1, wherein when outputting the recognition result, the type of the delimiter code can be changed based on the part-of-speech information of the words forming the character string.

4. The character recognition device according to claim 1, wherein the delimiter code is deleted and only the corrected and confirmed recognized character string is output as the final output.