JPH06282690A

JPH06282690A - Character recognizing device

Info

Publication number: JPH06282690A
Application number: JP5069929A
Authority: JP
Inventors: Makoto Kushima; 真久島; Koichi Higuchi; 浩一樋口
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1993-03-29
Filing date: 1993-03-29
Publication date: 1994-10-07

Abstract

PURPOSE: To provide a character recognizing device that can improve the recognition rate by allowing a user to register optimum word information in a knowledge dictionary. CONSTITUTION: A knowledge processing part 16 applies knowledge processing, based on the knowledge dictionary, to the recognition result of a recognition part 12 and sends the result to a control part 18 together with word registration information. The control part 18 sends the resulting character string to a display part 20 together with a display instruction that differs depending on whether a word in the character string is a word of word information that existed in the knowledge dictionary from the beginning or a word of word information registered by the user. Because the display part 20 changes the display method of words according to this instruction, an erroneous post-processing result that appears, under the influence of newly registered word information, at a spot where the correct word was originally obtained is easily noticed. As a result, optimum word information can be registered in the knowledge dictionary.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device with improved reading accuracy, so that forms or documents can be processed accurately.

[0002]

[Prior Art] Knowledge dictionaries have conventionally been used to improve the recognition rate of handwritten characters. The information held by a knowledge dictionary comprises information on the words to be recognized, context information, and other information. A user of the character recognition device can also register such information anew. As a conventional character recognition technique that uses the word information of a knowledge dictionary, the technique disclosed in, for example, the Proceedings of the 1982 (Showa 57) General National Conference of the Institute of Electronics and Communication Engineers, separate volume, p. 5-326, is described below.

[0003] First, a predetermined area of a form or document is optically scanned, and the optical signal from the paper surface is photoelectrically converted to obtain image data of the form or document. The character patterns to be recognized are then cut out from the image data. Each character is recognized on the basis of its cut-out character pattern, and one or more candidate characters are obtained as the recognition result.

[0004] Next, for the characters making up one unit of knowledge processing, character strings are formed by combining the one or more candidate characters obtained for each character. The similarity between each character string and the words of the word information is calculated using the candidate rank or similarity attached to each candidate character of the string, and the character string with the maximum similarity is selected. The word of the word information corresponding to this character string is then displayed. If, however, the similarity of that character string is at or below a predetermined threshold, the string formed by combining the first-ranked candidate characters is displayed instead.

[0005]

[Problems to Be Solved by the Invention] In the conventional character recognition technique using the word information described above, two cases can arise: (1) the character string is most similar to a word of the word information that existed from the beginning, and (2) the character string is most similar to a word of the word information registered by the user. Under the influence of newly registered word information, a word of case (2) was sometimes erroneously output as the post-processing result at a spot where the correct word had originally been obtained.

[0006] Consider, for example, a case in which the correct word is obtained by calculating the similarity from the candidate characters (assumed to be of candidate rank 2 or lower) obtained by recognizing the spot where "安部" (Abe) is written in the name field of a form. Suppose that, when "部" was recognized, the rank-1 candidate was "郡" and the rank-2 candidate was "部". If the user then additionally registers in the knowledge dictionary the word "安郡" (Agu), a surname rarer than "安部" and assumed not to be registered in the knowledge dictionary prepared in advance, the name field of the form is read as "安郡". Even though such cases exist, words were displayed identically in cases (1) and (2) above. An operator therefore cannot discover this phenomenon merely by glancing at the displayed character string. Even if word information registered later degrades the performance of the knowledge dictionary, the dictionary is left as it is, and the recognition rate of the character recognition device falls.

[0007] An object of the present invention is to solve the conventional problem described above and to provide a character recognition device that allows a user to register optimum word information in the knowledge dictionary and thereby improve the recognition rate.

[0008]

[Means for Solving the Problems] To solve the above problem, the character recognition device of the present invention comprises a recognition unit that outputs a recognition result for a character pattern cut out from quantized image data of a form or document, a word registration unit for registering word information in a knowledge dictionary, a post-processing unit that uses the knowledge dictionary to output a knowledge processing result based on the recognition result, and a display unit that displays the knowledge processing result. The device is characterized in that the display unit changes the method of displaying a word depending on whether the character string output by the post-processing unit is a word of word information that existed in the knowledge dictionary from the beginning or a word registered by the word registration unit.

[0009]

[Operation] According to the present invention, the display on the display unit differs between (1) the case where the character string output by the post-processing unit is a word that existed in the knowledge dictionary from the beginning and (2) the case where it is a word registered by the user. Therefore, even if a word of case (2) is erroneously output as the post-processing result, under the influence of a newly registered word, at a spot where the correct word was originally obtained, the operator can easily notice it. As a result, the user can register optimum word information in the knowledge dictionary, and the above problem is solved.

[0010]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. The drawings are only schematic, to the extent needed for the invention to be understood, and the shape, position, dimensions, input/output signals, and connections of each component are therefore not limited to the illustrated examples.

[0011] FIG. 1 is a functional block diagram used to describe one embodiment of the present invention. The character recognition device 10 of this embodiment comprises: a recognition unit 12 that cuts out character patterns from quantized image data of a form or document and outputs the recognition results for those patterns; a word registration unit 14 that registers word information desired by the user of the character recognition device; a post-processing unit 16 that uses the knowledge dictionary to output a knowledge processing result based on the recognition results; a display unit 20 that displays the knowledge processing result; and a control unit 18 that controls the operation of the post-processing unit 16 and the display unit 20. In FIG. 1, reference numeral 22 denotes a photoelectric conversion unit that outputs quantized image data of the form or document, and 24 denotes an image memory that stores the image data from the photoelectric conversion unit 22.

[0012] FIG. 2 is a flowchart outlining the processing of this device. When processing starts, step S1 determines whether the user needs to register any word information. If so, the word information is registered in step S2; if not, processing proceeds to the next step. The form or document is read in step S3, the characters are recognized in step S4, knowledge processing based on the recognition results is performed in step S5, and the knowledge processing result is checked and corrected in step S6. If step S7 then finds that the knowledge dictionary needs maintenance, processing returns to step S2, where word information is changed or deleted, and the same processing is repeated from there. If step S7 finds that no maintenance of the knowledge dictionary is needed, processing ends.

[0013] FIG. 3 shows an example of a form. In the figure, 31 denotes an example form on which an address is written, and 32 denotes an entry frame that specifies the area in which characters are to be written.

[0014] This embodiment is described in more detail below with reference to FIGS. 1, 2 and 3. The photoelectric conversion unit 22 optically scans a predetermined reading range on the form or document, photoelectrically converts the optical signal L from the form or document, and outputs image data quantized into black-and-white binary values. The image memory 24 stores this image data.

[0015] The recognition unit 12 of the character recognition device 10 cuts out character patterns from the image data in the image memory 24 and extracts from each cut-out pattern various features of the character to be recognized. It then collates the features of the cut-out pattern with the features of standard character patterns and outputs the recognition result and candidate ranks for the character. One or more candidate characters are obtained as the recognition result for each character. When there is only one candidate character, it is output with candidate rank 1 attached; when there are several, each candidate character is output with its assigned candidate rank attached.

[0016] The word registration unit 14 additionally registers in the knowledge dictionary the word information desired by the user of the character recognition device. It also changes or deletes word information that the user has already registered (see steps S7 and S2 in FIG. 2). Information is attached to the knowledge dictionary so that word information registered by the word registration unit 14 can be distinguished from the general word information prepared from the beginning. This information is hereinafter called word registration information.
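One way to picture the word registration information of paragraph [0016] is a per-word flag stored alongside each dictionary entry. The following is a minimal sketch under that assumption; the class name `KnowledgeDictionary`, its methods, and the boolean flag are hypothetical illustrations and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeDictionary:
    """Word store that remembers the origin of each word (hypothetical layout)."""

    # Maps each word to True if the user registered it, False if it is built in.
    _entries: dict = field(default_factory=dict)

    def load_builtin(self, words):
        """Load the general word information prepared from the beginning."""
        for word in words:
            self._entries[word] = False

    def register(self, word):
        """Add a user word; a word that is already present keeps its status."""
        self._entries.setdefault(word, True)

    def delete(self, word):
        """Remove a user-registered word; built-in words are left alone."""
        if self._entries.get(word) is True:
            del self._entries[word]

    def is_user_registered(self, word):
        """Return the word registration flag, or None if the word is absent."""
        return self._entries.get(word)

    def __contains__(self, word):
        return word in self._entries
```

In this sketch only user-registered words can be deleted, which mirrors the statement that the word registration unit changes or deletes word information the user has already registered.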

[0017] The post-processing unit 16 performs knowledge processing using the word information, based on the recognition results from the recognition unit 12. When the post-processing unit 16 receives the character recognition results for one unit of knowledge processing (for example, the recognition results for the prefecture-name field of the form 31 shown in FIG. 3), it collates the character strings formed by combining the candidate characters of each character in that unit with the words of the word information, and checks whether a word corresponding to a character string of candidate characters exists in the word information. When it detects, among the combined character strings, a character string A that matches a word of the word information, it calculates an evaluation value J for that character string A. If S denotes the sum of the candidate ranks attached to the candidate characters of the character string and N denotes the total number of characters making up the character string, the evaluation value J can be expressed as, for example, J = S ÷ N.
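As a small worked illustration of J = S ÷ N, the sketch below computes the evaluation value from a list of candidate ranks. The function name `evaluation_value` is hypothetical, and the example ranks assume that "安" was read at rank 1 while "部" was read at rank 2 (only the rank of "部" is stated in paragraph [0006]).

```python
def evaluation_value(ranks):
    """Return J = S / N for one candidate string.

    ranks: candidate rank (1 = best) of each character in the string.
    """
    if not ranks:
        raise ValueError("a candidate string needs at least one character")
    s = sum(ranks)  # S: sum of the candidate ranks
    n = len(ranks)  # N: number of characters in the string
    return s / n


# "安部" with 安 at rank 1 (assumed) and 部 at rank 2: J = (1 + 2) / 2
print(evaluation_value([1, 2]))  # 1.5
```

A string made up entirely of rank-1 candidates gives J = 1, the smallest possible value, so smaller J means the string is closer to the recognizer's first choices.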

[0018] Whether a word and a character string A match is determined by, for example, whether the character codes of the characters at corresponding positions in the word and the character string A all coincide. When all the character strings formed for one unit of knowledge processing have been collated with the word information, the character string A with the smallest evaluation value J among the detected character strings A is selected as the knowledge processing result, and this character string A is sent to the control unit 18 together with its word registration information.

[0019] If only one character string A has been detected when all the character strings for one unit of knowledge processing have been collated with the words of the word information, that character string A is selected as the knowledge processing result and is sent to the control unit 18 together with its word registration information.

[0020] If no character string A at all has been detected when all the character strings for one unit of knowledge processing have been collated with the words of the word information, the character string formed by combining the first-ranked candidate character of each character in that unit is selected as the knowledge processing result, and this string is sent to the control unit 18 as character string A.
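The selection rules of paragraphs [0018] to [0020] can be sketched as follows, assuming the candidates for each character position arrive as a list ordered by rank and the word information is a plain set of strings. The function name `knowledge_process` and its return shape are hypothetical; ties in J are resolved in favour of the first string found, which the patent does not specify.

```python
from itertools import product


def knowledge_process(candidates, dictionary):
    """Pick the dictionary-matching string with the smallest J = S / N.

    candidates: one non-empty list per character position, best rank first.
    dictionary: set of known words.
    Returns (string, matched); matched is False when the rank-1 fallback is used.
    """
    best, best_j = None, None
    ranked = [list(enumerate(chars, start=1)) for chars in candidates]
    for combo in product(*ranked):
        text = "".join(ch for _, ch in combo)
        if text not in dictionary:  # exact match of every character code
            continue
        j = sum(rank for rank, _ in combo) / len(combo)
        if best_j is None or j < best_j:
            best, best_j = text, j
    if best is not None:
        return best, True
    # No string A was detected: fall back to the rank-1 candidates.
    return "".join(chars[0] for chars in candidates), False


cands = [["安"], ["郡", "部"]]
print(knowledge_process(cands, {"安部"}))          # ('安部', True)
print(knowledge_process(cands, {"安部", "安郡"}))  # ('安郡', True)
```

The second call reproduces the problem described in paragraph [0006]: once "安郡" is registered, it scores J = 1.0 against 1.5 for "安部" and displaces the correct reading.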

[0021] If a character string A was detected in the word collation above, the control unit 18 sends the selected character string A to the display unit 20 together with a display instruction that depends on the word registration information. If no character string A was detected in the word collation, the control unit 18 sends the character string to the display unit 20 together with the same display instruction as is used for a word of word information registered in the knowledge dictionary from the beginning. When the character string A is a word of word information registered from the beginning, the control unit 18 sends an instruction to display it in a first color; when the character string A is a word of word information registered by the word registration unit, it sends an instruction to display it in a second color different from the first.
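The decision made by the control unit 18 in paragraph [0021] reduces to a small mapping from two facts (was a dictionary word matched, and was that word registered by the user) to a display instruction. The sketch below is illustrative only; `DisplayStyle` and `display_instruction` are hypothetical names.

```python
from enum import Enum


class DisplayStyle(Enum):
    FIRST_COLOR = "first color"    # built-in word, or rank-1 fallback string
    SECOND_COLOR = "second color"  # word registered by the user


def display_instruction(matched, user_registered):
    """Choose the display instruction for a knowledge processing result.

    matched: True if a dictionary word (character string A) was detected.
    user_registered: word registration information for the matched word.
    """
    if matched and user_registered:
        return DisplayStyle.SECOND_COLOR
    # Unmatched strings are shown the same way as built-in words.
    return DisplayStyle.FIRST_COLOR
```

Only a matched, user-registered word receives the second color, so an operator scanning the display can single out results influenced by later registrations.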

[0022] When the display unit 20 receives the selected character string A together with the display instruction, it displays the character string A on the display device in the manner specified by the display instruction.

[0023] The present invention is not limited to the embodiment described above, and the configuration, operation, processing, input/output signals, and numerical conditions of each component may be changed as appropriate. For example, in the embodiment above the user additionally registers word information in the general knowledge dictionary through the word registration unit, but a separate knowledge dictionary dedicated to the user may instead be prepared and the required word information registered there.

[0024] In the embodiment above, the evaluation value J is the sum S of the candidate ranks attached to the candidate characters of the character string divided by the total number N of characters making up the string. Instead of the sum S of candidate ranks, the sum of scores assigned to the candidate ranks may be used (scores that fall as the candidate rank falls, for example 100 points for candidate rank 1 and 90 points for candidate rank 2). Alternatively, instead of the sum S of candidate ranks, the sum of the appearance frequencies of the candidate characters of the string may be used (in this case the appearance frequencies are held in advance by the recognition unit). Alternatively, instead of the sum S of candidate ranks, the similarity between each candidate character and the character pattern corresponding to it may be obtained, and the sum of these similarities over the candidate characters of the string may be used. Alternatively, instead of the sum S of candidate ranks, the distance between the dictionary matrix of each candidate character and the feature quantities of the character pattern corresponding to it may be obtained, and the sum of these distances over the candidate characters of the string may be used. Alternatively, instead of the sum S of candidate ranks, the sum of the appearance frequencies of the candidate characters of the string may be used together with the sum of the candidate ranks.
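The score-based variant of paragraph [0024] can be illustrated as below. The patent gives only two data points (100 points for rank 1, 90 points for rank 2); the linear 10-point step for lower ranks, the floor at zero, and the names `rank_score` and `score_evaluation` are assumptions made for this sketch.

```python
def rank_score(rank, top=100, step=10):
    """Score for a candidate rank: 100 for rank 1, 90 for rank 2.

    The continuation beyond rank 2 (minus `step` per rank, never below 0)
    is an assumption; the source only states that scores fall with rank.
    """
    return max(top - step * (rank - 1), 0)


def score_evaluation(ranks):
    """Evaluation value with the sum of scores in place of the rank sum S."""
    if not ranks:
        raise ValueError("a candidate string needs at least one character")
    return sum(rank_score(r) for r in ranks) / len(ranks)


print(score_evaluation([1, 2]))  # 95.0
print(score_evaluation([1, 1]))  # 100.0
```

Because scores rise as the rank improves, a selection step built on this value would presumably prefer the largest evaluation value rather than the smallest; the patent text does not spell this out.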

[0025] Besides the display method described above, the display unit may vary the display using, for example, any one or more of different colors, different brightness, blinking, and underlining.

[0026] The post-processing unit may also perform knowledge processing using word information in the following manner instead of the manner described above. To check whether a word corresponding to a character string of candidate characters exists in the word information, the character strings for one unit of knowledge processing are collated with the words of the word information, and the similarity or dissimilarity between each character string and each word is calculated. A word is detected as corresponding to a character string when, for example, its similarity to the character string exceeds a predetermined threshold or its dissimilarity to the character string does not exceed a predetermined threshold. Then:

(1) If character strings whose similarity exceeds the predetermined threshold, or whose dissimilarity does not exceed the predetermined threshold, are detected, the maximum similarity or minimum dissimilarity among the detected character strings is found. The word of the word information corresponding to the character string with this maximum similarity or minimum dissimilarity is output as the knowledge processing result, and the maximum similarity or minimum dissimilarity is output as the evaluation value of the knowledge processing.

(2) If not a single character string whose similarity exceeds the predetermined threshold, or whose dissimilarity does not exceed the predetermined threshold, can be detected even after all the character strings for one unit of knowledge processing have been collated with the words of the word information, the character string formed by combining the first-ranked candidate characters is output as the knowledge processing result, and a predetermined lower limit of similarity or a predetermined upper limit of dissimilarity is output as the evaluation value.

The lower limit of similarity and the upper limit of dissimilarity indicate that no word corresponding to a character string of candidate characters existed in the word information.
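The threshold-based variant of paragraph [0026] can be sketched as follows. The patent does not define the similarity measure, so this example assumes a simple one: the fraction of character positions at which a candidate string and a word agree. The names `position_similarity` and `threshold_knowledge_process`, the threshold of 0.6, and the lower limit of 0.0 are all hypothetical.

```python
from itertools import product


def position_similarity(text, word):
    """Assumed similarity: share of positions where the characters agree."""
    if not word or len(text) != len(word):
        return 0.0
    return sum(a == b for a, b in zip(text, word)) / len(word)


def threshold_knowledge_process(candidates, words, threshold=0.6, lower_limit=0.0):
    """Return (result, evaluation value) following cases (1) and (2).

    candidates: one non-empty list per character position, best rank first.
    words: iterable of dictionary words.
    """
    best_word, best_sim = None, lower_limit
    for combo in product(*candidates):
        text = "".join(combo)
        for word in words:
            sim = position_similarity(text, word)
            # Case (1): keep the word with the largest similarity above threshold.
            if sim > threshold and (best_word is None or sim > best_sim):
                best_word, best_sim = word, sim
    if best_word is None:
        # Case (2): rank-1 string, with the lower limit as evaluation value.
        return "".join(chars[0] for chars in candidates), lower_limit
    return best_word, best_sim


print(threshold_knowledge_process([["東", "束"], ["京"], ["都"]], ["東京都", "京都府"]))
# ('東京都', 1.0)
```

Unlike the exact-match scheme, this variant can return a dictionary word even when no combination of candidates spells it exactly, which is why a returned evaluation value equal to the lower limit serves as the signal that no word was found.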

[0027] The embodiment above describes knowledge processing that uses word information, but the present invention may also be applied to a character recognition device that performs knowledge processing using context information or other knowledge information.

[0028]

[Effects of the Invention] As described in detail above, according to the present invention the display on the display unit differs between (1) the case where the character string output by the post-processing unit is a word of word information that existed in the knowledge dictionary from the beginning and (2) the case where it is a word of word information registered by the user. Therefore, even if a word of case (2) is erroneously output as the post-processing result, under the influence of newly registered word information, at a spot where the correct word was originally obtained, the operator can easily notice it. As a result, the user can register optimum word information in the knowledge dictionary. A character recognition device with improved reading accuracy, able to process forms or documents accurately, can thus be provided.

[Brief description of drawings]

FIG. 1 is a functional block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a flowchart outlining the processing of the device of the embodiment.

FIG. 3 is a diagram showing an example of a form.

[Explanation of symbols]

10: character recognition device
12: recognition unit
14: word registration unit
16: post-processing unit
18: control unit
20: display unit
22: photoelectric conversion unit
24: image memory

Claims

[Claims]

1. A character recognition device comprising: a recognition unit that outputs a recognition result for a character pattern cut out from quantized image data of a form or document; a word registration unit for registering word information in a knowledge dictionary; a post-processing unit that uses the knowledge dictionary to output a knowledge processing result based on the recognition result; and a display unit that displays the knowledge processing result, wherein the display unit changes the method of displaying a word depending on whether the character string output by the post-processing unit is a word of word information that existed in the knowledge dictionary from the beginning or a word registered by the word registration unit.