JP3173363B2

JP3173363B2 - OCR maintenance method and device

Info

Publication number: JP3173363B2
Application number: JP05863296A
Authority: JP
Inventors: Katsuhiko Takahashi (勝彦高橋)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-03-15
Filing date: 1996-03-15
Publication date: 2001-06-04
Anticipated expiration: 2016-03-15
Also published as: JPH09251518A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to an OCR maintenance method for optimally setting the recognition dictionary and parameters of an OCR device.

[0002]

[Prior art] Character segmentation and character recognition are key technologies in document reading. Character segmentation is generally performed using statistics such as histograms of character blobs and inter-character gaps, together with the distance values or similarities obtained from character recognition, and it is known that segmentation performance improves when the contribution of each of these to the segmentation process is tuned to the nature of the document. For example, for fixed-pitch printed documents, a segmentation method that keeps the distance between segmentation positions as constant as possible is suitable, such as the method of Tsuji et al. ("Character separation", IEICE Technical Report, PRL83-66). For proportional-pitch printed documents and handwritten documents, a method that gives more weight to character recognition results and the like than to the distance between segmentation positions is generally used, such as Ariyoshi's method ("Character segmentation from Japanese printed documents by dynamic hypothesis generation and verification", IEICE Technical Report, PRU93-47). The contribution of each statistic can be controlled by expressing it as a parameter, and this approach is also used in the method of Ishidera et al. ("Character string recognition method using two-dimensional layout information", IEICE Technical Report, PRU95-108).

[0003] To raise the character recognition rate, not only the character recognition algorithm but also the method of creating the recognition dictionary (the standard pattern of each category, created from training data) is important. A recognition dictionary is generally created by repeatedly modifying its contents so that the recognition results for a large number of character images become correct. Because unlearned characters may still be misrecognized, methods for additionally learning characters have been disclosed, for example in JP-A-4-24783 and JP-A-5-314303. In these inventions, the recognition result for an input document image is displayed on a screen, and the misread characters that the operator points out in the recognition result are learned to improve system performance.

[0004]

[Problems to be solved by the invention] Conventionally, however, the parameters of character segmentation have been determined by trial and error, and optimizing them over a large amount of data has been very difficult.

[0005] Moreover, dictionary learning methods that use document images for learning, such as those of JP-A-4-24783 and JP-A-5-314303, require the operator to pick out misread characters or misread words, and are therefore unsuited to system maintenance using a large amount of input data.

[0006] The present invention has been made to solve the above problems. Its object is to provide a method that makes it easy to optimize the character segmentation parameters and the recognition dictionary by associating the recognition result of a document image with the correct text and automatically extracting misread characters and character segmentation errors.

[0007]

[Means for solving the problems] To achieve the above object, the present invention compares, by dynamic programming, the result of segmenting and recognizing the characters in a document image with a correct text in which the characters in the document image are given as character codes; obtains the distance between the recognition result and the correct text and the correspondence between characters; extracts misrecognized portions from the correspondence; and corrects the recognition dictionary of the misrecognized characters or the segmentation parameters so that the distance becomes as small as possible.

[0008] Further, in the present invention, let the character recognition result be A = {a_1, a_2, ..., a_i, ..., a_I} and the correct text be B = {b_1, b_2, ..., b_j, ..., b_J}. With the inter-character-code distance d(i, j) defined by the equation below, the character string matching means defines the cumulative distance of the dynamic programming by the recurrence g(i, j) below, outputs g(I, J) as the distance, and outputs the correspondence between individual characters at that point.

[0009]

(Equation 3)

[0010] Matching the character recognition result against the correct text yields three cases for each character: (1) a correctly read character, (2) a misread character, and (3) a character segmentation error. In cases (1) and (2) the character codes correspond one-to-one, while in case (3) they usually correspond one-to-n (or n-to-one). These locations can therefore be identified from the differences between the character codes of the characters associated by dynamic programming and from the shape of the optimal correspondence path. In particular, when the cumulative distance of the dynamic programming is computed, a path segment running horizontally or vertically represents a correspondence in which the number of characters changes, that is, a segmentation error. A penalty distance is therefore added, on top of the ordinary distance, to path segments that extend in these directions from a grid point at which the character codes correspond correctly, so that segmentation errors are extracted correctly. If the recognition dictionary of the misrecognized characters is then corrected, or the segmentation parameters adjusted, so that the cumulative distance of the dynamic programming becomes as small as possible, the character segmentation performance and character recognition performance of the OCR system can be further improved.

[0011]

[Embodiments of the invention] An embodiment of the present invention is described in detail below. FIG. 1 is a block diagram showing an example of a recognition dictionary maintenance method that adopts the maintenance method of the present invention. This embodiment comprises a document image database 10, a correct text 11 for each document image, a character recognition unit 12, a character string matching unit 13, a maintenance unit 14, and a character recognition dictionary 15.

[0012] The document image database 10 is the set of document images used for system maintenance, and the correct text 11 is a file storing the character codes of the characters contained in each image. Each image has exactly one corresponding correct text.

[0013] The character recognition unit 12 is a processing unit that performs layout analysis, line segmentation, character segmentation, character recognition, and so on. It extracts character regions, performs recognition on them, and outputs character codes as the character recognition result.

[0014] The character string matching unit 13 matches, line by line and by dynamic programming, the character recognition result output by the character recognition unit 12 against the correct text 11 corresponding to the input document image. The dynamic programming equations are shown below.

[0015]

(Equation 4)

[0016] Here, the function OR returns 1 when at least one of its two arguments is 0 and returns 0 otherwise; this term is what differs from ordinary dynamic programming. An example of matching character code strings by ordinary dynamic programming is described with reference to FIG. 2. The ordinary cumulative distance is defined by, for example,

[0017]

(Equation 5)

[0018] and, when the character recognition result 20 and the correct text 21 are given, both the correct path 22 and the incorrect path 23, among others, are extracted as optimal correspondence paths. In general string matching, the input pattern sequence often contains noise patterns (insertions) or lacks important patterns (deletions); that is, a pattern in the input sequence often corresponds to no pattern in the reference sequence, so it is difficult to determine the pattern correspondence exactly. In the present invention, however, each character in the recognition result can be assumed to correspond to some character in the correct string, so a more accurate correspondence is possible. In the present invention, the distance given by the function OR is added to path segments that extend horizontally or vertically from a grid point where the character codes match, so that a diagonal path is extracted, that is, a path on which correctly read character codes correspond one-to-one as far as possible. When this method is applied to the character code strings 20 and 21, path 22 is extracted as the path giving the minimum cumulative distance g(I, J) = 4, and the correct correspondence is obtained.
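The matching described above can be illustrated with a short Python sketch. The equation images are not reproduced in this text, so the recurrence used here is only one possible reading of the verbal description and should be treated as an assumption: d(i, j) is 0 for identical codes and 1 otherwise, a diagonal step adds d(i, j), and a horizontal or vertical step adds d(i, j) plus w times OR of the distances at the two ends of the step, where OR is 0 only when both are 1. The names `char_distance`, `or_term` and `align` are hypothetical, the example strings are taken from the FIG. 2 pairs listed in the description, and because the weights are assumed, the total need not equal the value 4 quoted for FIG. 2.

```python
def char_distance(a, b):
    """Binary inter-character-code distance: 0 if the codes match, else 1."""
    return 0 if a == b else 1


def or_term(x, y):
    """Assumed OR term: 0 only when both distances are 1, otherwise 1."""
    return 0 if (x == 1 and y == 1) else 1


def align(recognized, correct, w=1.0, dist=char_distance):
    """Return (cumulative distance, path) for two non-empty strings.

    The path is a list of 0-based (i, j) index pairs from (0, 0) to the end.
    Horizontal and vertical steps carry the assumed penalty w * OR(...).
    """
    I, J = len(recognized), len(correct)
    d = [[dist(a, b) for b in correct] for a in recognized]
    g = [[float("inf")] * J for _ in range(I)]
    back = [[None] * J for _ in range(I)]
    g[0][0] = d[0][0]
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # diagonal step: one-to-one correspondence
                candidates.append((g[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:  # step that consumes one more recognized character
                pen = w * or_term(d[i - 1][j], d[i][j])
                candidates.append((g[i - 1][j] + pen, (i - 1, j)))
            if j > 0:  # step that consumes one more correct character
                pen = w * or_term(d[i][j - 1], d[i][j])
                candidates.append((g[i][j - 1] + pen, (i, j - 1)))
            best, prev = min(candidates, key=lambda c: c[0])
            g[i][j] = best + d[i][j]
            back[i][j] = prev
    # Trace the back-pointers from the last cell to recover the path.
    path, cell = [], (I - 1, J - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return g[I - 1][J - 1], path


recognized = "惰報処理の発展１こ伴い"  # recognition result of FIG. 2
correct = "情報処理の発展に伴い"      # correct text of FIG. 2
total, path = align(recognized, correct)
```

Under these assumptions the penalty discourages a horizontal or vertical step that starts or ends on a matching cell, so the two recognized characters "１" and "こ" are both attached to "に" rather than one of them being attached to "伴".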

[0019] From this correspondence, recognition errors such as misread characters and segmentation errors can be identified. When n character codes in the recognition result correspond to one character code in the correct text, the case is a segmentation error (over-segmentation); when one character code in the recognition result corresponds to n character codes in the correct text, it is a segmentation error (over-merging); and when the character codes in the recognition result and the correct text correspond one-to-one but differ, it is a misread character. In the example shown in FIG. 2, the following result is obtained.

(recognition result, correct text)    error pattern
(惰, 情)    misread
(報, 報)
(処, 処)
(理, 理)
(の, の)
(発, 発)
(展, 展)
(１こ, に)    segmentation error (over-segmentation)
(伴, 伴)
(い, い)
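The classification rules above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `classify`, the label strings, and the representation of the correspondence as a list of 0-based (recognized index, correct index) pairs are all assumptions made for the example. The path literal corresponds to the FIG. 2 result listed above.

```python
def classify(recognized, correct, path):
    """Group path cells into correspondences and label each one."""
    groups = []  # each entry: (recognized indices, correct indices)
    for i, j in path:
        # A cell that shares its row or column with the previous cell
        # belongs to the same correspondence (n-to-1 or 1-to-n).
        if groups and (groups[-1][0][-1] == i or groups[-1][1][-1] == j):
            rec, cor = groups[-1]
            if rec[-1] != i:
                rec.append(i)
            if cor[-1] != j:
                cor.append(j)
        else:
            groups.append(([i], [j]))

    results = []
    for rec, cor in groups:
        r = "".join(recognized[k] for k in rec)
        c = "".join(correct[k] for k in cor)
        if len(rec) == 1 and len(cor) == 1:
            label = "correct" if r == c else "misread"
        elif len(cor) == 1:
            label = "over-segmentation"  # n recognized codes, one correct code
        elif len(rec) == 1:
            label = "over-merging"       # one recognized code, n correct codes
        else:
            label = "ambiguous"          # n-to-m: correspondence not determined
        results.append((r, c, label))
    return results


recognized = "惰報処理の発展１こ伴い"
correct = "情報処理の発展に伴い"
path = [(k, k) for k in range(8)] + [(8, 7), (9, 8), (10, 9)]
errors = [r for r in classify(recognized, correct, path) if r[2] != "correct"]
```

Here `errors` holds only the two entries that would be passed on as maintenance information, the misread pair and the over-segmented pair.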

[0020] FIG. 3 shows an example in which segmentation errors and misread characters occur consecutively. In such a case, every path shown by the dotted lines gives the same cumulative distance, so the exact correspondence path cannot be determined. Therefore, when misrecognized portions are identified, if n (≥ 2) consecutive characters in the recognition result are misrecognized and m (≥ 2) characters in the correct text correspond to them, these m characters are either not considered as maintenance information or the user is notified of the situation.

[0021] However, when the basic performance of the OCR is low and consecutive misrecognitions are likely, the amount of maintenance information can be increased to some extent by modifying the definition of the inter-character-code distance. In the embodiment above the distance between character codes is binary, but if pairs of characters with similar shapes (for example "ば" and "ぱ") and pairs that the OCR tends to confuse are selected in advance, and the inter-code distance D_sim for these pairs is defined so that 0 < D_sim < 1, a correct path can be obtained to some extent. In the case of FIG. 3, the correct correspondence path 33 is then determined uniquely. Consequently, when the cumulative distance is set up in this way, the only portions left unused for maintenance are those where misrecognized characters occur consecutively for reasons other than similar shape or known confusability.
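A graded distance of this kind can be sketched in Python as follows. The pair table `CONFUSABLE`, the value 0.5 chosen for D_sim, and the function name `graded_distance` are assumptions for illustration; the patent only requires that 0 < D_sim < 1 for the pre-selected pairs.

```python
# Hypothetical table of pre-selected similar or easily confused pairs.
CONFUSABLE = {frozenset(("ば", "ぱ"))}
D_SIM = 0.5  # assumed value; any value strictly between 0 and 1 fits the text


def graded_distance(a, b, confusable=CONFUSABLE, d_sim=D_SIM):
    """0 for identical codes, d_sim for a listed confusable pair, else 1."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in confusable:
        return d_sim
    return 1.0
```

Using unordered pairs (`frozenset`) makes the distance symmetric, so it does not matter which of the two characters appears in the recognition result and which in the correct text.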

[0022] The maintenance unit 14 corrects the recognition dictionary of the misrecognized characters or registers additional entries in it, and adjusts the segmentation parameters, so that the cumulative distance of the dynamic programming becomes as small as possible.

[0023] The recognition dictionary is corrected mainly for misread characters, but when the character segmentation algorithm uses character recognition results, learning the characters involved in segmentation errors (over-segmentation) is also effective. When segmentation errors are frequent, the parameters are changed by small amounts. Making these corrections so that the sum of the cumulative distances over several tens of lines decreases improves the reading performance of the system.
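The parameter adjustment described above can be pictured as a simple perturbation search. The Python sketch below is an assumption about how such a loop might look, not the procedure of the patent: `tune_parameters` is a hypothetical name, and `evaluate` stands in for a caller-supplied function that would run the OCR with the given parameters on several tens of lines and return the sum of the cumulative distances g(I, J).

```python
def tune_parameters(params, evaluate, step=0.1, rounds=20):
    """Perturb each parameter by +/- step and keep changes that lower the score.

    params   -- dict mapping parameter name to numeric value
    evaluate -- callable taking such a dict and returning the summed
                cumulative distance (lower is better)
    """
    best = dict(params)
    best_score = evaluate(best)
    for _ in range(rounds):
        improved = False
        for name in list(best):
            for delta in (step, -step):
                trial = dict(best)
                trial[name] = best[name] + delta
                score = evaluate(trial)
                if score < best_score:  # accept only strict improvements
                    best, best_score = trial, score
                    improved = True
        if not improved:
            break  # no small change helps any more
    return best, best_score
```

A toy call such as `tune_parameters({"pitch_weight": 0.0}, lambda p: abs(p["pitch_weight"] - 0.5), step=0.25)` shows the mechanics; `pitch_weight` is an invented parameter name, and in practice each evaluation would involve re-running segmentation and recognition.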

[0024]

[Effects of the invention] As described above, the present invention effectively improves the character recognition performance and character segmentation performance of OCR. The present invention matches the correct text against the recognition result using dynamic programming in which an OR function term is added to the cumulative distance formula. This scheme is effective for matching pattern sequences in which no pattern is inserted or missing between the two sequences, that is, in which every pattern in one sequence always corresponds to some pattern in the other. It can therefore also be applied to string matching in cases such as combining the character recognition results obtained from multiple character recognition devices, or from multiple overlapping document images, to produce an overall character recognition result.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of a character recognition dictionary maintenance device.

FIG. 2 is a diagram showing the correspondence between a recognition result and the correct text.

FIG. 3 is a diagram showing the correspondence when characters are misrecognized consecutively.

[Explanation of symbols]

10 Document image database
11 Correct text
12 Character recognition unit
13 Character string matching unit
14 Maintenance unit
15 Character recognition dictionary
20 Character recognition result (one line)
21 Correct text (one line)
22 Correct optimal path
23 Path giving the same cumulative distance as the correct path
30 Character recognition result
31 Correct text
32 Extracted correspondence paths
33 Correspondence path extracted with the modified cumulative distance formula

Continuation of the front page

(56) References
JP-A-2-250188 (JP, A)
JP-A-5-128299 (JP, A)
JP-A-6-176197 (JP, A)
JP-A-6-195519 (JP, A)
JP-A-6-251204 (JP, A)
JP-A-61-296480 (JP, A)
JP-A-4-98586 (JP, A)
Yoshinori Uesaka and Kazuhiko Ozeki, "Pattern Recognition and Learning Algorithms", pp. 91-108 (see in particular pp. 100-102, section 6.5 on deletion and insertion), Bun-ichi Sogo Shuppan, 6 May 1992 (2nd printing)

(58) Fields searched (Int. Cl.7, DB name)
G06K 9/68
G06K 9/62 640

Claims

(57) [Claims]

1. An OCR maintenance device comprising: character segmentation and recognition means for segmenting and recognizing characters in a document image; character string matching means for comparing, by dynamic programming, the character recognition result output by the character segmentation and recognition means with a correct text, prepared in advance, in which the characters in the document image are given as character codes, and for outputting the distance between the character recognition result and the correct text and the correspondence between characters; and system maintenance means for extracting misrecognized portions from the correspondence and correcting the recognition dictionary of the misrecognized characters or the segmentation parameters so that the distance becomes as small as possible, wherein, with the character recognition result denoted A = {a_1, a_2, ..., a_i, ..., a_I} and the correct text denoted B = {b_1, b_2, ..., b_j, ..., b_J}, and with the inter-character-code distance d(i, j) defined by the following equation, the character string matching means defines the cumulative distance of the dynamic programming by the following recurrence g(i, j), outputs g(I, J) as the distance, and outputs the correspondence between individual characters at that point.

(Equation 1)

[OR() denotes a function that returns 0 if both arguments are 1 and returns 1 otherwise. min denotes a function that returns the minimum of the results of the three expressions to the right of the brace ({). w denotes a positive constant.]

2. An OCR maintenance method comprising: comparing, by dynamic programming, the result of segmenting and recognizing characters in a document image with a correct text, prepared in advance, in which the characters in the document image are given as character codes; obtaining the distance between the recognition result and the correct text and the correspondence between characters; extracting misrecognized portions from the correspondence; and correcting the recognition dictionary of the misrecognized characters or the segmentation parameters so that the distance becomes as small as possible, wherein, with the character recognition result denoted A = {a_1, a_2, ..., a_i, ..., a_I} and the correct text denoted B = {b_1, b_2, ..., b_j, ..., b_J}, and with the inter-character-code distance d(i, j) defined by the following equation, the cumulative distance of the dynamic programming is defined by the following recurrence g(i, j), g(I, J) is obtained as the distance, and the correspondence between individual characters at that point is obtained.

(Equation 2)

[OR() denotes a function that returns 0 if both arguments are 1 and returns 1 otherwise. min denotes a function that returns the minimum of the results of the three expressions to the right of the brace ({). w denotes a positive constant.]