JP2755738B2

JP2755738B2 - Character recognition device

Info

Publication number: JP2755738B2
Application number: JP1301248A
Authority: JP
Inventors: 徹松尾; 恒太藤村
Original assignee: Sanyo Denki Co Ltd
Current assignee: Sanyo Denki Co Ltd
Priority date: 1989-11-20
Filing date: 1989-11-20
Publication date: 1998-05-25
Anticipated expiration: 2013-05-25
Also published as: JPH03161890A

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Field of Industrial Application

The present invention relates to a character recognition device, and in particular to a character recognition device that narrows the recognition candidate characters down to a minimum in order to simplify correction work and to improve the accuracy of post-processing such as word matching.

(b) Prior Art

In conventional character recognition, some systems output a fixed number of candidate characters (usually about 10) as the recognition result. Others output a variable number of candidates by setting a distance (or similarity) threshold for each character category, as disclosed in the paper "Character Pattern Extraction and Recognition from Scene Images," IEICE Transactions (D), Vol. J71-D, No. 6, pp. 1037-1047, June 1988. Still others limit the candidates with a fixed threshold, as disclosed in the paper "Error Correction Processing Method for Kanji Recognition by Candidate Character Completion and Language Processing," IEICE Transactions D-II, Vol. J72-D-II, No. 7, pp. 993-1000, July 1989. These methods narrowed the candidate characters to some extent, but not sufficiently.

(c) Problems to be Solved by the Invention

Because selecting among candidates is burdensome for the person doing the selecting, the number of candidate characters should be as small as possible.

The conventional methods do not narrow the candidates sufficiently. During correction work, even characters that bear almost no resemblance to the input are listed as candidates, which makes selection cumbersome.

In addition, when the recognition result is passed to post-processing such as word matching, having more candidate characters than necessary increases the number of combinations. As a result, the input may be matched to a different word, and the correct result may not be obtained.

In view of the conventional problems described above, an object of the present invention is to narrow down the number of candidate characters without impairing recognition accuracy, thereby making correction work easier or post-processing more accurate.

(d) Means for Solving the Problems

The present invention comprises: a standard pattern dictionary holding a standard character form for each character category; an inter-pattern distance calculation unit that calculates the distance between an input character pattern and the standard pattern of each character type registered in the standard pattern dictionary; a distance-order sorting unit that sorts the characters in ascending order of the calculated distance; a primary candidate character narrowing unit that roughly limits the candidate characters; and a secondary candidate character narrowing unit that calculates the distance differences between adjacent characters after sorting, calculates the mean and variance of those distance differences, determines whether the candidate characters can be narrowed further, and splits the list at the largest distance difference into candidate characters and the rest. The invention is characterized in that it finds the point where the distance difference is largest, keeps the characters with smaller distances as candidate characters, and rejects the others.

(e) Operation

Whether the candidate characters can be narrowed down is decided by whether the distance differences vary widely or not. When there are few similar characters, a clear distance difference arises between the similar characters and the dissimilar ones. A separation point between candidate characters and rejected characters can therefore be found among the distance differences, and the candidates are narrowed. Conversely, when there are many similar characters, the device avoids forcing the candidates to be narrowed, and all candidate characters are kept.

In this way, the invention narrows down the candidate characters effectively and without forcing the split.

(f) Embodiment

An embodiment of the present invention will now be described with reference to the drawings.

In the present invention, the candidate characters are narrowed down on the basis of the distance differences that appear when the candidates are arranged in ascending order of distance (that is, in descending order of similarity).

First, the concept of the present invention will be described with reference to FIG. 1. FIG. 1 is a conceptual diagram showing the candidate characters arranged in ascending order of distance; in this figure there are 10 primary candidate characters. Each black dot represents one candidate character, and the horizontal axis is distance, so the figure shows how the candidates are distributed along the distance axis.

FIG. 1(a) shows a case where the distance differences vary; the distance difference is largest at the broken line in the figure. That is, there is a significant distance difference between the third-ranked and fourth-ranked characters, and this point can be used as the boundary separating the characters to be included in the candidates from the group of characters to be rejected.

FIG. 1(b) shows a case where the distance differences are almost the same. Here it would be risky to split the characters into two groups as in FIG. 1(a). In this case the device avoids forcing the candidates to be narrowed and keeps all characters as candidate characters.

Applied to actual cases, FIG. 1(a) corresponds to a case where there are relatively few similar characters, and FIG. 1(b) corresponds to a case where there are many similar characters.
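As a rough illustration of the two cases in FIG. 1, the following Python sketch computes the differences between adjacent distances for two made-up candidate lists. The function name `adjacent_gaps` and all distance values are hypothetical and do not come from the patent.

```python
def adjacent_gaps(sorted_distances):
    """Return the difference between each pair of neighbouring distances."""
    return [b - a for a, b in zip(sorted_distances, sorted_distances[1:])]


# Hypothetical distances for 10 primary candidates, already in ascending order.
case_a = [10, 12, 13, 40, 42, 43, 45, 46, 48, 49]  # like FIG. 1(a): few similar characters
case_b = [10, 14, 18, 22, 26, 30, 34, 38, 42, 46]  # like FIG. 1(b): many similar characters

gaps_a = adjacent_gaps(case_a)
gaps_b = adjacent_gaps(case_b)

# Index of the largest gap: index i lies between rank i+1 and rank i+2.
largest_a = max(range(len(gaps_a)), key=gaps_a.__getitem__)
```

In the first list the largest gap falls between the third and fourth candidates, which is the kind of separation point FIG. 1(a) depicts; in the second list every gap is the same, so no natural split exists.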

In the present invention, it is first necessary to determine whether the situation is that of FIG. 1(a) or that of FIG. 1(b).

This determination works as follows. The mean of the distance differences between the candidate characters is calculated, and the criterion is whether any distance difference is extremely large compared with that mean. If an extremely large difference exists, the candidate characters are considered capable of further narrowing. If not, no further narrowing is performed.

This method requires that the number of candidate characters be limited in advance to a small number (10 candidates in FIG. 1). Therefore, in the first stage, a conventional method is used to limit the candidates to a small number of characters. In the second stage, the distance differences within that group of candidate characters are calculated by the method described above and, if possible, the group is split into characters to be kept as candidates and characters to be rejected.

Next, the present invention will be described further with reference to FIGS. 2 and 3.

FIG. 2 is a block diagram showing an example of the configuration of the present invention, and FIG. 3 shows the processing flow.

The inter-pattern distance calculation unit 2 compares the features of the input character pattern 1 with those of every character type registered in the standard pattern dictionary 3 and calculates the inter-pattern distance to each standard pattern. The standard pattern dictionary 3 stores the standard character form for each character category. The results obtained by the inter-pattern distance calculation unit 2, namely the character types and their distance values, are passed to the distance-order sorting unit 4. The distance-order sorting unit 4 sorts them in ascending order of distance, as shown in FIG. 1. The sorted data is then passed to the primary candidate character narrowing unit 5, which narrows the data down to several candidates. This step uses a conventional method to narrow the candidates roughly: it may simply limit the candidates to a fixed number (for example, about 10), or it may narrow them by setting a threshold for each character category.
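The first stage described above (units 2, 4, and 5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the patent does not specify the features or the distance measure, so Euclidean distance between numeric feature vectors is assumed here, and the names `pattern_distance`, `rank_candidates`, and `primary_narrowing` are hypothetical. Only the fixed-count variant of the primary narrowing is shown.

```python
import math


def pattern_distance(x, y):
    # Assumption: Euclidean distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_candidates(input_features, dictionary):
    """Compare the input with every registered character type and
    return (label, distance) pairs in ascending order of distance."""
    scored = [(pattern_distance(input_features, ref), label)
              for label, ref in dictionary.items()]
    scored.sort()
    return [(label, dist) for dist, label in scored]


def primary_narrowing(ranked, max_candidates=10):
    """Roughly limit the candidates by keeping a fixed number of them."""
    return ranked[:max_candidates]


# Hypothetical two-dimensional feature vectors for three character types.
dictionary = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 0.0)}
ranked = rank_candidates((0.0, 0.0), dictionary)
top_two = primary_narrowing(ranked, max_candidates=2)
```

The per-category threshold variant mentioned in the text would replace the slice in `primary_narrowing` with a filter on each candidate's distance.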

Next, the data is sent to the secondary candidate character narrowing unit 6. The secondary candidate character narrowing unit 6 comprises a distance difference calculation unit 61, a mean and variance calculation unit 62, a narrowing feasibility determination unit 63, and a candidate character separation unit 64, and performs the following processing.

First, the data is sorted in ascending order of distance if necessary and then passed to the distance difference calculation unit 61. (In this embodiment the data is already in distance order when it leaves the primary candidate character narrowing unit, so this sorting step is omitted from FIG. 2.) The distance difference calculation unit 61 calculates the difference between the distances of adjacent characters. The mean and variance calculation unit 62 calculates the mean and variance of those distance differences. The narrowing feasibility determination unit 63 then uses the mean and variance to determine whether any distance difference is extremely large. Specifically, if any distance difference satisfies the following inequality, the candidate character separation unit 64 splits the list at the largest distance difference, keeps the characters before the split as candidate characters, and rejects the characters at greater distances.

(distance difference) > (mean of the distance differences) + 2 × (variance of the distance differences)

If no distance difference satisfies the above inequality, the characters selected by the primary candidate character narrowing unit 5 are adopted as the candidate characters as they are.
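The second-stage decision can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the function name `secondary_narrowing` is hypothetical, the population variance is assumed because the text does not say which variance is meant, and the distance values are made up. Note that the variance carries squared units, so with this literal form of the inequality the outcome depends on the scale of the distance values.

```python
from statistics import mean, pvariance


def secondary_narrowing(ranked):
    """ranked: (label, distance) pairs in ascending order of distance.
    Split at the largest gap if it exceeds mean + 2 * variance of the gaps;
    otherwise keep every candidate."""
    distances = [dist for _, dist in ranked]
    gaps = [b - a for a, b in zip(distances, distances[1:])]
    if not gaps:
        return list(ranked)  # zero or one candidate: nothing to split
    threshold = mean(gaps) + 2 * pvariance(gaps)  # assumption: population variance
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    if gaps[largest] > threshold:
        return list(ranked[:largest + 1])  # keep candidates before the largest gap
    return list(ranked)


# Hypothetical candidates with a clear gap after the third one.
with_gap = [("a", 0.10), ("b", 0.12), ("c", 0.13), ("d", 0.40), ("e", 0.42),
            ("f", 0.43), ("g", 0.45), ("h", 0.46), ("i", 0.48), ("j", 0.49)]
# Hypothetical candidates that are evenly spaced.
even = [("a", 10), ("b", 14), ("c", 18), ("d", 22), ("e", 26)]

kept_with_gap = secondary_narrowing(with_gap)
kept_even = secondary_narrowing(even)
```

With these sample values the first list is cut down to its first three candidates, while the evenly spaced list is returned unchanged.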

The resulting candidate characters are passed to the post-processing unit 7, where processing such as word matching is performed.

Although not shown in the example of FIG. 2, when no post-processing is performed, the resulting characters are used as the alternative candidate characters offered during correction.

Next, the processing procedure of the present invention will be described with reference to the flowchart of FIG. 3.

In step S1, the inter-pattern distance calculation unit 2 compares the features of the input character pattern with those of every character type registered in the standard pattern dictionary 3.

Next, in step S2, the distance-order sorting unit 4 sorts the results obtained in step S1 (the character types and their distance values) in ascending order of distance.

In step S3, the primary candidate character narrowing unit 5 narrows the data down to several candidates.

After that, the secondary candidate character narrowing unit 6 performs the following processing.

In step S4, the distance difference calculation unit 61 calculates the difference between the distances of adjacent characters. In step S5, the mean and variance calculation unit 62 calculates the mean and variance of the distance differences, and the process proceeds to step S6.

In step S6, the narrowing feasibility determination unit 63 uses the mean and variance to determine whether any distance difference is extremely large.

If a distance difference satisfying the condition described above exists, the process proceeds to step S7. In step S7, the candidate character separation unit 64 splits the list at the largest distance difference and rejects the character candidates at greater distances, and the operation ends.

On the other hand, if no such distance difference exists, the process proceeds to step S8, the characters selected by the primary candidate character narrowing process are adopted as the candidate characters as they are, and the operation ends.
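Steps S1 to S8 can be put together in a single Python sketch. The function name `recognize_candidates`, the Euclidean distance, the fixed-count primary narrowing, the use of the population variance, and all feature values are assumptions made for illustration; the patent does not specify them.

```python
import math
from statistics import mean, pvariance


def pattern_distance(x, y):
    # Assumption: Euclidean distance between feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def recognize_candidates(input_features, dictionary, max_primary=10):
    # S1: compare the input with every registered character type.
    scored = [(pattern_distance(input_features, ref), label)
              for label, ref in dictionary.items()]
    # S2: sort in ascending order of distance.
    scored.sort()
    # S3: primary narrowing (here, simply keep a fixed number).
    primary = scored[:max_primary]
    if len(primary) < 2:
        return [label for _, label in primary]
    # S4: differences between the distances of adjacent characters.
    dists = [dist for dist, _ in primary]
    gaps = [b - a for a, b in zip(dists, dists[1:])]
    # S5: mean and variance of the distance differences.
    gap_mean = mean(gaps)
    gap_var = pvariance(gaps)  # assumption: population variance
    # S6: is the largest difference extremely large?
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    if gaps[largest] > gap_mean + 2 * gap_var:
        # S7: split at the largest difference and reject the rest.
        kept = primary[:largest + 1]
    else:
        # S8: keep the primary candidates as they are.
        kept = primary
    return [label for _, label in kept]


# Hypothetical dictionary: three close matches followed by a clear gap.
clustered = {"A": (0.10, 0.0), "B": (0.12, 0.0), "C": (0.13, 0.0),
             "D": (0.40, 0.0), "E": (0.42, 0.0), "F": (0.43, 0.0)}
# Hypothetical dictionary: evenly spaced matches.
even = {"A": (1.0, 0.0), "B": (2.0, 0.0), "C": (3.0, 0.0), "D": (4.0, 0.0)}

result_clustered = recognize_candidates((0.0, 0.0), clustered)
result_even = recognize_candidates((0.0, 0.0), even)
```

With the clustered dictionary only the three closest labels survive, and with the evenly spaced one all four are kept, mirroring the S7 and S8 branches.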

(g) Effects of the Invention

As described above, the present invention can narrow down the candidate characters substantially without impairing character recognition accuracy, which reduces the burden on the user of selecting a candidate character during correction. When post-recognition processing is performed, reducing the candidate characters also reduces the number of word-matching combinations, making the processing both faster and more accurate.

[Brief description of the drawings]

FIG. 1 is a conceptual diagram of the present invention, FIG. 2 is a block diagram showing an embodiment of the present invention, and FIG. 3 is a flowchart of the processing in the present invention.

2: inter-pattern distance calculation unit; 3: standard pattern dictionary; 4: distance-order sorting unit; 5: primary candidate character narrowing unit; 6: secondary candidate character narrowing unit.

Continuation of the front page

(56) References cited: JP-A-3-71285 (JP, A); JP-A-53-80922 (JP, A); JP-A-61-72376 (JP, A); JP-A-61-114299 (JP, A); JP-A-62-192890 (JP, A)

(58) Fields searched (Int. Cl.6, DB name): G06K 9/62; Patent file (PATOLIS); JICST file (JOIS)

Claims

(57) [Claims]

1. A character recognition device comprising: a standard pattern dictionary holding a standard character form for each character category; an inter-pattern distance calculation unit that calculates the distance between an input character pattern and the standard pattern of each character type registered in the standard pattern dictionary; a distance-order sorting unit that sorts the characters in ascending order of the distance calculated by the calculation unit; a primary candidate character narrowing unit that roughly limits the candidate characters; and a secondary candidate character narrowing unit comprising a distance difference calculation unit that calculates the distance differences between adjacent characters after sorting, a mean and variance calculation unit that calculates the mean and variance of the distance differences, a narrowing feasibility determination unit that determines whether the candidate characters can be narrowed down, and a candidate character separation unit that splits the characters at the largest distance difference into candidate characters and the rest; wherein the device finds the point where the distance difference is largest, keeps the characters with smaller distances as candidate characters, and rejects the others.