JPS60153579A

JPS60153579A - Forming method of dictionary for character recognition

Info

Publication number: JPS60153579A
Application number: JP59008102A
Authority: JP
Inventors: Teruaki Inoue; 井上　暉朗; Minoru Nagao; 永尾　実; Nobukazu Nasuhara; 茄子原　伸和
Original assignee: Tateisi Electronics Co; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1984-01-20
Filing date: 1984-01-20
Publication date: 1985-08-13

Abstract

PURPOSE: To form a highly accurate dictionary automatically by encoding the features of a character and storing, as learning data for the dictionary, only those feature codes that include the essential features and exclude the prohibited features. CONSTITUTION: When a CPU 11 executes a program stored in a RAM 12, a counter CO is initialized. The contents of the counter CO are compared with the essential feature code of the character. Where an essential feature is specified as "0" or "1" and the corresponding bit of the counter CO matches it, a prohibited feature table 15 is consulted. If the bit combination in the counter CO matches the corresponding combination in the prohibited feature table, the code is treated as a prohibited feature and is not stored. If it does not match, the contents of the counter CO are stored in a RAM 13 as learning data. The contents of the counter CO stored in the RAM 13 thus become automatically generated, valid learning data.

Description

[Detailed description of the invention]

<Technical field of the invention> The present invention relates to a method for creating a character recognition dictionary that is used when a computer or the like reads characters and figures handwritten on a form.

<Prior art and its problems> FIG. 1 is a block diagram schematically showing an example of a conventional character recognition device. Characters and figures handwritten on a form 1 are read optically by an image sensor 2 such as a CCD. The optical signal is converted into an electrical signal and input to an A/D converter 3, where it is A/D converted and passed to a preprocessing circuit 4. The preprocessing circuit 4 binarizes the optically read character figure, that is, separates it into white and black areas, and then performs so-called preprocessing such as noise removal and line thinning. The feature extraction circuit 5 that follows extracts all character features of the preprocessed figure, for example endpoints, branch points, intersections, bends, concavities, and loops. The extracted features are expressed in the same format as the memory means called the dictionary 6, in which the character features of standard patterns are encoded and stored in advance. The dictionary matching circuit 7 then compares the feature code extracted by the feature extraction circuit 5 with the standard codes in the dictionary 6. When they match, it outputs the ID of the standard pattern having that code; when they do not match, the input is handled as a so-called reject.

In such a handwritten character recognition device, the characters written on a form differ from writer to writer even for the same character type, so multiple dictionary entries, called subcategories, must be prepared for each character type. FIG. 2 shows an example of a dictionary for the character "ア", in which the subcategories are expressed by codes such as 10111, 00100, and 11101. Bit number 0 is logic 1 when the number of connected components of the character figure is 1, and logic 0 otherwise. Similarly, bit number 1 is logic 1 when the character figure contains a loop and logic 0 when it does not. Bit number 2 is the left concavity, which indicates whether a concavity is visible when the character figure is viewed from the left: logic 1 if there is one, logic 0 if not. Bit number 3 is logic 1 when the number of character endpoints is three and logic 0 otherwise. Bit number 4 is logic 1 when the character figure has a branch and logic 0 when it does not. A collection of such codes, stored in advance in a ROM or the like in the quantity needed for each character, is called a dictionary. Although not shown in FIG. 2, this dictionary of course consists of a series of information that also includes a so-called ID code, indicating that a pattern with such a feature code is the character "ア", and other auxiliary information. In the feature extraction circuit 5 described above, the features of the character figure read by the image sensor 2 are extracted in the format shown in FIG. 2, encoded, and input to the dictionary matching circuit 7.

Conventionally, such dictionaries were created by hand for each character figure to be recognized.
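The bit assignments described above can be illustrated with a short Python sketch. The `Features` record, its field names, and the `encode` function are hypothetical names introduced here for illustration only. The text does not state how bit numbers map to digit positions in the written codes, so the sketch assumes that bit number 0 is the leftmost digit.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical container for the five features named in the text."""
    connected_components: int
    has_loop: bool
    has_left_concavity: bool
    endpoint_count: int
    has_branch: bool


def encode(f: Features) -> str:
    """Return a 5-digit feature code; digit i holds bit number i (assumption)."""
    bits = [
        f.connected_components == 1,  # bit 0: exactly one connected component
        f.has_loop,                   # bit 1: the figure contains a loop
        f.has_left_concavity,         # bit 2: concavity seen from the left
        f.endpoint_count == 3,        # bit 3: exactly three endpoints
        f.has_branch,                 # bit 4: the figure has a branch
    ]
    return "".join("1" if b else "0" for b in bits)


if __name__ == "__main__":
    print(encode(Features(1, False, True, 3, True)))
```

Under these assumptions, a figure with one connected component, no loop, a left concavity, three endpoints, and a branch encodes to 10111, one of the subcategory codes quoted for "ア".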

The time spent creating such a dictionary is enormous; in developing a character recognition device, dictionary creation consumed more than 50% of the total time and effort. To solve this problem, dictionaries came to be created by so-called learning: a large number of unspecified writers are asked to write the characters to be recognized, the written characters are read by the character recognition device, and the output of the feature extraction circuit described above is written into the dictionary as standard codes.

In this conventional method, however, the accuracy of the finished dictionary depends on the accuracy of the character feature data obtained from the collected handwriting, so gaps and omissions could occur. Moreover, written characters with identical patterns are all processed as learning data in the same way, so a problem remained that much processing time is wasted.

<Object of the invention> The object of the present invention is therefore to provide an improved method for creating a character recognition dictionary that can produce highly accurate learning data in a short period and, from this learning data, automatically create a dictionary that is both highly accurate and small in capacity.

<Constitution of the invention> In the character recognition dictionary creation method according to the present invention, character feature codes equivalent to those of written characters are automatically generated by a computer or the like. For each character, the features that the character always has within any allowable range of deformation are designated as essential features, and the character feature codes are automatically generated by the computer within this designated range. As a result, the dictionary problem that affected the conventional method described above is resolved. In the conventional method, the quality of the finished dictionary depended on the number of writers and on what they wrote; in the method according to the present invention, all character feature codes within the allowable range of deformation are generated, so gaps and omissions in the dictionary no longer occur.

In the present invention, in addition to designating such essential feature codes, features that cannot occur in the structure of the character within any allowable range of deformation, among character feature codes having the essential features, are designated as prohibited feature codes. For example, for the character "ア" in FIG. 2, a character with a loop (logic 1) whose number of connected components is not 1 cannot be expressed as a character figure image. Feature codes that cannot be expressed as a character figure image can nevertheless be produced by a configuration consisting only of the automatic feature code generation by a computer and the means for designating essential features described above. This causes no problem for character recognition, but it makes the dictionary very large. Therefore, by designating prohibited feature codes in addition to the essential feature codes, learning data for creating a dictionary that is highly accurate and small in memory capacity can be produced in a short period. This learning data is used in the same way as the input (learning) data of a conventional learning-based automatic dictionary creation device and becomes the data for creating the actual dictionary of the character recognition device.

<Description of embodiments> The following detailed description of the invention describes an example of creating learning data for a dictionary composed of five feature codes.

As shown in FIG. 2, the left concavity of bit number 2 is always logic 1 in every subcategory of the character "ア". The essential feature codes for "ア" are therefore the combinations of the other four bits with bit number 2 set to 1, and it suffices to generate these codes. In the same way, the essential feature codes for the character "イ" are, as shown in FIG. 3, the combinations of the other three bits with bit number 1 and bit number 2 always 0, and it suffices to generate these codes. As a result, the essential feature codes a through h shown in FIG. 4 are generated for "イ". These codes differ from the essential feature codes for the character "ア" shown in FIG. 2. It follows that the characters "ア" and "イ" are recognized as distinct by the dictionary matching circuit 7 shown in FIG. 1.
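The enumeration of essential feature codes can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the pattern notation (`x` for a free bit), the function name `essential_codes`, and the assumption that digit position i corresponds to bit number i are all introduced here. The order in which the codes are produced is arbitrary and is not claimed to match the labels a through h of FIG. 4.

```python
from itertools import product
from typing import List


def essential_codes(pattern: str) -> List[str]:
    """Expand a pattern of '0', '1', and 'x' into every matching code.

    Digit i of the pattern stands for bit number i (assumption);
    'x' marks a bit that may be either 0 or 1.
    """
    choices = []
    for ch in pattern:
        if ch == "x":
            choices.append(("0", "1"))   # free bit: both values are generated
        elif ch in "01":
            choices.append((ch,))        # essential bit: value is fixed
        else:
            raise ValueError(f"unexpected pattern character: {ch!r}")
    return ["".join(bits) for bits in product(*choices)]


if __name__ == "__main__":
    # "イ": bit numbers 1 and 2 are always 0, the other three bits are free.
    for code in essential_codes("x00xx"):
        print(code)
```

With three free bits the pattern for "イ" expands to eight codes, and the pattern `xx1xx` for "ア" (bit number 2 fixed at 1) expands to sixteen.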

Among the essential feature codes for the character "イ" shown in FIG. 4, codes b through g do not form the character "イ" within any allowable range of deformation.

In these codes, the combination of bit numbers 3 and 4 takes the values 0, 1 and 1, 0 respectively. Codes with these features do not form the character figure "イ", that is, they cannot be expressed as a character figure, and are therefore removed from the learning data. These are the prohibited feature codes of the present invention.

The essential feature codes and prohibited feature codes described above are stored in a RAM or the like on the basis of tables such as those shown in FIG. 5 and FIG. 6, respectively. In FIG. 5, the mark x denotes so-called "any", meaning that the bit may be either 1 or 0; 1 denotes that the essential feature is logic 1, and 0 denotes that the essential feature is logic 0. That is, 1 means the corresponding character feature is always 1, and 0 means it is always 0. In the table of FIG. 6 for designating prohibited feature codes, each symbol has the same meaning as in FIG. 5.

FIG. 7 shows an outline of the hardware configuration in one embodiment of the present invention. Reference numeral 11 is a CPU for executing a program stored in a RAM 12, reference numeral 13 is a RAM for storing the contents of the counter CO, that is, the learning data, reference numeral 14 is a table memory for designating essential feature codes, and reference numeral 15 is a table memory for designating prohibited feature codes.
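One possible way to hold a table row of x, 0, and 1 entries in memory is as a mask and value pair, so that a counter value can be tested with a single AND and compare. The Python sketch below is an assumption about representation, not something the patent specifies; `compile_pattern` and `matches` are hypothetical helper names, and the leftmost pattern digit is assumed to be bit number 0 and is stored as the most significant bit of the integer.

```python
from typing import Tuple


def compile_pattern(pattern: str) -> Tuple[int, int]:
    """Turn a row such as 'xx1xx' into (mask, value).

    mask has a 1 wherever the row specifies 0 or 1;
    value holds the required bit at each of those positions.
    """
    mask = 0
    value = 0
    for ch in pattern:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1            # this position is constrained
            value |= int(ch)     # and must equal the given digit
        elif ch != "x":
            raise ValueError(f"unexpected pattern character: {ch!r}")
    return mask, value


def matches(code: int, mask: int, value: int) -> bool:
    """True when every constrained bit of code equals the table entry."""
    return (code & mask) == value


if __name__ == "__main__":
    mask, value = compile_pattern("xx1xx")   # essential row for "ア"
    print(matches(0b10111, mask, value))     # bit number 2 is 1
    print(matches(0b11011, mask, value))     # bit number 2 is 0
```

The same pair representation serves for both tables: a match against an essential row means the code may proceed, while a match against a prohibited row means the code is discarded.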

FIG. 8 is a flowchart showing an example of a program, stored in the RAM 12, for carrying out the present invention; it is described below. First, in step 21, the counter CO is initialized.

This initialization sets the 5-bit counter to all zeros, that is, 00000. Next, in step 22, the counter is compared with the essential feature code of the corresponding character in FIG. 5. For each position where the essential feature is specified as 0 or 1, the corresponding bit of the counter CO is checked: if they match, the process moves to step 23; if they do not, it moves to step 25. In step 23, the prohibited feature table of FIG. 6 is consulted, and the combination of the relevant bits of the counter CO is compared with the combination of the corresponding bits in the prohibited feature table. If they match, the code is treated as a prohibited feature and the process goes to step 25. If they do not match, the process goes to step 24, and the contents of the counter CO are stored as learning data in the RAM 13 of FIG. 7. As shown in FIG. 6, several different combinations may be designated as prohibited features for the same character figure, so the comparison in step 23 is assumed to be performed against each of these prohibited features.

The contents of the counter CO stored in the RAM 13 in this way are the valid learning data that the present invention generates automatically. Next, in step 25, the counter CO is incremented by 1 to update the feature code.

This operation is repeated until step 26 finds that the contents of the counter CO have returned to 00000, so that at most 2^5 = 32 feature codes are generated. Generation of the learning data is complete once the above operation has been carried out for all characters for which the dictionary is to be created; this is determined in step 27. When the next character is selected in step 28, the essential features of FIG. 5 and the prohibited features of FIG. 6 corresponding to the character now being processed are, of course, the ones referred to in the comparisons of steps 22 and 23.
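The flow of steps 21 through 28 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the program of FIG. 8 itself: the function names, the (mask, value) encoding of table rows, and the convention that bit number 0 is the most significant of the five bits are all introduced here. The table entries for "ア" are taken from the text (bit number 2 always 1; a loop with a connected component count other than 1 is prohibited); the actual contents of FIG. 5 and FIG. 6 are not reproduced in the text.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (mask, value): constrained bit positions and their values


def generate_learning_data(essential: Row, prohibited: List[Row],
                           width: int = 5) -> List[int]:
    """Run the counter loop for one character and return the stored codes."""
    learning_data = []                      # plays the role of RAM 13
    co = 0                                  # step 21: initialize counter CO
    while True:
        e_mask, e_value = essential
        if (co & e_mask) == e_value:        # step 22: essential bits match?
            # step 23: compare against every prohibited combination
            if not any((co & m) == v for m, v in prohibited):
                learning_data.append(co)    # step 24: store as learning data
        co = (co + 1) & ((1 << width) - 1)  # step 25: increment, wrap at 2**width
        if co == 0:                         # step 26: counter back at 00000
            break
    return learning_data


def build_all(tables: Dict[str, Tuple[Row, List[Row]]]) -> Dict[str, List[int]]:
    """Steps 27 and 28: repeat the loop for each target character."""
    return {ch: generate_learning_data(ess, proh)
            for ch, (ess, proh) in tables.items()}


if __name__ == "__main__":
    tables = {
        # essential: bit 2 = 1; prohibited: bit 0 = 0 together with bit 1 = 1
        "ア": ((0b00100, 0b00100), [(0b11000, 0b01000)]),
    }
    for ch, codes in build_all(tables).items():
        print(ch, [format(c, "05b") for c in codes])
```

With these example tables, 16 of the 32 counter values pass the essential check for "ア" and 4 of those are rejected as prohibited, leaving 12 stored codes, among them the three subcategory codes 10111, 00100, and 11101 quoted in the text.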

<Effects of the invention> As described above, in the character recognition dictionary creation method according to the present invention, when the features of optically read character figures are encoded and a dictionary is created by learning, the features that each character figure to be recognized always has within any allowable range of deformation and the features that it can never have within any allowable range of deformation are designated as essential features and prohibited features, respectively. Of the feature codes that include the essential features, only those that do not include the prohibited features are stored and used as learning data for automatic dictionary creation. A highly accurate, small-capacity dictionary can therefore be created automatically in a short period.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an outline of a handwritten character recognition device.
FIG. 2 is a diagram showing an example of feature extraction codes for the character "ア".
FIG. 3 is a diagram showing an example of feature extraction codes for the character "イ".
FIG. 4 is a diagram listing the essential feature codes for the character "イ".
FIG. 5 is a diagram showing the memory contents of the table for designating essential feature codes in one embodiment of the present invention.
FIG. 6 is a diagram showing the memory contents of the table for designating prohibited feature codes in one embodiment of the present invention.
FIG. 7 is a block diagram showing an outline of the hardware configuration in one embodiment of the present invention.
FIG. 8 is a flowchart showing the operation of one embodiment of the present invention.

Patent applicant: Tateisi Electronics Co. (立石電機株式会社). Agent: Patent attorney Tetsuji Iwakura (and one other).

Claims

A method for creating a character recognition dictionary in a character recognition device that comprises feature code generation means for reading characters and figures on a form by optical scanning and for extracting and encoding the character features of the characters and figures read, and matching means for comparing the feature code generated by said means with standard codes stored in a dictionary and outputting the result, the method being characterized by comprising: means for designating, as essential features, character features that each character figure to be recognized always has within any allowable range of deformation; means for designating, as prohibited features, character features that it cannot have within any allowable range of deformation; means for determining whether a feature code from the feature code generation means includes the essential features; means for determining whether a feature code that the essential feature determination means has determined to include the essential features includes the prohibited features; and means for storing a feature code that the prohibited feature determination means has determined not to include the prohibited features, wherein the feature codes stored in the storage means are used as learning data when the dictionary is created automatically.