JPS60164882A

JPS60164882A - Character recognition device

Info

Publication number: JPS60164882A
Application number: JP59020368A
Authority: JP
Inventors: Minoru Nagao; 永尾　実
Original assignee: Tateisi Electronics Co; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1984-02-06
Filing date: 1984-02-06
Publication date: 1985-08-27

Abstract

PURPOSE: To shorten the processing time for dictionary matching by extracting features of a character pattern, deriving a group of characters that share those features as candidate characters, and storing starting-point information that indicates the search direction. CONSTITUTION: Characters on a form P are optically read by an A/D conversion circuit 1, converted into digital data, and stored in an image memory 2. A CPU 3 and a stroke detection circuit 4 are connected to the circuit 1. The CPU 3 executes character recognition processing, including matching processing, according to a program, and the circuit 4 converts the read characters into four-direction strokes. A ROM 5 is a memory for the program and for a dictionary storing the standard patterns of the characters to be recognized. A RAM 6 is a memory that stores the four-direction strokes and eight-direction conversion data of input characters and serves as a work area during program execution. The contents of the RAM 6 are stored in correspondence with the image after stroke extraction.

Description

DETAILED DESCRIPTION OF THE INVENTION

<Technical Field of the Invention> The present invention relates to a character recognition device that automatically reads and recognizes unknown characters such as kana, alphabetic characters, and numerals.

<Background of the Invention> Conventionally, in this type of character recognition device, as shown in the flowchart of FIG. 1, the characters on a form P are photoelectrically converted one character at a time and then converted into a binary pattern of "1" and "0". The binary pattern is then subjected to preprocessing that makes the subsequent processing more effective, namely a series of operations including removal of noise such as black specks on the form P on which the characters are recorded and smoothing of the boundaries of the character figure. Next, feature extraction is performed to extract several features needed for character recognition (intersections, branch points, number of loops, stroke length information, and so on). Based on the extraction result, the characters that share these features are narrowed down from the full character set as candidate characters. This narrowing completes the first stage of recognition, but if there are several candidate characters, a detailed identification process is performed to select a single character from among them. This detailed identification process is generally called dictionary matching. The dictionary here stores the features of each character as a fixed sequence and is usually implemented in ROM. The narrowing of candidate characters described above therefore amounts to selecting a specific group of characters from the dictionary based on the features obtained by feature extraction, and is called dictionary guidance processing.
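The dictionary guidance step can be pictured as a coarse filter over the dictionary. The sketch below is only an illustration under stated assumptions: the function name `guide_candidates`, the choice of (end points, branch points) as the coarse feature pair, and the counts for "ユ" and "ヲ" are hypothetical; only the counts for "ア" (three end points, one branch point) come from the description of its standard pattern.

```python
from typing import Dict, List, Tuple

# Coarse features: (number of end points, number of branch points).
# Only the entry for "ア" follows the text; the other entries are
# illustrative assumptions, not values taken from the patent.
CoarseFeatures = Tuple[int, int]

COARSE_DICTIONARY: Dict[str, CoarseFeatures] = {
    "ア": (3, 1),
    "ユ": (4, 0),  # assumed for illustration
    "ヲ": (4, 0),  # assumed for illustration
}


def guide_candidates(
    features: CoarseFeatures,
    dictionary: Dict[str, CoarseFeatures] = COARSE_DICTIONARY,
) -> List[str]:
    """Return every character whose coarse features equal the extracted ones."""
    return [ch for ch, stored in dictionary.items() if stored == features]
```

When more than one candidate comes back, as for the assumed pair "ユ" and "ヲ", the detailed dictionary matching has to decide between them.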

As described above, the dictionary guidance processing is followed by matching processing, which determines the degree of agreement between the standard feature sequence of each character stored in advance in the dictionary (hereinafter called the standard pattern) and the corresponding feature sequence of the character to be recognized. Based on the result of this matching, either the read character is identified or a result of "unrecognizable" is output.

Next, an example of the dictionary is described.

FIG. 2(A) is a graphic representation of the standard pattern D1 of the kana character "ア" shown in FIG. 2(B), and the standard pattern D1 shown there is stored as the dictionary. In this dictionary, the character "ア" consists of three end points ■, one branch point ■, and ten direction components (vectors) represented by number values. The vector number values indicate which of the eight directions 0 to 7 shown in FIG. 3 each stroke of the character follows. The standard pattern D1 therefore indicates that the character "ア" contains an element that starts at an end point ■, has the vectors 2, 1, 3, 4, 5 in this order, and ends at the branch point, and an element that starts at an end point ■ and has the vectors 2, 3, 4, 5 and an end point ■ in this order. These elements are illustrated in the lower part of FIG. 2(B).
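One way to picture such a standard pattern is as a list of elements, each with a start marker, an ordered run of direction codes 0 to 7, and an end marker. The following is a minimal sketch, not the patent's storage format: the names `Element`, `END`, `BRANCH`, `PATTERN_A`, and `is_valid` are hypothetical, and the two direction sequences are simply the ones quoted in the text for "ア".

```python
from dataclasses import dataclass
from typing import Tuple

END = "end"        # a stroke end point
BRANCH = "branch"  # a branch point


@dataclass(frozen=True)
class Element:
    start: str                   # END or BRANCH
    directions: Tuple[int, ...]  # ordered direction codes, each 0..7
    stop: str                    # END or BRANCH


# The two elements quoted in the text for the standard pattern D1 of "ア".
PATTERN_A = (
    Element(END, (2, 1, 3, 4, 5), BRANCH),
    Element(END, (2, 3, 4, 5), END),
)


def is_valid(element: Element) -> bool:
    """Check that markers are known and every direction code is in 0..7."""
    markers_ok = element.start in (END, BRANCH) and element.stop in (END, BRANCH)
    return markers_ok and all(0 <= d <= 7 for d in element.directions)
```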

A more specific processing procedure of the character recognition device outlined in the flowchart of FIG. 1 is now described with reference to FIG. 2 above and FIGS. 4 through 8.

The characters read from the form P are binarized and preprocessed, then stored in the storage area of the system. The character pattern stored at this point is a thick one that preserves the handwriting as written, as shown in FIG. 4.

This thick character pattern is then subjected to so-called thinning and converted into information representing the skeleton of the character, as shown in FIG. 5. In this conversion, the skeleton is normalized into the four directions shown in FIG. 7: A (horizontal), B (vertical), C (diagonal rising to the right), and D (diagonal falling to the right). This normalization makes it easier to classify strokes into the eight direction components shown in FIG. 3. The conversion is followed by processing that obtains the feature sequence of the read character. The feature sequence is expressed in the same way as the dictionary, that is, the standard pattern D1 of FIG. 2(B), and as a result the feature sequence ■134■■34■ shown in FIG. 6 is obtained. Because this feature sequence represents the pattern of the character being read, that is, an unknown character, it is hereinafter called the unknown pattern.

The unknown pattern obtained in this way is matched against the standard patterns of the candidate characters obtained by the dictionary guidance processing, and the degree of agreement is evaluated. Suppose the unknown pattern shown in FIG. 6 is matched against the standard pattern of "ア" shown in FIG. 2(B). As shown in FIG. 8, the matching follows fixed rules, such as whether the order of the direction components in the standard pattern agrees with the order of the direction components in the unknown pattern. In this example, every direction component of the unknown pattern is contained in the standard pattern and the order also agrees, so the unknown pattern is recognized and output as the kana character "ア".
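The rule illustrated here, that every direction component of the unknown pattern appears in the standard pattern and in the same order, can be read as an ordered-subsequence test. The sketch below rests on that reading, which is an assumption: the text only says the matching follows "fixed rules", and the helper name `is_ordered_subset` is hypothetical.

```python
from typing import Iterable, Sequence


def is_ordered_subset(unknown: Iterable[int], standard: Sequence[int]) -> bool:
    """True if `unknown` occurs within `standard` in the same relative order.

    Each membership test consumes the shared iterator up to the first hit,
    so later components can only match later positions of `standard`.
    """
    remaining = iter(standard)
    return all(component in remaining for component in unknown)


# Elements of the unknown pattern (FIG. 6) against the standard pattern of "ア".
first_ok = is_ordered_subset([1, 3, 4], [2, 1, 3, 4, 5])
second_ok = is_ordered_subset([3, 4], [2, 3, 4, 5])
```

With the sequences quoted in the text, both elements of the unknown pattern pass, while a swapped order such as 4, 3 would not.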

A general character recognition device recognizes characters in this way. As the above explanation shows, the order of the direction components that make up a character is important in the matching stage of character recognition. It is also clear that this order changes depending on which position on the character is taken as the starting point.

In conventional character recognition devices, however, the position on the character from which the search for direction components starts was fixed, as shown in FIGS. 9(A) and 9(B). In FIG. 9(A) the search starts from the top of the character, and in FIG. 9(B) it starts from the left. Searching from a fixed direction like this is intuitive and easy for a human to understand, but it is not necessarily appropriate for mechanical recognition.

For example, as shown in FIGS. 10(A) and 10(B), when the starting point is searched from the top of the character for both "ヤ" and "ヌ" and the pattern elements of each are extracted, the similar pattern elements of the two appear in the opposite order, so the two characters can be reliably distinguished.

However, as shown in FIGS. 11(A) and 11(B), when the starting point is searched from the top of the character for the unknown patterns of "ユ" and "ヲ" and the elements of each are extracted, the elements and their order are the same for both. In other words, the two characters yield the same pattern, and dictionary matching alone cannot distinguish them. To deal with this problem, conventional character recognition devices extracted character features from another viewpoint and added a separate identification process. This additional process takes time and hinders faster processing. Furthermore, when the character pattern shown in FIG. 20 is searched in the search direction of FIG. 9(A), for example, deformation of the character figure can cause the vector numbers to differ, as shown in that figure. Both of the resulting search patterns must then be stored in the dictionary, which increases the dictionary matching time and the dictionary capacity and, as a result, hinders both faster processing and cost reduction.

<Object of the Invention> An object of the present invention is to eliminate the drawbacks of the conventional character recognition device described above, shorten the processing time for dictionary matching, and provide a fast and highly accurate character recognition device.

<Configuration and Effects of the Invention> To achieve the above object, in the present invention the dictionary storage means that stores the standard patterns also stores, for each character, starting-edge information indicating the scanning direction and position information for a stroke end point. During matching, the unknown pattern is scanned based on the starting-edge information and the stroke end point position information.

Thus, with the present invention, when the starting point of both characters in FIG. 12 is searched from the bottom (see FIG. 13), for example, the direction components of "ユ" are extracted in the order ■1■ followed by ■05■, whereas for "ヲ" the order is reversed, so the two characters can be reliably distinguished. Compared with the conventional method, which always scans from a fixed direction and starts the search from the first character end point encountered, characters can be identified more reliably. Moreover, there is no need to prepare multiple standard patterns for each character, which helps shorten the matching time and reduce the dictionary capacity. The object of the invention is thereby achieved.

<Description of the Embodiment> FIG. 15 shows a character recognition device according to the present invention. The characters on the form P are optically read by the A/D conversion circuit 1, converted into digital data, and stored in the image memory 2. The CPU 3 and the stroke detection circuit 4 are connected to the A/D conversion circuit 1. The CPU 3 executes character recognition processing, including matching processing, according to a program, and the stroke detection circuit 4 converts the read characters into four-direction strokes, for example as shown in FIG. 16. In the figure, the ROM 5 is a memory for the program and for the dictionary that stores the standard patterns of the characters to be recognized, and the RAM 6 is a memory that stores the four-direction strokes and eight-direction conversion data of input characters and serves as a work area during program execution. The contents of the RAM 6 are stored in correspondence with the image after stroke extraction shown in FIG. 16; the specific contents are shown in FIG. 17.

In FIG. 17, TERM is the storage area for the end point information of the character figure. T1 denotes one end point, SA denotes a stroke in the A direction, and S1 denotes its serial number among the A-direction strokes. Likewise, T2 to T4 denote the other end points, and SC denotes a stroke in the C direction. L and R indicate the position of the character end point relative to the stroke: L indicates that the end point is at the left end of the stroke, and R that it is at the right end. CHKN is the area that stores connection information between strokes (inflection points). In the illustrated example it records that the R (right) end of the A-direction stroke with serial number S1 and the R (right) end of the C-direction stroke with serial number S2 are connected as an inflection point. ASTM is the area that stores the two-dimensional coordinate addresses of the end points of A strokes, and CSTM is the area that stores the two-dimensional coordinate addresses of the end points of C strokes. Areas BSTM and DSTM are also provided for the end point addresses of B strokes and D strokes, but they are not shown here because the example character of FIG. 16 contains no B or D strokes. FONT is the area that stores stroke information during stroke tracing of the unknown pattern, and SCM is the area that stores the data obtained by converting the strokes of the character to be recognized, that is, the unknown character, into eight directions.

FIG. 18 illustrates the dictionary portion of the contents of the ROM 5, using the character "ユ" of FIG. 14 as an example. For each character storage area, the dictionary first stores the starting-edge information for scanning. In the figure, starting-edge number ■ indicates that scanning uses the top edge of the character as the starting edge, and starting-edge numbers ■, ■, and ■ likewise indicate scanning from the left, bottom, and right edges, respectively (see FIG. 21). In the example of FIG. 18, therefore, the first scan uses the bottom edge as the starting edge and the second scan uses the top edge.

Following the starting-edge information, the dictionary stores the stroke end point position information. In the figure, AL indicates that tracing starts from the left end point of an A stroke. Likewise, AR denotes the right end point of an A stroke, BU and BD the upper and lower end points of a B stroke, CL and CR the left and right end points of a C stroke, and DL and DR the left and right end points of a D stroke. In the example of FIG. 18, tracing starts from the left end point of an A stroke in both the first and second scans. Accordingly, when the unknown character is scanned from the starting edge, if the first character end point encountered is the left end point of an A stroke, stroke tracing of the unknown pattern starts from that character end point.
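The per-character scan instructions described for FIG. 18 can be sketched as a small record type. This is an illustrative layout, not the ROM format: the names `ScanSpec`, `parse_endpoint_code`, `EDGES`, and `YU_SCANS` are hypothetical, and edges are named by strings because the starting-edge numbers are not legible in the text. The end point codes AL, AR, BU, BD, CL, CR, DL, DR and the two scans for "ユ" (bottom edge, then top edge, both with AL) are taken from the description.

```python
from dataclasses import dataclass
from typing import Tuple

EDGES = ("top", "left", "bottom", "right")

# Allowed end point sides per stroke direction, as listed in the text:
# A, C, D strokes have left/right ends; B strokes have upper/lower ends.
VALID_SIDES = {"A": "LR", "B": "UD", "C": "LR", "D": "LR"}


def parse_endpoint_code(code: str) -> Tuple[str, str]:
    """Split a code such as "AL" into (stroke direction, end point side)."""
    if len(code) != 2 or code[0] not in VALID_SIDES:
        raise ValueError(f"unknown end point code: {code!r}")
    if code[1] not in VALID_SIDES[code[0]]:
        raise ValueError(f"side {code[1]!r} is not valid for stroke {code[0]!r}")
    return code[0], code[1]


@dataclass(frozen=True)
class ScanSpec:
    start_edge: str     # edge of the character frame where scanning begins
    endpoint_code: str  # stroke end point from which tracing must start

    def __post_init__(self) -> None:
        if self.start_edge not in EDGES:
            raise ValueError(f"unknown edge: {self.start_edge!r}")
        parse_endpoint_code(self.endpoint_code)  # raises if malformed


# Scan instructions for "ユ" as described for FIG. 18.
YU_SCANS = (ScanSpec("bottom", "AL"), ScanSpec("top", "AL"))
```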

FIG. 19 shows the matching operation of the device according to the above embodiment. The matching process is described below for the case where the dictionary guidance processing has narrowed the candidates down to "ユ".

First, in step 11 (hereinafter written as "ST11"), the "ユ" dictionary in the ROM 5 (FIG. 18) is referenced for the narrowed-down candidate character "ユ", and the starting-edge number ■ for the first scan is loaded. Next, in ST12, the dictionary in the ROM 5 is referenced in the same way and the stroke end point position information AL for the first scan is loaded. The scan mode, here scanning from the bottom edge, is then determined from the starting-edge number ■ (ST13), and the address corresponding to the left end of the bottom edge is stored in the scan counter to initialize it (ST14). The image shown in FIG. 16 is then scanned in the RAM 6 according to the scan mode. In the illustrated example, as shown in FIG. 21, scanning proceeds from the left end of the bottom edge toward the right end. At each scan point, the two-dimensional addresses of the stroke end points in the ASTM and CSTM areas of the RAM 6 are compared with the coordinates in the scan counter (ST15). As long as the two do not match, the scan counter is updated and the comparison is repeated (ST15 to ST17). When the contents of the scan counter match an end point address in the RAM 6, for example at coordinates (2, 28), this stroke end point is taken as a provisional starting point, and it is then determined whether the stroke end point is a character end point (ST18). The coordinates (2, 28) are stored in the ASTM area and correspond to the left end point of A-direction stroke number 2. Information on this end point exists in the TERM area as character end point T3. The stroke end point is therefore a character end point, and its information (SA, S2, L) is stored in the FONT area as the provisional starting point of the scan.

Next, in ST19, this character end point is compared with the stroke end point position information loaded from the dictionary to determine whether the stroke type and end point position match. In this embodiment, the information loaded from the dictionary is AL, meaning the left end point of an A stroke, so the two match. This character end point is therefore taken as the starting point of the scan, and the character end point mark ■ is stored in the SCM area of the RAM 6.

Incidentally, if the Y coordinate address of end point T3 in FIG. 16 were smaller than that of end point T4, end point T4 would become the provisional starting point first when scanning from the bottom edge. End point T4 corresponds to the right end point of an A-direction stroke, which does not match the stroke end point position information loaded from the dictionary. In that case the determination in ST19 is "NO" and the determination in ST20 is "YES", so the scan counter continues to be updated until its contents reach the Y coordinate address of end point T3.

If the determination in ST20 is "NO", a character end point entirely different from the intended one is considered to have been extracted, so the process ends and the character is rejected as a matching failure.

On the other hand, if the data stored in the FONT area is not found in the character end point storage area TERM, the detected coordinates are not a character end point, so control is transferred to ST□ to update the scan counter.

When a character end point is detected, the stroke information in the FONT area is used to convert the stroke into the eight-direction data shown in FIG. 3 (an A-direction stroke traced from left to right is direction 1), and this direction data "1" is stored in the eight-direction conversion data storage area SCM of the RAM 6 (ST21).

Next, to look for further connection information from the stroke stored in the FONT area, the FONT area is changed to the information for the other end point of this stroke, (SA, S2, R). Whether the right end of the detected A-direction stroke number 2 is a character end point or forms an inflection point is then checked by looking up the FONT area data in the character end point storage area TERM and the inflection point storage area CHKN. In the illustrated example, (SA, S2, R) is found in the TERM area, which shows that the new stroke end point is a character end point (T4).

As a result, ■ is stored in the eight-direction conversion data storage area SCM (ST22). The "ユ" dictionary in the ROM 5 is then consulted to determine whether a next starting point is specified (ST23). In the illustrated example, second starting point information exists, as shown in FIG. 18, so the next starting-edge number ■ is loaded in ST24 and the second stroke end point position information AL is loaded in ST25, after which control returns to ST13 to perform the same scan as above. During this operation, each stroke whose eight-direction conversion is complete is erased at an appropriate time after conversion (its stroke number in the ASTM and CSTM areas of the RAM 6 is changed to a space code). This prevents a stroke that has already been detected from being detected again.

In the second search, as in the first, the scan mode is determined from the specified starting point data, that is, the starting-edge number (ST13), and the scan counter is initialized (ST14). In the above example the second starting-edge number is ■, so scanning proceeds from the left to the right of the top edge, as shown in FIG. 21. As in the first pass, scanning proceeds while the scan counter is updated, and the two-dimensional address data of the strokes stored in the ASTM and CSTM areas of the RAM 6 are compared with the scan counter (ST15 to ST17).

In the example of FIG. 16, a match occurs at address (3, 6). This address is stored in the ASTM area, so the end point is the left end (L) of A-direction stroke (SA) number 1 (S1). It is then determined whether this stroke end point is a character end point; if so, its information is stored in the FONT area of the RAM 6 as the provisional starting point of the scan. Next, this character end point is compared with the second stroke end point information; when they match, the character end point is taken as the starting point of the scan and the character end point mark ■ is stored in the SCM area of the RAM 6. Then, as in the first pass, the stroke information in the FONT area is converted into eight-direction data, here direction 1, and this direction data "1" is stored in the eight-direction conversion data storage area SCM (ST21).

Next, to look for further connection information from the stroke stored in the FONT area, the FONT area is changed to the information for the other end point of this stroke, (SA, S1, R), which denotes its right end. Whether a connection exists is checked by looking up the FONT area data in the TERM and CHKN areas. In the illustrated example, the CHKN area shows that the new stroke end point (SA, S1, R) is connected to (SC, S1, R). The connection check in ST22 therefore determines that a connection exists. The stroke information in the FONT area is accordingly changed to (SC, S1, R), and control is transferred to ST21.

ST21 and ST22 are then repeated until a character end point is detected. As a result, in the illustrated example, ■1■■14■ is stored in the eight-direction conversion data storage area SCM of the RAM 6. ST23 again determines whether a next starting point is specified; in the illustrated example only the starting edges up to the second scan are stored in the ROM 5, so the determination is "NO". The contents of the SCM area of the RAM 6 are then compared with the standard pattern for "ユ" in the dictionary of the ROM 5 (ST26). If the comparison yields agreement at or above a certain rate, the input character is recognized as "ユ".
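The stroke tracing of ST21 and ST22 can be sketched as following strokes from the starting end point through the connection table until a character end point is reached. This is a reconstruction from the walk-through, not the patent's program: the names `trace`, `TERM`, `CHKN`, `DIRECTION_CODE`, and `OPPOSITE` are hypothetical stand-ins for the RAM areas, and the set of character end points is inferred from the narrative. Only two direction codes are filled in: an A stroke entered at its left end gives 1 (stated in the text), and a C stroke entered at its right end gives 4 (implied by the stored result ■1■■14■).

```python
from typing import Dict, List, Set, Tuple, Union

Endpoint = Tuple[str, int, str]  # (stroke direction, stroke number, side)

MARK = "■"  # character end point mark as it appears in the text

# Direction code keyed by (stroke direction, side at which tracing enters).
# Only the two codes supported by the text are included.
DIRECTION_CODE: Dict[Tuple[str, str], int] = {("A", "L"): 1, ("C", "R"): 4}

OPPOSITE = {"L": "R", "R": "L", "U": "D", "D": "U"}

# Character end points (TERM) and the inflection-point connection (CHKN),
# reconstructed from the walk-through for the example character.
TERM: Set[Endpoint] = {("A", 1, "L"), ("C", 1, "L"), ("A", 2, "L"), ("A", 2, "R")}
CHKN: Dict[Endpoint, Endpoint] = {
    ("A", 1, "R"): ("C", 1, "R"),
    ("C", 1, "R"): ("A", 1, "R"),
}


def trace(start: Endpoint) -> List[Union[str, int]]:
    """Follow strokes from `start` until a character end point is reached."""
    codes: List[Union[str, int]] = [MARK]
    current = start
    visited: Set[Endpoint] = set()
    while current not in visited:
        visited.add(current)
        direction, number, side = current
        codes.append(DIRECTION_CODE[(direction, side)])
        far_end = (direction, number, OPPOSITE[side])
        if far_end in TERM:
            codes.append(MARK)
            return codes
        if far_end not in CHKN:
            raise ValueError(f"stroke end {far_end} is neither an end point nor connected")
        current = CHKN[far_end]
    raise ValueError("connection table contains a cycle")


first_scan = trace(("A", 2, "L"))
second_scan = trace(("A", 1, "L"))
```

Joining the two results reproduces the sequence ■1■■14■ that the text says is stored in the SCM area. How that sequence is scored against the standard pattern, and what rate counts as a match, is not specified in the text and is left out of the sketch.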

[Brief Description of the Drawings]

FIG. 1 is a flowchart outlining a general character recognition device.
FIGS. 2(A) and 2(B) are diagrams explaining an example of a standard pattern in which the example character "ア" is converted into direction components.
FIG. 3 is a diagram showing the direction numbers used when the direction components of character strokes are normalized into eight directions.
FIG. 4 is a diagram showing the pattern of the example character "ア" as optically read.
FIG. 5 is a diagram in which the character "ア" of FIG. 4 has been thinned to show only the skeleton of its strokes.
FIG. 6 is a diagram explaining how the character "ア" of FIG. 5 is represented by end points, direction components, and inflection points.
FIG. 7 is a diagram explaining the four direction components.
FIG. 8 is a diagram explaining the character recognition operation in which the standard pattern of the character "ア" shown in FIG. 2 is matched against the character shown in FIGS. 5 and 6.
FIGS. 9(A) and 9(B) are diagrams explaining the starting edge from which the search scan of a character begins.
FIGS. 10(A) and 10(B) are diagrams showing the eight-direction component patterns of the strokes of the characters "ヤ" and "ヌ" when scanning starts from the top edge.
FIGS. 11(A) and 11(B) are diagrams showing the eight-direction component patterns of the strokes of the characters "ユ" and "ヲ" when scanning starts from the top edge.
FIGS. 12(A) and 12(B) are diagrams showing the eight-direction component patterns of the strokes of the characters "ユ" and "ヲ" when scanning starts from the top edge.
FIG. 13 is a diagram explaining the case where scanning starts from the bottom edge of the character.
FIG. 14 is a diagram outlining the character areas of the dictionary storage means in the character recognition device of this invention.
FIG. 15 is a circuit block diagram of a character recognition device in which this invention is implemented.
FIG. 16 is a diagram showing an example of a character input to the character recognition device shown in FIG. 15.
FIG. 17 is a diagram showing an example of the storage area layout of the RAM of the character recognition device shown in FIG. 15.
FIG. 18 is a diagram showing an example of the data layout for one character in the dictionary storage area of the ROM of the same device.
FIG. 19 is a flowchart explaining the operation of the character recognition device shown in FIG. 15.
FIG. 20 is a diagram explaining a disadvantage of the conventional method.
FIG. 21 is a diagram showing the edges from which scanning starts based on the starting-edge information shown in FIG. 18.

Claims

A character recognition device comprising: means for optically reading an unknown character to form a character pattern; means for extracting features of the character pattern; means for storing the extracted features; means for deriving, as candidate characters, a group of characters that share the extracted features; and dictionary storage means for storing a unique standard pattern for each character, wherein starting-edge information indicating a search direction and end point position information of a stroke used during the search are defined and stored in advance for each character, and wherein the features of the unknown pattern are searched and matched against the standard pattern of each character of the derived character group based on the starting-edge information and the stroke end point position information.