JPS61182182A

JPS61182182A - Character recognizing device

Info

Publication number: JPS61182182A
Application number: JP60021428A
Authority: JP
Inventors: Minoru Nagao; 永尾　実
Original assignee: Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1985-02-06
Filing date: 1985-02-06
Publication date: 1986-08-14

Abstract

PURPOSE: To recognize characters correctly by providing a black block extracting means in addition to the stroke extracting means. CONSTITUTION: When the tracking point is at the picture element 31 indicated by the label (d, l), attention is paid to the bit at position A1 of that label (the bit is '1' because the label includes d). No '1' bit exists at position A2, but a '1' bit exists at position A3, so the tracking point moves to the adjoining picture element 32, whose label includes that '1' bit. In the same way, the '1' bits of the labels are detected and tracked in the orders B2-B4, C2-C4 and D2-D4, respectively. It is then decided whether, among the information representing the individual connected black parts, there is any in which nothing was extracted as a stroke end point. Such a connected black part has not been extracted as a stroke, so it is extracted as a black block.

Description

[Detailed Description of the Invention] <Technical Field of the Invention> This invention relates to a character recognition device that reads the pattern of an unknown character, numeral, symbol or the like (collectively referred to as "characters" in this specification), binarizes the pattern into black and white to obtain an input image, extracts the geometric features of that image with a feature extraction unit, and recognizes the unknown character by matching the extracted features against standard patterns.

<Summary of the Invention> When strokes are extracted by tracing the boundary between the black and white areas of an input image, this invention extracts, as a black block, any connected black area that is left without being extracted as a stroke, thereby preventing incorrect character recognition.

<Background of the Invention> As shown in FIG. 9, a conventional character recognition device consists of a reading unit 2 that optically reads an unknown character 1 and converts it into an image, a preprocessing unit 3 that applies preprocessing such as smoothing to the input image from the reading unit 2, a feature extraction unit 4 that extracts the geometric features of the preprocessed image, and a dictionary matching unit 6 that recognizes the unknown character by matching the extracted features against standard patterns stored in advance in a dictionary 5.

In general, the feature extraction unit 4 focuses on the black pixels of the input image: as shown in FIG. 10, it thins the input image G to obtain a thin-line pattern g and extracts the features of the unknown character from that pattern.

However, because this method requires thinning of the image, methods that extract the features of an unknown character directly from the input image have been proposed in recent years. In such a method, when extracting the features of the character pattern shown in FIG. 11, for example, attention is paid to the boundary between the black and white areas of the input image G (drawn with thick lines in the figure). This boundary is traced in the four directions A to D shown in FIG. 12, which yields the paired sub-strokes (A1, A2), (B1, B2) and (C1, C2). Two sub-strokes are accepted as a pair only if the spacing between them does not exceed a predetermined value. As shown in FIG. 13, the boundary is traced by scanning the input image G in a predetermined direction (indicated by the arrow in the figure); whenever the scan encounters a pattern extending in one of the preset directions A to D, tracing of the boundary in that direction starts and continues until the pattern in that direction ends. Once the paired sub-strokes have been obtained in this way, an approximate pattern f, a set of strokes as shown in FIG. 14, is derived from them.
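The pairing rule described above can be illustrated with a minimal Python sketch. It is not the circuit described in this document: the `SubStroke` type, its `offset` field (the position of a sub-stroke measured across its tracing direction) and the greedy pairing of neighbors are assumptions made purely for illustration. Only the rule that a pair is accepted when the spacing does not exceed a predetermined value comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubStroke:
    """Hypothetical sub-stroke record for one tracing direction."""
    name: str
    offset: float  # assumed: position measured across the tracing direction


def pair_substrokes(substrokes, max_gap):
    """Pair neighboring sub-strokes whose spacing is at most max_gap.

    Returns (pairs, unpaired). Sub-strokes that find no partner within
    max_gap are returned in `unpaired`; no stroke is formed from them.
    """
    ordered = sorted(substrokes, key=lambda s: s.offset)
    pairs, unpaired = [], []
    i = 0
    while i < len(ordered):
        has_next = i + 1 < len(ordered)
        if has_next and ordered[i + 1].offset - ordered[i].offset <= max_gap:
            pairs.append((ordered[i], ordered[i + 1]))
            i += 2  # both members of the pair are consumed
        else:
            unpaired.append(ordered[i])
            i += 1
    return pairs, unpaired
```

With assumed values such as `SubStroke("C3", 20)` and `SubStroke("C4", 40)` and `max_gap=6`, the two sub-strokes stay unpaired, which is the situation that arises for wide strokes.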

With this method, however, feature extraction errors often occur for characters such as "つ", "ツ", "シ", "ソ" and "ン", which tend to produce extremely short and wide strokes.

FIG. 15 shows the character pattern of "つ" as an example. In this pattern, the portion marked H forms an extremely short and wide stroke. In this stroke, the sub-strokes C3 and C4, and likewise D1 and D2, are separated by more than the predetermined value, so they cannot be paired. The black portion marked H (referred to below as a "black block") is therefore not extracted as a stroke, and the resulting approximate pattern is wrong, as shown in FIG. 16. The strokes drawn with dotted lines in the approximate pattern of FIG. 16 correspond to the black block that was not extracted.

Conventional character recognition devices thus have the drawback that they cannot always recognize characters accurately, particularly characters that tend to produce extremely short and wide strokes.

<Object of the Invention> This invention is intended to overcome the drawback described above, and its object is to provide a character recognition device capable of accurate character recognition.

<Structure and Effects of the Invention> To achieve this object, the invention provides a black block extraction means in addition to the stroke extraction means. The stroke extraction means first extracts strokes from the input image; each connected black portion of the input image that the stroke extraction means did not convert into strokes is then extracted as a black block.

According to this invention, even when a black block arises in a character pattern such as "つ", "ツ", "シ", "ソ" or "ン", information representing the black block is extracted as a character feature in place of the missing stroke information, and this information can be used to recognize the character accurately.

<Description of Embodiments> FIG. 1 is a block diagram showing the main part of the feature extraction unit of a character recognition device that is an embodiment of this invention.

In FIG. 1, the image memory 7 stores the input image obtained by reading the geometric pattern of an unknown character and binarizing it into black and white. The boundary labeling circuit 8 applies the labeling described below to the black pixels that lie on the boundary between the white and black areas of the image stored in the image memory 7. The sub-stroke extraction circuit 9 extracts sub-strokes from the labeled character pattern data, and the stroke extraction circuit 10 extracts the paired sub-strokes from the sub-stroke data and outputs stroke data. The character feature point extraction circuit 11 examines the positional relationships between strokes on the basis of the stroke data and extracts geometric features such as intersections and branch points, which are the basic features of a character. The RAM 12 is a memory that stores the following: the program with which the CPU 13 extracts, from the data remaining after stroke extraction, the black blocks for which no stroke was extracted; the program with which the CPU 13 controls the boundary labeling circuit 8, the sub-stroke extraction circuit 9 and the stroke extraction circuit 10; the labeling information obtained by the boundary labeling circuit 8; the sub-stroke data and stroke data obtained by the sub-stroke extraction circuit 9 and the stroke extraction circuit 10; and the black block data that is the object of this invention.

First, the labeling performed by the boundary labeling circuit 8 is explained with reference to FIG. 2.

FIG. 2 is an enlarged view of the part of the character pattern of FIG. 15 that corresponds to portion H. As noted above, neither the sub-strokes C3 and C4 nor the sub-strokes D1 and D2 can be paired in portion H. No stroke is therefore extracted from portion H, which remains as a black block.

In FIGS. 2 and 15, x and y are the pixel numbers (coordinates) that identify each pixel. The pixels belonging to the character pattern are taken to be black, and all other pixels white.

The boundary labeling circuit 8 first labels each pixel with a symbol such as (0), (r) or (l), or a combination such as (u, r); the meaning of these symbols is shown in FIG. 3. For example, as shown in FIG. 3(a), the symbol (u) labels a pixel whose upper neighbor is white (w) and whose right, left and lower neighbors are black (b). The actual label consists of four bits, one each for up, down, left and right, each holding the binary value "1" or "0". In the example of FIG. 3(b), the upper and left neighbors are both white. The letters in a label (u for the upper side, r for the right side, and so on) thus indicate on which sides of the pixel a white neighbor exists, and the symbol (0) means that none of the neighboring pixels is white. The pixels of FIG. 15 are labeled in this way, and the resulting labels are stored together with the (x, y) coordinates of each pixel in an area of the RAM 12 called NCOT, as shown in FIG. 4.
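The four-bit labeling can be sketched in Python as follows. This is an illustrative sketch, not the boundary labeling circuit 8 itself: the bit assignment of `UP`, `DOWN`, `LEFT` and `RIGHT`, the representation of the image as a list of rows (1 = black, 0 = white), and the treatment of pixels outside the image as white are all assumptions, since the text only states that one bit is used per side.

```python
# Assumed bit assignment; the document only says one bit per side.
UP, DOWN, LEFT, RIGHT = 0b1000, 0b0100, 0b0010, 0b0001


def boundary_labels(image):
    """Return {(x, y): label} for every black pixel of a binary image.

    A set bit means the neighbor on that side is white. A label of 0
    corresponds to the symbol (0): no white neighbor on any side.
    """
    height, width = len(image), len(image[0])

    def is_white(x, y):
        # Assumption: everything outside the image counts as white.
        outside = not (0 <= x < width and 0 <= y < height)
        return outside or image[y][x] == 0

    labels = {}
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue  # only black pixels receive a label
            label = 0
            if is_white(x, y - 1):
                label |= UP
            if is_white(x, y + 1):
                label |= DOWN
            if is_white(x - 1, y):
                label |= LEFT
            if is_white(x + 1, y):
                label |= RIGHT
            labels[(x, y)] = label
    return labels
```

In this sketch a pixel labeled (u, r) in the text would carry the value `UP | RIGHT`.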

For this storage, the unknown pattern of FIG. 15 is first scanned in the direction indicated by the arrow in FIG. 13, and the character pattern is traced counterclockwise starting from the first pixel reached that has the label (u, r). As shown in FIG. 5, counterclockwise tracing works as follows. Attention is paid to one of the bits A1 to D1 that holds "1" among the four bits of the label of the pixel at the current tracing point. If the position of that bit is A1, the positions A2 to A4, which lie in the same label or in the labels adjacent to it, are checked in the order A2, A3, A4 for a bit holding "1". When such a bit is found, the tracing point moves to the coordinates of the label containing it, and the next tracing step starts from that bit.

For example, when the tracing point is at the pixel 31 labeled (d, l) in FIG. 2 and attention is paid to the bit at position A1 of this label (which is "1" because the label contains d), no "1" bit exists at position A2 but one exists at position A3, so the tracing point moves to the adjacent pixel 32, whose label contains that "1" bit. FIGS. 5(B) to 5(D) work in the same way: the "1" bits of the labels are detected and traced in the orders B2 to B4, C2 to C4 and D2 to D4, respectively.

As this tracing detects the pixels on the outer boundary of the character pattern one after another, their position coordinates and label types are stored in that order in the memory area NCOT shown in FIG. 4. Once the tracing point has gone around one connected part of the character pattern, it returns to its starting point and closes the loop; when this happens, the marker shown as "stopper" in FIG. 4 is written. The next connected part is then traced and the same processing is repeated. The number of stoppers in FIG. 4 therefore equals the number of connected parts of the character pattern, and the data stored between one stopper and the next all belong to the same connected part.

To identify each of these connected-part data sets, a connection number COT1, COT2, ... is placed in front of each set.
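The stopper convention can be sketched in Python as follows. The flat list layout, the use of `None` as the stopper marker, the tuple form `(x, y, label)` of each entry and the numbering of connected parts from 1 are all assumptions for illustration; the actual memory layout is the one shown in FIG. 4, which is not reproduced here.

```python
STOPPER = None  # assumed marker written when a boundary loop closes


def split_connected_parts(ncot):
    """Split a flat NCOT-style list into {cot_number: [entries]}.

    Entries between two stoppers belong to the same connected part, so
    the number of keys equals the number of stoppers in the list.
    """
    parts = {}
    current = []
    for entry in ncot:
        if entry is STOPPER:
            parts[len(parts) + 1] = current  # COT1, COT2, ...
            current = []
        else:
            current.append(entry)
    return parts
```

Grouping by stopper in this way gives each connected part its own entry list, which is the form a later lookup by COT number needs.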

FIG. 6 shows part of the STM area, the storage area in the RAM 12 that holds the data of the pixels forming the end points of each stroke. For example, ASTM denotes the area that stores end point information for strokes in the A direction; the data of the pixels corresponding to the four end points of an A-direction stroke, for example stroke SA1, are stored as a pack per sub-stroke. The area CSTM, which stores end point information for strokes in the C direction, holds its data in the same form.

The processing in this embodiment is now described step by step. First, the reading unit 2 of FIG. 9 reads the geometric pattern of the unknown character. The pattern is stored in the image memory 7 (FIG. 1), the boundary labeling circuit 8 applies the labeling described above, and the connected-part data are stored in the NCOT area of the RAM 12 in the form shown in FIG. 4. The sub-stroke extraction circuit 9 then traces the continuity of the labeled black pixels in the four directions to obtain sub-stroke data. Next, the stroke extraction circuit 10 refers to the sub-stroke data, judges which sub-strokes form pairs, and sorts the sub-stroke data by stroke for each direction.

The stroke extraction circuit 10 then determines the end points of each stroke from the extracted stroke data and stores the end point information in the STM area of the RAM 12 in the form shown in FIG. 6.

Next, the CPU 13 extracts the black blocks that were not converted into strokes, using the connected-part data and the stroke end point data stored in the NCOT and STM areas of the RAM 12 as shown in FIGS. 4 and 6, respectively. The procedure of this black block extraction is described below with reference to the flowchart of FIG. 7.

First, step 21 determines whether all the stroke end point data of FIG. 6 have been checked. The answer is initially "NO", so processing advances to step 22, where stroke end point data are loaded.
o'', the process advances to step 22 and the stroke end point data is loaded.

In the example of FIG. 6, the end point coordinates (4, 11) in the ASTM area are loaded first. Step 23 then scans the NCOT area to determine whether these end point coordinates exist in it. In the example of FIG. 4, a match is found in the COT2 area, although this is not visible in the figure.

Step 24 then determines whether a match was found. If so, processing advances to step 25, which marks the corresponding COT number, here COT2. The marking is done by setting a specific flag bit associated with the COT number to ON. Processing then returns to step 21 to check whether all stroke end points have been examined. If the result of step 24 is "NO", processing returns directly from step 24 to step 21.

When this processing has been carried out for all end point data in the STM area, the result of step 21 becomes "YES" and processing advances to step 26.

Step 26 scans the COT number fields in the NCOT area one after another. Step 27 then determines whether the COT number read in step 26 is one that has not been marked. An unmarked COT number indicates a connected black portion for which no stroke information exists at all, so processing advances to step 28, which stores the number of black blocks and their COT numbers in the NBLACK area of the RAM 12, as shown in FIG. 8. When step 28 is complete, processing advances to step 29; if the result of step 27 is "NO", step 28 is skipped and processing moves directly to step 29. Step 29 determines whether all COT number fields in the NCOT area have been scanned. If "YES", processing ends; if "NO", processing returns to step 26 and steps 26 to 29 are repeated.
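Steps 21 to 29 can be summarized in a short Python sketch. The data shapes are assumptions for illustration: `connected_parts` stands in for the NCOT area as a mapping from COT number to the boundary pixel coordinates of that connected part, `stroke_endpoints` stands in for the STM area as a list of (x, y) end points, and the returned list plays the role of the NBLACK area. The function name is hypothetical.

```python
def extract_black_blocks(connected_parts, stroke_endpoints):
    """Return the COT numbers of connected parts with no stroke end point.

    connected_parts:  {cot_number: [(x, y), ...]}  (stand-in for NCOT)
    stroke_endpoints: [(x, y), ...]                (stand-in for STM)
    """
    marked = set()

    # Steps 21-25: look up every stroke end point in the connected-part
    # data and mark the COT number in which it is found.
    for point in stroke_endpoints:
        for cot_number, boundary in connected_parts.items():
            if point in boundary:
                marked.add(cot_number)
                break  # an end point needs to be matched only once

    # Steps 26-29: every COT number left unmarked has no stroke
    # information at all and is recorded as a black block.
    return [cot for cot in connected_parts if cot not in marked]
```

The length of the returned list corresponds to the number of black blocks that step 28 stores alongside the COT numbers.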

This processing determines, for the information representing each connected black portion, whether any of it was extracted as a stroke end point. A connected black portion with no information extracted as a stroke end point has not been extracted as a stroke, so it is extracted as a black block. The extracted black block information is stored in the NBLACK area of the RAM 12 as described above, and referring to it in the subsequent dictionary matching makes reliable character recognition possible.

In this case, the black block information is stored in advance in the standard patterns of the dictionary. This is particularly necessary for the dictionary entries of patterns that tend to produce black blocks, such as "つ", "ツ", "シ", "ソ" and "ン".

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the main part of the feature extraction unit in an embodiment of this invention; FIG. 2 shows part of an input character pattern; FIG. 3 illustrates the labeling; FIGS. 4, 6 and 8 illustrate the storage areas; FIG. 5 illustrates counterclockwise tracing; FIG. 7 is a flowchart of the black block extraction procedure; FIG. 9 is a block diagram of a conventional character recognition device; FIG. 10 shows the thinning of an input image; FIG. 11 shows the sub-strokes of an input image; FIG. 12 illustrates the tracing directions; FIG. 13 shows an example of scanning the input image; FIGS. 14 and 16 show approximate patterns; and FIG. 15 shows an example of an input character pattern.

8: boundary labeling circuit
9: sub-stroke extraction circuit
10: stroke extraction circuit
12: RAM
13: CPU

Patent applicant: 立石電機株式会社 (Tateishi Electric Co., Ltd.)

Claims

(1) A character recognition device that reads the character pattern of an unknown character, binarizes the pattern into black and white to obtain an input image, extracts the geometric features of the input image with a feature extraction unit, and recognizes the unknown character by matching the output of the feature extraction unit against standard patterns stored in a dictionary unit, wherein the feature extraction unit comprises: a stroke extraction unit that extracts directional strokes by tracing the boundary between the black and white areas of the input image; and a black block extraction unit that extracts, as a black block, each connected black portion of the input image that the stroke extraction unit did not extract as a stroke.

(2) The character recognition device according to claim 1, wherein the standard patterns include information on the black block.