JPS6255784A

JPS6255784A - optical character reader

Info

Publication number: JPS6255784A
Application number: JP60193731A
Authority: JP
Inventors: Kiyomichi Kurino; 栗野　清道; Takeyuki Sugimoto; 杉本　建行
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1985-09-04
Filing date: 1985-09-04
Publication date: 1987-03-11

Abstract

PURPOSE: To accurately segment characters that carry dakuten (voiced marks), including dakuten whose strokes touch each other, by treating the boundary between a character and a dakuten or handakuten (semi-voiced mark) as a boundary hypothesis, in the same way as the boundary between ordinary characters, checking its validity, and outputting the recognition result as an unreadable character when the hypothesis is not judged valid. CONSTITUTION: Contours of the character patterns are extracted one character at a time, taking only the pattern sufficient to recognize a single character. A boundary with the character in the adjacent character frame is then set from the positional relation between the extracted contours and the prescribed character frame information. Next, dakuten/handakuten candidate patterns are detected, and when such a candidate is detected a boundary is set between the character and the dakuten/handakuten. The pattern to the left of each boundary hypothesis is recognized as a character pattern, and the validity of the boundary hypothesis is checked against the recognition result.

Description

[Detailed description of the invention]

[Field of application of the invention]

The present invention relates to the reading of dakuten (voiced marks), handakuten (semi-voiced marks), and punctuation marks in an optical character reader, and in particular to an optical character reader that reads a dakuten, handakuten, or punctuation mark written in the same frame as a character separately from that character.

[Background of the invention]

Conventionally, an optical character reader that reads dakuten and handakuten written in the same frame as a character, as described in Japanese patent document No. Sho 58-97783, stores the photoelectrically converted character pattern for one character in both a first memory and a second memory. It erases from the first memory those patterns whose perimeter is long and which are therefore judged not to be dakuten or handakuten, and performs character recognition on what remains. If the result is a dakuten or handakuten, it erases the dakuten/handakuten pattern from the second memory and then recognizes the character in the same frame.

However, the conventional optical character reader described above fixes the boundary between the dakuten/handakuten and the character at a single position when it creates and recognizes the pattern, so that, for example, a character accompanied by a dakuten or handakuten may be misrecognized as two characters. In addition, a pattern with a long perimeter is judged not to be a dakuten or handakuten and is erased from the first memory. Consequently, when the strokes of a dakuten touch each other and the perimeter becomes long, for example, the dakuten cannot be recognized.

Furthermore, the prior art described above assumes that the segmentation of each single character has already been completed. It therefore gives no consideration to misrecognition caused by segmentation errors when a dakuten or handakuten protrudes into the adjacent character frame, or when an adjacent character intrudes into the frame.

[Purpose of the invention]

The present invention was made in view of the problems of the conventional optical character reader described above. Its object is to eliminate those problems and to provide an optical character reader capable of reading, with high accuracy, dakuten and handakuten written in the same frame as a character.

[Summary of the invention]

The optical character reader of the present invention first extracts each block pattern constituting a character as contour information, covering at least one character frame, and sets the boundary with the character in the adjacent character frame as a hypothesis. If the character pattern contains a pattern presumed to be a dakuten or handakuten, it additionally sets a boundary hypothesis between the character and the dakuten/handakuten. It then checks the validity of the boundaries from the recognition result and the character pattern information, and reads the character and the dakuten/handakuten in the same character frame separately only when the boundary is judged valid.

[Embodiments of the invention]

Hereinafter, the present invention is described in more detail with reference to the embodiment shown in the accompanying drawings.

FIG. 1 is a block diagram showing an embodiment of the optical character reader of the present invention. In FIG. 1, 1 is a form transport mechanism including transport rollers, 2 is a form, 3 is a light source, 4 is a light receiving element, 5 is a binarization circuit, 6 is a memory circuit, and 7 is a character pattern segmentation and recognition unit, usually implemented with a microprocessor. 8 is a dictionary file, and 9 is a host device.

The form 2 is transported by the form transport mechanism 1 to a reading section consisting of the light source 3 and the light receiving element 4, where it is optically scanned by the light source 3. During this scan, light reflected from the character patterns on the form 2 enters the light receiving element 4, which converts the successively incident patterns into electrical signals and outputs them.

A one-dimensional semiconductor CCD or the like is used as the light receiving element 4, for example. The binarization circuit 5 converts the output signal of the light receiving element 4 into a binary signal of "0" and "1" using a predetermined threshold and outputs it to the memory circuit 6. The memory circuit 6 stores the patterns on the form as binary "0"/"1" signals; an IC memory or the like is used, for example. The segmentation and recognition unit 7 segments character patterns from the memory circuit 6, compares each segmented character pattern with the standard patterns stored in the dictionary file 8, and outputs the matching standard pattern as the recognition result. This recognition result is output to a host device such as an external storage device or a central processing unit (not shown).
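As a rough illustration of the thresholding performed by the binarization circuit 5, the following minimal sketch maps one scan line of sensor samples to "0"/"1" values. The function name, the sample scale, and the polarity (darker samples become 1) are assumptions made for illustration; the source only states that a predetermined threshold is used.

```python
def binarize(scan_line, threshold):
    """Map each sensor sample to 1 (dark, assumed to be ink) or 0 (background).

    Assumption: a lower reflected-light value means a darker point on the form.
    """
    return [1 if sample < threshold else 0 for sample in scan_line]
```

A scan line such as `[200, 40, 90, 250]` with a threshold of 100 would be stored as `[0, 1, 1, 0]` under these assumptions.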

FIG. 2 is a flowchart showing the operation of the segmentation and recognition unit 7, and FIGS. 3(a), 3(b), and 3(c) are explanatory diagrams showing an example of the segmentation and recognition processing. As shown in FIG. 2, the contours of the character patterns are extracted in step 201. In this embodiment, to keep the capacity of the contour pattern storage memory from growing, contour extraction takes only the patterns necessary and sufficient to recognize one character, one character at a time. The contours of the character patterns shown in FIG. 3(a) are extracted as contours 301, 302, and 303 shown in FIG. 3(b).

Next, in step 202, the boundary with the character in the adjacent character frame is set from the positional relation between the extracted contour 301 and the character frame information specified in advance. FIG. 4 shows how this boundary is set for the contour 301 shown in FIG. 3(b). In FIG. 4, f1 is a virtual frame for segmenting the first character; its position is calculated from the format information specified in advance by the host device 9. Because the calculated position of the virtual frame f1 contains an error, the boundary with the adjacent character frame is assumed to lie somewhere within an uncertain region of predetermined width, shown hatched in FIG. 4.

When the center of a pattern lies within this uncertain region, it is unclear whether the pattern belongs to the adjacent character frame, so the pattern is defined as an uncertain pattern. In the example shown in FIG. 4, of the patterns C1 to C4, patterns C3 and C4 are defined as uncertain patterns. The true boundary is then assumed to lie on the left or right side of one of the uncertain patterns C3 and C4, and boundary hypotheses B11, B12, and B13 are set as shown in FIG. 4.

When no uncertain pattern exists, a boundary hypothesis is set between the character and the adjacent character frame.
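The boundary setting of step 202 can be sketched as follows. This is only an illustrative model, not the patented implementation: patterns are reduced to horizontal extents, a hypothesis is represented as a cut index k meaning "the first k patterns belong to the current character", and the `Pattern` class, the function name, and the fallback rule for the case with no uncertain pattern are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    left: float
    right: float

    @property
    def center(self):
        return (self.left + self.right) / 2


def boundary_hypotheses(patterns, region_lo, region_hi):
    """Return (patterns sorted left to right, sorted list of cut indices)."""
    ordered = sorted(patterns, key=lambda p: p.center)
    cuts = set()
    for i, p in enumerate(ordered):
        # A pattern whose center lies in the uncertain region is "uncertain".
        if region_lo <= p.center <= region_hi:
            cuts.add(i)      # hypothesis: boundary on its left side
            cuts.add(i + 1)  # hypothesis: boundary on its right side
    if not cuts:
        # No uncertain pattern: one boundary between the two frames
        # (assumed here to fall after every pattern left of the region).
        cuts.add(sum(1 for p in ordered if p.center < region_lo))
    return ordered, sorted(cuts)
```

With four patterns of which the last two have their centers inside the uncertain region, this yields three cut positions, which matches the three hypotheses described for FIG. 4.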

Next, in step 203 shown in FIG. 2, dakuten/handakuten candidate patterns are detected.

Detection of dakuten/handakuten candidate patterns relies on two facts: a dakuten or handakuten is written in the upper right of the character frame, and the contour length of such a pattern never reaches a predetermined value. Whether a pattern is written in the upper right of the character frame is determined by setting the boundary hypothesis a shown in FIG. 5, based on the format information specified by the host device 9, and checking whether the pattern lies above boundary hypothesis a. In the example shown in FIG. 5, patterns C2, C3, and C4 are written above boundary hypothesis a, but the contour length of pattern C4 is equal to or greater than the predetermined value, so only patterns C2 and C3 are detected as dakuten/handakuten candidate patterns.
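The two detection criteria can be sketched as a simple filter. The dictionary keys, the function name, and the image coordinate convention (y grows downward, so "above the line" means a smaller y value) are assumptions for illustration only.

```python
def dakuten_candidates(patterns, line_a_y, max_contour_length):
    """Return the names of patterns that may be dakuten or handakuten.

    patterns: list of dicts with hypothetical keys "name", "bottom" (lowest y
    of the pattern, y growing downward) and "contour_length".
    """
    return [
        p["name"]
        for p in patterns
        # Criterion 1: the pattern lies entirely above the horizontal line a.
        if p["bottom"] < line_a_y
        # Criterion 2: its contour length stays below the predetermined value.
        and p["contour_length"] < max_contour_length
    ]
```

A pattern that sits above the line but has a long contour, like C4 in the example of FIG. 5, is rejected by the second criterion.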

Next, in step 204 shown in FIG. 2, it is determined whether a dakuten/handakuten candidate pattern has been detected. If one has been detected, the character containing the candidate pattern is treated as a dakuten/handakuten candidate character.

Next, in step 205, the boundaries between the character and the dakuten/handakuten are set. For this purpose the dakuten/handakuten candidate patterns are regarded as uncertain patterns, and the boundaries are set in the same way as the boundary hypotheses B11, B12, and so on described above.

That is, as shown in FIG. 6, the hatched region above boundary hypothesis a is taken as the uncertain region, and boundary hypotheses are set on both the left and right sides of each dakuten/handakuten candidate pattern within that region. In the example shown in FIG. 6, a new boundary hypothesis B10 is set in addition to boundary hypotheses B11, B12, and B13.

Next, in step 206, recognition processing is performed, treating the patterns to the left of each boundary hypothesis as a character pattern.

The contour 301 in FIG. 3(b) is divided by boundary hypotheses B10, B11, B12, and B13 into four character patterns P10, P11, P12, and P13, and recognition processing is performed on each of them. In this case only character pattern P10 is correctly read, as the character 「ヒ」, and the other character patterns P11, P12, and P13 are judged unreadable.
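The per-hypothesis recognition of step 206 can be sketched as a loop that hands the patterns left of each cut to a recognizer. The recognizer here is a stand-in (a plain dictionary lookup keyed by pattern names); the real unit compares against standard patterns in the dictionary file 8, and all names below are hypothetical.

```python
def recognize_under_hypotheses(ordered_names, cuts, recognize):
    """Run the recognizer once per boundary hypothesis.

    ordered_names: pattern names sorted left to right.
    cuts: cut indices; cut k means the first k patterns form the character.
    recognize: callable returning a character, or None when unreadable.
    """
    results = {}
    for cut in cuts:
        candidate = tuple(ordered_names[:cut])  # patterns left of the boundary
        results[cut] = recognize(candidate)
    return results


# Toy stand-in for dictionary matching: only C1 alone forms a known character.
toy_dictionary = {("C1",): "ヒ"}
results = recognize_under_hypotheses(
    ["C1", "C2", "C3", "C4"], [1, 2, 3, 4], toy_dictionary.get
)
```

In this toy setup only the first hypothesis yields a readable character, mirroring the situation in which a single character pattern is read and the others are judged unreadable.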

Next, in steps 207 and 208 shown in FIG. 2, the validity of the boundary hypotheses is checked against the recognition results of step 206. In the example of character patterns P10, P11, P12, and P13 shown in FIG. 3(c), only character pattern P10 was correctly recognized, so only boundary hypothesis B10 is judged valid and 「ヒ」 is output as the reading result. If no boundary is judged valid, or if more than one valid boundary exists, step 208 judges that there is no validity and step 209 converts the recognition result into an unreadable character and outputs it.
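The decision rule of steps 207 to 209 (accept only when exactly one hypothesis produced a readable result, otherwise reject) can be sketched as below. The function name and the `"?"` reject symbol are assumptions; the source does not say how an unreadable character is encoded.

```python
UNREADABLE = "?"  # assumed reject symbol, not specified in the source


def select_result(results):
    """Pick the reading result from per-hypothesis recognition results.

    results: dict mapping a hypothesis label to a character, or to None when
    the pattern under that hypothesis was unreadable.
    Returns (valid hypothesis label or None, output character).
    """
    valid = {label: char for label, char in results.items() if char is not None}
    if len(valid) == 1:
        # Exactly one valid boundary: output its recognition result.
        return next(iter(valid.items()))
    # No valid boundary, or several: output an unreadable character.
    return None, UNREADABLE
```

Rejecting when several hypotheses succeed trades a lower read rate for fewer misreads, which is the effect the description attributes to the validity check.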

Next, in step 210, it is determined whether recognition processing has finished for all characters. If not, the process returns to step 201 and the same processing is executed again. In general, when the character is a dakuten/handakuten candidate character and dakuten/handakuten candidate patterns remain to the right of the valid boundary hypothesis (for example, when boundary hypothesis B10 is valid as described above and candidate patterns C2 and C3 remain to its right), the process returns to step 201 with the character frame position unchanged and performs dakuten/handakuten recognition processing. This processing confirms whether the candidate pattern is a dakuten or a handakuten. At this point the character pattern P10 shown in FIG. 3(c), whose recognition has already finished, is erased.

Dakuten/handakuten recognition processing is performed as follows. In the example above, contour extraction is first performed in step 201, and the contour 302 shown in FIG. 3(b) is extracted.

Next, in step 202, the boundaries with the adjacent character are set. This sets the boundary hypotheses B20, B21, and B22 shown in FIG. 3(b). The dakuten/handakuten candidate detection and the check for its presence in steps 203 and 204 are then skipped; the process assumes that there is no dakuten/handakuten candidate and proceeds to step 206. In step 206, recognition processing is performed on the character patterns P20, P21, and P22 shown in FIG. 3(c), and the validity of each character pattern is judged in steps 207 and 208. As a result, only character pattern P21 is correctly read, as a dakuten, and the other character patterns P20 and P22 are judged unreadable. In this pass, a boundary hypothesis for reading a dakuten or handakuten is valid when the following two conditions hold.

(a) The recognition result is a dakuten or a handakuten.

(b) The recognition result of the immediately preceding character is a character in the ha row (for a handakuten), or a character in the ka, sa, ta, or ha row (for a dakuten).

When conditions (a) and (b) above are both satisfied, the dakuten/handakuten boundary hypothesis is judged valid. In the example shown in FIG. 3(b), only boundary hypothesis B21 is judged valid, and a dakuten is output as the reading result. If no valid boundary hypothesis exists, or if more than one exists, steps 207 and 208 judge that there is no validity, and step 209 outputs the reading result as a single unreadable character that also covers the recognition result of the immediately preceding character.
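Conditions (a) and (b) can be sketched as a lookup on the preceding character. This sketch assumes katakana input and represents the recognized mark as the strings `"dakuten"` or `"handakuten"`; both choices, and all names below, are illustrative. The `attach_mark` helper goes beyond the description: it shows one way a combined character could be formed, using Unicode NFC composition of the combining marks U+3099 and U+309A.

```python
import unicodedata

# Rows that can take each mark (katakana only, as an assumption).
DAKUTEN_ROWS = set("カキクケコサシスセソタチツテトハヒフヘホ")  # ka, sa, ta, ha rows
HANDAKUTEN_ROWS = set("ハヒフヘホ")  # ha row
COMBINING = {"dakuten": "\u3099", "handakuten": "\u309A"}


def mark_hypothesis_is_valid(mark, previous_char):
    """Check conditions (a) and (b) for a dakuten/handakuten hypothesis."""
    if mark == "dakuten":
        return previous_char in DAKUTEN_ROWS
    if mark == "handakuten":
        return previous_char in HANDAKUTEN_ROWS
    return False  # condition (a) fails: result is not a dakuten or handakuten


def attach_mark(previous_char, mark):
    """Compose the preceding character with the recognized mark."""
    return unicodedata.normalize("NFC", previous_char + COMBINING[mark])
```

For the running example, a dakuten following 「ヒ」 passes both conditions, whereas a handakuten following a ka-row character would be rejected.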

The process then returns to step 201 again, the contour 303 shown in FIG. 3(b) is extracted, and segmentation and recognition processing is executed in the same way as for contour 301. Boundary hypotheses are set and the resulting character patterns are recognized as shown in FIGS. 3(b) and 3(c).

According to this embodiment, the boundary between a character and a dakuten or handakuten written in the same frame is set as a boundary hypothesis and read in the same way as the boundary between ordinary characters, so dakuten and handakuten can be read without adding any special hardware or memory. Furthermore, when the validity check after recognition processing judges the result invalid, the recognition result is converted into an unreadable character and output, which prevents misreading.

[Effect of the invention]

According to the present invention, the boundary between a character and a dakuten or handakuten is treated as a boundary hypothesis, in the same way as the boundary between ordinary characters; its validity is checked after recognition, and the recognition result is output as an unreadable character when it is invalid. Characters with dakuten, for example, can therefore be segmented accurately. In addition, the limit on contour length used when judging dakuten/handakuten candidate patterns can be made large, so that even a dakuten whose strokes touch each other can be segmented accurately. Furthermore, when a dakuten or handakuten protrudes into the adjacent character frame, or when part of an adjacent character intrudes, misreading due to the resulting segmentation errors can be effectively prevented.
Alternatively, if a part of the characters in the neighboring building are intruded (7), misreading due to cutting errors caused by this can be effectively prevented.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the optical character reader of the present invention. FIG. 2 is a flowchart showing the procedure of the character segmentation and recognition processing of the embodiment shown in FIG. 1. FIGS. 3(a), 3(b), and 3(c) are explanatory diagrams showing a specific example of the character segmentation and recognition processing. FIG. 4 shows an example of character boundary setting. FIG. 5 shows an example of the determination of dakuten/handakuten candidate patterns. FIG. 6 shows an example of boundary setting between a character and a dakuten/handakuten.

1: form transport mechanism, 2: form, 3: light source, 4: light receiving element, 5: binarization circuit, 6: memory circuit, 7: segmentation and recognition unit, 8: dictionary file, 9: host device.

Claims

[Claims]

1. An optical character reader that optically reads characters on a form, converts them into electrical signals, and recognizes them, characterized by comprising: first means for detecting the patterns constituting a character block pattern by block pattern, extracting their contours, and detecting from the extracted contours dakuten/handakuten candidate patterns that may be dakuten or handakuten; second means for setting, on both sides of each detected dakuten/handakuten candidate pattern, boundary hypotheses between the candidate pattern and the other patterns representing the character, performing recognition processing under each boundary hypothesis so set, and checking the validity of the boundary hypotheses on the basis of the recognition results; and third means for outputting the recognition result obtained under the boundary hypothesis only when the validity check finds a valid boundary, and otherwise outputting the recognition result as an unreadable character.

2. An optical character reader characterized in that the third means judges that there is validity only when the recognition result is a dakuten or handakuten and the recognition result of the immediately preceding character is a character in the ha row (for a handakuten) or a character in the ka, sa, ta, or ha row (for a dakuten), and outputs the character with the dakuten or handakuten as the recognition result.