JPS601676B2

JPS601676B2 - character recognition device

Info

Publication number: JPS601676B2
Application number: JP55133784A
Authority: JP
Inventors: 正廣大川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-09-26
Filing date: 1980-09-26
Publication date: 1985-01-16
Also published as: JPS5759288A

Description

[Detailed Description of the Invention] The present invention relates to a character recognition device suited to recognizing small-sized characters such as the dakuten (〃), the handakuten (ｏ), and the period (。).

When small-sized characters such as the dakuten, handakuten, or period are written in a deformed manner, they offer few features that are useful for recognition, so it is difficult to determine the answer with high accuracy from the features of that character alone. Even in this case, however, the answer candidates can be narrowed down to two or three. Moreover, such small-sized characters often have some correlation with the character to their left (the relative position of the characters and the connection between the characters). For example, the dakuten and handakuten are generally written in the upper half of the character to their left, while the period is written in the lower half. Likewise, the character to the left of a dakuten is normally a character in the ka row, sa row, ta row, or ha row, or the character ウ (u), and the character to the left of a handakuten is normally a character in the ha row. The method of the present invention is a recognition method that uses this information relative to the left-adjacent character as a feature.

The present invention is described below with reference to the figure.

The figure shows an embodiment of the present invention. In the figure, 1 is a character buffer that stores the signal from a CCD image sensor; 2 is a position detector that determines the height, width, and base point of a character; 3 is a memory that stores the information from the position detector; 4 is a feature extraction section that extracts the contour features of a character; 5 is a memory that stores the contour features extracted by the feature extraction section; 6 is a determination section that compares the patterns in a dictionary 7 with the contents of the memory 5; 8 is a control section; 9 is a memory that stores character position information; 10 is an editing section that decides the final answer from the relationship with the left-adjacent character; 11 is a memory that stores the dictionary patterns selected by the determination section 6; and 12 is a memory that stores the answers.

In operation, character information photoelectrically converted by the CCD image sensor is stored in the character buffer 1. The character information in the character buffer 1 is input to the position detector 2 and the feature extraction section 4. The position detector 2 determines the height, width, and base point position of the character, and these data are input to the memories 3 and 9.
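As a rough illustration of what the position detector 2 computes, the following Python sketch derives a height, width, and base point from a binary character bitmap. The patent does not define the base point or the bitmap format; the row-major list of 0/1 pixels, the choice of the bounding box's top-left corner as the base point, and the names `CharBox` and `detect_position` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CharBox:
    top: int     # row of the base point (assumed: top-left of bounding box)
    left: int    # column of the base point
    height: int  # number of rows spanned by the character
    width: int   # number of columns spanned by the character


def detect_position(bitmap: List[List[int]]) -> Optional[CharBox]:
    """Return the bounding box of the non-zero pixels, or None if blank."""
    # Rows and columns that contain at least one set pixel.
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    cols = [c for row in bitmap for c, v in enumerate(row) if v]
    if not rows:
        return None
    top, bottom = rows[0], rows[-1]
    left, right = min(cols), max(cols)
    return CharBox(top, left, bottom - top + 1, right - left + 1)
```

In this sketch the same `CharBox` would be handed both to feature extraction (memory 3) and to the later relative-position check (memory 9).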

The data in the memory 3 are further input to the feature extraction section 4, which uses them to extract the contour features of the character input from the character buffer 1.

The data in the memory 9, meanwhile, are used for recognizing small-sized characters, as described later.

The contour features of the character extracted by the feature extraction section 4 are stored in the memory 5 and compared with the patterns in the dictionary by the determination section 6. The patterns selected as a result of the comparison are input to the editing section 10 via the memory 11. If no relative information check instruction is attached to the selected pattern, it is passed from the editing section 10 to the memory 12 unchanged. For small-sized characters such as those described above, a relative information check instruction is attached to the pattern in the dictionary. When this check instruction is attached, the editing section 10 outputs the final answer using the character position information stored in the memory 9 and the connection with the left-adjacent character stored in the memory 12. For example, suppose the following three characters are given as answer candidates:

First candidate: handakuten
Second candidate: dakuten
Third candidate: period

The editing section first compares the position of the character relative to the left-adjacent character and determines whether it is a dakuten or handakuten on the one hand, or a period on the other.

If the character lies in the upper half of the left-adjacent character, it is a dakuten or a handakuten; if it lies in the lower half, it is a period. If there is no character to the left, no determination can be made and the character is rejected. Once the candidates have been narrowed to dakuten or handakuten in this way, the connection with the left-adjacent character is examined. If the left-adjacent character is a ha-row character, either is possible, so the higher-priority candidate (the handakuten in this case) is taken as the answer. If the left-adjacent character is in the ka row, sa row, or ta row, or is ウ (u), the answer is dakuten. If neither applies, the combination is impossible as a sequence of characters, so the character is rejected. The control section 8 controls the above operations.

As described above, according to the present invention, small-sized characters are determined from their position and their connection with the left-adjacent character, so they can be recognized accurately.
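The decision procedure of the editing section 10 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented circuit: the names `Box` and `resolve_small_mark`, the use of the mark's vertical centre against the neighbour's midline as the "upper half / lower half" test, the restriction to katakana, and the choice to reject when the rule's answer is not among the candidates are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

DAKUTEN, HANDAKUTEN, PERIOD = "dakuten", "handakuten", "period"

# Left neighbours that admit each mark (katakana only, for brevity).
HA_ROW = set("ハヒフヘホ")                       # dakuten or handakuten
DAKUTEN_ONLY = set("カキクケコサシスセソタチツテトウ")  # ka, sa, ta rows and u


@dataclass(frozen=True)
class Box:
    top: int     # y of the top edge; y grows downward (assumption)
    height: int


def resolve_small_mark(candidates: List[str], mark: Box,
                       left_char: Optional[str],
                       left_box: Optional[Box]) -> Optional[str]:
    """Return the final answer, or None to signal a reject.

    `candidates` is ordered by priority, highest first.
    """
    if left_char is None or left_box is None:
        return None  # no left neighbour: cannot decide, reject

    # Step 1: relative position against the left neighbour's midline.
    mark_centre = mark.top + mark.height / 2
    midline = left_box.top + left_box.height / 2
    if mark_centre >= midline:
        # Lower half: period (rejected if it was never a candidate).
        return PERIOD if PERIOD in candidates else None

    # Upper half: only dakuten and handakuten remain, in priority order.
    remaining = [c for c in candidates if c in (DAKUTEN, HANDAKUTEN)]

    # Step 2: connection with the left neighbour.
    if left_char in HA_ROW:
        return remaining[0] if remaining else None
    if left_char in DAKUTEN_ONLY:
        return DAKUTEN if DAKUTEN in remaining else None
    return None  # impossible character sequence: reject
```

With the candidate order from the description (handakuten, dakuten, period), a mark in the upper half next to ハ resolves to handakuten, next to カ to dakuten, and next to マ it is rejected.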

[Brief Description of the Drawing] The figure shows an embodiment of the present invention, in which 1 is a character buffer, 2 is a position detector, 3, 5, 9, and 11 are memories, 4 is a feature extraction section, 6 is a determination section, 7 is a dictionary, and 10 is an editing section.

Claims

[Claims]

1. A character recognition device comprising: position detection means for detecting the position of a read character; extraction means for extracting features of the character; a determination section that selects similar patterns from the patterns in a dictionary based on the features from the extraction means; and an editing section that, when a relative information check instruction is attached to the output from the determination section, outputs a final answer based on the relative position with respect to the left-adjacent character or the connection with the left-adjacent character.