JP2995825B2

JP2995825B2 - Japanese character recognition device

Info

Publication number: JP2995825B2
Application number: JP2234917A
Authority: JP
Inventors: 亜由美橘
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-09-04
Filing date: 1990-09-04
Publication date: 1999-12-27
Anticipated expiration: 2014-12-27
Also published as: JPH04114292A

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a Japanese character recognition device that cuts out, one character at a time, the character patterns of a Japanese document input from an image reading device such as a scanner, and recognizes them.

Description of the Related Art

Most Japanese characters are full-width characters, and most of them have a fixed size, but some kana characters, half-width characters, and the like vary in size. There are also many characters made up of separated parts, such as "は" (ha) and "い" (i). It is therefore difficult to cut out character patterns accurately. Conventionally, an input character pattern was first cut out as a full-width character and recognized by matching against a whole dictionary that stores the features of all characters. If the recognition result was a rejection, the character pattern was cut out again.

Problems to be Solved by the Invention

In the conventional method, however, the number of Japanese characters is enormous, so matching against the whole dictionary takes a great deal of recognition time. For example, when a half-width character is recognized, the pattern is first judged to be a full-width character and cut out, the recognition result is rejected, and the pattern is cut out again as a half-width character and then matched against the whole dictionary. The recognition process therefore takes a long time, and the processing could not be made faster.

Means for Solving the Problems

The present invention is a character recognition device that recognizes an input character pattern, comprising: means for cutting out a character pattern on the assumption that it is a full-width character; means for recognizing the cut-out character pattern by referring to a full-width dictionary; means for cutting out the rejected character pattern again, when the recognition result is a rejection, at an optimum position at which it can form one character; and means for performing recognition by referring to a half-width dictionary if the re-cut character pattern is of half-width size.

Operation

With the above configuration, particularly when half-width characters are recognized, even if the first cut-out fails, only the half-width dictionary, which holds a small number of characters, is referred to when recognition is performed again. The character is therefore processed accurately and at high speed, without being mistaken for a full-width character.

Embodiment

FIG. 1 is an overall configuration diagram of a Japanese character recognition device according to an embodiment of the present invention. Reference numeral 1 denotes an input unit that inputs character patterns from an image reading device such as a scanner; 2 denotes a character cut-out unit that cuts out the input character patterns one character at a time; 3 denotes a recognition unit that recognizes the cut-out character patterns; 4 denotes a dictionary composed of a full-width dictionary 4-1, which stores the features of full-width characters only, and a half-width dictionary 4-2, which stores the features of half-width characters only; 5 denotes a control unit that controls the character cut-out unit 2 and the recognition unit 3; and 6 denotes an output unit that outputs the recognized characters.

The operation of the Japanese character recognition device of this embodiment, configured as described above, is explained below with reference to the flowchart shown in FIG. 2.

First, in S1, a character pattern is input from the input unit 1, and in S2, the reference rectangles of the input character pattern are extracted. To extract the reference rectangles, the circumscribed rectangles of the black-pixel regions of the character pattern are extracted first. In the example character cut-out procedure of FIG. 3, the character pattern region in (a) yields the circumscribed rectangles shown in (b). Next, circumscribed rectangles that overlap when viewed perpendicular to the direction of the text are merged into a single reference rectangle. The circumscribed rectangles in (b) yield the reference rectangles shown in (c).
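The merging step in S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes horizontal text (so "viewed perpendicular to the text direction" means comparing horizontal extents), and the `Rect` tuple layout and the function name `merge_into_reference_rects` are hypothetical.

```python
from typing import List, Tuple

# Assumed layout: (left, top, right, bottom) in pixels, right/bottom exclusive.
Rect = Tuple[int, int, int, int]


def merge_into_reference_rects(bounding_rects: List[Rect]) -> List[Rect]:
    """Merge circumscribed rectangles whose horizontal extents overlap.

    Assumes horizontal text, so overlap is tested on the x-ranges only.
    """
    merged: List[Rect] = []
    for left, top, right, bottom in sorted(bounding_rects):
        if merged and left < merged[-1][2]:
            # The x-range overlaps the group built so far: grow that group.
            p_left, p_top, p_right, p_bottom = merged[-1]
            merged[-1] = (p_left, min(p_top, top),
                          max(p_right, right), max(p_bottom, bottom))
        else:
            # No horizontal overlap: start a new reference rectangle.
            merged.append((left, top, right, bottom))
    return merged


# Two stacked strokes merge into one rectangle; the third stays separate.
strokes = [(0, 0, 10, 3), (1, 7, 9, 10), (12, 0, 15, 10)]
reference_rects = merge_into_reference_rects(strokes)
```

Under this reading, strokes stacked above one another are merged, while the left and right parts of a separated character remain distinct reference rectangles until the later merging step.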

In S3, starting from the reference rectangles, adjacent reference rectangles are merged for as long as the aspect ratio of the merged rectangle remains within the range of a full-width character size, and the result is extracted as the first candidate rectangle. The reference rectangles in FIG. 3(c) yield the candidate shown in (d). In S4, if the size of the first candidate rectangle is a half-width character size, the device switches to the half-width dictionary in S5; otherwise it switches to the full-width dictionary in S6. In S7, the candidate is matched against the selected dictionary. If the match succeeds, the recognized character is output in S9. In S10, if unprocessed reference rectangles remain, the process returns to S3 and moves on to merging the reference rectangles of the next character pattern.
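Steps S3 to S6 might look like the sketch below. The patent gives no numeric thresholds, so `FULL_WIDTH_MAX_RATIO` and `HALF_WIDTH_MAX_RATIO` are assumed values, and measuring width against a known `line_height` is an assumption standing in for the aspect-ratio test. The function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # assumed layout: (left, top, right, bottom)

# Assumed thresholds (not given in the patent), as width / line height.
FULL_WIDTH_MAX_RATIO = 1.2  # widest merged rectangle still treated as one full-width cell
HALF_WIDTH_MAX_RATIO = 0.6  # at or below this, the candidate counts as half-width


def first_candidate(ref_rects: List[Rect], start: int,
                    line_height: int) -> Tuple[Rect, int]:
    """S3: merge adjacent reference rectangles while the result stays in the
    assumed full-width range. Returns (candidate, index of next unused rect)."""
    left, top, right, bottom = ref_rects[start]
    end = start + 1
    while end < len(ref_rects):
        _, n_top, n_right, n_bottom = ref_rects[end]
        if (n_right - left) / line_height > FULL_WIDTH_MAX_RATIO:
            break  # adding this rectangle would exceed a full-width cell
        top, right, bottom = min(top, n_top), max(right, n_right), max(bottom, n_bottom)
        end += 1
    return (left, top, right, bottom), end


def choose_dictionary(rect: Rect, line_height: int) -> str:
    """S4 to S6: pick the dictionary from the candidate's size."""
    width = rect[2] - rect[0]
    return "half" if width / line_height <= HALF_WIDTH_MAX_RATIO else "full"


refs = [(0, 0, 8, 20), (10, 0, 18, 20), (22, 0, 30, 20)]
candidate, next_index = first_candidate(refs, 0, line_height=20)
dictionary = choose_dictionary(candidate, line_height=20)
```

Here two narrow neighbours are merged into one full-width candidate, which is exactly the situation where a pair of half-width characters would be rejected and re-cut.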

If the candidate is rejected by the dictionary matching in S7, then in S8 the cut-out of the first candidate rectangle is regarded as having failed, and the adjacent reference rectangles are merged again up to the next-best position that can form one character, giving a second candidate rectangle. The first candidate rectangle in FIG. 3(d) yields the second candidate shown in (e). After the second candidate rectangle is obtained, the process returns to S4, and the candidate is matched against either the full-width dictionary or the half-width dictionary.
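The whole S3 to S10 loop, including the retry in S8, can be sketched as below. Several points are assumptions rather than anything the patent states: the "next-best position" is modeled as dropping the last merged reference rectangle, a candidate that cannot be shrunk further is emitted as "?", the thresholds are invented, and `match` is a hypothetical callback standing in for the recognition unit.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # assumed layout: (left, top, right, bottom)
Matcher = Callable[[Rect, str], Optional[str]]  # returns None on rejection

FULL_WIDTH_MAX_RATIO = 1.2  # assumed threshold
HALF_WIDTH_MAX_RATIO = 0.6  # assumed threshold


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def recognize_line(ref_rects: List[Rect], line_height: int,
                   match: Matcher) -> List[str]:
    out: List[str] = []
    start = 0
    while start < len(ref_rects):                       # S10: rectangles remain
        end = start + 1                                 # S3: first candidate
        while (end < len(ref_rects) and
               (ref_rects[end][2] - ref_rects[start][0]) / line_height
               <= FULL_WIDTH_MAX_RATIO):
            end += 1
        while True:
            rect = union(ref_rects[start:end])
            width = rect[2] - rect[0]
            dictionary = ("half" if width / line_height <= HALF_WIDTH_MAX_RATIO
                          else "full")                  # S4 to S6
            char = match(rect, dictionary)              # S7
            if char is not None or end - start == 1:
                break
            end -= 1                                    # S8: re-cut (assumed rule)
        out.append(char if char is not None else "?")   # S9 ("?" is an assumption)
        start = end
    return out


# Toy recognizer: a lookup table keyed by (rectangle, dictionary name).
table = {((0, 0, 8, 20), "half"): "A",
         ((10, 0, 18, 20), "half"): "B",
         ((22, 0, 40, 20), "full"): "字"}
result = recognize_line([(0, 0, 8, 20), (10, 0, 18, 20), (22, 0, 40, 20)], 20,
                        lambda rect, name: table.get((rect, name)))
```

In this toy run the first two rectangles are first tried together against the full-width dictionary, rejected, and then each is matched against the half-width dictionary only, which is the saving the patent describes.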

Effects of the Invention

In the present invention, the dictionary is divided in advance into a dictionary for full-width characters and a dictionary for half-width characters. Cutting out is performed again only when the recognition result for a character pattern cut out as a full-width character is a rejection, and the pattern is matched against one of the two dictionaries according to the size of the cut-out character pattern. In a document that mixes full-width and half-width characters, and particularly when half-width characters are recognized, even if the first cut-out of a character fails, only the half-width dictionary is referred to after the character is cut out again, so half-width characters can be accurately distinguished from full-width characters. As a result, processing becomes dramatically faster and the workability of input operations is greatly improved.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of a Japanese character recognition device according to an embodiment of the present invention, FIG. 2 is a flowchart, and FIG. 3 is an explanatory diagram showing an example of the character cut-out procedure.

Description of reference numerals:
1 ... Input unit
2 ... Character cut-out unit
3 ... Recognition unit
4 ... Dictionary
4-1 ... Full-width dictionary
4-2 ... Half-width dictionary
5 ... Control unit
6 ... Output unit

Claims

(57) [Claims]

1. A character recognition device for recognizing an input character pattern, comprising: means for cutting out a character pattern on the assumption that it is a full-width character; means for recognizing the cut-out character pattern by referring to a full-width dictionary; means for cutting out the rejected character pattern again, when the recognition result is a rejection, at an optimum position at which it can form one character; and means for performing recognition by referring to a half-width dictionary if the re-cut character pattern is of half-width size.