JP2995825B2 - Japanese character recognition device - Google Patents
Japanese character recognition device
Info
- Publication number
- JP2995825B2
- Authority
- JP
- Japan
- Prior art keywords
- character
- width
- dictionary
- character pattern
- full
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Character Input (AREA)
- Character Discrimination (AREA)
Description
Detailed Description of the Invention
Field of Industrial Application
The present invention relates to a Japanese character recognition device that segments the character patterns of a Japanese document, input from an image reading device such as a scanner, one character at a time and recognizes them.
Prior Art
Full-width characters make up the majority of Japanese characters, and most have a fixed size, but some kana characters, half-width characters, and the like vary in size. There are also many separated characters, such as 「は」 and 「い」, whose strokes are not connected. Segmenting character patterns accurately is therefore difficult. Conventionally, an input character pattern was first segmented as a full-width character and recognized by matching it against a whole dictionary that stores the features of all characters; if the recognition result was a rejection, the character pattern was segmented again.
Problems to Be Solved by the Invention
In the conventional method, however, the number of Japanese characters is enormous, so matching against the whole dictionary takes a long time. For example, to recognize a half-width character, the pattern is first judged to be a full-width character and segmented, the recognition result is rejected, and the pattern is segmented again as a half-width character and then matched against the whole dictionary. Recognition therefore took a long time, and the processing could not be sped up.
Means for Solving the Problems
The present invention is a character recognition device for recognizing an input character pattern, comprising: means for segmenting a character pattern on the assumption that it is a full-width character; means for recognizing the segmented character pattern by referring to a full-width dictionary; means for, when the recognition result is a rejection, segmenting the rejected character pattern again at the optimal position at which it can form one character; and means for recognizing the re-segmented character pattern by referring to a half-width dictionary if it is of half-width size.
Operation
With the above configuration, particularly when recognizing half-width characters, even if the first segmentation fails, only the half-width dictionary, which contains few characters, is referred to when recognition is performed again. Half-width characters are therefore processed accurately and at high speed without being misjudged as full-width characters.
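The saving can be illustrated with a rough count of template comparisons for one half-width character whose first segmentation fails. This is only a sketch: it assumes that every dictionary lookup compares the pattern against each entry once, and the dictionary sizes used in the usage note are hypothetical figures, not values from the patent.

```python
def comparisons_conventional(n_full: int, n_half: int) -> int:
    """Conventional flow: the rejected full-width attempt and the retry
    are both matched against the whole dictionary."""
    whole = n_full + n_half
    return whole + whole


def comparisons_split(n_full: int, n_half: int) -> int:
    """Split-dictionary flow: the rejected attempt uses the full-width
    dictionary, and the retry uses only the half-width dictionary."""
    return n_full + n_half


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(comparisons_conventional(3000, 100))
    print(comparisons_split(3000, 100))
```

With the assumed sizes of 3000 full-width and 100 half-width entries, the conventional flow performs 6200 comparisons and the split-dictionary flow 3100.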
Embodiment
FIG. 1 is an overall configuration diagram of a Japanese character recognition device according to an embodiment of the present invention. Reference numeral 1 denotes an input unit that inputs character patterns from an image reading device such as a scanner; 2, a character segmentation unit that segments the input character patterns one character at a time; 3, a recognition unit that recognizes the segmented character patterns; 4, a dictionary consisting of a full-width dictionary 4-1, which stores the features of full-width characters only, and a half-width dictionary 4-2, which stores the features of half-width characters only; 5, a control unit that controls the character segmentation unit 2 and the recognition unit 3; and 6, an output unit that outputs the recognized characters.
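As a rough illustration of dictionary 4 and its two parts, the sketch below models a split dictionary with a lookup that can reject. The patent does not specify the feature representation or the matching rule, so the feature vectors, the squared-distance measure, the `reject_above` threshold, and the name `SplitDictionary` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

Features = Sequence[float]  # hypothetical feature vector of a character pattern


@dataclass
class SplitDictionary:
    full: Dict[str, Features]  # stands in for full-width dictionary 4-1
    half: Dict[str, Features]  # stands in for half-width dictionary 4-2

    def match(self, feats: Features, which: str,
              reject_above: float = 1.0) -> Optional[str]:
        """Return the nearest entry of the selected dictionary, or None
        (a rejection) if even the nearest entry is too far away."""
        entries = self.full if which == "full" else self.half
        best_char, best_dist = None, float("inf")
        for char, ref in entries.items():
            # Squared Euclidean distance (an assumed similarity measure).
            dist = sum((a - b) ** 2 for a, b in zip(feats, ref))
            if dist < best_dist:
                best_char, best_dist = char, dist
        return best_char if best_dist <= reject_above else None
```

Because each lookup scans only the selected part, a half-width candidate never touches the much larger full-width entries.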
The operation of the Japanese character recognition device of this embodiment, configured as described above, is explained below with reference to the flowchart shown in FIG. 2.
First, in S1, a character pattern is input from the input unit 1, and in S2 the reference rectangles of the input character pattern are extracted. To extract the reference rectangles, the circumscribed rectangles of the black-pixel regions of the character pattern are extracted first. In the example segmentation procedure of FIG. 3, the character pattern region in (a) yields the circumscribed rectangles shown in (b). Next, circumscribed rectangles that overlap when viewed perpendicular to the direction of the text are merged to form a reference rectangle. The circumscribed rectangles in (b) yield the reference rectangles shown in (c).
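The merging step of S2 can be sketched as follows. The example assumes horizontal text, so "overlapping when viewed perpendicular to the text direction" is taken to mean that the horizontal extents of two circumscribed rectangles overlap. The box format and the function name `merge_overlapping` are hypothetical, and extracting the circumscribed rectangles from the image is left out.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive
Box = Tuple[int, int, int, int]


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Merge circumscribed rectangles whose horizontal extents overlap,
    giving reference rectangles ordered from left to right."""
    merged: List[Box] = []
    for left, top, right, bottom in sorted(boxes):
        if merged and left < merged[-1][2]:
            # Overlaps the previous rectangle in x: grow that rectangle.
            l, t, r, b = merged[-1]
            merged[-1] = (l, min(t, top), max(r, right), max(b, bottom))
        else:
            merged.append((left, top, right, bottom))
    return merged


if __name__ == "__main__":
    # Two side-by-side strokes stay separate; a dot above a stroke is merged.
    strokes = [(0, 2, 4, 10), (6, 3, 10, 9), (12, 0, 16, 3), (11, 4, 18, 10)]
    print(merge_overlapping(strokes))
```

Strokes that sit side by side, as in a separated character, remain separate reference rectangles at this stage and are only joined later, when candidate rectangles are formed.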
In S3, starting from the reference rectangles, adjacent reference rectangles are merged for as long as the height-to-width ratio of the merged rectangle stays within the range of a full-width character size, and the result is extracted as the first candidate rectangle. The reference rectangles in FIG. 3(c) yield the rectangles shown in (d). In S4, if the first candidate rectangle is of half-width character size, the device switches to the half-width dictionary in S5; otherwise it switches to the full-width dictionary in S6. In S7, the candidate is matched against the selected dictionary. If the match succeeds, the recognized character is output in S9, and in S10, if unprocessed reference rectangles remain, the process returns to S3 and moves on to merging the reference rectangles of the next character pattern.
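Steps S3 to S6 can be sketched as below. The patent gives no numeric limits, so this example replaces the aspect-ratio test with the candidate width divided by an assumed line height, and the thresholds `max_ratio` and `half_ratio`, like the function names, are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def first_candidate(refs: List[Box], start: int, line_height: int,
                    max_ratio: float = 1.2) -> Tuple[Box, int]:
    """S3: merge reference rectangles from `start` while the merged width
    still fits a full-width cell. Returns (candidate, next index)."""
    left, top, right, bottom = refs[start]
    end = start + 1
    while end < len(refs):
        l, t, r, b = refs[end]
        if (r - left) / line_height > max_ratio:
            break  # adding this rectangle would exceed full-width size
        top, right, bottom = min(top, t), max(right, r), max(bottom, b)
        end += 1
    return (left, top, right, bottom), end


def choose_dictionary(box: Box, line_height: int,
                      half_ratio: float = 0.6) -> str:
    """S4 to S6: select the half-width dictionary for narrow candidates,
    otherwise the full-width dictionary."""
    width = box[2] - box[0]
    return "half" if width / line_height <= half_ratio else "full"
```

Note that two adjacent half-width characters can fit inside one full-width cell, so the greedy merge may join them; the rejection handling in S7 and S8 exists to recover from that case.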
If the match against the dictionary in S7 is rejected, the segmentation of the first candidate rectangle is regarded as having failed, and in S8 the adjacent reference rectangles are merged again, up to the next optimal position at which they can form one character, to give the second candidate rectangle. The first candidate rectangle in FIG. 3(d) yields the rectangle shown in (e). After the second candidate rectangle is obtained, the process returns to S4 and the candidate is matched against either the full-width dictionary or the half-width dictionary.
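The loop from S3 to S10, including the retry in S8, might look like the sketch below. Several points are assumptions rather than the patent's method: reference rectangles are reduced to horizontal spans, "the next optimal position" is approximated by dropping the last merged rectangle, a candidate that is still rejected as a single rectangle is emitted as "?", and the thresholds and the `match` callback signature are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (left, right) of a reference rectangle on the line


def recognize_line(refs: List[Span], line_height: int,
                   match: Callable[[Span, str], Optional[str]],
                   max_ratio: float = 1.2,
                   half_ratio: float = 0.6) -> List[str]:
    out: List[str] = []
    i = 0
    while i < len(refs):  # S10: unprocessed reference rectangles remain
        # S3: merge neighbours while the candidate fits a full-width cell.
        end = i + 1
        while (end < len(refs)
               and (refs[end][1] - refs[i][0]) / line_height <= max_ratio):
            end += 1
        while True:
            cand = (refs[i][0], refs[end - 1][1])
            ratio = (cand[1] - cand[0]) / line_height
            which = "half" if ratio <= half_ratio else "full"  # S4 to S6
            char = match(cand, which)  # S7: None means rejection
            if char is not None or end - i == 1:
                break
            end -= 1  # S8: assumed re-segmentation rule (shrink by one)
        out.append(char if char is not None else "?")  # S9
        i = end
    return out
```

In this sketch a rejected full-width candidate that actually contains two half-width characters is retried as a narrower span, which is then looked up in the half-width dictionary only.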
Effects of the Invention
In the present invention, the dictionary is divided in advance into a dictionary for full-width characters and a dictionary for half-width characters. Segmentation is performed again only when the recognition result for a character pattern segmented as a full-width character is a rejection, and the re-segmented pattern is matched against one of the two dictionaries according to its size. In a document that mixes full-width and half-width characters, particularly when recognizing half-width characters, even if the first segmentation of a character fails, only the half-width dictionary is referred to after re-segmentation, so half-width characters can be distinguished accurately from full-width characters. As a result, processing becomes dramatically faster and the efficiency of input work is greatly improved.
FIG. 1 is an overall configuration diagram of a Japanese character recognition device according to an embodiment of the present invention, FIG. 2 is a flowchart, and FIG. 3 is an explanatory diagram showing an example of the character segmentation procedure.
1 ... Input unit
2 ... Character segmentation unit
3 ... Recognition unit
4 ... Dictionary
4-1 ... Full-width dictionary
4-2 ... Half-width dictionary
5 ... Control unit
6 ... Output unit
Claims (1)
1. A Japanese character recognition device, being a character recognition device that recognizes an input character pattern, comprising:
means for segmenting a character pattern on the assumption that it is a full-width character;
means for recognizing the segmented character pattern by referring to a full-width dictionary;
means for, when the recognition result is a rejection, segmenting the rejected character pattern again at the optimal position at which it can form one character; and
means for recognizing the re-segmented character pattern by referring to a half-width dictionary if it is of half-width size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2234917A JP2995825B2 (en) | 1990-09-04 | 1990-09-04 | Japanese character recognition device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2234917A JP2995825B2 (en) | 1990-09-04 | 1990-09-04 | Japanese character recognition device |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH04114292A JPH04114292A (en) | 1992-04-15 |
JP2995825B2 true JP2995825B2 (en) | 1999-12-27 |
Family ID=16978311
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2234917A Expired - Lifetime JP2995825B2 (en) | 1990-09-04 | 1990-09-04 | Japanese character recognition device |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2995825B2 (en) |
Also Published As
Publication number | Publication date |
---|---|
JPH04114292A (en) | 1992-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JPH04195692A (en) | Document reader | |
Saitoh et al. | Document image segmentation and layout analysis | |
JP2995825B2 (en) | Japanese character recognition device | |
JPH0991371A (en) | Character display device | |
JP3197441B2 (en) | Character recognition device | |
JP2985243B2 (en) | Character recognition method | |
JP2576350B2 (en) | String extraction device | |
JPH09106437A (en) | Device and method for segmenting character | |
JP2550012B2 (en) | Pattern cutting and recognition method | |
JPH0259979A (en) | Document and image processor | |
JP3060237B2 (en) | Japanese character recognition device | |
JPH04287168A (en) | Automatic keyword extracting method for filing | |
JPH05174185A (en) | Japanese character recognizing device | |
JP3151866B2 (en) | English character recognition method | |
JP2746345B2 (en) | Post-processing method for character recognition | |
JPH0514952B2 (en) | ||
JPS60110089A (en) | Character recognizer | |
JPS61251984A (en) | Device for recognizing multi-font type character | |
JP3116452B2 (en) | English character recognition device | |
JP3116453B2 (en) | English character recognition device | |
JPS6318483A (en) | Character recognizing method for optical information input device | |
JPS6198487A (en) | Dictionary selecting system | |
JPH0554071A (en) | Digital translation device | |
JPH064709A (en) | Character recognition device | |
JPH10198763A (en) | Character recognizer and computer readable storage medium storing program making computer function as character recognizer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20081029; Year of fee payment: 9 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20091029; Year of fee payment: 10 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20091029; Year of fee payment: 10 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20101029; Year of fee payment: 11 |
| EXPY | Cancellation because of completion of term | |