JP2995825B2 - Japanese character recognition device - Google Patents

Japanese character recognition device

Info

Publication number
JP2995825B2
Authority
JP
Japan
Prior art keywords
character
width
dictionary
character pattern
full
Prior art date
Legal status
Expired - Lifetime
Application number
JP2234917A
Other languages
Japanese (ja)
Other versions
JPH04114292A (en)
Inventor
亜由美 橘
Current Assignee
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date
1990-09-04
Filing date
1990-09-04
Publication date
1999-12-27
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP2234917A
Publication of JPH04114292A
Application granted
Publication of JP2995825B2
Anticipated expiration
Expired - Lifetime (current legal status)

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Description

Detailed Description of the Invention
Field of Industrial Application
The present invention relates to a Japanese character recognition device that segments the character patterns of a Japanese document, input from an image reading device such as a scanner, one character at a time and recognizes them.

Prior Art
Most Japanese characters are full-width characters of fixed size, but some kana characters, half-width characters and the like vary in size. There are also many separated characters, such as 「は」 (ha) and 「い」 (i), whose strokes form more than one connected region. It is therefore difficult to segment character patterns accurately. Conventionally, an input character pattern was first segmented as a full-width character and recognized by collation with an overall dictionary that stores the features of all characters; if the recognition result was rejected, the character pattern was segmented again.

Problem to Be Solved by the Invention
In the conventional method described above, however, the number of Japanese characters is enormous, so collation against the overall dictionary takes a long recognition time. For example, to recognize a half-width character, the pattern is first judged to be a full-width character and segmented as such; when that recognition result is rejected, the pattern is segmented again as a half-width character and collated against the overall dictionary once more. Recognition therefore took a long time, and the processing could not be sped up.

Means for Solving the Problem
The present invention is a character recognition device for recognizing an input character pattern, comprising: means for segmenting a character pattern on the assumption that it is a full-width character; means for recognizing the segmented character pattern by referring to a full-width dictionary; means for, when the recognition result is rejected, segmenting the rejected character pattern again at the optimal position at which it can form a single character; and means for performing recognition by referring to a half-width dictionary if the re-segmented character pattern is half-width in size.

Operation
With the above configuration, particularly when recognizing half-width characters, even if the first segmentation fails, only the half-width dictionary, which contains few characters, is referred to during re-recognition. The pattern is therefore not misjudged as a full-width character, and processing is both accurate and fast.
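To make the speed argument concrete, the following sketch counts dictionary comparisons for one half-width character whose first (full-width) segmentation fails. It assumes, as a simplification, that collation cost is proportional to the number of dictionary entries compared. The dictionary sizes are hypothetical placeholders, not figures from the patent.

```python
# Simplified cost model: one unit of work per dictionary entry compared.
# The sizes used below are hypothetical, not taken from the patent.


def conventional_cost(n_full, n_half):
    """Both the first attempt and the retry use the overall dictionary."""
    n_all = n_full + n_half
    return 2 * n_all


def proposed_cost(n_full, n_half):
    """First attempt: full-width dictionary only; retry: half-width only."""
    return n_full + n_half


N_FULL, N_HALF = 3000, 150  # hypothetical dictionary sizes
print(conventional_cost(N_FULL, N_HALF))  # 6300
print(proposed_cost(N_FULL, N_HALF))      # 3150
```

The retry is where the split pays off: it compares against only the small half-width dictionary instead of the full character set.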

Embodiment
FIG. 1 is an overall configuration diagram of a Japanese character recognition device according to an embodiment of the present invention. Reference numeral 1 denotes an input unit that inputs character patterns from an image reading device such as a scanner; 2 denotes a character segmentation unit that segments the input character patterns one character at a time; 3 denotes a recognition unit that recognizes the segmented character patterns; 4 denotes a dictionary composed of a full-width dictionary 4-1, which stores the features of full-width characters only, and a half-width dictionary 4-2, which stores the features of half-width characters only; 5 denotes a control unit that controls the character segmentation unit 2 and the recognition unit 3; and 6 denotes an output unit that outputs the recognized characters.
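The split dictionary 4 (full-width part 4-1, half-width part 4-2) can be pictured as two separate feature tables, with collation touching only one of them. The sketch below is a minimal illustration under assumptions not stated in the patent: features are plain numeric vectors, collation is nearest-template matching by Euclidean distance, and rejection means the best distance exceeds a hypothetical `reject_threshold`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SplitDictionary:
    """Dictionary 4 split into full-width (4-1) and half-width (4-2) parts.

    Each part maps a character to a feature vector (hypothetical format).
    """
    full_width: dict = field(default_factory=dict)
    half_width: dict = field(default_factory=dict)

    def recognize(self, features, use_half_width, reject_threshold):
        """Nearest-template collation against ONE sub-dictionary.

        Returns the best-matching character, or None (rejection) when even
        the best match is farther than reject_threshold.
        """
        table = self.half_width if use_half_width else self.full_width
        best_char, best_dist = None, math.inf
        for char, template in table.items():
            dist = math.dist(features, template)  # Euclidean distance (Python 3.8+)
            if dist < best_dist:
                best_char, best_dist = char, dist
        return best_char if best_dist <= reject_threshold else None
```

Because each call scans only one table, a retry against the half-width part never compares against the (much larger) set of full-width entries.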

The operation of the Japanese character recognition device of this embodiment, configured as described above, is explained below with reference to the flowchart shown in FIG. 2.

First, in S1, a character pattern is input from the input unit 1, and in S2, the reference rectangles of the input character pattern are extracted. To extract the reference rectangles, the circumscribed rectangles of the black-pixel regions of the character pattern are first extracted. In the example segmentation procedure of FIG. 3, the character pattern region in (a) yields the circumscribed rectangles shown in (b). Next, circumscribed rectangles that overlap when viewed perpendicular to the text direction are merged into a single reference rectangle. The circumscribed rectangles of (b) yield the reference rectangles shown in (c).
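Step S2 can be sketched as follows, assuming horizontal text, a binary image given as a list of rows (1 = black), and 8-connectivity for black-pixel regions; none of these details are fixed by the patent. For horizontal text, "overlapping when viewed perpendicular to the text direction" is interpreted here as overlapping x-ranges, which joins the separate strokes of characters such as 「い」.

```python
from collections import deque


def circumscribed_boxes(image):
    """Bounding boxes (x0, y0, x1, y1) of 8-connected black-pixel regions."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if image[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(x, y)])
                x0 = x1 = x
                y0 = y1 = y
                while queue:  # breadth-first flood fill of one region
                    cx, cy = queue.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if (0 <= nx < w and 0 <= ny < h
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes


def reference_rectangles(boxes):
    """Merge boxes whose x-ranges overlap (horizontal text assumed)."""
    merged = []
    for x0, y0, x1, y1 in sorted(boxes):  # sorted by left edge first
        if merged and x0 <= merged[-1][2]:
            mx0, my0, mx1, my1 = merged[-1]
            merged[-1] = (mx0, min(my0, y0), max(mx1, x1), max(my1, y1))
        else:
            merged.append((x0, y0, x1, y1))
    return merged
```

In a small test image with two vertically stacked dots, two stacked bars and one tall stroke, the five circumscribed rectangles collapse into three reference rectangles.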

In S3, starting from the reference rectangles, adjacent reference rectangles are merged as long as the aspect ratio of the merged rectangle stays within the full-width character size range, and the result is extracted as the first candidate rectangle. The reference rectangles of FIG. 3(c) yield the first candidate rectangles shown in (d). In S4, if the first candidate rectangle is half-width in size, the half-width dictionary is selected in S5; otherwise, the full-width dictionary is selected in S6. In S7, the candidate is collated with the selected dictionary. If collation succeeds, the recognized character is output in S9, and in S10, if unprocessed reference rectangles remain, the process returns to S3 and proceeds to merge the reference rectangles of the next character pattern.
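Steps S3 to S6 can be sketched as below. The patent does not give numeric size ranges, so this sketch assumes horizontal text, judges size as candidate width divided by text-line height, and uses hypothetical thresholds `FULL_MAX_RATIO` and `HALF_MAX_RATIO`. Boxes are `(x0, y0, x1, y1)` tuples with inclusive pixel coordinates.

```python
# Hypothetical thresholds (not given in the patent), expressed as
# candidate width divided by text-line height for horizontal text.
FULL_MAX_RATIO = 1.1  # widest candidate still accepted as one full-width cell
HALF_MAX_RATIO = 0.6  # candidates this narrow or narrower count as half-width


def width(box):
    """Width of an (x0, y0, x1, y1) box with inclusive pixel coordinates."""
    return box[2] - box[0] + 1


def union(a, b):
    """Smallest box enclosing boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def first_candidate(refs, start, line_height):
    """S3: merge reference rectangles from refs[start] onward while the merged
    box still fits a full-width cell. Returns (end_exclusive, box)."""
    box = refs[start]
    end = start + 1
    while end < len(refs):
        trial = union(box, refs[end])
        if width(trial) / line_height > FULL_MAX_RATIO:
            break
        box, end = trial, end + 1
    return end, box


def choose_dictionary(box, line_height):
    """S4-S6: select the half-width dictionary for half-width-sized boxes."""
    return "half" if width(box) / line_height <= HALF_MAX_RATIO else "full"
```

Note that two adjacent half-width characters fit inside one full-width cell, so S3 merges them into a single full-width-sized candidate; this is exactly the case that leads to rejection in S7 and re-segmentation in S8.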

If the pattern is rejected by dictionary collation in S7, segmentation of the first candidate rectangle is regarded as having failed in S8, and adjacent reference rectangles are merged again, up to the next optimal position at which they can form a single character, to form a second candidate rectangle. The first candidate rectangle of FIG. 3(d) yields the second candidate rectangle shown in (e). After the second candidate rectangle is obtained, the process returns to S4, and the candidate is collated with either the full-width or the half-width dictionary.
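The whole S3 to S10 loop, including rejection and re-segmentation, might look like the sketch below. Several details are assumptions rather than the patent's method: the "optimal position" in S8 is taken to be the shorter run of reference rectangles whose width is closest to a half-width cell (half the line height); S8 may repeat on further rejections, following the S7 to S8 to S4 path of the flowchart; `recognize` is a hypothetical callback; and `"?"` is a hypothetical marker for a rectangle that cannot be recognized at all.

```python
def width(box):
    """Width of an (x0, y0, x1, y1) box with inclusive pixel coordinates."""
    return box[2] - box[0] + 1


def union_of(boxes):
    """Smallest box enclosing every box in a non-empty sequence."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def recognize_line(refs, line_height, recognize,
                   full_max=1.1, half_max=0.6, reject_mark="?"):
    """Run S3-S10 over the reference rectangles of one horizontal text line.

    recognize(box, dictionary) is a hypothetical callback that collates the
    pattern inside box against "full" or "half" and returns a character, or
    None when the result is rejected.
    """
    out = []
    i = 0
    while i < len(refs):  # S10: unprocessed reference rectangles remain
        # S3: first candidate = longest run of refs that fits a full-width cell
        n = 1
        while (i + n < len(refs)
               and width(union_of(refs[i:i + n + 1])) / line_height <= full_max):
            n += 1
        while True:
            box = union_of(refs[i:i + n])
            # S4-S6: choose the dictionary from the candidate's size
            dictionary = "half" if width(box) / line_height <= half_max else "full"
            char = recognize(box, dictionary)  # S7: collation
            if char is not None:
                out.append(char)  # S9: output the recognized character
                break
            if n == 1:  # no shorter candidate left; give up on this rectangle
                out.append(reject_mark)
                break
            # S8 (assumed rule): re-merge up to the shorter run whose width is
            # closest to a half-width cell, then go back to S4
            target = 0.5 * line_height
            n = min(range(1, n),
                    key=lambda k: abs(width(union_of(refs[i:i + k])) - target))
        i += n
    return "".join(out)
```

With a stub recognizer that knows only "A" and "B" as half-width boxes and "漢" as a full-width box, the two narrow rectangles are first merged into one full-width-sized candidate, rejected, re-segmented, and then recognized against the half-width dictionary, giving "AB漢".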

Effects of the Invention
In the present invention, the dictionary is divided into a full-width dictionary and a half-width dictionary. A character pattern is segmented again only when its recognition result as a full-width character is rejected, and it is then collated with one of the two dictionaries according to the size of the re-segmented pattern. In documents containing both full-width and half-width characters, particularly when recognizing half-width characters, only the half-width dictionary is referred to after re-segmentation even if the first segmentation of a character fails, so half-width characters can be accurately distinguished from full-width characters. As a result, processing becomes dramatically faster, and the efficiency of input work is greatly improved.

Brief Description of the Drawings
FIG. 1 is an overall configuration diagram of a Japanese character recognition device according to an embodiment of the present invention, FIG. 2 is a flowchart, and FIG. 3 is an explanatory diagram showing an example of the character segmentation procedure.

Description of Reference Numerals
1 ... Input unit
2 ... Character segmentation unit
3 ... Recognition unit
4 ... Dictionary
4-1 ... Full-width dictionary
4-2 ... Half-width dictionary
5 ... Control unit
6 ... Output unit

Claims (1)

(57) [Claims]
1. A Japanese character recognition device, being a character recognition device for recognizing an input character pattern, comprising:
means for segmenting a character pattern on the assumption that it is a full-width character;
means for recognizing the segmented character pattern by referring to a full-width dictionary;
means for, when the recognition result is rejected, segmenting the rejected character pattern again at the optimal position at which it can form a single character; and
means for performing recognition by referring to a half-width dictionary if the re-segmented character pattern is half-width in size.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2234917A JP2995825B2 (en) 1990-09-04 1990-09-04 Japanese character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2234917A JP2995825B2 (en) 1990-09-04 1990-09-04 Japanese character recognition device

Publications (2)

Publication Number Publication Date
JPH04114292A 1992-04-15
JP2995825B2 1999-12-27

Family

ID=16978311

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2234917A Expired - Lifetime JP2995825B2 (en) 1990-09-04 1990-09-04 Japanese character recognition device

Country Status (1)

Country Link
JP (1) JP2995825B2 (en)

Also Published As

Publication number Publication date
JPH04114292A (en) 1992-04-15

Similar Documents

Publication Publication Date Title
JPH04195692A (en) Document reader
Saitoh et al. Document image segmentation and layout analysis
JP2995825B2 (en) Japanese character recognition device
JPH0991371A (en) Character display device
JP3197441B2 (en) Character recognition device
JP2985243B2 (en) Character recognition method
JP2576350B2 (en) String extraction device
JPH09106437A (en) Device and method for segmenting character
JP2550012B2 (en) Pattern cutting and recognition method
JPH0259979A (en) Document and image processor
JP3060237B2 (en) Japanese character recognition device
JPH04287168A (en) Automatic keyword extracting method for filing
JPH05174185A (en) Japanese character recognizing device
JP3151866B2 (en) English character recognition method
JP2746345B2 (en) Post-processing method for character recognition
JPH0514952B2 (en)
JPS60110089A (en) Character recognizer
JPS61251984A (en) Device for recognizing multi-font type character
JP3116452B2 (en) English character recognition device
JP3116453B2 (en) English character recognition device
JPS6318483A (en) Character recognizing method for optical information input device
JPS6198487A (en) Dictionary selecting system
JPH0554071A (en) Digital translation device
JPH064709A (en) Character recognition device
JPH10198763A (en) Character recognizer and computer readable storage medium storing program making computer function as character recognizer

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081029

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091029

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101029

Year of fee payment: 11

EXPY Cancellation because of completion of term