JPH03198181A - Post-processing method for character recognition - Google Patents
Post-processing method for character recognition
Info
- Publication number
- JPH03198181A
- Authority
- JP
- Japan
- Prior art keywords
- character
- recognition
- rule
- character recognition
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Character Discrimination (AREA)
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION
[Field of Industrial Application]
The present invention relates to a Japanese character recognition device, and in particular to post-processing of character recognition results obtained by matching input characters against a dictionary.
In character recognition, a character image is generally normalized to a fixed size before it is matched against the dictionary. As a result, characters that have the same shape but different sizes cannot be distinguished, for example the full-size 「あ」 and the small 「ぁ」, or the full-size 「ヤ」 and the small 「ャ」.
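The effect of size normalization can be sketched as follows. This is a minimal illustration, not the device's actual preprocessing: the bitmaps, the helper names `crop_to_ink` and `normalize`, and the nearest-neighbour resampling are all assumptions made for the example. It shows two glyphs with the same shape but different sizes becoming identical once each is cropped and scaled to a fixed grid.

```python
def crop_to_ink(bitmap):
    """Cut a rectangular bitmap (list of equal-length strings) down to
    the bounding box of its ink pixels, marked '#'."""
    rows = [r for r, line in enumerate(bitmap) if "#" in line]
    cols = [c for line in bitmap for c, ch in enumerate(line) if ch == "#"]
    top, bottom = rows[0], rows[-1]
    left, right = min(cols), max(cols)
    return [line[left:right + 1] for line in bitmap[top:bottom + 1]]


def normalize(bitmap, size=4):
    """Scale the ink bounding box to size x size by nearest-neighbour sampling."""
    box = crop_to_ink(bitmap)
    h, w = len(box), len(box[0])
    return [
        "".join(box[y * h // size][x * w // size] for x in range(size))
        for y in range(size)
    ]


# Hypothetical glyphs: same shape, one filling the cell, one small and low.
LARGE = [
    "......",
    ".##...",
    ".##...",
    ".####.",
    ".####.",
    "......",
]
SMALL = [
    "......",
    "......",
    "......",
    "...#..",
    "...##.",
    "......",
]

print(normalize(LARGE) == normalize(SMALL))
```

After normalization the size and position information is gone, which is why the dictionary matching alone cannot tell the two glyphs apart.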
To address this problem, methods have been devised that use the position and size of characters to distinguish full-size characters, small characters, punctuation marks, and the like.
Known examples include a method that determines a character from its position relative to the character to its left, as described in Japanese Examined Patent Publication No. Sho 60-1676; a method that determines a character from its position within an entry box of a form, as described in Japanese Examined Patent Publication No. Sho 60-9314; and a method that determines a character from the sequence, positions, and sizes of N surrounding characters, as described in Japanese Unexamined Patent Publication No. Sho 63-78287.
However, because character size and printing position vary from font to font, these methods cannot be said to solve the problem when a wide variety of documents must be recognized.
Accordingly, an object of the present invention is to provide a post-processing method that corrects, by simple processing, errors in character recognition results obtained by matching against a character dictionary, in particular the easily confused errors between full-size and small characters.
The present invention is characterized in that a character recognition result obtained by matching against a character dictionary is corrected by checking it against a character sequence rule that describes a specific character together with the characters that may, or may not, precede it.
For example, the small kana 「ゃ」, 「ゅ」, and 「ょ」 never appear at the beginning of a sentence; they always follow some full-size character. Moreover, not just any full-size character may precede them; only a limited set is permitted. In other words, there is a rule governing these character sequences.
Therefore, for specific characters such as small kana, whose preceding characters are restricted, a rule describing the characters that may (or may not) precede them is prepared and checked against the character recognition result. In this way, misrecognitions between full-size and small characters that violate the rule can be corrected simply and reliably.
Fig. 1 is a schematic block diagram of a kanji OCR according to an embodiment of the present invention, and Fig. 2 is a schematic flowchart of its processing.
In Fig. 1, image data of the document to be processed is input from the image input unit 10 and stored in the image memory 11 (first processing step in Fig. 2). The line/character segmentation unit 12 performs line segmentation and character segmentation on this input image data, and the segmented character image data is stored in the character image memory 13 (second processing step). The character recognition unit 14 applies preprocessing such as size normalization and then feature extraction to this character image data. The extracted feature vector is matched against the character dictionary stored in the character dictionary memory 15, and up to N character candidates are selected in ascending order of distance (third processing step).
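The candidate selection in the recognition step can be sketched as a nearest-neighbour ranking. This is only an illustration under stated assumptions: the description says "distance" without naming a metric, so Euclidean distance is assumed here, and the two-dimensional feature vectors, the `DICTIONARY` contents, and the function name `top_n_candidates` are hypothetical.

```python
import math


def top_n_candidates(feature, dictionary, n=3):
    """Return up to n dictionary characters, nearest first.

    dictionary maps each character to its reference feature vector.
    Euclidean distance is an assumption made for this sketch.
    """
    ranked = sorted(dictionary.items(),
                    key=lambda item: math.dist(feature, item[1]))
    return [ch for ch, _ in ranked[:n]]


# Hypothetical reference vectors. After size normalization the full-size
# and small forms of a kana end up very close to each other.
DICTIONARY = {
    "お": (10, 2),
    "ぉ": (10, 3),
    "あ": (2, 9),
}

print(top_n_candidates((10, 3), DICTIONARY, n=2))
```

Because the full-size and small forms sit next to each other in feature space, either one can come out as the first candidate, with the other usually close behind. That ranked list is what the post-processing works on.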
The post-processing unit 17 performs full-size/small character correction on the character recognition result. The processing steps of Fig. 2 that follow character recognition constitute this correction.
The character sequence rules used in this full-size/small character correction are stored in the character sequence rule memory 18.
A character sequence rule describes, for each character whose preceding characters are restricted (small hiragana and katakana), the characters that are permitted to precede it. An example is shown in Table 1.
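One way to hold such rules in memory is a mapping from each restricted character to the set of characters allowed before it. The sketch below is illustrative only: Table 1 is not reproduced in this text, so the rule contents (the i-column kana before 「ゃ」「ゅ」「ょ」) and the names `RULES`, `is_target`, and `may_follow` are assumptions, not the patent's actual table.

```python
# Hypothetical rule contents: small ya/yu/yo may follow i-column kana.
YOON_PREDECESSORS = frozenset("きしちにひみりぎじぢびぴ")
RULES = {small: YOON_PREDECESSORS for small in "ゃゅょ"}


def is_target(ch):
    """A target character is one whose preceding characters are restricted."""
    return ch in RULES


def may_follow(prev, ch):
    """True if ch is acceptable directly after prev.

    prev is None when ch is the first character of the sentence.
    Characters without a rule are always acceptable.
    """
    if not is_target(ch):
        return True
    return prev is not None and prev in RULES[ch]


print(may_follow("き", "ょ"), may_follow("お", "ょ"), may_follow(None, "ょ"))
```

Storing the permitted predecessors as a set keeps the rule check to a single membership test per character.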
Next, the full-size/small character correction is explained using the recognition of the character string 「およぐように歩く」 as an example. Suppose that the character recognition unit 14 has produced the recognition results shown in Table 2. Because character size is not taken into account, 「お」 has been confused with 「ぉ」, 「よ」 with 「ょ」, and 「う」 with 「ぅ」.
First, the first candidates of the recognition result are compared with the character sequence rules in order, advancing until a target character that requires correction appears. When a target character is found, the first candidate of the immediately preceding character is checked against the rule to see whether it is a permitted preceding character. If there is no preceding character (the target character is at the beginning of the sentence) or the preceding character is not a permitted one, the second candidate at the target position is swapped into first place. If the second candidate is itself a small character, the preceding character is checked against the rule for that character in the same way: if it is a permitted preceding character, the second candidate is swapped into first place; otherwise the same processing is applied to the next-ranked candidate.
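The procedure just described can be sketched as a single left-to-right pass. This is a hedged reading, not the patented implementation: the rule contents, the candidate lists for 「およぐように歩く」 (Table 2 is not reproduced here), and the function name `correct` are hypothetical, and leaving a position unchanged when no candidate fits is an assumption the text does not spell out.

```python
# Hypothetical rules: restricted character -> characters allowed before it.
YOON_PREDECESSORS = frozenset("きしちにひみりぎじぢびぴ")
RULES = {small: YOON_PREDECESSORS for small in "ゃゅょ"}
RULES.update({small: frozenset("ふ") for small in "ぁぃぅぇぉ"})


def correct(candidates, rules):
    """Return a corrected copy of the per-position candidate lists.

    candidates[i] lists the candidates for position i, best first.
    At each position the highest-ranked candidate that is either not a
    target character or is permitted after the (already corrected)
    previous first candidate is moved to the front.
    """
    result = [list(c) for c in candidates]
    for i, cands in enumerate(result):
        prev = result[i - 1][0] if i > 0 else None
        for rank, ch in enumerate(cands):
            allowed = rules.get(ch)  # None means ch is not a target
            if allowed is None or (prev is not None and prev in allowed):
                cands.insert(0, cands.pop(rank))  # promote to first place
                break
        # Assumption: if no candidate fits, the original order is kept.
    return result


# Hypothetical recognition result for the example string.
CANDIDATES = [
    ["ぉ", "お"], ["ょ", "よ"], ["ぐ", "く"], ["ょ", "よ"],
    ["ぅ", "う"], ["に", "こ"], ["歩", "少"], ["く", "ぐ"],
]

print("".join(c[0] for c in correct(CANDIDATES, RULES)))
```

Positions are processed in order, so each check sees the previous character after its own correction. A legitimate small kana survives: with candidates `[["き", "さ"], ["ょ", "よ"]]` the first candidates remain 「きょ」.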
In this example, the first candidate 「ぉ」 for character number 1 is a target character. Because it is at the beginning of the sentence, 「ぉ」 is judged to form an abnormal character string and is swapped with the second candidate 「お」. Since this 「お」 is not a target character, processing of character number 1 ends.
The first candidate 「ょ」 for the next character, number 2, is a target character, so the preceding first-candidate character 「お」 is compared with the contents of the character sequence rule. Because 「お」 is not in the rule, 「ょ」 is swapped with the second candidate 「よ」. Since this 「よ」 is not a target character, processing of character number 2 ends.
When the same processing is repeated through character number 8, the misrecognized characters are corrected, giving the recognition result shown in Table 3.
Table 3 (correction results)
Note: underlined characters are those that were corrected.
The character sequence rule may instead describe the characters that cannot precede a target character. For Japanese kana, however, describing the permitted preceding characters, as in this embodiment, keeps the rule set smaller.
Misrecognitions other than full-size/small errors can also be corrected by creating similar rules.
In addition to the full-size/small character correction, the post-processing unit 17 may also perform correction based on validity as a word or sentence.
The recognition result corrected by the post-processing unit 17 in this way is held in the recognition result memory 16 and is output by the output unit 19.
As described above, the present invention makes it possible to reliably correct, by simple processing, errors such as full-size/small confusions in character recognition results obtained by matching against a character dictionary.
Fig. 1 is a schematic block diagram of a kanji OCR according to an embodiment of the present invention, and Fig. 2 is a schematic flowchart of its processing.
- 10: image input unit
- 11: image memory
- 12: line/character segmentation unit
- 13: character image memory
- 14: character recognition unit
- 15: character dictionary memory
- 16: recognition result memory
- 17: post-processing unit
- 18: character sequence rule memory
- 19: output unit
Claims (1)
(1) A post-processing method for character recognition in a character recognition device, characterized in that a character recognition result obtained by matching against a character dictionary is corrected by checking it against a character sequence rule that describes a specific character together with the characters that may, or may not, precede it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1339788A JP2922949B2 (en) | 1989-12-27 | 1989-12-27 | Post-processing method for character recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1339788A JP2922949B2 (en) | 1989-12-27 | 1989-12-27 | Post-processing method for character recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH03198181A true JPH03198181A (en) | 1991-08-29 |
JP2922949B2 JP2922949B2 (en) | 1999-07-26 |
Family
ID=18330814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP1339788A Expired - Lifetime JP2922949B2 (en) | 1989-12-27 | 1989-12-27 | Post-processing method for character recognition |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2922949B2 (en) |
-
1989
- 1989-12-27 JP JP1339788A patent/JP2922949B2/en not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
JP2922949B2 (en) | 1999-07-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20080430 Year of fee payment: 9 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20090430 Year of fee payment: 10 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20100430 Year of fee payment: 11 |
|
EXPY | Cancellation because of completion of term | ||