JPH06309503A - English character recognizing device - Google Patents

English character recognizing device

Info

Publication number
JPH06309503A
Authority
JP
Japan
Prior art keywords
character
rectangle
recognition
characters
rectangles
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5119317A
Other languages
Japanese (ja)
Inventor
Ryoichi Yushimo
良一 湯下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP5119317A
Publication of JPH06309503A

Landscapes

  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

PURPOSE: To provide an English character recognizing device that eliminates erroneous integration of broken characters and recognizes English characters with high accuracy. CONSTITUTION: The device comprises a character rectangle detection means 3, which determines a rectangle virtually circumscribing a character as a character rectangle based on runs of connected black pixels in a document image; a broken character integration means 11, which judges character rectangles to be one character and integrates them when the gap between them is narrower than the gaps between other character rectangles; and a character recognition means 9, which recognizes the characters in the character rectangles and in the rectangles integrated by the means 11. A recognition result determining means 15 is added to verify and confirm the results recognized by the means 9 for the characters in the character rectangles, and only the character rectangles not confirmed by the means 15 are integrated by the means 11. Character images in rectangles whose recognition results have been confirmed are thus excluded from the broken-character judgment, which prevents characters other than broken characters from being wrongly integrated even in a document with a small character pitch.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to an English character recognizing device that recognizes English characters in a document image, and in particular to one configured to correctly recognize "broken characters", that is, characters split partway by faint printing.

[0002]

[Prior Art] In recent years, character recognition devices have come into use as input devices for entering characters into computers and the like. Their field of application has expanded greatly, and highly accurate character recognition is required.

[0003] As shown in FIG. 7, a conventional English character recognizing device comprises:
  • an image input means 1 that inputs the document to be recognized as a binarized document image;
  • an image storage means 2 that stores the input document image;
  • a character rectangle detection means 3 that obtains rectangles virtually circumscribing characters, based on runs of connected black pixels in the document image;
  • a character rectangle storage means 4 that stores the coordinate data of each obtained character rectangle on the document image together with the serial number given to each character rectangle;
  • a word segmentation means 5 that obtains the horizontal gaps between rectangles from the coordinate data stored in the character rectangle storage means 4, detects wide gaps as word breaks, and obtains the character rectangle numbers that make up each word;
  • a word character rectangle storage means 6 that stores together the numbers of the character rectangles making up each word;
  • a graphic feature extraction means 7 that retrieves from the character rectangle storage means 4 the coordinate data of the character rectangle corresponding to a character rectangle number, uses it to retrieve the character image from the image storage means 2, and extracts the distribution of black pixels to obtain graphic features;
  • a recognition dictionary means 8 that stores the graphic features of the characters that make up document images;
  • a character recognition means 9 that compares the graphic features in a character rectangle with those stored in the recognition dictionary means 8, obtains the difference between them, and takes the character whose features differ least as the recognition result;
  • a recognition result storage means 10 that stores the recognition result of each character rectangle;
  • a broken character integration means 11 that judges adjacent rectangles to be one character and integrates them when the gap between them is narrower than the other character gaps, has the graphic feature extraction means 7 obtain the graphic features in the integrated rectangle, and sends them to the character recognition means 9 to obtain a recognition result; and
  • a display processing means 12 that displays the recognition results divided into words based on the word information.

[0004] The operation of this English character recognizing device is described with reference to FIG. 4. First, the image input means 1 inputs the document to be recognized as a binary image and stores it in the image storage means 2. FIG. 4(a) shows an example of a document image 30. The stored data consists of binary image data indicating whether each point at an (X, Y) coordinate in the document image 30 is white or black.

[0005] Next, the character rectangle detection means 3 obtains the rectangles (31 to 49) virtually circumscribing the characters, based on runs of connected black pixels in the document image, and stores the coordinates of the character rectangles in the character rectangle storage means 4 (FIG. 4(b)).
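The rectangle detection described here can be illustrated with a connected-component search. The following is only a minimal sketch under stated assumptions, not the patented implementation: the pixel encoding (1 = black, 0 = white), the use of 8-connectivity, and the function name `character_rectangles` are all hypothetical choices that the source does not specify.

```python
from collections import deque


def character_rectangles(image):
    """Return bounding boxes (left, top, right, bottom) of connected black pixels.

    `image` is a list of rows; 1 = black, 0 = white (assumed encoding).
    8-connectivity is an assumption; the source only speaks of runs of
    connected black pixels.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected cluster.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    # Order the boxes left to right so that serial numbers follow reading
    # order within a single text line.
    boxes.sort(key=lambda box: box[0])
    return boxes
```

Each returned tuple corresponds to one character rectangle; its position in the list could serve as the serial number stored alongside the coordinates.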

[0006] Based on the coordinates of the obtained character rectangles, the broken character integration means 11 integrates broken characters (here, the single character "h", which appears split between rectangle 44 and rectangle 45) and stores the result in the character rectangle storage means 4.

[0007] Broken characters are integrated by examining the gaps between character rectangles in the x-coordinate direction: a gap (gap 51 between rectangles 44 and 45) that is narrower than a normal inter-character gap (for example, gap 50 between rectangles 41 and 42) is treated as a gap within a broken character, and the rectangles on either side of it are integrated.
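As a rough illustration of this conventional gap rule, the sketch below merges neighbouring rectangles whose gap falls well below the typical gap. The source does not say how "narrower than a normal gap" is measured, so the median-based threshold, the `ratio` parameter, and the function name `merge_broken_characters` are assumptions made only for this example.

```python
from statistics import median


def merge_broken_characters(boxes, ratio=0.5):
    """Merge horizontally adjacent boxes separated by an unusually narrow gap.

    Boxes are (left, top, right, bottom). A gap counts as "narrow" when it is
    below ratio * median gap; both the median and the ratio are assumptions.
    """
    boxes = sorted(boxes, key=lambda box: box[0])
    if len(boxes) < 2:
        return boxes
    gaps = [nxt[0] - cur[2] for cur, nxt in zip(boxes, boxes[1:])]
    threshold = ratio * median(gaps)
    merged = [boxes[0]]
    for box, gap in zip(boxes[1:], gaps):
        if gap < threshold:
            # Replace the previous box with the union of the two boxes.
            prev = merged[-1]
            merged[-1] = (min(prev[0], box[0]), min(prev[1], box[1]),
                          max(prev[2], box[2]), max(prev[3], box[3]))
        else:
            merged.append(box)
    return merged
```

Because the rule looks only at gap widths, any pair of tightly spaced rectangles is merged, whether or not it is actually a broken character.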

[0008] Based on the coordinates stored in the character rectangle storage means 4, the word segmentation means 5 obtains the horizontal gaps between rectangles, detects wide gaps as word breaks, obtains the character rectangle numbers that make up each word, and stores those numbers together, word by word, in the word character rectangle storage means 6. FIG. 6 shows the storage format.

[0009] Then, based on the coordinate data stored in the character rectangle storage means 4, all character images are retrieved from the image storage means 2 and sent to the graphic feature extraction means 7, which extracts their graphic features. The character recognition means 9 obtains recognition results for all characters and stores them in the recognition result storage means 10.

[0010] The display processing means 12 displays the recognition result of each character image stored in the recognition result storage means 10, divided into words based on the word information stored in the word character rectangle storage means 6.

[0011] Through the above processing, a document image containing broken characters is recognized.

[0012]

[Problem to Be Solved by the Invention] However, when integrating broken characters, the conventional English character recognizing device integrates every character rectangle that has a narrow gap to an adjacent character rectangle. Consequently, when the document to be recognized has a narrow character pitch, characters other than broken characters are also integrated.

[0013] The present invention solves this conventional problem, and its object is to provide an English character recognizing device that eliminates erroneous integration of broken characters and recognizes English characters with high accuracy.

[0014]

[Means for Solving the Problem] The present invention therefore concerns an English character recognizing device comprising: a character rectangle detection means that obtains, as a character rectangle, a rectangle virtually circumscribing a character based on runs of connected black pixels in a document image; a broken character integration means that judges character rectangles to be one character and integrates them when the gap between them is narrower than the gaps between other character rectangles; and a character recognition means that recognizes the characters in the character rectangles and in the rectangles integrated by the broken character integration means. In this device, a recognition result determining means is provided that verifies and confirms the results recognized by the character recognition means for the characters in the character rectangles, and the integration performed by the broken character integration means is limited to the character rectangles not confirmed by the recognition result determining means.

[0015] The device is further provided with a character pattern classifying means that classifies the character rectangles in the document image by character type, by overlaying the character image in one character rectangle on the character image in another pixel by pixel, and a word dictionary means that stores the spellings of English words. When word strings obtained from the recognition results of the character recognition means match word strings stored in the word dictionary means in a plurality of words that contain character rectangles classified into the same character type by the character pattern classifying means, the recognition result determining means confirms the recognition result of the character rectangles classified into that character type.

[0016]

[Operation] As a result, character images in character rectangles whose recognition results have been confirmed are excluded from the judgment of whether a character is broken. Therefore, even in a document with a narrow character pitch, characters other than broken characters are not erroneously integrated.

[0017] A recognition result is confirmed as correct when a plurality of words containing characters recognized as the same character in the document image are judged to be spelled correctly.

[0018]

[Embodiment] As shown in FIG. 1, the English character recognizing device according to an embodiment of the present invention comprises a character pattern classifying means 13 that overlays the character images contained in a document and classifies them according to whether they are the same character type; a word dictionary means 14 that stores the spellings of English words; and a recognition result determining means 15 that checks a recognized word string against the word strings stored in the word dictionary means 14 to judge whether the word is spelled correctly and, if they match, confirms the recognition result.

[0019] The other functional blocks are the same as in the conventional device (FIG. 7).

[0020] The character pattern classifying means 13 retrieves the coordinate data of the character rectangles from the character rectangle storage means 4 and uses it to retrieve the character images from the image storage means 2. It overlays all character images in the document on one another pixel by pixel, judges whether two images are the same character type from the ratio of matching pixels to the number of pixels in the character rectangle, and classifies them as the same character type when that ratio is large.

[0021] The recognition result determining means 15 retrieves, from the word character rectangle storage means 6 and the recognition result storage means 10, the character rectangle numbers that make up each word and the recognition results of the rectangle groups to which those character rectangles belong. It checks the resulting word string against the word strings stored in the word dictionary means 14 to judge whether the word is spelled correctly. If the recognition result of the character rectangles belonging to one rectangle group is consistent with a plurality of correctly spelled words, it confirms the recognition result of that rectangle group and records this in the recognition result confirmation information of the recognition result storage means 10.

[0022] The recognition result storage means 10, which receives information from the character pattern classifying means 13 and the recognition result determining means 15 as well as from the character recognition means 9, stores the character rectangle numbers of each same-character-type group classified by the character pattern classifying means 13, the serial number given to each such group, the recognition result obtained by the character recognition means 9 for the group's representative rectangle, and the recognition result confirmation information.

[0023] The broken character integration means 11 examines only the character rectangles not confirmed by the recognition result determining means 15. When the gap between adjacent rectangles is narrower than the other character gaps, it judges them to be one character and integrates them, has the graphic feature extraction means 7 obtain the graphic features in the integrated rectangle, and sends them to the character recognition means 9 to obtain the recognition result.

[0024] As shown in FIG. 2, the device block diagram representing the hardware configuration of this English character recognizing device consists of an image input device 16, such as a scanner, that reads the document to be recognized as a binarized document image; a central processing unit (CPU) 17 that controls the whole device; a read-only memory (ROM) 19 that stores data permanently; a random access memory (RAM) 23 that temporarily stores working data; a keyboard 18 for giving the CPU 17 external commands such as start and stop; a display device 28 that displays the recognition results; and a bus line 29 that carries internal signals between these components.

[0025] The ROM 19 contains a control program 20 for the CPU 17, recognition dictionary data 21, and word dictionary data 22. The RAM 23 contains image storage data 24, character rectangle storage data 25, word character rectangle storage data 26, and recognition result storage data 27.

[0026] The operation of this English character recognizing device is described with reference to the flowchart of FIG. 3.
Step 1: The image input means 1 inputs the document to be recognized and stores it in the image storage means 2 as a binarized document image.
Step 2: The character rectangle detection means 3 detects runs of connected black pixels in the document image, treats each connected cluster of black pixels as a character, and obtains the rectangle virtually circumscribing that character. The coordinate data of the resulting rectangles and the serial number given to each character rectangle are stored as character rectangle information in the character rectangle storage means 4. FIG. 5(a) shows the character rectangles (31 to 49) obtained in Step 2, and FIG. 5(b) shows the storage format 52 of the character rectangle information in the character rectangle storage means 4.

[0027] Step 3: Based on the stored coordinates of the character rectangles, the word segmentation means 5 obtains the horizontal gaps between rectangles, detects gaps wider than the average character gap as word breaks, and obtains the character rectangles that make up each word. The numbers of the character rectangles making up a word are stored together, word by word, in the word character rectangle storage means 6. FIG. 6 shows the storage format 61 of the word character rectangles.
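Step 3 can be sketched as follows. This is an illustrative sketch only: the source says a gap wider than the average character gap marks a word break, but it does not say how that average is computed, so taking the mean over all gaps in the line (word gaps included) is an assumption, as is the function name `split_into_words`.

```python
def split_into_words(boxes):
    """Group rectangle indices into words.

    Boxes are (left, top, right, bottom) for a single text line. A gap wider
    than the mean of all gaps (an assumed definition of "average") starts a
    new word. Returns a list of words, each a list of indices into `boxes`.
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    if not order:
        return []
    gaps = [boxes[b][0] - boxes[a][2] for a, b in zip(order, order[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0
    words = [[order[0]]]
    for index, gap in zip(order[1:], gaps):
        if gap > mean_gap:
            words.append([index])      # wide gap: word break
        else:
            words[-1].append(index)    # normal gap: same word
    return words
```

The per-word index lists play the role of the rectangle numbers that are stored together for each word.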

[0028] Step 4: Next, the character pattern classifying means 13 retrieves the coordinate data of the character rectangles from the character rectangle storage means 4 and uses it to retrieve the character images from the image storage means 2. It overlays all character images in the document on one another pixel by pixel, judges whether two images are the same character type from the ratio of matching pixels to the number of pixels in the character rectangle, and classifies them as the same character type when that ratio is large.

[0029] This process is described with reference to FIG. 5(a). Reference numeral 53 shows character image "A" 55 and character image "B" 54 overlaid on each other. The pixels 56 where 55 and 54 overlap are counted separately in areas 57, 58, and 59, which divide the character height into three, giving the number of matching pixels in each area. The number of matching pixels in each of areas 57 to 59 is then divided by the size of that area, and if the value is sufficiently close to 1 in all of areas 57 to 59, the character images are regarded as matching.
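The three-area overlay test can be sketched as below. Several points are assumptions made for illustration: the two glyph images are taken to have identical dimensions, a "matching" pixel is interpreted as one where both images have the same value (black or white), and "sufficiently close to 1" is represented by a hypothetical `threshold` of 0.9. The function name `same_character_type` is likewise not from the source.

```python
def same_character_type(image_a, image_b, threshold=0.9):
    """Overlay two equal-sized binary glyphs and compare them band by band.

    Each image is a list of rows of 0/1 values. The glyph height is divided
    into three horizontal bands; in every band the fraction of agreeing
    pixels must reach `threshold` (an assumed value).
    """
    height = len(image_a)
    if height < 3 or height != len(image_b):
        return False
    width = len(image_a[0])
    if width == 0 or any(len(row) != width for row in image_a + image_b):
        return False
    # Boundaries of the three bands covering the full glyph height.
    bounds = [0, height // 3, 2 * height // 3, height]
    for top, bottom in zip(bounds, bounds[1:]):
        agreeing = sum(
            1
            for y in range(top, bottom)
            for x in range(width)
            if image_a[y][x] == image_b[y][x]
        )
        area = (bottom - top) * width
        if agreeing / area < threshold:
            return False
    return True
```

Checking each band separately means that two glyphs differing only in one part of their height, such as the bottom third, are not classified as the same character type even when most pixels agree overall.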

[0030] The character rectangles 31 to 49 in the document image 30 are compared with one another and classified as follows.

[0031]
  Character rectangle group serial number    Character rectangle numbers
  1                                          31, 36, 41, 46
  2                                          32, 33, 37, 38, 43
  3                                          34, 39, 42
  4                                          35, 48
  5                                          40
  6                                          44
  7                                          45
  8                                          47
  9                                          49

Step 5: Recognition is performed using one rectangle of each classified character rectangle group as its representative pattern. In the recognition process, the graphic feature extraction means 7 extracts graphic features from the character image in the character rectangle, the character recognition means 9 compares them with the graphic features stored in the recognition dictionary means 8 to obtain the difference between them, and the character whose features differ least is taken as the recognition result.
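The minimum-difference matching of Step 5 can be sketched as follows. The source states only that graphic features are derived from the distribution of black pixels, so the specific feature used here (the fraction of black pixels in each row and column of equal-sized glyphs), the sum of absolute differences, and the names `black_pixel_profile` and `recognize` are all hypothetical.

```python
def black_pixel_profile(image):
    """Feature vector: fraction of black pixels per row, then per column.

    This particular feature is an assumption chosen for illustration.
    """
    height, width = len(image), len(image[0])
    rows = [sum(row) / width for row in image]
    cols = [sum(image[y][x] for y in range(height)) / height
            for x in range(width)]
    return rows + cols


def recognize(image, dictionary):
    """Return the dictionary character whose stored features differ least.

    `dictionary` maps a character to its stored feature vector.
    """
    features = black_pixel_profile(image)

    def difference(stored):
        return sum(abs(a - b) for a, b in zip(features, stored))

    return min(dictionary, key=lambda ch: difference(dictionary[ch]))
```

Since only one representative per group is recognized, the cost of this step grows with the number of groups rather than with the number of rectangles.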

[0032] The recognition result of each character rectangle group is shown below.

[0033]
  Character rectangle group serial number    Recognition result
  1                                          'a'
  2                                          'p'
  3                                          'l'
  4                                          'e'
  5                                          'y'
  6                                          'I'
  7                                          'l'
  8                                          'b'
  9                                          't'

The classification results and recognition results obtained in Step 4 and Step 5 are stored in the recognition result storage means 10. FIG. 5(b) shows the storage format.

[0034] Step 6: Next, the recognition results are confirmed. The recognition result determining means 15 retrieves the character rectangle numbers that make up each word from the word character rectangle storage means 6 and the recognition results of the corresponding character rectangles from the recognition result storage means 10. It checks the word string formed from the recognition results against the word strings stored in the word dictionary means 14 to judge whether the word is spelled correctly. If the recognition result of the character rectangles belonging to one rectangle group is consistent in a plurality of correctly spelled words, it confirms the recognition result of that character rectangle group and records this in the recognition result confirmation information of the recognition result storage means 10.

[0035] This process is described concretely with reference to FIG. 4(a).

[0036] For the word beginning with character rectangle 31, the character rectangles, the character rectangle groups to which they belong, and the recognition results are as follows:

  Character rectangle          31    32    33    34    35
  Character rectangle group     1     2     2     3     4
  Recognition result           'a'   'p'   'p'   'l'   'e'

The spelling 'apple' is a correct spelling.

[0037] For the word beginning with character rectangle 36, the character rectangles, the character rectangle groups to which they belong, and the recognition results are as follows:

  Character rectangle          36    37    38    39    40
  Character rectangle group     1     2     2     3     5
  Recognition result           'a'   'p'   'p'   'l'   'y'

The spelling 'apply' is also a correct spelling.

[0038] Accordingly, because correct spellings were obtained in a plurality of words, the recognition results of character rectangle groups 1, 2, and 3 are confirmed as 'a', 'p', and 'l', respectively.
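The confirmation rule of Step 6 can be traced with the groups and recognition results listed in paragraphs [0031] and [0033]. The sketch below is an illustration under assumptions: the function name `confirm_groups`, the `min_words` parameter (set to 2 to represent "a plurality of words"), and the contents of the example dictionary are hypothetical.

```python
def confirm_groups(words, group_of, result_of, dictionary, min_words=2):
    """Return the group numbers whose recognition results are confirmed.

    words:      list of words, each a list of character rectangle numbers
    group_of:   character rectangle number -> group serial number
    result_of:  group serial number -> recognized character
    dictionary: set of correctly spelled words (assumed contents)
    A group is confirmed when it occurs in at least `min_words` words whose
    recognized spelling is found in the dictionary.
    """
    support = {}
    for rects in words:
        spelling = "".join(result_of[group_of[r]] for r in rects)
        if spelling not in dictionary:
            continue
        for group in {group_of[r] for r in rects}:
            support[group] = support.get(group, 0) + 1
    return {group for group, count in support.items() if count >= min_words}


# Groups and results taken from paragraphs [0031] and [0033].
groups = {1: [31, 36, 41, 46], 2: [32, 33, 37, 38, 43], 3: [34, 39, 42],
          4: [35, 48], 5: [40], 6: [44], 7: [45], 8: [47], 9: [49]}
group_of = {r: g for g, rects in groups.items() for r in rects}
result_of = {1: "a", 2: "p", 3: "l", 4: "e", 5: "y",
             6: "I", 7: "l", 8: "b", 9: "t"}
words = [list(range(31, 36)), list(range(36, 41)), list(range(41, 50))]
confirmed = confirm_groups(words, group_of, result_of,
                           {"apple", "apply", "alphabet"})
```

With this data, the third word is read as 'alpIlabet' because rectangles 44 and 45 are still separate, so it contributes no support, and only groups 1, 2, and 3 reach two supporting words.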

[0039] Step 7: The broken character integration process is then applied to the character images that were not confirmed in Step 6. The broken character integration means 11 examines the positional relationship of the unconfirmed character rectangles and, when the gap between adjacent rectangles is narrower than the other character gaps, judges them to be one character and integrates them.
Step 8: The graphic features in the integrated rectangle are then obtained by the graphic feature extraction means 7 and sent to the character recognition means 9, and the resulting recognition result is stored in the recognition result storage means 10.

[0040] This process is described concretely with reference to FIG. 4(a).

[0041] In the word beginning with character rectangle 41, the recognition results of character rectangles 41, 42, 43, and 46 have been confirmed as described above, so these rectangles cannot be broken characters and are excluded from integration. The gaps eligible for broken-character integration are thus limited to three: between character rectangles 44 and 45, 47 and 48, and 48 and 49. Of these, only gap 51 between 44 and 45 is much narrower than the average character gap, so character rectangles 44 and 45 are judged likely to be a single character; they are integrated and recognized, yielding 'h'.
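The restriction to unconfirmed rectangles can be sketched as follows. The coordinates are hypothetical values invented for this example (the source gives none), and they deliberately include a tight gap between the confirmed rectangles 41 and 42 to show the effect of the restriction. The `ratio` parameter standing in for "much narrower than the average gap" and the function name `gaps_to_merge` are also assumptions.

```python
def gaps_to_merge(rects, boxes, confirmed, ratio=0.5):
    """Return adjacent rectangle pairs that should be merged.

    rects:     rectangle numbers of one word, in left-to-right order
    boxes:     rectangle number -> (left, top, right, bottom)
    confirmed: set of rectangle numbers with confirmed recognition results
    A pair qualifies only if neither rectangle is confirmed and its gap is
    below ratio * mean gap of the word (ratio is an assumed value).
    """
    gaps = {(a, b): boxes[b][0] - boxes[a][2] for a, b in zip(rects, rects[1:])}
    if not gaps:
        return []
    mean_gap = sum(gaps.values()) / len(gaps)
    return [pair for pair, gap in gaps.items()
            if pair[0] not in confirmed and pair[1] not in confirmed
            and gap < ratio * mean_gap]


# Hypothetical coordinates for rectangles 41 to 49 (not from the source).
boxes = {41: (0, 0, 6, 10), 42: (7, 0, 13, 10), 43: (16, 0, 22, 10),
         44: (25, 0, 27, 10), 45: (28, 0, 31, 10), 46: (34, 0, 40, 10),
         47: (43, 0, 49, 10), 48: (52, 0, 58, 10), 49: (61, 0, 67, 10)}
pairs = gaps_to_merge(list(range(41, 50)), boxes, confirmed={41, 42, 43, 46})
```

In this made-up layout the gap between 41 and 42 is as narrow as the one between 44 and 45, yet only the pair (44, 45) is returned, because 41 and 42 already have confirmed results.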

[0042] Step 9: The recognition results obtained by the above processing are displayed divided into words, based on the word character rectangle storage means 6 and the recognition result storage means 10.

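[0043] Thus, the English character recognizing device of the embodiment classifies character types by overlaying the character images in the document image, and confirms the recognition result of a classified character group when a plurality of words containing it are spelled correctly. Character images whose recognition results have been confirmed are not subjected to broken-character integration; integration is applied only to the remaining character images, and the integrated images are then recognized.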
Character types are classified by overlapping characters in a document image, and a recognition result of the classified character group is determined when a plurality of words are spelled correctly. Then, the character image for which the recognition result has been confirmed is not subjected to the cut character integration processing, but the cut character integration processing is performed for the character images excluding it, and the integrated image recognition processing is performed.

[0044] The recognition results may also be confirmed by a simpler method than the one shown in the embodiment, for example by confirming a result as correct when the document image contains at least a certain number of character images of the same character type.
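This simplified alternative amounts to a count threshold per group. In the sketch below, the threshold value of 3 and the function name `confirm_by_frequency` are assumptions; the source says only "a certain number or more". The group data is taken from paragraph [0031].

```python
def confirm_by_frequency(groups, min_count=3):
    """Confirm every group containing at least `min_count` rectangles.

    groups: group serial number -> list of character rectangle numbers.
    `min_count` is an assumed threshold.
    """
    return {group for group, rects in groups.items() if len(rects) >= min_count}


# Group membership as listed in paragraph [0031].
groups = {1: [31, 36, 41, 46], 2: [32, 33, 37, 38, 43], 3: [34, 39, 42],
          4: [35, 48], 5: [40], 6: [44], 7: [45], 8: [47], 9: [49]}
frequent = confirm_by_frequency(groups)
```

With a threshold of 3, this selects groups 1, 2, and 3, the same groups confirmed by the dictionary check in the embodiment; a different threshold would give a different set, and no word dictionary is consulted.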

[0045]

[Effect of the Invention] As is clear from the above description of the embodiment, the English character recognizing device of the present invention confirms recognition results and excludes the confirmed characters from broken-character integration. This reduces erroneous integration even when recognizing a document with a narrow character pitch, and makes it possible to recognize English characters with high accuracy.

[Brief Description of the Drawings]

[FIG. 1] Functional block diagram of the English character recognizing device according to an embodiment of the present invention.

[FIG. 2] Device block diagram of the English character recognizing device according to the embodiment of the present invention.

[FIG. 3] Flowchart showing the operating procedure of the English character recognizing device of the embodiment.

[FIG. 4] Diagram showing the character rectangles of a document image (a) and the storage format of the rectangle numbers in the character rectangle storage means (b).

[FIG. 5] Diagram showing character images overlaid to judge whether they are the same character type (a) and the storage format of the recognition results (b).

[FIG. 6] Diagram showing the storage format of the word character rectangles.

[FIG. 7] Functional block diagram of a conventional English character recognizing device.

[Explanation of Symbols]

1 Image input means
2 Image storage means
3 Character rectangle detection means
4 Character rectangle storage means
5 Word cutout means
6 Word character rectangle storage means
7 Graphic feature extraction means
8 Recognition dictionary means
9 Character recognition means
10 Recognition result storage means
11 Broken character integration means
12 Display processing means
13 Character pattern classification means
14 Word dictionary means
15 Recognition result determining means
16 Image input device
17 CPU
18 Keyboard
19 ROM
20 Control program
21 Recognition dictionary data
22 Word dictionary data
23 RAM
24 Image storage data
25 Character rectangle storage data
26 Word character rectangle storage data
27 Recognition result storage data
28 Display device
29 Bus
30 Document image
31 to 49 Character rectangles
50 Character rectangle interval
51 Character rectangle interval of a broken character
53 Superimposed character image
54, 55 Individual character images
56 Overlapping pixels
57 to 59 Areas

Claims (2)

[Claims]
1. An English character recognition device comprising: character rectangle detection means for obtaining, as a character rectangle, a rectangle that virtually circumscribes a character based on a run of connected black pixels in a document image; broken character integration means for performing integration processing by judging character rectangles to be one character when the interval between them is narrower than the intervals between other character rectangles; and character recognition means for recognizing the characters in the character rectangles and the characters in the rectangles integrated by the broken character integration means, wherein recognition result determining means is provided for verifying and confirming the result recognized by the character recognition means for the characters in the character rectangles, and the target of the integration processing in the broken character integration means is limited to only those character rectangles that were not confirmed by the recognition result determining means.
2. The English character recognition device according to claim 1, further comprising: character pattern classification means for classifying the character rectangles in the document image by character type by superimposing the character image in each character rectangle on the character images in other character rectangles pixel by pixel; and word dictionary means for storing the spellings of English words, wherein, when the word character strings obtained from the recognition results of the character recognition means match word character strings stored in the word dictionary means for a plurality of words that contain character rectangles classified into the same character type by the character pattern classification means, the recognition result determining means confirms the recognition results of the character rectangles classified into that same character type.
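As a reading aid for claim 2, the following sketch shows one way pixel-by-pixel superimposition and dictionary-based confirmation could fit together. It is not the patented implementation. The overlap measure (shared black pixels divided by the union), the threshold `min_ratio`, the greedy grouping, the requirement of `min_words=2` matching words, and the assumption that all character images are already normalized to the same size are choices made for the example.

```python
def overlap_ratio(a, b):
    """a, b: equally sized binary images given as rows of 0/1 values.
    Returns shared black pixels divided by black pixels in either image."""
    both = sum(pa & pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    either = sum(pa | pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return both / either if either else 1.0


def classify(images, min_ratio=0.9):
    """Greedy grouping: an image joins the first class whose representative
    overlaps it by at least min_ratio, otherwise it starts a new class."""
    reps, labels = [], []
    for img in images:
        for k, rep in enumerate(reps):
            if overlap_ratio(img, rep) >= min_ratio:
                labels.append(k)
                break
        else:
            reps.append(img)
            labels.append(len(reps) - 1)
    return labels


def confirm_classes(words, dictionary, min_words=2):
    """words: list of (recognized text, [class label of each character]).
    A class is confirmed when it appears in at least min_words words
    whose recognized spelling is found in the dictionary."""
    hits = {}
    for text, labels in words:
        if text.lower() in dictionary:
            for k in set(labels):
                hits[k] = hits.get(k, 0) + 1
    return {k for k, n in hits.items() if n >= min_words}


A = ((1, 1), (1, 0))
B = ((0, 1), (0, 1))
labels = classify([A, B, A])

words = [("the", [0, 1, 2]), ("then", [0, 1, 2, 3]), ("tbe", [0, 4, 2])]
confirmed = confirm_classes(words, dictionary={"the", "then"})
```

In this example classes 0, 1 and 2 each occur in two correctly spelled words and are confirmed, while class 3 (seen in only one matching word) and class 4 (seen only in the misspelled "tbe") remain candidates for broken character integration.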
JP5119317A 1993-04-23 1993-04-23 English character recognizing device Pending JPH06309503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5119317A JPH06309503A (en) 1993-04-23 1993-04-23 English character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5119317A JPH06309503A (en) 1993-04-23 1993-04-23 English character recognizing device

Publications (1)

Publication Number Publication Date
JPH06309503A true JPH06309503A (en) 1994-11-04

Family

ID=14758462

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5119317A Pending JPH06309503A (en) 1993-04-23 1993-04-23 English character recognizing device

Country Status (1)

Country Link
JP (1) JPH06309503A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000132690A (en) * 1998-10-22 2000-05-12 Xerox Corp Image processing method and image processor using image division by making token
JP2009199116A (en) * 2008-02-19 2009-09-03 Fuji Xerox Co Ltd Image processor and image processing program

Similar Documents

Publication Publication Date Title
US6226402B1 (en) Ruled line extracting apparatus for extracting ruled line from normal document image and method thereof
JP2726656B2 (en) Pattern classification means used for pattern recognition method
US6212299B1 (en) Method and apparatus for recognizing a character
JPH07141463A (en) Detection of mechanically printed amount of money in binary-coded image
EP0045803A1 (en) System and method for processing horizontal line characteristics in an image
JPH04195692A (en) Document reader
JP2002203207A (en) Character recognizing method and program, and recording medium
US6324302B1 (en) Method and a system for substantially eliminating erroneously recognized non-solid lines
JPH0950527A (en) Frame extracting device and rectangle extracting device
JPH06309503A (en) English character recognizing device
JPH06180771A (en) English letter recognizing device
JP3607753B2 (en) Document image region dividing method and apparatus, and column type discrimination method and apparatus
JP2917427B2 (en) Drawing reader
JPH06187489A (en) Character recognizing device
JPH0728935A (en) Document image processor
US10878271B2 (en) Systems and methods for separating ligature characters in digitized document images
JPH117493A (en) Character recognition processor
CN115731250A (en) Text segmentation method, device, equipment and storage medium
JP2917394B2 (en) Character recognition device and character segmentation method
KR20220168787A (en) Method to extract units of Manchu characters and system
JP2795222B2 (en) Character extraction method and character extraction device
JPH06119484A (en) Character recognizing device
JPH0981685A (en) Occidental character recognition device and occidental character recognition method
JP2004246929A (en) Method and apparatus of dividing domains in document image
JPH0696277A (en) Alphabet recognizing device