JPH0696277A - Alphabet recognizing device - Google Patents

Alphabet recognizing device

Info

Publication number
JPH0696277A
Authority
JP
Japan
Prior art keywords
character
unit
recognition
contact
pattern group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4243059A
Other languages
Japanese (ja)
Inventor
Michiaki Nobuoka
道明 信岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP4243059A
Publication of JPH0696277A

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

PURPOSE: To provide an alphabet recognizing device that obtains highly accurate recognition results in a short time by minimizing the influence of contact (touching) characters. CONSTITUTION: The device comprises a character pattern group classifying part 13, which classifies character patterns into pattern groups by a superposition method, and a contact character pattern group defining part 17, which extracts the word character strings containing a character pattern from a pattern group judged to be a contact character, collates them with a word dictionary 18 using the characters other than the contact character pattern as a key, and thereby determines the contact character string.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an English character recognition device that recognizes each character making up a general English-language document.

[0002]

[Prior Art] In recent years, demand has grown for using character recognition devices as input devices for computers and similar equipment, and a character input device that can efficiently produce stable recognition results has become essential to improving the performance of such systems.

[0003] A conventional English character recognition device is described below. FIG. 6 is a block diagram showing the functional configuration of a conventional character recognition device.

[0004] In FIG. 6, 1 is an image input unit that inputs the recognition target document as a binary image; 2 is an image storage unit that stores the document image input by the image input unit 1; 3 is a circumscribed rectangle detection unit that obtains the rectangles circumscribing the characters in the document image stored in the image storage unit 2, based on runs of connected black pixels; 4 is a graphic feature extraction unit that extracts, as a graphic feature, the distribution of black pixels in the image within each rectangle obtained by the circumscribed rectangle detection unit 3; 5 is a character recognition unit that compares the graphic feature obtained by the graphic feature extraction unit 4 with the graphic features of all recognition target characters prepared in advance, outputs a character with similar features as the recognition result, and judges the image in the rectangle to be a contact character when no similar feature exists; 6 is a recognition dictionary that stores the graphic features of all characters to be recognized; 7 is a recognition result storage unit that stores the recognition results output by the character recognition unit 5; and 8 is a contact character division unit that examines the vertical histogram of black pixels in an image judged to be a contact character by the character recognition unit 5 and, taking the portions where the histogram value is small as the contact points between characters, divides the image into individual characters.

[0005] The operation of the character recognition device configured as above is described below. First, the image input unit 1 inputs the recognition target document as a binary image, which is stored in the image storage unit 2. Next, the circumscribed rectangle detection unit 3 obtains the rectangles circumscribing the characters in the stored document image based on runs of connected black pixels, and the graphic feature extraction unit 4 extracts the distribution of black pixels in the image within each rectangle as a graphic feature. The extracted graphic feature is sent to the character recognition unit 5, where it is compared with the graphic features of all recognition target characters prepared in advance as the recognition dictionary. A character with similar features is output as the recognition result; when no similar feature exists, the image in the rectangle is judged to be a contact character. If the character recognition unit 5 obtains a recognition result, the result is stored in the recognition result storage unit 7; if not, the image in the rectangle is sent to the contact character division processing unit 8 as a contact character.

[0006] The contact character division processing unit 8 computes the vertical histogram of black pixels in the image, takes the portions where the histogram value is small as the contact points between characters, and divides the image at those points, thereby separating the touching characters one by one. Each divided image is sent back to the graphic feature extraction unit 4, its graphic feature is extracted, and it is then recognized by the character recognition unit 5.
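The histogram-based division described above can be sketched as follows. This is only a minimal illustration, not the patent's implementation: the image is assumed to be a list of rows of 0/1 values (1 = black), and the function names and the `threshold` parameter are hypothetical.

```python
def column_histogram(image):
    """Count black pixels (1s) in each column of a binary image (list of rows)."""
    return [sum(col) for col in zip(*image)]


def split_at_contact_points(image, threshold):
    """Split a touching-character image into column ranges.

    Assumption: a column whose black-pixel count is below `threshold` is
    treated as a contact point and excluded from both neighbouring pieces.
    Returns a list of (start, end) column ranges, with `end` exclusive.
    """
    hist = column_histogram(image)
    pieces, start = [], None
    for x, count in enumerate(hist):
        if count >= threshold:
            if start is None:
                start = x          # a new piece begins here
        elif start is not None:
            pieces.append((start, x))  # low column closes the current piece
            start = None
    if start is not None:
        pieces.append((start, len(hist)))
    return pieces
```

Because the cut positions depend entirely on the threshold and on how thin the joint between two glyphs happens to be, this kind of split is sensitive to print quality, which is the instability the following paragraphs of the patent point out.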

[0007] Through the above processing, all characters in the document image, including contact characters, were recognized.

[0008]

[Problems to Be Solved by the Invention] In the conventional configuration described above, however, recognition results are obtained by processing one character at a time. When the document image contains a string of touching characters, that is, a contact character, the image is first divided into individual characters on the basis of its graphic features and only then recognized. For documents containing many contact characters, the unstable accuracy of this division therefore leads to low recognition accuracy. In addition, because recognition proceeds character by character, a large number of division and recognition operations are required, which takes considerable processing time and results in poor efficiency.

[0009] The present invention solves the above conventional problems, and its object is to provide an English character recognition device that minimizes the influence of contact characters and obtains highly accurate recognition results in a short time.

[0010]

[Means for Solving the Problems] To achieve this object, the English character recognition device of the present invention comprises, in summary, a character pattern group classification unit that classifies character patterns into pattern groups by a superposition method, and a contact character pattern group determination unit that extracts the word character strings containing a character pattern from a character pattern group judged to be a contact character, collates them with a word dictionary using the characters other than the contact character pattern as a key, and determines the contact character string. Specifically, the device comprises: an image input unit that inputs a recognition target document; an image storage unit that stores the document image input by the image input unit; a circumscribed rectangle detection unit that obtains the rectangles circumscribing the characters in the document image stored in the image storage unit, based on runs of connected black pixels; a word region cutout unit that obtains word regions from the horizontal spacing of the circumscribed rectangles obtained by the circumscribed rectangle detection unit; a character pattern group classification unit that classifies the character patterns within the circumscribed rectangles obtained by the circumscribed rectangle detection unit into pattern groups by the superposition method; a graphic feature extraction unit that extracts, as a graphic feature, the distribution of black pixels within each circumscribed rectangle obtained by the circumscribed rectangle detection unit; a recognition dictionary that stores in advance the graphic features of all recognition target characters; a character recognition processing unit that compares the graphic feature extracted by the graphic feature extraction unit with the recognition dictionary and, when a character with similar features exists in the recognition dictionary, stores that character as the recognition result in the recognition result storage unit described below, and, when no similar feature exists in the recognition dictionary, judges the image within the circumscribed rectangle to be a contact character in which two adjacent characters touch; a contact character pattern group determination unit that extracts the word character strings containing a character pattern from a character pattern group judged to be a contact character, collates them with the word dictionary using the characters other than the contact character pattern as a key, and determines the contact character string; a contact character division processing unit that examines the vertical histogram of black pixels in the images of character pattern groups that the contact character pattern group determination unit could not determine and, taking the portions where the value is small as the contact points between characters, divides the image into individual characters; a recognition result storage unit that stores the recognition results obtained by the character recognition processing unit and the determined character strings obtained by the contact character pattern group determination unit; and a recognition result output unit that outputs the recognition result of the recognition target document based on the results stored in the recognition result storage unit and the word regions cut out by the word region cutout unit.

[0011]

[Operation] With this configuration, only the representative pattern of each classified character pattern group is recognized, and for contact characters the recognition result is obtained by collation with the word dictionary, using as a key the non-touching characters in the same word, which are recognized with high confidence. Highly accurate recognition results can therefore be obtained in a short time.

[0012]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0013] FIG. 1 is a block diagram showing the functional configuration of the English character recognition device in one embodiment of the present invention, and FIG. 2 is a block diagram showing its hardware configuration.

[0014] In FIG. 1, 9 is an image input unit that inputs the recognition target document as a binary image; 10 is an image storage unit that stores the input image; 11 is a circumscribed rectangle detection unit that obtains the rectangles circumscribing the characters in the document image based on connected black pixels; 12 is a word region cutout unit that obtains word regions from the horizontal spacing of the obtained circumscribed rectangles; 13 is a character pattern group classification unit that classifies the images (character patterns) within the rectangles obtained by the circumscribed rectangle detection unit 11 by the superposition method; 14 is a graphic feature extraction unit that extracts, as a graphic feature, the distribution of black pixels in the image within each rectangle obtained by the circumscribed rectangle detection unit 11; 15 is a character recognition unit that compares the graphic feature obtained by the graphic feature extraction unit 14 with the recognition dictionary 16, which stores in advance the graphic features of all recognition target characters, takes a character with a similar graphic feature as the recognition result, and judges the character pattern in the rectangle to be a contact character when no character with a similar graphic feature exists; 16 is the recognition dictionary, which stores in advance the graphic features of all recognition target characters; 17 is a contact character pattern group determination unit that, for those word regions obtained by the word region cutout unit 12 that contain a character pattern belonging to a character pattern group judged to be a contact character by the character recognition unit 15, determines the contact character string by referring to the word dictionary 18, which stores word spellings in advance, using the characters not judged to be contact characters as a key; 18 is the word dictionary, which stores word spellings in advance; 19 is a contact character division processing unit that, for character pattern groups that the contact character pattern group determination unit 17 could not determine, examines the vertical histogram of black pixels of the image in the rectangle, regards the portions where the value is small as contact points between adjacent characters, and divides the image into individual characters; 20 is a recognition result storage unit that stores the character recognition results; and 21 is a recognition result output unit that outputs the recognition result of the recognition target document based on the recognition results stored in the recognition result storage unit 20 and the word region cutout results.

[0015] In FIG. 2, 22 is a scanner that reads the recognition target document as a binary image; 23 is a central processing unit (hereinafter abbreviated as CPU) that performs overall control; 24 is the control program with which the CPU performs overall control; 25 is a recognition dictionary that stores the graphic features of all recognition target characters; 26 is a word dictionary that stores word spellings; 27 is a read-only memory that stores the control program 24, the recognition dictionary 25, and the word dictionary 26; 28 is the document image area read by the scanner 22; 29 is the circumscribed rectangle area for the image; 30 is the word area for the image; 31 is the character pattern groups; 32 is the recognition result; 33 is a random access memory that stores the document image area 28, the circumscribed rectangle area 29, the word area 30, the character pattern groups 31, and the recognition result 32; 34 is a keyboard for giving the CPU 23 external commands such as start and end; 35 is an output device for outputting the recognition result 32; and 36 is an internal bus connecting the components from the scanner 22 to the keyboard 34.

[0016] The operation of the English character recognition device configured as above is described below. FIG. 3 is a flowchart showing the overall operation of the English character recognition device of this embodiment.

[0017] The binary image of the recognition target document is input by the image input unit 9 and stored in the image storage unit 10 (step S1). In that image, each group of black pixels connected in the 8-neighborhood is treated as one character pattern; the circumscribed rectangle detection unit 11 obtains the rectangle circumscribing each character pattern and stores it as internal data. If a very small rectangle exists and another rectangle lies immediately above or below it, the two are regarded as parts of a split character such as i or j and are merged (step S2). In addition, places where the horizontal spacing between the obtained circumscribed rectangles is wide are taken as word boundaries, and the resulting word regions are stored in the internal data (step S3).
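Steps S2 and S3 can be illustrated roughly as below. This is a sketch under stated assumptions rather than the patent's procedure: the image is a list of rows of 0/1 values, only a single text line is handled, the merging of small rectangles for i and j is omitted, and `bounding_boxes`, `group_into_words`, and `gap_threshold` are hypothetical names.

```python
from collections import deque


def bounding_boxes(image):
    """Return (left, top, right, bottom) boxes, inclusive, of 8-connected
    black-pixel groups, sorted from left to right."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            left, top, right, bottom = x, y, x, y
            while queue:  # breadth-first flood fill over the 8 neighbours
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


def group_into_words(boxes, gap_threshold):
    """Group left-to-right sorted boxes into words. A run of more than
    `gap_threshold` empty columns between two boxes starts a new word."""
    words = []
    for box in boxes:
        if words and box[0] - words[-1][-1][2] - 1 <= gap_threshold:
            words[-1].append(box)
        else:
            words.append([box])
    return words
```

In practice the gap threshold would presumably be derived from the character size or the distribution of gaps on the page; the text only says that "wide" gaps mark word boundaries.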

[0018] The detected circumscribed rectangle regions are sent to the character pattern group classification unit 13, which classifies the character patterns within the circumscribed rectangles into pattern groups by the superposition method and stores the result in the internal data (step S4).
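The text does not spell out the superposition method, so the following is only one plausible reading, offered as a sketch: two patterns are overlaid pixel by pixel and placed in the same group when the fraction of differing pixels is small. It assumes the patterns have already been normalized to the same size; `mismatch_ratio`, `classify_by_superposition`, and `max_mismatch` are hypothetical names.

```python
def mismatch_ratio(a, b):
    """Fraction of pixels that differ when two equal-sized binary patterns
    are overlaid."""
    total = len(a) * len(a[0])
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def classify_by_superposition(patterns, max_mismatch):
    """Assign each pattern a group number.

    Assumption: the first pattern placed in a group serves as that group's
    representative, and later patterns are compared only against
    representatives.
    """
    representatives, labels = [], []
    for pattern in patterns:
        for group_id, rep in enumerate(representatives):
            if mismatch_ratio(pattern, rep) <= max_mismatch:
                labels.append(group_id)
                break
        else:  # no existing group is close enough: open a new one
            representatives.append(pattern)
            labels.append(len(representatives) - 1)
    return labels, representatives
```

Grouping first means that feature extraction and recognition need to run only once per group, on the representative, instead of once per character occurrence.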

[0019] The circumscribed rectangle region of the representative pattern of each classified character pattern group is sent to the graphic feature extraction unit 14, which retrieves the image within that region from the image storage unit and extracts the distribution of black pixels in the image as a graphic feature (step S5).
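The patent describes the graphic feature only as "the distribution of black pixels". One simple way to make that concrete, shown here purely as an assumption, is to divide the pattern into a grid of zones and record the black-pixel density of each zone; `zone_densities` and its `rows` and `cols` parameters are hypothetical.

```python
def zone_densities(pattern, rows=2, cols=2):
    """Return the black-pixel density of each grid zone, row by row.

    `pattern` is a binary image given as a list of rows (1 = black).
    """
    h, w = len(pattern), len(pattern[0])
    features = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            area = (y1 - y0) * (x1 - x0)
            black = sum(pattern[y][x]
                        for y in range(y0, y1) for x in range(x0, x1))
            features.append(black / area if area else 0.0)
    return features
```

A real recognizer would likely use a finer grid or a richer descriptor; the point is only that each pattern is reduced to a fixed-length vector that can be compared with dictionary entries.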

[0020] The extracted graphic feature is sent to the character recognition unit 15 and compared with the recognition dictionary, which stores in advance the graphic features of all recognition target characters. If a character with similar features exists in the recognition dictionary, that character is stored in the recognition result storage unit as the recognition result; if no similar feature exists in the recognition dictionary, the image within the circumscribed rectangle is judged to be a contact character in which two adjacent characters touch (step S6).
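The decision in step S6 amounts to nearest-match lookup with a rejection option. The sketch below assumes feature vectors are lists of numbers and that similarity is measured by squared Euclidean distance with a cutoff; the patent does not specify the similarity measure, so `recognize` and `max_distance` are illustrative only.

```python
def recognize(features, dictionary, max_distance):
    """Return the dictionary character closest to `features`, or None when
    no entry is within `max_distance` (the pattern is then treated as a
    contact character)."""
    best_char, best_dist = None, None
    for char, reference in dictionary.items():
        dist = sum((f - r) ** 2 for f, r in zip(features, reference))
        if best_dist is None or dist < best_dist:
            best_char, best_dist = char, dist
    if best_dist is not None and best_dist <= max_distance:
        return best_char
    return None
```

Returning `None` instead of forcing the nearest match is what lets the later stages handle touching characters separately.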

[0021] If a contact character was found in the character recognition processing, the contact character pattern group determination unit performs contact character processing (steps S7 and S8).

[0022] Next, the method of processing contact characters is described using a specific example. FIG. 4 is a flowchart showing the contact character processing method. FIG. 5 shows a specific example of the data being processed: (a) shows the input image, (b) shows the character pattern group classification result (the number below each character is its classification pattern number), and (c) shows the character recognition result (* denotes a contact character).

[0023] First, all words that contain a character pattern belonging to a character pattern group judged to be a contact character are extracted (step S10). For each extracted word, the character strings that could be the recognition result for the contact character string are found by searching the word dictionary, using the characters other than the contact character as a key (step S11). It is then determined whether a character string exists that fits all of the words extracted in step S10 (step S12). If such a character string exists, it is stored in the recognition result storage unit 20 as the recognition result for the contact character string (step S13). In the example of FIG. 5, word collation is performed with each of [*ey], [*rough], [*eir], [ra*er], [*an], and [*eory] as a key, and [th], which fits all of the words, is obtained and taken as the recognition result. If no such character string exists in step S12, the representative pattern of the contact character pattern group is sent to the contact character division processing unit 19. This unit computes the vertical histogram of black pixels of the image within the rectangle of that character pattern and, where the value is smaller than a given threshold, takes that portion as a contact point between adjacent characters and divides the circumscribed rectangle region at the contact point (step S14). Graphic feature extraction and character recognition are then performed in the same way as in steps S5 and S6 (steps S15 and S16), and the recognition result is stored in the recognition result storage unit 20.
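Steps S11 and S12 can be sketched as a wildcard dictionary lookup followed by an intersection over all affected words. The code below is an illustration under assumptions: each key contains exactly one `*` marking the contact character, the word list is a small hypothetical stand-in for the word dictionary 18, and the function names are not from the patent.

```python
import re


def candidates_for(key, dictionary, wildcard="*"):
    """Return the strings that, substituted for the wildcard, turn `key`
    into a word in `dictionary`. Assumes exactly one wildcard in `key`."""
    prefix, suffix = key.split(wildcard)
    pattern = re.compile(re.escape(prefix) + r"(.+)" + re.escape(suffix))
    found = set()
    for word in dictionary:
        match = pattern.fullmatch(word)
        if match:
            found.add(match.group(1))
    return found


def common_contact_strings(keys, dictionary):
    """Return the strings that fit every key (the intersection of the
    per-word candidate sets)."""
    common = None
    for key in keys:
        cands = candidates_for(key, dictionary)
        common = cands if common is None else common & cands
    return common or set()


# Hypothetical dictionary; the keys are those of the FIG. 5 example.
words = {"they", "through", "their", "rather", "than", "theory",
         "key", "hey", "man", "can"}
keys = ["*ey", "*rough", "*eir", "ra*er", "*an", "*eory"]
result = common_contact_strings(keys, words)
```

A single key such as `*ey` is ambiguous on its own (here it also matches "key" and "hey"), but because every occurrence of the same contact pattern must resolve to the same string, intersecting the candidate sets narrows the answer to "th". An empty result corresponds to the case where step S12 fails and processing falls through to histogram-based division.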

[0024] The recognition results stored in the recognition result storage unit 20 and the word region cutout results are sent to the recognition result output unit 21, which outputs the recognition result of the recognition target document (step S9).

[0025] Character recognition of the given document image is carried out by performing the processing of steps S1 through S9 above.

[0026]

[Effects of the Invention] As described above, the present invention provides a character pattern group classification unit that classifies character patterns into pattern groups by a superposition method, and a contact character pattern group determination unit that, for the word character strings containing a character pattern from a character pattern group judged to be a contact character, collates them with a word dictionary using the characters other than the contact character pattern as a key and determines the contact character string. Only the representative pattern of each classified character pattern group is recognized, and for contact characters the recognition result is obtained by collation with the word dictionary using character strings of high recognition confidence as a key. An English character recognition device that obtains highly accurate recognition results in a short time can thus be realized.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing the functional configuration of the English character recognition device in one embodiment of the present invention.

[FIG. 2] A block diagram showing the hardware configuration of the English character recognition device in one embodiment of the present invention.

[FIG. 3] A flowchart showing the overall control procedure of the English character recognition device in one embodiment of the present invention.

[FIG. 4] A flowchart showing the contact character processing method of the English character recognition device in one embodiment of the present invention.

[FIG. 5] (a) A diagram showing an example input image to be processed by the English character recognition device in one embodiment of the present invention; (b) a diagram showing the classification result of its character pattern groups; (c) a diagram showing the character recognition result for those characters.

[FIG. 6] A block diagram showing the functional configuration of a conventional English character recognition device.

[Explanation of Symbols]

9 Image input unit
10 Image storage unit
11 Circumscribed rectangle detection unit
12 Word region cutout unit
13 Character pattern group classification unit
14 Graphic feature extraction unit
15 Character recognition unit
16 Recognition dictionary
17 Contact character pattern group determination unit
18 Word dictionary
19 Contact character division processing unit
20 Recognition result storage unit
21 Recognition result output unit

Claims (1)

[Claims]

[Claim 1] An English character recognition device comprising: an image input unit that inputs a recognition target document; an image storage unit that stores the document image input by the image input unit; a circumscribed rectangle detection unit that obtains the rectangles circumscribing the characters in the document image stored in the image storage unit, based on runs of connected black pixels; a word region cutout unit that obtains word regions from the horizontal spacing of the circumscribed rectangles obtained by the circumscribed rectangle detection unit; a character pattern group classification unit that classifies the character patterns within the circumscribed rectangles obtained by the circumscribed rectangle detection unit into pattern groups by a superposition method; a graphic feature extraction unit that extracts, as a graphic feature, the distribution of black pixels within each circumscribed rectangle obtained by the circumscribed rectangle detection unit; a recognition dictionary that stores in advance the graphic features of all recognition target characters; a character recognition processing unit that compares the graphic feature extracted by the graphic feature extraction unit with the recognition dictionary and, when a character with similar features exists in the recognition dictionary, stores that character as the recognition result in the recognition result storage unit recited below, and, when no similar feature exists in the recognition dictionary, judges the image within the circumscribed rectangle obtained by the circumscribed rectangle detection unit to be a contact character in which two adjacent characters touch; a contact character pattern group determination unit that extracts the word character strings containing a character pattern from a character pattern group judged to be a contact character, collates them with the word dictionary using the characters other than the contact character pattern as a key, and determines the contact character string; a contact character division processing unit that examines the vertical histogram of black pixels in the images of character pattern groups that the contact character pattern group determination unit could not determine and, taking the portions where the value is small as the contact points between characters, divides the image into individual characters; a recognition result storage unit that stores the recognition results obtained by the character recognition processing unit and the determined character strings obtained by the contact character pattern group determination unit; and a recognition result output unit that outputs the recognition result of the recognition target document based on the results stored in the recognition result storage unit and the word regions cut out by the word region cutout unit.
JP4243059A 1992-09-11 1992-09-11 Alphabet recognizing device Pending JPH0696277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4243059A JPH0696277A (en) 1992-09-11 1992-09-11 Alphabet recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4243059A JPH0696277A (en) 1992-09-11 1992-09-11 Alphabet recognizing device

Publications (1)

Publication Number Publication Date
JPH0696277A true JPH0696277A (en) 1994-04-08

Family

ID=17098194

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4243059A Pending JPH0696277A (en) 1992-09-11 1992-09-11 Alphabet recognizing device

Country Status (1)

Country Link
JP (1) JPH0696277A (en)

Similar Documents

Publication Publication Date Title
JP3445394B2 (en) How to compare at least two image sections
JP3155577B2 (en) Character recognition method and device
JP3919617B2 (en) Character recognition device, character recognition method, program, and storage medium
JPH11232296A (en) Document filing system and document filing method
JPH0696277A (en) Alphabet recognizing device
JPH11328315A (en) Character recognizing device
JPH06180771A (en) English letter recognizing device
JPH05346974A (en) Character recognizing device
JPH06348911A (en) English character recognition device
JP3151866B2 (en) English character recognition method
JP2746345B2 (en) Post-processing method for character recognition
JP3116453B2 (en) English character recognition device
JPH0589190A (en) Drawing information checking system
JPS63269267A (en) Character recognizing device
JPH0728944A (en) English character recognition device
JPH06309508A (en) English character recognizing device
JPH06309503A (en) English character recognizing device
JP2972443B2 (en) Character recognition device
JPH06119497A (en) Character recognizing method
JPH05298487A (en) Alphabet recognizing device
JP2851865B2 (en) Character recognition device
JP2639314B2 (en) Character recognition method
JPH10214308A (en) Character discrimination method
JPH06139277A (en) Electronic dictionary device
JPH01201789A (en) Character reader