JPH0981685A - Occidental character recognition device and occidental character recognition method - Google Patents

Occidental character recognition device and occidental character recognition method

Publication number
JPH0981685A
Authority
JP
Japan
Prior art keywords
character
recognition
word
characters
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP7238090A
Other languages
Japanese (ja)
Inventor
Michiaki Nobuoka
道明 信岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP7238090A
Publication of JPH0981685A

Abstract

PROBLEM TO BE SOLVED: To provide a Western character recognition device that offers high processing speed, high recognition accuracy, and good operability and work efficiency even when many recognized characters have low recognition confidence. SOLUTION: The device comprises: a character processing method determination means 15 that decides how recognized characters are processed according to the proportion of recognized characters with high recognition confidence in the document; a contact character separation means 7 that, when the determination means 15 finds this proportion to be high, treats recognized characters with low recognition confidence as contact (touching) characters, obtains a separation position from their graphic features, and separates them at that position; a word candidate detection means 16 that, when the determination means 15 finds this proportion to be low, collates a word string composed of the recognized characters against the words stored in a word dictionary, selects one or more word candidates, and displays them on a display unit; and a word candidate confirmation means 17 by which the user selects the correct word from the word candidates displayed by the word candidate detection means 16.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a Western character recognition device and a Western character recognition method that convert image data of Western characters, such as the alphabet, input through an image input device such as an image scanner into character data composed of character codes such as ASCII, so that the data can be processed easily by a computer or the like. In particular, it relates to a Western character recognition device and method used for recognizing image data whose poor image quality makes contact (touching) characters difficult to tell apart.

[0002]

[Prior Art] In recent years, capturing documents containing text into a computer through an image input device such as an image scanner and processing the characters there has attracted attention, because large volumes of character data can be processed in a short time. Processing inside the computer requires a character recognition device that converts the image data into character codes or the like. Western characters in particular have few character types and convert efficiently, so practical use is advancing. It is desirable that conversion be accurate and of high quality regardless of the quality of the input image data.

[0003] A conventional Western character recognition device is described below. FIG. 4 is a functional block diagram of a conventional Western character recognition device. In FIG. 4, 1 is an image input unit that inputs a document to be recognized, containing characters, as binary image data; 2 is an image storage unit that stores the image data input by the image input unit 1; 3 is a character cutout means that finds the rectangle circumscribing each character in the image data stored in the image storage unit 2 and cuts out that rectangle as a character region; 4 is a line cutout means that measures the horizontal distribution of black pixels in the character regions cut out by the character cutout means 3 and cuts the character regions into line regions; 5 is a word cutout means that measures the vertical distribution of black pixels in each line region cut out by the line cutout means 4 and cuts out word regions; 6 is a character recognition means that extracts, as a graphic feature, the distribution of black pixels in the image within each character region found by the character cutout means 3, pattern-matches this feature against the graphic features of the standard characters stored in the recognition dictionary described below to determine the recognized character, and at the same time measures a recognition confidence from the degree of matching; 7 is a contact character separation means that treats those recognized characters determined by the character recognition means 6 that have low recognition confidence as contact characters, detects a separation position from the graphic features, cuts the contact character at that position, and has the character recognition means 6 collate the pieces against the recognition dictionary again; 8 is a recognition dictionary that stores the graphic features of all characters to be recognized; 9 is a character region storage unit that stores the character regions cut out by the character cutout means 3; 10 is a line region storage unit that stores the line regions cut out by the line cutout means 4; 11 is a word region storage unit that stores the word regions cut out by the word cutout means 5; 12 is a recognition result storage unit that stores the recognized characters obtained by the character recognition means 6; 13 is a control unit comprising the character cutout means 3, the line cutout means 4, the word cutout means 5, the character recognition means 6, and the contact character separation means 7; and 14 is a recognition result output unit that outputs the final recognition result stored in the recognition result storage unit 12.
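The circumscribing-rectangle step performed by the character cutout means can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `character_boxes`, the list-of-rows image format (1 = black pixel), and the breadth-first search are assumptions. The use of 8-connectivity follows the first embodiment described later in this document.

```python
from collections import deque


def character_boxes(image):
    """Return bounding boxes (left, top, width, height) of 8-connected
    black-pixel groups in a binary image given as rows of 0/1 values."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over the 8 neighbours of each black pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            # Store the circumscribing rectangle of this pixel group.
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes
```

Boxes are returned in raster-scan order of their first pixel. Two characters that touch form a single connected group and therefore a single box, which is exactly the contact-character case the device has to handle afterwards.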

[0004] The operation of the Western character recognition device configured as above is described below. First, a document to be recognized containing Western characters is captured into the device as a binary image through the image input unit 1 and stored in the image storage unit 2. Next, the character cutout means 3 cuts out, from the image data stored in the image storage unit 2, the rectangular regions circumscribing black-pixel regions that appear to be character patterns, and stores them in the character region storage unit 9 as character regions. Next, the line cutout means 4 cuts out line regions from the horizontal distribution of black pixels in the character regions stored in the character region storage unit 9 and stores them in the line region storage unit 10. Next, the word cutout means 5 uses the character regions stored in the character region storage unit 9 and the line regions stored in the line region storage unit 10 to measure the vertical distribution of black pixels in each line region, cuts out word regions, and stores them in the word region storage unit 11. Next, the character recognition means 6 extracts the graphic feature corresponding to each character in the character regions stored in the character region storage unit 9, pattern-matches it against the graphic features of the standard characters stored in the recognition dictionary 8, takes the character with the similar pattern as the recognized character, calculates the degree of similarity as the recognition confidence, and stores the character code of the recognized character together with the recognition confidence in the recognition result storage unit 12. Next, when a recognition confidence stored in the recognition result storage unit 12 is low, the contact character separation means 7 judges that character region to be a contact character, detects a separation position from the graphic features of the contact character, separates it at that position, and then has the character recognition means 6 perform character recognition again. Finally, the recognition result output unit 14 outputs the recognition result stored in the recognition result storage unit 12, completing the recognition processing of the Western characters.
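One common way to read "cutting out line regions from the horizontal distribution of black pixels" is a horizontal projection profile: count the black pixels in each pixel row and treat each run of non-empty rows as a text line. The sketch below assumes that reading; the function name `cut_lines` and the list-of-rows image format (1 = black pixel) are illustrative and do not come from the source.

```python
def cut_lines(image):
    """Return (top, bottom) row ranges, inclusive, of text lines in a binary
    image (rows of 0/1 values), using the per-row count of black pixels."""
    profile = [sum(row) for row in image]  # black pixels in each row
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a run of inked rows begins
        elif count == 0 and start is not None:
            lines.append((start, y - 1))   # the run ended on the previous row
            start = None
    if start is not None:                  # the last run reaches the bottom edge
        lines.append((start, len(profile) - 1))
    return lines
```

The same idea applied to columns within one line gives a vertical profile, which is the kind of measurement the word cutout means is described as using.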

[0005]

[Problems to Be Solved by the Invention] However, the conventional Western character recognition device described above has a problem: as the print quality of the document to be recognized falls, characters touch one another in more places and contact characters increase, so re-recognition occurs more often, recognition takes longer, and work efficiency drops. As a result, post-processing after recognition is laborious, and workability and work efficiency are poor.

[0006] The present invention solves the above conventional problems. Its objects are to provide a Western character recognition device that, when many recognized characters have low recognition confidence, performs a word search using the recognized characters with high recognition confidence as keys and selects word candidates, thereby achieving high processing speed, high recognition accuracy, and high workability and work efficiency; and to provide a Western character recognition method that has both high recognition speed and high recognition accuracy and is excellent in workability and reliability.

[0007]

[Means for Solving the Problems] To achieve this object, the Western character recognition device according to claim 1 of the present invention comprises: a character processing method determination means that decides how recognized characters are processed according to the proportion of recognized characters with high recognition confidence in the document; a contact character separation means that, when the character processing method determination means finds the proportion of recognized characters with high recognition confidence to be high, judges recognized characters with low recognition confidence to be contact characters, obtains a separation position from their graphic features, and separates the contact characters at that position; a word candidate detection means that, when the character processing method determination means finds the proportion of recognized characters with high recognition confidence to be low, collates a word string composed of the recognized characters against the words stored in a word dictionary, selects one or more word candidates, and displays them on a display unit; and a word candidate confirmation means by which the user selects the correct word from the word candidates displayed by the word candidate detection means.

[0008] The Western character recognition method according to claim 2 of the present invention comprises: a character processing method determination step that decides how recognized characters are processed according to the proportion of recognized characters with high recognition confidence in the document; a contact character separation step that, when the character processing method determination step finds the proportion of recognized characters with high recognition confidence to be high, judges recognized characters with low recognition confidence to be contact characters, obtains a separation position from their graphic features, and separates the contact characters at that position; a word candidate detection step that, when the character processing method determination step finds the proportion of recognized characters with high recognition confidence to be low, collates a word string composed of the recognized characters against the words stored in a word dictionary, selects one or more word candidates, and displays them on a display unit; and a word candidate confirmation step in which the user selects the correct word from the word candidates displayed in the word candidate detection step.

[0009] Here, the recognition confidence threshold used by the character processing method determination means may be a value fixed in advance, or it may be made variable so that the user can set it while checking the recognition results.
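The branching decision, with a threshold that can be either fixed or user-adjustable, might look like the following minimal sketch. The function name, the return labels, the 0-to-1 confidence scale, and the default `confidence_threshold=0.8` are assumptions for illustration. The default `ratio_threshold=0.85` mirrors the 85% figure given in the first embodiment.

```python
def choose_processing_method(confidences, confidence_threshold=0.8,
                             ratio_threshold=0.85):
    """Pick the follow-up processing from per-character confidences.

    Returns "separate_contact_characters" when enough characters were
    recognized with high confidence, otherwise "word_candidate_search".
    """
    if not confidences:
        raise ValueError("no recognized characters to judge")
    # Count characters whose confidence reaches the (adjustable) threshold.
    high = sum(1 for c in confidences if c >= confidence_threshold)
    ratio = high / len(confidences)
    if ratio >= ratio_threshold:
        return "separate_contact_characters"
    return "word_candidate_search"
```

Exposing `confidence_threshold` as a parameter corresponds to the variable setting described above: a user who sees too many documents routed to the dictionary search could lower it and re-run the decision.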

[0010]

[Operation] With this configuration, the character processing method determination means accurately determines the proportion of recognized characters with high recognition confidence in the document and can select the subsequent steps so that recognition speed and recognition accuracy improve. The contact character separation means accurately identifies the separation position of a recognized character judged to be a contact character, separates the character at that position, and has recognition performed again, which improves recognition accuracy. The word candidate detection means collates the individually recognized characters against the word dictionary in units of words, which further improves recognition accuracy. The word candidate confirmation means has the user judge whether a word candidate is correct, which raises recognition accuracy and at the same time makes correction in post-processing unnecessary.

[0011]

[Embodiments]

(Embodiment 1) A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a functional block diagram of the Western character recognition device in the first embodiment, and FIG. 2 is a device block diagram of the same device. In FIG. 1, 1 is the image input unit, 2 is the image storage unit, 3 is the character cutout means, 4 is the line cutout means, 5 is the word cutout means, 6 is the character recognition means, 7 is the contact character separation means, 8 is the recognition dictionary, 9 is the character region storage unit, 10 is the line region storage unit, 11 is the word region storage unit, 12 is the recognition result storage unit, 13 is the control unit, and 14 is the recognition result output unit. These are the same as in the conventional example, so they are given the same reference numerals and their description is omitted.

[0012] 15 is a character processing method determination means that determines, based on the recognition confidence of the recognized characters stored in the recognition result storage unit 12, whether the subsequent step separates the recognized characters or collates them against a word dictionary word by word; 16 is a word candidate detection means that collates the word regions stored in the word region storage unit 11 and the recognized characters stored in the recognition result storage unit 12 against the words stored in the word dictionary described below, and selects one or more similar words as word candidates; 17 is a word candidate confirmation means by which the user decides the correct word from the word candidates selected by the word candidate detection means 16 and shown on the display unit described below, thereby fixing the recognition result; 18 is a word candidate storage unit that stores the word candidates detected by the word candidate detection means 16 and their coordinates; 19 is a word dictionary that stores the spellings of all words; 20 is a display unit that displays the word candidates stored in the word candidate storage unit 18 and the position of each word candidate within the image data stored in the image storage unit 2; and 21 is a recognition result input unit through which the user selects the correct word from the word candidates shown on the display unit 20 and inputs it to the Western character recognition device as the recognition result.
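A minimal sketch of how the word candidate detection means 16 could search the word dictionary 19, using the high-confidence recognized characters as keys (as the stated object of the invention describes) and treating low-confidence positions as wildcards. The function name, the `(character, confidence)` pair format, the threshold value, and the requirement that candidates have the same length as the recognized word are all assumptions for illustration; the source does not specify the matching rule. In particular, a real contact character may stand for two letters, which this sketch does not model.

```python
def find_word_candidates(recognized, dictionary, confidence_threshold=0.8):
    """Select dictionary words compatible with one recognized word.

    recognized: list of (character, confidence) pairs for the word.
    Returns the dictionary words of the same length whose letters agree
    with every high-confidence character; low-confidence positions
    match any letter.
    """
    candidates = []
    for word in dictionary:
        if len(word) != len(recognized):
            continue
        if all(conf < confidence_threshold or ch == letter
               for (ch, conf), letter in zip(recognized, word)):
            candidates.append(word)
    return candidates
```

For example, a word recognized as `c`, `l`, `t` with a doubtful middle character would yield every three-letter dictionary entry of the form `c?t`, and the user would then pick the correct one from the displayed list.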

[0013] In FIG. 2, 22 is an image input device, such as an image scanner or camera, that captures the document to be recognized into the Western character recognition device as image data; 23 is an input device, such as a keyboard, trackball, or pointing device, with which the user issues system start and stop commands, inputs recognition results, and gives other commands; 24 is a display device, such as a CRT or liquid crystal display, that shows the operating status of the system, image data, word candidates, and the like; 25 is a central processing unit (abbreviated CPU) that performs overall control; 26 is a control program that contains the control commands and is loaded and used by the CPU; 27 is a read-only memory (ROM) in which the control program 26 is stored; 28 is the image data stored in the image storage unit 2; 29 is the character region data stored in the character region storage unit 9; 30 is the line region data stored in the line region storage unit 10; 31 is the word region data stored in the word region storage unit 11; 32 is the recognition result data stored in the recognition result storage unit 12; 33 is the word candidate data stored in the word candidate storage unit 18; 34 is a writable and erasable random access memory (RAM) that stores intermediate data during computation; 35 is an output device, such as a printer, that outputs the recognition result; and 36 is an internal bus that connects the component devices and carries control signals and data.

[0014] The operation of the Western character recognition device of the first embodiment configured as above is described below with reference to the drawings. FIG. 3 is a flowchart showing the operation of the Western character recognition device in the first embodiment. First, the image input unit 1 converts the image of the document to be recognized, which contains Western characters, into binary image data and stores it in the image storage unit 2 (image input processing, S1). Next, the character cutout means 3 treats each set of black pixels in the image data stored in the image storage unit 2 that are connected through their 8 neighboring points as one character pattern, and cuts out the rectangle circumscribing that pattern as a character region. If an extracted rectangle is very small and another rectangle lies immediately next to it in the vertical direction, the two are regarded as a split character such as i or j and are merged into one character region; otherwise the extracted circumscribing rectangle is taken as one character region. For each character rectangle, the coordinates of the upper-left corner and the width and height of the rectangle are stored in the character region storage unit 9 (character cutout processing, S2). Next, the line cutout means 4 examines how the character rectangles stored in the character region storage unit 9 overlap in the horizontal direction, finds the line region of each line, and stores the upper-left and lower-right corner coordinates of each line in the line region storage unit 10 (line cutout processing, S3). Next, for each line region stored in the line region storage unit 10, the word cutout means 5 builds a histogram of the horizontal gaps between the character regions belonging to that line, finds a threshold that divides the peaks of this histogram in two, treats any inter-character gap wider than the threshold as a word break, and stores the upper-left and lower-right corner coordinates of each word in the word region storage unit 11 (word cutout processing, S4). Next, the character recognition means 6 retrieves from the image storage unit 2 the image data within every character region stored in the character region storage unit 9, extracts the distribution of black pixels as a graphic feature, pattern-matches it against the graphic features in the recognition dictionary 8, calculates a recognition confidence for each similar character, takes the character with the highest recognition confidence as the recognized character, and stores its character code as the recognition result, together with the recognition confidence, in the recognition result storage unit 12 (character recognition processing, S5). Next, the character processing method determination means 15 determines whether the recognized characters whose recognition confidence stored in the recognition result storage unit 12 is at or above a predetermined value make up 85% or more of the total (character processing method determination processing, S6). If YES, the contact character separation means 7 cuts the image of each recognized character with low recognition confidence at a point where its graphic features show a weak graphical connection, for example where the vertical distribution of black pixels is sparse and there is only one connecting point (contact character separation processing, S7). Next, the character recognition means 6 performs character recognition again on the image data cut by the contact character separation means 7, and the result with the highest recognition confidence is stored in the recognition result storage unit 12 as the recognition result (character recognition processing, S8). Step 11 is then executed. If step 6 is NO, the word candidate detection means 16 divides the recognized characters stored in the recognition result storage unit 12 into word strings using the word regions stored in the word region storage unit 11, collates each word string against the words stored in the word dictionary 19, and displays the words that match to a certain degree on the display unit 20 as word candidates (word candidate detection processing, S9). Next, with the word candidate confirmation means 17, the user selects the correct word from the word candidates shown on the display unit 20, and it is stored in the recognition result storage unit 12 as the recognition result (word candidate confirmation processing, S10). Finally, the recognition result is output to the recognition result output unit (recognition result output processing, S11), and the operation is complete.
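The cut performed in step S7 can be sketched as a search for the column of a character image with the fewest black pixels. This is a simplified, assumed reading of "a point where the vertical distribution of black pixels is sparse": the function name and image format (rows of 0/1 values, 1 = black) are illustrative, and the sketch does not check the source's additional condition that there be only one connecting point.

```python
def split_contact_character(region):
    """Split a binary character image (rows of 0/1 values) at the interior
    column with the fewest black pixels. Returns (left_part, right_part);
    the cut column itself goes to the right part."""
    width = len(region[0])
    if width < 3:
        raise ValueError("region too narrow to split")
    # Vertical projection: number of black pixels in each column.
    profile = [sum(row[x] for row in region) for x in range(width)]
    # Consider interior columns only, so that both parts are non-empty.
    cut = min(range(1, width - 1), key=lambda x: profile[x])
    left = [row[:cut] for row in region]
    right = [row[cut:] for row in region]
    return left, right
```

Each returned part would then be passed back to character recognition, as in step S8, and the result with the highest confidence kept.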

[0015] According to this embodiment, the device is provided with a character processing method determination means that decides how recognized characters are processed based on the distribution of recognition confidence, a word candidate detection means that performs word collation against the word dictionary and selects one or more word candidates when the proportion of recognized characters with high recognition confidence is small, and a word candidate confirmation means by which the user decides the recognition result from the word candidates. Consequently, even for poor-quality image data in which few recognized characters have high recognition confidence and the confidence values are widely spread, accurate character recognition is possible because the user selects among the word candidates. As a result, laborious post-processing is unnecessary, character recognition is quick, and work efficiency is high. In addition, even unskilled operators can work easily, which improves workability.

[0016]

[Effects of the Invention] As described above, the present invention comprises: a character processing method determination means that decides how recognized characters are processed according to the proportion of recognized characters with high recognition confidence in the document; a contact character separation means that, when the character processing method determination means finds the proportion of recognized characters with high recognition confidence to be high, judges recognized characters with low recognition confidence to be contact characters, obtains a separation position from their graphic features, and separates the contact characters at that position; a word candidate detection means that, when the character processing method determination means finds the proportion of recognized characters with low recognition confidence to be high, collates a word string composed of the recognized characters against the words stored in a word dictionary, selects one or more word candidates, and displays them on a display unit; and a word candidate confirmation means by which the user selects the correct word from the word candidates displayed by the word candidate detection means. For degraded image data in which few recognized characters have high recognition confidence, recognition is therefore carried out word by word using the recognized characters with high recognition confidence as keys, one or more word candidates are selected, and the operator decides the final word. Misrecognition is thus rare, proofreading in later steps is easy, and workability improves as a result. Accordingly, an excellent Western character recognition device can be realized that performs reliable character recognition even on documents to be recognized that have poor print quality, many contact characters, and a wide spread of recognition confidence.

[0017] Further, the present invention has: a character processing method determination step of deciding how the recognized characters are processed according to the proportion of recognized characters with high recognition accuracy in the document; a contact character separation step of, when the determination step finds that proportion to be high, judging the recognized characters with low recognition accuracy to be contact characters, obtaining a separation position from graphic features, and separating the contact characters at that position; a word candidate detection step of, when the determination step finds the proportion of recognized characters with low recognition accuracy to be high, collating a word string made up of recognized characters with the words stored in a word dictionary, selecting one or more word candidates, and displaying them on the display unit; and a word candidate confirmation step in which the user selects the correct word from the word candidates displayed in the word candidate detection step. An excellent Western character recognition method can therefore be realized that is superior in both recognition speed and recognition accuracy, and that combines high work efficiency with highly reliable character recognition even for documents containing many contact characters of poor print quality.
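The character processing method determination step can be sketched as a simple ratio test. The text states only that the choice depends on the proportion of high-accuracy recognized characters; the function names, the two threshold values, and the string labels returned below are hypothetical and chosen only for illustration.

```python
def high_confidence_ratio(confidences, high_threshold=0.8):
    """Fraction of recognized characters whose confidence is at or above the threshold."""
    if not confidences:
        return 0.0
    return sum(c >= high_threshold for c in confidences) / len(confidences)


def choose_processing_method(confidences, high_threshold=0.8, ratio_threshold=0.7):
    """Pick the processing path for the document (threshold values are assumptions).

    A high ratio suggests the few low-confidence characters are touching
    characters worth separating; a low ratio suggests falling back to
    dictionary-based word candidates confirmed by the user.
    """
    ratio = high_confidence_ratio(confidences, high_threshold)
    if ratio >= ratio_threshold:
        return "contact_character_separation"
    return "word_candidate_detection"


print(choose_processing_method([0.9, 0.95, 0.85, 0.4]))
print(choose_processing_method([0.9, 0.3, 0.2, 0.4]))
```

With these assumed thresholds, the first document (three of four characters confident) takes the separation path and the second (one of four) takes the word candidate path.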

[Brief description of drawings]

FIG. 1 is a functional block diagram of the Western character recognition device according to the first embodiment.

FIG. 2 is a device block diagram of the Western character recognition device according to the first embodiment.

FIG. 3 is a flowchart showing the operation of the Western character recognition device according to the first embodiment.

FIG. 4 is a functional block diagram of a conventional Western character recognition device.

[Explanation of symbols]

1 image input unit
2 image storage unit
3 character cutout means
4 line cutout means
5 word cutout means
6 character recognition means
7 contact character separation means
8 recognition dictionary
9 character area storage unit
10 line area storage unit
11 word area storage unit
12 recognition result storage unit
13 control unit
14 recognition result output unit
15 character processing method determination means
16 word candidate detection means
17 word candidate confirmation means
18 word candidate storage unit
19 word dictionary
20 display unit
21 recognition result input unit
22 image input device
23 input device
24 display device
25 central processing unit
26 control program
27 read-only memory
28 image data
29 character area data
30 line area data
31 word area data
32 recognition result data
33 word candidate data
34 random access memory
35 output device
36 internal bus

Claims (2)

[Claims]
1. A Western character recognition device comprising: character processing method determination means for determining a processing method for recognized characters according to the proportion of recognized characters with high recognition accuracy in a document; contact character separation means for, when the character processing method determination means finds the proportion of the recognized characters with high recognition accuracy to be high, judging the recognized characters with low recognition accuracy to be contact characters, obtaining a separation position from graphic features, and separating the contact characters at the separation position; word candidate detection means for, when the character processing method determination means finds the proportion of the recognized characters with high recognition accuracy to be low, collating a word string made up of the recognized characters with words stored in a word dictionary, selecting one or more word candidates, and displaying them on a display unit; and word candidate confirmation means by which a user selects the correct word from the word candidates displayed by the word candidate detection means.
2. A Western character recognition method comprising: a character processing method determination step of determining a processing method for recognized characters according to the proportion of recognized characters with high recognition accuracy in a document; a contact character separation step of, when the character processing method determination step finds the proportion of the recognized characters with high recognition accuracy to be high, judging the recognized characters with low recognition accuracy to be contact characters, obtaining a separation position from graphic features, and separating the contact characters at the separation position; a word candidate detection step of, when the character processing method determination step finds the proportion of the recognized characters with high recognition accuracy to be low, collating a word string made up of the recognized characters with words stored in a word dictionary, selecting one or more word candidates, and displaying them on a display unit; and a word candidate confirmation step in which a user selects the correct word from the word candidates displayed in the word candidate detection step.
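Both claims recite obtaining a separation position "from graphic features" without naming the feature. As one possible reading, the sketch below uses a vertical projection profile, a common OCR heuristic that is swapped in here as an assumption and is not stated in the claims. The function names, the `margin` parameter, and the sample bitmap are all hypothetical.

```python
def vertical_projection(bitmap):
    """Count ink pixels (1s) in each column of a binary glyph bitmap."""
    return [sum(column) for column in zip(*bitmap)]


def separation_position(bitmap, margin=1):
    """Return the interior column with the least ink as the cut position.

    `margin` columns at each edge are ignored so the cut never falls on the
    outer border of the glyph. Ties resolve to the leftmost column.
    """
    profile = vertical_projection(bitmap)
    interior = range(margin, len(profile) - margin)
    if len(interior) == 0:
        raise ValueError("bitmap is too narrow to split")
    return min(interior, key=lambda x: profile[x])


def split_at(bitmap, x):
    """Split the bitmap into left and right parts at column x."""
    left = [row[:x] for row in bitmap]
    right = [row[x:] for row in bitmap]
    return left, right


# Two blobs joined by a thin horizontal bridge in the third row.
bitmap = [
    [1, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 0],
]
cut = separation_position(bitmap)
left, right = split_at(bitmap, cut)
print(cut, left[0], right[0])
```

In a fuller pipeline each half would be passed back to the character recognizer, and the cut would be kept only if both halves are recognized with higher confidence than the original contact character.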
JP7238090A 1995-09-18 1995-09-18 Occidental character recognition device and occidental character recognition method Pending JPH0981685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7238090A JPH0981685A (en) 1995-09-18 1995-09-18 Occidental character recognition device and occidental character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP7238090A JPH0981685A (en) 1995-09-18 1995-09-18 Occidental character recognition device and occidental character recognition method

Publications (1)

Publication Number Publication Date
JPH0981685A true JPH0981685A (en) 1997-03-28

Family

ID=17025020

Family Applications (1)

Application Number Title Priority Date Filing Date
JP7238090A Pending JPH0981685A (en) 1995-09-18 1995-09-18 Occidental character recognition device and occidental character recognition method

Country Status (1)

Country Link
JP (1) JPH0981685A (en)
