JPH0981685A

JPH0981685A - Occidental character recognition device and occidental character recognition method

Info

Publication number: JPH0981685A
Application number: JP7238090A
Authority: JP
Inventors: Michiaki Nobuoka; 道明信岡
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-09-18
Filing date: 1995-09-18
Publication date: 1997-03-28

Abstract

PROBLEM TO BE SOLVED: To provide an occidental character recognition device that offers high processing speed, high recognition accuracy, and good operability and work efficiency even when many recognized characters have low recognition likelihood. SOLUTION: The device comprises: a character processing method judgement means 15 that decides how the recognized characters are processed according to the ratio of recognized characters with high recognition likelihood in the document; a contact character separation means 7 that, when the judgement means 15 finds this ratio to be high, treats recognized characters with low recognition likelihood as contact characters, obtains a separating position from their graphic features, and separates each contact character at that position; a word candidate detection means 16 that, when the judgement means 15 finds this ratio to be low, collates a word string composed of the recognized characters with the words stored in a word dictionary, selects one or more word candidates, and displays them on a display part; and a word candidate establishment means 17 by which a user selects the correct word from the word candidates displayed by the word candidate detection means 16.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a Western character recognition device and a Western character recognition method that convert image data of Western characters such as the alphabet, input by an image input device such as an image scanner, into character data consisting of character codes such as ASCII codes, so that the data can be processed easily by a computer or the like. In particular, it relates to a Western character recognition device and a Western character recognition method used for recognizing image data whose image quality is poor and in which contact characters (characters that touch one another) are difficult to tell apart.

[0002]

[Prior Art] In recent years, reading documents containing characters into a computer with an image input device such as an image scanner and processing the characters there has attracted attention, because large amounts of character data can be processed in a short time. Processing inside a computer requires a character recognition device that converts image data into character codes or the like. Western characters in particular have come into practical use because they have few character types and high conversion efficiency. In such use, accurate, high-quality conversion is desired regardless of the quality of the input image data.

[0003] A conventional Western character recognition device is described below. FIG. 4 is a functional block diagram of a conventional Western character recognition device. In FIG. 4, 1 is an image input unit that inputs a recognition target document containing characters as binary image data; 2 is an image storage unit that stores the image data input by the image input unit 1; 3 is a character cutting means that obtains a rectangle circumscribing each character in the image data stored in the image storage unit 2 and cuts out this rectangle as a character area; 4 is a line cutting means that measures the horizontal distribution of black pixels in the character areas cut out by the character cutting means 3 and cuts the character areas into line areas; 5 is a word cutting means that measures the vertical distribution of black pixels in each line area cut out by the line cutting means 4 and cuts out word areas; 6 is a character recognition means that extracts the distribution of black pixels of the image in each character area obtained by the character cutting means 3 as a graphic feature, pattern-matches this graphic feature against the graphic features of standard characters stored in a recognition dictionary described later to determine the recognized character, and at the same time measures the recognition likelihood from the degree of matching; 7 is a contact character separating means that treats those recognized characters determined by the character recognition means 6 whose recognition likelihood is low as contact characters, detects a separation position from the graphic features, cuts the contact character at this separation position, and has the character recognition means 6 match the result against the recognition dictionary again; 8 is a recognition dictionary that stores the graphic features of all characters to be recognized; 9 is a character area storage unit that stores the character areas cut out by the character cutting means 3; 10 is a line area storage unit that stores the line areas cut out by the line cutting means 4; 11 is a word area storage unit that stores the word areas cut out by the word cutting means 5; 12 is a recognition result storage unit that stores the recognized characters obtained by the character recognition means 6; 13 is a control unit comprising the character cutting means 3, the line cutting means 4, the word cutting means 5, the character recognition means 6, and the contact character separating means 7; and 14 is a recognition result output unit that outputs the final recognition result stored in the recognition result storage unit 12.

[0004] The operation of the Western character recognition device configured as above is described below. First, a recognition target document containing Western characters is read into the device through the image input unit 1 as a binary image and stored in the image storage unit 2. Next, the character cutting means 3 cuts out, from the image data stored in the image storage unit 2, the rectangular areas circumscribing black pixel areas that appear to be character patterns, and stores them as character areas in the character area storage unit 9. Next, the line cutting means 4 cuts out line areas from the horizontal distribution of black pixels in the character areas stored in the character area storage unit 9 and stores them in the line area storage unit 10. Next, the word cutting means 5 uses the character areas stored in the character area storage unit 9 and the line areas stored in the line area storage unit 10 to measure the vertical distribution of black pixels for each line area, cuts out word areas, and stores them in the word area storage unit 11. Next, the character recognition means 6 extracts the graphic features corresponding to each character in the character areas stored in the character area storage unit 9, pattern-matches them against the graphic features of the standard characters stored in the recognition dictionary 8, takes the character with the similar pattern as the recognized character, calculates the degree of similarity as the recognition likelihood, and stores the character code of the recognized character and the recognition likelihood in the recognition result storage unit 12. Next, when the recognition likelihood stored in the recognition result storage unit 12 is low, the contact character separating means 7 judges the character area to be a contact character, detects a separation position from the graphic features of the contact character, separates it at that position, and then has the character recognition means 6 perform character recognition again. Finally, the recognition result output unit 14 outputs the recognition result stored in the recognition result storage unit 12, which completes the recognition processing of the Western characters.
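
The line cutting step described above, which works from the horizontal distribution of black pixels, can be illustrated with a minimal Python sketch. It assumes the binary image is a list of rows with 1 for black and 0 for white, and that text lines are separated by at least one row without black pixels. The function name `segment_lines` is illustrative and does not come from the patent.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (top, bottom) row indices, inclusive, for each text line.

    Assumption for this sketch: a row that contains no black pixel
    separates two lines. Real documents may need noise tolerance.
    """
    # Horizontal projection profile: number of black pixels in each row.
    profile = [sum(row) for row in image]

    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being scanned
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a line begins
        elif count == 0 and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(profile) - 1))
    return lines
```

The same idea applied to columns inside one line area gives the vertical distribution that the word cutting means relies on.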

[0005]

[Problems to Be Solved by the Invention] However, in the conventional Western character recognition device described above, when the print quality of the recognition target document is low and many characters touch one another, the number of contact characters increases, re-recognition becomes more frequent, recognition takes longer, and work efficiency falls. As a result, post-processing after recognition takes considerable effort, and workability and work efficiency are poor.

[0006] The present invention solves the above conventional problems. Its object is to provide a Western character recognition device that, when many recognized characters have low recognition likelihood, performs a word search using the recognized characters with high recognition likelihood as keys and selects word candidates, thereby achieving high processing speed, high recognition accuracy, and high workability and work efficiency, and to provide a Western character recognition method that has both high recognition speed and high recognition accuracy and is excellent in workability and reliability.

[0007]

[Means for Solving the Problems] To achieve this object, the Western character recognition device according to claim 1 of the present invention comprises: a character processing method determination means that determines the processing method for the recognized characters according to the proportion of recognized characters with high recognition likelihood in the document; a contact character separating means that, when the character processing method determination means finds the proportion of recognized characters with high recognition likelihood to be high, judges recognized characters with low recognition likelihood to be contact characters, obtains a separation position from the graphic features, and separates the contact character at the separation position; a word candidate detection means that, when the character processing method determination means finds the proportion of recognized characters with high recognition likelihood to be low, matches a word string made up of the recognized characters against the words stored in a word dictionary, selects one or more word candidates, and displays them on a display unit; and a word candidate confirmation means by which the user selects the correct word from the word candidates displayed by the word candidate detection means.

[0008] The Western character recognition method according to claim 2 of the present invention comprises: a character processing method determination step of determining the processing method for the recognized characters according to the proportion of recognized characters with high recognition likelihood in the document; a contact character separation step of, when the character processing method determination step finds the proportion of recognized characters with high recognition likelihood to be high, judging recognized characters with low recognition likelihood to be contact characters, obtaining a separation position from the graphic features, and separating the contact character at the separation position; a word candidate detection step of, when the character processing method determination step finds the proportion of recognized characters with high recognition likelihood to be low, matching a word string made up of the recognized characters against the words stored in a word dictionary, selecting one or more word candidates, and displaying them on a display unit; and a word candidate confirmation step in which the user selects the correct word from the word candidates displayed in the word candidate detection step.

[0009] The judgment value for the recognition likelihood used by the character processing method determination means may be fixed in advance, or it may be made variable so that the user can set it while checking the recognition results.
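
As a rough illustration of this decision, the following Python sketch keeps both judgment values as parameters so that a user could adjust them. The 85% ratio is the figure used in the first embodiment; the per-character value of 0.8, the 0-to-1 likelihood scale, and all names are assumptions made only for this example.

```python
from typing import Sequence


def choose_processing_method(likelihoods: Sequence[float],
                             likelihood_threshold: float = 0.8,
                             ratio_threshold: float = 0.85) -> str:
    """Pick the follow-up processing from per-character likelihoods.

    likelihood_threshold: hypothetical value above which a recognized
        character counts as "high likelihood" (not given in the patent).
    ratio_threshold: share of high-likelihood characters required to
        take the contact-character separation path.
    """
    if not likelihoods:
        raise ValueError("no recognized characters to evaluate")

    high = sum(1 for value in likelihoods if value >= likelihood_threshold)
    ratio = high / len(likelihoods)

    # Mostly reliable document: split the few doubtful characters and
    # recognize them again. Otherwise fall back to dictionary lookup.
    if ratio >= ratio_threshold:
        return "separate_contact_characters"
    return "detect_word_candidates"
```

Exposing both thresholds as arguments mirrors the option of letting the user tune the judgment value while checking results.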

[0010]

[Operation] With this configuration, the character processing method determination means accurately determines the proportion of recognized characters with high recognition likelihood in the document, so that the subsequent steps can be chosen to improve recognition speed and recognition accuracy. The contact character separating means accurately identifies the separation position of a recognized character judged to be a contact character, separates the recognized character at that position, and has recognition performed again, which improves recognition accuracy. The word candidate detection means matches the recognized characters, each identified character by character, against the word dictionary in units of words, which further improves recognition accuracy. The word candidate confirmation means has the user judge whether a word candidate is correct, which raises recognition accuracy and at the same time makes corrections in post-processing unnecessary.
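
One way to picture the separation step is to cut a glyph at the interior column that contains the fewest black pixels. The sketch below is only an assumption about how a separation position might be derived from the pixel distribution; the patent does not give a formula, and the names `find_split_column`, `split_glyph`, and the `margin` parameter are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 0 (white) / 1 (black)


def find_split_column(glyph: Image, margin: int = 1) -> int:
    """Return the interior column with the fewest black pixels.

    `margin` columns at each edge are skipped so the cut never
    produces an empty piece. Ties go to the leftmost column.
    """
    width = len(glyph[0])
    if width < 2 * margin + 1:
        raise ValueError("glyph too narrow to split")
    # Vertical projection profile: black pixels per column.
    profile = [sum(row[x] for row in glyph) for x in range(width)]
    return min(range(margin, width - margin), key=lambda x: profile[x])


def split_glyph(glyph: Image, column: int) -> Tuple[Image, Image]:
    """Cut the glyph so that `column` becomes the first column of the right part."""
    left = [row[:column] for row in glyph]
    right = [row[column:] for row in glyph]
    return left, right
```

In a full system each piece would be passed back to the recognizer, and the cut would be kept only if the pieces score better than the unsplit glyph.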

[0011]

【Example】

(Embodiment 1) A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a functional block diagram of the Western character recognition device in the first embodiment, and FIG. 2 is a device block diagram of the Western character recognition device in the first embodiment. In FIG. 1, 1 is an image input unit, 2 is an image storage unit, 3 is a character cutting means, 4 is a line cutting means, 5 is a word cutting means, 6 is a character recognition means, 7 is a contact character separating means, 8 is a recognition dictionary, 9 is a character area storage unit, 10 is a line area storage unit, 11 is a word area storage unit, 12 is a recognition result storage unit, 13 is a control unit, and 14 is a recognition result output unit. These are the same as in the conventional example, so they are given the same reference numerals and their description is omitted.

[0012] 15 is a character processing method determination means that determines, based on the recognition likelihood of the recognized characters stored in the recognition result storage unit 12, whether in the subsequent steps the recognized characters are to be separated or matched against the word dictionary in units of words; 16 is a word candidate detection means that matches the word areas stored in the word area storage unit 11 and the recognized characters stored in the recognition result storage unit 12 against the words stored in a word dictionary described later and selects one or more similar words as word candidates; 17 is a word candidate confirmation means by which the user decides the correct word from the word candidates selected by the word candidate detection means 16 and displayed on a display unit described later, thereby fixing the recognition result; 18 is a word candidate storage unit that stores the word candidates detected by the word candidate detection means 16 and their coordinates; 19 is a word dictionary that stores the spellings of all words; 20 is a display unit that displays the word candidates stored in the word candidate storage unit 18 and the positions of the word candidates within the image data stored in the image storage unit 2; and 21 is a recognition result input unit through which the user selects the correct word from the word candidates displayed on the display unit 20 and inputs it to the Western character recognition device as the recognition result.
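
The dictionary lookup performed by the word candidate detection means can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented matching procedure: high-likelihood characters are treated as fixed keys, and each low-likelihood character is allowed to stand for one or two unknown letters, since a contact character may hide two touching letters. The threshold of 0.8 and the name `find_word_candidates` are hypothetical.

```python
import re
from typing import List, Sequence, Tuple


def find_word_candidates(recognized: Sequence[Tuple[str, float]],
                         dictionary: Sequence[str],
                         likelihood_threshold: float = 0.8) -> List[str]:
    """Return dictionary words compatible with the reliable characters.

    recognized: (character, likelihood) pairs for one word area.
    """
    parts = []
    for char, likelihood in recognized:
        if likelihood >= likelihood_threshold:
            parts.append(re.escape(char))  # trusted character: must match
        else:
            parts.append(".{1,2}")         # doubtful: one or two letters
    pattern = re.compile("".join(parts))
    # Keep dictionary order so the display order is predictable.
    return [word for word in dictionary if pattern.fullmatch(word)]


# Example: "rn" printed so tightly that it was read as a doubtful "m".
reading = [("c", 0.95), ("o", 0.90), ("m", 0.30), ("e", 0.92), ("r", 0.90)]
candidates = find_word_candidates(reading, ["corner", "comer", "cover", "career"])
```

The resulting list corresponds to the candidates that would be shown on the display unit 20 for the user to choose from.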

[0013] In FIG. 2, 22 is an image input device, such as an image scanner or a camera, that reads the recognition target document into the Western character recognition device as image data; 23 is an input device, such as a keyboard, trackball, or pointing device, with which the user issues system start and end commands, inputs recognition results, and issues other commands; 24 is a display device, such as a CRT or liquid crystal display, that displays the operating status of the system, image data, word candidates, and the like; 25 is a central processing unit (abbreviated as CPU) that controls the whole device; 26 is a control program that contains the control instructions and is loaded and used by the CPU; 27 is a read-only memory (ROM) in which the control program 26 is stored; 28 is the image data stored in the image storage unit 2; 29 is the character area data stored in the character area storage unit 9; 30 is the line area data stored in the line area storage unit 10; 31 is the word area data stored in the word area storage unit 11; 32 is the recognition result data stored in the recognition result storage unit 12; 33 is the word candidate data stored in the word candidate storage unit 18; 34 is a writable and erasable random access memory (RAM) that stores the data produced during computation; 35 is an output device such as a printer that outputs the recognition result; and 36 is an internal bus that connects the component devices and carries control signals, data, and the like between them.

[0014] The operation of the Western character recognition device of the first embodiment configured as above is described below with reference to the drawings. FIG. 3 is a flowchart showing the operation of the Western character recognition device in the first embodiment. First, the image input unit 1 converts the image of the recognition target document containing Western characters into binary image data and stores it in the image storage unit 2 (image input processing, S1). Next, the character cutting means 3 regards each group of black pixels in the image data stored in the image storage unit 2 that are connected through their eight neighboring points as one character pattern and cuts out the rectangle circumscribing this character pattern as a character area. If an extracted rectangle is very small and another rectangle lies immediately adjacent to it in the vertical direction, the two are regarded as parts of a separated character such as i or j and are merged into one character area; otherwise the extracted circumscribing rectangle is taken as one character area. For each character rectangle, the coordinates of the upper-left corner and the width and height of the rectangle are stored in the character area storage unit 9 (character cutting processing, S2). Next, the line cutting means 4 examines how the character rectangles of the character areas stored in the character area storage unit 9 overlap in the horizontal direction, obtains the line area of each line, and stores the coordinates of the upper-left and lower-right corners of each line in the line area storage unit 10 (line cutting processing, S3). Next, the word cutting means 5 takes, for each line area stored in the line area storage unit 10, a histogram of the horizontal gaps between the character areas belonging to that line, finds a threshold that divides the peaks of this histogram in two, treats any gap between characters wider than this threshold as a word break, and stores the coordinates of the upper-left and lower-right corners of each word in the word area storage unit 11 (word cutting processing, S4). Next, the character recognition means 6 retrieves from the image storage unit 2 the image data within every character area stored in the character area storage unit 9, extracts the distribution of black pixels as a graphic feature, pattern-matches it against the graphic features in the recognition dictionary 8, calculates a recognition likelihood for each similar character, takes the character with the highest recognition likelihood as the recognized character, and stores its character code as the recognition result, together with the recognition likelihood, in the recognition result storage unit 12 (character recognition processing, S5). Next, the character processing method determination means 15 determines whether recognized characters whose recognition likelihood stored in the recognition result storage unit 12 is at or above a predetermined value account for 85% or more of the recognized characters (character processing method determination processing, S6). If YES, the contact character separating means 7 cuts the image of each recognized character with low recognition likelihood at a point where its graphic features show a weak graphical connection, for example where the vertical distribution of black pixels is small and the parts are joined at only one place (contact character separation processing, S7). Next, the image data cut by the contact character separating means 7 is subjected to character recognition again by the character recognition means 6, and the result with the highest recognition likelihood is stored in the recognition result storage unit 12 as the recognition result (character recognition processing, S8). Step S11 is then executed. If step S6 is NO, the word candidate detection means 16 divides the recognized characters into word strings using the recognized characters stored in the recognition result storage unit 12 and the word areas stored in the word area storage unit 11, matches each word string against the words stored in the word dictionary 19, and displays words with a certain degree of matching accuracy on the display unit 20 as word candidates (word candidate detection processing, S9). Next, through the word candidate confirmation means 17, the user selects the correct word from the word candidates displayed on the display unit 20, and it is stored in the recognition result storage unit 12 as the recognition result (word candidate confirmation processing, S10). Finally, the recognition result is output to the recognition result output unit 14 (recognition result output processing, S11), and the operation is complete.
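
The word cutting processing (S4) can be sketched in Python as below. The patent only says that a threshold dividing the peaks of the gap histogram in two is found; this sketch assumes one simple way of doing that, namely placing the threshold in the middle of the largest jump between sorted gap widths. Character boxes follow the (left, top, width, height) layout stored in S2, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def gap_threshold(gaps: Sequence[int]) -> float:
    """Midpoint of the largest jump between consecutive sorted gaps.

    Returns infinity when all gaps are equal (or there are fewer than
    two), so that no word break is produced.
    """
    ordered = sorted(gaps)
    best_jump, threshold = 0, float("inf")
    for small, large in zip(ordered, ordered[1:]):
        if large - small > best_jump:
            best_jump = large - small
            threshold = (small + large) / 2
    return threshold


def group_into_words(boxes: Sequence[Box]) -> List[List[Box]]:
    """Group the character boxes of one line into words."""
    ordered = sorted(boxes)  # left-to-right by the `left` coordinate
    if not ordered:
        return []
    # Horizontal gap between each box and the next one.
    gaps = [b[0] - (a[0] + a[2]) for a, b in zip(ordered, ordered[1:])]
    threshold = gap_threshold(gaps)

    words: List[List[Box]] = [[ordered[0]]]
    for gap, box in zip(gaps, ordered[1:]):
        if gap > threshold:
            words.append([box])    # wide gap: start a new word
        else:
            words[-1].append(box)  # narrow gap: same word
    return words
```

A histogram-based method such as Otsu thresholding would be another reasonable choice for separating the two peaks; the largest-jump rule is used here only because it keeps the example short.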

[0015] According to this embodiment, the device is provided with a character processing method determination means that determines the processing method for the recognized characters based on the distribution of recognition likelihood, a word candidate detection means that matches words against the word dictionary and selects one or more word candidates when the proportion of recognized characters with high recognition likelihood is small, and a word candidate confirmation means by which the user decides the recognition result from the word candidates. Therefore, even for poor-quality image data in which few recognized characters have high recognition likelihood and the recognition likelihood is widely spread, accurate character recognition can be achieved by having the user select from the word candidates. As a result, laborious post-processing is unnecessary, character recognition is fast, and work efficiency is high. In addition, even unskilled operators can do the work easily, which improves workability.

[0016]

[Effects of the Invention] As described above, the present invention comprises: a character processing method determination means that determines the processing method for the recognized characters according to the proportion of recognized characters with high recognition likelihood in the document; a contact character separating means that, when the character processing method determination means finds the proportion of recognized characters with high recognition likelihood to be high, judges recognized characters with low recognition likelihood to be contact characters, obtains a separation position from the graphic features, and separates the contact character at the separation position; a word candidate detection means that, when the character processing method determination means finds the proportion of recognized characters with low recognition likelihood to be high, matches a word string made up of the recognized characters against the words stored in a word dictionary, selects one or more word candidates, and displays them on a display unit; and a word candidate confirmation means by which the user selects the correct word from the word candidates displayed by the word candidate detection means. Therefore, for degraded image data with a small proportion of recognized characters having high recognition likelihood, recognition is performed word by word using the recognized characters with high recognition likelihood as keys, one or more word candidates are selected, and the operator decides the final word, so misrecognition is reduced, proofreading in later processes is easy, and workability improves as a result. Accordingly, an excellent Western character recognition device can be realized that performs highly reliable character recognition even on recognition target documents that have poor print quality, many contact characters, and a wide distribution of recognition likelihood.

[0017] Further, the present invention comprises: a character processing method determination step of determining how recognized characters are to be processed according to the proportion of recognized characters with high recognition accuracy in the document; a contact character separation step of, when that proportion is found to be high in the character processing method determination step, judging recognized characters with low recognition accuracy to be contact characters, obtaining a separation position from graphic features, and separating the contact characters at that position; a word candidate detection step of, when the proportion of recognized characters with low recognition accuracy is found to be high in the character processing method determination step, collating a word string made up of recognized characters with the words stored in a word dictionary, selecting one or more word candidates, and displaying them on the display unit; and a word candidate determination step in which the user selects the correct word from the word candidates displayed in the word candidate detection step. It is therefore possible to realize an excellent Western character recognition method that is superior in both recognition speed and recognition accuracy, and that achieves high work efficiency together with highly reliable character recognition even for documents containing many contact characters of poor print quality.
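The character processing method determination step can be sketched as a simple branch on the proportion of high-accuracy characters. This is a minimal illustration under stated assumptions: the function name `choose_processing_method`, the returned labels, and both threshold values are hypothetical, since the text here does not specify how "high accuracy" or a "high proportion" is quantified.

```python
def choose_processing_method(accuracies, char_threshold=0.8, ratio_threshold=0.5):
    """Pick a processing path from per-character recognition accuracies.

    char_threshold:  accuracy at or above which a character counts as
                     "high accuracy" (assumed value).
    ratio_threshold: proportion of high-accuracy characters at or above
                     which the document is treated as good quality
                     (assumed value).
    """
    if not accuracies:
        raise ValueError("no recognized characters")
    high = sum(1 for a in accuracies if a >= char_threshold)
    ratio = high / len(accuracies)
    if ratio >= ratio_threshold:
        # Mostly reliable: treat the few doubtful characters as contact
        # characters and split them at a position found from graphic features.
        return "contact_character_separation"
    # Mostly unreliable: fall back to dictionary lookup per word and let
    # the user confirm the correct candidate.
    return "word_candidate_detection"


print(choose_processing_method([0.9, 0.95, 0.3, 0.85]))
print(choose_processing_method([0.9, 0.2, 0.3, 0.1]))
```

The first call has three of four characters above the assumed threshold and takes the separation path; the second has one of four and takes the dictionary path.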

[Brief description of drawings]

FIG. 1 is a functional block diagram of the Western character recognition device according to the first embodiment.

FIG. 2 is a device block diagram of the Western character recognition device according to the first embodiment.

FIG. 3 is a flowchart showing the operation of the Western character recognition device according to the first embodiment.

FIG. 4 is a functional block diagram of a conventional Western character recognition device.

[Explanation of symbols]

1 Image input unit
2 Image storage unit
3 Character cutout means
4 Line cutout means
5 Word cutout means
6 Character recognition means
7 Contact character separation means
8 Recognition dictionary
9 Character area storage unit
10 Line area storage unit
11 Word area storage unit
12 Recognition result storage unit
13 Control unit
14 Recognition result output unit
15 Character processing method determination means
16 Word candidate detection means
17 Word candidate determination means
18 Word candidate storage unit
19 Word dictionary
20 Display unit
21 Recognition result input unit
22 Image input device
23 Input device
24 Display device
25 Central processing unit
26 Control program
27 Read-only memory
28 Image data
29 Character area data
30 Line area data
31 Word area data
32 Recognition result data
33 Word candidate data
34 Random access memory
35 Output device
36 Internal bus

Claims

[Claims]

1. A Western character recognition device comprising: character processing method determination means for determining a processing method for recognized characters according to the proportion of recognized characters having high recognition accuracy in a document; contact character separation means for, when the character processing method determination means finds the proportion of recognized characters having high recognition accuracy to be high, judging recognized characters having low recognition accuracy to be contact characters, obtaining a separation position from graphic features, and separating the contact characters at the separation position; word candidate detection means for, when the character processing method determination means finds the proportion of recognized characters having high recognition accuracy to be low, collating a word string made up of the recognized characters with words stored in a word dictionary, selecting one or more word candidates, and displaying them on a display unit; and word candidate determination means by which a user selects the correct word from the word candidates displayed by the word candidate detection means.

2. A Western character recognition method comprising: a character processing method determination step of determining a processing method for recognized characters according to the proportion of recognized characters having high recognition accuracy in a document; a contact character separation step of, when the proportion of recognized characters having high recognition accuracy is found to be high in the character processing method determination step, judging recognized characters having low recognition accuracy to be contact characters, obtaining a separation position from graphic features, and separating the contact characters at the separation position; a word candidate detection step of, when the proportion of recognized characters having high recognition accuracy is found to be low in the character processing method determination step, collating a word string made up of the recognized characters with words stored in a word dictionary, selecting one or more word candidates, and displaying them on a display unit; and a word candidate determination step in which a user selects the correct word from the word candidates displayed in the word candidate detection step.