JPH10154208A - Character recognition device, and storage medium, which computer storing program for functioning computer as character recognition device can read - Google Patents

Character recognition device, and storage medium, which computer storing program for functioning computer as character recognition device can read

Info

Publication number
JPH10154208A
Authority
JP
Japan
Prior art keywords
character
width
recognition
reliability
cutout
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP8312067A
Other languages
Japanese (ja)
Other versions
JP3665435B2 (en
Inventor
Shiori Ooaku
志緒理 大阿久
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP31206796A priority Critical patent/JP3665435B2/en
Publication of JPH10154208A publication Critical patent/JPH10154208A/en
Application granted granted Critical
Publication of JP3665435B2 publication Critical patent/JP3665435B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PROBLEM TO BE SOLVED: To accurately detect character rectangles that may have been cut out incorrectly, by judging each character rectangle whose width is longer than its height on the basis of the reliability of the recognition result and of information on the target character and the characters before and after it. SOLUTION: An image is input from an image input part 101, and a character recognition processing part 104 performs character recognition processing. A character re-segmentation judgment part 107 assumes that a character whose width is longer than its height is a contact character, and judges re-segmentation to be necessary when any one of the following three conditions applies. (1) The reliability of the target character, as computed by a reliability calculation part 106, is not more than a threshold, either the preceding or the following character is alphanumeric, and the target character is not alphanumeric. (2) The reliability of the target character is not more than the threshold, either the preceding or the following character is alphanumeric, the target character is also alphanumeric, and the part-of-speech information for the target character is "unregistered word". (3) The preceding and following characters are not unregistered words, and the target character and its neighboring characters have the same height.

Description

【発明の詳細な説明】DETAILED DESCRIPTION OF THE INVENTION

【0001】[0001]

【発明が属する技術分野】本発明は日本語文書を対象と
し,その文字認識率を高めるため,特に2文字が接触し
て誤って1文字として切り出されていると想定される接
触文字を検出し,その文字の再切り出しの可否判定・再
処理などを行う文字認識装置およびその文字認識処理を
記憶した記憶媒体に関するものである。
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention is directed to Japanese documents. To raise the character recognition rate, it relates to a character recognition device that detects contact characters, that is, pairs of characters assumed to touch each other and to have been erroneously cut out as one character, and that determines whether such a character should be re-cut and reprocesses it accordingly, and to a storage medium storing this character recognition processing.

【0002】[0002]

【従来の技術】近年,日本語文書は,英字列や数字列な
どが占める割合が高くなってきている。それと共に英数
フォントと日本語フォントとが混在して使用されてきて
いる。また,フォントの数も多種多様となってきてい
る。
2. Description of the Related Art In recent years, alphabetic strings and numeric strings have come to occupy a growing share of Japanese documents. At the same time, alphanumeric fonts and Japanese fonts have come to be used together, and the variety of fonts has also increased.

【0003】このような状況から原稿内の文字間ピッチ
は必ずしも一定ではないし,さらにコピー原稿ではコピ
ー時に生じる英数字の接触文字も多く見られる。
[0003] Under such circumstances, the pitch between characters in a document is not always constant, and in a copy document, there are many alphanumeric contact characters generated during copying.

【0004】このため,文字認識を行う際,英数字など
の半角文字と日本語部分の全角文字の文字矩形を的確に
切り出すために,1度求めた文字矩形を何らかの方法を
用い,誤っている可能性のあるものを検出し,再切り出
しを行うなどの方法が行われている。
For this reason, when character recognition is performed, in order to accurately cut out the character rectangles of half-width characters such as alphanumeric characters and of the full-width characters of the Japanese portions, methods are used in which the character rectangles obtained once are examined by some method, those that may be erroneous are detected, and re-cutting is performed.

【0005】たとえば,このような技術に関連する参考
技術文献として,特開平4−1881号公報の『文字読
取り装置』には,全角文字と判断されない仮文字を文字
サイズ位置情報に基づき,英数字をカテゴリ分類し,分
類できなかったものを接触文字であるとみなして分割処
理を行うことや,総合文字/分離文字の妥当性を言語的
なルールにより判定を行うことが開示されている。
[0005] For example, as a reference document relating to such a technique, the "character reading device" of Japanese Patent Laid-Open No. Hei 4-1881 discloses classifying provisional characters that are not judged to be full-width characters into alphanumeric categories based on character size and position information, regarding those that could not be classified as contact characters and dividing them, and judging the validity of combined characters and separated characters by linguistic rules.

【0006】また,特開平4−184584号公報の
『文字認識方法及び装置』には,認識結果の確かでない
文字に対し,文字矩形を分割/再認識処理することが開
示されている。
Japanese Patent Application Laid-Open No. 4-184584 discloses a "character recognition method and apparatus" in which a character rectangle is divided / re-recognized for a character whose recognition result is not certain.

【0007】[0007]

【発明が解決しようとする課題】しかしながら,上記に
示されるような従来の技術にあっては,以下に述べるよ
うな問題点があった。
However, the prior art as described above has the following problems.

【0008】特開平4−1881号公報では,文字サイ
ズなどの認識処理前の情報から英数字であるかを判定し
ているが,認識後の結果に対して文字の再切り出し処理
を行っていないため,誤っている可能性のある文字を的
確に検出することができないことがあった。また,特開
平4−184584号公報では,2つの認識結果の評価
比較を行っていないため,文字として不適切な結果が出
力されてしまうおそれがあった。
In Japanese Patent Application Laid-Open No. Hei 4-1881, whether a character is alphanumeric is determined from information available before recognition processing, such as character size, but character re-cutout processing is not applied to the results after recognition. As a result, characters that may be incorrect cannot always be detected accurately. Further, in Japanese Patent Application Laid-Open No. 4-184584, the two recognition results are not evaluated and compared, so a result that is inappropriate as a character may be output.

【0009】本発明は,上記に鑑みてなされたものであ
って,再切り出しを行うために,誤っている可能性のあ
る文字矩形の的確な検出を実現することを第1の目的と
する。
The present invention has been made in view of the above, and has as its first object to realize accurate detection of a possibly incorrect character rectangle in order to perform re-cutting.

【0010】また,文字として不適切な結果を出力しな
いようにすることにより,切り出しの誤りを最小限にす
ることを第2の目的とする。
It is a second object of the present invention to minimize an error in clipping by preventing an inappropriate result from being output as a character.

【0011】[0011]

【課題を解決するための手段】上記の目的を達成するた
めに,請求項1に係る文字認識装置にあっては,入力さ
れた文字画像情報から文字列と文字矩形とを切り出して
文字矩形を認識し,該認識結果の確からしさを表す信頼
度を求め,さらに前記信頼度に応じて前記文字矩形を再
切り出して認識し,最初の認識結果と再切り出し後の認
識結果とを比較評価し,尤もらしい結果を選択・出力す
る文字認識装置において,文字の構成矩形数が1であ
り,かつ文字幅が文字高さより長い文字矩形に対し,認
識結果の信頼度と着目文字および該着目文字の前後の文
字情報とに基づいて,文字を分離する再切り出し処理の
実行可否を判定する文字再切り出し判定手段を備えたも
のである。
To achieve the above objects, the character recognition device according to claim 1 is a device that cuts out character strings and character rectangles from input character image information, recognizes the character rectangles, obtains a reliability indicating the certainty of each recognition result, re-cuts and recognizes the character rectangle according to the reliability, compares and evaluates the first recognition result and the recognition result after re-cutting, and selects and outputs the more likely result. The device comprises character re-cutout determination means which, for a character rectangle in which the number of constituent rectangles is one and the character width is longer than the character height, determines whether or not to execute re-cutout processing for separating characters, based on the reliability of the recognition result and on character information of the target character and the characters before and after it.

【0012】すなわち,文字を構成している矩形の数が
1であり,かつ文字幅が文字の高さを上回る文字を,2
文字が接触して1文字として切り出されると想定される
接触文字候補と仮定し,着目文字の信頼度がしきい値以
下で,着目文字の前後いずれかが英数字であり,かつ着
目文字が英数字でない場合に再切り出しが必要であると
判定することにより,文字の再切り出しを行う際に,誤
っている可能性のある文字を絞り込むことができ,再切
り出し範囲を的確に検出することが可能となる。
That is, a character in which the number of constituent rectangles is one and whose width exceeds its height is assumed to be a contact character candidate, in which two characters touch and are cut out as one character. Re-cutting is judged to be necessary when the reliability of the target character is equal to or less than the threshold value, either the preceding or the following character is alphanumeric, and the target character is not alphanumeric. As a result, the characters that may be incorrect can be narrowed down when re-cutting is performed, and the range to be re-cut can be detected accurately.

【0013】また,請求項2に係る文字認識装置にあっ
ては,前記文字再切り出し判定手段は,文字の構成矩形
数が1であり,かつ文字幅が文字高さより長い文字矩形
に対し,認識結果の信頼度と着目文字および該着目文字
の前後の文字情報と着目文字および該着目文字の直前の
文字の高さとに基づいて,文字を分離する再切り出し処
理の実行可否を判定するものである。
In the character recognition device according to claim 2, the character re-cutout determination means determines, for a character rectangle in which the number of constituent rectangles is one and the character width is longer than the character height, whether or not to execute re-cutout processing for separating characters, based on the reliability of the recognition result, on character information of the target character and the characters before and after it, and on the heights of the target character and the character immediately before it.

【0014】すなわち,着目文字とその前の文字の高さ
をチェックし,前の文字と文字の高さが同じ場合にの
み,再切り出しが必要であると判定することにより,着
目画素の信頼度と前後の文字種情報を用いる請求項1の
処理では解決できない場合に対しても再切り出しが可能
となる。
That is, the heights of the target character and the character before it are checked, and re-cutting is judged to be necessary only when the two heights are the same. This makes re-cutting possible even in cases that cannot be resolved by the processing of claim 1, which uses the reliability of the target character and the character type information of the preceding and following characters.

【0015】また,請求項3に係る文字認識装置にあっ
ては,前記文字再切り出し判定手段は,文字の構成矩形
数が1であり,かつ文字幅が文字高さより長い文字矩形
に対し,認識結果の信頼度と着目文字の前後の文字情報
と着目文字の前後文字との間隔(空白)幅の比とに基づ
いて,文字を分離する再切り出し処理の実行可否を判定
するものである。
In the character recognition device according to claim 3, the character re-cutout determination means determines, for a character rectangle in which the number of constituent rectangles is one and the character width is longer than the character height, whether or not to execute re-cutout processing for separating characters, based on the reliability of the recognition result, on character information of the characters before and after the target character, and on the ratio of the widths of the intervals (blanks) between the target character and the characters before and after it.

【0016】すなわち,着目文字とその前後の文字間隔
をチェックし,着目文字の文字種が前後文字のうちの空
白幅が狭い方の文字の文字種と一致していない場合に,
再切り出しが必要であると判定することにより,さらに
請求項2の処理では解決できない場合に対しても再切り
出しが可能となる。
That is, the spacing between the target character and the characters before and after it is checked, and re-cutting is judged to be necessary when the character type of the target character does not match the character type of whichever neighboring character is separated from it by the narrower blank. This makes re-cutting possible even in cases that cannot be resolved by the processing of claim 2.
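The spacing check described above can be sketched as follows. This is only an illustrative reading of the rule, not the patented implementation: the function name, the string labels for character types, and the handling of equal gaps (the following character is used) are assumptions that do not appear in the source.

```python
def needs_recut_by_gap(target_type: str,
                       prev_type: str,
                       next_type: str,
                       left_gap: float,
                       right_gap: float) -> bool:
    """Return True when the target's character type differs from the type of
    the neighbor that is separated from it by the narrower blank.

    Assumption: when both gaps are equal, the following character is used.
    """
    nearer_type = prev_type if left_gap < right_gap else next_type
    return target_type != nearer_type
```

For example, a character recognized as kanji that sits close to an alphanumeric character on its left and far from a kanji on its right would be flagged, while the mirrored layout would not.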

【0017】また,請求項4に係る文字認識装置にあっ
ては,前記文字再切り出し判定手段は,文字の構成矩形
数が1であり,かつ文字幅が文字高さより長い文字矩形
に対し,認識結果の言語情報(品詞情報)に基づいて,
文字を分離する再切り出し処理の実行可否を判定するも
のである。
In the character recognition device according to claim 4, the character re-cutout determination means determines, for a character rectangle in which the number of constituent rectangles is one and the character width is longer than the character height, whether or not to execute re-cutout processing for separating characters, based on the linguistic information (part-of-speech information) of the recognition result.

【0018】すなわち,認識結果の言語情報(品詞情
報)をチェックすることにより,単語として正しい英字
を再切り出し処理にかける必要がなくなるので,切り出
しミスを少なくすることが可能となる。
That is, by checking the linguistic information (part-of-speech information) of the recognition result, alphabetic characters that are correct as words no longer need to be subjected to re-cutout processing, so cutout errors can be reduced.

【0019】また,請求項5に係る文字認識装置にあっ
ては,前記文字再切り出し判定手段により再度文字を分
離する再切り出し処理を行うと判定され,1回目の認識
結果と異なる文字切り出し方法で再切り出し位置を検出
した場合,認識結果の信頼度と文字種とから再度標準文
字幅を求め,該標準文字幅を用いて分割文字矩形幅の妥
当性を判定し,文字再切り出し処理の実行可否を判定・
処理する文字再切り出し処理手段をさらに備えたもので
ある。
The character recognition device according to claim 5 further comprises character re-cutout processing means. When the character re-cutout determination means has determined that re-cutout processing for separating the character is to be performed, and a re-cutout position has been detected by a character cutout method different from that used for the first recognition, this means obtains the standard character width anew from the reliability and character type of the recognition results, judges the validity of the divided character rectangle widths using that standard character width, and thereby determines whether or not to execute the character re-cutout processing and carries it out.

【0020】すなわち,再切り出しされた分割矩形が文
字として適切であるかを判定することにより,異常な矩
形を再認識処理にかける必要がなくなる。
That is, by judging whether or not the re-cut out divided rectangle is appropriate as a character, it is not necessary to apply an abnormal rectangle to the re-recognition processing.

【0021】また,請求項6に係る文字認識装置にあっ
ては,前記文字再切り出し処理手段は,1回目の認識結
果と異なる文字切り出し方法で再切り出し位置が適切で
ないと判定された場合,認識結果の前後の文字種情報と
着目文字の前後の空白幅とを用い,文字再切り出し処理
の実行可否を再度判定するものである。
In the character recognition device according to claim 6, when the re-cutout position obtained by the character cutout method different from that of the first recognition is determined to be inappropriate, the character re-cutout processing means determines again whether or not to execute the character re-cutout processing, using the character type information of the characters before and after in the recognition result and the blank widths before and after the target character.

【0022】また,請求項7に係る文字認識装置にあっ
ては,前記文字再切り出し処理手段は,着目文字を分割
するのが適切であると判定した場合,前記文字切り出し
方法とは異なる第三の文字切り出し方法を用いて再切り
出し処理を実行するものである。
In the character recognition device according to claim 7, when the character re-cutout processing means determines that it is appropriate to divide the target character, it executes the re-cutout processing using a third character cutout method different from the above character cutout method.

【0023】すなわち,請求項6または請求項7では,
請求項5によって不適切と判定されたものに限定して,
再度切り出しの実行可否を判定し,適切と判定されたも
ののみに対して再切り出しを実行することにより,処理
の効率向上が可能となる。
That is, in claim 6 or claim 7, whether or not to execute cutting out is determined again only for those characters judged inappropriate under claim 5, and re-cutting is executed only for those then judged appropriate, so the efficiency of the processing can be improved.

【0024】また,請求項8に係る文字認識装置にあっ
ては,文字の切り出し位置が異なる複数の認識結果から
尤もらしい結果を一意に選択する場合,認識結果の信頼
度と文字種とから再度標準文字幅を求め,再認識した結
果の文字種と前記標準文字幅とを用いて評価し,複数の
認識結果候補から不適切な候補を排除して出力する結果
評価選択手段をさらに備えたものである。
The character recognition device according to claim 8 further comprises result evaluation selection means which, when the likely result is to be uniquely selected from a plurality of recognition results with different character cutout positions, obtains the standard character width anew from the reliability and character type of the recognition results, evaluates the candidates using the character type of the re-recognized result and the standard character width, excludes inappropriate candidates from the plurality of recognition result candidates, and outputs the result.

【0025】すなわち,文字幅が不適切な認識結果を排
除し,確定文字として出力しないことにより,文字とし
て適したもののみ再切り出しを実行するので,処理の効
率向上が可能となる。
That is, recognition results with an inappropriate character width are excluded and are not output as fixed characters, so re-cutting is executed only for results that are suitable as characters, and the processing efficiency can be improved.

【0026】また,請求項9に係る文字認識装置にあっ
ては,前記標準文字幅を,任意の処理対象行における認
識結果の信頼度が高く,かつ文字種が漢字である文字か
ら求めるものである。
Further, in the character recognition device according to the ninth aspect, the standard character width is obtained from a character having a high degree of reliability of a recognition result in an arbitrary processing target line and a character type of a kanji. .

【0027】すなわち,請求項5,6または8におい
て,標準文字幅を,任意の処理対象行における認識結果
の信頼度が高く,かつ文字種が漢字である文字から求め
ることにより,その認識精度を高めることが可能とな
る。
That is, in claim 5, 6 or 8, the standard character width is obtained from characters in an arbitrary processing target line whose recognition result has high reliability and whose character type is kanji, which makes it possible to improve the recognition accuracy.

【0028】また,請求項10に係る記憶媒体にあって
は,コンピュータを,前記請求項1〜9記載のいずれか
一つに記載の文字認識装置の文字再切り出し判定手段,
文字再切り出し処理手段および結果評価選択手段として
機能させるためのプログラムを格納したものである。
The storage medium according to claim 10 stores a program for causing a computer to function as the character re-cutout determination means, the character re-cutout processing means, and the result evaluation selection means of the character recognition device according to any one of claims 1 to 9.

【0029】すなわち,請求項10記載の記憶媒体から
プログラムを読み取ることにより,コンピュータ上で文
字認識処理を実行することができる。
That is, by reading the program from the storage medium according to the tenth aspect, character recognition processing can be executed on a computer.

【0030】[0030]

【発明の実施の形態】以下,本発明に係る文字認識装置
およびコンピュータを文字認識装置として機能させるプ
ログラムを格納したコンピュータが読取可能な記憶媒体
について添付図面を参照し,詳細に説明する。
DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, a character recognition device according to the present invention, and a computer-readable storage medium storing a program for causing a computer to function as the character recognition device, will be described in detail with reference to the accompanying drawings.

【0031】〔実施の形態の構成〕図1は,実施の形態
に係る文字認識装置の構成を示すブロック図である。図
において,この文字認識装置は,認識対象のイメージ画
像を入力する画像入力部101と,イメージ画像を行単
位で抽出する行切り出し部102と,行切り出し部10
2で抽出された行部分の文字情報を抽出する文字切り出
し部103と,文字切り出し部103で切り出された文
字情報を認識する文字認識処理部104と,文字認識処
理部104で認識された文字に対し,形態素解析などを
行う言語処理を含んだ後処理を行う後処理部(言語処理
部)105と,認識結果の確からしさを表す信頼度を算
出する信頼度算出部106と,2文字が接触して誤って
1文字として切り出されていると想定されている接触文
字を検出し,その文字の再切り出しの実行可否を判定す
る文字再切り出し判定手段としての文字再切り出し判定
部107と,再切り出しが必要であると判定された文字
矩形に対し,文字矩形を分割する文字再切り出し処理手
段としての文字再切り出し処理部108と,文字矩形の
再切り出し前の認識結果と再切り出し後の認識結果とを
比較し,尤もらしい結果を選択・出力する結果評価選択
手段としての結果評価選択部109と,文字情報を確定
し文字列として出力する確定文字列出力部110と,か
ら構成されている。
[Structure of Embodiment] FIG. 1 is a block diagram showing the structure of a character recognition device according to an embodiment. In the figure, the character recognition device comprises: an image input unit 101 that inputs the image to be recognized; a line cutout unit 102 that extracts the image line by line; a character cutout unit 103 that extracts the character information of the line portion extracted by the line cutout unit 102; a character recognition processing unit 104 that recognizes the character information cut out by the character cutout unit 103; a post-processing unit (language processing unit) 105 that applies post-processing, including language processing such as morphological analysis, to the characters recognized by the character recognition processing unit 104; a reliability calculation unit 106 that calculates a reliability indicating the certainty of the recognition result; a character re-cutout determination unit 107, serving as character re-cutout determination means, that detects contact characters (two characters assumed to touch and to have been erroneously cut out as one character) and determines whether or not re-cutting of each such character is to be executed; a character re-cutout processing unit 108, serving as character re-cutout processing means, that divides a character rectangle determined to require re-cutting; a result evaluation selection unit 109, serving as result evaluation selection means, that compares the recognition result before re-cutting of the character rectangle with the recognition result after re-cutting and selects and outputs the more likely result; and a fixed character string output unit 110 that fixes the character information and outputs it as a character string.

【0032】〔実施の形態の動作〕つぎに,以上のよう
に構成された文字認識装置の動作について説明する。ま
ず,通常の文字認識と同様に,画像入力部101からイ
メージ画像を入力し,このイメージ画像に対し,文字認
識処理部104により文字認識処理を行い,さらに後処
理部(言語処理部)105により形態素解析などの言語
処理を含んだ後処理を行い,認識結果を得る。続いて,
信頼度算出部106により上記認識結果の確からしさを
表す信頼度を求める。
[Operation of Embodiment] Next, the operation of the character recognition device configured as described above will be described. First, as in ordinary character recognition, an image is input from the image input unit 101, character recognition processing is performed on the image by the character recognition processing unit 104, and post-processing including language processing such as morphological analysis is performed by the post-processing unit (language processing unit) 105 to obtain a recognition result. Subsequently, the reliability calculation unit 106 obtains a reliability indicating the certainty of the recognition result.

【0033】なお,この信頼度の求め方は,パターン辞
書中の特徴量との距離や第二候補文字との差などを用い
る方法が一般的であるが,その他の求め方であっても勿
論かまわない。
The reliability is generally obtained using, for example, the distance from the feature quantities in the pattern dictionary or the difference from the second candidate character, but other methods may of course be used.
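As one hedged illustration of a reliability based on the difference from the second candidate, the sketch below normalizes the margin between the best and second-best dictionary distances. The formula, the function name `reliability`, and the [0, 1] range are assumptions made for illustration; the source does not specify how the reliability is computed.

```python
def reliability(distances: list[float]) -> float:
    """Hypothetical reliability score in [0, 1].

    `distances` holds the non-negative distance between the input character's
    feature vector and each candidate's pattern-dictionary entry.
    The score is the relative margin between the best and second-best
    candidates: 0 when they tie, approaching 1 when the best is far ahead.
    """
    if len(distances) < 2:
        raise ValueError("at least two candidates are required")
    best, second = sorted(distances)[:2]
    if second == 0:
        return 0.0  # both candidates match perfectly, so they are indistinguishable
    return (second - best) / second
```

A score of this kind would then be compared with a threshold to decide whether the recognition result is trusted.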

【0034】つぎに,この実施の形態の特徴となる動作
について,文字再切り出し判定処理,文字再切り出
し処理,結果評価選択処理ごとに分けて説明する。
Next, an operation which is a feature of this embodiment will be described separately for each of the character re-cutout determination process, the character re-cutout process, and the result evaluation selection process.

【0035】 文字再切り出し判定処理 この処理は文字再切り出し判定部107により実行され
る。すなわち,2文字が接触して誤って1文字として切
り出されていると想定させる文字(以下,接触文字とい
う)を検出し,その文字の再切り出しを行うか否かを判
定する処理を実行する。
Character Re-cutout Determination Process This process is executed by the character re-cutout determination unit 107. That is, a process is performed to detect a character (hereinafter, referred to as a contact character) assumed to be erroneously cut out as one character by touching two characters, and to determine whether or not to re-cut out the character.

【0036】ここで,再切り出しと判定されたものは,
文字矩形を分割し,文字矩形を複数発生させ,認識処理
を下記のように再実行させる。
For a character determined to require re-cutting, the character rectangle is divided to generate a plurality of character rectangles, and the recognition process is executed again as described below.

【0037】まず,文字を構成しているもの矩形の数が
1であり,かつ文字幅が文字の高さを上回る文字を接触
文字と仮定し,以下に示す3つケースのいずれかが満た
された場合に文字矩形の再切り出しが必要であると判定
する。
First, a character in which the number of constituent rectangles is one and whose width exceeds its height is assumed to be a contact character, and re-cutting of the character rectangle is determined to be necessary when any one of the following three cases is satisfied.

【0038】(ケース1)着目文字の信頼度がしきい値
以下で,着目文字の前後いずれかが英数字であり,着目
文字が英数字でない。たとえば図2(a)の場合であ
る。
(Case 1) The reliability of the target character is equal to or less than the threshold value, either the preceding or the following character is alphanumeric, and the target character is not alphanumeric. This is, for example, the case shown in FIG. 2(a).

【0039】(ケース2)着目文字の信頼度がしきい値
以下で,着目文字の前後いずれかが英数字であり,着目
文字が英数字であるとき,着目文字の品詞情報をチェッ
クする。
(Case 2) When the reliability of the target character is equal to or less than the threshold value, any one of the characters before and after the target character is alphanumeric, and the target character is alphanumeric, the part of speech information of the target character is checked.

【0040】この品詞情報は,後処理中の言語処理によ
って認識文字に,たとえば名詞,動詞,未登録語といっ
た情報が付与される。この情報が図2(b)に示すよう
に未登録語であれば,着目文字は誤っている可能性があ
ると仮定し,再切り出しが必要であると判定する。
As part-of-speech information, the language processing within the post-processing attaches information such as noun, verb, or unregistered word to the recognized characters. If this information is "unregistered word", as shown in FIG. 2(b), it is assumed that the target character may be erroneous, and re-cutting is determined to be necessary.

【0041】(ケース3)前後の品詞が未登録語でなけ
れば着目画素とその前文字の高さをチェックする。前文
字の英数字と文字の高さが同じ場合のみ再切り出しが必
要であると判定する。
(Case 3) If the part of speech before and after is not "unregistered word", the heights of the target character and the preceding character are checked. Re-cutting is determined to be necessary only when the preceding alphanumeric character and the target character have the same height.

【0042】これは,たとえば,“15mEthern
et”の単位を表す“m”などが誤って再切り出される
場合を回避するものである。
This avoids cases in which, for example, the "m" representing the unit in "15mEthernet" is erroneously re-cut.

【0043】ここで,さらに図3に示すフローチャート
を参照し,文字再切り出し判定部107による文字再切
り出し処理動作を説明する。この処理動作は,文字構成
矩形数が1で,文字高さ<文字幅の認識結果に対して実
行される。
Here, the character re-cutout determination operation performed by the character re-cutout determination unit 107 will be described with reference to the flowchart shown in FIG. 3. This operation is executed for recognition results in which the number of constituent rectangles of the character is one and character height < character width.

【0044】図3において,まず,文字の信頼度<TH
cert(信頼度のしきい値:一定)であるか否かを判
断する(S301)。ここで文字の信頼度<THcer
tであると判断した場合,さらに着目文字の前後が英数
字であるか否かを判断する(S302)。
In FIG. 3, it is first determined whether or not character reliability < THcert (the reliability threshold, a constant) holds (S301). If it is determined that character reliability < THcert, it is further determined whether or not the characters before and after the target character are alphanumeric (S302).

【0045】上記ステップS302において,着目文字
の前後が英数字であると判断した場合,さらに着目文字
が英数字であるか否かを判断する(S303)。ここ
で,着目文字が英数字ではないと判断した場合,再切り
出しセットをONし(S304),この結果を文字再切
り出し処理部108に与える。
If it is determined in step S302 that the character before and after the target character is alphanumeric, it is further determined whether or not the target character is alphanumeric (S303). Here, when it is determined that the character of interest is not an alphanumeric character, the re-cutout set is turned ON (S304), and the result is given to the character re-cutout processing unit 108.

【0046】また,上記ステップS301において,文
字の信頼度<THcertではないと判断した場合,あ
るいは上記ステップS302において,着目文字の前後
が英数字ではないと判断した場合,再切り出しセットを
OFFし(S305),この結果を文字再切り出し処理
部108に与える。
If it is determined in step S301 that character reliability < THcert does not hold, or if it is determined in step S302 that the characters before and after the target character are not alphanumeric, the re-cutout set is turned OFF (S305), and the result is given to the character re-cutout processing unit 108.

【0047】一方,上記ステップS303において,着
目文字が英数字であると判断した場合,さらに未登録語
であるか否かを判断する(S306)。ここで未登録語
ではないと判断した場合,さらに文字の高さが前の文字
と同じであるか否かを判断する(S307)。
On the other hand, if it is determined in step S303 that the character of interest is an alphanumeric character, it is further determined whether or not the character is an unregistered word (S306). If it is determined that the word is not an unregistered word, it is further determined whether or not the character height is the same as the previous character (S307).

【0048】上記ステップS306において,未登録語
であると判断した場合,あるいは上記ステップS307
において,文字の高さが前の文字と同じであると判断し
た場合には,再切り出しセットをONし(S304),
この結果を文字再切り出し処理部108に与える。ま
た,上記ステップS307において,文字の高さが前の
文字と同じではないと判断した場合,再切り出しセット
をOFFし(S305),この結果を文字再切り出し処
理部108に与える。
If it is determined in step S306 that the word is an unregistered word, or if it is determined in step S307 that the character height is the same as that of the previous character, the re-cutout set is turned ON (S304), and this result is provided to the character re-cutout processing unit 108. If it is determined in step S307 that the character height is not the same as that of the previous character, the re-cutout set is turned OFF (S305), and this result is provided to the character re-cutout processing unit 108.
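The decision flow of FIG. 3 described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `RecognizedChar` fields, the function name `should_recut`, the "either neighbor is alphanumeric" reading of S302 (taken from Case 1), and exact height equality in S307 are all choices made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognizedChar:
    """Hypothetical container for one recognition result."""
    text: str
    width: int
    height: int
    n_rects: int                  # number of rectangles constituting the character
    reliability: float
    is_alnum: bool
    is_unregistered_word: bool = False  # part-of-speech info from post-processing


def should_recut(prev: Optional[RecognizedChar],
                 cur: RecognizedChar,
                 nxt: Optional[RecognizedChar],
                 th_cert: float) -> bool:
    # Precondition: only single-rectangle characters wider than tall are examined.
    if cur.n_rects != 1 or cur.width <= cur.height:
        return False
    # S301: is the reliability below the threshold THcert?
    if not cur.reliability < th_cert:
        return False                      # S305: re-cutout OFF
    # S302: is a neighboring character alphanumeric? (either side, assumed)
    if not any(c is not None and c.is_alnum for c in (prev, nxt)):
        return False                      # S305
    # S303: a non-alphanumeric target next to alphanumerics is re-cut.
    if not cur.is_alnum:
        return True                       # S304: re-cutout ON
    # S306: an alphanumeric target that is an unregistered word is re-cut.
    if cur.is_unregistered_word:
        return True                       # S304
    # S307: otherwise re-cut only if its height equals the previous character's.
    return prev is not None and cur.height == prev.height
```

In practice a small tolerance on the height comparison might be preferable to exact equality; the source only says the heights are "the same".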

【0049】 文字再切り出し処理 ここでは,上記のように再切り出し処理が必要であると
判定された文字矩形に対し,文字矩形を分割する文字再
切り出し処理を以下のように実行する。
Character Re-cutout Process Here, for a character rectangle determined as described above to require re-cutout processing, character re-cutout processing that divides the character rectangle is executed as follows.

【0050】図4は,実施の形態に係る第1の文字再切
り出し処理例を示すフローチャートである。図におい
て,まず,文字幅方向へ分割文字幅を決定し(S40
1),標準文字幅の再設定を行う(S402)。さらに
1回目の文字切り出し方法とは異なる別の文字切り出し
方法で文字分割幅(以下,SW1,SW2という)を設
定する。なお,この文字切り出し方法はいかなる手段で
あってもよい。文字切り出し方法としては,たとえば一
般的である文字幅方向に黒画素の射影ヒストグラムを測
定し,分割地点を決定する方法を用いてもよい。
FIG. 4 is a flowchart showing a first example of character re-cutout processing according to the embodiment. In the figure, first, the divided character width is determined in the character width direction (S401), and the standard character width is reset (S402). In addition, the character division widths (hereinafter referred to as SW1 and SW2) are set by a character cutout method different from the one used the first time. Any character cutout method may be used; for example, the common method of measuring a projection histogram of black pixels along the character width direction and determining the division point from it may be used.
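The projection-histogram method mentioned above can be sketched as follows. The sketch assumes a binary bitmap given as rows of 0/1 values (1 = black pixel) and picks the interior column with the fewest black pixels as the division point; the tie-break toward the center column and the function name `split_by_projection` are assumptions, not details from the source.

```python
def split_by_projection(bitmap: list[list[int]]) -> tuple[int, int]:
    """Return the division widths (SW1, SW2) for a character rectangle.

    `bitmap` is a list of rows, each a list of 0/1 values (1 = black pixel).
    The rectangle is cut in front of the interior column whose vertical
    projection (count of black pixels) is smallest; ties go to the column
    closest to the center (an assumption made for this sketch).
    """
    width = len(bitmap[0])
    if width < 3:
        raise ValueError("rectangle too narrow to split")
    # Vertical projection: number of black pixels in each column.
    hist = [sum(row[x] for row in bitmap) for x in range(width)]
    center = (width - 1) / 2
    cut = min(range(1, width - 1), key=lambda x: (hist[x], abs(x - center)))
    # SW1 covers columns [0, cut); SW2 covers columns [cut, width).
    return cut, width - cut
```

The two returned widths correspond to SW1 and SW2 in the text and can then be checked against the standard character width.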

【0051】つぎに,上記で再設定された分割矩形が文
字矩形として適切であるかを,ステップS402で再設
定された標準文字幅を用いてチェックする。すなわち,
分割文字幅≧(標準文字幅/2)であるか否かを判断す
る(S403)。ここで,分割文字幅≧(標準文字幅/
2)であると判断した場合,上記分割文字幅で文字矩形
を分割する(S404)。一方,分割文字幅≧(標準文
字幅/2)ではないと判断した場合,後述する図5のフ
ローに移行する。
Next, whether the divided rectangles set above are appropriate as character rectangles is checked using the standard character width reset in step S402. That is, it is determined whether or not divided character width ≧ (standard character width / 2) (S403). If it is determined that divided character width ≧ (standard character width / 2), the character rectangle is divided at the divided character width (S404). If it is determined that divided character width ≧ (standard character width / 2) does not hold, the flow shifts to the flow of FIG. 5, described later.

【0052】換言すれば,ここで用いる標準文字幅は,
1回目の認識結果の信頼度と文字種によって算出する。
そして,認識結果の信頼度が高く,かつ文字種が漢字で
ある文字の文字幅の平均を標準文字幅とする。矩形文字
は英数字などの半角文字を認識するために行っているの
で,求めた標準文字幅/2の値とSW1,SW2を比較
し,SW1,SW2ともに大きい文字幅であれば求めた
分割幅をそのまま採用する。
In other words, the standard character width used here is calculated from the reliability and character type of the first recognition results: the average width of the characters whose recognition result has high reliability and whose character type is kanji is taken as the standard character width. Because the rectangle division is performed in order to recognize half-width characters such as alphanumeric characters, the value of (standard character width / 2) is compared with SW1 and SW2, and if both SW1 and SW2 are larger than that value, the division widths obtained are adopted as they are.
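The standard character width and the S403 check can be sketched as below. The tuple layout `(width, reliability, is_kanji)`, the `min_reliability` cutoff that stands in for "high reliability", and both function names are assumptions for illustration; the comparison uses ≧ as in step S403.

```python
from typing import Optional


def standard_char_width(chars: list[tuple[int, float, bool]],
                        min_reliability: float) -> Optional[float]:
    """Average width of high-reliability kanji in a line.

    Each entry of `chars` is (width, reliability, is_kanji).
    Returns None when no character in the line qualifies.
    """
    widths = [w for (w, rel, is_kanji) in chars
              if is_kanji and rel >= min_reliability]
    if not widths:
        return None
    return sum(widths) / len(widths)


def split_is_plausible(sw1: float, sw2: float, std_width: float) -> bool:
    """S403: both divided widths must be at least half the standard width."""
    half = std_width / 2
    return sw1 >= half and sw2 >= half
```

When `split_is_plausible` returns False, the text hands the character over to the second re-cutout flow rather than dividing it at the proposed widths.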

[0053] FIG. 5 is a flowchart showing an example of the second character re-cutout process according to the embodiment. When step S403 of FIG. 4 determines that divided character width ≥ (standard character width / 2) does not hold, the divided character width is reset in order to perform a third character cutout.

[0054] In the figure, the character types of the characters before and after the recognition result and the blank widths are checked first. Specifically, it is determined whether the preceding character is alphanumeric and Lgap < Rgap (S501). Here Lgap is the spacing between the target character and the preceding character, and Rgap is the spacing between the target character and the following character.

[0055] If step S501 determines that the preceding character is alphanumeric and Lgap < Rgap, the divided character width is set by a different method and the character rectangle is divided (S502).

[0056] If, on the other hand, step S501 determines that this condition does not hold, it is determined whether the following character is alphanumeric and Lgap > Rgap (S503). If so, the divided character width is set by a different method and the character rectangle is divided (S502). If not, no re-cutout is performed (S504) and the flow ends.

[0057] In other words, the aim here is to cut out characters at a size matching the character type of whichever neighbor, preceding or following, has the narrower gap.

[0058] For example, in the case shown in FIG. 2(c), the gap between the target character "皿" and the preceding character "1" is narrower than the gap to the following character "東". It is therefore assumed that the width of the preceding character "1" suits the target character better, and the character is divided to match the preceding character's width. The division uses a method different from the character cutout performed in FIG. 4 described above.
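The decision flow of FIG. 5 (S501 to S504) can be sketched as a small function. The function name, the boolean inputs, and the return labels are assumptions for illustration, and the gap values in the usage lines are made up.

```python
def choose_reference_neighbor(prev_is_alnum, next_is_alnum, lgap, rgap):
    """Decide which neighbor's width the division should follow.

    Returns "prev" or "next" when the rectangle should be divided (S502),
    or None when no re-cutout is performed (S504).
    """
    if prev_is_alnum and lgap < rgap:   # S501: closer to an alphanumeric on the left
        return "prev"
    if next_is_alnum and lgap > rgap:   # S503: closer to an alphanumeric on the right
        return "next"
    return None                         # S504: leave the rectangle as it is


# FIG. 2(c)-style case: "1" precedes the target, "東" follows, left gap is narrower.
decision = choose_reference_neighbor(
    prev_is_alnum=True, next_is_alnum=False, lgap=2, rgap=6)
```

Note that equal gaps fall through to "no re-cutout", since both comparisons in the flowchart are strict.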

[0059] Result evaluation selection processing

Next, the operation of the result evaluation selection unit 109 is described. Here the recognition result before re-cutout of the character rectangle is compared with the recognition result after re-cutout, and the more plausible result is selected. Recognition results with an inappropriate character width are excluded and are not output as confirmed characters.

[0060] That is, when uniquely selecting the most plausible result from a plurality of recognition results with different character cutout positions, the standard character width is calculated again from the reliability and character type of the recognition results, and inappropriate character candidates are excluded from the plurality of recognition result candidates using the character type of the re-recognized result and the calculated standard character width.

[0061] The standard character width is also set again on the basis of the character re-cutout processing described above. That is, it is obtained from characters, in an arbitrary line being processed, whose recognition result has high reliability and whose character type is kanji.

[0062] Furthermore, when the character width of a recognition result is smaller than the standard character width and its character type is kanji, the cutout is likely to be wrong, so that recognition result is not adopted as a confirmed character.

[0063] For example, when the recognition result character before re-cutout is "読", processing proceeds as follows.
→ Divide the character rectangle
→ Recognition result characters after re-cutout: "言" and "売"
→ Measure and check the standard character width: standard character width > width of "言"; standard character width > width of "売"
→ "言" and "売" are not selected as the recognition result characters after re-cutout.
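The width check in this example can be sketched as a filter over candidate results. The tuple layout, the function names, and the pixel widths are hypothetical; only the rule itself (a kanji result narrower than the standard character width is not confirmed) comes from the text.

```python
def has_plausible_width(char_type, width, std_width):
    """A kanji result narrower than the standard width is treated as a bad cutout."""
    return not (char_type == "kanji" and width < std_width)


def confirmable(candidates, std_width):
    """Keep only candidates (text, char_type, width) that pass the width check."""
    return [c for c in candidates
            if has_plausible_width(c[1], c[2], std_width)]


STD_WIDTH = 40  # hypothetical standard character width in pixels

before = [("読", "kanji", 40)]
after = [("言", "kanji", 18), ("売", "kanji", 22)]

kept_before = confirmable(before, STD_WIDTH)
kept_after = confirmable(after, STD_WIDTH)
```

With these values the two halves are rejected and the original "読" survives, while a pair of half-width alphanumerics of the same widths would pass because the rule applies only to kanji.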

[0064]

[Effects of the Invention] As described above, the character recognition device of the present invention (claim 1) assumes that a character consisting of a single rectangle whose width exceeds its height is a contact character candidate, that is, two touching characters presumed to have been cut out as one. It determines that re-cutout is necessary when the reliability of the target character is at or below a threshold, either the preceding or the following character is alphanumeric, and the target character is not alphanumeric. This narrows down the characters that may be wrong and allows the re-cutout range to be detected accurately, which improves re-cutout efficiency.
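The condition described for claim 1 can be written as a single predicate. This is a sketch under assumed inputs: the parameter names and the numeric values in the usage lines are hypothetical, and how reliability is computed is outside its scope.

```python
def is_contact_candidate(num_rects, width, height):
    """One constituent rectangle that is wider than it is tall."""
    return num_rects == 1 and width > height


def needs_recut(num_rects, width, height, reliability, threshold,
                prev_is_alnum, next_is_alnum, target_is_alnum):
    """Re-cutout is judged necessary when the rectangle is a contact candidate,
    its reliability is at or below the threshold, at least one neighbor is
    alphanumeric, and the target itself was not recognized as alphanumeric."""
    return (is_contact_candidate(num_rects, width, height)
            and reliability <= threshold
            and (prev_is_alnum or next_is_alnum)
            and not target_is_alnum)


# A wide, low-reliability rectangle next to a digit (made-up numbers).
flag = needs_recut(num_rects=1, width=60, height=40,
                   reliability=0.3, threshold=0.5,
                   prev_is_alnum=True, next_is_alnum=False,
                   target_is_alnum=False)
```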

[0065] According to the character recognition device of the present invention (claim 2), the heights of the target character and the preceding character are checked, and re-cutout is determined to be necessary only when the two heights are the same. Re-cutout can therefore be performed even in cases that claim 1, which uses the reliability of the target character and the character type information of the neighboring characters, cannot resolve.

[0066] According to the character recognition device of the present invention (claim 3), the spacing between the target character and its neighbors is checked, and re-cutout is determined to be necessary when the spacing to the following character is larger than the spacing to the preceding character. Re-cutout can thus be performed even in cases that the processing of claim 2 cannot resolve.

[0067] According to the character recognition device of the present invention (claim 4), the linguistic information (part-of-speech information) of the recognition result is checked, so alphabetic characters that are correct as words need not be submitted to re-cutout processing. This reduces cutout errors and achieves efficient re-cutout.

[0068] According to the character recognition device of the present invention (claim 5), it is judged whether the re-cut divided rectangles are appropriate as characters, so abnormal rectangles need not be submitted to re-recognition processing, which reduces cutout errors.

[0069] According to the character recognition device of the present invention (claims 6 and 7), only the cases judged inappropriate under claim 5 are examined again to decide whether cutout should be executed, and re-cutout is executed only for those judged appropriate, which improves processing efficiency.

[0070] According to the character recognition device of the present invention (claim 8), recognition results with an inappropriate character width are excluded and are not output as confirmed characters, so re-cutout is executed only for results that are suitable as characters, which improves processing efficiency.

[0071] According to the character recognition device of the present invention (claim 9), in claim 5, 6, or 8 the standard character width is obtained from characters, in an arbitrary line being processed, whose recognition result has high reliability and whose character type is kanji, which improves recognition accuracy.

[0072] The storage medium of the present invention (claim 10) stores a program for causing a computer to function as the character re-cutout determination means, the character re-cutout processing means, and the result evaluation selection means of the character recognition device according to any one of claims 1 to 9. Character recognition processing can therefore be executed on a computer, and re-cutout efficiency can be improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the character recognition device according to the embodiment.

FIG. 2 is an explanatory diagram showing examples of contact characters and of their character cutout according to the embodiment.

FIG. 3 is a flowchart showing the processing operation of the character cutout determination unit according to the embodiment.

FIG. 4 is a flowchart showing an example of the first character re-cutout process according to the embodiment.

FIG. 5 is a flowchart showing an example of the second character re-cutout process according to the embodiment.

[Explanation of symbols]

101 Image input unit
102 Line cutout unit
103 Character cutout unit
104 Character recognition processing unit
105 Post-processing unit (language processing unit)
106 Reliability calculation unit
107 Character re-cutout determination unit
108 Character re-cutout processing unit
109 Result evaluation selection unit
110 Confirmed character string output unit

Claims (10)

[Claims]

[Claim 1] A character recognition device that cuts out character strings and character rectangles from input character image information and recognizes the character rectangles, obtains a reliability representing the certainty of each recognition result, re-cuts and recognizes the character rectangles according to the reliability, compares and evaluates the first recognition result against the recognition result after re-cutout, and selects and outputs the more plausible result, the device comprising character re-cutout determination means for determining, for a character rectangle whose number of constituent rectangles is one and whose character width is longer than its character height, whether to execute re-cutout processing that separates the character, based on the reliability of the recognition result and on character information of the target character and of the characters before and after it.
[Claim 2] The character recognition device according to claim 1, wherein the character re-cutout determination means determines, for a character rectangle whose number of constituent rectangles is one and whose character width is longer than its character height, whether to execute the re-cutout processing that separates the character, based on the reliability of the recognition result, character information of the target character and of the characters before and after it, and the heights of the target character and of the character immediately before it.
[Claim 3] The character recognition device according to claim 1, wherein the character re-cutout determination means determines, for a character rectangle whose number of constituent rectangles is one and whose character width is longer than its character height, whether to execute the re-cutout processing that separates the character, based on the reliability of the recognition result, character information of the characters before and after the target character, and the ratio of the gap (blank) widths between the target character and the characters before and after it.
[Claim 4] The character recognition device according to claim 1, wherein the character re-cutout determination means determines, for a character rectangle whose number of constituent rectangles is one and whose character width is longer than its character height, whether to execute the re-cutout processing that separates the character, based on linguistic information (part-of-speech information) of the recognition result.
[Claim 5] The character recognition device according to any one of claims 1 to 4, further comprising character re-cutout processing means that, when the character re-cutout determination means has determined that re-cutout processing to separate the character is to be performed and a re-cutout position has been detected by a character cutout method different from the one used for the first recognition, obtains the standard character width again from the reliability and character type of the recognition results, judges the validity of the divided character rectangle width using that standard character width, and determines whether to execute the character re-cutout processing and carries it out.
[Claim 6] The character recognition device according to claim 5, wherein, when the re-cutout position obtained by the character cutout method different from the one used for the first recognition is determined to be inappropriate, the character re-cutout processing means determines again whether to execute the character re-cutout processing, using the character type information of the characters before and after the recognition result and the blank widths before and after the target character.
[Claim 7] The character recognition device according to claim 6, wherein, when it determines that dividing the target character is appropriate, the character re-cutout processing means executes the re-cutout processing using a third character cutout method different from said character cutout method.
[Claim 8] The character recognition device according to any one of claims 1 to 7, further comprising result evaluation selection means that, when uniquely selecting the most plausible result from a plurality of recognition results with different character cutout positions, obtains the standard character width again from the reliability and character type of the recognition results, evaluates the candidates using the character type of the re-recognized result and the standard character width, and outputs the result after excluding inappropriate candidates from the plurality of recognition result candidates.
[Claim 9] The character recognition device according to claim 5, 6, or 8, wherein the standard character width is obtained from characters, in an arbitrary line being processed, whose recognition result has high reliability and whose character type is kanji.
[Claim 10] A computer-readable storage medium storing a program for causing a computer to function as the character re-cutout determination means, the character re-cutout processing means, and the result evaluation selection means of the character recognition device according to any one of claims 1 to 9.
JP31206796A 1996-11-22 1996-11-22 Character recognition device and character recognition method Expired - Fee Related JP3665435B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP31206796A JP3665435B2 (en) 1996-11-22 1996-11-22 Character recognition device and character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP31206796A JP3665435B2 (en) 1996-11-22 1996-11-22 Character recognition device and character recognition method

Publications (2)

Publication Number Publication Date
JPH10154208A true JPH10154208A (en) 1998-06-09
JP3665435B2 JP3665435B2 (en) 2005-06-29

Family

ID=18024838

Family Applications (1)

Application Number Title Priority Date Filing Date
JP31206796A Expired - Fee Related JP3665435B2 (en) 1996-11-22 1996-11-22 Character recognition device and character recognition method

Country Status (1)

Country Link
JP (1) JP3665435B2 (en)

Also Published As

Publication number Publication date
JP3665435B2 (en) 2005-06-29

Similar Documents

Publication Publication Date Title
JPH04195692A (en) Document reader
JP2000315247A (en) Character recognizing device
JP5189056B2 (en) Mark item recognition device, mark item recognition method, and mark item recognition program
JP3469375B2 (en) Method for determining certainty of recognition result and character recognition device
JPH10154208A (en) Character recognition device, and storage medium, which computer storing program for functioning computer as character recognition device can read
JP4087191B2 (en) Image processing apparatus, image processing method, and image processing program
JP2004046723A (en) Method for recognizing character, program and apparatus used for implementing the method
JP3546553B2 (en) Document image analyzer
JPH06348911A (en) English character recognition device
JP3487400B2 (en) Character recognition device, character recognition method, and storage medium
JPH11191135A (en) Japanese/english discriminating method for document image, document recognizing method and recording medium
JP3537570B2 (en) Space detection method for Japanese-English mixed documents, pitch format determination method, and space detection method for fixed-pitch alphanumeric character strings
JPH0749926A (en) Character recognizing device
JPH09274645A (en) Method and device for recognizing character
US11710331B2 (en) Systems and methods for separating ligature characters in digitized document images
JP2728086B2 (en) Character extraction method
JP3420853B2 (en) Character extraction method
JPH01277989A (en) Character string pattern reader
JPH0496882A (en) Full size/half size discriminating method
JPH06243294A (en) Character recognition postprocessing device
JPH05174185A (en) Japanese character recognizing device
JPH0950488A (en) Method for reading different size characters coexisting character string
JP3419418B2 (en) Character reading method and device
JPH10214308A (en) Character discrimination method
JPH07319998A (en) Method for segmenting character

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20040928

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20041130

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20050131

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20050329

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20050401

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080408

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090408

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100408

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110408

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120408

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130408

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140408

Year of fee payment: 9

LAPS Cancellation because of no payment of annual fees