JP2000090203A - Method and device for recognizing character - Google Patents

Method and device for recognizing character

Info

Publication number
JP2000090203A
Authority
JP
Japan
Legal status
Granted
Application number
JP10259879A
Other languages
Japanese (ja)
Other versions
JP3374762B2
Inventor
Kenji Kondo
堅司 近藤
Toshiyuki Koda
敏行 香田
Tsuyoshi Megata
強司 目片
Current Assignee
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP25987998A
Publication of JP2000090203A
Application granted
Publication of JP3374762B2
Expired - Lifetime


Landscapes

  • Character Discrimination (AREA)

Abstract

PROBLEM TO BE SOLVED: To detect characters that may have been misrecognized by computing the reliability of each recognition result from the relationship between the character-shape features of the character in question and those of other characters. SOLUTION: A same-font block extraction unit 6 classifies the input character images into groups of characters that share the same font, and for each block a character-shape feature extraction unit 7 extracts character-shape features. For every pair of characters in a same-font block, a similarity calculation unit 8 calculates the similarity of their character-shape features, and a similarity judgment unit 9 compares that similarity with predetermined thresholds and stores the result in a similarity information storage unit 10. For each character in a group judged to share the same font, an erroneous recognition detection unit 11 calculates a score from information derived from the relationship between the shape similarity of the characters and their recognition results. This yields a highly reliable score that takes the character's relationship with other characters into account, from which reject characters can be determined.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method for calculating the reliability of a character recognition result by comparing features extracted from a character image with the features of other characters, and to an erroneous recognition correction method that uses this reliability to correct characters that may have been misrecognized.

[0002]

2. Description of the Related Art

In character recognition, features are usually extracted from a character image that has been segmented into individual characters, and the category of the character image is determined by its position in a feature space in which the boundaries between categories have been formed in advance by learning.

[0003] For example, in statistical handwritten character recognition, character images written by many writers are prepared in advance for each character category, and the inter-category boundaries are formed so that the features extracted from those images separate the categories as well as possible overall.

[0004] Near category boundaries formed in this way, the following kind of misrecognition often occurs.

[0005] Because handwriting varies greatly between writers, a character of category 1 written by writer A in the training data may closely resemble a character of a different category 2 written by writer B.

[0006] Naturally, the features of the two categories are also intermingled in the feature space near the boundary between category 1 and category 2.

[0007] In such a case, the inter-category boundary is formed during learning so as to be optimal over all the characters written by the many writers. In other words, the categories cannot necessarily be separated at every local boundary.

[0008] Consequently, misrecognition occurs near the boundary between categories 1 and 2 at recognition time. To address this problem, JP-A-8-50635 and JP-A-10-63785 attempt to detect and correct misrecognition near inter-category boundaries by exploiting the individual characteristics of each writer's handwriting.

[0009] The method described in JP-A-8-50635 is based on the idea that when the shape of a misread character is compared with the shapes of correctly read characters, the misread character shows some unnaturalness.

[0010] The method assumes that handwriting by the same writer has two properties: "characters belonging to the same category have similar shapes" and "the shapes of characters in different categories are also correlated." A shape vector is calculated from the feature vector used in recognition, and comparing the shape vector of the character of interest with those of other characters yields an "unnaturalness" of the character of interest that reflects both properties. This unnaturalness is used to detect misreadings.

[0011] Specifically, there are two kinds of detection: within-category detection, which calculates unnaturalness from the relationship between the character of interest and other characters recognized as the same category, and between-category detection, which calculates unnaturalness from the relationship between the character of interest and characters recognized as different categories.

[0012] In the embodiment of that publication, an unnaturalness QW is defined for within-category detection as shown in (Equation 1).

[0013]

(Equation 1)

[0014] For between-category detection, an unnaturalness QB is defined as shown in (Equation 2).

[0015]

(Equation 2)

[0016] The method of JP-A-10-63785 is based on the survey finding that "even the same writer writes multiple glyph variants within the same character category."

[0017] First, the feature vectors used in recognition are clustered separately for each category of the recognition result. Clusters with few members are regarded as clusters of misread characters and extracted. The cluster nearest to such a misread cluster is then found, and if the distance between the two clusters is at or below a threshold, the character category of the members of the smaller cluster is corrected to the category of the cluster it is merged with.
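The merge step of this prior-art method can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of JP-A-10-63785 itself: clustering is assumed to have been done already, each cluster is a hypothetical dict holding a category, a centroid, and member IDs, distances are Euclidean between centroids, and `min_size` and `threshold` are hypothetical parameters standing in for "few members" and the distance threshold.

```python
import math


def merge_small_clusters(clusters, min_size, threshold):
    """Relabel members of small clusters with the category of the nearest cluster.

    clusters: list of dicts with hypothetical keys "category", "centroid"
              (list of floats) and "members" (list of character ids).
    Returns {character id: corrected category} for relabelled characters only.
    """
    corrected = {}
    for cluster in clusters:
        if len(cluster["members"]) >= min_size:
            continue  # large clusters are treated as correctly read
        # Find the nearest other cluster by distance between centroids.
        nearest, nearest_dist = None, math.inf
        for other in clusters:
            if other is cluster:
                continue
            d = math.dist(cluster["centroid"], other["centroid"])
            if d < nearest_dist:
                nearest, nearest_dist = other, d
        # Merge only when the nearest cluster is close enough.
        if nearest is not None and nearest_dist <= threshold:
            for member in cluster["members"]:
                corrected[member] = nearest["category"]
    return corrected


clusters = [
    {"category": "0", "centroid": [0.0, 0.0], "members": [1, 2, 3, 4]},
    {"category": "6", "centroid": [0.1, 0.0], "members": [5]},  # small, suspicious
    {"category": "1", "centroid": [5.0, 5.0], "members": [6, 7, 8]},
]
print(merge_small_clusters(clusters, min_size=2, threshold=0.5))
```

The toy data illustrates the limitation raised in [0019]: the method only works when each category has enough members for "small cluster" to be meaningful.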

[0018]

SUMMARY OF THE INVENTION

However, in the method described in JP-A-8-50635, the unnaturalness QW works well only if, among the characters recognized as a category a, the number of correct characters is sufficiently larger than the number of misrecognized characters. Furthermore, obtaining QB requires an enormous amount of character data covering every character category for each writer in advance.

[0019] The method described in JP-A-10-63785 requires that "the document image to be recognized contains a sufficient number of characters for each category." For a document such as a transfer slip, whose account and amount fields together contain only about 20 digits, there is not necessarily a sufficient number of characters per category even if everything is handwritten by the same person, so this method cannot be said to be effective.

[0020] Besides these two methods, some approaches collect character samples from each writer in advance and build a recognition dictionary from them, or apply writer-specific corrections when extracting the features of input characters. However, collecting samples for each writer takes a great deal of effort, and a writer's handwriting characteristics are likely to change gradually over time, so these approaches are impractical.

[0021] An object of the present invention is to provide a reliability calculation method and an erroneous recognition correction method that make effective use of the handwriting characteristics within a document image, even for documents with few digits such as handwritten transfer slips, without requiring samples to be collected from each writer in advance.

[0022]

Means for Solving the Problems

To solve the above problems, the present invention adopts the following configuration.

[0023] The invention according to claim 1 comprises: a step of performing character recognition that takes as input the character images in a document image containing a plurality of character images and outputs, as the recognition result, at least one of all the categories to be recognized; a step of extracting character-shape features that quantify the shape of each character image; and a step of calculating, for any given character, a reliability of that character's recognition result from the degree of similarity between its character-shape features and those of other characters and from whether its recognition result is the same as or different from those of the other characters.

[0024] In the invention according to claim 7, for any given character, the following counts are obtained from its relationships with the other characters: the number of characters whose degree of similarity is "similar" and whose recognition results differ; the number of characters whose degree of similarity is "dissimilar" and whose recognition results are equal; and the number of characters whose degree of similarity is "similar" and whose recognition results are equal. The reliability is calculated from these counts.

[0025] In the invention according to claim 8, for any given character, when the character and another character have the same recognition result, a reliability S1 is given according to the similarity between the character-shape features of the two characters; when their recognition results differ, a reliability S2 is given according to that similarity.

[0026] In the invention according to claim 13, for a character whose reliability is smaller than a predetermined threshold, the recognition result is replaced with a correction category, that is, a category different from the recognition result, and the reliabilities of a plurality of characters are recalculated. If the recalculated reliability is larger than the original reliability, the correction category is adopted as the recognition result.
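Claim 13's correction loop can be sketched as below. This is a sketch under several assumptions not fixed by the claim: the per-character reliability is the unit-weight A - B - C score of Embodiment 1; "the reliabilities of a plurality of characters" is taken to be the sum over all characters in the block; low-reliability characters are selected from the initial scores and each is tried against its alternative candidates one at a time; and the candidate lists (for instance the N = 3 candidates of Embodiment 3) are hypothetical. Function names are illustrative.

```python
from itertools import combinations


def abc_scores(labels, relation):
    """Unit-weight A - B - C score per character.

    relation maps (i, j), i < j, to "similar" or "dissimilar";
    absent pairs were judged neither.
    """
    scores = []
    for i in range(len(labels)):
        total = 0
        for j in range(len(labels)):
            if j == i:
                continue
            rel = relation.get((min(i, j), max(i, j)))
            same = labels[i] == labels[j]
            if rel == "similar":
                total += 1 if same else -1  # A adds, B subtracts
            elif rel == "dissimilar" and same:
                total -= 1  # C subtracts
        scores.append(total)
    return scores


def correct_low_reliability(labels, candidates, relation, threshold):
    """Replace low-reliability labels when the summed reliability improves."""
    labels = list(labels)
    initial = abc_scores(labels, relation)
    for i, score in enumerate(initial):
        if score >= threshold:
            continue
        best_label = labels[i]
        best_total = sum(abc_scores(labels, relation))
        for cand in candidates[i]:
            if cand == labels[i]:
                continue
            trial = labels[:i] + [cand] + labels[i + 1:]
            total = sum(abc_scores(trial, relation))
            if total > best_total:
                best_label, best_total = cand, total
        labels[i] = best_label
    return labels


# Six-digit example of Embodiment 1: "000120" read as "006120".
labels = list("006120")
relation = {p: "similar" for p in combinations([0, 1, 2, 5], 2)}
relation.update({(min(3, k), max(3, k)): "dissimilar" for k in [0, 1, 2, 5]})
candidates = [[c] for c in labels]
candidates[2] = ["6", "0", "8"]  # hypothetical top-3 candidates for the third digit
result = correct_low_reliability(labels, candidates, relation, threshold=0)
print("".join(result))
```

With these assumptions, only the third digit falls below the threshold, and replacing "6" with "0" raises the summed score, so the corrected string is "000120".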

[0027]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention will now be described with reference to the drawings.

[0028] (Embodiment 1) FIG. 1 is a configuration diagram of a character recognition device according to Embodiment 1 of the present invention.

[0029] In FIG. 1, the character recognition device consists of an image input unit 1, a preprocessing unit 2, a character image storage unit 3, a feature extraction unit 4, a recognition unit 5, a same-font block extraction unit 6, a character-shape feature extraction unit 7, a similarity calculation unit 8, a similarity judgment unit 9, a similarity information storage unit 10, and an erroneous recognition detection unit 11.

[0030] Next, the operation of this character recognition device will be described in detail. In advance, the feature extraction unit 4 extracts features from a large number of character images used as training data, and the recognition unit 5 learns the correspondence between features and categories from them.

[0031] At recognition time, a document image containing a plurality of printed or handwritten characters is input through the image input unit 1. FIG. 2 shows an example of an input document (a transfer slip). After noise removal and frame-line removal, the preprocessing unit 2 segments the characters to be recognized into single-character images, which are stored in the character image storage unit 3.

[0032] The feature extraction unit 4 extracts the features used for recognition from each character image. The recognition unit 5 performs recognition using these features and outputs at least one recognition candidate category (N = 1 in this embodiment).

[0033] The same-font block extraction unit 6 classifies the input character images into groups of characters having the same font (same-font blocks) using simple information obtained from the character images (in this embodiment, character width and character height). FIG. 3 shows an example of same-font blocks obtained for an input character image. Here, the font of a character covers whether it is printed or handwritten and, if printed, which typeface it uses. Characters are classified into same-font blocks because the subsequent misrecognition detection finds characters that may have been misrecognized from the relationship between shape similarity and recognition category. Handwritten and printed characters, or printed characters in different typefaces, are not similar even when they belong to the same category, so processing them together would make the subsequent processing fail. In this embodiment, printed characters are assumed to have almost constant width and height, and a group is judged to be printed when the variances of character width and character height are at or below a predetermined threshold. For handwritten characters, it is assumed that all handwritten characters in one document are written by the same writer.
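The printed-versus-handwritten test described in this paragraph can be sketched as a variance check. The sketch assumes each candidate group is given as a list of (width, height) pairs in pixels and applies one hypothetical `var_threshold` to both variances; the source says only that the threshold is predetermined, and how candidate groups are formed is not specified here.

```python
from statistics import pvariance


def is_printed_block(boxes, var_threshold):
    """Judge a group of character boxes as printed if widths and heights are nearly constant.

    boxes: list of (width, height) tuples for one candidate group.
    var_threshold: hypothetical tuning parameter for both variances.
    """
    widths = [w for w, _ in boxes]
    heights = [h for _, h in boxes]
    # Printed text is assumed to have almost constant character size,
    # so both population variances must stay at or below the threshold.
    return pvariance(widths) <= var_threshold and pvariance(heights) <= var_threshold


printed = [(20, 30), (20, 30), (21, 30)]
handwritten = [(15, 28), (25, 35), (18, 40)]
print(is_printed_block(printed, 4.0), is_printed_block(handwritten, 4.0))
```

Using a single threshold for both dimensions is a simplification; separate thresholds for width and height would fit the description equally well.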

[0034] For each block judged to be a same-font block by the same-font block extraction unit 6, the character-shape feature extraction unit 7 extracts character-shape features for all the characters in the block.

[0035] The similarity calculation unit 8 calculates the similarity between character-shape features for every pair of characters in the same-font block. For example, if a same-font block contains M characters, M(M - 1)/2 similarities, one for every pair of the M characters, are calculated. In this embodiment, the similarity given by (Equation 3) is used.
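The all-pairs computation can be sketched as follows. (Equation 3) is not reproduced in this text, so cosine similarity is used purely as an assumed stand-in; the feature vectors are toy values, not character-shape features from the source.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity; an assumed stand-in for (Equation 3)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarities(features):
    """Similarity for every unordered pair (i, j) with i < j: M(M-1)/2 values."""
    return {
        (i, j): cosine_similarity(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
    }


features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy vectors, M = 4
sims = pairwise_similarities(features)
print(len(sims))  # 4 * 3 / 2 = 6 pairs
```

`itertools.combinations` enumerates each unordered pair exactly once, which matches the M(M - 1)/2 count in the text.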

[0036]

(Equation 3)

[0037] Next, the similarity judgment unit 9 judges a pair "similar" if the similarity obtained by the similarity calculation unit 8 is larger than a predetermined threshold T1 and "dissimilar" if it is smaller than a predetermined threshold T2, and stores the result in the similarity information storage unit 10.
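The three-way judgment can be sketched as a small function. It assumes T2 <= T1, so values between the two thresholds are left undecided (returned as `None`). The `use_distance` flag, which reverses the comparisons for distance measures as described in [0044], is an illustrative addition; the threshold values in the example are arbitrary.

```python
def judge_pair(value, t1, t2, use_distance=False):
    """Return "similar", "dissimilar", or None (undecided).

    Similarity mode: > T1 is similar, < T2 is dissimilar.
    Distance mode:   < T1 is similar, > T2 is dissimilar.
    """
    if use_distance:
        if value < t1:
            return "similar"
        if value > t2:
            return "dissimilar"
    else:
        if value > t1:
            return "similar"
        if value < t2:
            return "dissimilar"
    return None  # falls between the thresholds


print(judge_pair(0.9, t1=0.8, t2=0.3), judge_pair(0.5, t1=0.8, t2=0.3))
```

Leaving a gap between T2 and T1 keeps ambiguous pairs out of the A, B, and C counts instead of forcing them into one side.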

[0038] For each character in a group of characters judged to have the same font, the erroneous recognition detection unit 11 counts the characters in the same group that satisfy each of the following conditions:

A. judged similar, and with the same recognized character category;
B. judged similar, but with a different recognized character category;
C. judged dissimilar, but with the same recognized character category.

Consider the case shown in FIG. 4, where the digit string [000120], handwritten by a single writer, is recognized as [006120] (the third character, whose correct answer is 0, is misrecognized as 6). FIG. 5 shows an example of counting the characters that satisfy conditions A, B, and C for these character images and recognition results.

[0039] In the judgment results of the similarity judgment unit 9 that produced FIG. 5, every pair among the first, second, third, and sixth characters is judged similar, and the fourth character is judged dissimilar to each of the first, second, third, and sixth characters. All other pairs are judged neither similar nor dissimilar. Treating A as a positive score and B and C as negative scores, the character with the lowest total score is the third character (score -3), followed by the fourth and fifth characters (score 0) and the first, second, and sixth characters (score 1). Based on this score, characters likely to be misrecognized can therefore be detected (rejected); the third character is indeed misrecognized.
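The counting and scoring of this example can be reproduced with a short sketch. It assumes unit weights (score = A - B - C) and encodes the judgments of FIG. 5 as a dict keyed by index pairs; pairs absent from the dict are the ones judged neither similar nor dissimilar. Indices are 0-based, so the third character is index 2.

```python
from itertools import combinations


def abc_scores(labels, relation):
    """Per-character score A - B - C.

    labels: recognition result for each character.
    relation: {(i, j): "similar" | "dissimilar"} for i < j; missing pairs are undecided.
    """
    scores = []
    for i in range(len(labels)):
        a = b = c = 0
        for j in range(len(labels)):
            if j == i:
                continue
            rel = relation.get((min(i, j), max(i, j)))
            same = labels[i] == labels[j]
            if rel == "similar" and same:
                a += 1  # A: similar shape, same result
            elif rel == "similar":
                b += 1  # B: similar shape, different result
            elif rel == "dissimilar" and same:
                c += 1  # C: dissimilar shape, same result
        scores.append(a - b - c)
    return scores


labels = list("006120")  # recognition result for the handwritten "000120"
# Characters 1, 2, 3, 6 are mutually similar; character 4 is dissimilar to each of them.
relation = {pair: "similar" for pair in combinations([0, 1, 2, 5], 2)}
relation.update({(min(3, k), max(3, k)): "dissimilar" for k in [0, 1, 2, 5]})
print(abc_scores(labels, relation))  # [1, 1, -3, 0, 0, 1]
```

The output matches the ranking in the text: the third character has the lowest score (-3), so it is the first candidate for rejection. Weighted variants, as mentioned in [0045], would replace `a - b - c` with a weighted sum.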

[0040] As described above, calculating the score from the information A, B, and C, obtained from the relationship between the shape similarity of characters and their recognition results, yields a highly reliable score that takes the relationships with other characters into account. Determining reject characters with reference to this score makes it possible to detect characters that may have been misrecognized with high accuracy.

[0041] If all the characters to be recognized in the input document are printed in a single typeface, or are all handwritten by the same writer, the same-font block extraction unit 6 can be omitted.

[0042] In this embodiment, the character-shape feature extraction unit computes character-shape features that differ from the features used by the recognition unit. However, the character-shape features may be the same as the features used by the recognition unit, or may be those features with their dimensionality reduced by, for example, principal component analysis.

[0043] Although the similarity calculation unit calculates the similarity of (Equation 3) between character-shape features, another similarity measure may be used, as may a distance measure between features such as the Euclidean distance, the city-block distance, or the Mahalanobis distance.

[0044] When the similarity calculation unit computes a distance instead of a similarity, the similarity judgment unit may judge a pair "similar" if the distance is smaller than a predetermined threshold T1 and "dissimilar" if it is larger than a predetermined threshold T2.

[0045] The score may be calculated by weighting the values of A, B, and C before adding them, or it may be weighted and added to other scores, such as the segmentation score obtained by the preprocessing unit (character segmentation unit) and the recognition score obtained by the recognition unit, to form an overall score.

[0046] (Embodiment 2) FIG. 6 is a configuration diagram of a character recognition device according to Embodiment 2 of the present invention. Its configuration is the same as that of Embodiment 1 except that the similarity judgment unit 9 of Embodiment 1 is omitted.

[0047] Next, the operation of this character recognition device will be described in detail. The operation up to the similarity calculation unit 8 is the same as in Embodiment 1.

[0048] That is, for each character block judged to have the same font by the same-font block extraction unit 6, the character-shape feature extraction unit 7 extracts character-shape features for all the characters in the block, and the similarity calculation unit 8 calculates the similarity between every pair of characters in the block according to (Equation 3). FIG. 7 shows the similarities obtained for the six character images in the same-font block shown in FIG. 4. These similarities are stored in the similarity information storage unit 10.

[0049] The erroneous recognition detection unit 11 calculates the following score for each character in a group of characters judged to have the same font.

[0050]

• For every character with the same recognition result as the character of interest, calculate a score using a function such as that of FIG. 8 (which relates similarity to score), and take the average Sa.
• For every character whose recognition result differs from that of the character of interest, calculate a score using a function such as that of FIG. 9 (which relates similarity to score), and take the average Sb.
• Take the sum S of Sa and Sb as the score of the character of interest.

For example, for the first character, the characters with the same recognition result are the second and sixth characters, and those with different recognition results are the third, fourth, and fifth characters, so its score S is calculated as follows.

[0051]

Sa = {(200 x 0.92 x 0.92 - 100) + (200 x 0.91 x 0.91 - 100)} / 2 = 67.45
Sb = [{-400 x (0.94 - 0.5) x (0.94 - 0.5)} + 0 + {-400 x (0.62 - 0.5) x (0.62 - 0.5)}] / 3 = -27.73...
S = Sa + Sb = 67.45 + (-27.73...) = 39.72...

Calculating the score for every character in this way gives the results shown in FIG. 10 (rounded to the nearest integer).
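The worked calculation for the first character can be reproduced as below. FIGS. 8 and 9 are not reproduced here, so both score functions are inferred from the arithmetic in [0051] and should be read as assumptions: 200s^2 - 100 for characters with the same result, and -400(s - 0.5)^2 above s = 0.5 (0 otherwise) for characters with a different result. The similarity between the first and fourth characters is not stated in the text; 0.30 is a hypothetical value at or below 0.5, which contributes 0 as in the example.

```python
def same_result_score(s):
    # FIG. 8 stand-in inferred from the example: monotonically increasing for s >= 0.
    return 200 * s * s - 100


def different_result_score(s):
    # FIG. 9 stand-in inferred from the example: 0 up to 0.5, then decreasing.
    return -400 * (s - 0.5) ** 2 if s > 0.5 else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


def character_score(same_sims, diff_sims):
    """S = Sa + Sb for one character of interest."""
    sa = mean([same_result_score(s) for s in same_sims])
    sb = mean([different_result_score(s) for s in diff_sims])
    return sa + sb


# First character: same result as characters 2 and 6, different from 3, 4 and 5.
score = character_score([0.92, 0.91], [0.94, 0.30, 0.62])  # 0.30 is hypothetical
print(round(score, 2))  # 39.72
```

Both inferred functions satisfy the monotonicity conditions listed in [0058], so any other functions meeting those conditions could be swapped in.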

[0052] Based on this score, characters likely to be misrecognized can therefore be detected (rejected); the third character is indeed misrecognized.

[0053] As described above, calculating a score that reflects the shape similarity between characters yields a highly reliable score that takes the relationships with other characters into account. Determining reject characters with reference to this score makes it possible to detect characters that may have been misrecognized with high accuracy.

[0054] If all the characters to be recognized in the input document are printed in a single typeface, or are all handwritten by the same writer, the same-font block extraction unit 6 can be omitted.

[0055] In this embodiment, the character-shape feature extraction unit computes character-shape features that differ from the features used by the recognition unit. However, the character-shape features may be the same as the features used by the recognition unit, or may be those features with their dimensionality reduced by, for example, principal component analysis.

[0056] Although the similarity calculation unit calculates the similarity of (Equation 3) between character-shape features, another similarity measure may be used, as may a distance measure between features such as the Euclidean distance, the city-block distance, or the Mahalanobis distance.

[0057] When the similarity calculation unit computes a distance instead of a similarity, the erroneous recognition detection unit need only provide functions relating distance to score instead of functions relating similarity to score. The functions relating similarity to score need not be those of FIGS. 8 and 9; any other suitable functions may be used as long as they satisfy the following conditions.

[0058]
- For a character with the same recognition result: a monotonically increasing function that gives a small score when the similarity is small and a large score when the similarity is large.
- For a character with a different recognition result: a monotonically decreasing function that gives a large score when the similarity is small and a small score when the similarity is large.
The score calculated in this embodiment may also be weighted and added to other scores to form an overall score. Examples are the cutout score obtained by the preprocessing unit (character cutout unit) and the recognition score obtained by the recognition unit.
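The two conditions can be checked numerically for any candidate pair of functions. This is a sketch that makes several assumptions. Similarity is sampled on [0, 1], and monotonicity is tested in the non-strict sense. The two candidate functions are our reading of the arithmetic in the worked example of [0075]: 200s^2 - 100 for the same result and -400(s - 0.5)^2 for a different result. We further assume the second function is 0 for s <= 0.5. The actual curves of FIGS. 8 and 9 are not reproduced in this text.

```python
def score_same(s):
    """Candidate FIG. 8-style function (same recognition result); assumed form."""
    return 200.0 * s * s - 100.0

def score_diff(s):
    """Candidate FIG. 9-style function (different result); assumed form."""
    return -400.0 * (s - 0.5) ** 2 if s > 0.5 else 0.0

def is_monotone(f, increasing, steps=100):
    """Non-strict monotonicity check of f on evenly spaced points in [0, 1]."""
    values = [f(i / steps) for i in range(steps + 1)]
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)

print(is_monotone(score_same, increasing=True))   # True: satisfies condition 1
print(is_monotone(score_diff, increasing=False))  # True: satisfies condition 2
```

A grid check like this can only refute monotonicity, not prove it. For closed-form candidates, checking the sign of the derivative is the more rigorous route.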

[0059] (Embodiment 3) FIG. 11 is a configuration diagram of a character recognition apparatus according to Embodiment 3 of the present invention. Its configuration is the same as that of Embodiment 1, except that the erroneous recognition detection unit 11 is replaced with the erroneous recognition correction unit 12.

[0060] The operation of this character recognition apparatus is now described in detail. Up to the similarity information storage unit 10, the operation is the same as in Embodiment 1, with one exception. The recognition unit 5 performs recognition using the features extracted by the feature extraction unit 4 and outputs a plurality of recognition candidate categories (N = 3 in this embodiment).

[0061] The erroneous recognition correction unit 12 operates in the same way as the erroneous recognition detection unit 11 of Embodiment 1. For each character in a character block judged to share the same font, it counts the characters in the same group that satisfy each of the following conditions:
A. Judged similar, with the same recognized character category;
B. Judged similar, with a different recognized character category;
C. Judged dissimilar, with the same recognized character category.
FIG. 13 shows an example in which A, B, and C are counted for the first-candidate recognition results of the input image and recognition results shown in FIG. 12. FIG. 12 shows the digit string [000120], handwritten by a single writer, together with the first to third candidates output by the recognition unit. The third character is misrecognized as 6; every other first candidate is correct. FIG. 13 gives, for the first-candidate recognition result of each character, the number of characters satisfying conditions A, B, and C. For reference, the similarity determination unit made the following judgments:
- every pair among the first, second, third, and sixth characters is similar;
- the fourth character is dissimilar to each of the first, second, third, and sixth characters;
- all other pairs are judged neither similar nor dissimilar.
Counting A as a positive score and B and C as negative scores gives the following totals. The third character has the lowest score (-3). The fourth and fifth characters score 0, and the first, second, and sixth characters score 1.
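The counting step can be sketched directly from the judgments listed above. This is an illustrative reconstruction, not the patent's code. Characters are 0-indexed. The pairwise relation function `fig12_relation` is hypothetical and simply encodes the three judgments stated in the text. The score is taken as A - B - C, as described.

```python
SIMILAR, DISSIMILAR, NEITHER = "similar", "dissimilar", "neither"

def count_abc(results, relation):
    """Per character, count other characters meeting conditions A, B and C.

    results:  recognized category of each character.
    relation: function (i, j) -> SIMILAR, DISSIMILAR or NEITHER.
    """
    counts = []
    for i, r in enumerate(results):
        a = b = c = 0
        for j, q in enumerate(results):
            if i == j:
                continue
            rel = relation(i, j)
            if rel == SIMILAR and r == q:
                a += 1          # A: similar, same category
            elif rel == SIMILAR:
                b += 1          # B: similar, different category
            elif rel == DISSIMILAR and r == q:
                c += 1          # C: dissimilar, same category
        counts.append((a, b, c))
    return counts

GROUP = {0, 1, 2, 5}  # 0-based indices of the 1st, 2nd, 3rd and 6th characters

def fig12_relation(i, j):
    # Judgments stated in the text for the FIG. 12 example.
    if i in GROUP and j in GROUP:
        return SIMILAR
    if 3 in (i, j) and (i in GROUP or j in GROUP):
        return DISSIMILAR
    return NEITHER

first_candidates = ["0", "0", "6", "1", "2", "0"]  # [000120], 3rd read as 6
counts = count_abc(first_candidates, fig12_relation)
scores = [a - b - c for a, b, c in counts]
print(scores)  # [1, 1, -3, 0, 0, 1]
```

The third character collects three B counts because it is similar to three characters recognized as "0". That is exactly the pattern the scoring is meant to flag.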

[0062] If the lowest score is equal to or less than a predetermined threshold T (T = -1 in this embodiment), the recognition result of that character is replaced with its second candidate, and the characters satisfying A, B, and C are counted again. FIG. 14 shows the result. Changing the recognition result of the third character to its second candidate, "0", has two effects. It eliminates the negative score, and it improves the overall score (for example, the sum of the scores of all characters). In other words, adopting the second candidate for the third character alone can be considered more likely correct than simply combining the first-candidate recognition results. If the score does not improve even after switching to another recognition candidate category, the character may instead be rejected.
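One correction step can be sketched as follows. This is a sketch under assumptions, not the patent's procedure verbatim. The score is A - B - C. "Improvement" is judged by the sum of all characters' scores, which is one of the examples the text gives. Candidates are tried in rank order, and the first one that improves the total is accepted. Only the third character's second candidate ("0") is given in the text, so the other lower-ranked candidates are omitted. All names are hypothetical.

```python
SIMILAR, DISSIMILAR, NEITHER = "similar", "dissimilar", "neither"
GROUP = {0, 1, 2, 5}  # 0-based indices of the 1st, 2nd, 3rd and 6th characters

def relation(i, j):
    # Pairwise judgments stated in the text for the FIG. 12 example.
    if i in GROUP and j in GROUP:
        return SIMILAR
    if 3 in (i, j) and (i in GROUP or j in GROUP):
        return DISSIMILAR
    return NEITHER

def abc_scores(results):
    """A - B - C score of every character."""
    scores = []
    for i, r in enumerate(results):
        s = 0
        for j, q in enumerate(results):
            if i == j:
                continue
            rel = relation(i, j)
            if rel == SIMILAR:
                s += 1 if r == q else -1      # condition A (+1) or B (-1)
            elif rel == DISSIMILAR and r == q:
                s -= 1                        # condition C (-1)
        scores.append(s)
    return scores

def correct_once(candidates, threshold=-1):
    """Return (results, scores, rejected_index or None) after one step."""
    results = [c[0] for c in candidates]
    scores = abc_scores(results)
    worst = min(range(len(scores)), key=scores.__getitem__)
    if scores[worst] > threshold:
        return results, scores, None          # nothing at or below T
    for alt in candidates[worst][1:]:         # 2nd candidate, then 3rd, ...
        trial = results[:worst] + [alt] + results[worst + 1:]
        trial_scores = abc_scores(trial)
        if sum(trial_scores) > sum(scores):   # overall score improved
            return trial, trial_scores, None
    return results, scores, worst             # no improvement: reject it

candidates = [["0"], ["0"], ["6", "0"], ["1"], ["2"], ["0"]]
print(correct_once(candidates))
# (['0', '0', '0', '1', '2', '0'], [3, 3, 3, 0, 0, 3], None)
```

The total rises from 0 to 12 after the swap. If the only alternative were a category that also disagrees with its similar neighbours, the total would not rise, and the function would report the character for rejection.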

[0063] In this way, a character likely to have been misrecognized can be corrected. This is done by replacing the recognition result of a character whose score is low, where the score is obtained from its relationships with other characters (degree of similarity and recognition results).

[0064] If all the characters to be recognized in the input document are printed in a single typeface, or are all handwritten by the same writer, the same-font block extraction unit 6 can be omitted.

[0065] In this embodiment, the character-shape feature extraction unit obtains a character-shape feature that differs from the feature used by the recognition unit. The character-shape feature may, however, be the same as the feature used by the recognition unit. It may also be that feature with its dimensionality reduced by principal component analysis or the like.

[0066] The similarity calculation unit computes the similarity between character-shape features as in (Equation 3), but another similarity measure may be used instead. Alternatively, a distance measure between the features may be used, such as the Euclidean distance, the city-block distance, or the Mahalanobis distance.

[0067] When the similarity calculation unit obtains a distance instead of a similarity, the similarity determination unit may judge as follows. A pair is "similar" if the distance is smaller than a predetermined threshold T1, and "dissimilar" if it is larger than a predetermined threshold T2.
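The three-state judgment can be sketched for both similarities (as in claim 5) and distances (as in this paragraph). The source gives no threshold values, so the numbers below are hypothetical. We also assume the thresholds are ordered so the states cannot overlap: T1 <= T2 for distances, and threshold 2 <= threshold 1 for similarities. A value falling between the two thresholds is judged "neither".

```python
def judge_by_similarity(sim, t1, t2):
    """Claim 5 style: above threshold 1 -> similar, below threshold 2 -> dissimilar."""
    if sim > t1:
        return "similar"
    if sim < t2:
        return "dissimilar"
    return "neither"

def judge_by_distance(dist, t1, t2):
    """[0067] style: below T1 -> similar, above T2 -> dissimilar."""
    if dist < t1:
        return "similar"
    if dist > t2:
        return "dissimilar"
    return "neither"

# Hypothetical thresholds.
print([judge_by_distance(d, t1=1.0, t2=3.0) for d in (0.5, 2.0, 4.0)])
# ['similar', 'neither', 'dissimilar']
```

Keeping a "neither" band is what lets the pairs in the FIG. 12 example stay unjudged when the evidence is ambiguous.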

[0068] The score may be calculated by weighting and adding the counts A, B, and C. It may also be weighted and added to other scores to form an overall score, such as the cutout score obtained by the preprocessing unit (character cutout unit) and the recognition score obtained by the recognition unit.
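Both weighting options amount to linear combinations. This is a minimal sketch. The default weights (1, -1, -1) reproduce the A - B - C scoring of [0061]. Every other weight, including those in `overall_score`, is a hypothetical tuning choice. The sketch also assumes that the cutout and recognition scores have already been brought to comparable scales.

```python
def weighted_abc_score(a, b, c, weights=(1.0, -1.0, -1.0)):
    """Weighted sum of the counts A, B and C (defaults give A - B - C)."""
    wa, wb, wc = weights
    return wa * a + wb * b + wc * c

def overall_score(relation_score, cutout_score, recognition_score,
                  weights=(1.0, 1.0, 1.0)):
    """Weighted sum with the cutout and recognition scores (hypothetical weights)."""
    w_rel, w_cut, w_rec = weights
    return w_rel * relation_score + w_cut * cutout_score + w_rec * recognition_score

print(weighted_abc_score(0, 3, 0))                      # -3.0 (3rd character, FIG. 13)
print(weighted_abc_score(0, 3, 0, (1.0, -2.0, -1.0)))   # -6.0 (B penalized more)
print(overall_score(-3.0, 0.5, 0.25, (0.5, 2.0, 4.0)))  # 0.5
```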

[0069] In this embodiment, the recognition result of a low-scoring character is replaced using the candidates of a recognition unit that outputs multiple recognition results. Alternatively, something like a similar-character table may be used, holding in advance, for each character category, the categories with which it is easily confused.
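A similar-character table can be as simple as a mapping from each category to its easily confused categories, most likely first. The table contents below are hypothetical and chosen only for illustration; the patent gives no entries. The returned list would play the role of the second, third, and later recognition candidates in the correction step.

```python
# Hypothetical similar-character table (illustrative entries only).
SIMILAR_CHARACTER_TABLE = {
    "6": ["0", "5", "8"],
    "0": ["6", "8", "D"],
    "1": ["7", "l"],
}

def correction_candidates(recognized, table=SIMILAR_CHARACTER_TABLE):
    """Alternatives to try, in order, in place of a low-scoring result."""
    return list(table.get(recognized, []))

print(correction_candidates("6"))  # ['0', '5', '8']
print(correction_candidates("X"))  # []
```

Unlike the recognizer's candidate list, a fixed table costs no extra recognition work. However, it cannot adapt to a particular writer's confusions.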

[0070] (Embodiment 4) FIG. 15 is a configuration diagram of a character recognition apparatus according to Embodiment 4 of the present invention. Its configuration is the same as that of Embodiment 2, except that the erroneous recognition detection unit 11 is replaced with the erroneous recognition correction unit 12.

[0071] The operation of this character recognition apparatus is now described in detail. Up to the similarity information storage unit 10, the operation is the same as in Embodiment 2, with one exception. The recognition unit 5 performs recognition using the features extracted by the feature extraction unit 4 and outputs a plurality of recognition candidate categories (N = 3 in this embodiment).

[0072] The erroneous recognition correction unit 12 operates in the same way as the erroneous recognition detection unit 11 of Embodiment 2. For each character in the group of characters judged to share the same font, it calculates the score described below. Here, "recognition result" means one of the plurality of recognition candidate categories (number of candidates N = 3). Initially, the first candidate is used for every character.

[0073]
- For every character whose recognition result is the same as that of the target character, compute a score using a function relating similarity to score, such as that of FIG. 8, and take the average Sa.
- For every character whose recognition result differs from that of the target character, compute a score using a function relating similarity to score, such as that of FIG. 9, and take the average Sb.
- Take the sum S = Sa + Sb as the score of the target character.
For this example, assume the following. The same-font block extraction unit 6 has designated the set of character images shown in FIG. 12, and the similarity calculation unit 8 has calculated the similarities between characters shown in FIG. 7.

[0074] For example, consider the first character. The characters with the same recognition result are the second and sixth characters, and those with different recognition results are the third, fourth, and fifth characters. Its score S is therefore obtained as follows.

[0075]

$$S_a = \frac{(200 \times 0.92^2 - 100) + (200 \times 0.91^2 - 100)}{2} = 67.45$$

$$S_b = \frac{\{-400 \times (0.94 - 0.5)^2\} + 0 + \{-400 \times (0.62 - 0.5)^2\}}{3} = -27.73\ldots$$

$$S = S_a + S_b = 67.45 + (-27.73\ldots) = 39.72\ldots$$

Computing the score of each character in this way gives the values shown in FIG. 16, rounded to the nearest integer. The character with the lowest score S is the third character (score -44).
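The worked example above can be reproduced numerically. This is a sketch built on the arithmetic, not on the figures. Both score functions are read off the formulas in this paragraph. Two further assumptions are ours. First, the FIG. 9-style function returns 0 for similarities of 0.5 or less, which would explain the bare "0" term. Second, an empty group averages to 0. The similarity between characters 1 and 4 is not given in the text, so 0.30 is a hypothetical value.

```python
def score_same(s):
    # FIG. 8-style function, as used in the arithmetic: 200 * s^2 - 100.
    return 200.0 * s * s - 100.0

def score_diff(s):
    # FIG. 9-style function: -400 * (s - 0.5)^2; ASSUMED 0 for s <= 0.5.
    return -400.0 * (s - 0.5) ** 2 if s > 0.5 else 0.0

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0   # assumption: empty group -> 0

def char_score(same_sims, diff_sims):
    """Return (Sa, Sb, S) for one target character."""
    sa = mean([score_same(s) for s in same_sims])
    sb = mean([score_diff(s) for s in diff_sims])
    return sa, sb, sa + sb

# Character 1: same result -> characters 2 and 6; different -> 3, 4, 5.
sa, sb, s = char_score([0.92, 0.91], [0.94, 0.30, 0.62])
print(round(sa, 2), round(sb, 2), round(s, 2))  # 67.45 -27.73 39.72
```

Sb works out to -83.2 / 3 = -27.73..., which is the value consistent with the final S = 39.72.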

[0076] If the lowest score is equal to or less than a predetermined threshold T (T = -20 in this embodiment), the recognition result of that character is replaced with its second candidate, and the scores are calculated again.

[0077] For example, the second candidate "0" has now been adopted as the recognition result of the third character. The characters with the same recognition result as the first character are therefore the second, third, and sixth characters, and those with different results are the fourth and fifth characters. The score S of the first character becomes the following.

[0078]

$$S_a = \frac{(200 \times 0.92^2 - 100) + (200 \times 0.91^2 - 100) + (200 \times 0.94^2 - 100)}{3} = 70.54$$

$$S_b = \frac{0 + \{-400 \times (0.62 - 0.5)^2\}}{2} = -2.88$$

$$S = S_a + S_b = 70.54 + (-2.88) = 67.66$$

Computing the score of each character in this way gives the results shown in FIG. 17. Changing the third character's recognition result to its second candidate, "0", raises the third character's score. It also improves the overall score (for example, the sum of the scores of all characters). In other words, adopting the second candidate for the third character alone can be considered more likely correct than simply combining the first-candidate recognition results. If the score does not improve even after switching to another recognition candidate category, the character may instead be rejected. When several characters have low scores, the recognition candidates may be changed one character at a time while observing how the score changes.
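The effect of the replacement on the first character's score can be sketched by recomputing S under both assignments of recognition results. The score functions and the empty-group rule carry the same assumptions as in the [0075] arithmetic. The 0.30 similarity between characters 1 and 4 is hypothetical, since only values that enter the text's arithmetic are given. `score_of_char1` and the other names are illustrative.

```python
def score_same(s):
    return 200.0 * s * s - 100.0                       # from the worked examples

def score_diff(s):
    return -400.0 * (s - 0.5) ** 2 if s > 0.5 else 0.0  # ASSUMED 0 for s <= 0.5

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

# Similarities between character 1 and characters 2 to 6 (1-based keys).
# The value for character 4 is hypothetical; the others appear in the text.
SIM_TO_CHAR1 = {2: 0.92, 3: 0.94, 4: 0.30, 5: 0.62, 6: 0.91}

def score_of_char1(results):
    """S = Sa + Sb for character 1 under a given assignment of results."""
    same = [s for k, s in SIM_TO_CHAR1.items() if results[k] == results[1]]
    diff = [s for k, s in SIM_TO_CHAR1.items() if results[k] != results[1]]
    return mean([score_same(s) for s in same]) + mean([score_diff(s) for s in diff])

first = {1: "0", 2: "0", 3: "6", 4: "1", 5: "2", 6: "0"}   # first candidates
corrected = {**first, 3: "0"}                              # 3rd -> 2nd candidate
print(round(score_of_char1(first), 2), round(score_of_char1(corrected), 2))
# 39.72 67.66
```

Character 1's score rises even though its own result did not change. This illustrates the point made in [0092]: a correct replacement also raises the reliability of related characters.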

[0079] In this way, a character likely to have been misrecognized can be corrected. This is done by replacing the recognition result of a character whose score is low, where the score is obtained from its relationships with other characters (degree of similarity and recognition results).

[0080] If all the characters to be recognized in the input document are printed in a single typeface, or are all handwritten by the same writer, the same-font block extraction unit 6 can be omitted.

[0081] In this embodiment, the character-shape feature extraction unit obtains a character-shape feature that differs from the feature used by the recognition unit. The character-shape feature may, however, be the same as the feature used by the recognition unit. It may also be that feature with its dimensionality reduced by principal component analysis or the like.

[0082] The similarity calculation unit computes the similarity between character-shape features as in (Equation 3), but another similarity measure may be used instead. Alternatively, a distance measure between the features may be used, such as the Euclidean distance, the city-block distance, or the Mahalanobis distance.

[0083] When the similarity calculation unit computes a distance rather than a similarity, the erroneous recognition detection unit may provide a function mapping distance to score instead of one mapping similarity to score. Furthermore, the functions mapping similarity to score need not be those of FIGS. 8 and 9. Any other suitable functions may be used as long as they satisfy the following conditions.

[0084]
- For a character with the same recognition result: a monotonically increasing function that gives a small score when the similarity is small and a large score when the similarity is large.
- For a character with a different recognition result: a monotonically decreasing function that gives a large score when the similarity is small and a small score when the similarity is large.
The score calculated in this embodiment may also be weighted and added to other scores to form an overall score. Examples are the cutout score obtained by the preprocessing unit (character cutout unit) and the recognition score obtained by the recognition unit.

[0085] In this embodiment, the recognition result of a low-scoring character is replaced using the candidates of a recognition unit that outputs multiple recognition results. Alternatively, something like a similar-character table may be used, holding in advance, for each character category, the categories with which it is easily confused.

[0086] [Effects of the Invention]

As described above, the present invention (effect of the invention of claim 1) computes the reliability of a character recognition result from two sources: the target character itself, and the relationships between character-shape features extracted from the target character and from other characters. The reliability therefore drops in two cases: when characters are similar but have different recognition results, and when characters are dissimilar but have the same recognition result. As a result, characters that may have been misrecognized can be detected.

[0087] Moreover, there is no need to collect character samples for each writer in advance. The method also works when the document to be recognized contains relatively few characters per category.

[0088] Further, the present invention (effect of the invention of claim 7) obtains the reliability of a target character by counting other characters. Two counts are treated as negative reliability: characters judged similar in character-shape features to the target character but with different recognition results, and characters judged dissimilar but with the same recognition result. Characters judged similar and with the same recognition result are counted as positive reliability. This gives a simple definition of reliability based on the relationships between character-shape features, and characters that may have been misrecognized can be detected from that reliability.

[0089] Further, the present invention (effect of the invention of claim 8) assigns reliability to the target character according to the similarity between character-shape features. A reliability S1 is given for characters with the same recognition result as the target character, and a reliability S2 for characters with a different recognition result. This defines a more accurate reliability that reflects the magnitude of the similarity, and characters that may have been misrecognized can be detected from that reliability.

[0090] Further, according to the present invention (effect of the invention of claim 13), for a character whose reliability is smaller than a predetermined threshold, the recognition result is replaced with a correction candidate category and the reliability is recalculated.

[0091] The reliability is recalculated not only for the character whose recognition result was replaced but also for the other characters.

[0092] When the correction candidate category is the correct category, the reliability of the character whose recognition result was replaced increases. The reliability of other characters also improves, through the relationship between recognition results and character-shape features. This makes it easy to judge whether the correction candidate category is likely to be correct.

[0093] Therefore, a character category that may have been misrecognized can be corrected with high accuracy.

[0094] Accordingly, highly accurate recognition becomes possible when the reliability calculation method and misrecognition correction method of the present invention are used in a form recognition apparatus.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of Embodiment 1 of the present invention.

FIG. 2 shows an example of a document image input in Embodiment 1 of the present invention.

FIG. 3 shows an example of the determination results of the same-font block extraction unit in Embodiment 1 of the present invention.

FIG. 4 shows an example of character images cut out by the preprocessing unit and the recognized character categories output by the recognition unit in Embodiment 1 of the present invention.

FIG. 5 shows an example of the relationship between the degree of similarity between character shapes and the recognition results in the similarity information storage unit of Embodiment 1 of the present invention.

FIG. 6 is a block diagram showing the configuration of Embodiment 2 of the present invention.

FIG. 7 shows a list of the similarities calculated by the similarity calculation unit in Embodiment 2 of the present invention.

FIG. 8 shows the function relating similarity to score when the recognition results are the same. The erroneous recognition detection unit of Embodiment 2 of the present invention uses it to calculate scores.

FIG. 9 shows the function relating similarity to score when the recognition results differ. The erroneous recognition detection unit of Embodiment 2 of the present invention uses it to calculate scores.

FIG. 10 shows the score of each character calculated by the erroneous recognition detection unit in Embodiment 2 of the present invention.

FIG. 11 is a block diagram showing the configuration of Embodiment 3 of the present invention.

FIG. 12 shows an example of character images cut out by the preprocessing unit and the recognized character categories output by the recognition unit in Embodiment 3 of the present invention.

FIG. 13 shows the scores obtained from the degree of similarity between character shapes and the recognition results in the similarity information storage unit of Embodiment 3 of the present invention. The first-candidate recognition results are used for all characters.

FIG. 14 shows the scores obtained from the degree of similarity between character shapes and the recognition results in the similarity information storage unit of Embodiment 3 of the present invention. The second-candidate recognition result is used for the third character only; all other characters use their first-candidate results.

FIG. 15 is a block diagram showing the configuration of Embodiment 4 of the present invention.

FIG. 16 shows the scores obtained from the degree of similarity between character shapes and the recognition results in the similarity information storage unit of Embodiment 4 of the present invention. The first-candidate recognition results are used for all characters.

FIG. 17 shows the scores obtained from the degree of similarity between character shapes and the recognition results in the similarity information storage unit of Embodiment 4 of the present invention. The second-candidate recognition result is used for the third character only; all other characters use their first-candidate results.

[Explanation of Reference Numerals]

1 Image input unit
2 Preprocessing unit
3 Character image storage unit
4 Feature extraction unit
5 Recognition unit
6 Same-font block extraction unit
7 Character-shape feature extraction unit
8 Similarity calculation unit
9 Similarity determination unit
10 Similarity information storage unit
11 Erroneous recognition detection unit
12 Erroneous recognition correction unit

Continued from the front page: (72) Inventor: Tsuyoshi Megata (目片 強司), c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka. F-terms (reference): 5B064 AA01 AB02 AB03 AB13 EA08 EA26

Claims (17)

[Claims]

[Claim 1] A character recognition method comprising the steps of:
- performing character recognition by taking as input a character image in a document image containing a plurality of character images, and outputting as a recognition result at least one category among all categories to be recognized;
- extracting a character-shape feature obtained by quantifying the shape of the character image; and
- calculating a reliability that represents the trustworthiness of the recognition result of one character among the plurality of character images. The reliability is based on the degree of similarity between the character-shape features of that character and of one or more of the remaining character images, and on the difference between their recognition results.
[Claim 2] The character recognition method according to claim 1, wherein the step of performing character recognition extracts a feature from the character image by some method and determines the category to output on the basis of that feature. In that case, the step of extracting the character-shape feature outputs the feature, or a part of the feature, as the character-shape feature.
[Claim 3] The character recognition method according to claim 2, wherein the step of extracting the character-shape feature performs principal component analysis on the feature, or on a part of the feature, and outputs the dimension-reduced result as the character-shape feature.
[Claim 4] The character recognition method according to claim 1, 2, or 3, wherein the degree of similarity is expressed as one of three states: a first state, similar; a second state, dissimilar; or a third state, neither.
[Claim 5] The character recognition method according to claim 4, wherein, for any character, the degree of similarity is set as follows. It is similar if the similarity between the character-shape feature of that character and that of another character is larger than a predetermined threshold 1. It is dissimilar if that similarity is smaller than a predetermined threshold 2.
[Claim 6] The reliability calculation method according to claim 4, wherein, for any character, the degree of similarity is set as follows. It is similar if the distance between the character-shape feature of that character and that of another character is smaller than a predetermined threshold 1. It is dissimilar if that distance is larger than a predetermined threshold 2.
7. The character recognition method according to any one of claims 4 to 6, wherein, for any given character, the following counts are obtained:
the number of characters whose degree of similarity is "similar" but whose recognition result differs;
the number of characters whose degree of similarity is "dissimilar" but whose recognition result is the same; and
the number of characters whose degree of similarity is "similar" and whose recognition result is the same;
and the reliability is calculated on the basis of these counts.
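Claim 7 names three counts but gives no formula for turning them into a reliability. As a hypothetical scoring, the sketch below takes the share of consistent relations (similar and same result) among all three counts. It returns `None` when no pair falls into any of the three relations.

```python
def count_relations(i, labels, degree):
    """Count the three relations named in claim 7 for character i.

    labels: recognition results, one per character in the set.
    degree: function (i, j) -> "similar" | "dissimilar" | "neither".
    """
    sim_diff = dis_same = sim_same = 0
    for j, label in enumerate(labels):
        if j == i:
            continue
        d = degree(i, j)
        same = label == labels[i]
        if d == "similar" and not same:
            sim_diff += 1      # looks alike but was read differently
        elif d == "dissimilar" and same:
            dis_same += 1      # looks different but was read the same
        elif d == "similar" and same:
            sim_same += 1      # consistent evidence
    return sim_diff, dis_same, sim_same


def count_reliability(i, labels, degree):
    """Hypothetical score: fraction of consistent relations among the three counts."""
    sim_diff, dis_same, sim_same = count_relations(i, labels, degree)
    total = sim_diff + dis_same + sim_same
    return None if total == 0 else sim_same / total
```

Consider a set read as ["3", "3", "8"] in which the "8" looks like the first "3". The "8" then scores 0.0, which flags it as a likely misreading.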
8. The character recognition method according to claim 1, 2, or 3, wherein, for any given character, a reliability S1 that depends on the similarity between the character-shape feature value of the character and that of another character is given when the character and the other character have the same recognition result, and a reliability S2 that depends on that similarity is given when their recognition results differ.
9. The character recognition method according to claim 1, 2, or 3, wherein, for any given character, a reliability S1 that depends on the distance between the character-shape feature value of the character and that of another character is given when the character and the other character have the same recognition result, and a reliability S2 that depends on that distance is given when their recognition results differ.
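Claims 8 and 9 leave the shapes of S1 and S2 open. The sketch below assumes a few things not found in the source:

- Similarities lie in [0, 1].
- S1(s) = 2s - 1, which rewards close matches that share a result.
- S2(s) = 1 - 2s, which penalizes close matches that were read differently.
- The per-character reliability is the mean over all other characters.
- For the distance variant (claim 9), distance is converted to similarity with exp(-d / scale).

```python
import math


def s1(sim):
    # Same recognition result: a close match supports it, a poor match counts against it.
    return 2.0 * sim - 1.0


def s2(sim):
    # Different recognition result: a close match to a differently read character counts against it.
    return 1.0 - 2.0 * sim


def pairwise_reliability(i, labels, sim):
    """Average S1/S2 over all other characters in the set.

    sim: function (i, j) -> similarity in [0, 1].
    """
    scores = [
        s1(sim(i, j)) if labels[j] == labels[i] else s2(sim(i, j))
        for j in range(len(labels))
        if j != i
    ]
    return sum(scores) / len(scores) if scores else 0.0


def similarity_from_distance(dist, scale=1.0):
    # Claim-9 variant: turn a feature-space distance into a similarity in (0, 1].
    return math.exp(-dist / scale)
```

Unlike the three-state counting of claim 7, this graded form lets near-threshold pairs contribute proportionally instead of being dropped.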
10. A character recognition method, wherein the reliability of a character is calculated by combining the reliability obtained by the character recognition method according to any one of claims 1 to 9 with a reliability obtained from other processing.
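Claim 10 does not say what the "other processing" is or how the two reliabilities are combined. One possible reading, offered as an assumption, is the following:

- The other reliability is the recognizer's own margin between its first and second candidate distances.
- The two reliabilities are combined by a weighted mean.

Both helpers below are hypothetical.

```python
def combine_reliability(shape_rel, other_rel, weight=0.5):
    """Weighted mean of the shape-consistency reliability and another reliability."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * shape_rel + (1.0 - weight) * other_rel


def margin_reliability(best_dist, second_dist):
    """Example 'other processing': how clearly the recognizer's top candidate wins.

    Assumes distances to the first and second candidate categories,
    with best_dist <= second_dist.
    """
    if second_dist <= 0.0:
        return 0.0
    return (second_dist - best_dist) / second_dist
```

Both inputs should be on comparable scales, for example both in [0, 1], before they are mixed.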
11. The character recognition method according to any one of claims 1 to 10, comprising a step of outputting, as a same character type set, a set of character images written by the same person and a set of character images printed in the same typeface, wherein the reliability of the characters is calculated for the character images included in the same character type set.
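The grouping in claim 11 ensures that shape comparisons are made only within one writer's handwriting or one typeface. The sketch below assumes that each character image already carries a writer or typeface identifier (`source_id`, a hypothetical field) and that a reliability function for one set is supplied by the caller.

```python
from collections import defaultdict


def group_by_source(records):
    """Split character records into same character type sets.

    records: iterable of (source_id, image_id) pairs, where source_id is a
    hypothetical writer or typeface identifier supplied upstream.
    """
    groups = defaultdict(list)
    for source_id, image_id in records:
        groups[source_id].append(image_id)
    return dict(groups)


def reliability_per_set(records, reliability_of_set):
    """Compute reliabilities only among characters from the same source.

    reliability_of_set: function (list of image_ids) -> {image_id: reliability}.
    """
    result = {}
    for members in group_by_source(records).values():
        result.update(reliability_of_set(members))
    return result
```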
12. A rejected-character determination method, wherein rejected characters are determined on the basis of the reliability obtained by the character recognition method according to any one of claims 1 to 11.
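A simple way to turn the reliabilities into reject decisions is a fixed threshold. Both the threshold and the decision to reject characters with no evidence (`None`) are assumptions in the sketch below. The second helper orders characters for manual review, least reliable first.

```python
def reject_characters(reliabilities, threshold):
    """Indices of characters whose reliability is below threshold.

    A reliability of None (no comparable characters) is treated as a reject.
    """
    return [i for i, r in enumerate(reliabilities) if r is None or r < threshold]


def rank_for_review(reliabilities):
    """All indices, with None first and then in ascending order of reliability."""
    return sorted(
        range(len(reliabilities)),
        key=lambda i: (reliabilities[i] is not None, reliabilities[i] or 0.0),
    )
```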
13. A misrecognition correction method, wherein, in the character recognition method according to any one of claims 1 to 11, the recognition result of a character whose reliability is smaller than a predetermined threshold is replaced with a correction category (a category different from that recognition result), the reliabilities of a plurality of characters are recalculated, and the recognition result is set to the correction category if the recalculated reliability is greater than the original reliability.
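The correction loop of claim 13 can be sketched greedily. The claim compares "the recalculated reliability" with "the original reliability" without saying which quantity is compared. This sketch assumes the compared quantity is the sum of reliabilities over the whole set. It also processes low-reliability characters first, which is a design choice and not something the source specifies. The reliability function and the source of alternative categories are supplied by the caller.

```python
def correct_misrecognitions(labels, reliability, alternatives, threshold):
    """Greedy sketch of the claim-13 correction loop.

    labels: list of recognition results (a corrected copy is returned).
    reliability: function (i, labels) -> float for character i under labels.
    alternatives: function (i, label) -> iterable of correction categories.
    """
    labels = list(labels)

    def total(lbls):
        # Assumption: "reliability of a plurality of characters" = sum over the set.
        return sum(reliability(k, lbls) for k in range(len(lbls)))

    # Visit the least reliable characters first.
    order = sorted(range(len(labels)), key=lambda k: reliability(k, labels))
    for i in order:
        # Re-check, because earlier corrections may have raised this character's score.
        if reliability(i, labels) >= threshold:
            continue
        best_total, best_label = total(labels), labels[i]
        for candidate in alternatives(i, labels[i]):
            if candidate == labels[i]:
                continue
            trial = labels[:i] + [candidate] + labels[i + 1:]
            t = total(trial)
            if t > best_total:  # keep a replacement only if it strictly helps
                best_total, best_label = t, candidate
        labels[i] = best_label
    return labels
```

Consider characters 0, 1, and 2 that look alike and are read as "3", "3", and "8". If "3" is offered as an alternative for "8", the loop relabels the third character as "3", because that raises the set's total reliability.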
14. The misrecognition correction method according to claim 13, wherein, when the character recognition step outputs a plurality of candidate categories as the recognition result, the correction category is the category of a candidate other than the candidate for which the reliability is currently being calculated.
15. The misrecognition correction method according to claim 13, comprising a misrecognized-character table that stores in advance the relationships between characters that are easily mistaken for one another, wherein the correction category is a character in the misrecognized-character table.
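Claims 14 and 15 name two sources of correction categories: the recognizer's other candidates, and a prepared table of easily confused characters. The sketch below shows both as small generators. The table contents are illustrative look-alike pairs chosen for this example, not data from the patent.

```python
def alternatives_from_candidates(candidates, current):
    """Claim 14: the other categories in the recognizer's ranked candidate list."""
    return [c for c in candidates if c != current]


def alternatives_from_table(confusion_table, current):
    """Claim 15: characters listed as easily confused with the current result.

    confusion_table: dict mapping a category to its look-alike categories.
    """
    return list(confusion_table.get(current, []))


# Hypothetical look-alike pairs, for illustration only.
CONFUSION_TABLE = {
    "0": ["O", "D"],
    "1": ["l", "I", "7"],
    "5": ["S", "6"],
    "8": ["3", "B"],
}
```

Either function can serve as the source of candidate categories in a correction loop such as the one in claim 13.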
16. A character recognition device that executes all or part of the functions of the steps according to any one of claims 1 to 15.
17. A recording medium storing a program that causes a computer to execute all or part of the functions of the steps according to any one of claims 1 to 15.
JP25987998A 1998-09-14 1998-09-14 Character recognition method and apparatus Expired - Lifetime JP3374762B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP25987998A JP3374762B2 (en) 1998-09-14 1998-09-14 Character recognition method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP25987998A JP3374762B2 (en) 1998-09-14 1998-09-14 Character recognition method and apparatus

Publications (2)

Publication Number Publication Date
JP2000090203A true JP2000090203A (en) 2000-03-31
JP3374762B2 JP3374762B2 (en) 2003-02-10

Family

ID=17340220

Family Applications (1)

Application Number Title Priority Date Filing Date
JP25987998A Expired - Lifetime JP3374762B2 (en) 1998-09-14 1998-09-14 Character recognition method and apparatus

Country Status (1)

Country Link
JP (1) JP3374762B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020046734A (en) * 2018-09-14 2020-03-26 富士ゼロックス株式会社 Information processing device and program
JP2022088183A (en) * 2020-12-02 2022-06-14 株式会社三菱Ufj銀行 Ledger sheet reader and ledger sheet reading method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020046734A (en) * 2018-09-14 2020-03-26 富士ゼロックス株式会社 Information processing device and program
JP7247496B2 (en) 2018-09-14 2023-03-29 富士フイルムビジネスイノベーション株式会社 Information processing device and program
JP2022088183A (en) * 2020-12-02 2022-06-14 株式会社三菱Ufj銀行 Ledger sheet reader and ledger sheet reading method

Also Published As

Publication number Publication date
JP3374762B2 (en) 2003-02-10

Similar Documents

Publication Publication Date Title
US6249605B1 (en) Key character extraction and lexicon reduction for cursive text recognition
EP1362322B1 (en) Holistic-analytical recognition of handwritten text
JP5217127B2 (en) Collective place name recognition program, collective place name recognition apparatus, and collective place name recognition method
US5768417A (en) Method and system for velocity-based handwriting recognition
JP4787275B2 (en) Segmentation-based recognition
US5854855A (en) Method and system using meta-classes and polynomial discriminant functions for handwriting recognition
US5802205A (en) Method and system for lexical processing
US5917941A (en) Character segmentation technique with integrated word search for handwriting recognition
JP2734386B2 (en) String reader
US8340429B2 (en) Searching document images
KR100412317B1 (en) Character recognizing/correcting system
EP0436819B1 (en) Handwriting recognition employing pairwise discriminant measures
JP2008532176A (en) Recognition graph
Favata Off-line general handwritten word recognition using an approximate beam matching algorithm
EP2138959B1 (en) Word recognizing method and word recognizing program
Huang et al. Mapping transcripts to handwritten text
Wang et al. A study on the document zone content classification problem
US20090252417A1 (en) Unsupervised writer style adaptation for handwritten word spotting
JP3917349B2 (en) Retrieval device and method for retrieving information using character recognition result
JP2008225695A (en) Character recognition error correction device and program
Fornés et al. A symbol-dependent writer identification approach in old handwritten music scores
JP2004171316A (en) Ocr device, document retrieval system and document retrieval program
JP3374762B2 (en) Character recognition method and apparatus
JP4194020B2 (en) Character recognition method, program used for executing the method, and character recognition apparatus
JP3415342B2 (en) Character cutout method

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071129

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081129

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091129

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101129

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111129

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121129

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131129

Year of fee payment: 11

S533 Written request for registration of change of name

Free format text: JAPANESE INTERMEDIATE CODE: R313533

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313113

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

EXPY Cancellation because of completion of term