JPH02193281A - Character recognizing device - Google Patents

Character recognizing device

Info

Publication number
JPH02193281A
JPH02193281A JP1012721A JP1272189A JPH02193281A JP H02193281 A JPH02193281 A JP H02193281A JP 1012721 A JP1012721 A JP 1012721A JP 1272189 A JP1272189 A JP 1272189A JP H02193281 A JPH02193281 A JP H02193281A
Authority
JP
Japan
Prior art keywords
character
candidate
characters
candidate characters
identification means
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1012721A
Other languages
Japanese (ja)
Inventor
Naoki Maeda
直樹 前田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sumitomo Electric Industries Ltd
Original Assignee
Sumitomo Electric Industries Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sumitomo Electric Industries Ltd
Priority to JP1012721A
Publication of JPH02193281A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To shorten character recognition time by determining in advance, according to how "complex" or "simple" each character is, the number of candidate characters to be subjected to detailed calculation, storing these numbers in a candidate character number correspondence table, and determining a single candidate character number for each group of candidate characters produced by rough classification. CONSTITUTION: According to the "simplicity" or "complexity" of each character, the number of candidate characters to be used in the feature comparison calculation is stored in advance for that character in a candidate character number correspondence table 7. The candidate characters selected by a rough classification discriminating means 3 are passed to a table retrieving means 8, which selects one or more of them according to a prescribed criterion. The table retrieving means 8 then refers to the candidate character number correspondence table 7 and retrieves the candidate character number corresponding to each selected candidate, and a candidate character limiting means 9 determines a single candidate character number from the retrieved numbers. In this way the number of candidates subjected to calculation is limited according to the "simplicity" of the character, and the overall character recognition time is shortened.

Description

[Detailed Description of the Invention] <Industrial Field of Application> The present invention relates to a character recognition device. More specifically, it relates to a character recognition device that acquires, through an image input device or a communication medium such as a facsimile, an image signal representing characters, symbols, and the like (hereinafter collectively called "characters"), extracts feature quantities from that signal, performs a calculation on those feature quantities to provisionally select a group of candidate characters, identifies in detail which candidate is closest to the character being read, and outputs a signal representing that character.

<Prior Art> When characters are recognized with a conventional character recognition device, for example as shown in FIG. 5, an image containing characters is input by an image input means (11) such as a scanner, and a character segmentation means (12) cuts out the individual characters one by one.

Next, a feature extraction means (13) extracts feature quantities from the segmented character signal, after which a major classification identification means (14) narrows the field down to a plurality of candidate characters using a simple method.

For each candidate character sent from the major classification identification means (14), a detailed identification means (15) obtains the feature quantities and other information for that candidate stored in a recognition dictionary (16), compares them with the feature quantities obtained by the feature extraction means (13), calculates a degree of dissimilarity or similarity for each candidate (hereinafter collectively called the "dissimilarity"), and outputs the results. (Besides the feature quantities themselves, the recognition dictionary (16) stores, for example, their mean values and distributions and the order in which each feature quantity influences recognition.) A ranking determining means (17) sorts the candidates in ascending order of dissimilarity and outputs, for example, the character with the smallest dissimilarity as the recognition result.

One specific example of such a device is the character recognition device described in Japanese Examined Patent Publication No. Sho 63-28915.

In that device, the major classification identification means selects the dictionary of a category whose characters share a similar left-hand radical (hen). The detailed identification means then masks the radical portion of every character in the dictionary of the selected candidate category and performs detailed identification on the right-hand component (tsukuri) alone. This improves the recognition rate for groups of characters that share similar radicals.

<Problem to Be Solved by the Invention> In the character recognition device of FIG. 5, however, the detailed identification means (15) compares feature quantities for every candidate character selected by the major classification identification means (14). Setting a large number of candidate characters therefore increases the amount of calculation and lengthens the recognition time. For example, a kanji OCR system configured for 100 candidate characters must calculate the dissimilarity 100 times to recognize a single character.

Reducing the number of candidate characters shortens the calculation time, but to maintain a given level of recognition performance it is desirable that the major classification identification means (14) select a sufficiently large number of candidates. If the correct character is missing from the candidates selected by the major classification identification means (14), the detailed identification means (15) cannot arrive at the correct character no matter how detailed its recognition calculation is.

For this reason, the number of candidate characters has conventionally been set on the high side, with the result that a great deal of time is spent on detailed identification.

An object of the present invention is to provide a character recognition device that shortens the overall character recognition time by a simple technique.

<Means for Solving the Problem> To achieve the above object, the character recognition device of the present invention comprises, as shown in FIG. 1: an image signal acquisition means (1) that acquires an image signal representing an object to be read that contains characters; a feature extraction means (2) that extracts, from the image signal, the feature quantities of the character to be recognized in the image; a major classification identification means (3) that performs a calculation on the feature quantities and selects a plurality of candidate characters; a recognition dictionary (6) that stores the feature quantities and other information needed to recognize each character; a detailed identification means (4) that performs a feature comparison calculation by comparing the feature quantities of the character to be recognized with the information on the candidate characters stored in the recognition dictionary; a recognized character output means (5) that selects a character, according to a fixed criterion, from among those identified by the detailed identification means (4) and outputs a signal representing it; a candidate character number correspondence table (7) that stores, for each character, a constant specifying the number of candidate characters to be used in the feature comparison calculation; a table search means (8) that picks one or more candidate characters from the selected candidates according to a prescribed criterion and looks up the candidate character number corresponding to each; and a candidate character limiting means (9) that determines a single candidate character number from the numbers looked up and sends that many candidate characters to the detailed identification means (4). The detailed identification means (4) performs the feature comparison calculation only on the candidate characters specified by the candidate character limiting means (9).

<Operation> In the character recognition device configured as above, the feature extraction means (2) extracts feature quantities from the character signal contained in the image signal captured by the image signal acquisition means (1). Using these feature quantities, the major classification identification means (3) selects a group of candidate characters by a simple method. The number of candidates is set large enough that the correct character is certain to be among them.

It is easy to see that when the character to be recognized is complex, the candidate characters will be similarly complex characters, and when it is simple, the candidates will be similarly simple. Here "complex" and "simple" refer to whether the likelihood of misreading the character as another character is high or low. For example, a character with few strokes is "complex" if many characters resemble it, while a character with many strokes is "simple" if it has distinctive features and almost no similar characters.

The invention therefore builds on the following observation about selecting a candidate group by a simple method in the major classification identification means (3): for a "simple" character, the correct character appears near the top even when only a few candidates are selected, whereas for a "complex" character the correct character is not guaranteed to appear near the top unless many candidates are selected.

In the present invention, the number of candidate characters to be used in the feature comparison calculation is stored for each character in the candidate character number correspondence table (7), according to properties such as the character's "simplicity" or "complexity". The candidates subjected to detailed calculation are then limited by consulting the table (7) on the basis of the candidates actually selected.

Specifically, the candidate characters selected by the major classification identification means (3) are passed to the table search means (8), which picks one or more of them according to a prescribed criterion and looks up the corresponding candidate character numbers in the candidate character number correspondence table (7). The candidate character limiting means (9) determines a single candidate character number from the numbers looked up and sends the candidates limited to that number to the detailed identification means (4). Referring to the recognition dictionary (6), the detailed identification means (4) performs the comparison calculation only on these limited candidates. The recognized character output means (5) selects a particular character from among those identified by the detailed identification means (4) and outputs a signal representing it.
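
The table lookup and candidate limiting described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `limit_candidates`, `count_table`, and `default_count` and the table values are hypothetical, the candidates are assumed to arrive already ordered from smallest to largest dissimilarity, and only the top-ranked candidate is used for the lookup.

```python
def limit_candidates(ranked_candidates, count_table, default_count):
    """Keep only the top-n candidates for detailed identification.

    ranked_candidates: characters ordered by ascending dissimilarity.
    count_table: maps a character to the number of candidates to examine
                 (plays the role of the correspondence table (7)).
    default_count: fallback for characters missing from the table
                   (an assumption; the source does not discuss this case).
    """
    if not ranked_candidates:
        return []
    top = ranked_candidates[0]               # role of the table search means (8)
    n = count_table.get(top, default_count)  # look up the candidate character number
    return ranked_candidates[:n]             # role of the candidate limiting means (9)


# Hypothetical table values, for illustration only.
count_table = {"本": 3, "晴": 8}
coarse = ["本", "木", "水", "承", "未"]
limited = limit_candidates(coarse, count_table, default_count=len(coarse))
```

With these illustrative values, only the first three candidates would go on to the detailed comparison.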

<Embodiment> An embodiment is described in detail below with reference to the accompanying drawings.

FIG. 2 is a block diagram showing one configuration of the character recognition device of the present invention. Reference (1a) denotes an image input unit consisting of an image camera, such as a vidicon, that captures the entire document, and a binarization circuit that binarizes the camera's output to obtain a shaped signal. The image signal is divided into individual characters by the character segmentation means (1b). The feature extraction means (2) extracts the feature quantities of each character, for example a histogram of the direction vectors of the character outline or the distribution of blank-area amounts. These feature quantities are input to the major classification identification means (3), which calculates the dissimilarity to every character type using a simple discriminant function with a short computation time, such as a city-block function or a first-order linear function, and then selects a plurality of candidate characters.

The data for each candidate character is sent to the table search means (8), which looks up in the candidate character number correspondence table (7) the candidate character number for the top-ranked candidate (the one with the smallest dissimilarity) or for the several highest-ranked candidates. Based on these numbers, the candidate character limiting means (9) determines a single candidate character number n and sends the top n candidates from the candidate group to the detailed identification means (4). The contents of the candidate character number correspondence table (7) may be fixed in advance when the table is created, but it is preferable to update them according to the results of actual use. For example, the character recognition device may be given a learning function so that the table is updated through learning.

Alternatively, the table may be updated manually rather than by learning.
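
The text leaves the learning method open. One conceivable update rule, offered purely as an assumption for illustration: whenever the correct character for a sample is known, note the rank at which it appeared among the candidates and raise the stored number for the top-ranked candidate so that this rank would have been covered. The function name `update_count_table` and the `margin` parameter are hypothetical.

```python
def update_count_table(count_table, ranked_candidates, correct_char, margin=1):
    """Raise a stored count so the correct character would have been kept.

    This rule is an assumption; the source only says the table may be
    updated by learning or by hand.
    """
    if correct_char not in ranked_candidates:
        return count_table  # the candidate list missed it; no count can help
    top = ranked_candidates[0]
    # Rank of the correct character (1-based) plus a safety margin.
    needed = ranked_candidates.index(correct_char) + 1 + margin
    count_table[top] = max(count_table.get(top, 0), needed)
    return count_table


table = {"本": 3}
update_count_table(table, ["本", "木", "水", "未"], correct_char="未")
```

A rule of this kind only ever raises counts; lowering them for characters that prove consistently easy would need a separate policy.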

The detailed identification means (4) calculates the dissimilarity for each of the n candidate characters. For example, it may compute the distance between the feature quantities of each candidate stored in the recognition dictionary (6) and the feature quantities obtained by the feature extraction means (2).

Once the dissimilarity has been obtained for each candidate in this way, the ranking determining means (51) sorts the candidates in ascending order of dissimilarity and outputs the character code of the first-ranked candidate.

Next, the operation of the character recognition device is explained using the example of recognizing the first character, "本", of a document that reads "本日は晴天なり" ("It is fine weather today").

When the image input unit (1a) reads the document "本日は晴天なり", the character segmentation means (1b) converts the image signal into the single characters "本", "日", "は", "晴", "天", "な", and "り". The feature extraction means (2) then extracts the feature quantities $X = (X_1, X_2, \ldots, X_n)$ of "本". Using the method described above, the major classification identification means (3) selects a plurality of candidate characters (about 50, for example): "本", "木", "水", "承", and so on. The table search means (8) looks up the candidate character number for, say, the top-ranked candidate "本" and obtains the value 3 from the candidate character number correspondence table (7). The candidate character limiting means (9) restricts the candidates to the top three, "本", "木", and "水", and supplies them to the detailed identification means (4), which performs the distance calculation on these three characters only. Let the feature quantities stored in the recognition dictionary (6) be $A = (A_1, A_2, \ldots, A_n)$ for "本", $B = (B_1, B_2, \ldots, B_n)$ for "木", and $C = (C_1, C_2, \ldots, C_n)$ for "水". The distances are then obtained by the following city-block function:

$$|X - A| = \left[(X_1 - A_1)^2 + (X_2 - A_2)^2 + \cdots + (X_n - A_n)^2\right]^{1/2}$$

$$|X - B| = \left[(X_1 - B_1)^2 + (X_2 - B_2)^2 + \cdots + (X_n - B_n)^2\right]^{1/2}$$

$$|X - C| = \left[(X_1 - C_1)^2 + (X_2 - C_2)^2 + \cdots + (X_n - C_n)^2\right]^{1/2}$$
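
The distance step for the three limited candidates can be sketched as below. Note that the formula as written in the text is a root-sum-of-squares (Euclidean-form) distance even though the text calls it the city-block function; the sketch follows the written formula. The toy three-dimensional vectors and the names `distance`, `dictionary`, and `best` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def distance(x, ref):
    """Root-sum-of-squares distance, following the formula as written."""
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, ref)))


# Toy feature vectors; real feature vectors would be much longer.
x = [1.0, 2.0, 2.0]
dictionary = {
    "本": [1.0, 2.0, 2.0],
    "木": [1.0, 2.0, 5.0],
    "水": [4.0, 6.0, 2.0],
}

# Distance from the input to each of the three limited candidates.
distances = {ch: distance(x, ref) for ch, ref in dictionary.items()}

# The candidate with the smallest distance is taken as the result.
best = min(distances, key=distances.get)
```

Here only three distance evaluations are needed, instead of one per candidate in the full group of about 50.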

This city-block function is sometimes also used in the major classification described earlier. In that case, however, the sum is not taken over all terms (1 to n); instead the terms are thinned out, for example by skipping three or five terms at a time, so that the calculation runs faster. In this respect it differs from the calculation procedure used for detailed classification.
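
A thinned-out distance of this kind might look like the sketch below. Two assumptions are made: the distance is the city-block distance in its usual sense (a sum of absolute differences), and "thinning" is read as keeping every `step`-th feature. The function name `thinned_city_block` and the sample vectors are hypothetical.

```python
def thinned_city_block(x, ref, step):
    """Sum of absolute differences over every `step`-th feature only."""
    return sum(abs(xi - ri) for xi, ri in zip(x[::step], ref[::step]))


x = [3, 1, 4, 1, 5, 9, 2, 6]
ref = [2, 7, 1, 8, 2, 8, 1, 8]

full = thinned_city_block(x, ref, step=1)    # all 8 terms
coarse = thinned_city_block(x, ref, step=3)  # terms at indices 0, 3, 6 only
```

The thinned version does roughly one third of the work here, at the cost of ignoring the features it skips, which is acceptable for a first-pass ranking that the detailed stage later refines.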

The ranking determining means (51) sorts the candidates in ascending order of distance and outputs the character code of the first-ranked candidate.

Limiting the number of candidates for detailed classification in this way does not degrade recognition performance, because the smaller the candidate character number assigned to an input character, the more reliably the correct character appears near the top of the ranking.

FIG. 3 is a graph of the candidate character numbers determined in this way for every character of "本日は晴天なり". For "simple" characters such as "本" and "日" the candidate character number is small, 2 to 3, whereas for "complex" characters such as "晴" and "天" it is larger, 5 to 8.

FIG. 4 shows the time required to recognize "本日は晴天なり" by the above method.

FIG. 4(a) shows the recognition time with a conventional character recognition device. The same calculation time $t$ is needed for each character, so all seven characters take $t \times 7$. FIG. 4(b) shows the recognition time with the character recognition device of this embodiment; the calculation times $t_1, t_2, \ldots, t_7$ differ according to the "complexity" of each character. As a result, the overall recognition time is $\sum_{n=1}^{7} t_n$, which is shorter than in FIG. 4(a).
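
The comparison can be written out as a short inequality. As assumptions for illustration only, let $c$ be the cost of one dissimilarity calculation, $N$ the fixed number of candidates examined per character in the conventional device, and $m_n \le N$ the limited candidate character number chosen for the $n$-th character; these symbols do not appear in the original, and the costs of major classification and table lookup are ignored.

```latex
\begin{align}
T_{\text{conventional}} &= 7t = 7\,cN, \\
T_{\text{embodiment}} &= \sum_{n=1}^{7} t_n = c \sum_{n=1}^{7} m_n \;\le\; 7\,cN = T_{\text{conventional}}.
\end{align}
```

Equality would hold only if every character were assigned the full $N$ candidates, so the saving grows with the share of "simple" characters in the document.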

The embodiment above determines the candidate character number from the top-ranked candidate "本" alone, but the invention is not limited to this. The candidate character number may instead be looked up for each of the several highest-ranked candidates and a single number determined from those values. For example, if the candidate character numbers for the top three candidates "本", "木", and "水" are 3, 4, and 5 respectively, the single number may be determined by taking their average, their maximum, or their minimum.
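
The three reduction rules mentioned here (average, maximum, minimum) can be sketched as a small helper. The function name `combine_counts` and the `rule` parameter are hypothetical, and rounding the average to a whole number is an assumption, since the text does not say how a fractional average would be handled.

```python
from statistics import mean


def combine_counts(counts, rule="mean"):
    """Reduce several looked-up candidate character numbers to one."""
    if rule == "mean":
        # Rounding is an assumption; Python's round() rounds halves to even.
        return round(mean(counts))
    if rule == "max":
        return max(counts)
    if rule == "min":
        return min(counts)
    raise ValueError(f"unknown rule: {rule}")


counts = [3, 4, 5]  # the values from the example in the text
by_mean = combine_counts(counts, "mean")
by_max = combine_counts(counts, "max")
by_min = combine_counts(counts, "min")
```

Taking the maximum is the most conservative choice for recognition accuracy, while the minimum favors speed.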

This is because the candidates in a group produced by the major classification can be expected to have roughly equal candidate character numbers in the candidate character number correspondence table (7), so the result differs little whether the number is determined from the top-ranked candidate alone or from several of the highest-ranked candidates.

In the embodiment above the detailed identification means (4) simply calculates a distance, but a more elaborate formula can of course be used to raise the recognition rate.

Even then, the number of candidates to be calculated can be limited according to how "simple" the candidates are, so the present invention remains very effective in shortening the calculation time.

The embodiment above inputs the document image with an image camera, but the invention is not limited to this; the image information may instead be input over a communication line such as a facsimile.

Various other design changes are possible within a scope that does not alter the gist of the present invention.

<Effects of the Invention> As described above, in the character recognition device of the present invention, the number of candidate characters to be subjected to detailed calculation is determined in advance according to how "complex" or "simple" each character is and stored in the candidate character number correspondence table, and a single candidate character number is determined for each group of candidates produced by the major classification.

Whereas detailed calculation was conventionally performed on every candidate character and therefore took a long time, the present invention performs the comparison calculation on only a limited set of candidates for each input character, so the character recognition time is shortened. The time needed to recognize "simple" characters is reduced especially sharply, so for ordinary documents containing a mixture of "complex" and "simple" characters the average recognition time per character is shortened. Consequently, if the same recognition time is allowed, the accuracy of character recognition can be raised, improving the recognition performance of the device.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of the character recognition device of the present invention. FIG. 2 is a block diagram showing one embodiment of the character recognition device. FIG. 3 is a graph of the candidate character numbers determined for each input character. FIG. 4 compares the character recognition time with that of a conventional example. FIG. 5 is a block diagram showing the configuration of a conventional character recognition device.

(1) image signal acquisition means; (2) feature extraction means; (3) major classification identification means; (4) detailed identification means; (5) recognized character output means; (6) recognition dictionary; (7) candidate character number correspondence table; (8) table search means; (9) candidate character limiting means

Patent applicant: Sumitomo Electric Industries, Ltd. Agent

Claims (1)

[Claims] 1. A character recognition device having an image signal acquisition means that acquires an image signal representing an object to be read that contains characters; a feature extraction means that extracts, from the image signal, the feature quantities of the character to be recognized in the image; a major classification identification means that performs a calculation on the feature quantities and selects a plurality of candidate characters; a recognition dictionary that stores information on the feature quantities needed to recognize each character; a detailed identification means that performs a feature comparison calculation by comparing the feature quantities of the character to be recognized with the information on the candidate characters stored in the recognition dictionary; and a recognized character output means that selects a character, according to a fixed criterion, from among those identified by the detailed identification means and outputs a signal representing it, the character recognition device being characterized by comprising: a candidate character number correspondence table that stores, for each character, a constant specifying the number of candidate characters to be used in the feature comparison calculation; a table search means that picks one or more candidate characters from the candidate characters according to a prescribed criterion and looks up the candidate character number corresponding to each candidate picked; and a candidate character limiting means that determines a single candidate character number from the numbers looked up and sends the candidate characters corresponding to that number to the detailed identification means, wherein the detailed identification means performs the feature comparison calculation on the candidate characters specified by the candidate character limiting means.
JP1012721A 1989-01-20 1989-01-20 Character recognizing device Pending JPH02193281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1012721A JPH02193281A (en) 1989-01-20 1989-01-20 Character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1012721A JPH02193281A (en) 1989-01-20 1989-01-20 Character recognizing device

Publications (1)

Publication Number Publication Date
JPH02193281A true JPH02193281A (en) 1990-07-30

Family

ID=11813294

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1012721A Pending JPH02193281A (en) 1989-01-20 1989-01-20 Character recognizing device

Country Status (1)

Country Link
JP (1) JPH02193281A (en)

Similar Documents

Publication Publication Date Title
JP3689455B2 (en) Information processing method and apparatus
US6542635B1 (en) Method for document comparison and classification using document image layout
JPH0664631B2 (en) Character recognition device
US20030012440A1 (en) Form recognition system, form recognition method, program and storage medium
JP3917349B2 (en) Retrieval device and method for retrieving information using character recognition result
JPS60153574A (en) Character reading system
JPH02193281A (en) Character recognizing device
KR19990016894A (en) How to search video database
JPH05314320A (en) Recognition result evaluating system using difference of recognition distance and candidate order
JPH02193280A (en) Character recognizing device
JPH0766423B2 (en) Character recognition device
JP2976445B2 (en) Character recognition device
JP3720405B2 (en) Region identification apparatus and method
JP2728117B2 (en) Character recognition device
JPH113401A (en) Information processor and its method
JP4215385B2 (en) PATTERN RECOGNIZING DEVICE, PATTERN RECOGNIZING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM CONTAINING PROGRAM FOR CAUSING COMPUTER TO EXECUTE THE METHOD
JP2851865B2 (en) Character recognition device
JP2622004B2 (en) Character recognition device
JP2969751B2 (en) Character recognition processing method
JP2886690B2 (en) Character recognition method for optical character reader
JPH0520490A (en) Optical character read and correction system
JPS6252912B2 (en)
JPH024035B2 (en)
JPH0438026B2 (en)
JPH067394B2 (en) Pattern recognizer