JPH06119495A

JPH06119495A - Device and method for recognizing character

Info

Publication number: JPH06119495A
Application number: JP4264930A
Authority: JP
Inventors: Yasuhiko Murayama; 靖彦村山
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1992-10-02
Filing date: 1992-10-02
Publication date: 1994-04-28

Abstract

PURPOSE: To reduce the number of comparisons with the recognition dictionary and shorten recognition time. A threshold is used to stop useless calculation during comparison with the standard feature vectors, which is the most time-consuming part of character recognition processing. CONSTITUTION: An input module 21 in a ROM device 20 inputs an image of a paper surface containing a plurality of handwritten or printed characters. A segmentation module 22 cuts out character strings and character images from the input paper-surface image data. A feature vector extraction module 23 extracts a feature vector from each cut-out character image, and a threshold calculation module 24 calculates the threshold used for comparison with the dictionary. A character determination module 25 then compares the standard feature vector of each character stored in the recognition dictionary with the feature vector extracted by the feature vector extraction module 23, and determines the cut-out character.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a character recognition method for speeding up comparison with a recognition dictionary in a character recognition device that recognizes characters on a paper surface containing handwritten or printed characters.

[0002]

2. Description of the Related Art: As shown in FIG. 12, character recognition processing in a character recognition device first extracts a feature vector from the character image to be recognized (step 101). Then, to determine the character, the feature vector extracted from the character image is compared with the standard feature vector of each character stored in the recognition dictionary, and the character indicated by the closest standard feature vector is taken to be the character shown in the character image (step 102). When the i-th standard feature vector registered in the recognition dictionary, a_i = (a_i1, a_i2, ..., a_in), is compared with the extracted feature vector α = (α_1, α_2, ..., α_n) using the city block distance, the comparison is made on the distance "add" given by the equation shown in step 102, that is, add = |a_i1 − α_1| + |a_i2 − α_2| + ... + |a_in − α_n|.
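The step 102 comparison can be sketched as an exhaustive nearest-neighbour search under the city block distance. This is a minimal illustration, not the patent's implementation: the function names and the representation of the dictionary as a list of (character code, standard vector) pairs are assumptions.

```python
def city_block_distance(a_i, alpha):
    """Distance "add" between a standard feature vector a_i and the extracted
    feature vector alpha: the sum of absolute element differences."""
    if len(a_i) != len(alpha):
        raise ValueError("feature vectors must have the same dimension n")
    return sum(abs(a_ij - alpha_j) for a_ij, alpha_j in zip(a_i, alpha))


def recognize_exhaustive(alpha, dictionary):
    """Step 102 without any pruning (hypothetical helper): return the character
    code of the closest standard feature vector.

    dictionary: list of (character_code, standard_feature_vector) pairs.
    """
    code, _ = min(dictionary, key=lambda entry: city_block_distance(entry[1], alpha))
    return code
```

Every element of every standard vector is visited here, which is why this step dominates the processing time.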

[0003] In character recognition processing, the comparison with the recognition dictionary (step 102) takes the longest processing time. Conventionally, this comparison was therefore sped up with a fixed threshold: while the elements of a standard feature vector and the extracted feature vector are being compared one by one, if the accumulated distance exceeds the threshold, the standard feature vector being compared is judged unsuitable, its comparison is aborted, and comparison with the next standard feature vector begins.
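The conventional fixed-threshold scheme can be sketched as follows. This is an illustrative reading of the paragraph above, assuming the city block distance and a list of (code, vector) pairs; the names `partial_distance`, `recognize_fixed_threshold`, and `limit` are hypothetical. Note that the fixed threshold only aborts hopeless candidates; it never tightens as better matches are found.

```python
def partial_distance(a_i, alpha, limit):
    """Accumulate the city block distance element by element and give up
    (return None) as soon as the running sum exceeds the fixed threshold."""
    add = 0
    for a_ij, alpha_j in zip(a_i, alpha):
        add += abs(a_ij - alpha_j)
        if add > limit:
            return None  # this standard vector is judged unsuitable
    return add


def recognize_fixed_threshold(alpha, dictionary, limit):
    """Return the code of the closest standard vector whose distance stays
    within the fixed limit, or None if every candidate was aborted."""
    best_code, best_dist = None, None
    for code, a_i in dictionary:
        d = partial_distance(a_i, alpha, limit)
        if d is not None and (best_dist is None or d < best_dist):
            best_code, best_dist = code, d
    return best_code
```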

[0004]

Problem to be Solved by the Invention: The conventional method of comparison with the recognition dictionary in a character recognition device sought a speed-up by using a fixed threshold to abort comparisons between the standard feature vectors stored in the recognition dictionary and the extracted feature vector. However, a fixed threshold cannot be matched to the individual standard feature vectors stored in the recognition dictionary or to the extracted feature vector. Wasteful comparisons therefore still occur, and a sufficient speed-up cannot be achieved.

[0005] Accordingly, an object of the character recognition device and character recognition method of the present invention is to set an appropriate threshold for the dictionary comparison in character recognition processing, to speed up that comparison by cutting off wasteful comparisons with the recognition dictionary, and thereby to shorten the processing time of character recognition as a whole.

[0006]

Means for Solving the Problem: To solve the above problem, the character recognition device and character recognition method of the present invention comprise: input means for inputting an image of a paper surface containing handwritten or printed characters; segmentation means for cutting out character strings and characters from the paper-surface image input by the input means; feature vector extraction means for extracting a feature vector from each character cut out by the segmentation means; threshold calculation means for calculating the threshold used when comparing with the standard feature vectors stored in the recognition dictionary; and character determination means for comparing, using the threshold calculated by the threshold calculation means, the standard feature vectors stored in the recognition dictionary with the feature vector extracted by the feature vector extraction means, and thereby determining the character cut out by the segmentation means.

[0007]

[Embodiments]

(Embodiment 1) An embodiment of the present invention is described below with reference to the drawings.

[0008] FIG. 1 is a block diagram showing an example configuration of the device of this embodiment. As shown, the character recognition device of this embodiment comprises a CPU 10 that controls each process, a ROM device 20 that stores the processing modules and the like, a RAM device 30 that temporarily stores processing results and the like, and an input/output device 40 for inputting and displaying characters and the like. The ROM device 20 contains: an input module 21, which inputs an image of a paper surface containing a plurality of handwritten or printed characters; a segmentation module 22, which cuts out character strings and character images from the input paper-surface image data; a feature vector extraction module 23, which extracts a feature vector from each cut-out character image; a threshold calculation module 24, which calculates the threshold used for comparison with the dictionary; and a character determination module 25, which compares the standard feature vector of each character stored in the recognition dictionary with the feature vector extracted by the feature vector extraction module 23 and determines the cut-out character. The input/output device 40 includes an image scanner 41 for inputting an image of a paper surface containing handwritten or printed characters, a display device 42 for displaying the input paper surface and the character recognition results, and a storage device 43 for storing the recognition dictionary, which holds the standard feature vector of each character, and the like.

[0009] Next, the processing of this embodiment is described. As shown in FIG. 2, the processing of the character recognition device in this embodiment consists of input means 201 (data reading by the input module 21), segmentation means 202 (by the segmentation module 22), feature vector extraction means 203 (by the feature vector extraction module 23), threshold calculation means (1) 204 (by the threshold calculation module 24), and character determination means (1) 205 (by the character determination module 25). If character recognition is performed each time the segmentation means 202 cuts out one character image, processing returns to the segmentation means 202 to cut out the next character image once the character determination means (1) 205 has determined the character code of the current one. If character recognition is performed after the segmentation means 202 has cut out all character images from the input paper-surface image, then for each cut-out character image the feature vector extraction means 203 extracts a feature vector, the threshold calculation means (1) 204 calculates a threshold, and the character determination means (1) 205 determines the character code, and this is repeated for every image.

[0010] First, the data input means 201 is described. The data input means 201 uses the image scanner 41 to read an image of a paper surface containing handwritten or printed characters. Specifically, the operator starts the input module 21 under the control of the CPU 10 and scans a paper surface, such as a magazine page, containing the desired printed characters. This scan inputs the image data of the paper surface. If necessary, the input paper-surface image is displayed on the display device 42.

[0011] Next, the segmentation means 202 is described. The segmentation means 202 cuts out character strings from the paper-surface image input by the input means 201, and then cuts out a character image for each character from each cut-out character string. Both steps can use, for example, projection, which is a known method. For a horizontally written document, segmentation by projection proceeds as follows. First, the horizontal projection (projection onto the vertical axis) is measured, and each range where the projection value exceeds a specific value is cut out as a character string. Next, the vertical projection (projection onto the horizontal axis) of each cut-out character string is measured, and each range where the projection value exceeds a specific value is cut out as a character.
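The two-pass projection segmentation for horizontal text can be sketched as below. This is a minimal sketch, assuming a binary image given as a list of rows of 0/1 values (1 = black) and treating the "specific value" as a plain count threshold; the function names and threshold parameters are hypothetical.

```python
def runs_above(profile, threshold):
    """Return (start, end) index pairs, end exclusive, where profile > threshold."""
    runs, start = [], None
    for k, v in enumerate(profile):
        if v > threshold and start is None:
            start = k
        elif v <= threshold and start is not None:
            runs.append((start, k))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment(image, line_thresh=0, char_thresh=0):
    """Cut a binary image into text lines, then characters.

    Returns one (top, bottom, left, right) box per character (bottom/right exclusive).
    """
    boxes = []
    # Horizontal projection: black-pixel count of each row (onto the vertical axis).
    row_profile = [sum(row) for row in image]
    for top, bottom in runs_above(row_profile, line_thresh):
        line = image[top:bottom]
        # Vertical projection of this line: black-pixel count of each column.
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs_above(col_profile, char_thresh):
            boxes.append((top, bottom, left, right))
    return boxes
```

Because the column profile is taken per line, characters in different lines never merge even if their columns overlap.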

[0012] Next, the feature vector extraction means 203 is described. The feature vector extraction means 203 expresses numerically, from a cut-out character image, the features used for character recognition; such a numerical representation is called a "feature vector". The peripheral feature is briefly described as an example with reference to FIG. 3. First, the bounding frame (circumscribing rectangle) of the character pattern is obtained, and each of its four sides is divided into several segments; the figure shows division into four. Then, for each segment, the area of the white region bounded by that segment of the frame and by the first character stroke met when looking inward from the frame is counted and normalized by the area enclosed by the frame. This yields the feature vector α = (α_1, α_2, ..., α_16), a 16-dimensional feature vector.
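A peripheral feature along these lines can be sketched as follows. The exact way FIG. 3 measures each white region is not spelled out in the text, so this is one plausible reading, labelled as an assumption: for every scan line perpendicular to a side, count the white pixels before the first black pixel, sum those counts over each of the four segments of that side, and divide by the frame area. The side order (top, bottom, left, right), the 0/1 image representation, and all names are hypothetical.

```python
def bounding_box(image):
    """(top, bottom, left, right) of the black pixels; bottom/right exclusive.
    Assumes the image contains at least one black pixel (value 1)."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def white_run(pixels):
    """Number of white pixels before the first black pixel (all, if none)."""
    for k, p in enumerate(pixels):
        if p:
            return k
    return len(pixels)


def peripheral_feature(image, parts=4):
    """Return a 4 * parts element peripheral feature (16 elements by default)."""
    top, bottom, left, right = bounding_box(image)
    box = [row[left:right] for row in image[top:bottom]]
    area = len(box) * len(box[0])
    cols = [list(col) for col in zip(*box)]
    sides = [
        cols,                         # from the top edge, scanning down each column
        [c[::-1] for c in cols],      # from the bottom edge, scanning up
        box,                          # from the left edge, scanning right along each row
        [r[::-1] for r in box],       # from the right edge, scanning left
    ]
    features = []
    for scans in sides:
        n = len(scans)
        for p in range(parts):
            segment = scans[p * n // parts:(p + 1) * n // parts]
            features.append(sum(white_run(s) for s in segment) / area)
    return features
```

Cropping to the bounding frame first makes the feature independent of where the character sits in the cut-out image.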

[0013] Next, the threshold calculation means (1) 204 is described. The threshold calculation means (1) 204 obtains a threshold for speeding up the comparison between the standard feature vectors stored in the recognition dictionary and the feature vector extracted by the feature vector extraction means 203. The standard feature vectors in the recognition dictionary are prepared in advance: image data of the characters to be recognized is collected, a feature vector is obtained from it by the same method that the feature vector extraction means 203 uses, and each feature vector is registered together with the character code of its character. These vectors are then used when recognizing the character images cut out by the segmentation means 202. Hereinafter, "standard feature vector" means a standard feature vector stored in the recognition dictionary.

[0014] In this embodiment, the threshold is obtained from the feature vector produced by the feature vector extraction means 203, as described with reference to FIG. 4. First, each element of the n-dimensional feature vector obtained by the feature vector extraction means 203 is raised to the power γ, and the sum is taken as β (step 401), that is, β = α_1^γ + α_2^γ + ... + α_n^γ. γ should be set to 1 when the city block distance is used for comparison with the standard feature vectors, and to 2 when the squared Euclidean distance is used. Next, a threshold thα is obtained from a function f(β) of β that has been determined in advance by experiment (step 402), and this value is used in the comparison with the standard feature vectors. How f(β) is obtained is described later.

Next, the character determination means (1) 205 is described. It recognizes a cut-out character image as a specific character by comparing the standard feature vectors with the feature vector obtained by the feature vector extraction means 203, using the threshold thα obtained by the threshold calculation means (1) 204. With reference to FIG. 5, the following describes how thα speeds up the dictionary comparison when a character is determined by comparison with M standard feature vectors. In the figure, "i" is a variable holding the number of the standard feature vector being compared, "min" is a variable holding the smallest distance found so far in the comparisons with standard feature vectors, and "code" is a variable holding the recognition result.
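Steps 401 and 402 can be sketched as below. The text leaves f(β) to experiment; this sketch assumes the linear form f(β) = c1 × β + c2 used as an example later in the description, and the names and coefficient values are hypothetical.

```python
def beta(alpha, gamma):
    """Step 401: sum of each feature element raised to the power gamma
    (gamma = 1 for city block distance, 2 for squared Euclidean distance)."""
    return sum(a ** gamma for a in alpha)


def threshold(alpha, gamma, c1, c2):
    """Step 402: th_alpha = f(beta), assuming a linear f(beta) = c1 * beta + c2."""
    return c1 * beta(alpha, gamma) + c2
```

For example, `threshold([0.5, 0.25, 0.25], 1, 0.5, 0.1)` gives 0.5 × 1.0 + 0.1 = 0.6, so each character image gets its own cut-off rather than one value for the whole dictionary.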

[0015] First, the variables are initialized in steps 501 and 502: the threshold thα is assigned to "min", and an initial character code is assigned to "code". The initial character code should be the code to be displayed when a character cannot be recognized (for example, '■').

[0016] Steps 503 to 511 in the figure form the loop over the standard feature vectors; "code" and "min" are updated as each standard feature vector is compared. Inside this loop, steps 504 to 508 form the element-by-element comparison loop between the feature vector α extracted by the feature vector extraction means 203 and the i-th standard feature vector a_i. Wasteful comparisons are eliminated within this element-by-element loop.

[0017] In step 503, the variable "j", which holds the element number of the feature vector, is initialized to 1, and the variable "add", which holds the distance to the i-th standard feature vector, is initialized to 0. The feature vector has n elements (it is n-dimensional). The distance between the j-th element a_ij of the i-th standard feature vector in the dictionary and the element α_j of the feature vector obtained by the feature vector extraction means 203 is computed and added to "add" (step 504). Step 504 shows comparison by city block distance, but another measure, such as the squared Euclidean distance, may be used instead. Next, "add", the distance to the standard feature vector accumulated so far, is compared with "min", the minimum distance to the standard feature vectors (step 505). If "add" is larger than "min", processing moves on to the next standard feature vector (step 506); otherwise it returns to step 504 to compare the next element of the feature vector (steps 507 and 508).

[0018] The number of comparisons is reduced by initializing "min" to the threshold thα in step 502 and, in step 505, comparing the partial distance "add" accumulated so far with the minimum distance "min" while the distance to a standard feature vector is still being computed, so as to decide whether that calculation should continue. The threshold thα is the distance beyond which the comparison is judged to be against a wrong standard feature vector. "min" is initialized with thα and, during processing, is updated so that it never exceeds thα and always holds the smallest distance among the standard feature vectors compared so far. Because every term added in step 504 is non-negative, "add" can only grow as further elements are compared. Therefore, as soon as "add" exceeds "min", the standard feature vector currently being compared cannot be an appropriate candidate, and there is no need to compare the remaining elements of the feature vector.

[0019] In this embodiment, a feature vector is extracted from the cut-out character image and the threshold is set from it, so the threshold is suited to that feature vector. Compared with the conventional method using a fixed threshold, this reduces the number of wasteful comparisons with standard feature vectors, as explained later.

[0020] When all elements of the feature vector have been compared ("NO" in step 508), step 508 was reached only through "YES" determinations in step 505, so the distance to the i-th standard feature vector is shorter (smaller) than the minimum distance "min" found so far. Therefore, "add", the distance to the i-th standard feature vector, is assigned to "min", and the character code "code(i)" of the i-th standard feature vector is assigned to "code", the variable holding the recognition result (step 509).

[0021] Each standard feature vector is compared by the above method. When the comparison with all standard feature vectors has finished (step 512), the character code held in "code" is the recognition result. If "code" was never updated in step 509, it still holds the initial character code ('■'). This means that no standard feature vector was close to the feature vector α extracted by the feature vector extraction means 203, that is, the character could not be recognized. Using the threshold thα obtained by the threshold calculation means (1) 204 in this way reduces the number of comparisons with the recognition dictionary and speeds up recognition. FIG. 5 shows this speed-up when a character is determined by comparison with all M feature vectors stored in the recognition dictionary; thα speeds up the comparison in the same way when only a subset of the feature vectors in the recognition dictionary is compared.
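The FIG. 5 loop can be sketched as a nearest-neighbour search with early abandoning, in which the running minimum starts at thα instead of infinity. This is a sketch under stated assumptions: city block distance, a dictionary held as a list of (code, vector) pairs, and hypothetical names; the comparison counter is added only to make the saving visible.

```python
UNRECOGNIZED = "■"  # initial character code, shown when no match is found


def recognize(alpha, dictionary, th_alpha):
    """Return (code, min_distance, element_comparisons)."""
    best = th_alpha           # "min" starts at the threshold th_alpha
    code = UNRECOGNIZED       # "code" starts at the initial character code
    comparisons = 0
    for code_i, a_i in dictionary:                 # steps 503-511
        add = 0.0                                  # step 503
        for a_ij, alpha_j in zip(a_i, alpha):      # steps 504-508
            add += abs(a_ij - alpha_j)             # step 504: city block term
            comparisons += 1
            if add > best:                         # step 505: exceeds "min"
                break                              # step 506: next standard vector
        else:                                      # all n elements compared
            best, code = add, code_i               # step 509
    return code, best, comparisons
```

Because `best` only ever decreases from `th_alpha`, later candidates are cut off earlier and earlier; if nothing beats the threshold, the function returns '■'.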

[0022] The threshold calculation means (1) 204 uses the function f(β) to calculate the threshold; how this function is obtained is described next. For each standard feature vector, character images of the character it represents are prepared. From each prepared image, β (step 401 of FIG. 4) and the distance to the standard feature vector are obtained, and f(β) is determined from the distribution of β against distance.

[0023] This is described in detail below with reference to FIG. 6. In the figure, "i" is a variable holding the number of a standard feature vector stored in the recognition dictionary, and there are M standard feature vectors.

[0024] In step 602, character images of the same character as the one indicated by the i-th standard feature vector in the dictionary are prepared in advance. A feature vector α is obtained from each prepared image by the same method as the feature vector extraction means 203, and β is obtained by the method shown in step 401 of FIG. 4. If images of that character are prepared in several typefaces, such as Mincho and Gothic, β is obtained from each of them.

[0025] Next, the distance between the feature vector α obtained from the prepared character image and the i-th standard feature vector a_i stored in the recognition dictionary is obtained (step 603). The distance is computed over all elements, using the same distance calculation as the character determination means (1) 205. That is, when the character determination means (1) 205 uses the city block distance, the distance "add" is obtained by the method shown in step 102 of FIG. 10. If several character images were prepared in step 602, β and the distance are obtained for each of them.
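Collecting the (β, distance) points of FIG. 6 can be sketched as below, assuming the city block distance (so γ = 1 by default) and hypothetical data structures: the dictionary as a list of (code, standard vector) pairs and the prepared samples as a mapping from each code to feature vectors extracted from several typefaces.

```python
def city_block(u, v):
    """Sum of absolute element differences over all elements."""
    return sum(abs(x - y) for x, y in zip(u, v))


def collect_samples(dictionary, samples, gamma=1):
    """One (beta, distance) point per prepared character image.

    dictionary: list of (code, standard_vector)
    samples:    dict mapping code -> list of feature vectors extracted from
                prepared images of that character (e.g. several typefaces)
    """
    points = []
    for code, a_i in dictionary:
        for alpha in samples.get(code, []):
            b = sum(x ** gamma for x in alpha)           # as in step 401
            points.append((b, city_block(a_i, alpha)))   # step 603
    return points
```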

[0026] When all M standard feature vectors have been processed, the distribution of β against the distance to the recognition dictionary can be plotted as in FIG. 6(b), and the function f(β) is obtained from this distribution. As one example, f(β) is taken to be f(β) = c1 × β + c2, and the constants c1 and c2 are determined by the least-squares method. If the resulting f(β) is the line 611, then 611 serves as the threshold: points below the line can be recognized, but points above it cannot, because their distances exceed the threshold. Therefore, if the highest recognition rate attainable on the character images prepared for the distribution is to be ε%, the fitted function should be shifted, as shown by line 612, so that ε% of all points lie below 612. Such a shift sets f(β) to match the performance the character recognition device requires. In this way, the function f(β) that determines the threshold thα is obtained.
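The fit and shift can be sketched as follows. The least-squares line corresponds to line 611; the shift to line 612 is realized here by moving the intercept by a residual quantile, which is an assumption about how the shift could be done, not a method stated in the text. The names are hypothetical, and points on the shifted line count as lying below it.

```python
import math


def fit_threshold_function(points, epsilon):
    """Fit f(beta) = c1 * beta + c2 to (beta, distance) points by least squares,
    then shift the intercept so that about epsilon % of the points lie on or
    below the line. Returns (c1, c2)."""
    n = len(points)
    mean_b = sum(b for b, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((b - mean_b) ** 2 for b, _ in points)
    sxy = sum((b - mean_b) * (d - mean_d) for b, d in points)
    c1 = sxy / sxx                  # least-squares slope (line 611)
    c2 = mean_d - c1 * mean_b       # least-squares intercept (line 611)
    # Height of each point above the fitted line, smallest first.
    residuals = sorted(d - (c1 * b + c2) for b, d in points)
    # Smallest shift that puts ceil(epsilon % of n) points on or below the line.
    k = max(1, math.ceil(n * epsilon / 100))
    return c1, c2 + residuals[k - 1]
```

For example, the points (0, 0), (0, 2), (2, 2), (2, 4) give the fitted line d = β + 1; ε = 50 lowers it to d = β, and ε = 100 raises it to d = β + 2.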

[0027] With a conventional fixed threshold, choosing the threshold so that ε% of all points lie below it, as with line 612, gives the dotted line 613. As the figure shows, a fixed threshold does not follow the distribution. For example, no effective threshold is set in the region 614, so comparisons with a standard feature vector continued wastefully until the threshold was reached. By obtaining a function f(β) as in this embodiment, however, a threshold suited to the extracted feature vector can be set, so wasteful comparisons are reduced compared with a conventional fixed threshold.

[0028] As shown in this embodiment, calculating the threshold from the feature vector of the cut-out character image, and using that threshold, requires fewer comparisons to compute the distances to the standard feature vectors in the recognition dictionary than a fixed threshold does. This shortens the comparison time with the recognition dictionary and raises the character recognition speed.

[0029] (Embodiment 2) Embodiment 1 described a method in which a threshold is calculated from the feature vector of the cut-out character image and used to shorten the comparison time with the recognition dictionary. This embodiment describes a method in which that threshold is instead calculated from the standard feature vectors.

[0030] The processing of this embodiment is as follows. As shown in FIG. 7, the character recognition device in this embodiment consists of input means 201 (data reading by the input module 21), cutout means 202 (by the cutout module 22), feature vector extraction means 203 (by the feature vector extraction module 23), threshold value calculation means (2) 206 (by the threshold value calculation module 24), and character determination means (2) 207 (by the character determination module 25). Input means 201, cutout means 202, and feature extraction means 203 are the same as in Embodiment 1.

[0031] First, threshold value calculation means (2) 206 of this embodiment is described with reference to the drawings. FIG. 8(a) is a conceptual diagram of how the standard vectors are distributed in the standard feature vector space. Focusing on the standard vector a1, an unknown feature vector α is judged closest to a1 when α falls inside the region enclosed by the broken line in the figure. That region is drawn by finding many points equidistant from a1 and each neighboring standard feature vector. Therefore, when comparing a1 with an unknown feature vector α, it suffices to determine whether α lies inside the broken line. FIG. 8 shows the standard feature vectors conceptually in two dimensions, but they actually lie in an n-dimensional space, so the boundary drawn by the broken line is difficult to find. The method below therefore approximates the broken-line region by an n-dimensional sphere centered on a1 and uses the radius of that sphere as the threshold. To obtain the radius for a1, m standard feature vectors (d1 to dm) near a1 are selected, the average of their distances to a1 is calculated, and the radius is determined from that average.

The procedure for obtaining the M thresholds th(i) (1 ≤ i ≤ M) corresponding to the M standard feature vectors is described in detail with reference to FIG. 9, where i is the index of a standard feature vector stored in the recognition dictionary. In step 702, the distances between the i-th standard feature vector and all other standard feature vectors are calculated, the m closest are selected, and the average A of their distances to the i-th standard feature vector is computed. The distance is computed by the same method used in character determination means (2) 207. For example, if the city block distance is used, the distance between standard vectors ai and ak is the sum over all elements j of |aij − akj|, computed as shown in step 702. The threshold used when comparing the i-th standard feature vector ai with the feature vector of a segmented character is then th(i) = κ × A. This th(i) is the radius of the n-dimensional sphere centered on ai that approximates the effective range of ai (step 703).

[0032] Once thresholds have been obtained for all M standard feature vectors stored in the recognition dictionary, the process ends.
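The FIG. 9 procedure (steps 702 and 703, repeated for all M standard feature vectors) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the city block distance, represents each standard feature vector as a plain list of numbers, and uses hypothetical function names, with κ and m passed in as parameters. The returned list plays the role of the precomputed threshold table described in [0034].

```python
# Minimal sketch (hypothetical names), assuming the city block distance.

def city_block(u, v):
    """City block (L1) distance: sum of absolute element differences."""
    return sum(abs(x - y) for x, y in zip(u, v))


def standard_vector_thresholds(standards, m, kappa):
    """Return th[i] = kappa * A_i for every standard vector, where A_i is the
    mean distance from standard vector i to its m nearest other standard vectors."""
    M = len(standards)
    if not 1 <= m < M:
        raise ValueError("need 1 <= m < number of standard vectors")
    th = []
    for i, a_i in enumerate(standards):
        # Step 702: distances to all other standard vectors, nearest first.
        dists = sorted(city_block(a_i, a_k)
                       for k, a_k in enumerate(standards) if k != i)
        A = sum(dists[:m]) / m    # mean distance to the m nearest neighbours
        th.append(kappa * A)      # step 703: th(i) = kappa * A
    return th


# Hypothetical 2-D dictionary; real feature vectors would be n-dimensional.
standards = [[0, 0], [1, 0], [0, 2], [5, 5]]
print(standard_vector_thresholds(standards, m=2, kappa=0.5))  # [0.75, 1.0, 1.25, 4.25]
```

Isolated vectors such as [5, 5] receive a larger radius, which is the per-vector adaptation that a single fixed threshold cannot provide.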

[0033] The method of setting κ in step 703 is as follows. First, character images corresponding to the 1st through M-th standard feature vectors are prepared in advance. A feature vector is obtained from each prepared image by the same method as feature vector extraction means 203, and its distance to the corresponding standard feature vector is calculated by comparing all of the elements, using the same distance calculation as character determination means (2) 207. Plotting this distance against the average A of the m distances obtained in step 702, for each standard feature vector, gives the distribution chart of FIG. 8(b). If the highest recognition rate achievable on the prepared character images is to be ε'%, a line 621 is drawn through the origin so that ε'% of all points lie below it. The slope of this line is κ. With this procedure, κ can be set to match the performance required of the character recognition device.
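A point (A_i, D_i) lies on or below a line through the origin with slope κ exactly when D_i / A_i ≤ κ, so choosing κ amounts to taking the ε'-th percentile of those ratios. The sketch below assumes, as a reading of FIG. 8(b), that A_i is on the horizontal axis and D_i (the distance of the prepared sample to its own standard vector) on the vertical axis, and that every A_i is positive; the function name is hypothetical.

```python
import math


def choose_kappa(avg_neighbour_dist, sample_dist, eps_percent):
    """Sketch of the kappa setting (hypothetical helper).

    avg_neighbour_dist[i]: A_i, the step-702 mean distance for standard vector i.
    sample_dist[i]: full distance D_i from a prepared sample of character i
    to standard vector i. Returns the smallest slope kappa such that at least
    eps_percent of the points satisfy D_i <= kappa * A_i (assumes A_i > 0).
    """
    if not 0 < eps_percent <= 100:
        raise ValueError("eps_percent must be in (0, 100]")
    # On or below the line D = kappa * A  <=>  D / A <= kappa.
    ratios = sorted(d / a for a, d in zip(avg_neighbour_dist, sample_dist))
    k = math.ceil(len(ratios) * eps_percent / 100)   # points required below the line
    return ratios[k - 1]


print(choose_kappa([2, 4, 5, 10], [1, 1, 4, 3], 75))  # 0.5
```

A larger ε' yields a steeper line and a larger κ, trading fewer rejected correct matches for more element comparisons per standard vector.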

[0034] The thresholds are calculated in advance by the method described above, and only the results are stored as a table in the storage device 43 or the like. Character determination means (2) 207 sets the threshold by referring to that table when determining a character.

[0035] When a conventional fixed threshold is used, drawing a line so that, as with line 621, ε'% of all points lie below it gives the dotted line 622. As the figure shows, a conventional fixed threshold does not match the distribution. For example, no effective threshold is set in the region marked 623, so comparisons with a standard feature vector continue needlessly until the fixed threshold is reached. The threshold calculation method of this embodiment, however, sets a threshold suited to each standard feature vector, so it eliminates more wasteful comparisons than a conventional fixed threshold.

[0036] Next, character determination means (2) 207 is described. It determines the character by comparing the input against the standard feature vectors using the thresholds obtained by threshold value calculation means (2) 206. The threshold th(i) obtained by means (2) 206 for a given standard feature vector is compared with the minimum distance found so far while comparing against the standard feature vectors, and the smaller of the two is used as the threshold. This reduces the number of comparisons and speeds up processing. The details are described with reference to FIG. 10.

[0037] In the figure, "i" is the index of the standard feature vector, "min" holds the smallest distance found so far in comparisons with the standard feature vectors, and "code" holds the recognition result.

[0038] First, in step 901, the variables are initialized: "min" is set to the largest of the M thresholds obtained by threshold value calculation means (2) 206, and "code" is set to an initial character code.

[0039] Steps 902 to 915 in the figure form the loop over the standard feature vectors; "code" and "min" are updated as each standard feature vector is compared. Inside this loop, steps 906 to 910 form an element-by-element comparison loop between the feature vector α extracted by feature vector extraction means 203 and the i-th standard feature vector ai. This element loop uses the threshold th to eliminate unnecessary comparisons.

[0040] In step 902, the threshold th(i), which represents the effective range of the i-th standard feature vector, is compared with "min", and the smaller value is used as the threshold th for comparison with that standard feature vector (steps 903 and 904).

[0041] In step 905, the variable "j", the element index of the feature vector, is initialized to 1, and the variable "add", which holds the distance to the standard feature vector in the dictionary, is initialized to 0. The distance between the j-th element aij of the i-th standard feature vector in the dictionary and the element αj of the feature vector obtained by feature vector extraction means 203 is then computed and added to "add" (step 906). The city block distance is shown here, but the comparison with the standard feature vectors in the recognition dictionary may use another measure, such as the squared Euclidean distance. Next, "add" is compared with the threshold th (step 907). If the condition is not satisfied, processing moves on to the next standard feature vector (step 908); if it is satisfied, processing returns to step 906 to compare the next element of the feature vector (steps 909 and 910). Comparing the partial distance "add" accumulated so far with the threshold th, which indicates the smallest effective range, partway through the distance computation (step 907), and deciding whether to continue, reduces the number of calculations. This works because th is the value beyond which the standard feature vector being compared cannot be an appropriate match, and because each element contributes a non-negative amount, "add" never decreases, so once it exceeds th the full distance will as well. The decision in step 907 therefore shortens the time spent on comparisons with unsuitable standard feature vectors.
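Steps 905 to 910 amount to an early-abandoning distance computation. A minimal sketch follows, using hypothetical helper names; it accepts either the city block term or the squared Euclidean term mentioned above, both of which are non-negative per element, which is what makes stopping early safe.

```python
def city_block_term(x, y):
    return abs(x - y)


def squared_euclidean_term(x, y):
    return (x - y) ** 2


def partial_distance(alpha, a_i, th, term=city_block_term):
    """Early-abandoning distance (hypothetical helper, steps 905 to 910).

    Returns (distance, elements_examined), or (None, elements_examined) when
    the running total exceeds th and the remaining elements are skipped.
    """
    add = 0                                  # step 905
    examined = 0
    for a_ij, alpha_j in zip(a_i, alpha):
        examined += 1
        add += term(a_ij, alpha_j)           # step 906: non-negative term
        if add > th:                         # step 907: total can no longer end below th
            return None, examined            # step 908: abandon this vector
    return add, examined


print(partial_distance([0, 0, 0, 0], [1, 1, 1, 1], 2))  # (None, 3)
```

Here only three of the four elements are examined before the vector is rejected; with long feature vectors and tight thresholds the saving grows accordingly.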

[0042] When all elements of the feature vector have been compared, the resulting distance "add" is checked against "min", the smallest distance found among standard feature vectors 1 through i−1 (step 911). If "add" is smaller, "min" and the recognition-result variable "code" are updated (step 913).

[0043] When comparison with every standard feature vector is finished (step 915), character determination means (2) 207 ends (step 916). The character code held in "code" at that point is the character recognition result.
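Putting steps 901 to 916 together, character determination means (2) 207 might be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's code: the dictionary is a hypothetical list of (character code, standard vector) pairs, th is the precomputed per-vector threshold list (for example, the table of [0034]), the city block distance is used, and step 907 is assumed to abandon a vector when "add" strictly exceeds the threshold, a detail the text does not specify.

```python
def decide_character(alpha, dictionary, th, initial_code=None):
    """Minimal sketch of FIG. 10 (hypothetical names, city block distance).

    alpha: feature vector of the segmented character.
    dictionary: list of (character_code, standard_vector) pairs.
    th: per-standard-vector thresholds th(i).
    Returns (code, min_dist); code stays initial_code if every vector is rejected.
    """
    min_dist = max(th)        # step 901: min = largest of the M thresholds
    code = initial_code       # step 901: initial character code
    for (char_code, a_i), th_i in zip(dictionary, th):   # steps 902 to 915
        limit = min(th_i, min_dist)                       # steps 902 to 904
        add = 0                                           # step 905
        for a_ij, alpha_j in zip(a_i, alpha):             # steps 906 to 910
            add += abs(a_ij - alpha_j)                    # step 906
            if add > limit:                               # step 907 fails
                break                                     # step 908: next vector
        else:
            # All elements compared without exceeding the limit.
            if add < min_dist:                            # step 911
                min_dist, code = add, char_code           # step 913
    return code, min_dist


dictionary = [("A", [0, 0]), ("B", [1, 0]), ("C", [0, 2]), ("D", [5, 5])]
th = [0.75, 1.0, 1.25, 4.25]
print(decide_character([1, 0.5], dictionary, th))  # ('B', 0.5)
```

In the example, "A" is abandoned after one element, and once "B" sets min to 0.5, both "C" and "D" are abandoned after their first element as well.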

[0044] As described in this embodiment, representing the effective range of each standard feature vector in the standard feature vector space by a sphere centered on it, and using the sphere's radius as the threshold, makes it possible to set an appropriate threshold for each standard feature vector. In addition, the smaller of 1) the threshold th(i) used for comparison with each standard feature vector and 2) the smallest distance "min" among the standard feature vectors compared so far is used as the threshold th, and comparison with the current standard feature vector is abandoned as soon as the accumulated distance exceeds th. This eliminates wasteful comparisons and, as a result, shortens processing time.

[0045] (Embodiment 3) This embodiment describes a method of shortening the comparison time with the standard feature vectors by using both the threshold thα, calculated in Embodiment 1 from the feature vector obtained by feature vector extraction means 203, and the thresholds th(i) obtained from the standard feature vectors in Embodiment 2.

[0046] First, the processing of this embodiment is described. As shown in FIG. 11, the character recognition device in this embodiment consists of input means 201 (data reading by the input module 21), cutout means 202 (by the cutout module 22), feature vector extraction means 203 (by the feature vector extraction module 23), threshold value calculation means (1) 204 and (2) 206 (by the threshold value calculation module 24), and character determination means (3) 208 (by the character determination module 25). Input means 201, cutout means 202, feature extraction means 203, and threshold value calculation means (1) 204 are the same as in Embodiment 1, and threshold value calculation means (2) 206 is the same as in Embodiment 2. Only character determination means (3) 208 is therefore described here.

[0047] Character determination means (3) 208 is almost the same as character determination means (2) 207 shown in FIG. 10. The only difference is that, in step 901, the variable "min", which holds the minimum distance to the standard feature vectors, is initialized with thα obtained by threshold value calculation means (1) 204.
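Because Embodiment 3 changes only the starting value of "min", its effect can be illustrated by counting element comparisons under two initializations. The hypothetical sketch below repeats the FIG. 10 loop with "min" initialized from a caller-supplied th_alpha; how thα itself is obtained in Embodiment 1 is not reproduced here. Passing max(th) as th_alpha reproduces Embodiment 2's initialization.

```python
def decide_character_3(alpha, dictionary, th, th_alpha, initial_code=None):
    """Sketch of character determination means (3) 208 (hypothetical names).

    Same loop as FIG. 10, except step 901 initializes min with th_alpha.
    Also returns the number of element comparisons performed.
    """
    min_dist, code, steps = th_alpha, initial_code, 0   # step 901 (Embodiment 3)
    for (char_code, a_i), th_i in zip(dictionary, th):
        limit = min(th_i, min_dist)                      # steps 902 to 904
        add = 0                                          # step 905
        for a_ij, alpha_j in zip(a_i, alpha):
            steps += 1
            add += abs(a_ij - alpha_j)                   # step 906 (city block)
            if add > limit:                              # step 907 fails
                break
        else:
            if add < min_dist:                           # steps 911 and 913
                min_dist, code = add, char_code
    return code, min_dist, steps


dictionary = [("A", [0, 0, 0]), ("B", [4, 4, 4])]
th = [10, 10]
print(decide_character_3([4, 4, 3], dictionary, th, max(th)))  # ('B', 1, 6)
print(decide_character_3([4, 4, 3], dictionary, th, 5))        # ('B', 1, 5)
```

With the smaller starting value, the poor candidate "A" is abandoned one element earlier while the result is unchanged.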

[0048] The fixed threshold 613 in FIG. 6(b) and the fixed threshold 622 in FIG. 8(b) can be regarded as values of the same level. The vertical axis of FIG. 6(b) is the distance from the character image to the dictionary, and that of FIG. 8(b) is the distance from the dictionary to the character image, but both measure a comparison between the dictionary and a character image. The maximum max{th(i)} of the thresholds obtained by threshold value calculation means (2) 206 is roughly equal to the fixed threshold 622 in FIG. 8(b), and therefore also roughly equal to the fixed threshold 613 in FIG. 6(b). Consequently, comparing the threshold thα obtained by threshold value calculation means (1) 204 with max{th(i)} shows that thα is smaller than max{th(i)}. Initializing the variable "min", which holds the minimum distance to the standard feature vectors, with thα in step 901 therefore sets a smaller initial value, which reduces the number of comparisons with the elements of each standard feature vector. The comparison time with each standard feature vector can thus be shortened further than in Embodiments 1 and 2.

[0049] Thus, combining the threshold thα calculated from the feature vector obtained by feature vector extraction means 203 (Embodiment 1) with the thresholds th(i) obtained from the standard feature vectors (Embodiment 2) sets appropriate thresholds derived from two different directions, and a further speedup can be expected.

[0050]

[Effects of the Invention] As described above, the character recognition device and character recognition method of the present invention use a threshold to cut off unnecessary calculation in the comparison with the standard feature vectors, which is the most time-consuming part of character recognition. This reduces the number of comparisons with the recognition dictionary, shortens comparison processing, and thereby shortens overall character recognition time. There are two ways to calculate the threshold: 1) from the feature vector extracted from the segmented character image, and 2) from the distribution of the standard feature vectors in the standard feature vector space, which yields a threshold for comparison with each standard feature vector.

[0051] As shown in FIG. 6(b), method 1) derives the function f(β), shown by the solid line, from the distribution of β obtained from the extracted feature vector versus the distance to the standard feature vectors stored in the recognition dictionary. Compared with the conventional fixed threshold shown by the dotted line 613, this allows a threshold appropriate to each segmented character to be set, reducing unnecessary comparisons in the region marked 614.

[0052] As shown in FIG. 8(b), method 2) calculates the threshold line 621 from the distribution of the average A of the m distances obtained for each standard feature vector versus the distance to character images prepared in advance. Because an appropriate threshold can be set for each standard feature vector, unnecessary comparisons are reduced in the region marked 623 compared with the conventional fixed threshold shown by the dotted line 622.

[0053] Furthermore, using 1) and 2) together allows the initial threshold to be set to a smaller value, eliminating more unnecessary comparisons than either 1) or 2) alone.

[0054] As described above, compared with methods that use a fixed threshold, the invention enables more flexible and faster comparison with the recognition dictionary in the comparison against standard feature vectors, which takes the most processing time in character recognition.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing an example configuration of an apparatus required for the present invention.

FIG. 2 is a block diagram of the present invention in Embodiment 1.

FIG. 3 is a diagram showing an example of feature vector extraction, for explaining the feature vector extraction means.

FIG. 4 is a flowchart for explaining the threshold value calculation means in Embodiment 1.

FIG. 5 is a flowchart for explaining the character determination means in Embodiment 1.

FIG. 6 is a diagram for explaining how to obtain the function f(β) used by the threshold value calculation means in Embodiment 1.

FIG. 7 is a block diagram of the present invention in Embodiment 2.

FIG. 8 is a diagram for explaining the threshold value calculation means in Embodiment 2.

FIG. 9 is a flowchart for explaining the threshold value calculation means in Embodiment 2.

FIG. 10 is a flowchart for explaining the character determination means in Embodiment 2.

FIG. 11 is a block diagram of the present invention in Embodiment 3.

FIG. 12 is a diagram for explaining a conventional character recognition method.

[Description of Reference Numerals]

10 CPU
20 ROM device
21 Input module
22 Cutout module
23 Feature vector extraction module
24 Threshold value calculation module
25 Character determination module
30 RAM device
40 Input device
41 Scanner
42 Display device
43 Storage device

Claims

1. A character recognition device comprising: input means for inputting an image of a paper surface containing handwritten or printed characters; cutout means for cutting out a character string or characters from the image of the paper surface input by the input means; feature vector extraction means for extracting a feature vector from a character cut out by the cutout means; threshold value calculation means for calculating, using the feature vector obtained by the feature vector extraction means, a threshold value used in comparison with the standard feature vectors stored in a recognition dictionary; and character determination means for determining the character cut out by the cutout means by comparing, using the threshold value calculated by the threshold value calculation means, the standard feature vectors stored in the recognition dictionary with the feature vector extracted by the feature vector extraction means.

2. A character recognition device comprising: input means for inputting an image of a paper surface containing handwritten or printed characters; cutout means for cutting out a character string or characters from the image of the paper surface input by the input means; feature vector extraction means for extracting a feature vector from a character cut out by the cutout means; threshold value calculation means for calculating, as the threshold value used in comparison with the standard feature vectors stored in a recognition dictionary, the effective range of each standard feature vector stored in the recognition dictionary; and character determination means for determining the character cut out by the cutout means by comparing, using the threshold value calculated by the threshold value calculation means, the standard feature vectors stored in the recognition dictionary with the feature vector extracted by the feature vector extraction means.

3. A character recognition device comprising: input means for inputting an image of a paper surface containing handwritten or printed characters; cutout means for cutting out a character string or characters from the image of the paper surface input by the input means; feature vector extraction means for extracting a feature vector from a character cut out by the cutout means; threshold value calculation means for calculating, using the feature vector obtained by the feature vector extraction means, a threshold value 1 used in comparison with the standard feature vectors stored in a recognition dictionary, and for calculating the effective range of each standard feature vector stored in the recognition dictionary as a threshold value 2; and character determination means for determining the character cut out by the cutout means by comparing, using threshold value 1 and threshold value 2 calculated by the threshold value calculation means together, the standard feature vectors stored in the recognition dictionary with the feature vector extracted by the feature vector extraction means.

4. A character recognition method in a character recognition device comprising input means for inputting an image of a paper surface containing handwritten or printed characters, cutout means for cutting out a character string or characters from the image of the paper surface input by the input means, feature vector extraction means for extracting a feature vector from a character cut out by the cutout means, threshold value calculation means for calculating a threshold value used in comparison with the standard feature vectors stored in a recognition dictionary, and character determination means for determining the character cut out by the cutout means by comparing, using the threshold value calculated by the threshold value calculation means, the standard feature vectors stored in the recognition dictionary with the feature vector extracted by the feature vector extraction means, wherein the threshold value calculation means calculates the threshold value using the feature vector extracted by the feature vector extraction means.

5. A character recognition method in a character recognition device comprising input means for inputting an image of a paper surface containing handwritten or printed characters, cutout means for cutting out a character string or characters from the image of the paper surface input by the input means, feature vector extraction means for extracting a feature vector from a character cut out by the cutout means, threshold value calculation means for calculating a threshold value used in comparison with the standard feature vectors stored in a recognition dictionary, and character determination means for determining the character cut out by the cutout means by comparing, using the threshold value calculated by the threshold value calculation means, the standard feature vectors stored in the recognition dictionary with the feature vector extracted by the feature vector extraction means, wherein the threshold value calculation means calculates the effective range of each standard feature vector stored in the recognition dictionary as the threshold value.

6. A character recognition method in a character recognition device comprising input means for inputting an image of a paper surface containing handwritten or printed characters, cutout means for cutting out a character string or characters from the image of the paper surface input by the input means, feature vector extraction means for extracting a feature vector from a character cut out by the cutout means, threshold value calculation means for calculating a threshold value used in comparison with the standard feature vectors stored in a recognition dictionary, and character determination means for determining the character cut out by the cutout means by comparing, using the threshold value calculated by the threshold value calculation means, the standard feature vectors stored in the recognition dictionary with the feature vector extracted by the feature vector extraction means, wherein the threshold value calculation means calculates a threshold value 1 using the feature vector extracted by the feature vector extraction means and calculates the effective range of each standard feature vector stored in the recognition dictionary as a threshold value 2, and threshold value 1 and threshold value 2 are used together.