JPH0927014A

JPH0927014A - Dictionary data learning system of handy optical character recognition device

Info

Publication number: JPH0927014A
Application number: JP7197929A
Authority: JP
Inventors: Masao Ogiwara; 政夫荻原
Original assignee: SMK Corp
Current assignee: SMK Corp
Priority date: 1995-07-12
Filing date: 1995-07-12
Publication date: 1997-01-28

Abstract

PROBLEM TO BE SOLVED: To create new dictionary data from many sample characters in a short time by outputting the feature data extracted from the read data of a plurality of sample characters to a host computer. SOLUTION: An OCR 10 reads, through an image sensor 1, sample characters printed by two or more different printing means for one reference character and outputs the feature data extracted from the read data of each sample character to a host computer 6. The host computer 6 creates dictionary data for the reference character from the feature data of the two or more different sample characters and outputs it to a microcomputer 4 together with the education data corresponding to the reference character. The calculation for creating dictionary data is thus performed by the host computer 6, which is capable of high-speed processing. Because the new dictionary data is created by processing the feature data of two or more samples together, it is hardly affected by the order in which the sample characters are read.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a dictionary data learning system that creates, from a plurality of sample characters, the dictionary data stored in the dictionary data storage unit of a handy optical character recognition device that recognizes optically read characters.

[0002]

[Prior Art]

[0003] A handy optical character recognition device (hereinafter, OCR) 100 stores, in a dictionary data storage unit, dictionary data for comparison with characters printed by, for example, a printer, and recognizes characters by comparing the read characters against this dictionary data.

[0004] As shown in FIG. 7, the minute analog signal of a character to be read, captured by an image sensor 101 such as a CCD (charge-coupled device), is amplified by an amplification/binarization circuit 102 and then converted into a binary signal (0 or 1) representing light and dark. The resulting read data is written to an image data storage unit 103 consisting of RAM (random access memory).

[0005] Meanwhile, a dictionary data storage unit 104 consisting of electrically rewritable ROM (EEROM) stores dictionary data created from the read data of one set of characters in a specific typeface, for example a set of printed Century characters, based on the features of each character image.

[0006] A microcomputer 105 connected to the image data storage unit 103 and the dictionary data storage unit 104 operates in two modes, a character recognition mode and a dictionary learning mode. In the character recognition mode, it compares the read data read out of the image data storage unit 103 with the dictionary data stored in the dictionary data storage unit 104 to recognize the character being read.

[0007] Specifically, the two are compared feature by feature for each part, and when the degree of disagreement k between them falls within a predetermined range, the character is judged to have been recognized. The recognition data for the character is then output, as a predetermined character code such as an ASCII code, to a host computer 106 connected to the OCR 100.

[0008] In more detail, the read data is divided into m masks as shown in FIG. 6(a), and the read-data value of each mask is assigned to one of m components, so that the read data is represented by feature data consisting of an m-dimensional vector y = (y₁, y₂, y₃, ..., y_m). This feature data is compared with dictionary data consisting of the corresponding m-dimensional vector f = (f₁, f₂, f₃, ..., f_m).

[0009]

[Equation 3]

[0010] When the degree of disagreement k with a given dictionary entry, obtained from Equation 3, is smaller than a fixed constant C₁, that is, k < C₁, and the second-smallest degree of disagreement k₂, obtained with another dictionary entry, differs from k by C₂ or more, that is, k₂ − k > C₂, the read data is judged to be the character of that dictionary entry.
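The two-threshold decision rule described here can be sketched as follows. Equation 3 is not reproduced in this text, so the degree of disagreement k is assumed, purely for illustration, to be the Euclidean distance between the feature vector and a dictionary vector. The function names `mismatch` and `recognize`, the sample dictionary, and the threshold values are all hypothetical.

```python
import math

def mismatch(y, f):
    """Assumed degree of disagreement k: Euclidean distance between two vectors."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, f)))

def recognize(y, dictionary, c1, c2):
    """Return the character whose entry satisfies k < c1 and k2 - k > c2, else None."""
    scored = sorted((mismatch(y, f), ch) for ch, f in dictionary.items())
    k, best = scored[0]
    if k >= c1:
        return None  # even the closest entry is too far away
    if len(scored) > 1 and scored[1][0] - k <= c2:
        return None  # the runner-up is too close to call
    return best

# Hypothetical 3-dimensional dictionary entries for two characters.
dictionary = {"A": [0, 0, 0], "B": [10, 0, 0]}
print(recognize([1, 0, 0], dictionary, c1=3.0, c2=2.0))  # A
print(recognize([5, 0, 0], dictionary, c1=3.0, c2=2.0))  # None
```

The second call returns no character because the closest entry already fails k < C₁, which corresponds to a "cannot recognize" outcome.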

[0011] The dictionary learning mode, on the other hand, is used to store dictionaries in the dictionary data storage unit 104 when new dictionary data is created, or when dictionary data matched to the font of the characters to be recognized is created in order to raise the recognition rate.

[0012] The operation of the OCR 100 and the host computer 106 in this dictionary learning mode is described with reference to the flowchart of FIG. 8.

[0013] First, to put the microcomputer 105 of the OCR 100 into the dictionary learning mode, the host computer 106 outputs a learning mode designation signal requesting the switch to that mode (step S101).

[0014] On receiving this signal, the microcomputer 105 sets itself to the dictionary learning mode and then notifies the host computer 106, by a response signal, that the mode has been set (step S102).

[0015] The operator confirms this notification on the display of the host computer 106 and then reads, through the image sensor 101, a sample character of the font to be stored as dictionary data. The microcomputer 105 outputs the read data of this sample character to the host computer 106 (step S103) and, as in the character recognition mode, divides the read data into m masks and extracts from it feature data x consisting of an m-dimensional vector.

[0016] While checking the read data shown on the display of the host computer 106, the operator enters the corresponding correct character code as education data, and the host computer 106 outputs this education data to the microcomputer 105 (step S104).

[0017] The microcomputer 105 reads from the dictionary data storage unit 104 the dictionary data f corresponding to the education data output by the host computer 106, creates new dictionary data f′ from the extracted feature data x and the dictionary data f, and writes f′ to the dictionary data storage unit 104 in association with the education data, thereby rewriting the dictionary data.

[0018] In general, even the same font produces different images depending on, for example, the printer model. When dictionary data for a new font is created, sample characters printed by at least several kinds (for example, n kinds) of printers are therefore read for each reference character (for example, A), and the features of each are incorporated into the dictionary data.

[0019] Consequently, if the new dictionary data f′ were simply taken as the average of the feature data x and the dictionary data f each time a sample character is read, the features of sample characters read later would influence the new dictionary data more strongly.

[0020] Japanese Patent Laid-Open No. 6-96283 addresses this problem by introducing a constant learning-rate coefficient α (for example, α = 0.9) and creating the new dictionary data f′ as f′ = αf + (1 − α)x, which weights the existing dictionary data.
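The update rule f′ = αf + (1 − α)x can be tried on made-up numbers to see how the result depends on the order in which samples are read. This is only an illustrative sketch: the helper names `update` and `learn`, the one-dimensional vectors, and the starting dictionary value are assumptions, not values from the publication.

```python
def update(f, x, alpha=0.9):
    """One learning step: f' = alpha * f + (1 - alpha) * x, component by component."""
    return [alpha * fi + (1 - alpha) * xi for fi, xi in zip(f, x)]

def learn(f0, samples, alpha=0.9):
    """Apply the update once per sample, in the order the samples are read."""
    f = list(f0)
    for x in samples:
        f = update(f, x, alpha)
    return f

f0 = [0.0]                # hypothetical existing dictionary data
a, b = [10.0], [20.0]     # hypothetical feature data of two sample characters
forward = learn(f0, [a, b])
backward = learn(f0, [b, a])
print(forward, backward)  # approximately [2.9] and [2.8]
```

With these numbers the two reading orders give about 2.9 and 2.8, so the weighting reduces but does not remove the dependence on reading order.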

[0021] When the new dictionary data f′ has been stored in the dictionary data storage unit 104 in association with the education data in this way, the microcomputer 105 outputs a learning end signal to the host computer 106 (step S105).

[0022] After confirming the learning end signal on the display of the host computer 106, the operator goes on to read, through the image sensor 101, a sample character printed by another printer.

[0023] The processing of steps S103 to S105 is repeated in the same way until all n kinds of sample characters have been read and stored as new dictionary data.

[0024] After having the last sample character read and confirming its learning end signal on the display of the host computer 106, the operator causes the host computer 106 to output a recognition mode designation signal requesting the return of the microcomputer 105 to the character recognition mode (step S106), which ends the dictionary learning mode.

[0025]

[Problems to Be Solved by the Invention] However, the conventional dictionary data learning system using the OCR 100 has the following problems.

[0026] First, when new dictionary data is created from a plurality of sample characters, the conventional example reduces the influence of reading order by weighting the previously created dictionary data. Even so, the dictionary data finally produced differs depending on the order in which the sample characters are read, and it is not an average of the features of the n kinds of sample characters.

[0027] Second, because the dictionary data is updated for every sample character, creating new dictionary data from a plurality of samples means waiting for the learning end signal from the OCR 100 each time, so the work is repeatedly interrupted and takes time.

[0028] Third, f′ = αf + (1 − α)x must be computed to create new dictionary data every time a new sample character is read. To perform this calculation and store the new dictionary data in the dictionary data storage unit 104 within a practically acceptable time, the OCR 100 must be equipped with a microcomputer 105 that operates at high speed.

[0029] The second and third problems could be solved if the OCR 100 stored the read data of all the sample characters and a microcomputer 105 capable of high-speed processing computed the new dictionary data. That, however, would require the OCR 100 to carry both a memory 103 large enough to hold the read data of all the sample characters and a microcomputer 105 capable of high-speed processing.

[0030] A handy optical character recognition device 100, however, must be small and inexpensive to differentiate it from large stationary character recognition devices, so it cannot carry a large-capacity memory 103 or an expensive high-speed microcomputer 105.

[0031] Fourth, because character recognition relies on comparison with constants, read errors increase depending on how the constants are set.

[0032] That is, the dictionary data is represented only by an m-dimensional vector, and the recognition condition compares the distance between that vector and the feature data of the sample character (the degree of disagreement k) with a fixed constant C₁, so the degree of variation among the learned sample characters is ignored. If C₁ is too small relative to the variation in the sample characters' feature data, "cannot recognize" errors become frequent; if C₁ is too large relative to that variation, "wrong character recognized" errors become frequent.

[0033] Moreover, it is difficult to judge this degree of variation visually from the group of sample characters, so the constant C₁ has had to be set by repeated trial recognition.

[0034] Furthermore, when the constant C₁ is large, the recognition condition is often satisfied by the dictionary data of several characters. Even when the dictionary data of one character satisfies the condition, the distances to other characters must still be compared using the constant C₂, so character recognition takes time.

[0035] The present invention was made to solve the above problems. Its object is to provide a dictionary data learning system for a handy optical character recognition device that can create new dictionary data from many sample characters in a short time, whose resulting dictionary data is unaffected by the order in which the sample characters are read, and in which character recognition errors do not occur even when the sample characters vary.

[0036]

[Means for Solving the Problems] To achieve the above object, the dictionary data learning system for a handy optical character recognition device according to the present invention comprises a handy optical character recognition device and a host computer. The handy optical character recognition device includes an image sensor that optically reads a character to be read; a storage unit that stores the read data of the character output from the image sensor; a dictionary data storage unit that stores dictionary data; and a microcomputer that operates in two modes, a character recognition mode in which it recognizes the character by comparing the read data with the dictionary data, and a dictionary learning mode in which it stores dictionary data created from the read data in the dictionary data storage unit. In the dictionary learning mode, the host computer outputs to the microcomputer of the handy optical character recognition device, as education data, the character code corresponding to the read data of the character. In this system, a sample character is read by the image sensor as a character for dictionary creation, the microcomputer operating in the dictionary learning mode extracts feature data from the read data of the sample character, and the dictionary data is rewritten by storing dictionary data created from the feature data in the dictionary data storage unit in association with the education data. The system is characterized in that the handy optical character recognition device reads, through the image sensor, sample characters printed by two or more different printing means for one reference character and outputs the feature data extracted from the read data of each sample character to the host computer, and in that the host computer creates the dictionary data of the reference character from the feature data of the two or more different sample characters and outputs it to the microcomputer together with the education data corresponding to the reference character.

[0037] In the invention of claim 1, sample characters printed by two or more different printing means for one reference character are read through the image sensor, and the feature data extracted from the read data of each sample character is output to the host computer. The device therefore does not need enough memory to store the read data of every sample character.

[0038] The host computer creates the dictionary data of the reference character from the feature data of two or more different sample characters and outputs it to the microcomputer. This host computer is the one that outputs the education data, and it can double as the host processing device that receives the codes of characters recognized by the handy optical character recognition device and processes them, so it can create dictionary data from the feature data at high speed.

[0039] In addition, because the new dictionary data is created by processing the feature data of two or more samples together, it is not affected by the order in which the sample characters are read.

[0040] Further, the dictionary data learning system for a handy optical character recognition device of claim 2 is characterized in that the dictionary data of the reference character is formed from the average of the feature data of the two or more different sample characters and the square root σ of the variance of that feature data.

[0041] In the invention of claim 2, the dictionary data of the reference character is represented by the average of the feature data of two or more different sample characters and the square root σ of the variance of the feature data. The distance between the vectors (the degree of disagreement k) can therefore be compared with a constant C₁ derived from σ, so that character recognition errors do not occur even when the variation among sample characters differs.

[0042] Further, the dictionary data learning system for a handy optical character recognition device of claim 3 is characterized in that the feature data of a sample character is an m-dimensional vector x obtained by dividing the read data of the sample character into m masks and assigning to each of the m components the value of the read data representing the sample character in the corresponding mask, and in that, where n is the number of sample characters, the dictionary data consists of

[0043]

[Equation 1]

[0044] f, which expresses the average of the feature data of the sample characters, and

[0045]

[Equation 2]

[0046] σ, which expresses the square root of the variance of the feature data.

[0047] In the invention of claim 3, from the feature data of two or more sample characters represented by m-dimensional vectors x, the average is obtained from

[0048]

[Equation 1]

[0049] and the square root of the variance is obtained from

[0050]

[Equation 2]

[0051] and each is used as dictionary data. The average and the degree of variation of the read data, which is image data, can thus be obtained with formulas suited to processing on the host computer.
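As a rough illustration of batch dictionary creation on the host computer, the sketch below computes an average vector and a square root of variance from all samples at once. Equations 1 and 2 are not reproduced in this text, so the forms used here are assumptions: a per-component arithmetic mean and a per-component population standard deviation. The publication may define σ differently (for example as a single scalar), and the function name `mean_and_sigma` and the sample values are hypothetical.

```python
import math

def mean_and_sigma(samples):
    """Per-component mean and population standard deviation of n feature vectors."""
    n = len(samples)
    m = len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(m)]
    sigma = [
        math.sqrt(sum((x[j] - mean[j]) ** 2 for x in samples) / n)
        for j in range(m)
    ]
    return mean, sigma

# Hypothetical 2-dimensional feature data from three sample characters.
samples = [[8, 2], [10, 2], [12, 2]]
mean, sigma = mean_and_sigma(samples)
shuffled_mean, shuffled_sigma = mean_and_sigma(samples[::-1])
print(mean, sigma)
print((mean, sigma) == (shuffled_mean, shuffled_sigma))  # True
```

Because every sample enters the same sums, reading the samples in reverse order gives the same result here, which is the order independence the text attributes to processing the feature data together.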

[0052]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0053] FIG. 1 is a block diagram showing the configuration of the OCR 10 and the host computer 6 according to the present invention. In FIG. 1, reference numeral 1 denotes an image sensor consisting of a CCD or the like, which optically reads the character to be read and converts it into an electric signal.

[0054] Reference numeral 2 denotes an amplification/binarization circuit that amplifies the minute analog signal input from the image sensor 1 and then converts it into a binary signal representing light and dark. Reference numeral 3 denotes a storage unit, consisting of SRAM (static RAM) or the like, that also serves as the image data storage unit. It stores the binary light/dark signal output from the amplification/binarization circuit 2 as pattern-processed read data, as shown in FIG. 6(a).

[0055] That is, at the address in the storage unit 3 corresponding to each position of the character to be read, the light/dark information of the binary signal at that position is stored (for example, 0 for a bright dot in the background and 1 for a dark dot in the character), so that read data consisting of image data geometrically similar to the character is formed in the memory of the storage unit 3.
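The 0/1 encoding described here can be illustrated with a small software sketch. In the device the conversion is done by the amplification/binarization circuit, so the grayscale input range (0 = black, 255 = white), the threshold of 128, and the function name `binarize` are assumptions made only for this example.

```python
def binarize(gray_rows, threshold=128):
    """Map grayscale levels to 1 for dark dots (character) and 0 for bright dots (background)."""
    return [[1 if level < threshold else 0 for level in row] for row in gray_rows]

# Hypothetical 3 x 3 scan of a plus-shaped mark on a bright background.
gray = [
    [250, 40, 245],
    [30, 35, 28],
    [240, 45, 250],
]
bits = binarize(gray)
for row in bits:
    print("".join(str(bit) for bit in row))
# 010
# 111
# 010
```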

[0056] A microcomputer 4, which operates in two modes, a character recognition mode and a dictionary learning mode, is connected downstream of the storage unit 3. In either mode, the microcomputer 4 performs character string extraction, character cut-out, normalization, and feature extraction on the read data stored in the storage unit 3.

[0057] Character string extraction extracts, when several characters are read at once, the read data containing those characters from the entire read data.

[0058] Character cut-out then cuts out, from this read data, the read data corresponding to each individual character. For example, when a run of connected dark dots "1" in the read data is surrounded by bright dots "0", the region enclosing those dark dots is cut out as one character.
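One way to realize this kind of cut-out is connected-component labeling: each group of touching dark dots that is surrounded by bright dots becomes one candidate character. The sketch below is an assumption about how such a step could be written, not the device's actual procedure. It uses 8-connectivity and returns bounding boxes, and the function name `cut_out_characters` and the sample bitmap are hypothetical.

```python
from collections import deque

def cut_out_characters(bits):
    """Return bounding boxes (top, left, bottom, right) of 8-connected groups of 1s, left to right."""
    rows, cols = len(bits), len(bits[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if bits[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over all dark dots touching this one.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bits[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])

line = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(cut_out_characters(line))  # [(1, 1, 3, 2), (1, 4, 3, 5)]
```

A character made of several separate strokes would be split by this simple rule, so a practical implementation would need additional merging logic.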

[0059] So that it can be compared with the dictionary data, the cut-out character is scaled by normalization to the same size as was used when the dictionary data was created. The normalized read data is then output to the host computer 6, and the features of each part are extracted from it.
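The publication does not state how the size normalization is carried out. As one possible illustration, the sketch below rescales a binary character image to a fixed square size by nearest-neighbour sampling. The method, the target size, and the function name `normalize` are all assumptions.

```python
def normalize(bits, size=12):
    """Resize a binary character image to size x size by nearest-neighbour sampling."""
    rows, cols = len(bits), len(bits[0])
    return [
        [bits[r * rows // size][c * cols // size] for c in range(size)]
        for r in range(size)
    ]

# Hypothetical 2 x 2 glyph scaled up to 4 x 4.
glyph = [[1, 0], [0, 1]]
scaled = normalize(glyph, size=4)
for row in scaled:
    print("".join(str(bit) for bit in row))
# 1100
# 1100
# 0011
# 0011
```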

[0060] In feature extraction, the normalized read data in the storage unit 3 is divided, for example, into six parts vertically and six horizontally, giving 36 masks in total. For each mask, a feature number, which is the value of the read data representing the character in that mask, is calculated from the number and positions of the dark dots "1" in the mask. In this calculation, the dark dots "1" in the mask are counted with extra weight given to the central position of the mask.

[0061] For example, in the lower-right mask of FIG. 6(a), shown in FIG. 6(b), there are 6 dark dots "1" around the periphery of the mask and 2 dark dots "1" at its center, so its feature number is (6 peripheral dots × weight 1) + (2 central dots × weight 2) = 10.

[0062] The feature numbers of all 36 masks are obtained in the same way, and the feature data of the character is represented by a 36-dimensional vector whose 36 components are these feature numbers.

[0063] That is, if the feature data is the 36-dimensional vector x, and x₁, x₂, x₃, ..., x₃₆ are the feature numbers of the masks of FIG. 6(a) taken row by row from the top, then x = (x₁, x₂, x₃, ..., x₃₆).
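The weighted mask count and the resulting 36-dimensional vector can be sketched as follows. The 6 × 6 grid, the peripheral weight of 1, and the central weight of 2 follow the description. The 24 × 24 normalized image, the 4 × 4 mask with a 2 × 2 central region, and the function names `mask_feature` and `extract_features` are assumptions made for illustration, since FIG. 6 is not reproduced in this text.

```python
def mask_feature(mask, center_weight=2):
    """Weighted count of dark dots in one square mask; inner cells count center_weight times."""
    size = len(mask)
    total = 0
    for r, row in enumerate(mask):
        for c, bit in enumerate(row):
            inner = 0 < r < size - 1 and 0 < c < size - 1
            total += bit * (center_weight if inner else 1)
    return total

def extract_features(bits, grid=6):
    """Split a square binary image into grid x grid masks, scanned row by row from the top."""
    step = len(bits) // grid
    features = []
    for gr in range(grid):
        for gc in range(grid):
            mask = [row[gc * step:(gc + 1) * step]
                    for row in bits[gr * step:(gr + 1) * step]]
            features.append(mask_feature(mask))
    return features

# Hypothetical mask with 6 peripheral and 2 central dark dots: 6 * 1 + 2 * 2 = 10.
example_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
image = [[0] * 24 for _ in range(24)]
for r in range(4):
    image[20 + r][20:24] = example_mask[r]  # place it in the lower-right mask
features = extract_features(image)
print(len(features), features[-1])  # 36 10
```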

[0064] In the character recognition mode, the microcomputer 4 compares this feature data x with the dictionary data and outputs the recognition data of the character to the host computer 6 as a predetermined character code. In the dictionary creation (learning) mode, it outputs the feature data x to the host computer 6 and stores the new dictionary data received from the host computer 6 in the dictionary data storage unit 5.

[0065] Reference numeral 5 denotes a dictionary data storage unit consisting of nonvolatile memory such as electrically rewritable ROM (EEPROM), which stores the dictionary data to be compared with the character being read.

[0066] The dictionary data storage unit 5 is connected to the microcomputer 4, which can read dictionary data from it and write new dictionary data to it.

[0067] As shown in FIG. 1, a host computer 6 capable of bidirectional data communication with the microcomputer 4 is also connected to the OCR 10 configured as above.

[0068] The host computer 6 has a data input unit 7, a display unit 8 consisting of a display, and a feature data storage unit 9 that stores feature data. The data input unit 7 is a keyboard, which the operator uses to select the operation mode of the microcomputer 4 and to perform the control operations described later.

[0069] The display serving as the display unit 8 shows the normalized read data, the provisional recognition results described later, and so on.

[0070] Next, the method by which the OCR 10 and the host computer 6 configured in this way create new dictionary data is described with reference to the flowchart of FIG. 2.

[0071] First, to put the microcomputer 4 of the OCR 10 into the dictionary learning mode, the operator operates the data input unit 7 so that the host computer 6 outputs to the OCR 10 a learning mode designation signal requesting the switch to that mode (step S1).

【００７２】マイクロコンピュータ４は、この信号を受
けて自身を辞書学習モードとした後、このモードに設定
したことを応答信号によってホストコンピュータ１０６
へ通知する（ステップＳ２）。On receiving this signal, the microcomputer 4 sets itself to the dictionary learning mode and then notifies the host computer 6, by a response signal, that this mode has been set (step S2).

【００７３】操作者は、この通知を表示部８で確認し、
続いて辞書データとして記憶しようとするフォントの文
字について、各種プリンタから印字した複数種類のサン
プル文字（例えば１０種類）をイメージセンサ１から読
み取らせる。このとき、１０種類のサンプル文字を必ず
しも一度に連続して読み取らせる必要はない。The operator confirms this notification on the display unit 8 and then, for the characters of the font to be stored as dictionary data, has the image sensor 1 read a plurality of kinds of sample characters (for example, 10 kinds) printed by various printers. The 10 kinds of sample characters do not necessarily have to be read consecutively in a single pass.

【００７４】本実施の形態では、辞書データとして記憶
しようとするフォントの文字について、一種類のプリン
タから複数のサンプル文字（例えば「Ａ」「Ｂ」「Ｃ」
「Ｄ」「Ｅ」「３」「４」の７文字）を印字し、これら
のサンプル文字を同時に読み取らせている。In this embodiment, for the characters of the font to be stored as dictionary data, a plurality of sample characters (for example, the seven characters "A", "B", "C", "D", "E", "3", and "4") are printed by one type of printer, and these sample characters are read at the same time.

【００７５】複数のサンプル文字を被読取文字としてイ
メージセンサ１から読み取る毎に、マイクロコンピュー
タ４は、前述の文字列抽出、文字切り出し処理を実行
し、サンプル文字１文字毎に読取データを切り出す。Each time a plurality of sample characters are read from the image sensor 1 as the characters to be read, the microcomputer 4 executes the above-mentioned character string extraction and character cutout processing, and cuts out read data for each sample character.

【００７６】続いて、切り出された読取データについ
て、正規化、特徴抽出の各処理を実行し、サンプル文字
の読み取りデータから特徴データｘを抽出する。Then, the cut-out read data is subjected to normalization and feature extraction processing to extract feature data x from the read data of the sample characters.

【００７７】同時に、このマイクロコンピュータ４は、
辞書学習モードではあるが、文字認識モードと同様に、
辞書データ記憶部５に記憶されている仮の辞書データを
呼び出し、この特徴データｘと比較することによって、
サンプル文字の仮文字認識を行う。特徴データｘと辞書
データとの比較で文字認識を行う方法については、後述
する。At the same time, although the microcomputer 4 is in the dictionary learning mode, it calls up the temporary dictionary data stored in the dictionary data storage unit 5, just as in the character recognition mode, and compares it with the characteristic data x to perform temporary character recognition of the sample character. The method of recognizing characters by comparing the characteristic data x with the dictionary data will be described later.

【００７８】１文字のサンプル文字「Ａ」について仮文
字認識が終了すると、このサンプル文字の正規化した読
取データ、特徴データｘ、仮文字認識で認識した仮認識
データは、一時、記憶部３に記憶される。When temporary character recognition of one sample character "A" is completed, the normalized read data of the sample character, the characteristic data x, and the temporary recognition data obtained by the temporary character recognition are temporarily stored in the storage unit 3.

【００７９】マイクロコンピュータ４は、抽出した文字
列から、更に次のサンプル文字「Ｂ」の読取データを切
り出して上記と同様の処理を実行し、抽出した文字列に
含まれる全てのサンプル文字の読取データについて処理
を繰り返す。The microcomputer 4 then cuts out the read data of the next sample character "B" from the extracted character string, executes the same processing as described above, and repeats the processing for the read data of all the sample characters contained in the extracted character string.

【００８０】抽出した文字列に含まれる全てのサンプル
文字の読取データについて、読取データ、特徴データ
ｘ、仮認識データが記憶部３に記憶されると、これらの
データを記憶部３から順次読み出し、ホストコンピュー
タ６へ出力する（ステップＳ３乃至ステップＳ５）。Once the read data, the characteristic data x, and the temporary recognition data have been stored in the storage unit 3 for all the sample characters contained in the extracted character string, these data are sequentially read from the storage unit 3 and output to the host computer 6 (steps S3 to S5).

【００８１】すなわち、ステップＳ３においては、１番
目のサンプル文字「Ａ」から順に「４」まで、仮文字認
識で認識した仮認識データを所定の文字コードによって
ホストコンピュータ６へ連続して出力する。That is, in step S3, the temporary recognition data recognized by the temporary character recognition from the first sample character "A" to "4" are sequentially output to the host computer 6 by a predetermined character code.

【００８２】続いて、ステップＳ４で、１番目のサンプ
ル文字から順に、サンプル文字の順序を示すコードと共
に、そのサンプル文字の読取データを連続して出力し、
更に、ステップＳ５で、１番目のサンプル文字から順
に、サンプル文字の順序を示すコードとそのサンプル文
字の特徴データｘを連続して出力する。Then, in step S4, the code indicating the order of the sample characters and the read data of the sample characters are continuously output in order from the first sample character.
Further, in step S5, a code indicating the order of the sample characters and the characteristic data x of the sample characters are continuously output in order from the first sample character.
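
The three-pass output of steps S3 to S5 can be sketched as follows. This is only an illustration: the record layout, the `SampleRecord` class, and the `build_output_stream` function are assumptions introduced here, and the text does not specify the actual transfer format between the OCR 10 and the host computer 6.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SampleRecord:
    """One sample character as held in storage unit 3 (hypothetical layout)."""
    provisional_code: str      # temporary recognition data, e.g. "A"
    image: List[List[int]]     # normalized read data (binary pixels)
    features: List[float]      # characteristic data x


def build_output_stream(records: List[SampleRecord]) -> List[Tuple]:
    """Return the messages in the order of steps S3, S4 and S5."""
    stream: List[Tuple] = []
    # Step S3: temporary recognition codes, first sample to last.
    for rec in records:
        stream.append(("S3", rec.provisional_code))
    # Step S4: order code plus read data for each sample.
    for order, rec in enumerate(records, start=1):
        stream.append(("S4", order, rec.image))
    # Step S5: order code plus characteristic data for each sample.
    for order, rec in enumerate(records, start=1):
        stream.append(("S5", order, rec.features))
    return stream
```

The order code in steps S4 and S5 is what lets the host match each read image and feature vector to the recognition code sent in step S3.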

【００８３】ホストコンピュータ６は、これらの入力さ
れたデータの内、仮認識データと読取データから、サン
プル文字の仮認識結果と読取イメージを表示部８へ表示
する。The host computer 6 displays the temporary recognition result and the read image of the sample character on the display unit 8 from the temporary recognition data and the read data among the input data.

【００８４】図３は、表示部８にこの仮認識結果と読取
イメージを表示させた状態を示すもので、読み取った
「Ａ」「Ｂ」「Ｃ」「Ｄ」「Ｅ」「３」「４」のサンプ
ル文字の仮認識結果（１１₁、１１₂・・１１₇）と読取
イメージ（１２₁、１２₂・・１２₇）がディスプレー下
方の枠２０内にサンプル文字毎に分けられて表示されて
いる。FIG. 3 shows a state in which the temporary recognition results and the read images are displayed on the display unit 8. The temporary recognition results (11₁, 11₂, ..., 11₇) and the read images (12₁, 12₂, ..., 12₇) of the read sample characters "A", "B", "C", "D", "E", "3", and "4" are displayed in the frame 20 in the lower part of the display, divided by sample character.

【００８５】操作者は、この表示によって仮認識結果を
読取イメージと見比べながら確認し、誤認識しているサ
ンプル文字がある場合には、データ入力部７から正しい
文字を入力しその文字の仮認識結果を訂正する。例え
ば、読取イメージ１２₂が、文字「Ｂ」を示すのに対
し、仮認識結果１１₂が「Ｄ」を示し誤認識である場合
には、ディスプレーのカーソルをこの仮認識結果１１₂
上に合わせ、データ入力部７から正しい文字「Ｂ」を入
力する。Using this display, the operator checks the temporary recognition results against the read images and, if any sample character has been misrecognized, inputs the correct character from the data input unit 7 to correct the temporary recognition result for that character. For example, when the read image 12₂ shows the character "B" but the temporary recognition result 11₂ shows "D", which is a misrecognition, the operator places the display cursor on the temporary recognition result 11₂ and inputs the correct character "B" from the data input unit 7.

【００８６】表示された全てのサンプル文字の仮認識結
果が、読取イメージに対応するものとなったことを確認
すると、操作者は、データ入力部７を操作してこれを確
定する。この確定操作によって、そのときに表示されて
いた仮認識結果に相当する文字コードが、該サンプル文
字の教育データとなってホストコンピュータ６へ入力さ
れる。When the operator confirms that the displayed temporary recognition results of all the sample characters correspond to the read image, the operator operates the data input section 7 to confirm this. By this confirming operation, the character code corresponding to the temporary recognition result displayed at that time becomes the educational data of the sample character and is input to the host computer 6.

【００８７】また、このときに、ＯＣＲ１０から入力さ
れた特徴データは、サンプル文字毎に前述の教育データ
と対応づけられて、特徴データ記憶部９に記憶される。At this time, the characteristic data input from the OCR 10 is stored in the characteristic data storage section 9 in association with the above-mentioned educational data for each sample character.
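
The confirmation and storage steps on the host side can be sketched as below. This is a minimal illustration, assuming the characteristic data storage unit 9 is modeled as a dictionary keyed by character code; the function name `confirm_and_store` and its parameters are hypothetical and do not come from the text.

```python
from typing import Dict, List, Sequence


def confirm_and_store(
    provisional: Sequence[str],
    corrections: Dict[int, str],
    features: Sequence[Sequence[float]],
    store: Dict[str, List[List[float]]],
) -> List[str]:
    """Apply operator corrections, then file each feature vector under
    its confirmed character code (the educational data)."""
    confirmed = list(provisional)
    for position, correct_char in corrections.items():
        # The operator overwrote a misrecognized result at this position.
        confirmed[position] = correct_char
    for code, vector in zip(confirmed, features):
        store.setdefault(code, []).append(list(vector))
    return confirmed
```

For the example in the text, a temporary result of "D" at the second position would be passed as `corrections={1: "B"}`, so the second feature vector is filed under "B".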

【００８８】図３のディスプレー上方に表示された表示
枠２１は、異なる種類のプリンタによって何種類のサン
プル文字の特徴データが特徴データに記憶されたかを文
字毎に示すもので、表示枠２１の上段は、辞書データと
して記憶しようとする文字を表示している。本実施の形
態において、辞書データとして記憶しようとする文字
は、「Ａ」から「Ｚ」までのアルファベットと「０」か
ら「９」までの数字であるため、３６の文字１３₁、１
３₂・・１３₃₆が枠毎に表示されている。The display frame 21 shown in the upper part of the display in FIG. 3 indicates, for each character, for how many kinds of sample characters printed by different types of printers the characteristic data has been stored. The upper row of the display frame 21 shows the characters to be stored as dictionary data. In this embodiment, the characters to be stored as dictionary data are the letters "A" to "Z" and the digits "0" to "9", so 36 characters 13₁, 13₂, ..., 13₃₆ are displayed, one per cell.

【００８９】表示枠２１の下段の読取サンプル種数１４
は、その上段に表示された文字について、前述の方法に
よって、何種類の異なるサンプル文字の特徴データが、
特徴データ記憶部９に記憶されているかを示すものであ
る。The read sample type count 14 in the lower row of the display frame 21 indicates, for the character displayed above it, for how many different kinds of sample characters the characteristic data has been stored in the characteristic data storage unit 9 by the method described above.

【００９０】従って、上記のように「Ａ」「Ｂ」「Ｃ」
「Ｄ」「Ｅ」「３」「４」のサンプル文字について新た
に特徴データが記憶されると、表示枠２１のこれらの文
字の下段に表示された読取サンプル種数１４₁、１４₂・
・・に１が加えられ表示されることとなる。Therefore, when characteristic data is newly stored for the sample characters "A", "B", "C", "D", "E", "3", and "4" as described above, 1 is added to each of the read sample type counts 14₁, 14₂, ... displayed below these characters in the display frame 21.

【００９１】新たな辞書データ作成に必要なサンプル数
ｎを「１０」とすれば、表示枠２１の下段の読取サンプ
ル種数１４が全て「１０」以上となるように、不足して
いる文字について異なるプリンタで印字されたサンプル
文字を用意して、ＯＣＲ１０のイメージセンサ１から再
び読み取らせる。If the number of samples n required to create new dictionary data is "10", then for every character whose count is short, sample characters printed by different printers are prepared and read again from the image sensor 1 of the OCR 10, so that all the read sample type counts 14 in the lower row of the display frame 21 become "10" or more.

【００９２】このようにして、ステップＳ３からステッ
プＳ５の処理とホストコンピュータ６での上述の処理を
繰り返し、表示枠２１の下段の読取サンプル種数１４が
全て「１０」以上となったときに、操作者は、必要な特
徴データが用意されたものとして、データ入力部７を操
作して、ホストコンピュータ６へ新たな辞書データの作
成を指示する。In this way, when the processes from step S3 to step S5 and the above-described process in the host computer 6 are repeated and all the read sample types 14 in the lower part of the display frame 21 become "10" or more, The operator operates the data input unit 7 to instruct the host computer 6 to create new dictionary data, assuming that necessary feature data is prepared.
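
The check the operator performs by eye on the display frame 21 can be expressed as a short sketch. The names `TARGET_CHARS`, `REQUIRED_SAMPLES`, `sample_counts`, and `missing_characters` are illustrative assumptions, and the store is again modeled as a dictionary keyed by character code.

```python
import string
from typing import Dict, List

TARGET_CHARS = string.ascii_uppercase + string.digits  # "A".."Z", "0".."9": 36 characters
REQUIRED_SAMPLES = 10  # the number of samples n used as the example in the text


def sample_counts(store: Dict[str, List[List[float]]]) -> Dict[str, int]:
    """Read sample type count for every target character (lower row of frame 21)."""
    return {ch: len(store.get(ch, [])) for ch in TARGET_CHARS}


def missing_characters(
    store: Dict[str, List[List[float]]], required: int = REQUIRED_SAMPLES
) -> List[str]:
    """Characters that still need more sample readings before learning can start."""
    return [ch for ch, count in sample_counts(store).items() if count < required]
```

Dictionary creation would be requested only when `missing_characters` returns an empty list.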

【００９３】ホストコンピュータ６は、この指示を受け
て「Ａ」から順に文字毎に辞書データを作成する。すな
わち、新たな辞書データを作成しようとする文字の１文
字を基準文字とすれば、この基準文字を教育データとす
る１０以上の特徴データを特徴データ記憶部９から読み
出し、これらの特徴データから特徴データの平均値と特
徴データの分散量の平方根σからなる新たな辞書データ
を作成する。In response to this instruction, the host computer 6 creates dictionary data character by character, in order from "A". That is, taking one of the characters for which new dictionary data is to be created as the reference character, the host computer 6 reads from the characteristic data storage unit 9 the 10 or more items of characteristic data whose educational data is the reference character, and from them creates new dictionary data consisting of the average value of the characteristic data and the square root σ of the variance of the characteristic data.

【００９４】特徴データｘが、ｍ次元のベクトルで表さ
れるものとし、基準文字についての特徴データの数すな
わちサンプル文字数をｎ、ｉ番目のサンプル文字の特徴
データｘをｘ _iとすれば、特徴データの平均値ｆは、Let the characteristic data x be an m-dimensional vector, let n be the number of items of characteristic data for the reference character (that is, the number of sample characters), and let x_i be the characteristic data x of the i-th sample character. The average value f of the characteristic data is then given by Equation 1:

【００９５】[0095]

【数１】(Equation 1)

【００９６】で求めることができ、ｍ次元のベクトルで
表される。The average value f obtained from Equation 1 is itself represented by an m-dimensional vector.
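
The image for Equation 1 is not reproduced in this text. From the definitions of n and x_i given above, the average presumably has the usual form of a component-wise arithmetic mean. The following is a reconstruction under that assumption, not a copy of the original formula:

```latex
f = \frac{1}{n} \sum_{i=1}^{n} x_i
```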

【００９７】更に、ｉ番目の特徴データｘのｊ番目の成
分をｘ _ij、上記で求めたｍ次元のベクトルで表される平
均値ｆのｊ番目の成分をｆ _jとすれば、特徴データの分
散量の平方根σは、Further, let x_ij be the j-th component of the i-th characteristic data x, and let f_j be the j-th component of the average value f, the m-dimensional vector obtained above. The square root σ of the variance of the characteristic data is then given by Equation 2:

【００９８】[0098]

【数２】(Equation 2)

【００９９】で求めることができる。σ is obtained from Equation 2.
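
The image for Equation 2 is likewise missing. One plausible reading, assuming σ is a single scalar (it is later compared with the scalar distance k) obtained from the mean squared distance of the n samples from f, is shown below. The divisor n and the summation over all m components are assumptions; the original formula may use a different normalization.

```latex
\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( x_{ij} - f_j \right)^2}
```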

【０１００】１文字の基準文字について、平均値ｆと分
散量の平方根σからなる辞書データを作成すると、基準
文字を他の文字に移し、同様の計算を行って、３６文字
全ての辞書データを作成する。Once the dictionary data consisting of the average value f and the square root σ of the variance has been created for one reference character, the reference character is changed to another character and the same calculation is performed, until dictionary data has been created for all 36 characters.
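
The per-character calculation on the host computer can be sketched as follows. This is a minimal illustration, not the formula from the original drawings: it assumes σ is a scalar computed with divisor n over all m components, and the function names `make_dictionary_entry` and `build_dictionary` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def make_dictionary_entry(
    samples: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """Return (f, sigma) for one reference character.

    f is the component-wise mean of the n sample vectors; sigma is the
    square root of the mean squared distance of the samples from f
    (assumed normalization, see the lead-in).
    """
    n = len(samples)
    m = len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(m)]
    total_sq = sum((x[j] - mean[j]) ** 2 for x in samples for j in range(m))
    sigma = math.sqrt(total_sq / n)
    return mean, sigma


def build_dictionary(
    store: Dict[str, List[List[float]]],
) -> Dict[str, Tuple[List[float], float]]:
    """Create dictionary data for every character held in the feature store."""
    return {code: make_dictionary_entry(vectors) for code, vectors in store.items()}
```

Because all samples for a character are processed together, the result does not depend on the order in which the samples were read.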

【０１０１】ホストコンピュータ６は、全ての新たな辞
書データを作成すると、１番目の文字から順に文字毎
に、教育データである文字コードと、その後にその教育
データに対応する辞書データの平均値ｆと分散量の平方
根σをマイクロコンピュータ４へ出力し、マイクロコン
ピュータ４へこれらの辞書データを教育データと対応づ
けて、辞書データ記憶部５へ書き込むよう指示する（ス
テップＳ６）。When all the new dictionary data has been created, the host computer 6 outputs to the microcomputer 4, character by character in order from the first character, the character code serving as the educational data, followed by the average value f and the square root σ of the variance that make up the dictionary data corresponding to that educational data, and instructs the microcomputer 4 to write these dictionary data into the dictionary data storage unit 5 in association with the educational data (step S6).

【０１０２】マイクロコンピュータ４は、新たな辞書デ
ータを教育データに対応づけて辞書データ記憶部５へ記
憶させた後、辞書データを書き替えたことをホストコン
ピュータ６へ通知し（ステップＳ７）、操作者は表示部
８の表示によりこれを確認する。After storing the new dictionary data in the dictionary data storage unit 5 in association with the educational data, the microcomputer 4 notifies the host computer 6 that the dictionary data has been rewritten (step S7), and the operator confirms this on the display unit 8.

【０１０３】次に、このようにして作成した辞書データ
によって、被読取文字を文字認識する方法について、文
字認識モードにおける動作を示した図４の流れ図によっ
て説明する。Next, a method of recognizing a character to be read by using the dictionary data thus created will be described with reference to the flowchart of FIG. 4 showing the operation in the character recognition mode.

【０１０４】ＯＣＲ１０のマイクロコンピュータ４を文
字認識モードとするために操作者がデータ入力部７を操
作すると、ホストコンピュータ６から文字認識モードへ
の設定を要求する認識モード指定信号をＯＣＲ１０へ出
力する（ステップＳ１０）。When the operator operates the data input section 7 to set the microcomputer 4 of the OCR 10 to the character recognition mode, the host computer 6 outputs a recognition mode designating signal requesting the setting of the character recognition mode to the OCR 10 ( Step S10).

【０１０５】マイクロコンピュータ４は、この信号を受
けて自身を文字認識モードとした後、このモードに設定
したことを応答信号によってホストコンピュータ６へ通
知する（ステップＳ１１）。Upon receiving this signal, the microcomputer 4 sets itself to the character recognition mode, and then notifies the host computer 6 of the setting of this mode by a response signal (step S11).

【０１０６】操作者は、この通知を表示部８で確認し、
続いて文字認識しようとする被読取文字をイメージセン
サ１から読み取らせる。The operator confirms this notification on the display unit 8 and then has the image sensor 1 read the characters to be recognized.

【０１０７】被読取文字をイメージセンサ１から読み取
る毎に、マイクロコンピュータ４は、辞書学習モードと
同様に、文字列抽出、文字切り出し処理を実行し、被読
取文字１文字毎に読取データを切り出す。Every time when the character to be read is read from the image sensor 1, the microcomputer 4 performs the character string extraction and character cutout processing as in the dictionary learning mode, and cuts out the read data for each character to be read.

【０１０８】続いて、切り出された読取データについ
て、正規化、特徴抽出の各処理を実行し、被読取文字の
読み取りデータから特徴データｙを抽出する。Subsequently, the cut-out read data is subjected to normalization and feature extraction processing to extract feature data y from the read data of the read character.

【０１０９】被読取文字の文字認識は、この特徴データ
ｙと辞書データを比較して行う。Character recognition of the read character is performed by comparing this characteristic data y with the dictionary data.

【０１１０】始めに、辞書データを辞書データ記憶部５
から読み出し、特徴データｙと平均値ｆを比較する。こ
の比較は、特徴データｙと平均値ｆがいずれもｍ次元
（本実施の形態では３６次元）のベクトルで表されるの
で、両者のベクトル間の距離ｋを求めることにより行
う。First, the dictionary data is read from the dictionary data storage unit 5, and the characteristic data y is compared with the average value f. Because the characteristic data y and the average value f are both m-dimensional vectors (36-dimensional in this embodiment), the comparison is made by finding the distance k between the two vectors.

【０１１１】すなわち、この距離ｋをある特定文字の辞
書データとの不一致度とすれば、不一致度ｋは、特徴デ
ータｙと平均値ｆのｊ番目の成分をそれぞれｙ_j、ｆ_jと
して、That is, taking this distance k as the degree of mismatch with the dictionary data of a particular character, and letting y_j and f_j be the j-th components of the characteristic data y and the average value f, respectively, the degree of mismatch k is given by Equation 3:

【０１１２】[0112]

【数３】(Equation 3)

【０１１３】で求めることができる。k is obtained from Equation 3.
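
The image for Equation 3 is not reproduced here. Since k is described as the distance between the two m-dimensional vectors y and f, a reconstruction assuming the ordinary Euclidean distance would be:

```latex
k = \sqrt{\sum_{j=1}^{m} \left( y_j - f_j \right)^2}
```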

【０１１４】この不一致度ｋが、同じ特定文字の辞書デ
ータである分散量の平方根σ＊３以下であるとき、すな
わちｋ＜３σを満たすときに、被読取文字が当該辞書デ
ータに係る文字であると判定する。When this degree of mismatch k is no more than three times the square root σ of the variance in the dictionary data of the same particular character, that is, when k < 3σ is satisfied, the read character is determined to be the character to which that dictionary data belongs.

【０１１５】ここで分散量σ²は、辞書学習モードで説
明したｎ種類の特徴データのばらつきの尺度を示すもの
で、ばらつきが大きいと分散量σ²が大きくなり、ばら
つきが小さいと分散量σ²が小さくなる。統計学上、正
規分布に従う特徴データのばらつきは、特徴データの平
均値ｆ＋３σの中に、９９．８％の未知の特徴データが
ふくまれる。つまり、被読取文字の特徴データｙが、特
徴データの平均値ｆ±３σであれば、同じ文字のばらつ
きの範囲内、それ以外は異常値すなわち、該文字と異な
る特徴データと判別することができる。Here, the variance σ² is a measure of the variation among the n kinds of characteristic data described for the dictionary learning mode: the larger the variation, the larger the variance σ², and the smaller the variation, the smaller the variance σ². Statistically, for characteristic data that follows a normal distribution, 99.8% of unknown characteristic data falls within the average value f + 3σ of the characteristic data. In other words, if the characteristic data y of the read character lies within the average value f ± 3σ of the characteristic data, it can be judged to be within the range of variation of the same character; otherwise it can be judged to be an outlier, that is, characteristic data different from that character.

【０１１６】従って、図５のようにｍ次元空間における
平均値ｆに対し、平均値ｆとの距離ｋ₁が３σ以内であ
る被読取文字の特徴データｙ ₁は、被読取文字がその平
均値ｆの辞書データを有する文字であると判定すること
ができる。Therefore, as shown in FIG. 5, for the average value f in m-dimensional space, characteristic data y₁ of a read character whose distance k₁ from the average value f is within 3σ allows the read character to be determined to be the character having the dictionary data with that average value f.

【０１１７】一方、平均値ｆとの距離ｋ₂が３σ以上の
被読取文字の特徴データｙ ₂は、被読取文字がその平均
値ｆの辞書データを有する文字ではないと判定でき、引
き続き他の文字にかかる辞書データと同様の比較を行
う。On the other hand, for characteristic data y₂ of a read character whose distance k₂ from the average value f is 3σ or more, the read character can be determined not to be the character having the dictionary data with that average value f, and the same comparison is then made with the dictionary data of the other characters.

【０１１８】このように平均値ｆとの距離ｋと３σを比
較して判定することにより、定数と比較する場合に比
べ、サンプル文字のばらつき度合いを文字認識の判定条
件に取り入れることができ、「認識できない」というエ
ラーや、「誤認識」というエラーが多発することがな
い。Because the determination is made by comparing the distance k from the average value f with 3σ in this way, the degree of variation of the sample characters is incorporated into the character recognition criterion, unlike a comparison with a constant, and "unrecognizable" errors and "misrecognition" errors do not occur frequently.

【０１１９】また、１文字の辞書データについて、文字
認識条件を２度判定する必要もなくなり、短時間で文字
認識ができるようになる。Further, it is not necessary to determine the character recognition condition twice for the dictionary data of one character, and the character recognition can be performed in a short time.

【０１２０】マイクロコンピュータ４は、被読取文字を
特定文字と認識すると、その特定文字の文字コードを認
識データとしてホストコンピュータ６に出力する（ステ
ップＳ１２）。また、いずれの文字にかかる辞書データ
とも、文字認識条件を満たさない場合にも、「認識でき
ない」ことを示す「＊」の文字コードを認識データとし
てホストコンピュータ６へ出力する。Upon recognizing the read character as a specific character, the microcomputer 4 outputs the character code of the specific character to the host computer 6 as recognition data (step S12). Further, even when the character recognition condition is not satisfied for any of the dictionary data relating to any character, the character code of “*” indicating “unrecognizable” is output to the host computer 6 as recognition data.
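
The recognition decision described above can be sketched as a loop over the dictionary entries. This is an illustration under stated assumptions: the degree of mismatch k is taken to be the Euclidean distance, the dictionary is modeled as a mapping from character code to (f, σ), and the names `mismatch`, `recognize`, and `UNRECOGNIZED` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

UNRECOGNIZED = "*"  # character code output when no entry satisfies the condition


def mismatch(y: Sequence[float], f: Sequence[float]) -> float:
    """Degree of mismatch k: distance between feature vector y and mean f."""
    return math.sqrt(sum((yj - fj) ** 2 for yj, fj in zip(y, f)))


def recognize(
    y: Sequence[float], dictionary: Dict[str, Tuple[List[float], float]]
) -> str:
    """Return the first character whose entry satisfies k < 3 * sigma."""
    for code, (f, sigma) in dictionary.items():
        if mismatch(y, f) < 3.0 * sigma:
            return code
    return UNRECOGNIZED
```

Each entry needs only one comparison, and the threshold scales with the spread of that character's own samples rather than being a fixed constant.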

【０１２１】被読取文字１文字について文字認識が終了
すると、マイクロコンピュータ４は、抽出した文字列か
ら、更に次の被読取文字の読取データを切り出して上記
と同様の処理を実行し、抽出した文字列に含まれる全て
の被読取文字の読取データについて処理を繰り返す。When character recognition is completed for one read character, the microcomputer 4 cuts out the read data of the next read character from the extracted character string, executes the same processing as described above, and repeats the processing for the read data of all the read characters contained in the extracted character string.

【０１２２】全ての被読取文字について認識データをホ
ストコンピュータ６へ出力すると、マイクロコンピュー
タ４は、文字認識が終了したことをホストコンピュータ
６へ通知し（ステップＳ１３）、文字認識モードは終了
する。When the recognition data for all the characters to be read is output to the host computer 6, the microcomputer 4 notifies the host computer 6 that the character recognition is completed (step S13), and the character recognition mode is completed.

【０１２３】以上の実施の形態では、ホストコンピュー
タ６とＯＣＲ１０を１対１で接続した例で説明したが、
これに限るものではなく、例えば、一台のホストコンピ
ュータ６に対し、２台以上のＯＣＲ１０を接続して、そ
れぞれの０ＣＲ１０の辞書データ作成に、共通のホスト
コンピュータ６を用いてもよい。In the above embodiment, an example in which the host computer 6 and the OCR 10 are connected one-to-one has been described.
However, the present invention is not limited to this. For example, two or more OCRs 10 may be connected to one host computer 6, and the common host computer 6 may be used to create the dictionary data for each OCR 10.

【０１２４】このように構成すれば、辞書データ作成の
為の計算を高速処理が可能なホストコンピュータ６で行
うことができ、各ＯＣＲ１０へ大容量のメモリーや高速
動作するマイクロコンピュータ４を備える必要がなくな
り、ＯＣＲ１０の小型化、低価格化が可能となる。With this configuration, the calculation for creating the dictionary data can be performed by the host computer 6 capable of high-speed processing, and it is necessary to provide each OCR 10 with a large-capacity memory and a high-speed operating microcomputer 4. Therefore, the OCR 10 can be reduced in size and price.

【０１２５】また、上記いずれの実施の形態において
も、ホストコンピュータ６とＯＣＲ１０は、必ずしも常
時接続させておく必要はなく、ＯＣＲ１０に新たな辞書
データを作成する際にのみ、接続させてもよい。Further, in any of the above-mentioned embodiments, the host computer 6 and the OCR 10 do not necessarily have to be always connected, and may be connected only when new dictionary data is created in the OCR 10.

【０１２６】更に、ホストコンピュータ６は、ＯＣＲ１
０で文字認識した認識データを入力してこれを処理する
ホストコンピュータを利用することが好ましいが、必ず
しもこれに限らず文字認識の認識データを処理するホス
トコンピュータと、新たな辞書データを作成するホスト
コンピュータが異なるものであってもよい。Further, it is preferable to use, as the host computer 6, the host computer that receives and processes the recognition data recognized by the OCR 10; however, this is not a requirement, and the host computer that processes the recognition data and the host computer that creates new dictionary data may be different machines.

【０１２７】[0127]

【発明の効果】以上説明したように本発明によれば、Ｏ
ＣＲ１０は、各サンプル文字の読取データから抽出した
特徴データをホストコンピュータ６へ出力するだけであ
り、新たな辞書データの作成は、ホストコンピュータ６
が行うので、ＯＣＲ１０に大容量のメモリー容量や高速
で動作する高価なマイクロコンピュータを備える必要が
ない。従って、ＯＣＲ１０の小型化と低価格化が可能と
なる。As described above, according to the present invention, the OCR 10 only outputs the characteristic data extracted from the read data of each sample character to the host computer 6, and the new dictionary data is created by the host computer 6. The OCR 10 therefore does not need a large memory capacity or an expensive microcomputer that operates at high speed, so the OCR 10 can be made smaller and less expensive.

【０１２８】また、ホストコンピュータ６で作成された
新たな辞書データは、２以上のサンプルデータの特徴デ
ータをまとめて処理して作成したものなので、サンプル
文字の読取順序の影響は受けない。また、サンプル文字
毎に辞書データを更新する必要がないので、その都度辞
書データ作成処理の間、待機する必要がなく、複数のサ
ンプル文字を連続して読み取らせ、一連の作業で辞書デ
ータを作成できる。Because the new dictionary data created by the host computer 6 is produced by processing the characteristic data of two or more samples together, it is not affected by the order in which the sample characters are read. In addition, because the dictionary data does not have to be updated for each sample character, there is no need to wait for dictionary data creation after each reading; a plurality of sample characters can be read in succession and the dictionary data created in one series of operations.

【０１２９】請求項２の発明によれば、基準文字の辞書
データが、２以上の異なるサンプル文字の特徴データの
平均値と、特徴データの分散量の平方根σで表されるの
で、サンプル文字の特徴データとのベクトル間の距離
（不一致度ｋ）を、この分散量の平方根σをもとにした
境界値とで比較することによって、サンプル文字の特徴
データのばらつきが異なっても、文字認識エラーが発生
しない。According to the invention of claim 2, the dictionary data of the reference character is represented by the average value of the characteristic data of two or more different sample characters and the square root σ of the variance of the characteristic data. By comparing the distance between the vectors (the degree of mismatch k) with a boundary value based on this square root σ of the variance, character recognition errors do not occur even if the variation of the characteristic data of the sample characters differs.

【０１３０】また、１文字の辞書データについて、文字
認識条件を２度判定する必要もなくなり、短時間で文字
認識ができるようになる。Further, it is not necessary to determine the character recognition condition twice for the dictionary data of one character, and the character recognition can be performed in a short time.

【０１３１】請求項３の発明によれば、ｍ次元のベクト
ルｘで表される２以上のサンプル文字の特徴データから
その平均値とばらつきの度合いをホストコンピュータで
の処理に適した計算式で求めることができる。According to the invention of claim 3, the average value and the degree of variation can be obtained, from the characteristic data of two or more sample characters each represented by an m-dimensional vector x, using formulas suited to processing on the host computer.

【０１３２】[0132]

[Brief description of drawings]

【図１】本発明に係るハンディ型光学式文字認識装置１
０とホストコンピュータ６の構成を示すブロック図であ
る。FIG. 1 is a block diagram showing the configurations of the handy type optical character recognition device 10 according to the present invention and the host computer 6.

【図２】辞書学習モードでの、ホストコンピュータ６と
ＯＣＲ１０との動作を示す流れ図である。FIG. 2 is a flowchart showing operations of the host computer 6 and the OCR 10 in the dictionary learning mode.

【図３】辞書学習モードで、表示部８に表示された操作
画面を示す図である。FIG. 3 is a diagram showing an operation screen displayed on a display unit 8 in a dictionary learning mode.

【図４】文字認識モードでの、ホストコンピュータ６と
ＯＣＲ１０との動作を示す流れ図である。FIG. 4 is a flowchart showing operations of the host computer 6 and the OCR 10 in the character recognition mode.

【図５】ｍ次元のベクトルで表される平均値ｆと被読取
文字の特徴データｙ ₁、ｙ ₂を説明上、２次元平面で表し
た説明図である。FIG. 5 is an explanatory diagram in which a mean value f represented by an m-dimensional vector and characteristic data y ₁ and y ₂ of a read character are represented in a two-dimensional plane for explanation.

【図６】イメージデータ記憶部に記憶された読取データ
の記憶状態を示すもので、（ａ）は、縦横で６分割して
３６個のマスクを形成したイメージデータ記憶部を
（ｂ）は、イメージデータ記憶部の右下のマスクを拡大
した説明図である。FIG. 6 shows how read data is stored in the image data storage unit: (a) is an explanatory diagram of the image data storage unit divided into six parts vertically and horizontally to form 36 masks, and (b) is an enlarged explanatory diagram of the lower right mask of the image data storage unit.

【図７】従来のハンディ型光学式文字認識装置１００と
ホストコンピュータ１０６の構成を示すブロック図であ
る。FIG. 7 is a block diagram showing configurations of a conventional handy type optical character recognition device 100 and a host computer 106.

【図８】従来のＯＣＲ１００とホストコンピュータ１０
６との動作を示す流れ図である。FIG. 8 is a flowchart showing the operations of the conventional OCR 100 and the host computer 106.

[Explanation of symbols]

１イメージセンサ３記憶部４マイクロコンピュータ５辞書データ記憶部６ホストコンピュータ１０ハンディ型光学式文字認識装置（ＯＣＲ） 1: image sensor; 3: storage unit; 4: microcomputer; 5: dictionary data storage unit; 6: host computer; 10: handy type optical character recognition device (OCR)

Claims

[Claims]

1. A dictionary data learning method for a handy type optical character recognition device, using: a handy type optical character recognition device (10) having an image sensor (1) for optically reading a character to be read, a storage unit (3) for storing read data of the character to be read output from the image sensor (1), a dictionary data storage unit (5) for storing dictionary data, and a microcomputer (4) that operates in two modes, namely a character recognition mode in which the read data and the dictionary data are compared to recognize the character to be read, and a dictionary learning mode in which dictionary data created based on the read data is stored in the dictionary data storage unit (5); and a host computer (6) that, in the dictionary learning mode, outputs a character code corresponding to the read data of the character to be read, as educational data, to the microcomputer (4) of the handy type optical character recognition device (10); in which a sample character is read by the image sensor (1) as a character to be read for dictionary creation, feature data is extracted from the read data of the sample character by the microcomputer (4) operating in the dictionary learning mode, and dictionary data created based on the feature data is stored in the dictionary data storage unit (5) in association with the educational data, whereby the dictionary data is rewritten; characterized in that the handy type optical character recognition device (10) reads from the image sensor (1), for one reference character, sample characters printed by at least two different printing means and outputs the feature data extracted from the read data of each sample character to the host computer (6), and the host computer (6) creates dictionary data of the reference character from the feature data of the two or more different sample characters and outputs it to the microcomputer (4) together with the educational data corresponding to the reference character.

2. The dictionary data learning method for a handy type optical character recognition device according to claim 1, wherein the dictionary data of the reference character is composed of an average value of the feature data of the two or more different sample characters and a square root σ of the variance of the feature data.

3. The dictionary data learning method for a handy type optical character recognition device according to claim 1 or 2, wherein the feature data of a sample character is an m-dimensional vector x in which the read data of the sample character is divided into m masks and the value of the read data indicating the sample character in each mask is assigned to each of the m components, and, where n is the number of sample characters, the dictionary data comprises f, obtained by [Equation 1] and representing the average of the feature data of the sample characters, and σ representing the square root of the variance of the feature data.