JPS6072088A

JPS6072088A - Method for selecting dictionary set of character form corresponding to character form used in original

Info

Publication number: JPS6072088A
Application number: JP58179386A
Authority: JP
Inventors: Hajime Sato; 元佐藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-09-29
Filing date: 1983-09-29
Publication date: 1985-04-24

Abstract

PURPOSE: To improve character recognition accuracy by selecting, from among a plurality of dictionary sets each prepared for a different font, the dictionary set of the font corresponding to the font used in the original. CONSTITUTION: In the dictionary search unit DRD, for each character read from the original, a distance calculation between the feature parameters supplied by the feature extraction unit CAD and the dictionary is carried out for each of dictionary sets 1-3. When the difference between the dictionary and the feature parameters (the magnitude of the distance) falls outside the allowable range, the character is regarded as an unreadable character. The number of unreadable characters is counted by counters 6-8, one provided for each dictionary set. If only one dictionary set has the smallest number of unreadable characters, that dictionary set is selected. If two or more dictionary sets share the smallest number of unreadable characters, the one among them with the smallest total distance is selected.

Description

[Detailed Description of the Invention] (Technical Field) The present invention relates to a technique for identifying the font in use when an optical character reader or similar device recognizes characters (letters and numerals) printed in many different fonts.

(Prior Art) Conventional methods for identifying the font of characters in optical character readers and similar devices include, for example, the method disclosed in Japanese Patent Publication No. 5G 44455, in which a reading operation is performed with a plurality of dictionary sets and the output of the dictionary set that produced the fewest unreadable characters is adopted, and the method in which the font is identified from the distances between the extracted feature parameters and the dictionaries of the different fonts. With the former method, misrecognized characters are difficult to detect, and the font is difficult to identify when the parameters of the same character type overlap between different fonts. With the latter method, a problem arises when the parameters of the same character type in different fonts clearly do not overlap (for example, the lowercase alphabetic character g and ?).

(Purpose) The object of the present invention is to eliminate these problems of the conventional methods by providing a method of selecting the dictionary set of the font corresponding to the font of a manuscript, so that the recognition rate for characters in many different fonts can be improved.

(Constitution) With reference to the accompanying drawings, the method of the present invention for selecting the dictionary set of the font corresponding to the font of a manuscript is described in detail below. The method selects, from among a plurality of dictionary sets each prepared for a different font, the dictionary of the font corresponding to the font used in the manuscript, as follows. For each individual dictionary set, the distance between the feature parameters and the dictionary is calculated for each of a predetermined number of characters in the manuscript. Characters whose distance exceeds a predetermined value are treated as unreadable characters and counted separately for each dictionary set, and the total of the calculated distances is also computed for each dictionary set. If only one dictionary set has the fewest unreadable characters, that dictionary set is selected. If two or more dictionary sets have the fewest unreadable characters, the one among them with the smallest total distance is selected.

FIG. 1 is a block diagram showing the schematic configuration of an example of an optical reading device to which the method of the present invention for selecting the dictionary set of the font corresponding to the font of a manuscript is applied. In FIG. 1, PEC is the optical character reading unit (photoelectric conversion unit), FDD is the preprocessing unit, CAD is the feature extraction unit, and DRD is the dictionary search unit. FIG. 2 is a block diagram of part of an embodiment of the dictionary search unit DRD to which the method is applied, and FIG. 3 is a flowchart.

In FIG. 2, 1 to 3 are dictionary sets for different fonts, 4 is the search control unit, 5 is the distance calculation unit, 6 to 8 are distance adders, and 9 to 11 are counters for the number of unreadable characters. In FIG. 2, distance adder 6 and counter 9 handle the data for dictionary set 1, distance adder 7 and counter 10 handle the data for dictionary set 2, and distance adder 8 and counter 11 handle the data for dictionary set 3. The example in FIG. 2 shows the case of three fonts, but the number of fonts is arbitrary. When there are M fonts, M distance adders and M counters are provided, so that each of the M dictionary sets has one distance adder and one counter associated with it.

The characters of the manuscript are converted into an image signal by the optical character reading unit (photoelectric conversion unit) PEC. The image signal undergoes predetermined signal processing in the preprocessing unit FDD, the feature extraction unit CAD, and so on, and the feature parameters of the manuscript characters are supplied from the feature extraction unit CAD to the dictionary search unit DRD.

For each of a predetermined number N of characters read from the manuscript, the dictionary search unit DRD calculates the distance between the feature parameters supplied by the feature extraction unit CAD and the dictionary, separately for each of dictionary sets 1 to 3.

The distance calculation described above can be performed using, for example, a function such as the following equation (1) for the distance R:

$$R_{ji} = \sum_{k=1} W \, \lvert d_{kj} - P_{ki} \rvert \qquad (1)$$

In equation (1), $R_{ji}$ is the distance between the dictionary vector $d_j$ and the feature vector $P_i$, $W$ is the weighting function, $d_{kj}$ is the k-th component (parameter) of the dictionary vector $d_j$, and $P_{ki}$ is the k-th component (parameter) of the feature vector $P_i$.
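As a minimal sketch of equation (1), the distance can be computed as a weighted sum of absolute component differences. The function name `weighted_l1_distance` and the use of a single scalar weight `w` are assumptions made for illustration; the source does not say whether the weighting function varies per component.

```python
from typing import Sequence


def weighted_l1_distance(d_j: Sequence[float], p_i: Sequence[float], w: float = 1.0) -> float:
    """Distance R_ji between dictionary vector d_j and feature vector P_i.

    Follows the form of equation (1): the sum over components k of
    W * |d_kj - P_ki|. A scalar weight is an assumption of this sketch.
    """
    if len(d_j) != len(p_i):
        raise ValueError("dictionary and feature vectors must have the same length")
    return sum(w * abs(d_k - p_k) for d_k, p_k in zip(d_j, p_i))


# |1-2| + |4-2| + |2-2| = 3.0
print(weighted_l1_distance([1.0, 4.0, 2.0], [2.0, 2.0, 2.0]))  # 3.0
```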

In calculating the distance, the allowable range of the difference between the dictionary and the feature parameters (the magnitude of the distance) is set in advance for each parameter by the weighting function W. When the difference falls outside this allowable range, the character is treated as an unreadable character and excluded from the distance calculation, and the number of unreadable characters is counted by the counter provided individually for each dictionary set.

The calculation results for the predetermined number N of characters, obtained for each of the dictionary sets prepared for the respective fonts, are added up by the distance adder provided individually for each dictionary set.
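The per-set bookkeeping can be sketched as follows: one counter and one distance adder per dictionary set. This is an illustration only. It assumes a single scalar `tolerance` applied to each character's best distance, whereas the text describes an allowable range set per parameter through the weighting function; `tally_dictionary_set` is a hypothetical name.

```python
from typing import Iterable, Tuple


def tally_dictionary_set(best_distances: Iterable[float], tolerance: float) -> Tuple[int, float]:
    """Return (unreadable_count, total_distance) for one dictionary set.

    best_distances holds, for each of the N sample characters, the distance
    to the closest entry of this dictionary set. Characters beyond the
    tolerance are counted as unreadable and left out of the distance total.
    """
    unreadable_count = 0   # plays the role of the per-set counter
    total_distance = 0.0   # plays the role of the per-set distance adder
    for r in best_distances:
        if r > tolerance:
            unreadable_count += 1
        else:
            total_distance += r
    return unreadable_count, total_distance


print(tally_dictionary_set([0.5, 3.0, 1.0], tolerance=2.0))  # (1, 1.5)
```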

With the distance Rji between the dictionary vectors and the feature vector calculated in this way, the feature vector Pi is recognized as the character code of the dictionary entry with the smallest distance among the Rji.
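The recognition step, picking the dictionary entry with the smallest distance, can be sketched as below. The dictionary layout (a mapping from character code to reference vector) and the name `recognize` are assumptions for illustration, and the distance uses a scalar weight as a simplification.

```python
from typing import Dict, Sequence, Tuple


def recognize(p_i: Sequence[float],
              dictionary: Dict[str, Sequence[float]],
              w: float = 1.0) -> Tuple[str, float]:
    """Return (character_code, distance) of the closest dictionary entry."""
    if not dictionary:
        raise ValueError("dictionary must contain at least one entry")

    def distance(d_j: Sequence[float]) -> float:
        # Weighted sum of absolute component differences, as in equation (1).
        return sum(w * abs(d_k - p_k) for d_k, p_k in zip(d_j, p_i))

    # The first candidate is the entry whose vector is nearest to p_i.
    code = min(dictionary, key=lambda c: distance(dictionary[c]))
    return code, distance(dictionary[code])


font_dictionary = {"A": [0.0, 0.0], "B": [5.0, 5.0]}
print(recognize([1.0, 0.0], font_dictionary))  # ('A', 1.0)
```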

That is, suppose that, as in the example of FIG. 2, three dictionary sets have been prepared for three fonts, and that a predetermined number N of characters in the manuscript (for example, the N characters starting from the first character of the manuscript, or the characters of one line of the manuscript) are used to identify the font. The distance calculation is then performed between each of these N characters and each of the three dictionary sets.

Then, for each dictionary set, the number nu of unreadable characters is counted by counters 6 to 8, and the first-candidate distance of each character is added by distance adders 9 to 11. When the distance addition has been completed for all of the predetermined number N of characters, the count values of counters 6 to 8 are compared. If only one counter holds the smallest count, the dictionary set corresponding to that counter is selected.

If the comparison shows that two or more counters hold the smallest count, the dictionary set corresponding to the distance adder showing the smallest total distance among those sets is selected.
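The selection rule can be sketched in a few lines: take the set with the fewest unreadable characters, and break ties by the smallest total distance. The function name `select_dictionary_set` and the 0-based indexing of dictionary sets are assumptions of this sketch, not part of the source.

```python
from typing import Sequence


def select_dictionary_set(unreadable_counts: Sequence[int],
                          total_distances: Sequence[float]) -> int:
    """Return the 0-based index of the dictionary set to use.

    unreadable_counts[m] and total_distances[m] are the counter value and
    the distance-adder value for dictionary set m after N characters.
    """
    if not unreadable_counts or len(unreadable_counts) != len(total_distances):
        raise ValueError("need one count and one total per dictionary set")

    fewest = min(unreadable_counts)
    candidates = [m for m, n_u in enumerate(unreadable_counts) if n_u == fewest]

    if len(candidates) == 1:
        # Exactly one set has the fewest unreadable characters.
        return candidates[0]
    # Tie: choose the tied set whose total distance is smallest.
    return min(candidates, key=lambda m: total_distances[m])


print(select_dictionary_set([2, 0, 1], [9.0, 50.0, 3.0]))   # 1
print(select_dictionary_set([1, 1, 4], [12.5, 7.0, 0.5]))   # 1
```

In the second call, sets 0 and 1 tie on the unreadable count, so the total distance decides. Set 2 has the smallest total but is not considered because it has more unreadable characters.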

FIG. 3 is a flowchart showing the dictionary set selection procedure described above.

(Effect) As is clear from the detailed description above, the method of the present invention selects, from among a plurality of dictionary sets each prepared for a different font, the dictionary of the font corresponding to the font used in the manuscript. For each individual dictionary set, the distance between the feature parameters and the dictionary is calculated for each of a predetermined number of characters in the manuscript. Characters whose distance exceeds a predetermined value are treated as unreadable characters and counted for each dictionary set, and the total of the calculated distances is computed for each dictionary set. If only one dictionary set has the fewest unreadable characters, that dictionary set is selected; if two or more dictionary sets have the fewest unreadable characters, the one among them with the smallest total distance is selected. Compared with the conventional methods described earlier, the accuracy of font recognition can therefore be greatly improved.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the schematic configuration of an example of an optical reading device to which the method of the present invention for selecting the dictionary set of the font corresponding to the font of a manuscript is applied. FIG. 2 is a block diagram of part of an embodiment of the dictionary search unit to which the method is applied. FIG. 3 is a flowchart.

PEC: optical character reading unit (photoelectric conversion unit); FDD: preprocessing unit; CAD: feature extraction unit; DRD: dictionary search unit; 1 to 3: font dictionary sets; 4: search control unit; 5: distance calculation unit; 6 to 8: distance adders; 9 to 11: counters for the number of unreadable characters.

Claims

[Claims]

A method of selecting, from among a plurality of dictionary sets each prepared for a different font, the dictionary of the font corresponding to the font used in a manuscript, wherein: for each individual dictionary set of the plurality of dictionary sets, a distance calculation between the feature parameters and the dictionary is performed for each of a predetermined number of characters in the manuscript; characters for which said distance is greater than a predetermined value are treated as unreadable characters and their number is counted for each dictionary set, and the total of the distances calculated for each dictionary set is computed; when only one dictionary set has the fewest unreadable characters, that dictionary set is selected; and when two or more dictionary sets have the fewest unreadable characters, the dictionary set with the smallest total distance among them is selected.