JPS6072088A - Method for selecting dictionary set of character form corresponding to character form used in original - Google Patents

Method for selecting dictionary set of character form corresponding to character form used in original

Info

Publication number
JPS6072088A
JPS6072088A JP58179386A JP17938683A JPS6072088A JP S6072088 A JPS6072088 A JP S6072088A JP 58179386 A JP58179386 A JP 58179386A JP 17938683 A JP17938683 A JP 17938683A JP S6072088 A JPS6072088 A JP S6072088A
Authority
JP
Japan
Prior art keywords
dictionary
characters
character
distance
dictionary set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58179386A
Other languages
Japanese (ja)
Inventor
Hajime Sato
元 佐藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP58179386A
Publication of JPS6072088A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To improve the accuracy of font recognition by selecting, from among a plurality of dictionary sets each prepared for characters of a different font, the dictionary of the font that corresponds to the font used in the original. CONSTITUTION: In a dictionary search section DRD, for each character read from the original, the distance between the feature parameters supplied by a feature extraction section CAD and the dictionary set is calculated for each of dictionary sets 1 to 3. When the difference between the dictionary and the feature parameters (the magnitude of the distance) is outside the allowable range, the character is regarded as an unreadable character. The number of unreadable characters is counted by counters 6 to 8, one of which is provided for each dictionary set. When only one dictionary set has the smallest number of unreadable characters, that dictionary set is selected. When two or more dictionary sets have the smallest number of unreadable characters, the one with the smallest total distance among them is selected.

Description

[Detailed Description of the Invention] (Technical Field) The present invention relates to a technique used in connection with identifying the font in use when characters (letters and numerals) of many different fonts are recognized in an optical character reader or similar device.

(Prior Art) Conventional methods for identifying the fonts of many kinds of characters in optical character readers and similar devices include, for example, the method disclosed in Japanese Patent Publication No. 5G 44455, in which a reading operation is performed with a plurality of dictionary sets and the output of the dictionary set that produced the fewest unreadable characters is adopted, and a method in which the font is identified by computing the distance between the extracted feature parameters and the dictionaries of the different fonts. With the former method, misrecognized characters are difficult to detect, and the font becomes difficult to identify when the parameters of the same character in different fonts overlap. With the latter method, a problem arises when the parameters of the same character in different fonts clearly do not overlap (for example, lowercase g of the alphabet and ?).

(Purpose) The object of the present invention is to eliminate the problems of the conventional methods described above by providing a method for selecting the dictionary set of the font corresponding to the font of the original, so that the recognition rate for characters of many different fonts can be improved.

(Constitution) A detailed description is given below, with reference to the accompanying drawings, of the method of the present invention for selecting the dictionary set of the font corresponding to the font of the original. The method selects, from among a plurality of dictionary sets each prepared for characters of a different font, the dictionary of the font corresponding to the font used in the original. For each of the dictionary sets, a distance calculation between the feature parameters and the dictionary is performed for each of a predetermined number of characters in the original. Characters whose distance is greater than a predetermined value are treated as unreadable characters, and their number is counted for each dictionary set. The total of the calculated distances is also computed for each dictionary set. When only one dictionary set has the smallest number of unreadable characters, that dictionary set is selected. When two or more dictionary sets have the smallest number of unreadable characters, the one with the smallest total distance among them is selected.

FIG. 1 is a block diagram showing the schematic configuration of an example of an optical reader to which the method of the present invention for selecting the dictionary set of the font corresponding to the font of the original is applied. In FIG. 1, PEC is an optical character reading section (photoelectric conversion section), FDD is a preprocessing section, CAD is a feature extraction section, and DRD is a dictionary search section. FIG. 2 is a block diagram of part of an embodiment of the dictionary search section DRD to which the method is applied, and FIG. 3 is a flowchart.

In FIG. 2, 1 to 3 are dictionary sets for different fonts, 4 is a search control section, 5 is a distance calculation section, 6 to 8 are distance adders, and 9 to 11 are counters for the number of unreadable characters. In FIG. 2, distance adder 6 and counter 9 handle the data for dictionary set 1, distance adder 7 and counter 10 handle the data for dictionary set 2, and distance adder 8 and counter 11 handle the data for dictionary set 3. (The example in FIG. 2 shows the case of three fonts, but the number of fonts is arbitrary. When there are M fonts, M distance adders and M counters are provided, so that each of the M dictionary sets has one distance adder and one counter associated with it.)

The characters of the original are converted into an image signal by the optical character reading section (photoelectric conversion section) PEC. The image signal undergoes predetermined signal processing in the preprocessing section FDD, the feature extraction section CAD, and so on, and the feature parameters of the characters of the original are supplied from the feature extraction section CAD to the dictionary search section DRD.

In the dictionary search section DRD, for each of a predetermined number N of characters read from the original, the distance between the feature parameters supplied by the feature extraction section CAD and the dictionary set is calculated for each of dictionary sets 1 to 3.

The distance calculation described above can be performed, for example, with a distance function such as equation (1):

$$R_{ji} = \sum_{k=1} w \,\lvert d_{kj} - P_{ki} \rvert \qquad (1)$$

In equation (1), Rji is the distance between the dictionary vector dj and the feature vector Pi, w is the weighting function, dkj is the k-th component (parameter) of the dictionary vector dj, and Pki is the k-th component (parameter) of the feature vector Pi.
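Equation (1) can be illustrated with a minimal Python sketch. It assumes that the weighting function is supplied as one weight per component, which the text does not state explicitly; the name `weighted_distance` and its parameters are hypothetical and chosen only for this illustration.

```python
from typing import Sequence


def weighted_distance(
    dictionary_vec: Sequence[float],
    feature_vec: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted sum of absolute component differences, in the spirit of equation (1).

    Assumption: one weight per component; the source only speaks of a
    "weighting function" w.
    """
    if not (len(dictionary_vec) == len(feature_vec) == len(weights)):
        raise ValueError("all three vectors must have the same length")
    return sum(
        w * abs(d - p)
        for w, d, p in zip(weights, dictionary_vec, feature_vec)
    )


# Differences are 0, 2 and 1; weights are 1.0, 0.5 and 2.0.
example = weighted_distance([1.0, 2.0, 3.0], [1.0, 4.0, 2.0], [1.0, 0.5, 2.0])
```

With uniform weights this reduces to the ordinary city-block (L1) distance between the two vectors.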

In calculating the distance, the allowable range of the difference between the dictionary and the feature parameters (the magnitude of the distance) is predetermined for each parameter by the weighting function w described above. When the difference between the dictionary and the feature parameters (the magnitude of the distance) falls outside that allowable range, the character is treated as an unreadable character and excluded from the distance calculation, and the number of such unreadable characters is counted by the counter provided individually for each dictionary set.
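The counting step for one dictionary set can be sketched as follows. This is only an illustration under stated assumptions: it follows the wording of the claim, in which a character is unreadable when its distance exceeds a single predetermined value, and it takes as input the smallest distance already found for each character. The function name `tally_dictionary_set` and the parameter `threshold` are hypothetical.

```python
from typing import Iterable, Tuple


def tally_dictionary_set(
    best_distances: Iterable[float], threshold: float
) -> Tuple[int, float]:
    """Return (unreadable_count, total_distance) for one dictionary set.

    Assumption: a character is unreadable when its best distance is greater
    than `threshold`; such characters are counted and left out of the total.
    """
    unreadable_count = 0
    total_distance = 0.0
    for distance in best_distances:
        if distance > threshold:
            unreadable_count += 1  # plays the role of the per-set counter
        else:
            total_distance += distance  # plays the role of the per-set adder
    return unreadable_count, total_distance


# Four characters; two of them exceed the threshold of 2.0.
result = tally_dictionary_set([0.5, 3.0, 1.0, 2.5], threshold=2.0)
```

Running the same tally once per dictionary set yields the pair of values that the selection step compares.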

For the predetermined number N of characters, the calculation results for each of the dictionary sets prepared for the respective fonts are accumulated by the distance adder provided individually for each dictionary set.

The distance between the dictionary vectors and the feature vector is calculated in this way, and the feature vector Pi is recognized as the character code of the dictionary entry that gives the smallest distance among the Rji.
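The minimum-distance recognition just described might look like the following sketch. It assumes a dictionary set can be modelled as a mapping from character code to dictionary vector and that the distance is a weighted sum of absolute differences with one weight per component; `recognize` and its parameters are hypothetical names used only here.

```python
from typing import Dict, Optional, Sequence, Tuple


def recognize(
    feature_vec: Sequence[float],
    dictionary_set: Dict[str, Sequence[float]],
    weights: Sequence[float],
) -> Tuple[Optional[str], float]:
    """Return the character code with the smallest distance, and that distance.

    Assumption: `dictionary_set` maps character code -> dictionary vector.
    """
    best_code: Optional[str] = None
    best_distance = float("inf")
    for code, dictionary_vec in dictionary_set.items():
        distance = sum(
            w * abs(d - p)
            for w, d, p in zip(weights, dictionary_vec, feature_vec)
        )
        if distance < best_distance:
            best_code, best_distance = code, distance
    return best_code, best_distance


# A toy dictionary set with two entries and two feature parameters.
toy_set = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
code, distance = recognize([0.9, 1.2], toy_set, [1.0, 1.0])
```

The returned distance is the first-candidate distance for the character, which is the quantity that is later accumulated per dictionary set.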

That is, suppose that, as in the example of FIG. 2, there are three dictionary sets prepared for three fonts, and that a predetermined number N of characters in the original (for example, the N characters starting from the first character of the original, or the characters of one line of the original) are used to identify the font. The distance calculation is then performed between those N characters and each of the three dictionary sets.

For each dictionary set, the number nu of unreadable characters is counted in counters 6 to 8, and the distance of the first candidate for each character is added in distance adders 9 to 11. When the distance addition has been completed for all of the predetermined number N of characters, the count values of counters 6 to 8 are compared. If only one counter holds the smallest count, the dictionary set corresponding to that counter is selected.

If the comparison of the count values of counters 6 to 8 shows that two or more counters hold the smallest count, the dictionary set corresponding to the distance adder that shows the smallest total distance value LRt is selected.

FIG. 3 is a flowchart showing the dictionary-set selection procedure described above.
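The selection rule can be sketched in Python as below. This is an illustrative reading of the text, not a transcription of the flowchart; the function name `select_dictionary_set` and the list-based inputs are assumptions. Each list holds one value per dictionary set, in the same order.

```python
from typing import Sequence


def select_dictionary_set(
    unreadable_counts: Sequence[int], distance_totals: Sequence[float]
) -> int:
    """Return the index of the selected dictionary set.

    Step 1: keep the sets with the fewest unreadable characters.
    Step 2: if more than one remains, pick the smallest total distance.
    """
    if not unreadable_counts or len(unreadable_counts) != len(distance_totals):
        raise ValueError("need one count and one total per dictionary set")
    fewest = min(unreadable_counts)
    candidates = [
        index for index, count in enumerate(unreadable_counts) if count == fewest
    ]
    if len(candidates) == 1:
        return candidates[0]
    return min(candidates, key=lambda index: distance_totals[index])


# Unique minimum count: the second set (index 1) wins regardless of totals.
unique_case = select_dictionary_set([2, 0, 1], [5.0, 9.0, 3.0])
# Tie between the first two sets: the smaller total distance decides.
tie_case = select_dictionary_set([1, 1, 3], [4.0, 2.5, 0.1])
```

Note that the total distance is consulted only among the tied sets, so a set with a very small total but more unreadable characters (index 2 in the second example) is never chosen.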

(Effect) As is clear from the detailed description above, the method of the present invention selects, from among a plurality of dictionary sets each prepared for characters of a different font, the dictionary of the font corresponding to the font used in the original. For each of the dictionary sets, a distance calculation between the feature parameters and the dictionary is performed for each of a predetermined number of characters in the original. Characters whose distance is greater than a predetermined value are treated as unreadable characters and counted for each dictionary set, and the total of the calculated distances is computed for each dictionary set. When only one dictionary set has the smallest number of unreadable characters, that dictionary set is selected; when two or more dictionary sets have the smallest number of unreadable characters, the one with the smallest total distance among them is selected. Compared with the conventional methods described above, the method can therefore greatly improve the accuracy of font recognition.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the schematic configuration of an example of an optical reader to which the method of the present invention for selecting the dictionary set of the font corresponding to the font of the original is applied. FIG. 2 is a block diagram of part of an embodiment of the dictionary search section to which the method is applied. FIG. 3 is a flowchart.

PEC: optical character reading section (photoelectric conversion section); FDD: preprocessing section; CAD: feature extraction section; DRD: dictionary search section; 1 to 3: font dictionary sets; 4: search control section; 5: distance calculation section; 6 to 8: distance adders; 9 to 11: counters for the number of unreadable characters.

Claims (1)

[Claims] A method for selecting the dictionary set of the font corresponding to the font of an original, being a method for selecting, from among a plurality of dictionary sets each prepared for characters of a different font, the dictionary of the font corresponding to the font used in the original, wherein: for each of the plurality of dictionary sets prepared for the respective fonts, a distance calculation between the feature parameters and the dictionary is performed for each of a predetermined number of characters in the original; characters whose distance is greater than a predetermined value are treated as unreadable characters and their number is counted for each dictionary set, and the total of the calculated distances is computed for each dictionary set; when only one dictionary set has the smallest number of unreadable characters, that dictionary set is selected; and when two or more dictionary sets have the smallest number of unreadable characters, the one with the smallest total distance among them is selected.
JP58179386A 1983-09-29 1983-09-29 Method for selecting dictionary set of character form corresponding to character form used in original Pending JPS6072088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58179386A JPS6072088A (en) 1983-09-29 1983-09-29 Method for selecting dictionary set of character form corresponding to character form used in original

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58179386A JPS6072088A (en) 1983-09-29 1983-09-29 Method for selecting dictionary set of character form corresponding to character form used in original

Publications (1)

Publication Number Publication Date
JPS6072088A true JPS6072088A (en) 1985-04-24

Family

ID=16064948

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58179386A Pending JPS6072088A (en) 1983-09-29 1983-09-29 Method for selecting dictionary set of character form corresponding to character form used in original

Country Status (1)

Country Link
JP (1) JPS6072088A (en)

Similar Documents

Publication Publication Date Title
CN106598959B (en) Method and system for determining mutual translation relationship of bilingual sentence pairs
CN110741376B (en) Automatic document analysis for different natural languages
EP3716099A1 (en) Document classification device
CN111694946A (en) Text keyword visual display method and device and computer equipment
CN106528508A (en) Repeated text judgment method and apparatus
CN111666761A (en) Fine-grained emotion analysis model training method and device
CN111382248A (en) Question reply method and device, storage medium and terminal equipment
CN108363691A (en) A kind of field term identifying system and method for 95598 work order of electric power
CN114861635A (en) Chinese spelling error correction method, device, equipment and storage medium
CN112989829B (en) Named entity recognition method, device, equipment and storage medium
CN112182337B (en) Method for identifying similar news from massive short news and related equipment
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN104281433B (en) For calculating the model computing unit and controller of the function model based on data
JPS6072088A (en) Method for selecting dictionary set of character form corresponding to character form used in original
JPH06282587A (en) Automatic classifying method and device for document and dictionary preparing method and device for classification
CN110909546A (en) Text data processing method, device, equipment and medium
JP3194080B2 (en) Fraction processing device
CN116341543B (en) Method, system, equipment and storage medium for identifying and correcting personal names
CN114253960B (en) Electronic invoice reimbursement and posting and filing integrated service center platform system
CN114610873A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN114357005A (en) Method and device for generating scientific information, terminal and storage medium
CN115220582A (en) Text input method, equipment, device and storage medium for supporting backspace input method
CN116662567A (en) Comprehensive energy knowledge graph completion method
KR900007727B1 (en) Character recognition apparatus
JPH09204490A (en) Method for selecting character string input means