JPS6162986A - Recognition order determining system - Google Patents

Recognition order determining system

Info

Publication number
JPS6162986A
Authority
JP
Japan
Prior art keywords
character type
character
recognition
categories
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP59186011A
Other languages
Japanese (ja)
Inventor
Hiroshi Matsumura
松村 博
Tatsunosuke Iwahara
岩原 達之助
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tokyo Sanyo Electric Co Ltd
Sanyo Electric Co Ltd
Original Assignee
Tokyo Sanyo Electric Co Ltd
Sanyo Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tokyo Sanyo Electric Co Ltd, Sanyo Electric Co Ltd
Priority to JP59186011A
Publication of JPS6162986A

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To improve the recognition rate of the titled system by assigning each character type category a priority level corresponding to a learning stage and determining the recognition order of candidate character type categories on the basis of the calculated degrees of resemblance and these priority levels. CONSTITUTION: A binary character pattern from a character observing part 1 is input to a pattern matching part 4 through a feature extracting part 2, and the degrees of resemblance between the extracted feature pattern and standard feature patterns are calculated. The character type categories in a dictionary part 3 are divided in accordance with the learning stages of school education, and a priority level is determined for each category set. Operating parts 4a-4d in the pattern matching part 4 add the priority level of the category set to each character type code and degree of resemblance and store them in a candidate memory 5. A knowledge part 6 determines the recognition order of the candidate character type categories on the basis of the calculated degrees of resemblance and the priority levels, and the character type codes are stored in a result memory 8 in recognition order.

Description

DETAILED DESCRIPTION OF THE INVENTION (a) Field of Industrial Application The present invention relates to a character recognition system for recognizing handwritten kanji, and more particularly to a method for determining the recognition order of candidate character type categories.

(b) Prior Art In general, a character recognition system calculates the degree of similarity between a feature pattern extracted from an input character pattern and the standard feature pattern of each character type category registered in advance in a dictionary section, and selects the n candidate character type categories with the highest similarity. The candidate character type category with the highest similarity is output as the recognition result, and, so that misrecognitions can be corrected, the n selected candidates are assigned recognition ranks from first to n-th in descending order of similarity.
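As a rough illustration of this conventional scheme, the sketch below ranks candidates by similarity alone. The function name `rank_by_similarity`, the score dictionary, and the sample values are hypothetical and are not taken from the patent.

```python
def rank_by_similarity(similarities, n):
    """Return the n character type codes with the highest similarity, best first.

    `similarities` maps a character type code to a similarity score
    (hypothetical values; a larger score means a closer match).
    """
    ordered = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ordered[:n]]


# Hypothetical scores for four character type categories.
scores = {"A": 0.91, "B": 0.95, "C": 0.40, "D": 0.93}

# The first entry would be output as the recognition result; the rest are
# kept as fallbacks for correcting a misrecognition.
top3 = rank_by_similarity(scores, 3)
```

Because only the score decides the order here, two candidates with nearly equal scores are separated by noise alone, which is the weakness the following paragraphs discuss.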

However, determining the recognition order from similarity alone, as described above, results in many misrecognitions. Methods have therefore been devised that apply some form of post-processing after multiple candidate character type categories have been selected by similarity, and only then determine the recognition order.

Post-processing proposed so far includes grammatical processing, as disclosed in Japanese Patent Application Laid-Open No. 59-32082, and determining whether the characters before and after the character being recognized are kanji, katakana, or hiragana, as in Japanese Patent Application Laid-Open No. 59-27381.

(c) Problems to Be Solved by the Invention In the prior art that performs grammatical processing as post-processing, the knowledge section, such as a grammar dictionary, becomes enormous, and the processing itself becomes very complex. In the method that determines whether the neighboring characters are kanji, katakana, and so on, no improvement in the recognition rate can be expected when the selected candidate character type categories consist only of kanji or only of hiragana.

Furthermore, difficult character type categories have many features, so they are easy to distinguish from other categories and are misrecognized relatively rarely. Simple character type categories have few features, so many candidate categories with close similarity values are selected, and a difficult character type category often ends up at the top of the recognition order, causing a misrecognition.

(d) Means for Solving the Problems The present invention assigns each character type category a priority according to the learning stage of school education, and determines the recognition order of the candidate character type categories based on the calculated similarity and this priority.

(e) Operation According to the present invention, candidate character type categories that have a high similarity and a high priority according to the learning stage of school education are placed higher in the recognition order, so that character type categories that are simple and well known to the general public are given priority.

In addition, the knowledge section only needs to store the relationship between similarity and priority.

(f) Embodiment FIG. 1 is a block diagram of a character recognition system to which the present invention is applied. (1) is a character observation section that reads the characters written on an input document and outputs the result as a binary character pattern; (2) is a feature extraction section that extracts a feature pattern from the input character pattern; (3) is a dictionary section that stores the standard feature pattern of each character type category; and (4) is a pattern matching section that matches the extracted feature pattern against the standard feature patterns and calculates the similarity between them.

The character type categories in the dictionary section (3) are divided according to the learning stage of school education, and a priority is defined for each category set. That is, as shown in FIG. 2, all character type categories are divided into three category sets: characters learned in grades 1 to 3 of elementary school form category set 1 (3a), characters learned in grades 4 to 6 of elementary school form category set 2 (3b), and characters learned in junior high school or later form category set 3 (3c). Priorities 1 to 3 are assigned to category sets 1 to 3, respectively.

The pattern matching section (4) has three operation units (4a) to (4c), one for each of category sets 1 to 3. Each operation unit selects n candidate character type categories from its category set in descending order of similarity and stores their character type codes and calculated similarities in the candidate memory (5). In doing so, the operation unit appends the priority of its category set to the character type code and similarity, and these three items are stored in the candidate memory (5) as the information for each candidate character type category. In this way, n candidates from each category set, 3n candidate character type categories in total, are stored in the candidate memory (5).
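The per-set selection described here might look as follows. This is a minimal sketch under stated assumptions: similarity is represented by a precomputed distance (smaller means more similar), and the function name `fill_candidate_memory`, the input layout, and the sample values are hypothetical.

```python
def fill_candidate_memory(distances_by_set, n):
    """Pick the n closest categories from each category set.

    `distances_by_set` maps a priority (1, 2, 3) to a dictionary of
    {character type code: distance} for that category set.
    Returns a list of (code, distance, priority) triples, n per set.
    """
    memory = []
    for priority in sorted(distances_by_set):
        per_set = distances_by_set[priority]
        best = sorted(per_set.items(), key=lambda item: item[1])[:n]
        # The priority of the category set is attached to every candidate.
        memory.extend((code, distance, priority) for code, distance in best)
    return memory


# Hypothetical distances for three category sets of three categories each.
distances = {
    1: {"a": 12, "b": 30, "c": 25},
    2: {"p": 10, "q": 20, "r": 40},
    3: {"v": 5, "w": 50, "x": 60},
}
memory = fill_candidate_memory(distances, 2)
```

With n = 2 the list holds six triples, matching the 3n candidates held in the candidate memory.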

The knowledge section (6) stores the relationship between similarity and priority. The clustering control processing section (7) refers to the contents of the knowledge section (6), determines the recognition order of the top n of the 3n candidate character type categories stored in the candidate memory (5), and stores their character type codes in the result memory (8) in recognition order. For example, if the city block distance Di is used as the similarity measure, with a smaller distance meaning a higher similarity, then the knowledge section (6) specifically stores, as shown in FIG. 3, the relationship between the distance Di and whether the recognition order may be rearranged according to priority.
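For reference, the city block distance between two feature patterns is the sum of the absolute differences of their components. The sketch below assumes that feature patterns are equal-length sequences of numbers, which the patent does not specify; the function name is likewise only illustrative.

```python
def city_block_distance(feature, standard):
    """Sum of absolute component-wise differences between two feature patterns."""
    if len(feature) != len(standard):
        raise ValueError("feature patterns must have the same length")
    return sum(abs(x - y) for x, y in zip(feature, standard))


# Hypothetical three-component feature patterns.
d = city_block_distance([1, 0, 3], [0, 0, 1])
```

A distance of zero means the two patterns are identical, so a smaller value corresponds to a higher similarity.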

A concrete example of how the recognition order is determined is described below.

First, a candidate character type category with character type code Mi, city block distance Di, and priority P (i = a, b, c, ...; P = 1, 2, 3) is written as (Mi, Di, P). Suppose, for example, that the five candidates with the highest similarity are selected from each of category sets 1 to 3, as shown in FIG. 4(a), and that their city block distances Di satisfy Dv < Dp < Da < Dq < Db < Dr < Dw.

If, as in the prior art, the order is determined from the magnitude of the city block distance alone, the recognition order is as shown in FIG. 4(b).

Now suppose, however, that the three thresholds held in the knowledge section (6) and the calculated city block distances are related as follows: the first threshold is smaller than Dv, the second threshold lies between Da and Dq, and the third threshold is larger than Dw. The clustering control processing section (7) then assigns character type codes Mv, Mp, and Ma to rank B and character type codes Mq, Mb, Mr, and Mw to rank C, and rearranges the recognition order by priority within each rank. As shown in FIG. 4(c), the character type codes with higher priority therefore come first within each rank, and the recognition order is thus determined from both similarity and priority. The top five character type codes, beginning with Ma, Mp, Mv, and Mb, are stored in that order in the result memory (8), and the answer output control section (9) sends the first-ranked character type code to a character display device such as a word processor or personal computer as the recognition result, where that character is displayed. If the result is a misrecognition, the character type code of the next rank is sent, and so on in order until the correct recognition result is obtained.
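The ranking step in this example can be sketched as a two-level sort: candidates are first bucketed into ranks by distance thresholds, and within a rank the higher-priority candidate comes first. Everything concrete below is hypothetical: the function name `determine_order`, the distances, the priorities, and the threshold values were chosen only to be consistent with the ordering in the example. The half-open rank boundaries, and the choice to keep rank A in pure distance order, are also assumptions.

```python
import bisect


def determine_order(candidates, thresholds, top_n):
    """Order (code, distance, priority) candidates by rank, then priority.

    `thresholds` is an ascending list of distances separating ranks A, B, C, ...
    Rank A (distance below the first threshold) keeps pure distance order;
    in the other ranks a smaller priority number comes first.
    """
    def sort_key(candidate):
        _, distance, priority = candidate
        rank = bisect.bisect_right(thresholds, distance)  # 0 means rank A
        priority_key = 0 if rank == 0 else priority
        return (rank, priority_key, distance)

    ordered = sorted(candidates, key=sort_key)
    return [code for code, _, _ in ordered[:top_n]]


# Hypothetical distances and priorities for the seven closest candidates.
candidates = [
    ("v", 10, 3), ("p", 12, 2), ("a", 14, 1),               # fall into rank B
    ("q", 20, 2), ("b", 22, 1), ("r", 24, 2), ("w", 26, 3),  # fall into rank C
]
order = determine_order(candidates, thresholds=[8, 18, 30], top_n=5)
```

With these values the distance-only order v, p, a, q, b becomes a, p, v, b, q: priority reshuffles candidates inside a rank but never moves a candidate across a rank boundary.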

Accordingly, candidate character type categories that have a high similarity and belong to an early learning stage of school education are given priority, characters that are simple and well known to the general public are ranked higher, and the recognition rate improves.

As another example, suppose the city block distance Dv is sufficiently small compared with the other candidates that it falls below the first threshold, while Dp is above the first threshold and Db is below the second threshold. Character type code Mv is then placed in rank A and its recognition rank is not changed, while the rank B character type codes Mp, Ma, Mq, and Mb are reordered by priority, giving the recognition order shown in FIG. 4(d). Compared with FIG. 4(b), candidate character type categories from earlier learning stages again come higher in the recognition order.

Incidentally, in this embodiment the character type categories are divided in advance into category sets according to priority, and the priority information is obtained from that division; however, the priority information may instead be stored in advance in the dictionary section (3) together with the standard feature patterns.

Priorities can easily be assigned according to learning stage by referring to the Ministry of Education's elementary school and junior high school teaching guides.

(g) Effects of the Invention According to the present invention, character type categories that are simple and well known to the general public are given priority, so the recognition rate improves, no enormous knowledge section is required, and the recognition order can be determined in a short processing time.

In particular, because the priorities are based on the learning stages of school education, the method can be given generality, which makes it very effective when a character recognition device is used as an input means for a word processor or personal computer.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a character recognition system to which the present invention is applied, FIG. 2 is an explanatory diagram showing the contents of the category sets, FIG. 3 is an explanatory diagram showing the contents of the knowledge section, and FIGS. 4(a) to 4(d) are explanatory diagrams showing a concrete example of recognition order determination. Explanation of main reference numerals: (1) character observation section, (2) feature extraction section, (3) dictionary section, (4) pattern matching section, (5) candidate memory, (6) knowledge section, (7) clustering control processing section, (8) result memory, (9) answer output control section.

Claims (1)

[Claims] (1) A recognition order determining method for a character recognition system that calculates the degree of similarity between a feature pattern extracted from an input character pattern and the standard feature pattern of each character type category registered in advance in a dictionary section and selects a plurality of candidate character type categories, characterized in that a priority according to the learning stage of school education is defined for each of the character type categories, and the recognition order of the plurality of candidate character type categories is determined based on the calculated similarity and the priority.
JP59186011A 1984-09-05 1984-09-05 Recognition order determining system Pending JPS6162986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59186011A JPS6162986A (en) 1984-09-05 1984-09-05 Recognition order determining system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59186011A JPS6162986A (en) 1984-09-05 1984-09-05 Recognition order determining system

Publications (1)

Publication Number Publication Date
JPS6162986A true JPS6162986A (en) 1986-03-31

Family

ID=16180813

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59186011A Pending JPS6162986A (en) 1984-09-05 1984-09-05 Recognition order determining system

Country Status (1)

Country Link
JP (1) JPS6162986A (en)

Similar Documents

Publication Publication Date Title
JPH10232866A (en) Method and device for processing data
JPH0981730A (en) Method and device for pattern recognition and computer controller
Vankadaru et al. Text Identification from Handwritten Data using Bi-LSTM and CNN with FastAI
JP3375819B2 (en) Recognition method combining method and apparatus for performing the method
KR102569381B1 (en) System and Method for Machine Reading Comprehension to Table-centered Web Documents
JPS6162986A (en) Recognition order determining system
Can et al. Automatic categorization of ottoman poems
JPS592191A (en) Recognizing and processing system of handwritten japanese sentence
JPS6162985A (en) Recognition order determining system
JP3952964B2 (en) Reading information determination method, apparatus and program
Le Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by Random Lines Erasure and Curriculum Learning
JPS62251986A (en) Misread character correction processor
JPS6168679A (en) Decision system of recognition order
JPS5842904B2 (en) Handwritten kana/kanji character recognition device
KR100255640B1 (en) Character recognizing method
US11935425B2 (en) Electronic device, pronunciation learning method, server apparatus, pronunciation learning processing system, and storage medium
JPH0338630B2 (en)
JP3763262B2 (en) Handwritten character recognition device
JP2538543B2 (en) Character information recognition device
JPH04115383A (en) Character recognizing system for on-line handwritten character recognizing device
JPH0338631B2 (en)
JPH0340434B2 (en)
JPS60134992A (en) Input device of character
CN117634474A (en) Language identification method and related equipment applied to medium-day text
JPS6252912B2 (en)