JPS5884372A

JPS5884372A - Recognizing method for on-line character written by specific writer

Info

Publication number: JPS5884372A
Application number: JP56181029A
Authority: JP
Inventors: Shuzo Owaku; 大和久　修三; Akio Nagano; 長野　昭夫; Katsuhide Tanoshima; 田野島　克秀; 「あ」木　正義; Masayoshi Yurugi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1981-11-13
Filing date: 1981-11-13
Publication date: 1983-05-20

Abstract

PURPOSE: To raise the recognition rate regardless of the writer's handwriting habits, and to reduce the storage capacity required for the dictionary, by recognizing subsets of subdivided characters and identifying each kanji (Chinese character) as a collection of those subsets. CONSTITUTION: When stroke information fed from a tablet 1 is recognized in a recognizing part 2, a pseudo-radical dictionary 3 containing the features of pseudo-radicals and their pseudo-radical codes is used. Each recognition result is stored in an input register 4 in the form of a pseudo-radical code. A selecting circuit 5 selects a character from the contents of a character dictionary 6 and the pseudo-radical codes output by the register 4. Once a writer has finished registering entries in a personal dictionary 8, the contents of the dictionary 3 are corrected only for the pseudo-radicals registered in the dictionary 8. For all other pseudo-radicals, the information of a standard dictionary 7 has been transferred to the dictionary 3 beforehand. The characters written by that writer can therefore all be recognized.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a recognition method for online handwritten character recognition devices used as input devices for information processing equipment.

Conventional word processors with handwritten character input have used online handwritten character recognition technology. When numerals, the alphabet, hiragana, and kanji are all to be recognized, however, the 10 numerals, 26 letters, and 46 hiragana, together with symbols such as the dakuten (voiced sound mark) and handakuten (semi-voiced sound mark), come to about 200 characters, and the JIS C6226 level-1 kanji set alone contains 2,965 kanji. Even if the kanji are restricted to the jōyō kanji there are 1,945 of them, so the total exceeds 2,000 characters.

Handwriting-input Japanese word processors that restrict kanji to the tōyō kanji and recognize a total of just over 2,000 characters have been announced. Their recognition processing, however, attempts to recognize kanji directly, for example by the K-L expansion method, and therefore requires an excessive amount of hardware. More generally, not only in that example, attempting to recognize handwritten kanji directly with the same algorithm as numerals, letters, and hiragana makes the computation required for recognition enormous and the processing slow, and shortening the processing time requires still more hardware. From the standpoint of practical use this is a serious drawback.

Furthermore, conventional handwritten kanji recognition has not been easy to use: even devices that tolerate a slightly incorrect stroke order still force the writer to write strictly in block style. This restriction was imposed in order to eliminate each writer's individual handwriting habits as far as possible and was considered necessary for online handwritten character recognition, but it is a burden on the writer, and the recognition rate drops whenever the writer's habits show through, which is a serious drawback.

As described above, conventional online handwritten character recognition methods have the serious drawbacks that recognizing kanji is costly and that recognizing idiosyncratic handwriting is almost impossible.

To eliminate these drawbacks, the present invention subdivides kanji, recognizes the resulting subsets of each character, and identifies a kanji as a collection of the recognized subsets. This effectively reduces the number of kanji patterns to be recognized and also reduces the storage capacity required for the dictionary. Because the dictionary of characters to be recognized can be registered as sequences of these subsets, the growth in storage capacity as the number of recognizable characters increases is kept small. In addition, as dictionaries for recognizing the subsets, the invention provides both a standard dictionary and a personal dictionary whose features are extracted from each individual writer's handwriting, so that characters, including kanji, can be identified at a high recognition rate regardless of whether the writer's handwriting is idiosyncratic.

FIG. 1 is a block diagram showing an embodiment of the present invention. Reference numeral 1 denotes a tablet; 2, a recognition unit; 3, a pseudo-radical dictionary that stores the features and pseudo-radical codes of the character subsets obtained by subdividing kanji and of characters other than kanji (together referred to below as pseudo-radicals); 4, an input register that stores the pseudo-radical codes recognized by the recognition unit 2; 5, a selection circuit that selects a character from the one or more pseudo-radical codes stored in the input register 4; 6, a character dictionary that stores pseudo-radical codes and character codes; 7, a standard pseudo-radical dictionary that stores the standard features of the pseudo-radicals and their pseudo-radical codes; and 8, a personal pseudo-radical dictionary that stores, together with the pseudo-radical codes, the writer-specific features of pseudo-radicals extracted from each individual writer's handwriting.

Stroke information input from the tablet 1 is sent to the recognition unit 2. The recognition unit 2 recognizes pseudo-radicals by a well-known method suited to online handwriting, such as stroke analysis. In doing so it uses the pseudo-radical dictionary 3 (hereinafter, dictionary 3), which stores the features of the pseudo-radicals and their pseudo-radical codes. Each time a pseudo-radical is recognized, the recognition unit 2 outputs its pseudo-radical code to the input register 4, where it is stored. The stored pseudo-radical codes are then output sequentially from the input register 4 to the selection circuit 5. The selection circuit 5 selects a character on the basis of the contents of the character dictionary 6 and the pseudo-radical codes output from the input register 4, and outputs the resulting character code.
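
The data flow of FIG. 1 can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the circuit of the embodiment: the function names are hypothetical, a printed glyph stands in for real stroke feature data, and the character itself stands in for its character code. Only the pseudo-radical codes 176 and 0FB are taken from the description.

```python
UNDEFINED = "#"  # stands for "no pseudo-radical recognized"

def recognition_unit(stroke_groups, dictionary_3):
    """Stand-in for recognition unit 2: one pseudo-radical code per
    stroke group, looked up in pseudo-radical dictionary 3."""
    return [dictionary_3.get(group, UNDEFINED) for group in stroke_groups]

def selection_circuit(codes, dictionary_6):
    """Stand-in for selection circuit 5: match the code sequence read
    from the input register against character dictionary 6."""
    return dictionary_6.get(tuple(codes))

# Toy dictionary 3: a printed glyph stands in for real feature data.
dictionary_3 = {"立": "176", "日": "0FB"}
# Toy character dictionary 6: the character itself stands in for its code.
dictionary_6 = {("176", "0FB"): "音"}

input_register = recognition_unit(["立", "日"], dictionary_3)  # input register 4
selected = selection_circuit(input_register, dictionary_6)
print(input_register, selected)
```

Keeping the two lookups separate mirrors the split between dictionary 3 (shape to code) and character dictionary 6 (code sequence to character).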

FIG. 2 shows an example of dictionary 3. The hexadecimal numbers starting from 000 are pseudo-radical codes, and each pseudo-radical is shown to the right of its code. In an actual dictionary, the position of each pseudo-radical holds feature data based on the recognition algorithm of the recognition unit 2; the pseudo-radicals themselves are shown here only for ease of explanation. Note that in dictionary 3, non-kanji characters such as hiragana, numerals, and letters are not subdivided but are stored whole.

The standard pseudo-radical dictionary 7 (hereinafter, standard dictionary 7) and the personal pseudo-radical dictionary 8 (hereinafter, personal dictionary 8) in FIG. 1 are described next.

In general, character recognition, whether online or not, has been designed to recognize the handwriting of unspecified writers. When online handwritten character recognition is applied to a Japanese word processor to form a handwriting-input word processor, however, the word processor is used by a limited group of people, and it is not always necessary to design for an unspecified large number of users.

On this basis, one feature of the present invention is the proposal that the handwriting of the individual user be registered in advance.

In the standard dictionary 7, hiragana, numerals, letters, and the like are stored whole as pseudo-radicals without subdivision (about 200 entries), while the 2,965 kanji are subdivided into a little over 400 pseudo-radicals. The features stored in the standard dictionary 7 for each pseudo-radical are feature data, based on the recognition algorithm of the recognition unit 2, derived from samples of that pseudo-radical collected from several hundred people. When the contents of the standard dictionary 7 are transferred to dictionary 3, almost any conventionally written character can be recognized.

The personal dictionary 8 is provided so that, when a writer writes with the standard dictionary 7 loaded into dictionary 3 and a character cannot be recognized because of the writer's habits, that writer's handwriting can be registered. The recognition unit 2 therefore needs an analysis function for input characters in addition to its recognition function. Although it is called analysis, it consists of computing feature values and is functionally no different from recognition. The only difference is that in recognition the features of the input character are compared with the features stored in the dictionary, whereas in analysis no comparison is made and the features of the input character are stored in the personal dictionary 8. In this way, only those characters for which the writer's handwriting yields a recognition result contrary to the writer's intention are registered in the personal dictionary 8.
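
The distinction between recognition and analysis can be illustrated with a short Python sketch. The feature used here (one 8-direction code per stroke, computed from its start and end points) is purely an assumption for illustration; the description does not specify the features beyond referring to stroke analysis. The function names and the sample coordinates are likewise hypothetical; code 004 is the code the description gives for the horizontal stroke.

```python
import math

def extract_features(strokes):
    """Hypothetical feature: one 8-direction code (0-7) per stroke,
    computed from the stroke's start and end points."""
    feats = []
    for (x0, y0), (x1, y1) in strokes:
        angle = math.atan2(y1 - y0, x1 - x0)
        feats.append(round(angle / (math.pi / 4)) % 8)
    return tuple(feats)

def recognize(strokes, dictionary):
    """Recognition: compute features, then compare with stored features."""
    feats = extract_features(strokes)
    for code, stored in dictionary.items():
        if stored == feats:
            return code
    return None

def register(strokes, code, personal_dictionary):
    """Analysis: compute the same features, but store them in the
    personal dictionary instead of comparing them."""
    personal_dictionary[code] = extract_features(strokes)

standard = {"004": (0,)}          # a level horizontal stroke
personal = {}
slanted = [((0, 0), (10, 8))]     # this writer's rising horizontal stroke

print(recognize(slanted, standard))   # not recognized with the standard entry
register(slanted, "004", personal)    # so the writer registers it
print(recognize(slanted, personal))
```

Both paths share `extract_features`, which reflects the statement that analysis is functionally the same computation as recognition.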

Once the writer has finished registering entries in the personal dictionary 8 in this way, the contents of dictionary 3 need only be corrected for the pseudo-radicals registered in the personal dictionary 8. For all other pseudo-radicals the information of the standard dictionary 7 has already been transferred to dictionary 3, so all characters input by that writer can be recognized.
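
How dictionary 3 is built from the two source dictionaries can be sketched as follows. This is a minimal illustration assuming each dictionary maps a pseudo-radical code to its feature data; the function name and the feature tuples are hypothetical placeholders.

```python
def load_dictionary_3(standard_7, personal_8):
    """Build working dictionary 3: transfer standard dictionary 7, then
    overwrite only the pseudo-radicals registered in personal dictionary 8."""
    working = dict(standard_7)   # transfer of the standard dictionary
    working.update(personal_8)   # correction for registered entries only
    return working

# Placeholder feature tuples; real entries would hold recognizer features.
standard_7 = {"004": (0,), "050": (1, 0), "014": (1, 3)}
personal_8 = {"004": (1,)}       # only the entry this writer registered

dictionary_3 = load_dictionary_3(standard_7, personal_8)
print(dictionary_3)
```

Copying before updating leaves the standard dictionary untouched, so the same standard data can be reused for another writer.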

Given these roles, dictionary 3 may be implemented in random access memory, and the standard dictionary 7 and the personal dictionary 8 in a file device such as a floppy disk.

Since the relationship among the standard dictionary 7, the personal dictionary 8, and dictionary 3 has now been described in detail, the following description assumes that, before use, the contents of the standard dictionary 7 have been transferred to dictionary 3 and that the contents of dictionary 3 have been corrected only for the pseudo-radicals registered in the personal dictionary 8.

FIG. 3 shows the input register 4 in detail. Reference numeral 9 denotes the output from the recognition unit 2; 10 to 17, the I0 to I7 registers within the input register 4; 18, a switching circuit; and 19, the output of the input register 4.

FIG. 4 shows part of the character dictionary 6. The third row of FIG. 4 indicates that the pseudo-radical "立", denoted by pseudo-radical code 176, and the pseudo-radical "日", denoted by pseudo-radical code 0FB, together form the character "音", and that the JIS C6226 character code of "音" is 323B. The characters shown in parentheses in FIG. 4 are included only for ease of explanation; the actual dictionary consists of pseudo-radical codes and character codes.

FIG. 5 illustrates the processing according to the present invention when the character "彰" is input, showing the pseudo-radical codes entered into the I0 register 10 through the I7 register 17 of the input register 4.

The online handwritten character recognition method of the present invention is described in detail below with reference to FIG. 5, taking the character "彰" as an example. First, at timing T1, the operator enters the dot stroke "丶" on the tablet 1. The stroke is passed to the recognition unit 2, which uses dictionary 3 to test whether it is registered as a pseudo-radical. Because "丶" is not registered in dictionary 3, the undefined code # is stored in the I0 register 10. Next, at timing T2, "一" is entered from the tablet 1. The recognition unit 2 tests whether the pseudo-radical "亠", formed by combining this stroke with the stroke left undefined at T1, exists in dictionary 3. As shown in FIG. 2, "亠" is registered with pseudo-radical code 050, so code 050 is set in the I0 register 10.

The stroke "丶" entered at timing T3 is not registered in dictionary 3 as a pseudo-radical, so the I0 register 10 is left unchanged and the undefined code # is stored in the I1 register 11. The stroke "ノ" entered at timing T4 is not registered in dictionary 3 either, but combined with the stroke left undefined at T3 it forms the pseudo-radical "丷", which is registered in dictionary 3 with pseudo-radical code 014. The undefined code # in the I1 register 11 is therefore erased and code 014 is set in its place. Dictionary 3 is also used to test whether pseudo-radical codes 050 and 014 together form a new pseudo-radical, but that combined shape does not exist in dictionary 3 as an independent pseudo-radical, so the contents of the I0 register 10 and the I1 register 11 are retained unchanged.

When "一" is entered at timing T5, dictionary 3 shows that it has pseudo-radical code 004, so 004 is set in the I2 register 12. The recognition unit then tests whether the combination of this stroke with the preceding pseudo-radical, and the whole shape "立", are registered in dictionary 3. In other words, the test is carried out so that all strokes in the character are represented by the smallest possible number of pseudo-radical codes. In this case the partial combination is not registered as a pseudo-radical, whereas "立" is registered in the dictionary as pseudo-radical code 176. The I0 register 10, I1 register 11, and I2 register 12 are therefore reset, and 176 is stored in the I0 register 10. The character "立" is thus represented as the single pseudo-radical denoted by pseudo-radical code 176 in FIG. 2.

In the same way, as shown in FIG. 5, the character "彰" is ultimately recognized as the character represented by pseudo-radical codes 176, 0FB, 045, and 065. Just as the input consisting of three pseudo-radical codes was re-tested at timing T5 and identified as the single pseudo-radical code 176, at timing T9 the character "日", with pseudo-radical code 0FB, is identified from the shape with pseudo-radical code 021 and the two undefined "一" strokes, and at timings T11, T13, and T14 a single pseudo-radical code is likewise identified in each case from input that had been identified as two pseudo-radical codes.

In this way the pseudo-radical codes are tested by a kind of longest-match method. With a longest-match method it is usual to look for a match only after all of the input has been entered; that is, the pseudo-radicals would be determined over the complete set of input strokes. Here, however, the test is performed in input order, in consideration of the fact that the number of pseudo-radicals registered in dictionary 3 is only a little over 600 even including numerals, letters, hiragana, and symbols, which is not a large number, and that people enter characters slowly.
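
The stroke-by-stroke longest-match test walked through for "立" can be sketched in Python as below. This is a simplified illustration, not the embodiment's circuit: stroke shapes are represented by printed labels rather than feature data, the function name `feed_stroke` is hypothetical, and the toy dictionary holds only the four entries whose codes (050, 014, 004, 176) appear in the description.

```python
UNDEFINED = "#"

def feed_stroke(registers, stroke, dictionary):
    """registers: list of (code, strokes) pairs, one per I-register in use.
    Returns the new register contents after one more stroke is entered."""
    # Try the longest candidate first: every register plus the new stroke,
    # then drop registers from the front one at a time.
    for start in range(len(registers) + 1):
        held = tuple(s for _, strokes in registers[start:] for s in strokes)
        candidate = held + (stroke,)
        code = dictionary.get(candidate)
        if code is not None:
            # Merge the matched registers into a single pseudo-radical code.
            return registers[:start] + [(code, candidate)]
    # No match at all: store the stroke with the undefined code.
    return registers + [(UNDEFINED, (stroke,))]

# Toy dictionary 3: stroke-label sequence -> pseudo-radical code.
dictionary_3 = {
    ("丶", "一"): "050",
    ("丶", "ノ"): "014",
    ("一",): "004",
    ("丶", "一", "丶", "ノ", "一"): "176",   # the whole shape 立
}

registers, history = [], []
for stroke in ("丶", "一", "丶", "ノ", "一"):    # timings T1 to T5
    registers = feed_stroke(registers, stroke, dictionary_3)
    history.append([code for code, _ in registers])
print(history)
```

Trying the longest candidate first is what lets the fifth stroke collapse three registers into the single code 176, so the character ends up expressed with the smallest number of codes.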

As described above, entering "彰" from the tablet 1 causes pseudo-radical codes 176, 0FB, 045, and 065 to be stored in the I0 register 10 through the I3 register 13 of the input register 4. The contents of the I0 register 10 through the I7 register 17 are led out sequentially to the output 19 by the switching circuit 18 and input to the selection circuit 5. The selection circuit 5 uses the input pseudo-radical codes and the character dictionary 6 shown in FIG. 4 to select the JIS C6226 character code.

That is, because the pseudo-radical codes stored in the I0 register 10 through the I7 register 17 of the input register 4 are 176, 0FB, 045, and 065, the selection circuit 5 searches the character dictionary 6 and finds, as shown in FIG. 4, that the character whose pseudo-radical codes are 176, 0FB, 045, and 065 is the kanji "彰", denoted by character code 3E34. The JIS C6226 character code 3E34 is thus output from the selection circuit 5, and the handwritten character input from the tablet 1 is recognized as the kanji "彰".
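
The lookup performed by the selection circuit can be sketched as a table keyed by the sequence of pseudo-radical codes. In this sketch the character codes are obtained from Python's `iso2022_jp` codec, on the assumption that its JIS X 0208 values coincide with the JIS C6226 codes for these particular characters; the function names and the use of `None` for unused registers are also assumptions for illustration.

```python
def jis_code(ch):
    """Two-byte JIS code as hex. The iso2022_jp encoding starts with the
    3-byte escape sequence ESC $ B, followed by the two code bytes."""
    return ch.encode("iso2022_jp")[3:5].hex().upper()

# Toy character dictionary 6: pseudo-radical code sequence -> character code.
character_dictionary = {
    ("176",): jis_code("立"),
    ("176", "0FB"): jis_code("音"),
    ("176", "0FB", "045", "065"): jis_code("彰"),
}

def select_character(register_codes, dictionary):
    """Stand-in for selection circuit 5: ignore unused registers (None)
    and look up the remaining code sequence."""
    key = tuple(code for code in register_codes if code is not None)
    return dictionary.get(key)

# I0..I7 after writing 彰; I4..I7 are unused.
registers = ["176", "0FB", "045", "065", None, None, None, None]
print(select_character(registers, character_dictionary))
```

Because the dictionary stores only short code sequences and a two-byte character code per entry, adding a character costs a few bytes rather than a full set of stroke features.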

As described in detail above, the embodiment shows a method in which kanji are subdivided, the resulting subsets, here called pseudo-radicals, are recognized, and a kanji is identified as a collection of the recognized pseudo-radicals.

Recognizing pseudo-radicals requires only a recognition unit 2 with a simple algorithm and a pseudo-radical dictionary 3 containing very few pseudo-radicals compared with the number of kanji. For example, the number of pseudo-radicals needed for numerals, letters, hiragana, symbols, and the 2,965 kanji of the JIS C6226 level-1 kanji set is a little over 600, of which a little over 400 are pseudo-radicals for the 2,965 kanji.

Thus the present invention has the effect of reducing the 2,965 kanji to an equivalently smaller number of patterns at recognition time. Those skilled in the art will readily understand that a simple algorithm such as the well-known stroke analysis method suffices for the recognition unit 2 to recognize the contents of the pseudo-radical dictionary 3 shown in FIG. 2. Furthermore, as shown in FIG. 4, the character dictionary 6 used to identify characters, including kanji, as collections of recognized pseudo-radicals can consist solely of pseudo-radical codes and JIS C6226 character codes, so another advantage of the invention is that memory grows only very slightly even when the number of characters, including kanji, increases. Unlike conventional dictionaries, which were large because they directly recorded the features of every character including kanji, the pseudo-radical dictionary 3 and the character dictionary 6 together remain very small. Online handwritten character recognition can therefore be provided at low cost even when its scope is extended to the JIS C6226 level-1 kanji set of 2,965 kanji.

A further feature of the present invention is that the contents of the pseudo-radical dictionary 3 are obtained by taking the standard pseudo-radical dictionary 7, which consists of pseudo-radical features collected from the handwriting of several hundred people, and correcting it with the personal pseudo-radical dictionary 8, which stores features such as each individual writer's idiosyncratic handwriting, so that each writer's characters are easy to recognize.

In general, when characters written by an unspecified large number of people are to be recognized, the device as a whole becomes large in order to cope with the endless variety of idiosyncratic handwriting, or stroke order, stroke shape, and the like are restricted to make recognition easier, and in either case the result has not been satisfactory for the writer. Given this situation, in applications such as handwriting-input word processors, where the writers are not necessarily an unspecified large number of people, the recognition rate itself improves dramatically, and because the recognition method need not take unusual handwriting into account, the recognition algorithm becomes simple. An input device well suited to information processing equipment in general can thus be provided at low cost.

The embodiment above describes the basic elements. A better online handwritten character recognition method can be provided by making various improvements, which are described below.

First, in the embodiment the character dictionary 6 contains only combinations of pseudo-radical codes and character codes. As the example of FIG. 4 makes clear, however, the same "立" may form a character on its own, may be located at the top of a character as in "妾", "音", "章", "意", and "童", or may be located at the upper left of a character as in "彰" and "韻". The embodiment treats all of these occurrences of "立" identically, which causes no problem for a character set of the size used in the embodiment. If the number of characters is to be increased further, the recognition rate can be improved by including the positional information of "立" in the character dictionary 6. The glyph structure given in item 4 of the JIS C6226 glyph index is sufficient as positional information.
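
One way to picture this first improvement is to key the character dictionary on (pseudo-radical code, position) pairs instead of bare codes. The sketch below is an assumption-laden illustration: the position tags ("whole", "top", "bottom") are hypothetical stand-ins, since the actual categories of the JIS C6226 glyph index are not given in the description, and the characters themselves stand in for their character codes.

```python
# Toy character dictionary 6 with positional information.
# Keys are sequences of (pseudo-radical code, position tag) pairs.
character_dictionary = {
    (("176", "whole"),): "立",
    (("176", "top"), ("0FB", "bottom")): "音",
}

def select_character(parts, dictionary):
    """parts: list of (code, position) pairs reported by the recognizer.
    A character is selected only if both codes and positions agree."""
    return dictionary.get(tuple(parts))

print(select_character([("176", "whole")], character_dictionary))
print(select_character([("176", "top"), ("0FB", "bottom")], character_dictionary))
# Same codes, but 立 reported at the upper left: no entry matches.
print(select_character([("176", "upper_left"), ("0FB", "bottom")],
                       character_dictionary))
```

With positions in the key, two characters built from the same code sequence but different layouts would no longer collide, which is the benefit the description anticipates for larger character sets.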

Second, as explained with reference to FIG. 5, in the embodiment every time a new stroke is input from the tablet 1 the strokes are re-examined, going back to the first stroke, to test whether the strokes up to and including the new one form a single pseudo-radical. For example, the pseudo-radicals "亠", "丷", and "一" have pseudo-radical codes 050, 014, and 004, and the fact that "立", the combination of these three, is itself a pseudo-radical with code 176 was found by looking up the strokes of "立" in the pseudo-radical dictionary 3.

If, however, in addition to the pseudo-radical dictionary 3, the recognition unit 2 is given a pseudo-radical correlation dictionary that records relationships between pseudo-radicals, such as (pseudo-radical 050) + (pseudo-radical 014) + (pseudo-radical 004) = (pseudo-radical 176), recognition processing becomes much faster.
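
A pseudo-radical correlation dictionary of this kind can be sketched as a table from code sequences to a combined code, so that merging is a lookup on codes rather than a renewed comparison of stroke features. The function name `combine` and the suffix-search strategy are assumptions for illustration; the single entry is the relation stated in the description.

```python
# Correlation dictionary: sequence of pseudo-radical codes -> combined code.
correlation = {("050", "014", "004"): "176"}   # 亠 + 丷 + 一 = 立

def combine(codes, correlation):
    """Replace the longest trailing run of codes found in the correlation
    dictionary by its combined code, using code lookups only."""
    for start in range(len(codes)):
        merged = correlation.get(tuple(codes[start:]))
        if merged is not None:
            return codes[:start] + [merged]
    return codes   # nothing to combine

print(combine(["050", "014", "004"], correlation))
print(combine(["176", "0FB"], correlation))
```

Each lookup touches only a handful of short codes, which is consistent with the speed-up the description attributes to this dictionary.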

As described in detail above, the present invention subdivides kanji, which are difficult to recognize, and identifies each kanji by recognizing the resulting pseudo-radicals. It further improves the recognition rate for idiosyncratic handwriting by maintaining personal dictionaries. The cost can therefore be lowered while the recognition rate is kept high, so that an inexpensive, high-performance online handwritten character recognition device can be provided.

[Brief explanation of drawings]

FIG. 1 is a block diagram of an embodiment of the present invention, FIG. 2 is a diagram showing an example of the pseudo-radical dictionary, FIG. 3 is a detailed diagram of the input register, FIG. 4 is a partial diagram of the character dictionary, and FIG. 5 is a diagram showing the recognition method according to the present invention.

1: tablet; 2: recognition unit; 3: pseudo-radical dictionary; 4: input register; 5: selection circuit; 6: character dictionary; 7: standard pseudo-radical dictionary; 8: personal pseudo-radical dictionary; 9: output from recognition unit 2; 18: switching circuit; 19: output of input register 4.

Patent applicant: Oki Electric Industry Co., Ltd.

Claims

[Claims]

An online character recognition method for a specific writer, for use in an online character identification system that comprises a tablet for extracting stroke information of handwritten characters and a dictionary storing character features, and that identifies handwritten characters by comparing the information from the tablet with the features in the dictionary, the method being characterized by: providing a first dictionary that stores features of the subset patterns obtained by subdividing kanji and of the whole patterns of characters other than kanji, and a second dictionary that stores character codes for collections of kanji subsets and for characters other than kanji; forming the first dictionary as a memory that integrates a standard dictionary storing patterns of the standard handwriting of characters and a personal dictionary storing patterns of the handwriting of at least one specific writer; further providing a recognition unit that compares the information from the tablet with the output of the first dictionary, an input register that stores the recognition results, and a selection circuit that compares the contents of the input register with the second dictionary to identify characters; and, after the contents of the first dictionary have been modified by the personal dictionary of a specific individual, comparing and identifying the characters input by that individual subset by subset.