JPS63268082A - Pattern recognizing device - Google Patents

Pattern recognizing device

Info

Publication number
JPS63268082A
Authority
JP
Japan
Prior art keywords
kanji
candidate
reading
word
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP62101869A
Other languages
Japanese (ja)
Inventor
Haruo Asada
麻田 治男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Priority to JP62101869A
Publication of JPS63268082A

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To automatically correct pattern recognition results at high speed by taking as the final result the candidates for which the recognition results of a kanji string and of its furigana (phonetic kana reading) agree. CONSTITUTION: Kanji strings recognized by a character recognition device 1 are stored in a kanji candidate table 2, and the corresponding furigana are stored in a furigana candidate table 3. A control unit 4 successively reads candidate words out of table 2 and consults a reading dictionary 5, which stores the readings of each kanji, to obtain all readings of each candidate word. The control unit 4 then selects the readings whose length matches a character count signal 8 for the furigana, received from table 3, and sends them to a word matching unit 6 together with the likelihoods of the kanji candidate words. The word matching unit 6 calculates the matching degree between the whole reading of each transferred kanji candidate word and the furigana, and sends it to a sorting circuit 7. The sorting circuit 7 sorts the readings of all kanji candidate words in descending order of matching degree and outputs the kanji candidate word with the highest matching degree, together with its reading, as the recognition results for the kanji string and the furigana.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a recognition processing device that can automatically correct, simply and with high accuracy, the recognition results produced by an optical character reader, an online character recognition device, or the like.

(Prior Art) In recent years, with the development of information processing technology, various devices have been developed that recognize handwritten characters for data input.

These devices recognize individual characters using pattern recognition techniques, but at present high recognition accuracy cannot be obtained when handwritten characters are heavily deformed or of poor quality. Therefore, when inputting information that is meaningful as a word, such as a name, an address, or a product name, it is common to improve accuracy as follows: instead of a single result, several candidate characters are output for each character, these are collated against a word knowledge base of names, addresses, product names, and so on, and the combination of candidate characters that forms a meaningful word is output. Such a method is disclosed, for example, in Japanese Patent Laid-Open No. Sho 59-197974. With this method, when inputting a limited number of product names or addresses within a limited range, a reading rate close to 100% can be obtained as long as the recognition performance for individual characters is secured to some degree.
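The knowledge-base collation described above can be sketched as follows. This is a minimal illustration of the general idea, not the method of the cited publication: the function name `match_lexicon`, the rank-sum score, and the Latin-letter sample data are all assumptions made for the example.

```python
from typing import List, Tuple

def match_lexicon(lattice: List[List[str]], lexicon: List[str]) -> List[Tuple[int, str]]:
    """Return (score, word) pairs for lexicon words consistent with the lattice.

    lattice[i] lists the candidate characters for position i, best first.
    The score is the sum of candidate ranks (0 = best), so lower is better.
    This scoring rule is an assumption made for illustration.
    """
    hits = []
    for word in lexicon:
        if len(word) != len(lattice):
            continue
        score = 0
        for ch, candidates in zip(word, lattice):
            if ch not in candidates:
                break  # this word cannot be formed from the candidates
            score += candidates.index(ch)
        else:
            hits.append((score, word))
    return sorted(hits)

# Hypothetical recognizer output for a three-character word.
lattice = [["c", "e"], ["a", "o"], ["t", "f"]]
lexicon = ["cat", "cot", "eat", "dog"]
print(match_lexicon(lattice, lexicon))
```

With this sample data three lexicon entries survive, two of which differ from the best one by a single character, which shows how a larger lexicon leaves more near-identical words to be separated by character recognition alone.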

However, when the number of distinct words exceeds 100,000, as with ordinary surnames and given names, two problems arise. First, false matches increase as the number of words registered in the knowledge base grows. A false match is an erroneous match to a different word that differs by one or two characters. For given names, for example, distinguishing 恵美子 (Emiko) from 恵津子 (Etsuko) requires correctly recognizing 美 and 津; in such cases the redundancy of the word cannot be exploited, and character recognition performance directly becomes word recognition performance. Second, the number of distinct surnames and given names is not known, so building a knowledge base that contains all of them is nearly impossible. According to "Nihon no Myoji" (Japanese Surnames; Nihon Keizai Shimbun, 1978), in kana notation about 100 surnames account for 37% of the total, yet 10,000 surnames cover only 96.7%, and tens of thousands of surnames are needed to exceed 99.9%. Highly accurate candidate selection therefore requires an enormous knowledge base. Surnames in kanji notation require even more entries than kana notation, and given names are more varied still, which increases the difficulty.

(Problems to Be Solved by the Invention) The present invention was made in view of this technical situation. Its object is to provide a pattern recognition device that can automatically correct recognition results efficiently and accurately with a small storage capacity.

[Structure of the Invention]

(Means for Solving the Problems) Conventional methods for automatically correcting recognition results processed only the target character string itself. Consequently, highly accurate automatic correction could not be expected when the differences between character strings were small. However, on forms read by optical character readers, particularly order forms and application forms of various kinds, names and addresses are usually accompanied by furigana. The gist of the present invention is to process such mutually correlatable pattern strings simultaneously and to exploit the redundancy between them in order to raise processing efficiency and accuracy. Specifically, the readings of individual kanji are used to link a kanji string to its furigana: the readings of the candidate characters obtained by recognizing the kanji string are combined, and the combination that matches a combination of the candidate characters recognized for the furigana is taken as the final result.

This means is explained using an example, illustrated in FIG. 2. In FIG. 2, 21 is a filled-in form on which a surname in kanji and its furigana are written.

22 is the result of recognition by the character recognition device; candidates up to second rank are listed for the kanji and up to third rank for the furigana. 23 is a table of the readings of individual kanji, showing only the kanji that appear as candidates. 24 enumerates the possible readings for every combination of the kanji candidates, restricted to readings with the same number of characters as the furigana (a dakuten is counted as one character position). Of all combinations of the kana candidates in 22, only "ウスイ" (usui) is contained in 24, so the kanji must have been written as "碓居". In general, more than one reading may be contained in 24. In that case the candidates can be ranked by a word-level matching degree, based on candidate rank or on the similarity or distance measure obtained when recognizing the individual characters, and the top-ranked candidate taken as the final result.
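The procedure of FIG. 2 can be sketched in a few lines. The figure's actual contents are not reproduced in the text, so the candidate lists and the reading table below are hypothetical stand-ins for items 22 and 23, chosen only so that "ウスイ" and "碓居" come out as in the description; the function names are likewise assumptions.

```python
from itertools import product

# Hypothetical stand-ins for items 22 and 23 of FIG. 2 (not the figure's data).
kanji_candidates = [["碓", "確"], ["居", "届"]]            # up to 2nd rank
kana_candidates = [["ウ", "ワ", "フ"],                      # up to 3rd rank
                   ["ス", "ヌ", "マ"],
                   ["イ", "ノ", "リ"]]
readings = {
    "碓": ["ウス"],
    "確": ["カク", "タシ"],
    "居": ["イ", "キョ"],
    "届": ["トド"],
}

def word_readings(word):
    """All readings of a kanji word, built by concatenating per-kanji readings."""
    return {"".join(parts) for parts in product(*(readings[k] for k in word))}

def consistent_pairs(kanji_candidates, kana_candidates):
    """(kanji word, reading) pairs whose reading fits the kana candidates."""
    n = len(kana_candidates)
    pairs = []
    for chars in product(*kanji_candidates):
        word = "".join(chars)
        for reading in sorted(word_readings(word)):
            # Item 24: keep only readings as long as the furigana.
            if len(reading) != n:
                continue
            if all(c in cands for c, cands in zip(reading, kana_candidates)):
                pairs.append((word, reading))
    return pairs

print(consistent_pairs(kanji_candidates, kana_candidates))
```

Note that this sketch counts one Unicode code point as one character, so a voiced kana such as "ド" has length 1 here, whereas the description counts a dakuten as its own position. A fuller version would have to split voiced kana before comparing lengths.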

Another example, shown in FIG. 3, is the recognition of a postal code together with an address written in kanji. Among the addresses corresponding to the candidates obtained by recognizing the postal code, the one contained in the combinations of kanji candidates determines the recognition result for the postal code.
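The postal code case follows the same pattern, with a code-to-address table playing the role of the reading table. The sketch below assumes a tiny lookup table and sample candidates that are purely illustrative; `POSTAL_TABLE` and `resolve_postal_code` are hypothetical names, and the table is not an authoritative postal directory.

```python
from itertools import product

# Hypothetical lookup table: postal code -> place name (illustrative only).
POSTAL_TABLE = {"100": "千代田", "160": "新宿", "180": "武蔵野"}

def resolve_postal_code(digit_candidates, kanji_candidates, table=POSTAL_TABLE):
    """Return (code, address) pairs whose address fits the kanji candidates."""
    results = []
    for digits in product(*digit_candidates):
        code = "".join(digits)
        address = table.get(code)
        if address is None or len(address) != len(kanji_candidates):
            continue
        if all(ch in cands for ch, cands in zip(address, kanji_candidates)):
            results.append((code, address))
    return results

# Hypothetical recognizer output: the middle digit is ambiguous.
digit_candidates = [["1"], ["0", "6", "8"], ["0"]]
kanji_candidates = [["新", "親"], ["宿", "宮"]]
print(resolve_postal_code(digit_candidates, kanji_candidates))
```

Only one of the three digit combinations maps to an address that the kanji candidates can spell, so the ambiguous digit is resolved without any further knowledge about addresses.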

(Operation) As described above, according to the present invention, merely storing the reading information of individual kanji as a knowledge base makes it possible to automatically correct the recognition results of both a kanji string and the furigana that express the same content.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of the present invention. In FIG. 1, the character recognition device 1 is a pattern recognition device such as an optical character recognition device or an online character recognition device. Its recognition results for the kanji string and for the furigana are output to and stored in the kanji candidate character table 2 and the furigana candidate character table 3, respectively. Here, a table is an arrangement of candidate characters in tabular form, as shown at 22 in FIG. 2, with each candidate encoded and stored in memory.

The control unit 4 sequentially reads candidate words from the kanji candidate table 2. A candidate word is formed by selecting one candidate character from each character position; all possible combinations must be read out without omission, but the order does not matter. The sum over all character positions of a score determined by candidate rank is computed as the likelihood of the candidate word. The control unit 4 then refers to the reading dictionary 5, which stores the readings of individual kanji, and generates the readings of the whole candidate word. Because an individual kanji may have several readings, several readings may be generated for one candidate word. Signal 8 is the number of characters in the furigana. Only those readings of the whole candidate word whose character count equals signal 8 are selected and transferred in turn, together with the likelihood of the kanji candidate word, to the word matching unit 6.

The word matching unit 6 calculates the matching degree between each reading of the whole kanji candidate word transferred from the control unit and the furigana. As the matching degree, the sum over all character positions of a score determined by candidate rank, as disclosed in Japanese Patent Laid-Open No. Sho 59-197974, can be used.
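The role of the control unit 4 can be sketched as a generator pipeline. The text does not give the rank-to-score rule, so `rank_score` below (number of candidates minus rank) is an assumption, as are the function names and the sample table and dictionary contents.

```python
from itertools import product

def rank_score(rank, n_candidates):
    """Hypothetical score: the best candidate (rank 0) gets the most points."""
    return n_candidates - rank

def candidate_words(kanji_table):
    """Yield (word, likelihood) for every combination in the kanji table."""
    indexed = [list(enumerate(column)) for column in kanji_table]
    for combo in product(*indexed):
        word = "".join(ch for _, ch in combo)
        likelihood = sum(rank_score(rank, len(col))
                         for (rank, _), col in zip(combo, kanji_table))
        yield word, likelihood

def readings_for(word, reading_dict, kana_length):
    """Yield readings of the whole word whose length equals signal 8."""
    per_char = [reading_dict.get(ch, []) for ch in word]
    for parts in product(*per_char):
        reading = "".join(parts)
        if len(reading) == kana_length:
            yield reading

def control_unit(kanji_table, reading_dict, kana_length):
    """Produce (word, reading, likelihood) triples for the word matching unit."""
    return [(word, reading, likelihood)
            for word, likelihood in candidate_words(kanji_table)
            for reading in readings_for(word, reading_dict, kana_length)]

# Hypothetical table 2 contents and reading dictionary 5 entries.
kanji_table = [["碓", "確"], ["居", "届"]]
reading_dict = {"碓": ["ウス"], "確": ["カク", "タシ"],
                "居": ["イ", "キョ"], "届": ["トド"]}
print(control_unit(kanji_table, reading_dict, kana_length=3))
```

Filtering on length before any kana comparison is cheap and discards most combinations early, which is presumably why the count is carried as a separate signal.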

The rank-based score summed over the furigana positions is then combined, as a sum or a product, with the likelihood of the kanji candidate word transferred from the control unit, and the result is taken as the matching degree of that reading of the kanji candidate word. Each kanji candidate word, its reading, and its matching degree are transferred to the sorting circuit 7. The sorting circuit sorts the readings of all kanji candidate words in descending order of matching degree, and the kanji candidate word with the highest matching degree and its reading are output as the recognition results for the kanji string and the furigana.
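The word matching unit 6 and the sorting circuit 7 can be sketched as below. The rank-to-score rule and the decision to drop readings containing a character that is not among the furigana candidates are assumptions; the text only states that rank-based scores are summed and then combined with the likelihood by a sum or a product. All names and sample values are hypothetical.

```python
def kana_score(reading, kana_table):
    """Sum of rank-based scores of the reading against the furigana table.

    Returns None if some character of the reading is not a candidate.
    The scoring rule (column size minus rank) is an assumption.
    """
    total = 0
    for ch, column in zip(reading, kana_table):
        if ch not in column:
            return None
        total += len(column) - column.index(ch)
    return total

def rank_candidates(triples, kana_table, combine="sum"):
    """Sort (word, reading, likelihood) triples by descending matching degree."""
    scored = []
    for word, reading, likelihood in triples:
        if len(reading) != len(kana_table):
            continue
        score = kana_score(reading, kana_table)
        if score is None:
            continue
        degree = score + likelihood if combine == "sum" else score * likelihood
        scored.append((degree, word, reading))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

# Hypothetical furigana table 3 contents and control unit output.
kana_table = [["ウ", "カ", "フ"], ["ス", "ク", "マ"], ["イ", "ノ", "リ"]]
triples = [("碓居", "ウスイ", 4), ("確居", "カクイ", 3), ("確居", "タシイ", 3)]
print(rank_candidates(triples, kana_table))
```

The first element of the returned list corresponds to the final recognition result; keeping the full sorted list makes it easy to inspect runners-up when two readings score closely.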

If there are 4,000 kinds of kanji, the reading dictionary occupies only about 40 KB. This is dramatically less than the several hundred KB needed to store 100,000 surnames. Processing time is also greatly reduced compared with collating against 100,000 surnames one by one.
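The memory comparison can be checked with simple arithmetic. The kanji and surname counts come from the text; the per-entry sizes are assumptions chosen to be consistent with the stated totals (10 bytes per kanji entry follows from 40 KB for 4,000 kanji, and 4 bytes per surname assumes two kanji at 2 bytes each).

```python
KANJI_COUNT = 4000            # from the text
BYTES_PER_KANJI_ENTRY = 10    # assumption: implied by 40 KB / 4,000 kanji
SURNAME_COUNT = 100_000       # from the text
BYTES_PER_SURNAME = 4         # assumption: two kanji at 2 bytes each

reading_dict_kb = KANJI_COUNT * BYTES_PER_KANJI_ENTRY / 1000
surname_list_kb = SURNAME_COUNT * BYTES_PER_SURNAME / 1000
ratio = surname_list_kb / reading_dict_kb

print(reading_dict_kb, surname_list_kb, ratio)
```

Under these assumptions the surname list is an order of magnitude larger than the reading dictionary, and it would grow further if readings were stored alongside each surname.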

[Effect of the Invention]

As described above, the present invention makes it possible to automatically correct pattern recognition results at high speed with a small memory capacity. Moreover, the set of words that can be handled consists of all combinations of kanji readings, which can be regarded as practically unlimited when there are 1,000 or more kinds of kanji.

The present invention can also be made more effective by combining it with other methods. For example, words with a relatively high frequency of occurrence can be collated by a word matching method such as that shown in Japanese Patent Laid-Open No. Sho 59-197974, and the present invention applied only to words for which no match is obtained. For surnames, about 1,000 surnames account for 80% of the total, so collation against 1,000 words resolves 80% of cases. Applying the present invention only to the remaining 20% allows a further speed-up and also improves recognition accuracy.
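A two-stage arrangement of this kind might look like the following sketch. The frequent-word list, the rank-sum tie-break, and the `reading_based` placeholder are all hypothetical; in particular, the placeholder simply returns the top-ranked characters and stands in for the furigana cross-check, which it does not implement.

```python
def recognize(kanji_table, frequent_words, fallback):
    """Try the frequent-word list first; call `fallback` only on a miss.

    kanji_table[i] lists candidate characters for position i, best first.
    `fallback` stands in for the reading-based correction and is hypothetical.
    """
    best = None
    for word in frequent_words:
        if len(word) != len(kanji_table):
            continue
        if all(ch in col for ch, col in zip(word, kanji_table)):
            # Lower rank sum means the word uses better-ranked candidates.
            score = sum(col.index(ch) for ch, col in zip(word, kanji_table))
            if best is None or score < best[0]:
                best = (score, word)
    if best is not None:
        return best[1], "lexicon"
    return fallback(kanji_table), "reading-based"

frequent_words = ["佐藤", "鈴木", "高橋"]   # hypothetical frequent-surname list

def reading_based(kanji_table):
    # Placeholder only: a real system would run the furigana cross-check here.
    return "".join(col[0] for col in kanji_table)

print(recognize([["佐", "左"], ["藤", "籐"]], frequent_words, reading_based))
print(recognize([["碓", "確"], ["居", "届"]], frequent_words, reading_based))
```

Because the first stage is a small fixed list, it stays fast, and the more expensive combination search runs only for the minority of inputs that the list does not cover.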

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the schematic configuration of an embodiment of the present invention, and FIGS. 2 and 3 are diagrams for explaining the operation of an embodiment of the present invention.
1: character recognition device
2: kanji candidate character table
3: furigana candidate character table
5: reading dictionary
6: word matching unit
7: sorting circuit
Agents: Patent Attorneys 則近 憲佑 and 松山 光之
[FIG. 1] [FIG. 2] [FIG. 3]

Claims (1)

[Claims] A pattern recognition device for recognizing both a first pattern string and a second pattern string that express the same content, comprising: pattern identification means for identifying individual patterns; means for storing a correspondence between patterns that can occur in the first pattern string and patterns that can occur in the second pattern string; and means for collating a third pattern string, obtained by converting the identification result of the first pattern string through said correspondence, with the identification result of the second pattern string; wherein one or both of the identification result of the second pattern string having the highest matching degree and the identification result of the first pattern string corresponding to the third pattern string are taken as the final result.
JP62101869A 1987-04-27 1987-04-27 Pattern recognizing device Pending JPS63268082A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62101869A JPS63268082A (en) 1987-04-27 1987-04-27 Pattern recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62101869A JPS63268082A (en) 1987-04-27 1987-04-27 Pattern recognizing device

Publications (1)

Publication Number Publication Date
JPS63268082A true JPS63268082A (en) 1988-11-04

Family

ID=14311991

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62101869A Pending JPS63268082A (en) 1987-04-27 1987-04-27 Pattern recognizing device

Country Status (1)

Country Link
JP (1) JPS63268082A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04340686A (en) * 1991-05-17 1992-11-27 Pfu Ltd Name dictionary for post-processing of character recognition
JPH06180767A (en) * 1992-12-11 1994-06-28 Hideaki Isogai Character recognizing device

Similar Documents

Publication Publication Date Title
US6978044B2 (en) Pattern string matching apparatus and pattern string matching method
JPS63268082A (en) Pattern recognizing device
JPS6262388B2 (en)
El Yacoubi et al. Conjoined location and recognition of street names within a postal address delivery line
JPS592191A (en) Recognizing and processing system of handwritten japanese sentence
JPS5842904B2 (en) Handwritten kana/kanji character recognition device
JP2923295B2 (en) Pattern identification processing method
JP2942375B2 (en) Character reader
JP2939945B2 (en) Roman character address recognition device
JP2908132B2 (en) Post-processing method of character recognition result
JPH0420229B2 (en)
JPS60225273A (en) Word retrieving system
JP2746899B2 (en) Character recognition device
JPH0583957B2 (en)
JPS5930176A (en) Character discrimination processing system
JPS62285189A (en) Character recognition post processing system
JPS63138479A (en) Character recognizing device
JPH01177180A (en) Character recognizing method
JPH09171539A (en) Character recognition device
JPS5953986A (en) Character recognizing device
JPH01166188A (en) Method for recognizing character
JPS594071B2 (en) character recognition device
JPH076212A (en) Intelligence processing unit for optical character reader
JPS6121581A (en) Character recognizer
JPS60225987A (en) Pattern recognizer