JPH0373086A

JPH0373086A - Character recognition post-processing system

Info

Publication number: JPH0373086A
Application number: JP1092894A
Authority: JP
Inventors: Akira Suzuki (鈴木 章); Sueji Miyahara (宮原 末治); Yoshimasa Kimura (木村 義政)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1989-04-14
Filing date: 1989-04-14
Publication date: 1991-03-28
Anticipated expiration: 2013-08-27
Also published as: JP2790842B2

Abstract

PURPOSE: To correct characters with high accuracy, even when the recognition result includes a character that was not recognized correctly, by selecting the optimum word candidate through prescribed processing from among characters whose stroke counts are near to that of the character. CONSTITUTION: The system comprises a word dictionary 2; a word dictionary collation part 3, which collates the word dictionary 2 with a character recognition result 1; a stroke number table 4, which records the stroke counts of all character categories; and a stroke number collation part 5, which receives the words output from the word dictionary collation part 3 and operates under a prescribed condition. The word dictionary collation part 3 collates the recognized characters with the word dictionary 2 and outputs the one or more words with the highest priority. When the word dictionary collation part 3 outputs plural words and, at the same position in those words, none of their characters is in the candidate character set of the recognition result, the stroke number collation part 5 uses the stroke number table 4 to find the word whose character at that position has the stroke count nearest to the mean stroke count of the candidate characters. In this way a character whose stroke count is near to that of the incorrectly recognized character can be selected, which improves correction accuracy.

Description

[Detailed Description of the Invention] [Industrial Field of Application] The present invention relates to a character recognition post-processing system for a character recognition device or the like. It uses the stroke counts of the candidate characters in a character recognition result as information so that, even when the result contains a character that could not be recognized correctly, the correct candidate character with a similar stroke count is selected and the error is corrected, improving recognition accuracy.

[Prior Art] In general, a character recognition device outputs, as its recognition result for each input character, a plurality of candidate characters together with their distance values or similarities. Conventional character recognition post-processing matches this recognition result against a word dictionary and accepts a dictionary word as correct if each of its characters is present in the candidate character set at the corresponding position. This method has the drawback that a correct recognition output cannot be obtained when the correct character is missing from the candidate characters of the recognition result.
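The conventional check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `conventional_match`, the data layout (one list of candidate characters per input position), and the sample candidates are hypothetical and do not come from the patent.

```python
def conventional_match(candidates, dictionary):
    """Accept a word only if every character is among the candidates at its position."""
    return [
        word
        for word in dictionary
        if len(word) == len(candidates)
        and all(ch in cands for ch, cands in zip(word, candidates))
    ]


# Hypothetical recognition result for a three-character input.
# The correct second character, 口, is missing from its candidate set.
candidates = [["川", "州"], ["日", "目"], ["市", "布"]]
dictionary = ["川口市", "川越市", "川崎市"]

print(conventional_match(candidates, dictionary))  # []
```

Because the correct character is absent from the second candidate set, no dictionary word passes the check, which is the drawback the paragraph above points out.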

[Problem to Be Solved by the Invention] In character recognition post-processing, the present invention exploits the fact that the stroke count of a candidate character in the recognition result often differs little from that of the input character, even when the character is misrecognized. Its object is to improve correction accuracy by selecting a character with a similar stroke count even when the recognition result contains a character that could not be recognized correctly.

[Means for Solving the Problem] To solve the above problem, the present invention comprises a word dictionary; a word dictionary matching unit that matches the word dictionary against the character recognition result; a stroke count table that records the stroke counts of all character categories; and a stroke count matching unit that receives the words output from the word dictionary matching unit and operates under a predetermined condition.

[Operation] The word dictionary matching unit matches the recognized characters against the word dictionary, assigns a priority to each word, and outputs the one or more words with the highest priority. When the word dictionary matching unit outputs plural words and, at the same position in those words, none of their characters is present in the candidate character set of the recognition result, the stroke count matching unit uses the stroke count table, which records the stroke counts of all character categories, to compare the stroke count of each word's character at that position with the average stroke count of the one or more top-ranked candidate characters at that position. The word whose character at that position has the stroke count closest to that average is output as the correct answer.

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows an embodiment of an apparatus implementing the character recognition post-processing system to which the present invention applies.

In FIG. 1, 1 is the recognition result, 2 is the word dictionary, 3 is the word dictionary matching unit, 4 is the stroke count table, 5 is the stroke count matching unit, and 6 is the correction result.

The recognition result 1 produced by character recognition is first input to the word dictionary matching unit 3. FIG. 2 is a flowchart of the processing in the word dictionary matching unit 3.

FIG. 3 illustrates an example in which the word dictionary matching unit processes words according to this flow.

In FIG. 2, step S1 recognizes a character string with the character pattern recognition device, step S2 segments it into words, and step S3 ranks the recognition candidates, calculates a distance value for each character, and stores the result in a predetermined storage location as the recognition result. Step S4 matches the recognition result 1 against the words stored in the word dictionary 2. Step S5 calculates, for each matched word, the number of matching characters and the total distance, and stores them in the intermediate result storage section of the word dictionary matching unit. Step S6 sorts the data processed in S5 in descending order of the number of matching characters. Step S7 extracts the group with the largest number of matching characters, and step S8 sorts that group in ascending order of total distance value and stores it. Step S9 then counts the words that have the minimum total distance value. If there is exactly one such word, a decision output is generated that sends only that word to the stroke count matching unit in the next stage. If there are plural such words, those words are sent to the stroke count matching unit together with a character string consisting only of the first-ranked candidate characters of the recognition result.
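Steps S4 to S9 can be sketched as follows. This is a minimal illustration, not the patented circuit: the function name `match_words`, the data layout (per position, a list of `(character, distance)` pairs with the best candidate first), the sample distances, and the choice that an unmatched character adds nothing to the total distance are all assumptions that the source does not specify.

```python
def match_words(candidates, dictionary):
    """Return (words tied for best match, string of first-ranked candidates)."""
    scored = []
    for word in dictionary:                      # S4: match each dictionary word
        if len(word) != len(candidates):
            continue
        matched, total = 0, 0.0
        for ch, cands in zip(word, candidates):  # S5: count matches, sum distances
            dist = dict(cands).get(ch)
            if dist is not None:
                matched += 1
                total += dist
        scored.append((word, matched, total))
    if not scored:
        return [], ""
    best_count = max(m for _, m, _ in scored)    # S6-S7: keep the top match-count group
    group = [s for s in scored if s[1] == best_count]
    group.sort(key=lambda s: s[2])               # S8: ascending total distance
    min_total = group[0][2]                      # S9: one winner or several?
    winners = [w for w, _, t in group if t == min_total]
    first_rank = "".join(cands[0][0] for cands in candidates)
    return winners, first_rank


# Hypothetical recognition result; 口 is missing at the second position.
candidates = [
    [("川", 10), ("州", 25)],
    [("日", 30), ("目", 40)],
    [("市", 12), ("布", 28)],
]
dictionary = ["川口市", "川越市", "川崎市", "横浜市"]

winners, first_rank = match_words(candidates, dictionary)
print(winners)     # ['川口市', '川越市', '川崎市']
print(first_rank)  # 川日市
```

When `winners` holds a single word, that word alone would go on to the next stage; here three words tie, so all three and the first-rank string would be handed to the stroke count matching unit.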

In FIG. 3, which illustrates an example of this processing, 11 is an example of a recognition result sent from the character recognition device; the correct answer is 「川口市幸町」 (Saiwai-cho, Kawaguchi-shi). In example 11, the correct character is not in the candidate character set produced by recognizing 「口」.

21 is an example of the internal representation of the word dictionary 2. The word dictionary matching unit 3 matches the recognition result 1 against each word in the word dictionary 2 and temporarily holds the results in the intermediate result storage section 31. It then sorts them in descending order of the number of matching characters, extracts the group with the largest number of matching characters, sorts that group in ascending order of total distance value, and stores it in the word dictionary matching result section 32. If exactly one word has the smallest total distance value, only that word is sent to the stroke count matching unit 5, the next stage of processing. If plural words have it, all of those words are passed to the stroke count matching unit 5 together with a character string built from the candidate characters of predetermined rank in the character recognition result. In this example, the word candidates 「川崎市幸町」 (Saiwai-cho, Kawasaki-shi), 「川口市幸町」, and 「川越市幸町」 (Saiwai-cho, Kawagoe-shi), together with the candidate characters of the recognition result, are sent to the stroke count matching unit 5.

FIG. 4 is a flowchart of the processing in the stroke count matching unit 5, showing the data, such as words, to be stored in each register and the processing performed at each stage.

FIG. 5 shows an example of processing that follows the flow of FIG. 4. The stroke count matching unit 5 stores the three word candidates sent from the word dictionary matching unit 3 in rows 1 to 3 of register a-1, a two-dimensional array, and likewise stores the candidate characters of the recognition result in rows 1 to 3 of register b-1, also a two-dimensional array. It then compares the contents of register a-1 column by column, starting from the first column (the first character of row 1, then the first character of row 2, and so on), and records the symbol ○ for a column if all of its characters are the same and × otherwise. In the example of FIG. 5, the characters in the second column differ, so × is recorded for that column.
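The column-by-column comparison of register a-1 can be sketched as below. The function name `mark_columns` and the representation of the register as a list of equal-length strings are assumptions made for illustration.

```python
def mark_columns(words):
    """Return one mark per column: ○ if every row has the same character, else ×."""
    return ["○" if len(set(column)) == 1 else "×" for column in zip(*words)]


# The three word candidates of the example, one per row of register a-1.
register_a1 = ["川崎市幸町", "川口市幸町", "川越市幸町"]

print(mark_columns(register_a1))  # ['○', '×', '○', '○', '○']
```

Only the second column receives ×, matching the example of FIG. 5 as described in the text.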

Next, the characters in the column of register a-1 marked × are written to register a-2. The candidate characters in register b-1 that belong to the same column number as the × column of register a-1 are written to register b-2 in rank order, that is, row 1, row 2, and so on.

The number of characters written to register b-2 can be set freely; in this embodiment it is set so that the top three candidate characters are written. In the example of FIG. 5, 「口」, 「越」, and 「崎」 are written to register a-2, and 「は」, 「は」, and 「に」 are written to register b-2.
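Filling registers a-2 and b-2 from the × columns can be sketched as follows. The function name `extract_registers`, the `top_n` parameter, and the contents of `register_b1` are hypothetical; the candidate characters used here are invented for illustration and are not the ones shown in FIG. 5.

```python
def extract_registers(register_a1, register_b1, top_n=3):
    """Copy the differing columns of a-1 into a-2 and the matching columns of b-1 into b-2."""
    # Columns where the word candidates disagree (the columns marked ×).
    cols = [i for i, col in enumerate(zip(*register_a1)) if len(set(col)) > 1]
    register_a2 = [[word[i] for word in register_a1] for i in cols]
    # Take the candidate characters of the same column from the top_n ranked rows.
    register_b2 = [[row[i] for row in register_b1[:top_n]] for i in cols]
    return cols, register_a2, register_b2


register_a1 = ["川崎市幸町", "川口市幸町", "川越市幸町"]
# Hypothetical candidate strings, one row per rank (first rank on top).
register_b1 = ["川日市幸町", "州目布辛町", "川旧市幸町"]

cols, register_a2, register_b2 = extract_registers(register_a1, register_b1)
print(cols)         # [1]
print(register_a2)  # [['崎', '口', '越']]
print(register_b2)  # [['日', '目', '旧']]
```

The `top_n` argument corresponds to the freely configurable number of candidate characters written to register b-2, set to three in the embodiment.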

Next, the stroke count of each character in register a-2 is looked up in the stroke count table 41. The stroke count of each character in register b-2 is likewise looked up in the stroke count table 41, and the stroke counts are sent to the divider 55, which divides their sum by the number of characters to obtain the average. This average stroke count is then compared with the stroke count of each character in register a-2; the character in register a-2 whose stroke count is closest to the average stroke count of the characters in register b-2 is adopted, and the word candidate containing it is output as the correct answer. In the example of FIG. 5, 「口」 (3 strokes) is closest to the average stroke count of register b-2 (= 3.3), so 「川口市幸町」, which contains it, is output as the correction result 6.

In this embodiment, distance values are used to judge the suitability of candidate characters, but the same processing can be performed using similarities.
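The stroke count comparison performed by the stroke count matching unit 5 can be sketched as follows. The function name `select_by_strokes` and the small `STROKES` table are illustrative assumptions, and the candidate characters 日, 目, and 旧 are hypothetical stand-ins, so the average computed here differs from the 3.3 quoted for FIG. 5. How ties are broken is not stated in the source; this sketch simply keeps the first word.

```python
# Tiny excerpt standing in for the stroke count table (standard stroke counts).
STROKES = {"口": 3, "越": 12, "崎": 11, "日": 4, "目": 5, "旧": 5}


def select_by_strokes(words, column, candidate_chars, strokes):
    """Pick the word whose character at `column` is closest in stroke count
    to the average stroke count of the recognition candidates."""
    mean = sum(strokes[c] for c in candidate_chars) / len(candidate_chars)
    best = min(words, key=lambda w: abs(strokes[w[column]] - mean))
    return best, mean


words = ["川崎市幸町", "川口市幸町", "川越市幸町"]
best, mean = select_by_strokes(words, 1, ["日", "目", "旧"], STROKES)
print(best)            # 川口市幸町
print(round(mean, 2))  # 4.67
```

With these stand-in candidates the average is about 4.67 strokes, and 口 (3 strokes) is far closer to it than 崎 (11) or 越 (12), so the word containing 口 is selected.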

An apparatus for carrying out the present invention can be realized as an integrated circuit that incorporates dedicated circuitry, takes the character recognition result as input, and outputs the correct word. It can also be realized by adding, to a general-purpose information processing circuit, a control circuit with functions that achieve the object of the present invention, and storing the necessary character information and the like in a storage circuit.

[Effects of the Invention] As described above, in character recognition post-processing the present invention can select the optimum word candidate by predetermined processing from among characters with similar stroke counts, even when the recognition result includes a character that was not recognized correctly. This raises the accuracy of correcting characters that could not be recognized correctly during character recognition.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the character recognition post-processing system to which the present invention applies; FIG. 2 is a diagram for explaining an example of processing in the word dictionary matching unit 3 of FIG. 1; and FIG. 3 is a diagram showing the contents of the internal registers and the stroke count table, and the processing, for explaining an example of processing in the stroke count matching unit 5 of FIG. 1.

1: recognition result; 2: word dictionary; 3: word dictionary matching unit; 4: stroke count table; 5: stroke count matching unit; 6: correction result; 11: example of a recognition result sent from the character recognition device; 21: example of the internal representation of the word dictionary 2; 31: intermediate result storage section; 32: word dictionary matching result section; 41: stroke count table; register a-1; register b-1; register a-2; register b-2; 55: divider

Claims

A character recognition post-processing system for a character recognition device that recognizes an input character pattern and generates one or more candidate characters together with their distance values or similarities to standard patterns, the system comprising: a word dictionary matching unit that matches the recognition result against a word dictionary, assigns a priority to each word, and outputs the one or more words with the highest priority; a stroke count table that records the stroke counts of all character categories; and a stroke count matching unit that, when plural words are output from the word dictionary matching unit and, at the same position in those words, none of their characters is present in the candidate character set of the recognition result, compares the stroke count of each word's character at that position with the average stroke count of the one or more top-ranked candidate characters of the recognition result at that position, and outputs as the correct answer the word whose character at that position has the stroke count closest to that average.