JPH0373086A - Character recognition post-processing system - Google Patents

Character recognition post-processing system

Info

Publication number
JPH0373086A
Authority
JP
Japan
Prior art keywords
character
word
strokes
word dictionary
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1092894A
Other languages
Japanese (ja)
Other versions
JP2790842B2 (en)
Inventor
Akira Suzuki
章 鈴木
Sueji Miyahara
末治 宮原
Yoshimasa Kimura
木村 義政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP1092894A
Publication of JPH0373086A
Application granted
Publication of JP2790842B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To correct characters with high accuracy by selecting, through prescribed processing, the optimum word candidate from among characters whose stroke counts are close to that of the input character, even when the recognition result includes a character that was not recognized correctly. CONSTITUTION: The system is provided with a word dictionary 2, a word dictionary collation part 3 that collates the word dictionary 2 with a character recognition result 1, a stroke number table 4 in which the stroke counts of all character categories are recorded, and a stroke number collation part 5 that receives the words output from the word dictionary collation part 3 and operates under a prescribed condition. The word dictionary collation part 3 collates the recognized characters with the word dictionary 2 and outputs the one or more words with the highest priority. When the word dictionary collation part 3 outputs plural words and, at the same position, the characters of those words do not exist in the candidate character set of the recognition result, the stroke number collation part 5 uses the stroke number table 4 to find the word whose character at that position has the stroke count nearest to the mean stroke count of the top candidate characters. In this way, a character whose stroke count is near that of the incorrectly recognized character can be selected, which improves correction accuracy.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to a character recognition post-processing method for use in character recognition devices and the like. It uses the stroke counts of the candidate characters in a recognition result as information so that, even when the result contains a character that could not be recognized correctly, a correct candidate character with a similar stroke count is selected and the error is corrected, improving recognition accuracy.

[Prior Art] In general, a character recognition device outputs, as its recognition result, a plurality of candidate characters for each input character together with their distance values or similarities. Conventional character recognition post-processing collates this recognition result against a word dictionary and accepts a dictionary word as correct if each of its characters is present in the candidate character set at the corresponding position. This method has the drawback, however, that a correct output cannot be obtained when the correct character is missing from the candidate characters of the recognition result.

[Problem to be Solved by the Invention] The present invention exploits the property that the stroke count of a candidate character in the recognition result often differs little from that of the input character, even when the character is recognized incorrectly. Its object is to improve correction accuracy in character recognition post-processing by selecting a character with a similar stroke count even when the result contains a character that could not be recognized correctly.

[Means for Solving the Problem] To solve the above problem, the present invention comprises a word dictionary, a word dictionary matching unit that collates the word dictionary with the character recognition result, a stroke number table recording the stroke counts of all character categories, and a stroke number matching unit that receives the words output from the word dictionary matching unit and operates under predetermined conditions.

[Operation] The word dictionary matching unit collates the recognized characters with the word dictionary, assigns a priority to each word, and outputs the one or more words with the highest priority. When the word dictionary matching unit outputs plural words and, at the same position, the characters of those words are absent from the candidate character set of the recognition result, the stroke number matching unit uses the stroke number table, which records the stroke counts of all character categories, to compare the stroke count of each word's character at that position with the average stroke count of the top one or more candidate characters at that position. It outputs as correct the word whose character at that position has the stroke count closest to that average.

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows an embodiment of a device implementing the character recognition post-processing method of the present invention.

In FIG. 1, 1 is the recognition result, 2 is the word dictionary, 3 is the word dictionary matching unit, 4 is the stroke number table, 5 is the stroke number matching unit, and 6 is the correction result.

The recognition result 1 produced by character recognition is first input to the word dictionary matching unit 3. FIG. 2 is a flowchart of the processing in the word dictionary matching unit 3.

FIG. 3 illustrates an example of how the word dictionary matching unit processes words according to this flow.

In FIG. 2, in step S1 the character pattern recognition device recognizes the character string; in step S2 the string is segmented into words; and in S3 the recognition candidates are ranked, the distance of each character is calculated, and the result is stored in a predetermined memory location as the recognition result. S4 collates the recognition result 1 with the words stored in the word dictionary 2. S5 calculates, for each collated word, the number of matching characters and the total distance, and stores them in the intermediate result storage section of the word dictionary matching unit. S6 sorts the data processed in S5 in descending order of the number of matching characters. S7 extracts the group with the largest number of matching characters, and S8 sorts that group in ascending order of total distance and stores it in memory. For the data processed in this way, S9 determines how many words have the minimum total distance. If there is only one, it generates the decision output ■, which sends only that word to the stroke number matching unit at the next stage. If the decision output is ■, that is, if there are plural words, then in addition to those words it sends to the stroke number matching unit a character string formed by extracting only the first-rank candidate characters of the recognition result.
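As a rough illustration, steps S4 to S9 might be sketched in Python as follows. The function name `collate`, the `Match` record, the distance values, the candidate characters at the misrecognized position, and the extra dictionary entry are all hypothetical. The description does not say how unmatched positions contribute to the total distance, so this sketch assumes they contribute nothing.

```python
from dataclasses import dataclass


@dataclass
class Match:
    word: str
    matched: int           # positions where the word's character is among the candidates
    total_distance: float  # sum of distances of the matched candidates (assumption)


def collate(candidates, dictionary):
    """candidates: one list per character position of (char, distance), best first."""
    results = []
    for word in dictionary:                      # S4: collate with each dictionary word
        if len(word) != len(candidates):
            continue
        matched, total = 0, 0.0
        for ch, cands in zip(word, candidates):  # S5: count matches, sum distances
            for cand, dist in cands:
                if cand == ch:
                    matched += 1
                    total += dist
                    break
        results.append(Match(word, matched, total))
    if not results:
        return [], None
    best = max(r.matched for r in results)       # S6/S7: group with most matches
    group = sorted((r for r in results if r.matched == best),
                   key=lambda r: r.total_distance)  # S8: ascending total distance
    winners = [r.word for r in group
               if r.total_distance == group[0].total_distance]
    if len(winners) == 1:                        # S9: a single word needs no stroke check
        return winners, None
    first_rank = "".join(cands[0][0] for cands in candidates)
    return winners, first_rank


# Hypothetical recognition result: the second character (口) was misrecognized.
recognition_result = [
    [("川", 0.1)],
    [("日", 0.3), ("旧", 0.4), ("田", 0.5)],
    [("市", 0.1)],
    [("幸", 0.2)],
    [("町", 0.1)],
]
word_dictionary = ["川崎市幸町", "川口市幸町", "川越市幸町", "横浜市中区"]
words, first_rank = collate(recognition_result, word_dictionary)
print(words, first_rank)
```

With this input, three words tie on both the match count and the total distance, so all three are passed on with the first-rank string, which is the case the stroke number matching unit is meant to resolve.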

In FIG. 3, which illustrates an example of the processing, 11 is an example of a recognition result sent from the character recognition device; the correct answer is 「川口市幸町」 (Saiwai-cho, Kawaguchi City). In example 11, the correct character is not in the candidate character set produced by recognizing 「口」.

21 is an example of the internal representation of the word dictionary 2. The word dictionary matching unit 3 collates the recognition result 1 with each word in the word dictionary 2 and temporarily holds the results in the intermediate result storage section 31. It then sorts them in descending order of the number of matching characters, extracts the group with the largest number of matching characters, sorts that group in ascending order of total distance, and stores it in the word dictionary matching result section 32. If exactly one word has the smallest total distance, only that word is sent to the stroke number matching unit 5, the next stage of processing. If there are plural such words, all of them are passed to the stroke number matching unit 5 together with a character string formed from the candidate characters of a predetermined rank in the character recognition result. In this example, the word candidates 「川崎市幸町」, 「川口市幸町」, and 「川越市幸町」 and the candidate characters of the recognition result are sent to the stroke number matching unit 5.

FIG. 4 is a flowchart of the processing in the stroke number matching unit 5, showing the data, such as words, to be stored in each register and the processing performed at each stage.

FIG. 5 shows an example of processing according to the flow of FIG. 4. The stroke number matching unit 5 stores the three word candidates sent from the word dictionary matching unit 3 in rows 1 to 3 of register a-1, a two-dimensional array, and likewise stores the candidate characters of the recognition result in rows 1 to 3 of register b-1, also a two-dimensional array. Next, the contents of register a-1 are compared column by column from the first column, that is, the first character of row 1 against the first character of row 2, and so on. If all the characters in a column are equal, the symbol ○ is recorded for that column; otherwise × is recorded. In the example of FIG. 5, the characters in the second column differ, so × is recorded there.
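The column-wise comparison of register a-1 can be sketched in a few lines of Python. The function name `mark_columns` is hypothetical, and the list of strings merely stands in for the two-dimensional register; the sketch assumes all word candidates have the same length.

```python
def mark_columns(words):
    """Return one mark per column: ○ if every row has the same character, else ×."""
    # zip(*words) yields the columns of the word array one at a time.
    return ["○" if len(set(column)) == 1 else "×" for column in zip(*words)]


register_a1 = ["川崎市幸町", "川口市幸町", "川越市幸町"]
marks = mark_columns(register_a1)
print(marks)
```

Only the second column receives ×, matching the example of FIG. 5.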

Then the characters of register a-1 in the column marked × are written into register a-2. From register b-1, the candidate characters in the same column number as the column marked × in register a-1 are written into register b-2 in rank order from the top, that is, row 1, row 2, and so on.

The number of characters written into register b-2 can be set freely; in this embodiment it is set to the top three candidate characters. In the example of FIG. 5, 「口」, 「越」, and 「崎」 are written into register a-2, and 「は」, 「は」, and 「に」 into register b-2.

Next, the stroke count of each character in register a-2 is looked up in the stroke number table 41. The stroke count of each character in register b-2 is likewise looked up in the stroke number table 41, and the counts are sent to the divider 55, which divides the sum of the stroke counts by the number of characters to obtain their average. This average is then compared with the stroke count of each character in register a-2; the character in register a-2 whose stroke count is closest to the average for register b-2 is adopted, and the word candidate containing it is output as correct. In the example of FIG. 5, 「口」 (3 strokes) is closest to the average stroke count of register b-2 (= 3.3), so 「川口市幸町」, which contains it, is output as the correction result 6.

Although the embodiment uses distance values to judge the suitability of candidate characters, the same processing can be performed using similarity.
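The stroke-count comparison performed by the stroke number matching unit 5 can be sketched in Python as follows. The function name `select_by_stroke_count` and the keys `cand1` to `cand3` are hypothetical placeholders for the top three candidate characters, and their stroke counts (3, 3, 4) are assumed values chosen only so that the average comes out near the 3.3 quoted in the example. The counts for 口 (3), 越 (12), and 崎 (11) are the standard stroke counts of those characters.

```python
def select_by_stroke_count(words, column, top_candidates, stroke_table):
    """Pick the word whose character at `column` is closest in stroke count
    to the mean stroke count of the top candidate characters."""
    mean = sum(stroke_table[c] for c in top_candidates) / len(top_candidates)
    best = min(words, key=lambda w: abs(stroke_table[w[column]] - mean))
    return best, mean


# Stroke number table restricted to the characters needed here.
stroke_table = {
    "口": 3, "越": 12, "崎": 11,         # characters written to register a-2
    "cand1": 3, "cand2": 3, "cand3": 4,  # hypothetical register b-2 entries
}
word_candidates = ["川崎市幸町", "川口市幸町", "川越市幸町"]
corrected, mean = select_by_stroke_count(
    word_candidates, 1, ["cand1", "cand2", "cand3"], stroke_table
)
print(corrected, round(mean, 1))
```

Under these assumed counts the differences from the mean are about 7.7 for 崎, 0.3 for 口, and 8.7 for 越, so the word containing 口 is selected.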

A device for implementing the present invention can be realized as an integrated circuit incorporating dedicated circuitry that takes the character recognition result as input and outputs the correct word. Alternatively, it can be realized by adding to a general-purpose information processing circuit a control circuit with functions that achieve the object of the invention, and storing the necessary character information and the like in a memory circuit.

[Effects of the Invention] As described above, when performing character recognition post-processing, the present invention can select the optimum word candidate by predetermined processing from among characters with similar stroke counts, even when the result contains a character that was not recognized correctly. This makes the correction of characters that could not be recognized correctly after character recognition more accurate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the character recognition post-processing method of the present invention; FIG. 2 is a diagram for explaining an example of processing in the word dictionary matching unit 3 of FIG. 1; and FIG. 3 is a diagram showing the contents of the internal registers and stroke number table of the stroke number matching unit 5 of FIG. 1, together with the processing, for explaining an example of its processing.

1: Recognition result
2: Word dictionary
3: Word dictionary matching unit
4: Stroke number table
5: Stroke number matching unit
6: Correction result
11: Example of a recognition result sent from the character recognition device
21: Example of the internal representation of the word dictionary 2
31: Intermediate result storage section
32: Word dictionary matching result section
41: Stroke number table
55: Divider
Registers: a-1, b-1, a-2, b-2

Claims (1)

[Claims] A character recognition post-processing method for a character recognition device that recognizes an input character pattern and generates one or more candidate characters together with their distance values from, or similarities to, standard patterns, characterized by comprising: a word dictionary; a word dictionary matching unit that collates the word dictionary with the recognition result, assigns a priority to each word, and outputs the one or more words with the highest priority; a stroke number table recording the stroke counts of all character categories; and a stroke number matching unit that, when plural words are output from the word dictionary matching unit and, at the same position, the characters of those words are absent from the candidate character set of the recognition result, compares the stroke counts of the characters of the word set with the average stroke count of the top one or more candidate characters of the recognition result at that position, and outputs as correct the word whose character at that position has the stroke count closest to that average.
JP1092894A 1989-04-14 1989-04-14 Character recognition post-processing method Expired - Lifetime JP2790842B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1092894A JP2790842B2 (en) 1989-04-14 1989-04-14 Character recognition post-processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1092894A JP2790842B2 (en) 1989-04-14 1989-04-14 Character recognition post-processing method

Publications (2)

Publication Number Publication Date
JPH0373086A true JPH0373086A (en) 1991-03-28
JP2790842B2 JP2790842B2 (en) 1998-08-27

Family

ID=14067168

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1092894A Expired - Lifetime JP2790842B2 (en) 1989-04-14 1989-04-14 Character recognition post-processing method

Country Status (1)

Country Link
JP (1) JP2790842B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0540854A (en) * 1991-08-06 1993-02-19 Oki Electric Ind Co Ltd Post-processing method for character recognizing result
DE19547812A1 (en) * 1994-12-20 1996-07-04 Nec Corp Character reading device for image input containing character strings
JP2008293109A (en) * 2007-05-22 2008-12-04 Toshiba Corp Text processor and program

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0540854A (en) * 1991-08-06 1993-02-19 Oki Electric Ind Co Ltd Post-processing method for character recognizing result
DE19547812A1 (en) * 1994-12-20 1996-07-04 Nec Corp Character reading device for image input containing character strings
DE19547812C2 (en) * 1994-12-20 1999-08-05 Nec Corp Character string reader
US6014460A (en) * 1994-12-20 2000-01-11 Nec Corporation Character strings reading device
JP2008293109A (en) * 2007-05-22 2008-12-04 Toshiba Corp Text processor and program

Also Published As

Publication number Publication date
JP2790842B2 (en) 1998-08-27

Similar Documents

Publication Publication Date Title
US5982929A (en) Pattern recognition method and system
US4771385A (en) Word recognition processing time reduction system using word length and hash technique involving head letters
US5774588A (en) Method and system for comparing strings with entries of a lexicon
KR100324847B1 (en) Address reader and mails separater, and character string recognition method
US6978044B2 (en) Pattern string matching apparatus and pattern string matching method
US8103099B2 (en) Method and system for recognizing characters and character groups in electronically represented text
JPS6120038B2 (en)
US3810093A (en) Character recognizing system employing category comparison and product value summation
CN113128504A (en) OCR recognition result error correction method and device based on verification rule
JPH0373086A (en) Character recognition post-processing system
JPS6262388B2 (en)
JP2732593B2 (en) Character reading system
JPS5953986A (en) Character recognizing device
JPH0766423B2 (en) Character recognition device
JP2671311B2 (en) Address reader
JPS646514B2 (en)
JP3788262B2 (en) Address recognition system and address identification method thereof
JPH07271920A (en) Character recognizing device
Trenkle et al. Disambiguation and spelling correction for a neural network based character recognition system
JP2000090203A (en) Method and device for recognizing character
JPS6252912B2 (en)
JPS60138689A (en) Character recognizing method
JPH06139402A (en) Character recognizer
JP2001025714A (en) Read-out information analysis device of postal article
JPH0255825B2 (en)

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090612

Year of fee payment: 11

EXPY Cancellation because of completion of term