JPH0373086A - Character recognition post-processing system - Google Patents
Character recognition post-processing system
Info
- Publication number
- JPH0373086A
- Authority
- JP
- Japan
- Prior art keywords
- character
- word
- strokes
- word dictionary
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Character Discrimination (AREA)
Abstract
Description
[Detailed Description of the Invention]
[Industrial Application Field]
The present invention relates to a character recognition post-processing method for character recognition devices and the like. It uses the stroke counts of the candidate characters in a recognition result as information so that, even when the result contains characters that were not recognized correctly, the correct character with a similar stroke count can be selected and the error corrected, improving recognition accuracy.
[Prior Art]
In general, a character recognition device outputs, as its recognition result for each input character, a plurality of candidate characters together with their distance values or similarities. Conventional character recognition post-processing matches this recognition result against a word dictionary and accepts a dictionary word as correct if each of its characters is present in the candidate character set at the corresponding position. This approach has the drawback that correct output cannot be obtained when the correct character is not among the candidate characters of the recognition result.
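The conventional matching rule and its failure mode can be sketched as follows. This is a minimal illustration, not the device's implementation: the function name `conventional_match` and the candidate characters are hypothetical, and distance values are omitted for brevity.

```python
def conventional_match(candidate_sets, dictionary):
    """Return the dictionary words whose every character appears in the
    candidate set at the corresponding position."""
    return [
        word
        for word in dictionary
        if len(word) == len(candidate_sets)
        and all(ch in cands for ch, cands in zip(word, candidate_sets))
    ]


# Hypothetical recognition result: one candidate list per input character.
# The correct second character, 口, is missing from its candidate list.
candidate_sets = [["川", "州"], ["日", "山"], ["市", "布"], ["幸", "辛"], ["町", "叮"]]
dictionary = ["川崎市幸町", "川口市幸町", "川越市幸町"]

matches = conventional_match(candidate_sets, dictionary)  # no word survives
```

Because no dictionary word has its second character among the candidates, the conventional rule returns nothing, which is the drawback described above.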
[Problem to be Solved by the Invention]
The present invention exploits the property that, even when a character is misrecognized, the stroke counts of the candidate characters in the recognition result often do not differ much from the stroke count of the input character. Its object is to improve correction accuracy in character recognition post-processing by selecting a character with a similar stroke count even when the result contains characters that were not recognized correctly.
[Means for Solving the Problem]
To solve the above problem, the present invention comprises a word dictionary, a word dictionary matching unit that matches the word dictionary against the character recognition result, a stroke count table that records the stroke counts of all character categories, and a stroke count matching unit that receives the words output from the word dictionary matching unit and operates under predetermined conditions.
[Operation]
The word dictionary matching unit matches the recognized characters against the word dictionary, assigns a priority to each word, and outputs the one or more words with the highest priority. When the word dictionary matching unit outputs a plurality of words and, at the same position in those words, the corresponding character is not present in the candidate character set of the recognition result, the stroke count matching unit uses the stroke count table, which records the stroke counts of all character categories, to compare the stroke count of each word's character at that position with the average stroke count of the top one or more candidate characters at that position in the recognition result. The word whose character at that position has the stroke count closest to that average is output as the correct answer.
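The selection rule can be written compactly. The notation is introduced here only for illustration and does not appear in the source: $w_1, \dots, w_n$ are the words output by the word dictionary matching unit, $p$ is the position at which no word's character is among the candidates, $c_1, \dots, c_k$ are the top $k$ candidate characters at that position, and $s(\cdot)$ denotes a lookup in the stroke count table.

```latex
\bar{s} = \frac{1}{k} \sum_{j=1}^{k} s(c_j),
\qquad
\hat{w} = \operatorname*{arg\,min}_{w_i,\; 1 \le i \le n}
          \bigl|\, s\bigl(w_i[p]\bigr) - \bar{s} \,\bigr|
```

Here $\bar{s}$ is the average stroke count of the candidates and $\hat{w}$ is the word output as the correct answer.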
[Example]
An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 shows an embodiment of a device implementing the character recognition post-processing method of the present invention.
In FIG. 1, 1 is the recognition result, 2 is the word dictionary, 3 is the word dictionary matching unit, 4 is the stroke count table, 5 is the stroke count matching unit, and 6 is the correction result.
The recognition result 1 produced by character recognition is first input to the word dictionary matching unit 3. FIG. 2 is a flowchart of the processing in the word dictionary matching unit 3.
FIG. 3 illustrates an example in which the word dictionary matching unit processes words according to this flow.
In FIG. 2, step S1 recognizes a character string in the character pattern recognition device, step S2 segments it into words, and step S3 ranks the recognition candidates, calculates the distance for each character, and stores the result in a predetermined storage location as the recognition result. S4 matches the recognition result 1 against the words stored in the word dictionary 2. S5 calculates, for each matched word, the number of matching characters and the total distance, and stores them in the intermediate result storage unit of the word dictionary matching unit. S6 sorts the data processed in S5 in descending order of the number of matching characters. S7 extracts the group with the largest number of matching characters, and S8 sorts that group in ascending order of total distance value and stores it in a storage location. For the data processed in this way, S9 determines the number of words having the minimum total distance value. If there is exactly one such word, a decision output is generated that sends only that word to the stroke count matching unit in the next stage. If there are several, then in addition to those words, a character string formed from only the first-ranked candidate characters of the recognition result is sent to the stroke count matching unit.
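Steps S4 to S9 can be sketched in a few lines. This is an illustrative reading of the flowchart under stated assumptions, not the patented circuit: the function name `rank_words`, the data layout (one list of `(character, distance)` pairs per position, best first), and all distance values are hypothetical, and the total distance is assumed to be summed over matching characters only.

```python
def rank_words(candidates, dictionary):
    """Return (words tied for best, string of first-ranked candidates)."""
    scored = []
    for word in dictionary:  # S4: match each dictionary word
        if len(word) != len(candidates):
            continue
        matched, total = 0, 0
        for ch, cands in zip(word, candidates):
            dist = dict(cands).get(ch)
            if dist is not None:  # S5: count matches, sum their distances
                matched += 1
                total += dist
        scored.append((word, matched, total))
    if not scored:
        return [], ""
    best = max(m for _, m, _ in scored)  # S6/S7: keep the largest match count
    group = sorted((s for s in scored if s[1] == best), key=lambda s: s[2])  # S8
    min_total = group[0][2]  # S9: keep every word with the minimum total distance
    winners = [w for w, _, t in group if t == min_total]
    first_rank = "".join(cands[0][0] for cands in candidates)
    return winners, first_rank


# Hypothetical recognition result; 口 is missing at the second position.
candidates = [
    [("川", 10), ("州", 25), ("小", 40)],
    [("日", 12), ("山", 20), ("土", 33)],
    [("市", 8), ("布", 22), ("巾", 35)],
    [("幸", 9), ("辛", 18), ("率", 41)],
    [("町", 11), ("叮", 27), ("灯", 38)],
]
dictionary = ["川崎市幸町", "川口市幸町", "川越市幸町", "川口市栄町"]
winners, first_rank_string = rank_words(candidates, dictionary)
```

With this data, three words tie on four matching characters and the same total distance, so all three are passed on together with the first-ranked candidate string, which is the multi-word branch of S9.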
In FIG. 3, which illustrates an example of the processing, 11 is an example of a recognition result sent from the character recognition device; the correct answer is 「川口市幸町」 (Saiwai-cho, Kawaguchi City). In example 11, the candidate character set produced by recognizing 「口」 does not contain the correct character.
21 is an example of the internal representation of the word dictionary 2. The word dictionary matching unit 3 matches the recognition result 1 against each word in the word dictionary 2 and temporarily holds the results in the intermediate result storage unit 31. It then sorts them in descending order of the number of matching characters, extracts the group with the largest number of matching characters, sorts that group in ascending order of total distance value, and stores it in the word dictionary matching result unit 32. If exactly one word has the smallest total distance value, only that word is sent to the stroke count matching unit 5, the next stage of processing. If several such words exist, all of them are passed to the stroke count matching unit 5 together with a character string formed from the candidate characters of predetermined rank in the character recognition result. In this example, the word candidates 「川崎市幸町」, 「川口市幸町」, and 「川越市幸町」, together with the candidate characters of the recognition result, are sent to the stroke count matching unit 5.
FIG. 4 is a flowchart of the processing in the stroke count matching unit 5. It shows the data, such as words, to be stored in each register and the processing performed at each stage.
FIG. 5 shows an example of processing according to the flow of FIG. 4. The stroke count matching unit 5 stores the three word candidates sent from the word dictionary matching unit 3 in rows 1 to 3 of register a-1, a two-dimensional array, and likewise stores the candidate characters of the recognition result in rows 1 to 3 of register b-1, also a two-dimensional array. Next, the contents of register a-1 are compared column by column from the beginning (the first character of row 1 against the first character of row 2, and so on). If all the characters in a column are equal, the symbol ○ is recorded for that column; otherwise × is recorded. In the example of FIG. 5, the characters in the second column from the beginning differ, so × is recorded for that column.
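The column-by-column comparison of register a-1 can be sketched as follows. The function name `mark_columns` is hypothetical, and the register is modeled simply as a list of equal-length strings, one word per row.

```python
def mark_columns(rows):
    """Return '○' for each column whose characters all agree, '×' otherwise."""
    return ["○" if len(set(column)) == 1 else "×" for column in zip(*rows)]


# Register a-1: the three word candidates from the example, one per row.
register_a1 = ["川崎市幸町", "川口市幸町", "川越市幸町"]
marks = mark_columns(register_a1)
differing_columns = [i for i, mark in enumerate(marks) if mark == "×"]
```

Only the second column (index 1 when counting from zero) is marked ×, matching the example of FIG. 5.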
Then the characters of register a-1 in the column marked × are written into register a-2, and the candidate characters of register b-1 belonging to the same column number as the × column are written into register b-2 in order from the top rank, that is, row 1, row 2, and so on.
The number of characters written into register b-2 can be set freely; in this embodiment it is set to the top three candidate characters. In the example of FIG. 5, 「口」, 「越」, and 「崎」 are written into register a-2, and 「は」, 「は」, and 「に」 are written into register b-2.
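Filling registers a-2 and b-2 amounts to reading one column out of each two-dimensional register. The sketch below assumes the registers are lists of equal-length strings; the function name `extract_column`, the `limit` parameter, and the contents of `register_b1` are hypothetical and chosen only for illustration.

```python
def extract_column(rows, col, limit=None):
    """Return the characters in column `col`, top row first.

    `limit` caps how many rows are read, mirroring the configurable
    number of candidate characters written into register b-2.
    """
    chars = [row[col] for row in rows]
    return chars if limit is None else chars[:limit]


register_a1 = ["川崎市幸町", "川口市幸町", "川越市幸町"]  # word candidates
register_b1 = ["川日市幸町", "州山布辛叮", "小土巾率灯"]  # hypothetical candidates by rank
differing_col = 1  # the column marked × (second from the beginning)

register_a2 = extract_column(register_a1, differing_col)
register_b2 = extract_column(register_b1, differing_col, limit=3)
```

The order of characters in `register_a2` follows the row order of `register_a1`, which is an assumption of this sketch.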
Next, the stroke count of each character in register a-2 is looked up in the stroke count table 41. The stroke count of each character in register b-2 is likewise looked up in the stroke count table 41, and the counts are sent to the divider 55, which divides the sum of the stroke counts by the number of characters to obtain their average. This average is then compared with the stroke count of each character in register a-2. The character in register a-2 whose stroke count is closest to the average stroke count of the characters in register b-2 is adopted, and the word candidate containing it is output as the correct answer. In the example of FIG. 5, 「口」 (3 strokes) is closest to the average stroke count of register b-2 (= 3.3), so 「川口市幸町」, which contains it, is output as the correction result 6.
Although the embodiment uses distance values to judge the suitability of candidate characters, the same processing can be performed using similarity.
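The stroke count comparison in the FIG. 5 example can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `pick_by_strokes` is hypothetical, the stroke table holds only the characters needed here, and the three candidate characters (山, 土, 日) are stand-ins chosen so that their average is about 3.3, the value reported in the example.

```python
# Small excerpt of a stroke count table; a real table covers all categories.
STROKES = {"口": 3, "越": 12, "崎": 11, "山": 3, "土": 3, "日": 4}


def pick_by_strokes(words, col, candidate_chars, strokes):
    """Return (chosen word, average stroke count of the candidates).

    The chosen word is the one whose character at `col` has the stroke
    count closest to the average stroke count of `candidate_chars`.
    """
    average = sum(strokes[c] for c in candidate_chars) / len(candidate_chars)
    chosen = min(words, key=lambda w: abs(strokes[w[col]] - average))
    return chosen, average


words = ["川崎市幸町", "川口市幸町", "川越市幸町"]
candidate_chars = ["山", "土", "日"]  # hypothetical top-3 candidates (3, 3, 4 strokes)
corrected, average = pick_by_strokes(words, 1, candidate_chars, STROKES)
```

The differences from the average are roughly 7.7 for 崎, 0.3 for 口, and 8.7 for 越, so the word containing 口 is selected. If a similarity-based recognizer is used instead, only the upstream ranking changes; this step depends on stroke counts alone.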
A device for implementing the present invention can be realized as an integrated circuit incorporating dedicated circuitry that takes the character recognition result as input and outputs the correct word. Alternatively, it can be realized by adding to a general-purpose information processing circuit a control circuit with functions that achieve the object of the present invention, and storing the necessary character information and the like in a memory circuit.
[Effects of the Invention]
As described above, when performing character recognition post-processing, the present invention can select the best word candidate from among characters with similar stroke counts by predetermined processing, even when the result contains characters that were not recognized correctly. This improves the accuracy of correcting characters that could not be recognized correctly.
[Brief Description of the Drawings]
FIG. 1 is a block diagram showing an embodiment of the character recognition post-processing method of the present invention. FIG. 2 is a diagram for explaining an example of processing in the word dictionary matching unit 3 of FIG. 1. FIG. 3 is a diagram showing the contents of the internal registers and the stroke count table, and the processing, to explain an example of processing in the stroke count matching unit 5 of FIG. 1.
1: Recognition result
2: Word dictionary
3: Word dictionary matching unit
4: Stroke count table
5: Stroke count matching unit
6: Correction result
11: Example of a recognition result sent from the character recognition device
21: Example of the internal representation of the word dictionary 2
31: Intermediate result storage unit
32: Word dictionary matching result unit
41: Stroke count table
Register a-1
Register b-1
Register a-2
Register b-2
55: Divider
Claims (1)
A character recognition post-processing method for a character recognition device that recognizes an input character pattern and generates one or more candidate characters together with their distance values from, or similarities to, standard patterns, characterized by comprising: a word dictionary; a word dictionary matching unit that matches the recognition result against the word dictionary, assigns a priority to each word, and outputs the one or more words with the highest priority; a stroke count table that records the stroke counts of all character categories; and a stroke count matching unit that, when the word dictionary matching unit outputs a plurality of words and, at the same position in those words, the corresponding character is not present in the candidate character set of the recognition result, compares the stroke count of each word's character at that position with the average stroke count of the top one or more candidate characters at that position in the recognition result, and outputs as the correct answer the word whose character at that position has the stroke count closest to that average.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1092894A JP2790842B2 (en) | 1989-04-14 | 1989-04-14 | Character recognition post-processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1092894A JP2790842B2 (en) | 1989-04-14 | 1989-04-14 | Character recognition post-processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH0373086A true JPH0373086A (en) | 1991-03-28 |
JP2790842B2 JP2790842B2 (en) | 1998-08-27 |
Family
ID=14067168
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP1092894A Expired - Lifetime JP2790842B2 (en) | 1989-04-14 | 1989-04-14 | Character recognition post-processing method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2790842B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0540854A (en) * | 1991-08-06 | 1993-02-19 | Oki Electric Ind Co Ltd | Post-processing method for character recognizing result |
DE19547812A1 (en) * | 1994-12-20 | 1996-07-04 | Nec Corp | Character reading device for image input containing character strings |
JP2008293109A (en) * | 2007-05-22 | 2008-12-04 | Toshiba Corp | Text processor and program |
-
1989
- 1989-04-14 JP JP1092894A patent/JP2790842B2/en not_active Expired - Lifetime
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0540854A (en) * | 1991-08-06 | 1993-02-19 | Oki Electric Ind Co Ltd | Post-processing method for character recognizing result |
DE19547812A1 (en) * | 1994-12-20 | 1996-07-04 | Nec Corp | Character reading device for image input containing character strings |
DE19547812C2 (en) * | 1994-12-20 | 1999-08-05 | Nec Corp | Character string reader |
US6014460A (en) * | 1994-12-20 | 2000-01-11 | Nec Corporation | Character strings reading device |
JP2008293109A (en) * | 2007-05-22 | 2008-12-04 | Toshiba Corp | Text processor and program |
Also Published As
Publication number | Publication date |
---|---|
JP2790842B2 (en) | 1998-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5982929A (en) | Pattern recognition method and system | |
US4771385A (en) | Word recognition processing time reduction system using word length and hash technique involving head letters | |
US5774588A (en) | Method and system for comparing strings with entries of a lexicon | |
KR100324847B1 (en) | Address reader and mails separater, and character string recognition method | |
US6978044B2 (en) | Pattern string matching apparatus and pattern string matching method | |
US8103099B2 (en) | Method and system for recognizing characters and character groups in electronically represented text | |
JPS6120038B2 (en) | ||
US3810093A (en) | Character recognizing system employing category comparison and product value summation | |
CN113128504A (en) | OCR recognition result error correction method and device based on verification rule | |
JPH0373086A (en) | Character recognition post-processing system | |
JPS6262388B2 (en) | ||
JP2732593B2 (en) | Character reading system | |
JPS5953986A (en) | Character recognizing device | |
JPH0766423B2 (en) | Character recognition device | |
JP2671311B2 (en) | Address reader | |
JPS646514B2 (en) | ||
JP3788262B2 (en) | Address recognition system and address identification method thereof | |
JPH07271920A (en) | Character recognizing device | |
Trenkle et al. | Disambiguation and spelling correction for a neural network based character recognition system | |
JP2000090203A (en) | Method and device for recognizing character | |
JPS6252912B2 (en) | ||
JPS60138689A (en) | Character recognizing method | |
JPH06139402A (en) | Character recognizer | |
JP2001025714A (en) | Read-out information analysis device of postal article | |
JPH0255825B2 (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20090612; Year of fee payment: 11 |
 | EXPY | Cancellation because of completion of term | |