JPH0442382A - Word reader - Google Patents

Word reader

Info

Publication number
JPH0442382A
JPH0442382A JP2149089A JP14908990A JPH0442382A JP H0442382 A JPH0442382 A JP H0442382A JP 2149089 A JP2149089 A JP 2149089A JP 14908990 A JP14908990 A JP 14908990A JP H0442382 A JPH0442382 A JP H0442382A
Authority
JP
Japan
Prior art keywords
kanji
word
hiragana
words
word dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2149089A
Other languages
Japanese (ja)
Inventor
Hiroyasu Miyahara
景泰 宮原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to JP2149089A
Publication of JPH0442382A

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To determine a word accurately at all times by tentatively converting all the kanji among the candidate characters to hiragana when the input word may contain hiragana, collating the result with a hiragana word dictionary, and using that collation result as well when determining the input word. CONSTITUTION: First, the candidate characters obtained from a character recognition means 3, together with their ranking, are collated with a kanji word dictionary 6. If the collation with a particular kanji word succeeds and, for that word, every candidate ranked above each of the candidate characters forming the word is a kanji, the kanji word is output as is. Otherwise, all the kanji among the candidate characters are tentatively converted to hiragana using a kanji/kana conversion table 8 and collated with a hiragana word dictionary 7, and the most probable word among the resulting hiragana words and the kanji word found above is output. A word whose notation varies among kanji, kana, and mixed kanji and kana can thus be determined accurately regardless of its notation.

Description

[Detailed Description of the Invention] [Industrial Field of Application] This invention relates to a word reading device that reads words written on a form, and in particular to a word reading device that reads words whose notation varies, for example among kanji, kana, and mixed kanji and kana.

[Prior Art]

FIG. 7 is a block diagram of a conventional word reading device, as shown for example in "Use of Word Information in Handwritten Kanji Recognition," 1982 (Showa 57) General National Conference of the Institute of Electronics and Communication Engineers (1341). In the figure, 1 is a form on which words are written; 2 is a scanning means that optically scans the words on the form 1 and photoelectrically converts them; 3 is a character recognition means that segments the photoelectrically converted word into individual characters, recognizes them, and outputs recognition candidate characters; 4 is a word dictionary storing the words to be written on the form 1; and 5 is a word determining means that determines the input word by collating the candidate characters output from the character recognition means 3 with the words contained in the word dictionary 4.

FIG. 8 shows an example of an input word and its candidate characters.

9 is the input word, and 10 denotes the candidate characters corresponding to the individual characters of the input word 9. As shown in the figure, the candidate characters 10 are ranked starting from first place.

FIG. 9 shows an example of the word dictionary. 11 is the word "石川", 12 is the word "石田", 13 is the word "石田" [sic], 14 is the word "右田", and 15 is the word "右谷".

Next, the operation of this conventional example will be described. When the input word 9 shown in FIG. 8 has been written on the form 1, the scanning means 2 photoelectrically converts it and outputs the result to the character recognition means 3. The character recognition means 3 segments the individual character patterns from the obtained pattern, recognizes them, and outputs a plurality of candidate characters 10 ordered by decreasing similarity. In FIG. 8, the first-place candidate "右" and the second-place candidate "石" are output for the first character of the input word, and the first-place candidate "図", the second-place candidate "田", and the third-place candidate "団" are output for the second character. The word determining means 5 collates the candidate characters 10 with the words in the word dictionary 4. As a concrete collation method, the evaluation value $P_j$ of the $j$-th word $A_j$ in the word dictionary 4 is first calculated, for example by the following equation, where $n$ is the number of characters in the word.

$$P_j = \sum_{i=1}^{n} p_i, \qquad p_i = \begin{cases} \text{rank of the } i\text{-th character of } A_j \text{ among the candidate characters} & \text{if it is a candidate} \\ l & \text{otherwise } (l\text{: a sufficiently large constant}) \end{cases}$$

Then, if there exists a word $A_k$ such that

$$P_k / n = \min_j \left( P_j / n \right) < T \quad (T\text{: threshold}),$$

$A_k$ is taken as the determination result.

An example in which the candidate characters 10 of FIG. 8 are collated with the word dictionary 4 of FIG. 9 will now be described in detail. In this example, the constants in the above decision condition are l = 10 and T = 5. Since the input word 9 is a two-character word, n = 2. The word determining means first collates the candidates with the first word "石川" 11.

"石" is in second place and "川" is not among the candidates, so P1 = 12. Next, the second word "石田" 12 is collated: "石" is in second place and "田" is in second place, so P2 = 4. Word 13, "右田" 14, and "右谷" 15 are collated in the same way, giving P3 = 12, P4 = 3, and P5 = 11.

Here,

$$\min_j \left( P_j / n \right) = P_4 / n < T \ (= 5)$$

holds, so the fourth word "右田" 14 is output as the determination result.
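The worked example above can be reproduced with a short Python sketch. The function and variable names are hypothetical and not part of the patent; the per-character score (rank if the character is a candidate, otherwise the constant l) is inferred from the worked values, and word 13 is left out because its spelling in this text duplicates word 12.

```python
# Minimal sketch of the conventional collation, assuming l = 10 and T = 5
# as in the worked example. All names here are illustrative only.
OUT_OF_CANDIDATES = 10  # the large constant l
THRESHOLD = 5           # the threshold T


def evaluation_value(word, candidates):
    """Sum the 1-based rank of each character of `word` among the candidates
    for its position, or l when the character is not a candidate."""
    total = 0
    for ch, ranked in zip(word, candidates):
        total += (ranked.index(ch) + 1) if ch in ranked else OUT_OF_CANDIDATES
    return total


def decide_word(dictionary, candidates):
    """Return the word with the smallest P/n if it is below T, else None."""
    n = len(candidates)
    same_length = [w for w in dictionary if len(w) == n]
    if not same_length:
        return None
    best = min(same_length, key=lambda w: evaluation_value(w, candidates))
    return best if evaluation_value(best, candidates) / n < THRESHOLD else None


# Candidate characters of FIG. 8, ranked from first place.
candidates = [["右", "石"], ["図", "田", "団"]]
# Words 11, 12, 14 and 15 of FIG. 9.
dictionary = ["石川", "石田", "右田", "右谷"]

scores = {w: evaluation_value(w, candidates) for w in dictionary}
decided = decide_word(dictionary, candidates)
print(scores)   # {'石川': 12, '石田': 4, '右田': 3, '右谷': 11}
print(decided)  # 右田
```

Restricting the search to dictionary words of the same length as the input is a simplification made for this sketch; the text does not say how words of other lengths are handled.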

Thus, without the word determining means 5, the erroneous result "右図" (the first-place candidate for each character) would have been output; by providing the word determining means 5 and collating the candidates with the words in the word dictionary 4, the correct word "右田" is obtained.

[Problem to Be Solved by the Invention]

Because the conventional word reading device is configured as described above, every notation in which a word may be written on the form (kanji, kana, mixed kanji and kana, and so on) must be stored in the word dictionary. As a result, even for the same word, a notation stored in the word dictionary can be determined correctly, whereas a notation that is not stored cannot.

This invention has been made to solve the above problem, and its object is to provide a word reading device that can correctly determine a word whose notation varies among kanji, kana, mixed kanji and kana, and so on, regardless of the notation used.

[Means for Solving the Problem]

The word reading device according to this invention comprises: a scanning means 2 that reads a word on a form 1; a character recognition means 3 that recognizes each character pattern constituting the read word and outputs a plurality of candidate characters ordered by decreasing similarity; a kanji word dictionary 6 that stores, in kanji, the words to be written on the form 1; a hiragana word dictionary 7 that stores the contents of the kanji word dictionary 6 with everything replaced by hiragana; a kanji-kana conversion table 8 that gives, character by character, the hiragana corresponding to every kanji in the kanji word dictionary 6; and a word determining means 5 that takes as input the candidate characters 17 output from the character recognition means 3 and determines the word by referring to the kanji word dictionary 6, the hiragana word dictionary 7, and the kanji-kana conversion table 8.

[Operation]

The word determining means 5 of this invention first collates the candidate characters 17 obtained from the character recognition means 3, together with their ranking, with the kanji word dictionary 6. If the collation with a particular kanji word succeeds and, for that kanji word, every candidate ranked above each of the candidate characters forming the word is a kanji, the kanji word is output as is. Otherwise, all the kanji among the candidate characters 17 are temporarily replaced with hiragana 27 using the kanji-kana conversion table 8 and collated with the hiragana word dictionary 7, and the most probable word among the resulting hiragana words and the kanji word found above is output.
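The test that decides whether a matched kanji word can be output directly might be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the names are hypothetical, a kanji is detected with a simple range check on the basic CJK Unified Ideographs block, and a word character that is not a candidate at all is treated as if every listed candidate ranked above it.

```python
# Sketch of the "all higher-ranked candidates are kanji" condition.
# Names and the kanji range check are assumptions made for illustration.

def is_kanji(ch: str) -> bool:
    """Rough kanji test: basic CJK Unified Ideographs block only."""
    return "\u4e00" <= ch <= "\u9fff"


def higher_candidates_all_kanji(word, candidates):
    """True if, at every position, all candidates ranked above the word's
    character are kanji. A character absent from the candidates is treated
    as ranked below all of them (an assumption of this sketch)."""
    for ch, ranked in zip(word, candidates):
        higher = ranked[:ranked.index(ch)] if ch in ranked else ranked
        if not all(is_kanji(c) for c in higher):
            return False
    return True


# Candidate characters of FIG. 2 as listed in the embodiment.
fig2 = [["宮", "営"], ["ざ", "ぎ", "之"], ["さ", "き"]]
# Candidate characters of FIG. 8 from the conventional example.
fig8 = [["右", "石"], ["図", "田", "団"]]

print(higher_candidates_all_kanji("宮之原", fig2))  # False: "ざ" outranks "之"
print(higher_candidates_all_kanji("右田", fig8))    # True: only "図" outranks "田"
```

When the check returns False, the input may have been written partly in hiragana, which is the case where the hiragana dictionary is consulted.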

[Embodiment of the Invention]

An embodiment of this invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of this invention; the form 1, scanning means 2, character recognition means 3, and word determining means 5 denote the same components as in the conventional example.

In the figure, 6 is a kanji word dictionary that stores, in kanji, the words to be written on the form 1; 7 is a hiragana word dictionary in which the contents of the kanji word dictionary 6 are all replaced by hiragana; and 8 is a kanji-kana conversion table that converts the kanji in the kanji word dictionary 6 to the corresponding hiragana.

FIG. 2 shows an example of an input word and its candidate characters; 16 is the input word and 17 denotes the candidate characters. In the example shown, the first-place candidate for the first character of the input word 16 is "宮" and the second-place candidate is "営"; for the second character, the first-place candidate is "ざ", the second-place candidate is "ぎ", and the third-place candidate is "之"; and for the third character, the first-place candidate is "さ" and the second-place candidate is "き".

FIG. 3 shows an example of the kanji word dictionary 6; 18 is the word "宮之原", 19 is the word "山之上", and 20 is the word "長谷川".

FIG. 4 shows an example of the kanji-kana conversion table 8; 21 is the kanji "宮", 22 is the kanji "営", and 23 is the kanji "之". Further, 24 is the hiragana "みや" corresponding to "宮" 21, 25 is the hiragana "えい" corresponding to "営" 22, and 26 is the hiragana "の" corresponding to "之" 23.

FIG. 5 shows the candidate characters 17 of FIG. 2 with all the kanji replaced by hiragana; 27 denotes the resulting candidate characters, which consist only of hiragana.
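The conversion from the candidates of FIG. 2 to those of FIG. 5 amounts to a table lookup per candidate. The following Python sketch uses hypothetical names and holds the conversion table of FIG. 4 as a plain dictionary, which is an assumption about representation only.

```python
# Sketch of the temporary kanji-to-hiragana conversion of the candidates.
# The dictionary mirrors the three entries of FIG. 4; names are illustrative.
KANJI_TO_KANA = {"宮": "みや", "営": "えい", "之": "の"}


def to_hiragana_candidates(candidates, table):
    """Replace every kanji candidate by its reading; keep other candidates."""
    return [[table.get(ch, ch) for ch in ranked] for ranked in candidates]


# Candidate characters 17 of FIG. 2, ranked from first place.
fig2 = [["宮", "営"], ["ざ", "ぎ", "之"], ["さ", "き"]]

fig5 = to_hiragana_candidates(fig2, KANJI_TO_KANA)
print(fig5)  # [['みや', 'えい'], ['ざ', 'ぎ', 'の'], ['さ', 'き']]
```

Ranks are preserved by position in each list, so a reading such as "みや" keeps the first-place rank of the kanji it came from.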

FIG. 6 shows an example of the hiragana word dictionary 7; 28 is the word "あいざわ", 29 is the word "えのもと", and 30 is the word "みやざき".

Next, the operation of this embodiment will be described. After the same processing as in the conventional example, the word determining means 5 obtains the candidate characters 17 for the input word 16 and first collates them with the kanji word dictionary 6. As a concrete example, consider collating the input word 16 of FIG. 2 with the kanji word dictionary 6 of FIG. 3. The collation method uses the word evaluation value Pj as in the conventional example, with the same constant values. However, since the input word 16 has three characters, n = 3. In this example, the candidates are first collated with the three-character word "宮之原" 18 in the kanji word dictionary 6. "宮" is in first place, "之" is in second place, and "原" is not among the candidates, so P1 = 13. "山之上" 19 and "長谷川" 20 are collated in the same way, giving P2 = 22 and P3 = 30. Here min(Pj/n) = P1/n < T (= 5), so the collation with "宮之原" 18 succeeds. However, because the hiragana "ざ" is among the candidates ranked above "之", the word determining means 5 does not output "宮之原" 18 as the determination result. It proceeds to the next step and converts all the kanji in the candidate characters 17 to hiragana using the kanji-kana conversion table 8 shown in FIG. 4. Specifically, "宮" 21, "営" 22, and "之" 23 are temporarily converted to the corresponding hiragana "みや" 24, "えい" 25, and "の" 26.

As a result, the candidate characters 17 of FIG. 2 become the candidate characters 27 shown in FIG. 5. These candidate characters 27 are then collated with a hiragana word dictionary 7 such as that shown in FIG. 6. The collation method here is basically the same as in the conventional example. However, for a candidate 27 in which one kanji candidate 17 has been converted to several hiragana (for example "みや" and "えい" in FIG. 5), only when all the hiragana corresponding to that one kanji match is the rank held by the kanji before conversion assigned to each of those hiragana characters; otherwise they are treated as not being candidates.

This is explained using the example of collating the candidate characters 27 of FIG. 5 with the hiragana word dictionary 7 of FIG. 6. The word determining means 5 first collates the candidate characters 27 with "あいざわ" 28. The first two characters of the candidates 27 were produced by converting a single kanji, so unless both characters match they are treated as not being candidates. The first two characters of "あいざわ" 28 match none of them, so both are non-candidates; the third character "ざ" is in first place and the fourth character "わ" is not a candidate, so P01 = 31. Next comes the collation with "えのもと" 29, in which every character is a non-candidate, so P02 = 40. In the collation with "みやざき" 30, the first two characters "みや" are present in the candidates 27 and the original kanji candidate was in first place, so the first and second characters are both given rank 1. The third character "ざ" is in first place and the fourth character "き" is in second place, so P03 = 5.

Finally, for P01 to P03 and for P1, which was successfully collated as a kanji word, min(P/n) = P03/n < T (= 5) holds, so "みやざき" is selected as the candidate word. When the result is output, however, the "みや" 24 that had temporarily been replaced with hiragana is restored to the kanji "宮" 21, and the word is output in the form "宮ざき".
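The hiragana collation and the final choice can be sketched end to end in Python. Everything below is an illustration under assumptions: the names are hypothetical, the alignment of dictionary characters to input positions is done by a small recursive search that the text does not specify, n for a hiragana word is taken to be its number of hiragana characters, and the kanji result P1 = 13 with n = 3 is copied from the text rather than recomputed.

```python
# Sketch of hiragana collation with the "all kana of one kanji must match"
# rule, followed by the final decision. l = 10 and T = 5 as in the text.
OUT = 10  # the large constant l
T = 5     # the threshold

KANJI_TO_KANA = {"宮": "みや", "営": "えい", "之": "の"}  # FIG. 4


def convert(candidates, table):
    """Pair each candidate's reading with the original character."""
    return [[(table.get(c, c), c) for c in ranked] for ranked in candidates]


def align(word, positions):
    """Return (P, surface) with the smallest P for `word`, or None if the
    word cannot be laid out over the input positions. A fully matched
    reading scores its rank per kana; otherwise each kana scores l."""
    if not positions:
        return (0, "") if word == "" else None
    head, rest = positions[0], positions[1:]
    options = []  # (cost, kana consumed, text to output)
    for rank, (reading, original) in enumerate(head, start=1):
        if word.startswith(reading):
            options.append((rank * len(reading), len(reading), original))
    for length in sorted({len(reading) for reading, _ in head}):
        if len(word) >= length:
            options.append((OUT * length, length, word[:length]))
    best = None
    for cost, length, surface in options:
        tail = align(word[length:], rest)
        if tail is not None and (best is None or cost + tail[0] < best[0]):
            best = (cost + tail[0], surface + tail[1])
    return best


fig2 = [["宮", "営"], ["ざ", "ぎ", "之"], ["さ", "き"]]  # candidates 17
hiragana_dictionary = ["あいざわ", "えのもと", "みやざき"]   # FIG. 6

positions = convert(fig2, KANJI_TO_KANA)
results = {w: align(w, positions) for w in hiragana_dictionary}

# Pool the hiragana results with the kanji word matched earlier
# (P1 = 13, n = 3, taken from the text) and pick the smallest P/n.
scored = [(p_s[0] / len(w), p_s[1]) for w, p_s in results.items() if p_s]
scored.append((13 / 3, "宮之原"))
best_score, best_word = min(scored)
decided = best_word if best_score < T else None

print(results)  # P values 31, 40 and 5
print(decided)  # 宮ざき
```

Carrying the original character alongside each reading is what lets the matched "みや" be written back as "宮" in the output.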

In the above embodiment, one set of hiragana corresponds to each kanji in the kanji-kana conversion table, but a plurality of sets of hiragana may be associated with one kanji.

[Effect of the Invention]

As described above, according to this invention, a kanji word dictionary, a hiragana word dictionary, and a kanji-kana conversion table are added to the word determining means; when the input word may contain hiragana, all the kanji among the candidate characters are temporarily converted to hiragana using the kanji-kana conversion table and collated with the hiragana word dictionary, and that collation result is also used in determining the input word. The word can therefore be determined correctly regardless of variations in notation such as kanji, kana, and mixed kanji and kana.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a device according to an embodiment of this invention; FIGS. 2 and 8 show examples of an input word and its candidate characters; FIG. 3 shows an example of the kanji word dictionary; FIG. 4 shows an example of the kanji-kana conversion table; FIG. 5 shows the candidate characters of FIG. 2 with all the kanji replaced by hiragana; FIG. 6 shows an example of the hiragana word dictionary; FIG. 7 is a block diagram of a conventional device; and FIG. 9 shows an example of the word dictionary used in the conventional device.
1: form; 2: scanning means; 3: character recognition means; 5: word determining means; 6: kanji word dictionary; 7: hiragana word dictionary; 8: kanji-kana conversion table.
In the figures, the same reference numerals denote the same or corresponding components.
Agent: Patent Attorney Jun Miyazono

Claims (1)

[Claims]
A word reading device comprising: a scanning means that reads a word on a form; a character recognition means that recognizes each character pattern constituting the read word and outputs a plurality of candidate characters ordered by decreasing similarity; a kanji word dictionary that stores, in kanji, the words to be written on the form; a hiragana word dictionary that stores the contents of the kanji word dictionary with everything replaced by hiragana; a kanji-kana conversion table that gives, character by character, the hiragana corresponding to every kanji in the kanji word dictionary; and a word determining means that takes as input the candidate characters output from the character recognition means and determines the word by referring to the kanji word dictionary, the hiragana word dictionary, and the kanji-kana conversion table, wherein the word determining means first collates the candidate characters obtained from the character recognition means, together with their ranking, with the kanji word dictionary; if the collation with a particular kanji word succeeds and, for that kanji word, every candidate ranked above each of the candidate characters forming the word is a kanji, outputs the kanji word as is; and otherwise temporarily replaces all the kanji among the candidate characters with hiragana using the kanji-kana conversion table, collates them with the hiragana word dictionary, and outputs the most probable word among the resulting hiragana words and the kanji word found above.
JP2149089A 1990-06-07 1990-06-07 Word reader Pending JPH0442382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2149089A JPH0442382A (en) 1990-06-07 1990-06-07 Word reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2149089A JPH0442382A (en) 1990-06-07 1990-06-07 Word reader

Publications (1)

Publication Number Publication Date
JPH0442382A true JPH0442382A (en) 1992-02-12

Family

ID=15467460

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2149089A Pending JPH0442382A (en) 1990-06-07 1990-06-07 Word reader

Country Status (1)

Country Link
JP (1) JPH0442382A (en)
