JPH0319590B2 - Google Patents

Info

Publication number
JPH0319590B2
Authority
JP
Japan
Prior art keywords
character
word
misread
dictionary
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP58204183A
Other languages
Japanese (ja)
Other versions
JPS6097477A (en)
Inventor
Tozen Hai
Eiichiro Yamamoto
Yukikazu Kaburayama
Yoshihisa Fujii
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP58204183A
Publication of JPS6097477A
Publication of JPH0319590B2
Granted

Description

[Detailed Description of the Invention]

Technical Field of the Invention

The present invention relates to a method for correcting misread characters in a character recognition device equipped with a word dictionary.

Prior Art and Problems

A character recognition device that optically reads handwritten or printed character strings shows the recognition result on a display and provides a function for correcting any incorrectly recognized characters (misread characters) according to the operator's instructions.

In a typical conventional correction method, the recognition results for each character are prepared from rank 1 to rank N. The operator points out a misread character, and the device displays the next-ranked recognition result in response. If that is also wrong, the next-ranked result is displayed, and so on, presenting successively lower-ranked candidates until the operator selects the correct character. The drawback of this method is that it does not use the relationship between the misread character and its neighbors. For example, when one character of a word has been misread, even meaningless candidates that form no existing word in combination with the other characters are output one after another, merely because they are on the candidate list, and the operator has to choose among them. This increases the operator's burden and prolongs the time needed for correction.

The present inventors therefore previously proposed a misread character correction method that uses a word dictionary. In that method, when the operator specifies the range of a word containing a misread character, the recognition device combines the candidates of the characters in the range and outputs only those word candidates that actually exist in the word dictionary. Character candidates that form no word are never output, so the operator's correction work is simplified and sped up accordingly.

Putting this method into practice, however, requires attention to the following point. Words consist of two characters, three characters, and so on, and two-character words are very numerous. There is therefore a risk that the rank-1 to rank-N candidates of the characters in such a word frequently combine into meaningful words that the operator then has to reject. For example, however carefully a word has been looked up and output as one found in the dictionary, it cannot be the correct answer if it contains the misread character.

Object of the Invention

The present invention further improves the word-based correction method described above. It aims to speed up the correction of misread characters by keeping the correctly read characters as they are and outputting only words in which they are combined with the other candidates for the misread character.

Structure of the Invention

The present invention is a misread character correction method for a character recognition device comprising: an identification unit that identifies an optically read input character string character by character; a recognition dictionary that stores in advance the character candidates referred to by the identification unit; a candidate category buffer that stores the recognition results produced by the identification unit, ranked in multiple positions per character; a display unit that displays the character string formed by the rank-1 characters in the buffer; an operation unit with which a location to be corrected in the displayed character string can be specified; and a word dictionary that stores the words referred to during word-based character correction. The method is characterized in that, when the misread character and the range of the word containing it are specified on the display unit through the operation unit, words of the same length as the range are formed by keeping the correct characters as they are and combining them with the other candidate characters for the misread character; these words are looked up in the word dictionary and those that exist in it are extracted; a distance is calculated for each extracted word by referring to the rank of its candidate character in the buffer; and the word with the smallest distance is displayed as an alternative candidate for the word range containing the misread character. The invention is described in detail below with reference to the illustrated embodiment.

Embodiment of the Invention

Table 1 shows the recognition results of a typical character recognition device for the input character string 「この分野における進歩……」 ("Progress in this field ..."), that is, the candidate characters ranked 1st to 5th for each character.

[Table 1]

The ranks 1, 2, ... of the candidate characters (candidate categories) are in descending order of similarity, and the rank-1 candidate of each character is displayed first. From this display the operator judges whether the input character string has been read correctly. In the example of Table 1, the fourth character 「野」 and the ninth character 「進」 of the input string have been misread as 「師」 and 「追」, respectively, so correction is needed.

Table 2 compares conventional method 1, in which errors are indicated character by character, conventional method 2, in which errors are specified word by word, and the method of the present invention, by how the error is indicated and by selection rank.
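As a rough illustration of the first-rank display described for Table 1, the following Python sketch holds a small candidate buffer and compares the displayed string with the intended input. Only the candidates named in the description are real. The full contents of Table 1 are not available in this text, so the rank-1 entries of the correctly read positions are inferred from the statement that only the fourth and ninth characters were misread. The names `candidate_buffer`, `displayed_string` and `misread_positions` are hypothetical, and the comparison against the true input merely stands in for the operator's visual check.

```python
# Input string from the description (the trailing "……" is omitted).
INPUT_TEXT = "この分野における進歩"

# Candidate buffer: one list per character position, best candidate first.
# Known from the description: position 4 is 師, 貯, 終, 野 (ranks 1 to 4) and
# position 9 has 追 at rank 1 and 進 at rank 3. All other entries are
# inferred or left out, because the full table is not available.
candidate_buffer = [[ch] for ch in INPUT_TEXT]
candidate_buffer[3] = ["師", "貯", "終", "野"]  # 4th character
candidate_buffer[8] = ["追", None, "進"]        # 9th character, rank 2 unknown


def displayed_string(buffer):
    """Join the rank-1 candidate of every position, as the display unit does."""
    return "".join(candidates[0] for candidates in buffer)


def misread_positions(buffer, truth):
    """Return the 1-based positions whose rank-1 candidate differs from truth."""
    return [
        index + 1
        for index, (candidates, ch) in enumerate(zip(buffer, truth))
        if candidates[0] != ch
    ]


print(displayed_string(candidate_buffer))               # この分師における追歩
print(misread_positions(candidate_buffer, INPUT_TEXT))  # [4, 9]
```

The two reported positions are the ones the operator would have to correct under any of the three methods compared in Table 2.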

[Table 2]

Conventional method 1 corrects misread characters one character at a time. When the operator marks the position corresponding to the fourth character 「野」 as wrong with a cursor or the like, the rank-2 character 「貯」 from the candidate categories of Table 1 is displayed in its place. When that is also marked wrong, the rank-3 character 「終」 is displayed, and so on, with successively lower-ranked characters displayed in turn. The intended 「野」 is at rank 4, so the correction is completed after the operator performs three selection operations. For the ninth character 「進」, the candidate is at rank 3, so it can be corrected with the second selection. In general, the lower the rank of the intended character, the more selection operations are required.

Conventional method 2 uses a word dictionary. However, only the word range (the word containing the misread character) is indicated, and not which of the characters in the word is misread. All combinations of the candidate characters of every character in the range must therefore be looked up in the word dictionary to check whether each exists. For example, when the word consisting of the third and fourth characters (corresponding to 「分野」, "field") is indicated, the word dictionary is searched to see whether any of the combinations of third-character and fourth-character candidates (5 × 5 = 25) is a meaningful word. In this case it happens that the only meaningful combination, and hence the only one in the dictionary, is 「分野」, so there is no problem (the correct answer is obtained directly). For the word consisting of the ninth and tenth characters (corresponding to 「進歩」, "progress"), however, the meaningful word 「追手」 ("pursuer"), which contains the misread character 「追」, takes first place in the selection order and the correct word 「進歩」 is relegated to second place. In this case 「追手」 is displayed first, and the operator has to reject it to call up the next candidate.

In the present invention, in addition to the word range indicated in conventional method 2, the operator also indicates which character has been misread. Although Table 2 does not show this point, indicating the misread character narrows the search range in the word dictionary and shortens the processing time accordingly. In addition, as Table 2 shows, 「進歩」 is selected at first rank (as described below), which further eases the operator's work.

FIG. 1 is a diagram of the main part of an embodiment of the present invention that performs the processing described above, and FIG. 2 is a block diagram of the entire character recognition device. The overall operation is first outlined with reference to FIG. 2. The observation unit 1 has an optical character reading function and reads (photoelectrically converts) a handwritten or printed input character string. The feature extraction unit 2 detects the features of each character in the read string to facilitate the subsequent recognition. The identification unit 3 compares and matches the output of the feature extraction unit 2 against the recognition patterns, parameters, and the like in the recognition dictionary 4, performs a distance calculation, and stores in the buffer 5 the candidate categories ranked 1st to 5th by similarity for each character (see Table 1). The processing up to this point is common to all the methods in Table 2, and the recognition result (character string) formed by the rank-1 character of each position is displayed on the display unit 9 through the display control unit 8. The operator examines this display and, if any character has been misread, indicates the misread character and the word range containing it. These correction instructions are given, for example, from the keyboard (operation unit) 10. On receiving a correction instruction, the character recognition device starts the word matching unit 7 and performs word-based character correction with reference to the word dictionary 6.

This operation is explained with reference to FIG. 1, which takes as its example the word 「進歩」 formed by the ninth and tenth characters of Table 1. Because the keyboard 10 indicates both the word range and the misread character, the word dictionary 6, on receiving this correction instruction information A, outputs to the word matching unit 7 all words B of word length 2 that end in 「歩」 (words whose first character is one of the rank-2 to rank-5 candidates for the misread character 「追」 and whose second character is 「歩」). When there are several words B, such as 「進歩」 and 「退歩」 ("regression"), the word matching unit 7 ranks them by referring to the ranks of the candidates for the misread character in the buffer 5. That is, fixing the second character as 「歩」 and combining it with each of the candidates for the ninth character in Table 1 yields five word candidates C (the rank-1 candidate, which is known to be an error, may be ignored). The word matching unit 7 then calculates the distance between each word B from the dictionary 6 and the word candidates C. Since the second character 「歩」 need not be considered, the distance is in practice calculated for the first character only.

First, for the word B 「進歩」, the first character 「進」 is at candidate rank 3, so if rank K is assigned the distance (K - 1), the distance D of 「進歩」 is 2. By the same reasoning, the word B 「退歩」 is given the distance 4. After the distance has been calculated for all words B in this way, the word with the smallest distance (here 「進歩」) is displayed on the display unit 9. Unlike conventional method 2 in Table 2, the word 「追手」, which contains the misread character, is therefore never output ahead of 「進歩」.

Effects of the Invention

As described above, according to the present invention, not only the range of the word containing the misread character but also the misread character itself is specified. This narrows the search range in the word dictionary and shortens the processing time. In addition, because the candidate ranks assigned to the misread character at recognition time are also referred to, the words output for correction are more likely to be correct.
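The correction procedure of the embodiment (keep the correctly read characters, try the remaining candidates for the misread character, keep only dictionary hits, and rank them by the distance K - 1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The function name `correction_candidates`, the dictionary contents, and the data layout are hypothetical. Only the candidate ranks mentioned in the description are filled in, with 退 placed at rank 5 because the description gives it a distance of 4. Ranks that the text does not give are left as `None`.

```python
# Candidate buffer for the 9th and 10th characters (1-based positions),
# best candidate first. Known from the description: 追 (rank 1), 進 (rank 3)
# and 退 (rank 5, implied by its distance of 4) for position 9, and 歩
# (rank 1) for position 10. None marks ranks the text does not give.
CANDIDATE_BUFFER = {
    9: ["追", None, "進", None, "退"],
    10: ["歩", None, None, None, None],
}

# Hypothetical stand-in for the word dictionary 6.
WORD_DICTIONARY = {"分野", "進歩", "退歩", "追手"}


def correction_candidates(buffer, dictionary, start, end, misread_pos):
    """Return (distance, word) pairs for the range start..end, closest first.

    Characters other than the misread one keep their rank-1 reading. The
    misread character is replaced by its rank-2..N candidates, and only words
    found in the dictionary are kept. The distance is K - 1 for rank K.
    """
    if not start <= misread_pos <= end:
        raise ValueError("the misread character must lie inside the word range")
    results = []
    # Start at rank 2: rank 1 is the reading the operator has rejected.
    for rank, candidate in enumerate(buffer[misread_pos][1:], start=2):
        if candidate is None:
            continue  # rank not given in the text
        word = "".join(
            candidate if pos == misread_pos else buffer[pos][0]
            for pos in range(start, end + 1)
        )
        if word in dictionary:
            results.append((rank - 1, word))
    return sorted(results)


ranked = correction_candidates(CANDIDATE_BUFFER, WORD_DICTIONARY, 9, 10, 9)
print(ranked)        # [(2, '進歩'), (4, '退歩')]
print(ranked[0][1])  # 進歩
```

Because rank 1 of the misread position is skipped, 追手 can never be proposed even though it is in the dictionary, which matches the behaviour described for the embodiment.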

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram of the main part of an embodiment of the present invention, and FIG. 2 is a block diagram of the entire character recognition device to which the present invention is applied.

In the figures, 1 is the observation unit, 2 is the feature extraction unit, 3 is the identification unit, 4 is the recognition dictionary, 5 is the candidate category buffer, 6 is the word dictionary, 7 is the word matching unit, 8 is the display control unit, 9 is the display unit, and 10 is the operation unit.

Claims (1)

[Scope of Claims]

1. A misread character correction method for a character recognition device comprising: an identification unit that identifies an optically read input character string character by character; a recognition dictionary referred to by the identification unit; a candidate category buffer that stores the recognition results produced by the identification unit, ranked in multiple positions per character; a display unit that displays the character string formed by the rank-1 characters in the buffer; an operation unit with which a location to be corrected in the character string displayed on the display unit can be specified; and a word dictionary that stores the words referred to during word-based character correction,
the method being characterized in that, when a misread character and the range of the word containing that character are specified on the display unit through the operation unit, words of the same length as the range are formed by keeping the correct characters as they are and combining them with the other candidate characters for the misread character; the words so formed are looked up in the word dictionary and those that exist in the dictionary are extracted; a distance is calculated for each extracted word by referring to the rank of its candidate character in the buffer; and the word with the smallest distance is displayed as an alternative candidate for the word range containing the misread character.
JP58204183A 1983-10-31 1983-10-31 Correcting system of misread character Granted JPS6097477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58204183A JPS6097477A (en) 1983-10-31 1983-10-31 Correcting system of misread character

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58204183A JPS6097477A (en) 1983-10-31 1983-10-31 Correcting system of misread character

Publications (2)

Publication Number Publication Date
JPS6097477A (en) 1985-05-31
JPH0319590B2 (en) 1991-03-15

Family

ID=16486208

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58204183A Granted JPS6097477A (en) 1983-10-31 1983-10-31 Correcting system of misread character

Country Status (1)

Country Link
JP (1) JPS6097477A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6330990A (en) * 1986-07-25 1988-02-09 Matsushita Electric Ind Co Ltd Character recognizing device
JPS63157292A (en) * 1986-12-22 1988-06-30 Yokogawa Electric Corp Hand-written kanji ocr device
JP4744317B2 (en) 2006-02-16 2011-08-10 富士通株式会社 Word search device, word search method, and computer program
JP4647032B1 (en) * 2010-01-08 2011-03-09 一夫 黒川 Hydroelectric power conduit and mountain hydropower generation method

Also Published As

Publication number Publication date
JPS6097477A (en) 1985-05-31
