JPH0319590B2

Info

Publication number: JPH0319590B2
Application number: JP58204183A
Authority: JP
Inventors: Tozen Hai; Eiichiro Yamamoto; Yukikazu Kaburayama; Yoshihisa Fujii
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-10-31
Filing date: 1983-10-31
Publication date: 1991-03-15
Also published as: JPS6097477A

Description

[Detailed description of the invention]

Technical field of the invention

The present invention relates to a method for correcting misread characters in a character recognition device equipped with a word dictionary.

Prior art and problems

A character recognition device that optically reads handwritten or printed character strings is provided with a function that displays the recognition result on a display and, if any character has been recognized incorrectly (a misread character), corrects it according to the operator's instructions.

A typical conventional correction method prepares recognition results ranked from 1st to Nth for each character. The operator points out a misread character and the device displays the next-ranked recognition result; if that is also wrong, the device displays the one after it, and so on, presenting progressively lower-ranked character candidates until the operator selects the correct character. The drawback of this method is that it does not use the relationship between the misread character and its neighboring characters. For example, when one character of a word has been misread, even meaningless candidates that form no existing word in combination with the other characters are output one after another, merely because they are listed as candidates, and the operator must reject each of them. This increases the operator's burden and lengthens the time required for correction.

The present inventors therefore previously proposed a correction method that uses a word dictionary. In that method, when the operator specifies the range of a word containing a misread character, the recognition device combines the candidates for each character in the range into word candidates and outputs only those that actually exist in the word dictionary. Character candidates that do not form a word are never output, so the operator's correction work is correspondingly simplified and sped up.

To put that method into practical use, however, the following point needs attention. Words consist of two characters, three characters, and so on, and two-character words are very numerous. Consequently the 1st to Nth candidates for the characters of such a word may often combine into a meaningful word other than the intended one, forcing the operator to reject it. Even a word that was found in the dictionary cannot be the correct answer if it still contains the misread character.

Object of the invention

The present invention further improves the word-by-word correction method described above. It keeps the correctly read characters as they are and outputs only words formed by combining them with the other candidate characters for the misread character, thereby speeding up the correction of misread characters.

Constitution of the invention

The invention concerns a misread character correction method for a character recognition device comprising: an identification unit that identifies an optically read input character string character by character; a recognition dictionary that stores in advance the various character candidates referenced by the identification unit; a candidate category buffer that stores the recognition results recognized by the identification unit and ranked in multiple positions per character; a display unit that displays the character string formed by the first-ranked characters in the buffer; an operation unit with which a location to be corrected in the displayed character string can be specified; and a word dictionary that stores the various words referenced during word-by-word character correction. The method is characterized in that, when the misread character and the range of the word containing it are specified on the display unit via the operation unit, the correct characters are kept as they are and combined with the other candidate characters for the misread character to form words of the same length as the range; these words are looked up in the word dictionary and those found there are retrieved; a distance is calculated for each retrieved word by referring to the rank of the candidate character in the buffer; and the word with the smallest distance is displayed as the alternative candidate for the word range containing the misread character. The invention is described in detail below with reference to the illustrated embodiment.

Embodiment of the invention

Table 1 shows the recognition results produced by a typical character recognition device for the input character string 「この分野における進歩……」 ("Progress in this field..."), that is, the candidate characters ranked 1st to 5th for each character.

[Table 1]

The ranks 1, 2, ... of the candidate characters (candidate categories) are in descending order of similarity, and the first-ranked candidate of each character is displayed first. The operator judges from this display whether the input character string has been read correctly. In the example of Table 1, the fourth character 「野」 and the ninth character 「進」 of the input string have been misread as 「師」 and 「追」, respectively, so correction is needed.

Table 2 compares conventional method 1, in which errors are indicated character by character, conventional method 2, in which errors are specified word by word, and the method of the present invention, by instruction procedure and by selection rank.

[Table 2]

Conventional method 1 corrects misread characters one character at a time. When the operator uses a cursor or the like to point at the position corresponding to the fourth character 「野」 and marks it as an error, the rank-2 character 「貯」 from the candidate categories of Table 1 is displayed instead; when that too is marked as an error, the rank-3 character 「終」 is displayed, and so on through progressively lower-ranked characters. The intended 「野」 is ranked 4th, so it is reached after three selection operations. Likewise the intended ninth character 「進」 is ranked 3rd and is reached on the second selection. In general, the lower the rank of the intended character, the more selection operations are needed.

Conventional method 2 uses a word dictionary. However, the operator indicates only the word range (the word containing the misread character), not which of its characters is misread, so every combination of the candidate characters for every character must be looked up in the word dictionary to check whether it exists. For example, when the word formed by the third and fourth characters (corresponding to 「分野」, "field") is indicated, the dictionary is searched to see whether any of the combinations of third-character and fourth-character candidates (5 × 5 = 25) is a meaningful word. In this case the only combination that is meaningful, and therefore in the dictionary, happens to be 「分野」, so there is no problem (the correct answer is obtained directly). For the word formed by the ninth and tenth characters (corresponding to 「進歩」, "progress"), however, the meaningful word 「追手」 ("pursuer"), which contains the misread character 「追」, takes first place in the selection order and the correct word 「進歩」 falls to second. 「追手」 is therefore displayed first, and the operator must reject it to call up the next menu.

In the present invention, in addition to indicating the word range as in conventional method 2, the operator also indicates which character has been misread. Although Table 2 does not show this step, indicating the misread character narrows the search range in the word dictionary and shortens the processing time accordingly. Beyond that, as Table 2 shows, 「進歩」 is selected in first place (as described later), which further eases the operator's work.

Fig. 1 is a diagram of the main part of an embodiment of the invention that performs the processing described above, and Fig. 2 is a block diagram of the entire character recognition device. The overall operation is outlined first with reference to Fig. 2. The observation unit 1 has an optical character reading function and reads (photoelectrically converts) a handwritten or printed input character string. The feature extraction unit 2 detects the features of each character in the read string to facilitate the subsequent recognition. The identification unit 3 compares and matches the output of the feature extraction unit 2 against the recognition patterns, parameters, or the like in the recognition dictionary 4, performs a distance calculation, and stores in the buffer 5 the candidate categories ranked 1st to 5th by similarity for each character (see Table 1). Up to this point the processing is common to all the methods in Table 2, and the recognition result (character string) formed by the first-ranked character of each position is displayed on the display unit 9 via the display control unit 8. The operator looks at this display and, if any character has been misread, indicates the misread character and the word range containing it. These correction instructions are entered, for example, from the keyboard (operation unit) 10. On receiving a correction instruction, the character recognition device activates the word matching unit 7 and performs word-by-word character correction with reference to the word dictionary 6.

This operation is explained with reference to Fig. 1, which takes as its example the word 「進歩」 formed by the ninth and tenth characters of Table 1. Because the keyboard 10 indicates both the word range and the misread character, the word dictionary 6, on receiving this correction instruction information A, outputs to the word matching unit 7 all words B of length 2 that end in 「歩」 (words whose first character is one of the 2nd to 5th candidates for the misread character 「追」 and whose second character is 「歩」). When there are several such words B, for example 「進歩」 and 「退歩」 ("regression"), the word matching unit 7 ranks them by referring to the ranks stored in the buffer 5 for the misread character. That is, fixing the second character as 「歩」 and combining it with each candidate for the ninth character in Table 1 yields five word candidates C (the first-ranked one, already known to be wrong, can be ignored). The word matching unit 7 then calculates the distance between each word B from the dictionary 6 and the word candidates C. The second character 「歩」 need not be considered, so in practice the distance is calculated for the first character only.

First, for the word B 「進歩」, the first character 「進」 is the 3rd-ranked candidate; if a candidate of rank K is assigned the distance (K - 1), the distance D of 「進歩」 is 2. By the same reasoning the word B 「退歩」 is assigned the distance 4. After distances have been calculated for all the words B in this way, the word with the smallest distance (here 「進歩」) is displayed on the display unit 9. Thus, unlike conventional method 2 in Table 2, the word 「追手」, which contains the misread character, is never output ahead of 「進歩」.

Effects of the invention

As described above, according to the present invention not only the range of the word containing the misread character but also the misread character itself is specified, so the search range in the word dictionary is narrowed and the processing time is shortened. In addition, because the candidate rank assigned to the misread character at recognition time is also referenced, the word output for correction is more likely to be the correct one.
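The correction procedure of the embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `rank_corrections`, the in-memory set used as a word dictionary, and the candidate table are assumptions made for illustration. Table 1 is not reproduced in this text, so only the ranks stated or implied in the description (追 1st, 進 3rd and 退 5th for the ninth character, 歩 1st for the tenth) come from the source; the other entries are placeholders.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple


def rank_corrections(
    candidates: Sequence[Sequence[str]],
    misread: Set[int],
    dictionary: Set[str],
) -> List[Tuple[int, str]]:
    """Return (distance, word) pairs found in the dictionary, smallest distance first.

    candidates[i] lists the ranked candidates (rank 1 first) for position i of
    the operator-specified word range; misread holds the positions the operator
    flagged as misread (0-based within the range).
    """
    options = []
    for pos, ranked in enumerate(candidates):
        if pos in misread:
            # Rank 1 is known to be wrong, so use ranks 2..N.
            # The 0-based index k equals the distance K - 1 for rank K.
            options.append([(ch, k) for k, ch in enumerate(ranked) if k >= 1])
        else:
            # A correctly read character is kept as is and adds no distance.
            options.append([(ranked[0], 0)])

    found = []
    for combo in product(*options):
        word = "".join(ch for ch, _ in combo)
        if word in dictionary:  # keep only words that exist in the dictionary
            found.append((sum(d for _, d in combo), word))
    return sorted(found)


# Hypothetical candidate table for the ninth and tenth characters.
# Only 追 (1st), 進 (3rd), 退 (5th) and 歩 (1st) follow the description.
CANDIDATES = [
    ["追", "迫", "進", "遠", "退"],
    ["歩", "手", "少", "止", "走"],
]
# Hypothetical word dictionary.
DICTIONARY = {"進歩", "退歩", "追手", "分野"}

ranked = rank_corrections(CANDIDATES, {0}, DICTIONARY)
print(ranked[0][1])  # prints 進歩
```

With the first position flagged as misread, the sketch yields 進歩 at distance 2 and 退歩 at distance 4, matching the distances given in the embodiment, and 追手 is never produced because the rank-1 candidate 追 is excluded. The sketch enumerates candidate combinations and then checks dictionary membership; the embodiment instead has the word dictionary output the matching words B first, which gives the same ranking under these assumptions.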

[Brief explanation of the drawing]

Fig. 1 is an explanatory diagram of the main part of an embodiment of the present invention, and Fig. 2 is a block diagram of an entire character recognition device to which the invention is applied. In the figures, 1 is the observation unit, 2 the feature extraction unit, 3 the identification unit, 4 the recognition dictionary, 5 the candidate category buffer, 6 the word dictionary, 7 the word matching unit, 8 the display control unit, 9 the display unit, and 10 the operation unit.

Claims

[Scope of Claims]

1. A misread character correction method for a character recognition device comprising: an identification unit that identifies an optically read input character string character by character; a recognition dictionary that is referenced by the identification unit; a candidate category buffer that stores the recognition results recognized by the identification unit and ranked in multiple positions per character; a display unit that displays the character string formed by the first-ranked characters in the buffer; an operation unit with which a location to be corrected in the character string displayed on the display unit can be specified; and a word dictionary that stores various words referenced when correcting characters word by word, the method being characterized in that, when a misread character and the range of the word containing the misread character are specified on the display unit via the operation unit, the correct characters are kept as they are and combined with the other candidate characters for the misread character to form words of the same length as the range; the words are looked up in the word dictionary and those found in the dictionary are retrieved; a distance is calculated for each retrieved word by referring to the rank of the candidate character in the buffer; and the word with the smallest distance is displayed as the alternative candidate for the word range containing the misread character.