JPS6297081A

JPS6297081A - Character recognizer

Info

Publication number: JPS6297081A
Application number: JP61237924A
Authority: JP
Inventors: Hiromichi Fujisawa; 藤沢　浩道; Yasuaki Nakano; 中野　康明; Michio Yasuda; 安田　道夫
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-10-08
Filing date: 1986-10-08
Publication date: 1987-05-06
Also published as: JPH0520797B2

Abstract

PURPOSE: To output character recognition results with misrecognized parts accurately corrected, and to reduce the character misrecognition rate, by providing a verification processor, consisting of a memory, a microprocessor, and a similarity calculating circuit, that performs word recognition. CONSTITUTION: The character recognizer 1 recognizes the characters written on a form line by line and delivers them over output line 6 to the microprocessor 20 of the verification processor 10 as a character code string. The microprocessor 20 sends two character codes to the similarity calculating circuit 30, which reads the two corresponding standard patterns from the standard pattern storage 5, calculates their similarity, and returns it to the microprocessor 20. The microprocessor 20 first performs word recognition of the key items, then uses the memory 11 to designate the character types permitted in the fixed items and checks those character codes. For the character codes corresponding to a key item, the character code string from the word dictionary (that is, the word recognition result) is substituted for the recognized string. Misrecognitions are thus corrected accurately and the character misrecognition rate is reduced.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Application of the Invention] The present invention relates to a character recognition device with a function for correcting misread characters, suited to cases in which there are many character categories, as with kanji.

[Prior Art] Conventionally, much of the processing of applications submitted to, for example, government offices has been done by hand. These application forms are usually written in mixed kanji and kana, so automating their processing requires an input unit that can recognize Japanese characters, including kanji. At the laboratory level, principle experiments on printed kanji recognition devices with reading accuracy satisfactory for practical use have already succeeded (see, for example, Transactions of the Institute of Electronics and Communication Engineers of Japan, Vol. 58-D, No. 2, p. 94). Since most of these application forms are relatively high-quality documents typed on Japanese typewriters, the conditions for using a printed kanji recognition device in such application processing can be said to be in place.

However, putting a printed kanji recognition device into practical use demands fairly high recognition accuracy because of the nature of application processing. On the other hand, kanji comprise an extremely large number of character types, and although print quality is generally good, application forms of relatively poor quality may still be input. The reading accuracy therefore cannot be said to be fully sufficient.

Accordingly, the misrecognition rate could conceivably be reduced substantially by verifying whether each recognition result is correct.

Conventionally, this idea has been put into practice as follows. Character recognition devices for numerals frequently handle monetary amounts, so a form may carry, for example, the total of all item amounts in addition to the amount of each item; the recognition device then detects errors by comparing the sum of the recognized item amounts with the recognized total. For character recognition devices for English letters, verification methods using the technique known as N-grams have been considered, on the premise that each letter forms part of a word from some limited vocabulary.
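The total-amount check described above can be sketched as follows. This is an illustrative Python sketch, not taken from any particular device; the function name `totals_consistent` and the amounts are hypothetical.

```python
def totals_consistent(item_amounts: list[int], recognized_total: int) -> bool:
    """True if the recognized item amounts add up to the recognized total."""
    # A mismatch shows that at least one amount (or the total) was misread,
    # but not which one.
    return sum(item_amounts) == recognized_total


print(totals_consistent([1200, 3400, 500], 5100))  # True
print(totals_consistent([1200, 8400, 500], 5100))  # False: 3 misread as 8
```

Note that the check only detects an inconsistency; it cannot say which field to correct.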

However, these conventional methods cannot be applied as-is to a character recognition device for kanji.

The reason is that kanji comprise 2,000 to 4,000 character types, compared with at most about 50 for alphanumerics, so that, for example, the storage required for an N-gram table becomes enormous and such a table cannot be realized as-is.
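To make the storage argument concrete, the following sketch counts the cells of a dense N-gram table, assuming one cell per possible sequence of n characters (the text does not specify a table format). The function name is hypothetical.

```python
def ngram_table_cells(alphabet_size: int, n: int) -> int:
    """Cells in a dense n-gram table with one entry per possible n-gram."""
    return alphabet_size ** n


for label, size in [("alphanumerics", 50), ("kanji", 2000), ("kanji", 4000)]:
    bigrams = ngram_table_cells(size, 2)
    trigrams = ngram_table_cells(size, 3)
    print(f"{label} ({size} types): {bigrams:,} bigrams, {trigrams:,} trigrams")
```

For 50 alphanumerics a dense bigram table has 2,500 cells; for 2,000 to 4,000 kanji it has 4,000,000 to 16,000,000 cells, and a trigram table 8 to 64 billion.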

[Object of the Invention] Accordingly, an object of the present invention is to provide, as a technique suited to cases with many character types, means for correcting reading results using word information, thereby reducing the overall misrecognition rate.

[Summary of the Invention] To achieve this object, the present invention is characterized in that the reading result is matched against the words in a word dictionary and, where they do not agree, is corrected by replacing it with the character codes of the matched word.

[Embodiment of the Invention] Consider, for example, an application form containing the following text:

(Example)
Type of application: Registration application
Purpose of registration: Transfer of entire interest
Cause: Sale on February 2, Showa 52 (1977)
Right holder
  Name: 申出太部
  Location: 1-1 Kunitachi-shi, Tokyo
  Share: one third
Obligor
  Name: 乙用吹部
  Address: 2-2 Tachikawa-shi, Tokyo
Application date: March 3, Showa 52 (1977)
(End of example)

The principle of the device of the present invention is outlined below with reference to the flowchart of Fig. 1. First, in steps 201 and 202, the characters on the form are photoelectrically converted, segmented into fixed frames, and recognized line by line, and the recognition result for each line is output as a character code string. The recognition unit repeats this until all characters on the form have been recognized. Up to this point the device is the same as a conventional character recognition device. Next, the recognition result verification unit extracts the character sequence corresponding to the key item (the character string printed in a field of predetermined length at the left of each line) and, in step 203, determines which entry of a dictionary holding all the key items this sequence corresponds to. This is called key item word recognition. Because character recognition may contain errors, the word recognition method must be designed with this in mind; it is described later.

Once the key item number is known, the character types that may appear in the fixed item following that key item can be restricted, and these are designated in step 204. In step 205, each character code in the recognition result that corresponds to the fixed item is then checked against the permitted character types. If a code is not among them, either the character recognition result is wrong or the character on the form itself is erroneous; in that case this fact is output together with the recognition result, for example by inverting the sign of the character code. If the code is among the permitted types, the character is regarded as correctly read and its code is output unchanged.

These operations are repeated until no characters remain on the form.

Next, the procedure for recognizing the character sequence of a recognition result as a word, which is the key point of the present invention, is described. In general, word recognition only requires a word dictionary (a table of the character code strings making up each word) and a check of which dictionary entry the input character sequence matches. In practice, however, not every input character is read correctly, so the sequence may not match any dictionary entry exactly. Word recognition must therefore be based not on whether an exact match is found but on the distance, or equivalently the similarity (defined below), between the input character sequence and each dictionary entry.

For example, reading 「申請日」 ("application date") may yield 「甲請日」, and there is clearly no dictionary entry 「甲請日」. If the similarity between a character sequence and a dictionary entry is defined from the similarities of corresponding characters, this example requires the similarity between 「申」 and 「甲」. With 2,000 readable character types, however, there are about 2,000 × 2,000 = 4,000,000 such character pairs, far too many to store. Therefore, when the device of the present invention needs the similarity between two different characters (「甲」 and 「申」 in this example), it computes the similarity between the corresponding standard patterns held in the recognition device and uses that value. The similarity between identical characters is always 1.

Here, the similarity is a value from 0 to 1 defined between any two character patterns. It is easily computed by a dedicated circuit and is well known, so its description is omitted here.
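The text treats the pattern similarity as well known and does not define it. As one common choice, the sketch below assumes each standard pattern is a non-negative feature vector (for example, a flattened binary bitmap) and uses cosine similarity, which then lies between 0 and 1 and equals 1 for identical patterns. The vectors and the name `pattern_similarity` are hypothetical.

```python
import math


def pattern_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-negative feature vectors; lies in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of features")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Toy 3x3 binary bitmaps, flattened row by row (illustrative only).
glyph_a = [1, 1, 1,
           1, 1, 1,
           0, 1, 0]
glyph_b = [1, 1, 1,
           1, 1, 1,
           1, 1, 1]
print(pattern_similarity(glyph_a, glyph_a))  # 1.0
print(pattern_similarity(glyph_a, glyph_b))  # about 0.88
```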

The word recognition algorithm based on this method is described with reference to the flowchart of Fig. 4. Each dictionary entry is represented by the number of characters $N_k$ making up the word and the character code string $W_k = \{ w_i^{(k)} \mid i = 1, 2, \ldots, N_k \}$. The total number of dictionary entries is $K$, and $k$ is the entry number (word number), which takes values from 1 to $K$. The character sequence (character code string) produced by character recognition and input to the word recognition unit is written $S = \{ s_i \mid i = 1, 2, \ldots, N \}$, and the similarity between the character sequence $S$ and $W_k$ is written $\rho_k$.
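Collecting the steps of Fig. 4 described below, the similarity can be written as a formula. The symbols $\sigma$ (per-character score), $P(c)$ (standard pattern of character code $c$), and $r$ (pattern similarity, a value from 0 to 1) are introduced here for illustration and are not in the original text; $s_i = 0$ denotes an unrecognizable character.

```latex
\sigma(w, s) =
\begin{cases}
1 & \text{if } w = s \text{ or } s = 0,\\
r\bigl(P(w), P(s)\bigr) & \text{otherwise,}
\end{cases}
\qquad
\rho_k =
\begin{cases}
\dfrac{1}{N}\displaystyle\sum_{i=1}^{N} \sigma\bigl(w_i^{(k)}, s_i\bigr) & \text{if } N_k = N,\\[1ex]
0 & \text{if } N_k \neq N,
\end{cases}

\text{entry } k \text{ is excluded if }
\frac{1}{i}\sum_{j=1}^{i} \sigma\bigl(w_j^{(k)}, s_j\bigr) \le \varepsilon
\text{ for some } i \le N.
```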

Fig. 2 shows the structure of the dictionary required for word recognition.

The first word 501 of the dictionary (address D) holds the number of key items $K$. It is followed by the words 502 holding the addresses $A_1, A_2, \ldots, A_K$ at which the character code string of each entry is stored, and then by the words holding the character code strings of the individual key items. For example, the word 503 at address $A_1$ holds the length (number of characters) $N_1$ of the word with entry number 1, and the following $N_1$ words 504 hold its character codes.
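The layout of Fig. 2 can be sketched with a Python list standing in for memory, assuming address D = 0 and one list element per word; the function names and character codes are hypothetical.

```python
def build_dictionary(entries: list[list[int]]) -> list[int]:
    """Pack key-item code strings into one flat word array laid out as in Fig. 2.

    Address D is taken to be 0: word 0 holds K, words 1..K hold A_1..A_K, and
    each entry is stored as its length N_k followed by its N_k character codes.
    """
    memory = [len(entries)] + [0] * len(entries)
    for k, codes in enumerate(entries, start=1):
        memory[k] = len(memory)      # A_k: address of entry k's length word
        memory.append(len(codes))    # N_k
        memory.extend(codes)         # w_1^(k) ... w_{N_k}^(k)
    return memory


def read_entry(memory: list[int], k: int) -> list[int]:
    """Return the character codes of entry k (1-based) from the flat array."""
    address = memory[k]
    length = memory[address]
    return memory[address + 1 : address + 1 + length]


dictionary = build_dictionary([[11, 12, 13], [21, 22]])  # hypothetical codes
print(dictionary)                 # [2, 3, 7, 3, 11, 12, 13, 2, 21, 22]
print(read_entry(dictionary, 2))  # [21, 22]
```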

Fig. 3 shows the character code string that is the object of word recognition. It is temporarily stored in the work area of the memory and consists of N words.

In Fig. 4, word recognition is carried out as follows. First, initialization is performed in steps 101 and 102.

In step 103, it is determined whether the word length matches the length of the input character sequence. If it does not, the similarity $\rho_k$ is left at 0 and the next word is examined. If it does, the similarity $\rho_k$ is computed in steps 105 to 112.

Initialization is performed in step 104. In step 105, the $i$-th character code $w_i^{(k)}$ of the $k$-th dictionary entry is compared with the $i$-th character code $s_i$ of the input character sequence. If they match, 1 is added to $\rho_k$ in step 106; if they do not, step 107 checks whether the character was undecidable. $s_i = 0$ indicates that the character could not be recognized, in which case step 106 is likewise executed. If $s_i \neq 0$, step 108 uses the standard patterns in the recognition device to calculate the similarity between the standard pattern of $w_i^{(k)}$ and that of $s_i$, and adds it to $\rho_k$. Step 109 checks whether $\rho_k$ divided by the number of characters $i$ processed so far exceeds the threshold $\varepsilon$; if it does not, entry $k$ is excluded from the candidates in step 113. If it does, processing moves on to the next character, and when steps 105 to 111 have been completed for all characters, step 112 normalizes the similarity between the character sequences by dividing it by the number of characters $N$.
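The per-entry computation of Fig. 4 (steps 103 to 113) can be sketched as below. This is an illustrative reading of the flowchart, not the patented microprogram; `char_sim` stands in for the standard-pattern similarity of step 108, returning `None` for an excluded entry is an assumption about how step 113 is represented, and the character codes are hypothetical.

```python
from typing import Callable, Optional, Sequence

UNDECIDABLE = 0  # code the recognizer outputs for a character it could not read


def word_similarity(
    entry: Sequence[int],
    observed: Sequence[int],
    char_sim: Callable[[int, int], float],
    epsilon: float,
) -> Optional[float]:
    """Normalized similarity rho_k between dictionary entry W_k and input S.

    Returns 0.0 when the lengths differ (step 103) and None when the entry is
    dropped because the running mean does not exceed epsilon (steps 109, 113).
    """
    if len(entry) != len(observed) or not observed:
        return 0.0
    rho = 0.0
    for i, (w, s) in enumerate(zip(entry, observed), start=1):
        if w == s or s == UNDECIDABLE:
            rho += 1.0               # steps 105/107 -> 106
        else:
            rho += char_sim(w, s)    # step 108: standard-pattern similarity
        if rho / i <= epsilon:       # step 109
            return None              # step 113: no longer a candidate
    return rho / len(observed)       # step 112: normalize by N


def toy_char_sim(a: int, b: int) -> float:
    """Hypothetical: codes 1 (申) and 4 (甲) are 0.8 similar, other pairs 0.1."""
    return 0.8 if {a, b} == {1, 4} else 0.1


# Hypothetical codes: 1 = 申, 2 = 請, 3 = 日, 4 = 甲.
print(word_similarity([1, 2, 3], [4, 2, 3], toy_char_sim, epsilon=0.5))  # about 0.933
```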

When step 115 detects that all dictionary entries have been processed, step 116 finds the largest value $\rho^{1}$ and the second-largest value $\rho^{2}$ among all the similarities $\{ \rho_k \mid k = 1, 2, \ldots, K \}$. Step 117 compares $\rho^{1}$ with an absolute threshold $\delta$, and a relative threshold $\gamma$ is then used to test whether the difference between $\rho^{1}$ and $\rho^{2}$ is sufficiently large. If it is, step 119 outputs the word number $k^{1}$ that gives $\rho^{1}$; if not, step 120 outputs "undecidable".
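A sketch of the final decision, steps 116 to 120, under the assumptions that the tests are strict inequalities ($\rho^{1} > \delta$ and $\rho^{1} - \rho^{2} > \gamma$), that excluded entries are passed as `None`, and that $\rho^{2}$ is taken as 0 when only one candidate remains; none of these details is fixed by the text, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def decide_word(scores: Sequence[Optional[float]], delta: float, gamma: float) -> Optional[int]:
    """Return the 1-based entry number k^1 of the best match, or None if undecidable.

    scores[k - 1] holds rho_k, or None for an entry excluded during matching.
    """
    ranked = sorted(
        ((rho, k) for k, rho in enumerate(scores, start=1) if rho is not None),
        reverse=True,
    )
    if not ranked:
        return None
    rho1, k1 = ranked[0]
    rho2 = ranked[1][0] if len(ranked) > 1 else 0.0
    if rho1 > delta and rho1 - rho2 > gamma:  # absolute test (step 117) and relative test
        return k1                              # step 119
    return None                                # step 120: undecidable


print(decide_word([0.0, 0.92, None, 0.61], delta=0.8, gamma=0.1))  # 2
print(decide_word([0.9, 0.85], delta=0.8, gamma=0.1))              # None
```

The relative test rejects cases where two dictionary words score almost equally, which an absolute threshold alone would not catch.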

Next, the means of designating the character types that may appear in the fixed item following a key item is described. The present invention prepares a flag table as shown in Fig. 5 and a bit number conversion table as shown in Fig. 6. When the word recognition result for the key item is $k$, the bit number conversion table is first consulted to obtain the bit position number $b(k)$, which indicates which bit of the flag table to use. Then, for any character, the flag table entry for that character is fetched: if bit $b(k)$ is 1, the character is permitted in the field following that key item; if it is 0, it is not.
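The two tables can be sketched with dictionaries standing in for memory areas, assuming each flag word is an integer bitmask indexed by character code. One plausible reason for the indirection through $b(k)$ is that key items sharing the same permitted character set can share a single bit; this is an interpretation, not stated in the text. All names and codes are hypothetical.

```python
def build_flag_table(allowed_by_bit: dict[int, set[int]]) -> dict[int, int]:
    """Build the Fig. 5 flag table: character code -> flag word.

    Bit b of a character's flag word is 1 if the character is permitted in
    fields governed by bit b.
    """
    flags: dict[int, int] = {}
    for bit, codes in allowed_by_bit.items():
        for code in codes:
            flags[code] = flags.get(code, 0) | (1 << bit)
    return flags


def is_permitted(flags: dict[int, int], bit_number: dict[int, int], k: int, code: int) -> bool:
    """True if `code` may appear in the fixed item following key item k."""
    b = bit_number[k]                          # Fig. 6: key item number -> b(k)
    return ((flags.get(code, 0) >> b) & 1) == 1


# Hypothetical setup: bit 0 = date characters, bit 1 = name characters.
flags = build_flag_table({0: {10, 11}, 1: {11, 20}})
bit_number = {1: 0, 2: 1, 3: 0}   # key items 1 and 3 share bit 0
print(is_permitted(flags, bit_number, 2, 20))  # True
print(is_permitted(flags, bit_number, 2, 10))  # False
```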

Using this result, the recognition result can be verified as described in the explanation of the principle above. The present invention is now described in detail with reference to an embodiment.

Fig. 7 is a block diagram of an embodiment of the device of the present invention. The embodiment is described below with reference to this figure.

In the figure, 1 is a conventional character recognition device, in which 3 is a character observation unit that observes unknown patterns, 4 is a character recognition processing unit, and 5 is a standard pattern storage unit.

These parts are well known and are not described in detail here.

The output 6 of the recognition processing unit 4 is the result of recognizing the characters on the form line by line and is transferred as a character code string. A character code of 0 indicates that the character could not be recognized.

The verification processor 10 consists of a memory 11, a similarity calculation circuit 30, and a microprocessor 20. The circuit 30 receives two character codes from the microprocessor 20, receives the two corresponding standard patterns from the storage 5, calculates the similarity between them, and transfers the resulting similarity to the microprocessor 20. The circuit 30 is used when step 108 of Fig. 4 is executed.

The memory 11 consists of an area 12 storing the flag table of Fig. 5, an area 13 storing the bit number conversion table of Fig. 6, an area 14 storing the key item dictionary of Fig. 2, and a work area 15.

Following the microprogram held within it, the microprocessor 20 performs word recognition (key item recognition) using the algorithm described with reference to Fig. 4, designates the character types of the fixed items using area 12, and verifies the character codes obtained as the recognition result for the fixed items.

Next, the operation is described following the processing flow of the device as a character recognition device.

A character pattern printed on the form is photoelectrically converted by the observation unit 3, segmented into a fixed frame, and transferred to the recognition processing unit 4. Unit 4 calculates the similarity between the unknown pattern sent from unit 3 and each standard pattern in the storage 5, collects the codes of the characters giving the maximum similarity for one line, and outputs them as a character code string on output line 6. Unit 4 also checks whether the maximum similarity is at least a predetermined threshold; if it is not, the output code is set to 0.
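The best-match-with-reject behavior of unit 4 can be sketched as follows. `similarity` is any pattern similarity function supplied by the caller (a plain dot product is used in the usage line), and the function names, patterns, and codes are hypothetical.

```python
from typing import Callable, Mapping, Sequence

REJECT = 0  # character code meaning "could not be recognized"


def recognize_line(
    unknown_patterns: Sequence[Sequence[float]],
    standard_patterns: Mapping[int, Sequence[float]],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    threshold: float,
) -> list[int]:
    """Return the best-matching code for each pattern, or REJECT below threshold."""
    codes = []
    for pattern in unknown_patterns:
        # Pick the standard pattern with the maximum similarity.
        best_code, best_sim = max(
            ((code, similarity(pattern, std)) for code, std in standard_patterns.items()),
            key=lambda pair: pair[1],
        )
        # Accept only if the maximum similarity is at least the threshold.
        codes.append(best_code if best_sim >= threshold else REJECT)
    return codes


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


standards = {101: [1.0, 0.0], 102: [0.0, 1.0]}  # hypothetical codes and patterns
print(recognize_line([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]], standards, dot, 0.6))
# [101, 0, 102]
```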

The microprocessor 20 in the verification processor 10 receives the character code string of the recognition result for each line via line 6 and stores it in the work area 15. It first extracts, from the one-line character sequence (a blank is also assigned a character code), the character code sequence corresponding to the key item, and then proceeds to word recognition. Fig. 8 shows an example of a one-line character code string. A line consists of 25 characters: the first 8 characters 801 correspond to the key item and the remaining 17 characters 802 to the fixed item. Character code 9999 denotes a blank, and the non-blank character codes within 801 ($s_1, s_2, \ldots, s_6$ in Fig. 8) form the character code string obtained by recognizing the characters of the key item.
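Extracting the key item from a Fig. 8 line can be sketched as follows, using the field widths (8 + 17) and the blank code 9999 given above; the function name and sample codes are hypothetical.

```python
BLANK = 9999        # blank character code (Fig. 8)
KEY_WIDTH = 8       # positions 801: key item field
LINE_WIDTH = 25     # 8 key-item + 17 fixed-item characters


def split_line(line: list[int]) -> tuple[list[int], list[int]]:
    """Return (key-item codes without blanks, fixed-item codes) for one line."""
    if len(line) != LINE_WIDTH:
        raise ValueError(f"expected {LINE_WIDTH} codes, got {len(line)}")
    key_field, fixed_field = line[:KEY_WIDTH], line[KEY_WIDTH:]
    key_codes = [c for c in key_field if c != BLANK]  # s_1 ... s_N for word recognition
    return key_codes, fixed_field


sample = [BLANK, 1, BLANK, 2, BLANK, 3, BLANK, BLANK] + [500] * 17  # hypothetical codes
key_codes, fixed_codes = split_line(sample)
print(key_codes)         # [1, 2, 3]
print(len(fixed_codes))  # 17
```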

Word recognition is performed by the microprogram according to the algorithm shown in Fig. 4, except that step 108 of Fig. 4 is performed by the similarity calculation circuit. That is, the microprocessor 20 transfers the two character codes $s_i$ and $w_i^{(k)}$, the $i$-th character code of the $k$-th dictionary entry (see Fig. 4), to the circuit 30 and issues a similarity calculation command to it.

On receiving the command, the circuit 30 reads the two standard patterns corresponding to $s_i$ and $w_i^{(k)}$ from the storage 5, calculates the similarity between them, and returns it to the microprocessor 20. The above corresponds to step 203 of Fig. 1.

When word recognition by the microprogram is complete, processing moves on to verification, starting with step 204 of Fig. 1.

When the key item number resulting from key item recognition is known, the bit number conversion table in memory area 13 is consulted to obtain the bit number $b$ of the flag table that specifies the character types of the fixed item following that key item. Verification of the fixed-item recognition result, step 205, follows. The microprocessor 20 takes the character codes 802 corresponding to the fixed item one at a time from the recognition result character code string (Fig. 8) in memory area 15, and examines bit $b$ of the flag corresponding to each character code in the flag table in memory area 12 (see Fig. 5). If the bit is 1, the character type is permitted and nothing is done; if it is 0, the character type is not permitted, and the sign of the character code in 802 that produced this result is inverted. For example, if a recognized character code in the fixed item is 500 and verification finds the character not permitted, the code is inverted to -500.

If a fixed-item character code is already negative when it is sent from unit 4, that character code is not verified.
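Step 205 with sign inversion can be sketched as follows. `is_permitted` is assumed to wrap the flag-table lookup for the current key item, and treating the blank code 9999 as always permitted is an assumption of this sketch, not stated in the text. The function name is hypothetical.

```python
from typing import Callable

BLANK = 9999  # blank code (Fig. 8); treated as always permitted in this sketch


def verify_fixed_item(fixed_codes: list[int], is_permitted: Callable[[int], bool]) -> list[int]:
    """Negate each fixed-item code whose character type is not permitted (step 205)."""
    checked = []
    for code in fixed_codes:
        if code < 0 or code == BLANK:
            checked.append(code)   # already negative on arrival, or blank: not verified
        elif is_permitted(code):
            checked.append(code)   # flag bit is 1: permitted, left unchanged
        else:
            checked.append(-code)  # flag bit is 0: flagged by sign inversion, e.g. 500 -> -500
    return checked


print(verify_fixed_item([500, 600, -700, BLANK], lambda c: c == 600))
# [-500, 600, -700, 9999]
```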

For the character codes corresponding to the key item, the character code string of the dictionary entry given by the word recognition result is substituted into the key item character code string shown in Fig. 8. For example, even if the character recognition result 801 is 「申請臼」, when the word recognition result is the key item number corresponding to 「申請日」, the microprocessor 20 fetches the character code string of 「申請日」 from the dictionary stored in memory area 14 and overwrites 801 with it in place of 「申請臼」, so the error in the character recognition result is correctly corrected. If word recognition of the key item is undecidable, the subsequent character codes cannot be verified, so all character codes in that line are inverted to negative.
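The key item rewrite can be sketched as follows. Because only dictionary entries whose length equals the number of non-blank key characters can receive a nonzero score (step 103), this sketch writes the dictionary codes into exactly those non-blank positions so the blank layout is preserved; that placement is an assumption, as the text only says that 801 is overwritten. The function name and codes are hypothetical.

```python
from typing import Optional

BLANK = 9999   # blank code (Fig. 8)
KEY_WIDTH = 8  # positions 801


def correct_key_item(line: list[int], entry_codes: Optional[list[int]]) -> list[int]:
    """Rewrite field 801 with the dictionary spelling, or flag the whole line.

    entry_codes: the dictionary code string for the recognized key item number,
    or None if key-item word recognition was undecidable.
    """
    if entry_codes is None:
        return [-abs(c) for c in line]   # nothing on this line can be verified
    key_field = line[:KEY_WIDTH]
    positions = [i for i, c in enumerate(key_field) if c != BLANK]
    if len(positions) != len(entry_codes):
        raise ValueError("entry length must equal the number of non-blank key characters")
    corrected = list(line)
    for i, code in zip(positions, entry_codes):
        corrected[i] = code              # blanks keep their positions
    return corrected


# Hypothetical codes: 1 = 申, 2 = 請, 3 = 日, 5 = 臼 (misread of 日).
sample = [BLANK, 1, BLANK, 2, BLANK, 5, BLANK, BLANK] + [500] * 17
print(correct_key_item(sample, [1, 2, 3])[:8])  # [9999, 1, 9999, 2, 9999, 3, 9999, 9999]
```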

When verification is complete and the character code string of Fig. 8 has been rewritten (if there were no errors, it is in effect unchanged), the microprocessor 20 outputs the character code strings 801 and 802 on output line 50.

This process is executed for each line of the form.

[Summary] As explained above, the device of the present invention outputs character recognition results with misrecognitions correctly corrected, and can therefore reduce the misrecognition rate.

The results of this character recognition device can, for example, be displayed as follows so that a person makes the final decision.

A positive character code is displayed normally. A negative character code indicates a high likelihood of misrecognition, so the character can be displayed with different brightness or color, or with a special symbol beside it, and corrected by hand.
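A text-only stand-in for the display described above: the sketch below brackets negative codes instead of changing brightness or color, and also marks code 0 (unrecognizable) as suspect, which is an assumption. The function name and the code-to-character mapping are hypothetical.

```python
def render_line(codes: list[int], to_char: dict[int, str]) -> str:
    """Render verified codes as text, wrapping suspect characters in brackets."""
    parts = []
    for code in codes:
        if code > 0:
            parts.append(to_char.get(code, "?"))              # normal display
        elif code < 0:
            parts.append("[" + to_char.get(-code, "?") + "]")  # likely misrecognized
        else:
            parts.append("[?]")                                # code 0: unrecognizable
    return "".join(parts)


print(render_line([1, -2, 0], {1: "申", 2: "請"}))  # 申[請][?]
```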

The device of the present invention has two practical advantages. It can simply be attached after a conventional character recognition device and so requires no major changes, and the verification processing unit can easily be removed, leaving the recognition unit to operate as a conventional recognition device; the verification processing unit can therefore be treated as an option.

Another feature is that the measure of closeness between any two characters, which is needed when the dictionary is searched with a character code sequence that may contain errors during word recognition, is obtained from the similarity between standard patterns. A huge memory for storing closeness measures is therefore unnecessary. In the embodiment described in this specification, the similarity calculation circuit 30 is provided in the verification processor 10 of Fig. 7; however, since the recognition processing unit 4 inherently has a similarity calculation function, the circuit 30 can be incorporated into unit 4 with slight modifications, making the device more efficient as a whole.

[Brief explanation of drawings]

Fig. 1 is a flowchart for explaining the present invention. Fig. 2 shows the structure of the word dictionary, and Fig. 3 shows the character code string corresponding to a key item. Fig. 4 is a flowchart for explaining the word recognition (key item recognition) algorithm, Fig. 5 shows the flag table, and Fig. 6 shows the bit number conversion table. Fig. 7 is a block diagram of an embodiment of the present invention. Fig. 8 shows the character code string resulting from character recognition.

Claims

[Claims]

1. A character recognition device comprising: input means for inputting an unknown character pattern; output means for comparing the input unknown character pattern with standard patterns and outputting a recognition result for each word; storage means for storing word information; collation means for comparing and collating the recognition result for each word output by the output means with the word information; means for identifying the unknown character pattern for each word on the basis of the collation result; and means for displaying characters that are likely to have been misrecognized, as determined from the collation result, so that they are distinguished from other characters.