JPH06274702A

JPH06274702A - Character recognizing device

Info

Publication number: JPH06274702A
Application number: JP5063781A
Authority: JP
Inventors: Makoto Kushima; 真久島; Koichi Higuchi; 浩一樋口
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1993-03-23
Filing date: 1993-03-23
Publication date: 1994-09-30

Abstract

PURPOSE: To improve character recognition precision by storing correction frequency information for registered words and deciding the order of candidate words according to the correction frequency of words registered by users. CONSTITUTION: A post-processing part 16 holds a knowledge dictionary, and a word registration part 14 additionally registers word information desired by users into the knowledge dictionary. The correction frequency information contained in the knowledge dictionary is incremented by a control part 24 every time the registered word is corrected. The part 16 receives the character recognition result from a recognizing part 12 and calculates the evaluation value of each character string formed by combining the candidate characters for a single unit of knowledge processing. A character string whose evaluation value is less than the prescribed threshold value is sent to a result editing part 18 as a candidate word. If plural character strings have the same evaluation value, the candidate order of a user-registered word that satisfies X<S is set to 1 (X: correction frequency information value of the registered word, S: prescribed threshold value).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device with good recognition performance that can process forms or documents accurately and quickly.

[0002]

[Prior Art] Knowledge dictionaries have conventionally been used to improve the recognition rate of handwritten characters. A knowledge dictionary holds information on the words to be recognized, context information, and other information. The user of a character recognition device can additionally register word information that is not prepared in advance in the knowledge dictionary (a word registered by the user in this way is hereinafter referred to as a user-registered word). A conventional character recognition technique that uses the word information of a knowledge dictionary is described below, based on, for example, the technique shown in the Proceedings of the 1982 (Showa 57) General National Conference of the Institute of Electronics and Communication Engineers of Japan (電子通信学会), separate volume, p. 5-326.

[0003] First, a predetermined area of a form or document is optically scanned, and the optical signal from the paper surface is photoelectrically converted to obtain image data of the form or document. Character patterns to be recognized are then cut out from the image data. Each character is recognized on the basis of its cut-out character pattern, and one or more candidate characters are obtained as the recognition result.

[0004] Next, for the characters making up one unit of knowledge processing, character strings are formed by combining the one or more candidate characters obtained for each character, and an evaluation value between each character string and a word in the knowledge dictionary is calculated using the candidate rank or similarity attached to each candidate character of the string. One or more candidate words, ranked according to these evaluation values, are then displayed. If the evaluation value of the character string falls outside a predetermined range, however, the candidate character string formed by combining the rank-1 candidate characters is displayed instead.

[0005] If there are several candidate words with equal evaluation values, the user-registered word is generally given the highest priority, and the candidate ranks of the other words are determined according to the order in which they are registered in the knowledge dictionary.

[0006] If the first-ranked candidate word is wrong, it is corrected either by entering the correct word directly from the keyboard, or by displaying the candidate words ranked second or lower and selecting the correct word if it is among them.

[0007]

[Problems to be Solved by the Invention] In the character recognition technique described above, the user-registered word is given the highest priority whenever several candidate words have equal evaluation values. Consequently, when a word that is rarely used and has many similar words is registered as a user-registered word, it causes more errors in the first-ranked candidate word, and the character recognition performance deteriorates.

[0008] For example, consider a case where the name field of a form contains the handwritten name 「阿部」 (Abe), and the correct word is to be obtained by calculating evaluation values from the candidate characters produced by recognition. Suppose the first- and second-ranked candidates for 「阿」 are 「阿」 and 「河」, respectively, and the first- and second-ranked candidates for 「部」 are 「那」 and 「部」, respectively. Suppose further that 「阿部」 is prepared in the knowledge dictionary in advance, that 「河那」 (Kawana) is a word additionally registered by the user, and that 「阿那」 is not in the knowledge dictionary. The evaluation values of 「阿部」 and 「河那」 are then equal, and the user-registered word 「河那」 is given priority even though 「阿部」 occurs far more frequently than 「河那」.

[0009] To solve this problem, an object of the present invention is to provide a character recognition device whose recognition accuracy is improved by determining the candidate ranks of candidate words according to the correction frequency of user-registered words.

[0010]

[Means for Solving the Problems] To solve the above problem, a character recognition device according to the present invention comprises: a recognition unit that outputs the recognition result of character patterns cut out from quantized image data of a form or document; a word registration unit for registering word information in a knowledge dictionary; a post-processing unit that holds the knowledge dictionary and outputs a knowledge processing result based on the recognition result; a result editing unit that edits the knowledge processing result; a display unit that displays the knowledge processing result and the editing result; an input unit for entering a correct word; and a control unit that controls the operation of the result editing unit, the display unit, and the input unit. The device is characterized in that the knowledge dictionary stores frequency information on the correction frequency of registered words, the control unit updates the frequency information, and the post-processing unit determines the candidate ranks of candidate words according to the frequency information in subsequent processing.

[0011]

[Operation] According to the present invention, the post-processing unit determines the candidate ranks of candidate words according to the correction frequency of user-registered words. A rarely used registered word is therefore no longer selected preferentially, and the above problem is solved.

[0012]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. The drawings are shown only schematically, to the extent that the invention can be understood; the shape, arrangement, dimensions, input/output signals, and connections of the components are therefore not limited to the illustrated examples.

[0013] FIG. 1 is a functional block diagram used to explain one embodiment of the present invention. The character recognition device 10 of this embodiment comprises: a recognition unit 12 that holds a recognition dictionary, cuts out character patterns from quantized image data of a form or document, and outputs candidate characters for each cut-out character pattern; a word registration unit 14 that registers word information desired by the user of the character recognition device; a post-processing unit 16 that holds a knowledge dictionary and outputs the result of knowledge processing applied to the recognition result; a result editing unit 18 for correcting and confirming the recognition result; a display unit 20 that displays the knowledge processing result and the editing result; and an input unit 22 for entering a correct word. It further comprises a control unit 24 that controls the operation of the result editing unit 18, the display unit 20, and the input unit 22, and that updates the frequency information on the correction frequency of user-registered words. In FIG. 1, reference numeral 26 denotes a photoelectric conversion unit that outputs quantized image data of a form or document, and 28 denotes an image memory that stores the image data from the photoelectric conversion unit 26.

[0014] FIG. 2 shows the format of the content stored in the knowledge dictionary for one word. In the figure, 30 denotes the word information, 32 denotes a flag indicating whether the word is a user-registered word (hereinafter, the user flag), and 34 denotes frequency information on the correction frequency (hereinafter, the correction frequency information).

[0015] FIG. 3 shows an example of a form. In the figure, 36 denotes an example of a form on which an address is written, and 38 denotes an entry frame that designates a character writing area.

[0016] This embodiment is described in detail below with reference to FIGS. 1, 2, and 3. The photoelectric conversion unit 26 optically scans a predetermined reading range on a form or document, photoelectrically converts the optical signal L from the form or document, and outputs image data quantized into black-and-white binary values. The image memory 28 stores this image data.

[0017] The recognition unit 12 cuts out character patterns from the image data in the image memory 28 and extracts from each cut-out pattern various features of the character to be recognized. It then matches the features of the character pattern against standard character patterns and outputs candidate characters. One or more candidate characters are obtained as the recognition result for each character. When there is a single candidate character, it is output with candidate rank 1; when there are several, each is output with a candidate rank assigned according to its similarity.
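The rank assignment described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's implementation: the function name `rank_candidates_by_similarity` and the similarity values are hypothetical, and the source does not say how similarity is computed.

```python
def rank_candidates_by_similarity(similarities):
    """Attach candidate ranks to the candidate characters of one position.

    similarities: dict mapping a candidate character to its similarity to the
    input pattern (hypothetical values; higher means more similar).
    Returns a list of (candidate_rank, character), with rank 1 first.
    """
    ordered = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, char) for rank, (char, _) in enumerate(ordered, start=1)]


# A single candidate simply receives rank 1.
single = rank_candidates_by_similarity({"阿": 0.9})
# Several candidates are ranked by descending similarity.
several = rank_candidates_by_similarity({"河": 0.8, "阿": 0.9})
```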

[0018] The word registration unit 14 additionally registers in the knowledge dictionary the word information desired by the user of the character recognition dictionary.

[0019] The post-processing unit 16 holds the knowledge dictionary. The user flag 32 in the knowledge dictionary is set to 0 when the word information 30 was prepared in advance and to 1 when it was additionally registered by the user. The correction frequency information 34 has an initial value of 0 and is incremented by 1 by the control unit 24 each time the word of that word information, output with candidate rank 1, is corrected in the result editing unit 18.
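One way to picture the dictionary entry and its counter is the sketch below. The class `Entry`, the function `record_correction`, and the use of a Python dict keyed by word are assumptions for illustration; only the three fields (word information 30, user flag 32, correction frequency information 34) and the increment-by-1 rule come from the text.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str                  # word information (30)
    user_flag: int = 0         # user flag (32): 0 = prepared in advance, 1 = added by user
    correction_count: int = 0  # correction frequency information (34), initial value 0


def record_correction(dictionary, top_candidate, correct_word):
    """Add 1 to the counter of the rank-1 word when the operator replaced it."""
    if top_candidate != correct_word and top_candidate in dictionary:
        dictionary[top_candidate].correction_count += 1


dictionary = {"阿部": Entry("阿部"), "河那": Entry("河那", user_flag=1)}
record_correction(dictionary, "河那", "阿部")  # rank-1 word was wrong: counted
record_correction(dictionary, "阿部", "阿部")  # rank-1 word was right: not counted
```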

[0020] The post-processing unit 16 also performs the following knowledge processing on the recognition result from the recognition unit 12. When it receives the recognition result for the characters of one unit of knowledge processing (for example, the recognition result for the prefecture-name entry area 38 of the form 36 shown in FIG. 3), it matches the character strings formed by combining the candidate characters of each character in the unit against the words of the word information, and checks whether a word corresponding to a character string of candidate characters exists in the word information. When it finds, among the combined character strings, a character string A that matches a word of the word information, it calculates the evaluation value J of character string A. If S is the sum of the candidate ranks attached to the candidate characters of the string and N is the total number of characters in the string, the evaluation value J can be expressed as, for example, J = S ÷ N.
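The combination and scoring step can be sketched as below, using the 「阿部」 example from paragraph [0008] as input. The function name `evaluate_strings` and the data layout (one list of candidates per character position, ordered by candidate rank) are assumptions for illustration, and exact string equality stands in for the matching test.

```python
from itertools import product


def evaluate_strings(candidates_per_char, dictionary_words):
    """Return {word: J} for every candidate combination found in the dictionary.

    candidates_per_char: one list per character position, ordered by candidate
    rank (index 0 holds the rank-1 candidate).
    J = (sum of candidate ranks) / (number of characters).
    """
    words = set(dictionary_words)
    n = len(candidates_per_char)
    ranked = [list(enumerate(chars, start=1)) for chars in candidates_per_char]
    scores = {}
    for combo in product(*ranked):
        string = "".join(char for _, char in combo)
        if string in words:
            j = sum(rank for rank, _ in combo) / n
            # Keep the smallest J if the same string can be formed twice.
            scores[string] = min(j, scores.get(string, j))
    return scores


scores = evaluate_strings([["阿", "河"], ["那", "部"]], ["阿部", "河那"])
```

With these inputs both dictionary words receive J = 3 ÷ 2 = 1.5, which is the tie the invention sets out to resolve.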

[0021] Whether a word and character string A match is determined, for example, by whether the character codes at all corresponding positions of the word and the string are identical. When every character string formed for one unit of knowledge processing has been matched against the word information, each character string A whose evaluation value J is within a predetermined threshold is sent to the result editing unit 18 as a candidate word. Candidate ranks are assigned to the candidate words in ascending order of evaluation value. If several character strings A have the same evaluation value J and none of them is a user-registered word, their candidate ranks are determined by the order of registration in the knowledge dictionary. If several character strings A have the same evaluation value J and a user-registered word is among them, the candidate ranks are determined in this embodiment according to the correction frequency information, as follows.

[0022] Let X be the value of the correction frequency information of the user-registered word and S a predetermined threshold (not the rank sum S of paragraph [0020]). If X < S, the user-registered word is given candidate rank 1, and the candidate ranks of the other candidate words with the same evaluation value are determined by the order of registration in the knowledge dictionary. If X ≥ S, the user-registered word is not given priority, and the candidate ranks of all candidate words with the same evaluation value are determined by the order of registration in the recognition dictionary.
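The tie-breaking rule can be sketched as a sort key. This is an illustrative sketch only: the function name `order_tied_candidates`, the tuple layout, and the threshold value 3 are assumptions, a single registration index is used for both cases, and the behaviour when several user-registered words tie is not specified in the text (here they fall back to registration order among themselves).

```python
def order_tied_candidates(tied, threshold):
    """Order candidate words that share one evaluation value.

    tied: list of (word, registration_index, is_user_word, correction_count).
    threshold: the threshold compared with the correction count X.
    A user-registered word with X < threshold goes first; everything else
    follows in registration order.
    """
    def sort_key(item):
        _, registration_index, is_user_word, correction_count = item
        preferred = is_user_word and correction_count < threshold
        return (0 if preferred else 1, registration_index)

    return [item[0] for item in sorted(tied, key=sort_key)]


# 河那 is user-registered; 阿部 was registered first in the dictionary.
rarely_corrected = order_tied_candidates(
    [("阿部", 0, False, 0), ("河那", 1, True, 0)], threshold=3)
often_corrected = order_tied_candidates(
    [("阿部", 0, False, 0), ("河那", 1, True, 3)], threshold=3)
```

Once 「河那」 has been corrected as often as the threshold, it loses its priority and 「阿部」 is offered first.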

[0023] If only one character string A has been detected when all character strings for one unit of knowledge processing have been matched against the words of the word information, that string is sent to the result editing unit 18 with candidate rank 1. If no character string A has been detected, the character string formed by combining the rank-1 candidate character of each character in the unit is sent to the result editing unit 18 as the knowledge processing result.
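The fallback for the no-match case can be sketched in a few lines. The function name `knowledge_processing_output` and the argument layout are assumptions for illustration.

```python
def knowledge_processing_output(candidates_per_char, matched_words):
    """Return the strings passed on to result editing.

    candidates_per_char: one list per character position, rank-1 candidate first.
    matched_words: dictionary words already found and ordered by candidate rank.
    If nothing matched, fall back to the string of rank-1 candidate characters.
    """
    if matched_words:
        return list(matched_words)
    return ["".join(chars[0] for chars in candidates_per_char)]


fallback = knowledge_processing_output([["阿", "河"], ["那", "部"]], [])
single_match = knowledge_processing_output([["阿", "河"], ["那", "部"]], ["阿部"])
```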

[0024] The result editing unit 18 corrects the rank-1 candidate word or candidate character output by the post-processing unit 16 when it is wrong.

[0025] The display unit 20 displays the candidate words, the candidate characters, and the editing result.

[0026] The input unit 22 is used to enter the correct word when the rank-1 candidate word is wrong. When there is only one candidate word, or when there are several candidate words but none of those ranked second or lower is correct, the correct word is entered with a keyboard or the like. When there are several candidate words and one of those ranked second or lower is correct, the correct word is selected with a mouse or the like.

[0027] The control unit 24 controls the operation of the result editing unit 18, the display unit 20, and the input unit 22. Each time a first-ranked user-registered word is corrected in the result editing unit 18, it adds 1 to the correction frequency information in the knowledge dictionary.

[0028] The present invention is not limited to the embodiment described above; the configuration, operation, and processing of each component, the input/output signals, and the numerical conditions may be changed as appropriate. For example, in determining the candidate ranks when several candidate words have equal evaluation values and a user-registered word is among them, two stepped thresholds S1 and S2 with S1 < S2 may be used in place of the threshold S. In this case, when the correction frequency information X satisfies X < S1, the user-registered word is given the highest priority; when S1 ≤ X < S2, it is placed at an intermediate rank within the group of tied candidate words; and when X ≥ S2, it is placed at the lowest rank. Three or more stepped thresholds may also be set.
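The two-threshold variant can be sketched as follows. The function name `place_user_word` is hypothetical, and taking the "intermediate rank" to be the middle index of the tied group is an assumption; the text does not define it more precisely.

```python
def place_user_word(tied_others, user_word, x, s1, s2):
    """Insert a user-registered word into a group of tied candidate words.

    tied_others: the other tied words, already in registration order.
    x: correction count of the user-registered word; s1 < s2 are the thresholds.
    """
    if x < s1:
        position = 0                      # highest priority
    elif x < s2:
        position = len(tied_others) // 2  # intermediate rank (assumed: middle index)
    else:
        position = len(tied_others)       # lowest rank
    return tied_others[:position] + [user_word] + tied_others[position:]


top = place_user_word(["a", "b"], "u", x=0, s1=2, s2=5)
middle = place_user_word(["a", "b"], "u", x=3, s1=2, s2=5)
bottom = place_user_word(["a", "b"], "u", x=5, s1=2, s2=5)
```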

[0029] The threshold S used to determine the candidate ranks when several candidate words have equal evaluation values and a user-registered word is among them may also be set by the user when registering the word.

[0030] Alternatively, the threshold S used to determine the candidate ranks when several candidate words have equal evaluation values and a user-registered word is among them may be varied automatically according to the number and correction frequency information of the other candidate words output at the same time by the knowledge processing, the context information, and the like.

[0031] In the embodiment described above, the user additionally registers word information, through the word registration unit 14, in a knowledge dictionary that already stores word information. Alternatively, a separate user-specific knowledge dictionary may be provided and the required word information registered there.

[0032] In the embodiment described above, the evaluation value J is the sum S of the candidate ranks attached to the candidates of the character string divided by the total number N of characters in the string. In place of the rank sum S, the sum of scores corresponding to the candidate ranks may be used, with scores that decrease as the rank falls (for example, 100 points for rank 1 and 90 points for rank 2). Alternatively, the sum of the occurrence frequencies of the candidate characters of the string may be used in place of S (in this case, the occurrence frequencies are held in advance in the recognition dictionary). Alternatively, the distance between the dictionary matrix of each candidate character and the feature values of the corresponding character pattern may be obtained, and the sum of these distances over the candidates of the string used in place of S. Alternatively, both the sum of the occurrence frequencies of the candidate characters and the sum of the candidate ranks may be used in place of S.
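The score-based alternative can be sketched as below. Only the two values 100 (rank 1) and 90 (rank 2) come from the text; the linear decrease of 10 points per rank, the floor at 0, and the function names are assumptions. Because higher scores now mean better candidates, the ordering of candidate words would have to be reversed relative to the rank-sum version, a point the text leaves implicit.

```python
def points_for_rank(rank, top=100, step=10):
    """Score for a candidate rank: 100 for rank 1, 90 for rank 2, and so on.

    The linear step and the floor at 0 are assumptions for illustration.
    """
    return max(top - step * (rank - 1), 0)


def point_based_evaluation(ranks):
    """Evaluation value using the sum of rank scores instead of the rank sum."""
    return sum(points_for_rank(rank) for rank in ranks) / len(ranks)


# 阿部 in the earlier example is built from a rank-1 and a rank-2 candidate.
abe_score = point_based_evaluation([1, 2])
```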

[0033] Besides the method described above, the knowledge processing of the post-processing unit may be performed as follows. To check whether a word corresponding to a character string of candidate characters exists in the word information, the character strings for one unit of knowledge processing are matched against the words of the word information in the knowledge dictionary, and the similarity (or dissimilarity) between each character string and each word is calculated. A word is taken to correspond to a character string when, for example, its similarity to the string exceeds a predetermined threshold (or its dissimilarity does not exceed a predetermined threshold). Then:

(1) If character strings whose similarity exceeds the predetermined threshold (or whose dissimilarity does not exceed the predetermined threshold) are detected, the maximum similarity (or minimum dissimilarity) among them is determined. The word of the word information corresponding to the character string with this maximum similarity (or minimum dissimilarity) is output as the knowledge processing result, and the maximum similarity (or minimum dissimilarity) is output as the evaluation value of the knowledge processing.

(2) If no character string whose similarity exceeds the predetermined threshold (or whose dissimilarity does not exceed the predetermined threshold) is detected even after all character strings for one unit of knowledge processing have been matched against the words of the word information, the character string formed by combining the rank-1 candidate characters is output as the knowledge processing result, together with a predetermined lower limit of the similarity (or a predetermined upper limit of the dissimilarity) as the evaluation value. This lower limit of the similarity (or upper limit of the dissimilarity) indicates that no word corresponding to a character string of candidate characters existed in the word information.
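Cases (1) and (2) can be sketched as below. The text does not define the similarity measure, so the fraction of matching character positions used here is an assumption, as are the names `similarity` and `best_match` and the convention that the first string in the list is the rank-1 combination.

```python
def similarity(string, word):
    """Assumed measure: fraction of positions with identical characters."""
    if not word or len(string) != len(word):
        return 0.0
    return sum(a == b for a, b in zip(string, word)) / len(word)


def best_match(strings, words, threshold, lower_limit=0.0):
    """Return (knowledge processing result, evaluation value).

    strings: candidate strings; strings[0] is assumed to be the combination
    of rank-1 candidate characters.
    """
    best_word, best_similarity = None, lower_limit
    for string in strings:
        for word in words:
            value = similarity(string, word)
            if value > threshold and value > best_similarity:
                best_word, best_similarity = word, value
    if best_word is None:
        # Case (2): nothing exceeded the threshold.
        return strings[0], lower_limit
    # Case (1): the word with the maximum similarity.
    return best_word, best_similarity


found = best_match(["東京郁", "東京都"], ["東京都"], threshold=0.5)
not_found = best_match(["大板"], ["東京"], threshold=0.5)
```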

[0034] Although the embodiment above describes knowledge processing using word information, the present invention may also be applied to a character recognition device that performs knowledge processing using context information or other knowledge information.

[0035]

[Effects of the Invention] As described above, according to the present invention, the candidate ranks are determined according to the correction frequency of user-registered words, so that whatever words are additionally registered, continued use of the device prevents the recognition performance from deteriorating. As a result, even non-specialists can register words easily. A character recognition device with improved recognition performance, capable of processing forms or documents accurately and quickly, can therefore be provided.

[Brief description of drawings]

FIG. 1 is a functional block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing the format of the content stored in the knowledge dictionary.

FIG. 3 is a diagram showing an example of a form.

[Explanation of symbols]

10 character recognition device
12 recognition unit
14 word registration unit
16 post-processing unit
18 result editing unit
20 display unit
22 input unit
24 control unit
26 photoelectric conversion unit
28 image memory

Claims

[Claims]

1. A character recognition device comprising: a recognition unit that outputs the recognition result of character patterns cut out from quantized image data of a form or document; a word registration unit for registering word information in a knowledge dictionary; a post-processing unit that holds the knowledge dictionary and outputs a knowledge processing result based on the recognition result; a result editing unit that edits the knowledge processing result; a display unit that displays the knowledge processing result and the editing result; an input unit for entering a correct word; and a control unit that controls the operation of the result editing unit, the display unit, and the input unit, wherein the knowledge dictionary stores frequency information on the correction frequency of registered words, the control unit updates the frequency information, and the post-processing unit determines the candidate ranks of candidate words according to the frequency information in subsequent processing.