JPH06274702A - Character recognizing device - Google Patents

Character recognizing device

Info

Publication number
JPH06274702A
Authority
JP
Japan
Prior art keywords
candidate
word
character
unit
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5063781A
Other languages
Japanese (ja)
Inventor
Makoto Kushima
真 久島
Koichi Higuchi
浩一 樋口
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd filed Critical Oki Electric Industry Co Ltd
Priority to JP5063781A priority Critical patent/JPH06274702A/en
Publication of JPH06274702A publication Critical patent/JPH06274702A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To improve character recognition precision by storing correction frequency information for registered words and deciding the order of candidate words according to the correction frequency of the words registered by users. CONSTITUTION: A post-processing unit 16 holds a knowledge dictionary, and a word registration unit 14 additionally registers word information desired by users in the knowledge dictionary. The correction frequency information contained in the knowledge dictionary is incremented by a control unit 24 every time a registered word is corrected. The post-processing unit 16 receives the character recognition result from a recognition unit 12 and calculates the evaluation value of each character string formed by combining the candidate characters of a single unit of knowledge processing. A character string whose evaluation value is less than the prescribed threshold value is sent to a result editing unit 18 as a candidate word. If plural character strings have the same evaluation value, a word registered by a user that satisfies X<S is given candidate order 1 (X: correction frequency information value of the registered word, S: prescribed threshold value).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device with good recognition performance that can process forms or documents accurately and quickly.

[0002]

[Prior Art] Knowledge dictionaries have conventionally been used to improve the recognition rate of handwritten characters. A knowledge dictionary holds information on the words to be recognized, context information, and other information. The user of a character recognition device can additionally register word information that is not provided in the knowledge dictionary in advance (a word registered by the user in this way is hereinafter referred to as a user-registered word). A conventional character recognition technique that uses the word information of a knowledge dictionary is described below, based on the technique shown, for example, in the Proceedings of the 1982 (Showa 57) General National Conference of the Institute of Electronics and Communication Engineers of Japan, p. 5-326.

[0003] First, a predetermined area of a form or document is optically scanned, and the optical signal from the paper surface is photoelectrically converted to obtain image data of the form or document. The character patterns to be recognized are then cut out from the image data. Each character is recognized on the basis of its cut-out character pattern, and one or more candidate characters are obtained as the recognition result.

[0004] Then, for the characters of one unit of knowledge processing, character strings are formed by combining the one or more candidate characters obtained for each character, and an evaluation value between each character string and a word in the knowledge dictionary is calculated using the candidate rank or similarity assigned to each candidate character of the string. One or more candidate words, ranked according to the evaluation values of these character strings, are then displayed. If the evaluation value of a character string falls outside a predetermined range, however, the candidate character string formed by combining the rank-1 candidate characters is displayed.

[0005] If a plurality of candidate words have the same evaluation value, the user-registered word is generally given the highest priority, and the candidate ranks of the other words are determined according to the order in which they are registered in the knowledge dictionary.

[0006] If the candidate word ranked first is wrong, it is corrected either by entering the correct word directly from the keyboard, or by also displaying the candidate words ranked second or lower and, if the correct word is among them, selecting it.

[0007]

[Problems to Be Solved by the Invention] In the character recognition technique described above, the user-registered word is given the highest priority when a plurality of candidate words have the same evaluation value. Consequently, if a word that is rarely used and has many similar words is registered as a user-registered word, that word causes more errors in the first-ranked candidate word, and character recognition performance deteriorates.

[0008] For example, consider a case in which the entry "阿部" (Abe) in the name field of a form is recognized, an evaluation value is calculated from the resulting candidate characters, and the correct word is obtained. Suppose that the first- and second-ranked candidate characters for "阿" are "阿" and "河" respectively, that the first- and second-ranked candidate characters for "部" are "那" and "部" respectively, that "阿部" is provided in the knowledge dictionary in advance, and that "河那" (Kawana) is a word additionally registered by the user ("阿那" is assumed not to exist in the knowledge dictionary). The evaluation values of "阿部" and "河那" are then equal, and the user-registered word "河那" is given priority even though "阿部" appears far more frequently than "河那".
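The tie in this example can be checked with a few lines of Python. This is only an illustrative sketch: it assumes the evaluation value is the mean candidate rank (the form J = S ÷ N given later in the embodiment), and the names `ranks`, `knowledge_dictionary`, and `average_rank` are hypothetical, not taken from the patent.

```python
# Candidate rank of each character position, taken from the example above.
ranks = [{"阿": 1, "河": 2}, {"那": 1, "部": 2}]

# Hypothetical knowledge dictionary: word -> True if user-registered.
knowledge_dictionary = {"阿部": False, "河那": True}


def average_rank(word):
    """Evaluation value, assumed here to be the mean candidate rank (smaller is better)."""
    return sum(ranks[i][ch] for i, ch in enumerate(word)) / len(word)


scores = {word: average_rank(word) for word in knowledge_dictionary}

# Conventional tie-break: among equal evaluation values, user-registered words come first.
conventional_order = sorted(
    scores, key=lambda w: (scores[w], not knowledge_dictionary[w])
)

print(scores)              # {'阿部': 1.5, '河那': 1.5}
print(conventional_order)  # ['河那', '阿部']
```

Both words score 1.5, so the conventional rule places the rarely used "河那" ahead of the correct "阿部".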

[0009] To solve this problem, an object of the present invention is to provide a character recognition device with improved recognition accuracy, in which the candidate ranks of candidate words are determined according to the correction frequency of user-registered words.

[0010]

[Means for Solving the Problems] To solve the above problem, the character recognition device according to the present invention comprises: a recognition unit that outputs the recognition result of character patterns cut out from quantized image data of a form or document; a word registration unit for registering word information in a knowledge dictionary; a post-processing unit that holds the knowledge dictionary and outputs a knowledge processing result based on the recognition result; a result editing unit that edits the knowledge processing result; a display unit that displays the knowledge processing result and the editing result; an input unit for entering the correct word; and a control unit that controls the operation of the result editing unit, the display unit, and the input unit. The device is characterized in that the knowledge dictionary stores frequency information on the correction frequency of registered words, the control unit updates the frequency information, and the post-processing unit determines the candidate ranks of candidate words according to the frequency information in subsequent processing.

[0011]

[Operation] According to the present invention, the post-processing unit determines the candidate ranks of candidate words according to the correction frequency of user-registered words. A registered word that is rarely used is therefore no longer selected preferentially, and the above problem is solved.

[0012]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. The drawings are shown only schematically, to the extent needed to understand the invention, and therefore the shape, arrangement, dimensions, input/output signals, and connections of the components are not limited to the illustrated example.

[0013] FIG. 1 is a functional block diagram used to explain an embodiment of the present invention. The character recognition device 10 of this embodiment comprises: a recognition unit 12 that holds a recognition dictionary, cuts out character patterns from quantized image data of a form or document, and outputs candidate characters for each cut-out character pattern; a word registration unit 14 that registers word information desired by the user of the character recognition device; a post-processing unit 16 that holds a knowledge dictionary and outputs the knowledge processing result for the recognition result; a result editing unit 18 for correcting and confirming the recognition result; a display unit 20 that displays the knowledge processing result and the editing result; and an input unit 22 for entering the correct word. It further comprises a control unit 24 that controls the operation of the result editing unit 18, the display unit 20, and the input unit 22, and that updates the frequency information on the correction frequency of user-registered words. In FIG. 1, 26 is a photoelectric conversion unit that outputs quantized image data of a form or document, and 28 is an image memory that stores the image data from the photoelectric conversion unit 26.

[0014] FIG. 2 shows the format of the stored contents of the knowledge dictionary for one word. In the figure, 30 is word information, 32 is a flag indicating whether the word is a user-registered word (hereinafter referred to as the user flag), and 34 is frequency information on the correction frequency (hereinafter referred to as the correction frequency information).
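The per-word record described for FIG. 2 could be modelled roughly as follows. This is a sketch under stated assumptions: the class and field names (`DictionaryEntry`, `user_registered`, `correction_count`) are hypothetical; only the three fields themselves and their reference numerals come from the text.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    """One knowledge-dictionary record, following the three fields of FIG. 2."""

    word: str                      # word information (30)
    user_registered: bool = False  # user flag (32): True for a user-registered word
    correction_count: int = 0      # correction frequency information (34)


# A word provided in advance and a word added by the user.
builtin = DictionaryEntry("阿部")
added = DictionaryEntry("河那", user_registered=True)
```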

[0015] FIG. 3 shows an example of a form. In the figure, 36 is an example of a form on which an address is written, and 38 is an entry frame that designates a character entry area.

[0016] This embodiment is described in detail below with reference to FIGS. 1, 2, and 3. The photoelectric conversion unit 26 optically scans a predetermined reading range on a form or document, photoelectrically converts the optical signal L from the form or document, and outputs image data quantized into black-and-white binary values. The image memory 28 stores this image data.

[0017] The recognition unit 12 cuts out character patterns from the image data in the image memory 28 and extracts various features of the character to be recognized from each cut-out character pattern. It then compares the features of the character pattern with standard character patterns and outputs candidate characters. One or more candidate characters are obtained as the recognition result for each character. If there is one candidate character, it is output with candidate rank 1; if there are several, each candidate character is output with a candidate rank determined according to its similarity.
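The ranking step of the recognition unit can be sketched as below. The similarity values and the function name `rank_by_similarity` are made up for illustration; the text only states that candidate ranks follow the similarity to the standard character patterns.

```python
def rank_by_similarity(similarities):
    """Turn {candidate character: similarity} into [(candidate rank, character), ...].

    Rank 1 goes to the most similar candidate. A single candidate simply gets rank 1.
    """
    ordered = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [(rank, ch) for rank, (ch, _) in enumerate(ordered, start=1)]


# Hypothetical similarities for one cut-out character pattern.
ranked = rank_by_similarity({"河": 0.81, "阿": 0.93})
print(ranked)  # [(1, '阿'), (2, '河')]
```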

[0018] The word registration unit 14 additionally registers word information desired by the user in the knowledge dictionary.

[0019] The post-processing unit 16 holds the knowledge dictionary. The user flag 32 in the knowledge dictionary is set to 0 if the word information 30 was provided in advance and to 1 if it was additionally registered by the user. The correction frequency information 34 has an initial value of 0 and is incremented by 1 by the control unit 24 each time the word of that word information, output with candidate rank 1, is corrected in the result editing unit 18.

[0020] The post-processing unit 16 also performs the following knowledge processing based on the recognition result from the recognition unit 12. When the recognition results for the characters of one unit of knowledge processing (for example, the recognition results for the prefecture-name entry area 38 on the form 36 shown in FIG. 3) are input, the character strings formed by combining the candidate characters of each character in the unit are compared with the words of the word information to check whether a word corresponding to a string of candidate characters exists in the word information. When a character string A that matches a word of the word information is found among the combined strings, the evaluation value J of character string A is calculated. If S denotes the sum of the candidate ranks assigned to the candidate characters of the string and N denotes the total number of characters in the string, the evaluation value J can be expressed, for example, as J = S ÷ N.
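As a rough sketch of this step, the following enumerates every combination of candidate characters for one unit of knowledge processing, keeps the strings that exactly match a dictionary word, and computes J = S ÷ N for each. The function name `matched_strings` and the sample candidates are assumptions for illustration only.

```python
from itertools import product


def matched_strings(candidates, words):
    """Return [(character string A, J), ...] for combinations found in `words`.

    candidates: one list per character position, best candidate first.
    words: set of dictionary words.
    """
    # Attach candidate ranks 1, 2, ... to the characters at each position.
    ranked = [list(enumerate(chars, start=1)) for chars in candidates]
    results = []
    for combo in product(*ranked):
        text = "".join(ch for _, ch in combo)
        if text in words:
            s = sum(rank for rank, _ in combo)  # S: sum of candidate ranks
            n = len(combo)                      # N: number of characters
            results.append((text, s / n))       # J = S / N
    return results


# Hypothetical recognition result for a prefecture-name field.
results = matched_strings([["大", "太"], ["坂", "阪"], ["府"]], {"大阪府", "京都府"})
```

Here only "大阪府" matches, with S = 1 + 2 + 1 = 4 and N = 3, so J is 4/3.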

[0021] Whether a word and character string A match is determined, for example, by whether the character codes at all corresponding positions of the word and character string A are identical. When all the character strings formed for one unit of knowledge processing have been compared with the word information, each character string A whose evaluation value J is within a predetermined threshold is sent to the result editing unit 18 as a candidate word. Candidate ranks are assigned to the candidate words in ascending order of evaluation value. If there are several character strings A with the same evaluation value J and none of them is a user-registered word, their candidate ranks are determined by the order of registration in the knowledge dictionary. If there are several character strings A with the same evaluation value J and a user-registered word is among them, this embodiment determines the candidate ranks according to the correction frequency information as follows.

[0022] If the value X of the correction frequency information of the user-registered word and a predetermined threshold S satisfy X < S, the user-registered word is given candidate rank 1, and the candidate ranks of the other candidate words with the same evaluation value are determined by the order of registration in the knowledge dictionary. If X ≧ S, the user-registered word is not given priority, and the candidate ranks of all candidate words with the same evaluation value are determined by the order of registration in the recognition dictionary.
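The tie-breaking rule can be sketched as a sort key. This is a minimal illustration, not the patented implementation: the class `Candidate` and function `rank_candidates` are hypothetical, the threshold is named `threshold` because the text reuses the letter S for both the rank sum and the threshold, and the handling of several favoured user-registered words in one tie group (registration order) is an assumption the text does not address.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    j: float               # evaluation value J (smaller is better)
    order: int             # registration order in the dictionary
    user_registered: bool
    corrections: int = 0   # correction frequency value X


def rank_candidates(candidates, threshold):
    """Sort by J; within a tie, a user-registered word with X < threshold goes first."""

    def key(c):
        favoured = c.user_registered and c.corrections < threshold
        return (c.j, not favoured, c.order)

    return sorted(candidates, key=key)


abe = Candidate("阿部", 1.5, order=0, user_registered=False)
fresh = Candidate("河那", 1.5, order=1, user_registered=True, corrections=0)
often_corrected = Candidate("河那", 1.5, order=1, user_registered=True, corrections=3)

print([c.word for c in rank_candidates([abe, fresh], threshold=3)])            # ['河那', '阿部']
print([c.word for c in rank_candidates([abe, often_corrected], threshold=3)])  # ['阿部', '河那']
```

Once "河那" has been corrected as often as the threshold, it loses its priority and the tie falls back to registration order.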

[0023] If only one character string A has been detected when all the character strings for one unit of knowledge processing have been compared with the words of the word information, that character string A is sent to the result editing unit 18 with candidate rank 1. If no character string A has been detected when all the character strings for one unit of knowledge processing have been compared with the words of the word information, the character string formed by combining the rank-1 candidate characters of each character in the unit is sent to the result editing unit 18 as the knowledge processing result.
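The two output cases above (at least one matching string, or none) might be expressed as follows. The function name `knowledge_processing_output` and its argument layout are hypothetical; `ranked_matches` is assumed to be already ordered by candidate rank.

```python
def knowledge_processing_output(candidates, ranked_matches):
    """Return [(candidate rank, string), ...] to pass on for editing.

    candidates: one list per character position, best candidate first.
    ranked_matches: matching dictionary words, already in candidate-rank order.
    """
    if ranked_matches:
        return [(rank, word) for rank, word in enumerate(ranked_matches, start=1)]
    # No dictionary word matched: fall back to the rank-1 character at each position.
    fallback = "".join(chars[0] for chars in candidates)
    return [(1, fallback)]


candidates = [["阿", "河"], ["那", "部"]]
print(knowledge_processing_output(candidates, ["阿部"]))  # [(1, '阿部')]
print(knowledge_processing_output(candidates, []))        # [(1, '阿那')]
```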

[0024] The result editing unit 18 corrects the rank-1 candidate word or candidate character output by the post-processing unit 16 when it is wrong.

[0025] The display unit 20 displays the candidate words, the candidate characters, and the editing result.

[0026] The input unit 22 is used to enter the correct word when the rank-1 candidate word is wrong. If there is only one candidate word, or if there are several candidate words and the correct word is not among those ranked second or lower, the correct word is entered with a keyboard or the like. If there are several candidate words and the correct word is among those ranked second or lower, the correct word is selected with a mouse or the like.

[0027] The control unit 24 controls the operation of the result editing unit 18, the display unit 20, and the input unit 22. In addition, each time a user-registered word ranked first is corrected in the result editing unit 18, it adds 1 to the correction frequency information in the knowledge dictionary.
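The counter update performed by the control unit could look roughly like this. It is a sketch only: the dictionary layout (a plain `dict` of records with `user_registered` and `corrections` keys) and the function name `record_correction` are assumptions.

```python
def record_correction(dictionary, displayed_top, corrected_to):
    """Add 1 to the correction count when a rank-1 user-registered word is corrected.

    Returns True if the count was incremented.
    """
    entry = dictionary.get(displayed_top)
    if entry is None or displayed_top == corrected_to:
        return False  # not a dictionary word, or the operator confirmed it unchanged
    if not entry["user_registered"]:
        return False  # only user-registered words are counted
    entry["corrections"] += 1
    return True


# Hypothetical dictionary contents.
dictionary = {
    "阿部": {"user_registered": False, "corrections": 0},
    "河那": {"user_registered": True, "corrections": 0},
}
record_correction(dictionary, displayed_top="河那", corrected_to="阿部")
print(dictionary["河那"]["corrections"])  # 1
```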

[0028] The present invention is not limited to the embodiment described above, and the configuration, operation, processing content, input/output signals, and numerical conditions of each component may be changed as appropriate. For example, when several candidate words have the same evaluation value and a user-registered word is among them, two thresholds S1 and S2 with S1 < S2 may be provided in stages in place of the threshold S used to determine the candidate ranks. In this case, the candidate ranks are determined so that the user-registered word is given the highest priority when the correction frequency information X satisfies X < S1, is placed at an intermediate rank within the group of candidate words when S1 ≦ X < S2, and is placed at the lowest rank when X ≧ S2. Three or more thresholds may also be set in stages.
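A sketch of the two-threshold variant follows. The function name `place_user_word` is hypothetical, and "intermediate rank" is interpreted here as the middle position of the tie group, which is an assumption; the text does not fix the exact position.

```python
def place_user_word(others, user_word, x, s1, s2):
    """Insert a user-registered word into a group of equally scored candidates.

    others: the other tied candidates, already in dictionary registration order.
    x: correction frequency value X of the user-registered word.
    """
    if not s1 < s2:
        raise ValueError("thresholds must satisfy s1 < s2")
    if x < s1:
        position = 0                 # highest priority
    elif x < s2:
        position = len(others) // 2  # assumed meaning of "intermediate rank"
    else:
        position = len(others)       # lowest rank in the group
    return others[:position] + [user_word] + others[position:]


others = ["阿部", "安部"]
print(place_user_word(others, "河那", x=0, s1=2, s2=5))  # ['河那', '阿部', '安部']
print(place_user_word(others, "河那", x=3, s1=2, s2=5))  # ['阿部', '河那', '安部']
print(place_user_word(others, "河那", x=5, s1=2, s2=5))  # ['阿部', '安部', '河那']
```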

[0029] The threshold S used to determine the candidate ranks when several candidate words have the same evaluation value and a user-registered word is among them may also be set by the user when the word is registered.

[0030] The threshold S used to determine the candidate ranks when several candidate words have the same evaluation value and a user-registered word is among them may also be changed automatically according to the number and correction frequency information of the other candidate words output at the same time by the knowledge processing, the context information, and the like.

[0031] In the embodiment described above, the user uses the word registration unit 14 to additionally register word information in the knowledge dictionary that already stores word information. Alternatively, a separate knowledge dictionary dedicated to the user may be provided, and the necessary word information may be registered there.

[0032] In the embodiment described above, the evaluation value J is the sum S of the candidate ranks assigned to the candidates of the character string divided by the total number N of characters in the string. Instead of the sum S of the candidate ranks, the sum of scores associated with the candidate ranks may be used (scores that decrease as the candidate rank falls, for example 100 points for candidate rank 1 and 90 points for candidate rank 2). Alternatively, the sum of the appearance frequencies of the candidate characters in the string may be used in place of the sum S (in this case the appearance frequencies are held in the recognition dictionary in advance). Alternatively, the distance between the dictionary matrix of each candidate character and the feature quantity of the corresponding character pattern may be obtained, and the sum of these distances over the candidates of the string may be used in place of the sum S. Alternatively, the sum of the appearance frequencies of the candidate characters of the string and the sum of the candidate ranks may be used together in place of the sum S.
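The score-based variant can be sketched as below. Only the 100-point and 90-point values come from the text; the 10-point step for lower ranks, the floor at 0, and the function names are assumptions. Because scores fall as the rank falls, a larger value would presumably indicate a better match, so comparisons against thresholds would run in the opposite direction to the rank-sum form; that reading is an inference, not something the text states.

```python
def rank_points(rank, top=100, step=10):
    """Points for a candidate rank: 100 for rank 1, 90 for rank 2, and so on (assumed step)."""
    return max(top - step * (rank - 1), 0)


def score_based_value(ranks):
    """Evaluation value using the sum of points in place of the rank sum S, divided by N."""
    return sum(rank_points(r) for r in ranks) / len(ranks)


print(score_based_value([1, 2]))  # 95.0
print(score_based_value([1, 1]))  # 100.0
```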

[0033] In addition to the above, the knowledge processing of the post-processing unit may be performed as follows. To check whether a word corresponding to a string of candidate characters exists in the word information, the character strings for one unit of knowledge processing are compared with the words of the word information in the knowledge dictionary, and the similarity (or dissimilarity) between each character string and each word is calculated. A word corresponding to a character string is detected as, for example, a word whose similarity to the character string exceeds a predetermined threshold (or a word whose dissimilarity to the character string does not exceed a predetermined threshold). Then:

(1) If a character string whose similarity exceeds the predetermined threshold (or whose dissimilarity does not exceed the predetermined threshold) is detected, the maximum similarity (or minimum dissimilarity) among the detected character strings is found. The word of the word information corresponding to the character string with this maximum similarity (or minimum dissimilarity) is output as the knowledge processing result, and this maximum similarity (or minimum dissimilarity) is output as the evaluation value of the knowledge processing.

(2) If no character string whose similarity exceeds the predetermined threshold (or whose dissimilarity does not exceed the predetermined threshold) can be detected even after all the character strings for one unit of knowledge processing have been compared with the words of the word information, the character string formed by combining the rank-1 candidate characters is output as the knowledge processing result, and a predetermined lower limit of the similarity (or a predetermined upper limit of the dissimilarity) is output as the evaluation value.

This lower limit of the similarity (or upper limit of the dissimilarity) indicates that no word corresponding to a string of candidate characters existed in the word information.
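Cases (1) and (2) can be sketched as follows. The text does not define how similarity is computed, so the position-wise match ratio used here is purely a stand-in, and `similarity`, `best_match`, and `LOWER_LIMIT` are hypothetical names.

```python
from itertools import product

LOWER_LIMIT = 0.0  # assumed "predetermined lower limit" signalling that no word matched


def similarity(text, word):
    """Stand-in similarity: fraction of positions with identical characters."""
    if len(text) != len(word):
        return 0.0
    return sum(a == b for a, b in zip(text, word)) / len(word)


def best_match(candidates, words, threshold):
    """Return (knowledge processing result, evaluation value)."""
    best_word, best_sim = None, threshold
    for combo in product(*candidates):
        text = "".join(combo)
        for word in words:
            sim = similarity(text, word)
            if sim > best_sim:  # must exceed the threshold and any earlier best
                best_word, best_sim = word, sim
    if best_word is None:
        # Case (2): fall back to the rank-1 characters and the lower limit.
        return "".join(chars[0] for chars in candidates), LOWER_LIMIT
    return best_word, best_sim  # case (1)


candidates = [["束", "東"], ["京"], ["郡"]]
print(best_match(candidates, ["東京都"], threshold=0.9))  # ('束京郡', 0.0)
```

With a looser threshold such as 0.5, the same input would return "東京都" with a similarity of 2/3, since the string "東京郡" matches the word in two of three positions.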

[0034] Furthermore, although the embodiment above describes an example of knowledge processing that uses word information, the present invention may also be applied to a character recognition device that performs knowledge processing using context information or other knowledge information.

[0035]

[Effects of the Invention] As described above, according to the present invention, candidate ranks are determined according to the correction frequency of user-registered words. Whatever words are additionally registered, continued use of the device therefore prevents recognition performance from deteriorating. As a result, even non-specialists can register words easily. A character recognition device with improved recognition performance, capable of processing forms or documents accurately and quickly, can thus be provided.

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing the format of the stored contents of the knowledge dictionary.

FIG. 3 is a diagram showing an example of a form.

[Explanation of Reference Numerals]

10: character recognition device
12: recognition unit
14: word registration unit
16: post-processing unit
18: result editing unit
20: display unit
22: input unit
24: control unit
26: photoelectric conversion unit
28: image memory

Claims (1)

[Claims]

1. A character recognition device comprising: a recognition unit that outputs the recognition result of character patterns cut out from quantized image data of a form or document; a word registration unit for registering word information in a knowledge dictionary; a post-processing unit that holds the knowledge dictionary and outputs a knowledge processing result based on the recognition result; a result editing unit that edits the knowledge processing result; a display unit that displays the knowledge processing result and the editing result; an input unit for entering the correct word; and a control unit that controls the operation of the result editing unit, the display unit, and the input unit, characterized in that the knowledge dictionary stores frequency information on the correction frequency of registered words, the control unit updates the frequency information, and the post-processing unit determines the candidate ranks of candidate words according to the frequency information in subsequent processing.
JP5063781A 1993-03-23 1993-03-23 Character recognizing device Pending JPH06274702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5063781A JPH06274702A (en) 1993-03-23 1993-03-23 Character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5063781A JPH06274702A (en) 1993-03-23 1993-03-23 Character recognizing device

Publications (1)

Publication Number Publication Date
JPH06274702A true JPH06274702A (en) 1994-09-30

Family

ID=13239273

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5063781A Pending JPH06274702A (en) 1993-03-23 1993-03-23 Character recognizing device

Country Status (1)

Country Link
JP (1) JPH06274702A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007102264A (en) * 2005-09-30 2007-04-19 Toshiba Corp Character recognition device and method
JP4528705B2 (en) * 2005-09-30 2010-08-18 株式会社東芝 Character recognition device and character recognition method

Similar Documents

Publication Publication Date Title
US20030189603A1 (en) Assignment and use of confidence levels for recognized text
JPH06274702A (en) Character recognizing device
JP3221968B2 (en) Character recognition device
JP3274014B2 (en) Character recognition device and character recognition method
JP4047895B2 (en) Document proofing apparatus and program storage medium
JP2894305B2 (en) Recognition device candidate correction method
JPH0935006A (en) Character recognition device
JP2002207960A (en) Method and program for recognized character correction
JP4047894B2 (en) Document proofing apparatus and program storage medium
JP4318223B2 (en) Document proofing apparatus and program storage medium
JP3101073B2 (en) Post-processing method for character recognition
JP2677271B2 (en) Character recognition device
JPH06295358A (en) Character recognition device
JPH08297663A (en) Device and method for correcting input error
JP2976990B2 (en) Character recognition device
JPH05298495A (en) Character recognizing device, erroneous recognition character correcting method and occidental document processor
JPH07114622A (en) Postprocessing method of character recognition device
JPH06333083A (en) Optical character reader
JPH11120294A (en) Character recognition device and medium
JPH05120472A (en) Character recognizing device
JPH10261049A (en) Character recognizing device
JPH1185910A (en) Device for recognizing character and method therefor and recording medium for recording the same method
JPS63150788A (en) Character recognition device
JPH01171080A (en) Recognizing device for error automatically correcting character
JPH11143983A (en) Character recognition device and method and computer readable recording medium storing character recognition program