JPH03268089A - Optical character reader - Google Patents

Optical character reader

Info

Publication number
JPH03268089A
Authority
JP
Japan
Prior art keywords
subset information
character
subset
recognition
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2066820A
Other languages
Japanese (ja)
Other versions
JP2954968B2 (en)
Inventor
Katsumi Yaguchi
矢口 克巳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1990-03-19
Filing date
1990-03-19
Publication date
1991-11-28
Application filed by Toshiba Corp
Priority to JP2066820A
Publication of JPH03268089A
Application granted
Publication of JP2954968B2
Anticipated expiration
Expired - Lifetime (current legal status)

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To spare the operator repeated, troublesome corrections by recognizing characters using subset information that includes a corrected character, once the operator has corrected a character that was not included in the subset information.
CONSTITUTION: The reader comprises a photoelectric conversion unit 1, a character recognition unit 2, a subset information storage unit 3, a reading control unit 4, an operation unit 5, and a display unit 6. When a recognition result is corrected, subset information management means 41 reflects the correction in the subset information stored in unit 3. Thereafter the character range is limited by the updated subset information, so characters are recognized in line with the change in the subset range and the same correction no longer has to be repeated.

Description

DETAILED DESCRIPTION OF THE INVENTION
[Object of the Invention]
(Field of Industrial Application)
The present invention relates to an optical character reader (OCR).

(Prior Art)
To improve reading accuracy, conventional OCR limits the character types to be read for each entry field. For example, when it is known in advance that only numerals will be entered, as in an amount field, "numerals" is specified as the set of characters to be read; for a field in which only katakana is entered, such as a furigana field, "katakana" is specified. Likewise, for fields in which kanji are entered, such as name and address fields, kanji should be specified as the characters to be read. Because there are so many kanji (roughly 2,000 or more in JIS Level 1 alone), however, the name field is limited to the kanji commonly used in names, and the address field to the kanji commonly used in addresses. Such a designation is called a subset designation, or simply a subset, and the information on the characters within the designated range is called subset information.
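
A subset designation of this kind can be pictured as a per-field whitelist applied to a recognizer's ranked candidates. The following is a minimal sketch rather than the patent's implementation: the field names, the `SUBSETS` table, the function `pick_in_subset`, and the candidate scores are all hypothetical and chosen only for illustration.

```python
# Hypothetical per-field subsets; the real subset contents are not given in the source.
SUBSETS = {
    "amount": set("0123456789"),
    "furigana": set("アイウエオカキクケコ"),  # deliberately truncated katakana set
}

def pick_in_subset(candidates, subset):
    """Return the best-scoring candidate that belongs to the subset, or None.

    candidates: list of (character, score) pairs; a higher score is a better match.
    """
    allowed = [(ch, score) for ch, score in candidates if ch in subset]
    if not allowed:
        return None  # nothing inside the subset matched: recognition fails
    return max(allowed, key=lambda pair: pair[1])[0]

# A handwritten "0" that also resembles the letter "O" (scores are made up).
candidates = [("O", 0.91), ("0", 0.88), ("D", 0.40)]
print(pick_in_subset(candidates, SUBSETS["amount"]))  # prints: 0
```

Restricting the search to the amount subset discards the higher-scoring but impossible "O", which is the accuracy benefit the passage describes. The same filter returns `None` when the true character lies outside the subset.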

(Problems to Be Solved by the Invention)
In such conventional OCR, however, when a subset is designated, a character that is not included in the subset information is either misrecognized or cannot be recognized at all. For example, when reading a form such as a bank transfer request, the fields for the destination bank name and branch name are read using a bank-name subset and a branch-name subset, respectively. If a new bank or branch is established and its name is not included in the subset, the name is misread or cannot be recognized.

In such cases an operator must intervene to correct the result, which makes processing cumbersome.

The present invention was made to solve these problems of conventional OCR. Its object is to provide an optical character reader in which, once a character not included in the subset information has appeared and been corrected by the operator, subsequent character recognition can be performed using subset information that includes that character.

[Structure of the Invention]
(Means for Solving the Problems)
The optical character reader of the present invention comprises: photoelectric conversion means for photoelectrically converting a form image to obtain an image signal; a recognition dictionary unit storing various parameters used during character recognition; subset information storage means storing subset information that limits the range of characters subject to recognition; character recognition means that, when required, limits the character range using the subset information stored in the subset information storage means, refers to the parameters of the recognition dictionary unit, and performs character recognition based on the image signal obtained by the photoelectric conversion means; correction means for correcting the recognition result of the character recognition means; and subset information management means for reflecting the correction made by the correction means in the subset information in the subset information storage means.

(Operation)
With this configuration, when a recognition result is corrected, the subset information management means operates and reflects the correction in the subset information in the subset information storage means, so that a character range can be added when desired.

(Embodiment)
An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of an embodiment of the present invention. In the figure, reference numeral 1 denotes a photoelectric conversion unit, which includes, for example, an image scanner such as a CCD, a noise removal circuit, and an A/D conversion circuit; it converts the photoelectrically converted analog signal into a digital image signal for character recognition and supplies it to a character recognition unit 2. The character recognition unit 2 includes a recognition dictionary unit 21 that stores similarity-detection parameters used for methods such as pattern matching and feature extraction, and performs character recognition using those methods. Reference numeral 3 denotes a subset information storage unit, which stores the subset information described above that limits the range of characters subject to recognition. On instruction from a reading control unit 4, the character recognition unit 2 performs recognition with the character range limited by the relevant subset information in the subset information storage unit 3.
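
One way to picture how the recognition stage combines the recognition dictionary with the subset information is a similarity search over only those dictionary entries the subset allows. This is a rough sketch under stated assumptions: the feature vectors, the cosine-similarity measure, the threshold value, and the names `DICTIONARY`, `cosine`, and `recognize` are illustrative and do not come from the patent, which only says that similarity-detection parameters are stored.

```python
import math

# Hypothetical recognition dictionary: character -> reference feature vector.
DICTIONARY = {
    "1": [1.0, 0.0, 0.1],
    "7": [0.9, 0.4, 0.1],
    "イ": [0.8, 0.5, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(features, subset, threshold=0.9):
    """Match features against the dictionary entries allowed by the subset.

    Returns the best-matching character, or None when no allowed entry
    reaches the (assumed) similarity threshold.
    """
    best_char, best_score = None, threshold
    for ch, reference in DICTIONARY.items():
        if ch not in subset:
            continue  # the subset limits the character range that is searched
        score = cosine(features, reference)
        if score >= best_score:
            best_char, best_score = ch, score
    return best_char
```

With features close to the katakana "イ", a numerals-only subset makes the function return "7" (a misread) or `None` (unrecognizable) depending on which entries are allowed, which mirrors the two failure modes the description attributes to characters outside the subset.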

The reading control unit 4 receives a subset designation from an operation unit 5, which consists of a keyboard input device or the like, and passes it to the character recognition unit 2. The result recognized by the character recognition unit 2 is supplied to the reading control unit 4 as a character code, and the reading control unit 4 passes this character code to a display unit 6. The display unit 6 includes, for example, a CRT with its controller and a pattern generator; it converts the character code received from the reading control unit 4 into a pattern and displays the corresponding character. At this point the operation unit 5 functions as the correction means. The operation unit is provided with a correction key, cursor keys, a next-candidate key, character keys, and so on. The operator positions the cursor under the character to be corrected, presses the correction key, and enters the correct character using the next-candidate key or the character keys. The character code of this character is supplied to the display unit 6 via the reading control unit 4, either from the operation unit 5 or from the character recognition unit 2, and is displayed. Once the correct character is displayed, the operator moves the cursor and continues correcting in the same way. The corrected data is transferred to and held in a storage device (not shown) when the key on the operation unit 5 that confirms the data is pressed.
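
The correction operation just described (cursor, next-candidate key, character keys, confirm key) can be modeled roughly as follows. The class `CorrectionSession`, its method names, and the misread candidates "之" and "乏" are hypothetical; the source names the keys but does not specify any data structures.

```python
class CorrectionSession:
    """Hypothetical model of operator correction on the displayed result."""

    def __init__(self, ranked):
        # ranked[i] is the recognizer's candidate list for position i, best first.
        self.ranked = ranked
        self.choice = [0] * len(ranked)                  # index of the candidate shown
        self.text = [cands[0] for cands in ranked]       # characters currently displayed
        self.corrections = {}                            # position -> corrected character

    def next_candidate(self, pos):
        """Next-candidate key: show the next-ranked candidate at the cursor."""
        self.choice[pos] = (self.choice[pos] + 1) % len(self.ranked[pos])
        self._set(pos, self.ranked[pos][self.choice[pos]])

    def type_char(self, pos, ch):
        """Character key: overwrite the character at the cursor."""
        self._set(pos, ch)

    def _set(self, pos, ch):
        self.text[pos] = ch
        self.corrections[pos] = ch   # held so it can later be offered to the subset

    def confirm(self):
        """Confirm key: return the corrected string for storage."""
        return "".join(self.text)

session = CorrectionSession([["東"], ["之", "乏"], ["銀"], ["行"]])
session.next_candidate(1)    # "之" becomes "乏", which is still wrong
session.type_char(1, "芝")   # the operator types the correct character
```

Keeping the corrected characters in `corrections` is an assumption made here so that the held content is available to a later subset update.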

During a correction operation after character recognition using the subset information, the subset information management means 41 in the reading control unit 4, which is implemented by, for example, a CPU, operates according to the flowchart of FIG. 2. It detects whether a correction has been input from the operation unit 5 (S101); if so, it performs the display control described above and holds the corrected content (character code) (S102). Steps S101 and S102 are repeated until correction ends (S103). As a concrete example, suppose that the bank name field is read on a bank transfer request form such as that shown in FIG. 3. The bank-name subset is designated, but "Toshiba Bank" (東芝銀行) is a new bank and the subset information does not contain the character "芝" (shiba), so the name is misrecognized.

A correction is therefore made: the character "芝" is displayed on the display unit 6 and its character code is held in the subset information management means 41. If correcting this one character completes the correction, the subset information management means 41 detects whether an instruction (key operation) to add the held content (here, the character "芝") to the bank-name subset information has been given from the operation unit 5 (S104). If the instruction has been given, it detects whether the reading mode has been switched, that is, whether reading is moving to a different subset (S105). Once the mode has been switched, it searches the relevant subset information (the bank-name subset information) (S106) and determines whether the character should be added by checking whether the held character is already included (S107). In the example above, it checks whether "芝" is included in the bank-name subset and adds it if it is not (S108). It then checks whether any other corrected content remains (S109); if so, it returns to step S106 and continues, and when no corrected content remains to be processed it returns to the main program. A new range of characters is thus added to the subset information and used in subsequent recognition, so that correction operations caused by misrecognition or failed recognition are no longer needed. In this embodiment, as shown in FIG. 1, the subset information management means 41 consists of determination means 410, which determines whether a corrected character should be added to the subset information (for example, steps S104 and S105), and rewriting means 420, which rewrites the subset information in the subset information storage means according to the determination result of the determination means 410.
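
The flow from S104 to S109 can be sketched as a single function operating on corrections already collected in S101 to S103. This is an interpretation of the text, not a reproduction of FIG. 2, which is not available here: the function name, its parameters, and the choice to simply return without updating when no add instruction is given or the mode has not yet switched are assumptions, and the subset contents are illustrative.

```python
def handle_corrections(corrections, subset, add_requested, mode_switched):
    """Sketch of steps S104-S109.

    corrections: characters held after operator correction (S102).
    subset: mutable set holding the subset information for the field just read.
    add_requested: True if the operator gave the add instruction (S104).
    mode_switched: True if reading has switched to another subset (S105).
    Returns the list of characters actually added to the subset.
    """
    added = []
    if not add_requested:      # S104: no instruction to add
        return added
    if not mode_switched:      # S105: assumed to defer until the mode changes
        return added
    for ch in corrections:     # S109: repeat while corrected content remains
        if ch in subset:       # S106/S107: search the subset for the character
            continue
        subset.add(ch)         # S108: add the missing character
        added.append(ch)
    return added

bank_name_subset = set("東西南北銀行")   # illustrative contents only
added = handle_corrections(["芝"], bank_name_subset, True, True)
```

After this call `added` holds "芝" and the bank-name subset contains it, so a later reading of "東芝銀行" is no longer blocked by the subset; a second call with the same correction adds nothing.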

If, however, corrections are always to be incorporated into the subset information automatically, step S104 is unnecessary. Likewise, if a somewhat higher load from searching the subset information is acceptable, step S105 is unnecessary.

[Effect of the Invention]
As described above, according to the present invention, when a recognition result is corrected, the subset information management means operates and the correction can be reflected in the subset information in the subset information storage means. Once the correction is reflected, the character range is limited by the new subset information, so recognition follows changes in the subset range and the trouble of making the same correction each time is reduced.
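
The two optional steps mentioned in the embodiment (operator confirmation in S104 and waiting for a reading-mode switch in S105) can be viewed as two independent switches on the subset store. The sketch below is an assumption-laden illustration: the class `SubsetStore`, its flags, and the pending queue used to model deferral are hypothetical and are not described in the source.

```python
class SubsetStore:
    """Hypothetical subset information storage with two optional checks."""

    def __init__(self, chars, ask_operator=True, wait_for_mode_switch=True):
        self.chars = set(chars)
        self.ask_operator = ask_operator                  # keep step S104?
        self.wait_for_mode_switch = wait_for_mode_switch  # keep step S105?
        self.pending = []                                 # corrections awaiting a mode switch

    def on_correction(self, ch, operator_approved=False):
        if self.ask_operator and not operator_approved:
            return                    # S104 kept: no add instruction, nothing changes
        if self.wait_for_mode_switch:
            self.pending.append(ch)   # S105 kept: batch the update
        else:
            self.chars.add(ch)        # S105 dropped: update the subset right away

    def on_mode_switch(self):
        """Apply batched corrections when reading moves to another subset."""
        self.chars.update(self.pending)
        self.pending.clear()

# Fully automatic variant: both checks dropped.
auto = SubsetStore("東銀行", ask_operator=False, wait_for_mode_switch=False)
auto.on_correction("芝")

# Default variant: operator approval plus deferral until the mode switch.
batched = SubsetStore("東銀行")
batched.on_correction("芝", operator_approved=True)
```

Dropping the operator check trades control for convenience, while dropping the deferral trades a little extra search work per correction for an immediately updated subset, which is the trade-off the embodiment notes.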

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the present invention, FIG. 2 is a flowchart explaining the operation of the embodiment, and FIG. 3 is a plan view of a form read by the embodiment.
1: Photoelectric conversion unit
2: Character recognition unit
3: Subset information storage unit
4: Reading control unit
5: Operation unit
6: Display unit
21: Recognition dictionary unit
41: Subset information management means
410: Determination unit
420: Rewriting means

Claims (2)

[Claims]
(1) An optical character reader comprising: photoelectric conversion means for photoelectrically converting a form image to obtain an image signal; a recognition dictionary unit storing various parameters used during character recognition; subset information storage means storing subset information that limits the range of characters subject to recognition; character recognition means that, when required, limits the character range using the subset information stored in the subset information storage means, refers to the parameters of the recognition dictionary unit, and performs character recognition based on the image signal obtained by the photoelectric conversion means; correction means for correcting the recognition result of the character recognition means; and subset information management means for reflecting the correction made by the correction means in the subset information in the subset information storage means.
(2) The optical character reader according to claim (1), wherein the subset information management means comprises: determination means for determining whether a corrected character should be added to the subset information; and rewriting means for rewriting the subset information in the subset information storage means according to the determination result of the determination means.
JP2066820A 1990-03-19 1990-03-19 Optical character reader and method of adding subset information in optical reader Expired - Lifetime JP2954968B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2066820A JP2954968B2 (en) 1990-03-19 1990-03-19 Optical character reader and method of adding subset information in optical reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2066820A JP2954968B2 (en) 1990-03-19 1990-03-19 Optical character reader and method of adding subset information in optical reader

Publications (2)

Publication Number Publication Date
JPH03268089A true JPH03268089A (en) 1991-11-28
JP2954968B2 JP2954968B2 (en) 1999-09-27

Family

ID=13326872

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2066820A Expired - Lifetime JP2954968B2 (en) 1990-03-19 1990-03-19 Optical character reader and method of adding subset information in optical reader

Country Status (1)

Country Link
JP (1) JP2954968B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02136986A (en) * 1988-11-17 1990-05-25 Sanyo Electric Co Ltd Handwritten character recognition device
JPH05128314A (en) * 1991-11-07 1993-05-25 Mitsubishi Electric Corp Character recognition device
JP2007233489A (en) * 2006-02-27 2007-09-13 Nec Engineering Ltd Optical sign reader

Also Published As

Publication number Publication date
JP2954968B2 (en) 1999-09-27

Similar Documents

Publication Publication Date Title
JP3602596B2 (en) Document filing apparatus and method
US5233672A (en) Character reader and recognizer with a specialized editing function
JPH03268089A (en) Optical character reader
KR100352170B1 (en) Method and Apparatus for A Numeral Code Generation of using Fingerprint Recognition Sensor
JP2636736B2 (en) Fingerprint synthesis device
JP2578748B2 (en) Handwritten information processing method
JPH0782541B2 (en) Fingerprint data flow controller based on fingerprint quality
JP3101073B2 (en) Post-processing method for character recognition
JP2002207960A (en) Method and program for recognized character correction
JPH06251187A (en) Method and device for correcting character recognition error
JPH07210623A (en) Document picture processor
JPH05120471A (en) Character recognizing device
JPH07334611A (en) Display method for non-recognized character
JP2683711B2 (en) How to recognize / correct character / symbol data
JPH0512484A (en) Optical character recognizing device
JPS6215678A (en) Correcting system for read character
JPH04138583A (en) Character recognizing device
JPH05151384A (en) Correcting method for recognition character
JPH03250277A (en) Document processor
JPH09305712A (en) Method, device for recognizing character and storage medium storing program for character recognition
JPS62281089A (en) Recognition method for pattern information
JPH04348475A (en) Method and device for retrieving image information
JPH11143993A (en) Recognized character correction device and its method
JPH06195498A (en) Facsimile character recognition system
JPH08185470A (en) Document reader

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080716

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090716

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100716

Year of fee payment: 11

EXPY Cancellation because of completion of term