JPS60225983A - Reader of character - Google Patents

Reader of character

Info

Publication number
JPS60225983A
Authority
JP
Japan
Prior art keywords
character
kanji
printed
pattern
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP59081999A
Other languages
Japanese (ja)
Other versions
JPH0731712B2 (en)
Inventor
Yoshikatsu Nakamura
中村 好勝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Priority to JP59081999A (JPH0731712B2)
Publication of JPS60225983A
Publication of JPH0731712B2
Legal status: Expired - Lifetime

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To enable an optical character reader to recognize different types of characters by processing the sensor output as multi-level information when reading printed kanji (Chinese characters) and as binary information when reading handwritten kanji. CONSTITUTION: An analog photoelectric conversion signal obtained from the scanning system is converted by A/D converter 31 and stored in image buffer 32 as a multi-level pattern. Characters in image buffer 32 are detected under control signal 37, and the character frame information for each character is transferred to normalization part 36, which controls addresses in buffer 32 so that characters are normalized to a prescribed size according to whether they are handwritten or printed. For a printed character, a normalized character pattern 34 consisting of multi-level data is obtained. For a handwritten character, a binary pattern 35 of the prescribed normalized size is obtained by processing the multi-level data of buffer 32 in a similar way, using neighborhood processing such as outputting the datum nearest to each extraction point. Pattern 35 is then sampled and recognized.

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to an optical character reader.

[Technical Background of the Invention and Its Problems]

Optical character readers (hereinafter OCR) began with printed characters and now read handwritten numerals, alphabetic characters, katakana, and even printed or handwritten kanji spanning several thousand character types. They are being actively commercialized.

The ability to read several thousand kanji is expected to open new developments in Japanese input, while there is also strong demand for lower prices. Kanji-recognition OCRs so far include machines dedicated to reading handwritten kanji and machines that read both printed and handwritten kanji with the same device. Because both printed and handwritten input are in great demand, machines that read both kinds of kanji are regarded as particularly capable and are opening up diverse needs for kanji input.

The first technical problem in reading such printed and handwritten kanji is the resolution of the optical scanning system, that is, how finely the characters must be observed.

FIG. 1 illustrates an example of a conventional OCR. Printed kanji 10 come in various sizes and line widths. Print in the most commonly used type size requires a resolution of about 20 lines/mm, and if the resolution is made coarser, horizontal strokes in particular may be lost. Handwritten kanji input, on the other hand, is in demand mainly for data entry: fields such as addresses and personal details are written in entry boxes 8 to 9 mm square with a mechanical pencil of 0.5 mm or more, a ballpoint pen, or the like.

Therefore, even allowing for faint or broken strokes, a resolution of about 10 lines/mm, roughly half that needed for printed kanji, is sufficient for reading handwritten kanji 11. This resolution has a proven record in general-purpose OCR for handwritten alphanumerics and kana.
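
The resolution figures quoted above can be checked with simple arithmetic. The short Python sketch below counts how many scan samples fall across a stroke or an entry box, assuming that "lines/mm" means samples per millimetre; the function name is illustrative and does not come from the patent.

```python
def samples_across(width_mm, lines_per_mm):
    """Number of scan samples that span a feature of the given width."""
    return width_mm * lines_per_mm

# A 0.5 mm handwritten stroke at the handwriting resolution of 10 lines/mm.
stroke_samples = samples_across(0.5, 10)

# The same stroke at the printed-kanji resolution of 20 lines/mm.
stroke_samples_fine = samples_across(0.5, 20)

# An 8 mm entry box at 10 lines/mm.
box_samples = samples_across(8, 10)
```

Under this assumption a 0.5 mm stroke still spans several samples at 10 lines/mm, which is consistent with the text's claim that the coarser resolution is enough for handwriting.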

For these reasons, two approaches have been taken to meet the requirement of reading printed kanji 10 and handwritten kanji 11 with the same device 12. In the first, the scanning resolution is set to the high resolution needed for printed kanji, and when handwritten kanji are read the character image signal is obtained by thinning out every other bit. In the second, the OCR is restricted so that printed and handwritten kanji are never read at the same time, and the resolution of the optical system is switched mechanically and optically so that the same sensor can be shared.
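
The first conventional approach, thinning out every other bit, amounts to discarding alternate samples of a scan line captured at the printed-kanji resolution. A minimal Python sketch, assuming the scan line is a list of 0/1 samples (the function name and the sample data are illustrative, not taken from the patent):

```python
def thin_every_other(scan_line):
    """Keep samples 0, 2, 4, ... of a scan line and drop the rest."""
    return scan_line[::2]

# A line captured at the fine (printed-kanji) pitch: 16 samples.
fine_line = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

# The same line as seen when reading handwriting: 8 samples.
coarse_line = thin_every_other(fine_line)
```

Half of the sensor output is simply thrown away in this mode, which is the inefficiency the text goes on to criticize.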

These conventional OCRs have drawbacks. The former uses only half of the sensor when reading handwritten kanji, and the latter cannot read handwritten and printed kanji at the same time. Both complicate the circuitry and mechanism and raise the price of the device.

[Purpose of the Invention]

This invention remedies the drawbacks of the conventional devices described above. Its object is to provide an optical character reader that obtains high-quality images for recognition using a low-resolution sensor and that can be implemented simply.

[Summary of the Invention]

The invention is an optical character reader that uses the same sensor for printed and handwritten kanji. When reading printed kanji, it performs recognition on the sensor output as multi-level information. For handwritten kanji, it performs recognition on binary information, as before. Character types of different print quality (line width) can thus be recognized in the same device, with fast electrical switching between the two modes.
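
The switching idea can be sketched in a few lines of Python. The sketch assumes analog samples scaled to the range 0.0 to 1.0, a 2-bit multi-level code (the bit depth used in the patent's example), and a mid-scale threshold for deriving the binary value. The threshold and all function names are assumptions made for illustration.

```python
def digitize(sample, bits=2):
    """Quantize an analog sample in [0.0, 1.0] to an integer code of `bits` bits."""
    levels = (1 << bits) - 1
    clipped = min(max(sample, 0.0), 1.0)
    return int(clipped * levels + 0.5)

def to_binary(code, bits=2):
    """Reduce a multi-level code to 1 bit (mid-scale threshold, an assumption)."""
    return 1 if code >= (1 << bits) // 2 else 0

def recognition_input(samples, printed, bits=2):
    """Same sensor data, switched by a flag: multi-level for print, binary for handwriting."""
    codes = [digitize(s, bits) for s in samples]
    return codes if printed else [to_binary(c, bits) for c in codes]

samples = [0.05, 0.30, 0.60, 0.95, 0.40]
printed_view = recognition_input(samples, printed=True)
handwritten_view = recognition_input(samples, printed=False)
```

The switch is a flag in the signal path, so no optical or mechanical change is needed between the two modes.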

[Effects of the Invention]

The sampling pitch of the sensor can be made larger, so the sensor needs fewer elements (bits). This lowers cost and also shortens the adjustment process.

No switching of the sensor resolution is needed, and the electrical signal processing is simple to implement. The optical and mechanical structure becomes simpler, giving a low-cost device, and the switching itself becomes faster. Forms can therefore be processed at high speed even when printed and handwritten kanji appear on the same form.

[Embodiments of the Invention]

FIG. 2 illustrates the processing flow in one embodiment of the present invention. The scanning resolution for handwritten kanji 11 is set to 8 lines/mm, and when handwritten kanji are input the photoelectric conversion signal is quantized as a binary signal. The resulting image is shown at 25. For printed kanji 10, where a finer scanning signal is needed, the photoelectric conversion signal is stored as a multi-level signal when it is quantized. The resulting image is shown at 26.

The preprocessing circuit switches its processing method at 211 according to separately prepared control information, and converts handwritten and printed kanji to the same number of dimensions in the identification (feature) space. The input pattern obtained in this way is input, as sampled pattern 22, to identification circuit 24. Kanji standard patterns 23 prepared in advance are also input to identification circuit 24, which matches the input pattern against them and outputs the result.

FIG. 3 shows the above processing in more detail. Here 30 is the analog photoelectric conversion signal obtained from the scanning system. It is converted by A/D converter 31 and stored in image buffer 32 as a multi-level pattern. Signal 37 from the control section carries the handwritten/printed switching control signal and the operation start signals for detection/segmentation section 33 and the normalization section. On this signal, the characters in image buffer 32 are detected, and the character frame information for each character (the address of the character region in image buffer 32) is transferred to normalization section 36.

Normalization section 36 controls the addressing of image buffer 32 according to control signal 37, so that each character is normalized to a size predetermined for handwritten or printed characters. For printed kanji this yields a normalized character pattern made up of multi-level data, shown at 34 in FIG. 3. Because printed kanji also come in various sizes, a region transfer from image buffer 32 alone is not enough; needless to say, the pattern is normalized by interpolation to bring it to the standard size.

Similarly, for handwritten kanji the character image read out under address control of image buffer 32 is normalized to a predetermined size as a 1-bit binary pattern. This is done by neighborhood processing on the multi-level data of image buffer 32, such as outputting the datum nearest to each extraction point. The resulting image is stored in pattern buffer 35.
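
The two normalization paths can be sketched as follows. This is a minimal Python illustration, not the circuit of FIG. 3: it assumes the image buffer is a list of rows of 2-bit codes, that the character frame is given as inclusive (top, left, bottom, right) bounds, that the interpolation for printed kanji is bilinear, and that the handwritten path takes the nearest stored datum and thresholds it at mid-scale. The patent names only "interpolation" and "nearest-neighbor data output", so these specifics and the function names are assumptions.

```python
def normalize_printed(image, box, size):
    """Resample a multi-level character region to size x size (size >= 2)
    using bilinear interpolation, keeping multi-level output."""
    top, left, bottom, right = box  # inclusive bounds of the character frame
    out = []
    for r in range(size):
        y = top + r * (bottom - top) / (size - 1)
        y0 = int(y)
        y1 = min(y0 + 1, bottom)
        fy = y - y0
        row = []
        for c in range(size):
            x = left + c * (right - left) / (size - 1)
            x0 = int(x)
            x1 = min(x0 + 1, right)
            fx = x - x0
            upper = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
            lower = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
            row.append(int((1 - fy) * upper + fy * lower + 0.5))
        out.append(row)
    return out


def normalize_handwritten(image, box, size, threshold=2):
    """Resample to size x size (size >= 2) by taking the nearest stored
    datum and reducing it to 1 bit (threshold is an assumption)."""
    top, left, bottom, right = box
    out = []
    for r in range(size):
        y = top + int(r * (bottom - top) / (size - 1) + 0.5)
        row = []
        for c in range(size):
            x = left + int(c * (right - left) / (size - 1) + 0.5)
            row.append(1 if image[y][x] >= threshold else 0)
        out.append(row)
    return out


# A tiny 2 x 2 region of 2-bit codes, enlarged to 3 x 3 in both modes.
buffer_32 = [[0, 3],
             [3, 0]]
printed_pattern = normalize_printed(buffer_32, (0, 0, 1, 1), 3)
handwritten_pattern = normalize_handwritten(buffer_32, (0, 0, 1, 1), 3)
```

Both paths read the same stored multi-level data and differ only in how each output point is computed, which is why the switch can be purely electrical.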

The character pattern normalized in this way then undergoes the sampling process shown in FIG. 4. In FIG. 4, 41 is the output of the handwritten kanji normalized pattern buffer (35 in FIG. 3), which passes through a 2×2 addition mask, and 42 is the output of the printed kanji normalized pattern buffer (34 in FIG. 3). Selection circuit 43 selects one of the two according to the handwritten/printed switching control signal 37. The selected data enter window circuit 44, a 3×3 weighted-addition mask built from shift registers. A weighted sum over the window is computed using weight table 45, yielding a sampled pattern whose number of dimensions matches that of the identification circuit or the standard patterns.
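
The 3×3 weighted addition can be sketched as a sliding window over the normalized pattern. In the Python sketch below, the weight values and the window step are assumptions chosen for illustration (the patent does not give the contents of weight table 45), and the function name is hypothetical.

```python
def weighted_sample(pattern, weights, step=1):
    """Slide a 3x3 window over `pattern` in steps of `step` and
    return the weighted sum at each window position."""
    rows, cols = len(pattern), len(pattern[0])
    out = []
    for r in range(0, rows - 2, step):
        row = []
        for c in range(0, cols - 2, step):
            total = sum(weights[i][j] * pattern[r + i][c + j]
                        for i in range(3) for j in range(3))
            row.append(total)
        out.append(row)
    return out


# Hypothetical weight table: centre-weighted smoothing.
weight_table = [[1, 2, 1],
                [2, 4, 2],
                [1, 2, 1]]

# A 4 x 4 normalized pattern with a single non-zero sample.
normalized = [[0, 0, 0, 0],
              [0, 2, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
sampled = weighted_sample(normalized, weight_table)
```

A larger `step` reduces the number of output values, which is one way the sampled pattern could be brought to the dimension count expected by the standard patterns.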

For printed kanji, the input to selection circuit 43 is the multi-level data of this example, unchanged. For handwritten kanji, a 2×2 neighborhood mask built from shift registers adds the binary values, and the sum is compressed to 2 bits before it enters selection circuit 43. At this point the handwritten and printed kanji data have the same information density.
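
The 2×2 addition step can be illustrated as follows. A 2×2 sum of binary samples ranges from 0 to 4, so fitting it into 2 bits needs some compression rule. The Python sketch below assumes the sum is clamped to 3, that the mask slides one sample at a time, and that samples beyond the edge count as 0. None of these details are stated in the patent, and the function name is hypothetical.

```python
def add_2x2(binary, max_code=3):
    """Sum each 2x2 neighborhood of a binary pattern and clamp to 2 bits."""
    rows, cols = len(binary), len(binary[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0
            for dr in (0, 1):
                for dc in (0, 1):
                    rr, cc = r + dr, c + dc
                    if rr < rows and cc < cols:  # outside the pattern counts as 0
                        total += binary[rr][cc]
            row.append(min(total, max_code))
        out.append(row)
    return out


handwritten = [[1, 1, 0],
               [1, 1, 0],
               [0, 0, 0]]
two_bit = add_2x2(handwritten)
```

After this step the handwritten pattern carries 2-bit values on the same grid as the printed pattern, so both can share the 3×3 weighted-addition stage.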

FIG. 5 illustrates the circuits around the sampling circuit described above. Sampling arithmetic circuit 52 operates on the normalized pattern and stores its result in sampling buffer 53. On a command from controller 56, identification circuit 54 performs identification operations, such as computing similarity to the standard patterns, and returns the result to controller 56.
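
The text mentions similarity calculation against standard patterns without naming a measure. As one possible reading, the Python sketch below uses cosine similarity between flattened patterns and returns the best-matching label. The measure, the function names, and the toy patterns are all assumptions.

```python
import math


def similarity(a, b):
    """Cosine similarity between two equal-length pattern vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(sampled, standards):
    """Return (label, score) of the standard pattern most similar to `sampled`."""
    return max(((label, similarity(sampled, ref)) for label, ref in standards.items()),
               key=lambda item: item[1])


# Toy standard patterns, flattened to vectors of the same dimension.
standards = {"A": [3, 0, 3, 0], "B": [0, 3, 0, 3]}
label, score = identify([2, 0, 3, 1], standards)
```

Because both handwritten and printed inputs reach this stage with the same dimension count, a single matcher of this kind could serve both.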

Timing generation circuit 55 receives the handwritten/printed kanji format control signal and the start signal from the controller, and outputs timing signals to the various shift registers and registers of sampling circuit 52.

[Other Embodiments of the Invention]

The embodiment above took as its example an OCR that handles both handwritten and printed kanji, but the invention applies equally well to an OCR dedicated to printed kanji.

In the example, printed kanji were described as quantized to 2 bits, but more bits may be stored, and (cost aside) this can naturally be expected to improve normalization and sampling accuracy further.

The same idea can also be used to output the input character image to a CRT as multi-level information when recognition is rejected, or to obtain high-quality image input from a low-resolution sensor.

[Brief Description of the Drawings]

FIG. 1 is a diagram illustrating a conventional OCR for handwritten and printed kanji.
FIG. 2 is a diagram outlining an embodiment of the present invention.
FIG. 3 is a diagram illustrating the detection/segmentation and normalization sections in the embodiment.
FIG. 4 is a diagram illustrating the sampling circuit in the embodiment.
FIG. 5 is a diagram illustrating the functions of the circuits around the sampling circuit of FIG. 4.

30 ... analog video signal
31 ... analog/digital converter
32 ... multi-level image buffer
33 ... detection/segmentation control section
34 ... printed kanji normalized pattern buffer
35 ... handwritten kanji normalized pattern buffer
36 ... normalization circuit
37 ... handwritten/printed control line

Agent: Patent Attorney Kensuke Norichika (則近 憲佑) (and one other)

Claims (1)

[Claims]

A character recognition device that optically scans a form and reads characters, comprising: means for digitizing the electrical signal obtained by scanning into multi-level values; means for storing the digitized signal; means for reading data from the storage means, converting it to a binary signal, detecting characters, and determining a segmentation region; and means for performing weighted addition over an arbitrary area of the segmentation region, and ...
JP59081999A 1984-04-25 1984-04-25 Character reader Expired - Lifetime JPH0731712B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59081999A JPH0731712B2 (en) 1984-04-25 1984-04-25 Character reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59081999A JPH0731712B2 (en) 1984-04-25 1984-04-25 Character reader

Publications (2)

Publication Number Publication Date
JPS60225983A true JPS60225983A (en) 1985-11-11
JPH0731712B2 JPH0731712B2 (en) 1995-04-10

Family

ID=13762170

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59081999A Expired - Lifetime JPH0731712B2 (en) 1984-04-25 1984-04-25 Character reader

Country Status (1)

Country Link
JP (1) JPH0731712B2 (en)

Also Published As

Publication number Publication date
JPH0731712B2 (en) 1995-04-10
