JPS63157292A

JPS63157292A - Handwritten kanji OCR device

Info

Publication number: JPS63157292A
Application number: JP61305843A
Authority: JP
Inventors: Chiharu Aoki; 千春青木; Taishin Iwamura; 岩村　太信
Original assignee: Yokogawa Electric Corp
Current assignee: Yokogawa Electric Corp
Priority date: 1986-12-22
Filing date: 1986-12-22
Publication date: 1988-06-30

Abstract

PURPOSE: To accurately discriminate similar kanji (Chinese characters). For each character that cannot be discriminated, phrases are formed from its candidate characters and the neighboring characters. When a phrase is an allowed combination, the adopted candidate replaces the undiscriminable character and the result is stored in a text memory. After the undiscriminable characters in the whole text have been processed, an operator confirms the results interactively on a CRT screen.

CONSTITUTION: A manuscript 1 is set in a handwritten kanji OCR unit 2, which optically reads the kanji. A processor 4 identifies each character output by unit 2 by pattern matching against a standard pattern memory 5. Each identified character is stored as a character code in a text memory 6; for a character that cannot be identified, an undiscriminable code is stored in memory 6 instead. For each undiscriminable character, processor 4 lists candidate characters, forms phrases by combining each candidate with the characters before and after the undiscriminable character in the text memory, and checks whether each phrase is registered in a dictionary memory 3. If a phrase is an existing combination, the candidate is adopted and substituted for the undiscriminable code in text memory 6. If no phrase is found in the dictionary memory, the undiscriminable code is left in the text memory as is and is later confirmed by an operator.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Field of Application] The present invention relates to a handwritten kanji OCR device (OCR: optical character reader), and in particular to an improvement in its character discrimination function.

[Prior Art] Devices that read handwritten kanji by OCR have long been known. However, many kanji differ from one another only by the presence or absence of a dot, by a minute difference in a stroke element, or by a difference in the position of otherwise similar stroke elements.

Examples are 王 (king) and 玉 (jewel), which differ only by a dot; 徴 (sign) and 微 (minute), which differ by a minute stroke element; and 大 (large) and 丈 (length), which differ in the position of similar stroke elements.

[Problems to Be Solved by the Invention] Kanji written for OCR must be shaped so that even such subtle differences can be clearly distinguished. This places strong restrictions on the character shapes the device can discriminate, which made such devices difficult to use.

The present invention was made in view of these points. Its object is to provide a handwritten kanji OCR device that accurately discriminates similar kanji such as those above (here, "kanji" also includes kana, symbols, and the like) and that relaxes the restrictions on kanji character shapes.

[Means for Solving the Problems] To achieve this object, the present invention comprises: a handwritten kanji OCR unit that optically reads the kanji on a manuscript; a dictionary memory storing kanji and phrases made up of combinations of two or more kanji; a processor that, for each undiscriminable character, forms combinations of a candidate character with the characters before and after the undiscriminable character, checks each phrase against the dictionary memory, adopts the candidate character of a phrase that is an allowed combination, attaches a specific code to the undiscriminable character when no phrase is in the dictionary, and stores the result in a text memory; a CRT display device for displaying characters and symbols on a screen; a CRT interface for displaying the contents of the text memory on the CRT display device; and a keyboard for entering characters, symbols, and control information into the processor.

[Operation] In the present invention, a kanji manuscript is read by the OCR unit. The processor analyzes the characters read from the manuscript and stores the corresponding character codes in the text memory. For each undiscriminable character, it checks phrases formed by a candidate character and the characters before and after it against the dictionary memory; when a phrase is an allowed combination, the adopted candidate replaces the undiscriminable character in the text memory.

After all undiscriminable characters in the text have been processed, the operator confirms the text interactively on the CRT screen and corrects it to the appropriate characters by entering information from the keyboard.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the handwritten kanji OCR device according to the present invention. In FIG. 1, reference numeral 1 denotes a handwritten manuscript containing kanji; 2, a handwritten kanji OCR unit that optically reads the kanji, symbols, and the like in manuscript 1; 3, a dictionary memory storing a kanji dictionary; 4, a processor; 5, a standard pattern memory storing the standard character-shape patterns of kanji and symbols (hereinafter collectively called kanji); 6, a text memory storing the character codes of the kanji read from the manuscript; 7, a CRT interface; 8, a CRT display device that displays kanji; and 9, a keyboard for entering information.

The operation of this configuration is described next with reference to the flowchart of FIG. 2. Manuscript 1 is fed into handwritten kanji OCR unit 2, which reads the kanji optically. Processor 4 identifies each character output by OCR unit 2 by pattern matching against the standard patterns in standard pattern memory 5. The pattern matching itself uses a known method.
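The patent leaves the pattern matching to known methods. As one hedged illustration only, the sketch below ranks standard patterns by simple pixel agreement with the input glyph; the bitmap representation, the scoring rule, and the names `rank_candidates` and `STANDARD_PATTERNS` are assumptions, not part of the disclosure.

```python
# Toy template-matching sketch: glyphs are small binary bitmaps and the score is
# the fraction of agreeing pixels (both choices are assumptions for illustration).
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]  # one string of '0'/'1' pixels per row


def agreement(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels on which two equally sized bitmaps agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def rank_candidates(glyph: Bitmap, patterns: Dict[str, Bitmap]) -> List[Tuple[str, float]]:
    """Score the glyph against every standard pattern, best match first."""
    scored = [(ch, agreement(glyph, pat)) for ch, pat in patterns.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 3x3 "standard patterns"; real glyph patterns would be far larger.
STANDARD_PATTERNS = {
    "王": ("111", "010", "111"),
    "玉": ("111", "011", "111"),  # same shape plus one extra "dot" pixel
}

ranked = rank_candidates(("111", "011", "111"), STANDARD_PATTERNS)
print(ranked)  # best match first
```

Because 王 and 玉 differ by a single "dot" pixel in these toy patterns, their scores are close, which is the situation in which the device falls back on dictionary checking.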

Each discriminated character is stored in text memory 6 as a character code; for each character that cannot be discriminated, an undiscriminable code is stored in text memory 6 instead.
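A minimal sketch of how the text memory might be filled, assuming the matcher returns ranked (character, score) pairs. The rule for declaring a character undiscriminable (a minimum score plus a minimum margin over the runner-up), the threshold values, and the sentinel `UNDISCRIMINABLE` are hypothetical; a Python string stands in for each stored character code.

```python
# Sketch of filling the text memory: a character is stored only when its best
# pattern-matching score is high enough and clearly ahead of the runner-up.
# The thresholds and the sentinel value are assumptions, not from the patent.
from typing import List, Tuple

UNDISCRIMINABLE = "\uFFFD"  # hypothetical stand-in for the "undiscriminable code"


def to_text_memory(
    results: List[List[Tuple[str, float]]],
    min_score: float = 0.9,
    min_margin: float = 0.05,
) -> List[str]:
    """Turn ranked (character, score) lists into a list of character codes."""
    text_memory = []
    for ranked in results:
        best_char, best = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        if best >= min_score and best - runner_up >= min_margin:
            text_memory.append(best_char)  # discriminated: store its code
        else:
            text_memory.append(UNDISCRIMINABLE)  # left for later processing
    return text_memory


results = [
    [("文", 0.97), ("丈", 0.80)],
    [("字", 0.95)],
    [("丈", 0.91), ("文", 0.90), ("大", 0.88)],  # too close to call
]
memory = to_text_memory(results)
```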

After the text has been read into text memory 6 in this way, processor 4 performs the following processing on each undiscriminable code. It lists several candidate characters for the undiscriminable character. For each listed candidate, it forms a phrase combining the candidate with the characters before and after the undiscriminable character in the text memory (these neighboring characters are discriminable characters) and checks whether that phrase is registered in dictionary memory 3. If the phrase exists in the dictionary memory, the candidate is adopted and substituted for the undiscriminable code in text memory 6.
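The dictionary check for a single undiscriminable position can be sketched as follows. Assumptions: phrases are the two-character combinations (preceding character + candidate) and (candidate + following character), the dictionary memory is modeled as a Python set, and candidates are tried in the order given; `context_phrases` and `resolve` are hypothetical names.

```python
# Sketch of the dictionary check for one undiscriminable position. Phrases are
# formed as (preceding char + candidate) and (candidate + following char); this
# two-character rule and the plain-set dictionary are assumptions.
from typing import List, Optional, Set

UNDISCRIMINABLE = "\uFFFD"  # hypothetical stand-in for the "undiscriminable code"


def context_phrases(text: List[str], i: int, candidate: str) -> List[str]:
    """Phrases made from the candidate and its discriminated neighbours."""
    phrases = []
    if i > 0 and text[i - 1] != UNDISCRIMINABLE:
        phrases.append(text[i - 1] + candidate)
    if i + 1 < len(text) and text[i + 1] != UNDISCRIMINABLE:
        phrases.append(candidate + text[i + 1])
    return phrases


def resolve(text: List[str], i: int, candidates: List[str], dictionary: Set[str]) -> Optional[str]:
    """Return the first candidate that forms a registered phrase, else None."""
    for candidate in candidates:
        if any(p in dictionary for p in context_phrases(text, i, candidate)):
            return candidate
    return None


text = [UNDISCRIMINABLE, "字"]
dictionary = {"文字", "漢字"}
adopted = resolve(text, 0, ["丈", "文", "大"], dictionary)
if adopted is not None:
    text[0] = adopted  # substitute the candidate for the undiscriminable code
```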

If a phrase is not in the dictionary memory, all of the other phrases formed in this way are checked. If none of them is in the dictionary memory either, the undiscriminable code is left in the text memory as is, and highlight-display control information is attached to that character code in text memory 6 so that the operator can confirm it.
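A hedged sketch of the whole-text pass, including the fallback: every candidate's phrases are tried, and if none is registered the undiscriminable code is kept and the position is flagged for highlighting. The `Cell` record, its `highlight` field standing in for the highlight-display control information, and the two-character phrase rule are assumptions.

```python
# Sketch of a whole-text pass: every candidate phrase is tried; if none is in
# the dictionary, the undiscriminable code stays and the cell is flagged for
# highlighting. The Cell layout is a hypothetical model of the text memory.
from dataclasses import dataclass, field
from typing import List, Set

UNDISCRIMINABLE = "\uFFFD"  # hypothetical stand-in for the "undiscriminable code"


@dataclass
class Cell:
    code: str
    candidates: List[str] = field(default_factory=list)
    highlight: bool = False  # models the highlight-display control information


def neighbour(cells: List[Cell], j: int) -> str:
    """Code of a discriminated neighbour, or '' if absent or undiscriminable."""
    if 0 <= j < len(cells) and cells[j].code != UNDISCRIMINABLE:
        return cells[j].code
    return ""


def process_text(cells: List[Cell], dictionary: Set[str]) -> None:
    for i, cell in enumerate(cells):
        if cell.code != UNDISCRIMINABLE:
            continue
        before, after = neighbour(cells, i - 1), neighbour(cells, i + 1)
        for candidate in cell.candidates:
            phrases = []
            if before:
                phrases.append(before + candidate)
            if after:
                phrases.append(candidate + after)
            if any(p in dictionary for p in phrases):
                cell.code = candidate  # allowed combination: adopt the candidate
                break
        else:
            cell.highlight = True  # no phrase registered: keep code, flag for operator


cells = [
    Cell(UNDISCRIMINABLE, ["丈", "文"]),
    Cell("字"),
    Cell("の"),
    Cell(UNDISCRIMINABLE, ["王", "玉"]),
]
process_text(cells, {"文字"})
```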

A specific example of the processing of an undiscriminable character follows. Suppose, as shown in FIG. 3, that character 31 is an undiscriminable character and that pattern recognition yields the candidate characters 丈, 文, and 大. Combining each with the character 字 that follows the undiscriminable character gives the phrases 丈字, 文字 (moji, "character"), and 大字, and the processor checks each of them. First, 丈字 is not in the dictionary, so it is judged incorrect. Next, 文字 is in the dictionary and is judged correct. 大字 is also in the dictionary and is judged correct, but because 文字 occurs more frequently, 文字 is made the first candidate and 大字 the second candidate. In the text memory, the undiscriminable character is replaced with 文, the character from the phrase adopted as the first candidate.

The final decision whether to use 文 (from 文字) or 大 (from 大字) is left to the operator.
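The FIG. 3 example can be reproduced in a few lines, assuming the dictionary memory also stores an occurrence frequency for each phrase. The frequency values below are invented for illustration; the patent states only that 文字 occurs more often than 大字. `rank_by_frequency` is a hypothetical name.

```python
# Sketch of the FIG. 3 ranking: keep only candidates whose phrase is registered,
# then order them by (hypothetical) phrase frequency, most frequent first.
from typing import Dict, List, Tuple


def rank_by_frequency(
    candidates: List[str], following: str, frequency: Dict[str, int]
) -> List[Tuple[str, str]]:
    """(candidate, phrase) pairs whose phrase is registered, most frequent first."""
    hits = [(c, c + following) for c in candidates if c + following in frequency]
    return sorted(hits, key=lambda hit: frequency[hit[1]], reverse=True)


# Hypothetical counts: the patent states only that 文字 occurs more often than 大字.
frequency = {"文字": 1000, "大字": 50}
ranking = rank_by_frequency(["丈", "文", "大"], "字", frequency)
first_choice = ranking[0][0]  # written into the text memory; the rest go to the operator
```

丈字 drops out because it is not registered; 文 becomes the first candidate written to the text memory, while 大 remains available to the operator as the second candidate.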

CRT interface 7 displays the characters corresponding to the character codes in text memory 6 on CRT display device 8. An undiscriminable character that carries highlight-display control information is shown on the screen of CRT display device 8 at increased brightness, or in a special display format such as reverse video or blinking.
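On a modern terminal the same effect can be approximated with ANSI/ECMA-48 SGR escape sequences: 1 (bold or increased intensity), 5 (blink), and 7 (reverse video). This is a substitute for illustration, not the patent's CRT interface; `render` and the (character, highlighted) input format are assumptions.

```python
# Terminal analogue of the CRT highlight using ECMA-48 SGR codes:
# 1 = bold/increased intensity, 5 = blink, 7 = reverse video.
from typing import List, Tuple

SGR = {"bright": "\x1b[1m", "blink": "\x1b[5m", "reverse": "\x1b[7m"}
RESET = "\x1b[0m"


def render(cells: List[Tuple[str, bool]], style: str = "reverse") -> str:
    """Join the text, wrapping highlighted characters in the chosen SGR style."""
    return "".join(
        f"{SGR[style]}{ch}{RESET}" if highlighted else ch for ch, highlighted in cells
    )


line = render([("文", False), ("字", False), ("\uFFFD", True)])
print(line)
```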

After the whole text has been processed in this way, the operator checks these characters on the CRT screen in interactive mode. If a character is correct, the operator enters "YES" (or "Y") from keyboard 9; if it is wrong, the operator enters "NO" (or "N") and completes the text either by selecting the next candidate or by direct input (entering the character or its character code).

A known method can be used for selecting the next candidate or for direct input.
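As one possible realization of the confirmation dialogue, the sketch below offers the adopted character, then each next candidate, and finally accepts direct input. The prompt wording, treating any answer other than YES/Y as a rejection, and accepting a character code written as `U+XXXX` (Unicode, which postdates this 1986 device) are assumptions; `confirm` and `parse_direct` are hypothetical names.

```python
# Sketch of the operator dialogue. `ask` is injected so the dialogue can be
# scripted; interactively it could simply be the built-in input().
from typing import Callable, List


def parse_direct(entry: str) -> str:
    """Direct input: a character, or a character code written as U+XXXX."""
    entry = entry.strip()
    if entry.upper().startswith("U+"):
        return chr(int(entry[2:], 16))
    return entry


def confirm(proposal: str, next_candidates: List[str], ask: Callable[[str], str]) -> str:
    """Offer the proposal, then each next candidate; fall back to direct input.

    Any answer other than YES/Y counts as a rejection of the offered character.
    """
    for option in [proposal] + next_candidates:
        if ask(f"{option}  OK? (YES/Y or NO/N) ").strip().upper() in ("YES", "Y"):
            return option
    return parse_direct(ask("Direct input (character or U+code): "))


answers = iter(["N", "Y"])
chosen = confirm("文", ["大"], lambda prompt: next(answers))  # rejects 文, accepts 大
```

Injecting `ask` keeps the dialogue testable; at an interactive terminal, passing the built-in `input` gives the Y/N exchange described above.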

[Effects of the Invention] As described in detail above, according to the present invention, for an undiscriminable character, the phrase formed by a candidate character and the characters before and after the undiscriminable character is checked for validity before the candidate is adopted, so accurate reading can be expected. Characters that a conventional OCR could not discriminate can thus either be discriminated or be confirmed by the operator, so the restrictions on handwritten character shapes can be relaxed.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of one embodiment of the handwritten kanji OCR device according to the present invention, FIG. 2 is a flowchart for explaining its operation, and FIG. 3 shows a specific example of deciding which candidate character to adopt for an undiscriminable character.

1...Handwritten manuscript, 2...Handwritten kanji OCR unit, 3...Dictionary memory, 4...Processor, 5...Standard pattern memory, 6...Text memory, 7...CRT interface, 8...CRT display device, 9...Keyboard.

Claims

[Scope of Claims] A handwritten kanji OCR device comprising: a handwritten kanji OCR unit that optically reads kanji on a manuscript; a dictionary memory storing kanji and phrases consisting of combinations of two or more kanji; a processor that, for an undiscriminable character, forms combinations of a candidate character with the characters before and after the undiscriminable character, checks the phrases in the dictionary memory, adopts the characters of a phrase that is an allowed combination, attaches a specific code to the undiscriminable character if the phrase is not in the dictionary, and stores the result in a text memory; a CRT display device that displays characters and symbols on a screen; a CRT interface for displaying the contents of the text memory on the CRT display device; and a keyboard for entering characters, symbols, and control information into the processor; wherein the processor analyzes the characters read from the manuscript and stores the corresponding character codes in the text memory, checks, for each undiscriminable character, the phrases formed by a candidate character and the characters before and after it against the dictionary memory, and, for a phrase that is an allowed combination, replaces the undiscriminable character with the adopted candidate character; and wherein, after the undiscriminable characters in the entire text have been processed, the operator confirms the text interactively on the CRT screen and converts it into the appropriate characters by entering information from the keyboard.