JPS6386089A

JPS6386089A - Character recognizing device

Info

Publication number: JPS6386089A
Application number: JP61232131A
Authority: JP
Inventors: Shigeru Miyao; 宮尾　滋
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1986-09-30
Filing date: 1986-09-30
Publication date: 1988-04-16

Abstract

PURPOSE: To reduce reading errors by comparing a designated character group with the predetermined character groups of a knowledge dictionary and performing recognition processing on the designated character group. CONSTITUTION: The character group of a form is read by a reading part 1, and each character of the read character group is recognized by a character recognition part 2. A segmenting part 3 segments a designated character group of a designated same character type out of the character groups of the form for which character recognition is completed. Semantically valid character groups are stored in a knowledge dictionary part 4. The designated character group and the predetermined character groups are compared by a recognition processing part 5, and recognition processing of the designated character group is executed.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Application Field) The present invention relates to a character recognition device that recognizes characters on a form.

(Prior Art) Character recognition devices that read and recognize characters on forms have long been widely used in various fields of office work.

Forms read by such character recognition devices include the following.

FIG. 3 shows a roster-format form P1, in which a denotes an address entry frame. FIG. 4 shows a manuscript-paper-format form P2, in which A denotes a sentence, B1, B2, B3, B4, and B5 denote kanji compound words, and b denotes a character entry frame.

When reading a form such as form P1, in which the character entry frames are grouped to some extent, the category of the characters written in each character entry frame is known in advance, so the validity of the recognition result can be checked.

For example, when recognizing the kanji proper noun 「東京都」 (Tokyo) in the address entry frame a of form P1, recognition results such as those shown in FIG. 5 may be obtained.

The recognition result may consist of multiple ranked candidates, as shown in FIG. 5, or of only a single candidate. In either case, it is known in advance that the kanji written in the address entry frame a represent an address. The recognition result can therefore be checked for validity as an address by comparing it with the predetermined words of a knowledge dictionary that stores address words, which makes it possible to select the third-ranked candidate 「東京都」 shown in FIG. 5. Accordingly, reading errors can be kept to a minimum by selecting and outputting a valid candidate in the former case, and by outputting the result only when it is correct in the latter case.
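The validity check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the device's actual implementation: the function name `select_valid_candidate`, the contents of `ADDRESS_WORDS`, and the two misread candidates are hypothetical. The source states only that 「東京都」 is the third-ranked candidate in FIG. 5.

```python
from typing import AbstractSet, Optional, Sequence

# Hypothetical address dictionary; a real one would hold many more place names.
ADDRESS_WORDS = frozenset({"東京都", "大阪府", "北海道"})


def select_valid_candidate(
    candidates: Sequence[str], dictionary: AbstractSet[str]
) -> Optional[str]:
    """Return the best-ranked candidate that appears in the dictionary.

    `candidates` is ordered from most to least likely. A single-candidate
    result is simply a list of length one. None means "do not output".
    """
    for word in candidates:
        if word in dictionary:
            return word
    return None


if __name__ == "__main__":
    # Illustrative ranking: the first two entries are invented misreadings.
    ranked = ["東京部", "束京都", "東京都"]
    print(select_valid_candidate(ranked, ADDRESS_WORDS))
    # Single-candidate case with a misreading: nothing is output.
    print(select_valid_candidate(["束京都"], ADDRESS_WORDS))
```

Returning `None` for an unmatched result corresponds to the "output only when correct" behaviour described for the single-candidate case.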

However, with such a conventional character recognition device, when a form such as form P2 is used, in which the character entry frames b are contiguous as shown in FIG. 4, it is not possible to know in advance what category of characters is written in each character entry frame. Consequently, when a manuscript-paper-format form is used, the validity check described above cannot be performed, and reading errors increase.

(Problems to be Solved by the Invention) The present invention is intended to solve the above conventional problem. Its object is to provide a character recognition device that can satisfactorily perform recognition processing on word-level character groups such as kanji compound words, and can greatly reduce reading errors, even when recognizing text such as mixed kanji-kana sentences written on a form with contiguous character entry frames.

[Configuration of the Invention] (Means for Solving the Problems) To achieve the above object, the present invention comprises: reading means for reading a group of characters on a form; character recognition means for recognizing the characters of the read character group one character at a time; extraction means for extracting a designated character group of a designated same character type from the character group of the form for which character recognition has been completed; a knowledge dictionary storing semantically valid character groups; and recognition processing means for performing recognition processing on the designated character group by comparing the designated character group with predetermined character groups in the knowledge dictionary.

(Operation) With the above means, the present invention can satisfactorily perform recognition processing on word-level character groups such as kanji compound words, and can greatly reduce reading errors, even when recognizing text such as mixed kanji-kana sentences written on a form with contiguous character entry frames.

(Embodiment) An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention.

In FIG. 1, 1 is a reading unit that reads a group of characters, such as a sentence, on a form; 2 is a character recognition unit that recognizes each character of the read character group one character at a time; 3 is an extraction unit that extracts a designated character group of a designated same character type from the character group for which character recognition has been completed; 4 is a knowledge dictionary unit that stores semantically valid words such as kanji compound words, hiragana words, and katakana words; 5 is a recognition processing unit that performs recognition processing on the characters recognized by the character recognition unit 2 by comparing the recognition result of the designated character group (word) extracted by the extraction unit 3 with the words stored in the knowledge dictionary unit 4; and 6 is an input unit for entering and designating the kind of word to be extracted by the extraction unit 3, for example kanji compound words, hiragana, or katakana.

The operation of the character recognition device configured as described above is explained with reference to the flowchart shown in FIG. 2.

First, before form P is processed, the kind of word to be designated to the extraction unit 3, for example kanji compound words, hiragana, or katakana, is entered through the input unit 6 (STEP 1). Next, form P is conveyed to the reading unit 1, which reads the characters on form P (STEP 2). The image of the read characters is then recognized one character at a time by the character recognition unit 2 (STEP 3).

After this, the extraction unit 3 extracts the designated character groups of the designated same character type from the character group of the text on form P (STEP 4). For example, when extracting kanji compound words, it determines from the recognition results of the character recognition unit 2 whether each recognized character is a kanji, and extracts only compounds in which two or more kanji appear consecutively.

The recognition processing unit 5 then compares the recognition result of each word, that is, each designated character group, with the words in the knowledge dictionary unit 4. If the recognition result matches a word in the knowledge dictionary, it is taken as the final recognition result. If it does not match and the recognition result contains multiple candidates, the next candidate is compared with the words in the knowledge dictionary. Comparison continues in this way until a candidate matches a word in the knowledge dictionary (STEP 5).

For kanji compound words of three or more characters, if none of the candidates in the recognition result matches a word in the knowledge dictionary, the last character of the character string making up the compound is deleted and the remainder is compared with the words in the knowledge dictionary. Characters are deleted one at a time from the end and the comparison is repeated until a match is found.

Once a match is found, the remaining character string is compared in the same way. For example, the kanji compound 「操作説明書」 (operation manual) is compared with the words in the knowledge dictionary in the order 「操作説明書」→「操作説明」→「操作説」→「操作」, then 「説明書」→「説明」, and 「操作」 (operation) and 「説明」 (explanation) match words in the knowledge dictionary.

Consider, therefore, recognizing form P2 shown in FIG. 4. Suppose that, when recognizing the character 「構」 in word B2 「構成」 of sentence A on the form, 「講」 is recognized as the first candidate and the correct character 「構」 as the second candidate. In this state, the extraction unit 3 extracts the kanji compound words B1 「提案」, B2 「組成」, B3, B4, and B5 from sentence A.

In this case, the input unit 6 has been used to designate kanji compound words as the words to be extracted. The recognition processing unit 5 then compares the kanji compound words B1, B2, and so on with the words in the knowledge dictionary unit 4. Word B1 「提案」 is found in the dictionary and is thus confirmed as correct. The first-candidate reading 「組成」 of word B2 is not present in the knowledge dictionary unit 4, so the second candidate 「構成」 is compared with the words in the knowledge dictionary unit 4. 「構成」 is found, is confirmed as correct, and is taken as the final recognition result. B3, B4, and B5 are recognized in the same way.

In the character recognition device configured in this way, the extraction unit 3 extracts the words B1, B2, B3, B4, and B5 (character groups) of the same character type (kanji compound words) designated through the input unit 6 from sentence A of form P2, for which character recognition has been completed, and the recognition processing unit 5 performs recognition processing by comparing them with the words in the knowledge dictionary unit 4. As a result, even when recognizing text such as mixed kanji-kana sentences written on a form, recognition processing can be performed satisfactorily in units of words such as kanji compound words, and reading errors can be greatly reduced.

[Effects of the Invention] As explained above, the character recognition device of the present invention can satisfactorily perform recognition processing on word-level character groups such as kanji compound words, and can greatly reduce reading errors, even when recognizing text such as mixed kanji-kana sentences written on a form with contiguous character entry frames.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention; FIG. 2 is a flowchart explaining the operation of the character recognition device of FIG. 1; FIG. 3 is a plan view showing a roster-format form; FIG. 4 is a plan view showing a manuscript-paper-format form; and FIG. 5 is a diagram showing proper nouns that may appear as recognition result candidates in a typical character recognition device.

1: reading unit
2: character recognition unit
3: extraction unit
4: knowledge dictionary unit
5: recognition processing unit
6: input unit

Applicant: Toshiba Corporation
Agent: Patent Attorney 須山 佐一

Claims

[Claims]

(1) A character recognition device comprising: reading means for reading a group of characters on a form; character recognition means for recognizing the characters of the read character group one character at a time; extraction means for extracting a designated character group of a designated same character type from the character group of the form for which character recognition has been completed; a knowledge dictionary storing semantically valid character groups; and recognition processing means for performing recognition processing on the designated character group by comparing the designated character group with predetermined character groups in the knowledge dictionary.