JPS6386089A - Character recognizing device - Google Patents

Character recognizing device

Info

Publication number
JPS6386089A
Authority
JP
Japan
Prior art keywords
character
group
character group
recognition
knowledge dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP61232131A
Other languages
Japanese (ja)
Inventor
Shigeru Miyao
宮尾 滋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Priority to JP61232131A
Publication of JPS6386089A

Abstract

PURPOSE: To reduce reading errors by comparing a designated character group with the predetermined character groups of a knowledge dictionary and performing recognition processing on the designated character group. CONSTITUTION: The character group on a form is read by a reading unit 1, and each character of the read character group is recognized by a character recognition unit 2. An extraction unit 3 cuts out a designated character group of a designated single character type from the character groups of the form for which character recognition has been completed. Semantically valid character groups are stored in a knowledge dictionary unit 4. A recognition processing unit 5 compares the designated character group with the predetermined character groups and performs recognition processing on the designated character group.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Application Field) The present invention relates to a character recognition device that recognizes characters on a form.

(Prior Art) Character recognition devices that read and recognize characters on forms have long been widely used in various fields of office work.

Forms read by such character recognition devices include the following.

FIG. 3 shows a form P1 in name-list format, in which a denotes an address entry frame. FIG. 4 shows a form P2 in manuscript-paper format, in which A denotes a sentence, B1, B2, B3, B4, and B5 denote kanji compound words, and b denotes character entry frames.

When reading a form such as form P1, in which the character entry frames are laid out in coherent groups, the category of the characters written in each frame is known in advance, so the validity of the recognition results can be checked.

For example, when recognizing the kanji proper noun "東京都" (Tokyo) in the address entry frame a of form P1, recognition results such as those shown in FIG. 5 may be obtained.

The recognition result may consist of multiple ranked candidates, as shown in FIG. 5, or of only a single candidate. In either case, the kanji written in address entry frame a are known in advance to represent an address, so the validity of the recognition result as an address can be judged by comparing it with the words of a knowledge dictionary in which address words are stored. This makes it possible to select the third-ranked candidate "東京都" shown in FIG. 5. Accordingly, in the former case the valid candidate is selected and output, and in the latter case the result is output only when it is correct, which keeps reading errors to a minimum.
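
The validity check described above can be sketched as follows. This is a minimal illustration in Python, not the device's actual implementation: the function name `select_valid_candidate`, the dictionary contents, and the first two misread candidates are hypothetical (the source states only that the third-ranked candidate in FIG. 5 is "東京都").

```python
def select_valid_candidate(ranked_candidates, field_dictionary):
    """Return the highest-ranked candidate that is a dictionary word, else None."""
    for candidate in ranked_candidates:
        if candidate in field_dictionary:
            return candidate
    return None  # nothing valid: suppress output rather than emit a misread


# Hypothetical address dictionary and hypothetical ranked candidates.
address_words = {"東京都", "京都府", "大阪府"}
ranked = ["東京郡", "束京都", "東京都"]

selected = select_valid_candidate(ranked, address_words)
single = select_valid_candidate(["束京都"], address_words)
```

With multiple candidates the first dictionary word is selected; with a single invalid candidate the function returns `None`, which corresponds to outputting a result only when it is correct.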

However, with such a conventional character recognition device, when a form with consecutive character entry frames b is used, such as form P2 shown in FIG. 4, the category of the characters written in each frame cannot be known in advance. Therefore, when a form in manuscript-paper format is used, the validity check of the recognition results described above cannot be performed, and reading errors increase.

(Problems to be Solved by the Invention) The present invention is intended to solve the above conventional problem. Its object is to provide a character recognition device that, even when recognizing text such as mixed kanji and kana sentences written on a form with consecutive character entry frames, can satisfactorily perform recognition processing on word-level character groups such as kanji compound words and can greatly reduce reading errors.

[Configuration of the Invention] (Means for Solving the Problems) To achieve the above object, the present invention comprises: reading means for reading a character group on a form; character recognition means for recognizing the characters of the read character group one character at a time; extraction means for cutting out a designated character group of a designated single character type from the character group of the form for which character recognition has been completed; a knowledge dictionary storing semantically valid character groups; and recognition processing means for performing recognition processing on the designated character group by comparing it with predetermined character groups in the knowledge dictionary.

(Operation) With the above means, even when recognizing text such as mixed kanji and kana sentences written on a form with consecutive character entry frames, the present invention can satisfactorily perform recognition processing on word-level character groups such as kanji compound words and can greatly reduce reading errors.

(Embodiment) An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention.

In FIG. 1, 1 is a reading unit that reads a character group, such as a sentence, on a form; 2 is a character recognition unit that recognizes the characters of the read character group one character at a time; 3 is an extraction unit that cuts out a designated character group of a designated single character type from the character group for which character recognition has been completed; 4 is a knowledge dictionary unit in which semantically valid words such as kanji compound words, hiragana words, and katakana words are stored; 5 is a recognition processing unit that performs recognition processing by comparing the recognition result of the designated character group (word), cut out by the extraction unit 3 from the characters recognized by the character recognition unit 2, with the words stored in the knowledge dictionary unit 4; and 6 is an input unit for entering and designating the kind of word (for example, kanji compound word, hiragana, or katakana) to be cut out by the extraction unit 3.

The operation of the character recognition device configured as described above is explained with reference to the flowchart shown in FIG. 2.

First, before a form P is processed, the kind of word to be cut out by the extraction unit 3 (for example, kanji compound word, hiragana, or katakana) is entered through the input unit 6 (STEP 1). Next, the form P is conveyed to the reading unit 1, which reads the characters on the form P (STEP 2). The image of the read characters is then recognized character by character by the character recognition unit 2 (STEP 3).

After this, the extraction unit 3 cuts out the designated character groups of the designated single character type from the character group of the text on the form P (STEP 4). For example, when cutting out kanji compound words, the extraction unit determines from the recognition results of the character recognition unit 2 whether each recognized character is a kanji, and cuts out only compounds in which two or more kanji appear consecutively.
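
A minimal sketch of the cut-out rule in STEP 4, assuming the recognized text is available as a Unicode string. The function name `cut_out_kanji_compounds`, the sample sentence, and the use of the CJK Unified Ideographs block (U+4E00 to U+9FFF) as the test for "kanji" are assumptions made for illustration; the source does not say how the device classifies character types.

```python
import re

# Runs of two or more consecutive characters in the basic CJK ideograph block.
# This range is a simplification; it ignores extension blocks and the "々" mark.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff]{2,}")


def cut_out_kanji_compounds(text):
    """Return (start_index, compound) pairs for each run of 2+ kanji in text."""
    return [(m.start(), m.group()) for m in KANJI_RUN.finditer(text)]


# Hypothetical sentence: the single kanji "新" is followed by kana, so it is skipped.
compounds = cut_out_kanji_compounds("新しい構成を提案する")
```

Keeping the start index alongside each compound lets a later stage write the corrected word back to the same position in the sentence.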

After this, the recognition processing unit 5 compares the recognition result of each word (designated character group) with the words in the knowledge dictionary unit 4. If the recognition result matches a word in the knowledge dictionary, it is taken as the final recognition result. If it does not match and the recognition result contains multiple candidates, the next candidate is compared with the words in the knowledge dictionary. The comparison is repeated in this way until a candidate matches a word in the knowledge dictionary (STEP 5).
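
The comparison loop of STEP 5 can be sketched as below, assuming the character recognition unit returns a ranked candidate list for each character position. The source says only that the next candidate is tried until a dictionary word matches; ordering word-level candidates by the sum of per-character ranks is an assumption of this sketch, as are the function name `resolve_word` and the dictionary contents.

```python
from itertools import product


def resolve_word(char_candidates, dictionary):
    """Try word-level candidates in rank order; return the first dictionary word.

    char_candidates holds one ranked list of candidate characters per position.
    """
    ranked_choices = [list(enumerate(cands)) for cands in char_candidates]
    # Lower total rank first; sorted() is stable, so ties keep product() order.
    combos = sorted(product(*ranked_choices),
                    key=lambda combo: sum(rank for rank, _ in combo))
    for combo in combos:
        word = "".join(ch for _, ch in combo)
        if word in dictionary:
            return word
    return None  # no candidate matched any dictionary word


# Hypothetical dictionary; "講" ranked first and "構" second for position 0.
words = {"構成", "提案"}
resolved = resolve_word([["講", "構"], ["成"]], words)
```

Here the first word-level candidate "講成" is not a dictionary word, so the second candidate "構成" is returned.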

For a kanji compound word of three or more characters, if none of the candidates in the recognition result matches a word in the knowledge dictionary, the last character of the character string forming the compound is deleted and the remainder is compared with the words in the knowledge dictionary. Characters are deleted one at a time from the end and the comparison is repeated until a match is found.

Once a match is found, the remaining character string is compared in the same way. For example, the kanji compound word "操作説明書" (operation manual) is compared with the words in the knowledge dictionary in the order "操作説明書" → "操作説明" → "操作説" → "操作", then "説明書" → "説明", and "操作" (operation) and "説明" (explanation) match words in the knowledge dictionary.

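The back-to-front trimming described above might look like the following sketch. The function name `split_compound`, the two-character minimum word length, and the choice to skip one character when nothing matches are assumptions; the source describes only the trimming order and the "操作説明書" example.

```python
def split_compound(compound, dictionary, min_len=2):
    """Split a compound into dictionary words by trimming characters from the end."""
    matched = []
    start = 0
    while len(compound) - start >= min_len:
        end = len(compound)
        # Drop one character at a time from the end until a dictionary word appears.
        while end - start >= min_len and compound[start:end] not in dictionary:
            end -= 1
        if end - start >= min_len:
            matched.append(compound[start:end])
            start = end  # continue with the remaining character string
        else:
            start += 1  # assumption: skip a character that starts no known word
    return matched


# Hypothetical dictionary containing the two words named in the example.
parts = split_compound("操作説明書", {"操作", "説明"})
```

For this input the comparisons run in the same order as the example in the text, and the trailing "書" is left unmatched.
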
Thus, when recognizing the form P2 shown in FIG. 4, suppose for example that in recognizing the character "構" of word B2 "構成" in sentence A of the form, "講" is recognized as the first candidate and the correct character "構" as the second candidate. In this state, the kanji compound words B1 "提案", B2 "講成", B3, B4, and B5 are cut out of sentence A by the extraction unit 3.

In this case, kanji compound words have been designated through the input unit 6 as the words to be cut out. The recognition processing unit 5 compares the kanji compound words B1, B2, and so on with the words in the knowledge dictionary unit 4. Word B1 "提案" (proposal) is recognized as correct. Word B2 "講成" is not present in the knowledge dictionary unit 4, so the second candidate "構成" (composition) is compared with the words in the knowledge dictionary unit 4. "構成" is then recognized as correct and is taken as the final recognition result. B3, B4, and B5 are recognized in the same way.

In the character recognition device configured in this way, the words B1, B2, B3, B4, and B5 (character groups) of the single character type designated through the input unit 6 (kanji compound words) are cut out by the extraction unit 3 from sentence A of the form P2 for which character recognition has been completed, and the recognition processing unit 5 performs recognition processing by comparing them with the words in the knowledge dictionary unit 4. Therefore, even when recognizing text such as mixed kanji and kana sentences written on a form, recognition processing can be performed satisfactorily in word units such as kanji compound words, and reading errors can be greatly reduced.

[Effects of the Invention] As explained above, even when recognizing text such as mixed kanji and kana sentences written on a form with consecutive character entry frames, the character recognition device of the present invention can satisfactorily perform recognition processing on word-level character groups such as kanji compound words and can greatly reduce reading errors.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention; FIG. 2 is a flowchart for explaining the operation of the character recognition device of FIG. 1; FIG. 3 is a plan view showing a form in name-list format; FIG. 4 is a plan view showing a form in manuscript-paper format; and FIG. 5 is a diagram showing proper nouns that might appear as recognition result candidates in a typical character recognition device.

1: Reading unit
2: Character recognition unit
3: Extraction unit
4: Knowledge dictionary unit
5: Recognition processing unit
6: Input unit

Applicant: Toshiba Corporation
Agent: Patent Attorney Satoshi Suyama

Claims (1)

[Claims] (1) A character recognition device comprising: reading means for reading a character group on a form; character recognition means for recognizing the characters of the read character group one character at a time; extraction means for cutting out a designated character group of a designated single character type from the character group of the form for which character recognition has been completed; a knowledge dictionary storing semantically valid character groups; and recognition processing means for performing recognition processing on the designated character group by comparing the designated character group with predetermined character groups in the knowledge dictionary.
JP61232131A 1986-09-30 1986-09-30 Character recognizing device Pending JPS6386089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61232131A JPS6386089A (en) 1986-09-30 1986-09-30 Character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61232131A JPS6386089A (en) 1986-09-30 1986-09-30 Character recognizing device

Publications (1)

Publication Number Publication Date
JPS6386089A true JPS6386089A (en) 1988-04-16

Family

ID=16934484

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61232131A Pending JPS6386089A (en) 1986-09-30 1986-09-30 Character recognizing device

Country Status (1)

Country Link
JP (1) JPS6386089A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012068879A (en) * 2010-09-22 2012-04-05 Fujitsu Ltd Character recognition device, character recognition device control program, character recognition device control method and portable terminal device

Similar Documents

Publication Publication Date Title
Ng et al. Improving machine learning approaches to coreference resolution
US5587902A (en) Translating system for processing text with markup signs
JPH06259424A (en) Document display device and document summary device and digital copying device
JPS5892063A (en) Idiom processing system
JPS6386089A (en) Character recognizing device
Taylor et al. Integrating natural language understanding with document structure analysis
JPH06290209A (en) Sentence segmentation device
JP2570784B2 (en) Document reader post-processing device
JPS6368972A (en) Unregistered word processing system
JPH03259376A (en) Japanese language long text division supporting device
JPS63109572A (en) Derivative processing system
JP2002297585A (en) Splitting method for noun phrase in text in english, creating method and apparatus for syntax information in english
JPH10293811A (en) Document recognition device and method, and program storage medium
JP3492442B2 (en) Document Content Characterization Using Word Shape Tokens
JPS62143178A (en) Natural language translation system
JPS61208164A (en) Display system of proofreading device for japanese document
JPH05250403A (en) Japanese sentence word analyzing system
JPS63136269A (en) Automatic translating device
JPH0262659A (en) Extracting device for correction candidate character of japanese sentence
JPH0312780A (en) Storing system for morpheme analysis result information on document reader after-processor
JPH0410161A (en) Omission compensating processing device
JPH02153461A (en) Method of checking quantum name
JPH05250406A (en) Automatic translating device
JPH0756926A (en) Parting word processing system in chinese/japanese machine translation
JPH10105552A (en) Machine translation system