JPS61131091A - Character reader

Info

Publication number: JPS61131091A
Application number: JP59251587A
Authority: JP
Inventors: Kenji Hasegawa; 健治長谷川; Kazunori Nakao; 中尾　和則
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1984-11-30
Filing date: 1984-11-30
Publication date: 1986-06-18

Abstract

PURPOSE: To distinguish whether or not output data has been post-processed and to indicate the result clearly. CONSTITUTION: When recognition data for the "surname" (姓, sei) field is sent from the control unit 1, the post-processing unit 3 searches the entire "surname" file, computes the similarity between the received recognition data and each entry retrieved from the file, and outputs to the determination unit 4, as the correct answer, the entry whose similarity is the maximum and is at or above a fixed value. The determination unit 4 determines whether the data sent from the post-processing unit 3 was post-processed and accepted as the correct answer, or was left without post-processing because no candidate was found, and returns the data to the control unit 1 together with the determination result. To data that the determination unit 4 has judged not to be post-processed, the control unit 1 adds a high-luminance display control code when transferring it to the CRT display 5, a double-printing control code when transferring it to the printer 6, or a control code that adds "*" to a 1-byte reserve area at the head of the field when transferring it to the floppy disk device 7.

Description

DETAILED DESCRIPTION OF THE INVENTION [Technical Field of the Invention] The present invention relates to a character reading device that performs post-processing based on knowledge-engineering techniques when recognizing characters such as kanji.

[Technical Background of the Invention] In recent years, advances in recognition technology and device technology have made it possible for character reading devices to read kanji.

However, reading kanji demands far more advanced technology than reading alphanumeric characters, because there are many more character types and many more similar-looking characters. For this reason, character reading devices capable of reading kanji raise their overall reading accuracy not only by improving the recognition of individual characters in the optical character reader, but also by post-processing: a so-called knowledge-engineering technique that uses the recognition results for individual characters (hereinafter, recognition data) to check whether they are appropriate as a word or as a sentence.

[Problems with the Background Art] In a character reading device that uses post-processing of this kind, the final output data contains both data in which the individual character recognition data has been post-processed and data that has not been post-processed.

However, conventional reading devices output both kinds of data in exactly the same form, with no distinction between them. When post-processed data cannot be told apart from data that was not post-processed, the operator's work becomes very inefficient when collating the data to find and correct, for example, entry errors on the form, misread individual characters, or deficiencies in the post-processing data.

[Object of the Invention] In view of the above drawback, an object of the present invention is to provide a character reading device that can distinguish whether or not output data has been post-processed and indicate this clearly.

[Summary of the Invention] The present invention achieves the above object by providing determination means for determining whether post-processing has been performed on the recognition results of individual characters, and control means for distinguishing and clearly indicating, based on the result of the determination means, whether or not the output data has been post-processed.

[Embodiment of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the character reading device of the present invention. An optical kanji reader 2, a post-processing unit 3, a determination unit 4, a CRT display 5, a printer 6, and a floppy disk device 7 are connected to the control unit 1. A database 8 is connected to the post-processing unit 3.

The control unit 1 controls the optical kanji reader 2, the post-processing unit 3, the determination unit 4, the CRT display 5, the printer 6, the floppy disk device 7, and so on. The optical kanji reader 2 recognizes the characters written on a form or similar document one character at a time and sends the results to the control unit 1. The post-processing unit 3 calculates the similarity between the individual recognition data output by the optical kanji reader 2 and the data stored in the database 8, and takes as the correct answer the entry whose similarity is the maximum and exceeds a threshold. The determination unit 4 determines whether the individual recognition data output by the optical kanji reader 2 has been processed by the post-processing unit 3. The database 8 is a storage device that holds the words that may occur in each field of the form. The CRT display 5 is assumed to be capable of high-luminance display, and the printer 6 of double printing.

Next, the operation of this embodiment is described. First, the optical kanji reader 2 reads the characters on the form, recognizes them one character at a time, and sends the recognition data to the control unit 1. Assume that the form being read carries the characters shown in FIG. 2. Of the recognition results it receives, the control unit 1 sends the recognition data for the surname, given name, and company name to the post-processing unit 3 separately, each on its own. For the address, it sends the three fields (here Tokyo, Ome City, and Suehiro-cho) linked by their upper/lower relationship. In this example, the upper/lower relationship means that Tokyo is the upper level, Ome City comes next, and Suehiro-cho is the lower level.

The post-processing unit 3 uses the data in the database 8 to process the received individual recognition data by knowledge-engineering techniques. The database 8 stores the words that may occur in each field of a form with the contents shown in FIG. 2. Specifically, the "surname" file holds, for example, about 2,000 frequently occurring surnames such as Tanaka and Sato, the "given name" file holds, for example, about 4,000 given names such as Taro and Jiro, and the "company name" file holds the names of companies listed on the first and second sections of the stock exchange.

The address file of the database 8 stores addresses throughout Japan in a tree structure down to the third layer. In this three-layer tree, for example, Tokyo is in the first layer, cities such as Ome City and Akishima City form the second layer beneath it, and towns such as Suehiro-cho and Oyama-cho are stored beneath the second layer as the third layer.
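As an illustration only, the three-layer address file might be modeled as nested mappings. The sketch below is not taken from the patent: the names `ADDRESS_TREE` and `match_depth` are hypothetical, the entries are limited to places named in the text, romanized strings replace kanji, and an exact-match lookup stands in for whatever matching the actual device performs.

```python
# Hypothetical three-layer address tree: prefecture -> city -> set of towns.
# Only places named in the text appear; a real file would cover all of Japan.
ADDRESS_TREE = {
    "Tokyo": {
        "Ome City": {"Suehiro-cho"},
        "Akishima City": set(),  # towns omitted in this sketch
    },
}


def match_depth(tree, prefecture, city, town):
    """Return how many layers (0 to 3) of an address match the tree, top down."""
    cities = tree.get(prefecture)
    if cities is None:
        return 0  # first layer (prefecture) not found
    towns = cities.get(city)
    if towns is None:
        return 1  # prefecture found, city not found
    if town not in towns:
        return 2  # prefecture and city found, town not found
    return 3      # all three layers found


if __name__ == "__main__":
    print(match_depth(ADDRESS_TREE, "Tokyo", "Ome City", "Suehiro-cho"))
```

A result below 3 tells the caller which layer failed to match, which is one way the upper/lower relationship between the address fields could be put to use.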

When the control unit 1 sends the recognition data for the "surname" field to the post-processing unit 3, the post-processing unit 3 searches the entire "surname" file in the database 8 and computes the similarity between the received recognition data and each entry retrieved from the file. It outputs to the determination unit 4, as the correct answer, the entry whose similarity is the maximum and is at or above a fixed value. If the maximum similarity does not exceed that fixed value, the post-processing unit 3 treats the field as having no candidate and passes the individual recognition data received from the control unit 1 to the determination unit 4 unchanged. The determination unit 4 determines whether the data from the post-processing unit 3 was post-processed and accepted as the correct answer, or was left without post-processing because there was no candidate, attaches this determination result to the data, and sends it back to the control unit 1. The control unit 1 outputs the returned data to the CRT display 5, the printer 6, and the floppy disk device 7. Data that the determination unit 4 has judged to be post-processed is sent to each output device as is, with no control code added. To data judged not to be post-processed, the control unit 1 adds a high-luminance display control code when sending it to the CRT display 5, a double-printing control code when sending it to the printer 6, and, when sending it to the floppy disk device 7, a control code that provides a 1-byte reserve area at the head of the field and places "*" in it.
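The matching and determination steps can be sketched as follows. This is a minimal sketch under stated assumptions: the patent does not specify the similarity measure or the threshold, so the `ratio()` of Python's `difflib.SequenceMatcher` and a threshold of 0.6 are stand-ins, romanized strings replace kanji, and the function name `post_process` is hypothetical. The returned flag plays the role of the determination result that is passed back to the control unit.

```python
from difflib import SequenceMatcher


def post_process(recognized, candidates, threshold=0.6):
    """Return (text, post_processed) for one field.

    Every candidate in the field's file is scored against the recognition
    data. The best candidate is accepted only if its score is at or above
    the threshold; otherwise the recognition data is returned unchanged.
    The similarity measure and threshold are assumptions, not the patent's.
    """
    best_word, best_score = None, -1.0
    for word in candidates:
        score = SequenceMatcher(None, recognized, word).ratio()
        if score > best_score:
            best_word, best_score = word, score
    if best_word is not None and best_score >= threshold:
        return best_word, True   # post-processed: accepted as the correct answer
    return recognized, False     # no candidate: passed through unchanged


if __name__ == "__main__":
    surname_file = ["Tanaka", "Sato"]  # tiny stand-in for the "surname" file
    print(post_process("Tanoka", surname_file))
    print(post_process("Toshiba Ome Factory", surname_file))
```

Keeping the flag next to the text means later stages do not have to compare strings to find out whether a field was replaced by a database entry.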

FIG. 2 shows an example of a reading result displayed on the screen of the CRT display 5 of the character reading device shown in FIG. 1. In this example, "Toshiba Ome Factory" (東芝青梅工場) in the company name field is displayed at high luminance, indicating that it was not processed by the post-processing unit 3.
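The per-device marking of data that was not post-processed might look like the sketch below. The patent names the kinds of control codes but not their byte values, so the ANSI bold sequence and the ESC G / ESC H double-strike pair used here are assumed stand-ins, and `mark_for_device` is a hypothetical name.

```python
# Assumed stand-ins for the unspecified control codes.
HIGHLIGHT_ON, HIGHLIGHT_OFF = "\x1b[1m", "\x1b[0m"  # ANSI bold on / reset
DOUBLE_ON, DOUBLE_OFF = "\x1bG", "\x1bH"            # ESC/P-style double-strike on / off


def mark_for_device(text, post_processed, device):
    """Return the string sent to one output device for one field."""
    if post_processed:
        # Post-processed data is sent as is, with no control code added.
        return text
    if device == "crt":
        return HIGHLIGHT_ON + text + HIGHLIGHT_OFF
    if device == "printer":
        return DOUBLE_ON + text + DOUBLE_OFF
    if device == "floppy":
        # One reserve byte at the head of the field carries the "*" marker.
        # A real record layout might reserve this byte in every field.
        return "*" + text
    raise ValueError(f"unknown device: {device}")


if __name__ == "__main__":
    print(repr(mark_for_device("Toshiba Ome Factory", False, "crt")))
    print(repr(mark_for_device("Toshiba Ome Factory", False, "floppy")))
```

Routing every output through one function keeps the marking rule in a single place, so the three devices cannot drift apart in how they treat the same field.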

According to this embodiment, whether or not output data has been post-processed can be indicated clearly. For example, suppose the town name Suehiro-cho is displayed at high luminance on the CRT display 5. If the operator knows that Suehiro-cho is a newly created town name, the operator can readily see that the database 8 has no entry for Suehiro-cho, and the database can easily be updated. Conversely, if Suehiro-cho is displayed at high luminance and the operator knows that no such town exists, it can readily be inferred that the form was filled in incorrectly or that the optical character reader 2 misread it. Collation of the read data can thus be carried out efficiently, and the operability of the device is markedly improved.

The present invention can also be applied, with the same effect, to a system equipped with a speech recognition device or a graphic recognition device in place of the optical kanji reader shown in FIG. 1.

[Effects of the Invention] As described above, the character reading device of the present invention comprises determination means for determining whether the individual character recognition results from the optical kanji reader have been post-processed, and control means for outputting the data so that post-processed output data is explicitly identified. It can therefore distinguish whether or not an output result has been post-processed and indicate this clearly.

BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram showing an embodiment of the character reading device of the present invention, and FIG. 2 is a diagram showing an example of the display screen of the CRT display of FIG. 1.

1: control unit; 2: optical kanji reader; 3: post-processing unit; 4: determination unit; 5: CRT display; 8: database

Agent: Patent attorney Kensuke Norichika (則近 憲佑), and one other

Claims

[Claims]

1. A character reading device equipped with a function of post-processing the recognition results of individual characters, comprising: determination means for determining whether post-processing has been performed on the recognition results of the individual characters; and control means for distinguishing and clearly indicating, based on the result of the determination means, whether or not output data has been post-processed.