JPS59148983A - Method for selecting "kanji" recognizing dictionary - Google Patents

Method for selecting "kanji" recognizing dictionary

Info

Publication number
JPS59148983A
Authority
JP
Japan
Prior art keywords
dictionary
kanji
recognition
column
line
Legal status
Pending
Application number
JP58021484A
Other languages
Japanese (ja)
Inventor
Hitoshi Komatsu
仁 小松
Akizo Kadota
門田 彰三
Current Assignee
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date
1983-02-14
Filing date
1983-02-14
Publication date
1984-08-25
Application filed by Hitachi Ltd
Priority to JP58021484A
Publication of JPS59148983A
Legal status: Pending (current)

Landscapes

  • Character Discrimination (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To achieve kanji (Chinese character) recognition with high accuracy and good operability by exploiting the correspondence between item names and entry columns on a business form. CONSTITUTION: Under the control of a control part 302, a preprocessing logic part 304 sequentially reads out the contents of an image memory 303, extracts the ruled lines of the table, and segments the images inside columns A and B of each line. The control part 302 takes the image inside column A of the 1st line, selects an educational kanji dictionary from a dictionary file 306 as the specified dictionary, and transmits both to a recognition logic part 305. The recognition part 305 returns the recognition result to the control part 302. On the basis of this result, the control part 302 uses a correspondence table to select a proper-noun kanji dictionary from the dictionary file 306 as the specified dictionary for column B of the 1st line, and transmits it to the recognition part 305 together with the image inside column B of the 1st line. Using the specified kanji dictionary, the recognition part 305 recognizes the contents of column B of the 1st line and transmits the result to the control part 302.

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]
The present invention relates to an optical character reader (hereinafter "OCR") that automatically reads characters on forms, and in particular to a recognition-dictionary selection method suitable for reading Japanese text that includes kanji.

[Prior Art]

Conventional OCRs that read alphanumerics, symbols, and kana handle only about 100 character types, so there was no need to consider characters outside the supported set (so-called gaiji, or external characters). Once the target set is extended to kanji and hiragana, however, 2,000 to 3,000 character types are normally required, and further external characters must be handled depending on the application.

For example, on the slip shown in Fig. 1, the product-name column on line 1 contains kanji derived from proprietary names, the address column on line 2 contains kanji peculiar to place names, and the name column on line 3 contains kanji peculiar to personal names. One conventional method simply recognizes every column with a single common dictionary in which all required external characters are registered in advance. This makes the set of candidate characters unnecessarily large for each column and lowers recognition accuracy.
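To make the accuracy argument concrete, here is a toy nearest-template classifier in Python. The feature vectors, the dictionary subsets, and the choice of 髙 (a variant of 高 used mainly in personal names) as the confusable character are illustrative assumptions, not data from the patent. The sketch only shows how an unnecessarily large candidate set can contain a wrong answer that happens to lie closer to a distorted sample.

```python
import math

# Toy character "dictionary": character -> hypothetical 2-D feature template.
# The numbers are invented for illustration; real OCR features are far richer.
TEMPLATES = {
    "高": (0.50, 0.50),
    "髙": (0.56, 0.50),  # personal-name variant, visually very close to 高
    "品": (0.10, 0.90),
}

COMMON = set(TEMPLATES)        # common dictionary: every character registered
PRODUCT_FIELD = {"高", "品"}    # hypothetical column-specific subset without 髙

def recognize(feature, candidates):
    """Nearest-template classification restricted to the given candidate set."""
    return min(candidates, key=lambda ch: math.dist(feature, TEMPLATES[ch]))

sample = (0.54, 0.50)  # a distorted scan of 高 that drifted toward 髙
print(recognize(sample, COMMON))         # 髙 (misread)
print(recognize(sample, PRODUCT_FIELD))  # 高
```

Restricting the candidates does not make the classifier smarter; it simply removes options that the column cannot legitimately contain.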

Another conventional method specifies the slip format in detail so that, at the recognition stage, column B of line 1 in Fig. 1 is matched against a product proper-noun dictionary, column B of line 2 against a place-name dictionary, and column B of line 3 against a personal-name dictionary. This optimizes the dictionary used for each column, but when there are many slip formats, or when a format changes, a new format specification must be given to the OCR each time, which is extremely cumbersome.

[Object of the Invention]
An object of the present invention is to overcome these drawbacks of the conventional methods and to provide a kanji recognition method with high recognition accuracy and excellent operability.

[Summary of the Invention]

The invention is characterized in that it exploits the correspondence between item names and entry columns on a form: item names are recognized with a general-purpose kanji dictionary covering a relatively small number of characters, while each entry column is recognized by selectively applying a dedicated kanji dictionary chosen according to the recognition result of the corresponding item name.

As shown in Fig. 1, ordinary forms place an item name (column A) next to the entry column (column B) whose contents it describes. Column A can therefore be read first, and its recognition result used to select the dictionary needed when recognizing the corresponding column B. Moreover, the characters written in column A are generally very common, so a dictionary of the educational kanji (kyōiku kanji), for example, is sufficient for them.
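The two-stage idea can be written compactly in our own notation (not the patent's). Let $x^{A}_i$ and $x^{B}_j$ be the $i$-th and $j$-th character images in columns A and B, $s(x, c)$ a matching score between an image and a dictionary character $c$, $D_{\mathrm{edu}}$ the educational-kanji dictionary, and $T$ the table mapping an item name to a dedicated dictionary:

```latex
\hat{a}_i = \operatorname*{arg\,max}_{c \in D_{\mathrm{edu}}} s\left(x^{A}_i, c\right),
\qquad
\hat{b}_j = \operatorname*{arg\,max}_{c \in D_{T(\hat{a})}} s\left(x^{B}_j, c\right),
\qquad
\hat{a} = \hat{a}_1 \hat{a}_2 \cdots \hat{a}_n
```

The search for each column B character then ranges only over $D_{T(\hat{a})}$ rather than over every registered character.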

Diversity of form layouts can likewise be handled. As in Fig. 1, it suffices to specify in advance only that each line consists of an item column (A) followed by an entry column (B) and that the form has several such lines; the concrete layout of each form is then obtained by recognizing its ruled lines. Frames in a document can be detected easily with known techniques, described for example in the following reference:

Sakurai, "Detection of Frames in Documents," 25th National Convention of the Information Processing Society of Japan.

Diversity of item names and their corresponding dictionaries is handled by providing a correspondence table; supporting a new item name only requires adding an entry to this table. Fig. 2 shows an example of the correspondence table.

[Embodiment of the Invention]
An embodiment of the present invention is described below with reference to Fig. 3. A scanner 301 successively reads the images on a form, which are stored in an image memory 303 via a control unit 302.

Under the control of the control unit 302, a preprocessing logic unit 304 successively reads out the contents of the image memory 303, extracts the ruled lines of the table, and cuts out the images inside columns A and B of each line.
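One way to sketch this preprocessing step in Python, assuming a binarized image held as nested lists. The projection-profile rule finder, the 0.8 fill threshold, and the names `find_ruled_lines`, `ab_cells`, and `crop_inside` are hypothetical choices for illustration; the patent defers frame detection to known techniques and does not specify an algorithm. The only layout knowledge built in is the generic one from the Summary: an A column beside a B column, repeated over an unknown number of lines.

```python
from typing import List, Tuple

Image = List[List[int]]          # binary image: 1 = black pixel, 0 = white
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), on the ruled lines

def _run_centers(flags: List[bool]) -> List[int]:
    """Collapse each run of consecutive True entries into its center index."""
    centers, start = [], None
    for i, flag in enumerate(flags + [False]):  # trailing sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            centers.append((start + i - 1) // 2)
            start = None
    return centers

def find_ruled_lines(img: Image, min_fill: float = 0.8) -> Tuple[List[int], List[int]]:
    """Locate horizontal and vertical rules with projection profiles.

    A pixel row (column) counts as part of a rule when at least `min_fill`
    of it is black. This is a deliberately simple stand-in for frame detection.
    """
    height, width = len(img), len(img[0])
    row_flags = [sum(row) >= min_fill * width for row in img]
    col_flags = [sum(img[y][x] for y in range(height)) >= min_fill * height
                 for x in range(width)]
    return _run_centers(row_flags), _run_centers(col_flags)

def ab_cells(h_lines: List[int], v_lines: List[int]) -> List[Tuple[Box, Box]]:
    """Pair cells line by line as (column A box, column B box).

    Assumes three vertical rules (left edge, A/B divider, right edge);
    the number of lines is taken from the horizontal rules that were found.
    """
    if len(v_lines) != 3:
        raise ValueError("expected three vertical rules for an A|B layout")
    left, divider, right = sorted(v_lines)
    ys = sorted(h_lines)
    return [((left, top, divider, bottom), (divider, top, right, bottom))
            for top, bottom in zip(ys, ys[1:])]

def crop_inside(img: Image, box: Box) -> Image:
    """Cut out a cell's interior, excluding the rules themselves."""
    left, top, right, bottom = box
    return [row[left + 1:right] for row in img[top + 1:bottom]]

# A 9x9 synthetic form: rules at rows 0, 4, 8 and columns 0, 3, 8,
# plus one black "stroke" pixel inside column B of the first line.
form = [[0] * 9 for _ in range(9)]
for y in (0, 4, 8):
    form[y] = [1] * 9
for x in (0, 3, 8):
    for y in range(9):
        form[y][x] = 1
form[2][5] = 1

h_lines, v_lines = find_ruled_lines(form)
rows = ab_cells(h_lines, v_lines)
cell_b = crop_inside(form, rows[0][1])
```

Because the line count comes from the detected rules rather than from a per-form specification, the same code handles a two-line or a ten-line form.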

The control unit 302 first selects the image inside column A of line 1 and, as the specified dictionary, the educational kanji dictionary from a dictionary file 306, and passes both to a recognition logic unit 305. Taking the form of Fig. 1 as an example, the recognition unit 305 returns the recognition result 「商品」 (product) to the control unit 302. Based on this result, the control unit 302 then uses the correspondence table shown in Fig. 2 to select the proper-noun kanji dictionary from the dictionary file 306 as the specified dictionary for column B of line 1, and passes it to the recognition unit 305 together with the image inside column B of line 1. The recognition unit 305 recognizes the contents of column B of line 1 using the specified kanji dictionary and returns the result to the control unit 302.

Lines 2 and 3 are processed in the same way. Their A columns are recognized as 「住所」 (address) and 「氏名」 (name); the control unit 302 selects the place-name kanji dictionary and the personal-name kanji dictionary, respectively, from the correspondence table of Fig. 2 and passes each to the recognition unit 305 together with the corresponding image; and the recognition unit 305 recognizes the contents of the corresponding B column using the specified kanji dictionary and returns the result to the control unit 302.
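The correspondence table of Fig. 2 and the lookup the control unit performs on it might be sketched in Python as follows. Only the three item-name pairings come from the text; the dictionary identifiers, the fallback for unlisted item names, and the extra 「電話」 (telephone) entry are hypothetical.

```python
# Correspondence table modeled on the three entries named in the embodiment.
# The dictionary identifiers are hypothetical labels, not names from the patent.
CORRESPONDENCE = {
    "商品": "proper_noun_kanji",    # product -> proper-noun kanji dictionary
    "住所": "place_name_kanji",     # address -> place-name kanji dictionary
    "氏名": "personal_name_kanji",  # name    -> personal-name kanji dictionary
}

# Assumption: the patent does not say what happens when an item name is missing
# from the table; falling back to the general dictionary is one plausible choice.
FALLBACK = "educational_kanji"

def dictionary_for(item_name: str, table: dict = CORRESPONDENCE) -> str:
    """Return the dictionary to apply to the B column next to `item_name`."""
    return table.get(item_name, FALLBACK)

# Supporting a new item name means adding a row, not writing a new format spec.
# The "電話" -> "numeric" entry is a hypothetical example of such an addition.
EXTENDED = {**CORRESPONDENCE, "電話": "numeric"}
```

Keeping the mapping as data is what lets new item names be supported without touching the recognition logic.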

Fig. 4 shows this series of processing steps of the control unit as a flowchart.
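A minimal Python sketch of that control flow, assuming a pluggable recognizer in place of recognition unit 305. `read_form`, the dictionary identifiers, the fallback for unlisted item names, and the canned "images" are hypothetical; the loop follows the order of operations described in the text rather than the exact steps of Fig. 4, which are not given in words. Cell images are stood in for by string labels so the example runs without an OCR engine.

```python
from typing import Callable, Dict, List, Tuple

# (cell image, dictionary id) -> recognized text; stands in for recognition unit 305.
Recognizer = Callable[[str, str], str]

def read_form(rows: List[Tuple[str, str]],
              recognize: Recognizer,
              table: Dict[str, str],
              general_dict: str = "educational_kanji",
              fallback: str = "educational_kanji") -> List[Tuple[str, str]]:
    """Control-unit loop: read each item name, then its entry with the chosen dictionary."""
    results = []
    for image_a, image_b in rows:
        item_name = recognize(image_a, general_dict)  # column A, general dictionary
        entry_dict = table.get(item_name, fallback)   # look up the correspondence table
        entry = recognize(image_b, entry_dict)        # column B, dedicated dictionary
        results.append((item_name, entry))
    return results

# Hypothetical stand-in recognizer: column A labels map to canned item names,
# and column B results record which dictionary was applied.
CANNED = {"row1_A": "商品", "row2_A": "住所", "row3_A": "氏名"}

def fake_recognizer(image: str, dictionary: str) -> str:
    return CANNED.get(image, f"<{image} read with {dictionary}>")

TABLE = {"商品": "proper_noun_kanji", "住所": "place_name_kanji", "氏名": "personal_name_kanji"}
ROWS = [("row1_A", "row1_B"), ("row2_A", "row2_B"), ("row3_A", "row3_B")]
result = read_form(ROWS, fake_recognizer, TABLE)
```

Passing the recognizer in as a function keeps the dictionary-selection logic testable on its own, which mirrors the split between control unit 302 and recognition unit 305.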

[Effects of the Invention]
According to the present invention, the optimum kanji dictionary for each item column on a form is applied without supplying cumbersome format specifications, even for diverse form layouts, so a wide variety of kanji can be read with high accuracy.

[Brief Description of the Drawings]

Fig. 1 shows an example of a form containing kanji; Fig. 2 shows a table of the correspondence between item names and the kanji dictionaries applied to the corresponding columns; Fig. 3 is a block diagram of an embodiment of the present invention; and Fig. 4 is a flowchart showing the processing procedure of the control unit in Fig. 3.

301: scanner
302: control unit
303: image memory
304: preprocessing logic unit
305: recognition logic unit
306: kanji dictionary file

Claims (1)

[Claims]
1. A kanji recognition dictionary selection method for an optical character recognition device that reads forms written in multiple character types including kanji, characterized in that it uses the correspondence between item names and entry columns on the form, recognizes the item names with a general-purpose kanji dictionary covering a relatively small number of characters, and recognizes each entry column by selectively applying a dedicated kanji dictionary according to the recognition result of the corresponding item name.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58021484A JPS59148983A (en) 1983-02-14 1983-02-14 Method for selecting "kanji" recognizing dictionary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58021484A JPS59148983A (en) 1983-02-14 1983-02-14 Method for selecting "kanji" recognizing dictionary

Publications (1)

Publication Number Publication Date
JPS59148983A true JPS59148983A (en) 1984-08-25

Family

ID=12056244

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58021484A Pending JPS59148983A (en) 1983-02-14 1983-02-14 Method for selecting "kanji" recognizing dictionary

Country Status (1)

Country Link
JP (1) JPS59148983A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6143383A (en) * 1984-08-08 1986-03-01 Fujitsu Ltd Character recognizer
JPS63163586A (en) * 1986-12-25 1988-07-07 Pentel Kk Document recognition system
JPS63195450U (en) * 1987-06-03 1988-12-15
JPH03142694A (en) * 1989-10-30 1991-06-18 Mitsubishi Electric Corp Document reader
JPH0612406A (en) * 1991-04-19 1994-01-21 Pfu Ltd Kanji conversion processing system for kana address notation and kana corporation name notation


Similar Documents

Publication Publication Date Title
US5860075A (en) Document data filing apparatus for generating visual attribute values of document data to be filed
US6035308A (en) System and method of managing document data with linking data recorded on paper media
JPS62267876A (en) Picture register system
US7712028B2 (en) Using annotations for summarizing a document image and itemizing the summary based on similar annotations
EP0568161A1 (en) Interctive desktop system
JPH04321183A (en) Document register method for filing device
JP2006085733A (en) Filing/retrieval device and filing/retrieval method
JPS59148983A (en) Method for selecting "kanji" recognizing dictionary
JP2740335B2 (en) Table reader with automatic cell attribute determination function
US5854860A (en) Image filing apparatus having a character recognition function
JPS60100264A (en) Device for retrieving information
JPH0327471A (en) Picture registration system
JP2904849B2 (en) Character recognition device
JPH09204511A (en) Filing device
Germeraad et al. A computer-based registration system for geological collections
Law The Royal African Company of England's West African Correspondence, 1681-1699
JP2865443B2 (en) Kanji conversion device for Kana name or Kana corporation name
JPH0520300A (en) Document processor
Anderson A COMPARATIVE STUDY OF METHODS OF ARRANGING CHINESE LANGUAGE AUTHOR-TITLECATALOGS IN LARGE AMERICAN CHINESE LANGUAGE COLLECTIONS
JPS5968044A (en) List generation processing system
JPS61206090A (en) Character reading device
JPS6319033A (en) Filing system
JPS6326789A (en) Character recognizing device
JPH04130973A (en) Data registering system for electronic filing
Timbie The Life of Stephen of Mar Sabas