JPS59148983A - Method for selecting "kanji" recognizing dictionary - Google Patents
Info
- Publication number
- JPS59148983A
- Authority
- JP
- Japan
- Prior art keywords
- dictionary
- kanji
- recognition
- column
- line
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Discrimination (AREA)
- Document Processing Apparatus (AREA)
Description
[Detailed Description of the Invention]
[Field of the Invention]
The present invention relates to an optical character reader (hereinafter "OCR") that automatically reads characters on a form, and in particular to a method of selecting recognition dictionaries that is suited to reading Japanese text containing kanji.
Conventional OCRs that read alphanumeric characters, symbols, and kana handle only about 100 character types, so there was no need to consider characters outside the supported set (so-called external characters, or gaiji). When the target is extended to kanji and hiragana, however, 2,000 to 3,000 character types are normally required, and further external characters must be considered depending on the application.
For example, in the slip shown in Fig. 1, the product name field in row 1 contains kanji derived from proprietary names, the address field in row 2 contains kanji specific to place names, and the name field in row 3 contains kanji specific to personal names. One conventional approach simply performs recognition with a common dictionary in which all necessary external characters have been registered in advance. Its drawback is that the set of candidate characters becomes unnecessarily large for any given field, which lowers recognition accuracy.
Another conventional approach specifies the format of the slip in detail so that, for example, row 1, column B of Fig. 1 is matched against a product proper-noun dictionary, row 2, column B against a place-name dictionary, and row 3, column B against a personal-name dictionary at the recognition stage. The dictionary is then optimal for each field, but when there are many slip formats, or when a format changes, a format specification has to be supplied to the OCR each time, which is extremely cumbersome.
[Object of the Invention]
The object of the present invention is to overcome the drawbacks of the conventional approaches described above and to provide a kanji recognition method with high recognition accuracy and excellent operability.
The invention is characterized in that it exploits the correspondence between item names and entry fields in a form. Item names are recognized with a highly general-purpose kanji dictionary containing relatively few character types, and each entry field is recognized with a dedicated kanji dictionary that is selected according to the recognition result of the corresponding item name.
In the usual form layout, as shown in Fig. 1, an item name (column A) describing the content of each entry field (column B) is placed next to that field. Column A can therefore be read in a first step, and the dictionary needed to recognize the corresponding column B can be selected and applied on the basis of that recognition result. In addition, the characters written in column A are generally very common ones, so that, for example, an education kanji (kyoiku kanji) dictionary alone is sufficient.
Variety in form formats can also be accommodated. For example, it is enough to specify in advance only that, as shown in Fig. 1, each row consists of an item-name field (A) and an entry field (B), and that the form is made up of several such rows; the ruled lines in the form are then recognized to locate the fields. Detecting frames in a document is readily achievable with known techniques, as described, for example, in the following reference.
Sakurai, "Detection of Frames in Documents," 25th National Convention of the Information Processing Society of Japan.
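The row structure assumed above (an item-name cell A followed by an entry cell B, repeated over several rows) can be turned into cell rectangles once the ruled lines are known. The following is a minimal sketch, not the patent's implementation: the function name `cells_from_ruled_lines`, the pixel coordinate convention, and the assumption of exactly three vertical rules are all illustrative, and ruled-line detection itself is taken as already done.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def cells_from_ruled_lines(horizontal_ys: List[int],
                           vertical_xs: List[int]) -> List[Dict[str, Box]]:
    """Split a ruled table into rows of an item-name cell (A) and an entry cell (B).

    Assumes (hypothetically) that line detection has already produced the
    y-coordinates of the horizontal rules and the x-coordinates of exactly
    three vertical rules: left edge, A/B divider, right edge.
    """
    ys = sorted(horizontal_ys)
    xs = sorted(vertical_xs)
    if len(xs) != 3:
        raise ValueError("expected three vertical rules: left, divider, right")
    if len(ys) < 2:
        raise ValueError("need at least two horizontal rules to form a row")

    rows = []
    # Each pair of neighbouring horizontal rules bounds one row of the form.
    for top, bottom in zip(ys, ys[1:]):
        rows.append({
            "A": (xs[0], top, xs[1], bottom),  # item-name field
            "B": (xs[1], top, xs[2], bottom),  # entry field
        })
    return rows


# A three-row form like Fig. 1: four horizontal rules, three vertical rules.
rows = cells_from_ruled_lines([0, 40, 80, 120], [0, 100, 400])
print(len(rows))     # 3
print(rows[1]["B"])  # (100, 40, 400, 80)
```

Because only the A/B row pattern is fixed, a form with more or fewer rows needs no new format definition in this sketch; the number of rows follows from the detected horizontal rules.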
Variety in item names and their corresponding dictionaries is handled by providing a correspondence table; new cases are supported simply by adding entries to this table. Fig. 2 shows an example of the correspondence table.
[Embodiment of the Invention]
An embodiment of the invention is described below with reference to Fig. 3. The scanner 301 reads the image on the form sequentially, and the image is stored in the image memory 303 via the control unit 302. Under the control of the control unit 302, the preprocessing logic unit 304 reads out the contents of the image memory 303 one after another, extracts the ruled lines of the table, and cuts out the image inside columns A and B of each row.
The control unit 302 first passes the image inside row 1, column A to the recognition logic unit 305, together with the education kanji dictionary, which it selects from the dictionary file 306 as the designated dictionary. Taking the form of Fig. 1 as an example, the recognition unit 305 returns the recognition result "product" (商品) to the control unit 302. Based on this result, the control unit 302 then consults the correspondence table shown in Fig. 2, selects the proper-noun kanji dictionary from the dictionary file 306 as the designated dictionary for row 1, column B, and passes it to the recognition unit 305 together with the image inside row 1, column B. The recognition unit 305 recognizes the contents of row 1, column B using the designated kanji dictionary and returns the result to the control unit 302.

Row 2, column A and row 3, column A are handled in the same way and are recognized as "address" (住所) and "name" (氏名). The control unit 302 selects the place-name kanji dictionary and the personal-name kanji dictionary, respectively, from the correspondence table of Fig. 2 and passes each to the recognition unit 305 together with the corresponding image. The recognition unit 305 recognizes the contents of the corresponding column B using the designated kanji dictionary and returns the result to the control unit 302.
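The correspondence table consulted by the control unit can be pictured as a plain mapping from recognized item names to dictionary identifiers. This is only an illustrative sketch: the identifiers, the function name `select_dictionary`, the extra "会社名" entry, and the fallback to the general dictionary for unknown item names are assumptions and are not described in the patent.

```python
# Dictionary identifiers are hypothetical; the patent names the dictionaries
# (education kanji, proper noun, place name, personal name) but no identifiers.
GENERAL_DICTIONARY = "kyoiku_kanji"

correspondence_table = {
    "商品": "proper_noun_kanji",    # product -> proper-noun kanji dictionary
    "住所": "place_name_kanji",     # address -> place-name kanji dictionary
    "氏名": "personal_name_kanji",  # name    -> personal-name kanji dictionary
}


def select_dictionary(item_name, table, fallback=GENERAL_DICTIONARY):
    """Return the dictionary to apply to the entry field next to `item_name`.

    Falling back to the general dictionary for unknown item names is an
    assumption of this sketch; the patent does not describe that case.
    """
    return table.get(item_name, fallback)


print(select_dictionary("住所", correspondence_table))  # place_name_kanji

# Supporting a new item name only requires adding a table entry.
correspondence_table["会社名"] = "corporate_name_kanji"  # hypothetical entry
print(select_dictionary("会社名", correspondence_table))  # corporate_name_kanji
```

Keeping the mapping as data rather than as per-format rules mirrors the point made in the description that new cases are handled by adding table entries.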
The sequence of processing steps performed by the control unit as described above is shown as a flowchart in Fig. 4.
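The control sequence can also be sketched in code. The example below is a toy model under stated assumptions, not the flowchart of Fig. 4 itself: "images" are plain strings, `toy_recognize` stands in for the recognition logic unit 305, `DICTIONARY_FILE` stands in for the dictionary file 306 with made-up contents, and all names and the fallback rule are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy stand-in for the kanji dictionary file 306: each dictionary is just the
# set of characters it can recognize. Contents are illustrative only.
DICTIONARY_FILE: Dict[str, Set[str]] = {
    "kyoiku_kanji": set("商品住所氏名"),
    "proper_noun_kanji": set("醤油"),
    "place_name_kanji": set("札幌市"),
    "personal_name_kanji": set("齋藤"),
}

# Toy stand-in for the correspondence table of Fig. 2.
TABLE: Dict[str, str] = {
    "商品": "proper_noun_kanji",
    "住所": "place_name_kanji",
    "氏名": "personal_name_kanji",
}


def toy_recognize(image: str, dictionary_id: str) -> str:
    """Stand-in for the recognition logic unit 305.

    The "image" is a string; characters missing from the chosen dictionary
    come back as '?', mimicking a rejected character.
    """
    charset = DICTIONARY_FILE[dictionary_id]
    return "".join(ch if ch in charset else "?" for ch in image)


def read_form(rows: List[Tuple[str, str]],
              recognize: Callable[[str, str], str],
              table: Dict[str, str],
              general: str = "kyoiku_kanji") -> List[Tuple[str, str, str]]:
    """Process each row as (column A image, column B image)."""
    results = []
    for image_a, image_b in rows:
        # Step 1: read the item name with the general-purpose dictionary.
        item_name = recognize(image_a, general)
        # Step 2: look up the dedicated dictionary (fallback is an assumption).
        dictionary_id = table.get(item_name, general)
        # Step 3: read the entry field with the selected dictionary.
        content = recognize(image_b, dictionary_id)
        results.append((item_name, dictionary_id, content))
    return results


form = [("商品", "醤油"), ("住所", "札幌市"), ("氏名", "齋藤")]
for item_name, dictionary_id, content in read_form(form, toy_recognize, TABLE):
    print(item_name, dictionary_id, content)
```

In this toy model, reading "齋藤" with the general dictionary alone would yield "??", whereas selecting the personal-name dictionary via the item name recovers it, which is the effect the two-stage procedure aims for.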
[Effects of the Invention]
According to the present invention, even for a wide variety of form formats, the optimal kanji dictionary for each item field on the form is applied without supplying cumbersome format specifications, so that a wide variety of kanji characters can be read with high accuracy.
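Fig. 1 shows an example of a form containing kanji. Fig. 2 shows a table of the correspondence between item names and the kanji dictionaries applied to the corresponding fields. Fig. 3 is a block diagram of one embodiment of the invention. Fig. 4 is a flowchart showing the processing procedure of the control unit in Fig. 3.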
301: scanner; 302: control unit; 303: image memory; 304: preprocessing logic unit; 305: recognition logic unit; 306: kanji dictionary file.
[Figs. 1 to 4: drawings not reproduced]
Claims (1)
1. A method of selecting kanji recognition dictionaries in an optical character reader that handles forms written in multiple character types including kanji, characterized in that the correspondence between item names and entry fields in the form is used, a highly general-purpose kanji dictionary with relatively few character types is used to recognize the item names, and a dedicated kanji dictionary is selectively applied to recognize each entry field on the basis of the recognition result of the corresponding item name.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58021484A JPS59148983A (en) | 1983-02-14 | 1983-02-14 | Method for selecting "kanji" recognizing dictionary |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58021484A JPS59148983A (en) | 1983-02-14 | 1983-02-14 | Method for selecting "kanji" recognizing dictionary |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS59148983A (en) | 1984-08-25 |
Family
ID=12056244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP58021484A Pending JPS59148983A (en) | 1983-02-14 | 1983-02-14 | Method for selecting "kanji" recognizing dictionary |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS59148983A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6143383A (en) * | 1984-08-08 | 1986-03-01 | Fujitsu Ltd | Character recognizer |
JPS63163586A (en) * | 1986-12-25 | 1988-07-07 | Pentel Kk | Document recognition system |
JPS63195450U (en) * | 1987-06-03 | 1988-12-15 | ||
JPH03142694A (en) * | 1989-10-30 | 1991-06-18 | Mitsubishi Electric Corp | Document reader |
JPH0612406A (en) * | 1991-04-19 | 1994-01-21 | Pfu Ltd | Kanji conversion processing system for kana address notation and kana corporation name notation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5860075A (en) | Document data filing apparatus for generating visual attribute values of document data to be filed | |
US6035308A (en) | System and method of managing document data with linking data recorded on paper media | |
JPS62267876A (en) | Picture register system | |
US7712028B2 (en) | Using annotations for summarizing a document image and itemizing the summary based on similar annotations | |
EP0568161A1 (en) | Interctive desktop system | |
JPH04321183A (en) | Document register method for filing device | |
JP2006085733A (en) | Filing/retrieval device and filing/retrieval method | |
JPS59148983A (en) | Method for selecting "kanji" recognizing dictionary | |
JP2740335B2 (en) | Table reader with automatic cell attribute determination function | |
US5854860A (en) | Image filing apparatus having a character recognition function | |
JPS60100264A (en) | Device for retrieving information | |
JPH0327471A (en) | Picture registration system | |
JP2904849B2 (en) | Character recognition device | |
JPH09204511A (en) | Filing device | |
Germeraad et al. | A computer-based registration system for geological collections | |
Law | The Royal African Company of England's West African Correspondence, 1681-1699 | |
JP2865443B2 (en) | Kanji conversion device for Kana name or Kana corporation name | |
JPH0520300A (en) | Document processor | |
Anderson | A COMPARATIVE STUDY OF METHODS OF ARRANGING CHINESE LANGUAGE AUTHOR-TITLECATALOGS IN LARGE AMERICAN CHINESE LANGUAGE COLLECTIONS | |
JPS5968044A (en) | List generation processing system | |
JPS61206090A (en) | Character reading device | |
JPS6319033A (en) | Filing system | |
JPS6326789A (en) | Character recognizing device | |
JPH04130973A (en) | Data registering system for electronic filing | |
Timbie | The Life of Stephen of Mar Sabas |