JPH10198688A - Fixed form document reader - Google Patents

Fixed form document reader

Info

Publication number
JPH10198688A
Authority
JP
Japan
Prior art keywords
character string
character
additional information
item
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP9002643A
Other languages
Japanese (ja)
Inventor
Yoshikatsu Ito
好克 井藤
Ichiro Nakao
一郎 中尾
Mariko Takenouchi
磨理子 竹之内
Minoru Takakura
穂 高倉
Yoshimoto Yamamoto
喜大 山本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP9002643A
Publication of JPH10198688A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a reader capable of efficiently sorting and retrieving fixed form documents. SOLUTION: A character string area extracting part 203 extracts character string areas from a fixed form document input through a picture input part 201, and a character recognizing part 206 recognizes the character patterns in those areas. An item sorting part 209 then sorts each recognized character string into an item, a business category applying part 210 attaches business category information for sorting to the reading result, and the character strings of the items sorted by the item sorting part 209 are registered in a business card database 212 as the fixed form document reading result together with the business category information. The read fixed form documents can thereby be sorted and retrieved efficiently.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Technical Field of the Invention: The present invention relates to a fixed form document reader that reads images of documents such as business cards.

[0002]

Prior Art: A known technique for reading fixed form documents is the one described in Japanese Patent Application Laid-Open No. 2-240780.

[0003] According to that publication, to read a fixed form document, character string regions are first extracted from the image of the document and recognized character strings are output. The recognized character strings are then classified into items, mainly by using a word dictionary. Finally, the recognized and item-classified character strings are stored in a fixed form document database.

[0004]

Problems to Be Solved by the Invention: In the conventional technique described above, however, a fixed form document registered in the fixed form document database after reading carries no information for classification other than the character codes of the character strings registered for each item, which makes classification and retrieval inconvenient.

[0005] In addition, entering additional information for classifying fixed form documents is very cumbersome.

[0006] The accuracy of the additional information for classifying fixed form documents is also low.

[0007] Furthermore, obtaining the additional information for classifying fixed form documents wastes storage capacity and search time.

[0008] In view of the above problems, an object of the present invention is to provide a fixed form document reader that attaches to the reading result additional information for classifying fixed form documents efficiently.

[0009]

Means for Solving the Problems: To solve the above problems, the present invention has the following configurations.

[0010] Claim 1 provides: character recognition means for extracting character string regions from an image of a fixed form document and outputting recognized character strings; item classification means for assigning an item to each recognized character string output by the character recognition means; additional information assigning means for assigning additional information for classifying the fixed form document; and registration means for registering, item by item, the recognized character strings output by the character recognition means together with the additional information assigned by the additional information assigning means.

[0011] Claim 2 provides: character recognition means for extracting character string regions from an image of a fixed form document and outputting recognized character strings; item classification means for assigning an item to each recognized character string output by the character recognition means; a word dictionary storing words to which additional information for classifying fixed form documents is attached; word collation means for collating the recognized character strings output by the character recognition means against the word dictionary and extracting matching words; additional information determination means for determining, from the additional information attached to the words extracted by the word collation means, the additional information to be assigned to the fixed form document; and registration means for registering, item by item, the recognized character strings output by the character recognition means together with the additional information determined by the additional information determination means.

[0012] Claim 3 provides: a fixed form document database that records, item by item, the character strings written on fixed form documents together with additional information for classifying those documents; character recognition means for extracting character string regions from a fixed form document input as an image and outputting recognized character strings; item classification means for assigning an item to each recognized character string output by the character recognition means; character string collation means for collating the recognized character strings output by the character recognition means against the character strings recorded in the fixed form document database and extracting the additional information recorded together with any matching character string; additional information determination means for determining, from the additional information extracted by the character string collation means, the additional information to be assigned to the fixed form document input as the image; and registration means for registering in the fixed form document database, item by item, the recognized character strings output by the character recognition means together with the additional information determined by the additional information determination means.
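As an illustration only, the collation against previously registered character strings described for claim 3 can be sketched as follows. The record layout and the function names `lookup_additional_info` and `register` are assumptions made for this sketch and do not appear in the specification; the sample values are borrowed from the embodiment described later.

```python
def lookup_additional_info(database, item, text):
    """Return the additional information stored with a registered string of the
    given item that equals `text`, or None if no registered string matches."""
    for record in database:
        if record["strings"].get(item) == text:
            return record["additional_info"]
    return None


def register(database, strings, additional_info):
    """Register the item-classified strings together with the additional information."""
    database.append({"strings": dict(strings), "additional_info": additional_info})


# A previously registered card (hypothetical record layout).
database = []
register(database, {"company": "門真大学"}, "educational institution")

# A newly read card whose company string matches the registered one.
new_card = {"company": "門真大学", "title": "教務", "name": "山田太郎"}
info = lookup_additional_info(database, "company", new_card["company"])
register(database, new_card, info)
print(info)
```

In this sketch a plain equality test stands in for the character string collation means; a real reader would presumably compare against recognition candidates rather than a single string.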

[0013] Claim 4 provides: character recognition means for extracting character string regions from an image of a fixed form document and outputting recognized character strings; a word dictionary storing words to which are attached both additional information for classifying fixed form documents and item classification information for assigning items to character strings; word collation means for collating the recognized character strings output by the character recognition means against the word dictionary and extracting matching words; additional information determination means for determining, from the additional information attached to the words extracted by the word collation means, the additional information to be assigned to the fixed form document; item classification means for assigning an item to each recognized character string output by the character recognition means, based on the item classification information attached to the words extracted by the word collation means; and registration means for registering, item by item, the recognized character strings output by the character recognition means together with the additional information determined by the additional information determination means.

[0014]

Embodiments of the Invention: An embodiment of the present invention is described below with reference to FIGS. 1 to 3.

[0015] FIG. 1 is a configuration diagram of a device for reading business cards. In the drawing, 201 is an image input unit that inputs a business card image; 202 is an image memory that holds the business card image input by the image input unit 201; 203 is a character string region extraction unit that extracts character string regions from the business card image held in the image memory 202; 204 is a character cutout unit that cuts out character patterns from the character string regions extracted by the character string region extraction unit 203; 205 is a recognition dictionary for character recognition; 206 is a character recognition unit that recognizes the characters cut out by the character cutout unit 204 by collating them against the recognition dictionary 205; 207 is a recognized character memory (FIG. 3) that holds the characters recognized by the character recognition unit 206 in units of character strings; 208 is a word dictionary (FIG. 4) in which words, item classification information, and business category information are recorded; 209 is an item classification unit that collates the recognized character strings against the words recorded in the word dictionary 208 and classifies them into items; 210 is a business category information assigning unit that assigns business category information to the reading result of the business card image, based on the business category information attached to the registered reading result to which a character string of a specific item recorded in the business card database 212 belongs and on the business category information corresponding to the words that the item classification unit 209 matched against the word dictionary 208; 211 is a business card database registration unit that registers the reading result of the business card image in the business card database together with the classification result of the item classification unit 209 and the business category information assigned by the business category information assigning unit 210; 212 is a business card database (FIG. 5) in which the business card database registration unit 211 records one or more business card image reading results; and 213 is a user interface (FIGS. 6, 7, and 8) through which the contents registered by the business card database registration unit 211 are confirmed, corrected, and displayed by classification.
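The two data holders named above, the word dictionary 208 and the recognized character memory 207, can be pictured with a minimal sketch. This is not the layout of FIG. 3 or FIG. 4, which are not reproduced here: the class names, the field names, the 0/1 flag tuple, and the category "manufacturing" are assumptions chosen only to show how a word can carry both an item and business category flags, and how ranked candidate characters can be stored per character string region.

```python
from dataclasses import dataclass

# Order of categories is an assumption; only "educational institution"
# is named in the text, "manufacturing" is a hypothetical second category.
CATEGORIES = ("educational institution", "manufacturing")


@dataclass
class WordEntry:
    """One entry of the word dictionary 208 (assumed layout)."""
    word: str
    item: str               # item classification information, e.g. "company"
    category_flags: tuple   # one 0/1 flag per entry of CATEGORIES

    def categories(self):
        """Names of the business categories whose flag is set to 1."""
        return [name for name, flag in zip(CATEGORIES, self.category_flags) if flag == 1]


@dataclass
class RecognizedString:
    """One character string region in the recognized character memory 207 (assumed layout)."""
    region: str
    candidates: list        # per character: candidate characters, best first

    def first_rank(self):
        """String built from the first-rank candidate of every character."""
        return "".join(ranked[0] for ranked in self.candidates)


w3 = WordEntry(word="大学", item="company", category_flags=(1, 0))
l1 = RecognizedString(region="L1", candidates=[["門"], ["真"], ["大", "太"], ["学"]])
print(w3.categories())
print(l1.first_rank())
```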

[0016] The operation of the business card reader configured as above when the business card image P1 shown in FIG. 2 is input is described below.

[0017] The image input unit 201 inputs the business card image P1, which is held in the image memory 202. The character string region extraction unit 203 extracts the character string regions L1, L2, and L3 from the business card image P1 held in the image memory 202. The character cutout unit 204 cuts out the character patterns L1C1, L1C2, ..., L1C4, L2C1, ..., L2C4, L3C1, ..., L3C4 from the respective character string regions. The character recognition unit 206 collates each cut-out character pattern against the recognition dictionary 205 to obtain candidate characters, and holds them in the recognized character memory 207 as shown in FIG. 2.

The item classification unit 209 collates the recognition results held in the recognized character memory 207, one character string region at a time, against the word dictionary 208 shown in FIG. 4 and classifies them into items. The item classification operation is described for L1, L2, and L3 in turn. When the recognition result of L1 is collated sequentially against the words in the word dictionary 208, the first-rank candidate character 「大」 of L1C3 and the first-rank candidate character 「学」 of L1C4 match the word 「大学」 (university) of W3, which yields the item classification information "company" of W3 and the business category "educational institution", for which 1 is registered in the business category information of W3. When the recognition result of L2 is collated in the same way, the second-rank candidate character 「教」 of L2C1 and the first-rank candidate character 「務」 of L2C2 match the word 「教務」 (academic affairs) of W5, which yields the item classification information "title" of W5 and the business category "educational institution", for which 1 is registered in the business category information of W5. When the recognition result of L3 is collated, the first-rank candidate character 「山」 of L3C1 and the first-rank candidate character 「田」 of L3C2 match the word 「山田」 (Yamada) of W6, and the second-rank candidate character 「太」 of L3C3 and the first-rank candidate character 「郎」 of L3C4 match the word 「太郎」 (Taro) of W7, which yields the item classification information "name" common to W6 and W7.

The business category information assigning unit 210 counts the business category information obtained by the word collation performed in the item classification unit 209 and extracts the most frequent one, "educational institution". In addition, the character string 「門真大学」 (Kadoma University) in the company item of the already registered data D1 in the business card database 212 shown in FIG. 5 matches the first-rank candidate characters 「門」 of L1C1, 「真」 of L1C2, 「大」 of L1C3, and 「学」 of L1C4, and the business category information of D1 is "educational institution". The unit therefore assigns the business category information "educational institution" to the reading result of the business card image P1.

The business card database registration unit 211 builds the read character strings 「門真大学」, 「教務」, and 「山田太郎」 from the recognition results held in the recognized character memory 207, using the first-rank candidate characters of the character patterns that the item classification unit 209 did not match and the words W3, W5, W6, and W7 that it did match. It outputs these strings to the UI unit 213 (FIG. 6) together with the item classification results obtained by the item classification unit 209 ("company", "title", and "name", in that order) and the business category information "educational institution" assigned by the business category information assigning unit 210. In accordance with the input from the UI unit, it then registers the built character strings, the item classification results, and the business category information in the portion D26 of the business card database 212 shown in FIG. 5.

The UI unit 213 displays the output of the business card database registration unit (701), returns the user's confirmation or correction to the business card database registration unit, and outputs the registered contents of the business card database using a specified classification method. FIG. 7 shows the display when company name is specified as the classification method for the business card database 212; classification uses the registered contents of the company item of the business card database 212. FIG. 8 shows the display when business category is specified as the classification method; classification uses the business category information of the business card database 212.
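The collation and counting steps walked through above can be approximated by the following sketch. It is a simplified reading of the text, not the patented implementation: the dictionary contents beyond W3, W5, W6, and W7, the lower-ranked or non-matching candidate characters, and all function names are assumptions, and each region is reduced to the characters needed for the example.

```python
from collections import Counter

# word -> (item classification information, business categories flagged with 1).
# W3, W5, W6, W7 follow the text; giving the name entries no category is an assumption.
WORD_DICT = {
    "大学": ("company", ["educational institution"]),
    "教務": ("title", ["educational institution"]),
    "山田": ("name", []),
    "太郎": ("name", []),
}

# Ranked candidate characters per character pattern (best first).
# Candidates that the text does not mention are invented for illustration.
LINES = {
    "L1": [["門"], ["真"], ["大", "太"], ["学"]],
    "L2": [["救", "教"], ["務"]],
    "L3": [["山"], ["田"], ["大", "太"], ["郎"]],
}


def match_words(candidates):
    """Find dictionary words spelled by candidates of any rank at consecutive positions."""
    hits = []
    for start in range(len(candidates)):
        for word in WORD_DICT:
            span = candidates[start:start + len(word)]
            if len(span) == len(word) and all(ch in ranked for ch, ranked in zip(word, span)):
                hits.append((start, word))
    return hits


def read_line(candidates):
    """Build the read string: matched words replace the first-rank candidates."""
    chars = [ranked[0] for ranked in candidates]
    for start, word in match_words(candidates):
        chars[start:start + len(word)] = list(word)
    return "".join(chars)


def vote_category(lines):
    """Count the categories of all matched words and return the most frequent one."""
    votes = Counter()
    for candidates in lines.values():
        for _, word in match_words(candidates):
            votes.update(WORD_DICT[word][1])
    return votes.most_common(1)[0][0] if votes else None


for region, candidates in LINES.items():
    items = sorted({WORD_DICT[word][0] for _, word in match_words(candidates)})
    print(region, read_line(candidates), items)
print(vote_category(LINES))
```

Matching against candidates of any rank is what lets the second-rank characters 「教」 and 「太」 correct the first-rank misreadings, while unmatched positions such as 「門」 and 「真」 fall back to their first-rank candidates.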

[0018] As is clear from the above description, in reading a business card image, the present invention adds a business category attribute to the word dictionary used for item classification, so that business category information is attached to the reading result with a small number of collations. This enables the business card database to be displayed in classifications other than character code order, which makes it easier for the user to search the database. Moreover, collating against the company names and similar strings in previously registered reading results and retrieving the business category information assigned in the past improves the reliability of the assigned business category information and allows reliable classification.
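A classification display other than character code order amounts to grouping the registered records by a chosen field. The sketch below shows this under stated assumptions: the record fields, the function name `group_by`, and the entries "Person B", "Person C", and "Company X" are hypothetical and are not taken from FIG. 5.

```python
from collections import defaultdict

# Hypothetical database contents; only the first record echoes the embodiment.
RECORDS = [
    {"name": "山田太郎", "company": "門真大学", "category": "educational institution"},
    {"name": "Person B", "company": "Company X", "category": "manufacturing"},
    {"name": "Person C", "company": "門真大学", "category": "educational institution"},
]


def group_by(records, key):
    """Group the names of registered cards by the value of the given field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["name"])
    return dict(groups)


print(group_by(RECORDS, "category"))
print(group_by(RECORDS, "company"))
```

Grouping by `"company"` uses a registered item string, whereas grouping by `"category"` is only possible because the additional information was stored with each reading result.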

[0019] The present invention is not limited to business cards and can be applied to fixed form documents in general. Likewise, the additional information to be assigned is not limited to business category and may be other information such as region.

[0020]

Effects of the Invention: As described above, the present invention provides the following advantageous effects.

[0021] With the configuration of claim 1, attaching additional information for classification to a fixed form document gives the user a means of classification other than the character code order of the read strings, so that read fixed form documents can be classified and retrieved efficiently.

[0022] With the configuration of claim 2, a word dictionary is provided and collated against the reading result so that additional information is assigned automatically, which reduces the work the user must do to supply additional information.

[0023] With the configuration of claim 3, referring to past reading results in the fixed form document database increases the reliability of the assigned additional information, which reduces the work the user must spend correcting it.

[0024] With the configuration of claim 4, the word dictionary used for classifying the items of a fixed form document and the word dictionary used for assigning additional information are one and the same, which reduces the storage capacity used by the device and the time required for word collation.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of a business card reader according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of a business card image.

FIG. 3 is a diagram showing an example of a character recognition result for a business card image.

FIG. 4 is a configuration diagram of a word dictionary that includes item classification information and business category information.

FIG. 5 is a configuration diagram of a business card database.

FIG. 6 is a diagram showing an example of a reading result confirmation screen.

FIG. 7 is a diagram showing an example of a classification display of the business card database by business category.

FIG. 8 is a diagram showing an example of a classification display of the business card database by company name.

[Explanation of Reference Numerals]

201 Image input unit
202 Image memory
203 Character string region extraction unit
204 Character cutout unit
205 Recognition dictionary
206 Character recognition unit
207 Recognized character memory
208 Word dictionary
209 Item classification unit
210 Business category information assigning unit
211 Business card database registration unit
212 Business card database
213 UI unit
701 Reading result confirmation display

Continued from the front page
(72) Inventor: Minoru Takakura, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Yoshimoto Yamamoto, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka

Claims (4)

[Claims]
1. A fixed form document reader comprising: character recognition means for extracting character string regions from an image of a fixed form document and outputting recognized character strings; item classification means for assigning an item to each recognized character string output by the character recognition means; additional information assigning means for assigning additional information for classifying the fixed form document; and registration means for registering, item by item, the recognized character strings output by the character recognition means together with the additional information assigned by the additional information assigning means.
2. A fixed form document reader comprising: character recognition means for extracting character string regions from an image of a fixed form document and outputting recognized character strings; item classification means for assigning an item to each recognized character string output by the character recognition means; a word dictionary storing words to which additional information for classifying fixed form documents is attached; word collation means for collating the recognized character strings output by the character recognition means against the word dictionary and extracting matching words; additional information determination means for determining, from the additional information attached to the words extracted by the word collation means, the additional information to be assigned to the fixed form document; and registration means for registering, item by item, the recognized character strings output by the character recognition means together with the additional information determined by the additional information determination means.
3. A fixed form document reader comprising: a fixed form document database that records, item by item, the character strings written on fixed form documents together with additional information for classifying those documents; character recognition means for extracting character string regions from a fixed form document input as an image and outputting recognized character strings; item classification means for assigning an item to each recognized character string output by the character recognition means; character string collation means for collating the recognized character strings output by the character recognition means against the character strings recorded in the fixed form document database and extracting the additional information recorded together with any matching character string; additional information determination means for determining, from the additional information extracted by the character string collation means, the additional information to be assigned to the fixed form document input as the image; and registration means for registering in the fixed form document database, item by item, the recognized character strings output by the character recognition means together with the additional information determined by the additional information determination means.
4. A fixed form document reader comprising: character recognition means for extracting character string regions from an image of a fixed form document and outputting recognized character strings; a word dictionary storing words to which are attached both additional information for classifying fixed form documents and item classification information for assigning items to character strings; word collation means for collating the recognized character strings output by the character recognition means against the word dictionary and extracting matching words; additional information determination means for determining, from the additional information attached to the words extracted by the word collation means, the additional information to be assigned to the fixed form document; item classification means for assigning an item to each recognized character string output by the character recognition means, based on the item classification information attached to the words extracted by the word collation means; and registration means for registering, item by item, the recognized character strings output by the character recognition means together with the additional information determined by the additional information determination means.
JP9002643A 1997-01-10 1997-01-10 Fixed form document reader Pending JPH10198688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP9002643A JPH10198688A (en) 1997-01-10 1997-01-10 Fixed form document reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP9002643A JPH10198688A (en) 1997-01-10 1997-01-10 Fixed form document reader

Publications (1)

Publication Number Publication Date
JPH10198688A true JPH10198688A (en) 1998-07-31

Family

ID=11535057

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9002643A Pending JPH10198688A (en) 1997-01-10 1997-01-10 Fixed form document reader

Country Status (1)

Country Link
JP (1) JPH10198688A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000200280A (en) * 1999-01-05 2000-07-18 Nec Software Kobe Ltd Device and method for automatic generation of organization constitution information
JP2001147990A (en) * 1999-11-24 2001-05-29 Sharp Corp Device and method for processing image data and storage medium to be utilized therefor
JP2006350964A (en) * 2005-06-20 2006-12-28 Sharp Corp Character recognition device, character recognition method, data conversion device, data conversion method, character recognition program, data conversion program, and computer readable recording medium recording character recognition program and data conversion program
CN100465945C (en) * 2004-03-24 2009-03-04 微软公司 Method and apparatus for populating electronic forms from scanned documents
US7814043B2 (en) 2001-11-26 2010-10-12 Fujitsu Limited Content information analyzing method and apparatus

Similar Documents

Publication Publication Date Title
JPH04321183A (en) Document register method for filing device
US20060045340A1 (en) Character recognition apparatus and character recognition method
KR870011552A (en) Document registration method
JP2006085733A (en) Filing/retrieval device and filing/retrieval method
JPH11282955A (en) Character recognition device, its method and computer readable storage medium recording program for computer to execute the method
JPS5947641A (en) Producer of visiting card data base
JPH10198688A (en) Fixed form document reader
JP2002342343A (en) Document managing system
JPH08263587A (en) Method and device for document input
JP4054453B2 (en) Character recognition device and program recording medium
JPH0744573A (en) Electronic filling device
JP3727422B2 (en) Character recognition apparatus and method
JP2000090192A (en) Character string correcting method for address and zip code
KR100544375B1 (en) Extractor and method for extracting card information of the document file, and computer readable medium thereof
JP2588261B2 (en) Address database search device by OCR
JPH10302025A (en) Handwritten character recognizing device and its program recording medium
JP2549745B2 (en) Document search device
JPH06203083A (en) Electronic filing device
JP4769379B2 (en) Document search device
JP3007697B2 (en) Word matching device and word matching method
JP2904849B2 (en) Character recognition device
JP2000029877A (en) Method and device for analyzing document structure and storage medium storing document structure analyzing program
JPH0922442A (en) Electronic management system for image document data
JPH0520505A (en) Character recognizing device
JPH02151984A (en) Image recognizing system