JP2001318941A - Information processor and its method

Info

Publication number: JP2001318941A
Application number: JP2000135684A
Authority: JP
Inventors: Toshio Niwa; 寿男丹羽; Kazuhiro Kayashima; 一弘萱嶋; Keiji Ogawa; 啓司小川
Original assignee: Matsushita Refrigeration Co; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2000-05-09
Filing date: 2000-05-09
Publication date: 2001-11-16

Abstract

PROBLEM TO BE SOLVED: To automatically execute document processing suited to an input document by identifying the document type, without the user having to specify a processing method. SOLUTION: A document feature extraction unit 22 extracts features of a document from its document image, and a document type identification unit 23 identifies the document type by comparing the extracted features with document type feature information read from a document feature database 24. On receiving the identified document type, a processing selection unit 25 reads the processing method suited to the document from a processing selection database 26, and the prescribed document processing is executed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an information processing apparatus and an information processing method, and more particularly to an apparatus and method that read documents such as a "memo", "postcard", "business card", or "receipt", or various other papers and materials, with a scanner or the like and automatically identify the type of each document.

[0002]

[Prior Art] Techniques for processing image information read by a scanner or camera are disclosed in, for example, Japanese Unexamined Patent Publication No. H9-305685 (hereinafter, prior art 1) and Japanese Unexamined Patent Publication No. H10-240901 (hereinafter, prior art 2).

[0003] FIG. 11 shows an example from prior art 1. In FIG. 11, an information processing apparatus 1 includes an image display unit 2 that displays documents and image information; a scanner unit 3 that can input image information presented by the user, such as a "formal receipt", "receipt", "postcard", "business card", or "photograph"; a storage unit 4 that stores information from the scanner unit 3; a shape recognition unit 5 that recognizes the size and shape of the input paper or photograph; and an image classification unit 6 that, based on information from the shape recognition unit 5, classifies the image information into several types by size and shape and has the image display unit 2 display it by class.

[0004] In addition, based on information about image shape and image pattern from a pattern recognition unit 12, the image classification unit 6 collates the image information from the scanner unit 3 or the storage unit 4 with image information from an image type storage unit 13, groups together images that share a common shape or image pattern, and has the image display unit 2 display them by class.

[0005] Furthermore, the image classification unit 6 automatically sorts the data and sends it as image information to an application program 14. The application program 14 is said to designate an application suited to everyday use: for example, [address book] 14a for a "postcard" or "business card", [household account book] 14b for a "formal receipt" or "receipt", and [album] 14c for a "photograph". In FIG. 11, reference numeral 7 denotes a clock unit and 15 denotes an application launching unit.

[0006] Prior art 2 discloses a document filing apparatus and method that automatically classify the items of a document correctly even when the layout varies widely and items are omitted.

[0007] Although prior art 1 presents a technical idea similar to that of the present invention, it does not describe updating the data stored in the document feature database that serves as the reference for identifying the document type.

[0008] Prior art 2, for its part, does not present the technical idea of automatically classifying documents by designating an application suited to everyday use, such as [address book] for a "postcard" or "business card", [household account book] for a "formal receipt" or "receipt", and an album for a "photograph".

[0009]

[Problems to Be Solved by the Invention] The present invention solves these problems of the prior art. Its object is to provide an information processing apparatus and an information processing method that automatically identify the type of an input document and can therefore execute information processing suited to that document without the user specifying a processing method.

[0010]

[Means for Solving the Problems] The invention according to claim 1 is an information processing apparatus comprising: a document image input/output unit that recognizes a document and outputs image information of the document; a document feature extraction unit that extracts features of the document from the image information of the document; a document feature database that stores document feature information for registered document types; a document feature registration unit in which the features of the document are registered on the basis of the document feature information stored in the document feature database, the document feature information extracted by the document feature extraction unit, and the document type; means for causing the document feature registration unit to newly register or update document feature information in the document feature database; a document type identification unit that identifies the document type by collating the document features extracted by the document feature extraction unit with the document feature information stored in the document feature database; a processing selection database that stores a processing method corresponding to each document type; and a processing selection unit that acts on the processing method information stored in the processing selection database and the document type output from the document type identification unit.

[0011] The invention according to claim 1 is explained mainly with reference to FIGS. 1 and 2. It includes a document image input/output unit 21 that recognizes a document 20 and outputs information of its document image; a document feature extraction unit 22 that extracts features of the document from the output document image; a document feature database 24 that stores document feature information for registered document types; a document type identification unit 23 that identifies the document type by collating the document features extracted by the document feature extraction unit 22 with the document feature information stored in the document feature database 24; a processing selection database 26 that stores the processing method corresponding to each document type; and a processing selection unit 25 that retrieves the document processing information stored in the processing selection database 26.

[0012] Downstream of the processing selection unit 25 are processing units 27 to 29, which act on its instruction; for example, when a document is recognized as a "postcard", "add to address book" is executed as the document processing. With this configuration, even a user who does not know in advance what the document processing involves or how to operate it need only input a document. If the document type is "memo", "postcard", or "receipt", the corresponding document processing ("paste to notepad", "add to address book", or "register in household account book", respectively) is executed automatically.

[0013] The invention according to claim 2 is the information processing apparatus according to claim 1, wherein the document type identification unit comprises a document type recognition unit that identifies the document type by collating the document features with the document feature information stored in the document feature database, and a user document type selection unit that asks the user for the document type.

[0014] The invention according to claim 2 is explained with reference to FIGS. 1 to 3. In particular, the document type identification unit 23 shown in FIG. 3 includes a document type recognition unit 31 and a user document type selection unit 32.

[0015] As a result, even if the identification accuracy of the document type identification unit 23 is low at first, candidate document types are presented to the user through the user document type selection unit 32, so the document type is set to the one the user chooses and the accuracy of subsequent document type identification improves.

[0016] The invention according to claim 3 is the information processing apparatus according to claim 1, wherein the document type identification unit includes: a character recognition unit that recognizes characters in the document image output from the document image input/output unit; a document keyword database that stores keyword information contained in documents of the registered document types; a keyword document feature extraction unit that extracts keyword document features from the character recognition result output by the character recognition unit and the keyword information stored in the document keyword database; and a document type recognition unit that identifies the document type by collating the document features extracted by the document feature extraction unit and the keyword document features extracted by the keyword document feature extraction unit with the document feature information stored in the document feature database.

[0017] The invention according to claim 3 is explained with reference to FIGS. 1, 2, and 5. In FIG. 5 in particular, the document type identification unit 23 includes a character recognition unit 41 that recognizes characters in the document image output from the document image input/output unit 21; a document keyword database 43 that stores keyword information contained in documents of the registered document types; a keyword document feature extraction unit 42 that extracts keyword document features from the character recognition result output by the character recognition unit and the keyword information stored in the document keyword database 43; and a document type recognition unit 44 that identifies the document type by collating the document features extracted by the keyword document feature extraction unit 42 with the document feature information stored in the document feature database 24.
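
The keyword feature extraction described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the specification does not say how keyword features are computed, so a simple per-type hit count is used, and the keyword lists, the name `KEYWORD_DB`, and the function `keyword_features` are all hypothetical.

```python
# Hypothetical stand-in for the document keyword database 43.
# The keywords are invented examples, not taken from the specification.
KEYWORD_DB = {
    "receipt": ["total", "tax", "change"],
    "business card": ["tel", "fax", "e-mail"],
}


def keyword_features(ocr_text, keyword_db=KEYWORD_DB):
    """Count, for each document type, how many keyword hits the OCR text contains.

    ocr_text stands for the character recognition result; matching is
    case-insensitive substring counting, which is an assumption of this sketch.
    """
    text = ocr_text.lower()
    return {
        doc_type: sum(text.count(keyword) for keyword in keywords)
        for doc_type, keywords in keyword_db.items()
    }


print(keyword_features("TOTAL 1,200  TAX 60  CHANGE 800"))
```

The resulting counts could then be collated alongside the image-based features; how the two are combined is left open by the text.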

[0018] With the above configuration, when the document type selected by the user and the document type recognized by the document type recognition unit 44 do not match, the document feature database 24 is updated, so the accuracy of document type recognition in subsequent information processing improves further.

[0019] The invention according to claim 4 is the information processing apparatus according to any one of claims 1 to 3, wherein the image input/output unit includes a document image reading unit that reads a document as an image, and a document detection unit that detects that a document has been placed in the document image reading unit for input.

[0020] The invention according to claim 4 is explained with reference to FIGS. 1, 2, and 6. As shown in FIG. 6 in particular, a document image reading unit 51 and a document detection unit 52 are provided in, or alongside, the document image input/output unit 21.

[0021] With this arrangement, when the user feeds a document into the document image reading unit 51, the document detection unit 52 detects that state and information processing is executed automatically at once.
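
One way to picture this trigger is the Python sketch below. It is only an illustration: the specification does not describe how the document detection unit 52 senses a document, so the sketch assumes a boolean sensor that is polled, and fires processing once on each absent-to-present transition. The class `DocumentDetector` and its method names are hypothetical.

```python
class DocumentDetector:
    """Fires a callback once each time a document newly appears in the reader."""

    def __init__(self, on_document):
        self._on_document = on_document  # processing to start automatically
        self._present = False            # sensor state seen at the previous poll

    def poll(self, sensor_present, read_image):
        """Check the sensor; start processing only on an absent -> present change."""
        fired = sensor_present and not self._present
        self._present = sensor_present
        if fired:
            self._on_document(read_image())
        return fired


processed = []
detector = DocumentDetector(processed.append)
for state in (False, True, True, False, True):
    detector.poll(state, lambda: "scanned image")
print(len(processed))
```

Triggering on the transition rather than on the sensor level keeps a document that stays in the reader from being processed repeatedly.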

[0022] The invention according to claim 5 is an information processing method comprising: a step of receiving a document as input and outputting a document image; a step of extracting features of the document from the document image; a step of storing document feature information for registered document types; a step of newly registering and updating document types and document feature information in response to the user's operation input; a step of identifying the document type by collating the document feature information with the extracted document features; a step of storing information on the processing method corresponding to each document type; a step of selecting document processing on the basis of the stored processing method information and the registered document type information; and a step of executing the document processing on the basis of the step of selecting the document processing.

[0023] The invention according to claim 5 is explained with reference to FIGS. 1 and 2. With this information processing method, even a user who does not know in advance what the document processing involves or how to operate it need only input a document for it to be automatically organized and classified into one of several document types.

[0024] The invention according to claim 6 is the information processing method according to claim 5, wherein the step of receiving a document as input and outputting a document image further provides image information to at least one of the step of identifying the document type by collating the document feature information with the extracted document features and the step of selecting the document processing.

[0025] The invention according to claim 6 is explained with reference to FIG. 1. In particular, the document image input/output unit 21, which performs the step of outputting the document image, supplies the document image to the document type identification unit 23, which performs the step of identifying the document type, and to the processing selection unit 25, which performs the step of selecting the document processing.

[0026] This information processing method improves the accuracy of both document type identification and document processing selection.

[0027] The invention according to claim 7 is the information processing method according to claim 5, wherein, in the step of identifying the document type by collating the document feature information with the extracted document features, the user communicates with means for referring to the already registered document types and the evaluation value of the document, or to the document and the confidence of the keywords appearing in it.

[0028] The invention according to claim 7 is explained with reference to FIGS. 1 to 3. In FIG. 3 in particular, the user (not shown) communicates with the document type recognition unit 31 through the user document type selection unit 32, checks the document type identification result, and selects a given document type based on that check.

[0029] With this information processing method, even if the accuracy of document type identification is low in the initial state, candidate document types are presented to the user for selection, so the document can be set to the type the user wants. In addition, the user's selection and instruction update the document feature database 24 and the document keyword database 43, which improves the accuracy of the evaluation values and the confidence.
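
The fallback to the user can be sketched in Python as follows. This is a minimal illustration under assumptions: the specification gives no numeric confidence threshold or candidate format, so the `(type, confidence)` pairs, the `threshold` value, and the function `resolve_document_type` are hypothetical.

```python
def resolve_document_type(candidates, ask_user, threshold=0.8):
    """Pick a document type, asking the user when the recognizer is unsure.

    candidates: list of (document_type, confidence) pairs from the recognizer.
    ask_user:   callable that receives the ranked type names and returns one.
    threshold:  assumed cut-off; the specification gives no numeric value.
    Returns (document_type, user_was_asked).
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    best_type, best_confidence = ranked[0]
    if best_confidence >= threshold:
        return best_type, False
    chosen = ask_user([doc_type for doc_type, _ in ranked])
    return chosen, True


# Stand-in for the user document type selection unit 32: always picks "postcard".
def always_postcard(options):
    return "postcard"


print(resolve_document_type([("memo", 0.45), ("postcard", 0.40)], always_postcard))
print(resolve_document_type([("receipt", 0.93), ("memo", 0.05)], always_postcard))
```

When the second element of the result is true, a caller could register the extracted features under the chosen type, which corresponds to the database update mentioned in the text.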

[0030] The invention according to claim 8 is the information processing method according to claim 5, wherein the step of identifying the document type by collating the document feature information with the extracted document features includes: a character recognition step of recognizing characters in the document image output from the document image input/output unit; a step of extracting, on the basis of the character recognition step, keywords matching the document type from a document keyword database registered in advance; and a document type recognition step of identifying the document type by collating the information from the keyword extraction step with the document feature information stored in the document feature database.

[0031] This information processing method is explained with reference to FIGS. 1, 2, and 5. In FIG. 5 in particular, the method includes a step of extracting keywords matching the document type from the document keyword database 43, and a document type recognition step in which the document type recognition unit 44 identifies the document type by collating the information from the keyword extraction step with the document feature information stored in the document feature database 24.

[0032] With the above information processing method, when the document type selected by the user and the document type recognized by the document type recognition unit 44 do not match, the document feature database 24 is updated, so the accuracy of document type recognition improves further.

[0033]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0034] (Embodiment 1) Embodiment 1 is described below with reference to FIGS. 1 and 2. In the present invention, documents 20 such as a "memo", "postcard", "business card", or "receipt" are the objects of information processing.

[0035] Besides these, the documents 20 also include items that are commonly used and well known in daily life, such as "minutes of various meetings", "photographs", and "books", as well as papers and documents exchanged between particular organizations or individuals.

[0036] Items such as a "memo" may strictly fall outside the concept of a document, but for convenience the present invention refers to all of the above collectively as "documents".

[0037] The document 20 is read at the document image input/output unit 21 by a scanner, camera, or the like, and a document image is output. That is, the document image input/output unit 21 performs the step of receiving the document 20 and outputting a document image of it.

[0038] FIG. 1 shows the document image generated by the document image input/output unit 21 being supplied not only to the document feature extraction unit 22 but also to both the document type identification unit 23 and the processing selection unit 25, described later. These additional paths are not essential elements: supplying the image to the document feature extraction unit 22 alone still achieves the information processing effect of the present invention.

[0039] The document feature extraction unit 22 extracts the features of the document image of the document 20 taken in by the document image input/output unit 21. The features of a document image include the size of the document sheet, the density of black pixels, the bounding-rectangle area of connected black pixels, the distribution and position of black pixels, and the number and position of text lines. The document feature extraction unit 22 thus performs the step of extracting the features of the document image.

[0040] The "bounding-rectangle area of connected black pixels" is the area of the circumscribed rectangle drawn around a group of connected black pixels in the image, treated as a single cluster, so that the rectangle contains all of them.
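
Two of these features can be sketched in Python as follows. This is a minimal illustration, assuming a binary image given as a list of rows with 1 for black and 0 for white, and assuming 8-neighbour connectivity; the specification prescribes neither the image format nor the connectivity rule, and the function names are hypothetical.

```python
from collections import deque


def black_pixel_density(image):
    """Fraction of pixels that are black (1) in a binary image (list of rows)."""
    total = sum(len(row) for row in image)
    black = sum(sum(row) for row in image)
    return black / total if total else 0.0


def connected_bounding_areas(image):
    """Return the bounding-rectangle area of each 8-connected black region."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    areas = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one connected group of black pixels,
            # tracking the extremes that define its circumscribed rectangle.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, bottom, left, right = y, y, x, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            areas.append((bottom - top + 1) * (right - left + 1))
    return areas


page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(black_pixel_density(page))
print(connected_bounding_areas(page))
```

Each returned area counts every pixel inside the rectangle, black or white, which is what distinguishes this feature from a plain black pixel count.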

[0041] The document type identification unit 23 collates the document feature information stored in the document feature database 24 with the document features extracted by the document feature extraction unit 22 and identifies the type of the document input to the document image input/output unit 21, that is, whether it is a "memo", "postcard", "business card", "receipt", or some other document.

[0042] The document type identification unit 23 can also take in the document image input to or output from the document image input/output unit 21 and perform identification with reference to that image. This further improves the accuracy of document type identification.

[0043] That is, the document type identification unit 23 can perform the step of identifying the document type using the document information from the document feature database 24 as first (or second) reference information and the document image information from the document image input/output unit 21 as second (or first) reference information. As noted above, however, feeding document information directly from the document image input/output unit 21 into the document type identification unit 23 is not an essential element.

[0044] The document feature database 24 holds, registered in advance, the document types and the document features that documents of each type possess, and supplies this information. For example, for business cards and receipts, as shown in Table 1, the database registers (a) the document length, (b) the document width, (c) the number of lines, and (d) the black pixel density, and the standard deviation of each is registered and stored along with them.

[0045]

[Table 1]
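
The collation of extracted features against stored means and standard deviations can be sketched in Python as follows. This is an illustration under stated assumptions: the specification does not name a matching metric, so a normalized (z-score) Euclidean distance is swapped in here, and every number in `FEATURE_DB` is invented for the example rather than taken from Table 1. The names `FEATURE_DB`, `distance`, and `identify` are hypothetical.

```python
import math

# Hypothetical database: per document type, (mean, standard deviation) per feature.
# The numbers are invented for illustration; they are not the values of Table 1.
FEATURE_DB = {
    "business card": {
        "length_mm": (55.0, 2.0),
        "width_mm": (91.0, 2.0),
        "line_count": (6.0, 2.0),
        "black_density": (0.05, 0.02),
    },
    "receipt": {
        "length_mm": (150.0, 40.0),
        "width_mm": (58.0, 3.0),
        "line_count": (20.0, 8.0),
        "black_density": (0.10, 0.04),
    },
}


def distance(features, stats):
    """Euclidean distance after scaling each feature by its standard deviation."""
    total = 0.0
    for name, (mean, std) in stats.items():
        z = (features[name] - mean) / std
        total += z * z
    return math.sqrt(total)


def identify(features, db=FEATURE_DB):
    """Return (document_type, distance) pairs, closest type first."""
    scored = sorted((distance(features, stats), doc_type)
                    for doc_type, stats in db.items())
    return [(doc_type, d) for d, doc_type in scored]


scanned = {"length_mm": 54.0, "width_mm": 90.0, "line_count": 5.0, "black_density": 0.04}
ranking = identify(scanned)
print(ranking[0][0])
```

Dividing by the standard deviation lets a loosely constrained feature, such as receipt length, count for less than a tightly constrained one, such as business card width.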

[0046] The document feature database 24 can also register new documents and update existing information. FIG. 2 shows the document feature registration unit 30, which is a distinguishing feature of this embodiment. The document type entered by the user and the document features extracted by the document feature extraction unit 22 are additionally registered in the document feature database 24, or the document feature information for that document type is updated.

[0047] By additionally registering new information and data in the document feature database 24 and updating existing entries in this way, document types can be identified appropriately in subsequent information processing.
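
Registration and update of the stored statistics can be sketched in Python as follows. The specification does not state how the database entries are recomputed, so this sketch assumes a running mean and population standard deviation maintained with Welford's online algorithm; the classes `FeatureStats` and `FeatureDatabase` are hypothetical.

```python
import math


class FeatureStats:
    """Running mean and standard deviation of one feature for one document type."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, value):
        # Welford's update: no need to keep the individual samples.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Population standard deviation (0.0 before any sample is added)."""
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


class FeatureDatabase:
    """Maps document type -> feature name -> FeatureStats."""

    def __init__(self):
        self.types = {}

    def register(self, doc_type, features):
        """Add a new type on first sight, otherwise update its statistics."""
        stats = self.types.setdefault(doc_type, {})
        for name, value in features.items():
            stats.setdefault(name, FeatureStats()).add(value)


db = FeatureDatabase()
db.register("postcard", {"length_mm": 148.0, "width_mm": 100.0})
db.register("postcard", {"length_mm": 150.0, "width_mm": 100.0})
length = db.types["postcard"]["length_mm"]
print(length.mean, length.std)
```

The same `register` call covers both cases named in the text: a type not yet in the database is newly registered, and a known type has its entry updated.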

[0048] The processing selection unit 25 acts on the processing method information stored in the processing selection database 26 and the document type output from the document type identification unit 23, and sends the document image to the document processing units 27 to 29.

[0049] The processing selection unit 25 can likewise perform the document type identification step using the document information from the processing selection database 26 as first (or second) reference information and the document image information from the document image input/output unit 21 as second (or first) reference information. However, feeding the document image from the document image input/output unit 21 into the processing selection unit 25 is not an essential element.

[0050] In other words, the two paths that supply the document image from the document image input/output unit 21 to the document type identification unit 23 and to the processing selection unit 25 are not essential. Depending on the function and performance of the document feature extraction unit 22, both paths may be unnecessary, or either one alone may suffice.

[0051] As illustrated, the document processing units 27 to 29 comprise a plurality of processing units. As touched on earlier, the purpose is to classify and organize documents according to their type: [address book] for a "postcard" or "business card", [household account book] for a "formal receipt" or "receipt", and an album for a "photograph". Accordingly, at least as many processing units are provided as there are kinds of document processing.

[0052] FIG. 7 shows the "document type" and the "processing method" applied to each document. As shown in FIG. 7, when the document 20 is recognized as a "memo", for example, its image is sent to the document processing unit 27, which performs the document processing "paste to notepad".

[0053] When the document type is "postcard" or "business card", for example, the document processing "add to address book" is selected and instructed, and the document image is sent to the document processing unit 28.

【００５４】また認識された文書が「領収書」や「レシ
ート」であれば、たとえば［家計簿登録］という処理方
法が選択，指示され、当該文書画像は文書処理部２９に
送出されるという具合である。Likewise, if the recognized document is a formal receipt or a cash-register "receipt", the processing method [register in household account book], for example, is selected and instructed, and the document image is sent to the document processing unit 29.
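The correspondence between document type and processing described for FIG. 7 can be pictured as a simple lookup table. The following is only a minimal sketch under stated assumptions: the dictionary layout, the function name select_processing, and the English type labels are hypothetical, and only the pairings of document type, processing method, and processing unit number come from the description above.

```python
# Hypothetical in-memory stand-in for the processing selection database 26 (FIG. 7).
# Keys are document types; values are (processing method, document processing unit).
PROCESSING_SELECTION_DB = {
    "memo": ("paste into memo pad", 27),
    "postcard": ("add to address book", 28),
    "business card": ("add to address book", 28),
    "formal receipt": ("register in household account book", 29),
    "register receipt": ("register in household account book", 29),
}


def select_processing(document_type):
    """Return (processing method, unit number) for an identified document type."""
    try:
        return PROCESSING_SELECTION_DB[document_type]
    except KeyError:
        # The description does not say what happens for unregistered types;
        # raising an error here is an assumption of this sketch.
        raise ValueError(f"no processing registered for {document_type!r}")
```

With such a table, adding a new kind of document processing amounts to adding one entry and one processing unit, which matches the statement that at least as many units are provided as there are kinds of processing.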

【００５５】以上のように、文書種類を識別して文書処
理することにより、ユーザが文書処理の内容や操作方法
をあらかじめ把握していなくても、文書を入力するだけ
で、文書処理を自動的に実行することが奏される。As described above, because the document type is identified before the document is processed, document processing is executed automatically when the user merely inputs a document, even if the user does not know the content of the document processing or the operation method in advance.

【００５６】（実施の形態２）次に、実施の形態２につ
いて説明する。主要構成部は実施の形態１と同じ図１，
図２によって説明される。(Embodiment 2) Next, Embodiment 2 will be described. The main components are described with reference to FIGS. 1 and 2, the same figures as in Embodiment 1.

【００５７】「メモ」，「はがき」，「名刺」，「レシ
ート」などの文書２０は図示しないスキャナやカメラで
読み取られ文書画像入出力部２１で処理され文書画像を
出力する。文書特徴抽出部２２は文書画像入出力部２１
から取り出された文書画像の特徴を抽出する。文書種類
識別部２３は、文書特徴データベース２４に格納されて
いる文書特徴情報と文書特徴抽出部２２で抽出された文
書特徴を照合し、文書種類を識別する。処理選択データ
ベース２６は、文書種類と文書の処理方法の情報を格納
する。処理選択部２５は、処理選択データベース２６に
格納されている処理方法の情報と文書種類識別部２３か
ら出力された文書種類から処理方法を選択し、文書画像
を文書処理部２７〜２９のいずれかに送出する。Documents 20 such as a "memo", "postcard", "business card", or "receipt" are read by a scanner or camera (not shown) and processed by the document image input/output unit 21, which outputs a document image. The document feature extraction unit 22 extracts the features of the document image taken from the document image input/output unit 21. The document type identification unit 23 compares the document feature information stored in the document feature database 24 with the document features extracted by the document feature extraction unit 22, and identifies the document type. The processing selection database 26 stores information on document types and document processing methods. The processing selection unit 25 selects a processing method from the processing method information stored in the processing selection database 26 and the document type output from the document type identification unit 23, and sends the document image to one of the document processing units 27 to 29.

【００５８】図３は、本実施の形態の文書種類識別部２
３の具体的な構成図である。文書種類認識部３１は、文
書特徴データベース２４に格納されている文書特徴情報
と文書特徴抽出部２２で抽出された文書特徴を照合し文
書種類を認識する。認識結果が適切であるか否かをユー
ザ文書種類選択部３２に送りユーザに問い合わせを行
い、ユーザの選択，指示を待つ。すなわち本実施の形態
においては、文書種類識別部２３はユーザとの交信手段
を備える。FIG. 3 is a specific configuration diagram of the document type identification unit 23 of this embodiment. The document type recognition unit 31 compares the document feature information stored in the document feature database 24 with the document features extracted by the document feature extraction unit 22 and recognizes the document type. To check whether the recognition result is appropriate, the result is sent to the user document type selection unit 32, which queries the user and waits for the user's selection and instruction. That is, in this embodiment the document type identification unit 23 includes a means for communicating with the user.

【００５９】以上の構成の情報処理装置において次のよ
うにして文書処理を行う。「メモ」，「はがき」，「名
刺」，「レシート」などの種々の文書２０を、スキャナ
やカメラを用いて、文書画像入出力部２１で処理し文書
画像を得る。The document processing is performed in the information processing apparatus having the above configuration as follows. Various documents 20 such as “memo”, “postcard”, “business card”, and “receipt” are processed by a document image input / output unit 21 using a scanner or a camera to obtain a document image.

【００６０】文書特徴抽出部２２では、文書画像入出力
部２１から出力された文書画像から文書の特徴となる文
書特徴情報を抽出する。文書特徴情報は、たとえば文書
用紙の大きさ，黒画素の密度，連結黒画素外接矩形面
積，黒画素の分布と位置，行の数と位置などである。The document feature extraction unit 22 extracts document feature information, which is a feature of the document, from the document image output from the document image input / output unit 21. The document characteristic information includes, for example, the size of a document sheet, the density of black pixels, the area of a rectangle circumscribing connected black pixels, the distribution and position of black pixels, and the number and position of lines.
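To make the listed features concrete, the sketch below computes several of them from a binarized image. It is an illustration only: the description does not say how the document feature extraction unit 22 computes these values, so the image representation (rows of 0 and 1), the use of 4-connectivity, the row-run heuristic for counting lines, and the function name extract_features are all assumptions.

```python
from collections import deque


def extract_features(image):
    """Compute simple document features from rows of 0 (white) / 1 (black) pixels."""
    height, width = len(image), len(image[0])
    black = sum(sum(row) for row in image)

    # Bounding-rectangle area of each 4-connected group of black pixels.
    seen = [[False] * width for _ in range(height)]
    box_areas = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, bottom, left, right = y, y, x, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width:
                        if image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            box_areas.append((bottom - top + 1) * (right - left + 1))

    # Count text lines as runs of consecutive rows that contain black pixels.
    line_count, in_line = 0, False
    for row in image:
        has_black = any(row)
        if has_black and not in_line:
            line_count += 1
        in_line = has_black

    return {
        "size": (width, height),
        "black_density": black / (width * height),
        "component_box_areas": box_areas,
        "line_count": line_count,
    }
```

A real scanner image would first need binarization and skew correction, which this sketch leaves out.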

【００６１】文書特徴データベース２４には、文書の種
類とその種類の文書が持っている文書特徴情報があらか
じめ登録されている。文書種類認識部３１では、文書特
徴抽出部２２で抽出された文書特徴と、文書特徴データ
ベース２４に登録されている文書特徴の情報とを照合し
て、文書種類を認識し、文書種類候補と文書種類識別の
確度を表す評価値を得る。図８は文書種類候補と評価値
の例を示している。評価値は、あらかじめ設定されてい
るものではなく、文書種類認識部３１において計算で求
められる。評価値は、入力され識別された文書がどの文
書と認識されたかを示す、いわゆる適合率でありその数
値が大きい（高い）ほど当該文書である確率が高いこと
を表している。Document types and the document feature information characteristic of each type are registered in advance in the document feature database 24. The document type recognition unit 31 compares the document features extracted by the document feature extraction unit 22 with the document feature information registered in the document feature database 24, recognizes the document type, and obtains document type candidates together with an evaluation value representing the accuracy of the identification for each. FIG. 8 shows examples of document type candidates and evaluation values. The evaluation values are not set in advance; they are calculated by the document type recognition unit 31. An evaluation value is a so-called matching rate indicating how well the input document was recognized as a given document type: the larger (higher) the value, the higher the probability that the document is of that type.

【００６２】さて、図８の評価値（ａ）に示したもの
は、ある文書を文書種類認識部３１で認識してみると当
該文書が「名刺」であると認定した評価結果（評価値）
が１００、「はがき」，「メモ」，「レシート」と認識
した評価値がそれぞれ５０，３０，２０という結果を示
している。この場合には、一番評価値が高い「名刺」の
評価値が１００、二番目に高い「はがき」の評価値が５
０であり、これらの評価値には大きな差がみられるの
で、当該文書は「名刺」であると確信できる。したがっ
てこの場合には、ユーザに文書種類の選択，指示を委ね
る必要はない。Evaluation values (a) in FIG. 8 show the result of recognizing a certain document with the document type recognition unit 31: the evaluation value for "business card" is 100, and the values for "postcard", "memo", and "receipt" are 50, 30, and 20, respectively. Here the highest value, 100 for "business card", differs greatly from the second highest, 50 for "postcard", so the document can be confidently judged to be a "business card". In this case there is no need to leave the selection and instruction of the document type to the user.

【００６３】一方、図８の評価値（ｂ）に示した結果
は、「名刺」，「はがき」，「メモ」，「レシート」の
評価値がそれぞれ４０，２０，５，１０という結果を示
している。この場合には、一番高い「名刺」の評価値が
４０、二番目に評価値が高いのは「はがき」の２０で、
両者の評価値には２０の差がみられるが、一番高い「名
刺」であってもその評価値は４０と低い。したがって、
この場合に当該文書が「名刺」である確度は低いので、
ユーザに文書種類の選択と指示を委ねることになる。By contrast, evaluation values (b) in FIG. 8 show values of 40, 20, 5, and 10 for "business card", "postcard", "memo", and "receipt", respectively. The highest value is 40 for "business card" and the second highest is 20 for "postcard". The two differ by 20, but even the highest value is only 40. The confidence that the document is a "business card" is therefore low, and the selection and instruction of the document type are left to the user.

【００６４】また、図８の評価値（ｃ）に示した結果
は、「名刺」，「はがき」，「メモ」，「レシート」の
評価値がそれぞれ７５，８０，１０，３０という結果を
示している。この場合には、一番高い「はがき」の評価
値が８０、二番目に評価値が高いのは「名刺」で７５
と、両者の評価値は高いが、両者の差は５だけであるの
で、当該文書が「はがき」であるのか「名刺」であるの
かを識別するのが困難である。したがって、こうした場
合にはユーザに文書種類の選択と指示を委ねることにな
る。Evaluation values (c) in FIG. 8 show values of 75, 80, 10, and 30 for "business card", "postcard", "memo", and "receipt", respectively. The highest value is 80 for "postcard" and the second highest is 75 for "business card". Both values are high, but they differ by only 5, so it is difficult to tell whether the document is a "postcard" or a "business card". In such a case, too, the selection and instruction of the document type are left to the user.
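The three cases (a), (b), and (c) of FIG. 8 suggest a two-part test: the top evaluation value must be high enough, and it must lead the runner-up by a sufficient margin. The sketch below encodes that reading. The description gives no numeric thresholds, so min_top=60 and min_margin=10 are hypothetical values chosen only so that the three examples come out as described, and the function name is likewise an assumption.

```python
def decide_document_type(scores, min_top=60, min_margin=10):
    """Return (best candidate, ask_user) from evaluation values per document type.

    scores must contain at least two candidates. ask_user is True when the
    top value is too low or too close to the second-highest value.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_type, best), (_, second) = ranked[0], ranked[1]
    confident = best >= min_top and (best - second) >= min_margin
    return best_type, not confident


# Evaluation values taken from the description of FIG. 8.
case_a = {"business card": 100, "postcard": 50, "memo": 30, "receipt": 20}
case_b = {"business card": 40, "postcard": 20, "memo": 5, "receipt": 10}
case_c = {"business card": 75, "postcard": 80, "memo": 10, "receipt": 30}
```

Under these assumed thresholds, case (a) is accepted automatically, while cases (b) and (c) are both referred to the user, for the two different reasons given in the text.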

【００６５】図９にはユーザに文書種類を選択させるメ
ニュー例を示す。ユーザに対してたとえば「文書の種類
はどれですか」という問いかけが行われ、その時に、た
とえば「メモ」，「名刺」，「はがき」，「レシート」
の４種類のメニューが表示され、ユーザにメニュー選択
を仰ぐことになる。ユーザによって選択された文書種類
は、文書種類認識部３１に戻され、文書種類認識部３１
はその文書種類を出力する。FIG. 9 shows an example of a menu that lets the user select the document type. The user is asked, for example, "Which type of document is this?", and a menu of four types, for example "memo", "business card", "postcard", and "receipt", is displayed for the user to choose from. The document type selected by the user is returned to the document type recognition unit 31, which outputs that document type.

【００６６】処理選択部２５では、文書種類識別部２３
から文書種類を受け取り、処理選択データベース２６を
参照して、処理方法を処理選択データベース２６から選
択し、文書画像入出力部２１から入力された文書画像
を、選択された文書処理部２７〜２９に送出する。The processing selection unit 25 receives the document type from the document type identification unit 23, selects a processing method by referring to the processing selection database 26, and sends the document image input from the document image input/output unit 21 to the selected one of the document processing units 27 to 29.

【００６７】以上のようにして、文書種類識別の確度が
低い場合に文書種類候補の選択をユーザに問い合わせる
ことにより、文書種類識別の確度が低い場合でも、文書
の種類に合った正しい情報処理を行うことが可能とな
る。As described above, when the accuracy of the document type identification is low, the user is inquired of the selection of the document type candidate, so that even if the accuracy of the document type identification is low, correct information processing suitable for the document type can be performed. It is possible to do.

【００６８】（実施の形態３）次に、実施の形態３につ
いて説明する。図４にこの実施の形態の情報処理装置全
体の構成を示す。図４に示した構成は図１を基本構成と
し、それに図２および図３を組み合わせた構成とほぼ同
じである。(Embodiment 3) Next, Embodiment 3 will be described. FIG. 4 shows the overall configuration of the information processing apparatus of this embodiment. The configuration in FIG. 4 takes FIG. 1 as its basis and is substantially the same as FIG. 1 combined with FIGS. 2 and 3.

【００６９】実施の形態１で述べたように文書画像入出
力部２１は、「メモ」，「はがき」，「名刺」，「レシ
ート」などの文書２０をスキャナやカメラなどで入力
し、文書画像を出力する。文書特徴抽出部２２は、文書
画像から文書特徴を抽出する。文書特徴データベース２
４は、登録された文書種類の文書特徴情報を格納してい
る。文書種類認識部３１は文書種類識別部２３の一部を
構成し、文書特徴データベース２４に格納されている文
書特徴情報と文書特徴抽出部２２で抽出された文書特徴
を照合し当該文書種類を認識する。文書種類の認識が適
切であるか或いは不適切であるかについては、認識結果
を文書種類識別部２３を構成するもう１つのユーザ文書
種類選択部３２に送り、ユーザに問い合わせを行い、ユ
ーザの判断，選択，指示を仰ぐことになる。As described in Embodiment 1, the document image input/output unit 21 takes in a document 20 such as a "memo", "postcard", "business card", or "receipt" through a scanner, camera, or the like and outputs a document image. The document feature extraction unit 22 extracts document features from the document image. The document feature database 24 stores document feature information for the registered document types. The document type recognition unit 31 forms part of the document type identification unit 23; it compares the document feature information stored in the document feature database 24 with the document features extracted by the document feature extraction unit 22 and recognizes the document type. To determine whether the recognition is appropriate, the recognition result is sent to the user document type selection unit 32, the other component of the document type identification unit 23, which queries the user and asks for the user's judgment, selection, and instruction.

【００７０】文書特徴登録部３０は、文書特徴抽出部２
２で抽出された文書特徴と、文書種類認識部３１から出
力された文書種類を入力として、文書特徴データベース
２４に、文書種類の文書特徴情報を追加する。処理選択
データベース２６は、文書種類と文書の処理方法の情報
を格納している。処理選択部２５は、処理選択データベ
ース２６に格納されている処理方法の情報と文書種類識
別部２３から出力された文書種類から処理方法を選択
し、文書画像を文書処理部２７〜２９に送出する。The document feature registration unit 30 takes as input the document features extracted by the document feature extraction unit 22 and the document type output from the document type recognition unit 31, and adds the document feature information for that document type to the document feature database 24. The processing selection database 26 stores information on document types and document processing methods. The processing selection unit 25 selects a processing method from the processing method information stored in the processing selection database 26 and the document type output from the document type identification unit 23, and sends the document image to the document processing units 27 to 29.

【００７１】以上の構成の情報処理装置において次のよ
うにして文書処理を行う。「メモ」、「はがき」，「名
刺」，「レシート」など、媒体の大きさ，媒体の形，文
字の大きさや文字の形が異なる種々の文書２０をスキャ
ナやカメラを用いて、文書画像入出力部２１で処理し文
書画像を得る。The information processing apparatus configured as above performs document processing as follows. Various documents 20 that differ in medium size, medium shape, character size, and character shape, such as a "memo", "postcard", "business card", or "receipt", are captured with a scanner or camera and processed by the document image input/output unit 21 to obtain a document image.

【００７２】文書特徴抽出部２２では、文書画像から文
書の特徴となる文書特徴を抽出する。文書の特徴を示す
ものとしては、たとえば文書用紙の大きさ，黒画素の密
度，連結黒画素外接矩形面積，黒画素の分布と位置，行
の数と位置などである。The document feature extracting section 22 extracts a document feature which is a feature of the document from the document image. The characteristics of the document include, for example, the size of the document paper, the density of black pixels, the area of a rectangle circumscribing connected black pixels, the distribution and position of black pixels, and the number and position of lines.

【００７３】文書特徴データベース２４には、文書の種
類とその種類の文書が持っている文書特徴の情報をあら
かじめ登録しておく。文書種類認識部３１では、文書特
徴抽出部２２で抽出された文書特徴と、文書特徴データ
ベース２４に登録されている文書特徴の情報とを照合し
て、文書種類を認識し、文書種類候補と文書種類識別の
確度を表す評価値を得る。In the document characteristic database 24, information on the type of document and the document characteristics of the document of that type is registered in advance. The document type recognizing unit 31 collates the document features extracted by the document feature extracting unit 22 with information on the document features registered in the document feature database 24, recognizes the document type, and determines a document type candidate and a document type. An evaluation value representing the accuracy of type identification is obtained.

【００７４】図８に文書種類候補と評価値の例を示す。
評価値から文書種類識別の確度が低い場合は、ユーザ文
書種類選択部３２に文書種類候補を送る。ユーザ文書種
類選択部３２では、文書種類候補を表示し、ユーザに文
書種類を選択させる。選択された文書種類は、文書種類
認識部３１に返される。文書種類認識部３１は、その文
書種類を出力する。ユーザが選択した文書種類と文書種
類候補の第１候補が異なるときは、文書種類を文書特徴
登録部３０へ送る。FIG. 8 shows an example of a document type candidate and an evaluation value.
If the accuracy of the document type identification is low from the evaluation value, the document type candidate is sent to the user document type selection unit 32. The user document type selection unit 32 displays the document type candidates and allows the user to select a document type. The selected document type is returned to the document type recognition unit 31. The document type recognition unit 31 outputs the document type. When the document type selected by the user is different from the first candidate of the document type candidate, the document type is sent to the document feature registration unit 30.

【００７５】文書特徴登録部３０では、ユーザが入力し
た文書種類と、文書特徴登録部３０で抽出された文書特
徴を文書特徴データベース２４に追加登録し、文書種類
に対する文書特徴情報を更新する。The document feature registration unit 30 additionally registers the document type input by the user and the document features extracted by the document feature extraction unit 22 in the document feature database 24, and updates the document feature information for that document type.

【００７６】処理選択部２５では、文書種類識別部２３
から文書種類を受け取り、処理選択データベース２６を
参照して、処理方法を処理選択データベース２６から選
択し、文書画像入出力部２１から入力された文書画像
を、選択された文書処理部２７〜２９に送出する。The processing selection unit 25 receives the document type from the document type identification unit 23, selects a processing method by referring to the processing selection database 26, and sends the document image input from the document image input/output unit 21 to the selected one of the document processing units 27 to 29.

【００７７】以上のようにして、ユーザが選択した文書
種類と文書種類認識部３１が認識した文書種類候補の第
１候補が異なるときに、文書特徴データベース２４が更
新されることにより、次回以降の文書種類認識の確度が
高められる。As described above, when the document type selected by the user differs from the first candidate recognized by the document type recognition unit 31, the document feature database 24 is updated, which improves the accuracy of document type recognition from the next time onward.
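The update rule of Embodiment 3 can be sketched as follows. This is a minimal illustration, not the registration unit's actual method: the description does not specify how the document feature database 24 is organized, so storing a list of feature samples per document type, and the name register_if_corrected, are assumptions.

```python
def register_if_corrected(feature_db, features, candidates, user_choice):
    """Add a feature sample to the database when the user overrides the first candidate.

    feature_db: dict mapping document type -> list of feature samples (assumed layout).
    candidates: document types ordered from best to worst evaluation value.
    Returns True when the database was updated.
    """
    if user_choice == candidates[0]:
        # The recognizer's first candidate was right; nothing to learn from.
        return False
    feature_db.setdefault(user_choice, []).append(features)
    return True
```

Because setdefault creates a new entry when the chosen type is not yet registered, the same path would also cover registering a previously unknown document type, though that extension is an assumption of this sketch.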

【００７８】（実施の形態４）次に、実施の形態４につ
いて説明する。全体の構成は実施の形態１と同じ図１に
よって説明される。(Fourth Embodiment) Next, a fourth embodiment will be described. The overall configuration is described with reference to FIG. 1, as in the first embodiment.

【００７９】文書２０は文書画像入出力部２１で、スキ
ャナやカメラなどで入力され、文書画像入出力部２１で
文書画像が出力される。文書特徴抽出部２２は、文書画
像入出力部２１から入力された文書画像の特徴を抽出す
る。文書種類識別部２３は、文書特徴データベース２４
に格納されている文書特徴情報と文書特徴抽出部２２で
抽出された文書画像の特徴を照合し、文書種類を識別す
る。処理選択部２５は、処理選択データベース２６に格
納されている処理方法の情報と文書種類識別部２３から
出力された文書種類から処理方法を選択し、文書画像を
文書処理部２７〜２９のいずれかに渡して文書処理を実
行させる。The document 20 is captured with a scanner, camera, or the like, and the document image input/output unit 21 outputs a document image. The document feature extraction unit 22 extracts the features of the document image input from the document image input/output unit 21. The document type identification unit 23 compares the document feature information stored in the document feature database 24 with the features of the document image extracted by the document feature extraction unit 22, and identifies the document type. The processing selection unit 25 selects a processing method from the processing method information stored in the processing selection database 26 and the document type output from the document type identification unit 23, and passes the document image to one of the document processing units 27 to 29 to execute the document processing.

【００８０】図５は、本実施の形態の文書種類識別部２
３の具体構成図である。文字認識部４１は、文書画像入
出力部２１から取り込んだ文書画像に書かれている文字
を認識しテキスト情報を出力する。文書キーワードデー
タベース４３は、登録された文書種類に含まれるキーワ
ード情報を格納している。キーワード文書特徴抽出部４
２は、文書キーワードデータベース４３に格納されてい
るキーワード情報と文字認識部４１で認識されたテキス
ト情報とを照合し、キーワード文書特徴を出力する。文
書種類認識部４４は、文書特徴データベース２４に格納
されている文書特徴情報と、文書特徴抽出部２２で抽出
された文書特徴とキーワード文書特徴抽出部４２で抽出
された文書キーワード特徴を照合，識別し、識別した文
書種類を文書種類識別部２３へ送り出す。FIG. 5 is a specific configuration diagram of the document type identification unit 23 of this embodiment. The character recognition unit 41 recognizes the characters written in the document image taken from the document image input/output unit 21 and outputs text information. The document keyword database 43 stores keyword information contained in the registered document types. The keyword document feature extraction unit 42 compares the keyword information stored in the document keyword database 43 with the text information recognized by the character recognition unit 41 and outputs keyword document features. The document type recognition unit 44 compares the document feature information stored in the document feature database 24 with the document features extracted by the document feature extraction unit 22 and the document keyword features extracted by the keyword document feature extraction unit 42, identifies the document type, and sends the identified document type to the document type identification unit 23.

【００８１】以上の構成の情報処理装置において次のよ
うにして文書処理を行う。まず図１に示したように「メ
モ」，「はがき」，「名刺」，「レシート」などの種々
の文書２０を、スキャナやカメラを用いて、文書画像入
出力部２１で処理し文書画像を得る。The information processing apparatus configured as above performs document processing as follows. First, as shown in FIG. 1, various documents 20 such as a "memo", "postcard", "business card", or "receipt" are captured with a scanner or camera and processed by the document image input/output unit 21 to obtain a document image.

【００８２】文書特徴抽出部２２では、文書画像から文
書の特徴となる文書特徴を抽出する。文書特徴として
は、たとえば文書用紙の大きさ，黒画素の密度，連結黒
画素外接矩形面積，黒画素の分布と位置，行の数と位置
などが抽出される。The document feature extraction unit 22 extracts a document feature which is a feature of the document from the document image. As the document features, for example, the size of the document paper, the density of black pixels, the area of a rectangle circumscribing connected black pixels, the distribution and position of black pixels, the number and position of lines, and the like are extracted.

【００８３】次に図５に示したように、文字認識部４１
では文書画像から文書に記載されている文字を認識し、
テキスト情報を出力する。文書キーワードデータベース
４３には、キーワードおよび当該キーワードを含むと推
測される文書の種類などの文書キーワード情報をあらか
じめ登録しておく。なお実施の形態４は実施の形態１〜
実施の形態３のすべてに適用することができる。Next, as shown in FIG. 5, the character recognition unit 41 recognizes the characters written on the document from the document image and outputs text information. Document keyword information, such as keywords and the types of document presumed to contain each keyword, is registered in advance in the document keyword database 43. Embodiment 4 can be applied to all of Embodiments 1 to 3.

【００８４】図１０に文書キーワードデータベース４３
に格納されているデータの一例を示す。たとえば「名
刺」には一般的に、電話番号や住所が記載されることが
多いので「TEL」や「県」なるキーワードを「名刺」な
る文書と関係づけて文書キーワードデータベース４３に
登録しておくとよい。ここで説明の都合上「TEL」なる
キーワードの確信度が７０であるのに比べて「県」なる
キーワードの確信度は３０であると示した。それは、
「名刺」には「県」なる文字を省略して印刷されるもの
や、「都」，「道」，「府」に該当するものも比較的多
く「県」なるキーワードが「名刺」に出現する割合が低
いことを想定して示した。また近年、「名刺」に「Ｅメ
ールアドレス」が記入される場合も多いから「Ｅメール
アドレス」によく用いられる文字や記号と「名刺」とを
関連づけて登録しておいてもよい。FIG. 10 shows an example of the data stored in the document keyword database 43. For example, a "business card" generally carries a telephone number and an address, so the keywords "TEL" and "県" (prefecture) are best registered in the document keyword database 43 in association with the document type "business card". For the sake of explanation, the certainty factor of the keyword "TEL" is shown as 70, whereas that of the keyword "県" is shown as 30. This assumes that "県" appears on business cards relatively rarely, because many cards are printed with the character omitted or carry an address in a 都 (to), 道 (do), or 府 (fu) instead. In recent years an e-mail address is also often printed on a "business card", so characters and symbols commonly used in e-mail addresses may likewise be registered in association with "business card".

【００８５】また「県」なる文字は「はがき」にも出現
するから、これらを結びつけて登録しておけばよい。し
かし、郵便番号の記入が徹底されるほど「県」なる文字
の確信度は低くなるであろうから、その確信度も比較的
低い４０であることを示している。また図１０には示さ
ないが７桁の郵便番号キーワードを、「はがき」と結び
つけて登録しておいてもよい。The character "県" also appears on a "postcard", so the two should be registered in association with each other. However, the more consistently postal codes are written, the lower the certainty factor of "県" is likely to become, so its certainty factor is shown as a relatively low 40. Although not shown in FIG. 10, a seven-digit postal code keyword may also be registered in association with "postcard".

【００８６】また、レシートや領収書には購入金額や取
引金額を示す「合計」や「小計」なる文字が多用される
ことが多いので、これらのキーワードを「レシート」と
結びつけて文書キーワードデータベース４３に登録して
おけばよい。また「￥」なる記号をレシートと関係づけ
て登録しておいてもよい。Cash-register receipts and formal receipts frequently contain the words "合計" (total) and "小計" (subtotal) indicating a purchase or transaction amount, so these keywords should be registered in the document keyword database 43 in association with "receipt". The symbol "￥" may also be registered in association with receipts.

【００８７】なお、図１０に示した確信度の数値は出願
人が説明の都合上、暫定的に定めた数値であるから、正
確さを欠いていることを理解されたい。いずれにしても
文書種類と当該文書に出現するであろうと予測できるキ
ーワードの確信度を把握した上で文書キーワードデータ
ベース４３に登録しておくとよい。It should be understood that the numerical value of the certainty factor shown in FIG. 10 is tentatively set by the applicant for the sake of explanation, and therefore lacks accuracy. In any case, the document type and the degree of certainty of the keyword that can be predicted to appear in the document are grasped and registered in the document keyword database 43.

【００８８】また、文書種類とキーワードとの対応は、
複数のキーワードの和、積およびこれらの組み合わせを
もって文書種類とを対応させておいてもよい。たとえば
「レシート」には金額の合計，小計なる文字とその金額
を示す数字および日付が記入されていることを勘案する
と｛（合計＋小計＋￥）×（数字）×（西暦の数字＋平
成）｝を「レシート」なる文書と関連づけておいてもよ
い。A document type may also be associated with a sum, a product, or a combination of sums and products of several keywords. For example, given that a "receipt" carries the words for total and subtotal, digits giving the amount, and a date, the expression {(total + subtotal + ￥) × (digits) × (Western-calendar year digits + Heisei)} may be associated with the document type "receipt".
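One way to read the receipt expression is as a Boolean formula, with each sum treated as a logical OR and each product as a logical AND; that interpretation is an assumption of this sketch, as are the function name and the regular expressions. In particular, treating "Western-calendar year digits" as a four-digit number beginning with 19 or 20, written in ASCII digits, is a simplification chosen for illustration.

```python
import re


def looks_like_receipt(text):
    """Evaluate {(合計 + 小計 + ￥) × (digits) × (Western year + 平成)} on recognized text."""
    # (total + subtotal + yen sign): any one of the amount-related keywords.
    has_amount_word = any(keyword in text for keyword in ("合計", "小計", "￥"))
    # (digits): at least one digit somewhere in the text.
    has_digits = re.search(r"\d", text) is not None
    # (Western year digits + Heisei): a 19xx/20xx year or the era name 平成.
    has_date = re.search(r"(?:19|20)\d{2}", text) is not None or "平成" in text
    return has_amount_word and has_digits and has_date
```

Requiring all three factors at once is what makes the combination more selective than any single keyword: a business card with a telephone number has digits but no amount word, so it is not mistaken for a receipt.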

【００８９】さて、図１０に示した「キーワード」，
「文書種類」，「確信度」は前に述べたように、文書種
類識別部２３に構成された文書キーワードデータベース
４３に格納されている。As stated earlier, the "keyword", "document type", and "certainty factor" shown in FIG. 10 are stored in the document keyword database 43 provided in the document type identification unit 23.

【００９０】文書キーワードデータベース４３の登録キ
ーワードをより最適なものとすることによって、文書種
類識別の精度が高くなり、図８に示した文書種類候補の
評価値の確度を高めることができる。このためには前に
も述べたが｛（合計＋小計＋￥）×（数字）×（西暦の
数字＋平成）｝なるキーワードの組み合わせを、「レシ
ート」なる文書と関連づけておくことが効果的である。
図８に示した評価値の確度が高められると、図９に示し
たユーザの文書種類の選択，指示する頻度も少なくなる
とともに、情報処理が自動的に行われるという効果が奏
される。Optimizing the keywords registered in the document keyword database 43 raises the accuracy of document type identification and improves the reliability of the evaluation values of the document type candidates shown in FIG. 8. To this end, as mentioned earlier, it is effective to associate the keyword combination {(total + subtotal + ￥) × (digits) × (Western-calendar year digits + Heisei)} with the document type "receipt". As the reliability of the evaluation values shown in FIG. 8 improves, the user has to select and instruct the document type via the menu of FIG. 9 less often, and information processing proceeds automatically.

【００９１】キーワード文書特徴抽出部４２では、文書
キーワードデータベース４３に格納されている文書キー
ワード情報と文字認識部４１で認識されたテキスト情報
とを照合し、文書に含まれるキーワードによって文書が
どの文書種類に対応するのかを表す文書キーワード特徴
を抽出する。The keyword document feature extraction unit 42 compares the document keyword information stored in the document keyword database 43 with the text information recognized by the character recognition unit 41, and extracts document keyword features that indicate, from the keywords contained in the document, which document type the document corresponds to.
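The matching performed by the keyword document feature extraction unit 42 might look like the sketch below. The three rows are the keyword, document type, and certainty factor values quoted from FIG. 10 in the description; everything else is assumed. In particular, the description does not say how the certainty factors of several matched keywords are combined, so summing them per document type is a hypothetical choice, and KEYWORD_DB and keyword_features are illustrative names.

```python
# Rows quoted from the description of FIG. 10: (keyword, document type, certainty factor).
KEYWORD_DB = [
    ("TEL", "business card", 70),
    ("県", "business card", 30),
    ("県", "postcard", 40),
]


def keyword_features(text, keyword_db=KEYWORD_DB):
    """Return a dict of document type -> accumulated certainty for keywords found in text."""
    scores = {}
    for keyword, document_type, certainty in keyword_db:
        if keyword in text:
            # Summation is an assumed combination rule, not one stated in the source.
            scores[document_type] = scores.get(document_type, 0) + certainty
    return scores
```

For recognized text containing both "TEL" and "県", this yields a higher accumulated value for "business card" than for "postcard", which is the kind of additional evidence the document type recognition unit 44 is described as using alongside the image features.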

【００９２】処理選択部２５では、文書種類識別部２３
から文書種類を受け取り、処理選択データベース２６か
ら処理方法選択しその処理方法を文書処理部２７〜２９
に送出する。The processing selection unit 25 receives the document type from the document type identification unit 23, selects a processing method from the processing selection database 26, and sends that processing method to the document processing units 27 to 29.

【００９３】以上のようにして、文字認識を行って得た
テキスト情報から、キーワード情報を取り出して、これ
をキーワード文書特徴の抽出に使用することにより、よ
り詳しい文書特徴を用いることができ、文書種類認識の
確度を高くすることが可能になる。As described above, keyword information is taken from the text information obtained by character recognition and used to extract keyword document features, so that more detailed document features become available and the accuracy of document type recognition can be increased.

【００９４】なお、実施の形態３と実施の形態４を併存
させてもよい。こうするならば、文書種類認識の確度は
さらに向上する。Note that the third embodiment and the fourth embodiment may coexist. In this case, the accuracy of document type recognition is further improved.

【００９５】（実施の形態５）次に、実施の形態５につ
いて説明する。全体の構成は実施の形態１と同じように
図１によって説明される。(Fifth Embodiment) Next, a fifth embodiment will be described. The overall configuration will be described with reference to FIG. 1 as in the first embodiment.

【００９６】文書画像入出力部２１は、文書２０をスキ
ャナやカメラなどから入力し文書画像を出力する。文書
特徴抽出部２２は、文書画像から文書２０の特徴を抽出
する。文書特徴データベース２４は、登録された文書種
類の文書特徴情報を格納している。文書種類識別部２３
は、文書特徴データベース２４に格納されている文書特
徴情報と文書特徴抽出部２２で抽出された文書特徴を照
合し、文書種類を識別する。処理選択部２５は、処理選
択データベース２６に格納されている処理方法の情報と
文書種類識別部２３から出力された文書種類から処理方
法を選択し、文書画像を文書処理部２７〜２９に送出す
る。The document image input/output unit 21 takes in the document 20 from a scanner, camera, or the like and outputs a document image. The document feature extraction unit 22 extracts the features of the document 20 from the document image. The document feature database 24 stores document feature information for the registered document types. The document type identification unit 23 compares the document feature information stored in the document feature database 24 with the document features extracted by the document feature extraction unit 22, and identifies the document type. The processing selection unit 25 selects a processing method from the processing method information stored in the processing selection database 26 and the document type output from the document type identification unit 23, and sends the document image to the document processing units 27 to 29.

【００９７】本実施の形態の特徴は図６に示めされる。
すなわち文書画像読み取り部５１は、文書２０を図示し
ないスキャナやカメラなどから読み取り文書画像を出力
する。文書検知部５２は、文書の用紙が文書画像読み取
り部５１に検知されたとき、文書が入力状態になってい
ることを文書画像読み取り部５１に知らせる。The distinctive feature of this embodiment is shown in FIG. 6. The document image reading unit 51 reads the document 20 with a scanner, camera, or the like (not shown) and outputs a document image. When a document sheet is detected at the document image reading unit 51, the document detection unit 52 notifies the document image reading unit 51 that a document is in the input state.

【００９８】以上の構成の情報処理装置においては次の
ようにして文書処理を行う。「メモ」，「はがき」，
「名刺」，「レシート」などの種々の文書２０をユーザ
がスキャナの読み取り口に入れるか、あるいはカメラの
撮影範囲に入れると、文書検知部５２では文書が入力状
態であると判断して入力状態であることを文書画像読み
取り部５１に知らせる。文書画像読み取り部５１は、文
書検知部５２から文書が入力状態である知らせを受ける
と、スキャナやカメラから、文書を読み取り文書画像を
得る。The information processing apparatus configured as above performs document processing as follows. When the user places one of various documents 20, such as a "memo", "postcard", "business card", or "receipt", in the reading slot of the scanner or within the shooting range of the camera, the document detection unit 52 determines that a document is in the input state and notifies the document image reading unit 51. On receiving this notification from the document detection unit 52, the document image reading unit 51 reads the document with the scanner or camera and obtains a document image.
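The interaction between the document detection unit 52 and the document image reading unit 51 can be sketched as a small notification pattern. The class and method names are hypothetical, the scan callable stands in for the scanner or camera, and triggering only on the transition from "no document" to "document present" is an assumption made so that one sheet is read once.

```python
class DocumentImageReader:
    """Stand-in for the document image reading unit 51."""

    def __init__(self, scan):
        self._scan = scan      # callable that returns a document image
        self.images = []       # images read so far

    def on_document_present(self):
        # Called by the detector; reads the document and keeps the image.
        self.images.append(self._scan())


class DocumentDetector:
    """Stand-in for the document detection unit 52."""

    def __init__(self, reader):
        self._reader = reader
        self._present = False

    def update(self, sensor_sees_paper):
        # Notify the reader only when a document newly enters the input state.
        if sensor_sees_paper and not self._present:
            self._reader.on_document_present()
        self._present = sensor_sees_paper
```

In use, the device would call update with each sensor reading; the user never issues a separate "start" command, which is the point of this embodiment.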

【００９９】読み取られた文書画像は図１に示した文書
特徴抽出部２２に送られる。文書特徴抽出部２２では文
書画像から文書の特徴となる文書特徴を抽出する。文書
特徴はたとえば、文書用紙の大きさ，黒画素の密度，連
結黒画素外接矩形面積，黒画素の分布と位置，行の数と
位置などをもって示される。The read document image is sent to the document feature extraction unit 22 shown in FIG. The document feature extraction unit 22 extracts a document feature that is a feature of the document from the document image. The document characteristics are indicated by, for example, the size of the document paper, the density of black pixels, the area of a rectangle circumscribing connected black pixels, the distribution and position of black pixels, the number and position of lines, and the like.

【０１００】文書特徴データベース２４には、文書の種
類とその種類の文書が所有する文書特徴の情報をあらか
じめ登録しておく。文書種類識別部２３では、文書特徴
抽出部２２で抽出された文書特徴と、文書特徴データベ
ース２４に登録されている文書特徴の情報とを照合し
て、もっとも適合した文書種類を入力された文書画像の
文書種類として出力する。Document types and the document feature information possessed by documents of each type are registered in advance in the document feature database 24. The document type identification unit 23 compares the document features extracted by the document feature extraction unit 22 with the document feature information registered in the document feature database 24, and outputs the best-matching document type as the document type of the input document image.

【０１０１】処理選択データベース２６には、文書の種
類と文書種類別の処理方法の情報を格納している。処理
選択データベース２６の例を図７に示す。なお。図７に
ついては既に実施の形態１で説明し、本実施の形態にお
いても同じであるので説明は割愛する。The processing selection database 26 stores information on document types and the processing method for each document type. FIG. 7 shows an example of the processing selection database 26. FIG. 7 was already described in Embodiment 1 and is the same in this embodiment, so its description is omitted here.

[0102] The process selection unit 25 receives the document type from the document type identification unit 23, selects a processing method by referring to the process selection database 26, and sends the document image input from the document image input/output unit 21 to the selected one of the document processing units 27 to 29.
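The selection step amounts to a table lookup followed by a dispatch. The sketch below is an illustration under assumptions: the contents of FIG. 7 are not reproduced in this section, so the type-to-processing mapping, the three handler functions, and the name `select_and_run` are hypothetical stand-ins for the process selection database 26 and the document processing units 27 to 29.

```python
# Hypothetical document processing units; each simply tags the image it receives.
def register_in_address_book(image):
    return ("address book", image)


def register_in_account_book(image):
    return ("household account book", image)


def store_in_album(image):
    return ("album", image)


# Hypothetical process selection database: document type -> processing method.
PROCESS_SELECTION_DB = {
    "business card": register_in_address_book,
    "receipt": register_in_account_book,
    "photo": store_in_album,
}


def select_and_run(document_type, image, db=PROCESS_SELECTION_DB):
    """Look up the processing method for the document type and send the image to it."""
    try:
        handler = db[document_type]
    except KeyError:
        raise ValueError(f"no processing method registered for {document_type!r}") from None
    return handler(image)
```

Because the mapping is data rather than code, a new document type can be supported by adding one entry to the table.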

[0103] As described above, when the user places a document in the document image reading unit 51, information processing proceeds automatically. The user therefore only has to load the document into the document image input/output unit 21; document processing is carried out without any other operation.

【０１０４】[0104]

[Effects of the Invention] As described above, according to the information processing apparatus and information processing method of the present invention, the type of an input document is identified, and document processing suited to that type is automatically selected and executed. As a result, even a user who does not know in advance the contents of the document processing or how to operate it can have a document processed simply by inputting it.

[0105] Further, according to the present invention, the document feature database can be updated by inputting a document of a type that is to be newly identified. This makes it possible to correctly identify new types of documents as well as types of documents that were previously misidentified.
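One simple way to picture such an update is to keep a running mean of the features seen for each document type. This is only a sketch under that assumption; the section does not state how the database is updated, and the names `register_sample`, `db`, and `counts` are hypothetical.

```python
def register_sample(db, counts, doc_type, features):
    """Register a new document type or fold a new sample into an existing one.

    db maps document type -> mean feature dictionary.
    counts maps document type -> number of samples seen so far.
    The running-mean update is an assumed strategy, chosen for illustration.
    """
    n = counts.get(doc_type, 0)
    if n == 0:
        # First sample of this type: register it as-is.
        db[doc_type] = dict(features)
    else:
        old = db[doc_type]
        db[doc_type] = {k: (old[k] * n + features[k]) / (n + 1) for k in old}
    counts[doc_type] = n + 1
```

Calling `register_sample` with a type name not yet in `db` corresponds to new registration; calling it again with the same name corresponds to an update.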

[0106] Further, according to the present invention, when the certainty of the document type identification is low, the user is asked to choose among the document type candidates. Thus, even when the certainty is low, the correct document processing for the document type can be performed.
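The fallback to the user can be sketched as a threshold test on the top candidate's score. The score range of 0 to 1, the threshold of 0.5, the number of candidates shown, and the names `choose_type` and `ask_user` are assumptions for illustration; the section does not define how certainty is computed.

```python
def choose_type(ranked, ask_user, threshold=0.5, top_n=3):
    """Return the document type, asking the user when certainty is low.

    ranked is a list of (document type, certainty) pairs sorted best-first,
    with certainty assumed to lie in [0, 1].
    ask_user is a callback that receives the candidate list and returns one entry.
    """
    best_type, best_score = ranked[0]
    if best_score >= threshold:
        return best_type
    candidates = [doc_type for doc_type, _ in ranked[:top_n]]
    return ask_user(candidates)
```

In an interactive device `ask_user` would display the candidates and wait for a selection; in the test below it is replaced by a fixed choice.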

[0107] Further, according to the present invention, when the certainty of the document type identification is low, the user is asked for the document type, and the document feature database is updated when the document type selected by the user differs from the first candidate among the document type candidates recognized by the document type recognition unit. This raises the certainty of document type recognition from the next time onward.
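The update condition described here is that the database changes only when the user's answer contradicts the first candidate. A minimal sketch follows, assuming the database keeps a list of feature samples per document type; that storage layout and the name `learn_from_correction` are hypothetical.

```python
def learn_from_correction(db, features, first_candidate, user_choice):
    """Store the sample under the user's type only if the first candidate was wrong.

    db maps document type -> list of feature dictionaries (assumed layout).
    Returns True when the database was updated.
    """
    if user_choice == first_candidate:
        return False
    db.setdefault(user_choice, []).append(dict(features))
    return True
```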

[0108] Furthermore, according to the present invention, keyword information is extracted from the text information obtained by performing character recognition on the document image and is used as one of the document features. Because more detailed document features are available, the certainty of document type recognition can be increased.
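Turning recognized text into a keyword feature can be sketched as follows. The keyword lists, the name `keyword_features`, and the use of the fraction of matched keywords as the feature value are assumptions; the actual contents of the document keyword database (FIG. 10) are not given in this section, and character recognition itself is assumed to have been done already.

```python
# Hypothetical document keyword database: document type -> keywords typical of that type.
KEYWORD_DB = {
    "receipt": ["total", "tax", "change"],
    "business card": ["tel", "fax", "e-mail"],
}


def keyword_features(text, keyword_db=KEYWORD_DB):
    """Return, per document type, the fraction of its keywords found in the text.

    text is the output of character recognition; matching is a case-insensitive
    substring test, which is a simplification.
    """
    lowered = text.lower()
    return {
        doc_type: sum(kw in lowered for kw in keywords) / len(keywords)
        for doc_type, keywords in keyword_db.items()
    }
```

The resulting per-type values can be appended to the layout features before matching, so that two document types with similar layouts can still be told apart by their wording.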

[0109] In addition, according to the present invention, when the user loads a document into the document image reading unit, information processing is executed automatically. The user can thus have information processed simply by inserting the document into the input unit, without performing any other operation.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing the entire information processing apparatus according to the first embodiment of the present invention.

FIG. 2 is a configuration diagram showing the main part of the information processing apparatus according to the first embodiment of the present invention.

FIG. 3 is a configuration diagram of the document type identification unit in the processing apparatus according to the second embodiment of the present invention.

FIG. 4 is a configuration diagram of the entire information processing apparatus according to the third embodiment of the present invention.

FIG. 5 is a configuration diagram of the document type identification unit in the information processing apparatus according to the fourth embodiment of the present invention.

FIG. 6 is a configuration diagram of the document image input/output unit in the information processing apparatus according to the fifth embodiment of the present invention.

FIG. 7 is a diagram of the process selection database according to the first embodiment of the present invention.

FIG. 8 is a diagram of the document type candidates according to the third embodiment of the present invention.

FIG. 9 is a diagram of the display of the user document type selection unit according to the third embodiment of the present invention.

FIG. 10 is a diagram of the document keyword database according to the fourth embodiment of the present invention.

FIG. 11 is a configuration diagram of a conventional information processing apparatus.

[Explanation of symbols]

1 Information processing apparatus
2 Image display unit
3 Scanner unit
4 Storage unit
5 Shape recognition unit
6 Image classification unit
12 Pattern recognition unit
14 Application program
14a Address book
14b Household account book
14c Album
15 Application program
20 Document
21 Document image input/output unit
22 Document feature extraction unit
23 Document type identification unit
24 Document feature database
25 Process selection unit
26 Process selection database
30 Document feature registration unit
31 Document type recognition unit
32 User document type selection unit
41 Character recognition unit
42 Keyword document feature extraction unit
43 Document keyword database
44 Document type recognition unit
51 Document image reading unit
52 Document detection unit

Continuation of front page
(72) Inventor: Kazuhiro Kayashima, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Keiji Ogawa, c/o Matsushita Refrigeration Co., 4-2-5 Takaida-hondori, Higashiosaka-shi, Osaka
F-terms (reference): 5B050 AA10 BA16 CA07 DA06 EA07 EA08 GA08; 5B075 ND07 NK02 NK06 NR12 PP04 PP30

Claims

[Claims]

1. An information processing apparatus comprising: a document image input/output unit that recognizes a document and outputs image information of the document; a document feature extraction unit that extracts document features from the image information of the document; a document feature database that stores document feature information of registered document types; a document feature registration unit that registers features of the document based on the document feature information stored in the document feature database, the document feature information extracted by the document feature extraction unit, and the document type; means for newly registering or updating document feature information in the document feature database by the document feature registration unit; a document type identification unit that identifies the document type by comparing the document feature information extracted by the document feature extraction unit with the document feature information stored in the document feature database; a process selection database that stores a processing method corresponding to the document type; and a process selection unit that selects processing based on the information on the processing method stored in the process selection database and the document type output from the document type identification unit.

2. The information processing apparatus according to claim 1, further comprising: a document type recognition unit that identifies the document type by comparing the document features extracted by the document feature extraction unit with the document feature information stored in the document feature database; and a user document type selection unit that asks the user for the document type.

3. The information processing apparatus according to claim 1, further comprising: a character recognition unit that recognizes characters in the document image output from the document image input/output unit; a document keyword database that stores keyword information contained in documents of registered document types; a keyword document feature extraction unit that extracts keyword document features from the character recognition result output from the character recognition unit and the keyword information stored in the document keyword database; and a document type recognition unit that identifies the document type by comparing the document features extracted by the document feature extraction unit and the keyword document features extracted by the keyword document feature extraction unit with the document feature information stored in the document feature database.

4. The information processing apparatus according to claim 1, 2, or 3, wherein the image input/output unit includes a document image reading unit that reads a document as an image, and a document detection unit that detects that a document has been input to the document image reading unit.

5. An information processing method comprising: a step of outputting a document image with a document as input; a step of extracting document features from the document image; a step of storing document feature information of registered document types; a step of newly registering and updating the document type and document feature information; a step of identifying the document type by comparing the document feature information with the extracted document features; a step of storing information on a processing method corresponding to the document type; a step of selecting document processing based on the stored processing method information and the registered document type information; and a step of executing the document processing based on the step of selecting the document processing.

6. The information processing method according to claim 5, wherein the step of outputting a document image with a document as input supplies image information to at least one of the step of extracting document features from the document image, the step of identifying the document type by comparing the document feature information with the extracted document features, and the step of selecting the document processing.

7. The information processing method according to claim 5, wherein the step of identifying the document type by comparing the document feature information with the extracted document features communicates with means by which the user refers to, and selects on the basis of, an evaluation value indicating the relevance between an already registered document type and the document, or a certainty factor indicating the appearance rate of keywords appearing in the document.

8. The information processing method according to claim 5, wherein the step of identifying the document type by comparing the document feature information with the extracted document features comprises: a character recognition step of recognizing characters in the document image output from the document image input/output unit; a step of extracting keywords suited to the document type from a document keyword database registered in advance, based on the character recognition step; and a step of recognizing the document type by comparing the information from the step of extracting the keywords with the document feature information stored in the document feature database.