JP2002055985A

JP2002055985A - Device and method for document attribute discrimination

Info

Publication number: JP2002055985A
Application number: JP2000238843A
Authority: JP
Inventors: Masatoshi Nishimura; 正寿西村
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2000-08-07
Filing date: 2000-08-07
Publication date: 2002-02-20

Abstract

PROBLEM TO BE SOLVED: To reduce the user's burden of assigning document attributes when documents are managed with a computer, by making it possible to identify the document attributes written on a paper document itself. SOLUTION: A document management server 3 is equipped with an extraction information creation AP 37, an area information DB 15, and a document management processing AP 10. The paper document on which the document attributes are written is first converted into an image document. Using the extraction information creation AP 37, the user specifies the types of its document attributes, their entry positions, and an ID for each entry position, and stores them in the area information DB 15. The user also creates a document attribute extraction sheet on which the types and entry position IDs of the document attributes to be extracted are set. When the paper document is scanned, the sheet is scanned as its cover. The document management processing AP 10 identifies and analyzes the scanned sheet. It then extracts the document attributes from the paper document scanned after the sheet, based on the attribute types and entry position IDs set on the sheet and the information in the area information DB 15. Finally, it assigns them to the image document of the paper document.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to document management using a computer. More particularly, it relates to a technique for identifying the attribute information of an image document (document name, creator, creation date and time, and so on; hereinafter, "document attributes"). Such an image document is created by reading a paper document, such as a proposal, meeting minutes, or a settlement document, with a scanner or similar device.

[0002]

2. Description of the Related Art

In banks, companies, and many other organizations, paper documents such as proposals, meeting minutes, and settlement documents are commonly read with a scanner or similar device into scanned images (hereinafter, "image documents"). These are then input to a computer and managed there. At this time, document attributes such as the document name, creator, and creation date and time are assigned to each image document.

[0003]

There are two methods of assigning document attributes to an image document.

In the first method, an operator enters the attributes directly with the keyboard, based on the document attributes written on the paper document.

In the second method, a sheet for assigning document attributes, called an OCR (Optical Character Reader) form, is prepared. The prescribed document attributes are written by hand at prescribed positions on the sheet; printing is also acceptable as long as the text lands in those positions. The filled-in sheet is placed on the paper document as its cover and read by the scanner together with it. OCR software (application software with an OCR function) installed on the computer or scanner treats the first sheet of the paper document as the attribute assignment sheet. It extracts and recognizes the entries at each prescribed position on that sheet and assigns the recognized entries to the image document as document attributes.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

Paper documents such as proposals, meeting minutes, and settlement documents contain document attributes such as the document name, creator, and creation date and time. Conventionally, however, a computer cannot identify the document attributes written on the paper document itself. To assign document attributes to an image document, the user must therefore do the cumbersome work described above. The user either enters the attributes directly into the computer with the keyboard, or writes or prints them on an attribute assignment sheet. Both approaches generally require every character of each document attribute to be entered. This is especially tedious when an attribute is long, for example when the user must enter the full document name "On the Revision of the Cabinet Order concerning International Registration Applications and International Trademark Registration Applications, and on Practical Operations".

[0005]

Moreover, this work must be repeated every time a paper document is converted into an image document.

[0006]

As described above, assigning document attributes in computer-based document management is tedious and burdensome for the user.

[0007]

Accordingly, an object of the present invention is to make it possible, in computer-based document management, to identify the document attributes written on a paper document itself, and thereby to reduce the user's burden of assigning document attributes.

[0008]

SUMMARY OF THE INVENTION

A document attribute identification device according to the present invention comprises the following means:

(a) Input means for inputting document attribute-related information. This information includes position information indicating where a document attribute is written on a paper document to be converted into an image document, and type information indicating the type of the document attribute written at that position.

(b) Storage means for storing the input document attribute-related information.

(c) Specifying means for specifying the position information of the document attributes of a desired type among those written on the paper document.

(d) Identifying means for extracting document attributes from the image document of the paper document, based on the specified position information and the stored document attribute-related information, and for identifying the type of each extracted attribute.

[0009]

According to the present invention, the user need only specify where the desired document attributes are on the paper document. Those attributes are then extracted from the paper document itself and their types are identified. The extracted document attributes can be assigned to the corresponding image document and stored together with it.
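The four means of [0008] can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the patented implementation. It assumes OCR has already turned each page into text blocks with page coordinates, and the names `Rect`, `TextBlock`, and `AttributeIdentifier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical position information: an axis-aligned area on one page."""
    page: int
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, page: int, x: int, y: int) -> bool:
        return (page == self.page
                and self.left <= x <= self.right
                and self.top <= y <= self.bottom)


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical OCR output: recognized text and the point where it sits."""
    page: int
    x: int
    y: int
    text: str


class AttributeIdentifier:
    def __init__(self) -> None:
        # Storage means: position information -> document attribute type.
        self._related_info: dict[Rect, str] = {}

    def register(self, position: Rect, attribute_type: str) -> None:
        """Input means: record where an attribute of a given type is written."""
        self._related_info[position] = attribute_type

    def identify(self, specified: list[Rect], blocks: list[TextBlock]) -> dict[str, str]:
        """Identifying means: extract the text at each specified position
        and label it with the stored attribute type."""
        result: dict[str, str] = {}
        for position in specified:
            attribute_type = self._related_info[position]
            found = [b.text for b in blocks if position.contains(b.page, b.x, b.y)]
            result[attribute_type] = " ".join(found)
        return result


# Usage: register the title area once, then specify it for each scanned copy.
title_area = Rect(page=1, left=100, top=200, right=700, bottom=260)
identifier = AttributeIdentifier()
identifier.register(title_area, "document name")
blocks = [TextBlock(1, 400, 230, "Proposal on ○○"), TextBlock(1, 400, 900, "Body text")]
print(identifier.identify([title_area], blocks))  # {'document name': 'Proposal on ○○'}
```

Registration happens once per document layout. Each later scan only needs the positions to be specified, which mirrors the saving in user effort described in [0009].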

[0010]

In a preferred embodiment, the document attribute-related information includes a position identifier assigned to each piece of position information. The specifying means specifies the position identifier for the desired type of document attribute. The identifying means extracts document attributes from the image document of the paper document, based on the specified position identifier and the document attribute-related information, and identifies the type of each extracted attribute.

[0011]

In a preferred embodiment, the document attribute-related information includes character feature information, which describes the characters in which a document attribute is written. The specifying means specifies the character feature information for the desired type of document attribute. The identifying means extracts document attributes from the image document of the paper document, based on the specified character feature information and the document attribute-related information, and identifies the type of each extracted attribute.

[0012]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention will now be described in detail with reference to the drawings.

[0013]

FIG. 1 is a block diagram showing the overall configuration of a system according to an embodiment of the present invention.

[0014]

The system comprises scanners 1a, 1b, ... and a document management server 3. The scanners scan various paper documents, such as meeting minutes and settlement documents, to create image documents. The document management server 3 extracts document attributes from those image documents themselves, assigns the attributes to the image documents, and manages the documents. The scanners 1a, 1b, ... are used, for example, as peripheral devices of the document management server 3.

[0015]

FIG. 2 is a block diagram showing the configuration of the document management server 3.

[0016]

The document management server 3 is a general-purpose computer such as a personal computer or a workstation. It includes a display 31, an input device 33 such as a keyboard and mouse, and application software 37 (hereinafter, the extraction information creation AP). The AP 37 creates a document attribute extraction sheet, on which the information needed to extract document attributes from the image documents created by the scanners 1a, 1b, ... is set.

The document management server 3 further includes:

- a document attribute extraction sheet format database 11 (hereinafter, the sheet format DB);
- an area information database 15 (area information DB);
- a character information database 19 (character information DB);
- a document attribute / image document database 27 (document attribute / image document DB);
- a document management processing application 10 (document management processing AP).

The document management processing AP 10 recognizes the settings on a document attribute extraction sheet and extracts document attributes from the image document itself according to those settings. It then assigns the attributes to the image document in order to manage it. Specifically, the document management processing AP 10 comprises a scanner control unit 7, a document attribute extraction sheet identification unit 9 (hereinafter, the sheet identification unit), a document attribute extraction sheet analysis unit 13 (hereinafter, the sheet analysis unit), an area information analysis unit 17, a character information analysis unit 21, a document attribute extraction unit 23, and a document attribute assignment unit 25.

[0017]

The extraction information creation AP 37 receives, from the scanners 1a, 1b, ..., an image document on which document attributes are written, and shows it on the display 31. The user then works from where each document attribute appears in the displayed image document (the page number and the position on that page) and how it is written. On that basis, the AP 37 has the user create a document attribute extraction sheet, designate document attribute areas, and set information on the sheet. The information set on the document attribute extraction sheet includes area information, which indicates where each document attribute is written, and character feature information (font, character size, and so on) for each attribute. Details are given later.

[0018]

The extraction information creation AP 37 stores information on the document attribute extraction sheets created by the user in the sheet format DB 11.

[0019]

The extraction information creation AP 37 stores information on the document attribute areas designated by the user in the area information DB 15.

[0020]

The character information DB 19 stores a font / font ID table, described later. The character information analysis unit 21 of the document management processing AP 10 consults this table to identify the font of a document attribute.

[0021]

The document attribute / image document DB 27 stores the image documents to which document attributes have been assigned by the document attribute assignment unit 25 of the document management processing AP 10.

[0022]

The scanner control unit 7 periodically checks each of the scanners 1a, 1b, ... to see whether it is scanning a paper document. When a scanner is scanning a paper document, the unit acquires the image document from that scanner.
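The periodic monitoring by the scanner control unit 7 amounts to a polling loop. The sketch below assumes a hypothetical scanner interface (`has_scanned_document`, `fetch_image_document`). The patent does not describe the scanner API, so `FakeScanner` stands in for real devices.

```python
import time
from typing import Callable, Protocol


class Scanner(Protocol):
    """Hypothetical scanner interface; not taken from any real driver API."""
    name: str

    def has_scanned_document(self) -> bool: ...
    def fetch_image_document(self) -> list[bytes]: ...


class FakeScanner:
    """Stand-in device holding queued image documents (lists of page images)."""

    def __init__(self, name: str, pending: list[list[bytes]]) -> None:
        self.name = name
        self._pending = list(pending)

    def has_scanned_document(self) -> bool:
        return bool(self._pending)

    def fetch_image_document(self) -> list[bytes]:
        return self._pending.pop(0)


def poll_once(scanners: list[Scanner]) -> list[tuple[str, list[bytes]]]:
    """Check every scanner once and acquire documents from those that scanned."""
    return [(s.name, s.fetch_image_document())
            for s in scanners if s.has_scanned_document()]


def monitor(scanners: list[Scanner],
            handle: Callable[[str, list[bytes]], None],
            interval_s: float = 5.0, rounds: int = 3) -> None:
    """Poll at a fixed interval; `rounds` bounds the loop so the sketch ends."""
    for _ in range(rounds):
        for name, pages in poll_once(scanners):
            handle(name, pages)
        time.sleep(interval_s)
```

A long-running service would drop the round limit. The 5-second interval is an arbitrary assumption, not a value from the patent.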

[0023]

The sheet identification unit 9 checks each page of the acquired image document to determine whether that page is a document attribute extraction sheet. When it identifies such a sheet, it notifies the sheet analysis unit 13.
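The page-by-page check of the sheet identification unit 9 can be sketched as follows. The sketch assumes OCR text is available for each page and that the sheet ID in field 45 is printed in a hypothetical `SHEET-ID: <digits>` form; the patent does not say how a sheet is recognized. It also allows several sheet-plus-document batches in one scan, which goes beyond the text.

```python
import re
from typing import Optional

# Hypothetical convention: field 45 prints the sheet ID as "SHEET-ID: <digits>".
SHEET_ID = re.compile(r"SHEET-ID:\s*(\d+)")


def sheet_id_of(page_text: str) -> Optional[str]:
    """Return the sheet ID if this page is a document attribute extraction sheet."""
    match = SHEET_ID.search(page_text)
    return match.group(1) if match else None


def split_by_sheets(pages: list[str]) -> list[tuple[Optional[str], list[str]]]:
    """Group pages into (sheet ID, following document pages) batches.

    Pages that arrive before any sheet are kept under the ID None."""
    batches: list[tuple[Optional[str], list[str]]] = []
    for text in pages:
        sid = sheet_id_of(text)
        if sid is not None or not batches:
            batches.append((sid, []))      # a sheet (or a sheetless start) opens a batch
        if sid is None:
            batches[-1][1].append(text)    # ordinary page: belongs to the latest batch
    return batches
```

Each returned sheet ID would then be handed to the sheet analysis unit 13, together with the pages it covers.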

[0024]

Based on the information stored in the sheet format DB 11, the sheet analysis unit 13 acquires the information that the user set on the identified document attribute extraction sheet.

[0025]

The area information analysis unit 17 identifies the areas of the image document itself in which the various document attributes are written. It does so from two sources: the document attribute position information acquired from the document attribute extraction sheet, and the area information stored in the area information DB 15. It then notifies the document attribute extraction unit 23 of the identified areas.
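The lookup performed by the area information analysis unit 17 is essentially a join between the area IDs read from the sheet and the records in the area information DB 15. The sketch below assumes rectangular areas and area IDs that are unique within the DB; the text fixes neither detail. The dictionary layouts are hypothetical.

```python
from typing import NamedTuple


class Area(NamedTuple):
    """Rectangular area, relative to the top-left corner of the page image."""
    left: int
    top: int
    right: int
    bottom: int


def resolve_areas(sheet_settings: dict[str, dict[str, int]],
                  area_db: dict[int, Area]) -> dict[str, tuple[int, Area]]:
    """Map each attribute's (page, area ID) from the sheet to concrete coordinates.

    sheet_settings: {attribute type: {"page": n, "area": area ID}} read from
    the sheet; area_db: {area ID: Area} as saved in the area information DB."""
    resolved: dict[str, tuple[int, Area]] = {}
    for attribute, items in sheet_settings.items():
        if "area" not in items:
            continue  # no area set for this attribute
        area_id = items["area"]
        if area_id not in area_db:
            raise KeyError(f"area ID {area_id} for {attribute!r} is not registered")
        resolved[attribute] = (items["page"], area_db[area_id])
    return resolved
```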

[0026]

The character information analysis unit 21 identifies the character features of the various document attributes written in the image document itself. It does so from two sources: the character feature information (for example, the font) acquired from the document attribute extraction sheet, and the font / font ID table stored in the character information DB 19. It then notifies the document attribute extraction unit 23 of the identified character features.

[0027]

The document attribute extraction unit 23 analyzes the image document and extracts document attributes from the image document itself. It bases the extraction on the information notified by the area information analysis unit 17 and by the character information analysis unit 21.

[0028]

The document attribute assignment unit 25 assigns the document attributes extracted by the document attribute extraction unit 23 to the image document. It then stores the image document, with its attributes, in the document attribute / image document DB 27.
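One way to picture the document attribute / image document DB 27 is as a pair of relational tables. The sketch below uses Python's built-in `sqlite3` module. The schema and the function names are assumptions for illustration; the patent does not specify how the database is organized.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical two-table layout for DB 27."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS image_document (
            doc_id INTEGER PRIMARY KEY,
            image  BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS document_attribute (
            doc_id INTEGER NOT NULL REFERENCES image_document(doc_id),
            name   TEXT NOT NULL,
            value  TEXT NOT NULL,
            PRIMARY KEY (doc_id, name)
        );
    """)
    return conn


def assign_and_store(conn: sqlite3.Connection, image: bytes,
                     attributes: dict[str, str]) -> int:
    """Save an image document and its extracted attributes in one transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        cur = conn.execute("INSERT INTO image_document (image) VALUES (?)", (image,))
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO document_attribute (doc_id, name, value) VALUES (?, ?, ?)",
            [(doc_id, name, value) for name, value in attributes.items()])
    return doc_id


def find_documents(conn: sqlite3.Connection, name: str, value: str) -> list[int]:
    """Look up image documents by one attribute, e.g. ("creator", "○× Taro")."""
    rows = conn.execute(
        "SELECT doc_id FROM document_attribute WHERE name = ? AND value = ? "
        "ORDER BY doc_id", (name, value))
    return [row[0] for row in rows]
```

Keeping attributes in their own table lets new attribute types (see the user-added items in [0034] and [0035]) be stored without schema changes.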

[0029]

The processing performed by the document management server 3 will now be described concretely. The example used is the case where the image document on which the document attributes are written is the image document 41 shown in FIG. 3.

[0030]

First, three steps are described: creating the document attribute extraction sheet, dividing the image document into document attribute areas, and setting information on the document attribute extraction sheet.

[0031]

The image document 41 shown in FIG. 3 is, for example, the first page of a multi-page image document. Various document attributes are written on it: the version information "Version 1.0", the document name "Proposal on ○○", the company/organization name "○○ Corporation", the creator "○× Taro", and the creation date "2000/01/01". Document attributes may, of course, also appear on pages other than the first or be spread over several pages. Based on these document attributes, the user creates the following document attribute extraction sheet with the extraction information creation AP 37.

[0032]

FIG. 4 shows an example of the document attribute extraction sheet.

[0033]

On the document attribute extraction sheet 43, the user sets which document attributes written on the paper document itself (that is, on the image document 41 itself) are to be extracted, such as the document name and creator. The user also sets where each attribute is written (the page number and the position on that page) and how it is written (character size, font, and so on).

More specifically, the document attribute extraction sheet 43 provides:

- an entry field 45 for a unique ID identifying the sheet (the sheet ID);
- an attribute item column 49, in which the items that can be set for the various document attributes (hereinafter, attribute items) are laid out;
- a document attribute input field 47, in which the document attributes to be extracted are entered;
- information setting boxes 51_1 to 51_n (51_1 to 51_30 in the figure), for setting information on each attribute item of each entered document attribute.

[0034]

In the attribute item column 49, the attribute items that can be recorded for the document attributes are arranged horizontally. They include, for example, the following:

- "Page" sets the number of the page on which the document attribute to be extracted is written.
- "Area" sets in which area of the page set under "page" the document attribute is written.
- "Font" sets the font of the characters of the document attribute.
- "Size" sets the size of those characters.
- "Character decoration" sets what decoration (italics, underlining, and so on) is applied to those characters.
- "Number of characters to acquire" sets how many characters are to be extracted.

The user may also be allowed to register additional attribute items.

[0035]

The document attribute input field 47 lists the document attributes to be extracted vertically. In this field the user enters the types of document attribute written on the image document 41 that are to be extracted, for example "document name", "creator", "company/organization name", "creation date", and "version". Some document attributes to be extracted may be provided by default. In that case the user can add or delete them as desired.

[0036]

The information setting boxes 51_1 to 51_30 are placed at the intersections of the attribute items laid out in the attribute item column 49 and the document attributes entered in the document attribute input field 47. In each box the user sets either a check mark or a number, depending on the attribute item concerned. The settings are made in one of two ways: by entering data directly into the document attribute extraction sheet 43 on the client machine 5a, or by printing out the sheet 43 and filling it in by hand.
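The numbering of the boxes 51_1 to 51_30 can be checked arithmetically. The box numbers cited in [0039] to [0054] are: page 51_1 to 51_5, area 51_6 to 51_10, font 51_11 to 51_15, size 51_16 to 51_20, and character decoration 51_21 to 51_25. These fit a simple rule: number down each attribute-item column in turn, with the five document attributes in the row order of [0035]. Under that rule, the sixth item, number of characters to acquire, would occupy 51_26 to 51_30. The column-major rule is inferred from those numbers rather than stated in the patent, and the function names are hypothetical.

```python
# Column order from [0034] and row order from [0035].
ATTRIBUTE_ITEMS = ["page", "area", "font", "size",
                   "character decoration", "number of characters to acquire"]
DOCUMENT_ATTRIBUTES = ["document name", "creator", "company/organization name",
                       "creation date", "version"]


def box_number(item: str, attribute: str) -> int:
    """Return k such that the box at (item, attribute) is 51_k."""
    col = ATTRIBUTE_ITEMS.index(item)
    row = DOCUMENT_ATTRIBUTES.index(attribute)
    return col * len(DOCUMENT_ATTRIBUTES) + row + 1


def box_position(k: int) -> tuple[str, str]:
    """Inverse of box_number: the (item, attribute) intersection of box 51_k."""
    col, row = divmod(k - 1, len(DOCUMENT_ATTRIBUTES))
    return ATTRIBUTE_ITEMS[col], DOCUMENT_ATTRIBUTES[row]


print(box_number("area", "document name"))  # 6
print(box_position(25))                     # ('character decoration', 'version')
```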

[0037]

In this way, the user creates the document attribute extraction sheet 43 with the extraction information creation AP 37. The AP 37 then analyzes the created sheet 43 and acquires the following information:

- the position coordinates of each information setting box 51_1 to 51_30, measured from a predetermined reference point (origin) such as the upper-left corner of the sheet 43;
- the attribute item and document attribute type to which each box corresponds;
- the setting method for each box (whether a check mark or a number is set).

The extraction information creation AP 37 saves this information in the sheet format DB 11.
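The sheet format DB 11 records three things for each box: its coordinates, its attribute item and document attribute, and whether it takes a check mark or a number. With these records, the sheet analysis unit 13 can read a filled-in sheet by locating recognized marks inside the boxes. The sketch below makes three assumptions: mark recognition is taken as given, coordinates share the sheet's upper-left origin, and `BoxFormat`, `Mark`, and `read_sheet` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BoxFormat:
    """One sheet-format record for an information setting box 51_k."""
    box: int                          # k in 51_k
    item: str                         # attribute item, e.g. "area"
    attribute: str                    # document attribute type, e.g. "document name"
    method: str                       # "check" or "number"
    bbox: tuple[int, int, int, int]   # left, top, right, bottom from the sheet's top-left


@dataclass(frozen=True)
class Mark:
    """A mark recognized on the scanned sheet; recognition itself is assumed."""
    x: int
    y: int
    value: str                        # recognized digits, or anything for a check mark


def read_sheet(formats: list[BoxFormat],
               marks: list[Mark]) -> dict[str, dict[str, Union[bool, int]]]:
    """Return {document attribute: {attribute item: setting}} for filled boxes."""
    settings: dict[str, dict[str, Union[bool, int]]] = {}
    for fmt in formats:
        left, top, right, bottom = fmt.bbox
        inside = [m for m in marks if left <= m.x <= right and top <= m.y <= bottom]
        if not inside:
            continue                  # empty box: item not set for this attribute
        value: Union[bool, int] = True if fmt.method == "check" else int(inside[0].value)
        settings.setdefault(fmt.attribute, {})[fmt.item] = value
    return settings
```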

[0038]

In the information setting boxes 51_1 to 51_30 of the document attribute extraction sheet 43, the user sets information for each attribute item, based on where and how the various document attributes are written in the image document 41. The method of setting information for each attribute item is described below.

[0039]

In the information setting boxes 51_1 to 51_5 for the attribute item "page", the user sets the number of the page on which each document attribute is written. For the image document 41 shown in FIG. 3, the numbers set in boxes 51_1 to 51_5 are all "1".

[0040]

In the information setting boxes 51_6 to 51_10 for the attribute item "area", the user sets area IDs, which are assigned as described below with reference to FIGS. 5 and 6.

[0041]

FIG. 5 shows the display screen of the image document 41 while the areas in which the document attributes are written are being designated.

[0042]

Using the mouse, the user encloses each document attribute on the image document 41 shown on the display 31 in a frame of any shape (rectangle, ellipse, and so on). This sets areas on the image document 41. For example, as shown in FIG. 5, the user designates the following areas, each by a rectangular frame:

- area 61, enclosing the version information "Version 1.0";
- area 63, enclosing the document name "Proposal on ○○";
- area 65, enclosing the company/organization name "○○ Corporation";
- area 67, enclosing the creator "○× Taro";
- area 69, enclosing the creation date "2000/01/01".

After designating the areas 61 to 69, the user assigns each of them a unique area ID that identifies it.

[0043]

FIG. 6 shows the display screen used when area IDs are assigned to the areas 61 to 69.

[0044]

Once the areas 61 to 69 have been designated on the image document 41, they are displayed together with a display area 41a of the image document 41. On this screen the user assigns a unique area ID (for example, a number) to each area. For example, the user assigns area ID "1" to area 61, "2" to area 63, "3" to area 65, "4" to area 67, and "5" to area 69.

[0045]

The extraction information creation AP 37 analyzes the image document area 41a to which the area IDs have been assigned. It acquires each area ID and the position information of the corresponding area, and saves them in the area information DB 15. The position information of an area is expressed relative to a predetermined reference point (origin) of the image document area 41a, such as its upper-left corner. For a rectangular area it consists of the coordinates of the vertices. For an elliptical area it consists of the coordinates of the center and the equation describing the ellipse.
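With areas stored this way, deciding whether a recognized character lies in an area is a simple containment test. The sketch below assumes rectangles are axis-aligned, and ellipses are axis-aligned with semi-axes a and b, so that the stored equation is ((x - cx)/a)^2 + ((y - cy)/b)^2 = 1. The patent does not fix the form of the equation, and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectArea:
    """Rectangle stored as its vertex coordinates (origin: top-left of 41a)."""
    vertices: tuple[tuple[float, float], ...]

    def contains(self, x: float, y: float) -> bool:
        xs = [vx for vx, _ in self.vertices]
        ys = [vy for _, vy in self.vertices]
        return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)


@dataclass(frozen=True)
class EllipseArea:
    """Ellipse stored as its center and semi-axes (origin: top-left of 41a)."""
    cx: float
    cy: float
    a: float  # horizontal semi-axis
    b: float  # vertical semi-axis

    def contains(self, x: float, y: float) -> bool:
        # Inside or on the curve ((x - cx)/a)^2 + ((y - cy)/b)^2 = 1
        return ((x - self.cx) / self.a) ** 2 + ((y - self.cy) / self.b) ** 2 <= 1.0
```

In practice a character would probably be assigned to an area by testing the center of its bounding box. That is one reasonable choice among several.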

[0046]

The area IDs may be assigned either before or after the document attribute extraction sheet is created.

[0047]

Refer again to FIG. 4.

[0048]

In the information setting boxes 51_6 to 51_10 for the attribute item "area", the user sets the area ID that corresponds to each document attribute. The area ID assignment described above gives the following correspondence:

- ID "1": "version"
- ID "2": "document name"
- ID "3": "company/organization name"
- ID "4": "creator"
- ID "5": "creation date"

The user therefore sets "2" in box 51_6 for "document name", "4" in box 51_7 for "creator", "3" in box 51_8 for "company/organization name", "5" in box 51_9 for "creation date", and "1" in box 51_10 for "version".

[0049]

In the information setting boxes 51_11 to 51_15 for the attribute item "font", the user sets information according to the font in which each document attribute is written in the image document 41. The value comes from the font / font ID table stored in advance in the character information DB 19.

[0050]

FIG. 7 shows the font / font ID table.

[0051] The font/font ID table 71 records font names and a font ID (for example, a number) for each font name. For example, the font/font ID table 71 shown in FIG. 7 lists the following entries, in the form "font name: font ID": "Gothic: 1", "Mincho: 2", "Kaisho (regular script): 3", "Bold Gothic: 4", and "Pop: 5". Users may also be allowed to create their own font/font ID table 71.
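The font/font ID table works as a two-way lookup. The user converts a font name to an ID when filling in the sheet, and the server converts the ID back when it analyzes the sheet. A minimal Python sketch follows. The dictionary and helper names are hypothetical. `register_font` only illustrates the user-created table mentioned above.

```python
# Font/font ID table 71 as listed in FIG. 7 ([0051]).
FONT_ID_TABLE = {"Gothic": 1, "Mincho": 2, "Kaisho": 3, "Bold Gothic": 4, "Pop": 5}


def font_id(name: str) -> int:
    """ID the user writes in a 'Font' box (51_11 to 51_15) for the given font."""
    return FONT_ID_TABLE[name]


def font_name(fid: int) -> str:
    """Reverse lookup, used when the filled-in sheet is analyzed."""
    for name, table_id in FONT_ID_TABLE.items():
        if table_id == fid:
            return name
    raise KeyError(f"unknown font ID: {fid}")


def register_font(name: str, fid: int) -> None:
    """Add a user-defined entry, rejecting an ID that is already in use."""
    if fid in FONT_ID_TABLE.values():
        raise ValueError(f"font ID {fid} is already assigned")
    FONT_ID_TABLE[name] = fid
```

Keeping IDs unique matters because the reverse lookup must map each number on the sheet to exactly one font.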

[0052] Following the font/font ID table 71, the user sets information in the information setting boxes 51₁₁ to 51₁₅ for the attribute item "Font" shown in FIG. 4. If no information is set in boxes 51₁₁ to 51₁₅, document attributes are extracted regardless of their font.

[0053] In the information setting boxes 51₁₆ to 51₂₀ for the attribute item "Size", the user sets the character size of each document attribute to be extracted, in points. These are the sizes that can be set in a given word-processing program, for example Microsoft Word98. If no information is set in boxes 51₁₆ to 51₂₀, document attributes are extracted regardless of their character size.
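Under [0052] and [0053], an empty "Font" or "Size" box places no constraint on extraction. In other words, unset criteria act as wildcards. The sketch below assumes that a hypothetical recognizer has already reported the font ID and point size of each candidate text run. `TextRun`, `CharCriteria`, and the 0.5-point `size_tolerance` are illustrative assumptions. The tolerance exists because a size estimated from a scan is rarely exact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRun:
    """A recognized run of text; font_id and size_pt are assumed recognizer outputs."""
    text: str
    font_id: int
    size_pt: float


@dataclass
class CharCriteria:
    """One attribute's 'Font' and 'Size' boxes; None means the box was left empty."""
    font_id: Optional[int] = None
    size_pt: Optional[float] = None


def matches(criteria: CharCriteria, run: TextRun, size_tolerance: float = 0.5) -> bool:
    """True if the run satisfies every criterion that was actually set."""
    if criteria.font_id is not None and run.font_id != criteria.font_id:
        return False
    if criteria.size_pt is not None and abs(run.size_pt - criteria.size_pt) > size_tolerance:
        return False
    return True  # unset boxes never reject a run
```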

[0054] The information setting boxes 51₂₁ to 51₂₅ cover the attribute item "Character decoration". When the characters of a document attribute to be extracted carry a decoration such as italics or underlining, the user marks the corresponding box, for example with a check mark. Take the image document 41 shown in FIG. 3 as an example. The document attribute "version" is written as "Version 1.0" and is underlined. The document attribute "creation date" is written as "2000/01/01" and is italic. Check marks are therefore entered in the information setting boxes 51₂₄ and 51₂₅, which correspond to the document attributes "version" and "creation date".

[0055] Of course, this embodiment may instead provide character decoration IDs for the various decorations. For example, a table may map decoration ID "1" to "bold", ID "2" to "italic", and ID "3" to "underlined". When the characters of a document attribute to be extracted carry a decoration, the user then enters the matching decoration ID. If no information is set in the information setting boxes 51₂₁ to 51₂₅ for the attribute item "Character decoration", document attributes are extracted regardless of whether they carry a decoration.
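The "Character decoration" box can be read in two ways: with the check mark of [0054] or with the decoration ID of [0055]. The sketch below handles both. Treating a check mark as "some decoration must be present" is an assumption. The `CHECK_MARK` token is also an assumption, as is the set of detected decorations, which a real system would obtain from its recognizer.

```python
from typing import Optional, Set, Union

# Character decoration IDs from the example table in [0055].
DECORATION_IDS = {1: "bold", 2: "italic", 3: "underlined"}

CHECK_MARK = "check"  # hypothetical token for a check mark read from the box


def decoration_matches(box_value: Optional[Union[str, int]], observed: Set[str]) -> bool:
    """Decide whether a text run's decorations satisfy one 'Character decoration' box.

    box_value: None (box left empty), CHECK_MARK, or a decoration ID.
    observed:  decorations detected on the run, e.g. {"italic"} (assumed recognizer output).
    """
    if box_value is None:        # empty box: decoration is ignored
        return True
    if box_value == CHECK_MARK:  # check mark: assumed to mean "some decoration present"
        return bool(observed)
    return DECORATION_IDS[box_value] in observed  # ID: that specific decoration is required
```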

[0056] The information setting boxes 51₂₆ to 51₃₀ cover the attribute item "Number of characters to acquire". When the user wants to limit an extracted document attribute to a certain number of characters, the user enters that number here.

[0057] In this way, the user sets the information for extracting the various document attributes from the image document 41 in the information setting boxes 51₁ to 51₃₀ of the document attribute extraction sheet 43. The user also enters a unique sheet ID in the sheet ID entry field 45. This completes the document attribute extraction sheet 43' shown in FIG. 8.
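A completed sheet 43' can be pictured as a sheet ID plus one record per document attribute. The sketch below is an assumed in-memory representation, not a format defined by the patent. It also applies the "Number of characters to acquire" limit from [0056]. The sheet ID "0001" in the example is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class AttributeSetting:
    """Settings for one document attribute on a completed sheet 43' (assumed layout)."""
    page: int                                     # page number on which the attribute is written
    area_id: int                                  # value of the 'Area' box
    font_id: Optional[int] = None                 # 'Font' box; None means any font
    size_pt: Optional[float] = None               # 'Size' box; None means any size
    decoration: Optional[Union[str, int]] = None  # check mark or decoration ID
    max_chars: Optional[int] = None               # 'Number of characters to acquire' box

    def limit(self, text: str) -> str:
        """Apply the character-count limit, if one was set."""
        return text if self.max_chars is None else text[: self.max_chars]


@dataclass
class ExtractionSheet:
    """Sheet ID from entry field 45 plus per-attribute settings."""
    sheet_id: str
    settings: Dict[str, AttributeSetting] = field(default_factory=dict)


# Hypothetical example: "version" on page 1 in area 1, underlined (check mark),
# keeping at most 11 characters.
sheet = ExtractionSheet(
    sheet_id="0001",
    settings={"version": AttributeSetting(page=1, area_id=1, decoration="check", max_chars=11)},
)
```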

[0058] The user may have any scanner 1a (or 1b, ...) scan the paper document that the document attribute extraction sheet 43' describes. To do so, the user places the sheet 43' on top of that paper document, as its cover, so that the scanner 1a scans the sheet 43' first. When a single scanner 1a scans several types of paper documents, the user places a document attribute extraction sheet at each boundary between documents of different types.

[0059] The scanner 1a scans the document attribute extraction sheet 43' and the paper document set on it. As already described, the document management processing AP 10 of the document management server 3 then identifies and analyzes the scanned sheet 43'. Following the information set on the sheet 43', it extracts attribute information from the image document created from the pages scanned after the sheet. It then assigns the extracted attribute information to that image document.

[0060] With reference to FIG. 9, the following describes how the document management processing AP 10 proceeds when the scanner 1a scans a paper document accompanied by the document attribute extraction sheet 43' described above.

[0061] When one of the scanners, for example the scanner 1a, is scanning a paper document (Yes in step S1), the scan data created by the scanner 1a is acquired (S2). The document management processing AP 10 checks each page of the acquired scan data (S3) and determines whether that page is a document attribute extraction sheet 43' (S4). When it identifies a document attribute extraction sheet 43' (Yes in S4), it treats the image document on the preceding pages and the image document on the following pages as different documents. It then extracts document attributes from the image document on the pages after the sheet 43'.
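Steps S3 and S4, combined with the boundary rule in [0058], amount to splitting the page stream at every extraction sheet. A minimal sketch follows. It assumes a caller-supplied predicate `is_extraction_sheet` standing in for the sheet identification unit 9. How a sheet is actually recognized is not modeled here. Pages before the first sheet are returned separately, since no sheet governs them.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Page = TypeVar("Page")


def split_at_sheets(
    pages: Sequence[Page],
    is_extraction_sheet: Callable[[Page], bool],
) -> Tuple[List[Page], List[Tuple[Page, List[Page]]]]:
    """Group scanned pages into (sheet, document pages) pairs.

    Returns the pages before the first sheet (no sheet applies to them) and,
    for each sheet found, the pages up to the next sheet.
    """
    leading: List[Page] = []
    groups: List[Tuple[Page, List[Page]]] = []
    for page in pages:
        if is_extraction_sheet(page):   # S4 "Yes": a new document starts after this page
            groups.append((page, []))
        elif groups:
            groups[-1][1].append(page)  # page belongs to the document of the latest sheet
        else:
            leading.append(page)        # no sheet seen yet
    return leading, groups
```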

[0062] When the sheet 43' is identified (Yes in S4), the document management processing AP 10 analyzes it using the information in the sheet format DB 11. That information comprises the position coordinates of the information setting boxes 51₁ to 51₃₀ on the document attribute extraction sheet 43'. It also includes the attribute item and document attribute type that each box corresponds to. Finally, it records each box's setting method, that is, whether a check mark or a number is entered. From this analysis the AP 10 acquires the information set in each of the boxes 51₁ to 51₃₀ of the sheet 43' (S5).
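In step S5, the sheet is read by looking up each box's coordinates, meaning, and setting method in the sheet format DB 11. The sketch below assumes a hypothetical `read_box` callback that returns the raw content recognized inside a bounding box: None when the box is empty, `'check'` for a mark, or the digits written. The `FormatEntry` record layout is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom) on the sheet image, assumed


@dataclass
class FormatEntry:
    """One row of the sheet format DB 11 (assumed layout)."""
    box_no: int     # 1..30 for boxes 51_1..51_30
    bbox: BBox      # position coordinates of the box on the sheet
    item: str       # attribute item, e.g. "Area", "Font"
    attribute: str  # document attribute type, e.g. "creator"
    method: str     # setting method: "number" or "check"


def analyze_sheet(
    format_db: List[FormatEntry],
    read_box: Callable[[BBox], Optional[str]],
) -> Dict[Tuple[str, str], Union[int, bool]]:
    """S5: return {(attribute, item): value} for every box that was filled in."""
    values: Dict[Tuple[str, str], Union[int, bool]] = {}
    for entry in format_db:
        raw = read_box(entry.bbox)
        if raw is None:
            continue  # empty box: no constraint for this item
        if entry.method == "check":
            values[(entry.attribute, entry.item)] = True
        else:
            values[(entry.attribute, entry.item)] = int(raw)  # numeric setting
    return values
```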

[0063] Next, the document management processing AP 10 identifies the area (position) in the image document itself where each document attribute to be extracted is written (S6). It does so by combining two sources:
- the position information acquired from the information setting boxes 51₁ to 51₁₀ of the sheet 43' (page numbers and area IDs), and
- the area information stored in the area information DB 15 (each area ID and the position of its area).

The AP 10 also identifies the character features of each document attribute to be extracted, as written in the image document itself (S7). For this it uses the character feature information acquired from the boxes 51₁₁ to 51₃₀ of the sheet 43'. That information comprises font IDs, character sizes in points, check marks indicating whether a character decoration is present, and numbers of characters to acquire. It also uses the font/font ID table stored in the character information DB 19.
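Step S6 turns the (page number, area ID) pair read from the sheet into a concrete region of the scanned page by looking up the area information DB 15. The sketch below models that DB as a mapping from area ID to a bounding box. `AREA_DB` and its coordinates are invented for illustration. Letting an unknown area ID raise `KeyError` is a design choice: a mis-filled sheet gets reported instead of silently skipped.

```python
from typing import Dict, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed pixel coordinates

# Hypothetical contents of the area information DB 15: area ID -> position.
AREA_DB: Dict[int, BBox] = {
    1: (1500, 100, 1900, 160),  # version
    2: (300, 400, 1700, 520),   # document name
}


def locate_attributes(
    sheet_positions: Dict[str, Tuple[int, int]],
    area_db: Dict[int, BBox],
) -> Dict[str, Tuple[int, BBox]]:
    """S6: map each attribute to (page number, bounding box) in the image document.

    sheet_positions holds {attribute: (page number, area ID)} as read from the
    sheet; an unknown area ID raises KeyError.
    """
    return {
        attr: (page, area_db[area_id])
        for attr, (page, area_id) in sheet_positions.items()
    }
```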

[0064] The document management processing AP 10 then analyzes the image document and extracts the document attributes from it, based on the identified areas and character features (S8). The AP 10 assigns the extracted document attributes to the corresponding image document (S9). That document consists of the pages from the one following the identified sheet 43' up to the page before the next document attribute extraction sheet. Finally, the AP 10 stores the image document in the document attribute/image document DB 27 (S10).
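Steps S8 to S10 can be outlined as below. This is a sketch under stated assumptions, and none of its names come from the patent:
- `ocr_runs` stands in for whatever character recognizer the server uses.
- The character-feature check from S7 is passed in as a predicate `accept`. Font, size, and decoration tests could be plugged in there.
- Runs are joined with spaces.
- The document attribute/image document DB 27 is modeled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed pixel coordinates


class Run(NamedTuple):
    """A recognized text run; assumed recognizer output, not defined by the patent."""
    text: str
    font_id: int
    size_pt: float


@dataclass
class Target:
    """Where and how one attribute is read, i.e. the outcome of steps S6 and S7."""
    page: int                                         # 1-based page in the image document
    bbox: BBox                                        # region from the area information DB 15
    accept: Callable[[Run], bool] = lambda run: True  # character-feature filter
    max_chars: Optional[int] = None                   # 'Number of characters to acquire'


def extract_attributes(
    pages: Sequence[Any],
    targets: Dict[str, Target],
    ocr_runs: Callable[[Any, BBox], List[Run]],
) -> Dict[str, str]:
    """S8: recognize each target region and keep the runs whose features match."""
    result: Dict[str, str] = {}
    for name, t in targets.items():
        runs = ocr_runs(pages[t.page - 1], t.bbox)
        text = " ".join(r.text for r in runs if t.accept(r))
        result[name] = text if t.max_chars is None else text[: t.max_chars]
    return result


def assign_and_store(db: Dict[str, dict], doc_key: str,
                     pages: Sequence[Any], attributes: Dict[str, str]) -> None:
    """S9 and S10: attach the attributes to the image document and save both."""
    db[doc_key] = {"pages": list(pages), "attributes": dict(attributes)}
```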

[0065] As described above, this embodiment reduces the user's task to two steps. The user enters the desired numbers (page numbers, sheet ID, area IDs, font IDs, and so on) or check marks on the document attribute extraction sheet 43. The user then scans the sheet as the cover of the corresponding paper document. The system extracts the document attributes from the paper document itself and assigns them to the image document. Because the user enters only numbers or check marks on the sheet 43, the amount to write stays small no matter how many characters the document attributes contain. Assigning document attributes, previously tedious and burdensome, thus becomes relatively easy for the user.

[0066] Furthermore, in this embodiment the document attribute extraction sheet 43 only needs to record where the document attributes are written on the paper document to be scanned. Paper documents may share the same format, meaning the same types of document attributes written at the same positions. In that case, a sheet 43 filled in once can be copied and reused. This was not possible with the conventional approach. Even when the paper documents shared the same format, the attribute contents differed: the document name, for example, appears at the same position, but the name itself differs. The document name, creator, and other attributes therefore had to be entered every time. Many paper documents, such as proposals, meeting minutes, and settlement documents, share the same format within each document type. Being able to copy and reuse a sheet once it has been used is therefore very convenient and efficient for the user.

[0067] A preferred embodiment of the present invention has been described above. It is an example for explaining the invention and is not intended to limit the scope of the invention to this embodiment. The present invention can be carried out in various other forms. For example, "document type" may be registered as an additional document attribute. Documents can then be managed by type, such as specifications, design documents, proposals, meeting minutes, settlement documents, order forms, and papers.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the overall configuration of a system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the document management server 3.

FIG. 3 is a diagram showing an example of a scanned image document 41.

FIG. 4 is a diagram showing an example of a document attribute extraction sheet.

FIG. 5 is a diagram showing the display screen of the image document 41 when the areas where document attributes are written are designated.

FIG. 6 is a diagram showing the display screen when an area ID is assigned to each of the areas 61 to 69.

FIG. 7 is a diagram showing the font/font ID table.

FIG. 8 is a diagram showing the document attribute extraction sheet with information set based on the image document 41.

FIG. 9 is a diagram showing the processing flow of the document management processing AP 10 when the scanner 1a scans a paper document accompanied by the document attribute extraction sheet 43'.

[Explanation of symbols]

1a, 1b, … Scanners
3 Document management server
7 Scanner control unit
9 Document attribute extraction sheet identification unit (sheet identification unit)
10 Document management processing application (document management processing AP)
11 Document attribute extraction sheet format database (sheet format DB)
13 Document attribute extraction sheet analysis unit (sheet analysis unit)
15 Area information database (area information DB)
17 Area information analysis unit
19 Character information database (character information DB)
21 Character information analysis unit
23 Document attribute extraction unit
25 Document attribute assignment unit
27 Document attribute/image document database (document attribute/image document DB)
31 Display
33 Input device
37 Document attribute extraction information creation application (extraction information creation AP)

Claims

[Claims]

1. A document attribute identification device comprising: input means for inputting document attribute-related information, which includes position information indicating the position of a document attribute written on a paper document to be converted into an image document and type information indicating the type of the document attribute written at that position; storage means for storing the input document attribute-related information; specifying means for specifying the position information of a desired type of document attribute among the document attributes written on the paper document to be converted into the image document; and identification means for extracting a document attribute from the image document of the paper document based on the specified position information and the stored document attribute-related information, and for identifying the type of the extracted document attribute.

2. The document attribute identification device according to claim 1, wherein the document attribute-related information includes a position identifier assigned to the position information, the specifying means specifies the position identifier of the position information of the desired type of document attribute, and the identification means extracts a document attribute from the image document of the paper document based on the specified position identifier and the document attribute-related information and identifies the type of the extracted document attribute.

3. The document attribute identification device according to claim 1, wherein the document attribute-related information includes character feature information indicating features of the characters in which the document attribute is written, the specifying means specifies the character feature information of the characters in which the desired type of document attribute is written, and the identification means extracts a document attribute from the image document of the paper document based on the specified character feature information and the document attribute-related information and identifies the type of the extracted document attribute.

4. A document attribute identification method comprising the steps of: inputting document attribute-related information, which includes position information indicating the position of a document attribute written on a paper document to be converted into an image document and type information indicating the type of the document attribute written at that position; storing the input document attribute-related information; specifying the position information of a desired type of document attribute among the document attributes written on the paper document to be converted into the image document; and extracting a document attribute from the image document of the paper document based on the specified position information and the stored document attribute-related information, and identifying the type of the extracted document attribute.

5. A computer-readable recording medium storing a program for causing a computer to execute the steps of: inputting document attribute-related information, which includes position information indicating the position of a document attribute written on a paper document to be converted into an image document and type information indicating the type of the document attribute written at that position; storing the input document attribute-related information; specifying the position information of a desired type of document attribute among the document attributes written on the paper document to be converted into the image document; and extracting a document attribute from the image document of the paper document based on the specified position information and the stored document attribute-related information, and identifying the type of the extracted document attribute.