JP2002342343A

JP2002342343A - Document managing system

Info

Publication number: JP2002342343A
Application number: JP2001149808A
Authority: JP
Inventors: Shinobu Yamamoto; 忍山本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-05-18
Filing date: 2001-05-18
Publication date: 2002-11-29

Abstract

PROBLEM TO BE SOLVED: To provide a document management system that can automatically classify an input document image by giving it suitable attributes, even when the image is not a fixed form. SOLUTION: When a document image is input by an image input unit 1, an area information extraction unit 2 divides the document image into text areas, table areas, ruled lines, and figure areas. A layout information extraction unit 22 extracts layout information, on the basis of a layout database 23, from the document image divided into areas by the area identification processing. After the layout information is extracted, a character recognition unit 24 performs character recognition on an area at a specified position. At the same time, a bar code recognition unit 26 performs bar code recognition. The information read by the character recognition or bar code recognition is added to the document image as an attribute, and the document image is saved in a document image storage unit 4. The document image can later be retrieved by using these attributes.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a document management system that assigns one or more attributes to an input document image and manages the image by those attributes.

[0002]

[Description of the Related Art] Document image management systems such as electronic document filing apparatuses commonly manage input documents by assigning one or more attributes to each of them. Conventionally, the attributes added to a document image are pieces of information that the document management system can obtain easily, such as the date and time at which the document image was registered or the person who registered it. Attributes other than such readily available information are generally added by hand: the operator types a character string directly or selects the attribute to add from a previously registered attribute list. Another approach is to sort the documents by type before input and then input each type as a batch of document images. As a method of inputting and managing document images in this way, Japanese Patent Application Laid-Open No. H10-240958 describes a management information extracting apparatus and method that identifies a fixed form, reads the information contained in the document, and manages the read information in association with the document, so that information is extracted from the document automatically and the operator's work is reduced.

[0003]

[Problems to be Solved by the Invention] However, with the common manual methods, in which the operator enters character strings or selects the attributes to add, and with the method of sorting documents by type before inputting them in batches, the operator's workload becomes very heavy as the types and number of document images grow, which is inefficient. In the invention described in Japanese Patent Application Laid-Open No. H10-240958, the attributes added to a document are read from information written on the document. The attributes actually needed are not necessarily written on the document in question, so accurate attributes sometimes cannot be extracted. In such cases the desired attributes must still be added by hand after the document is input. To reduce that work, a method has been proposed in which the attributes to be given are defined in advance for each fixed form of input document, and the attributes of a document image are determined by identifying which form the input image is. However, the document images input to a document image management system are not necessarily fixed forms.

[0004] Accordingly, an object of the present invention is to provide a document management system that assigns attributes by using the area information of an input document image, so that input document images can be given appropriate attributes and classified automatically even when they are not fixed forms, and so that document images input later by an operator can also be given attributes and classified.

[0005]

[Means for Solving the Problems] The invention according to claim 1 achieves the above object by comprising: document image acquiring means for acquiring a document image; area information extracting means for extracting area information, such as document areas, character areas, table areas, and drawing areas, from the document image acquired by the document image acquiring means; attribute adding means for adding the area information extracted by the area information extracting means to the document image as an attribute; and document image storage means for classifying the document images to which attributes have been added by the attribute adding means by identical attribute and storing them.

[0006] In the invention according to claim 2, in the invention according to claim 1, the area information extracting means also extracts the type and number of each of the areas extracted as the area information, such as document areas, table areas, and drawing areas, and the attribute adding means also adds the type and number of the areas as attributes, thereby achieving the above object. In the invention according to claim 3, in the invention according to claim 1 or claim 2, the area information extracting means extracts, as layout information, the position on the sheet of each area extracted as the area information, and the attribute adding means also adds this layout information as an attribute, thereby achieving the above object. In the invention according to claim 4, in the invention according to any one of claims 1 to 3, the area information extracting means extracts, as character information, the characters contained in a predetermined area of the area information, and the attribute adding means also adds this character information as an attribute, thereby achieving the above object. In the invention according to claim 5, in the invention according to any one of claims 1 to 4, when a bar code is attached to the document image, the area information extracting means extracts the bar code as bar code information, and the attribute adding means also adds this bar code information as an attribute, thereby achieving the above object.

[0007] The invention according to claim 6 achieves the above object by comprising: document image acquiring means for acquiring a document image; area information extracting means for extracting area information, such as document areas, character areas, table areas, and drawing areas, from the document image acquired by the document image acquiring means; document image classifying means for combining two or more pieces of the area information and classifying the type of the document image according to each combination of area information; attribute adding means for adding the area information extracted by the area information extracting means to the document image as an attribute; and document image storage means for classifying, by type, the document images to which attributes have been added by the attribute adding means, on the basis of the classification by the document image classifying means, and storing them.

[0008]

[Embodiments of the Invention] Preferred embodiments of the present invention are described in detail below with reference to FIGS. 1 to 6. FIG. 1 is a block diagram showing the schematic configuration of the document management system of the first embodiment. As shown in FIG. 1, the document management system includes an image input unit 1 that inputs a document image by using an image input device such as a scanner, an area information extraction unit 2 that extracts area information from the input document image, a document image management unit 3 that adds the extracted area information to the document image as an attribute, and a document image storage unit 4 that stores the input document image.

[0009] FIG. 2 shows the area information extraction unit 2. As shown in FIG. 2(a), the area information extraction unit 2 includes an area identification unit 21 that divides the document image into one or more areas, such as text, figures, and tables, according to the area information to be extracted, and identifies them. As shown in FIG. 2(b), a layout information extraction unit 22, which extracts the layout information of the areas on the basis of a previously created layout database 23, may be used in combination with the area identification unit 21. As shown in FIG. 2(c), a character recognition unit 24, which recognizes the characters in a specific area on the basis of a previously created character recognition dictionary 25 and extracts them as character information, may also be used in combination.

[0010] Further, as shown in FIG. 2(d), a bar code recognition unit 26, which recognizes the bar code in a specific area on the basis of a bar code dictionary 27 and extracts it as bar code information, may be used in combination. The area identification unit 21, layout information extraction unit 22, layout database 23, character recognition unit 24, character recognition dictionary 25, bar code recognition unit 26, and bar code dictionary 27 shown in FIGS. 2(a) to 2(d) may be combined as appropriate to form the area information extraction unit 2, or all of them may be combined and used as the area information extraction unit 2.

[0011] Next, the operation of the document management system according to this embodiment is described. FIG. 3 is a flowchart showing the procedure for adding attributes to an input document image in the document management system of the first embodiment. First, when the image input unit 1 inputs a document image (step 10), the area information extraction unit 2 performs area identification processing on the input document image (step 11). The area identification processing by the area information extraction unit 2 is as follows. In one possible method, the input document image is compressed into a binary compressed image, the rectangles circumscribing the connected components of black pixels in the binary compressed image are extracted, the rectangles are classified into character rectangles and other rectangles, and the rectangles of each class are merged to extract character areas and other areas. The rectangles extracted here are assumed to include rectangles with various attributes, such as single characters, character strings of two or more characters, ruled lines, tables, figures, and graphs. In this way the document image is divided into text areas, table areas, ruled lines, figure areas, and so on. By adding information on the types and numbers of these areas to the document image as attributes, the document management system can later be searched using, for example, the number of figures or tables contained in a document image.
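
The rectangle-extraction step described above can be sketched roughly in Python as follows. This is only an illustration under stated assumptions: the image is a small list-of-lists binary bitmap, components are 4-connected, and the size threshold `max_char_size` and the two labels are hypothetical, since the specification does not give concrete classification rules. The merging of rectangles into areas is omitted.

```python
from collections import Counter, deque


def bounding_rects(img):
    """Return (top, left, bottom, right) for each 4-connected black (1) component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            rects.append((top, left, bottom, right))
    return rects


def classify_rect(rect, max_char_size=3):
    """Hypothetical rule: small rectangles are characters, all others are 'other'."""
    top, left, bottom, right = rect
    height, width = bottom - top + 1, right - left + 1
    return "character" if max(height, width) <= max_char_size else "other"


def region_count_attributes(img):
    """Count rectangles per class; the counts serve as searchable attributes."""
    return dict(Counter(classify_rect(r) for r in bounding_rects(img)))


# Sample bitmap (an assumption): two small blobs and one hollow frame.
IMAGE = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
]
print(region_count_attributes(IMAGE))
```

A real system would work on a scanned page and would need further rules to tell tables, ruled lines, and figures apart; the sketch only shows how per-class counts can be derived from circumscribed rectangles.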

[0012] Next, the layout information extraction unit 22 extracts layout information, on the basis of the layout database 23, from the document image that has been divided into areas by the area identification processing (step 12). In one possible method, the layout information extraction unit 22 divides the area-identified document image into finer areas and lines, and obtains, as features of the layout, the coordinate positions of elements such as areas and lines, together with the size of the characters and the number of lines contained in each area or line. The values detected by the layout information extraction unit 22 as the coordinates of each element (area, line, number of characters, and so on) constitute the layout information of the document image. By extracting the layout information of each area in this way and adding it as an attribute of the document image, the document management system can later be searched using, for example, the layout of text, figures, and tables.
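
One way to picture layout information as a searchable attribute is the following Python sketch. The `Region` type, the normalization of coordinates to the page size, and the `has_region_in` query are assumptions introduced for illustration; the specification only says that coordinate values of the elements form the layout information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str   # e.g. "text", "figure", "table" (labels are assumptions)
    left: int
    top: int
    right: int
    bottom: int


def layout_attributes(regions, page_width, page_height):
    """Normalize region coordinates to 0..1 so pages of different sizes compare."""
    return [
        {
            "kind": r.kind,
            "x": round(r.left / page_width, 3),
            "y": round(r.top / page_height, 3),
            "w": round((r.right - r.left) / page_width, 3),
            "h": round((r.bottom - r.top) / page_height, 3),
        }
        for r in regions
    ]


def has_region_in(layout, kind, x_range, y_range):
    """True if a region of `kind` has its top-left corner inside the given ranges."""
    return any(
        a["kind"] == kind
        and x_range[0] <= a["x"] <= x_range[1]
        and y_range[0] <= a["y"] <= y_range[1]
        for a in layout
    )


# Hypothetical page: a heading near the top and a figure at the lower right.
regions = [Region("text", 100, 50, 900, 150), Region("figure", 600, 1000, 900, 1300)]
layout = layout_attributes(regions, page_width=1000, page_height=1400)
print(has_region_in(layout, "figure", (0.5, 1.0), (0.5, 1.0)))
```

A query such as "documents with a figure in the lower-right quadrant" then reduces to a range test on the stored coordinates.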

[0013] After the layout information is extracted (step 12), the character recognition unit 24 performs character recognition on an area at a specific position (step 13). At the same time as the character recognition, the bar code recognition unit 26 performs bar code recognition (step 14). Alternatively, bar code recognition alone may be performed without character recognition. The information read by the character recognition or bar code recognition is added to the document image as an attribute (step 15), and the document image is saved (step 16). Character strings and bar codes that are useful as information about a document image are often placed in specific areas such as the heading, header, or footer. By extracting that information and managing document images with it added as an attribute, document images can later be searched for in the document management system by using this attribute.
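
The flow of steps 11 to 16 can be sketched in Python as below. The function `register_document`, its parameter names, and the record layout are hypothetical; the recognizers are passed in as plain callables so that the sketch stays independent of any particular OCR or bar code library.

```python
def register_document(image, identify_regions, extract_layout,
                      recognize_text=None, recognize_barcode=None, storage=None):
    """Run steps 11-16 on one already-input image and return the stored record."""
    storage = storage if storage is not None else []
    regions = identify_regions(image)                      # step 11: area identification
    layout = extract_layout(regions)                       # step 12: layout extraction
    attributes = {"regions": regions, "layout": layout}
    if recognize_text is not None:                         # step 13: may be skipped
        attributes["text"] = recognize_text(image, layout)
    if recognize_barcode is not None:                      # step 14: bar code recognition
        attributes["barcode"] = recognize_barcode(image, layout)
    record = {"image": image, "attributes": attributes}    # step 15: attach attributes
    storage.append(record)                                 # step 16: save
    return record


# Stand-in recognizers (assumptions) that return fixed values.
store = []
record = register_document(
    image="page-1.png",
    identify_regions=lambda img: {"text": 4, "table": 2, "figure": 1},
    extract_layout=lambda regions: {"layout_id": 4},
    recognize_barcode=lambda img, layout: "4901234567894",
    storage=store,
)
print(sorted(record["attributes"]))
```

Here character recognition is left out and only bar code recognition runs, which corresponds to the variation mentioned in the paragraph above.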

[0014] As described above, in this embodiment, area information is extracted automatically from a document image and added as an attribute, so that a document management system in which documents are easy to search by attribute can be realized, without manual input by the operator, even for document images that are not fixed forms. Because the document management system of this embodiment adds area information that can be extracted automatically from the input document image as attribute values, the operator's work at registration time, such as adding attribute values to each image, is greatly reduced. In addition, because the system adds, as attributes of the document image, the types of areas (character areas, table areas, ruled lines, figure areas, and so on) and the number of each, searches can be performed easily with simple processing.

[0015] Because the document management system of this embodiment uses layout information as area information, it can add attribute values that distinguish document images which cannot be told apart by the types and numbers of areas alone, so document images can be searched with more detailed information. Because it uses the character information written in a specific area as area information, it can add attribute values that distinguish document images which cannot be told apart by layout information alone, again allowing searches with more detailed information. Likewise, because it uses the bar code information written in a specific area as area information, it can add attribute values that distinguish document images which cannot be told apart by layout information alone, allowing searches with more detailed information.

[0016] FIG. 4 shows the schematic configuration of the document management system of the second embodiment. The document management system of the second embodiment adds to the system of the first embodiment a document image classification table 5 that holds predefined area information and document image types. In the following, parts that are the same as in the first embodiment are described using the same reference numbers. The document image management unit 3 adds the extracted area information to the document image as an attribute, classifies the document image by type, and stores it in the document image storage unit 4.

[0017] FIG. 5 is a flowchart showing the procedure for adding attributes to an input document image in the document management system of the second embodiment. First, when the image input unit 1 inputs a document image (step 20), the area information extraction unit 2 performs area identification processing on the input document image (step 21). Next, the layout information extraction unit 22 extracts layout information, on the basis of the layout database 23, from the document image divided into areas by the area identification processing (step 22). After the layout information is extracted (step 22), the character recognition unit 24 performs character recognition on an area at a specific position (step 23). At the same time as the character recognition, the bar code recognition unit 26 performs bar code recognition (step 24). Alternatively, bar code recognition alone may be performed without character recognition.

[0018] After layout information extraction, character recognition, and bar code recognition (steps 22 to 24), the extracted information is compared against the document image types held in the document image classification table 5 (step 25). The document image classification table 5 is assumed to have been created in advance as a table that classifies document images by combinations of one or more pieces of area information and under which the classified document images are saved. FIG. 6 shows an example of the document image classification table, in which two document image types are given. Document image type number "001" represents a document image whose area information has one figure, two tables, and four character areas. Document image type number "002" represents a document image whose layout information is layout No. 4 registered in the layout database 23, in which character area 1 contains the character string "報告書" (report) and character area 2 contains the character string "研究所" (research laboratory).
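
The two example entries of FIG. 6 and the comparison in step 25 might be sketched in Python as follows. The key names, the English stand-in strings, and the exact-equality matching rule are assumptions made for illustration; the specification does not state how the comparison is implemented.

```python
# Hypothetical encoding of the two document image types described for FIG. 6.
CLASSIFICATION_TABLE = {
    "001": {"figure_count": 1, "table_count": 2, "text_area_count": 4},
    "002": {"layout_id": 4, "text_area_1": "report", "text_area_2": "laboratory"},
}


def classify(attributes, table=CLASSIFICATION_TABLE):
    """Return the first type number whose every condition equals the extracted value."""
    for type_number, conditions in table.items():
        if all(attributes.get(key) == value for key, value in conditions.items()):
            return type_number
    return None  # no matching type is registered


extracted = {"figure_count": 1, "table_count": 2, "text_area_count": 4, "layout_id": 9}
print(classify(extracted))
```

Attributes that a type does not mention (here `layout_id` for type "001") are simply ignored, so each type is defined only by the combination of area information listed for it.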

[0019] After the area information extracted from the input document image has been compared against the document image classification table 5, the matching document image type is determined (step 25), and attributes are added to the input document image according to that type (step 26). After the attributes are added (step 26), the document image is classified (step 27) and saved in the document image storage unit 4 (step 28). As described above, the document management system of this embodiment can search and classify document images even when they are not fixed forms.

[0020] For example, the front page of a published patent gazette has a bar code area at the upper right, followed from the top by character string areas for the heading "公開特許公報" (published patent gazette), symbols, information on the application, the title of the invention, the abstract, and so on, with the representative drawing at the lower right. Every front page of a published patent gazette contains the same area information, but the positions of the areas can shift with the amount of information, such as the number of IPC classification symbols or the number of inventors. The page is therefore not a fixed form, and a system that classifies document images by fixed form may not be able to classify such documents easily. More generally, even when documents are not fixed forms, document images of the same type often contain the same area information. In such cases, defining the document image classification table 5 with area information that does not change with the amount of information, as in this embodiment, allows the images to be classified as the same type.

[0021] In this way, this embodiment can automatically classify as the same type document images that are not fixed forms but have the same kind of content. When searching for document images, the search can be limited to a particular document image type, so an efficient document management system can be realized. In addition, because the document management system of this embodiment classifies document images by combinations of area information, documents of the same type can easily be managed, classified, and searched.
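
A minimal Python sketch of storing document images by type and limiting a search to one type is shown below. The `DocumentStore` class and its methods are hypothetical and stand in for the document image storage unit 4; a real system would use persistent storage.

```python
from collections import defaultdict


class DocumentStore:
    """In-memory stand-in for type-wise storage of document images."""

    def __init__(self):
        self._by_type = defaultdict(list)

    def save(self, image_id, type_number, attributes):
        """File the image under its document image type."""
        self._by_type[type_number].append({"id": image_id, "attributes": attributes})

    def search(self, type_number, **criteria):
        """Look only at images of one type and match the given attribute values."""
        return [
            rec["id"]
            for rec in self._by_type.get(type_number, [])
            if all(rec["attributes"].get(k) == v for k, v in criteria.items())
        ]


store = DocumentStore()
store.save("a", "001", {"figure_count": 1})
store.save("b", "001", {"figure_count": 2})
store.save("c", "002", {"figure_count": 1})
print(store.search("001", figure_count=1))
```

Because the records are grouped by type before any attribute is compared, a query never touches images of other types, which is the efficiency gain the paragraph above refers to.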

[0022]

[Effects of the Invention] The invention according to claim 1 comprises attribute adding means for adding the area information extracted by the area information extracting means to the document image as an attribute, and document image storage means for classifying the document images to which attributes have been added by the attribute adding means by identical attribute and storing them. The operator's work when registering document images, such as adding attribute values to each image, can therefore be greatly reduced.

[0023] In the invention according to claim 2, the area information extracting means also extracts the type and number of each of the areas extracted as area information, such as document areas, table areas, and drawing areas, and the attribute adding means also adds the type and number of the areas as attributes. Attribute values that make document images easy to search can therefore be added with simple processing.

[0024] In the invention according to claim 3, the area information extracting means extracts, as layout information, the position on the sheet of each area extracted as area information, and the attribute adding means also adds this layout information as an attribute. Attribute values that distinguish document images which cannot be told apart by the types and numbers of areas alone can therefore be added, and document images can be searched with more detailed information. In the invention according to claim 4, the area information extracting means extracts, as character information, the characters contained in a predetermined area of the area information, and the attribute adding means also adds this character information as an attribute. Attribute values that distinguish document images which cannot be told apart by layout information alone can therefore be added, and document images can be searched with more detailed information. In the invention according to claim 5, when a bar code is attached to the document image, the area information extracting means extracts the bar code as bar code information, and the attribute adding means also adds this bar code information as an attribute. Attribute values that distinguish document images which cannot be told apart by layout information alone can therefore be added, and document images can be searched with more detailed information.

[0025] The invention according to claim 6 comprises: area information extracting means for extracting area information, such as document areas, character areas, table areas, and drawing areas, from the document image acquired by the document image acquiring means; document image classifying means for combining two or more pieces of area information and classifying the type of the document image according to each combination of area information; attribute adding means for adding the area information extracted by the area information extracting means to the document image as an attribute; and document image storage means for classifying, by type, the document images to which attributes have been added by the attribute adding means, on the basis of the classification by the document image classifying means, and storing them. Documents can therefore be managed by type, and an efficient document image management system can be realized.

[Brief description of the drawings]

【図１】第１の実施形態の文書管理システムの概略構成
を示したブロック図である。FIG. 1 is a block diagram illustrating a schematic configuration of a document management system according to a first embodiment.

【図２】領域情報抽出部を示した図である。FIG. 2 is a diagram illustrating an area information extraction unit.

【図３】第１の実施形態の文書管理システムによる入力
文書画像の属性付加処理の処理手順を示したフローチャ
ートである。FIG. 3 is a flowchart illustrating the procedure for adding attributes to an input document image by the document management system according to the first embodiment.

【図４】第２の実施形態の文書管理システムの概略構成
を示したブロック図である。FIG. 4 is a block diagram illustrating a schematic configuration of a document management system according to a second embodiment.

【図５】第２の実施形態の文書管理システムによる入力
文書画像の属性付加処理の処理手順を示したフローチャ
ートである。FIG. 5 is a flowchart illustrating the procedure for adding attributes to an input document image by the document management system according to the second embodiment.

【図６】文書画像分類テーブルの一例を示した図であ
る。FIG. 6 is a diagram illustrating an example of a document image classification table.

[Explanation of symbols]

1 画像入力部 (Image input part)
2 領域情報抽出部 (Area information extraction part)
3 文書画像管理部 (Document image management part)
4 文書画像記憶部 (Document image storage part)
21 領域識別部 (Area identification part)
22 レイアウト情報抽出部 (Layout information extraction part)
23 レイアウトデータベース (Layout database)
24 文字認識部 (Character recognition part)
25 文字認識辞書 (Character recognition dictionary)
26 バーコード認識部 (Bar code recognition part)
27 バーコード辞書 (Bar code dictionary)

Claims

[Claims]

1. A document management system comprising: document image obtaining means for obtaining a document image; area information extracting means for extracting area information, such as a document area, a character area, a table area, and a drawing area, from the document image obtained by the document image obtaining means; attribute adding means for adding the area information extracted by the area information extracting means to the document image as an attribute; and document image storage means for classifying by attribute and storing the document image to which the attribute has been added by the attribute adding means.

2. The document management system according to claim 1, wherein the area information extracting means also extracts the type and number of the areas, such as document areas, table areas, and drawing areas, extracted as the area information, and the type and the number are added as attributes.

3. The document management system according to claim 1, wherein the area information extracting means extracts, as layout information, the position on a sheet of each area extracted as the area information, and the attribute adding means adds the layout information as an attribute.

4. The document management system according to claim 1, 2, or 3, wherein the area information extracting means extracts the characters contained in a predetermined area of the area information as character information, and the attribute adding means adds the character information as an attribute.

5. The document management system according to any one of claims 1, 2, 3, and 4, wherein, when a bar code is attached to the document image, the area information extracting means extracts the bar code as bar code information, and the attribute adding means adds the bar code information as an attribute.

6. A document management system comprising: document image acquiring means for acquiring a document image; area information extracting means for extracting area information, such as a document area, a character area, a table area, and a drawing area, from the document image acquired by the document image acquiring means; document image classifying means that combines two or more pieces of the area information and classifies the type of the document image for each combination of the area information; attribute adding means for adding the area information extracted by the area information extracting means to the document image as an attribute; and document image storage means for classifying by type and storing the document images to which attributes have been added by the attribute adding means, based on the classification by the document image classifying means.
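
The attributes recited in claims 1 to 5 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the classes `Area` and `DocumentImage`, the functions `add_attributes` and `search`, the attribute key names, and the `(x, y, width, height)` layout format are all hypothetical. Area extraction, character recognition, and bar code recognition are assumed to have already run, and only the step of attaching their results as attributes and searching on them is shown.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Area:
    kind: str                    # e.g. "character", "table", "drawing"
    box: tuple                   # assumed layout format: (x, y, width, height)
    text: Optional[str] = None   # recognized characters, if any


@dataclass
class DocumentImage:
    image_id: str
    attributes: dict = field(default_factory=dict)


def add_attributes(image, areas, barcode=None):
    """Attach area-derived attributes to a document image."""
    # Type and number of areas (claim 2).
    image.attributes["area_counts"] = dict(Counter(a.kind for a in areas))
    # Position of each area on the sheet, as layout information (claim 3).
    image.attributes["layout"] = [(a.kind, a.box) for a in areas]
    # Characters read from areas that carry recognized text (claim 4).
    image.attributes["text"] = [a.text for a in areas if a.text]
    # Bar code information, only when a bar code was found (claim 5).
    if barcode is not None:
        image.attributes["barcode"] = barcode
    return image


def search(images, key, value):
    """Return ids of stored images whose attribute `key` equals `value`."""
    return [img.image_id for img in images if img.attributes.get(key) == value]
```

For example, an image whose attributes include `"barcode": "4901234567894"` would be returned by `search(images, "barcode", "4901234567894")`, while images without a bar code attribute are skipped.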