JP2003178071A

JP2003178071A - Document management system

Info

Publication number: JP2003178071A
Application number: JP2001375696A
Authority: JP
Inventors: Shinobu Yamamoto; 忍山本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-12-10
Filing date: 2001-12-10
Publication date: 2003-06-27

Abstract

PROBLEM TO BE SOLVED: To provide a document management system capable of efficiently classifying document images by extracting the area image of a drawing or a photograph and using it as an attribute.

SOLUTION: When an image input part 1 inputs a document image, an area discriminating part 2 executes area discriminating processing on the document image. Among the areas that the area discriminating part 2 has divided into sentence areas, table areas, ruled lines, drawing areas and photograph areas, an image cutout part 3 classifies the drawing areas and photograph areas together as 'drawing areas' and cuts out a partial image of each such area. A document image management part 4 adds the cut-out partial image to the management information of the document image as attribute information and stores it in a document image storing part 5. When two or more drawing areas or photograph areas exist in the input document image, the image cutout part 3 can cut out either all of them or a representative one among them as the partial image.

COPYRIGHT: (C)2003, JPO

Description

Detailed Description of the Invention

[0001] [Technical Field of the Invention] The present invention relates to a document image management system that assigns one or more attributes to an input document image and manages it.

[0002] [Prior Art] Generally, document image management systems such as electronic document filing apparatuses often manage an input document by assigning it one or more attributes. Conventionally, the attributes added to a document image are information that the document management system can obtain easily, such as the date and time the document image was registered or the person who registered it. Attributes beyond this easily obtained information are generally added by hand: the operator types a character string directly, or selects the attribute to add from a pre-registered attribute list. Another method is to sort documents by type before input and then input each type collectively as document images. As a method of inputting and managing such document images, Japanese Laid-Open Patent Publication No. 10-240958 describes a management information extracting apparatus and method for extracting management information from an image. It identifies a fixed form, reads the information contained in the document, and manages the read information in association with the document, thereby extracting information from the document automatically and reducing the operator's work.

[0003] [Problems to Be Solved by the Invention] However, with general manual methods such as the operator entering a character string or selecting the attribute to add, or with methods that sort documents by type before inputting them collectively as document images, the load on the operator becomes very large as the number and variety of document images increase, which is inefficient. In the invention described in Japanese Laid-Open Patent Publication No. 10-240958, the attributes added to a document are read from information written on the document. The attribute that is actually needed is not necessarily written on the document in question, so accurate attributes cannot always be extracted. In such cases the desired attribute still has to be added manually after the document is input. To reduce the operator's work in that situation, a method has also been proposed in which the attributes to be given are defined in advance for each fixed form of input document, and the attributes given to a document image are determined by identifying which form the input document image is. However, the document images input to a document image management system are not necessarily fixed forms.

[0004] Furthermore, document management systems have been proposed that reduce the operator's manual load by assigning attributes based on the area information of the input document image, so that appropriate attributes can be given to an input document image and the image classified automatically even when it is not a fixed form. Because such a system assigns attributes using the area information of the input document image, it can give appropriate attributes and classify the image automatically without requiring information to be written at a specific position on the document image and without requiring a fixed form. The information used as attributes in such cases is based on area identification results, layout information, character recognition results, barcode recognition results, and the like. In addition to this information, however, logos, graphs, other figures, and photographs contained in a document image can also be important features when classifying document images. For example, if documents are classified by whether they contain similar figures or photographs, they can be classified efficiently without examining their contents in detail.

[0005] Accordingly, a first object of the present invention is to provide a document management system that, when a figure or photograph exists in an input document image, extracts the area image of that figure or photograph and uses it as an attribute, so that input document images can be classified efficiently. A second object of the present invention is to provide a document management system that adds attributes such as the extracted figure or photograph to a document image input later by an operator and automatically classifies document images based on the added attributes, thereby reducing the operator's manual workload.

[0006] [Means for Solving the Problems] The invention according to claim 1 achieves the first and second objects by comprising: document image acquiring means for acquiring a document image; image area recognizing means for recognizing whether one or more drawing areas or photograph areas exist in the document image acquired by the document image acquiring means; area information extracting means for extracting, when the image area recognizing means recognizes that one or more drawing areas or photograph areas exist in the document image, the recognized drawing area or photograph area as drawing area information; attribute adding means for adding the drawing area information extracted by the area information extracting means to the document image as an attribute; and document image storing means for storing the document image to which the attribute adding means has added the attribute.

[0007] In the invention according to claim 2, which depends on claim 1, the area information extracting means extracts all of the plurality of drawing areas or photograph areas recognized by the image area recognizing means as drawing area information, thereby achieving the first and second objects. The invention according to claim 3, which depends on claim 2, further comprises area information selecting means for prompting selection of the drawing area information to be added as the attribute when the area information extracting means has extracted a plurality of pieces of drawing area information. The attribute adding means adds the drawing area information selected through the area information selecting means to the document image as an attribute, thereby achieving the first and second objects.

[0008] The invention according to claim 4, which depends on any one of claims 1, 2, and 3, further comprises feature extracting means for extracting features of the one or more pieces of drawing area information extracted by the area information extracting means. The attribute adding means also adds the features extracted by the feature extracting means to the document image as attributes, thereby achieving the first and second objects.

[0009] The invention according to claim 5, which depends on claim 4, further comprises document image classifying means for classifying the document images based on the features extracted by the feature extracting means. The document image storing means stores the document images separately for each classification made by the document image classifying means, thereby achieving the first and second objects.

[0010] [Embodiments of the Invention] Preferred embodiments of the present invention are described in detail below with reference to FIGS. 1 to 5. FIG. 1 is a block diagram showing the schematic configuration of the document management system according to the first embodiment. As shown in FIG. 1, the document management system comprises: an image input unit 1 that inputs a document image using an image input device such as a scanner; an area identification unit 2 that divides the input document image into one or more areas such as text, drawings, photographs, and tables and identifies them; an image cutout unit 3 that cuts out drawing or photograph areas based on the area identification result from the area identification unit 2; a document image management unit 4 that adds the drawing or photograph areas cut out by the image cutout unit 3 to the document image as attributes; and a document image storage unit 5 that stores the input document image.

[0011] FIG. 2 is a flowchart showing the procedure of the image cutout processing and attribute addition processing performed on an input document image by the document management system of the first embodiment. First, when the image input unit 1 inputs a document image (step 201), the area identification unit 2 executes area identification processing on the input document image (step 202). The area identification processing by the area identification unit 2 is as follows. One possible method is to compress the input document image into a binary compressed image, extract the rectangles circumscribing the connected components of black pixels in that binary compressed image, classify each rectangle as either a character rectangle or some other rectangle, and extract character areas and other areas by merging the rectangles of each class. The rectangles extracted here include rectangles with various attributes, such as a single character, a string of two or more characters, ruled lines, tables, drawings, graphs, and photographs. In this way the document image is divided into text areas, table areas, ruled lines, drawing areas, photograph areas, and so on. By adding information about the types and number of these areas to the document image as attributes, the document management system can later be searched using, for example, the number of drawings or tables contained in a document image.
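The circumscribed-rectangle step of the area identification processing can be illustrated with a minimal Python sketch. It assumes the binary image is a list of rows in which 1 marks a black pixel, uses 8-connectivity, and classifies rectangles with a hypothetical size threshold (`max_char_size`). The compression step and the merging of rectangles into areas are not shown, and none of these names come from the specification.

```python
from collections import deque


def bounding_boxes(binary):
    """Return (top, left, bottom, right) for each 8-connected group of 1-pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classify_box(box, max_char_size=3):
    """Hypothetical rule: small rectangles are characters, all others are 'other'."""
    top, left, bottom, right = box
    box_height, box_width = bottom - top + 1, right - left + 1
    if box_height <= max_char_size and box_width <= max_char_size:
        return "character"
    return "other"


# Toy page: a 2x2 blob (character-sized) and a 4x5 frame (drawing-sized).
page = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
]
boxes = bounding_boxes(page)
labels = [classify_box(b) for b in boxes]
```

A real classifier would need more than rectangle size to separate ruled lines, tables, drawings, and photographs; the threshold here only shows where such a rule would plug in.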

[0012] Next, among the areas that the area identification unit 2 has divided into text areas, table areas, ruled lines, drawing areas, photograph areas, and so on, the image cutout unit 3 classifies the drawing areas and photograph areas together as "drawing areas" and cuts out a partial image of each area so classified (step 203). The document image management unit 4 then adds the cut-out partial image to the management information of the document image as attribute information of the document image (step 204), and stores the document image in the document image storage unit 5 together with its attribute information (step 205). As a result, when a user of the document management system views the list of management information for the managed document images, the user can easily select and retrieve the desired document image just by glancing at the drawing or photograph areas that were cut out of each document image and are displayed alongside it.
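Steps 203 to 205 can be sketched in Python as follows. This is an illustrative sketch only: the `Region` and `DocumentRecord` types, the attribute keys, and the use of a plain list as the storage unit are all assumptions, not structures defined in the specification.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Region:
    kind: str    # "text", "table", "ruled_line", "drawing" or "photo" (assumed labels)
    top: int
    left: int
    bottom: int  # inclusive
    right: int   # inclusive


@dataclass
class DocumentRecord:
    image: list
    attributes: dict = field(default_factory=dict)


def crop(image, region):
    """Cut the rectangle described by `region` out of a row-major image."""
    return [row[region.left:region.right + 1]
            for row in image[region.top:region.bottom + 1]]


def register_document(image, regions, store):
    """Cut out drawing/photo areas, attach them as attributes, and store the record."""
    # Step 203: drawing and photograph areas are treated together as "drawing areas".
    drawing_areas = [r for r in regions if r.kind in ("drawing", "photo")]
    record = DocumentRecord(image=image)
    # Step 204: the cut-out partial images become attribute information.
    record.attributes["drawing_images"] = [crop(image, r) for r in drawing_areas]
    # Area type counts, usable later for searches such as "documents with two tables".
    record.attributes["region_counts"] = dict(Counter(r.kind for r in regions))
    # Step 205: store the document image together with its attributes.
    store.append(record)
    return record
```

A listing screen could then show `record.attributes["drawing_images"]` next to each entry so that the user can pick a document at a glance.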

[0013] When the input document image contains a plurality of drawing areas or photograph areas, the image cutout unit 3 may cut out all of them as partial images. In this case the document image management unit 4 adds all the cut-out partial images to the management information of the document image as attribute information about the document image, and stores it. Because the document image management unit 4 adds to the management information the attribute information for every drawing area or photograph area that the image cutout unit 3 has cut out of the document image, the user can easily select and retrieve the desired document image even when the input document images contain similar drawing or photograph areas and cannot be clearly distinguished by the partial image of a single drawing or photograph area alone.

[0014] Alternatively, when the input document image contains a plurality of drawing areas or photograph areas, the image cutout unit 3 may cut out only a representative drawing area or photograph area among them as the partial image. In this case the document image management unit 4 adds the partial image cut out as representative to the management information of the document image as attribute information about the document image, and stores it. If every drawing area or photograph area is added as attribute information, memory and disk consumption can grow when there are many such areas. To prevent the amount of stored information from increasing, one or more representative drawing areas or photograph areas are selected and only the selected ones are added as attribute information, which reduces memory and disk consumption. One way to select the representative drawing areas or photograph areas is to display a list of the drawing areas or photograph areas and have the operator or user make the selection from the displayed list.
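The storage saving from keeping only the representative areas can be shown with a short Python sketch. It assumes the operator's choice arrives as a list of indices into the displayed list, and it uses pixel count as a rough stand-in for storage cost; both function names are hypothetical.

```python
def keep_selected(drawing_images, selected_indices):
    """Keep only the partial images the operator picked, in display order."""
    chosen = set(selected_indices)
    return [img for i, img in enumerate(drawing_images) if i in chosen]


def pixel_count(images):
    """Rough storage proxy: total number of pixels across all partial images."""
    return sum(len(row) for img in images for row in img)


# Three cut-out areas of different sizes; the operator selects the first and third.
all_images = [
    [[1, 1], [1, 1]],             # 2x2
    [[2] * 10 for _ in range(10)],  # 10x10
    [[3, 3, 3]],                  # 1x3
]
kept = keep_selected(all_images, [2, 0])
saved_pixels = pixel_count(all_images) - pixel_count(kept)
```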

[0015] As described above, the document management system of the first embodiment adds the image portion of a drawing area or photograph area to the input document image as an attribute value, which makes it possible to characterize document images in a way that is easy to understand and effective. In addition, because the image portions of all drawing areas or photograph areas can be added to the input document image as attribute values, a document image can be identified easily even among document images that have similar drawing or photograph areas, and document images are easier to distinguish from one another. Further, because the image portion of a given drawing area or photograph area can be selected and only the selected image portion added as an attribute value, memory and disk consumption can be reduced.

[0016] Next, a second embodiment is described. FIG. 3 is a block diagram showing the schematic configuration of the document management system according to the second embodiment. As shown in FIG. 3, the document management system of the second embodiment adds to the document management system of the first embodiment an image feature extraction unit 6 that extracts features of the drawing or photograph areas cut out by the image cutout unit 3. In the following, parts that are the same as in the first embodiment are described using the same reference numerals. The document image management unit 4 additionally adds the image features extracted by the image feature extraction unit 6 as attribute information of the document image, classifies the document images by type, and stores them in the document image storage unit 5.

[0017] FIG. 4 is a flowchart showing the procedure of the image feature extraction processing and attribute addition processing performed on an input document image by the document management system of the second embodiment. First, when the image input unit 1 inputs a document image (step 301), the area identification unit 2 executes area identification processing on the input document image (step 302). The area identification processing by the area identification unit 2 is the same as the processing described for FIG. 2, so its description is omitted here. Among the areas that the area identification unit 2 has divided into text areas, table areas, ruled lines, drawing areas, photograph areas, and so on, the image cutout unit 3 classifies the drawing areas and photograph areas together as "drawing areas" and cuts out a partial image of each area so classified (step 303).

[0018] After the image cutout unit 3 has cut out the drawing or photograph area, the image feature extraction unit 6 executes feature extraction processing on the cut-out partial image (step 304). One example of feature extraction processing is an image feature extraction method that, based on the color of each pixel making up the prepared image, determines which area of a color space divided in advance into a plurality of areas each pixel belongs to, generates a color histogram of the image from the number of pixels belonging to each area, and extracts the generated color histogram as a feature quantity representing the features of the image. Another example is a method in which, when the color histogram is extracted as the feature quantity, the number of pixels belonging to each area is first accumulated according to the degree of color accumulation between areas, and the color histogram generated from the accumulated pixel counts is extracted as the feature quantity representing the features of the image.
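The first of these feature extraction methods, a histogram over a pre-divided color space, can be sketched in Python as follows. The sketch assumes 8-bit RGB pixels and divides each channel into `bins_per_channel` equal intervals; the specification does not say which color space or division is used, and the accumulated variant is not shown.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Count pixels per color-space cell.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    The RGB cube is split into bins_per_channel**3 equal cells (an assumption).
    """
    n = bins_per_channel
    hist = [0] * (n ** 3)
    for r, g, b in pixels:
        # value * n // 256 maps 0..255 onto bin indices 0..n-1.
        r_bin, g_bin, b_bin = r * n // 256, g * n // 256, b * n // 256
        hist[(r_bin * n + g_bin) * n + b_bin] += 1
    return hist


# Two reddish pixels, one blue pixel, one black pixel.
sample = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 0)]
hist = color_histogram(sample)
```

With two bins per channel the feature is only eight integers, which illustrates why storing a histogram takes far less space than storing the cut-out image itself.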

[0019] The document image management unit 4 then adds the image features extracted by the image feature extraction unit 6 to the management information of the document image as attribute information of the document image (step 305), and stores the document image in the document image storage unit 5 together with its attribute information (step 306). As a result, when a user of the document management system views the list of management information for the managed document images, the user can easily select and retrieve the desired document image just by glancing at the image features that were extracted from each document image and are displayed alongside it. Moreover, because image features are extracted and added as attribute information, memory and disk consumption can be greatly reduced compared with adding the image itself as attribute information.

[0020] FIG. 5 is a flowchart showing the procedure of the document image classification processing and attribute addition processing performed on an input document image by the document management system of the second embodiment. First, when the image input unit 1 inputs a document image (step 501), the area identification unit 2 executes area identification processing on the input document image (step 502). The area identification processing by the area identification unit 2 is the same as the processing described for FIG. 2, so its description is omitted here. Among the areas that the area identification unit 2 has divided into text areas, table areas, ruled lines, drawing areas, photograph areas, and so on, the image cutout unit 3 classifies the drawing areas and photograph areas together as "drawing areas" and cuts out a partial image of each area so classified (step 503). After the image cutout unit 3 has cut out the drawing or photograph area, the image feature extraction unit 6 executes feature extraction processing on the cut-out partial image (step 504). The image feature extraction processing by the image feature extraction unit 6 is the same as the processing described for FIG. 4, so its description is omitted here.

[0021] The document image management unit 4 adds the image features extracted by the image feature extraction unit 6 to the management information of the document image as attribute information of the document image (step 505). In addition to adding this attribute information, the document image management unit 4 classifies the document images to be stored by a statistical method that uses the image features serving as the attribute information (step 506). Because the classification is based on image features, document images can be grouped automatically according to a predetermined criterion, for example by grouping together document images that have similar image features. The grouped document images are then stored in the document image storage unit 5 by classification (step 507). When a user searches for a document image, the search range can therefore be narrowed by referring to the document images of each group in the document image storage unit 5, which makes for an efficient document management system.
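The specification does not name the statistical method used in step 506. As one possible illustration only, the following Python sketch groups documents greedily by histogram intersection: a document joins the first group whose representative histogram is similar enough, otherwise it starts a new group. The similarity measure, the `threshold` value, and the function names are all assumptions.

```python
def similarity(hist_a, hist_b):
    """Histogram intersection, normalised to the range 0..1."""
    total = min(sum(hist_a), sum(hist_b))
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(hist_a, hist_b)) / total


def group_documents(features, threshold=0.5):
    """Greedy grouping of document ids by feature similarity.

    `features` maps a document id to its color histogram. The first member of
    each group acts as that group's representative (an assumed simplification).
    """
    groups = []  # each entry: (representative histogram, [document ids])
    for doc_id, hist in features.items():
        for representative, members in groups:
            if similarity(representative, hist) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((hist, [doc_id]))
    return [members for _, members in groups]


features = {
    "doc_a": [4, 0, 0, 0],
    "doc_b": [3, 1, 0, 0],
    "doc_c": [0, 0, 4, 0],
    "doc_d": [0, 0, 3, 1],
}
grouped = group_documents(features)
```

Storing each resulting group separately (step 507) lets a later search look only inside the group whose representative resembles the query.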

[0022] As described above, the document management system of the second embodiment cuts out a partial image of a drawing area or photograph area from the input document image, extracts the features of that partial image, adds them to the input document image as attribute values, and stores and manages the result. Memory and disk consumption can therefore be greatly reduced compared with adding the partial image itself as an attribute value. In addition, the document management system of the second embodiment cuts out a partial image of a drawing area or photograph area from the input document image, executes a predetermined classification using the image features extracted from that partial image, and stores and manages the document images by classification. Consequently, when, for example, a barcode has been added to a specific area of the document image as area information, the document images can easily be classified using this barcode information, and efficient storage and management of document images can be achieved.

[0023] [Effects of the Invention] The invention according to claim 1 comprises area information extracting means for extracting, when the image area recognizing means recognizes that one or more drawing areas or photograph areas exist in the document image, the recognized drawing area or photograph area as drawing area information, and attribute adding means for adding the drawing area information extracted by the area information extracting means to the document image as an attribute. Document images are thereby characterized efficiently, so that retrieval, management, and the like can be performed easily.

[0024] In the invention according to claim 2, the area information extracting means extracts all of the plurality of drawing areas or photograph areas recognized by the image area recognizing means as drawing area information, which makes it easier to distinguish between document images that contain similar drawing or photograph area images.

【００２５】[0025]

According to the invention of claim 3, the system further comprises area information selecting means that, when the area information extracting means has extracted a plurality of pieces of drawing area information, prompts for selection of the drawing area information to be added as an attribute. An image of any drawing area or photograph area can therefore be selected and added to the input document image as an attribute value, which reduces memory and disk consumption.
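The selection step of claim 3 might be sketched as follows. The specification only says that selection is prompted; here the prompt is modelled as a callback named `choose`, and the rule `largest_area` is an assumed stand-in for a user's choice. Both names, and the use of bounding boxes to represent candidates, are hypothetical.

```python
def area(bbox):
    """Area in pixels of a (left, top, right, bottom) bounding box."""
    left, top, right, bottom = bbox
    return (right - left) * (bottom - top)


def select_drawing_areas(candidates, choose):
    """Return the candidates to attach as attributes.

    `choose` stands in for the prompt shown to the user: it receives the
    candidate list and returns the indices of the areas to keep.
    """
    if len(candidates) <= 1:
        return list(candidates)  # nothing to choose between
    return [candidates[i] for i in choose(candidates)]


def largest_area(candidates):
    # Assumed default rule used in place of an interactive prompt.
    return [max(range(len(candidates)), key=lambda i: area(candidates[i]))]


candidates = [(0, 0, 100, 50), (0, 60, 200, 260), (10, 300, 60, 340)]
selected = select_drawing_areas(candidates, largest_area)
```

Storing only the selected areas instead of every candidate is what yields the reduced memory and disk consumption mentioned above.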

【００２６】[0026]

According to the invention of claim 4, the system further comprises feature extraction means for extracting features of the one or more pieces of drawing area information extracted by the area information extracting means. Features of a drawing area or photograph area can therefore be extracted and added to the input document image as an attribute value, which significantly reduces memory and disk consumption.
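To illustrate why storing a feature is cheaper than storing the partial image, the sketch below computes a normalized gray-level histogram. The specification does not name a particular feature, so the histogram, the function name `gray_histogram`, the bin count, and the representation of the image as rows of 0 to 255 integers are all assumptions.

```python
def gray_histogram(pixels, bins=16):
    """Normalized gray-level histogram of a partial image.

    `pixels` is a list of rows, each a list of integers in 0..255.
    """
    counts = [0] * bins
    total = 0
    for row in pixels:
        for value in row:
            counts[value * bins // 256] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


# A 64 x 64 synthetic gradient standing in for a cut-out partial image.
partial_image = [[(x * 4) % 256 for x in range(64)] for _ in range(64)]
feature = gray_histogram(partial_image)

pixel_count = sum(len(row) for row in partial_image)  # values in the image
feature_size = len(feature)                           # values in the feature
```

Here the attribute shrinks from 4096 pixel values to 16 histogram values; the actual saving depends on the feature and image size used in practice.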

【００２７】[0027]

According to the invention of claim 5, the system further comprises document image classifying means for classifying document images based on the features extracted by the feature extraction means. When, for example, barcode information is written in a specific area as drawing area information, classification can be performed based on this barcode information, and efficient document image management can be realized.
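Classification by barcode information could look like the following sketch. It assumes each document's feature is an already decoded barcode string, or `None` when no barcode was found; barcode decoding itself is out of scope. The function name `classify_by_feature` and the `unclassified` bucket are hypothetical.

```python
from collections import defaultdict


def classify_by_feature(documents):
    """Group document ids by their extracted feature value.

    `documents` is an iterable of (doc_id, feature) pairs, where the feature
    is assumed to be a decoded barcode string or None.
    """
    groups = defaultdict(list)
    for doc_id, feature in documents:
        key = feature if feature is not None else "unclassified"
        groups[key].append(doc_id)
    return dict(groups)


documents = [
    ("doc-001", "INVOICE"),
    ("doc-002", "ORDER"),
    ("doc-003", "INVOICE"),
    ("doc-004", None),
]
classified = classify_by_feature(documents)
```

Each key of `classified` then corresponds to one storage category, so document images can be stored and managed per classification.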

[Brief description of drawings]

FIG. 1 is a block diagram showing a schematic configuration of the document management system according to the first embodiment.

FIG. 2 is a flowchart showing the procedure of the image cutout processing and attribute addition processing performed on an input document image by the document management system of the first embodiment.

FIG. 3 is a block diagram showing a schematic configuration of the document management system according to the second embodiment.

FIG. 4 is a flowchart showing the procedure of the image feature extraction processing and attribute addition processing performed on an input document image by the document management system of the second embodiment.

FIG. 5 is a flowchart showing the procedure of the document image classification processing and attribute addition processing performed on an input document image by the document management system of the second embodiment.

[Explanation of symbols]

1 Image input unit
2 Area identification unit
3 Image cutout unit
4 Document image management unit
5 Document image storage unit
6 Image feature extraction unit

Claims

[Claims]

1. A document management system comprising: document image acquisition means for acquiring a document image; image area recognizing means for recognizing whether one or more drawing areas or photograph areas exist in the document image acquired by the document image acquisition means; area information extracting means for extracting, when the image area recognizing means recognizes that one or more drawing areas or photograph areas exist in the document image, the recognized drawing area or photograph area as drawing area information; attribute adding means for adding the drawing area information extracted by the area information extracting means to the document image as an attribute; and document image storing means for storing the document image to which the attribute adding means has added the attribute.

2. The document management system according to claim 1, wherein the area information extracting means extracts all of the plurality of drawing areas or photograph areas recognized by the image area recognizing means as drawing area information.

3. The document management system according to claim 2, further comprising area information selecting means for prompting selection of the drawing area information to be added as the attribute when the area information extracting means has extracted a plurality of pieces of drawing area information, wherein the attribute adding means adds the drawing area information selected by the area information selecting means to the document image as an attribute.

4. The document management system according to any one of claims 1, 2, and 3, further comprising feature extraction means for extracting a feature of the one or more pieces of drawing area information extracted by the area information extracting means, wherein the attribute adding means also adds the feature extracted by the feature extraction means to the document image as an attribute.

5. The document management system according to claim 4, further comprising document image classifying means for classifying the document images based on the features extracted by the feature extraction means, wherein the document image storing means stores the document images for each classification performed by the document image classifying means.