JP2008204184A

JP2008204184A - Image processor, image processing method, program and recording medium

Info

Publication number: JP2008204184A
Application number: JP2007039787A
Authority: JP
Inventors: Koji Kobayashi; 幸二小林; Hirohisa Inamoto; 浩久稲本; Yuka Kihara; 酉華木原
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2007-02-20
Filing date: 2007-02-20
Publication date: 2008-09-04
Anticipated expiration: 2027-02-20
Also published as: JP4859054B2

Abstract

PROBLEM TO BE SOLVED: To accurately identify the document type of an image and classify it automatically. SOLUTION: A document type identification processing unit 115 identifies the attribute (form, drawing, etc.) of a registered image 112 and registers that attribute in an image information DB 117, and the registered image 112 itself is registered in an image DB 114. When documents are retrieved, a display screen 119 classified by document type is generated and shown on a display device 101. The user selects a category such as "form" on that screen with an input device 103, and a thumbnail list of the documents in the selected "form" category is displayed. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an image processing apparatus, an image processing method, a program, and a recording medium for recognizing and/or automatically classifying the document type of a document image, and in particular to techniques suited to, for example, multifunction copiers (MFP (Multi Function Printer)), file servers, and image processing programs.

Devices such as electronic filing systems digitize paper documents using an input device such as a scanner, but they were used almost exclusively in business settings that handle large volumes of paper. In recent years, cheaper scanners, the spread of MFPs with scanning functions, and legislation such as the e-Document Law have made offices aware of how convenient and easy to handle digitized documents are, and paper documents are increasingly being digitized. At the same time, image DBs are increasingly used to manage centrally, as a database (hereinafter DB), digitized document images, photographic images, and documents created with PC applications. For example, even when the original paper document must be kept, an image DB may be built for ease of management and retrieval.

The image DBs mentioned above range from large systems installed on a server and accessed by many users to personal DBs built on an individual PC. For example, recent MFPs can store documents on a built-in HDD, so image DBs based on MFPs are also being built.

Some of these document image DBs provide a search function for finding a desired document image among a large number of images. The current mainstream search function for document images performs full-text search or concept search using, as keywords, the character recognition results of OCR (Optical Character Reader) processing. However, such text-based search has the following problems:
(1) It depends on OCR accuracy.
(2) It requires a search keyword.
(3) It is difficult to narrow down the results when there are many hits.

Regarding (1), because 100% accurate OCR is currently impossible, a document is not hit if OCR made an error in the part that matches the entered keyword. Regarding (2), text-based search is effective when searching for unknown material, such as web pages on the Internet, or when the keyword is clear. But when looking for a document that was entered, say, several years ago and is only vaguely remembered, the search fails unless a suitable keyword comes to mind. A page consisting entirely of photographs or graphics with no text cannot be retrieved at all. Regarding (3), text-based results are hard to rank, so all keyword hits are treated as equal; when there are many hits, the user must check the hit document images one by one, which hurts usability.

To mitigate these problems, there are search techniques based on a different approach: document image classification (assignment of attributes to document images). Document images are classified into multiple categories so that the user can narrow them down step by step, or the assigned category attributes are combined with text-based search to find images.

In document image classification, for example, the user assigns a desired document category attribute when registering a document image, so that images are classified according to the user's own scheme. This yields the ideal categories for each user, but registering a large number of document images becomes cumbersome and the user's workload is heavy, so the approach is uncommon except where dedicated operators do the work.

For example, when category attributes are assigned to documents scanned with an MFP's scan function, this is done on the MFP's operation panel, so scanning must be interrupted for every document to enter a category name, which is inefficient. Alternatively, category attributes can be assigned after all documents have been scanned, but then every document must be checked again, which still places a heavy burden on the user.

One way to solve these problems is to classify document images automatically. Automatic classification rarely produces the user's ideal categories, but it greatly reduces the user's burden and avoids the problems of text-based search, so it is an effective approach to document image retrieval.

Patent Document 1 is one example of a technique for automatic classification of document images (or automatic attribute assignment to document images). In Patent Document 1, a scanned document image is divided into rectangular regions of uniform attribute, layout analysis determines the attribute of each region, and the layout features (the attribute and size of each block, etc.) are used to classify the document image automatically as a table document, a form document, a photo document, or another document.

Patent Document 2 is another example. In Patent Document 2, the input document image is compressed into a binary compressed image, and rectangles circumscribing the connected components of black pixels in that image are extracted and classified into character rectangles and other rectangles. These rectangles are merged to extract character regions and other regions, regions of uniform attribute are divided into rectangular regions such as text, table, ruled-line, figure, and photo regions for layout analysis, and the document image is classified using the types and numbers of these regions as attributes.

Patent Document 1: JP 2001-101213 A
Patent Document 2: JP 2003-178071 A

However, automatic classification of document images by the layout analysis described above has the following problems:
(1) It is difficult to extract regions accurately when a region has a complex shape, such as an L (hook) shape, or when regions overlap and are tangled together.
(2) There is no means of recovery when a region attribute is misidentified (misclassified).

In other words, methods that automatically classify document images from the rectangular-region information produced by layout analysis, as in the prior art above, cannot avoid such misidentification (misclassification).

For example, if a document image that belongs in the "form" category is input and its table region is mistakenly identified as a figure region, the image cannot be correctly classified as a "form". Likewise, if the regions of a cluttered layout such as a flyer cannot be determined correctly, not only the region shapes but also the region attributes are misjudged, and the document image is misclassified as a result. Searching among such misclassified images does not return the correct results, takes longer, and lowers search efficiency.

These methods are especially prone to misjudging regions made up mainly of line drawings, such as those in tables and figures. Ruled lines are generally detected, as in Patent Document 2 cited above, from the length of connected runs of black pixels (black runs) in a binarized image. When binarizing a scanned image breaks a line, however, this method misjudges it, so its accuracy is a problem. In addition, the usual way of detecting image areas with the same attribute as a single region, again as in Patent Document 2, requires scanning the same image several times to find the circumscribing rectangle of each region. Single-pass processing, in which processing advances as the image is raster-scanned from the top left and ends when the bottom right is reached, is therefore difficult, and the processing is slow and complex. Because the processing time also differs from one document image to another, it is hard to predict.
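The run-length weakness described above can be illustrated with a minimal sketch. It assumes a binarized row in which 1 is black, and a hypothetical minimum run length `MIN_RULE_RUN` for accepting a run as a ruled line; neither the threshold nor the function names come from Patent Document 2.

```python
# Hypothetical threshold: a run of at least this many black pixels counts as a ruled line.
MIN_RULE_RUN = 40


def longest_black_run(row):
    """Return the length of the longest run of consecutive black (1) pixels in a binary row."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel == 1 else 0
        best = max(best, current)
    return best


def is_ruled_line(row, min_run=MIN_RULE_RUN):
    """Judge a row as a ruled line purely from its longest black run (prior-art style)."""
    return longest_black_run(row) >= min_run


# A 60-pixel rule, and the same rule with a 2-pixel gap introduced by binarization.
solid_rule = [1] * 60
broken_rule = [1] * 30 + [0] * 2 + [1] * 28

print(is_ruled_line(solid_rule))   # True
print(is_ruled_line(broken_rule))  # False: the gap splits the rule into runs of 30 and 28
```

A two-pixel break is enough to turn a genuine rule into a miss, which is the accuracy problem the text attributes to run-length judgment.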

The present invention has been made in view of the above problems. An object of the present invention is to provide an image processing apparatus, an image processing method, a program, and a recording medium that identify the document type of an image with high accuracy and classify it automatically, while shortening processing time and simplifying processing.

The principal feature of the present invention is an image processing apparatus that identifies the document type of an image, comprising: local region identification means that refers to the pixels of a predetermined region of the image and identifies whether that region is a line drawing region; feature calculation means that calculates a feature from the line drawing identification results of the local region identification means; and document type identification means that identifies the document type of the image according to the feature.

(1) In an image processing apparatus and method that identify the document type of a bitmap image such as a scanned image, each local region is identified as a line drawing or not, a feature is calculated from the local region identification results, and the document type is identified and the document classified on the basis of that feature. Document types can therefore be identified accurately even for document images with complex layouts. Unlike region identification in layout analysis, the image need not be scanned several times; a single scan suffices. Processing cost is thus reduced (processing time is shorter and no complex processing is required), and the processing time is easy to predict, which makes the processing convenient to use.
(2) Identifying line drawing versus character for each local region increases the dimensionality of the feature, which improves identification accuracy.
(3) Identifying line drawing, character, and photograph for each local region increases the dimensionality of the feature, which improves identification accuracy and makes it possible to identify document type categories that contain photographs.
(4) Frequency conversion concentrates power in specific frequency coefficients when local regions such as line drawings are identified, which improves local region identification accuracy.
(5) When document type attributes are identified, several identification means are used in parallel and their results are allowed to overlap. This improves identification accuracy, and it also lets users find images whose categorization differs from user to user through the classified display based on document type identification.

Embodiments of the present invention are described in detail below with reference to the drawings.

Embodiment 1:
FIG. 1 shows the system configuration of Embodiment 1 of the present invention. In FIG. 1, 100 denotes a client device such as a personal computer (hereinafter PC) or a mobile terminal such as a PDA or mobile phone. 101 denotes a display device such as a monitor; 102 denotes an application program that interprets user instructions, communicates with the server device 110, and controls the display device 101; 103 denotes an input device, such as a keyboard or mouse, through which the user enters instructions; and 104 denotes an external communication path such as a LAN or the Internet.

110 denotes a server device that has a database (hereinafter DB) for storing image data, identifies the document type of input image data, registers document images and their attribute information in the DB, and generates display screens in response to commands from the client device 100 and outputs them to the client device 100. 111 denotes an interface (hereinafter I/F) with the external communication path 104; 112 denotes registered image data to be registered in the image DB 114; 113 denotes a thumbnail generation processing unit that scales the registered image 112 down to a predetermined size or smaller to generate a thumbnail image; 114 denotes an image DB that stores the image data of the registered image 112 and its thumbnail image data; 115 denotes a document type identification processing unit that identifies the document type of the image data of the registered image 112; and 117 denotes an image information DB that stores information about each item of image data registered in the image DB 114. This information includes, for example, the file name and creation date of the registered image data, linking information for the image data (for example, the ID or file name uniquely assigned to each item of image data when it is stored in the image DB 114), and the document type (attribute).

118 denotes a display screen control processing unit that generates display screens for the client device 100 and controls them according to the contents of the screen control data 120; 119 denotes display screen data to be shown on the display device 101 of the client device 100; and 120 denotes screen control data specified and input by the client device 100. In the figure, dotted lines show the data flow during image registration, and solid lines show the data flow when display screens are generated.

FIG. 2 shows the configuration of the server device 110 and the client device 100. In FIG. 2, 201 denotes a CPU that performs computation and processing according to programs; 202 denotes volatile memory used as a work area for temporarily holding data such as program code and encoded image data; and 203 denotes a hard disk (hereinafter HDD) for storing image data, programs, and the like, which holds the image DB 114 and the image information DB 117. 204 denotes video memory serving as a data buffer for display on the monitor 205; image data written to the video memory 204 is periodically displayed on the monitor 205. 206 denotes an input device such as a mouse or keyboard, 207 denotes an external I/F that sends and receives data over the external communication path 104 such as the Internet or a LAN, and 208 denotes a bus that connects these components.

This embodiment describes an example in which the server device 110 is a server computer and processing such as display screen generation is implemented in software; that is, the processing in the server is performed by an application program (not shown). Embodiments of the present invention are not limited to this. The processing may be implemented in hardware within a device such as an MFP, or the configuration of FIG. 1 may be realized inside a single device, such as one PC or MFP, without a server/client arrangement.

An outline of the operation of this embodiment follows. The system of Embodiment 1 performs two broad operations. One is registering document images; the other is "using document images in the DB", that is, searching for, viewing, and acquiring (downloading from the server) a desired document image. To use a document image, the user first searches for it, then views it with the application's viewer and saves it on the user's PC. The document image registration operation and the document image search operation of this embodiment are described below.

FIG. 3 is a flowchart of the document image registration operation. The registration operation is described with reference to FIG. 1 (dashed lines show the registration flow) and FIG. 3.

In step S001, the user, working on the client device 100 through the application program 102, instructs the server device 110 to register image data and specifies the registered image data 112 to be registered.

In step S002, the registered image data 112 is sent to the server device 110 over the external communication path 104 together with file information such as the file name and creation date, and is registered in the image DB 114 through the external I/F 111 with an ID number assigned. At the same time, the thumbnail generation processing unit 113 scales the registered image 112 to generate a thumbnail image no larger than a predetermined size and registers it in the image DB 114 with an ID number assigned. If the registered image data 112 contains several pages, one thumbnail is generated per page.
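The text only says that thumbnails are scaled to a predetermined size or smaller. A minimal sketch of how such a limit might be applied, assuming a hypothetical 128-pixel limit on the longer side and aspect-ratio-preserving scaling, with one thumbnail size computed per page:

```python
def thumbnail_size(width, height, max_side=128):
    """Scale (width, height) so that neither side exceeds max_side, keeping the aspect ratio.

    max_side=128 is a hypothetical limit; the source only says "a predetermined size or less".
    Images already within the limit are left unchanged (never enlarged).
    """
    if width <= max_side and height <= max_side:
        return width, height
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def thumbnail_sizes_for_pages(page_sizes, max_side=128):
    """Multi-page data gets one thumbnail per page, as in step S002."""
    return [thumbnail_size(w, h, max_side) for (w, h) in page_sizes]


# An A4 page scanned at 200 dpi is roughly 1654 x 2339 pixels.
print(thumbnail_size(1654, 2339))  # (91, 128)
```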

In step S003, the registered image data 112 is input to the document type identification processing unit 115, described later, which identifies its document type attribute. The identified document type attribute is registered in the image information DB 117 together with the following image information:
・File name, creation date
・Image data ID
・Thumbnail image data ID
・Document type attribute
Using a general RDB (relational database) for the image information DB 117 makes registration, management, and search of this information easy to implement. As long as the functions described above are provided, the image DB 114 and the image information DB 117 may be held in the same DB, for example as a hierarchical data structure built with a language such as XML (eXtensible Markup Language), or each may be held as a DB on a separate server. Image data may also be registered in the server device 110 directly from an image input device such as a scanner or digital camera.
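Since the text suggests a general RDB for the image information DB 117, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names, the dictionary standing in for the image DB 114, and the `uuid`-based IDs are all hypothetical; the sketch only mirrors the four items listed above and steps S002 and S003.

```python
import sqlite3
import uuid

# Hypothetical stand-in for the image DB 114: maps IDs to image / thumbnail payloads.
image_db = {}

conn = sqlite3.connect(":memory:")  # hypothetical stand-in for the image information DB 117
conn.execute(
    """CREATE TABLE image_info (
           file_name     TEXT NOT NULL,
           created       TEXT,
           image_id      TEXT PRIMARY KEY,
           thumbnail_id  TEXT NOT NULL,
           document_type TEXT
       )"""
)


def register(file_name, created, image_bytes, thumbnail_bytes, document_type):
    """S002: store image and thumbnail under fresh IDs; S003: record their information row."""
    image_id, thumbnail_id = uuid.uuid4().hex, uuid.uuid4().hex
    image_db[image_id] = image_bytes
    image_db[thumbnail_id] = thumbnail_bytes
    conn.execute(
        "INSERT INTO image_info VALUES (?, ?, ?, ?, ?)",
        (file_name, created, image_id, thumbnail_id, document_type),
    )
    conn.commit()
    return image_id


def thumbnails_of_type(document_type):
    """Return (file_name, thumbnail_id) pairs for one category, e.g. to build a thumbnail list."""
    rows = conn.execute(
        "SELECT file_name, thumbnail_id FROM image_info "
        "WHERE document_type = ? ORDER BY file_name",
        (document_type,),
    )
    return rows.fetchall()


register("invoice.tif", "2007-02-20", b"...", b"...", "form")
register("plan.tif", "2007-02-20", b"...", b"...", "drawing")
print([name for name, _ in thumbnails_of_type("form")])  # ['invoice.tif']
```

A single `document_type` column keeps the sketch short; letting one image carry several document types, as effect (5) allows, would call for a separate link table.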

FIG. 4 shows the configuration of the document type identification processing unit 115 of Embodiment 1. 301 denotes a preprocessing unit that processes the input registered image 112 to reduce the processing load (processing cost) of later stages and to improve the accuracy of the local region identification processing unit 303. Processing that lowers cost by reducing the number of pixels or the amount of data includes conversion of a color image to a grayscale image and reduction scaling. Processing that improves accuracy includes γ correction of the image's tone curve, color correction (for example, conversion of a device-specific color space to a standard color space), spatial filtering to correct the image's MTF, and conversion to a predetermined resolution (equivalent to scaling).

302 denotes a frequency conversion processing unit that transforms groups of pixels from real space to frequency space. Images can be frequency-transformed in various ways, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the discrete wavelet transform (DWT). In this embodiment, as an example, the discrete cosine transform (hereinafter DCT) used in JPEG compression and elsewhere is applied in units of 8×8 = 64 pixels. The DCT is given by Equation 1.
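The body of Equation 1 does not survive in this text. Assuming it is the standard 8×8 DCT-II of baseline JPEG, which the paragraph above names, it would read as follows; the normalization factors C(u) and C(v) belong to that assumption and are not among the original definitions. With N = 8, the prefactor 2/N equals 1/4.

$$
F(u,v) = \frac{2}{N}\,C(u)\,C(v)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,
\cos\frac{(2x+1)u\pi}{2N}\,\cos\frac{(2y+1)v\pi}{2N},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k \neq 0 \end{cases}
$$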

where
N = 8: number of horizontal (vertical) pixels in a block
u, v = 0, 1, ..., N−1: DCT coefficient coordinates within the block
x, y = 0, 1, ..., N−1: pixel coordinates within the block
f(x, y): input pixel value
F(u, v): DCT coefficient value
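A direct, unoptimized sketch of that 8×8 DCT follows, under the same JPEG DCT-II assumption for Equation 1. It indexes blocks as `block[y][x]` and returns `F[v][u]`, so `u` is the horizontal frequency; the function names are illustrative only. A constant block puts all of its power into the DC coefficient F(0,0), and a block containing one vertical line, which is constant along y, puts its AC power only into coefficients with v = 0.

```python
import math

N = 8  # block size, as in the text


def _c(k):
    """JPEG DCT-II normalization factor (part of the assumed form of Equation 1)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_8x8(block):
    """2-D DCT-II of an 8x8 block given as block[y][x]; returns coefficients as F[v][u]."""
    F = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (
                        block[y][x]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            F[v][u] = (2 / N) * _c(u) * _c(v) * total
    return F


flat = [[100] * N for _ in range(N)]
print(round(dct_8x8(flat)[0][0], 6))  # 800.0

# White block (255) with a one-pixel black vertical line at x = 3.
vertical_line = [[0 if x == 3 else 255 for x in range(N)] for _ in range(N)]
F = dct_8x8(vertical_line)
```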

303 denotes a local region identification processing unit that identifies, from the DCT coefficients output by the frequency conversion processing unit 302, whether a region is a line drawing; 304 denotes an identification result image made up of the line drawing identification results (on/off); 305 denotes a feature calculation processing unit that calculates a feature from the line drawing identification result image 304; 306 denotes a type attribute identification processing unit that identifies the document type attribute of the registered image 112 from the feature calculated by the feature calculation processing unit 305; and 307 denotes the identified document type attribute information.
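The data flow from unit 303 to unit 306 can be sketched as a single raster pass over 8×8 blocks. This is a minimal illustration, not the patent's method: the block classifier is passed in as a parameter because the actual criterion comes from the DCT analysis described later, and the feature (fraction of line-drawing blocks), the 0.1 threshold, and the category labels are all hypothetical.

```python
def identification_result_image(image, is_line_block, n=8):
    """Visit n x n blocks once, top-left to bottom-right, and record on/off line-drawing flags.

    image is a list of rows of gray values; partial blocks at the right/bottom edges are skipped.
    """
    height, width = len(image), len(image[0])
    result = []
    for by in range(0, height - n + 1, n):
        row = []
        for bx in range(0, width - n + 1, n):
            block = [r[bx:bx + n] for r in image[by:by + n]]
            row.append(bool(is_line_block(block)))
        result.append(row)
    return result


def line_block_ratio(result):
    """Hypothetical feature: fraction of blocks flagged as line drawing."""
    flags = [flag for row in result for flag in row]
    return sum(flags) / len(flags) if flags else 0.0


def identify_document_type(ratio, threshold=0.1):
    """Hypothetical rule mapping the feature to a document type attribute."""
    return "line-drawing document" if ratio >= threshold else "other"


def toy_is_line_block(block):
    """Toy classifier for this demo only: any black pixel marks the block as a line drawing."""
    return any(px == 0 for row in block for px in row)


page = [[255] * 16 for _ in range(16)]
for y in range(8):
    page[y][3] = 0  # a short vertical rule in the top-left block

result = identification_result_image(page, toy_is_line_block)
print(result)                                            # [[True, False], [False, False]]
print(identify_document_type(line_block_ratio(result)))  # line-drawing document
```

Because each block is read exactly once, processing time grows with the pixel count rather than with layout complexity, which is the single-pass property claimed in effect (1).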

The preprocessing unit 301 applies predetermined preprocessing to the registered image 112. In this embodiment, as an example, it performs (1) grayscale conversion, which converts a color input image to a grayscale image to reduce processing cost, and (2) resolution conversion, which reduces processing cost and improves the accuracy of the subsequent local region identification.
(1) When a color image is input, grayscale conversion reduces the amount of image data to 1/3, lowering the processing cost. Various conversion methods exist; when the input registered image 112 is an R(ed)G(reen)B(lue) image, it is converted to luminance Y by Equation 2:
Y = 0.299R + 0.587G + 0.114B   (Equation 2)
where Y is the luminance and R, G, and B are the red, green, and blue pixel values. As a simpler alternative, Equation 3 may be used:
Y = (R + 2G + B) / 4   (Equation 3)
(2) Resolution conversion is performed to standardize the frequencies represented by the frequency conversion coefficients (DCT coefficients) that the frequency conversion processing unit 302 outputs. When the transform is always applied to a fixed 8×8 pixel region, as in this embodiment, registered images 112 of different resolutions yield different spatial frequencies. Standardizing in advance, in the preprocessing unit 301, the resolution of the images input to the frequency conversion processing unit 302 therefore reduces processing cost overall. The same effect can be obtained without resolution conversion if the frequency conversion processing unit 302 changes the region area (number of pixels) used for the transform according to the resolution of the registered image 112.
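Equations 2 and 3 translate directly into code. In this sketch the pixel representation (a list of (R, G, B) tuples) and the function names are assumptions; the integer form of Equation 3 uses a 2-bit right shift in place of division by 4, a common simplification that truncates rather than rounds.

```python
def luminance_eq2(r, g, b):
    """Equation 2: Y = 0.299R + 0.587G + 0.114B."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_eq3(r, g, b):
    """Equation 3 for integer pixels: Y = (R + 2G + B) / 4, using a right shift (truncates)."""
    return (r + 2 * g + b) >> 2


def to_gray(rgb_pixels, simplified=False):
    """Convert (R, G, B) tuples to one luminance value each, cutting the data to 1/3."""
    convert = luminance_eq3 if simplified else luminance_eq2
    return [convert(r, g, b) for (r, g, b) in rgb_pixels]


pixels = [(255, 255, 255), (255, 0, 0), (0, 0, 0)]
print(to_gray(pixels, simplified=True))  # [255, 63, 0]
```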

The resolution should preferably be chosen according to the region area used for frequency conversion. In this embodiment, line drawings must be extracted and distinguished from characters within an 8×8 pixel area. If the resolution is too high, the edges and strokes of characters are more likely to be misidentified as line drawings. Conversely, if the resolution is too low, line drawings such as table rules merge with nearby characters, which also makes misidentification of line drawings more likely.

Various methods exist for resolution conversion (that is, image enlargement and reduction), such as the nearest neighbor method, linear interpolation, and cubic convolution. When reducing an image in particular, a method that does not decimate pixels, such as linear interpolation, is preferable so that thin lines are not broken. Because linear interpolation lowers the MTF of the image, MTF correction may be performed by spatial filtering or the like after the resolution conversion.
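The following is a minimal bilinear (linear interpolation) resize sketch for a grayscale NumPy array, assuming the common half-pixel-center coordinate mapping; `resize_bilinear` is a hypothetical name. With this mapping, every input row and column contributes to the output for reduction factors up to 2, so a 1-pixel line survives at reduced contrast instead of disappearing as it can with nearest neighbor. For larger factors, plain bilinear sampling starts skipping input pixels, so area averaging or repeated halving would be needed to keep the no-decimation property.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D array with bilinear interpolation (half-pixel-center mapping)."""
    img = np.asarray(img, dtype=np.float64)
    in_h, in_w = img.shape
    # Map output pixel centers back to (fractional) input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]   # vertical interpolation weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]   # horizontal interpolation weights, shape (1, out_w)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

An MTF-correcting sharpening filter, as the text suggests, would be applied to the result afterwards.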

Next, the frequency conversion processing unit 302 applies the DCT shown in Formula 1 to blocks of 8 × 8 pixels (64 pixels in total) and outputs 64 DCT coefficients per block. The local region identification processing unit 303 then uses these DCT coefficients to determine whether each 8 × 8 pixel region is a line drawing.
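Formula 1 is defined earlier in the document and not reproduced here; the sketch below assumes it is the standard orthonormal 8 × 8 two-dimensional DCT-II (the same scaling as the JPEG forward DCT). It splits a grayscale image into non-overlapping 8 × 8 blocks and transforms all of them with one batched matrix product. `dct_matrix` and `blockwise_dct` are hypothetical names, and incomplete edge blocks are simply dropped.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C, so that F = C @ block @ C.T."""
    k = np.arange(n)[:, None]      # frequency index
    x = np.arange(n)[None, :]      # pixel index
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    alpha = np.full((n, 1), np.sqrt(2.0 / n))
    alpha[0, 0] = np.sqrt(1.0 / n)
    return alpha * c

def blockwise_dct(img, n=8):
    """Return DCT coefficients of shape (H // n, W // n, n, n), one n x n set per block."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    h_crop, w_crop = h - h % n, w - w % n          # drop incomplete edge blocks
    blocks = (img[:h_crop, :w_crop]
              .reshape(h_crop // n, n, w_crop // n, n)
              .transpose(0, 2, 1, 3))              # (rows of blocks, cols of blocks, n, n)
    c = dct_matrix(n)
    return c @ blocks @ c.T                        # matmul broadcasts over the block grid
```

Because one result is produced per block, the block grid is 1/8 the width and height of the input, which matches the size of the identification result image described for FIG. 6.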

A local region identification method using DCT coefficients is described below. FIG. 5 arranges the DCT coefficients F(u, v) of Formula 1 in two dimensions; the cells filled in black are the main coefficients in which the DCT power concentrates when a vertical, horizontal, or diagonal line is input. Coordinate (0, 0) represents the direct-current (DC) component.

FIG. 5(a) shows the DCT coefficients in which power concentrates when a vertical-line image is input; for convenience, these are called the vertical line components. Likewise, FIG. 5(b) shows the horizontal line components, corresponding to a horizontal-line image, and FIG. 5(c) shows the diagonal line components, corresponding to a diagonal-line image. In each figure, the white cells other than the DC component are called the non-vertical-line, non-horizontal-line, and non-diagonal-line components, respectively.

As FIG. 5 shows, the coefficients in which power concentrates for a given input can be largely predicted, while the non-line components take low values. This property makes it possible to distinguish line drawings from other content using the DCT coefficients.

Specifically, as shown in the flowchart of FIG. 13, the sum or the maximum of the absolute values of the DCT coefficients is first computed for each line component and each non-line component. Then, for each of the vertical, horizontal, and diagonal directions, the difference between the line component and the non-line component is compared with a predetermined value (steps 21 to 23), and the region is identified as a line drawing if the difference is equal to or greater than that value.
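The FIG. 13 decision can be sketched as follows. FIG. 5 is not reproduced here, so the masks are a hypothetical simplification: with F indexed as F[v, u] (row v = vertical frequency, column u = horizontal frequency), a vertical line is assumed to put its power in row v = 0, a horizontal line in column u = 0, and a diagonal line on the main diagonal u = v. The sketch uses the maximum of |F| (one of the two options the text allows) for both the line and the non-line sets; `dct2` assumes Formula 1 is the orthonormal 8 × 8 DCT-II, and the threshold is an arbitrary placeholder.

```python
import numpy as np

N = 8
_v, _u = np.indices((N, N))          # F[v, u]: v = vertical, u = horizontal frequency
AC = ~((_v == 0) & (_u == 0))        # every coefficient except DC at (0, 0)
# Hypothetical line-component masks; FIG. 5 is not reproduced here.
LINE_MASKS = {
    "vertical": (_v == 0) & (_u > 0),     # vertical line: varies only horizontally
    "horizontal": (_u == 0) & (_v > 0),   # horizontal line: varies only vertically
    "diagonal": (_u == _v) & (_u > 0),    # 45-degree line (either direction)
}

def dct2(block):
    """Orthonormal 8x8 DCT-II of one block (assumed equivalent to Formula 1)."""
    k = np.arange(N)[:, None]
    x = np.arange(N)[None, :]
    c = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * x + 1) * k / (2 * N))
    c[0, :] = np.sqrt(1.0 / N)
    return c @ np.asarray(block, dtype=np.float64) @ c.T

def is_line_drawing(F, threshold):
    """Compare line vs non-line |F| maxima, one direction at a time."""
    a = np.abs(np.asarray(F, dtype=np.float64))
    for mask in LINE_MASKS.values():
        non_line = AC & ~mask
        if a[mask].max() - a[non_line].max() >= threshold:
            return True
    return False
```

For example, `is_line_drawing(dct2(block), threshold=50.0)` returns True for an 8 × 8 block containing a single dark vertical stroke and False for a flat block, since a flat block has no AC power at all.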

Alternatively, a learning machine such as a support vector machine (hereinafter SVM) may be used: the absolute values of the alternating-current (AC) components, that is, all components other than DC, are input as features, and a model trained in advance on DCT coefficients computed from line drawings distinguishes line drawings from everything else. Other frequency transforms exhibit the same property; for example, the same identification is possible if a DWT is used for the frequency conversion and local region identification is performed on the DWT coefficients.

FIG. 6(a) shows a registered image 112 of a form, and FIG. 6(b) shows the line drawing identification result image (local region identification result) 304 for that image, rendered as a binary image with line drawing portions in black and all other portions in white. FIG. 6 shows an example in which local region identification is performed for each rectangular area on which the 8 × 8 pixel DCT was performed. The identification result image is therefore 1/8 the width and 1/8 the height of the image input to the frequency conversion processing unit 302.

When local region identification is performed by frequency conversion over the multiple pixels of a rectangular area, whether to output results per rectangle or per pixel depends on the required accuracy. Processing per rectangular area, as in the present embodiment, greatly reduces the amount of computation and thus the processing cost, but it is less accurate than per-pixel identification. In per-pixel identification, the DCT windows are overlapped so that a DCT is performed around each pixel of interest and an identification result is output for every pixel; the line drawing identification result image 304 then has the same size as the image input to the frequency conversion processing unit 302.

Next, the feature amount calculation processing unit 305 calculates feature amounts from the line drawing identification result image 304. Possible image features include image moments, texture, and edge amount; one calculation approach divides the image into several regions and computes features for each region. Layout information of the image can also be used as a feature.

The present embodiment uses higher-order local autocorrelation (HLAC) features. For a binary image, one feature is calculated for each of the 25 types of 3 × 3 local patterns shown in FIG. 7, giving a 25-dimensional feature vector. Each feature is computed by scanning every pixel of the image and summing the product of the pixel values at the positions marked "1" in the corresponding pattern of FIG. 7 (for a binary image, this product is the logical AND of those pixels). Because the resulting values depend on the image size, they are normalized when registered images 112 of different sizes are input. In this way, the line drawing identification result produced by the local region identification processing unit 303 is converted into a 25-dimensional feature vector.
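FIG. 7 is not reproduced here, so the sketch below regenerates an equivalent set of 25 binary HLAC patterns by enumerating every set of one to three distinct positions in a 3 × 3 window that includes the center, and discarding sets that are translations of one already seen (1 zeroth-order, 4 first-order, and 20 second-order patterns). The pattern order almost certainly differs from the numbering No. 1 to No. 25 of FIG. 7, and dividing by the number of scanned positions is one assumed form of the size normalization. Both function names are hypothetical.

```python
import itertools
import numpy as np

def hlac_patterns():
    """Return 25 displacement sets (as (dr, dc) tuples) for binary HLAC up to order 2."""
    neighbors = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    seen, patterns = set(), []
    for order in range(3):                               # 0th, 1st, 2nd order
        for combo in itertools.combinations(neighbors, order):
            disp = [(0, 0), *combo]
            rmin = min(r for r, _ in disp)
            cmin = min(c for _, c in disp)
            # Translation-invariant key: shift the shape to its top-left corner.
            key = tuple(sorted((r - rmin, c - cmin) for r, c in disp))
            if key not in seen:
                seen.add(key)
                patterns.append(tuple(sorted(disp)))
    return patterns

def hlac_features(binary_img):
    """25-dim HLAC feature vector of a 0/1 image, normalized by the scanned area."""
    img = np.asarray(binary_img, dtype=np.int64)
    h, w = img.shape                                     # needs h >= 3 and w >= 3
    feats = []
    for disp in hlac_patterns():
        prod = np.ones((h - 2, w - 2), dtype=np.int64)
        for dr, dc in disp:                              # product = AND for 0/1 pixels
            prod *= img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        feats.append(prod.sum())
    return np.array(feats, dtype=np.float64) / ((h - 2) * (w - 2))
```

Applied to the 1/8-scale line drawing identification result image (black = 1), this yields one 25-dimensional vector per registered image.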

The 25-dimensional feature vector is input to the type attribute identification processing unit 306, which identifies the document type attribute 307 of the registered image 112. As an example, the type attribute identification processing unit 306 of the present embodiment distinguishes the document types "form", "drawing", and "other".

To identify the document type attribute, the type attribute identification processing unit 306 only needs to focus on the features characteristic of "forms" and "drawings" within the input 25-dimensional feature vector. FIG. 8 is a graph of the features obtained when document images of a form, a drawing, and a paper are input. Each graph in FIG. 8 shows a tendency characteristic of its document type:
- Forms have large values in dimensions No. 3 and No. 6, and small but nonzero values in dimensions No. 10 to No. 25.
- Drawings have larger values overall than forms.
- Papers have smaller values overall than both forms and drawings.

These characteristics therefore make it possible to identify "forms" and "drawings". The identification is performed either by comparing feature values with one another or by thresholding, that is, checking whether a value exceeds a predetermined threshold.

FIG. 9 is a flowchart of the type attribute identification processing unit 306. In step S011, the sum of features No. 2 to No. 25 is calculated and compared with a predetermined threshold. If the sum is equal to or less than the threshold, the "other" attribute is selected.

If the sum of features No. 2 to No. 25 is greater than the threshold, step S012 compares the ratio of (No. 3 + No. 6) to (No. 4 + No. 5) with a predetermined value. If the ratio is equal to or less than that value, the "drawing" attribute is selected; otherwise, the "form" attribute is selected.
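A minimal sketch of the FIG. 9 decision, assuming the 25 features are passed as a list in which `f[k - 1]` is feature No. k in the numbering of FIG. 7. Both threshold values are hypothetical placeholders to be tuned on real data, and treating a zero denominator as an infinitely large ratio (hence "form") is an assumption the text does not address.

```python
from typing import Sequence

def classify_document_type(f: Sequence[float], sum_threshold: float,
                           ratio_threshold: float) -> str:
    """Return "other", "drawing" or "form" following steps S011 and S012."""
    if len(f) != 25:
        raise ValueError("expected a 25-dimensional HLAC feature vector")
    # S011: little line-drawing structure overall -> "other" (sum of No. 2 .. No. 25).
    if sum(f[1:]) <= sum_threshold:
        return "other"
    # S012: ratio of (No. 3 + No. 6) to (No. 4 + No. 5).
    no3, no4, no5, no6 = f[2], f[3], f[4], f[5]
    denom = no4 + no5
    ratio = (no3 + no6) / denom if denom > 0 else float("inf")
    return "drawing" if ratio <= ratio_threshold else "form"
```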

Alternatively, the document type may be identified by inputting the 25-dimensional feature vector into a learning machine such as an SVM and using a model trained in advance.

This completes the document type identification process of Example 1, and the document type attribute of the registered image 112 has been identified.

FIG. 10 is an operation flowchart for searching for a document image using document classification based on the document type identification process.

In step S101, the user of the client device 100 uses the application program 102 to instruct the server device 110 to classify the documents. As the means for giving this instruction, a thumbnail list display screen such as the one shown in FIG. 11, for example, is displayed on the display device 101 of the client device 100.

In FIG. 11, 401 is a classification radio button for requesting a display screen organized by document classification, 402 is a thumbnail radio button for requesting the thumbnail display, 403 is a frame in which images are displayed, and 404 is an image thumbnail. Multiple image thumbnails 404 are displayed in the frame 403 as a list, as in a typical image DB. Since the image DB 114 usually holds a large number of images, thumbnails that do not fit on the screen at once are browsed either by scrolling with a vertical slider provided on the frame or by switching the displayed group of images with a page-turning function.

When the user clicks the classification radio button 401 with a pointing device of the input device 103, such as a mouse, screen control data 120 representing the document classification instruction is transferred to the server via the external communication path 104.

In step S102, when the server device 110 receives the screen control data 120 representing the document classification instruction, the display screen control processing unit 118 counts, among other things, the number of document images for each item of document type identification data (hereinafter, classification category) in the image information DB 117, and determines the layout of the display screen and the document image data to be displayed.

Next, the document image data to be displayed, or thumbnails of that data, are read from the image DB 114, and a classification result display screen 119 is generated and transmitted from the external I/F 111 to the client device 100 via the external communication path 104.

FIG. 12 shows an example of the classification result display screen. Labels such as "form" and "drawing" in FIG. 12 denote categories. Reference numeral 411 denotes a classification category; the example shows documents classified into three categories. The size of each ellipse 411 schematically represents the number of documents in the category (the number could instead be shown directly as a figure), and the thumbnails inside a category are those of the document images belonging to it. If the image DB 114 holds only a few images, all of them are shown as thumbnails; otherwise, a few representative images are shown for each category. Limiting the number of displayed images in this way shortens the display time on the client device 100, the transfer time over the external communication path, and the processing time on the server device 110. If sufficient processing speed is available, all images may be displayed, for example by stacking them or by providing a slider.
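The server-side aggregation behind such a screen (counting documents per category and choosing a few thumbnails to send) can be sketched as below. The input format, the function name `build_category_view`, and the rule "first N documents are the representatives" are all hypothetical, since the text does not say how representative images are chosen.

```python
from collections import defaultdict
from typing import Dict, List

def build_category_view(doc_types: Dict[str, List[str]],
                        max_thumbnails: int = 4) -> Dict[str, dict]:
    """Group document IDs by classification category for a FIG. 12-style screen.

    doc_types maps a document ID to its identified type attributes; a document
    may carry several attributes and then appears in several categories.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id, types in doc_types.items():
        for t in types:
            groups[t].append(doc_id)
    return {
        category: {
            "count": len(ids),                   # could drive the ellipse size
            "thumbnails": ids[:max_thumbnails],  # hypothetical choice of representatives
        }
        for category, ids in groups.items()
    }
```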

There are various methods for creating such a display screen and for communication between the server and the client. A commonly used approach is to run the server device 110 as a Web server and use World Wide Web technology. In that case, the display screen 119 is written in HTML (HyperText Markup Language), and a general-purpose Web browser can serve as the application 102.

In step S103, the client device 100 displays the display screen 119 on the display device 101.

In step S104, the user uses the input device 103 to select the category closest to the document image being searched for, and the selected category data is transmitted to the server device 110. Assume here, for example, that the "form" category of FIG. 12 is selected. A category is selected by clicking inside it with a pointing device such as a mouse. The screen may also be configured so that each category can be toggled between selected and deselected, allowing multiple categories to be selected.

In step S105, the display screen control processing unit 118 of the server device 110 receives the category selection, creates a display screen listing the thumbnails in the selected category (the form category), as shown in FIG. 11, and transmits it to the client device 100.

In step S106, the client device 100 displays the display screen shown in FIG. 11 on the display device 101.

By identifying document types and displaying documents classified by type in this way, the user can narrow down the search targets while checking features such as the overall appearance of the images, even in an image DB with a large number of registered document images.

As described above, in the image processing apparatus and method of the present embodiment for identifying the document type of a bitmap image, each local region is identified as a line drawing or not, feature amounts are calculated from the local region identification results, and the document type is identified and the document classified based on those features. As a result, document types can be identified accurately even for document images with complex layouts. Moreover, unlike the region identification of layout analysis, the image does not need to be scanned multiple times; a single scan suffices. This reduces the processing cost and makes the processing time easy to predict, yielding convenient processing.

Example 2:
In Example 2, the local region identification processing unit 303 identifies character images in addition to line drawings, which improves the accuracy of the document type attribute. The system configuration of Example 2 is the same as that of Example 1.

Compared with those of a line drawing, the DCT coefficients of a character image contain several line components. FIG. 14 is a flowchart of the local region identification processing unit 303 of Example 2. In Example 1, a region was identified as a line drawing when the difference between a line component and the corresponding non-line component was equal to or greater than a predetermined value. In Example 2, the maximum of the AC components is calculated first. If this maximum is equal to or less than a predetermined value (yes in step S031), the region is classified as "other". Otherwise (no in step S031), if the difference between a line component and its non-line component is equal to or greater than a predetermined value (yes in step S032), the region is classified as a "line drawing"; if the differences for all line components fall below the predetermined value (no in steps S033 and S034), it is classified as a "character". As in Example 1, a learning machine such as an SVM may be used instead.
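A minimal sketch of the FIG. 14 three-way decision on one 8 × 8 block of DCT coefficients. As before, FIG. 5 is not reproduced, so the line masks (row v = 0, column u = 0, and the main diagonal of F[v, u]) are a hypothetical simplification, the maximum of |F| stands in for "the line component", and both thresholds are placeholders.

```python
import numpy as np

N = 8
_v, _u = np.indices((N, N))          # F[v, u]: v = vertical, u = horizontal frequency
AC = ~((_v == 0) & (_u == 0))        # every coefficient except DC
# Hypothetical line-component masks (FIG. 5 is not reproduced here).
LINE_MASKS = (
    (_v == 0) & (_u > 0),            # vertical lines
    (_u == 0) & (_v > 0),            # horizontal lines
    (_u == _v) & (_u > 0),           # diagonal lines
)

def classify_block(F, ac_threshold, line_threshold):
    """Label one 8x8 DCT block as 'other', 'line_drawing' or 'character'."""
    a = np.abs(np.asarray(F, dtype=np.float64))
    if a[AC].max() <= ac_threshold:                  # S031: almost no AC power
        return "other"
    for mask in LINE_MASKS:                          # S032-S034: one direction each
        if a[mask].max() - a[AC & ~mask].max() >= line_threshold:
            return "line_drawing"
    return "character"                               # AC power, no dominant line direction
```

Running this over every block yields the two binary maps of Example 2 (blocks labeled "line_drawing" and blocks labeled "character"), each of which can then be fed to the HLAC feature computation.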

Because the local region identification processing unit 303 of Example 2 identifies both "line drawings" and "characters", it produces two identification result images: a line drawing identification result image and a character identification result image. FIG. 6(c) shows the character identification result image of Example 2. The line drawing identification result image is the same as in Example 1.

In Example 2, the feature amount calculation processing unit 305 calculates features for each identification result image 304 in the same way as in Example 1. For example, when HLAC features are used, 25 dimensions are calculated from the line drawing identification result image and another 25 from the character identification result image, and the resulting 50-dimensional feature vector is input to the type attribute identification processing unit 306.

As an example, Example 2 identifies four attributes for an input registered image 112: "form", "drawing", "book", and "other". FIG. 15 shows the configuration of the type attribute identification processing unit 306 of Example 2, which performs the type attribute identification with classifiers based on a learning machine such as an SVM.

In FIG. 15, 501 is the 50-dimensional feature vector output from the feature amount calculation processing unit 305; 502 is a form classifier that determines from the input feature vector 501 whether the document type attribute is a form; 503 is a form model created in advance by training on features of form-image training data; 504 is the form / not-form identification result; 505 is a drawing classifier that determines from the feature vector 501 whether the document type attribute is a drawing; 506 is a drawing model created in advance by training on features of drawing-image training data; 507 is the drawing / not-drawing identification result; 508 is a book classifier that determines from the feature vector 501 whether the document type attribute is a book; 509 is a book model created in advance by training on features of book-image training data; and 510 is the book / not-book identification result.

An SVM is a classifier that projects multidimensional features into a feature space, automatically constructs a separating hyperplane, and classifies inputs using a decision function based on what is called a kernel. Before it can be used for identification, the SVM must be trained on training data, and the training result is stored in a file called a model. Training uses pairs of the "features" actually used for identification and the "correct identification result". Once trained, the SVM can identify inputs quickly using only the model file. In the present embodiment, the training results for forms, drawings, and books are stored in the model files 503, 506, and 509, respectively.

The operation of the type attribute identification processing unit 306 of Example 2 is described below. The 50-dimensional feature vector output from the feature amount calculation processing unit 305 is input to each of the form classifier 502, the drawing classifier 505, and the book classifier 508, which perform identification using the models 503, 506, and 509, respectively, and each outputs its result. If all results are negative (off), the "other" attribute is selected. Normally, one document type attribute is assigned to each registered image; however, even a person identifying and classifying images may be unsure which attribute to assign. When users' subjective judgments can differ in this way and an image is searched for using the classification display, the identification result may not match the user's judgment, and the user does not obtain the desired search result.

It is therefore desirable to assign multiple attributes in advance to images on which subjective judgments are likely to differ, so that the target document is found whichever of those categories the user selects.
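The parallel arrangement of FIG. 15 can be sketched independently of any particular SVM library: each type has its own binary classifier, all classifiers run on the same feature vector, every positive result is kept (so an image may receive several attributes), and "other" is used only when none fires. `linear_decision` is a hypothetical stand-in for a trained model such as a linear SVM's decision function; the weights in the usage example are made up.

```python
from typing import Callable, Dict, List, Sequence

Classifier = Callable[[Sequence[float]], float]

def linear_decision(weights: Sequence[float], bias: float) -> Classifier:
    """Hypothetical trained binary classifier: returns w.x + b (positive = member)."""
    def decide(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return decide

def identify_types(features: Sequence[float],
                   classifiers: Dict[str, Classifier]) -> List[str]:
    """Run every per-type classifier; keep all positives, else fall back to 'other'."""
    labels = [name for name, decide in classifiers.items() if decide(features) > 0]
    return labels or ["other"]
```

For instance, with `clfs = {"form": linear_decision([1.0, 0.0], -0.5), "drawing": linear_decision([0.0, 1.0], -0.5)}`, the call `identify_types([1.0, 1.0], clfs)` returns both `"form"` and `"drawing"`.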

To achieve this, training data whose type attributes should overlap must be prepared and used during training. FIG. 16 illustrates learning with overlapping data. Each rectangle 601 in FIG. 16 represents an image, and 602 and 603 each represent a category of images sharing the same attribute. In other words, FIG. 16 shows where each image falls when the images are mapped onto two dimensions using their features. The black images 604 in FIG. 16 (hereinafter, overlapping images) belong to both the form category 602 and the drawing category 603. These overlapping images are used as correct examples both when training the form model 503 (labeled as forms) and when training the drawing model 506 (labeled as drawings). By training on the overlapping data in this way, an image that cannot clearly be judged to be either a form or a drawing is identified as both.
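On the training side, the overlap can be expressed by giving each training sample a set of correct types and deriving one binary (+1 / -1) training set per model. This is a minimal sketch with a hypothetical function name; the resulting per-type sets would then be passed to whatever SVM trainer is used to build models 503, 506, and 509.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

def one_vs_rest_targets(
    samples: Iterable[Tuple[Sequence[float], Set[str]]],
    types: Sequence[str],
) -> Dict[str, List[Tuple[Sequence[float], int]]]:
    """Build one binary training set per type model.

    An overlapping image (like the black images 604) lists several types and is
    therefore a positive (+1) example for each of them and negative (-1) elsewhere.
    """
    samples = list(samples)
    return {
        t: [(x, +1 if t in labels else -1) for x, labels in samples]
        for t in types
    }
```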

Document images such as books are usually input as multiple pages. In that case, processing and identification are performed page by page, as in the present embodiment, and the type attribute identified most often across the pages is used as the representative document type attribute of the multi-page document image. Alternatively, separate attributes may be kept for each page and for the document as a whole, and used respectively when generating a page-level display screen and a document-level display screen.
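The "most frequently identified attribute" rule for multi-page documents amounts to a vote over the per-page results. The sketch below counts every attribute assigned to every page; returning "other" for an empty document and breaking ties by first occurrence are assumptions, since the text does not cover those cases. `representative_type` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List

def representative_type(page_types: Iterable[List[str]]) -> str:
    """Pick the document-level type as the attribute identified on the most pages.

    Ties go to the attribute seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    counts = Counter(t for types in page_types for t in types)
    if not counts:
        return "other"
    return counts.most_common(1)[0][0]
```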

As described above, in the present embodiment each local region is identified as a line drawing, a character, or other, and the dimensionality of the features is increased, which improves identification accuracy. In addition, when type attributes are identified, using multiple classifiers in parallel and allowing their results to overlap not only improves identification accuracy but also makes it possible to find, through the classification display based on document type identification, images on which users' subjective judgments differ.

Example 3:
In Example 3, the local region identification processing unit 303 identifies photographic images in addition to line drawings and character images, further improving the accuracy of the document type attribute. The system configuration of Example 3 is also the same as that of Example 1.

Compared with those of line drawings and character images, the DCT coefficients of a photographic image generally have weaker AC power, and that power is spread over a relatively wide range of coefficients. FIG. 17 is a flowchart of the local region identification processing unit of Example 3. In Example 3, the maximum of the AC components is compared with a first threshold; if it is equal to or less than the first threshold (yes in step S041), the region is identified as "other". If it exceeds the first threshold (no in step S041) but is equal to or less than a second threshold (yes in step S042), the region is identified as "photograph".
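Extending the block classifier with the two-threshold photograph test gives the sketch below. The text only spells out steps S041 and S042, so continuing with the Example 2 line/character test when the AC maximum exceeds the second threshold is an assumption, as are the masks (same hypothetical simplification of FIG. 5 as before) and all threshold values.

```python
import numpy as np

N = 8
_v, _u = np.indices((N, N))          # F[v, u]: v = vertical, u = horizontal frequency
AC = ~((_v == 0) & (_u == 0))        # every coefficient except DC
# Hypothetical line-component masks (FIG. 5 is not reproduced here).
LINE_MASKS = (
    (_v == 0) & (_u > 0),            # vertical lines
    (_u == 0) & (_v > 0),            # horizontal lines
    (_u == _v) & (_u > 0),           # diagonal lines
)

def classify_block_with_photo(F, first_threshold, second_threshold, line_threshold):
    """Label an 8x8 DCT block as 'other', 'photo', 'line_drawing' or 'character'.

    Requires first_threshold < second_threshold.
    """
    a = np.abs(np.asarray(F, dtype=np.float64))
    ac_max = a[AC].max()
    if ac_max <= first_threshold:         # S041: hardly any AC power
        return "other"
    if ac_max <= second_threshold:        # S042: weak AC power -> photograph
        return "photo"
    # Assumed to continue as in Example 2 (not spelled out in the text).
    for mask in LINE_MASKS:
        if a[mask].max() - a[AC & ~mask].max() >= line_threshold:
            return "line_drawing"
    return "character"
```

Collecting the blocks of each label gives the three binary maps whose 3 × 25 HLAC features form the 75-dimensional vector described next.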

Because the local region identification processing unit 303 of Example 3 identifies "line drawings", "characters", and "photographs", it produces three identification result images: a line drawing identification result image, a character identification result image, and a photograph identification result image. The feature amount calculation processing unit 305 calculates features for each identification result image 304 as in Example 2. When HLAC features are used, for example, a 75-dimensional feature vector is calculated and input to the type attribute identification processing unit 306, which uses it to identify multiple document types with multiple classifiers, as in Example 2.

Because the local region identification processing unit 303 of the third embodiment also identifies photographic regions, it can identify document types that contain both characters and photographs, such as catalogs and flyers, as well as images consisting only of photographs.

As described above, according to this embodiment, each local region is classified as line drawing, character, photograph, or other, which increases the dimensionality of the feature amount. This improves identification accuracy and also makes it possible to identify document-type categories that include photographs.
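The type attribute identification processing unit 306 is described as running several classifiers, one per document type, and the claims allow their results to overlap. A minimal sketch of that arrangement follows, assuming hypothetical linear scorers (a weight vector and bias per type) in place of whatever classifier the second embodiment actually uses; the function name classify_document, the threshold, and the example weights are illustrative only.

```python
def classify_document(features, classifiers, threshold=0.0):
    """Run every per-type classifier independently and return all types that fire.

    classifiers: hypothetical mapping {doc_type: (weights, bias)}; real weights
    would come from training on labelled feature vectors.
    """
    labels = []
    for doc_type, (weights, bias) in classifiers.items():
        score = sum(w * f for w, f in zip(weights, features)) + bias
        if score > threshold:
            labels.append(doc_type)
    return labels
```

Because each classifier decides independently, one image can be labeled both “form” and “drawing”, or receive no label at all.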

FIG. 1 shows the system configuration of the first embodiment of the present invention.
FIG. 2 shows the configuration of the server apparatus and the client apparatus.
FIG. 3 is an operation flowchart for registering a document image.
FIG. 4 shows the configuration of the document type identification processing unit.
FIG. 5 shows the distribution of DCT coefficients.
FIG. 6 shows the output of the local region identification.
FIG. 7 shows the patterns used for higher-order local autocorrelation.
FIG. 8 shows the feature amounts of form, drawing, and paper images.
FIG. 9 is a flowchart of the type attribute identification processing unit according to the first embodiment.
FIG. 10 is an operation flowchart for retrieving a document image according to the first embodiment.
FIG. 11 shows an example of a thumbnail list display screen.
FIG. 12 shows an example of a classified display.
FIG. 13 is a flowchart of the local region identification processing unit according to the first embodiment.
FIG. 14 is a flowchart of the local region identification processing unit according to the second embodiment.
FIG. 15 shows the configuration of the type attribute identification processing unit according to the second embodiment.
FIG. 16 illustrates the learning of overlapping data.
FIG. 17 is a flowchart of the local region identification processing unit according to the third embodiment.

Explanation of symbols

100 Client apparatus
101 Display device
102 Application program
103 Input device
104 External communication path
110 Server apparatus
111 External interface
112 Registered image data
113 Thumbnail generation processing unit
114 Image DB
115 Document type identification processing unit
117 Image information DB
118 Display screen control processing unit
119 Display screen data
120 Screen control data

Claims

An image processing apparatus for identifying a document type of an image, comprising: local area identifying means for identifying, with reference to pixels in a predetermined area of the image, whether the area is a line drawing area; feature amount calculating means for calculating a feature amount from the line drawing identification result obtained by the local area identifying means; and document type identifying means for identifying the document type of the image according to the feature amount.

An image processing apparatus for identifying a document type of an image, comprising: local area identifying means for identifying line drawing areas and character areas with reference to pixels in a predetermined area of the image; feature amount calculating means for calculating a feature amount from the line drawing identification result and the character identification result obtained by the local area identifying means; and document type identifying means for identifying the document type of the image according to the feature amount.

An image processing apparatus for identifying a document type of an image, comprising: local area identifying means for identifying line drawing areas, character areas, and photographic areas with reference to pixels in a predetermined area of the image; feature amount calculating means for calculating a feature amount from the line drawing identification result, the character identification result, and the photograph identification result obtained by the local area identifying means; and document type identifying means for identifying the document type of the image according to the feature amount.

The image processing apparatus according to claim 1, 2, or 3, wherein the local area identifying means performs frequency conversion on a plurality of pixels in the predetermined area to output a plurality of frequency conversion coefficients, and identifies each area based on the frequency conversion coefficients.

The image processing apparatus according to any one of claims 1 to 4, wherein the document type identifying means has a plurality of classifiers, one for each document type, and allows the identification results of the plurality of classifiers to overlap.

The image processing apparatus according to claim 5, wherein the document type identifying means performs identification based on learning data learned in advance using feature amounts for each document type, and the learning data includes data belonging to a plurality of document types.

An image processing method for identifying a document type of an image, comprising: a local region identifying step of identifying, with reference to pixels in a predetermined region of the image, whether the region is a line drawing region; a feature amount calculating step of calculating a feature amount from the line drawing identification result obtained in the local region identifying step; and a document type identifying step of identifying the document type of the image according to the feature amount.

An image processing method for identifying a document type of an image, comprising: a local region identifying step of identifying line drawing regions and character regions with reference to pixels in a predetermined region of the image; a feature amount calculating step of calculating a feature amount from the line drawing identification result and the character identification result obtained in the local region identifying step; and a document type identifying step of identifying the document type of the image according to the feature amount.

An image processing method for identifying a document type of an image, comprising: a local region identifying step of identifying line drawing regions, character regions, and photographic regions with reference to pixels in a predetermined region of the image; a feature amount calculating step of calculating a feature amount from the line drawing identification result, the character identification result, and the photograph identification result obtained in the local region identifying step; and a document type identifying step of identifying the document type of the image according to the feature amount.

The image processing method according to claim 7, 8, or 9, wherein the local region identifying step performs frequency conversion on a plurality of pixels in the predetermined region to output a plurality of frequency conversion coefficients, and identifies each region based on the frequency conversion coefficients.

The image processing method according to claim 7, wherein the document type identifying step includes a plurality of identification steps, one for each document type, and allows the identification results of the plurality of identification steps to overlap.

The image processing method according to claim 11, wherein the document type identifying step performs identification based on learning data learned in advance using feature amounts for each document type, and the learning data includes data belonging to a plurality of document types.

A program for causing a computer to implement the image processing method according to any one of claims 7 to 12.

A computer-readable recording medium on which a program for causing a computer to implement the image processing method according to any one of claims 7 to 12 is recorded.