JP2006203429A - Image data processor

Info

Publication number: JP2006203429A
Application number: JP2005011541A
Authority: JP
Inventors: Ayumi Onishi; あゆみ大西; Nobuo Inoue; 伸夫井上; Minoru Sodeura; 稔袖浦; Masataka Kamiya; 昌孝神谷; Sadao Kootani; 貞夫古尾谷; Junji Kaminari; 淳二神成; Norihisa Hasegawa; 記央長谷川
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-01-19
Filing date: 2005-01-19
Publication date: 2006-08-03
Anticipated expiration: 2025-01-19
Also published as: JP4665522B2

Abstract

PROBLEM TO BE SOLVED: To provide an image data processor that can easily build a database, without placing a heavy burden on the user, even from a hard document that has not been formatted in advance and whose image includes figures, frames, and the like.

SOLUTION: The image data processor, which applies predetermined processing to input image data consisting of a plurality of pages, is equipped with an image discrimination means that discriminates, based on the input multi-page image data, between a common image shared by the pages and non-common images that differ from page to page, and an attribute adding means that cuts the common image and non-common images discriminated by the image discrimination means into rectangular regions and adds attribute information to those regions.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an image data processing apparatus that processes image data of a document input, for example, by being read by an image reading device such as a scanner. More particularly, it relates to an image data processing apparatus that can efficiently build a database from input image data, such as a multi-page document whose pages share the same form, by separating common images from non-common images.

JP 2002-27228 A; JP-A-9-106450

In recent years, many of the documents handled in corporate offices, government offices, and the like have come to be exchanged not only as hard documents printed or copied on paper but also as digitized image data, such as document data created and saved on a personal computer or document data obtained by reading an original with a scanner. Hard documents are accordingly being digitized and compiled into databases.

In doing so, it is common, for example, to read paper material running to several tens of pages with a scanner, convert it into image data, and then store or transfer it. In such cases the image data file becomes excessively large and consumes a large share of the capacity of the hard disk used as the storage device.

In addition, when image data running to several tens of pages is printed out, or when the image data file is transferred, the excessive amount of data means that reading out or transferring the image data takes a long time, or that the network becomes congested.

Techniques that can address these problems have already been proposed, for example in JP 2002-27228 A and JP-A-9-106450.

The technique disclosed in JP 2002-27228 A is configured to remove the common portion before output when printing.

The technique disclosed in JP-A-9-106450 is configured so that, if the background color in the image data has the same density across pages, it is treated as common background data.

However, with the techniques disclosed in these publications, the common portion of a multi-page image is not stored, and a picture or text shared by the pages cannot be recognized and managed as a common part across multiple pages.

One way to digitize a hard document spanning multiple pages and compile it into a database is to digitize the format of the hard document beforehand. Data is then written into the digitized format, and the database can be built from the written data.

Alternatively, when the format has not been digitized, the user must cut the input data into rectangles, and the data in each rectangle must then be recognized automatically by OCR or the like before it can be compiled into a database.

However, the conventional techniques described above have the following problems. In the former case, a hard document whose format has not been digitized in advance cannot be compiled into a database. Moreover, the format of such a hard document must first be digitized, which places a heavy burden on the user.

In the latter case, the user must cut the input data into rectangles so that the data in each rectangle can be recognized automatically by OCR or the like and compiled into a database, which increases the user's workload.

Furthermore, in the latter case, if the hard document to be compiled into a database is an image that contains not only character data but also figures, frames, and the like, the characters and figures must be cut out individually, which makes the work of building the database extremely laborious.

The present invention has been made to solve the above problems of the prior art. Its object is to provide an image data processing apparatus that can easily build a database, without imposing an excessive burden on the user, even from a hard document or the like that has not been formatted in advance and whose image contains figures, frames, and the like.

To achieve the above object, the invention described in claim 1 is an image data processing apparatus that applies predetermined processing to input image data consisting of a plurality of pages, the apparatus comprising:
image identification means for identifying, based on the input image data consisting of the plurality of pages, a common image shared by the pages and non-common images that differ from page to page; and
attribute addition means for cutting the common image and non-common images identified by the image identification means into rectangular regions and adding attribute information to those rectangular regions.
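As an illustration of the attribute addition step in claim 1, the following minimal sketch cuts a binarized image into rectangular regions and attaches attribute information to each. The publication does not specify how the rectangles are found in this section; grouping 4-connected pixels and taking their bounding boxes is an assumption made here, and the names `Region` and `cut_rectangles` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Region:
    """A rectangular region (inclusive bounds) with attached attribute information."""
    top: int
    left: int
    bottom: int
    right: int
    attributes: dict = field(default_factory=dict)


def cut_rectangles(mask, kind):
    """Return one bounding rectangle per 4-connected blob of 1-pixels in `mask`.

    `mask` is a list of rows of 0/1 values; `kind` is stored as an attribute
    (for example "common" or "non-common"). Both conventions are assumptions.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over the blob, tracking its bounding box.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(Region(top, left, bottom, right, {"kind": kind}))
    return regions
```

Calling `cut_rectangles(mask, "common")` on the common image and `cut_rectangles(mask, "non-common")` on each page's remainder would yield the regions to which further attribute information can be added.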

The invention described in claim 2 is the image data processing apparatus according to claim 1, wherein the attribute addition means separates the common image and non-common images identified by the image identification means into a text part and an image part using text/image (T/I) separation means, leaves the image part as image information, applies character recognition processing to the text part, and adds attribute information to each rectangular region based on the character information recognized by the character recognition processing.

The invention described in claim 3 is the image data processing apparatus according to claim 1 or 2, wherein the attribute addition means adds attribute information to the non-common images, among the common image and non-common images identified by the image identification means, such that it is subordinate to the attribute of the common image.
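The subordination described in claim 3 can be pictured as a small tree in which each common-image attribute is a parent and the per-page non-common values hang beneath it. The sketch below is only one possible representation; the class `AttributeNode`, the helper names, and the sample values are hypothetical and do not come from the publication.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeNode:
    """An attribute with optional subordinate attributes."""
    name: str
    value: str = ""
    children: list = field(default_factory=list)


def attach(common_node, page_number, value):
    """Add a non-common value for one page beneath a common-image attribute."""
    child = AttributeNode(name=f"{common_node.name}/page{page_number}", value=value)
    common_node.children.append(child)
    return child


def flatten(node):
    """Yield (name, value) pairs depth-first, parent before children."""
    yield node.name, node.value
    for child in node.children:
        yield from flatten(child)
```

For example, a common label region recognized as "Order No." could be the parent of the order numbers written on each page, so that a database column can be derived from the parent and its rows from the children.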

The invention described in claim 4 is the image data processing apparatus according to any one of claims 1 to 3, wherein the attribute addition means allows selection of whether the order in which attribute information is added to the common images identified by the image identification means gives priority to the main scanning direction or to the sub-scanning direction.
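The selectable ordering in claim 4 can be sketched as a choice of sort key. This example assumes that the main scanning direction runs along a scan line (horizontal) and the sub-scanning direction runs down the page (vertical), and that "priority" means that direction is traversed first; the publication does not define the ordering rule in this section, and `order_regions` is a hypothetical name.

```python
def order_regions(regions, priority="main"):
    """Order regions given as (top, left) positions of their upper-left corners.

    priority="main": traverse along the main scanning direction first,
                     i.e. row by row (sort by top, then left).
    priority="sub":  traverse along the sub-scanning direction first,
                     i.e. column by column (sort by left, then top).
    """
    if priority == "main":
        return sorted(regions, key=lambda pos: (pos[0], pos[1]))
    if priority == "sub":
        return sorted(regions, key=lambda pos: (pos[1], pos[0]))
    raise ValueError("priority must be 'main' or 'sub'")
```

A form laid out as a table with headings along the top row would typically suit main-scanning priority, whereas a form with headings down the left column would suit sub-scanning priority.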

The invention described in claim 5 is the image data processing apparatus according to any one of claims 1 to 3, wherein the attribute addition means allows selection of whether the attribute information added to the non-common images identified by the image identification means is determined, relative to the common image, with priority given to the main scanning direction or to the sub-scanning direction.

The invention described in claim 6 is the image data processing apparatus according to any one of claims 1 to 5, wherein the image identification means comprises:
common image recognition means for recognizing, based on the input image data consisting of the plurality of pages, a common image shared by the pages;
common image extraction means for extracting the common image recognized by the common image recognition means from the input image data of each page; and
common image removal means for removing the common image extracted by the common image extraction means from the input image data of each page to obtain the non-common image that differs for each page.
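The three means in claim 6 can be sketched on binarized pages as follows. This section of the publication does not state how the common image is recognized; treating a pixel as common when it is set on every page (a pixel-wise AND) is an assumption made here, and it ignores the registration and tolerance handling that real scans would need. All function names are hypothetical.

```python
def recognize_common(pages):
    """Return a 0/1 mask whose pixels are 1 where every page has a 1."""
    rows, cols = len(pages[0]), len(pages[0][0])
    return [[int(all(page[r][c] == 1 for page in pages)) for c in range(cols)]
            for r in range(rows)]


def extract_common(page, common):
    """Keep only the pixels of `page` that belong to the common mask."""
    return [[px & m for px, m in zip(page_row, mask_row)]
            for page_row, mask_row in zip(page, common)]


def remove_common(page, common):
    """Clear the common pixels from `page`, leaving its non-common image."""
    return [[px & (1 - m) for px, m in zip(page_row, mask_row)]
            for page_row, mask_row in zip(page, common)]
```

With this decomposition, each page is the union of the shared common image and its own non-common image, so the common image need only be stored once.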

According to the present invention, it is possible to provide an image data processing apparatus that can easily build a database, without imposing an excessive burden on the user, even from a hard document or the like that has not been formatted in advance and whose image contains figures, frames, and the like.

Embodiments of the present invention will be described below with reference to the drawings.

Embodiment 1
FIG. 2 shows an image processing system to which the image data processing apparatus according to Embodiment 1 of the present invention is applied.

As shown in FIG. 2, the image processing system 1 comprises, for example, a scanner 2 as an image reading device, a color multifunction machine 3 as an image output device, a server 4 as a database, a personal computer 5 as an image creation device, and a network 6, consisting of a LAN, a telephone line, or the like, that connects the scanner 2, the color multifunction machine 3, the server 4, the personal computer 5, and so on so that they can communicate with one another. In the figure, reference numeral 7 denotes a communication modem that connects the scanner 2 to the network 6.

When digitizing the images of a document 8 or the like consisting of a plurality of pages, the scanner 2 reads the page images of the document 8 in sequence and outputs image data in which the image information of the document 8 is digitized. The image data of the document 8 read by the scanner 2 is sent, for example, over the network 6 to the color multifunction machine 3, where the image data processing apparatus provided inside the machine applies predetermined image processing before the data is printed out, or applies other desired processing to it. Besides being built into the color multifunction machine 3, the image data processing apparatus may be installed on the personal computer 5 as image data processing software so that the personal computer 5 itself functions as the image data processing apparatus.

The color multifunction machine 3 itself includes a scanner 9 as an image reading device. It copies the image of a document read by the scanner 9, prints based on image data sent from the personal computer 5 or read out from the server 4, and functions as a facsimile that sends and receives image data over a telephone line.

The server 4 stores the image data of the digitized document 8 as is, and also stores and holds data that has been read by the scanners 2 and 9, subjected to predetermined image processing by the image data processing apparatus, and compiled into a database.

FIG. 3 shows the color multifunction machine as an image output device to which the image data processing apparatus according to Embodiment 1 of the present invention is applied.

In FIG. 3, reference numeral 10 denotes the main body of the color multifunction machine. At the top of the machine is the scanner 9, an image reading device comprising an automatic document feeder (ADF) 11, which automatically feeds the sheets of the document 8 one at a time, and an image input device (IIT) 12, which reads the image of the document 8 fed by the automatic document feeder 11. The scanner 2 is configured in the same way as the scanner 9. The image input device 12 illuminates the document 8 placed on a platen glass 15 with a light source 16 and scans and exposes the reflected light image from the document 8 onto an image reading element 21, such as a CCD, through a reduction optical system consisting of a full-rate mirror 17, half-rate mirrors 18 and 19, and an imaging lens 20. The image reading element 21 reads the reflected light image of the color material of the document 8 at a predetermined dot density (for example, 16 dots/mm).

The reflected light image of the document 8 read by the image input device 12 is sent to an image processing device (IPS) 13 as, for example, reflectance data in three colors, red (R), green (G), and blue (B), at 8 bits each. The image processing device 13 applies predetermined image processing to the image data of the document 8, as described later, including, as required, shading correction, misregistration correction, lightness/color space conversion, gamma correction, frame erasure, and color/movement editing. The image processing device 13 also applies predetermined image processing to image data sent from the personal computer 5 or the like. The image data processing apparatus 100 according to the present embodiment is incorporated in the image processing device 13.
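To give a sense of the data volumes involved, the following sketch works out the resolution and uncompressed size of one scanned page from the example figures in the text (16 dots/mm, three colors at 8 bits each). The A4 page size of 210 mm x 297 mm is an assumption made for illustration, as are the function names.

```python
MM_PER_INCH = 25.4


def dots_per_inch(dots_per_mm):
    """Convert a dot density in dots/mm to dots per inch."""
    return dots_per_mm * MM_PER_INCH


def raw_page_bytes(width_mm, height_mm, dots_per_mm, channels=3, bits_per_channel=8):
    """Uncompressed size in bytes of one page scanned at the given density."""
    width_px = round(width_mm * dots_per_mm)
    height_px = round(height_mm * dots_per_mm)
    return width_px * height_px * channels * bits_per_channel // 8


# 16 dots/mm is about 406 dpi; an A4 page (assumed) is 3360 x 4752 pixels.
a4_bytes = raw_page_bytes(210, 297, 16)
```

Under these assumptions a single uncompressed A4 page occupies 47,900,160 bytes (roughly 48 MB), which illustrates why a document of several tens of pages can strain storage and network capacity.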

The image data processed by the image processing device 13 is then converted, also by the image processing device 13, into gradation data in four colors, yellow (Y), magenta (M), cyan (C), and black (K), at 8 bits each. As described next, this data is sent to a ROS (Raster Output Scanner) 24 shared by the image forming units 23Y, 23M, 23C, and 23K for yellow (Y), magenta (M), cyan (C), and black (K). The ROS 24, which serves as the image exposure device, performs image exposure with a laser beam LB according to the gradation data of each color. The apparatus is of course not limited to color images and may form monochrome images only.
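The publication does not describe how the RGB data is converted into Y, M, C, and K gradation data. Purely as an illustration of the idea, the sketch below uses the textbook naive conversion with full black replacement; an actual device would rely on calibrated color tables, so this is an assumed stand-in, and `rgb_to_ymck` is a hypothetical name.

```python
def rgb_to_ymck(r, g, b):
    """Naively convert 8-bit RGB to 8-bit (Y, M, C, K) gradation values."""
    # Complement each channel to get the subtractive primaries.
    c, m, y = 255 - r, 255 - g, 255 - b
    # The black component is the part shared by all three primaries.
    k = min(c, m, y)
    if k == 255:
        # Pure black: print with black toner only.
        return (0, 0, 0, 255)
    # Rescale what remains after removing black back to the 0..255 range.
    scale = 255 / (255 - k)
    return (round((y - k) * scale), round((m - k) * scale), round((c - k) * scale), k)
```

The tuple order (Y, M, C, K) follows the order in which the text lists the four colors.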

Inside the color multifunction machine 3, as shown in FIG. 3, an image forming section A is provided, in which four image forming units 23Y, 23M, 23C, and 23K for yellow (Y), magenta (M), cyan (C), and black (K) are arranged side by side at fixed intervals in the horizontal direction.

The four image forming units 23Y, 23M, 23C, and 23K all have the same configuration. Each broadly consists of a photosensitive drum 25 as an image carrier that is rotated at a predetermined speed, a charging roll 26 for primary charging that uniformly charges the surface of the photosensitive drum 25, the ROS 24 as an image exposure device that exposes an image corresponding to a given color onto the surface of the photosensitive drum 25 to form an electrostatic latent image, a developing device 27 that develops the electrostatic latent image formed on the photosensitive drum 25 with toner of the given color, and a cleaning device 28 that cleans the surface of the photosensitive drum 25. The photosensitive drum 25 and the image forming members arranged around it are integrated into a single unit that can be individually replaced in the main body 10 of the color multifunction machine.

As shown in FIG. 3, the ROS 24 is shared by the four image forming units 23Y, 23M, 23C, and 23K. It modulates four semiconductor lasers (not shown) according to the gradation data of each color so that the lasers emit laser beams LB-Y, LB-M, LB-C, and LB-K. A separate ROS 24 may of course be provided for each image forming unit. The laser beams LB-Y, LB-M, LB-C, and LB-K emitted from the semiconductor lasers strike a polygon mirror 29 through an f-θ lens (not shown) and are deflected and scanned by the polygon mirror 29. The deflected beams are then scanned onto the exposure points on the photosensitive drums 25 from obliquely below, via an imaging lens (not shown) and a plurality of mirrors.

Because the ROS 24 scans and exposes images onto the photosensitive drums 25 from below, as shown in FIG. 3, toner or the like may fall onto it from the developing devices 27 of the four image forming units 23Y, 23M, 23C, and 23K located above and contaminate it. The ROS 24 is therefore enclosed in a sealed rectangular-parallelepiped frame 30, and transparent glass windows 31Y, 31M, 31C, and 31K are provided in the top of the frame 30 as shield members through which the four laser beams LB-Y, LB-M, LB-C, and LB-K are exposed onto the photosensitive drums 25 of the image forming units 23Y, 23M, 23C, and 23K.

The image processing device 13 sequentially outputs the image data of each color to the ROS 24 shared by the image forming units 23Y, 23M, 23C, and 23K for yellow (Y), magenta (M), cyan (C), and black (K). The laser beams LB-Y, LB-M, LB-C, and LB-K emitted from the ROS 24 according to the image data are scanned onto the surfaces of the corresponding photosensitive drums 25 to form electrostatic latent images. The electrostatic latent images formed on the photosensitive drums 25 are developed by the developing devices 27Y, 27M, 27C, and 27K into toner images of yellow (Y), magenta (M), cyan (C), and black (K), respectively.

The yellow (Y), magenta (M), cyan (C), and black (K) toner images formed in sequence on the photosensitive drums 25 of the image forming units 23Y, 23M, 23C, and 23K are transferred, superimposed on one another, by four primary transfer rolls 36Y, 36M, 36C, and 36K onto an intermediate transfer belt 35 of a transfer unit 32 arranged above the image forming units. The primary transfer rolls 36Y, 36M, 36C, and 36K are arranged on the back side of the intermediate transfer belt 35 at positions corresponding to the photosensitive drums 25 of the image forming units 23Y, 23M, 23C, and 23K. The primary transfer rolls used in this embodiment have a volume resistivity adjusted to 10^5 to 10^8 Ω·cm. A transfer bias power supply (not shown) is connected to the primary transfer rolls 36Y, 36M, 36C, and 36K, and a transfer bias of the polarity opposite to that of the toner (positive in this embodiment) is applied at a predetermined timing.

As shown in FIG. 3, the intermediate transfer belt 35 is stretched under constant tension around a drive roll 37, a tension roll 34, and a backup roll 38, and is circulated at a predetermined speed in the direction of the arrow by the drive roll 37, which is rotated by a dedicated drive motor (not shown) with excellent constant-speed performance. The intermediate transfer belt 35 is made, for example, of a belt material (rubber or resin) that does not cause charge-up.

As shown in FIG. 3, the yellow (Y), magenta (M), cyan (C), and black (K) toner images superimposed on the intermediate transfer belt 35 are secondary-transferred onto paper 40, as a sheet material, by a secondary transfer roll 39 pressed against the backup roll 38, and the paper 40 carrying the toner images is conveyed to a fixing device 50 located above. The secondary transfer roll 39 is pressed against the side of the backup roll 38 and secondary-transfers the toner images of each color onto the paper 40 as it is conveyed upward from below.

Paper 40 of a predetermined size is fed from one of the paper feed trays 41, 42, 43, and 44, which are arranged in multiple tiers in the lower part of the main body 10 of the color multifunction machine. It is separated one sheet at a time by a feed roll 45, a retard roll 46, and the like, and conveyed along a paper transport path 48 equipped with transport rolls 47. The paper 40 fed from one of the paper feed trays 41, 42, 43, and 44 is stopped once at a registration roll 49 and is then fed again by the registration roll 49 to the secondary transfer position of the intermediate transfer belt 35 in synchronization with the image on the belt.

As shown in FIG. 3, the paper 40 onto which the toner images of each color have been transferred is fixed by heat and pressure in the fixing device 50. It is then conveyed by a transport roll 51 through a first paper transport path 53, which leads to a face-down tray 52 serving as a first discharge tray, and is discharged with its image-bearing side down onto the face-down tray 52 at the top of the main body 10 by a discharge roll 54 provided at the exit of the first paper transport path 53.

When the paper 40 on which an image has been formed as described above is to be discharged with its image-bearing side up, it is conveyed, as shown in FIG. 3, through a second paper transport path 56, which leads to a face-up tray 55 serving as a second discharge tray, and is discharged by a discharge roll 57 provided at the exit of the second paper transport path 56 onto the face-up tray 55 on the side (the left side in the figure) of the main body 10.

When the color multifunction machine 3 makes a double-sided copy in full color or the like, the recording paper 40 with an image fixed on one side is not discharged directly onto the face-down tray 52 by the discharge roll 54, as shown in FIG. 3. Instead, the transport direction is switched by a switching gate (not shown), and the discharge roll 54 is stopped once and then reversed so that it conveys the paper into a duplex paper transport path 58. Transport rollers 59 provided along the duplex paper transport path 58 convey the recording paper 40, now turned over, back to the registration roll 49. An image is then transferred and fixed onto the back of the recording paper 40, after which the paper is discharged to either the face-down tray 52 or the face-up tray 55 via the first paper transport path 53 or the second paper transport path 56.

In FIG. 3, reference numerals 60Y, 60M, 60C, and 60K denote toner cartridges that supply toner of the respective color to the developing devices 27 for yellow (Y), magenta (M), cyan (C), and black (K).

FIG. 4 shows the image forming units of the color multifunction machine 3.

As shown in FIG. 4, the four image forming units 23Y, 23M, 23C, and 23K for yellow, magenta, cyan, and black all have the same configuration and, as described above, form yellow, magenta, cyan, and black toner images in sequence at predetermined timings. Each of the image forming units 23Y, 23M, 23C, and 23K has a photosensitive drum 25, whose surface is uniformly charged by the charging roll 26 for primary charging. The surface of the photosensitive drum 25 is then scanned and exposed with the image-forming laser beam LB emitted from the ROS 24 according to the image data, forming an electrostatic latent image for the corresponding color. The laser beam LB is set to strike the photosensitive drum 25 from obliquely below, slightly to the right of the point directly beneath the drum. The electrostatic latent images formed on the photosensitive drums 25 are developed with yellow, magenta, cyan, and black toner by the developing rolls 27a of the developing devices 27 of the image forming units 23Y, 23M, 23C, and 23K into visible toner images, which are transferred in sequence, superimposed on one another, onto the intermediate transfer belt 35 by the charging of the primary transfer rolls 36.

After the toner image transfer step, residual toner, paper dust, and the like are removed from the surface of the photosensitive drum 25 by the cleaning device 28 in preparation for the next image forming process. The cleaning device 28 includes a cleaning blade 28a, which removes the residual toner, paper dust, and the like from the photosensitive drum 25. Likewise, as shown in FIG. 3, residual toner, paper dust, and the like are removed from the surface of the intermediate transfer belt 35 after the toner image transfer step by the cleaning device 61 in preparation for the next image forming process. The cleaning device 61 includes a cleaning brush 62 and a cleaning blade 63, which remove the residual toner, paper dust, and the like from the intermediate transfer belt 35.

FIG. 5 shows the scanner 2, an image reading apparatus installed as a stand-alone unit.

The scanner 2 is configured in the same manner as the scanner 9 of the color multifunction peripheral 3 described above, except that the scanner 2 has a built-in image processing device 13.

The image data processing apparatus according to Embodiment 1 applies predetermined processing to input image data consisting of a plurality of pages. It comprises image identification means for identifying, on the basis of the input multi-page image data, a common image shared by all pages and non-common images that differ from page to page, and attribute adding means for cutting the common image and non-common images identified by the image identification means into rectangular regions and adding attribute information to those rectangular regions.

In this embodiment, the attribute adding means takes the common image and non-common images identified by the image identification means, separated into a text portion and an image portion by T/I separation means, leaves the image portion as image information, applies character recognition to the text portion, and adds attribute information to each rectangular region on the basis of the recognized character information.

Further, in this embodiment, the attribute adding means adds attribute information to each non-common image so that it is subordinate to the attribute of the corresponding common image.

In this embodiment, the attribute adding means also allows the order in which attribute information is added to the common images to be selected between main-scanning-direction priority and sub-scanning-direction priority.

Furthermore, in this embodiment, the attribute adding means allows selection of whether the attribute information added to a non-common image is associated with a common image with priority given to the main scanning direction or to the sub-scanning direction.

In this embodiment, the image identification means comprises common image recognition means for recognizing, on the basis of the input multi-page image data, a common image shared by all pages; common image extraction means for extracting the recognized common image from the input image data of each page; and common image removal means for removing the extracted common image from the input image data of each page to obtain the non-common image that differs from page to page.

That is, as shown in FIG. 3, the image data processing apparatus 100 according to this embodiment is installed inside the color multifunction peripheral 3, which serves as an image output apparatus, and is incorporated as part of the image processing device 13. The image data processing apparatus 100 is also configured by installing image data processing software on the personal computer 5 or the like. Further, as shown in FIG. 5, the image data processing apparatus 100 may be installed inside the scanner 2, which serves as an image reading apparatus, incorporated as part of the image processing device 13.

As shown in FIG. 1, the image data processing apparatus 100 broadly consists of an image processing unit 110, which receives image data from the scanners 2 and 9 serving as image reading apparatuses and applies predetermined image processing to it, and a memory unit 120, which stores the input image data, the image data processed by the image processing unit 110, and the like. The image processing unit 110 includes a common image recognition unit 111, a common image extraction unit 112, a common image removal unit 113, a T/I separation unit 114, a rectangle cutout unit 115, an OCR unit 116, an attribute adding unit 117, and a file generation unit 118. The memory unit 120 includes a first memory 121, a second memory 122, and a third memory 123. The database creation unit 119 is provided, for example, on the personal computer 5 side.

The multi-page image data input from the image reading apparatuses 2 and 9 is temporarily stored, via the common image recognition unit 111, in the input image storage unit 124 of the first memory 121. The common image recognition unit 111 recognizes the common image shared by all pages on the basis of the multi-page image data temporarily stored in the input image storage unit 124. It does so by comparing the image data of the pages with one another, for example the image data of the first page with that of the second page.

Examples of the multi-page document 8 read by the image reading apparatuses 2 and 9 include, as shown in FIG. 6, test sheets used in schools, preparatory schools, cram schools, and the like, and standard-form documents used in company offices and government offices. The document is of course not limited to these and may be of another type. As shown in FIG. 6, the following are printed in advance on the test-sheet document 8: a graphic 801 such as a mark or illustration indicating the company or the like that produced the test sheet; a character image 802 giving the document title, such as the end-of-term test and the subject; the characters "Name" 803 printed in the name field; question texts 804, 805, and 806 containing question numbers such as "Question 1", "Question 2", "Question 3", and so on; numerals 807, 808, and 809 giving the points allotted to each question; and a frame image 810 made of straight lines forming the rectangular frames that enclose the name field and the question fields. In addition, the person who took the test has handwritten on the document 8 a name 811, numerals 812 and formulas 813 as answers, and sentences 814 and graphics 815 such as bar graphs as answers.

As shown in FIG. 6, a recognition marker 816 for alignment, formed in a predetermined shape such as a rectangle or a cross, is also printed in advance at a predetermined position on the test-sheet document 8, such as the upper left corner.

The common image recognition unit 111 detects the alignment recognition marker 816 in the input image data of each page and adjusts the position of each page's image data on the basis of the detection result. Therefore, even when the graphic 801, the character image 802, and the like printed on the document 8 are displaced from page to page, adjusting the position of each page's image data with the recognition marker 816 as the reference allows the image common to all pages to be recognized without error.
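The marker-based alignment described above could be sketched roughly as follows. This is only an illustrative sketch, not the method of the embodiment: it assumes binarized pages held as lists of rows (1 = ink), assumes the first ink pixel in raster order is the top-left corner of the recognition marker, and corrects translation only. The function names find_marker, shift, and align_to are hypothetical.

```python
def find_marker(page):
    """Return (row, col) of the first ink pixel in raster order.

    Assumption: this pixel is the top-left corner of the recognition marker.
    """
    for r, row in enumerate(page):
        for c, value in enumerate(row):
            if value:
                return r, c
    raise ValueError("no marker found")


def shift(page, dr, dc):
    """Translate a page by (dr, dc); pixels moved off the page are dropped."""
    h, w = len(page), len(page[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if page[r][c]:
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w:
                    out[nr][nc] = 1
    return out


def align_to(reference, page):
    """Shift `page` so that its marker coincides with the marker of `reference`."""
    ref_r, ref_c = find_marker(reference)
    cur_r, cur_c = find_marker(page)
    return shift(page, ref_r - cur_r, ref_c - cur_c)


reference = [[1, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 0]]
scanned = [[0, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 1]]   # same content, displaced by one pixel down and right
aligned = align_to(reference, scanned)
```

A real scan would also need rotation and scale correction, which this sketch deliberately leaves out.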

More specifically, as shown in FIG. 7, even when the image data read from each page is displaced as a whole, the common image recognition unit 111 obtains the x-direction width W and the y-direction height H of the rectangle circumscribing, for example, the character image 803, taking as reference the x-direction and y-direction distances Dx and Dy from the recognition marker 816 to the character image 803, and adjusts the position of each page's image data accordingly. Then, as shown in FIGS. 8 to 10, the common image recognition unit 111 recognizes the common image of the first and second pages, next recognizes the common image of that result and the third page, and thereafter continues in the same way, each time recognizing the common image of the result obtained up to the previous page and the image data of the current page.

In doing so, the common image recognition unit 111 applies a bit dilation process to the input image data of each page before recognizing the common image. When the image on each page is a frame-like image 810 as shown in FIG. 6, a misalignment of even about one bit between the image data of the first page and that of the second page could prevent the frame-like image 810 from being recognized as a common image.

In this embodiment, therefore, particularly for an image made up of few bits such as the frame-like image 806 shown in FIG. 11, the common image is recognized after a bit dilation process that thickens the frame-like image 806 by about one to several bits in the vertical and horizontal directions.
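The effect of the bit dilation can be illustrated with a small sketch. It assumes binarized pages held as lists of rows (1 = ink) and uses a square neighborhood of a chosen radius, which is one possible reading of "one to several bits in the vertical and horizontal directions"; the names dilate and common_of are hypothetical.

```python
def dilate(page, radius=1):
    """Thicken every ink pixel by `radius` pixels vertically and horizontally."""
    h, w = len(page), len(page[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if page[r][c]:
                for nr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for nc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[nr][nc] = 1
    return out


def common_of(a, b):
    """Pixelwise AND of two binarized pages."""
    return [[x & y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# A one-pixel-thick frame line that lands one row lower on the second page.
page1 = [[0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
page2 = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0]]

without_dilation = common_of(page1, page2)               # the line is lost
with_dilation = common_of(dilate(page1), dilate(page2))  # the line survives
```

Without dilation the two lines never overlap, so the frame disappears from the common image; after dilation the thickened lines overlap on rows 1 and 2.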

The common image extraction unit 112 extracts, from the input image data of each page, the common image recognized by the common image recognition unit 111 as shared by all pages. The common image extracted by the common image extraction unit 112 is stored in the common image storage unit 125 of the first memory 121.

The common image removal unit 113 removes the common image extracted by the common image extraction unit 112 from the input image data of each page, yielding the non-common image that differs from page to page. The non-common images obtained by the common image removal unit 113 are stored in the non-common image storage unit 126 of the second memory 122.
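As a minimal sketch of this removal step, assuming binarized pages held as lists of rows (1 = ink) and a common image of the same size, the non-common image is simply the page with every common pixel cleared. The name remove_common is hypothetical.

```python
def remove_common(page, common):
    """Clear every pixel of `page` that is set in `common`."""
    return [[p & (1 - c) for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(page, common)]


common = [[1, 1, 0],
          [0, 0, 0]]
page = [[1, 1, 0],
        [0, 1, 1]]    # printed form plus handwritten content
non_common = remove_common(page, common)
```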

The T/I separation unit 114 separates the input image data of each page into a text (Text) portion consisting of character images and the like and an image (Image) portion consisting of graphics and similar images. The text-portion and image-portion information separated from each page by the T/I separation unit 114 is stored individually in the third memory 123 as a T/I separation result 127 so that it can be read out as needed.

The rectangle cutout unit 115 cuts the text-portion images and image-portion images separated by the T/I separation unit 114 out of the common image and non-common image of each page as one or more rectangular portions. As shown in FIG. 8, a rectangle is cut out by designating its upper left corner 841 and lower right corner 842 on the diagonal, using for example a touch panel provided on the user interface 130 of the color multifunction peripheral or a mouse, to select an image or text region from the common image and non-common image of the input image data. Alternatively, as shown in FIG. 12, the rectangle cutout unit 115 may cut out the rectangle automatically, leaving at least a predetermined distance around the image or text region.
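The automatic variant could look roughly like the following sketch, which assumes a binarized page containing a single region (lists of rows, 1 = ink) and returns the circumscribing rectangle enlarged by a margin and clipped to the page. The name bounding_rect and the margin parameter are hypothetical; splitting a page into several regions is not shown.

```python
def bounding_rect(page, margin=1):
    """Return (top, left, bottom, right) of the ink on `page`, padded by `margin`.

    Coordinates are inclusive pixel indices; returns None for a blank page.
    """
    h, w = len(page), len(page[0])
    rows = [r for r in range(h) if any(page[r])]
    cols = [c for c in range(w) if any(page[r][c] for r in range(h))]
    if not rows:
        return None
    top = max(0, rows[0] - margin)
    left = max(0, cols[0] - margin)
    bottom = min(h - 1, rows[-1] + margin)
    right = min(w - 1, cols[-1] + margin)
    return top, left, bottom, right


page = [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]]
rect = bounding_rect(page, margin=1)
```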

In the OCR unit 116, among the images cut out as rectangles by the rectangle cutout unit 115, the image data separated as the text portion by the T/I separation unit 114 undergoes character recognition and is converted into character codes.

As shown in FIGS. 13 and 14, the attribute adding unit 117 takes the common image and non-common images identified by the image identification means, cut out as rectangles, and, as shown in FIG. 15, adds attribute information to the resulting rectangular regions 821 to 825 and rectangular regions 831, 832, and so on. The attribute adding unit 117 also takes the common image and non-common images separated into a text portion and an image portion by the T/I separation unit 114, leaves the image portion as image information, applies OCR processing to the text portion, and adds attribute information to each of the rectangular regions 821 to 825 and 831, 832 on the basis of the character information recognized by the OCR unit 116.

Further, the attribute adding unit 117 adds attribute information to each non-common image so that it is subordinate to the attribute information of the corresponding common image. For example, the handwritten name 831, a non-common image entered in the "Name" field 823 of the common image, is given attribute information as a lower-level (child) entry of common image 3.

Here, the attribute adding unit 117 allows the order in which attribute information is added to the common images to be selected between main-scanning-direction priority and sub-scanning-direction priority. When sub-scanning-direction priority is selected, for example, the attribute adding unit 117 adds attribute information along the sub-scanning direction (the vertical direction in the figure), as shown in FIG. 15. If the characters "End-of-term test: Mathematics" are located at the top, that common image becomes common 1, and the characters "End-of-term test: Mathematics", which are its character recognition result, are added as identification information. Next, the common image of the picture 801 becomes common 2, and because this common image is an image, the characters "picture" are added as identification information. Similarly, the "Name" common image becomes common 3, and the characters "Name", which are its character recognition result, are added as identification information.
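One way to picture the selectable ordering is as a sort over the cut-out rectangles. The sketch below assumes each region is a dict with hypothetical keys top, left, and label, where label stands for the OCR result or the word "picture"; the coordinates are made up for illustration, and the function name number_common_regions is hypothetical.

```python
def number_common_regions(regions, priority="sub"):
    """Number regions as "common 1", "common 2", ... in the chosen scan order.

    priority="sub"  : top to bottom first (sub-scanning direction priority)
    priority="main" : left to right first (main-scanning direction priority)
    """
    if priority == "sub":
        key = lambda r: (r["top"], r["left"])
    elif priority == "main":
        key = lambda r: (r["left"], r["top"])
    else:
        raise ValueError("priority must be 'sub' or 'main'")
    ordered = sorted(regions, key=key)
    return [(f"common {i}", r["label"]) for i, r in enumerate(ordered, start=1)]


regions = [
    {"top": 10, "left": 0, "label": "picture"},
    {"top": 0, "left": 40, "label": "End-of-term test: Mathematics"},
    {"top": 20, "left": 20, "label": "Name"},
]
by_sub = number_common_regions(regions, priority="sub")
by_main = number_common_regions(regions, priority="main")
```

With sub-scanning priority the title comes first, then the picture, then the name field, matching the order described above for a title located at the top.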

The identification information is not limited to the character recognition result of the common image; the user can change it freely from the user interface 130, the personal computer 5, or the like.

Further, the attribute adding unit 117 allows selection of whether the attribute information added to a non-common image is associated with a common image with priority given to the main scanning direction or to the sub-scanning direction. That is, as shown in FIG. 15, when the non-common image is written alongside the common image in the main scanning direction, main-scanning-direction priority is selected, for example. When the non-common image is written alongside the common image in the sub-scanning direction, sub-scanning-direction priority is selected, for example.

The file generation unit 118 digitizes the image data of the common image and of the non-common images in the input image data separately and generates file data such as PDF or PostScript files.

The database creation unit 119 creates a database of the scanned multi-page document 8 on the basis of the file data generated by the file generation unit 118. That is, as shown in FIG. 16, the database creation unit 119 treats the attribute of a common image portion as the "parent" and the attribute of each non-common image portion corresponding to that common image portion as a "child", and collects the "child" data of the non-common image portions one after another under each "parent" item of the common image portion. The database creation unit 119 is provided, for example, on the personal computer 5 side.

In the specific example above, when the "parent" data, that is, the attribute of the common image portion, is "Name", the database creation unit 119 collects under that "parent" the individual name data corresponding to its "children", such as "Fuji Zeroko", "○×△□", and so on. When the data compiled by the database creation unit 119 is displayed as a table in Excel or a similar application, it appears as shown in FIG. 17.
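The parent and child grouping can be sketched as a mapping from each common-image attribute to the list of per-page values. This is only an illustration under assumptions: each page is represented as a list of (parent, child) pairs, the answers to "Question 1" are invented sample values, and build_database is a hypothetical name.

```python
from collections import defaultdict


def build_database(pages):
    """Collect child values under their parent attribute, page by page.

    pages: list of pages, each a list of (parent_attribute, child_value) pairs.
    Returns a dict mapping each parent to its children in page order.
    """
    table = defaultdict(list)
    for page in pages:
        for parent, child in page:
            table[parent].append(child)
    return dict(table)


pages = [
    [("Name", "Fuji Zeroko"), ("Question 1", "2")],   # answer values are made up
    [("Name", "○×△□"), ("Question 1", "3")],
]
database = build_database(pages)
```

Each key then corresponds to one column of a table like the one in FIG. 17, and each list position to one scanned page.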

With the above configuration, the image data processing apparatus according to this embodiment makes it possible, as described below, to build a database easily and without imposing an excessive burden on the user, even from hard-copy documents that have not been formatted in advance and whose images contain graphics, frames, and the like.

That is, in the image processing system 1 to which the image data processing apparatus 100 according to this embodiment is applied, as shown in FIG. 2, the images of a multi-page document 8 or the like are read by the scanner 2 or the scanner 9 serving as an image reading apparatus, and the resulting image data is input, as shown in FIG. 1, to the color multifunction peripheral 3, an image output apparatus in which the image data processing apparatus 100 is installed. Examples of the multi-page document 8 read by the scanners 2 and 9 include, as shown in FIG. 6, test sheets used in schools, cram schools, and the like, and standard-form documents used in company offices and government offices.

As shown in FIG. 1, the image data of the multi-page document 8 read by the scanners 2 and 9 is input to the image data processing apparatus 100, where the common image recognition unit 111 recognizes the common image shared by all pages on the basis of the input multi-page image data. The image data of the document 8 processed by the common image recognition unit 111 is, for example, binarized image data, although multi-valued image data may be used as it is. In the case of a color image, any portion containing image data may be treated as part of the image regardless of its color.

For example, as shown in FIG. 8, when image data 800 consisting of a plurality of pages of test sheets 8 filled in with names and answers for an end-of-term test is input, the common image recognition unit 111 compares the image data 800 of the pages bit by bit, such as the image data of the first page with that of the second page, as shown in FIG. 9, and recognizes the common images 821, 822, and so on, as shown in FIG. 10. The common image recognized by the common image recognition unit 111 is temporarily stored in the common image storage unit 125 of the first memory 121. Next, the common image of the first and second pages stored in the common image storage unit 125 is compared with the image data of the third page by the common image recognition unit 111, and the resulting common image is again temporarily stored in the common image storage unit 125 of the first memory 121.

In this way, the common image recognition unit 111 first identifies the common image of the first and second pages of the input image data, as shown in FIG. 8. It then identifies the common image of that result and the third page. In general, it identifies the common image of page n and page n+1, then the common image of that result and page n+2, and thereafter continues in the same way, each time identifying the common image of the result obtained up to the previous page and the image data of the current page. As a result, the common image recognition unit 111 identifies the image common to all pages, and this common image is stored in the common image storage unit 125 of the first memory 121.
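The page-by-page accumulation described above amounts to folding a pairwise comparison over the page sequence. The sketch below assumes aligned, binarized pages of equal size held as lists of rows (1 = ink) and uses a pixelwise AND as the comparison; common_of and common_image are hypothetical names.

```python
from functools import reduce


def common_of(a, b):
    """Pixelwise AND of two binarized pages (1 = ink, 0 = background)."""
    return [[x & y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def common_image(pages):
    """Compare page 1 with page 2, the result with page 3, and so on."""
    return reduce(common_of, pages)


pages = [
    [[1, 1, 0], [0, 1, 0]],   # page 1
    [[1, 1, 0], [1, 0, 0]],   # page 2
    [[1, 1, 1], [0, 0, 0]],   # page 3
]
shared = common_image(pages)
```

Only the two pixels present on every page remain in the result; everything written differently from page to page drops out.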

Next, as shown in FIG. 8, the common image extraction unit 112 extracts the common image 831 on the basis of the common image recognition result, that is, the result of the page-by-page comparison performed by the common image recognition unit 111. The common image 831 extracted by the common image extraction unit 112 is stored in the common image storage unit 125 of the first memory 121.

Next, as shown in FIG. 8, the common image removal unit 113 removes the common image 831, which was extracted by the common image extraction unit 112 and stored in the common image storage unit 125, from the image data of each page stored in the input image storage unit 124 of the first memory 121, yielding the non-common images 832 that differ from page to page. These non-common images 832 are stored in the non-common image storage unit 126 of the second memory 122.

After that, as shown in FIG. 1, the common image 831 stored in the common image storage unit 125 of the first memory 121 and the non-common images 832 stored in the non-common image storage unit 126 of the second memory 122 are each separated into a text portion and an image portion by the T/I separation unit 114. In the common image, as shown in FIG. 8, the text portion consists of the character image 802 giving the document title such as the end-of-term test, the characters "Name" 803 printed in the name field, and the question texts 804 and 805 containing question numbers such as "Question 1", "Question 2", and so on. The image portion consists of the graphic 801, such as a mark indicating the company that produced the test sheet or the subject, and the frame image 806 made of straight lines forming the rectangular frames that enclose the name field and the question fields. The result of separating these text and image portions is stored in the third memory 123 as the T/I separation result.

Likewise, in the non-common image 832, as shown in FIG. 8, the text portion, consisting of the test taker's name 807, numerals 808 given as answers, and sentences 809 given as answers, is separated from the image portion, consisting of graphics 810 such as bar graphs, and the separation result is stored in the third memory 123 as the T/I separation result.

Next, the common image 831 and the non-common image 832, each separated into a text portion and an image portion by the T/I separation unit 114, are cut out by the rectangle cutout unit 115 with a rectangular cutout frame 841 for each piece of text-portion and image-portion image data, as shown in FIGS. 8, 13, and 14.

The user interface of the color multifunction machine 3 or the like, which is used to instruct the processing operations of the image data processing apparatus 100, allows the user to select whether an image cut out in a rectangular shape is generated as a bitmap or converted into character codes by the OCR unit 116.

Each item of text-part image data cut out in a rectangular shape by the rectangle cutout unit 115 is then, for example, subjected to character recognition by the OCR unit 116 and converted into character codes.
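The choice between bitmap output and character-code output can be sketched as a simple dispatch. This is only an illustration under stated assumptions: the mode strings, the `process_region` function, and the injected `recognize` callable (standing in for the OCR unit 116) are hypothetical and do not come from the source.

```python
def process_region(region_bitmap, mode, recognize=None):
    """Return a cut-out region either as a bitmap or as recognized text.

    mode is "bitmap" or "ocr" (hypothetical names for the user's selection).
    recognize is any callable that maps a bitmap to a string; it stands in
    for the OCR unit and is an assumption of this sketch.
    """
    if mode == "bitmap":
        return {"kind": "bitmap", "data": region_bitmap}
    if mode == "ocr":
        if recognize is None:
            raise ValueError("OCR mode requires a recognize callable")
        return {"kind": "text", "data": recognize(region_bitmap)}
    raise ValueError(f"unknown mode: {mode!r}")


# A stub recognizer is used here so that the example runs without an OCR engine.
result = process_region([[1, 0], [0, 1]], "ocr", recognize=lambda bitmap: "Name")
print(result)
```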

As shown in FIG. 15, the attribute addition unit 117 adds a corresponding attribute to each item of image data that has been cut out in a rectangular shape by the rectangle cutout unit 115 and character-recognized by the OCR unit 116. For example, the attribute addition unit 117 processes the common image with priority in the sub-scanning direction, working from the top. The graphic image of the "picture" is treated as common image 1 and given the attribute "Common 1: Picture". Next, the character image "End-of-term test Mathematics" is treated as common image 2 and given the attribute "Common 2: End-of-term test Mathematics". In the same way, the character image "Name" is treated as common image 3 and given the attribute "Common 3: Name", the character image "Question 1: 1+1=" is treated as common image 4 and given the attribute "Common 4: Question 1", and the character image "5" is treated as common image 5 and given the attribute "Common 5: Question 1".

The same attribute "Question 1" is added to both the character image "Question 1: 1+1=" of common image 4 and the character image "5" of common image 5 because the apparatus is set to add the same attribute to images that have substantially the same position in the sub-scanning direction and are arranged side by side along the main scanning direction. The apparatus may of course instead be set to add different attributes to the character image "Question 1: 1+1=" of common image 4 and the character image "5" of common image 5.
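The attribute numbering described above can be sketched as follows. The region dictionaries, the `row_tolerance` parameter used to decide what counts as "substantially the same position" in the sub-scanning direction, and the rule of taking the label from the text before a colon are all assumptions made for illustration; the source does not define them.

```python
def add_common_attributes(regions, row_tolerance=5):
    """Number common-image regions top to bottom and share a label within a row.

    regions: list of dicts with 'top', 'left' (pixel coordinates) and 'text'.
    Regions whose 'top' values lie within row_tolerance of the first region in
    a row are treated as one row (an assumed definition of "substantially the
    same position").
    """
    rows = []
    for region in sorted(regions, key=lambda r: r["top"]):
        if rows and abs(region["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(region)
        else:
            rows.append([region])

    labelled = []
    for row in rows:
        row.sort(key=lambda r: r["left"])
        # The leftmost region names the row; the text before ':' is the label.
        label = row[0]["text"].split(":")[0].strip()
        for region in row:
            attribute = f"Common {len(labelled) + 1}: {label}"
            labelled.append({**region, "attribute": attribute})
    return labelled


common_regions = [
    {"top": 100, "left": 20, "text": "Question 1: 1+1="},
    {"top": 10, "left": 20, "text": "Picture"},
    {"top": 102, "left": 300, "text": "5"},
    {"top": 40, "left": 20, "text": "End-of-term test Mathematics"},
    {"top": 70, "left": 20, "text": "Name"},
]
for item in add_common_attributes(common_regions):
    print(item["attribute"])
```

Setting `row_tolerance=0` in this sketch corresponds to the alternative mentioned in the text, in which regions on nearly the same line receive different attributes.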

In the illustrated example, every common image is given an attribute as a "parent", and an image arranged alongside a common image in the main scanning direction is given an attribute as a "child" of that "parent".
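A minimal sketch of the parent and child relationship follows. It assumes that a child belongs to the nearest common region to its left on the same row, and that a non-common region with no such parent is skipped; both rules, the dictionary layout, and the sample names are hypothetical.

```python
def link_children(parents, children, row_tolerance=5):
    """Attach each non-common region to the common region beside it.

    parents:  list of dicts with 'top', 'left', 'attribute' (common regions).
    children: list of dicts with 'top', 'left', 'text' (non-common regions).
    Returns a dict mapping each parent attribute to a list of child texts.
    """
    links = {parent["attribute"]: [] for parent in parents}
    for child in children:
        # Candidate parents: same row, and not to the right of the child.
        candidates = [
            parent for parent in parents
            if abs(parent["top"] - child["top"]) <= row_tolerance
            and parent["left"] <= child["left"]
        ]
        if not candidates:
            continue  # assumed behaviour: regions without a parent are skipped
        nearest = max(candidates, key=lambda parent: parent["left"])
        links[nearest["attribute"]].append(child["text"])
    return links


parents = [
    {"top": 70, "left": 20, "attribute": "Name"},
    {"top": 100, "left": 20, "attribute": "Question 1"},
]
children = [
    {"top": 71, "left": 120, "text": "Taro Yamada"},
    {"top": 99, "left": 200, "text": "2"},
]
print(link_children(parents, children))
```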

Next, for the input image data, the file generation unit 118 compiles into a file the recognized character codes, character sizes, character positions, and similar data of the text images, together with the content, position, and similar data of the images. As shown in FIG. 18, the file is generated in the following order: the first header of the common part followed by the data of image 1, which is the first common part; the second header of the common part followed by the data of text 1, which is the second common part; and so on. These are followed by the first header of the non-common part of page 1 and the data of that first non-common part, the second header of the non-common part and the data of that second non-common part, and so on; then the first header of the non-common part of page 2 and the data of that first non-common part, the second header of the non-common part and the data of that second non-common part, and so on. The file may of course be of any type, such as a PDF file or a PostScript file.
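The record order described for FIG. 18 can be sketched as a flat list of header and data pairs. The header strings and the function name `build_records` are hypothetical; the source specifies only the order (all common parts first, then the non-common parts page by page), not a concrete header format.

```python
def build_records(common_parts, pages):
    """Lay out (header, data) records in the order described for the file.

    common_parts: list of data items for the common part, in order.
    pages:        list of pages, each a list of non-common data items.
    """
    records = []
    for index, data in enumerate(common_parts, start=1):
        records.append((f"common/{index}", data))
    for page_number, parts in enumerate(pages, start=1):
        for index, data in enumerate(parts, start=1):
            records.append((f"page{page_number}/non-common/{index}", data))
    return records


records = build_records(
    common_parts=["image 1", "text 1"],
    pages=[["Taro Yamada", "2"], ["Hanako Sato", "3"]],
)
for header, data in records:
    print(header, "->", data)
```

Because the common parts are written once and only the non-common parts repeat per page, the layout avoids storing the shared form content for every page.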

Finally, based on the file generated by the file generation unit 118, the database creation unit 119 creates a database by sequentially arranging, as shown in FIG. 16, the non-common images that have a "child" relationship under the corresponding "parent" attribute of each common image. When the data compiled by the database creation unit 119 is displayed as a table in Excel or a similar application, as shown in FIG. 17, the "parent" attributes of the common images are arranged horizontally, and the data having the "child" attribute for each "parent" are arranged vertically beneath it.
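The tabular view described for FIG. 17 can be sketched as a table with one column per "parent" attribute and one row per page. The function names, the per-page dictionary layout, and the CSV output are assumptions chosen for illustration; the source mentions only display in Excel or a similar application.

```python
import csv
import io


def build_table(parent_names, pages):
    """Arrange parents as column headers and each page's children as a row.

    parent_names: ordered list of "parent" attribute names.
    pages:        list of dicts mapping a parent name to the child value
                  found on that page; missing values become empty cells.
    """
    header = list(parent_names)
    rows = [[page.get(name, "") for name in header] for page in pages]
    return [header] + rows


def to_csv(table):
    """Serialize the table as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


table = build_table(
    ["Name", "Question 1", "Question 2"],
    [
        {"Name": "Taro Yamada", "Question 1": "2", "Question 2": "4"},
        {"Name": "Hanako Sato", "Question 1": "2"},
    ],
)
print(to_csv(table))
```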

As described above, the image data processing apparatus 100 according to this embodiment identifies the common image and the non-common images, adds the corresponding attributes to each, and arranges them. It can therefore create a database automatically even when the input is a hard-copy document or the like that has not been formatted in advance and that contains figures, frames, and similar elements. A database can thus be created easily without imposing an excessive burden on the user.

The above embodiment describes the case in which the database creation unit 119 is provided on the personal computer 5 side, but the database creation unit 119 may instead be provided on the image data processing apparatus 100 side.

FIG. 1 is a block diagram showing an image data processing apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a configuration diagram showing an image processing system to which the image data processing apparatus according to Embodiment 1 of the present invention is applied.

FIG. 3 is a configuration diagram showing a color multifunction machine as an image output apparatus to which the image data processing apparatus according to Embodiment 1 of the present invention is applied.

FIG. 4 is a configuration diagram showing the image forming unit of the color multifunction machine as an image output apparatus to which the image data processing apparatus according to Embodiment 1 of the present invention is applied.

FIG. 5 is a configuration diagram showing an image reading apparatus to which the image data processing apparatus according to Embodiment 1 of the present invention can be applied.

FIG. 6 is an explanatory view showing an answer sheet processed by the image data processing apparatus according to Embodiment 1 of the present invention.

FIGS. 7 to 15 are each an explanatory view showing an image processed by the image data processing apparatus according to Embodiment 1 of the present invention.

FIG. 16 is a diagram showing data processed by the image data processing apparatus according to Embodiment 1 of the present invention.

FIG. 17 is a chart showing a database processed by the image data processing apparatus according to Embodiment 1 of the present invention.

FIG. 18 is a chart showing data processed by the image data processing apparatus according to Embodiment 1 of the present invention.

Explanation of symbols

2, 9: image reading apparatus; 100: image data processing apparatus; 111: common image recognition unit; 112: common image extraction unit; 113: common image removal unit; 114: T/I separation unit; 115: rectangle cutout unit; 116: OCR unit; 117: attribute addition unit; 118: file generation unit; 119: database creation unit.

Claims

1. An image data processing apparatus that performs predetermined processing on input image data consisting of a plurality of pages, comprising:
image identification means for identifying, based on the input image data consisting of the plurality of pages, a common image that is common to the pages and non-common images that differ from page to page; and
attribute addition means for adding attribute information to rectangular regions in a state where the common image and the non-common images identified by the image identification means have been cut out into the rectangular regions.

2. The image data processing apparatus according to claim 1, wherein the attribute addition means separates the common image and the non-common images identified by the image identification means into a text part and an image part by T/I separation means, leaves the image part as image information, applies character recognition processing to the text part, and adds attribute information to each rectangular region based on the character information recognized by the character recognition processing.

3. The image data processing apparatus according to claim 1 or 2, wherein the attribute addition means adds attribute information to the non-common images, among the common image and the non-common images identified by the image identification means, such that the attribute information is subordinate to the attribute of the common image.

4. The image data processing apparatus according to any one of the above claims, wherein the attribute addition means allows selection of whether attribute information is added to the common image identified by the image identification means with priority in the main scanning direction or with priority in the sub-scanning direction.

5. The image data processing apparatus according to claim 1, wherein the attribute addition means allows selection of whether the attribute information added to the non-common images identified by the image identification means is assigned, relative to the common image, with priority in the main scanning direction or with priority in the sub-scanning direction.

6. The image data processing apparatus according to claim 1, comprising:
common image recognition means for recognizing, based on the input image data consisting of the plurality of pages, a common image that is common to the pages;
common image extraction means for extracting the common image recognized by the common image recognition means from the input image data of each page; and
common image removal means for removing the common image extracted by the common image extraction means from the input image data of each page to obtain the non-common images that differ from page to page.