JP2007041832A

JP2007041832A - Difference image extraction apparatus

Info

Publication number: JP2007041832A
Application number: JP2005224923A
Authority: JP
Inventors: Yukinori Matsumoto; 行倫松本
Original assignee: Konica Minolta Business Technologies Inc
Current assignee: Konica Minolta Business Technologies Inc
Priority date: 2005-08-03
Filing date: 2005-08-03
Publication date: 2007-02-15

Abstract

PROBLEM TO BE SOLVED: To provide a difference image extraction apparatus that accurately extracts the image of the difference between unwritten printed matter on which no positioning mark is recorded and written printed matter.

SOLUTION: The image data of unwritten printed matter 201 is converted into binary image data, and histograms of black pixels are created in the horizontal direction 203 and the vertical direction 204 of the paper surface. An area where black pixels alternate regularly with white pixels is determined to be a character area, which is used as a template 207. The coordinates of two points on the template 207 are calculated with one corner of the paper surface taken as the origin. The binary image data of the written printed matter is collated with the binary image data of the template to find the area that matches the template, and the coordinates corresponding to the two points on the template are calculated. The differences between the coordinates on the written printed matter and the coordinates of the template are calculated, and a correction value is determined to correct the position of the binary image data. The difference between the position-corrected binary image data of the two printed matters is then extracted and output.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a difference image extraction apparatus that extracts a difference image between unwritten printed matter and written printed matter.

After comments have been written on distributed printed matter, a user may want to extract only those comments. In such a case, when the unwritten original and the written printed matter are captured by an image input device such as a scanner and compared as image data, the two sets of image data must be aligned.
When an alignment mark is printed on the printed matter, the two sets of image data can be aligned using the mark as a reference; extracting the difference between them then yields only the comments written on the printed matter.

However, for printed matter on which no alignment mark is printed, alignment accuracy varies greatly depending on what is used as the alignment reference. For example, if a region such as a picture or a figure is used as the reference, alignment may be difficult because of changes in image density or color during scanning, or because of the shape of the figure.
To address this problem, Patent Document 1, for example, discloses a technique that performs character recognition on the image data, extracts two characters from the result, and performs alignment based on the position information of the extracted characters.
Patent Document 1: JP-A-11-175717

However, printing alignment marks on printed matter in advance, as in the conventional approach, takes man-hours, and the technique of Patent Document 1 requires character recognition processing, which also takes man-hours.
Accordingly, an object of the present invention is to provide a difference image extraction apparatus that can perform highly accurate alignment with few man-hours, without character recognition processing, even on image data that has no alignment mark.

To solve the above problem, the present invention provides a difference image extraction apparatus that extracts the difference between unwritten printed matter and written printed matter, comprising: image acquisition means for acquiring images of the two printed matters; image extraction means for extracting a character area of the unwritten printed matter; alignment means for aligning the two printed matters by collating the extracted character area with the image data of the written printed matter; and difference extraction means for extracting a difference image of the two printed matters in the aligned state.

With this configuration, a character area, which gives more accurate alignment than a picture or graphic area, is used as the alignment reference. Highly accurate alignment is therefore possible even without an alignment mark, and the difference image can be extracted efficiently.
The image extraction means includes: a binarization unit that converts the image of the unwritten printed matter acquired by the image acquisition means into image data binarized pixel by pixel into black pixels and white pixels; a histogram generation unit that generates histograms by projecting the image data classified as black pixels by the binarization unit in the vertical direction and the horizontal direction of the paper surface of the unwritten printed matter; a determination unit that determines a character area from the shape of the generated histograms; and a template extraction unit that extracts image data of M rows and N columns as a template from the character area determined by the determination unit.

With this configuration, the image data can be separated into black pixels corresponding to character portions and white pixels corresponding to the background, and a character area can be distinguished from a picture or graphic area.
The alignment means includes: a target binarization unit that converts the image of the written printed matter into image data binarized pixel by pixel into black pixels and white pixels; a matching area recognition unit that recognizes an area where the template extracted by the template extraction unit matches the image data binarized by the target binarization unit; and an alignment unit that makes the coordinate values of the area found to match by the matching area recognition unit coincide with the coordinate values of the template.

With this configuration, even if the positions of the image data of the unwritten printed matter and the written printed matter acquired by the image acquisition means are slightly shifted relative to each other, they are aligned accurately, so the difference image can be obtained with high accuracy.
The difference extraction means includes: an area extraction unit that extracts an area obtained by adding a blank area, which is a set of white pixels, to the area determined to be the character area of the unwritten printed matter, together with the corresponding area of the written printed matter; and a calculation unit that calculates the difference between the image data of the two extracted areas.

With this configuration, the difference image is extracted after excluding picture and graphic data areas, where the density difference between images acquired by the image acquisition means can be large, so an accurate difference image can be obtained.
The apparatus further includes output means that outputs a black pixel when the difference calculated by the calculation unit is not "0".

With this configuration, the writing, which is the difference between the unwritten printed matter and the written printed matter, can be output accurately as black pixels.
When a plurality of M-row, N-column image data candidates can be extracted, the template extraction unit extracts the largest of them.
With this configuration, the template used for alignment is not confined to a small local part of the paper surface, so accurate alignment is possible.

The present invention also provides a difference image extraction method for a difference image extraction apparatus that extracts the difference between unwritten printed matter and written printed matter, comprising: an image acquisition step of acquiring images of the two printed matters; an image extraction step of extracting a character area of the unwritten printed matter; an alignment step of aligning the two printed matters by collating the extracted character area with the image of the written printed matter; and a difference extraction step of extracting a difference image of the two printed matters in the aligned state.

With this method, a character area, which gives more accurate alignment than a picture or graphic area, is used as the alignment reference. Even without an alignment mark, the unwritten printed matter and the written printed matter can therefore be aligned with high accuracy, and the difference image can be extracted efficiently.

An embodiment of the difference image extraction apparatus according to the present invention is described below with reference to the drawings.
(One embodiment)
FIG. 1 is an external view of the hardware configuration of an embodiment of the difference image extraction apparatus according to the present invention.
This difference image extraction apparatus comprises a scanner 101 and a personal computer (hereinafter "PC") 102.

The scanner 101 scans a document placed on it, converts it into digital image data, and outputs the digital image data to the PC 102.
The documents are unwritten printed matter, which serves as the reference, and printed matter on which a memo or the like has been written.
The PC 102 has a built-in CPU and stores a program for extracting a difference image in ROM. In accordance with this program, the CPU functions as an image recognition extraction unit, an alignment unit, a difference extraction unit, and so on.

The PC 102 extracts difference image data, such as a memo, from the digital image data of the unwritten printed matter and the written printed matter input from the scanner 101, and displays the difference image on a display unit 103 such as a liquid crystal display.
FIG. 2 outlines the difference image extraction procedure in this difference image extraction apparatus.

(1) First, the scanner 101 acquires digital image data of unwritten printed matter 201 and outputs it to the PC 102.
The control unit of the PC 102 converts the input digital image data into image data 202 binarized with a predetermined threshold value. Next, a character area is extracted using the binarized image data 202. To recognize the character area, histograms 203 and 204 of black pixels are generated in the horizontal direction and the vertical direction of the paper surface of the unwritten printed matter 201.

(2) Next, a predetermined part of the character area is extracted as a template 207.
FIG. 3 illustrates an example of the image recognition extraction step that extracts the template 207 from the character area.
The image data of each pixel of the digital image data of the unwritten printed matter is converted, using a predetermined threshold value, into binarized image data consisting of black pixels and white pixels. The threshold value is chosen so that the image data of the blank parts of the paper, where nothing is printed, becomes white pixels.
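The thresholding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `binarize`, the 8-bit gray scale, and the threshold of 128 are all assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Return a 2-D list of 0/1 values: 1 = black pixel, 0 = white pixel.

    gray is a 2-D list of 8-bit gray levels (0 = black, 255 = white).
    The threshold is assumed to sit below the gray level of blank paper,
    so that unprinted paper always maps to white.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Hypothetical scan: blank paper reads as light gray (about 230), ink as dark.
page = [
    [230, 228, 231, 229],
    [232,  40,  35, 230],
    [229, 231, 230, 228],
]
binary = binarize(page, threshold=128)
```

In practice the threshold would be tuned to the scanner so that paper texture and slight shading never cross it.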

From the binarized image data 301 of the unwritten printed matter, a histogram 310 of black pixels in the horizontal direction of the paper surface 302 and a histogram 320 of black pixels in the vertical direction are generated.
In the generated horizontal histogram 310, regions 331, 333, and 335, which contain many black pixels, alternate with white pixel regions 332, 334, and 336, and these are followed by a region 337 that contains many black pixels. Where black pixel regions 331, 333, 335 and white pixel regions 332, 334, 336 repeat regularly in the histogram 310, the black pixel regions 331 and so on are determined to be regions where lines of characters exist, and the white pixel regions 332 and so on are determined to be the spaces between lines. No white pixel region intervenes in black pixel region 337, so it cannot be determined to be a character area and is instead determined to be a picture or graphic region 205.

Likewise, in the generated vertical histogram 320, black pixel regions 341, 343, and 345 alternate with white pixel regions 342, 344, and 346, followed by a black pixel region 347. Where black pixel regions 341 and so on and white pixel regions 342 and so on repeat regularly, the black pixel regions are determined to be regions where columns of characters exist, and the white pixel regions are determined to be the spaces between characters. Black pixel region 347 is determined to be the picture or graphic region 205.
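The histogram-based determination described above can be sketched as follows. The text does not state a concrete rule for "regular" alternation, so this sketch substitutes a simple assumption: a black run no longer than `max_text_run` is treated as text, and a longer uninterrupted black run is treated as a picture or graphic. All function names and values are hypothetical.

```python
def projection_histograms(binary):
    """Count black pixels (1s) per row and per column of a 0/1 image."""
    row_hist = [sum(row) for row in binary]
    col_hist = [sum(col) for col in zip(*binary)]
    return row_hist, col_hist


def runs(hist):
    """Split a histogram into (is_black, start, end) runs; end is exclusive."""
    result = []
    start = 0
    for i in range(1, len(hist) + 1):
        if i == len(hist) or (hist[i] > 0) != (hist[start] > 0):
            result.append((hist[start] > 0, start, i))
            start = i
    return result


def classify_black_runs(hist, max_text_run):
    """Label each black run as 'text' or 'picture' by its length (assumed rule)."""
    labels = []
    for is_black, start, end in runs(hist):
        if is_black:
            kind = "text" if end - start <= max_text_run else "picture"
            labels.append((kind, start, end))
    return labels


# Per-row black-pixel counts: three 2-row text lines, then a 6-row picture.
row_hist = [5, 5, 0, 4, 6, 0, 5, 5, 0, 7, 7, 7, 7, 7, 7]
labels = classify_black_runs(row_hist, max_text_run=2)
```

Here the per-row counts play the role of histogram 310 (lines and line spacing) and the per-column counts the role of histogram 320 (characters and character spacing); applying the same run analysis to both gives the rectangular character areas.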

From these results, it is determined that the paper surface 302 contains a character area 350 and a character area 351, which share an area 306, and the picture or graphic region 205. The remaining areas are margins.
Next, the largest M-row, N-column image data is extracted from the character area 350, and the largest M′-row, N′-column image data is extracted from the character area 351. The value M×N is compared with the value M′×N′, and the candidate with the larger value is adopted as the M-row, N-column image data. In the binarized image data 301 of this unwritten printed matter, the M′-row, N′-column data of character area 351 is larger than the data of character area 350, so the image data from character area 351 is adopted as the template.
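The selection of the largest candidate can be sketched as follows. The tuple layout `(name, rows, cols)` and the sizes are hypothetical values chosen for illustration; only the comparison of M×N against M′×N′ comes from the text.

```python
def largest_template(candidates):
    """Return the candidate (name, rows, cols) with the greatest rows * cols.

    The running best is kept only while its area is strictly greater than
    the next candidate's area, so a later candidate replaces it on a tie.
    """
    best = None
    for cand in candidates:
        if best is None or cand[1] * cand[2] >= best[1] * best[2]:
            best = cand
    return best


# Hypothetical sizes for the two character areas of FIG. 3.
candidates = [("character area 350", 3, 20), ("character area 351", 5, 30)]
template_choice = largest_template(candidates)
```

With these numbers the areas are 60 and 150, so the candidate from character area 351 is chosen, matching the outcome described for FIG. 3.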

The template is made as large as possible in order to keep the error as small as possible when aligning with the target.
The coordinate values of the position (X_1, Y_1) of the upper left corner 304 of this template and the position (X_M, Y_N) of its lower right corner 305 are obtained, taking one corner of the paper surface 302, for example the upper left corner, as the origin (0, 0).
The description now returns to FIG. 2.

(3) Printed matter 211 on which a memo 212 has been written is read by the scanner 101, and the resulting digital image data is output to the PC 102.
The PC 102 binarizes it with a predetermined threshold value to obtain binarized image data 213.
(4) Next, the template 207 extracted from the binarized image data 202 of the reference unwritten printed matter is collated with the binarized image data 213 of the written printed matter to find an area 214 that matches the template 207.

(5) When the area 214 is found, the coordinate values of its upper left and lower right corners are obtained with the upper left corner of the paper surface 215 as the origin, and the binarized image data of the written printed matter is moved so that these coordinates coincide with (X_1, Y_1) and (X_M, Y_N) of the template 207, thereby correcting its position.
FIG. 4 illustrates this position correction step.
In the alignment step, the area of the written printed matter 402 that matches the template 207 of the unwritten printed matter 401 is found by comparing the binarized image data of the template 207 against the binarized image data 213 of the written printed matter while shifting the range being compared.

When an area 214 that matches the binarized image data of the template 207 to at least a predetermined degree is found, the coordinates (X_1′, Y_1′) of the upper left corner 404 of the area 214 and the coordinates (X_M′, Y_N′) of its lower right corner 405 are obtained, taking the upper left corner 403 of the paper surface of the written printed matter 402 as the origin (0, 0).
Here, the predetermined degree of matching means that the image data of the pixels of the template 207 and the image data of the pixels of the area 214 agree for, for example, 95% or more of the pixels, and that no other area gives a better match when the compared range is shifted.
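The matching criterion described above can be sketched as an exhaustive search over window positions. This is an illustrative brute-force version under assumed names (`match_ratio`, `find_best_match`); the text does not say how the search is organized, and a real implementation would likely restrict the search range for speed.

```python
def match_ratio(image, template, top, left):
    """Fraction of template pixels equal to the image window at (top, left)."""
    rows, cols = len(template), len(template[0])
    same = sum(
        1
        for r in range(rows)
        for c in range(cols)
        if image[top + r][left + c] == template[r][c]
    )
    return same / (rows * cols)


def find_best_match(image, template, min_ratio=0.95):
    """Return (ratio, top, left) of the best window, or None if below min_ratio."""
    rows, cols = len(template), len(template[0])
    best = None
    for top in range(len(image) - rows + 1):
        for left in range(len(image[0]) - cols + 1):
            ratio = match_ratio(image, template, top, left)
            if best is None or ratio > best[0]:
                best = (ratio, top, left)
    if best is None or best[0] < min_ratio:
        return None
    return best


template = [[1, 0, 1],
            [0, 1, 0]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 1],
         [0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0]]
found = find_best_match(image, template)
```

The returned `top` and `left` correspond to Y_1′ and X_1′; adding the template's height and width gives the lower right corner (X_M′, Y_N′).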

Next, in the position correction step, the previously obtained coordinates (X_1, Y_1) of the upper left corner 304 of the template 207, measured from the upper left corner 303 of the paper surface of the unwritten printed matter 401 as the origin (0, 0), are compared with the coordinates (X_1′, Y_1′) of the area 214. Similarly, the coordinates (X_M, Y_N) of the lower right corner 305 of the template 207 are compared with the coordinates (X_M′, Y_N′) of the area 214. The written printed matter 402 is then moved so that it occupies the same position as the unwritten printed matter 401.
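The coordinate comparison described above can be sketched as follows. The function names, the integer pixel shift, and the `tolerance` used to decide that the two corner offsets disagree (suggesting a tilt that translation alone cannot correct) are assumptions made for this example.

```python
def correction(template_pts, matched_pts, tolerance=0.5):
    """Compute the shift that moves the matched area onto the template.

    template_pts = ((X_1, Y_1), (X_M, Y_N)); matched_pts holds the primed
    counterparts. Returns ((dx, dy), needs_rotation).
    """
    (x1, y1), (xm, yn) = template_pts
    (x1p, y1p), (xmp, ynp) = matched_pts
    dx1, dy1 = x1 - x1p, y1 - y1p
    dxm, dyn = xm - xmp, yn - ynp
    # If the two corners need different shifts, the page is probably tilted.
    needs_rotation = abs(dx1 - dxm) > tolerance or abs(dy1 - dyn) > tolerance
    return (dx1, dy1), needs_rotation


def translate(binary, dx, dy):
    """Shift a 0/1 image by integer (dx, dy); vacated pixels become white (0)."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = binary[r][c]
    return out


shift, needs_rotation = correction(((10, 20), (110, 60)), ((11, 20), (111, 60)))
moved = translate([[0, 1, 0], [0, 0, 1]], shift[0], shift[1])
```

A negative `dx` moves the written image to the left, so a result of X_1 − X_1′ = −1 shifts it one pixel leftward.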

For example, if X_1 − X_1′ = −1.00, X_M − X_M′ = −1.00, Y_1 − Y_1′ = 0, and Y_N − Y_N′ = 0, the image data of the written printed matter 402 is translated to the left by 1.00, as indicated by arrow 406. When the position cannot be corrected by vertical and horizontal translation alone, for example when the image data 213 is tilted, rotation is also applied.
The description again returns to FIG. 2.

The area 216 of the written printed matter 211 that corresponds to the area determined to be the picture or graphic region 205 in the unwritten printed matter 201 is excluded from difference extraction. In a picture or the like, pixel gradation is not clearly divided into black and white as it is in a character area, so correct binarization is difficult, and the binarized image data of the unwritten printed matter 201 and of the written printed matter 211 are likely to match poorly there.

Finally, in the difference extraction step, the difference between the binarized image data 202 of the unwritten printed matter 201 and the position-corrected binarized image data 217 of the written printed matter 211 is extracted. Where the binarized data of corresponding pixels is the same, the result is a white pixel; where it differs, the result is a black pixel.
As a result, the memo 212 is extracted as the difference image. This difference image data 208 is output, and the difference image is displayed on the display unit 103 of the PC 102.
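The difference extraction described above amounts to a per-pixel exclusive OR with the picture or graphic region masked out. A minimal sketch, assuming 0/1 images of equal size and hypothetical rectangle coordinates `(top, left, bottom, right)` with exclusive ends:

```python
def difference_image(reference, aligned, excluded=None):
    """Per-pixel XOR of two 0/1 images: 1 (black) where they differ.

    Pixels inside any excluded rectangle are forced to 0 (white), which
    models leaving the picture or graphic region out of the comparison.
    """
    rows, cols = len(reference), len(reference[0])
    diff = [
        [reference[r][c] ^ aligned[r][c] for c in range(cols)]
        for r in range(rows)
    ]
    for top, left, bottom, right in excluded or []:
        for r in range(top, bottom):
            for c in range(left, right):
                diff[r][c] = 0
    return diff


reference = [[1, 0, 0, 1],
             [0, 0, 0, 1]]
aligned = [[1, 0, 1, 1],
           [0, 1, 0, 0]]
# Column 3 is treated as the picture region and excluded.
diff = difference_image(reference, aligned, excluded=[(0, 3, 2, 4)])
```

The mismatch in the excluded column is suppressed, while the two added pixels elsewhere survive as black pixels in the output.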

In the above embodiment, the difference image extraction apparatus comprises the scanner 101 and the PC 102, but a printer may also be connected to the PC 102 so that the difference image is output from the printer.
In the above embodiment, the template 207 substantially coincides with the character area 351, but it may instead be a part of the character area 351.

In the above embodiment, the character area is determined by generating histograms, but it may be determined by other methods. For example, the paper surface may be divided, the ratio of black pixels to white pixels obtained for each divided area, and any area whose ratio falls within a predetermined range determined to be a character area.
Next, the operation of this embodiment is described with reference to the flowcharts of FIGS. 5 and 6.

First, the reference unwritten printed matter is converted into image data by the scanner 101 and output to the PC 102 (S502).
The PC 102 converts the input image data into image data binarized pixel by pixel into black pixels and white pixels (S504).
Next, histograms of black pixels are generated in the vertical direction and the horizontal direction of the paper surface of the unwritten printed matter (S506), and it is determined whether there is an area where the histogram regularly returns to approximately "0" in each direction (S508).

If there is, that area is determined to be a character area (S510), and the process proceeds to S514.
If there is not, the area is determined to be a picture or graphic region (S512), and the process ends because there is no character area.
In S514, image data of M rows and N columns is extracted from the character area. It is then determined whether image data of M′ rows and N′ columns can be extracted from another character area (S516).

If it can be extracted, it is determined whether M×N > M′×N′ (S518). If not, the M′-row, N′-column image data replaces the M-row, N-column image data (S520), and the process returns to S516.
When M×N > M′×N′ in S518, or when no M′-row, N′-column image data can be extracted from another character area in S516, the M-row, N-column image data is stored as the template (S522).

Next, the written printed matter is converted into image data by the scanner 101 and output to the PC 102 (S602).
The PC 102 converts the input image data into image data binarized pixel by pixel into black pixels and white pixels (S604).
The template image data stored in S522 is collated with the image data of the written printed matter (S606), and it is determined whether there is a matching area (S608). If there is none, the process ends. If there is one, the differences ΔX_1, ΔX_M, ΔY_1, and ΔY_N between the positions (X_1′, Y_1′) and (X_M′, Y_N′) of that area in the image data of the written printed matter and the positions (X_1, Y_1) and (X_M, Y_N) of the template image data are calculated, and a correction value for moving the image data of the written printed matter is obtained (S610).

The image data of the written printed matter is moved in accordance with the obtained correction value (S612).
Next, the difference between the binarized image data of the unwritten printed matter and that of the written printed matter is extracted, excluding the picture or graphic region (S614). Pixels whose difference is "0" are output as white pixels and pixels whose difference is other than "0" are output as black pixels (S616), and the process ends.

The difference image extraction apparatus according to the present invention can correctly extract a difference image between unwritten printed matter on which no alignment mark has been recorded in advance and written printed matter, and is therefore useful in the field of office work.

FIG. 1 is an external view of the hardware configuration of an embodiment of the difference image extraction apparatus according to the present invention.
FIG. 2 is a diagram outlining how a difference image is extracted from unwritten printed matter and written printed matter in the above embodiment.
FIG. 3 is a diagram showing an example of extracting a template from unwritten printed matter in the above embodiment.
FIG. 4 is a diagram illustrating an example of aligning unwritten printed matter with written printed matter in the above embodiment.
FIG. 5 is a flowchart (part 1) explaining the operation of the above embodiment.
FIG. 6 is a flowchart (part 2) explaining the operation of the above embodiment.

Explanation of symbols

101 Scanner
102 PC
103 Display unit

Claims

1. A difference image extraction apparatus that extracts a difference between unwritten printed matter and written printed matter, comprising:
image acquisition means for acquiring images of the two printed matters;
image extraction means for extracting a character area of the unwritten printed matter;
alignment means for aligning the two printed matters by collating the extracted character area with the image data of the written printed matter; and
difference extraction means for extracting a difference image of the two printed matters in the aligned state.

2. The difference image extraction apparatus according to claim 1, wherein the image extraction means includes:
a binarization unit that converts the image of the unwritten printed matter acquired by the image acquisition means into image data binarized pixel by pixel into black pixels and white pixels;
a histogram generation unit that generates histograms by projecting the image data classified as black pixels by the binarization unit in the vertical direction and the horizontal direction of the paper surface of the unwritten printed matter;
a determination unit that determines a character area from the shape of the generated histograms; and
a template extraction unit that extracts image data of M rows and N columns as a template from the character area determined by the determination unit.

3. The difference image extraction apparatus according to claim 2, wherein the alignment means includes:
a target binarization unit that converts the image of the written printed matter into image data binarized pixel by pixel into black pixels and white pixels;
a matching area recognition unit that recognizes an area where the template extracted by the template extraction unit matches the image data binarized by the target binarization unit; and
an alignment unit that makes the coordinate values of the area found to match by the matching area recognition unit coincide with the coordinate values of the template.

4. The difference image extraction apparatus according to claim 3, wherein the difference extraction means includes:
an area extraction unit that extracts an area obtained by adding a blank area, which is a set of white pixels, to the area determined to be the character area of the unwritten printed matter, together with the corresponding area of the written printed matter; and
a calculation unit that calculates the difference between the image data of the two extracted areas.

5. The difference image extraction apparatus according to claim 4, further comprising output means that outputs a black pixel when the difference calculated by the calculation unit is not "0".

6. The difference image extraction apparatus according to claim 2, wherein, when a plurality of M-row, N-column image data candidates can be extracted, the template extraction unit extracts the largest of them.

7. A difference image extraction method for a difference image extraction apparatus that extracts a difference between unwritten printed matter and written printed matter, comprising:
an image acquisition step of acquiring images of the two printed matters;
an image extraction step of extracting a character area of the unwritten printed matter;
an alignment step of aligning the two printed matters by collating the extracted character area with the image of the written printed matter; and
a difference extraction step of extracting a difference image of the two printed matters in the aligned state.