JP4936250B2

JP4936250B2 - Write extraction method, write extraction apparatus, and write extraction program

Info

Publication number: JP4936250B2
Application number: JP2007057693A
Authority: JP
Inventors: 友弘中居; 浩一黄瀬; 雅一岩村
Original assignee: Sharp Corp; Osaka Prefecture University
Current assignee: Sharp Corp; Osaka Prefecture University
Priority date: 2007-03-07
Filing date: 2007-03-07
Publication date: 2012-05-23
Anticipated expiration: 2027-03-07
Also published as: JP2008219800A

Description

The present invention relates to a writing extraction method, a writing extraction apparatus, and a writing extraction program for extracting writing from a document to which writing has been added.

Documents printed on a medium such as paper (paper documents) are highly readable and portable, so a great deal of information is provided through them. Writing on a paper document, such as marking a passage of interest or jotting down a memo, is also common. Writing on paper documents therefore holds abundant information about the user's interests and knowledge, and extracting and analyzing it is expected to yield valuable information. Specifically, extracted writing could be used for a) letting the user browse the writing and b) identifying the rough position of the writing.

For these uses, the writing region does not necessarily have to be traced exactly; what matters is that the result is easy for a person to read. Attempting exact extraction often leaves the writing under-extracted (for example, extracted in fragments), which lowers readability for the user. Fragmentation also harms position identification. Conversely, extracting some material other than writing does not affect readability.

Various methods for extracting writing from paper documents have been proposed (for example, Non-Patent Documents 1, 2, 3, and 4). These methods are designed with machine recognition in mind and restrict the color and type of writing in order to extract it with high accuracy. Writing done in a real environment is subject to no such restrictions, so a method is needed that can extract any kind of writing in any color.
In more detail, the conventional methods fall into the following two types.
The first type (Non-Patent Documents 1 and 2) focuses not on the extraction itself but on the use of the extracted writing, such as automatic proofreading driven by the writing. Because the extraction result is consumed by a machine, highly accurate extraction is required, and these methods achieve it by restricting the writing. Specifically, the pen colors that may be used are fixed in advance, and a pixel of the scanned image is judged to be writing or not according to its color. These methods therefore limit the color of the writing.
The second type (Non-Patent Documents 3 and 4) classifies the connected components of an image into handwritten characters and printed characters. These methods have the advantage that writing can be extracted from the image of the written document alone. On the other hand, the writing that can be extracted is limited to characters; handwritten lines and figures cannot be extracted. Figures such as underlines and arrows are used frequently in actual writing, so a method that can extract them is needed.

As a technique related to the present invention, the inventors have proposed a document image retrieval method based on the local arrangement of feature points (see, for example, Patent Document 1 and Non-Patent Document 5). In the following description, this proposed technique is referred to simply as the document image search method. Its relationship to the present invention is described later.
Patent Document 1: International Publication No. 2006/092957 pamphlet
Non-Patent Document 1: D. Mori and H. Bunke, "Automatic Interpretation and Execution of Manual Corrections on Text Documents", in Handbook of Character Recognition and Document Image Analysis, ed. H. Bunke and P. S. P. Wang, pp.679-702, World Scientific, Singapore (1997).
Non-Patent Document 2: J. Stevens, A. Gee, and C. Dance, "Automatic Processing of Document Annotations", In Proc. 1998 British Machine Vision Conf., Vol. 2, pp.438-448 (1998).
Non-Patent Document 3: J. K. Guo and M. Y. Ma, "Separating Handwritten Material from Machine Printed Text using Hidden Markov Models", In Proc. 6th International Conf. on Document Analysis and Recognition, pp.436-443 (2001).
Non-Patent Document 4: Y. Zheng, H. Li, and D. Doermann, "The Segmentation and Identification of Handwriting in Noisy Document Images", In Lecture Notes in Computer Science (5th International Workshop DAS2002), vol.2423, pp.95-105 (2002).
Non-Patent Document 5: Nakai, Kise, and Iwamura, "Use of affine and similarity invariants in high-speed document image retrieval using a digital camera", IEICE Technical Report, PRMU2005-188 (2006).

The present invention provides a technique that can extract writing accurately even when the writing is made in an arbitrary color on a color paper document, and even when lines or figures are written in addition to characters. More specifically, the invention assumes that an image of the paper document being written on (the original image) is available, and extracts the writing by taking the difference between the document image that contains the writing (the written image) and the original document image.
In the present invention, the writing region is extracted somewhat generously so that the extraction does not become insufficient, for example through the fragmentation described above. As a result, portions other than writing may be extracted together with the writing, but this has little adverse effect on browsing by the user or on position identification.

The present invention provides a writing extraction method for extracting writing from an image in which writing has been added to a document image, the method being characterized in that a computer performs: a step of acquiring the original image before the writing is added and the written image after the writing is added, each as a set of local regions represented by color components; a step of comparing the original image with the written image and aligning the two images; a step of determining paired local regions by taking, for each local region of one of the aligned images, the local regions of the other image located at the corresponding position or within a predetermined range of it as pair candidates, and finding a similar local region among the candidates; and a step of extracting as writing, based on the difference between the color components of the paired local regions, the local regions that are included in the written image and not included in the original image.

From a different viewpoint, the present invention provides a writing extraction apparatus for extracting writing from a written image in which writing has been added to a document image, the apparatus comprising: an image acquisition unit that acquires the original image before the writing is added and the written image after the writing is added, each as a set of local regions represented by color components; an alignment unit that compares the original image with the written image and aligns the two images; a paired local region determination unit that determines paired local regions by taking, for each local region of one of the aligned images, the local regions of the other image located at the corresponding position or within a predetermined range of it as pair candidates, and finding among the candidates the local region whose color components are closest; and a writing extraction unit that extracts as writing, based on the difference between the color components of the paired local regions, the local regions that are included in the written image and not included in the original image.

From yet another viewpoint, the present invention provides a writing extraction program for executing processing that extracts writing from a written image in which writing has been added to a document image, the program causing a computer to function as: an image acquisition unit that acquires the original image before the writing is added and the written image after the writing is added, each as a set of local regions represented by color components; an alignment unit that compares the original image with the written image and aligns the two images; a paired local region determination unit that determines paired local regions by taking, for each local region of one of the aligned images, the local regions of the other image located at the corresponding position or within a predetermined range of it as pair candidates, and finding among the candidates the local region whose color components are closest; and a writing extraction unit that extracts as writing, based on the difference between the color components of the paired local regions, the local regions that are included in the written image and not included in the original image.

In the writing extraction method of the present invention, for each local region of one of the aligned images, the local regions of the other image located at the corresponding position or within a predetermined range of it are taken as pair candidates, and the pair is determined by finding the candidate whose color components are closest. Writing can therefore be extracted more accurately than when the local regions at exactly corresponding positions in the two images are paired. In other words, the corresponding local regions can be determined accurately even if the original image and the written image are not aligned strictly at the level of individual local regions. Because strict alignment at that level is unnecessary, an accurate extraction result can be obtained without spending time on alignment.
In addition, because the method has none of the restrictions of the conventional methods, writing can be extracted accurately even when it is made in an arbitrary color on a color paper document, and even when lines or figures are written in addition to characters.
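The pairing described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the local region is a single pixel, images are lists of rows of RGB tuples, the search runs from the written image into the original image, and the names `color_distance`, `best_match_distance`, and `radius` are hypothetical.

```python
def color_distance(a, b):
    """Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def best_match_distance(original, written, x, y, radius):
    """Smallest color distance between written[y][x] and any pixel of the
    original image within `radius` pixels of (x, y).

    The candidates are the pixel at the corresponding position plus its
    neighbors inside the window; the closest one is the paired pixel.
    """
    height, width = len(original), len(original[0])
    target = written[y][x]
    best = float("inf")
    for ny in range(max(0, y - radius), min(height, y + radius + 1)):
        for nx in range(max(0, x - radius), min(width, x + radius + 1)):
            best = min(best, color_distance(target, original[ny][nx]))
    return best
```

With `radius = 0` this reduces to a plain pixel-by-pixel difference; a larger window tolerates small residual misalignment, because a printed stroke shifted by a pixel still finds its counterpart.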

Here, a document image is information consisting of characters and/or images, represented as an image. Image data obtained by converting such an image into electronic data is also included in the term.

Writing is visible information added to a document image. It is an image made up of characters, line drawings, and the like. In most cases writing is added by hand, but it is not limited to handwriting and may be added by other means such as printing or stamping.

Alignment is the process of determining the parameters of the geometric transformation to be applied to one image so that the portion of the written image that existed before writing is superimposed on the original image, and then applying the geometric transformation based on those parameters to the target image. Depending on its degrees of freedom, the geometric transformation may be a projective transformation, an affine transformation, a similarity transformation, or the like. The embodiment below describes alignment based on the similarity transformation, which has the fewest degrees of freedom of the three. The essence of the invention, however, is not necessarily limited to this.
In a preferred aspect of the invention, the local region may be a region whose unit is one or more pixels. Images are read pixel by pixel. A local region may correspond to a single pixel, or a group of pixels in a predetermined arrangement may be used as the unit.
Images are generally read by separating them into red (R), green (G), and blue (B) color components, so each step can preferably be performed on pixels expressed in those components, that is, in the RGB color space. The invention is not necessarily limited to this, however, and can be applied to pixels expressed in other color spaces. The YMC color space and the Lab color space are known examples, and pixel values can be converted between color spaces by calculation.

Preferably, the step of extracting local regions as writing may be a step of outputting the written-image local region of a pair when the difference between the color components of the paired local regions is larger than a predetermined threshold. The density of the image extracted as writing is then unaffected by the attribute value of the corresponding local region of the original image, so the extracted writing is easy to recognize.
Here, the value indicating the attribute of a local region is, for example, the pixel value (the values of the pixel's color components) when the local region is a single pixel. When the local region consists of several pixels, it is, for example, the average of their pixel values. Alternatively, the pixel values may be averaged with predetermined weights, or the value may be computed by some other procedure.
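As an illustration of the threshold test and of the attribute value of a multi-pixel local region, here is a minimal sketch. It assumes the unweighted mean as the attribute value and Euclidean distance in RGB as the difference; the function names and the `threshold` parameter are hypothetical.

```python
def mean_color(pixels):
    """Attribute value of a multi-pixel local region: the per-channel mean."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def extract_if_written(original_region, written_region, threshold):
    """Return the written-image region when the paired regions differ by more
    than `threshold`; otherwise return None.

    Only the written-image pixels are output, so the result does not depend
    on the values of the original image.
    """
    a = mean_color(original_region)
    b = mean_color(written_region)
    difference = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return written_region if difference > threshold else None
```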

Preferably, the step of aligning the images may include color clustering the original image and the written image separately and establishing the correspondence between the original image and the written image for each color-clustered image. Because the correspondence is established per color cluster, writing can be extracted accurately even when a color original image is written on in color.

Preferably, the step of aligning the images may also include extracting feature points from the original image and from the written image and establishing the correspondence between the two images for the extracted feature points. Because alignment then uses only the extracted feature points, the processing time is far shorter than when alignment considers every local region without using feature points. That is, realizing the means for solving the problem requires aligning the written image and the original image quickly and accurately, and the document image search method described above can be used for this. The document image search method and the present invention address different problems. However, the document image search method yields, in the course of retrieval, part-by-part correspondences between the query document image and the retrieved document image. For example, FIG. 30 and FIG. 44 of Patent Document 1 show retrieval algorithms; applying such an algorithm to the present invention makes it possible to align images at high speed and/or with high accuracy.

Preferably, after the step of extracting local regions as writing, the computer may further perform a shaping step that shapes the extracted writing and outputs it. The shaping step may include a step of obtaining a mask pattern in which the local regions extracted as writing form a transparent portion that is further expanded by a predetermined amount, and a step of superimposing the obtained mask pattern on the original image and outputting the part of the original image that overlaps the expanded transparent portion. In this way, more of the local regions of the writing are extracted than without the shaping step, so the writing is easier to recognize. Depending on the threshold used to judge the color-component difference, the strokes of the image extracted as writing may be broken or thinned; the shaping step suppresses such breaks and thinning and makes the writing easier to recognize.

More preferably, the step of obtaining the mask pattern may include a step of binarizing each local region of the extracted writing and a step of performing connection processing on the binarized local regions. This repairs breaks in the strokes of the writing.

Still more preferably, the step of obtaining the mask pattern may further include a step of removing, from the writing regions produced by the connection processing, those regions smaller than a predetermined area. This removes dot-like noise from the extracted writing, so the writing is easier to recognize than when this processing is not performed.
The preferred aspects described above can be combined.
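The shaping step can be illustrated with a small sketch on a boolean mask (True marks a pixel extracted as writing). Several choices here are assumptions rather than details given in the text: the connection processing is approximated by a one-pixel dilation, regions are 8-connected, and all function and parameter names are hypothetical. The `image` passed to `apply_mask` is whichever image the mask is superimposed on.

```python
from collections import deque


def dilate(mask, times=1):
    """Grow the True area by one pixel (8-neighborhood) per pass."""
    h, w = len(mask), len(mask[0])
    for _ in range(times):
        mask = [[any(mask[ny][nx]
                     for ny in range(max(0, y - 1), min(h, y + 2))
                     for nx in range(max(0, x - 1), min(w, x + 2)))
                 for x in range(w)] for y in range(h)]
    return mask


def remove_small_regions(mask, min_area):
    """Clear 8-connected True regions whose area is below `min_area`."""
    h, w = len(mask), len(mask[0])
    result = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            region, queue = [], deque([(sx, sy)])
            while queue:  # breadth-first flood fill of one region
                x, y = queue.popleft()
                region.append((x, y))
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if len(region) < min_area:
                for x, y in region:
                    result[y][x] = False
    return result


def shape_mask(mask, min_area, grow):
    """Connect broken strokes, drop dot-like noise, then expand the mask."""
    connected = dilate(mask, 1)  # assumed stand-in for the connection processing
    cleaned = remove_small_regions(connected, min_area)
    return dilate(cleaned, grow)


def apply_mask(image, mask, background=(255, 255, 255)):
    """Output image pixels under the transparent (True) part of the mask."""
    return [[image[y][x] if mask[y][x] else background
             for x in range(len(image[0]))] for y in range(len(image))]
```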

The embodiment below assumes that the extraction result is analyzed visually and describes a technique that favors good visibility over exact extraction of the writing. The essence of the invention, however, is not limited to this technique. Experiments showed that good results are obtained for document images that contain many characters.

The writing extraction method of the present invention assumes that the original image can be obtained, aligns the original image and the written image, and extracts the writing by taking their difference. Its distinguishing feature is a flexible difference process that is robust not only to global misalignment but also to local misalignment.
Furthermore, it is highly preferable that the alignment obtain the geometric transformation parameters through a corresponding point search and be performed at high speed by applying the document image search method.

The invention is described in more detail below with reference to the drawings. The following description is illustrative in all respects and should not be construed as limiting the invention.
1. Processing flow
FIG. 1 is an explanatory diagram showing the processing flow of the writing extraction method of the present invention. The processing is broadly divided into alignment processing 11 and difference acquisition processing 13.

In alignment processing 11, the written image is similarity-transformed to match the original image, correcting differences in position, rotation, and scale. The alignment processing is further divided into four smaller processes.
In difference acquisition processing 13, the corrected written image is compared with the original image, and by taking the difference, the local regions that are included in the written image and not included in the original image are output as writing. The following description takes the case where the local region is one pixel as the representative example. The difference acquisition processing is divided into six smaller processes.

2. Alignment
2.1. Feature point extraction
The first process in alignment is feature point extraction. Points that are robust to deformation and noise are extracted as feature points from the written image and from the original image. In this embodiment, the image is color-clustered in the RGB color space, and the centroids of the connected components obtained for each color are used as feature points. The intent is to use the positions of single-color regions such as characters as feature points. Characters have a distinctive arrangement and high contrast with the background, so they can be extracted stably.
The specific processing is as follows. First, the image is reduced to lighten the processing. The same reduction ratio L is used for both the written image and the original image. Then, to reduce the influence of noise, the image is smoothed by a morphological operation. Specifically, an erosion operation with a 3×3 rectangular element is repeated l (lowercase L) times. Next, after the image is reduced by a factor of X, the pixels of the image are clustered by color into k color clusters with the k-means method. The number of color clusters k is a predetermined value. The iteration of the clustering is terminated when the number of iterations exceeds i or when the distance moved by the centroids falls below p.
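The clustering step and its two stopping rules can be sketched as follows. This is a minimal k-means in pure Python, not the authors' implementation; the text does not say how the initial cluster centers are chosen, so they are passed in explicitly here as an assumption. `max_iterations` plays the role of i and `min_shift` the role of p.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels, initial_centers, max_iterations, min_shift):
    """Cluster RGB tuples into len(initial_centers) color clusters.

    Stops after `max_iterations` passes, or as soon as the largest movement
    of any cluster center in one pass falls below `min_shift`.
    Returns (centers, labels); labels[n] is the cluster index of pixels[n]
    as assigned in the last pass.
    """
    centers = [tuple(float(v) for v in c) for c in initial_centers]
    labels = []
    for _ in range(max_iterations):
        # Assignment: each pixel goes to the nearest center.
        labels = [min(range(len(centers)),
                      key=lambda j: squared_distance(p, centers[j]))
                  for p in pixels]
        # Update: each center moves to the mean color of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                new_centers.append(tuple(
                    sum(m[c] for m in members) / len(members)
                    for c in range(3)))
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        shift = max(squared_distance(a, b) ** 0.5
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < min_shift:
            break
    return centers, labels
```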

Next, as shown in FIG. 2, the target image 15 is decomposed into k = 5 color-cluster images 17a, 17b, 17c, and 17d on the basis of the color clustering result. Connected components are extracted from each color-cluster image obtained in this way, their areas are examined, and those with an area of E or more and those with an area of e or less are removed. This is because very small regions caused by noise and large regions such as the background are not considered stable. The centroids of the remaining connected components are used as the feature points of each color cluster.
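A sketch of the per-cluster feature point extraction follows. It assumes each color-cluster image is given as a boolean grid (True where a pixel belongs to the cluster) and uses 4-connectivity, which the text does not specify; `min_area` and `max_area` stand for e and E, and the function name is hypothetical.

```python
from collections import deque


def component_centroids(mask, min_area, max_area):
    """Centroids (x, y) of the 4-connected True regions of `mask` whose area
    is strictly between `min_area` and `max_area`.

    Regions with area <= min_area (noise) or >= max_area (background-like)
    are discarded, as in the text.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            xs, ys = [], []
            while queue:  # breadth-first flood fill of one component
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if min_area < len(xs) < max_area:
                centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids
```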

The k-means method is a known technique that, given the final number of clusters (groups) k and an evaluation criterion for each cluster, classifies the given data into optimal clusters (see, for example, Mikio Takagi and Haruhisa Shimoda (supervising eds.), Image Analysis Handbook, 1st edition, University of Tokyo Press, September 10, 2004, pp.1576-1579).

2.2. Corresponding point search
The second process is the corresponding point search. Here the feature points of the written image and the feature points of the original image are associated by applying the document image search method.

At this stage, the written image and the original image each have feature points for each of the k color clusters. Some color clusters may contain many noise feature points, so correct corresponding points are easier to obtain by matching cluster by cluster than by pooling the feature points of all clusters and then matching. Therefore, before the feature points are matched, the color clusters of the written image are associated one-to-one with the color clusters of the original image. In this embodiment, clusters are associated in ascending order of the distance between their centroids in the RGB color space. Specifically, the distance is first computed for every combination of a written-image color cluster and an original-image color cluster. The pair with the smallest distance is associated first, and the remaining pairs are then associated in order of increasing distance. A pair is skipped if either of its color clusters has already been associated. This is repeated until every color cluster of the written image has been associated one-to-one with a color cluster of the original image.
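The greedy one-to-one association of color clusters can be written compactly. The sketch below follows the procedure in the text; the function name and the representation of cluster centroids as RGB tuples are assumptions made for illustration.

```python
def match_color_clusters(written_centers, original_centers):
    """Greedily pair color clusters in ascending order of centroid distance.

    Returns a dict mapping each written-image cluster index to the index of
    its original-image cluster. A candidate pair is skipped when either of
    its clusters has already been paired.
    """
    # Squared distance preserves the ordering, so no square root is needed.
    pairs = sorted(
        (sum((a - b) ** 2 for a, b in zip(wc, oc)), wi, oi)
        for wi, wc in enumerate(written_centers)
        for oi, oc in enumerate(original_centers)
    )
    used_written, used_original, matches = set(), set(), {}
    for _, wi, oi in pairs:
        if wi in used_written or oi in used_original:
            continue
        matches[wi] = oi
        used_written.add(wi)
        used_original.add(oi)
    return matches
```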

Next, the feature points are matched for each pair of associated color clusters. As in the document image search method, the feature points of the original image are registered in a database and, for each feature point of the written image, the corresponding point is retrieved; this is done for each pair of associated color clusters. The difference from the document image search method is that only one image is registered. The parameters used in the corresponding point search are the range n of neighboring points over which the feature value is computed, the number m of points used to compute the feature value, and the number d of discretization levels of the feature value.
This processing yields the feature point correspondences for each pair of color clusters. The correspondences for the whole image are obtained by collecting those of all the color clusters. FIG. 3 shows the correspondences between feature points.

2.3. Acquisition of similarity transformation parameters
The third process is the acquisition of similarity transformation parameters. Here, the parameters are obtained from the correspondences between feature points. The similarity transformation parameters form a four-dimensional vector consisting of scaling, rotation, and translation along the x and y axes. They can be calculated from two pairs of corresponding points. However, because the corresponding points in this embodiment may include incorrect ones, the parameters must be calculated with the incorrect correspondences excluded. RANSAC is therefore used to estimate the transformation parameters. RANSAC is a well-known technique widely used to establish feature point correspondences between multiple images (see, for example, M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Comm. ACM, Vol. 24, No. 6, pp. 381-395 (1981)).

FIG. 4 shows the algorithm for estimating the transformation parameters. First, two correspondences are chosen at random from the set of correspondences (line 2 of FIG. 4). Similarity transformation parameters are then computed from the two correspondences (line 3). Next, the parameters are evaluated by taking as the score the number of correspondences that support them (line 4). Specifically, the written-image point of each correspondence is transformed by the parameters, and the distance between the transformed point and the original-image point of that correspondence is computed; if the distance is no greater than a threshold t, the correspondence is regarded as supporting the parameters. This is repeated T times (the for loop on lines 1 to 6). If the maximum score is at least a threshold (line 7), the parameters that achieved it are taken as the estimate (line 8); otherwise the estimation is regarded as having failed (line 10).
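The estimation loop described above might be sketched as follows. This is an illustrative sketch only, not the algorithm of FIG. 4 itself: it assumes that points are represented as complex numbers x + iy, so that a similarity transformation is z -> a*z + b, with |a| the scale, the argument of a the rotation, and b the translation. The names `similarity_from_two`, `ransac_similarity`, `min_score`, and `seed` are hypothetical.

```python
import random


def similarity_from_two(p1, q1, p2, q2):
    """Similarity transform z -> a*z + b that maps p1 to q1 and p2 to q2."""
    a = (q1 - q2) / (p1 - p2)  # scale and rotation as one complex factor
    b = q1 - a * p1            # translation
    return a, b


def ransac_similarity(correspondences, t, T, min_score, seed=0):
    """Estimate (a, b) from (written_point, original_point) pairs.

    Returns None when the best score stays below min_score (estimation failure).
    """
    rng = random.Random(seed)
    best_score, best_params = 0, None
    for _ in range(T):
        # Choose two correspondences at random.
        (p1, q1), (p2, q2) = rng.sample(correspondences, 2)
        if p1 == p2:
            continue  # degenerate sample: no transform can be computed
        a, b = similarity_from_two(p1, q1, p2, q2)
        # Score = number of correspondences whose transformed written point
        # lies within distance t of its original-image point.
        score = sum(abs(a * p + b - q) <= t for p, q in correspondences)
        if score > best_score:
            best_score, best_params = score, (a, b)
    return best_params if best_score >= min_score else None
```

Two correspondences give four real equations, which is exactly enough to fix the four parameters; this is why the sample size is two.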

2.4. Similarity transformation processing
The fourth process is the similarity transformation. Using the similarity transformation parameters obtained so far, a similarity transformation is applied to the written image to align it with the original image. This yields a corrected written image of the same size as the original image.
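One way to picture this step is inverse mapping with nearest-neighbor sampling, sketched below. The source does not state how resampling is done, so the interpolation method, the white fill value, and the complex-number parameterization z -> a*z + b (written coordinates to original coordinates) are all assumptions of this sketch.

```python
def warp_to_original(written, a, b, out_h, out_w, fill=255):
    """Resample `written` (a list of pixel rows) into the original image's frame.

    (a, b) maps a written-image coordinate z = x + 1j*y to the original-image
    coordinate a*z + b. Pixels with no source are set to `fill`.
    """
    src_h, src_w = len(written), len(written[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Inverse mapping: which written pixel lands on output (x, y)?
            z = (complex(x, y) - b) / a
            sx, sy = round(z.real), round(z.imag)
            if 0 <= sx < src_w and 0 <= sy < src_h:
                out[y][x] = written[sy][sx]
    return out
```

Passing the height and width of the original image as `out_h` and `out_w` gives an output of the same size as the original, as the text requires.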

3. Difference acquisition
3.1. Preprocessing
As described above, the difference acquisition process 13 is divided into six smaller processes. The first is preprocessing. Here, image processing is applied to the corrected written image and the original image so that the subsequent processes work properly.
First, to reduce the processing load, the written image and the original image are reduced by factors of Z and z, respectively.
Next, a Gaussian filter with a kernel size of g × g is applied to the written image to remove the dot pattern produced by printing. This is because a portion that is an intermediate color in the electronic document is printed as a dot pattern of primary colors, so taking the difference directly would produce noise.

The original image is also processed. Two cases are considered for the original image: an image obtained directly from the electronic document, and an image obtained by printing the electronic document and capturing the printout with a scanner. In the former case, the erosion operation is repeated R times to thicken the image content. This is so that the difference from the written image, in which ink bleeds during printing, is obtained properly. In the latter case, a G × G Gaussian filter is applied, as for the written image, to remove the dot pattern.
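It may seem odd that erosion thickens the image. On a grayscale image with dark content on a light background, erosion replaces each pixel with the minimum of its neighborhood, so dark strokes grow. The sketch below illustrates this under the assumption of a 3 × 3 neighborhood and a single grayscale channel; neither is specified in the source.

```python
def erode(gray, iterations=1):
    """Grayscale erosion: each pixel becomes the minimum of its 3x3 neighborhood.

    `iterations` plays the role of the repetition count R in the text.
    The input is not modified; a new image is returned.
    """
    h, w = len(gray), len(gray[0])
    for _ in range(iterations):
        gray = [
            [
                min(
                    gray[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                )
                for x in range(w)
            ]
            for y in range(h)
        ]
    return gray
```

A single dark pixel on a white page becomes a 3 × 3 dark block after one iteration, which is the thickening effect the text relies on.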

3.2. Difference and threshold processing
The second process is difference and threshold processing. Here, the corrected written image and the original image are compared pixel by pixel, and difference and threshold processing is performed. However, distortion introduced when the scanner captures the image, errors in the feature points, and similar factors can leave small misalignments between the written image and the original image. The difference must therefore be taken with such misalignment in mind.

In this embodiment, difference and threshold processing is performed as follows. First, each pixel of the written image is compared with its corresponding pixel in the original image. The original-image pixel compared with a written-image pixel is not simply the one at the same coordinates. Instead, as shown in FIG. 5, the original-image pixels within an N × N square region centered on the same coordinates are candidates for the corresponding pixel. Of the pixels in that region, the one whose value is closest to that of the written-image pixel being compared is taken as the corresponding pixel. Searching a fixed range for the closest pixel value in this way finds the corresponding pixel accurately even when the written image and the original image are slightly misaligned. The difference between the pixel values of the written-image pixel and the corresponding original-image pixel is then computed.
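The neighborhood search can be sketched as below. For brevity the sketch assumes single-channel (grayscale) images stored as lists of rows and an odd N; the embodiment works on color images, so the absolute difference here stands in for whatever color distance is actually used. The function name is hypothetical.

```python
def neighborhood_difference(written, original, N=3):
    """Difference to the closest-valued original pixel in an N x N window.

    For each written pixel, every original pixel in the N x N square centered
    on the same coordinates is a candidate; the smallest absolute difference
    (the difference to the closest-valued candidate) is kept.
    """
    h, w = len(written), len(written[0])
    r = N // 2
    diff = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            diff[y][x] = min(
                abs(written[y][x] - original[ny][nx])
                for ny in range(max(0, y - r), min(h, y + r + 1))
                for nx in range(max(0, x - r), min(w, x + r + 1))
            )
    return diff
```

With N = 3, a printed stroke that is shifted by one pixel between the two images yields a zero difference, while a pen mark with no similar pixel nearby keeps a large difference.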

If this pixel value difference were output as it is, writing made on a colored background could disappear in the subsequent binarization. For example, when writing is made on a dark background, the difference is small and may be lost depending on the binarization threshold. Conversely, if the binarization threshold is set so that even small differences are extracted as writing, stains and discoloration appear as noise. To avoid these problems, this embodiment compares the pixel value difference with a predetermined threshold. When the difference exceeds the threshold, the pixel value of the written image itself is output rather than the difference. In this way, only dark-colored writing is obtained.
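The output rule can be written in a few lines. The source does not say what is output when the difference does not exceed the threshold; the sketch below assumes a white background value of 255 for those pixels. The function and parameter names are hypothetical.

```python
def threshold_output(written, diff, threshold, background=255):
    """Output the written pixel where the difference exceeds the threshold.

    Pixels whose difference is at or below the threshold are set to
    `background` (an assumption; the source leaves this case unspecified).
    """
    return [
        [w if d > threshold else background for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(written, diff)
    ]
```

Because the output carries the written pixel value rather than the difference, a dark pen stroke on a dark background stays dark in the output even though its difference from the background is small, provided that difference exceeds the threshold.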

3.3. Binarization
The third process is binarization. Here, as shown in FIG. 6, the writing (difference image) obtained by the difference and threshold processing is binarized with a predetermined threshold. The binarization threshold is u for an original image obtained with a scanner and x for one obtained directly from the electronic document. In general, u is smaller than x, for the following reason. The written image is captured with a scanner, so when the original image is also captured with a scanner, both have undergone the same image conversion and there is almost no difference outside the written areas. When the original image is converted from the electronic document, on the other hand, differences in hue and the like produce differences even outside the written areas. Removing these differences, which are not extraction targets, requires a larger binarization threshold.

3.4. Writing connection processing
The fourth process is writing connection. In the processing so far, the difference between the written image and the original image was computed and binarized in order to extract the writing. The problem at this point is that parts of the writing have been lost due to noise. The writing is therefore restored by merging connected components using closing, one of the morphological operations.

FIG. 7 shows the connection of writing by closing. Closing first expands the connected components by dilation and then shrinks them by erosion. Because the two steps expand and shrink by the same number of pixels, the area of an isolated connected component, such as salt-and-pepper (dot-like) noise, does not change. For fragmented connected components such as those shown in FIG. 7, on the other hand, dilation joins the components, and they remain joined after being shrunk by erosion. Fragmented writing can thus be reconnected. The number of iterations used in the closing process is denoted h.
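Both effects described above, joining of fragments and unchanged isolated dots, can be checked with a small sketch. It assumes a binary image stored as lists of 0/1 with writing as 1 and a 3 × 3 structuring element; the source specifies neither. The parameter `iterations` corresponds to h.

```python
def _morph(img, pick):
    """Apply `pick` (max for dilation, min for erosion) over 3x3 neighborhoods."""
    h, w = len(img), len(img[0])
    return [
        [
            pick(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(img):
    return _morph(img, max)


def erode(img):
    return _morph(img, min)


def closing(img, iterations=1):
    """Dilate `iterations` times, then erode the same number of times."""
    for _ in range(iterations):
        img = dilate(img)
    for _ in range(iterations):
        img = erode(img)
    return img
```

Two stroke fragments separated by a one-pixel gap come out of `closing` as one connected run, while a single isolated pixel is returned unchanged.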

The region obtained for the writing by the above processing (the transparent region) is somewhat larger than the actual writing. This is because it is used as a mask in the subsequent AND processing. To avoid losing parts of the writing in the AND processing, the transparent region of the mask must be larger than the writing.

3.5. Noise removal
The fifth process is noise removal. Here, the fine noise produced by the difference and binarization processing is removed. Specifically, as shown in FIG. 8, the area of each connected component is examined first, and only connected components whose area is at least a predetermined threshold M are written out. This removes the noise. Even if a written region was finely fragmented at the difference and threshold processing and binarization stages, a region that the subsequent writing connection processing has merged to an area of M or more is not mistakenly removed by this noise removal.
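The area filter can be sketched with a breadth-first search over connected components. The sketch assumes a binary image of 0/1 values and 8-connectivity; the source does not state which connectivity is used. The function name is hypothetical.

```python
from collections import deque


def remove_small_components(img, M):
    """Keep only 8-connected components of 1-pixels whose area is at least M."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        if img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Write the component out only if its area reaches the threshold.
            if len(component) >= M:
                for y, x in component:
                    out[y][x] = 1
    return out
```

Running this after the closing step matters: fragments that closing has merged are counted as one component and so pass the area threshold together.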

3.6. AND processing
The sixth and final process is AND processing. The dilation operation is applied D times to the image obtained so far to create a mask. The writing is extracted by taking the AND of the mask and the written image pixel by pixel. The process is shown in FIG. 9. Because the transparent region of the mask is deliberately large, background is extracted along with the writing. When the result is inspected visually, however, writing and background are easy to tell apart, so this is not much of a problem.
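A minimal sketch of the mask construction and the per-pixel AND follows. It assumes a binary component image of 0/1 values, a grayscale written image, a 3 × 3 dilation, and a white value of 255 for pixels outside the mask; these details are assumptions, as are the function names.

```python
def dilate(mask):
    """Binary dilation with a 3x3 neighborhood."""
    h, w = len(mask), len(mask[0])
    return [
        [
            max(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def extract_writing(written, component_img, D, background=255):
    """Dilate the component image D times, then pass written pixels through it."""
    mask = component_img
    for _ in range(D):
        mask = dilate(mask)
    # Per-pixel AND: keep the written pixel where the mask is set.
    return [
        [p if m else background for p, m in zip(p_row, m_row)]
        for p_row, m_row in zip(written, mask)
    ]
```

A larger D widens the transparent region, which trades some extracted background for a lower risk of clipping the writing.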

4. Experimental examples
Experimental examples of the method of the present invention and their results are described below. In these experiments, an image obtained from a color PDF file was used as the original image, and the written image was obtained by writing on a printout of the PDF file and capturing it with a scanner.
The experiments used two kinds of original image. One is an image obtained by converting the PDF file to raster format (original image A); the other is an image obtained by printing the PDF file and capturing the printout with the same scanner used to acquire the written image (original image B). The former has not undergone the color changes and distortion of printing and scanning, which makes the corresponding point search and the difference processing against the written image relatively difficult. In the latter, the original image and the written image undergo similar changes, so these processes are relatively easy. There are 109 original images of each kind.

The written images were created by writing characters, figures, and the like at several places on the printed documents with black, red, and blue ballpoint pens and capturing them with a scanner at 600 dpi. Three written images were prepared for each original image, giving 327 written images. FIG. 10 shows examples of the two kinds of original image and a written image: FIG. 10(a) is original image A, FIG. 10(b) is original image B, and FIG. 10(c) is the written image.

The computer used in the experiments had a 2.8 GHz AMD Opteron CPU and 16 GB of memory. The experimental parameters are shown in Table 1. Each letter in Table 1 corresponds to the description of the embodiment above. For example, L is the image reduction factor in feature point extraction, and l (lowercase L) is the number of iterations of the erosion operation in feature point extraction.

4-1. Using images from PDF

First, the case where images from PDF files are used as the original images is described. The parameters for this experiment are shown in Table 1. The original images were 109 PDF files converted to images at 600 dpi. Three written images were prepared for each original image, giving 327 written images. Table 2 shows the results of visually evaluating the extracted writing.

Here, "success", "faint or noisy", and "failure" mean that the extracted writing is in the following state.
1) Success: the writing is sufficiently extracted and there is almost no noise (see FIG. 11)
2) Faint or noisy: the writing is partially faint, or there is noticeable noise (see FIG. 12)
3) Failure: the writing is almost completely lost, or noise covers most of the page (see FIG. 13)
Table 2 reports the results separately for images containing many monochromatic regions such as characters and images containing few. This is to clarify how the nature of the input image affects the embodiment, which uses the centroids of single-color connected components as feature points. Whether an image contains many monochromatic regions was judged visually.

FIG. 11(a) shows a written image classified as "success", and FIG. 11(b) shows the writing extracted from the image of FIG. 11(a). FIG. 12(a) shows a written image classified as "faint or noisy", and FIG. 12(b) shows the writing extracted from it. FIG. 13(a) shows a written image classified as "failure", and FIG. 13(b) shows the writing extracted from it.

The experimental results are discussed below. When images converted directly from PDF are used as the original images, the success rate is lower than when scanned images are used. The likely cause is that the original image has not been through printing and scanning, so its hue differs from that of the written image and the corresponding point search fails. When the hues of the original image and the written image differ, color clustering in the feature point extraction gives different results, so corresponding feature points may not be obtained. The association of color clusters in the corresponding point search may also fail. This is thought to be why the failure rate is high when images from PDF are used as the original images.

This problem could be addressed by building a model of the image degradation caused by printing and scanning. If, when the original image comes from PDF, processing that reproduces the same degradation as in the written image could be applied to it, the difference in hue would be reduced and so would failures in the corresponding point search.

Table 2 also shows that higher accuracy was obtained for images containing many monochromatic regions. This is because images with few monochromatic regions are not suited to the method of this embodiment, which uses the centroids of the connected components resulting from color clustering as feature points. That choice suits images that contain many single-color shapes such as characters, as in FIG. 11. For images such as FIG. 13, which contain few characters and consist mostly of gradated graphics and photographs, feature points cannot be extracted stably; that is, different feature points are extracted from the original image and the written image. The corresponding point search then cannot be performed correctly, and the subsequent processing fails. This suggests that higher accuracy can be expected for character-centered monochrome documents, from which feature points are obtained stably.

As for processing time, an A4 document captured at 600 dpi took about 20 seconds per sheet on average, and an A3 document about 40 seconds. Most of the processing time is spent on image processing such as feature point extraction and difference processing, so simple speed-up measures such as lowering the resolution should shorten it.
4-2. Using scanned images

Next, the case where scanned images are used as the original images is described. In this experiment, the original image was obtained by scanning the document without writing, using the same scanner and the same settings as for the written image.
To clarify the problems in applying the method of the present invention, experiments were performed separately on color documents containing many figures and photographs, such as posters and web pages, as shown in FIG. 14(a), and on monochrome documents in which text occupies most of the page, as shown in FIG. 14(b).
4.2.1. Experiment using color documents

As in the experiment using images from PDF, this experiment used 109 original images and 327 written images. The same parameters as in Table 1 were used, except that the maximum connected component area E in feature point extraction was set to ∞ (infinity). E was made this large so that feature points would also be extracted from large connected components.
In this experiment, not only the final writing extraction result but also the corrected written image at the end of the alignment process was evaluated. This is to make clear the respective performance of the alignment process and the difference process. The result of the alignment process was judged visually.

The alignment results are shown in Table 3, and the extraction results in Table 4.

The experimental results are discussed below. First, consider the alignment results shown in Table 3. Overall, alignment succeeded for 91% of the images, and for 99% of those with many monochromatic regions. FIG. 15 is an explanatory diagram showing an example of corresponding points when there are many monochromatic regions. As noted above, when there are many monochromatic regions, many stable feature points can be extracted and many correct corresponding points obtained. FIG. 16, by contrast, is an explanatory diagram showing an example of corresponding points when there are few monochromatic regions; in that case few correct corresponding points are obtained. Alignment also failed for two images with many monochromatic regions. These contain the same arrangement of feature points in more than one place, for example because exactly the same text appears at several locations in the image. FIG. 17 is an explanatory diagram showing an example with the same arrangement of feature points at multiple locations. In such cases, incorrect correspondences arise and alignment fails. Overall, alignment succeeded for 91% even of color documents such as posters, which shows that the method of this embodiment is robust.

Next, consider the writing extraction results shown in Table 4. Writing extraction succeeded for 76% of the images overall, and for 88% of those with many monochromatic regions. For uses such as simple browsing, some faintness or noise is not much of a problem. If such results are counted as well, the success rate is 91% overall and 98% for images with many monochromatic regions. Most of the results classified as failures had already failed at the alignment stage. Results classified as failure or as faint or noisy even though alignment succeeded were either cases where the writing was lost in the difference processing because it was made on a colored background, or cases where small alignment errors produced noise after the difference processing.

4.2.2. Experiment using monochrome documents
The experiment using color documents showed that writing extraction is difficult for images with few monochromatic regions such as characters and for images with writing on a colored background. In this experiment, the method of this embodiment was applied to monochrome documents that contain many characters and are written on over a white background, in order to examine its performance on the kind of target it suits.
The experiment used 34 original images created from PDF files of English-language papers and 34 written images made by writing on them with black, red, and blue ballpoint pens. The number of clusters for color clustering was k = 2, and the other conditions were the same as in section 4.2.1.

The alignment and writing extraction results are shown in Table 5 and Table 6, respectively. As Table 5 shows, alignment succeeded for every image in the monochrome case. This is because the many characters yield many stable feature points, so appropriate transformation parameters can be estimated from many correct corresponding points. As Table 6 shows, the extracted writing was faint for two documents. In these, the red ballpoint writing was pale and was partially lost in the difference processing. Even so, writing extraction succeeded for 94% of the images, and none failed. These results show that the method of the present invention is highly effective for character-centered monochrome documents.

To improve feature point extraction so that stable feature points can also be obtained for targets containing few characters, it should be effective to introduce results from computer vision research such as the Harris operator (see, for example, C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proc. 4th Alvey Vision Conf., pp. 147-151 (1988)). Further speed-ups should also be possible through improvements to the image processing.

Besides the embodiments described above, various modifications of the present invention are possible. Such modifications should not be construed as falling outside the scope of the invention. The invention is intended to include meanings equivalent to the claims and all modifications within their scope.

According to the present invention, writing is extracted together with information on its position, size, and shape within the document. When the original electronic document is available, this information reveals which figures, words, and characters surround the writing. Using this, writing can be indexed; specifically, words located around the writing are attached to it as keywords. Once keywords have been attached, the writing can be searched by keyword. In addition, information processing such as the following becomes possible: (1) extracting which parts of the document interested the person who wrote on it (creating a user profile), and (2) ranking which parts of the document are important (for example, places that many users have underlined or circled are important).
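As a rough illustration of attaching surrounding words to a piece of writing, the sketch below selects the words whose bounding boxes intersect the writing's bounding box after it is expanded by a margin. The source does not define what counts as "around" the writing, so the bounding-box representation, the `margin` parameter, and the function name are all assumptions.

```python
def keywords_for_writing(writing_box, words, margin):
    """Return the words located around a piece of writing.

    Boxes are (left, top, right, bottom) in page coordinates; `words` is a
    list of (text, box) pairs taken from the electronic document.
    """
    left, top, right, bottom = writing_box
    # Expand the writing's box so that nearby words also count.
    left, top = left - margin, top - margin
    right, bottom = right + margin, bottom + margin
    return [
        text
        for text, (w_left, w_top, w_right, w_bottom) in words
        if w_left <= right and w_right >= left
        and w_top <= bottom and w_bottom >= top
    ]
```

The returned words could then be stored with the writing as its keywords and used for keyword search or for counting how often each part of a document is marked.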

FIG. 1 is an explanatory diagram showing the processing flow of the writing extraction method of the present invention.
FIG. 2 is an explanatory diagram showing an example of a target image and the images obtained by color clustering it.
FIG. 3 is an explanatory diagram showing an example in which the original image and the written image are associated feature point by feature point within each color cluster.
FIG. 4 is an explanatory diagram showing an example of an algorithm for estimating similarity transformation parameters from the correspondences between feature points.
FIG. 5 is an explanatory diagram showing the procedure for obtaining the difference between corresponding pixels of the original image and the written image.
FIG. 6 is an explanatory diagram showing how the writing extracted by the difference and threshold processing is binarized.
FIG. 7 is an explanatory diagram showing the procedure for connecting writing by closing.
FIG. 8 is an explanatory diagram showing the procedure of the noise removal processing.
FIG. 9 is an explanatory diagram showing the procedure of the AND processing.
FIG. 10 is an explanatory diagram showing an example of the two kinds of original image and the written image used in the experiments.
FIG. 11 is an explanatory diagram showing a written image classified as "success" in the experiments and the extracted writing.
FIG. 12 is an explanatory diagram showing a written image classified as "faint or noisy" in the experiments and the extracted writing.
FIG. 13 is an explanatory diagram showing a written image classified as "failure" in the experiments and the extracted writing.
FIG. 14 is an explanatory diagram showing examples of original images used in the experiments that differ from those of FIG. 10.
FIG. 15 is an explanatory diagram showing an example of corresponding points when there are many monochromatic regions.
FIG. 16 is an explanatory diagram showing an example of corresponding points when there are few monochromatic regions.
FIG. 17 is an explanatory diagram showing an example with the same arrangement of feature points at multiple locations.

Explanation of symbols

11 Registration processing
13 Difference acquisition processing
15 Target image
17a, 17b, 17c, 17d, 17e Color cluster images

Claims

A method for extracting writing from a written image in which writing is added to a document image, comprising:
a step of acquiring each of the original image before the writing is added and the written image after the writing is added as a set of local regions represented by color components;
a step of aligning the two images by comparing the local regions of the original image with the local regions of the written image;
a step of determining pairs of pixels by taking, for each pixel of one of the two aligned images, the pixel at the corresponding position in the other image or a pixel in the other image at a position within a predetermined range as pair candidates, and selecting the most similar pixel from the pair candidates; and
a step of extracting, as writing, pixels included in the written image and not included in the original image, based on the difference in color components of the paired pixels,
wherein the steps are processed by a computer.

The writing extraction method according to claim 1, wherein the step of extracting pixels is a step of outputting, as writing, the pixel of the written image in a pair when the difference in color components of the pixels in the pair is greater than a predetermined threshold.
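The pairing and threshold steps of the two claims above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: images are assumed to be equal-sized, already aligned grids of RGB tuples, similarity is assumed to be Euclidean distance in RGB, and the names `search_radius` and `threshold` and their default values are hypothetical.

```python
def color_distance_sq(p, q):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def extract_writing(original, written, search_radius=1, threshold=60.0):
    """Return a boolean grid marking the pixels of `written` judged to be writing.

    For each written pixel, the pair candidates are the original pixel at the
    same position and its neighbours within `search_radius`; the closest
    candidate in color becomes the pair. The pixel is writing when even that
    closest pair differs by more than `threshold`.
    """
    height, width = len(written), len(written[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_sq = min(
                color_distance_sq(written[y][x], original[ny][nx])
                for ny in range(max(0, y - search_radius), min(height, y + search_radius + 1))
                for nx in range(max(0, x - search_radius), min(width, x + search_radius + 1))
            )
            mask[y][x] = best_sq ** 0.5 > threshold
    return mask


WHITE, BLACK, RED = (255, 255, 255), (0, 0, 0), (255, 0, 0)

# Original: white page with one printed black pixel.
original = [[WHITE] * 4 for _ in range(4)]
original[1][1] = BLACK

# Written image: the printed pixel is misaligned by one column, plus a red mark.
written = [[WHITE] * 4 for _ in range(4)]
written[1][2] = BLACK
written[3][3] = RED

mask = extract_writing(original, written)
```

Searching a small neighbourhood rather than only the same position is what lets the one-pixel misalignment of the printed content be tolerated: with `search_radius=0` the shifted black pixel would also be reported as writing.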

The writing extraction method according to claim 1, wherein the step of aligning the images includes a process of performing color clustering on the original image and the written image and obtaining correspondence between the original image and the written image for each color clustered image.
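As an illustration of the color clustering in this claim, the sketch below splits an image into one boolean layer per color cluster. The clustering technique is an assumption: plain k-means over RGB values with a deterministic initialization is used here as a stand-in, and the function names are hypothetical. The claim does not specify the clustering algorithm.

```python
def cluster_colors(pixels, k, iterations=10):
    """Plain k-means over RGB tuples; returns (centers, labels).

    Centers start from the first k distinct colors so the result is deterministic.
    """
    distinct = list(dict.fromkeys(pixels))
    centers = [tuple(float(c) for c in color) for color in distinct[:k]]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assign each pixel to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            for p in pixels
        ]
        # Move each center to the mean of its members.
        for i in range(len(centers)):
            members = [p for p, label in zip(pixels, labels) if label == i]
            if members:
                centers[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centers, labels


def cluster_images(image, k):
    """Split a grid of RGB tuples into k boolean layers, one per color cluster."""
    height, width = len(image), len(image[0])
    flat = [p for row in image for p in row]
    _, labels = cluster_colors(flat, k)
    return [
        [[labels[y * width + x] == i for x in range(width)] for y in range(height)]
        for i in range(k)
    ]


WHITE, RED, BLACK, DARK = (255, 255, 255), (255, 0, 0), (0, 0, 0), (10, 10, 10)
image = [
    [WHITE, WHITE, RED],
    [BLACK, DARK, RED],
]
layers = cluster_images(image, k=3)
```

Each layer can then be matched between the original image and the written image separately, as the claim describes.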

The writing extraction method according to claim 1, wherein the step of aligning the images includes a process of extracting feature points from the original image and the written image and obtaining correspondence between the original image and the written image for each extracted feature point.
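Once feature point correspondences are available, alignment parameters can be estimated from them; the drawings refer to an algorithm that estimates similarity transformation parameters from such correspondences. The sketch below is not that algorithm. It is an ordinary closed-form least-squares fit, swapped in for illustration, that assumes all correspondences are correct and represents points as complex numbers; the function name is hypothetical.

```python
import cmath


def estimate_similarity(src_points, dst_points):
    """Least-squares similarity transform mapping src_points onto dst_points.

    Models dst = a * src + b with complex a (scale and rotation) and b
    (translation). Requires at least two distinct source points.
    Returns (scale, angle_in_radians, (tx, ty)).
    """
    src = [complex(x, y) for x, y in src_points]
    dst = [complex(x, y) for x, y in dst_points]
    n = len(src)
    src_mean, dst_mean = sum(src) / n, sum(dst) / n
    numerator = sum((z - src_mean).conjugate() * (w - dst_mean) for z, w in zip(src, dst))
    denominator = sum(abs(z - src_mean) ** 2 for z in src)
    a = numerator / denominator
    b = dst_mean - a * src_mean
    return abs(a), cmath.phase(a), (b.real, b.imag)


# Points scaled by 2, rotated by 90 degrees, then shifted by (3, 4).
src_points = [(0, 0), (1, 0), (0, 1)]
dst_points = [(3, 4), (3, 6), (1, 4)]
scale, angle, shift = estimate_similarity(src_points, dst_points)
```

With real correspondences some matches will be wrong, so a robust scheme (for example, voting or repeated fitting on subsets) would be needed around a fit like this one.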

The writing extraction method according to claim 1, wherein, after the step of extracting pixels as writing, the computer further processes a shaping step of shaping and outputting the extracted writing,
the shaping step including a step of obtaining a mask pattern in which the pixels extracted as writing form a transmissive portion and the region of the transmissive portion is further expanded by a predetermined amount, and
a step of superimposing the obtained mask pattern on the original image and outputting the portion of the original image that overlaps the expanded transmissive portion.
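The mask expansion and superimposition in this claim can be sketched as follows. The square neighbourhood, the `amount` parameter, the white `background`, and the function names are assumptions made for illustration; `image` stands for whichever image the mask pattern is superimposed on.

```python
def dilate(mask, amount=1):
    """Expand the True (transmissive) regions of a boolean grid by `amount` pixels."""
    height, width = len(mask), len(mask[0])
    return [
        [
            any(
                mask[ny][nx]
                for ny in range(max(0, y - amount), min(height, y + amount + 1))
                for nx in range(max(0, x - amount), min(width, x + amount + 1))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def apply_mask(image, mask, background=(255, 255, 255)):
    """Keep image pixels under the transmissive part of the mask; blank the rest."""
    return [
        [pixel if keep else background for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# A single extracted pixel in the middle of a 5x5 grid.
extracted = [[False] * 5 for _ in range(5)]
extracted[2][2] = True
expanded = dilate(extracted, amount=1)

# Hypothetical image whose pixel values encode their own coordinates.
image = [[(y, x, 0) for x in range(5)] for y in range(5)]
shaped = apply_mask(image, expanded)
```

Expanding the mask before superimposing it means that pixels just outside the detected strokes are also output, which favours readable, unbroken writing over a tight trace.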

The writing extraction method according to claim 5, wherein the step of obtaining the mask pattern includes a step of binarizing each pixel of the extracted writing and
a step of performing a connection process on the binarized pixels.

The writing extraction method according to claim 6, wherein the step of obtaining the mask pattern further includes a step of removing regions smaller than a predetermined area from the writing regions generated by the connection process.
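The connection process and the removal of small regions can be sketched as follows, assuming the writing has already been binarized into a boolean grid. The drawings describe connection by closing, so closing (dilation followed by erosion) is used here; the square neighbourhood, 4-connectivity, and the parameter names `radius` and `min_area` are assumptions.

```python
from collections import deque


def _window(mask, radius, combine):
    """Apply `combine` (any or all) over each pixel's square neighbourhood."""
    height, width = len(mask), len(mask[0])
    return [[combine(mask[ny][nx]
                     for ny in range(max(0, y - radius), min(height, y + radius + 1))
                     for nx in range(max(0, x - radius), min(width, x + radius + 1)))
             for x in range(width)] for y in range(height)]


def closing(mask, radius=1):
    """Dilation then erosion; joins strokes separated by small gaps."""
    if radius < 1:
        return [list(row) for row in mask]
    # Pad with False so the image border does not distort the erosion.
    blank = [False] * (len(mask[0]) + 2 * radius)
    side = [False] * radius
    padded = ([blank[:] for _ in range(radius)]
              + [side + list(row) + side for row in mask]
              + [blank[:] for _ in range(radius)])
    closed = _window(_window(padded, radius, any), radius, all)
    return [row[radius:-radius] for row in closed[radius:-radius]]


def remove_small_regions(mask, min_area):
    """Drop 4-connected True regions with fewer than `min_area` pixels."""
    height, width = len(mask), len(mask[0])
    result = [[False] * width for _ in range(height)]
    seen = [[False] * width for _ in range(height)]
    for sy in range(height):
        for sx in range(width):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            region, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                for y, x in region:
                    result[y][x] = True
    return result


T, F = True, False
stroke = [[F, T, T, F, T, T, F, F, F, T]]  # broken stroke plus an isolated speck
joined = closing(stroke, radius=1)
cleaned = remove_small_regions(joined, min_area=2)
```

In this example the one-pixel gap in the stroke is bridged by the closing, and the isolated speck at the right end is then discarded as noise.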

An apparatus for extracting writing from a written image in which writing is added to a document image, comprising:
an image acquisition unit that acquires each of the original image before the writing is added and the written image after the writing is added as a set of local regions represented by color components;
an alignment unit that aligns the two images by comparing the local regions of the original image with the local regions of the written image;
a pixel pair determination unit that, for each pixel of one of the two aligned images, takes the pixel at the corresponding position in the other image or a pixel in the other image at a position within a predetermined range as pair candidates, and determines a pair of pixels by obtaining, from the pair candidates, the pixel whose color components are closest; and
a writing extraction unit that extracts, as writing, pixels included in the written image and not included in the original image, based on the difference between the color components of the paired pixels.

A writing extraction program for executing a process of extracting writing from a written image in which writing is added to a document image, the program causing a computer to function as:
an image acquisition unit that acquires each of the original image before the writing is added and the written image after the writing is added as a set of local regions represented by color components;
an alignment unit that aligns the two images by comparing the local regions of the original image with the local regions of the written image;
a pixel pair determination unit that, for each pixel of one of the two aligned images, takes the pixel at the corresponding position in the other image or a pixel in the other image at a position within a predetermined range as pair candidates, and determines a pair of pixels by obtaining, from the pair candidates, the pixel whose color components are closest; and
a writing extraction unit that extracts, as writing, pixels included in the written image and not included in the original image, based on the difference between the color components of the paired pixels.