JP2011248494A

JP2011248494A - Picture synthesizing apparatus, picture synthesizing method and program for the same

Info

Publication number: JP2011248494A
Application number: JP2010118991A
Authority: JP
Inventors: Takuya Maekawa; 卓也前川; Hitoshi Sakano; 鋭坂野; Shogo Kimura; 昭悟木村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-05-25
Filing date: 2010-05-25
Publication date: 2011-12-08

Abstract

PROBLEM TO BE SOLVED: A camera worn on the hand or elsewhere consecutively captures a plurality of partial images of characters or pictures that a writer is handwriting. Overlapping areas among these images are detected, and the whole picture is reconstructed by superimposing the overlapping areas. The problem is to exclude the elements that obstruct detection of the overlapping areas without requiring any special device or special operation, and to detect the overlapping areas accurately even in images with few features.
SOLUTION: Two kinds of content obstruct detection of overlapping areas among the partial photographic images: images affected by camera shake, and image regions in which the hand or pen happens to appear. Both are detected and removed. In addition, corners of the handwritten strokes (intersections, end points, and bends) are used as feature points.

Description

The present invention relates to techniques for constructing a panoramic image from a moving image, and in particular to techniques for recording images of handwritten documents and figures. It belongs to the family of techniques known as image mosaicing, which obtain a panoramic image by combining a plurality of consecutively captured images.

Recording, digitizing, and recognizing handwritten documents and figures (hereinafter collectively called handwritten documents) are regarded as important research topics from the standpoint of document information processing and the recording of personal information. A wide range of research and development has been carried out, and some of it has been commercialized.

Digitizing handwritten documents with an image scanner, tablet, mouse, or similar device is already routine. There is also a product that digitizes handwritten documents by combining a camera mounted at the pen tip with paper printed with a special pattern (Non-Patent Document 1).

Handwriting on whiteboards and blackboards matters as much as handwriting on paper. It has therefore also been proposed to record such handwriting by removing the people who appear in the images of a camera installed to capture the whiteboard or blackboard (Non-Patent Documents 2 and 3).

There is also an example in which handwritten documents are recorded and recognized by giving a whiteboard tablet functionality (Non-Patent Document 4).

Meanwhile, many studies have automatically recorded a wearer's daily life with small sensors attached to the body and used the recordings as a life log (Non-Patent Documents 5 and 6). These studies envision a society in which wearing one or more sensors on the body to enrich one's life is routine, and such sensors are indeed likely to become commonplace in the future.

The present invention assumes such a future, one in which acquiring and recording information with cameras or sensors worn on the body, as shown in FIG. 1, is nothing out of the ordinary.

However, when a camera is worn on the body as shown in FIG. 1 to obtain a full view of a handwritten document while it is being written, each camera image captures only a small part of that document. This problem arises generally whenever a single shot covers only part of the whole scene, as in making maps from aerial photographs or taking landscape photographs. To solve it, a technique called image mosaicing, which combines partial images into a panoramic image, has been studied and is coming into practical use. Mosaicing generally exploits the similarity between partial images: it detects similar small regions (overlapping regions) and superimposes them to obtain a panoramic image. In recent years, the common way to evaluate this similarity has been to extract multiple intensity-gradient-based local features such as SIFT (Scale-Invariant Feature Transform; Non-Patent Document 7) and SURF (Speeded-Up Robust Features; Non-Patent Document 8). The images are then superimposed based on the similarity of the points at which these features occur (hereinafter, feature points).

[Non-Patent Document 1] Anoto, "Anoto", [online], [retrieved April 26, 2010], Internet <URL: http://www.anoto.com/>
[Non-Patent Document 2] L. He and Z. Zhang, "Real-Time Whiteboard Capture and Processing Using a Video Camera for Remote Collaboration", IEEE Transactions on Multimedia, 2007, Vol.9, No.1, p.198-206
[Non-Patent Document 3] P.E. Dickson, W.R. Adrion, and A.R. Hanson, "Whiteboard Content Extraction and Analysis for the Classroom Environment", IEEE International Symposium on Multimedia, 2008, p.702-707
[Non-Patent Document 4] Marcus Liwicki and Horst Bunke, "Recognition of Whiteboard Notes - Online, Offline and Combination", World Scientific Publishing, 2008, p.1-11
[Non-Patent Document 5] L. Bao and S.S. Intille, "Activity Recognition from User-Annotated Acceleration Data", Proceedings of PERVASIVE 2004, 2004, p.1-17
[Non-Patent Document 6] T. Maekawa, Y. Yanagisawa, Y. Kishino, K. Kamei, Y. Sakurai, and T. Okadome, "Wristband Type Sensor Device for Recognizing Activities that Involve Object Use", Proceedings of European Conference on Ambient Intelligence, 2009, Poster paper
[Non-Patent Document 7] M. Brown and D.G. Lowe, "Recognizing Panoramas", International Conference on Computer Vision - Volume 2, 2003, p.1218-1225
[Non-Patent Document 8] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features", European Conference on Computer Vision, 2006, p.404-417

The techniques of Non-Patent Documents 1 to 4 require specially prepared equipment such as pens, paper, tablets, or scanners, which is costly. Users must also learn how to operate these special devices.

Techniques such as those of Non-Patent Documents 5 and 6 make it possible to record a handwritten document automatically with a camera and sensor worn on the wrist. However, a camera small enough to wear on the wrist has a narrow field of view and cannot capture the entire handwritten document being drawn. A panoramic image must therefore be obtained by integrating the partial images of the wide area the camera covers as it moves. Moreover, when the camera is worn on the wrist or a similar location, the writing hand, the pen, and so on appear between the paper and the camera. Writing motion is also not always steady. It can be faster than the video performance of a wrist-worn camera can follow, and suitable partial images are then difficult to capture. Furthermore, simple hand-drawn figures on white paper offer far fewer usable intensity gradients than complex scenes such as natural images. As a result, the currently mainstream feature extraction schemes such as SIFT (Non-Patent Document 7) and SURF (Non-Patent Document 8) have difficulty obtaining stable feature points, and a suitable composite image cannot be obtained. In particular, it is difficult to find similar feature points between stroke images captured from distant positions.

Consider the following setting: a camera worn on the hand or elsewhere consecutively captures a plurality of partial images of characters or pictures that a writer is handwriting, overlapping areas among them are detected, and the whole image is reconstructed by superimposing those areas. An object of the present invention is to provide an image composition apparatus, an image composition method, and a program therefor that, in this setting, eliminate the elements that hinder detection of the overlapping areas without requiring any special device or special operation, and detect the overlapping areas accurately even in images with few features.

A first aspect of the present invention is an image composition apparatus. It synthesizes a large connected image from a plurality of images that a camera, attached to a predetermined part on the wearer's writing side, captures consecutively as that part moves. The apparatus comprises signal input means for detecting the intensity and position of the movement of the predetermined part.

A second aspect of the present invention is the image composition apparatus according to the first aspect, wherein the signal input means is an acceleration sensor.

A third aspect of the present invention is the image composition apparatus according to the first or second aspect, further comprising foreground removal means. When the background of a captured image is a sheet of paper or a whiteboard, this means detects the color of the paper or whiteboard and removes pixels of a different color, treating them as objects that were accidentally captured in the image.

A fourth aspect of the present invention is the image composition apparatus according to the third aspect, further comprising feature extraction means and image composition means. The feature extraction means extracts feature points that serve as cues for composing images: in addition to intensity-gradient-based local features, it extracts the line intersections, bend points, and end points characteristic of black-and-white binary figures. The image composition means composes images by using the feature points to detect the area where two images overlap.

A fifth aspect of the present invention is an image composition method that executes the following steps:
- an image input step, in which a camera attached to a predetermined part on the wearer's writing side captures images consecutively as that part moves;
- a signal input step, in which the intensity and position of the movement of the predetermined part are detected;
- an image selection step, in which images of the desired quality are selected from those captured in the image input step, based on the information detected in the signal input step;
- a foreground removal step, in which, when the background of an image selected in the image selection step is a sheet of paper or a whiteboard, the color of the paper or whiteboard is detected and pixels of a different color are removed as objects accidentally captured in the image;
- a feature extraction step, in which feature points are extracted from each image obtained in the foreground removal step;
- an image composition step, in which the feature points are used to detect the area where two images overlap, and the images are composed so that those areas coincide.

A sixth aspect of the present invention is the image composition method according to the fifth aspect, wherein the feature points extracted in the feature extraction step include intensity-gradient-based local features and, in addition, the line intersections, bend points, and end points characteristic of black-and-white binary figures.

A seventh aspect of the present invention is a program for causing a computer to function as the image composition apparatus according to any one of the first to fourth aspects.

The image composition apparatus, image composition method, and program of the present invention detect and remove two kinds of content that hinder detection of overlapping areas among the captured partial images: images affected by camera shake, and the parts of images in which the hand or pen appears. In addition, they use corners of handwritten line segments (intersections, end points, and bend points) as feature points, so overlapping areas can be detected accurately even in images with few features. The whole image can therefore be reconstructed with high accuracy without wearing a special device or performing a special operation.

FIG. 1 is a diagram illustrating a camera and a sensor worn on the body.
FIG. 2 is a block diagram showing a configuration example of the image construction apparatus 100.
FIG. 3 is a diagram showing an example of the processing flow of the image construction apparatus 100.
FIG. 4 is a diagram illustrating the consecutive capture of handwritten strokes.
FIG. 5 is a diagram illustrating the reconstruction of the whole image by superimposing the overlapping areas of partial images.
FIG. 6 is a diagram showing examples of corners (intersections, end points, bend points, etc.) extracted from images as features.

FIG. 2 is a block diagram showing a configuration example of the image construction apparatus 100 of the present invention, and FIG. 3 shows an example of its processing flow. The image construction apparatus 100 comprises an image input unit 21, a signal input unit 22, an image selection unit 23, a foreground removal unit 24, a feature extraction unit 25, an image composition unit 26, and an image output unit 27.

The image input unit 21 is a camera worn at a predetermined, natural location such as the writer's hand, wrist, or forearm. It consecutively captures the strokes that the writer draws by hand on paper or another surface, as shown for example in FIG. 4 (S1). When worn on the wrist, for example, the camera can photograph the paper wherever the paper is, and because it shoots from close to the paper it can capture fine patterns and characters. The amount of captured image data can be reduced by converting it into histograms or a similar representation. The signal input unit 22 is, for example, an acceleration sensor worn at approximately the same position as the image input unit 21. It detects the intensity and position of the movement of the body part on which the camera is worn (S2). Instead of an acceleration sensor, any other device capable of detecting motion intensity may be used, such as a position sensor, a three-dimensional mouse, a device with an infrared receiver, or a Doppler radar. The amount of waveform data such as acceleration data can be reduced by applying a fast Fourier transform and keeping only the Fourier coefficients.
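As a minimal sketch of the Fourier-based data reduction mentioned above, the code below keeps only a few coefficients of the real FFT of one acceleration window. The text does not say which coefficients are kept. Keeping the lowest-frequency ones, along with the function names `compress_waveform` and `reconstruct_waveform`, is an assumption made for illustration.

```python
# Illustrative sketch; function names and the choice of low-frequency
# coefficients are hypothetical, not taken from the patent.
import numpy as np


def compress_waveform(signal, n_coeffs):
    """Keep only the first n_coeffs complex coefficients of the real FFT of a 1-D signal.

    Assumption: the low-frequency coefficients carry most of the motion information.
    """
    spectrum = np.fft.rfft(np.asarray(signal, dtype=float))
    return spectrum[:n_coeffs]


def reconstruct_waveform(coeffs, length):
    """Approximately rebuild a signal of `length` samples, treating dropped coefficients as zero."""
    spectrum = np.zeros(length // 2 + 1, dtype=complex)
    spectrum[: len(coeffs)] = coeffs
    return np.fft.irfft(spectrum, n=length)
```

For example, a 64-sample window that oscillates twice per window can be stored as 8 complex coefficients and rebuilt almost exactly. Windows with high-frequency content, such as sharp jolts, would lose detail under this truncation, so `n_coeffs` trades storage against fidelity.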

Conventional input devices such as tablets and scanners are special equipment for electronically recording written figures, whereas cameras and sensors like these also serve other purposes naturally. Wearing one or more cameras or sensors on the body is expected to become unremarkable in the future, for recording not only writing but daily life in general. The present invention is especially well suited to such a future.

The image selection unit 23 determines whether each image input from the image input unit 21 is suitable for composition, that is, free of blur and similar defects, and outputs the images judged suitable (S3). Specifically, it first uses the motion information detected by the signal input unit 22 to identify blurred images and discard them. When the signal input unit 22 is an acceleration sensor, the periods during which the hand is moving are detected from how far the acceleration deviates from its moving average. Images captured while this deviation exceeds an arbitrarily set threshold are discarded as blurred. The moving average is computed from the acceleration data series. Extracting blurred images with an acceleration sensor is a technique already used in digital cameras and similar devices (e.g., Reference 1). After good-quality images have been roughly selected in this way from the motion information of the part wearing the camera, image processing further detects and discards blurred images. Many existing methods detect blur by image processing (e.g., Reference 2). The image selection unit 23 outputs the images selected as described above. In the present invention, the feature extraction unit 25 extracts a plurality of feature points from each image and uses them for composition. Images with few feature points are often blurred or capture no strokes, so the image selection unit 23 may discard such images in advance.
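The acceleration-based rejection rule above can be sketched as follows. The code compares each acceleration-magnitude sample with a trailing moving average and keeps only frames whose nearest sensor sample stays within a threshold. Several details are assumptions: the trailing window, the nearest-sample matching of frames to sensor readings, a shared clock for both streams, the default `window` and `threshold` values, and the names `trailing_moving_average` and `select_sharp_frames`. The image-based blur check of Reference 2 is not shown.

```python
# Illustrative sketch; function names, the trailing window, and default
# parameters are hypothetical choices, not specified by the patent.
import numpy as np


def trailing_moving_average(x, window):
    """Mean of the last `window` samples at each index (shorter window at the start)."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out = np.empty_like(x)
    for i in range(len(x)):
        start = max(0, i - window + 1)
        out[i] = (csum[i + 1] - csum[start]) / (i + 1 - start)
    return out


def select_sharp_frames(accel_t, accel_mag, frame_t, window=5, threshold=0.5):
    """Return indices of frames captured while acceleration stays near its moving average.

    accel_t, accel_mag: sensor timestamps and acceleration magnitudes.
    frame_t: camera frame timestamps, assumed to share the sensor's clock.
    A frame is rejected as blurred when |a - moving_average(a)| at the nearest
    sensor sample exceeds `threshold`.
    """
    accel_t = np.asarray(accel_t, dtype=float)
    mag = np.asarray(accel_mag, dtype=float)
    deviation = np.abs(mag - trailing_moving_average(mag, window))
    keep = []
    for idx, t in enumerate(frame_t):
        nearest = int(np.argmin(np.abs(accel_t - t)))  # sensor sample closest in time
        if deviation[nearest] <= threshold:
            keep.append(idx)
    return keep
```

With a trailing window, one sharp jolt also raises the deviation of the next few samples until it leaves the window. This conservatively discards frames taken just after the motion as well.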

[Reference 1] U.S. Patent No. 4,673,276
[Reference 2] H. Tong, M. Li, H. Zhang and C. Zhang, "Blur Detection for Digital Images Using Wavelet Transform", Proceedings of IEEE International Conference on Multimedia and Expo, 2004, p.17-20

The foreground removal unit 24 removes the images of the hand, pen, and similar objects that appear in front of the paper in each image selected by the image selection unit 23 (S4). Specifically, it discards foreground images (regions) whose color differs greatly from that of the background paper, and foreground images whose color is close to skin color. More specifically, each image is divided into a grid of small cells, and each cell is classified as foreground or not according to how similar its color is to these reference colors. The similarity is measured, for example, as the Euclidean distance in HSV color space, and the classification uses an arbitrarily set threshold. The paper color can be detected, for example, by finding the most frequent color in the first image of the consecutive image series acquired by the image input unit 21. The skin color is set in advance.
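A minimal sketch of the grid-based foreground removal described above follows. Taking each cell's mean color, the 8-pixel cell size, the two thresholds, and painting removed cells with the paper color are all assumptions. So are the names `rgb_to_hsv`, `estimate_paper_color`, and `remove_foreground`. The HSV distance is plain Euclidean, as in the text, and therefore ignores the fact that hue wraps around.

```python
# Illustrative sketch; function names, cell size, and thresholds are hypothetical.
import colorsys

import numpy as np


def rgb_to_hsv(rgb):
    """Convert an (R, G, B) triple in 0-255 to an HSV vector with components in [0, 1]."""
    r, g, b = (float(c) / 255.0 for c in rgb)
    return np.array(colorsys.rgb_to_hsv(r, g, b))


def estimate_paper_color(first_frame):
    """Most frequent RGB color in the first frame of the series, taken as the paper color."""
    colors, counts = np.unique(first_frame.reshape(-1, 3), axis=0, return_counts=True)
    return colors[np.argmax(counts)]


def remove_foreground(frame, paper_rgb, skin_rgb, cell=8, paper_thresh=0.3, skin_thresh=0.15):
    """Blank out grid cells whose mean color is far from the paper color or close to skin.

    Returns the cleaned frame and a boolean (rows, cols) mask of removed cells.
    """
    paper_hsv, skin_hsv = rgb_to_hsv(paper_rgb), rgb_to_hsv(skin_rgb)
    h, w, _ = frame.shape
    rows, cols = -(-h // cell), -(-w // cell)  # ceiling division covers partial edge cells
    mask = np.zeros((rows, cols), dtype=bool)
    cleaned = frame.copy()
    for i in range(rows):
        for j in range(cols):
            ys, xs = slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell)
            mean_hsv = rgb_to_hsv(frame[ys, xs].reshape(-1, 3).mean(axis=0))
            far_from_paper = np.linalg.norm(mean_hsv - paper_hsv) > paper_thresh
            near_skin = np.linalg.norm(mean_hsv - skin_hsv) < skin_thresh
            if far_from_paper or near_skin:
                mask[i, j] = True
                cleaned[ys, xs] = paper_rgb  # treat the cell as empty paper
    return cleaned, mask
```

Averaging over a cell is what lets thin ink strokes survive. A one-pixel dark line shifts an 8x8 cell's mean only slightly away from the paper color, whereas a cell filled by a finger or pen body moves far from it.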

The feature extraction unit 25 extracts, from each image whose foreground has been removed by the foreground removal unit 24, the feature points used to establish correspondences between images (S5). These features include intensity-gradient-based local features such as SURF and SIFT. They also include the corners of line segments (intersections, end points, bend points, etc.), which are especially common in black-and-white binary figures such as handwritten documents. Intersections, end points, bend points, and the like need only be extracted from the small cells other than those that the foreground removal unit 24 judged to be foreground.
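The text does not say how stroke corners are computed. The sketch below therefore substitutes a standard technique, the crossing number used in fingerprint-minutiae extraction, to find end points and intersections on a one-pixel-wide stroke skeleton. It assumes that the strokes have already been binarized and thinned upstream, for example with scikit-image's `skimage.morphology.skeletonize`. Bend points are not detected here. The name `stroke_corners` and the optional `valid` mask, which skips pixels in foreground cells, are hypothetical.

```python
# Illustrative sketch: crossing-number corner detection, a swapped-in technique;
# the function name and `valid` parameter are hypothetical.
import numpy as np

# 8-neighbour offsets in circular order: N, NE, E, SE, S, SW, W, NW
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def stroke_corners(skeleton, valid=None):
    """Find end points and intersections on a 1-pixel-wide binary stroke skeleton.

    Crossing number CN = (1/2) * sum |P_k - P_(k+1)| around the 8-neighbour ring:
    CN == 1 -> end point, CN >= 3 -> intersection. Pixels where the optional
    boolean mask `valid` is False (e.g. foreground cells) are skipped.
    """
    sk = np.pad((np.asarray(skeleton) > 0).astype(np.uint8), 1)  # zero border
    h, w = sk.shape[0] - 2, sk.shape[1] - 2
    endpoints, junctions = [], []
    for r in range(h):
        for c in range(w):
            if not sk[r + 1, c + 1] or (valid is not None and not valid[r, c]):
                continue
            ring = [int(sk[r + 1 + dr, c + 1 + dc]) for dr, dc in RING]
            cn = sum(abs(ring[k] - ring[(k + 1) % 8]) for k in range(8)) // 2
            if cn == 1:
                endpoints.append((r, c))
            elif cn >= 3:
                junctions.append((r, c))
    return endpoints, junctions
```

On a "+" shaped skeleton, for instance, the four arm tips are reported as end points and the center as the single intersection. Pixels beside the center are not reported, because their crossing number is 2 even though they have more than two neighbors.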

The image composition unit 26 finds where images overlap by determining which of the features extracted by the feature extraction unit 25 match between images, and it composes the images so that the overlapping areas coincide (S6). Specifically, it computes the projective transformation between images. For images 1 and 2 that overlap as shown in FIG. 5, it finds the transformation parameters (a homography matrix) that, when applied to image 2, make its overlapping area coincide with the corresponding area of image 1. This transformation produces the reconstructed (composite) image shown on the right of FIG. 5. A more detailed description follows.

Each image is treated as a node of a tree, and the projective transformation between two images as an edge. A reconstructed image that uses all the acquired images requires a spanning tree over them. Once a spanning tree exists, the projective transformation between any two images can be obtained by applying, in turn, the transformations (edges) along the path between those images (nodes). To build the spanning tree, the image composition unit 26 therefore first compares images using feature points, which is fast. For the parts of the tree that feature-point comparison alone cannot build, overlapping areas are then found using corners extracted from the images. As noted above, feature-point comparison has difficulty finding similar feature points between stroke images captured from distant positions, and hence has difficulty finding where such images overlap. Corners extracted from the images, however, make it possible to find overlapping areas even between stroke images captured from distant positions. The comparison method using feature points and the method of finding overlapping areas using corners are described below.
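The spanning-tree idea above can be sketched as a breadth-first traversal. The traversal chains pairwise homographies so that every reachable image gets a transform into a common reference image. Several details are assumptions: BFS itself, using image 0 as the reference, the convention that `pairwise[(i, j)]` maps image *j* into image *i*, and the name `build_spanning_tree`. The returned `unreached` list marks the images that would still need the corner-based matching described in the text.

```python
# Illustrative sketch; the function name, BFS strategy, and the direction
# convention of `pairwise` are hypothetical.
from collections import deque

import numpy as np


def build_spanning_tree(n_images, pairwise, root=0):
    """Chain pairwise homographies into transforms that map every image into the root's frame.

    pairwise[(i, j)] is a 3x3 homography mapping pixel coordinates of image j into image i.
    Returns (to_root, parent, unreached): to_root[k] maps image k into the root image,
    parent[k] is k's tree parent, and unreached lists images with no estimated path.
    """
    adjacency = {k: [] for k in range(n_images)}
    for (i, j), H in pairwise.items():
        H = np.asarray(H, dtype=float)
        adjacency[i].append((j, H))                 # maps neighbour j -> node i
        adjacency[j].append((i, np.linalg.inv(H)))  # maps neighbour i -> node j
    to_root = {root: np.eye(3)}
    parent = {root: None}
    queue = deque([root])
    while queue:  # each newly reached image becomes a tree node
        i = queue.popleft()
        for j, H_to_current in adjacency[i]:
            if j not in to_root:
                to_root[j] = to_root[i] @ H_to_current  # j -> i -> root
                parent[j] = i
                queue.append(j)
    unreached = [k for k in range(n_images) if k not in to_root]
    return to_root, parent, unreached
```

For example, suppose image 1 is offset by +10 px in x relative to image 0, and image 2 is offset by -5 px in y relative to image 1. Image 2 is then placed via image 1 with the composed offset (+10, -5). An image with no estimated edge ends up in `unreached`.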

<Comparison method using feature points>
The comparison uses a fast feature extraction method such as SURF. The feature extraction and description method should preferably be invariant, to some degree, to rotation and to changes in brightness, as SURF is. A projective transformation is then computed from pairs of similar feature points obtained from the two images. Because the resulting feature-point pairs contain many errors, robust estimation is used to derive the projective transformation (see References 3 and 4).
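To make the robust-estimation step concrete, here is a minimal RANSAC (Reference 3) sketch. It fits a homography to putative feature-point matches with the direct linear transform (DLT, described in Reference 4). Feature matching itself is not shown. The unnormalized DLT (Hartley's coordinate normalization is omitted), the iteration count, the 2-pixel inlier tolerance, and the names `homography_dlt`, `project`, and `ransac_homography` are assumptions for illustration. In practice, OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag provides the same kind of estimate.

```python
# Illustrative sketch of RANSAC + DLT; function names and defaults are hypothetical.
import numpy as np


def homography_dlt(src, dst):
    """Least-squares homography (scaled so H[2, 2] == 1) from >= 4 point pairs via DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # right singular vector for the smallest singular value
    return H / H[2, 2]


def project(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def ransac_homography(src, dst, n_iters=500, inlier_tol=2.0, seed=0):
    """Robustly fit H to putative matches src[k] <-> dst[k]; returns (H or None, inlier mask)."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):  # degenerate samples give inf/nan
        for _ in range(n_iters):
            sample = rng.choice(len(src), size=4, replace=False)
            H = homography_dlt(src[sample], dst[sample])
            if not np.all(np.isfinite(H)):
                continue
            inliers = np.linalg.norm(project(H, src) - dst, axis=1) < inlier_tol
            if inliers.sum() > best.sum():
                best = inliers
    if best.sum() < 4:
        return None, best
    return homography_dlt(src[best], dst[best]), best  # refit on all inliers
```

The final refit on all inliers is a common RANSAC refinement. The minimal 4-point hypotheses only serve to identify which matches are trustworthy.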

[Reference 3] M.A. Fischler and R.C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography", Communications of the ACM, 1981, Vol.24, p.381-395
[Reference 4] R. Hartley and A. Zisserman, "Multiple view geometry in computer vision", Cambridge University Press, 2003

<Method of finding overlapping areas using corners>
Taking the two images (a) and (b) in FIG. 6 as an example, the parts marked with circles are corners. Correspondences between corners in the two images are found, and the transformation parameters (homography matrix) are computed from them. For example, if four pairs of corresponding corners (p1, p1'), (p2, p2'), (p3, p3'), (p4, p4') are found between images (a) and (b), the homography matrix can be computed by solving a system of simultaneous equations (see References 5 and 6).
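One standard way to set up the simultaneous equations for four corner pairs is to fix h33 = 1 and solve the resulting 8x8 linear system for the remaining eight entries. The sketch below does this. It assumes that the four corner correspondences have already been found, which is not shown, and that no three of the corners in either image are collinear. The name `homography_from_4_pairs` is hypothetical. Fixing h33 = 1 also fails in the rare case where the true h33 is zero.

```python
# Illustrative sketch; the function name is hypothetical.
import numpy as np


def homography_from_4_pairs(src, dst):
    """Solve the 8x8 linear system for H (with h33 fixed to 1) from exactly four pairs.

    Each pair (x, y) -> (u, v) contributes two equations:
      h11 x + h12 y + h13 - u (h31 x + h32 y) = u
      h21 x + h22 y + h23 - v (h31 x + h32 y) = v
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)
```

For instance, mapping the unit-square corners (0, 0), (1, 0), (1, 1), (0, 1) to (10, 20), (30, 20), (30, 40), (10, 40) yields a pure scale by 20 followed by a translation of (10, 20).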

[Reference 5] J. Shi and C. Tomasi, "Good features to track", IEEE Conference on Computer Vision and Pattern Recognition, 1994, p.593-600
[Reference 6] C.E. Springer, "Geometry and analysis of projective spaces", Freeman, 1964

Finally, the image composed by the image composition unit 26 is output from the image output unit 27 (S7).

As described above, the present invention applies to the following setting: a camera worn on the hand or elsewhere consecutively captures a plurality of partial images of characters or pictures that a writer is handwriting, their overlapping areas are detected, and the whole image is reconstructed by superimposing those areas. Two kinds of content hinder detection of the overlapping areas: images affected by camera shake, and the parts of images in which the hand or pen appears. The invention detects and removes both. It also uses corners of handwritten line segments (intersections, end points, and bend points) as feature points, so overlapping areas can be detected accurately even in images with few features. The whole image can therefore be reconstructed with high accuracy without wearing a special device or performing a special operation.

The processes described above need not be executed in the described time sequence. They may also be executed in parallel or individually, as needed or according to the processing capability of the apparatus that executes them. Other modifications may be made as appropriate without departing from the spirit of the present invention.

When each of the above apparatuses is implemented on a computer, the processing performed by each function of the apparatus is described by a program. The program is stored, for example, on a hard disk drive; at execution time, the necessary program and data are loaded into RAM (Random Access Memory) and the loaded program is executed by the CPU. Each process is thereby realized on the computer. At least part of the processing may instead be implemented in hardware.

Claims

An image synthesizing apparatus that synthesizes one large connected image from a plurality of images captured continuously, following the movement of a predetermined body part on the wearer's writing side, by a camera attached to that part,
the image synthesizing apparatus comprising signal input means for detecting the intensity and position of the movement of the predetermined part.

The image synthesizing apparatus according to claim 1,
wherein the signal input means is an acceleration sensor.

The image synthesizing apparatus according to claim 1 or 2,
further comprising foreground removal means that, when the background of a captured image is a paper surface or a whiteboard, detects the color of the paper surface or whiteboard, regards pixels of a different color as an intruding object captured in the image, and removes them.

The image synthesizing apparatus according to claim 3, further comprising:
feature extraction means for extracting, as feature points that serve as clues for synthesizing images, line intersections, bend points, and end points characteristic of black-and-white binary figures, in addition to luminance-gradient-based local features; and
image synthesizing means for synthesizing images by detecting an overlapping area between two images using the feature points.

An image synthesizing method that executes:
an image input step of continuously capturing images, following the movement of a predetermined body part on the wearer's writing side, with a camera attached to that part;
a signal input step of detecting the intensity and position of the movement of the predetermined part;
an image selection step of selecting images of a desired image quality from the plurality of images captured in the image input step, based on the information detected in the signal input step;
a foreground removal step of, when the background of each image selected in the image selection step is a paper surface or a whiteboard, detecting the color of the paper surface or whiteboard, regarding pixels of a different color as an intruding object captured in the image, and removing them;
a feature extraction step of extracting feature points from each image obtained in the foreground removal step; and
an image synthesizing step of detecting an overlapping area between two images using the feature points and synthesizing the images so that the overlapping areas coincide.

The image synthesizing method according to claim 5,
wherein the feature points extracted in the feature extraction step include, in addition to luminance-gradient-based local features, line intersections, bend points, and end points characteristic of black-and-white binary figures.

A program for causing a computer to function as the image synthesizing apparatus according to claim 1.
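The image selection and foreground removal steps recited in the claims above can be sketched as follows. The claims leave both criteria open, so everything here is an assumption for illustration: `select_frames`, `remove_foreground`, `max_motion`, `bg_tol`, and `ink_max` are hypothetical names; the per-frame motion intensity is taken as a precomputed number (for example, an acceleration magnitude from the sensor of claim 2); the paper or whiteboard color is assumed to be the most frequent pixel value; and dark pixels are assumed to be pen strokes and are kept, so that the handwriting itself is not removed along with the hand.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]


def select_frames(frames: Sequence[Image], motion: Sequence[float],
                  max_motion: float) -> List[Image]:
    """Hypothetical image selection: keep frames whose motion intensity does not
    exceed max_motion, assuming strong motion implies camera shake."""
    return [f for f, m in zip(frames, motion) if m <= max_motion]


def remove_foreground(img: Image, bg_tol: int = 40,
                      ink_max: int = 80) -> List[List[Optional[RGB]]]:
    """Hypothetical foreground removal: pixels that differ from the background
    colour (and are not dark ink) are treated as an intruding object and set to None."""
    # Assumption: the paper/whiteboard colour is the most frequent pixel value.
    bg = Counter(p for row in img for p in row).most_common(1)[0][0]
    out: List[List[Optional[RGB]]] = []
    for row in img:
        new_row: List[Optional[RGB]] = []
        for p in row:
            near_bg = max(abs(x - y) for x, y in zip(p, bg)) <= bg_tol
            is_ink = max(p) <= ink_max  # assumption: dark pixels are pen strokes
            new_row.append(p if near_bg or is_ink else None)
        out.append(new_row)
    return out


W, K, S = (250, 250, 245), (20, 20, 20), (210, 160, 130)  # paper, ink, skin
frame = [[W, W, K], [W, S, S], [W, W, W]]
kept = select_frames([frame, frame], motion=[0.2, 3.5], max_motion=1.0)
print(len(kept), remove_foreground(kept[0])[1])
# → 1 [(250, 250, 245), None, None]
```

A fixed threshold on motion intensity is the simplest possible selection rule; a real system might instead compare motion against a running average or use the detected position of the movement as well, which the claims also mention.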