JP2002095011A

JP2002095011A - Method and device for generating pseudo-three dimensional image

Info

Publication number: JP2002095011A
Application number: JP2000275813A
Authority: JP
Inventors: Yoshihisa Shinagawa; 嘉久品川; Hiroki Nagashima; 浩樹永嶋
Original assignee: Monolith Co Ltd
Current assignee: Monolith Co Ltd
Priority date: 2000-09-12
Filing date: 2000-09-12
Publication date: 2002-03-29

Abstract

PROBLEM TO BE SOLVED: To provide a pseudo-three-dimensional image generating method that presents stereoscopic images using encoding and decoding technology that compresses image data efficiently. SOLUTION: Matching is computed between two or more original images of an object. The images are photographed from two or more directions, and each contains the viewpoint on the object. The images are interpolated based on the matching result to generate a composite image viewed from a set line-of-sight direction, and the data of the composite image is presented. The displayed image is therefore natural, without unnatural partitions or steps.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to image data processing technology. More particularly, it relates to a method and apparatus for generating a pseudo-three-dimensional image, which present an object three-dimensionally by synthesizing images viewed from directions different from the shooting direction.

[0002]

2. Description of the Related Art
Advances in imaging technology have made it possible to present realistic images without moving a camera to follow a change in the line of sight. Instead, many cameras capture an object from various directions. During post-shooting editing, the video that would be seen as the field of view moves is synthesized from those shots. In other systems, image data is distributed online, for example over the Internet, and processed for display as a movie, a moving image, or a 3D stereoscopic image. Some systems also let a panoramic or sequential photograph be viewed from whatever viewpoint the viewer specifies. In online commerce in particular, advertising with images has become commonplace. Being able to observe a product from any direction is convenient, so such techniques are used there. In online shopping, merely displaying an image three-dimensionally is known to increase the desire to purchase several-fold.

[0003] The methods described above use multiple images to follow changes in the field of view in one of two ways. Some prepare many images with overlapping fields of view and switch the displayed image as the line of sight moves. Others stitch adjacent images together without misalignment and crop the displayed region as the field of view moves. Adjacent captured images are offset from one another. For the view to move naturally from one image to the next without the viewer noticing the seam, either the dimensional difference in the overlapping portion must be made sufficiently small, or the sizes must be aligned precisely using the region where the two images overlap.

[0004] The same problem arises when image data is distributed online, over the Internet or the like, and displayed on a receiving device as a moving image or 3D stereoscopic image. The larger the difference between adjacent images, the harder it is to generate a natural image that follows the moving line of sight. An extremely large number of images must therefore be prepared and used, each taken at only a slightly different angle from its neighbor. Stereoscopic display thus requires a very large amount of image data. This is impractical over the Internet and other channels with limited transmission capacity. Moreover, adjusting the overlap between two images requires specifying accurately how image positions in the two frames correspond, which is difficult to do automatically.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION
The present invention has been made in view of these circumstances. Its object is to provide a pseudo-three-dimensional image generation method that presents stereoscopic images using encoding and decoding technology that compresses image data efficiently.

[0006]

MEANS FOR SOLVING THE PROBLEMS
The pseudo-three-dimensional image generation method of the present invention comprises the following steps:
(a) computing matching between two or more original images that are photographed from two or more directions and each contain, within the image, the viewpoint on the object;
(b) generating, by interpolation based on the matching result, a composite image viewed from a set line-of-sight direction; and
(c) presenting the data of the composite image.

[0007] With this method, a composite image viewed from a line-of-sight direction at an intermediate point is generated by interpolation, based on the matching computed between the original images. The invention does not simply select and switch among multiple original images. It matches the original images automatically and uses the result to interpolate and form a new image for display. A natural image can therefore be displayed, without the unnatural seams and steps of the conventional methods.

[0008] Further, according to the invention, many original images of the object are prepared so as to cover the many viewpoints expected. When the viewer specifies a viewpoint in the image, the original images containing that viewpoint are selected from those prepared. An image centered on the specified viewpoint is then synthesized and displayed. Because the viewer can see the object from any direction, the experience is the same as observing a stereoscopic image.

[0009] Further, a plurality of original images can be acquired from different positions, using cameras arranged around the subject or a camera moving around it. The original images containing the designated viewpoint are selected from these, and matching is computed between them. A composite image centered on the set viewpoint is then generated and presented.
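Selecting the original images that contain a designated viewpoint can be illustrated with the simplest rig: cameras on a single ring around the subject. This sketch is not the patent's procedure. `bracket_views` is a hypothetical helper, and each viewpoint is reduced to an azimuth angle in degrees. The interpolation parameter `t` is assumed to be proportional to the angle between the two cameras that bracket the requested view.

```python
from bisect import bisect_right


def bracket_views(camera_angles, target):
    """Hypothetical helper: find the two cameras whose azimuths bracket `target`.

    Returns (i, j, t): indices into camera_angles of the cameras at or just
    before and just after the target (wrapping around 360 degrees), and the
    fraction t of the angular gap from camera i to camera j.
    """
    # Sort camera indices by their azimuth normalised to [0, 360).
    order = sorted(range(len(camera_angles)), key=lambda k: camera_angles[k] % 360.0)
    angles = [camera_angles[k] % 360.0 for k in order]
    target %= 360.0
    pos = bisect_right(angles, target)
    lo = (pos - 1) % len(angles)  # camera at or just before the target
    hi = pos % len(angles)        # camera just after the target
    # Angular gap between the pair; a lone camera spans the full circle.
    span = (angles[hi] - angles[lo]) % 360.0 or 360.0
    t = ((target - angles[lo]) % 360.0) / span
    return order[lo], order[hi], t
```

Take four cameras at 0, 90, 180 and 270 degrees and a requested azimuth of 45 degrees. The function returns the first two cameras with `t = 0.5`. The selected pair would feed the matching step, and `t` would set where the synthesized view lies between them. The linear mapping from angle to `t` is a simplifying choice.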

[0010] The matching calculation step may apply a multi-resolution singularity filter to each original image, generating a series of hierarchical images with different resolutions. It then computes matching between the original images based on the hierarchical images at the different resolution levels. The composite image generation step may generate an intermediate image by pixel-by-pixel interpolation. This interpolation uses the positional relationship between the set viewpoint and the viewpoints of the original images. It is based on how matched pixels in the original images correspond in position and luminance.
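The per-pixel interpolation in [0010] can be sketched as a forward warp. Each matched pixel pair contributes one sample. Both its position and its luminance are blended linearly by a parameter `t`, which stands for where the set viewpoint lies between the two original viewpoints. Several details are assumptions for illustration, since the specification does not fix them here:

- the correspondence-map representation (`corr_y`, `corr_x`);
- the linear blend;
- nearest-pixel rounding;
- leaving unhit pixels as NaN holes.

```python
import numpy as np


def interpolate_view(src, dst, corr_y, corr_x, t):
    """Forward-warp matched pixel pairs to an intermediate viewpoint t.

    src, dst: 2-D grayscale arrays of equal shape (the two original images).
    corr_y, corr_x: integer arrays shaped like src; pixel (y, x) of src is
        assumed to match pixel (corr_y[y, x], corr_x[y, x]) of dst.
    t: 0.0 reproduces src; larger values move samples toward dst.
    Pixels that receive no sample are returned as NaN (holes).
    """
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Blend the position of every matched pair and round to the pixel grid.
    py = np.rint((1 - t) * ys + t * corr_y).astype(int)
    px = np.rint((1 - t) * xs + t * corr_x).astype(int)
    # Blend the luminance of the same pair.
    val = (1 - t) * src + t * dst[corr_y, corr_x]
    out = np.full((h, w), np.nan)
    inside = (py >= 0) & (py < h) & (px >= 0) & (px < w)
    # When several samples land on one pixel, NumPy keeps one of them
    # (which one is unspecified); a real implementation would resolve this.
    out[py[inside], px[inside]] = val[inside]
    return out
```

With `t = 0` the source image is reproduced exactly. Intermediate values of `t` move each matched pixel part of the way toward its partner. A practical implementation would also need to fill holes and resolve collisions, which this sketch does not attempt.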

[0011] The image data processed by the present invention may be a moving image, such as a movie or animation. It may also be a still image that renders a three-dimensional object in two dimensions, such as a medical image or a product photograph. More generally, images of any dimension can be processed, provided image data corresponds to each image. Because the method uses matching, a natural intermediate image can be obtained even when the original images differ considerably. Consequently, only a few original images are needed to obtain an intermediate image from an arbitrary viewpoint, and little information has to be handled. The matching can also be computed quickly. Images can therefore be processed in real time, with the image synthesized as the viewpoint moves and the result displayed as a moving image or movie.

[0012] The computation requires only a small amount of image data. Sufficiently natural moving images can therefore be provided even where communication volume is restricted, as with the Internet or on-demand movie broadcasting. The invention is, of course, especially effective for the 3D product display demanded in online commerce (EC). In the method of the invention, the image data of the generated hierarchical images may also be output, stored, or transmitted as encoded data. This reduces the amount of electronic information to be transmitted.

[0013] The pseudo-three-dimensional image generating apparatus of the present invention comprises the following units:
(a) a unit that acquires image data of a plurality of images photographed from a plurality of positions around a subject;
(b) a unit that selects two or more original images containing, within the image, the line of sight to the object;
(c) a unit that applies a multi-resolution singularity filter to the image data of each original image, generating image data of a series of hierarchical images with different resolutions;
(d) a unit that computes matching between the original images based on the image data of the hierarchical images at different resolution levels; and
(e) a unit that generates image data of an intermediate image by pixel-by-pixel interpolation. The interpolation uses the positional relationship between the set viewpoint and the viewpoints of the original images, and is based on how matched pixels in the original images correspond in position and luminance.
The apparatus generates and presents a composite image viewed from a direction that includes the set viewpoint. Each unit can be implemented by any combination of software and hardware.

[0014] The image data of the original images used to generate the pseudo-three-dimensional image can be acquired by a robot. With a robot, no photographing rig with many cameras is needed. The robot can move on its own or operate a robot arm. It then automatically acquires as many images as needed, faithfully reproducing shooting positions and directions optimized in advance.

[0015] The present invention automatically synthesizes images from viewpoints other than those used during shooting, using images already acquired. The object is not displayed stereoscopically on the display device. Images viewed from virtually any direction are obtained, however, so a pseudo-three-dimensional display is achieved. Pseudo-three-dimensional images produced by the method can thus convey accurate information about three-dimensional objects over the Internet and the like. For example, when a three-dimensional product is exhibited, it can be displayed so that the viewer can observe it from any viewpoint they specify. This stimulates the desire to purchase.

[0016] Further, matching and interpolation-based composite image generation can be performed at very high speed. The result can therefore be displayed as a movie or moving image. With matching based on hierarchical images, the amount of image data is also small enough that the original image data can be transmitted almost in real time. It can then be processed on the receiver side to display the pseudo-three-dimensional image.

[0017]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
The pseudo-three-dimensional image generation technique of the present invention relies on a multi-resolution singularity filter technique and an image matching process that uses it. These are first described in detail as the "base technology", with reference to FIGS. 1 to 22. The applicant has already been granted Japanese Patent No. 2927350 for these techniques, and they are well suited to combination with the present invention. The image matching technique usable in the embodiment is not, of course, limited to this one.

[0018] [Prior Art Related to the Base Technology] Automatic matching of two images means establishing correspondence between their image regions or pixels. It is one of the most difficult and important problems in computer vision and computer graphics. If images of an object taken from different viewpoints can be matched, images from other viewpoints can be generated. If the matching between a right-eye image and a left-eye image can be computed, photogrammetry using stereo images becomes possible. If a model face image is matched to another face image, characteristic facial parts such as the eyes, nose, and mouth can be extracted. If images of, say, a human face and a cat face are matched accurately, the intermediate images between them can be generated automatically, and morphing can then be fully automated.

[0019] Conventionally, however, a person generally had to specify corresponding points between two images one by one, which took a great deal of labor. Many methods for detecting corresponding points automatically have been proposed to solve this problem. One idea is to use epipolar lines to reduce the number of candidate corresponding points. Even then, the processing is extremely complex. To reduce complexity, each point in the left-eye image is usually assumed to lie at approximately the same coordinates in the right-eye image. With such constraints, however, it becomes very difficult to obtain a matching that satisfies global and local features at the same time.
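The epipolar constraint mentioned in [0019] restricts the search for a left-image point's partner to a single line in the right image. This is a minimal sketch. It assumes a known fundamental matrix `F` with the convention `x_r^T F x_l = 0`, and a pixel tolerance `max_dist` chosen for illustration. It only prunes the candidate list and does not perform matching itself.

```python
import numpy as np


def epipolar_candidates(F, left_pt, right_pts, max_dist=1.0):
    """Keep the right-image points that lie near the epipolar line of left_pt.

    F: 3x3 fundamental matrix (assumed known, e.g. from calibration),
       using the convention x_r^T F x_l = 0.
    left_pt: (x, y) in the left image.
    right_pts: sequence of candidate (x, y) positions in the right image.
    """
    # Epipolar line in the right image: a*x + b*y + c = 0.
    a, b, c = F @ np.array([left_pt[0], left_pt[1], 1.0])
    pts = np.asarray(right_pts, dtype=float)
    # Perpendicular distance of every candidate from that line.
    dist = np.abs(a * pts[:, 0] + b * pts[:, 1] + c) / np.hypot(a, b)
    return pts[dist <= max_dist]
```

For a rectified stereo pair, `F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` makes the epipolar line the same image row. The filter then keeps only candidates whose y coordinate is within `max_dist` of the left point's row.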

[0020] In volume rendering, a series of cross-sectional images is used to construct voxels. The conventional assumption is that a pixel in the upper cross-sectional image corresponds to the pixel at the same position in the lower cross-sectional image. These pixel pairs are then used for interpolation. Because the method is so simple, the rendered object tends to be blurred when consecutive cross-sections are far apart and the object's cross-sectional shape changes significantly.
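The blurring described in [0020] follows directly from the same-position assumption. The sketch uses one-row arrays as stand-ins for cross-sections, for brevity. A feature that moves between two slices does not appear moved in the naive midpoint slice. It appears as two faint copies instead.

```python
import numpy as np


def naive_midslice(upper, lower, t=0.5):
    """Interpolate between two cross-sections, assuming pixel (y, x) in one
    slice corresponds to pixel (y, x) in the other."""
    return (1 - t) * upper + t * lower


upper = np.zeros((1, 6))
upper[0, 1] = 1.0  # feature at x = 1 in the upper slice
lower = np.zeros((1, 6))
lower[0, 4] = 1.0  # the same feature at x = 4 in the lower slice
mid = naive_midslice(upper, lower)
# mid is 0.5 at x = 1 and at x = 4: two half-intensity ghosts rather than
# one full-intensity feature between them.
```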

[0021] Many matching algorithms use edge detection, for example those for stereo photogrammetry. In that case, however, only a few corresponding points are obtained. Disparity values must then be interpolated to fill the gaps between the matched points. In general, any edge detector finds it hard to judge whether a change in pixel luminance within its local window really indicates an edge. Edge detectors are all inherently high-pass filters, so they pick up noise along with edges.

[0022] Optical flow is another known technique. Given two images, optical flow detects the motion of an object (a rigid body) within them. It assumes that the luminance of each pixel of the object does not change. Optical flow computes the motion vector (u, v) of each pixel under some additional conditions, such as smoothness of the (u, v) vector field. However, it cannot detect global correspondence between images. It looks only at local changes in pixel luminance, so errors become significant when the image displacement is large.
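The brightness-constancy and smoothness conditions in [0022] can be made concrete with a compact version of the classic Horn-Schunck scheme. It is named here only as a well-known representative of optical flow; the specification does not prescribe it. The weight `alpha`, the iteration count, and the finite-difference derivatives are illustrative choices.

```python
import numpy as np


def _neighbour_mean(f):
    """Mean of the 4-neighbours of every pixel, replicating the border."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def horn_schunck(im1, im2, alpha=1.0, n_iter=200):
    """Estimate a dense flow field (u, v) from im1 to im2.

    Brightness constancy is linearised as Ix*u + Iy*v + It = 0, and alpha
    weights the penalty on non-smooth (u, v) fields.
    """
    im1 = np.asarray(im1, dtype=float)
    im2 = np.asarray(im2, dtype=float)
    # Spatial derivatives averaged over both frames; temporal difference.
    Ix = (np.gradient(im1, axis=1) + np.gradient(im2, axis=1)) / 2.0
    Iy = (np.gradient(im1, axis=0) + np.gradient(im2, axis=0)) / 2.0
    It = im2 - im1
    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar, v_bar = _neighbour_mean(u), _neighbour_mean(v)
        # Jacobi update: move the locally smoothed flow toward the
        # brightness-constancy constraint at each pixel.
        residual = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * residual
        v = v_bar - Iy * residual
    return u, v
```

The update linearizes luminance around each pixel, so it only behaves well for small displacements. When the displacement is large, the linearization breaks down. This is the limitation [0022] points out.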

[0023] Many multi-resolution filters have also been proposed for recognizing the global structure of an image. They fall into two groups: linear filters and non-linear filters. Wavelets are an example of the former. Linear filters, however, are generally not very useful for image matching, because they progressively blur both the luminance and the position of pixels that take extreme values. FIGS. 1(a) and 1(b) show the results of applying an averaging filter to the face images in FIGS. 19(a) and 19(b), respectively. As the figures show, averaging progressively fades the luminance of extremal pixels and also shifts their positions. The luminance and position of the eyes (luminance minima) therefore become ambiguous at such coarse resolution levels, and correct matching cannot be computed there. Coarse resolution levels are introduced precisely for global matching. Yet the matching obtained at those levels does not correspond accurately to the true features of the image (the eyes, that is, the minima). Even if the eyes appear clearly at a finer resolution level, errors introduced during global matching can no longer be corrected. It has also been pointed out that smoothing the input image causes stereo information in textured regions to be lost.
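The loss of extremal information under linear averaging, described in [0023], can be reproduced numerically. The sketch contrasts 2x2 block averaging with 2x2 block minimum selection on an image containing one dark pixel. The minimum reduction is a simplified non-linear alternative used only for contrast; it is not the base technology's singularity filter. Image sides are assumed to be powers of two.

```python
import numpy as np


def reduce_mean(img):
    """Halve the resolution by averaging each 2x2 block (a linear filter)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def reduce_min(img):
    """Halve the resolution by keeping the minimum of each 2x2 block."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).min(axis=(1, 3))


img = np.full((8, 8), 200.0)
img[5, 2] = 0.0  # a single dark pixel, e.g. a pupil
mean_lv, min_lv = img, img
for _ in range(3):  # 8x8 -> 4x4 -> 2x2 -> 1x1
    mean_lv, min_lv = reduce_mean(mean_lv), reduce_min(min_lv)
```

After three reductions the averaged value is 196.875, so the dark pixel has almost vanished into the background. The minimum-based reduction still reports 0.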

[0024] The one-dimensional "sieve" operator, by contrast, is a non-linear filter that has recently begun to be used in geomorphology. It selects the local minimum (or maximum) within a one-dimensional window of predetermined size. In this way it smooths an image while preserving the causal relationship between scale and space. The resulting image is the same size as the original but simpler, because small-scale wave components have been removed. Because it discards image information, the operator can be called a "multi-resolution filter" in a broad sense. It does not, however, build a hierarchy of images at changing resolutions as wavelets do, so it is not a multi-resolution filter in the narrow sense. It cannot be used to detect correspondence between images.

[Problems to be Solved by the Base Technology] The discussion above can be summarized as the following problems.

[0025]
1. Few image processing methods captured image features accurately with relatively simple processing. In particular, there were few effective proposals for methods that extract features while preserving information about characteristic points, such as their pixel values and positions.
2. Automatic detection of corresponding points based on image features generally had drawbacks such as complex processing or poor noise tolerance. Various constraints also had to be imposed during processing. This made it difficult to obtain a matching that satisfies global and local features at the same time.
3. A multi-resolution filter could be introduced to recognize the global structure or features of an image. If that filter was linear, however, it blurred the luminance and position information of pixels, so corresponding points tended to be located inaccurately. The one-dimensional sieve operator, a non-linear filter, does not build a hierarchy of images, so it could not be used to detect corresponding points between images.
4. As a result, the only effective way to identify corresponding points correctly was, in the end, to rely on manual specification.
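The scale-selective smoothing attributed to the one-dimensional sieve in [0024] and in problem 3 can be approximated by a flat morphological opening: a running minimum followed by a running maximum over a window of width `w`. This is an illustrative stand-in, not the sieve operator itself. It removes only bright peaks (a closing would remove dark ones), and the odd window width `w` is an assumed parameter. As the text notes, the output has the same length as the input, so no resolution hierarchy is produced.

```python
import numpy as np


def running(op, x, w):
    """Apply op (np.min or np.max) over a centred window of odd width w,
    replicating the end values at the borders."""
    r = w // 2
    p = np.pad(x, r, mode="edge")
    return np.array([op(p[i:i + w]) for i in range(len(x))])


def open_1d(x, w):
    """Flat 1-D opening: running minimum, then running maximum.

    Bright features narrower than w are flattened; wider ones survive.
    The output has the same length as the input.
    """
    return running(np.max, running(np.min, x, w), w)
```

For example, `open_1d(np.array([0, 0, 5, 0, 0, 3, 3, 3, 0, 0]), 3)` removes the one-sample spike of height 5 and keeps the three-sample plateau of height 3.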

[0026] The present invention has been made to solve these problems. It provides a technique that enables image features to be grasped accurately in the field of image processing.

[Means by which the Base Technology Solves the Problems] To this end, one aspect of the present invention proposes a new multi-resolution image filter. This filter extracts singular points from an image, so it is also called a singularity filter. A singular point is a point with a distinctive feature in the image. Here, a pixel value means any numerical value associated with an image or pixel, such as a color number or luminance value. Examples of singular points are:
- a local maximum, where the pixel value is largest within a region;
- a local minimum, where the pixel value is smallest; and
- a saddle point, where the pixel value is largest in one direction but smallest in another.
A singular point may be a topological concept, but it may also have any other feature. Which kinds of points count as singular points is not essential to the present invention.
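The three kinds of singular point listed in [0026] can be distinguished with a simple neighbourhood test. This sketch assumes a pixel is compared only with its four axis-aligned neighbours, using strict inequalities, so plateaus are left unclassified. The base technology's own detection procedure is not reproduced here.

```python
import numpy as np


def classify_point(img, y, x):
    """Classify interior pixel (y, x) by comparing it with its 4 neighbours.

    Returns "max", "min", "saddle" or None. A saddle here is a maximum
    along one axis and a minimum along the other.
    """
    c = img[y, x]
    up, down = img[y - 1, x], img[y + 1, x]
    left, right = img[y, x - 1], img[y, x + 1]
    max_h = c > left and c > right  # maximum along x
    min_h = c < left and c < right  # minimum along x
    max_v = c > up and c > down     # maximum along y
    min_v = c < up and c < down     # minimum along y
    if max_h and max_v:
        return "max"
    if min_h and min_v:
        return "min"
    if (max_h and min_v) or (min_h and max_v):
        return "saddle"
    return None
```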

[0027] In this aspect, image processing is performed with the multi-resolution filter. First, in a detection step, singular points are detected by a two-dimensional search of a first image. Next, in a generation step, the detected singular points are extracted to generate a second image with lower resolution than the first. The second image inherits the singular points of the first image. Because of its lower resolution, the second image is well suited to grasping the global features of the image.
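The detection and generation steps of [0027] can be combined into one pass, illustrated here under an assumed 2x2 min/max construction. Each 2x2 block is reduced to four values: its minimum, its maximum, and two saddle-type values (minimum along one axis, maximum along the other). The result is four half-resolution images. The sub-image names are hypothetical, and image dimensions are assumed to be even.

```python
import numpy as np


def critical_point_reduce(img):
    """One level of a 2x2 min/max reduction.

    Returns four half-resolution sub-images: a minimum image, two
    saddle-type images, and a maximum image.
    """
    a = img[0::2, 0::2]  # top-left of each block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    row_min = np.minimum(a, b), np.minimum(c, d)  # min along x, per row
    row_max = np.maximum(a, b), np.maximum(c, d)  # max along x, per row
    return {
        "min": np.minimum(*row_min),       # min along x, then min along y
        "saddle_1": np.maximum(*row_min),  # min along x, then max along y
        "saddle_2": np.minimum(*row_max),  # max along x, then min along y
        "max": np.maximum(*row_max),       # max along x, then max along y
    }
```

Applying the function again to each output yields progressively coarser levels. This is one way a series of hierarchical images could be built from a single original.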

[0028] Another aspect of the present invention relates to an image matching method that uses the singularity filter. In this aspect, a start image and an end image are matched. The names "start image" and "end image" only distinguish the two images for convenience; there is no essential difference between them.

[0029] In this aspect, the matching proceeds in three steps:
(a) apply the singularity filter to the start image to generate a series of start hierarchical images with different resolutions;
(b) apply the singularity filter to the end image to generate a series of end hierarchical images with different resolutions; and
(c) compute the matching between the start and end hierarchical images within the hierarchy of resolution levels.
The start and end hierarchical images are the groups of images obtained by building a hierarchy from the start image and the end image, respectively. Each group consists of at least two images. In this aspect, the multi-resolution filter extracts and/or clarifies the image features related to singular points, which makes matching easier. The matching needs no particular constraints.

[0030] Yet another aspect of the present invention also relates to matching a start image and an end image. In this aspect, an evaluation equation is set up in advance for each of several matching evaluation items. These are combined into a comprehensive evaluation equation, and the optimal matching is searched for near the extremum of that equation. The comprehensive evaluation equation may be defined as the sum of the evaluation equations after at least one of them is multiplied by a coefficient parameter. In that case, the parameter may be determined by detecting a state in which the comprehensive evaluation equation, or one of the evaluation equations, approximately takes an extremum. The wording "near the extremum" and "approximately takes an extremum" reflects that some error is acceptable. Small errors pose little problem for the present invention.
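A comprehensive evaluation of the form described in [0030] can be sketched for a single pixel of a one-dimensional signal. The two evaluation items are hypothetical stand-ins chosen for illustration. `C` is an intensity term, `D` is a displacement-smoothness term, and `D` is weighted by a coefficient `lam`. The base technology's actual evaluation equations are not reproduced here.

```python
def best_match(src_val, x, dst, prev_disp, lam):
    """Return the index j in dst that minimises C + lam * D for source pixel x.

    C: absolute intensity difference between src_val and dst[j].
    D: squared difference between this pixel's displacement (j - x) and the
       displacement prev_disp already chosen for its neighbour.
    """
    def total(j):
        c = abs(src_val - dst[j])
        d = ((j - x) - prev_disp) ** 2
        return c + lam * d

    return min(range(len(dst)), key=total)
```

Consider `dst = [0, 9, 0, 0, 8, 0]`, a source value of 9 at `x = 1`, and a neighbour displacement of 3. With `lam = 0` the function picks index 1, an exact intensity match. With `lam = 1` it picks index 4, trading a small intensity error for a displacement consistent with the neighbour. The coefficient therefore decides the result.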

[0031] The extremum itself depends on the parameter. The parameter considered optimal can therefore be determined from the behavior of the extremum, that is, from how the extremum changes. This aspect exploits that fact and opens the way to automating the choice of parameters that are inherently difficult to tune.

[Embodiment of the Base Technology] Section [1] describes the elemental techniques of the base technology in detail. Section [2] then describes the processing procedure concretely, and section [3] reports experimental results.

[1] Details of the Elemental Techniques

[1.1] Introduction
A new multi-resolution filter, called a singularity filter, is introduced to compute the matching between images accurately. No prior knowledge about the objects is required. The matching between images is computed at each resolution while the resolution hierarchy is traversed in order, from coarse levels to fine levels. The parameters required for the computation are set fully automatically by a dynamic computation resembling the human visual system. Corresponding points between images never have to be specified by hand.

[0032] The base technology can be applied, for example, to fully automatic morphing, object recognition, stereophotogrammetry, volume rendering, and the generation of smooth moving images from a small number of widely spaced images. In morphing, a given image can be transformed automatically. In volume rendering, intermediate images between cross-sections can be reconstructed accurately, even when the cross-sections are far apart and their shapes change greatly.

[0033] [1.2] Hierarchy of Singularity Filters
The multi-resolution singularity filter of the base technology reduces the resolution of an image while preserving the luminance and position of each singularity contained in it. Let the image width be N and its height M. For simplicity, assume N = M = 2^n, where n is a natural number. The interval [0, N] ⊂ R is denoted I, and the pixel of the image at (i, j) is denoted p(i,j) (i, j ∈ I).

[0034] Here a multi-resolution hierarchy is introduced. The hierarchized image group is generated by a multi-resolution filter, which performs a two-dimensional search on the original image to detect singularities and extracts the detected singularities to produce another image of lower resolution than the original. The size of each image at the m-th level is 2^m × 2^m (0 ≤ m ≤ n). The singularity filter recursively constructs the following four kinds of new hierarchical images, descending from level n.

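[0035]

(Equation 1)
$$
\begin{aligned}
p^{(m,0)}_{(i,j)} &= \min\Bigl(\min\bigl(p^{(m+1,0)}_{(2i,2j)},\, p^{(m+1,0)}_{(2i,2j+1)}\bigr),\ \min\bigl(p^{(m+1,0)}_{(2i+1,2j)},\, p^{(m+1,0)}_{(2i+1,2j+1)}\bigr)\Bigr)\\
p^{(m,1)}_{(i,j)} &= \max\Bigl(\min\bigl(p^{(m+1,1)}_{(2i,2j)},\, p^{(m+1,1)}_{(2i,2j+1)}\bigr),\ \min\bigl(p^{(m+1,1)}_{(2i+1,2j)},\, p^{(m+1,1)}_{(2i+1,2j+1)}\bigr)\Bigr)\\
p^{(m,2)}_{(i,j)} &= \min\Bigl(\max\bigl(p^{(m+1,2)}_{(2i,2j)},\, p^{(m+1,2)}_{(2i,2j+1)}\bigr),\ \max\bigl(p^{(m+1,2)}_{(2i+1,2j)},\, p^{(m+1,2)}_{(2i+1,2j+1)}\bigr)\Bigr)\\
p^{(m,3)}_{(i,j)} &= \max\Bigl(\max\bigl(p^{(m+1,3)}_{(2i,2j)},\, p^{(m+1,3)}_{(2i,2j+1)}\bigr),\ \max\bigl(p^{(m+1,3)}_{(2i+1,2j)},\, p^{(m+1,3)}_{(2i+1,2j+1)}\bigr)\Bigr)
\end{aligned}
$$
where

(Equation 2)
$$p^{(n,0)}_{(i,j)} = p^{(n,1)}_{(i,j)} = p^{(n,2)}_{(i,j)} = p^{(n,3)}_{(i,j)} = p_{(i,j)}$$
Hereinafter these four images are called sub-images. Writing $\min_{x \le t \le x+1}$ and $\max_{x \le t \le x+1}$ as α and β, respectively, the sub-images can be written as follows.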

[0036]
$$
\begin{aligned}
p^{(m,0)} &= \alpha(x)\alpha(y)\,p^{(m+1,0)}\\
p^{(m,1)} &= \alpha(x)\beta(y)\,p^{(m+1,1)}\\
p^{(m,2)} &= \beta(x)\alpha(y)\,p^{(m+1,2)}\\
p^{(m,3)} &= \beta(x)\beta(y)\,p^{(m+1,3)}
\end{aligned}
$$
These can be regarded as something like tensor products of α and β. Each sub-image corresponds to one kind of singularity. As these expressions show, the singularity filter detects singularities in each block of 2 × 2 pixels of the original image. It searches each block in two directions, vertical and horizontal, for the point with the maximum or minimum pixel value. The base technology uses luminance as the pixel value, but various other numerical quantities related to the image can be used instead. A pixel with the maximum value in both directions is detected as a local maximum, one with the minimum value in both directions as a local minimum, and one with the maximum value in one direction and the minimum in the other as a saddle point.

[0037] The singularity filter lowers the resolution by letting the singularity detected inside each block (here, one pixel) represent that block (here, four pixels). From the standpoint of singularity theory, α(x)α(y) preserves local minima, β(x)β(y) preserves local maxima, and α(x)β(y) and β(x)α(y) preserve saddle points.
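The four filters can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the image is a square list of lists of luminance values with side 2^n, indexed img[i][j], and that each sub-image s is formed by an inner min or max over the two pixels of a block along j followed by an outer min or max along i (so s = 0 keeps minima, s = 3 keeps maxima, and s = 1, 2 mix the two). The names `singularity_hierarchy` and `FILTERS` are hypothetical.

```python
def _reduce(img, outer, inner):
    """Halve the resolution: `inner` over each pixel pair along j, `outer` along i."""
    size = len(img) // 2
    out = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            a = inner(img[2 * i][2 * j], img[2 * i][2 * j + 1])
            b = inner(img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1])
            out[i][j] = outer(a, b)
    return out


# (outer, inner) operation for each sub-image type s.
FILTERS = {0: (min, min), 1: (max, min), 2: (min, max), 3: (max, max)}


def singularity_hierarchy(img):
    """Return levels[m][s] for m = n down to 0; level n is the original image."""
    n = len(img).bit_length() - 1          # side length is 2^n
    levels = {n: {s: img for s in range(4)}}
    for m in range(n - 1, -1, -1):
        levels[m] = {s: _reduce(levels[m + 1][s], *FILTERS[s]) for s in range(4)}
    return levels
```

At level 0, sub-image 0 holds the global luminance minimum and sub-image 3 the global maximum, which is a quick way to see that the extrema survive every reduction.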

[0038] First, singularity filtering is applied separately to the start (source) image and the end (destination) image to be matched, generating two series of images: the start hierarchical images and the end hierarchical images. Four kinds of each are generated, one per type of singularity.

[0039] The start and end hierarchical images are then matched across the series of resolution levels. First, local minima are matched using p(m,0). Next, based on that result, saddle points are matched using p(m,1), and the other saddle points using p(m,2). Finally, local maxima are matched using p(m,3).

[0040] FIGS. 1(c) and 1(d) show the sub-image p(5,0) of FIGS. 1(a) and 1(b), respectively. Similarly, FIGS. 1(e) and 1(f) show p(5,1), FIGS. 1(g) and 1(h) show p(5,2), and FIGS. 1(i) and 1(j) show p(5,3). As these figures show, the sub-images make it easy to match characteristic parts of the images. p(5,0) makes the eyes distinct, because the eyes are luminance minima in a face. p(5,1) makes the mouth distinct, because the mouth has low luminance in the horizontal direction. p(5,2) makes the vertical lines on both sides of the neck distinct. Finally, p(5,3) makes the brightest points of the ears and cheeks distinct, because these are luminance maxima.

[0041] Because the singularity filter extracts image features, it can be used, for example, to identify the subject captured by a camera by comparing the features of the camera image with the features of several objects recorded in advance.

[0042] [1.3] Computing the Mapping Between Images
The pixel at position (i, j) of the start image is written p(n)(i,j), and the pixel at position (k, l) of the end image is written q(n)(k,l), with i, j, k, l ∈ I. The energy of a mapping between the images (described later) is defined; it is determined by the difference between the luminance of each start-image pixel and that of the corresponding end-image pixel, and by the smoothness of the mapping. First, the minimum-energy mapping f(m,0): p(m,0) → q(m,0) is computed. Based on f(m,0), the minimum-energy mapping f(m,1) between p(m,1) and q(m,1) is computed. This continues until the mapping f(m,3) between p(m,3) and q(m,3) has been computed. Each mapping f(m,i) (i = 0, 1, 2, ...) is called a submapping. For convenience in computing f(m,i), the order of i may be rearranged as follows; the reason for the rearrangement is given later.

(Equation 3)
$$f^{(m,i)} : p^{(m,\sigma(i))} \rightarrow q^{(m,\sigma(i))}$$
where σ(i) ∈ {0, 1, 2, 3}.

[0043] [1.3.1] Bijection
When the matching between the start and end images is expressed as a mapping, that mapping should satisfy the bijection condition between the two images, because neither image is conceptually superior to the other and their pixels should be connected both surjectively and injectively. Unlike the ordinary case, however, the mapping to be constructed here is a digital version of a bijection: in the base technology, pixels are specified by lattice points.

[0044] The mapping from a start sub-image (a sub-image derived from the start image) to an end sub-image (a sub-image derived from the end image) is written
$$f^{(m,s)} : I/2^{n-m} \times I/2^{n-m} \rightarrow I/2^{n-m} \times I/2^{n-m} \quad (s = 0, 1, \ldots)$$
Here f(m,s)(i,j) = (k,l) means that p(m,s)(i,j) of the start image is mapped to q(m,s)(k,l) of the end image. For simplicity, when f(i,j) = (k,l) holds, the pixel q(k,l) is written qf(i,j).

[0045] When the data are discrete, like the pixels (lattice points) handled in the base technology, the definition of a bijection matters. It is defined here as follows (i, i', j, j', k, and l are all integers). First, consider each square region R in the plane of the start image,

(Equation 4)
$$R = p^{(m,s)}_{(i,j)}\,p^{(m,s)}_{(i+1,j)}\,p^{(m,s)}_{(i+1,j+1)}\,p^{(m,s)}_{(i,j+1)}$$
(i = 0, ..., 2^m - 1, j = 0, ..., 2^m - 1). The direction of each edge of R is defined as follows.

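[0046]

(Equation 5)
$$\overrightarrow{p^{(m,s)}_{(i,j)}p^{(m,s)}_{(i+1,j)}},\quad \overrightarrow{p^{(m,s)}_{(i+1,j)}p^{(m,s)}_{(i+1,j+1)}},\quad \overrightarrow{p^{(m,s)}_{(i+1,j+1)}p^{(m,s)}_{(i,j+1)}},\quad \overrightarrow{p^{(m,s)}_{(i,j+1)}p^{(m,s)}_{(i,j)}}$$
This square must be mapped by f to a quadrilateral in the end-image plane. The quadrilateral denoted f(m,s)(R),

(Equation 6)
$$f^{(m,s)}(R) = q^{(m,s)}_{f(i,j)}\,q^{(m,s)}_{f(i+1,j)}\,q^{(m,s)}_{f(i+1,j+1)}\,q^{(m,s)}_{f(i,j+1)}$$
must satisfy the following bijection conditions.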

[0047]
1. The edges of the quadrilateral f(m,s)(R) do not intersect one another.
2. The edges of f(m,s)(R) have the same directions as those of R (clockwise in FIG. 2).
3. As a relaxation, retractions are allowed.

[0048] The relaxation is needed because, without it, the only mapping that fully satisfies the bijection condition is the identity mapping. Here, one edge of f(m,s)(R) may have length 0, so f(m,s)(R) may be a triangle. It must not, however, degenerate into a figure of zero area, that is, a single point or a single line segment. If FIG. 2(R) is the original quadrilateral, FIGS. 2(A) and 2(D) satisfy the bijection conditions, whereas FIGS. 2(B), 2(C), and 2(E) do not.
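A hedged sketch of a test for these conditions on one quadrilateral f(m,s)(R) follows. It assumes the four corners are given in the order of R's edges, and that R has positive signed area in the chosen coordinates (which appears clockwise when the y axis points down, as in image coordinates). Only proper crossings of opposite edges are treated as violations of condition 1, a simplification; the function names are illustrative only.

```python
def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_cross(p1, p2, p3, p4):
    """True if segments p1p2 and p3p4 cross at an interior point of both."""
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def signed_area(quad):
    """Shoelace formula; positive for the orientation of the square R."""
    s = 0
    for k in range(4):
        (x1, y1), (x2, y2) = quad[k], quad[(k + 1) % 4]
        s += x1 * y2 - x2 * y1
    return s / 2


def satisfies_bijection(quad):
    """quad holds the images of (i,j), (i+1,j), (i+1,j+1), (i,j+1) under f."""
    a, b, c, d = quad
    # Condition 1: opposite edges must not cross (adjacent edges share a vertex).
    if segments_cross(a, b, c, d) or segments_cross(b, c, d, a):
        return False
    # Condition 2 plus the retraction limit: same orientation as R. A triangle
    # (one zero-length edge) passes; a point or segment has zero area and fails.
    return signed_area(quad) > 0
```

For example, the unit square passes, a "bow-tie" in which two corners are swapped fails, and a square with one edge collapsed to a point (a triangle) passes, mirroring the distinction drawn for FIG. 2.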

[0049] In an actual implementation, the following condition may additionally be imposed to make it easy to guarantee that the mapping is surjective: each pixel on the boundary of the start image is mapped to the pixel at the same position in the end image, that is, f(i,j) = (i,j) on the four lines i = 0, i = 2^m - 1, j = 0, and j = 2^m - 1. This condition is hereinafter also called the "additional condition".

[0050] [1.3.2] Energy of the Mapping
[1.3.2.1] Cost Related to Pixel Luminance
The energy of the mapping f is defined; the goal is to find the mapping that minimizes it. The energy is determined mainly by the difference between the luminance of each start-image pixel and that of the corresponding end-image pixel. That is, the energy C(m,s)(i,j) of the mapping f(m,s) at the point (i,j) is given by the following equation.

[0051]

(Equation 7)
$$C^{(m,s)}_{(i,j)} = \Bigl|V\bigl(p^{(m,s)}_{(i,j)}\bigr) - V\bigl(q^{(m,s)}_{f(i,j)}\bigr)\Bigr|^2$$
Here V(p(m,s)(i,j)) and V(q(m,s)f(i,j)) are the luminances of the pixels p(m,s)(i,j) and q(m,s)f(i,j), respectively. The total energy C(m,s)f of f is one evaluation formula for the matching and can be defined as the sum of C(m,s)(i,j):

[0052]

(Equation 8)
$$C^{(m,s)}_f = \sum_{i=0}^{2^m-1}\sum_{j=0}^{2^m-1} C^{(m,s)}_{(i,j)}$$

[0053] [1.3.2.2] Cost Related to Pixel Position for a Smooth Mapping
To obtain a smooth mapping, another energy Df of the mapping is introduced. This energy does not depend on pixel luminance; it is determined by the positions of p(m,s)(i,j) and q(m,s)f(i,j) (i = 0, ..., 2^m - 1, j = 0, ..., 2^m - 1). The energy D(m,s)(i,j) of the mapping f(m,s) at the point (i,j) is defined by the following equation.

[0054]

(Equation 9)
$$D^{(m,s)}_{(i,j)} = \eta\,E^{(m,s)}_{0(i,j)} + E^{(m,s)}_{1(i,j)}$$
where the coefficient parameter η is a real number greater than or equal to 0, and

(Equation 10)
$$E^{(m,s)}_{0(i,j)} = \bigl\|(i,j) - f^{(m,s)}(i,j)\bigr\|^2$$

(Equation 11)

Here

(Equation 12)
$$\|(x,y)\| = \sqrt{x^2 + y^2}$$
and f(i',j') is defined to be 0 for i' < 0 and j' < 0. E0 is determined by the distance between (i,j) and f(i,j) and prevents a pixel from being mapped to a pixel that is too far away; E0 is later replaced by another energy function. E1 guarantees the smoothness of the mapping: it represents the discrepancy between the displacement of p(i,j) and the displacements of its neighboring points. Based on these considerations, the energy Df, another evaluation formula for the matching, is given by the following expression.

[0055]

(Equation 13)
$$D^{(m,s)}_f = \sum_{i=0}^{2^m-1}\sum_{j=0}^{2^m-1} D^{(m,s)}_{(i,j)}$$

[1.3.2.3] Total Energy of the Mapping
The total energy of the mapping, that is, the comprehensive evaluation formula that integrates the several evaluation formulas, is defined as λC(m,s)f + D(m,s)f, where the coefficient parameter λ is a real number greater than or equal to 0. The goal is to detect the state in which the comprehensive evaluation formula takes an extremum, that is, to find the mapping that gives the minimum energy:

(Equation 14)
$$\min_f\ \lambda C^{(m,s)}_f + D^{(m,s)}_f$$
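The energies described in [0050] to [0055] can be evaluated for a given mapping with a sketch like the one below. It assumes images are square lists of luminance values and a mapping is stored as f[i][j] = (k, l). Because Equation 11 is not reproduced in this text, E1 is computed from its verbal description, as the sum of squared differences between the displacement of (i, j) and those of the neighbors (i', j') with i' ∈ {i-1, i} and j' ∈ {j-1, j}, with no normalizing constant, and neighbors with negative indices are skipped. Both choices are assumptions, and the function names are hypothetical.

```python
def luminance_cost(p, q, f):
    """C_f: sum over (i, j) of |V(p(i,j)) - V(q_f(i,j))|^2."""
    total = 0
    for i, row in enumerate(f):
        for j, (k, l) in enumerate(row):
            total += (p[i][j] - q[k][l]) ** 2
    return total


def smoothness_cost(f, eta):
    """D_f: sum over (i, j) of eta * E0 + E1.

    E0 is the squared distance between (i, j) and f(i, j).
    E1 (assumed form): squared differences between the displacement of
    (i, j) and those of its neighbours; negative indices are skipped.
    """
    total = 0.0
    for i, row in enumerate(f):
        for j, (k, l) in enumerate(row):
            di, dj = k - i, l - j                 # displacement of (i, j)
            e0 = di * di + dj * dj
            e1 = 0
            for ii in (i - 1, i):
                for jj in (j - 1, j):
                    if ii < 0 or jj < 0:
                        continue
                    kk, ll = f[ii][jj]
                    ddi, ddj = di - (kk - ii), dj - (ll - jj)
                    e1 += ddi * ddi + ddj * ddj
            total += eta * e0 + e1
    return total


def total_energy(p, q, f, lam, eta):
    """Comprehensive evaluation formula lam * C_f + D_f, to be minimised over f."""
    return lam * luminance_cost(p, q, f) + smoothness_cost(f, eta)
```

With lam = 0 and eta = 0, the identity mapping scores 0, which matches the remark in [0056] that the evaluation starts from the identity mapping.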

[0056] Note that when λ = 0 and η = 0, the mapping is the identity mapping, i.e. f(m,s)(i,j) = (i,j) for all i = 0, ..., 2^m - 1 and j = 0, ..., 2^m - 1. As described later, the base technology first evaluates the case λ = 0 and η = 0, so the mapping can be deformed gradually starting from the identity. If λ were instead placed on the other term and the comprehensive formula defined as C(m,s)f + λD(m,s)f, then for λ = 0 and η = 0 the formula would reduce to C(m,s)f alone: pixels with no real relation would be paired merely because their luminances are close, and the mapping would be meaningless. Deforming a mapping that starts from such a meaningless one makes no sense. The coefficient parameters are therefore placed so that the identity mapping is selected as the best mapping at the start of the evaluation.

[0057] Like the base technology, optical flow takes into account both the luminance differences between pixels and smoothness. Optical flow, however, cannot be used for image transformation, because it considers only the local motion of objects. Using the singularity filter of the base technology, global correspondences can be detected.

[0058] [1.3.3] Determining the Mapping by Introducing Multiple Resolutions
The mapping fmin that gives the minimum energy and satisfies the bijection condition is obtained using the multi-resolution hierarchy. At each resolution level, the mapping between the start sub-image and the end sub-image is computed. Starting from the top (coarsest level) of the resolution hierarchy, the mapping at each level is determined while taking the mappings at other levels into account. The number of candidate mappings at each level is limited by using the mappings at higher, i.e. coarser, levels. More specifically, when the mapping at a given level is determined, the mapping obtained at the level one step coarser is imposed as a kind of constraint.

[0059] First, when

(Equation 15)
$$(i',j') = \left(\left[\frac{i}{2}\right], \left[\frac{j}{2}\right]\right)$$
holds, p(m-1,s)(i',j') and q(m-1,s)(i',j') are called the parents of p(m,s)(i,j) and q(m,s)(i,j), respectively, where [x] denotes the largest integer not exceeding x. Conversely, p(m,s)(i,j) and q(m,s)(i,j) are called children of p(m-1,s)(i',j') and q(m-1,s)(i',j'), respectively. The function parent(i,j) is defined by the following equation.

[0060]

(Equation 16)
$$\mathrm{parent}(i,j) = \left(\left[\frac{i}{2}\right], \left[\frac{j}{2}\right]\right)$$
The mapping f(m,s) between p(m,s)(i,j) and q(m,s)(k,l) is determined by computing energies and finding the mapping for which the energy is minimum. The value f(m,s)(i,j) = (k,l) is determined using f(m-1,s) (m = 1, 2, ..., n) as follows. First, the condition is imposed that q(m,s)(k,l) must lie inside the following quadrilateral, which narrows the mappings satisfying the bijection condition down to the more plausible ones.

[0061]

(Equation 17)
$$q^{(m,s)}_{g^{(m,s)}(i-1,j-1)}\,q^{(m,s)}_{g^{(m,s)}(i-1,j+1)}\,q^{(m,s)}_{g^{(m,s)}(i+1,j+1)}\,q^{(m,s)}_{g^{(m,s)}(i+1,j-1)}$$
where

(Equation 18)
$$g^{(m,s)}(i,j) = f^{(m-1,s)}\bigl(\mathrm{parent}(i,j)\bigr) + f^{(m-1,s)}\bigl(\mathrm{parent}(i,j) + (1,1)\bigr)$$
The quadrilateral defined in this way is hereinafter called the inherited quadrilateral of p(m,s)(i,j). The pixel that minimizes the energy inside the inherited quadrilateral is then found.
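The parent relation and the restriction of candidates to the inherited quadrilateral can be sketched as follows. The corner coordinates of the quadrilateral are taken as given (for example A', B', C', D' in FIG. 3), pixels on its boundary are counted as inside, the `energy` callback is a hypothetical stand-in for the total energy of mapping (i, j) to a candidate, and the bijection check is omitted; none of these names come from the source.

```python
def parent(i, j):
    """parent(i, j) = ([i/2], [j/2]), where [x] is the floor of x."""
    return (i // 2, j // 2)


def _on_segment(pt, a, b):
    """True if pt lies on the closed segment ab."""
    cross = (b[0] - a[0]) * (pt[1] - a[1]) - (b[1] - a[1]) * (pt[0] - a[0])
    return (cross == 0
            and min(a[0], b[0]) <= pt[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= pt[1] <= max(a[1], b[1]))


def inside_quadrilateral(pt, quad):
    """Even-odd ray casting; points on the boundary count as inside."""
    n = len(quad)
    if any(_on_segment(pt, quad[k], quad[(k + 1) % n]) for k in range(n)):
        return True
    x, y = pt
    inside = False
    for k in range(n):
        (x1, y1), (x2, y2) = quad[k], quad[(k + 1) % n]
        if (y1 > y) != (y2 > y):                 # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def best_in_inherited_quadrilateral(quad, size, energy):
    """Lowest-energy level-m pixel inside quad, or None if there is none."""
    cands = [(k, l) for k in range(size) for l in range(size)
             if inside_quadrilateral((k, l), quad)]
    return min(cands, key=energy) if cands else None
```

Restricting the search to this region is what keeps the per-pixel search at level m small: only the handful of pixels near the area inherited from level m-1 are scored.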

[0062] FIG. 3 illustrates this procedure. In the figure, pixels A, B, C, and D of the start image are mapped to A', B', C', and D' of the end image at level m-1. The pixel p(m,s)(i,j) must be mapped to a pixel q(m,s)f(m)(i,j) lying inside the inherited quadrilateral A'B'C'D'. In this way, the mapping at level m-1 is bridged to the mapping at level m.

[0063] To compute the submapping f(m,0) at level m, the energy E0 defined above is replaced by the following equation.

[0064]

(Equation 19)
$$E_{0(i,j)} = \bigl\|f^{(m,0)}(i,j) - g^{(m,0)}(i,j)\bigr\|^2$$
To compute the submapping f(m,s), the following equation is used.

(Equation 20)
$$E_{0(i,j)} = \bigl\|f^{(m,s)}(i,j) - f^{(m,s-1)}(i,j)\bigr\|^2 \quad (s \ge 1)$$
This yields a mapping that keeps the energies of all submappings low. Equation 20 relates the submappings for different singularities within the same level so that they are highly similar to one another. Equation 19 gives the distance between f(m,s)(i,j) and the position to which (i,j) should be projected when it is regarded as part of a pixel at level m-1.

[0065] If no pixel inside the inherited quadrilateral A'B'C'D' satisfies the bijection condition, the following measures are taken. First, the pixels at distance L from the boundary of A'B'C'D' are examined (initially L = 1). If the one among them with the smallest energy satisfies the bijection condition, it is selected as the value of f(m,s)(i,j). L is increased until such a point is found or until L reaches its upper limit L(m)max, which is fixed for each level m. If no such point is found at all, the third condition of bijection is temporarily ignored, mappings in which the destination quadrilateral has zero area are also accepted, and f(m,s)(i,j) is determined. If a point satisfying the conditions is still not found, the first and second conditions of bijection are dropped as well.
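The fallback order above amounts to a chain of progressively relaxed searches. The sketch below is one reading of it, not the source's code: `ring`, `energy`, and `check` are hypothetical callbacks; each search keeps the lowest-energy candidate that passes the current check; and reusing the same inside-then-rings order for the relaxed checks is an assumption, since the text does not say where the relaxed searches look.

```python
def choose_target(inside, ring, energy, check, l_max):
    """One reading of the fallback search (all helpers hypothetical).

    inside:          candidate pixels inside the inherited quadrilateral
    ring(L):         pixels at distance L from its boundary
    energy(c):       total energy if (i, j) is mapped to pixel c
    check(c, level): bijection test; level 0 = all conditions,
                     1 = zero-area quadrilaterals tolerated,
                     2 = conditions 1 and 2 dropped as well
    """
    def best(pool, level):
        ok = [c for c in pool if check(c, level)]
        return min(ok, key=energy) if ok else None

    for level in (0, 1, 2):                        # relax the conditions step by step
        pools = [inside] + [ring(L) for L in range(1, l_max + 1)]
        for pool in pools:                         # widen the search ring by ring
            c = best(pool, level)
            if c is not None:
                return c
    return None                                    # nothing found even when relaxed
```

Passing the checks in as callbacks keeps the control flow separate from the geometry, so the strict test could be, for example, a quadrilateral orientation test on the four neighbouring mapped points.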

[0066] The multi-resolution approximation is essential for determining the global correspondence between images while keeping the mapping from being swayed by image details. Without it, correspondences between distant pixels cannot be found; the images would then have to be limited to very small sizes, and only images with small changes could be handled. Moreover, because the mapping is normally required to be smooth, such correspondences are hard to find: the energy of mapping a pixel to a distant pixel is high. The multi-resolution approximation can find appropriate correspondences between such pixels, because their distance is small at the upper (coarser) levels of the resolution hierarchy.

[0067] [1.4] Automatic Determination of Optimal Parameter Values
One of the main drawbacks of existing matching techniques is the difficulty of parameter tuning. In most cases parameters are tuned by hand, and selecting the optimal values is extremely difficult. The method of the base technology determines the optimal parameter values fully automatically.

[0068] The system of the base technology has two parameters, λ and η. Briefly, λ is the weight of the luminance difference between pixels, and η represents the rigidity of the mapping. Both start at 0. First, η is fixed at 0 and λ is gradually increased from 0. As λ increases while the value of the comprehensive evaluation formula (Equation 14) is minimized, the value of C(m,s)f for each submapping generally decreases, which essentially means that the two images must match better. When λ exceeds its optimal value, however, the following phenomena occur.

[0069]
1. Pixels that should not correspond are wrongly paired merely because their luminances are close.
2. As a result, the correspondence between pixels becomes distorted and the mapping begins to break down.
3. As a result, D(m,s)f in Equation 14 tends to increase sharply.
4. As a result, because the value of Equation 14 tends to increase sharply, f(m,s) changes so as to suppress the sharp increase of D(m,s)f, and consequently C(m,s)f increases.
Therefore, while λ is increased and Equation 14 is kept at its minimum, the threshold at which C(m,s)f turns from decreasing to increasing is detected, and that λ is taken as the optimal value for η = 0. Next, η is increased little by little while the behavior of C(m,s)f is examined, and η is determined automatically by the method described later. λ is then determined to correspond to that η.
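The λ search can be sketched as a simple sweep that stops when C(m,s)f turns upward. This is a minimal illustration, assuming a hypothetical callback `c_of_lambda(lam)` that returns C(m,s)f for the submapping minimizing Equation 14 at that λ (with η fixed); the step of 0.01 is an assumed value for the unspecified "predetermined interval", and the cap of 0.1 follows the experiment described in [0080].

```python
def sweep_lambda(c_of_lambda, step=0.01, lam_max=0.1):
    """Return the lambda at which C_f stops decreasing.

    c_of_lambda(lam): hypothetical callback giving C_f for the submapping
    that minimises lam * C_f + D_f at this lam (eta held fixed).
    """
    best_lam, prev = 0.0, c_of_lambda(0.0)
    k = 1
    while k * step <= lam_max + 1e-12:      # multiply rather than accumulate,
        lam = k * step                      # to limit floating-point drift
        c = c_of_lambda(lam)
        if c > prev:                        # C_f turned from decreasing to increasing
            return best_lam
        best_lam, prev = lam, c
        k += 1
    return best_lam                         # cap reached without a turn
```

For a toy curve such as C(λ) = (λ - 0.05)^2, the sweep returns 0.05, the last value before C starts to rise.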

[0070] This method resembles the focusing mechanism of the human visual system, in which the images from the left and right eyes are matched while one eye is moved; when an object can be clearly recognized, the eye is fixed.

[0071] [1.4.1] Dynamic Determination of λ
λ is increased from 0 in fixed steps, and the submapping is evaluated each time λ changes. As Equation 14 shows, the total energy is defined by λC(m,s)f + D(m,s)f. D(m,s)f in Equation 9 represents smoothness; in theory it is smallest for the identity mapping, and both E0 and E1 increase as the mapping becomes more distorted. Since E1 is an integer, the smallest step of D(m,s)f is 1. Therefore, unless the current change (decrease) of λC(m,s)(i,j) is at least 1, the total energy cannot be reduced by changing the mapping: a change of the mapping increases D(m,s)f by 1 or more, so the total energy does not decrease unless λC(m,s)(i,j) decreases by 1 or more.

[0072] Under this condition, it is shown that C(m,s)(i,j) normally decreases as λ increases. Denote the histogram of C(m,s)(i,j) by h(l), the number of pixels whose energy C(m,s)(i,j) equals l^2. For λl^2 ≥ 1 to hold, consider, for example, the case l^2 = 1/λ. When λ changes by a small amount from λ1 to λ2,

[0073]

(Equation 21)

the A pixels given by Equation 21 change to a more stable state with the energy

(Equation 22)

Here, as an approximation, the energies of all these pixels are assumed to become zero. This expression shows that the value of C(m,s)f changes by

[0074]

(Equation 23)

and consequently

[0075]

(Equation 24)

holds. Since h(l) > 0, C(m,s)f normally decreases. When λ is about to exceed its optimal value, however, the phenomenon described above, namely an increase in C(m,s)f, occurs. The optimal value of λ is determined by detecting this phenomenon.

[0076] If, with H (H > 0) and k constants, it is assumed that

(Equation 25)

then

(Equation 26)

holds. If k ≠ -3, then

[0077]

(Equation 27)

This is the general formula for C(m,s)f (C is a constant).

[0078] When detecting the optimal value of λ, the number of pixels violating the bijection condition may also be checked as an additional safeguard. Assume that, when the mapping of each pixel is determined, the probability of violating the bijection condition is p0. In this case, since

[0079]

(Equation 28)

holds, the number of pixels violating the bijection condition increases at the rate

(Equation 29)

Therefore,

(Equation 30)

is a constant. If h(l) = Hl^k is assumed, then, for example,

$$B_0\,\lambda^{3/2+k/2} = \frac{p_0 H}{2} \qquad \text{(Equation 31)}$$

is a constant. However, when λ exceeds the optimal value, this quantity increases rapidly. The optimal value of λ can therefore be found by detecting this phenomenon. Concretely, one checks whether $B_0\lambda^{3/2+k/2}/2^m$ exceeds an abnormal value $B_{0thres}$.

Similarly, the rate of increase $B_1$ of pixels that violate the third bijectivity condition is monitored by checking whether $B_1\lambda^{3/2+k/2}/2^m$ exceeds an abnormal value $B_{1thres}$. The reason for introducing the factor $2^m$ is explained later.

The system is not sensitive to these two thresholds. They can be used to detect excessive distortion of the mapping that observing the energy $C^{(m,s)}_f$ alone fails to detect.

【００８０】[0080]

In the experiments, if λ exceeded 0.1 while the submapping $f^{(m,s)}$ was being computed, the computation of $f^{(m,s)}$ was stopped and the process moved on to $f^{(m,s+1)}$. For λ > 0.1, a difference of only 3 out of 255 intensity levels affected the submapping computation, which made it difficult to obtain correct results.

【００８１】[0081]

[1.4.2] Histogram h(l)

The check of $C^{(m,s)}_f$ does not depend on the histogram h(l). The checks of bijectivity and of its third condition, however, may be affected by h(l). When (λ, $C^{(m,s)}_f$) is actually plotted, k is usually close to 1. In the experiments, k = 1 was used, and $B_0\lambda^2$ and $B_1\lambda^2$ were examined.

If the true value of k is less than 1, $B_0\lambda^2$ and $B_1\lambda^2$ are not constant but increase gradually by the factor $\lambda^{(1-k)/2}$. For example, if h(l) is constant (k = 0), the factor is $\lambda^{1/2}$. Such differences can, however, be absorbed by setting the threshold $B_{0thres}$ appropriately.

【００８２】[0082]

Now assume that the start image is a circular object with center (x0, y0) and radius r:

$$p^{(n)}_{(i,j)} = \begin{cases} \dfrac{255}{r}\,c\!\left(\sqrt{(i-x_0)^2+(j-y_0)^2}\right), & \sqrt{(i-x_0)^2+(j-y_0)^2} \le r \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Equation 32)}$$

The end image, on the other hand, is assumed to be an object with center (x1, y1) and the same radius r:

【００８３】[0083]

$$q^{(n)}_{(i,j)} = \begin{cases} \dfrac{255}{r}\,c\!\left(\sqrt{(i-x_1)^2+(j-y_1)^2}\right), & \sqrt{(i-x_1)^2+(j-y_1)^2} \le r \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Equation 33)}$$

Here c(x) is assumed to have the form $c(x) = x^k$. When the centers (x0, y0) and (x1, y1) are sufficiently far apart, the histogram h(l) takes the following form.

【００８４】[0084]

(Equation 34)

When k = 1, the images show an object with a sharp boundary embedded in the background. Such an object is darkest at its center and becomes brighter toward its periphery.

When k = -1, the images show an object with a blurred boundary. Such an object is brightest at its center and becomes darker toward its periphery.

Ordinary objects can be regarded, without much loss of generality, as lying between these two types. Taking -1 ≤ k ≤ 1 therefore covers most cases and guarantees that Equation 27 is generally a decreasing function.

【００８５】[0085]

Note that, as can be seen from Equation 34, r depends on the resolution of the image; that is, r is proportional to $2^m$. This is why the factor $2^m$ was introduced in [1.4.1].

【００８６】[0086]

[1.4.3] Dynamic determination of η

The parameter η can be determined automatically in a similar way. The procedure is as follows:

- Set η = 0 and compute the final mapping $f^{(n)}$ and the energy $C^{(n)}_f$ at the finest resolution.
- Increase η by some amount Δη and recompute $f^{(n)}$ and $C^{(n)}_f$ at the finest resolution.
- Repeat until the optimal value is found.

η represents the stiffness of the mapping, because it is the weight of the following term:

【００８７】[0087]

$$E^{(m,s)}_{0(i,j)} = \left\| f^{(m,s)}(i,j) - f^{(m,s-1)}(i,j) \right\|^2 \qquad \text{(Equation 35)}$$

When η is 0, $D^{(n)}_f$ is determined independently of the immediately preceding submapping. The current submapping is then deformed elastically and becomes excessively distorted.

When η is very large, $D^{(n)}_f$ is almost entirely determined by the immediately preceding submapping. The submappings are then very stiff and the pixels are projected to the same locations, so the mapping becomes the identity mapping.

As η increases gradually from 0, $C^{(n)}_f$ decreases gradually, as described below. Once η exceeds its optimal value, however, the energy starts to increase, as shown in FIG. 4, where the X axis is η and the Y axis is $C_f$.

【００８８】[0088]

In this way, the optimal η that minimizes $C^{(n)}_f$ can be obtained. Compared with the case of λ, however, more factors affect the computation, so $C^{(n)}_f$ fluctuates slightly as it changes. The reason is that, for λ, only one submapping is recomputed each time the input changes by a small amount, whereas for η all the submappings are recomputed.

It is therefore not possible to tell immediately whether an obtained value of $C^{(n)}_f$ is the minimum. Once a candidate for the minimum is found, the true minimum must be searched for within a finer interval.
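Because $C^{(n)}_f$ fluctuates as a function of η, a single pass over a fixed grid may miss the true minimum. One simple way to realize the finer-interval search is repeated grid refinement around the best candidate, as sketched below. The callable `energy` is a hypothetical stand-in for recomputing $f^{(n)}$ and returning $C^{(n)}_f$ at a given η. The number of rounds and subdivisions are arbitrary choices, not values from the text.

```python
from typing import Callable


def refine_minimum(
    energy: Callable[[float], float],
    lo: float,
    hi: float,
    step: float,
    rounds: int = 3,
    subdivisions: int = 10,
) -> float:
    """Grid-search [lo, hi], then re-search a finer grid near the best point."""
    best = lo
    for _ in range(rounds):
        count = int(round((hi - lo) / step))
        grid = [lo + i * step for i in range(count + 1)]
        best = min(grid, key=energy)
        # Narrow the interval to one coarse step on each side of the candidate.
        lo, hi = max(lo, best - step), min(hi, best + step)
        step /= subdivisions
    return best


print(refine_minimum(lambda eta: (eta - 0.37) ** 2, 0.0, 1.0, 0.1))
```

A smooth toy energy is used here. With a genuinely noisy $C^{(n)}_f$, a wider refinement window (more than one coarse step on each side) may be safer.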

【００８９】[0089]

[1.5] Supersampling

To increase the degrees of freedom when determining the correspondence between pixels, the range of $f^{(m,s)}$ can be extended to R × R, where R is the set of real numbers. In this case, the intensities of the pixels of the end image are interpolated, so that $f^{(m,s)}$ is given an intensity V at non-integer points,

【００９０】[0090]

$$V\!\left(q^{(m,s)}_{f^{(m,s)}(i,j)}\right) \qquad \text{(Equation 36)}$$

That is, supersampling is performed. In the experiments, $f^{(m,s)}$ was allowed to take integer and half-integer values, and

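$$V\!\left(q^{(m,s)}_{(i,j)+(0.5,0.5)}\right) \qquad \text{(Equation 37)}$$

was given by

$$\frac{V\!\left(q^{(m,s)}_{(i,j)}\right) + V\!\left(q^{(m,s)}_{(i,j)+(1,1)}\right)}{2} \qquad \text{(Equation 38)}$$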
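The half-integer supersampling can be sketched as follows. Positions are stored in doubled integer coordinates so that half-integers stay exact. The diagonal case follows the averaging rule for the (0.5, 0.5) offset. Averaging the two neighbors when only one coordinate is a half-integer is an assumption, because the text only gives the diagonal case. `img[i][j]` is assumed to hold the intensity V of $q^{(m,s)}_{(i,j)}$.

```python
from typing import List


def intensity_half(img: List[List[float]], i2: int, j2: int) -> float:
    """Intensity at the point (i2 / 2, j2 / 2) of the end image.

    Integer points return img[i][j] unchanged; a half offset averages the
    pixel at (i, j) with the pixel one step further along the offset.
    """
    i, di = divmod(i2, 2)
    j, dj = divmod(j2, 2)
    return (img[i][j] + img[i + di][j + dj]) / 2.0


img = [[0.0, 10.0], [20.0, 30.0]]
print(intensity_half(img, 1, 1))  # point (0.5, 0.5): (0 + 30) / 2
```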

【００９１】[0091]

[1.6] Normalization of the pixel intensities of each image

When the start and end images contain very different objects, the original pixel intensities are difficult to use directly for computing the mapping. Because of the large intensity differences, the intensity-related energy $C^{(m,s)}_f$ becomes too large, which makes correct evaluation difficult.

【００９２】[0092]

For example, consider matching a human face with a cat's face, as shown in FIGS. 20(a) and 20(b). The cat's face is covered with fur and contains a mix of very bright and very dark pixels.

In this case, the subimages are first normalized before the submappings between the two faces are computed. The intensity of the darkest pixel is set to 0 and that of the brightest pixel to 255, and the intensities of the other pixels are obtained by linear interpolation.

【００９３】[0093]

[1.7] Implementation

An inductive method is used in which the computation proceeds linearly along a scan of the start image:

- The value of $f^{(m,s)}$ is first determined for the top-left pixel (i, j) = (0, 0).
- Each $f^{(m,s)}(i,j)$ is then determined while i is incremented by 1.
- When i reaches the width of the image, j is incremented by 1 and i is reset to 0.
- In this way, $f^{(m,s)}(i,j)$ is determined as the scan of the start image progresses.

Once the pixel correspondence has been determined for all points, one mapping $f^{(m,s)}$ is determined.

【００９４】[0094]

Once the corresponding point $q_{f(i,j)}$ of a point p(i, j) has been determined, the corresponding point $q_{f(i,j+1)}$ of p(i, j+1) is determined next. To satisfy the bijectivity condition, the position of $q_{f(i,j+1)}$ is constrained by the position of $q_{f(i,j)}$. Points whose correspondence is determined earlier therefore have higher priority in this system.

If (0, 0) always had the highest priority, an unwanted bias would be added to the final mapping. To avoid this, the base technology determines $f^{(m,s)}$ as follows.

【００９５】[0095]

The scan order depends on the value of (s mod 4):

- When (s mod 4) is 0, the scan starts at (0, 0), and i and j are both increased.
- When (s mod 4) is 1, the scan starts at the right end of the top row; i is decreased and j is increased.
- When (s mod 4) is 2, the scan starts at the right end of the bottom row; i and j are both decreased.
- When (s mod 4) is 3, the scan starts at the left end of the bottom row; i is increased and j is decreased.

At the n-th level, which has the finest resolution, the concept of submappings (that is, the parameter s) does not exist. There, the two directions were computed in succession, treating them as s = 0 and s = 2.
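The four scan orders can be expressed as a generator. This sketch rests on two assumptions: i indexes columns and j indexes rows with j = 0 at the top, and i is the inner loop, following the row-by-row scan of [0093].

```python
from typing import Iterator, Tuple


def scan_order(width: int, height: int, s: int) -> Iterator[Tuple[int, int]]:
    """Yield (i, j) in the order used when computing submapping s."""
    r = s % 4
    # Columns run left to right for r = 0, 3 and right to left for r = 1, 2.
    i_range = range(width) if r in (0, 3) else range(width - 1, -1, -1)
    # Rows run top to bottom for r = 0, 1 and bottom to top for r = 2, 3.
    j_range = range(height) if r in (0, 1) else range(height - 1, -1, -1)
    for j in j_range:
        for i in i_range:
            yield (i, j)


print(list(scan_order(2, 2, 1)))
```

At the finest level, where s does not exist, the two passes described above would correspond to calling it with s = 0 and then s = 2.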

【００９６】[0096]

In the actual implementation, candidates that violate the bijectivity condition were penalized. This way, values of $f^{(m,s)}(i,j)$ (m = 0, ..., n) that satisfy the bijectivity condition were chosen from the candidates (k, l) wherever possible.

The energy D(k, l) of a candidate that violates the third condition is multiplied by φ. The energy of a candidate that violates the first or second condition is multiplied by ψ. The values φ = 2 and ψ = 100000 were used.
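Candidate selection with penalties can be sketched as below. φ = 2 and ψ = 100000 are the values quoted above. Two details are assumptions: the tuple layout of a candidate, and applying only ψ when a candidate violates both kinds of condition, since the text does not cover that case.

```python
from typing import List, Tuple

PHI = 2.0        # multiplier for violating the third condition
PSI = 100000.0   # multiplier for violating the first or second condition

# (candidate (k, l), energy D(k, l), violates 1st/2nd, violates 3rd)
Candidate = Tuple[Tuple[int, int], float, bool, bool]


def penalized_energy(d: float, breaks_first_or_second: bool, breaks_third: bool) -> float:
    if breaks_first_or_second:
        return d * PSI
    if breaks_third:
        return d * PHI
    return d


def choose_candidate(candidates: List[Candidate]) -> Tuple[int, int]:
    """Return the (k, l) whose penalized energy is smallest."""
    best = min(candidates, key=lambda c: penalized_energy(c[1], c[2], c[3]))
    return best[0]


cands = [((0, 0), 1.0, True, False), ((0, 1), 3.0, False, True), ((1, 1), 5.0, False, False)]
print(choose_candidate(cands))
```

Because the penalties are multiplicative, a violating candidate whose D is already zero is not penalized at all. The text does not say how such cases were handled.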

【００９７】[0097]

To check the bijectivity condition described above, the following test was performed when determining (k, l) = $f^{(m,s)}(i,j)$. For each lattice point (k, l) contained in the inherited quadrilateral of $f^{(m,s)}(i,j)$, it is checked whether the z component of the outer product

【００９８】[0098]

$$W = \vec{A} \times \vec{B} \qquad \text{(Equation 39)}$$

is 0 or greater, where

$$\vec{A} = \overrightarrow{q^{(m,s)}_{f^{(m,s)}(i,j-1)}\;q^{(m,s)}_{f^{(m,s)}(i+1,j-1)}} \qquad \text{(Equation 40)}$$

$$\vec{B} = \overrightarrow{q^{(m,s)}_{f^{(m,s)}(i,j-1)}\;q^{(m,s)}_{(k,l)}} \qquad \text{(Equation 41)}$$

The vectors are treated as three-dimensional, with the z axis defined in an orthogonal right-handed coordinate system. If W is negative, the candidate is penalized by multiplying $D^{(m,s)}_{(k,l)}$ by ψ, so that it is avoided wherever possible.
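The orientation test reduces to the sign of a 2D cross product. For two vectors lying in the image plane, the z component of their outer product is $a_x b_y - a_y b_x$.

In the sketch below, `q_prev` and `q_prev_next` stand for the already-determined points $q_{f(i,j-1)}$ and $q_{f(i+1,j-1)}$, and `q_cand` stands for the candidate $q_{(k,l)}$. These names are illustrative assumptions, as is taking A and B as edge vectors that start at `q_prev`.

```python
from typing import Tuple

Point = Tuple[float, float]


def orientation_ok(q_prev: Point, q_prev_next: Point, q_cand: Point) -> bool:
    """True if the z component of A x B is >= 0, where
    A = q_prev -> q_prev_next and B = q_prev -> q_cand."""
    ax, ay = q_prev_next[0] - q_prev[0], q_prev_next[1] - q_prev[1]
    bx, by = q_cand[0] - q_prev[0], q_cand[1] - q_prev[1]
    return ax * by - ay * bx >= 0


print(orientation_ok((0, 0), (1, 0), (0, 1)))   # candidate on the accepted side
print(orientation_ok((0, 0), (1, 0), (0, -1)))  # candidate would be penalized by psi
```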

【００９９】[0099]

FIGS. 5(a) and 5(b) show why this condition is checked. FIG. 5(a) shows a candidate without a penalty, and FIG. 5(b) shows a candidate with a penalty.

Consider determining the mapping $f^{(m,s)}(i,j+1)$ for the adjacent pixel (i, j+1). If the z component of W is negative, no pixel on the source image plane satisfies the bijectivity condition, because $q^{(m,s)}_{(k,l)}$ crosses the boundary of the adjacent quadrilateral.

【０１００】[0100]

[1.7.1] Order of submappings

In the implementation, the following orders were used:

- When the resolution level was even: σ(0) = 0, σ(1) = 1, σ(2) = 2, σ(3) = 3, σ(4) = 0.
- When it was odd: σ(0) = 3, σ(1) = 2, σ(2) = 1, σ(3) = 0, σ(4) = 3.

This shuffled the submappings moderately. Note that there are only four types of submappings, so s is one of 0 to 3. In practice, however, processing corresponding to s = 4 was also performed. The reason is described later.
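The submapping order can be tabulated directly. The sketch below also lists, for each step after the first, which submapping it is made similar to under the horizontal reference within a level. Reading the fifth entry as a recomputation of the first submapping, constrained by the one computed just before it, reflects the s = 4 step described later in this document. The pairing itself is an interpretation, not something the text states.

```python
from typing import List, Tuple


def submapping_order(level: int) -> List[int]:
    """sigma(0..4): even levels 0,1,2,3,0; odd levels 3,2,1,0,3."""
    return [0, 1, 2, 3, 0] if level % 2 == 0 else [3, 2, 1, 0, 3]


def reference_pairs(level: int) -> List[Tuple[int, int]]:
    """(submapping being computed, submapping it is made similar to)."""
    order = submapping_order(level)
    return list(zip(order[1:], order[:-1]))


print(reference_pairs(2))
```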

【０１０１】[0101]

[1.8] Interpolation

After the mapping between the start and end images has been determined, the intensities of corresponding pixels are interpolated. Trilinear interpolation was used in the experiments.

Assume that the square p(i, j) p(i+1, j) p(i, j+1) p(i+1, j+1) on the start image plane is projected onto the quadrilateral $q_{f(i,j)}\,q_{f(i+1,j)}\,q_{f(i,j+1)}\,q_{f(i+1,j+1)}$ on the end image plane. For simplicity, the distance between the images is taken to be 1.

A pixel r(x, y, t) (0 ≤ x ≤ N-1, 0 ≤ y ≤ M-1) of the intermediate image at distance t (0 ≤ t ≤ 1) from the start image plane is obtained as follows. First, the position of r(x, y, t), where x, y, t ∈ R, is given by

【０１０２】[0102]

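$$\begin{aligned} r(x,y,t) ={}& (1-dx)(1-dy)(1-t)\,(i,j) + (1-dx)(1-dy)\,t\,f(i,j) \\ &+ dx(1-dy)(1-t)\,(i+1,j) + dx(1-dy)\,t\,f(i+1,j) \\ &+ (1-dx)\,dy\,(1-t)\,(i,j+1) + (1-dx)\,dy\,t\,f(i,j+1) \\ &+ dx\,dy\,(1-t)\,(i+1,j+1) + dx\,dy\,t\,f(i+1,j+1) \end{aligned} \qquad \text{(Equation 42)}$$

Next, the intensity of the pixel at r(x, y, t) is determined by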

【０１０３】[0103]

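$$\begin{aligned} V(r(x,y,t)) ={}& (1-dx)(1-dy)(1-t)\,V(p_{(i,j)}) + (1-dx)(1-dy)\,t\,V(q_{f(i,j)}) \\ &+ dx(1-dy)(1-t)\,V(p_{(i+1,j)}) + dx(1-dy)\,t\,V(q_{f(i+1,j)}) \\ &+ (1-dx)\,dy\,(1-t)\,V(p_{(i,j+1)}) + (1-dx)\,dy\,t\,V(q_{f(i,j+1)}) \\ &+ dx\,dy\,(1-t)\,V(p_{(i+1,j+1)}) + dx\,dy\,t\,V(q_{f(i+1,j+1)}) \end{aligned} \qquad \text{(Equation 43)}$$

Here dx and dy are parameters that vary from 0 to 1.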
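The trilinear interpolation weights the four corners of the source square by bilinear factors in dx and dy, and blends each corner's start and end values linearly in t. The sketch below implements that weighting for both position and intensity.

The callables `f`, `v_start`, and `v_end` are hypothetical interfaces chosen for illustration:

- `f` is the mapping.
- `v_start` gives the start-image intensity at a corner.
- `v_end` gives the end-image intensity at that corner's corresponding point.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def interpolate_pixel(
    i: int, j: int, dx: float, dy: float, t: float,
    f: Callable[[int, int], Point],
    v_start: Callable[[int, int], float],
    v_end: Callable[[int, int], float],
) -> Tuple[Point, float]:
    """Position and intensity of r(x, y, t) inside the square at (i, j).

    f(a, b) is the corresponding point of p(a, b) in the end image,
    v_start(a, b) is V(p(a, b)), and v_end(a, b) is V(q_f(a, b)).
    """
    corners = [
        ((i, j), (1 - dx) * (1 - dy)),
        ((i + 1, j), dx * (1 - dy)),
        ((i, j + 1), (1 - dx) * dy),
        ((i + 1, j + 1), dx * dy),
    ]
    x = y = v = 0.0
    for (a, b), w in corners:
        fa, fb = f(a, b)
        # Linear blend in t between the start corner and its correspondent.
        x += w * ((1 - t) * a + t * fa)
        y += w * ((1 - t) * b + t * fb)
        v += w * ((1 - t) * v_start(a, b) + t * v_end(a, b))
    return (x, y), v


def shift(a: int, b: int) -> Point:
    return (a + 2.0, float(b))  # toy mapping: translate by 2 along x


print(interpolate_pixel(0, 0, 0.5, 0.5, 0.5, shift, lambda a, b: 10.0, lambda a, b: 20.0))
```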

【０１０４】[0104]

[1.9] Mapping under constraints

So far, the determination of a mapping when no constraints exist has been described. However, when correspondences between particular pixels of the start and end images are specified in advance, the mapping can be determined using them as constraints.

【０１０５】[0105]

The basic idea has two stages. First, the start image is deformed roughly, using a rough mapping that moves the specified pixels of the start image to the specified pixels of the end image. Then the mapping f is computed accurately.

【０１０６】[0106]

First, a rough mapping is determined. It projects the specified pixels of the start image onto the specified pixels of the end image and projects the other pixels of the start image to appropriate positions. In other words, under this mapping, pixels close to a specified pixel are projected near the location to which that specified pixel is projected. The rough mapping at the m-th level is denoted $F^{(m)}$.

【０１０７】[0107]

The rough mapping F is determined as follows. First, the mapping is specified for several pixels. When $n_s$ pixels of the start image,

$$p^{(n)}_{(i_0,j_0)},\ p^{(n)}_{(i_1,j_1)},\ \ldots,\ p^{(n)}_{(i_{n_s-1},\,j_{n_s-1})} \qquad \text{(Equation 44)}$$

are specified, the following values are determined:

$$F^{(n)}(i_0,j_0) = (k_0,l_0),\ F^{(n)}(i_1,j_1) = (k_1,l_1),\ \ldots,\ F^{(n)}(i_{n_s-1},j_{n_s-1}) = (k_{n_s-1},l_{n_s-1}) \qquad \text{(Equation 45)}$$

The displacement of each of the other pixels of the start image is a weighted average of the displacements of $p_{(i_h,j_h)}$ (h = 0, ..., $n_s$-1). That is, pixel p(i, j) is projected onto the following pixel of the end image:

【０１０８】[0108]

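$$F^{(m)}(i,j) = (i,j) + \sum_{h=0}^{n_s-1} \frac{(k_h - i_h,\ l_h - j_h)\,\mathrm{weight}_h(i,j)}{2^{n-m}} \qquad \text{(Equation 46)}$$

where

$$\mathrm{weight}_h(i,j) = \frac{1/\|(i_h - i,\ j_h - j)\|^2}{\mathrm{total\_weight}(i,j)} \qquad \text{(Equation 47)}$$

$$\mathrm{total\_weight}(i,j) = \sum_{h=0}^{n_s-1} \frac{1}{\|(i_h - i,\ j_h - j)\|^2} \qquad \text{(Equation 48)}$$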
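A possible implementation of the rough mapping is sketched below. The prose describes a weighted average of anchor displacements; this sketch uses inverse squared distance weights. Handling a coarser level m by dividing finest-level anchor coordinates and displacements by $2^{n-m}$ (the `scale` argument) is an assumption of the sketch.

```python
from typing import List, Tuple

Anchor = Tuple[Tuple[int, int], Tuple[int, int]]  # ((i_h, j_h), (k_h, l_h))


def rough_mapping(i: float, j: float, anchors: List[Anchor], scale: float = 1.0) -> Tuple[float, float]:
    """Rough mapping F(i, j): (i, j) plus a weighted average of anchor displacements.

    Weights are 1 / squared distance to each anchor (an assumption); scale is
    2 ** (n - m), used to express finest-level anchors at level m.
    """
    if not anchors:
        return (float(i), float(j))
    sum_x = sum_y = total = 0.0
    for (ih, jh), (kh, lh) in anchors:
        ai, aj = ih / scale, jh / scale                 # anchor position at this level
        dx, dy = (kh - ih) / scale, (lh - jh) / scale   # anchor displacement at this level
        d2 = (ai - i) ** 2 + (aj - j) ** 2
        if d2 == 0:
            return (i + dx, j + dy)  # the pixel is itself an anchor
        w = 1.0 / d2
        sum_x += w * dx
        sum_y += w * dy
        total += w
    return (i + sum_x / total, j + sum_y / total)


print(rough_mapping(1, 0, [((0, 0), (1, 0)), ((2, 0), (2, 0))]))
```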

【０１０９】[0109]

Next, the energy $D^{(m,s)}_{(i,j)}$ of a candidate mapping f is modified so that candidates close to $F^{(m)}$ have lower energy. Precisely, $D^{(m,s)}_{(i,j)}$ is

$$D^{(m,s)}_{(i,j)} = \eta E^{(m,s)}_{0(i,j)} + E^{(m,s)}_{1(i,j)} + \kappa E^{(m,s)}_{2(i,j)} \qquad \text{(Equation 49)}$$

where

$$E^{(m,s)}_{2(i,j)} = \begin{cases} 0, & \left\|F^{(m)}(i,j) - f^{(m,s)}(i,j)\right\|^2 \le \left\lfloor \dfrac{\rho^2}{2^{2(n-m)}} \right\rfloor \\ \left\|F^{(m)}(i,j) - f^{(m,s)}(i,j)\right\|^2, & \text{otherwise} \end{cases} \qquad \text{(Equation 50)}$$

and κ, ρ ≥ 0. Finally, f is fully determined by the automatic mapping computation process described above.

【０１１０】[0110]

Note that $E^{(m,s)}_{2(i,j)}$ becomes 0 when $f^{(m,s)}(i,j)$ is sufficiently close to $F^{(m)}(i,j)$, that is, when the squared distance between them is at most

$$\left\lfloor \frac{\rho^2}{2^{2(n-m)}} \right\rfloor \qquad \text{(Equation 51)}$$

This definition was chosen so that each $f^{(m,s)}(i,j)$ is determined automatically and settles at an appropriate position in the end image, as long as it stays sufficiently close to $F^{(m)}(i,j)$. For this reason, the exact correspondence need not be specified in detail, and the start image is automatically mapped to match the end image.
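The constraint energy can be sketched as a dead-zone penalty. It is zero inside a tolerance that shrinks toward coarser levels and equals the squared distance outside it. This assumes the tolerance $\lfloor \rho^2 / 2^{2(n-m)} \rfloor$ is compared with the squared distance between $F^{(m)}(i,j)$ and $f^{(m,s)}(i,j)$. The parameter values in the example are arbitrary.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def constraint_energy(F_ij: Point, f_ij: Point, rho: float, n: int, m: int) -> float:
    """E2: 0 within the level-dependent tolerance, squared distance otherwise."""
    d2 = (F_ij[0] - f_ij[0]) ** 2 + (F_ij[1] - f_ij[1]) ** 2
    tolerance = math.floor(rho ** 2 / 2 ** (2 * (n - m)))
    return 0.0 if d2 <= tolerance else float(d2)


print(constraint_energy((0, 0), (1, 1), rho=4.0, n=3, m=3))  # tolerance 16, d2 = 2
print(constraint_energy((0, 0), (3, 0), rho=4.0, n=3, m=1))  # tolerance 1, d2 = 9
```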

【０１１１】[0111]

[2] Concrete processing procedure

This section describes the flow of processing that uses the element techniques of [1]. FIG. 6 is a flowchart of the overall procedure of the base technology. Processing with the multiresolutional singular point filters is performed first (S1), and then the start and end images are matched (S2).

S2 is not essential, however. Processing such as image recognition may instead be performed based on the image features obtained in S1.

【０１１２】[0112]

FIG. 7 is a flowchart showing the details of S1 in FIG. 6. Here it is assumed that the start and end images are matched in S2. The steps are:

- The start image is first hierarchized with the singular point filters (S10) to obtain a series of start hierarchical images.
- The end image is then hierarchized in the same way (S11) to obtain a series of end hierarchical images.

The order of S10 and S11 is arbitrary, and the start and end hierarchical images can also be generated in parallel.

【０１１３】[0113]

FIG. 8 is a flowchart showing the details of S10 in FIG. 7. The size of the original start image is $2^n \times 2^n$. Since the start hierarchical images are generated starting from the finest resolution, the parameter m, which indicates the resolution level being processed, is set to n (S100).

Next, singular points are detected with the singular point filters in the m-th level images $p^{(m,0)}$, $p^{(m,1)}$, $p^{(m,2)}$, $p^{(m,3)}$ (S101). The (m-1)-th level images $p^{(m-1,0)}$, $p^{(m-1,1)}$, $p^{(m-1,2)}$, $p^{(m-1,3)}$ are then generated (S102).

Since m = n here, $p^{(m,0)} = p^{(m,1)} = p^{(m,2)} = p^{(m,3)} = p^{(n)}$, and four types of subimages are generated from the single start image.

【０１１４】[0114]

FIG. 9 shows the correspondence between part of an m-th level image and part of an (m-1)-th level image. The numbers in the figure are pixel intensities. $p^{(m,s)}$ in the figure stands for any of the four images $p^{(m,0)}$ to $p^{(m,3)}$; when $p^{(m-1,0)}$ is generated, $p^{(m,s)}$ is taken to be $p^{(m,0)}$.

Consider the block whose intensities are shown in the figure. According to the rule given in [1.2], each filter picks one of the block's four pixels:

- $p^{(m-1,0)}$ takes "3".
- $p^{(m-1,1)}$ takes "8".
- $p^{(m-1,2)}$ takes "6".
- $p^{(m-1,3)}$ takes "10".

Each filter replaces the block with the single pixel it took. The subimages at the (m-1)-th level are therefore $2^{m-1} \times 2^{m-1}$ in size.

【０１１５】[0115]

Next, m is decremented (S103 in FIG. 8), and it is confirmed that m is not negative (S104). The process then returns to S101 to generate the subimages at the next coarser resolution. As a result of this iteration, S10 ends when m = 0, that is, when the 0th-level subimages have been generated. The 0th-level subimages are 1 × 1 in size.
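The hierarchy construction of FIGS. 8 to 10 can be sketched as below. The min/max combinations used for the four series are an assumption about the rule of [1.2], which lies outside this excerpt. They were chosen because, for a hypothetical 2 × 2 block laid out as [[3, 6], [8, 10]], they reproduce the values 3, 8, 6 and 10 quoted for FIG. 9.

```python
from typing import Dict, List

Image = List[List[float]]


def reduce_level(img: Image, s: int) -> Image:
    """Halve the resolution of img with the (assumed) filter for series s."""
    half = len(img) // 2
    out = []
    for i in range(half):
        row = []
        for j in range(half):
            a = (img[2 * i][2 * j], img[2 * i][2 * j + 1])
            b = (img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1])
            if s == 0:
                v = min(min(a), min(b))   # min of mins
            elif s == 1:
                v = max(min(a), min(b))   # max of mins
            elif s == 2:
                v = min(max(a), max(b))   # min of maxes
            else:
                v = max(max(a), max(b))   # max of maxes
            row.append(v)
        out.append(row)
    return out


def build_hierarchy(img: Image, n: int) -> Dict[int, List[Image]]:
    """Series s -> [level n, level n-1, ..., level 0] images (S100 to S104)."""
    series = {}
    for s in range(4):
        levels = [img]  # only the original image is shared by the four series
        for _ in range(n):
            levels.append(reduce_level(levels[-1], s))
        series[s] = levels
    return series


print({s: levels[-1] for s, levels in build_hierarchy([[3, 6], [8, 10]], 1).items()})
```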

【０１１６】[0116]

FIG. 10 illustrates the start hierarchical images generated in S10 for n = 3. Only the initial start image is common to the four series; after that, the subimages are generated independently according to the type of singular point.

The processing of FIG. 8 is also used in S11 of FIG. 7, and the end hierarchical images are generated by the same procedure. This completes the processing of S1 in FIG. 6.

【０１１７】[0117]

In the base technology, the matching evaluation is prepared before proceeding to S2 in FIG. 6. FIG. 11 shows the procedure.

First, several evaluation equations are set up (S30). These are the pixel energy $C^{(m,s)}_f$ introduced in [1.3.2.1] and the mapping smoothness energy $D^{(m,s)}_f$ introduced in [1.3.2.2].

Next, these evaluation equations are combined into a total evaluation equation (S31). This is the total energy $\lambda C^{(m,s)}_f + D^{(m,s)}_f$ introduced in [1.3.2.3]. Using η introduced in [1.3.2.2], it becomes

$$\sum_{i=0}^{2^m-1}\sum_{j=0}^{2^m-1}\left(\lambda C^{(m,s)}_{(i,j)} + \eta E^{(m,s)}_{0(i,j)} + E^{(m,s)}_{1(i,j)}\right) \qquad \text{(Equation 52)}$$

where i and j each run over 0, 1, ..., $2^m - 1$. This completes the preparation for the matching evaluation.

【０１１８】[0118]

FIG. 12 is a flowchart showing the details of S2 in FIG. 6. As described in [1], the start and end hierarchical images are matched between images at the same resolution level. To obtain good global matching between the images, matching is computed starting from the coarsest level.

Because the start and end hierarchical images are generated with the singular point filters, the positions and intensities of singular points are clearly preserved even at coarse resolution levels. As a result, global matching is much better than with conventional methods.

【０１１９】[0119]

As shown in FIG. 12, the coefficient parameter η is first set to 0 and the level parameter m to 0 (S20). Next, matching is computed between each of the four m-th level subimages of the start hierarchical images and the corresponding m-th level subimage of the end hierarchical images. This yields four submappings $f^{(m,s)}$ (s = 0, 1, 2, 3), each satisfying the bijectivity condition and minimizing the energy (S21). The bijectivity condition is checked using the inherited quadrilaterals described in [1.3.3].

As Equations 17 and 18 show, the submappings at the m-th level are constrained by those at the (m-1)-th level, so the matchings at coarser resolution levels are used in turn. This is a vertical reference between different levels. Since m = 0 at this point, there is no coarser level; this exceptional case is described later with reference to FIG. 13.

【０１２０】[0120]

A horizontal reference within the same level is also made. As in Equation 20 of [1.3.3], each submapping is determined to be similar to the previous one: $f^{(m,3)}$ to $f^{(m,2)}$, $f^{(m,2)}$ to $f^{(m,1)}$, and $f^{(m,1)}$ to $f^{(m,0)}$.

The reason is that, even though the types of singular point differ, the subimages all originate from the same start and end images, so it would be unnatural for the submappings to be completely different. As Equation 20 shows, the closer the submappings are to one another, the smaller the energy and the better the matching is considered to be.

【０１２１】[0121]

For $f^{(m,0)}$, which is determined first, there is no submapping at the same level to refer to, so the next coarser level is referred to instead, as shown in Equation 19.

In the experiments, however, after $f^{(m,3)}$ had been obtained, $f^{(m,0)}$ was updated once using $f^{(m,3)}$ as a constraint. This is equivalent to substituting s = 4 into Equation 20 and taking $f^{(m,4)}$ as the new $f^{(m,0)}$. It avoids the tendency for the relation between $f^{(m,0)}$ and $f^{(m,3)}$ to become too weak, and it improved the experimental results.

The experiments also used two further measures:

- The submappings were shuffled as described in [1.7.1]. This likewise keeps the submappings, which are inherently determined per singular-point type, closely related.
- As described in [1.7], the start point of the scan is changed according to the value of s. This avoids a bias that depends on where processing starts.

【０１２２】[0122]

FIG. 13 shows how the submappings are determined at the 0th level. At the 0th level, each subimage consists of only one pixel, so the four submappings $f^{(0,s)}$ are all automatically the identity mapping.

FIG. 14 shows how the submappings are determined at the first level. At the first level, each subimage consists of four pixels, shown with solid lines in the figure. To find the corresponding point of a point x of $p^{(1,s)}$ in $q^{(1,s)}$, the following procedure is used.

【０１２３】[0123]

1. At the first-level resolution, find the upper-left point a, upper-right point b, lower-left point c, and lower-right point d of point x.
2. Find the pixels to which points a to d belong at the next coarser level, that is, the 0th level. In FIG. 14, points a to d belong to pixels A to D, respectively. Pixels A to C are virtual pixels that do not actually exist.
3. Plot in $q^{(1,s)}$ the corresponding points A' to D' of pixels A to D, which have already been determined at the 0th level. Pixels A' to C' are virtual pixels assumed to lie at the same positions as pixels A to C, respectively.
4. Regard the corresponding point a' of point a in pixel A as lying in pixel A', and plot point a'. The position that point a occupies within pixel A (here, the lower right) is assumed to be the same as the position that point a' occupies within pixel A'.
5. Plot the corresponding points b' to d' in the same way as in step 4, and form the inherited quadrilateral from points a' to d'.
6. Search for the corresponding point x' of point x that minimizes the energy within the inherited quadrilateral. The candidates for x' may be limited, for example, to pixels whose centers lie within the inherited quadrilateral. In FIG. 14, all four pixels are candidates.
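Steps 1 to 6 can be sketched as follows. The sketch rests on several assumptions:

- a to d are the four diagonal neighbors of x.
- Integer lattice points stand for pixel centers.
- A virtual parent pixel maps to itself.
- The inherited quadrilateral is convex.

`f_prev` is a hypothetical callable giving the coarser-level mapping. With x = (0, 0) at level 1 and an identity level-0 mapping, the sketch reproduces the FIG. 14 situation, in which all four pixels are candidates.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def inherited_quadrilateral(
    x: Point, f_prev: Callable[[int, int], Point], size_prev: int
) -> List[Point]:
    """Corners a', b', c', d' of the inherited quadrilateral of point x.

    f_prev is the mapping already found at the next coarser level, whose
    images are size_prev x size_prev. Parent pixels outside that image are
    virtual and are assumed to map to themselves.
    """
    i, j = x
    # a: upper left, b: upper right, c: lower left, d: lower right
    neighbours = [(i - 1, j - 1), (i + 1, j - 1), (i - 1, j + 1), (i + 1, j + 1)]
    corners = []
    for ci, cj in neighbours:
        pi, pj = ci // 2, cj // 2  # parent pixel; floor division maps -1 to -1
        if 0 <= pi < size_prev and 0 <= pj < size_prev:
            qi, qj = f_prev(pi, pj)
        else:
            qi, qj = pi, pj
        # Keep the same sub-position inside the parent pixel (step 4).
        corners.append((2 * qi + (ci - 2 * pi), 2 * qj + (cj - 2 * pj)))
    return corners


def candidates(corners: List[Point], size: int) -> List[Point]:
    """Pixels (k, l) whose centres lie in the quadrilateral (assumed convex)."""
    a, b, c, d = corners
    polygon = [a, b, d, c]  # walk the boundary in order
    found = []
    for k in range(size):
        for l in range(size):
            signs = set()
            for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
                cross = (x2 - x1) * (l - y1) - (y2 - y1) * (k - x1)
                if cross != 0:
                    signs.add(cross > 0)
            if len(signs) <= 1:  # never on opposite sides of two edges
                found.append((k, l))
    return found


quad = inherited_quadrilateral((0, 0), lambda a, b: (a, b), size_prev=1)
print(quad, candidates(quad, size=2))
```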

【０１２４】[0124]

This completes the procedure for determining the corresponding point of a point x. The same processing is performed for all the other points to determine the submapping. At the second and higher levels, the inherited quadrilaterals are expected to become gradually distorted. As a result, pixels A' to D' can become spaced apart, as shown in FIG. 3.

【０１２５】[0125]

Once the four submappings at the m-th level have been determined in this way, m is incremented (S22 in FIG. 12). After confirming that m does not exceed n (S23), the process returns to S21.

Each time the process returns to S21, submappings at a progressively finer resolution level are obtained. On the last return to S21, the n-th level mapping $f^{(n)}$ is determined. Since this mapping is determined for η = 0, it is written $f^{(n)}(\eta=0)$.

[0126] Next, to obtain mappings for other values of η, η is shifted by Δη and m is cleared to zero (S24). After confirming that the new η does not exceed a predetermined search cutoff value ηmax (S25), the process returns to S21 and obtains the mapping f(n)(η=Δη) for the current η. Repeating this obtains f(n)(η=iΔη) (i = 0, 1, ...) in S21. When η exceeds ηmax, the process proceeds to S26. There, the optimal η = ηopt is determined by a method described later, and f(n)(η=ηopt) is taken as the final mapping f(n).
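The nested loop of S21 to S26 can be summarized by the order in which (η, m) pairs are processed. The hypothetical Python helper below reproduces only that ordering. For each η = iΔη up to ηmax, levels m = 0 to n are solved from coarse to fine, and then η is shifted again. The submapping computation itself (S21) and the choice of ηopt (S26) are deliberately left out.

```python
def sweep_schedule(n, d_eta, eta_max):
    """Order in which (eta, m) pairs are visited by S21-S25 (hypothetical helper).

    For each eta = i * d_eta that does not exceed eta_max, levels m = 0..n are
    processed from coarse to fine before eta is shifted again (S24).
    """
    schedule = []
    i = 0
    while True:
        eta = i * d_eta
        # S25: stop once eta exceeds the cutoff (small tolerance for round-off).
        if eta > eta_max + 1e-9 * d_eta:
            break
        # S21-S23: solve level m, increment m, repeat while m <= n.
        for m in range(n + 1):
            schedule.append((eta, m))
        i += 1                        # S24: shift eta by d_eta, clear m
    return schedule


print(sweep_schedule(1, 0.5, 1.0))
# [(0.0, 0), (0.0, 1), (0.5, 0), (0.5, 1), (1.0, 0), (1.0, 1)]
```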

[0127] FIG. 15 is a flowchart showing the details of S21 in FIG. 12. This flowchart determines the submappings at the m-th level for a fixed η. In determining the submappings, the base technology determines the optimal λ independently for each submapping.

[0128] As shown in the figure, s and λ are first cleared to zero (S210). Next, the submapping f(m,s) that minimizes the energy for the current λ (and, implicitly, the current η) is obtained (S211) and written f(m,s)(λ=0). To obtain mappings for other values of λ, λ is shifted by Δλ. After confirming that the new λ does not exceed a predetermined search cutoff value λmax (S213), the process returns to S211, and the following iterations obtain f(m,s)(λ=iΔλ) (i = 0, 1, ...). When λ exceeds λmax, the process proceeds to S214. There, the optimal λ = λopt is determined, and f(m,s)(λ=λopt) is taken as the final mapping f(m,s) (S214).

[0129] Next, to obtain another submapping at the same level, λ is cleared to zero and s is incremented (S215). After confirming that s does not exceed 4 (S216), the process returns to S211. When s = 4, f(m,0) is updated using f(m,3) as described above, and the determination of the submappings at that level ends.

[0130] FIG. 16 shows the behavior of the energy C(m,s)f corresponding to f(m,s)(λ=iΔλ) (i = 0, 1, ...), obtained while varying λ for a given m and s. As described in [1.4], C(m,s)f normally decreases as λ increases. Once λ exceeds its optimal value, however, C(m,s)f begins to increase. The base technology therefore takes as λopt the λ at which C(m,s)f reaches a local minimum. As the figure shows, C(m,s)f may decrease again for λ > λopt. By then, however, the mapping has already broken down and is meaningless, so only the first local minimum needs to be considered. λopt is determined independently for each submapping, and finally one value is also determined for f(n).
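The λ sweep of S210 to S214 and the rule of taking the first local minimum of C(m,s)f can be sketched in Python as follows. `energy_of` is a hypothetical stand-in for S211: given λ, it should return C(m,s)f of the submapping that minimizes the combined energy at that λ. The toy energy curve is invented purely to mimic the shape of FIG. 16, including a later dip that the rule ignores.

```python
def first_local_min(values):
    """Index of the first local minimum, i.e. the first point after which the
    sequence rises. Returns the last index if the sequence never rises."""
    for i in range(len(values) - 1):
        if values[i + 1] > values[i]:
            return i
    return len(values) - 1


def choose_lambda(energy_of, d_lam, lam_max):
    """Sweep lambda = i * d_lam up to lam_max (S211-S213), then pick lambda_opt (S214).

    energy_of(lam) is a hypothetical stand-in for S211: it should return
    C(m,s)f of the submapping that minimizes the combined energy at lam.
    """
    lams, energies = [], []
    i = 0
    while i * d_lam <= lam_max + 1e-9 * d_lam:   # tolerance for round-off
        lam = i * d_lam
        lams.append(lam)
        energies.append(energy_of(lam))
        i += 1
    # Later dips (lambda > lambda_opt) come from a collapsed mapping; ignore them.
    k = first_local_min(energies)
    return lams[k], energies


def toy_energy(lam):
    """Invented curve shaped like FIG. 16: a first minimum, then a later dip."""
    return (lam - 3) ** 2 if lam < 6 else (lam - 3) ** 2 - 100


lam_opt, _ = choose_lambda(toy_energy, 1, 8)
print(lam_opt)   # 3, not the deeper dip at 6
```

In the full procedure, this sweep runs once for each submapping s = 0, ..., 3 at each level, so λopt is chosen independently for each one.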

[0131] FIG. 17, on the other hand, shows the behavior of the energy C(n)f corresponding to f(n)(η=iΔη) (i = 0, 1, ...), obtained while varying η. Here too, C(n)f normally decreases as η increases, but it begins to increase once η exceeds its optimal value. The η at which C(n)f reaches a local minimum is therefore taken as ηopt. FIG. 17 may be regarded as an enlargement of the region near zero on the horizontal axis of FIG. 4. Once ηopt is determined, f(n) can be finally determined.

[0132] As described above, the base technology offers various advantages. First, because no edges need to be detected, it avoids the problems of conventional edge-detection-based techniques. In addition, no a priori knowledge of the objects in the image is required, and corresponding points are detected automatically. The singular point filter preserves the luminance and position of singular points even at coarse resolution levels, which is highly advantageous for object recognition, feature extraction, and image matching. As a result, image processing systems that greatly reduce manual work can be built.

[0133] The following variations of the base technology are also conceivable. (1) In the base technology, the parameters were determined automatically when matching the start-point hierarchical image with the end-point hierarchical image. This method is not limited to hierarchical images, however, and can be used generally for matching any two ordinary images.

[0134] For example, for two images, take as evaluation formulas an energy E0 relating to differences in pixel luminance and an energy E1 relating to positional displacement of pixels. Their linear sum Etot = αE0 + E1 serves as the overall evaluation formula. α is determined automatically by examining the neighborhood of the extremum of this overall evaluation formula. That is, a mapping that minimizes Etot is obtained for each of various values of α. Among these mappings, the α at which E1 reaches a local minimum with respect to α is chosen as the optimal parameter. The mapping corresponding to that parameter is finally regarded as the optimal matching between the two images.

[0135] There are various other ways to set up the evaluation formulas. For example, formulas such as 1/E1 and 1/E2, which take larger values the better the evaluation result is, may be adopted. The overall evaluation formula also need not be a linear sum. A sum of n-th powers (n = 2, 1/2, −1, −2, etc.), a polynomial, an arbitrary function, or the like may be chosen as appropriate.

[0136] The parameters may be α alone, the two parameters η and λ as in the base technology, or more than two. When there are three or more parameters, they are determined by varying them one at a time.

(2) In the base technology, a mapping was first determined so that the value of the overall evaluation formula was minimized. The parameters were then determined by detecting the point at which C(m,s)f, one of the evaluation formulas making up the overall evaluation formula, reaches a local minimum. In some situations, however, it is also effective to replace this two-stage process and simply determine the parameters so that the minimum value of the overall evaluation formula is as small as possible. In that case, measures such as using αE0 + βE1 as the overall evaluation formula and imposing the constraint α + β = 1, so that the evaluation formulas are treated equally, may be taken. This works because the essence of automatic parameter determination is to determine the parameters so that the energy is minimized.

(3) In the base technology, four kinds of subimages relating to four kinds of singular points were generated at each resolution level. Naturally, only one, two, or three of the four kinds may be used selectively. For example, if the image contains only one bright point, generating the hierarchical images from f(m,3), which relates to maximum points, alone should give a comparable result. In that case, different submappings at the same level become unnecessary, which reduces the amount of computation with respect to s.

(4) In the base technology, each step up in level by the singular point filter reduced the number of pixels to 1/4. A configuration is also possible in which, for example, a 3 × 3 region forms one block within which singular points are searched for. In that case, each step up in level reduces the number of pixels to 1/9.

(5) When the start-point and end-point images are color images, they are first converted to black-and-white images, and the mapping is computed on those. The start-point color image is then transformed using the resulting mapping. Alternatively, a submapping may be computed for each of the RGB components.

[4] Experimental Results
Various images can be interpolated using this base technology. Interpolating images taken from two different viewpoints generates an image from an intermediate viewpoint. This is highly advantageous on the WWW, because an image from an arbitrary viewpoint can be generated from a limited number of images. Interpolating images of two people's faces enables morphing. With cross-sectional data images of a three-dimensional object, such as CT or MRI data, interpolation can reconstruct the exact shape of the three-dimensional object for volume rendering.
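Variation (5) above computes the mapping on black-and-white versions of the color images and then applies it to the color start-point image. A minimal Python sketch of those two steps follows. The ITU-R BT.601 luma weights are an assumption, since the source says only "black and white". Images are plain lists of rows. The mapping is assumed to be bijective, as the base technology requires, so that each output pixel receives exactly one input pixel.

```python
def to_gray(rgb_image):
    """RGB image (rows of (r, g, b) tuples, 0-255) -> luminance image.

    The BT.601 luma weights are an assumption; the source only says
    the color images are converted to black and white.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def apply_mapping(rgb_image, mapping):
    """Move each color pixel (i, j) to mapping[i][j] = (k, m).

    The mapping is computed beforehand on the gray images and is assumed to be
    bijective, so every output pixel is written exactly once.
    """
    h, w = len(rgb_image), len(rgb_image[0])
    out = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            k, m = mapping[i][j]
            out[k][m] = rgb_image[i][j]
    return out


color = [[(255, 0, 0), (0, 0, 255)]]
gray = to_gray(color)                       # matching would run on this
swap = [[(0, 1), (0, 0)]]                   # toy bijective mapping
print(apply_mapping(color, swap))           # [[(0, 0, 255), (255, 0, 0)]]
```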

[0137] FIGS. 18(a), 18(b), and 18(c) show a case in which the mapping is used to generate an intermediate viewpoint image. Here, a right-eye image and a left-eye image were interpolated. FIG. 18(a) shows the start-point image viewed from the left eye, FIG. 18(b) shows the image viewed from the right eye, and FIG. 18(c) shows the intermediate image obtained, for simplicity, with t = 0.5 in [1.8].

[0138] FIGS. 19(a), 19(b), 19(c), and 19(d) show a case in which the mapping is used to morph human faces. Here, the faces of two people were interpolated. FIG. 19(a) shows the start-point image, FIG. 19(b) the end-point image, FIG. 19(c) the start-point image superimposed on the end-point image, and FIG. 19(d) the intermediate image at t = 0.5.

[0139] FIGS. 20(a) and 20(b) show a case in which the mapping is used to interpolate between a human face and a cat's face. FIG. 20(a) shows the cat's face, and FIG. 20(b) shows a morphing image of the human face and the cat's face. The image of FIG. 19(a) was used as the human face. The luminance normalization described in [1.6] is used only in this example.

[0140] FIGS. 21(a), 21(b), and 21(c) show an example in which the present method is applied to an image containing many objects. FIG. 21(a) shows the start-point image, FIG. 21(b) the end-point image, and FIG. 21(c) the intermediate image at t = 0.5.

[0141] FIGS. 22(a), 22(b), 22(c), and 22(d) show the results of using the mapping to interpolate cross-sectional images of the human brain obtained by MRI. FIG. 22(a) shows the start-point image, FIG. 22(b) the end-point image (upper cross section), and FIG. 22(c) the intermediate image at t = 0.5. FIG. 22(d) shows, viewed obliquely, the result of volume rendering using four cross-sectional images. The object is completely opaque, and only pixels whose interpolated luminance is 51 (= 255 × 0.2) or higher are displayed. The reconstructed object is cut vertically near its center to show its interior.

[0142] In these examples, the MRI images are 256 × 256 pixels, and all other images are 512 × 512 pixels. Each pixel's luminance takes a value from 0 to 255. The additional condition described in [1.3.1] is used in all examples except FIGS. 21(a) to 21(c). All of these examples used B0thres = 0.003 and B1thres = 0.5, and these values never needed to be changed. The pixel luminance of each subimage was normalized only in the case of FIGS. 20(a) and 20(b).

[0143] [Pseudo-Three-Dimensional Image Generation Technology] This section describes a pseudo-three-dimensional image generation technology that uses the base technology described in detail above. It applies the base technology's processing for generating intermediate images. FIG. 23 is a flowchart illustrating the procedure for generating a pseudo-three-dimensional image. First, the object is photographed from multiple directions (S2010). FIG. 24 is a layout diagram of a photographing apparatus in which imaging devices a, b, ... are arranged at equal intervals around the object O. The field of view of the images is made large enough that the viewpoint s, from which the object will later be viewed as a pseudo-three-dimensional image, is contained in one of the captured images.

[0144] Photographing may be performed with fixed imaging devices, as in the photographing apparatus of FIG. 24, or, for example, by a person holding a camera and shooting from suitable directions. In either case, the viewpoints from which one wishes to view the pseudo-three-dimensional image must be included in one of the images. Changes in photographing distance or magnification pose no problem, because the subsequent matching and interpolation computations correct for them. Alternatively, a robot may hold the imaging device, determine the photographing points by walking or moving its arm according to a program, and take the photographs. When a robot is used, the photographing positions and directions required for interpolation can be optimized in advance, so that suitable images can be acquired at any time even when the object changes.

[0145] Next, image data are input from the imaging devices (S2012), and encoded image data are generated for each image (S2014). In the encoded-image-data generation step, a multiresolution filter performs a two-dimensional search on the image data of each original image. It detects singular points, such as the maxima, minima, and saddle points of the pixel values within a region, or other points with distinctive features. It then generates a hierarchical image that contains these singular points and has a lower resolution than the original image. Repeating this procedure produces a group of hierarchical images. The image data of these hierarchical image groups are stored in a storage device as encoded image data and used to generate the pseudo-three-dimensional image. They can also be transmitted to a device that displays the images (S2016). In this way, encoded image data consisting of a series of hierarchical images are obtained for each original image.
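The hierarchical (encoded) image data of S2014 can be sketched in Python with the 2 × 2 singular point filter of the base technology, which is defined earlier in the document and is not repeated in this excerpt. As we read that definition, each 2 × 2 block yields one value for each of the four subimage types. Type s = 0 takes the min of the row minima (minimum), s = 1 the max of the row minima and s = 2 the min of the row maxima (saddle-type), and s = 3 the max of the row maxima (maximum). Each coarser type-s image is filtered from the type-s image one level finer. The sketch assumes a square gray image whose side is a power of two, and the function names are hypothetical.

```python
def reduce_level(p, s):
    """Halve a square image with the type-s singular point filter (s = 0..3).

    s = 0: min of row minima (minimum), s = 1: max of row minima (saddle-type),
    s = 2: min of row maxima (saddle-type), s = 3: max of row maxima (maximum).
    """
    inner = min if s in (0, 1) else max     # within each horizontal pixel pair
    outer = min if s in (0, 2) else max     # across the two rows of the block
    half = len(p) // 2
    return [[outer(inner(p[2 * i][2 * j], p[2 * i][2 * j + 1]),
                   inner(p[2 * i + 1][2 * j], p[2 * i + 1][2 * j + 1]))
             for j in range(half)] for i in range(half)]


def build_hierarchy(image):
    """Return {m: [p(m,0), p(m,1), p(m,2), p(m,3)]} for m = n down to 0.

    image must be square with side 2**n. At level n all four subimages are the
    original; each coarser type-s subimage is filtered from type s one level up.
    """
    n = len(image).bit_length() - 1
    levels = {n: [image] * 4}
    for m in range(n - 1, -1, -1):
        levels[m] = [reduce_level(levels[m + 1][s], s) for s in range(4)]
    return levels


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
h = build_hierarchy(img)
print(h[1][0], h[1][3])   # [[1, 3], [9, 11]] [[6, 8], [14, 16]]
```

Each level holds one quarter as many pixels as the level above it, which is the 1/4 reduction mentioned in variation (4).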

[0146] To display a pseudo-three-dimensional image of the object, the pseudo-three-dimensional image generating apparatus acquires viewpoint information (S2020). A viewer observing the object on an image display device can specify a different viewpoint on the object being viewed. The apparatus receives the specified viewpoint position through a pointing device or the like. Based on this viewpoint information, it searches the encoded image data and detects the images that contain the specified viewpoint (S2022).

[0147] Using the hierarchical image data of the detected images, singular points shared between the images are detected and the images are matched (S2024). Matching is performed for each resolution level of the target images. As described for the base technology, an energy is defined for the mapping between the images, and the matching can be expressed as the mapping that minimizes this energy. Taking the mapping energy as an evaluation function of the positional and luminance differences between the images, and introducing parameters, allows the optimal matching to be determined automatically. Interpolation based on the viewpoint information is then applied to the mapping. This determines the position and luminance of each corresponding pixel in a composite image centered on the specified viewpoint, and a new image is generated.

[0148] For example, as shown conceptually in FIG. 25, the apparatus obtains the position of a new viewpoint V relative to viewpoint Va of image A, taken with camera a, and viewpoint Vb of image B, taken with camera b. It uses this in an interpolation computation to synthesize an intermediate image S as seen from an intermediate position s looking toward viewpoint V. If the new viewpoint divides the segment between the two images' viewpoints internally in the ratio t:(1−t), the interpolation assumes that the intermediate image likewise divides the interval between the original images internally in the ratio t:(1−t). For example, if the new viewpoint V lies midway between viewpoints Va and Vb of the acquired images, the image data K of the intermediate image S and the image data Ka and Kb of images A and B satisfy K = (Ka + Kb)/2.
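The t:(1−t) interpolation can be sketched in Python for a pair of gray images whose pixel-wise mapping is already known. Each pixel p of image A moves to (1 − t)p + t·f(p), and its luminance becomes (1 − t)Ka(p) + t·Kb(f(p)). With t = 0.5 and an identity mapping, this reduces to K = (Ka + Kb)/2. Rounding to the nearest pixel and leaving unfilled positions as `None` are our own simplifications and should not be read as the base technology's exact rendering step.

```python
def intermediate_image(img_a, img_b, mapping, t):
    """Intermediate gray image at ratio t (t = 0 gives A, t = 1 gives B).

    mapping[i][j] = (k, m) is the corresponding point in B of pixel (i, j) of A.
    Each pixel moves to (1 - t) * p + t * f(p) and takes the luminance
    (1 - t) * Ka(p) + t * Kb(f(p)). Nearest-pixel placement is a simplification;
    positions that receive no pixel stay None.
    """
    h, w = len(img_a), len(img_a[0])
    out = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            k, m = mapping[i][j]
            y = round((1 - t) * i + t * k)
            x = round((1 - t) * j + t * m)
            out[y][x] = (1 - t) * img_a[i][j] + t * img_b[k][m]
    return out


a_img = [[0, 100]]
b_img = [[50, 200]]
identity = [[(0, 0), (0, 1)]]
print(intermediate_image(a_img, b_img, identity, 0.5))   # [[25.0, 150.0]], i.e. (Ka + Kb) / 2
```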

[0149] A composite image generated in this way shows the object as seen from an arbitrary direction other than the photographing directions. It therefore provides the same image one would see by moving one's head or rotating the object, and a pseudo-three-dimensional image can be presented (S2026). If the new viewpoint V lies off the straight line connecting viewpoints Va and Vb of the acquired images, the component perpendicular to that line must be taken into account. Even in this case, a composite image can be obtained from the original images by performing a distance calculation converted to the perpendicular direction. More preferably, however, a third image Kc with a viewpoint Vc offset in the perpendicular direction is acquired, and the image K is obtained as K = ΣaiKi, where Ki = Ka, Kb, Kc and Σai = 1. The lines of sight along which the images are taken preferably intersect at one point, although small deviations are tolerable in practice.
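For a viewpoint off the line through Va and Vb, the source requires only that the weights in K = ΣaiKi sum to 1. One natural choice, assumed here and not stated in the source, is the barycentric coordinates of V with respect to the triangle Va, Vb, Vc. These reduce to the two-image t:(1−t) weights when V lies on the line through Va and Vb. The Python sketch also assumes that matching has already brought the three images into correspondence, so that only the luminance blend remains.

```python
def barycentric_weights(v, va, vb, vc):
    """Weights (aa, ab, ac) with aa + ab + ac = 1 and aa*va + ab*vb + ac*vc = v.

    Points are 2D viewpoint coordinates; va, vb, vc must not be collinear.
    Using barycentric coordinates as the a_i is an assumption.
    """
    (x, y), (x1, y1), (x2, y2), (x3, y3) = v, va, vb, vc
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    aa = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    ab = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return aa, ab, 1.0 - aa - ab


def blend(images, weights):
    """K = sum_i a_i * K_i for equal-size gray images that are already aligned."""
    h, w = len(images[0]), len(images[0][0])
    return [[sum(a * img[r][c] for a, img in zip(weights, images))
             for c in range(w)] for r in range(h)]


weights = barycentric_weights((0.5, 0.0), (0, 0), (1, 0), (0, 1))
print(weights)                                   # (0.5, 0.5, 0.0)
print(blend([[[0]], [[100]], [[50]]], weights))  # [[50.0]]
```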

[0150] The image data of the composite image obtained in this way are sent to the image display device, where the composite image is displayed. The image data of the composite image may also be formed as hierarchical image data derived from the hierarchical images of the original images. An apparatus that implements the above processing consists of units that perform the respective functions. It can also be implemented, for example, as a program loaded into a PC (personal computer) from a recording medium such as a CD-ROM.

[0151] Because the pseudo-three-dimensional image generating apparatus of this embodiment uses image matching, it can produce a good composite image even when the distortion between images is large. In practice, it has been confirmed that images taken around the object at intervals of about 30 degrees can, depending on the images, yield video that looks natural as a stereoscopic image from any direction. With images at every 10 degrees, the generated video is almost indistinguishable from normally photographed footage.

[0152] Because this procedure is extremely fast, it achieves real-time processing even on a personal-computer-class machine, with no perceptible delay when the user changes the viewpoint with the mouse. It can also present moving images or films whose viewpoints differ from those of the original images created or photographed. With the technology of this embodiment, for example in Internet sales, a stereoscopic image of a product can be displayed by transmitting only a small amount of encoded image data. This stimulates consumers' desire to purchase and enables efficient sales.

[0153] As shown in FIG. 26, imaging devices F may be arranged at suitable intervals on a spherical surface G enclosing the object O so that the object is photographed from its entire surroundings. The apparatus can then also handle a viewer's viewpoint moving to any position in the Z direction, making it a more complete pseudo-three-dimensional image providing apparatus. In this case, the imaging devices need not be fixed and may be moved to suitable positions by a person or a robot while photographing. Furthermore, an imaging device can be set at a predetermined position and rotated while facing outward, with images taken at suitable intervals. Processing these images allows an outward view in any direction to be synthesized and presented. The result can also be displayed as a panoramic photograph in which the image moves with the moving viewpoint.
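The source leaves the "suitable intervals" on the sphere G unspecified. One common way to spread N camera positions nearly uniformly over a sphere is a Fibonacci (golden-angle) lattice. The Python sketch below uses that lattice purely as an illustrative substitute, with the object O assumed to sit at the origin and every camera assumed to be aimed at it.

```python
import math


def sphere_camera_positions(n, radius):
    """n roughly evenly spaced camera positions on a sphere of the given radius.

    Uses a Fibonacci (golden-angle) lattice; the source only says "suitable
    intervals", so this spacing scheme is an assumption. Each camera is meant
    to look toward the origin, where the object O is assumed to sit.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n          # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)              # radius of that horizontal circle
        phi = k * golden_angle                  # rotate by the golden angle each step
        points.append((radius * r * math.cos(phi),
                       radius * r * math.sin(phi),
                       radius * z))
    return points


cams = sphere_camera_positions(20, 1.5)
print(len(cams))   # 20
```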

[0154] FIG. 27 conceptually illustrates a case in which a composite photograph s with a different line of sight is generated from two landscape photographs a and b to form a panoramic photograph. In FIG. 27(a), images A and B with different lines of sight are acquired, and part of the scene appears in both. Suppose the viewer requests display of an image S whose viewpoint differs from those of the two images. As shown in FIG. 27(b), the pseudo-three-dimensional image generating apparatus detects singular points in the image data of image A and of image B and matches the two images on that basis. It then performs an interpolation computation based on the viewpoint information for the image S to be synthesized, producing a new image S as shown in FIG. 27(c).

[0155] The apparatus of this embodiment presents these pseudo-three-dimensional images through high-speed processing of extremely small encoded image data. It can therefore be used on the WWW and in other applications that display images over a communication channel. The above embodiment uses a matching procedure based on a multiresolution filter to synthesize realistic intermediate-position images from fewer original images. It goes without saying, however, that similar effects can be obtained with other matching methods.

[0156]

[Effects of the Invention] As described above, the pseudo-three-dimensional image generating method and apparatus of the present invention convert multiple images into encoded image data and automatically perform matching and interpolation computations. As a result, an image from an arbitrary viewpoint can be synthesized and presented from a small amount of information. This makes possible the online display of three-dimensional objects in electronic commerce (EC) and in moving images, as well as the production of films composed of images viewed from directions different from those in which they were shot.

[Brief description of the drawings]

FIGS. 1(a) and 1(b) are images obtained by applying an averaging filter to the faces of two persons. FIGS. 1(c) and 1(d) are images of p(5,0) obtained by the base technology for the faces of the two persons. FIGS. 1(e) and 1(f) are the corresponding images of p(5,1), FIGS. 1(g) and 1(h) those of p(5,2), and FIGS. 1(i) and 1(j) those of p(5,3). All are photographs of halftone images displayed on a display.

FIG. 2(R) shows the original quadrilateral, and FIGS. 2(A), 2(B), 2(C), 2(D), and 2(E) each show an inherited quadrilateral.

FIG. 3 is a diagram showing, using inherited quadrilaterals, the relationship between the start-point image and the end-point image and the relationship between the m-th level and the (m−1)th level.

FIG. 4 is a diagram showing the relationship between the parameter η and the energy Cf.

FIGS. 5(a) and 5(b) are diagrams showing how an outer-product calculation determines whether the mapping for a given point satisfies the bijectivity condition.

FIG. 6 is a flowchart showing the overall procedure of the base technology.

FIG. 7 is a flowchart showing the details of S1 in FIG. 6.

FIG. 8 is a flowchart showing the details of S10 in FIG. 7.

FIG. 9 is a diagram showing the correspondence between part of an m-th level image and part of an (m−1)th level image.

FIG. 10 is a diagram showing the start point hierarchical images generated by the base technology.

FIG. 11 is a diagram showing the procedure for preparing the matching evaluation before proceeding to S2 in FIG. 6.

FIG. 12 is a flowchart showing details of S2 in FIG. 6.

FIG. 13 is a diagram showing how a sub-mapping is determined at the 0th level.

FIG. 14 is a diagram showing how a sub-mapping is determined at the first level.

FIG. 15 is a flowchart showing details of S21 in FIG. 12.

FIG. 16 is a diagram showing the behavior of the energy C_f^(m,s) corresponding to f^(m,s) (λ = iΔλ), obtained while varying λ for a given f^(m,s).

FIG. 17 is a diagram showing the behavior of the energy C_f^(n) corresponding to f^(n) (η = iΔη, i = 0, 1, ...), obtained while varying η.

FIGS. 18(a), 18(b), and 18(c) are photographs of halftone images, displayed on a display, of a left-eye image of an object, a right-eye image of the object, and an interpolated image generated by the base technology, respectively.

FIGS. 19(a), 19(b), 19(c), and 19(d) are photographs of halftone images, displayed on a display, of the face of one person, the face of another person, a superimposed image of the two faces, and a morphing image generated by the base technology, respectively.

FIGS. 20(a) and 20(b) are photographs of halftone images, displayed on a display, of a cat's face and of a morphing image between a human face and a cat's face, respectively.

FIGS. 21(a), 21(b), and 21(c) are photographs of halftone images, displayed on a display, of a left-eye image containing many objects, a right-eye image, and an interpolated image generated by the base technology, respectively.

FIGS. 22(a), 22(b), 22(c), and 22(d) are photographs of halftone images, displayed on a display, of a start point MR image, an end point MR image, an interpolated image generated by the base technology, and a volume rendering image generated from the interpolated images, respectively.

FIG. 23 is a flowchart showing the procedure for generating a pseudo three-dimensional image according to the embodiment.

FIG. 24 is a layout diagram showing a photographing apparatus according to the embodiment.

FIG. 25 is a diagram conceptually illustrating the process of synthesizing an intermediate image according to the embodiment.

FIG. 26 is a layout diagram showing another photographing apparatus according to the embodiment.

FIG. 27 is a conceptual diagram illustrating the process of synthesizing panoramic photographs according to the embodiment.
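FIG. 5 refers to using a cross-product calculation to decide whether the mapping for a point satisfies the bijectivity condition. The figure itself is not reproduced here. The following is only a minimal Python sketch of the general idea, under an assumed formulation:

- A source grid cell and its mapped quadrilateral are compared one consecutive vertex triple at a time.
- The mapping is treated as locally bijective only if each triple keeps the orientation it had in the source cell, meaning the cell is not folded, flipped or collapsed.

The names `cross_z`, `corner_signs` and `preserves_bijectivity` are hypothetical. This is not the exact test of the base technology.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross_z(o: Point, a: Point, b: Point) -> float:
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def corner_signs(quad: Sequence[Point]) -> List[float]:
    """Signed (doubled) area of each consecutive vertex triple of a quadrilateral.

    Its sign gives the turn direction at the middle vertex of the triple."""
    n = len(quad)
    return [cross_z(quad[i], quad[(i + 1) % n], quad[(i + 2) % n]) for i in range(n)]


def preserves_bijectivity(src_quad: Sequence[Point], dst_quad: Sequence[Point]) -> bool:
    """Assumed test (not the exact FIG. 5 procedure): every triple of the mapped
    quadrilateral must keep the orientation it had in the source cell.
    A sign change or a zero means the mapped cell is flipped, folded, or degenerate."""
    return all(s * d > 0 for s, d in zip(corner_signs(src_quad), corner_signs(dst_quad)))
```

The sketch compares signs against the source cell rather than requiring a fixed sign. This keeps the check valid whether the image y-axis points up or down.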

[Explanation of symbols]

O: object
a, b: cameras
s: new line of sight
A, B: original images
S: interpolated image
Va, Vb, Vs: viewpoints
F: camera
G: camera arrangement coordinates
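The symbols above describe the following elements:

- two cameras a and b with viewpoints Va and Vb;
- original images A and B;
- an interpolated image S seen from a new viewpoint Vs.

A minimal Python sketch of the per-pixel interpolation named in claims 3 and 4 might look like the following. It rests on several hypothetical assumptions:

- The matching has already produced a dictionary from each pixel of A to its matched pixel in B.
- Vs lies near the segment from Va to Vb, so a single weight t obtained by projecting Vs onto that segment is enough.
- Position and luminance are both blended linearly with t.
- Each matched pair is written to the nearest output pixel, and unfilled pixels stay `None`.

This is a simplification for illustration, not the interpolation procedure of the specification. The names `viewpoint_weight` and `interpolate_image` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int]       # integer (x, y) pixel position
Point = Tuple[float, float]   # camera/viewpoint position in a 2-D layout plane
Gray = List[List[float]]      # grayscale image indexed as img[y][x]


def viewpoint_weight(va: Point, vb: Point, vs: Point) -> float:
    """Weight t of the new viewpoint Vs between Va (t = 0) and Vb (t = 1).

    Assumption: Vs is projected onto the segment Va-Vb and clamped to [0, 1]."""
    dx, dy = vb[0] - va[0], vb[1] - va[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        return 0.0  # coincident cameras: fall back to image A
    t = ((vs[0] - va[0]) * dx + (vs[1] - va[1]) * dy) / denom
    return min(1.0, max(0.0, t))


def interpolate_image(
    img_a: Gray,
    img_b: Gray,
    matches: Dict[Pixel, Pixel],  # hypothetical matching result: pixel in A -> pixel in B
    t: float,
    width: int,
    height: int,
) -> List[List[Optional[float]]]:
    """Blend the position and luminance of every matched pair linearly with weight t."""
    out: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for (xa, ya), (xb, yb) in matches.items():
        # Intermediate position of the matched pair.
        x = (1.0 - t) * xa + t * xb
        y = (1.0 - t) * ya + t * yb
        # Intermediate luminance of the matched pair.
        lum = (1.0 - t) * img_a[ya][xa] + t * img_b[yb][xb]
        xi, yi = round(x), round(y)
        if 0 <= xi < width and 0 <= yi < height:
            out[yi][xi] = lum  # later matches overwrite earlier ones (no occlusion handling)
    return out
```

Example: with Va = (0, 0), Vb = (3, 0) and Vs = (1, 0), t is 1/3. A pixel at x = 0 in A with luminance 10, matched to x = 3 in B with luminance 30, lands at x = 1 in S with luminance of about 16.67. A practical version would still need hole filling and occlusion handling.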


[Procedure amendment]

[Submission date] July 19, 2001

[Procedure amendment 1]

[Document name to be amended] Drawings

[Item name to be amended] All figures

[Amendment method] Change

[Amendment contents]

(Replacement drawings for FIGS. 1 to 27; the images are not reproduced in this text.)

Continuation of front page. F-terms (reference): 5B057 CA01 CA08 CA12 CA16 CB01 CB08 CB12 CB16 CE06 CE08 CE10 CH09; 5C023 AA03 AA09 AA10 AA32 AA37 AA38 BA02 BA13 CA01 DA04 DA08; 5C061 AA21 AB04 AB08 AB12; 5C076 AA19 AA40 BA01 BA06 BA09 BB15 CA02

Claims


1. A method for generating a pseudo three-dimensional image, comprising: a step of calculating a matching between a plurality of original images that are photographed from two or more directions and that include a viewpoint set for an object in the images; a step of generating, by interpolation based on the result of the matching, a composite image viewed from a direction including the viewpoint; and a step of presenting data of the composite image.

2. A method for generating a pseudo three-dimensional image, comprising: a step of acquiring a plurality of images photographed from a plurality of positions around a subject; a step of selecting, from the acquired images, two or more original images that include a set viewpoint in the images; a step of calculating a matching between the two or more original images; a step of generating, based on the result of the matching, a composite image viewed from a direction including the set viewpoint; and a step of presenting the composite image.

3. The method for generating a pseudo three-dimensional image according to claim 1, wherein the matching calculating step includes a step of applying a multi-resolution singularity filter to each original image to generate a series of hierarchical images having different resolutions, and a step of calculating the matching between the original images based on the hierarchical images at the different resolution levels; and wherein the composite image generating step includes a step of generating an intermediate image by performing an interpolation calculation pixel by pixel, using the positional relationship between the set viewpoint and the viewpoints of the original images and the correspondence of position and luminance between matched pixels in the original images.

4. A pseudo three-dimensional image generation apparatus, comprising: a unit that photographs a subject from a plurality of positions around it and acquires image data of a plurality of images; a unit that selects two or more original images that are photographed from two or more directions and include a viewpoint set for an object in the images; a unit that applies a multi-resolution singularity filter to the image data of each original image to generate image data of a series of hierarchical images having different resolutions; a unit that calculates a matching between the original images based on the image data of the hierarchical images at the different resolution levels; and a unit that generates image data of an intermediate image by performing an interpolation calculation pixel by pixel, using the positional relationship between the set viewpoint and the viewpoints of the original images and the correspondence of position and luminance between matched pixels in the original images; wherein the apparatus generates and presents a composite image viewed from a direction including the set viewpoint.

5. The pseudo three-dimensional image generation apparatus according to claim 4, wherein the unit that acquires the plurality of images is a robot equipped with an imaging device, and the robot is operated by a built-in program or by remote control to acquire image data of images photographed from a plurality of positions.

6. A computer-readable recording medium storing a computer-executable program, the program comprising: a step of selecting two or more original images that include a viewpoint set for an object in the images; a step of applying a multi-resolution singularity filter to the image data of each original image to generate image data of a series of hierarchical images having different resolutions; a step of calculating a matching between the original images based on the image data of the hierarchical images at the different resolution levels; and a step of generating image data of an intermediate image by performing an interpolation calculation pixel by pixel, using the positional relationship between the set viewpoint and the viewpoints of the original images and the correspondence of position and luminance between matched pixels.
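Claims 3, 4, and 6 rely on a multi-resolution singularity (critical point) filter, and FIG. 1 shows its subimages p(5,0) to p(5,3). The filter itself is defined in the base-technology part of the specification, outside this excerpt.

The Python sketch below assumes the min/max formulation, in which each 2×2 block of level m+1 is reduced to one pixel of level m:

- p(m,0) keeps the minimum of the block.
- p(m,3) keeps the maximum of the block.
- p(m,1) and p(m,2) nest min and max to pick up saddle points.

Each of the four series is built from its own previous level, starting from the original image. The following are assumptions and should be checked against the full specification:

- the inner/outer nesting order used for s = 1 and s = 2;
- the restriction to square 2^n × 2^n grayscale images;
- the function names `reduce_level` and `critical_point_pyramid`.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]            # square grayscale image indexed as img[i][j]
Combine = Callable[[float, float], float]

# (outer, inner) operators for s = 0..3; assumed nesting order, see lead-in.
OPS: Dict[int, Tuple[Combine, Combine]] = {
    0: (min, min),  # minimum
    1: (max, min),  # saddle point
    2: (min, max),  # saddle point
    3: (max, max),  # maximum
}


def reduce_level(img: Image, outer: Combine, inner: Combine) -> Image:
    """One filter step: a 2^k x 2^k image becomes 2^(k-1) x 2^(k-1).

    `inner` combines pixels along the second index, `outer` along the first."""
    half = len(img) // 2
    return [
        [
            outer(
                inner(img[2 * i][2 * j], img[2 * i][2 * j + 1]),
                inner(img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]),
            )
            for j in range(half)
        ]
        for i in range(half)
    ]


def critical_point_pyramid(img: Image) -> Dict[int, List[Image]]:
    """For s = 0..3, return the list [p(n,s), p(n-1,s), ..., p(0,s)]."""
    size = len(img)
    if size == 0 or size & (size - 1) or any(len(row) != size for row in img):
        raise ValueError("expected a square image whose side is a power of two")
    pyramid: Dict[int, List[Image]] = {}
    for s, (outer, inner) in OPS.items():
        levels = [img]  # p(n,s) is the original image for every s
        while len(levels[-1]) > 1:
            levels.append(reduce_level(levels[-1], outer, inner))
        pyramid[s] = levels
    return pyramid
```

Example: for the 2×2 image [[1, 4], [3, 2]], the single-pixel level-0 subimages are 1, 2, 3, and 4 for s = 0, 1, 2, 3. This shows how the four series keep the minimum, the two saddle values, and the maximum of the same block.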