JP2019507934A

JP2019507934A - 3D motion evaluation apparatus, 3D motion evaluation method, and program

Info

Publication number: JP2019507934A
Application number: JP2018548017A
Authority: JP
Inventors: シュボジトチャウダリー; 中野　学
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2016-03-11
Filing date: 2016-03-11
Publication date: 2019-03-22
Anticipated expiration: 2036-03-11
Also published as: WO2017154045A1; JP6806160B2

Abstract

The 3D motion evaluation apparatus 100 includes a 2D image corresponding point search unit 101 and a 3D motion optimization unit 102. The 2D image corresponding point search unit 101 searches for dense 2D corresponding points between consecutive frames and outputs the pixel-wise 2D motion between the frame images. To obtain accurate real-world 3D motion, the 3D motion optimization unit 102 uses the depth of a single point observed in a single frame to optimize the error of the pixel-wise 2D motion, and through this optimization calculates the real-world 3D motion of the object.

Description

The present invention relates to a 3D motion evaluation apparatus, a 3D motion evaluation method, and a program for realizing them. It further relates to the field of 3D reconstruction from video, and more specifically to non-rigid 3D motion evaluation from a monocular image sequence.

3D reconstruction from monocular image sequences has been an active research area in the computer vision community for nearly 20 years. 3D reconstruction from images has found applications in fields such as animation, 3D printing, and video and image editing. Most conventional systems in this field are camera-based: a camera captures images of the desired object from various viewpoints, and the images are used to calculate the structure of the object and the motion of the camera simultaneously. Structure from motion is a class of 3D reconstruction techniques widely used in this field, in which an image sequence is acquired and the structure and camera motion are calculated using rank constraints on the image data. This stage is usually followed by a bundle adjustment stage that optimizes the camera pose and the object structure simultaneously.

The challenge in this field is to calculate the structure of a non-rigid object efficiently and accurately from 2D corresponding points alone. Dense non-rigid structure-from-motion techniques (Non-Patent Document 1) assume an affine camera model, apply rank constraints, and solve the problem by factorization. However, the affine camera model assumes that image formation is independent of the depth of a point from the optical center of the camera, so it cannot recover translation along the optical axis.

To resolve this depth ambiguity, the mapping of motion from 3D to 2D must be modeled as a perspective projection. The work of Triggs (Non-Patent Document 2) provides a factorization-based formulation for rigid structure and camera pose estimation under perspective projection. More recently, Ondruska et al. (Non-Patent Document 3) successfully implemented dense rigid structure from motion on portable platforms such as mobile phones and tablets, using the ordinary stock cameras available on those devices.

The studies above deal with rigid structure from motion. Dense non-rigid structure from motion under perspective projection is a much harder problem, because the final solution for structure and pose lies on a very complex manifold and depends heavily on the initialization. Such optimization problems are often unwieldy and difficult to solve in real time. In many cases, prior knowledge is applied to constrain the space in which the solution lies. Perspective-projection-based non-rigid reconstruction has been studied before, for example by Vidal et al. (Non-Patent Document 4), but these methods are based mainly on reconstructing sparse points and do not scale well to dense reconstruction.

The work of Newcombe et al. (Non-Patent Document 5) takes RGB-D input data, attempts to calculate a canonical rigid model of the object, and calculates the 3D motion between frames to animate the canonical rigid model and thereby produce the actual non-rigid deformation. This work is the closest to the proposed invention. The proposed invention, however, uses only RGB information to calculate the 3D flow under a fixed perspective camera, which makes the problem considerably harder.

Non-Patent Document 1: Garg, R.; Roussos, A.; Agapito, L., "Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video," in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pp. 1272-1279, 23-28 June 2013
Non-Patent Document 2: Triggs, B., "Factorization methods for projective structure and motion," in Computer Vision and Pattern Recognition, 1996. Proceedings CVPR '96, 1996 IEEE Computer Society Conference on, pp. 845-851, 18-20 Jun 1996
Non-Patent Document 3: Ondruska, P.; Kohli, P.; Izadi, S., "MobileFusion: Real-Time Volumetric Surface Reconstruction and Dense Tracking on Mobile Phones," in Visualization and Computer Graphics, IEEE Transactions on, vol. 21, no. 11, pp. 1251-1258, Nov. 15 2015
Non-Patent Document 4: René Vidal and Daniel Abretske, "Nonrigid Shape and Motion from Multiple Perspective Views", ECCV 2006 - European Conference on Computer Vision, 2014
Non-Patent Document 5: Newcombe, R.A.; Fox, D.; Seitz, S.M., "DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time," in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pp. 343-352, 7-12 June 2015

Despite the progress described above, dense non-rigid 3D reconstruction remains a difficult problem. The present invention has been made to solve the problems described above.

An example of an object of the present invention is to provide a 3D motion evaluation apparatus, a 3D motion evaluation method, and a program that can solve for 3D motion from 2D corresponding points and warp an initial model by that 3D motion for non-rigid 3D reconstruction.

In order to achieve the above object, a 3D motion evaluation apparatus according to one aspect of the present invention is a 3D motion evaluation apparatus for calculating dense non-rigid 3D motion of an object from monocular images, and includes:
a 2D image corresponding point search unit that searches for dense 2D corresponding points between consecutive frames and outputs the pixel-wise 2D motion between frame images; and
a 3D motion optimization unit that, in order to obtain accurate real-world 3D motion, uses the depth of a single point observed in a single frame to optimize the error of the pixel-wise 2D motion, and calculates the real-world 3D motion of the object through this optimization.

In order to achieve the above object, a 3D motion evaluation method according to one aspect of the present invention is a 3D motion evaluation method for calculating dense non-rigid 3D motion of an object from monocular images, and includes:
(a) a step of searching for dense 2D corresponding points between consecutive frames and outputting the pixel-wise 2D motion between frame images; and
(b) a step of, in order to obtain accurate real-world 3D motion, using the depth of a single point observed in a single frame to optimize the error of the pixel-wise 2D motion, and calculating the real-world 3D motion of the object through this optimization.

Furthermore, in order to achieve the above object, a program according to one aspect of the present invention is a program for calculating, by a computer, dense non-rigid 3D motion of an object from monocular images, and causes the computer to execute:
(a) a step of searching for dense 2D corresponding points between consecutive frames and outputting the pixel-wise 2D motion between frame images; and
(b) a step of, in order to obtain accurate real-world 3D motion, using the depth of a single point observed in a single frame to optimize the error of the pixel-wise 2D motion, and calculating the real-world 3D motion of the object through this optimization.

As described above, according to the present invention, 3D motion can be solved from 2D corresponding points, and an initial model can be warped by that 3D motion for non-rigid 3D reconstruction.

The drawings, together with the detailed description, serve to explain the principle of the 3D motion evaluation method of the present invention. The drawings are for illustration and do not limit the application of the technique.
FIG. 1 is a schematic block diagram showing the configuration of the 3D motion evaluation apparatus of the present invention.
FIG. 2 is a block diagram showing a specific configuration of the 3D motion evaluation apparatus according to the embodiment of the present invention.
FIG. 3 is a workflow showing the detailed steps for calculating a dense 3D motion field from 2D corresponding points.
FIG. 4 is a diagram illustrating the mapping of motion from 2D to 3D under a perspective camera model.
FIG. 5 is a diagram comparing the shape of an object estimated using the present invention with the actual 3D shape of the object.
FIG. 6 is a block diagram illustrating an example of a computer platform on which the 3D motion evaluation apparatus is implemented.

(Summary of Invention)
[Technical Problem]
Reconstructing non-rigid surfaces in 3D from a 2D image sequence is a very difficult problem, and reliable estimation of 3D pose and structure depends on the configuration of the camera and the object. How the motion is modeled plays an important role in determining whether a given object and camera configuration is recoverable. For example, in many dense structure-from-motion problems (for example, Non-Patent Document 4), the affine camera model cannot recover translation along the optical axis. Consequently, when the camera is fixed and the object moves along the optical axis, an accurate solution cannot be obtained with the affine model alone. Solving this requires assuming a projective camera model, which is inherently ambiguous in depth; the result is an ill-posed problem that is hard to scale up computationally for dense reconstruction.
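The limitation of the affine model described above can be illustrated with a minimal Python sketch. The focal length, principal point, and point coordinates are illustrative assumptions, and `project_orthographic` is used here as the simplest stand-in for an affine camera; none of these values come from the patent.

```python
def project_orthographic(point):
    """Orthographic (affine-style) projection: the depth Z is simply dropped."""
    x, y, _z = point
    return (x, y)


def project_perspective(point, focal=500.0, cx=320.0, cy=240.0):
    """Pinhole projection: the image position depends on the depth Z."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def image_motion(project, point, motion):
    """2D displacement produced on the image by a 3D displacement."""
    moved = tuple(p + m for p, m in zip(point, motion))
    (u0, v0), (u1, v1) = project(point), project(moved)
    return (u1 - u0, v1 - v0)


point = (0.2, 0.1, 2.0)        # camera coordinates, assumed to be in metres
along_axis = (0.0, 0.0, -0.5)  # object moves 0.5 m toward a fixed camera

affine_flow = image_motion(project_orthographic, point, along_axis)
persp_flow = image_motion(project_perspective, point, along_axis)
```

Under the orthographic model the motion along the optical axis leaves no trace in the image, so nothing can be recovered from it; under the perspective model the same motion produces a measurable 2D displacement.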

Existing approaches attempt to solve this problem, but they have the following drawbacks. Most methods that handle projective non-rigid structure from motion assume a low-rank constraint and try to solve for structure and shape using factorization. Such methods suit sparse reconstruction, but for dense reconstruction of an object they do not scale well as the structure and pose evolve over time.

In addition to the issues described above, other obvious and apparent drawbacks that the present invention can overcome will become clear from the detailed description and drawings. An outline of the solution to these problems follows.

[Solution to Problem]
To solve the technical problem described above, the overall approach is outlined below. Non-rigid 3D reconstruction is treated as an incremental process: the shape in the current frame is the combination of the shape in the previous frame and the 3D motion in the current frame. Treating reconstruction this way reduces the problem to two sub-problems: (i) identifying a reliable initial rigid model of the object, and (ii) calculating the 3D motion for each frame in order to animate the rigid model, thereby performing non-rigid 3D reconstruction of the scene. Many techniques for generating a reliable initial model of the object exist in the literature (for example, Non-Patent Document 3).
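The incremental formulation can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the source: $S_t$ is the 3D shape (the set of 3D point positions) in frame $t$, $V_t$ is the per-point 3D motion estimated for frame $t$, and $S_0$ is the initial rigid model.

```latex
S_t = S_{t-1} + V_t
\qquad\Longrightarrow\qquad
S_t = S_0 + \sum_{k=1}^{t} V_k
```

Read this way, sub-problem (i) supplies $S_0$ and sub-problem (ii) supplies each $V_k$.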

A technical advantage of the present invention is that, assuming a perspective camera projection model, the method can generate from 2D corresponding points a dense 3D motion field that is temporally and spatially coherent and robust to image and motion noise. The framework of the proposed invention is inherently incremental: the 3D motion field is solved frame by frame, based on the previous frame. In addition, because the proposed model uses a known absolute depth at a minimum of one point, it can calculate 3D motion at absolute scale for any direction of motion relative to the camera.

Accordingly, the present invention comprises several steps and the relation of one or more of those steps to each of the others, as well as an apparatus embodying features of construction, combinations of elements, and arrangements of parts adapted to carry out those steps, all of which are exemplified in the following detailed disclosure, that is, in the description of the drawings and the detailed description. The scope of the invention is indicated in the claims.

(Embodiment)
Hereinafter, a network system, a 3D motion evaluation apparatus, a 3D motion evaluation method, and a program according to an embodiment of the present invention will be described with reference to FIGS. 1 to 5.

An example of an embodiment of the present invention is described in detail below. The implementation of the invention is described in full detail; together with the exemplary drawings, the description is intended to give a person skilled in the art a reliable guide to practicing the invention.

The present invention relates to calculating dense nonlinear 3D motion from dense 2D image correspondences using a perspective camera model. It is broadly divided into the calculation of dense 2D corresponding points and the calculation of 3D motion from those 2D corresponding points. The latter calculation uses a perspective projection model and constrains the solution to be temporally and spatially consistent. The resulting 3D motion is warped together with the previous 3D shape to obtain the current 3D shape.

[Apparatus Configuration]
First, the configuration of the 3D motion evaluation apparatus of the present invention will be described with reference to FIG. 1. FIG. 1 is a schematic block diagram showing the configuration of the 3D motion evaluation apparatus of the present invention.

As shown in FIG. 1, the 3D motion evaluation apparatus 100 performs the tasks described above. The apparatus can be further divided into several units, whose functions are described below with reference to FIG. 1.

As shown in FIG. 1, the 3D motion evaluation apparatus 100 includes a 2D image corresponding point search unit 101 and a 3D motion optimization unit 102. The 2D image corresponding point search unit 101 searches for dense 2D corresponding points between consecutive frames and outputs the pixel-wise 2D motion between the frame images. To obtain accurate real-world 3D motion, the 3D motion optimization unit 102 uses the depth of a single point observed in a single frame to optimize the error of the pixel-wise 2D motion, and through this optimization calculates the real-world 3D motion of the object.

In this embodiment, the error of the pixel-wise 2D motion is thus optimized using the depth of a single point. This makes it possible to solve for 3D motion from 2D corresponding points, and the initial model is warped by that 3D motion for non-rigid 3D reconstruction.

Next, the 3D motion evaluation apparatus 100 in this embodiment will be described in more detail with reference to FIG. 2. FIG. 2 is a block diagram showing a specific configuration of the 3D motion evaluation apparatus according to the embodiment of the present invention.

As shown in FIG. 2, the 3D motion evaluation apparatus 100 includes a 3D motion warping unit 103 in addition to the 2D image corresponding point search unit 101 and the 3D motion optimization unit 102.

In the proposed 3D motion evaluation apparatus 100, the first unit is the 2D image corresponding point search unit 101. The 3D motion evaluation process begins by evaluating the 2D corresponding points between image frames. As can be seen from FIG. 1, the image sequence 200 is input to the 2D image corresponding point search unit 101, which calculates the pixel-wise 2D motion between consecutive frames. For each image pair, the unit calculates dense 2D corresponding points by comparing image patches in consecutive frames and matching their image intensities. One method for finding 2D corresponding points between images is optical flow. Optical flow is obtained for each pixel in the reference frame using the brightness constancy assumption, and the motion vectors are calculated densely using local or global optimization. Another way to find 2D image correspondences is feature tracking. In feature tracking, a feature descriptor is calculated around each pixel in the reference frame and matched against the target frame to obtain the 2D motion vector. Either of these two methods, that is, the optical flow method or the feature-tracking-based method, or a similar method, is used to find the 2D motion.
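The patch comparison described above can be sketched as an exhaustive block-matching search. This is a deliberately simple stand-in, not the optical flow or feature-tracking method the apparatus would actually use; the function names, the patch radius, and the search range are assumptions made for illustration.

```python
import random


def ssd(frame_a, frame_b, ya, xa, yb, xb, radius):
    """Sum of squared intensity differences between two square patches."""
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            diff = frame_a[ya + dy][xa + dx] - frame_b[yb + dy][xb + dx]
            total += diff * diff
    return total


def dense_flow(prev, curr, radius=1, search=2):
    """Per-pixel 2D motion (du, dv) from prev to curr by exhaustive patch search."""
    height, width = len(prev), len(prev[0])
    margin = radius + search  # keep every compared patch inside both images
    offsets = [(du, dv)
               for dv in range(-search, search + 1)
               for du in range(-search, search + 1)]
    flow = {}
    for y in range(margin, height - margin):
        for x in range(margin, width - margin):
            # Best offset = lowest patch difference; ties go to the smaller motion.
            flow[(x, y)] = min(
                offsets,
                key=lambda o: (ssd(prev, curr, y, x, y + o[1], x + o[0], radius),
                               abs(o[0]) + abs(o[1])),
            )
    return flow


rng = random.Random(0)
prev = [[rng.random() for _ in range(10)] for _ in range(10)]
curr = [[0.0] + row[:-1] for row in prev]  # same texture shifted one pixel right
flow = dense_flow(prev, curr)
```

Matching on raw intensities relies on the brightness constancy assumption named in the text: a scene point is expected to keep its intensity from one frame to the next.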

The next unit is the 3D motion optimization unit 102, which finds the 3D motion corresponding to the 2D corresponding points. The dense 2D corresponding points obtained from the 2D image corresponding point search unit 101 are used to calculate the 3D motion under the assumption of a perspective camera model. Given the absolute depth 201 at a single point on the object, the 3D motion optimization unit 102 uses the perspective projection model to project the current estimate of the 3D motion onto the image plane.
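The projection step can be sketched as follows, assuming a standard pinhole model. The `Intrinsics` fields, the function names, and the numeric values are illustrative assumptions; the source does not specify the camera parameterization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)


def project(K, point):
    """Perspective projection of a 3D camera-frame point to pixel coordinates."""
    x, y, z = point
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


def back_project(K, pixel, depth):
    """Lift a pixel with known absolute depth to a 3D camera-frame point."""
    u, v = pixel
    return ((u - K.cx) * depth / K.fx, (v - K.cy) * depth / K.fy, depth)


def projected_motion(K, point, motion):
    """2D image motion induced by moving a 3D point by a 3D motion vector."""
    moved = tuple(p + m for p, m in zip(point, motion))
    (u0, v0), (u1, v1) = project(K, point), project(K, moved)
    return (u1 - u0, v1 - v0)


def flow_error(projected, observed):
    """Squared pixel distance between projected and observed 2D motion."""
    return (projected[0] - observed[0]) ** 2 + (projected[1] - observed[1]) ** 2


K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
anchor = back_project(K, (420.0, 240.0), depth=2.0)  # point with known depth
flow = projected_motion(K, anchor, (0.1, 0.0, 0.0))  # candidate 3D motion
residual = flow_error(flow, observed=(25.0, 0.0))
```

Here a 0.1 m sideways motion at 2 m depth projects to a shift of about 25 pixels, so a candidate motion can be scored directly against the observed 2D motion.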

The absolute depth at a single point can be obtained with a commercially available laser depth sensor. Image-based methods, such as triangulating a known pattern on the object, can also be used to obtain the absolute depth. The projected 2D motion is compared with the observed 2D motion, and an optimization algorithm minimizes the error between the two. A variational optimization technique is used to solve for the motion update at each step, which minimizes the error between the projected and observed 2D motion. The energy function used in the variational optimization also ensures that spatial and temporal consistency is maintained in the 3D motion solution, which makes the final solution robust to outliers. When the optimization converges, the 3D motion optimization unit 102 outputs the result as the optimal 3D motion in the current frame.
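The source does not state the energy function itself. One plausible form, given purely as an assumption for illustration, combines a data term on the 2D motion error with spatial and temporal smoothness terms. Here $\pi$ is the perspective projection, $X_p$ the 3D point seen at pixel $p$, $V_p$ its unknown 3D motion, $u_p$ the observed 2D motion, $N(p)$ the neighbours of $p$, $V_p^{\mathrm{prev}}$ the motion from the previous frame, $\psi$ a robust penalty, and $\lambda_s$, $\lambda_t$ weights; all of these symbols are introduced here and are not taken from the patent.

```latex
E(V) = \sum_{p} \psi\!\left( \left\| \pi(X_p + V_p) - \pi(X_p) - u_p \right\|^2 \right)
     + \lambda_s \sum_{p} \sum_{q \in N(p)} \left\| V_p - V_q \right\|^2
     + \lambda_t \sum_{p} \left\| V_p - V_p^{\mathrm{prev}} \right\|^2
```

In a form like this, the first term carries the comparison between projected and observed 2D motion, while the second and third terms express the spatial and temporal consistency mentioned in the text.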

The 3D motion warping unit 103 projects the 3D motion to 2D motion using a perspective projection model that assumes only a relative transformation between the camera and the object. It calculates the error between the pixel-wise 2D motion in the current frame and the modeled absolute 3D-to-2D motion, using the intrinsic parameters of the camera and the depth value of a single point in a single frame, and it minimizes this error. To obtain the optimal 3D motion, the 3D motion warping unit 103 also preserves spatial smoothness within each frame and temporal smoothness between frames while calculating the real-world 3D motion that minimizes the error. In addition, the 3D motion warping unit 103 can take the previous non-rigid 3D shape as input and add the calculated 3D motion of the current frame to it, updating the current non-rigid 3D model. The 3D motion warping unit 103 then outputs the 3D shape 203 in the current frame.

In the next step, the 3D motion warping unit 103 warps the current 3D motion together with the previous shape 202 of the object obtained in the previous step, yielding the final shape in the current frame 107. This is achieved by representing the 3D shape as a two-dimensional mesh in which each vertex, at its 3D position, carries connectivity information for several edges. Once the optimal motion has been found for a frame, the 3D position of each vertex in the mesh is updated using the calculated current 3D motion and the vertex position in the previous frame, while the edge connections between vertices are left unchanged.
This achieves the final non-rigid shape reconstruction of the object from the corresponding image sequence.
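The vertex update described above can be sketched in a few lines. The `Mesh` type and the `warp_mesh` function are hypothetical names introduced for illustration; the source specifies only that vertex positions are moved by the estimated 3D motion while the edge connectivity stays fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mesh:
    vertices: tuple  # one (x, y, z) tuple per vertex
    edges: tuple     # pairs of vertex indices, i.e. the connectivity


def warp_mesh(previous, motion):
    """Move every vertex by its 3D motion vector; reuse the edge list unchanged."""
    if len(motion) != len(previous.vertices):
        raise ValueError("need exactly one 3D motion vector per vertex")
    moved = tuple(
        tuple(p + d for p, d in zip(vertex, delta))
        for vertex, delta in zip(previous.vertices, motion)
    )
    return Mesh(vertices=moved, edges=previous.edges)


previous_shape = Mesh(
    vertices=((0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)),
    edges=((0, 1), (1, 2), (2, 0)),
)
motion = ((0.0, 0.0, 0.5), (0.25, 0.0, 0.0), (0.0, -0.5, 0.25))
current_shape = warp_mesh(previous_shape, motion)
```

Because each vertex moves independently, the mesh can deform non-rigidly from frame to frame while its topology is carried over unchanged.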

[Apparatus Operation]
Next, the overall process of calculating dense 3D motion from dense 2D corresponding points, and of recovering the resulting 3D structure, will be described.

The operation of the 3D motion evaluation apparatus 100 according to the embodiment of the present invention will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the 3D motion evaluation apparatus in the embodiment of the present invention. In the following description, FIGS. 1 and 2 are referred to as appropriate. In this embodiment, the 3D motion evaluation method is carried out by operating the 3D motion evaluation apparatus 100. The following description of the operation of the apparatus therefore also serves as the description of the 3D motion evaluation method in this embodiment.

The two-dimensional image corresponding point search unit 101 of the system acquires the previous image frame and the current image frame as inputs (step 301). Next, the two-dimensional image corresponding point search unit 101 calculates dense two-dimensional corresponding points from these images (step 302).

Next, the two-dimensional image corresponding point search unit 101 determines whether the two-dimensional motion frame that is the current input is the first frame in the sequence (step 303). If it is the first frame, the two-dimensional image corresponding point search unit 101 assumes that the object is rigid in units of blocks, calculates the motion of each block, and uses the result as the initial value for the three-dimensional motion (step 304).

If it is not the first frame, the two-dimensional image corresponding point search unit 101 assumes a constant velocity model and uses the previous three-dimensional motion field to calculate the initial value for the three-dimensional motion in the current frame (step 305).

Next, the iterative process starts. The two-dimensional image corresponding point search unit 101 projects the three-dimensional motion onto the image plane using the known absolute depth (step 306). Next, the two-dimensional image corresponding point search unit 101 calculates the error between the projected two-dimensional motion and the observed two-dimensional motion (step 307).

Next, the three-dimensional motion optimization unit 102 determines whether the error is less than a specific threshold (step 308). If the error is not less than the threshold, the three-dimensional motion optimization unit 102 executes an optimization step that updates the three-dimensional motion so that the two-dimensional motion error between the projected motion and the observed motion becomes smaller (step 309).

If the error is less than the threshold, the three-dimensional motion warping unit 103 warps the shape of the object in the previous frame (310) with the optimal three-dimensional motion obtained from the three-dimensional motion optimization unit 102. As a result, the final output of the algorithm is the three-dimensional shape in the current frame (step 311).
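The iterative part of this flow (steps 306 to 309) can be sketched as a small loop. The following Python is only an illustrative sketch, not the implementation described in the patent. It assumes a pinhole projection in normalized image coordinates, a per-pixel depth map `Z` (the text only states that a known absolute depth is used), plain gradient descent as the update rule, and it omits the smoothness terms. The names `project_motion` and `refine_motion` are hypothetical.

```python
import numpy as np

def project_motion(T, u, v, Z, f):
    """Project per-pixel 3D translations T (H, W, 3) to 2D motion (H, W, 2).

    Assumes a pinhole camera: u = f*X/Z, v = f*Y/Z, and pure translation.
    """
    Tx, Ty, Tz = T[..., 0], T[..., 1], T[..., 2]
    d = Z + Tz
    return np.stack([(f * Tx - u * Tz) / d, (f * Ty - v * Tz) / d], axis=-1)

def refine_motion(T0, m_obs, u, v, Z, f, tol=1e-6, step=2.0, max_iters=500):
    """Iterate project -> compare -> update until the 2D error is below tol.

    The step size is an assumption tuned for normalized coordinates (f ~ 1).
    Returns the refined motion, the final RMS error, and the iteration count.
    """
    T = T0.astype(float).copy()
    err = float("inf")
    for it in range(max_iters):
        m_proj = project_motion(T, u, v, Z, f)            # step 306
        r = m_proj - m_obs                                # step 307
        err = float(np.sqrt(np.mean(np.sum(r ** 2, axis=-1))))
        if err < tol:                                     # step 308
            return T, err, it
        # Gradient of 0.5 * |r|^2 with respect to (Tx, Ty, Tz) at each pixel.
        d = Z + T[..., 2]
        g = np.empty_like(T)
        g[..., 0] = r[..., 0] * f / d
        g[..., 1] = r[..., 1] * f / d
        g[..., 2] = -(r[..., 0] * (u + m_proj[..., 0])
                      + r[..., 1] * (v + m_proj[..., 1])) / d
        T -= step * g                                     # step 309
    return T, err, max_iters
```

Without a smoothness term, each pixel has two equations for three unknowns, so this loop only drives the reprojection error down; it does not by itself pick a unique three-dimensional motion.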

Here, the mathematical details of the present invention, including the motion mapping of a perspective camera, will be described with reference to FIG. 4.

In FIG. 4, a non-rigid object is shown at time frame t (401) and at time frame t+1 (402). As shown in FIG. 4, the shape of the object deforms between the two frames. A specific point X_t (403) moves to the point X_{t+1} (404) by the three-dimensional motion T = (Tx, Ty, Tz)^T (408) between the points. These points are projected onto the image plane 405 at the points m_t (406) and m_{t+1} (407). On the image plane, the three-dimensional motion is registered as the two-dimensional motion o_t = m_{t+1} - m_t. The object of the present invention is to find the three-dimensional motion T from the corresponding two-dimensional motion o_t using the perspective projection model shown in FIG. 4. The camera is assumed to be fixed in space. It is also assumed that the three-dimensional motion between frames can be modeled accurately by translation alone.

From the above discussion, writing the two-dimensional motion in the image as o_t = m_{t+1} - m_t with the two components o_t = (m_xt, m_yt)^T, the relation between the three-dimensional motion T = (Tx, Ty, Tz)^T and the two-dimensional motion (m_xt, m_yt)^T at any given pixel is given by Equation 1.

Here, u and v are the coordinates in a zero-centered image coordinate system, f is the focal length, and Z is the absolute distance from the camera optical center. The absolute depth at a point can be obtained with a commercially available laser depth sensor. Image-based methods, such as triangulating a known pattern on the object, can also be used to obtain the absolute depth. The present invention attempts to calculate the three-dimensional motion T from the two-dimensional motion at every pixel.
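Equation 1 is not reproduced in this text. As a hedged illustration, a relation of this kind follows from the standard pinhole model under the stated assumptions (fixed camera, pure translation). The pinhole form u = fX/Z, v = fY/Z and the symbols X, Y for the point coordinates are assumptions introduced here, and the result may differ in form from the equation in the original publication.

```latex
% Pinhole projection of X_t = (X, Y, Z)^T and of X_{t+1} = X_t + T
u = \frac{fX}{Z}, \qquad v = \frac{fY}{Z}, \qquad
u' = \frac{f\,(X + T_x)}{Z + T_z}, \qquad v' = \frac{f\,(Y + T_y)}{Z + T_z}

% Two-dimensional motion o_t = (m_{xt}, m_{yt})^T = (u' - u,\; v' - v)^T,
% after substituting fX = uZ and fY = vZ
m_{xt} = \frac{f\,T_x - u\,T_z}{Z + T_z}, \qquad
m_{yt} = \frac{f\,T_y - v\,T_z}{Z + T_z}
```

In this form the two-dimensional motion depends on T only through f, the pixel position (u, v), and the depth Z, which is why an absolute depth value is needed to recover motion in real-world units.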

Assuming that the camera is fixed and that the object undergoes nonlinear translational motion in free space, the aim is to find the dense three-dimensional motion between successive frames from the dense two-dimensional motion correspondences. Let the image sequence be given by the set {I_t, t = 1, 2, ..., N}, where N is the number of images in the sequence. The two-dimensional motion field between image I_t and image I_{t-1} is denoted m_t. The three-dimensional motion field between image I_t and image I_{t-1} is denoted M_t, and the three-dimensional motion of the previous frame, M_{t-1}, is assumed to be held in a buffer while the three-dimensional motion M_t of the current frame is being calculated. To calculate the three-dimensional motion in the current frame, the problem is formulated as the energy function of Equation 2. The optimal three-dimensional motion M_t* is the value that minimizes Equation 2.

Equation 2 above gives the global energy function to be minimized. The data term E_d depends on the observed two-dimensional motion and on the estimate of the three-dimensional motion. The data term measures the error between the two-dimensional motion projected onto the image plane and the observed two-dimensional motion, and penalizes the solution when that error is large. The data term is generally given by Equation 3.

Ψ(.) is a robust weight function that penalizes outlier errors less severely than the conventional L2 norm. The per-pixel errors are summed over the entire image.
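A data term of this kind can be sketched as follows. The text does not specify Ψ, so this Python sketch assumes a Charbonnier penalty, which is convex and grows roughly linearly for large errors. The function names and the parameter `eps` are hypothetical, and the exact form of Equation 3 may differ.

```python
import numpy as np

def charbonnier(sq_err, eps=1e-3):
    """Robust penalty sqrt(s^2 + eps^2); an assumed stand-in for the weight function."""
    return np.sqrt(sq_err + eps ** 2)

def data_term(m_proj, m_obs, eps=1e-3):
    """Sum over all pixels of the robust penalty on the 2D motion error.

    m_proj: projected 2D motion, shape (H, W, 2)
    m_obs:  observed 2D motion,  shape (H, W, 2)
    """
    sq_err = np.sum((m_proj - m_obs) ** 2, axis=-1)
    return float(np.sum(charbonnier(sq_err, eps)))
```

With this penalty a single pixel whose error has magnitude 50 contributes about 50 to the energy rather than 2500, so a few bad correspondences do not dominate the sum.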

The second term of the energy function is the smoothness term, which maintains the spatial and temporal smoothness of the solution. Spatial smoothness means that the final three-dimensional motion solution is smooth along the x and y axes of the image and has no abrupt discontinuities. In other words, the three-dimensional motions of adjacent pixels should be strongly correlated. This is expressed as the spatial gradient of the three-dimensional motion along the x and y axes in image coordinates. Temporal coherence refers to the fact that, for any given image, the velocity should not change abruptly with time, and the change over time in the three-dimensional velocity of a pixel should be smooth. This appears in the energy term as a gradient along the time axis. The smoothness term E_s is therefore given by Equation 4.

Here, ∇ = (∂/∂x, ∂/∂y, ∂/∂t) is the gradient operator applied to the three-dimensional motion vector along the x and y axes of the image and along time. The weight function φ(.) is a robust kernel similar to Ψ(.). The per-pixel smoothness values are summed over the entire image. The three-dimensional motion of the previous frame is needed to calculate the temporal gradient of the motion, so it must be kept in a buffer while the three-dimensional motion of the current frame is calculated.
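The smoothness term can be sketched in the same way. This Python sketch assumes finite differences for the gradients (central differences in x and y via `np.gradient`, a backward difference in time against the buffered previous motion) and again a Charbonnier-style kernel for φ. These choices and the name `smoothness_term` are assumptions, not taken from Equation 4.

```python
import numpy as np

def smoothness_term(M_t, M_prev, eps=1e-3):
    """Robust penalty on spatial and temporal gradients of the 3D motion field.

    M_t:    current 3D motion field,  shape (H, W, 3)
    M_prev: buffered previous field,  shape (H, W, 3)
    """
    d_y, d_x = np.gradient(M_t, axis=(0, 1))   # spatial gradients per component
    d_t = M_t - M_prev                         # temporal gradient (backward difference)
    sq = np.sum(d_x ** 2 + d_y ** 2 + d_t ** 2, axis=-1)
    return float(np.sum(np.sqrt(sq + eps ** 2)))
```

A field that is constant in space and unchanged from the previous frame gives the minimum value, and any spatial ramp or frame-to-frame jump increases it.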

To obtain an appropriate three-dimensional motion from the above equations, various techniques that search for a globally optimal solution can be employed. Because the energy function sums the errors of all terms, the solution obtained is a global optimum that is robust to outliers and errors. The weight functions φ(.) and Ψ(.) are chosen to be convex so that the optimization is guaranteed to converge globally. The optimization depends on the initialization of the three-dimensional motion, and an initialization close to the global optimum gives faster convergence. If the motion frame is the first frame in the sequence, there is no past information about the motion. In this case, the motion is assumed to be rigid in units of blocks, and the three-dimensional motion of each block is calculated and used for initialization. For the other frames, a constant velocity model is used, which assumes that the motion between successive frames changes very slowly with time. Under this assumption, the three-dimensional motion is initialized to the three-dimensional motion of the previous frame.
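The two initialization strategies can be sketched as follows. The text does not say how the block-wise rigid motion is computed. This Python sketch assumes the pinhole relation m_x (Z + Tz) = f Tx - u Tz (and likewise for y), which is linear in T, and solves it by least squares over all pixels of a block. The per-pixel depth map `Z`, the block size, and both function names are assumptions.

```python
import numpy as np

def init_constant_velocity(M_prev):
    """Constant velocity model: start from the previous frame's 3D motion."""
    return M_prev.copy()

def init_blockwise_rigid(m_obs, u, v, Z, f, block=8):
    """First-frame initialization: one translation per block, by least squares.

    For each pixel:  f*Tx - (u + mx)*Tz = mx*Z  and  f*Ty - (v + my)*Tz = my*Z.
    """
    H, W = Z.shape
    T0 = np.zeros((H, W, 3))
    for y0 in range(0, H, block):
        for x0 in range(0, W, block):
            sl = (slice(y0, y0 + block), slice(x0, x0 + block))
            mx = m_obs[sl][..., 0].ravel()
            my = m_obs[sl][..., 1].ravel()
            ub, vb, Zb = u[sl].ravel(), v[sl].ravel(), Z[sl].ravel()
            n = mx.size
            A = np.zeros((2 * n, 3))
            b = np.empty(2 * n)
            A[:n, 0] = f
            A[:n, 2] = -(ub + mx)
            b[:n] = mx * Zb
            A[n:, 1] = f
            A[n:, 2] = -(vb + my)
            b[n:] = my * Zb
            # Same translation assigned to every pixel of the block.
            T0[sl] = np.linalg.lstsq(A, b, rcond=None)[0]
    return T0
```

A block needs several pixels at different image positions for the system to be well determined; with a single pixel the least-squares solve returns the minimum-norm solution instead.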

To represent the three-dimensional structure of the object being processed, a two-dimensional mesh structure is used, expressed as a time-evolving graph G_t = (V_t, E). Here, V_t is the time-evolving set of all vertices, and E is the fixed set of edges, which holds the information on how the vertices are connected. Each vertex in the vertex set holds the three-dimensional position of a point. The vertices alone give point cloud data, and the edge information makes it possible to construct a three-dimensional surface. The three-dimensional motion output by the variational optimization step is used to warp the previous shape and generate the current three-dimensional shape of the object. This is achieved by updating the three-dimensional positions of the vertices as V_{t+1} = V_t + M_t, without changing the edge connectivity of the graph. In this way, a mesh that evolves with time is obtained, and this mesh represents the three-dimensional structure of the object as it evolves. In the first frame, an estimate of the shape of the object is required. Many methods described in the literature can be used for this purpose. For example, rigid structure from motion can be applied to the first few frames to estimate the initial shape of the object. Photometric approaches such as shape from shading can also be used to obtain the initial shape.
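The mesh update amounts to adding a per-vertex motion to the vertex positions while reusing the edge set. The following Python sketch illustrates this. The `Mesh` class, the `warp_mesh` function, and the assumption that the optimized motion has already been sampled at the vertices are all hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class Mesh:
    vertices: np.ndarray   # (N, 3) three-dimensional vertex positions, V_t
    edges: tuple           # fixed connectivity E, as pairs of vertex indices

def warp_mesh(mesh, vertex_motion):
    """Return the mesh for the next frame: moved vertices, same edges.

    vertex_motion: (N, 3) three-dimensional motion sampled at each vertex.
    """
    return Mesh(vertices=mesh.vertices + vertex_motion, edges=mesh.edges)

# Usage: a single triangle translated by 0.1 along z.
tri = Mesh(vertices=np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]),
           edges=((0, 1), (1, 2), (2, 0)))
moved = warp_mesh(tri, np.tile([0.0, 0.0, 0.1], (3, 1)))
```

Returning a new `Mesh` instead of modifying the old one keeps the previous shape available, which matches the need to hold the previous frame's state while the current frame is processed.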

An experiment was performed to verify the effectiveness of the present invention. An image sequence of a surface bending under sinusoidal motion along the x axis was simulated. The present invention was then applied to this monocular image sequence to estimate how the moving surface evolves over time. The experimental results are shown in FIG. 5, which shows the estimated three-dimensional motion 602 of the object and the actual three-dimensional motion 601 of the object. The plot shows one particular frame out of all the frames in the image sequence. As the comparison shows, the present invention reconstructs a three-dimensional shape that agrees well with the ground truth data. The proposed scheme is thereby verified.

FIG. 5 is a block diagram showing an embodiment of a computer and network system in which an embodiment of the methodology of the present invention is implemented. The system comprises a computer with input devices and a network connection. The computer platform 502 includes an EEPROM (Electrically Erasable and Programmable Read Only Memory) and a RAM (Random Access Memory) for storing data and instructions, a CPU (Central Processing Unit) and a data bus for processing information and executing instructions, and a network card for connecting to a host or client system via a local network or the Internet 504. The computer platform may also be connected to a basic input/output device 501. The computer platform may further include, for example, a keyboard, a mouse, a display, and an external storage device.

Finally, it should be clear that the processes, techniques, and methodologies described and illustrated in this specification are not limited to, or inherently tied to, any particular apparatus. The present invention can be implemented by a suitable combination of components, and various types of general-purpose devices can be used in accordance with the teachings of this specification. The present invention has been described using a specific set of examples, but these are merely illustrative and not limiting. For example, the described software may be implemented in a wide variety of languages such as C++, Java (registered trademark), Python, and Perl. Other implementations of the technology of the present invention will be apparent to those skilled in the art.

[Description of symbols]
100 Three-dimensional motion evaluation apparatus
101 Two-dimensional corresponding point search unit
102 Three-dimensional motion optimization unit
103 Three-dimensional motion warping unit
200 Image sequence
201 Single point depth
202 Previous shape
203 Three-dimensional shape in the current frame
401 Time frame t
402 Time frame t+1
403 Specific point X_t
404 Point X_{t+1}
405 Image plane
406 Point m_t
407 Point m_{t+1}
408 T = (Tx, Ty, Tz)^T
501 Actual three-dimensional shape of the object
502 Estimated three-dimensional shape of the object
601 Basic input/output device
602 Computer platform
603 Client system
604 Local network or Internet
605 Host system

Claims

1. A three-dimensional motion evaluation apparatus for calculating dense non-rigid three-dimensional motion of an object from monocular images, comprising:
a two-dimensional image corresponding point search unit that searches for dense two-dimensional corresponding points between successive frames and outputs pixel-wise two-dimensional motion between frame images; and
a three-dimensional motion optimization unit that, in order to obtain accurate real-world three-dimensional motion, optimizes the error of the pixel-wise two-dimensional motion using the depth of a single point observed in a single frame, and thereby calculates the three-dimensional real-world motion of the object.

2. The three-dimensional motion evaluation apparatus according to claim 1, wherein the three-dimensional motion optimization unit:
projects the three-dimensional motion to two dimensions using a perspective projection model that assumes only a relative transformation between the camera and the object;
minimizes the error between the pixel-wise two-dimensional motion in the current frame and the modeled two-dimensional motion obtained from the absolute three-dimensional motion, using the camera intrinsic parameters and the depth value of a single point in a single frame; and
calculates the three-dimensional real-world motion that minimizes the error while preserving, for the optimal three-dimensional motion, spatial smoothness within each frame and temporal smoothness between frames.

3. The three-dimensional motion evaluation apparatus according to claim 1 or 2, further comprising a three-dimensional motion warping unit that receives a previous non-rigid three-dimensional shape as input, adds the calculated three-dimensional motion of the current frame, and updates the current non-rigid three-dimensional model.

4. A three-dimensional motion evaluation method for calculating dense non-rigid three-dimensional motion of an object from monocular images, comprising:
(a) searching for dense two-dimensional corresponding points between successive frames and outputting pixel-wise two-dimensional motion between frame images; and
(b) in order to obtain accurate real-world three-dimensional motion, optimizing the error of the pixel-wise two-dimensional motion using the depth of a single point observed in a single frame, and thereby calculating the three-dimensional real-world motion of the object.

5. The three-dimensional motion evaluation method according to claim 4, wherein step (b) includes:
projecting the three-dimensional motion to two dimensions using a perspective projection model that assumes only a relative transformation between the camera and the object;
minimizing the error between the pixel-wise two-dimensional motion in the current frame and the modeled two-dimensional motion obtained from the absolute three-dimensional motion, using the camera intrinsic parameters and the depth value of a single point in a single frame; and
calculating the three-dimensional real-world motion that minimizes the error while preserving, for the optimal three-dimensional motion, spatial smoothness within each frame and temporal smoothness between frames.

6. The three-dimensional motion evaluation method according to claim 4 or 5, further comprising:
(c) receiving a previous non-rigid three-dimensional shape as input, adding the calculated three-dimensional motion of the current frame, and updating the current non-rigid three-dimensional model.

7. A program for causing a computer to calculate dense non-rigid three-dimensional motion of an object from monocular images, the program causing the computer to execute:
(a) searching for dense two-dimensional corresponding points between successive frames and outputting pixel-wise two-dimensional motion between frame images; and
(b) in order to obtain accurate real-world three-dimensional motion, optimizing the error of the pixel-wise two-dimensional motion using the depth of a single point observed in a single frame, and thereby calculating the three-dimensional real-world motion of the object.

8. The program according to claim 7, wherein step (b) includes:
projecting the three-dimensional motion to two dimensions using a perspective projection model that assumes only a relative transformation between the camera and the object;
minimizing the error between the pixel-wise two-dimensional motion in the current frame and the modeled two-dimensional motion obtained from the absolute three-dimensional motion, using the camera intrinsic parameters and the depth value of a single point in a single frame; and
calculating the three-dimensional real-world motion that minimizes the error while preserving, for the optimal three-dimensional motion, spatial smoothness within each frame and temporal smoothness between frames.

9. The program according to claim 7 or 8, further causing the computer to execute:
(c) receiving a previous non-rigid three-dimensional shape as input, adding the calculated three-dimensional motion of the current frame, and updating the current non-rigid three-dimensional model.