WO2017154045A1 - 3d motion estimation device, 3d motion estimation method, and program - Google Patents

3d motion estimation device, 3d motion estimation method, and program

Info

Publication number
WO2017154045A1
WO2017154045A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
error
pixel-wise
rigid
Prior art date
2016-03-11
Application number
PCT/JP2016/001406
Other languages
English (en)
French (fr)
Inventor
Subhajit CHAUDHURY
Gaku Nakano
Original Assignee
NEC Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2016-03-11
Publication date
2017-09-14
Application filed by NEC Corporation
Priority to PCT/JP2016/001406 (published as WO2017154045A1)
Priority to JP2018548017A (granted as JP6806160B2)
Publication of WO2017154045A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/50: Depth or shape recovery
    • G06T7/55: Depth or shape recovery from multiple images
    • G06T7/579: Depth or shape recovery from multiple images from motion
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Definitions

  • The present invention relates to a 3D motion estimation device, a 3D motion estimation method, and a program for realizing them; it further relates to the field of 3D reconstruction from videos, and more particularly to non-rigid 3D motion estimation from monocular image sequences.
  • The field of 3D reconstruction from monocular image sequences has been an area of active research in the computer vision community for almost two decades.
  • 3D reconstruction from images finds applications in various fields such as animation, 3D printing, and video and image editing.
  • Most conventional systems in this field work on a camera-based scheme in which a camera captures images of the desired object from various viewpoints, and these images are used to compute the structure of the object and the motion of the camera simultaneously.
  • Structure from motion is a class of 3D reconstruction methods that is largely prevalent in this field. It takes an image sequence and computes the structure of the scene and the motion of the camera using a rank constraint on the image data. This stage is usually followed by a bundle adjustment stage, which optimizes the camera pose and the object structure simultaneously.
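  • One classical instance of such a rank constraint is the Tomasi-Kanade factorization; the sketch below is standard background material given for context, not an equation taken from this application.

```latex
% Tomasi-Kanade affine factorization (background sketch).
% Stacking P tracked points over F frames, after centering the image
% coordinates, gives a 2F x P measurement matrix that factors as
\[
W =
\begin{bmatrix}
\mathbf{x}_{11} & \cdots & \mathbf{x}_{1P} \\
\vdots          &        & \vdots          \\
\mathbf{x}_{F1} & \cdots & \mathbf{x}_{FP}
\end{bmatrix}
= M\,S,
\qquad
M \in \mathbb{R}^{2F \times 3},\;
S \in \mathbb{R}^{3 \times P},
\]
% so that rank(W) <= 3, the constraint exploited to recover the
% structure S and the camera motion M simultaneously.
```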
  • Non Patent Literature 1 adopts an affine camera model.
  • However, affine camera models cannot recover translations along the optical axis, because they assume that image formation is independent of the depth of the points from the camera optical center.
  • The work by Newcombe et al. (Non Patent Literature 5) on RGB-D input data computes a canonical rigid model of the object and then computes 3D motion between frames to animate the canonical rigid model, producing live non-rigid deformation. This work is the closest to the proposed invention, with the exception that the present invention uses only RGB information to compute 3D flow under a fixed perspective camera, which makes the problem much more difficult to solve.
  • Non Patent Literature 1: Garg, R.; Roussos, A.; Agapito, L., "Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video," in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pp. 1272-1279, 23-28 June 2013
  • Non Patent Literature 2: Triggs, B., "Factorization methods for projective structure and motion," in Computer Vision and Pattern Recognition (CVPR), 1996 IEEE Conference on, 1996
  • Non Patent Literature 3: Ondruska, P.; Kohli, P.; Izadi, S., "MobileFusion: Real-Time Volumetric Surface Reconstruction and Dense Tracking on Mobile Phones," in Visualization and Computer Graphics, IEEE Transactions on, vol. 21, no. 11, pp. 1251-1258, Nov. 2015
  • Non Patent Literature 4: René Vidal and Daniel Abretske, "Nonrigid Shape and Motion from Multiple Perspective Views," ECCV 2006 (European Conference on Computer Vision), 2006
  • Non Patent Literature 5: Newcombe, R.A.; Fox, D.; Seitz, S.M., "DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time," in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pp. 343-352, 7-12 June 2015

DISCLOSURE OF THE INVENTION

PROBLEMS TO BE SOLVED BY THE INVENTION
  • An example of an object of the present invention is to provide a 3D motion estimation device, a 3D motion estimation method, and a program that can solve for 3D motion from 2D correspondences and warp that 3D motion onto an initial model for non-rigid 3D reconstruction.
  • A 3D motion estimation device of the present invention is a 3D motion estimation device for computing the dense non-rigid 3D motion of an object from monocular images, comprising: a 2D image correspondence finding unit that finds dense 2D correspondences between subsequent frames and outputs the pixel-wise 2D motion between the images; and a 3D motion optimization unit that optimizes the pixel-wise 2D motion error using a single-point depth observed in a single frame to obtain accurate real-world 3D motion, and computes the real-world 3D motion of the object by optimizing the pixel-wise 2D motion error.
  • A 3D motion estimation method of the present invention is a 3D motion estimation method for computing the dense non-rigid 3D motion of an object from monocular images, the method comprising: a step (a) of finding dense 2D correspondences between subsequent reference frames and outputting the pixel-wise 2D motion between the images; and a step (b) of optimizing the pixel-wise 2D motion error using a single-point depth observed in a single frame to obtain accurate real-world 3D motion, and computing the real-world 3D motion of the object by optimizing the pixel-wise 2D motion error.
  • A program of the present invention is a program for computing, by a computer, the dense non-rigid 3D motion of an object from monocular images, the program causing the computer to execute: a step (a) of finding dense 2D correspondences between subsequent reference frames and outputting the pixel-wise 2D motion between the images; and a step (b) of optimizing the pixel-wise 2D motion error using a single-point depth observed in a single frame to obtain accurate real-world 3D motion, and computing the real-world 3D motion of the object from the optimized pixel-wise 2D motion error.
  • Figure 1 is a schematic block diagram showing the configuration of the 3D motion estimation device of the present invention.
  • Figure 2 is a block diagram showing a specific configuration of the 3D motion estimation device according to the embodiment of the present invention.
  • Figure 3 is a workflow showing the detailed steps of dense 3D motion field computation from 2D motion correspondences. It shows the various processes applied to each frame of 2D correspondences and how the final 3D shape is obtained.
  • Figure 4 shows the process of 3D-to-2D motion mapping under a perspective camera model.
  • Figure 5 shows a comparison of the shape of the object estimated using the present invention with the actual 3D shape of the object.
  • Figure 6 is a block diagram of an example of the computer platform on which the 3D motion estimation device may be implemented.
  • The process of 3D non-rigid reconstruction is treated as an incremental process, wherein the shape in the current frame is a combination of the shape in the previous frame and the 3D motion in the current frame.
  • Such a treatment of the 3D reconstruction process effectively reduces the problem to two sub-problems: (i) finding a reliable initial rigid model of the object, and (ii) computing 3D motion per frame to animate the rigid model, thus producing a non-rigid 3D reconstruction of the scene.
  • This invention focuses on computing robust 3D motion per frame to produce a 3D non-rigid animation of the initial rigid model from monocular vision.
  • The advantage of the technique of the present invention is that it can produce, from dense 2D correspondences and assuming a perspective camera projection model, a dense 3D motion field that is temporally and spatially coherent and thus robust to image and motion noise.
  • The framework of the proposed invention is incremental in nature, i.e., the solution of the 3D motion field for the current frame depends on the previous frames.
  • The proposed model makes use of a known absolute depth at least at a single point, which makes it possible to compute 3D motion at absolute scale in any direction of motion with respect to the camera.
  • The present invention accordingly comprises several steps, the relation of one or more of these steps with respect to each of the others, and the apparatus embodying the features of construction, combinations of elements, and arrangement of parts that are adapted to effect such steps, all of which will be exemplified in the following detailed disclosure, i.e. the description of drawings and the detailed description. The scope of the invention will be indicated in the claims.
  • Embodiments: A network system, a 3D motion estimation device, a 3D motion estimation method, and a program according to embodiments of the present invention will be described below with reference to Figs. 1 to 6.
  • The present invention relates to computing dense non-rigid 3D motion from 2D image correspondences using a perspective camera model.
  • The present invention can be broadly divided into dense 2D correspondence computation, and computation of the 3D motion from those 2D correspondences using a perspective projection model while constraining the solution by temporal and spatial consistency constraints. This 3D motion is then warped with the previous 3D shape to obtain the current 3D shape.
  • Fig. 1 is a schematic block diagram showing the configuration of the 3D motion estimation device of the present invention.
  • The 3D motion estimation device (100) shown in Fig. 1 performs the above-mentioned tasks.
  • The 3D motion estimation device can be further divided into various units. From here on, the functionality of each unit will be described with reference to Fig. 1.
  • The 3D motion estimation device (100) includes a 2D image correspondence finding unit (101) and a 3D motion optimization unit (102).
  • The 2D image correspondence finding unit (101) finds dense 2D correspondences between subsequent frames and outputs the pixel-wise 2D motion between the images.
  • The 3D motion optimization unit (102) optimizes the pixel-wise 2D motion error using a single-point depth observed in a single frame to obtain accurate real-world 3D motion, and computes the real-world 3D motion of the object by optimizing the pixel-wise 2D motion error.
  • Because the pixel-wise 2D motion error is optimized using a single-point depth, it is possible to solve for 3D motion from 2D correspondences and warp that 3D motion onto an initial model for non-rigid 3D reconstruction.
  • Fig. 2 is a block diagram showing a specific configuration of the 3D motion estimation device according to the embodiment of the present invention.
  • In Fig. 2, the 3D motion estimation device (100) further includes a 3D motion warping unit (103) in addition to the 2D image correspondence finding unit (101) and the 3D motion optimization unit (102).
  • The first unit in the proposed 3D motion estimation device (100) is the 2D image correspondence finding unit (101).
  • The process of 3D motion estimation begins by first estimating 2D correspondences between image frames.
  • An image sequence (200) is input to the 2D image correspondence finding unit (101), which computes the pixel-wise 2D motion between subsequent frames.
  • The 2D image correspondence finding unit (101) computes a dense 2D correspondence field by comparing image patches in subsequent frames that match in image intensity.
  • One class of methods for finding 2D correspondences between images is optical flow, where, for each pixel in the reference frame, motion vectors are computed in a dense fashion under the brightness constancy assumption, using local or global optimization.
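  • In standard optical-flow notation (background material, not an equation from this application), the brightness constancy assumption and its first-order linearization read:

```latex
% A pixel keeps its intensity as it moves by (u, v) from frame t to t+1:
\[
I(x + u,\, y + v,\, t + 1) = I(x, y, t)
\]
% Linearizing gives the optical-flow constraint used by local or
% global solvers:
\[
I_x u + I_y v + I_t = 0
\]
```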
  • Another method of finding 2D image correspondences is feature tracking, where a feature descriptor is calculated around each pixel in the reference frame and matched in the target frame to compute the 2D motion vector.
  • The optical flow method, the feature-tracking method, or similar methods can be used to find the 2D motion, as in the sketch below.
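  • As a minimal sketch of this step, dense optical flow computed with OpenCV's Farneback method is one possible implementation; the patent does not prescribe a specific algorithm, and the parameter values below are illustrative.

```python
import cv2

def dense_2d_correspondence(prev_img, curr_img):
    """Compute pixel-wise 2D motion between two subsequent frames.

    Farneback dense optical flow is one possible choice here; any
    dense 2D correspondence method could be substituted."""
    prev_gray = cv2.cvtColor(prev_img, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_img, cv2.COLOR_BGR2GRAY)
    # Returns an (H, W, 2) array of per-pixel (du, dv) motion vectors.
    return cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```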
  • The next unit is the 3D motion optimization unit (102), which finds the corresponding 3D motion from the 2D correspondences.
  • The dense 2D correspondences obtained from the 2D image correspondence finding unit (101) are used to compute the 3D motion by assuming a perspective camera model.
  • The 3D motion optimization unit (102) projects the current estimate of the 3D motion onto the image plane using the perspective projection model.
  • The absolute depth at a point can be obtained using a commercial laser depth sensor.
  • Image-based methods, such as triangulation of a known pattern on the object, can also be used to obtain the absolute depth.
  • The projected 2D motion is compared with the observed 2D motion, and the error between them is minimized by an optimization algorithm.
  • The energy function used in the variational optimization also ensures that spatial and temporal consistency of the 3D motion solution is maintained, which makes the final solution robust to outliers.
  • The 3D motion optimization unit (102) outputs the optimal 3D motion in the current frame.
  • The 3D motion warping unit (103) projects the 3D motion to 2D motion using a perspective projection model, assuming only relative translation between the camera and the object.
  • The 3D motion warping unit (103) computes the error between the pixel-wise 2D motion in the current frame and the modeled absolute 3D-to-2D motion, using the intrinsic parameters of the camera and the single-point depth value at a single frame.
  • The 3D motion warping unit (103) minimizes this error while preserving spatial smoothness within each frame and temporal smoothness between frames for the optimal 3D motion, and computes the real-world 3D motion by minimizing the error.
  • The 3D motion warping unit (103) can take the previous non-rigid 3D shape as input and add the computed 3D motion of the current frame to update the current non-rigid 3D model.
  • The 3D motion warping unit (103) outputs the 3D shape in the current frame (203).
  • The next step is to warp the current 3D motion, obtained in the previous step, with the previous shape (202) of the object by the 3D motion warping unit (103), and to obtain the final shape in the current frame (203).
  • This is achieved by representing the 3D shape as a 2D mesh in which each vertex is a 3D location with some edge connectivity information.
  • The 3D position of each vertex in the mesh is updated using the computed current 3D motion and the vertex positions of the mesh in the previous frame, keeping the edge connectivity between the vertices unchanged. This achieves the final non-rigid shape reconstruction of the object from the corresponding image sequences; a sketch of this update follows.
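  • A minimal sketch of this vertex update (array shapes and names are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def warp_mesh(vertices_prev, motion_3d):
    """Warp the previous mesh with the current frame's 3D motion.

    vertices_prev: (N, 3) vertex positions of the mesh in the previous frame.
    motion_3d:     (N, 3) optimized 3D motion for each vertex.
    The edge set of the mesh is shared across frames and never modified,
    so only the vertex positions evolve over time."""
    assert vertices_prev.shape == motion_3d.shape
    return vertices_prev + motion_3d
```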
  • Fig. 3 is a flowchart showing the operations of the 3D motion estimation device according to the embodiment of the present invention.
  • In the following description, Figs. 1 and 2 will be referenced as appropriate.
  • The 3D motion estimation method is carried out by operating the 3D motion estimation device (100). Accordingly, the following description of the operations of the 3D motion estimation device (100) also serves as a description of the 3D motion estimation method according to this embodiment.
  • First, the 2D image correspondence finding unit (101) of the system takes the previous image frame and the current image frame as input (step 301). Next, the 2D image correspondence finding unit (101) computes the dense 2D correspondences from these images (step 302).
  • The 2D image correspondence finding unit (101) then determines whether the current input 2D motion frame is the first frame in the sequence (step 303). If it is the first frame, the 2D image correspondence finding unit (101) computes an initial value for the 3D motion by assuming that the object is block-wise rigid and computing the motion in each block (step 304).
  • Otherwise, the 2D image correspondence finding unit (101) computes the initial value for the 3D motion in the current frame by assuming a constant velocity model and using the previous 3D motion field (step 305).
  • The 2D image correspondence finding unit (101) projects the 3D motion onto the image plane using the known absolute depth (step 306).
  • The 2D image correspondence finding unit (101) calculates the error between the projected and the observed 2D motion (step 307).
  • The 3D motion optimization unit (102) determines whether the error is less than a certain threshold (step 308). If the error is not less than the threshold, the 3D motion optimization unit (102) performs an optimization step in which the 3D motion is updated so as to reduce the 2D motion error between the projected and the observed motion (step 309).
  • Finally, the 3D motion warping unit (103) warps the shape of the object in the previous frame (step 310) with the optimal 3D motion obtained from the 3D motion optimization unit (102). As a result, the final output of the algorithm is the 3D shape in the current frame (step 311).
  • In the perspective projection model (Equation 1), u and v are the zero-centered image coordinates, f is the focal length, and Z is the absolute distance of the point from the camera optical center.
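  • Equation 1 itself is not reproduced in this text; a standard perspective-projection form consistent with these definitions and with Fig. 4 is reconstructed below (a sketch, not a verbatim quotation of the patent's equation).

```latex
% Perspective projection of a 3D point (X, Y, Z) onto zero-centered
% image coordinates (u, v) with focal length f:
\[
u = \frac{fX}{Z}, \qquad v = \frac{fY}{Z}
\]
% For a per-pixel 3D translation T = (T_x, T_y, T_z)^\top, differencing
% the projections gives the induced 2D motion o_t:
\[
o_t =
\begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix}
= \frac{1}{Z + T_z}
\begin{pmatrix} f\,T_x - u\,T_z \\ f\,T_y - v\,T_z \end{pmatrix}
\]
% Knowing the absolute depth Z at even a single pixel fixes the overall
% scale, which is why a single-point depth suffices.
```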
  • The present invention computes the 3D motion T from the 2D motion o_t for every pixel representing the object, in a dense fashion.
  • Equation 2 gives the global energy function to be minimized.
  • The data term E_d depends on the observed 2D motion and the estimate of the 3D motion.
  • The data term measures the error between the 2D motion projected onto the image plane and the observed 2D motion, and penalizes the solution if the error is high.
  • The data term is generally given by Equation 3.
  • The second term in the energy function is the smoothness term, which is responsible for maintaining the spatial and temporal smoothness of the solution.
  • Spatial smoothness refers to the final 3D motion solution being smooth along the image X-Y axes without any abrupt discontinuity. This means that there should be a strong correlation between the 3D motions of neighboring pixels. It is expressed as the spatial gradient of the 3D motion along the X-Y axes in image coordinates.
  • Temporal coherence refers to the fact that, for a given pixel, the velocity should not change abruptly with time; rather, there should be a smooth temporal variation in the pixel's 3D velocity. This manifests in the energy term as a gradient along the time axis. The smoothness term E_s is thus given by Equation 4.
  • ( ⁇ / ⁇ x, ⁇ / ⁇ y, ⁇ / ⁇ t) measures the gradient of the 3d motion vectors along the image X,Y axis and the temporal axis .
  • the weighing function phi(.) is a robust kernel similar to psi(.). The summation of the smoothness in each pixel is taken over the entire image.
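  • Equations 2-4 are likewise not reproduced here; a plausible reconstruction from the surrounding description is sketched below, where π(·) denotes the perspective projection, Ω the image domain, X the 3D point at a pixel, and the trade-off weight λ is an added assumption.

```latex
% Global energy over the dense 3D motion field T (cf. Equation 2):
\[
E(T) = E_d(T) + \lambda\, E_s(T)
\]
% Data term (cf. Equation 3): robust penalty psi(.) on the difference
% between the projected and the observed 2D motion:
\[
E_d(T) = \sum_{\mathbf{x} \in \Omega}
  \psi\!\left( \left\lVert \pi(\mathbf{X} + T) - \pi(\mathbf{X}) - o_t \right\rVert^2 \right)
\]
% Smoothness term (cf. Equation 4): robust penalty phi(.) on the
% spatial and temporal gradients of the 3D motion field:
\[
E_s(T) = \sum_{\mathbf{x} \in \Omega}
  \phi\!\left( \left\lVert \tfrac{\partial T}{\partial x} \right\rVert^2
             + \left\lVert \tfrac{\partial T}{\partial y} \right\rVert^2
             + \left\lVert \tfrac{\partial T}{\partial t} \right\rVert^2 \right)
\]
```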
  • The 3D motion in the previous frame is required to compute the temporal gradient of the motion, so it must also be kept in a buffer during the current frame's 3D motion computation.
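  • A compact numerical sketch of this optimization follows. It substitutes quadratic penalties for the robust kernels ψ(·) and φ(·), omits the temporal term (which would difference T against the buffered previous-frame motion), and uses illustrative names throughout; it is an assumption-laden simplification, not the patent's exact solver.

```python
import numpy as np
from scipy.optimize import minimize

def project_motion(X, Y, Z, T, f):
    """2D motion induced by a per-pixel 3D translation T under a
    perspective camera (see the reconstructed Equation 1 above)."""
    u, v = f * X / Z, f * Y / Z
    du = (f * T[..., 0] - u * T[..., 2]) / (Z + T[..., 2])
    dv = (f * T[..., 1] - v * T[..., 2]) / (Z + T[..., 2])
    return np.stack([du, dv], axis=-1)

def energy(T_flat, obs_flow, X, Y, Z, f, lam, shape):
    """E = E_d + lam * E_s with quadratic penalties."""
    T = T_flat.reshape(shape)                        # (H, W, 3) motion field
    data = np.sum((project_motion(X, Y, Z, T, f) - obs_flow) ** 2)
    gx, gy = np.diff(T, axis=1), np.diff(T, axis=0)  # spatial gradients
    return data + lam * (np.sum(gx ** 2) + np.sum(gy ** 2))

# Usage sketch: T0 is the (H, W, 3) initial motion (steps 304/305) and
# obs_flow the (H, W, 2) observed 2D motion (step 302); X, Y, Z are the
# per-pixel 3D coordinates of the current shape estimate.
# res = minimize(energy, T0.ravel(),
#                args=(obs_flow, X, Y, Z, f, 0.1, T0.shape),
#                method="L-BFGS-B")
# T_opt = res.x.reshape(T0.shape)
```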
  • The mesh is described by V_t, the temporally evolving set of all vertices, and E, the unchanging set of edges containing the information on how the vertices are connected.
  • Each vertex in the vertex set contains the 3D location of a point.
  • The vertices by themselves give a point cloud, and with the edge information a surface can be constructed in 3D.
  • The 3D motion output from the variational optimization step is warped onto the previous shape to produce the current 3D shape of the object.
  • Fig. 6 is a block diagram showing an embodiment of a computer and network system upon which an embodiment of the inventive methodology may be implemented.
  • The system comprises a computer along with input devices and a network connection.
  • The computer platform (502) may include an EEPROM (Electrically Erasable and Programmable Read-Only Memory) and RAM (Random Access Memory) for the storage of data and instructions, a CPU (Central Processing Unit) and a data bus for processing information and carrying out instructions, and a network card to connect to the host or client systems (505, 503) using a local network or the Internet (504).
  • The computer platform can be connected to basic input-output devices (501). These can include, for example, a keyboard, a mouse, a display, and external storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Priority Applications (2)

Application Number / Priority Date / Filing Date / Title
PCT/JP2016/001406 / 2016-03-11 / 2016-03-11 / 3d motion estimation device, 3d motion estimation method, and program (published as WO2017154045A1, en)
JP2018548017A / 2016-03-11 / 2016-03-11 / 3D motion estimation device, 3D motion estimation method, and program (granted as JP6806160B2, ja)


Publications (1)

Publication Number: WO2017154045A1 (en)
Publication Date: 2017-09-14

Family

Family ID: 59789048

Family Applications (1)

Application Number / Title / Priority Date / Filing Date
PCT/JP2016/001406 / 3d motion estimation device, 3d motion estimation method, and program (WO2017154045A1, en) / 2016-03-11 / 2016-03-11

Country Status (2)

Country Link
JP (1) JP6806160B2 (ja)
WO (1) WO2017154045A1 (ja)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09326029A * 1996-06-04 1997-12-16 Fujitsu Ltd Three-dimensional measurement method and apparatus (3次元計測方法及び装置)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MORIKAWA, HIROYUKI ET AL.: "Image Sequence Coding Using 3-D Structure and Motion Information", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J73-D-II, no. 7, 25 July 1990 (1990-07-25), pages 982-991, XP000262674 *
SHIMADA, NOBUTAKA ET AL.: "Model Fitting of Articulated Objects", IPSJ SIG TECHNICAL REPORTS, vol. 2006, no. 51, 19 May 2006 (2006-05-19), pages 375-392 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020003037A1 (en) * 2018-06-26 2020-01-02 Sony Corporation Motion compensation of geometry information
KR20200142567A (ko) * 2018-06-26 2020-12-22 Sony Corporation Motion compensation of geometry information
CN112219222A (zh) * 2018-06-26 2021-01-12 Sony Corporation Motion compensation of geometry information
KR102402494B1 (ko) * 2018-06-26 2022-05-30 Sony Group Corporation Motion compensation of geometry information
US11501507B2 (en) 2018-06-26 2022-11-15 Sony Group Corporation Motion compensation of geometry information
CN110232705A (zh) * 2019-05-17 2019-09-13 Shenyang University Inverse low-rank sparse learning target tracking method incorporating fractional-order variational adjustment
CN110232705B (zh) * 2019-05-17 2023-05-12 Shenyang University Inverse low-rank sparse learning target tracking method incorporating fractional-order variational adjustment
US11321859B2 (en) 2020-06-22 2022-05-03 Toyota Research Institute, Inc. Pixel-wise residual pose estimation for monocular depth estimation

Also Published As

Publication number Publication date
JP6806160B2 (ja) 2021-01-06
JP2019507934A (ja) 2019-03-22

Similar Documents

Publication Publication Date Title
US10553026B2 (en) Dense visual SLAM with probabilistic surfel map
US20230054821A1 (en) Systems and methods for keypoint detection with convolutional neural networks
Fortun et al. Optical flow modeling and computation: A survey
US7623731B2 (en) Direct method for modeling non-rigid motion with thin plate spline transformation
Yu et al. Robust video stabilization by optimization in cnn weight space
US7755619B2 (en) Automatic 3D face-modeling from video
Stühmer et al. Real-time dense geometry from a handheld camera
US8885880B2 (en) Robust video stabilization
US8824801B2 (en) Video processing
JP2007257287A (ja) Image registration method
WO2015043872A1 (en) Semi-dense simultaneous localization and mapping
US20180005039A1 (en) Method and apparatus for generating an initial superpixel label map for an image
US20140168204A1 (en) Model based video projection
Senst et al. Robust local optical flow: Long-range motions and varying illuminations
WO2017154045A1 (en) 3d motion estimation device, 3d motion estimation method, and program
Xu et al. Integrating motion, illumination, and structure in video sequences with applications in illumination-invariant tracking
Yu et al. A GPU-based implementation of motion detection from a moving platform
Kuschk et al. Real-time variational stereo reconstruction with applications to large-scale dense SLAM
Snape et al. Face flow
Xu et al. Optical flow-based video completion in spherical image sequences
Jones Accurate and Computationally-inexpensive Recovery of Ego-Motion using Optical Flow and Range Flow with Extended Temporal Support.
Roy-Chowdhury et al. Face Tracking.
Ramnath et al. Increasing the density of active appearance models
Zinßer et al. High-speed feature point tracking
Schreiber Incorporating symmetry into the Lucas–Kanade framework

Legal Events

Code / Title / Details
ENP / Entry into the national phase / Ref document number: 2018548017; Country of ref document: JP; Kind code of ref document: A
NENP / Non-entry into the national phase / Ref country code: DE
121 / Ep: the epo has been informed by wipo that ep was designated in this application / Ref document number: 16893377; Country of ref document: EP; Kind code of ref document: A1
122 / Ep: pct application non-entry in european phase / Ref document number: 16893377; Country of ref document: EP; Kind code of ref document: A1