WO2016193393A1 - Real-time temporal filtering and super-resolution of depth image sequences - Google Patents

Real-time temporal filtering and super-resolution of depth image sequences

Info

Publication number
WO2016193393A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
depth
observed
depth image
pixel value
Prior art date
Application number
PCT/EP2016/062554
Other languages
French (fr)
Inventor
Kassem AL ISMAEIL
Djamila AOUADA
Original Assignee
Université Du Luxembourg
Priority date
Filing date
Publication date
Application filed by Université Du Luxembourg filed Critical Université Du Luxembourg
Publication of WO2016193393A1 publication Critical patent/WO2016193393A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20172 Image enhancement details
    • G06T2207/20182 Noise reduction or smoothing in the temporal domain; Spatio-temporal filtering

Definitions

  • FIG. 4 is a flow chart of the proposed multi-frame depth super-resolution algorithm for dynamic depth videos containing one or multiple non-rigidly deforming objects in accordance with this particularly preferred embodiment of the invention.
  • the method steps 10-40 as shown in Figure 1 are also identified in Figure 4.
  • the depth surface can be defined as the following mapping:
  • the surface deformation may then be expressed through the derivative of the depth surface in the direction of the motion, resulting in a range flow (u_{t_0}, v_{t_0}, w_{t_0}), where the lateral displacement is m_{t_0} = (u_{t_0}, v_{t_0}) and the radial displacement in the depth direction is w_{t_0} = dz/dt evaluated at t_0.
  • m_{t_0} is computed using available approaches for 2D optical flow estimation.
  • the 2D optical flow is computed using the low resolution 2D intensity images associated with the considered depth sensor.
  • the intensity (amplitude) images provided by the ToF camera cannot be used directly. Their values differ significantly depending on the integration time and object distance from the camera.
  • a standardization step is applied, which is similar to the one proposed in [17] prior to motion estimation, see Figure 5.
  • Figures 5a and 5b show original amplitude images for a dynamic scene containing a hand moving towards the camera, where the intensity (amplitude) values differ significantly depending on the object distance from the camera.
  • the corrected amplitude images for the same scene are presented in Figures 5c and 5d respectively, where the intensity consistency is preserved.
  • the 2D optical flow can be directly estimated using the depth images after a preprocessing step with a bilateral filter.
  • the bilateral filter is only used in the preprocessing step while the original depth data is mapped in the registration step.
  • the registered depth image from (t_0 - 1) to t_0 is defined as z̃_{t_0-1}. Consequently, the radial displacement may be approximated by the temporal difference between depth values, i.e., w_{t_0} ≈ z_{t_0} - z̃_{t_0-1}.
  • both the depth measurement and the radial displacement are to be filtered.
  • for the filtering, one may introduce a Gaussian system; a noisy depth observation may thus be modelled as z_t = f_t + n_t, where a noise n_t ~ N(0, σ_n²) may be assumed.
  • the dynamic model is then defined as z_t = z̃_{t-1} + w_{t-1} + e_t, where the error term e_t accounts for the acceleration uncertainty of the pixel.
  • a prediction ẑ_{t|t-1} may then be computed and subsequently corrected using the observed measurement z_t.
  • the corrected error is considered as the difference between the prediction and the observation. This per-pixel filtering is extended to all pixels of the depth frame and incorporated in the SR framework in Section 4.

4. Proposed Recursive Depth Video Super-Resolution
  • f_t is estimated in two steps: first, finding a blurred version ĥ_t as the result of the filtering step, then a deblurred version f̂_t as the result of the MAP iterative regularization.
  • the obtained motion vectors are further scaled using the SR factor r.
  • the scaled motion vectors are then used in order to register the depth images f_{t-1} and g_t↑, resulting in f̃_{t-1}.
  • the registration step reorders the pixels in order to have a correspondence that enables a direct pixel-wise filtering over time, so that the filtering can be applied as outlined in Section 3.
  • the observation model in (12) is applicable to the SR data model in (6) under the assumption of a zero mean additive white Gaussian noise.
  • the dynamic model in (14) is actually equivalent to the model in (3), and one can prove that the innovation is related to the depth displacement w_{t-1} and the acceleration uncertainty of the pixel p_t by the following equation:
  • the choice of the threshold value is related to the type of the depth sensor used and the level of the sensor-specific noise.
  • λ is the regularization parameter
  • B is the blurring matrix
  • the matrices S_x^i and S_y^j are shifting matrices, which shift f_t by i and j pixels in the horizontal and vertical directions, respectively.
  • the scalar α ∈ ]0, 1] is the base of the exponential kernel which controls the speed of decay [3].
  • the MAP estimation in (17) is applied, wherein a multi-level version in a similar fashion as in [14, 19, 6] is used.
  • the parameter β is a scalar which represents the step size in the direction of the gradient, I is the identity matrix, and sign(·) is the sign function.
  • for an SR factor r = 1, the SR problem is merely a denoising one. In other words, the objective is not to increase resolution, and hence there is no blur due to upsampling. In contrast, by increasing the SR factor r, more blurring effects occur, leading to a higher 3D error in the final reconstructed HR scene, see Figure 6.
  • one pixel p_t was randomly chosen, and its filtered depth value ẑ_t was tracked along with its filtered depth displacement, see Figures 7a-7f.
  • the proposed method in accordance with the particularly preferred embodiment of the invention is evaluated using a complex scene with a highly non-rigidly moving object.
  • the publicly available “Samba” [1] data set is used. This dataset provides a real sequence of a full 3D dynamic dancing lady scene with high resolution ground truth. This sequence is quite complex, as it contains both non-rigid radial motions and self-occlusions, represented by hand and leg movements, respectively.
  • the publicly available toolbox V-REP [2] is used to create from the “Samba” data a synthetic depth sequence with fully known ground truth.
  • a depth camera is fixed at a distance of 2 meters from the 3D scene. Its resolution is 1024 × 1024 pixels. The camera is used to capture the depth sequence.
  • the created LR noisy depth sequence is then super-resolved using conventional bicubic interpolation, the state-of-the-art UP-SR [4] and SISR [5] methods, and the proposed algorithm.
  • Table I: 3D RMSE in mm for the super-resolved dancing girl sequence using different SR methods. These methods are applied on LR noisy depth sequences with two noise levels.
  • the reconstructed HR depth images are back projected to the 3D world using the camera matrix, and the 3D RMSE of each back projected 3D point cloud as compared to the 3D ground truth is calculated (a sketch of this evaluation step is given at the end of this list).
  • Table I shows the 3D reconstruction error of the bicubic, UP-SR [4], and SISR [5] methods as compared to the proposed method versus different noise levels. The comparison is done at two levels: (i) different parts of the reconstructed 3D body, namely, the hand, torso, and leg, and (ii) the full reconstructed 3D body. As expected, by applying the conventional bicubic interpolation method directly on depth images, a large error is obtained.
  • Figure 8a is a 3D plot of one LR depth frame, while Figure 8f is the 3D ground truth.
  • the proposed algorithm has been tested on a real sequence captured with a Time of Flight, ToF, camera (pmd CamBoard Nano™).
  • the captured LR depth sequence contains a non-rigidly moving face. Samples of the LR captured frames are plotted in the first and second rows of Figure 9.
  • Figure 9 shows results of applying the proposed algorithm according to the particularly preferred embodiment of the invention on a real sequence captured by a LR ToF camera (120x160 pixels) of a non-rigidly moving face.
  • First and second rows contain a 3D plotting of selected LR captured frames.
  • Figure 10 plots the filtered depth value of a randomly chosen tracked pixel.
  • the blue line shows the filtered trajectory of this pixel as compared to its raw noisy measurement in red.
  • the algorithm's run-time on this sequence is 50 ms per frame on a 2.2 GHz i7 processor with 4 GB of RAM.
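As referenced in the evaluation item above, a minimal sketch of the back-projection and 3D RMSE computation is given here. The pinhole intrinsics (fx, fy, cx, cy) are assumptions standing in for the actual camera matrix of the simulated sensor; this is an illustration, not part of the patent disclosure.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth image to a 3D point cloud with pinhole intrinsics."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    return np.stack([(xs - cx) * depth / fx,   # X
                     (ys - cy) * depth / fy,   # Y
                     depth], axis=-1)          # Z

def rmse_3d(depth_est, depth_gt, intrinsics):
    """3D RMSE (same unit as the depth maps, e.g. mm) against ground truth."""
    p_est = backproject(depth_est, *intrinsics)
    p_gt = backproject(depth_gt, *intrinsics)
    return np.sqrt(np.mean(np.sum((p_est - p_gt) ** 2, axis=-1)))
```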

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention proposes a method and device for efficient real-time temporal filtering and super-resolution of depth image sequences. By using the method steps in accordance with the invention, the complexity of processing captured depth image sequences is reduced to that of filtering one-dimensional signals.

Description

REAL-TIME TEMPORAL FILTERING AND SUPER-RESOLUTION OF DEPTH IMAGE
SEQUENCES
Technical field
The present invention relates to the field of image processing. In particular, it relates to methods and devices for temporally filtering depth image sequences, which may comprise upsampled depth images captured at an initial low resolution.
Background of the invention
Recent developments in depth sensing technologies, be they Time of Flight, ToF, cameras or structured light cameras, have led to an explosion of applications in gaming, automotive sensing, surveillance, medical care, and many more. The major problems of these sensors are their high contamination with noise and their low spatial resolution. In addition, in the case of large distances between the sensor and the scene of interest, a similar effect is observed even when using a relatively high resolution depth sensor.
Dynamic depth videos with one or multiple moving objects deforming non-rigidly represent a very typical scenario encountered in applications such as people sensing, cloth deformation, hand gesture recognition, and computer-based recognition of variations of facial expressions, to name a few. Such scenes are more challenging than static scenes. Indeed, in addition to challenges due to noise and outliers, non-rigid deformations in 3D cause occlusions, which result in missing data and in undesired holes. Super-resolution, SR, algorithms have been proposed as a solution to this problem. Two categories of algorithms may be distinguished: multi-frame SR, which uses multiple frames in an inverse problem formulation to reconstruct one high resolution frame [16, 7, 4], and single-image SR, which is based on dictionary learning and heavy training [5, 12]. In [4], a first dynamic multi-frame depth SR has been proposed. This algorithm is, however, limited to lateral motions, and fails in the case of radial deformations in the depth direction. Moreover, it is not practical for real-time applications due to a heavy cumulative motion estimation process applied to a certain number of frames buffered in the memory. Alternatively, a recursive formulation may be thought of, as in [15], where an iterative SR was proposed based on a block affine motion model, resulting in relatively efficient processing. This, however, is not applicable to non-lateral motions. Earlier attempts at recursive SR approaches have proposed to use a Kalman filter formulation [8, 10, 9, 13, 18]. These methods work only under two conditions: constant translational motion between low resolution frames, which represents the system motion model (i.e., transition matrix), and an intensity consistency assumption between each pair of images in the video sequence. In the case of dynamic depth videos, these assumptions are not always valid. A local motion model such as a dense 2D optical flow, as in [4], is not sufficient; it is necessary to account for the full 3D motion in the SR reconstruction, known as scene flow, or the 2.5D motion, known as range flow.
Super-resolution, SR, is a common technique used to recover a high resolution, HR, reference image from a plurality of observed low resolution, LR, images subject to errors due to the optical acquisition system such as noise and blurring, and to deviations from the reference image due to relative motion.
Patent document US 2014/0169701 A1 discloses an algorithm for generating high resolution depth images from captured low resolution depth images. The disclosed algorithm relies on the availability of high resolution, HR, image data. Features of the HR image (boundaries, edges) are detected, and a boundary map is generated therefrom. After an initial upsampling of the low resolution depth information, the boundary map is used to identify regions in the upsampled depth image, which require further refinement.
Patent document US 8,532,425 B2 discloses a method for generating a dense depth map using an adaptive joint bilateral filter, starting from an initial depth map. Parameters of the filter are adapted based upon the content of an image corresponding to the initial depth map.
The UP-SR algorithm proposed in [3] is limited to lateral motions as it only computes 2D dense optical flow but does not account for the full motion in 3D, known as scene flow, or the 2.5D motion, known as range flow. It consequently fails in the case of radial deformations. Moreover, it is not practical because of a heavy cumulative motion estimation process applied to a number of frames buffered in the memory.
Technical problem to be solved
It is an objective of the present invention to provide a method and device for real time filtering of depth image data, which overcomes at least some of the disadvantages of the prior art.
Summary of the invention
The invention provides a method for generating in real time a temporally filtered representation of a depth image sequence. The method comprises the following subsequent steps: a) providing a sequence of at least two depth images representing depth image data captured using depth image sensing means at two consecutive instants in time, wherein a first depth image precedes a second, observed depth image, and wherein each pixel value of a depth image represents the distance between the image sensing means and an imaged object at the time of capture;
b) spatially registering each pixel of the first image with an observed pixel of the observed image;
c) filtering each observed pixel value by computing a filtered pixel value based on the observed pixel value and on the corresponding registered pixel value of the first image, comprising computing an approximation of the depth displacement for each pixel, by computing the difference between the pixel value of the observed image and the corresponding registered pixel value of the first, preceding image;
d) replacing said observed depth image with the resulting filtered depth image.
The resulting filtered depth image is used as the first depth image in a subsequent application of steps a) to d).
Preferably, the method may further comprise the subsequent step of using the filtered depth image resulting from the immediately previous application of steps a) to d) as the first depth image in a following, iterative application of steps a)-d).
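Purely by way of illustration, the recursive structure of steps a) to d) may be sketched as follows in Python; capture_depth_frame, register and temporal_filter are hypothetical placeholders for the acquisition, registration and filtering operations described above, not part of the disclosure itself.

```python
# Minimal sketch of the recursive application of steps a)-d);
# all three helper functions are hypothetical placeholders.
def run_recursive_filter(capture_depth_frame, register, temporal_filter):
    f_prev = capture_depth_frame()           # initial image starting the recursion
    while True:
        g_obs = capture_depth_frame()        # a) observed depth image at time t
        f_reg = register(f_prev, g_obs)      # b) register the first image onto the observation
        f_t = temporal_filter(g_obs, f_reg)  # c) filter each observed pixel value
        yield f_t                            # d) the filtered image replaces the observation...
        f_prev = f_t                         #    ...and becomes the first image of the next pass
```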
The registration step b) may preferably comprise estimating, for each pixel of the first image, a motion vector, which is an estimation of the spatial displacement of said pixel with respect to an observed pixel of the observed image. Preferably, the observed image may be provided at a first, low, spatial resolution, while the first, preceding image is provided at a second spatial resolution being r times larger than the first resolution, r>0. The registration and filtering steps b) and c) may be performed using an upsampled representation of the observed image, the upsampling factor being equal to r.
The motion vectors may preferably be computed using the low resolution observed image and upsampled by the factor r.
Alternatively, said motion vectors may be computed using the upsampled representation of the observed image. The filtering step c) may preferably comprise the generation of a predicted pixel value based on the pixel value of the first image, and a correction of said predicted pixel value based on the corresponding observed pixel value. Further preferably, the filtering step c) may comprise applying a spatial median filter to said observed pixel, if the absolute difference between said predicted pixel value and said observed pixel value exceeds a predetermined threshold value.
Subsequently to step c), a spatial deblurring filter may further be applied to the filtered depth image.
Preferably, the deblurring filter may be a multi-level iterative bilateral total variation deblurring filter. The depth image sequence may preferably be captured using depth image sensing means in a fixed position relative to an imaged scene comprising motion.
The depth image sensing means may preferably comprise a Time of Flight, ToF camera. The depth image sequence may preferably be captured using depth image sensing means, which are in motion relative to a static imaged scene.
It is a further object of the invention to provide a device for carrying out the method according to the invention. The device comprises a memory element and computing means configured for:
a) reading, from said memory element, a sequence of at least two depth images representing depth image data captured using depth image sensing means at two consecutive instants in time, wherein a first depth image precedes a second, observed depth image, and wherein each pixel value of a depth image represents the distance between the image sensing means and an imaged object at the time of capture;
b) spatially registering each pixel of the first image with an observed pixel of the observed image;
c) filtering each observed pixel value by computing a filtered pixel value based on the observed pixel value and on the corresponding registered pixel value of the first image, comprising computing an approximation of the depth displacement for each pixel, by computing the difference between the pixel value of the observed image and the corresponding registered pixel value of the first, preceding image;
d) replacing said observed depth image with the resulting filtered depth image in said memory element, and using the resulting filtered depth image as the first depth image in a subsequent application of steps a) - d).
The device may further comprise depth image sensing means. The depth image sensing means may comprise at least one sensor element capable of capturing images comprising image depth information, and optical means defining a field of view.
The invention further provides a computer configured to carry out the method according to the invention.
According to a further aspect of the invention, a computer program comprising computer readable code means is provided. When the computer readable code means are run on a computer, they cause the computer to carry out the method according to the invention.
According to yet another aspect of the invention, a computer program product is provided, comprising a computer-readable medium on which the computer program according to the invention is stored.
The invention provides a real-time dynamic multi-frame filtering algorithm for depth videos. In particular, the invention may be applied to a super-resolution framework, without being limited thereto. The algorithm according to the invention is effective in enhancing the resolution and the quality of low resolution dynamic scenes with highly non-rigidly moving objects. Obtained results show the robustness of the proposed algorithm against radial motions, i.e., motions in the direction of image depth. This is handled by first estimating the depth displacement, and simultaneously correcting the depth measurement by filtering. For the sake of real-time processing, the proposed algorithm is based on per-pixel temporal processing of the depth video sequence such that multiple one-dimensional signals are filtered separately. Each filtered depth frame is further refined using a multi-level iterative bilateral total variation regularization after filtering and before proceeding to the next frame in the sequence. The invention only requires the availability of depth images.
According to preferred embodiments, the invention makes it possible to enhance low resolution dynamic depth videos containing freely, non-rigidly moving objects with the proposed dynamic multi-frame super-resolution algorithm. Existing methods are either limited to rigid objects, or restricted to global lateral motions, discarding radial displacements. The invention addresses these shortcomings by accounting for non-rigid displacements in 3D. In addition to 2D optical flow, the depth displacement is estimated, and simultaneously the depth measurement is corrected through appropriate filtering. This concept is incorporated efficiently in a multi-frame super-resolution framework. It is formulated in a recursive manner that ensures an efficient deployment in real time. Results show the overall improved performance of the proposed method as compared to alternative approaches, specifically in handling relatively large 3D motions. Test examples range from a fully moving human body to a highly dynamic facial video with varying expressions.
Compared to state of the art solutions, for example to the UP-SR algorithm known from [3], the invention solves two limitations thereof: not considering 3D motions and using an inefficient cumulative motion estimation. The proposed solution in accordance with the invention is based on the assumption that the 3D motion of a point can be approximated by decoupling the radial component from the lateral ones. This approximation allows the handling of non-rigid deformations while reducing the computational complexity associated with an explicit full 3D motion estimation at each point. Moreover, a recursive depth multi-frame SR is formulated by replacing UP-SR's cumulative motion estimation with a point-tracking operation locally at each pixel.
For a reduced complexity, which allows providing a real-time implementation, it is proposed to approximate range flow by estimating radial motions on top of the 2D optical flow. To ensure efficiency, the invention treats a video sequence as a set of one-dimensional signals. By so doing, an approximation of range flow is achieved. This allows for taking into account radial deformations in the SR estimation.
To adequately preserve the smoothness properties of the depth surface, and remove noise and blur without over-smoothing, it is proposed to use a multi-level version of the iterative bilateral total variation regularization given in [11]. In summary, the contribution of this invention is a new multi-frame depth information filtering algorithm. Applied to the case of super-resolution, it provides the following advantages:
1) Recursive, hence suitable for real-time applications.
2) Robust to radial motions without explicitly computing range flow.
3) Accurate depth video reconstruction thanks to the proposed multi-level iterative bilateral regularization.

Brief description of the drawings
Several embodiments of the present invention are illustrated by way of figures, which do not limit the scope of the invention, wherein:
Figure 1 is a flow chart illustrating the main method steps according to a preferred embodiment of the invention;
Figure 2 schematically illustrates a device according to a preferred embodiment of the invention;
Figure 3a shows a low resolution depth frame;
Figure 3b shows an up-sampled representation of the depth frame shown in Figure 3a, wherein a known bi-cubic interpolation is used;
Figure 3c shows a super-resolved representation of the depth frame shown in Figure 3a, wherein a known super-resolution algorithm is used;
Figure 3d shows a super-resolved representation of the depth frame shown in Figure 3a, wherein a known super-resolution algorithm is used;
Figure 3e shows a super-resolved representation of the depth frame shown in Figure 3a, wherein an algorithm in accordance with a preferred embodiment of the invention is used;
Figure 4 is a flow chart illustrating the main method steps according to a preferred embodiment of the invention;
Figures 5a and 5b show amplitude images of a dynamic scene containing a hand moving towards the image sensing means;
Figure 5c and 5d show corrected representations of the images shown in Figures 5a and 5b respectively, wherein a known standardization step has been used;
Figure 6 plots the 3D RMSE in mm of a super-resolved depth image sequence based on a preferred embodiment of the invention for three different upsampling factors and for different noise levels;
Figures 7a-7c illustrate tracking results for different depth values randomly chosen from a super-resolved depth image sequence based on a preferred embodiment of the invention, for three different upsampling factors;
Figures 7d-7f illustrate the filtered depth displacements corresponding to the results of Figures 7a-7c respectively;
Figure 8a illustrates a 3D plot of a low resolution depth frame;
Figure 8b illustrates a super-resolved representation of the depth frame of Figure 8a, wherein a known bi-cubic interpolation has been applied;
Figure 8c illustrates a super-resolved representation of the depth frame of Figure 8a, wherein a known super-resolution algorithm has been applied;
Figure 8d illustrates a super-resolved representation of the depth frame of Figure 8a, wherein a known super-resolution algorithm has been applied;
Figure 8e illustrates a super-resolved representation of the depth frame of Figure 8a, wherein an algorithm in accordance with a preferred embodiment of the invention has been applied;
Figure 8f illustrates a ground-truth super-resolved representation of the depth frame of Figure 8a;
Figure 9 illustrates results obtained using a preferred embodiment of the method according to the invention, applied on real data;
- Figure 10 shows results of a filtered depth value profile in accordance with a preferred embodiment of the invention, compared to the corresponding captured real data depth values.
Detailed description of the invention
This section describes the invention in further detail based on preferred embodiments and on the figures. Unless otherwise stated, features in one described embodiment may be combined with additional features of another described embodiment.
The words "frame" and "image" are to be interpreted as synonyms. An image is an array of pixels representing imaging data. A pixel generally corresponds to the information captured by a single sensing element of an image sensor. A depth image is an array of pixels, wherein each pixel value represents depth information of an imaged scene or object. The depth is measured with respect to the imaging plane or imaging sensor of a depth image sensing means. Typically, the depth data are relative distances.
A sequence of frames or images is synonymous with a video sequence. In the context of the invention, the considered sequences comprise an object in relative motion with respect to the image sensing means. Figure 1 shows the main steps of a preferred embodiment according to the invention. The method starts at step 10 with the provision of a sequence of at least two depth images. Each depth image represents depth image data captured using depth image sensing means. The depth image sensing means may be any such sensing means known to the skilled person; for example, they may comprise a Time of Flight, ToF, camera. Further details of such devices are well known in the art and will not be provided in the course of the present description.
The two provided depth images are immediately consecutive images of an image sequence. The first depth image precedes the second image. The second image is also referred to as the observed image. The observed image may be an image instantaneously captured by the sensing means, while the first image was captured earlier and stored in a memory of a computing device. In a second step 20, each pixel of the first image is registered with an observed pixel of the observed image. This corresponds to the computation of the dense optical flow between the two images. Any known method for computing the dense optical flow may be applied at this step. The registration may be performed directly on the depth images. Alternatively, if available through other sensing means, a first and second amplitude image corresponding to the first and second depth images may be used in the registration step.
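As one possible concrete realization of the registration step 20, the sketch below estimates dense optical flow with OpenCV's Farneback method (one stand-in among the known methods mentioned above) and warps the first depth image onto the grid of the observed image. The normalization to 8-bit images and the nearest-neighbour sampling are implementation assumptions, not taken from the patent; nearest-neighbour sampling avoids mixing depth values across object boundaries.

```python
import cv2
import numpy as np

def register_previous_frame(f_prev, g_obs):
    """Register the first (previous) depth image onto the observed one
    using dense 2D optical flow; Farneback is one possible choice."""
    a = cv2.normalize(f_prev, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    b = cv2.normalize(g_obs, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Flow from the observed image back to the previous one, so that each
    # observed pixel knows where its corresponding source pixel lies.
    flow = cv2.calcOpticalFlowFarneback(b, a, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = f_prev.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Nearest-neighbour gather avoids averaging depth across boundaries.
    return cv2.remap(f_prev.astype(np.float32),
                     xs + flow[..., 0], ys + flow[..., 1], cv2.INTER_NEAREST)
```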
In a third step 30, each observed pixel of the second depth image is filtered by computing a filtered pixel value. The filtered pixel value is based on the observed pixel value itself, and on the corresponding registered pixel value of the first image. Thus, the temporal filtering step 30 is performed per pixel on a one dimensional signal. The filtering or smoothing which is applied on this signal may be any such method known in the art, based for example on prediction methods.
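To make the per-pixel temporal filtering concrete, the following sketch applies a prediction/correction update to every pixel's one-dimensional depth signal, with the radial displacement approximated as the difference between the observed and registered values. The noise variances, the steady-state scalar gain (a simplification of a full per-pixel Kalman filter) and the outlier threshold are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def temporal_filter(z_obs, z_reg, w_prev, q_var=1.0, r_var=25.0, tau=60.0):
    """Prediction/correction on each pixel's 1-D depth signal.
    z_obs: observed depth image; z_reg: registered previous filtered image;
    w_prev: previous per-pixel depth displacement estimate (same shape).
    q_var, r_var and tau are illustrative values, not taken from the patent."""
    w = z_obs - z_reg                        # approximate radial displacement
    z_pred = z_reg + w_prev                  # predicted pixel value
    k = q_var / (q_var + r_var)              # steady-state scalar gain
    z_filt = z_pred + k * (z_obs - z_pred)   # correction by the observation
    # Where prediction and observation disagree strongly (occlusion, outlier),
    # fall back to a spatial median of the observed image.
    bad = np.abs(z_pred - z_obs) > tau
    z_filt[bad] = median_filter(z_obs, size=3)[bad]
    return z_filt, w
```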
The resulting filtered depth image replaces the observed image in step 40. The method may be iteratively applied, wherein the resulting filtered depth image will be considered as the first image, and a newly captured image will be considered as the observed image. Thus, the temporal filtering only acts on the observed image, and a filtered representation of the preceding image. The method may be initialized using an initial image. In a preferred embodiment of the method according to the invention, the observed images are provided at a low resolution, LR. The sensing means may for example be limited to providing LR depth images. The registration step 20 may be performed on the LR representations of the first and observed images. However, the registration and filtering steps 20, 30 are preferably performed using an upsampled representation of the observed image, the upsampling factor being equal to r>1. This presents the advantage that a more precise motion estimation between the two images is achieved, resulting in an improved registration. The resulting filtered image is in that case a super-resolved temporally filtered depth image representation of the observed LR image.
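Under these assumptions, one super-resolving iteration may be sketched as follows, reusing the register_previous_frame and temporal_filter helpers from the previous sketches; the bicubic interpolation is one possible choice for the dense upsampling by r.

```python
import cv2

def super_resolve_step(f_prev_hr, w_prev, g_obs_lr, r=4):
    """One recursive SR iteration: upsample the LR observation by r, then
    register and filter it against the previous HR filtered frame."""
    g_up = cv2.resize(g_obs_lr, None, fx=r, fy=r,
                      interpolation=cv2.INTER_CUBIC)   # dense upsampling
    f_reg = register_previous_frame(f_prev_hr, g_up)   # registration step 20
    f_t, w_t = temporal_filter(g_up, f_reg, w_prev)    # filtering step 30
    return f_t, w_t                                    # step 40: next first image
```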
In a preferred embodiment of the method according to the invention, an additional spatial deblurring step is applied on the temporally filtered depth image. Any known deblurring filter may be used in this step, although a multi-level iterative bilateral total variation deblurring filter will be detailed in a further embodiment.
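For the deblurring step, a single gradient-descent iteration of a bilateral total variation regularizer (in the spirit of the multi-level filter detailed in the further embodiment) may look as follows. For brevity the blur matrix is taken as the identity, and all parameter values are illustrative assumptions.

```python
import numpy as np

def btv_step(f, b, lam=0.05, beta=0.1, p=2, alpha=0.7):
    """One gradient step of bilateral total variation deblurring.
    f: current HR estimate; b: the temporally filtered (blurred) frame.
    The blur H is taken as identity; lam, beta, p and alpha are
    illustrative values, not taken from the patent."""
    grad = f - b                                  # data-fidelity gradient (H = I)
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            if i == 0 and j == 0:
                continue
            shifted = np.roll(np.roll(f, i, axis=1), j, axis=0)  # S_x^i S_y^j f
            s = np.sign(f - shifted)
            # (I - S_y^-j S_x^-i) sign(f - S_x^i S_y^j f), weighted by alpha^(|i|+|j|)
            grad += lam * alpha ** (abs(i) + abs(j)) * (
                s - np.roll(np.roll(s, -i, axis=1), -j, axis=0))
    return f - beta * grad
```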
The recursive processing in accordance with the present invention only considers two consecutive frames at (t-1) and t, where the current frame is to be enhanced each time.
Figure 2 illustrates a device 100 for carrying out the method according to the invention. The device comprises a memory element 110 for storing depth images and for storing, for example, executable computer code. The memory element may be any type of memory element known in the art, such as a Random Access Memory, Solid State Drive or Hard Disk Drive. The device may be a computing device.
The device 100 comprises computing means 120, which are configured for carrying out method steps 10-40 in accordance with the invention. The computing means may for example be the Central Processing Unit of a computing device, which is configured to execute the method by reading corresponding program code from a memory element. The computing means 120 have read and write access to the memory element 110. Based on an initial input consisting of a first depth image f_{t-1} and an observed consecutive depth image g_t, the computing means generate a filtered, and in accordance with some embodiments of the invention upsampled, representation of the observed image, f_t.
According to a preferred embodiment of the device 100, the device may comprise depth image sensing means 130 having optical means for defining a field of view. The computing means 120 are operatively coupled to the sensing means, so that once an iteration of the method steps 10-40 has been completed, a new observed image is captured using the sensing means 130, relayed to the memory element 110, and fed into a new iteration of the method steps 10-40.
A computer program comprising computer readable code means, which when loaded into a memory element and executed by processing means of a computer, causes the computer to carry out the described method steps, can be implemented by the skilled person based on the provided description and figures illustrating the method.

1. Particularly preferred embodiment
The following description outlines a particularly preferred embodiment in accordance with the invention. In this embodiment, a real-time filtering and upsampling framework for depth image sequences is considered. The described embodiment is in no way limitative of the invention, and it should be understood that, unless otherwise stated, particular features of this embodiment may be combined in isolation with the embodiments previously described.
Figure 3 shows the results obtained using different super-resolution methods applied to a real low resolution dynamic depth sequence captured with a ToF camera with SR scale factor of r = 4. Figure 3a shows the low resolution depth frame. Figure 3b shows a result obtained using bicubic interpolation. Figure 3c shows a result obtained using Patch Based Single Image Super Resolution (SISR) [5]. Figure 3d shows a result obtained using Upsampling for Precise Super Resolution (UP-SR) [4]. Figure 3e shows a result obtained using the proposed algorithm in accordance with this particularly preferred embodiment of the invention (50 ms per frame).
In what follows, Section 2 presents the problem of depth video super-resolution. Section 3 explains the proposed concept for handling radial motion within the super-resolution framework. The proposed recursive depth video SR algorithm is presented in Section 4. Quantitative and qualitative evaluations and comparisons with other approaches are given in Section 5. The following notations will be considered: bold small letters correspond to vectors. Bold capital letters denote matrices. Italic letters are scalars. p_t denotes a pixel position on the image plane at instant t, and m_t denotes the corresponding 2D optical flow at t.
2. Background and Problem Formulation
A brief review of the problem of multi-frame super-resolution, SR, of dynamic depth videos is provided, and the challenges that remain untackled by existing approaches are highlighted. Let us consider a video of N observed low resolution, LR, depth frames of a dynamically deforming depth scene ℱ acquired using a depth sensor, ToF or structured light. The scene is assumed to contain one or multiple moving objects. Each LR frame g_t, t = 1, ..., N, is represented by a column vector of size (m × 1) corresponding to the lexicographic ordering of frame pixels. The objective of depth SR is to reconstruct a higher resolution, HR, depth video {f_t; t = 1, ..., N}, where each frame is of size (n × 1), with n/m = r ∈ ℝ being the SR scale factor. The classical multi-frame depth SR problem may be simplified by reconstructing one HR frame at a time, referred to as the reference frame, using the observed video. Therefore, if the reference time is t0, the problem is to reconstruct f_t0 using the N' = (N − t0 + 1) preceding measurements. The operation may be repeated for t0 = 1, ..., N. A noisy LR observation is modelled as follows:
g_t = D H M^t_t0 f_t0 + n_t,   t0 ≤ t and t, t0 ∈ [1, N] ⊂ ℕ*    (1)

where D is a known constant downsampling matrix of dimension (m × n). The system blur is represented by the time and space invariant matrix H. The (n × n) matrices M^t_t0 correspond to the motion between f_t0 and g_t before their downsampling. The vector n_t is an additive white noise at time instant t. Without loss of generality, both H and M^t_t0 are assumed to be block circulant matrices, so that they commute. As a result, the estimation of f_t0 may be decomposed into two steps: estimation of a blurred HR image, followed by a deblurring step.
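To make the data model concrete, the following sketch simulates equation (1) for a single frame. It is an illustration under simplifying assumptions only: the blur H is approximated by a Gaussian kernel, the motion M is taken as the identity, and plain decimation stands in for D:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def observe_lr(f_hr, r=4, sigma_blur=1.0, sigma_noise=20.0):
    """Simulate g_t = D H f + n_t for one HR depth frame f_hr (values in mm)."""
    blurred = gaussian_filter(f_hr, sigma=sigma_blur)     # H f (space-invariant blur)
    decimated = blurred[::r, ::r]                         # D H f (downsampling by r)
    noise = np.random.normal(0.0, sigma_noise, decimated.shape)  # additive white noise n_t
    return decimated + noise
```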
While the LidarBoost algorithm [16] is a reference method for multi-frame depth SR, it is only applicable to static scenes for object scanning. The UP-SR algorithm in [4] is, so far, the only multi-frame depth SR method proposed for dynamic scenes. This algorithm is based on two key components. The first one is to densely upsample the observed LR sequence prior to any operation, which has been shown to ensure a more accurate registration of frames. The resulting r-times upsampled image is defined as g_t↑ = U g_t, where U is an (n × m) upsampling matrix. The second component of UP-SR is to use a cumulative motion compensation approach between the reference frame and all observations.
This operation starts by estimating the motion between consecutive frames, using classical dense 2D optical flow estimation between the upsampled versions g_{t−1}↑ and g_t↑, namely,
M̂^t_{t−1} = arg min_M Φ(g_{t−1}↑, g_t↑, M)    (2)
where Φ is a dense optical flow-related cost function, and
g_t↑ = M^t_{t−1} g_{t−1}↑ + δ_t    (3)
The vector δ_t is referred to as the innovation image. It contains novel points appearing, or disappearing, due to occlusions or large motions. This innovation is assumed in [4] to be negligible. In addition, similarly to [8], for analytical convenience, it is assumed that all pixels in g_t↑ originate from pixels in g_{t−1}↑ in a one-to-one mapping. Therefore, each row in M^t_{t−1} contains a single 1, at the position corresponding to the address of the source pixel in g_{t−1}↑. This assumption of bijectiveness implies that the matrix M̂^t_{t−1} is an invertible permutation, s.t. [M̂^t_{t−1}]^{−1} = M̂^{t−1}_t. Furthermore, its estimate leads to the following registration of g_t↑ to g_{t−1}↑:
g_t^{t−1}↑ = M̂^{t−1}_t g_t↑    (4)
Using a cumulative motion compensation approach, the registration of a non-consecutive frame g_t↑ to the reference g_t0↑ is achieved as follows:

g_t^t0↑ = M̂^t0_{t0+1} M̂^{t0+1}_{t0+2} ··· M̂^{t−1}_t g_t↑    (5)
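A sketch of this consecutive registration is given below, using OpenCV's Farneback dense optical flow as one possible realization of the estimator in equation (2); the Farneback method and its parameter values are illustrative assumptions, not mandated by the framework:

```python
import cv2
import numpy as np

def _to_u8(img):
    """Normalize a float image to 8 bit, as required by the flow estimator."""
    img = img.astype(np.float32)
    lo, hi = float(img.min()), float(img.max())
    return np.uint8(255 * (img - lo) / max(hi - lo, 1e-6))

def register_consecutive(g_prev_up, g_up):
    """Estimate the dense 2D motion between consecutive upsampled frames
    (equation (2)) and warp g_up back onto the grid of g_prev_up (equation (4))."""
    flow = cv2.calcOpticalFlowFarneback(_to_u8(g_prev_up), _to_u8(g_up),
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g_up.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (gx + flow[..., 0]).astype(np.float32)
    map_y = (gy + flow[..., 1]).astype(np.float32)
    # Nearest-neighbour sampling avoids averaging depth values across edges.
    return cv2.remap(g_up.astype(np.float32), map_x, map_y, cv2.INTER_NEAREST)
```

Chaining such consecutive registrations from t down to t0 yields the cumulative registration of equation (5).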
Choosing the upsampling matrix U to be the transpose of D, the product UD = A gives a block circulant matrix A that defines a new blurring matrix B = AH. Therefore, the estimation of f_t0 starts by estimating its blurred version h_t0 = B f_t0. The data model in (1) becomes:
g_t^t0↑ = h_t0 + v_t,   t0 ≤ t and t, t0 ∈ [1, N] ⊂ ℕ*    (6)

where v_t is an additive noise vector of length n, assumed to be independent and identically distributed. Using an L1-norm, the blurred estimate is found by pixel-wise temporal median filtering of the upsampled registered LR observations, such as:
ĥ_t0 = arg min_h Σ_{t=t0..N} ||h − g_t^t0↑||_1 = med_t {g_t^t0↑}    (7)
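Since the L1-optimal estimate in (7) reduces to a per-pixel temporal median, its computation is immediate once the registered stack is available; a minimal sketch:

```python
import numpy as np

def blurred_estimate(registered_stack):
    """registered_stack: array of shape (N', H, W) holding the upsampled LR
    observations registered to the reference time t0. The pixel-wise temporal
    median minimizes the L1 data term of equation (7)."""
    return np.median(registered_stack, axis=0)
```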
Then, as a second step, an image deblurring is applied to recover f̂_t0 from ĥ_t0. The robustness of the UP-SR algorithm in handling large motions is achieved thanks to the cumulative motion approach combined with upsampling, as has been shown experimentally in [4]. However, as described above, the only motions considered are lateral, using 2D dense optical flow. Radial displacements in the depth direction, often encountered in depth sequences, are therefore not handled. Moreover, the UP-SR registration step is based on a heavy cumulative motion estimation, which makes this algorithm unsuitable for real-time applications.
3. Range Flow Approximation
In accordance with the invention, it is argued that the above mentioned challenges may be resolved by incorporating the 2.5D version of dense optical flow [20], known as range flow, in the UP-SR framework. The direct computation of range flow can be complex. Instead of computing it directly, in accordance with the present invention, it is proposed to approximate it by decomposing range flow into a 2D optical flow and a filtered radial motion (in the depth direction). Figure 4 is a flow chart of the proposed multi-frame depth super-resolution algorithm for dynamic depth videos containing one or multiple non-rigidly deforming objects in accordance with this particularly preferred embodiment of the invention. For the sake of clarity, the method steps 10-40 as shown in Figure 1 are also identified in Figure 4.
3.1. Flow Decoupling
In order to address the problem of radial motions, it is important to consider the full 3D motion per pixel. At a time instant t, and for a pixel position p_t = (x_t, y_t) on the sensor image plane, the depth surface can be defined as the following mapping:
ℱ : ℝ² × ℕ → ℝ³
p_t ↦ (x_t, y_t, z_t(x_t, y_t))    (8)

The deformation of the surface ℱ from (t0−1) to t0 takes the point P_{t0−1} to a new position P_t0. Given u_t0 = dx_t0/dt and v_t0 = dy_t0/dt, the vector ν_t0 = (u_t0, v_t0, 1)^T represents the direction of the displacement from P_{t0−1} to P_t0. The surface deformation may then be expressed through the derivative of ℱ following the direction ν_t0, resulting in a range flow (u_t0, v_t0, w_t0), where the lateral displacement is m_t0 = (u_t0, v_t0) and the radial displacement in the depth direction is w_t0 = dz_t0/dt.
Applying the gradient constraint on the total derivative of the depth, the range flow constraint as first proposed in [20] is found, defined as follows:

u_t0 ∂z_t0/∂x + v_t0 ∂z_t0/∂y + ∂z_t0/∂t = w_t0    (9)
In accordance with the present invention, it is proposed to decouple m_t0 from the radial displacement w_t0. The quantity m_t0 is computed using available approaches for 2D optical flow estimation. Further, the 2D optical flow is computed using the low resolution 2D intensity images associated with the considered depth sensor. Note that the intensity (amplitude) images provided by the ToF camera cannot be used directly; their values differ significantly depending on the integration time and the object distance from the camera. Thus, in order to guarantee an accurate registration, a standardization step similar to the one proposed in [17] is applied prior to motion estimation, see Figure 5. Figures 5a and 5b show original amplitude images for a dynamic scene containing a hand moving towards the camera, where the intensity (amplitude) values differ significantly depending on the object distance from the camera. The corrected amplitude images for the same scene are presented in Figures 5c and 5d, respectively, where intensity consistency is preserved.
If the intensity images are not available (e.g., with synthetic data), the 2D optical flow can be estimated directly from the depth images after a preprocessing step with a bilateral filter. The bilateral filter is only used in the preprocessing step, while the original depth data is warped in the registration step.
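A sketch of this depth-only flow estimation follows; the bilateral filter parameters and the Farneback estimator are illustrative assumptions:

```python
import cv2
import numpy as np

def flow_from_depth(d_prev, d_curr):
    """Estimate 2D optical flow directly from depth maps. Bilaterally
    smoothed copies are used for flow estimation only; the original
    (unfiltered) depth values are the ones warped during registration."""
    def prep(d):
        d = cv2.bilateralFilter(d.astype(np.float32), 5, 30.0, 5.0)
        lo, hi = float(d.min()), float(d.max())
        return np.uint8(255 * (d - lo) / max(hi - lo, 1e-6))
    return cv2.calcOpticalFlowFarneback(prep(d_prev), prep(d_curr),
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)
```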
The depth image registered from (t0−1) to t0 is defined as z̃_{t0−1}. Consequently, the radial displacement may be approximated by the temporal difference between depth values, i.e.,

w_t0 ≈ z_t0(p_t0) − z̃_{t0−1}(p_t0)    (10)

This first approximation of w_t0 is an initial value that requires further refinement, directly accounting for the system noise. In accordance with the invention, it is proposed to do so using temporal tracking with a predictive filter, as detailed in Section 3.2. Several specific filtering methods may be applied by the skilled person without exercising inventive skill and while remaining within the scope of the present invention.
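In code, the initial radial displacement of equation (10) is simply a per-pixel temporal difference between the observation and the registered preceding frame; a trivial sketch:

```python
import numpy as np

def radial_displacement(z_t, z_prev_registered):
    """Equation (10): initial per-pixel estimate of the radial displacement
    w_t, to be refined by the predictive filter of Section 3.2."""
    return np.asarray(z_t, dtype=np.float64) - np.asarray(z_prev_registered, dtype=np.float64)
```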
3.2. Refinement by Filtering
The notation is simplified as z_t(p_t) = z_t. Since, by definition, z̃_{t−1}(p_{t−1}) = z_{t−1}, one may write z̃_{t−1}(p_t) = z̃_{t−1}. The following state vector is considered:

s_t = (z_t, w_t)^T    (11)
where both the depth measurement and the radial displacement are to be filtered. For the purpose of filtering, one may introduce a Gaussian system, so a noisy depth observation may be modelled as

z̃_t = b^T s_t + n_t    (12)

where the observation vector is b = (1, 0)^T, and the observation noise n_t is Gaussian with variance σ_n², i.e., n_t ∼ N(0, σ_n²). Further, a constant velocity model with an acceleration γ_t following a Gaussian distribution γ_t ∼ N(0, σ_a²) may be assumed.
The dynamic model is then defined as

z_t = z_{t−1} + w_{t−1} Δt + γ_t Δt²/2
w_t = w_{t−1} + γ_t Δt    (13)

which can be rewritten as:

s_t = K s_{t−1} + η_t    (14)

where

K = [1  Δt
     0   1]

is the transition matrix and η_t = γ_t (Δt²/2, Δt)^T is the process error, which is white Gaussian with covariance

Q = σ_a² [Δt⁴/4  Δt³/2
          Δt³/2  Δt² ]    (15)
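With K and Q defined above, the per-pixel predictive filter amounts to a standard Kalman recursion on the two-dimensional state s_t = (z_t, w_t)^T. The following is a minimal sketch of one predict/correct step for a single pixel, under the stated Gaussian assumptions:

```python
import numpy as np

def kalman_step(s, P, z_obs, dt, sigma_a, sigma_n):
    """One predict/correct step for the state s = (z, w)."""
    K = np.array([[1.0, dt], [0.0, 1.0]])                    # transition matrix (14)
    Q = sigma_a**2 * np.array([[dt**4 / 4, dt**3 / 2],
                               [dt**3 / 2, dt**2]])          # process covariance (15)
    b = np.array([1.0, 0.0])                                 # observation vector (12)
    # Predict
    s_pred = K @ s
    P_pred = K @ P @ K.T + Q
    # Correct with the observed (registered) depth value
    innovation = z_obs - b @ s_pred
    S = b @ P_pred @ b + sigma_n**2                          # innovation variance
    gain = P_pred @ b / S
    s_new = s_pred + gain * innovation
    P_new = P_pred - np.outer(gain, b) @ P_pred
    return s_new, P_new, s_pred
```

The prediction returned alongside the corrected state is the quantity used for the track-management threshold of Section 4.1.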
Using techniques known in the art, a prediction ŝ_{t|t−1} may then be computed and subsequently corrected using the observed measurement z̃_t, the correction error being the difference between the prediction and the observation. This per-pixel filtering is extended to all pixels of the depth frame and incorporated in the SR framework in Section 4.

4. Proposed Recursive Depth Video Super-Resolution
In what follows, a recursive multi-frame super-resolution algorithm in accordance with this particularly preferred embodiment of the invention is described, incorporating the filtering framework of Section 3.2 into the dynamic depth video SR problem. In addition to handling radial motions, and in order to properly preserve non-rigidity, it is proposed to recursively filter each pixel trajectory or track (through time) separately, by assuming that all trajectories are independent. This assumption ideally requires a corrective step to bring back the correlation between neighboring pixels of the original depth surface ℱ. To that end, a maximum a posteriori, MAP, estimation is used, where a multi-level iterative bilateral total variation, TV, regularization is proposed. The advantage of the per-pixel processing is to keep the exact same formulation as in Section 3.2; hence, all the required matrix inversions are for (2 × 2) matrices. For a recursive multi-frame SR algorithm, instead of using the whole video sequence of length N to recover one frame, the preceding recovered frame f̂_{t−1} is used to estimate f_t from the current upsampled observation g_t↑. Similarly to the UP-SR algorithm, f_t is estimated in two steps: first, finding a blurred version ĥ_t as the result of the filtering step; then, a deblurred version f̂_t as the result of the MAP iterative regularization.
4.1. Blurred Estimation
To extend the range flow approximation of Section 3 to a full frame, the point p_t is now considered as an element of a grid constituting a discrete sampling of the image plane. This results in discrete positions p_t^i = (x_t^i, y_t^i) with i ∈ [1, n]. The depth image at time t is defined as the column vector of all the blurred depth values z_t(p_t^i), and written h_t = [z_t(p_t^i)], ∀i.
The obtained motion vectors are further scaled using the SR factor r. The scaled motion vectors are then used to register the depth images f̂_{t−1} and g_t↑, resulting in the registered image f̃_{t−1}. The registration step reorders the pixels in order to have a correspondence that enables direct pixel-wise filtering over time. Moreover, to apply the filtering as outlined in Section 3.2, one may define a Gaussian system similar to the one defined by (12) and (14). The observation model in (12) is applicable to the SR data model in (6) under the assumption of a zero-mean additive white Gaussian noise. The dynamic model in (14) is actually equivalent to the model in (3), and one can prove that the innovation is related to the depth displacement w_{t−1} and the acceleration uncertainty γ_t of the pixel p_t^i by the following equation:
δ_t(i) = w^i_{t−1} Δt + γ^i_t Δt²/2    (16)
The result of the n joint filters run in parallel is the blurred depth image estimate ĥ_t. Furthermore, in order to separate background from foreground depth pixels, and to tackle the problem of flying pixels, especially around edges, a fixed threshold τ is defined such that:

Continue the track if |z̃_t − ẑ_{t|t−1}| < τ;
New track and spatial median if |z̃_t − ẑ_{t|t−1}| ≥ τ.
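This rule may be sketched per frame as follows; the 3 × 3 spatial median used to reinitialize a track is an assumption for illustration:

```python
import numpy as np
from scipy.ndimage import median_filter

def manage_tracks(z_obs, z_pred, s, tau):
    """Continue a pixel track if |z_obs - z_pred| < tau; otherwise start a
    new track, reinitializing the depth with a spatial median of the
    observation. z_obs, z_pred: (H, W); s: (H, W, 2) per-pixel states."""
    reset = np.abs(z_obs - z_pred) >= tau
    z_reset = median_filter(z_obs, size=3)   # spatial median for new tracks
    s[reset, 0] = z_reset[reset]             # reinitialized depth value
    s[reset, 1] = 0.0                        # restart the radial velocity
    return s, reset
```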
The choice of the threshold value τ is related to the type of depth sensor used and the level of the sensor-specific noise. In order to correct the artifacts due to this one-dimensional processing of an image, it is proposed to use a multi-level iterative bilateral TV deblurring step, as described in the next section.

4.2. Multi-Level Iterative Bilateral TV Deblurring
In order to estimate the deblurred high resolution depth image f̂_t from ĥ_t, the following MAP deblurring framework is applied:

f̂_t = arg min_{f_t} { ||B f_t − ĥ_t||_1 + λ Γ(f_t) }    (17)
where λ is the regularization parameter, and B is the blurring matrix. For example, a bilateral TV regularizer [11] may be chosen, defined as:
Γ(f_t) = Σ_{i=−I..I} Σ_{j=−J..J} α^{|i|+|j|} ||f_t − S_x^i S_y^j f_t||_1    (18)
The matrices S_x^i and S_y^j are shifting matrices which shift f_t by i and j pixels in the horizontal and vertical directions, respectively. The scalar α ∈ ]0, 1] is the base of the exponential kernel which controls the speed of decay [3]. In order to effectively deblur while keeping the details of ĥ_t without over-smoothing, the MAP estimation in (17) is applied in a multi-level fashion, similarly to [14, 19, 6]. Combined with a steepest descent numerical solver, the proposed solution is described by the following pseudo-code:

for l = 1, ..., L
    for k = 1, ..., K
        f̂^{k+1} ← f̂^k − β [ B^T sign(B f̂^k − ĥ_t) + λ Σ_{i=−I..I} Σ_{j=−J..J} α^{|i|+|j|} (I − S_y^{−j} S_x^{−i}) sign(f̂^k − S_x^i S_y^j f̂^k) ]
    end for
    ĥ_t ← f̂^K
end for
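A minimal NumPy sketch of this multi-level steepest-descent update is given below. The Gaussian kernel standing in for B is an assumption (its adjoint is approximated by the same symmetric filter), and the shifts S_x^i, S_y^j are realized with circular rolls:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def btv_deblur(h_t, levels=3, iters=7, beta=0.1, lam=0.05,
               alpha=0.7, I=2, J=2, sigma_blur=1.0):
    """Multi-level iterative bilateral-TV deblurring of the blurred estimate h_t."""
    def B(x):
        return gaussian_filter(x, sigma_blur)   # stand-in for the blurring matrix B
    f = h_t.copy()
    for _ in range(levels):                     # L levels
        for _ in range(iters):                  # K steepest-descent iterations
            grad = B(np.sign(B(f) - h_t))       # B^T sign(B f - h_t), B symmetric here
            for i in range(-I, I + 1):
                for j in range(-J, J + 1):
                    if i == 0 and j == 0:
                        continue
                    shifted = np.roll(np.roll(f, i, axis=1), j, axis=0)   # S_x^i S_y^j f
                    sgn = np.sign(f - shifted)
                    back = np.roll(np.roll(sgn, -i, axis=1), -j, axis=0)  # S_x^-i S_y^-j
                    grad += lam * alpha ** (abs(i) + abs(j)) * (sgn - back)
            f = f - beta * grad
        h_t = f                                 # the output of one level seeds the next
    return f
```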
In the pseudo-code, the parameter β is a scalar which represents the step size in the direction of the gradient, I is the identity matrix, and sign(·) is the sign function. In our experiments, three levels, L = 3, and seven iterations per level, K = 7, have been used.

5. Experimental Results
In this section, the performance of the proposed algorithm in accordance with this particularly preferred embodiment of the invention is evaluated using: (i) synthetic depth videos, and (ii) real depth videos of dynamic scenes captured by a ToF camera (pmd CamBoard nano™). The effectiveness of the proposed algorithm is illustrated in comparison with state-of-the-art methods. Quantitative and qualitative evaluations are provided.
5.1. Synthetic Data
In order to provide a quantitative evaluation, a simple and fully controlled set-up is used. A generated sequence of 20 depth frames of a synthetic hand moving radially with respect to the camera (5 cm displacement between every two successive frames, with Δt = 0.1 s) is considered.
The sequence is downsampled with scale factors of r = 2 and r = 4. These sequences are further degraded with additive noise with σ varying from 10 to 80 mm. The created LR noisy depth sequences are then super-resolved using the proposed algorithm with r = 1, r = 2, and r = 4. In the simple case where r = 1, the SR problem is merely a denoising one; in other words, the objective is not to increase resolution, and hence there is no blur due to upsampling. In contrast, increasing the SR factor r introduces more blurring effects, leading to a higher 3D error in the final reconstructed HR scene, see Figure 6. Figure 6 plots the 3D RMSE in mm of the super-resolved hand sequence using the proposed method according to the invention with different SR scale factors. Increasing the SR factor leads to a higher 3D reconstruction error. This is due to the blurring effects of the upsampling process and the lower resolution of the used LR depth sequence as compared to the one used with r = 1.
In order to evaluate the quality of the filtered depth data and the filtered velocity, one pixel p_t was randomly chosen, and its filtered depth value z_t and its filtered velocity ŵ_t were tracked through the super-resolved sequence. The same was applied for all SR factors. In Figure 7, the tracking results are shown for the randomly chosen pixels from the super-resolved sequences with r = 1, r = 2, and r = 4, and a fixed noise level of σ = 50 mm. Tracking results for different depth values randomly chosen from the super-resolved sequences with the SR scale factors r = 1, r = 2, and r = 4 are plotted in (a), (b), and (c), respectively. The corresponding filtered depth displacements are shown in (d), (e), and (f), respectively. It can be appreciated that the depth values are filtered (solid lines) as compared to the noisy depth measurements (dotted lines) for all scale factors, as shown in Figure 7 (a), (b) and (c). Similar behavior is observed for the corresponding filtered velocities in Figure 7 (d), (e), and (f).

5.2. Publicly Available Data
The proposed method in accordance with the particularly preferred embodiment of the invention is evaluated using a complex scene with a highly non-rigidly moving object. The publicly available "Samba" [1] data set is used. This data set provides a real sequence of a full 3D dynamic dancing lady scene with high resolution ground truth. The sequence is quite complex, as it contains both non-rigid radial motions and self-occlusions, represented by hand and leg movements, respectively. The publicly available toolbox V-REP [2] is used to create from the "Samba" data a synthetic depth sequence with fully known ground truth. A depth camera is fixedly provided at a distance of 2 meters from the 3D scene; its resolution is 1024 × 1024 pixels. The camera is used to capture the depth sequence. Then, similarly to the previous set-up, the obtained depth sequence is downsampled with r = 4 and further degraded with additive noise with a standard deviation σ varying from 0 to 50 mm. The created LR noisy depth sequence is then super-resolved using state-of-the-art methods, namely conventional bicubic interpolation, UP-SR [4], and SISR [5], as well as the proposed algorithm.
Table I: 3D RMSE in mm for the super-resolved dancing girl sequence using different SR methods, applied to LR noisy depth sequences with two noise levels.
The super-resolution scale factor for this experiment is r = 4. To measure the accuracy of each method, the reconstructed HR depth images are back-projected to the 3D world using the camera matrix. Then, the 3D RMSE of each back-projected 3D point cloud as compared to the 3D ground truth is calculated. Table I shows the 3D reconstruction error of the bicubic, UP-SR [4], and SISR [5] methods as compared to the proposed method, for different noise levels. The comparison is done at two levels: (i) different parts of the reconstructed 3D body, namely the hand, the torso, and the leg, and (ii) the full reconstructed 3D body. As expected, applying conventional bicubic interpolation directly on depth images yields a large error, mainly due to the flying pixels around object boundaries. Thus, another round of experiments has been run using a modified bicubic interpolation, where all flying pixels are removed by defining a fixed threshold. Yet, the 3D reconstruction error remains relatively high across all noise levels, see Table I. This is due to the fact that bicubic interpolation does not benefit from the temporal information provided by the sequence.
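The error metric may be sketched as follows: back-project each depth map through the pinhole intrinsics and compare point-wise to the ground truth. This assumes a per-pixel correspondence between estimate and ground truth; the intrinsics fx, fy, cx, cy are those of the camera matrix mentioned above:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (same units as the reported RMSE) to a 3D point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def rmse_3d(depth_est, depth_gt, fx, fy, cx, cy):
    """3D RMSE between back-projected point clouds defined on the same grid."""
    p_est = backproject(depth_est, fx, fy, cx, cy)
    p_gt = backproject(depth_gt, fx, fy, cx, cy)
    return np.sqrt(np.mean(np.sum((p_est - p_gt) ** 2, axis=-1)))
```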
Table I shows that the proposed method in accordance with the particularly preferred embodiment of the invention provides better results than state-of-the-art algorithms. In order to visually evaluate the performance of the proposed algorithm, the super-resolved results of the dancing girl sequence are shown in 3D in Figure 8, which gives a 3D plotting of one super-resolved depth frame with r = 4 using: (b) bicubic interpolation, (c) patch based single image SR (SISR) [5], (d) UP-SR [4], and (e) the proposed algorithm according to the invention. Figure 8a is a 3D plotting of one LR depth frame, while (f) is the 3D ground truth.
The results for the sequence at the noise level of σ = 30 mm are shown. It is noted that the proposed algorithm outperforms state-of-the-art methods by keeping the fine details (e.g., the details of the face), as can be seen in Figure 8 (e). Note that the UP-SR algorithm fails in the presence of radial movements and self-occlusions, see the rectangular boxes in Figure 8 (d). In contrast, the SISR algorithm can handle these cases, but cannot keep the fine details due to its patch-based nature, see Figure 8 (c); in addition, it requires a heavy training phase.
5.3. Real Data
Finally, the proposed algorithm has been tested on a real sequence captured with a Time of Flight, ToF, camera (pmd CamBoard Nano™). The captured LR depth sequence contains a non-rigidly moving face. Samples of the LR captured frames are plotted in the first and second rows of Figure 9.
Figure 9 shows the results of applying the proposed algorithm according to the particularly preferred embodiment of the invention on a real sequence captured by a LR ToF camera (120 × 160 pixels) of a non-rigidly moving face. The first and second rows contain a 3D plotting of selected LR captured frames. The third and fourth rows contain the 3D plotting of the super-resolved depth frames with r = 4.
This sequence is super-resolved using the proposed algorithm in accordance with the invention with an SR scale factor of r = 4. The obtained results are given in 3D in the third and fourth rows of Figure 9. They show the effectiveness of the proposed algorithm in reducing the noise and further increasing the resolution of the reconstructed 3D face under large non-rigid deformations. To visually appreciate these results as compared to state-of-the-art methods, the bicubic, UP-SR, and SISR methods were tested on the same LR real depth sequence. The obtained results show the superiority of the proposed algorithm as compared to the other methods, see Figure 10. Figure 10 plots the filtered depth value profile of a randomly chosen pixel tracked through the super-resolved sequence of a real face, with an SR scale factor of 4. The blue line shows the filtered trajectory of this pixel as compared to its raw noisy measurement in red. The algorithm's run-time on this sequence is 50 ms per frame on a 2.2 GHz i7™ processor with 4 GB of RAM.

It should be understood that the detailed description of specific preferred embodiments is given by way of illustration only, since various changes and modifications within the scope of the invention will be apparent to the skilled person. The scope of protection is defined by the following set of claims.
The research leading to the present invention was supported by the Fonds National de la Recherche (FNR) of Luxembourg under grant number C11/BM/1204105/FAVE/Ottersten.
References
[1] http://people.csail.mit.edu/drdaniel/mesh_animation/.
[2] http://www.k-team.com/mobile-robotics-products/v-rep.
[3] K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten. Bilateral filter evaluation based on exponential kernels. In Pattern Recognition (ICPR), 2012 20th IEEE International Conference on, pages 258-261, Nov 2012.
[4] K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten. Dynamic super resolution of depth sequences with non-rigid motions. In Image Processing (ICIP), 2013 20th IEEE International Conference on, pages 660-664, Sept 2013.
[5] O. Mac Aodha, N. Campbell, A. Nair, and G. Brostow. Patch based synthesis for single depth image super-resolution. In European Conference on Computer Vision (ECCV), 2012.
[6] M. Charest, M. Elad, and P. Milanfar. A general iterative regularization framework for image denoising. In Information Sciences and Systems, 2006 40th Annual Conference on, pages 452-457, March 2006.
[7] Y. Cui, S. Schuon, S. Thrun, D. Stricker, and C. Theobalt. Algorithms for 3D shape scanning with a depth camera. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(5):1039-1050, May 2013.
[8] M. Elad and A. Feuer. Super-resolution reconstruction of image sequences. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(9):817-834, Sep 1999.
[9] M. Elad and A. Feuer. Superresolution restoration of an image sequence: adaptive filtering approach. Image Processing, IEEE Transactions on, 8(3):387-395, Mar 1999.
[10] S. Farsiu, M. Elad, and P. Milanfar. Video-to-video dynamic super-resolution for grayscale and color sequences. EURASIP J. Appl. Signal Process., 2006:232-232, Jan. 2006.
[11] S. Farsiu, M. Robinson, M. Elad, and P. Milanfar. Fast and robust multiframe super resolution. Image Processing, IEEE Transactions on, 13(10):1327-1344, Oct 2004.
[12] J. Li, Z. Lu, G. Zeng, R. Gan, and H. Zha. Similarity-aware patchwork assembly for depth image super-resolution. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 3374-3381, June 2014.
[13] C. B. Newland, D. A. Gray, and D. Gibbins. Modified Kalman filtering for image super-resolution: Experimental convergence results. In Proceedings of the Ninth IASTED International Conference on Signal and Image Processing, SIP '07, pages 58-63, Anaheim, CA, USA, 2007. ACTA Press.
[14] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul., 4:460-489, 2005.
[15] V. Patanavijit, S. Tae-O-Sot, and S. Jitapunkul. A robust iterative super-resolution reconstruction of image sequences using a Lorentzian Bayesian approach with fast affine block-based registration. In Image Processing, 2007. ICIP 2007. IEEE International Conference on, volume 5, pages V-393-V-396, Sept 2007.
[16] S. Schuon, C. Theobalt, J. Davis, and S. Thrun. LidarBoost: Depth superresolution for ToF 3D shape scanning. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 343-350, June 2009.
[17] M. Sturmer, J. Penne, and J. Hornegger. Standardization of intensity-values acquired by time-of-flight-cameras. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW 2008. IEEE Conference on, June 2008.
[18] J. Tian and K.-K. Ma. A new state-space approach for superresolution image sequence reconstruction. In Image Processing, 2005. ICIP 2005. IEEE International Conference on, volume 1, pages I-881-4, Sept 2005.
[19] W. Li and C. Zhao. A parameter-adaptive iterative regularization model for image denoising, 2012.
[20] M. Yamamoto, P. Boulanger, J.-A. Beraldin, and M. Rioux. Direct estimation of range flow on deformable shape from a video rate range camera. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 15(1):82-89, Jan 1993.

Claims
1. A method for generating in real time a temporally filtered representation of a depth image sequence, comprising the following subsequent steps:
a) providing a sequence of at least two depth images representing depth image data captured using depth image sensing means at two consecutive instants in time, wherein a first depth image precedes a second, observed depth image, and wherein each pixel value of a depth image represents the distance between the image sensing means and an imaged object at the time of capture (10);
b) spatially registering each pixel of the first image with an observed pixel of the observed image (20);
c) filtering each observed pixel value by computing a filtered pixel value based on the observed pixel value and on the corresponding registered pixel value of the first image, comprising computing an approximation of the depth displacement for each pixel, by computing the difference between the pixel value of the observed image and the corresponding registered pixel value of the first, preceding image (30);
d) replacing said observed depth image with the resulting filtered depth image (40),
wherein the resulting filtered depth image is used as the first depth image in a subsequent application of steps a) - d).
2. The method according to claim 1, wherein the registration step b) comprises estimating for each pixel of the first image a motion vector, which is an estimation of the spatial displacement of said pixel with respect to an observed pixel of the observed image.
3. The method according to any of claims 1 or 2, wherein the observed image is provided at a first, low, spatial resolution, wherein the first, preceding image is provided at a second spatial resolution being r times larger than the first resolution, r > 0, and wherein the registration and filtering steps b), c) are performed using an upsampled representation of the observed image, the upsampling factor being equal to r.
4. The method according to claim 3, wherein said motion vectors are computed using the low resolution observed image and upsampled by the factor r.
5. The method according to claim 3, wherein said motion vectors are computed using the upsampled representation of the observed image.
6. The method according to any of claims 1 to 5, wherein said filtering step c) comprises the generation of a predicted pixel value based on the pixel value of the first image, and a correction of said predicted pixel value based on the corresponding observed pixel value.
7. The method according to claim 6, wherein said filtering step c) further comprises applying a spatial median filter to said observed pixel, if the absolute difference between said predicted pixel value and said observed pixel value exceeds a predetermined threshold value.
8. The method according to any of claims 1 to 7, wherein subsequently to step c), a spatial deblurring filter is further applied to the filtered depth image.
9. The method according to claim 8, wherein said deblurring filter is a multi-level iterative bilateral total variation deblurring filter.
10. The method according to any of claims 1 to 9, wherein said depth image sequence is captured using depth image sensing means in a fixed position relative to an imaged scene comprising motion.
11. The method according to any of claims 1 to 9, wherein said depth image sequence is captured using depth image sensing means, which are in motion relative to a static imaged scene.
12. A device (100) for carrying out the method according to any of claims 1 to 9, the device comprising a memory element (110) and computing means (120) configured for:
a) reading, from said memory element (110), a sequence of at least two depth images representing depth image data captured using depth image sensing means at two consecutive instants in time, wherein a first depth image precedes a second, observed depth image, and wherein each pixel value of a depth image represents the distance between the image sensing means and an imaged object at the time of capture;
b) spatially registering each pixel of the first image with an observed pixel of the observed image;
c) filtering each observed pixel value by computing a filtered pixel value based on the observed pixel value and on the corresponding registered pixel value of the first image, comprising computing an approximation of the depth displacement for each pixel, by computing the difference between the pixel value of the observed image and the corresponding registered pixel value of the first, preceding image;
d) replacing said observed depth image with the resulting filtered depth image in said memory element (110), and using the resulting filtered depth image as the first depth image in a subsequent application of steps a) - d).
13. The device according to claim 12, wherein the device further comprises depth image sensing means (130), the depth image sensing means comprising at least one sensor element capable of capturing images comprising image depth information, and optical means defining a field of view.
14. A computer configured to carry out the method according to any of claims 1 to 11.
15. A computer program comprising computer readable code means, which, when run on a computer, cause the computer to carry out the method according to any of claims 1 to 11.
16. A computer program product comprising a computer-readable medium on which the computer program according to claim 15 is stored.
PCT/EP2016/062554 2015-06-05 2016-06-02 Real-time temporal filtering and super-resolution of depth image sequences WO2016193393A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
LU92731A LU92731B1 (en) 2015-06-05 2015-06-05 Real-time temporal filtering and super-resolution of depth image sequences
LULU92731 2015-06-05

Publications (1)

Publication Number Publication Date
WO2016193393A1 true WO2016193393A1 (en) 2016-12-08

Family

ID=53434423

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2016/062554 WO2016193393A1 (en) 2015-06-05 2016-06-02 Real-time temporal filtering and super-resolution of depth image sequences

Country Status (2)

Country Link
LU (1) LU92731B1 (en)
WO (1) WO2016193393A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950464B (en) * 2021-01-25 2023-09-01 西安电子科技大学 Binary super-resolution reconstruction method without regularization layer

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140147056A1 (en) * 2012-11-29 2014-05-29 Korea Institute Of Science And Technology Depth image noise removal apparatus and method based on camera pose

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ISMAEIL KASSEM AL ET AL: "Dynamic super resolution of depth sequences with non-rigid motions", 2013 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, IEEE, 15 September 2013 (2013-09-15), pages 660 - 664, XP032565873, DOI: 10.1109/ICIP.2013.6738136 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657585A (en) * 2017-08-30 2018-02-02 天津大学 High magnification super-resolution method based on double transform domains
CN107657585B (en) * 2017-08-30 2021-02-05 天津大学 High-magnification super-resolution method based on double transformation domains
US20220101547A1 (en) * 2019-07-11 2022-03-31 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Depth image processing method and apparatus, electronic device, and readable storage medium
US11961246B2 (en) * 2019-07-11 2024-04-16 Guangdong OPPO Mobile Telecommunications Corp. Ltd Depth image processing method and apparatus, electronic device, and readable storage medium
CN113096024A (en) * 2020-01-09 2021-07-09 舜宇光学(浙江)研究院有限公司 Flying spot removing method for depth data, system and electronic equipment thereof
CN113096024B (en) * 2020-01-09 2023-05-09 舜宇光学(浙江)研究院有限公司 Flying spot removing method for depth data, system and electronic equipment thereof
CN111289470A (en) * 2020-02-06 2020-06-16 上海交通大学 OCT measurement imaging method based on computational optics
CN111489383A (en) * 2020-04-10 2020-08-04 山东师范大学 Depth image up-sampling method and system based on depth edge point and color image
CN111489383B (en) * 2020-04-10 2022-06-10 山东师范大学 Depth image up-sampling method and system based on depth marginal point and color image
CN112465730A (en) * 2020-12-18 2021-03-09 辽宁石油化工大学 Motion video deblurring method
CN118314060A (en) * 2024-06-05 2024-07-09 中国人民解放军国防科技大学 Image preprocessing method for space target observation

Also Published As

Publication number Publication date
LU92731B1 (en) 2016-12-06

Similar Documents

Publication Publication Date Title
WO2016193393A1 (en) Real-time temporal filtering and super-resolution of depth image sequences
Kim et al. Spatio-temporal transformer network for video restoration
Nasrollahi et al. Super-resolution: a comprehensive survey
Mitzel et al. Video super resolution using duality based TV-L1 optical flow
US9781381B2 (en) Super-resolution of dynamic scenes using sampling rate diversity
Zhu et al. Removing atmospheric turbulence via space-invariant deconvolution
Rav-Acha et al. Two motion-blurred images are better than one
US8290212B2 (en) Super-resolving moving vehicles in an unregistered set of video frames
Lee et al. Simultaneous localization, mapping and deblurring
Su et al. Super-resolution without dense flow
Jeong et al. Multi-frame example-based super-resolution using locally directional self-similarity
Mustaniemi et al. Fast motion deblurring for feature detection and matching using inertial measurements
Al Ismaeil et al. Real-time non-rigid multi-frame depth video super-resolution
Kim et al. Dynamic scene deblurring using a locally adaptive linear blur model
Al Ismaeil et al. Enhancement of dynamic depth scenes by upsampling for precise super-resolution (UP-SR)
Al Ismaeil et al. Dynamic super resolution of depth sequences with non-rigid motions
Takeda et al. Locally adaptive kernel regression for space-time super-resolution
Hadhoud et al. New trends in high resolution image processing
Vrigkas et al. On the improvement of image registration for high accuracy super-resolution
Russo et al. Blurring prediction in monocular slam
Mohan Adaptive super-resolution image reconstruction with lorentzian error norm
Upla et al. Multiresolution fusion using contourlet transform based edge learning
Singh et al. An efficient and robust multi-frame image super-resolution reconstruction using orthogonal Fourier-Mellin moments
Chavez et al. Super resolution imaging via sparse interpolation in wavelet domain with implementation in DSP and GPU
Qian et al. Blind super-resolution restoration with frame-by-frame nonparametric blur estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16728884

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16728884

Country of ref document: EP

Kind code of ref document: A1