US20170262768A1 - Depth from time-of-flight using machine learning - Google Patents
- Publication number
- US20170262768A1 (application US15/068,632)
- Authority
- US
- United States
- Prior art keywords
- machine learning
- time
- flight sensor
- learning component
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G06N99/005—
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
- G01S17/894—3D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S7/00—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
- G01S7/48—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
- G01S7/4808—Evaluating distance, position or velocity data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G06T7/0057—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/521—Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/02—Systems using the reflection of electromagnetic waves other than radio waves
- G01S17/06—Systems determining position data of a target
- G01S17/08—Systems determining position data of a target for measuring distance only
- G01S17/32—Systems determining position data of a target for measuring distance only using transmission of continuous waves, whether amplitude-, frequency-, or phase-modulated, or unmodulated
- G01S17/36—Systems determining position data of a target for measuring distance only using transmission of continuous waves, whether amplitude-, frequency-, or phase-modulated, or unmodulated with phase comparison between the received signal and the contemporaneously transmitted signal
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/66—Tracking systems using electromagnetic waves other than radio waves
Definitions
- Time-of-flight (TOF) cameras are increasingly used in a variety of applications, for example, human computer interaction, automotive applications, measurement applications and machine vision.
- a TOF camera can be used to compute depth maps which contain information relating to the depth of an object in a scene from the camera. The depth refers to the projection of distance on an imaginary line that extends from the camera, where the distance is the absolute radial distance.
- a light source at the TOF camera illuminates the scene and the light is reflected by objects in the scene. The camera receives the reflected light that, dependent on the distance of an object to the camera, experiences a delay. Given the fact that the speed of light is known, a depth map is computable.
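The relationship between delay and distance can be sketched as follows. This is a minimal illustration assuming a single direct reflection and an ideal delay measurement; the function name is hypothetical and the result is the radial distance rather than the projected depth.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by definition of the metre


def distance_from_round_trip_delay(delay_s: float) -> float:
    """Radial distance in metres for a measured round-trip delay in seconds."""
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    # The light travels to the surface and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * delay_s / 2.0
```

A delay of 20 nanoseconds corresponds to a surface roughly 3 metres from the camera.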
- the time of flight measurement is subject to a number of errors and uncertainties which lead to errors in the computed depth maps.
- the reflected light often undergoes multiple reflections from different surfaces within the scene which cause significant errors in the calculated depth.
- a depth detection apparatus which has a memory storing raw time-of-flight sensor data received from a time-of-flight sensor.
- the depth detection apparatus also has a trained machine learning component having been trained using training data pairs.
- a training data pair comprises at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value.
- the trained machine learning component is configured to compute in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- FIG. 1 is a schematic diagram of a trained machine learning component deployed with a time-of-flight camera
- FIG. 2 is a schematic diagram of a time-of-flight camera
- FIG. 3 is a flow diagram of a method of using a trained machine learning component such as that of FIG. 1 ;
- FIG. 4 is a graph of empirical results of depth values computed using the arrangement of FIG. 1 and various other arrangements;
- FIG. 5 is a schematic diagram of components used to create a trained machine learning component, such as that of FIG. 1 ;
- FIG. 6 is a schematic diagram of components used to generate training data pairs such as the training data pairs of FIG. 5 ;
- FIG. 7 is a graph of data output by a time-of-flight simulator such as that of FIG. 6 ;
- FIG. 8 is a flow diagram of a method of training a random decision forest using training data pairs such as those of FIG. 6 ;
- FIG. 9 is a schematic diagram of a plurality of random decision trees
- FIG. 10 is a flow diagram of a method of using a trained random decision forest at test time, such as the trained machine learning component of FIG. 1 ;
- FIG. 11 is a flow diagram of a method of training a convolutional neural network using training data pairs such as those of FIG. 6 ;
- FIG. 12 is a flow diagram of using a trained convolutional neural network
- FIG. 13 illustrates an exemplary computing-based device in which embodiments of a trained machine learning component for use with a time-of-flight camera are implemented.
- Time-of-flight cameras output raw sensor data which is then processed to derive depth values.
- the act of processing the raw sensor data to compute the depth values is time consuming and complex.
- the depth values which are computed suffer from inaccuracy due to multi-path interference and noise in the raw sensor data.
- time-of-flight cameras are increasingly used for real time applications and/or where highly accurate depth values are needed, for example hand tracking, body tracking, 3D scene reconstruction and others.
- the trained machine learning system takes raw time-of-flight sensor data as input and computes depth values as output, where the depth values already take into account multi-path interference and optionally also take into account sensor noise. This improves the speed with which the depth values are computed, since there is a single stage. For example, there is no need to compute the depth values and then subsequently process the depth values to correct for multi-path interference and/or sensor noise.
- the result gives a better computing device which is able to control a downstream system using accurate depth values obtained from one or more time-of-flight sensors. Usability from the point of view of the end user is improved since the accurate depth values give a better correspondence with reality such as for hand tracking, body tracking, augmented reality and others.
- the machine learning system has been trained using pairs of simulated raw time-of-flight sensor data frames and corresponding depth maps.
- the simulated raw time-of-flight sensor data frames are calculated using a modified computer graphics renderer as described in more detail below.
- the simulated raw time-of-flight sensor data frames are simulated assuming that multi-path interference occurs. Therefore it is possible to learn a mapping from simulated raw time-of-flight sensor data direct to depth values which are already corrected for multi-path interference. There is no need to apply a subsequent stage to correct the depth values for multi-path interference.
- As a result processing is significantly simplified and speeded up. Because the processing is simpler than two stage processes, the processing is implementable at a dedicated chip, field programmable gate array (FPGA) or similar. This is particularly useful where the processing is to be carried out at a time-of-flight camera itself, or on a resource constrained device such as a wearable or mobile computing device which has an integral time-of-flight camera.
- the machine learning component described herein is found to give highly accurate depth values, especially for situations where depth values are difficult to compute accurately using existing approaches, for example at corners of rooms, where the floor meets the wall, where the wall meets the ceiling, and in the case of highly reflective surfaces such as shiny floors, among others. Accuracy improvements are believed to be due, at least in part, to the fact that the machine learning component has been trained with the particular type of training data.
- FIG. 1 is a schematic diagram of a depth detection apparatus 100 comprising a memory 122 and a trained machine learning component 124 .
- a time-of-flight camera 104 which is a phase modulation time-of-flight depth camera or a gated time-of-flight depth camera (or another future type of TOF camera), captures a stream of raw sensor data 108 depicting a scene 102 .
- One or more objects in the scene 102 and/or the time-of-flight camera itself are moving in some examples.
- the scene comprises a child doing a cartwheel so that there are several moving objects in the scene (the child's limbs).
- the time-of-flight camera is wall-mounted in the room or in some examples is body worn or head-mounted or mounted on a robot or vehicle.
- the stream of raw sensor data 108 comprises a plurality of frames of raw sensor data which have been captured by the time-of-flight camera.
- a frame of raw sensor data comprises, for each pixel of the camera sensor, complex numbers which are amplitude and phase measurements of reflected light.
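As a small illustration of the complex-valued representation, one raw measurement can be split into its amplitude and phase with the Python standard library. The function name is hypothetical, and how a particular sensor packs its measurements is not specified here.

```python
import cmath


def amplitude_and_phase(sample: complex) -> tuple[float, float]:
    """Split one complex raw measurement into (amplitude, phase in radians)."""
    # cmath.polar returns the modulus and the argument of a complex number.
    return cmath.polar(sample)
```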
- a frame of raw sensor data comprises, for each pixel of the camera sensor, a plurality of intensity values of reflected light sensed at the pixel, for different exposure periods.
- the time-of-flight camera uses one or more measurement patterns 106 also referred to as exposure profiles.
- a measurement pattern is a set of values of configurable parameters of the time-of-flight camera, to be used when a frame of raw sensor data is captured by the camera. Where different measurement patterns 106 are available, the time-of-flight camera is able to capture different frames using different measurement patterns.
- the stream of raw sensor data 108 is input to a depth detection apparatus 100 comprising a memory 122 and a trained machine learning component 124 .
- the trained machine learning component 124 computes depth maps, or depth values of individual pixels, in a single stage process which takes into account multi-path interference and/or sensor noise so that an accurate depth map stream 110 is output.
- a depth map comprises a plurality of depth values, each depth value being for an individual pixel of the time of flight image sensor.
- depth values of individual pixels are output.
- the trained machine learning component 124 also outputs uncertainty data associated with the depth values.
- the stream 110 of depth values and optional uncertainty data is input to a downstream system 112 such as a scene reconstruction engine 114 , a gesture detection system 116 , an augmented reality system 118 , a touch-less user interface 120 or others.
- the depth detection apparatus 100 operates in real-time in some examples.
- the depth detection apparatus 100 is integral with the time-of-flight camera 104 .
- the depth detection apparatus 100 is in a computing device such as a smart phone, tablet computer, head worn augmented reality computing device, or other computing device which has a time-of-flight camera.
- the memory 122 holds raw time-of-flight sensor data from the stream 108 and makes this available to the trained machine learning component 124 for processing.
- the trained machine learning component 124 has been trained using pairs of simulated raw time-of-flight sensor data frames and corresponding depth maps.
- the simulated raw time-of-flight sensor data frames are simulated assuming that multi-path interference occurs.
- the trained machine learning component 124 has been trained using pairs of raw time-of-flight sensor data values associated with individual sensor pixels and corresponding depth values.
- the trained machine learning component 124 comprises a trained regressor such as a random decision forest, directed acyclic graph, support vector machine, neural network, or other trained regressor.
- the trained regressor is a pixel independent trained regressor in some examples, in that it is trained using pairs comprising individual pixels and associated individual depth values, and where dependencies between the pairs are not taken into account.
- the trained regressor does take dependencies between individual pixels into account.
- An example of a trained regressor which does take dependencies between individual pixels into account is a convolutional neural network.
- An example in which the trained regressor is a pixel independent regressor, namely a random decision forest, is given below with reference to FIGS. 8 to 10 .
- An example in which the trained regressor is a convolutional neural network taking into account dependencies between pixels is given with respect to FIGS. 11 to 12 below.
- FIG. 2 is a schematic diagram of a time-of-flight depth camera 200 which is a phase modulation time-of-flight depth camera or a gated time-of-flight depth camera or any other future type of time-of-flight depth camera.
- the time-of-flight camera 200 comprises a source of transmitted light 202 .
- the source of transmitted light is an incoherent light source.
- the source of transmitted light is a coherent light source.
- An example of an appropriate light source is a near infra-red laser or light emitting diode (LED); however, another appropriate light source may be used.
- the transmitted light is modulated at a modulation frequency.
- the modulation frequency may be a radio frequency (RF) in the range kHz-GHz (kilohertz to gigahertz); for example, the modulation frequency may be in the MHz (megahertz) range.
- the transmitted light is pulsed where the pulses may be of picosecond to nanosecond duration.
- a time-of-flight depth camera comprises an image sensor 204 that receives light reflected from objects within the scene.
- the image sensor 204 comprises a charge-coupled device (CCD) sensor, a complementary metal-oxide-semiconductor (CMOS) sensor, for example a Photonic Mixer Device (PMD) sensor, or other appropriate sensor which is arranged to detect light reflected from objects, people and surfaces within the camera range.
- the image sensor 204 has a resolution compatible with the duration of the pulses emitted by the light source.
- the camera comprises an optical system 206 that is arranged to gather and focus reflected light from the environment on to the image sensor 204 .
- the optical system comprises an optical band pass filter, which allows only light of the same wavelength as the light source to be received by the sensor. The use of an optical band pass filter helps to suppress background light.
- the camera comprises driver electronics 208 which control both the light source and an image sensor, for example, to enable highly accurate phase difference measurements to be made or to enable a train of light pulses to be emitted and for the image sensor to be “shuttered” on and off.
- An image sensor may be shuttered on and off electronically rather than with physical shutters.
- the camera comprises a processor 208 and a memory 210 which stores raw time-of-flight data, depth maps and other data.
- a trained machine learning component 214 is available at the camera 212 in some examples and in other examples this trained machine learning component is at another computing device which receives and processes the raw sensor data from the camera. Where the trained machine learning component 214 is at the camera 212 it comprises software stored at memory 210 and executed at processor 208 in some cases. In some examples the trained machine learning component 214 is an FPGA or a dedicated chip. For example, the functionality of the trained machine learning component 214 is implemented, in whole or in part, by one or more hardware logic components.
- illustrative types of hardware logic components include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and Graphics Processing Units (GPUs).
- the trained machine learning component 214 is arranged to execute the methods described herein with respect to FIGS. 3, 10, 12 in order to compute depth in real time, using a single stage, from a stream of raw time-of-flight data in a manner which allows for multi-path interference. This is achieved without the need to compute corrections to the depth values and apply those subsequent to the depth values having been computed.
- FIG. 3 is a flow diagram of a method at the depth detection apparatus 100 .
- Raw sensor data is received from the time-of-flight camera 104 and stored at memory 122 .
- a check 302 is made as to whether the process is to compute neighborhoods or not. The decision is made according to user input, or settings configured by an operator during manufacturing. In some examples, the depth detection apparatus automatically decides whether neighborhoods are to be computed according to the nature of the captured sensor data or other sensor data, such as sensors which detect motion of the time-of-flight camera.
- the depth detection apparatus inputs 306 the raw sensor data 300 to the trained machine learning component.
- the input process comprises inputting raw sensor data associated with individual pixels and/or whole frames of raw sensor data.
- the depth detection apparatus receives 308 , from the trained machine learning component depth value(s) and optionally also uncertainty data.
- the depth detection apparatus outputs the depth values in real time 310 together with the uncertainty data in some cases.
- By real time it is meant that the rate of the received raw sensor data 300 is at least matched by the output rate of the output depth values at operation 310 .
- the process of FIG. 3 repeats as more raw sensor data is received 300 .
- the depth detection apparatus aggregates 304 raw sensor data values of pixels in a neighborhood of the pixel under current consideration.
- the neighborhood is either a spatial neighborhood or a temporal neighborhood or a combination of a spatial and temporal neighborhood.
- the aggregated raw sensor data values are input to the trained machine learning component at operation 306 .
- the trained machine learning component has been trained using training data which has been aggregated in the same manner.
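A spatial aggregation step could look like the sketch below. The text does not say which aggregation function is used, so the per-channel mean over a square window is an assumption, as are the function name and the nested-list frame layout.

```python
def aggregate_spatial(frame, row, col, radius=1):
    """Average the raw value vectors in a square window around (row, col).

    frame[r][c] is a list of raw sensor values for one pixel (for example one
    intensity per exposure). Window positions outside the image are skipped.
    ASSUMPTION: the mean is used as the aggregation function.
    """
    height, width = len(frame), len(frame[0])
    channels = len(frame[row][col])
    totals = [0.0] * channels
    count = 0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            for k in range(channels):
                totals[k] += frame[r][c][k]
            count += 1
    return [t / count for t in totals]
```

Whatever aggregation is chosen, the same function would be applied to the simulated training data so that training and test inputs match.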
- the training data allows for motion between the camera and the scene, for example as a result of objects in the scene moving and/or as a result of the camera moving. This is achieved by using a model of motion between the camera and the scene.
- the machine learning system outputs depth values associated with the neighborhoods at operation 308 and the process proceeds with operation 310 and returns to operation 300 as described above.
- the table below has empirical data demonstrating how the depth detection apparatus 100 of FIG. 1 has improved accuracy as compared with an alternative method using a probabilistic generative model of time-of-flight on which inference is possible.
- the alternative method used the same conditions, such as number of exposures per frame, as the present method.
- the results for the alternative method are in the column labeled “generative” in the table below.
- a probabilistic generative model of time-of-flight is a description, expressed using likelihoods, of how raw time-of-flight data is generated by a time-of-flight camera under specified imaging conditions comprising reflectivity of a surface generating reflected light received at the camera (also referred to as albedo), illumination of the surface, and depth of the surface from the camera. Inference is possible on the probabilistic generative model so that given known imaging conditions it is possible to infer corresponding raw sensor data and vice versa.
- the probabilistic generative model takes into account a single path for reflected light for each pixel.
- raw time-of-flight sensor data is used to compute depth values from the probabilistic generative model, or from an approximation of the probabilistic generative model.
- FIG. 4 is a cumulative distribution plot of the empirical data of the above table and additional empirical data.
- the graph of FIG. 4 plots the probability of the depth error being less than a threshold, against the size of the threshold in millimetres.
- Line 400 indicates the data for the approach using a generative model described above and lines 402 , 404 , 406 indicate data for the present approach using either four exposures per frame (line 402 ), six exposures per frame (line 406 ) or eight exposures per frame (line 404 ) of the time-of-flight camera. It is seen that the generative approach gives the worst performance.
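The quantity plotted in such a cumulative distribution can be computed as sketched below. The function names are hypothetical and no values from FIG. 4 are reproduced; the sketch only shows how a curve of this kind is derived from a list of per-pixel depth errors.

```python
def fraction_below(errors_mm, threshold_mm):
    """Empirical probability that the absolute depth error is below a threshold."""
    if not errors_mm:
        raise ValueError("need at least one error value")
    return sum(1 for e in errors_mm if abs(e) < threshold_mm) / len(errors_mm)


def cumulative_curve(errors_mm, thresholds_mm):
    """Pairs of (threshold, probability) suitable for plotting."""
    return [(t, fraction_below(errors_mm, t)) for t in thresholds_mm]
```

A method whose curve rises more steeply reaches a given probability at a smaller error threshold, which is the sense in which one line outperforms another.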
- FIG. 5 is a schematic diagram of how to create a trained machine learning component such as component 124 of FIG. 1 .
- Millions of training data pairs are stored at a database 500 or other store.
- a training data pair comprises a simulated raw time-of-flight sensor data frame and a corresponding ground truth depth map.
- a training data pair comprises a plurality of raw sensor data values associated with a pixel of the sensor, and a ground truth depth value associated with the pixel.
- a training data pair comprises a plurality of aggregated raw sensor data values, aggregated over a spatial or temporal neighborhood around a pixel of the sensor, and a ground truth depth value associated with the pixel.
- the raw sensor data values are obtained from a time-of-flight simulator which simulates multi-path interference as part of the simulation.
- the time-of-flight simulator comprises a renderer 606 and a viewpoint selector 604 and an example is described in more detail with reference to FIG. 6 .
- the time-of-flight simulator is relatively complex and it is not possible to carry out inference on the time-of-flight simulator as it is for the generative model of time-of-flight mentioned above.
- a trainer 504 accesses the training data pairs 500 and uses them to train and produce a trained machine learning component 506 such as a random decision forest, a convolutional neural network, a support vector machine or other trained regressor.
- the resulting trained machine learning component 506 may then be used as described above with respect to FIGS. 1 and 3 .
- the type of training data used to train the machine learning component corresponds with the type of data input to the machine learning component at test time. Test time is the time in which the machine learning component is operational to compute depth values from previously unseen raw sensor data.
- FIG. 6 is a schematic diagram of a time-of-flight simulator 602 and other components used to create training data pairs 626 such as the training data pairs described above with reference to FIG. 5 .
- the time-of-flight simulator 602 comprises a renderer 606 such as a computer graphics renderer which uses ray-tracing to render an image from a model of a 3D object or environment.
- the renderer 606 is a physically-accurate renderer which produces realistic rendered images by using physical modeling of light scattering, light transport simulation, and integration of paths of light at every pixel.
- the renderer 606 records, for each pixel, an intensity weight and a path length (the length of the path of simulated light from the simulated emitter of the TOF camera to the simulated surface(s) in the world and back to the simulated TOF sensor) for each of a plurality N of light path samples.
- the number of light path samples is the same for each pixel and is fixed in advance in some cases, such as a few thousand light path samples. In other examples, the number of light path samples is selected adaptively during simulation, for example, so that more complex areas in the scene are allocated more simulated light paths compared to simpler areas. More complex areas are identified in various ways such as according to the presence of corners, the presence of edges, the degree of surface reflectivity, or other factors.
- An example of a per-pixel weighted point mass is given in FIG. 7 for a pixel depicting a surface in a corner of a room. The pixel in the example of FIG.
- the first peak of the per-pixel weighted point mass gives an estimate of the ground truth depth which is input to the training data pair 624 in some examples.
- the renderer 606 uses ray-tracing to render an image from a model of a 3D object or environment. It is time consuming and expensive to generate suitable models of 3D objects or environments. For example, where the time-of-flight camera is to be used indoors, the models are of typical indoor environments such as living rooms, offices, kitchens and other indoor environments. However, it is difficult to obtain a good range and variety of models of such 3D environments. In order to address this, the present technology uses a plurality of parametric 3D environment models 610 .
- a parametric 3D environment model 610 is a computer manipulable description of a 3D environment expressed using one or more parameters.
- An instance generator 612 accesses a parametric 3D environment model from a store of parametric 3D environment models 610 and creates a plurality of instances.
- An instance is a 3D environment model created from a parametric 3D environment model by selecting values of the parameters of the parametric 3D environment model. The instances are created by selecting values of the parameters at random and/or within a specified range of possible values of the parameters according to knowledge of feasible parameter value ranges.
- a non-exhaustive list of examples of parameters of a parametric 3D environment model is: geometry of individual objects in the 3D model, presence or absence of individual objects (including light sources), object location, object orientation, surface materials and textures, amount of ambient illumination.
- Using parametric models in this way enables a huge number of variations of 3D environment model to be generated in a fast, efficient manner.
- values of parameters can be adjusted to vary surface reflectivity of the flooring material, ceiling, walls and furniture, and also to vary geometry and/or position of objects in the room such as furniture, light fittings, windows and other objects.
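Creating instances from a parametric model can be sketched as random sampling within per-parameter ranges. The parameter names, ranges and dictionary layout below are hypothetical and chosen only for illustration; a real parametric 3D environment model would carry geometry rather than a flat dictionary.

```python
import random


def make_instance(parameter_ranges, rng):
    """Draw one concrete environment description from per-parameter ranges."""
    instance = {}
    for name, spec in parameter_ranges.items():
        if isinstance(spec, tuple):
            # (low, high) describes a continuous range of feasible values.
            low, high = spec
            instance[name] = rng.uniform(low, high)
        else:
            # Otherwise spec is a list of discrete choices.
            instance[name] = rng.choice(spec)
    return instance


# Hypothetical parameter names and ranges, for illustration only.
LIVING_ROOM = {
    "floor_reflectivity": (0.1, 0.9),
    "room_width_m": (3.0, 6.0),
    "has_sofa": [True, False],
    "wall_material": ["paint", "wood", "tile"],
}

rng = random.Random(0)
instances = [make_instance(LIVING_ROOM, rng) for _ in range(3)]
```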
- the time-of-flight simulator 602 is able to render a good variety of simulated raw time-of-flight data which incorporates multi-path interference.
- This gives improved quality training data pairs 626 and as a consequence, the trained machine learning component gives better quality depth values and uncertainty information.
- there is a depth detection apparatus giving highly accurate depth values enabling better control by downstream computing systems.
- the renderer 606 renders an image from a model of a 3D object or environment given a camera viewpoint.
- a camera viewpoint is a 3D position and orientation within a bounding box of the 3D environment model instance.
- the renderer 606 uses details of optical properties 600 of the time-of-flight camera such as the field of view of the camera, the focal length, and the spatial light emission intensity profile.
- the time-of-flight simulator has a viewpoint selector 604 which selects a large number of possible viewpoints of the camera within the instance of the 3D environment model. For example, the viewpoint selector 604 selects the viewpoints at random by choosing random 3D positions and orientations within a bounding box of the 3D environment model.
- the viewpoint selector 604 is arranged to reject viewpoints which are within a threshold distance of objects in the 3D environment model, for example viewpoints which face a wall of the 3D environment with only 20 centimetres between the camera viewpoint and the wall.
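Random viewpoint selection with rejection can be sketched as follows. Representing objects as a list of 3D points and the orientation as yaw and pitch angles are simplifying assumptions, and all names are hypothetical; a real selector would query the environment geometry for clearance.

```python
import math
import random


def select_viewpoints(box_min, box_max, obstacles, min_clearance, count, rng,
                      max_attempts=100_000):
    """Rejection-sample camera viewpoints inside an axis-aligned bounding box.

    ASSUMPTION: `obstacles` is a list of 3D points standing in for object
    surfaces, and orientation is described by (yaw, pitch) in radians.
    """
    viewpoints = []
    for _ in range(max_attempts):
        if len(viewpoints) == count:
            break
        position = tuple(rng.uniform(lo, hi) for lo, hi in zip(box_min, box_max))
        if any(math.dist(position, p) < min_clearance for p in obstacles):
            continue  # too close to an object: reject and draw again
        yaw = rng.uniform(-math.pi, math.pi)
        pitch = rng.uniform(-math.pi / 2, math.pi / 2)
        viewpoints.append((position, (yaw, pitch)))
    return viewpoints
```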
- the renderer computes simulated raw time-of-flight data for each of a plurality of viewpoints selected by the viewpoint selector 604 . For example, thousands of different viewpoints.
- the training data pairs 624 exhibit good variety and the resulting trained machine learning component 124 is able to generalize well to unseen 3D environments with unseen camera viewpoints.
- the time-of-flight simulator 602 outputs per-pixel weighted point masses 608 . These do not take into account exposure profiles a time-of-flight camera has. This means that the time-of-flight simulator can be used for any type of time-of-flight camera.
- the per-pixel weighted point masses 608 are input to an exposure profile combiner 616 which incorporates information about a specified exposure profile of a time-of-flight camera into the raw time-of-flight data being simulated.
- the exposure profile is specified by an operator during manufacturing, by selecting it from a library of exposure profile details, or using user input. For example, the exposure profile is described using a vector constant A and a vector-valued function C.
- a vector-valued function is a function which takes a scalar argument and returns a vector.
- the exposure profile combiner 616 combines the per-pixel weighted point masses (values of weight w and path length t) with the vector constant A and the vector-valued function C using the following equation:

$$\vec{\mu} = \tau\,\vec{A} + \sum_{i=1}^{N} w_i \, d(t_i) \, \vec{C}(t_i)$$

- where N is the number of light path samples, fixed at a value such as a few thousand samples, and where the symbol $\tau$ denotes the ambient light intensity used by the time-of-flight simulator.
- the values of the vector constant A and the values of the elements returned by the vector-valued function C in some examples are between zero and $2^{12}$.
- in words, a mean response vector $\vec{\mu}$ of simulated raw time-of-flight sensor intensity values (such as four intensity values, one for each of four exposures) simulated as being observed at the same pixel of the sensor is equal to the ambient light intensity $\tau$ times a vector constant $\vec{A}$ which represents part of the exposure pattern of the time-of-flight camera, plus the sum over the N light path samples of a vector-valued function $\vec{C}$ evaluated at $t_i$, which represents another part of the exposure pattern of the time-of-flight camera, times the weight $w_i$, which is the point mass weight from the time-of-flight simulator output for light path sample i, and taking into account a distance decay function $d(t_i)$ whereby intensity falls away with the distance from the camera of the surface which reflects the light.
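- The combination step described above (ambient intensity times the vector constant, plus a decay-weighted sum of the vector-valued function over the light path samples) can be sketched as follows. The function name `mean_response`, the cosine-shaped exposure function and the inverse-square decay are assumptions chosen only for illustration; the source does not specify the form of C or of the decay function.

```python
import math


def mean_response(point_masses, ambient, A, C, decay):
    """Combine per-pixel weighted point masses with an exposure profile.

    point_masses: list of (weight, path_length) pairs for one pixel.
    ambient:      ambient light intensity (a scalar).
    A:            vector constant, one entry per exposure.
    C:            function mapping a path length to a vector, one entry per exposure.
    decay:        distance decay function of the path length.
    """
    # Ambient contribution: ambient intensity times the vector constant A.
    mu = [ambient * a for a in A]
    # Add the contribution of every light path sample.
    for weight, t in point_masses:
        scale = weight * decay(t)
        for k, c_k in enumerate(C(t)):
            mu[k] += scale * c_k
    return mu


# Hypothetical exposure profile with four exposures (illustrative values only).
A = [1.0, 1.0, 1.0, 1.0]


def C(t):
    return [100.0 * (1.0 + math.cos(0.5 * t + k * math.pi / 2.0)) for k in range(4)]


def inverse_square(t):
    return 1.0 / (t * t)


# Two light paths reaching the same pixel: a direct path and a weaker indirect one.
masses = [(0.8, 2.0), (0.2, 3.5)]
mu = mean_response(masses, ambient=5.0, A=A, C=C, decay=inverse_square)
```

Because the point masses are independent of the exposure profile, the same simulator output can be recombined with a different `A` and `C` to simulate a different camera.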
- sensor noise is simulated. That is, the output of the exposure profile combiner is processed by a noise addition component 620 which adds noise to the simulated raw time-of-flight data. However it is not essential to use the noise addition component 620 .
- the output of the noise addition component 620 is simulated raw intensity values 622 associated with a pixel, and which incorporate multi-path interference and sensor noise.
- This data is formed into a training data pair 624 by accessing the corresponding ground truth depth value (which is the true depth of the surface depicted by the pixel).
- the corresponding ground truth depth value is known either by computing it from the 3D environment instance 614 or by taking the first peak of the per-pixel weighted point mass. Given a 3D environment model instance the depth detection apparatus computes ground truth depth values 318 for a given camera viewpoint.
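- Taking the ground truth depth from the first peak of the per-pixel weighted point mass can be sketched as below. Two assumptions are made that the source does not state: a peak is any point mass whose weight exceeds a small threshold `min_weight`, and the path length is a round-trip length so that depth is half of it. The function name `first_peak_depth` is hypothetical.

```python
def first_peak_depth(point_masses, min_weight=1e-3):
    """Return a ground truth depth from the earliest significant point mass.

    point_masses: list of (weight, path_length) pairs for one pixel.
    Assumes the path length is a round trip, so depth is half the length.
    Returns None when no point mass is significant.
    """
    significant = [t for weight, t in point_masses if weight > min_weight]
    if not significant:
        return None
    # The shortest significant path corresponds to the direct reflection;
    # longer paths arise from multi-path interference.
    return min(significant) / 2.0


# Direct path of length 2.0 plus a longer multi-path contribution at 4.0;
# the sample at 1.0 has negligible weight and is ignored.
depth = first_peak_depth([(0.5, 4.0), (0.3, 2.0), (0.0001, 1.0)])
```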
- a training data pair comprises a frame of simulated raw time-of-flight sensor data and a corresponding ground truth depth map. This is achieved by repeating the process for individual pixels of the sensor to form a frame.
- the machine learning system comprises a random decision forest.
- a random decision forest comprises one or more decision trees each having a root node, a plurality of split nodes and a plurality of leaf nodes.
- Raw TOF sensor data is pushed through trees of a random decision forest from the root to a leaf node in a process whereby a decision is made at each split node.
- the decision is made according to values of parameters at the split nodes, where the values of the parameters have been learnt during training.
- the raw TOF sensor data proceeds to the next level of the tree down a branch chosen according to the results of the decision.
- parameter values (which specify decision criteria to be used at the split nodes) are learnt for use at the split nodes and data (Raw TOF sensor data with ground truth depth values) is accumulated at the leaf nodes.
- the training data accumulated at a leaf node during training is stored as a histogram, or in an aggregated manner, such as using a mean, median or mode or by fitting a probability distribution to the histogram and storing statistics describing the probability distribution.
- the training set described above is first received 800 .
- the number of decision trees to be used in a random decision forest is selected 802 .
- a random decision forest is a collection of deterministic decision trees. Decision trees can be used in classification or regression algorithms, but can suffer from over-fitting, i.e. poor generalization. However, an ensemble of many randomly trained decision trees (a random forest) yields improved generalization.
- the number of trees is fixed.
- An example random decision forest is illustrated in FIG. 9 .
- the illustrative decision forest of FIG. 9 comprises three decision trees: a first tree 900 ; a second tree 902 ; and a third tree 904 .
- Each decision tree comprises a root node (e.g. root node 906 of the first decision tree 900 ), a plurality of internal nodes, called split nodes (e.g. split node 908 of the first decision tree 900 ), and a plurality of leaf nodes (e.g. leaf node 910 of the first decision tree 900 ).
- a decision tree from the decision forest is selected 804 (e.g. the first decision tree 900 ) and the root node 906 is selected 806 .
- a random set of test parameter values are then generated 810 for use by a binary test performed at the root node.
- the parameters are thresholds or other parameters of a binary test. In the case that neighborhoods of pixels are used then the binary test optionally comprises pairwise tests comparing pairs of pixels. In the pixel-independent case pairwise tests are not essential.
- every combination of test parameter values is applied 812 to each raw TOF training data item which has reached the current node.
- for each combination, criteria (also referred to as objectives) are calculated.
- the calculated criteria comprise the information gain (also known as the relative entropy).
- the combination of parameters that optimize the criteria is selected 814 and stored at the current node for future use.
- other criteria can be used, such as the residual variance criterion or others.
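- The generation of random test parameters and the selection of the combination that optimizes a criterion can be sketched as follows. This sketch uses a variance-reduction criterion as a stand-in for the residual variance criterion mentioned above; the function names, the choice of a single feature index plus threshold as the test parameters, and the toy data are assumptions for illustration.

```python
import random


def variance(values):
    """Population variance of a list of depth values (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def choose_split(items, num_candidates=20, seed=0):
    """Pick the best of several randomly generated binary tests.

    items: list of (features, depth) pairs that reached the current node.
    Returns (feature_index, threshold, gain), or None if no candidate
    separates the items.
    """
    rng = random.Random(seed)
    parent = variance([depth for _, depth in items])
    num_features = len(items[0][0])
    best = None
    for _ in range(num_candidates):
        # Random test parameters: which raw value to test and a threshold.
        feature = rng.randrange(num_features)
        column = [x[feature] for x, _ in items]
        threshold = rng.uniform(min(column), max(column))
        passed = [d for x, d in items if x[feature] < threshold]
        failed = [d for x, d in items if x[feature] >= threshold]
        if not passed or not failed:
            continue  # the test does not split the data
        # Criterion: reduction in size-weighted variance of the depth values.
        children = (len(passed) * variance(passed)
                    + len(failed) * variance(failed)) / len(items)
        gain = parent - children
        if best is None or gain > best[2]:
            best = (feature, threshold, gain)
    return best


# Toy items: the first raw value determines the depth, the second is a distractor.
items = [((float(i), float((i * 7) % 10)), 1.0 if i < 5 else 3.0) for i in range(10)]
best = choose_split(items)
```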
- the current node is set 818 as a leaf node.
- the current depth of the tree is determined (i.e. how many levels of nodes are between the root node and the current node). If this is greater than a predefined maximum value, then the current node is set 818 as a leaf node.
- Each leaf node has labeled raw TOF data which accumulate at that leaf node during the training process as described below.
- the current node is set 820 as a split node.
- As the current node is a split node, it has child nodes, and the process then moves to training these child nodes.
- Each child node is trained using a subset of the training time-of-flight data at the current node.
- the subset of training time-of-flight data sent to a child node is determined using the parameters that optimized the criteria. These parameters are used in the binary test, and the binary test is performed 822 on all training time-of-flight data at the current node.
- the raw TOF data items that pass the binary test form a first subset sent to a first child node
- the raw TOF data items that fail the binary test form a second subset sent to a second child node.
- the process as outlined in blocks 810 to 822 of FIG. 8 is recursively executed 824 for the subset of raw TOF data items directed to the respective child node.
- new random test parameters are generated 810 , applied 812 to the respective subset of raw TOF data items, parameters optimizing the criteria selected 814 , and the type of node (split or leaf) determined 816 . If it is a leaf node, then the current branch of recursion ceases. If it is a split node, binary tests are performed 822 to determine further subsets of raw TOF data items and another branch of recursion starts.
- this process recursively moves through the tree, training each node until leaf nodes are reached at each branch. As leaf nodes are reached, the process waits 826 until the nodes in all branches have been trained. Note that, in other examples, the same functionality can be attained using alternative techniques to recursion.
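- The recursive training of a single tree can be sketched as below. It is a simplified illustration: nodes are plain dictionaries, the criterion is a reduction in summed squared error, and the stopping rules are a maximum depth and a minimum gain. All names (`train_node`, `collect_leaves`) and parameter values are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def train_node(items, depth, max_depth, min_gain, rng, num_candidates=10):
    """Recursively train a node on (features, depth_value) items."""
    depths = [d for _, d in items]
    best = None
    if depth < max_depth and len(items) > 1:
        for _ in range(num_candidates):
            # Generate random test parameters and apply them to the items.
            feature = rng.randrange(len(items[0][0]))
            column = [x[feature] for x, _ in items]
            threshold = rng.uniform(min(column), max(column))
            passed = [it for it in items if it[0][feature] < threshold]
            failed = [it for it in items if it[0][feature] >= threshold]
            if not passed or not failed:
                continue
            gain = (sse(depths) - sse([d for _, d in passed])
                    - sse([d for _, d in failed]))
            if best is None or gain > best[0]:
                best = (gain, feature, threshold, passed, failed)
    if best is None or best[0] < min_gain:
        # Leaf node: accumulate the ground truth depth values that reached it.
        return {"leaf": True, "depths": depths}
    _, feature, threshold, passed, failed = best
    # Split node: store the optimized parameters and recurse into each child.
    return {"leaf": False, "feature": feature, "threshold": threshold,
            "pass": train_node(passed, depth + 1, max_depth, min_gain, rng, num_candidates),
            "fail": train_node(failed, depth + 1, max_depth, min_gain, rng, num_candidates)}


def collect_leaves(node):
    """Return all leaf nodes below (and including) the given node."""
    if node["leaf"]:
        return [node]
    return collect_leaves(node["pass"]) + collect_leaves(node["fail"])


items = [((float(i), float((i * 7) % 10)), 1.0 if i < 5 else 3.0) for i in range(10)]
tree = train_node(items, depth=0, max_depth=3, min_gain=1e-9, rng=random.Random(0))
```

A forest would repeat this with a different random generator state per tree, which is what makes the trees distinct from one another.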
- raw TOF data items with ground truth depth values are accumulated 828 at the leaf nodes of the tree.
- a representation of the accumulated depth values is stored 830 using various different methods.
- Each tree comprises a plurality of split nodes storing optimized test parameters, and leaf nodes storing associated ground truth depth values. Due to the random generation of parameters from a limited subset used at each node, the trees of the forest are distinct (i.e. different) from each other.
- the training process is performed in advance of using the trained machine learning system to compute depth values of observed raw TOF data.
- the decision forest and the optimized test parameters are stored on a storage device for use in computing depth values at a later time.
- FIG. 10 illustrates a flowchart of a process for predicting depth values from previously unseen raw TOF data using a decision forest that has been trained as described above.
- an unseen raw TOF data item is received 1000 .
- a raw TOF data item is referred to as ‘unseen’ to distinguish it from a training TOF data item which has the depth value specified.
- neighborhoods are computed 1002 from the unseen raw TOF data.
- the neighborhoods are spatial and/or temporal neighborhoods as described above.
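- One way to compute a spatial and/or temporal neighborhood is to average the raw values in a small window around a pixel, over one or more consecutive frames. The sketch below is an assumption about how such aggregation might look: the function name `neighborhood_mean`, the square window, the border clamping and the use of a mean are all illustrative choices not specified in the source.

```python
def neighborhood_mean(frames, row, col, radius=1):
    """Average raw values over a spatial window and a list of frames.

    frames: list of frames (the temporal neighborhood); each frame is a
            2D list of pixels, and each pixel is a tuple of raw intensity
            values, one per exposure.
    Returns one mean value per exposure.
    """
    height, width = len(frames[0]), len(frames[0][0])
    channels = len(frames[0][0][0])
    totals = [0.0] * channels
    count = 0
    for frame in frames:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                # Clamp coordinates so border pixels reuse the nearest valid pixel.
                r = min(max(row + dr, 0), height - 1)
                c = min(max(col + dc, 0), width - 1)
                for k, value in enumerate(frame[r][c]):
                    totals[k] += value
                count += 1
    return [total / count for total in totals]


# A 3x3 frame with one raw value per pixel, and a second all-zero frame.
frame_a = [[(1.0,), (2.0,), (3.0,)],
           [(4.0,), (5.0,), (6.0,)],
           [(7.0,), (8.0,), (9.0,)]]
frame_b = [[(0.0,)] * 3 for _ in range(3)]
spatial = neighborhood_mean([frame_a], 1, 1)
spatio_temporal = neighborhood_mean([frame_a, frame_b], 1, 1)
```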
- a trained decision tree from the decision forest is selected 1004 .
- the selected raw TOF data item (whole frame, values for an individual pixel, values for a neighborhood) is pushed 1006 through the selected decision tree such that it is tested against the trained parameter values at a node, and then passed to the appropriate child in dependence on the outcome of the test, and the process is repeated until the raw TOF data item reaches a leaf node.
- the accumulated depth values associated with this leaf node are stored 1008 for this raw TOF data item.
- a new decision tree is selected 1004 , the raw TOF data item pushed 1006 through the tree and the accumulated depth values stored 1008 . This is repeated until it has been performed for all the decision trees in the forest. Note that the process for pushing a raw TOF data item through the plurality of trees in the decision forest can also be performed in parallel, instead of in sequence as shown in FIG. 10 .
- the data from the indexed leaf nodes is aggregated 1014 by averaging or in other ways. For example, where histograms of depth values are stored at the leaf nodes the histograms from the indexed leaf nodes are combined and used to identify one or more depth values associated with the raw TOF data item.
- the process outputs 816 at least one depth value as a result, and is able to output a confidence weighting of the depth value. This helps any subsequent algorithm assess whether the proposal is good or not. More than one depth value may be output; for example, where there is uncertainty.
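- The test-time procedure (push the item through every tree, pool the depth values at the indexed leaves, then aggregate) can be sketched as follows. The dictionary-based tree layout, the use of the mean as the aggregate and the use of the standard deviation as an uncertainty measure are assumptions for illustration; the source allows averaging "or other ways" and does not fix the form of the confidence weighting.

```python
import statistics


def push_through(tree, features):
    """Walk from the root to a leaf and return the depths stored there."""
    node = tree
    while not node["leaf"]:
        # Binary test using the parameters stored at the split node.
        passed = features[node["feature"]] < node["threshold"]
        node = node["pass"] if passed else node["fail"]
    return node["depths"]


def predict_depth(forest, features):
    """Aggregate the leaf data from every tree into a depth and a spread."""
    pooled = []
    for tree in forest:
        pooled.extend(push_through(tree, features))
    depth = statistics.fmean(pooled)
    spread = statistics.pstdev(pooled)  # larger spread means lower confidence
    return depth, spread


def leaf(depths):
    return {"leaf": True, "depths": depths}


# A hand-built forest of two one-level trees (hypothetical parameter values).
forest = [
    {"leaf": False, "feature": 0, "threshold": 0.5,
     "pass": leaf([1.0, 1.0]), "fail": leaf([3.0])},
    {"leaf": False, "feature": 0, "threshold": 0.4,
     "pass": leaf([1.0]), "fail": leaf([3.0, 3.0])},
]
confident = predict_depth(forest, [0.2])    # both trees agree
uncertain = predict_depth(forest, [0.45])   # the trees disagree
```

When the trees disagree, as in the second call, the pooled values are bimodal, which is one situation in which outputting more than one depth value could be appropriate.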
- the random decision forest example described above is modified in some cases by implementing the random decision forest as a directed acyclic graph in order to reduce the number of nodes of the graph. This facilitates deployment of the machine learning component on resource constrained devices such as smart phones, tablet computers and wearable computing devices.
- FIG. 11 is a flow diagram of a method of training a convolutional neural network.
- a training data pair comprises a raw TOF frame and a corresponding depth map. Individual pixel locations of the TOF frame have one or more intensity values, for different exposures for example.
- the training data pair is accessed 1100 and input to the convolutional neural network 1102 .
- a neural network is a plurality of weighted nodes which are interconnected by edges which may also be weighted.
- the neural network has input nodes, output nodes and internal nodes.
- the output nodes are associated with depth values learnt during a training phase.
- a convolutional neural network is a neural network where the nodes are arranged in multiple layers so that there are nodes in three dimensions, width, height and depth. Within each layer there are multiple receptive fields where a receptive field is a group of interconnected nodes which processes a portion of an input image (or TOF frame in the present examples). Within a layer the receptive fields are arranged so that their outputs partially overlap one another to give redundancy. A node of an internal layer is connected to neurons of one receptive field in the layer above.
- a convolutional neural network is typically a feed-forward neural network in which an input image (or TOF frame) is fed into input nodes and processed forwards through the network according to weights at the nodes, weighted connections between the nodes, and non-linear activation functions, until it reaches a set of one or more output nodes.
- the training data instance is fed forwards through the network, from the input nodes to the output nodes, with computations performed at the nodes which update 1104 the weights of the nodes and edges according to update rules.
- the update process is repeated for more training instances according to a check for convergence at check point 1106 of FIG. 11 , such as to see if the amount of change from the most recent update was smaller than a threshold.
- when convergence is reached the training ends 1108 .
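- The train-until-convergence loop can be illustrated with a deliberately tiny stand-in for a convolutional network: a single 1-D convolution kernel fitted by gradient descent on a mean squared error. This is not the network of the source; the kernel size, learning rate, tolerance and toy data are all assumptions, and a practical system would use a deep learning framework. The convergence check mirrors the one described above: training stops when the most recent update is smaller than a threshold.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of a signal with a kernel."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def mse(pairs, kernel):
    """Mean squared error of the kernel's output over all training pairs."""
    errors = [(o - y) ** 2
              for signal, target in pairs
              for o, y in zip(conv1d(signal, kernel), target)]
    return sum(errors) / len(errors)


def train(pairs, kernel, lr=0.1, tol=1e-6, max_epochs=10000):
    """Gradient descent with a convergence check on the size of the update."""
    epochs_run = 0
    for _ in range(max_epochs):
        epochs_run += 1
        grad = [0.0] * len(kernel)
        count = 0
        for signal, target in pairs:
            # Feed the training instance forwards and accumulate gradients.
            output = conv1d(signal, kernel)
            for i, (o, y) in enumerate(zip(output, target)):
                for j in range(len(kernel)):
                    grad[j] += 2.0 * (o - y) * signal[i + j]
                count += 1
        step = [lr * g / count for g in grad]
        kernel = [w - s for w, s in zip(kernel, step)]
        # Convergence: the most recent update changed every weight by less than tol.
        if max(abs(s) for s in step) < tol:
            break
    return kernel, epochs_run


# Toy training pair generated from a known kernel (illustrative values only).
true_kernel = [0.25, 0.5, 0.25]
signal = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
pairs = [(signal, conv1d(signal, true_kernel))]
initial = [0.0, 0.0, 0.0]
learned, epochs_run = train(pairs, initial)
```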
- the trained machine learning component 124 receives 1200 an unseen raw time-of-flight frame. It inputs 1202 the frame to the trained convolutional neural network. Values associated with individual pixel locations (or neighborhoods) of the frame are input to the plurality of input nodes and this triggers a feed forward process through the network. The values from the frame pass through a layer of the neural network and trigger input to subsequent layers via the overlapping receptive fields. Eventually output nodes are triggered and depth values associated with the triggered output nodes are retrieved from storage. The depth values are then stored as a depth map 1204 optionally with uncertainty data calculated from the neural network outputs. The depth map has smoothed depth values because the receptive fields of the convolutional neural network enable spatial relationships between the pixel locations to be taken into account.
- FIG. 13 illustrates various components of an exemplary computing-based device 1300 which is implemented as any form of a computing and/or electronic device, and in which embodiments of a depth detection apparatus are implemented in cases where the depth detection apparatus is separate from the time-of-flight camera.
- a non-exhaustive list of examples of forms of the computing and/or electronic device is: augmented reality near eye computing system, augmented reality body worn computing system, augmented reality wearable computing device, smart phone, desk top computer, computer game console, touch-less user interface computing device, tablet computer, laptop computer.
- Computing-based device 1300 comprises one or more processors 1302 which are microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to compute depth values or depth maps from raw time-of-flight data.
- the computing-based device computes a stream of depth maps from a stream of frames of raw time-of-flight data (received from time-of-flight camera 1326 ) in real time and in a manner which takes into account multipath interference.
- the processors 1302 include one or more fixed function blocks (also referred to as accelerators) which implement a part of the method of any of FIGS.
- Platform software comprising an operating system 1304 or any other suitable platform software is provided at the computing-based device to enable application software 1306 to be executed on the device.
- a trained machine learning component 1308 is provided such as the trained machine learning component 124 of FIG. 1 .
- a data store 1310 at memory 1316 stores raw time-of-flight data, simulated raw time-of-flight data, parameter values, exposure profile data, 3D environment models and other data.
- Computer-readable media includes, for example, computer storage media such as memory 1316 and communications media.
- Computer storage media, such as memory 1316 , includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or the like.
- Computer storage media includes, but is not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM), electronic erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that is used to store information for access by a computing device.
- communication media embody computer readable instructions, data structures, program modules, or the like in a modulated data signal, such as a carrier wave, or other transport mechanism. As defined herein, computer storage media does not include communication media.
- a computer storage medium should not be interpreted to be a propagating signal per se.
- the computer storage media (memory 1316 ) is, in some examples, distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 1318 ).
- the computing-based device 1300 also comprises an input/output controller 1320 arranged to output display information to a display device 1324 , where used, which is separate from or integral to the computing-based device 1300 .
- the display information optionally graphically presents depth maps computed by the computing-based device.
- the input/output controller 1320 is also arranged to receive and process input from one or more devices, such as time-of-flight camera 1326 or a user input device 1322 (e.g. a stylus, mouse, keyboard, camera, microphone or other sensor).
- the user input device 1322 detects voice input, user gestures or other user actions and provides a natural user interface (NUI). This user input is used to specify 3D environment models, specify parameter values or for other purposes.
- the display device 1324 also acts as the user input device 1322 if it is a touch sensitive display device.
- the input/output controller 1320 outputs data to devices other than the display device in some examples, e.g. a locally connected printing device.
- any of the input/output controller 1320 , display device 1324 and the user input device 1322 comprise, in some examples, NUI technology which enables a user to interact with the computing-based device in a natural manner, free from artificial constraints imposed by input devices such as mice, keyboards, remote controls and the like.
- examples of NUI technology that are provided in some cases include but are not limited to those relying on voice and/or speech recognition, touch and/or stylus recognition (touch sensitive displays), gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence.
- NUI technology examples include intention and goal understanding systems, motion gesture detection systems using depth cameras (such as stereoscopic camera systems, infrared camera systems, red green blue (rgb) camera systems and combinations of these), motion gesture detection using accelerometers/gyroscopes, facial recognition, three dimensional (3D) displays, head, eye and gaze tracking, immersive augmented reality and virtual reality systems and technologies for sensing brain activity using electric field sensing electrodes (electro encephalogram (EEG) and related methods).
- examples include any combination of the following:
- a depth detection apparatus comprising:
- a memory storing raw time-of-flight sensor data received from a time-of-flight sensor
- a trained machine learning component having been trained using training data pairs, a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value;
- the trained machine learning component configured to compute in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values which incorporate simulated multi-path interference.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer which simulates multi-path interference.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values comprising, for an individual pixel, weighted intensity values at different depths potentially depicted by the pixel.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values where information about an exposure profile of the time-of-flight sensor is combined with the simulated raw time-of-flight sensor data values.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values where information about sensor noise of the time-of-flight sensor is combined with the simulated raw time-of-flight sensor data values.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer from a plurality of instances of a parametric 3D environment model, where the instances of the parametric 3D environment model are computer generated automatically at random.
- parameters of the parametric 3D environment model comprise one or more of: geometry of an object in the 3D environment model, position of an object in the 3D environment model, presence of an object in the 3D environment model, orientation of an object in the 3D environment model, surface materials and reflectivity, ambient illumination.
- a training data pair comprises a frame of simulated raw time-of-flight sensor data values and a corresponding simulated ground truth depth map.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer for a plurality of randomly selected viewpoints of the time-of-flight sensor, and where any of the viewpoints which are within a threshold distance of a surface in a 3D environment model used by the computer graphics renderer are omitted.
- the trained machine learning component having been trained using simulated raw time-of-flight sensor data values aggregated over a neighborhood of a pixel, where the neighborhood is a spatial neighborhood, or a temporal neighborhood, or a spatial and temporal neighborhood.
- the trained machine learning component is a pixel independent regressor.
- the trained machine learning component is a regressor which takes into account relationships between pixels of the stored time-of-flight sensor data.
- each training data pair comprises a frame of simulated raw time-of-flight sensor data and a ground truth depth map.
- the trained machine learning component is at least partially implemented using hardware logic selected from any one or more of: a field-programmable gate array, an application-specific integrated circuit, an application-specific standard product, a system-on-a-chip, a complex programmable logic device, a graphics processing unit.
- a depth detection apparatus comprising:
- a memory storing frames of raw time-of-flight sensor data received from a time-of-flight sensor
- a trained machine learning component having been trained using training data pairs, a training data pair comprising a simulated raw time-of-flight sensor frame and a corresponding simulated ground truth depth map;
- the trained machine learning component configured to compute in a single stage, for a frame of the stored raw time-of-flight sensor data, a depth map of surfaces depicted by the frame, by pushing the frame through the trained machine learning component.
- the trained machine learning component is configured to operate in real time by computing the depth maps at a rate which is equivalent to or faster than a frame rate of the time-of-flight sensor.
- the trained machine learning component comprises a convolutional neural network.
- the trained machine learning component comprises a pixel independent regressor which is a regressor that does not take into account relationships between pixels of a time-of-flight sensor frame.
- a computer-implemented method comprising:
- a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value
- operating the trained machine learning component comprises computing, in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- An apparatus comprising:
- a trained machine learning component having been trained using training data pairs, a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value; wherein operating the trained machine learning component comprises computing, in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- the examples illustrated and described herein as well as examples not specifically described herein but within the scope of aspects of the disclosure constitute exemplary means for storing raw time-of-flight sensor data, executing a trained machine learning system, computing depth values or computing depth maps.
- the memory of FIG. 2 or 13 constitutes exemplary means for storing raw time-of-flight sensor data.
- the processor of FIG. 2 or 13 constitutes exemplary means for operating a trained machine learning component.
- the term ‘computer’ or ‘computing-based device’ is used herein to refer to any device with processing capability such that it executes instructions.
- the terms ‘computer’ and ‘computing-based device’ each include personal computers (PCs), servers, mobile telephones (including smart phones), tablet computers, set-top boxes, media players, games consoles, personal digital assistants, wearable computers, and many other devices.
- the methods described herein are performed, in some examples, by software in machine readable form on a tangible storage medium e.g. in the form of a computer program comprising computer program code means adapted to perform all the operations of one or more of the methods described herein when the program is run on a computer and where the computer program is embodied on a computer readable medium.
- tangible storage media include computer storage devices comprising computer-readable media such as disks, thumb drives, memory etc. and do not include propagated signals.
- the software is suitable for execution on a parallel processor or a serial processor such that the method operations are carried out in any suitable order, or simultaneously.
- a remote computer is able to store an example of the process described as software.
- a local or terminal computer is able to access the remote computer and download a part or all of the software to run the program.
- the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network).
- all or a portion of the software instructions may be carried out by a dedicated circuit such as a digital signal processor (DSP), programmable logic array, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Electromagnetism (AREA)
- Optics & Photonics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
A depth detection apparatus is described which has a memory storing raw time-of-flight sensor data received from a time-of-flight sensor. The depth detection apparatus also has a trained machine learning component having been trained using training data pairs. A training data pair comprises at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value. The trained machine learning component is configured to compute in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
Description
- Time-of-flight (TOF) cameras are increasingly used in a variety of applications, for example, human computer interaction, automotive applications, measurement applications and machine vision. A TOF camera can be used to compute depth maps which contain information relating to the depth of an object in a scene from the camera. The depth refers to the projection of distance on an imaginary line that extends from the camera, where the distance is the absolute radial distance. A light source at the TOF camera illuminates the scene and the light is reflected by objects in the scene. The camera receives the reflected light that, dependent on the distance of an object to the camera, experiences a delay. Given the fact that the speed of light is known, a depth map is computable.
- However, the time of flight measurement is subject to a number of errors and uncertainties which lead to errors in the computed depth maps. For example, the reflected light often undergoes multiple reflections from different surfaces within the scene which cause significant errors in the calculated depth.
- The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known TOF cameras or TOF data processing systems.
- The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not intended to identify key features or essential features of the claimed subject matter nor is it intended to be used to limit the scope of the claimed subject matter. Its sole purpose is to present a selection of concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
- A depth detection apparatus is described which has a memory storing raw time-of-flight sensor data received from a time-of-flight sensor. The depth detection apparatus also has a trained machine learning component having been trained using training data pairs. A training data pair comprises at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value. The trained machine learning component is configured to compute in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
- The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
-
FIG. 1 is a schematic diagram of a trained machine learning component deployed with a time-of-flight camera; -
FIG. 2 is a schematic diagram of a time-of-flight camera; -
FIG. 3 is a flow diagram of a method of using a trained machine learning component such as that of FIG. 1; -
FIG. 4 is a graph of empirical results of depth values computed using the arrangement of FIG. 1 and various other arrangements; -
FIG. 5 is a schematic diagram of components used to create a trained machine learning component, such as that of FIG. 1; -
FIG. 6 is a schematic diagram of components used to generate training data pairs such as the training data pairs of FIG. 5; -
FIG. 7 is a graph of data output by a time-of-flight simulator such as that of FIG. 6; -
FIG. 8 is a flow diagram of a method of training a random decision forest using training data pairs such as those of FIG. 6; -
FIG. 9 is a schematic diagram of a plurality of random decision trees; -
FIG. 10 is a flow diagram of a method of using a trained random decision forest at test time, such as the trained machine learning component of FIG. 1; -
FIG. 11 is a flow diagram of a method of training a convolutional neural network using training data pairs such as those of FIG. 6; -
FIG. 12 is a flow diagram of using a trained convolutional neural network; -
FIG. 13 illustrates an exemplary computing-based device in which embodiments of a trained machine learning component for use with a time-of-flight camera are implemented. - Like reference numerals are used to designate like parts in the accompanying drawings.
- The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present examples are constructed or utilized. The description sets forth the functions of the example and the sequence of operations for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
- Time-of-flight cameras output raw sensor data which is then processed to derive depth values. The act of processing the raw sensor data to compute the depth values is time consuming and complex. In addition, the depth values which are computed suffer from inaccuracy due to multi-path interference and noise in the raw sensor data. However, time-of-flight cameras are increasingly used for real time applications and/or where highly accurate depth values are needed, for example in hand tracking, body tracking, 3D scene reconstruction and other applications.
- By using a trained machine learning system as described herein, it is possible to directly derive highly accurate depth values from raw time-of-flight sensor data in real time. This is achieved in a single stage, without the need to compute depth values using conventional, non-machine learning methods. The trained machine learning system takes raw time-of-flight sensor data as input and computes depth values as output, where the depth values already take into account multi-path interference and optionally also take into account sensor noise. This improves the speed with which the depth values are computed, since there is a single stage. For example, there is no need to compute the depth values and then subsequently process the depth values to correct for multi-path interference and/or sensor noise. The result gives a better computing device which is able to control a downstream system using accurate depth values obtained from one or more time-of-flight sensors. Usability from the point of view of the end user is improved since the accurate depth values give a better correspondence with reality such as for hand tracking, body tracking, augmented reality and others.
- The machine learning system has been trained using pairs of simulated raw time-of-flight sensor data frames and corresponding depth maps. The simulated raw time-of-flight sensor data frames are calculated using a modified computer graphics renderer as described in more detail below. The simulated raw time-of-flight sensor data frames are simulated assuming that multi-path interference occurs. Therefore it is possible to learn a mapping from simulated raw time-of-flight sensor data direct to depth values which are already corrected for multi-path interference. There is no need to apply a subsequent stage to correct the depth values for multi-path interference. As a result processing is significantly simplified and speeded up. Because the processing is simpler than two stage processes, the processing is implementable at a dedicated chip, field programmable gate array (FPGA) or similar. This is particularly useful where the processing is to be carried out at a time-of-flight camera itself, or on a resource constrained device such as a wearable or mobile computing device which has an integral time-of-flight camera.
- The machine learning component described herein is found to give highly accurate depth values, especially for situations where depth values are difficult to compute accurately using existing approaches. For example, at corners of rooms, where the floor meets the wall, where the wall meets the ceiling, in the case of highly reflective surfaces such as shiny floors, and others. Accuracy improvements are believed to be due, at least in part, to the fact that the machine learning component has been trained with the particular type of training data.
-
FIG. 1 is a schematic diagram of a depth detection apparatus 100 comprising a memory 122 and a trained machine learning component 124. A time-of-flight camera 104, which is a phase modulation time-of-flight depth camera or a gated time-of-flight depth camera (or another future type of TOF camera), captures a stream of raw sensor data 108 depicting a scene 102. One or more objects in the scene 102 and/or the time-of-flight camera itself are moving in some examples. For example, in the scenario of FIG. 1 the scene comprises a child doing a cartwheel so that there are several moving objects in the scene (the child's limbs). The time-of-flight camera is wall-mounted in the room or in some examples is body worn or head-mounted or mounted on a robot or vehicle. - The stream of
raw sensor data 108 comprises a plurality of frames of raw sensor data which have been captured by the time-of-flight camera. For example, for some types of time-of-flight camera a frame of raw sensor data comprises, for each pixel of the camera sensor, complex numbers which are amplitude and phase measurements of reflected light. For example, for another type of time-of-flight camera, a frame of raw sensor data comprises, for each pixel of the camera sensor, a plurality of intensity values of reflected light sensed at the pixel, for different exposure periods. - The time-of-flight camera uses one or
more measurement patterns 106 also referred to as exposure profiles. A measurement pattern is a set of values of configurable parameters of the time-of-flight camera, to be used when a frame of raw sensor data is captured by the camera. Where different measurement patterns 106 are available, the time-of-flight camera is able to capture different frames using different measurement patterns. - The stream of
raw sensor data 108 is input to a depth detection apparatus 100 comprising a memory 122 and a trained machine learning component 124. The trained machine learning component 124 computes depth maps, or depth values of individual pixels, in a single stage process which takes into account multi-path interference and/or sensor noise so that an accurate depth map stream 110 is output. A depth map comprises a plurality of depth values, each depth value being for an individual pixel of the time-of-flight image sensor. In some examples, depth values of individual pixels are output. In some examples the trained machine learning component 124 also outputs uncertainty data associated with the depth values. The stream 110 of depth values and optional uncertainty data is input to a downstream system 112 such as a scene reconstruction engine 114, a gesture detection system 116, an augmented reality system 118, a touch-less user interface 120 or others. - The
depth detection apparatus 100 operates in real-time in some examples. In some cases the depth detection apparatus 100 is integral with the time-of-flight camera 104. In some cases the depth detection apparatus 100 is in a computing device such as a smart phone, tablet computer, head worn augmented reality computing device, or other computing device which has a time-of-flight camera. The memory 122 holds raw time-of-flight sensor data from the stream 108 and makes this available to the trained machine learning component 124 for processing. The trained machine learning component 124 has been trained using pairs of simulated raw time-of-flight sensor data frames and corresponding depth maps. The simulated raw time-of-flight sensor data frames are simulated assuming that multi-path interference occurs. In some examples the trained machine learning component 124 has been trained using pairs of raw time-of-flight sensor data values associated with individual sensor pixels and corresponding depth values. - In some examples, the trained
machine learning component 124 comprises a trained regressor such as a random decision forest, directed acyclic graph, support vector machine, neural network, or other trained regressor. The trained regressor is a pixel independent trained regressor in some examples, in that it is trained using pairs comprising individual pixels and associated individual depth values, and where dependencies between the pairs are not taken into account. In other examples, the trained regressor does take dependencies between individual pixels into account. An example of a trained regressor which does take dependencies between individual pixels into account is a convolutional neural network. An example in which the trained regressor is a pixel independent regressor is a random decision forest which is given below with reference to FIGS. 8 to 10. An example in which the trained regressor is a convolutional neural network taking into account dependencies between pixels is given with respect to FIGS. 11 to 12 below. -
FIG. 2 is a schematic diagram of a time-of-flight depth camera 200 which is a phase modulation time-of-flight depth camera or a gated time-of-flight depth camera or any other future type of time-of-flight depth camera. The time-of-flight camera 200 comprises a source of transmitted light 202. In an example the source of transmitted light is an incoherent light source. In another example the source of transmitted light is a coherent light source. An example of an appropriate light source is a near infra-red laser or light emitting diode (LED); however, another appropriate light source may be used. In the case of a phase modulated time-of-flight camera the transmitted light is modulated at a modulation frequency. In an example the modulation frequency may be a radio frequency (RF) in the range kHz-GHz (kilohertz to gigahertz), for example the modulation frequency may be in the MHz (megahertz) range. In the case of a gated time-of-flight camera the transmitted light is pulsed where the pulses may be of picosecond to nanosecond duration. - A time-of-flight depth camera comprises an
image sensor 204 that receives light reflected from objects within the scene. The image sensor 204 comprises a charge-coupled device (CCD) sensor, a complementary metal-oxide-semiconductor (CMOS) sensor, for example a Photonic Mixer Device (PMD) sensor or other appropriate sensor which is arranged to detect light reflected from objects, people and surfaces within the camera range. In the case of a gated time-of-flight camera the image sensor 204 has a resolution compatible with the duration of the pulses emitted by the light source. - The camera comprises an
optical system 206 that is arranged to gather and focus reflected light from the environment on to the image sensor 204. In an example the optical system comprises an optical band pass filter, which allows only light of the same wavelength as the light source to be received by the sensor. The use of an optical band pass filter helps to suppress background light. The camera comprises driver electronics 208 which control both the light source and an image sensor, for example, to enable highly accurate phase difference measurements to be made or to enable a train of light pulses to be emitted and for the image sensor to be “shuttered” on and off. An image sensor may be shuttered on and off electronically rather than with physical shutters. - In one example the camera comprises a
processor 208 and a memory 210 which stores raw time-of-flight data, depth maps and other data. A trained machine learning component 214 is available at the camera 212 in some examples and in other examples this trained machine learning component is at another computing device which receives and processes the raw sensor data from the camera. Where the trained machine learning component 214 is at the camera 212 it comprises software stored at memory 210 and executed at processor 208 in some cases. In some examples the trained machine learning component 214 is an FPGA or a dedicated chip. For example, the functionality of the trained machine learning component 214 is implemented, in whole or in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), Graphics Processing Units (GPUs). - The trained
machine learning component 214 is arranged to execute the methods described herein with respect to FIGS. 3, 10, 12 in order to compute depth in real time, using a single stage, from a stream of raw time-of-flight data in a manner which allows for multi-path interference. This is achieved without the need to compute corrections to the depth values and apply those subsequent to the depth values having been computed. -
FIG. 3 is a flow diagram of a method at the depth detection apparatus 100. Raw sensor data is received from the time-of-flight camera 104 and stored at memory 122. A check 302 is made as to whether the process is to compute neighborhoods or not. The decision is made according to user input, or settings configured by an operator during manufacturing. In some examples, the depth detection apparatus automatically decides whether neighborhoods are to be computed according to the nature of the captured sensor data or other sensor data, such as sensors which detect motion of the time-of-flight camera. - In the case that neighborhoods are not computed the depth
detection apparatus inputs 306 the raw sensor data 300 to the trained machine learning component. The input process comprises inputting raw sensor data associated with individual pixels and/or whole frames of raw sensor data. As a result, the depth detection apparatus receives 308, from the trained machine learning component, depth value(s) and optionally also uncertainty data. The depth detection apparatus outputs the depth values in real time 310 together with the uncertainty data in some cases. By real time, it is meant that the rate of the received raw sensor data 300 is at least matched by the output rate of the output depth values at operation 310. The process of FIG. 3 repeats as more raw sensor data is received 300. - In the case that neighborhoods are computed the depth detection apparatus aggregates 304 raw sensor data values of pixels in a neighborhood of the pixel under current consideration. The neighborhood is either a spatial neighborhood or a temporal neighborhood or a combination of a spatial and temporal neighborhood.
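The aggregation over a neighborhood can be sketched as follows. The description does not fix the aggregation function, so the use of a mean, the square window, and the function names are assumptions made only for illustration. A frame is represented here as a nested list in which frame[r][c] holds the raw values (for example one per exposure) of the pixel at row r and column c.

```python
def aggregate_spatial(frame, row, col, radius=1):
    """Mean raw response over a square window centred on (row, col).

    The window is clipped at the image border. Using a mean is an assumption.
    """
    height, width = len(frame), len(frame[0])
    n_values = len(frame[row][col])
    sums = [0.0] * n_values
    count = 0
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            for k in range(n_values):
                sums[k] += frame[r][c][k]
            count += 1
    return [s / count for s in sums]

def aggregate_temporal(frames, row, col):
    """Mean raw response at one pixel over a list of consecutive frames."""
    n_values = len(frames[0][row][col])
    return [sum(f[row][col][k] for f in frames) / len(frames)
            for k in range(n_values)]
```

Whatever aggregation is chosen, the same one has to be applied when the training data is prepared, so that the input at test time matches the input seen during training.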
- The aggregated raw sensor data values are input to the trained machine learning component at
operation 306. In this case the trained machine learning component has been trained using training data which has been aggregated in the same manner. In the case that temporal neighborhoods are used, the training data allows for motion between the camera and the scene. For example, as a result of objects in the scene moving and/or as a result of the camera moving. This is achieved by using a model of motion between the camera and the scene. - The machine learning system outputs depth values associated with the neighborhoods at
operation 308 and the process proceeds with operation 310 and returns to operation 300 as described above. - The table below has empirical data demonstrating how the
depth detection apparatus 100 of FIG. 1 has improved accuracy as compared with an alternative method using a probabilistic generative model of time-of-flight on which inference is possible. To obtain the data for the table below and the graph of FIG. 4, the alternative method used the same conditions, such as number of exposures per frame, as the present method. The results for the alternative method are in the column labeled “generative” in the table below. A probabilistic generative model of time-of-flight is a description, expressed using likelihoods, of how raw time-of-flight data is generated by a time-of-flight camera under specified imaging conditions comprising reflectivity of a surface generating reflected light received at the camera (also referred to as albedo), illumination of the surface, and depth of the surface from the camera. Inference is possible on the probabilistic generative model so that given known imaging conditions it is possible to infer corresponding raw sensor data and vice versa. The probabilistic generative model takes into account a single path for reflected light for each pixel. In the alternative approach raw time-of-flight sensor data is used to compute depth values from the probabilistic generative model, or from an approximation of the probabilistic generative model. - It can be seen from the table below that the present approach has a slightly lower median error but that the biggest improvement is with respect to the largest errors. There is a 44% reduction in the number of depth value errors where the depth error is more than 5 centimetres.
-
| | Generative approach | Approach of FIG. 1 |
|---|---|---|
| Median error in millimetres | 17 | 16 |
| 90% quantile error in millimetres | 96 | 63 |
| % reduction of errors above 50 millimetres | baseline | 44 |
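The three statistics reported in the table can be computed from per-pixel depth errors as in the following sketch. The nearest-rank quantile definition and the function names are assumptions; the description does not state how the quantiles were computed.

```python
import math

def quantile(errors, q):
    """Nearest-rank quantile of a non-empty list of errors, for 0 < q <= 1."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def fraction_above(errors, threshold):
    """Fraction of errors strictly larger than the threshold."""
    return sum(1 for e in errors if e > threshold) / len(errors)

def percent_reduction(baseline_errors, new_errors, threshold):
    """Percentage reduction in the fraction of errors above the threshold."""
    base = fraction_above(baseline_errors, threshold)
    new = fraction_above(new_errors, threshold)
    return 100.0 * (base - new) / base
```

A cumulative distribution plot of the errors is obtained by evaluating 1 - fraction_above(errors, t) for a range of thresholds t.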
FIG. 4 is a cumulative distribution plot of the empirical data of the above table and additional empirical data. The graph of FIG. 4 plots the probability of the depth error being less than a threshold, against the size of the threshold in millimetres. Line 400 indicates the data for the approach using a generative model described above and lines -
FIG. 5 is a schematic diagram of how to create a trained machine learning component such as component 124 of FIG. 1. Millions of training data pairs (or more) are stored at a database 500 or other store. In some examples, a training data pair comprises a simulated raw time-of-flight sensor data frame and a corresponding ground truth depth map. In some examples a training data pair comprises a plurality of raw sensor data values associated with a pixel of the sensor, and a ground truth depth value associated with the pixel. In some examples a training data pair comprises a plurality of aggregated raw sensor data values, aggregated over a spatial or temporal neighborhood around a pixel of the sensor, and a ground truth depth value associated with the pixel. The raw sensor data values are obtained from a time-of-flight simulator which simulates multi-path interference as part of the simulation. The time-of-flight simulator comprises a renderer 606 and a viewpoint selector 604 and an example is described in more detail with reference to FIG. 6. The time-of-flight simulator is relatively complex and it is not possible to carry out inference on the time-of-flight simulator as it is for the generative model of time-of-flight mentioned above. - A
trainer 504 accesses the training data pairs 500 and uses them to train and produce a trained machine learning component 506 such as a random decision forest, a convolutional neural network, a support vector machine or other trained regressor. The resulting trained machine learning component 506 may then be used as described above with respect to FIGS. 1 and 3. The type of training data used to train the machine learning component corresponds with the type of data input to the machine learning component at test time. Test time is the time in which the machine learning component is operational to compute depth values from previously unseen raw sensor data. By using a wide variety of training data examples, performance of the trained machine learning system is improved both in terms of accuracy and in terms of ability to generalize to examples which are different from the training examples. However, it is difficult to obtain appropriate training data. Ways in which a good variety of training data is obtained are now described with reference to FIG. 6. -
FIG. 6 is a schematic diagram of a time-of-flight simulator 602 and other components used to create training data pairs 626 such as the training data pairs described above with reference to FIG. 5. - The time-of-
flight simulator 602 comprises a renderer 606 such as a computer graphics renderer which uses ray-tracing to render an image from a model of a 3D object or environment. The renderer 606 is a physically-accurate renderer which produces realistic rendered images by using physical modeling of light scattering, light transport simulation, and integration of paths of light at every pixel. The renderer 606 records, for each pixel, an intensity weight and a path length (the length of the path of simulated light from the simulated emitter of the TOF camera to the simulated surface(s) in the world and back to the simulated TOF sensor) for each of a plurality N of light path samples. The number of light path samples is the same for each pixel and is fixed in advance in some cases, such as a few thousand light path samples. In other examples, the number of light path samples is selected adaptively during simulation, for example, so that more complex areas in the scene are allocated more simulated light paths compared to simpler areas. More complex areas are identified in various ways such as according to the presence of corners, the presence of edges, the degree of surface reflectivity, or other factors. This gives per-pixel weighted point masses 608. An example of a per-pixel weighted point mass is given in FIG. 7 for a pixel depicting a surface in a corner of a room. The pixel in the example of FIG. 7 receives light from multiple paths due to multi-path interference and so there are multiple peaks training data pair 624 in some examples. - As mentioned above the
renderer 606 uses ray-tracing to render an image from a model of a 3D object or environment. It is time consuming and expensive to generate suitable models of 3D objects or environments. For example, where the time-of-flight camera is to be used indoors, the models are of typical indoor environments such as living rooms, offices, kitchens and other indoor environments. However, it is difficult to obtain a good range and variety of models of such 3D environments. In order to address this, the present technology uses a plurality of parametric 3D environment models 610. A parametric 3D environment model 610 is a computer manipulable description of a 3D environment expressed using one or more parameters. An instance generator 612 accesses a parametric 3D environment model from a store of parametric 3D environment models 610 and creates a plurality of instances. An instance is a 3D environment model created from a parametric 3D environment model by selecting values of the parameters of the parametric 3D environment model. The instances are created by selecting values of the parameters at random and/or within a specified range of possible values of the parameters according to knowledge of feasible parameter value ranges. A non-exhaustive list of examples of parameters of a parametric 3D environment model is: geometry of individual objects in the 3D model, presence or absence of individual objects (including light sources), object location, object orientation, surface materials and textures, amount of ambient illumination. Using parametric models in this way enables a huge number of variations of 3D environment model to be generated in a fast, efficient manner. For example, in the case of a parametric 3D model of a living room, values of parameters can be adjusted to vary surface reflectivity of the flooring material, ceiling, walls, furniture, and also to vary geometry and/or position of objects in the room such as furniture, light fittings, windows and other objects.
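The creation of instances from a parametric model can be sketched as random sampling within feasible ranges. The parameter names, the ranges, the 50% presence probability and the dictionary layout below are hypothetical and serve only to illustrate the idea; a real parametric 3D environment model would also carry geometry.

```python
import random

# Hypothetical parametric model of a living room: each numeric parameter has a
# feasible (low, high) range, and some objects may be present or absent.
LIVING_ROOM = {
    "ranges": {
        "floor_reflectivity": (0.05, 0.95),
        "wall_reflectivity": (0.20, 0.90),
        "ambient_illumination": (0.0, 1.0),
        "sofa_x_m": (0.5, 4.0),
    },
    "optional_objects": ["floor_lamp", "coffee_table"],
}

def generate_instance(model, rng):
    """Select one value per parameter at random within its feasible range."""
    instance = {name: rng.uniform(lo, hi)
                for name, (lo, hi) in model["ranges"].items()}
    # Presence or absence of individual objects, assumed equally likely here.
    instance["present"] = [obj for obj in model["optional_objects"]
                           if rng.random() < 0.5]
    return instance

def generate_instances(model, count, seed=0):
    """Create `count` instances from one parametric model, reproducibly."""
    rng = random.Random(seed)
    return [generate_instance(model, rng) for _ in range(count)]
```

Seeding the generator makes a set of instances reproducible, which is convenient when training runs are to be compared.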
As a result of the wide range and large number of 3D environment model instances the time-of-flight simulator 602 is able to render a good variety of simulated raw time-of-flight data which incorporates multi-path interference. This gives improved quality training data pairs 626 and as a consequence, the trained machine learning component gives better quality depth values and uncertainty information. As a result there is a depth detection apparatus giving highly accurate depth values enabling better control by downstream computing systems. - The
renderer 606 renders an image from a model of a 3D object or environment given a camera viewpoint. A camera viewpoint is a 3D position and orientation within a bounding box of the 3D environment model instance. The renderer 606 uses details of optical properties 600 of the time-of-flight camera such as the field of view of the camera, the focal length, and the spatial light emission intensity profile. The time-of-flight simulator has a viewpoint selector 604 which selects a large number of possible viewpoints of the camera within the instance of the 3D environment model. For example, the viewpoint selector 604 selects the viewpoints at random by choosing random 3D positions and orientations within a bounding box of the 3D environment model. The viewpoint selector 604 is arranged to reject viewpoints which are within a threshold distance of objects in the 3D environment model. For example, to reject viewpoints which face a wall of the 3D environment with only 20 centimetres between the camera viewpoint and the wall. For a given 3D environment model instance, the renderer computes simulated raw time-of-flight data for each of a plurality of viewpoints selected by the viewpoint selector 604. For example, thousands of different viewpoints. As a result the training data pairs 624 exhibit good variety and the resulting trained machine learning component 124 is able to generalize well to unseen 3D environments with unseen camera viewpoints. - As mentioned above, the time-of-
flight simulator 602 outputs per-pixel weighted point masses 608. These do not take into account the exposure profile of any particular time-of-flight camera. This means that the time-of-flight simulator can be used for any type of time-of-flight camera. The per-pixel weighted point masses 608 are input to an exposure profile combiner 616 which incorporates information about a specified exposure profile of a time-of-flight camera into the raw time-of-flight data being simulated. The exposure profile is specified, by an operator during manufacturing, by selecting the exposure profile from a library of exposure profile details, or using user input. For example, the exposure profile is described using vector constant A and vector-valued function C. A vector-valued function is a function which takes a scalar argument and returns a vector. In an example, the exposure profile combiner 616 combines the per-pixel weighted point masses (values of weight w and path length t) with the vector constant A and the vector-valued function C using the following equation: -
- $$\vec{\mu} = \tau \vec{A} + \sum_{i=1}^{N} \omega_i \, d(t_i) \, \vec{C}(t_i)$$
- The above equation is expressed in words as a mean response vector {right arrow over (μ)} of simulated raw time-of-flight sensor intensity values (such as four intensity values one for each of four exposures) simulated as being observed at the same pixel of the sensor, is equal to the ambient light intensity τ times a vector constant {right arrow over (A)} which represents part of the exposure pattern of the time of flight camera, plus the sum over the number of light path samples N, of a vector-valued function {right arrow over (C)} evaluated at ti, which represents another part of the exposure pattern of the time-of-flight camera times the weight {right arrow over (ω)}i which is the point mass weight from the time-of-flight simulator output for the light path sample i, and taking into account a distance decay function d(ti) where intensity falls away with distance from the camera of the surface which reflects the light.
- In some examples, sensor noise is simulated. That is, the output of the exposure profile combiner is processed by a
noise addition component 620 which adds noise to the simulated raw time-of-flight data. However it is not essential to use the noise addition component 620. - The output of the
noise addition component 620 is simulated raw intensity values 622 associated with a pixel, and which incorporate multi-path interference and sensor noise. This data is formed into a training data pair 624 by accessing the corresponding ground truth depth value (which is the true depth of the surface depicted by the pixel). The corresponding ground truth depth value is known either by computing it from the 3D environment instance 614 or by taking the first peak of the per-pixel weighted point mass. Given a 3D environment model instance the depth detection apparatus computes ground truth depth values 318 for a given camera viewpoint. - The process described above for computing a training data pair is repeated to obtain millions of training data pairs 626 which are stored. In some cases a training data pair comprises a frame of simulated raw time-of-flight sensor data and a corresponding ground truth depth map. This is achieved by repeating the process for individual pixels of the sensor to form a frame.
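Forming one training data pair for a pixel can be sketched as below. Several points are assumptions made for illustration only: the noise is taken to be additive Gaussian, the first peak is approximated by the shortest sampled path length, and the ground truth is taken as half of that round-trip path length. None of these choices is stated in the description.

```python
import random

def add_sensor_noise(intensities, sigma, rng):
    """Add zero-mean Gaussian noise to each value (noise model is an assumption)."""
    return [v + rng.gauss(0.0, sigma) for v in intensities]

def first_peak_depth(samples, min_weight=0.0):
    """Approximate ground truth from the earliest light path sample.

    samples is a list of (weight, round-trip path length) pairs. Halving the
    shortest path length assumes a co-located emitter and sensor.
    """
    shortest = min(t for w, t in samples if w > min_weight)
    return shortest / 2.0

def make_training_pair(mean_intensities, samples, sigma=1.0, seed=0):
    """Return (noisy raw intensity values, ground truth value) for one pixel."""
    rng = random.Random(seed)
    noisy = add_sensor_noise(mean_intensities, sigma, rng)
    return noisy, first_peak_depth(samples)
```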
- In some examples, the machine learning system comprises a random decision forest. A random decision forest comprises one or more decision trees each having a root node, a plurality of split nodes and a plurality of leaf nodes. Raw TOF sensor data is pushed through trees of a random decision forest from the root to a leaf node in a process whereby a decision is made at each split node. The decision is made according to values of parameters at the split nodes, where the values of the parameters have been learnt during training. At a split node the raw TOF sensor data proceeds to the next level of the tree down a branch chosen according to the results of the decision.
- During training, parameter values (which specify decision criteria to be used at the split nodes) are learnt for use at the split nodes and data (Raw TOF sensor data with ground truth depth values) is accumulated at the leaf nodes. The training data accumulated at a leaf node during training is stored as a histogram, or in an aggregated manner, such as using a mean, median or mode or by fitting a probability distribution to the histogram and storing statistics describing the probability distribution.
- At test time previously unseen raw TOF sensor data is input to the system to have one or more depth values predicted. This is described with reference to FIG. 10.
- Referring to FIG. 8, to train the decision trees, the training set described above is first received 800. The number of decision trees to be used in a random decision forest is selected 802. A random decision forest is a collection of deterministic decision trees. Decision trees can be used in classification or regression algorithms, but can suffer from over-fitting, i.e. poor generalization. However, an ensemble of many randomly trained decision trees (a random forest) yields improved generalization. During the training process, the number of trees is fixed.
- An example random decision forest is illustrated in FIG. 9. The illustrative decision forest of FIG. 9 comprises three decision trees: a first tree 900; a second tree 902; and a third tree 904. Each decision tree comprises a root node (e.g. root node 906 of the first decision tree 900), a plurality of internal nodes, called split nodes (e.g. split node 908 of the first decision tree 900), and a plurality of leaf nodes (e.g. leaf node 910 of the first decision tree 900).
- A decision tree from the decision forest is selected 804 (e.g. the first decision tree 900) and the root node 906 is selected 806. A random set of test parameter values is then generated 810 for use by a binary test performed at the root node. The parameters are thresholds or other parameters of a binary test. In the case that neighborhoods of pixels are used, the binary test optionally comprises pairwise tests comparing pairs of pixels. In the pixel-independent case pairwise tests are not essential.
- Then, every combination of test parameter values is applied 812 to each raw TOF training data item which has reached the current node. For each combination, criteria (also referred to as objectives) are calculated 814. In an example, the calculated criteria comprise the information gain (also known as the relative entropy). The combination of parameters that optimizes the criteria (such as maximizing the information gain) is selected 814 and stored at the current node for future use. As an alternative to information gain, other criteria can be used, such as the residual variance criterion or others.
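- The selection of test parameters at a node can be sketched as below. This sketch makes several assumptions that are not taken from the text: the binary test is a simple threshold on one raw intensity value, the candidate parameters are drawn uniformly at random, and the criterion is variance reduction (in the spirit of the residual variance criterion mentioned above) rather than information gain. All function names are hypothetical.

```python
import random

def variance(values):
    """Population variance of a list (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_reduction(items, feature, threshold):
    """Drop in depth variance when items are split by x[feature] < threshold.

    items is a list of (raw_values, ground_truth_depth) pairs.
    """
    left = [y for x, y in items if x[feature] < threshold]
    right = [y for x, y in items if x[feature] >= threshold]
    n = len(items)
    return (variance([y for _, y in items])
            - (len(left) / n) * variance(left)
            - (len(right) / n) * variance(right))

def choose_split(items, num_candidates, rng):
    """Try random (feature, threshold) pairs and keep the best-scoring one."""
    dim = len(items[0][0])
    best = None
    for _ in range(num_candidates):
        feature = rng.randrange(dim)
        lo = min(x[feature] for x, _ in items)
        hi = max(x[feature] for x, _ in items)
        threshold = rng.uniform(lo, hi)
        score = variance_reduction(items, feature, threshold)
        if best is None or score > best[0]:
            best = (score, feature, threshold)
    return best

# Toy data: one raw value per item; depth jumps from 1.0 to 3.0.
items = [([0.0], 1.0), ([0.1], 1.0), ([0.9], 3.0), ([1.0], 3.0)]
score, feature, threshold = choose_split(items, 20, random.Random(1))
```

Because only a random subset of candidate parameters is evaluated at each node, two trees trained on the same data will generally store different tests, which is the source of the diversity of the forest.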
- It is then determined 816 whether the value for the calculated criteria is less than (or greater than) a threshold. If the value for the calculated criteria is less than the threshold, then this indicates that further expansion of the tree does not provide significant benefit. This gives rise to asymmetrical trees which naturally stop growing when no further nodes are beneficial. In such cases, the current node is set 818 as a leaf node. Similarly, the current depth of the tree is determined (i.e. how many levels of nodes are between the root node and the current node). If this is greater than a predefined maximum value, then the current node is set 818 as a leaf node. Each leaf node has labeled raw TOF data which accumulate at that leaf node during the training process as described below.
- It is also possible to use another stopping criterion in combination with those already mentioned. For example, the number of raw TOF data items that reach the node is assessed. If there are too few examples (compared with a threshold for example) then the process is arranged to stop to avoid overfitting. However, it is not essential to use this stopping criterion.
- If the value for the calculated criteria is greater than or equal to the threshold, and the tree depth is less than the maximum value, then the current node is set 820 as a split node. As the current node is a split node, it has child nodes, and the process then moves to training these child nodes. Each child node is trained using a subset of the training time-of-flight data at the current node. The subset of training time-of-flight data sent to a child node is determined using the parameters that optimized the criteria. These parameters are used in the binary test, and the binary test is performed 822 on all training time-of-flight data at the current node. The raw TOF data items that pass the binary test form a first subset sent to a first child node, and the raw TOF data items that fail the binary test form a second subset sent to a second child node.
- For each of the child nodes, the process as outlined in blocks 810 to 822 of FIG. 8 is recursively executed 824 for the subset of raw TOF data items directed to the respective child node. In other words, for each child node, new random test parameters are generated 810 and applied 812 to the respective subset of raw TOF data items, the parameters optimizing the criteria are selected 814, and the type of node (split or leaf) is determined 816. If it is a leaf node, then the current branch of recursion ceases. If it is a split node, binary tests are performed 822 to determine further subsets of raw TOF data items and another branch of recursion starts. Therefore, this process recursively moves through the tree, training each node until leaf nodes are reached at each branch. As leaf nodes are reached, the process waits 826 until the nodes in all branches have been trained. Note that, in other examples, the same functionality can be attained using alternative techniques to recursion.
- Once all the nodes in the tree have been trained to determine the parameters for the binary test optimizing the criteria at each split node, and leaf nodes have been selected to terminate each branch, then raw TOF data items with ground truth depth values are accumulated 828 at the leaf nodes of the tree. A representation of the accumulated depth values is stored 830 using various different methods.
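- The recursive training procedure and its stopping criteria can be sketched as follows. This is a simplified illustration, not the procedure of FIG. 8 itself: the dictionary-based tree layout, the parameter names and defaults, and the use of a reduction in squared error as the criterion are all assumptions, and the leaf statistic (a mean depth) is stored as soon as a leaf is created rather than in a separate accumulation pass.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def train_node(items, depth, rng, max_depth=8, min_items=4,
               min_gain=1e-6, candidates=10):
    """Recursively train a node on (raw_values, ground_truth_depth) items."""
    depths = [y for _, y in items]
    leaf = {"leaf": True, "mean_depth": mean(depths), "count": len(items)}
    # Stopping criteria: maximum tree depth reached or too few items.
    if depth >= max_depth or len(items) < min_items:
        return leaf
    best = None
    for _ in range(candidates):
        # Random test parameters: a feature index and a threshold.
        k = rng.randrange(len(items[0][0]))
        thr = rng.choice(items)[0][k]
        left = [it for it in items if it[0][k] < thr]     # pass the test
        right = [it for it in items if it[0][k] >= thr]   # fail the test
        gain = sse(depths) - sse([y for _, y in left]) - sse([y for _, y in right])
        if left and right and (best is None or gain > best[0]):
            best = (gain, k, thr, left, right)
    # Stopping criterion: the best test does not improve the criterion enough.
    if best is None or best[0] < min_gain:
        return leaf
    _, k, thr, left, right = best
    args = (rng, max_depth, min_items, min_gain, candidates)
    return {"leaf": False, "feature": k, "threshold": thr,
            "left": train_node(left, depth + 1, *args),
            "right": train_node(right, depth + 1, *args)}

# Toy training set: one raw value per item, two depth levels.
items = [([i / 10.0], 1.0 if i < 5 else 3.0) for i in range(10)]
tree = train_node(items, 0, random.Random(0))
```

A forest would be obtained by calling `train_node` once per tree with differently seeded random generators. An explicit stack or queue of pending nodes could replace the recursion, in line with the note above that alternatives to recursion are possible.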
- Once the accumulated depth values have been stored it is determined 832 whether more trees are present in the decision forest. If so, then the next tree in the decision forest is selected, and the process repeats. If all the trees in the forest have been trained, and no others remain, then the training process is complete and the process terminates 834.
- Therefore, as a result of the training process, one or more decision trees are trained using synthetic raw TOF data. Each tree comprises a plurality of split nodes storing optimized test parameters, and leaf nodes storing associated ground truth depth values. Due to the random generation of parameters from a limited subset used at each node, the trees of the forest are distinct (i.e. different) from each other.
- The training process is performed in advance of using the trained machine learning system to compute depth values of observed raw TOF data. The decision forest and the optimized test parameters are stored on a storage device for use in computing depth values at a later time.
- FIG. 10 illustrates a flowchart of a process for predicting depth values from previously unseen raw TOF data using a decision forest that has been trained as described above. Firstly, an unseen raw TOF data item is received 1000. A raw TOF data item is referred to as ‘unseen’ to distinguish it from a training TOF data item which has the depth value specified.
- Optionally neighborhoods are computed 1002 from the unseen raw TOF data. The neighborhoods are spatial and/or temporal neighborhoods as described above.
- A trained decision tree from the decision forest is selected 1004. The selected raw TOF data item (whole frame, values for an individual pixel, values for a neighborhood) is pushed 1006 through the selected decision tree such that it is tested against the trained parameter values at a node, and then passed to the appropriate child in dependence on the outcome of the test, and the process is repeated until the raw TOF data item reaches a leaf node. Once the raw TOF data item reaches a leaf node, the accumulated depth values associated with this leaf node are stored 1008 for this raw TOF data item.
- If it is determined 1010 that there are more decision trees in the forest, then a new decision tree is selected 1004, the raw TOF data item is pushed 1006 through the tree and the accumulated depth values are stored 1008. This is repeated until it has been performed for all the decision trees in the forest. Note that the process for pushing a raw TOF data item through the plurality of trees in the decision forest can also be performed in parallel, instead of in sequence as shown in FIG. 10.
- The data from the indexed leaf nodes is aggregated 1014 by averaging or in other ways. For example, where histograms of depth values are stored at the leaf nodes the histograms from the indexed leaf nodes are combined and used to identify one or more depth values associated with the raw TOF data item. The process outputs 1016 at least one depth value as a result, and is able to output a confidence weighting of the depth value. This helps any subsequent algorithm assess whether the proposal is good or not. More than one depth value may be output; for example, where there is uncertainty.
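- The test-time process of FIG. 10 can be sketched as below. The nested-dictionary tree layout, the threshold-style binary test and the stored `mean_depth` leaf statistic are assumptions made for illustration; a leaf could equally hold a histogram. The spread of the per-tree predictions is used here as a crude stand-in for the confidence weighting mentioned above.

```python
def push_through_tree(node, item):
    """Walk from the root to a leaf, applying the binary test at each split node."""
    while not node["leaf"]:
        passed = item[node["feature"]] < node["threshold"]
        node = node["left"] if passed else node["right"]
    return node["mean_depth"]

def predict_depth(forest, item):
    """Average the leaf depths over all trees; also return their spread."""
    votes = [push_through_tree(tree, item) for tree in forest]
    depth = sum(votes) / len(votes)
    spread = (sum((v - depth) ** 2 for v in votes) / len(votes)) ** 0.5
    return depth, spread

# A hypothetical two-tree forest over four raw intensity values per pixel.
forest = [
    {"leaf": False, "feature": 0, "threshold": 0.5,
     "left": {"leaf": True, "mean_depth": 1.0},
     "right": {"leaf": True, "mean_depth": 3.0}},
    {"leaf": False, "feature": 1, "threshold": 0.2,
     "left": {"leaf": True, "mean_depth": 1.2},
     "right": {"leaf": True, "mean_depth": 2.8}},
]
depth, spread = predict_depth(forest, [0.3, 0.1, 0.0, 0.0])
```

Since each tree is traversed independently, the list comprehension in `predict_depth` could be replaced by a parallel map without changing the result.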
- The random decision forest example described above is modified in some cases by implementing the random decision forest as a directed acyclic graph in order to reduce the number of nodes of the graph. This facilitates deployment of the machine learning component on resource constrained devices such as smart phones, tablet computers and wearable computing devices.
- FIG. 11 is a flow diagram of a method of training a convolutional neural network. In this case a training data pair comprises a raw TOF frame and a corresponding depth map. Individual pixel locations of the TOF frame have one or more intensity values, for different exposures for example. The training data pair is accessed 1100 and input to the convolutional neural network 1102.
- A neural network is a plurality of weighted nodes which are interconnected by edges which may also be weighted. The neural network has input nodes, output nodes and internal nodes. In the present examples the output nodes are associated with depth values learnt during a training phase.
- A convolutional neural network is a neural network where the nodes are arranged in multiple layers so that there are nodes in three dimensions, width, height and depth. Within each layer there are multiple receptive fields where a receptive field is a group of interconnected nodes which processes a portion of an input image (or TOF frame in the present examples). Within a layer the receptive fields are arranged so that their outputs partially overlap one another to give redundancy. A node of an internal layer is connected to neurons of one receptive field in the layer above. A convolutional neural network is typically a feed-forward neural network in which an input image (or TOF frame) is fed into input nodes, processed forwards through the network according to weights at the nodes, weighted connections between the nodes, and non-linear activation functions, and reaching a set of one or more output nodes.
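- To make the notion of overlapping receptive fields concrete, a single convolutional layer can be sketched in plain Python as below. This is an illustrative forward pass only: the function names, the stride of one, the absence of padding and the ReLU activation are assumptions, and, as in most deep learning libraries, the operation is strictly a cross-correlation.

```python
def relu(x):
    """Non-linear activation function (assumed choice)."""
    return x if x > 0.0 else 0.0

def conv_layer(frame, kernel, bias=0.0):
    """Apply one convolutional filter to a multi-channel frame.

    frame:  [channels][height][width], e.g. one channel per exposure.
    kernel: [channels][kh][kw] weights shared by every receptive field.
    Returns a [height - kh + 1][width - kw + 1] feature map.
    """
    channels = len(frame)
    height, width = len(frame[0]), len(frame[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out = []
    for r in range(height - kh + 1):
        row = []
        for c in range(width - kw + 1):
            # The receptive field of this output node is the kh x kw window
            # at (r, c); neighboring windows overlap because the stride is 1.
            total = bias
            for ch in range(channels):
                for i in range(kh):
                    for j in range(kw):
                        total += kernel[ch][i][j] * frame[ch][r + i][c + j]
            row.append(relu(total))
        out.append(row)
    return out

# A 1-channel 3x3 frame and a 2x2 averaging kernel give a 2x2 feature map.
frame = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]
kernel = [[[0.25, 0.25], [0.25, 0.25]]]
feature_map = conv_layer(frame, kernel)
```

Stacking several such layers, with a final layer producing one value per pixel location, would give a network that maps a raw TOF frame to a depth map while taking neighboring pixels into account.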
- During training the training data instance is fed forwards through the network, from the input nodes to the output nodes, with computations performed at the nodes which update 1104 the weights of the nodes and edges according to update rules. The update process is repeated for more training instances according to a check for convergence at check point 1106 of FIG. 11, such as to see if the amount of change from the most recent update was smaller than a threshold. When convergence is reached the training ends 1108.
- During test time, when the trained convolutional neural network is used to predict depth maps from raw time-of-flight data frames, the trained machine learning component 124 receives 1200 an unseen raw time-of-flight frame. It inputs 1202 the frame to the trained convolutional neural network. Values associated with individual pixel locations (or neighborhoods) of the frame are input to the plurality of input nodes and this triggers a feed forward process through the network. The values from the frame pass through a layer of the neural network and trigger input to subsequent layers via the overlapping receptive fields. Eventually output nodes are triggered and depth values associated with the triggered output nodes are retrieved from storage. The depth values are then stored as a depth map 1204 optionally with uncertainty data calculated from the neural network outputs. The depth map has smoothed depth values because the receptive fields of the convolutional neural network enable spatial relationships between the pixel locations to be taken into account.
-
FIG. 13 illustrates various components of an exemplary computing-based device 1300 which is implemented as any form of a computing and/or electronic device, and in which embodiments of a depth detection apparatus are implemented in cases where the depth detection apparatus is separate from the time-of-flight camera. A non-exhaustive list of examples of forms of the computing and/or electronic device is: augmented reality near eye computing system, augmented reality body worn computing system, augmented reality wearable computing device, smart phone, desktop computer, computer game console, touch-less user interface computing device, tablet computer, laptop computer.
- Computing-based device 1300 comprises one or more processors 1302 which are microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to compute depth values or depth maps from raw time-of-flight data. In some examples, the computing-based device computes a stream of depth maps from a stream of frames of raw time-of-flight data (received from time-of-flight camera 1326) in real time and in a manner which takes into account multipath interference. In some examples, for example where a system on a chip architecture is used, the processors 1302 include one or more fixed function blocks (also referred to as accelerators) which implement a part of the method of any of FIGS. 3, 5, 6, 8, 10, 11, 12 in hardware (rather than software or firmware). Platform software comprising an operating system 1304 or any other suitable platform software is provided at the computing-based device to enable application software 1306 to be executed on the device. A trained machine learning component 1308 is provided such as the trained machine learning component 124 of FIG. 1.
- A data store 1310 at memory 1316 stores raw time-of-flight data, simulated raw time-of-flight data, parameter values, exposure profile data, 3D environment models and other data.
- The computer executable instructions are provided using any computer-readable media that is accessible by computing-based device 1300. Computer-readable media includes, for example, computer storage media such as memory 1316 and communications media. Computer storage media, such as memory 1316, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or the like. Computer storage media includes, but is not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM), electronic erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that is used to store information for access by a computing device. In contrast, communication media embody computer readable instructions, data structures, program modules, or the like in a modulated data signal, such as a carrier wave, or other transport mechanism. As defined herein, computer storage media does not include communication media. Therefore, a computer storage medium should not be interpreted to be a propagating signal per se. Although the computer storage media (memory 1316) is shown within the computing-based device 1300 it will be appreciated that the storage is, in some examples, distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 1318).
- The computing-based device 1300 also comprises an input/output controller 1320 arranged to output display information to a display device 1324, where used, which is separate from or integral to the computing-based device 1300. The display information optionally graphically presents depth maps computed by the computing-based device. The input/output controller 1320 is also arranged to receive and process input from one or more devices, such as time-of-flight camera 1326, a user input device 1322 (e.g. a stylus, mouse, keyboard, camera, microphone or other sensor). In some examples the user input device 1322 detects voice input, user gestures or other user actions and provides a natural user interface (NUI). This user input is used to specify 3D environment models, specify parameter values or for other purposes. In an embodiment the display device 1324 also acts as the user input device 1322 if it is a touch sensitive display device. The input/output controller 1320 outputs data to devices other than the display device in some examples, e.g. a locally connected printing device.
- Any of the input/output controller 1320, display device 1324 and the user input device 1322 comprise, in some examples, NUI technology which enables a user to interact with the computing-based device in a natural manner, free from artificial constraints imposed by input devices such as mice, keyboards, remote controls and the like. Examples of NUI technology that are provided in some examples include but are not limited to those relying on voice and/or speech recognition, touch and/or stylus recognition (touch sensitive displays), gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. Other examples of NUI technology that are used in some examples include intention and goal understanding systems, motion gesture detection systems using depth cameras (such as stereoscopic camera systems, infrared camera systems, red green blue (RGB) camera systems and combinations of these), motion gesture detection using accelerometers/gyroscopes, facial recognition, three dimensional (3D) displays, head, eye and gaze tracking, immersive augmented reality and virtual reality systems and technologies for sensing brain activity using electric field sensing electrodes (electroencephalogram (EEG) and related methods).
- Alternatively or in addition to the other examples described herein, examples include any combination of the following:
- A depth detection apparatus comprising:
- a memory storing raw time-of-flight sensor data received from a time-of-flight sensor; and
- a trained machine learning component having been trained using training data pairs, a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value;
- the trained machine learning component configured to compute in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values which incorporate simulated multi-path interference.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer which simulates multi-path interference.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values comprising, for an individual pixel, weighted intensity values at different depths potentially depicted by the pixel.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values where information about an exposure profile of the time-of-flight sensor is combined with the simulated raw time-of-flight sensor data values.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values where information about sensor noise of the time-of-flight sensor is combined with the simulated raw time-of-flight sensor data values.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer from a plurality of instances of a parametric 3D environment model, where the instances of the parametric 3D environment model are computer generated automatically at random.
- The apparatus mentioned above, where parameters of the parametric 3D environment model comprise one or more of: geometry of an object in the 3D environment model, position of an object in the 3D environment model, presence of an object in the 3D environment model, orientation of an object in the 3D environment model, surface materials and reflectivity, ambient illumination.
- The apparatus mentioned above, wherein a training data pair comprises a frame of simulated raw time-of-flight sensor data values and a corresponding simulated ground truth depth map.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer for a plurality of randomly selected viewpoints of the time-of-flight sensor, and where any of the viewpoints which are within a threshold distance of a surface in a 3D environment model used by the computer graphics renderer are omitted.
- The apparatus mentioned above, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values aggregated over a neighborhood of a pixel, where the neighborhood is a spatial neighborhood, or a temporal neighborhood, or a spatial and temporal neighborhood.
- The apparatus mentioned above, where the trained machine learning component is a pixel independent regressor.
- The apparatus mentioned above, where the trained machine learning component is a regressor which takes into account relationships between pixels of the stored time-of-flight sensor data.
- The apparatus mentioned above, where the trained machine learning component is a convolutional neural network and where each training data pair comprises a frame of simulated raw time-of-flight sensor data and a ground truth depth map.
- The apparatus mentioned above, where the trained machine learning component is at least partially implemented using hardware logic selected from any one or more of: a field-programmable gate array, an application-specific integrated circuit, an application-specific standard product, a system-on-a-chip, a complex programmable logic device, a graphics processing unit.
- A depth detection apparatus comprising:
- a memory storing frames of raw time-of-flight sensor data received from a time-of-flight sensor; and
- a trained machine learning component having been trained using training data pairs, a training data pair comprising a simulated raw time-of-flight sensor frame and a corresponding simulated ground truth depth map;
- the trained machine learning component configured to compute in a single stage, for a frame of the stored raw time-of-flight sensor data, a depth map of surfaces depicted by the frame, by pushing the frame through the trained machine learning component.
- The apparatus mentioned above, where the trained machine learning component is configured to operate in real time by computing the depth maps at a rate which is equivalent to or faster than a frame rate of the time-of-flight sensor.
- The apparatus mentioned above, where the trained machine learning component comprises a convolutional neural network.
- The apparatus mentioned above, where the trained machine learning component comprises a pixel independent regressor which is a regressor that does not take into account relationships between pixels of a time-of-flight sensor frame.
- A computer-implemented method comprising:
- storing, at a memory, raw time-of-flight sensor data received from a time-of-flight sensor; and
- operating a trained machine learning component having been trained using training data pairs, a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value;
- wherein operating the trained machine learning component comprises computing, in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- An apparatus comprising:
- means for storing raw time-of-flight sensor data received from a time-of-flight sensor; and
- means for operating a trained machine learning component having been trained using training data pairs, a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value; wherein operating the trained machine learning component comprises computing, in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
- The examples illustrated and described herein as well as examples not specifically described herein but within the scope of aspects of the disclosure constitute exemplary means for storing raw time-of-flight sensor data, executing a trained machine learning system, computing depth values or computing depth maps. For example, the memory of FIG. 2 or 13 constitutes exemplary means for storing raw time-of-flight sensor data. For example, the processor of FIG. 2 or 13 constitutes exemplary means for operating a trained machine learning component.
- The term ‘computer’ or ‘computing-based device’ is used herein to refer to any device with processing capability such that it executes instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the terms ‘computer’ and ‘computing-based device’ each include personal computers (PCs), servers, mobile telephones (including smart phones), tablet computers, set-top boxes, media players, games consoles, personal digital assistants, wearable computers, and many other devices.
- The methods described herein are performed, in some examples, by software in machine readable form on a tangible storage medium e.g. in the form of a computer program comprising computer program code means adapted to perform all the operations of one or more of the methods described herein when the program is run on a computer and where the computer program is embodied on a computer readable medium. Examples of tangible storage media include computer storage devices comprising computer-readable media such as disks, thumb drives, memory etc. and do not include propagated signals. The software is suitable for execution on a parallel processor or a serial processor such that the method operations are carried out in any suitable order, or simultaneously.
- This acknowledges that software is a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.
- Those skilled in the art will realize that storage devices utilized to store program instructions are optionally distributed across a network. For example, a remote computer is able to store an example of the process described as software. A local or terminal computer is able to access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that by utilizing conventional techniques known to those skilled in the art that all, or a portion of the software instructions may be carried out by a dedicated circuit, such as a digital signal processor (DSP), programmable logic array, or the like.
- Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
- Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
- It will be understood that the benefits and advantages described above relate to one embodiment or relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to ‘an’ item refers to one or more of those items.
- The operations of the methods described herein are carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the scope of the subject matter described herein. Aspects of any of the examples described above are combined with aspects of any of the other examples described to form further examples without losing the effect sought.
- The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.
- It will be understood that the above description is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the scope of this specification.
Claims (20)
1. A depth detection apparatus comprising:
a memory storing raw time-of-flight sensor data received from a time-of-flight sensor; and
a processor comprising a trained machine learning component having been trained using training data pairs, a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value;
the trained machine learning component configured to compute in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
2. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values which incorporate simulated multi-path interference.
3. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer which simulates multi-path interference.
4. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values comprising, for an individual pixel, weighted intensity values at different depths potentially depicted by the pixel.
5. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values where information about an exposure profile of the time-of-flight sensor is combined with the simulated raw time-of-flight sensor data values.
6. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values where information about sensor noise of the time-of-flight sensor is combined with the simulated raw time-of-flight sensor data values.
7. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer from a plurality of instances of a parametric 3D environment model, where the instances of the parametric 3D environment model are computer generated automatically at random.
8. The apparatus of claim 7 where parameters of the parametric 3D environment model comprise one or more of: geometry of an object in the 3D environment model, position of an object in the 3D environment model, presence of an object in the 3D environment model, orientation of an object in the 3D environment model, surface materials and reflectivity, ambient illumination.
9. The apparatus of claim 1 wherein a training data pair comprises a frame of simulated raw time-of-flight sensor data values and a corresponding simulated ground truth depth map.
10. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values computed using a computer graphics renderer for a plurality of randomly selected viewpoints of the time-of-flight sensor, and where any of the viewpoints which are within a threshold distance of a surface in a 3D environment model used by the computer graphics renderer are omitted.
11. The apparatus of claim 1, the trained machine learning component having been trained using simulated raw time-of-flight sensor data values aggregated over a neighborhood of a pixel, where the neighborhood is a spatial neighborhood, or a temporal neighborhood, or a spatial and temporal neighborhood.
12. The apparatus of claim 1 where the trained machine learning component is a pixel independent regressor.
13. The apparatus of claim 1 where the trained machine learning component is a regressor which takes into account relationships between pixels of the stored time-of-flight sensor data.
14. The apparatus of claim 1 where the trained machine learning component is a convolutional neural network and where each training data pair comprises a frame of simulated raw time-of-flight sensor data and a ground truth depth map.
15. The apparatus of claim 1 where the trained machine learning component is at least partially implemented using hardware logic selected from any one or more of: a field-programmable gate array, an application-specific integrated circuit, an application-specific standard product, a system-on-a-chip, a complex programmable logic device, a graphics processing unit.
16. A depth detection apparatus comprising:
a memory storing frames of raw time-of-flight sensor data received from a time-of-flight sensor; and
a trained machine learning component having been trained using training data pairs, a training data pair comprising a simulated raw time-of-flight sensor frame and a corresponding simulated ground truth depth map;
the trained machine learning component configured to compute in a single stage, for a frame of the stored raw time-of-flight sensor data, a depth map of surfaces depicted by the frame, by pushing the frame through the trained machine learning component.
17. The apparatus of claim 16 where the trained machine learning component is configured to operate in real time by computing the depth maps at a rate which is equivalent to or faster than a frame rate of the time-of-flight sensor.
18. The apparatus of claim 16 where the trained machine learning component comprises a convolutional neural network.
19. The apparatus of claim 16 where the trained machine learning component comprises a pixel independent regressor which is a regressor that does not take into account relationships between pixels of a time-of-flight sensor frame.
20. A computer-implemented method comprising:
storing, at a memory, raw time-of-flight sensor data received from a time-of-flight sensor; and
operating, by a processor, a trained machine learning component having been trained using training data pairs, a training data pair comprising at least one simulated raw time-of-flight sensor data value and a corresponding simulated ground truth depth value;
wherein operating the trained machine learning component comprises computing, in a single stage, for an item of the stored raw time-of-flight sensor data, a depth value of a surface depicted by the item, by pushing the item through the trained machine learning component.
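By way of illustration only, the arrangement recited in the claims above (training on simulated pairs of raw sensor values and ground truth depth, then computing depth in a single stage) can be sketched as follows. This is a minimal toy sketch and not the claimed implementation: the Gaussian exposure profile (`CENTERS`, `WIDTH`), the multi-path and noise parameters, the use of a k-nearest-neighbour regressor as the pixel independent regressor, and all function and class names are assumptions introduced for the example.

```python
import numpy as np

# Hypothetical exposure profile: each of 4 channels responds to light
# returning from a given depth with a Gaussian-shaped sensitivity.
CENTERS = np.array([1.0, 2.0, 3.0, 4.0])  # metres, illustrative values only
WIDTH = 1.0

def channel_response(depth):
    """Response of every channel to unit-intensity light from `depth`."""
    depth = np.asarray(depth, dtype=float)[..., None]
    return np.exp(-0.5 * ((depth - CENTERS) / WIDTH) ** 2)

def simulate_pairs(n, rng, noise_std=0.01):
    """Generate n (simulated raw values, simulated ground truth depth) pairs."""
    depth = rng.uniform(1.0, 4.0, size=n)      # direct-path surface depth
    albedo = rng.uniform(0.3, 1.0, size=n)     # surface reflectivity
    # Simulated multi-path interference: a weaker return from a longer path.
    extra = rng.uniform(0.2, 1.5, size=n)
    weight = rng.uniform(0.0, 0.3, size=n)
    raw = albedo[:, None] * (
        channel_response(depth)
        + weight[:, None] * channel_response(depth + extra)
    )
    raw += rng.normal(0.0, noise_std, size=raw.shape)  # simulated sensor noise
    return raw, depth

def normalise(raw):
    """Divide out overall intensity so features depend mainly on depth."""
    return raw / (np.abs(raw).sum(axis=-1, keepdims=True) + 1e-9)

class PixelIndependentRegressor:
    """k-nearest-neighbour regressor applied to each pixel separately."""

    def __init__(self, k=8):
        self.k = k

    def fit(self, raw, depth):
        self.x = normalise(raw)
        self.y = np.asarray(depth, dtype=float)
        return self

    def predict(self, raw):
        q = normalise(np.atleast_2d(raw))
        # Squared distance from every query pixel to every training pixel.
        d2 = ((q[:, None, :] - self.x[None, :, :]) ** 2).sum(axis=-1)
        idx = np.argpartition(d2, self.k, axis=1)[:, : self.k]
        return self.y[idx].mean(axis=1)

rng = np.random.default_rng(0)
train_raw, train_depth = simulate_pairs(2000, rng)
model = PixelIndependentRegressor().fit(train_raw, train_depth)

# Single stage at test time: raw values go in, depth values come out.
test_raw, test_depth = simulate_pairs(200, rng)
pred = model.predict(test_raw)
mae = float(np.mean(np.abs(pred - test_depth)))

# A whole frame is handled by treating each pixel independently.
frame_raw, _ = simulate_pairs(8 * 8, rng)
depth_map = model.predict(frame_raw).reshape(8, 8)
```

In this sketch the multi-path interference and sensor noise are present only in the training data, so no separate correction stage runs at test time. A regressor that takes relationships between pixels into account, such as a convolutional neural network trained on whole frames, would replace `PixelIndependentRegressor` while leaving the data generation unchanged.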
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/068,632 US9760837B1 (en) | 2016-03-13 | 2016-03-13 | Depth from time-of-flight using machine learning |
PCT/US2017/020846 WO2017160516A1 (en) | 2016-03-13 | 2017-03-06 | Depth from time-of-flight using machine learning |
CN201780016747.8A CN108885701B (en) | 2016-03-13 | 2017-03-06 | Time-of-flight depth using machine learning |
EP17714934.1A EP3430571A1 (en) | 2016-03-13 | 2017-03-06 | Depth from time-of-flight using machine learning |
US15/672,261 US10311378B2 (en) | 2016-03-13 | 2017-08-08 | Depth from time-of-flight using machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/068,632 US9760837B1 (en) | 2016-03-13 | 2016-03-13 | Depth from time-of-flight using machine learning |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/672,261 Continuation US10311378B2 (en) | 2016-03-13 | 2017-08-08 | Depth from time-of-flight using machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
US9760837B1 US9760837B1 (en) | 2017-09-12 |
US20170262768A1 US20170262768A1 (en) | 2017-09-14 |
Family
ID=58461429
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/068,632 Active US9760837B1 (en) | 2016-03-13 | 2016-03-13 | Depth from time-of-flight using machine learning |
US15/672,261 Active US10311378B2 (en) | 2016-03-13 | 2017-08-08 | Depth from time-of-flight using machine learning |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/672,261 Active US10311378B2 (en) | 2016-03-13 | 2017-08-08 | Depth from time-of-flight using machine learning |
Country Status (4)
Country | Link |
---|---|
US (2) | US9760837B1 (en) |
EP (1) | EP3430571A1 (en) |
CN (1) | CN108885701B (en) |
WO (1) | WO2017160516A1 (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304782A (en) * | 2017-01-13 | 2018-07-20 | 福特全球技术公司 | Generate the simulated sensor data for training and verifying detection model |
US10304335B2 (en) | 2016-04-12 | 2019-05-28 | Ford Global Technologies, Llc | Detecting available parking spaces |
WO2019139760A1 (en) | 2018-01-12 | 2019-07-18 | Microsoft Technology Licensing, Llc | Automated localized machine learning training |
WO2019139759A1 (en) | 2018-01-12 | 2019-07-18 | Microsoft Technology Licensing, Llc | Automated collection of machine learning training data |
WO2019188348A1 (en) * | 2018-03-29 | 2019-10-03 | パナソニックIpマネジメント株式会社 | Distance information acquisition device, multipath detection device, and multipath detection method |
US10602270B1 (en) | 2018-11-30 | 2020-03-24 | Microsoft Technology Licensing, Llc | Similarity measure assisted adaptation control |
WO2020117611A1 (en) | 2018-12-06 | 2020-06-11 | Microsoft Technology Licensing, Llc | Automatically performing and evaluating pilot testing of software |
WO2020131499A1 (en) | 2018-12-19 | 2020-06-25 | Microsoft Technology Licensing, Llc | System and method of receiving and converting digital ink input |
WO2020193412A1 (en) * | 2019-03-22 | 2020-10-01 | Sony Semiconductor Solutions Corporation | Analysis portion, time-of-flight imaging device and method |
WO2020197793A1 (en) | 2019-03-22 | 2020-10-01 | Microsoft Technology Licensing, Llc | Method and system for intelligently suggesting tags for documents |
KR20210013149A (en) * | 2018-12-14 | 2021-02-03 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Image processing method and device, electronic device and storage medium |
WO2021071615A1 (en) | 2019-10-11 | 2021-04-15 | Microsoft Technology Licensing, Llc | Keeping track of important tasks |
US20210166124A1 (en) * | 2019-12-03 | 2021-06-03 | Sony Semiconductor Solutions Corporation | Apparatuses and methods for training a machine learning network for use with a time-of-flight camera |
DE102020101706A1 (en) | 2020-01-24 | 2021-07-29 | Ifm Electronic Gmbh | Method for generating depth image pairs for a database |
US20210264166A1 (en) * | 2020-02-26 | 2021-08-26 | GM Global Technology Operations LLC | Natural surround view |
DE102021109386A1 (en) | 2020-04-22 | 2021-10-28 | Ifm Electronic Gmbh | Method for correcting depth images of a time-of-flight camera |
US20210365722A1 (en) * | 2020-05-21 | 2021-11-25 | Canon Kabushiki Kaisha | Information processing device, information processing method, and storage medium |
US20220165027A1 (en) * | 2020-11-23 | 2022-05-26 | Sony Corporation | Training dataset generation for depth measurement |
WO2022152374A1 (en) * | 2021-01-13 | 2022-07-21 | Eaton Intelligent Power Limited | A surface roughness measurement system |
WO2022219564A1 (en) * | 2021-04-16 | 2022-10-20 | Paladin AI Inc. | Automatic inferential pilot competency analysis based on detecting performance norms in flight simulation data |
US11488317B2 (en) | 2020-11-23 | 2022-11-01 | Sony Group Corporation | Neural network model based depth estimation |
DE102021111602A1 (en) | 2021-05-05 | 2022-11-10 | Ifm Electronic Gmbh | Computer-implemented method for correcting artifacts in measurement data generated by a time-of-flight 3D sensor, a corresponding computer program, a corresponding computer-readable medium and a PMD detector |
WO2024062874A1 (en) * | 2022-09-20 | 2024-03-28 | ソニーセミコンダクタソリューションズ株式会社 | Information processing device, information processing method, and program |
Families Citing this family (74)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10712529B2 (en) | 2013-03-13 | 2020-07-14 | Cognex Corporation | Lens assembly with integrated feedback loop for focus adjustment |
US11002854B2 (en) | 2013-03-13 | 2021-05-11 | Cognex Corporation | Lens assembly with integrated feedback loop and time-of-flight sensor |
US10409165B2 (en) * | 2014-12-15 | 2019-09-10 | Asml Netherlands B.V. | Optimization based on machine learning |
US10062201B2 (en) | 2015-04-21 | 2018-08-28 | Microsoft Technology Licensing, Llc | Time-of-flight simulation of multipath light phenomena |
US9906717B1 (en) * | 2016-09-01 | 2018-02-27 | Infineon Technologies Ag | Method for generating a high-resolution depth image and an apparatus for generating a high-resolution depth image |
EP4131172A1 (en) * | 2016-09-12 | 2023-02-08 | Dassault Systèmes | Deep convolutional neural network for 3d reconstruction of a real object |
US10451714B2 (en) | 2016-12-06 | 2019-10-22 | Sony Corporation | Optical micromesh for computerized devices |
US10536684B2 (en) | 2016-12-07 | 2020-01-14 | Sony Corporation | Color noise reduction in 3D depth map |
US10181089B2 (en) | 2016-12-19 | 2019-01-15 | Sony Corporation | Using pattern recognition to reduce noise in a 3D map |
US10178370B2 (en) | 2016-12-19 | 2019-01-08 | Sony Corporation | Using multiple cameras to stitch a consolidated 3D depth map |
US10495735B2 (en) | 2017-02-14 | 2019-12-03 | Sony Corporation | Using micro mirrors to improve the field of view of a 3D depth map |
US10795022B2 (en) * | 2017-03-02 | 2020-10-06 | Sony Corporation | 3D depth map |
US10147193B2 (en) | 2017-03-10 | 2018-12-04 | TuSimple | System and method for semantic segmentation using hybrid dilated convolution (HDC) |
US10979687B2 (en) | 2017-04-03 | 2021-04-13 | Sony Corporation | Using super imposition to render a 3D depth map |
US10762635B2 (en) | 2017-06-14 | 2020-09-01 | Tusimple, Inc. | System and method for actively selecting and labeling images for semantic segmentation |
US10609342B1 (en) * | 2017-06-22 | 2020-03-31 | Insight, Inc. | Multi-channel sensing system with embedded processing |
US10816354B2 (en) | 2017-08-22 | 2020-10-27 | Tusimple, Inc. | Verification module system and method for motion-based lane detection with multiple sensors |
US10762673B2 (en) | 2017-08-23 | 2020-09-01 | Tusimple, Inc. | 3D submap reconstruction system and method for centimeter precision localization using camera-based submap and LiDAR-based global map |
US10565457B2 (en) | 2017-08-23 | 2020-02-18 | Tusimple, Inc. | Feature matching and correspondence refinement and 3D submap position refinement system and method for centimeter precision localization using camera-based submap and LiDAR-based global map |
US10953881B2 (en) | 2017-09-07 | 2021-03-23 | Tusimple, Inc. | System and method for automated lane change control for autonomous vehicles |
US10649458B2 (en) | 2017-09-07 | 2020-05-12 | Tusimple, Inc. | Data-driven prediction-based system and method for trajectory planning of autonomous vehicles |
US10953880B2 (en) | 2017-09-07 | 2021-03-23 | Tusimple, Inc. | System and method for automated lane change control for autonomous vehicles |
US10552979B2 (en) | 2017-09-13 | 2020-02-04 | TuSimple | Output of a neural network method for deep odometry assisted by static scene optical flow |
US10671083B2 (en) | 2017-09-13 | 2020-06-02 | Tusimple, Inc. | Neural network architecture system for deep odometry assisted by static scene optical flow |
US10484667B2 (en) | 2017-10-31 | 2019-11-19 | Sony Corporation | Generating 3D depth map using parallax |
EP3480782A1 (en) * | 2017-11-02 | 2019-05-08 | Vrije Universiteit Brussel | Method and device for reducing noise in a depth image |
CN107745697A (en) | 2017-11-16 | 2018-03-02 | 北京图森未来科技有限公司 | A kind of auto cleaning system and method |
US11270161B2 (en) * | 2017-11-27 | 2022-03-08 | Nvidia Corporation | Deep-learning method for separating reflection and transmission images visible at a semi-reflective surface in a computer image of a real-world scene |
EP3737595B1 (en) | 2018-01-09 | 2023-12-27 | TuSimple, Inc. | Real-time remote control of vehicles with high redundancy |
CN111989716B (en) | 2018-01-11 | 2022-11-15 | 图森有限公司 | Monitoring system for autonomous vehicle operation |
CN108270970B (en) | 2018-01-24 | 2020-08-25 | 北京图森智途科技有限公司 | Image acquisition control method and device and image acquisition system |
US11009356B2 (en) | 2018-02-14 | 2021-05-18 | Tusimple, Inc. | Lane marking localization and fusion |
US11009365B2 (en) | 2018-02-14 | 2021-05-18 | Tusimple, Inc. | Lane marking localization |
CN110378185A (en) | 2018-04-12 | 2019-10-25 | 北京图森未来科技有限公司 | A kind of image processing method applied to automatic driving vehicle, device |
CN116129376A (en) | 2018-05-02 | 2023-05-16 | 北京图森未来科技有限公司 | Road edge detection method and device |
US10996335B2 (en) * | 2018-05-09 | 2021-05-04 | Microsoft Technology Licensing, Llc | Phase wrapping determination for time-of-flight camera |
US10565728B2 (en) | 2018-06-01 | 2020-02-18 | Tusimple, Inc. | Smoothness constraint for camera pose estimation |
US10549186B2 (en) | 2018-06-26 | 2020-02-04 | Sony Interactive Entertainment Inc. | Multipoint SLAM capture |
DE102018117938A1 (en) * | 2018-07-25 | 2020-01-30 | Ifm Electronic Gmbh | Method and image processing system for automatic detection and / or correction of image artifacts in images from a runtime camera |
US11609313B2 (en) | 2018-07-31 | 2023-03-21 | Waymo Llc | Hybrid time-of-flight and imager module |
WO2020045770A1 (en) * | 2018-08-31 | 2020-03-05 | Samsung Electronics Co., Ltd. | Method and device for obtaining 3d images |
US11023742B2 (en) | 2018-09-07 | 2021-06-01 | Tusimple, Inc. | Rear-facing perception system for vehicles |
US11019274B2 (en) | 2018-09-10 | 2021-05-25 | Tusimple, Inc. | Adaptive illumination for a time-of-flight camera on a vehicle |
CN112689586B (en) | 2018-09-13 | 2024-04-16 | 图森有限公司 | Remote safe driving method and system |
JP2020046774A (en) * | 2018-09-14 | 2020-03-26 | 株式会社東芝 | Signal processor, distance measuring device and distance measuring method |
US10922882B2 (en) * | 2018-10-26 | 2021-02-16 | Electronic Arts Inc. | Terrain generation system |
US10942271B2 (en) | 2018-10-30 | 2021-03-09 | Tusimple, Inc. | Determining an angle between a tow vehicle and a trailer |
US11353588B2 (en) | 2018-11-01 | 2022-06-07 | Waymo Llc | Time-of-flight sensor with structured light illuminator |
US20210341620A1 (en) * | 2018-12-02 | 2021-11-04 | Gentex Corporation | SYSTEMS, DEVICES AND METHODS FOR MICRO-VIBRATION DATA EXTRACTION USING A TIME OF FLIGHT (ToF) IMAGING DEVICE |
CN111319629B (en) | 2018-12-14 | 2021-07-16 | 北京图森智途科技有限公司 | Team forming method, device and system for automatically driving fleet |
US10846917B2 (en) | 2019-01-03 | 2020-11-24 | Microsoft Technology Licensing, Llc | Iterating different camera representations in three-dimensional model |
CN109738881B (en) * | 2019-01-11 | 2023-08-08 | 歌尔光学科技有限公司 | Calibration method and device of time-of-flight depth module and readable storage medium |
US10861165B2 (en) * | 2019-01-11 | 2020-12-08 | Microsoft Technology Licensing, Llc | Subject tracking with aliased time-of-flight data |
US11698441B2 (en) * | 2019-03-22 | 2023-07-11 | Viavi Solutions Inc. | Time of flight-based three-dimensional sensing system |
CN110197228B (en) * | 2019-05-31 | 2020-11-27 | 北京百度网讯科技有限公司 | Image correction method and device |
US11823460B2 (en) | 2019-06-14 | 2023-11-21 | Tusimple, Inc. | Image fusion for autonomous vehicle operation |
US11587448B2 (en) * | 2019-07-26 | 2023-02-21 | General Electric Company | Systems and methods for manifolds learning of airline network data |
EP4004582A1 (en) | 2019-08-07 | 2022-06-01 | Huawei Technologies Co., Ltd. | Time-of-flight depth enhancement |
KR20220043125A (en) * | 2019-08-13 | 2022-04-05 | 소니 세미컨덕터 솔루션즈 가부시키가이샤 | Measuring devices and rangefinders |
US11561292B1 (en) * | 2019-08-23 | 2023-01-24 | Zoox, Inc. | Active power control of sensors |
CN112532858A (en) * | 2019-09-18 | 2021-03-19 | 华为技术有限公司 | Image processing method, image acquisition method and related device |
CN113009508B (en) * | 2019-12-20 | 2023-11-07 | 舜宇光学(浙江)研究院有限公司 | Multipath interference correction method for TOF module, system and electronic equipment thereof |
US11694341B2 (en) * | 2019-12-23 | 2023-07-04 | Texas Instruments Incorporated | Cascaded architecture for disparity and motion prediction with block matching and convolutional neural network (CNN) |
TWI742543B (en) * | 2020-02-26 | 2021-10-11 | 香港商冠捷投資有限公司 | Display device |
CN115428431A (en) * | 2020-04-02 | 2022-12-02 | 株式会社小糸制作所 | Door control camera, vehicle sensing system, and vehicle lamp |
EP3893150A1 (en) | 2020-04-09 | 2021-10-13 | Tusimple, Inc. | Camera pose estimation techniques |
CN111708039B (en) * | 2020-05-24 | 2023-09-05 | 奥比中光科技集团股份有限公司 | Depth measurement device and method and electronic equipment |
CN111736173B (en) * | 2020-05-24 | 2023-04-11 | 奥比中光科技集团股份有限公司 | Depth measuring device and method based on TOF and electronic equipment |
AU2021203567A1 (en) | 2020-06-18 | 2022-01-20 | Tusimple, Inc. | Angle and orientation measurements for vehicles with multiple drivable sections |
US11932238B2 (en) | 2020-06-29 | 2024-03-19 | Tusimple, Inc. | Automated parking technology |
US11727719B2 (en) | 2020-08-28 | 2023-08-15 | Stmicroelectronics, Inc. | System and method for detecting human presence based on depth sensing and inertial measurement |
EP4002327A1 (en) * | 2020-11-23 | 2022-05-25 | Sony Semiconductor Solutions Corporation | Time-of-flight simulation data training circuitry, time-of-flight simulation data training method, time-of-flight simulation data output method, time-of-flight simulation data output circuitry |
WO2022194352A1 (en) | 2021-03-16 | 2022-09-22 | Huawei Technologies Co., Ltd. | Apparatus and method for image correlation correction |
CN117455299B (en) * | 2023-11-10 | 2024-05-31 | 中国民用航空飞行学院 | Method and device for evaluating performance of fly-away training of simulator |
Family Cites Families (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002952873A0 (en) * | 2002-11-25 | 2002-12-12 | Dynamic Digital Depth Research Pty Ltd | Image encoding system |
US8294809B2 (en) | 2005-05-10 | 2012-10-23 | Advanced Scientific Concepts, Inc. | Dimensioning system |
US7289119B2 (en) | 2005-05-10 | 2007-10-30 | Sony Computer Entertainment Inc. | Statistical rendering acceleration |
WO2008152647A2 (en) | 2007-06-15 | 2008-12-18 | Ben Gurion University Of The Negev Research And Development Authority | Three-dimensional imaging method and apparatus |
US9836538B2 (en) | 2009-03-03 | 2017-12-05 | Microsoft Technology Licensing, Llc | Domain-based ranking in document search |
US8803967B2 (en) | 2009-07-31 | 2014-08-12 | Mesa Imaging Ag | Time of flight camera with rectangular field of illumination |
US8849616B2 (en) | 2009-08-04 | 2014-09-30 | Microsoft Corporation | Method and system for noise simulation analysis useable with systems including time-of-flight depth systems |
US8717469B2 (en) | 2010-02-03 | 2014-05-06 | Microsoft Corporation | Fast gating photosurface |
US8405680B1 (en) | 2010-04-19 | 2013-03-26 | YDreams S.A., A Public Limited Liability Company | Various methods and apparatuses for achieving augmented reality |
US8918209B2 (en) | 2010-05-20 | 2014-12-23 | Irobot Corporation | Mobile human interface robot |
US8670029B2 (en) | 2010-06-16 | 2014-03-11 | Microsoft Corporation | Depth camera illuminator with superluminescent light-emitting diode |
US9753128B2 (en) | 2010-07-23 | 2017-09-05 | Heptagon Micro Optics Pte. Ltd. | Multi-path compensation using multiple modulation frequencies in time of flight sensor |
KR101669412B1 (en) | 2010-11-01 | 2016-10-26 | 삼성전자주식회사 | Method and apparatus of measuring depth information for 3d camera |
US20120154535A1 (en) | 2010-12-15 | 2012-06-21 | Microsoft Corporation | Capturing gated and ungated light in the same frame on the same photosurface |
US8872826B2 (en) | 2011-02-17 | 2014-10-28 | Sony Corporation | System and method for decoupled ray marching for production ray tracking in inhomogeneous participating media |
US20130141420A1 (en) | 2011-12-02 | 2013-06-06 | The Boeing Company | Simulation of Three-Dimensional (3D) Cameras |
US9213883B2 (en) | 2012-01-10 | 2015-12-15 | Samsung Electronics Co., Ltd. | Method and apparatus for processing depth image |
WO2013120041A1 (en) | 2012-02-10 | 2013-08-15 | Massachusetts Institute Of Technology | Method and apparatus for 3d spatial localization and tracking of objects using active optical illumination and sensing |
CN102663712B (en) * | 2012-04-16 | 2014-09-17 | 天津大学 | Depth calculation imaging method based on flight time TOF camera |
US9349169B2 (en) | 2012-05-17 | 2016-05-24 | The Regents Of The University Of California | Sampling-based multi-lateral filter method for depth map enhancement and codec |
US8854633B2 (en) | 2012-06-29 | 2014-10-07 | Intermec Ip Corp. | Volume dimensioning system and method employing time-of-flight camera |
GB201214976D0 (en) * | 2012-08-22 | 2012-10-03 | Connect In Ltd | Monitoring system |
LU92074B1 (en) | 2012-09-18 | 2014-03-19 | Iee Sarl | Depth image enhancement method |
US9373087B2 (en) * | 2012-10-25 | 2016-06-21 | Microsoft Technology Licensing, Llc | Decision tree training in machine learning |
KR101896301B1 (en) | 2013-01-03 | 2018-09-07 | 삼성전자주식회사 | Apparatus and method for processing depth image |
JPWO2014167876A1 (en) * | 2013-04-12 | 2017-02-16 | シャープ株式会社 | Nitride semiconductor device |
US9405008B2 (en) | 2013-05-17 | 2016-08-02 | Massachusetts Institute Of Technology | Methods and apparatus for multi-frequency camera |
KR102103984B1 (en) * | 2013-07-15 | 2020-04-23 | 삼성전자주식회사 | Method and apparatus processing a depth image |
US10063844B2 (en) * | 2013-10-17 | 2018-08-28 | Microsoft Technology Licensing, Llc. | Determining distances by probabilistic time of flight imaging |
US9542749B2 (en) | 2014-01-06 | 2017-01-10 | Microsoft Technology Licensing, Llc | Fast general multipath correction in time-of-flight imaging |
US9380224B2 (en) * | 2014-02-28 | 2016-06-28 | Microsoft Technology Licensing, Llc | Depth sensing using an infrared camera |
US9582753B2 (en) * | 2014-07-30 | 2017-02-28 | Mitsubishi Electric Research Laboratories, Inc. | Neural networks for transforming signals |
US10275707B2 (en) * | 2014-11-10 | 2019-04-30 | The Boeing Company | Systems and methods for training multipath filtering systems |
US10062201B2 (en) | 2015-04-21 | 2018-08-28 | Microsoft Technology Licensing, Llc | Time-of-flight simulation of multipath light phenomena |
US9633282B2 (en) * | 2015-07-30 | 2017-04-25 | Xerox Corporation | Cross-trained convolutional neural networks using multimodal images |
2016
- 2016-03-13: US application US15/068,632, publication US9760837B1, status Active
2017
- 2017-03-06: WO application PCT/US2017/020846, publication WO2017160516A1, status Application Filing
- 2017-03-06: EP application EP17714934.1A, publication EP3430571A1, status Ceased
- 2017-03-06: CN application CN201780016747.8A, publication CN108885701B, status Active
- 2017-08-08: US application US15/672,261, publication US10311378B2, status Active
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10304335B2 (en) | 2016-04-12 | 2019-05-28 | Ford Global Technologies, Llc | Detecting available parking spaces |
CN108304782A (en) * | 2017-01-13 | 2018-07-20 | 福特全球技术公司 | Generate the simulated sensor data for training and verifying detection model |
US10228693B2 (en) * | 2017-01-13 | 2019-03-12 | Ford Global Technologies, Llc | Generating simulated sensor data for training and validation of detection models |
US11429807B2 (en) | 2018-01-12 | 2022-08-30 | Microsoft Technology Licensing, Llc | Automated collection of machine learning training data |
WO2019139759A1 (en) | 2018-01-12 | 2019-07-18 | Microsoft Technology Licensing, Llc | Automated collection of machine learning training data |
WO2019139760A1 (en) | 2018-01-12 | 2019-07-18 | Microsoft Technology Licensing, Llc | Automated localized machine learning training |
US11481571B2 (en) | 2018-01-12 | 2022-10-25 | Microsoft Technology Licensing, Llc | Automated localized machine learning training |
WO2019188348A1 (en) * | 2018-03-29 | 2019-10-03 | パナソニックIpマネジメント株式会社 | Distance information acquisition device, multipath detection device, and multipath detection method |
JPWO2019188348A1 (en) * | 2018-03-29 | 2021-03-25 | ヌヴォトンテクノロジージャパン株式会社 | Distance information acquisition device, multipath detection device and multipath detection method |
US10602270B1 (en) | 2018-11-30 | 2020-03-24 | Microsoft Technology Licensing, Llc | Similarity measure assisted adaptation control |
WO2020117611A1 (en) | 2018-12-06 | 2020-06-11 | Microsoft Technology Licensing, Llc | Automatically performing and evaluating pilot testing of software |
KR20210013149A (en) * | 2018-12-14 | 2021-02-03 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Image processing method and device, electronic device and storage medium |
KR102538164B1 (en) * | 2018-12-14 | 2023-05-30 | 선전 센스타임 테크놀로지 컴퍼니 리미티드 | Image processing method and device, electronic device and storage medium |
WO2020131499A1 (en) | 2018-12-19 | 2020-06-25 | Microsoft Technology Licensing, Llc | System and method of receiving and converting digital ink input |
WO2020197793A1 (en) | 2019-03-22 | 2020-10-01 | Microsoft Technology Licensing, Llc | Method and system for intelligently suggesting tags for documents |
US20220155454A1 (en) * | 2019-03-22 | 2022-05-19 | Sony Semiconductor Solutions Corporation | Analysis portion, time-of-flight imaging device and method |
WO2020193412A1 (en) * | 2019-03-22 | 2020-10-01 | Sony Semiconductor Solutions Corporation | Analysis portion, time-of-flight imaging device and method |
WO2021071615A1 (en) | 2019-10-11 | 2021-04-15 | Microsoft Technology Licensing, Llc | Keeping track of important tasks |
US20210166124A1 (en) * | 2019-12-03 | 2021-06-03 | Sony Semiconductor Solutions Corporation | Apparatuses and methods for training a machine learning network for use with a time-of-flight camera |
DE102020101706A1 (en) | 2020-01-24 | 2021-07-29 | Ifm Electronic Gmbh | Method for generating depth image pairs for a database |
US20210264166A1 (en) * | 2020-02-26 | 2021-08-26 | GM Global Technology Operations LLC | Natural surround view |
US11532165B2 (en) * | 2020-02-26 | 2022-12-20 | GM Global Technology Operations LLC | Natural surround view |
CN113315946A (en) * | 2020-02-26 | 2021-08-27 | 通用汽车环球科技运作有限责任公司 | Natural peripheral view |
DE102021109386A1 (en) | 2020-04-22 | 2021-10-28 | Ifm Electronic Gmbh | Method for correcting depth images of a time-of-flight camera |
DE102021109386B4 (en) | 2020-04-22 | 2024-05-16 | Ifm Electronic Gmbh | Method for correcting depth images of a time-of-flight camera |
US20210365722A1 (en) * | 2020-05-21 | 2021-11-25 | Canon Kabushiki Kaisha | Information processing device, information processing method, and storage medium |
US20220165027A1 (en) * | 2020-11-23 | 2022-05-26 | Sony Corporation | Training dataset generation for depth measurement |
US11475631B2 (en) * | 2020-11-23 | 2022-10-18 | Sony Corporation | Training dataset generation for depth measurement |
US11488317B2 (en) | 2020-11-23 | 2022-11-01 | Sony Group Corporation | Neural network model based depth estimation |
WO2022152374A1 (en) * | 2021-01-13 | 2022-07-21 | Eaton Intelligent Power Limited | A surface roughness measurement system |
WO2022219564A1 (en) * | 2021-04-16 | 2022-10-20 | Paladin AI Inc. | Automatic inferential pilot competency analysis based on detecting performance norms in flight simulation data |
DE102021111602A1 (en) | 2021-05-05 | 2022-11-10 | Ifm Electronic Gmbh | Computer-implemented method for correcting artifacts in measurement data generated by a time-of-flight 3D sensor, a corresponding computer program, a corresponding computer-readable medium and a PMD detector |
WO2024062874A1 (en) * | 2022-09-20 | 2024-03-28 | ソニーセミコンダクタソリューションズ株式会社 | Information processing device, information processing method, and program |
Also Published As
Publication number | Publication date |
---|---|
CN108885701A (en) | 2018-11-23 |
CN108885701B (en) | 2021-12-31 |
US9760837B1 (en) | 2017-09-12 |
US10311378B2 (en) | 2019-06-04 |
EP3430571A1 (en) | 2019-01-23 |
WO2017160516A1 (en) | 2017-09-21 |
US20180129973A1 (en) | 2018-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10311378B2 (en) | | Depth from time-of-flight using machine learning |
EP3411731B1 (en) | | Temporal time-of-flight |
US8571263B2 (en) | | Predicting joint positions |
US10311282B2 (en) | | Depth from time of flight camera |
EP2932444B1 (en) | | Resource allocation for machine learning |
US9380224B2 (en) | | Depth sensing using an infrared camera |
US9373087B2 (en) | | Decision tree training in machine learning |
US8625897B2 (en) | | Foreground and background image segmentation |
US10110881B2 (en) | | Model fitting from raw time-of-flight images |
CN107466411B (en) | | Two-dimensional infrared depth sensing |
EP3092509B1 (en) | | Fast general multipath correction in time-of-flight imaging |
US20140184749A1 (en) | | Using photometric stereo for 3d environment modeling |
Zhang et al. | | Close the optical sensing domain gap by physics-grounded active stereo sensor simulation |
Al-Temeemy | | The development of ViBe foreground detection algorithm using Lévy flights random update strategy and Kinect laser imaging sensor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: NOWOZIN, SEBASTIAN; ADAM, AMIT; MAZOR, SHAI; AND OTHERS; SIGNING DATES FROM 20160301 TO 20160308; REEL/FRAME: 037962/0159 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |