WO2023057343A1 - Apparatuses and methods for event guided depth estimation - Google Patents

Apparatuses and methods for event guided depth estimation

Info

Publication number
WO2023057343A1
WO2023057343A1 (PCT/EP2022/077355)
Authority
WO
WIPO (PCT)
Prior art keywords
depth
event
motion
areas
scene
Prior art date
Application number
PCT/EP2022/077355
Other languages
English (en)
Inventor
Manasi MUGLIKAR
Diederik Paul MOEYS
Davide SCARAMUZZA
Original Assignee
Sony Semiconductor Solutions Corporation
University Of Zurich
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Semiconductor Solutions Corporation, University Of Zurich filed Critical Sony Semiconductor Solutions Corporation
Publication of WO2023057343A1 publication Critical patent/WO2023057343A1/fr

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/02Systems using the reflection of electromagnetic waves other than radio waves
    • G01S17/06Systems determining position data of a target
    • G01S17/46Indirect determination of position data
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/02Systems using the reflection of electromagnetic waves other than radio waves
    • G01S17/50Systems of measurement based on relative movement of target
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/89Lidar systems specially adapted for specific applications for mapping or imaging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/521Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present disclosure generally relates to optical imaging devices, particularly optical assemblies for depth estimation. Examples relate to apparatuses and methods for event guided depth estimation.
  • High-resolution and high-speed depth estimation is desirable in many applications such as robotics, augmented reality (AR) and virtual reality (VR), and 3D modeling.
  • Event-based Structured Light (SL) systems employing event cameras circumvent this problem by suppressing redundant information and generating events asynchronously only where the logarithmic brightness change exceeds a threshold. Therefore, no redundant data may be created or processed, and in turn, processing may be faster than with frame-based algorithms.
  • This type of data may encode location, polarity, and timing of logarithmic changes in brightness perceived by pixels of an event camera.
  • An event camera, also known as an event-based vision sensor (EVS) or dynamic vision sensor (DVS), refers to an imaging sensor that responds to local changes in brightness. Pixels of an EVS operate independently and asynchronously. This property allows every pixel to generate an event exactly at the point in time when its illumination changes, leading to a very fast response time; the latency typically lies in the order of tens of microseconds.
  • An increase in brightness may trigger a so-called ON-event, a decrease may trigger an OFF-event.
  • Modern EVS have microsecond temporal resolution, 120 dB dynamic range, and less under-/overexposure and motion blur than frame cameras.
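  • As an illustrative sketch of this per-pixel behaviour (a minimal model only; the Event fields, the contrast threshold value and the function names are assumptions, not taken from this disclosure), the event generation rule can be written as:

```python
import math
from dataclasses import dataclass

@dataclass
class Event:
    x: int         # pixel column
    y: int         # pixel row
    t: float       # timestamp, e.g. in microseconds
    polarity: int  # +1 for an ON-event (brighter), -1 for an OFF-event (darker)

def maybe_emit_event(x, y, t, intensity, last_log_intensity, threshold=0.2):
    """Emit an event when the logarithmic brightness change exceeds the contrast threshold."""
    log_i = math.log(intensity + 1e-9)                   # avoid log(0)
    ref = last_log_intensity.setdefault((x, y), log_i)   # per-pixel reference level
    delta = log_i - ref
    if abs(delta) >= threshold:
        last_log_intensity[(x, y)] = log_i               # reset the reference after firing
        return Event(x, y, t, +1 if delta > 0 else -1)
    return None
```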
  • the present disclosure proposes an apparatus for event-based depth measurement.
  • the apparatus comprises an event-based vision sensor (event camera) which is configured to detect one or more areas of motion (events) in a scene.
  • the apparatus further comprises a depth detector configured to determine depth (distance) information for the detected one or more areas of motion with a higher spatial resolution than for a remainder of the scene.
  • the apparatus further comprises analog and/or digital processing circuitry configured to, based on information on the detected one or more areas of motion in the scene, control an illumination source of the depth detector to illuminate or scan the detected one or more areas of motion with a higher spatial resolution.
  • the processing circuitry may adaptively control a scanning frequency based on the information on the detected one or more areas of motion.
  • the apparatus further comprises an optical splitter configured to provide the same field of view of the scene for the event-based vision sensor and the depth detector. In this way, the same field of view may be shared among depth detector (or its illumination source) and the EVS.
  • the apparatus further comprises analog and/or digital processing circuitry configured to compute depth information based on sensor output signals of the depth detector.
  • the processing circuitry may comprise one or more machine learning networks for so-called depth completion.
  • the depth detector comprises a laser point projector (as illumination source) configured to scan the detected one or more areas of motion with the higher spatial resolution than the remainder of the scene.
  • the depth detector comprises a second event-based vision sensor sensitive to a wavelength and pulse frequency of an illumination source of the depth detector.
  • the apparatus may further comprise analog and/or digital processing circuitry configured to compute depth information based on triangulating illumination events of the illumination source and detected events of the second event-based vision sensor.
  • one EVS may be used for motion detection and a second EVS may be used for depth estimation.
  • the event-based vision sensor comprises a first set of pixels configured to detect the one or more areas of motion and a second set of pixels sensitive to a wavelength and pulse frequency of an illumination source of the depth detector.
  • the apparatus further comprises analog and/or digital processing circuitry configured to compute depth information based on triangulating illumination events of the illumination source and detected events of the second set of pixels.
  • the second EVS may be omitted. Instead, a single EVS may be used for motion detection and depth estimation.
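  • As a minimal sketch of how the output of such a single hybrid sensor could be separated into motion events and depth (IR) events (the checkerboard pixel layout and the field names are assumptions for illustration, not the disclosed pixel arrangement):

```python
def split_hybrid_events(events):
    """Split events from a hybrid EVS into motion events and IR/depth events.

    Assumes the IR-coated, laser-sensitive pixels form a checkerboard pattern;
    any known pixel map could be substituted. Events carry x, y, t, polarity.
    """
    motion_events, depth_events = [], []
    for ev in events:
        if (ev.x + ev.y) % 2 == 0:   # assumed: even-parity pixels are IR-coated
            depth_events.append(ev)
        else:                        # remaining pixels respond to scene motion
            motion_events.append(ev)
    return motion_events, depth_events
```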
  • the apparatus further comprises analog and/or digital processing circuitry configured to determine a respective bounding box around the detected one or more areas of motion.
  • the depth detector may be configured to scan areas within a respective bounding box with a higher spatial resolution than outside the bounding box.
  • the depth detector is configured to perform an initial scan of the scene with full spatial resolution and to adapt the spatial resolution for subsequent scans based on the detected one or more areas of motion in the scene.
  • the present disclosure proposes a method for event-based depth measurement.
  • the method includes detecting one or more areas of motion in a scene with an event-based vision sensor and determining depth information for the detected one or more areas of motion with a higher spatial resolution than for a remainder of the scene.
  • Embodiments of the present disclosure take inspiration from human perception, which involves foveating: areas of interest are scanned with the highest resolution while the other regions are scanned with lower resolution.
  • The area of interest for robotics applications is often large-contrast features or dynamic objects in the scene; identifying these areas with low latency in challenging environments is therefore essential. It is proposed to use an event camera to identify these areas of interest, as events naturally correspond to moving objects (assuming brightness constancy) and therefore do not require further processing to segment the area of interest.
  • the present disclosure provides a depth sensor that can scan dynamic objects with low latency.
  • the skilled person having benefit from the present disclosure will appreciate that ideas presented here are applicable for a wide range of depth sensing modalities like LIDAR, Time of Flight (ToF) and standard stereo.
  • Fig. 1 shows an apparatus for event-based depth measurement according to a first embodiment
  • Fig. 2A shows an apparatus for event-based depth measurement according to a second embodiment
  • Fig. 2B shows an apparatus for event-based depth measurement according to a third embodiment
  • Fig. 3A shows a top view of an EVS pixel matrix with a first set of pixels configured to detect events (motion) and a second set of pixels sensitive to a wavelength and pulse frequency of an illumination source of the depth detector for depth estimation;
  • Fig. 3B shows a side view of two adjacent EVS pixels, one for event (motion) detection, the other one for depth estimation;
  • Fig. 4 shows an example event time surface with sparse sampling and adaptive foveating depth;
  • Fig. 5 shows a method for event-based depth measurement according to a first embodiment
  • Fig. 6 shows a method for event-based depth measurement according to a second embodiment.
  • the problem of depth estimation may also be considered using an illumination source, such as, for example, a laser point projector (laser scanner), and an EVS (event camera).
  • In structured light systems, a scene may be illuminated by a laser beam of the laser scanner, moving in a raster-scanning fashion. For example, one or more moveable mirrors may be used to steer the laser beam. Light reflected from the scene may generate events in the EVS.
  • Event-based Structured Light uses the event timestamp to map the event pixel to the projector pixel.
  • Disparity (d) may be converted to depth (Z), given the system calibration parameters baseline (b) and focal length (f), as Z = b · f / d.
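  • A short worked sketch of this standard rectified-stereo relation (the numeric values below are assumptions for illustration only):

```python
def disparity_to_depth(d_pixels, baseline_m, focal_length_px):
    """Convert disparity d (pixels) to depth Z (meters) via Z = b * f / d."""
    if d_pixels <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return baseline_m * focal_length_px / d_pixels

# Assumed example: 10 cm baseline, 600 px focal length, 12 px disparity -> 5.0 m
z = disparity_to_depth(12.0, baseline_m=0.10, focal_length_px=600.0)
```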
  • Latency may be defined as the time it takes for an event to be registered from the moment the logarithmic change in intensity exceeds the threshold. It can currently range, on average, from a few microseconds to hundreds of milliseconds, depending on bias settings, manufacturing process and illumination level.
  • Transistor noise may be defined as the random noise of the circuits' transistors; it may also depend on settings and illumination. This noise randomly changes the measured signal, leading to threshold-comparison jitter. Other non-idealities may encompass parasitic photocurrents and junction leakages; these effects bias event generation towards one specific polarity.
  • Read-out architectures may be arbitrated architectures, which preserve or partially preserve the order of the pixels’ firing. They may lead to significant queuing delays before the timestamping operation. This may be particularly noticeable when the number of active pixels (and resolution) scale up. Scanning readouts, on the other hand, may limit the possible delays by sacrificing event timing resolution.
  • Jitter may be defined as the random variation that appears in timestamps. It may depend on all of the aforementioned factors, all of which increase the unpredictability and imprecision of the event timing.
  • This approach may be defined as dense sampling since it illuminates every pixel. Δt is lower than 1 µs for almost every sensor resolution at scanning frequencies higher than 60 Hz. Since Δt depends on the scanning frequency f and the sensor resolution W × H, there is no simple way to ensure that Δt is higher than the EVS temporal resolution. Another disadvantage of dense sampling arises from high event rates. The event rate, measured in MEv/s, is directly proportional to the scanning frequency f and the spatial resolution W × H. The event rate significantly affects the event timestamps, as a high event rate means a high latency in the event timestamps, in the order of milliseconds. High latency can lead to events being dropped due to an overcrowded event-buffer queue, or to wrong timestamps being allocated to events depending on the read-out architecture. The present disclosure therefore focuses on sparse sampling to increase efficiency.
  • Sparse sampling increases the Δt between consecutive events by scanning only a fraction of the pixels.
  • The Δt is now proportional to the number of pixels skipped between two illuminated pixels (defined as N). This allows the timestamp errors to be controlled to a certain extent.
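  • A back-of-the-envelope sketch of how Δt scales with scanning frequency, resolution and skipped pixels (the uniform raster-timing model below is an assumption used for illustration, not a statement about the exact timing of the disclosed scanner):

```python
def dt_between_events_us(scan_freq_hz, width, height, pixels_skipped=0):
    """Approximate interval between consecutive illuminated pixels, in microseconds.

    Dense sampling visits every pixel once per scan period; sparse sampling skips
    N pixels between two illuminated pixels, stretching the interval by (N + 1).
    """
    dt_dense_us = 1e6 / (scan_freq_hz * width * height)
    return dt_dense_us * (pixels_skipped + 1)

print(dt_between_events_us(60, 640, 480))                     # dense: ~0.054 us
print(dt_between_events_us(60, 640, 480, pixels_skipped=99))  # sparse: ~5.4 us
```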
  • Sparse sampling does not consider a scene's geometry. This can lead to undersampling areas containing fast-moving objects and oversampling planar areas, producing redundant information. In particular, sparse sampling is not desirable in areas of interest (usually areas of object motion).
  • In an autonomous driving scenario, for example, if a fast-moving vehicle or pedestrian moves in front of the car, it is useful to have more depth information about this object.
  • dynamic objects in the scene may be important for obstacle avoidance or local planning of trajectories. These scenarios make it desirable to have a higher resolution in these areas of interest at a low latency.
  • EVS are well suited for this application as they naturally encode motion in their data (assuming constant brightness), eliminating static redundant information.
  • a strong parallelism is present in the human retina, together with low processing latency and high dynamic range.
  • Sparse sampling may have the disadvantage of under-sampling areas of interest and over- sampling planar areas.
  • The present disclosure therefore proposes to use event-based sampling for depth. As events naturally encode the relative motion of the scene with high temporal resolution, they are ideal for this task.
  • Fig. 1 schematically illustrates an apparatus 100 for event-based depth measurement according to an embodiment of the present disclosure.
  • Apparatus 100 comprises an EVS 110 configured to detect one or more areas of motion 112 in a scene 114 under observation.
  • the scene may be a (partial) surrounding of a vehicle.
  • pixels in EVS 110 may operate independently and asynchronously, responding to intensity changes by producing events. Events may be represented by the x, y pixel location and timestamp t (in microseconds) of an intensity change as well as its polarity (i.e., whether the pixel became darker or brighter).
  • An area of motion 112 may trigger one or more events in EVS 110 and may comprise one or more objects moving relative to the EVS 110.
  • a moving object may be a moving vehicle, a pedestrian, a ball, or the like. The skilled person having benefit from the present disclosure will appreciate that there are numerous examples for moving objects.
  • Apparatus 100 further comprises a depth detector 120 configured to determine depth information 122 for the detected one or more areas of motion 112 with a higher spatial resolution than for a remainder 116 of the scene 114.
  • EVS 110 may be communicatively coupled to depth detector 120 as indicated by reference numeral 115.
  • Communication interface 115 may convey spatial information on the detected one or more areas of motion 112 from EVS 110 to depth detector 120.
  • Depth detector 120 may then be controlled based on the spatial information on the detected one or more areas of motion 112.
  • depth detector 120 may be controlled to increase its spatial scanning or sampling resolution for the detected one or more areas of motion 112 compared to the remainder 116 of the scene 114.
  • depth detector 120 may be controlled to decrease its spatial scanning or sampling resolution for remainder 116 of the scene 114 compared to the detected one or more areas of motion 112.
  • Depth detector 120 may illuminate the same scene 114 as observed by the EVS 110 and may comprise any ranging device capable of determining depth (distance) information at different scanning or sampling rates, such as radar devices, lidar devices, stereo cameras, time-of-flight (ToF) sensors, or an event-based Structured Light (SL) system including a laser point projector / laser scanner (as optical transmitter) in conjunction with an EVS (as optical receiver), for example.
  • Depth information 122 may comprise spatial information of the illuminated scene 114 together with associated depth or distance information.
  • depth information 122 may come in the form of triplets comprising spatial coordinates (x, y) and associated distance (d), for example, (x, y, d).
  • Spatial scanning may take place with higher spatial resolution, for example higher scanning frequency, for the detected one or more areas of motion 112 compared to areas 116 of less or no motion (for example, a background of the scene).
  • depth information 122 may comprise raw sensor output signals of the depth detector 120.
  • depth information 122 may comprise depth information obtained after further processing the raw sensor output signals. This processing may take place in analog and/or digital depth processing circuitry.
  • the depth processing circuitry may implement machine learning algorithms, such as (fully) convolutional neural networks, for example, for depth completion.
  • Fig. 2A illustrates a more specific embodiment of an event-based SL apparatus 200 in which the depth detector 120 comprises a laser point projector / laser scanner 220-Tx as illumination source and a second EVS 220-Rx tuned to observe events generated by the laser point projector 220-Tx.
  • Laser point projector 220-Tx and second EVS 220-Rx together constitute a depth detector 220.
  • laser point projector 220-Tx may emit infrared (IR) light beams.
  • Laser point projector 220-Tx may use one or more moveable mirrors to steer the laser beams.
  • EVS 110 is configured to detect areas of motion 112 in the scene 114 and to provide corresponding control data for controlling the spatial (scanning) resolution of depth detector 220.
  • some or all of the devices 110, 220-Tx, and 220-Rx may include lens assemblies to improve optical imaging properties of the respective devices.
  • Event-based depth measurement apparatus 200 comprises an optical splitter 240 coupled to both, laser point projector 220-Tx and EVS 110.
  • Optical splitter 240 is configured to provide the same field of view of the scene 114 for the EVS 110 and the depth detector 220 or its illumination source 220-Tx.
  • Optical splitter 240 may be transparent to laser beams 224 emitted from laser point projector 220-Tx towards the scene 114 and may at the same time limit the field of view of the laser point projector 220-Tx.
  • Optical splitter 240 may comprise a mirror assembly 242 which is configured to direct light reflected from the scene 114 to EVS 110.
  • the EVS 110 and the illumination source (projector) 220-Tx may share the same field of view avoiding any occluded regions.
  • EVS 110 may forward motion data to control circuitry 230 (e.g., one or more DSPs, ASICs, FPGAs, etc.).
  • Control circuitry 230 may determine the areas 112 of relative motion based on the motion data received from EVS 110 and based on certain predefined threshold values, for example.
  • Control signals based on the areas 112 of relative motion may then be fed back to the laser point projector 220-Tx to adapt its scanning frequency accordingly. That is, laser point projector 220-Tx may apply a higher scanning frequency for the areas 112 of relative motion whereas a lower scanning frequency may be applied for the remainder of scene 114.
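  • A schematic sketch of such a feedback loop (the tile size, event-count threshold and scan-step values are illustrative assumptions; the actual control signals of circuitry 230 are not specified here):

```python
from collections import defaultdict

def plan_scan_density(motion_events, tile=32, event_threshold=20,
                      dense_step=1, sparse_step=16):
    """Derive a per-tile scan step for the projector from EVS motion events.

    Tiles that accumulated enough events are scanned densely (step 1, i.e. every
    pixel); tiles absent from the returned map default to the sparse step.
    """
    counts = defaultdict(int)
    for ev in motion_events:                       # events carrying x, y coordinates
        counts[(ev.x // tile, ev.y // tile)] += 1
    return {t: (dense_step if n >= event_threshold else sparse_step)
            for t, n in counts.items()}
```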
  • depth detector 220 comprises a second EVS 220-Rx sensitive to a wavelength (e.g., IR) and pulse/scanning frequency of laser point projector 220-Tx.
  • Second EVS 220-Rx may be used to compute depth information based on triangulating illumination events of laser point projector 220-Tx and detected events of the second EVS 220-Rx.
  • Laser point projector 220-Tx illuminates the scene 114 whose surface is to be measured at an angle.
  • EVS 220-Rx registers the scattered light corresponding to events (motion).
  • From the known measurement geometry, the distance from the (moving) object to the EVS 220-Rx can be determined.
  • The connection between EVS 220-Rx and light source 220-Tx, as well as the two beams from and to the (moving) object 112, form a triangle, hence the name triangulation. If the procedure is carried out in a raster-like or continuously moving manner, the surface relief can be determined with high accuracy.
  • This computation of depth information 122 may be performed via one or more digital signal processing circuits which may be implemented together with control circuitry 230. After calibrating this system of EVS and projector, depth may be estimated by triangulating corresponding points on the time surface of the second EVS 220-Rx and the projector 220-Tx.
  • Fig. 2B illustrates an embodiment of an event-based depth measurement apparatus 250 where the positions of the second EVS 220-Rx and the projector 220-Tx have been switched without deviating from the working principle.
  • optical splitter 240 is configured to provide the same field of view of the scene 114 for the first EVS 110 for motion detection and the second EVS 220-Rx for depth estimation.
  • EVS 310 comprises a first set of pixels 312 configured to detect the one or more areas of motion and a second set of pixels 314 sensitive to a wavelength (e.g., in the IR region) and pulse frequency of laser point projector 220-Tx.
  • processing circuitry 230 may be used to compute depth information based on triangulating illumination events of the laser point projector 220-Tx and detected events 324 of the second set of pixels 314.
  • a single EVS sensor 310 with hybrid IR-coated pixels 314 and uncoated pixels 312 or differently biased pixels may be used instead of the two EVS 110 and 220-Rx in Fig. 2
  • Fig. 3A shows a schematic top view of EVS 310
  • Fig. 3B illustrates a sectional view of EVS 310 comprising an upper layer implementing photodiodes of the respective pixels 312, 314 and a lower layer implementing respective read-out-circuitry on a common semiconductor substrate.
  • Active pixels may be defined as pixels of EVS 110 that correspond to one or more moving objects 112. Assuming brightness constancy, events observed by EVS 110 correspond to edges moving either due to ego-motion or object motion.
  • The distinction between these two types of events may be simple in the case of a stationary EVS, since all the events are caused by object motion.
  • Differentiating these two motion types may not be as simple in the case of a moving setup and may typically be solved using optimization or by assuming only rotational motion using Inertial Measurement Unit (IMU) priors. Since the proposed application aims for low latency, all events (ego-motion and object motion) may be considered as equal contributors and it may not be necessary to distinguish between them.
  • Fine-tuned, faster algorithms for ego-motion segmentation may be embedded onto ASICs or FPGAs to reduce latency overhead.
  • Active pixels define the area 112 of dense scanning for depth sensor 120. By sampling a small area of pixels, the proposed system can lead to a low-power and high framerate depth sensor.
  • Depth information 122 may be estimated by triangulating pixels from the event plane and the projector plane. To this end, the geometry of the measurement setup 200 has to be known.
  • Data association may be achieved by using corresponding event timestamps from illumination source 220-Tx and EVS 220-Rx or 310.
  • If EVS 220-Rx or 310 is synchronized with projector 220-Tx, events due to the projector 220-Tx may be represented as a time surface.
  • An example time surface is shown in Fig. 4 illustrating exemplary event time surfaces with sparse sampling (a) and the proposed adaptive foveating depth (b) within a rectangular bounding box 410 surrounding an area of motion in a scene.
  • The point laser projector 220-Tx may be modeled as an inverse EVS (event camera), allowing the projector illumination to be represented as a time surface and simplifying the structured light (SL) system to a special case of a stereo event camera. Assuming a calibrated system of EVS and projector, the two time surfaces can be rectified, and depth information 122 can be computed by triangulating illumination events from the projector 220-Tx and regular events from EVS 220-Rx or 310. The projector 220-Tx traverses its pixels at precise times within the scanning period, allowing a time surface to be defined over the projector's pixel grid. Due to this similarity with the event time surface and the inverse camera behavior (where a camera perceives light, a projector emits light), one may think of the projector as an "inverse" event camera.
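  • A simplified sketch of this triangulation over rectified time surfaces (the nearest-timestamp matching along a row and the timing tolerance are assumptions; a calibrated, rectified EVS/projector pair is presumed):

```python
import numpy as np

def depth_from_time_surfaces(ts_camera, ts_projector, baseline_m, focal_px,
                             max_dt_us=50.0):
    """Triangulate depth from rectified camera and projector time surfaces.

    ts_camera, ts_projector: 2D arrays holding the latest event / illumination
    timestamp per pixel (np.nan where nothing fired). For each camera event,
    the corresponding projector pixel is taken as the one in the same rectified
    row with the closest timestamp; depth then follows from Z = b * f / d.
    """
    height, width = ts_camera.shape
    depth = np.full((height, width), np.nan)
    for y in range(height):
        row_proj = ts_projector[y]
        if np.all(np.isnan(row_proj)):
            continue
        for x_cam in range(width):
            t_cam = ts_camera[y, x_cam]
            if np.isnan(t_cam):
                continue
            dt_row = np.abs(row_proj - t_cam)
            x_proj = int(np.nanargmin(dt_row))
            if dt_row[x_proj] > max_dt_us:
                continue                      # no temporally consistent match
            d = x_cam - x_proj                # disparity in pixels
            if d > 0:
                depth[y, x_cam] = baseline_m * focal_px / d
    return depth
```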
  • processing circuitry 230 may be configured to determine a respective rectangular bounding box 410 around the detected one or more areas of motion.
  • the depth detector 120, 220 may be configured to scan areas within a respective bounding box 410 with a higher spatial resolution than outside the bounding box 410.
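  • A minimal sketch of deriving such bounding boxes from clustered motion events (the clustering itself is assumed to be given, e.g. by the background-activity filtering and clustering mentioned further below; the margin value is an assumption):

```python
def bounding_boxes(event_clusters, margin=4):
    """Compute one axis-aligned bounding box per cluster of motion events."""
    boxes = []
    for cluster in event_clusters:                 # each cluster: list of events
        xs = [ev.x for ev in cluster]
        ys = [ev.y for ev in cluster]
        boxes.append((min(xs) - margin, min(ys) - margin,
                      max(xs) + margin, max(ys) + margin))
    return boxes

def scan_densely(x, y, boxes):
    """True if pixel (x, y) falls inside a region to scan at high resolution."""
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)
```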
  • Fig. 5 summarizes the proposed concept by illustrating a flowchart of a method 500 for event-guided depth estimation.
  • Method 500 includes an act 502 of detecting one or more areas of motion 112 in a scene 114 with an EVS 110 and an act 504 of determining depth information for the detected one or more areas of motion 112 with a higher spatial resolution than for a remainder of the scene.
  • EVS 110 is tuned to detect temporal changes in the scene brightness. The resulting events effectively locate moving objects in a scene, which are associated with regions of interest.
  • A pulsed IR laser projector 220-Tx, synchronized with a second EVS sensor 220-Rx or pixel set looking at the same scene, is then activated to compute the depth of the moving objects with high density (non-redundant information), in order to save power and increase speed, while only very sparsely sampling the depth of the rest of the scene.
  • the algorithm may work as illustrated in Fig. 6.
  • An initial full scan of the scene depth may be performed with the calibrated pair of laser-sensitive EVS sensor 220-Rx and pulsed IR laser 220-Tx.
  • The threshold of such an EVS sensor 220-Rx can be made high enough that it is only sensitive to the pulsed laser 220-Tx and not to the scene motion.
  • The data from the tuned EVS sensor 110, relating only to scene motion, may be fed to the DSP 230, which can estimate one or more regions of high spatial and temporal event density, according to a selectable threshold.
  • The correlation of the data can be recognized thanks to an embedded background-activity filter and a basic clustering algorithm from the literature.
  • The projector's spatial sampling may be increased at the locations of detected motion thanks to the control signals from the DSP 230, while the rest 116 of the scene 114 is only very sparsely sampled, if at all.
  • Depth may be computed only in the areas with enough events, by the same DSP 230. A sketch of this loop is given below.
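  • The loop described above could be sketched as follows (all object interfaces, i.e. evs_motion, evs_depth, projector, dsp and their methods, are hypothetical placeholders and not an API from this disclosure):

```python
def event_guided_depth_loop(evs_motion, evs_depth, projector, dsp):
    """Schematic control loop for event-guided depth sensing (placeholder APIs)."""
    projector.full_scan()                                  # initial full-resolution scan
    yield dsp.triangulate(evs_depth.read(), projector.scan_pattern())
    while True:
        motion_events = dsp.background_activity_filter(evs_motion.read())
        clusters = dsp.cluster(motion_events)              # regions of dense event activity
        boxes = dsp.bounding_boxes(clusters)
        projector.set_pattern(dense_regions=boxes,         # foveate on detected motion
                              sparse_elsewhere=True)
        yield dsp.triangulate(evs_depth.read(), projector.scan_pattern(),
                              regions=boxes)               # depth only where events suffice
```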
  • The proposed concept of event-based depth sampling may be used for downstream tasks such as depth completion, which aims to recover dense depth maps from sparse depth measurements.
  • The depth completion task is a sub-problem of depth estimation.
  • In the sparse-to-dense depth completion problem, one wants to infer the dense depth map of a 3D scene given an RGB image and its corresponding sparse reconstruction in the form of a sparse depth map, obtained for example from a depth detector.
  • Depth completion may be performed using convolutional neural networks (CNNs).
  • The general network architecture is inspired by U-Net, which was previously used for monocular depth estimation with events and for combining events with images. Similar to Daniel Gehrig, Michelle Ruegg, Mathias Gehrig, Javier Hidalgo-Carrio, and Davide Scaramuzza, "Combining events and frames using recurrent asynchronous multimodal networks for monocular depth prediction," IEEE Robotics and Automation Letters, the proposed architecture uses skip connections and residual blocks.
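  • A compact sketch of such an encoder-decoder with a skip connection and a residual block is given below (PyTorch is an assumed framework and this toy network is not the architecture of the cited work; input spatial dimensions are assumed to be even):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))       # residual connection

class TinyDepthCompletionNet(nn.Module):
    """Toy U-Net-style net: event voxel grid (B bins) + sparse depth -> dense normalized log depth."""
    def __init__(self, event_bins=5):
        super().__init__()
        in_ch = event_bins + 1                     # event voxel channels + sparse-depth channel
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.bottleneck = ResidualBlock(64)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
                                 nn.Conv2d(32, 1, 1), nn.Sigmoid())

    def forward(self, event_voxels, sparse_depth):
        x = torch.cat([event_voxels, sparse_depth], dim=1)
        skip = self.enc(x)                         # skip-connection source
        x = self.up(self.bottleneck(self.down(skip)))
        x = torch.cat([x, skip], dim=1)            # U-Net style skip connection
        return self.dec(x)                         # normalized log-depth map in [0, 1]
```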
  • The metric depth D_m may first be converted into a normalized log depth map D ∈ [0, 1], facilitating the learning of large depth variations.
  • Events may be converted to a fixed-size tensor to utilize existing convolutional neural network architectures. Events in a time window ΔT may be drawn into a voxel grid with spatial dimensions H × W and B temporal bins. The number of bins B may be set to 5 for all experiments.
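  • A sketch of the event-to-voxel-grid conversion and the log-depth normalization described above (the bilinear weighting along the temporal axis and the depth range are common choices assumed here, not mandated by this disclosure):

```python
import numpy as np

def events_to_voxel_grid(events, height, width, num_bins=5):
    """Accumulate events into a (B, H, W) voxel grid with bilinear temporal weighting."""
    voxels = np.zeros((num_bins, height, width), dtype=np.float32)
    if not events:
        return voxels
    t0, t1 = events[0].t, events[-1].t             # events assumed sorted by timestamp
    scale = (num_bins - 1) / max(t1 - t0, 1e-9)
    for ev in events:
        tb = (ev.t - t0) * scale                   # fractional bin index
        b0 = int(tb)
        w1 = tb - b0
        voxels[b0, ev.y, ev.x] += ev.polarity * (1.0 - w1)
        if b0 + 1 < num_bins:
            voxels[b0 + 1, ev.y, ev.x] += ev.polarity * w1
    return voxels

def normalize_log_depth(depth_m, d_min=1.0, d_max=100.0):
    """Map metric depth to a normalized log-depth value in [0, 1] (assumed depth range)."""
    d = np.clip(depth_m, d_min, d_max)
    return (np.log(d) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
```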
  • The proposed event-guided depth estimation is agnostic to the depth estimation method (SL, ToF or LiDAR) and may be superior to dense sampling. It can be shown that in different scenarios like autonomous driving and indoor applications, the active pixels correspond to 10% of the pixel area on average. Dense methods may have very high event rates of 6.8 MEv/s, whereas sparse sampling may have an event rate as low as 265 kEv/s.
  • The event rate for the proposed event-guided depth is data-dependent; for the wall scene it varies between 600 kEv/s and 5 MEv/s, depending on the motion. It may be observed that with sparse sampling a lower reconstruction error may be achieved, because the event timestamps are accurate enough to produce accurate depth maps.
  • Dense sampling has the highest reconstruction error due to the noise in the event timestamps present at such high event rates. The proposed concept strikes a balance between these two approaches and achieves a low reconstruction error and a low event rate.
  • the present disclosure presents a bio-inspired event-guided depth estimation method.
  • Event-camera-based depth sampling is proposed, as events naturally encode the relative motion of the scene. It may be shown that in natural scenes like autonomous driving and indoor environments, moving edges correspond to less than 10% of the scene. Thus, the proposed setup requires the sensor to scan only 10% of the scene. This would equate to almost 90% less power consumption by the illumination source and a significant gain in the framerate of the depth sensor.
  • Example 1 is an apparatus for event-based depth measurement.
  • the apparatus comprising an event-based vision sensor configured to detect one or more areas of motion in a scene, and a depth detector configured to determine depth information for the detected one or more areas of motion with a higher spatial resolution than for a remainder of the scene.
  • In Example 2, the apparatus of Example 1 optionally further comprises processing circuitry configured to, based on information on the detected one or more areas of motion in the scene, control an illumination source of the depth detector to illuminate the detected one or more areas of motion with a higher spatial resolution.
  • In Example 3, the apparatus of Example 1 or 2 optionally further comprises an optical splitter configured to provide the same field of view of the scene for the event-based vision sensor and the depth detector.
  • In Example 4, the apparatus of any one of Examples 1 to 3 further optionally comprises processing circuitry configured to compute depth information based on sensor output signals of the depth detector.
  • In Example 5, the depth detector of any one of Examples 1 to 4 further optionally comprises a laser point projector configured to scan the detected one or more areas of motion with the higher spatial resolution than the remainder of the scene.
  • In Example 6, the depth detector of any one of Examples 1 to 5 further optionally comprises a second event-based vision sensor sensitive to a wavelength and pulse frequency of an illumination source of the depth detector.
  • The apparatus further comprises processing circuitry configured to compute depth information based on triangulating illumination events of the illumination source and detected events of the second event-based vision sensor.
  • In Example 7, the event-based vision sensor of any one of Examples 1 to 5 further optionally comprises a first set of pixels configured to detect the one or more areas of motion and a second set of pixels sensitive to a wavelength and pulse frequency of an illumination source of the depth detector.
  • The apparatus further comprises processing circuitry configured to compute depth information based on triangulating illumination events of the illumination source and detected events of the second set of pixels.
  • In Example 8, the apparatus of any one of Examples 1 to 7 further optionally comprises processing circuitry configured to determine a respective rectangular bounding box around the detected one or more areas of motion, wherein the depth detector is configured to scan areas within a respective bounding box with a higher spatial resolution than outside the bounding box.
  • In Example 9, the depth detector of any one of Examples 1 to 8 is further optionally configured to perform an initial scan of the scene with full spatial resolution and to adapt the spatial resolution for subsequent scans based on the detected one or more areas of motion in the scene.
  • Example 10 is a method for event-based depth measurement, the method comprising detecting one or more areas of motion in a scene with an event-based vision sensor, and determining depth information for the detected one or more areas of motion with a higher spatial resolution than for a remainder of the scene.
  • Examples may further be or relate to a (computer) program including a program code to execute one or more of the above methods when the program is executed on a computer, processor or other programmable hardware component.
  • steps, operations or processes of different ones of the methods described above may also be executed by programmed computers, processors or other programmable hardware components.
  • Examples may also cover program storage devices, such as digital data storage media, which are machine-, processor- or computer-readable and encode and/or contain machine-executable, processor-executable or computer-executable programs and instructions.
  • Program storage devices may include or be digital storage devices, magnetic storage media such as magnetic disks and magnetic tapes, hard disk drives, or optically readable digital data storage media, for example.
  • Other examples may also include local computer devices (e.g.
  • the computer system may comprise any circuit or combination of circuits.
  • the computer system may include one or more processors which can be of any type.
  • processor may mean any type of computational circuit, such as but not limited to a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a graphics processor, a digital signal processor (DSP), multiple core processor, a field programmable gate array (FPGA), for example, of a microscope or a microscope component (e.g. camera) or any other type of processor or processing circuit.
  • circuits that may be included in the computer system may be a custom circuit, an application-specific integrated circuit (ASIC), or the like, such as, for example, one or more circuits (such as a communication circuit) for use in wireless devices like mobile telephones, tablet computers, laptop computers, two-way radios, and similar electronic systems.
  • the computer system may include one or more storage devices, which may include one or more memory elements suitable to the particular application, such as a main memory in the form of random access memory (RAM), one or more hard drives, and/or one or more drives that handle removable media such as compact disks (CD), flash memory cards, digital video disk (DVD), and the like.
  • the computer system may also include a display device, one or more speakers, and a keyboard and/or controller, which can include a mouse, trackball, touch screen, voice-recognition device, or any other device that permits a system user to input information into and receive information from the computer system.
  • Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a processor, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
  • embodiments of the invention can be implemented in hardware or in software.
  • The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • the program code may, for example, be stored on a machine readable carrier.
  • an embodiment of the present invention is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • a further embodiment of the present invention is, therefore, a storage medium (or a data carrier, or a computer-readable medium) comprising, stored thereon, the computer program for performing one of the methods described herein when it is performed by a processor.
  • The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
  • a further embodiment of the present invention is an apparatus as described herein comprising a processor and the storage medium.
  • a further embodiment of the invention is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • the data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
  • a further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
  • a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
  • a further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver.
  • the receiver may, for example, be a computer, a mobile device, a memory device or the like.
  • the apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
  • In some embodiments, a programmable logic device (for example, a field programmable gate array) may cooperate with a microprocessor in order to perform one of the methods described herein.
  • the methods are preferably performed by any hardware apparatus. It is further understood that the disclosure of several steps, processes, operations or functions disclosed in the description or claims shall not be construed to imply that these operations are necessarily dependent on the order described, unless explicitly stated in the individual case or necessary for technical reasons. Therefore, the previous description does not limit the execution of several steps or functions to a certain order.
  • a single step, function, process or operation may include and/or be broken up into several sub-steps, - functions, -processes or -operations.
  • aspects described in relation to a device or system should also be understood as a description of the corresponding method.
  • a block, device or functional aspect of the device or system may correspond to a feature, such as a method step, of the corresponding method.
  • aspects described in relation to a method shall also be understood as a description of a corresponding block, a corresponding element, a property or a functional feature of a corresponding device or a corresponding system.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Electromagnetism (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Optics & Photonics (AREA)
  • Multimedia (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The present disclosure relates to an apparatus for event-based depth measurement, the apparatus comprising an event-based vision sensor configured to detect one or more areas of motion in a scene, and a depth detector configured to determine depth information for the detected one or more areas of motion with a higher spatial resolution than for a remainder of the scene.
PCT/EP2022/077355 2021-10-08 2022-09-30 Apparatuses and methods for event guided depth estimation WO2023057343A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP21201703.2 2021-10-08
EP21201703 2021-10-08

Publications (1)

Publication Number Publication Date
WO2023057343A1 (fr) 2023-04-13

Family

ID=78087063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/077355 WO2023057343A1 (fr) 2021-10-08 2022-09-30 Apparatuses and methods for event guided depth estimation

Country Status (1)

Country Link
WO (1) WO2023057343A1 (fr)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190279379A1 (en) * 2018-03-09 2019-09-12 Samsung Electronics Co., Ltd. Method and apparatus for performing depth estimation of object
US20200223434A1 (en) * 2020-03-27 2020-07-16 Intel Corporation Methods and devices for detecting objects and calculating a time to contact in autonomous driving systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DANIEL GEHRIG; MICHELLE RUEGG; MATHIAS GEHRIG; JAVIER HIDALGO-CARRIO; DAVIDE SCARAMUZZA: "Combining events and frames using recurrent asynchronous multimodal networks for monocular depth prediction", IEEE Robotics and Automation Letters
GUILLERMO GALLEGO ET AL: "Event-based Vision: A Survey", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 17 April 2019 (2019-04-17), XP081607933 *

Similar Documents

Publication Publication Date Title
CN108370438B (zh) Range-gated depth camera assembly
US11209528B2 (en) Time-of-flight depth image processing systems and methods
TWI709943B (zh) Depth estimation device, autonomous vehicle, and depth estimation method thereof
US9576375B1 (en) Methods and systems for detecting moving objects in a sequence of image frames produced by sensors with inconsistent gain, offset, and dead pixels
CN109903324B (zh) Depth image acquisition method and apparatus
US20190147624A1 (en) Method for Processing a Raw Image of a Time-of-Flight Camera, Image Processing Apparatus and Computer Program
US20180350087A1 (en) System and method for active stereo depth sensing
US11670083B2 (en) Vision based light detection and ranging system using dynamic vision sensor
Muglikar et al. Event guided depth sensing
Nishimura et al. Disambiguating monocular depth estimation with a single transient
US20220099836A1 (en) Method for measuring depth using a time-of-flight depth sensor
US20220028102A1 (en) Devices and methods for determining confidence in stereo matching using a classifier-based filter
US20230403385A1 (en) Spad array for intensity image capture and time of flight capture
CN112740065B (zh) Imaging device, method for imaging, and method for depth mapping
US11985433B2 (en) SPAD array for intensity image sensing on head-mounted displays
JP2017134561A (ja) Image processing apparatus, imaging apparatus, and image processing program
WO2023057343A1 (fr) Apparatuses and methods for event guided depth estimation
US11539895B1 (en) Systems, methods, and media for motion adaptive imaging using single-photon image sensor data
Vaida et al. Automatic extrinsic calibration of LIDAR and monocular camera images
WO2022195954A1 (fr) Detection system
CN118119966A (zh) System and method for obtaining a dark current image
US20240230910A9 (en) Time-of-flight data generation circuitry and time-of-flight data generation method
US20240144502A1 (en) Information processing apparatus, information processing method, and program
US20240161319A1 (en) Systems, methods, and media for estimating a depth and orientation of a portion of a scene using a single-photon detector and diffuse light source
TWI753344B (zh) Hybrid depth estimation system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22789239

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022789239

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022789239

Country of ref document: EP

Effective date: 20240508