WO2023052264A1 - Light-field camera, vision system for a vehicle, and method for operating a vision system for a vehicle - Google Patents

Light-field camera, vision system for a vehicle, and method for operating a vision system for a vehicle

Info

Publication number
WO2023052264A1
Authority
WO
WIPO (PCT)
Prior art keywords
light
field camera
vision system
scene
processing circuitry
Prior art date
Application number
PCT/EP2022/076528
Other languages
French (fr)
Inventor
Mattia ROSSI
Piergiorgio Sartor
Original Assignee
Sony Group Corporation
Sony Europe B.V.
Priority date
Filing date
Publication date
Application filed by Sony Group Corporation, Sony Europe B.V. filed Critical Sony Group Corporation
Publication of WO2023052264A1 publication Critical patent/WO2023052264A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/204 Image signal generators using stereoscopic image cameras
    • H04N 13/207 Image signal generators using stereoscopic image cameras using a single 2D image sensor
    • H04N 13/232 Image signal generators using stereoscopic image cameras using a single 2D image sensor using fly-eye lenses, e.g. arrangements of circular lenses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/95 Computational photography systems, e.g. light-field imaging systems
    • H04N 23/957 Light-field or plenoptic cameras or camera modules

Definitions

  • Examples relate to a light-field camera, a vision system for a vehicle, and a method for operating a vision system for a vehicle.
  • a conventional light-field camera may capture a direction of light rays emanating from a scene. This enables, amongst other things, a variable depth of field and three-dimensional imaging.
  • capturing a light field presents many technical challenges, such as with respect to energy consumption, latency, dynamic range, or frame-rate. This may be limiting, in particular, for applications in dynamic environments or under challenging lighting conditions such as automotive scenarios.
  • the present disclosure provides a light-field camera comprising at least one image sensor and at least one optical device.
  • the at least one optical device is configured to receive light emanating from a scene at different perspectives and to direct the light to the at least one image sensor.
  • the at least one image sensor is a dynamic vision sensor.
  • the present disclosure provides a vision system for a vehicle.
  • the vision system comprises a light-field camera as described herein.
  • the scene is an environment of the vehicle.
  • the vision system further comprises processing circuitry configured to generate at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera.
  • the vision system further comprises a display configured to simultaneously display the at least two output image streams.
  • the present disclosure provides a method of operating a vision system for a vehicle.
  • the vision system comprises a light-field camera as described herein, processing circuitry and a display.
  • the method comprises capturing an environment of the vehicle by the light-field camera.
  • the method further comprises generating at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera.
  • the method further comprises simultaneously displaying the at least two output image streams on the display.
  • the present disclosure provides a vision system for a gaming console.
  • the vision system comprises a light-field camera as described herein.
  • the scene comprises a player of the gaming console.
  • the vision system further comprises processing circuitry configured to use output data of the light-field camera as input to a gaming program.
  • the present disclosure provides a vision system for a robot.
  • the vision system comprises a light-field camera as described herein.
  • the scene is an environment of the robot.
  • the vision system further comprises processing circuitry configured to control an operation of the robot based on output data of the light-field camera.
  • Fig. 1-3 illustrate examples of a light-field camera
  • Fig. 4 illustrates an example of a vision system for a vehicle
  • Fig. 5a, b illustrate an exemplary passive or active vision system for a vehicle
  • Fig. 6 illustrates an exemplary process for converting output data of a light-field camera into output image streams
  • Fig. 7 illustrates a flow chart of an example of a method of operating a vision system for a vehicle.
  • Fig. 8 illustrates an example of a vision system for a gaming console or for a robot.
  • Fig. 1 illustrates an example of a light-field camera 100.
  • the light-field camera 100 includes an optical device 110 and an image sensor 120.
  • the optical device 110 is configured to receive light emanating from a scene at different perspectives and to direct the light to the image sensor 120.
  • the image sensor 120 is a dynamic vision sensor (DVS).
  • the optical device may include at least one of a lens, a mirror, an optical filter, an aperture, a diffractive element.
  • the term “perspective” may be understood as a position or viewpoint at which an optical device receives light from the scene.
  • the perspective denotes a certain view on the scene.
  • the perspective may determine a distance between the optical device and the scene and/or a camera angle of the optical device relative to objects in the scene.
  • the DVS captures light intensity (brightness) changes of the light received from the optical device 110 over time.
  • the DVS includes pixels operating independently and asynchronously, detecting the light intensity changes as they occur, and staying silent otherwise.
  • the pixels may generate electrical signals, called events, which indicate per-pixel light intensity changes by a predefined threshold.
  • the DVS is an example for an event-based image sensor.
  • Each pixel may include a photo-sensitive element exposed to the light received from the optical device 110.
  • the received light may cause a photocurrent in the photo-sensitive element depending on a value of light intensity of the received light.
  • a difference between a resulting output voltage and a previous voltage reset-level may be compared against the predefined threshold.
  • a circuit of the pixel may include comparators with different bias voltages for an ON- and an OFF-threshold.
  • the comparators may compare the output voltage against the ON- and the OFF-threshold.
  • the ON- and the OFF-threshold may correspond to a voltage level higher or lower than the voltage reset-level by the predefined threshold, respectively.
  • an ON- or an OFF-event, respectively, may be communicated to a periphery of the DVS.
  • the voltage reset-level may be newly set to the output voltage that triggered the event. In this manner, the pixel may log a light-intensity change since a previous event.
  • the periphery of the DVS may include a readout circuit to associate each event with a time stamp and pixel coordinates of the pixel that recorded the event.
  • a series of events captured by the DVS at a certain perspective and over a certain time may be considered an event stream.
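  • As a hedged illustration of the event-generation logic described above, the following Python sketch models a single DVS pixel (the class and parameter names are hypothetical and not taken from the disclosure): the pixel compares its current output voltage against an ON- and an OFF-threshold around the last voltage reset-level, emits a time-stamped event when a threshold is crossed, updates the reset-level, and stays silent otherwise.

      from dataclasses import dataclass

      @dataclass
      class Event:
          x: int          # pixel column
          y: int          # pixel row
          t: float        # time stamp assigned by the readout circuit
          polarity: int   # +1 for an ON-event, -1 for an OFF-event

      class DvsPixelModel:
          """Simplified, illustrative model of one DVS pixel."""

          def __init__(self, x, y, threshold=0.2, initial_voltage=0.0):
              self.x, self.y = x, y
              self.threshold = threshold          # predefined threshold
              self.reset_level = initial_voltage  # previous voltage reset-level

          def update(self, voltage, t):
              """Compare the output voltage against the ON-/OFF-thresholds."""
              diff = voltage - self.reset_level
              if diff >= self.threshold:          # ON-threshold crossed
                  self.reset_level = voltage      # new reset-level
                  return Event(self.x, self.y, t, +1)
              if diff <= -self.threshold:         # OFF-threshold crossed
                  self.reset_level = voltage
                  return Event(self.x, self.y, t, -1)
              return None                         # pixel stays silent
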
  • the light-field camera 100 may capture light of any wavelength, e.g., visible light and/or infrared light.
  • the light-field camera 100 may include a different number of image sensors than shown in Fig. 1, of which at least one is a DVS.
  • the light-field camera 100 may include a different number of optical devices than shown in Fig. 1.
  • the light-field camera 100 may include N image sensors, with N > 1, and M optical devices, with M > 1.
  • the light-field camera 100 may capture events based on light emanating from a scene and additionally capture (an absolute value of) a light intensity and/or a color (i.e., a wavelength) of light emanating from the scene. This may be advantageous for reconstructing an image of the scene.
  • the light-field camera 100 may include at least one conventional (i.e., frame-based) image sensor capturing the light intensity and/or color of the light.
  • at least one of the DVS of the light-field camera 100 may include, besides event-capturing pixels, one or more pixels measuring the light intensity and/or the color of the light (e.g. the one or more pixels may be operated in a frame-based manner). For instance, a certain percentage (e.g., 10%, 15%, or 20%) of pixels of the DVS may be for capturing the light intensity and/or the color of the light.
  • the DVS of the light-field camera 100 may include one or more pixels capable of capturing a light intensity change (event) in the light emanating from the scene and an absolute value of the light intensity of the light.
  • a pixel of the DVS may, e.g., include two circuits of which a first one reports the events and a second one reports the light intensity (and/or color of the light) based on a photocurrent of a common photo-sensitive element. Since the first circuit does not necessarily “consume” the photocurrent when measuring the event, the second circuit may measure the light intensity at the same time, i.e., the two circuits may operate independently.
  • a control circuit may change between a first operation mode of the pixel and a second operation mode of the pixel.
  • In the first operation mode, the photo-sensitive element may be connected to the first circuit.
  • In the second operation mode, the photo-sensitive element may be connected to the second circuit.
  • the first operation mode may enable the pixel to measure events.
  • the second operation mode may enable the pixel to measure the light intensity and/or the color of the light.
  • the light-field camera 100 may be understood as a plenoptic DVS camera, i.e., a camera capturing events based on light emanating from a scene at different perspectives. This may be advantageous since a so-called light field may be inferred from output data of the light-field camera.
  • the light field may be a vector function describing a direction of light rays in a three-dimensional space. For instance, based on the output data, a direction of a light ray which triggered an event captured by a DVS of the light-field camera 100 may be determined.
  • the light-field camera 100, unlike a camera with a single fixed viewpoint, may provide information about a geometry of the scene.
  • the light-field camera may maintain a form factor of a single camera while behaving like an array of multiple cameras.
  • the light-field camera 100 includes at least one DVS. This may be advantageous since the light-field camera 100 may have a lower energy consumption, a lower latency, a higher dynamic range, and a higher frame-rate than a conventional light-field camera.
  • the light-field camera 100 may capture the light-field of the scene at high temporal resolution and high dynamic range in comparison to conventional frame-based light-field cameras. This may be advantageous in scenarios where visual stimuli are used for control tasks requiring low latency and resilience to challenging lighting conditions, such as in automotive scenarios.
  • the control tasks carried out by an agent, either artificial or human, may additionally benefit from an understanding of a (three-dimensional) geometry of a surrounding environment (which may be inferred from the light-field).
  • Embodiments of the present disclosure may enable, on the one hand, high temporal resolution and dynamic range imaging based on the DVS technology and, on the other hand, the capability of capturing a distance of objects based on the plenoptic camera technology.
  • Fig. 2 illustrates another example of a light-field camera 200.
  • the light-field camera 200 includes two optical devices, a first optical device 210-1 and a second optical device 210-2.
  • the two optical devices 210-1 and 210-2 face a scene 230 and are arranged in one line parallel to the scene 230.
  • the first optical device 210-1 is configured to receive light emanating from the scene 230 at a first perspective 215-1.
  • the second optical device 210-2 is configured to receive light emanating from the scene 230 from a second perspective 215-2.
  • the two optical devices 210-1, 210-2 are configured to receive the light emanating from the scene 230 at a respective perspective.
  • the first perspective 215-1 is located on a surface of the first optical device 210-1 where chief rays of the light emanating from the scene 230 impinge on the first optical device 210-1.
  • the second perspective 215-2 is located on a surface of the second optical device 210-2 where chief rays of the light emanating from the scene 230 impinge on the second optical device 210-2.
  • the light-field camera 200 further includes two image sensors, a first image sensor 220-1 and a second image sensor 220-2.
  • the first image sensor 220-1 and/or the second image sensor 220-2 is a dynamic vision sensor.
  • the first image sensor 220-1 is placed in front of the first optical device 210-1.
  • the second image sensor 220-2 is placed in front of the second optical device 210-2.
  • the first optical device 210-1 is configured to direct received light to the first image sensor 220-1.
  • the second optical device 210-2 is configured to direct received light to the second image sensor 220-2.
  • each of the two optical devices 210-1, 210-2 is configured to direct the light to a respective one of the two image sensors 220-1, 220-2.
  • the first optical device 210-1 and the second optical device 210-2 may deflect, focus, or diffract the received light onto a photo-sensitive area of the first image sensor 220-1 and a photo-sensitive area of the second image sensor 220-2, respectively.
  • the first image sensor 220-1 captures light emanating from the scene 230 at the first perspective 215-1.
  • the second image sensor 220-2 captures light emanating from the scene 230 at the second perspective 215-2.
  • Two optical paths 240-1, 245-1 from the scene 230 to the first optical device 210-1 and two optical paths 240-2, 245-2 from the scene 230 to the second optical device 210-2 are illustrated.
  • the optical paths 240-1, 245-1, 240-2, 245-2 are to be understood as exemplary light rays emanating from the scene 230.
  • Other optical paths may run from the scene 230 to the first optical device 210-1 within a field of view of the first optical device 210-1 or from the scene 230 to the second optical device 210-2 within a field of view of the second optical device 210-2.
  • the field of view of the first optical device 210-1 may at least partly overlap the field of view of the second optical device 210-2.
  • a scene 230 shown in Fig. 2 is meant as an example.
  • a scene may show objects in a field of view of a light-field camera and light rays may emanate from the objects.
  • the light-field camera 200 may include a different number of image sensors than shown in Fig. 2, of which at least one is a DVS.
  • the light-field camera 200 may include a different number of optical devices than shown in Fig. 2.
  • the light-field camera 200 may include X image sensors, with X > 2, and Y optical devices, with Y > 2.
  • a light-field camera according to the present disclosure may include N image sensors, with N > 1, and M optical devices, with M > 1.
  • one of the first image sensor 220-1 and the second image sensor 220-2 may be omitted.
  • one of the first optical device 210-1 and the second optical device 210-2 may be omitted.
  • the light-field camera 200 may capture the light emanating from the scene 230 at more than the two perspectives shown in Fig. 2.
  • the light-field camera 200 may capture the light at n > 2 perspectives.
  • the light-field camera 200 may include at least one optical device that directs light to more than one respective image sensor. Thus, one view of a scene may be captured by more than one image sensor.
  • the light-field camera 200 may additionally or alternatively include at least two optical devices that direct light to one image sensor. Thus, one image sensor may capture more than one view of the scene.
  • the light-field camera 200 may include optical devices and image sensors which are differently arranged than shown in Fig. 2.
  • the optical devices may be arranged in a two-dimensional array, e.g., in a curved shape.
  • the optical devices and/or the image sensors may be (partly) offset from each other relative to the scene.
  • the optical devices and/or the image sensors may be arranged with regular or irregular distances to each other.
  • the light-field camera 200 may include optical devices of different shapes and types, e.g., convex and concave lenses.
  • the light-field camera 200 may include image sensors of different types, e.g., active-pixel sensors, passive-pixel sensors, charge-coupled devices.
  • Fig. 3 illustrates another example of a light-field camera 300 according to an embodiment.
  • the light-field camera 300 includes a DVS 320 placed in parallel to an array 310 of microlenses.
  • the array 310 of microlenses is configured to direct light emanating from a scene 330 to different subsets of a plurality of pixels of the DVS 320.
  • the array 310 of microlenses is placed between the DVS 320 and a convex lens 316 facing the scene 330.
  • the lens 316 receives light emanating from the scene 330 at different perspectives.
  • the different perspectives may be located on a respective sub-aperture of the lens 316.
  • the sub-apertures may divide the lens 316 into different sections along an axis 318 of the lens 316, e.g., a first section 315-1, a second section 315-2, and a third section 315-3.
  • the axis 318 may be a symmetry axis of the lens 316 and perpendicular to an optical axis of the lens 316.
  • the array 310 of microlenses includes six microlenses, such as microlens 312 and microlens 314.
  • Microlenses, such as the microlenses 312, 314, may be lenses with a diameter of less than a millimeter, e.g., smaller than 10 micrometers.
  • a microlens may be, e.g., a single optical element including a plane surface and a spherical convex or aspherical surface to refract light.
  • the microlens may include several layers of optical material, e.g., two flat surfaces and several parallel surfaces with different refractive indexes (gradient-index lens).
  • microlenses may be Fresnel lenses or binary-optic lenses or a combination of the aforementioned types.
  • Multiple microlenses may be formed into a one-dimensional or two-dimensional array on a supporting substrate, such as the array 310.
  • optical paths 342-1, 342-2, 342-3, 344-1, 344-2, and 344-3 are illustrated.
  • the optical paths 342-1, 342-2, 342-3 start on a first area 332 of the scene 330, proceed in different directions and impact the lens 316 on the first section 315-1, the second section 315-2, and the third section 315-3, respectively.
  • the optical paths 344-1, 344-2, 344-3 start on a second area 334 of the scene 330, proceed in different directions and impact the lens 316 on the first section 315-1, the second section 315-2, and the third section 315-3, respectively.
  • optical paths 342-1, 342-2, 342-3, 344-1, 344-2, and 344-3 are to be understood as exemplary light rays emanating from the scene 330. Other optical paths may run from the scene 330 to the lens 316.
  • the lens 316 directs the light emanating from the scene 330 to the array 310 of microlenses.
  • Light rays of the optical paths 342-1, 342-2, 342-3 impinge on the microlens 314, and light rays of the optical paths 344-1, 344-2, 344-3 impinge on the microlens 312.
  • Each of the microlenses receives light from a respective area of the scene 330.
  • the microlens 312 receives light from the area 334
  • the microlens 314 receives light from the area 332.
  • the DVS 320 includes a plurality of pixels.
  • the pixels are arranged in one row and in parallel to the array 310 of microlenses.
  • Each of the microlenses of the array 310 directs the light impinging on the respective microlens to a respective subset of the pixels.
  • the microlens 312 directs the light to pixels 352-1, 352-2, 352-3, and the microlens 314 directs the light to pixels 354-1, 354-2, 354-3.
  • Light rays of the optical paths 342-1, 342-2, 342-3 may be captured by the pixels 354-1, 354-2, 354-3, respectively.
  • Light rays of the optical paths 344-1, 344-2, 344-3 may be captured by the pixels 352-1, 352-2, 352-3, respectively.
  • each subset of pixels receives light from a respective microlens and, consequently, from a respective area of the scene 330.
  • each pixel of a subset receives light of the respective area at a respective perspective.
  • the pixels of a subset receive light of the respective area at different perspectives.
  • the pixel 352-1 receives light from the area 334 which passes the first section 315-1 of the lens 316.
  • the light-field camera 300 may generate output data based on electric signals triggered by light impacting the DVS 320. By combining data of pixels associated to one perspective, a view of the scene 330 at the said perspective may be reconstructed.
  • a plenoptic DVS camera as disclosed herein may be equipped with an array of microlenses placed in front of an image sensor of the plenoptic DVS camera. Each microlens may be placed in front of an a x b array (subset) of pixels of the image sensor. Pixels of a certain row and column over all arrays may be associated to one view. Thus, the plenoptic DVS camera may capture a x b event streams of the same scene, each from a different camera angle and with a resolution determined by a number of the microlenses.
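  • As a hedged sketch of the pixel-to-view association described above (assuming an a x b block of pixels behind each microlens; the function and variable names are hypothetical), the events of a plenoptic DVS may be sorted into one event stream per view: the position of a pixel within its microlens block selects the view, and the microlens index gives the pixel coordinate within that view.

      from collections import defaultdict

      def split_into_views(events, a, b):
          """Sort plenoptic DVS events into per-view event streams.

          events: iterable of (x, y, t, polarity) tuples in sensor coordinates.
          a, b:   height and width of the pixel block behind each microlens.
          Returns a dict mapping a view index (u, v) to a list of events whose
          pixel coordinates are given at the resolution of the microlens array.
          """
          views = defaultdict(list)
          for x, y, t, polarity in events:
              u, v = y % a, x % b          # position within the block -> view
              row, col = y // a, x // b    # microlens index -> pixel in that view
              views[(u, v)].append((col, row, t, polarity))
          return views
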
  • Since the light-field camera 300 shown in Fig. 3 captures light at several perspectives based on one optical device (e.g., the array 310 of microlenses) and one image sensor (e.g., the DVS 320), it may be more compact than in other embodiments.
  • the light-field camera 300 may include a DVS with a different number of pixels than shown in Fig. 3.
  • the DVS may include any number > 2 of pixels.
  • the DVS may include a two-dimensional array of c x d pixels.
  • One or more pixels of the DVS may be pixels measuring a light intensity and/or a color of the light. Some of the pixels may be part of more than one subset of pixels related to a respective microlens.
  • the light-field camera 300 may include a different number of image sensors, arrays of microlenses, or lenses than shown in Fig. 3.
  • the light-field camera 300 may include any number > 1 of image sensors, any number > 1 of arrays of microlenses, and any number > 0 of lenses. All image sensors may be DVSs. In other examples, one or more of the image sensors may be DVSs and one or more (but not all) of the image sensors may be frame-based image sensors.
  • the light-field camera 300 may provide a different number of perspectives than shown in Fig. 3. For instance, the light-field camera 300 may include a lens which is divided into any number > 2 of sections.
  • the light-field camera 300 may include an array of microlenses which include a different number of microlenses than shown in Fig. 3. The array may be a two-dimensional arrangement of r x s microlenses.
  • an arrangement of a lens, an image sensor, and/or an optical device of the light-field camera 300 may differ from the one shown in Fig. 3.
  • the lens may be placed between the array of microlenses and the image sensor of the light-field camera 300.
  • Fig. 4 illustrates an example of a vision system 400 for a vehicle.
  • the vehicle may be any apparatus for transporting people or cargo.
  • the vehicle may comprise wheels, nozzles, or propellers driven by an engine (and optionally a powertrain system).
  • the vehicle may be an automobile, a truck, a motorcycle, or a tractor.
  • Embodiments of the present disclosure further relate to a vehicle comprising the vision system 400.
  • the vision system 400 includes a light-field camera 410 according to embodiments of the present disclosure, such as the light-field camera 100, 200, or 300 described above.
  • the light-field camera 410 captures light emanating from a scene 420 at different capturing perspectives.
  • the capturing perspectives refer to views of the scene which are captured by the light-field camera 410.
  • the scene 420 is an environment of the vehicle.
  • the environment of the vehicle may be, e.g., a space in a surrounding of the vehicle (outside the vehicle).
  • the light-field camera 410 may capture light emanating in front, at a side (e.g. left or right side) and/or behind the vehicle.
  • the vision system 400 further includes a processing circuitry 430.
  • the processing circuitry 430 is communicatively coupled to the light-field camera 410, e.g., for data communication.
  • the processing circuitry 430 may receive output data of the light-field camera 410 via any data communication channel (e.g., wired and/or wireless).
  • the processing circuitry 430 may, for instance, be part of the light-field camera 410 or be separate from the light-field camera 410 (e.g., be part of an electronic control module of the vehicle).
  • the processing circuitry 430 is configured to generate at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera 410.
  • the output data of the light-field camera 410 may include sensor data of an image sensor of the light-field camera 410, such as event streams of a DVS.
  • the perspectives refer to views which the output image streams represent.
  • the perspectives may (partly) correspond to the capturing perspectives or completely differ from the capturing perspectives. A number of the perspectives may be smaller, equal, or higher than a number of the capturing perspectives.
  • the output image streams may be image streams which represent a series of images (frames) which are assigned to a chronological order in which the images may be displayed.
  • the output image streams may, e.g., be a digital video.
  • each of the output image streams may show the environment of the vehicle (e.g., a rear view or a side view) at a respective perspective.
  • the processing circuitry 430 may, for instance, be a single dedicated processor, a single shared processor, or a plurality of individual processors (some or all of which may be shared), digital signal processor (DSP) hardware, an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA).
  • the processing circuitry 430 may optionally be coupled to, e.g., read only memory (ROM) for storing software, random access memory (RAM) and/or non-volatile memory.
  • the processing circuitry 430 is configured to perform at least part of processes described herein, such as generating the at least two output image streams. Exemplary processes of how the processing circuitry 430 may generate the output image streams are explained below with reference to Fig. 6.
  • the vision system 400 further includes a display 440 configured to display the output image streams simultaneously, i.e., at the same time.
  • the processing circuitry 430 is communicatively coupled to the display 440 to transmit the output image streams to the display 440.
  • the display 440 may be any device capable of depicting the output image streams.
  • the display 440 may convey depth perception to a viewer, e.g., a driver, a passenger, or a user of the vehicle.
  • the display 440 may be a light-field display, a stereo display, a 3D- display, or the like.
  • the vision system 400 may be considered a “virtual mirror”.
  • the virtual mirror may replace a side-view (or rear-view) mirror of the vehicle by the light-field camera 410 streaming acquired content (e.g., the side view or the rear-view) to the display 440 inside a vehicle’s cockpit. Due to smaller dimensions of the light-field camera 410 compared to the side-view mirror, the virtual mirror may improve vehicle aerodynamics, increase an autonomy range of the vehicle, and decrease a noise level inside the vehicle. Using the light-field camera 410 rather than a mirror may also permit better visibility in adverse weather conditions and provide a side view (or rear view) with a wider field of view.
  • a side-view mirror may redirect an entire light field from a scene (a surrounding next to or behind the vehicle) towards the eyes of a viewer (e.g., a driver or user of the vehicle) looking onto the side-view mirror.
  • the viewer may have access to the same information accessible when watching the scene directly.
  • With a side-view mirror, a user’s left and right eye may see two different views of the same scene, as their viewpoints (perspectives) are different. This is referred to as “parallax”, which may allow the user of the vehicle to have a three-dimensional perception of the scene.
  • a side-view mirror may provide the viewer with an entire light-field coming from the scene.
  • the two views perceived by the two eyes may change when the viewer moves his/her head. This may allow the viewer to perceive a geometry of the scene more accurately, since larger parallaxes may be provided than with the limited baseline between the two eyes alone. Further, the viewer may “zoom” into the scene by moving his/her head closer to the side-view mirror, e.g., to focus on a point of interest.
  • the light field and the parallax provided by a side-view mirror may be crucial in a driving scenario, in particular, for safety reasons. For instance, when a driver of a vehicle intends to start an overtaking maneuver on a highway to change to a fast lane of the highway, the driver may need to understand the approximative distance of vehicles driving on the fast lane, in order to avoid a collision with the vehicles.
  • a conventional virtual mirror may be incapable of exhibiting the light field or the parallax of the scene since a camera of the conventional virtual mirror may capture the scene from a single fixed viewpoint with a fixed magnification. Consequently, eyes of a viewer looking onto a display of the conventional virtual mirror may see the scene from the fixed viewpoint, even when the viewer moves his/her head.
  • the conventional virtual mirror may lose a three-dimensional nature of the scene since the camera “flattens” the scene geometry such that a depth of the scene may be unperceivable.
  • a vision system such as vision system 400, may overcome the aforementioned problems of conventional virtual mirrors.
  • the proposed vision system 400 may replace a conventional display of a conventional virtual mirror setup by a display capable of displaying several views of the scene at the same time.
  • the display may show a light field derived from output data of a plenoptic DVS camera. This may provide a viewer looking onto the display with an experience similar to looking into a side-view (or rear-view) mirror.
  • the proposed vision system may deliver a respective output image stream to each of the viewer’s two eyes depending on the poses of the eyes. Thus, the two eyes may see different output image streams which may change depending on the poses of the eyes.
  • the vision system 400 includes a plenoptic DVS camera (e.g., the light-field camera 410) instead of a conventional light-field camera.
  • the plenoptic DVS camera may capture a scene’s geometric information with high temporal resolution. This may be desirable in a fast and dynamic automotive environment.
  • the plenoptic DVS camera may represent a safer solution than a conventional light-field camera since the DVS may provide a higher dynamic range.
  • the plenoptic DVS camera may cope better with sudden light changes, e.g., when entering/exiting a tunnel or when lit by other vehicles’ headlights. (When exposed to such light changes, a frame-based image sensor may be temporarily blinded.)
  • the plenoptic DVS camera may provide a higher light sensitivity at night than a conventional light-field camera.
  • the vision system 400 may be passive or active. “Passive” may refer to a visualization of as many perspectives as the display technology permits, i.e., the display may visualize a light field of the captured scene, regardless of the poses of the viewer’s eyes. In other words, the passive vision system may display the light field without information about the poses of the viewer’s eyes. The poses of the viewer’s eyes may determine which pair of output image streams may be observed by the viewer. This scenario may resemble the operation and physics of a mirror. An example of a passive vision system is further explained with reference to Fig. 5a.
  • “Active” may refer to a visualization of two output image streams showing views selected according to the poses of the viewer’s eyes.
  • the poses of the viewer’s eyes may be tracked over time and a display’s content may be adapted accordingly.
  • the display may deliver only the two output image streams that correspond to the views that the viewer’s eyes may have when seeing the scene directly or through a mirror.
  • An example of an active vision system is further explained with reference to Fig. 5b.
  • Fig. 5a illustrates an example of a vision system 500a for a vehicle.
  • the vision system 500a includes a light-field camera 410 as explained with reference to Fig. 4.
  • the light-field camera 410 is configured to capture light from a scene in an environment of the vehicle at different capturing perspectives (e.g., 4, 8, 16, 32 or more capturing perspectives).
  • the light-field camera 410 generates output data 515 based on the captured light and forwards the output data 515 to a processing circuitry 530a.
  • the processing circuitry 530a is configured to generate more than two output image streams 535 (e.g., 4, 8, 16, 32 or more output image streams) for more than two different perspectives (e.g., 4, 8, 16, 32 or more perspectives) based on the output data 515.
  • the output image streams 535 may be a high frame rate and high dynamic range light field video.
  • the processing circuitry 530a forwards the output image streams 535 to a display 540a.
  • the display 540a is configured to simultaneously display the output image streams 535.
  • the display 540a includes an array of microlenses (e.g., a lenticular lens) or a parallax barrier configured to restrict a view of a user 560 of the vehicle to a respective pair of the more than two output image streams 535 depending on poses of eyes of the user 560.
  • the display 540a may convey depth perception to the user 560.
  • the display 540a may include a conventional display, e.g., a Liquid-Crystal Display.
  • the array of microlenses or the parallax barrier may be placed in front of the conventional display.
  • the array of microlenses may receive the light of the output image streams 535 provided (output, displayed) by display pixels of the display 540a.
  • the array of microlenses may direct the light of the output image streams 535 towards a respective direction.
  • Thus, depending on the viewing direction, the array of microlenses may show a respective output image stream. This may give a three-dimensional impression to the user 560 looking at the array of microlenses from two angles, with the left eye and the right eye.
  • Each microlens of the array of microlenses may be placed in front of an e x f array of the display pixels.
  • Each display pixel of the e x f array may be for displaying a respective view.
  • a number of the microlenses may determine a resolution of the output image streams 535.
  • the display 540a may output an entire light field video.
  • the microlenses may allow the user 560 to see a pair of views of the light field video determined by poses of the user’s eyes.
  • the parallax barrier may include an opaque layer with a series of spaced slits, allowing the left eye and the right eye of the user 560 to see a respective set of pixels of the underneath conventional display.
  • the display 540a may include a different display technology, e.g., a 3D or light-field display technology.
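  • As a minimal, hedged sketch of how several output image streams could be interleaved for a microlens-based (lenticular) display, the following function assumes the common column-interleaved layout and hypothetical array shapes; the disclosure does not prescribe a particular layout.

      import numpy as np

      def interleave_views(views):
          """Column-interleave several views for a lenticular display.

          views: array of shape (n_views, height, width, 3); view k is routed
          to every n_views-th display column, so each microlens covers one
          display pixel from every view.
          """
          n_views, height, width, channels = views.shape
          display = np.empty((height, width * n_views, channels), dtype=views.dtype)
          for k in range(n_views):
              display[:, k::n_views, :] = views[k]
          return display
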
  • the vision system 500a shown in Fig. 5a may refer to a passive vision system.
  • the passive vision system may be beneficial since it may generate output image streams for any number of viewers that may look at the display 540a. Different viewers may receive different views of the scene.
  • Fig. 5b illustrates another example of a vision system 500b for a vehicle.
  • the vision system 500b includes a light-field camera 410 as explained with reference to Fig. 4.
  • the light-field camera 410 is configured to capture light from a scene in an environment of the vehicle at different capturing perspectives.
  • the light-field camera 410 generates output data 515 based on the captured light and forwards the output data 515 to a processing circuitry 530b.
  • the vision system 500b further includes an eye-tracking device 570 configured to track poses of eyes of a user 560 of the vehicle.
  • the eye tracking device 570 may, e.g., include a camera in an interior of the vehicle directed at a face of the user 560.
  • the eye tracking device 570 tracks the viewer’s eyes and estimates the viewer’s head position or poses of the viewer’s eyes.
  • the eye-tracking device 570 may determine data 575 indicating the poses of eyes (e.g., in terms of coordinates) and communicate the data 575 to the processing circuitry 530b.
  • the processing circuitry 530b is configured to determine two different perspectives corresponding to the poses of the eyes (based on the data 575).
  • the processing circuitry 530b is further configured to generate two output image streams 535 for the two different perspectives.
  • the processing circuitry 530b may determine two perspectives in accordance with the poses of eyes, thus, two perspectives which may correspond to views that the left eye and the right eye of the user 560 may have when looking at the scene directly or through a mirror.
  • the processing circuitry 530b generates the two output image streams 535, e.g., by selecting two event streams of the output data 515 which are suitable according to the estimated poses of the user’s eyes. For instance, the processing circuitry 530b may select the two event streams which best approximate (align with) the views the user’s eyes would have when directly looking at the scene or looking at the scene through a mirror. The processing circuitry 530b converts the two selected event streams into two high frame rate and high dynamic range videos (the output image streams 535). Alternatively, the processing circuitry 530b may select several event streams of the output data 515 which best approximate the view of the left eye of the user 560.
  • the processing circuitry 530b may interpolate the selected event streams to generate a first output image stream for the left eye.
  • the first output image stream may correspond to a view the left eye would have when directly looking at the scene or looking at the scene through a mirror.
  • the processing circuitry 530b may select several event streams of the output data 515 which best approximate the view of the right eye of the user 560 and generate a second output image stream for the right eye.
  • the first output image stream and the second output image stream are the two output image streams 535.
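  • The selection of the two capturing perspectives that best align with the tracked eye poses could, for instance, be approximated by a nearest-neighbour search, as in the hedged sketch below (hypothetical data layout; the disclosure does not prescribe this method).

      import numpy as np

      def select_views_for_eyes(view_positions, left_eye, right_eye):
          """Pick the capturing perspectives closest to the viewer's eyes.

          view_positions: (n_views, 3) array with the positions of the
                          capturing perspectives (one per event stream).
          left_eye, right_eye: 3-vectors with the tracked eye positions,
                          mapped into the same coordinate frame as the views.
          Returns the indices of the event streams to convert and display.
          """
          positions = np.asarray(view_positions, dtype=float)
          left_idx = int(np.argmin(np.linalg.norm(positions - left_eye, axis=1)))
          right_idx = int(np.argmin(np.linalg.norm(positions - right_eye, axis=1)))
          return left_idx, right_idx
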
  • the processing circuitry 530b communicates the two output image streams 535 to a display 540b.
  • the display 540b is configured to simultaneously display (only, exclusively) the two output image streams 535.
  • the display 540b displays the two output image streams 535 towards the user 560.
  • the vision system 500b shown in Fig. 5b may refer to an active vision system.
  • the active vision system may be beneficial since only the two output image streams 535 are generated, thus, less processing power may be needed.
  • Fig. 6 illustrates an exemplary process for converting output data 615 of a light-field camera into at least two output image streams 635.
  • the output image streams 635 may be high frame-rate and high dynamic-range image streams (i.e., a light field video).
  • the light-field camera shall be considered a light-field camera according to the present disclosure, e.g., light-field camera 410.
  • the processes may be performed by processing circuitry of a vision system, such as the processing circuitry 430.
  • the output data 615 includes at least one event stream representing a certain view of a scene.
  • the output data 615 further includes at least another event stream representing at least one other view of the scene and/or at least one auxiliary image stream representing at least one other view of the scene.
  • the light-field camera may additionally include a conventional image sensor, and/or a DVS including pixels for measuring light-intensity and/or color of light emanating from the scene.
  • Fig. 6 shows two exemplary options 620a and 620b to convert the output data 615 into image streams 630.
  • the image streams 630 represent different views of the scene corresponding to capturing perspectives represented by the output data 615.
  • the output image streams 635 may correspond to the image streams 630.
  • the image streams 630 may include the output image streams 635; thus, the output image streams 635 may be selected from the image streams 630, e.g., by determining suitable perspectives and selecting the image streams 630 accordingly.
  • the output image streams 635 may be derived from the image streams 630. For instance, if the output image streams 635 show more and/or different views than the image streams 630, views of the image streams 630 may be interpolated to generate views of the output image streams 635.
  • the output data 615 represents at least two event streams.
  • Each of the event streams represents events in the scene from a respective capturing perspective.
  • a processing circuitry of a vision system such as vision system 400, is configured to convert the at least two event streams into the at least two output image streams 635.
  • the processing circuitry may be configured to generate the at least two output image streams 635 by determining a respective intensity value for one or more pixels of a respective one of the at least two output image streams 635 based on a respective one of the at least two event streams.
  • the processing circuitry may be configured to determine the respective intensity value using a trained machine-learning model, for instance.
  • the event streams may, firstly, be mapped to image streams 630 from which the output image streams 635 are selected.
  • the mapping may be performed by using the trained machine-learning model or a machine-learning algorithm, for instance.
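  • Besides the learned mapping mentioned above, a very simple (non-learned) way to obtain intensity values from an event stream is direct integration of events in the log-intensity domain, as in the hedged sketch below; it assumes a known contrast threshold and is an illustration, not the reconstruction method of the disclosure.

      import numpy as np

      def integrate_events(events, height, width, threshold=0.2, base=None):
          """Accumulate events of one view into an intensity image (naive).

          events:    iterable of (x, y, t, polarity) tuples for one view.
          threshold: contrast threshold assumed for the DVS pixels.
          base:      optional (height, width) log-intensity image to start
                     from, e.g., derived from an auxiliary frame.
          """
          log_intensity = np.zeros((height, width)) if base is None else base.copy()
          for x, y, _t, polarity in events:
              log_intensity[y, x] += polarity * threshold
          return np.exp(log_intensity)   # back to linear intensity
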
  • Machine learning may refer to algorithms and statistical models that computer systems may use to perform a specific task without using explicit instructions, instead relying on models and inference.
  • In machine learning, instead of a rule-based transformation of data, a transformation of data may be used that is inferred from an analysis of historical and/or training data.
  • the content of event streams may be analyzed using a machine-learning model or using a machine-learning algorithm.
  • the machine-learning model may be trained using training event streams as input and training content information as output.
  • By training the machine-learning model with a large number of training event streams and/or training sequences and associated training content information (e.g., labels or annotations), the machine-learning model "learns" to recognize the content of the event streams, so the content of event streams that are not included in the training data can be recognized using the machine-learning model.
  • the same principle may be used for other kinds of data as well:
  • By training a machine-learning model using training data and a desired output, the machine-learning model "learns" a transformation between the data and the output, which can be used to provide an output based on non-training data (e.g., event data or image data) provided to the machine-learning model.
  • Machine-learning models may be trained using training input data.
  • the examples specified above use a training method called "supervised learning".
  • In supervised learning, the machine-learning model is trained using a plurality of training samples, wherein each sample may comprise a plurality of input data values and a plurality of desired output values, i.e., each training sample is associated with a desired output value.
  • the machine-learning model "learns" which output value to provide based on an input sample that is similar to the samples provided during the training.
  • semi-supervised learning may be used. In semi-supervised learning, some of the training samples lack a corresponding desired output value.
  • Supervised learning may be based on a supervised learning algorithm (e.g., a classification algorithm, a regression algorithm, or a similarity learning algorithm).
  • Classification algorithms may be used when the outputs are restricted to a limited set of values (categorical variables), i.e., the input is classified to one of the limited set of values.
  • Regression algorithms may be used when the outputs may have any numerical value (within a range).
  • Similarity learning algorithms may be similar to both classification and regression algorithms but are based on learning from examples using a similarity function that measures how similar or related two objects are. Apart from supervised or semi-supervised learning, unsupervised learning may be used to train the machine-learning model.
  • Clustering is one exemplary method to find structures in the input data. Clustering is the assignment of input data comprising a plurality of input values into subsets (clusters) so that input values within the same cluster are similar according to one or more (predefined) similarity criteria, while being dissimilar to input values that are included in other clusters.
  • Reinforcement learning is a third group of machine-learning algorithms.
  • reinforcement learning may be used to train the machine-learning model.
  • In reinforcement learning, one or more software actors (called “software agents") are trained to take actions in an environment. Based on the taken actions, a reward is calculated.
  • Reinforcement learning is based on training the one or more software agents to choose the actions such, that the cumulative reward is increased, leading to software agents that become better at the task they are given (as evidenced by increasing rewards).
  • Feature learning may be used.
  • the machine-learning model may at least partially be trained using feature learning, and/or the machine-learning algorithm may comprise a feature learning component.
  • Feature learning algorithms, which may be called representation learning algorithms, may preserve the information in their input but also transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions.
  • Feature learning may be based on principal components analysis or cluster analysis, for example.
  • the machine-learning model may at least partially be trained using anomaly detection (i.e., outlier detection), and/or the machine-learning algorithm may comprise an anomaly detection component.
  • the machine-learning algorithm may use a decision tree as a predictive model.
  • the machine-learning model may be based on a decision tree.
  • Observations about an item (e.g., a set of input values) may be represented by the branches of the decision tree, and an output value corresponding to the item may be represented by the leaves of the decision tree.
  • Decision trees may support both discrete values and continuous values as output values. If discrete values are used, the decision tree may be denoted a classification tree; if continuous values are used, the decision tree may be denoted a regression tree.
  • Association rules are a further technique that may be used in machine-learning algorithms. In other words, the machine-learning model may be based on one or more association rules.
  • Association rules are created by identifying relationships between variables in large amounts of data.
  • the machine-learning algorithm may identify and/or utilize one or more relational rules that represent the knowledge that is derived from the data.
  • the rules may e.g., be used to store, manipulate, or apply the knowledge.
  • Machine-learning algorithms are usually based on a machine-learning model.
  • the term “machine-learning algorithm” may denote a set of instructions that may be used to create, train, or use a machine-learning model.
  • the term “machine-learning model” may denote a data structure and/or set of rules that represents the learned knowledge (e.g., based on the training performed by the machine-learning algorithm).
  • the usage of a machine-learning algorithm may imply the usage of an underlying machine-learning model (or of a plurality of underlying machine-learning models).
  • the usage of a machine-learning model may imply that the machine-learning model and/or the data structure/set of rules that is the machine-learning model is trained by a machine-learning algorithm.
  • the machine-learning model may be an artificial neural network (ANN).
  • ANNs are systems that are inspired by biological neural networks, such as can be found in a retina or a brain.
  • ANNs comprise a plurality of interconnected nodes and a plurality of connections, so-called edges, between the nodes.
  • Each node may represent an artificial neuron.
  • Each edge may transmit information, from one node to another.
  • the output of a node may be defined as a (non-linear) function of its inputs (e.g., of the sum of its inputs).
  • the inputs of a node may be used in the function based on a "weight" of the edge or of the node that provides the input.
  • the weight of nodes and/or of edges may be adjusted in the learning process.
  • the training of an artificial neural network may comprise adjusting the weights of the nodes and/or edges of the artificial neural network, i.e., to achieve a desired output for a given input.
  • the machine-learning model may be a support vector machine, a random forest model or a gradient boosting model.
  • Support vector machines (i.e., support vector networks) are supervised learning models with associated learning algorithms that may be used to analyze data (e.g., in classification or regression analysis).
  • Support vector machines may be trained by providing an input with a plurality of training input values that belong to one of two categories. The support vector machine may be trained to assign a new input value to one of the two categories.
  • the machine-learning model may be a Bayesian network, which is a probabilistic directed acyclic graphical model.
  • a Bayesian network may represent a set of random variables and their conditional dependencies using a directed acyclic graph.
  • the machine-learning model may be based on a genetic algorithm, which is a search algorithm and heuristic technique that mimics the process of natural selection.
  • the machine-learning model may be trained based on self-supervised learning.
  • Self-supervised learning may be regarded as an intermediate form of supervised and unsupervised learning.
  • training data classified by humans are not necessarily required.
  • a pretext task may be solved based on pseudo-labels which may help to initialize computational weights of the machine-learning model in a way which is useful for a second step.
  • a more complex downstream task may be solved. This downstream task which computes the actual task may be performed with supervised or unsupervised learning as described above.
  • a first neural network may be a convolutional neural network for estimating an optical flow by compensating for the motion blur in input event streams.
  • a second neural network may be a recurrent convolutional network for performing image reconstruction through event-based photometric constancy.
  • the optical flow is a distribution of apparent velocities of movement in light intensity patterns which are caused by motion of a scene relative to an observer of the scene.
  • the event-based photometric constancy is a linearization of pixel-wise light intensity increments under the assumptions of Lambertian surfaces, constant illumination, and small time windows in which the light intensity increment occurs.
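  • Under these assumptions, the event-based photometric constancy is commonly written as the following linearization (a standard formulation, reproduced here for illustration, not quoted from the disclosure), where L denotes the log intensity, v the optical flow, and Delta t the small time window:

      \Delta L(\mathbf{x}, t) \approx -\nabla L(\mathbf{x}, t) \cdot \mathbf{v}(\mathbf{x}) \, \Delta t
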
  • the first neural network may estimate the optical flow based on the input event streams.
  • the second neural network may use the estimated optical flow and prior reconstructed image streams to estimate a first distribution of light intensity increments for events of the input event streams based on the event-based photometric constancy.
  • the second neural network may determine a second distribution of light intensity increments by deblurring and averaging the input events.
  • the two distributions may be compared and a difference between the two distributions may be propagated backwards to the first neural network to improve a reconstruction accuracy of the first neural network.
  • the output data 615 represents at least one event stream of the scene.
  • a processing circuitry of a vision system receives at least two auxiliary image streams. Each of the at least two auxiliary image streams shows the scene from a respective capturing perspective.
  • the processing circuitry generates the at least two output image streams 635 by increasing a frame-rate and/or a dynamic range of the at least two auxiliary image streams based on the at least one event stream.
  • the auxiliary image streams may be super-resolved based on the event streams.
  • one of the auxiliary image streams may include a first frame captured at a time t1 and a consecutive second frame captured at a later time t2.
  • Synthetic frames for a time interval between t1 and t2 may be generated by interpolating between light intensities of the first frame and the second frame.
  • a corresponding interpolation function may be determined based on events (of the event streams) that occurred in the time interval between t1 and t2.
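  • A minimal, hedged sketch of such event-guided interpolation is given below; it operates in the log-intensity domain, assumes a known contrast threshold, and uses hypothetical names rather than the exact interpolation function of the disclosure.

      import numpy as np

      def interpolate_frame(frame_t1, events, t_query, threshold=0.2, eps=1e-6):
          """Synthesize a frame between t1 and t2 from frame_t1 and events.

          frame_t1: (height, width) intensity frame captured at time t1.
          events:   iterable of (x, y, t, polarity) events with t1 <= t <= t2.
          t_query:  time of the synthetic frame, with t1 <= t_query <= t2.
          Events up to t_query are accumulated onto the log intensity of
          frame_t1 to approximate the frame at t_query.
          """
          log_frame = np.log(frame_t1.astype(float) + eps)
          for x, y, t, polarity in events:
              if t <= t_query:
                  log_frame[y, x] += polarity * threshold
          return np.exp(log_frame) - eps
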
  • a processing circuitry of the vision system is further configured to determine a distance between a light-field camera of the vision system and objects in the scene based on the output data 615.
  • the output data 615 is used to derive distance information 640.
  • the processing circuitry may adapt a field of focus of the at least two output image streams 635 based on the distance.
  • the processing circuitry may be configured to determine a difference between corresponding portions of the output data 615, wherein each of the portions relates to a respective capturing perspective, and determine the distance based on the difference.
  • the distance information 640 may improve a generation of views of the scene which are not captured by the light-field camera. For instance, in case a light-field display of the vision system has a higher resolution than the light-field camera, more views may be shown by the light-field display than represented by the output data 615. Or, in case of an active vision system, two views of the scene may be determined which are aligned with poses of eyes of a viewer. In case the two determined views are not part of the captured views, the two views may be generated synthetically with the help of the distance information 640. In other examples of the present disclosure, determining the distance to objects in the scene may be omitted.
  • One option 650a to approximate the distance to objects in the scene may be deriving the distance information 640 from the event streams of the output data 615.
  • events of the event streams may be associated to respective three-dimensional coordinates.
  • Depth estimation techniques for plenoptic cameras may be applied to the event streams, such as triangulation.
  • Light rays may be traced back to an object plane of the objects where the light rays have been emitted from. Assuming lens parameters of the light-field camera are given, a slope of each light ray in object space may be retrieved and described as a linear function.
  • linear functions of two arbitrary light rays intersecting at the object plane may be mathematically equated. A common solution for the linear functions may correspond to the distance.
  • Another depth estimation technique to associate the events to respective three-dimensional coordinates may be determining a disparity between corresponding events of different views.
  • the disparity corresponds to a difference in pixel location of the corresponding events.
  • the light-field camera may generate an output similar to an output of several event cameras arranged in an array with known distances between adjacent event cameras (baseline). Each event camera of the array captures a different view of the scene.
  • a moving point P in the scene triggers events in each of the event cameras.
  • the movement of point P leaves traces in event streams of the event cameras arranged in one row of the array. The traces are shifted due to the movement and thus create a disparity across the event streams of the different views.
  • a slope of the disparity corresponds to a speed at which P moves past the event cameras of one row.
  • the slope is in a mathematical relation with the distance between the point P and the array of cameras (see the illustrative sketch after this list).
  • the light-field camera may simplify a detection of the disparity associated with the point P and an estimation of the slope due to the sparser nature of events compared to images (frames).
  • Another depth estimation technique is to estimate the distance to objects in the scene based on an auxiliary image stream received by the vision system (e.g., option 620b).
  • An event-derived optical flow may be used to assign a distance value to events being recorded at a time between a first capturing time of a first frame of the auxiliary image stream and a second capturing time of a subsequent frame of the auxiliary image stream.
  • the optical flow may be derived, e.g., based on an adaptive block matching optical flow (ABMOF) method which computes the optical flow at pixels where brightness changes.
  • Another option 650b may be deriving the distance information 640 based on the image streams 630. Depth estimation techniques for the (multi-view, stereo) output image streams 630 may be applied.
  • the output image streams 635 may be generated. For an active vision system, additional information about poses of eyes of a viewer may be needed.
  • Fig. 7 illustrates a flow chart of an example of a method 700 of operating a vision system for a vehicle, such as the vision system 500a or 500b.
  • the vision system includes a light-field camera according to embodiments of the present disclosure, processing circuitry and a display.
  • the method 700 includes capturing 710 an environment of the vehicle by the light-field camera.
  • the method 700 further includes generating 720 at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera.
  • the method 700 further includes simultaneously displaying 730 the at least two output image streams on the display.
  • the method 700 may comprise one or more additional optional features corresponding to one or more aspects of the proposed technique, or one or more examples described above.
  • the method 700 may allow generating a light-field video with high temporal resolution and high dynamic range, which may be particularly suitable for fast dynamic scenarios with challenging lighting conditions.
  • the method 700 may be applicable to virtual mirrors and may improve a realistic depiction of a vehicle’s environment.
  • the method 700 may increase driving safety as it may simulate a rear view or side view mirror and provide a driver of the vehicle with a realistic rear view or side view.
  • the method 700 may offer comfort and safety similar to a conventional mirror while permitting better aerodynamics.
  • Fig. 8 illustrates an example of a vision system 800.
  • the vision system 800 includes a light-field camera 810 as described herein and a processing circuitry 820.
  • the vision system 800 may, e.g., be for a gaming console.
  • a scene that may be captured by the light-field camera 810 may include one or more players of the gaming console.
  • the processing circuitry 820 may use output data of the light-field camera 810 as input to a gaming program. For instance, the processing circuitry 820 may determine gestures of the players or predict movements of the players based on the output data.
  • the vision system 800 may be for a robot, e.g., the vision system 800 may be integrated into the robot.
  • a scene that may be captured by the light-field camera 810 may be an environment of the robot.
  • the processing circuitry 820 may control an operation of the robot based on the output data of the light-field camera 810.
  • the vision system 800 may be used for applications other than those mentioned above.
  • the vision system 800 may be for telescopes, microscopes, or smartphones.
  • Examples of the present disclosure may propose a plenoptic DVS camera, a compact camera system for capturing events of a scene from multiple perspectives. This may augment the measurements of a conventional event camera with additional information about a geometry of the scene. Due to a high temporal resolution and dynamic range, the plenoptic DVS camera may be particularly beneficial for fast dynamic scenarios where perceiving the scene geometry plays a crucial role, such as in automotive scenarios.
  • a light-field camera comprising: at least one image sensor; and at least one optical device configured to receive light emanating from a scene at different perspectives and to direct the light to the at least one image sensor, wherein the at least one image sensor is a dynamic vision sensor.
  • the light-field camera of (1) wherein the light-field camera comprises at least two image sensors and at least two optical devices, wherein the at least two optical devices are configured to receive the light emanating from the scene at a respective perspective, and wherein each of the at least two optical devices is configured to direct the light to a respective one of the at least two image sensors.
  • the at least one optical device comprises: an array of microlenses configured to direct the light to different subsets of a plurality of pixels of the at least one image sensor.
  • a vision system for a vehicle comprising: a light-field camera according to any one of (1) to (5), wherein the scene is an environment of the vehicle; processing circuitry configured to generate at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera; and a display configured to simultaneously display the at least two output image streams.
  • processing circuitry is further configured to generate the at least two output image streams by determining a respective intensity value for one or more pixels of a respective one of the at least two output image streams based on a respective one of the at least two event streams.
  • a method of operating a vision system for a vehicle comprising: capturing an environment of the vehicle by the light-field camera; generating at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera; and simultaneously displaying the at least two output image streams on the display.
  • a vision system for a gaming console comprising: a light-field camera according to any one of (1) to (5), wherein the scene comprises a player of the gaming console; and processing circuitry configured to use output data of the light-field camera as input to a gaming program.
  • the processing circuitry is further configured to determine a gesture of the player based on the output data.
  • a vision system for a robot comprising: a light-field camera according to any one of (1) to (5), wherein the scene is an environment of the robot; and processing circuitry configured to control an operation of the robot based on output data of the light-field camera.
  • Examples may further be or relate to a (computer) program including a program code to execute one or more of the above methods when the program is executed on a computer, processor, or other programmable hardware component.
  • steps, operations, or processes of different ones of the methods described above may also be executed by programmed computers, processors, or other programmable hardware components.
  • Examples may also cover program storage devices, such as digital data storage media, which are machine-, processor- or computer-readable and encode and/or contain machine-executable, processor-executable or computer-executable programs and instructions.
  • Program storage devices may include or be digital storage devices, magnetic storage media such as magnetic disks and magnetic tapes, hard disk drives, or optically readable digital data storage media, for example.
  • aspects described in relation to a device or system should also be understood as a description of the corresponding method.
  • a block, device or functional aspect of the device or system may correspond to a feature, such as a method step, of the corresponding method.
  • aspects described in relation to a method shall also be understood as a description of a corresponding block, a corresponding element, a property or a functional feature of a corresponding device or a corresponding system.
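As an illustration of the event-guided frame interpolation between the times t1 and t2 mentioned in the list above, the following Python sketch advances the frame captured at t1 by the events recorded up to the target time, treating every event as a fixed log-intensity step. It is a strongly simplified illustration, not part of the disclosure; all function and parameter names are hypothetical, and a practical implementation would also integrate backwards from the frame at t2 and blend both estimates.

    import numpy as np

    def synthesize_frame(frame_t1, events, t, threshold=0.2):
        """Synthesize an intermediate frame at time t (t1 <= t <= t2).

        frame_t1: intensity image captured at t1.
        events:   iterable of (timestamp, x, y, polarity) recorded between t1 and t2,
                  with polarity +1 (ON-event) or -1 (OFF-event).
        Each event is treated as a log-intensity step of +/- threshold, so the frame
        at t1 is simply advanced by all events with a timestamp up to t.
        """
        log_i = np.log(np.clip(frame_t1.astype(np.float64), 1e-6, None))
        for ts, x, y, polarity in events:
            if ts <= t:
                log_i[y, x] += polarity * threshold
        return np.exp(log_i)

    # Example: one pixel brightens through two ON-events between t1 = 0.0 and t2 = 0.01
    frame_t1 = np.ones((2, 2))
    events = [(0.002, 1, 0, +1), (0.004, 1, 0, +1), (0.008, 0, 1, -1)]
    print(synthesize_frame(frame_t1, events, t=0.005))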
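Similarly, the relation between the slope of a point's trace across the views of one row and its distance, referred to in the list above, may be illustrated by the following sketch. It assumes that corresponding events have already been associated across views and that the baseline and focal length are known; the names are illustrative, and the conversion uses the standard relation Z = f * B / d between depth, focal length, baseline and per-view disparity.

    import numpy as np

    def depth_from_view_trace(view_indices, pixel_columns, baseline, focal_px):
        """Estimate the distance of a point from its trace across one row of views.

        view_indices:  indices (0, 1, 2, ...) of the views in which the point was detected.
        pixel_columns: horizontal pixel coordinate of the point in each of these views.
        baseline:      distance between adjacent viewpoints in metres.
        focal_px:      focal length expressed in pixels.
        """
        # Fit a line to (view index, pixel column); its slope is the disparity per view.
        slope, _ = np.polyfit(np.asarray(view_indices, dtype=float),
                              np.asarray(pixel_columns, dtype=float), deg=1)
        disparity = abs(slope)                  # pixels of shift per baseline step
        return focal_px * baseline / disparity  # distance in metres

    # Example: the point shifts by roughly 4 px between adjacent views
    print(depth_from_view_trace([0, 1, 2, 3], [120.0, 124.1, 127.9, 132.0],
                                baseline=0.005, focal_px=800.0))  # approx. 1 m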

Abstract

A light-field camera is provided, including at least one image sensor and at least one optical device. The at least one optical device is configured to receive light emanating from a scene at different perspectives and to direct the light to the at least one image sensor. The at least one image sensor is a dynamic vision sensor.

Description

LIGHT-FIELD CAMERA, VISION SYSTEM FOR A VEHICLE, AND METHOD FOR OPERATING A VISION SYSTEM FOR A VEHICLE
Field
Examples relate to a light-field camera, a vision system for a vehicle, and a method for operating a vision system for a vehicle.
Background
A conventional light-field camera may capture a direction of light rays emanating from a scene. This enables, amongst other things, a variable depth of field and three-dimensional imaging. However, capturing a light field presents many technical challenges, such as with respect to energy consumption, latency, dynamic range, or frame-rate. This may be limiting, in particular, for applications in dynamic environments or under challenging lighting conditions such as automotive scenarios. Hence, there may be a demand for improved light-field cameras.
Summary
This demand is met by apparatuses and methods in accordance with the independent claims. Advantageous embodiments are addressed by the dependent claims.
According to a first aspect, the present disclosure provides a light-field camera comprising at least one image sensor and at least one optical device. The at least one optical device is configured to receive light emanating from a scene at different perspectives and to direct the light to the at least one image sensor. The at least one image sensor is a dynamic vision sensor.
According to a second aspect, the present disclosure provides a vision system for a vehicle. The vision system comprises a light-field camera as described herein. The scene is an environment of the vehicle. The vision system further comprises processing circuitry configured to generate at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera. The vision system further comprises a display configured to simultaneously display the at least two output image streams.
According to a third aspect, the present disclosure provides a method of operating a vision system for a vehicle. The vision system comprises a light-field camera as described herein, processing circuitry and a display. The method comprises capturing an environment of the vehicle by the light-field camera. The method further comprises generating at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera. The method further comprises simultaneously displaying the at least two output image streams on the display.
According to a fourth aspect, the present disclosure provides a vision system for a gaming console. The vision system comprises a light-field camera as described herein. The scene comprises a player of the gaming console. The vision system further comprises processing circuitry configured to use output data of the light-field camera as input to a gaming program.
According to a fifth aspect, the present disclosure provides a vision system for a robot. The vision system comprises a light-field camera as described herein. The scene is an environment of the robot. The vision system further comprises processing circuitry configured to control an operation of the robot based on output data of the light-field camera.
Brief description of the Figures
Some examples of apparatuses and/or methods will be described in the following by way of example only, and with reference to the accompanying figures, in which
Fig. 1-3 illustrate examples of a light-field camera;
Fig. 4 illustrates an example of a vision system for a vehicle;
Fig. 5a, b illustrate an exemplary passive or active vision system for a vehicle;
Fig. 6 illustrates an exemplary process for converting output data of a light-field camera into output image streams;
Fig. 7 illustrates a flow chart of an example of a method of operating a vision system for a vehicle; and
Fig. 8 illustrates an example of a vision system for a gaming console or for a robot.
Detailed Description
Some examples are now described in more detail with reference to the enclosed figures. However, other possible examples are not limited to the features of these embodiments described in detail. Other examples may include modifications of the features as well as equivalents and alternatives to the features. Furthermore, the terminology used herein to describe certain examples should not be restrictive of further possible examples.
Throughout the description of the figures same or similar reference numerals refer to same or similar elements and/or features, which may be identical or implemented in a modified form while providing the same or a similar function. The thickness of lines, layers and/or areas in the figures may also be exaggerated for clarification.
When two elements A and B are combined using an 'or', this is to be understood as disclosing all possible combinations, i.e., only A, only B as well as A and B, unless expressly defined otherwise in the individual case. As an alternative wording for the same combinations, "at least one of A and B" or "A and/or B" may be used. This applies equivalently to combinations of more than two elements.
If a singular form, such as “a”, “an” and “the” is used and the use of only a single element is not defined as mandatory either explicitly or implicitly, further examples may also use several elements to implement the same function. If a function is described below as implemented using multiple elements, further examples may implement the same function using a single element or a single processing entity. It is further understood that the terms "include", "including", "comprise" and/or "comprising", when used, describe the presence of the specified features, integers, steps, operations, processes, elements, components and/or a group thereof, but do not exclude the presence or addition of one or more other features, integers, steps, operations, processes, elements, components and/or a group thereof.
Fig. 1 illustrates an example of a light-field camera 100.
The light-field camera 100 includes an optical device 110 and an image sensor 120. The optical device 110 is configured to receive light emanating from a scene at different perspectives and to direct the light to the image sensor 120. The image sensor 120 is a dynamic vision sensor (DVS).
As used herein, the term “optical device” refers to any optical component or any assembly of several optical components for receiving light from a scene and directing the light to an image sensor. For instance, the optical device may include at least one of a lens, a mirror, an optical filter, an aperture, or a diffractive element.
The term “perspective” may be understood as a position or viewpoint at which an optical device receives light from the scene. In other words, the perspective denotes a certain view on the scene. The perspective may determine a distance between the optical device and the scene and/or a camera angle of the optical device relative to objects in the scene.
The DVS captures light intensity (brightness) changes of the light received from the optical device 110 over time. The DVS includes pixels operating independently and asynchronously, detecting the light intensity changes as they occur, and staying silent otherwise. The pixels may generate electrical signals, called events, which indicate per-pixel light intensity changes by a predefined threshold. Accordingly, the DVS is an example of an event-based image sensor. Each pixel may include a photo-sensitive element exposed to the light received from the optical device 110. The received light may cause a photocurrent in the photo-sensitive element depending on a value of light intensity of the received light. A difference between a resulting output voltage and a previous voltage reset-level may be compared against the predefined threshold. For instance, a circuit of the pixel may include comparators with different bias voltages for an ON- and an OFF-threshold. The comparators may compare the output voltage against the ON- and the OFF-threshold. The ON- and the OFF-threshold may correspond to a voltage level higher or lower by the predefined threshold than the voltage reset-level, respectively. When the ON- or the OFF-threshold is crossed, an ON- or an OFF-event, respectively, may be communicated to a periphery of the DVS. Then, the voltage reset-level may be newly set to the output voltage that triggered the event. In this manner, the pixel may log a light-intensity change since a previous event. The periphery of the DVS may include a readout circuit to associate each event with a time stamp and pixel coordinates of the pixel that recorded the event. A series of events captured by the DVS at a certain perspective and over a certain time may be considered an event stream.
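For illustration only, the per-pixel behaviour described above may be mimicked by a small software model. The following Python sketch assumes a purely logarithmic intensity response and a single symmetric threshold, which is a simplification of real DVS pixel circuits; all names are illustrative and not part of the disclosure.

    from dataclasses import dataclass
    import math

    @dataclass
    class Event:
        t: float       # time stamp
        x: int         # pixel column
        y: int         # pixel row
        polarity: int  # +1 for an ON-event, -1 for an OFF-event

    class DvsPixel:
        """Simplified software model of a single DVS pixel (illustrative only)."""

        def __init__(self, x, y, threshold=0.2):
            self.x, self.y = x, y
            self.threshold = threshold  # predefined log-intensity threshold
            self.reset_level = None     # log-intensity at the last reset

        def feed(self, t, intensity):
            """Compare the current log-intensity against the reset level and emit
            ON-/OFF-events until the remaining change is below the threshold."""
            log_i = math.log(max(intensity, 1e-6))
            if self.reset_level is None:  # first exposure: only initialise
                self.reset_level = log_i
                return []
            events = []
            while log_i - self.reset_level >= self.threshold:
                self.reset_level += self.threshold
                events.append(Event(t, self.x, self.y, +1))
            while self.reset_level - log_i >= self.threshold:
                self.reset_level -= self.threshold
                events.append(Event(t, self.x, self.y, -1))
            return events

    # Example: a pixel that first brightens and then darkens
    pixel = DvsPixel(x=0, y=0)
    for t, i in [(0.000, 1.0), (0.001, 1.5), (0.002, 2.5), (0.003, 0.8)]:
        for ev in pixel.feed(t, i):
            print(ev)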
The light-field camera 100 may capture light of any wavelength, e.g., visible light and/or infrared light.
In other examples, the light-field camera 100 may include a different number of image sensors than shown in Fig. 1, of which at least one is a DVS. The light-field camera 100 may include a different number of optical devices than shown in Fig. 1. The light-field camera 100 may include N image sensors, with N > 1, and M optical devices, with M > 1.
In some examples, the light-field camera 100 may capture events based on light emanating from a scene and additionally capture (an absolute value of) a light intensity and/or a color (i.e., a wavelength) of light emanating from the scene. This may be advantageous for reconstructing an image of the scene.
For capturing the light intensity and/or color of the light, the light-field camera 100 may include at least one conventional (i.e., frame-based) image sensor capturing the light intensity and/or color of the light. Alternatively, at least one DVS of the light-field camera 100 may include, besides event-capturing pixels, one or more pixels measuring the light intensity and/or the color of the light (e.g., the one or more pixels may be operated in a frame-based manner). For instance, a certain percentage (e.g., 10%, 15%, or 20%) of pixels of the DVS may be for capturing the light intensity and/or the color of the light.
In other examples, the DVS of the light-field camera 100 may include one or more pixels capable of capturing a light intensity change (event) in the light emanating from the scene and an absolute value of the light intensity of the light. A pixel of the DVS may, e.g., include two circuits of which a first one reports the events and a second one reports the light intensity (and/or color of the light) based on a photocurrent of a common photo-sensitive element. Since the first circuit does not necessarily “consume” the photocurrent when measuring the event, the second circuit may measure the light intensity at the same time, i.e., the two circuits may operate independently. Alternatively, a control circuit may switch between a first operation mode of the pixel and a second operation mode of the pixel. In the first operation mode, the photo-sensitive element may be connected to the first circuit. In the second operation mode, the photo-sensitive element may be connected to the second circuit. The first operation mode may enable the pixel to measure events. The second operation mode may enable the pixel to measure the light intensity and/or the color of the light.
The light-field camera 100 may be understood as a plenoptic DVS camera, i.e., a camera capturing events based on light emanating from a scene at different perspectives. This may be advantageous since a so-called light field may be inferred from output data of the light-field camera. The light field may be a vector function describing a direction of light rays in a three-dimensional space. For instance, based on the output data, a direction of a light ray which triggered an event captured by a DVS of the light-field camera 100 may be determined. Thus, the light-field camera 100, unlike a camera with a single fixed viewpoint, may provide information about a geometry of the scene. Moreover, the light-field camera may maintain a form factor of a single camera while behaving like an array of multiple cameras.
Unlike a conventional light-field camera with frame-based image sensors, the light-field camera 100 includes at least one DVS. This may be advantageous since the light-field camera 100 may have a lower energy consumption, a lower latency, a higher dynamic range, and a higher frame-rate than a conventional light-field camera.
The light-field camera 100 may capture the light-field of the scene at high temporal resolution and high dynamic range in comparison to conventional frame-based light-field cameras. This may be advantageous in scenarios where visual stimuli are used for control tasks requiring low latency and resilience to challenging lighting conditions, such as in automotive scenarios. The control tasks carried out by an agent, either artificial or human, may additionally benefit from an understanding of a (three-dimensional) geometry of a surrounding environment (which may be inferred from the light-field). Embodiments of the present disclosure may enable, on the one hand, high temporal resolution and dynamic range imaging based on the DVS technology and, on the other hand, the capability of capturing a distance of objects based on the plenoptic camera technology.
Examples of how light emanating from a scene is captured at different perspectives are explained below with reference to Fig. 2 and Fig. 3.
Fig. 2 illustrates another example of a light-field camera 200.
The light-field camera 200 includes two optical devices, a first optical device 210-1 and a second optical device 210-2. The two optical devices 210-1 and 210-2 face a scene 230 and are arranged in one line parallel to the scene 230. The first optical device 210-1 is configured to receive light emanating from the scene 230 at a first perspective 215-1. The second optical device 210-2 is configured to receive light emanating from the scene 230 from a second perspective 215-2. Thus, the two optical devices 210-1, 210-2 are configured to receive the light emanating from the scene 230 at a respective perspective.
The first perspective 215-1 is located on a surface of the first optical device 210-1 where chief rays of the light emanating from the scene 230 impinge on the first optical device 210-1. Similarly, the second perspective 215-2 is located on a surface of the second optical device 210-2 where chief rays of the light emanating from the scene 230 impinge on the second optical device 210-2.
The light-field camera 200 further includes two image sensors, a first image sensor 220-1 and a second image sensor 220-2. The first image sensor 220-1 and/or the second image sensor 220-2 is a dynamic vision sensor. The first image sensor 220-1 is placed in front of the first optical device 210-1. The second image sensor 220-2 is placed in front of the second optical device 210-2. The first optical device 210-1 is configured to direct received light to the first image sensor 220-1, and the second optical device 210-2 is configured to direct received light to the second image sensor 220-2. Thus, each of the two optical devices 210-1, 210-2 is configured to direct the light to a respective one of the two image sensors 220-1, 220-2. For instance, the first optical device 210-1 and the second optical device 210-2 may deflect, focus, or diffract the received light onto a photo-sensitive area of the first image sensor 220-1 and a photo-sensitive area of the second image sensor 220-2, respectively. In this way, the first image sensor 220-1 captures light emanating from the scene 230 at the first perspective 215-1, and the second image sensor 220-2 captures light emanating from the scene 230 at the second perspective 215-2.
In Fig. 2, two optical paths 240-1, 245-1 from the scene 230 to the first optical device 210-1 and two optical paths 240-2, 245-2 from the scene 230 to the second optical device 210-2 are illustrated. The optical paths 240-1, 245-1, 240-2, 245-2 are to be understood as exemplary light rays emanating from the scene 230. Other optical paths may run from the scene 230 to the first optical device 210-1 within a field of view of the first optical device 210-1 or from the scene 230 to the second optical device 210-2 within a field of view of the second optical device 210-2. The field of view of the first optical device 210-1 may at least partly overlap the field of view of the second optical device 210-2.
It is to be noted that the scene 230 shown in Fig. 2 is meant as an example. In other examples, a scene may show objects in a field of view of a light-field camera and light rays may emanate from the objects.
In other examples, the light-field camera 200 may include a different number of image sensors than shown in Fig. 2, of which at least one is a DVS. The light-field camera 200 may include a different number of optical devices than shown in Fig. 2. In the embodiment shown in Fig. 2, the light-field camera 200 may include X image sensors, with X ≥ 2, and Y optical devices, with Y ≥ 2. In other embodiments (e.g., as shown in Fig. 3), a light-field camera according to the present disclosure may include N image sensors, with N ≥ 1, and M optical devices, with M ≥ 1. For example, one of the first image sensor 220-1 and the second image sensor 220-2 may be omitted. Similarly, one of the first optical device 210-1 and the second optical device 210-2 may be omitted.
In other examples, the light-field camera 200 may capture the light emanating from the scene 230 at more than the two perspectives shown in Fig. 2. The light-field camera 200 may capture the light at n > 2 perspectives.
In other examples, the light-field camera 200 may include at least one optical device that directs light to more than one respective image sensor. Thus, one view of a scene may be captured by more than one image sensor. The light-field camera 200 may additionally or alternatively include at least two optical devices that direct light to one image sensor. Thus, one image sensor may capture more than one view of the scene.
In other examples, the light-field camera 200 may include optical devices and image sensors which are differently arranged than shown in Fig. 2. For instance, the optical devices may be arranged in a two-dimensional array, e.g., in a curved shape. Alternatively, the optical devices and/or the image sensors may be (partly) offset from each other relative to the scene. The optical devices and/or the image sensors may be arranged with regular or irregular distances to each other. The light-field camera 200 may include optical devices of different shapes and types, e.g., convex and concave lenses. The light-field camera 200 may include image sensors of different types, e.g., active-pixel sensors, passive-pixel sensors, charge-coupled devices.
Fig. 3 illustrates another example of a light-field camera 300 according to an embodiment.
The light-field camera 300 includes a DVS 320 placed in parallel to an array 310 of microlenses. The array 310 of microlenses is configured to direct light emanating from a scene 330 to different subsets of a plurality of pixels of the DVS 320.
The array 310 of microlenses is placed between the DVS 320 and a convex lens 316 facing the scene 330. The lens 316 receives light emanating from the scene 330 at different perspectives. The different perspectives may be located on a respective sub-aperture of the lens 316. The sub-apertures may divide the lens 316 into different sections along an axis 318 of the lens 316, e.g., a first section 315-1, a second section 315-2, and a third section 315-3. The axis 318 may be a symmetry axis of the lens 316 and perpendicular to an optical axis of the lens 316.
The array 310 of microlenses includes six microlenses, such as microlens 312 and microlens 314. Microlenses, such as the microlenses 312, 314, may be lenses with a diameter less than a millimeter, e.g., smaller than 10 micrometers. A microlens may be, e.g., a single optical element including a plane surface and a spherical convex or aspherical surface to refract light. In other examples, the microlens may include several layers of optical material, e.g., two flat surfaces and several parallel surfaces with different refractive indexes (gradient-index lens). Other types of microlenses may be Fresnel lenses or binary-optic lenses or a combination of the aforementioned types. Multiple microlenses may be formed into a one-dimensional or two-dimensional array on a supporting substrate, such as the array 310.
In Fig. 3, six optical paths 342-1, 342-2, 342-3, 344-1, 344-2, and 344-3 are illustrated. The optical paths 342-1, 342-2, 342-3 start on a first area 332 of the scene 330, proceed in different directions and impact the lens 316 on the first section 315-1, the second section 315-2, and the third section 315-3, respectively. The optical paths 344-1, 344-2, 344-3 start on a second area 334 of the scene 330, proceed in different directions and impact the lens 316 on the first section 315-1, the second section 315-2, and the third section 315-3, respectively.
The optical paths 342-1, 342-2, 342-3, 344-1, 344-2, and 344-3 are to be understood as exemplary light rays emanating from the scene 330. Other optical paths may run from the scene 330 to the lens 316.
The lens 316 directs the light emanating from the scene 330 to the array 310 of microlenses. Light rays of the optical paths 342-1, 342-2, 342-3 impinge on the microlens 314, and light rays of the optical paths 344-1, 344-2, 344-3 impinge on the microlens 312. Each of the microlenses receives light from a respective area of the scene 330. For instance, the microlens 312 receives light from the area 334, and the microlens 314 receives light from the area 332.
The DVS 320 includes a plurality of pixels. The pixels are arranged in one row and in parallel to the array 310 of microlenses. Each of the microlenses of the array 310 directs the light impinging on the respective microlens to a respective subset of the pixels. For instance, the microlens 312 directs the light to pixels 352-1, 352-2, 352-3, and the microlens 314 directs the light to pixels 354-1, 354-2, 354-3. Light rays of the optical paths 342-1, 342-2, 342-3 may be captured by the pixels 354-1, 354-2, 354-3, respectively. Light rays of the optical paths 344-1, 344-2, 344-3 may be captured by the pixels 352-1, 352-2, 352-3, respectively. Thus, each subset of pixels receives light from a respective microlens and, consequently, from a respective area of the scene 330. And each pixel of a subset receives light of the respective area at a respective perspective. In other words, the pixels of a subset receive light of the respective area at different perspectives. For instance, the pixel 352-1 receives light from the area 334 which passes the first section 315-1 of the lens 316.
The light-field camera 300 may generate output data based on electric signals triggered by light impacting the DVS 320. By combining data of pixels associated to one perspective, a view of the scene 330 at the said perspective may be reconstructed.
A plenoptic DVS camera as disclosed herein may be equipped with an array of microlenses placed in front of an image sensor of the plenoptic DVS camera. Each microlens may be placed in front of an a x b array (subset) of pixels of the image sensor. Pixels of a certain row and column over all arrays may be associated to one view. Thus, the plenoptic DVS camera may capture a x b event streams of the same scene, each from a different camera angle and with a resolution determined by a number of the microlenses.
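For illustration, sorting the raw sensor events into the a x b per-view event streams described above could be sketched as follows. The sketch assumes an ideal microlens array in which each microlens covers exactly an a x b block of pixels aligned with the pixel grid; real plenoptic calibration is more involved, and all names are illustrative.

    from collections import defaultdict

    def demultiplex_events(events, a, b):
        """Sort raw sensor events into a x b per-view event streams.

        Each event is a tuple (t, x, y, polarity) in sensor pixel coordinates.
        View (u, v) collects, under every microlens, the pixel at row u and column v;
        the event's position within that view is the microlens index (i, j).
        """
        views = defaultdict(list)   # (u, v) -> list of (t, i, j, polarity)
        for t, x, y, polarity in events:
            u, v = y % a, x % b     # position under the microlens -> view index
            i, j = y // a, x // b   # microlens index -> pixel position in the view
            views[(u, v)].append((t, i, j, polarity))
        return views

    # Example with 2 x 2 pixels per microlens
    raw = [(0.0010, 5, 3, +1), (0.0012, 4, 2, -1), (0.0015, 5, 2, +1)]
    for view, stream in sorted(demultiplex_events(raw, a=2, b=2).items()):
        print(view, stream)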
Since the light-field camera 300 shown in Fig. 3 captures light at several perspectives based on one optical device (e.g., the array 310 of microlenses) and one image sensor (e.g., DVS 320), the light-field camera 300 may be more compact than in other embodiments.
In other examples, the light-field camera 300 may include a DVS with a different number of pixels than shown in Fig. 3. The DVS may include any number > 2 of pixels. The DVS may include a two-dimensional array of c x d pixels. One or more pixels of the DVS may be pixels measuring a light intensity and/or a color of the light. Some of the pixels may be part of more than one subset of pixels related to a respective microlens.
In some examples, the light-field camera 300 may include a different number of image sensors, arrays of microlenses, or lenses than shown in Fig. 3. The light-field camera 300 may include any number > 1 of image sensors, any number > 1 of arrays of microlenses, and any number > 0 of lenses. All image sensors may be DVSs. In other examples, one or more of the image sensors may be DVSs and one or more (but not all) of the image sensors may be frame-based image sensors. The light-field camera 300 may provide a different number of perspectives than shown in Fig. 3. For instance, the light-field camera 300 may include a lens which is divided into any number > 2 of sections. In some examples, the light-field camera 300 may include an array of microlenses which includes a different number of microlenses than shown in Fig. 3. The array may be a two-dimensional arrangement of r x s microlenses.
In other examples, an arrangement of a lens, an image sensor, and/or an optical device of the light-field camera 300 may differ from the one shown in Fig. 3. For instance, the lens may be placed between the array of microlenses and the image sensor of the light-field camera 300.
An exemplary application using a light-field camera according to the present disclosure is explained with reference to Fig. 4 which illustrates an example of a vision system 400 for a vehicle. The vehicle may be any apparatus for transporting people or cargo. For instance, the vehicle may comprise wheels, nozzles, or propellers driven by an engine (and optionally a powertrain system). In particular, the vehicle may be an automobile, a truck, a motorcycle, or a tractor. Embodiments of the present disclosure further relate to a vehicle comprising the vision system 400.
The vision system 400 includes a light-field camera 410 according to embodiments of the present disclosure, such as the light-field camera 100, 200, or 300 described above. The light-field camera 410 captures light emanating from a scene 420 at different capturing perspectives. The capturing perspectives refer to views of the scene which are captured by the light-field camera 410. The scene 420 is an environment of the vehicle. The environment of the vehicle may be, e.g., a space in a surrounding of the vehicle (outside the vehicle). For example, the light-field camera 410 may capture light emanating in front of, at a side of (e.g., the left or right side), and/or behind the vehicle.
The vision system 400 further includes a processing circuitry 430. The processing circuitry 430 is communicatively coupled to the light-field camera 410, e.g., for data communication. The processing circuitry 430 may receive output data of the light-field camera 410 via any data communication channel (e.g., wired and/or wireless). The processing circuitry 430 may, for instance, be part of the light-field camera 410 or be separate from the light-field camera 410 (e.g., be part of an electronic control module of the vehicle).
The processing circuitry 430 is configured to generate at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera 410. The output data of the light-field camera 410 may include sensor data of an image sensor of the light-field camera 410, such as event streams of a DVS. The perspectives refer to views which the output image streams represent. The perspectives may (partly) correspond to the capturing perspectives or completely differ from the capturing perspectives. A number of the perspectives may be smaller than, equal to, or higher than a number of the capturing perspectives.
The output image streams may be image streams which represent a series of images (frames) which are assigned to a chronological order in which the images may be displayed. The output image streams may, e.g., be a digital video. For instance, each of the output image streams may show the environment of the vehicle (e.g., rear views or side views) at a respective perspective. The processing circuitry 430 may, for instance, be a single dedicated processor, a single shared processor, or a plurality of individual processors, some of which or all of which may be shared, digital signal processor (DSP) hardware, an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). The processing circuitry 430 may optionally be coupled to, e.g., read only memory (ROM) for storing software, random access memory (RAM) and/or non-volatile memory. The processing circuitry 430 is configured to perform at least part of the processes described herein, such as generating the at least two output image streams. Exemplary processes of how the processing circuitry 430 may generate the output image streams are explained below with reference to Fig. 6.
Referring back to Fig. 4, the vision system 400 further includes a display 440 configured to display the output image streams simultaneously, i.e., at the same time. The processing circuitry 430 is communicatively coupled to the display 440 to transmit the output image streams to the display 440.
The display 440 may be any device capable of depicting the output image streams. The display 440 may convey depth perception to a viewer, e.g., a driver, a passenger, or a user of the vehicle. In particular, the display 440 may be a light-field display, a stereo display, a 3D display, or the like.
In case the output image streams show side views (or rear views) of the vehicle, the vision system 400 may be considered a “virtual mirror”. The virtual mirror may replace a side-view (or rear-view) mirror of the vehicle by the light-field camera 410 streaming acquired content (e.g., the side view or the rear-view) to the display 440 inside a vehicle’s cockpit. Due to smaller dimensions of the light-field camera 410 compared to the side-view mirror, the virtual mirror may improve vehicle aerodynamics, increase an autonomy range of the vehicle, and decrease a noise level inside the vehicle. Using the light-field camera 410 rather than a mirror may also permit better visibility in adverse weather conditions and provide a side view (or rear view) with a wider field of view.
Conventional virtual mirrors, i.e., including a conventional camera, may face several problems regarding a realistic depiction of the side view. A side-view mirror may redirect an entire light field from a scene (a surrounding next to or behind the vehicle) towards the eyes of a viewer (e.g., a driver or user of the vehicle) looking onto the side-view mirror. As a consequence, the viewer may have access to the same information accessible when watching the scene directly. For instance, with a side-view mirror, a user’s left and right eye may see two different views of the same scene, as their viewpoints (perspectives) are different. This is referred to as “parallax”, which may allow the user of the vehicle to have a three-dimensional perception of the scene.
Moreover, a side-view mirror may provide the viewer with an entire light-field coming from the scene. Thus, the two views perceived by the two eyes may change when the viewer moves his/her head. This may allow the viewer to perceive a geometry of the scene more accurately. Larger parallaxes may be provided than with a limited baseline between the two eyes. Further, the viewer may “zoom” into the scene by moving his/her head closer to the side-view mirror, e.g., to focus on a point of interest.
The light field and the parallax provided by a side-view mirror may be crucial in a driving scenario, in particular, for safety reasons. For instance, when a driver of a vehicle intends to start an overtaking maneuver on a highway to change to a fast lane of the highway, the driver may need to understand the approximative distance of vehicles driving on the fast lane, in order to avoid a collision with the vehicles.
However, a conventional virtual mirror may be incapable of exhibiting the light field or the parallax of the scene since a camera of the conventional virtual mirror may capture the scene from a single fixed viewpoint with a fixed magnification. Consequently, eyes of a viewer looking onto a display of the conventional virtual mirror may see the scene from the fixed viewpoint, even when the viewer moves his/her head. Thus, the conventional virtual mirror may lose the three-dimensional nature of the scene since the camera “flattens” the scene geometry such that a depth of the scene may be unperceivable.
By contrast, a vision system according to the present disclosure, such as vision system 400, may overcome the aforementioned problems of conventional virtual mirrors. Firstly, the proposed vision system 400 may replace a conventional display of a conventional virtual mirror setup by a display capable of displaying several views of the scene at the same time. The display may show a light field derived from output data of a plenoptic DVS camera. This may provide a viewer looking onto the display with an experience similar to looking into a side-view (or rear-view) mirror. The proposed vision system may deliver a respective output image stream to each of the viewer’s two eyes depending on the poses of the eyes. Thus, the two eyes may see different output image streams which may change depending on the poses of the eyes.
Secondly, the vision system 400 includes a plenoptic DVS camera (e.g., the light-field camera 410) instead of a conventional light-field camera. Due to the DVS, the plenoptic DVS camera may capture a scene’s geometric information with high temporal resolution. This may be desirable in a fast and dynamic automotive environment. Further, the plenoptic DVS camera may represent a safer solution than a conventional light-field camera since the DVS may provide a higher dynamic range. Thus, the plenoptic DVS camera may cope better with sudden light changes, e.g., when entering/exiting a tunnel or when lit by other vehicles’ headlights. (When exposed to light changes, a frame-based image sensor may be temporarily blinded.) Moreover, the plenoptic DVS camera may provide a higher light sensitivity at night than a conventional light-field camera.
The vision system 400 may be passive or active. “Passive” may refer to a visualization of as many perspectives as the display technology permits, i.e., the display may visualize a light field of the captured scene, regardless of the poses of the viewer’s eyes. In other words, the passive vision system may display the light field without information about the poses of the viewer’s eyes. The poses of the viewer’s eyes may determine which pair of output image streams may be observed by the viewer. This scenario may resemble the operation and physics of a mirror. An example of a passive vision system is further explained with reference to Fig. 5a.
On the contrary, “active” may refer to a visualization of two output image streams showing views selected according to the poses of the viewer’s eyes. In this case, the poses of the viewer’s eyes may be tracked over time and a display’s content may be adapted accordingly. The display may deliver only the two output image streams that correspond to the views that the viewer’s eyes may have when seeing the scene directly or through a mirror. An example of an active vision system is further explained with reference to Fig. 5b.
Fig. 5a illustrates an example of a vision system 500a for a vehicle. The vision system 500a includes a light-field camera 410 as explained with reference to Fig. 4. The light-field camera 410 is configured to capture light from a scene in an environment of the vehicle at different capturing perspectives (e.g., 4, 8, 16, 32 or more capturing perspectives). The light-field camera 410 generates output data 515 based on the captured light and forwards the output data 515 to a processing circuitry 530a. The processing circuitry 530a is configured to generate more than two output image streams 535 (e.g., 4, 8, 16, 32 or more output image streams) for more than two different perspectives (e.g., 4, 8, 16, 32 or more perspectives) based on the output data 515. The output image streams 535 may be a high frame rate and high dynamic range light field video.
The processing circuitry 530a forwards the output image streams 535 to a display 540a. The display 540a is configured to simultaneously display the output image streams 535. The display 540a includes an array of microlenses (e.g., a lenticular lens) or a parallax barrier configured to restrict a view of a user 560 of the vehicle to a respective pair of the more than two output image streams 535 depending on poses of eyes of the user 560. The display 540a may convey depth perception to the user 560. The display 540a may include a conventional display, e.g., a Liquid-Crystal Display. The array of microlenses or the parallax barrier may be placed in front of the conventional display.
The array of microlenses may receive the light of the output image streams 535 provided (output, displayed) by display pixels of the display 540a. The array of microlenses may direct the light of the output image streams 535 towards a respective direction. Thus, when viewed from slightly different angles, the array of microlenses may show a respective output image stream. This may give a three-dimensional impression to the user 560 looking on the array of microlenses from two angles, with the left eye and the right eye. Each microlens of the array of microlenses may be placed in front of an e x f array of the display pixels. Each display pixel of the e x f array may be for displaying a respective view. A number of the microlenses may determine a resolution of the output image streams 535. The display 540a may output an entire light field video. The microlenses may allow the user 560 to see a pair of views of the light field video determined by poses of the user’s eyes.
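A strongly simplified sketch of how per-view images could be interleaved behind such an array of microlenses is given below. It assumes one display pixel per view behind each microlens and one grayscale image per view, and it ignores sub-pixel layouts and slanted lenticular designs; the names are illustrative.

    import numpy as np

    def interleave_views(views, e, f):
        """Interleave e*f view images into a single display buffer.

        views: array of shape (e*f, H, W) holding one grayscale image per view.
        The returned buffer has shape (H*e, W*f); within the e x f block behind each
        microlens, position (u, v) shows the corresponding pixel of view u*f + v.
        """
        n, h, w = views.shape
        assert n == e * f, "number of views must match the e x f layout"
        buffer = np.zeros((h * e, w * f), dtype=views.dtype)
        for u in range(e):
            for v in range(f):
                buffer[u::e, v::f] = views[u * f + v]
        return buffer

    # Example: four 2 x 3 views arranged behind 2 x 2 blocks of display pixels
    views = np.arange(4 * 2 * 3).reshape(4, 2, 3)
    print(interleave_views(views, e=2, f=2))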
The parallax barrier may include an opaque layer with a series of spaced slits, allowing the left eye and the right eye of the user 560 to see a respective set of pixels of the underneath conventional display.
In other examples, the display 540a may include a different display technology, e.g., a 3D or light-field display technology. The vision system 500a shown in Fig. 5a may refer to a passive vision system. The passive vision system may be beneficial since it may generate output image streams for any number of viewers that may look at the display 540a. Different viewers may receive different views of the scene.
Fig. 5b illustrates another example of a vision system 500b for a vehicle. The vision system 500b includes a light-field camera 410 as explained with reference to Fig. 4. The light-field camera 410 is configured to capture light from a scene in an environment of the vehicle at different capturing perspectives. The light-field camera 410 generates output data 515 based on the captured light and forwards the output data 515 to a processing circuitry 530b. The vision system 500b further includes an eye-tracking device 570 configured to track poses of eyes of a user 560 of the vehicle.
The eye tracking device 570 may, e.g., include a camera in an interior of the vehicle directed at a face of the user 560. The eye tracking device 570 tracks the viewer’s eyes and estimates the viewer’s head position or poses of the viewer’s eyes.
The eye-tracking device 570 may determine data 575 indicating the poses of eyes (e.g., in terms of coordinates) and communicate the data 575 to the processing circuitry 530b. The processing circuitry 530b is configured to determine two different perspectives corresponding to the poses of the eyes (based on the data 575). The processing circuitry 530b is further configured to generate two output image streams 535 for the two different perspectives. In other words, the processing circuitry 530b may determine two perspectives in accordance with the poses of eyes, thus, two perspectives which may correspond to views that the left eye and the right eye of the user 560 may have when looking at the scene directly or through a mirror.
The processing circuitry 530b generates the two output image streams 535, e.g., by selecting two event streams of the output data 515 which are suitable according to the estimated poses of the user’s eyes. For instance, the processing circuitry 530b may select two event streams which best approximate (align with) the views the user’s eyes would have when directly looking at the scene or looking at the scene through a mirror. The processing circuitry 530b converts the two selected event streams into two high frame rate and high dynamic range videos (the output image streams 535). Alternatively, the processing circuitry 530b may select several event streams of the output data 515 which best approximate the view of the left eye of the user 560. The processing circuitry 530b may interpolate the selected event streams to generate a first output image stream for the left eye. The first output image stream may correspond to a view the left eye would have when directly looking at the scene or looking at the scene through a mirror. Similarly, the processing circuitry 530b may select several event streams of the output data 515 which best approximate the view of the right eye of the user 560 and generate a second output image stream for the right eye. The first output image stream and the second output image stream are the two output image streams 535.
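The selection of suitable views based on the estimated eye poses may be approximated by a nearest-neighbour search over the capturing perspectives, as in the following illustrative sketch. It reduces each capturing perspective and each eye pose to a three-dimensional position in a common coordinate frame and ignores orientation; all names are hypothetical.

    import numpy as np

    def select_views(capture_positions, eye_positions):
        """Return, for each eye, the index of the closest capturing perspective.

        capture_positions: array of shape (n_views, 3), positions of the captured viewpoints.
        eye_positions:     array of shape (2, 3), estimated positions of the left and right
                           eye, mapped into the same (virtual mirror) coordinate frame.
        """
        # pairwise Euclidean distances between the eyes and the capturing viewpoints
        d = np.linalg.norm(eye_positions[:, None, :] - capture_positions[None, :, :], axis=-1)
        return d.argmin(axis=1)  # one view index per eye

    # Example: four capturing perspectives on a horizontal line and two eye positions
    captures = np.array([[0.00, 0.0, 0.0], [0.02, 0.0, 0.0], [0.04, 0.0, 0.0], [0.06, 0.0, 0.0]])
    eyes = np.array([[0.015, 0.0, 0.5], [0.052, 0.0, 0.5]])
    print(select_views(captures, eyes))  # -> [1 3]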
The processing circuitry 530b communicates the two output image streams 535 to a display 540b. The display 540b is configured to simultaneously display (only, exclusively) the two output image streams 535. The display 540b displays the two output image streams 535 towards the user 560.
The vision system 500b shown in Fig. 5b may refer to an active vision system. The active vision system may be beneficial since only the two output image streams 535 are generated, thus, less processing power may be needed.
Fig. 6 illustrates an exemplary process for converting output data 615 of a light-field camera into at least two output image streams 635. The output image streams 635 may be high frame-rate and high dynamic-range image streams (i.e., a light field video). The light-field camera shall be considered a light-field camera according to the present disclosure, e.g., the light-field camera 410. The process may be performed by a processing circuitry of a vision system, such as the processing circuitry 430. The output data 615 includes at least one event stream representing a certain view of a scene. The output data 615 further includes at least another event stream representing at least one other view of the scene and/or at least one auxiliary image stream representing at least one other view of the scene. For capturing auxiliary image streams, the light-field camera may additionally include a conventional image sensor, and/or a DVS including pixels for measuring light-intensity and/or color of light emanating from the scene.
Fig. 6 shows two exemplary options 620a and 620b to convert the output data 615 into image streams 630. The image streams 630 represent different views of the scene corresponding to capturing perspectives represented by the output data 615. In some examples, the output image streams 635 may correspond to the image streams 630. In other examples, the image streams 630 may include the output image streams 635; thus, the output image streams 635 may be selected from the image streams 630, e.g., by determining suitable perspectives and selecting the image streams 630 accordingly. In still other examples, the output image streams 635 may be derived from the image streams 630. For instance, if the output image streams 635 show more and/or different views than the image streams 630, views of the image streams 630 may be interpolated to generate views of the output image streams 635.
In option 620a, the output data 615 represents at least two event streams. Each of the event streams represents events in the scene from a respective capturing perspective. A processing circuitry of a vision system, such as vision system 400, is configured to convert the at least two event streams into the at least two output image streams 635.
For instance, the processing circuitry may be configured to generate the at least two output image streams 635 by determining a respective intensity value for one or more pixels of a respective one of the at least two output image streams 635 based on a respective one of the at least two event streams.
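As a non-learned baseline for such a conversion, shown here for illustration only, per-pixel event polarities may simply be accumulated in log-intensity space, starting from a common initial value. This is a strong simplification; the learned approaches discussed below typically handle noise, threshold mismatch and drift far better, and all names in the sketch are illustrative.

    import numpy as np

    def integrate_events(events, height, width, threshold=0.2, log_i0=0.0):
        """Reconstruct a coarse intensity frame from one per-view event stream.

        events: iterable of (t, x, y, polarity) with polarity +1 (ON) or -1 (OFF).
        Every event is assumed to correspond to a log-intensity step of +/- threshold,
        starting from a common initial log-intensity log_i0.
        """
        log_i = np.full((height, width), log_i0, dtype=np.float64)
        for _, x, y, polarity in events:
            log_i[y, x] += polarity * threshold
        return np.exp(log_i)

    # Example: two ON-events brighten pixel (x=2, y=1), one OFF-event darkens pixel (0, 0)
    stream = [(0.001, 2, 1, +1), (0.002, 2, 1, +1), (0.003, 0, 0, -1)]
    print(integrate_events(stream, height=3, width=4))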
The processing circuitry may be configured to determine the respective intensity value using a trained machine-learning model, for instance. The event streams may, firstly, be mapped to image streams 630 from which the output image streams 635 are selected. The mapping may be performed by using the trained machine-learning model or a machine-learning algorithm, for instance.
Machine learning may refer to algorithms and statistical models that computer systems may use to perform a specific task without using explicit instructions, instead relying on models and inference. For example, in machine-learning, instead of a rule-based transformation of data, a transformation of data may be used that is inferred from an analysis of historical and/or training data. For example, the content of event streams may be analyzed using a machine-learning model or using a machine-learning algorithm. In order for the machine-learning model to analyze the content of an event stream, the machine-learning model may be trained using training event streams as input and training content information as output. By training the machine-learning model with a large number of training event streams and/or training sequences and associated training content information (e.g., labels or annotations), the machine-learning model "learns" to recognize the content of the event streams, so the content of event streams that are not included in the training data can be recognized using the machine-learning model. The same principle may be used for other kinds of data as well: By training a machine-learning model using training data and a desired output, the machine-learning model "learns" a transformation between the data and the output, which can be used to provide an output based on non-training data provided to the machine-learning model. The provided data (e.g., event data, image data) may be preprocessed to obtain a feature vector, which is used as input to the machine-learning model.
Machine-learning models may be trained using training input data. The examples specified above use a training method called "supervised learning". In supervised learning, the machine-learning model is trained using a plurality of training samples, wherein each sample may comprise a plurality of input data values and a plurality of desired output values, i.e., each training sample is associated with a desired output value. By specifying both training samples and desired output values, the machine-learning model "learns" which output value to provide based on an input sample that is similar to the samples provided during the training. Apart from supervised learning, semi-supervised learning may be used. In semi-supervised learning, some of the training samples lack a corresponding desired output value. Supervised learning may be based on a supervised learning algorithm (e.g., a classification algorithm, a regression algorithm, or a similarity learning algorithm). Classification algorithms may be used when the outputs are restricted to a limited set of values (categorical variables), i.e., the input is classified to one of the limited set of values. Regression algorithms may be used when the outputs may have any numerical value (within a range). Similarity learning algorithms may be similar to both classification and regression algorithms but are based on learning from examples using a similarity function that measures how similar or related two objects are. Apart from supervised or semi-supervised learning, unsupervised learning may be used to train the machine-learning model. In unsupervised learning, (only) input data might be supplied, and an unsupervised learning algorithm may be used to find structure in the input data (e.g., by grouping or clustering the input data, finding commonalities in the data). Clustering is one exemplary method to find structures in the input data. Clustering is the assignment of input data comprising a plurality of input values into subsets (clusters) so that input values within the same cluster are similar according to one or more (predefined) similarity criteria, while being dissimilar to input values that are included in other clusters.
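For illustration of the supervised-learning principle only (the models actually applied to the event streams are not specified at this level of detail), a minimal gradient-descent fit of a linear regression model to training samples and desired output values may look as follows; all names and parameter values are hypothetical:

    import numpy as np

    def train_linear_regression(samples, targets, learning_rate=0.01, epochs=1000):
        """Supervised learning: fit weights w and bias b so that samples @ w + b approximates targets."""
        n_samples, n_features = samples.shape
        w = np.zeros(n_features)
        b = 0.0
        for _ in range(epochs):
            error = samples @ w + b - targets
            # Gradient of the mean squared error with respect to w and b.
            w -= learning_rate * (samples.T @ error) / n_samples
            b -= learning_rate * error.mean()
        return w, b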
Reinforcement learning is a third group of machine-learning algorithms. In other words, reinforcement learning may be used to train the machine-learning model. In reinforcement learning, one or more software actors (called "software agents") are trained to take actions in an environment. Based on the taken actions, a reward is calculated. Reinforcement learning is based on training the one or more software agents to choose the actions such that the cumulative reward is increased, leading to software agents that become better at the task they are given (as evidenced by increasing rewards).
Furthermore, some techniques may be applied to some of the machine-learning algorithms. For example, feature learning may be used. In other words, the machine-learning model may at least partially be trained using feature learning, and/or the machine-learning algorithm may comprise a feature learning component. Feature learning algorithms, which may be called representation learning algorithms, may preserve the information in their input but also transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions. Feature learning may be based on principal components analysis or cluster analysis, for example.
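A minimal sketch of principal components analysis as such a feature-learning step (illustrative only; the names are assumptions) projects the data onto its directions of largest variance:

    import numpy as np

    def pca_features(data, n_components):
        """Project data (n_samples x n_features) onto its first n_components principal components."""
        centered = data - data.mean(axis=0)
        # The right-singular vectors of the centered data are the principal directions.
        _u, _s, vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ vt[:n_components].T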
In some examples, anomaly detection (i.e., outlier detection) may be used, which is aimed at providing an identification of input values that raise suspicions by differing significantly from the majority of input or training data. In other words, the machine-learning model may at least partially be trained using anomaly detection, and/or the machine-learning algorithm may comprise an anomaly detection component.
In some examples, the machine-learning algorithm may use a decision tree as a predictive model. In other words, the machine-learning model may be based on a decision tree. In a decision tree, observations about an item (e.g., a set of input values) may be represented by the branches of the decision tree, and an output value corresponding to the item may be represented by the leaves of the decision tree. Decision trees may support both discrete values and continuous values as output values. If discrete values are used, the decision tree may be denoted a classification tree; if continuous values are used, the decision tree may be denoted a regression tree. Association rules are a further technique that may be used in machine-learning algorithms. In other words, the machine-learning model may be based on one or more association rules. Association rules are created by identifying relationships between variables in large amounts of data. The machine-learning algorithm may identify and/or utilize one or more relational rules that represent the knowledge that is derived from the data. The rules may, e.g., be used to store, manipulate, or apply the knowledge.
Machine-learning algorithms are usually based on a machine-learning model. In other words, the term "machine-learning algorithm" may denote a set of instructions that may be used to create, train, or use a machine-learning model. The term "machine-learning model" may denote a data structure and/or set of rules that represents the learned knowledge (e.g., based on the training performed by the machine-learning algorithm). In embodiments, the usage of a machine-learning algorithm may imply the usage of an underlying machine-learning model (or of a plurality of underlying machine-learning models). The usage of a machine-learning model may imply that the machine-learning model and/or the data structure/set of rules that is the machine-learning model is trained by a machine-learning algorithm.
For example, the machine-learning model may be an artificial neural network (ANN). ANNs are systems that are inspired by biological neural networks, such as can be found in a retina or a brain. ANNs comprise a plurality of interconnected nodes and a plurality of connections, so-called edges, between the nodes. There are usually three types of nodes, input nodes that receive input values, hidden nodes that are (only) connected to other nodes, and output nodes that provide output values. Each node may represent an artificial neuron. Each edge may transmit information from one node to another. The output of a node may be defined as a (non-linear) function of its inputs (e.g., of the sum of its inputs). The inputs of a node may be used in the function based on a "weight" of the edge or of the node that provides the input. The weight of nodes and/or of edges may be adjusted in the learning process. In other words, the training of an artificial neural network may comprise adjusting the weights of the nodes and/or edges of the artificial neural network, i.e., to achieve a desired output for a given input.
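A minimal sketch of the node computation described above (one dense layer of nodes with an exemplary non-linearity; all names are assumptions) is:

    import numpy as np

    def dense_layer(inputs, weights, biases):
        """Output of a layer of nodes: a non-linear function of the weighted sum of the inputs."""
        pre_activation = weights @ inputs + biases
        return np.maximum(pre_activation, 0.0)  # ReLU as an exemplary non-linearity

Training then amounts to adjusting the entries of weights and biases so that the network output approaches the desired output for the training inputs.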
Alternatively, the machine-learning model may be a support vector machine, a random forest model or a gradient boosting model. Support vector machines (i.e., support vector networks) are supervised learning models with associated learning algorithms that may be used to analyze data (e.g., in classification or regression analysis). Support vector machines may be trained by providing an input with a plurality of training input values that belong to one of two categories. The support vector machine may be trained to assign a new input value to one of the two categories. Alternatively, the machine-learning model may be a Bayesian network, which is a probabilistic directed acyclic graphical model. A Bayesian network may represent a set of random variables and their conditional dependencies using a directed acyclic graph. Alternatively, the machine-learning model may be based on a genetic algorithm, which is a search algorithm and heuristic technique that mimics the process of natural selection.
Alternatively, the machine-learning model may be trained based on self-supervised learning. Self-supervised learning may be regarded as an intermediate form of supervised and unsupervised learning. For self-supervised learning, training data classified by humans are not necessarily required. In a first step, a pretext task may be solved based on pseudo-labels which may help to initialize computational weights of the machine-learning model in a way which is useful for a second step. In the second step, a more complex downstream task may be solved. This downstream task which computes the actual task may be performed with supervised or unsupervised learning as described above.
In case of self-supervised learning, two neural networks may be jointly trained, for instance. A first neural network may be a convolutional neural network for estimating an optical flow by compensating for the motion blur in input event streams. A second neural network may be a recurrent convolutional network for performing image reconstruction through event-based photometric constancy.
The optical flow is a distribution of apparent velocities of movement in light intensity patterns which are caused by motion of a scene relative to an observer of the scene. The event-based photometric constancy is a linearization of pixel-wise light intensity increments under the assumptions of Lambertian surfaces, constant illumination, and small time windows in which the light intensity increment occurs.
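For reference, one common formalization of the event-based photometric constancy in the event-camera literature (the equation is not given explicitly in the present disclosure) expresses the log-intensity increment as

    \Delta L(\mathbf{x}, t) \approx -\nabla L(\mathbf{x}, t) \cdot \mathbf{v}(\mathbf{x}) \, \Delta t

where L denotes the log light intensity at pixel position x, \nabla L its spatial gradient, v the optical flow, and \Delta t the small time window over which the increment occurs.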
The first neural network may estimate the optical flow based on the input event streams. The second neural network may use the estimated optical flow and prior reconstructed image streams to estimate a first distribution of light intensity increments for events of the input event streams based on the event-based photometric constancy. The second neural network may determine a second distribution of light intensity increments by deblurring and averaging the input events. The two distributions may be compared and a difference between the two distributions may be propagated backwards to the first neural network to improve a reconstruction accuracy of the first neural network.
Referring back to Fig. 6, for option 620b, the output data 615 represents at least one event stream of the scene. A processing circuitry of a vision system according to embodiments disclosed herein receives at least two auxiliary image streams. Each of the at least two auxiliary image streams shows the scene from a respective capturing perspective. The processing circuitry generates the at least two output image streams 635 by increasing a framerate and/or a dynamic range of the at least two auxiliary image streams based on the at least one event stream.
In other words, the auxiliary image streams may be super-resolved based on the event streams. For instance, one of the auxiliary image streams may include a first frame captured at a time t1 and a consecutive second frame captured at a later time t2. Synthetic frames for a time interval between t1 and t2 may be generated by interpolating between light intensities of the first frame and the second frame. A corresponding interpolation function may be determined based on events (of the event streams) that occurred in the time interval between t1 and t2.
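A minimal sketch of such event-guided interpolation (illustrative only; the tuple layout, the fixed contrast threshold, and the use of log intensities are assumptions) is:

    import numpy as np

    def interpolate_frame(frame_t1, events, t1, t, contrast_threshold=0.2, eps=1e-6):
        """Synthesize a frame at time t (t1 < t < t2) from the frame captured at t1.

        frame_t1: linear-intensity image captured at t1 (2D array).
        events: iterable of (x, y, timestamp, polarity) recorded between t1 and t2.
        Accumulating the events with timestamps up to t approximates the intensity
        change that occurred in the interval [t1, t].
        """
        log_frame = np.log(frame_t1 + eps)
        for x, y, timestamp, polarity in events:
            if t1 <= timestamp <= t:
                log_frame[y, x] += polarity * contrast_threshold
        return np.exp(log_frame) - eps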
Referring back to Fig. 6, a processing circuitry of the vision system is further configured to determine a distance between a light-field camera of the vision system and objects in the scene based on the output data 615. Thus, the output data 615 is used to derive distance information 640. The processing circuitry may adapt a field of focus of the at least two output image streams 635 based on the distance. For instance, the processing circuitry may be configured to determine a difference between corresponding portions of the output data 615, wherein each of the portions relates to a respective capturing perspective, and determine the distance based on the difference.
This may be advantageous since the distance may help to adjust the depth of field of the output image streams 635, i.e., a suitable apparent focal length may be determined to sharpen objects of interest in the scene. Moreover, the distance information 640 may improve the generation of views of the scene which are not captured by the light-field camera. For instance, in case a light-field display of the vision system has a higher resolution than the light-field camera, more views may be shown by the light-field display than are represented by the output data 615. Or, in case of an active vision system, two views of the scene may be determined which are aligned with the poses of the eyes of a viewer. In case the two determined views are not part of the captured views, the two views may be generated synthetically with the help of the distance information 640. In other examples of the present disclosure, determining the distance to objects in the scene may be omitted.
One option 650a to approximate the distance to objects in the scene may be deriving the distance information 640 from the event streams of the output data 615. For this purpose, events of the event streams may be associated to respective three-dimensional coordinates. Depth estimation techniques for plenoptic cameras may be applied to the event streams, such as triangulation. Light rays may be traced back to an object plane of the objects where the light rays have been emitted from. Assuming lens parameters of the light-field camera are given, a slope of each light ray in object space may be retrieved and described as a linear function. To approximate the distance to the objects, linear functions of two arbitrary light rays intersecting at the object plane may be mathematically equated. A common solution for the linear functions may correspond to the distance.
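In a one-dimensional simplification (illustrative only; all names are assumptions), each back-traced light ray may be written as a lateral position that is a linear function of the depth z; equating two such functions and solving for z yields the distance to the emitting object:

    def ray_intersection_depth(offset_a, slope_a, offset_b, slope_b):
        """Depth z at which two back-traced rays x(z) = offset + slope * z intersect."""
        if slope_a == slope_b:
            raise ValueError("parallel rays do not intersect at a finite depth")
        return (offset_b - offset_a) / (slope_a - slope_b)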
Another depth estimation technique to associate the events to respective three-dimensional coordinates may be determining a disparity between corresponding events of different views. The disparity corresponds to a difference in pixel location of the corresponding events. The light-field camera may generate an output similar to an output of several event cameras arranged in an array with known distances between adjacent event cameras (baseline). Each event camera of the array captures a different view of the scene. A moving point P in the scene triggers events in each of the event cameras. The movement of point P leaves traces in the event streams of the event cameras arranged in one row of the array. The traces are shifted due to the movement, thus creating a disparity across the event streams of the different views. Due to the constant distance between the event cameras, a slope of the disparity corresponds to a speed at which P moves past the event cameras of one row. The slope is in a mathematical relation with the distance between the point P and the array of cameras. The same approach applies to the light-field camera. In comparison to a conventional light-field camera, the light-field camera according to the present disclosure may simplify a detection of the disparity associated with the point P and an estimation of the slope due to the sparser nature of events compared to images (frames). Another depth estimation technique is to estimate the distance to objects in the scene based on an auxiliary image stream received by the vision system (e.g., option 620b). An event-derived optical flow may be used to assign a distance value to events recorded at a time between a first capturing time of a first frame of the auxiliary image stream and a second capturing time of a subsequent frame of the auxiliary image stream. The optical flow may be derived, e.g., based on an adaptive block matching optical flow (ABMOF) method, which computes the optical flow at pixels where brightness changes.
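For reference, the classical pinhole relation between disparity and distance used in stereo and plenoptic depth estimation (a related, simplified relation; it is not stated explicitly in the present disclosure) may be sketched as follows, with all names assumed:

    def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
        """Classical stereo relation for rectified views: depth Z = f * b / d."""
        if disparity_px <= 0:
            raise ValueError("disparity must be positive for a finite depth")
        return focal_length_px * baseline_m / disparity_px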
Another option 650b may be deriving the distance information 640 based on the image streams 630. Depth estimation techniques for the (multi-view, stereo) image streams 630 may be applied.
Based on the image streams 630 and, optionally, based on the distance information 640, the output image streams 635 may be generated. For an active vision system, additional information about poses of eyes of a viewer may be needed.
Fig. 7 illustrates a flow chart of an example of a method 700 of operating a vision system for a vehicle, such as the vision system 500a or 500b. The vision system includes a light-field camera according to embodiments of the present disclosure, processing circuitry, and a display. The method 700 includes capturing 710 an environment of the vehicle by the light-field camera. The method 700 further includes generating 720 at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera. The method 700 further includes simultaneously displaying 730 the at least two output image streams on the display.
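The method 700 may be summarized by the following high-level sketch; the component interfaces are placeholders introduced here for illustration only and are not part of the disclosure:

    def operate_vision_system(light_field_camera, processing_circuitry, display):
        """Capture the environment (710), generate output image streams (720), display them (730)."""
        while True:
            output_data = light_field_camera.capture()                           # step 710
            output_streams = processing_circuitry.generate_views(output_data)    # step 720
            display.show_simultaneously(output_streams)                          # step 730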
More details and aspects of the method 700 are explained in connection with the proposed technique or one or more examples described above, e.g., with reference to Fig. 4. The method 700 may comprise one or more additional optional features corresponding to one or more aspects of the proposed technique, or one or more examples described above.
The method 700 may allow a light-field video with high temporal resolution and high dynamic range to be generated, which may be particularly suitable for fast dynamic scenarios with challenging lighting conditions. The method 700 may be applicable to virtual mirrors and may improve a realistic depiction of a vehicle’s environment. The method 700 may increase driving safety as it may simulate a rear-view or side-view mirror and provide a driver of the vehicle with a realistic rear view or side view. In particular, the method 700 may offer comfort and safety similar to a conventional mirror while permitting better aerodynamics.
Fig. 8 illustrates an example of a vision system 800. The vision system 800 includes a light-field camera 810 as described herein and a processing circuitry 820.
The vision system 800 may, e.g., be for a gaming console. In this case, a scene that may be captured by the light-field camera 810 may include one or more players of the gaming console. The processing circuitry 820 may use output data of the light-field camera 810 as input to a gaming program. For instance, the processing circuitry 820 may determine gestures of the players or predict movements of the players based on the output data.
In other examples, the vision system 800 may be for a robot, e.g., the vision system 800 may be integrated into the robot. A scene that may be captured by the light-field camera 810 may be an environment of the robot. The processing circuitry 820 may control an operation of the robot based on the output data of the light-field camera 810.
In other examples, the vision system 800 may be used for applications other than the above-mentioned ones. For instance, the vision system 800 may be for telescopes, microscopes, or smartphones.
Examples of the present disclosure may propose a plenoptic DVS camera, a compact camera system for capturing events of a scene from multiple perspectives. This may augment the measurements of a conventional event camera with additional information about the geometry of the scene. Due to a high temporal resolution and dynamic range, the plenoptic DVS camera may be particularly beneficial for fast dynamic scenarios where perceiving the scene geometry plays a crucial role, such as in automotive scenarios.
The following examples pertain to further embodiments:
(1) A light-field camera, comprising: at least one image sensor; and at least one optical device configured to receive light emanating from a scene at different perspectives and to direct the light to the at least one image sensor, wherein the at least one image sensor is a dynamic vision sensor.
(2) The light-field camera of (1), wherein the light-field camera comprises at least two image sensors and at least two optical devices, wherein the at least two optical devices are configured to receive the light emanating from the scene at a respective perspective, and wherein each of the at least two optical devices is configured to direct the light to a respective one of the at least two image sensors.
(3) The light-field camera of (2), wherein one of the at least two image sensors is the dynamic vision sensor, and wherein the other one of the at least two image sensors is configured to measure a light intensity and/or a color of the light.
(4) The light-field camera of (1), wherein the at least one optical device comprises: an array of microlenses configured to direct the light to different subsets of a plurality of pixels of the at least one image sensor.
(5) The light-field camera of any one of (1) to (4), wherein the at least one image sensor comprises one or more pixels configured to measure a light intensity and/or a color of the light.
(6) A vision system for a vehicle, comprising: a light-field camera according to any one of (1) to (5), wherein the scene is an environment of the vehicle; processing circuitry configured to generate at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera; and a display configured to simultaneously display the at least two output image streams.
(7) The vision system of (6), wherein the output data represents at least two event streams, wherein each of the at least two event streams represents events in the scene from a respective capturing perspective, and wherein the processing circuitry is further configured to convert the at least two event streams into the at least two output image streams.
(8) The vision system of (7), wherein the processing circuitry is further configured to generate the at least two output image streams by determining a respective intensity value for one or more pixels of a respective one of the at least two output image streams based on a respective one of the at least two event streams.
(9) The vision system of (8), wherein the processing circuitry is configured to determine the respective intensity value using a trained machine-learning model.
(10) The vision system of (6), wherein the output data represents at least one event stream of the scene, and wherein the processing circuitry is further configured to: receive at least two auxiliary image streams, wherein each of the at least two auxiliary image streams shows the scene from a respective capturing perspective; and generate the at least two output image streams by increasing a framerate and/or a dynamic range of the at least two auxiliary image streams based on the at least one event stream.
(11) The vision system of any one of (6) to (10), wherein the processing circuitry is further configured to: determine a distance between the light-field camera and objects in the scene based on the output data; and adapt a field of focus of the at least two output image streams based on the distance.
(12) The vision system of (11), wherein the processing circuitry is configured to determine a difference between corresponding portions of the output data, wherein each of the portions relates to a respective capturing perspective, and determine the distance based on the difference.
(13) The vision system of any one of (6) to (12), further comprising: an eye-tracking device configured to track poses of eyes of a user of the vehicle, wherein the processing circuitry is further configured to: determine two different perspectives corresponding to the poses of the eyes; and generate two output image streams for the two different perspectives, wherein the display is configured to simultaneously display the two output image streams.
(14) The vision system of any one of (6) to (12), wherein the processing circuitry is further configured to generate more than two output image streams for more than two different perspectives, wherein the display is configured to simultaneously display the more than two output image streams, and wherein the display comprises an array of microlenses or a parallax barrier configured to restrict a view of a user of the vehicle to a respective pair of the more than two output image streams depending on poses of eyes of the user.
(15) The vision system of any one of (6) to (14), wherein the at least two output image streams show a rear view or a side view of the vehicle.
(16) A method of operating a vision system for a vehicle, wherein the vision system comprises a light-field camera according to any one of (1) to (5), processing circuitry and a display, the method comprising: capturing an environment of the vehicle by the light -field camera; generating at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera; and simultaneously displaying the at least two output image streams on the display.
(17) A vision system for a gaming console, comprising: a light-field camera according to any one of (1) to (5), wherein the scene comprises a player of the gaming console; and processing circuitry configured to use output data of the light-field camera as input to a gaming program.

(18) The vision system of (17), wherein the processing circuitry is further configured to determine a gesture of the player based on the output data.
(19) The vision system of (17) or (18), wherein the processing circuitry is further configured to predict a movement of the player based on the output data.
(20) A vision system for a robot, comprising: a light-field camera according to any one of (1) to (5), wherein the scene is an environment of the robot; and processing circuitry configured to control an operation of the robot based on output data of the light-field camera.
The aspects and features described in relation to a particular one of the previous examples may also be combined with one or more of the further examples to replace an identical or similar feature of that further example or to additionally introduce the features into the further example.
Examples may further be or relate to a (computer) program including a program code to execute one or more of the above methods when the program is executed on a computer, processor, or other programmable hardware component. Thus, steps, operations, or processes of different ones of the methods described above may also be executed by programmed computers, processors, or other programmable hardware components. Examples may also cover program storage devices, such as digital data storage media, which are machine-, processor- or computer-readable and encode and/or contain machine-executable, processor-executable or computer-executable programs and instructions. Program storage devices may include or be digital storage devices, magnetic storage media such as magnetic disks and magnetic tapes, hard disk drives, or optically readable digital data storage media, for example.
It is further understood that the disclosure of several steps, processes, operations, or functions disclosed in the description or claims shall not be construed to imply that these operations are necessarily dependent on the order described, unless explicitly stated in the individual case or necessary for technical reasons. Therefore, the previous description does not limit the execution of several steps or functions to a certain order. Furthermore, in further examples, a single step, function, process, or operation may include and/or be broken up into several sub-steps, -functions, -processes or -operations.
If some aspects have been described in relation to a device or system, these aspects should also be understood as a description of the corresponding method. For example, a block, device or functional aspect of the device or system may correspond to a feature, such as a method step, of the corresponding method. Accordingly, aspects described in relation to a method shall also be understood as a description of a corresponding block, a corresponding element, a property or a functional feature of a corresponding device or a corresponding system.
The following claims are hereby incorporated in the detailed description, wherein each claim may stand on its own as a separate example. It should also be noted that although in the claims a dependent claim refers to a particular combination with one or more other claims, other examples may also include a combination of the dependent claim with the subject matter of any other dependent or independent claim. Such combinations are hereby explicitly proposed, unless it is stated in the individual case that a particular combination is not intended. Furthermore, features of a claim should also be included for any other independent claim, even if that claim is not directly defined as dependent on that other independent claim.

Claims

What is claimed is:
1. A light-field camera, comprising: at least one image sensor; and at least one optical device configured to receive light emanating from a scene at different perspectives and to direct the light to the at least one image sensor, wherein the at least one image sensor is a dynamic vision sensor.
2. The light-field camera of claim 1, wherein the light-field camera comprises at least two image sensors and at least two optical devices, wherein the at least two optical devices are configured to receive the light emanating from the scene at a respective perspective, and wherein each of the at least two optical devices is configured to direct the light to a respective one of the at least two image sensors.
3. The light-field camera of claim 2, wherein one of the at least two image sensors is the dynamic vision sensor, and wherein the other one of the at least two image sensors is configured to measure a light intensity and/or a color of the light.
4. The light-field camera of claim 1, wherein the at least one optical device comprises: an array of microlenses configured to direct the light to different subsets of a plurality of pixels of the at least one image sensor.
5. The light-field camera of claim 1, wherein the at least one image sensor comprises one or more pixels configured to measure a light intensity and/or a color of the light.
6. A vision system for a vehicle, comprising: a light-field camera according to claim 1, wherein the scene is an environment of the vehicle; processing circuitry configured to generate at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera; and a display configured to simultaneously display the at least two output image streams.
7. The vision system of claim 6, wherein the output data represents at least two event streams, wherein each of the at least two event streams represents events in the scene from a respective capturing perspective, and wherein the processing circuitry is further configured to convert the at least two event streams into the at least two output image streams.
8. The vision system of claim 7, wherein the processing circuitry is further configured to generate the at least two output image streams by determining a respective intensity value for one or more pixels of a respective one of the at least two output image streams based on a respective one of the at least two event streams.
9. The vision system of claim 8, wherein the processing circuitry is configured to determine the respective intensity value using a trained machine-learning model.
10. The vision system of claim 6, wherein the output data represents at least one event stream of the scene, and wherein the processing circuitry is further configured to: receive at least two auxiliary image streams, wherein each of the at least two auxiliary image streams shows the scene from a respective capturing perspective; and generate the at least two output image streams by increasing a framerate and/or a dynamic range of the at least two auxiliary image streams based on the at least one event stream.
11. The vision system of claim 6, wherein the processing circuitry is further configured to: determine a distance between the light-field camera and objects in the scene based on the output data; and adapt a field of focus of the at least two output image streams based on the distance.
12. The vision system of claim 11, wherein the processing circuitry is configured to determine a difference between corresponding portions of the output data, wherein each of the portions relates to a respective capturing perspective, and determine the distance based on the difference.
13. The vision system of claim 6, further comprising: an eye-tracking device configured to track poses of eyes of a user of the vehicle, wherein the processing circuitry is further configured to: determine two different perspectives corresponding to the poses of the eyes; and generate two output image streams for the two different perspectives, wherein the display is configured to simultaneously display the two output image streams.
14. The vision system of claim 6, wherein the processing circuitry is further configured to generate more than two output image streams for more than two different perspectives, wherein the display is configured to simultaneously display the more than two output image streams, and wherein the display comprises an array of microlenses or a parallax barrier configured to restrict a view of a user of the vehicle to a respective pair of the more than two output image streams depending on poses of eyes of the user.
15. The vision system of claim 6, wherein the at least two output image streams show a rear view or a side view of the vehicle.
16. A method of operating a vision system for a vehicle, wherein the vision system comprises a light-field camera according to claim 1, processing circuitry and a display, the method comprising: capturing an environment of the vehicle by the light-field camera; generating at least two output image streams showing the vehicle’s environment from different perspectives based on output data of the light-field camera; and simultaneously displaying the at least two output image streams on the display.
17. A vision system for a gaming console, comprising: a light-field camera according to claim 1, wherein the scene comprises a player of the gaming console; and processing circuitry configured to use output data of the light-field camera as input to a gaming program.
18. The vision system of claim 17, wherein the processing circuitry is further configured to determine a gesture of the player based on the output data.
19. The vision system of claim 17, wherein the processing circuitry is further configured to predict a movement of the player based on the output data.
20. A vision system for a robot, comprising: a light-field camera according to claim 1, wherein the scene is an environment of the robot; and processing circuitry configured to control an operation of the robot based on output data of the light-field camera.
PCT/EP2022/076528 2021-09-29 2022-09-23 Light-field camera, vision system for a vehicle, and method for operating a vision system for a vehicle WO2023052264A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP21199960 2021-09-29
EP21199960.2 2021-09-29

Publications (1)

Publication Number Publication Date
WO2023052264A1 true WO2023052264A1 (en) 2023-04-06

Family

ID=78179202

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/076528 WO2023052264A1 (en) 2021-09-29 2022-09-23 Light-field camera, vision system for a vehicle, and method for operating a vision system for a vehicle

Country Status (1)

Country Link
WO (1) WO2023052264A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160096477A1 (en) * 2014-10-07 2016-04-07 Magna Electronics Inc. Vehicle vision system with gray level transition sensitive pixels
US20180180733A1 (en) * 2016-12-27 2018-06-28 Gerard Dirk Smits Systems and methods for machine perception
WO2020163663A1 (en) * 2019-02-07 2020-08-13 Magic Leap, Inc. Lightweight and low power cross reality device with high temporal resolution
US20210279909A1 (en) * 2020-03-03 2021-09-09 Magic Leap, Inc. Efficient localization based on multiple feature types

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUILLERMO GALLEGO ET AL: "Event-based Vision: A Survey", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 17 April 2019 (2019-04-17), XP081170744 *
ZHU ALEX ZIHAO ET AL: "The Multi Vehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception", 30 January 2018 (2018-01-30), XP055795584, Retrieved from the Internet <URL:https://arxiv.org/pdf/1801.10202.pdf> [retrieved on 20210415], DOI: 10.1109/LRA.2018.2800793 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22782886

Country of ref document: EP

Kind code of ref document: A1