CN115375581A - Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization


Info

Publication number
CN115375581A
Authority
CN
China
Prior art keywords
event
pixel
noise reduction
time
event stream
Prior art date
Legal status
Pending
Application number
CN202211076662.8A
Other languages
Chinese (zh)
Inventor
王立辉
许宁徽
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202211076662.8A
Publication of CN115375581A
Pending legal-status Current

Classifications

    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • G06T3/06
    • G06T5/70
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization comprises the following steps: 1. read the event stream output by a dynamic vision sensor and acquire the pose information of the sensor; 2. perform three-dimensional reconstruction using the event stream and the pose information, synchronizing events triggered at different moments to a reference moment in space and time to obtain a confidence map; 3. convert the confidence map into an event probability map, which represents the response probability of the dynamic vision sensor to the scene under ideal conditions; 4. compute the rationality of the event stream from its consistency with the event probability map; 5. compute the improvement in event-stream rationality achieved by a noise reduction algorithm to obtain a noise-reduction precision index for evaluating and comparing the noise reduction effects of different algorithms. By exploiting the high-frequency advantage of the dynamic vision sensor together with pose information, the method achieves an objective evaluation of event-stream noise reduction precision even when the specific noise distribution and the reference event stream are unknown.

Description

Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization
Technical Field
The invention belongs to the technical field of sensor signal processing, and particularly relates to a dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization.
Background
A Dynamic Vision Sensor (DVS), also called an event camera, is a biologically inspired sensor that mimics the visual system of natural organisms and operates in a completely different manner from conventional cameras. Instead of outputting images at a fixed rate, each pixel of a dynamic vision sensor responds asynchronously to illumination changes and fires an "event" with ultra-low latency (less than 1 microsecond) when the logarithmic change in brightness reaches a preset threshold. Each event is represented by a four-dimensional vector e(x, y, t, p) containing the pixel coordinates (x, y) of the event, the trigger time t, and a polarity p ∈ {1, -1} indicating an increase or decrease in brightness at that pixel. Because the DVS outputs only information related to local brightness changes, it offers fast response, ultra-low latency, high dynamic range, capture of dynamic changes only, and low power consumption, overcoming the high redundancy, low frame rate, large delay, and low dynamic range of conventional cameras; it is therefore widely applied in fields such as autonomous driving and robotics.
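As a concrete illustration of this four-tuple event representation (not part of the patent), the sketch below stores events as a structured NumPy array; the field names, dtypes, and sensor resolution are assumptions.

```python
import numpy as np

# Hypothetical event record e = (x, y, t, p): pixel coordinates, timestamp, polarity.
event_dtype = np.dtype([("x", np.uint16),   # pixel column
                        ("y", np.uint16),   # pixel row
                        ("t", np.float64),  # timestamp in seconds
                        ("p", np.int8)])    # polarity, +1 (brighter) or -1 (darker)

# Three example events on an assumed 346x260 sensor.
events = np.array([(120, 80, 0.000001, 1),
                   (121, 80, 0.000004, -1),
                   (120, 81, 0.000007, 1)], dtype=event_dtype)
```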
Due to its circuit structure, the dynamic vision sensor is very sensitive to changes in ambient brightness and is limited by the hardware level, so the asynchronous event stream it outputs contains a large amount of noise interference. The noise may originate from impulse noise during digital signal transmission, Gaussian noise caused by the photodiodes, and other sources, and it greatly affects further application and visualization of the event stream. Noise reduction of the event stream is therefore a crucial step and has become an important research topic in the field of dynamic vision sensors.
However, because the data volume of the DVS is extremely large (millions of events output per second), it is difficult to manually label the validity of each event, and neither the specific noise distribution nor a reference event stream can be obtained. There is consequently no effective method for measuring the noise reduction effect of an event stream or for comparing the noise reduction effects of different algorithms, which restricts further development and application of event-stream noise reduction algorithms and dynamic vision sensors.
Disclosure of Invention
To address these problems, the invention provides a dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization, which exploits the high-frequency advantage of the dynamic vision sensor together with pose information to achieve an objective evaluation of the event-stream noise reduction effect even when the specific noise distribution and the reference event stream are unknown.
The invention provides a dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization, which comprises the following specific steps:
step 1: read the event stream output by a dynamic vision sensor DVS, and acquire the pose information of the DVS from a motion capture system, visual odometry, inertial navigation, or an indoor positioning method;
step 2: based on an Event-based Multi View Stereo algorithm, reconstruct the actual scene in three dimensions using the event stream together with the pose information, projecting events triggered at different moments to a reference moment for space-time synchronization to obtain a confidence map; this sharpens real events and blurs noise events, and serves as the reference basis for noise reduction evaluation;
step 3: since local maxima and edge regions of the DSI usually correspond to intensity gradients in the scene and are therefore more likely to trigger events, after obtaining the confidence map c(x, y) at the reference moment t_r, convert it into an event probability map by computing, for each pixel, the probability of being a local maximum or an edge; this map represents, under ideal conditions, the probability that the real scene triggers an event on each DVS pixel at time t_r;
step 4: compute the rationality of the event stream based on its consistency with the event probability map at the reference moment t_r;
step 5: a high-precision event-stream denoising method removes noise events of low rationality and keeps valid events of high rationality, thereby improving the overall rationality of the event stream; therefore, the improvement in event-stream rationality brought by a noise reduction algorithm is calculated by comparing the rationality of the event stream before and after noise reduction, yielding a noise-reduction precision index for evaluating and comparing the noise reduction effects of different algorithms:
Figure BDA0003831744010000021
where e_original and e_denoised denote the event streams before and after noise reduction, respectively; the higher the precision index, the more markedly the noise reduction algorithm improves the rationality of the event stream and the better its noise reduction effect.
As a further improvement of the invention, the three-dimensional reconstruction using the event stream and the pose information in step 2 comprises the following processes:
before three-dimensional reconstruction, repeated events on each pixel are detected and only the first trigger event is retained:
IE = {e_i(x_i, y_i, t_i) | (t_i - t_{i-1}) > τ_IE ∧ (t_{i+1} - t_i) < τ_IE}
where IE denotes the first event of a repeated-trigger sequence, t_i denotes the timestamp of the i-th event triggered on a given pixel, and the time threshold parameter τ_IE is set to 20 ms;
then, three-dimensional reconstruction based on the event is carried out to generate a confidence map, and the method specifically comprises the following steps:
(2-1) select the observation view at the reference moment as the reference view, and discretize the observed space along the optical-axis direction of the reference view into a grid to construct a disparity space image DSI; the DSI discretizes the reference view into N depth planes
Figure BDA0003831744010000022
each depth plane is divided into w × h spatial cells, consistent with the pixel resolution of the DVS, so the DSI is divided into w × h × N voxel cells, where N is set to 100;
(2-2) project all events from the pixel plane into the disparity space image DSI according to the poses at their corresponding moments, and count, for each voxel of the DSI, the number of intersections with the event back-projection rays; the more intersections, the more often the DVS has observed and responded to the corresponding region, the greater the probability that the voxel contains a scene edge, and correspondingly the greater the probability of triggering an event on the DVS at the reference view;
during event projection, valid events triggered continuously at high frequency by the actual scene are automatically synchronized to their corresponding spatial positions, whereas noise in the event stream lacks spatio-temporal persistence, cannot accumulate votes in any fixed region of the DSI, and is diluted by the valid events;
finally, record, for each pixel of the reference view, the maximum DSI value along the optical-axis direction to obtain the confidence map under the reference view.
As a further improvement of the invention, the step of projecting the event from the pixel plane to the DSI in step 2 is as follows:
the intersection cells between the event back-projection ray and the depth planes of the DSI are solved by homography; each depth plane
Figure BDA0003831744010000031
is given by:
Z_i = [n, d_i]^T = [(0, 0, 1), z_i]^T
where n and z_i are the normal vector and the depth of each plane, respectively;
in the projection process, for each event e_i(x_i, y_i), the relative pose [R | t] between its observation moment and the reference moment is first used to compute the homography matrix relative to the Z_0 plane
Figure BDA0003831744010000032
Figure BDA0003831744010000033
then, combined with the projection matrix P of the DVS, the projection coordinates of the event on the Z_0 plane are obtained from its pixel coordinates through the homography transformation:
Figure BDA0003831744010000034
Figure BDA0003831744010000035
where (x_i, y_i) and (x(z_0), y(z_0)) are the pixel coordinates of the event and its projection coordinates on the Z_0 plane, respectively;
the projection coordinates of the event on the remaining depth planes of the DSI are again obtained by homography transformation from its coordinates on the Z_0 plane:
Figure BDA0003831744010000036
where
Figure BDA0003831744010000037
which simplifies to the coordinates of the event on the Z_i plane:
Figure BDA0003831744010000038
where (c_x, c_y, c_z)^T = -R^T t is the position of the DVS relative to the reference moment.
As a further improvement of the present invention, the step 3 of converting the confidence map into the event probability map comprises the following processes:
(3-1) for each pixel (x, y) in the confidence map, use its spatial proximity and pixel similarity to each pixel (x_i, y_i) ∈ Ω in the neighborhood window Ω to construct a spatial-domain Gaussian kernel G_d and a range-domain Gaussian kernel G_r:
Figure BDA0003831744010000041
Figure BDA0003831744010000042
where
Figure BDA0003831744010000043
(3-2) then take the normalized product of the spatial-domain and range-domain Gaussian kernels as the weight W(x_i, y_i), perform a weighted fusion of all pixels in the window Ω to obtain an adaptive threshold T(x, y), compare it with the confidence value c(x, y) at the central pixel, and compute the probability that pixel (x, y) is a DSI local maximum or edge region, which represents the event trigger probability p(x, y):
Figure BDA0003831744010000044
Figure BDA0003831744010000045
Figure BDA0003831744010000046
the window size is set to 7 × 7, and the above steps are repeated for all pixels of the confidence map to obtain the event probability map at the reference moment.
As a further improvement of the present invention, the event stream rationality calculation in step 4 includes the following processes:
(4-1) for the event stream e_i(x_i, y_i, t_i), use I: Z^2 → {0, 1} to represent the event-trigger status on the dynamic vision sensor pixel plane Z^2 within a time range around the reference moment:
Figure BDA0003831744010000047
where τ denotes the time range, and 1 and 0 indicate, respectively, the presence and absence of an event at the pixel during [t_r - τ, t_r + τ];
(4-2) when I(x, y) = 1, construct an exponential decay kernel from the time distance between the event timestamp at the corresponding pixel and the reference moment, representing the temporal correlation between the event stream at that pixel and the event probability map:
Figure BDA0003831744010000051
where the decay rate parameter δ_t is set to 20 ms; the rationality of the event stream triggering an event at a pixel is quantified by the product of the event probability p(x, y) and Γ(x, y) at that location, and the greater the rationality, the more likely the event is a valid signal triggered by the actual scene;
when I(x, y) = 0, the inverse event probability on the event probability map is used:
Figure BDA0003831744010000052
to represent the rationality of the pixel not triggering an event;
(4-3) the log rationality of the event stream on pixel (x, y) during [t_r - τ, t_r + τ] is thus obtained:
Figure BDA0003831744010000053
the log rationality is computed separately for all pixel positions on the pixel plane Z^2 to obtain the rationality of the event stream e_i:
Figure BDA0003831744010000054
the smaller logP(e_i) is, the better the consistency between the event stream e_i and the event probability map, and the higher its rationality.
Beneficial effects:
The invention fully exploits the high-frequency characteristic of the dynamic vision sensor and synchronizes the valid events continuously triggered by the actual scene on the pixel plane into the three-dimensional space at the reference moment, so that valid events are sharpened and highlighted while noise events are blurred and filtered; the rationality of the event stream and the noise-reduction precision of an algorithm can thus be quantified, and the noise reduction effect of the event stream can be evaluated objectively even when the specific noise distribution and the reference event stream are unknown.
Drawings
Fig. 1 is a flowchart of an event stream noise reduction effect evaluation method provided by the present invention.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
the invention discloses a dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization, the flow of the method is shown in figure 1, and the method specifically comprises the following steps:
step 1: and reading an event stream output by a Dynamic Vision Sensor (DVS), and acquiring the pose information of the DVS by using a motion capture system, a visual odometer, inertial navigation or indoor positioning and other methods.
Step 2: based on an Event-based multiview Stereo (EMVS) algorithm, an actual scene is reconstructed in a three-dimensional mode by using Event streams and pose information, events triggered at different moments are projected to reference moments to be subjected to space-time synchronization, a confidence map is obtained, sharpening of real events and blurring of noise events are achieved, and the sharpened events and the blurred events are used as a reference basis for noise reduction.
Since repeated events on the same pixel can be repeatedly projected to a reference moment in a short time, so that the construction of a confidence map and the subsequent evaluation of event effectiveness are influenced, before three-dimensional reconstruction is carried out, the repeated events on each pixel are detected, and only the first trigger event is reserved:
IE={e i (x i ,y i ,t i )(t i -t i-1 )>τ IE ^(t i+1 -t i )<τ IE }
wherein IE represents the first event in the repeated trigger event and represents t i Time stamp of the i-th event triggered on a certain pixel, time threshold parameter t IE Set to 20ms.
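As a minimal illustration of this pre-filtering rule (a sketch, not the patent's implementation), the snippet below groups events by pixel and keeps an event only when the gap to the previous event on that pixel exceeds τ_IE while the gap to the next one is below it, with τ_IE = 20 ms as stated above; the event tuple layout and the handling of sequence boundaries are assumptions.

```python
from collections import defaultdict

def keep_first_of_repeats(events, tau_ie=0.020):
    """events: iterable of (x, y, t, p) tuples with t in seconds, sorted by t."""
    per_pixel = defaultdict(list)
    for e in events:
        per_pixel[(e[0], e[1])].append(e)

    kept = []
    for evs in per_pixel.values():
        for i, e in enumerate(evs):
            gap_prev = e[2] - evs[i - 1][2] if i > 0 else float("inf")
            gap_next = evs[i + 1][2] - e[2] if i + 1 < len(evs) else float("inf")
            # First event of a burst: long quiet period before it, another event soon after.
            if gap_prev > tau_ie and gap_next < tau_ie:
                kept.append(e)
    return sorted(kept, key=lambda e: e[2])
```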
Then, three-dimensional reconstruction based on the event is carried out to generate a confidence map, and the method specifically comprises the following steps:
and (2-1) selecting an observation visual angle at a reference moment as a reference visual angle, and discretizing an observation camera system along the optical axis direction into a grid map to construct a Disparity Space Image (DSI). DSI discretizes reference views into N depth planes
Figure BDA0003831744010000061
Each depth plane is divided into w × h spatial cells, consistent with the pixel resolution of DVS, and thus DSI is divided into w × h × N spatial voxel cells. Where N is set to 100.
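For concreteness, a small sketch of allocating such a DSI is given below; the sensor resolution, depth range, and inverse-depth sampling of the planes are illustrative assumptions, since the text only specifies N = 100 depth planes and a w × h × N voxel grid.

```python
import numpy as np

w, h, N = 346, 260, 100                      # assumed pixel resolution, N depth planes
z_min, z_max = 0.5, 5.0                      # assumed scene depth range in metres
# Depth planes sampled uniformly in inverse depth (a common EMVS choice, assumed here).
depths = 1.0 / np.linspace(1.0 / z_max, 1.0 / z_min, N)
dsi = np.zeros((h, w, N), dtype=np.uint32)   # vote count for each voxel cell
```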
(2-2) Project all events from the pixel plane into the DSI according to the poses at their corresponding moments, and count, for each voxel of the DSI, the number of intersections (also called "votes") with the event back-projection rays; the more intersections, the more often the DVS has observed and responded to the corresponding region, the greater the probability that the voxel contains a scene edge, and correspondingly the greater the probability of triggering an event on the DVS at the reference view.
The intersection cells between the event back-projection ray and the depth planes of the DSI are solved by homography; each depth plane
Figure BDA0003831744010000062
is given by:
Z_i = [n, d_i]^T = [(0, 0, 1), z_i]^T
where n and z_i are the normal vector and the depth of each plane, respectively.
In the projection process, for each event e_i(x_i, y_i), the relative pose [R | t] between its observation moment and the reference moment is first used to compute the homography matrix H_Z0 relative to the Z_0 plane:
Figure BDA0003831744010000063
Then, combined with the projection matrix P of the DVS, the projection coordinates of the event on the Z_0 plane are obtained from its pixel coordinates through the homography transformation:
Figure BDA0003831744010000064
Figure BDA0003831744010000071
where (x_i, y_i) and (x(z_0), y(z_0)) are the pixel coordinates of the event and its projection coordinates on the Z_0 plane, respectively.
The projection coordinates of the event on the remaining depth planes of the DSI are again obtained by homography transformation from its coordinates on the Z_0 plane:
Figure BDA0003831744010000072
where
Figure BDA0003831744010000073
which simplifies to the coordinates of the event on the Z_i plane:
Figure BDA0003831744010000074
where (c_x, c_y, c_z)^T = -R^T t is the position of the DVS relative to the reference moment.
During event projection and composition, valid events triggered continuously at high frequency by the actual scene are automatically synchronized to their corresponding spatial positions, whereas noise in the event stream lacks spatio-temporal persistence, cannot accumulate votes in any fixed region of the DSI, and is diluted by the valid events; the algorithm is therefore robust to noise and provides a reference for objectively evaluating the noise reduction effect of the event stream.
Finally, record, for each pixel of the reference view, the maximum DSI value along the optical-axis direction to obtain the confidence map under the reference view.
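The voting and confidence-map step can be sketched as follows; project_to_plane() is a hypothetical helper standing in for the homography chain above, and nearest-pixel rounding of the projected coordinates is an assumption.

```python
import numpy as np

def build_confidence_map(events, dsi, project_to_plane):
    """Accumulate back-projection votes in the DSI and take the per-pixel maximum
    along the depth axis to obtain the confidence map c(x, y)."""
    h, w, n_planes = dsi.shape
    for e in events:
        for i in range(n_planes):
            u, v = project_to_plane(e, i)          # event coordinates on depth plane Z_i
            u, v = int(round(u)), int(round(v))
            if 0 <= u < w and 0 <= v < h:
                dsi[v, u, i] += 1                  # one vote per intersected voxel
    return dsi.max(axis=2).astype(np.float64)      # confidence map under the reference view
```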
Step 3: since local maxima and edge regions of the DSI usually correspond to intensity gradients in the scene and are therefore more likely to trigger events, after obtaining the confidence map c(x, y) at the reference moment t_r, each pixel is converted into an event probability map by computing its probability of being a local maximum or an edge; this map represents, under ideal conditions, the probability that the real scene triggers an event on each DVS pixel at time t_r. The conversion comprises the following steps:
(3-1) For each pixel (x, y) in the confidence map, use its spatial proximity and pixel similarity to each pixel (x_i, y_i) ∈ Ω in the neighborhood window Ω to construct a spatial-domain Gaussian kernel G_d and a range-domain Gaussian kernel G_r:
Figure BDA0003831744010000075
Figure BDA0003831744010000076
where
Figure BDA0003831744010000077
(3-2) Then take the normalized product of the spatial-domain and range-domain Gaussian kernels as the weight W(x_i, y_i), perform a weighted fusion of all pixels in the window Ω to obtain an adaptive threshold T(x, y), compare it with the confidence value c(x, y) at the central pixel, and compute the probability that pixel (x, y) is a DSI local maximum or edge region, which represents the event trigger probability p(x, y):
Figure BDA0003831744010000081
Figure BDA0003831744010000082
Figure BDA0003831744010000083
The window size is set to 7 × 7. Repeating the above steps for all pixels of the confidence map yields the event probability map at the reference moment. By introducing spatial proximity and pixel similarity, the confidence map is thus converted into an event probability map that incorporates local scene information in accordance with the event-triggering characteristics.
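A hedged sketch of this conversion is shown below. The exact Gaussian-kernel expressions and the mapping from the difference c(x, y) − T(x, y) to a probability are given only as images in the original, so the forms used here (standard bilateral-filter kernels and a logistic squashing) are assumptions for illustration.

```python
import numpy as np

def event_probability_map(c, win=7, sigma_d=2.0, sigma_r=None, beta=1.0):
    """c: confidence map; returns an assumed event trigger probability p(x, y)."""
    h, w = c.shape
    r = win // 2
    sigma_r = sigma_r if sigma_r is not None else max(float(c.std()), 1e-6)
    # Spatial-domain Gaussian kernel over the window offsets (same for every pixel).
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    G_d = np.exp(-(xx**2 + yy**2) / (2 * sigma_d**2))
    p = np.zeros_like(c, dtype=np.float64)
    cp = np.pad(c, r, mode="edge")
    for y in range(h):
        for x in range(w):
            patch = cp[y:y + win, x:x + win]
            G_r = np.exp(-(patch - c[y, x])**2 / (2 * sigma_r**2))   # range-domain kernel
            W = G_d * G_r
            W /= W.sum()                                             # normalized weights
            T = (W * patch).sum()                                    # adaptive threshold T(x, y)
            p[y, x] = 1.0 / (1.0 + np.exp(-beta * (c[y, x] - T)))    # assumed c-vs-T mapping
    return p
```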
Step 4: compute the rationality of the event stream around the reference moment t_r based on its consistency with the event probability map, specifically as follows:
(4-1) For the event stream e_i(x_i, y_i, t_i), use I: Z^2 → {0, 1} to represent the event-trigger status on the dynamic vision sensor pixel plane Z^2 within a time range around the reference moment:
Figure BDA0003831744010000084
where τ denotes the time range, and 1 and 0 indicate, respectively, the presence and absence of an event at the pixel during [t_r - τ, t_r + τ].
(4-2) When I(x, y) = 1, construct an exponential decay kernel from the time distance between the event timestamp at the corresponding pixel and the reference moment, representing the temporal correlation between the event stream at that pixel and the event probability map:
Figure BDA0003831744010000085
The decay rate parameter δ_t is set to 20 ms. The rationality of the event stream triggering an event at a pixel is quantified by the product of the event probability p(x, y) and Γ(x, y) at that location; the greater the rationality, the more likely the event is a valid signal triggered by the actual scene.
When I(x, y) = 0, the inverse event probability on the event probability map is used:
Figure BDA0003831744010000086
to represent the rationality of the pixel not triggering an event.
(4-3) The log rationality of the event stream on pixel (x, y) during [t_r - τ, t_r + τ] is thus obtained:
Figure BDA0003831744010000091
The log rationality is computed separately for all pixel positions on the pixel plane Z^2 to obtain the rationality of the event stream e_i:
Figure BDA0003831744010000092
The smaller logP(e_i) is, the better the consistency between the event stream e_i and the event probability map, and the higher its rationality.
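The rationality computation can be sketched as below. τ and δ_t follow the values in the text (δ_t = 20 ms); how I(x, y), p(x, y), and Γ(x, y) are combined into logP(e_i) is given only as images in the original, so the negative-log form used here is an assumption consistent with the description (triggered pixels contribute the log of p·Γ, silent pixels the log of 1 − p, and smaller totals indicate better consistency).

```python
import numpy as np

def event_stream_rationality(events, p_map, t_r, tau=0.010, delta_t=0.020):
    """events: iterable of (x, y, t, p); p_map: event probability map at t_r."""
    h, w = p_map.shape
    nearest_dt = np.full((h, w), np.inf)          # time distance of closest event to t_r
    for x, y, t, _ in events:
        if abs(t - t_r) <= tau:
            nearest_dt[y, x] = min(nearest_dt[y, x], abs(t - t_r))

    triggered = np.isfinite(nearest_dt)           # I(x, y)
    gamma = np.exp(-nearest_dt / delta_t)         # exp(-inf) -> 0 where no event fired
    eps = 1e-12
    nll = -np.where(triggered,
                    np.log(p_map * gamma + eps),  # assumed term for triggered pixels
                    np.log(1.0 - p_map + eps))    # assumed term for silent pixels
    # Negative-log form assumed so that a smaller total indicates better consistency
    # with the event probability map, matching the description of logP(e_i).
    return float(nll.sum())
```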
Step 5: a high-precision event-stream denoising method removes noise events of low rationality and keeps valid events of high rationality, thereby improving the overall rationality of the event stream. Therefore, the improvement in event-stream rationality brought by a noise reduction algorithm is calculated by comparing the rationality of the event stream before and after noise reduction, yielding a noise-reduction precision index for evaluating and comparing the noise reduction effects of different algorithms:
Figure BDA0003831744010000093
where e_original and e_denoised denote the event streams before and after noise reduction, respectively; the higher the precision index, the more markedly the noise reduction algorithm improves the rationality of the event stream and the better its noise reduction effect.
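As a final illustration, one plausible form of the precision index is sketched below; the exact formula is given only as an image in the original, so this relative-improvement expression is an assumption that merely matches the stated behaviour (a larger index when denoising lowers logP(e_i) more).

```python
def denoising_precision_index(logP_original, logP_denoised):
    """Assumed relative-improvement form of the noise-reduction precision index."""
    return (logP_original - logP_denoised) / abs(logP_original)

# Usage with the rationality values computed above (hypothetical numbers):
# index = denoising_precision_index(logP_original=5.2e4, logP_denoised=3.9e4)
```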
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, but any modifications or equivalent variations made according to the technical spirit of the present invention are within the scope of the present invention as claimed.

Claims (5)

1. The method for evaluating the noise reduction effect of the dynamic visual event stream based on event time-space synchronization comprises the following specific steps:
step 1: reading an event stream output by a dynamic vision sensor DVS, and acquiring the pose information of the DVS from a motion capture system, visual odometry, inertial navigation, or an indoor positioning method;
step 2: based on an Event-based Multi View Stereo algorithm, reconstructing the actual scene in three dimensions using the event stream together with the pose information, and projecting events triggered at different moments to a reference moment for space-time synchronization to obtain a confidence map, which sharpens real events and blurs noise events and serves as the reference basis for noise reduction evaluation;
step 3: since local maxima and edge regions of the DSI usually correspond to intensity gradients in the scene and are therefore more likely to trigger events, after obtaining the confidence map c(x, y) at the reference moment t_r, converting each pixel into an event probability map by computing its probability of being a local maximum or an edge, the map representing, under ideal conditions, the probability that the real scene triggers an event on each DVS pixel at time t_r;
step 4: computing the rationality of the event stream based on its consistency with the event probability map at the reference moment t_r;
step 5: a high-precision event-stream denoising method removes noise events of low rationality and keeps valid events of high rationality, thereby improving the overall rationality of the event stream; therefore, the improvement in event-stream rationality brought by the noise reduction algorithm is calculated by comparing the rationality of the event stream before and after noise reduction, yielding a noise-reduction precision index for evaluating and comparing the noise reduction effects of different algorithms:
Figure FDA0003831743000000011
where e_original and e_denoised denote the event streams before and after noise reduction, respectively; the higher the precision index, the more markedly the noise reduction algorithm improves the rationality of the event stream and the better its noise reduction effect.
2. The method for evaluating the noise reduction effect of the dynamic visual event stream based on event space-time synchronization as claimed in claim 1, wherein the three-dimensional reconstruction using the event stream and the pose information in step 2 comprises the following processes:
before three-dimensional reconstruction, repeated events on each pixel are detected and only the first trigger event is retained:
IE = {e_i(x_i, y_i, t_i) | (t_i - t_{i-1}) > τ_IE ∧ (t_{i+1} - t_i) < τ_IE}
where IE denotes the first event of a repeated-trigger sequence, t_i denotes the timestamp of the i-th event triggered on a given pixel, and the time threshold parameter τ_IE is set to 20 ms;
then, event-based three-dimensional reconstruction is carried out to generate a confidence map, with the following specific steps:
(2-1) select the observation view at the reference moment as the reference view, and discretize the observed space along the optical-axis direction of the reference view into a grid to construct a disparity space image DSI; the DSI discretizes the reference view into N depth planes
Figure FDA0003831743000000021
each depth plane is divided into w × h spatial cells, consistent with the pixel resolution of the DVS, so the DSI is divided into w × h × N voxel cells, where N is set to 100;
(2-2) project all events from the pixel plane into the disparity space image DSI according to the poses at their corresponding moments, and count, for each voxel of the DSI, the number of intersections with the event back-projection rays; the more intersections, the more often the DVS has observed and responded to the corresponding region, the greater the probability that the voxel contains a scene edge, and correspondingly the greater the probability of triggering an event on the DVS at the reference view;
during event projection, valid events triggered continuously at high frequency by the actual scene are automatically synchronized to their corresponding spatial positions, whereas noise in the event stream lacks spatio-temporal persistence, cannot accumulate votes in any fixed region of the DSI, and is diluted by the valid events;
finally, record, for each pixel of the reference view, the maximum DSI value along the optical-axis direction to obtain the confidence map under the reference view.
3. The method for evaluating the noise reduction effect of the dynamic visual event stream based on event space-time synchronization as claimed in claim 2, wherein the step of projecting the event from the pixel plane to the DSI in step 2 is as follows:
the intersection cells between the event back-projection ray and the depth planes of the DSI are solved by homography; each depth plane
Figure FDA0003831743000000022
is given by:
Z_i = [n, d_i]^T = [(0, 0, 1), z_i]^T
where n and z_i are the normal vector and the depth of each plane, respectively;
in the projection process, for each event e_i(x_i, y_i), the relative pose [R | t] between its observation moment and the reference moment is first used to compute the homography matrix relative to the Z_0 plane
Figure FDA0003831743000000023
Figure FDA0003831743000000024
then, combined with the projection matrix P of the DVS, the projection coordinates of the event on the Z_0 plane are obtained from its pixel coordinates through the homography transformation:
Figure FDA0003831743000000025
Figure FDA0003831743000000026
where (x_i, y_i) and (x(z_0), y(z_0)) are the pixel coordinates of the event and its projection coordinates on the Z_0 plane, respectively;
the projection coordinates of the event on the remaining depth planes of the DSI are again obtained by homography transformation from its coordinates on the Z_0 plane:
Figure FDA0003831743000000031
where
Figure FDA0003831743000000032
which simplifies to the coordinates of the event on the Z_i plane:
Figure FDA0003831743000000033
where (c_x, c_y, c_z)^T = -R^T t is the position of the DVS relative to the reference moment.
4. The method for evaluating the noise reduction effect of the dynamic visual event stream based on the event space-time synchronization as claimed in claim 1, wherein: the step 3 of converting the confidence map into the event probability map comprises the following processes:
(3-1) for each pixel (x, y) in the confidence map, use its spatial proximity and pixel similarity to each pixel (x_i, y_i) ∈ Ω in the neighborhood window Ω to construct a spatial-domain Gaussian kernel G_d and a range-domain Gaussian kernel G_r:
Figure FDA0003831743000000034
Figure FDA0003831743000000035
where
Figure FDA0003831743000000036
(3-2) then take the normalized product of the spatial-domain and range-domain Gaussian kernels as the weight W(x_i, y_i), perform a weighted fusion of all pixels in the window Ω to obtain an adaptive threshold T(x, y), compare it with the confidence value c(x, y) at the central pixel, and compute the probability that pixel (x, y) is a DSI local maximum or edge region, which represents the event trigger probability p(x, y):
Figure FDA0003831743000000037
Figure FDA0003831743000000038
Figure FDA0003831743000000039
the window size is set to 7 × 7, and the above steps are repeated for all pixels of the confidence map to obtain the event probability map at the reference moment.
5. The method for evaluating the noise reduction effect of the dynamic visual event stream based on the event space-time synchronization as claimed in claim 1, wherein: the event stream rationality calculation in step 4 includes the following processes:
(4-1) for the event stream e_i(x_i, y_i, t_i), use I: Z^2 → {0, 1} to represent the event-trigger status on the dynamic vision sensor pixel plane Z^2 within a time range around the reference moment:
Figure FDA0003831743000000041
where τ denotes the time range, and 1 and 0 indicate, respectively, the presence and absence of an event at the pixel during [t_r - τ, t_r + τ];
(4-2) when I(x, y) = 1, construct an exponential decay kernel from the time distance between the event timestamp at the corresponding pixel and the reference moment, representing the temporal correlation between the event stream at that pixel and the event probability map:
Figure FDA0003831743000000042
where the decay rate parameter δ_t is set to 20 ms; the rationality of the event stream triggering an event at a pixel is quantified by the product of the event probability p(x, y) and Γ(x, y) at that location, and the greater the rationality, the more likely the event is a valid signal triggered by the actual scene;
when I(x, y) = 0, the inverse event probability on the event probability map is used:
Figure FDA0003831743000000043
to represent the rationality of the pixel not triggering an event;
(4-3) the log rationality of the event stream on pixel (x, y) during [t_r - τ, t_r + τ] is thus obtained:
Figure FDA0003831743000000044
the log rationality is computed separately for all pixel positions on the pixel plane Z^2 to obtain the rationality of the event stream e_i:
Figure FDA0003831743000000045
the smaller logP(e_i) is, the better the consistency between the event stream e_i and the event probability map, and the higher its rationality.
CN202211076662.8A 2022-09-05 2022-09-05 Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization Pending CN115375581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211076662.8A CN115375581A (en) 2022-09-05 2022-09-05 Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211076662.8A CN115375581A (en) 2022-09-05 2022-09-05 Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization

Publications (1)

Publication Number Publication Date
CN115375581A true CN115375581A (en) 2022-11-22

Family

ID=84069308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211076662.8A Pending CN115375581A (en) 2022-09-05 2022-09-05 Dynamic visual event stream noise reduction effect evaluation method based on event time-space synchronization

Country Status (1)

Country Link
CN (1) CN115375581A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115996320A (en) * 2023-03-22 2023-04-21 深圳市九天睿芯科技有限公司 Event camera adaptive threshold adjustment method, device, equipment and storage medium
CN116957973A (en) * 2023-07-25 2023-10-27 上海宇勘科技有限公司 Data set generation method for event stream noise reduction algorithm evaluation
CN116957973B (en) * 2023-07-25 2024-03-15 上海宇勘科技有限公司 Data set generation method for event stream noise reduction algorithm evaluation
CN117115451A (en) * 2023-08-31 2023-11-24 上海宇勘科技有限公司 Adaptive threshold event camera denoising method based on space-time content correlation
CN117115451B (en) * 2023-08-31 2024-03-26 上海宇勘科技有限公司 Adaptive threshold event camera denoising method based on space-time content correlation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination