WO2023083121A1

WO2023083121A1 - Denoising method and related device

Info

Publication number: WO2023083121A1
Application number: PCT/CN2022/130027
Authority: WO
Inventors: 冷卢子未
Original assignee: 华为技术有限公司
Priority date: 2021-11-09
Filing date: 2022-11-04
Publication date: 2023-05-19
Also published as: CN116109489A

Abstract

The present application provides a denoising method and a related device in the field of artificial intelligence. The method comprises: obtaining a first dynamic vision sensor signal; and performing denoising processing on the first dynamic vision sensor signal by means of a spiking neural network model to obtain a second dynamic vision sensor signal, wherein a postsynaptic membrane voltage kernel function of a spiking neuron of the spiking neural network model comprises a target parameter, and the target parameter is determined according to autocorrelation coefficients of dynamic vision sensor signals. By means of embodiments of the present application, the denoising effect of the dynamic vision sensor signal can be improved.

Description

Denoising method and related equipment

This application claims the priority of the Chinese patent application with the application number 202111321228.7 and the title of the invention "Denoising method and related equipment" filed with the State Intellectual Property Office of China on November 09, 2021, the entire contents of which are incorporated by reference in this application middle.

technical field

The embodiments of the present application relate to the technical field of artificial intelligence, and in particular, to a denoising method and related equipment.

Background technique

The existing dynamic vision sensor (Dynamic Vision Sensor, DVS) signal denoising methods are mainly filter denoising and artificial neural network (Artificial Neural Network, ANN) denoising based on deep learning framework. Filtering denoising methods include temporal filtering methods or spatial filtering methods, which mainly denoise by filtering temporally or spatially isolated events. The denoising method based on the artificial neural network usually compresses the dynamic visual sensor data stream into an image to increase the data density, and then uses the traditional RGB image denoising network to denoise the image obtained by compressing the frame; among them, based on the two-dimensional (2D) Convolutional Neural Networks (CNN) denoising methods usually require an additional definition of a noise model containing time information, while denoising methods based on three-dimensional (3D) Convolutional Neural Networks require a period of time window temporal convolution.

However, the filter denoising method is easy to filter events and noise together when the data is sparse, and the performance on the benchmark dataset is not as good as the denoising method based on the artificial neural network; while the denoising method based on the artificial neural network faces the problem of large network size, Problems such as large amount of calculation and long processing time. Moreover, due to the characteristics of highly sparse dynamic visual sensor signals, large data density variation range, and high time resolution, existing dynamic visual sensor signal denoising methods cannot achieve good denoising effects.

Contents of the invention

The present application provides a denoising method and related equipment, which can improve the denoising effect of dynamic visual sensor signals.

According to a first aspect, the present application relates to a denoising method, comprising: acquiring a first dynamic visual sensor signal; using a Spiking Neural Network (SNN) model to perform denoising processing on the first dynamic visual sensor signal, To obtain the second dynamic visual sensor signal, the post-synaptic membrane voltage (Post-Synaptic Potential, PSP) kernel function of the spiking neuron (spiking neuron) of the spiking neural network model includes a target parameter, and the target parameter is based on the dynamic The autocorrelation coefficient of the visual sensor signal is determined.

In this application, the spiking neural network model is used to denoise the dynamic visual sensor signal. The spiking neural network model includes spiking neurons, and the target parameters in the postsynaptic membrane voltage kernel function of the spiking neurons are based on the dynamic The autocorrelation coefficient of the visual sensor signal is determined, so the post-synaptic membrane voltage kernel function can enable the spiking neural network model to learn the time correlation of the dynamic visual sensor signal, and remove the noise with weak temporal correlation in the dynamic visual sensor signal events, so that the denoising effect of the spiking neural network model on dynamic visual sensor signals can be improved. Moreover, due to the inherent temporal dynamics of the spiking neural network model, it performs streaming denoising on highly sparse dynamic visual sensor signals, that is, the input data of the spiking neural network model is the same as the number of output data frames, and the data in the spiking neural network One pass in the model does not rely on three-dimensional convolution to perform traversal sliding window calculations in the time dimension, so compared with the existing denoising methods based on artificial neural networks, it can greatly reduce the running time, network size and calculation amount.

In a possible implementation manner, the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients, and the plurality of autocorrelation coefficients are based on a plurality of first target dynamic vision sensor signals within a preset time period. obtained; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting according to the distribution of the multiple autocorrelation coefficients in time.

In this implementation, based on the time distribution of multiple autocorrelation coefficients obtained from multiple first target dynamic visual sensor signals within a preset time period, function fitting is performed to obtain a preset function, so the preset function represents The relationship between the autocorrelation coefficient of the dynamic visual sensor signal and time; then the inverse function is calculated for the preset function to obtain the inverse function of the preset function, so the inverse function of the preset function represents the autocorrelation between time and the dynamic visual sensor signal The relationship between the coefficients; then the time value obtained by solving the inverse function of the preset function based on the preset autocorrelation coefficient threshold is used as the value of the target parameter in the post-synaptic membrane voltage kernel function of the spiking neuron. In this way, the target parameters in the post-synaptic membrane voltage kernel function are adjusted, so that the spiking neural network model learns the temporal correlation of the dynamic visual sensor signal.

In a possible implementation, any autocorrelation coefficient among the plurality of autocorrelation coefficients is the average value of the first values corresponding to the D second target dynamic vision sensor signals, and D is a positive integer; the D The second target dynamic vision sensor signals are the D first target dynamic vision sensor signals belonging to the same first preset period among the multiple first target dynamic vision sensor signals, and the preset time period includes a plurality of the first target dynamic vision sensor signals The first preset period; the first numerical value corresponding to any second target dynamic visual sensor signal in the D second target dynamic visual sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels; The second numerical value corresponding to any pixel in the plurality of pixels is obtained according to the first signal value of the any pixel and the target signal value of the first target pixel, and the first target pixel is the same as the any pixel The pixels whose proximity is not greater than the preset proximity threshold; the any second target dynamic visual sensor signal includes the first signal value of any pixel, and the any second target dynamic visual sensor signal is the The w first target dynamic vision sensor signal in a plurality of first target dynamic vision sensor signals, w is a positive integer; the target signal value of the first target pixel is the first target pixel in the third target dynamic vision The first signal value in the sensor signal, the third target dynamic vision sensor signal is the w+qth first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals, and q is a positive integer. Wherein, the plurality of pixels are a plurality of pixels in the photosensitive element of the dynamic vision sensor, or the plurality of pixels are a plurality of pixels in a two-dimensional image, and the two-dimensional image is collected by the photosensitive element of the dynamic vision sensor Obtained: the first target pixel is a pixel whose proximity to any pixel in the photosensitive element of the dynamic vision sensor is not greater than a preset proximity threshold, or the first target pixel is a pixel in a two-dimensional image that is not closer to the pixel The proximity of any pixel is not greater than the preset proximity threshold, and the two-dimensional image is collected by the photosensitive element of the dynamic vision sensor.

In this implementation, the multiple first target dynamic visual sensor signals are dynamic visual sensor signals at multiple moments in the preset time period, and the dynamic visual sensor signals in the preset time period can be calculated by the method of time sliding window. Autocorrelation coefficient; for example, the size of the first preset period is used as the size of the time window to slide the window on the preset time period, and the time interval of one sliding is the size of a first preset period; during a time sliding window process, the time The window can frame D first target dynamic visual sensor signals in a plurality of first target dynamic visual sensor signals, for example, denoted as D second target dynamic visual sensor signals; based on the D second target dynamic visual sensor signals, The autocorrelation coefficient of the dynamic visual sensor signal corresponding to the time sliding window is calculated. Wherein, because each first target dynamic visual sensor signal of a plurality of first target dynamic visual sensor signals comprises the first signal value of a plurality of pixels, so D second in each time window (or first preset period) Each second target dynamic visual sensor signal in the target dynamic visual sensor signal also includes the first signal values of a plurality of pixels, so each time window (or the first signal value of each time window (or the second) can be calculated according to the first signal values of adjacent pixels at different times. A preset period) corresponding to the autocorrelation coefficient. In this way, with the size of the first preset period as the time window size, after multiple sliding windows in the preset time period, the autocorrelation coefficient of the dynamic visual sensor signal corresponding to the multiple time sliding windows can be calculated, so that it can be obtained Multiple autocorrelation coefficients.

In a possible implementation manner, the first target pixel includes a plurality of second target pixels, and the second value corresponding to any pixel is accumulated according to the third value corresponding to the plurality of second target pixels The third numerical value corresponding to any second target pixel among the plurality of second target pixels is the product of the first signal value of any pixel and the target signal value of any second target pixel.

In this implementation, for any pixel in all the pixels in the dynamic visual sensor signal, the first signal value of the any pixel at a certain moment is respectively compared with the value of each adjacent pixel of the any pixel at another moment Multiply the first signal value of each adjacent pixel to obtain the third numerical value corresponding to each adjacent pixel, that is, to obtain a plurality of third numerical values corresponding to any pixel; then add the third numerical values corresponding to each adjacent pixel to obtain The second numerical value corresponding to any pixel; and then accumulating the second numerical values corresponding to all pixels to obtain the first numerical value corresponding to the second target dynamic vision sensor signal at a certain moment. For the second target dynamic visual sensor signals (such as D second target dynamic visual sensor signals) at all moments within a time window (or the first preset period), the above operations can be performed to obtain the second target at all moments The first value corresponding to the dynamic vision sensor signal (such as D second target dynamic vision sensor signal); and then calculate the average value of all the first values obtained in this time window (or the first preset period), and this can be obtained The autocorrelation coefficient of the dynamic vision sensor signal corresponding to the time window (or the first preset period).

In a possible implementation manner, the first signal value of any pixel included in any first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals is based on the first signal value of any pixel Two signal values are obtained; wherein: if the second signal value of any pixel is greater than 0, then the first signal value of any pixel is 1; if the second signal value of any pixel is less than 0, Then the first signal value of any pixel is -1; the second signal value of any pixel is any one of the m third signal values of any pixel, m is a positive integer; The m third signal values are obtained according to a plurality of third dynamic visual sensor signals within the preset time period; wherein: the gth third signal value among the m third signal values is the m The sum of the g-1th third signal value among the third signal values and the cumulative sum of the fourth signal value of any pixel in the gth second preset period, 1≤g≤m, g is a positive integer, the multiple third dynamic visual sensor signals include the fourth signal value of any pixel in the g second preset period, and the preset time period includes the g a second preset period; and when g is equal to 1, the first third signal value is the accumulated sum of the fourth signal values of any pixel in the first second preset period.

In this implementation manner, multiple dynamic vision sensor signals within a certain period of time are compressed into one dynamic vision sensor signal, so as to increase the data density of the dynamic vision sensor signal. For example, taking the size of the second preset period as the size of the time window as the size of the time window to compress the multiple third dynamic visual sensor signals within the preset time period, thereby obtaining multiple first target dynamic visual sensor signals, The first signal value of any pixel included in any first target dynamic visual sensor signal is obtained based on the cumulative value reset of the fourth signal value of the arbitrary pixel in multiple third dynamic visual sensor signals, The principle of resetting is: if the cumulative value of the fourth signal value of any pixel in multiple third dynamic vision sensor signals is greater than 0, then the first signal value is set to 1, if the any pixel is in multiple If the accumulated value of the fourth signal value in the third dynamic vision sensor signal is greater than 0, then the first signal value is set to -1. Moreover, since the first target dynamic visual sensor signal has a higher data density, compared with the third dynamic visual sensor signal, using the first target dynamic visual sensor signal to train the spiking neural network model can improve the training efficiency.

In a possible implementation manner, the spiking neural network model includes N convolutional layers and N inverse convolutional layers, wherein: the output of the jth convolutional layer in the N convolutional layers is The input of the j+1th convolutional layer in the N convolutional layers, the output of the jth inverse convolutional layer in the N inverse convolutional layers is the first in the N inverse convolutional layers The input of the j+1 deconvolution layer, the output of the jth convolution layer is also the input of the N-j deconvolution layer in the N deconvolution layers, and the N convolution layers The output of the Nth convolutional layer in is the input of the first inverse convolutional layer in the N deconvolutional layers, 1≤j≤N, and N and j are positive integers.

In this implementation, the spiking neural network model includes symmetrical N convolutional layers and N deconvolutional layers, and there is a skip connection between each convolutional layer and its symmetric deconvolutional layer; thus, the spiking The neural network model extracts and reconstructs the features of dynamic visual sensor signals through deconvolution and skip connections, which is beneficial to ensure the integrity of the extracted features and the authenticity of the reconstructed features.

According to the second aspect, the present application relates to a denoising device, and the beneficial effects can be referred to the description of the first aspect, which will not be repeated here. The denoising device has the function of realizing the behavior in the method example of the first aspect above. The functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware. The hardware or software includes one or more modules corresponding to the above functions. In a possible implementation manner, the denoising device includes: an acquisition unit, configured to acquire a first dynamic visual sensor signal; a processing unit, configured to use a spiking neural network model to process the first dynamic visual sensor signal Denoising processing, to obtain the second dynamic visual sensor signal, the post-synaptic membrane voltage kernel function of the spiking neuron of the spiking neural network model includes a target parameter, and the target parameter is determined according to the autocorrelation coefficient of the dynamic visual sensor signal of.

In a possible implementation, any autocorrelation coefficient among the plurality of autocorrelation coefficients is the average value of the first values corresponding to the D second target dynamic vision sensor signals, and D is a positive integer; the D The second target dynamic vision sensor signals are the D first target dynamic vision sensor signals belonging to the same first preset period among the multiple first target dynamic vision sensor signals, and the preset time period includes a plurality of the first target dynamic vision sensor signals The first preset period; the first numerical value corresponding to any second target dynamic visual sensor signal in the D second target dynamic visual sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels; The second numerical value corresponding to any pixel in the plurality of pixels is obtained according to the first signal value of the any pixel and the target signal value of the first target pixel, and the first target pixel is the same as the any pixel The pixels whose proximity is not greater than the preset proximity threshold; the any second target dynamic visual sensor signal includes the first signal value of any pixel, and the any second target dynamic visual sensor signal is the The w first target dynamic vision sensor signal in a plurality of first target dynamic vision sensor signals, w is a positive integer; the target signal value of the first target pixel is the first target pixel in the third target dynamic vision The first signal value in the sensor signal, the third target dynamic vision sensor signal is the w+qth first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals, and q is a positive integer.

According to a third aspect, the present application relates to an electronic device, comprising: one or more processors; a computer-readable storage medium coupled to the processors and storing a program executed by the processors, wherein the programs are in When executed by the processor, the electronic device is made to execute the method in any possible embodiment of the first aspect.

According to a fourth aspect, the present application relates to a computer-readable storage medium, including program codes, which, when executed by a computer device, are used to perform the method in any possible embodiment of the first aspect.

According to a fifth aspect, the present application relates to a chip, including: a processor, configured to call and run a computer program from a memory, so that a device installed with the above-mentioned chip executes any of the possible embodiments of the first aspect. method.

According to a sixth aspect, the present application relates to a computer program product comprising program code which, when run, performs the method of any one of the possible embodiments of the first aspect.

Description of drawings

The accompanying drawings used in the embodiments of the present application are introduced below.

Fig. 1 is a schematic structural diagram of a neural network provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a spiking neural network provided in an embodiment of the present application;

FIG. 3 is a schematic flowchart of a denoising method provided in an embodiment of the present application;

Fig. 4 is the time distribution curve of the autocorrelation coefficient of a kind of dynamic vision sensor signal provided by the embodiment of the present application;

5 is a schematic diagram of the post-synaptic membrane voltage kernel function corresponding to different time parameters provided by the embodiment of the present application;

FIG. 6 is a schematic diagram of a training process of a spiking neural network model provided in an embodiment of the present application;

Fig. 7 is a schematic diagram of the influence of different target parameters on the training of the spiking neural network model provided by the embodiment of the present application;

8 is a comparison diagram of the denoising effect of the spiking neural network model provided by the embodiment of the present application and the denoising effect of the three-dimensional convolutional neural network model;

Fig. 9 is a comparison diagram of de-blurring effects between the denoised signal of the pulse neural network model provided by the embodiment of the present application and the de-noised signal of the three-dimensional convolutional neural network model;

FIG. 10 is a schematic structural diagram of a denoising device provided in an embodiment of the present application;

FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed ways

First, related technologies involved in the embodiments of the present application are introduced, so that those skilled in the art can understand the present application.

(1) Dynamic vision sensor and noise

A dynamic vision sensor, also known as an event camera or a neuromorphic camera, is an imaging sensor that responds to local brightness changes. Dynamic vision sensors do not use a shutter to capture images like traditional cameras do. Instead, each pixel within the dynamic vision sensor operates independently and asynchronously, reporting when brightness changes and remaining silent otherwise. Dynamic vision sensors have microsecond-level temporal resolution, 120dB dynamic range, and less under/overexposure and motion blur than frame cameras. Due to the advantages of asynchronous triggering, high time resolution, high dynamic range, low latency, low bandwidth and low power consumption, dynamic vision sensors can be mounted on mobile platforms (such as mobile phones, drones and cars, etc.) for objects Vision tasks such as detection, tracking, recognition, depth estimation, etc.

Different from the traditional technical solution, based on the "frame" collected at a fixed frequency, and read all the pixel information in each "frame" in sequence, the dynamic vision sensor does not need to read all the pixels in the picture, only need Obtain the address and information of the pixel point where the light intensity changes; specifically, when the dynamic vision sensor detects that the light intensity change of a certain pixel point is greater than or equal to the preset threshold value, an event signal for the pixel point is sent; wherein, if the If the light intensity change is a positive change, that is, the pixel point jumps from low brightness to high brightness, then a "+1" event signal is sent and marked as a positive event; if the light intensity change is a negative change, that is, the pixel point is changed by When high brightness jumps to low brightness, a "-1" event signal is issued and marked as a negative event; if the light intensity change is less than the preset threshold value, no event signal is sent and marked as no event; the dynamic vision sensor The event annotation performed by the point constitutes the event flow information. The form of light intensity change information collected by the dynamic visual sensor can be (X, Y, P, T), where "X, Y" is the event address, "P" is the event output, and "T" is the time when the event is generated . An event address matches a pixel in the two-dimensional image associated with the dynamic vision sensor, which means that the event address corresponds to a pixel position in the reference color image, and "X, Y" can be the row in the reference color image , column position, "P" is the specific value of the real-time light intensity change, and "T" is the generation time of the real-time light intensity change.

The dynamic visual sensor based on address-event expression (AER) imitates the working mechanism of biological vision, while the traditional visual image acquisition method is based on "frames" collected at a fixed frequency, which has high redundancy, high latency, low dynamic range and high data Quantity and other defects. The dynamic vision sensor works asynchronously with pixels, and only outputs the address and information of the pixels whose light intensity changes, instead of passively reading out the information of each pixel in the "frame" in sequence, eliminating redundant data from the source, and having real-time dynamics of scene changes Response, super-sparse representation of images, and asynchronous output of events. Due to the high sensitivity of the dynamic vision sensor, the signal output is often accompanied by noise, including background noise, device thermal noise, etc. At the same time, due to its large range of data density changes and high temporal resolution, it poses challenges to traditional image-based denoising algorithms.

(2) Spike neural network

A neural network is a computing system that mimics the structure of a biological brain for data processing. The interior of the biological brain is composed of a large number of neurons combined in different ways, and the former neuron and the latter neuron are connected through synaptic structures for information transmission. Neural network has powerful nonlinear, self-adaptive and fault-tolerant information processing capabilities.

Correspondingly, the neural network structure is shown in Figure 1. Each node simulates a neuron and performs a specific operation, such as an activation function; the connection between nodes simulates a synapse, and the weight value of the synapse represents two neurons. The connection strength between elements.

In the current artificial neural network, the transmission of information is carried out through analog values, and each neuron accumulates the value of the previous neuron through the multiplication and addition operation, and passes the activation function to the subsequent neuron. In a more brain-like spiking neural network, the transmission of information is carried out through pulse sequences, and each neuron regulates the membrane voltage by accumulating the pulse sequences of the previous neurons. When the membrane voltage reaches a certain threshold, the neuron Send out new pulses and transmit them to subsequent neurons, in this way the transmission, processing and nonlinear transformation of information are realized. Many different types of neurons can be used in the spiking neural network, such as integral discharge (Integrate and Fire, IF) model, leakage integral discharge (Leaky Integrate and Fire, LIF) model, impulse response model (Spike Response Model, SRM), Threshold variable neurons, etc.

Some important terms of spiking neural networks are as follows:

The spike neuron is the basic unit of the spike neural network, which integrates information by receiving pulse input. Pulse input will cause the membrane voltage of the neuron to increase. When the neuron membrane voltage increases beyond a certain threshold voltage, the neuron will send out a pulse and transmit it to other neurons.

Synapse is the carrier in the process of pulse transmission, and the connection between pulse neurons depends on synapses.

The post-synaptic membrane voltage is also called the post-synaptic potential, which is the voltage change on the post-synaptic neuron membrane voltage generated by the pulse fired by the pre-synaptic neuron.

Compared with artificial neural networks, spiking neural networks are based on the inspiration of brain neural networks, and have the characteristics of low energy consumption, asynchronous computing and time dynamics, and are an ideal technical method for processing highly sparse dynamic visual sensor streaming data.

The technical solution provided by this application uses the pulse neural network model to denoise the dynamic visual sensor signal, specifically including: the structure design of the pulse neural network model, the technical solution for training design, the technical solution for improving the data density of the dynamic visual sensor signal, and A technical solution for adjusting the parameters of spiking neurons based on the characteristics of dynamic visual sensor signals (such as temporal correlation), and so on.

The technical solutions provided by the present application will be described in detail below in conjunction with specific implementation methods.

Please refer to Fig. 2, Fig. 2 is a schematic structural diagram of a spiking neural network provided by an embodiment of the present application; the spiking neural network model includes N convolutional layers and N deconvolutional layers, wherein: the N volumes The output of the jth convolutional layer in the convolutional layer is the input of the j+1th convolutional layer in the N convolutional layers, and the jth inverse convolutional layer in the N deconvolutional layers The output of the N deconvolution layer is the input of the j+1th deconvolution layer, and the output of the jth convolution layer is also the N-jth of the N deconvolution layers The input of the deconvolution layer, the output of the Nth convolution layer in the N convolution layers is the input of the first deconvolution layer in the N deconvolution layers, 1≤j≤N , N and j are positive integers.

As can be seen from the above, a skip connection is added between the convolutional layer and the deconvolutional layer of the spiking neural network. Moreover, the network computing unit of the spiking neural network is a spiking neuron, and the spiking neuron can use many different types of neurons, such as integral discharge model, leakage integral discharge model, impulse response model, and threshold-variable neuron.

In a possible implementation, the neurons of the spiking neural network adopt an impulse response model, and the impulse response model is defined by the following integral equation:

In the formula (1), t represents the time, u(t) represents the voltage at time t, η represents the voltage value variable of the neuron after each firing pulse, f represents the index number of the neuron pulse firing time, t ^(f) Indicates the pulse release time, κ ^ext represents the post-synaptic membrane voltage kernel function and κ ^ext is an exponential function, s represents the pulse input time, u _rest represents the resting voltage,

Indicates the voltage value resulting from the cumulative effect on the preceding neuron spikes. Among them, when the voltage u(t) exceeds a certain threshold θ, the neuron fires a pulse.

Due to the high sparsity of the output data of dynamic vision sensors, traditional loss functions based on Euclidean distance or L2 norm (L2norm) are prone to produce "zero output". To solve this problem:

In a possible implementation, a loss function based on Van Rossum distance (Van Rossum) is used in the spike neural network to better reflect the error of the spike sequence and avoid insufficient network output. Among them, the specific definition of Van Rossum distance is as follows:

Given the output pulse train:

u＝{u ₁ ,u ₂ ,…,u _n }(2)

In formula (2), u ₁ , u ₂ , . . . , u _n all represent the pulse time.

If the target pulse sequence is:

v＝{v ₁ ,v ₂ ,…,v _n }(3)

In formula (3), v ₁ , v ₂ , . . . , v _n all represent the pulse time.

Then the Van Rossum distance is defined as:

In formula (4), τ represents the time constant of the kernel function h(t), and t represents time; f(t; u) and f(t; v) represent the convolution of the pulse sequence and a specific kernel function, respectively, as in the formula ( 5) and formula (6):

Among them, the kernel function h(t) is defined as:

The spiking neural network model shown in Figure 2 includes symmetrical N convolutional layers and N deconvolutional layers, and there is a skip connection between each convolutional layer and its symmetric deconvolutional layer; thus, the spiking neural network The network model extracts and reconstructs the features of dynamic visual sensor signals through deconvolution and skip connections, which is beneficial to ensure the integrity of the extracted features and the authenticity of the reconstructed features.

Please refer to FIG. 3 . FIG. 3 is a flowchart illustrating a process 300 of a denoising method according to an embodiment of the present application. The process 300 is described as a series of steps or operations. It should be understood that the process 300 may be executed in various orders and/or concurrently, and is not limited to the execution order shown in FIG. 3 . The process 300 can be executed by an electronic device, the electronic device includes a server and a terminal, and the process 300 includes but not limited to the following steps or operations:

Step 301: Acquire the first dynamic vision sensor signal;

Step 302: Using the spiking neural network model to denoise the first dynamic visual sensor signal to obtain the second dynamic visual sensor signal, the post-synaptic membrane voltage kernel function of the spiking neuron of the spiking neural network model includes A target parameter, the target parameter is determined according to the autocorrelation coefficient of the dynamic visual sensor signal.

Wherein, the structure of the spiking neural network model may be shown in FIG. 2 .

Wherein, the first dynamic visual sensor signal is the original signal collected by the dynamic visual sensor, and the second dynamic visual sensor signal is the signal after the pulse neural network model provided by this application is denoised.

Wherein, the autocorrelation coefficient of the dynamic vision sensor signal can be the time data autocorrelation coefficient (autocorrelation coefficient): <x(s)x(s-t)>; wherein, x represents the signal, t represents the interval time, s represents the time point, < > means to average the signal at different time points.

Wherein, the target parameter can be preset according to the autocorrelation coefficient of the dynamic visual sensor signal before the training of the spiking neural network model; The autocorrelation coefficient is adjusted in real time. Moreover, during the training process of the spiking neural network model, the target parameter can be set as self-learning or fixed.

Wherein, in this implementation manner, the target parameters in the postsynaptic membrane voltage kernel function can be obtained based on the autocorrelation coefficient of the dynamic visual sensor signal within a period of time. For example: multiple autocorrelation coefficients are obtained according to a plurality of first target dynamic vision sensor signals within a preset time period, wherein the multiple autocorrelation coefficients are in one-to-one correspondence with multiple moments, that is, the multiple autocorrelation coefficients are Distributed on these multiple moments, as shown in Figure 4; Then according to the distribution of multiple autocorrelation coefficients in time (that is, the dynamic visual sensor signal correlation curve), the function fitting is carried out to obtain the preset function; according to the preset Set the autocorrelation coefficient threshold and the preset function obtained by fitting to obtain the objective function. details as follows:

First, the correlation curve of the dynamic visual sensor signal is fitted by the function fitting method, that is, the preset function is obtained by fitting the distribution shape of multiple autocorrelation coefficients in time; for example, the format of the fitted preset function can be for:

y＝be ^-ax +c (8)

Among them, the values of a, b and c in formula (8) are determined according to the distribution shape of the actual autocorrelation coefficient in time.

Then, the inverse function of the preset function is solved based on the preset autocorrelation coefficient threshold; for example, the inverse function of the preset function can be:

Among them, the preset autocorrelation coefficient threshold is also the value of y in formula (9), so that the value of x in formula (9) is obtained based on the preset autocorrelation coefficient threshold, and the value of x in formula (9) is is the value of the objective parameter of the postsynaptic membrane voltage kernel function.

In one example, there are multiple preset autocorrelation coefficient thresholds, thereby forming a preset autocorrelation coefficient threshold interval, based on the preset autocorrelation coefficient threshold interval, the inverse function of the preset function is solved to obtain the post-synaptic membrane voltage The value range of the target parameter of the kernel function, and the target parameter of the postsynaptic membrane voltage kernel function is selected from the value range. Among them, the target parameters of the postsynaptic membrane voltage kernel function are selected based on the following principles:

(1) The time span of the postsynaptic membrane voltage kernel function can match the correlation of the dynamic visual sensor signal, that is, the abscissa scale of the postsynaptic membrane voltage kernel function and the time correlation range of the dynamic visual sensor signal should coincide as much as possible.

(2) The time span of the postsynaptic membrane voltage kernel function should not be close to 0, so as to avoid weakening the time dynamics of neurons.

For example, the threshold interval of the preset autocorrelation coefficient is [0.2, 0.5], and the value interval of the target parameter of the post-synaptic membrane voltage kernel function is [5, 13].

Assuming that the neuron of the spiking neural network model is an impulse response model, the post-synaptic membrane voltage kernel function can be a double exponential function, and its expression is as follows:

In the formula (10), κ ^ext represents the post-synaptic membrane voltage kernel function, κ ^ext is also called the amplitude; β represents the amplitude adjustment coefficient; τ _s represents the first time parameter, τ _s is also referred to in the application target parameter; τ _m represents the second time parameter. For example, the value interval of τ _s is [5, 13], τ _s =5 can be set, and τ _m is a fixed value, for example, τ _m =2. Figure 5 shows the postsynaptic membrane voltage kernel function corresponding to several different time parameters.

It should be noted that the present application can calculate the autocorrelation coefficient of the dynamic visual sensor signal within a period of time online through the time sliding window method. The specific operation is as follows:

Assuming that C _q represents the autocorrelation coefficient between dynamic visual sensor signals with an interval of q, the calculation formula of C _q is as follows:

In formula (11), D represents the number of dynamic visual sensor signals in the sliding window; W and H are the dimensions of the pixel on the x-axis and y-axis, respectively, that is, the width and height of the sliding window; S _k,x,y Represent the signal value of the pixel whose coordinates are (x, y) in the kth dynamic visual sensor signal in the sliding window, S _{k, x, y} =1 or S _{k, x, y} =-1; S _{k+q , x′, y′} represent the signal value of the pixel whose coordinates are (x′, y′) in the dynamic visual sensor signal with an interval of q from the kth dynamic visual sensor signal, S _{k+q, x′, y′} = 1 or S _{k+q, x′, y′} =-1; the pixel whose coordinates are (x′, y′) is the adjacent pixel of the pixel whose coordinates are (x, y), and the selection of x′ and y′ The value ranges are as follows:

x-Δ≤x′≤x+Δ (12)

y-Δ≤y′≤y+Δ (13)

In formula (11), p _x,y (S _k,x,y ,S _{k+q,x′,y′} ,Δ) represents the pair of S _k,x,y and S _{k+q,x′,y '} Carry out the calculation of the proximity degree Δ, Δ means the calculation formula of the proximity p _x,y (S _k,x,y ,S _k+q,x',y' ,Δ) is as follows:

Assuming Δ=1, a series of C _q values can be calculated according to formula (11) to formula (14). The time distribution of the series of C _q values is shown in FIG. 4 .

Wherein, the plurality of pixels are a plurality of pixels in the photosensitive element of the dynamic vision sensor, or the plurality of pixels are a plurality of pixels in a two-dimensional image, and the two-dimensional image is collected by the photosensitive element of the dynamic vision sensor Obtained: the first target pixel is a pixel whose proximity to any pixel in the photosensitive element of the dynamic vision sensor is not greater than a preset proximity threshold, or the first target pixel is a pixel in a two-dimensional image that is not closer to the pixel The proximity of any pixel is not greater than the preset proximity threshold, and the two-dimensional image is collected by the photosensitive element of the dynamic vision sensor.

Specifically, there are multiple first target dynamic visual sensor signals in the preset time period, and the autocorrelation coefficient of the dynamic visual sensor signals in the preset time period is calculated by the time sliding window method, and the time window size of the sliding window can be the first The size of the preset cycle; D represents the quantity of the first target dynamic visual sensor signal in the same first preset cycle, that is, there are D second target dynamic visual sensor signals in the same first preset cycle; W and H is the width and height of the photosensitive element of the dynamic vision sensor respectively; (x, y) represents the coordinates of any pixel in the photosensitive element of the dynamic vision sensor, or (x, y) represents any pixel on the two-dimensional image coordinates; (x', y') represents the coordinates of pixels whose proximity to any pixel (x, y) in the photosensitive element of the dynamic vision sensor is not greater than the preset proximity threshold, or (x', y') Indicates the coordinates of pixels whose proximity to any pixel (x, y) in the two-dimensional image is not greater than the preset proximity threshold, that is, the coordinates of the first target pixel; S _{k, x, y} represents any pixel (x , y) the first signal value in the wth first target dynamic visual sensor signal among multiple first target dynamic visual sensor signals; S _{k+q, x', y'} represent the first target pixel (x' , y′) the first signal value in the w+qth first target dynamic visual sensor signal among multiple first target dynamic visual sensor signals, that is, the first target pixel (x′, y′) at the The first signal value in the three-target dynamic vision sensor signal; the first value is

The second value is p _x,y (S _k,x,y ,S _{k+q,x′,y′} ,Δ).

In this implementation, the multiple first target dynamic visual sensor signals are dynamic visual sensor signals at multiple moments in the preset time period, and the dynamic visual sensor signals in the preset time period can be calculated by the method of time sliding window. Autocorrelation coefficient; for example, the size of the first preset period is used as the size of the time window to slide the window on the preset time period, and the time interval of one sliding is the size of a first preset period; during a time sliding window process, the time The window can frame D first target dynamic visual sensor signals in a plurality of first target dynamic visual sensor signals, for example, denoted as D second target dynamic visual sensor signals; based on the D second target dynamic visual sensor signals, The autocorrelation coefficient of the dynamic visual sensor signal corresponding to the time sliding window is calculated. Wherein, because each first target dynamic visual sensor signal of a plurality of first target dynamic visual sensor signals comprises the first signal value of a plurality of pixels, so D second in each time window (or first preset cycle) Each second target dynamic visual sensor signal in the target dynamic visual sensor signal also includes the first signal values of a plurality of pixels, so each time window (or the first signal value of each time window (or the second) can be calculated according to the first signal values of adjacent pixels at different times. A preset period) corresponding to the autocorrelation coefficient. In this way, with the size of the first preset period as the time window size, after multiple sliding windows in the preset time period, the autocorrelation coefficient of the dynamic visual sensor signal corresponding to the multiple time sliding windows can be calculated, so that it can be obtained Multiple autocorrelation coefficients.

Specifically, S _{k, x, y} represents the first signal value of any pixel (x, y); S _{k+q, x′, y′} represents the target value of any second target pixel (x′, y′). Signal value, the third value is S _k,x,y S _{k+q,x′,y′} , the second value is ∑ _x′,y′ S _k,x,y S _{k+q,x′,y′} .

It should be noted that the minimum time resolution of the dynamic vision sensor signal is 1us. In order to improve the data density of the dynamic vision sensor signal, this application can compress the dynamic vision sensor signal within a certain period of time into one dynamic vision sensor signal, that is, multiple A dynamic visual sensor signal compression frame is a dynamic visual sensor signal.

The following takes any pixel (x, y) as an example to illustrate the specific operation of dynamic visual sensor signal compression, as follows: Given a series of dynamic visual sensor signals: {s ₁ , s ₂ ,…,s _r }, where , s _r represents the signal polarization value of any pixel (x, y) in the dynamic visual sensor signal at time t _r , the positive event takes the value 1, and the negative event takes the value -1; given the frame compression time window T , for example T=500us, let S _g be the accumulated signal value of any pixel (x, y) after g times of compression; if gT<t _r <(g+1)T, then S _g +=s _r , That is, S _g =S _g +s _r ; after traversing the series of dynamic vision sensor signals, reset according to the following principles to obtain the signal reset value of any pixel (x, y): if S _g >0, Then S _g =1; if S _g <0, then S _g =−1.

Specifically, taking the size of the frame compression time window as the size of the second preset period, the multiple third dynamic visual sensor signals within the preset time period are compressed to obtain multiple first target dynamic visual sensor signals, wherein , the third dynamic vision sensor signal is the original dynamic vision sensor signal; for example, the r third dynamic vision sensor signals are compressed to obtain m first target dynamic vision sensor signals. The first signal value of any pixel (x, y) included in any first target dynamic vision sensor signal is a signal reset value, and the second or third signal value of any pixel (x, y) is A signal accumulation value, the fourth signal value of any pixel (x, y) is a signal polarization value.

In this implementation manner, multiple dynamic vision sensor signals within a certain period of time are compressed into one dynamic vision sensor signal, so as to increase the data density of the dynamic vision sensor signals. For example, taking the size of the second preset period as the size of the time window as the size of the time window to compress the multiple third dynamic visual sensor signals within the preset time period, thereby obtaining multiple first target dynamic visual sensor signals, The first signal value of any pixel included in any first target dynamic visual sensor signal is obtained based on the cumulative value reset of the fourth signal value of the arbitrary pixel in multiple third dynamic visual sensor signals, The principle of resetting is: if the cumulative value of the fourth signal value of any pixel in multiple third dynamic vision sensor signals is greater than 0, then the first signal value is set to 1, if the any pixel is in multiple If the accumulated value of the fourth signal value in the third dynamic vision sensor signal is greater than 0, then the first signal value is set to -1. Moreover, since the first target dynamic visual sensor signal has a higher data density, compared with the third dynamic visual sensor signal, using the first target dynamic visual sensor signal to train the spiking neural network model can improve the training efficiency.

It should be understood that the first target dynamic visual sensor signal can be an original dynamic visual sensor signal, such as the third dynamic visual sensor signal; the first target dynamic visual sensor signal can also be obtained by compressing the original dynamic visual sensor signal For example, a plurality of third dynamic vision sensor signals are compressed to obtain a first target dynamic vision sensor signal. Wherein, when the first target dynamic visual sensor signal was the original dynamic visual sensor signal, the first signal value of any pixel in the first target dynamic visual sensor signal was a signal polarization value; when the first target dynamic visual sensor signal When it is a signal obtained by compressing frames, the first signal value of any pixel in the first target dynamic vision sensor signal is a signal reset value.

Please refer to Fig. 6, Fig. 6 is a schematic diagram of the training process of a spiking neural network model provided by the embodiment of the present application; on the training data set of the dynamic visual sensor signal, the technical solution of the present application can be used to effectively train the training set, The spiking neural network model basically converges after a period of time, for example, after 50 iterations, the loss basically converges.

Please refer to Fig. 7. Fig. 7 is a schematic diagram of the impact of different target parameters on the training of the spiking neural network model provided by the embodiment of the present application; the target parameter τ _s of the post-synaptic membrane voltage kernel function is different, and the spiking neural network model training effect is different; training With the increase of the target parameter τ _s , the loss at time presents a trend of first decreasing and then increasing; among them, when τ _s is around 10, the loss is smaller. Therefore, through the technical solution of the present application, an appropriate target parameter τ _s can be selected so that the training of the spiking neural network model can be optimized.

Please refer to Fig. 8, Fig. 8 is a comparison chart of the denoising effect of the spike neural network model provided by the embodiment of the present application and the denoising effect of the three-dimensional convolutional neural network model; compared with the traditional three-dimensional convolutional neural network model volume, the present application The denoising effect of the provided spiking neural network model is significantly better.

Among them, the comparison between the spiking neural network model provided in the embodiment of the present application and the three-dimensional convolutional neural network model in terms of running time, parameter quantity and calculation quantity is shown in Table 1.

Table 1 Comparison of running time, parameters and computation between the spiking neural network model and the 3D convolutional neural network model

网络模型network model	运行时间operation hours	参数量Parameter amount	计算量Calculations
三维卷积神经网络模型3D Convolutional Neural Network Model	170s170s	6.54MB6.54MB	531.5G531.5G
脉冲神经网络模型Spiking Neural Network Model	14s14s	150KB150KB	1G-10G1G-10G

It can be seen from Table 1 that, in the case of similar denoising effects, the spiking neural network model provided by this application has obvious advantages in terms of parameter quantity, running time, and calculation amount; among them, Table 1 uses 1000 pieces of 346*260 size Statistical results of images captured by dynamic vision sensors running on TESLA v100 GPU.

Please refer to Fig. 9, Fig. 9 is a comparison diagram of the denoising effect of the denoising signal of the pulse neural network model provided by the embodiment of the present application and the de-blurred signal of the three-dimensional convolutional neural network model; first, the dynamic visual sensor signal is processed Denoising, when using the denoised dynamic visual sensor signal for subsequent deblurring tasks, the deblurred image based on the dynamic visual sensor signal after denoising based on the pulse neural network model is compared with the denoising based on the three-dimensional convolutional neural network model. The deblurred image performed by the visual sensor signal can retain more real details and be closer to reality.

In summary, through the technical solution of this application, setting the target parameters of the post-synaptic membrane voltage kernel function enables the spiking neural network model to learn the temporal correlation of dynamic visual sensor signals, thereby using the temporal dynamics of spiking neurons to dynamically Streaming denoising of visual sensor signals greatly reduces the network size, running time and computation.

It should be noted that the series of steps or operations described in the process 300 may also correspond to corresponding descriptions in the embodiment shown in FIG. 1 and FIG. 2 .

Please refer to FIG. 10. FIG. 10 is a schematic structural diagram of a denoising device provided by an embodiment of the present application; the denoising device 1000 is applied to electronic equipment, the electronic equipment includes a server and a terminal, and the denoising device 1000 includes: an acquisition unit 1001, for acquiring a first dynamic visual sensor signal; a processing unit 1002, for performing denoising processing on the first dynamic visual sensor signal by using a spiking neural network model to obtain a second dynamic visual sensor signal, the spiking neural network model The postsynaptic membrane voltage kernel function of the spiking neurons of the network model includes target parameters determined from the autocorrelation coefficients of dynamic visual sensor signals.

It should be noted that, the implementation of each unit of the denoising apparatus 1000 described in FIG. 10 may also refer to corresponding descriptions of the embodiments shown in FIGS. 1 to 9 . Moreover, the beneficial effects brought by the denoising device 1000 described in FIG. 10 can refer to the corresponding descriptions of the embodiments shown in FIG. 1 to FIG. 9 , and the description will not be repeated here.

Please refer to FIG. 11. FIG. 11 is a schematic structural diagram of an electronic device 1110 provided by an embodiment of the present application. The electronic device 1110 includes a processor 1111, a memory 1112, and a communication interface 1113. The above-mentioned processor 1111, memory 1112, and communication interface 1113 They are connected to each other through the bus 1114 .

Memory 1112 includes, but is not limited to, random access memory (random access memory, RAM), read-only memory (read-only memory, ROM), erasable programmable read-only memory (erasable programmable read only memory, EPROM), or Portable read-only memory (compact disc read-only memory, CD-ROM), the memory 1112 is used for related computer programs and data. The communication interface 1113 is used to receive and send data.

The processor 1111 may be one or more central processing units (central processing unit, CPU). In the case where the processor 1111 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.

The processor 1111 in the electronic device 1110 is configured to read the computer program code stored in the memory 1112, and execute the method of any one of the embodiments shown in FIG. 3 .

It should be noted that the electronic device may be a server or a terminal, and the realization of various operations of the electronic device 1110 described in FIG. 11 may refer to corresponding descriptions of the embodiments shown in FIGS. 1 to 9 . Moreover, for the beneficial effects brought by the electronic device 1110 described in FIG. 11 , reference may be made to the corresponding descriptions of the embodiments shown in FIGS. 1 to 9 , and the description will not be repeated here.

The embodiment of the present application also provides a chip, the above-mentioned chip includes at least one processor, memory and interface circuit, the above-mentioned memory, the above-mentioned transceiver and the above-mentioned at least one processor are interconnected by lines, and the above-mentioned at least one memory stores a computer program; the above-mentioned When the computer program is executed by the above-mentioned processor, the method flow of any one embodiment shown in FIG. 3 is realized.

An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the above-mentioned computer-readable storage medium, and when the computer program is run on a computer, the method flow of any one of the embodiments shown in FIG. 3 is implemented.

An embodiment of the present application further provides a computer program product. When the computer program product is run on a computer, the method flow of any one of the embodiments shown in FIG. 3 is realized.

It should be understood that the processor mentioned in the embodiment of the present application may be a central processing unit (Central Processing Unit, CPU), and may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application-specific integrated circuits ( Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.

It should also be understood that the memory mentioned in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories. Among them, the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash. The volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (Static RAM, SRAM), Dynamic Random Access Memory (Dynamic RAM, DRAM), Synchronous Dynamic Random Access Memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous connection dynamic random access memory (Synchlink DRAM, SLDRAM ) and Direct Memory Bus Random Access Memory (Direct Rambus RAM, DR RAM).

It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, the memory (storage module) is integrated in the processor.

It should be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.

It should also be understood that the first, second, third and various numbers mentioned herein are only for convenience of description and are not intended to limit the scope of the present application.

It should be understood that the term "and/or" in this article is only an association relationship describing associated objects, indicating that there may be three relationships, for example, A and/or B may mean: A exists alone, and A and B exist at the same time , there are three cases of B alone. In addition, the character "/" in this article generally indicates that the contextual objects are an "or" relationship.

It should be understood that, in various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the order of execution, and the execution order of the processes should be determined by their functions and internal logic, and should not be used in the embodiments of the present application. The implementation process constitutes any limitation.

Those skilled in the art can appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

Those skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the above-described system, device and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the above units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or can be Integrate into another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

If the above functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods shown in the various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disc and other media that can store program codes. .

The steps in the methods of the embodiments of the present application can be adjusted, combined and deleted according to actual needs.

The modules in the device of the embodiment of the present application can be combined, divided and deleted according to actual needs.

Above, the above embodiments are only used to illustrate the technical solutions of the present application, rather than to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: it can still be applied to the foregoing embodiments The technical solutions described in the examples are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the application.

Claims

A denoising method, characterized in that, comprising:

Acquiring the first dynamic vision sensor signal;

Using a spiking neural network model to denoise the first dynamic visual sensor signal to obtain a second dynamic visual sensor signal, the post-synaptic membrane voltage kernel function of the spiking neuron of the spiking neural network model includes a target parameter, The target parameter is determined according to the autocorrelation coefficient of the dynamic vision sensor signal.
The method according to claim 1, wherein the autocorrelation coefficient of the dynamic visual sensor signal includes a plurality of autocorrelation coefficients, and the plurality of autocorrelation coefficients are based on a plurality of first target objects in a preset time period. Obtained from dynamic visual sensor signals;

The target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting according to the distribution of the multiple autocorrelation coefficients in time.
The method according to claim 2, wherein any autocorrelation coefficient in the plurality of autocorrelation coefficients is the average value of the first values corresponding to D second target dynamic vision sensor signals, and D is a positive integer ; The D second target dynamic visual sensor signals are D first target dynamic visual sensor signals belonging to the same first preset cycle among the multiple first target dynamic visual sensor signals, and the preset time period including multiple first preset periods;

The first value corresponding to any second target dynamic visual sensor signal in the D second target dynamic visual sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels;

The second numerical value corresponding to any pixel in the plurality of pixels is obtained according to the first signal value of the any pixel and the target signal value of the first target pixel, and the first target pixel is the same as the target signal value of the any pixel. The proximity of a pixel is not greater than the pixel of the preset proximity threshold; the any second target dynamic visual sensor signal includes the first signal value of any pixel, and the any second target dynamic visual sensor signal is The wth first target dynamic visual sensor signal in the plurality of first target dynamic visual sensor signals, w is a positive integer; the target signal value of the first target pixel is the first target pixel in the third target The first signal value in the dynamic visual sensor signal, the third target dynamic visual sensor signal is the w+q first target dynamic visual sensor signal in the plurality of first target dynamic visual sensor signals, and q is positive integer.
The method according to claim 3, wherein the first target pixel includes a plurality of second target pixels, and the second value corresponding to any pixel is based on the first value corresponding to the plurality of second target pixels Accumulated by three values;

The third value corresponding to any second target pixel among the plurality of second target pixels is the product of the first signal value of any pixel and the target signal value of any second target pixel.
The method according to any one of claims 2-4, characterized in that the first signal value of any pixel included in any first target dynamic visual sensor signal in the plurality of first target dynamic visual sensor signals It is obtained according to the second signal value of any pixel; wherein: if the second signal value of any pixel is greater than 0, then the first signal value of any pixel is 1; if any If the second signal value of the pixel is less than 0, then the first signal value of any pixel is -1;

The second signal value of any pixel is any one of m third signal values of any pixel, m is a positive integer; the m third signal values are based on the preset time period A plurality of third dynamic visual sensor signals obtained; wherein: the gth third signal value in the m third signal values is the g-1th third signal in the m third signal values Value and the sum of the cumulative sum of the fourth signal value of any pixel in the g second preset period, 1≤g≤m, g is a positive integer, and the plurality of third dynamic vision sensor signals include The fourth signal value of any pixel in the g-th second preset cycle, the preset time period includes the g-th second preset cycle; and when g is 1, the first The third signal value is an accumulated sum of fourth signal values of any pixel in the first second preset period.
The method according to any one of claims 1-5, wherein the spiking neural network model includes N convolutional layers and N inverse convolutional layers, wherein: the first of the N convolutional layers The output of the j convolutional layers is the input of the j+1th convolutional layer in the N convolutional layers, and the output of the jth inverse convolutional layer in the N deconvolutional layers is the The input of the j+1th deconvolution layer in the N deconvolution layers, the output of the jth convolution layer is also the N-j deconvolution layer in the N deconvolution layers Input, the output of the Nth convolutional layer in the N convolutional layers is the input of the first inverse convolutional layer in the N deconvolutional layers, 1≤j≤N, N and j are positive integer.
A denoising device, characterized in that it comprises:

an acquisition unit, configured to acquire the first dynamic vision sensor signal;

The processing unit is used to use the spiking neural network model to perform denoising processing on the first dynamic visual sensor signal to obtain the second dynamic visual sensor signal, and the post-synaptic membrane voltage core of the spiking neuron of the spiking neural network model The function includes an objective parameter determined from an autocorrelation coefficient of the dynamic vision sensor signal.
The device according to claim 7, wherein the autocorrelation coefficient of the dynamic visual sensor signal comprises a plurality of autocorrelation coefficients, and the plurality of autocorrelation coefficients are based on a plurality of first targets within a preset time period. Obtained from dynamic visual sensor signals;

The target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting according to the distribution of the multiple autocorrelation coefficients in time.
The device according to claim 8, wherein any autocorrelation coefficient in the plurality of autocorrelation coefficients is the average value of the first values corresponding to D second target dynamic vision sensor signals, and D is a positive integer ; The D second target dynamic visual sensor signals are D first target dynamic visual sensor signals belonging to the same first preset cycle among the multiple first target dynamic visual sensor signals, and the preset time period including multiple first preset periods;

The first value corresponding to any second target dynamic visual sensor signal in the D second target dynamic visual sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels;

The second numerical value corresponding to any pixel in the plurality of pixels is obtained according to the first signal value of the any pixel and the target signal value of the first target pixel, and the first target pixel is the same as the target signal value of the any pixel. The proximity of a pixel is not greater than the pixel of the preset proximity threshold; the any second target dynamic visual sensor signal includes the first signal value of any pixel, and the any second target dynamic visual sensor signal is The wth first target dynamic visual sensor signal in the plurality of first target dynamic visual sensor signals, w is a positive integer; the target signal value of the first target pixel is the first target pixel in the third target The first signal value in the dynamic visual sensor signal, the third target dynamic visual sensor signal is the w+q first target dynamic visual sensor signal in the plurality of first target dynamic visual sensor signals, and q is positive integer.
The device according to claim 9, wherein the first target pixel includes a plurality of second target pixels, and the second value corresponding to any pixel is based on the first value corresponding to the plurality of second target pixels Accumulated by three values;

The third value corresponding to any second target pixel among the plurality of second target pixels is the product of the first signal value of any pixel and the target signal value of any second target pixel.
The device according to any one of claims 8-10, wherein the first signal value of any pixel included in any first target dynamic visual sensor signal among the plurality of first target dynamic visual sensor signals It is obtained according to the second signal value of any pixel; wherein: if the second signal value of any pixel is greater than 0, then the first signal value of any pixel is 1; if any If the second signal value of the pixel is less than 0, then the first signal value of any pixel is -1;

The second signal value of any pixel is any one of m third signal values of any pixel, m is a positive integer; the m third signal values are based on the preset time period A plurality of third dynamic visual sensor signals obtained; wherein: the gth third signal value in the m third signal values is the g-1th third signal in the m third signal values Value and the sum of the cumulative sum of the fourth signal value of any pixel in the g second preset period, 1≤g≤m, g is a positive integer, and the plurality of third dynamic vision sensor signals include The fourth signal value of any pixel in the g-th second preset cycle, the preset time period includes the g-th second preset cycle; and when g is 1, the first The third signal value is an accumulated sum of fourth signal values of any pixel in the first second preset period.
The device according to any one of claims 7-11, wherein the spiking neural network model includes N convolutional layers and N inverse convolutional layers, wherein: the first of the N convolutional layers The output of the j convolutional layers is the input of the j+1th convolutional layer in the N convolutional layers, and the output of the jth inverse convolutional layer in the N deconvolutional layers is the The input of the j+1th deconvolution layer in the N deconvolution layers, the output of the jth convolution layer is also the N-j deconvolution layer in the N deconvolution layers Input, the output of the Nth convolutional layer in the N convolutional layers is the input of the first inverse convolutional layer in the N deconvolutional layers, 1≤j≤N, N and j are positive integer.
An electronic device, characterized in that it comprises:

one or more processors;

A computer-readable storage medium coupled to the processor and storing a program executed by the processor, wherein the program, when executed by the processor, causes the electronic device to perform any of claims 1-6. one of the methods described.
A computer-readable storage medium, characterized by comprising program code, which is used to execute the method according to any one of claims 1-6 when executed by a computer device.
A chip, characterized by comprising: a processor, configured to call and run a computer program from a memory, so that a device installed with the chip executes the method according to any one of claims 1-6.