WO2023083121A1 - Denoising method and related device - Google Patents

Denoising method and related device

Info

Publication number
WO2023083121A1
WO2023083121A1 · PCT/CN2022/130027 · CN2022130027W
Authority
WO
WIPO (PCT)
Prior art keywords
target
pixel
signal
visual sensor
dynamic visual
Prior art date
Application number
PCT/CN2022/130027
Other languages
French (fr)
Chinese (zh)
Inventor
冷卢子未
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2023083121A1 publication Critical patent/WO2023083121A1/en

Classifications

    • G06T5/70 Denoising; Smoothing (under G06T5/00 Image enhancement or restoration)
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present application relate to the technical field of artificial intelligence, and in particular to a denoising method and a related device.
  • Existing dynamic vision sensor (Dynamic Vision Sensor, DVS) signal denoising methods are mainly filter-based denoising and artificial neural network (Artificial Neural Network, ANN) denoising based on deep learning frameworks.
  • Filter-based denoising methods include temporal filtering and spatial filtering, which denoise mainly by filtering out temporally or spatially isolated events.
  • Denoising methods based on artificial neural networks usually compress the dynamic vision sensor data stream into images to increase the data density, and then use a traditional RGB image denoising network to denoise the compressed-frame images. Among them, denoising methods based on two-dimensional (2D) convolutional neural networks (Convolutional Neural Networks, CNN) usually require an additionally defined noise model containing temporal information, while denoising methods based on three-dimensional (3D) convolutional neural networks require temporal convolution over a time window.
  • When data is sparse, filter-based denoising tends to filter out events together with noise, and its performance on benchmark datasets is worse than that of denoising methods based on artificial neural networks; denoising methods based on artificial neural networks, in turn, face problems such as large network size, heavy computation and long processing time. Moreover, because dynamic vision sensor signals are highly sparse, vary widely in data density and have high temporal resolution, existing denoising methods for dynamic vision sensor signals cannot achieve a good denoising effect.
  • The present application provides a denoising method and related device, which can improve the denoising effect on dynamic vision sensor signals.
  • In a first aspect, the present application relates to a denoising method, comprising: acquiring a first dynamic vision sensor signal; and using a spiking neural network (Spiking Neural Network, SNN) model to denoise the first dynamic vision sensor signal to obtain a second dynamic vision sensor signal, wherein the post-synaptic membrane voltage (post-synaptic potential, PSP) kernel function of the spiking neurons of the spiking neural network model includes a target parameter, and the target parameter is determined according to the autocorrelation coefficient of the dynamic vision sensor signal.
  • In this way, the spiking neural network model is used to denoise the dynamic vision sensor signal.
  • Since the spiking neural network model includes spiking neurons, and the target parameter in the post-synaptic membrane voltage kernel function of the spiking neurons is determined according to the autocorrelation coefficient of the dynamic vision sensor signal, the post-synaptic membrane voltage kernel function enables the spiking neural network model to learn the temporal correlation of the dynamic vision sensor signal and to remove noise events with weak temporal correlation from the signal, thereby improving the denoising effect of the spiking neural network model on dynamic vision sensor signals.
  • The spiking neural network model performs streaming denoising on highly sparse dynamic vision sensor signals; that is, the input of the spiking neural network model has the same number of frames as its output, and a single pass through the model does not rely on three-dimensional convolution with a traversing sliding window in the time dimension. Compared with existing denoising methods based on artificial neural networks, this greatly reduces running time, network size and computation.
  • In a possible embodiment, the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients obtained from a plurality of first target dynamic vision sensor signals within a preset time period; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting the distribution of the plurality of autocorrelation coefficients over time.
  • Function fitting is performed to obtain the preset function, so the preset function represents the relationship between the autocorrelation coefficient of the dynamic vision sensor signal and time. The inverse of the preset function is then computed, so the inverse function represents the relationship between time and the autocorrelation coefficient of the dynamic vision sensor signal. The time value obtained by evaluating this inverse function at the preset autocorrelation coefficient threshold is used as the value of the target parameter in the post-synaptic membrane voltage kernel function of the spiking neurons.
  • By adjusting the target parameter in the post-synaptic membrane voltage kernel function in this way, the spiking neural network model learns the temporal correlation of the dynamic vision sensor signal.
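As an illustration of this fitting-and-inversion procedure, the following sketch fits an exponential autocorrelation curve and solves its inverse at the preset threshold to obtain a time constant for the kernel. The exponential fit form, the function names, and the decaying PSP kernel shape are assumptions chosen for illustration; the embodiment does not fix the preset function or the exact kernel form.

```python
import math

def fit_exponential_autocorr(times, coeffs):
    """Least-squares fit of r(t) ~ exp(-t / tau_fit) to measured
    autocorrelation coefficients (a hypothetical choice of preset
    function; the text only requires a function fitted to the
    temporal distribution of the coefficients)."""
    # Linearize: ln r(t) = -t / tau_fit, so the slope of ln r vs. t
    # through the origin gives -1 / tau_fit.
    num = sum(t * math.log(r) for t, r in zip(times, coeffs))
    den = sum(t * t for t in times)
    return -den / num

def target_parameter(tau_fit, r_threshold):
    """Evaluate the inverse function t = -tau_fit * ln(r) at the preset
    autocorrelation threshold; the resulting time value is used as the
    target parameter of the post-synaptic membrane voltage kernel."""
    return -tau_fit * math.log(r_threshold)

def psp_kernel(t, tau):
    """One common post-synaptic potential kernel: exponential decay
    with time constant tau (an illustrative assumption)."""
    return math.exp(-t / tau) if t >= 0 else 0.0
```

With a larger threshold the solved time constant shrinks, so the kernel forgets faster and events with weak temporal correlation contribute less to the membrane voltage.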
  • In a possible embodiment, any one of the plurality of autocorrelation coefficients is the average of the first values corresponding to D second target dynamic vision sensor signals, where D is a positive integer; the D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals that belong to the same first preset period among the plurality of first target dynamic vision sensor signals, and the preset time period includes a plurality of first preset periods; the first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating the second values corresponding to a plurality of pixels; the second value corresponding to any one of the plurality of pixels is obtained according to the first signal value of that pixel and the target signal value of a first target pixel, and the second target dynamic vision sensor signal includes the first signal value of that pixel.
  • The plurality of pixels are pixels in the photosensitive element of the dynamic vision sensor, or pixels in a two-dimensional image collected by that photosensitive element; the first target pixel is a pixel whose proximity to the given pixel, in the photosensitive element or in the two-dimensional image, is not greater than a preset proximity threshold.
  • In a possible embodiment, the plurality of first target dynamic vision sensor signals are the dynamic vision sensor signals at a plurality of moments within the preset time period, and the autocorrelation coefficients of the signals within the preset time period can be calculated with a time sliding window. For example, the size of the first preset period is used as the size of the time window, the window slides over the preset time period, and each slide moves by one first preset period. During one slide, the time window frames D of the plurality of first target dynamic vision sensor signals, denoted as the D second target dynamic vision sensor signals; based on these D signals, the autocorrelation coefficient corresponding to that time window is calculated.
  • Each of the plurality of first target dynamic vision sensor signals includes the first signal values of a plurality of pixels, so each of the D second target dynamic vision sensor signals within a time window (or first preset period) also includes the first signal values of a plurality of pixels. The autocorrelation coefficient corresponding to each time window (or first preset period) can therefore be calculated from the first signal values of adjacent pixels at different moments.
  • By calculating the autocorrelation coefficient of the dynamic vision sensor signal for each of the multiple time windows, the plurality of autocorrelation coefficients is obtained.
  • In a possible embodiment, the first target pixel includes a plurality of second target pixels, the second value corresponding to any pixel is accumulated from the third values corresponding to the plurality of second target pixels, and the third value corresponding to any one of the second target pixels is the product of the first signal value of the given pixel and the target signal value of that second target pixel.
  • That is, the first signal value of the given pixel at one moment is multiplied by the first signal value of each of its adjacent pixels at another moment, yielding one third value per adjacent pixel, i.e. a plurality of third values for the given pixel; the third values of all adjacent pixels are summed to obtain the second value corresponding to the given pixel; and the second values of all pixels are accumulated to obtain the first value corresponding to the second target dynamic vision sensor signal at that moment.
  • Performing these operations at all moments yields the first values corresponding to all of the second target dynamic vision sensor signals, i.e. the D second target dynamic vision sensor signals.
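The neighbour-product computation above can be sketched as follows. This is a minimal illustration: the lag of one frame, the square neighbourhood radius, and the absence of normalisation are assumptions, since the embodiment leaves these choices open.

```python
import numpy as np

def frame_autocorr(frame_t, frame_t_lag, radius=1):
    """'First value' for one frame pair: for every pixel, multiply its
    signal value at time t with the values of its in-bounds neighbours
    (within `radius`) at the lagged time, then accumulate over all
    pixels and neighbours."""
    h, w = frame_t.shape
    total = 0.0
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += frame_t[y, x] * frame_t_lag[ny, nx]
    return total

def window_autocorr(frames, lag=1, radius=1):
    """Autocorrelation coefficient of one time window: the mean of the
    first values over the frame pairs inside the window."""
    vals = [frame_autocorr(frames[i], frames[i + lag], radius)
            for i in range(len(frames) - lag)]
    return sum(vals) / len(vals)
```

Sliding `window_autocorr` over successive windows of the signal then produces the plurality of autocorrelation coefficients used for the fit.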
  • In a possible embodiment, the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained from the second signal value of that pixel, wherein: if the second signal value of the pixel is greater than 0, the first signal value is 1; if the second signal value is less than 0, the first signal value is -1; the second signal value of the pixel is any one of its m third signal values, where m is a positive integer.
  • The m third signal values are obtained from a plurality of third dynamic vision sensor signals within the preset time period, wherein: the g-th third signal value is the sum of the (g-1)-th third signal value and the accumulated fourth signal values of the pixel within the g-th second preset period, 1 ≤ g ≤ m, g being a positive integer; the plurality of third dynamic vision sensor signals include the fourth signal values of the pixel.
  • In other words, multiple dynamic vision sensor signals within a certain period are compressed into one dynamic vision sensor signal to increase the data density of the dynamic vision sensor signal.
  • Taking the size of the second preset period as the size of the time window, the plurality of third dynamic vision sensor signals within the preset time period are compressed to obtain the plurality of first target dynamic vision sensor signals. The first signal value of any pixel in a first target dynamic vision sensor signal is obtained by resetting the accumulated value of that pixel's fourth signal values over the third dynamic vision sensor signals, with the reset principle: if the accumulated value is greater than 0, the first signal value is set to 1; if the accumulated value is less than 0, the first signal value is set to -1.
  • Since the first target dynamic vision sensor signal thus has a higher data density than the third dynamic vision sensor signal, using the first target dynamic vision sensor signals for the subsequent calculation is more reliable.
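The compression-and-reset step can be sketched as follows, assuming (illustratively) that each third dynamic vision sensor signal is a frame of per-pixel values in {-1, 0, +1}; the function name and data layout are not taken from the embodiment.

```python
import numpy as np

def compress_events(event_frames, window):
    """Compress every `window` consecutive DVS frames into one denser
    frame: accumulate the per-pixel signal values over the window, then
    reset by sign (sum > 0 -> +1, sum < 0 -> -1, sum == 0 -> 0)."""
    out = []
    for start in range(0, len(event_frames), window):
        chunk = event_frames[start:start + window]
        acc = np.sum(chunk, axis=0)          # accumulated fourth signal values
        out.append(np.sign(acc).astype(np.int8))  # sign reset to {-1, 0, +1}
    return out
```

The sign reset keeps the output in the same ternary event alphabet as the input while raising the fraction of non-zero pixels per frame.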
  • In a possible embodiment, the spiking neural network model includes N convolutional layers and N deconvolutional layers, wherein: the output of the j-th convolutional layer is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolutional layer is the input of the (j+1)-th deconvolutional layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolutional layer; and the output of the N-th convolutional layer is the input of the first deconvolutional layer, where 1 ≤ j < N and N and j are positive integers.
  • In other words, the spiking neural network model includes N convolutional layers and N symmetric deconvolutional layers, with a skip connection between each convolutional layer and its symmetric deconvolutional layer. The model thus extracts and reconstructs the features of the dynamic vision sensor signal through deconvolution and skip connections, which helps ensure the completeness of the extracted features and the fidelity of the reconstructed features.
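The layer wiring described above can be made concrete with a small helper that enumerates the connections; the `conv`/`deconv` layer names are illustrative, and skip connections are generated for 1 ≤ j < N since the N-th convolutional layer already feeds the first deconvolutional layer through the main path.

```python
def build_connections(n):
    """Enumerate the edges of the symmetric N-conv / N-deconv topology:
    conv j -> conv j+1, deconv j -> deconv j+1, the bottleneck edge
    conv N -> deconv 1, and skip connections conv j -> deconv N-j."""
    edges = []
    for j in range(1, n):
        edges.append((f"conv{j}", f"conv{j+1}"))      # encoder chain
        edges.append((f"deconv{j}", f"deconv{j+1}"))  # decoder chain
    edges.append((f"conv{n}", "deconv1"))             # bottleneck
    for j in range(1, n):
        edges.append((f"conv{j}", f"deconv{n-j}"))    # skip connections
    return edges
```

For N = 3 this yields, among others, the bottleneck edge conv3 → deconv1 and the skips conv1 → deconv2 and conv2 → deconv1, i.e. each encoder layer is paired with its mirror-image decoder layer.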
  • In a second aspect, the present application relates to a denoising device; for the beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • The denoising device has the function of realizing the behavior in the method examples of the first aspect above.
  • the functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • In a possible embodiment, the denoising device includes: an acquisition unit configured to acquire a first dynamic vision sensor signal; and a processing unit configured to denoise the first dynamic vision sensor signal using a spiking neural network model to obtain a second dynamic vision sensor signal, where the post-synaptic membrane voltage kernel function of the spiking neurons of the spiking neural network model includes a target parameter determined according to the autocorrelation coefficient of the dynamic vision sensor signal.
  • In a possible embodiment, the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients obtained from a plurality of first target dynamic vision sensor signals within a preset time period; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting the distribution of the plurality of autocorrelation coefficients over time.
  • In a possible embodiment, any one of the plurality of autocorrelation coefficients is the average of the first values corresponding to D second target dynamic vision sensor signals, where D is a positive integer; the D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals that belong to the same first preset period among the plurality of first target dynamic vision sensor signals, and the preset time period includes a plurality of first preset periods; the first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating the second values corresponding to a plurality of pixels; the second value corresponding to any one of the plurality of pixels is obtained according to the first signal value of that pixel and the target signal value of a first target pixel, and the second target dynamic vision sensor signal includes the first signal value of that pixel.
  • In a possible embodiment, the first target pixel includes a plurality of second target pixels, the second value corresponding to any pixel is accumulated from the third values corresponding to the plurality of second target pixels, and the third value corresponding to any one of the second target pixels is the product of the first signal value of the given pixel and the target signal value of that second target pixel.
  • In a possible embodiment, the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained from the second signal value of that pixel, wherein: if the second signal value is greater than 0, the first signal value is 1; if the second signal value is less than 0, the first signal value is -1; the second signal value of the pixel is any one of its m third signal values, where m is a positive integer.
  • The m third signal values are obtained from a plurality of third dynamic vision sensor signals within the preset time period, wherein: the g-th third signal value is the sum of the (g-1)-th third signal value and the accumulated fourth signal values of the pixel within the g-th second preset period, 1 ≤ g ≤ m, g being a positive integer; the plurality of third dynamic vision sensor signals include the fourth signal values of the pixel.
  • In a possible embodiment, the spiking neural network model includes N convolutional layers and N deconvolutional layers, wherein: the output of the j-th convolutional layer is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolutional layer is the input of the (j+1)-th deconvolutional layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolutional layer; and the output of the N-th convolutional layer is the input of the first deconvolutional layer, where 1 ≤ j < N and N and j are positive integers.
  • In another aspect, the present application relates to an electronic device, comprising: one or more processors; and a computer-readable storage medium coupled to the processors and storing a program which, when executed by the processors, causes the electronic device to execute the method in any possible embodiment of the first aspect.
  • the present application relates to a computer-readable storage medium, including program codes, which, when executed by a computer device, are used to perform the method in any possible embodiment of the first aspect.
  • In another aspect, the present application relates to a chip, comprising: a processor configured to call and run a computer program from a memory, so that a device installed with the chip executes the method in any of the possible embodiments of the first aspect.
  • the present application relates to a computer program product comprising program code which, when run, performs the method of any one of the possible embodiments of the first aspect.
  • Fig. 1 is a schematic structural diagram of a neural network provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a spiking neural network provided in an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a denoising method provided in an embodiment of the present application.
  • Fig. 4 is the time distribution curve of the autocorrelation coefficient of a kind of dynamic vision sensor signal provided by the embodiment of the present application;
  • FIG. 5 is a schematic diagram of the post-synaptic membrane voltage kernel function corresponding to different time parameters provided by the embodiment of the present application;
  • FIG. 6 is a schematic diagram of a training process of a spiking neural network model provided in an embodiment of the present application.
  • Fig. 7 is a schematic diagram of the influence of different target parameters on the training of the spiking neural network model provided by the embodiment of the present application;
  • FIG. 8 is a comparison diagram of the denoising effect of the spiking neural network model provided by the embodiment of the present application and the denoising effect of the three-dimensional convolutional neural network model;
  • Fig. 9 is a comparison diagram of the deblurring effect between the denoised signal of the spiking neural network model provided by the embodiment of the present application and the denoised signal of the three-dimensional convolutional neural network model;
  • FIG. 10 is a schematic structural diagram of a denoising device provided in an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • A dynamic vision sensor, also known as an event camera or a neuromorphic camera, is an imaging sensor that responds to local brightness changes. Dynamic vision sensors do not capture images with a shutter the way traditional cameras do; instead, each pixel operates independently and asynchronously, reporting when brightness changes and remaining silent otherwise. Dynamic vision sensors offer microsecond-level temporal resolution, a 120 dB dynamic range, and less under/overexposure and motion blur than frame cameras. Owing to their asynchronous triggering, high temporal resolution, high dynamic range, low latency, low bandwidth and low power consumption, dynamic vision sensors can be mounted on mobile platforms (such as mobile phones, drones and cars) for vision tasks such as object detection, tracking, recognition and depth estimation.
  • Unlike a conventional camera, a dynamic vision sensor does not need to read out all the pixels in a picture; it only needs to obtain the addresses and information of the pixels whose light intensity changes. Specifically, when the dynamic vision sensor detects that the light intensity change of a pixel is greater than or equal to a preset threshold, it emits an event signal for that pixel. If the change is positive, that is, the pixel jumps from low brightness to high brightness, a "+1" event signal is emitted and marked as a positive event; if the change is negative, that is, the pixel jumps from high brightness to low brightness, a "-1" event signal is emitted and marked as a negative event; if the change is smaller than the preset threshold, no event signal is emitted and the pixel is marked as having no event. The event annotations of the pixels together constitute the event output of the dynamic vision sensor.
  • The light intensity change information collected by the dynamic vision sensor can take the form (X, Y, P, T), where "X, Y" is the event address, "P" is the event output, and "T" is the time at which the event is generated.
  • An event address corresponds to a pixel in the two-dimensional image associated with the dynamic vision sensor, that is, to a pixel position in the reference color image: "X, Y" can be the row and column position in the reference color image, "P" is the specific value of the real-time light intensity change, and "T" is the generation time of the real-time light intensity change.
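A minimal sketch of this event-generation rule, assuming the common log-intensity contrast model of event cameras; the threshold value, frame representation, and function name are illustrative, not taken from the embodiment.

```python
import math

def dvs_events(prev, curr, t, threshold=0.2):
    """Emit (x, y, p, t) events wherever the log-intensity change
    between two frames reaches the contrast threshold: p = +1 for a
    positive change (low -> high brightness), p = -1 for a negative
    one; pixels below the threshold emit nothing."""
    events = []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (ip, ic) in enumerate(zip(row_p, row_c)):
            d = math.log(ic) - math.log(ip)  # signed log-intensity change
            if d >= threshold:
                events.append((x, y, +1, t))   # positive event
            elif d <= -threshold:
                events.append((x, y, -1, t))   # negative event
    return events
```

A pixel whose brightness doubles produces a single "+1" event at its (X, Y) address, while an unchanged pixel produces no output at all, which is exactly what makes the stream sparse.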
  • The dynamic vision sensor, based on address-event representation, imitates the working mechanism of biological vision, whereas traditional visual image acquisition is based on "frames" collected at a fixed frequency and suffers from defects such as high redundancy, high latency, low dynamic range and large data volume.
  • The pixels of a dynamic vision sensor work asynchronously, outputting only the addresses and information of the pixels whose light intensity changes rather than passively reading out every pixel of a "frame" in sequence; this eliminates redundant data at the source and provides real-time response to scene changes, ultra-sparse image representation and asynchronous event output. Because the dynamic vision sensor is highly sensitive, however, its signal output is often accompanied by noise, including background noise and device thermal noise. At the same time, its wide range of data density and high temporal resolution pose challenges to traditional image-based denoising algorithms.
  • a neural network is a computing system that mimics the structure of a biological brain for data processing.
  • The interior of the biological brain consists of a large number of neurons combined in different ways, with a preceding neuron connected to a succeeding neuron through a synaptic structure for information transmission.
  • Neural networks have powerful nonlinear, adaptive and fault-tolerant information processing capabilities.
  • In a neural network, each node simulates a neuron and performs a specific operation, such as an activation function, and each connection between nodes simulates a synapse, whose weight represents the connection strength between two neurons, as shown in Figure 1.
  • In an artificial neural network, information is transmitted as analog values: each neuron accumulates the values of the preceding neurons through multiply-accumulate operations and passes the result through an activation function to the succeeding neurons.
  • In a spiking neural network, information is transmitted as spike trains: each neuron regulates its membrane voltage by accumulating the spike trains of the preceding neurons, and when the membrane voltage reaches a certain threshold, the neuron emits new spikes and transmits them to succeeding neurons. In this way the transmission, processing and nonlinear transformation of information are realized.
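This accumulate-threshold-fire cycle can be illustrated with a minimal leaky integrate-and-fire neuron; the leak factor, threshold and reset value below are illustrative assumptions, not parameters from the embodiment.

```python
def lif_neuron(input_spikes, threshold=1.0, leak=0.9, v_reset=0.0):
    """Minimal leaky integrate-and-fire neuron: the membrane voltage
    decays by `leak` each step, accumulates the incoming spike current,
    and when it crosses `threshold` the neuron fires an output spike
    and the voltage is reset."""
    v = 0.0
    out = []
    for s in input_spikes:
        v = leak * v + s          # leak, then integrate the input
        if v >= threshold:        # fire on threshold crossing
            out.append(1)
            v = v_reset           # reset the membrane voltage
        else:
            out.append(0)
    return out
```

Feeding the neuron a steady sub-threshold input shows the temporal dynamics: the voltage builds up over several steps before a spike is emitted, after which the reset starts the accumulation again.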
  • Many different types of neurons can be used in a spiking neural network, such as the integrate-and-fire (Integrate and Fire, IF) model, the leaky integrate-and-fire (Leaky Integrate and Fire, LIF) model, the spike response model (Spike Response Model, SRM), and threshold-variable neurons.
  • The spiking neuron is the basic unit of a spiking neural network; it integrates information by receiving spike inputs. A spike input raises the neuron's membrane voltage, and when the membrane voltage rises above a certain threshold voltage, the neuron emits a spike and transmits it to other neurons.
  • The synapse is the carrier of spike transmission, and the connections between spiking neurons depend on synapses.
  • The post-synaptic membrane voltage, also called the post-synaptic potential, is the change in the membrane voltage of the post-synaptic neuron caused by a spike fired by the pre-synaptic neuron.
  • spiking neural networks are inspired by biological brain networks and have the characteristics of low energy consumption, asynchronous computation and temporal dynamics, making them an ideal technique for processing highly sparse dynamic vision sensor streaming data.
  • the technical solution provided by this application uses a spiking neural network model to denoise the dynamic vision sensor signal, and specifically includes: the structural design of the spiking neural network model, the training design, a technical solution for increasing the data density of the dynamic vision sensor signal, and a technical solution for adjusting the parameters of the spiking neurons based on characteristics of the dynamic vision sensor signal (such as temporal correlation), among others.
  • Fig. 2 is a schematic structural diagram of a spiking neural network provided by an embodiment of the present application. The spiking neural network model includes N convolutional layers and N deconvolutional layers, wherein: the output of the jth convolutional layer in the N convolutional layers is the input of the (j+1)th convolutional layer; the output of the jth deconvolutional layer in the N deconvolutional layers is the input of the (j+1)th deconvolutional layer; the output of the jth convolutional layer is also the input of the (N−j)th deconvolutional layer; and the output of the Nth convolutional layer is the input of the first deconvolutional layer, where 1 ≤ j < N and N and j are positive integers.
  • the network computing unit of the spiking neural network is the spiking neuron, and many different types of spiking neurons can be used, such as the integrate-and-fire model, the leaky integrate-and-fire model, the spike response model, and threshold-variable neurons.
  • the neurons of the spiking neural network adopt an impulse response model
  • the spike response model is defined by the following equation:

    u(t) = Σ_f η(t − t^(f)) + Σ_s ε_ext(t − s) + u_rest

  • where: t represents time; u(t) represents the membrane voltage at time t; η represents the voltage change of the neuron after each pulse firing; f represents the index of the neuron's pulse firing times; t^(f) represents the fth pulse firing time; ε_ext represents the post-synaptic membrane voltage kernel function, and ε_ext is an exponential function; s represents a pulse input time; and u_rest represents the resting voltage. When u(t) reaches the firing threshold, the neuron fires a pulse.
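  • these dynamics can be sketched in discrete time as follows; the exponential kernel shapes, the time constants, the threshold, and the function name below are illustrative assumptions, not parameters taken from this application:

```python
import numpy as np

def simulate_srm(input_spikes, threshold=1.0, u_rest=0.0,
                 tau_s=5.0, tau_r=5.0, dt=1.0):
    """Discrete-time sketch of a spike-response neuron: u(t) is the sum of
    a resting voltage, decaying contributions eps_ext from past input
    pulses, and decaying negative reset contributions eta from the
    neuron's own past firing times t^(f)."""
    t_axis = np.arange(len(input_spikes)) * dt
    fire_times = []                      # t^(f): the neuron's own firing times
    output = np.zeros(len(input_spikes), dtype=int)
    for i, t in enumerate(t_axis):
        # eps_ext: every past input pulse adds an exponentially decaying kernel
        syn = sum(np.exp(-(t - s) / tau_s)
                  for s, had in zip(t_axis[:i + 1], input_spikes[:i + 1]) if had)
        # eta: every past output pulse adds a decaying negative reset kernel
        reset = sum(-threshold * np.exp(-(t - tf) / tau_r) for tf in fire_times)
        u = u_rest + syn + reset
        if u >= threshold:               # membrane voltage reached the threshold
            fire_times.append(t)
            output[i] = 1
    return output

out = simulate_srm([1, 0, 0, 0, 1, 0, 0, 0, 0, 0], threshold=0.9)
```

  • with these assumed constants, each isolated input pulse drives the membrane voltage over the threshold and the reset kernel then suppresses it, so the neuron fires once per input pulse.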
  • a loss function based on the Van Rossum distance is used in the spiking neural network to better reflect the error between spike sequences and to avoid insufficient network output.
  • the specific definition of the Van Rossum distance is as follows:

    D(u, v) = (1/τ) ∫₀^∞ [f(t; u) − f(t; v)]² dt

  • where u₁, u₂, ..., uₙ and v₁, v₂, ..., vₙ represent the pulse times of the two pulse sequences u and v; τ represents the time constant of the kernel function h(t); t represents time; and f(t; u) and f(t; v) represent the convolution of each pulse sequence with a specific kernel function, as in formula (5) and formula (6):

    f(t; u) = Σᵢ h(t − uᵢ)    (5)
    f(t; v) = Σᵢ h(t − vᵢ)    (6)

  • the kernel function h(t) is defined as:

    h(t) = e^(−t/τ) for t ≥ 0, and h(t) = 0 for t < 0
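  • a numerical sketch of the Van Rossum distance between two pulse sequences, using a causal exponential kernel and a discrete approximation of the integral (the grid resolution, time horizon, and function name are illustrative choices):

```python
import numpy as np

def van_rossum(u, v, tau=10.0, t_max=100.0, dt=0.1):
    """Discrete approximation of the Van Rossum distance: convolve each
    spike train with a causal exponential kernel h(t) = exp(-t/tau),
    then integrate the squared difference of the filtered traces."""
    t = np.arange(0.0, t_max, dt)
    def filtered(spikes):
        # f(t; u) = sum_i h(t - u_i), with h causal (zero before each spike)
        total = np.zeros_like(t)
        for ti in spikes:
            total += np.where(t >= ti, np.exp(-(t - ti) / tau), 0.0)
        return total
    diff = filtered(u) - filtered(v)
    return float(np.sum(diff ** 2) * dt / tau)
```

  • identical trains give distance zero, and the distance grows as corresponding pulses drift further apart, which is what makes it a useful spike-sequence loss.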
  • the spiking neural network model shown in Figure 2 includes N symmetric convolutional layers and N deconvolutional layers, with a skip connection between each convolutional layer and its symmetric deconvolutional layer; the spiking neural network model thus extracts and reconstructs the features of dynamic vision sensor signals through deconvolution and skip connections, which helps ensure the integrity of the extracted features and the fidelity of the reconstructed features.
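  • the routing of data through the symmetric structure (not the layers themselves) can be sketched as follows; the placeholder transforms stand in for real convolution and deconvolution operations, and the element-wise addition of skip features is an assumption about how the skip connections are merged:

```python
import numpy as np

def forward(x, n_layers=3):
    """Wiring sketch of the symmetric N-conv / N-deconv structure with
    skip connections; only the data routing follows the description."""
    conv_outs = []
    h = x
    for j in range(n_layers):                  # convolutional half
        h = h * 0.5                            # placeholder for conv layer j+1
        conv_outs.append(h)
    for j in range(n_layers):                  # deconvolutional half
        # deconv layer j+1 receives the previous output plus, via the skip
        # connection, the output of conv layer N-j (its symmetric partner)
        skip = conv_outs[n_layers - 1 - j]
        h = (h + skip) * 2.0                   # placeholder for deconv layer
    return h

out = forward(np.ones((4, 4)))
```

  • the skip from conv layer N−j into deconv layer j+1 is what lets the decoder reuse early, high-resolution features when reconstructing the denoised signal.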
  • FIG. 3 is a flowchart illustrating a process 300 of a denoising method according to an embodiment of the present application.
  • the process 300 is described as a series of steps or operations. It should be understood that the process 300 may be executed in various orders and/or concurrently, and is not limited to the execution order shown in FIG. 3 .
  • the process 300 can be executed by an electronic device, the electronic device includes a server and a terminal, and the process 300 includes but is not limited to the following steps or operations:
  • Step 301: Acquire a first dynamic vision sensor signal.
  • Step 302: Use the spiking neural network model to denoise the first dynamic vision sensor signal to obtain a second dynamic vision sensor signal; the post-synaptic membrane voltage kernel function of the spiking neurons of the spiking neural network model includes a target parameter, and the target parameter is determined according to the autocorrelation coefficient of the dynamic vision sensor signal.
  • the structure of the spiking neural network model may be shown in FIG. 2 .
  • the first dynamic visual sensor signal is the original signal collected by the dynamic visual sensor
  • the second dynamic vision sensor signal is the signal obtained after denoising by the spiking neural network model provided by this application.
  • the autocorrelation coefficient of the dynamic vision sensor signal can be the temporal autocorrelation coefficient ⟨x(s)x(s−t)⟩, where x represents the signal, t represents the interval time, s represents the time point, and ⟨·⟩ means averaging the signal over different time points.
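  • the temporal autocorrelation coefficient just described can be estimated as follows; this is a minimal sketch in which the averaging ⟨·⟩ is a plain mean over time points, and the function name and normalization convention are assumptions for illustration:

```python
import numpy as np

def autocorrelation(x, t):
    """Estimate <x(s) x(s - t)>: average the product of the signal with a
    copy of itself shifted by t time steps."""
    x = np.asarray(x, dtype=float)
    if t == 0:
        return float(np.mean(x * x))
    return float(np.mean(x[t:] * x[:-t]))

# A slowly varying signal is strongly correlated at short lags and
# weakly correlated at long lags -- the property the target parameter tracks.
signal = np.sin(np.linspace(0, 2 * np.pi, 200))
```

  • for a genuine moving object the coefficient decays slowly with the lag t, while independent noise events decorrelate almost immediately; that contrast is what the denoiser exploits.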
  • the target parameter can be preset according to the autocorrelation coefficient of the dynamic vision sensor signal before the spiking neural network model is trained, or adjusted in real time according to the autocorrelation coefficient.
  • the target parameter can be set as self-learning or fixed.
  • the spiking neural network model is used to denoise the dynamic visual sensor signal.
  • the spiking neural network model includes spiking neurons, and the target parameter in the post-synaptic membrane voltage kernel function of the spiking neurons is determined according to the autocorrelation coefficient of the dynamic vision sensor signal. The kernel function therefore enables the spiking neural network model to learn the temporal correlation of the dynamic vision sensor signal and to remove noise events with weak temporal correlation, so that the denoising effect of the model on dynamic vision sensor signals is improved.
  • the spiking neural network model performs streaming denoising on highly sparse dynamic vision sensor signals; that is, the number of input data frames equals the number of output data frames, and one pass of the data through the model does not rely on three-dimensional convolution to perform sliding-window traversal in the time dimension. Compared with existing denoising methods based on artificial neural networks, this greatly reduces running time, network size and computation.
  • the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients, and the plurality of autocorrelation coefficients are obtained based on a plurality of first target dynamic vision sensor signals within a preset time period; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting the distribution of the multiple autocorrelation coefficients in time.
  • the target parameters in the postsynaptic membrane voltage kernel function can be obtained based on the autocorrelation coefficient of the dynamic visual sensor signal within a period of time.
  • the multiple autocorrelation coefficients are obtained according to a plurality of first target dynamic vision sensor signals within a preset time period, and correspond one-to-one to multiple moments, i.e. the multiple autocorrelation coefficients are distributed over these moments, as shown in Figure 4; function fitting is carried out on this distribution to obtain the preset function, and the target parameter is then obtained according to the preset autocorrelation coefficient threshold and the fitted preset function, as follows:
  • the correlation curve of the dynamic vision sensor signal is fitted by a function-fitting method, that is, the preset function is obtained by fitting the temporal distribution shape of the multiple autocorrelation coefficients; for example, the fitted preset function may take the form:

    y = a·e^(−x/b) + c    (8)

  • where the values of a, b and c in formula (8) are determined according to the actual temporal distribution of the autocorrelation coefficients.
  • the inverse function of the preset function is then solved at the preset autocorrelation coefficient threshold; for example, the inverse of the preset function above is:

    x = −b·ln((y − c)/a)    (9)

  • the preset autocorrelation coefficient threshold is the value of y in formula (9); substituting it yields the value of x, which is taken as the value of the target parameter of the post-synaptic membrane voltage kernel function.
  • the target parameters of the postsynaptic membrane voltage kernel function are selected based on the following principles:
  • the time span of the postsynaptic membrane voltage kernel function can match the correlation of the dynamic visual sensor signal, that is, the abscissa scale of the postsynaptic membrane voltage kernel function and the time correlation range of the dynamic visual sensor signal should coincide as much as possible.
  • the time span of the postsynaptic membrane voltage kernel function should not be close to 0, so as to avoid weakening the time dynamics of neurons.
  • for example, the preset autocorrelation coefficient threshold lies in the interval [0.2, 0.5], and the corresponding value interval of the target parameter of the post-synaptic membrane voltage kernel function is [5, 13].
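  • a minimal sketch of this threshold-to-parameter mapping, assuming hypothetical fitted coefficients a, b and c and the exponential form y = a·e^(−x/b) + c for the preset function (formula (8)) with its inverse as formula (9):

```python
import math

def preset(x, a=1.0, b=10.0, c=0.0):
    """Hypothetical fitted preset function y = a * exp(-x / b) + c."""
    return a * math.exp(-x / b) + c

def preset_inverse(y, a=1.0, b=10.0, c=0.0):
    """Inverse of the preset function: recovers the time x at which the
    autocorrelation drops to the threshold y."""
    return -b * math.log((y - c) / a)

# With the threshold chosen inside [0.2, 0.5], the recovered time x is
# used as the target parameter tau_s of the post-synaptic kernel.
tau_s = preset_inverse(0.35)
```

  • with these illustrative coefficients a threshold of 0.35 lands the target parameter inside the [5, 13] interval mentioned above; real coefficients come from fitting measured autocorrelation curves.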
  • the post-synaptic membrane voltage kernel function can be a double exponential function, whose expression is as follows:

    ε_ext(t) = ε̄·η·(e^(−t/τ_s) − e^(−t/τ_m))

  • where ε_ext represents the post-synaptic membrane voltage kernel function; ε̄ represents the amplitude; η represents the amplitude adjustment coefficient; τ_s represents the first time parameter and is the target parameter referred to in this application; and τ_m represents the second time parameter.
  • Figure 5 shows the postsynaptic membrane voltage kernel function corresponding to several different time parameters.
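  • a sketch of such a double-exponential kernel; the exact normalization is an assumption here, written as ε_ext(t) = ε̄·η·(e^(−t/τ_s) − e^(−t/τ_m)) for t ≥ 0, with the default constants chosen only for illustration:

```python
import numpy as np

def psp_kernel(t, bar_eps=1.0, eta=1.0, tau_s=10.0, tau_m=2.0):
    """Double-exponential post-synaptic kernel; zero before the pulse."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0,
                    bar_eps * eta * (np.exp(-t / tau_s) - np.exp(-t / tau_m)),
                    0.0)

t = np.linspace(0, 60, 601)
# A larger target parameter tau_s stretches the kernel in time, matching
# signals whose autocorrelation decays more slowly (cf. Figure 5).
k5 = psp_kernel(t, tau_s=5.0)
k13 = psp_kernel(t, tau_s=13.0)
```

  • this is the mechanism behind the selection principle above: τ_s sets the abscissa scale of the kernel, so it should match the time range over which the sensor signal stays correlated.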
  • function fitting is performed to obtain the preset function, which represents the relationship between the autocorrelation coefficient of the dynamic vision sensor signal and time; the inverse function of the preset function is then calculated, and it represents the relationship between time and the autocorrelation coefficient of the dynamic vision sensor signal; the time value obtained by solving the inverse function at the preset autocorrelation coefficient threshold is used as the value of the target parameter in the post-synaptic membrane voltage kernel function of the spiking neurons.
  • by adjusting the target parameter in the post-synaptic membrane voltage kernel function in this way, the spiking neural network model learns the temporal correlation of the dynamic vision sensor signal.
  • the present application can calculate the autocorrelation coefficient of the dynamic visual sensor signal within a period of time online through the time sliding window method.
  • the specific operation is as follows:
  • the autocorrelation coefficient over one sliding window can be expressed as:

    ρ = (1/D) Σ_k Σ_{x=1..W} Σ_{y=1..H} Σ_{x′,y′} p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′})

  • where: D represents the number of dynamic vision sensor signals in the sliding window; W and H are the dimensions of the pixel array on the x-axis and y-axis respectively, i.e. the width and height of the sliding window; the pixel with coordinates (x′, y′) is an adjacent pixel of the pixel with coordinates (x, y), x′ and y′ being selected within a preset proximity of (x, y); and p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}) represents the proximity calculation performed on S_{k,x,y} and S_{k+q,x′,y′}, computed as the product S_{k,x,y}·S_{k+q,x′,y′}.
  • any autocorrelation coefficient among the plurality of autocorrelation coefficients is the average value of the first values corresponding to D second target dynamic vision sensor signals, D being a positive integer; the D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals belonging to the same first preset period among the multiple first target dynamic vision sensor signals, and the preset time period includes a plurality of first preset periods.
  • the first numerical value corresponding to any second target dynamic vision sensor signal among the D second target dynamic vision sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels; the second numerical value corresponding to any pixel among the plurality of pixels is obtained according to the first signal value of that pixel and the target signal value of a first target pixel, the first target pixel being a pixel adjacent to that pixel; that second target dynamic vision sensor signal includes the first signal value of the pixel, and another second target dynamic vision sensor signal includes the target signal value of the first target pixel.
  • the plurality of pixels are a plurality of pixels in the photosensitive element of the dynamic vision sensor, or a plurality of pixels in a two-dimensional image collected by the photosensitive element of the dynamic vision sensor; the first target pixel is a pixel in the photosensitive element whose proximity to the pixel is not greater than a preset proximity threshold, or the first target pixel is a pixel in the two-dimensional image whose proximity to the pixel is not greater than the preset proximity threshold, the two-dimensional image being collected by the photosensitive element of the dynamic vision sensor.
  • correspondingly, the autocorrelation coefficients of the dynamic vision sensor signals in the preset time period are calculated by the time sliding window method, and the time window size of the sliding window can be the size of the first preset period; D represents the number of first target dynamic vision sensor signals in the same first preset period, that is, there are D second target dynamic vision sensor signals in the same first preset period; W and H are respectively the width and height of the photosensitive element of the dynamic vision sensor; (x, y) represents the coordinates of any pixel in the photosensitive element of the dynamic vision sensor, or of any pixel in the two-dimensional image; (x′, y′) represents the coordinates of a pixel, in the photosensitive element or in the two-dimensional image, whose proximity to the pixel (x, y) is not greater than the preset proximity threshold.
  • the multiple first target dynamic visual sensor signals are dynamic visual sensor signals at multiple moments in the preset time period
  • the autocorrelation coefficients of the dynamic vision sensor signals within the preset time period can be calculated by the time sliding window method. For example, the size of the first preset period is used as the size of the time window, the window slides over the preset time period, and the time interval of one slide is the size of one first preset period. During one sliding-window step, the time window frames D of the multiple first target dynamic vision sensor signals, denoted as D second target dynamic vision sensor signals; based on these D signals, the autocorrelation coefficient corresponding to that window is calculated.
  • each first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals includes the first signal values of a plurality of pixels, so each of the D second target dynamic vision sensor signals in a time window (or first preset period) also includes the first signal values of a plurality of pixels; the autocorrelation coefficient for each window can therefore be calculated from the first signal values of adjacent pixels at different moments. By sliding the window multiple times, the autocorrelation coefficients corresponding to multiple time windows are calculated, yielding the multiple autocorrelation coefficients.
  • the first target pixel includes a plurality of second target pixels
  • the second value corresponding to any pixel is accumulated according to the third value corresponding to the plurality of second target pixels
  • the third numerical value corresponding to any second target pixel among the plurality of second target pixels is the product of the first signal value of any pixel and the target signal value of any second target pixel.
  • S_{k,x,y} represents the first signal value of the pixel (x, y); S_{k+q,x′,y′} represents the target signal value of the second target pixel (x′, y′); the third value is S_{k,x,y}·S_{k+q,x′,y′}, and the second value is Σ_{x′,y′} S_{k,x,y}·S_{k+q,x′,y′}.
  • specifically, the first signal value of the pixel at one moment is multiplied by the first signal value of each adjacent pixel at another moment, giving the third numerical value corresponding to each adjacent pixel, i.e. the multiple third numerical values corresponding to the pixel; the third numerical values corresponding to all adjacent pixels are then summed to obtain the second numerical value corresponding to the pixel; the second numerical values corresponding to all pixels are then accumulated to obtain the first numerical value corresponding to the second target dynamic vision sensor signal at that moment. Performing the above operations for all moments yields the first values corresponding to the second target dynamic vision sensor signals at all moments, e.g. the D second target dynamic vision sensor signals.
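  • the per-window computation just described can be sketched as follows; the 3×3 neighbourhood radius, the 1/D normalization, and the function name are assumptions for illustration:

```python
import numpy as np

def window_autocorr(frames, q=1, radius=1):
    """Sliding-window estimate: for a window of D frames, average over
    frames and pixels the product of each pixel's value with its
    neighbourhood in the frame q steps later (third value = product,
    second value = sum over neighbours, first value = sum over pixels)."""
    frames = np.asarray(frames, dtype=float)
    d, h, w = frames.shape
    total = 0.0
    for k in range(d - q):
        for x in range(h):
            for y in range(w):
                for dx in range(-radius, radius + 1):
                    for dy in range(-radius, radius + 1):
                        xn, yn = x + dx, y + dy
                        if 0 <= xn < h and 0 <= yn < w:
                            total += frames[k, x, y] * frames[k + q, xn, yn]
    return total / (d - q)

# A static pattern repeated across frames is strongly self-correlated;
# uncorrelated noise frames would average out toward zero instead.
static = np.tile(np.eye(4), (5, 1, 1))
```

  • the nested loops mirror the four summations of the window formula; a vectorized implementation would replace them with shifted array products.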
  • for example, the minimum time resolution of the dynamic vision sensor signal is 1 μs; this application can compress the dynamic vision sensor signals within a certain period of time into one dynamic vision sensor signal, i.e. compress multiple frames of dynamic vision sensor signals into one dynamic vision sensor signal.
  • the first signal value of any pixel included in any first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals is obtained based on the second signal value of that pixel, wherein: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; if the second signal value of the pixel is less than 0, the first signal value of the pixel is -1; the second signal value of the pixel is any one of the m third signal values of the pixel, m being a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period, wherein the gth third signal value among the m third signal values is the sum of the (g−1)th third signal value and the accumulated fourth signal values of the pixel within the gth second preset period, 1 < g ≤ m, g being a positive integer; the multiple third dynamic vision sensor signals include the fourth signal values of the pixel.
  • the multiple third dynamic vision sensor signals within the preset time period are compressed to obtain the multiple first target dynamic vision sensor signals, where the third dynamic vision sensor signals are the original dynamic vision sensor signals; for example, r third dynamic vision sensor signals are compressed to obtain m first target dynamic vision sensor signals.
  • the first signal value of any pixel (x, y) included in any first target dynamic vision sensor signal is a signal reset value; the second or third signal value of any pixel (x, y) is a signal accumulation value; and the fourth signal value of any pixel (x, y) is a signal polarization value.
  • multiple dynamic vision sensor signals within a certain period of time are compressed into one dynamic vision sensor signal, so as to increase the data density of the dynamic vision sensor signals.
  • taking the size of the second preset period as the size of the time window, the multiple third dynamic vision sensor signals within the preset time period are compressed, thereby obtaining the multiple first target dynamic vision sensor signals. The first signal value of any pixel included in any first target dynamic vision sensor signal is obtained by resetting the accumulated value of the fourth signal values of that pixel over the corresponding third dynamic vision sensor signals. The principle of resetting is: if the accumulated value of the fourth signal values of the pixel in the multiple third dynamic vision sensor signals is greater than 0, the first signal value is set to 1; if the accumulated value is less than 0, the first signal value is set to -1.
  • compared with the third dynamic vision sensor signal, the first target dynamic vision sensor signal has a higher data density, which is beneficial when the first target dynamic vision sensor signal is used to calculate the autocorrelation coefficients.
  • the first target dynamic vision sensor signal can be an original dynamic vision sensor signal, such as the third dynamic vision sensor signal; it can also be obtained by compressing the original dynamic vision sensor signals, for example, a plurality of third dynamic vision sensor signals are compressed to obtain one first target dynamic vision sensor signal.
  • when the first target dynamic vision sensor signal is an original dynamic vision sensor signal, the first signal value of any pixel in the first target dynamic vision sensor signal is a signal polarization value; when the first target dynamic vision sensor signal is a signal obtained by frame compression, the first signal value of any pixel in the first target dynamic vision sensor signal is a signal reset value.
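  • a sketch of the frame-compression-and-reset step; the handling of pixels whose accumulated value is exactly 0 (kept at 0 here) and the function name are assumptions not specified by the description:

```python
import numpy as np

def compress_frames(frames, window):
    """Compress raw DVS frames into one frame per window: polarization
    values inside each window are accumulated per pixel, then reset to a
    sign (+1 if the sum is positive, -1 if negative, 0 otherwise)."""
    frames = np.asarray(frames, dtype=float)
    out = []
    for start in range(0, len(frames), window):
        acc = frames[start:start + window].sum(axis=0)   # per-pixel accumulation
        out.append(np.sign(acc))                          # reset to +1 / -1 / 0
    return np.stack(out)

# Four raw frames of shape (1, 2), compressed two frames at a time.
raw = np.array([[[1, -1]], [[1, 1]], [[-1, -1]], [[-1, -1]]])
compressed = compress_frames(raw, window=2)
```

  • each output frame thus summarizes one second preset period, raising the data density that the autocorrelation estimate and the network consume.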
  • Fig. 6 is a schematic diagram of the training process of a spiking neural network model provided by the embodiment of the present application; on a training data set of dynamic vision sensor signals, the technical solution of the present application trains effectively, and the spiking neural network model basically converges after a period of time; for example, after 50 iterations the loss has basically converged.
  • Fig. 7 is a schematic diagram of the impact of different target parameters on the training of the spiking neural network model provided by the embodiment of the present application; different values of the target parameter τ_s of the post-synaptic membrane voltage kernel function give different training results. As τ_s increases, the loss at the end of training first decreases and then increases, and the loss is smallest when τ_s is around 10. Therefore, through the technical solution of the present application, an appropriate target parameter τ_s can be selected so that training of the spiking neural network model is optimal.
  • Fig. 8 is a comparison chart of the denoising effect of the spiking neural network model provided by the embodiment of the present application and that of a three-dimensional convolutional neural network model; compared with the traditional three-dimensional convolutional neural network model, the denoising effect of the spiking neural network model provided by the present application is significantly better.
  • Fig. 9 is a comparison diagram of deblurring results based on the signal denoised by the spiking neural network model provided by the embodiment of the present application and on the signal denoised by a three-dimensional convolutional neural network model; the dynamic vision sensor signal is first denoised and then used for a subsequent deblurring task. The image deblurred from the signal denoised by the spiking neural network model retains more real detail and is closer to reality than the image deblurred from the signal denoised by the three-dimensional convolutional neural network model.
  • FIG. 10 is a schematic structural diagram of a denoising device provided by an embodiment of the present application; the denoising device 1000 is applied to an electronic device, the electronic device includes a server and a terminal, and the denoising device 1000 includes: an acquisition unit 1001 for acquiring a first dynamic vision sensor signal; and a processing unit 1002 for denoising the first dynamic vision sensor signal with a spiking neural network model to obtain a second dynamic vision sensor signal, wherein the post-synaptic membrane voltage kernel function of the spiking neurons of the spiking neural network model includes a target parameter determined from the autocorrelation coefficient of the dynamic vision sensor signal.
  • the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients, and the plurality of autocorrelation coefficients are obtained based on a plurality of first target dynamic vision sensor signals within a preset time period; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting the distribution of the multiple autocorrelation coefficients in time.
  • any autocorrelation coefficient among the plurality of autocorrelation coefficients is the average value of the first values corresponding to D second target dynamic vision sensor signals, D being a positive integer; the D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals belonging to the same first preset period among the multiple first target dynamic vision sensor signals, and the preset time period includes a plurality of first preset periods.
  • the first numerical value corresponding to any second target dynamic vision sensor signal among the D second target dynamic vision sensor signals is obtained by accumulating the second numerical values corresponding to a plurality of pixels; the second numerical value corresponding to any pixel among the plurality of pixels is obtained according to the first signal value of that pixel and the target signal value of a first target pixel, the first target pixel being a pixel adjacent to that pixel; that second target dynamic vision sensor signal includes the first signal value of the pixel, and another second target dynamic vision sensor signal includes the target signal value of the first target pixel.
  • the first target pixel includes a plurality of second target pixels
  • the second value corresponding to any pixel is accumulated according to the third value corresponding to the plurality of second target pixels
  • the third numerical value corresponding to any second target pixel among the plurality of second target pixels is the product of the first signal value of any pixel and the target signal value of any second target pixel.
  • the first signal value of any pixel included in any first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals is obtained based on the second signal value of that pixel, wherein: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; if the second signal value of the pixel is less than 0, the first signal value of the pixel is -1; the second signal value of the pixel is any one of the m third signal values of the pixel, m being a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period, wherein the gth third signal value among the m third signal values is the sum of the (g−1)th third signal value and the accumulated fourth signal values of the pixel within the gth second preset period, 1 < g ≤ m, g being a positive integer; the multiple third dynamic vision sensor signals include the fourth signal values of the pixel.
  • the spiking neural network model includes N convolutional layers and N deconvolutional layers, wherein: the output of the jth convolutional layer in the N convolutional layers is the input of the (j+1)th convolutional layer; the output of the jth deconvolutional layer in the N deconvolutional layers is the input of the (j+1)th deconvolutional layer; the output of the jth convolutional layer is also the input of the (N−j)th deconvolutional layer; and the output of the Nth convolutional layer is the input of the first deconvolutional layer, where 1 ≤ j < N and N and j are positive integers.
  • each unit of the denoising apparatus 1000 described in FIG. 10 may also refer to corresponding descriptions of the embodiments shown in FIGS. 1 to 9 .
  • the beneficial effects brought by the denoising device 1000 described in FIG. 10 can refer to the corresponding descriptions of the embodiments shown in FIG. 1 to FIG. 9 , and the description will not be repeated here.
  • FIG. 11 is a schematic structural diagram of an electronic device 1110 provided by an embodiment of the present application.
  • the electronic device 1110 includes a processor 1111, a memory 1112, and a communication interface 1113, which are connected to each other through a bus 1114.
  • the memory 1112 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM); the memory 1112 is used to store related computer programs and data.
  • the communication interface 1113 is used to receive and send data.
  • the processor 1111 may be one or more central processing units (central processing unit, CPU).
  • the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 1111 in the electronic device 1110 is configured to read the computer program code stored in the memory 1112, and execute the method of any one of the embodiments shown in FIG. 3 .
  • the electronic device may be a server or a terminal. For the implementation of the operations of the electronic device 1110 described in FIG. 11, and for the beneficial effects brought by the electronic device 1110, reference may be made to the corresponding descriptions of the embodiments shown in FIG. 1 to FIG. 9, which are not repeated here.
  • an embodiment of the present application also provides a chip, which includes at least one processor, a memory, and an interface circuit; the memory, the interface circuit, and the at least one processor are interconnected through lines, and the memory stores a computer program; when the computer program is executed by the processor, the method flow of any one of the embodiments shown in FIG. 3 is implemented.
  • an embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program runs on a computer, the method flow of any one of the embodiments shown in FIG. 3 is implemented.
  • an embodiment of the present application further provides a computer program product; when the computer program product runs on a computer, the method flow of any one of the embodiments shown in FIG. 3 is implemented.
  • the processor mentioned in the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), and may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory mentioned in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM), which serves as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct rambus random access memory (Direct Rambus RAM, DR RAM).
  • when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated in the processor.
  • the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the above units is only a division by logical function; in actual implementation, there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods shown in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disks, optical discs, and other media that can store program code.
  • the modules in the device of the embodiment of the present application can be combined, divided and deleted according to actual needs.


Abstract

The present application provides a denoising method and a related device in the field of artificial intelligence. The method comprises: obtaining a first dynamic vision sensor signal; and performing denoising processing on the first dynamic vision sensor signal by means of a spiking neural network model to obtain a second dynamic vision sensor signal, wherein a postsynaptic membrane voltage kernel function of a spiking neuron of the spiking neural network model comprises a target parameter, and the target parameter is determined according to autocorrelation coefficients of dynamic vision sensor signals. By means of embodiments of the present application, the denoising effect of the dynamic vision sensor signal can be improved.

Description

Denoising method and related device
This application claims priority to Chinese Patent Application No. 202111321228.7, entitled "Denoising method and related device", filed with the China National Intellectual Property Administration on November 9, 2021, which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of artificial intelligence, and in particular to a denoising method and a related device.
Background
Existing dynamic vision sensor (Dynamic Vision Sensor, DVS) signal denoising methods are mainly filter-based denoising and artificial neural network (Artificial Neural Network, ANN) denoising under deep learning frameworks. Filter-based methods include temporal or spatial filtering, which denoise mainly by removing temporally or spatially isolated events. ANN-based denoising methods usually compress the DVS data stream into frames to increase data density, and then denoise the resulting images with conventional RGB image denoising networks. Among them, denoising methods based on two-dimensional (2D) convolutional neural networks (Convolutional Neural Networks, CNN) usually require an additionally defined noise model containing temporal information, while methods based on three-dimensional (3D) CNNs require temporal convolution over a time window.
However, filter-based denoising methods tend to filter out events together with noise when the data are sparse, and perform worse than ANN-based methods on benchmark datasets, while ANN-based methods suffer from large network size, heavy computation, and long processing time. Moreover, because DVS signals are highly sparse, vary widely in data density, and have high temporal resolution, existing DVS signal denoising methods cannot achieve a good denoising effect.
Summary
This application provides a denoising method and a related device, which can improve the denoising effect on dynamic vision sensor signals.
According to a first aspect, this application relates to a denoising method, including: acquiring a first dynamic vision sensor signal; and denoising the first dynamic vision sensor signal with a spiking neural network (Spiking Neural Network, SNN) model to obtain a second dynamic vision sensor signal, where a post-synaptic potential (Post-Synaptic Potential, PSP) kernel function of the spiking neurons of the SNN model includes a target parameter, and the target parameter is determined according to autocorrelation coefficients of dynamic vision sensor signals.
In this application, an SNN model is used to denoise the dynamic vision sensor signal. The SNN model includes spiking neurons, and the target parameter in the PSP kernel function of the spiking neurons is determined from the autocorrelation coefficients of dynamic vision sensor signals; the PSP kernel function therefore enables the SNN model to learn the temporal correlation of the signal and to remove noise events with weak temporal correlation, improving the denoising effect of the SNN model on dynamic vision sensor signals. Furthermore, owing to the inherent temporal dynamics of the SNN model, it denoises the highly sparse signal in a streaming manner: the input and output of the model have the same number of frames, the data pass through the model once, and no sliding-window traversal with 3D convolution over the time dimension is needed. Compared with existing ANN-based denoising methods, this greatly reduces running time, network size, and computation.
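The role of the target parameter in the PSP kernel can be illustrated with a small sketch. The exponential-decay kernel form, the parameter name `tau`, and the unweighted voltage sum below are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

def psp_kernel(t, tau):
    """Exponential-decay postsynaptic-potential (PSP) kernel.

    `tau` stands in for the "target parameter": it controls how long a
    past DVS event keeps influencing the neuron's membrane voltage,
    i.e. how much temporal correlation the neuron can exploit.
    """
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0.0, np.exp(-t / tau), 0.0)

def membrane_voltage(event_times, t_now, tau):
    """Membrane voltage at time t_now as a sum of PSP kernels over past
    input spikes (synaptic weights omitted for brevity)."""
    return float(np.sum(psp_kernel(t_now - np.asarray(event_times), tau)))

# Two temporally correlated events build up a larger voltage than a
# single isolated noise event observed at the same delay.
v_signal = membrane_voltage([0.0, 1.0], t_now=2.0, tau=5.0)
v_noise = membrane_voltage([0.0], t_now=2.0, tau=5.0)
assert v_signal > v_noise
```

With such a kernel, events whose inter-arrival time is small relative to `tau` reinforce each other, while an isolated noise event decays away before any reinforcement arrives, which is the intuition behind choosing the parameter from the signal's autocorrelation.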
In a possible implementation, the autocorrelation coefficients of the dynamic vision sensor signal include multiple autocorrelation coefficients obtained from multiple first target dynamic vision sensor signals within a preset time period; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting the distribution of the multiple autocorrelation coefficients over time.
In this implementation, a preset function is fitted to the temporal distribution of the multiple autocorrelation coefficients obtained from the multiple first target dynamic vision sensor signals within the preset time period, so the preset function characterizes the relationship between the autocorrelation coefficient of the signal and time. The inverse of the preset function, which characterizes time as a function of the autocorrelation coefficient, is then evaluated at the preset autocorrelation coefficient threshold, and the resulting time value is used as the value of the target parameter in the PSP kernel function of the spiking neurons. In this way, the target parameter in the PSP kernel function is adjusted so that the SNN model learns the temporal correlation of the dynamic vision sensor signal.
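The fitting-and-inversion step can be sketched roughly as follows; the exponential form of the preset function, the sample lag values and coefficients, and the threshold 0.1 are all hypothetical:

```python
import numpy as np

# Hypothetical measured autocorrelation coefficients r(t) of the DVS
# signal at increasing time lags (one lag per "first preset period").
lags = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
coeffs = np.array([0.61, 0.37, 0.22, 0.14, 0.08])

# Fit the preset function r(t) = exp(-t / tau_fit) by linear regression
# on log r(t); an exponential decay is one plausible preset function.
slope = np.polyfit(lags, np.log(coeffs), 1)[0]
tau_fit = -1.0 / slope

# Invert the fitted function at the preset autocorrelation threshold:
# solve exp(-t / tau_fit) = r_threshold for t. The resulting time value
# serves as the target parameter of the PSP kernel.
r_threshold = 0.1
target_param = -tau_fit * np.log(r_threshold)
```

Any monotonically decreasing fit would do; the point is that the inverse function maps the correlation threshold back to a time scale over which real events remain correlated.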
In a possible implementation, any one of the multiple autocorrelation coefficients is the average of first values corresponding to D second target dynamic vision sensor signals, D being a positive integer. The D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals, among the multiple first target dynamic vision sensor signals, that belong to the same first preset period, and the preset time period includes multiple first preset periods. The first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to multiple pixels. The second value corresponding to any one of the multiple pixels is obtained from the first signal value of that pixel and the target signal value of a first target pixel, where the first target pixel is a pixel whose proximity to that pixel is not greater than a preset proximity threshold. The second target dynamic vision sensor signal includes the first signal value of that pixel and is the w-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, w being a positive integer. The target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, and the third target dynamic vision sensor signal is the (w+q)-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, q being a positive integer. The multiple pixels are pixels of the photosensitive element of the dynamic vision sensor, or pixels of a two-dimensional image captured by the photosensitive element of the dynamic vision sensor; likewise, the first target pixel is a pixel of the photosensitive element, or of such a two-dimensional image, whose proximity to that pixel is not greater than the preset proximity threshold.
In this implementation, the multiple first target dynamic vision sensor signals are the signals at multiple moments within the preset time period, and the autocorrelation coefficients of the signal within the preset time period can be computed with a sliding time window: the window size is the size of one first preset period, and the window slides by one first preset period at a time. In each sliding step, the window covers D of the first target dynamic vision sensor signals, denoted the D second target dynamic vision sensor signals, from which the autocorrelation coefficient corresponding to that window is computed. Since each first target dynamic vision sensor signal includes the first signal values of multiple pixels, each of the D second target dynamic vision sensor signals within a window (or first preset period) also includes the first signal values of multiple pixels, so the autocorrelation coefficient of each window can be computed from the first signal values of neighboring pixels at different moments. After sliding the window multiple times over the preset time period, the autocorrelation coefficients corresponding to the multiple windows are obtained, yielding the multiple autocorrelation coefficients.
In a possible implementation, the first target pixel includes multiple second target pixels, and the second value corresponding to any pixel is obtained by accumulating third values corresponding to the multiple second target pixels; the third value corresponding to any second target pixel is the product of the first signal value of that pixel and the target signal value of that second target pixel.
In this implementation, for any pixel of the dynamic vision sensor signal, the first signal value of that pixel at one moment is multiplied by the first signal value of each of its neighboring pixels at another moment, yielding a third value for each neighboring pixel, i.e., multiple third values for that pixel. These third values are accumulated into the second value of that pixel, and the second values of all pixels are accumulated into the first value corresponding to the second target dynamic vision sensor signal at that moment. Performing these operations for the second target dynamic vision sensor signals at all moments within one time window (or first preset period), for example the D second target dynamic vision sensor signals, yields their corresponding first values; averaging all the first values obtained within that window then gives the autocorrelation coefficient of the dynamic vision sensor signal for that window (or first preset period).
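The per-window computation described above can be sketched as follows. The frame shapes, the choice of a square neighborhood of `radius` pixels as the proximity threshold, whether the neighborhood includes the pixel itself, and the lag `q` are illustrative assumptions:

```python
import numpy as np

def window_autocorr(frames, q=1, radius=1):
    """Autocorrelation coefficient of one time window of DVS frames.

    frames: array of shape (D, H, W) with entries in {-1, 0, 1}.
    For every frame w and every pixel, the pixel's value at frame w is
    multiplied by the values of its neighbours (within `radius`, playing
    the role of the preset proximity threshold) at frame w + q; the
    products are summed over neighbours ("third values" into a "second
    value") and over pixels (into a "first value"), and the per-frame
    first values are averaged over the window.
    """
    frames = np.asarray(frames, dtype=float)
    d, h, w_dim = frames.shape
    first_values = []
    for w in range(d - q):
        total = 0.0
        for y in range(h):
            for x in range(w_dim):
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w_dim:
                            total += frames[w, y, x] * frames[w + q, ny, nx]
        first_values.append(total)
    return float(np.mean(first_values))
```

A window of identical frames gives a large positive coefficient, while frames whose polarities flip between moments give a negative one, matching the intuition that persistent activity is temporally correlated.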
In a possible implementation, the first signal value of any pixel included in any one of the multiple first target dynamic vision sensor signals is obtained from the second signal value of that pixel, where: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; if the second signal value of the pixel is less than 0, the first signal value of the pixel is -1. The second signal value of the pixel is any one of m third signal values of the pixel, m being a positive integer, and the m third signal values are obtained from multiple third dynamic vision sensor signals within the preset time period, where: the g-th of the m third signal values is the sum of the (g-1)-th third signal value and the accumulated sum of the fourth signal values of the pixel within the g-th second preset period, 1≤g≤m, g being a positive integer; the multiple third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period includes the g-th second preset period; when g equals 1, the first third signal value is the accumulated sum of the fourth signal values of the pixel within the first second preset period.
In this implementation, multiple dynamic vision sensor signals within a certain duration are compressed into one dynamic vision sensor signal to increase the data density. For example, with the size of the second preset period as the time window size, the multiple third dynamic vision sensor signals within the preset time period are compressed into frames, yielding the multiple first target dynamic vision sensor signals. The first signal value of any pixel in any first target dynamic vision sensor signal is obtained by resetting the accumulated fourth signal values of that pixel across the multiple third dynamic vision sensor signals, according to the rule: if the accumulated value is greater than 0, the first signal value is set to 1; if the accumulated value is less than 0, the first signal value is set to -1. Moreover, since the first target dynamic vision sensor signal has a higher data density than the third dynamic vision sensor signal, training the SNN model with the first target dynamic vision sensor signal improves training efficiency.
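A minimal sketch of the per-period compression step. The frame shapes are hypothetical; mapping an accumulated value of exactly 0 to 0 is an assumption the text does not specify, and the cross-period cumulative sum of third signal values is omitted for brevity:

```python
import numpy as np

def compress_frames(raw_frames):
    """Compress the DVS frames of one 'second preset period' into a
    single frame: accumulate each pixel's signal values over the period,
    then keep only the sign of the accumulated value (+1 if positive,
    -1 if negative; 0 if nothing accumulated -- an assumption here).
    """
    acc = np.sum(np.asarray(raw_frames, dtype=float), axis=0)
    return np.sign(acc).astype(int)

# Example: three sparse raw frames for a 2x2 sensor patch.
raw = [np.array([[1, 0], [0, -1]]),
       np.array([[1, 0], [0, 0]]),
       np.array([[0, 0], [0, -1]])]
compressed = compress_frames(raw)   # [[1, 0], [0, -1]]
```

Keeping only the sign preserves the polarity semantics of DVS events while raising the per-frame event density seen by the network.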
In a possible implementation, the SNN model includes N convolutional layers and N deconvolutional layers, where: the output of the j-th convolutional layer among the N convolutional layers is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolutional layer among the N deconvolutional layers is the input of the (j+1)-th deconvolutional layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolutional layer; and the output of the N-th convolutional layer is the input of the first deconvolutional layer, 1≤j≤N, N and j being positive integers.
In this implementation, the SNN model includes N symmetric convolutional layers and N deconvolutional layers, with a skip connection between each convolutional layer and its symmetric deconvolutional layer. The SNN model thus extracts and reconstructs features of the dynamic vision sensor signal through deconvolution and skip connections, which helps ensure that the extracted features are complete and the reconstructed features are faithful.
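The wiring of the claimed encoder-decoder can be sketched with stand-in callables (1-based indices as in the text). Merging a skip input by addition is an assumption here; the text does not specify how the two inputs of a deconvolutional layer are combined, and concatenation would be equally plausible:

```python
def snn_forward(x, convs, deconvs, combine=lambda a, b: a + b):
    """Data flow of the N-conv / N-deconv model with skip connections:
    conv j -> conv j+1; conv N -> deconv 1; conv j -> deconv N-j (skip);
    deconv k -> deconv k+1. In the real model the layers would be
    spiking convolution / deconvolution layers; here they are plain
    callables so only the wiring is shown.
    """
    n = len(convs)
    conv_out = {}
    for j in range(1, n + 1):
        x = convs[j - 1](x)
        conv_out[j] = x          # output of conv j, kept for skips
    x = conv_out[n]              # output of conv N enters deconv 1
    for k in range(1, n + 1):
        j = n - k                # conv whose skip feeds deconv k
        if j >= 1:
            x = combine(x, conv_out[j])
        x = deconvs[k - 1](x)
    return x

# Toy usage: two "conv" layers that add 1, two "deconv" layers that
# double their input, applied to the scalar input 0.
convs = [lambda v: v + 1, lambda v: v + 1]
deconvs = [lambda v: v * 2, lambda v: v * 2]
out = snn_forward(0, convs, deconvs)
```

Each deconvolutional layer thus sees both the coarse features from the previous decoder stage and the fine features skipped over from the symmetric encoder stage, which is what preserves detail during reconstruction.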
According to a second aspect, this application relates to a denoising device; for the beneficial effects, reference may be made to the description of the first aspect, which is not repeated here. The denoising device has the functions of implementing the behavior in the method examples of the first aspect. The functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions. In a possible implementation, the denoising device includes: an acquiring unit, configured to acquire a first dynamic vision sensor signal; and a processing unit, configured to denoise the first dynamic vision sensor signal with a spiking neural network model to obtain a second dynamic vision sensor signal, where a post-synaptic potential kernel function of the spiking neurons of the spiking neural network model includes a target parameter, and the target parameter is determined according to autocorrelation coefficients of dynamic vision sensor signals.
In a possible implementation, the autocorrelation coefficients of the dynamic vision sensor signal include multiple autocorrelation coefficients obtained from multiple first target dynamic vision sensor signals within a preset time period; the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting the distribution of the multiple autocorrelation coefficients over time.
In a possible implementation, any one of the multiple autocorrelation coefficients is the average of first values corresponding to D second target dynamic vision sensor signals, D being a positive integer. The D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals, among the multiple first target dynamic vision sensor signals, that belong to the same first preset period, and the preset time period includes multiple first preset periods. The first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to multiple pixels. The second value corresponding to any one of the multiple pixels is obtained from the first signal value of that pixel and the target signal value of a first target pixel, where the first target pixel is a pixel whose proximity to that pixel is not greater than a preset proximity threshold. The second target dynamic vision sensor signal includes the first signal value of that pixel and is the w-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, w being a positive integer. The target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, and the third target dynamic vision sensor signal is the (w+q)-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, q being a positive integer.
In a possible implementation, the first target pixel includes a plurality of second target pixels, and the second value corresponding to any pixel is obtained by accumulating third values corresponding to the plurality of second target pixels; the third value corresponding to any one of the plurality of second target pixels is the product of the first signal value of that pixel and the target signal value of that second target pixel.
In a possible implementation, the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained from the second signal value of that pixel, where: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; if the second signal value of the pixel is less than 0, the first signal value of the pixel is -1. The second signal value of the pixel is any one of m third signal values of the pixel, where m is a positive integer, and the m third signal values are obtained from a plurality of third dynamic vision sensor signals within the preset time period. Specifically, the g-th third signal value among the m third signal values is the sum of the (g-1)-th third signal value and the accumulated sum of the fourth signal values of the pixel within the g-th second preset period, with 1 ≤ g ≤ m and g a positive integer; the plurality of third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period includes the g-th second preset period. When g equals 1, the first of the third signal values is the accumulated sum of the fourth signal values of the pixel within the first of the second preset periods.
In a possible implementation, the spiking neural network model includes N convolutional layers and N deconvolutional layers, where: the output of the j-th convolutional layer among the N convolutional layers is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolutional layer among the N deconvolutional layers is the input of the (j+1)-th deconvolutional layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolutional layer; and the output of the N-th convolutional layer is the input of the first deconvolutional layer, with 1 ≤ j ≤ N, and N and j positive integers.
According to a third aspect, the present application relates to an electronic device, including: one or more processors; and a computer-readable storage medium coupled to the processors and storing a program which, when executed by the processors, causes the electronic device to perform the method in any possible embodiment of the first aspect.
According to a fourth aspect, the present application relates to a computer-readable storage medium including program code which, when executed by a computer device, performs the method in any possible embodiment of the first aspect.
According to a fifth aspect, the present application relates to a chip, including a processor configured to call and run a computer program from a memory, so that a device equipped with the chip performs the method in any possible embodiment of the first aspect.
According to a sixth aspect, the present application relates to a computer program product including program code which, when run, performs the method in any possible embodiment of the first aspect.
Description of Drawings
The accompanying drawings used in the embodiments of the present application are introduced below.
Fig. 1 is a schematic structural diagram of a neural network provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a spiking neural network provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of a denoising method provided by an embodiment of the present application;
Fig. 4 is a temporal distribution curve of the autocorrelation coefficient of a dynamic vision sensor signal provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of post-synaptic membrane voltage kernel functions corresponding to different time parameters, provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of the training process of a spiking neural network model provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of the influence of different target parameters on the training of the spiking neural network model provided by an embodiment of the present application;
Fig. 8 is a comparison of the denoising effect of the spiking neural network model provided by an embodiment of the present application with that of a three-dimensional convolutional neural network model;
Fig. 9 is a comparison of the deblurred results of the signal denoised by the spiking neural network model provided by an embodiment of the present application and the signal denoised by a three-dimensional convolutional neural network model;
Fig. 10 is a schematic structural diagram of a denoising device provided by an embodiment of the present application;
Fig. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description of Embodiments
First, the related technologies involved in the embodiments of the present application are introduced, so that those skilled in the art can understand the present application.
(1) Dynamic vision sensors and noise
A dynamic vision sensor, also known as an event camera or neuromorphic camera, is an imaging sensor that responds to local brightness changes. Unlike a traditional camera, a dynamic vision sensor does not use a shutter to capture images; instead, each pixel operates independently and asynchronously, reporting when its brightness changes and remaining silent otherwise. Dynamic vision sensors offer microsecond-level temporal resolution, a dynamic range of 120 dB, and less under-/over-exposure and motion blur than frame cameras. Thanks to asynchronous triggering, high temporal resolution, high dynamic range, low latency, low bandwidth, and low power consumption, dynamic vision sensors can be mounted on mobile platforms (such as mobile phones, drones, and vehicles) for visual tasks such as object detection, tracking, recognition, and depth estimation.
Unlike traditional technical solutions, which are based on "frames" captured at a fixed frequency and read out all the pixel information in each frame in sequence, a dynamic vision sensor does not need to read all the pixels in the picture; it only needs to obtain the addresses and information of the pixels whose light intensity has changed. Specifically, when the dynamic vision sensor detects that the light intensity change of a pixel is greater than or equal to a preset threshold, it emits an event signal for that pixel. If the change is positive, that is, the pixel jumps from low brightness to high brightness, a "+1" event signal is emitted and labeled a positive event; if the change is negative, that is, the pixel jumps from high brightness to low brightness, a "-1" event signal is emitted and labeled a negative event; if the light intensity change is smaller than the preset threshold, no event signal is emitted and the pixel is labeled as having no event. The event labels produced by the dynamic vision sensor for the pixels constitute the event stream information. The light intensity change information collected by the dynamic vision sensor can take the form (X, Y, P, T), where "X, Y" is the event address, "P" is the event output, and "T" is the time at which the event was generated. An event address matching a pixel in the two-dimensional image associated with the dynamic vision sensor means that the event address corresponds to a pixel position in the reference color image: "X, Y" can be the row and column position in the reference color image, "P" is the specific value of the real-time light intensity change, and "T" is the generation time of that change.
A dynamic vision sensor based on address-event representation (AER) imitates the working mechanism of biological vision, whereas the traditional visual image acquisition method is based on "frames" captured at a fixed frequency and suffers from defects such as high redundancy, high latency, low dynamic range, and large data volume. In a dynamic vision sensor the pixels work asynchronously, outputting only the addresses and information of the pixels whose light intensity has changed rather than passively reading out every pixel in a "frame" in turn; this eliminates redundant data at the source and provides real-time dynamic response to scene changes, a super-sparse image representation, and asynchronous event output. Because of the high sensitivity of the dynamic vision sensor, the output signal is often accompanied by noise, including background noise and device thermal noise. At the same time, the wide variation in data density and the high temporal resolution pose challenges to traditional image-based denoising algorithms.
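The per-pixel thresholding described above can be sketched as follows. This is an illustrative model of event generation, not the sensor's actual circuit: the event fields follow the (X, Y, P, T) form from the text, while the threshold value and the intensity representation are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    x: int      # column of the pixel (event address, "X")
    y: int      # row of the pixel (event address, "Y")
    p: int      # polarity ("P"): +1 positive event, -1 negative event
    t: float    # timestamp of the event ("T")

def emit_events(prev, curr, t, threshold=0.2):
    """Per-pixel event generation: compare the intensity of each pixel at two
    sampling instants and emit an event only where the change reaches the
    preset threshold. Pixels whose change stays below the threshold emit
    nothing (no event)."""
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            delta = b - a
            if delta >= threshold:
                events.append(Event(x, y, +1, t))   # low -> high brightness
            elif delta <= -threshold:
                events.append(Event(x, y, -1, t))   # high -> low brightness
    return events
```

For example, with `threshold=0.2`, a pixel rising by 0.3 emits a positive event while a neighbor falling by only 0.1 stays silent, which is the sparsity property the text describes.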
(2) Spiking neural networks
A neural network is a computing system that imitates the structure of a biological brain to process data. A biological brain is composed of a large number of neurons combined through different connection patterns, with each neuron connected to the next through synaptic structures that transmit information. Neural networks have powerful nonlinear, adaptive, and fault-tolerant information processing capabilities.
Accordingly, the structure of a neural network is shown in Fig. 1: each node simulates a neuron and performs a specific operation, such as an activation function, while the connections between nodes simulate synapses, with the synaptic weight representing the connection strength between two neurons.
In current artificial neural networks, information is transmitted as analog values: each neuron accumulates the values of its predecessor neurons through multiply-accumulate operations and passes the result through an activation function to its successor neurons. In the more brain-like spiking neural network, information is transmitted as spike sequences: each neuron regulates its membrane voltage by accumulating the spike sequences of its predecessor neurons, and when the membrane voltage reaches a certain threshold, the neuron fires a new spike and transmits it to its successor neurons. In this way, the transmission, processing, and nonlinear transformation of information are realized. Many different types of neurons can be used in a spiking neural network, such as the integrate-and-fire (IF) model, the leaky integrate-and-fire (LIF) model, the spike response model (SRM), and variable-threshold neurons.
Some important terms for spiking neural networks are as follows:
A spiking neuron is the basic unit of a spiking neural network and integrates information by receiving spike inputs. A spike input increases the neuron's membrane voltage; when the membrane voltage rises above a certain threshold voltage, the neuron fires a spike and transmits it to other neurons.
A synapse is the carrier of spike transmission; spiking neurons are connected to one another through synapses.
The post-synaptic membrane voltage, also called the post-synaptic potential, is the voltage change produced on the membrane of the post-synaptic neuron by a spike fired by the pre-synaptic neuron.
Compared with artificial neural networks, spiking neural networks, inspired by the brain's neural networks, feature low energy consumption, asynchronous computation, and temporal dynamics, making them an ideal technical approach for processing the highly sparse streaming data of dynamic vision sensors.
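The accumulate-and-fire behavior described in the terms above can be sketched with a minimal leaky integrate-and-fire loop. The leak factor, threshold, and reset-to-rest rule are illustrative assumptions, not parameters from the present application:

```python
def run_lif(inputs, threshold=1.0, leak=0.9, v_rest=0.0):
    """Minimal leaky integrate-and-fire neuron: at each step the membrane
    voltage leaks toward the resting voltage, accumulates the input, and a
    spike (1) is emitted whenever the voltage reaches the threshold, after
    which the voltage is reset to rest."""
    v = v_rest
    spikes = []
    for i in inputs:
        v = v_rest + leak * (v - v_rest) + i   # leak, then integrate input
        if v >= threshold:
            spikes.append(1)
            v = v_rest                         # reset after firing
        else:
            spikes.append(0)
    return spikes
```

With inputs [0.6, 0.6, 0.0, 1.2] the neuron stays silent on the first sub-threshold input, fires once the accumulated voltage crosses the threshold, and fires again on the later strong input, illustrating the membrane-voltage integration described above.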
The technical solution provided by the present application uses a spiking neural network model to denoise dynamic vision sensor signals, and specifically includes: the architecture and training design of the spiking neural network model, a technical solution for increasing the data density of the dynamic vision sensor signal, a technical solution for adjusting the parameters of the spiking neurons based on characteristics of the dynamic vision sensor signal (for example, temporal correlation), and so on.
The technical solutions provided by the present application are described in detail below with reference to specific implementations.
Please refer to Fig. 2, which is a schematic structural diagram of a spiking neural network provided by an embodiment of the present application. The spiking neural network model includes N convolutional layers and N deconvolutional layers, where: the output of the j-th convolutional layer among the N convolutional layers is the input of the (j+1)-th convolutional layer; the output of the j-th deconvolutional layer among the N deconvolutional layers is the input of the (j+1)-th deconvolutional layer; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolutional layer; and the output of the N-th convolutional layer is the input of the first deconvolutional layer, with 1 ≤ j ≤ N, and N and j positive integers.
As can be seen from the above, skip connections are added between the convolutional layers and the deconvolutional layers of the spiking neural network. The computing units of the network are spiking neurons, which may be of various types, such as the integrate-and-fire model, the leaky integrate-and-fire model, the spike response model, and variable-threshold neurons.
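The layer wiring described above can be made concrete by enumerating the connections of the symmetric encoder-decoder; the layer names used here are illustrative:

```python
def build_connections(n):
    """Connectivity of the symmetric network described above:
    conv j -> conv j+1 and deconv j -> deconv j+1 (the two chains),
    conv j -> deconv n-j (the skip connections, 1 <= j < n), and
    conv n -> deconv 1 (the bottleneck link).
    Layers are named 'conv1'..'convN' and 'deconv1'..'deconvN'."""
    edges = []
    for j in range(1, n):
        edges.append((f"conv{j}", f"conv{j + 1}"))        # encoder chain
        edges.append((f"deconv{j}", f"deconv{j + 1}"))    # decoder chain
        edges.append((f"conv{j}", f"deconv{n - j}"))      # skip connection
    edges.append((f"conv{n}", "deconv1"))                 # bottleneck link
    return edges
```

For N = 3 this yields the skip connections conv1 → deconv2 and conv2 → deconv1, plus the bottleneck link conv3 → deconv1, matching the index rule "output of the j-th convolutional layer is the input of the (N-j)-th deconvolutional layer".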
In a possible implementation, the neurons of the spiking neural network adopt the spike response model, which is defined by the following integral equation:

u(t) = \sum_{f} \eta\left(t - t^{(f)}\right) + \sum_{s} \kappa_{\mathrm{ext}}(t - s) + u_{\mathrm{rest}}  (1)

In formula (1), t denotes time and u(t) denotes the membrane voltage at time t; η denotes the voltage change of the neuron after each spike it fires, f indexes the neuron's spike firing times, and t^{(f)} denotes a spike firing time; κ_ext denotes the post-synaptic membrane voltage kernel function, which is an exponential function, and s denotes a spike input time; u_rest denotes the resting voltage. The term \sum_{s} \kappa_{\mathrm{ext}}(t - s) is the voltage produced by the accumulated effect of the spikes of the preceding neurons. When the voltage u(t) exceeds a certain threshold θ, the neuron fires a spike.
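The voltage update of formula (1) can be sketched as follows. The concrete kernel shapes and constants are illustrative assumptions (the text only states that κ_ext is exponential); they are not values from the present application:

```python
import math

def srm_voltage(t, firing_times, input_times, u_rest=0.0,
                eta_amp=-0.5, tau_eta=4.0, beta=1.0, tau_s=5.0):
    """Membrane voltage of a spike response model neuron at time t:
    the after-spike kernel eta summed over the neuron's own past firing
    times, plus the post-synaptic kernel kappa_ext summed over input spike
    times, plus the resting voltage (formula (1))."""
    def eta(s):          # after-spike kernel: illustrative negative exponential
        return eta_amp * math.exp(-s / tau_eta) if s >= 0 else 0.0

    def kappa_ext(s):    # exponential post-synaptic kernel (causal)
        return beta * math.exp(-s / tau_s) if s >= 0 else 0.0

    u = u_rest
    u += sum(eta(t - tf) for tf in firing_times)     # refractory effect
    u += sum(kappa_ext(t - ts) for ts in input_times)  # presynaptic spikes
    return u
```

An input spike raises the voltage through κ_ext, while a recent firing of the neuron itself lowers it through η; the neuron would fire whenever u(t) exceeds the threshold θ.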
Because the output data of a dynamic vision sensor is highly sparse, a conventional loss function based on the Euclidean distance or the L2 norm tends to produce a "zero output". To address this problem:

In a possible implementation, a loss function based on the van Rossum distance is used in the spiking neural network, so as to better reflect the error between spike sequences and avoid insufficient network output. The van Rossum distance is defined as follows:

Given an output spike sequence:

u = \{u_1, u_2, \ldots, u_n\}  (2)

In formula (2), u_1, u_2, …, u_n all denote spike times.

If the target spike sequence is:

v = \{v_1, v_2, \ldots, v_n\}  (3)

In formula (3), v_1, v_2, …, v_n all denote spike times.

Then the van Rossum distance is defined as:

D(u, v) = \frac{1}{\tau} \int_{0}^{\infty} \left[ f(t; u) - f(t; v) \right]^2 dt  (4)

In formula (4), τ denotes the time constant of the kernel function h(t) and t denotes time; f(t; u) and f(t; v) denote the convolutions of the spike sequences with a specific kernel function, as shown in formulas (5) and (6):

f(t; u) = \sum_{i=1}^{n} h(t - u_i)  (5)

f(t; v) = \sum_{i=1}^{n} h(t - v_i)  (6)

where the kernel function h(t) is defined as:

h(t) = e^{-t/\tau} \text{ for } t \geq 0, \quad h(t) = 0 \text{ for } t < 0  (7)
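The van Rossum distance of formulas (4)-(7) can be approximated numerically by discretizing the integral; the integration horizon and step size below are illustrative choices:

```python
import math

def van_rossum_distance(u, v, tau=1.0, t_max=20.0, dt=0.01):
    """Van Rossum distance between two spike trains, computed by discretizing
    the integral in formula (4): each spike train is convolved with the causal
    exponential kernel h(t) = exp(-t/tau) for t >= 0 (formulas (5)-(7)), and
    the squared difference of the two filtered traces is integrated over
    [0, t_max]."""
    def filtered(spikes, t):
        # f(t; spikes) = sum over spikes of h(t - spike_time)
        return sum(math.exp(-(t - s) / tau) for s in spikes if t >= s)

    acc = 0.0
    for i in range(int(t_max / dt)):
        t = i * dt
        d = filtered(u, t) - filtered(v, t)
        acc += d * d * dt
    return acc / tau
```

Identical spike trains give distance 0, while shifting a spike in time increases the distance smoothly, which is why this metric avoids the "zero output" failure mode of plain L2 losses on sparse spike data.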
The spiking neural network model shown in Fig. 2 includes N symmetric convolutional layers and N deconvolutional layers, with a skip connection between each convolutional layer and its symmetric deconvolutional layer. The model thus extracts and reconstructs features of the dynamic vision sensor signal through deconvolution and skip connections, which helps ensure that the extracted features are complete and the reconstructed features are faithful.
Please refer to Fig. 3, which is a flowchart of a process 300 of a denoising method according to an embodiment of the present application. Process 300 is described as a series of steps or operations; it should be understood that process 300 may be performed in various orders and/or concurrently, and is not limited to the execution order shown in Fig. 3. Process 300 may be performed by an electronic device, including a server or a terminal, and includes but is not limited to the following steps or operations:
Step 301: Acquire a first dynamic vision sensor signal.
Step 302: Denoise the first dynamic vision sensor signal with a spiking neural network model to obtain a second dynamic vision sensor signal, where the post-synaptic membrane voltage kernel function of the spiking neurons of the spiking neural network model includes a target parameter, and the target parameter is determined according to the autocorrelation coefficient of the dynamic vision sensor signal.
The structure of the spiking neural network model may be as shown in Fig. 2.
The first dynamic vision sensor signal is the raw signal collected by the dynamic vision sensor, and the second dynamic vision sensor signal is the signal after denoising by the spiking neural network model provided in the present application.
The autocorrelation coefficient of the dynamic vision sensor signal may be the temporal autocorrelation coefficient <x(s)x(s-t)>, where x denotes the signal, t the time interval, s the time point, and <> the average over the signal at different time points.
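For a discretely sampled signal, this autocorrelation coefficient can be sketched as:

```python
def autocorrelation(x, lag):
    """Autocorrelation coefficient <x(s) * x(s - t)> of a discretely sampled
    signal: the product of the signal with a copy of itself shifted by `lag`,
    averaged over all time points s where both samples exist."""
    pairs = [x[s] * x[s - lag] for s in range(lag, len(x))]
    return sum(pairs) / len(pairs)
```

For a perfectly alternating signal, the coefficient is -1 at lag 1 and +1 at lag 2; for real sensor data it decays with the lag, which is the decay the exponential fit below formula (8) captures.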
The target parameter may be preset before training of the spiking neural network model according to the autocorrelation coefficient of the dynamic vision sensor signal, or it may be adjusted in real time during training according to the autocorrelation coefficient. Moreover, during training of the spiking neural network model, the target parameter may be set as learnable or as fixed.
In the present application, a spiking neural network model is used to denoise the dynamic vision sensor signal. The spiking neural network model includes spiking neurons, and the target parameter in the post-synaptic membrane voltage kernel function of the spiking neurons is determined according to the autocorrelation coefficient of the dynamic vision sensor signal; the kernel function therefore enables the model to learn the temporal correlation of the dynamic vision sensor signal and to remove noise events with weak temporal correlation, improving the denoising effect of the model on dynamic vision sensor signals. Furthermore, owing to the inherent temporal dynamics of the spiking neural network model, it denoises the highly sparse dynamic vision sensor signal in a streaming manner: the input and output of the model have the same number of frames, the data passes through the model once, and no sliding-window traversal by three-dimensional convolution along the time dimension is required. Compared with existing denoising methods based on artificial neural networks, this greatly reduces running time, network size, and the amount of computation.
In a possible implementation, the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients obtained from a plurality of first target dynamic vision sensor signals within a preset time period; the target parameter is obtained from a preset autocorrelation coefficient threshold and a preset function, and the preset function is obtained by fitting the temporal distribution of the plurality of autocorrelation coefficients.
In this implementation, the target parameter of the post-synaptic membrane voltage kernel function can be obtained from the autocorrelation coefficients of the dynamic vision sensor signal over a period of time. For example, a plurality of autocorrelation coefficients are obtained from a plurality of first target dynamic vision sensor signals within a preset time period; the autocorrelation coefficients correspond one-to-one to a plurality of time instants, that is, they are distributed over those instants, as shown in Fig. 4. A function is then fitted to the temporal distribution of the autocorrelation coefficients (that is, to the signal correlation curve of the dynamic vision sensor) to obtain the preset function, and the target parameter is obtained from the preset autocorrelation coefficient threshold and the fitted preset function. The details are as follows:

First, the correlation curve of the dynamic vision sensor signal is fitted, that is, the preset function is obtained by fitting the distribution shape of the autocorrelation coefficients over time. For example, the fitted preset function may take the form:

y = b e^{-ax} + c  (8)

where the values of a, b, and c in formula (8) are determined by the actual distribution shape of the autocorrelation coefficients over time.
Then, the inverse of the preset function is solved based on the preset autocorrelation coefficient threshold. For example, the inverse of the preset function may be:

x = -\frac{1}{a} \ln\left(\frac{y - c}{b}\right)  (9)

The preset autocorrelation coefficient threshold is the value of y in formula (9); the value of x computed from formula (9) at that threshold is the value of the target parameter of the post-synaptic membrane voltage kernel function.
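The inversion of the fitted curve can be sketched as follows; the fitted coefficients a, b, c used in the example are illustrative, not values from the text:

```python
import math

def invert_exponential_fit(y, a, b, c):
    """Solve y = b * exp(-a * x) + c for x (formula (9)): given the fitted
    coefficients a, b, c and a preset autocorrelation coefficient threshold y,
    return the time value used as the target parameter of the post-synaptic
    membrane voltage kernel function."""
    return -math.log((y - c) / b) / a

# Example with illustrative fitted coefficients (not values from the text):
a, b, c = 0.2, 1.0, 0.1
x = invert_exponential_fit(0.5, a, b, c)
# round-trip check: plugging x back into formula (8) reproduces the threshold
assert abs(b * math.exp(-a * x) + c - 0.5) < 1e-9
```

In practice a, b, c would first be estimated from the measured autocorrelation curve (for example with a least-squares fit) before the inversion is applied at the chosen threshold.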
In one example, there are a plurality of preset autocorrelation coefficient thresholds, forming a preset autocorrelation coefficient threshold interval; solving the inverse of the preset function over this interval yields a value interval for the target parameter of the post-synaptic membrane voltage kernel function, from which the target parameter is selected. The target parameter of the post-synaptic membrane voltage kernel function is selected based on the following principles:

(1) The time span of the post-synaptic membrane voltage kernel function should match the correlation of the dynamic vision sensor signal, that is, the time scale of the kernel function and the temporal correlation range of the signal should coincide as much as possible.

(2) The time span of the post-synaptic membrane voltage kernel function should not be close to 0, to avoid weakening the temporal dynamics of the neurons.

For example, with a preset autocorrelation coefficient threshold interval of [0.2, 0.5], the value interval of the target parameter of the post-synaptic membrane voltage kernel function is [5, 13].
Assuming that the neurons of the spiking neural network model are spike response model neurons, the post-synaptic membrane voltage kernel function can be a double exponential function, expressed as:

\kappa_{\mathrm{ext}}(t) = \beta \left( e^{-t/\tau_s} - e^{-t/\tau_m} \right)  (10)

In formula (10), κ_ext denotes the post-synaptic membrane voltage kernel function, also referred to as the amplitude; β is the amplitude adjustment coefficient; τ_s is the first time parameter, that is, the target parameter described in the present application; and τ_m is the second time parameter. For example, with a value interval of [5, 13] for τ_s, one may set τ_s = 5, while τ_m is a fixed value, for example τ_m = 2. Fig. 5 shows the post-synaptic membrane voltage kernel functions corresponding to several different time parameters.
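Formula (10) can be sketched directly; the default values below follow the example in the text (τ_s = 5, τ_m = 2), while β = 1 is an illustrative choice:

```python
import math

def psp_kernel(t, beta=1.0, tau_s=5.0, tau_m=2.0):
    """Double-exponential post-synaptic membrane voltage kernel of
    formula (10): kappa_ext(t) = beta * (exp(-t/tau_s) - exp(-t/tau_m))
    for t >= 0, and 0 for t < 0. With tau_s > tau_m the kernel rises
    quickly, peaks, and then decays on the slower tau_s time scale, so
    tau_s sets the temporal span matched to the signal's autocorrelation."""
    if t < 0:
        return 0.0
    return beta * (math.exp(-t / tau_s) - math.exp(-t / tau_m))
```

Increasing τ_s toward the upper end of the [5, 13] interval stretches the kernel's tail, so the neuron integrates input spikes over a longer window, matching a signal with longer-range temporal correlation.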
In this implementation, a preset function is obtained by function fitting based on the distribution in time of multiple autocorrelation coefficients derived from multiple first target dynamic vision sensor signals within a preset time period, so the preset function characterizes the relationship between the autocorrelation coefficient of the dynamic vision sensor signal and time. The inverse of this preset function is then computed; the inverse function characterizes the relationship between time and the autocorrelation coefficient of the dynamic vision sensor signal. The time value obtained by evaluating this inverse function at the preset autocorrelation coefficient threshold is then taken as the value of the target parameter in the postsynaptic membrane voltage kernel function of the spiking neurons. In this way, the target parameter in the postsynaptic membrane voltage kernel function is adjusted so that the spiking neural network model learns the temporal correlation of the dynamic vision sensor signal.
It should be noted that the present application can compute the autocorrelation coefficient of the dynamic vision sensor signal over a period of time online, using a time sliding-window method. The specific operation is as follows:
Let C_q denote the autocorrelation coefficient between dynamic vision sensor signals separated by an interval q. C_q is calculated as follows:
C_q = (1/D) · Σ_{k=1}^{D} Σ_{x=1}^{W} Σ_{y=1}^{H} p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ)  (11)
In formula (11), D denotes the number of dynamic vision sensor signals within the sliding window; W and H are the pixel dimensions along the x-axis and y-axis respectively, that is, the width and height of the sliding window; S_{k,x,y} denotes the signal value of the pixel with coordinates (x, y) in the k-th dynamic vision sensor signal in the sliding window, with S_{k,x,y} = 1 or S_{k,x,y} = −1; S_{k+q,x′,y′} denotes the signal value of the pixel with coordinates (x′, y′) in the dynamic vision sensor signal separated from the k-th signal by an interval q, with S_{k+q,x′,y′} = 1 or S_{k+q,x′,y′} = −1. The pixel with coordinates (x′, y′) is a neighboring pixel of the pixel with coordinates (x, y), and the value ranges of x′ and y′ are as follows:
x − Δ ≤ x′ ≤ x + Δ  (12)
y − Δ ≤ y′ ≤ y + Δ  (13)
In formula (11), p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ) denotes a computation over S_{k,x,y} and S_{k+q,x′,y′} with proximity Δ, where Δ denotes the proximity. p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ) is calculated as follows:
p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ) = Σ_{x′=x−Δ}^{x+Δ} Σ_{y′=y−Δ}^{y+Δ} S_{k,x,y} · S_{k+q,x′,y′}  (14)
Assuming Δ = 1, a series of values of C_q can be calculated according to formulas (11) to (14); the distribution of this series of C_q values in time is shown in Figure 4.
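The sliding-window computation can be sketched as follows. The boundary clipping at the sensor edge and the normalisation by D alone are assumptions, since formulas (11) and (14) are rendered as images in this publication:

```python
def proximity(sig_k, sig_kq, x, y, delta=1):
    """p_{x,y}: product of pixel (x, y) in signal k with each neighbour (x', y')
    of proximity <= delta in signal k+q, accumulated over the neighbourhood."""
    W, H = len(sig_k), len(sig_k[0])
    total = 0
    for xp in range(max(0, x - delta), min(W, x + delta + 1)):
        for yp in range(max(0, y - delta), min(H, y + delta + 1)):
            total += sig_k[x][y] * sig_kq[xp][yp]
    return total

def autocorrelation(signals, q, delta=1):
    """C_q for a window of D signals paired with the signals q steps later;
    averaging over D is assumed, matching 'the average of the D first values'."""
    D = len(signals) - q
    W, H = len(signals[0]), len(signals[0][0])
    acc = 0
    for k in range(D):
        for x in range(W):
            for y in range(H):
                acc += proximity(signals[k], signals[k + q], x, y, delta)
    return acc / D

# Identical all-positive frames are strongly correlated at lag q = 1;
# a sign-flipped middle frame yields the negative value.
frame = [[1, 1], [1, 1]]
flipped = [[-1, -1], [-1, -1]]
c_same = autocorrelation([frame, frame, frame], q=1)
c_flip = autocorrelation([frame, flipped, frame], q=1)
```

The sign and magnitude of C_q thus reflect how well each ±1 event map agrees with its neighbourhood q steps later, which is the temporal correlation the kernel parameter is matched against.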
In a possible implementation, any autocorrelation coefficient among the multiple autocorrelation coefficients is the average of first values corresponding to D second target dynamic vision sensor signals, D being a positive integer. The D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals, among the multiple first target dynamic vision sensor signals, that belong to the same first preset period, and the preset time period includes multiple first preset periods. The first value corresponding to any second target dynamic vision sensor signal among the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to multiple pixels. The second value corresponding to any pixel among the multiple pixels is obtained from the first signal value of that pixel and the target signal value of a first target pixel, the first target pixel being a pixel whose proximity to that pixel is not greater than a preset proximity threshold. The second target dynamic vision sensor signal includes the first signal value of that pixel and is the w-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, w being a positive integer. The target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal being the (w+q)-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, q being a positive integer.
Here, the multiple pixels are multiple pixels in the photosensitive element of the dynamic vision sensor, or multiple pixels in a two-dimensional image acquired by the photosensitive element of the dynamic vision sensor. Correspondingly, the first target pixel is a pixel in the photosensitive element of the dynamic vision sensor whose proximity to the given pixel is not greater than the preset proximity threshold, or a pixel in such a two-dimensional image whose proximity to the given pixel is not greater than the preset proximity threshold.
Specifically, there are multiple first target dynamic vision sensor signals within the preset time period, and the autocorrelation coefficient of the dynamic vision sensor signals within the preset time period is calculated by the time sliding-window method; the time window size of the sliding window may be the size of the first preset period. D denotes the number of first target dynamic vision sensor signals within the same first preset period, that is, there are D second target dynamic vision sensor signals within the same first preset period. W and H are respectively the width and height of the photosensitive element of the dynamic vision sensor. (x, y) denotes the coordinates of any pixel in the photosensitive element of the dynamic vision sensor, or of any pixel in the two-dimensional image. (x′, y′) denotes the coordinates of a pixel in the photosensitive element, or in the two-dimensional image, whose proximity to the pixel (x, y) is not greater than the preset proximity threshold, that is, the coordinates of the first target pixel. S_{k,x,y} denotes the first signal value of the pixel (x, y) in the w-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals; S_{k+q,x′,y′} denotes the first signal value of the first target pixel (x′, y′) in the (w+q)-th first target dynamic vision sensor signal, that is, in the third target dynamic vision sensor signal. The first value is Σ_{x=1}^{W} Σ_{y=1}^{H} p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ), and the second value is p_{x,y}(S_{k,x,y}, S_{k+q,x′,y′}, Δ).
In this implementation, the multiple first target dynamic vision sensor signals are dynamic vision sensor signals at multiple moments within the preset time period, and the autocorrelation coefficients of the dynamic vision sensor signals within the preset time period can be computed by the time sliding-window method. For example, a window of size equal to the first preset period is slid over the preset time period, with each slide advancing by one first preset period. During one slide, the time window frames D of the multiple first target dynamic vision sensor signals, denoted as D second target dynamic vision sensor signals; based on these D second target dynamic vision sensor signals, the autocorrelation coefficient of the dynamic vision sensor signal corresponding to this time window is calculated. Since each of the multiple first target dynamic vision sensor signals includes the first signal values of multiple pixels, each of the D second target dynamic vision sensor signals within each time window (or first preset period) also includes the first signal values of multiple pixels, so the autocorrelation coefficient corresponding to each time window (or first preset period) can be calculated from the first signal values of neighboring pixels at different moments. In this way, after the window of size equal to the first preset period is slid multiple times over the preset time period, the autocorrelation coefficients of the dynamic vision sensor signals corresponding to the multiple time windows can be calculated, thereby obtaining multiple autocorrelation coefficients.
In a possible implementation, the first target pixel includes multiple second target pixels, and the second value corresponding to any pixel is obtained by accumulating third values corresponding to the multiple second target pixels; the third value corresponding to any second target pixel among the multiple second target pixels is the product of the first signal value of the given pixel and the target signal value of that second target pixel.
Specifically, S_{k,x,y} denotes the first signal value of any pixel (x, y), and S_{k+q,x′,y′} denotes the target signal value of any second target pixel (x′, y′); the third value is S_{k,x,y} · S_{k+q,x′,y′}, and the second value is Σ_{x′,y′} S_{k,x,y} · S_{k+q,x′,y′}.
In this implementation, for any pixel among all the pixels in the dynamic vision sensor signal, the first signal value of that pixel at one moment is multiplied by the first signal value of each of its neighboring pixels at another moment, yielding a third value for each neighboring pixel, that is, multiple third values for that pixel. The third values of all neighboring pixels are then accumulated to obtain the second value for that pixel, and the second values of all pixels are accumulated to obtain the first value corresponding to the second target dynamic vision sensor signal at that moment. Performing the above operations on the second target dynamic vision sensor signals at all moments within one time window (or first preset period), for example D second target dynamic vision sensor signals, yields the first values corresponding to all of them; averaging all the first values obtained within this time window (or first preset period) then yields the autocorrelation coefficient of the dynamic vision sensor signal corresponding to this time window (or first preset period).
It should be noted that the minimum time resolution of the dynamic vision sensor signal is 1 μs. To increase the data density of the dynamic vision sensor signal, the present application can compress the dynamic vision sensor signals within a certain duration into one dynamic vision sensor signal, that is, frame-compress multiple dynamic vision sensor signals into a single dynamic vision sensor signal.
The following takes any pixel (x, y) as an example to describe the specific operation of dynamic vision sensor signal compression. Given a series of dynamic vision sensor signals {s_1, s_2, …, s_r}, where s_r denotes the signal polarity value of the pixel (x, y) in the dynamic vision sensor signal at time t_r, with a positive event taking the value 1 and a negative event the value −1, and given a frame-compression time window T, for example T = 500 μs, let S_g be the accumulated signal value of the pixel (x, y) after g rounds of frame compression. If gT < t_r < (g+1)T, then S_g += s_r, that is, S_g = S_g + s_r. After traversing the series of dynamic vision sensor signals, a reset is performed according to the following rule to obtain the signal reset value of the pixel (x, y): if S_g > 0, then S_g = 1; if S_g < 0, then S_g = −1.
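The compression procedure above can be sketched as follows; the (t, x, y, s) event layout and the dictionary-based frame store are illustrative choices, not part of the publication:

```python
def compress_events(events, T=500):
    """Frame-compress a stream of (t, x, y, s) events, s in {+1, -1}, into windows
    of duration T microseconds: accumulate S_g per pixel per window, then reset
    S_g to +1 or -1 by sign."""
    acc = {}  # window index g -> {(x, y): accumulated value S_g}
    for t, x, y, s in events:
        g = t // T  # the compression window containing time t
        acc.setdefault(g, {})
        acc[g][(x, y)] = acc[g].get((x, y), 0) + s
    # Reset rule: S_g > 0 -> 1, S_g < 0 -> -1 (a pixel summing to 0 emits no event)
    return {g: {p: (1 if v > 0 else -1) for p, v in px.items() if v != 0}
            for g, px in acc.items()}

# Three events at pixel (0, 0) in the first 500 us window sum to +1 -> reset to 1;
# one negative event at (1, 1) in the second window resets to -1.
events = [(10, 0, 0, 1), (120, 0, 0, 1), (130, 0, 0, -1), (600, 1, 1, -1)]
frames = compress_events(events, T=500)
```

Each compressed frame then plays the role of one first target dynamic vision sensor signal, with the reset values as the first signal values.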
In a possible implementation, the first signal value of any pixel included in any first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals is obtained from the second signal value of that pixel, where: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; if the second signal value of the pixel is less than 0, the first signal value of the pixel is −1. The second signal value of the pixel is any one of m third signal values of the pixel, m being a positive integer, and the m third signal values are obtained from multiple third dynamic vision sensor signals within the preset time period, where: the g-th third signal value among the m third signal values is the sum of the (g−1)-th third signal value and the accumulated sum of the fourth signal values of the pixel within the g-th second preset period, 1 ≤ g ≤ m, g being a positive integer; the multiple third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period includes the g-th second preset period. When g equals 1, the first third signal value is the accumulated sum of the fourth signal values of the pixel within the first second preset period.
Specifically, taking the size of the frame-compression time window as the size of the second preset period, the multiple third dynamic vision sensor signals within the preset time period are frame-compressed to obtain multiple first target dynamic vision sensor signals, where a third dynamic vision sensor signal is an original dynamic vision sensor signal; for example, r third dynamic vision sensor signals are frame-compressed into m first target dynamic vision sensor signals. The first signal value of any pixel (x, y) included in any first target dynamic vision sensor signal is a signal reset value, the second signal value or third signal value of the pixel (x, y) is a signal accumulation value, and the fourth signal value of the pixel (x, y) is a signal polarity value.
In this implementation, multiple dynamic vision sensor signals within a certain duration are compressed into one dynamic vision sensor signal to increase the data density of the dynamic vision sensor signal. For example, taking the size of the second preset period as the time window size, the multiple third dynamic vision sensor signals within the preset time period are frame-compressed to obtain multiple first target dynamic vision sensor signals. The first signal value of any pixel included in any first target dynamic vision sensor signal is obtained by resetting the accumulated value of the fourth signal values of that pixel in the multiple third dynamic vision sensor signals, according to the following rule: if the accumulated value of the fourth signal values of the pixel in the multiple third dynamic vision sensor signals is greater than 0, the first signal value is set to 1; if the accumulated value is less than 0, the first signal value is set to −1. Moreover, since the first target dynamic vision sensor signal has a higher data density than the third dynamic vision sensor signal, using the first target dynamic vision sensor signals to train the spiking neural network model can improve training efficiency.
It should be understood that the first target dynamic vision sensor signal may be an original dynamic vision sensor signal, for example a third dynamic vision sensor signal, or a signal obtained by frame-compressing original dynamic vision sensor signals, for example multiple third dynamic vision sensor signals compressed into one first target dynamic vision sensor signal. When the first target dynamic vision sensor signal is an original dynamic vision sensor signal, the first signal value of any pixel in it is a signal polarity value; when the first target dynamic vision sensor signal is a frame-compressed signal, the first signal value of any pixel in it is a signal reset value.
Referring to Figure 6, Figure 6 is a schematic diagram of the training process of a spiking neural network model provided by an embodiment of the present application. On a training data set of dynamic vision sensor signals, the technical solution of the present application enables effective training: the spiking neural network model essentially converges after a period of time; for example, after a further 50 iterations, the loss essentially converges.
Referring to Figure 7, Figure 7 is a schematic diagram of the influence of different target parameters on the training of the spiking neural network model provided by an embodiment of the present application. Different values of the target parameter τ_s of the postsynaptic membrane voltage kernel function yield different training results: as τ_s increases, the training loss first decreases and then increases, with the loss smallest when τ_s is around 10. Therefore, an appropriate target parameter τ_s can be selected through the technical solution of the present application so that the training of the spiking neural network model is optimal.
Referring to Figure 8, Figure 8 compares the denoising effect of the spiking neural network model provided by an embodiment of the present application with that of a three-dimensional convolutional neural network model. Compared with the traditional three-dimensional convolutional neural network model, the denoising effect of the spiking neural network model provided by the present application is significantly better.
Table 1 compares the spiking neural network model provided in the embodiment of the present application with the three-dimensional convolutional neural network model in terms of running time, parameter count, and computation.
Table 1: Comparison of running time, parameter count, and computation between the spiking neural network model and the three-dimensional convolutional neural network model

Network model | Running time | Parameters | Computation
Three-dimensional convolutional neural network model | 170 s | 6.54 MB | 531.5 G
Spiking neural network model | 14 s | 150 KB | 1 G–10 G
As shown in Table 1, with similar denoising effect, the spiking neural network model provided by the present application has clear advantages in parameter count, running time, and computation. The results in Table 1 are statistics from running on a TESLA V100 GPU with 1000 images of size 346×260 captured by a dynamic vision sensor.
Referring to Figure 9, Figure 9 compares the deblurring results obtained from signals denoised by the spiking neural network model provided by an embodiment of the present application with those obtained from signals denoised by the three-dimensional convolutional neural network model. When the dynamic vision sensor signal is first denoised and the denoised signal is then used for a subsequent deblurring task, the image deblurred from the signal denoised by the spiking neural network model retains more real detail and is closer to reality than the image deblurred from the signal denoised by the three-dimensional convolutional neural network model.
In summary, setting the target parameter of the postsynaptic membrane voltage kernel function through the technical solution of the present application enables the spiking neural network model to learn the temporal correlation of the dynamic vision sensor signal, so that the temporal dynamics of the spiking neurons can be used to perform streaming denoising of the dynamic vision sensor signal, greatly reducing network size, running time, and computation.
It should be noted that, for the series of steps or operations described in process 300, reference may also be made to the corresponding descriptions of the embodiments shown in Figure 1 and Figure 2.
Referring to Figure 10, Figure 10 is a schematic structural diagram of a denoising apparatus provided by an embodiment of the present application. The denoising apparatus 1000 is applied to an electronic device, which includes a server and a terminal. The denoising apparatus 1000 includes: an acquisition unit 1001, configured to acquire a first dynamic vision sensor signal; and a processing unit 1002, configured to denoise the first dynamic vision sensor signal using a spiking neural network model to obtain a second dynamic vision sensor signal, the postsynaptic membrane voltage kernel function of the spiking neurons of the spiking neural network model including a target parameter determined from the autocorrelation coefficient of the dynamic vision sensor signal.
In a possible implementation, the autocorrelation coefficient of the dynamic vision sensor signal includes multiple autocorrelation coefficients obtained from multiple first target dynamic vision sensor signals within a preset time period; the target parameter is obtained from a preset autocorrelation coefficient threshold and a preset function, the preset function being obtained by fitting the distribution of the multiple autocorrelation coefficients in time.
In a possible implementation, any autocorrelation coefficient among the multiple autocorrelation coefficients is the average of first values corresponding to D second target dynamic vision sensor signals, D being a positive integer. The D second target dynamic vision sensor signals are the D first target dynamic vision sensor signals, among the multiple first target dynamic vision sensor signals, that belong to the same first preset period, and the preset time period includes multiple first preset periods. The first value corresponding to any second target dynamic vision sensor signal among the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to multiple pixels. The second value corresponding to any pixel among the multiple pixels is obtained from the first signal value of that pixel and the target signal value of a first target pixel, the first target pixel being a pixel whose proximity to that pixel is not greater than a preset proximity threshold. The second target dynamic vision sensor signal includes the first signal value of that pixel and is the w-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, w being a positive integer. The target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal being the (w+q)-th first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals, q being a positive integer.
In a possible implementation, the first target pixel includes multiple second target pixels, and the second value corresponding to any pixel is obtained by accumulating third values corresponding to the multiple second target pixels; the third value corresponding to any second target pixel among the multiple second target pixels is the product of the first signal value of the given pixel and the target signal value of that second target pixel.
In a possible implementation, the first signal value of any pixel included in any first target dynamic vision sensor signal among the multiple first target dynamic vision sensor signals is obtained from the second signal value of that pixel, where: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; if the second signal value of the pixel is less than 0, the first signal value of the pixel is −1. The second signal value of the pixel is any one of m third signal values of the pixel, m being a positive integer, and the m third signal values are obtained from multiple third dynamic vision sensor signals within the preset time period, where: the g-th third signal value among the m third signal values is the sum of the (g−1)-th third signal value and the accumulated sum of the fourth signal values of the pixel within the g-th second preset period, 1 ≤ g ≤ m, g being a positive integer; the multiple third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period includes the g-th second preset period. When g equals 1, the first third signal value is the accumulated sum of the fourth signal values of the pixel within the first second preset period.
In a possible implementation, the spiking neural network model includes N convolutional layers and N deconvolutional layers, where: the output of the j-th convolutional layer among the N convolutional layers is the input of the (j+1)-th convolutional layer among the N convolutional layers; the output of the j-th deconvolutional layer among the N deconvolutional layers is the input of the (j+1)-th deconvolutional layer among the N deconvolutional layers; the output of the j-th convolutional layer is also the input of the (N-j)-th deconvolutional layer among the N deconvolutional layers; and the output of the N-th convolutional layer among the N convolutional layers is the input of the first deconvolutional layer among the N deconvolutional layers, with 1≤j≤N, N and j being positive integers.
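The wiring above describes an encoder–decoder with skip connections. The sketch below enumerates only the connectivity, not the spiking neuron dynamics; the layer names are illustrative, and the reading that the skip from conv j to deconv N-j applies for j up to N-1 (since deconv 0 does not exist) is an assumption.

```python
def deconv_inputs(N):
    """Hypothetical sketch of the inputs to each deconvolutional layer
    (1-based indices), per the wiring in the text: deconv 1 is fed by
    conv N; deconv k (k > 1) is fed by deconv k-1; and deconv N-j also
    receives a skip connection from conv j."""
    inputs = {}
    for k in range(1, N + 1):
        feeds = ["conv%d" % N if k == 1 else "deconv%d" % (k - 1)]
        j = N - k  # skip source: conv j feeds deconv N-j
        if j >= 1:
            feeds.append("skip:conv%d" % j)
        inputs["deconv%d" % k] = feeds
    return inputs
```

For N = 3 this yields deconv1 ← {conv3, conv2}, deconv2 ← {deconv1, conv1}, deconv3 ← {deconv2}, the familiar U-Net-style topology.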
It should be noted that the implementation of the units of the denoising apparatus 1000 described in FIG. 10 may also refer to the corresponding descriptions of the embodiments shown in FIG. 1 to FIG. 9. Likewise, for the beneficial effects brought by the denoising apparatus 1000 described in FIG. 10, reference may be made to the corresponding descriptions of the embodiments shown in FIG. 1 to FIG. 9, which are not repeated here.
Referring to FIG. 11, FIG. 11 is a schematic structural diagram of an electronic device 1110 provided by an embodiment of the present application. The electronic device 1110 includes a processor 1111, a memory 1112, and a communication interface 1113, which are connected to one another through a bus 1114.
The memory 1112 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and stores related computer programs and data. The communication interface 1113 is configured to receive and send data.
The processor 1111 may be one or more central processing units (CPUs). When the processor 1111 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 1111 in the electronic device 1110 is configured to read the computer program code stored in the memory 1112 and execute the method of any one of the embodiments shown in FIG. 3.
It should be noted that the electronic device may be a server or a terminal. The implementation of the operations of the electronic device 1110 described in FIG. 11 may also refer to the corresponding descriptions of the embodiments shown in FIG. 1 to FIG. 9. Likewise, for the beneficial effects brought by the electronic device 1110 described in FIG. 11, reference may be made to the corresponding descriptions of the embodiments shown in FIG. 1 to FIG. 9, which are not repeated here.
An embodiment of the present application further provides a chip. The chip includes at least one processor, a memory, and an interface circuit. The memory, the interface circuit, and the at least one processor are interconnected through lines, and a computer program is stored in the at least one memory. When the computer program is executed by the processor, the method flow of any one of the embodiments shown in FIG. 3 is implemented.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program runs on a computer, the method flow of any one of the embodiments shown in FIG. 3 is implemented.
An embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, the method flow of any one of the embodiments shown in FIG. 3 is implemented.
It should be understood that the processor mentioned in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), and a direct Rambus RAM (DR RAM).
It should be noted that, when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) is integrated in the processor.
It should be noted that the memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.
It should also be understood that the terms "first", "second", "third", and the various numbers used herein are merely distinctions made for convenience of description and are not intended to limit the scope of the present application.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, "A and/or B" may mean that only A exists, that both A and B exist, or that only B exists. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation processes of the embodiments of the present application.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation shall not be regarded as going beyond the scope of the present application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of the units is merely a logical functional division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the foregoing functions are implemented in the form of software functional units and are sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application essentially, or the part thereof contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods shown in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs.
The modules in the apparatuses of the embodiments of the present application may be combined, divided, or deleted according to actual needs.
The foregoing embodiments are merely intended to describe the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (15)

  1. A denoising method, comprising:
    obtaining a first dynamic vision sensor signal; and
    performing denoising processing on the first dynamic vision sensor signal by using a spiking neural network model to obtain a second dynamic vision sensor signal, wherein a postsynaptic membrane voltage kernel function of a spiking neuron of the spiking neural network model includes a target parameter, and the target parameter is determined according to an autocorrelation coefficient of a dynamic vision sensor signal.
  2. The method according to claim 1, wherein the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients, and the plurality of autocorrelation coefficients are obtained according to a plurality of first target dynamic vision sensor signals within a preset time period; and
    the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, the preset function being obtained by fitting according to the distribution of the plurality of autocorrelation coefficients over time.
  3. The method according to claim 2, wherein any one of the plurality of autocorrelation coefficients is an average of first values corresponding to D second target dynamic vision sensor signals, D being a positive integer; the D second target dynamic vision sensor signals are D first target dynamic vision sensor signals that belong to a same first preset period among the plurality of first target dynamic vision sensor signals, and the preset time period includes a plurality of the first preset periods;
    the first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to a plurality of pixels; and
    the second value corresponding to any one of the plurality of pixels is obtained according to a first signal value of the pixel and a target signal value of a first target pixel, the first target pixel being a pixel whose proximity to the pixel is not greater than a preset proximity threshold; the second target dynamic vision sensor signal includes the first signal value of the pixel and is a w-th first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals, w being a positive integer; and the target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal being a (w+q)-th first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals, q being a positive integer.
  4. The method according to claim 3, wherein the first target pixel includes a plurality of second target pixels, and the second value corresponding to the pixel is obtained by accumulating third values corresponding to the plurality of second target pixels; and
    the third value corresponding to any one of the plurality of second target pixels is the product of the first signal value of the pixel and the target signal value of that second target pixel.
  5. The method according to any one of claims 2-4, wherein the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained according to a second signal value of the pixel, wherein: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; and if the second signal value of the pixel is less than 0, the first signal value of the pixel is -1; and
    the second signal value of the pixel is any one of m third signal values of the pixel, m being a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period, wherein: a g-th third signal value among the m third signal values is the sum of a (g-1)-th third signal value among the m third signal values and the accumulated sum of fourth signal values of the pixel within a g-th second preset period, with 1≤g≤m and g being a positive integer; the plurality of third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period includes the g-th second preset period; and when g equals 1, the first third signal value is the accumulated sum of the fourth signal values of the pixel within the first second preset period.
  6. The method according to any one of claims 1-5, wherein the spiking neural network model includes N convolutional layers and N deconvolutional layers, wherein: the output of a j-th convolutional layer among the N convolutional layers is the input of a (j+1)-th convolutional layer among the N convolutional layers; the output of a j-th deconvolutional layer among the N deconvolutional layers is the input of a (j+1)-th deconvolutional layer among the N deconvolutional layers; the output of the j-th convolutional layer is also the input of an (N-j)-th deconvolutional layer among the N deconvolutional layers; and the output of an N-th convolutional layer among the N convolutional layers is the input of a first deconvolutional layer among the N deconvolutional layers, with 1≤j≤N, N and j being positive integers.
  7. A denoising apparatus, comprising:
    an obtaining unit, configured to obtain a first dynamic vision sensor signal; and
    a processing unit, configured to perform denoising processing on the first dynamic vision sensor signal by using a spiking neural network model to obtain a second dynamic vision sensor signal, wherein a postsynaptic membrane voltage kernel function of a spiking neuron of the spiking neural network model includes a target parameter, and the target parameter is determined according to an autocorrelation coefficient of a dynamic vision sensor signal.
  8. The apparatus according to claim 7, wherein the autocorrelation coefficient of the dynamic vision sensor signal includes a plurality of autocorrelation coefficients, and the plurality of autocorrelation coefficients are obtained according to a plurality of first target dynamic vision sensor signals within a preset time period; and
    the target parameter is obtained according to a preset autocorrelation coefficient threshold and a preset function, the preset function being obtained by fitting according to the distribution of the plurality of autocorrelation coefficients over time.
  9. The apparatus according to claim 8, wherein any one of the plurality of autocorrelation coefficients is an average of first values corresponding to D second target dynamic vision sensor signals, D being a positive integer; the D second target dynamic vision sensor signals are D first target dynamic vision sensor signals that belong to a same first preset period among the plurality of first target dynamic vision sensor signals, and the preset time period includes a plurality of the first preset periods;
    the first value corresponding to any one of the D second target dynamic vision sensor signals is obtained by accumulating second values corresponding to a plurality of pixels; and
    the second value corresponding to any one of the plurality of pixels is obtained according to a first signal value of the pixel and a target signal value of a first target pixel, the first target pixel being a pixel whose proximity to the pixel is not greater than a preset proximity threshold; the second target dynamic vision sensor signal includes the first signal value of the pixel and is a w-th first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals, w being a positive integer; and the target signal value of the first target pixel is the first signal value of the first target pixel in a third target dynamic vision sensor signal, the third target dynamic vision sensor signal being a (w+q)-th first target dynamic vision sensor signal among the plurality of first target dynamic vision sensor signals, q being a positive integer.
  10. The apparatus according to claim 9, wherein the first target pixel includes a plurality of second target pixels, and the second value corresponding to the pixel is obtained by accumulating third values corresponding to the plurality of second target pixels; and
    the third value corresponding to any one of the plurality of second target pixels is the product of the first signal value of the pixel and the target signal value of that second target pixel.
  11. The apparatus according to any one of claims 8-10, wherein the first signal value of any pixel included in any one of the plurality of first target dynamic vision sensor signals is obtained according to a second signal value of the pixel, wherein: if the second signal value of the pixel is greater than 0, the first signal value of the pixel is 1; and if the second signal value of the pixel is less than 0, the first signal value of the pixel is -1; and
    the second signal value of the pixel is any one of m third signal values of the pixel, m being a positive integer; the m third signal values are obtained according to a plurality of third dynamic vision sensor signals within the preset time period, wherein: a g-th third signal value among the m third signal values is the sum of a (g-1)-th third signal value among the m third signal values and the accumulated sum of fourth signal values of the pixel within a g-th second preset period, with 1≤g≤m and g being a positive integer; the plurality of third dynamic vision sensor signals include the fourth signal values of the pixel within the g-th second preset period, and the preset time period includes the g-th second preset period; and when g equals 1, the first third signal value is the accumulated sum of the fourth signal values of the pixel within the first second preset period.
  12. The apparatus according to any one of claims 7-11, wherein the spiking neural network model includes N convolutional layers and N deconvolutional layers, wherein: the output of a j-th convolutional layer among the N convolutional layers is the input of a (j+1)-th convolutional layer among the N convolutional layers; the output of a j-th deconvolutional layer among the N deconvolutional layers is the input of a (j+1)-th deconvolutional layer among the N deconvolutional layers; the output of the j-th convolutional layer is also the input of an (N-j)-th deconvolutional layer among the N deconvolutional layers; and the output of an N-th convolutional layer among the N convolutional layers is the input of a first deconvolutional layer among the N deconvolutional layers, with 1≤j≤N, N and j being positive integers.
  13. An electronic device, comprising:
    one or more processors; and
    a computer-readable storage medium, coupled to the processor and storing a program to be executed by the processor, wherein the program, when executed by the processor, causes the electronic device to perform the method according to any one of claims 1-6.
  14. A computer-readable storage medium, comprising program code, which, when executed by a computer device, is used to perform the method according to any one of claims 1-6.
  15. A chip, comprising: a processor, configured to call and run a computer program from a memory, so that a device in which the chip is installed performs the method according to any one of claims 1-6.
PCT/CN2022/130027 2021-11-09 2022-11-04 Denoising method and related device WO2023083121A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111321228.7 2021-11-09
CN202111321228.7A CN116109489A (en) 2021-11-09 2021-11-09 Denoising method and related equipment

Publications (1)

Publication Number Publication Date
WO2023083121A1 true WO2023083121A1 (en) 2023-05-19

Family

ID=86253115

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130027 WO2023083121A1 (en) 2021-11-09 2022-11-04 Denoising method and related device

Country Status (2)

Country Link
CN (1) CN116109489A (en)
WO (1) WO2023083121A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116989800A (en) * 2023-09-27 2023-11-03 安徽大学 Mobile robot visual navigation decision-making method based on pulse reinforcement learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100094382A1 (en) * 2006-12-22 2010-04-15 Pezaris John S Visual prosthesis and methods of creating visual perceptions
CN107622303A (en) * 2016-07-13 2018-01-23 三星电子株式会社 For the method for neutral net and the equipment of execution this method
CN111105581A (en) * 2019-12-20 2020-05-05 上海寒武纪信息科技有限公司 Intelligent early warning method and related product
CN112085768A (en) * 2020-09-02 2020-12-15 北京灵汐科技有限公司 Optical flow information prediction method, optical flow information prediction device, electronic device, and storage medium
CN112184760A (en) * 2020-10-13 2021-01-05 中国科学院自动化研究所 High-speed moving target detection tracking method based on dynamic vision sensor
CN112987026A (en) * 2021-03-05 2021-06-18 武汉大学 Event field synthetic aperture imaging algorithm based on hybrid neural network


Also Published As

Publication number Publication date
CN116109489A (en) 2023-05-12

Similar Documents

Publication Publication Date Title
Rebecq et al. High speed and high dynamic range video with an event camera
Baldwin et al. Time-ordered recent event (TORE) volumes for event cameras
Iliadis et al. Deep fully-connected networks for video compressive sensing
WO2022036777A1 (en) Method and device for intelligent estimation of human body movement posture based on convolutional neural network
WO2021208122A1 (en) Blind video denoising method and device based on deep learning
CN112236779A (en) Image processing method and image processing device based on convolutional neural network
CN107958235B (en) Face image detection method, device, medium and electronic equipment
CN110222717B (en) Image processing method and device
CN111861894B (en) Image motion blur removing method based on generation type countermeasure network
CN111402130B (en) Data processing method and data processing device
CN111914997B (en) Method for training neural network, image processing method and device
CN112541877B (en) Defuzzification method, system, equipment and medium for generating countermeasure network based on condition
Haoyu et al. Learning to deblur and generate high frame rate video with an event camera
CN111079764A (en) Low-illumination license plate image recognition method and device based on deep learning
WO2023083121A1 (en) Denoising method and related device
WO2022100490A1 (en) Methods and systems for deblurring blurry images
Duan et al. Guided event filtering: Synergy between intensity images and neuromorphic events for high performance imaging
CN112712170A (en) Neural morphology vision target classification system based on input weighted impulse neural network
US20220215617A1 (en) Viewpoint image processing method and related device
CN112950505B (en) Image processing method, system and medium based on generation countermeasure network
Henderson et al. Spike event based learning in neural networks
WO2021037125A1 (en) Object identification method and apparatus
CN116206196B (en) Ocean low-light environment multi-target detection method and detection system thereof
CN114998659A (en) Image data classification method for training impulse neural network model on line along with time
Lian et al. An Image Deblurring Method Using Improved U‐Net Model

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22891916

Country of ref document: EP

Kind code of ref document: A1