WO2023193670A1 - Event camera-based spiking neural network target tracking method and system - Google Patents

Event camera-based spiking neural network target tracking method and system

Info

Publication number
WO2023193670A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
image
target
neural network
target tracking
Prior art date
Application number
PCT/CN2023/085815
Other languages
English (en)
French (fr)
Inventor
赵文一
唐华锦
洪朝飞
王笑
袁孟雯
陆宇婧
张梦骁
潘纲
Original Assignee
之江实验室
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 之江实验室, 浙江大学 filed Critical 之江实验室
Priority to US18/240,526 priority Critical patent/US20230410328A1/en
Publication of WO2023193670A1 publication Critical patent/WO2023193670A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • The present application relates to the field of target tracking, and specifically to a spiking neural network target tracking method and system based on event cameras.
  • The identification and tracking of moving targets is a hot topic in computer vision and is widely applied in human-computer interaction, video tracking, visual navigation, robotics, and military guidance.
  • Target tracking methods based on correlation filtering are fast, but have limited feature extraction capability and perform poorly in the face of scale variation and target loss.
  • Target tracking methods based on deep learning have good feature representation ability and higher tracking accuracy, but this comes with increased computation, which limits real-time tracking; they are also strongly affected by illumination and are therefore unsuitable for highly dynamic scenes.
  • An event-based camera, or dynamic vision sensor (DVS), works differently from a traditional frame-rate camera: its output is not an intensity image but an asynchronous event stream with microsecond resolution, in which each pixel fires independently. Compared with frame-rate cameras, event cameras have the advantages of low latency, low power consumption, and high dynamic range, and are better suited to fast target tracking in scenes with harsh lighting conditions such as over-bright, over-dark, or strongly contrasting light.
  • Compared with artificial neural networks, spiking neural networks can integrate spatiotemporal information thanks to their spike-firing mechanism, and their way of simulating biological membrane potentials gives them higher biological plausibility.
  • According to various embodiments of the present application, an event camera-based spiking neural network target tracking method and system are provided.
  • In a first aspect, a spiking neural network target tracking method based on event cameras is proposed.
  • The method includes:
  • acquiring, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data; dividing that data stream, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution;
  • the synchronous event frames are binary images analogous to spikes;
  • taking the target image as the template image z and the complete image as the search image x, training a Siamese network based on a spiking neural network; the Siamese network includes a weight-sharing feature extractor and a cross-correlation calculator for computing the target position, and is trained with a surrogate gradient (gradient substitution) algorithm;
  • performing target tracking with the trained Siamese network, interpolating and upsampling the result of the feature mapping to obtain the target position in the original image.
  • The synchronous event frame is generated as follows: the asynchronous events are divided according to the preset time-step size and number, and the asynchronous event data stream within each time step is accumulated. If the number of asynchronous events generated at a coordinate within the same time step is greater than 0, the pixel at that coordinate is set to 1; otherwise it is set to 0. This finally generates an event-frame image divided by time steps.
  • The feature extractor is a spiking convolutional neural network whose structure is 96C5-2S-256C3-2S-384C3-384C3-256C3, where 96C5 denotes a spiking convolutional layer with kernel size 5 and 96 output channels, 2S denotes a pooling layer with 2× downsampling, and so on.
  • The convolution stride of the first layer is 2 and the remaining strides are all 1. Every convolutional layer of the feature extractor is followed by spiking neurons.
  • The spiking neuron is a LIF (leaky integrate-and-fire) neuron model, i.e.
  • τ_m · dV(t)/dt = −(V(t) − V_rest) + R_m · I(t)
  • where τ_m denotes the membrane time constant, V the membrane potential, t the spike time, V_rest the resting potential, and R_m and I the impedance and input current of the cell membrane, respectively.
  • Given a template image z of size 127*127*3 and a search image x of size 255*255*3, the outputs after the feature extractor operation are a feature map of size 6*6*256 for z and a feature map of size 22*22*256 for x.
  • The cross-correlation calculator is generated as follows: the feature map obtained by extracting features from the template image z is used as the convolution kernel, the feature map obtained by extracting features from the search image x is used as the feature map to be convolved, and a convolution operation is performed on the two.
  • The result produced by this convolution layer is a similarity heat map representing the predicted probability of the target-center position.
  • The position with the maximum spike firing rate is the predicted target-center position.
  • The Siamese network is generated as follows: using a brain-inspired computing development framework and batch-based training, the zero-padded template images and the search images are placed in sequence into the same batch, so that the number of input-layer neurons is identical and both branches share the same network connection; after the feature extractor operation, the output of each odd-numbered sample, i.e. the z-branch output, is cropped to remove the edge padding, yielding the expected feature map of size 6*6*256.
  • In the target tracking method, the target image, i.e. the template image, is not updated, so feature extraction for the initial target only needs to be computed once.
  • The search image is cropped from the previous event frame, centered on the target position in that frame, and is about 4 times the size of the template image.
  • Real-time performance is further improved by narrowing the search area.
  • Bicubic interpolation is used to upsample the similarity heat map back to the original resolution to determine the predicted target position.
  • Three scales are used for the search, i.e. the images are scaled by factors of 1.03^{-1, 0, 1}, and the position with the highest spike firing rate, i.e. the highest similarity, is selected from the outputs as the final result.
  • In a second aspect, a spiking neural network target tracking system based on event cameras includes a data stream acquisition module, a high-time-resolution event frame generation module, a network training module, and a network output module.
  • The data stream acquisition module is configured to acquire, through the event camera, the data stream of asynchronous events in the target high-dynamic scene as input data; the high-time-resolution event frame generation module is configured to divide the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, which are binary images analogous to spikes;
  • the network training module is configured to take the target image as the template image z and the complete image as the search image x and to train a Siamese network based on the spiking neural network.
  • The Siamese network includes a weight-sharing feature extractor and a similarity calculator for computing the target position.
  • The surrogate gradient algorithm is used to train the Siamese network; the network output module is configured to perform target tracking with the trained Siamese network, interpolating and upsampling the result of the feature mapping to obtain the target position in the original image, thereby achieving target tracking.
  • A third aspect of the present application proposes an electronic device, including a memory and a processor.
  • A computer program is stored in the memory, and the processor is configured to run the computer program to perform any of the above event camera-based spiking neural network target tracking methods.
  • A fourth aspect of the present application proposes a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the above event camera-based spiking neural network target tracking methods are implemented.
  • Figure 1 is a schematic flowchart of an event camera-based spiking neural network target tracking method according to one or more embodiments of the present application.
  • Figure 2 is a schematic framework diagram of an event camera-based spiking neural network target tracking system according to one or more embodiments of the present application.
  • Figure 3 is a simplified schematic flowchart of an event camera-based spiking neural network target tracking method according to one or more embodiments of the present application.
  • Figure 4 is a schematic structural diagram of an electronic device according to one or more embodiments of the present application.
  • As shown in Figure 1, the event camera-based spiking neural network target tracking method of this application includes the following steps:
  • Step S10: obtain the data stream of asynchronous events in the target high-dynamic scene through the event camera as input data;
  • Step S20: divide the data stream of asynchronous events into synchronous event frames with millisecond time resolution through asynchronous event accumulation, the synchronous event frames being binary images analogous to spikes;
  • Step S30: taking the target image as the template image z and the complete image as the search image x, train a Siamese network based on the spiking neural network; the Siamese network includes a weight-sharing feature extractor and a cross-correlation calculator for computing the target position, and a surrogate gradient algorithm is used to train the Siamese network;
  • Step S40: use the Siamese network trained in step S30 to perform target tracking, interpolate and upsample the feature-mapping result, and obtain the target position in the original image, thereby achieving target tracking.
  • Step S10: obtain the data stream of asynchronous events in the target high-dynamic scene through the event camera as input data.
  • In this embodiment, the data stream of asynchronous events in the target high-dynamic scene is obtained through the event camera.
  • The data stream is specifically in the format [t, p, x, y], where t is the timestamp, p is the polarity of the asynchronous event, and x and y are the coordinates of the asynchronous event in the pixel coordinate system.
  • Step S20: divide the data stream of asynchronous events into synchronous event frames with millisecond time resolution through asynchronous event accumulation.
  • The synchronous event frames are binary images analogous to spikes.
  • In this embodiment, the asynchronous events need to be divided according to the preset time-step size and number.
  • Optionally, the time step is set to 0.1 ms; taking 50 time steps as an example, the event camera then achieves a capture rate equivalent to a traditional frame-rate camera running at 200 FPS (frames per second), which greatly improves the real-time nature of the data.
  • The asynchronous event data stream within each time step is accumulated: as long as the number of asynchronous events generated at a coordinate within the same time step is greater than 0, the pixel at that coordinate is set to 1, otherwise it is set to 0, finally generating an event-frame image divided by time steps. In this embodiment, only asynchronous events with positive polarity, that is, p = 1, are processed, as shown in Figure 3; the remaining parts of Figure 3 are described below.
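To make this accumulation step concrete, here is a minimal Python sketch (illustrative only; the NumPy-based implementation, the array layout, and the helper name events_to_binary_frames are assumptions, not part of the patent) that turns a [t, p, x, y] event stream into binary event frames at 0.1 ms time steps:

```python
import numpy as np

def events_to_binary_frames(events, height, width, dt=1e-4, num_steps=50):
    """Accumulate an asynchronous event stream into binary event frames.

    events: array of shape (N, 4) with columns [t, p, x, y],
            t in seconds, p in {0, 1}, x/y integer pixel coordinates.
    Returns an array of shape (num_steps, height, width) with values in {0, 1}.
    """
    frames = np.zeros((num_steps, height, width), dtype=np.uint8)
    events = events[events[:, 1] > 0]           # keep only positive polarity (p = 1)
    t0 = events[:, 0].min() if len(events) else 0.0
    for t, p, x, y in events:
        step = int((t - t0) // dt)              # index of the 0.1 ms time step
        if 0 <= step < num_steps:
            frames[step, int(y), int(x)] = 1    # any event in the step sets the pixel to 1
    return frames

# 50 steps of 0.1 ms cover 5 ms of events, i.e. an equivalent frame rate of 200 FPS.
```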
  • Step S30: taking the target image as the template image z and the complete image as the search image x, train a Siamese network based on the spiking neural network.
  • The Siamese network includes a weight-sharing feature extractor and a cross-correlation calculator for computing the target position.
  • A surrogate gradient algorithm is used to train the Siamese network.
  • Step S30 includes steps S301 to S305.
  • S301: design the feature extractor structure. In this embodiment, the feature extractor is a spiking convolutional neural network whose structure is 96C5-2S-256C3-2S-384C3-384C3-256C3, where 96C5 denotes a spiking convolutional layer with kernel size 5 and 96 output channels, 2S denotes a pooling layer with 2× downsampling, and so on.
  • The convolution stride of the first layer is 2 and the remaining strides are all 1. Every convolutional layer of this feature extractor is followed by spiking neurons.
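The layer string above can be read, for example, as the following PyTorch-style sketch. It is a minimal illustration under assumptions: the pooling kernel/stride, the absence of padding, and the simple threshold activation standing in for the LIF neurons are all choices made here, so the resulting feature-map sizes depend on those details and need not match the 6*6/22*22 figures quoted in the text exactly.

```python
import torch
import torch.nn as nn

class SpikeAct(nn.Module):
    """Placeholder spiking activation: fires where the input exceeds a threshold.
    In the described method each conv layer is followed by LIF neurons trained
    with a surrogate gradient (see the LIF and surrogate-gradient sketches below)."""
    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x):
        return (x >= self.threshold).float()

def build_feature_extractor():
    # 96C5-2S-256C3-2S-384C3-384C3-256C3; first conv stride 2, the rest stride 1.
    # 3 input channels as quoted for the template/search images.
    return nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=5, stride=2), SpikeAct(),
        nn.MaxPool2d(2),                                   # 2x downsampling
        nn.Conv2d(96, 256, kernel_size=3, stride=1), SpikeAct(),
        nn.MaxPool2d(2),
        nn.Conv2d(256, 384, kernel_size=3, stride=1), SpikeAct(),
        nn.Conv2d(384, 384, kernel_size=3, stride=1), SpikeAct(),
        nn.Conv2d(384, 256, kernel_size=3, stride=1), SpikeAct(),
    )

extractor = build_feature_extractor()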
  • Optionally, the spiking neuron is a LIF (leaky integrate-and-fire) neuron model, i.e.
  • τ_m · dV(t)/dt = −(V(t) − V_rest) + R_m · I(t)
  • where τ_m denotes the membrane time constant, V the membrane potential, t the spike time, V_rest the resting potential, and R_m and I the impedance and input current of the cell membrane, respectively.
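A minimal discrete-time version of this LIF dynamics, assuming Euler integration, a firing threshold, and reset-to-rest behaviour (the parameter values below are illustrative, not taken from the patent), might look like:

```python
import torch

def lif_step(v, input_current, tau_m=2.0, v_rest=0.0, r_m=1.0, v_th=1.0, dt=1.0):
    """One discrete-time update of a leaky integrate-and-fire neuron.

    Continuous form: tau_m * dV/dt = -(V - V_rest) + R_m * I.
    Euler discretisation with step dt; a spike is emitted when V crosses v_th,
    after which V is reset to V_rest.
    """
    v = v + (dt / tau_m) * (-(v - v_rest) + r_m * input_current)
    spike = (v >= v_th).float()
    v = torch.where(spike.bool(), torch.full_like(v, v_rest), v)  # reset fired neurons
    return spike, v

# Example: membrane potentials of 4 neurons driven by constant currents.
v = torch.zeros(4)
for _ in range(10):
    s, v = lif_step(v, input_current=torch.tensor([0.0, 0.5, 1.0, 2.0]))
```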
  • Given a template image z of size 127*127*3 and a search image x of size 255*255*3, the outputs after the feature extractor operation are a feature map of size 6*6*256 for z and a feature map of size 22*22*256 for x.
  • S302: design the cross-correlation calculator structure. The cross-correlation calculator described in this embodiment is a convolution layer that uses the feature map extracted from the template image z as the convolution kernel and the feature map extracted from the search image x as the feature map to be convolved; writing the feature extractor as φ(·), the cross-correlation calculator computes f(z, x) = φ(z) ⋆ φ(x) + b, where ⋆ denotes the convolution (cross-correlation) operation and b is the bias term.
  • The result produced by this convolution layer is a similarity heat map of size 17*17*1 representing the predicted probability of the target-center position.
  • The position with the maximum spike firing rate is the predicted target-center position.
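As an illustration of this cross-correlation step, the sketch below (assuming PyTorch and the 6*6*256 / 22*22*256 feature-map sizes quoted above) uses the template feature map as a convolution kernel over the search feature map and reads off the peak of the 17*17 heat map:

```python
import torch
import torch.nn.functional as F

def cross_correlation(feat_z, feat_x, bias=0.0):
    """Use the template feature map as a convolution kernel over the search feature map.

    feat_z: (256, 6, 6) feature map of the template image z
    feat_x: (256, 22, 22) feature map of the search image x
    Returns a (17, 17) similarity heat map; the argmax is the predicted target centre.
    """
    kernel = feat_z.unsqueeze(0)                      # (1, 256, 6, 6)
    x = feat_x.unsqueeze(0)                           # (1, 256, 22, 22)
    heatmap = F.conv2d(x, kernel) + bias              # (1, 1, 17, 17)
    return heatmap[0, 0]

heat = cross_correlation(torch.rand(256, 6, 6), torch.rand(256, 22, 22))
cy, cx = divmod(int(heat.argmax()), heat.shape[1])    # predicted centre in heat-map coordinates
```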
  • S303: implement network forward propagation. The code implementation of the Siamese network structure usually takes one of two forms, depending on the development framework adopted.
  • The first is to use a deep learning development framework such as PyTorch or TensorFlow.
  • This type of framework can directly realize the weight-sharing concept of Siamese networks, with the same network producing different outputs for different inputs: for a Siamese network, the shared feature extractor can be applied once to the template image and then again to the search image.
  • The second type is a brain-inspired computing framework dedicated to the development of spiking neural networks.
  • The design philosophy of this type of framework is to simulate biological neural structures, so the number of neurons in each layer must be stated explicitly in the network connections defined with such a framework.
  • Because the template image z and the search image x fed to the two branches of the Siamese network have different sizes, the corresponding network connections would have different numbers of input-layer neurons; written in the conventional way, the two branches would therefore become different networks and could not share weights.
  • In this embodiment, to solve the above problem of brain-inspired computing frameworks, a batch-training-based solution is proposed: the template image z is zero-padded at its edges (with padding amount p) so that its size equals that of the search image x; z and x are then placed in sequence into the same batch, doubling the batch size, with the odd-numbered samples being z and the even-numbered samples being x, so that the number of input-layer neurons is identical and the same network connection is shared. After the feature extractor operation, the output of each odd-numbered sample, i.e. the z branch, is cropped to remove the edge padding, recovering the expected feature map of size 6*6*256. In this way, every two consecutive samples in the same batch form a group: the feature map of the odd-numbered sample is used as the convolution kernel and the feature map of the even-numbered sample as the feature map to be convolved, and the cross-correlation is computed, achieving the same effect as a deep learning development framework.
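One way to realize this batch trick, sketched here under the assumption of a PyTorch-like tensor API (a neuromorphic framework would express the same idea through its own layer definitions, and the centre-crop used for removing the padding is an assumption), is to pad and interleave the two branches, run the shared extractor once, and then split, crop, and cross-correlate:

```python
import torch
import torch.nn.functional as F

def siamese_forward_single_branch(extractor, z, x):
    """Run both Siamese branches through ONE network definition, as required by
    frameworks whose layer definitions fix the neuron count per layer.

    z: (B, 3, 127, 127) template images; x: (B, 3, 255, 255) search images.
    """
    pad = (255 - 127) // 2                                   # zero-pad z to the size of x
    z_padded = F.pad(z, (pad, pad, pad, pad))                # (B, 3, 255, 255)
    batch = torch.stack([z_padded, x], dim=1).flatten(0, 1)  # interleave: odd samples = z, even = x
    feats = extractor(batch)                                 # single shared network connection
    feat_z, feat_x = feats[0::2], feats[1::2]                # split the two branches back out
    # crop the padded z-branch output back to the expected 6x6 centre region
    c = feat_z.shape[-1] // 2
    feat_z = feat_z[..., c - 3:c + 3, c - 3:c + 3]
    # pairwise cross-correlation: each template kernel convolves its own search map
    heatmaps = torch.cat([F.conv2d(feat_x[i:i + 1], feat_z[i:i + 1])
                          for i in range(feat_z.shape[0])])
    return heatmaps
```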
  • S304: design the loss function. In this embodiment, the loss at each pixel is set to l(y, v) = log(1 + exp(−y·v)), where y is the ground-truth label, equal to +1 at positions where the original image contains the target and −1 elsewhere.
  • v is the similarity heat map output by the cross-correlation calculator, and its values are real numbers.
  • Let the similarity heat map be D.
  • The overall loss function is the average of the per-point losses l(y, v) over all points of the similarity heat map D.
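A direct rendering of this loss, assuming PyTorch tensors and a {+1, −1} label map, is the following short sketch:

```python
import torch

def heatmap_logistic_loss(v, y):
    """Mean logistic loss over the similarity heat map D.

    v: real-valued similarity heat map output by the cross-correlation calculator.
    y: ground-truth label map with +1 at target positions and -1 elsewhere.
    Per-point loss: l(y, v) = log(1 + exp(-y * v)); the overall loss is its mean.
    """
    # softplus(-y * v) is the numerically stable form of log(1 + exp(-y * v))
    return torch.nn.functional.softplus(-y * v).mean()

# toy example on a 17x17 map
v = torch.randn(17, 17)
y = -torch.ones(17, 17)
y[7:10, 7:10] = 1.0
loss = heatmap_logistic_loss(v, y)
```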
  • S305: choose the learning algorithm. The training method for the Siamese spiking neural network described in this embodiment is the surrogate gradient (gradient substitution) method.
  • Optionally, algorithms such as STBP (Spatio-Temporal Backpropagation) or STCA (Spatio-Temporal Credit Assignment) can be adopted, in which the non-differentiable spike output is replaced by an approximate, continuously differentiable function, and gradient descent methods such as SGD or Adam are used to optimize the network parameters.
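The core of such surrogate-gradient training can be illustrated with a custom autograd function that fires a hard spike in the forward pass and substitutes a smooth derivative in the backward pass; the rectangular surrogate chosen below is one common option, not necessarily the exact form used by STBP/STCA in this patent:

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth surrogate in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        # rectangular window around the threshold approximates d(spike)/dV
        surrogate_grad = (v_minus_th.abs() < 0.5).float()
        return grad_output * surrogate_grad

spike_fn = SurrogateSpike.apply
v = torch.randn(8, requires_grad=True)
spike_fn(v - 1.0).sum().backward()   # gradients flow through the surrogate
```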
  • Step S40: use the network trained in step S30 to perform target tracking, perform bicubic interpolation upsampling on the feature-mapping result, and obtain the target position in the original image, thereby achieving target tracking.
  • In this embodiment, the target, that is, the template image, is not updated, so feature extraction for the initial target only needs to be computed once.
  • Thanks to the low latency of the event camera, the search image x used in this embodiment is an image cropped from the previous event frame, centered on the target position in that frame and about four times the size of the template image.
  • Real-time performance is further improved by narrowing the search area.
  • Bicubic interpolation is used to upsample the similarity heat map from 17*17 to 272*272 to determine the predicted target position.
  • Three scales are used for the search, i.e. the images are scaled by factors of 1.03^{-1, 0, 1}, and the output with the highest spike firing rate, i.e. the highest similarity, is selected as the final result.
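Putting these inference steps together, a per-frame tracking step might look like the following sketch (border handling, the scale-search bookkeeping, and the mapping from heat-map peak to image coordinates are simplified assumptions made for illustration):

```python
import torch
import torch.nn.functional as F

def track_step(extractor, feat_z, event_frame, prev_center,
               scales=(1.03 ** -1, 1.0, 1.03)):
    """One tracking step around the previous target position.

    extractor: trained spiking feature extractor (assumed to map a (1, 3, 255, 255)
               input to a (1, 256, 22, 22) feature map).
    feat_z:    (1, 256, 6, 6) template feature map, computed once for the initial target.
    event_frame: (1, 3, H, W) current event frame; prev_center: (row, col).
    """
    cy, cx = prev_center
    best_peak, best_heat = None, None
    for s in scales:                                   # 3-scale search: 1.03^{-1, 0, 1}
        half = int(round(255 * s)) // 2
        crop = event_frame[..., cy - half:cy + half, cx - half:cx + half]
        crop = F.interpolate(crop, size=(255, 255), mode="bilinear",
                             align_corners=False)
        heat = F.conv2d(extractor(crop), feat_z)       # (1, 1, 17, 17) similarity map
        heat = F.interpolate(heat, size=(272, 272), mode="bicubic",
                             align_corners=False)      # bicubic upsampling 17 -> 272
        if best_peak is None or heat.max() > best_peak:
            best_peak, best_heat = heat.max(), heat
    dy, dx = divmod(int(best_heat.argmax()), 272)
    # treat the 272x272 map as (approximately) aligned with the 255-pixel crop
    return cy + dy - 136, cx + dx - 136
```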
  • The beneficial effects of this application are as follows: it reduces the transmission delay of image data and the computation delay of the target tracking algorithm, and improves the accuracy of target tracking in highly dynamic scenes.
  • (1) This application obtains the data stream of asynchronous events through an event camera, reducing the amount of data transmitted and the communication delay.
  • (2) This application divides synchronous event frames according to time steps and feeds them to the spiking neural network in real time, eliminating the spike-coding step required when traditional image frames are fed to a spiking neural network.
  • (3) Compared with deep neural networks, the spiking neural network model described in this application uses spike-based computation, which reduces the amount of computation and the computational delay of the algorithm.
  • (4) This application uses a spiking convolutional neural network to extract features from event camera data, which improves the tracking accuracy of the algorithm in high-dynamic-range scenes.
  • An embodiment of the present application provides an event camera-based spiking neural network target tracking system 1, as shown in Figure 2, including a data stream acquisition module 100, a high-time-resolution event frame generation module 200, a network training module 300, and a network output module 400.
  • The data stream acquisition module 100 is configured to acquire the data stream of asynchronous events in a target high-dynamic scene through an event camera as input data.
  • The high-time-resolution event frame generation module 200 is configured to divide the data stream of asynchronous events into synchronous event frames with millisecond time resolution through asynchronous event accumulation.
  • The synchronous event frames are binary images analogous to spikes.
  • The network training module 300 is configured to take the target image as the template image z and the complete image as the search image x and to train a Siamese network based on the spiking neural network.
  • The Siamese network includes a weight-sharing feature extractor and a similarity calculator for computing the target position, and the surrogate gradient algorithm is used to train this Siamese network.
  • The network output module 400 is configured to use the Siamese network trained by module 300 to perform target tracking, interpolate and upsample the result of the feature mapping, and obtain the target position in the original image, thereby achieving target tracking.
  • It should be noted that the event camera-based spiking neural network target tracking system provided in the above embodiment is illustrated only with the above division of functional modules.
  • In practical applications, the above functions can be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the present application may be further decomposed or combined.
  • For example, the modules of the above embodiment can be merged into one module, or further split into multiple sub-modules, to complete all or part of the functions described above.
  • The names of the modules and steps involved in the embodiments of this application are only for distinguishing the individual modules or steps and are not regarded as improper limitations of this application.
  • Each of the above modules can be a functional module or a program module, and can be implemented by software or by hardware.
  • For modules implemented by hardware, the above modules can be located in the same processor, or the above modules can also be located in different processors in any combination.
  • In addition, the event camera-based spiking neural network target tracking method of the embodiments of the present application described with reference to FIGS. 1-3 can be implemented by the electronic device 2.
  • Figure 4 is a schematic diagram of the hardware structure of the electronic device 2 according to an embodiment of the present application.
  • The electronic device 2 may include a processor 21 and a memory 22 storing computer program instructions.
  • Specifically, the above-mentioned processor 21 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
  • The memory 22 may include mass storage for data or instructions.
  • By way of example and not limitation, the memory 22 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these.
  • Where appropriate, the memory 22 may include removable or non-removable (or fixed) media. Where appropriate, the memory 22 may be internal or external to the data processing device.
  • In a particular embodiment, the memory 22 is non-volatile memory.
  • In a particular embodiment, the memory 22 includes read-only memory (ROM) and random access memory (RAM).
  • Where appropriate, the ROM can be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), flash memory (FLASH), or a combination of two or more of these.
  • Where appropriate, the RAM can be static random access memory (SRAM) or dynamic random access memory (DRAM).
  • the memory 22 may be used to store or cache various data files required for processing and/or communication, as well as possible computer program instructions executed by the processor 21 .
  • the processor 21 reads and executes the computer program instructions stored in the memory 22 to implement any of the event camera-based impulse neural network target tracking methods in the above embodiments.
  • the electronic device 2 may also include a communication interface 23 and a bus 20 .
  • the processor 21, the memory 22, and the communication interface 23 are connected through the bus 20 and complete communication with each other.
  • the communication interface 23 is used to implement communication between various modules, devices, units and/or equipment in the embodiments of the present application.
  • the communication interface 23 can also implement data communication with other components such as: external devices, image/data acquisition equipment, databases, external storage, image/data processing workstations, etc.
  • The bus 20 includes hardware, software, or both, and couples the components of the electronic device 2 to each other.
  • The bus 20 includes but is not limited to at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus.
  • By way of example and not limitation, the bus 20 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, the bus 20 may include one or more buses. Although specific buses are described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated.
  • In addition, in combination with the event camera-based spiking neural network target tracking method in the above embodiments, a storage medium may also be provided in this embodiment.
  • The storage medium stores a computer program; when the computer program is executed by a processor, any one of the event camera-based spiking neural network target tracking methods in the above embodiments is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

An event camera-based spiking neural network target tracking method and system. The method includes: acquiring, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data; dividing the data stream of asynchronous events into synchronous event frames with millisecond time resolution; taking the target image as the template image and the complete image as the search image, training a Siamese network based on a spiking neural network, the Siamese network including a feature extractor and a cross-correlation calculator; and, using the trained Siamese network, interpolating and upsampling the result of the feature mapping to obtain the position of the target in the original image, thereby achieving target tracking.

Description

Event camera-based spiking neural network target tracking method and system
Related Application
This application claims priority to Chinese patent application No. 202210357273.6, filed on April 7, 2022 and entitled "Event camera-based spiking neural network target tracking method and system", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of target tracking, and in particular to an event camera-based spiking neural network target tracking method and system.
Background
The identification and tracking of moving targets is a hot topic in computer vision and is widely applied in human-computer interaction, video tracking, visual navigation, robotics, and military guidance. At present, target tracking follows two mainstream technical routes: correlation filtering and deep learning.
Correlation-filter-based target tracking methods are fast but have limited feature extraction capability and perform poorly in the face of scale variation and target loss. Deep-learning-based target tracking methods have good feature representation ability and higher tracking accuracy, but this comes with increased computation, which limits real-time tracking; they are also strongly affected by illumination and are therefore unsuitable for highly dynamic scenes.
An event-based camera (EB), also called a dynamic vision sensor (DVS), works differently from a traditional frame-rate camera: its output is not an intensity image but an asynchronous event stream with microsecond resolution, in which each pixel fires independently. Compared with frame-rate cameras, event cameras have the advantages of low latency, low power consumption, and high dynamic range, and are better suited to fast target tracking in scenes with harsh lighting conditions such as over-bright, over-dark, or strongly contrasting light.
[Corrected under Rule 91 07.08.2023]
Meanwhile, compared with artificial neural networks, spiking neural networks can fuse spatiotemporal information thanks to their spike-firing mechanism, and their way of simulating biological membrane potentials gives them higher biological plausibility.
Summary
According to various embodiments of the present application, an event camera-based spiking neural network target tracking method and system are provided.
In a first aspect of the present application, an event camera-based spiking neural network target tracking method is proposed. The method includes:
acquiring, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data;
dividing the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, the synchronous event frames being binary images analogous to spikes;
taking the target image as the template image z and the complete image as the search image x, training a Siamese network based on a spiking neural network, the Siamese network including a weight-sharing feature extractor and a cross-correlation calculator for computing the target position, and training the Siamese network with a surrogate gradient algorithm;
performing target tracking with the trained Siamese network, interpolating and upsampling the result of the feature mapping to obtain the position of the target in the original image, thereby achieving target tracking.
In one embodiment, the synchronous event frame is generated as follows: the asynchronous events are divided according to the preset time-step size and number, and the asynchronous event data stream within each time step is accumulated; as long as the number of asynchronous events generated at a coordinate within the same time step is greater than 0, the pixel at that coordinate is set to 1, otherwise it is set to 0, finally generating an event-frame image divided by time steps.
[Corrected under Rule 91 07.08.2023]
In one embodiment, the feature extractor is generated as follows: a spiking convolutional neural network is adopted whose structure is 96C5-2S-256C3-2S-384C3-384C3-256C3, where 96C5 denotes a spiking convolutional layer with kernel size 5 and 96 output channels, 2S denotes a pooling layer with 2× downsampling, and so on; the convolution stride of the first layer is 2 and the remaining strides are all 1; every convolutional layer of the feature extractor is followed by spiking neurons.
In one embodiment, the spiking neuron is a LIF (leaky integrate-and-fire) neuron model, i.e.
τ_m · dV(t)/dt = −(V(t) − V_rest) + R_m · I(t)
where τ_m denotes the membrane time constant, V the membrane potential, t the spike time, V_rest the resting potential, and R_m and I the impedance and input current of the cell membrane, respectively.
Given a template image z of size 127*127*3 and a search image x of size 255*255*3, the outputs after the feature extractor operation are a feature map of size 6*6*256 for z and a feature map of size 22*22*256 for x.
In one embodiment, the cross-correlation calculator is generated as follows: the feature map extracted from the template image z is used as the convolution kernel, the feature map extracted from the search image x is used as the feature map to be convolved, and a convolution operation is performed on the two; the result produced by this convolution layer is a similarity heat map representing the predicted probability of the target-center position, and the position with the maximum spike firing rate is the predicted target-center position.
In one embodiment, the Siamese network is generated as follows: using a brain-inspired computing development framework and batch-based training, the padded template image and the search image are placed in sequence into the same batch, so that the number of input-layer neurons is identical and both share the same network connection; after the feature extractor operation, the output of the odd-numbered sample, i.e. the z-branch output, is cropped to remove the edge padding, yielding the expected feature map of size 6*6*256.
In one embodiment, the target tracking method is as follows: the target image, i.e. the template image, is not updated, so feature extraction for the initial target only needs to be computed once; the search image is an image cropped from the previous event frame, centered on the target position in that frame and about 4 times the size of the template image, and real-time performance is further improved by narrowing the search area; bicubic interpolation is used to upsample the similarity heat map back to the original size to determine the predicted target position; three scales are used for the search, i.e. the images are scaled by factors of 1.03^{-1, 0, 1}, and the position with the highest spike firing rate, i.e. the highest similarity, is selected from the outputs as the final result.
In a second aspect of the present application, an event camera-based spiking neural network target tracking system is proposed. The system includes a data stream acquisition module, a high-time-resolution event frame generation module, a network training module, and a network output module.
The data stream acquisition module is configured to acquire, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data; the high-time-resolution event frame generation module is configured to divide the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, the synchronous event frames being binary images analogous to spikes; the network training module is configured to take the target image as the template image z and the complete image as the search image x and to train a Siamese network based on a spiking neural network, the Siamese network including a weight-sharing feature extractor and a similarity calculator for computing the target position, the Siamese network being trained with a surrogate gradient algorithm; the network output module is configured to perform target tracking with the trained Siamese network, interpolating and upsampling the result of the feature mapping to obtain the position of the target in the original image, thereby achieving target tracking.
In a third aspect of the present application, an electronic device is proposed, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to perform any one of the above event camera-based spiking neural network target tracking methods.
In a fourth aspect of the present application, a computer-readable storage medium is proposed, on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the above event camera-based spiking neural network target tracking methods are implemented.
Details of one or more embodiments of the present application are set forth in the following drawings and description. Other features, objects, and advantages of the present application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To better describe and illustrate the embodiments and/or examples of the inventions disclosed herein, reference may be made to one or more accompanying drawings. The additional details or examples used to describe the drawings should not be regarded as limiting the scope of any of the disclosed inventions, of the presently described embodiments and/or examples, or of the best mode of these inventions as currently understood.
FIG. 1 is a schematic flowchart of an event camera-based spiking neural network target tracking method according to one or more embodiments of the present application.
FIG. 2 is a schematic framework diagram of an event camera-based spiking neural network target tracking system according to one or more embodiments of the present application.
FIG. 3 is a simplified schematic flowchart of an event camera-based spiking neural network target tracking method according to one or more embodiments of the present application.
FIG. 4 is a schematic structural diagram of an electronic device according to one or more embodiments of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
The present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other without conflict.
As shown in FIG. 1, the event camera-based spiking neural network target tracking method of the present application includes the following steps:
Step S10, acquiring, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data;
Step S20, dividing the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, the synchronous event frames being binary images analogous to spikes;
Step S30, taking the target image as the template image z and the complete image as the search image x, training a Siamese network based on a spiking neural network, the Siamese network including a weight-sharing feature extractor and a cross-correlation calculator for computing the target position, and training the Siamese network with a surrogate gradient algorithm;
Step S40, performing target tracking with the Siamese network trained in step S30, interpolating and upsampling the result of the feature mapping to obtain the position of the target in the original image, thereby achieving target tracking.
To describe the event camera-based spiking neural network target tracking method of the present application more clearly, the steps in some embodiments of the present application are described in detail below with reference to the drawings.
Step S10, acquiring, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data.
In this embodiment, the data stream of asynchronous events in the target high-dynamic scene is acquired through the event camera. The data stream is specifically in the format [t, p, x, y], where t is the timestamp, p is the polarity of the asynchronous event, and x and y are the coordinates of the asynchronous event in the pixel coordinate system.
Step S20, dividing the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, the synchronous event frames being binary images analogous to spikes.
In this embodiment, the asynchronous events need to be divided according to the preset time-step size and number. Optionally, the time step is set to 0.1 ms; taking 50 time steps as an example, the event camera achieves a capture rate equivalent to a traditional frame-rate camera running at 200 FPS (frames per second), which greatly improves the real-time nature of the data. The asynchronous event data stream within each time step is accumulated; as long as the number of asynchronous events generated at a coordinate within the same time step is greater than 0, the pixel at that coordinate is set to 1, otherwise it is set to 0, finally generating an event-frame image divided by time steps. In this embodiment, only asynchronous events with positive polarity, i.e. p = 1, are processed, as shown in FIG. 3; the remaining parts of FIG. 3 are described below.
Step S30, taking the target image as the template image z and the complete image as the search image x, training a Siamese network based on a spiking neural network, the Siamese network including a weight-sharing feature extractor and a cross-correlation calculator for computing the target position, and training the Siamese network with a surrogate gradient algorithm.
Step S30 includes steps S301 to S305.
[Corrected under Rule 91 07.08.2023]
S301: design the feature extractor structure. In this embodiment, the feature extractor is a spiking convolutional neural network whose structure is 96C5-2S-256C3-2S-384C3-384C3-256C3, where 96C5 denotes a spiking convolutional layer with kernel size 5 and 96 output channels, 2S denotes a pooling layer with 2× downsampling, and so on. The convolution stride of the first layer is 2 and the remaining strides are all 1. Every convolutional layer of the feature extractor is followed by spiking neurons.
Optionally, the spiking neuron is a LIF (leaky integrate-and-fire) neuron model, i.e.
τ_m · dV(t)/dt = −(V(t) − V_rest) + R_m · I(t)
where τ_m denotes the membrane time constant, V the membrane potential, t the spike time, V_rest the resting potential, and R_m and I the impedance and input current of the cell membrane, respectively.
Given a template image z of size 127*127*3 and a search image x of size 255*255*3, the outputs after the feature extractor operation are a feature map of size 6*6*256 for z and a feature map of size 22*22*256 for x.
S302: design the cross-correlation calculator structure. The cross-correlation calculator in this embodiment is a convolution layer that uses the feature map extracted from the template image z as the convolution kernel and the feature map extracted from the search image x as the feature map to be convolved; writing the feature extractor as φ(·), the cross-correlation calculator computes f(z, x) = φ(z) ⋆ φ(x) + b, where b is the bias term. The result produced by this convolution layer is a similarity heat map of size 17*17*1 representing the predicted probability of the target-center position, and the position with the maximum spike firing rate is the predicted target-center position.
S303: implement network forward propagation. In this embodiment, the code implementation of the Siamese network structure usually takes one of two forms, depending on the development framework adopted. The first is to use a deep learning development framework such as PyTorch or TensorFlow; such frameworks can directly realize the weight-sharing concept of Siamese networks, with the same network producing different outputs for different inputs, i.e. for a Siamese network the shared feature extractor can first be applied to z and then applied again to x. The second is a brain-inspired computing framework dedicated to spiking neural network development; the design philosophy of such frameworks is to simulate biological neural structures, and the number of neurons in each layer must be stated explicitly in the network connections defined with such a framework. In this embodiment, the template image z and the search image x fed to the two branches of the Siamese network have different sizes, which, in terms of the network connection, leads to different numbers of input-layer neurons; written in the conventional way, the two branches would therefore become different networks and could not share weights.
In this embodiment, to solve the above problem of brain-inspired computing frameworks, a batch-training-based solution is proposed. The template image z is zero-padded at its edges, with padding amount padding = p, so that its size equals that of the search image x; z and x are then placed in sequence into the same batch, doubling the batch size, with the odd-numbered samples of the new batch being z and the even-numbered samples being x, so that the number of input-layer neurons is identical and the same network connection is shared. After the feature extractor operation, the output of each odd-numbered sample, i.e. the z branch, is cropped to remove the edge padding, recovering the expected feature map of size 6*6*256. In this way, every two consecutive samples in the same batch form a group: the feature map of the odd-numbered sample is used as the convolution kernel and the feature map of the even-numbered sample as the feature map to be convolved, and the cross-correlation is computed, achieving the same effect as a deep learning development framework.
S304: design the loss function. In this embodiment, the loss at each pixel is set to l(y, v) = log(1 + exp(−y·v)), where y is the ground-truth label, equal to +1 at positions where the original image contains the target and −1 elsewhere, and v is the real-valued similarity heat map output by the cross-correlation calculator. Let the similarity heat map be D; the overall loss function is the average of the per-point losses l(y, v) over all points of the similarity heat map D.
S305: choose the learning algorithm. The training method for the Siamese spiking neural network in this embodiment is the surrogate gradient (gradient substitution) method. Optionally, algorithms such as STBP (Spatio-Temporal Backpropagation) or STCA (Spatio-Temporal Credit Assignment) can be adopted, in which the non-differentiable spike output is replaced by an approximate, continuously differentiable function, and gradient descent methods such as SGD or Adam are used to optimize the network parameters.
Step S40, performing target tracking with the network trained in step S30, performing bicubic interpolation upsampling on the result of the feature mapping, and obtaining the position of the target in the original image, thereby achieving target tracking.
In this embodiment, the target, i.e. the template image, is not updated, so feature extraction for the initial target only needs to be computed once.
Thanks to the low-latency characteristics of the event camera, the search image x used in this embodiment is an image cropped from the previous event frame, centered on the target position in that frame and about 4 times the size of the template image; real-time performance is further improved by narrowing the search area.
Bicubic interpolation is used to upsample the similarity heat map from 17*17 to 272*272 to determine the predicted target position.
In this embodiment, three scales are used for the search, i.e. the images are scaled by factors of 1.03^{-1, 0, 1}, and the output with the highest spike firing rate, i.e. the highest similarity, is selected as the final result.
The beneficial effects of the present application are as follows: the present application reduces the transmission delay of image data and the computation delay of the target tracking algorithm, and improves the accuracy of target tracking in highly dynamic scenes.
(1) The present application obtains the data stream of asynchronous events through an event camera, reducing the amount of data transmitted and the communication delay.
(2) The present application divides synchronous event frames according to time steps and feeds them to the spiking neural network in real time, eliminating the spike-coding step required when traditional image frames are fed to a spiking neural network.
(3) Compared with deep neural networks, the spiking neural network model of the present application uses spike-based computation, which reduces the amount of computation and the computational delay of the algorithm.
(4) The present application uses a spiking convolutional neural network to extract features from event camera data, which improves the tracking accuracy of the algorithm in high-dynamic-range scenes.
An embodiment of the present application provides an event camera-based spiking neural network target tracking system 1, as shown in FIG. 2, including a data stream acquisition module 100, a high-time-resolution event frame generation module 200, a network training module 300, and a network output module 400.
The data stream acquisition module 100 is configured to acquire, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data.
The high-time-resolution event frame generation module 200 is configured to divide the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, the synchronous event frames being binary images analogous to spikes.
The network training module 300 is configured to take the target image as the template image z and the complete image as the search image x and to train a Siamese network based on a spiking neural network, the Siamese network including a weight-sharing feature extractor and a similarity calculator for computing the target position, the Siamese network being trained with a surrogate gradient algorithm.
The network output module 400 is configured to perform target tracking with the Siamese network trained by module 300, interpolating and upsampling the result of the feature mapping to obtain the position of the target in the original image, thereby achieving target tracking.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes and related explanations of the system described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
It should be noted that the event camera-based spiking neural network target tracking system provided in the above embodiment is illustrated only with the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed, i.e. the modules or steps in the embodiments of the present application may be further decomposed or combined; for example, the modules of the above embodiment may be merged into one module or further split into multiple sub-modules to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present application are only for distinguishing the individual modules or steps and are not regarded as improper limitations of the present application.
It should be noted that each of the above modules may be a functional module or a program module, and may be implemented by software or by hardware. For modules implemented by hardware, the above modules may be located in the same processor, or the above modules may also be located in different processors in any combination.
In addition, the event camera-based spiking neural network target tracking method of the embodiments of the present application described with reference to FIGS. 1-3 may be implemented by an electronic device 2. FIG. 4 is a schematic diagram of the hardware structure of the electronic device 2 according to an embodiment of the present application.
The electronic device 2 may include a processor 21 and a memory 22 storing computer program instructions.
Specifically, the above processor 21 may include a central processing unit (CPU), or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 22 may include mass storage for data or instructions. By way of example and not limitation, the memory 22 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 22 may include removable or non-removable (or fixed) media. Where appropriate, the memory 22 may be internal or external to the data processing device. In a particular embodiment, the memory 22 is non-volatile memory. In a particular embodiment, the memory 22 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), flash memory (FLASH), or a combination of two or more of these. Where appropriate, the RAM may be static random access memory (SRAM) or dynamic random access memory (DRAM).
The memory 22 may be used to store or cache various data files required for processing and/or communication, as well as possible computer program instructions executed by the processor 21.
The processor 21 reads and executes the computer program instructions stored in the memory 22 to implement any one of the event camera-based spiking neural network target tracking methods of the above embodiments.
In some of these embodiments, the electronic device 2 may further include a communication interface 23 and a bus 20. As shown in FIG. 4, the processor 21, the memory 22, and the communication interface 23 are connected through the bus 20 and communicate with each other.
The communication interface 23 is used to implement communication between the modules, apparatuses, units, and/or devices in the embodiments of the present application. The communication interface 23 may also implement data communication with other components such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 20 includes hardware, software, or both, and couples the components of the electronic device 2 to each other. The bus 20 includes but is not limited to at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, the bus 20 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Where appropriate, the bus 20 may include one or more buses. Although a specific bus is described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated by the present application.
[Corrected under Rule 91 07.08.2023]
Specific examples are used in the present application to explain its principles and implementations; the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for a person of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

  1. An event camera-based spiking neural network target tracking method, characterized in that the method comprises the following steps:
    acquiring, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data;
    dividing the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, the synchronous event frames being binary images analogous to spikes;
    taking the target image as the template image z and the complete image as the search image x, training a Siamese network based on a spiking neural network, the Siamese network comprising a weight-sharing feature extractor and a cross-correlation calculator for computing the target position, and training the Siamese network with a surrogate gradient algorithm;
    performing target tracking with the trained Siamese network, interpolating and upsampling the result of the feature mapping to obtain the position of the target in the original image, thereby achieving target tracking.
  2. The event camera-based spiking neural network target tracking method according to claim 1, wherein the synchronous event frame is generated as follows:
    dividing the asynchronous events according to the preset time-step size and number, and accumulating the asynchronous event data stream within each time step; as long as the number of asynchronous events generated at a coordinate within the same time step is greater than 0, the pixel at that coordinate is set to 1, otherwise it is set to 0, finally generating an event-frame image divided by time steps.
  3. The event camera-based spiking neural network target tracking method according to claim 1, wherein the feature extractor is generated as follows:
    a spiking convolutional neural network is adopted whose structure is 96C5-2S-256C3-2S-384C3-384C3-256C3, where 96C5 denotes a spiking convolutional layer with kernel size 5 and 96 output channels, 2S denotes a pooling layer with 2× downsampling, and so on; the convolution stride of the first layer is 2 and the remaining strides are all 1; every convolutional layer of the feature extractor is followed by spiking neurons.
  4. The event camera-based spiking neural network target tracking method according to claim 3, wherein the spiking neuron is a LIF (leaky integrate-and-fire) neuron model, i.e.
    τ_m · dV(t)/dt = −(V(t) − V_rest) + R_m · I(t)
    where τ_m denotes the membrane time constant, V the membrane potential, t the spike time, V_rest the resting potential, and R_m and I the impedance and input current of the cell membrane, respectively;
    given a template image z of size 127*127*3 and a search image x of size 255*255*3, the outputs after the feature extractor operation are a feature map of size 6*6*256 for z and a feature map of size 22*22*256 for x.
  5. The event camera-based spiking neural network target tracking method according to claim 1, wherein the cross-correlation calculator is generated as follows:
    the feature map extracted from the template image z is used as the convolution kernel, the feature map extracted from the search image x is used as the feature map to be convolved, and a convolution operation is performed on the two; the result produced by this convolution layer is a similarity heat map representing the predicted probability of the target-center position, and the position with the maximum spike firing rate is the predicted target-center position.
  6. The event camera-based spiking neural network target tracking method according to claim 1, wherein the Siamese network is generated as follows:
    using a brain-inspired computing development framework and batch-based training, the padded template image and the search image are placed in sequence into the same batch, so that the number of input-layer neurons is identical and both share the same network connection; after the feature extractor operation, the output of the odd-numbered sample, i.e. the z-branch output, is cropped to remove the edge padding, yielding the expected feature map of size 6*6*256.
  7. The event camera-based spiking neural network target tracking method according to claim 1, wherein the target tracking method is as follows:
    the target image, i.e. the template image, is not updated, so feature extraction for the initial target only needs to be computed once; the search image is an image cropped from the previous event frame, centered on the target position in that frame and about 4 times the size of the template image, and real-time performance is further improved by narrowing the search area; bicubic interpolation is used to upsample the similarity heat map back to the original size to determine the predicted target position; three scales are used for the search, i.e. the images are scaled by factors of 1.03^{-1, 0, 1}, and the position with the highest spike firing rate, i.e. the highest similarity, is selected from the outputs as the final result.
  8. An event camera-based spiking neural network target tracking system, the system comprising a data stream acquisition module, a high-time-resolution event frame generation module, a network training module, and a network output module;
    the data stream acquisition module is configured to acquire, through an event camera, the data stream of asynchronous events in a target high-dynamic scene as input data;
    the high-time-resolution event frame generation module is configured to divide the data stream of asynchronous events, through asynchronous event accumulation, into synchronous event frames with millisecond time resolution, the synchronous event frames being binary images analogous to spikes;
    the network training module is configured to take the target image as the template image z and the complete image as the search image x and to train a Siamese network based on a spiking neural network, the Siamese network comprising a weight-sharing feature extractor and a similarity calculator for computing the target position;
    the network output module is configured to perform target tracking with the trained Siamese network, interpolating and upsampling the result of the feature mapping to obtain the position of the target in the original image, thereby achieving target tracking.
  9. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory and the processor is configured to run the computer program to perform the event camera-based spiking neural network target tracking method according to any one of claims 1 to 7.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the event camera-based spiking neural network target tracking method according to any one of claims 1 to 7.
PCT/CN2023/085815 2022-04-07 2023-04-01 Event camera-based spiking neural network target tracking method and system WO2023193670A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/240,526 US20230410328A1 (en) 2022-04-07 2023-08-31 Target tracking method and system of spiking neural network based on event camera

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210357273.6 2022-04-07
CN202210357273.6A CN114429491B (zh) 2022-04-07 2022-04-07 一种基于事件相机的脉冲神经网络目标跟踪方法和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/240,526 Continuation US20230410328A1 (en) 2022-04-07 2023-08-31 Target tracking method and system of spiking neural network based on event camera

Publications (1)

Publication Number Publication Date
WO2023193670A1 true WO2023193670A1 (zh) 2023-10-12

Family

ID=81314426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/085815 WO2023193670A1 (zh) 2022-04-07 2023-04-01 基于事件相机的脉冲神经网络目标跟踪方法及系统

Country Status (3)

Country Link
US (1) US20230410328A1 (zh)
CN (1) CN114429491B (zh)
WO (1) WO2023193670A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117232638A (zh) * 2023-11-15 2023-12-15 常州检验检测标准认证研究院 机器人振动检测方法及系统
CN117291914A (zh) * 2023-11-24 2023-12-26 南昌江铃华翔汽车零部件有限公司 汽车零部件缺陷检测方法、系统、计算机及存储介质
CN117314972A (zh) * 2023-11-21 2023-12-29 安徽大学 一种基于多类注意力机制的脉冲神经网络的目标跟踪方法

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429491B (zh) * 2022-04-07 2022-07-08 之江实验室 一种基于事件相机的脉冲神经网络目标跟踪方法和系统
WO2023212857A1 (zh) * 2022-05-05 2023-11-09 中国科学院深圳先进技术研究院 一种基于类脑智能的脑机接口系统及设备
CN114861892B (zh) * 2022-07-06 2022-10-21 深圳时识科技有限公司 芯片在环代理训练方法及设备、芯片及电子设备
CN116883648B (zh) * 2023-09-06 2024-02-13 南方电网数字电网研究院股份有限公司 一种异物检测方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148159A (zh) * 2019-05-20 2019-08-20 厦门大学 一种基于事件相机的异步目标跟踪方法
CN111709967A (zh) * 2019-10-28 2020-09-25 北京大学 一种目标检测方法、目标跟踪方法、装置及可读存储介质
US20210105421A1 (en) * 2019-10-02 2021-04-08 Sensors Unlimited, Inc. Neuromorphic vision with frame-rate imaging for target detection and tracking
CN112837344A (zh) * 2019-12-18 2021-05-25 沈阳理工大学 一种基于条件对抗生成孪生网络的目标跟踪方法
CN114202564A (zh) * 2021-12-16 2022-03-18 深圳龙岗智能视听研究院 一种基于事件相机的高速目标追踪的方法及系统
CN114429491A (zh) * 2022-04-07 2022-05-03 之江实验室 一种基于事件相机的脉冲神经网络目标跟踪方法和系统

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019099337A1 (en) * 2017-11-14 2019-05-23 Kaban Technologies Llc Event camera-based deformable object tracking
CN109191491B (zh) * 2018-08-03 2020-09-08 华中科技大学 基于多层特征融合的全卷积孪生网络的目标跟踪方法及系统
CN110210563B (zh) * 2019-06-04 2021-04-30 北京大学 基于Spike cube SNN的图像脉冲数据时空信息学习及识别方法
CN110555523B (zh) * 2019-07-23 2022-03-29 中建三局智能技术有限公司 一种基于脉冲神经网络的短程跟踪方法及系统
CN113763415B (zh) * 2020-06-04 2024-03-08 北京达佳互联信息技术有限公司 目标跟踪方法、装置、电子设备及存储介质
CN112508996A (zh) * 2020-09-05 2021-03-16 常州工学院 无锚点孪生网络角点生成的目标跟踪方法及装置
CN112184752A (zh) * 2020-09-08 2021-01-05 北京工业大学 一种基于金字塔卷积的视频目标跟踪方法
FR3114718A1 (fr) * 2020-09-30 2022-04-01 Commissariat à l'énergie atomique et aux énergies alternatives Dispositif de compensation du mouvement d’un capteur événementiel et système d’observation et procédé associés
CN112712170B (zh) * 2021-01-08 2023-06-20 西安交通大学 基于输入加权脉冲神经网络的神经形态视觉目标分类系统
CN113205048B (zh) * 2021-05-06 2022-09-09 浙江大学 一种手势识别方法及识别系统
CN113762409A (zh) * 2021-09-17 2021-12-07 北京航空航天大学 一种基于事件相机的无人机目标检测方法
CN113988276B (zh) * 2021-12-27 2022-08-30 中科南京智能技术研究院 一种目标识别方法及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148159A (zh) * 2019-05-20 2019-08-20 厦门大学 一种基于事件相机的异步目标跟踪方法
US20210105421A1 (en) * 2019-10-02 2021-04-08 Sensors Unlimited, Inc. Neuromorphic vision with frame-rate imaging for target detection and tracking
CN111709967A (zh) * 2019-10-28 2020-09-25 北京大学 一种目标检测方法、目标跟踪方法、装置及可读存储介质
CN112837344A (zh) * 2019-12-18 2021-05-25 沈阳理工大学 一种基于条件对抗生成孪生网络的目标跟踪方法
CN114202564A (zh) * 2021-12-16 2022-03-18 深圳龙岗智能视听研究院 一种基于事件相机的高速目标追踪的方法及系统
CN114429491A (zh) * 2022-04-07 2022-05-03 之江实验室 一种基于事件相机的脉冲神经网络目标跟踪方法和系统

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117232638A (zh) * 2023-11-15 2023-12-15 常州检验检测标准认证研究院 机器人振动检测方法及系统
CN117232638B (zh) * 2023-11-15 2024-02-20 常州检验检测标准认证研究院 机器人振动检测方法及系统
CN117314972A (zh) * 2023-11-21 2023-12-29 安徽大学 一种基于多类注意力机制的脉冲神经网络的目标跟踪方法
CN117314972B (zh) * 2023-11-21 2024-02-13 安徽大学 一种基于多类注意力机制的脉冲神经网络的目标跟踪方法
CN117291914A (zh) * 2023-11-24 2023-12-26 南昌江铃华翔汽车零部件有限公司 汽车零部件缺陷检测方法、系统、计算机及存储介质
CN117291914B (zh) * 2023-11-24 2024-02-09 南昌江铃华翔汽车零部件有限公司 汽车零部件缺陷检测方法、系统、计算机及存储介质

Also Published As

Publication number Publication date
CN114429491B (zh) 2022-07-08
CN114429491A (zh) 2022-05-03
US20230410328A1 (en) 2023-12-21

Similar Documents

Publication Publication Date Title
WO2023193670A1 (zh) 基于事件相机的脉冲神经网络目标跟踪方法及系统
WO2020238560A1 (zh) 视频目标跟踪方法、装置、计算机设备及存储介质
EP3757890A1 (en) Method and device for image processing, method and device for training object detection model
WO2021022983A1 (zh) 图像处理方法和装置、电子设备、计算机可读存储介质
EP4206976A1 (en) Model training method and apparatus, body posture detection method and apparatus, and device and storage medium
CN113052835B (zh) 一种基于三维点云与图像数据融合的药盒检测方法及其检测系统
JP7273129B2 (ja) 車線検出方法、装置、電子機器、記憶媒体及び車両
CN112862877A (zh) 用于训练图像处理网络和图像处理的方法和装置
CN110532959B (zh) 基于双通道三维卷积神经网络的实时暴力行为检测系统
CN111882581B (zh) 一种深度特征关联的多目标跟踪方法
WO2023155387A1 (zh) 多传感器目标检测方法、装置、电子设备以及存储介质
WO2022061850A1 (zh) 点云运动畸变修正方法和装置
CN115375581A (zh) 基于事件时空同步的动态视觉事件流降噪效果评价方法
CN113762267A (zh) 一种基于语义关联的多尺度双目立体匹配方法及装置
CN113256683B (zh) 目标跟踪方法及相关设备
WO2024099068A1 (zh) 基于图像的速度确定方法、装置、设备及存储介质
CN114037087A (zh) 模型训练方法及装置、深度预测方法及装置、设备和介质
WO2021098554A1 (zh) 一种特征提取方法、装置、设备及存储介质
WO2024051591A1 (zh) 用于估算视频旋转的方法、装置、电子设备和存储介质
CN112884803A (zh) 基于dsp的实时智能监控目标检测方法及装置
CN112183431A (zh) 实时行人数量统计方法、装置、相机和服务器
CN112288817B (zh) 基于图像的三维重建处理方法及装置
WO2022017129A1 (zh) 目标对象检测方法、装置、电子设备及存储介质
CN111008555B (zh) 一种无人机图像弱小目标增强提取方法
CN109492755B (zh) 图像处理方法、图像处理装置和计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23758206

Country of ref document: EP

Kind code of ref document: A1