CN114037047A - Training method of impulse neural network - Google Patents

Training method of impulse neural network

Info

Publication number
CN114037047A
Authority
CN
China
Prior art keywords
neural network
training
convolutional neural
impulse
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111177498.5A
Other languages
Chinese (zh)
Inventor
邹承明
范振锋
曾炜
常峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202111177498.5A
Publication of CN114037047A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a training method for an impulse (spiking) neural network. A target convolutional neural network is obtained, the target convolutional neural network being a convolutional neural network trained in advance; the target convolutional neural network is converted into an initial impulse neural network; and the initial impulse neural network is trained simultaneously in the time domain and the space domain, the trained initial impulse neural network being taken as the target impulse neural network. According to the invention, after the convolutional neural network is converted into the impulse neural network, the impulse neural network is further trained in the time domain and the space domain, so that its information transmission capability in both domains can be further optimized. This solves the problem that an SNN obtained by the existing ANN-to-SNN training method requires a large number of time steps to complete one forward inference.

Description

Training method of impulse neural network
Technical Field
The invention relates to the field of deep learning, and in particular to a training method for an impulse neural network.
Background
There are various training methods for impulse neural networks; among them, the ANN-to-SNN training method has received much attention because it performs well in terms of both accuracy and network scale. The ANN-to-SNN training method works as follows: an artificial neural network is first trained and then converted into an SNN with the same network structure. This approach not only avoids the difficulties faced when training an SNN directly, but the converted SNN also shows only a minimal performance gap with respect to the ANN and can be applied to large-scale network structures and data sets. However, the current ANN-to-SNN training method has the problem that the converted SNN requires a large number of time steps to complete one forward inference, resulting in additional latency and energy consumption that run contrary to the original purpose.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a training method for a spiking neural network, aiming at solving the problem that the SNN obtained by the existing ANN-to-SNN training method requires a large number of time steps to complete one forward inference.
The technical scheme adopted by the invention for solving the problems is as follows:
in a first aspect, an embodiment of the present invention provides a method for training a spiking neural network, where the method includes:
acquiring a target convolutional neural network, wherein the target convolutional neural network is a convolutional neural network trained in advance;
converting the target convolutional neural network into an initial impulse neural network;
and simultaneously training the initial impulse neural network in a time domain and a space domain, and taking the trained initial impulse neural network as a target impulse neural network.
In one embodiment, the obtaining the target convolutional neural network comprises:
acquiring a convolutional neural network;
acquiring original training data, inputting a training image in the original training data into the convolutional neural network, and generating a prediction vector corresponding to the training image through the convolutional neural network;
updating parameters of the convolutional neural network according to the label vector corresponding to the training image and the prediction vector, and continuing to execute the step of inputting the training image in the original training data into the convolutional neural network until the training is finished;
and taking the convolutional neural network after training as the target convolutional neural network.
In one embodiment, the obtaining a convolutional neural network comprises:
acquiring a standard convolutional neural network;
determining structural information corresponding to the standard impulse neural network;
clipping the standard convolutional neural network according to the structural information to obtain a clipped convolutional neural network;
and taking the clipped convolutional neural network as the convolutional neural network.
In one embodiment, the clipping the standard convolutional neural network according to the structural information comprises:
and adding an abs function layer at the back of the input end of the standard convolutional neural network according to the structural information, setting the bias of each convolutional layer and the full-connection layer of the standard convolutional neural network to zero, adjusting each activation function in the standard convolutional neural network to be a ReLU activation function, and adjusting the maximum pooling layer in the standard convolutional neural network to be a spatial linear down-sampling layer.
In one embodiment, the converting the target convolutional neural network into an initial impulse neural network comprises:
adding a pulse generation layer behind the input end of the target convolutional neural network, and adding a pulse counting layer in front of the output end of the target convolutional neural network;
and taking the added target convolutional neural network as the initial impulse neural network.
In one embodiment, the taking the target convolutional neural network with the added pulse generation layer and pulse counting layer as the initial impulse neural network comprises:
taking the target convolutional neural network with the added layers as a weight-initialized impulse neural network, and replacing each neuron in the weight-initialized impulse neural network with a leaky integrate-and-fire neuron;
and taking the weight-initialized impulse neural network after the replacement as the initial impulse neural network.
In one embodiment, the training the initial spiking neural network in the time domain and the spatial domain simultaneously, and taking the trained initial spiking neural network as the target spiking neural network includes:
inputting a training image in the original training data into the initial impulse neural network, and generating an output vector corresponding to the training image through the initial impulse neural network;
according to the label vector corresponding to the training image and the output vector, parameter updating based on a time domain and a space domain is carried out on the initial impulse neural network at the same time, and the step of inputting the training image in the original training data into the initial impulse neural network is continuously carried out until the training is finished;
and taking the initial impulse neural network after training as the target impulse neural network.
In one embodiment, the performing, according to the label vector and the output vector corresponding to the training image, a parameter update based on a time domain and a spatial domain on the initial spiking neural network at the same time includes:
determining a loss function according to the label vector corresponding to the training image and the output vector;
according to the loss function, simultaneously carrying out back propagation of a space domain and back propagation of a time domain on the initial impulse neural network;
and updating parameters of the initial impulse neural network through the back propagation of the space domain and the back propagation of the time domain.
In one embodiment, the parameter updating of the initial spiking neural network by the back propagation of the spatial domain and the back propagation of the time domain comprises:
determining a target gradient corresponding to each neuron in the initial impulse neural network through the back propagation of the spatial domain and the back propagation of the time domain;
determining a target weight value corresponding to each neuron according to a target gradient corresponding to each neuron;
and updating the weight value of each neuron according to the target weight value corresponding to each neuron.
In a second aspect, an embodiment of the present invention further provides a spiking neural network, wherein the spiking neural network is obtained by training with any one of the above-mentioned training methods for the spiking neural network.
In a third aspect, an embodiment of the present invention further provides a training apparatus for a spiking neural network, where the apparatus includes:
the convolutional neural network determining module is used for acquiring a target convolutional neural network, wherein the target convolutional neural network is a convolutional neural network trained in advance;
the neural network conversion module is used for converting the target convolutional neural network into an initial impulse neural network;
and the impulse neural network training module is used for training the initial impulse neural network in a time domain and a space domain at the same time, and taking the trained initial impulse neural network as a target impulse neural network.
In a fourth aspect, an embodiment of the present invention further provides a terminal, where the terminal includes a memory and one or more processors; the memory stores one or more programs; the program comprises instructions for performing a method of training a spiking neural network as described in any one of the above; the processor is configured to execute the program.
In a fifth aspect, the present invention further provides a computer-readable storage medium, on which a plurality of instructions are stored, wherein the instructions are adapted to be loaded and executed by a processor to implement any of the steps of the training method for a spiking neural network described above.
The invention has the beneficial effects that: a target convolutional neural network is obtained, the target convolutional neural network being a convolutional neural network trained in advance; the target convolutional neural network is converted into an initial impulse neural network; and the initial impulse neural network is trained simultaneously in the time domain and the space domain, the trained initial impulse neural network being taken as the target impulse neural network. According to the invention, after the convolutional neural network is converted into the impulse neural network, the impulse neural network is further trained in the time domain and the space domain, so that its information transmission capability in both domains can be further optimized. This solves the problem that an SNN obtained by the existing ANN-to-SNN training method requires a large number of time steps to complete one forward inference.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a training method of a spiking neural network according to an embodiment of the present invention.
Fig. 2 is a flow chart of information transfer and back propagation of a single neuron according to an embodiment of the present invention.
Fig. 3 is a structural diagram of a classical three-layer convolutional neural network provided in an embodiment of the present invention.
Fig. 4 is a structural diagram of a trimmed three-layer convolutional neural network according to an embodiment of the present invention.
Fig. 5 is a connection diagram of internal modules of a training apparatus for a spiking neural network according to an embodiment of the present invention.
Fig. 6 is a functional block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and back … …) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.
The spiking neural network (SNN) is regarded as a new generation of neural network and has attracted wide attention from researchers due to its rich spatiotemporal dynamics, event-driven operation, and other characteristics. Spiking neural networks originated in computational neuroscience, and their cross-fusion with artificial neural networks oriented toward computer science, represented by deep convolutional neural networks, is considered a promising route for developing artificial intelligence. A spiking neural network more closely simulates a biological neural network, is more biologically plausible, and operates on asynchronous discrete events (spikes), which makes implementations on neuromorphic hardware more energy efficient.
The training process of an SNN is much more complex than that of a conventional convolutional neural network (CNN). One reason is that an SNN has complex spatiotemporal dynamics: a spiking neuron not only receives information from the spiking neurons in the previous layer, but is also influenced by its own state at the previous time step, whereas a conventional CNN only transmits information along the spatial dimension, so an SNN can transmit more information than a conventional CNN. Another factor is the event-driven characteristic: a spiking neuron releases a spike only after its membrane potential accumulates to a specified threshold, and this threshold-triggered firing makes the spike activity non-differentiable, which directly increases the complexity of training a spiking neural network.
There are various training methods for impulse neural networks; among them, the ANN-to-SNN training method has received much attention because it performs well in terms of both accuracy and network scale. The ANN-to-SNN training method works as follows: an artificial neural network is first trained and then converted into an SNN with the same network structure. This approach not only avoids the difficulties faced when training an SNN directly, but the converted SNN also shows only a minimal performance gap with respect to the ANN and can be applied to large-scale network structures and data sets. However, the current ANN-to-SNN training method has the problem that the converted SNN requires a large number of time steps to complete one forward inference, resulting in additional latency and energy consumption that run contrary to the original purpose.
To address the above defects of the prior art, the invention provides a training method for an impulse neural network: a target convolutional neural network is obtained, the target convolutional neural network being a convolutional neural network trained in advance; the target convolutional neural network is converted into an initial impulse neural network; and the initial impulse neural network is trained simultaneously in the time domain and the space domain, the trained initial impulse neural network being taken as the target impulse neural network. According to the invention, after the convolutional neural network is converted into the impulse neural network, the impulse neural network is further trained in the time domain and the space domain, so that its information transmission capability in both domains can be further optimized. This solves the problem that an SNN obtained by the existing ANN-to-SNN training method requires a large number of time steps to complete one forward inference.
As shown in fig. 1, the method comprises the steps of:
step S100, obtaining a target convolutional neural network, wherein the target convolutional neural network is a convolutional neural network trained in advance.
Specifically, the present embodiment follows the conventional ANN-to-SNN approach in that an artificial neural network first needs to be trained and is then converted into an SNN with the same network structure, thereby avoiding the difficulty (complex spatiotemporal dynamics) faced when training the SNN directly.
In one implementation, the step S100 specifically includes the following steps:
s101, acquiring a convolutional neural network;
step S102, acquiring original training data, inputting a training image in the original training data into the convolutional neural network, and generating a prediction vector corresponding to the training image through the convolutional neural network;
step S103, updating parameters of the convolutional neural network according to the label vector corresponding to the training image and the prediction vector, and continuing to perform the step of inputting the training image in the original training data into the convolutional neural network until the training is finished;
and step S104, taking the convolutional neural network after training as the target convolutional neural network.
In particular, the convolutional neural network in this embodiment is untrained. In order to train it into the target convolutional neural network, a certain number of samples, that is, the original training data, are prepared in advance in this embodiment. It can be understood that the original training data includes a plurality of training images, and each training image has a corresponding label vector, that is, its ground-truth label. During training, a training image from the original training data is input into the convolutional neural network, which performs inference on the input image and outputs the corresponding inference result, that is, a prediction vector. Because an untrained convolutional neural network does not infer accurately, there is usually a large difference between the output inference result and the true result, that is, between the prediction vector and the label vector. Therefore, the network parameters of the convolutional neural network are adjusted by comparing the prediction vector with the label vector and using the difference to guide the parameter update. Other training images in the original training data are then continuously fed into the adjusted convolutional neural network so that its parameters are adjusted repeatedly, until the difference between the prediction vector output by the adjusted convolutional neural network and the corresponding label vector is smaller than a preset threshold, which indicates that training is finished; the trained convolutional neural network is the target convolutional neural network in this embodiment.
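By way of illustration, the pre-training of step S100 described above can be sketched as a conventional supervised loop. The sketch below assumes a PyTorch environment; the names model and loader, the MSE criterion and the hyper-parameters are illustrative placeholders for the clipped convolutional neural network, the original training data and the actual loss and settings used.

```python
import torch
import torch.nn as nn

def pretrain_cnn(model, loader, epochs=10, lr=0.01):
    """Sketch of step S100: supervised pre-training of the CNN on the original training data."""
    criterion = nn.MSELoss()                          # compares the prediction vector with the label vector
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for image, label_vec in loader:               # label_vec: one-hot "label vector"
            pred_vec = model(image)                   # "prediction vector" produced by the CNN
            loss = criterion(pred_vec, label_vec)
            optimizer.zero_grad()
            loss.backward()                           # spatial-domain back propagation only
            optimizer.step()
    return model
```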
In one implementation, the step S101 specifically includes the following steps:
step S1011, acquiring a standard convolutional neural network;
step S1012, determining structural information corresponding to the standard impulse neural network;
s1013, clipping the standard convolutional neural network according to the structural information to obtain a clipped convolutional neural network;
and step S1014, taking the cutting convolutional neural network as the convolutional neural network.
In short, the untrained convolutional neural network in this embodiment is actually a clipped convolutional neural network. Specifically, a standard convolutional neural network is first obtained; for example, it may be a conventional convolutional neural network such as VGG16 or ResNet50. The standard convolutional neural network is then clipped according to the structural information corresponding to the standard impulse neural network, so that the clipped network meets the requirements of the impulse neural network and is closer to it in structural characteristics. This clipped convolutional neural network is then used as the convolutional neural network to be trained.
In one implementation, the clipping the standard convolutional neural network according to the structural information specifically includes the following steps:
step S10131, adding an abs function layer after the input end of the standard convolutional neural network according to the structural information, setting the bias of each convolutional layer and each fully-connected layer of the standard convolutional neural network to zero, adjusting each activation function in the standard convolutional neural network to a ReLU activation function, and adjusting the maximum pooling layer in the standard convolutional neural network to a spatial linear down-sampling layer.
Briefly, the clipping of the standard convolutional neural network in this embodiment mainly includes four steps: 1) adding an abs function layer after the input end of the standard convolutional neural network, i.e. after the preprocessing layer and before the first convolutional layer, so that all inputs are converted into non-negative values; 2) zeroing the bias of all convolutional and fully-connected layers in the standard convolutional neural network, since a bias can be positive or negative and negative values are difficult to represent in an impulse neural network; 3) replacing every activation function in the standard convolutional neural network with a ReLU activation function, because the activation functions in the standard convolutional neural network are usually Tanh functions, which can produce negative activation values that are difficult to represent in an impulse neural network, so that after the replacement all activation values become non-negative; 4) adjusting the maximum pooling layer in the standard convolutional neural network to a spatial linear down-sampling layer.
For example, fig. 3 shows the network A selected as the standard convolutional neural network in this embodiment; it is a typical three-layer convolutional neural network and was used for classification on the Neovision2 Tower data set. Network A consists of three convolution blocks, each comprising three layers: the first layer is a spatial convolution layer consisting of a series of convolution kernels, the second layer is a tanh() activation function, and the third layer is a max-pooling layer. It should be noted that the last block does not use a pooling layer in its final position; a fully-connected layer is used instead. The whole of network A is trained using a standard back-propagation algorithm. The clipping process for network A is:
1. an abs () layer is added before the first convolutional layer of the convolutional neural network.
2. And setting the bias of all convolution layers and all connection layers of the convolution neural network to be zero.
3. And replacing the sigmoid activation function of the convolutional neural network with a ReLU activation function.
4. The maximum pooling in the convolutional neural network is converted to spatial linear sub-sampling.
The clipped convolutional neural network obtained through the above steps, that is, network A', is shown in fig. 4. For network A', the activation function is converted from tanh to ReLU, which speeds up training and convergence and effectively avoids vanishing gradients during training; the input before the first convolutional layer passes through an abs() operation, and together with the conversion of the activation functions this ensures that all values transmitted in the network are non-negative, thereby avoiding the problem that negative values are difficult to represent in an impulse neural network.
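A minimal sketch of the clipping described above is given below, under the assumption that the standard convolutional neural network is expressed as a PyTorch nn.Sequential; average pooling is used here as one possible realization of the spatial linear down-sampling layer, and the class and function names are illustrative.

```python
import torch
import torch.nn as nn

class Abs(nn.Module):
    """abs() layer inserted before the first convolution so that all inputs are non-negative."""
    def forward(self, x):
        return torch.abs(x)

def clip_cnn(net: nn.Sequential) -> nn.Sequential:
    """Sketch of the four clipping steps applied to a standard CNN such as network A."""
    layers = [Abs()]                                    # 1) add an abs layer after the input
    for m in net:
        if isinstance(m, (nn.Conv2d, nn.Linear)) and m.bias is not None:
            nn.init.zeros_(m.bias)                      # 2) zero all biases
            m.bias.requires_grad_(False)
        if isinstance(m, (nn.Tanh, nn.Sigmoid)):
            m = nn.ReLU()                               # 3) replace activations with ReLU
        if isinstance(m, nn.MaxPool2d):
            m = nn.AvgPool2d(m.kernel_size, m.stride)   # 4) max-pool -> spatial linear down-sampling
        layers.append(m)
    return nn.Sequential(*layers)
```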
As shown in fig. 1, the method further comprises the steps of:
and step S200, converting the target convolutional neural network into an initial impulse neural network.
In particular, the final goal of this embodiment is to obtain an impulse neural network. Therefore, the trained target convolutional neural network needs to be converted into an impulse neural network. Because the performance of the network immediately after conversion is poor, the impulse neural network obtained by the initial conversion is taken as the initial impulse neural network, and its performance then needs to be optimized before it is used.
In one implementation, the step S200 specifically includes the following steps:
step S201, adding a pulse generation layer behind the input end of the target convolutional neural network, and adding a pulse counting layer in front of the output end of the target convolutional neural network;
and S202, taking the added target convolutional neural network as the initial impulse neural network.
Specifically, in order to convert the convolutional neural network into an impulse neural network, this embodiment adds a pulse generation layer after the input end of the target convolutional neural network, since the pulse generation layer converts the input image into a pulse sequence that meets the data format required by an impulse neural network, and adds a pulse counting layer before the output end of the target convolutional neural network to count the pulse sequence finally output by the network. After these additions, the initial impulse neural network is obtained.
In one implementation, the pulse generation layer is a Poisson pulse generator.
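For illustration, the two added layers can be sketched as follows, assuming a PyTorch environment, input pixel values normalized to [0, 1] and a time window of T steps; the class names are illustrative.

```python
import torch
import torch.nn as nn

class PoissonSpikeGen(nn.Module):
    """Pulse generation layer: turns a normalized image into a spike train of length T."""
    def __init__(self, T: int):
        super().__init__()
        self.T = T
    def forward(self, image):                        # image values assumed in [0, 1]
        # one Bernoulli draw per time step; the firing probability equals the pixel intensity
        return (torch.rand(self.T, *image.shape) < image).float()

class SpikeCounter(nn.Module):
    """Pulse counting layer: accumulates the output spikes over the time window T."""
    def forward(self, spikes):                       # spikes: (T, batch, classes)
        return spikes.sum(dim=0)                     # spike counts form the output vector
```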
In an implementation manner, the step S202 specifically includes the following steps:
step S2021, taking the target convolutional neural network with the added layers as a weight-initialized impulse neural network, and replacing each neuron in the weight-initialized impulse neural network with a leaky integrate-and-fire neuron;
step S2022, taking the weight-initialized impulse neural network after the replacement as the initial impulse neural network.
Specifically, the most important difference between an SNN and an ANN is that an SNN uses discrete pulse signals instead of the continuous analog signals that propagate through an ANN. In order to generate pulse signals, this embodiment takes the target convolutional neural network with the added pulse generation layer and pulse counting layer as a weight-initialized impulse neural network; that is, although the weight-initialized impulse neural network has the additional pulse generation and pulse counting layers, its remaining layers still use the weights trained in the target convolutional neural network. The neurons in the weight-initialized impulse neural network are then replaced with leaky integrate-and-fire (LIF) neurons. For an LIF neuron, the input signal directly affects its state (membrane potential), and an output signal is generated only when the membrane potential rises to a threshold potential. Therefore, after the neurons of the weight-initialized impulse neural network are replaced with LIF neurons, the initial impulse neural network is obtained, and it can generate discrete pulse signals.
By way of example, a general representation of the LIF neuron model is as follows:
τ · du(t)/dt = −u(t) + I(t)
where u(t) is the membrane potential of the neuron at time t, τ is the time constant, and I(t) is the external input potential at time t. When the membrane potential exceeds a given threshold V_th, the neuron fires a pulse and the membrane potential is reset to the resting potential V_rest.
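For illustration, one explicit Euler discretization of these dynamics is sketched below in Python with NumPy; the parameter values of tau, v_th, v_rest and dt are illustrative placeholders rather than values prescribed by the method.

```python
import numpy as np

def lif_step(u, x, tau=2.0, v_th=1.0, v_rest=0.0, dt=1.0):
    """One Euler step of the LIF dynamics tau * du/dt = -u + I(t)."""
    u = u + (dt / tau) * (-u + x)         # leaky integration of the input current
    spike = (u >= v_th).astype(float)     # fire when the membrane potential reaches V_th
    u = np.where(spike > 0, v_rest, u)    # reset fired neurons to the resting potential
    return u, spike
```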
Since the impulse neurons have complex spatiotemporal dynamics, and back propagation must be performed through the initial impulse neural network in both the time domain and the space domain, the neuron model is rewritten in the following iterative format:
x_i^{t+1, n} = Σ_{j=1}^{l(n−1)} w_{ij}^n · o_j^{t+1, n−1}
u_i^{t+1, n} = u_i^{t, n} · f(o_i^{t, n}) + x_i^{t+1, n}
o_i^{t+1, n} = g(u_i^{t+1, n})
g(u) = 1 if u ≥ V_th, and g(u) = 0 otherwise
In the above formulas, t denotes the time step, n denotes the n-th layer and l(n) the number of neurons in the n-th layer, w_ij is the synaptic weight from pre-synaptic neuron j in layer n−1 to post-synaptic neuron i in layer n, o_j = 1 indicates that a pulse is emitted and o_j = 0 indicates that no pulse is emitted, x_i is the accumulated input, u_i is the membrane potential, f(·) is the leak-and-reset factor applied to the membrane potential of the previous time step, and g(·) is the firing function.
In addition, the problem of threshold balance needs to be considered. The higher the ratio of the threshold to the weights, the longer a neuron needs to accumulate its membrane potential before firing, which reduces the firing rate. On the other hand, a threshold that is too low causes the SNN to lose its representational ability, because the membrane-potential integration process is effectively skipped. That is, one case is over-activation and the other is under-activation, and both affect the conversion accuracy, so an appropriate threshold is important. In this work, a threshold balancing method is used to normalize the network weights: the threshold is set to a normalization factor equal to the maximum output of the corresponding convolutional or linear layer.
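A minimal sketch of one common data-based realization of such threshold balancing is given below, under the assumption that each convolutional or linear layer's normalization factor is estimated as its maximum activation on a calibration batch and that the converted layers are held in a PyTorch nn.Sequential; the function and argument names are illustrative.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def threshold_balance(model: nn.Sequential, calib_batch):
    """Data-based threshold balancing: divide each conv/linear layer's weights by the
    maximum activation it produces on a calibration batch, so that a unit firing
    threshold is neither reached too easily (over-activation) nor too rarely
    (under-activation)."""
    x = calib_batch
    for m in model:
        x = m(x)
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            scale = torch.relu(x).max().clamp(min=1e-8)   # maximum output of this layer
            m.weight.div_(scale)                          # normalize the weights by that factor
            x = x / scale                                 # propagate the normalized signal onward
```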
As shown in fig. 1, the method further comprises the steps of:
and S300, training the initial impulse neural network in a time domain and a space domain at the same time, and taking the trained initial impulse neural network as a target impulse neural network.
In particular, the initial impulse neural network has complex spatiotemporal dynamics. Although it is converted from a trained convolutional neural network, that convolutional neural network was trained mainly in the space domain, so the performance of the initial impulse neural network in the time domain is poor: the number of time steps required to complete one forward inference is very large, which produces additional latency and energy consumption. In order to improve its performance, this embodiment trains the initial impulse neural network once more, with the goal of improving its information transmission capability in both the time domain and the space domain. After this training is finished, the target impulse neural network is obtained, and this network can be applied directly for inference.
In one implementation, the step S300 specifically includes the following steps:
step S301, inputting a training image in the original training data into the initial impulse neural network, and generating an output vector corresponding to the training image through the initial impulse neural network;
step S302, according to the label vector corresponding to the training image and the output vector, updating parameters of the initial impulse neural network based on a time domain and a space domain at the same time, and continuing to execute the step of inputting the training image in the original training data into the initial impulse neural network until the training is finished;
and step S303, taking the initial impulse neural network after training as the target impulse neural network.
In particular, the training process for the initial impulse neural network still uses the original training data used when training the convolutional neural network. During training, a training image from the original training data is input into the initial impulse neural network; since the network contains a pulse generation layer, the training image is converted into a corresponding pulse sequence. Because the initial impulse neural network adopts LIF neurons in the iterative form, the pulse sequence is first convolved by the first layer and then passes through the activation stage, where it is judged whether the membrane potential of each neuron reaches the threshold; if it does, a pulse is generated and passed to the next layer. This process is repeated for all layers of the initial impulse neural network that contain LIF neurons. After the transmitted pulse signal passes the activation stage of the last layer, no further pulses are generated; the features are linearly transformed by a fully-connected layer, and finally the pulse sequence output by the initial impulse neural network is counted over a time window T to obtain the output vector corresponding to the training image.
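The forward pass over the time window T described above can be sketched as follows; the sketch assumes the network is organized as a list of blocks, each applying a convolution or linear transform followed by the iterative LIF update and returning its updated membrane state, and all names are illustrative.

```python
import torch

def snn_forward(image, spike_gen, blocks, counter, T):
    """Sketch of one forward pass of the initial impulse neural network: encode the image
    into a spike train, run the iterative LIF dynamics for T time steps, and count the
    output spikes over the time window."""
    spike_train = spike_gen(image)               # (T, ...) from the pulse generation layer
    out_spikes = []
    states = [None] * len(blocks)                # per-block membrane potentials
    for t in range(T):
        x = spike_train[t]
        for i, block in enumerate(blocks):       # conv/linear transform + iterative LIF update
            x, states[i] = block(x, states[i])
        out_spikes.append(x)
    return counter(torch.stack(out_spikes))      # spike counts form the output vector
```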
Because the initial impulse neural network is converted from the convolutional neural network, its performance in the time domain and the space domain is still insufficient, and there is a gap between its output inference result and the true result. The label vector corresponding to the training image reflects the true inference result, so by comparing the label vector with the output vector, the information-transmission error of the initial impulse neural network in the time domain and the space domain can be calculated, and the network parameters are then updated simultaneously in both domains. The adjusted initial impulse neural network is then iteratively trained in the time domain and the space domain on the other training images in the original training data, until the difference between its output vector and the corresponding label vector is smaller than a preset threshold. After training is finished, the target impulse neural network is obtained.
In an implementation manner, the performing, according to the label vector and the output vector corresponding to the training image, parameter update based on a time domain and a space domain on the initial impulse neural network at the same time specifically includes the following steps:
step S3021, determining a loss function according to the label vector corresponding to the training image and the output vector;
step S3022, according to the loss function, performing back propagation in a spatial domain and back propagation in a time domain on the initial impulse neural network at the same time;
and step S3023, updating parameters of the initial impulse neural network through the back propagation of the space domain and the back propagation of the time domain.
In brief, training of the initial impulse neural network is divided into two stages: a forward propagation stage and a back propagation stage. In the forward propagation stage, a training image is input into the initial impulse neural network to obtain the output vector corresponding to the training image; in the back propagation stage, the response error of each neuron of the initial impulse neural network is calculated in turn from back to front. In order to train the initial impulse neural network in the time domain and the space domain simultaneously, this embodiment first determines a loss function from the label vector and the output vector, and then performs back propagation simultaneously in the time domain and the space domain (as shown in fig. 2), thereby updating the parameters of the initial impulse neural network and improving the accuracy of its information transmission in both domains.
In one implementation manner, the step S3023 specifically includes the following steps:
step S30231, determining a target gradient corresponding to each neuron in the initial spiking neural network through the back propagation of the spatial domain and the back propagation of the time domain;
step S30232, determining a target weight value corresponding to each neuron according to a target gradient corresponding to each neuron;
step S30233, updating a weight value of each neuron according to the target weight value corresponding to each neuron.
Specifically, since the value of the loss function reflects the information-transmission error of the initial impulse neural network in the time domain and the space domain, this embodiment performs back propagation in the space domain and in the time domain guided by the loss value, and determines in turn, through these two back propagations, the target gradient corresponding to each neuron in the initial impulse neural network. Because the weight gradient of a neuron is directly related to its weight value, once the gradient of each neuron has been set to its target gradient, the target weight value corresponding to each neuron can be calculated from that gradient, and the weight value of each neuron is updated accordingly.
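As an illustration, the two-stage procedure above amounts to unrolling the converted network over the T time steps and letting automatic differentiation carry the loss back through both the layer dimension (space domain) and the time dimension. The sketch below assumes a PyTorch environment, a network exposing a forward pass over T steps as in the earlier sketch, and illustrative optimizer and hyper-parameter choices.

```python
import torch
import torch.nn as nn

def finetune_snn(snn, loader, T=20, epochs=5, lr=1e-3):
    """Sketch of step S300: fine-tune the converted network so that errors are
    back-propagated through both the spatial domain (layers) and the time domain
    (the unrolled T time steps)."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(snn.parameters(), lr=lr)
    for _ in range(epochs):
        for image, label_vec in loader:
            out_vec = snn(image, T=T) / T    # averaged spike counts as the output vector
            loss = criterion(out_vec, label_vec)
            optimizer.zero_grad()
            loss.backward()                  # back propagation through space and time
            optimizer.step()
    return snn
```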
By way of example, the loss function of the initial spiking neural network may be expressed as:
L = (1 / 2S) · Σ_{s=1}^{S} || y_s − o_s ||²
where S represents the number of samples, y_s represents the label vector, and o_s represents the output vector generated by a voting mechanism in the last layer over the time window T. Combining this with the LIF neuron model, L is a function of the weights w, so the gradients can be calculated and the parameters updated according to a back-propagation algorithm based on both time and space;
Gradient updates based on the time domain and the space domain are performed through the loss function L. The general formula of the gradient update is
W^n ← W^n − η · ∂L/∂W^n
where η is the learning rate. To calculate ∂L/∂W^n, one first needs to calculate ∂L/∂o_i^{t,n} and ∂L/∂u_i^{t,n}, whose expressions follow from the chain rule along both the spatial path (from layer n to layer n+1) and the temporal path (from time step t to time step t+1):
∂L/∂o_i^{t,n} = Σ_{j=1}^{l(n+1)} (∂L/∂u_j^{t,n+1}) · (∂u_j^{t,n+1}/∂o_i^{t,n}) + (∂L/∂u_i^{t+1,n}) · (∂u_i^{t+1,n}/∂o_i^{t,n})
∂L/∂u_i^{t,n} = (∂L/∂o_i^{t,n}) · (∂o_i^{t,n}/∂u_i^{t,n}) + (∂L/∂u_i^{t+1,n}) · (∂u_i^{t+1,n}/∂u_i^{t,n})
Therefore, the error at each neuron combines a contribution propagated back through the space domain and a contribution propagated back through the time domain. As can be seen from the iterative format of the LIF neurons, the accumulated input satisfies
∂x_i^{t,n}/∂w_{ij}^n = o_j^{t,n−1}
so the gradient formula for W is
∂L/∂W^n = Σ_{t=1}^{T} (∂L/∂u^{t,n}) · (o^{t,n−1})^T
in one implementation, due to the non-differentiable nature of the impulse neurons, the derivative of impulse activity is approximated with a function h (u):
h(u) = (1/a) · sign(|u − V_th| < a/2)
where a is a width parameter that determines the shape of the approximation. Further, the non-differentiable term is replaced by this approximation:
∂o_i^{t,n}/∂u_i^{t,n} ≈ h(u_i^{t,n})
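A minimal sketch of this surrogate-gradient firing function is shown below as a custom autograd operation; the rectangular window h(u) stands in for the derivative of the firing step only during the backward pass, and the threshold and width values in the usage comment are illustrative.

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Firing function with a rectangular surrogate gradient h(u): the Heaviside step is
    used in the forward pass, and h(u) replaces its derivative in the backward pass."""
    @staticmethod
    def forward(ctx, u, v_th, a):
        ctx.save_for_backward(u)
        ctx.v_th, ctx.a = v_th, a
        return (u >= v_th).float()

    @staticmethod
    def backward(ctx, grad_out):
        u, = ctx.saved_tensors
        h = (torch.abs(u - ctx.v_th) < ctx.a / 2).float() / ctx.a   # rectangular window of width a
        return grad_out * h, None, None                             # no gradient for v_th or a

# illustrative usage inside an LIF layer:
# spikes = SpikeFn.apply(membrane_potential, 1.0, 1.0)
```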
the finally trained target impulse neural network can be directly put into application, and can be used in information processing as a traditional artificial neural network. Since the target spiking neural network is more realistic, it can be used to learn the operation of the biological nervous system.
The invention has the advantages that:
1. The traditional impulse neural network training methods have the following limitations: (1) an impulse neural network is difficult to train directly, and most directly trained impulse neural networks are limited to shallow networks (fewer than 4 layers); for example, the SpikeProp algorithm only supports a single-layer impulse neural network; (2) the spike-timing-dependent plasticity rule only considers local neuron activity, so it is difficult to reach high performance; (3) most traditional training methods are based on information transmission in the space domain and ignore information transmission along the time dimension, which is a performance bottleneck of traditional impulse neural networks during training.
2. The method avoids problems such as the long training time and slow convergence of directly training an impulse neural network, and a large impulse neural network can be constructed by the conversion method; in addition, the training method is guided by a global error function and can achieve global optimization of the parameters;
3. The trained and clipped artificial neural network is used to initialize the impulse neural network, and the converted impulse neural network is then incrementally trained through back propagation in the space domain and the time domain, so that information transmission in the time domain and the space domain is fully utilized;
4. In traditional training methods, whether the impulse neural network is obtained by direct training or by a conversion method, only information transmission in the space domain is considered, so more time steps are needed during inference; the present invention fully utilizes both the time domain and the space domain, and the number of time steps needed in the inference stage is reduced by a factor of about 10-15.
Based on the above embodiment, the present invention further provides a spiking neural network, wherein the spiking neural network is obtained by training using the above-mentioned spiking neural network training method.
Based on the above embodiment, the present invention further provides a training apparatus for a spiking neural network, as shown in fig. 5, the apparatus includes:
the convolutional neural network determining module 01 is used for acquiring a target convolutional neural network, wherein the target convolutional neural network is a convolutional neural network trained in advance;
a neural network conversion module 02, configured to convert the target convolutional neural network into an initial impulse neural network;
and the impulse neural network training module 03 is configured to train the initial impulse neural network in a time domain and a space domain at the same time, and use the trained initial impulse neural network as a target impulse neural network.
Based on the above embodiments, the present invention further provides a terminal, and a schematic block diagram thereof may be as shown in fig. 6. The terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. Wherein the processor of the terminal is configured to provide computing and control capabilities. The memory of the terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the terminal is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a training method of a spiking neural network. The display screen of the terminal can be a liquid crystal display screen or an electronic ink display screen.
It will be appreciated by those skilled in the art that the block diagram of fig. 6 is only a block diagram of a portion of the structure associated with the inventive arrangements and does not constitute a limitation of the terminal to which the inventive arrangements are applied, and that a particular terminal may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one implementation, one or more programs are stored in a memory of the terminal and configured to be executed by one or more processors, the one or more programs including instructions for performing a training method of an impulse neural network.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
In summary, the present invention discloses a training method for a spiking neural network: a target convolutional neural network is obtained, the target convolutional neural network being a convolutional neural network trained in advance; the target convolutional neural network is converted into an initial impulse neural network; and the initial impulse neural network is trained simultaneously in the time domain and the space domain, the trained initial impulse neural network being taken as the target impulse neural network. According to the invention, after the convolutional neural network is converted into the impulse neural network, the impulse neural network is further trained in the time domain and the space domain, so that its information transmission capability in both domains can be further optimized. This solves the problem that an SNN obtained by the existing ANN-to-SNN training method requires a large number of time steps to complete one forward inference.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (13)

1. A method of training an impulse neural network, the method comprising:
acquiring a target convolutional neural network, wherein the target convolutional neural network is a convolutional neural network trained in advance;
converting the target convolutional neural network into an initial impulse neural network;
and simultaneously training the initial impulse neural network in a time domain and a space domain, and taking the trained initial impulse neural network as a target impulse neural network.
2. The method for training the spiking neural network according to claim 1, wherein the obtaining the target convolutional neural network comprises:
acquiring a convolutional neural network;
acquiring original training data, inputting a training image in the original training data into the convolutional neural network, and generating a prediction vector corresponding to the training image through the convolutional neural network;
updating parameters of the convolutional neural network according to the label vector corresponding to the training image and the prediction vector, and continuing to execute the step of inputting the training image in the original training data into the convolutional neural network until the training is finished;
and taking the convolutional neural network after training as the target convolutional neural network.
3. The method for training the spiking neural network according to claim 2, wherein the obtaining the convolutional neural network comprises:
acquiring a standard convolutional neural network;
determining structural information corresponding to the standard impulse neural network;
clipping the standard convolutional neural network according to the structural information to obtain a clipped convolutional neural network;
and taking the clipped convolutional neural network as the convolutional neural network.
4. The method for training the spiking neural network according to claim 3, wherein the clipping the standard convolutional neural network according to the structural information comprises:
and adding an abs function layer at the back of the input end of the standard convolutional neural network according to the structural information, setting the bias of each convolutional layer and the full-connection layer of the standard convolutional neural network to zero, adjusting each activation function in the standard convolutional neural network to be a ReLU activation function, and adjusting the maximum pooling layer in the standard convolutional neural network to be a spatial linear down-sampling layer.
5. The method of training the spiking neural network according to claim 1, wherein the converting the target convolutional neural network into an initial spiking neural network comprises:
adding a pulse generation layer behind the input end of the target convolutional neural network, and adding a pulse counting layer in front of the output end of the target convolutional neural network;
and taking the added target convolutional neural network as the initial impulse neural network.
6. The method for training the spiking neural network according to claim 5, wherein the taking the target convolutional neural network with the added pulse generation layer and pulse counting layer as the initial spiking neural network comprises:
taking the target convolutional neural network with the added layers as a weight-initialized spiking neural network, and replacing each neuron in the weight-initialized spiking neural network with a leaky integrate-and-fire neuron;
and taking the weight-initialized spiking neural network after the replacement as the initial spiking neural network.
7. The method for training the spiking neural network according to claim 2, wherein the training the initial spiking neural network in the time domain and the space domain simultaneously, and the training the initial spiking neural network as the target spiking neural network comprises:
inputting a training image in the original training data into the initial impulse neural network, and generating an output vector corresponding to the training image through the initial impulse neural network;
according to the label vector corresponding to the training image and the output vector, parameter updating based on a time domain and a space domain is carried out on the initial impulse neural network at the same time, and the step of inputting the training image in the original training data into the initial impulse neural network is continuously carried out until the training is finished;
and taking the initial impulse neural network after training as the target impulse neural network.
8. The method for training the impulse neural network according to claim 7, wherein the performing the parameter update based on the time domain and the spatial domain simultaneously on the initial impulse neural network according to the label vector corresponding to the training image and the output vector comprises:
determining a loss function according to the label vector corresponding to the training image and the output vector;
according to the loss function, simultaneously carrying out back propagation of a space domain and back propagation of a time domain on the initial impulse neural network;
and updating parameters of the initial impulse neural network through the back propagation of the space domain and the back propagation of the time domain.
9. The method for training the spiking neural network according to claim 8, wherein the parameter updating of the initial spiking neural network through the back propagation of the spatial domain and the back propagation of the time domain comprises:
determining a target gradient corresponding to each neuron in the initial impulse neural network through the back propagation of the spatial domain and the back propagation of the time domain;
determining a target weight value corresponding to each neuron according to a target gradient corresponding to each neuron;
and updating the weight value of each neuron according to the target weight value corresponding to each neuron.
10. An impulse neural network, characterized in that the impulse neural network is trained by the training method of the impulse neural network according to any one of claims 1 to 9.
11. An apparatus for training a spiking neural network, the apparatus comprising:
the convolutional neural network determining module is used for acquiring a target convolutional neural network, wherein the target convolutional neural network is a convolutional neural network trained in advance;
the neural network conversion module is used for converting the target convolutional neural network into an initial impulse neural network;
and the impulse neural network training module is used for training the initial impulse neural network in a time domain and a space domain at the same time, and taking the trained initial impulse neural network as a target impulse neural network.
12. A terminal, comprising a memory and one or more processors; the memory stores one or more programs; the program includes instructions for performing a method of training a spiking neural network according to any of claims 1-9; the processor is configured to execute the program.
13. A computer readable storage medium having stored thereon a plurality of instructions adapted to be loaded and executed by a processor to perform the steps of the method for training a spiking neural network according to any of claims 1-9.
CN202111177498.5A 2021-10-09 2021-10-09 Training method of impulse neural network Pending CN114037047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111177498.5A CN114037047A (en) 2021-10-09 2021-10-09 Training method of impulse neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111177498.5A CN114037047A (en) 2021-10-09 2021-10-09 Training method of impulse neural network

Publications (1)

Publication Number Publication Date
CN114037047A true CN114037047A (en) 2022-02-11

Family

ID=80141070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111177498.5A Pending CN114037047A (en) 2021-10-09 2021-10-09 Training method of impulse neural network

Country Status (1)

Country Link
CN (1) CN114037047A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114723038A (en) * 2022-03-03 2022-07-08 北京大学 Model conversion method, device, equipment and storage medium based on target detection
CN114723038B (en) * 2022-03-03 2024-04-23 北京大学 Model conversion method, device, equipment and storage medium based on target detection
CN114997235A (en) * 2022-06-13 2022-09-02 脉冲视觉(北京)科技有限公司 Target detection processing method, device, equipment and medium based on pulse signal
CN114998996A (en) * 2022-06-14 2022-09-02 中国电信股份有限公司 Signal processing method, device and equipment with motion attribute information and storage
CN114998996B (en) * 2022-06-14 2024-04-05 中国电信股份有限公司 Signal processing method, device and equipment with motion attribute information and storage
CN115238857A (en) * 2022-06-15 2022-10-25 脉冲视觉(北京)科技有限公司 Neural network based on pulse signal and pulse signal processing method
CN114861892A (en) * 2022-07-06 2022-08-05 深圳时识科技有限公司 Chip on-loop agent training method and device, chip and electronic device
CN114861892B (en) * 2022-07-06 2022-10-21 深圳时识科技有限公司 Chip on-loop agent training method and device, chip and electronic device
CN117037287A (en) * 2023-10-08 2023-11-10 武汉理工大学 Behavior recognition method, system and device based on 3D impulse neural network
CN117037287B (en) * 2023-10-08 2023-12-29 武汉理工大学 Behavior recognition method, system and device based on 3D impulse neural network

Similar Documents

Publication Publication Date Title
CN114037047A (en) Training method of impulse neural network
CN108805270B (en) Convolutional neural network system based on memory
US11308392B2 (en) Fixed-point training method for deep neural networks based on static fixed-point conversion scheme
CN112633497B (en) Convolutional impulse neural network training method based on re-weighted membrane voltage
Alaloul et al. Data processing using artificial neural networks
US9256823B2 (en) Apparatus and methods for efficient updates in spiking neuron network
Rumelhart et al. The basic ideas in neural networks
US9129221B2 (en) Spiking neural network feedback apparatus and methods
CN110428042B (en) Reciprocally scaling neuron connection weights and input values to defeat hardware limitations
Burney et al. Levenberg-Marquardt algorithm for Karachi Stock Exchange share rates forecasting
Kumar et al. Deep Learning as a Frontier of Machine Learning: A
EP3889846A1 (en) Deep learning model training method and system
JP7240650B2 (en) Spiking neural network system, learning processing device, learning processing method and program
CN111382840B (en) HTM design method based on cyclic learning unit and oriented to natural language processing
Murugan Learning the sequential temporal information with recurrent neural networks
CN115017178A (en) Training method and device for data-to-text generation model
CN107292322B (en) Image classification method, deep learning model and computer system
Kozlova et al. The use of neural networks for planning the behavior of complex systems
CN113902092A (en) Indirect supervised training method for impulse neural network
CN113298231A (en) Graph representation space-time back propagation algorithm for impulse neural network
JP3374476B2 (en) Neural network construction method
Yao et al. EPNet for chaotic time-series prediction
Dao Image classification using convolutional neural networks
Tareen et al. Convolutional neural networks for beginners
Hu et al. Time series prediction with a weighted bidirectional multi-stream extended Kalman filter

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination