CN113298231A - Graph representation space-time back propagation algorithm for impulse neural network - Google Patents
- Publication number: CN113298231A
- Application number: CN202110548714.6A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks
- G06N3/044 Recurrent networks, e.g. Hopfield networks
- G06N3/045 Combinations of networks
- G06N3/061 Physical realisation, i.e. hardware implementation of neural networks using biological neurons, e.g. biological neurons connected to an integrated circuit
- G06N3/084 Backpropagation, e.g. using gradient descent
Abstract
The invention relates to the technical field of spiking (impulse) neural networks, and in particular to a graph-representation spatio-temporal back-propagation algorithm for spiking neural networks. The algorithm obtains the spiking neural network through the forward propagation of neurons in a network structure; evaluates the network's error on the task through a loss function; trains the network through error back propagation; and completes the parameter updates of the training process through a neural-network optimization algorithm. The invention improves the accuracy of the spiking neural network through error back propagation, reduces the pulse firing rate through sparse regularization so as to improve energy efficiency under pulse- (event-) driven computation, and, through the graph-representation method, is applicable to the training of a wide range of biologically inspired network structures.
Description
Technical Field
The invention relates to the technical field of impulse neural networks, in particular to a graph representation space-time back propagation algorithm for an impulse neural network.
Background
In recent years, artificial neural networks (ANNs), inspired by biological nervous systems, have developed rapidly and advanced greatly, and are widely used in object detection, face recognition, autonomous driving, speech recognition, translation, and other fields. However, traditional ANNs still lack a faithful simulation of neuronal behavior and of nervous-system structure, so a gap remains between ANNs and living organisms on intelligent tasks such as reasoning and decision making, and their energy efficiency falls far short of the biological brain's.
The spiking neural network (SNN) is known as the third generation of artificial neural network. SNNs have great potential for processing signals rich in spatio-temporal features, owing to their simulation of complex neuronal dynamics and to various structural designs inspired by the functional regions of the biological nervous system. Since SNNs, like biological neural systems, transfer information between neurons through pulses, a neuron need not perform extensive computation when it receives no pulse, maintaining a low resting energy overhead. This pulse- (event-) driven computational feature helps SNNs achieve higher energy efficiency.
Like other ANNs, an SNN must be trained to suit its assigned task. Existing training algorithms fall into three classes: conversion-based algorithms, synaptic plasticity algorithms, and back-propagation algorithms. A conversion-based algorithm converts the parameters of a conventional ANN into an SNN with the same structure; but because an ANN transmitting information as floating-point numbers and an SNN transmitting information as pulses cannot be matched exactly, the converted SNN suffers information loss and reduced network accuracy. Moreover, the conversion-based approach still restricts the SNN to traditional ANN structures and lacks further modeling of biological nervous-system structure. A synaptic plasticity algorithm is a training algorithm based on a physiological phenomenon: it adjusts the synaptic weights, i.e. the parameters of the SNN, according to the relative timing of the pulses before and after each synapse. Synaptic plasticity algorithms suit a wide variety of network structures and require little computation during learning, but the traditional form supports only unsupervised learning, which limits SNN performance to a certain extent. Emerging improved synaptic plasticity algorithms modulate the weight adjustment with a global reward signal to achieve a degree of supervised learning, yet their performance still falls short of back propagation. The back-propagation algorithm, by contrast, adjusts the network parameters precisely through back-propagated errors and thus attains higher network performance. Current back-propagation algorithms for SNNs propagate the error to each network parameter by constructing differentiable back-propagation paths or by approximately substituting the non-differentiable links, and update the parameters by gradient descent and similar methods.
However, similar to the conversion-based methods, existing back-propagation algorithms suit only feedforward structures resembling traditional ANNs and lack support for the varied, complex networks that simulate biological nervous-system structures. Meanwhile, most of these algorithms focus on network accuracy alone and do not explore SNN energy efficiency, so the advantages of SNNs are not fully exploited.
Disclosure of Invention
The present invention aims to overcome the above problems of the prior art by providing a graph-representation spatio-temporal back-propagation algorithm for spiking neural networks, so as to solve the training problem of spiking neural networks with varied, complex structures, and to fully exploit the pulse- (event-) driven characteristic of spiking neural networks to achieve higher energy efficiency.
The above purpose is realized by the following technical scheme:
a graph representation spatiotemporal back propagation algorithm for an impulse neural network, comprising:
obtaining a spiking neural network through network forward propagation of neurons in a network structure;
evaluating the error of the impulse neural network on the task through a loss function;
training the impulse neural network through error back propagation;
and completing parameter updating in the training process through a neural network optimization algorithm.
Further, the network forward propagation of the neuron in the network structure includes the network forward propagation of the neuron in a feed-forward network structure and the network forward propagation of the neuron in a recurrent network structure.
Further, in the feedforward network structure, the forward process of a neuron is:

u_i^t = τ·u_i^{t-1}·f_r(s_i^{t-1}) + Σ_j w_ij·x_j^t + b_i
s_i^t = f_s(u_i^t)

where u_i^t represents the membrane potential of the i-th neuron in the feedforward layer at time t; x_j^t represents the input pulse of the j-th neuron at time t, taking values x ∈ {0, 1}, where 0 denotes no input pulse and 1 an input pulse; s_i^t represents the output pulse of the i-th neuron in the feedforward layer at time t, likewise taking values s ∈ {0, 1}; w_ij represents the synaptic weight from input neuron j to feedforward-layer neuron i, a value of 0 in w indicating the absence of that synapse; b_i represents the bias of the neuron, a value of 0 in b indicating no bias; τ represents the leakage constant of the LIF model, i.e. the rate at which the membrane potential decays per unit time (Δt = 1); U_th represents the pulse firing threshold of the neuron, which emits a pulse when its membrane potential exceeds the threshold;

f_r denotes the reset function of the membrane potential, which drives the potential back to the resting potential (0) after a pulse is emitted; f_s denotes the firing function, which controls whether the neuron emits a pulse:

f_r(s) = 1 - s,  f_s(u) = 1 if u ≥ U_th, 0 otherwise.
further, in the feedforward network structure, the forward process of the neuron can be significantly accelerated by a matrix operation, as follows:
for t=0~T-1 do:
S(:,:,t)m×n×1=U(:,:,t)m×n×1≥Uth
in the formula (I), the compound is shown in the specification,is an input pulseIn the form of a matrix, the dimensions of the matrix are indicated in the corner marks, m denotes the batch size (batch size), ninRepresenting the number of input neurons, and T representing the time step of algorithm operation; calculating and storing the membrane potentialAnd neuronal impulsesIs in matrix form Um×n×TAnd Sm×n×TAnd then S ism×n×TPassing as the output of that layer as the input of the next layer;a matrix of synaptic weights is represented, and,a bias matrix representing one dimension; an as hadamard product, representing a element-by-element multiplication between matrices; u (: t) represents the slicing operation of the matrix.
Further, in the recurrent network structure, the forward process of a neuron is:

u_i^t = τ·u_i^{t-1}·f_r(s_i^{t-1}) + Σ_j w_ij·x_j^t + Σ_k w_ik·s_k^{t-1} + b_i
s_i^t = f_s(u_i^t)

where u_i^t represents the membrane potential of the i-th neuron in the recurrent layer at time t; x_j^t represents the input pulse of the j-th neuron at time t, taking values x ∈ {0, 1}, where 0 denotes no input pulse and 1 an input pulse; s_i^t represents the output pulse of the i-th neuron at time t, likewise taking values s ∈ {0, 1}; w_ij represents the synaptic weight from input neuron j to recurrent-layer neuron i, a value of 0 in w indicating the absence of that synapse; w_ik represents the synaptic weights between the neurons within the layer; b_i represents the bias of the neuron, a value of 0 in b indicating no bias; τ represents the leakage constant of the LIF model, i.e. the rate at which the membrane potential decays per unit time (Δt = 1); U_th represents the pulse firing threshold of the neuron, which emits a pulse when its membrane potential exceeds the threshold;

f_r denotes the reset function of the membrane potential, which drives the potential back to the resting potential (0) after a pulse is emitted; f_s denotes the firing function, which controls whether the neuron emits a pulse:

f_r(s) = 1 - s,  f_s(u) = 1 if u ≥ U_th, 0 otherwise.
further, in the cyclic network structure, the forward process of the neuron can be significantly accelerated by a matrix operation, as follows:
for t=0~T-1 do:
S(:,:,t)m×n×1=U(:,:,t)m×n×1≥Uth
in the formula (I), the compound is shown in the specification,is an input pulseIn the form of a matrix, the dimensions of the matrix are indicated in the corner marks, m denotes the batch size (batch size), ninRepresenting the number of input neurons, and T representing the time step of algorithm operation; calculating and storing the membrane potentialAnd neuronal impulsesIs in matrix form Um×n×TAnd Sm×n×TAnd then S ism×n×TPassing as the output of that layer as the input of the next layer;a matrix of synaptic weights is represented, and,a bias matrix representing one dimension; an as hadamard product, representing a element-by-element multiplication between matrices; u (: t) represents the slicing operation of the matrix; [ | · C]Which represents the combination of the two matrices,the synaptic weight matrix of the input neuron to the recurrent layer neuron and the synaptic weight matrix inside the recurrent layer are merged.
Further, the loss function includes, but is not limited to, a squared loss function, an exponential loss function, or a cross-entropy loss function in the form of a pulse.
Furthermore, sparse regularization is added into the loss function to reduce the pulse release rate of the impulse neural network.
- Further, the neural network optimization algorithm includes, but is not limited to, batch gradient descent, stochastic gradient descent, momentum, Adagrad, Adam, or AdamW.
Further, the behavior of the neurons in the spiking neural network follows the LIF neuron dynamical model and its corresponding variants.
Advantageous effects
The graph-representation spatio-temporal back-propagation algorithm for spiking neural networks improves network accuracy through error back propagation, reduces the pulse firing rate through sparse regularization so as to improve energy efficiency under pulse- (event-) driven computation, and, through the graph-representation method, is applicable to the training of a wide range of biologically inspired network structures.
Drawings
FIG. 1 is a schematic diagram of the network layers of the feedforward structure and the recurrent structure in the graph-representation spatio-temporal back-propagation algorithm for spiking neural networks according to the present invention;
FIG. 2 is a diagram of the forward- and back-propagation processes of the algorithm in a feedforward network structure according to the present invention;
FIG. 3 is a schematic diagram of the forward- and back-propagation processes of the algorithm in a recurrent network structure according to the present invention;
FIG. 4 shows the results of sparse regularization in the graph-representation spatio-temporal back-propagation algorithm for spiking neural networks according to the present invention;
FIG. 5 is an algorithmic flow diagram representing a spatiotemporal back propagation algorithm for an impulse neural network in accordance with the present invention.
Detailed Description
The invention is explained in more detail below with reference to the figures and examples. The described embodiments are only some embodiments of the invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a graph-representation spatio-temporal back-propagation algorithm for spiking neural networks, aiming to improve the accuracy of the spiking neural network while reducing the pulse firing rate to improve energy efficiency. Existing spiking-neural-network learning algorithms fall into three categories: conversion-based algorithms, synaptic plasticity algorithms, and back-propagation algorithms, wherein:
the conversion-based algorithms are limited to traditional network structures and cannot realize spiking neural networks that simulate the biological nervous system;
the synaptic plasticity algorithms are limited in network performance (accuracy) by the local nature of their weight adjustment;
back propagation achieves better network performance, but existing back-propagation algorithms still lack support for more flexible biological nervous-system structures and give no consideration to the pulse firing rate, so the energy efficiency of spiking neural networks cannot be fully exploited and explored.
To solve the above problems, the present invention supports arbitrary network structures and balances network accuracy against pulse firing rate; the scheme is as follows:
obtaining a spiking neural network through network forward propagation of neurons in a network structure;
evaluating the error of the impulse neural network on the task through a loss function;
training the impulse neural network through error back propagation;
and completing parameter updating in the training process through a neural network optimization algorithm.
The scheme is divided into two processes, network forward propagation and error back propagation, and classifies the network structure into a feedforward structure and a recurrent structure, as shown in FIG. 1. Wherein:
in the feedforward structure, only the input neurons connect to the neurons inside the network layer (the feedforward layer);
in the recurrent structure, synapses also exist from internal neurons to other internal neurons.
It is noted that although the feedforward structure is a special case of the recurrent structure, the distinction made in the algorithm helps accelerate the operations relating to the feedforward structure.
For the LIF (Leaky Integrate-and-Fire) neuron dynamical model in a feedforward network structure, the forward process follows the equations:

u_i^t = τ·u_i^{t-1}·f_r(s_i^{t-1}) + Σ_j w_ij·x_j^t + b_i
s_i^t = f_s(u_i^t)

In the above formulas, u_i^t represents the membrane potential of the i-th neuron in the feedforward layer at time t; x_j^t represents the input pulse of the j-th neuron at time t, taking values x ∈ {0, 1}, where 0 denotes no input pulse and 1 an input pulse; s_i^t represents the output pulse of the i-th neuron in the feedforward layer at time t, likewise taking values s ∈ {0, 1}; w_ij represents the synaptic weight from input neuron j to feedforward-layer neuron i, a value of 0 in w indicating the absence of that synapse; b_i represents the bias of the neuron, a value of 0 in b indicating no bias; τ represents the leakage constant of the LIF model, i.e. the rate at which the membrane potential decays per unit time (Δt = 1); U_th represents the pulse firing threshold of the neuron, which emits a pulse when its membrane potential exceeds the threshold;

f_r denotes the reset function of the membrane potential, which drives the potential back to the resting potential (0) after a pulse is emitted; f_s denotes the firing function, which controls whether the neuron emits a pulse:

f_r(s) = 1 - s,  f_s(u) = 1 if u ≥ U_th, 0 otherwise.
the above formula describes the dynamics of the standard LIF model, and other neuron model variants can realize corresponding processes by adjusting the above equation. For example, IF model is a special case of LIF model with τ ═ 1. Thus, the methods provided by the present invention are applicable to, but not limited to, LIF models, and also include a series of derived model variants.
The above process can be significantly accelerated by matrix operations.
The matrix-operation form of the algorithm accepts the input pulses x_j^t in matrix form Xm×nin×T, the dimensions of each matrix being indicated in its subscript; m is the batch size, n_in the number of input neurons, and T the number of time steps of the algorithm; the membrane potentials u and the neuronal pulses s are computed and stored in matrix form Um×n×T and Sm×n×T, and Sm×n×T is passed on as this layer's output to serve as the next layer's input; Wnin×n is the synaptic weight matrix and B1×n a one-dimensional bias matrix. The specific matrix algorithm of the forward process is as follows:

for t=0~T-1 do:
  U(:,:,t)m×n×1 = τ·U(:,:,t-1)m×n×1 ⊙ (1-S(:,:,t-1)m×n×1) + X(:,:,t)m×nin×1·Wnin×n + B1×n
  S(:,:,t)m×n×1 = U(:,:,t)m×n×1 ≥ Uth

The above process is an iterative calculation over time, where ⊙ is the Hadamard product, denoting element-wise multiplication between matrices, and U(:,:,t) denotes a slicing operation on the matrix. Through these equations, the forward process of the feedforward layer can be calculated and accelerated.
The forward process of a recurrent network structure follows the equations:

u_i^t = τ·u_i^{t-1}·f_r(s_i^{t-1}) + Σ_j w_ij·x_j^t + Σ_k w_ik·s_k^{t-1} + b_i
s_i^t = f_s(u_i^t)

These equations differ from the feedforward layer in that synaptic connections exist between the neurons inside the layer, with synaptic weights w_ik. They show that the state of a neuron depends not only on its input pulses and its own historical state, but also on the pulse firing of the other neurons in the same layer. Because connections exist within the network layer, it is called a recurrent layer. In addition, as the formulas show, the feedforward layer is the special case of the recurrent layer with w_ik = 0; in the actual calculation, distinguishing the two layers gives the feedforward layer a more concise computation.
The above process can still benefit from matrix operations, as follows:

for t=0~T-1 do:
  U(:,:,t)m×n×1 = τ·U(:,:,t-1)m×n×1 ⊙ (1-S(:,:,t-1)m×n×1) + [X(:,:,t)m×nin×1 | S(:,:,t-1)m×n×1]·W(nin+n)×n + B1×n
  S(:,:,t)m×n×1 = U(:,:,t)m×n×1 ≥ Uth

In the above, [·|·] denotes the merging of two matrices, and W(nin+n)×n merges the synaptic weight matrix from the input neurons to the recurrent-layer neurons with the synaptic weight matrix inside the recurrent layer. The above is a matrix calculation iterated over time, through which the forward process of the recurrent layer can be calculated and accelerated.
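The recurrent-layer iteration can be sketched in NumPy as follows. The patent merges the input and intra-layer weights into one matrix [W_in | W_rec]; they are kept separate here for clarity (names and sizes are illustrative):

```python
import numpy as np

def recurrent_layer(X, W_in, W_rec, B, tau=0.5, u_th=1.0):
    # X: (m, n_in, T) input pulses; W_in: (n_in, n); W_rec: (n, n); B: (n,).
    # Each neuron also receives the previous step's pulses of its own layer
    # through W_rec, as in the recurrent forward equations above.
    m, _, T = X.shape
    n = W_in.shape[1]
    U = np.zeros((m, n, T))
    S = np.zeros((m, n, T))
    for t in range(T):
        if t > 0:
            leak = tau * U[:, :, t - 1] * (1 - S[:, :, t - 1])
            rec = S[:, :, t - 1] @ W_rec      # intra-layer recurrence
        else:
            leak = rec = 0.0
        U[:, :, t] = leak + X[:, :, t] @ W_in + rec + B
        S[:, :, t] = (U[:, :, t] >= u_th).astype(float)
    return U, S
```

With W_rec = 0 this reduces exactly to the feedforward layer, matching the special-case remark above.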
Through the forward-propagation process shown in FIG. 5, the state of the spiking neural network is calculated layer by layer and time step by time step, yielding the pulse outputs.
The inference process of the spiking neural network is consistent with the above-described process.
The spiking neural network evaluates how well it has learned the task through a loss function, which is divided into two parts: the loss of the classification task and a sparse regularization term.
Taking the classification task as an example, the available loss functions include, but are not limited to, the squared loss function, the point-wise squared loss function, and the Softmax cross-entropy loss function, all in pulse form, specifically:

(1) the squared loss function:

L = 1/2·Σ_{i∈output} ((1/T)·Σ_t s_i^t - Y_i)² + λ·Σ_{i,t} s_i^t

where Y is the class label (label) and i belongs to the output layer of the spiking neural network, i.e. the classification loss is computed only on the output layer; λ is the regularization coefficient, and the regularization term penalizes frequent pulse firing, thereby forcing the pulses to be sparse and encouraging the neural network to express information through more efficient pulse firing.

The partial derivative of the loss function with respect to the output-layer pulses is:

∂L/∂s_i^t = (1/T)·((1/T)·Σ_{t'} s_i^{t'} - Y_i) + λ

Although s_i^t is itself a function of u_i^t, and therefore related to the states at earlier times, this temporal dependence is handled by the back-propagation process and is not considered here.
The error of the output layer is obtained by the above formula, and the error is transmitted to other layers through a back propagation process.
(2) the point-wise squared loss function:

L = 1/2·Σ_{i∈output} Σ_t (s_i^t - Y_i^t)² + λ·Σ_{i,t} s_i^t

The partial derivative of the loss function with respect to the output layer is:

∂L/∂s_i^t = s_i^t - Y_i^t + λ
(3) the Softmax cross-entropy loss function:

L = -Σ_{i∈output} Y_i·log p_i + λ·Σ_{i,t} s_i^t,  with p_i = exp(r_i)/Σ_k exp(r_k) and r_i = (1/T)·Σ_t s_i^t

The partial derivative of the loss function with respect to the output layer is:

∂L/∂s_i^t = (1/T)·(p_i - Y_i) + λ
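The squared loss on output firing rates together with the sparse penalty, as in loss (1) above, can be sketched as follows. The exact regularizer form is not fully legible in the text; an L1 penalty counting every emitted pulse of every layer is assumed here:

```python
import numpy as np

def squared_loss_with_sparsity(S_out, Y, S_all, lam=1e-4):
    # S_out: (m, n_out, T) output pulses; Y: (m, n_out) one-hot labels;
    # S_all: list of the pulse tensors of all layers, each pulse penalized.
    rate = S_out.mean(axis=2)                    # per-neuron firing rate
    task = 0.5 * np.sum((rate - Y) ** 2)         # squared loss on the rates
    reg = lam * sum(s.sum() for s in S_all)      # sparse regularization term
    return task + reg
```

Raising lam pushes the network toward sparser firing, which is the mechanism FIG. 4 evaluates.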
the propagation path of the error back propagation process is opposite to that of the forward process.
For the feedforward layer, the computation graph between the state quantities and parameters of the forward process is shown in FIG. 2, the arrows representing computation paths between states and parameters. Back propagation reverses the paths of this computation graph and solves the partial derivative of the loss function along each forward path; the formula for the corresponding partial-derivative path in the graph is:

∂L/∂u_i^t = (∂L/∂s_i^t)·(∂s_i^t/∂u_i^t) + (∂L/∂u_i^{t+1})·(∂u_i^{t+1}/∂u_i^t)

To establish the back-propagation path, the derivatives of the non-differentiable functions f_s and f_r are calculated approximately, for example with a rectangular window:

∂f_s/∂u ≈ (1/a)·1(|u - U_th| < a/2),  ∂f_r/∂s ≈ -b

a and b are two parameters that control the shape of the approximate derivatives; in general, a = 1 and b = 1 may be taken.
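The surrogate derivative of the firing function can be sketched as follows. The patent's exact surrogate expression is not legible in this text, so a rectangular window of width a around the threshold, a common choice, is assumed:

```python
def surrogate_spike_grad(u, u_th=1.0, a=1.0):
    # Approximate derivative of the firing function f_s(u) = 1[u >= u_th]:
    # a rectangular window of width a centered at the threshold (assumed
    # form; the patent's formula with parameters a, b is not shown here).
    return float(abs(u - u_th) < a / 2) / a
```

During back propagation this function replaces the true (zero-almost-everywhere) derivative of the step function, letting gradients flow through the firing events.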
The partial derivative ∂L/∂s of the loss function with respect to the output layer being known, ∂L/∂u can be obtained through the partial-derivative path:

∂L/∂u_i^t = (∂L/∂s_i^t)·(∂s_i^t/∂u_i^t) + (∂L/∂u_i^{t+1})·(∂u_i^{t+1}/∂u_i^t)

As shown in FIG. 2, the above formula is an iteration backward in time: by iterative calculation the algorithm folds the errors of all times after t into ∂L/∂u_i^t (accounting for the temporal dependence introduced by the reset function), and since the complete error-propagation path is covered by this iteration, lengthy path enumeration is avoided and the calculation is more concise.
Substituting all the formulas into the above expression gives:

∂L/∂u_i^t = (∂L/∂s_i^t)·g'(u_i^t) + (∂L/∂u_i^{t+1})·τ·(1 - s_i^t + α·u_i^t·s_i^t·g'(u_i^t))

where g' denotes the approximate derivative of f_s and α is a constant determined by the approximate derivative of the reset function. Since u_i^t is a function of w_ij and b_i, the partial derivatives of the error function with respect to w_ij and b_i can further be obtained:

∂L/∂w_ij = Σ_t (∂L/∂u_i^t)·x_j^t,  ∂L/∂b_i = Σ_t ∂L/∂u_i^t
∂L/∂w and ∂L/∂b are the gradients of the parameters of the spiking neural network, with which the network parameters can be updated through optimization algorithms such as SGD, Adam, and AdamW for the learning and training of the network. The partial derivative ∂L/∂x with respect to the input pulses serves as the error function passed on to the previous layer, so the error can be propagated back through the entire network layer by layer. It should be noted that the error of each non-output layer needs to be corrected by the sparse regularization term:

∂L/∂s_i^t ← ∂L/∂s_i^t + λ
the back propagation calculation process can simplify the calculation process through matrix operation and fully utilize calculation resources. Accepted input quantity of algorithmIs composed ofA matrix representation of (a); output ofIs composed ofIn the form of a matrix; and updateAndis composed ofAndthe corresponding matrix of (a). The specific algorithm process is as follows:
ΔUm×n×T = ΔSm×n×T ⊙ G′m×n×T
U′m×n×T = τ·(1-Sm×n×T + α·Um×n×T⊙Sm×n×T⊙G′m×n×T)
for t=T-2~0 do:
ΔU(:,:,t)m×n×1+=ΔU(:,:,t+1)m×n×1⊙U′(:,:,t)m×n×1
where sum (-) represents the summation of the matrix over dimension axis.
By optimizing away the time dependence of the intermediate quantities, the algorithm markedly speeds up back propagation, and thus benefits significantly from matrix-operation libraries designed for large-scale matrix computation (such as Intel's MKL on the CPU) and from computing devices such as GPUs.
For the back-propagation process of the recurrent layer, whose computation graph is shown in FIG. 3, the difference from the feedforward layer is that neurons in the recurrent layer connect to other neurons in the same layer, so an additional error-propagation path exists:

∂u_k^{t+1}/∂s_i^t = w_ki
thus, errors are also propagated through the membrane potential of a neuron in the layer to the impulses of other neurons in the layer. The derivative of the error function to membrane potential is calculated as:
the above formula is still an inversion with respect to time, and error propagation at all times is accounted for by iterative calculationsIn (1). Substituting into all formulas to obtainIs calculated as:
the other layers except the output layer still need to be corrected by a sparse regular term when error is propagated layer by layer:
the above-described correlation calculation process can still be accelerated by matrix operations. An algorithm receives inputOutput ofAnd performing (a) onAndthe gradient of (2) is updated.
It should be noted that the above-mentioned materials,combining input neurons into neurons in layers and synaptic weight gradients between layers, the matrix operation of the above process can be expressed as follows:
U′_{m×n×T} = τ·(1 − S_{m×n×T} + α·U_{m×n×T} ⊙ S_{m×n×T} ⊙ G′_{m×n×T})
for t = T−2~0 do:
    ΔU(:,:,t)_{m×n×1} += ΔU(:,:,t+1)_{m×n×1} ⊙ U′(:,:,t)_{m×n×1} + ΔU(:,:,t+1)_{m×n×1} ⊙ W(:,−n:)_{1×n×n} ⊙ G′(:,:,t)_{m×n×1}
sum(·, axis=[0,3])
The calculation of the intermediate quantity is likewise optimized for the back propagation of the recurrent layer, and the algorithm achieves a significant acceleration on devices such as CPUs and GPUs.
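A corresponding sketch for the recurrent layer adds the intra-layer error path to the loop body above. Treating that extra term as a matrix product with an intra-layer weight block W_rec is an assumption on my part, since the broadcast form of the ⊙ notation in the formula is ambiguous:

```python
import numpy as np

def recurrent_backward(dU, Uprime, Gp, W_rec):
    """Time-reversed error accumulation for a recurrent layer.

    dU, Uprime, Gp: (m, n, T) arrays; W_rec: (n, n) intra-layer weights.
    """
    T = dU.shape[2]
    for t in range(T - 2, -1, -1):
        # Error arriving through intra-layer synapses, scaled by the
        # surrogate spike derivative (assumed matrix-product form).
        rec = (dU[:, :, t + 1] @ W_rec.T) * Gp[:, :, t]
        dU[:, :, t] += dU[:, :, t + 1] * Uprime[:, :, t] + rec
    return dU
```

With W_rec set to zero this reduces exactly to the feed-forward accumulation, which matches the text's statement that the recurrent layer only adds an extra error path.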
Through the error back propagation process shown in fig. 5, the gradients of the parameters of the impulse neural network are calculated layer by layer in reverse time order. In the network training process, the parameters may then be updated by a network optimization algorithm such as SGD, Adam or AdamW.
Taking SGD as an example, the parameter update satisfies the following equations:
w ← w − α·(∂L/∂w)
b ← b − α·(∂L/∂b)
where w and b represent all synaptic weights and biases in the network, whose gradients have been calculated by the above process, and α is the learning-rate constant of the training process (a variable under a dynamic learning-rate strategy).
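A minimal sketch of the SGD rule just described, assuming the parameters and their precomputed gradients are held in plain dictionaries (the layout and function name are illustrative):

```python
def sgd_step(params, grads, lr):
    """Plain SGD: p <- p - lr * gradient, applied to every parameter."""
    for name in params:
        params[name] = params[name] - lr * grads[name]
    return params
```

A dynamic learning-rate strategy, as the text notes, would simply vary lr between calls.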
On the one hand, the method provides a more flexible back propagation training method for impulse neural network structures; on the other hand, it balances the accuracy and the pulse firing rate of the impulse neural network.
As shown in fig. 4, by adjusting the sparse regularization term, the pulse firing rate can be greatly reduced at the cost of only a small loss in accuracy. On pulse (event)-driven hardware platforms, such as dedicated spiking neural network hardware or brain-inspired computing accelerators, a lower firing rate incurs a smaller computational overhead, so the energy efficiency of the spiking neural network is improved while high accuracy is maintained.
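The trade-off can be made concrete with a small sketch, assuming the sparse regularization term penalizes the mean firing rate of the spike tensor and that λ is a tunable weight (both assumptions for illustration):

```python
import numpy as np

def regularized_loss(task_loss, spikes, lam):
    """Task loss plus a sparse regularization term on the firing rate.

    spikes: (m, n, T) binary spike tensor; lam: trade-off weight.
    """
    # Mean firing rate: fraction of (sample, neuron, step) entries that fired.
    firing_rate = spikes.mean()
    return task_loss + lam * firing_rate
```

Increasing lam pushes the optimizer toward sparser spiking, trading a small accuracy loss for lower event counts, as fig. 4 illustrates.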
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A graph representation spatio-temporal back propagation algorithm for an impulse neural network, characterized by:
obtaining a spiking neural network through network forward propagation of neurons in a network structure;
evaluating the error of the impulse neural network on the task through a loss function;
training the impulse neural network through error back propagation;
and completing parameter updating in the training process through a neural network optimization algorithm.
2. A graph representation spatiotemporal back propagation algorithm for spiking neural networks according to claim 1, characterized by the network forward propagation of the neurons in the network structure, comprising the network forward propagation of the neurons in a feed-forward network structure and the network forward propagation of the neurons in a recurrent network structure.
3. A graph representation spatiotemporal back propagation algorithm for an impulse neural network as claimed in claim 2, wherein in said feed forward network structure the forward course of said neurons is as follows:
in the formula, u_i^t represents the membrane potential of the ith neuron in the feedforward layer at time t; x_j^t represents the input pulse of the jth neuron at time t, taking a value x ∈ {0, 1}, where 0 denotes no input pulse and 1 denotes an input pulse; s_i^t represents the output pulse of the ith neuron in the feedforward layer at time t, likewise taking a value s ∈ {0, 1} to indicate the absence or presence of an output pulse; w_ij represents the synaptic weight from input neuron j to feedforward-layer neuron i, a value of 0 in w indicating the absence of that synapse; b_i represents the bias of a neuron, a value of 0 in b indicating that the neuron has no bias; τ represents the leakage constant of the LIF model, i.e. the rate of decrease of the membrane potential per unit time (Δt = 1); U_th represents the pulse firing threshold of a neuron, which fires a pulse when its membrane potential exceeds the threshold;
the reset function of the membrane potential controls the membrane potential to drop to the resting potential (0 potential) after a pulse is fired; the firing function controls whether the neuron fires a pulse, with the specific function values as follows:
4. A graph representation spatiotemporal back propagation algorithm for an impulse neural network as claimed in claim 3, wherein in said feed-forward network structure the forward process of said neurons can also be significantly accelerated by matrix operations as follows:
for t = 0~T−1 do:
    S(:,:,t)_{m×n×1} = U(:,:,t)_{m×n×1} ≥ U_th
in the formula, the matrix form of the input pulses is X_{m×n_in×T}, with the matrix dimensions indicated in the subscripts: m denotes the batch size, n_in denotes the number of input neurons, and T denotes the number of time steps of the algorithm; the membrane potentials and neuron pulses are calculated and stored in matrix form as U_{m×n×T} and S_{m×n×T}, and S_{m×n×T} is then passed on as the output of the layer to serve as the input of the next layer; W represents the synaptic weight matrix and b a one-dimensional bias matrix; ⊙ is the Hadamard product, representing element-by-element multiplication between matrices; U(:,:,t) represents a slicing operation on the matrix.
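A sketch of the looped forward pass of claim 4, assuming a standard discrete LIF update with hard reset to the resting potential; the exact update equation of the claim is not reproduced in this text, so the update rule here is an assumption:

```python
import numpy as np

def lif_forward(X, W, b, tau, U_th):
    """Vectorized LIF forward pass for a feed-forward layer.

    X: (m, n_in, T) input spikes; W: (n_in, n) weights; b: (n,) bias.
    Returns membrane potentials U and output spikes S, both (m, n, T).
    """
    m, n_in, T = X.shape
    n = W.shape[1]
    U = np.zeros((m, n, T))
    S = np.zeros((m, n, T))
    u = np.zeros((m, n))                   # membrane potential, starts at rest
    for t in range(T):
        u = tau * u + X[:, :, t] @ W + b   # leak plus synaptic input
        s = (u >= U_th).astype(float)      # fire when threshold is reached
        U[:, :, t], S[:, :, t] = u, s      # store matrix forms per the claim
        u = u * (1.0 - s)                  # hard reset to resting potential 0
    return U, S
```

S would then be passed on as the layer's output, as the claim describes.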
5. A graph representation spatiotemporal back propagation algorithm for an impulse neural network as claimed in claim 2, wherein in said recurrent network structure the forward process of said neurons is as follows:
in the formula, u_i^t represents the membrane potential of the ith neuron in the recurrent layer at time t; x_j^t represents the input pulse of the jth neuron at time t, taking a value x ∈ {0, 1}, where 0 denotes no input pulse and 1 denotes an input pulse; s_i^t represents the output pulse of the ith neuron in the recurrent layer at time t, likewise taking a value s ∈ {0, 1} to indicate the absence or presence of an output pulse; w_ij represents the synaptic weight from input neuron j to recurrent-layer neuron i, a value of 0 in w indicating the absence of that synapse; b_i represents the bias of a neuron, a value of 0 in b indicating that the neuron has no bias; τ represents the leakage constant of the LIF model, i.e. the rate of decrease of the membrane potential per unit time (Δt = 1); U_th represents the pulse firing threshold of a neuron, which fires a pulse when its membrane potential exceeds the threshold; w_ik represents the synaptic weights between neurons within the layer;
the reset function of the membrane potential controls the membrane potential to drop to the resting potential (0 potential) after a pulse is fired; the firing function controls whether the neuron fires a pulse, with the specific function values as follows:
6. A graph representation spatiotemporal back propagation algorithm for spiking neural networks according to claim 5, characterized in that the forward process of the neurons in the recurrent network structure is also significantly accelerated by matrix operations as follows:
for t = 0~T−1 do:
    S(:,:,t)_{m×n×1} = U(:,:,t)_{m×n×1} ≥ U_th
in the formula, the matrix form of the input pulses is X_{m×n_in×T}, with the matrix dimensions indicated in the subscripts: m denotes the batch size, n_in denotes the number of input neurons, and T denotes the number of time steps of the algorithm; the membrane potentials and neuron pulses are calculated and stored in matrix form as U_{m×n×T} and S_{m×n×T}, and S_{m×n×T} is then passed on as the output of the layer to serve as the input of the next layer; W represents the synaptic weight matrix and b a one-dimensional bias matrix; ⊙ is the Hadamard product, representing element-by-element multiplication between matrices; U(:,:,t) represents a slicing operation on the matrix; [·|·] represents the merging of two matrices, whereby the synaptic weight matrix from the input neurons to the recurrent-layer neurons and the synaptic weight matrix inside the recurrent layer are merged.
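The merged-matrix idea of claim 6 can be sketched as a single time step: the input spikes and the layer's previous output spikes are concatenated and multiplied by the stacked weight matrix in one product (the names and the hard-reset update rule are illustrative assumptions):

```python
import numpy as np

def recurrent_lif_step(x_t, s_prev, u, W_merged, b, tau, U_th):
    """One recurrent LIF step using the merged weight matrix.

    W_merged stacks input-to-layer weights (n_in rows) on top of the
    intra-layer weights (n rows); x_t is (m, n_in), s_prev and u are (m, n).
    """
    inp = np.concatenate([x_t, s_prev], axis=1)  # (m, n_in + n), the [·|·] merge
    u = tau * u + inp @ W_merged + b             # one product covers both paths
    s = (u >= U_th).astype(float)                # fire at threshold
    u_reset = u * (1.0 - s)                      # hard reset after firing
    return u, s, u_reset
```

The merge replaces two separate matrix products (inputs and intra-layer spikes) with one larger product, which is what makes the recurrent forward pass as vectorizable as the feed-forward one.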
7. A graph representation spatio-temporal back propagation algorithm for impulse neural networks according to claim 1, characterized in that the loss function includes but is not limited to a squared loss function, an exponential loss function or a cross entropy loss function in the form of impulses.
8. The graph representation spatiotemporal back propagation algorithm for an impulse neural network of claim 7, wherein sparse regularization is further added to the loss function to reduce the pulse firing rate of the impulse neural network.
9. A graph representation spatio-temporal back propagation algorithm for an impulse neural network as claimed in claim 1, characterized in that the neural network optimization algorithm includes but is not limited to batch gradient descent, stochastic gradient descent, momentum, AdaGrad, Adam or AdamW.
10. A graph representation spatiotemporal back propagation algorithm for an impulse neural network as claimed in claim 1, wherein the neuron behavior in said impulse neural network follows LIF neuron dynamical models and their corresponding variants.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110548714.6A CN113298231A (en) | 2021-05-19 | 2021-05-19 | Graph representation space-time back propagation algorithm for impulse neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113298231A true CN113298231A (en) | 2021-08-24 |
Family
ID=77322896
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110548714.6A Pending CN113298231A (en) | 2021-05-19 | 2021-05-19 | Graph representation space-time back propagation algorithm for impulse neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113298231A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723594A (en) * | 2021-08-31 | 2021-11-30 | 绍兴市北大信息技术科创中心 | Impulse neural network target identification method |
CN113723594B (en) * | 2021-08-31 | 2023-12-05 | 绍兴市北大信息技术科创中心 | Pulse neural network target identification method |
CN113792857A (en) * | 2021-09-10 | 2021-12-14 | 中国人民解放军军事科学院战争研究院 | Impulse neural network training method based on membrane potential self-increment mechanism |
CN113792857B (en) * | 2021-09-10 | 2023-10-20 | 中国人民解放军军事科学院战争研究院 | Pulse neural network training method based on membrane potential self-increasing mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210824 |