WO2022206193A1 - Spiking neural network circuit and spiking neural network-based calculation method - Google Patents

Spiking neural network circuit and spiking neural network-based calculation method

Info

Publication number
WO2022206193A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
neuron
row
neural network
neurons
Prior art date
Application number
PCT/CN2022/076269
Other languages
French (fr)
Chinese (zh)
Inventor
张子阳
刘涛
王侃文
廖健行
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110588707.9A external-priority patent/CN115169523A/en
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP22778378.4A priority Critical patent/EP4283522A1/en
Publication of WO2022206193A1 publication Critical patent/WO2022206193A1/en
Priority to US18/475,262 priority patent/US20240013037A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of image processing, and more particularly, to a spiking neural network circuit and a computing method based on the spiking neural network.
  • The spiking neural network is often referred to as the third-generation artificial neural network; in its information processing method and biological model, it is closer to a real biological processing system than traditional artificial neural networks.
  • Information is transmitted between neurons in a spiking neural network in the form of pulses.
  • the occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage.
  • the membrane voltage of each neuron changes with the input pulse.
  • the neuron will be activated to generate a new signal (for example, firing a pulse), and transmit the signal to other neurons connected to it.
  • Related spiking neural network circuits are relatively inefficient when calculating the membrane voltage of neurons.
  • the present application provides a spiking neural network circuit and a computing method based on the spiking neural network, and the spiking neural network circuit can improve computing efficiency.
  • A spiking neural network circuit is provided, including a plurality of decompression modules and a calculation module.
  • The multiple decompression modules are used to obtain, according to the information of multiple input neurons, multiple weight values in the compressed weight matrix and the identifiers of the corresponding multiple output neurons. Each of the multiple decompression modules is used to obtain, in parallel, the weight values of a same row in the compressed weight matrix and the identifiers of the multiple output neurons corresponding to those weight values. The number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron.
  • the calculation module is used to determine the corresponding membrane voltages of the plurality of output neurons according to the plurality of weight values.
  • Each of the multiple decompression modules in the spiking neural network circuit obtains, in parallel, the weight values of a same row in the compressed weight matrix and the identifiers of the multiple output neurons corresponding to those weight values. In this way, the multiple decompression modules decompress in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing delay and power consumption.
  • The input neurons of the spiking neural network circuit include a first input neuron and a second input neuron, and the plurality of decompression modules include a first decompression module and a second decompression module. The first decompression module is used to obtain the first-row weight values corresponding to the first input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values; the second decompression module is used to obtain the second-row weight values corresponding to the second input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values.
  • The first decompression module is specifically configured to: obtain, from a first storage space, the base address at which the first-row weight values are stored, where the first storage space stores the base address of each row of weight values in the compressed weight matrix and the number of non-zero weight values in each row; and obtain, from a second storage space according to the base address of the first-row weight values, the first-row weight values and the identifiers of the output neurons respectively corresponding to them, where the second storage space stores the first-row weight values and the identifiers of those output neurons.
  • The spiking neural network circuit further includes a compression module configured to prune part of the weight values in the initial weight matrix according to the pruning ratio to obtain the compressed weight matrix.
  • The compressed weight matrix includes multiple weight groups, and the number of non-zero weight values in each row of each weight group is the same.
  • The computing module includes multiple computing sub-modules, and each computing sub-module is responsible for computing, in parallel, the membrane voltages of the output neurons in one weight group.
  • The plurality of calculation sub-modules include a first calculation sub-module and a second calculation sub-module. The first calculation sub-module includes a first accumulation engine and a first calculation engine, and the second calculation sub-module includes a second accumulation engine and a second calculation engine.
  • The first accumulation engine is used to determine the weight accumulation values corresponding to the output neurons in the first weight group corresponding to the first calculation sub-module, and the first calculation engine is configured to determine, according to the weight accumulation values output by the first accumulation engine, the membrane voltages of the output neurons in the first weight group at the current moment.
  • The second accumulation engine is used to determine the weight accumulation values corresponding to the output neurons in the second weight group corresponding to the second calculation sub-module, and the second calculation engine is used to determine, according to the weight accumulation values output by the second accumulation engine, the membrane voltages of the output neurons in the second weight group at the current moment.
  • A computing method based on a spiking neural network includes: obtaining, according to the information of multiple input neurons, multiple weight values in the compressed weight matrix and the identifiers of the corresponding multiple output neurons, where the multiple weight values include weight values of a same row in the compressed weight matrix obtained in parallel, and the identifiers of the multiple output neurons include the identifiers, obtained in parallel, of the output neurons corresponding to the weight values of that same row; the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and determining, according to the multiple weight values, the membrane voltages of the corresponding multiple output neurons.
  • The input neurons of the spiking neural network circuit include a first input neuron and a second input neuron, and obtaining the weight values in the compressed weight matrix includes: obtaining the first-row weight values corresponding to the first input neuron and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values, and obtaining the second-row weight values corresponding to the second input neuron and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values.
  • Before the multiple weight values in the compressed weight matrix and the identifiers of the corresponding multiple output neurons are obtained, the method further includes: pruning part of the weight values in the initial weight matrix according to the pruning ratio to obtain the compressed weight matrix.
  • The compressed weight matrix includes multiple weight groups, and the number of non-zero weight values in each row of each weight group is the same.
  • the membrane voltages of the corresponding multiple output neurons are determined in parallel according to multiple weight values in each weight group.
  • The multiple weight groups include a first weight group and a second weight group. The weight accumulation values corresponding to the output neurons in the first weight group are determined, and the membrane voltages of the output neurons in the first weight group at the current moment are determined according to those weight accumulation values; the weight accumulation values corresponding to the output neurons in the second weight group are determined, and the membrane voltages of the output neurons in the second weight group at the current moment are determined according to those weight accumulation values.
  • A spiking neural network system includes a memory and a neural network circuit according to the first aspect or any one of the possible implementations of the first aspect, where the memory is used to store a plurality of compressed weight values.
  • the memory is further configured to store information of a plurality of input neurons.
  • A spiking neural network system includes a processor and a neural network circuit according to the first aspect or any one of the possible implementations of the first aspect, where the processor includes an input buffer, and the input buffer is used for caching the information of the plurality of input neurons.
  • The system further includes a memory for storing the compressed multiple weight values.
  • an apparatus for determining the membrane voltage of a spiking neuron comprising a communication interface and a processor.
  • the processor is configured to control the communication interface to send and receive information
  • the processor is connected to the communication interface, and is configured to execute the method for determining the membrane voltage of the spiking neuron in the second aspect or any possible implementation manner of the second aspect.
  • The processor may be a general-purpose processor and may be implemented by hardware or by software.
  • When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, and the memory may be integrated in the processor or located outside the processor and exist independently.
  • A computer program product includes computer program code which, when run on a computing device, causes the computing device to execute the method of the second aspect or any one of the possible implementations of the second aspect.
  • A computer-readable medium stores program code which, when executed on a computing device, causes the computing device to execute the method of the second aspect or any one of the possible implementations of the second aspect.
  • These computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drives.
  • Figure 1 shows a schematic diagram of the structure of a spiking neural network.
  • FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a spiking neural network after semi-structured pruning provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of grouping and semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a spiking neural network after grouping semi-structured pruning provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a spiking neural network circuit provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an associated compression weight storage space 240 provided by an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a decompression engine obtaining weight values and corresponding output neurons according to an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of an accumulation engine obtaining weight accumulation values of output neurons according to an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of a calculation engine determining the membrane voltage of a spiking neuron according to an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another spiking neural network circuit provided by an embodiment of the present application.
  • FIG. 15 is a schematic block diagram of a spiking neural network system 1500 provided by an embodiment of the present application.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that responds in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • A neural network is a mathematical or computational model that imitates the structure and function of a biological neural network (the animal central nervous system, especially the brain) and is used to estimate or approximate functions.
  • The biological brain is composed of a large number of neurons connected in different ways; the former (presynaptic) neuron and the latter (postsynaptic) neuron are connected through a synaptic structure for information transmission.
  • The spiking neural network is often referred to as the third-generation artificial neural network; in its information processing method and biological model, it is closer to a real biological processing system than traditional artificial neural networks.
  • On the one hand, the artificial neural network transmits multi-valued signals while the spiking neural network transmits binary pulse information, so its input and output information is sparse and the spiking neural network has low power consumption; on the other hand, the neuron model of the spiking neural network is similar to the neuron model of the brain, has a dynamic accumulation process, and carries one more time dimension than the traditional artificial neural network, so it is more suitable for processing intelligent tasks that involve temporal information.
  • Figure 1 shows a schematic diagram of the structure of a spiking neural network.
  • the spiking neural network can contain three layers: input layer, hidden layer, and output layer.
  • The hidden layer may contain multiple layers; the logic within a layer is parallel, the logic between layers is serial, and the calculation results of the layers are interdependent and affect each other.
  • Fig. 1 takes the example that the hidden layer includes a layer of neurons for illustration.
  • each layer of a spiking neural network may include multiple nodes, each of which is used to simulate a spiking neuron for performing a certain operation, such as an activation function.
  • the connection between the former neuron (which may also be called an input neuron) and the latter neuron (which may also be called an output neuron) is used to simulate a synapse.
  • a synapse is a carrier of information transmission between two neurons, and the weight value of the synapse represents the connection strength between the two neurons.
  • the reference numbers in each node shown in FIG. 1 are only for identifying or distinguishing different nodes.
  • the transmission of information between neurons in a spiking neural network in the form of spikes is based on discrete-valued activities that occur at certain points in time, rather than continuous values.
  • the occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage.
  • the membrane voltage of each neuron changes with the input pulse.
  • the neuron will be activated to generate a new signal (for example, firing a pulse), and transmit the signal to other neurons connected to it.
  • the neurons in the spiking neural network realize the transmission and processing of information through the above methods, and have information processing capabilities such as nonlinearity, self-adaptation, and fault tolerance.
  • one synaptic connection may be used between two neurons in the spiking neural network, or multiple synaptic connections may also be used, which is not specifically limited in this application.
  • Each synapse has a modifiable synaptic weight (also called weight), and multiple pulses transmitted by presynaptic neurons can generate different postsynaptic membrane voltages according to the size of the synaptic weight.
  • Although the spiking neural network is sparse and consumes little power during operation, its accuracy is not high. To improve the accuracy of the network, the number of weights must be large, which makes the weight storage in a spiking neural network chip too large; the area, delay and power consumption of the chip increase accordingly, which restricts the hardware development and commercialization of spiking neural networks. It is therefore of great significance to compress the weights of spiking neural networks.
  • An embodiment of the present application provides a method for compressing the weights of a spiking neural network, which can make the number of non-zero weights in each row, or in each row and each group, of the weight matrix the same, so as to save weight storage resources of the spiking neural network chip.
  • It can also enable parallel decompression and parallel computing at the hardware level of the spiking neural network, increasing the computing speed, thereby improving the computing efficiency at the hardware level and reducing delay and power consumption.
  • FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application. As shown in FIG. 2, the method may include steps 210-270, and the steps 210-270 will be described in detail below respectively.
  • Step 210 Load the pre-trained spiking neural network to obtain initial weights.
  • The initial weight matrix of the hidden layer is shown in Figure 3, where each row in the initial weight matrix corresponds to an input neuron, for example, a neuron of the input layer connected to the hidden layer, and each column corresponds to an output neuron, for example, a neuron in the hidden layer.
  • The weights W11 to W41 in the first column represent the weight values corresponding to neuron No. 1 in the hidden layer in Figure 1; the weights W12 to W42 in the second column represent the weight values corresponding to neuron No. 2; the weights W13 to W43 in the third column represent the weight values corresponding to neuron No. 3; the weights W14 to W44 in the fourth column represent the weight values corresponding to neuron No. 4; the weights W15 to W45 in the fifth column represent the weight values corresponding to neuron No. 5; and the weights W16 to W46 in the sixth column represent the weight values corresponding to neuron No. 6 in the hidden layer in Figure 1.
  • The weights W11 to W16 in the first row represent the weight values corresponding to neuron No. 7 in the input layer in Figure 1; the weights W21 to W26 in the second row correspond to neuron No. 8; the weights W31 to W36 in the third row correspond to neuron No. 9; and the weights W41 to W46 in the fourth row correspond to neuron No. 10 in the input layer in Figure 1.
  • Step 220 Select different weight matrix pruning schemes according to requirements.
  • If the number of non-zero weights in each row of the weight matrix needs to be made the same, the semi-structured pruning in step 230 may be performed; if the weight matrix needs to be grouped and the number of non-zero weights in each row and each group made the same, the grouped semi-structured pruning in step 240 may be performed.
  • Step 230 Semi-structured pruning.
  • The semi-structured pruning in the embodiment of the present application refers to performing weight pruning with each row of the weight matrix as the granularity to obtain a semi-structured pruned weight matrix.
  • Specifically, the weight values of each row in the original weight matrix can be sorted by magnitude, and the last s% (the sparsity) of the sorted weights are then set to 0.
  • In this way, the semi-structured pruned weight matrix is obtained, and the length of each row in the semi-structured pruned weight matrix is the same.
  • the weight value of each row in the initial weight matrix shown in Figure 3 can be sorted, and then the weight value of the last 66.6% of the sorting can be set to 0.
  • the dotted line represents the pruned weight value
  • the weight matrix composed of solid lines is the pruned weight matrix.
  • The number of non-zero weight values in each row is the same (each row includes two weight values).
  • The first row of the semi-structured pruned weight matrix includes two weights, W11 and W14; that is, neuron No. 7 in the input layer in Figure 1 is connected to neuron No. 1 and neuron No. 4 of the hidden layer, and the connection weight values are W11 and W14 respectively.
  • The second row includes two weights, W22 and W26; that is, neuron No. 8 in the input layer in Figure 1 is connected to neuron No. 2 and neuron No. 6 of the hidden layer, and the connection weight values are W22 and W26 respectively.
  • The third row includes two weights, W31 and W35; that is, neuron No. 9 in the input layer in Figure 1 is connected to neuron No. 1 and neuron No. 5 of the hidden layer, and the connection weight values are W31 and W35 respectively.
  • The fourth row includes two weights, W43 and W44; that is, neuron No. 10 in the input layer in Figure 1 is connected to neuron No. 3 and neuron No. 4 of the hidden layer, and the connection weight values are W43 and W44 respectively.
  • the length of each row in the pruned weight matrix is the same, and the number of connections between each neuron in the same layer and the neuron in the next layer is the same.
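  • As a rough software illustration of step 230, the sketch below prunes each row of a small weight matrix to the same number of non-zero values by sorting the row by magnitude and zeroing the smallest s%. It is a minimal sketch: the matrix shape, the sparsity value and the function name are illustrative and not taken from the patent.
```python
import numpy as np

def semi_structured_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Prune each row independently so that every row keeps the same
    number of non-zero weights (semi-structured pruning)."""
    pruned = weights.copy()
    num_cols = weights.shape[1]
    num_keep = num_cols - int(round(num_cols * sparsity))
    for row in pruned:
        order = np.argsort(np.abs(row))          # ascending by magnitude
        row[order[:num_cols - num_keep]] = 0.0   # zero the smallest s%
    return pruned

# 4 input neurons x 6 output neurons; sparsity 66.6% keeps 2 weights per row,
# matching the example of FIG. 4.
w = np.random.default_rng(0).standard_normal((4, 6))
w_pruned = semi_structured_prune(w, 0.666)
assert all(np.count_nonzero(r) == 2 for r in w_pruned)
```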
  • Step 240 Group semi-structured pruning.
  • the grouping semi-structured pruning in the embodiment of the present application refers to dividing each row into several weight groups of equal number, and performing weight pruning with each group in each row of the weight matrix as the granularity to obtain the grouping semi-structure pruned weight matrix.
  • Specifically, with each group in each row of the weight matrix as the granularity, the weight values of each group in each row of the original weight matrix can be sorted by magnitude, and the last s% (the sparsity) of the sorted weights are then set to 0.
  • In this way, the grouped semi-structured pruned weight matrix is obtained, and the lengths of each row and each group in the grouped semi-structured pruned weight matrix are the same.
  • For example, the neurons in the hidden layer shown in Figure 1 can be divided into two groups, each group including three neurons; for example, the first group includes the three neurons numbered 1 to 3, and the second group includes the three neurons numbered 4 to 6.
  • the weight values of each group in each row in the initial weight matrix shown in Figure 3 can be sorted, and then the bottom 66.6% of the sorted weight values are set to 0.
  • the dotted line represents the pruned weight value
  • the weight matrix formed by the solid line is the pruned weight matrix.
  • The number of non-zero weight values in each group of each row is the same (each group includes one weight value); that is, each row and each group in the weight matrix after grouped pruning have the same length.
  • the specific structure of the spiking neural network after grouping semi-structured pruning is shown in Figure 7.
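  • A corresponding sketch of step 240 is given below; it applies the same magnitude-based pruning within each column group of every row, so that every (row, group) block keeps the same number of non-zero weights. The shapes and values are again illustrative only.
```python
import numpy as np

def grouped_semi_structured_prune(weights: np.ndarray, num_groups: int,
                                  sparsity: float) -> np.ndarray:
    """Prune every (row, group) block to the same number of non-zero weights."""
    pruned = weights.copy()
    group_size = weights.shape[1] // num_groups
    num_keep = group_size - int(round(group_size * sparsity))
    for row in pruned:
        for g in range(num_groups):
            block = row[g * group_size:(g + 1) * group_size]  # view into the row
            order = np.argsort(np.abs(block))
            block[order[:group_size - num_keep]] = 0.0
    return pruned

# 6 output neurons split into 2 groups of 3; sparsity 66.6% keeps 1 weight per
# group in every row, matching the example of FIG. 6.
w = np.random.default_rng(1).standard_normal((4, 6))
w_pruned = grouped_semi_structured_prune(w, num_groups=2, sparsity=0.666)
```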
  • Step 250 Calculate a loss function according to the pruned weights.
  • the loss function is used to assist in optimizing the parameters of the spiking neural network by calculating the error between the actual (target) value and the predicted value of the spiking neural network.
  • In this embodiment, the loss function of the spiking neural network can be calculated according to the pruned weights to obtain the error between the actual (target) value and the predicted value of the spiking neural network, so that the pruned weight matrix can be optimized or updated according to that error, thereby assisting in optimizing the parameters of the spiking neural network.
  • Step 260 Retrain the spiking neural network and update the pruned weight matrix.
  • the parameters (weights) of the spiking neural network can be optimized according to the above loss function to minimize the loss of the spiking neural network.
  • the parameters (weights) of a spiking neural network can be optimized using gradient descent, and the pruned weight matrix can be updated to minimize the loss of the neural network.
  • Step 270 Determine whether the spiking neural network has converged.
  • If the spiking neural network has converged, the process ends; if the spiking neural network has not converged, continue to perform step 230 or step 240 until the spiking neural network converges.
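  • Steps 230 to 270 can be pictured as a prune-retrain loop. The sketch below reuses the semi_structured_prune function from the earlier sketch and stands in for the actual spiking-network training with generic loss_fn and grad_fn callables; those callables, the learning rate, the convergence tolerance and the choice to keep pruned positions at zero during the update are assumptions for illustration, not details from the patent.
```python
import numpy as np

def prune_retrain_loop(weights, loss_fn, grad_fn, sparsity,
                       lr=0.01, tol=1e-4, max_iters=1000):
    """Prune (step 230), compute the loss (step 250), retrain the surviving
    weights with gradient descent (step 260), and stop once the loss change
    is small, standing in for the convergence check of step 270."""
    prev_loss = np.inf
    for _ in range(max_iters):
        weights = semi_structured_prune(weights, sparsity)  # step 230
        mask = weights != 0                                 # keep pruned positions at zero
        loss = loss_fn(weights)                             # step 250
        weights = weights - lr * grad_fn(weights) * mask    # step 260
        if abs(prev_loss - loss) < tol:                     # step 270
            break
        prev_loss = loss
    return semi_structured_prune(weights, sparsity)
```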
  • In the embodiments of the present application, semi-structured pruning makes the number of non-zero weights in each row of each layer's weight matrix the same, and grouped semi-structured pruning makes the number of non-zero weights in each row and each group the same. In this way, while saving weight storage resources at the hardware level, it also helps to achieve parallel decompression and parallel computing, increasing the computing speed at the hardware level, thereby improving computing efficiency and reducing latency and power consumption.
  • FIG. 8 to FIG. 12 are only for helping those skilled in the art to understand the embodiments of the present application, and are not intended to limit the embodiments of the present application to specific numerical values or specific scenarios exemplified. Those skilled in the art can obviously make various equivalent modifications or changes according to the examples of FIG. 8 to FIG. 12 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
  • FIG. 8 is a schematic structural diagram of a spiking neural network circuit provided by an embodiment of the present application.
  • The spiking neural network circuit may include decompression engines 1 to n (also referred to as decompression modules) and a calculation module 210.
  • the calculation module 210 may include an accumulation engine 250 and a calculation engine 260 .
  • The spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compression weight address information storage space 230, an associated compression weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290.
  • the functions of each of the above modules will be described in detail below.
  • the input buffer 205 is used to store the information of the pre-neuron (input neuron) that sends the input pulse (the information may be the number or index of the neuron).
  • The input neuron may be a neuron of the input layer shown in FIG. 1.
  • the input buffer 205 may be a processor's buffer.
  • the compression module 220 is configured to execute the method shown in FIG. 2 above to obtain a pruned weight matrix, where the pruned weight matrix includes the pruned weight and an output neuron number corresponding to the weight.
  • the pruned weight and the output neuron number corresponding to the weight may also be stored in the associated compressed weight storage space 240 .
  • the associated compression weight storage space 240 is used to store the pruned weight and the number of the output neuron corresponding to the weight.
  • the output neuron may be the neuron of the hidden layer shown in FIG. 1 .
  • The pruned weights obtained through the semi-structured pruning in FIG. 4 and the corresponding output neuron numbers can be related according to a certain correspondence to form associated compressed weight data, and the associated compressed weight data is hardened into the associated compression weight storage space 240 of the spiking neural network chip.
  • the associated compression weight storage space 240 stores compression weights and associated indexes.
  • This association method may adopt a direct indexing method or an indirect indexing method, which is not specifically limited in this embodiment of the present application.
  • the direct index method is to add a corresponding index before each compression weight, and the specific content of the index is the number of the neuron.
  • the indirect index method is to add a corresponding index before each compression weight, and the specific content of the index is the distance between the neuron number of the compression weight and the neuron number of the previous compression weight.
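  • The difference between the two index formats can be illustrated with a small sketch: the direct format stores the output neuron number next to each compression weight, while the indirect format stores the distance from the previous compressed weight's neuron number. The weight values used are made up for illustration.
```python
def direct_index(pairs):
    """pairs: list of (output_neuron_number, weight) for one row,
    e.g. [(1, w11), (4, w14)]; stored as-is in the direct format."""
    return list(pairs)

def indirect_index(pairs):
    """Store the offset from the previous compressed weight's neuron number."""
    encoded, prev = [], 0
    for neuron, weight in pairs:
        encoded.append((neuron - prev, weight))
        prev = neuron
    return encoded

row1 = [(1, 0.3), (4, -0.7)]          # row of FIG. 4 keeping W11 and W14
print(direct_index(row1))             # [(1, 0.3), (4, -0.7)]
print(indirect_index(row1))           # [(1, 0.3), (3, -0.7)]  -> offsets 1 and 3
```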
  • FIG. 9 shows a storage format diagram of the associated compression weight storage space 240 for semi-structured pruning.
  • This figure only shows the format of one layer (for example, the hidden layer) of the spiking neural network; other layers are similar. Each row represents an input neuron and each column represents an output neuron. It can be seen that after semi-structured sparsification, the number of compression weights in each row is the same. After the associated compression weights are obtained, the weight matrix is hardened into the associated compression weight storage space 240 of the chip.
  • Taking the case where the specific content of the index is the number of the output neuron associated with the weight as an example: the first row of the associated compression weight storage space 240 stores 1-W11 and 4-W14; the second row stores 2-W22 and 6-W26; the third row stores 1-W31 and 5-W35; and the fourth row stores 3-W43 and 4-W44.
  • 1-W11 stored in the first row corresponds to one associated compression weight, which represents the compression weight W11 and the index corresponding to that compression weight; the index 1 indicates that the output neuron associated with the compression weight is neuron No. 1. Similarly, 4-W14 corresponds to another associated compression weight, which represents the compression weight W14 and its index; that index indicates that the output neuron associated with the compression weight is neuron No. 4.
  • the associated compression weight address information storage space 230 is used to store the above-mentioned address resolution information associated with the compression weight.
  • The address resolution information may be the base address of the associated compression weights of each row and the number of compression weights in each row.
  • The address of the corresponding associated compression weights in the associated compression weight storage space 240 can be calculated from the base address of the associated compression weights of each row. Compared with an unstructured pruning scheme, in which the number of associated compression weights differs from row to row and therefore has to be stored separately for every row, this scheme can save weight storage resources.
  • FIG. 9 shows a storage format diagram of the associated compressed weight address information storage space 230 of semi-structured pruning. This figure only shows the format diagram of one layer (for example, hidden layer) of the spiking neural network, other layers are similar. After obtaining the address resolution information associated with the compression weight, the address resolution information associated with the compression weight is hardened into the storage space 230 of the associated compression weight address information of the chip.
  • The associated compression weight address information storage space 230 can store the base address of the associated compression weights of each row and the number of associated compression weights in each row; in this example, the number is 2 for each row.
  • the decompression engine is configured to de-associate the associative compression weights stored in the associative compression weight storage space 240 according to the information of the plurality of input neurons. Specifically, referring to FIG. 10 , the decompression engine may obtain the input neuron number from the input buffer 205 , and resolve the address information of the associated compression weight in the associated compression weight address information storage space 230 according to the number. The associated compression weight is obtained from the associated compression weight storage space 240 according to the address information, and the associated compression weight is disassociated through the disassociation module 1010 to obtain the corresponding output neuron number and weight. As an example, if the index format is a direct index, the output neuron number and weight information can be obtained directly; if the index format is an indirect index, the neuron number and weight information can be obtained through a shift operation.
  • Since the weight compression in the embodiment of this application uses a semi-structured pruning scheme, the number of weights in each row after pruning is the same, so decompression engines 1 to n can be used, and each decompression engine is responsible for decompressing, according to the information of the multiple input neurons, the associated compression weights of one row in the associated compression weight storage space 240.
  • the 1-n decompression engines perform parallel decompression at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
  • the spiking neural network chip may include four decompression engines, and each decompression engine is responsible for decompressing the associated compression weights in a row in the associated compression weight storage space 240 .
  • The decompression engine 1 is responsible for decompressing the associated compression weights (for example, 1-W11 and 4-W14) stored in the first row of the associated compression weight storage space 240; the decompression engine 2 is responsible for decompressing the associated compression weights (for example, 2-W22 and 6-W26) stored in the second row; the decompression engine 3 is responsible for decompressing the associated compression weights (for example, 1-W31 and 5-W35) stored in the third row; and the decompression engine 4 is responsible for decompressing the associated compression weights (for example, 3-W43 and 4-W44) stored in the fourth row.
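  • A software analogue of this row-parallel decompression is sketched below: each "engine" resolves its row's base address and weight count from the address information (storage space 230) and slices the corresponding (neuron number, weight) pairs out of a flattened weight store (storage space 240). The data layout, the thread pool and the variable names are assumptions made for illustration.
```python
from concurrent.futures import ThreadPoolExecutor

def decompress_row(row_idx, addr_info, weight_store):
    """One decompression engine: look up the row's base address and count,
    then read that row's (output neuron number, weight) pairs."""
    base, count = addr_info[row_idx]          # from storage space 230
    return weight_store[base:base + count]    # from storage space 240

# Flattened direct-index store matching the semi-structured example above.
weight_store = [(1, 'W11'), (4, 'W14'),   # row of input neuron No. 7
                (2, 'W22'), (6, 'W26'),   # row of input neuron No. 8
                (1, 'W31'), (5, 'W35'),   # row of input neuron No. 9
                (3, 'W43'), (4, 'W44')]   # row of input neuron No. 10
addr_info = {0: (0, 2), 1: (2, 2), 2: (4, 2), 3: (6, 2)}  # base address, count

# Four "engines" decompress the four rows concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    rows = list(pool.map(lambda r: decompress_row(r, addr_info, weight_store),
                         range(4)))
```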
  • Calculation module 210 may include accumulation engine 250 and calculation engine 260 .
  • the accumulation engine 250 is used to accumulate the weights of the corresponding output neurons.
  • Specifically, the accumulation engine 250 can read, according to the neuron numbers output by the decompression engines 1 to n, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulate the weights output by the decompression engines 1 to n onto the corresponding weight accumulation values, and then write the accumulated values back into the weight accumulation storage space 270.
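  • In software, this accumulation step amounts to adding each decompressed weight onto its output neuron's running sum, as in the minimal sketch below (the dictionary stands in for the weight accumulation storage space 270; the numeric values are made up).
```python
from collections import defaultdict

def accumulate(weight_acc, decompressed_rows):
    """Accumulation engine: for every (output neuron number, weight) pair
    produced by the decompression engines, add the weight onto that
    neuron's accumulation value."""
    for row in decompressed_rows:
        for neuron_id, weight in row:
            weight_acc[neuron_id] += weight
    return weight_acc

acc = accumulate(defaultdict(float),
                 [[(1, 0.3), (4, -0.7)], [(2, 0.5), (6, 0.1)]])
# acc == {1: 0.3, 4: -0.7, 2: 0.5, 6: 0.1}
```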
  • The calculation engine 260 is used to calculate the membrane voltage of the output neurons. Specifically, referring to FIG. 12, the calculation engine 260 reads the membrane voltage of the previous moment, the neuron parameter configuration and the weight accumulation value from the membrane voltage storage space 290, the neuron parameter space 280 and the weight accumulation storage space 270 respectively, and the neuron calculation module 1201 performs membrane voltage accumulation. If the membrane voltage exceeds the threshold voltage, a pulse is fired and the membrane voltage is written back to the membrane voltage storage space 290. If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290.
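  • The per-neuron update performed by the calculation engine can be sketched as follows. The leak term and the reset-to-zero behaviour after firing are assumptions typical of leaky integrate-and-fire models; the patent itself only states that a pulse is fired and the membrane voltage is written back.
```python
def update_membrane_voltage(v_prev, weight_acc, threshold, leak=0.0):
    """Add the accumulated weight onto the previous membrane voltage and
    compare against the threshold; return (new voltage, spike flag)."""
    v = v_prev - leak + weight_acc
    if v >= threshold:
        return 0.0, True    # fire a pulse; assumed reset before write-back
    return v, False         # write back the accumulated voltage, no pulse

v, fired = update_membrane_voltage(v_prev=0.8, weight_acc=0.3, threshold=1.0)
# fired is True because 0.8 + 0.3 crosses the threshold of 1.0
```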
  • the weight accumulation storage space 270 is used to store the weight accumulation value corresponding to each output neuron.
  • the neuron parameter space 280 is used to store the neuron parameter configuration information of the spiking neural network.
  • Membrane voltage storage space 290 for storing the accumulated membrane voltage of the neuron.
  • FIG. 13 is only for helping those skilled in the art to understand the embodiments of the present application, but is not intended to limit the embodiments of the present application to specific numerical values or specific scenarios exemplified. According to the example of Fig. 13 given below, those skilled in the art can obviously make various equivalent modifications or changes, and such modifications and changes also fall within the scope of the embodiments of the present application.
  • FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron provided by an embodiment of the present application. As shown in FIG. 13 , the method may include steps 1310-1350, and the steps 1310-1350 will be described in detail below respectively.
  • FIG. 13 illustrates the calculation of the membrane voltages of neurons No. 1 to 6 in the hidden layer, and the calculation of the membrane voltages of neurons in other layers is similar to the method shown in FIG. 13 .
  • Step 1310 The four decompression engines obtain corresponding input neuron numbers from the input buffer 205 in parallel, respectively.
  • For example, the four decompression engines respectively obtain from the input buffer 205 the numbers of input neurons No. 7 to No. 10 of the input layer.
  • Step 1320 The four decompression engines obtain the associated compression weights in parallel according to the input neuron numbers, and perform de-association to obtain the output neuron numbers and corresponding weights.
  • FIG. 8 may include 4 decompression engines (decompression engine 1 to decompression engine 4), each decompression engine is responsible for de-associating the associated compression weights of the corresponding rows in the associated compression weight storage space 240, and the 4 decompression engines ( The decompression engine 1 to the decompression engine 4) can perform the deassociation of the associated compression weights of the four rows in the associated compression weight storage space 240 in parallel.
  • each decompression engine parses the address information of the associated compression weight of the corresponding row in the associated compression weight address information storage space 230 according to the input neuron number in parallel, and obtains the address information from the associated compression weight storage space 240 in parallel according to the address information Associate the compression weights, and de-associate the associated compression weights to obtain the corresponding output neuron numbers and weights.
  • The decompression engine 1 is responsible for decompressing the associated compression weights (for example, 1-W11 and 4-W14) stored in the first row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 1 is W11 and the weight value corresponding to output neuron No. 4 is W14.
  • In parallel, the decompression engine 2 is responsible for decompressing the associated compression weights (for example, 2-W22 and 6-W26) stored in the second row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 2 is W22 and the weight value corresponding to output neuron No. 6 is W26.
  • other decompression engines de-associate the associated compression weights of other rows in the associated compression weight storage space 240 in parallel.
  • Step 1330 The accumulation engine 250 performs weight accumulation according to the output neuron number and the corresponding weight.
  • the accumulation engine 250 can read the weight accumulation value corresponding to the neuron number in the weight accumulation storage space 270 according to the above-mentioned output neuron number, and compare the output of the four decompression engines (decompression engine 1 to decompression engine 4) with the neuron. The weight corresponding to the number is accumulated with the weight accumulation value, and then the accumulated value is written into the weight accumulation storage space 270 .
  • Step 1340 Determine whether the accumulation of the single layer is completed.
  • Step 1350 The calculation engine 260 calculates the neuron's membrane voltage.
  • the calculation engine 260 reads the membrane voltage, neuron parameter configuration and the previous time respectively from the membrane voltage storage space 290, the neuron parameter space 280 and the weight accumulation storage space 270. The weights are accumulated and the membrane voltage is accumulated by the neuron calculation module 1201. If the membrane voltage exceeds the threshold voltage, a pulse is fired and the membrane voltage is written back to the membrane voltage storage space 290 . If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290 .
  • Because the embodiment of the present application uses a semi-structured pruning solution, the number of weights in each row is the same, and multiple decompression engines can be used to de-associate and decompress the associated compression weights in parallel. In this way, multiple decompression engines perform parallel decompression at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
  • FIG. 14 is a schematic structural diagram of another spiking neural network circuit provided by an embodiment of the present application.
  • the circuit may include: 1 to kn decompression engines (decompression engine 11 to decompression engine kn) and a calculation module 210 .
  • The calculation module 210 may include calculation sub-modules 1 to k, for example, calculation sub-module 1 to calculation sub-module k, and each calculation sub-module may include an accumulation engine and a corresponding calculation engine.
  • The spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compression weight address information storage space 230, an associated compression weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290.
  • The calculation module 210 may include calculation sub-modules 1 to k, where each calculation sub-module is responsible for determining, according to the multiple weight values in its corresponding weight group, the membrane voltages of the corresponding multiple output neurons.
  • each calculation submodule includes an accumulation engine and a corresponding calculation engine, and the accumulation engine is responsible for determining the weight accumulation value corresponding to the output neurons in a set of weight groups corresponding to the calculation submodule; the calculation engine is responsible for The weight accumulation value output by the accumulation engine determines the membrane voltage of the output neurons in the weight group at the current moment.
  • 1 to k accumulation engines can be used for parallel accumulation.
  • the accumulation engine is responsible for accumulating the corresponding weights of a group of output neurons.
  • 1 to k calculation engines can also be used for parallel calculation, and each calculation engine is responsible for calculating the output neuron membrane voltage according to the weight accumulation value output by the corresponding accumulation engine.
  • the decompression engine 11 to the decompression engine 1n can de-associate the associated compression weights of each row in the group corresponding to the accumulation engine 1 in parallel.
  • the decompression engine k1 to the decompression engine kn are responsible for the de-association of the associated compression weights of each row in the group corresponding to the accumulation engine k, and so on.
  • Fig. 14 may include 2 accumulation engines (accumulation engine 1, accumulation engine 2), 2 calculation engines (calculation engine 1 , calculation engine 2), decompression engines 11-14, decompression engines 21-24.
  • Each of the decompression engines 11 to 14 is responsible for de-associating the associated compression weights of the corresponding row in the first group, the accumulation engine 1 is responsible for the weight accumulation of the neurons in the first group, and the calculation engine 1 is responsible for the membrane voltage calculation of the neurons in the first group.
  • Each of the decompression engines 21 to 24 is responsible for de-associating the associated compression weights of the corresponding rows in the second group, the accumulation engine 2 is responsible for the weight accumulation of the neurons in the second group, and the calculation engine 2 is responsible for the neurons in the second group. Element membrane voltage calculation.
  • The decompression engine 11 is responsible for decompressing the associated compression weights (for example, 1-W11) stored in the first group of the first row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 1 is W11.
  • In parallel, the decompression engine 12 decompresses the associated compression weights (for example, 2-W22) stored in the first group of the second row, obtaining the weight value W22 corresponding to output neuron No. 2; the decompression engine 13 decompresses the associated compression weights (for example, 1-W31) stored in the first group of the third row, obtaining the weight value W31 corresponding to output neuron No. 1; and the decompression engine 14 decompresses the associated compression weights (for example, 3-W43) stored in the first group of the fourth row, obtaining the weight value W43 corresponding to output neuron No. 3.
  • The accumulation engine 1 is responsible for reading, according to the numbers of neurons No. 1 to No. 3, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulating the weights output by the four decompression engines (decompression engine 11 to decompression engine 14) onto the corresponding weight accumulation values, and then writing the accumulated values into the weight accumulation storage space 270.
  • The decompression engine 21 is responsible for decompressing the associated compression weights (for example, 4-W14) stored in the second group of the first row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 4 is W14.
  • In parallel, the decompression engine 22 decompresses the associated compression weights (for example, 6-W26) stored in the second group of the second row, obtaining the weight value W26 corresponding to output neuron No. 6; the decompression engine 23 decompresses the associated compression weights (for example, 5-W35) stored in the second group of the third row, obtaining the weight value W35 corresponding to output neuron No. 5; and the decompression engine 24 decompresses the associated compression weights (for example, 4-W44) stored in the second group of the fourth row, obtaining the weight value W44 corresponding to output neuron No. 4.
  • The accumulation engine 2 can work in parallel with the accumulation engine 1; it is responsible for reading, according to the numbers of neurons No. 4 to No. 6, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulating the weights output by the four decompression engines (decompression engine 21 to decompression engine 24) onto the corresponding weight accumulation values, and then writing the accumulated values into the weight accumulation storage space 270.
  • The number of accumulation engines and calculation engines included in the chip shown in FIG. 14 is determined by how many groups the neurons of a layer are divided into; in this example they are divided into two groups.
  • the n neurons in a certain layer can also be divided into n groups, and each neuron is a group. In this way, n accumulation engines and n calculation engines are required.
  • Each accumulation engine is responsible for the weight accumulation of a neuron, and each calculation engine is responsible for the calculation of the membrane voltage of a neuron.
  • Because the weight compression uses a grouped semi-structured pruning scheme, the number of weights in each row and each group after pruning is the same, so multiple accumulation engines can accumulate in parallel and multiple calculation engines can compute in parallel. In this way, the computing speed of the spiking neural network chip is further increased, thereby improving computing efficiency and reducing delay and power consumption.
  • FIG. 15 is a schematic block diagram of a spiking neural network system 1500 provided by an embodiment of the present application. As shown in FIG. 15 , the spiking neural network system 1500 may include a memory 1510 and a neural network circuit 1520 .
  • the memory 1510 may be used to store the compressed weight values, and as an example, the memory 1510 may correspond to the associated compressed weight storage space 240 above.
  • the memory 1510 can also be used to store the information of the input neurons.
  • the memory 1510 can correspond to the input buffer 205 above.
  • the neural network circuit 1520 may be the spiking neural network circuit shown in FIG. 8 , or the neural network circuit 1520 may also be the spiking neural network circuit shown in FIG. 14 .
  • It should be understood that the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

A spiking neural network circuit and a spiking neural network-based calculation method, the circuit comprising a plurality of decompression modules and a calculation module. The plurality of decompression modules are respectively used to obtain, according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a plurality of corresponding output neurons, wherein each decompression module among the plurality of decompression modules is used to obtain in parallel the weight values of the same row in the compressed weight matrix and the identifiers of the plurality of output neurons corresponding to the weight values of that row; the number of non-zero weight values in each row of the compressed weight matrix is the same; and the weight values of each row correspond to one input neuron. The calculation module is used to respectively determine the membrane voltages of the plurality of corresponding output neurons according to the plurality of weight values. The spiking neural network circuit can improve calculation efficiency.

Description

Spiking neural network circuit and computing method based on spiking neural network
This application claims priority to the Chinese patent application No. 202110363578.3, entitled "A spiking neural network compression method and apparatus", filed with the State Intellectual Property Office of China on April 2, 2021, and to the Chinese patent application No. 202110588707.9, entitled "Spiking neural network circuit and computing method based on spiking neural network", filed with the State Intellectual Property Office of China on May 28, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and more particularly, to a spiking neural network circuit and a computing method based on the spiking neural network.
Background
As an emerging neural network, the spiking neural network (SNN) is often referred to as the third-generation artificial neural network. In terms of information processing methods and biological models, it is closer to a real biological processing system than traditional artificial neural networks.
Information is transmitted between neurons in a spiking neural network in the form of pulses. The occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage. Each neuron accumulates the pulse trains of its preceding neurons, and its membrane voltage changes with the input pulses. When the membrane voltage of a neuron reaches a preset voltage value, the neuron is activated, generates a new signal (for example, fires a pulse), and transmits the signal to other neurons connected to it. Related spiking neural network circuits are inefficient at calculating the membrane voltages of neurons.
Summary of the Invention
The present application provides a spiking neural network circuit and a computing method based on the spiking neural network, and the spiking neural network circuit can improve computing efficiency.
In a first aspect, a spiking neural network circuit is provided. The circuit includes a plurality of decompression modules and a calculation module. The plurality of decompression modules are configured to obtain, according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a plurality of corresponding output neurons, where each of the plurality of decompression modules is configured to obtain in parallel the weight values of a same row in the compressed weight matrix and the identifiers of the plurality of output neurons corresponding to the weight values of that row, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron. The calculation module is configured to respectively determine, according to the plurality of weight values, the membrane voltages of the corresponding plurality of output neurons.
In the above technical solution, since the number of non-zero weight values in each row of the compressed weight matrix is the same, each of the plurality of decompression modules in the spiking neural network circuit can obtain in parallel the weight values of a same row in the compressed weight matrix and the identifiers of the output neurons corresponding to those weight values. In this way, the plurality of decompression modules perform decompression in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
With reference to the first aspect, in some implementations of the first aspect, the input neurons in the spiking neural network circuit include a first input neuron and a second input neuron, and the plurality of decompression modules include a first decompression module and a second decompression module. The first decompression module is configured to obtain the first-row weight values corresponding to the first input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values; the second decompression module is configured to obtain the second-row weight values corresponding to the second input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values.
With reference to the first aspect, in some implementations of the first aspect, the first decompression module is specifically configured to: obtain, from a first storage space, a base address for storing the first-row weight values, where the first storage space stores the base address of each row of weight values in the compressed weight matrix and the number of non-zero weight values in each row; and obtain, from a second storage space according to the base address of the first-row weight values, the first-row weight values and the identifiers of the output neurons respectively corresponding to the first-row weight values, where the second storage space stores the first-row weight values and the identifiers of the output neurons corresponding to the first-row weight values.
With reference to the first aspect, in some implementations of the first aspect, the spiking neural network circuit further includes a compression module, configured to prune part of the weight values in the initial weight matrix according to a pruning ratio, to obtain the compressed weight matrix.
With reference to the first aspect, in some implementations of the first aspect, the compressed weight matrix includes a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
With reference to the first aspect, in some implementations of the first aspect, the calculation module includes a plurality of calculation submodules, and each of the plurality of calculation submodules is configured to be responsible, in parallel, for calculating the membrane voltages of the output neurons in one weight group.
In the above technical solution, since the number of non-zero weight values in each row of each weight group in the compressed weight matrix is the same, a plurality of calculation submodules can be used, and each calculation submodule is responsible in parallel for calculating the membrane voltages of the output neurons in one weight group. In this way, the plurality of calculation submodules perform calculation in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
With reference to the first aspect, in some implementations of the first aspect, the plurality of calculation submodules include a first calculation submodule and a second calculation submodule, the first calculation submodule includes a first accumulation engine and a first calculation engine, and the second calculation submodule includes a second accumulation engine and a second calculation engine. The first accumulation engine is configured to determine the weight accumulation values corresponding to the output neurons in the first weight group corresponding to the first calculation submodule; the first calculation engine is configured to determine, according to the weight accumulation values output by the first accumulation engine, the membrane voltages of the output neurons in the first weight group at the current moment; the second accumulation engine is configured to determine the weight accumulation values corresponding to the output neurons in the second weight group corresponding to the second calculation submodule; and the second calculation engine is configured to determine, according to the weight accumulation values output by the second accumulation engine, the membrane voltages of the output neurons in the second weight group at the current moment.
In a second aspect, a computing method based on a spiking neural network is provided, including: obtaining, according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a plurality of corresponding output neurons, where the plurality of weight values include weight values of a same row in the compressed weight matrix obtained in parallel, the identifiers of the plurality of output neurons include the identifiers, obtained in parallel, of the output neurons corresponding to the weight values of that row, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and respectively determining, according to the plurality of weight values, the membrane voltages of the corresponding plurality of output neurons.
With reference to the second aspect, in some implementations of the second aspect, the input neurons of the spiking neural network circuit include a first input neuron and a second input neuron; the first-row weight values corresponding to the first input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values are obtained; and the second-row weight values corresponding to the second input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values are obtained.
With reference to the second aspect, in some implementations of the second aspect, before the obtaining of the plurality of weight values in the compressed weight matrix and the identifiers of the corresponding plurality of output neurons, the method further includes: pruning part of the weight values in the initial weight matrix according to a pruning ratio to obtain the compressed weight matrix.
With reference to the second aspect, in some implementations of the second aspect, the compressed weight matrix includes a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
With reference to the second aspect, in some implementations of the second aspect, the membrane voltages of the corresponding plurality of output neurons are determined in parallel according to the plurality of weight values in each weight group.
With reference to the second aspect, in some implementations of the second aspect, the plurality of weight groups include a first weight group and a second weight group; weight accumulation values corresponding to the output neurons in the first weight group are determined, and the membrane voltages of the output neurons in the first weight group at the current moment are determined according to the weight accumulation values corresponding to the output neurons in the first weight group; and weight accumulation values corresponding to the output neurons in the second weight group are determined, and the membrane voltages of the output neurons in the second weight group at the current moment are determined according to the weight accumulation values corresponding to the output neurons in the second weight group.
The beneficial effects of the second aspect and any possible implementation of the second aspect correspond to the beneficial effects of the first aspect and any possible implementation of the first aspect, and are not repeated here.
In a third aspect, a spiking neural network system is provided, including a memory and the neural network circuit according to the first aspect or any possible implementation of the first aspect, where the memory is configured to store a plurality of compressed weight values.
With reference to the third aspect, in some implementations of the third aspect, the memory is further configured to store information of a plurality of input neurons.
In a fourth aspect, a spiking neural network system is provided, including a processor and the neural network circuit according to the first aspect or any possible implementation of the first aspect, where the processor includes an input buffer, and the input buffer is configured to buffer the information of the plurality of input neurons.
With reference to the fourth aspect, in some implementations of the fourth aspect, the system further includes a memory configured to store a plurality of compressed weight values.
In a fifth aspect, an apparatus for determining the membrane voltage of a spiking neuron is provided, including a communication interface and a processor. The processor is configured to control the communication interface to send and receive information, is connected to the communication interface, and is configured to execute the method for determining the membrane voltage of a spiking neuron in the second aspect or any possible implementation of the second aspect.
Optionally, the processor may be a general-purpose processor, and may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, and the memory may be integrated in the processor or located outside the processor and exist independently.
In a sixth aspect, a computer program product is provided. The computer program product includes computer program code, and when the computer program code runs on a computing device, the computing device is caused to execute the method of the second aspect or any possible implementation of the second aspect.
In a seventh aspect, a computer-readable medium is provided. The computer-readable medium stores program code, and when the computer program code runs on a computing device, the computing device is caused to execute the method of the second aspect or any possible implementation of the second aspect. Such computer-readable storage includes, but is not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), Flash memory, electrically EPROM (EEPROM), and a hard drive.
Brief Description of the Drawings
FIG. 1 shows a schematic structural diagram of a spiking neural network.
FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application.
FIG. 3 is a schematic diagram of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
FIG. 4 is a schematic diagram of semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a spiking neural network after semi-structured pruning provided by an embodiment of the present application.
FIG. 6 is a schematic diagram of grouped semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
FIG. 7 is a schematic structural diagram of a spiking neural network after grouped semi-structured pruning provided by an embodiment of the present application.
FIG. 8 is a schematic architectural diagram of a spiking neural network circuit provided by an embodiment of the present application.
FIG. 9 is a schematic block diagram of an associated compressed weight storage space 240 provided by an embodiment of the present application.
FIG. 10 is a schematic block diagram of a decompression engine obtaining weight values and corresponding output neurons according to an embodiment of the present application.
FIG. 11 is a schematic block diagram of an accumulation engine obtaining weight accumulation values of output neurons according to an embodiment of the present application.
FIG. 12 is a schematic block diagram of a calculation engine determining the membrane voltage of a spiking neuron according to an embodiment of the present application.
FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron provided by an embodiment of the present application.
FIG. 14 is a schematic architectural diagram of another spiking neural network circuit provided by an embodiment of the present application.
FIG. 15 is a schematic block diagram of a spiking neural network system 1500 provided by an embodiment of the present application.
Detailed Description of Embodiments
The technical solutions in the present application are described below with reference to the accompanying drawings.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and the like.
In the AI field, a neural network (NN) is a mathematical or computational model that imitates the structure and function of a biological neural network (the central nervous system of animals, especially the brain), and is used to estimate or approximate functions. A biological brain is composed of a large number of neurons connected in different ways, and a preceding neuron and a following neuron are connected through a synaptic structure for information transmission. As an emerging neural network, the spiking neural network (SNN) is often referred to as the third-generation artificial neural network, and is closer to a real biological processing system in terms of information processing methods and biological models than traditional artificial neural networks. Specifically, on the one hand, an artificial neural network transmits multi-valued signals, whereas a spiking neural network transmits binary pulse information, so its input and output information is sparse and the spiking neural network has low power consumption; on the other hand, the neuron model of the spiking neural network is similar to the brain neuron model, has a dynamic accumulation process, and has one more time dimension than a traditional artificial neural network, so it is more suitable for processing intelligent tasks with temporal information.
FIG. 1 shows a schematic structural diagram of a spiking neural network. Referring to FIG. 1, the spiking neural network may include three levels: an input layer, a hidden layer, and an output layer. The hidden layer may in turn contain multiple layers; the logic within a layer is parallel, the logic between layers is serial, and the calculation results of the layers depend on and affect each other. For ease of description, FIG. 1 takes the case in which the hidden layer includes one layer of neurons as an example.
Referring to FIG. 1, each layer of the spiking neural network may include multiple nodes, and each node is used to simulate a spiking neuron and to perform a specific operation, such as an activation function. The connection between a preceding neuron (which may also be called an input neuron) and a following neuron (which may also be called an output neuron) is used to simulate a synapse. It should be understood that a synapse is the carrier through which information is transmitted between two neurons, and the weight value of a synapse represents the connection strength between the two neurons. It should also be understood that the numbers in the nodes shown in FIG. 1 are only used to identify or distinguish different nodes.
Neurons in a spiking neural network transmit information in the form of pulses, based on discrete-valued activities that occur at certain points in time rather than on continuous values. The occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage. Each neuron accumulates the pulse trains of its preceding neurons, and its membrane voltage changes with the input pulses. When the membrane voltage of a neuron reaches a preset voltage value, the neuron is activated, generates a new signal (for example, fires a pulse), and transmits the signal to other neurons connected to it. After the neuron fires a pulse, its membrane voltage is reset, and the membrane voltage continues to change as the pulse trains of the preceding neurons are accumulated. The neurons in the spiking neural network realize the transmission and processing of information in the above manner, and have information processing capabilities such as nonlinearity, self-adaptation, and fault tolerance.
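For illustration only (this is not part of the circuits described in the embodiments), the membrane-voltage behaviour described above can be summarized by a minimal Python sketch; the leak factor, threshold, and reset value used here are assumed values chosen for the example:

```python
def update_membrane_voltage(v_prev, weighted_inputs, threshold=1.0, v_reset=0.0, leak=0.9):
    """One time step of a simple leaky integrate-and-fire style neuron.

    v_prev          -- membrane voltage at the previous time step
    weighted_inputs -- sum of synaptic weights of the input pulses received at this step
    Returns (new_voltage, fired).
    """
    v = leak * v_prev + weighted_inputs   # accumulate input pulses onto the membrane
    if v >= threshold:                    # membrane voltage reached the preset value
        return v_reset, True              # fire a pulse and reset the membrane voltage
    return v, False                       # otherwise keep accumulating

# Example: a neuron receiving pulses weighted 0.6 and then 0.7 fires at the second step.
v, fired = update_membrane_voltage(0.0, 0.6)   # v = 0.6, fired = False
v, fired = update_membrane_voltage(v, 0.7)     # 0.9*0.6 + 0.7 = 1.24 >= 1.0 -> fires
```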
It should be noted that two neurons in the spiking neural network may be connected by a single synapse, or may be connected by multiple synapses, which is not specifically limited in this application. Each synapse has a modifiable synaptic weight (which may also be called a weight), and multiple pulses transmitted by a presynaptic neuron can generate different postsynaptic membrane voltages according to the size of the synaptic weight.
Although the spiking neural network is sparse and consumes little power during operation, its accuracy is not high. To improve network accuracy, the number of weights becomes large, which makes the weight storage in a spiking neural network chip too large, so that the area, latency, and power consumption of the chip increase accordingly, restricting the hardware implementation and commercialization of spiking neural networks. Therefore, weight compression for spiking neural networks is of great significance.
In view of this, an embodiment of the present application provides a method for compressing the weights of a spiking neural network. The method can make the number of non-zero weights in each row, or in each group of each row, of the weight matrix the same. In this way, weight storage resources at the hardware level of the spiking neural network are saved, and parallel decompression and parallel computation at the hardware level can also be realized, which increases the computing speed, thereby improving the computing efficiency at the hardware level of the spiking neural network and reducing latency and power consumption.
FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application. As shown in FIG. 2, the method may include steps 210 to 270, which are described in detail below.
Step 210: Load the pre-trained spiking neural network and obtain the initial weights.
Taking the hidden layer in the spiking neural network shown in FIG. 1 as an example, the initial weight matrix of the hidden layer is shown in FIG. 3. Each row in the initial weight matrix represents an input neuron, for example, a neuron of the input layer connected to the hidden layer. Each column represents an output neuron, for example, a neuron of the hidden layer.
For example, in the initial weight matrix shown in FIG. 3, the weights W11 to W41 in the first column represent the weight values corresponding to neuron No. 1 of the hidden layer in FIG. 1; the weights W12 to W42 in the second column represent the weight values corresponding to neuron No. 2 of the hidden layer in FIG. 1; the weights W13 to W43 in the third column represent the weight values corresponding to neuron No. 3 of the hidden layer in FIG. 1; the weights W14 to W44 in the fourth column represent the weight values corresponding to neuron No. 4 of the hidden layer in FIG. 1; the weights W15 to W45 in the fifth column represent the weight values corresponding to neuron No. 5 of the hidden layer in FIG. 1; and the weights W16 to W46 in the sixth column represent the weight values corresponding to neuron No. 6 of the hidden layer in FIG. 1. The weights W11 to W16 in the first row represent the weight values corresponding to neuron No. 7 of the input layer in FIG. 1; the weights W21 to W26 in the second row represent the weight values corresponding to neuron No. 8 of the input layer in FIG. 1; the weights W31 to W36 in the third row represent the weight values corresponding to neuron No. 9 of the input layer in FIG. 1; and the weights W41 to W46 in the fourth row represent the weight values corresponding to neuron No. 10 of the input layer in FIG. 1.
Step 220: Select a weight matrix pruning scheme according to requirements.
As an example, if the number of non-zero weights in each row of the weight matrix needs to be the same, the semi-structured pruning in step 230 may be performed; if the weight matrix needs to be grouped so that the number of non-zero weights in each group of each row is the same, the grouped semi-structured pruning in step 240 may be performed.
Step 230: Semi-structured pruning.
Semi-structured pruning in this embodiment of the present application refers to pruning the weights at the granularity of each row of the weight matrix to obtain a semi-structured pruned weight matrix. Specifically, at the granularity of each row of the weight matrix, the weight values of each row in the original weight matrix may be sorted by magnitude, and then the last s% (the sparsity) of the sorted weights are set to 0. A semi-structured pruned weight matrix is thus obtained, in which every row has the same length.
For example, with a sparsity of 66.6%, the weight values of each row in the initial weight matrix shown in FIG. 3 may be sorted, and the last 66.6% of the sorted weight values are set to 0. In the weight matrix shown in FIG. 4, the dotted lines represent the pruned weight values, and the weight matrix composed of solid lines is the pruned weight matrix. In the pruned weight matrix, the number of non-zero weight values in each row is the same (each row includes two weight values). For example, as shown in FIG. 4, the first row of the weight matrix after semi-structured pruning includes the two weights W11 and W14, that is, neuron No. 7 of the input layer in FIG. 1 is connected to neuron No. 1 and neuron No. 4 of the hidden layer, with connection weight values W11 and W14, respectively; the second row includes the two weights W22 and W26, that is, neuron No. 8 of the input layer in FIG. 1 is connected to neuron No. 2 and neuron No. 6 of the hidden layer, with connection weight values W22 and W26, respectively; the third row includes the two weights W31 and W35, that is, neuron No. 9 of the input layer in FIG. 1 is connected to neuron No. 1 and neuron No. 5 of the hidden layer, with connection weight values W31 and W35, respectively; and the fourth row includes the two weights W43 and W44, that is, neuron No. 10 of the input layer in FIG. 1 is connected to neuron No. 3 and neuron No. 4 of the hidden layer, with connection weight values W43 and W44, respectively. The structure of the spiking neural network after semi-structured pruning is shown in FIG. 5.
That is to say, each row in the pruned weight matrix has the same length, and each neuron of a given layer is connected to the same number of neurons in the next layer.
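For illustration only, a minimal Python sketch of this per-row (semi-structured) pruning is given below; it assumes a NumPy weight matrix and follows the magnitude-based ranking described above, while all concrete values are examples:

```python
import numpy as np

def semi_structured_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Prune each row independently: keep the largest-magnitude weights and
    set the smallest `sparsity` fraction of each row to zero."""
    pruned = weights.copy()
    n_cols = weights.shape[1]
    n_drop = int(round(n_cols * sparsity))           # weights removed per row
    for row in pruned:
        drop_idx = np.argsort(np.abs(row))[:n_drop]  # smallest-magnitude entries
        row[drop_idx] = 0.0
    return pruned

# Example: a 4x6 matrix with sparsity 4/6 leaves exactly 2 non-zeros per row.
w = np.random.randn(4, 6)
w_pruned = semi_structured_prune(w, sparsity=4 / 6)
assert all((row != 0).sum() == 2 for row in w_pruned)
```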
Step 240: Grouped semi-structured pruning.
Grouped semi-structured pruning in this embodiment of the present application refers to dividing each row into several weight groups of equal size, and pruning the weights at the granularity of each group in each row of the weight matrix to obtain a grouped semi-structured pruned weight matrix. Specifically, at the granularity of each group in each row of the weight matrix, the weight values of each row and each group in the original weight matrix may be sorted by magnitude, and then the last s% (the sparsity) of the sorted weights are set to 0. A grouped semi-structured pruned weight matrix is thus obtained, in which every row and every group have the same length.
For example, with a sparsity of 66.6% per group, the neurons in the hidden layer shown in FIG. 1 may be divided into two groups, each including three neurons; for example, the first group includes the three neurons numbered 1 to 3, and the second group includes the three neurons numbered 4 to 6. The weight values of each group in each row of the initial weight matrix shown in FIG. 3 may be sorted, and the last 66.6% of the sorted weight values are set to 0. In the weight matrix shown in FIG. 6, the dotted lines represent the pruned weight values, and the weight matrix composed of solid lines is the pruned weight matrix. In this pruned weight matrix, the number of non-zero weight values in each group of each row is the same (each group includes one weight value); that is, every row and every group in the grouped pruned weight matrix have the same length. The structure of the spiking neural network after grouped semi-structured pruning is shown in FIG. 7.
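For illustration only, a similar sketch of the grouped variant is given below; the group size and sparsity are example values, and the routine simply applies the per-row magnitude ranking within each group of columns:

```python
import numpy as np

def grouped_semi_structured_prune(weights: np.ndarray, sparsity: float,
                                  group_size: int) -> np.ndarray:
    """Prune each (row, column-group) block independently so that every group
    of every row keeps the same number of non-zero weights."""
    pruned = weights.copy()
    n_drop = int(round(group_size * sparsity))        # weights removed per group
    for row in pruned:
        for start in range(0, len(row), group_size):
            block = row[start:start + group_size]      # view into the row
            drop_idx = np.argsort(np.abs(block))[:n_drop]
            block[drop_idx] = 0.0                       # in-place on the view
    return pruned

# Example: 6 output neurons split into two groups of 3; a 2/3 sparsity leaves
# exactly one non-zero weight per group in every row, as in FIG. 6.
w = np.random.randn(4, 6)
w_grouped = grouped_semi_structured_prune(w, sparsity=2 / 3, group_size=3)
assert all((row != 0).sum() == 2 for row in w_grouped)  # one non-zero per group, two groups
```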
Step 250: Calculate the loss function according to the pruned weights.
It should be understood that the loss function helps to optimize the parameters of the spiking neural network by calculating the error between the actual (target) value and the predicted value of the spiking neural network. As an example, in this embodiment the loss function of the spiking neural network may be calculated according to the pruned weights to obtain the error between the actual (target) value and the predicted value of the spiking neural network, so that the pruned weight matrix can be optimized or updated according to this error.
Step 260: Retrain the spiking neural network and update the pruned weight matrix.
As an example, the parameters (weights) of the spiking neural network may be optimized according to the above loss function to minimize the loss of the spiking neural network. For example, gradient descent may be used to optimize the parameters (weights) of the spiking neural network and update the pruned weight matrix so that the loss of the neural network is minimized.
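For illustration only, one such retraining update can be sketched as follows; the gradient of the loss with respect to the weights is assumed to be available, and a binary mask is used so that pruned positions remain zero (the mask-based formulation is an assumption made for this sketch, not a limitation of the embodiments):

```python
import numpy as np

def retrain_step(w_pruned: np.ndarray, grad: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One gradient-descent update that only touches the weights kept by pruning."""
    mask = (w_pruned != 0).astype(w_pruned.dtype)   # 1 where a weight survived pruning
    return w_pruned - lr * grad * mask              # pruned positions stay exactly zero

# Example with a hypothetical gradient of the loss with respect to the weights.
w = np.array([[0.5, 0.0, -0.3], [0.0, 0.8, 0.0]])
g = np.random.randn(*w.shape)
w_new = retrain_step(w, g)
assert (w_new[w == 0] == 0).all()                   # the sparsity pattern is unchanged
```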
Step 270: Determine whether the spiking neural network has converged.
If the spiking neural network has converged, the process ends; if it has not converged, step 230 or step 240 continues to be performed until the spiking neural network converges.
In the above technical solution, semi-structured pruning makes the number of weights in each row of the weight matrix of each layer the same, and grouped semi-structured pruning makes the number of weights in each row and each group of the weight matrix of each layer the same. This helps to save weight storage resources at the hardware level, and also helps to realize parallel decompression and parallel computation, increasing the computing speed at the hardware level, thereby improving computing efficiency and reducing latency and power consumption.
Taking the spiking neural network shown in FIG. 1 as an example, the hardware level of a spiking neural network provided by an embodiment of the present application is described in detail below with reference to FIG. 8 to FIG. 12. It should be understood that the examples of FIG. 8 to FIG. 12 are only intended to help those skilled in the art understand the embodiments of the present application, and are not intended to limit the embodiments of the present application to the specific values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes according to the examples of FIG. 8 to FIG. 12 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
FIG. 8 is a schematic architectural diagram of a spiking neural network circuit provided by an embodiment of the present application. As shown in FIG. 8, the spiking neural network circuit may include decompression engines 1 to n (which may also be called decompression modules) and a calculation module 210. In this spiking neural network circuit, the calculation module 210 may include an accumulation engine 250 and a calculation engine 260. Optionally, the spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compressed weight address information storage space 230, an associated compressed weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290. The functions of these modules are described in detail below.
The input buffer 205 is used to store the information of the preceding neurons (input neurons) that send input pulses (the information may be the numbers or indexes of the neurons). In this embodiment, the input neurons may be the neurons of the input layer shown in FIG. 1. As an example, the input buffer 205 may be a cache of a processor.
The compression module 220 is configured to execute the method shown in FIG. 2 above to obtain a pruned weight matrix, where the pruned weight matrix includes the pruned weights and the output neuron numbers corresponding to the weights. The compression module 220 may also store the pruned weights and the output neuron numbers corresponding to the weights into the associated compressed weight storage space 240.
The associated compressed weight storage space 240 is used to store the pruned weights and the output neuron numbers corresponding to the weights. In this embodiment, the output neurons may be the neurons of the hidden layer shown in FIG. 1. As an example, the pruned weights obtained through the semi-structured pruning in FIG. 4 and the corresponding output neuron numbers may be associated according to a certain correspondence to form associated compressed weight data, and the associated compressed weight data is hardened into the associated compressed weight storage space 240 of the spiking neural network chip.
Specifically, as shown in FIG. 9, the associated compressed weight storage space 240 stores the compressed weights and the associated indexes. The association may use a direct indexing method or an indirect indexing method, which is not specifically limited in this embodiment of the present application. For example, in the direct indexing method, a corresponding index is added before each compressed weight, and the content of the index is the number of the neuron. As another example, in the indirect indexing method, a corresponding index is added before each compressed weight, and the content of the index is the distance between the neuron number of this compressed weight and the neuron number of the previous compressed weight.
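For illustration only, the two indexing formats can be modeled as follows; the tuple layout and field contents are assumptions made for this sketch:

```python
def encode_direct(row):
    """Direct indexing: store (output-neuron number, weight) pairs as-is."""
    return [(neuron_id, w) for neuron_id, w in row]

def encode_indirect(row):
    """Indirect indexing: store the distance to the previous kept neuron number."""
    encoded, prev = [], 0
    for neuron_id, w in row:
        encoded.append((neuron_id - prev, w))
        prev = neuron_id
    return encoded

def decode_indirect(encoded):
    """Recover the output-neuron numbers by accumulating the stored distances."""
    decoded, acc = [], 0
    for delta, w in encoded:
        acc += delta
        decoded.append((acc, w))
    return decoded

# Example: the first pruned row of FIG. 4 keeps weights for output neurons 1 and 4.
row = [(1, 'W11'), (4, 'W14')]
assert decode_indirect(encode_indirect(row)) == row
```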
It should be understood that FIG. 9 shows the storage format of the associated compressed weight storage space 240 for semi-structured pruning. The figure only shows the format for one layer (for example, the hidden layer) of the spiking neural network; other layers are similar. Each row represents an input neuron, and each column represents an output neuron. It can be seen that, after semi-structured sparsification, the number of compressed weights in each row is the same. After the associated compressed weights are obtained, this weight matrix is hardened into the associated compressed weight storage space 240 of the chip.
For example, taking the pruned weights obtained by semi-structured pruning in FIG. 4, with the index being the number of the neuron associated with the weight, the first row of the associated compressed weight storage space 240 stores 1-W11 and 4-W14; the second row stores 2-W22 and 6-W26; the third row stores 1-W31 and 5-W35; and the fourth row stores 3-W43 and 4-W44. The entry 1-W11 stored in the first row corresponds to one associated compressed weight described above, representing the compressed weight W11 and the index corresponding to that compressed weight, the index indicating that the output neuron associated with the compressed weight is neuron No. 1; the entry 4-W14 corresponds to another associated compressed weight, representing the compressed weight W14 and the index corresponding to that compressed weight, the index indicating that the output neuron associated with the compressed weight is neuron No. 4.
The associated compressed weight address information storage space 230 is used to store the address resolution information of the above associated compressed weights. As an example, the address resolution information may be the base address of the associated compressed weights of each row and the number of compressed weights per row. In the semi-structured pruning scheme shown in FIG. 4, since the number of associated compressed weights in each row is the same, only a single per-row weight count needs to be stored, and the address of the corresponding associated compressed weight in the associated compressed weight storage space 240 can be calculated from the base address of the associated compressed weights of each row. Compared with an unstructured pruning scheme, in which the number of associated compressed weights differs from row to row and therefore has to be stored separately for each row, this saves weight storage resources.
It should be understood that FIG. 9 also shows the storage format of the associated compressed weight address information storage space 230 for semi-structured pruning. The figure only shows the format for one layer (for example, the hidden layer) of the spiking neural network; other layers are similar. After the address resolution information of the associated compressed weights is obtained, it is hardened into the associated compressed weight address information storage space 230 of the chip.
For example, taking the pruned weights obtained by semi-structured pruning in FIG. 4, the associated compressed weight address information storage space 230 may store the base address of the associated compressed weights of each row and the number of associated compressed weights per row, which in this example is 2.
The decompression engines are configured to de-associate the associated compressed weights stored in the associated compressed weight storage space 240 according to the information of the plurality of input neurons. Specifically, referring to FIG. 10, a decompression engine may obtain an input neuron number from the input buffer 205 and resolve, according to this number, the address information of the associated compressed weights in the associated compressed weight address information storage space 230. According to the address information, the associated compressed weights are obtained from the associated compressed weight storage space 240 and are de-associated by the de-association module 1010 to obtain the corresponding output neuron numbers and weights. As an example, if the index format is a direct index, the output neuron numbers and weight information can be obtained directly; if the index format is an indirect index, the neuron numbers and weight information can be obtained through a shift operation.
Since the weight compression in this embodiment of the present application uses a semi-structured pruning scheme, the number of weights in each row after pruning is the same, so decompression engines 1 to n can be used, and each decompression engine is responsible for decompressing, according to the information of the plurality of input neurons, the associated compressed weights of one row in the associated compressed weight storage space 240. In this way, decompression engines 1 to n perform decompression in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
For example, taking the semi-structured pruning scheme in FIG. 4, the spiking neural network chip may include four decompression engines, each responsible for decompressing the associated compressed weights of one row in the associated compressed weight storage space 240. For example, decompression engine 1 is responsible for decompressing the associated compressed weights stored in the first row of the associated compressed weight storage space 240 (for example, 1-W11 and 4-W14); decompression engine 2 is responsible for decompressing the associated compressed weights stored in the second row (for example, 2-W22 and 6-W26); decompression engine 3 is responsible for decompressing the associated compressed weights stored in the third row (for example, 1-W31 and 5-W35); and decompression engine 4 is responsible for decompressing the associated compressed weights stored in the fourth row (for example, 3-W43 and 4-W44).
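For illustration only, the lookup performed by one decompression engine can be modeled in software as follows, assuming direct indexing; the flat storage list, the base-address table, and the concrete weight values are assumptions made for this sketch:

```python
# Flat model of the two storage spaces: every row of the compressed matrix has
# the same number of entries, so one base address plus one shared count is enough.
WEIGHTS_PER_ROW = 2                                   # same for every row after pruning
storage = [                                           # associated compressed weight storage 240
    (1, 0.5), (4, 0.2),                               # row for input neuron 7
    (2, 0.1), (6, 0.7),                               # row for input neuron 8
    (1, 0.3), (5, 0.4),                               # row for input neuron 9
    (3, 0.6), (4, 0.8),                               # row for input neuron 10
]
row_base = {7: 0, 8: 2, 9: 4, 10: 6}                  # address info storage 230 (base addresses)

def decompress(input_neuron: int):
    """One decompression engine: resolve the row's base address, then read the
    (output-neuron number, weight) pairs for that input neuron."""
    base = row_base[input_neuron]
    return storage[base:base + WEIGHTS_PER_ROW]

# Four engines could each run one of these lookups in parallel.
assert decompress(7) == [(1, 0.5), (4, 0.2)]
```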
The calculation module 210, in one example, may include an accumulation engine 250 and a calculation engine 260. The accumulation engine 250 is configured to accumulate the weights of the corresponding output neurons. Specifically, referring to FIG. 11, the accumulation engine 250 may read, according to the neuron numbers output by decompression engines 1 to n, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulate the weights output by decompression engines 1 to n for those neuron numbers with the read weight accumulation values, and then write the accumulated values back into the weight accumulation storage space 270. The calculation engine 260 is configured to calculate the membrane voltages of the output neurons. Specifically, referring to FIG. 12, after all neurons of the current layer have completed decompression and accumulation, the calculation engine 260 reads the membrane voltage of the previous time step, the neuron parameter configuration, and the weight accumulation value from the membrane voltage storage space 290, the neuron parameter space 280, and the weight accumulation storage space 270, respectively, and accumulates the membrane voltage through the neuron calculation module 1201. If the membrane voltage exceeds the threshold voltage, a pulse is fired, and the membrane voltage is cleared and written back to the membrane voltage storage space 290. If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290.
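For illustration only, the two-stage flow of the accumulation engine 250 and the calculation engine 260 can be modeled as follows; the dictionaries stand in for the weight accumulation storage space 270 and the membrane voltage storage space 290, the threshold is an assumed example value, and leakage and other neuron parameters are omitted:

```python
weight_acc = {}          # stands in for weight accumulation storage space 270
membrane_v = {}          # stands in for membrane voltage storage space 290

def accumulate(decompressed_rows):
    """Accumulation engine: add each decompressed weight to its output neuron's total."""
    for row in decompressed_rows:
        for neuron_id, weight in row:
            weight_acc[neuron_id] = weight_acc.get(neuron_id, 0.0) + weight

def compute_membrane_voltages(threshold=1.0):
    """Calculation engine: after the whole layer is accumulated, update every
    membrane voltage and fire/reset the neurons that cross the threshold."""
    fired = []
    for neuron_id, acc in weight_acc.items():
        v = membrane_v.get(neuron_id, 0.0) + acc
        if v >= threshold:
            fired.append(neuron_id)
            v = 0.0                                   # clear after firing
        membrane_v[neuron_id] = v
    weight_acc.clear()                                # ready for the next time step
    return fired

accumulate([[(1, 0.5), (4, 0.2)], [(2, 0.1), (6, 0.7)], [(1, 0.6)]])
print(compute_membrane_voltages())                   # neuron 1 fires: 0.5 + 0.6 >= 1.0
```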
The weight accumulation storage space 270 is configured to store the accumulated weight value corresponding to each output neuron.
The neuron parameter space 280 is configured to store the neuron parameter configuration information of the spiking neural network.
The membrane voltage storage space 290 is configured to store the accumulated membrane voltages of the neurons.
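As a non-limiting illustration, the three storage spaces can be pictured as simple per-neuron containers in software; the sizes, field names, and the reset parameter below are assumptions made for the sketch only.

```python
# Illustrative software model of the three storage spaces for the running
# example (hidden-layer neurons numbered 1..6). Index 0 is unused padding.
num_neurons = 6
weight_accumulation = [0.0] * (num_neurons + 1)   # weight accumulation storage space 270
membrane_voltage = [0.0] * (num_neurons + 1)      # membrane voltage storage space 290
neuron_parameters = {                             # neuron parameter space 280 (assumed fields)
    "threshold_voltage": 1.0,
    "reset_voltage": 0.0,
}
print(weight_accumulation, membrane_voltage, neuron_parameters)
```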
Taking the spiking neural network circuit shown in FIG. 8 as an example, the specific implementation of calculating the membrane voltage of a spiking neuron with this circuit is described in detail below with reference to FIG. 13. It should be understood that the example in FIG. 13 is merely intended to help those skilled in the art understand the embodiments of the present application, and is not intended to limit the embodiments to the specific values or scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the example of FIG. 13 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron according to an embodiment of the present application. As shown in FIG. 13, the method may include steps 1310 to 1350, which are described in detail below.
It should be understood that, for ease of description, FIG. 13 uses the calculation of the membrane voltages of neurons 1 to 6 of the hidden layer as an example; the membrane voltages of neurons in other layers are calculated in a similar manner.
Step 1310: The four decompression engines obtain their corresponding input neuron numbers from the input buffer 205 in parallel.
As an example, the four decompression engines (decompression engine 1 to decompression engine 4) obtain, from the input buffer 205, the input neuron numbers of neurons 7 to 10 of the input layer, respectively.
Step 1320: The four decompression engines obtain the associated compressed weights according to the input neuron numbers in parallel, and de-associate them to obtain the output neuron numbers and the corresponding weights.
As an example, FIG. 8 may include four decompression engines (decompression engine 1 to decompression engine 4), each of which is responsible for de-associating the associated compressed weights of the corresponding row in the associated compressed weight storage space 240, so that the four decompression engines can de-associate the four rows of associated compressed weights in the storage space 240 in parallel. Specifically, each decompression engine, in parallel, parses out the address information of the associated compressed weights of its row from the associated compressed weight address information storage space 230 according to the input neuron number, fetches the associated compressed weights from the associated compressed weight storage space 240 according to that address information, and de-associates them to obtain the corresponding output neuron numbers and weights. For example, decompression engine 1 decompresses the associated compressed weights stored in the first row of the associated compressed weight storage space 240 (for example, 1—W11 and 4—W14), obtaining the weight W11 for output neuron 1 and the weight W14 for output neuron 4. At the same time, decompression engine 2 decompresses in parallel the associated compressed weights stored in the second row (for example, 2—W22 and 6—W26), obtaining the weight W22 for output neuron 2 and the weight W26 for output neuron 6. By analogy, the other decompression engines de-associate the associated compressed weights of the other rows in the associated compressed weight storage space 240 in parallel.
Step 1330: The accumulation engine 250 performs weight accumulation according to the output neuron numbers and the corresponding weights.
The accumulation engine 250 can read, according to the above output neuron numbers, the accumulated weight value corresponding to each neuron number from the weight accumulation storage space 270, add the weights output by the four decompression engines (decompression engine 1 to decompression engine 4) for that neuron number to the accumulated value, and write the result back into the weight accumulation storage space 270.
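A hedged software analogue of this read-modify-write behaviour is sketched below; the dictionary-based storage and the numeric weight values are assumptions for the example only, not the actual contents of storage space 270.

```python
# Sketch of step 1330: accumulate decompressed weights per output neuron.
# weight_accumulation plays the role of storage space 270.
weight_accumulation = {}

def accumulate(pairs):
    """pairs: iterable of (output neuron number, weight value)."""
    for neuron_id, weight in pairs:
        current = weight_accumulation.get(neuron_id, 0.0)    # read
        weight_accumulation[neuron_id] = current + weight    # add and write back

# Example: outputs of the four decompression engines for one time step,
# with made-up numeric weight values.
accumulate([(1, 0.5), (4, -0.2)])
accumulate([(2, 0.3), (6, 0.1)])
accumulate([(1, 0.7), (5, 0.4)])
accumulate([(3, -0.6), (4, 0.9)])
print(weight_accumulation)   # e.g. neuron 1 -> 1.2, neuron 4 -> 0.7, ...
```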
Step 1340: Determine whether the accumulation for the current layer is complete.
As an example, it may be determined whether all neurons of the current layer have completed decompression and accumulation; if not, the procedure returns to step 1310; if so, it proceeds to step 1350.
Step 1350: The calculation engine 260 calculates the membrane voltages of the neurons.
After all neurons of the current layer have completed decompression and accumulation, the calculation engine 260 reads the membrane voltage of the previous time step, the neuron parameter configuration, and the accumulated weight value from the membrane voltage storage space 290, the neuron parameter space 280, and the weight accumulation storage space 270 respectively, and the neuron calculation module 1201 performs membrane voltage accumulation. If the membrane voltage exceeds the threshold voltage, a spike is fired and the membrane voltage is cleared to zero and written back to the membrane voltage storage space 290. If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290.
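The threshold-and-fire logic of step 1350 can be sketched as follows. The source describes accumulation, threshold comparison, spike firing, and clearing/write-back; the specific numeric values and the absence of any leak term here are assumptions made to keep the sketch minimal.

```python
# Sketch of step 1350: update each neuron's membrane voltage after a layer
# has finished decompression and accumulation. Values are illustrative.
threshold_voltage = 1.0

def update_neuron(previous_voltage, weight_sum):
    """Return (new membrane voltage, spike fired?)."""
    voltage = previous_voltage + weight_sum      # membrane voltage accumulation
    if voltage > threshold_voltage:
        return 0.0, True                         # fire a spike, clear the voltage
    return voltage, False                        # no spike, keep the accumulated voltage

membrane_voltage = {1: 0.4, 2: 0.0, 3: 0.2}      # storage space 290 (previous time step)
weight_sums      = {1: 0.8, 2: 0.3, 3: 0.1}      # storage space 270 (current time step)

for neuron_id in membrane_voltage:
    v, fired = update_neuron(membrane_voltage[neuron_id], weight_sums[neuron_id])
    membrane_voltage[neuron_id] = v              # write back to storage space 290
    print(neuron_id, v, fired)
```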
In the above technical solution, because this embodiment of the present application uses a semi-structured pruning scheme, the number of weights in each row is the same, so multiple decompression engines can be used to de-associate in parallel, each decompression engine being responsible for decompressing the associated compressed weights of one row in the associated compressed weight storage space 240. In this way, multiple decompression engines decompress in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
Taking the spiking neural network shown in FIG. 1 as an example, the hardware level of another spiking neural network provided by an embodiment of the present application is described in detail below with reference to FIG. 14. It should be understood that the example in FIG. 14 is merely intended to help those skilled in the art understand the embodiments of the present application, and is not intended to limit the embodiments to the specific values or scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the example of FIG. 14 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
FIG. 14 is a schematic architectural diagram of another spiking neural network circuit according to an embodiment of the present application. As shown in FIG. 14, the circuit may include 1 to kn decompression engines (decompression engine 11 to decompression engine kn) and a calculation module 210. In this spiking neural network circuit, the calculation module 210 may include 1 to k calculation submodules, for example, calculation submodule 1 to calculation submodule k, and each calculation submodule may include an accumulation engine and a corresponding calculation engine. Optionally, the spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compressed weight address information storage space 230, an associated compressed weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290.
It should be understood that the functions of the input buffer 205, the associated compressed weight storage space 240, the associated compressed weight address information storage space 230, the weight accumulation storage space 270, the neuron parameter storage space 280, and the membrane voltage storage space 290 are the same as their functions in the architecture shown in FIG. 8; for details, refer to the description of FIG. 8, which is not repeated here.
Different from the spiking neural network circuit in FIG. 8, in the spiking neural network circuit shown in FIG. 14 the calculation module 210 may include 1 to k calculation submodules, and each calculation submodule is responsible for determining, according to the plurality of weight values in its corresponding weight group, the membrane voltages of the corresponding plurality of output neurons. As an example, each calculation submodule includes an accumulation engine and a corresponding calculation engine; the accumulation engine is responsible for determining the accumulated weight values corresponding to the output neurons in the weight group corresponding to that calculation submodule, and the calculation engine is responsible for determining, according to the accumulated weight values output by that accumulation engine, the membrane voltages of the output neurons in that weight group at the current time.
Specifically, because the weight compression in this embodiment of the present application uses a grouped semi-structured pruning scheme, the number of weights in each group of each row after pruning is the same. Therefore, 1 to k accumulation engines can be used for parallel accumulation, each accumulation engine being responsible for accumulating the weights corresponding to one group of output neurons. Similarly, 1 to k calculation engines can be used for parallel calculation, each calculation engine being responsible for calculating the output-neuron membrane voltages according to the accumulated weight values output by the corresponding accumulation engine. Among decompression engines 11 to kn, since the number of weights in each group of a row is the same, decompression engines 11 to 1n can de-associate in parallel the associated compressed weights of each row in the group corresponding to accumulation engine 1. Likewise, decompression engines k1 to kn are responsible for de-associating the associated compressed weights of each row in the group corresponding to accumulation engine k, and so on.
For example, taking the case where the hidden layer of the spiking neural network shown in FIG. 7 is divided into two groups, FIG. 14 may include two accumulation engines (accumulation engine 1 and accumulation engine 2), two calculation engines (calculation engine 1 and calculation engine 2), decompression engines 11 to 14, and decompression engines 21 to 24. Each of decompression engines 11 to 14 is responsible for de-associating the associated compressed weights of the corresponding row in the first group, accumulation engine 1 is responsible for accumulating the weights of the first group of neurons, and calculation engine 1 is responsible for calculating the membrane voltages of the neurons in the first group. Each of decompression engines 21 to 24 is responsible for de-associating the associated compressed weights of the corresponding row in the second group, accumulation engine 2 is responsible for accumulating the weights of the second group of neurons, and calculation engine 2 is responsible for calculating the membrane voltages of the neurons in the second group.
For example, decompression engine 11 is responsible for decompressing the associated compressed weight stored in the first group of the first row of the associated compressed weight storage space 240 (for example, 1—W11), obtaining the weight W11 for output neuron 1. Decompression engine 12 in parallel decompresses the associated compressed weight stored in the first group of the second row (for example, 2—W22), obtaining the weight W22 for output neuron 2. Decompression engine 13 in parallel decompresses the associated compressed weight stored in the first group of the third row (for example, 1—W31), obtaining the weight W31 for output neuron 1. Decompression engine 14 in parallel decompresses the associated compressed weight stored in the first group of the fourth row (for example, 3—W43), obtaining the weight W43 for output neuron 3. Accumulation engine 1 is responsible for reading, according to the numbers of neurons 1 to 3, the accumulated weight value corresponding to each such neuron number from the weight accumulation storage space 270, adding the weights output by the four decompression engines (decompression engine 11 to decompression engine 14) for that neuron number to the accumulated value, and writing the result back into the weight accumulation storage space 270.
As another example, decompression engine 21 is responsible for decompressing the associated compressed weight stored in the second group of the first row of the associated compressed weight storage space 240 (for example, 4—W14), obtaining the weight W14 for output neuron 4. Decompression engine 22 in parallel decompresses the associated compressed weight stored in the second group of the second row (for example, 6—W26), obtaining the weight W26 for output neuron 6. Decompression engine 23 in parallel decompresses the associated compressed weight stored in the second group of the third row (for example, 5—W35), obtaining the weight W35 for output neuron 5. Decompression engine 24 in parallel decompresses the associated compressed weight stored in the second group of the fourth row (for example, 4—W44), obtaining the weight W44 for output neuron 4. Accumulation engine 2 can work in parallel with accumulation engine 1 and is responsible for reading, according to the numbers of neurons 4 to 6, the accumulated weight value corresponding to each such neuron number from the weight accumulation storage space 270, adding the weights output by the four decompression engines (decompression engine 21 to decompression engine 24) for that neuron number to the accumulated value, and writing the result back into the weight accumulation storage space 270.
It should be understood that the above description uses the division of a layer of the spiking neural network into two groups as an example; in practice, the number of accumulation engines and calculation engines included in the chip shown in FIG. 14 is determined by the number of groups into which a layer is divided. Of course, the n neurons of a layer may also be divided into n groups, one neuron per group; in that case n accumulation engines and n calculation engines are required, with each accumulation engine responsible for the weight accumulation of one neuron and each calculation engine responsible for the membrane voltage calculation of one neuron.
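To illustrate how the grouped organisation scales, the sketch below partitions the decompression results into k = 2 groups, each with an independent accumulator, mirroring the two-group example above; the group boundaries and the numeric weights are assumptions made for the sketch only.

```python
# Sketch of the grouped organisation of FIG. 14: k groups of output neurons,
# each served by its own decompression results and its own accumulator.
from collections import defaultdict

# Group 1 covers output neurons 1-3, group 2 covers output neurons 4-6,
# matching the two-group example above. Weight values are illustrative.
group_outputs = {
    1: [(1, 0.5), (2, 0.3), (1, 0.7), (3, -0.6)],   # engines 11-14
    2: [(4, -0.2), (6, 0.1), (5, 0.4), (4, 0.9)],   # engines 21-24
}

accumulators = {g: defaultdict(float) for g in group_outputs}

# Each "accumulation engine" works only on its own group, so the k groups
# can be processed independently (and hence in parallel in hardware).
for group_id, pairs in group_outputs.items():
    for neuron_id, weight in pairs:
        accumulators[group_id][neuron_id] += weight

print({g: dict(acc) for g, acc in accumulators.items()})
```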
In the above spiking neural network chip, because the weight compression uses a grouped semi-structured pruning scheme, the number of weights in each group of each row after pruning is the same, so multiple decompression engines can be used to de-associate in parallel, multiple accumulation engines can be used to accumulate in parallel, and multiple calculation engines can be used to calculate in parallel. This further increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
FIG. 15 is a schematic block diagram of a spiking neural network system 1500 according to an embodiment of the present application. As shown in FIG. 15, the spiking neural network system 1500 may include a memory 1510 and a neural network circuit 1520.
The memory 1510 may be configured to store a plurality of compressed weight values; as an example, the memory 1510 may correspond to the associated compressed weight storage space 240 described above. Optionally, the memory 1510 may also be configured to store the information of the input neurons; as an example, the memory 1510 may correspond to the input buffer 205 described above.
The neural network circuit 1520 may be implemented in various ways, which are not limited in this embodiment of the present application. For example, the neural network circuit 1520 may be the spiking neural network circuit shown in FIG. 8, or the spiking neural network circuit shown in FIG. 14; for details, refer to the descriptions of the spiking neural network circuits above, which are not repeated here.
It should be understood that, in the various embodiments of the present application, the numbering of the above processes does not imply an order of execution; the order in which the processes are performed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (13)

  1. A spiking neural network circuit, comprising:
    a plurality of decompression modules, configured to obtain, respectively according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a corresponding plurality of output neurons, wherein each of the plurality of decompression modules is configured to obtain, in parallel, weight values of a same number of rows in the compressed weight matrix and identifiers of the plurality of output neurons corresponding to the weight values of the same number of rows, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and
    a calculation module, configured to determine the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values.
  2. The circuit according to claim 1, wherein the input neurons in the spiking neural network circuit comprise a first input neuron and a second input neuron, and the plurality of decompression modules comprise a first decompression module and a second decompression module, wherein
    the first decompression module is configured to obtain a first row of weight values corresponding to the first input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the first row of weight values; and
    the second decompression module is configured to obtain a second row of weight values corresponding to the second input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the second row of weight values.
  3. The circuit according to claim 1 or 2, further comprising:
    a compression module, configured to prune some of the weight values in an initial weight matrix according to a pruning ratio to obtain the compressed weight matrix.
  4. The circuit according to any one of claims 1 to 3, wherein the compressed weight matrix comprises a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
  5. The circuit according to claim 4, wherein the calculation module comprises a plurality of calculation submodules, and each of the plurality of calculation submodules is configured to calculate, in parallel, the membrane voltages of the output neurons in one weight group.
  6. The circuit according to claim 5, wherein the plurality of calculation submodules comprise a first calculation submodule and a second calculation submodule, the first calculation submodule comprises a first accumulation engine and a first calculation engine, and the second calculation submodule comprises a second accumulation engine and a second calculation engine, wherein
    the first accumulation engine is configured to determine accumulated weight values corresponding to the output neurons in a first weight group corresponding to the first calculation submodule;
    the first calculation engine is configured to determine, according to the accumulated weight values output by the first accumulation engine, the membrane voltages of the output neurons in the first weight group at a current time;
    the second accumulation engine is configured to determine accumulated weight values corresponding to the output neurons in a second weight group corresponding to the second calculation submodule; and
    the second calculation engine is configured to determine, according to the accumulated weight values output by the second accumulation engine, the membrane voltages of the output neurons in the second weight group at the current time.
  7. A calculation method based on a spiking neural network, comprising:
    obtaining, respectively according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a corresponding plurality of output neurons, wherein the plurality of weight values comprise weight values, obtained in parallel, of a same number of rows in the compressed weight matrix, the identifiers of the plurality of output neurons comprise identifiers, obtained in parallel, of a plurality of output neurons corresponding to the weight values of the same number of rows, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and
    determining the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values.
  8. The method according to claim 7, wherein the input neurons of the spiking neural network comprise a first input neuron and a second input neuron, and
    the obtaining, respectively according to the information of the plurality of input neurons, of the plurality of weight values in the compressed weight matrix and the identifiers of the corresponding plurality of output neurons comprises:
    obtaining a first row of weight values corresponding to the first input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the first row of weight values; and
    obtaining a second row of weight values corresponding to the second input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the second row of weight values.
  9. The method according to claim 7 or 8, further comprising:
    pruning some of the weight values in an initial weight matrix according to a pruning ratio to obtain the compressed weight matrix.
  10. The method according to any one of claims 7 to 9, wherein the compressed weight matrix comprises a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
  11. The method according to claim 10, wherein the determining the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values comprises:
    determining, in parallel, the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values in each weight group.
  12. The method according to claim 11, wherein the plurality of weight groups comprise a first weight group and a second weight group, and
    the determining, in parallel, of the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values in each weight group comprises:
    determining accumulated weight values corresponding to the output neurons in the first weight group, and determining, according to the accumulated weight values corresponding to the output neurons in the first weight group, the membrane voltages of the output neurons in the first weight group at a current time; and determining accumulated weight values corresponding to the output neurons in the second weight group, and determining, according to the accumulated weight values corresponding to the output neurons in the second weight group, the membrane voltages of the output neurons in the second weight group at the current time.
  13. A spiking neural network system, comprising a memory and the spiking neural network circuit according to any one of claims 1 to 6, wherein the memory is configured to store a plurality of compressed weight values.
PCT/CN2022/076269 2021-04-02 2022-02-15 Spiking neural network circuit and spiking neural network-based calculation method WO2022206193A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22778378.4A EP4283522A1 (en) 2021-04-02 2022-02-15 Spiking neural network circuit and spiking neural network-based calculation method
US18/475,262 US20240013037A1 (en) 2021-04-02 2023-09-27 Spiking neural network circuit and spiking neural network-based calculation method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110363578 2021-04-02
CN202110363578.3 2021-04-02
CN202110588707.9A CN115169523A (en) 2021-04-02 2021-05-28 Impulse neural network circuit and computing method based on impulse neural network
CN202110588707.9 2021-05-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/475,262 Continuation US20240013037A1 (en) 2021-04-02 2023-09-27 Spiking neural network circuit and spiking neural network-based calculation method

Publications (1)

Publication Number Publication Date
WO2022206193A1 true WO2022206193A1 (en) 2022-10-06

Family

ID=83455557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/076269 WO2022206193A1 (en) 2021-04-02 2022-02-15 Spiking neural network circuit and spiking neural network-based calculation method

Country Status (3)

Country Link
US (1) US20240013037A1 (en)
EP (1) EP4283522A1 (en)
WO (1) WO2022206193A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180276529A1 (en) * 2017-03-24 2018-09-27 Intel Corporation Handling signal saturation in spiking neural networks
CN110110851A (en) * 2019-04-30 2019-08-09 南京大学 A kind of the FPGA accelerator and its accelerated method of LSTM neural network
CN110543933A (en) * 2019-08-12 2019-12-06 北京大学 Pulse type convolution neural network based on FLASH memory array
CN110659730A (en) * 2019-10-10 2020-01-07 电子科技大学中山学院 Method for realizing end-to-end functional pulse model based on pulse neural network
CN110991623A (en) * 2019-12-20 2020-04-10 中国科学院自动化研究所 Neural network operation system based on digital-analog hybrid neurons

Also Published As

Publication number Publication date
US20240013037A1 (en) 2024-01-11
EP4283522A1 (en) 2023-11-29

Similar Documents

Publication Publication Date Title
Lin et al. Research on convolutional neural network based on improved Relu piecewise activation function
Mostafa et al. Fast classification using sparsely active spiking networks
KR20210134363A (en) Neural network-based quantum error correction decoding method and apparatus, chip
JP7366274B2 (en) Adaptive search method and device for neural networks
Abdelsalam et al. An efficient FPGA-based overlay inference architecture for fully connected DNNs
CN113159345A (en) Power grid fault identification method and system based on fusion neural network model
CN110263917B (en) Neural network compression method and device
Qi et al. Learning low resource consumption cnn through pruning and quantization
WO2022206193A1 (en) Spiking neural network circuit and spiking neural network-based calculation method
CN115544029A (en) Data processing method and related device
CN112990454A (en) Neural network calculation acceleration method and device based on integrated DPU multi-core isomerism
Huang et al. Real-time radar gesture classification with spiking neural network on SpiNNaker 2 prototype
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN111612046A (en) Characteristic pyramid graph convolutional neural network and application thereof in 3D point cloud classification
Wen et al. Novel pruning of dendritic neuron models for improved system implementation and performance
CN112036554B (en) Neural network model processing method and device, computer equipment and storage medium
CN112101537B (en) CNN accelerator and electronic device
CN115169523A (en) Impulse neural network circuit and computing method based on impulse neural network
JP7230324B2 (en) Neural network learning method, computer program and computer device
Liu Model Optimization Techniques for Embedded Artificial Intelligence
Chang et al. Optimizing Big Data Retrieval and Job Scheduling Using Deep Learning Approaches.
KR20210157826A (en) Method for sturcture learning and model compression for deep neural netwrok
KR20200135117A (en) Decompression apparatus and control method thereof
WO2019200548A1 (en) Network model compiler and related product
Guo et al. Dynamic Neural Network Structure: A Review for Its Theories and Applications

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 22778378; Country of ref document: EP; Kind code of ref document: A1

WWE Wipo information: entry into national phase
    Ref document number: 2022778378; Country of ref document: EP

ENP Entry into the national phase
    Ref document number: 2022778378; Country of ref document: EP; Effective date: 20230824

NENP Non-entry into the national phase
    Ref country code: DE