WO2022206193A1 - Spiking neural network circuit and spiking neural network-based calculation method - Google Patents

Spiking neural network circuit and spiking neural network-based calculation method

Info

Publication number
WO2022206193A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
neuron
row
neural network
neurons
Prior art date
Application number
PCT/CN2022/076269
Other languages
French (fr)
Chinese (zh)
Inventor
张子阳
刘涛
王侃文
廖健行
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110588707.9A external-priority patent/CN115169523A/en
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP22778378.4A priority Critical patent/EP4283522A1/en
Publication of WO2022206193A1 publication Critical patent/WO2022206193A1/en
Priority to US18/475,262 priority patent/US20240013037A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of image processing, and more particularly, to a spiking neural network circuit and a computing method based on the spiking neural network.
  • The spiking neural network is often referred to as the third-generation artificial neural network; in its information processing method and biological model, it is closer to a real biological processing system than traditional artificial neural networks.
  • Information is transmitted between neurons in a spiking neural network in the form of pulses.
  • the occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage.
  • the membrane voltage of each neuron changes with the input pulse.
  • the neuron will be activated to generate a new signal (for example, firing a pulse), and transmit the signal to other neurons connected to it.
  • Related spiking neural network circuits are relatively inefficient when calculating the membrane voltage of neurons.
  • the present application provides a spiking neural network circuit and a computing method based on the spiking neural network, and the spiking neural network circuit can improve computing efficiency.
  • A spiking neural network circuit is provided, including a plurality of decompression modules and a calculation module.
  • The multiple decompression modules are used to obtain, according to the information of multiple input neurons, multiple weight values in the compressed weight matrix and the identifiers of the corresponding multiple output neurons. Each of the multiple decompression modules is used to obtain, in parallel, the weight values of a same row in the compressed weight matrix and the identifiers of the multiple output neurons corresponding to those weight values. The number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron.
  • the calculation module is used to determine the corresponding membrane voltages of the plurality of output neurons according to the plurality of weight values.
  • Each of the multiple decompression modules in the spiking neural network circuit obtains, in parallel, the weight values of a same row in the compressed weight matrix and the identifiers of the multiple output neurons corresponding to those weight values. In this way, the multiple decompression modules decompress in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing delay and power consumption.
  • The input neurons of the spiking neural network circuit include a first input neuron and a second input neuron, and the plurality of decompression modules include a first decompression module and a second decompression module. The first decompression module is used to obtain the first-row weight values corresponding to the first input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values; the second decompression module is used to obtain the second-row weight values corresponding to the second input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values.
  • The first decompression module is specifically configured to: obtain, from a first storage space, the base address at which the first-row weight values are stored, where the first storage space stores the base address of each row of weight values in the compressed weight matrix and the number of non-zero weight values in each row; and obtain, from a second storage space according to the base address of the first-row weight values, the first-row weight values and the identifiers of the output neurons respectively corresponding to them, where the second storage space stores the first-row weight values and the identifiers of those output neurons.
  • The spiking neural network circuit further includes a compression module configured to prune part of the weight values in the initial weight matrix according to the pruning ratio to obtain the compressed weight matrix.
  • The compressed weight matrix includes multiple weight groups, and the number of non-zero weight values in each row of each weight group is the same.
  • The computing module includes multiple computing sub-modules, and each computing sub-module is responsible for computing, in parallel, the membrane voltages of the output neurons in one weight group.
  • The plurality of calculation sub-modules include a first calculation sub-module and a second calculation sub-module. The first calculation sub-module includes a first accumulation engine and a first calculation engine, and the second calculation sub-module includes a second accumulation engine and a second calculation engine.
  • The first accumulation engine is used to determine the weight accumulation values corresponding to the output neurons in the first weight group corresponding to the first calculation sub-module, and the first calculation engine is configured to determine, according to the weight accumulation values output by the first accumulation engine, the membrane voltages of the output neurons in the first weight group at the current moment.
  • The second accumulation engine is used to determine the weight accumulation values corresponding to the output neurons in the second weight group corresponding to the second calculation sub-module, and the second calculation engine is used to determine, according to the weight accumulation values output by the second accumulation engine, the membrane voltages of the output neurons in the second weight group at the current moment.
  • A computing method based on a spiking neural network includes: obtaining, according to the information of multiple input neurons, multiple weight values in the compressed weight matrix and the identifiers of the corresponding multiple output neurons, where the multiple weight values include weight values of a same row in the compressed weight matrix obtained in parallel, and the identifiers of the multiple output neurons include the identifiers, obtained in parallel, of the output neurons corresponding to the weight values of that same row; the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and determining, according to the multiple weight values, the membrane voltages of the corresponding multiple output neurons.
  • The input neurons of the spiking neural network circuit include a first input neuron and a second input neuron, and obtaining the weight values in the compressed weight matrix includes: obtaining the first-row weight values corresponding to the first input neuron and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values, and obtaining the second-row weight values corresponding to the second input neuron and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values.
  • Before the multiple weight values in the compressed weight matrix and the identifiers of the corresponding multiple output neurons are obtained, the method further includes: pruning part of the weight values in the initial weight matrix according to the pruning ratio to obtain the compressed weight matrix.
  • The compressed weight matrix includes multiple weight groups, and the number of non-zero weight values in each row of each weight group is the same.
  • the membrane voltages of the corresponding multiple output neurons are determined in parallel according to multiple weight values in each weight group.
  • The multiple weight groups include a first weight group and a second weight group. The weight accumulation values corresponding to the output neurons in the first weight group are determined, and the membrane voltages of the output neurons in the first weight group at the current moment are determined according to those weight accumulation values; the weight accumulation values corresponding to the output neurons in the second weight group are determined, and the membrane voltages of the output neurons in the second weight group at the current moment are determined according to those weight accumulation values.
  • A spiking neural network system includes a memory and a neural network circuit according to the first aspect or any one of the possible implementations of the first aspect, where the memory is used to store a plurality of compressed weight values.
  • the memory is further configured to store information of a plurality of input neurons.
  • A spiking neural network system includes a processor and a neural network circuit according to the first aspect or any one of the possible implementations of the first aspect, where the processor includes an input buffer, and the input buffer is used for caching the information of the plurality of input neurons.
  • The system further includes a memory for storing the compressed multiple weight values.
  • an apparatus for determining the membrane voltage of a spiking neuron comprising a communication interface and a processor.
  • the processor is configured to control the communication interface to send and receive information
  • the processor is connected to the communication interface, and is configured to execute the method for determining the membrane voltage of the spiking neuron in the second aspect or any possible implementation manner of the second aspect.
  • The processor may be a general-purpose processor and may be implemented by hardware or by software.
  • When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, and the memory may be integrated in the processor or located outside the processor and exist independently.
  • A computer program product includes computer program code which, when run on a computing device, causes the computing device to execute the method of the second aspect or any one of the possible implementations of the second aspect.
  • A computer-readable medium stores program code which, when executed on a computing device, causes the computing device to execute the method of the second aspect or any one of the possible implementations of the second aspect.
  • These computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drives.
  • Figure 1 shows a schematic diagram of the structure of a spiking neural network.
  • FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a spiking neural network after semi-structured pruning provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of grouping and semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a spiking neural network after grouping semi-structured pruning provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a spiking neural network circuit provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an associated compression weight storage space 240 provided by an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a decompression engine obtaining weight values and corresponding output neurons according to an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of an accumulation engine obtaining weight accumulation values of output neurons according to an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of a calculation engine determining the membrane voltage of a spiking neuron according to an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another spiking neural network circuit provided by an embodiment of the present application.
  • FIG. 15 is a schematic block diagram of a spiking neural network system 1500 provided by an embodiment of the present application.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that responds in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • A neural network is a mathematical or computational model that imitates the structure and function of a biological neural network (the animal central nervous system, especially the brain) and is used to estimate or approximate functions.
  • The biological brain is composed of a large number of neurons connected in different ways; the former (presynaptic) neuron and the latter (postsynaptic) neuron are connected through a synaptic structure for information transmission.
  • The spiking neural network is often referred to as the third-generation artificial neural network; in its information processing method and biological model, it is closer to a real biological processing system than traditional artificial neural networks.
  • On the one hand, the artificial neural network transmits multi-valued signals while the spiking neural network transmits binary pulse information, so its input and output information is sparse and the spiking neural network has low power consumption; on the other hand, the neuron model of the spiking neural network is similar to the neuron model of the brain, has a dynamic accumulation process, and carries one more time dimension than the traditional artificial neural network, so it is more suitable for processing intelligent tasks that involve temporal information.
  • Figure 1 shows a schematic diagram of the structure of a spiking neural network.
  • the spiking neural network can contain three layers: input layer, hidden layer, and output layer.
  • The hidden layer may contain multiple layers; the logic within a layer is parallel, the logic between layers is serial, and the calculation results of the layers are interdependent and affect each other.
  • Fig. 1 takes the example that the hidden layer includes a layer of neurons for illustration.
  • each layer of a spiking neural network may include multiple nodes, each of which is used to simulate a spiking neuron for performing a certain operation, such as an activation function.
  • the connection between the former neuron (which may also be called an input neuron) and the latter neuron (which may also be called an output neuron) is used to simulate a synapse.
  • a synapse is a carrier of information transmission between two neurons, and the weight value of the synapse represents the connection strength between the two neurons.
  • the reference numbers in each node shown in FIG. 1 are only for identifying or distinguishing different nodes.
  • the transmission of information between neurons in a spiking neural network in the form of spikes is based on discrete-valued activities that occur at certain points in time, rather than continuous values.
  • the occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage.
  • the membrane voltage of each neuron changes with the input pulse.
  • the neuron will be activated to generate a new signal (for example, firing a pulse), and transmit the signal to other neurons connected to it.
  • the neurons in the spiking neural network realize the transmission and processing of information through the above methods, and have information processing capabilities such as nonlinearity, self-adaptation, and fault tolerance.
  • one synaptic connection may be used between two neurons in the spiking neural network, or multiple synaptic connections may also be used, which is not specifically limited in this application.
  • Each synapse has a modifiable synaptic weight (also called weight), and multiple pulses transmitted by presynaptic neurons can generate different postsynaptic membrane voltages according to the size of the synaptic weight.
  • Although the spiking neural network is sparse and consumes little power during operation, its accuracy is not high. To improve the accuracy of the network, the number of weights must be large, which makes the weight storage in a spiking neural network chip too large; the area, delay and power consumption of the chip increase accordingly, which restricts the hardware development and commercialization of spiking neural networks. It is therefore of great significance to compress the weights of spiking neural networks.
  • An embodiment of the present application provides a method for compressing the weights of a spiking neural network, which can make the number of non-zero weights in each row, or in each row and each group, of the weight matrix the same, so as to save weight storage resources of the spiking neural network chip.
  • It can also enable parallel decompression and parallel computing at the hardware level of the spiking neural network, increasing the computing speed, thereby improving the computing efficiency at the hardware level and reducing delay and power consumption.
  • FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application. As shown in FIG. 2, the method may include steps 210-270, and the steps 210-270 will be described in detail below respectively.
  • Step 210 Load the pre-trained spiking neural network to obtain initial weights.
  • The initial weight matrix of the hidden layer is shown in Figure 3, where each row in the initial weight matrix corresponds to an input neuron, for example, a neuron of the input layer connected to the hidden layer, and each column corresponds to an output neuron, for example, a neuron in the hidden layer.
  • The weights W11 to W41 in the first column represent the weight values corresponding to neuron No. 1 in the hidden layer in Figure 1; the weights W12 to W42 in the second column represent the weight values corresponding to neuron No. 2; the weights W13 to W43 in the third column represent the weight values corresponding to neuron No. 3; the weights W14 to W44 in the fourth column represent the weight values corresponding to neuron No. 4; the weights W15 to W45 in the fifth column represent the weight values corresponding to neuron No. 5; and the weights W16 to W46 in the sixth column represent the weight values corresponding to neuron No. 6 in the hidden layer in Figure 1.
  • The weights W11 to W16 in the first row represent the weight values corresponding to neuron No. 7 in the input layer in Figure 1; the weights W21 to W26 in the second row correspond to neuron No. 8; the weights W31 to W36 in the third row correspond to neuron No. 9; and the weights W41 to W46 in the fourth row correspond to neuron No. 10 in the input layer in Figure 1.
  • Step 220 Select different weight matrix pruning schemes according to requirements.
  • If the number of non-zero weights in each row of the weight matrix needs to be made the same, the semi-structured pruning in step 230 may be performed; if the weight matrix needs to be grouped and the number of non-zero weights in each row and each group made the same, the grouped semi-structured pruning in step 240 may be performed.
  • Step 230 Semi-structured pruning.
  • The semi-structured pruning in the embodiment of the present application refers to performing weight pruning with each row of the weight matrix as the granularity to obtain a semi-structured pruned weight matrix.
  • Specifically, the weight values of each row in the original weight matrix can be sorted by magnitude, and the last s% (the sparsity) of the sorted weights are then set to 0.
  • In this way, the semi-structured pruned weight matrix is obtained, and the length of each row in the semi-structured pruned weight matrix is the same.
  • the weight value of each row in the initial weight matrix shown in Figure 3 can be sorted, and then the weight value of the last 66.6% of the sorting can be set to 0.
  • the dotted line represents the pruned weight value
  • the weight matrix composed of solid lines is the pruned weight matrix.
  • The number of non-zero weight values in each row is the same (each row includes two weight values).
  • The first row of the semi-structured pruned weight matrix includes two weights, W11 and W14; that is, neuron No. 7 in the input layer in Figure 1 is connected to neuron No. 1 and neuron No. 4 of the hidden layer, and the connection weight values are W11 and W14 respectively.
  • The second row includes two weights, W22 and W26; that is, neuron No. 8 in the input layer in Figure 1 is connected to neuron No. 2 and neuron No. 6 of the hidden layer, and the connection weight values are W22 and W26 respectively.
  • The third row includes two weights, W31 and W35; that is, neuron No. 9 in the input layer in Figure 1 is connected to neuron No. 1 and neuron No. 5 of the hidden layer, and the connection weight values are W31 and W35 respectively.
  • The fourth row includes two weights, W43 and W44; that is, neuron No. 10 in the input layer in Figure 1 is connected to neuron No. 3 and neuron No. 4 of the hidden layer, and the connection weight values are W43 and W44 respectively.
  • the length of each row in the pruned weight matrix is the same, and the number of connections between each neuron in the same layer and the neuron in the next layer is the same.
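  • As a rough software illustration of step 230, the sketch below prunes each row of a small weight matrix to the same number of non-zero values by sorting the row by magnitude and zeroing the smallest s%. It is a minimal sketch: the matrix shape, the sparsity value and the function name are illustrative and not taken from the patent.
```python
import numpy as np

def semi_structured_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Prune each row independently so that every row keeps the same
    number of non-zero weights (semi-structured pruning)."""
    pruned = weights.copy()
    num_cols = weights.shape[1]
    num_keep = num_cols - int(round(num_cols * sparsity))
    for row in pruned:
        order = np.argsort(np.abs(row))          # ascending by magnitude
        row[order[:num_cols - num_keep]] = 0.0   # zero the smallest s%
    return pruned

# 4 input neurons x 6 output neurons; sparsity 66.6% keeps 2 weights per row,
# matching the example of FIG. 4.
w = np.random.default_rng(0).standard_normal((4, 6))
w_pruned = semi_structured_prune(w, 0.666)
assert all(np.count_nonzero(r) == 2 for r in w_pruned)
```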
  • Step 240 Group semi-structured pruning.
  • the grouping semi-structured pruning in the embodiment of the present application refers to dividing each row into several weight groups of equal number, and performing weight pruning with each group in each row of the weight matrix as the granularity to obtain the grouping semi-structure pruned weight matrix.
  • Specifically, with each group in each row of the weight matrix as the granularity, the weight values of each group in each row of the original weight matrix can be sorted by magnitude, and the last s% (the sparsity) of the sorted weights are then set to 0.
  • In this way, the grouped semi-structured pruned weight matrix is obtained, and the lengths of each row and each group in the grouped semi-structured pruned weight matrix are the same.
  • For example, the neurons in the hidden layer shown in Figure 1 can be divided into two groups, each group including three neurons; for example, the first group includes the three neurons numbered 1 to 3, and the second group includes the three neurons numbered 4 to 6.
  • the weight values of each group in each row in the initial weight matrix shown in Figure 3 can be sorted, and then the bottom 66.6% of the sorted weight values are set to 0.
  • the dotted line represents the pruned weight value
  • the weight matrix formed by the solid line is the pruned weight matrix.
  • The number of non-zero weight values in each group of each row is the same (each group includes one weight value); that is, each row and each group in the weight matrix after grouped pruning have the same length.
  • the specific structure of the spiking neural network after grouping semi-structured pruning is shown in Figure 7.
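  • A corresponding sketch of step 240 is given below; it applies the same magnitude-based pruning within each column group of every row, so that every (row, group) block keeps the same number of non-zero weights. The shapes and values are again illustrative only.
```python
import numpy as np

def grouped_semi_structured_prune(weights: np.ndarray, num_groups: int,
                                  sparsity: float) -> np.ndarray:
    """Prune every (row, group) block to the same number of non-zero weights."""
    pruned = weights.copy()
    group_size = weights.shape[1] // num_groups
    num_keep = group_size - int(round(group_size * sparsity))
    for row in pruned:
        for g in range(num_groups):
            block = row[g * group_size:(g + 1) * group_size]  # view into the row
            order = np.argsort(np.abs(block))
            block[order[:group_size - num_keep]] = 0.0
    return pruned

# 6 output neurons split into 2 groups of 3; sparsity 66.6% keeps 1 weight per
# group in every row, matching the example of FIG. 6.
w = np.random.default_rng(1).standard_normal((4, 6))
w_pruned = grouped_semi_structured_prune(w, num_groups=2, sparsity=0.666)
```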
  • Step 250 Calculate a loss function according to the pruned weights.
  • the loss function is used to assist in optimizing the parameters of the spiking neural network by calculating the error between the actual (target) value and the predicted value of the spiking neural network.
  • In this embodiment, the loss function of the spiking neural network can be calculated according to the pruned weights to obtain the error between the actual (target) value and the predicted value of the spiking neural network, so that the pruned weight matrix can be optimized or updated according to that error, thereby assisting in optimizing the parameters of the spiking neural network.
  • Step 260 Retrain the spiking neural network and update the pruned weight matrix.
  • the parameters (weights) of the spiking neural network can be optimized according to the above loss function to minimize the loss of the spiking neural network.
  • the parameters (weights) of a spiking neural network can be optimized using gradient descent, and the pruned weight matrix can be updated to minimize the loss of the neural network.
  • Step 270 Determine whether the spiking neural network has converged.
  • If the spiking neural network has converged, the process ends; if the spiking neural network has not converged, continue to perform step 230 or step 240 until the spiking neural network converges.
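  • Steps 230 to 270 can be pictured as a prune-retrain loop. The sketch below reuses the semi_structured_prune function from the earlier sketch and stands in for the actual spiking-network training with generic loss_fn and grad_fn callables; those callables, the learning rate, the convergence tolerance and the choice to keep pruned positions at zero during the update are assumptions for illustration, not details from the patent.
```python
import numpy as np

def prune_retrain_loop(weights, loss_fn, grad_fn, sparsity,
                       lr=0.01, tol=1e-4, max_iters=1000):
    """Prune (step 230), compute the loss (step 250), retrain the surviving
    weights with gradient descent (step 260), and stop once the loss change
    is small, standing in for the convergence check of step 270."""
    prev_loss = np.inf
    for _ in range(max_iters):
        weights = semi_structured_prune(weights, sparsity)  # step 230
        mask = weights != 0                                 # keep pruned positions at zero
        loss = loss_fn(weights)                             # step 250
        weights = weights - lr * grad_fn(weights) * mask    # step 260
        if abs(prev_loss - loss) < tol:                     # step 270
            break
        prev_loss = loss
    return semi_structured_prune(weights, sparsity)
```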
  • In the embodiments of the present application, semi-structured pruning makes the number of non-zero weights in each row of each layer's weight matrix the same, and grouped semi-structured pruning makes the number of non-zero weights in each row and each group the same. In this way, while saving weight storage resources at the hardware level, it also helps to achieve parallel decompression and parallel computing, increasing the computing speed at the hardware level, thereby improving computing efficiency and reducing latency and power consumption.
  • FIG. 8 to FIG. 12 are only for helping those skilled in the art to understand the embodiments of the present application, and are not intended to limit the embodiments of the present application to specific numerical values or specific scenarios exemplified. Those skilled in the art can obviously make various equivalent modifications or changes according to the examples of FIG. 8 to FIG. 12 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
  • FIG. 8 is a schematic structural diagram of a spiking neural network circuit provided by an embodiment of the present application.
  • The spiking neural network circuit may include decompression engines 1 to n (also referred to as decompression modules) and a calculation module 210.
  • the calculation module 210 may include an accumulation engine 250 and a calculation engine 260 .
  • The spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compression weight address information storage space 230, an associated compression weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290.
  • the functions of each of the above modules will be described in detail below.
  • the input buffer 205 is used to store the information of the pre-neuron (input neuron) that sends the input pulse (the information may be the number or index of the neuron).
  • The input neuron may be a neuron of the input layer shown in FIG. 1.
  • the input buffer 205 may be a processor's buffer.
  • the compression module 220 is configured to execute the method shown in FIG. 2 above to obtain a pruned weight matrix, where the pruned weight matrix includes the pruned weight and an output neuron number corresponding to the weight.
  • the pruned weight and the output neuron number corresponding to the weight may also be stored in the associated compressed weight storage space 240 .
  • the associated compression weight storage space 240 is used to store the pruned weight and the number of the output neuron corresponding to the weight.
  • the output neuron may be the neuron of the hidden layer shown in FIG. 1 .
  • The pruned weights obtained through the semi-structured pruning in FIG. 4 and the corresponding output neuron numbers can be related according to a certain correspondence to form associated compressed weight data, and the associated compressed weight data is hardened into the associated compression weight storage space 240 of the spiking neural network chip.
  • the associated compression weight storage space 240 stores compression weights and associated indexes.
  • This association method may adopt a direct indexing method or an indirect indexing method, which is not specifically limited in this embodiment of the present application.
  • the direct index method is to add a corresponding index before each compression weight, and the specific content of the index is the number of the neuron.
  • the indirect index method is to add a corresponding index before each compression weight, and the specific content of the index is the distance between the neuron number of the compression weight and the neuron number of the previous compression weight.
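  • The difference between the two index formats can be illustrated with a small sketch: the direct format stores the output neuron number next to each compression weight, while the indirect format stores the distance from the previous compressed weight's neuron number. The weight values used are made up for illustration.
```python
def direct_index(pairs):
    """pairs: list of (output_neuron_number, weight) for one row,
    e.g. [(1, w11), (4, w14)]; stored as-is in the direct format."""
    return list(pairs)

def indirect_index(pairs):
    """Store the offset from the previous compressed weight's neuron number."""
    encoded, prev = [], 0
    for neuron, weight in pairs:
        encoded.append((neuron - prev, weight))
        prev = neuron
    return encoded

row1 = [(1, 0.3), (4, -0.7)]          # row of FIG. 4 keeping W11 and W14
print(direct_index(row1))             # [(1, 0.3), (4, -0.7)]
print(indirect_index(row1))           # [(1, 0.3), (3, -0.7)]  -> offsets 1 and 3
```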
  • FIG. 9 shows a storage format diagram of the associated compression weight storage space 240 for semi-structured pruning.
  • This figure only shows the format of one layer (for example, the hidden layer) of the spiking neural network; other layers are similar. Each row represents an input neuron and each column represents an output neuron. It can be seen that after semi-structured sparsification, the number of compression weights in each row is the same. After the associated compression weights are obtained, the weight matrix is hardened into the associated compression weight storage space 240 of the chip.
  • Taking the case where the specific content of the index is the number of the output neuron associated with the weight as an example: the first row of the associated compression weight storage space 240 stores 1-W11 and 4-W14; the second row stores 2-W22 and 6-W26; the third row stores 1-W31 and 5-W35; and the fourth row stores 3-W43 and 4-W44.
  • 1-W11 stored in the first row corresponds to one associated compression weight, which represents the compression weight W11 and the index corresponding to that compression weight; the index 1 indicates that the output neuron associated with the compression weight is neuron No. 1. Similarly, 4-W14 corresponds to another associated compression weight, which represents the compression weight W14 and its index; that index indicates that the output neuron associated with the compression weight is neuron No. 4.
  • the associated compression weight address information storage space 230 is used to store the above-mentioned address resolution information associated with the compression weight.
  • The address resolution information may be the base address of the associated compression weights of each row and the number of compression weights in each row.
  • The address of the corresponding associated compression weights in the associated compression weight storage space 240 can be calculated from the base address of the associated compression weights of each row. Compared with an unstructured pruning scheme, in which the number of associated compression weights differs from row to row and therefore has to be stored separately for every row, this scheme can save weight storage resources.
  • FIG. 9 shows a storage format diagram of the associated compressed weight address information storage space 230 of semi-structured pruning. This figure only shows the format diagram of one layer (for example, hidden layer) of the spiking neural network, other layers are similar. After obtaining the address resolution information associated with the compression weight, the address resolution information associated with the compression weight is hardened into the storage space 230 of the associated compression weight address information of the chip.
  • The associated compression weight address information storage space 230 can store the base address of the associated compression weights of each row and the number of associated compression weights in each row; in this example, the number is 2 for each row.
  • the decompression engine is configured to de-associate the associative compression weights stored in the associative compression weight storage space 240 according to the information of the plurality of input neurons. Specifically, referring to FIG. 10 , the decompression engine may obtain the input neuron number from the input buffer 205 , and resolve the address information of the associated compression weight in the associated compression weight address information storage space 230 according to the number. The associated compression weight is obtained from the associated compression weight storage space 240 according to the address information, and the associated compression weight is disassociated through the disassociation module 1010 to obtain the corresponding output neuron number and weight. As an example, if the index format is a direct index, the output neuron number and weight information can be obtained directly; if the index format is an indirect index, the neuron number and weight information can be obtained through a shift operation.
  • Since the weight compression in the embodiment of this application uses a semi-structured pruning scheme, the number of weights in each row after pruning is the same, so decompression engines 1 to n can be used, and each decompression engine is responsible for decompressing, according to the information of the multiple input neurons, the associated compression weights of one row in the associated compression weight storage space 240.
  • the 1-n decompression engines perform parallel decompression at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
  • the spiking neural network chip may include four decompression engines, and each decompression engine is responsible for decompressing the associated compression weights in a row in the associated compression weight storage space 240 .
  • The decompression engine 1 is responsible for decompressing the associated compression weights (for example, 1-W11 and 4-W14) stored in the first row of the associated compression weight storage space 240; the decompression engine 2 is responsible for decompressing the associated compression weights (for example, 2-W22 and 6-W26) stored in the second row; the decompression engine 3 is responsible for decompressing the associated compression weights (for example, 1-W31 and 5-W35) stored in the third row; and the decompression engine 4 is responsible for decompressing the associated compression weights (for example, 3-W43 and 4-W44) stored in the fourth row.
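  • A software analogue of this row-parallel decompression is sketched below: each "engine" resolves its row's base address and weight count from the address information (storage space 230) and slices the corresponding (neuron number, weight) pairs out of a flattened weight store (storage space 240). The data layout, the thread pool and the variable names are assumptions made for illustration.
```python
from concurrent.futures import ThreadPoolExecutor

def decompress_row(row_idx, addr_info, weight_store):
    """One decompression engine: look up the row's base address and count,
    then read that row's (output neuron number, weight) pairs."""
    base, count = addr_info[row_idx]          # from storage space 230
    return weight_store[base:base + count]    # from storage space 240

# Flattened direct-index store matching the semi-structured example above.
weight_store = [(1, 'W11'), (4, 'W14'),   # row of input neuron No. 7
                (2, 'W22'), (6, 'W26'),   # row of input neuron No. 8
                (1, 'W31'), (5, 'W35'),   # row of input neuron No. 9
                (3, 'W43'), (4, 'W44')]   # row of input neuron No. 10
addr_info = {0: (0, 2), 1: (2, 2), 2: (4, 2), 3: (6, 2)}  # base address, count

# Four "engines" decompress the four rows concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    rows = list(pool.map(lambda r: decompress_row(r, addr_info, weight_store),
                         range(4)))
```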
  • Calculation module 210 may include accumulation engine 250 and calculation engine 260 .
  • the accumulation engine 250 is used to accumulate the weights of the corresponding output neurons.
  • Specifically, the accumulation engine 250 can read, according to the neuron numbers output by the decompression engines 1 to n, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulate the weights output by the decompression engines 1 to n onto the corresponding weight accumulation values, and then write the accumulated values back into the weight accumulation storage space 270.
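  • In software, this accumulation step amounts to adding each decompressed weight onto its output neuron's running sum, as in the minimal sketch below (the dictionary stands in for the weight accumulation storage space 270; the numeric values are made up).
```python
from collections import defaultdict

def accumulate(weight_acc, decompressed_rows):
    """Accumulation engine: for every (output neuron number, weight) pair
    produced by the decompression engines, add the weight onto that
    neuron's accumulation value."""
    for row in decompressed_rows:
        for neuron_id, weight in row:
            weight_acc[neuron_id] += weight
    return weight_acc

acc = accumulate(defaultdict(float),
                 [[(1, 0.3), (4, -0.7)], [(2, 0.5), (6, 0.1)]])
# acc == {1: 0.3, 4: -0.7, 2: 0.5, 6: 0.1}
```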
  • The calculation engine 260 is used to calculate the membrane voltage of the output neurons. Specifically, referring to FIG. 12, the calculation engine 260 reads the membrane voltage of the previous moment, the neuron parameter configuration and the weight accumulation value from the membrane voltage storage space 290, the neuron parameter space 280 and the weight accumulation storage space 270 respectively, and the neuron calculation module 1201 performs membrane voltage accumulation. If the membrane voltage exceeds the threshold voltage, a pulse is fired and the membrane voltage is written back to the membrane voltage storage space 290. If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290.
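  • The per-neuron update performed by the calculation engine can be sketched as follows. The leak term and the reset-to-zero behaviour after firing are assumptions typical of leaky integrate-and-fire models; the patent itself only states that a pulse is fired and the membrane voltage is written back.
```python
def update_membrane_voltage(v_prev, weight_acc, threshold, leak=0.0):
    """Add the accumulated weight onto the previous membrane voltage and
    compare against the threshold; return (new voltage, spike flag)."""
    v = v_prev - leak + weight_acc
    if v >= threshold:
        return 0.0, True    # fire a pulse; assumed reset before write-back
    return v, False         # write back the accumulated voltage, no pulse

v, fired = update_membrane_voltage(v_prev=0.8, weight_acc=0.3, threshold=1.0)
# fired is True because 0.8 + 0.3 crosses the threshold of 1.0
```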
  • the weight accumulation storage space 270 is used to store the weight accumulation value corresponding to each output neuron.
  • the neuron parameter space 280 is used to store the neuron parameter configuration information of the spiking neural network.
  • Membrane voltage storage space 290 for storing the accumulated membrane voltage of the neuron.
  • FIG. 13 is only for helping those skilled in the art to understand the embodiments of the present application, but is not intended to limit the embodiments of the present application to specific numerical values or specific scenarios exemplified. According to the example of Fig. 13 given below, those skilled in the art can obviously make various equivalent modifications or changes, and such modifications and changes also fall within the scope of the embodiments of the present application.
  • FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron provided by an embodiment of the present application. As shown in FIG. 13 , the method may include steps 1310-1350, and the steps 1310-1350 will be described in detail below respectively.
  • FIG. 13 illustrates the calculation of the membrane voltages of neurons No. 1 to 6 in the hidden layer, and the calculation of the membrane voltages of neurons in other layers is similar to the method shown in FIG. 13 .
  • Step 1310 The four decompression engines obtain corresponding input neuron numbers from the input buffer 205 in parallel, respectively.
  • For example, the four decompression engines respectively obtain from the input buffer 205 the numbers of input neurons No. 7 to No. 10 of the input layer.
  • Step 1320 The four decompression engines obtain the associated compression weights in parallel according to the input neuron numbers, and perform de-association to obtain the output neuron numbers and corresponding weights.
  • FIG. 8 may include 4 decompression engines (decompression engine 1 to decompression engine 4), each decompression engine is responsible for de-associating the associated compression weights of the corresponding rows in the associated compression weight storage space 240, and the 4 decompression engines ( The decompression engine 1 to the decompression engine 4) can perform the deassociation of the associated compression weights of the four rows in the associated compression weight storage space 240 in parallel.
  • each decompression engine parses the address information of the associated compression weight of the corresponding row in the associated compression weight address information storage space 230 according to the input neuron number in parallel, and obtains the address information from the associated compression weight storage space 240 in parallel according to the address information Associate the compression weights, and de-associate the associated compression weights to obtain the corresponding output neuron numbers and weights.
  • The decompression engine 1 is responsible for decompressing the associated compression weights (for example, 1-W11 and 4-W14) stored in the first row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 1 is W11 and the weight value corresponding to output neuron No. 4 is W14.
  • In parallel, the decompression engine 2 is responsible for decompressing the associated compression weights (for example, 2-W22 and 6-W26) stored in the second row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 2 is W22 and the weight value corresponding to output neuron No. 6 is W26.
  • other decompression engines de-associate the associated compression weights of other rows in the associated compression weight storage space 240 in parallel.
  • Step 1330 The accumulation engine 250 performs weight accumulation according to the output neuron number and the corresponding weight.
  • the accumulation engine 250 can read the weight accumulation value corresponding to the neuron number in the weight accumulation storage space 270 according to the above-mentioned output neuron number, and compare the output of the four decompression engines (decompression engine 1 to decompression engine 4) with the neuron. The weight corresponding to the number is accumulated with the weight accumulation value, and then the accumulated value is written into the weight accumulation storage space 270 .
  • Step 1340 Determine whether the accumulation of the single layer is completed.
  • Step 1350 The calculation engine 260 calculates the neuron's membrane voltage.
  • the calculation engine 260 reads the membrane voltage, neuron parameter configuration and the previous time respectively from the membrane voltage storage space 290, the neuron parameter space 280 and the weight accumulation storage space 270. The weights are accumulated and the membrane voltage is accumulated by the neuron calculation module 1201. If the membrane voltage exceeds the threshold voltage, a pulse is fired and the membrane voltage is written back to the membrane voltage storage space 290 . If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290 .
  • Because the embodiment of the present application uses a semi-structured pruning solution, the number of weights in each row is the same, and multiple decompression engines can be used to de-associate and decompress the associated compression weights in parallel. In this way, multiple decompression engines perform parallel decompression at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
  • FIG. 14 is a schematic structural diagram of another spiking neural network circuit provided by an embodiment of the present application.
  • the circuit may include: 1 to kn decompression engines (decompression engine 11 to decompression engine kn) and a calculation module 210 .
  • The calculation module 210 may include calculation sub-modules 1 to k, for example, calculation sub-module 1 to calculation sub-module k, and each calculation sub-module may include an accumulation engine and a corresponding calculation engine.
  • The spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compression weight address information storage space 230, an associated compression weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290.
  • The calculation module 210 may include calculation sub-modules 1 to k, where each calculation sub-module is responsible for determining, according to the multiple weight values in its corresponding weight group, the membrane voltages of the corresponding multiple output neurons.
  • each calculation submodule includes an accumulation engine and a corresponding calculation engine, and the accumulation engine is responsible for determining the weight accumulation value corresponding to the output neurons in a set of weight groups corresponding to the calculation submodule; the calculation engine is responsible for The weight accumulation value output by the accumulation engine determines the membrane voltage of the output neurons in the weight group at the current moment.
  • 1 to k accumulation engines can be used for parallel accumulation.
  • the accumulation engine is responsible for accumulating the corresponding weights of a group of output neurons.
  • 1 to k calculation engines can also be used for parallel calculation, and each calculation engine is responsible for calculating the output neuron membrane voltage according to the weight accumulation value output by the corresponding accumulation engine.
  • the decompression engine 11 to the decompression engine 1n can de-associate the associated compression weights of each row in the group corresponding to the accumulation engine 1 in parallel.
  • the decompression engine k1 to the decompression engine kn are responsible for the de-association of the associated compression weights of each row in the group corresponding to the accumulation engine k, and so on.
  • Fig. 14 may include 2 accumulation engines (accumulation engine 1, accumulation engine 2), 2 calculation engines (calculation engine 1 , calculation engine 2), decompression engines 11-14, decompression engines 21-24.
  • Each of the decompression engines 11 to 14 is responsible for de-associating the associated compression weights of the corresponding row in the first group, the accumulation engine 1 is responsible for the weight accumulation of the neurons in the first group, and the calculation engine 1 is responsible for the membrane voltage calculation of the neurons in the first group.
  • Each of the decompression engines 21 to 24 is responsible for de-associating the associated compression weights of the corresponding rows in the second group, the accumulation engine 2 is responsible for the weight accumulation of the neurons in the second group, and the calculation engine 2 is responsible for the neurons in the second group. Element membrane voltage calculation.
  • The decompression engine 11 is responsible for decompressing the associated compression weights (for example, 1-W11) stored in the first group of the first row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 1 is W11.
  • In parallel, the decompression engine 12 decompresses the associated compression weights (for example, 2-W22) stored in the first group of the second row, obtaining the weight value W22 corresponding to output neuron No. 2; the decompression engine 13 decompresses the associated compression weights (for example, 1-W31) stored in the first group of the third row, obtaining the weight value W31 corresponding to output neuron No. 1; and the decompression engine 14 decompresses the associated compression weights (for example, 3-W43) stored in the first group of the fourth row, obtaining the weight value W43 corresponding to output neuron No. 3.
  • The accumulation engine 1 is responsible for reading, according to the numbers of neurons No. 1 to No. 3, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulating the weights output by the four decompression engines (decompression engine 11 to decompression engine 14) onto the corresponding weight accumulation values, and then writing the accumulated values into the weight accumulation storage space 270.
  • The decompression engine 21 is responsible for decompressing the associated compression weights (for example, 4-W14) stored in the second group of the first row of the associated compression weight storage space 240, and obtains that the weight value corresponding to output neuron No. 4 is W14.
  • In parallel, the decompression engine 22 decompresses the associated compression weights (for example, 6-W26) stored in the second group of the second row, obtaining the weight value W26 corresponding to output neuron No. 6; the decompression engine 23 decompresses the associated compression weights (for example, 5-W35) stored in the second group of the third row, obtaining the weight value W35 corresponding to output neuron No. 5; and the decompression engine 24 decompresses the associated compression weights (for example, 4-W44) stored in the second group of the fourth row, obtaining the weight value W44 corresponding to output neuron No. 4.
  • The accumulation engine 2 can work in parallel with the accumulation engine 1; it is responsible for reading, according to the numbers of neurons No. 4 to No. 6, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulating the weights output by the four decompression engines (decompression engine 21 to decompression engine 24) onto the corresponding weight accumulation values, and then writing the accumulated values into the weight accumulation storage space 270.
  • The number of accumulation engines and calculation engines included in the chip shown in FIG. 14 is determined by how many groups the neurons of a layer are divided into; in this example they are divided into two groups.
  • the n neurons in a certain layer can also be divided into n groups, and each neuron is a group. In this way, n accumulation engines and n calculation engines are required.
  • Each accumulation engine is responsible for the weight accumulation of a neuron, and each calculation engine is responsible for the calculation of the membrane voltage of a neuron.
  • Because the weight compression uses a grouped semi-structured pruning scheme, the number of weights in each row and each group after pruning is the same, so multiple accumulation engines can accumulate in parallel and multiple calculation engines can compute in parallel. In this way, the computing speed of the spiking neural network chip is further increased, thereby improving computing efficiency and reducing delay and power consumption.
  • FIG. 15 is a schematic block diagram of a spiking neural network system 1500 provided by an embodiment of the present application. As shown in FIG. 15 , the spiking neural network system 1500 may include a memory 1510 and a neural network circuit 1520 .
  • the memory 1510 may be used to store the compressed weight values, and as an example, the memory 1510 may correspond to the associated compressed weight storage space 240 above.
  • the memory 1510 can also be used to store the information of the input neurons.
  • the memory 1510 can correspond to the input buffer 205 above.
  • the neural network circuit 1520 may be the spiking neural network circuit shown in FIG. 8 , or the neural network circuit 1520 may also be the spiking neural network circuit shown in FIG. 14 .
  • It should be understood that the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation, there may be other division methods.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

A spiking neural network circuit and a spiking neural network-based calculation method, the circuit comprising a plurality of decompression modules and a calculation module. The plurality of decompression modules are respectively used to obtain, according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a plurality of corresponding output neurons, wherein each decompression module among the plurality of decompression modules is used to obtain in parallel the weight values of the same row in the compressed weight matrix and the identifiers of the plurality of output neurons corresponding to the weight values of that row; the number of non-zero weight values in each row of the compressed weight matrix is the same; and the weight values of each row correspond to one input neuron. The calculation module is used to respectively determine the membrane voltages of the plurality of corresponding output neurons according to the plurality of weight values. The spiking neural network circuit can improve calculation efficiency.

Description

Spiking neural network circuit and computing method based on spiking neural network
This application claims priority to the Chinese patent application No. 202110363578.3, entitled "A spiking neural network compression method and apparatus", filed with the State Intellectual Property Office of China on April 2, 2021, and to the Chinese patent application No. 202110588707.9, entitled "Spiking neural network circuit and computing method based on spiking neural network", filed with the State Intellectual Property Office of China on May 28, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and more particularly, to a spiking neural network circuit and a computing method based on the spiking neural network.
Background
As an emerging neural network, the spiking neural network (SNN) is often referred to as the third-generation artificial neural network. In terms of information processing methods and biological models, it is closer to a real biological processing system than traditional artificial neural networks.
Information is transmitted between neurons in a spiking neural network in the form of pulses. The occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage. Each neuron accumulates the pulse trains of its preceding neurons, and its membrane voltage changes with the input pulses. When the membrane voltage of a neuron reaches a preset voltage value, the neuron is activated, generates a new signal (for example, fires a pulse), and transmits the signal to other neurons connected to it. Related spiking neural network circuits are inefficient at calculating the membrane voltages of neurons.
Summary of the Invention
The present application provides a spiking neural network circuit and a computing method based on the spiking neural network, and the spiking neural network circuit can improve computing efficiency.
In a first aspect, a spiking neural network circuit is provided. The circuit includes a plurality of decompression modules and a calculation module. The plurality of decompression modules are configured to obtain, according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a plurality of corresponding output neurons, where each of the plurality of decompression modules is configured to obtain in parallel the weight values of a same row in the compressed weight matrix and the identifiers of the plurality of output neurons corresponding to the weight values of that row, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron. The calculation module is configured to respectively determine, according to the plurality of weight values, the membrane voltages of the corresponding plurality of output neurons.
In the above technical solution, since the number of non-zero weight values in each row of the compressed weight matrix is the same, each of the plurality of decompression modules in the spiking neural network circuit can obtain in parallel the weight values of a same row in the compressed weight matrix and the identifiers of the output neurons corresponding to those weight values. In this way, the plurality of decompression modules perform decompression in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
With reference to the first aspect, in some implementations of the first aspect, the input neurons in the spiking neural network circuit include a first input neuron and a second input neuron, and the plurality of decompression modules include a first decompression module and a second decompression module. The first decompression module is configured to obtain the first-row weight values corresponding to the first input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values; the second decompression module is configured to obtain the second-row weight values corresponding to the second input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values.
With reference to the first aspect, in some implementations of the first aspect, the first decompression module is specifically configured to: obtain, from a first storage space, a base address for storing the first-row weight values, where the first storage space stores the base address of each row of weight values in the compressed weight matrix and the number of non-zero weight values in each row; and obtain, from a second storage space according to the base address of the first-row weight values, the first-row weight values and the identifiers of the output neurons respectively corresponding to the first-row weight values, where the second storage space stores the first-row weight values and the identifiers of the output neurons corresponding to the first-row weight values.
With reference to the first aspect, in some implementations of the first aspect, the spiking neural network circuit further includes a compression module, configured to prune part of the weight values in the initial weight matrix according to a pruning ratio, to obtain the compressed weight matrix.
With reference to the first aspect, in some implementations of the first aspect, the compressed weight matrix includes a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
With reference to the first aspect, in some implementations of the first aspect, the calculation module includes a plurality of calculation submodules, and each of the plurality of calculation submodules is configured to be responsible, in parallel, for calculating the membrane voltages of the output neurons in one weight group.
In the above technical solution, since the number of non-zero weight values in each row of each weight group in the compressed weight matrix is the same, a plurality of calculation submodules can be used, and each calculation submodule is responsible in parallel for calculating the membrane voltages of the output neurons in one weight group. In this way, the plurality of calculation submodules perform calculation in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
With reference to the first aspect, in some implementations of the first aspect, the plurality of calculation submodules include a first calculation submodule and a second calculation submodule, the first calculation submodule includes a first accumulation engine and a first calculation engine, and the second calculation submodule includes a second accumulation engine and a second calculation engine. The first accumulation engine is configured to determine the weight accumulation values corresponding to the output neurons in the first weight group corresponding to the first calculation submodule; the first calculation engine is configured to determine, according to the weight accumulation values output by the first accumulation engine, the membrane voltages of the output neurons in the first weight group at the current moment; the second accumulation engine is configured to determine the weight accumulation values corresponding to the output neurons in the second weight group corresponding to the second calculation submodule; and the second calculation engine is configured to determine, according to the weight accumulation values output by the second accumulation engine, the membrane voltages of the output neurons in the second weight group at the current moment.
In a second aspect, a computing method based on a spiking neural network is provided, including: obtaining, according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a plurality of corresponding output neurons, where the plurality of weight values include weight values of a same row in the compressed weight matrix obtained in parallel, the identifiers of the plurality of output neurons include the identifiers, obtained in parallel, of the output neurons corresponding to the weight values of that row, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and respectively determining, according to the plurality of weight values, the membrane voltages of the corresponding plurality of output neurons.
With reference to the second aspect, in some implementations of the second aspect, the input neurons of the spiking neural network circuit include a first input neuron and a second input neuron; the first-row weight values corresponding to the first input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the first-row weight values are obtained; and the second-row weight values corresponding to the second input neuron in the compressed weight matrix and the identifiers of the one or more output neurons respectively corresponding to the second-row weight values are obtained.
With reference to the second aspect, in some implementations of the second aspect, before the obtaining of the plurality of weight values in the compressed weight matrix and the identifiers of the corresponding plurality of output neurons, the method further includes: pruning part of the weight values in the initial weight matrix according to a pruning ratio to obtain the compressed weight matrix.
With reference to the second aspect, in some implementations of the second aspect, the compressed weight matrix includes a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
With reference to the second aspect, in some implementations of the second aspect, the membrane voltages of the corresponding plurality of output neurons are determined in parallel according to the plurality of weight values in each weight group.
With reference to the second aspect, in some implementations of the second aspect, the plurality of weight groups include a first weight group and a second weight group; weight accumulation values corresponding to the output neurons in the first weight group are determined, and the membrane voltages of the output neurons in the first weight group at the current moment are determined according to the weight accumulation values corresponding to the output neurons in the first weight group; and weight accumulation values corresponding to the output neurons in the second weight group are determined, and the membrane voltages of the output neurons in the second weight group at the current moment are determined according to the weight accumulation values corresponding to the output neurons in the second weight group.
The beneficial effects of the second aspect and any possible implementation of the second aspect correspond to the beneficial effects of the first aspect and any possible implementation of the first aspect, and are not repeated here.
In a third aspect, a spiking neural network system is provided, including a memory and the neural network circuit according to the first aspect or any possible implementation of the first aspect, where the memory is configured to store a plurality of compressed weight values.
With reference to the third aspect, in some implementations of the third aspect, the memory is further configured to store information of a plurality of input neurons.
In a fourth aspect, a spiking neural network system is provided, including a processor and the neural network circuit according to the first aspect or any possible implementation of the first aspect, where the processor includes an input buffer, and the input buffer is configured to buffer the information of the plurality of input neurons.
With reference to the fourth aspect, in some implementations of the fourth aspect, the system further includes a memory configured to store a plurality of compressed weight values.
In a fifth aspect, an apparatus for determining the membrane voltage of a spiking neuron is provided, including a communication interface and a processor. The processor is configured to control the communication interface to send and receive information, is connected to the communication interface, and is configured to execute the method for determining the membrane voltage of a spiking neuron in the second aspect or any possible implementation of the second aspect.
Optionally, the processor may be a general-purpose processor, and may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, and the memory may be integrated in the processor or located outside the processor and exist independently.
In a sixth aspect, a computer program product is provided. The computer program product includes computer program code, and when the computer program code runs on a computing device, the computing device is caused to execute the method of the second aspect or any possible implementation of the second aspect.
In a seventh aspect, a computer-readable medium is provided. The computer-readable medium stores program code, and when the computer program code runs on a computing device, the computing device is caused to execute the method of the second aspect or any possible implementation of the second aspect. Such computer-readable storage includes, but is not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), Flash memory, electrically EPROM (EEPROM), and a hard drive.
Brief Description of the Drawings
FIG. 1 shows a schematic structural diagram of a spiking neural network.
FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application.
FIG. 3 is a schematic diagram of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
FIG. 4 is a schematic diagram of semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a spiking neural network after semi-structured pruning provided by an embodiment of the present application.
FIG. 6 is a schematic diagram of grouped semi-structured pruning of an initial weight matrix of a spiking neural network provided by an embodiment of the present application.
FIG. 7 is a schematic structural diagram of a spiking neural network after grouped semi-structured pruning provided by an embodiment of the present application.
FIG. 8 is a schematic architectural diagram of a spiking neural network circuit provided by an embodiment of the present application.
FIG. 9 is a schematic block diagram of an associated compressed weight storage space 240 provided by an embodiment of the present application.
FIG. 10 is a schematic block diagram of a decompression engine obtaining weight values and corresponding output neurons according to an embodiment of the present application.
FIG. 11 is a schematic block diagram of an accumulation engine obtaining weight accumulation values of output neurons according to an embodiment of the present application.
FIG. 12 is a schematic block diagram of a calculation engine determining the membrane voltage of a spiking neuron according to an embodiment of the present application.
FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron provided by an embodiment of the present application.
FIG. 14 is a schematic architectural diagram of another spiking neural network circuit provided by an embodiment of the present application.
FIG. 15 is a schematic block diagram of a spiking neural network system 1500 provided by an embodiment of the present application.
Detailed Description of Embodiments
The technical solutions in the present application are described below with reference to the accompanying drawings.
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and the like.
In the AI field, a neural network (NN) is a mathematical or computational model that imitates the structure and function of a biological neural network (the central nervous system of animals, especially the brain), and is used to estimate or approximate functions. A biological brain is composed of a large number of neurons connected in different ways, and a preceding neuron and a following neuron are connected through a synaptic structure for information transmission. As an emerging neural network, the spiking neural network (SNN) is often referred to as the third-generation artificial neural network, and is closer to a real biological processing system in terms of information processing methods and biological models than traditional artificial neural networks. Specifically, on the one hand, an artificial neural network transmits multi-valued signals, whereas a spiking neural network transmits binary pulse information, so its input and output information is sparse and the spiking neural network has low power consumption; on the other hand, the neuron model of the spiking neural network is similar to the brain neuron model, has a dynamic accumulation process, and has one more time dimension than a traditional artificial neural network, so it is more suitable for processing intelligent tasks with temporal information.
FIG. 1 shows a schematic structural diagram of a spiking neural network. Referring to FIG. 1, the spiking neural network may include three levels: an input layer, a hidden layer, and an output layer. The hidden layer may in turn contain multiple layers; the logic within a layer is parallel, the logic between layers is serial, and the calculation results of the layers depend on and affect each other. For ease of description, FIG. 1 takes the case in which the hidden layer includes one layer of neurons as an example.
Referring to FIG. 1, each layer of the spiking neural network may include multiple nodes, and each node is used to simulate a spiking neuron and to perform a specific operation, such as an activation function. The connection between a preceding neuron (which may also be called an input neuron) and a following neuron (which may also be called an output neuron) is used to simulate a synapse. It should be understood that a synapse is the carrier through which information is transmitted between two neurons, and the weight value of a synapse represents the connection strength between the two neurons. It should also be understood that the numbers in the nodes shown in FIG. 1 are only used to identify or distinguish different nodes.
Neurons in a spiking neural network transmit information in the form of pulses, based on discrete-valued activities that occur at certain points in time rather than on continuous values. The occurrence of a pulse is determined by differential equations representing various biological processes, the most important of which is the neuron's membrane voltage. Each neuron accumulates the pulse trains of its preceding neurons, and its membrane voltage changes with the input pulses. When the membrane voltage of a neuron reaches a preset voltage value, the neuron is activated, generates a new signal (for example, fires a pulse), and transmits the signal to other neurons connected to it. After the neuron fires a pulse, its membrane voltage is reset, and the membrane voltage continues to change as the pulse trains of the preceding neurons are accumulated. The neurons in the spiking neural network realize the transmission and processing of information in the above manner, and have information processing capabilities such as nonlinearity, self-adaptation, and fault tolerance.
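For illustration only (this is not part of the circuits described in the embodiments), the membrane-voltage behaviour described above can be summarized by a minimal Python sketch; the leak factor, threshold, and reset value used here are assumed values chosen for the example:

```python
def update_membrane_voltage(v_prev, weighted_inputs, threshold=1.0, v_reset=0.0, leak=0.9):
    """One time step of a simple leaky integrate-and-fire style neuron.

    v_prev          -- membrane voltage at the previous time step
    weighted_inputs -- sum of synaptic weights of the input pulses received at this step
    Returns (new_voltage, fired).
    """
    v = leak * v_prev + weighted_inputs   # accumulate input pulses onto the membrane
    if v >= threshold:                    # membrane voltage reached the preset value
        return v_reset, True              # fire a pulse and reset the membrane voltage
    return v, False                       # otherwise keep accumulating

# Example: a neuron receiving pulses weighted 0.6 and then 0.7 fires at the second step.
v, fired = update_membrane_voltage(0.0, 0.6)   # v = 0.6, fired = False
v, fired = update_membrane_voltage(v, 0.7)     # 0.9*0.6 + 0.7 = 1.24 >= 1.0 -> fires
```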
It should be noted that two neurons in the spiking neural network may be connected by a single synapse, or may be connected by multiple synapses, which is not specifically limited in this application. Each synapse has a modifiable synaptic weight (which may also be called a weight), and multiple pulses transmitted by a presynaptic neuron can generate different postsynaptic membrane voltages according to the size of the synaptic weight.
Although the spiking neural network is sparse and consumes little power during operation, its accuracy is not high. To improve network accuracy, the number of weights becomes large, which makes the weight storage in a spiking neural network chip too large, so that the area, latency, and power consumption of the chip increase accordingly, restricting the hardware implementation and commercialization of spiking neural networks. Therefore, weight compression for spiking neural networks is of great significance.
In view of this, an embodiment of the present application provides a method for compressing the weights of a spiking neural network. The method can make the number of non-zero weights in each row, or in each group of each row, of the weight matrix the same. In this way, weight storage resources at the hardware level of the spiking neural network are saved, and parallel decompression and parallel computation at the hardware level can also be realized, which increases the computing speed, thereby improving the computing efficiency at the hardware level of the spiking neural network and reducing latency and power consumption.
FIG. 2 is a schematic flowchart of a method for weight compression of a spiking neural network provided by an embodiment of the present application. As shown in FIG. 2, the method may include steps 210 to 270, which are described in detail below.
Step 210: Load the pre-trained spiking neural network and obtain the initial weights.
Taking the hidden layer in the spiking neural network shown in FIG. 1 as an example, the initial weight matrix of the hidden layer is shown in FIG. 3. Each row in the initial weight matrix represents an input neuron, for example, a neuron of the input layer connected to the hidden layer. Each column represents an output neuron, for example, a neuron of the hidden layer.
For example, in the initial weight matrix shown in FIG. 3, the weights W11 to W41 in the first column represent the weight values corresponding to neuron No. 1 of the hidden layer in FIG. 1; the weights W12 to W42 in the second column represent the weight values corresponding to neuron No. 2 of the hidden layer in FIG. 1; the weights W13 to W43 in the third column represent the weight values corresponding to neuron No. 3 of the hidden layer in FIG. 1; the weights W14 to W44 in the fourth column represent the weight values corresponding to neuron No. 4 of the hidden layer in FIG. 1; the weights W15 to W45 in the fifth column represent the weight values corresponding to neuron No. 5 of the hidden layer in FIG. 1; and the weights W16 to W46 in the sixth column represent the weight values corresponding to neuron No. 6 of the hidden layer in FIG. 1. The weights W11 to W16 in the first row represent the weight values corresponding to neuron No. 7 of the input layer in FIG. 1; the weights W21 to W26 in the second row represent the weight values corresponding to neuron No. 8 of the input layer in FIG. 1; the weights W31 to W36 in the third row represent the weight values corresponding to neuron No. 9 of the input layer in FIG. 1; and the weights W41 to W46 in the fourth row represent the weight values corresponding to neuron No. 10 of the input layer in FIG. 1.
Step 220: Select a weight matrix pruning scheme according to requirements.
As an example, if the number of non-zero weights in each row of the weight matrix needs to be the same, the semi-structured pruning in step 230 may be performed; if the weight matrix needs to be grouped so that the number of non-zero weights in each group of each row is the same, the grouped semi-structured pruning in step 240 may be performed.
Step 230: Semi-structured pruning.
Semi-structured pruning in this embodiment of the present application refers to pruning the weights at the granularity of each row of the weight matrix to obtain a semi-structured pruned weight matrix. Specifically, at the granularity of each row of the weight matrix, the weight values of each row in the original weight matrix may be sorted by magnitude, and then the last s% (the sparsity) of the sorted weights are set to 0. A semi-structured pruned weight matrix is thus obtained, in which every row has the same length.
For example, with a sparsity of 66.6%, the weight values of each row in the initial weight matrix shown in FIG. 3 may be sorted, and the last 66.6% of the sorted weight values are set to 0. In the weight matrix shown in FIG. 4, the dotted lines represent the pruned weight values, and the weight matrix composed of solid lines is the pruned weight matrix. In the pruned weight matrix, the number of non-zero weight values in each row is the same (each row includes two weight values). For example, as shown in FIG. 4, the first row of the weight matrix after semi-structured pruning includes the two weights W11 and W14, that is, neuron No. 7 of the input layer in FIG. 1 is connected to neuron No. 1 and neuron No. 4 of the hidden layer, with connection weight values W11 and W14, respectively; the second row includes the two weights W22 and W26, that is, neuron No. 8 of the input layer in FIG. 1 is connected to neuron No. 2 and neuron No. 6 of the hidden layer, with connection weight values W22 and W26, respectively; the third row includes the two weights W31 and W35, that is, neuron No. 9 of the input layer in FIG. 1 is connected to neuron No. 1 and neuron No. 5 of the hidden layer, with connection weight values W31 and W35, respectively; and the fourth row includes the two weights W43 and W44, that is, neuron No. 10 of the input layer in FIG. 1 is connected to neuron No. 3 and neuron No. 4 of the hidden layer, with connection weight values W43 and W44, respectively. The structure of the spiking neural network after semi-structured pruning is shown in FIG. 5.
That is to say, each row in the pruned weight matrix has the same length, and each neuron of a given layer is connected to the same number of neurons in the next layer.
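For illustration only, a minimal Python sketch of this per-row (semi-structured) pruning is given below; it assumes a NumPy weight matrix and follows the magnitude-based ranking described above, while all concrete values are examples:

```python
import numpy as np

def semi_structured_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Prune each row independently: keep the largest-magnitude weights and
    set the smallest `sparsity` fraction of each row to zero."""
    pruned = weights.copy()
    n_cols = weights.shape[1]
    n_drop = int(round(n_cols * sparsity))           # weights removed per row
    for row in pruned:
        drop_idx = np.argsort(np.abs(row))[:n_drop]  # smallest-magnitude entries
        row[drop_idx] = 0.0
    return pruned

# Example: a 4x6 matrix with sparsity 4/6 leaves exactly 2 non-zeros per row.
w = np.random.randn(4, 6)
w_pruned = semi_structured_prune(w, sparsity=4 / 6)
assert all((row != 0).sum() == 2 for row in w_pruned)
```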
Step 240: Grouped semi-structured pruning.
Grouped semi-structured pruning in this embodiment of the present application refers to dividing each row into several weight groups of equal size, and pruning the weights at the granularity of each group in each row of the weight matrix to obtain a grouped semi-structured pruned weight matrix. Specifically, at the granularity of each group in each row of the weight matrix, the weight values of each row and each group in the original weight matrix may be sorted by magnitude, and then the last s% (the sparsity) of the sorted weights are set to 0. A grouped semi-structured pruned weight matrix is thus obtained, in which every row and every group have the same length.
For example, with a sparsity of 66.6% per group, the neurons in the hidden layer shown in FIG. 1 may be divided into two groups, each including three neurons; for example, the first group includes the three neurons numbered 1 to 3, and the second group includes the three neurons numbered 4 to 6. The weight values of each group in each row of the initial weight matrix shown in FIG. 3 may be sorted, and the last 66.6% of the sorted weight values are set to 0. In the weight matrix shown in FIG. 6, the dotted lines represent the pruned weight values, and the weight matrix composed of solid lines is the pruned weight matrix. In this pruned weight matrix, the number of non-zero weight values in each group of each row is the same (each group includes one weight value); that is, every row and every group in the grouped pruned weight matrix have the same length. The structure of the spiking neural network after grouped semi-structured pruning is shown in FIG. 7.
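For illustration only, a similar sketch of the grouped variant is given below; the group size and sparsity are example values, and the routine simply applies the per-row magnitude ranking within each group of columns:

```python
import numpy as np

def grouped_semi_structured_prune(weights: np.ndarray, sparsity: float,
                                  group_size: int) -> np.ndarray:
    """Prune each (row, column-group) block independently so that every group
    of every row keeps the same number of non-zero weights."""
    pruned = weights.copy()
    n_drop = int(round(group_size * sparsity))        # weights removed per group
    for row in pruned:
        for start in range(0, len(row), group_size):
            block = row[start:start + group_size]      # view into the row
            drop_idx = np.argsort(np.abs(block))[:n_drop]
            block[drop_idx] = 0.0                       # in-place on the view
    return pruned

# Example: 6 output neurons split into two groups of 3; a 2/3 sparsity leaves
# exactly one non-zero weight per group in every row, as in FIG. 6.
w = np.random.randn(4, 6)
w_grouped = grouped_semi_structured_prune(w, sparsity=2 / 3, group_size=3)
assert all((row != 0).sum() == 2 for row in w_grouped)  # one non-zero per group, two groups
```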
Step 250: Calculate the loss function according to the pruned weights.
It should be understood that the loss function helps to optimize the parameters of the spiking neural network by calculating the error between the actual (target) value and the predicted value of the spiking neural network. As an example, in this embodiment the loss function of the spiking neural network may be calculated according to the pruned weights to obtain the error between the actual (target) value and the predicted value of the spiking neural network, so that the pruned weight matrix can be optimized or updated according to this error.
Step 260: Retrain the spiking neural network and update the pruned weight matrix.
As an example, the parameters (weights) of the spiking neural network may be optimized according to the above loss function to minimize the loss of the spiking neural network. For example, gradient descent may be used to optimize the parameters (weights) of the spiking neural network and update the pruned weight matrix so that the loss of the neural network is minimized.
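For illustration only, one such retraining update can be sketched as follows; the gradient of the loss with respect to the weights is assumed to be available, and a binary mask is used so that pruned positions remain zero (the mask-based formulation is an assumption made for this sketch, not a limitation of the embodiments):

```python
import numpy as np

def retrain_step(w_pruned: np.ndarray, grad: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One gradient-descent update that only touches the weights kept by pruning."""
    mask = (w_pruned != 0).astype(w_pruned.dtype)   # 1 where a weight survived pruning
    return w_pruned - lr * grad * mask              # pruned positions stay exactly zero

# Example with a hypothetical gradient of the loss with respect to the weights.
w = np.array([[0.5, 0.0, -0.3], [0.0, 0.8, 0.0]])
g = np.random.randn(*w.shape)
w_new = retrain_step(w, g)
assert (w_new[w == 0] == 0).all()                   # the sparsity pattern is unchanged
```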
Step 270: Determine whether the spiking neural network has converged.
If the spiking neural network has converged, the process ends; if it has not converged, step 230 or step 240 continues to be performed until the spiking neural network converges.
In the above technical solution, semi-structured pruning makes the number of weights in each row of the weight matrix of each layer the same, and grouped semi-structured pruning makes the number of weights in each row and each group of the weight matrix of each layer the same. This helps to save weight storage resources at the hardware level, and also helps to realize parallel decompression and parallel computation, increasing the computing speed at the hardware level, thereby improving computing efficiency and reducing latency and power consumption.
Taking the spiking neural network shown in FIG. 1 as an example, the hardware level of a spiking neural network provided by an embodiment of the present application is described in detail below with reference to FIG. 8 to FIG. 12. It should be understood that the examples of FIG. 8 to FIG. 12 are only intended to help those skilled in the art understand the embodiments of the present application, and are not intended to limit the embodiments of the present application to the specific values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes according to the examples of FIG. 8 to FIG. 12 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
FIG. 8 is a schematic architectural diagram of a spiking neural network circuit provided by an embodiment of the present application. As shown in FIG. 8, the spiking neural network circuit may include decompression engines 1 to n (which may also be called decompression modules) and a calculation module 210. In this spiking neural network circuit, the calculation module 210 may include an accumulation engine 250 and a calculation engine 260. Optionally, the spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compressed weight address information storage space 230, an associated compressed weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290. The functions of these modules are described in detail below.
The input buffer 205 is used to store the information of the preceding neurons (input neurons) that send input pulses (the information may be the numbers or indexes of the neurons). In this embodiment, the input neurons may be the neurons of the input layer shown in FIG. 1. As an example, the input buffer 205 may be a cache of a processor.
The compression module 220 is configured to execute the method shown in FIG. 2 above to obtain a pruned weight matrix, where the pruned weight matrix includes the pruned weights and the output neuron numbers corresponding to the weights. The compression module 220 may also store the pruned weights and the output neuron numbers corresponding to the weights into the associated compressed weight storage space 240.
The associated compressed weight storage space 240 is used to store the pruned weights and the output neuron numbers corresponding to the weights. In this embodiment, the output neurons may be the neurons of the hidden layer shown in FIG. 1. As an example, the pruned weights obtained through the semi-structured pruning in FIG. 4 and the corresponding output neuron numbers may be associated according to a certain correspondence to form associated compressed weight data, and the associated compressed weight data is hardened into the associated compressed weight storage space 240 of the spiking neural network chip.
Specifically, as shown in FIG. 9, the associated compressed weight storage space 240 stores the compressed weights and the associated indexes. The association may use a direct indexing method or an indirect indexing method, which is not specifically limited in this embodiment of the present application. For example, in the direct indexing method, a corresponding index is added before each compressed weight, and the content of the index is the number of the neuron. As another example, in the indirect indexing method, a corresponding index is added before each compressed weight, and the content of the index is the distance between the neuron number of this compressed weight and the neuron number of the previous compressed weight.
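For illustration only, the two indexing formats can be modeled as follows; the tuple layout and field contents are assumptions made for this sketch:

```python
def encode_direct(row):
    """Direct indexing: store (output-neuron number, weight) pairs as-is."""
    return [(neuron_id, w) for neuron_id, w in row]

def encode_indirect(row):
    """Indirect indexing: store the distance to the previous kept neuron number."""
    encoded, prev = [], 0
    for neuron_id, w in row:
        encoded.append((neuron_id - prev, w))
        prev = neuron_id
    return encoded

def decode_indirect(encoded):
    """Recover the output-neuron numbers by accumulating the stored distances."""
    decoded, acc = [], 0
    for delta, w in encoded:
        acc += delta
        decoded.append((acc, w))
    return decoded

# Example: the first pruned row of FIG. 4 keeps weights for output neurons 1 and 4.
row = [(1, 'W11'), (4, 'W14')]
assert decode_indirect(encode_indirect(row)) == row
```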
It should be understood that FIG. 9 shows the storage format of the associated compressed weight storage space 240 for semi-structured pruning. The figure only shows the format for one layer (for example, the hidden layer) of the spiking neural network; other layers are similar. Each row represents an input neuron, and each column represents an output neuron. It can be seen that, after semi-structured sparsification, the number of compressed weights in each row is the same. After the associated compressed weights are obtained, this weight matrix is hardened into the associated compressed weight storage space 240 of the chip.
For example, taking the pruned weights obtained by semi-structured pruning in FIG. 4, with the index being the number of the neuron associated with the weight, the first row of the associated compressed weight storage space 240 stores 1-W11 and 4-W14; the second row stores 2-W22 and 6-W26; the third row stores 1-W31 and 5-W35; and the fourth row stores 3-W43 and 4-W44. The entry 1-W11 stored in the first row corresponds to one associated compressed weight described above, representing the compressed weight W11 and the index corresponding to that compressed weight, the index indicating that the output neuron associated with the compressed weight is neuron No. 1; the entry 4-W14 corresponds to another associated compressed weight, representing the compressed weight W14 and the index corresponding to that compressed weight, the index indicating that the output neuron associated with the compressed weight is neuron No. 4.
The associated compressed weight address information storage space 230 is used to store the address resolution information of the above associated compressed weights. As an example, the address resolution information may be the base address of the associated compressed weights of each row and the number of compressed weights per row. In the semi-structured pruning scheme shown in FIG. 4, since the number of associated compressed weights in each row is the same, only a single per-row weight count needs to be stored, and the address of the corresponding associated compressed weight in the associated compressed weight storage space 240 can be calculated from the base address of the associated compressed weights of each row. Compared with an unstructured pruning scheme, in which the number of associated compressed weights differs from row to row and therefore has to be stored separately for each row, this saves weight storage resources.
It should be understood that FIG. 9 also shows the storage format of the associated compressed weight address information storage space 230 for semi-structured pruning. The figure only shows the format for one layer (for example, the hidden layer) of the spiking neural network; other layers are similar. After the address resolution information of the associated compressed weights is obtained, it is hardened into the associated compressed weight address information storage space 230 of the chip.
For example, taking the pruned weights obtained by semi-structured pruning in FIG. 4, the associated compressed weight address information storage space 230 may store the base address of the associated compressed weights of each row and the number of associated compressed weights per row, which in this example is 2.
The decompression engines are configured to de-associate the associated compressed weights stored in the associated compressed weight storage space 240 according to the information of the plurality of input neurons. Specifically, referring to FIG. 10, a decompression engine may obtain an input neuron number from the input buffer 205 and resolve, according to this number, the address information of the associated compressed weights in the associated compressed weight address information storage space 230. According to the address information, the associated compressed weights are obtained from the associated compressed weight storage space 240 and are de-associated by the de-association module 1010 to obtain the corresponding output neuron numbers and weights. As an example, if the index format is a direct index, the output neuron numbers and weight information can be obtained directly; if the index format is an indirect index, the neuron numbers and weight information can be obtained through a shift operation.
Since the weight compression in this embodiment of the present application uses a semi-structured pruning scheme, the number of weights in each row after pruning is the same, so decompression engines 1 to n can be used, and each decompression engine is responsible for decompressing, according to the information of the plurality of input neurons, the associated compressed weights of one row in the associated compressed weight storage space 240. In this way, decompression engines 1 to n perform decompression in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
For example, taking the semi-structured pruning scheme in FIG. 4, the spiking neural network chip may include four decompression engines, each responsible for decompressing the associated compressed weights of one row in the associated compressed weight storage space 240. For example, decompression engine 1 is responsible for decompressing the associated compressed weights stored in the first row of the associated compressed weight storage space 240 (for example, 1-W11 and 4-W14); decompression engine 2 is responsible for decompressing the associated compressed weights stored in the second row (for example, 2-W22 and 6-W26); decompression engine 3 is responsible for decompressing the associated compressed weights stored in the third row (for example, 1-W31 and 5-W35); and decompression engine 4 is responsible for decompressing the associated compressed weights stored in the fourth row (for example, 3-W43 and 4-W44).
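For illustration only, the lookup performed by one decompression engine can be modeled in software as follows, assuming direct indexing; the flat storage list, the base-address table, and the concrete weight values are assumptions made for this sketch:

```python
# Flat model of the two storage spaces: every row of the compressed matrix has
# the same number of entries, so one base address plus one shared count is enough.
WEIGHTS_PER_ROW = 2                                   # same for every row after pruning
storage = [                                           # associated compressed weight storage 240
    (1, 0.5), (4, 0.2),                               # row for input neuron 7
    (2, 0.1), (6, 0.7),                               # row for input neuron 8
    (1, 0.3), (5, 0.4),                               # row for input neuron 9
    (3, 0.6), (4, 0.8),                               # row for input neuron 10
]
row_base = {7: 0, 8: 2, 9: 4, 10: 6}                  # address info storage 230 (base addresses)

def decompress(input_neuron: int):
    """One decompression engine: resolve the row's base address, then read the
    (output-neuron number, weight) pairs for that input neuron."""
    base = row_base[input_neuron]
    return storage[base:base + WEIGHTS_PER_ROW]

# Four engines could each run one of these lookups in parallel.
assert decompress(7) == [(1, 0.5), (4, 0.2)]
```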
The calculation module 210, in one example, may include an accumulation engine 250 and a calculation engine 260. The accumulation engine 250 is configured to accumulate the weights of the corresponding output neurons. Specifically, referring to FIG. 11, the accumulation engine 250 may read, according to the neuron numbers output by decompression engines 1 to n, the weight accumulation values corresponding to those neuron numbers in the weight accumulation storage space 270, accumulate the weights output by decompression engines 1 to n for those neuron numbers with the read weight accumulation values, and then write the accumulated values back into the weight accumulation storage space 270. The calculation engine 260 is configured to calculate the membrane voltages of the output neurons. Specifically, referring to FIG. 12, after all neurons of the current layer have completed decompression and accumulation, the calculation engine 260 reads the membrane voltage of the previous time step, the neuron parameter configuration, and the weight accumulation value from the membrane voltage storage space 290, the neuron parameter space 280, and the weight accumulation storage space 270, respectively, and accumulates the membrane voltage through the neuron calculation module 1201. If the membrane voltage exceeds the threshold voltage, a pulse is fired, and the membrane voltage is cleared and written back to the membrane voltage storage space 290. If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290.
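For illustration only, the two-stage flow of the accumulation engine 250 and the calculation engine 260 can be modeled as follows; the dictionaries stand in for the weight accumulation storage space 270 and the membrane voltage storage space 290, the threshold is an assumed example value, and leakage and other neuron parameters are omitted:

```python
weight_acc = {}          # stands in for weight accumulation storage space 270
membrane_v = {}          # stands in for membrane voltage storage space 290

def accumulate(decompressed_rows):
    """Accumulation engine: add each decompressed weight to its output neuron's total."""
    for row in decompressed_rows:
        for neuron_id, weight in row:
            weight_acc[neuron_id] = weight_acc.get(neuron_id, 0.0) + weight

def compute_membrane_voltages(threshold=1.0):
    """Calculation engine: after the whole layer is accumulated, update every
    membrane voltage and fire/reset the neurons that cross the threshold."""
    fired = []
    for neuron_id, acc in weight_acc.items():
        v = membrane_v.get(neuron_id, 0.0) + acc
        if v >= threshold:
            fired.append(neuron_id)
            v = 0.0                                   # clear after firing
        membrane_v[neuron_id] = v
    weight_acc.clear()                                # ready for the next time step
    return fired

accumulate([[(1, 0.5), (4, 0.2)], [(2, 0.1), (6, 0.7)], [(1, 0.6)]])
print(compute_membrane_voltages())                   # neuron 1 fires: 0.5 + 0.6 >= 1.0
```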
The weight accumulation storage space 270 is configured to store the accumulated weight value corresponding to each output neuron.
The neuron parameter space 280 is configured to store the neuron parameter configuration information of the spiking neural network.
The membrane voltage storage space 290 is configured to store the accumulated membrane voltages of the neurons.
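As a non-limiting illustration, the three storage spaces can be pictured as simple per-neuron containers in software; the sizes, field names, and the reset parameter below are assumptions made for the sketch only.

```python
# Illustrative software model of the three storage spaces for the running
# example (hidden-layer neurons numbered 1..6). Index 0 is unused padding.
num_neurons = 6
weight_accumulation = [0.0] * (num_neurons + 1)   # weight accumulation storage space 270
membrane_voltage = [0.0] * (num_neurons + 1)      # membrane voltage storage space 290
neuron_parameters = {                             # neuron parameter space 280 (assumed fields)
    "threshold_voltage": 1.0,
    "reset_voltage": 0.0,
}
print(weight_accumulation, membrane_voltage, neuron_parameters)
```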
Taking the spiking neural network circuit shown in FIG. 8 as an example, the specific implementation of calculating the membrane voltage of a spiking neuron with this circuit is described in detail below with reference to FIG. 13. It should be understood that the example in FIG. 13 is merely intended to help those skilled in the art understand the embodiments of the present application, and is not intended to limit the embodiments to the specific values or scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the example of FIG. 13 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
FIG. 13 is a schematic flowchart of a method for calculating the membrane voltage of a spiking neuron according to an embodiment of the present application. As shown in FIG. 13, the method may include steps 1310 to 1350, which are described in detail below.
It should be understood that, for ease of description, FIG. 13 uses the calculation of the membrane voltages of neurons 1 to 6 of the hidden layer as an example; the membrane voltages of neurons in other layers are calculated in a similar manner.
Step 1310: The four decompression engines obtain their corresponding input neuron numbers from the input buffer 205 in parallel.
As an example, the four decompression engines (decompression engine 1 to decompression engine 4) obtain, from the input buffer 205, the input neuron numbers of neurons 7 to 10 of the input layer, respectively.
Step 1320: The four decompression engines obtain the associated compressed weights according to the input neuron numbers in parallel, and de-associate them to obtain the output neuron numbers and the corresponding weights.
As an example, FIG. 8 may include four decompression engines (decompression engine 1 to decompression engine 4), each of which is responsible for de-associating the associated compressed weights of the corresponding row in the associated compressed weight storage space 240, so that the four decompression engines can de-associate the four rows of associated compressed weights in the storage space 240 in parallel. Specifically, each decompression engine, in parallel, parses out the address information of the associated compressed weights of its row from the associated compressed weight address information storage space 230 according to the input neuron number, fetches the associated compressed weights from the associated compressed weight storage space 240 according to that address information, and de-associates them to obtain the corresponding output neuron numbers and weights. For example, decompression engine 1 decompresses the associated compressed weights stored in the first row of the associated compressed weight storage space 240 (for example, 1—W11 and 4—W14), obtaining the weight W11 for output neuron 1 and the weight W14 for output neuron 4. At the same time, decompression engine 2 decompresses in parallel the associated compressed weights stored in the second row (for example, 2—W22 and 6—W26), obtaining the weight W22 for output neuron 2 and the weight W26 for output neuron 6. By analogy, the other decompression engines de-associate the associated compressed weights of the other rows in the associated compressed weight storage space 240 in parallel.
Step 1330: The accumulation engine 250 performs weight accumulation according to the output neuron numbers and the corresponding weights.
The accumulation engine 250 can read, according to the above output neuron numbers, the accumulated weight value corresponding to each neuron number from the weight accumulation storage space 270, add the weights output by the four decompression engines (decompression engine 1 to decompression engine 4) for that neuron number to the accumulated value, and write the result back into the weight accumulation storage space 270.
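A hedged software analogue of this read-modify-write behaviour is sketched below; the dictionary-based storage and the numeric weight values are assumptions for the example only, not the actual contents of storage space 270.

```python
# Sketch of step 1330: accumulate decompressed weights per output neuron.
# weight_accumulation plays the role of storage space 270.
weight_accumulation = {}

def accumulate(pairs):
    """pairs: iterable of (output neuron number, weight value)."""
    for neuron_id, weight in pairs:
        current = weight_accumulation.get(neuron_id, 0.0)    # read
        weight_accumulation[neuron_id] = current + weight    # add and write back

# Example: outputs of the four decompression engines for one time step,
# with made-up numeric weight values.
accumulate([(1, 0.5), (4, -0.2)])
accumulate([(2, 0.3), (6, 0.1)])
accumulate([(1, 0.7), (5, 0.4)])
accumulate([(3, -0.6), (4, 0.9)])
print(weight_accumulation)   # e.g. neuron 1 -> 1.2, neuron 4 -> 0.7, ...
```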
Step 1340: Determine whether the accumulation for the current layer is complete.
As an example, it may be determined whether all neurons of the current layer have completed decompression and accumulation; if not, the procedure returns to step 1310; if so, it proceeds to step 1350.
Step 1350: The calculation engine 260 calculates the membrane voltages of the neurons.
After all neurons of the current layer have completed decompression and accumulation, the calculation engine 260 reads the membrane voltage of the previous time step, the neuron parameter configuration, and the accumulated weight value from the membrane voltage storage space 290, the neuron parameter space 280, and the weight accumulation storage space 270 respectively, and the neuron calculation module 1201 performs membrane voltage accumulation. If the membrane voltage exceeds the threshold voltage, a spike is fired and the membrane voltage is cleared to zero and written back to the membrane voltage storage space 290. If the membrane voltage does not exceed the threshold voltage, the accumulated membrane voltage is written back to the membrane voltage storage space 290.
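The threshold-and-fire logic of step 1350 can be sketched as follows. The source describes accumulation, threshold comparison, spike firing, and clearing/write-back; the specific numeric values and the absence of any leak term here are assumptions made to keep the sketch minimal.

```python
# Sketch of step 1350: update each neuron's membrane voltage after a layer
# has finished decompression and accumulation. Values are illustrative.
threshold_voltage = 1.0

def update_neuron(previous_voltage, weight_sum):
    """Return (new membrane voltage, spike fired?)."""
    voltage = previous_voltage + weight_sum      # membrane voltage accumulation
    if voltage > threshold_voltage:
        return 0.0, True                         # fire a spike, clear the voltage
    return voltage, False                        # no spike, keep the accumulated voltage

membrane_voltage = {1: 0.4, 2: 0.0, 3: 0.2}      # storage space 290 (previous time step)
weight_sums      = {1: 0.8, 2: 0.3, 3: 0.1}      # storage space 270 (current time step)

for neuron_id in membrane_voltage:
    v, fired = update_neuron(membrane_voltage[neuron_id], weight_sums[neuron_id])
    membrane_voltage[neuron_id] = v              # write back to storage space 290
    print(neuron_id, v, fired)
```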
In the above technical solution, because this embodiment of the present application uses a semi-structured pruning scheme, the number of weights in each row is the same, so multiple decompression engines can be used to de-associate in parallel, each decompression engine being responsible for decompressing the associated compressed weights of one row in the associated compressed weight storage space 240. In this way, multiple decompression engines decompress in parallel at the same time, which increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
Taking the spiking neural network shown in FIG. 1 as an example, the hardware level of another spiking neural network provided by an embodiment of the present application is described in detail below with reference to FIG. 14. It should be understood that the example in FIG. 14 is merely intended to help those skilled in the art understand the embodiments of the present application, and is not intended to limit the embodiments to the specific values or scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the example of FIG. 14 given below, and such modifications and changes also fall within the scope of the embodiments of the present application.
FIG. 14 is a schematic architectural diagram of another spiking neural network circuit according to an embodiment of the present application. As shown in FIG. 14, the circuit may include 1 to kn decompression engines (decompression engine 11 to decompression engine kn) and a calculation module 210. In this spiking neural network circuit, the calculation module 210 may include 1 to k calculation submodules, for example, calculation submodule 1 to calculation submodule k, and each calculation submodule may include an accumulation engine and a corresponding calculation engine. Optionally, the spiking neural network circuit further includes an input buffer 205, a compression module 220, an associated compressed weight address information storage space 230, an associated compressed weight storage space 240, a weight accumulation storage space 270, a neuron parameter storage space 280, and a membrane voltage storage space 290.
It should be understood that the functions of the input buffer 205, the associated compressed weight storage space 240, the associated compressed weight address information storage space 230, the weight accumulation storage space 270, the neuron parameter storage space 280, and the membrane voltage storage space 290 are the same as their functions in the architecture shown in FIG. 8; for details, refer to the description of FIG. 8, which is not repeated here.
Different from the spiking neural network circuit in FIG. 8, in the spiking neural network circuit shown in FIG. 14 the calculation module 210 may include 1 to k calculation submodules, and each calculation submodule is responsible for determining, according to the plurality of weight values in its corresponding weight group, the membrane voltages of the corresponding plurality of output neurons. As an example, each calculation submodule includes an accumulation engine and a corresponding calculation engine; the accumulation engine is responsible for determining the accumulated weight values corresponding to the output neurons in the weight group corresponding to that calculation submodule, and the calculation engine is responsible for determining, according to the accumulated weight values output by that accumulation engine, the membrane voltages of the output neurons in that weight group at the current time.
Specifically, because the weight compression in this embodiment of the present application uses a grouped semi-structured pruning scheme, the number of weights in each group of each row after pruning is the same. Therefore, 1 to k accumulation engines can be used for parallel accumulation, each accumulation engine being responsible for accumulating the weights corresponding to one group of output neurons. Similarly, 1 to k calculation engines can be used for parallel calculation, each calculation engine being responsible for calculating the output-neuron membrane voltages according to the accumulated weight values output by the corresponding accumulation engine. Among decompression engines 11 to kn, since the number of weights in each group of a row is the same, decompression engines 11 to 1n can de-associate in parallel the associated compressed weights of each row in the group corresponding to accumulation engine 1. Likewise, decompression engines k1 to kn are responsible for de-associating the associated compressed weights of each row in the group corresponding to accumulation engine k, and so on.
For example, taking the case where the hidden layer of the spiking neural network shown in FIG. 7 is divided into two groups, FIG. 14 may include two accumulation engines (accumulation engine 1 and accumulation engine 2), two calculation engines (calculation engine 1 and calculation engine 2), decompression engines 11 to 14, and decompression engines 21 to 24. Each of decompression engines 11 to 14 is responsible for de-associating the associated compressed weights of the corresponding row in the first group, accumulation engine 1 is responsible for accumulating the weights of the first group of neurons, and calculation engine 1 is responsible for calculating the membrane voltages of the neurons in the first group. Each of decompression engines 21 to 24 is responsible for de-associating the associated compressed weights of the corresponding row in the second group, accumulation engine 2 is responsible for accumulating the weights of the second group of neurons, and calculation engine 2 is responsible for calculating the membrane voltages of the neurons in the second group.
For example, decompression engine 11 is responsible for decompressing the associated compressed weight stored in the first group of the first row of the associated compressed weight storage space 240 (for example, 1—W11), obtaining the weight W11 for output neuron 1. Decompression engine 12 in parallel decompresses the associated compressed weight stored in the first group of the second row (for example, 2—W22), obtaining the weight W22 for output neuron 2. Decompression engine 13 in parallel decompresses the associated compressed weight stored in the first group of the third row (for example, 1—W31), obtaining the weight W31 for output neuron 1. Decompression engine 14 in parallel decompresses the associated compressed weight stored in the first group of the fourth row (for example, 3—W43), obtaining the weight W43 for output neuron 3. Accumulation engine 1 is responsible for reading, according to the numbers of neurons 1 to 3, the accumulated weight value corresponding to each such neuron number from the weight accumulation storage space 270, adding the weights output by the four decompression engines (decompression engine 11 to decompression engine 14) for that neuron number to the accumulated value, and writing the result back into the weight accumulation storage space 270.
As another example, decompression engine 21 is responsible for decompressing the associated compressed weight stored in the second group of the first row of the associated compressed weight storage space 240 (for example, 4—W14), obtaining the weight W14 for output neuron 4. Decompression engine 22 in parallel decompresses the associated compressed weight stored in the second group of the second row (for example, 6—W26), obtaining the weight W26 for output neuron 6. Decompression engine 23 in parallel decompresses the associated compressed weight stored in the second group of the third row (for example, 5—W35), obtaining the weight W35 for output neuron 5. Decompression engine 24 in parallel decompresses the associated compressed weight stored in the second group of the fourth row (for example, 4—W44), obtaining the weight W44 for output neuron 4. Accumulation engine 2 can work in parallel with accumulation engine 1 and is responsible for reading, according to the numbers of neurons 4 to 6, the accumulated weight value corresponding to each such neuron number from the weight accumulation storage space 270, adding the weights output by the four decompression engines (decompression engine 21 to decompression engine 24) for that neuron number to the accumulated value, and writing the result back into the weight accumulation storage space 270.
It should be understood that the above description uses the division of a layer of the spiking neural network into two groups as an example; in practice, the number of accumulation engines and calculation engines included in the chip shown in FIG. 14 is determined by the number of groups into which a layer is divided. Of course, the n neurons of a layer may also be divided into n groups, one neuron per group; in that case n accumulation engines and n calculation engines are required, with each accumulation engine responsible for the weight accumulation of one neuron and each calculation engine responsible for the membrane voltage calculation of one neuron.
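To illustrate how the grouped organisation scales, the sketch below partitions the decompression results into k = 2 groups, each with an independent accumulator, mirroring the two-group example above; the group boundaries and the numeric weights are assumptions made for the sketch only.

```python
# Sketch of the grouped organisation of FIG. 14: k groups of output neurons,
# each served by its own decompression results and its own accumulator.
from collections import defaultdict

# Group 1 covers output neurons 1-3, group 2 covers output neurons 4-6,
# matching the two-group example above. Weight values are illustrative.
group_outputs = {
    1: [(1, 0.5), (2, 0.3), (1, 0.7), (3, -0.6)],   # engines 11-14
    2: [(4, -0.2), (6, 0.1), (5, 0.4), (4, 0.9)],   # engines 21-24
}

accumulators = {g: defaultdict(float) for g in group_outputs}

# Each "accumulation engine" works only on its own group, so the k groups
# can be processed independently (and hence in parallel in hardware).
for group_id, pairs in group_outputs.items():
    for neuron_id, weight in pairs:
        accumulators[group_id][neuron_id] += weight

print({g: dict(acc) for g, acc in accumulators.items()})
```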
In the above spiking neural network chip, because the weight compression uses a grouped semi-structured pruning scheme, the number of weights in each group of each row after pruning is the same, so multiple decompression engines can be used to de-associate in parallel, multiple accumulation engines can be used to accumulate in parallel, and multiple calculation engines can be used to calculate in parallel. This further increases the computing speed of the spiking neural network chip, thereby improving computing efficiency and reducing latency and power consumption.
FIG. 15 is a schematic block diagram of a spiking neural network system 1500 according to an embodiment of the present application. As shown in FIG. 15, the spiking neural network system 1500 may include a memory 1510 and a neural network circuit 1520.
The memory 1510 may be configured to store a plurality of compressed weight values; as an example, the memory 1510 may correspond to the associated compressed weight storage space 240 described above. Optionally, the memory 1510 may also be configured to store the information of the input neurons; as an example, the memory 1510 may correspond to the input buffer 205 described above.
The neural network circuit 1520 may be implemented in various ways, which are not limited in this embodiment of the present application. For example, the neural network circuit 1520 may be the spiking neural network circuit shown in FIG. 8, or the spiking neural network circuit shown in FIG. 14; for details, refer to the descriptions of the spiking neural network circuits above, which are not repeated here.
It should be understood that, in the various embodiments of the present application, the numbering of the above processes does not imply an order of execution; the order in which the processes are performed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (13)

  1. A spiking neural network circuit, comprising:
    a plurality of decompression modules, configured to obtain, respectively according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a corresponding plurality of output neurons, wherein each of the plurality of decompression modules is configured to obtain, in parallel, weight values of a same number of rows in the compressed weight matrix and identifiers of the plurality of output neurons corresponding to the weight values of the same number of rows, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and
    a calculation module, configured to determine the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values.
  2. The circuit according to claim 1, wherein the input neurons in the spiking neural network circuit comprise a first input neuron and a second input neuron, and the plurality of decompression modules comprise a first decompression module and a second decompression module, wherein
    the first decompression module is configured to obtain a first row of weight values corresponding to the first input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the first row of weight values; and
    the second decompression module is configured to obtain a second row of weight values corresponding to the second input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the second row of weight values.
  3. The circuit according to claim 1 or 2, further comprising:
    a compression module, configured to prune some of the weight values in an initial weight matrix according to a pruning ratio to obtain the compressed weight matrix.
  4. The circuit according to any one of claims 1 to 3, wherein the compressed weight matrix comprises a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
  5. The circuit according to claim 4, wherein the calculation module comprises a plurality of calculation submodules, and each of the plurality of calculation submodules is configured to calculate, in parallel, the membrane voltages of the output neurons in one weight group.
  6. The circuit according to claim 5, wherein the plurality of calculation submodules comprise a first calculation submodule and a second calculation submodule, the first calculation submodule comprises a first accumulation engine and a first calculation engine, and the second calculation submodule comprises a second accumulation engine and a second calculation engine, wherein
    the first accumulation engine is configured to determine accumulated weight values corresponding to the output neurons in a first weight group corresponding to the first calculation submodule;
    the first calculation engine is configured to determine, according to the accumulated weight values output by the first accumulation engine, the membrane voltages of the output neurons in the first weight group at a current time;
    the second accumulation engine is configured to determine accumulated weight values corresponding to the output neurons in a second weight group corresponding to the second calculation submodule; and
    the second calculation engine is configured to determine, according to the accumulated weight values output by the second accumulation engine, the membrane voltages of the output neurons in the second weight group at the current time.
  7. A calculation method based on a spiking neural network, comprising:
    obtaining, respectively according to information of a plurality of input neurons, a plurality of weight values in a compressed weight matrix and identifiers of a corresponding plurality of output neurons, wherein the plurality of weight values comprise weight values, obtained in parallel, of a same number of rows in the compressed weight matrix, the identifiers of the plurality of output neurons comprise identifiers, obtained in parallel, of a plurality of output neurons corresponding to the weight values of the same number of rows, the number of non-zero weight values in each row of the compressed weight matrix is the same, and the weight values of each row correspond to one input neuron; and
    determining the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values.
  8. The method according to claim 7, wherein the input neurons of the spiking neural network comprise a first input neuron and a second input neuron, and
    the obtaining, respectively according to the information of the plurality of input neurons, of the plurality of weight values in the compressed weight matrix and the identifiers of the corresponding plurality of output neurons comprises:
    obtaining a first row of weight values corresponding to the first input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the first row of weight values; and
    obtaining a second row of weight values corresponding to the second input neuron in the compressed weight matrix and identifiers of one or more output neurons respectively corresponding to the second row of weight values.
  9. The method according to claim 7 or 8, further comprising:
    pruning some of the weight values in an initial weight matrix according to a pruning ratio to obtain the compressed weight matrix.
  10. The method according to any one of claims 7 to 9, wherein the compressed weight matrix comprises a plurality of weight groups, and the number of non-zero weight values in each row of each of the plurality of weight groups is the same.
  11. The method according to claim 10, wherein the determining the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values comprises:
    determining, in parallel, the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values in each weight group.
  12. The method according to claim 11, wherein the plurality of weight groups comprise a first weight group and a second weight group, and
    the determining, in parallel, of the corresponding membrane voltages of the plurality of output neurons respectively according to the plurality of weight values in each weight group comprises:
    determining accumulated weight values corresponding to the output neurons in the first weight group, and determining, according to the accumulated weight values corresponding to the output neurons in the first weight group, the membrane voltages of the output neurons in the first weight group at a current time; and determining accumulated weight values corresponding to the output neurons in the second weight group, and determining, according to the accumulated weight values corresponding to the output neurons in the second weight group, the membrane voltages of the output neurons in the second weight group at the current time.
  13. A spiking neural network system, comprising a memory and the spiking neural network circuit according to any one of claims 1 to 6, wherein the memory is configured to store a plurality of compressed weight values.
PCT/CN2022/076269 2021-04-02 2022-02-15 Spiking neural network circuit and spiking neural network-based calculation method WO2022206193A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22778378.4A EP4283522A1 (en) 2021-04-02 2022-02-15 Spiking neural network circuit and spiking neural network-based calculation method
US18/475,262 US20240013037A1 (en) 2021-04-02 2023-09-27 Spiking neural network circuit and spiking neural network-based calculation method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110363578 2021-04-02
CN202110363578.3 2021-04-02
CN202110588707.9A CN115169523A (en) 2021-04-02 2021-05-28 Impulse neural network circuit and computing method based on impulse neural network
CN202110588707.9 2021-05-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/475,262 Continuation US20240013037A1 (en) 2021-04-02 2023-09-27 Spiking neural network circuit and spiking neural network-based calculation method

Publications (1)

Publication Number Publication Date
WO2022206193A1 true WO2022206193A1 (en) 2022-10-06

Family

ID=83455557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/076269 WO2022206193A1 (en) 2021-04-02 2022-02-15 Spiking neural network circuit and spiking neural network-based calculation method

Country Status (3)

Country Link
US (1) US20240013037A1 (en)
EP (1) EP4283522A1 (en)
WO (1) WO2022206193A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180276529A1 (en) * 2017-03-24 2018-09-27 Intel Corporation Handling signal saturation in spiking neural networks
CN110110851A (en) * 2019-04-30 2019-08-09 南京大学 A kind of the FPGA accelerator and its accelerated method of LSTM neural network
CN110543933A (en) * 2019-08-12 2019-12-06 北京大学 Pulse type convolution neural network based on FLASH memory array
CN110659730A (en) * 2019-10-10 2020-01-07 电子科技大学中山学院 Method for realizing end-to-end functional pulse model based on pulse neural network
CN110991623A (en) * 2019-12-20 2020-04-10 中国科学院自动化研究所 Neural network operation system based on digital-analog hybrid neurons

Also Published As

Publication number Publication date
US20240013037A1 (en) 2024-01-11
EP4283522A1 (en) 2023-11-29

Similar Documents

Publication Publication Date Title
Lin et al. Research on convolutional neural network based on improved Relu piecewise activation function
Mostafa et al. Fast classification using sparsely active spiking networks
KR20210134363A (en) Neural network-based quantum error correction decoding method and apparatus, chip
JP7366274B2 (en) Adaptive search method and device for neural networks
Abdelsalam et al. An efficient FPGA-based overlay inference architecture for fully connected DNNs
CN113159345A (en) Power grid fault identification method and system based on fusion neural network model
CN110263917B (en) Neural network compression method and device
Qi et al. Learning low resource consumption cnn through pruning and quantization
WO2022206193A1 (en) Spiking neural network circuit and spiking neural network-based calculation method
CN115544029A (en) Data processing method and related device
CN112990454A (en) Neural network calculation acceleration method and device based on integrated DPU multi-core isomerism
Huang et al. Real-time radar gesture classification with spiking neural network on SpiNNaker 2 prototype
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN111612046A (en) Characteristic pyramid graph convolutional neural network and application thereof in 3D point cloud classification
Wen et al. Novel pruning of dendritic neuron models for improved system implementation and performance
CN112036554B (en) Neural network model processing method and device, computer equipment and storage medium
CN112101537B (en) CNN accelerator and electronic device
CN115169523A (en) Impulse neural network circuit and computing method based on impulse neural network
JP7230324B2 (en) Neural network learning method, computer program and computer device
Liu Model Optimization Techniques for Embedded Artificial Intelligence
Chang et al. Optimizing Big Data Retrieval and Job Scheduling Using Deep Learning Approaches.
KR20210157826A (en) Method for sturcture learning and model compression for deep neural netwrok
KR20200135117A (en) Decompression apparatus and control method thereof
WO2019200548A1 (en) Network model compiler and related product
Guo et al. Dynamic Neural Network Structure: A Review for Its Theories and Applications

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 22778378; Country of ref document: EP; Kind code of ref document: A1

WWE Wipo information: entry into national phase
    Ref document number: 2022778378; Country of ref document: EP

ENP Entry into the national phase
    Ref document number: 2022778378; Country of ref document: EP; Effective date: 20230824

NENP Non-entry into the national phase
    Ref country code: DE