CN112396176B - Hardware neural network batch normalization system - Google Patents


Info

Publication number
CN112396176B
CN112396176B
Authority
CN
China
Prior art keywords
synapse
neural network
batch normalization
circuit
weight
Prior art date
Legal status: Active
Application number
CN202011251999.9A
Other languages
Chinese (zh)
Other versions
CN112396176A
Inventor
李祎 (Li Yi)
秦一凡 (Qin Yifan)
缪向水 (Miao Xiangshui)
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202011251999.9A
Publication of CN112396176A
Application granted
Publication of CN112396176B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a hardware neural network batch normalization system comprising C cascaded layers of neural network circuits, in which the output control circuit of the p-th layer neural network circuit is connected to the weight region input coding circuit of the (p+1)-th layer; p = 1, 2, …, C - 1. The p-th layer neural network circuit comprises a weight region input coding circuit, a batch normalization region input coding circuit, a weight region synapse unit, a batch normalization region synapse unit, an activation layer circuit and an output control circuit. The batch normalization formula is derived and simplified by exploiting the characteristics of the neural network activation function: the batch normalization region synapse unit stores the batch normalization parameter information of the neural network, and the normalization process corresponds to summing, row by row, the output of the weight region synapse unit with that parameter information. The originally complex hardware function is thereby adapted to an in-memory computing architecture, the circuit complexity of realizing the batch normalization hardware function is greatly reduced, and higher network accuracy can be achieved with lower circuit area consumption.

Description

Hardware neural network batch normalization system
Technical Field
The invention belongs to the technical field of artificial neural networks, and particularly relates to a hardware neural network batch normalization system.
Background
In the big data era, artificial intelligence and deep learning are increasingly applied in daily life, but they are limited by the traditional von Neumann architecture, in which memory and processor are separated, so existing neural network hardware implementation and acceleration systems face an increasingly serious "memory wall" problem. Neural network in-memory computing hardware systems based on mature and emerging memories feature high parallelism, low latency, low power consumption, and no sharp boundary between storage and computation; they are expected to break through the von Neumann bottleneck of traditional computer architectures and hold great potential and significance in the current era.
As application scenarios expand and task difficulty grows, neural network algorithms develop toward greater complexity and depth, placing higher demands on convergence speed, inference time, and the accuracy of the whole network. The batch normalization algorithm is increasingly emphasized and adopted as part of neural network optimization: it normalizes the shifting data distribution of an intermediate layer into one whose mean and variance are more suitable for convergence, and the normalized activation values are fed into the activation function to produce a better layer output distribution. Batch normalization significantly increases convergence speed during training, accelerates inference, and its regularization effect improves network accuracy. This is key to implementing neural networks in hardware; in particular, for low-precision neural networks targeting edge intelligent devices, the batch normalization algorithm can markedly improve network performance and efficiency.
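The transform described above can be sketched in a few lines of NumPy. This is a generic illustration of the algorithm, not the circuit mapping of the invention; the function and variable names are ours:

```python
import numpy as np

def batch_norm(s, gamma, beta, eps=1e-5):
    """Normalize a batch of pre-activations s (shape: batch x neurons)
    to zero mean and unit variance, then scale by gamma and shift by
    beta (learnable, per neuron)."""
    mu = s.mean(axis=0)
    var = s.var(axis=0)
    s_hat = (s - mu) / np.sqrt(var + eps)
    return gamma * s_hat + beta

rng = np.random.default_rng(0)
s = rng.normal(loc=3.0, scale=2.0, size=(32, 4))  # skewed pre-activations
y = batch_norm(s, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))  # re-centred near 0, rescaled near 1
```

The per-column mean returns to (numerically) zero and the standard deviation to one, which is the "more suitable distribution" the passage refers to.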
The batch normalization parameters are continuously learned, adjusted, and optimized during neural network training; after training is completed, the different batch normalization parameter values of each neuron are fixed and memorized as constants, yet they must still be adjustable for different application scenarios and tasks. The batch normalization hardware circuit therefore needs to offer both variability and fixity. At present, hardware implementations of the batch normalization algorithm fall into two main categories. The first omits batch normalization at the cost of neural network accuracy; the resulting accuracy loss is especially pronounced in application scenarios with complex tasks and high accuracy requirements. The second builds the batch normalization circuit from traditional Complementary Metal-Oxide-Semiconductor (CMOS) transistors, whose stored parameters cannot be changed after fabrication; because the batch normalization parameters differ for each neuron, as many batch normalization CMOS circuits as there are neurons must be built, together with additional control circuits to supply the different parameters, and this large number of CMOS circuits and control circuits consumes considerable area and power. A new batch normalization system is therefore urgently needed.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a hardware neural network batch normalization system, which aims to solve the technical problem that the prior art cannot realize higher network precision with lower circuit area consumption.
In order to achieve the above object, the present invention provides a hardware neural network batch normalization system, which includes C cascaded layers of neural network circuits, C being a positive integer; the output control circuit of the p-th layer neural network circuit is connected to the weight region input coding circuit in the (p+1)-th layer neural network circuit; p = 1, 2, …, C - 1;
the p-th layer neural network circuit comprises a weight region input coding circuit, a batch normalization region input coding circuit, a weight region synapse unit, a batch normalization region synapse unit, a first activation layer circuit and an output control circuit; here, the weight region synapse unit and the batch normalization region synapse unit are both arrays of electronic synapse devices, the two arrays have the same number of rows, and corresponding rows are connected to each other; the output end of the weight region input coding circuit is connected to each column of the weight region synapse unit; the output end of the batch normalization region input coding circuit is connected to each column of the batch normalization region synapse unit; each row of the batch normalization region synapse unit is connected to the first activation layer circuit, which comprises a plurality of first operational amplifiers; the batch normalization region synapse unit is divided by rows into groups of synapse blocks, each group consisting of two adjacent rows of electronic synapse devices, and each group is connected to one first operational amplifier as a differential pair; the output ends of the first activation layer circuit are connected to the output control circuit;
the weight area input coding circuit is used for coding input information of a system or output information X of a previous layer of neural network circuit to obtain a corresponding pulse signal, and the corresponding pulse signal is input into a weight area synapse unit;
the batch normalization region input coding circuit is used for inputting a logic level '1' pulse into the batch normalization region synapse unit, and the input time is synchronous with the time when the weight region input coding circuit inputs a pulse signal into the weight region synapse unit;
the weight area synapse unit is used for storing synapse weight information W of the neural network, realizing matrix vector multiplication WX under the action of pulse signals input by the weight area input coding circuit, and outputting the matrix vector multiplication WX according to rows;
the batch normalization region synapse unit is used for storing neural network batch normalization parameter information K, and under the action of logic level '1' pulse input by the batch normalization region input coding circuit, the output of the weight region synapse unit and the neural network batch normalization parameter information K are summed according to rows and then output to the first activation layer circuit;
the first activation layer circuit uses the first operational amplifiers to compare the output results of the rows in each synapse block connected to the corresponding operational amplifier, obtains the mapping results, inputs them into the output control circuit for integration, and feeds the result into the (p+1)-th layer neural network circuit;
the C-th layer neural network circuit comprises a weight region input coding circuit, a weight region synapse unit, a second activation layer circuit and an output control circuit; here, the output end of the weight region input coding circuit is connected to each column of the weight region synapse unit; each row of the weight region synapse unit is connected to the second activation layer circuit, which comprises a plurality of second operational amplifiers; the weight region synapse unit is divided by rows into groups of synapse blocks, each group consisting of two adjacent rows of electronic synapse devices, and each group is connected to one second operational amplifier as a differential pair; the output ends of the second activation layer circuit are connected to the output control circuit;
and the second activation layer circuit uses the second operational amplifiers to subtract the output results of the rows in each synapse block connected to the corresponding operational amplifier, and the output control circuit integrates the subtraction results to obtain the final result.
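Behaviourally, one pass through a p-th-layer circuit as described above reduces to a matrix-vector product plus a per-row constant, followed by a sign decision at the differential op-amps. A minimal sketch, assuming ideal devices; the values and names are hypothetical:

```python
import numpy as np

def layer_forward(W, k, x):
    """Behavioural model of one layer: the weight region array computes
    W @ x, the batch normalization region array (driven by all-'1' pulses)
    adds its per-row constants k, and the differential op-amps of the
    activation layer implement the sign function."""
    return np.sign(W @ x + k)

W = np.array([[0.5, -1.0],
              [2.0,  0.25]])      # effective (differential) weights
k = np.array([0.6, -1.0])        # per-neuron batch normalization constants
x = np.array([1.0, 1.0])         # encoded input pulses
print(layer_forward(W, k, x))    # one sign output per neuron
```

The key point the passage makes is visible here: batch normalization costs no extra compute stage, only the added constant k inside the same row summation.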
Further preferably, in each layer of neural network circuit, the synaptic units in the weight area and the synaptic units in the batch normalization area are different in size.
Further preferably, the neural network synapse weight information stored in the weight region synapse unit is the M x N matrix W = [w_ij]; at this time, the weight region synapse unit has size 2M x N, and w_ij is the difference between the electronic synapse device in row 2i-1, column j and the electronic synapse device in row 2i, column j, i = 1, 2, …, M, j = 1, 2, …, N.
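Under this 2M x N differential layout, the effective weight matrix can be recovered from the physical array by pairwise row subtraction. A sketch with a hypothetical conductance matrix G:

```python
import numpy as np

def effective_weights(G):
    """G: 2M x N array of device values. Each effective weight w_ij is the
    difference between the devices in rows 2i-1 and 2i (1-indexed) of
    column j, i.e. odd rows minus even rows."""
    return G[0::2, :] - G[1::2, :]

G = np.array([[0.9, 0.2],
              [0.1, 0.7],   # rows 1-2 encode neuron 1
              [0.3, 0.6],
              [0.3, 0.1]])  # rows 3-4 encode neuron 2
print(effective_weights(G))
```

The differential pair is what lets a physically non-negative device quantity represent a signed weight.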
Further preferably, the neural network batch normalization parameter information stored in the batch normalization region synapse unit is the M x L matrix K = [k_rs]; at this time, the batch normalization region synapse unit has size 2M x L, the effective constant of row r is

k_r = k_r1 + k_r2 + … + k_rL,

and k_rs is the difference between the electronic synapse device in row 2r-1, column s and the electronic synapse device in row 2r, column s, r = 1, 2, …, M, s = 1, 2, …, L; L is determined by the normalization parameter information and the precision of the electronic synapse devices.
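Because every active column of the batch normalization region is driven by the same logic '1' pulse, the constant contributed to row r is simply the row sum of the differential values k_rs, and spare margin columns driven by logic '0' contribute nothing. A sketch; names and values are ours:

```python
import numpy as np

def bn_constants(K, col_enable):
    """K: M x L differential values of the batch normalization region
    array; col_enable: length-L vector of 0/1 input pulses (unused margin
    columns receive 0). Returns the per-neuron constants k_r."""
    return K @ col_enable

K = np.array([[ 0.4,  0.2, 0.0],
              [-0.3, -0.2, 0.0]])          # third column is unused margin
print(bn_constants(K, np.array([1.0, 1.0, 0.0])))
```

Splitting each constant over several columns is what allows a margin L larger than strictly necessary, so the same array can be reprogrammed for different tasks.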
Further preferably, the output control circuit comprises a plurality of neurons; in the p-th layer neural network circuit, each neuron is fully connected to the first operational amplifiers; in the C-th layer neural network circuit, each neuron is fully connected to the second operational amplifiers.
Further preferably, in the hardware neural network batch normalization system, in the training process, the synapse units in the weight region and the synapse units in the batch normalization region update the synapse weights synchronously; and after the training is finished, keeping the synaptic weights in the synaptic units in the weight area and the synaptic units in the batch normalization area unchanged.
Further preferably, the electronic synapse device comprises a two-terminal electronic synapse device or a multi-terminal electronic synapse device;
the two-terminal electronic synapse device comprises a resistive random access memory, a phase change random access memory, a magnetic random access memory, a ferroelectric random access memory or a novel two-dimensional material device;
a multi-terminal electronic synapse device comprises a floating gate transistor or a synapse transistor.
Further preferably, when the electronic synapse device is a two-terminal electronic synapse device, a predetermined voltage, current, or external magnetic field is applied across the two terminals to change the resistance state, crystallization state, magnetization state, or electric polarization state of the device, thereby adjusting the synapse weight.
Further preferably, when the electronic synapse device is a multi-terminal electronic synapse device, a gate, a source, and a drain of the multi-terminal electronic synapse device are used as input ports of synapses, and a channel resistance state between the source and the drain of the multi-terminal electronic synapse device is used as a synapse weight; and adjusting the synapse weight by controlling the voltage of each electrode of the multi-terminal electronic synapse device.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
1. The invention provides a hardware neural network batch normalization system in which the batch normalization formula is derived and simplified by exploiting the characteristics of the neural network activation function. The batch normalization region synapse unit stores the batch normalization parameter information of the neural network, the normalization process corresponds to summing, row by row, the output of the weight region synapse unit with that parameter information, and the synapse weight information and the batch normalization parameter information are updated synchronously during training. The originally complex hardware function is thereby adapted to an in-memory computing architecture, the circuit complexity of realizing the batch normalization hardware function is greatly reduced, and higher network accuracy can be achieved with lower circuit area consumption.
2. Compared with the traditional differential batch normalization design, the hardware neural network batch normalization system provided by the invention realizes the batch normalization operation without adding peripheral circuit complexity, improves neural network recognition accuracy, and accelerates network convergence; compared with the traditional CMOS batch normalization design, it greatly simplifies the peripheral circuitry, improving energy efficiency and reducing hardware area.
3. In the hardware neural network batch normalization system provided by the invention, the synapse units are realized with electronic synapse devices, so they can modify and store weights and parameters as the neural network requires. The weight region synapse unit memorizes the weight information of its layer and receives the input pattern information or the signals from the previous layer; the batch normalization region synapse unit memorizes the batch normalization information of its layer and synchronously receives logic '1' electric pulses; together, the synapse arrays realize the multiply-accumulate and batch normalization operations of the neural network in one step.
Drawings
FIG. 1 is a schematic diagram of a hardware neural network batch normalization system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a neural network flow for performing 0-9 handwritten digit pattern recognition tasks according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a hardware neural network batch normalization system corresponding to a neural network for executing a 0-9 handwritten digit pattern recognition task according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In order to achieve the above object, as shown in FIG. 1, the present invention provides a hardware neural network batch normalization system, in particular one suitable for networks whose activation function is the sign function, comprising C cascaded layers of neural network circuits, C being a positive integer; the output control circuit of the p-th layer neural network circuit is connected to the weight region input coding circuit in the (p+1)-th layer neural network circuit; p = 1, 2, …, C - 1. In each layer of neural network circuit, the scale of the weight region synapse unit differs from that of the batch normalization region synapse unit: the weight region parameter scale is determined by the neural network parameters, the weight region synapse unit scale follows from it combined with the precision of the electronic synapse devices, and in general a differential pair of electronic synapses represents one weight parameter. The batch normalization region parameter scale is determined by the number of neurons in the layer; in the general case the parameters correspond one-to-one to the neurons, although corresponding optimizations exist, such as adjusting the parameters by the number of feature maps in a convolutional neural network, and the array scale follows from the precision of the electronic synapse devices plus a corresponding margin. It should be noted that the scales of the weight region synapse unit and the batch normalization region synapse unit refer to their effective scales, that is, the scale of the electronic synapse devices actually used for the neural network computation.
The p-th layer neural network circuit comprises a weight region input coding circuit, a batch normalization region input coding circuit, a weight region synapse unit, a batch normalization region synapse unit, a first activation layer circuit and an output control circuit. Here, the weight region synapse unit and the batch normalization region synapse unit are both arrays of electronic synapse devices (so that they can modify and store weights and parameters according to the requirements of the neural network, i.e. they are plastic); the two arrays have the same number of rows, and corresponding rows are connected to each other. The output end of the weight region input coding circuit is connected to each column of the weight region synapse unit; the output end of the batch normalization region input coding circuit is connected to each column of the batch normalization region synapse unit. Each row of the batch normalization region synapse unit is connected to the first activation layer circuit, which comprises a plurality of first operational amplifiers; the batch normalization region synapse unit is divided by rows into groups of synapse blocks, each group consisting of two adjacent rows of electronic synapse devices, and each group is connected to one first operational amplifier as a differential pair; the output ends of the first activation layer circuit are connected to the output control circuit.
The weight region input coding circuit encodes the input information of the system, or the output information X = (x_1, x_2, …, x_N)^T of the previous layer neural network circuit, into corresponding pulse signals, which are input into the weight region synapse unit. The batch normalization region input coding circuit inputs logic level '1' pulses into the batch normalization region synapse unit, synchronized with the pulses from the weight region input coding circuit. The weight region synapse unit stores the neural network synapse weight information, the M x N matrix W = [w_ij], and, under the pulse signals from the weight region input coding circuit, realizes the matrix-vector multiplication WX and passes the result row by row into the batch normalization region synapse unit. Specifically, the weight region synapse unit has size 2M x N, where w_ij is the difference between the electronic synapse device in row 2i-1, column j and the electronic synapse device in row 2i, column j, i = 1, 2, …, M, j = 1, 2, …, N; the size of the weight region synapse unit can be adjusted according to the values and scale of the neural network synapse weights. Batch normalization concentrates the input data of each layer into a distribution more suitable for convergence, which makes network training faster. The batch normalization region synapse unit stores the neural network batch normalization parameter information, the M x L matrix K = [k_rs], and, under the logic level '1' pulses from the batch normalization region input coding circuit, sums the output of the weight region synapse unit with K by rows and outputs the result to the activation layer circuit. Specifically, the batch normalization region synapse unit has size 2M x L, and the effective constant of row r is

k_r = k_r1 + k_r2 + … + k_rL,

where k_rs is the difference between the electronic synapse device in row 2r-1, column s and the electronic synapse device in row 2r, column s, r = 1, 2, …, M, s = 1, 2, …, L; L is determined by the normalization parameter information and the precision of the electronic synapse devices. In general, L is set with a certain margin so as to be compatible with different tasks, and the input signals of the extra electronic synapses are set to logic level '0'. The scale of the batch normalization region synapse unit can be adjusted according to the values and scale of the neural network batch normalization parameters. The first activation layer circuit uses the first operational amplifiers to compare the output results of the rows in each connected synapse block, obtains the mapping results, inputs them into the output control circuit for integration, and feeds the result into the (p+1)-th layer neural network circuit.
The C-th layer neural network circuit comprises a weight region input coding circuit, a weight region synapse unit, a second activation layer circuit and an output control circuit. Here, the output end of the weight region input coding circuit is connected to each column of the weight region synapse unit; each row of the weight region synapse unit is connected to the second activation layer circuit, which comprises a plurality of second operational amplifiers; the weight region synapse unit is divided by rows into groups of synapse blocks, each group consisting of two adjacent rows of electronic synapse devices, and each group is connected to one second operational amplifier as a differential pair; the output ends of the second activation layer circuit are connected to the output control circuit. The second activation layer circuit uses the second operational amplifiers to subtract the output results of the rows in each connected synapse block, and the output control circuit integrates the subtraction results to obtain the final result.
The output control circuit includes a plurality of neurons; in the p-th layer neural network circuit, each neuron is fully connected to the first operational amplifiers; in the C-th layer neural network circuit, each neuron is fully connected to the second operational amplifiers.
Wherein the electronic synapse device comprises a two-terminal electronic synapse device or a multi-terminal electronic synapse device. Specifically, in this embodiment, the two-terminal electronic synapse device may be a resistive random access memory (e.g., based on HfOx, TaOx, TiOx, AlOx, ZrOx, CuOx, SiNx, SiOx, GeSe, GeTe, AgInSbTe, Ag2S, or Ag2Se), a phase change random access memory (e.g., GeTe, SbTe, GeSb, GeSbTe, BiTe, etc.), a magnetic random access memory (e.g., NiFe, CoFe, CoFeB, La1-xSrxMnO, Nd-Pb-Mn-O, La-Ba-Mn-O, or La-Ca-Mn-O), a ferroelectric random access memory (e.g., BaTiO3, PbTiO3, SrTiO3, SrRuO3, SrxBa1-xNb2O6, Pb(Zr1-xTix)O3, PbNb2O6, Ba2NaNb5O15, Ta2O5, or related perovskite and tungsten-bronze ferroelectrics), or a novel two-dimensional material device (e.g., based on graphene, MoS2, or related two-dimensional materials). The multi-terminal electronic synapse device may be a floating gate transistor (e.g., NOR Flash, NAND Flash), a synaptic transistor, or the like. Specifically, when the electronic synapse device is a two-terminal electronic synapse device, a predetermined voltage, current, or external magnetic field is applied across the two terminals to change the resistance state, crystallization state, magnetization state, or electric polarization state of the device, thereby adjusting the synapse weight. When the electronic synapse device is a multi-terminal electronic synapse device, the gate of the device serves as the input port of the synapse, and the channel resistance state between the source and the drain serves as the synapse weight.
It should be noted that in a few multi-terminal electronic synapse devices, the source and drain serve as the input ports of the synapse, and the channel resistance state between the source and drain serves as the synapse weight; the synapse weight is adjusted by controlling the input of each terminal of the multi-terminal electronic synapse device.
Further, the neural networks to which the batch normalization system applies include on-line training acceleration systems and off-line inference acceleration systems. An on-line training acceleration system updates the network parameters once per training step and updates the hardware parameters synchronously: at initialization, the synaptic weights in the weight region synapse unit are randomly distributed while those in the batch normalization region synapse unit are preset according to the algorithm, and after each training step the states of the synapse devices are updated according to the results. An off-line inference acceleration system trains the network parameters to their ideal values on a high-performance computer; once determined, the parameters are written into the hardware acceleration system, and the hardware performs the inference acceleration function. Further, during training of the hardware neural network batch normalization system, the weight region synapse unit and the batch normalization region synapse unit update their synaptic weights synchronously; after training, the synaptic weights in both units are kept unchanged.
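The synchronized update of the two regions in the on-line mode might be sketched as follows. The patent does not specify the learning rule, so a simple perceptron-style update stands in here purely as an illustration; all names and values are ours:

```python
import numpy as np

def train_step(W, k, x, target, lr=0.1):
    """One on-line training step: the weight region values W and the batch
    normalization region constants k are written together; after training,
    both stay frozen and only inference runs."""
    y = np.sign(W @ x + k)           # forward pass through the layer
    err = target - y                 # per-neuron output error
    W = W + lr * np.outer(err, x)    # weight region update
    k = k + lr * err                 # synchronized BN region update
    return W, k

W = np.array([[0.5, -0.2]])
k = np.array([0.0])
x = np.array([1.0, 1.0])
W, k = train_step(W, k, x, target=np.array([-1.0]))
print(W, k)
```

The point mirrored from the text is structural, not the rule itself: both arrays receive writes in the same step, and both are treated as read-only once training ends.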
Note that the output of the weight-region synapse unit is denoted S = WX and is called the preliminary activation; following the procedure of the batch normalization algorithm, the q-th row preliminary activation s_q in S is normalized as:

$$\hat{s}_q = \gamma_q \cdot \frac{s_q - \mu}{\sqrt{\sigma^2 + \xi_q}} + \beta_q, \qquad q = 1, 2, \ldots, Q \tag{1}$$

wherein Q is the batch size in the batch normalization algorithm, μ is the mean of the preliminary activations of the batch, and σ is their standard deviation; γ_q, ξ_q and β_q are scaling factors and offsets, all learnable constants of the neural network. Applying the activation function f gives:
$$a_q = f(\hat{s}_q) = f\big(m_q (s_q + k_q)\big) \tag{2}$$
wherein

$$m_q = \frac{\gamma_q}{\sqrt{\sigma^2 + \xi_q}}, \qquad k_q = \frac{\beta_q \sqrt{\sigma^2 + \xi_q}}{\gamma_q} - \mu$$
further, the hardware neural network batch normalization system can determine sigma and mu as constants to accelerate hardware inference in the training process, and corresponding m isqAnd kqAlso constant, the above equation can be further expressed as:
Figure BDA0002771881450000111
From the above, the batch-normalization-region synapse units can be used to store the constant k_q; by training the value of k_q, the purpose of training the parameters γ_q, ξ_q and β_q is achieved. During training of the hardware neural network batch normalization system, the weight-region synapse units and the batch-normalization-region synapse units update their synapse weights synchronously; after training is finished, the synapse weights in both regions are kept unchanged.
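The algebraic folding above can be checked numerically. The sketch below (plain Python; the concrete γ, ξ, β, μ, σ values are arbitrary illustrations, chosen with γ > 0 so that m_q > 0) verifies that batch normalization followed by a sign activation agrees with the form using the single stored constant k_q:

```python
import math

def bn_then_sign(s, gamma, xi, beta, mu, sigma):
    # Direct form: normalize the preliminary activation s, then apply sign().
    y = gamma * (s - mu) / math.sqrt(sigma**2 + xi) + beta
    return 1 if y >= 0 else -1

def folded_sign(s, gamma, xi, beta, mu, sigma):
    # Folded form: m = gamma / sqrt(sigma^2 + xi),
    #              k = beta * sqrt(sigma^2 + xi) / gamma - mu,
    # so that m * (s + k) equals the normalized value exactly.
    root = math.sqrt(sigma**2 + xi)
    m = gamma / root
    k = beta * root / gamma - mu
    return 1 if m * (s + k) >= 0 else -1

# The two forms agree for any preliminary activation s.
for s in [-5.0, -0.3, 0.0, 0.7, 4.2]:
    assert bn_then_sign(s, 1.5, 1e-5, 0.2, 0.1, 2.0) == \
           folded_sign(s, 1.5, 1e-5, 0.2, 0.1, 2.0)
```

Only k_q needs to be stored in the batch normalization region: when m_q > 0, the positive factor m_q cannot change the sign of the argument.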
Further, when the bias constant b_q of the neural network is considered, the corresponding derivation becomes:

$$a_q = f\big(m_q(w_q X + k_q')\big)$$

wherein

$$k_q' = k_q + b_q$$
From the above, the bias constant b_q of the neural network can be absorbed into the constant k_q and trained together with it. Further, the number of columns of the batch-normalization-region synapse units is determined by the value range of the neural network batch normalization parameter information K stored in those units and by the minimum precision the device can express; a larger margin is typically left so that the array can adapt to different recognition tasks.
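As a rough sizing sketch (illustrative only; it assumes binary devices with a unit step, and the margin factor is a hypothetical choice, not a value from the invention), the minimum column count follows from the range of K divided by the device's minimum expressible precision:

```python
import math

def bn_region_columns(k_min, k_max, device_precision, margin_factor=1.8):
    # Minimum synapses per row so that any K in [k_min, k_max] is expressible
    # at the device's minimum precision, plus headroom for other tasks.
    minimum = math.ceil((k_max - k_min) / device_precision)
    return minimum, math.ceil(minimum * margin_factor)

# Values from the embodiment below: K in [-52.3, 17.1] with unit-step binary
# devices requires at least 70 synapses per row; the 128-column hardware
# design leaves margin beyond that minimum.
minimum, padded = bn_region_columns(-52.3, 17.1, 1.0)
print(minimum)  # 70
```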
The hardware neural network batch normalization system provided by the invention can flexibly adjust the scale and range of the electronic synapse devices according to the characteristics of the neural network; for different neural networks that use the sign function as the activation function, parameters are flexibly selected with the completion accuracy of the brain-like cognitive task as the performance index.
In order to further explain the hardware neural network batch normalization system provided by the invention, the following details are provided with embodiments:
In this embodiment, a neural network batch normalization hardware offline inference acceleration system is obtained by combining a binary neural network, taking a pattern recognition task for handwritten digits 0-9 as an example. A flow diagram of the neural network is shown in fig. 2. In this embodiment, the number of neurons in the input layer is 784, matching the pixel size (28 × 28) of a handwritten digit; the number of neurons in the hidden layer is 1024; and the number of neurons in the output layer is 10. The input information is a binary grayscale image, and the activation function is:
$$f(x) = \mathrm{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$
Correspondingly, the structure of the hardware neural network batch normalization system corresponding to the neural network executing the pattern recognition task for handwritten digits 0-9 is shown in fig. 3 (two neural network layers in total). Applying formula (3), and noting that in this example m_q > 0, we obtain:

$$a_q = \mathrm{sign}(w_q X + k_q)$$
A binary electronic synapse device is adopted in the binary network hardware design. In this embodiment, the value range of K is [-52.3, 17.1], so each row of the batch normalization region needs at least 70 electronic synapses to express the K values. To meet the precision requirements of the K constants stored by the batch-normalization-region electronic synapses and to leave margin for other future tasks, the batch-normalization-region synapse unit of the first-layer neural network circuit is designed in hardware as 2048 × 128, and the fractional part of K is preserved; the weight-region synapse units of the first-layer and second-layer neural network circuits are 2048 × 784 and 20 × 1024 respectively. The trained network parameters are written into the weight-region synapse unit of the first-layer neural network circuit. During inference acceleration, the weight-region input coding circuit of the first-layer neural network circuit inputs coded electrical pulse information into the weight-region synapse unit according to the pattern input information, while the batch-normalization-region input coding circuit of the same layer synchronously inputs a logic "1" pulse into the batch-normalization-region synapse unit. The two columns of current representing positive and negative values flow into the corresponding first operational amplifier, which executes the activation function; the 1024 first operational amplifiers transmit their results to the output control circuit of the first-layer neural network circuit, which stores them and synchronously outputs them to the weight-region input coding circuit of the second-layer neural network circuit. The second operational amplifiers in the second-layer neural network circuit execute a subtraction function and generate real-valued outputs, and the results of the 10 second operational amplifiers are transmitted off-chip through the output control circuit of the second-layer neural network circuit to read out the neural network pattern recognition result. The binary neural network achieves a recognition accuracy of more than 97% under this scheme, an improvement of 7% over a traditional binary neural network that does not adopt batch normalization.
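The dataflow of the two-layer system just described can be sketched in plain Python. This is an illustrative behavioral model only: toy dimensions stand in for the 784-1024-10 network, and the weights and K values are random rather than trained.

```python
import random

def sign(x):
    # Activation executed by the first operational amplifiers.
    return 1 if x >= 0 else -1

def hidden_layer(W, K, x):
    # Weight-region crossbar computes each row dot-product w_q . x; the
    # batch-normalization region, driven by a logic "1" pulse, adds the
    # stored constant K_q; the op-amp then applies the sign activation.
    return [sign(sum(w * v for w, v in zip(row, x)) + k)
            for row, k in zip(W, K)]

def output_layer(W, h):
    # Second layer: differential pairs are subtracted to give real-valued
    # scores, read out through the output control circuit.
    return [sum(w * v for w, v in zip(row, h)) for row in W]

random.seed(0)
n_in, n_hid, n_out = 8, 6, 3
W1 = [[random.choice([-1, 1]) for _ in range(n_in)] for _ in range(n_hid)]
K1 = [random.uniform(-3.0, 3.0) for _ in range(n_hid)]
W2 = [[random.choice([-1, 1]) for _ in range(n_hid)] for _ in range(n_out)]
x = [random.choice([0, 1]) for _ in range(n_in)]   # binary grayscale input

h = hidden_layer(W1, K1, x)          # entries are +1 or -1
scores = output_layer(W2, h)
prediction = max(range(n_out), key=lambda i: scores[i])
```

In the hardware, these three functions correspond to the weight-region and batch-normalization-region synapse units, the first operational amplifiers, and the second operational amplifiers, respectively.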
The invention provides a hardware neural network batch normalization system, aiming to develop a neural network system with application value and advantages. The invention discloses a hardware design of a batch normalization system oriented to a specific neural network and based on electronic synapse devices. Compared with synapse function circuit modules under traditional CMOS technology, the electronic synapse device, serving as a key synapse function module in a neural network, has outstanding advantages such as low power consumption, high density, and compatibility with CMOS technology; it shows great potential in accelerating neural network processing and breaking the von Neumann bottleneck, and is developing rapidly. The batch normalization algorithm likewise has great potential for accelerating neural network convergence and improving neural network accuracy. The batch normalization hardware system based on electronic synapse devices disclosed by the invention can advance the path toward hardware implementation of neural networks, and provide new inspiration and new approaches for novel computing architectures in an era of rapidly developing artificial intelligence.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A hardware neural network batch normalization system, characterized by comprising cascaded C layers of neural network circuits, C being a positive integer; the output control circuit of the p-th layer neural network circuit is connected with the weight-region input coding circuit in the (p+1)-th layer neural network circuit; p = 1, 2, …, C−1;
the p-layer neural network circuit comprises a weight region input coding circuit, a batch normalization region input coding circuit, a weight region synapse unit, a batch normalization region synapse unit, a first activation layer circuit and an output control circuit; at this time, the weight area synapse units and the batch normalization area synapse units are both arrays formed by electronic synapse devices, the number of rows of the weight area synapse units and the number of rows of normalization area synapse units are the same, and the rows are respectively connected; the output end of the weight region input coding circuit is connected with each column of the weight region synapse units; the output end of the batch normalization region input coding circuit is connected with each column of the synapse units in the batch normalization region; each row of the synapse units in the batch normalization region is connected with the first activation layer circuit, the first activation layer circuit comprises a plurality of first operational amplifiers, the synapse units in the batch normalization region are divided into a plurality of groups of synaptic blocks according to the rows, each group of synaptic blocks is composed of two adjacent rows of electronic synapse devices, and each group of synaptic blocks is connected with one first operational amplifier in a differential pair mode; the output ends of the first active layer circuits are connected with the output control circuit;
the weight area input coding circuit is used for coding input information of the system or output information X of a previous layer of neural network circuit to obtain a corresponding pulse signal, and the corresponding pulse signal is input into the weight area synapse unit;
the batch normalization region input coding circuit is used for inputting a logic level 1 pulse into the batch normalization region synapse units, and the input time is synchronous with the time when the weight region input coding circuit inputs a pulse signal into the weight region synapse units;
the weight area synapse unit is used for storing synapse weight information W of the neural network, realizing matrix vector multiplication WX under the action of pulse signals input by the weight area input coding circuit, and outputting the matrix vector multiplication WX according to rows;
the batch normalization region synapse unit is used for storing neural network batch normalization parameter information K, and under the action of a logic level '1' pulse input by the batch normalization region input coding circuit, the output of the weight region synapse unit and the neural network batch normalization parameter information K are summed in a row and then output to the first activation layer circuit;
the first activation layer circuit is used for comparing, by means of the first operational amplifiers, the output results of the rows in the synaptic block connected with each first operational amplifier to obtain mapping results, inputting the mapping results into the output control circuit for integration, and inputting the results into the (p+1)-th layer neural network circuit;
the layer C neural network circuit comprises the weight region input coding circuit, the weight region synapse unit, a second activation layer circuit and the output control circuit; at this time, the output end of the weight region input coding circuit is connected with each column of the weight region synapse units; each row of the synapse units in the weight area is connected with the second activation layer circuit, the second activation layer circuit comprises a plurality of second operational amplifiers, the synapse units in the weight area are divided into a plurality of groups of synaptic blocks according to the rows, each group of synaptic blocks is composed of two adjacent rows of electronic synapse devices, and each group of synaptic blocks is connected with one second operational amplifier in a differential pair mode; the output ends of the second active layer circuits are connected with the output control circuit;
and the second activation layer circuit is used for subtracting, by means of the second operational amplifiers, the output results of the rows in the synaptic block connected with each second operational amplifier, the subtraction results being integrated by the output control circuit to obtain the final result.
2. The hardware neural network batch normalization system of claim 1, wherein the synaptic units in the weight area and the synaptic units in the batch normalization area are different in scale in each layer of neural network circuit.
3. The hardware neural network batch normalization system of claim 1, wherein the neural network synaptic weight information stored by the synaptic weight unit in the weight area is
$$W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{MN} \end{pmatrix}$$
at this time, the size of the weight-region synapse unit is 2M × N, and w_ij is the difference between the electronic synapse device in row 2i−1, column j and the electronic synapse device in row 2i, column j; i = 1, 2, …, M, j = 1, 2, …, N.
4. The hardware neural network batch normalization system of claim 1, wherein the neural network batch normalization parameter information stored in the synapse units of the batch normalization region is
$$K = (K_1, K_2, \ldots, K_M)^{\mathrm{T}}$$
at this time, the size of the batch-normalization-region synapse unit is 2M × L, with

$$K_r = \sum_{s=1}^{L} k_{rs}$$

wherein k_rs is the difference between the electronic synapse device in row 2r−1, column s and the electronic synapse device in row 2r, column s; r = 1, 2, …, M, s = 1, 2, …, L; L is determined by the normalization parameter information and the electronic synapse device precision.
5. The hardware neural network batch normalization system of claim 1, wherein the output control circuit comprises a plurality of neurons; in the p-th layer neural network circuit, all the neurons are connected with all the first operational amplifiers; in the layer C neural network circuit, each neuron is fully connected to each second operational amplifier.
6. The hardware neural network batch normalization system of claim 1, wherein the weight region synapse units update synapse weights synchronously with the batch normalization region synapse units during training; and after the training is finished, keeping the synaptic weights in the weight area synaptic units and the batch of normalization area synaptic units unchanged.
7. The hardware neural network batch normalization system of any one of claims 1-6, wherein the electronic synapse devices comprise two-terminal electronic synapse devices or multi-terminal electronic synapse devices;
the two-terminal electronic synapse device comprises a resistive random access memory, a phase change random access memory, a magnetic random access memory, a ferroelectric random access memory or a novel two-dimensional material device;
the multi-terminal electronic synapse device comprises a floating gate transistor or a synapse transistor.
8. The batch normalization system of hardware neural networks of claim 7, wherein when the electronic synapse device is a two-terminal electronic synapse device, the resistance state, the crystallization state, the magnetization state, and the electrical polarization state of the device are changed by applying a predetermined voltage or current or an applied magnetic field across the two-terminal electronic synapse device, thereby implementing the adjustment of the synaptic weights.
9. The hardware neural network batch normalization system of claim 7, wherein the electronic synapse device is a multi-terminal electronic synapse device, a gate or a source and a drain of the multi-terminal electronic synapse device are used as input ports of synapses, and a channel resistance state between the source and the drain of the multi-terminal electronic synapse device is used as a synapse weight; and adjusting the synapse weight by controlling the voltages of all the electrodes of the multi-terminal electronic synapse device.
CN202011251999.9A 2020-11-11 2020-11-11 Hardware neural network batch normalization system Active CN112396176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011251999.9A CN112396176B (en) 2020-11-11 2020-11-11 Hardware neural network batch normalization system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011251999.9A CN112396176B (en) 2020-11-11 2020-11-11 Hardware neural network batch normalization system

Publications (2)

Publication Number Publication Date
CN112396176A CN112396176A (en) 2021-02-23
CN112396176B true CN112396176B (en) 2022-05-20

Family

ID=74600590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011251999.9A Active CN112396176B (en) 2020-11-11 2020-11-11 Hardware neural network batch normalization system

Country Status (1)

Country Link
CN (1) CN112396176B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985447A (en) * 2018-06-15 2018-12-11 华中科技大学 A kind of hardware pulse nerve network system
WO2020122994A1 (en) * 2018-12-14 2020-06-18 Western Digital Technologies, Inc. Hardware accelerated discretized neural network
CN111340194A (en) * 2020-03-02 2020-06-26 中国科学技术大学 Pulse convolution neural network neural morphology hardware and image identification method thereof
CN111582451A (en) * 2020-05-08 2020-08-25 中国科学技术大学 Image recognition interlayer parallel pipeline type binary convolution neural network array architecture
CN111630527A (en) * 2017-11-14 2020-09-04 技术研发基金会有限公司 Analog-to-digital converter using memory in neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11403528B2 (en) * 2018-05-31 2022-08-02 Kneron (Taiwan) Co., Ltd. Self-tuning incremental model compression solution in deep neural network with guaranteed accuracy performance

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111630527A (en) * 2017-11-14 2020-09-04 技术研发基金会有限公司 Analog-to-digital converter using memory in neural network
CN108985447A (en) * 2018-06-15 2018-12-11 华中科技大学 A kind of hardware pulse nerve network system
WO2020122994A1 (en) * 2018-12-14 2020-06-18 Western Digital Technologies, Inc. Hardware accelerated discretized neural network
CN111340194A (en) * 2020-03-02 2020-06-26 中国科学技术大学 Pulse convolution neural network neural morphology hardware and image identification method thereof
CN111582451A (en) * 2020-05-08 2020-08-25 中国科学技术大学 Image recognition interlayer parallel pipeline type binary convolution neural network array architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BNReLU: Combine Batch Normalization and Rectified Linear Unit to Reduce Hardware Overhead; Jiexian Ge et al.; 2019 IEEE 13th International Conference on ASIC (ASICON); 2020-02-06; full text *
Research on Neural Network Applications Based on Memristors (基于忆阻器的神经网络应用研究); Chen Jia (陈佳); Micro-Nano Electronics and Intelligent Manufacturing (《微纳电子与智能制造》); 2019-12-31; Vol. 1, No. 4; full text *

Also Published As

Publication number Publication date
CN112396176A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
Haensch et al. The next generation of deep learning hardware: Analog computing
US11842770B2 (en) Circuit methodology for highly linear and symmetric resistive processing unit
Tsai et al. Recent progress in analog memory-based accelerators for deep learning
Demin et al. Necessary conditions for STDP-based pattern recognition learning in a memristive spiking neural network
US11361216B2 (en) Neural network circuits having non-volatile synapse arrays
CN111433792B (en) Counter-based resistance processing unit of programmable resettable artificial neural network
Kaneko et al. Ferroelectric artificial synapses for recognition of a multishaded image
Fumarola et al. Accelerating machine learning with non-volatile memory: Exploring device and circuit tradeoffs
US11157810B2 (en) Resistive processing unit architecture with separate weight update and inference circuitry
Musisi-Nkambwe et al. The viability of analog-based accelerators for neuromorphic computing: A survey
US11488001B2 (en) Neuromorphic devices using layers of ion reservoirs and ion conductivity electrolyte
Bennett et al. Contrasting advantages of learning with random weights and backpropagation in non-volatile memory neural networks
US11121259B2 (en) Metal-oxide-based neuromorphic device
US11586899B2 (en) Neuromorphic device with oxygen scavenging gate
CN114791796A (en) Multi-input computing unit based on split gate flash memory transistor and computing method thereof
Zhu et al. CMOS-compatible neuromorphic devices for neuromorphic perception and computing: a review
Thunder et al. Ultra low power 3D-embedded convolutional neural network cube based on α-IGZO nanosheet and bi-layer resistive memory
CN112396176B (en) Hardware neural network batch normalization system
Wei et al. Emerging Memory-Based Chip Development for Neuromorphic Computing: Status, Challenges, and Perspectives
Zhang et al. Long short-term memory with two-compartment spiking neuron
CN112734022B (en) Four-character memristor neural network circuit with recognition and sequencing functions
Lin et al. Resistive memory-based zero-shot liquid state machine for multimodal event data learning
Wei et al. Neuromorphic Computing Systems with emerging devices
Guo et al. A multi-conductance states memristor-based cnn circuit using quantization method for digital recognition
Bianchi et al. Combining accuracy and plasticity in convolutional neural networks based on resistive memory arrays for autonomous learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant