CN114492773A - Neural network batch standardization layer hardware implementation method, device, equipment and medium - Google Patents

Neural network batch standardization layer hardware implementation method, device, equipment and medium

Info

Publication number
CN114492773A
CN114492773A
Authority
CN
China
Prior art keywords
layer
result
neural network
convolution
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111601714.4A
Other languages
Chinese (zh)
Inventor
高滨
周颖
刘琪
唐建石
张清天
钱鹤
吴华强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Beijing Superstring Academy of Memory Technology
Original Assignee
Tsinghua University
Beijing Superstring Academy of Memory Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Beijing Superstring Academy of Memory Technology filed Critical Tsinghua University
Priority to CN202111601714.4A priority Critical patent/CN114492773A/en
Publication of CN114492773A publication Critical patent/CN114492773A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of neural network computing, and in particular to a neural network batch standardization layer hardware implementation method, device, equipment and medium, wherein the method comprises the following steps: storing weight parameters of the neural network in the form of conductance into a memristor array; obtaining a corresponding quantization result according to the actual current flowing through each source line of the memristor array, based on the convolution result of the last convolution layer; and sending the quantization result to the next convolution layer for convolution layer calculation. In this way, the ADC module commonly used in compute-in-memory tasks is implemented based on the memristor array, the BN layer calculation and the activation function module are realized through it, the extra overhead of the processor for BN layer calculation is saved, and the system energy efficiency is improved.

Description

Neural network batch standardization layer hardware implementation method, device, equipment and medium
Technical Field
The present application relates to the field of neural network computing technologies, and in particular, to a method, an apparatus, a device, and a medium for implementing a neural network batch normalization layer in hardware.
Background
The Batch Normalization (BN) layer is a common module in deep neural network training. It concentrates output results that would otherwise be widely dispersed into a certain range, which avoids the vanishing-gradient problem and accelerates network training. Hardware implementations of the BN layer often fold the BN computation into a comparison against an improved threshold.
In the related art, as shown in fig. 1, in a binary neural network the output value of each layer is 1 or -1. X_N, Y_N and Z_N are the results of the Nth convolution layer, the BN layer and the sign bit judgment module, respectively. After the output vector of the convolutional layer passes through the BN layer, its sign bit is judged and sent to the next convolutional layer for calculation, as shown in the following formula:

Z_N = sign(Y_N) = 1 if Y_N ≥ 0, and -1 otherwise,

wherein Y_N = k·X_N + b is the BN layer output and k, b are the linear coefficients of the BN layer.

The improved threshold scheme merges the BN layer calculation into the subsequent sign bit judgment module. Since sign(k·X_N + b) depends only on how X_N compares with the threshold -b/k, the combined module computes:

Z_N = 1 if (k > 0 and X_N ≥ -b/k) or (k < 0 and X_N ≤ -b/k); Z_N = -1 otherwise.

Because only the comparison direction flips according to the sign of k, the method avoids the complicated calculation steps of the BN layer and greatly reduces the calculation cost.
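The equivalence between "BN then sign bit" and the single threshold comparison can be sketched as follows. This is a minimal illustration with hypothetical values of the BN linear coefficients k and b; x denotes the convolution output fed to the BN layer:

```python
def bn_then_sign(xs, k, b):
    """Reference path: apply the BN linear transform z = k*x + b, then take the sign bit."""
    return [1 if k * x + b >= 0 else -1 for x in xs]

def improved_threshold(xs, k, b):
    """Improved path: fold the BN transform into one comparison against -b/k.
    k*x + b >= 0  <=>  x >= -b/k when k > 0, and x <= -b/k when k < 0."""
    t = -b / k
    if k > 0:
        return [1 if x >= t else -1 for x in xs]
    return [1 if x <= t else -1 for x in xs]

xs = [-3.0, -1.0, 0.0, 2.0, 5.0]
print(bn_then_sign(xs, 2.0, -1.0))        # -> [-1, -1, -1, 1, 1]
print(improved_threshold(xs, 2.0, -1.0))  # -> [-1, -1, -1, 1, 1]
```

The two paths produce identical sign bits for any k ≠ 0, which is why the binary-network scheme can discard the BN arithmetic entirely.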
However, the improved-threshold implementation of the BN layer is designed for binary neural networks, where each layer outputs only two values and the calculation steps are relatively simple. For higher-precision neural network hardware implementations the method is not applicable, and this problem urgently needs to be solved.
Content of application
The application provides a method, a device, equipment and a medium for hardware implementation of a neural network batch standardization layer, wherein the analog-to-digital converter (ADC) module commonly used in compute-in-memory tasks is implemented based on a memristor array; the BN layer calculation and the activation function module are realized through it, the extra overhead of the processor for BN layer calculation is saved, and the system energy efficiency is improved.
An embodiment of a first aspect of the present application provides a hardware implementation method for a neural network batch normalization layer, including the following steps:
storing weight parameters of the neural network in a form of conductance into the memristor array;
obtaining a corresponding quantization result according to the actual current flowing through each source line of the memristor array based on the convolution result of the last convolution layer; and
and sending the quantization result to the next convolution layer for convolution layer calculation.
Optionally, obtaining the corresponding quantization result according to the actual current flowing through each source line of the memristor array includes:
performing 8-bit quantization over a preset range to obtain the quantization result.
Optionally, the preset range is:

((0 − b)/k, (Z_max − b)/k)

wherein

k = γ/√(σ² + ε), b = β − γ·μ/√(σ² + ε),

Z_max is the upper limit of the batch normalization layer calculation, β, γ, σ, μ are parameters of the batch normalization layer, and ε is a small constant introduced to prevent the denominator from being zero.
Optionally, before obtaining the corresponding quantized result according to an actual current flowing through each source line of the memristor array, the method further includes:
and charging and discharging the capacitor in the integrator according to a preset charging and discharging strategy so as to integrate the convolution result of the last convolution layer.
Optionally, performing 8-bit quantization over the preset range to obtain the quantization result includes:
sending the output voltage of the integrator to an 8-bit ADC and quantizing the convolution result to a plurality of levels within a preset voltage range.
An embodiment of a second aspect of the present application provides a hardware implementation apparatus for a neural network batch normalization layer, including:
the storage module is used for storing the weight parameters of the neural network into the memristor array in a conductance mode;
the obtaining module is used for obtaining a corresponding quantization result according to actual current flowing through each source line of the memristor array based on the convolution result of the last convolution layer; and
and the quantization module is used for sending the quantization result to the next convolutional layer so as to calculate the convolutional layer.
Optionally, the obtaining module is specifically configured to:
and 8bit quantization is carried out on the preset range to obtain the quantization result.
Optionally, the preset range is:

((0 − b)/k, (Z_max − b)/k)

wherein

k = γ/√(σ² + ε), b = β − γ·μ/√(σ² + ε),

Z_max is the upper limit of the batch normalization layer calculation, β, γ, σ, μ are parameters of the batch normalization layer, and ε is a small constant introduced to prevent the denominator from being zero.

Optionally, before obtaining the corresponding quantization result according to the actual current flowing through each source line of the memristor array, the obtaining module is further configured to:
and charging and discharging the capacitor in the integrator according to a preset charging and discharging strategy so as to integrate the convolution result of the last convolution layer.
Optionally, the obtaining module is specifically configured to:
and sending the output voltage of the integrator to an 8-bit ADC, and quantizing the convolution result to a plurality of levels in a preset voltage.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the neural network batch standardization layer hardware implementation method according to the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, the program being executed by a processor for implementing the neural network batch normalization layer hardware implementation method according to any one of claims 1 to 5.
Therefore, the weight parameters of the neural network can be stored into the memristor array in the form of conductance, the corresponding quantization result is obtained according to the actual current flowing through each source line of the memristor array on the basis of the convolution result of the previous convolution layer, and the quantization result is sent to the next convolution layer for convolution layer calculation. In this way, the ADC module commonly used in compute-in-memory tasks is implemented based on the memristor array, the BN layer calculation and the activation function module are realized through it, the extra overhead of the processor for BN layer calculation is saved, and the system energy efficiency is improved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart illustrating a convolutional layer with a BN layer according to an embodiment of the present disclosure;
FIG. 2 is a comparative illustration of BN layer calculation and ReLU function implemented in two ways;
FIG. 3 is a diagram illustrating the result of the Nth layer of convolution and the BN calculation result;
FIG. 4 is a flowchart of a hardware implementation method of a neural network batch normalization layer according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating an implementation of a convolutional layer, a BN layer, and a ReLU activation function according to an embodiment of the present application;
FIG. 6 is an exemplary diagram of a BN layer implementation with quantization modules and ReLU activation function calculation;
FIG. 7 is an exemplary diagram of an apparatus for a neural network batch normalization layer hardware implementation according to an embodiment of the present application;
fig. 8 is an exemplary diagram of an electronic device according to an embodiment of the application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A neural network batch normalization layer hardware implementation method, apparatus, device, and medium according to an embodiment of the present application are described below with reference to the accompanying drawings.
Before introducing the hardware implementation method of the neural network batch standardization layer according to the embodiments of the present application, the reason why 8-bit quantization is performed over the range ((0 − b)/k, (Z_max − b)/k), so that the original BN layer calculation and activation function steps can be skipped, will be described with reference to fig. 2 to 4.
As shown in fig. 2, fig. 2(a) is a flowchart of the calculation of the Nth layer of the neural network, and fig. 2(b) is a flowchart of the improved scheme that implements the BN layer calculation and the ReLU activation function.
Specifically, three modules, namely an Nth convolutional layer, a BN layer and a ReLU activation function in the neural network are taken as examples.
The convolution result of the Nth layer is:

Y_N = W_N · X_N

wherein Y_N is the convolution result of the Nth layer, W_N is the weight matrix obtained by flattening and splicing together the n convolution kernels, and X_N is the input vector of the Nth layer; Y_N is quantized by an 8-bit ADC.
The BN layer calculation formula is as follows:

Z_N = γ·(Y_N − μ)/√(σ² + ε) + β

The BN layer calculation step can therefore be viewed as a linear transformation function Z_N = k·Y_N + b, wherein

k = γ/√(σ² + ε), b = β − γ·μ/√(σ² + ε).
The calculation result of the BN layer needs to be sent to the ReLU activation function module:

X_{N+1} = ReLU(Z_N) = max(0, Z_N)

Suppose Z_N of the BN layer lies in the range (Z_min, Z_max) with Z_min < 0; then X_{N+1} lies within the range (0, Z_max). After the BN layer calculation, Z_N within the range (0, Z_max) needs to be quantized to 8 bits and sent to the ReLU activation function module; for example, Z_N = 0 corresponds to quantization result 0, and Z_N = Z_max corresponds to 255.
According to the conversion relationship between Z_N and Y_N above, Y_N lies in the range

((0 − b)/k, (Z_max − b)/k).
As can be seen from fig. 3, fig. 3(a) is a schematic diagram of the convolution layer output quantized to 8 bits (i.e. the result after the Nth layer convolution), and fig. 3(b) is a schematic diagram of quantizing the BN layer output value to 8 bits, as the improved network requires (i.e. the calculation result of the Nth BN layer). Therefore, to obtain the same quantization result as in fig. 3(b), the quantization range of Y_N needs to be corrected: assuming k > 0, the range (0, Z_max) corresponds to ((0 − b)/k, (Z_max − b)/k).
Therefore, the embodiments of the present application perform 8-bit quantization over the range ((0 − b)/k, (Z_max − b)/k), skipping the original BN layer calculation and activation function steps, and realize the BN layer calculation and the ReLU function in hardware through an improved output result quantization scheme.
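This range correction can be checked with a small numerical sketch. The BN parameter values γ, β, μ, σ and the limit Z_max below are hypothetical and chosen only for illustration, and k > 0 is assumed:

```python
import math

def bn(y, gamma, beta, mu, sigma, eps=1e-5):
    """Batch normalization: Z = gamma * (Y - mu) / sqrt(sigma^2 + eps) + beta."""
    return gamma * (y - mu) / math.sqrt(sigma ** 2 + eps) + beta

def quantize(x, lo, hi, levels=256):
    """Uniform 8-bit quantization of x over [lo, hi] to integer codes 0..levels-1."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * (levels - 1))

# Hypothetical BN parameters and range limit.
gamma, beta, mu, sigma, eps = 1.5, 0.3, 0.2, 2.0, 1e-5
k = gamma / math.sqrt(sigma ** 2 + eps)               # k > 0 here
b = beta - gamma * mu / math.sqrt(sigma ** 2 + eps)
z_max = 10.0

for y in [-5.0, 0.0, 3.0, 7.0, 15.0]:
    # Baseline pipeline: BN layer, ReLU, then 8-bit quantization over (0, Z_max).
    baseline = quantize(max(bn(y, gamma, beta, mu, sigma, eps), 0.0), 0.0, z_max)
    # Improved pipeline: quantize Y directly over ((0 - b)/k, (Z_max - b)/k).
    improved = quantize(y, (0.0 - b) / k, (z_max - b) / k)
    # The two codes agree (up to floating-point rounding at a code boundary).
    assert abs(baseline - improved) <= 1
```

Because Z_N = k·Y_N + b is monotone for k > 0, quantizing Y_N over the corrected range yields the same 8-bit codes as running BN, ReLU, and quantization explicitly.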
Specifically, fig. 4 is a schematic flowchart of a hardware implementation method of a neural network batch normalization layer according to an embodiment of the present disclosure.
As shown in fig. 4, the hardware implementation method of the neural network batch standardization layer includes the following steps:
in step S401, the weight parameters of the neural network are stored in the form of conductances into the memristor array.
In step S402, based on the convolution result of the last convolution layer, a corresponding quantization result is obtained according to the actual current flowing through each source line of the memristor array.
Optionally, in some embodiments, obtaining the corresponding quantization result according to the actual current flowing through each source line of the memristor array includes: performing 8-bit quantization over the preset range to obtain the quantization result.
Optionally, in some embodiments, the preset range is:

((0 − b)/k, (Z_max − b)/k)

wherein

b = β − γ·μ/√(σ² + ε), k = γ/√(σ² + ε),

and Z_max is the upper limit value of the batch normalization layer calculation result. β, γ, σ, μ are parameters of the batch normalization layer, where β and γ are trainable parameters that gradually converge during training, σ and μ are respectively the standard deviation and mean of the output values, determined by the training samples, and ε is a negligibly small constant introduced to prevent the denominator from being zero.
In some embodiments, performing 8-bit quantization over the preset range to obtain the quantization result includes: sending the output voltage of the integrator to an 8-bit ADC and quantizing the convolution result to a plurality of levels within a preset voltage range.
Optionally, in some embodiments, before obtaining the corresponding quantized result according to the actual current flowing through each source line of the memristor array, further includes: and charging and discharging the capacitor in the integrator according to a preset charging and discharging strategy so as to integrate the convolution result of the previous convolution layer.
Specifically, the weight parameters of the neural network can be stored in a 2T2R memristor array in the form of conductance, and the current value flowing through each source line needs to be converted into a voltage value and quantized. Taking the output quantization module of fig. 5 (comprising an integrator and an 8-bit ADC) as an example, the current-to-voltage module may convert the source line current into a voltage through a transimpedance amplifier or an integrator; the reference voltage of the integrator is 2.5 V. Integrating the convolution layer result means charging and discharging the capacitor C_integ in the integrator. Suppose that when Y_N takes one endpoint of its quantization range, (0 − b)/k or (Z_max − b)/k, the voltage on the capacitor drops to 2 V, and when it takes the other endpoint the capacitor integrates to 5 V. The output voltage of the integrator is then sent to an 8-bit ADC, and the result is quantized to 256 levels within 2-5 V. Compared with the implementation method shown in fig. 6, the subsequent BN layer and ReLU activation function calculation steps are no longer needed, which saves the processor the extra overhead of BN layer calculation and improves system energy efficiency.
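The integrator-plus-ADC chain above can be modeled as a linear map from the convolution result to a 2-5 V voltage followed by an 8-bit conversion. This is an idealized sketch: the linear current-to-voltage mapping and the saturation behavior are assumptions of the illustration, not details of the circuit:

```python
def integrator_adc(y, y_lo, y_hi, v_lo=2.0, v_hi=5.0, levels=256):
    """Idealized output quantization module: map a convolution result in
    [y_lo, y_hi] linearly to an integrator voltage in [v_lo, v_hi], then
    quantize the voltage with an 8-bit ADC to integer codes 0..levels-1."""
    y = min(max(y, y_lo), y_hi)  # results outside the range saturate
    v = v_lo + (y - y_lo) / (y_hi - y_lo) * (v_hi - v_lo)
    code = round((v - v_lo) / (v_hi - v_lo) * (levels - 1))
    return v, code

# Endpoints of a hypothetical Y_N range map to the 2 V and 5 V rails.
print(integrator_adc(0.0, 0.0, 10.0))   # -> (2.0, 0)
print(integrator_adc(10.0, 0.0, 10.0))  # -> (5.0, 255)
```

Feeding the ADC codes straight to the next layer is what replaces the explicit BN and ReLU steps.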
It should be noted that, for each convolution kernel in the convolution layer, the BN layer parameters k1, b1, k2, b2, ..., kn, bn corresponding to its output neuron are all different, so the value of each integration capacitor needs to be finely designed so that the subsequent voltage range falls between 2 V and 5 V, which enables multiplexing of the 8-bit ADC.
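Because every output column has its own (k_i, b_i), the per-column integration capacitor can be sized so that each column's full-scale current swing lands on the same 2-5 V window. A sketch of that sizing rule, assuming an ideal integrator where ΔV = ΔI·t/C (the numbers are hypothetical):

```python
def required_capacitance(i_min, i_max, t_int, v_swing=3.0):
    """Size C so that integrating source-line currents in [i_min, i_max] for
    t_int seconds sweeps the capacitor voltage across exactly v_swing volts:
    delta_V = delta_I * t / C  ->  C = (i_max - i_min) * t_int / v_swing."""
    return (i_max - i_min) * t_int / v_swing

# Two hypothetical columns with different current ranges can share one 8-bit
# ADC because each capacitor maps its own range onto the same 3 V (2-5 V) swing.
c1 = required_capacitance(0.0, 3e-6, 1e-6)   # 1.0e-12 F
c2 = required_capacitance(0.0, 6e-6, 1e-6)   # 2.0e-12 F
print(c1, c2)
```

A column with twice the current range simply gets twice the capacitance, so its integrator output still spans exactly the ADC's input window.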
In step S403, the quantization result is sent to the next convolutional layer for convolutional layer calculation.
Therefore, with this BN layer implementation scheme combined with the ADC, the BN layer calculation and ReLU activation function steps are no longer needed, and the quantization result can be directly sent to the next convolutional layer for calculation.
According to the neural network batch standardization layer hardware implementation method provided by the embodiment of the application, the weight parameters of the neural network can be stored in the memristor array in the form of conductance, the corresponding quantization result is obtained according to the actual current flowing through each source line of the memristor array on the basis of the convolution result of the previous convolution layer, and the quantization result is sent to the next convolution layer for convolution layer calculation. In this way, the ADC module commonly used in compute-in-memory tasks is implemented based on the memristor array, the BN layer calculation and the activation function module are realized through it, the extra overhead of the processor for BN layer calculation is saved, and the system energy efficiency is improved.
Next, a neural network batch normalization layer hardware implementation apparatus proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 7 is a block diagram illustrating an apparatus for implementing hardware of a neural network batch normalization layer according to an embodiment of the present application.
As shown in fig. 7, the apparatus 10 for implementing neural network batch normalization layer hardware includes: a storage module 100, an acquisition module 200 and a quantization module 300.
The storage module 100 is configured to store the weight parameters of the neural network in a conductance form into the memristor array;
the obtaining module 200 is configured to obtain a corresponding quantization result according to an actual current flowing through each source line of the memristor array based on a convolution result of the last convolution layer; and
the quantization module 300 is used to send the quantization result to the next convolutional layer for convolutional layer calculation.
Optionally, the obtaining module 200 is specifically configured to:
and 8bit quantization is carried out on the preset range to obtain a quantization result.
Optionally, the preset range is:

((0 − b)/k, (Z_max − b)/k)

wherein

k = γ/√(σ² + ε), b = β − γ·μ/√(σ² + ε),

Z_max is the upper limit of the batch normalization layer calculation, β, γ, σ, μ are parameters of the batch normalization layer, and ε is a small constant introduced to prevent the denominator from being zero.

Optionally, before obtaining the corresponding quantization result according to the actual current flowing through each source line of the memristor array, the obtaining module 200 is further configured to:
and charging and discharging the capacitor in the integrator according to a preset charging and discharging strategy so as to integrate the convolution result of the previous convolution layer.
Optionally, the obtaining module 200 is specifically configured to:
and (4) sending the output voltage of the integrator to an 8-bit ADC, and quantizing the convolution result to a plurality of levels in a preset voltage.
It should be noted that the foregoing explanation of the embodiment of the hardware implementation method for neural network batch standardization layer is also applicable to the hardware implementation device for neural network batch standardization layer of this embodiment, and is not repeated here.
According to the neural network batch standardization layer hardware implementation device provided by the embodiment of the application, the weight parameters of the neural network can be stored in the memristor array in the form of conductance, the corresponding quantization result is obtained according to the actual current flowing through each source line of the memristor array on the basis of the convolution result of the previous convolution layer, and the quantization result is sent to the next convolution layer for convolution layer calculation. In this way, the ADC module commonly used in compute-in-memory tasks is implemented based on the memristor array, the BN layer calculation and the activation function module are realized through it, the extra overhead of the processor for BN layer calculation is saved, and the system energy efficiency is improved.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 801, a processor 802, and a computer program stored on the memory 801 and executable on the processor 802.
The processor 802, when executing the program, implements the neural network batch normalization layer hardware implementation method provided in the above embodiments.
Further, the electronic device further includes:
a communication interface 803 for communicating between the memory 801 and the processor 802.
A memory 801 for storing computer programs operable on the processor 802.
The memory 801 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
If the memory 801, the processor 802 and the communication interface 803 are implemented independently, the communication interface 803, the memory 801 and the processor 802 may be connected to each other via a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but that does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 801, the processor 802, and the communication interface 803 are integrated on one chip, the memory 801, the processor 802, and the communication interface 803 may complete communication with each other through an internal interface.
The processor 802 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the neural network batch normalization layer hardware implementation method as above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples and the features of different embodiments or examples described in this specification without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (12)

1. A hardware implementation method for a neural network batch normalization layer, comprising the following steps:
storing weight parameters of the neural network, in the form of conductance values, in a memristor array;
obtaining a corresponding quantization result according to the actual current flowing through each source line of the memristor array, based on the convolution result of the previous convolutional layer; and
sending the quantization result to the next convolutional layer for convolutional-layer calculation.
2. The method of claim 1, wherein obtaining the corresponding quantization result according to the actual current flowing through each source line of the memristor array comprises:
performing 8-bit quantization over a preset range to obtain the quantization result.
3. The method of claim 2, wherein the preset range is:

$\left[-\dfrac{b}{k},\ \dfrac{Z_{max}-b}{k}\right]$

wherein

$k=\dfrac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}$, $b=\beta-\dfrac{\gamma\mu}{\sqrt{\sigma^{2}+\varepsilon}}$,

$Z_{max}$ is the upper limit of the batch normalization layer calculation, β, γ, σ, μ are parameters of the batch normalization layer, and ε is a small constant that prevents division by zero.
4. The method of claim 2, further comprising, before obtaining the corresponding quantization result according to the actual current flowing through each source line of the memristor array:
charging and discharging a capacitor in an integrator according to a preset charging and discharging strategy, so as to integrate the convolution result of the previous convolutional layer.
5. The method of claim 4, wherein performing 8-bit quantization over the preset range to obtain the quantization result comprises:
sending the output voltage of the integrator to an 8-bit ADC, which quantizes the convolution result to a plurality of levels within a preset voltage range.
6. A hardware implementation apparatus for a neural network batch normalization layer, comprising:
a storage module configured to store weight parameters of the neural network, in the form of conductance values, in a memristor array;
an obtaining module configured to obtain a corresponding quantization result according to the actual current flowing through each source line of the memristor array, based on the convolution result of the previous convolutional layer; and
a quantization module configured to send the quantization result to the next convolutional layer for convolutional-layer calculation.
7. The apparatus of claim 6, wherein the obtaining module is specifically configured to:
perform 8-bit quantization over a preset range to obtain the quantization result.
8. The apparatus of claim 7, wherein the preset range is:

$\left[-\dfrac{b}{k},\ \dfrac{Z_{max}-b}{k}\right]$

wherein

$b=\beta-\dfrac{\gamma\mu}{\sqrt{\sigma^{2}+\varepsilon}}$, $k=\dfrac{\gamma}{\sqrt{\sigma^{2}+\varepsilon}}$,

$Z_{max}$ is the upper limit of the batch normalization layer calculation, β, γ, σ, μ are parameters of the batch normalization layer, and ε is a small constant that prevents division by zero.
9. The apparatus of claim 7, wherein, before obtaining the corresponding quantization result according to the actual current flowing through each source line of the memristor array, the obtaining module is further configured to:
charge and discharge a capacitor in an integrator according to a preset charging and discharging strategy, so as to integrate the convolution result of the previous convolutional layer.
10. The apparatus of claim 9, wherein the obtaining module is specifically configured to:
send the output voltage of the integrator to an 8-bit ADC, which quantizes the convolution result to a plurality of levels within a preset voltage range.
11. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the neural network batch normalization layer hardware implementation method of any one of claims 1-5.
12. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the neural network batch normalization layer hardware implementation method of any one of claims 1-5.
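The method of claims 1-5 replaces an explicit batch normalization layer with a quantization step over a preset range. The following is an editor's numerical sketch of that idea, not text from the patent: batch normalization is the affine map z = k·x + b (with k = γ/√(σ²+ε) and b = β − γμ/√(σ²+ε)), so clipping z to [0, Z_max] is equivalent to clipping the raw convolution result x to a preset range, which an idealized 8-bit ADC then maps to 256 levels. All function and parameter names below are illustrative assumptions; the exact range formula stands in for the equation images referenced as Figure FDA placeholders in the claims.

```python
import numpy as np

def fold_bn(gamma, beta, mu, sigma, eps=1e-5):
    """Fold batch-norm parameters into a linear map z = k*x + b."""
    k = gamma / np.sqrt(sigma ** 2 + eps)
    b = beta - k * mu
    return k, b

def preset_range(gamma, beta, mu, sigma, z_max, eps=1e-5):
    """Input range whose batch-norm image is [0, z_max]."""
    k, b = fold_bn(gamma, beta, mu, sigma, eps)
    return (-b / k, (z_max - b) / k)

def quantize_8bit(x, lo, hi):
    """Uniform 8-bit quantization of x over [lo, hi] (idealized ADC)."""
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)

# Example (illustrative values): gamma=2, beta=1, mu=0, sigma=1, Z_max=6.
lo, hi = preset_range(2.0, 1.0, 0.0, 1.0, 6.0)
codes = quantize_8bit(np.array([lo, 0.0, hi]), lo, hi)
```

Values at or below the lower range bound map to code 0 and values at or above the upper bound map to code 255; the batch normalization arithmetic itself never has to be performed at inference time, which is what makes the scheme attractive for memristor hardware.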
CN202111601714.4A 2021-12-24 2021-12-24 Neural network batch standardization layer hardware implementation method, device, equipment and medium Pending CN114492773A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111601714.4A CN114492773A (en) 2021-12-24 2021-12-24 Neural network batch standardization layer hardware implementation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111601714.4A CN114492773A (en) 2021-12-24 2021-12-24 Neural network batch standardization layer hardware implementation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114492773A true CN114492773A (en) 2022-05-13

Family

ID=81497025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111601714.4A Pending CN114492773A (en) 2021-12-24 2021-12-24 Neural network batch standardization layer hardware implementation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114492773A (en)

Similar Documents

Publication Publication Date Title
CN112085186B (en) Method for determining quantization parameter of neural network and related product
CN108701250B (en) Data fixed-point method and device
US10853721B2 (en) Multiplier accumulator, network unit, and network apparatus
US11385863B2 (en) Adjustable precision for multi-stage compute processes
CN111723901A (en) Training method and device of neural network model
US11727277B2 (en) Method and apparatus for automatically producing an artificial neural network
CN112287968A (en) Image model training method, image processing method, chip, device and medium
WO2018119143A1 (en) Reference disturbance mitigation in successive approximation register analog to digtal converter
US20220236909A1 (en) Neural Network Computing Chip and Computing Method
CN113408715A (en) Fixed-point method and device for neural network
CN113157076B (en) Electronic equipment and power consumption control method
CN109649361B (en) Automobile electronic control brake gain adjusting method, system, equipment and storage medium
CN111027684A (en) Deep learning model quantification method and device, electronic equipment and storage medium
CN114492773A (en) Neural network batch standardization layer hardware implementation method, device, equipment and medium
CN112307850A (en) Neural network training method, lane line detection method, device and electronic equipment
CN112766397B (en) Classification network and implementation method and device thereof
KR20200054759A (en) Method and apparatus of quantization for weights of batch normalization layer
CN111383157A (en) Image processing method and device, vehicle-mounted operation platform, electronic equipment and system
CN115099396B (en) Full-weight mapping method and device based on memristor array
KR102155060B1 (en) Multi level memory device and its data sensing method
US20230058500A1 (en) Method and machine learning system to perform quantization of neural network
CN111783444B (en) Text vector generation method and device
CN112115825B (en) Quantification method, device, server and storage medium of neural network
CN114757348A (en) Model quantitative training method and device, storage medium and electronic equipment
CN113326914A (en) Neural network computing method and neural network computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination