
Neural network batch normalization layer hardware implementation method, device, equipment and medium

Info

Publication number
CN114462585A
CN114462585A (application CN202111616640.1A)
Authority
CN
China
Prior art keywords
neural network
layer
memristor array
matrix
memristor
Legal status: Pending
Application number
CN202111616640.1A
Other languages
Chinese (zh)
Inventor
高滨
周颖
唐建石
张清天
钱鹤
吴华强
Current Assignee
Tsinghua University
Beijing Superstring Academy of Memory Technology
Original Assignee
Tsinghua University
Beijing Superstring Academy of Memory Technology
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2022-05-10
Application filed by Tsinghua University and Beijing Superstring Academy of Memory Technology
Priority to CN202111616640.1A
Publication of CN114462585A

Classifications

    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/045: Combinations of networks
    • G06F7/50: Adding; subtracting
    • G06F7/52: Multiplying; dividing

Abstract

The application relates to the technical field of neural network computing, and in particular to a neural network batch normalization layer hardware implementation method, apparatus, device, and medium. The method comprises the following steps: in a neural network, determining the current convolution result of the neural network; generating a K matrix based on the current convolution result and obtaining a memristor array in a mapping relationship with the K matrix, where the conductance differences of the memristor array correspond to the parameters of the K matrix; and performing the BN layer calculation of the neural network with the memristor array to obtain the BN layer result. This solves the problem in the related art that BN layer calculation is only suitable for binary neural networks and not for higher-precision neural network hardware; implementing the BN layer calculation on the memristor array avoids shuttling data back and forth between the processor unit and the memristor array unit and improves system energy efficiency.

Description

Neural network batch normalization layer hardware implementation method, device, equipment and medium
Technical Field
The present application relates to the field of neural network computing technologies, and in particular to a method, apparatus, device, and medium for implementing a neural network batch normalization layer in hardware.
Background
In the related art, hardware implementations of the BN (Batch Normalization) layer take the following forms:
(1) In a binary neural network, the sign-bit decision and the BN layer calculation are merged. The convolutional layer output only needs to be compared with a threshold Y_TH: if it is larger, the final result is 1; if it is smaller, the result is -1;
(2) The BN layer parameters are fused into the convolution of the previous layer:

$$W_N' = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}} W_N, \qquad bias' = \beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \varepsilon}}$$

$$Z_N = X_N * W_N' + bias'$$

Taking $W_N'$ as an equivalent weight matrix and $bias'$ as an equivalent bias leaves the input and output unchanged. $W_N'$ and $bias'$ are mapped into the memristor array, and $[X_N, 1]$ is converted into voltage pulse signals that are input at the corresponding bit-line terminals. The current flowing through each source line is then the combined result of this layer's convolution and the BN layer (a NumPy sketch of this fusion follows this list);
(3) A lookup table stored in the memristor array. Exploiting the simple outputs of a binary neural network, the BN layer results corresponding to the different convolution results are stored in the memristor array as a lookup table;
(4) The BN layer calculation is implemented with a 16-bit adder and a multiplier unit.
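As context for approach (2), the following is a minimal NumPy sketch of the described weight/bias fusion, assuming per-output-channel BN parameters and an unrolled weight matrix with one kernel per row; all names, shapes, and values are illustrative assumptions rather than details from the patent.

```python
import numpy as np

def fold_bn_into_conv(W, gamma, beta, mu, sigma2, eps=1e-5):
    """Fuse BN parameters into an unrolled convolution weight matrix.

    W           : (n, d) matrix, one unrolled convolution kernel per row
    gamma, beta : (n,) trainable BN scale and shift
    mu, sigma2  : (n,) running mean and variance stored after training
    """
    scale = gamma / np.sqrt(sigma2 + eps)   # gamma / sqrt(sigma^2 + eps)
    W_fused = scale[:, None] * W            # equivalent weight matrix W'
    bias_fused = beta - scale * mu          # equivalent bias bias'
    return W_fused, bias_fused

# Check: W' @ x + bias' reproduces BN(conv(x)).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 9))             # four 3x3 kernels, unrolled
x = rng.standard_normal(9)
gamma, beta = np.full(4, 0.5), np.full(4, 0.2)
mu, sigma2 = np.full(4, 1.0), np.full(4, 0.24)

W_f, b_f = fold_bn_into_conv(W, gamma, beta, mu, sigma2)
y = W @ x
bn = gamma * (y - mu) / np.sqrt(sigma2 + 1e-5) + beta
assert np.allclose(W_f @ x + b_f, bn)
```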
However, these related-art approaches are only suitable for binary neural networks and do not extend to hardware implementations of higher-precision neural networks; moreover, BN layer computation is mainly carried out on a CPU (central processing unit) or GPU (graphics processing unit). As compute-in-memory technology matures, keeping the BN layer in a general-purpose processing unit means that data transport between the compute-in-memory module and the CPU/GPU will block further improvement of system energy efficiency, so a solution is urgently needed.
Summary
The application provides a neural network batch normalization layer hardware implementation method, apparatus, device, and medium, aiming to solve the problem in the related art that BN layer calculation is only suitable for binary neural networks and not for higher-precision neural network hardware.
An embodiment of a first aspect of the present application provides a hardware implementation method for a neural network batch normalization layer, including the following steps:
in a neural network, determining a current convolution result of the neural network;
generating a K matrix based on the current convolution result, and obtaining a memristor array in a mapping relation with the K matrix, wherein the conductance difference value of the memristor array corresponds to the parameters of the K matrix; and
and calculating the BN layer of the neural network by using the memristor array to obtain a calculation result of the BN layer.
Optionally, the obtaining of the memristor array in a mapping relationship with the K matrix includes:
and forming any two memristor units in the memristor array into a 2T2R unit to obtain the memristor array.
Optionally, source lines of any two memristor units are connected, and voltage pulse signals with the same amplitude and opposite polarities are applied to the bit lines.
Optionally, the method further comprises:
detecting an actual current flowing through the source line;
and obtaining, from the actual current, the product of the actual voltage and at least one conductance difference between the upper and lower conductances.
Optionally, the performing, with the memristor array, a BN layer calculation of the neural network includes:
converting the parameters in the K matrix into the at least one conductance difference value.
An embodiment of a second aspect of the present application provides a hardware implementation apparatus for a neural network batch normalization layer, including:
the determining module is used for determining a current convolution result of the neural network in the neural network;
the obtaining module is used for generating a K matrix based on the current convolution result and obtaining a memristor array in a mapping relation with the K matrix, wherein the conductance difference value of the memristor array corresponds to the parameters of the K matrix; and
the first calculation module is used for calculating the BN layer of the neural network by using the memristor array to obtain a calculation result of the BN layer.
Optionally, the obtaining module is specifically configured to:
and forming any two memristor units in the memristor array into a 2T2R unit to obtain the memristor array.
Optionally, source lines of any two memristor units are connected, and voltage pulse signals with the same amplitude and opposite polarities are applied to the bit lines.
Optionally, the apparatus further includes:
the detection module is used for detecting the actual current flowing through the source line;
and the second calculation module is used for obtaining, from the actual current, the product of the actual voltage and at least one conductance difference between the upper and lower conductances.
Optionally, the first calculating module is specifically configured to:
converting the parameters in the K matrix into the at least one conductance difference value.
An embodiment of a third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the neural network batch normalization layer hardware implementation method of the above embodiments.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program being executed by a processor to implement the neural network batch normalization layer hardware implementation method described above.
Therefore, in a neural network, the current convolution result of the network is determined, a K matrix is generated based on the current convolution result, a memristor array in a mapping relationship with the K matrix is obtained, and the BN layer calculation of the neural network is performed with the memristor array to obtain the BN layer result. The BN layer calculation is thereby converted into a matrix-vector multiplication whose parameters, like those of fully connected and convolutional layers, can be stored and computed in the memristor array. This solves the problem in the related art that BN layer calculation is only suitable for binary neural networks and not for higher-precision neural network hardware, avoids shuttling data back and forth between the processor unit and the memristor array unit, and improves system energy efficiency.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a neural network batch normalization layer hardware implementation method according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of a convolutional layer with a BN layer;
FIG. 3 is an example diagram of a convolution operation and a memristor array-based convolution operation according to one embodiment of the present application;
FIG. 4 is an exemplary diagram of BN layer calculations based on a memristor array in accordance with one embodiment of the present application;
FIG. 5 is an exemplary diagram of an apparatus for batch normalization layer hardware implementation of a neural network according to an embodiment of the present application;
FIG. 6 is an exemplary diagram of an electronic device according to an embodiment of the application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A neural network batch normalization layer hardware implementation method, apparatus, device, and medium according to embodiments of the present application are described below with reference to the accompanying drawings. In the method, the current convolution result of a neural network is determined, a K matrix is generated based on the current convolution result, a memristor array in a mapping relationship with the K matrix is obtained, and the BN layer calculation of the neural network is performed with the memristor array to obtain the BN layer result. The BN layer calculation is thereby converted into a matrix-vector multiplication whose parameters, like those of fully connected and convolutional layers, can be stored and computed in the memristor array. This solves the problem in the related art that BN layer calculation is only suitable for binary neural networks and not for higher-precision neural network hardware, avoids shuttling data back and forth between the processor unit and the memristor array unit, and improves system energy efficiency.
Specifically, FIG. 1 is a schematic flowchart of a neural network batch normalization layer hardware implementation method according to an embodiment of the present application.
As shown in FIG. 1, the neural network batch normalization layer hardware implementation method includes the following steps:
in step S101, in the neural network, a current convolution result of the neural network is determined.
It should be understood that, in a neural network, as the number of layers increases, the distribution of hidden-layer neuron outputs shifts toward the boundaries of the activation function's value interval, so the gradients of the lower layers vanish during training and overall convergence slows down. The BN layer is a common processing module in neural networks, typically placed before the ReLU activation function: it pulls the dispersed outputs back into an interval in which the activation function is sensitive to its input, which avoids vanishing gradients and reduces the difficulty of training the network, as shown in FIG. 2, where FIG. 2 is a schematic diagram of a convolutional layer with a BN layer.
For ease of understanding, the embodiments of the present application take a convolutional layer with BN calculation as an example. The parameters of the BN layer include β, γ, σ, and μ, where β and γ are trainable parameters that converge gradually during training, and σ and μ are the standard deviation and mean, respectively, of the output values over the training samples. After training, in the test phase, the BN layer parameters remain unchanged.
Suppose $X_N$ is the input vector of the Nth layer and $W_N$ is the weight matrix obtained by unrolling the n convolution kernels of the Nth convolutional layer, as shown in FIG. 3(b), where FIG. 3(b) is an exemplary diagram of a memristor-array-based convolution operation and FIG. 3(a) is an exemplary diagram of a convolution operation. Then:

$$Y_N = W_N * X_N$$
Suppose the output vector $Y_N$ of this layer becomes the vector $Z_N$ after BN processing; the calculation is:

$$Z_N = \gamma \cdot \frac{Y_N - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$$

Defining

$$k = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}, \qquad b = \beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \varepsilon}}$$

gives

$$Z_N = k \cdot Y_N + b$$

where ε is a small constant introduced to keep the denominator from being zero.
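As a sanity check on the k and b expressions above, the short sketch below compares them against PyTorch's BatchNorm1d in evaluation mode (running statistics frozen, matching the test phase described earlier). The concrete parameter values are arbitrary and assumed purely for illustration.

```python
import torch

# One channel with gamma=0.5, beta=0.2, mu=1.0, sigma^2=0.24.
bn = torch.nn.BatchNorm1d(1, eps=1e-5)
bn.weight.data.fill_(0.5)        # gamma
bn.bias.data.fill_(0.2)          # beta
bn.running_mean.fill_(1.0)       # mu
bn.running_var.fill_(0.24)       # sigma^2
bn.eval()                        # test phase: use stored statistics

k = 0.5 / (0.24 + 1e-5) ** 0.5   # k = gamma / sqrt(sigma^2 + eps)
b = 0.2 - k * 1.0                # b = beta - gamma * mu / sqrt(sigma^2 + eps)

y = torch.tensor([[2.0]])        # a convolution output Y_N
assert torch.allclose(bn(y), k * y + b)   # Z_N = k * Y_N + b
```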
Further, assume the Nth convolutional layer has n convolution kernels, corresponding to n output neurons $y_1, y_2, \ldots, y_n$:

$$Y_N = [y_1, y_2, \ldots, y_n]^T$$

In existing memristor-array-based compute-in-memory schemes, each of the n convolution kernels is unrolled into a row, and the kernels are mapped into n rows of the memristor array. The BN layer parameters corresponding to the output neurons are $k_1, b_1, k_2, b_2, \ldots, k_n, b_n$, and the BN layer calculation is:

$$z_i = k_i \cdot y_i + b_i, \quad i = 1, 2, \ldots, n$$

$$Z_N = [z_1, z_2, \ldots, z_n]^T$$
in step S102, a K matrix is generated based on the current convolution result, and a memristor array in a mapping relationship with the K matrix is obtained, where a conductance difference value of the memristor array corresponds to a parameter of the K matrix.
Optionally, in some embodiments, obtaining a memristor array in a mapping relationship with the K matrix includes: any two memristor units in the memristor array form a 2T2R unit, and the memristor array is obtained.
Optionally, in some embodiments, the source lines of any two memristor cells are connected, and voltage pulse signals of the same magnitude and opposite polarity are applied to the bit lines.
Specifically, the embodiment of the present application converts the parameters obtained in step S101 into the product of a matrix K and the vector $[Y_N, 1]^T$:

$$Z_N = K \cdot \begin{bmatrix} Y_N \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} k_1 & & & b_1 \\ & \ddots & & \vdots \\ & & k_n & b_n \end{bmatrix}$$

where K is an n × (n+1) matrix with the slopes $k_i$ on its diagonal, the offsets $b_i$ in its last column, and 0 everywhere else.
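The NumPy sketch below assembles such a K matrix and checks that $K \cdot [Y_N, 1]^T$ reproduces the per-neuron affine BN outputs $z_i = k_i y_i + b_i$; the numbers are made up for illustration.

```python
import numpy as np

def build_K(k, b):
    """n x (n+1) matrix: slopes k_i on the diagonal, offsets b_i in the last column."""
    n = len(k)
    K = np.zeros((n, n + 1))
    K[np.arange(n), np.arange(n)] = k
    K[:, -1] = b
    return K

k = np.array([1.2, 0.8, 1.5])            # per-neuron BN slopes
b = np.array([-0.3, 0.1, 0.05])          # per-neuron BN offsets
Y = np.array([0.5, 2.0, 1.0])            # convolution outputs y1..y3

K = build_K(k, b)
Z = K @ np.append(Y, 1.0)                # Z_N = K . [Y_N, 1]^T
assert np.allclose(Z, k * Y + b)         # matches z_i = k_i * y_i + b_i
```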
Further, the embodiment of the application maps the K matrix into the memristor array. For example, a 2T2R cell is formed from two memristor cells; since the source lines of the two cells are connected, voltage pulse signals of the same amplitude and opposite polarities are applied to their bit lines.
Optionally, in some embodiments, the neural network batch normalization layer hardware implementation method further includes: detecting the actual current flowing through the source line, and obtaining from the actual current the product of the actual voltage and at least one conductance difference between the upper and lower conductances.
That is, by Kirchhoff's current law and Ohm's law, the current flowing through the source line equals the product of the applied voltage and the difference between the upper and lower conductances.
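The sketch below models that differential readout for one source line under idealized linear devices; the conductance and voltage values are illustrative assumptions, not figures from the patent.

```python
import numpy as np

def source_line_current(V, G_pos, G_neg):
    """Ideal 2T2R readout of a single source line.

    V            : (m,) bit-line voltage amplitudes
    G_pos, G_neg : (m,) conductances of the paired devices on this source line
    By Ohm's law each cell contributes +V*G_pos or -V*G_neg (the paired bit
    lines carry pulses of opposite polarity); by Kirchhoff's current law the
    contributions sum on the shared source line.
    """
    return float(np.sum(V * G_pos) - np.sum(V * G_neg))

V = np.array([0.10, 0.20, 0.05])                # volts
G_pos = np.array([2.0e-6, 0.0, 1.0e-6])         # siemens
G_neg = np.array([0.5e-6, 1.0e-6, 0.0])

I = source_line_current(V, G_pos, G_neg)
# The summed current encodes the dot product with the signed weights G+ - G-.
assert np.isclose(I, V @ (G_pos - G_neg))
```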
In step S103, BN layer calculation of the neural network is performed using the memristor array, and a calculation result of the BN layer is obtained.
Optionally, in some embodiments, the BN layer computation of the neural network using the memristor array includes: the parameters in the K matrix are converted into at least one conductance difference value.
Specifically, the embodiment of the present application converts each parameter in the K matrix into the difference of two conductances; to map a 0 value of the matrix into the array, both conductances can be tuned to the high-resistance state.
In this way, the calculation of the BN layer is converted into a matrix-vector product, i.e., it is realized on the memristor array, which avoids shuttling data back and forth between the processor unit and the memristor array unit and improves system energy efficiency. It should be noted that, besides building the array from 2T2R memristor cells, other memory cells, such as SRAM and DRAM, may be used to implement the BN layer calculation in the same manner described above; details are not repeated here to avoid redundancy.
To facilitate further understanding of the hardware implementation method of the neural network batch normalization layer according to the embodiments of the present application, detailed descriptions are provided below with reference to specific embodiments.
As shown in FIG. 4, only the last column of the array in FIG. 4 holds the offsets b; the diagonal elements of the preceding columns hold the k value of each neuron (filled circles), and all other elements of the matrix are 0 (open circles).
During testing, the conductances in the array do not need to be changed. The result of the Nth-layer convolutional network is fed into the first bit lines (for example, 8-bit data is converted into 8 voltage pulse signals, one corresponding to each bit), and the offset b is mapped onto the last column, whose bit line receives the number of pulse signals corresponding to the constant input 1. For example, if the maximum convolution output $y_{max}$ is mapped to 255 pulse signals, then round(255/$y_{max}$) voltage pulse signals of the same amplitude and width are applied to the bit line of the offset column (round is a rounding function). The source lines of the memristor array are connected to ADCs, and the quantized values are the BN layer calculation results.
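To make the pulse-number encoding concrete, here is a sketch of one plausible reading of the scheme above: activations are encoded as counts of identical pulses on a 0-255 scale, and the offset column receives round(255/y_max) pulses so that it represents the constant input 1 on the same scale. The helper name and values are assumptions for illustration.

```python
import numpy as np

def encode_pulse_counts(y, y_max, levels=255):
    """Map activations in [0, y_max] to counts of identical voltage pulses."""
    return np.rint(np.clip(y, 0.0, y_max) / y_max * levels).astype(int)

y = np.array([0.5, 2.0, 1.0])             # convolution outputs of the Nth layer
y_max = 2.0                               # maximum output, mapped to 255 pulses

pulses = encode_pulse_counts(y, y_max)    # counts per bit line
bias_pulses = int(round(255 / y_max))     # constant '1' on the offset column
print(pulses, bias_pulses)                # [ 64 255 128] 128
```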
According to the neural network batch normalization layer hardware implementation method provided by the embodiments of the application, the current convolution result of a neural network can be determined, a K matrix generated based on the current convolution result, a memristor array in a mapping relationship with the K matrix obtained, and the BN layer calculation of the neural network performed with the memristor array to obtain the BN layer result. The BN layer calculation is thereby converted into a matrix-vector multiplication whose parameters, like those of fully connected and convolutional layers, can be stored and computed in the memristor array. This solves the problem in the related art that BN layer calculation is only suitable for binary neural networks and not for higher-precision neural network hardware, avoids shuttling data back and forth between the processor unit and the memristor array unit, and improves system energy efficiency.
Next, a neural network batch normalization layer hardware implementation apparatus proposed according to an embodiment of the present application is described with reference to the drawings.
FIG. 5 is a block diagram of a neural network batch normalization layer hardware implementation apparatus according to an embodiment of the present application.
As shown in fig. 5, the apparatus 10 for implementing neural network batch normalization layer hardware includes: a determination module 100, an acquisition module 200 and a first calculation module 300.
The determining module 100 is configured to determine, in the neural network, a current convolution result of the neural network;
the obtaining module 200 is configured to generate a K matrix based on a current convolution result, and obtain a memristor array in a mapping relationship with the K matrix, where a conductance difference of the memristor array corresponds to a parameter of the K matrix; and
the first calculation module 300 is configured to perform BN layer calculation of the neural network by using the memristor array, so as to obtain a calculation result of the BN layer.
Optionally, the obtaining module 200 is specifically configured to:
any two memristor units in the memristor array form a 2T2R unit, and the memristor array is obtained.
Optionally, source lines of any two memristor units are connected, and voltage pulse signals with the same amplitude and opposite polarities are applied to the bit lines.
Optionally, the apparatus further includes:
the detection module is used for detecting the actual current flowing through the source line;
and the second calculation module is used for obtaining, from the actual current, the product of the actual voltage and at least one conductance difference between the upper and lower conductances.
Optionally, the first calculating module 300 is specifically configured to:
the parameters in the K matrix are converted into at least one conductance difference value.
It should be noted that the foregoing explanation of the neural network batch normalization layer hardware implementation method embodiment also applies to the neural network batch normalization layer hardware implementation apparatus of this embodiment and is not repeated here.
According to the neural network batch normalization layer hardware implementation apparatus provided by the embodiments of the application, the current convolution result of a neural network can be determined, a K matrix generated based on the current convolution result, a memristor array in a mapping relationship with the K matrix obtained, and the BN layer calculation of the neural network performed with the memristor array to obtain the BN layer result. The BN layer calculation is thereby converted into a matrix-vector multiplication whose parameters, like those of fully connected and convolutional layers, can be stored and computed in the memristor array. This solves the problem in the related art that BN layer calculation is only suitable for binary neural networks and not for higher-precision neural network hardware, avoids shuttling data back and forth between the processor unit and the memristor array unit, and improves system energy efficiency.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 601, a processor 602, and a computer program stored on the memory 601 and executable on the processor 602.
The processor 602 executes the program to implement the neural network batch normalization layer hardware implementation method provided in the above embodiments.
Further, the electronic device further includes:
a communication interface 603 for communication between the memory 601 and the processor 602.
The memory 601 is used for storing computer programs that can be run on the processor 602.
The memory 601 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
If the memory 601, the processor 602 and the communication interface 603 are implemented independently, the communication interface 603, the memory 601 and the processor 602 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 601, the processor 602, and the communication interface 603 are integrated on a chip, the memory 601, the processor 602, and the communication interface 603 may complete mutual communication through an internal interface.
The processor 602 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the neural network batch normalization layer hardware implementation method as above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine different embodiments or examples, and features thereof, described in this specification, provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present application includes alternate implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be captured electronically, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be completed by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Although embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present application; variations, modifications, substitutions, and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (12)

1. A neural network batch normalization layer hardware implementation method, characterized by comprising the following steps:
in a neural network, determining a current convolution result of the neural network;
generating a K matrix based on the current convolution result, and obtaining a memristor array in a mapping relation with the K matrix, wherein the conductance difference value of the memristor array corresponds to the parameters of the K matrix; and
and calculating the BN layer of the neural network by using the memristor array to obtain a calculation result of the BN layer.
2. The method of claim 1, wherein the obtaining the memristor array in a mapping relationship with the K matrix comprises:
and forming any two memristor units in the memristor array into a 2T2R unit to obtain the memristor array.
3. The method of claim 2, wherein source lines of any two memristor cells are connected, and voltage pulse signals of the same magnitude and opposite polarity are applied to the bit lines.
4. The method of claim 1, further comprising:
detecting an actual current flowing through the source line;
and obtaining, from the actual current, the product of the actual voltage and at least one conductance difference between the upper and lower conductances.
5. The method of claim 4, wherein the utilizing the memristor array for BN layer calculations of the neural network comprises:
converting the parameters in the K matrix into the at least one conductance difference value.
6. A neural network batch normalization layer hardware implementation apparatus, characterized by comprising:
the determining module is used for determining a current convolution result of the neural network in the neural network;
the obtaining module is used for generating a K matrix based on the current convolution result and obtaining a memristor array in a mapping relation with the K matrix, wherein the conductance difference value of the memristor array corresponds to the parameters of the K matrix; and
the first calculation module is used for calculating the BN layer of the neural network by using the memristor array to obtain a calculation result of the BN layer.
7. The apparatus of claim 6, wherein the obtaining module is specifically configured to:
and forming any two memristor units in the memristor array into a 2T2R unit to obtain the memristor array.
8. The apparatus of claim 7, in which source lines of any two memristor cells are connected, and voltage pulse signals of the same magnitude and opposite polarities are applied to bit lines.
9. The apparatus of claim 6, further comprising:
the detection module is used for detecting the actual current flowing through the source line;
and the second calculation module is used for obtaining, from the actual current, the product of the actual voltage and at least one conductance difference between the upper and lower conductances.
10. The apparatus of claim 9, wherein the first computing module is specifically configured to:
converting the parameters in the K matrix into the at least one conductance difference value.
11. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the neural network batch normalization layer hardware implementation method of any one of claims 1-5.
12. A computer-readable storage medium, on which a computer program is stored, the program being executable by a processor for implementing the neural network batch normalization layer hardware implementation method according to any one of claims 1 to 5.
CN202111616640.1A 2021-12-27 2021-12-27 Neural network batch normalization layer hardware implementation method, device, equipment and medium Pending CN114462585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111616640.1A 2021-12-27 2021-12-27 CN114462585A (en) Neural network batch normalization layer hardware implementation method, device, equipment and medium


Publications (1)

Publication Number Publication Date
CN114462585A 2022-05-10

Family ID: 81408009



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination