CN113705784A - Neural network weight coding method based on matrix sharing and hardware system


Info

Publication number: CN113705784A
Application number: CN202110964903.1A
Authority: CN (China)
Prior art keywords: matrix, neural network, coding, weight, positive
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 虞致国, 孙一, 顾晓峰
Current assignee: Jiangnan University
Original assignee: Jiangnan University
Application filed by Jiangnan University
Priority to CN202110964903.1A
Publication of CN113705784A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The invention discloses a neural network weight coding method based on a shared matrix, together with a hardware system, and belongs to the technical field of hardware implementation of neural network algorithms. Aiming at the problems of current weight coding methods, namely a high demand for storage-computation devices, large storage-computation arrays and high cost, the method codes the weight parameters of neural network convolution kernels in pairs by means of a shared matrix. The coding method also has good compatibility and is not limited by the type of non-volatile memory device. Compared with existing coding methods, the shared-matrix coding method is better suited to large-scale and ultra-large-scale storage-computation architectures.

Description

Neural network weight coding method based on matrix sharing and hardware system
Technical Field
The invention relates to a neural network weight coding method based on matrix sharing and a hardware system, and belongs to the technical field of neural network algorithm hardware implementation.
Background
With the development of deep learning, neural networks are widely applied in fields such as image recognition, speech recognition and natural language processing. However, as network architectures become more complex, the amount of data transfer and computation in neural networks increases dramatically. This computation and data transfer also consumes considerable power, making neural network applications difficult to deploy on hardware devices.
In recent years, the storage-computation integrated (compute-in-memory) architecture for neural network computation has attracted wide attention and research. Its basic idea is to map the weights into a storage-computation array, so that logic computations that are simple but involve huge data volumes are performed inside the memory, reducing both the volume of data transferred between memory and processor and the distance over which it travels.
The operating principle of a classical non-volatile storage-computation array is shown in Fig. 1. To compute the product of the weight matrix W and an input matrix X, the values of W are stored in the storage-computation array in the form of conductances, the values of X are applied to the inputs of the array in the form of voltages, and the operation result is obtained from the outputs of the array in the form of currents, completing the whole storage-computation integrated process.
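This current-voltage principle can be sketched numerically; the conductance and voltage values below are illustrative only, and in a real array the same product is computed in the analog domain rather than with NumPy:

```python
import numpy as np

# Weights stored as conductances G; inputs applied as voltages V.
# Each output wire sums currents by Kirchhoff's current law, so the
# array produces I = G^T @ V, a matrix-vector product in one step.
G = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # conductance matrix (stores W)
V = np.array([0.5, 0.25])    # input voltages (stores X's column)

I = G.T @ V                  # output currents = analog dot products
print(I)                     # [1.25 2.  ]
```

The whole matrix product is obtained by applying the columns of X one after another as voltage vectors.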
The weight parameters in a neural network are usually both positive and negative, while stored conductances cannot be negative. To represent them, the storage-computation array usually adopts a pair of positive and negative matrices: the encoded values are stored in the positive and negative matrices respectively, the output results of the two matrices are subtracted, and each weight parameter is expressed as the difference. As shown in Fig. 2, the current encoding method processes each weight parameter independently and maps it into the positive and negative matrices, i.e. at least two storage-computation devices are required for every weight represented. In application scenarios that pursue high precision, one weight is represented by several positive-matrix devices and several negative-matrix devices, and the device overhead per weight parameter is even higher.
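A minimal sketch of this conventional differential coding, assuming the simplest split (positive part to the positive matrix, magnitude of the negative part to the negative matrix); the helper name `encode_pos_neg` is illustrative, not from the patent:

```python
import numpy as np

def encode_pos_neg(W):
    # Conventional per-weight coding: w = w_pos - w_neg, with both
    # parts non-negative so each can be stored as a conductance.
    # This costs two storage-computation devices per weight.
    W = np.asarray(W, dtype=float)
    return np.maximum(W, 0.0), np.maximum(-W, 0.0)

W = np.array([[0.5, -1.0],
              [2.0, -0.25]])
P, N = encode_pos_neg(W)
x = np.array([1.0, 2.0])
# The hardware subtracts the output currents of the two matrices:
y = P.T @ x - N.T @ x
assert np.allclose(y, W.T @ x)  # signed result recovered
```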
Nowadays a simple neural network can hardly meet task requirements, so network structures grow increasingly complex and network scale keeps expanding; the application scenarios of the storage-computation integrated architecture are likewise becoming broader and more complex, and implementing large-scale and ultra-large-scale neural networks in hardware is a necessary path for the architecture's development. As the depth and parameter count of neural networks increase, the number of devices and the array sizes required in hardware implementations inevitably grow. Large arrays bring a series of problems concerning cost, chip area, parasitic parameters and testability, and the difficulty of hardware implementation multiplies as the device count rises.
Disclosure of Invention
Aiming at the problems of existing neural network weight coding methods in neural network hardware implementation, namely the large number of storage-computation devices, the large size of the storage-computation array and the high cost, the invention provides a neural network weight parameter coding method based on matrix sharing and a hardware system.
The invention provides a neural network weight parameter coding method based on matrix sharing, characterized by comprising the following steps:
Step one: performing fixed-point processing on the weight parameters in the trained neural network convolution kernels;
Step two: grouping the convolution kernels of the neural network: when the number of convolution kernels is even, grouping them in pairs; when the number is odd, taking out one convolution kernel at random and grouping the remaining kernels in pairs;
Step three: calculating the actual coded values, comprising:
coding independently the weight parameters of the single convolution kernel taken out in step two, each weight parameter yielding two actual coded values that are mapped and stored into a positive matrix and a negative matrix;
for the pairwise-grouped convolution kernels of step two: coding in pairs the two weight parameters at the same position in the same group of convolution kernels, each pair of weight parameters yielding three actual coded values that are mapped and stored into three matrices;
Step four: splicing the matrices storing the actual coded values to obtain the coded neural network weight matrix.
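The kernel-grouping step can be sketched as follows (a hypothetical helper; the patent takes the odd kernel out at random, here simply the last one for reproducibility):

```python
import numpy as np

def group_kernels(kernels):
    # Step two: pair up convolution kernels; with an odd count, one
    # kernel is set aside to be coded independently (two devices per
    # weight), while each pair is coded jointly (three devices per pair).
    kernels = list(kernels)
    leftover = kernels.pop() if len(kernels) % 2 else None
    pairs = [(kernels[i], kernels[i + 1]) for i in range(0, len(kernels), 2)]
    return pairs, leftover

# 5 toy 3x3 kernels: two pairs plus one leftover kernel
kernels = [np.full((3, 3), k) for k in range(5)]
pairs, leftover = group_kernels(kernels)
assert len(pairs) == 2 and leftover is not None
```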
Optionally, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 positive matrix, the w2 positive matrix and the shared negative matrix. The pairwise coding process comprises:
S1: input the weight parameters w1 and w2;
S2: judge the sign of w1: when w1 ≥ 0, let p1 = w1 and n = 0; when w1 < 0, let p1 = 0 and n = -w1;
S3: let p2 = w2 + n;
S4: judge the sign of p2: when p2 ≥ 0, p1, p2 and n keep their original values unchanged; when p2 < 0, let p1 = p1 - p2, n = n - p2 and p2 = 0;
S5: output p1, p2 and n, the actual coded values of the weight parameters w1 and w2.
The weight parameter w1 is determined jointly by the two coded values p1 and n, i.e. w1 = p1 - n; the weight parameter w2 is determined jointly by the two coded values p2 and n, i.e. w2 = p2 - n, where n is the shared negative coded value. p1 is mapped and stored into the w1 positive matrix, p2 is mapped and stored into the w2 positive matrix, and n is mapped and stored into the shared negative matrix.
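Under the symbol choice w1 = p1 - n and w2 = p2 - n (the patent's own symbols appear only as equation images, so these names are illustrative), the shared-negative coding flow can be sketched as:

```python
def encode_shared_negative(w1, w2):
    # Pairwise coding with a shared negative value: returns (p1, p2, n)
    # with w1 = p1 - n and w2 = p2 - n, all three values non-negative,
    # i.e. 3 storage-computation devices for 2 weights.
    if w1 >= 0:              # S2: code w1 first
        p1, n = w1, 0
    else:
        p1, n = 0, -w1
    p2 = w2 + n              # S3: code w2 against the shared negative value
    if p2 < 0:               # S4: shift the whole pair so p2 is non-negative
        p1, n, p2 = p1 - p2, n - p2, 0
    return p1, p2, n         # S5: output

# Values from the embodiment (kernels [5,15,-7,-9] and [12,-3,-7,11]):
for w1, w2 in [(5, 12), (15, -3), (-7, -7), (-9, 11)]:
    p1, p2, n = encode_shared_negative(w1, w2)
    assert (p1 - n, p2 - n) == (w1, w2) and min(p1, p2, n) >= 0
```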
Optionally, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 negative matrix, the w2 negative matrix and the shared positive matrix. The pairwise coding process comprises:
S1: input the weight parameters w1 and w2;
S2: judge the sign of w1: when w1 ≥ 0, let p = w1 and n1 = 0; when w1 < 0, let p = 0 and n1 = -w1;
S3: let n2 = p - w2;
S4: judge the sign of n2: when n2 ≥ 0, p, n1 and n2 keep their original values unchanged; when n2 < 0, let p = p - n2, n1 = n1 - n2 and n2 = 0;
S5: output p, n1 and n2, the actual coded values of the weight parameters w1 and w2.
The weight parameter w1 is determined jointly by the two coded values p and n1, i.e. w1 = p - n1; the weight parameter w2 is determined jointly by the two coded values p and n2, i.e. w2 = p - n2, where p is the shared positive coded value. n1 is mapped and stored into the w1 negative matrix, n2 is mapped and stored into the w2 negative matrix, and p is mapped and stored into the shared positive matrix.
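The dual shared-positive variant, with w1 = p - n1 and w2 = p - n2 (again, symbol names are illustrative reconstructions, since the original formulas are images), can be sketched in the same way:

```python
def encode_shared_positive(w1, w2):
    # Dual scheme: one shared positive value p and two negative values,
    # with w1 = p - n1 and w2 = p - n2, all outputs non-negative.
    if w1 >= 0:              # code w1 first
        p, n1 = w1, 0
    else:
        p, n1 = 0, -w1
    n2 = p - w2              # code w2 against the shared positive value
    if n2 < 0:               # raise the shared value so all entries stay >= 0
        p, n1, n2 = p - n2, n1 - n2, 0
    return p, n1, n2

for w1, w2 in [(5, 12), (15, -3), (-7, -7), (-9, 11)]:
    p, n1, n2 = encode_shared_positive(w1, w2)
    assert (p - n1, p - n2) == (w1, w2) and min(p, n1, n2) >= 0
```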
A second object of the present invention is to provide a neural network chip, which comprises a storage-computation array composed of a plurality of storage-computation devices, wherein each weight value in the neural network weight matrix is written one by one into the corresponding storage-computation device, and the neural network weight matrix is obtained with the neural network weight parameter coding method described above.
Optionally, the storage device is a non-volatile storage device.
Optionally, the precision of the storage device is 8 bits.
Optionally, the precision of the storage device is 4 bits.
A third object of the present invention is to provide a computing apparatus comprising a memory and a processor, the memory storing computer-executable instructions, characterized in that, when the instructions are executed by the processor, the neural network weight parameter coding method described above is performed.
A fourth object of the present invention is to provide a neural network hardware system, comprising: the invention provides a neural network chip and/or a computing device.
The invention also provides an application of the neural network weight parameter coding method and/or the neural network chip and/or the computing device and/or the neural network hardware system in the technical field of neural networks.
The invention has the beneficial effects that:
when the hardware of the neural network is implemented, the convolution kernels in the neural network are grouped in pairs, and the weight parameters at the same positions of the convolution kernels are coded in pairs in a matrix sharing mode, and the two weight parameters are coded and then only need three storage devices to implement the calculation, so that the number of the storage devices required by the calculation array is reduced; meanwhile, the coding method is not limited by the type of a nonvolatile memory device, and has good compatibility. Compared with the existing method of independently coding each weight parameter and storing the coded value into 2 storage devices, the method only uses 1.5 storage devices at least on average for each weight parameter, and saves about 1/4 device overhead for the storage array.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of the operation of a non-volatile memory array.
Fig. 2 is a schematic diagram of a storage array structure of the current encoding method.
FIG. 3 is a schematic diagram of a shared device structure according to the present invention.
Fig. 4 shows a schematic diagram of the whole process of mapping from algorithm to storage array according to the present invention, wherein (a) is algorithm encoding process and (b) is process of writing encoded value into storage array.
Fig. 5 is a flow chart of an implementation of the encoding method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The coding method provided by the invention is oriented to the weight parameters in a neural network. In practical application, the neural network is first trained to obtain the final trained convolution kernels, and the weight parameters in the convolution kernels are then coded with the method provided by the invention.
The first embodiment is as follows:
the embodiment provides a neural network coding method based on matrix sharing, which is applied to a neural network, and the coding process comprises the following steps:
the method comprises the following steps: performing fixed-point processing on the weight parameters in the trained neural network convolution kernel, wherein the fixed-point processing method in the embodiment is a linear transformation mode;
table 1 shows a convolution kernel weight parameter matrix after the dotting processing in this embodiment;
TABLE 1 weight matrix after spotting
Figure BDA0003221333800000051
Step two: grouping the convolution kernels processed in step one and rearranging them, so that the weight parameters can be processed in pairs;
Table 2 is the matrix rearranged after grouping;
TABLE 2 Rearranged matrix
Convolution kernel 1 (rearranged):   5   15   -7   -9
Convolution kernel 2 (rearranged):  12   -3   -7   11
Step three: calculating the actual coded values; in this embodiment a shared negative matrix is chosen, and the rearranged matrix is coded according to the coding flowchart of Fig. 5.
The weight parameters in the upper and lower rows are processed in pairs, each pair sharing one storage-computation device for the primary coding, giving the primary coding matrix of Table 3.
TABLE 3 Primary coding matrix
Convolution kernel 1 positive matrix:   5   15    0    0
Shared negative matrix:                 0    0    7    9
Convolution kernel 2 positive matrix:  12   -3    0   20
The primary coding matrix is then checked: the physical quantities in an actual storage-computation array cannot be negative, so when a coded value is negative, an offset is introduced into that group of codes while the coded values of the other groups are unchanged, yielding the actual coding matrix of Table 4.
TABLE 4 Actual coding matrix
Convolution kernel 1 positive matrix:   5   18    0    0
Shared negative matrix:                 0    3    7    9
Convolution kernel 2 positive matrix:  12    0    0   20
Step four: splicing the actual coding matrices of Table 4 to obtain the coded neural network weight matrix.
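The worked example of Tables 2 to 4 can be reproduced with a short script applying the shared-negative coding flow described above:

```python
# Rearranged weight rows from Table 2
k1 = [5, 15, -7, -9]   # convolution kernel 1
k2 = [12, -3, -7, 11]  # convolution kernel 2

pos1, shared_neg, pos2 = [], [], []
for w1, w2 in zip(k1, k2):
    p1, n = (w1, 0) if w1 >= 0 else (0, -w1)  # primary coding of w1
    p2 = w2 + n                               # code w2 against shared n
    if p2 < 0:                                # offset: no stored value may be negative
        p1, n, p2 = p1 - p2, n - p2, 0
    pos1.append(p1); shared_neg.append(n); pos2.append(p2)

# Matches the actual coding matrix of Table 4
assert pos1 == [5, 18, 0, 0]
assert shared_neg == [0, 3, 7, 9]
assert pos2 == [12, 0, 0, 20]
```

Note that per column w1 = p1 - n and w2 = p2 - n still hold, so subtracting the shared negative row recovers both original kernels.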
To verify the effectiveness of the coding method and its low device overhead, a series of experiments were carried out. In the experiments the coding method was applied to different neural networks, verified with the MNIST handwritten digit dataset, and the inference accuracy after coding and the device overhead required by the storage-computation array in hardware implementation were observed.
The first experiment uses a small-scale neural network comprising 3 convolutional layers and 1 fully-connected layer. In the first convolutional layer the convolution kernel size is 5 × 1 × 7, without padding; in the second convolutional layer the kernel size is 3 × 7, with padding; in the third convolutional layer the kernel size is 3 × 7, with padding; the fourth layer is a fully-connected layer with 63 input neurons and 10 output neurons.
The coding method of the invention was then applied to a deeper, larger neural network of 8 layers: 5 convolutional layers and 3 fully-connected layers. In the first convolutional layer the kernel size is 7 × 1 × 16, without padding; in the second, 5 × 16 × 32, with padding; in the third, 3 × 32 × 128, with padding; in the fourth, 3 × 128, with padding; in the fifth, 3 × 128 × 64, with padding. The last three layers are fully-connected classification layers with 1024, 512 and 10 neurons respectively.
The results are summarized in Table 5 below. Applying the coding method to the two types of neural networks verifies its effectiveness, and the comparison between the two networks shows how the optimization effect of the coding method scales with network size.
TABLE 5 Summary of experimental results
Network name                            Neural network 1   Neural network 2
Number of layers                        5                  8
Number of parameters                    1,718              1,065,098
Accuracy                                96.71%             98.94%
Devices required, conventional coding   3,436              2,130,196
Devices required, proposed coding       2,639              1,595,328
Overhead saved                          23.2%              25.0%
The experimental results in Table 5 show that after applying the coding method of the invention, the target networks still perform well in accuracy, verifying the validity of the coding method. Comparing the number of storage-computation devices required before and after applying the method shows that it markedly reduces device overhead. The coding method can therefore, while preserving network accuracy, relieve the need for large numbers of circuit devices in neural network hardware implementation, effectively reducing development cost and design difficulty.
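The savings in Table 5 can be sanity-checked from the parameter counts, assuming 2 devices per weight for the conventional per-weight coding (the second network's figure comes out at about 25%):

```python
# Device counts: conventional coding needs 2 devices per weight; the
# shared-matrix coding needs 3 per pair (1.5 per weight) plus 2 per
# weight for any unpaired kernel, so savings approach 25%.
results = {}
for n_params, devices_with_coding in [(1718, 2639), (1065098, 1595328)]:
    conventional = 2 * n_params
    saving = 1 - devices_with_coding / conventional
    results[n_params] = saving
    print(f"{n_params} params: {saving:.1%} of device overhead saved")
```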
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A neural network weight parameter coding method based on matrix sharing, characterized by comprising the following steps:
step one: performing fixed-point processing on the weight parameters in the trained neural network convolution kernels;
step two: grouping the convolution kernels of the neural network: when the number of convolution kernels is even, grouping them in pairs; when the number is odd, taking out one convolution kernel at random and grouping the remaining kernels in pairs;
step three: calculating the actual coded values, comprising:
coding independently the weight parameters of the single convolution kernel taken out in step two, each weight parameter yielding two actual coded values that are mapped and stored into a positive matrix and a negative matrix;
for the pairwise-grouped convolution kernels of step two: coding in pairs the two weight parameters at the same position in the same group of convolution kernels, each pair of weight parameters yielding three actual coded values that are mapped and stored into three matrices;
step four: splicing the matrices storing the actual coded values to obtain the coded neural network weight matrix.
2. The method of claim 1, characterized in that, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 positive matrix, the w2 positive matrix and the shared negative matrix; the pairwise coding process comprises:
S1: inputting the weight parameters w1 and w2;
S2: judging the sign of w1: when w1 ≥ 0, letting p1 = w1 and n = 0; when w1 < 0, letting p1 = 0 and n = -w1;
S3: letting p2 = w2 + n;
S4: judging the sign of p2: when p2 ≥ 0, p1, p2 and n keep their original values unchanged; when p2 < 0, letting p1 = p1 - p2, n = n - p2 and p2 = 0;
S5: outputting p1, p2 and n as the actual coded values of the weight parameters w1 and w2;
wherein the weight parameter w1 is determined jointly by the two coded values p1 and n, i.e. w1 = p1 - n, and the weight parameter w2 is determined jointly by the two coded values p2 and n, i.e. w2 = p2 - n, n being the shared negative coded value; p1 is mapped and stored into the w1 positive matrix, p2 is mapped and stored into the w2 positive matrix, and n is mapped and stored into the shared negative matrix.
3. The method of claim 1, characterized in that, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 negative matrix, the w2 negative matrix and the shared positive matrix; the pairwise coding process comprises:
S1: inputting the weight parameters w1 and w2;
S2: judging the sign of w1: when w1 ≥ 0, letting p = w1 and n1 = 0; when w1 < 0, letting p = 0 and n1 = -w1;
S3: letting n2 = p - w2;
S4: judging the sign of n2: when n2 ≥ 0, p, n1 and n2 keep their original values unchanged; when n2 < 0, letting p = p - n2, n1 = n1 - n2 and n2 = 0;
S5: outputting p, n1 and n2 as the actual coded values of the weight parameters w1 and w2;
wherein the weight parameter w1 is determined jointly by the two coded values p and n1, i.e. w1 = p - n1, and the weight parameter w2 is determined jointly by the two coded values p and n2, i.e. w2 = p - n2, p being the shared positive coded value; n1 is mapped and stored into the w1 negative matrix, n2 is mapped and stored into the w2 negative matrix, and p is mapped and stored into the shared positive matrix.
4. A neural network chip comprising a storage-computation array formed by a plurality of storage-computation devices, each weight value in a neural network weight matrix being written one by one into the corresponding storage-computation device, characterized in that the neural network weight matrix is obtained by using the neural network weight parameter coding method according to any one of claims 1 to 3.
5. The chip of claim 4, wherein the memory device is a non-volatile memory device.
6. The chip of claim 5, wherein the precision of the memory device is 8 bits.
7. The chip of claim 5, wherein the precision of the memory device is 4 bits.
8. A computing device comprising a memory and a processor, the memory having stored thereon computer-executable instructions, wherein the instructions, when executed by the processor, perform the neural network weight parameter encoding method of any one of claims 1-3.
9. A neural network hardware system, comprising: the neural network chip of any one of claims 4-7 and/or the computing device of claim 8.
10. Use of the neural network weight parameter coding method according to any one of claims 1 to 3 and/or the neural network chip according to any one of claims 4 to 7 and/or the computing device according to claim 8 and/or the neural network hardware system according to claim 9 in the field of neural network technology.
CN202110964903.1A 2021-08-20 2021-08-20 Neural network weight coding method based on matrix sharing and hardware system Pending CN113705784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110964903.1A CN113705784A (en) 2021-08-20 2021-08-20 Neural network weight coding method based on matrix sharing and hardware system


Publications (1)

Publication Number Publication Date
CN113705784A true CN113705784A (en) 2021-11-26

Family

ID=78653841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110964903.1A Pending CN113705784A (en) 2021-08-20 2021-08-20 Neural network weight coding method based on matrix sharing and hardware system

Country Status (1)

Country Link
CN (1) CN113705784A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306811A (en) * 2023-02-28 2023-06-23 苏州亿铸智能科技有限公司 Weight distribution method for deploying neural network for ReRAM

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170200078A1 (en) * 2014-08-28 2017-07-13 Commissariat A L'energie Atomique Et Aux Energies Alternatives Convolutional neural network
CN107612555A (en) * 2017-10-12 2018-01-19 Jiangnan University An improved sparsity-adaptive matching pursuit algorithm based on the bisection method
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN109993297A (en) * 2019-04-02 2019-07-09 Nanjing Jixiang Sensing Imaging Technology Research Institute Co., Ltd. A load-balanced sparse convolutional neural network accelerator and its acceleration method
CN111242180A (en) * 2020-01-03 2020-06-05 Nanjing University of Posts and Telecommunications Image recognition method and system based on a lightweight convolutional neural network
CN112734025A (en) * 2019-10-28 2021-04-30 Fudan University Neural network parameter sparsification method based on fixed-basis regularization
CN112836757A (en) * 2021-02-09 2021-05-25 Southeast University Parameter sharing method within convolution kernels of a deep learning network
CN112990454A (en) * 2021-02-01 2021-06-18 State Grid Anhui Electric Power Co., Ltd. Maintenance Branch Neural network computation acceleration method and device based on integrated DPU multi-core heterogeneity
US20210232897A1 (en) * 2016-04-27 2021-07-29 Commissariat A L'energie Atomique Et Aux Energies Alternatives Device and method for calculating convolution in a convolutional neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306811A (en) * 2023-02-28 2023-06-23 Suzhou Yizhu Intelligent Technology Co., Ltd. Weight distribution method for deploying a neural network on ReRAM
CN116306811B (en) * 2023-02-28 2023-10-27 Suzhou Yizhu Intelligent Technology Co., Ltd. Weight distribution method for deploying a neural network on ReRAM

Similar Documents

Publication Publication Date Title
US11580377B2 (en) Method and device for optimizing neural network
CN107516129B (en) Deep network compression method based on dimension-adaptive Tucker decomposition
WO2022037257A1 (en) Convolution calculation engine, artificial intelligence chip, and data processing method
CN108229671B (en) System and method for reducing storage bandwidth requirement of external data of accelerator
CN109840585B (en) Sparse two-dimensional convolution-oriented operation method and system
CN109993293B (en) Deep learning accelerator suitable for stacked hourglass networks
CN107633297A (en) A convolutional neural network hardware accelerator based on the parallel fast FIR filter algorithm
CN109791628A (en) Neural network model splitting method, training method, computing device and system
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN112286864B (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
CN113421187B (en) Super-resolution reconstruction method, system, storage medium and equipment
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
CN113741858A (en) In-memory multiply-add calculation method, device, chip and calculation equipment
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
TW202009799A (en) Memory-adaptive processing method for convolutional neural network and system thereof
CN115859011A (en) Matrix operation method, device and unit, and electronic equipment
Okubo et al. A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE
CN113988279A (en) Output current reading method and system of storage array supporting negative value excitation
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm
KR102154834B1 (en) In-DRAM Bitwise Convolution Circuit for Low Power and Fast Computation
CN114819167A (en) Sparse approximate inverse quantum preprocessing method and device for sparse linear system
CN104123372B (en) A CUDA-based clustering method and device
CN117313803B (en) Sliding window 2D convolution computing method based on RISC-V vector processor architecture
CN116055003B (en) Optimal data transmission method, device, computer equipment and storage medium
CN111507178B (en) Data processing optimization method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination