WO2019127363A1 - Weight coding method for neural network, computing apparatus, and hardware system - Google Patents

Weight coding method for neural network, computing apparatus, and hardware system Download PDF

Info

Publication number
WO2019127363A1
WO2019127363A1 · PCT/CN2017/119821 · CN2017119821W
Authority
WO
WIPO (PCT)
Prior art keywords
weight
matrix
splicing
analog circuit
training
Prior art date
Application number
PCT/CN2017/119821
Other languages
French (fr)
Chinese (zh)
Inventor
张悠慧
季宇
张优扬
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学 filed Critical 清华大学
Priority to CN201780042640.0A priority Critical patent/CN109791626B/en
Priority to PCT/CN2017/119821 priority patent/WO2019127363A1/en
Publication of WO2019127363A1 publication Critical patent/WO2019127363A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A non-splicing weight coding method for a neural network, comprising: a weight fixed-point conversion step of converting each matrix element of a weight matrix into a first number having a predetermined number of bits (S210); an error introduction step of introducing noise having a predetermined standard deviation into the first number to obtain a second number (S220); and a training step of training the weight matrix represented by the second numbers until convergence and then writing the training result, as the final weight matrix, into single analog circuit devices each representing one matrix element (S230), wherein a single matrix element is represented by a single analog circuit device rather than by a splicing of multiple analog circuit devices. The coding method can greatly reduce resource consumption without affecting accuracy, thereby saving resource overhead so that a large-scale neural network can be arranged with limited resources.

Description

Neural network weight coding method, computing device and hardware system

Technical field
The present invention relates generally to the field of neural network technology, and more particularly to a weight coding method, a computing device and a hardware system for a neural network.
Background
As Moore's law gradually loses effect and the progress of existing chip processes slows, attention has turned to new applications and new devices. In recent years, neural network (NN) computing has made breakthrough progress and achieved high accuracy in many fields such as image recognition, speech recognition and natural language processing. However, neural networks require massive computing resources; general-purpose processors can no longer meet the computational needs of deep learning, and designing dedicated chips has become an important development direction. At the same time, the emergence of the memristor provides an efficient solution for neural network chip design. Memristors offer high density, non-volatility, low power consumption, integration of storage and computation, and easy 3D stacking; in neural network computation their adjustable resistance can serve as a programmable weight, and their in-memory computing capability can be exploited to build high-speed multiply-accumulate units.
A neural network is composed of neurons, with a large number of neurons interconnected into a network. The connections between neurons can be viewed as directed edges with weights: the output of a neuron is weighted by the connections and passed to the neurons it connects to, and all the inputs received by a neuron are accumulated for further processing to produce that neuron's output. Neural networks are usually modeled by grouping neurons into layers and connecting the layers to one another. Figure 1 shows a chain-like neural network, in which each circle represents a neuron and each arrow represents a connection between neurons; every connection has a weight. The structure of an actual neural network is not limited to a chain-like topology.
The core computation of a neural network is matrix-vector multiplication. The output produced by a layer L_n containing n neurons can be represented by a vector V_n of length n. If this layer is fully connected to a layer L_m containing m neurons, the connection weights can be expressed as a matrix M_{n×m} with n rows and m columns, each matrix element representing the weight of one connection. The weighted input to L_m is then M_{n×m}·V_n, and this matrix-vector multiplication is the core computation of a neural network.
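As a purely illustrative sketch (NumPy, with assumed layer sizes and values), the weighted input to L_m can be computed as follows; the output vector V_n is placed on the left of the product so that the array shapes conform:

```python
import numpy as np

# Assumed sizes: layer L_n has n = 3 neurons, layer L_m has m = 2 neurons.
n, m = 3, 2
V_n = np.array([0.5, -1.0, 0.25])                    # output vector of layer L_n
M = np.arange(1.0, n * m + 1).reshape(n, m) / 10.0   # weight matrix, n rows x m columns

# Weighted input passed on to layer L_m (the core matrix-vector multiplication).
input_to_Lm = V_n @ M                                # shape (m,)
print(input_to_Lm)
```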
Because the amount of computation in matrix-vector multiplication is very large, performing many matrix multiplications on existing general-purpose processors takes a great deal of time, so accelerating matrix multiplication is the main design goal of neural network accelerator chips. A memristor crossbar array is well suited to this task. A set of input voltages V is applied to the rows; each voltage is multiplied by the memristor conductance G and the resulting currents are summed, and the output current is multiplied by the grounding resistance Rs to obtain the output voltage V'. The whole process is carried out in an analog circuit and has the advantages of high speed and small area.
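A minimal numerical sketch of this crossbar computation, assuming ideal (noise-free) devices; the conductances, input voltages and grounding resistance below are assumed values chosen only for illustration:

```python
import numpy as np

V = np.array([0.10, 0.15])             # input voltages applied to the rows
G = np.array([[2.0e-5, 3.5e-5],        # memristor conductances in siemens,
              [1.0e-5, 4.0e-5]])       # one column per output line
Rs = 1.0e3                             # grounding (sense) resistance in ohms, assumed

I_out = V @ G                          # column currents: sum of voltage x conductance
V_out = I_out * Rs                     # output voltages V' = I * Rs
print(I_out, V_out)
```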
However, memristor-based chip computation also suffers from low precision, large disturbances, high digital-to-analog/analog-to-digital conversion overhead and limited matrix size. Moreover, although a memristor array can perform matrix-vector multiplication efficiently, the multiplication is implemented in an analog circuit, so noise and disturbance are unavoidable; compared with the ideal neural network computation, the memristor results are therefore inaccurate.
Owing to process limitations of the memristor, representing a weight with a memristor carries a certain error. As shown in Figure 3, the weight distributions of different levels overlap. To avoid this overlap, existing methods generally splice several low-precision memristors to represent one high-precision weight; when each individual memristor has very low precision, the weight data can be regarded as accurate. Taking the representation of a 4-bit weight with two 2-bit memristors as an example, one 2-bit memristor represents the low 2 bits of the weight and the other represents the high 2 bits.
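To make the splicing scheme concrete, the following sketch splits an example 4-bit weight into the two 2-bit values that would be written to the high-order and low-order devices:

```python
w = 13                         # an example 4-bit weight, in the range 0..15
high = w >> 2                  # high 2 bits, written to the first 2-bit memristor (3 here)
low = w & 0b11                 # low 2 bits, written to the second 2-bit memristor (1 here)
assert w == 4 * high + low     # reconstruction rule: weight = 4*high + low
print(high, low)
```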
The existing ISAAC technique first trains a neural network with floating-point numbers and then 'writes' the weight data into the memristors. ISAAC uses four 2-bit memristor devices to represent one 8-bit weight, so that more resources can be used to improve the accuracy of the matrix operation.
ISAAC thus represents weights by splicing, which is relatively inefficient and consumes many resources: representing a single weight requires four memristor devices.
Similar to ISAAC, the existing PRIME technique also first trains a neural network with floating-point numbers, then uses two 3-bit-precision input voltages to represent one 6-bit input and two 4-bit memristor devices to represent one 8-bit weight, with the positive and negative weights held in two separate arrays.
PRIME represents weights by combining positive/negative arrays with high/low-bit splicing, which also requires a large amount of resources: representing a single weight again requires four memristor devices.
To implement a neural network on memristor devices, the weight read error must be overcome; this error is caused by device characteristics and current fabrication processes and is at present difficult to avoid. The prior-art techniques above splice several low-precision memristors, each of which can be regarded as 'error-free', to represent one high-precision weight, which requires a large number of devices and uses resources inefficiently.
Therefore, there is a need for a weight representation technique for memristor-based neural networks that solves the above problems.
Summary of the invention
The present invention has been made in view of the above circumstances.
According to one aspect of the present invention, a non-splicing weight training method for a neural network is provided, comprising: a weight fixed-point conversion step of converting each matrix element of a weight matrix into a first number having a predetermined number of bits; an error introduction step of introducing noise having a predetermined standard deviation into the first number to obtain a second number; and a training step of training the weight matrix represented by the second numbers until convergence to obtain a training result, wherein the training result serves as the final weight matrix, whose matrix elements are written one by one into single analog circuit devices each representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than by a splicing of multiple analog circuit devices.
According to the above non-splicing weight training method, in the weight fixed-point conversion step, the conversion to the first number may be performed through a linear or a logarithmic relationship.
According to the above non-splicing weight training method, the noise may be the read/write error of the analog circuit and may obey a normal distribution.
According to the above non-splicing weight training method, the analog circuit device may be a memristor, a capacitive comparator or a voltage comparator.
According to the above non-splicing weight training method, the first number may be a fixed-point number and the second number may be a floating-point number.
According to another aspect of the present invention, a non-splicing weight coding method for a neural network is provided, comprising the step of writing each matrix element of a weight matrix one by one into a single analog circuit device representing that matrix element, so that a single matrix element is represented by a single analog circuit device rather than by a splicing of multiple analog circuit devices, wherein the weight matrix is obtained by the non-splicing weight training method described above.
According to the above non-splicing weight coding method, before the writing step, the method may further comprise: a weight fixed-point conversion step of converting each matrix element of the weight matrix into a first number having a predetermined number of bits; an error introduction step of introducing noise having a predetermined standard deviation into the first number to obtain a second number; and a training step of training the weight matrix represented by the second numbers until convergence to obtain a training result.
According to another aspect of the present invention, a neural network chip is provided, having basic modules that perform matrix-vector multiplication in hardware through analog circuit devices, wherein each matrix element of the weight matrix is written one by one into a single analog circuit device representing that matrix element, so that a single matrix element of the weight matrix is represented by a single analog circuit device rather than by a splicing of multiple analog circuit devices.
According to the above neural network chip, the weight matrix may be obtained by the above non-splicing weight training method.
According to yet another aspect of the present invention, a computing device is provided, comprising a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, carry out the above non-splicing weight training method or the above non-splicing weight coding method.
According to yet another aspect of the present invention, a neural network system is provided, comprising the computing device described above and the neural network chip described above.
According to the present invention, an encoding method for a neural network is provided which can greatly reduce resource consumption without affecting accuracy, thereby saving resource overhead and allowing a very large neural network to be deployed with limited resources.
Brief description of the drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention taken in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic diagram of a chain-like neural network.
Figure 2 is a schematic diagram of a memristor-based crossbar structure.
Figure 3 is a statistical distribution diagram of weights when eight weight levels are defined on one memristor.
Figure 4 is a schematic diagram of an application scenario of the neural network encoding technique according to the present invention.
Figure 5 is an overall flow chart of the encoding method according to the present invention.
Figure 6 compares experimental results of the existing high/low-bit splicing method and the encoding method according to the present invention.
Detailed description
In order that those skilled in the art may better understand the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The present application provides a new encoding method (hereinafter the RLevel encoding method). Its essential difference from existing methods is that it does not require the weight values represented by a single device to be free of overlap; instead, this error is introduced into training. By training the weight matrix with the noise included, allowing it to converge, and finally writing the converged values into single devices, the noise immunity of the model is enhanced while the number of devices needed to represent the matrix elements is reduced, which lowers cost and resource consumption.
The technical principle and embodiments of the present application are analyzed in detail below with reference to the accompanying drawings.
Figure 3 shows a statistical distribution of weights when eight weight levels are defined on one memristor.
As shown in Figure 3, the error caused by the memristor device is approximately normally distributed. Assume that the device error obeys a normal distribution N(μ, σ^2); if a memristor conductance value is used to represent an n-bit value, then μ has 2^n possible values. Those skilled in the art will understand that, to simplify the calculation, the same standard deviation σ is used for the different conductance values μ.
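This device error model can be sketched as follows; the bit width and σ are assumed values chosen only for illustration:

```python
import numpy as np

n_bits = 3                                  # 2^3 = 8 conductance levels, as in Figure 3
sigma = 0.05                                # assumed per-level standard deviation
mu = np.arange(2 ** n_bits)                 # the 2^n possible target values of the level

# Each read of a device programmed to level mu returns mu + e, with e ~ N(0, sigma^2).
reads = mu + np.random.normal(0.0, sigma, size=mu.shape)
print(reads)
```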
Although the memristor is used as the example in the following detailed description, other circuit devices capable of implementing matrix-vector multiplication, such as capacitive or voltage comparators, may also be used.
According to the superposition property of the normal distribution, for statistically independent normal random variables X ~ N(μ_x, σ_x^2) and Y ~ N(μ_y, σ_y^2), their sum also follows a normal distribution: U = X + Y ~ N(μ_x + μ_y, σ_x^2 + σ_y^2).
Suppose, as in the prior art, that two devices are spliced to represent one high-precision weight. Let l and h denote the values of the low-order and high-order devices respectively, so that the weight is expressed as 2^n·h + l. The errors of the low-order and high-order devices are L ~ N(l, σ^2) and H ~ N(h, σ^2) respectively, so 2^n·H ~ N(2^n·h, 2^(2n)·σ^2). The range of the weight is 2^(2n) − 1, and the standard deviation of the weight error is
sqrt(2^(2n)·σ^2 + σ^2) = σ·sqrt(2^(2n) + 1).
Taking the ratio of the value range to the standard deviation as the measure of final precision, the precision of the splicing method is
(2^(2n) − 1) / (σ·sqrt(2^(2n) + 1)).
By comparison, in the present application a single device is used to represent the high-precision weight, and its precision is (2^n − 1)/σ.
From the above results it can be seen that the precision obtained with the high/low-bit splicing method and with a single device representing the weight is essentially the same.
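The two precision expressions can also be checked numerically; the sketch below compares the spliced and single-device figures for a few bit widths, taking σ = 1 level without loss of generality:

```python
import numpy as np

sigma = 1.0
for n in (2, 3, 4):
    spliced = (2 ** (2 * n) - 1) / (sigma * np.sqrt(2 ** (2 * n) + 1))
    single = (2 ** n - 1) / sigma
    print(n, round(spliced, 2), round(single, 2))
# For n = 2 this prints roughly 3.64 versus 3.0 -- the same order of precision.
```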
Figure 4 shows a schematic diagram of an application scenario of the neural network encoding technique according to the present invention. As shown in Figure 4, the general inventive concept of the present disclosure is that the network model 1200 employed by a neural network application 1100 is weight-encoded by the encoding method 1300 and the result is written into the memristor devices of a neural network chip 1400. This addresses the problem that weight representation in memristor-based neural networks requires a large number of devices, and ultimately saves a large amount of resources without a significant loss of accuracy.
I. Encoding method
Figure 5 shows the overall flow chart of the encoding method according to the present invention, which comprises the following steps:
1. Weight fixed-point conversion step S210: convert each matrix element of the weight matrix into a first number having a predetermined number of bits.
Depending on the hardware design (the precision of a single memristor device requires hardware support), in the forward network each weight value is converted into a fixed-point number of a given precision, yielding the fixed-point weights.
Here, to better illustrate the method of the present invention, the 2×2 weight matrix A of Table 1 below is used as an example.
Table 1. Initial weight matrix A
0.2641  0.8509
0.3296  0.6740
When 4 bits are used as the predetermined number of bits, each weight value is converted into a 4-bit fixed-point number. The maximum value in the matrix, 0.8509, corresponds to the maximum 4-bit value, i.e. 2^4 − 1 = 15, and the other values are scaled linearly accordingly to obtain the fixed-point weights, giving the fixed-point matrix of Table 2.
Table 2. Fixed-point matrix B
5.0000  15.0000
6.0000  12.0000
It should be noted that the fixed-point conversion above is performed linearly, but those skilled in the art will understand that the conversion may also be performed logarithmically or by other calculations rather than linearly.
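One possible implementation of the linear fixed-point conversion is sketched below; it reproduces Table 2 from Table 1 and is an illustration rather than a prescribed procedure:

```python
import numpy as np

A = np.array([[0.2641, 0.8509],
              [0.3296, 0.6740]])       # initial weight matrix A (Table 1)
bits = 4
scale = (2 ** bits - 1) / A.max()      # map the largest weight to 2^4 - 1 = 15

B = np.round(A * scale)                # fixed-point matrix B (Table 2)
print(B)                               # [[ 5. 15.] [ 6. 12.]]
```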
2. Error introduction step S220: introduce noise having a predetermined standard deviation into the first number to obtain a second number.
According to the characteristics of the memristor device, normally distributed noise with standard deviation σ is added during training, i.e. the weight becomes w = w + Noise with Noise ~ N(0, σ^2). Note that here the first number is taken to be a fixed-point number and the second number equals the first number plus noise, so the second number is a floating-point number. For example, the four fixed-point numbers 0, 1, 2 and 3 become, after adding noise, the four floating-point numbers −0.1, 1.02, 2.03 and 2.88. This setting is not restrictive, however; the first number may also be a floating-point number.
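In code, the error introduction step can be as simple as the following sketch (σ is an assumed value chosen to mimic the target device):

```python
import numpy as np

B = np.array([[5.0, 15.0],
              [6.0, 12.0]])            # fixed-point matrix B (Table 2), the first number
sigma = 0.05 * 15                      # assumed: about 5% of the 4-bit range
W_noisy = B + np.random.normal(0.0, sigma, size=B.shape)
print(W_noisy)                         # the second number: fixed-point value plus noise
```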
3. Training step S230: train the weight matrix represented by the second numbers until convergence, and then write the training result, as the final weight matrix, into the circuit devices used for the weight matrix computation.
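A minimal sketch of this training step, assuming a single linear layer, a mean-squared-error objective and toy data (the fixed-point conversion is omitted for brevity); fresh noise is drawn at every forward pass so that the learned weights become robust to the device error, and the converged weights are what would be written to the devices, one element per device:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))                 # toy inputs (assumed data)
Y = X @ rng.normal(size=(4, 2))              # toy targets from a hidden linear map
W = rng.normal(size=(4, 2))                  # weight matrix being trained
sigma, lr = 0.05, 0.1

for step in range(2000):
    W_noisy = W + rng.normal(0.0, sigma, size=W.shape)  # error introduction step
    pred = X @ W_noisy                                   # forward pass with noisy weights
    grad = X.T @ (pred - Y) / len(X)                     # MSE gradient w.r.t. the weights
    W -= lr * grad                                       # update the underlying weights

print(np.abs(X @ W - Y).mean())              # small residual error after convergence
```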
II. Theoretical verification
A concrete example is given below to show, from a theoretical standpoint, that for the same input the output obtained with the RLevel encoding method according to the present invention and the output obtained with the prior-art high/low-bit splicing method have closely matching precision.
If two 2-bit devices are used for splicing, the fixed-point matrix B (Table 2) is decomposed into a high-order matrix H (Table 3) and a low-order matrix L (Table 4):
Table 3. High-order matrix H
1.0000  3.0000
1.0000  3.0000
Table 4. Low-order matrix L
1.0000  3.0000
2.0000  0.0000
In the splicing scheme, the fixed-point matrix B equals the high-order matrix H times 4 plus the low-order matrix L, i.e. B = 4·H + L; for both the high-order and the low-order part, the maximum value corresponds to the 2-bit maximum, i.e. 3.
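This decomposition can be checked directly:

```python
import numpy as np

B = np.array([[5, 15], [6, 12]])       # fixed-point matrix B (Table 2)
H = B // 4                             # high 2 bits  -> high-order matrix H (Table 3)
L = B % 4                              # low 2 bits   -> low-order matrix L (Table 4)
assert (B == 4 * H + L).all()          # B = 4*H + L
print(H, L, sep="\n")
```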
Next, to better simulate the introduction of the actual error, the fixed-point matrix B is converted into conductance values in the range 4×10^-6 to 4×10^-5 S according to the RLevel method and the high/low-bit splicing method respectively, giving the RLevel conductance matrix RC, the high-order conductance matrix HC and the low-order conductance matrix LC of Table 5.
Note that the training process according to the present invention does not convert the matrix into conductance values; instead, normally distributed noise with standard deviation σ is added on top of the first number during training. The conversion is made here only to illustrate that the actual error arises from noise and disturbance during reading and writing of the memristor devices (or other circuit devices used), so the data analysis below is carried out on the conductance values as analog quantities.
Table 5. Conductance matrices
RLevel conductance matrix RC:
1.6000E-05  4.0000E-05
1.8400E-05  3.2800E-05
High-order conductance matrix HC:
1.6000E-05  4.0000E-05
1.6000E-05  4.0000E-05
Low-order conductance matrix LC:
1.6000E-05  4.0000E-05
2.8000E-05  4.0000E-06
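The level-to-conductance mapping used in this example is assumed to be linear, with level 0 mapped to 4×10^-6 S and the maximum level of each matrix mapped to 4×10^-5 S; under that assumption the matrices above are reproduced by the following sketch:

```python
import numpy as np

def to_conductance(M, g_min=4e-6, g_max=4e-5):
    # Assumed linear mapping: level 0 -> g_min, maximum level of M -> g_max.
    return g_min + (M / M.max()) * (g_max - g_min)

B = np.array([[5.0, 15.0], [6.0, 12.0]])
H = np.array([[1.0, 3.0], [1.0, 3.0]])
L = np.array([[1.0, 3.0], [2.0, 0.0]])

RC, HC, LC = to_conductance(B), to_conductance(H), to_conductance(L)
print(RC, HC, LC, sep="\n")
```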
Assume that the input voltages are:
0.10    0.15
[No noise]
If there is no noise, then for the above input voltages the outputs of the RLevel conductance matrix RC, the high-order conductance matrix HC and the low-order conductance matrix LC are respectively:
Table 6. Conductance matrix outputs
RLevel output RC_out    High-order output HC_out    Low-order output LC_out    Spliced output HLC_out
4.36000000E-06          4.0000E-06                  5.8000E-06                 2.18000000E-05
8.92000000E-06          1.0000E-05                  4.6000E-06                 4.46000000E-05
The spliced output above is computed as high-order output × 4 + low-order output.
If the results of Table 6, i.e. the RLevel output RC_out and the spliced output HLC_out, are converted to 8-bit fixed-point numbers for comparison, it can be seen that both are:
125.    255.
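The noise-free comparison can be reproduced as follows; the conversion to 8-bit fixed point is assumed to normalize each output vector by its largest entry, which matches the numbers quoted above:

```python
import numpy as np

V = np.array([0.10, 0.15])                                  # input voltages

RC = np.array([[1.60e-5, 4.00e-5], [1.84e-5, 3.28e-5]])     # Table 5 conductances
HC = np.array([[1.60e-5, 4.00e-5], [1.60e-5, 4.00e-5]])
LC = np.array([[1.60e-5, 4.00e-5], [2.80e-5, 4.00e-6]])

rc_out = V @ RC                                             # RLevel output
hlc_out = 4 * (V @ HC) + (V @ LC)                           # spliced output: high*4 + low

def to_8bit(x):
    return np.round(x / x.max() * 255)                      # assumed 8-bit normalization

print(to_8bit(rc_out), to_8bit(hlc_out))                    # both give [125. 255.]
```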
[Noise added]
If noise with mean 0 and standard deviation 0.05 × 4×10^-5 (i.e. about 5% of the conductance range) is added to the conductance matrices, the noise matrices of Table 7 are obtained.
Table 7. Noise matrices (the RLevel, high-order and low-order conductance matrices after the above noise is added; the numerical values appear as an image in the original document)
The input voltages are again assumed to be:
0.10    0.15
Then the outputs of the RLevel, high-order and low-order noise matrices are respectively:
Table 8. Noise matrix outputs
RLevel output RN_out    High-order output HN_out    Low-order output LN_out    Spliced output HLN_out
4.5550E-06              4.2578E-06                  6.3242E-06                 2.3355E-05
9.0081E-06              9.9704E-06                  4.1181E-06                 4.4000E-05
If the results of Table 8, i.e. the RLevel output RN_out and the spliced output HLN_out, are converted to 8-bit fixed-point numbers for comparison, the two are respectively:
RLevel output: 129.00  255.00
Spliced output: 135.00  255.00
The final results show that, with or without noise, the RLevel encoding method according to the present invention yields precision very close to that of the prior-art high/low-bit splicing method; the practicality and feasibility of the solution of the present invention are therefore verified from a theoretical standpoint.
III. Data verification
To verify the effectiveness of the encoding method of the present invention from the standpoint of experimental data, the applicant carried out a series of experiments.
Figure 6 compares experimental results of the existing high/low-bit splicing method and the RLevel encoding method according to the present invention.
In this experiment a convolutional neural network was used to classify the CIFAR10 data set. The data set contains 60,000 color images of 32×32 pixels, each belonging to one of 10 categories. As shown in Figure 6, the abscissa is the weight precision and the ordinate is the classification accuracy. There are two curves in the figure: the lower curve uses the RLevel method, representing 2-, 4-, 6- and 8-bit weights with a single device each, while the upper curve splices two devices of 1, 2, 3 and 4 bits respectively to represent 2, 4, 6 and 8 bits.
As shown in Figure 6, in this experiment the accuracy of the RLevel method is very close to that of the high/low-bit splicing method, yet only one device is used and no splicing of multiple devices is required (the encoding is non-splicing), so 50% of the resources can be saved. Thus, the weight coding method of the present invention provides substantially the same accuracy as the existing high/low-bit splicing without using it, which both solves the problem that computing the weight matrix of a neural network with analog circuits such as memristors requires a large number of circuit devices, and reduces cost and saves resources.
The embodiments of the present invention have been described above. The foregoing description is illustrative rather than exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims (11)

  1. A non-splicing weight training method for a neural network, comprising:
    a weight fixed-point conversion step of converting each matrix element of a weight matrix into a first number having a predetermined number of bits;
    an error introduction step of introducing noise having a predetermined standard deviation into the first number to obtain a second number; and
    a training step of training the weight matrix represented by the second numbers until convergence to obtain a training result,
    wherein the training result serves as the final weight matrix, whose matrix elements are written one by one into single analog circuit devices each representing one matrix element, so that a single matrix element is represented by a single analog circuit device rather than by a splicing of multiple analog circuit devices.
  2. The non-splicing weight training method according to claim 1, wherein, in the weight fixed-point conversion step, the conversion to the first number is performed through a linear or a logarithmic relationship.
  3. The non-splicing weight training method according to claim 1, wherein the noise is the read/write error of the analog circuit and obeys a normal distribution.
  4. The non-splicing weight training method according to claim 1, wherein the analog circuit device is a memristor, a capacitive comparator or a voltage comparator.
  5. The non-splicing weight training method according to claim 1, wherein the first number is a fixed-point number and the second number is a floating-point number.
  6. A non-splicing weight coding method for a neural network, comprising the step of writing each matrix element of a weight matrix one by one into a single analog circuit device representing that matrix element, so that a single matrix element is represented by a single analog circuit device rather than by a splicing of multiple analog circuit devices,
    wherein the weight matrix is obtained by the non-splicing weight training method according to any one of claims 1 to 5.
  7. The non-splicing weight coding method according to claim 6, further comprising, before the writing step:
    a weight fixed-point conversion step of converting each matrix element of the weight matrix into a first number having a predetermined number of bits;
    an error introduction step of introducing noise having a predetermined standard deviation into the first number to obtain a second number; and
    a training step of training the weight matrix represented by the second numbers until convergence to obtain a training result.
  8. A neural network chip having basic modules that perform matrix-vector multiplication in hardware through analog circuit devices,
    wherein each matrix element of a weight matrix is written one by one into a single analog circuit device representing that matrix element, so that a single matrix element of the weight matrix is represented by a single analog circuit device rather than by a splicing of multiple analog circuit devices.
  9. The neural network chip according to claim 8, wherein the weight matrix is obtained by the non-splicing weight training method according to any one of claims 1 to 5.
  10. A computing device comprising a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, carry out the non-splicing weight training method according to any one of claims 1 to 5 or the non-splicing weight coding method according to any one of claims 6 to 7.
  11. A neural network system, comprising:
    the computing device according to claim 10; and
    the neural network chip according to any one of claims 8 to 9.
PCT/CN2017/119821 2017-12-29 2017-12-29 Weight coding method for neural network, computing apparatus, and hardware system WO2019127363A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780042640.0A CN109791626B (en) 2017-12-29 2017-12-29 Neural network weight coding method, calculating device and hardware system
PCT/CN2017/119821 WO2019127363A1 (en) 2017-12-29 2017-12-29 Weight coding method for neural network, computing apparatus, and hardware system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/119821 WO2019127363A1 (en) 2017-12-29 2017-12-29 Weight coding method for neural network, computing apparatus, and hardware system

Publications (1)

Publication Number Publication Date
WO2019127363A1 true WO2019127363A1 (en) 2019-07-04

Family

ID=66495542

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119821 WO2019127363A1 (en) 2017-12-29 2017-12-29 Weight coding method for neural network, computing apparatus, and hardware system

Country Status (2)

Country Link
CN (1) CN109791626B (en)
WO (1) WO2019127363A1 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678244B2 (en) 2017-03-23 2020-06-09 Tesla, Inc. Data synthesis for autonomous control systems
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11157441B2 (en) 2017-07-24 2021-10-26 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US10671349B2 (en) 2017-07-24 2020-06-02 Tesla, Inc. Accelerated mathematical engine
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11215999B2 (en) 2018-06-20 2022-01-04 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11361457B2 (en) 2018-07-20 2022-06-14 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
CA3115784A1 (en) 2018-10-11 2020-04-16 Matthew John COOPER Systems and methods for training machine models with augmented data
US11196678B2 (en) 2018-10-25 2021-12-07 Tesla, Inc. QOS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US10997461B2 (en) 2019-02-01 2021-05-04 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US10956755B2 (en) 2019-02-19 2021-03-23 Tesla, Inc. Estimating object properties using visual image data
CN110796241B (en) * 2019-11-01 2022-06-17 清华大学 Training method and training device of neural network based on memristor
CN111027619B (en) * 2019-12-09 2022-03-15 华中科技大学 Memristor array-based K-means classifier and classification method thereof
WO2021163866A1 (en) * 2020-02-18 2021-08-26 杭州知存智能科技有限公司 Neural network weight matrix adjustment method, writing control method, and related device
CN115481562B (en) * 2021-06-15 2023-05-16 中国科学院微电子研究所 Multi-parallelism optimization method and device, recognition method and electronic equipment
CN114282478B (en) * 2021-11-18 2023-11-17 南京大学 Method for correcting array dot product error of variable resistor device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224986A (en) * 2015-09-29 2016-01-06 清华大学 Based on the deep neural network system of memory resistor
US20170061281A1 (en) * 2015-08-27 2017-03-02 International Business Machines Corporation Deep neural network training with native devices
CN106650922A (en) * 2016-09-29 2017-05-10 清华大学 Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system
CN107085628A (en) * 2017-03-21 2017-08-22 东南大学 A kind of adjustable weights modular simulation method of cell neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10580401B2 (en) * 2015-01-27 2020-03-03 Google Llc Sub-matrix input for neural network layers
CN106796668B (en) * 2016-03-16 2019-06-14 香港应用科技研究院有限公司 Method and system for bit-depth reduction in artificial neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170061281A1 (en) * 2015-08-27 2017-03-02 International Business Machines Corporation Deep neural network training with native devices
CN105224986A (en) * 2015-09-29 2016-01-06 清华大学 Based on the deep neural network system of memory resistor
CN106650922A (en) * 2016-09-29 2017-05-10 清华大学 Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system
CN107085628A (en) * 2017-03-21 2017-08-22 东南大学 A kind of adjustable weights modular simulation method of cell neural network

Also Published As

Publication number Publication date
CN109791626A (en) 2019-05-21
CN109791626B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
WO2019127363A1 (en) Weight coding method for neural network, computing apparatus, and hardware system
US11748609B2 (en) On-chip training of memristor crossbar neuromorphic processing systems
US11907831B2 (en) Analog neuromorphic circuit implemented using resistive memories
US20220374688A1 (en) Training method of neural network based on memristor and training device thereof
US10346347B2 (en) Field-programmable crossbar array for reconfigurable computing
CN108009640B (en) Training device and training method of neural network based on memristor
Chen et al. Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip
WO2019127362A1 (en) Neural network model block compression method, training method, computing device and system
Kim et al. Input voltage mapping optimized for resistive memory-based deep neural network hardware
US20210049448A1 (en) Neural network and its information processing method, information processing system
US10643126B2 (en) Systems, methods and devices for data quantization
WO2021089009A1 (en) Data stream reconstruction method and reconstructable data stream processor
US20210209450A1 (en) Compressed weight distribution in networks of neural processors
CN108647184B (en) Method for realizing dynamic bit convolution multiplication
CN110119760B (en) Sequence classification method based on hierarchical multi-scale recurrent neural network
WO2023130725A1 (en) Hardware implementation method and apparatus for reservoir computing model based on random resistor array, and electronic device
CN111144027A (en) Approximation method based on BP neural network full characteristic curve function
CN117273109A (en) Quantum neuron-based hybrid neural network construction method and device
CN114897159B (en) Method for rapidly deducing electromagnetic signal incident angle based on neural network
CN113435581B (en) Data processing method, quantum computer, device and storage medium
CN113568845B (en) Memory address mapping method based on reinforcement learning
Shen et al. PRAP-PIM: A weight pattern reusing aware pruning method for ReRAM-based PIM DNN accelerators
TWI763975B (en) System and method for reducing computational complexity of artificial neural network
Li et al. Memory saving method for enhanced convolution of deep neural network
Guo et al. A multi-conductance states memristor-based cnn circuit using quantization method for digital recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17936307

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17936307

Country of ref document: EP

Kind code of ref document: A1