WO2021163866A1 - Neural network weight matrix adjustment method, writing control method, and related device - Google Patents


Info

Publication number
WO2021163866A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
neural network
array
memory cell
weight matrix
Prior art date
Application number
PCT/CN2020/075648
Other languages
French (fr)
Chinese (zh)
Inventor
王绍迪 (Wang Shaodi)
Original Assignee
杭州知存智能科技有限公司 (Hangzhou Zhicun Intelligent Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州知存智能科技有限公司 (Hangzhou Zhicun Intelligent Technology Co., Ltd.)
Priority to PCT/CN2020/075648 priority Critical patent/WO2021163866A1/en
Publication of WO2021163866A1 publication Critical patent/WO2021163866A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode

Definitions

  • the present invention relates to the field of artificial intelligence technology, in particular to a neural network weight matrix adjustment method, a writing control method and related devices.
  • computing-in-memory (CIM) chips have received widespread attention.
  • reducing the data transmission volume and transmission distance can lower power consumption and improve performance at the same time.
  • the in-memory computing chip is suitable for neural network computing scenarios.
  • the signals to be processed are input in parallel; based on Ohm's law and Kirchhoff's law, the vector-matrix multiply-accumulate operation between the signals to be processed and the corresponding weights is performed directly in the memory cell array, and the output voltage/current signal of the memory cell array is quantized by an ADC (analog-to-digital converter) and used as the output result.
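As a rough illustration (not taken from the patent), the parallel multiply-accumulate described above can be modeled in a few lines of Python; the weight values below are invented for illustration and are not the cell values of the figures:

```python
def column_outputs(inputs, weights):
    """Each cell contributes current I = G * V (Ohm's law) and the currents
    on a shared column line sum (Kirchhoff's current law), so every column
    yields the dot product of the row inputs with its stored weights."""
    n_cols = len(weights[0])
    return [sum(v * row[c] for v, row in zip(inputs, weights))
            for c in range(n_cols)]

# Illustrative 3x3 array with the row inputs 7, 5, 3 used in the figures;
# the weight values here are made up, not the patent's.
print(column_outputs([7, 5, 3], [[1, 0, 30],
                                 [0, 1, 20],
                                 [0, 0, 1]]))   # -> [7, 5, 313]
```

Note how one column output (7) is tiny while another (313) is huge, which is exactly the uneven-distribution problem the patent addresses.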
  • when the weight distribution of the neural network algorithm is too small or too large, or the input signal is too small or too large, the analog voltage/current output by the memory cell array may fall below the lower limit or above the upper limit of the ADC range (see Figure 1 or Figure 2, where each circle represents a memory cell, the number inside it represents the pre-stored weight value, and the horizontal arrows represent the row inputs, i.e. the input signals; in Figure 1 the input of the first row is 7, the input of the second row is 5, the input of the third row is 3, and the downward arrows indicate the outputs: the output of the first column is 7, the output of the second column is 7, and the output of the third column is 248; the output signal of each column is fed to an ADC, which converts the column's analog output into a digital signal for subsequent use; Figure 1 shows the case where the output voltage/current of the first and second columns is too small and falls below the lower limit of the ADC, while Figure 2 shows the case where the output voltage/current of all three columns of memory cells is too large and exceeds the upper limit of the ADC range). An ADC usually has the highest quantization accuracy near the middle of its range and poorer accuracy near both ends; when the ADC input falls below the lower limit or above the upper limit, the corresponding output is simply truncated to the minimum or maximum value, which reduces the accuracy of the operation.
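The truncation behavior can be sketched with a toy ADC model; the range and resolution below are assumptions for illustration, not values from the patent:

```python
def adc_quantize(x, lo, hi, bits=8):
    """Clip x to the ADC range [lo, hi], then map it linearly onto
    2**bits - 1 quantization levels; out-of-range inputs are simply
    truncated to the minimum or maximum code."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * (2 ** bits - 1))

# With an assumed range of [10, 200], the column outputs 7 and 248 from
# Figures 1-2 both clip, so their codes carry no information.
print(adc_quantize(7, 10, 200))    # -> 0
print(adc_quantize(248, 10, 200))  # -> 255
```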
  • the present invention provides a neural network weight matrix adjustment method, a write control method, related devices, electronic equipment, and computer-readable storage media, which can at least partially solve the problems in the prior art.
  • a method for adjusting the weight matrix of a neural network including:
  • wherein the first constant and the second constant are both greater than one.
  • the method further includes:
  • the first weight array is the weight array remaining after truncating, from each weight value, the bits that exceed the third preset threshold, and is used for storage in a memory cell array;
  • the second weight array is the weight array formed from the truncated bits that exceed the third preset threshold, and is used for storage in another memory cell array or for input to an arithmetic operation circuit.
  • the method for adjusting the weight matrix of the neural network also includes:
  • the processed weight array is used for storage in a memory cell array.
  • the method for adjusting the weight matrix of the neural network also includes:
  • the output of the ADC following the memory cell array corresponding to the first weight array is combined with the output of the ADC following the memory cell array corresponding to the second weight array; or
  • the output of the ADC following the memory cell array corresponding to the first weight array is combined with the output of the arithmetic operation circuit.
  • dividing all the weight values in the neural network weight matrix by a second constant includes:
  • the third weight array is used for storage in a memory cell array; the fourth weight array is used for storage in another memory cell array or for input to an arithmetic operation circuit.
  • the method for adjusting the weight matrix of the neural network also includes:
  • the output of the ADC following the memory cell array corresponding to the third weight array is combined with the output of the ADC following the memory cell array corresponding to the fourth weight array; or
  • the output of the ADC following the memory cell array corresponding to the third weight array is combined with the output of the arithmetic operation circuit.
  • a method for writing a neural network weight matrix including:
  • according to a data adjustment instruction, the shift register is controlled to shift the weight values input to it, and the shifted weight values, the overflow bits, and the addresses of the weight values in the neural network weight matrix are stored in a buffer, where the data adjustment instruction includes a shift direction and a number of shift bits;
  • the data adjustment instruction is generated when the weight distribution of the neural network weight matrix is uneven; the shift register is connected to a write module, and the write module is connected to another memory cell array and writes the data shifted out by the shift register into that memory cell array.
  • the neural network weight matrix writing control method further includes:
  • the ADC output result of the memory cell array is combined with the ADC output result of the other memory cell array.
  • a neural network weight matrix adjustment device including:
  • a first judgment module, which judges whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;
  • a weight amplification module, which, if the weight distribution of the neural network weight matrix is lower than the first preset threshold, multiplies all the weight values in the neural network weight matrix by a first constant;
  • a second judgment module, which, if the weight distribution of the neural network weight matrix is not lower than the first preset threshold, judges whether the weight distribution is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
  • a weight reduction module, which, if the weight distribution of the neural network weight matrix is higher than the second preset threshold, divides all the weight values in the neural network weight matrix by a second constant;
  • wherein the first constant and the second constant are both greater than one.
  • an electronic device including a memory, a processor, and a computer program stored in the memory and capable of running on the processor.
  • the processor implements the steps of the neural network weight matrix adjustment method described above when executing the program.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above-mentioned neural network weight matrix adjustment method are realized.
  • the neural network weight matrix adjustment method, writing control method, related devices, electronic equipment, and computer-readable storage medium provided by the present invention process the trained neural network weight matrix before neural network calculations are performed with an in-memory computing chip.
  • The method includes: judging whether the weight distribution of the neural network weight matrix is lower than a first preset threshold; if so, multiplying all the weight values in the neural network weight matrix by a first constant; if not, judging whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold; if the weight distribution is higher than the second preset threshold, dividing all the weight values in the neural network weight matrix by a second constant; wherein the first constant and the second constant are both greater than one.
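The branching described above can be condensed into a minimal Python sketch. The names `low_thr`/`high_thr` stand in for the first and second preset thresholds and `n`/`m` for the first and second constants; taking the "weight distribution" as the mean absolute weight is one of the statistics the text mentions, and all defaults here are assumptions:

```python
def adjust_weights(matrix, low_thr, high_thr, n=8, m=8):
    """Sketch of the claimed adjustment: amplify a too-small matrix,
    reduce a too-large one, leave an in-range matrix unchanged."""
    flat = [abs(w) for row in matrix for w in row]
    dist = sum(flat) / len(flat)            # mean absolute weight
    if dist < low_thr:                      # weights too small: amplify
        return [[w * n for w in row] for row in matrix]
    if dist > high_thr:                     # weights too large: reduce
        return [[w / m for w in row] for row in matrix]
    return matrix                           # already in range: store as-is
```

For example, `adjust_weights([[1, 1], [1, 1]], 4, 100)` returns `[[8, 8], [8, 8]]`, while a matrix whose mean lies between the thresholds is returned unchanged.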
  • in this way, when the processed weight matrix is stored in the memory cell array, the signal obtained from the in-memory calculation falls within the effective range of the ADC (the ADC is placed after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the accuracy of the calculation.
  • Figure 1 shows a case where, during a matrix operation in the memory cell array of an in-memory computing chip, the output current/voltage of a memory cell column is so small that it falls below the lower limit of the ADC range;
  • Figure 2 shows a case where, during a matrix operation in the memory cell array of an in-memory computing chip, the output current/voltage of a memory cell column is so large that it exceeds the upper limit of the ADC range;
  • Fig. 3 shows an application scenario of a neural network weight matrix adjustment method provided by an embodiment of the present invention
  • FIG. 4 is a first flowchart of a method for adjusting a weight matrix of a neural network in an embodiment of the present invention
  • FIG. 5 shows a schematic diagram of a method for adjusting a weight matrix of a neural network provided by an embodiment of the present invention
  • FIG. 6 shows a situation in which, after adjustment by the neural network weight matrix adjustment method provided by the embodiment of the present invention, a weight value in the weight array exceeds the bit-width limit;
  • FIG. 7 is a second schematic flowchart of a method for adjusting a neural network weight matrix in an embodiment of the present invention.
  • FIG. 8 shows a schematic diagram of dividing the weight array after amplifying the weight array in the embodiment of the present invention
  • FIG. 9 shows another schematic diagram of dividing the weight array after enlarging the weight array in the embodiment of the present invention.
  • FIG. 10 shows the specific steps of step S400 in the embodiment of the present invention.
  • FIG. 11 shows a schematic diagram of dividing the weight array after shrinking the weight array in an embodiment of the present invention
  • Fig. 12 is a structural block diagram of a neural network weight matrix adjustment device in an embodiment of the present invention.
  • FIG. 13 shows an application scenario of a neural network weight matrix writing control method provided by an embodiment of the present invention
  • FIG. 14 is a schematic flowchart of a method for writing a neural network weight matrix in an embodiment of the present invention
  • Fig. 15 is a structural diagram of an electronic device according to an embodiment of the present invention.
  • the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • Fig. 3 shows an application scenario of the neural network weight matrix adjustment method provided by an embodiment of the present invention. In this scenario, the compiling software 1 is connected to the programming circuit in the in-memory computing chip 2 and is used to carry out the neural network weight matrix adjustment method;
  • the adjusted neural network weight matrix is written into the memory cell array in the in-memory computing chip 2 through the programming circuit.
  • in the application stage, the input data stream is transmitted to the memory cell array after certain preprocessing and undergoes the neural network operation with the neural network weight matrix pre-written in the memory cell array.
  • the output data stream of the memory cell array is converted into a digital signal by the ADC module, and the operation result is output.
  • the compilation software may be existing or yet-to-be-developed compilation processing software, or a programmed computer program, and it may be executed in a computer device, in a processing chip, or in a mobile portable device; the embodiment of the present invention places no restriction on this.
  • a programming circuit can be provided in the in-memory computing chip.
  • for applications that do not need to adjust the neural network weight matrix, the programming circuit can be omitted in order to reduce chip overhead; instead, the final adjusted neural network weight matrix is written into the in-memory computing chip in advance, at the factory, through a programming device.
  • FIG. 4 is a first flowchart of a method for adjusting a weight matrix of a neural network in an embodiment of the present invention. As shown in FIG. 4, the method for adjusting a weight matrix of a neural network may include the following contents:
  • Step S100 Determine whether the weight distribution of the neural network weight matrix is lower than a first preset threshold
  • If yes, go to step S200; if not, go to step S300;
  • the neural network weight matrix is a neural network weight matrix trained in the neural network training stage, and the weight distribution can be a statistical index such as a mean value or a probability distribution, which can be specifically set by the designer according to the actual ADC range.
  • the first preset threshold is set by the designer according to specific statistical requirements and hardware requirements.
  • Step S200 Multiply all the weight values in the neural network weight matrix by a first constant
  • the first constant N can usually be set to an integer power of 2, so that the multiplication is equivalent to a left shift.
  • for example, the original neural network weight matrix is multiplied by 8, that is, each binary weight value is shifted left by 3 bits, to obtain the amplified weight matrix.
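As a one-line sanity check (plain Python arithmetic, nothing patent-specific), multiplying by 8 and shifting left by 3 bits agree:

```python
w = 0b00000101            # an 8-bit weight with value 5
amplified = w << 3        # equivalent to w * 8
print(amplified == w * 8, amplified)  # -> True 40
```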
  • Step S300 Determine whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
  • If yes, perform step S400; if not, perform step S500;
  • the second preset threshold is set by the designer according to specific statistical requirements and hardware requirements.
  • Step S400 Divide all the weight values in the neural network weight matrix by a second constant
  • the first constant and the second constant are both greater than 1, and they may be the same or different. It is worth noting that multiplying by 2 and dividing by 1/2 yield the same result; therefore, whether the claims are expressed in terms of multiplication or division does not limit the scope of protection of the present invention, and identical or equivalent calculation processes are all included in the scope of protection of the present invention.
  • the analog voltage/current output by the memory cell array may exceed the quantization upper limit of the ADC.
  • the second constant M is usually an integer power of 2, so that the division is equivalent to a right shift.
  • Step S500 Use the neural network weight matrix to store in a memory cell array.
  • the neural network weight matrix in this step refers to the neural network weight matrix of step S100 (when its weight distribution is neither lower than the first preset threshold nor higher than the second preset threshold), the weight matrix obtained after the processing of step S200, or the weight matrix obtained after the processing of step S400.
  • the weight matrix is stored in a memory cell array; in the application stage, an input data stream is fed to the memory cell array, the input data stream and the weight array undergo an analog vector-matrix multiplication, and the result is transmitted, in the form of the output analog voltage/current of the memory cell array, to the ADC after the array, which converts the analog voltage/current into a digital signal.
  • the neural network weight matrix adjustment method scales up or scales down a neural network weight matrix whose overall distribution is uneven, so that when the processed weight matrix is stored in the memory cell array, the signal obtained from the in-memory calculation is within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the calculation accuracy.
  • in step S200, all the weight values in the neural network weight matrix are multiplied by a first constant to increase the network weights. Ideally, all the weight values in the amplified weight array meet the requirements; however, those skilled in the art can understand that for an array with a relatively uneven weight distribution, some larger weight values in the amplified array may exceed the upper limit of the bit-width (also referred to as an overflow). For example, referring to Figure 6, assuming the weight precision is 8 bits, after multiplying by the constant 8 (a left shift of 3 bits), the weight value in the third row and second column exceeds the bit-width limit.
  • the neural network weight matrix adjustment method provided by the embodiment of the present invention may further include:
  • Step S600 Determine whether the number of digits of each weight value in the processed weight matrix exceeds a third preset threshold
  • If yes, perform step S700; if not, perform step S500;
  • the third preset threshold may be the accuracy of the algorithm, such as 8 bits, 16 bits, etc., which is set by the designer according to specific statistical requirements and hardware requirements.
  • Step S700: truncate the bits of each weight value that exceed the third preset threshold to obtain a first weight array and a second weight array;
  • the first weight array (also referred to as the standard matrix) is the weight array remaining after the bits of each weight value that exceed the third preset threshold are truncated, and is stored in a memory cell array of the in-memory computing chip, where it takes part in the analog vector-matrix multiplication;
  • the second weight array is the weight array formed from the truncated bits of each weight value that exceed the third preset threshold, and is stored in another memory cell array or input to an arithmetic operation circuit. It is worth noting that the second weight array may be a sparse matrix or an ordinary weight matrix.
  • if the second weight array is a sparse matrix, it can be input to an arithmetic operation circuit to perform the operation: one input of the operation is an element of the sparse matrix, and the other input is the datum in the aforementioned input data stream that corresponds to that element.
  • the arithmetic operation circuit can be a conventional digital circuit, such as a multiplier, or a CPU.
  • the sparse matrix can be stored in a memory first and then transferred from the memory to the CPU to perform the multiplication operations.
  • alternatively, the second weight array can be stored in another memory cell array, and that memory cell array performs an analog vector-matrix multiplication between the second weight array and the input data stream.
  • Fig. 8 shows a schematic diagram of splitting the weight array after amplifying it; as shown in Fig. 8, corresponding to the overflow situation of Fig. 6, the part above 8 bits is truncated and placed into a new weight matrix, which is a sparse matrix.
  • the standard matrix is still processed by the in-memory computing array (vector-matrix multiplication), while the sparse-matrix part can be processed by conventional digital circuits, and the combination of the two results is the final output.
  • Figure 8 is a step-by-step illustration of the principle of this embodiment. In an actual program, referring to Figure 9, the overflowed bits can be stored directly into the other matrix while the weights are being amplified, with no need to separate the amplification step from the splitting step.
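A minimal sketch of this split, assuming 8-bit weight precision and a left shift of 3 as in Figure 6; the function name and data layout are mine, not the patent's:

```python
BITS = 8  # assumed third preset threshold: the weight bit-width

def split_amplified(matrix, shift=3):
    """Amplify each weight by 2**shift, keep the low BITS bits as the
    standard matrix, and collect the truncated high bits in a second,
    typically sparse, overflow matrix."""
    std, ovf = [], []
    for row in matrix:
        v_row = [w << shift for w in row]
        std.append([v & ((1 << BITS) - 1) for v in v_row])  # low bits
        ovf.append([v >> BITS for v in v_row])              # carried-out bits
    return std, ovf

std, ovf = split_amplified([[5, 40], [1, 0]])
# 40 << 3 = 320 = 1 * 256 + 64: 64 stays in the standard matrix and the
# overflow matrix records the single carried-out bit.
print(std)  # -> [[40, 64], [8, 0]]
print(ovf)  # -> [[0, 1], [0, 0]]
```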
  • the adjustment method may further include:
  • the output of the ADC following the memory cell array corresponding to the first weight array is combined with the output of the ADC following the memory cell array corresponding to the second weight array; or
  • the output of the ADC following the memory cell array corresponding to the first weight array is combined with the output of the arithmetic operation circuit.
  • the combination may superimpose the results of the first weight array and the second weight array, and may additionally scale the superimposed result down by a certain factor on top of the superposition; the choice can be made according to parameters such as the quantization accuracy of the subsequent circuit.
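One way such a combination could look, assuming a bit-split at 8 bits and an amplification constant of 8; the positional weighting `2**bits` and the scale-back are assumptions consistent with that split, not a formula stated in the patent:

```python
def combine(std_out, ovf_out, bits=8, scale=8):
    """Superimpose the two partial column results, giving the overflow
    result its positional weight 2**bits, then undo the amplification
    by 'scale' so the sum matches the unamplified computation."""
    return [(s + (o << bits)) / scale for s, o in zip(std_out, ovf_out)]

# A column whose amplified output split into 64 (standard) and 1 (overflow)
# recombines to (64 + 256) / 8 = 40.
print(combine([64], [1]))  # -> [40.0]
```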
  • in step S400, all the weight values in the neural network weight matrix are divided by a second constant to reduce the network weights. Ideally, all the weight values in the reduced weight array meet the requirements; however, those skilled in the art can understand that for an array with a relatively uneven weight distribution, some weight values in the reduced array may overflow after being shifted to the right.
  • this step S400 may include the following:
  • Step S410 Divide each weight value by the second constant to obtain a third weight array
  • Step S420 Save the overflow bits obtained by dividing each weight value by the second constant as a fourth weight array
  • the third weight array is used for storage in a memory cell array to perform the analog vector-matrix multiplication; the fourth weight array is used for storage in another memory cell array or for input to an arithmetic operation circuit.
  • dividing each weight value by the second constant is equivalent to a right shift, and the shifted-out (overflow) bits are saved as the fourth weight array.
  • FIG. 11 shows a schematic diagram of splitting the weight array after reducing it. As shown in FIG. 11, dividing each weight value in the original neural network weight matrix by 8 is equivalent to shifting each binary weight value 3 bits to the right, yielding a standard matrix; the overflow bits are saved as an overflow matrix. Since neither the standard matrix nor the overflow matrix obtained after the shift in the figure is a sparse matrix, two memory cell arrays can be used to process the standard matrix and the overflow matrix separately; the input data streams of the two memory cell arrays are the same, and after the outputs of the two memory cell arrays are combined, they are input to the ADC for conversion.
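The Figure 11 split can be sketched the same way, with the roles of the high and low bits swapped; the names are mine, and the shift amount of 3 follows the figure's divide-by-8 example:

```python
def split_reduced(matrix, shift=3):
    """Divide each weight by 2**shift via a right shift: the quotients form
    the standard matrix and the shifted-out low bits form the overflow
    matrix, so standard * 2**shift + overflow recovers each weight."""
    std = [[w >> shift for w in row] for row in matrix]
    ovf = [[w & ((1 << shift) - 1) for w in row] for row in matrix]
    return std, ovf

std, ovf = split_reduced([[45, 200], [7, 64]])
print(std)  # -> [[5, 25], [0, 8]]
print(ovf)  # -> [[5, 0], [7, 0]]
```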
  • if the weight distribution of a matrix obtained in this way is still not appropriate, the adjustment method provided by the embodiments of the present invention can be applied to it again, repeatedly, until the weight distribution of the matrix is appropriate; for example, if the weights of the new matrix obtained after adjustment are too small, it can be multiplied by a constant to reach a proper distribution.
  • the neural network weight matrix adjustment method further includes:
  • the output of the ADC following the memory cell array corresponding to the third weight array is combined with the output of the ADC following the memory cell array corresponding to the fourth weight array; or
  • the output of the ADC following the memory cell array corresponding to the third weight array is combined with the output of the arithmetic operation circuit.
  • the combination may superimpose the results of the third weight array and the fourth weight array, and may additionally scale the superimposed result up by a certain factor on top of the superposition; the choice can be made according to parameters such as the quantization accuracy of the subsequent circuit.
  • in summary, the neural network weight matrix adjustment method scales up or scales down a neural network weight matrix whose overall distribution is uneven, so that when the processed weight matrix is stored in the memory cell array, the signal obtained from the in-memory calculation is within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the calculation accuracy.
  • an embodiment of the present application also provides a neural network weight matrix adjustment device, which can be used to implement the method described in the foregoing embodiment, as described in the following embodiment. Since the principle of the neural network weight matrix adjustment device to solve the problem is similar to the above method, the implementation of the neural network weight matrix adjustment device can refer to the implementation of the above method, and the repetition will not be repeated.
  • the term "unit" or "module" can be a combination of software and/or hardware that implements a predetermined function.
  • although the devices described in the following embodiments are preferably implemented in software, implementation in hardware or in a combination of software and hardware is also possible and contemplated.
  • FIG. 12 is a structural block diagram of a neural network weight matrix adjustment device in an embodiment of the present invention. As shown in FIG. 12, the neural network weight matrix adjustment device may include: a first judgment module 10, a weight amplification module 20, a second judgment module 30, and a weight reduction module 40.
  • the first judgment module 10 judges whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;
  • the weight amplification module 20, if the weight distribution of the neural network weight matrix is lower than the first preset threshold, multiplies all the weight values in the neural network weight matrix by a first constant;
  • the second judgment module 30, if the weight distribution of the neural network weight matrix is not lower than the first preset threshold, judges whether the weight distribution is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
  • the weight reduction module 40, if the weight distribution of the neural network weight matrix is higher than the second preset threshold, divides all the weight values in the neural network weight matrix by a second constant;
  • wherein the first constant and the second constant are both greater than one.
  • FIG. 13 shows an application scenario of the neural network weight matrix writing control method provided by an embodiment of the present invention. As shown in FIG. 13, in this scenario,
  • the weight to be written is input to the shift register in the in-memory computing chip 2'; after the shift register shifts the weight, the write module writes it into the corresponding memory cell of the memory cell array, and the neural network is then used to process the input.
  • in the application stage, the input data stream is transmitted to the memory cell array after certain preprocessing and undergoes the neural network operation with the neural network weight matrix pre-written in the memory cell array; the output data stream of the memory cell array is converted into a digital signal by the ADC module, and the calculation result is output.
  • FIG. 13 only exemplarily lists a few circuit modules in the in-memory computing chip 2'; in a specific implementation, the in-memory computing chip 2' may also be provided with registers, post-processing modules, and other related functional circuits.
  • each memory cell in the memory cell array can be implemented by a programmable semiconductor device, such as a floating-gate MOS transistor.
  • the shift register is also connected to an external buffer 1' for buffering data.
  • the control module executes the neural network weight matrix writing control method provided by the embodiment of the present invention.
  • the neural network weight matrix writing control method may include the following content:
  • Step S1000: according to the data adjustment instruction, control the shift register to shift the weight values input to it, and store the shifted weight values, the overflow bits, and the addresses of the weight values in the neural network weight matrix into the buffer;
  • the data adjustment instruction includes the shift direction and the number of shift bits.
  • before writing, it is judged whether the weight distribution of the neural network weight matrix is lower than a first preset threshold or higher than a second preset threshold, and if so, a data adjustment instruction is generated.
  • the neural network weight matrix is a neural network weight matrix trained in the neural network training stage, and the weight distribution can be a statistical indicator such as a mean value or a probability distribution.
  • the first preset threshold and the second preset threshold are set by designers according to specific statistical requirements and hardware requirements, where the second preset threshold is greater than the first preset threshold.
• if the weight distribution is low, the shift direction in the data adjustment instruction is to the left, increasing the network weights; the increased weight matrix is then written into the memory cell array, so that the output analog current of the memory cell array is multiplied accordingly (doubled for a one-bit shift).
• the number of shift bits is set by the designer according to the specific statistical requirements and hardware requirements. If the weight distribution of the neural network weight matrix is high, the shift direction in the data adjustment instruction is to the right, reducing the network weights; the reduced weights bring the analog voltage/current output of the memory cell array down into the appropriate range of the ADC, and the number of shift bits is likewise set by the designer according to the specific statistical requirements and hardware requirements.
• the array obtained after the shift operation is used as the standard matrix, and the neural network operation is performed by the memory cell array; the bits that overflow after the shift, together with the address of each weight value in the neural network weight matrix, are saved as an array in the buffer. This array can be a sparse matrix or a normal weight array;
• Step S2000: store the data in the buffer into a memory cell array, or input it into an arithmetic operation circuit. A sparse matrix is input into an arithmetic operation circuit to perform the operation; a normal weight array is written into a memory cell array for processing.
• the data adjustment instruction is generated when the weight distribution of the neural network weight matrix is uneven. The shift register is connected to a write module, and the write module is connected to another memory cell array, for writing the shifted data of the shift register into that other memory cell array.
• the neural network weight matrix writing control method uses a shift register to scale up or down the unevenly distributed neural network weight matrix during the weight writing process, so that the signal obtained after in-memory computation on the memory cell array storing the processed weight matrix falls within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the calculation accuracy.
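As a sketch of Step S1000 above, the fragment below models a left shift (weight amplification) within a fixed storage-cell bit width, capturing the overflowed bits with their addresses as a sparse array — the data that Step S2000 would later route to another memory cell array or an arithmetic operation circuit. The 8-bit cell width and the flat address scheme are illustrative assumptions, not values taken from the patent.

```python
def shift_and_capture(weights, shift_bits, cell_width=8):
    """Left-shift each weight within a cell_width-bit storage cell.

    Bits pushed out of the cell are recorded together with the weight's
    address, mimicking the buffer contents of Step S1000.
    """
    mask = (1 << cell_width) - 1
    shifted, overflow = [], []
    for addr, w in enumerate(weights):
        v = w << shift_bits
        shifted.append(v & mask)       # value written to the main cell array
        if v >> cell_width:            # non-zero overflow -> sparse entry
            overflow.append((addr, v >> cell_width))
    return shifted, overflow

shifted, overflow = shift_and_capture([3, 130, 7], shift_bits=1)
print(shifted)    # [6, 4, 14]
print(overflow)   # [(1, 1)] -- only the weight 130 overflowed its cell
```

A right shift for weight reduction would follow the same pattern, with the bits shifted out at the low end saved as the overflow.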
  • the neural network weight matrix writing control method further includes:
  • the ADC output result of the memory cell array is combined with the ADC output result of the other memory cell array.
• the combination method can superimpose the results of two memory cell arrays, or of a memory cell array and an arithmetic operation circuit; the superimposed result can also be further scaled down or up by a certain multiple, which can be selected according to parameters such as the quantization accuracy of the subsequent circuit.
  • the device explained in the above embodiments may be implemented by a computer chip or entity, or implemented by a product with a certain function.
  • a typical implementation device is an electronic device.
• the electronic device may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
• the electronic device specifically includes a memory, a processor, and a computer program stored in the memory and runnable on the processor, and the processor implements the following steps when executing the program: determining whether the weight distribution of the neural network weight matrix is lower than a first preset threshold; if so, multiplying each weight value in the matrix by a first constant; if not, determining whether the weight distribution is higher than a second preset threshold, the second preset threshold being greater than the first; and if so, dividing each weight value by a second constant, wherein the first constant and the second constant are both greater than one.
• the electronic device provided by the embodiments of the present invention can be used for neural network weight matrix adjustment: by scaling up or down a neural network weight matrix whose overall distribution is uneven, the signal obtained after in-memory computation on the memory cell array storing the processed weight matrix falls within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the calculation accuracy.
  • FIG. 15 shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present application.
• the electronic device 600 includes a central processing unit (CPU)/MCU 601, which can execute various appropriate tasks and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage part 608 into a random access memory (RAM) 603.
• in the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
• the following components are connected to the I/O interface 605: an input part 606 including a keyboard, a mouse, etc.; an output part 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage part 608 including a hard disk, etc.; and a communication part 609 including a network interface card such as a LAN card, a modem, etc. The communication part 609 performs communication processing via a network such as the Internet.
• a drive 610 is also connected to the I/O interface 605 as needed.
• a removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 610 as needed, so that the computer program read from it can be installed into the storage part 608 as needed.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
• the embodiment of the present invention includes a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: determining whether the weight distribution of the neural network weight matrix is lower than a first preset threshold; if so, multiplying each weight value in the matrix by a first constant; if not, determining whether the weight distribution is higher than a second preset threshold; and if so, dividing each weight value by a second constant, wherein the first constant and the second constant are both greater than one.
• the computer-readable storage medium provided by the embodiments of the present invention can be used for neural network weight matrix adjustment, so that the signal obtained after in-memory computation on the memory cell array storing the processed weight matrix falls within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the calculation accuracy.
  • the computer program may be downloaded and installed from the network through the communication part 609, and/or installed from the removable medium 611.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
• Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by computing devices. According to the definition herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
• These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
• These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • this application can be provided as a method, a system, or a computer program product. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in local and remote computer storage media including storage devices.

Abstract

A neural network weight matrix adjustment method, a writing control method, and a related device, suitable for processing a neural network weight matrix before performing neural network operation by a computing-in-memory chip. The method comprises: determining whether weight distribution of a neural network weight matrix is lower than a first preset threshold; if yes, multiplying each weight value in the neural network weight matrix by a first constant; if not, determining whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, wherein the second preset threshold is higher than the first preset threshold; and if the weight distribution of the neural network weight matrix is higher than the second preset threshold, dividing each weight value in the neural network weight matrix by a second constant, wherein the first constant and the second constant are both greater than 1. Thus, a signal obtained after computing-in-memory is performed in a memory cell array storing the processed weight matrix is within the effective range of an ADC. The computing accuracy is improved.

Description

Neural network weight matrix adjustment method, writing control method, and related device

Technical field
The present invention relates to the field of artificial intelligence technology, and in particular to a neural network weight matrix adjustment method, a writing control method, and related devices.
Background
In order to overcome the bottleneck of the traditional von Neumann computing architecture, computing-in-memory (CIM) chips have received widespread attention. Their basic idea is to perform logic computation directly in the memory, thereby reducing the volume and distance of data transfers between memory and processor, lowering power consumption while improving performance.
Owing to the integrated storage-and-computation characteristic of in-memory computing chips, they are well suited to neural network computing scenarios. The weight matrix of a trained neural network algorithm is written in advance into the memory cell array of the in-memory computing chip; the signals to be processed are input in parallel, and, based on Ohm's law and Kirchhoff's law, the vector-matrix multiply-accumulate operation between the signals and the corresponding weights is performed directly in the memory cell array. The output voltage/current signal of the memory cell array is quantized by an ADC (analog-to-digital converter) and used as the output result.
In practical applications, if the weight distribution of the neural network algorithm is too small or too large (see Fig. 1 or Fig. 2, where each circle represents a memory cell, the number inside it represents the pre-stored weight value, and the horizontal direction represents the row inputs, i.e. the input signals: in Fig. 1 the input of the first row is 7, that of the second row is 5, and that of the third row is 3; the downward arrows represent the outputs, e.g. the output of the first column is 7, that of the second column is 7, and that of the third column is 248; the output signal of each column is fed to an ADC that converts the column's analog output into a digital signal for subsequent use; Fig. 1 shows the case where the output voltage/current of the first and second columns is too small and falls below the ADC's lower range limit, while Fig. 2 shows the case where the output voltage/current of all three columns of memory cells is too large and exceeds the ADC's upper range limit), or if the input signals are too small or too large, the analog voltage/current output of the memory cell array may be too small or too large and exceed the lower or upper range limit of the ADC. An ADC usually has the highest quantization accuracy for mid-range values and poorer accuracy near the two ends of its range; when the ADC input exceeds the lower or upper range limit, the corresponding output is directly truncated to the minimum or maximum value, thereby reducing the operation accuracy.
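The truncation effect described above can be illustrated with a small numerical sketch. The weight matrix below is hypothetical, chosen only so that the column sums reproduce the outputs 7, 7, and 248 quoted for Fig. 1; the ADC range limits (16 and 240) are likewise illustrative assumptions, not values from the patent.

```python
import numpy as np

def adc_quantize(column_current, adc_min=16, adc_max=240):
    # Model an ADC whose input is truncated to the minimum or maximum
    # value when it falls outside the effective range.
    return int(np.clip(column_current, adc_min, adc_max))

inputs = np.array([7, 5, 3])          # row inputs, as in Fig. 1
weights = np.array([[1, 1, 29],       # hypothetical stored weight values
                    [0, 0, 9],
                    [0, 0, 0]])

column_currents = (inputs @ weights).tolist()   # analog MAC per column
quantized = [adc_quantize(c) for c in column_currents]
print(column_currents)   # [7, 7, 248]
print(quantized)         # [16, 16, 240] -- every column is truncated
```

Both small columns are clipped up to the lower limit and the large column is clipped down to the upper limit, which is exactly the accuracy loss the invention sets out to avoid.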
Summary of the invention
In view of the problems in the prior art, the present invention provides a neural network weight matrix adjustment method, a writing control method, and related devices, electronic equipment, and computer-readable storage media, which can at least partially solve the problems existing in the prior art.
In order to achieve the above objectives, the present invention adopts the following technical solutions:
In a first aspect, a neural network weight matrix adjustment method is provided, including:
determining whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;

if so, multiplying each weight value in the neural network weight matrix by a first constant;

if not, determining whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;

if the weight distribution of the neural network weight matrix is higher than the second preset threshold, dividing each weight value in the neural network weight matrix by a second constant;

wherein the first constant and the second constant are both greater than 1.
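The first-aspect method can be sketched as follows. Using the mean as the weight-distribution statistic is one of the options the text mentions; all numeric values below are illustrative choices, since the text only requires the statistic, two thresholds, and two constants greater than 1.

```python
import numpy as np

def adjust_weight_matrix(w, low_th, high_th, c1, c2):
    """Rescale the whole matrix when its weight-distribution statistic
    falls below the first threshold or above the second threshold."""
    assert c1 > 1 and c2 > 1 and high_th > low_th
    stat = w.mean()                # distribution statistic (here: mean)
    if stat < low_th:
        return w * c1              # weights too small: amplify
    if stat > high_th:
        return w / c2              # weights too large: reduce
    return w                       # distribution already adequate

small = np.array([[1.0, 2.0], [1.0, 2.0]])   # mean 1.5, below low_th
out = adjust_weight_matrix(small, low_th=4.0, high_th=64.0, c1=4.0, c2=4.0)
print(out)   # every weight multiplied by c1 = 4
```

Scaling the whole matrix by a constant preserves the relative weight values, so the network's output is simply scaled by the same factor and can be rescaled after quantization.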
Further, after multiplying each weight value in the neural network weight matrix by the first constant, the method further includes:

determining whether the number of bits of each weight value in the processed weight matrix exceeds a third preset threshold;

if so, truncating the bits of each weight value that exceed the third preset threshold to obtain a first weight array and a second weight array;

wherein the first weight array is the array remaining after the bits exceeding the third preset threshold are truncated from each weight value, and is used for storing into a memory cell array; the second weight array is the array formed by the truncated bits that exceed the third preset threshold, and is used for storing into another memory cell array or for input into an arithmetic operation circuit.
Further, the neural network weight matrix adjustment method further includes:

if the number of bits of none of the weight values in the processed weight matrix exceeds the third preset threshold, using the processed weight array for storage into a memory cell array.
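A sketch of the overflow handling just described, assuming the "third preset threshold" corresponds to the bit width of a storage cell (8 bits here, an assumption for illustration): weights that fit are stored as-is; otherwise each weight is split into a low part (the first weight array) and a truncated high part (the second weight array).

```python
def split_on_overflow(weights, cell_width=8):
    """If no weight exceeds the cell bit width, store the array directly;
    otherwise split into a first array (low bits, main cell array) and a
    second array (truncated high bits, for another array or an
    arithmetic operation circuit)."""
    mask = (1 << cell_width) - 1
    if all(w <= mask for w in weights):
        return weights, None       # nothing overflows: store as-is
    first = [w & mask for w in weights]        # low bits
    second = [w >> cell_width for w in weights]  # truncated high bits
    return first, second

first, second = split_on_overflow([300, 40, 260])
print(first)    # [44, 40, 4]
print(second)   # [1, 0, 1] -- each original value is second*2**8 + first
```

The split is lossless: the original weight is recovered as `second[i] * 2**cell_width + first[i]`, which is what the later result-combination step exploits.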
Further, the neural network weight matrix adjustment method further includes:

when the second weight array is stored into another memory cell array, combining the ADC output result of the memory cell array corresponding to the first weight array with the ADC output result of the memory cell array corresponding to the second weight array;

when the second weight array is input into the arithmetic operation circuit, combining the ADC output result of the memory cell array corresponding to the first weight array with the output result of the arithmetic operation circuit.
Further, dividing each weight value in the neural network weight matrix by a second constant includes:

dividing each weight value by the second constant to obtain a third weight array;

saving the bits that overflow after each weight value is divided by the second constant as a fourth weight array;

wherein the third weight array is used for storing into a memory cell array, and the fourth weight array is used for storing into another memory cell array or for input into an arithmetic operation circuit.
Further, the neural network weight matrix adjustment method further includes:

when the fourth weight array is stored into another memory cell array, combining the ADC output result of the memory cell array corresponding to the third weight array with the ADC output result of the memory cell array corresponding to the fourth weight array;

when the fourth weight array is input into the arithmetic operation circuit, combining the ADC output result of the memory cell array corresponding to the third weight array with the output result of the arithmetic operation circuit.
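One plausible reading of the result combination: when a weight has been split into low and high bit parts, the partial result for the high-bit array must be weighted by 2**cell_width before superposition, and the superimposed result may then be scaled to suit the quantization precision of the following circuit. The scaling scheme below is an assumption for illustration, not a scheme fixed by the text.

```python
def combine_adc_results(low_result, high_result, cell_width=8, rescale=1.0):
    # Superimpose the two partial results, weighting the high-bit part
    # by 2**cell_width, then apply an optional final scale factor.
    return (low_result + (high_result << cell_width)) * rescale

# With a weight split as 300 = 1*2**8 + 44 and an input of 1, the two
# partial multiply-accumulate results recombine to the unsplit value.
combined = combine_adc_results(low_result=44, high_result=1)
print(combined)   # 300.0
```

The `rescale` factor corresponds to the text's option of scaling the superimposed result up or down by a certain multiple for the subsequent circuit.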
In a second aspect, a neural network weight matrix writing control method is provided, including:

controlling, according to a data adjustment instruction, a shift register to shift each weight value input to it, and storing, in a buffer, the bits that overflow after the shift together with the address of the weight value in the neural network weight matrix, where the data adjustment instruction includes a shift direction and a number of shift bits;

storing the data in the buffer into a memory cell array or inputting it into an arithmetic operation circuit;

wherein the data adjustment instruction is generated when the weight distribution of the neural network weight matrix is uneven; the shift register is connected to a write module, and the write module is connected to another memory cell array for writing the shifted data of the shift register into that other memory cell array.
Further, the neural network weight matrix writing control method further includes:

when the data in the buffer is input into the arithmetic operation circuit, combining the ADC output result of the other memory cell array with the output result of the arithmetic operation circuit;

when the data in the buffer is stored into a memory cell array, combining the ADC output result of that memory cell array with the ADC output result of the other memory cell array.
In a third aspect, a neural network weight matrix adjustment device is provided, including:

a first judgment module, which determines whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;

a weight amplification module, which, if the weight distribution of the neural network weight matrix is lower than the first preset threshold, multiplies each weight value in the neural network weight matrix by a first constant;

a second judgment module, which, if the weight distribution of the neural network weight matrix is not lower than the first preset threshold, determines whether the weight distribution is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;

a weight reduction module, which, if the weight distribution of the neural network weight matrix is higher than the second preset threshold, divides each weight value in the neural network weight matrix by a second constant;

wherein the first constant and the second constant are both greater than 1.
In a fourth aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, the processor implementing the steps of the above neural network weight matrix adjustment method when executing the program.
In a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the above neural network weight matrix adjustment method.
The neural network weight matrix adjustment method, writing control method, and related devices, electronic equipment, and computer-readable storage media provided by the present invention are suitable for processing a trained neural network weight matrix before a neural network operation is performed by an in-memory computing chip. The method includes: determining whether the weight distribution of the neural network weight matrix is lower than a first preset threshold; if so, multiplying each weight value in the matrix by a first constant; if not, determining whether the weight distribution is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold; and, if the weight distribution is higher than the second preset threshold, dividing each weight value in the matrix by a second constant, where the first constant and the second constant are both greater than 1. By scaling up or down a neural network weight matrix whose overall distribution is uneven, the signal obtained after in-memory computation on the memory cell array storing the processed weight matrix falls within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving the operation accuracy.
In order to make the above and other objectives, features, and advantages of the present invention more comprehensible, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work. In the drawings:
Fig. 1 shows the case where, when a matrix operation is performed in the memory cell array of an in-memory computing chip, the output current/voltage of a memory cell column is too small and falls below the lower range limit of the ADC;

Fig. 2 shows the case where, when a matrix operation is performed in the memory cell array of an in-memory computing chip, the output current/voltage of a memory cell column is too large and exceeds the upper range limit of the ADC;

Fig. 3 shows an application scenario of the neural network weight matrix adjustment method provided by an embodiment of the present invention;

Fig. 4 is a first schematic flowchart of the neural network weight matrix adjustment method in an embodiment of the present invention;

Fig. 5 shows a schematic diagram of the principle of the neural network weight matrix adjustment method provided by an embodiment of the present invention;

Fig. 6 shows the case where a weight array adjusted by the neural network weight matrix adjustment method provided by an embodiment of the present invention contains weight values whose number of bits overflows;

Fig. 7 is a second schematic flowchart of the neural network weight matrix adjustment method in an embodiment of the present invention;

Fig. 8 shows one schematic diagram of splitting the weight array after amplifying it in an embodiment of the present invention;

Fig. 9 shows another schematic diagram of splitting the weight array after amplifying it in an embodiment of the present invention;

Fig. 10 shows the specific steps of step S400 in an embodiment of the present invention;

Fig. 11 shows a schematic diagram of splitting the weight array after reducing it in an embodiment of the present invention;

Fig. 12 is a structural block diagram of the neural network weight matrix adjustment device in an embodiment of the present invention;

Fig. 13 shows an application scenario of the neural network weight matrix writing control method provided by an embodiment of the present invention;

Fig. 14 is a schematic flowchart of the neural network weight matrix writing control method in an embodiment of the present invention;

Fig. 15 is a structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative work shall fall within the protection scope of the present application.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
It should be noted that the terms "including" and "having" in the specification and claims of the present application and in the above drawings, as well as any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 3 shows an application scenario of the neural network weight matrix adjustment method provided by an embodiment of the present invention. As shown in Fig. 3, compiling software 1 is connected to the programming circuit in computing-in-memory chip 2 and is used to write the neural network weight matrix, after it has been adjusted by the neural network weight matrix adjustment method provided by the embodiment of the present invention, into the memory cell array in computing-in-memory chip 2 through the programming circuit. When the neural network is applied to process input data, the input data stream is transmitted to the memory cell array after certain preprocessing, where it undergoes neural network operations with the neural network weight matrix pre-written into the memory cell array; the output data stream of the memory cell array is converted into a digital signal by the ADC module, and the operation result is output.
It is worth noting that Fig. 3 only exemplarily lists a few circuit modules in computing-in-memory chip 2. Those skilled in the art will understand that computing-in-memory chip 2 may also be provided with related functional circuits such as registers and post-processing modules. In addition, those skilled in the art will understand that the programming circuit may also be called a write module or a read/write module, and is used to program, i.e., write data into, each memory cell in the memory cell array (each memory cell may be implemented with a programmable semiconductor device, such as a floating-gate MOS transistor).
The compiling software may be compilation processing software that has already been developed or has yet to be developed, or it may be a programmed computer program; it may be executed on a computer device, on a processing chip, or on a mobile portable device, and the embodiments of the present invention place no restriction on this.
In addition, for a computing-in-memory chip intended for applications requiring repeated programming, a programming circuit may be provided inside the chip; for applications that do not require adjustment of the neural network weight matrix, in order to reduce the chip area, the programming circuit may be omitted, and the adjusted final neural network weight matrix is instead written into the computing-in-memory chip in advance at the factory by a programming device.
Fig. 4 is a first flowchart of the neural network weight matrix adjustment method in an embodiment of the present invention. As shown in Fig. 4, the neural network weight matrix adjustment method may include the following steps:
Step S100: determine whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;
if yes, perform step S200; if no, perform step S300.
Here, the neural network weight matrix is a weight matrix that has been trained in the neural network training stage, and the weight distribution may be a statistical index such as a mean value or a probability distribution, which can be set by the designer according to the actual range of the ADC. The first preset threshold is set by the designer according to specific statistical requirements and hardware requirements.
Step S200: multiply all weight values in the neural network weight matrix by a first constant;
that is, multiply all weight values in the neural network weight matrix by a constant N to increase the network weights. After the increased weight matrix is written into the memory cell array, then during application, for the same input data stream, the output analog current of the memory cell array is increased by the same factor.
The constant N is usually set to an integer power of 2, in which case the multiplication is equivalent to a bit shift.
For example, referring to Fig. 5, for a binary weight matrix, the original neural network weight matrix is multiplied by 8, i.e., each binary number is shifted left by 3 bits, to obtain the amplified weight matrix.
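The multiply-by-shift equivalence in the Fig. 5 example can be sketched in a few lines. This is a minimal illustration assuming integer weights held in a NumPy array; the function name and the ×8 factor are illustrative, not part of the chip's actual programming flow.

```python
import numpy as np

def amplify_weights(weights, factor=8):
    """Multiply every weight by a power-of-two factor.

    For integer weights this is identical to a left shift:
    w * 8 == w << 3, so the compiling software can realize
    the amplification as a pure bit shift.
    """
    assert factor & (factor - 1) == 0, "factor must be a power of 2"
    shift = factor.bit_length() - 1   # 8 -> 3
    return weights << shift           # same result as weights * factor

w = np.array([[1, 3], [2, 5]], dtype=np.int32)
amplified = amplify_weights(w, factor=8)
# each entry is eight times the original value
```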
Step S300: determine whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
if yes, perform step S400; if no, perform step S500.
The second preset threshold is set by the designer according to specific statistical requirements and hardware requirements.
Step S400: divide all weight values in the neural network weight matrix by a second constant;
here, the first constant and the second constant are both greater than 1, and the first constant and the second constant may be the same or different. It is worth noting that multiplying by 2 and dividing by 1/2 yield the same result; therefore, the expressions "multiply" and "divide" do not limit the scope of protection claimed by the present invention, and any identical or equivalent operation shall be included in the scope of protection of the present invention.
That is, if the weight distribution is higher than the second preset threshold, then after the weight matrix is written into the memory cell array, the analog voltage/current output by the memory cell array in the application stage may exceed the quantization upper limit of the ADC. In this case, all weights are divided by a constant M (M is usually an integer power of 2, equivalent to a right shift) to reduce the network weights, so that in the application stage, for the same input data stream, the reduced network weights bring the analog voltage/current output by the memory cell array down into a suitable range of the ADC.
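The decision flow of steps S100 through S500 can be sketched as follows. As a stated assumption, the mean absolute weight is used here as the "weight distribution" statistic (the text allows any statistical index), and the threshold values and the scaling constants n and m are purely illustrative.

```python
import numpy as np

def adjust_weight_matrix(w, low_thresh, high_thresh, n=8, m=8):
    """Sketch of the Fig. 4 flow (steps S100-S500).

    low_thresh/high_thresh play the roles of the first and second
    preset thresholds; n and m are the first and second constants.
    """
    stat = np.abs(w).mean()
    if stat < low_thresh:       # S100 -> S200: weights too small, amplify
        return w * n
    if stat > high_thresh:      # S300 -> S400: weights too large, reduce
        return w / m
    return w                    # S500: write into the array as-is

w = np.array([[1.0, 2.0], [3.0, 2.0]])              # mean |w| = 2.0
amplified = adjust_weight_matrix(w, 4.0, 100.0)     # 2.0 < 4.0 -> * 8
reduced = adjust_weight_matrix(w, 0.5, 1.0)         # 2.0 > 1.0 -> / 8
unchanged = adjust_weight_matrix(w, 0.5, 100.0)     # in range -> unchanged
```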
Step S500: store the neural network weight matrix into a memory cell array.
The neural network weight matrix in this step refers to the neural network weight matrix of step S100 (if its weight distribution is neither lower than the first preset threshold nor higher than the second preset threshold), or the weight matrix obtained after processing in step S200, or the weight matrix obtained after processing in step S400.
Specifically, the weight matrix is stored in a memory cell array. In the application stage, an input data stream is fed into the memory cell array, where it undergoes an analog vector-matrix multiplication with the weight array; the operation result is transmitted, in the form of the analog voltage/current output by the memory cell array, to the ADC behind the memory cell array, which converts the analog voltage/current into a digital signal.
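The effect the adjustment is protecting against can be seen in a small behavioral model of one array-plus-ADC stage. The linear ADC transfer function, its bit width, and its saturation range below are illustrative assumptions; a real converter's characteristics would come from the hardware design.

```python
import numpy as np

def in_memory_layer(x, w, adc_max, adc_bits=8):
    """Behavioral model: memory cell array followed by an ADC.

    The array computes the analog vector-matrix product x @ w; the
    ADC quantizes each column output to adc_bits levels, clipping
    anything above adc_max. Clipping (saturation) and coarse codes
    are exactly the precision losses the weight scaling avoids.
    """
    analog = x @ w                                # analog VMM in the array
    levels = 2 ** adc_bits - 1
    clipped = np.clip(analog, 0, adc_max)         # ADC saturation
    return np.round(clipped / adc_max * levels)   # digital output codes

x = np.array([1.0, 2.0])
w = np.array([[0.1, 3.0], [0.2, 4.0]])
codes = in_memory_layer(x, w, adc_max=10.0)
# first column (analog 0.5) uses only a few of the 255 codes, so its
# result is coarsely quantized; second column (analog 11.0) saturates
```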
In summary, the neural network weight matrix adjustment method provided by the embodiment of the present invention scales up or scales down, by a fixed factor, a neural network weight matrix whose overall distribution is uneven, so that after the processed weight matrix is stored in the memory cell array and in-memory computation is performed, the resulting signal lies within the effective range of the ADC (the ADC is arranged behind the memory cell array and is used to convert the output of each memory cell column into a digital signal), thereby improving the operation precision.
It is worth noting that, for step S200 in which all weight values in the neural network weight matrix are multiplied by a first constant to increase the network weights, ideally all weight values in the increased weight array satisfy the requirements. However, those skilled in the art will understand that for an array whose weight distribution is relatively uneven, some of the larger weight values in the increased weight array may exceed the upper bit limit (this can also be described as overflow bits existing after the shift). For example, referring to Fig. 6, assuming that the weight precision is 8 bits, after multiplication by the constant 8 (a left shift of 3 bits), the weight value in the third row and second column will exceed the upper bit limit. For this case, referring to Fig. 7, the neural network weight matrix adjustment method provided by the embodiment of the present invention may further include:
Step S600: determine whether the number of bits of each weight value in the processed weight matrix exceeds a third preset threshold;
if yes, perform step S700; if no, perform step S500.
The third preset threshold may be the precision in bits of the algorithm, such as 8 bits or 16 bits, and is set by the designer according to specific statistical requirements and hardware requirements.
Step S700: truncate, from each weight value, the bits exceeding the third preset threshold, to obtain a first weight array and a second weight array;
here, the first weight array (which may also be called the standard matrix) is the weight array remaining after the bits of each weight value that exceed the third preset threshold have been truncated; it is stored in the memory cell array of the computing-in-memory chip to perform analog vector-matrix multiplication. The second weight array is the weight array formed by the truncated bits of each weight value that exceed the third preset threshold; it is stored in another memory cell array or input to an arithmetic operation circuit. It is worth noting that the second weight array may be a sparse matrix or an ordinary weight matrix.
For a sparse matrix, the sparse matrix may be input into an arithmetic operation circuit to perform the operation; one input of the operation is an element of the sparse matrix, and the other input is the input data corresponding to the weight value to which that element corresponds in the above input data stream.
For example, the arithmetic operation circuit may be a conventional digital circuit such as a multiplier, or it may be a CPU; specifically, the sparse matrix may first be stored in a memory and then loaded from the memory into the CPU for multiplication.
When the second weight array is an ordinary weight array, it may be stored in another memory cell array, and the other memory cell array performs analog vector-matrix multiplication on the second weight array and the input data stream.
It is worth noting that when performing the truncation, the addresses of the data need to be recorded correspondingly, so that when the truncated matrix is operated on, the corresponding input data can be selected.
Fig. 8 shows a schematic diagram of splitting the weight array after amplifying it in an embodiment of the present invention. As shown in Fig. 8, corresponding to the bit-overflow situation shown in Fig. 6, the portion above 8 bits is truncated and placed into a new weight matrix, which is a sparse matrix. The standard matrix is still processed in the computing-in-memory manner (vector-matrix multiplication), while the sparse matrix portion can be processed by a conventional digital circuit; finally, combining the two gives the final output.
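The split and recombination of Fig. 8 can be sketched numerically. This is an illustrative model only: the 8-bit width matches the Fig. 6 example, and the recombination by summing the two paths' partial products (with the overflow path weighted by 2^bits) is one possible realization of "combining the two".

```python
import numpy as np

def split_on_overflow(w, bits=8):
    """Split an amplified weight matrix as in Fig. 8.

    `standard` keeps the low `bits` bits of each weight and stays in
    the memory cell array; `overflow` holds the truncated high bits
    (shifted back down) and is typically sparse. By construction,
    w == standard + overflow * 2**bits.
    """
    mask = (1 << bits) - 1
    standard = w & mask     # processed by in-memory VMM
    overflow = w >> bits    # processed by a digital circuit / CPU
    return standard, overflow

w = np.array([[300, 7], [12, 513]])   # two entries exceed 8 bits
standard, overflow = split_on_overflow(w)

# recombination: both paths see the same input data stream, and the
# overflow path's partial product is scaled by 2**bits before summing
x = np.array([1, 2])
combined = x @ standard + (x @ overflow) * 256
assert (combined == x @ w).all()      # split + recombine is lossless
```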
Fig. 8 explains the principle of the embodiment of the present invention step by step. In actual implementation, in order to effectively simplify the procedure, referring to Fig. 9, the overflow bits may be stored directly into another matrix while the weights are being increased, without distinguishing between the amplification step and the splitting step.
In an optional embodiment, the adjustment method may further include:
when the second weight array is stored in another memory cell array, combining the ADC output result behind the memory cell array corresponding to the first weight array with the ADC output result behind the memory cell array corresponding to the second weight array;
when the second weight array is input to the arithmetic operation circuit, combining the ADC output result behind the memory cell array corresponding to the first weight array with the output result of the arithmetic operation circuit.
Those skilled in the art will understand that the combination may be a superposition of the results of the first weight array and the second weight array, or the superposed result may additionally be reduced by a certain factor; the specific choice may be made according to parameters such as the quantization precision of the subsequent circuit.
In an optional embodiment, for step S400 in which all weight values in the neural network weight matrix are divided by a second constant to reduce the network weights, ideally all weight values in the reduced weight array satisfy the requirements. However, those skilled in the art will understand that for an array whose weight distribution is relatively uneven, some weight values in the reduced weight array may produce overflow bits after being shifted right. For this case, referring to Fig. 10, step S400 may include the following:
Step S410: divide each weight value by the second constant to obtain a third weight array;
Step S420: save the overflow bits produced by dividing each weight value by the second constant as a fourth weight array;
here, the third weight array is stored in a memory cell array to perform analog vector-matrix multiplication, and the fourth weight array is stored in another memory cell array or input to an arithmetic operation circuit. For details of the operation on the fourth weight array, refer to the operation on the second weight array, which will not be repeated here.
Specifically, dividing each weight value by the second constant is equivalent to shifting it right, and the overflow bits are saved as the fourth weight array.
It is worth noting that when performing the shift, or when saving the overflow bits as a matrix, the addresses of the data (i.e., their addresses in the matrix) need to be recorded correspondingly, so that when the truncated matrix is operated on, the corresponding input data can be selected.
Fig. 11 shows a schematic diagram of splitting the weight array after reducing it in an embodiment of the present invention. As shown in Fig. 11, dividing each weight value in the original neural network weight matrix by 8 is equivalent to shifting each binary weight value right by 3 bits, which yields a standard matrix; the overflow bits are saved as an overflow matrix. Since neither the standard matrix nor the overflow matrix obtained after the shift in the figure is a sparse matrix, two memory cell arrays may be used to process the standard matrix and the overflow matrix respectively; the input data streams of the two memory cell arrays are identical, and after the outputs of the two memory cell arrays are combined, they are input to the ADC for conversion.
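The right-shift split of Fig. 11 can be sketched the same way as the amplification case. As an illustrative assumption, the shift amount 3 (division by 8) follows the figure's example, and lossless recombination is shown by weighting the standard path by 2^shift; in hardware the two arrays' analog outputs would be combined before or at the ADC.

```python
import numpy as np

def split_on_right_shift(w, shift=3):
    """Split a weight matrix reduced by a right shift, as in Fig. 11.

    `standard` holds w >> shift (w divided by 8); `remainder` holds
    the bits shifted out (the 'overflow matrix'). By construction,
    w == standard * 2**shift + remainder.
    """
    standard = w >> shift
    remainder = w & ((1 << shift) - 1)
    return standard, remainder

w = np.array([[25, 9], [14, 40]])
standard, remainder = split_on_right_shift(w)

# both memory cell arrays see the same input data stream
x = np.array([2, 1])
combined = (x @ standard) * 8 + x @ remainder
assert (combined == x @ w).all()   # no information is lost in the split
```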
It is worth noting that if the weight distribution of a new matrix produced by the adjustment (e.g., the standard matrix, sparse matrix, or overflow matrix shown in Fig. 8, Fig. 9, or Fig. 11) is still uneven, the adjustment method provided by the embodiments of the present invention may be applied again until the weight distribution of the matrix is suitable; for example, if the weights of a newly produced matrix are too small, it may be multiplied by a further constant to adjust it to a suitable distribution.
In a further embodiment, the neural network weight matrix adjustment method further includes:
when the fourth weight array is stored in another memory cell array, combining the ADC output result behind the memory cell array corresponding to the third weight array with the ADC output result behind the memory cell array corresponding to the fourth weight array;
when the fourth weight array is input to the arithmetic operation circuit, combining the ADC output result behind the memory cell array corresponding to the third weight array with the output result of the arithmetic operation circuit.
Those skilled in the art will understand that the combination may be a superposition of the results of the third weight array and the fourth weight array, or the superposed result may additionally be enlarged by a certain factor; the specific choice may be made according to parameters such as the quantization precision of the subsequent circuit.
In summary, the neural network weight matrix adjustment method provided by the embodiment of the present invention scales up or scales down, by a fixed factor, a neural network weight matrix whose overall distribution is uneven, so that after the processed weight matrix is stored in the memory cell array and in-memory computation is performed, the resulting signal lies within the effective range of the ADC (the ADC is arranged behind the memory cell array and is used to convert the output of each memory cell column into a digital signal), thereby improving the operation precision.
Based on the same inventive concept, an embodiment of the present application also provides a neural network weight matrix adjustment device, which can be used to implement the method described in the above embodiments, as described in the following embodiment. Since the principle by which the neural network weight matrix adjustment device solves the problem is similar to the above method, the implementation of the device may refer to the implementation of the above method, and repeated points will not be described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceived.
Fig. 12 is a structural block diagram of the neural network weight matrix adjustment device in an embodiment of the present invention. As shown in Fig. 12, the device may include: a first judgment module 10, a weight amplification module 20, a second judgment module 30, and a weight reduction module 40.
The first judgment module 10 determines whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;
the weight amplification module 20 multiplies all weight values in the neural network weight matrix by a first constant if the weight distribution of the neural network weight matrix is lower than the first preset threshold;
the second judgment module 30 determines, if the weight distribution of the neural network weight matrix is not lower than the first preset threshold, whether the weight distribution is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
the weight reduction module 40 divides all weight values in the neural network weight matrix by a second constant if the weight distribution of the neural network weight matrix is higher than the second preset threshold;
where the first constant and the second constant are both greater than 1.
An embodiment of the present invention further provides a neural network weight matrix writing control method. Fig. 13 shows an application scenario of the neural network weight matrix writing control method provided by an embodiment of the present invention. As shown in Fig. 13, in this scenario, the weights to be written are input to the shift register in computing-in-memory chip 2'; after the shift register shifts the weights, they are written by the write module into the corresponding memory cells of the memory cell array. When the neural network is applied to process input data, the input data stream is transmitted to the memory cell array after certain preprocessing, where it undergoes neural network operations with the neural network weight matrix pre-written into the memory cell array; the output data stream of the memory cell array is converted into a digital signal by the ADC module, and the operation result is output.
It is worth noting that Fig. 13 only exemplarily lists a few circuit modules in computing-in-memory chip 2'. Those skilled in the art will understand that computing-in-memory chip 2' may also be provided with related functional circuits such as registers and post-processing modules. In addition, each memory cell in the memory cell array may be implemented with a programmable semiconductor device, such as a floating-gate MOS transistor.
In addition, the shift register is also connected to an off-chip buffer 1' for buffering data.
The control module executes the neural network weight matrix writing control method provided by the embodiment of the present invention. Referring to Fig. 14, the neural network weight matrix writing control method may include the following:
Step S1000: control, according to a data adjustment instruction, the shift register to perform a shift operation on each weight value input to it, and store, into a buffer, the bits that overflow after the shift register shifts an input weight value, together with the address of that weight value in the neural network weight matrix;
where the data adjustment instruction includes: a shift direction and a number of shift bits.
Specifically, it may be determined in advance whether the weight distribution of the neural network weight matrix is lower than a first preset threshold or higher than a second preset threshold, and if so, the data adjustment instruction is generated.
In addition, the neural network weight matrix is a weight matrix trained in the neural network training stage, and the weight distribution may be a statistical index such as a mean value or a probability distribution. The first preset threshold and the second preset threshold are set by the designer according to specific statistical requirements and hardware requirements, where the second preset threshold is greater than the first preset threshold.
If the weight distribution of the neural network weight matrix is lower than the first preset threshold, the shift direction in the data adjustment instruction is leftward, so as to increase the network weights; after the increased weight matrix is written into the memory cell array, for the same input data stream during application, the output analog current of the memory cell array is increased by the same factor, and the number of shift bits is set by the designer according to specific statistical requirements and hardware requirements. If the weight distribution of the neural network weight matrix is higher than the second preset threshold, the shift direction in the data adjustment instruction is rightward, so as to reduce the network weights; then, in the application stage, for the same input data stream, the reduced network weights bring the analog voltage/current output by the memory cell array down into a suitable range of the ADC, and the number of shift bits is likewise set by the designer according to specific statistical requirements and hardware requirements.
It is worth noting that the array obtained after the shift operation serves as the standard matrix, on which the memory cell array performs the neural network operation; the bits that overflow after the shift, together with the addresses of the corresponding weight values in the neural network weight matrix, are saved as a separate array in the buffer, and this array may be a sparse matrix or an ordinary weight array.
Step S2000: store the data in the buffer into a memory cell array or input it to an arithmetic operation circuit;
specifically, when the data in the buffer is a sparse matrix, the sparse matrix is input to an arithmetic operation circuit to perform the operation; when the data in the buffer is an ordinary weight matrix, it is input to a memory cell array for processing. For details, refer to the processing of the second weight array, which will not be repeated here.
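The per-weight behavior of step S1000 can be modeled as follows. The register width, the direction encoding, and the (overflow value, address) record format are illustrative assumptions; the text only requires that the overflow bits and the weight's address in the matrix be buffered.

```python
def shift_and_capture(weight, direction, shift, bits=8, addr=None):
    """Model of the step-S1000 shift register behavior.

    A weight is shifted left or right by `shift` bits inside a
    `bits`-wide register; the bits that fall off either end are
    returned together with the weight's address in the matrix so
    they can be stored in the buffer.
    """
    mask = (1 << bits) - 1
    if direction == "left":
        shifted = (weight << shift) & mask
        overflow = weight >> (bits - shift)       # bits pushed out the top
    else:
        shifted = weight >> shift
        overflow = weight & ((1 << shift) - 1)    # bits pushed out the bottom
    record = (overflow, addr) if overflow else None
    return shifted, record

# left shift by 3 in an 8-bit register: 0b01100001 loses its top 3 bits
shifted, rec = shift_and_capture(0b01100001, "left", 3, addr=(2, 1))
assert shifted == 0b00001000
assert rec == (0b011, (2, 1))   # overflow bits plus matrix address
```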
Here, the data adjustment instruction is generated when the weight distribution of the neural network weight matrix is uneven; the shift register is connected to a write module, and the write module is connected to another memory cell array and is used to write the data shifted by the shift register into the other memory cell array.
The neural network weight matrix writing control method provided by the embodiment of the present invention uses a shift register to scale up or scale down, by a fixed factor during the weight writing process, a neural network weight matrix whose overall distribution is uneven, so that after the processed weight matrix is stored in the memory cell array and in-memory computation is performed, the resulting signal lies within the effective range of the ADC (the ADC is arranged behind the memory cell array and is used to convert the output of each memory cell column into a digital signal), thereby improving the operation precision.
In an optional embodiment, the neural network weight matrix writing control method further includes:
when the data in the buffer is input to the arithmetic operation circuit, combining the ADC output result behind the other memory cell array with the output result of the arithmetic operation circuit;
when the data in the buffer is stored in a memory cell array, combining the ADC output result behind that memory cell array with the ADC output result behind the other memory cell array.
Those skilled in the art will understand that the combination may be a superposition of the results of the two memory cell arrays, or of one memory cell array and one arithmetic operation circuit, and the superposed result may additionally be reduced or enlarged by a certain factor; the specific choice may be made according to parameters such as the quantization precision of the subsequent circuit.
The apparatus set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is an electronic device; specifically, the electronic device may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or any combination of these devices.
In a typical example, the electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the following steps when executing the program:
determining whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;
if so, multiplying all weight values in the neural network weight matrix by a first constant;
if not, determining whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
if the weight distribution of the neural network weight matrix is higher than the second preset threshold, dividing all weight values in the neural network weight matrix by a second constant;
where the first constant and the second constant are both greater than 1.
As can be seen from the above description, the electronic device provided by the embodiments of the present invention can be used for neural network weight matrix adjustment: a weight matrix whose overall distribution is uneven is scaled up or down by a constant factor so that the signal obtained by in-memory computation on the memory cell array storing the processed weight matrix falls within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving computation accuracy.
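The steps above can be sketched as follows. The particular distribution statistic (mean absolute weight) and the threshold and constant values are illustrative assumptions, since the embodiments leave them open:

```python
def adjust_weight_matrix(w, low_thresh=0.1, high_thresh=1.0, c1=4.0, c2=4.0):
    """Scale an unevenly distributed weight matrix so that in-memory
    computation on the stored matrix stays within the ADC's effective range.
    Returns the adjusted matrix and the applied scale factor, so that a
    later stage can undo the scaling."""
    flat = [abs(v) for row in w for v in row]
    spread = sum(flat) / len(flat)          # assumed distribution statistic
    if spread < low_thresh:                 # distribution too low: amplify
        return [[v * c1 for v in row] for row in w], c1
    if spread > high_thresh:                # distribution too high: shrink
        return [[v / c2 for v in row] for row in w], 1.0 / c2
    return w, 1.0                           # already within range
```

A downstream stage would divide the in-memory computation result by the returned factor to recover the unscaled product.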
Reference is now made to FIG. 15, which shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present application.
As shown in FIG. 15, the electronic device 600 includes a central processing unit (CPU)/MCU 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to an embodiment of the present invention, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present invention includes a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
determining whether the weight distribution of the neural network weight matrix is lower than a first preset threshold;
if so, multiplying all weight values in the neural network weight matrix by a first constant;
if not, determining whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, where the second preset threshold is greater than the first preset threshold;
if the weight distribution of the neural network weight matrix is higher than the second preset threshold, dividing all weight values in the neural network weight matrix by a second constant;
where the first constant and the second constant are both greater than 1.
As can be seen from the above description, the computer-readable storage medium provided by the embodiments of the present invention can be used for neural network weight matrix adjustment: a weight matrix whose overall distribution is uneven is scaled up or down by a constant factor so that the signal obtained by in-memory computation on the memory cell array storing the processed weight matrix falls within the effective range of the ADC (the ADC is arranged after the memory cell array and converts the output of each memory cell column into a digital signal), thereby improving computation accuracy.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
It should also be noted that the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and for relevant details reference may be made to the description of the method embodiments.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (11)

  1. A neural network weight matrix adjustment method, characterized by comprising:
    determining whether the weight distribution of a neural network weight matrix is lower than a first preset threshold;
    if so, multiplying all weight values in the neural network weight matrix by a first constant;
    if not, determining whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, wherein the second preset threshold is greater than the first preset threshold;
    if the weight distribution of the neural network weight matrix is higher than the second preset threshold, dividing all weight values in the neural network weight matrix by a second constant;
    wherein the first constant and the second constant are both greater than 1.
  2. The neural network weight matrix adjustment method according to claim 1, characterized in that, after multiplying all weight values in the neural network weight matrix by the first constant, the method further comprises:
    determining whether the number of bits of each weight value in the processed weight matrix exceeds a third preset threshold;
    if so, truncating, for each weight value, the bits that exceed the third preset threshold to obtain a first weight array and a second weight array;
    wherein the first weight array is the weight array remaining after the bits of each weight value exceeding the third preset threshold are truncated off, and is to be stored in a memory cell array; the second weight array is the weight array formed by the truncated-off bits of each weight value exceeding the third preset threshold, and is to be stored in another memory cell array or input to an arithmetic operation circuit.
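An illustrative software sketch of this bit-splitting, outside the claims themselves; the bit-count threshold `keep_bits` and the low/high split convention are assumptions:

```python
def split_weight_array(weights, keep_bits=4):
    """Split each weight into a low part (first weight array, stored in one
    memory cell array) and a high part (second weight array, stored in another
    array or fed to an arithmetic operation circuit)."""
    low_mask = (1 << keep_bits) - 1
    first = [w & low_mask for w in weights]     # bits remaining after truncation
    second = [w >> keep_bits for w in weights]  # truncated-off excess bits
    return first, second
```

Recombination later re-weights `second` by 2**keep_bits, which is what the ADC-output combination described for the two arrays achieves.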
  3. The neural network weight matrix adjustment method according to claim 2, further comprising:
    if the number of bits of each weight value in the processed weight matrix does not exceed the third preset threshold, storing the processed weight array in a memory cell array.
  4. The neural network weight matrix adjustment method according to claim 2, further comprising:
    when the second weight array is stored in another memory cell array, combining the ADC output of the memory cell array corresponding to the first weight array with the ADC output of the memory cell array corresponding to the second weight array;
    when the second weight array is input to the arithmetic operation circuit, combining the ADC output of the memory cell array corresponding to the first weight array with the output of the arithmetic operation circuit.
  5. The neural network weight matrix adjustment method according to claim 1, characterized in that dividing all weight values in the neural network weight matrix by a second constant comprises:
    dividing each weight value by the second constant to obtain a third weight array;
    saving the overflow bits produced by dividing each weight value by the second constant as a fourth weight array;
    wherein the third weight array is to be stored in a memory cell array, and the fourth weight array is to be stored in another memory cell array or input to an arithmetic operation circuit.
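An illustrative sketch of this division step, outside the claims themselves; interpreting the "overflow bits" of the division as the integer remainder is an assumption:

```python
def divide_weight_array(weights, c2=4):
    """Divide each weight by the second constant: the quotients form the third
    weight array and the remainders form the fourth weight array, so that
    c2 * third + fourth reconstructs the original weights exactly."""
    third = [w // c2 for w in weights]
    fourth = [w % c2 for w in weights]
    return third, fourth
```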
  6. The neural network weight matrix adjustment method according to claim 5, further comprising:
    when the fourth weight array is stored in another memory cell array, combining the ADC output of the memory cell array corresponding to the third weight array with the ADC output of the memory cell array corresponding to the fourth weight array;
    when the fourth weight array is input to the arithmetic operation circuit, combining the ADC output of the memory cell array corresponding to the third weight array with the output of the arithmetic operation circuit.
  7. A neural network weight matrix writing control method, characterized by comprising:
    controlling, according to a data adjustment instruction, a shift register to perform a shift operation on each weight value input to it, and storing in a buffer the bits that overflow when the shift register shifts an input weight value, together with the address of that weight value in the neural network weight matrix, wherein the data adjustment instruction comprises a shift direction and a number of shift bits;
    storing the data in the buffer into a memory cell array, or inputting it to an arithmetic operation circuit;
    wherein the data adjustment instruction is generated when the weight distribution of the neural network weight matrix is uneven; the shift register is connected to a write module; and the write module is connected to another memory cell array and is used to write the shifted data from the shift register into the other memory cell array.
  8. The neural network weight matrix writing control method according to claim 7, further comprising:
    when the data in the buffer is input to the arithmetic operation circuit, combining the ADC output of the other memory cell array with the output of the arithmetic operation circuit;
    when the data in the buffer is stored in a memory cell array, combining the ADC output of that memory cell array with the ADC output of the other memory cell array.
  9. A neural network weight matrix adjustment apparatus, characterized by comprising:
    a first judgment module, configured to determine whether the weight distribution of a neural network weight matrix is lower than a first preset threshold;
    a weight amplification module, configured to multiply all weight values in the neural network weight matrix by a first constant if the weight distribution of the neural network weight matrix is lower than the first preset threshold;
    a second judgment module, configured to determine, if the weight distribution of the neural network weight matrix is not lower than the first preset threshold, whether the weight distribution of the neural network weight matrix is higher than a second preset threshold, wherein the second preset threshold is greater than the first preset threshold;
    a weight reduction module, configured to divide all weight values in the neural network weight matrix by a second constant if the weight distribution of the neural network weight matrix is higher than the second preset threshold;
    wherein the first constant and the second constant are both greater than 1.
  10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the neural network weight matrix adjustment method according to any one of claims 1 to 6 when executing the program.
  11. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the neural network weight matrix adjustment method according to any one of claims 1 to 6 when executed by a processor.
PCT/CN2020/075648, filed 2020-02-18: Neural network weight matrix adjustment method, writing control method, and related device (WO2021163866A1)
Publication: WO2021163866A1. Family ID: 77390305.


