CN110097186B - Neural network heterogeneous quantization training method - Google Patents

Neural network heterogeneous quantization training method

Info

Publication number
CN110097186B
Authority
CN
China
Prior art keywords
training
quantization
data
neural network
parameters
Prior art date
Legal status
Active
Application number
CN201910354693.7A
Other languages
Chinese (zh)
Other versions
CN110097186A (en)
Inventor
王子彤
姜凯
秦刚
Current Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN201910354693.7A
Publication of CN110097186A
Application granted
Publication of CN110097186B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a neural network heterogeneous quantization training method, belonging to the technical field of artificial neural networks. High-speed interface logic is added to a traditional training framework based on a CPU (central processing unit), a GPU (graphics processing unit), or a combination of the two, and a hardware computation acceleration module is connected through this interface logic. One or more specific computation processes in training are transferred to the hardware computation acceleration module, and when computation finishes the results are returned to the original training master control through the high-speed interface logic, completing a training flow with specific customized functions. New front-edge structures and algorithms can thus be quickly implemented and deployed in training, improving system flexibility, reducing storage and bandwidth requirements, lowering the resource demands of the forward prediction process, reducing training complexity, improving training efficiency, and allowing current training hardware to better adapt to the latest neural network structures.

Description

Neural network heterogeneous quantization training method
Technical Field
The invention relates to the technical field of artificial neural networks, and in particular to a neural network heterogeneous quantization training method.
Background
Neural network training feeds a training set into the network and adjusts the weights according to the difference between the network's actual output and its expected output. The training process comprises: defining the structure of the neural network and the output of forward propagation, computing the error between that output and the expected value, propagating the error back layer by layer, and then updating the weights. The network weights are thus adjusted from training samples and their expected outputs.
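By way of illustration only (this loop is standard practice, not specific to the invention), a minimal single-layer sketch of the training process in Python might look as follows; the layer sizes, batch size, and learning rate are assumptions chosen for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 2)) * 0.1      # weights of a single layer
    x = rng.standard_normal((8, 4))            # a batch of training samples
    y_expected = rng.standard_normal((8, 2))   # expected outputs
    lr = 0.01                                  # learning rate (hyper-parameter)

    for step in range(100):
        y_actual = x @ W                       # forward propagation
        err = y_actual - y_expected            # difference from the expected output
        grad_w = x.T @ err / len(x)            # error propagated back to the weights
        W -= lr * grad_w                       # weight update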
The CPU excels at logic control, serial operation, and general-purpose data processing, while the GPU emphasizes large-scale parallel computation across many tasks. Each completes the tasks of its own domain efficiently, and together they constitute the mainstream mode of current neural network training.
As research deepens, more and more new structures and new algorithms are being proposed, placing higher demands and challenges on general-purpose CPU and GPU training modes: specific fine-grained structures are difficult to implement quickly, and training can become increasingly tedious.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a neural network heterogeneous quantization training method. The original training flow is accelerated heterogeneously, so that a new front-edge structure (such as a special convolution type) or a new algorithm (such as model parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth requirements, lowers the resource demands of the forward prediction process, reduces training complexity, improves training efficiency, and allows current training hardware to adapt well to the latest neural network structures.
The technical scheme of the invention is as follows:
In the neural network heterogeneous quantization training method, high-speed interface logic is added to a traditional training framework based on a CPU, a GPU, or a combination of the two, and a hardware quantization acceleration module is connected through this interface logic. A quantization step is added to the training process: the quantization computation for model parameters and feature-map results is transferred to the hardware quantization acceleration module, the results of the quantization computation are returned to the original training master control through the high-speed interface logic, the quantized model parameters are updated, and the training process that quantizes the model parameters and feature-map results is completed iteratively.
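The control flow just described can be sketched in software as follows. Every name here (accelerator.quantize, the model methods, the loss check) is a hypothetical stand-in for the hardware quantization acceleration module and the training master control; the patent defines no software API, so this is a sketch of the described flow, not an implementation of it.

    def train_heterogeneous(model, data_loader, accelerator, max_epochs=100):
        """Sketch: each updated parameter tensor is quantized on the accelerator."""
        for epoch in range(max_epochs):
            for batch, expected in data_loader:
                output = model.forward(batch)                # forward propagation
                grads = model.backward(output, expected)     # backpropagation
                for name, param in model.update(grads):      # layer-by-layer updates
                    quantized = accelerator.quantize(param)  # offloaded over the
                    model.set_param(name, quantized)         # high-speed interface
            if model.loss() <= model.loss_target:            # model loss requirement met
                break
        return model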
Furthermore, the hardware quantization acceleration module is responsible for completing the low-bit quantization of the neural network model parameters and feature-map results; it is implemented by a dedicated circuit and forms a heterogeneous structure with the traditional CPU or GPU training host.
Further, the data quantization operations include: temporary storage of data, statistical sorting of data, compression and decompression of data, hashing and table lookup of data, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, inverse quantization of data, and the like.
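As one concrete possibility, the floating-point-to-fixed-point conversion and the inverse quantization could be realized as below. Symmetric scaling to a signed bit width is an assumption made for the example; the patent does not fix a particular scheme.

    import numpy as np

    def quantize_fixed_point(x, bits=8):
        """Convert floats to signed fixed-point integers: scale, round, clamp."""
        qmax = 2 ** (bits - 1) - 1
        scale = float(np.max(np.abs(x))) / qmax
        if scale == 0.0:                   # all-zero input: any scale will do
            scale = 1.0
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
        return q, scale

    def dequantize(q, scale):
        """Inverse quantization of data: recover approximate float values."""
        return q.astype(np.float32) * scale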
The specific customized functions include, but are not limited to: model parameter quantization, floating-point to fixed-point conversion, and special convolution operations such as dilated convolution, depthwise convolution, 1x1 multiplier arrays, and fully connected multiplier-adder arrays. A customized function is realized by the hardware computation acceleration module, which may be enabled many times, or only once, during training to complete that function.
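As an illustration of one such customized function (chosen here as an example, not prescribed by the patent), a depthwise convolution convolves each input channel with its own kernel and never mixes channels:

    import numpy as np

    def depthwise_conv2d(x, kernels):
        """x: (C, H, W); kernels: (C, kH, kW); stride 1, no padding (assumed)."""
        C, H, W = x.shape
        _, kH, kW = kernels.shape
        out = np.zeros((C, H - kH + 1, W - kW + 1))
        for c in range(C):                             # one kernel per channel
            for i in range(H - kH + 1):
                for j in range(W - kW + 1):
                    out[c, i, j] = np.sum(x[c, i:i + kH, j:j + kW] * kernels[c])
        return out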
The method specifically comprises the following steps:
1) Under the traditional CPU- or GPU-based training framework, set the initial values of the neural network model parameters and hyper-parameters, initialize the hardware quantization acceleration module, and start training;
2) After the parameters of the last layer of the neural network are updated by the first round of backpropagation, the updated weight parameters are passed into the hardware quantization acceleration module. There, the weight parameters are first compressed and stored using a general data compression method such as GZIP or entropy coding; the data are then statistically sorted, shifted and truncated to the desired number of fixed-point bits, and limited to maximum and minimum values, yielding the quantized weight parameters. The quantized weight parameters are passed back so that backpropagation in the traditional framework continues updating the preceding layer's parameters, until the first round of backpropagation is complete and all weight parameters have been obtained (steps 2 and 5 are sketched in code after this list);
3) Repeat step 2) to update the weights, completing multiple rounds of backpropagation, until the model loss requirement is met and training is complete;
4) Besides the weight parameters, the feature-map results of each layer of the neural network can also be quantized, so as to further quantize the inference of the whole model;
5) As required, a hash operation can be applied to the weight data to obtain index values and compress the data further, or inverse quantization can be performed immediately after quantization to reduce the quantization loss.
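A software sketch of steps 2) and 5) follows, under stated assumptions: GZIP stands in for the preliminary compression, a percentile clip stands in for the statistical sorting and range limiting, the bit widths are illustrative, and a small value codebook stands in for the hash/index step.

    import gzip
    import numpy as np

    def quantize_weights(w, frac_bits=6, int_bits=1):
        """Step 2 sketch: compress, bound the range, shift and truncate."""
        stored = gzip.compress(w.astype(np.float32).tobytes())  # preliminary compression
        lo, hi = np.percentile(w, [0.5, 99.5])     # statistics give the max/min limits
        q = np.round(np.clip(w, lo, hi) * (1 << frac_bits))     # shift by fractional bits
        qmax = (1 << (frac_bits + int_bits)) - 1
        q = np.clip(q, -qmax - 1, qmax).astype(np.int16)        # truncate to fixed point
        return q, stored

    def index_weights(q):
        """Step 5 sketch: map distinct quantized values to small index values."""
        codebook, idx = np.unique(q, return_inverse=True)       # table-lookup stand-in
        return codebook, idx.reshape(q.shape)                   # indices compress further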
The hardware computation acceleration module is implemented by logic configuration on an FPGA or an ACAP and is attached to an external nonvolatile memory device. This memory can hold several different customized functions simultaneously, and the FPGA or ACAP is configured in real time according to training needs, so that different functions can be completed within the same training run.
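A hypothetical sketch of that reconfiguration flow is given below; load_bitstream is an assumed stand-in for the vendor-specific FPGA/ACAP configuration call, and the function names and file paths are invented for the example.

    BITSTREAMS = {                         # customized functions held in nonvolatile memory
        "quantize": "nvmem/quantize.bit",
        "depthwise_conv": "nvmem/depthwise_conv.bit",
        "fc_mac_array": "nvmem/fc_mac_array.bit",
    }

    def configure_for(function, load_bitstream):
        """Reconfigure the accelerator for the function the current phase needs."""
        load_bitstream(BITSTREAMS[function])  # real-time configuration per training need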
The high-speed interface logic includes, but is not limited to, a PCIe interface, a USB 3.0 interface, or a gigabit Ethernet interface, and communicates and interacts with the original training master control.
The advantages of the invention are as follows:
The original training flow is accelerated heterogeneously, so that a new front-edge structure (such as a special convolution type) or a new algorithm (such as model parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth requirements, lowers the resource demands of the forward prediction process, reduces training complexity, improves training efficiency, and allows current training hardware to adapt well to the latest neural network structures.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the invention clearer and more complete, the technical solutions in the embodiments of the invention are described below. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them; all other embodiments obtained by a person of ordinary skill in the art, based on these embodiments and without creative effort, fall within the protection scope of the invention.
The invention discloses a neural network heterogeneous quantization training method. High-speed interface logic is added to a traditional training framework based on a CPU, a GPU, or a combination of the two, and a hardware quantization acceleration module is connected through this interface logic. A quantization step is added to the training process: the quantization computation for model parameters and feature-map results is transferred to the hardware quantization acceleration module, the results of the quantization computation are returned to the original training master control through the high-speed interface logic, the quantized model parameters are updated, and the training process that quantizes the model parameters and feature-map results is completed iteratively.
The hardware quantization acceleration module is responsible for completing the low-bit quantization of the neural network model parameters and feature-map results; it is implemented by a dedicated circuit and forms a heterogeneous structure with the traditional CPU or GPU training host. The data quantization operations include: temporary storage of data, statistical sorting of data, compression and decompression of data, hashing and table lookup of data, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, inverse quantization of data, and the like.
The method comprises the following steps:
1) Under the traditional CPU- or GPU-based training framework, set the initial values of the neural network model parameters and hyper-parameters, initialize the hardware quantization acceleration module, and start training;
2) After the parameters of the last layer of the neural network are updated by the first round of backpropagation, the updated weight parameters are passed into the hardware quantization acceleration module. There, the weight parameters are first compressed and stored using a general data compression method such as GZIP or entropy coding; the data are then statistically sorted, shifted and truncated to the desired number of fixed-point bits, and limited to maximum and minimum values, yielding the quantized weight parameters. The quantized weight parameters are passed back so that backpropagation in the traditional framework continues updating the preceding layer's parameters, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) Repeat step 2) to update the weights, completing multiple rounds of backpropagation, until the model loss requirement is met and training is complete;
4) Besides the weight parameters, the feature-map results of each layer of the neural network can also be quantized, so as to further quantize the inference of the whole model;
5) As required, a hash operation can be applied to the weight data to obtain index values and compress the data further, or inverse quantization can be performed immediately after quantization to reduce the quantization loss;
the hardware calculation acceleration module is realized by logic configuration by adopting an FPGA or an ACAP, is externally connected with a nonvolatile memory device, can simultaneously store different customized functions, and configures the FPGA or the ACAP in real time according to training requirements to complete different functions in the same training process; the high-speed interface logic includes but is not limited to a PCIE interface, a USB3.0 interface, a gigabit Ethernet interface and the like, and is communicated and interacted with the original training master control.
The above description is only a preferred embodiment of the invention and serves only to illustrate its technical solutions, not to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims (6)

1. A neural network heterogeneous quantization training method, characterized in that,
on the basis of a traditional training framework based on a CPU, a GPU, or a combination of the two, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic; during training, the quantization computation for model parameters and feature-map results is transferred to the hardware quantization acceleration module, the results of the quantization computation are returned to the original training master control through the high-speed interface logic, the quantized model parameters are updated, and the training process that quantizes the model parameters and feature-map results is completed iteratively;
the method comprises the following specific steps:
1) under the traditional CPU- or GPU-based training framework, setting the initial values of the neural network model parameters and hyper-parameters, initializing the hardware quantization acceleration module, and starting training;
2) after the parameters of the last layer of the neural network are updated by the first round of backpropagation, passing the updated weight parameters into the hardware quantization acceleration module, where the weight parameters are first compressed and stored by a data compression method, the data are then statistically sorted, shifted and truncated to the desired number of fixed-point bits, and limited to maximum and minimum values to obtain the quantized weight parameters; and passing the quantized weight parameters back to the traditional framework, where backpropagation continues updating the preceding layer's parameters, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) repeating step 2) to update the weights, completing multiple rounds of backpropagation, until the model loss requirement is met and training is complete;
besides the weight parameters, the feature-map results of each layer of the neural network can also be quantized, so as to further quantize the inference of the whole model;
as required, a hash operation can be applied to the weight data to obtain index values and compress the data further, or inverse quantization can be performed on the data immediately after quantization to reduce the quantization loss.
2. The method of claim 1,
wherein the hardware quantization acceleration module is responsible for completing the low-bit quantization of the neural network model parameters and the neural network feature-map results.
3. The method of claim 2,
wherein the hardware quantization acceleration module is implemented by a circuit and forms a heterogeneous structure with the traditional CPU or GPU training host.
4. The method of claim 1,
wherein the data quantization operations comprise: temporary storage of data, statistical sorting of data, compression and decompression of data, hashing and table lookup of data, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, and inverse quantization of data.
5. The method of claim 1,
the hardware quantization acceleration module is realized by logic configuration by adopting an FPGA or an ACAP, is externally connected with a nonvolatile memory device, can simultaneously store different customized functions, and configures the FPGA or the ACAP in real time according to training requirements to complete different functions in the same training process.
6. The method of claim 1,
wherein the high-speed interface logic can be realized by a PCIe interface, a USB 3.0 interface, or a gigabit Ethernet interface, and communicates and interacts with the original training master control.
CN201910354693.7A 2019-04-29 2019-04-29 Neural network heterogeneous quantization training method Active CN110097186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910354693.7A CN110097186B (en) 2019-04-29 2019-04-29 Neural network heterogeneous quantization training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910354693.7A CN110097186B (en) 2019-04-29 2019-04-29 Neural network heterogeneous quantization training method

Publications (2)

Publication Number Publication Date
CN110097186A CN110097186A (en) 2019-08-06
CN110097186B (en) 2023-04-18

Family

ID=67446342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910354693.7A Active CN110097186B (en) Neural network heterogeneous quantization training method

Country Status (1)

Country Link
CN (1) CN110097186B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582476A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Automatic quantization strategy searching method, device, equipment and storage medium
CN112258377A (en) * 2020-10-13 2021-01-22 国家计算机网络与信息安全管理中心 Method and equipment for constructing robust binary neural network
CN112308215B (en) * 2020-12-31 2021-03-30 之江实验室 Intelligent training acceleration method and system based on data sparse characteristic in neural network
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment
CN114611697B (en) * 2022-05-11 2022-09-09 上海登临科技有限公司 Neural network quantification and deployment method, system, electronic device and storage medium
CN116451757B (en) * 2023-06-19 2023-09-08 山东浪潮科学研究院有限公司 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
CN116911350B (en) * 2023-09-12 2024-01-09 苏州浪潮智能科技有限公司 Quantification method based on graph neural network model, task processing method and task processing device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131659B2 (en) * 2008-09-25 2012-03-06 Microsoft Corporation Field-programmable gate array based accelerator system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A concise and efficient method for accelerating convolutional neural networks; Liu Jinfeng; Science Technology and Engineering; 2014-11-28 (No. 33); full text *
Design of a GPU-based parallel quasi-Newton neural network training algorithm; Liu Qiang et al.; Journal of Hohai University (Natural Science Edition); 2018-09-25 (No. 05); full text *

Also Published As

Publication number Publication date
CN110097186A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
CN110097186B (en) Neural network heterogeneous quantization training method
Mills et al. Communication-efficient federated learning for wireless edge intelligence in IoT
Eshratifar et al. Bottlenet: A deep learning architecture for intelligent mobile cloud computing services
CN111382844B (en) Training method and device for deep learning model
CN110928654A (en) Distributed online task unloading scheduling method in edge computing system
WO2020081399A1 (en) Network-centric architecture and algorithms to accelerate distributed training of neural networks
US11615301B2 (en) Lossless exponent and lossy mantissa weight compression for training deep neural networks
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN110309904B (en) Neural network compression method
CN111158912A (en) Task unloading decision method based on deep learning in cloud and mist collaborative computing environment
CN108985444A (en) A kind of convolutional neural networks pruning method inhibited based on node
CN109214512B (en) Deep learning parameter exchange method, device, server and storage medium
Struharik et al. Conna–compressed cnn hardware accelerator
CN110992432A (en) Depth neural network-based minimum variance gradient quantization compression and image processing method
CN110263917B (en) Neural network compression method and device
Li et al. Anycostfl: Efficient on-demand federated learning over heterogeneous edge devices
CN109308517B (en) Binary device, method and application for binary neural network
Huang et al. An improved LBG algorithm for image vector quantization
Chen et al. DNN gradient lossless compression: Can GenNorm be the answer?
CN111260049A (en) Neural network implementation method based on domestic embedded system
CN114065923A (en) Compression method, system and accelerating device of convolutional neural network
CN113487036B (en) Distributed training method and device of machine learning model, electronic equipment and medium
CN113487012B (en) FPGA-oriented deep convolutional neural network accelerator and design method
CN114118358A (en) Image processing method, image processing apparatus, electronic device, medium, and program product
Liao et al. Structured neural network with low complexity for MIMO detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230320

Address after: 250000 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant