CN110097186A - Neural network heterogeneous quantization training method - Google Patents
Neural network heterogeneous quantization training method
- Publication number
- CN110097186A CN110097186A CN201910354693.7A CN201910354693A CN110097186A CN 110097186 A CN110097186 A CN 110097186A CN 201910354693 A CN201910354693 A CN 201910354693A CN 110097186 A CN110097186 A CN 110097186A
- Authority
- CN
- China
- Prior art keywords
- training
- quantization
- data
- neural network
- hardware
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention provides a neural network heterogeneous quantization training method, belonging to the technical field of artificial neural networks. On the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, the invention adds high-speed interface logic through which a hardware computing acceleration module is connected. During training, one or more specific computing steps are offloaded to the hardware computing acceleration module, and once a computation completes, its result is returned to the source training host through the high-speed interface logic, completing a training process with specific customized functions. New cutting-edge structures or algorithms can thereby be rapidly implemented and deployed in training, which improves system flexibility, reduces storage and bandwidth demands, lowers resource requirements during forward prediction, reduces training complexity, and improves training efficiency, ensuring that the current training apparatus adapts well to the newest neural network structures.
Description
Technical field
The present invention relates to the technical field of artificial neural networks, and in particular to a neural network heterogeneous quantization training method.
Background art
In neural network training, a training set is fed into the network, and the weights are adjusted according to the difference between the network's actual output and the desired output. The training process comprises: defining the network structure, computing the forward-propagation output, computing the error between the output and the desired value, propagating the error back layer by layer, and updating the weights. The network weights are thus adjusted by the training samples and desired values.
A CPU excels at logic control, serial computation, and general-purpose data operations, while a GPU excels at handling large-scale parallel computing tasks. Each completes tasks efficiently in its own domain, and both serve as the mainstream platforms for current neural network training.
As research deepens, more and more new structures and new algorithms are continually proposed, bringing higher requirements and challenges to general-purpose CPU and GPU training methods: specific detailed structures are difficult to implement quickly, and training times may become ever more protracted.
Summary of the invention
To solve the above technical problems, the invention proposes a neural network heterogeneous quantization training method. The original training process is accelerated in a heterogeneous manner, so that new cutting-edge structures (such as special convolution types) or new algorithms (such as model-parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth demands, lowers resource requirements during forward prediction, reduces training complexity, and improves training efficiency, ensuring that the current training apparatus adapts well to the newest neural network structures.
The technical scheme of the invention is as follows:
A neural network heterogeneous quantization training method: on the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic. A quantization step is added to the training process, and the quantization computations on model parameters and feature-map results are offloaded to the hardware quantization acceleration module. After quantization completes, the result is returned to the source training host through the high-speed interface logic, the quantized model parameters are updated, and iteration completes a training process with model-parameter and feature-map-result quantization.
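The offload loop of this scheme can be sketched on the host as follows. This is an illustrative sketch only: `MockQuantAccelerator` and `train_step` are assumed names, and the in-process mock stands in for the real module reached over the high-speed interface.

```python
import numpy as np

class MockQuantAccelerator:
    """Stand-in for the hardware quantization acceleration module.

    A real deployment would send tensors to an FPGA/ACAP board over a
    high-speed link (PCIe, USB 3.0, or 10G Ethernet) and read back the
    quantized result; here the same computation runs in-process.
    """

    def __init__(self, bits=8):
        self.bits = bits

    def quantize(self, weights):
        # Symmetric linear quantization to a signed low-bit grid.
        qmax = 2 ** (self.bits - 1) - 1
        scale = float(np.max(np.abs(weights))) / qmax
        if scale == 0.0:
            scale = 1.0  # all-zero tensor: any scale works
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        return q * scale  # host keeps training on the dequantized values

def train_step(layers, grads, lr, accel):
    """One backward pass: update layer by layer (last layer first),
    offloading the quantization of each updated weight tensor."""
    for i in reversed(range(len(layers))):
        layers[i] = layers[i] - lr * grads[i]  # conventional host update
        layers[i] = accel.quantize(layers[i])  # offloaded quantization step
    return layers
```

Iterating `train_step` until the loss requirement is met mirrors steps 2) and 3) of the method.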
Further, the hardware quantization acceleration module is responsible for the low-bit quantization of neural network model parameters and neural network feature-map results. It is realized by dedicated circuitry and forms a heterogeneous structure with the conventional CPU or GPU training host.
Further, the data quantization operations include: data buffering; data statistics and sorting; data compression and decompression; data hashing and table lookup; floating-point to fixed-point conversion at a specified bit width; floating-point shift scaling and truncation; and data dequantization.
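Of these operations, the floating-point to fixed-point conversion by shift scaling, truncation, and clamping admits a compact sketch. This is a host-side reference model of what the dedicated circuit would compute; the bit widths and function names are illustrative, not from the patent.

```python
def float_to_fixed(x, frac_bits=8, total_bits=16):
    """Convert a float to a signed fixed-point integer by shift scaling
    and truncation, clamping to the representable max/min."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    fixed = int(x * (1 << frac_bits))  # shift-scale, truncate toward zero
    return max(lo, min(hi, fixed))     # clamp to the target range

def fixed_to_float(q, frac_bits=8):
    """Dequantize: shift the fixed-point value back to floating point."""
    return q / (1 << frac_bits)
```

With 8 fractional bits, 1.5 maps to the integer 384 and back to 1.5 exactly, while out-of-range values saturate at the 16-bit limits.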
The specific customized functions include, but are not limited to: model-parameter quantization; floating-point to fixed-point conversion; and special convolution operations such as dilated convolution, depth-wise convolution, 1x1 multiplier arrays, and fully connected multiply-accumulate arrays. These customized functions are realized by the hardware computing acceleration module, which may be enabled once or repeatedly during training to complete a specific function.
The method specifically comprises the following steps:
1) Set the neural network model parameters and hyperparameter initial values under the conventional CPU- or GPU-based training framework, initialize the hardware quantization acceleration module at the same time, and start training;
2) After the first round of backpropagation has updated the parameters of the last layer of the network, pass the updated weight parameters into the hardware quantization acceleration module. Apply a first compression to the weight parameters with a generic data compression method such as GZIP or entropy coding and store the result; then sort the data for statistics, shift and truncate the data according to the desired fixed-point bit width, and clamp the data to its maximum and minimum values to obtain the quantized weight parameters. Pass these back so that the conventional framework continues the backpropagation update of the previous layer, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) Repeat step 2) to update the weights, completing further rounds of backpropagation until the model loss requirement is reached and training is complete;
4) In addition to the weight parameters, each layer's feature-map result may also be quantized, further quantizing inference over the entire model;
5) As needed, a hash operation may be applied to the weight data to obtain index values, compressing the data further; alternatively, data dequantization may be performed immediately after quantization to reduce the quantization loss.
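Step 2) can be modeled on the host as follows. This is a sketch only, with GZIP as the generic compressor and NumPy standing in for the hardware datapath; the function name and bit widths are assumptions, not from the patent.

```python
import gzip
import numpy as np

def quantize_weights(weights, frac_bits=6, total_bits=8):
    """Sketch of step 2: compress-and-store, sort for statistics,
    shift, truncate, and clamp one layer's updated weights."""
    # First compression and storage (GZIP as the generic method).
    stored = gzip.compress(weights.astype(np.float32).tobytes())

    # Statistics/sorting pass over the data (e.g. to inspect its range).
    ordered = np.sort(weights)

    # Shift by the desired fixed-point fraction width, truncating.
    shifted = np.trunc(weights * (1 << frac_bits)).astype(np.int32)

    # Clamp to the maximum/minimum of the target bit width.
    hi = (1 << (total_bits - 1)) - 1
    quantized = np.clip(shifted, -hi - 1, hi).astype(np.int8)
    return quantized, stored, ordered
```

The quantized parameters would then be passed back to the conventional framework to continue the previous layer's update.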
The hardware computing acceleration module is implemented by logic configuration of an FPGA or ACAP with an external non-volatile memory device. Different customized functions can be stored simultaneously, and the FPGA or ACAP is reconfigured in real time according to training demand, completing different functions within the same training process.
The high-speed interface logic is implemented as, but is not limited to, a PCIe interface, a USB 3.0 interface, or a 10-gigabit Ethernet interface, and carries out communication with the original training host.
The beneficial effects of the invention are as follows:
The original training process is accelerated in a heterogeneous manner, so that new cutting-edge structures (such as special convolution types) or new algorithms (such as model-parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth demands, lowers resource requirements during forward prediction, reduces training complexity, and improves training efficiency, ensuring that the current training apparatus adapts well to the newest neural network structures.
Detailed description of embodiments
To make the purpose, technical scheme, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on these embodiments, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the invention.
In the neural network heterogeneous quantization training method of the invention, on the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic. A quantization step is added to the training process, and the quantization computations on model parameters and feature-map results are offloaded to the hardware quantization acceleration module. After quantization completes, the result is returned to the source training host through the high-speed interface logic, the quantized model parameters are updated, and iteration completes a training process with model-parameter and feature-map-result quantization.
The hardware quantization acceleration module is responsible for the low-bit quantization of neural network model parameters and feature-map results. It is realized by dedicated circuitry and forms a heterogeneous structure with the conventional CPU or GPU training host. The data quantization operations include: data buffering; data statistics and sorting; data compression and decompression; data hashing and table lookup; floating-point to fixed-point conversion at a specified bit width; floating-point shift scaling and truncation; and data dequantization.
The method comprises the following steps:
1) Set the neural network model parameters and hyperparameter initial values under the conventional CPU- or GPU-based training framework, initialize the hardware quantization acceleration module at the same time, and start training;
2) After the first round of backpropagation has updated the parameters of the last layer of the network, pass the updated weight parameters into the hardware quantization acceleration module. Apply a first compression to the weight parameters with a generic data compression method such as GZIP or entropy coding and store the result; then sort the data for statistics, shift and truncate the data according to the desired fixed-point bit width, and clamp the data to its maximum and minimum values to obtain the quantized weight parameters. Pass these back so that the conventional framework continues the backpropagation update of the previous layer, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) Repeat step 2) to update the weights, completing further rounds of backpropagation until the model loss requirement is reached and training is complete;
4) In addition to the weight parameters, each layer's feature-map result may also be quantized, further quantizing inference over the entire model;
5) As needed, a hash operation may be applied to the weight data to obtain index values, compressing the data further; alternatively, data dequantization may be performed immediately after quantization to reduce the quantization loss.
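One simple reading of the hash-and-lookup option in step 5) is nearest-entry codebook indexing, with immediate dequantization as a table lookup. The codebook, function names, and nearest-neighbor rule here are illustrative assumptions, not details fixed by the patent.

```python
import numpy as np

def hash_weights(weights, codebook):
    """Map each weight to the index of its nearest codebook entry,
    so only small indices (plus the codebook) need to be stored."""
    idx = np.argmin(np.abs(weights[:, None] - codebook[None, :]), axis=1)
    return idx.astype(np.uint8)

def dequantize(indices, codebook):
    """Immediate dequantization: look the indices back up in the table."""
    return codebook[indices]
```

Comparing `dequantize(hash_weights(w, cb), cb)` against the original `w` gives a direct measure of the quantization loss being reduced.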
The hardware computing acceleration module is implemented by logic configuration of an FPGA or ACAP with an external non-volatile memory device. Different customized functions can be stored simultaneously, and the FPGA or ACAP is reconfigured in real time according to training demand, completing different functions within the same training process. The high-speed interface logic is implemented as, but is not limited to, a PCIe interface, a USB 3.0 interface, or a 10-gigabit Ethernet interface, and carries out communication with the original training host.
The above are only preferred embodiments of the invention, intended solely to illustrate its technical scheme and not to limit its protection scope. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention is included within its protection scope.
Claims (9)
1. A neural network heterogeneous quantization training method, characterized in that:
on the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic; during training, the quantization computations on model parameters and feature-map results are offloaded to the hardware quantization acceleration module; after quantization completes, the result is returned to the source training host through the high-speed interface logic, the quantized model parameters are updated, and iteration completes a training process with model-parameter and feature-map-result quantization.
2. The method according to claim 1, characterized in that:
the hardware quantization acceleration module is responsible for the low-bit quantization of neural network model parameters and neural network feature-map results.
3. The method according to claim 2, characterized in that:
the hardware quantization acceleration module is realized by circuitry and forms a heterogeneous structure with the conventional CPU or GPU training host.
4. The method according to claim 1, characterized in that:
the data quantization operations include: data buffering; data statistics and sorting; data compression and decompression; data hashing and table lookup; floating-point to fixed-point conversion at a specified bit width; floating-point shift scaling and truncation; and data dequantization.
5. The method according to claim 1, characterized in that the specific steps are as follows:
1) set the neural network model parameters and hyperparameter initial values under the conventional CPU- or GPU-based training framework, initialize the hardware quantization acceleration module at the same time, and start training;
2) after the first round of backpropagation has updated the parameters of the last layer of the network, pass the updated weight parameters to the hardware quantization acceleration module; apply a first compression to the weight parameters by a data compression method and store the result; then sort the data for statistics, shift and truncate the data according to the desired fixed-point bit width, and clamp the data to its maximum and minimum values to obtain the quantized weight parameters; pass these back so that the conventional framework continues the backpropagation update of the previous layer, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) repeat step 2) to update the weights, completing further rounds of backpropagation until the model loss requirement is reached and training is complete.
6. The method according to claim 5, characterized in that:
in addition to the weight parameters, each layer's feature-map result may also be quantized, further quantizing inference over the entire model.
7. The method according to claim 6, characterized in that:
as needed, a hash operation may be applied to the weight data to obtain index values, compressing the data further; alternatively, data dequantization may be performed immediately after quantization to reduce the quantization loss.
8. The method according to claim 1, characterized in that:
the hardware computing acceleration module is implemented by logic configuration of an FPGA or ACAP with an external non-volatile memory device; different customized functions can be stored simultaneously, and the FPGA or ACAP is configured in real time according to training demand, completing different functions within the same training process.
9. The method according to claim 1, characterized in that:
the high-speed interface logic may be implemented as a PCIe interface, a USB 3.0 interface, or a 10-gigabit Ethernet interface, and carries out communication with the original training host.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910354693.7A CN110097186B (en) | 2019-04-29 | 2019-04-29 | Neural network heterogeneous quantitative training method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097186A true CN110097186A (en) | 2019-08-06 |
CN110097186B CN110097186B (en) | 2023-04-18 |
Family
ID=67446342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910354693.7A Active CN110097186B (en) | 2019-04-29 | 2019-04-29 | Neural network heterogeneous quantitative training method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097186B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100076915A1 (en) * | 2008-09-25 | 2010-03-25 | Microsoft Corporation | Field-Programmable Gate Array Based Accelerator System |
CN108280514A (en) * | 2018-01-05 | 2018-07-13 | 中国科学技术大学 | Sparse neural network acceleration system based on FPGA and design method |
CN109635936A (en) * | 2018-12-29 | 2019-04-16 | 杭州国芯科技股份有限公司 | A kind of neural networks pruning quantization method based on retraining |
- 2019-04-29: Application CN201910354693.7A filed; later granted as patent CN110097186B (status: Active)
Non-Patent Citations (2)
Title |
---|
刘强等 (Liu Qiang et al.): "基于GPU的并行拟牛顿神经网络训练算法设计" [Design of a GPU-based parallel quasi-Newton neural network training algorithm], 《河海大学学报(自然科学版)》 [Journal of Hohai University (Natural Sciences)] * |
刘进锋 (Liu Jinfeng): "一种简洁高效的加速卷积神经网络的方法" [A concise and efficient method for accelerating convolutional neural networks], 《科学技术与工程》 [Science Technology and Engineering] * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582476A (en) * | 2020-05-09 | 2020-08-25 | 北京百度网讯科技有限公司 | Automatic quantization strategy searching method, device, equipment and storage medium |
CN111598237A (en) * | 2020-05-21 | 2020-08-28 | 上海商汤智能科技有限公司 | Quantization training method, image processing device, and storage medium |
CN111598237B (en) * | 2020-05-21 | 2024-06-11 | 上海商汤智能科技有限公司 | Quantization training, image processing method and device, and storage medium |
CN112258377A (en) * | 2020-10-13 | 2021-01-22 | 国家计算机网络与信息安全管理中心 | Method and equipment for constructing robust binary neural network |
CN112308215A (en) * | 2020-12-31 | 2021-02-02 | 之江实验室 | Intelligent training acceleration method and system based on data sparse characteristic in neural network |
CN113033784A (en) * | 2021-04-18 | 2021-06-25 | 沈阳雅译网络技术有限公司 | Method for searching neural network structure for CPU and GPU equipment |
CN114611697A (en) * | 2022-05-11 | 2022-06-10 | 上海登临科技有限公司 | Neural network quantification and deployment method, system, electronic device and storage medium |
CN116451757A (en) * | 2023-06-19 | 2023-07-18 | 山东浪潮科学研究院有限公司 | Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model |
CN116451757B (en) * | 2023-06-19 | 2023-09-08 | 山东浪潮科学研究院有限公司 | Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model |
CN116911350A (en) * | 2023-09-12 | 2023-10-20 | 苏州浪潮智能科技有限公司 | Quantification method based on graph neural network model, task processing method and task processing device |
CN116911350B (en) * | 2023-09-12 | 2024-01-09 | 苏州浪潮智能科技有限公司 | Quantification method based on graph neural network model, task processing method and task processing device |
Also Published As
Publication number | Publication date |
---|---|
CN110097186B (en) | 2023-04-18 |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
2023-03-20 | TA01 | Transfer of patent application right | Address after: Building S02, No. 1036, Langchao Road, High-tech Zone, Jinan City, Shandong Province, 250000. Applicant after: Shandong Inspur Scientific Research Institute Co., Ltd. Address before: First floor of R&D Building, 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province, 250100. Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co., Ltd.
 | GR01 | Patent grant | 