CN110097186A - Neural network heterogeneous quantization training method - Google Patents
Neural network heterogeneous quantization training method
- Publication number
- CN110097186A CN110097186A CN201910354693.7A CN201910354693A CN110097186A CN 110097186 A CN110097186 A CN 110097186A CN 201910354693 A CN201910354693 A CN 201910354693A CN 110097186 A CN110097186 A CN 110097186A
- Authority
- CN
- China
- Prior art keywords
- training
- quantization
- data
- neural network
- hardware
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention provides a neural network heterogeneous quantization training method, belonging to the technical field of artificial neural networks. On the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, the invention adds high-speed interface logic through which a hardware computing acceleration module is connected. During training, one or more specific computing steps are offloaded to the hardware computing acceleration module, and once a computation completes, its result is returned to the source training host through the high-speed interface logic, completing a training process with specific customized functions. New cutting-edge structures or algorithms can thereby be rapidly implemented and deployed in training, which improves system flexibility, reduces storage and bandwidth demands, lowers resource requirements during forward prediction, reduces training complexity, and improves training efficiency, ensuring that the current training apparatus adapts well to the newest neural network structures.
Description
Technical field
The present invention relates to the technical field of artificial neural networks, and in particular to a neural network heterogeneous quantization training method.
Background art
In neural network training, a training set is fed into the network, and the weights are adjusted according to the difference between the network's actual output and the desired output. The training process comprises: defining the network structure, computing the forward-propagation output, computing the error between the output and the desired value, propagating the error back layer by layer, and updating the weights. The network weights are thus adjusted by the training samples and desired values.
A CPU excels at logic control, serial computation, and general-purpose data operations, while a GPU excels at handling large-scale parallel computing tasks. Each completes tasks efficiently in its own domain, and both serve as the mainstream platforms for current neural network training.
As research deepens, more and more new structures and new algorithms are continually proposed, bringing higher requirements and challenges to general-purpose CPU and GPU training methods: specific detailed structures are difficult to implement quickly, and training times may become ever more protracted.
Summary of the invention
To solve the above technical problems, the invention proposes a neural network heterogeneous quantization training method. The original training process is accelerated in a heterogeneous manner, so that new cutting-edge structures (such as special convolution types) or new algorithms (such as model-parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth demands, lowers resource requirements during forward prediction, reduces training complexity, and improves training efficiency, ensuring that the current training apparatus adapts well to the newest neural network structures.
The technical scheme of the invention is as follows:
A neural network heterogeneous quantization training method: on the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic. A quantization step is added to the training process, and the quantization computations on model parameters and feature-map results are offloaded to the hardware quantization acceleration module. After quantization completes, the result is returned to the source training host through the high-speed interface logic, the quantized model parameters are updated, and iteration completes a training process with model-parameter and feature-map-result quantization.
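The offload loop of this scheme can be sketched on the host as follows. This is an illustrative sketch only: `MockQuantAccelerator` and `train_step` are assumed names, and the in-process mock stands in for the real module reached over the high-speed interface.

```python
import numpy as np

class MockQuantAccelerator:
    """Stand-in for the hardware quantization acceleration module.

    A real deployment would send tensors to an FPGA/ACAP board over a
    high-speed link (PCIe, USB 3.0, or 10G Ethernet) and read back the
    quantized result; here the same computation runs in-process.
    """

    def __init__(self, bits=8):
        self.bits = bits

    def quantize(self, weights):
        # Symmetric linear quantization to a signed low-bit grid.
        qmax = 2 ** (self.bits - 1) - 1
        scale = float(np.max(np.abs(weights))) / qmax
        if scale == 0.0:
            scale = 1.0  # all-zero tensor: any scale works
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        return q * scale  # host keeps training on the dequantized values

def train_step(layers, grads, lr, accel):
    """One backward pass: update layer by layer (last layer first),
    offloading the quantization of each updated weight tensor."""
    for i in reversed(range(len(layers))):
        layers[i] = layers[i] - lr * grads[i]  # conventional host update
        layers[i] = accel.quantize(layers[i])  # offloaded quantization step
    return layers
```

Iterating `train_step` until the loss requirement is met mirrors steps 2) and 3) of the method.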
Further, the hardware quantization acceleration module is responsible for the low-bit quantization of neural network model parameters and neural network feature-map results. It is realized by dedicated circuitry and forms a heterogeneous structure with the conventional CPU or GPU training host.
Further, the data quantization operations include: data buffering; data statistics and sorting; data compression and decompression; data hashing and table lookup; floating-point to fixed-point conversion at a specified bit width; floating-point shift scaling and truncation; and data dequantization.
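Of these operations, the floating-point to fixed-point conversion by shift scaling, truncation, and clamping admits a compact sketch. This is a host-side reference model of what the dedicated circuit would compute; the bit widths and function names are illustrative, not from the patent.

```python
def float_to_fixed(x, frac_bits=8, total_bits=16):
    """Convert a float to a signed fixed-point integer by shift scaling
    and truncation, clamping to the representable max/min."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    fixed = int(x * (1 << frac_bits))  # shift-scale, truncate toward zero
    return max(lo, min(hi, fixed))     # clamp to the target range

def fixed_to_float(q, frac_bits=8):
    """Dequantize: shift the fixed-point value back to floating point."""
    return q / (1 << frac_bits)
```

With 8 fractional bits, 1.5 maps to the integer 384 and back to 1.5 exactly, while out-of-range values saturate at the 16-bit limits.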
The specific customized functions include, but are not limited to: model-parameter quantization; floating-point to fixed-point conversion; and special convolution operations such as dilated convolution, depth-wise convolution, 1x1 multiplier arrays, and fully connected multiply-accumulate arrays. These customized functions are realized by the hardware computing acceleration module, which may be enabled once or repeatedly during training to complete a specific function.
The method specifically comprises the following steps:
1) Set the neural network model parameters and hyperparameter initial values under the conventional CPU- or GPU-based training framework, initialize the hardware quantization acceleration module at the same time, and start training;
2) After the first round of backpropagation has updated the parameters of the last layer of the network, pass the updated weight parameters into the hardware quantization acceleration module. Apply a first compression to the weight parameters with a generic data compression method such as GZIP or entropy coding and store the result; then sort the data for statistics, shift and truncate the data according to the desired fixed-point bit width, and clamp the data to its maximum and minimum values to obtain the quantized weight parameters. Pass these back so that the conventional framework continues the backpropagation update of the previous layer, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) Repeat step 2) to update the weights, completing further rounds of backpropagation until the model loss requirement is reached and training is complete;
4) In addition to the weight parameters, each layer's feature-map result may also be quantized, further quantizing inference over the entire model;
5) As needed, a hash operation may be applied to the weight data to obtain index values, compressing the data further; alternatively, data dequantization may be performed immediately after quantization to reduce the quantization loss.
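Step 2) can be modeled on the host as follows. This is a sketch only, with GZIP as the generic compressor and NumPy standing in for the hardware datapath; the function name and bit widths are assumptions, not from the patent.

```python
import gzip
import numpy as np

def quantize_weights(weights, frac_bits=6, total_bits=8):
    """Sketch of step 2: compress-and-store, sort for statistics,
    shift, truncate, and clamp one layer's updated weights."""
    # First compression and storage (GZIP as the generic method).
    stored = gzip.compress(weights.astype(np.float32).tobytes())

    # Statistics/sorting pass over the data (e.g. to inspect its range).
    ordered = np.sort(weights)

    # Shift by the desired fixed-point fraction width, truncating.
    shifted = np.trunc(weights * (1 << frac_bits)).astype(np.int32)

    # Clamp to the maximum/minimum of the target bit width.
    hi = (1 << (total_bits - 1)) - 1
    quantized = np.clip(shifted, -hi - 1, hi).astype(np.int8)
    return quantized, stored, ordered
```

The quantized parameters would then be passed back to the conventional framework to continue the previous layer's update.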
The hardware computing acceleration module is implemented by logic configuration of an FPGA or ACAP with an external non-volatile memory device. Different customized functions can be stored simultaneously, and the FPGA or ACAP is reconfigured in real time according to training demand, completing different functions within the same training process.
The high-speed interface logic is implemented as, but is not limited to, a PCIe interface, a USB 3.0 interface, or a 10-gigabit Ethernet interface, and carries out communication with the original training host.
The beneficial effects of the invention are as follows:
The original training process is accelerated in a heterogeneous manner, so that new cutting-edge structures (such as special convolution types) or new algorithms (such as model-parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth demands, lowers resource requirements during forward prediction, reduces training complexity, and improves training efficiency, ensuring that the current training apparatus adapts well to the newest neural network structures.
Detailed description of embodiments
To make the purpose, technical scheme, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on these embodiments, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the invention.
In the neural network heterogeneous quantization training method of the invention, on the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic. A quantization step is added to the training process, and the quantization computations on model parameters and feature-map results are offloaded to the hardware quantization acceleration module. After quantization completes, the result is returned to the source training host through the high-speed interface logic, the quantized model parameters are updated, and iteration completes a training process with model-parameter and feature-map-result quantization.
The hardware quantization acceleration module is responsible for the low-bit quantization of neural network model parameters and feature-map results. It is realized by dedicated circuitry and forms a heterogeneous structure with the conventional CPU or GPU training host. The data quantization operations include: data buffering; data statistics and sorting; data compression and decompression; data hashing and table lookup; floating-point to fixed-point conversion at a specified bit width; floating-point shift scaling and truncation; and data dequantization.
The method comprises the following steps:
1) Set the neural network model parameters and hyperparameter initial values under the conventional CPU- or GPU-based training framework, initialize the hardware quantization acceleration module at the same time, and start training;
2) After the first round of backpropagation has updated the parameters of the last layer of the network, pass the updated weight parameters into the hardware quantization acceleration module. Apply a first compression to the weight parameters with a generic data compression method such as GZIP or entropy coding and store the result; then sort the data for statistics, shift and truncate the data according to the desired fixed-point bit width, and clamp the data to its maximum and minimum values to obtain the quantized weight parameters. Pass these back so that the conventional framework continues the backpropagation update of the previous layer, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) Repeat step 2) to update the weights, completing further rounds of backpropagation until the model loss requirement is reached and training is complete;
4) In addition to the weight parameters, each layer's feature-map result may also be quantized, further quantizing inference over the entire model;
5) As needed, a hash operation may be applied to the weight data to obtain index values, compressing the data further; alternatively, data dequantization may be performed immediately after quantization to reduce the quantization loss.
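One simple reading of the hash-and-lookup option in step 5) is nearest-entry codebook indexing, with immediate dequantization as a table lookup. The codebook, function names, and nearest-neighbor rule here are illustrative assumptions, not details fixed by the patent.

```python
import numpy as np

def hash_weights(weights, codebook):
    """Map each weight to the index of its nearest codebook entry,
    so only small indices (plus the codebook) need to be stored."""
    idx = np.argmin(np.abs(weights[:, None] - codebook[None, :]), axis=1)
    return idx.astype(np.uint8)

def dequantize(indices, codebook):
    """Immediate dequantization: look the indices back up in the table."""
    return codebook[indices]
```

Comparing `dequantize(hash_weights(w, cb), cb)` against the original `w` gives a direct measure of the quantization loss being reduced.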
The hardware computing acceleration module is implemented by logic configuration of an FPGA or ACAP with an external non-volatile memory device. Different customized functions can be stored simultaneously, and the FPGA or ACAP is reconfigured in real time according to training demand, completing different functions within the same training process. The high-speed interface logic is implemented as, but is not limited to, a PCIe interface, a USB 3.0 interface, or a 10-gigabit Ethernet interface, and carries out communication with the original training host.
The above are only preferred embodiments of the invention, intended solely to illustrate its technical scheme and not to limit its protection scope. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention is included within its protection scope.
Claims (9)
1. A neural network heterogeneous quantization training method, characterized in that:
on the basis of a conventional training framework based on a CPU, a GPU, or a combination of both, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic; during training, the quantization computations on model parameters and feature-map results are offloaded to the hardware quantization acceleration module; after quantization completes, the result is returned to the source training host through the high-speed interface logic, the quantized model parameters are updated, and iteration completes a training process with model-parameter and feature-map-result quantization.
2. The method according to claim 1, characterized in that:
the hardware quantization acceleration module is responsible for the low-bit quantization of neural network model parameters and neural network feature-map results.
3. The method according to claim 2, characterized in that:
the hardware quantization acceleration module is realized by circuitry and forms a heterogeneous structure with the conventional CPU or GPU training host.
4. The method according to claim 1, characterized in that:
the data quantization operations include: data buffering; data statistics and sorting; data compression and decompression; data hashing and table lookup; floating-point to fixed-point conversion at a specified bit width; floating-point shift scaling and truncation; and data dequantization.
5. The method according to claim 1, characterized in that the specific steps are as follows:
1) set the neural network model parameters and hyperparameter initial values under the conventional CPU- or GPU-based training framework, initialize the hardware quantization acceleration module at the same time, and start training;
2) after the first round of backpropagation has updated the parameters of the last layer of the network, pass the updated weight parameters to the hardware quantization acceleration module; apply a first compression to the weight parameters by a data compression method and store the result; then sort the data for statistics, shift and truncate the data according to the desired fixed-point bit width, and clamp the data to its maximum and minimum values to obtain the quantized weight parameters; pass these back so that the conventional framework continues the backpropagation update of the previous layer, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) repeat step 2) to update the weights, completing further rounds of backpropagation until the model loss requirement is reached and training is complete.
6. The method according to claim 5, characterized in that:
in addition to the weight parameters, each layer's feature-map result may also be quantized, further quantizing inference over the entire model.
7. The method according to claim 6, characterized in that:
as needed, a hash operation may be applied to the weight data to obtain index values, compressing the data further; alternatively, data dequantization may be performed immediately after quantization to reduce the quantization loss.
8. The method according to claim 1, characterized in that:
the hardware computing acceleration module is implemented by logic configuration of an FPGA or ACAP with an external non-volatile memory device; different customized functions can be stored simultaneously, and the FPGA or ACAP is configured in real time according to training demand, completing different functions within the same training process.
9. The method according to claim 1, characterized in that:
the high-speed interface logic may be implemented as a PCIe interface, a USB 3.0 interface, or a 10-gigabit Ethernet interface, and carries out communication with the original training host.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910354693.7A CN110097186B (en) | 2019-04-29 | 2019-04-29 | Neural network heterogeneous quantitative training method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097186A true CN110097186A (en) | 2019-08-06 |
CN110097186B CN110097186B (en) | 2023-04-18 |
Family
ID=67446342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910354693.7A Active CN110097186B (en) | 2019-04-29 | 2019-04-29 | Neural network heterogeneous quantitative training method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097186B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100076915A1 (en) * | 2008-09-25 | 2010-03-25 | Microsoft Corporation | Field-Programmable Gate Array Based Accelerator System |
CN108280514A (en) * | 2018-01-05 | 2018-07-13 | 中国科学技术大学 | Sparse neural network acceleration system based on FPGA and design method |
CN109635936A (en) * | 2018-12-29 | 2019-04-16 | 杭州国芯科技股份有限公司 | A kind of neural networks pruning quantization method based on retraining |
- 2019-04-29: Application CN201910354693.7A filed; later granted as patent CN110097186B (status: Active)
Non-Patent Citations (2)
Title |
---|
刘强等 (Liu Qiang et al.): "基于GPU的并行拟牛顿神经网络训练算法设计" [Design of a GPU-based parallel quasi-Newton neural network training algorithm], 《河海大学学报(自然科学版)》 [Journal of Hohai University (Natural Sciences)] * |
刘进锋 (Liu Jinfeng): "一种简洁高效的加速卷积神经网络的方法" [A concise and efficient method for accelerating convolutional neural networks], 《科学技术与工程》 [Science Technology and Engineering] * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582476A (en) * | 2020-05-09 | 2020-08-25 | 北京百度网讯科技有限公司 | Automatic quantization strategy searching method, device, equipment and storage medium |
CN111598237A (en) * | 2020-05-21 | 2020-08-28 | 上海商汤智能科技有限公司 | Quantization training method, image processing device, and storage medium |
CN111598237B (en) * | 2020-05-21 | 2024-06-11 | 上海商汤智能科技有限公司 | Quantization training, image processing method and device, and storage medium |
CN112258377A (en) * | 2020-10-13 | 2021-01-22 | 国家计算机网络与信息安全管理中心 | Method and equipment for constructing robust binary neural network |
CN112308215A (en) * | 2020-12-31 | 2021-02-02 | 之江实验室 | Intelligent training acceleration method and system based on data sparse characteristic in neural network |
CN113033784A (en) * | 2021-04-18 | 2021-06-25 | 沈阳雅译网络技术有限公司 | Method for searching neural network structure for CPU and GPU equipment |
CN114611697A (en) * | 2022-05-11 | 2022-06-10 | 上海登临科技有限公司 | Neural network quantification and deployment method, system, electronic device and storage medium |
CN116451757A (en) * | 2023-06-19 | 2023-07-18 | 山东浪潮科学研究院有限公司 | Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model |
CN116451757B (en) * | 2023-06-19 | 2023-09-08 | 山东浪潮科学研究院有限公司 | Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model |
CN116911350A (en) * | 2023-09-12 | 2023-10-20 | 苏州浪潮智能科技有限公司 | Quantification method based on graph neural network model, task processing method and task processing device |
CN116911350B (en) * | 2023-09-12 | 2024-01-09 | 苏州浪潮智能科技有限公司 | Quantification method based on graph neural network model, task processing method and task processing device |
Also Published As
Publication number | Publication date |
---|---|
CN110097186B (en) | 2023-04-18 |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
2023-03-20 | TA01 | Transfer of patent application right | Address after: Building S02, No. 1036, Langchao Road, High-tech Zone, Jinan City, Shandong Province, 250000. Applicant after: Shandong Inspur Scientific Research Institute Co., Ltd. Address before: First floor of R&D Building, 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province, 250100. Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co., Ltd.
 | GR01 | Patent grant | 