CN110097186A - Neural network heterogeneous quantization training method - Google Patents

Neural network heterogeneous quantization training method Download PDF

Info

Publication number
CN110097186A
CN110097186A (Application No. CN201910354693.7A)
Authority
CN
China
Prior art keywords
training
quantization
data
neural network
hardware
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910354693.7A
Other languages
Chinese (zh)
Other versions
CN110097186B (en)
Inventor
王子彤
姜凯
秦刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd filed Critical Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN201910354693.7A priority Critical patent/CN110097186B/en
Publication of CN110097186A publication Critical patent/CN110097186A/en
Application granted granted Critical
Publication of CN110097186B publication Critical patent/CN110097186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention provides a neural network heterogeneous quantization training method and belongs to the technical field of artificial neural networks. On the basis of a conventional training framework built on a CPU, a GPU, or a combination of the two, the invention adds high-speed interface logic and connects a hardware computing acceleration module through that interface logic. During training, one or more specific computation steps are offloaded to the hardware computing acceleration module; once the computation is finished, the result is returned to the original training host through the high-speed interface logic, completing a training process with specific customized functions. Cutting-edge new structures or new algorithms can thus be quickly implemented and deployed in training, which improves system flexibility, reduces storage and bandwidth requirements, lowers resource consumption during forward inference, reduces training complexity, improves training efficiency, and ensures that the existing training equipment can better adapt to the latest neural network structures.

Description

Neural network heterogeneous quantization training method
Technical field
The present invention relates to the technical field of artificial neural networks, and in particular to a neural network heterogeneous quantization training method.
Background technique
In neural network training, a set of training samples is fed into the network, and the weights are adjusted according to the difference between the actual output of the network and the desired output. The training process consists of defining the network structure, computing the forward-propagation output, calculating the error between the output and the expected value, propagating the error back layer by layer, and then updating the weights. The network weights are thus adjusted by the training samples and their target values.
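As context, a minimal sketch of this conventional forward/backward training loop is shown below (PyTorch is used purely for illustration; the framework, model shape, and optimizer are assumptions, not part of the patent):

```python
import torch
import torch.nn as nn

# Small illustrative model; the structure is an assumption, not from the patent.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    outputs = model(inputs)             # forward propagation
    loss = criterion(outputs, targets)  # error between output and expected value
    loss.backward()                     # error propagated back layer by layer
    optimizer.step()                    # weight update
    return loss.item()
```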
A CPU is good at logic control, serial arithmetic, and general-purpose data operations, while a GPU excels at large-scale parallel computation and multitasking. CPUs and GPUs can each complete tasks efficiently in their respective domains, and they are also the mainstream platforms for neural network training today.
As research deepens, more and more new structures and new algorithms are continually being proposed, placing higher requirements and challenges on general-purpose CPU and GPU training methods: specific detailed structures are difficult to implement quickly, and the training time may become even longer.
Summary of the invention
To solve the above technical problems, the present invention proposes a neural network heterogeneous quantization training method. The original training process is accelerated in a heterogeneous manner, so that cutting-edge new structures (such as special convolution types) or new algorithms (such as model parameter quantization) can be quickly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth requirements, lowers resource consumption during forward inference, reduces training complexity, improves training efficiency, and ensures that the existing training equipment can better adapt to the latest neural network structures.
The technical scheme of the present invention is as follows:
In the neural network heterogeneous quantization training method, on the basis of a conventional training framework built on a CPU, a GPU, or a combination of the two, high-speed interface logic is added and a hardware quantization acceleration module is connected through that interface logic. A quantization step is added to the training process: the quantization computation of the model parameters and feature map results is offloaded to the hardware quantization acceleration module, the result of the quantization computation is returned to the original training host through the high-speed interface logic, and the quantized model parameters are updated. Iterating in this way completes a training process with model parameter and feature map quantization.
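A minimal host-side sketch of this offload pattern follows. The QuantizationAccelerator class and its quantize method are hypothetical placeholders for the hardware module behind the high-speed interface; the patent does not define a software API, and the quantization policy shown is an assumption:

```python
import numpy as np

class QuantizationAccelerator:
    """Stand-in for the hardware quantization acceleration module reached over
    the high-speed interface (PCIe, USB 3.0, 10-GbE); emulated on the host here."""

    def quantize(self, tensor: np.ndarray, bits: int = 8) -> np.ndarray:
        max_abs = float(np.max(np.abs(tensor)))
        scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
        q = np.clip(np.round(tensor / scale),
                    -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
        return q * scale  # result returned to the original training host

accelerator = QuantizationAccelerator()

def update_and_quantize(weights: np.ndarray, grad: np.ndarray,
                        lr: float = 0.01) -> np.ndarray:
    weights = weights - lr * grad         # conventional CPU/GPU weight update
    return accelerator.quantize(weights)  # quantization step offloaded to hardware
```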
Further, the hardware quantization acceleration module is responsible for the low-bit quantization of the neural network model parameters and the neural network feature map results; it is implemented by dedicated circuitry and forms a heterogeneous structure with the conventional CPU or GPU that hosts the training.
Further, the data quantization operations include: data buffering, data statistics and sorting, data compression and decompression, data hashing and table lookup, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, data dequantization, and so on.
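As an illustration of the float-to-fixed-point conversion, shift/scale/truncation, and dequantization operations listed above, a short sketch follows; the bit widths and rounding policy are assumptions, not values fixed by the patent:

```python
import numpy as np

def float_to_fixed(x: np.ndarray, bits: int = 8, frac_bits: int = 4) -> np.ndarray:
    """Quantize to signed fixed-point with `frac_bits` fractional bits."""
    scaled = np.round(x * (1 << frac_bits))           # shift / scale
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return np.clip(scaled, lo, hi).astype(np.int32)   # truncate to the bit-width range

def fixed_to_float(q: np.ndarray, frac_bits: int = 4) -> np.ndarray:
    """Data dequantization: convert fixed-point values back to floating point."""
    return q.astype(np.float32) / (1 << frac_bits)
```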
The specific customized functions include, but are not limited to: model parameter quantization, conversion of floating-point numbers to fixed-point numbers, and special convolution operations such as dilated convolution, depth-wise convolution, 1x1 multiplier arrays, and fully connected multiply-accumulate arrays. These customized functions are implemented by the hardware computing acceleration module, which may be enabled repeatedly or only once during the training process to complete the specific function.
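For example, the depth-wise convolution named above applies one kernel per input channel; a host-side reference sketch is given below (data layout, stride, and padding choices are assumptions, and the hardware module would implement the equivalent computation in circuitry):

```python
import numpy as np

def depthwise_conv2d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """x: (C, H, W); kernels: (C, kH, kW); stride 1, no padding."""
    c, h, w = x.shape
    _, kh, kw = kernels.shape
    out = np.zeros((c, h - kh + 1, w - kw + 1), dtype=np.result_type(x, kernels))
    for ch in range(c):                      # each channel uses its own kernel
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(x[ch, i:i + kh, j:j + kw] * kernels[ch])
    return out
```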
The method specifically includes the following steps:
1) Under the conventional CPU- or GPU-based training framework, set the initial values of the neural network model parameters and hyperparameters, initialize the hardware quantization acceleration module at the same time, and start training;
2) After the first round of backpropagation has updated the parameters of the last layer of the neural network, pass the updated weight parameters to the hardware quantization acceleration module. The weight parameters are first compressed and stored using a generic data compression method such as GZIP or entropy coding; the data are then statistically sorted, shifted and truncated according to the desired fixed-point bit width, and limited to maximum and minimum values. The quantized weight parameters are passed back so that backpropagation continues with the update of the previous layer's parameters in the conventional framework, until the first round of backpropagation is completed and all weight parameters are obtained (a sketch of this step follows the list below);
3) Repeat step 2) to update the weights, completing multiple rounds of backpropagation until the model loss requirement is reached and training is complete;
4) In addition to the weight parameters, the feature map results of each layer of the neural network may also be quantized, so that inference of the entire model is further quantized;
5) As needed, a hash operation may be applied to the weight data to obtain index values so that the data are further compressed, or dequantization may be performed immediately after quantization to reduce quantization loss.
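A host-side sketch of step 2) is given below. Here zlib stands in for the GZIP-style first-pass compression and the shift/clamp policy is an assumption; in the described system these operations run inside the hardware quantization acceleration module rather than on the host:

```python
import zlib
import numpy as np

def quantize_layer_weights(weights: np.ndarray, bits: int = 8):
    # First-pass compression and temporary storage of the raw weights.
    stored = zlib.compress(weights.astype(np.float32).tobytes())

    # Statistics by sorting: take the extreme values to bound the range.
    sorted_vals = np.sort(weights.ravel())
    w_min, w_max = float(sorted_vals[0]), float(sorted_vals[-1])

    # Shift/scale to the desired fixed-point bit width and clamp (limit max/min).
    max_abs = max(abs(w_min), abs(w_max))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale),
                -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

    # The quantized weights (optionally dequantized, see step 5) go back to the host.
    return q.astype(np.int8), scale, stored
```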
The hardware computing acceleration module is implemented as a logic configuration on an FPGA or ACAP with external non-volatile memory. Different customized functions can be stored simultaneously, and the FPGA or ACAP is configured in real time according to the training demand to perform different functions within the same training process.
The high-speed interface logic is implemented using, but not limited to, a PCIe interface, a USB 3.0 interface, or a 10-Gigabit Ethernet interface, and communicates with the original training host.
The beneficial effects of the invention are as follows
The original training process is accelerated in a heterogeneous manner, so that cutting-edge new structures (such as special convolution types) or new algorithms (such as model parameter quantization) can be quickly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth requirements, lowers resource consumption during forward inference, reduces training complexity, improves training efficiency, and ensures that the existing training equipment can better adapt to the latest neural network structures.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In the neural network heterogeneous quantization training method of the present invention, on the basis of a conventional training framework built on a CPU, a GPU, or a combination of the two, high-speed interface logic is added and a hardware quantization acceleration module is connected through that interface logic. A quantization step is added to the training process: the quantization computation of the model parameters and feature map results is offloaded to the hardware quantization acceleration module, the result of the quantization computation is returned to the original training host through the high-speed interface logic, and the quantized model parameters are updated. Iterating in this way completes a training process with model parameter and feature map quantization.
The hardware quantization acceleration module is responsible for the low-bit quantization of the neural network model parameters and the neural network feature map results; it is implemented by dedicated circuitry and forms a heterogeneous structure with the conventional CPU or GPU that hosts the training. The data quantization operations include data buffering, data statistics and sorting, data compression and decompression, data hashing and table lookup, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, data dequantization, and so on.
The method includes the following steps:
1) Under the conventional CPU- or GPU-based training framework, set the initial values of the neural network model parameters and hyperparameters, initialize the hardware quantization acceleration module at the same time, and start training;
2) After the first round of backpropagation has updated the parameters of the last layer of the neural network, pass the updated weight parameters to the hardware quantization acceleration module. The weight parameters are first compressed and stored using a generic data compression method such as GZIP or entropy coding; the data are then statistically sorted, shifted and truncated according to the desired fixed-point bit width, and limited to maximum and minimum values. The quantized weight parameters are passed back so that backpropagation continues with the update of the previous layer's parameters in the conventional framework, until the first round of backpropagation is completed and all weight parameters are obtained;
3) Repeat step 2) to update the weights, completing multiple rounds of backpropagation until the model loss requirement is reached and training is complete;
4) In addition to the weight parameters, the feature map results of each layer of the neural network may also be quantized, so that inference of the entire model is further quantized;
5) As needed, a hash operation may be applied to the weight data to obtain index values so that the data are further compressed (a codebook-style sketch of this follows the list), or dequantization may be performed immediately after quantization to reduce quantization loss.
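As an illustration of the weight hashing and table-lookup compression mentioned in step 5), a small sketch follows; the codebook size and the nearest-value mapping are assumptions for illustration, since the patent does not fix the hashing scheme:

```python
import numpy as np

def build_codebook(weights: np.ndarray, n_entries: int = 16) -> np.ndarray:
    """Pick evenly spaced representative values spanning the weight range."""
    return np.linspace(weights.min(), weights.max(), n_entries).astype(np.float32)

def weights_to_indices(weights: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each weight to the index of its nearest codebook entry (4-bit here)."""
    idx = np.abs(weights.reshape(-1, 1) - codebook.reshape(1, -1)).argmin(axis=1)
    return idx.astype(np.uint8).reshape(weights.shape)

def indices_to_weights(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Table lookup: dequantize indices back to representative weight values."""
    return codebook[indices]
```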
The hardware computing acceleration module is implemented as a logic configuration on an FPGA or ACAP with external non-volatile memory. Different customized functions can be stored simultaneously, and the FPGA or ACAP is configured in real time according to the training demand to perform different functions within the same training process. The high-speed interface logic is implemented using, but not limited to, a PCIe interface, a USB 3.0 interface, or a 10-Gigabit Ethernet interface, and communicates with the original training host.
The foregoing are merely preferred embodiments of the present invention and are only used to illustrate the technical solutions of the present invention, not to limit its protection scope. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (9)

1. A neural network heterogeneous quantization training method, characterized in that:
On the basis of a conventional training framework built on a CPU, a GPU, or a combination of the two, high-speed interface logic is added and a hardware quantization acceleration module is connected through the high-speed interface logic; during training, the quantization computation of the model parameters and feature map results is offloaded to the hardware quantization acceleration module, the result of the quantization computation is returned to the original training host through the high-speed interface logic, and the quantized model parameters are updated; iterating in this way completes a training process with model parameter and feature map quantization.
2. The method according to claim 1, characterized in that:
The hardware quantization acceleration module is responsible for the low-bit quantization of the neural network model parameters and the neural network feature map results.
3. The method according to claim 2, characterized in that:
The hardware quantization acceleration module is implemented by circuitry and forms a heterogeneous structure with the conventional CPU or GPU that hosts the training.
4. The method according to claim 1, characterized in that:
The data quantization operations include: data buffering, data statistics and sorting, data compression and decompression, data hashing and table lookup, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, and data dequantization.
5. The method according to claim 1, characterized in that:
The specific steps are as follows:
1) Under the conventional CPU- or GPU-based training framework, set the initial values of the neural network model parameters and hyperparameters, initialize the hardware quantization acceleration module at the same time, and start training;
2) After the first round of backpropagation has updated the parameters of the last layer of the neural network, pass the updated weight parameters to the hardware quantization acceleration module; the weight parameters are first compressed and stored by a data compression method, the data are then statistically sorted, shifted and truncated according to the desired fixed-point bit width, and limited to maximum and minimum values; the quantized weight parameters are passed back so that backpropagation continues with the update of the previous layer's parameters in the conventional framework, until the first round of backpropagation is completed and all weight parameters are obtained;
3) Repeat step 2) to update the weights, completing multiple rounds of backpropagation until the model loss requirement is reached and training is complete.
6. The method according to claim 5, characterized in that:
In addition to the weight parameters, the feature map results of each layer of the neural network may also be quantized, so that inference of the entire model is further quantized.
7. The method according to claim 6, characterized in that:
As needed, a hash operation may be applied to the weight data to obtain index values so that the data are further compressed, or dequantization may be performed immediately after quantization to reduce quantization loss.
8. The method according to claim 1, characterized in that:
The hardware computing acceleration module is implemented as a logic configuration on an FPGA or ACAP with external non-volatile memory; different customized functions can be stored simultaneously, and the FPGA or ACAP is configured in real time according to the training demand to perform different functions within the same training process.
9. The method according to claim 1, characterized in that:
The high-speed interface logic may be implemented using a PCIe interface, a USB 3.0 interface, or a 10-Gigabit Ethernet interface, and communicates with the original training host.
CN201910354693.7A 2019-04-29 2019-04-29 Neural network heterogeneous quantitative training method Active CN110097186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910354693.7A CN110097186B (en) 2019-04-29 2019-04-29 Neural network heterogeneous quantitative training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910354693.7A CN110097186B (en) 2019-04-29 2019-04-29 Neural network heterogeneous quantitative training method

Publications (2)

Publication Number Publication Date
CN110097186A true CN110097186A (en) 2019-08-06
CN110097186B CN110097186B (en) 2023-04-18

Family

ID=67446342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910354693.7A Active CN110097186B (en) 2019-04-29 2019-04-29 Neural network heterogeneous quantitative training method

Country Status (1)

Country Link
CN (1) CN110097186B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582476A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Automatic quantization strategy searching method, device, equipment and storage medium
CN111598237A (en) * 2020-05-21 2020-08-28 上海商汤智能科技有限公司 Quantization training method, image processing device, and storage medium
CN112258377A (en) * 2020-10-13 2021-01-22 国家计算机网络与信息安全管理中心 Method and equipment for constructing robust binary neural network
CN112308215A (en) * 2020-12-31 2021-02-02 之江实验室 Intelligent training acceleration method and system based on data sparse characteristic in neural network
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment
CN114611697A (en) * 2022-05-11 2022-06-10 上海登临科技有限公司 Neural network quantification and deployment method, system, electronic device and storage medium
CN116451757A (en) * 2023-06-19 2023-07-18 山东浪潮科学研究院有限公司 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
CN116911350A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Quantification method based on graph neural network model, task processing method and task processing device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076915A1 (en) * 2008-09-25 2010-03-25 Microsoft Corporation Field-Programmable Gate Array Based Accelerator System
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076915A1 (en) * 2008-09-25 2010-03-25 Microsoft Corporation Field-Programmable Gate Array Based Accelerator System
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU QIANG et al.: "Design of a parallel quasi-Newton neural network training algorithm based on GPU", Journal of Hohai University (Natural Sciences) *
LIU JINFENG: "A simple and efficient method for accelerating convolutional neural networks", Science Technology and Engineering *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582476A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Automatic quantization strategy searching method, device, equipment and storage medium
CN111598237A (en) * 2020-05-21 2020-08-28 上海商汤智能科技有限公司 Quantization training method, image processing device, and storage medium
CN111598237B (en) * 2020-05-21 2024-06-11 上海商汤智能科技有限公司 Quantization training, image processing method and device, and storage medium
CN112258377A (en) * 2020-10-13 2021-01-22 国家计算机网络与信息安全管理中心 Method and equipment for constructing robust binary neural network
CN112308215A (en) * 2020-12-31 2021-02-02 之江实验室 Intelligent training acceleration method and system based on data sparse characteristic in neural network
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment
CN114611697A (en) * 2022-05-11 2022-06-10 上海登临科技有限公司 Neural network quantification and deployment method, system, electronic device and storage medium
CN116451757A (en) * 2023-06-19 2023-07-18 山东浪潮科学研究院有限公司 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
CN116451757B (en) * 2023-06-19 2023-09-08 山东浪潮科学研究院有限公司 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
CN116911350A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Quantification method based on graph neural network model, task processing method and task processing device
CN116911350B (en) * 2023-09-12 2024-01-09 苏州浪潮智能科技有限公司 Quantification method based on graph neural network model, task processing method and task processing device

Also Published As

Publication number Publication date
CN110097186B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110097186A (en) A kind of neural network isomery quantization training method
Koloskova et al. Decentralized stochastic optimization and gossip algorithms with compressed communication
CN113315604B (en) Adaptive gradient quantization method for federated learning
Ozfatura et al. Speeding up distributed gradient descent by utilizing non-persistent stragglers
CN110135573B (en) Training method, computing equipment and system for deep learning model
CN111030861B (en) Edge calculation distributed model training method, terminal and network side equipment
CN110245743A (en) A kind of asynchronous distributed deep learning training method, apparatus and system
CN108460457A (en) A kind of more asynchronous training methods of card hybrid parallel of multimachine towards convolutional neural networks
CN112862088A (en) Distributed deep learning method based on pipeline annular parameter communication
CN108334945A (en) The acceleration of deep neural network and compression method and device
CN107644252A (en) A kind of recurrent neural networks model compression method of more mechanism mixing
CN106777449A (en) Distribution Network Reconfiguration based on binary particle swarm algorithm
CN107911300B (en) Multicast routing optimization method based on whale algorithm and application of multicast routing optimization method on Spark platform
CN111222532A (en) Edge cloud collaborative deep learning model training method with classification precision maintenance and bandwidth protection
CN109214512B (en) Deep learning parameter exchange method, device, server and storage medium
CN108985444A (en) A kind of convolutional neural networks pruning method inhibited based on node
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN103679564B (en) Task allocation method applicable to power distribution network topology analysis distributed computation
CN106846236A (en) A kind of expansible distributed GPU accelerating method and devices
Zhang et al. Evaluation and optimization of gradient compression for distributed deep learning
Chen et al. A channel aggregation based dynamic pruning method in federated learning
WO2018082320A1 (en) Data stream join method and device
Chandrachoodan A GPU implementation of belief propagation decoder for polar codes
CN110021339A (en) Cluster parallel computing accelerated method based on protein folding measuring and calculating protein structure
CN112231113A (en) Message transmission method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20230320

Address after: 250000 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant
GR01 Patent grant