CN110097186B - Neural network heterogeneous quantization training method - Google Patents

Neural network heterogeneous quantization training method

Info

Publication number
CN110097186B
Authority
CN
China
Prior art keywords
training
quantization
data
neural network
parameters
Prior art date
Legal status
Active
Application number
CN201910354693.7A
Other languages
Chinese (zh)
Other versions
CN110097186A (en)
Inventor
王子彤
姜凯
秦刚
Current Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN201910354693.7A
Publication of CN110097186A
Application granted
Publication of CN110097186B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a neural network heterogeneous quantization training method, belonging to the technical field of artificial neural networks. High-speed interface logic is added to a traditional training framework based on a CPU (central processing unit), a GPU (graphics processing unit), or a combination of the two, and a hardware computation acceleration module is connected through this interface logic. One or more specific computation processes in training are transferred to the hardware computation acceleration module, and when computation finishes the results are returned to the original training master control through the high-speed interface logic, completing a training flow with specific customized functions. New front-edge structures and algorithms can thus be quickly implemented and deployed in training, improving system flexibility, reducing storage and bandwidth requirements, lowering the resource demands of the forward prediction process, reducing training complexity, improving training efficiency, and allowing current training hardware to better adapt to the latest neural network structures.

Description

Neural network heterogeneous quantization training method
Technical Field
The invention relates to the technical field of artificial neural networks, and in particular to a neural network heterogeneous quantization training method.
Background
Neural network training feeds a training set into the network and adjusts the weights according to the difference between the network's actual output and its expected output. The training process comprises: defining the structure of the neural network and the output of forward propagation, computing the error between that output and the expected value, propagating the error back layer by layer, and then updating the weights. The network weights are thus adjusted from training samples and their expected outputs.
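By way of illustration only (this loop is standard practice, not specific to the invention), a minimal single-layer sketch of the training process in Python might look as follows; the layer sizes, batch size, and learning rate are assumptions chosen for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 2)) * 0.1      # weights of a single layer
    x = rng.standard_normal((8, 4))            # a batch of training samples
    y_expected = rng.standard_normal((8, 2))   # expected outputs
    lr = 0.01                                  # learning rate (hyper-parameter)

    for step in range(100):
        y_actual = x @ W                       # forward propagation
        err = y_actual - y_expected            # difference from the expected output
        grad_w = x.T @ err / len(x)            # error propagated back to the weights
        W -= lr * grad_w                       # weight update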
The CPU excels at logic control, serial operation, and general-purpose data processing, while the GPU emphasizes large-scale parallel computation across many tasks. Each completes the tasks of its own domain efficiently, and together they constitute the mainstream mode of current neural network training.
As research deepens, more and more new structures and new algorithms are being proposed, placing higher demands and challenges on general-purpose CPU and GPU training modes: specific fine-grained structures are difficult to implement quickly, and training can become increasingly tedious.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a neural network heterogeneous quantization training method. The original training flow is accelerated heterogeneously, so that a new front-edge structure (such as a special convolution type) or a new algorithm (such as model parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth requirements, lowers the resource demands of the forward prediction process, reduces training complexity, improves training efficiency, and allows current training hardware to adapt well to the latest neural network structures.
The technical scheme of the invention is as follows:
In the neural network heterogeneous quantization training method, high-speed interface logic is added to a traditional training framework based on a CPU, a GPU, or a combination of the two, and a hardware quantization acceleration module is connected through this interface logic. A quantization step is added to the training process: the quantization computation for model parameters and feature-map results is transferred to the hardware quantization acceleration module, the results of the quantization computation are returned to the original training master control through the high-speed interface logic, the quantized model parameters are updated, and the training process that quantizes the model parameters and feature-map results is completed iteratively.
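The control flow just described can be sketched in software as follows. Every name here (accelerator.quantize, the model methods, the loss check) is a hypothetical stand-in for the hardware quantization acceleration module and the training master control; the patent defines no software API, so this is a sketch of the described flow, not an implementation of it.

    def train_heterogeneous(model, data_loader, accelerator, max_epochs=100):
        """Sketch: each updated parameter tensor is quantized on the accelerator."""
        for epoch in range(max_epochs):
            for batch, expected in data_loader:
                output = model.forward(batch)                # forward propagation
                grads = model.backward(output, expected)     # backpropagation
                for name, param in model.update(grads):      # layer-by-layer updates
                    quantized = accelerator.quantize(param)  # offloaded over the
                    model.set_param(name, quantized)         # high-speed interface
            if model.loss() <= model.loss_target:            # model loss requirement met
                break
        return model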
Furthermore, the hardware quantization acceleration module is responsible for completing the low-bit quantization of the neural network model parameters and feature-map results; it is implemented by a dedicated circuit and forms a heterogeneous structure with the traditional CPU or GPU training host.
Further, the data quantization operations include: temporary storage of data, statistical sorting of data, compression and decompression of data, hashing and table lookup of data, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, inverse quantization of data, and the like.
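As one concrete possibility, the floating-point-to-fixed-point conversion and the inverse quantization could be realized as below. Symmetric scaling to a signed bit width is an assumption made for the example; the patent does not fix a particular scheme.

    import numpy as np

    def quantize_fixed_point(x, bits=8):
        """Convert floats to signed fixed-point integers: scale, round, clamp."""
        qmax = 2 ** (bits - 1) - 1
        scale = float(np.max(np.abs(x))) / qmax
        if scale == 0.0:                   # all-zero input: any scale will do
            scale = 1.0
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
        return q, scale

    def dequantize(q, scale):
        """Inverse quantization of data: recover approximate float values."""
        return q.astype(np.float32) * scale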
The specific customized functions include, but are not limited to: model parameter quantization, floating-point to fixed-point conversion, and special convolution operations such as dilated convolution, depthwise convolution, 1x1 multiplier arrays, and fully connected multiplier-adder arrays. A customized function is realized by the hardware computation acceleration module, which may be enabled many times, or only once, during training to complete that function.
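As an illustration of one such customized function (chosen here as an example, not prescribed by the patent), a depthwise convolution convolves each input channel with its own kernel and never mixes channels:

    import numpy as np

    def depthwise_conv2d(x, kernels):
        """x: (C, H, W); kernels: (C, kH, kW); stride 1, no padding (assumed)."""
        C, H, W = x.shape
        _, kH, kW = kernels.shape
        out = np.zeros((C, H - kH + 1, W - kW + 1))
        for c in range(C):                             # one kernel per channel
            for i in range(H - kH + 1):
                for j in range(W - kW + 1):
                    out[c, i, j] = np.sum(x[c, i:i + kH, j:j + kW] * kernels[c])
        return out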
The method specifically comprises the following steps:
1) Under the traditional CPU- or GPU-based training framework, set the initial values of the neural network model parameters and hyper-parameters, initialize the hardware quantization acceleration module, and start training;
2) After the parameters of the last layer of the neural network are updated by the first round of backpropagation, the updated weight parameters are passed into the hardware quantization acceleration module. There, the weight parameters are first compressed and stored using a general data compression method such as GZIP or entropy coding; the data are then statistically sorted, shifted and truncated to the desired number of fixed-point bits, and limited to maximum and minimum values, yielding the quantized weight parameters. The quantized weight parameters are passed back so that backpropagation in the traditional framework continues updating the preceding layer's parameters, until the first round of backpropagation is complete and all weight parameters have been obtained (steps 2 and 5 are sketched in code after this list);
3) Repeat step 2) to update the weights, completing multiple rounds of backpropagation, until the model loss requirement is met and training is complete;
4) Besides the weight parameters, the feature-map results of each layer of the neural network can also be quantized, so as to further quantize the inference of the whole model;
5) As required, a hash operation can be applied to the weight data to obtain index values and compress the data further, or inverse quantization can be performed immediately after quantization to reduce the quantization loss.
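A software sketch of steps 2) and 5) follows, under stated assumptions: GZIP stands in for the preliminary compression, a percentile clip stands in for the statistical sorting and range limiting, the bit widths are illustrative, and a small value codebook stands in for the hash/index step.

    import gzip
    import numpy as np

    def quantize_weights(w, frac_bits=6, int_bits=1):
        """Step 2 sketch: compress, bound the range, shift and truncate."""
        stored = gzip.compress(w.astype(np.float32).tobytes())  # preliminary compression
        lo, hi = np.percentile(w, [0.5, 99.5])     # statistics give the max/min limits
        q = np.round(np.clip(w, lo, hi) * (1 << frac_bits))     # shift by fractional bits
        qmax = (1 << (frac_bits + int_bits)) - 1
        q = np.clip(q, -qmax - 1, qmax).astype(np.int16)        # truncate to fixed point
        return q, stored

    def index_weights(q):
        """Step 5 sketch: map distinct quantized values to small index values."""
        codebook, idx = np.unique(q, return_inverse=True)       # table-lookup stand-in
        return codebook, idx.reshape(q.shape)                   # indices compress further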
The hardware computation acceleration module is implemented by logic configuration on an FPGA or an ACAP and is attached to an external nonvolatile memory device. This memory can hold several different customized functions simultaneously, and the FPGA or ACAP is configured in real time according to training needs, so that different functions can be completed within the same training run.
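A hypothetical sketch of that reconfiguration flow is given below; load_bitstream is an assumed stand-in for the vendor-specific FPGA/ACAP configuration call, and the function names and file paths are invented for the example.

    BITSTREAMS = {                         # customized functions held in nonvolatile memory
        "quantize": "nvmem/quantize.bit",
        "depthwise_conv": "nvmem/depthwise_conv.bit",
        "fc_mac_array": "nvmem/fc_mac_array.bit",
    }

    def configure_for(function, load_bitstream):
        """Reconfigure the accelerator for the function the current phase needs."""
        load_bitstream(BITSTREAMS[function])  # real-time configuration per training need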
The high-speed interface logic includes, but is not limited to, a PCIe interface, a USB 3.0 interface, or a gigabit Ethernet interface, and communicates and interacts with the original training master control.
The advantages of the invention are as follows:
The original training flow is accelerated heterogeneously, so that a new front-edge structure (such as a special convolution type) or a new algorithm (such as model parameter quantization) can be rapidly implemented and deployed in training. This improves system flexibility, reduces storage and bandwidth requirements, lowers the resource demands of the forward prediction process, reduces training complexity, improves training efficiency, and allows current training hardware to adapt well to the latest neural network structures.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the invention clearer and more complete, the technical solutions in the embodiments of the invention are described below. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them; all other embodiments obtained by a person of ordinary skill in the art, based on these embodiments and without creative effort, fall within the protection scope of the invention.
The invention discloses a neural network heterogeneous quantization training method. High-speed interface logic is added to a traditional training framework based on a CPU, a GPU, or a combination of the two, and a hardware quantization acceleration module is connected through this interface logic. A quantization step is added to the training process: the quantization computation for model parameters and feature-map results is transferred to the hardware quantization acceleration module, the results of the quantization computation are returned to the original training master control through the high-speed interface logic, the quantized model parameters are updated, and the training process that quantizes the model parameters and feature-map results is completed iteratively.
The hardware quantization acceleration module is responsible for completing the low-bit quantization of the neural network model parameters and feature-map results; it is implemented by a dedicated circuit and forms a heterogeneous structure with the traditional CPU or GPU training host. The data quantization operations include: temporary storage of data, statistical sorting of data, compression and decompression of data, hashing and table lookup of data, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, inverse quantization of data, and the like.
The method comprises the following steps:
1) Under the traditional CPU- or GPU-based training framework, set the initial values of the neural network model parameters and hyper-parameters, initialize the hardware quantization acceleration module, and start training;
2) After the parameters of the last layer of the neural network are updated by the first round of backpropagation, the updated weight parameters are passed into the hardware quantization acceleration module. There, the weight parameters are first compressed and stored using a general data compression method such as GZIP or entropy coding; the data are then statistically sorted, shifted and truncated to the desired number of fixed-point bits, and limited to maximum and minimum values, yielding the quantized weight parameters. The quantized weight parameters are passed back so that backpropagation in the traditional framework continues updating the preceding layer's parameters, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) Repeat step 2) to update the weights, completing multiple rounds of backpropagation, until the model loss requirement is met and training is complete;
4) Besides the weight parameters, the feature-map results of each layer of the neural network can also be quantized, so as to further quantize the inference of the whole model;
5) As required, a hash operation can be applied to the weight data to obtain index values and compress the data further, or inverse quantization can be performed immediately after quantization to reduce the quantization loss;
the hardware calculation acceleration module is realized by logic configuration by adopting an FPGA or an ACAP, is externally connected with a nonvolatile memory device, can simultaneously store different customized functions, and configures the FPGA or the ACAP in real time according to training requirements to complete different functions in the same training process; the high-speed interface logic includes but is not limited to a PCIE interface, a USB3.0 interface, a gigabit Ethernet interface and the like, and is communicated and interacted with the original training master control.
The above description is only a preferred embodiment of the invention and serves only to illustrate its technical solutions, not to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims (6)

1. A neural network heterogeneous quantization training method, characterized in that,
on the basis of a traditional training framework based on a CPU, a GPU, or a combination of the two, high-speed interface logic is added, and a hardware quantization acceleration module is connected through the high-speed interface logic; during training, the quantization computation for model parameters and feature-map results is transferred to the hardware quantization acceleration module, the results of the quantization computation are returned to the original training master control through the high-speed interface logic, the quantized model parameters are updated, and the training process that quantizes the model parameters and feature-map results is completed iteratively;
the method comprises the following specific steps:
1) under the traditional CPU- or GPU-based training framework, setting the initial values of the neural network model parameters and hyper-parameters, initializing the hardware quantization acceleration module, and starting training;
2) after the parameters of the last layer of the neural network are updated by the first round of backpropagation, passing the updated weight parameters into the hardware quantization acceleration module, where the weight parameters are first compressed and stored by a data compression method, the data are then statistically sorted, shifted and truncated to the desired number of fixed-point bits, and limited to maximum and minimum values to obtain the quantized weight parameters; and passing the quantized weight parameters back to the traditional framework, where backpropagation continues updating the preceding layer's parameters, until the first round of backpropagation is complete and all weight parameters have been obtained;
3) repeating step 2) to update the weights, completing multiple rounds of backpropagation, until the model loss requirement is met and training is complete;
besides the weight parameters, the feature-map results of each layer of the neural network can also be quantized, so as to further quantize the inference of the whole model;
as required, a hash operation can be applied to the weight data to obtain index values and compress the data further, or inverse quantization can be performed on the data immediately after quantization to reduce the quantization loss.
2. The method of claim 1,
wherein the hardware quantization acceleration module is responsible for completing the low-bit quantization of the neural network model parameters and the neural network feature-map results.
3. The method of claim 2,
wherein the hardware quantization acceleration module is implemented by a circuit and forms a heterogeneous structure with the traditional CPU or GPU training host.
4. The method of claim 1,
wherein the data quantization operations comprise: temporary storage of data, statistical sorting of data, compression and decompression of data, hashing and table lookup of data, conversion of floating-point numbers to fixed-point numbers of a specified bit width, shifting, scaling and truncation of floating-point numbers, and inverse quantization of data.
5. The method of claim 1,
the hardware quantization acceleration module is realized by logic configuration by adopting an FPGA or an ACAP, is externally connected with a nonvolatile memory device, can simultaneously store different customized functions, and configures the FPGA or the ACAP in real time according to training requirements to complete different functions in the same training process.
6. The method of claim 1,
wherein the high-speed interface logic can be realized by a PCIe interface, a USB 3.0 interface, or a gigabit Ethernet interface, and communicates and interacts with the original training master control.
CN201910354693.7A 2019-04-29 2019-04-29 Neural network heterogeneous quantization training method Active CN110097186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910354693.7A CN110097186B (en) 2019-04-29 2019-04-29 Neural network heterogeneous quantization training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910354693.7A CN110097186B (en) 2019-04-29 2019-04-29 Neural network heterogeneous quantization training method

Publications (2)

Publication Number Publication Date
CN110097186A CN110097186A (en) 2019-08-06
CN110097186B (en) 2023-04-18

Family

ID=67446342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910354693.7A Active CN110097186B (en) Neural network heterogeneous quantization training method

Country Status (1)

Country Link
CN (1) CN110097186B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582476A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Automatic quantization strategy searching method, device, equipment and storage medium
CN112258377A (en) * 2020-10-13 2021-01-22 国家计算机网络与信息安全管理中心 Method and equipment for constructing robust binary neural network
CN112308215B (en) * 2020-12-31 2021-03-30 之江实验室 Intelligent training acceleration method and system based on data sparse characteristic in neural network
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment
CN114611697B (en) * 2022-05-11 2022-09-09 上海登临科技有限公司 Neural network quantification and deployment method, system, electronic device and storage medium
CN116451757B (en) * 2023-06-19 2023-09-08 山东浪潮科学研究院有限公司 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
CN116911350B (en) * 2023-09-12 2024-01-09 苏州浪潮智能科技有限公司 Quantification method based on graph neural network model, task processing method and task processing device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131659B2 (en) * 2008-09-25 2012-03-06 Microsoft Corporation Field-programmable gate array based accelerator system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A concise and efficient method for accelerating convolutional neural networks; Liu Jinfeng; Science Technology and Engineering; 2014-11-28 (No. 33); full text *
Design of a GPU-based parallel quasi-Newton neural network training algorithm; Liu Qiang et al.; Journal of Hohai University (Natural Science Edition); 2018-09-25 (No. 05); full text *

Also Published As

Publication number Publication date
CN110097186A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
CN110097186B (en) Neural network heterogeneous quantization training method
Mills et al. Communication-efficient federated learning for wireless edge intelligence in IoT
Eshratifar et al. Bottlenet: A deep learning architecture for intelligent mobile cloud computing services
CN111382844B (en) Training method and device for deep learning model
CN110928654A (en) Distributed online task unloading scheduling method in edge computing system
WO2020081399A1 (en) Network-centric architecture and algorithms to accelerate distributed training of neural networks
US11615301B2 (en) Lossless exponent and lossy mantissa weight compression for training deep neural networks
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN110309904B (en) Neural network compression method
CN111158912A (en) Task unloading decision method based on deep learning in cloud and mist collaborative computing environment
CN108985444A (en) A kind of convolutional neural networks pruning method inhibited based on node
CN109214512B (en) Deep learning parameter exchange method, device, server and storage medium
Struharik et al. Conna–compressed cnn hardware accelerator
CN110992432A (en) Depth neural network-based minimum variance gradient quantization compression and image processing method
CN110263917B (en) Neural network compression method and device
Li et al. Anycostfl: Efficient on-demand federated learning over heterogeneous edge devices
CN109308517B (en) Binary device, method and application for binary neural network
Huang et al. An improved LBG algorithm for image vector quantization
Chen et al. DNN gradient lossless compression: Can GenNorm be the answer?
CN111260049A (en) Neural network implementation method based on domestic embedded system
CN114065923A (en) Compression method, system and accelerating device of convolutional neural network
CN113487036B (en) Distributed training method and device of machine learning model, electronic equipment and medium
CN113487012B (en) FPGA-oriented deep convolutional neural network accelerator and design method
CN114118358A (en) Image processing method, image processing apparatus, electronic device, medium, and program product
Liao et al. Structured neural network with low complexity for MIMO detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230320

Address after: 250000 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant