WO2022022625A1 - Acceleration method and device for deep learning model - Google Patents

Acceleration method and device for deep learning model

Info

Publication number
WO2022022625A1
Authority
WO
WIPO (PCT)
Prior art keywords
channels
model
channel
contribution
evaluation value
Application number
PCT/CN2021/109187
Other languages
French (fr)
Chinese (zh)
Inventor
付家为
陈东
张放
李晓飞
张德兆
王肖
霍舒豪
Original Assignee
北京智行者科技有限公司
Application filed by 北京智行者科技有限公司
Publication of WO2022022625A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an acceleration method for a deep learning model, comprising: acquiring a contribution value for each of the plurality of channels in each convolutional layer of the model; clipping channels in the convolutional layers of the model according to the contribution values of all channels in all convolutional layers, to obtain a clipped model; training the unclipped model and the clipped model separately; evaluating the trained unclipped model and the trained clipped model respectively, to obtain a first evaluation value and a second evaluation value; and determining, according to the first evaluation value and the second evaluation value, whether to output the trained clipped model as a new model. Inference speed is thereby greatly increased with no loss, or only a very small loss, of inference accuracy.

Description

Acceleration method and device for deep learning model
Technical Field
The present invention relates to the field of data processing, and in particular to an acceleration method and device for a deep learning model.
Background
Deep learning algorithms are now applied in more and more fields, and when they are used, the accuracy and running speed of the algorithm are usually the foremost concerns. Many applications require deep learning algorithms to run in real time or at least quickly, yet on constrained hardware platforms, especially embedded platforms, the inference speed of a deep learning algorithm often cannot meet the requirement, so some method is needed to accelerate it. Accelerating a deep learning algorithm, however, is usually accompanied by a loss of inference accuracy. Finding a suitable acceleration method that keeps the loss of inference accuracy to a minimum is therefore crucial to the application of deep learning algorithms.
At present, many measures have been taken to improve the running speed of deep learning algorithms, mainly in the following directions: 1. acceleration of the convolution operation algorithm; 2. network weight quantization; 3. network structure optimization.
For accelerating the convolution operation algorithm, one approach is to accelerate the computation around the hardware characteristics of a specific platform. This approach is platform-limited and not very general: for example, acceleration using the Compute Unified Device Architecture (CUDA) computing platform of the graphics-card manufacturer Nvidia together with the NVIDIA CUDA Deep Neural Network library (cuDNN) can only be used on Nvidia platforms, and since those acceleration algorithms are already close to optimal, further optimization is difficult. A second approach is to accelerate the convolution operation itself, for example implementing convolution with the fast Fourier transform (FFT). The generality of this approach is also limited: FFT acceleration is pronounced only for convolutions with large kernels, whereas existing convolutional neural networks (CNNs) mostly use small kernels or even 1*1 kernels, for which the FFT acceleration effect is not obvious.
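The following minimal NumPy sketch (ours, not part of the patent text) illustrates the convolution theorem that FFT-based convolution relies on: a length-N circular convolution becomes an elementwise product in the frequency domain, costing O(N log N) instead of O(N^2).

```python
import numpy as np

N = 256
x = np.random.randn(N)                 # input signal
k = np.zeros(N)
k[:9] = np.random.randn(9)             # a 9-tap kernel, zero-padded to length N

# Circular convolution via the convolution theorem
fft_conv = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

# Direct circular convolution for comparison: y[n] = sum_j x[(n-j) mod N] * k[j]
direct = np.array([sum(x[(n - j) % N] * k[j] for j in range(N))
                   for n in range(N)])
assert np.allclose(fft_conv, direct)

# The transform cost is amortized only for large kernels; for the 3*3 or 1*1
# kernels common in CNNs, the FFT overhead dominates, as noted above.
```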
For network weight quantization, the network weights are binarized or their floating-point precision is reduced, thereby reducing the amount of convolution computation. This approach is often accompanied by a large loss of accuracy.
For network structure optimization there are methods in several different directions. One is to reduce the depth of the network, using a shallower network to reduce the amount of computation; this usually causes a large loss of inference accuracy. A second is to sparsify the network connections, for example achieving model compression through pruning and quantization; however, the platform generality of convolution based on the resulting sparse matrices is not ideal, and the acceleration effect on specific platforms is unsatisfactory. A third is tensor decomposition, which decomposes a tensor into multiple small tensors; but since the number of output channels does not change, 1*1 convolutional layers are hard to compress by tensor decomposition, while many current model structures use a large number of 1*1 convolutions, for example the Residual Neural Network (ResNet), Going deeper with convolutions (GoogLeNet), and Xception.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide an acceleration method and device for a deep learning model, so as to solve the prior-art problems of large accuracy loss and unsatisfactory acceleration on specific platforms.
In a first aspect, the present invention provides a method for accelerating a deep learning model, the method comprising:
acquiring a contribution value for each of a plurality of channels in each convolutional layer of the model;
clipping channels in the convolutional layers of the model according to the contribution values of the channels in all convolutional layers, to obtain a clipped model;
training the unclipped model and the clipped model separately;
evaluating the trained unclipped model and the trained clipped model respectively, to obtain a first evaluation value and a second evaluation value;
determining, according to the first evaluation value and the second evaluation value, whether to output the trained clipped model as a new model.
In a possible implementation, acquiring the contribution value of each of the plurality of channels in each convolutional layer of the model specifically comprises:
multiplying the output of each channel in the convolutional layer by the back-propagated gradient of that channel, and then by the inverse of the number of samples, to obtain the contribution value of the channel in the convolutional layer.
In a possible implementation, clipping the channels in the convolutional layers of the model according to the contribution values of all convolutional layers, to obtain the clipped model, specifically comprises:
sorting the contribution values of every channel of all convolutional layers;
clipping the channels whose contribution value is not greater than a preset contribution value threshold, determining the retained channels as a first number of channels, and the clipped channels as a second number of channels;
acquiring the number of retained channels of each convolutional layer;
when the number of retained channels of any convolutional layer is less than a preset channel-count threshold, taking the preset channel-count threshold minus the number of retained channels of that convolutional layer as a third number, determining a third number of channels from the second number of channels, and using those channels as retained channels of that convolutional layer;
obtaining the clipped model according to the first number of channels and the retained channels of each such convolutional layer.
In a possible implementation, before sorting the contribution values of every channel of all convolutional layers, the method further comprises:
acquiring the number of channels of each convolutional layer;
retaining the convolutional layers whose number of channels is less than the preset channel-count threshold;
sorting the contribution values of every channel of the convolutional layers whose number of channels is greater than the preset channel-count threshold.
In a possible implementation, the first evaluation value includes a first inference accuracy and a first inference speed, and the second evaluation value includes a second inference accuracy and a second inference speed; determining, according to the first evaluation value and the second evaluation value, whether to output the trained model as a new model specifically comprises:
when the difference between the first inference accuracy and the second inference accuracy is within a preset inference-accuracy threshold range and the second inference speed is less than a preset inference speed, continuing to clip the trained model; or,
when the difference between the first inference accuracy and the second inference accuracy is within the preset inference-accuracy threshold range and the second inference speed equals the preset inference speed, outputting the clipped model.
In a second aspect, the present invention provides an acceleration device for a deep learning model, the device comprising:
an acquisition module, configured to acquire a contribution value for each of a plurality of channels in each convolutional layer of the model;
a clipping module, configured to clip channels in the convolutional layers of the model according to the contribution values of the channels in all convolutional layers, to obtain a clipped model;
a training module, configured to train the unclipped model and the clipped model separately;
an evaluation module, configured to evaluate the trained unclipped model and the trained clipped model respectively, to obtain a first evaluation value and a second evaluation value;
a determination module, configured to determine, according to the first evaluation value and the second evaluation value, whether to output the trained clipped model as a new model.
In a possible implementation, the acquisition module is specifically configured to:
multiply the output of each channel in the convolutional layer by the back-propagated gradient of that channel, and then by the inverse of the number of samples, to obtain the contribution value of the channel in the convolutional layer.
In a possible implementation, the clipping module is specifically configured to:
sort the contribution values of every channel of all convolutional layers;
clip the channels whose contribution value is not greater than the preset contribution value threshold, determining the retained channels as a first number of channels, and the clipped channels as a second number of channels;
acquire the number of retained channels of each convolutional layer;
when the number of retained channels of any convolutional layer is less than the preset channel-count threshold, take the preset channel-count threshold minus the number of retained channels of that convolutional layer as a third number, determine a third number of channels from the second number of channels, and use those channels as retained channels of that convolutional layer;
obtain the clipped model according to the first number of channels and the retained channels of each such convolutional layer.
In a third aspect, the present invention provides a device comprising a memory and a processor, the memory being configured to store a program, and the processor being configured to execute the method of any implementation of the first aspect when the program runs.
In a fourth aspect, the present invention provides a computer program product containing instructions which, when the computer program product is run on a computer, cause the computer to perform the method of any implementation of the first aspect.
In a fifth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method of any implementation of the first aspect is implemented.
By applying the acceleration method and device for a deep learning model provided by the embodiments of the present invention, there is no dependence on any hardware platform, software, or deep learning framework, so generality is good; there is no restriction on the size, dimension, or form of the convolution kernel, so any convolution operation can be accelerated; and inference speed is greatly increased with no loss, or only a very small loss, of inference accuracy.
Description of Drawings
FIG. 1 is a schematic flowchart of the acceleration method for a deep learning model provided by Embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of channel clipping of convolutional layers provided by Embodiment 1 of the present invention;
FIG. 3 is a schematic structural diagram of the acceleration device for a deep learning model provided by Embodiment 2 of the present invention.
Detailed Description
The present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention, not to limit it. It should also be noted that, for convenience of description, only the parts related to the invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 is a schematic flowchart of the acceleration method for a deep learning model provided by Embodiment 1 of the present invention. The execution body of the present application is a terminal, server or processor with computing capability. As shown in FIG. 1, the method includes the following steps:
Step 110: acquire a contribution value for each of the plurality of channels in each convolutional layer of the model.
Specifically, a deep learning network model has multiple convolutional layers, each convolutional layer has multiple convolution kernels, and each convolution kernel outputs one feature channel.
The contribution value of each channel can be calculated by the following formula:
$$\mathrm{Value}_{l,k} = \frac{1}{M}\sum_{m=1}^{M} O_{l,k,m}\, G_{l,k,m}$$
where l denotes the l-th convolutional layer (layer l for short); each convolutional layer has multiple output channels, and k denotes the k-th output channel of a convolutional layer, here of layer l; $O_{l,k,m}$ is the output of that channel for the m-th sample and $G_{l,k,m}$ is the back-propagated gradient of that output; M is the number of samples, i.e. the number of pictures received in one training pass of the deep learning network model, such as the batch size; and m indexes the M samples.
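As an illustrative sketch only — the patent does not prescribe any framework — this criterion could be accumulated with forward hooks in a PyTorch-style setup. The function name `channel_contributions` and the summation over spatial positions are our assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

def channel_contributions(model, data_loader, loss_fn, device="cpu"):
    """Estimate each conv channel's contribution as (1/M) * sum over samples of
    (channel output * channel gradient), following the formula above. Sketch only:
    spatial positions are summed, and bookkeeping is simplified."""
    model = model.to(device).train()
    contributions = {}                    # (layer_name, channel_index) -> float
    activations = {}
    hooks = []

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            def save_output(mod, inp, out, name=name):
                out.retain_grad()         # keep the gradient of this activation
                activations[name] = out
            hooks.append(module.register_forward_hook(save_output))

    total = 0
    for images, labels in data_loader:
        images, labels = images.to(device), labels.to(device)
        model.zero_grad()
        loss_fn(model(images), labels).backward()
        total += images.size(0)
        for name, out in activations.items():
            # out and out.grad have shape (batch, channels, H, W); multiply
            # elementwise, then sum batch and spatial dims -> one value per channel
            prod = (out * out.grad).sum(dim=(0, 2, 3))
            for k, v in enumerate(prod.tolist()):
                contributions[(name, k)] = contributions.get((name, k), 0.0) + v

    for h in hooks:
        h.remove()
    # multiply by the inverse of the number of samples, as in the formula
    return {key: v / total for key, v in contributions.items()}

# e.g. contribs = channel_contributions(model, train_loader, nn.CrossEntropyLoss())
```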
Step 120: clip channels in the convolutional layers of the model according to the contribution values of the channels in all convolutional layers, to obtain a clipped model.
In one example, the clipped deep learning network model can be obtained through the following steps:
First, the contribution values of every channel of all convolutional layers are sorted. Second, the channels whose contribution value is not greater than the preset contribution value threshold are clipped; the retained channels are determined as a first number of channels and the clipped channels as a second number of channels. Third, the number of retained channels of each convolutional layer is acquired. Next, when the number of retained channels of any convolutional layer is less than the preset channel-count threshold, a third number of channels is determined from the second number of channels and used as retained channels of that convolutional layer, such that the number of retained channels of that layer plus the third number equals the preset channel-count threshold. Finally, the clipped model is obtained from the first number of channels and the retained channels of each such convolutional layer.
For example, the contribution values of the channels of all convolutional layers can be calculated, denoted Value 1, Value 2, Value 3, ..., Value n, and sorted by magnitude. According to the sorted result, the channels whose contribution value is not greater than the preset contribution value threshold are clipped away; for instance, the channels whose contribution values rank in the last p positions are to be clipped. Afterwards, if the number of retained channels of some convolutional layer has fallen below the preset channel-count threshold, the highest-ranked channels of that layer are taken back from the clipped channels until the layer's channel count equals the threshold; for instance, if the remaining channel count of a convolutional layer is less than q, then from the p clipped channels, the channels of that layer with the highest contribution scores are taken back until the final channel count reaches q, finally yielding the clipped deep learning network model.
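A minimal sketch of this selection rule (our illustration; `value_threshold` and `min_channels_q` stand in for the preset contribution-value and channel-count thresholds, and the example values are placeholders):

```python
from collections import defaultdict

def select_channels(contributions, value_threshold, min_channels_q):
    """contributions: dict mapping (layer, channel) -> contribution value.
    Returns the set of (layer, channel) keys to keep: clip values at or below
    the threshold, then restore the highest-scoring clipped channels of any
    layer left below the minimum channel count q."""
    kept = {key for key, v in contributions.items() if v > value_threshold}
    clipped = set(contributions) - kept

    per_layer = defaultdict(list)
    for layer, ch in kept:
        per_layer[layer].append(ch)

    for layer in {layer for layer, _ in contributions}:
        short = min_channels_q - len(per_layer[layer])
        if short > 0:
            # take back this layer's highest-scoring clipped channels
            candidates = sorted((key for key in clipped if key[0] == layer),
                                key=lambda key: contributions[key],
                                reverse=True)
            for key in candidates[:short]:
                kept.add(key)
                clipped.remove(key)
    return kept

# e.g. keep = select_channels(contribs, value_threshold=1e-4, min_channels_q=8)
```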
In another example, to increase processing speed, the convolutional layers whose channel count is less than the preset channel-count threshold can be determined first and retained as they are; the remaining convolutional layers are then sorted by contribution value, and the steps of the previous example are carried out until the clipped model is obtained. Layers whose channel count is already below the preset threshold are thus set aside first and only the remaining channels are processed, which increases the processing speed.
Referring to FIG. 2, the dashed channel of Conv1 is the clipped channel. After Conv1 clips that channel, the dimension of the corresponding output Output1 of the layer is reduced; since the output of one layer is the input of the next convolutional layer, the dimension of the convolution kernels of the following layer Conv2 is correspondingly reduced. Conv1 and Conv2 lose different parts: the first convolutional layer loses entire channels, while the second convolutional layer loses a part of each of its channels, namely the kernel slices that corresponded to the removed outputs.
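The two kinds of clipping in Figure 2 might be realized as follows in a PyTorch-style sketch (ours; it assumes plain biased convolutions with no batch normalization in between):

```python
import torch
import torch.nn as nn

def prune_conv_pair(conv1, conv2, keep):
    """Rebuild two consecutive conv layers so that conv1 retains only the output
    channels in `keep`. conv1 loses whole filters; conv2 loses the matching input
    slice of every one of its kernels, as in Figure 2."""
    keep = sorted(keep)
    new1 = nn.Conv2d(conv1.in_channels, len(keep),
                     conv1.kernel_size, conv1.stride, conv1.padding)
    new1.weight.data = conv1.weight.data[keep].clone()     # (out, in, kH, kW)
    new1.bias.data = conv1.bias.data[keep].clone()

    new2 = nn.Conv2d(len(keep), conv2.out_channels,
                     conv2.kernel_size, conv2.stride, conv2.padding)
    new2.weight.data = conv2.weight.data[:, keep].clone()  # slice the input dim
    new2.bias.data = conv2.bias.data.clone()
    return new1, new2

# Example: keep output channels {0, 2} of a 4-channel layer
c1 = nn.Conv2d(3, 4, 3, padding=1)
c2 = nn.Conv2d(4, 8, 3, padding=1)
p1, p2 = prune_conv_pair(c1, c2, keep=[0, 2])
out = p2(p1(torch.randn(1, 3, 32, 32)))   # shapes stay consistent end to end
```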
Step 130: train the unclipped model and the clipped model separately.
Specifically, the original deep learning network model is trained, and the clipped deep learning network model is trained, yielding a trained deep learning network model and a clipped, trained deep learning network model. How to train a deep learning network model is well known to those skilled in the art and is not described here.
Step 140: evaluate the trained unclipped model and the trained clipped model respectively, to obtain a first evaluation value and a second evaluation value.
Here, the trained original deep learning network model and the clipped, trained deep learning network model can both be evaluated. For example, the first inference accuracy or first inference speed of the trained original model can be calculated, along with the second inference accuracy or second inference speed of the clipped, trained model. The first and second inference accuracies then determine whether the inference-accuracy requirement is met, or the first and second inference speeds determine whether the inference-speed requirement is met.
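A sketch of such an evaluation, assuming a classification model (the accuracy metric and the Hz measurement are our choices, not specified by the patent):

```python
import time
import torch

@torch.no_grad()
def evaluate(model, data_loader, device="cpu"):
    """Return (inference accuracy, inference speed in Hz) for one model.
    Illustrative only; a real evaluation would use the project's own metric."""
    model = model.to(device).eval()
    correct = total = 0
    start = time.perf_counter()
    for images, labels in data_loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)
    elapsed = time.perf_counter() - start
    return correct / total, total / elapsed   # accuracy, samples per second

# first_acc, first_hz   = evaluate(trained_model, val_loader)
# second_acc, second_hz = evaluate(trained_clipped_model, val_loader)
```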
Step 150: determine, according to the first evaluation value and the second evaluation value, whether to output the trained clipped model as a new model.
Specifically, the clipped and trained deep learning network model is evaluated, and the result is compared with the evaluation result of the model before clipping. If the effect is clearly worse than that of the model before clipping, the accuracy of the clipped network is considered severely degraded and the clipping has failed; step 110 is then executed to clip the deep learning network model anew. If every attempt results in severe loss of network accuracy, the clipping fails and clipping is stopped. When the evaluation value meets the requirement, i.e. the clipping succeeds, then the more clipping rounds are performed, the greater the improvement in network model size and inference speed, i.e. the better the clipping effect.
For example, if the inference speed of the clipped and trained deep learning network model reaches 50 Hz and the loss of inference accuracy is small — i.e. the difference between the second inference accuracy and the first inference accuracy is smaller than a preset threshold, or the second inference accuracy is below the preset inference-accuracy threshold — the clipping is successful. As another example, if the inference speed actually desired is 60 Hz while the current inference speed is 50 Hz, one may try to continue clipping the already-clipped deep learning network model, further compressing the model size to increase the inference speed.
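The resulting decision rule might be sketched as follows (the threshold values here are placeholders, not taken from the patent):

```python
def decide(first_acc, second_acc, second_hz,
           acc_drop_threshold=0.01, target_hz=60.0):
    """Map the two evaluation values to one of three outcomes."""
    if first_acc - second_acc > acc_drop_threshold:
        return "fail"        # accuracy loss too severe: this round is discarded
    if second_hz < target_hz:
        return "continue"    # accuracy is fine but still too slow: clip again
    return "output"          # accuracy preserved and speed target met

# e.g. decide(0.901, 0.897, 50.0) -> "continue": try a further clipping round
```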
However, if the inference accuracy of the clipped deep learning network model no longer meets the requirement, or the inference speed is already greater than the required inference speed, the clipping has failed and cannot be continued.
By applying the acceleration method for a deep learning model provided by Embodiment 1 of the present invention, there is no dependence on any hardware platform, software, or deep learning framework, so generality is good; there is no restriction on the size, dimension, or form of the convolution kernel, so any convolution operation can be accelerated; and inference speed is greatly increased with no loss, or only a very small loss, of inference accuracy.
FIG. 3 is a schematic structural diagram of the acceleration device for a deep learning model provided by Embodiment 2 of the present invention. As shown in FIG. 3, the acceleration device is applied in the acceleration method for a deep learning model of Embodiment 1, and includes: an acquisition module 310, a clipping module 320, a training module 330, an evaluation module 340 and a determination module 350.
The acquisition module 310 is configured to acquire a contribution value for each of the plurality of channels in each convolutional layer of the model.
The clipping module 320 is configured to clip channels in the convolutional layers of the model according to the contribution values of the channels in all convolutional layers, to obtain a clipped model.
The training module 330 is configured to train the clipped model.
The evaluation module 340 is configured to evaluate the model before clipping and the model after clipping respectively, to obtain a first evaluation value and a second evaluation value.
The determination module 350 is configured to determine, according to the first evaluation value and the second evaluation value, whether to output the clipped model as a new model.
Further, the acquisition module 310 is specifically configured to: multiply the output of each channel in the convolutional layer by the back-propagated gradient of that channel, and then by the inverse of the number of samples, to obtain the contribution value of the channel in the convolutional layer.
Further, the clipping module 320 is specifically configured to: sort the contribution values of every channel of all convolutional layers; clip the channels whose contribution value is not greater than the preset contribution value threshold, determining the retained channels as a first number of channels and the clipped channels as a second number of channels; acquire the number of retained channels of each convolutional layer; when the number of retained channels of any convolutional layer is less than the preset channel-count threshold, determine a third number of channels from the second number of channels and use them as retained channels of that convolutional layer, the number of retained channels of that layer plus the third number being equal to the preset channel-count threshold; and obtain the clipped model according to the first number of channels and the retained channels of each such convolutional layer.
Further, the clipping module 320 is also configured to: acquire the number of channels of each convolutional layer; retain the convolutional layers whose number of channels is less than the preset channel-count threshold; and sort the contribution values of every channel of the convolutional layers whose number of channels is greater than the preset channel-count threshold.
Further, the first evaluation value includes a first inference accuracy and a first inference speed, and the second evaluation value includes a second inference accuracy and a second inference speed; the determination module is specifically configured to: when the difference between the first inference accuracy and the second inference accuracy is within the preset inference-accuracy threshold range and the second inference speed is less than the preset inference speed, continue clipping the trained model; or, when the difference between the first inference accuracy and the second inference accuracy is within the preset inference-accuracy threshold range and the second inference speed equals the preset inference speed, output the clipped model.
By applying the acceleration device for a deep learning model provided by Embodiment 2 of the present invention, there is no dependence on any hardware platform, software, or deep learning framework, so generality is good; there is no restriction on the size, dimension, or form of the convolution kernel, so any convolution operation can be accelerated; and inference speed is greatly increased with no loss, or only a very small loss, of inference accuracy.
Embodiment 3 of the present invention provides a device comprising a memory and a processor; the memory is configured to store a program and may be connected to the processor through a bus. The memory may be non-volatile memory, such as a hard disk drive or flash memory, in which a software program and a device driver are stored. The software program can perform various functions of the above methods provided by the embodiments of the present invention; the device driver may be a network or interface driver. The processor is configured to execute the software program, and when the software program is executed, the method provided by Embodiment 1 of the present invention can be implemented.
Embodiment 4 of the present invention provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method provided by Embodiment 1 of the present invention.
Embodiment 5 of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method provided by Embodiment 1 of the present invention is implemented.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments above further describe the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

  1. A method for accelerating a deep learning model, wherein the method comprises:
    obtaining a contribution value of each of a plurality of channels in each convolutional layer of the model;
    pruning the channels in the convolutional layers of the model according to the contribution values of the channels in all the convolutional layers, to obtain a pruned model;
    training the unpruned model and the pruned model respectively;
    evaluating the trained unpruned model and the trained pruned model respectively, to obtain a first evaluation value and a second evaluation value; and
    determining, according to the first evaluation value and the second evaluation value, whether to output the trained pruned model as a new model.
  2. The method according to claim 1, wherein obtaining the contribution value of each of the plurality of channels in each convolutional layer of the model specifically comprises:
    multiplying the output of each channel in the convolutional layer by the back-propagation gradient of that channel, and then by the reciprocal of the number of samples, to obtain the contribution value of the channel in the convolutional layer.
  3. The method according to claim 1, wherein pruning the channels in the convolutional layers of the model according to the contribution values of all the convolutional layers, to obtain the pruned model, specifically comprises:
    sorting the contribution values of every channel of all the convolutional layers;
    pruning the channels whose contribution values are not greater than a preset contribution-value threshold, determining the retained channels as a first number of channels, and determining the pruned channels as a second number of channels;
    obtaining the number of retained channels of each convolutional layer;
    when the number of retained channels of any convolutional layer is less than a preset channel-count threshold, taking the value obtained by subtracting the number of retained channels of that convolutional layer from the preset channel-count threshold as a third number, determining a third number of channels from among the second number of channels, and using the third number of channels as retained channels of that convolutional layer; and
    obtaining the pruned model according to the first number of channels and the retained channels of that convolutional layer.
  4. The method according to claim 1, wherein before sorting the contribution values of every channel of all the convolutional layers, the method further comprises:
    obtaining the number of channels of each convolutional layer;
    retaining, without pruning, the convolutional layers whose number of channels is less than the preset channel-count threshold; and
    sorting the contribution values of the channels of each convolutional layer whose number of channels is greater than the preset channel-count threshold.
  5. The method according to claim 1, wherein the first evaluation value includes a first inference accuracy and a first inference speed, the second evaluation value includes a second inference accuracy and a second inference speed, and determining, according to the first evaluation value and the second evaluation value, whether to output the trained pruned model as a new model specifically comprises:
    when the difference between the first inference accuracy and the second inference accuracy is within a preset inference-accuracy threshold range, if the second inference speed is less than a preset inference speed, continuing to prune the trained pruned model; or,
    when the difference between the first inference accuracy and the second inference accuracy is within the preset inference-accuracy threshold range, and the second inference speed reaches the preset inference speed, outputting the trained pruned model.
  6. An apparatus for accelerating a deep learning model, wherein the apparatus comprises:
    an obtaining module, configured to obtain a contribution value of each of a plurality of channels in each convolutional layer of the model;
    a pruning module, configured to prune the channels in the convolutional layers of the model according to the contribution values of the channels in all the convolutional layers, to obtain a pruned model;
    a training module, configured to train the unpruned model and the pruned model respectively;
    an evaluation module, configured to evaluate the trained unpruned model and the trained pruned model respectively, to obtain a first evaluation value and a second evaluation value; and
    a determining module, configured to determine, according to the first evaluation value and the second evaluation value, whether to output the trained pruned model as a new model.
  7. The apparatus according to claim 6, wherein the obtaining module is specifically configured to:
    multiply the output of each channel in the convolutional layer by the back-propagation gradient of that channel, and then by the reciprocal of the number of samples, to obtain the contribution value of the channel in the convolutional layer.
  8. The apparatus according to claim 6, wherein the pruning module is specifically configured to:
    sort the contribution values of every channel of all the convolutional layers;
    prune the channels whose contribution values are not greater than a preset contribution-value threshold, determine the retained channels as a first number of channels, and determine the pruned channels as a second number of channels;
    obtain the number of retained channels of each convolutional layer;
    when the number of retained channels of any convolutional layer is less than a preset channel-count threshold, take the value obtained by subtracting the number of retained channels of that convolutional layer from the preset channel-count threshold as a third number, determine a third number of channels from among the second number of channels, and use the third number of channels as retained channels of that convolutional layer; and
    obtain the pruned model according to the first number of channels and the retained channels of that convolutional layer.
  9. A device, wherein the device comprises a memory and a processor, the memory is configured to store a program, and the processor is configured, when running the program, to execute the method according to any one of claims 1-5.
  10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method according to any one of claims 1-5 is implemented.
PCT/CN2021/109187 2020-07-29 2021-07-29 Acceleration method and device for deep learning model WO2022022625A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010745500.3 2020-07-29
CN202010745500.3A CN112101515A (en) 2020-07-29 2020-07-29 Deep learning model acceleration method and device

Publications (1)

Publication Number Publication Date
WO2022022625A1 true WO2022022625A1 (en) 2022-02-03

Family

ID=73749869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109187 WO2022022625A1 (en) 2020-07-29 2021-07-29 Acceleration method and device for deep learning model

Country Status (2)

Country Link
CN (1) CN112101515A (en)
WO (1) WO2022022625A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101515A (en) * 2020-07-29 2020-12-18 北京智行者科技有限公司 Deep learning model acceleration method and device
WO2022141489A1 (en) * 2020-12-31 2022-07-07 深圳元戎启行科技有限公司 Deep learning model reasoning method and apparatus, computer device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009600A (en) * 2017-11-30 2018-05-08 北京小米移动软件有限公司 Model optimization, quality determining method, device, equipment and storage medium
CN110929839A (en) * 2018-09-20 2020-03-27 深圳市商汤科技有限公司 Method and apparatus for training neural network, electronic device, and computer storage medium
CN109754080A (en) * 2018-12-21 2019-05-14 西北工业大学 The pruning method of Embedded network model
CN110598848A (en) * 2019-08-16 2019-12-20 中国科学院计算技术研究所 Migration learning acceleration method based on channel pruning
CN110689113A (en) * 2019-09-19 2020-01-14 浙江大学 Deep neural network compression method based on brain consensus initiative
CN112101515A (en) * 2020-07-29 2020-12-18 北京智行者科技有限公司 Deep learning model acceleration method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116072096A (en) * 2022-08-10 2023-05-05 荣耀终端有限公司 Model training method, acoustic model, voice synthesis system and electronic equipment
CN116072096B (en) * 2022-08-10 2023-10-20 荣耀终端有限公司 Model training method, acoustic model, voice synthesis system and electronic equipment

Also Published As

Publication number Publication date
CN112101515A (en) 2020-12-18

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21850233; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21850233; Country of ref document: EP; Kind code of ref document: A1)