WO2021103597A1 - Method and device for model compression of a neural network - Google Patents
Method and device for model compression of a neural network
- Publication number
- WO2021103597A1 (PCT/CN2020/103697)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- layer
- clipping
- channel
- small
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
Definitions
- the present application relates to the computer field, and more specifically to a method and device for model compression of a neural network.
- deep neural network technologies are widely used in image classification, object detection, speech recognition and natural language processing.
- as model size and computing requirements grow, deep neural networks come to rely on high-power computing platforms.
- research on deep neural network model compression for embedded applications has therefore been carried out to reduce model size and storage space requirements and to optimize the model computation process.
- existing clipping algorithms are divided into two types: non-structural clipping and structural clipping.
- in non-structural clipping, clipped weights are set to 0 while unclipped weights remain non-zero float data, so the model size does not essentially change.
- when non-structural clipping algorithms are used, the clipped model usually relies on methods such as data indexing to change the actual model size.
- a model produced by a structural clipping algorithm usually needs the parameters of the current convolutional layer reduced according to the clipping ratio, with corresponding convolution operations added in the current convolutional layer according to the index numbers of the clipped channels.
- as a result, the compressed model after clipping has a low compression rate, a specific decompression module must be added when the model is loaded, and networks with a shortcut structure suffer excessive clipping of some convolutional layers.
- the purpose of the embodiments of the present invention is to propose a neural network model compression method that directly reduces the amount of computation and the model size.
- during network deployment, the model can be loaded in one click, reducing the difficulty of use.
- one aspect of the embodiments of the present invention provides a method for model compression of a neural network, which includes the following steps:
- recording the input and output parameters of each layer in the network includes:
- dividing the network layer into several small networks according to input and output parameters includes:
- the current network layer is divided into a small network of its own.
- the channel clipping algorithm includes: a dynamic channel clipping algorithm and a channel clipping algorithm based on automatic machine learning.
- the decomposition calculation of each clipping small network according to the clipping channel index number includes:
- in each clipping small network, the corresponding index numbers in the output channels of the corresponding layer are clipped according to that layer's clipping channel index numbers.
- a device for model compression of a neural network, characterized in that the device includes:
- at least one processor; and
- a memory storing program code runnable by the processor, the program code performing the following steps when run by the processor:
- each clipping small network is decomposed and calculated.
- recording the input and output parameters of each layer in the network includes:
- dividing the network layer into several small networks according to input and output parameters includes:
- the current network layer is divided into a small network of its own.
- the channel clipping algorithm includes: a dynamic channel clipping algorithm and a channel clipping algorithm based on automatic machine learning.
- the decomposition calculation of each clipping small network according to the clipping channel index number includes:
- in each clipping small network, the corresponding index numbers in the output channels of the corresponding layer are clipped according to that layer's clipping channel index numbers.
- the neural network model compression method provided by the embodiments of the present invention records the input and output parameters of each layer of the network; divides the network layers into several small networks according to the input and output parameters; sets the clipping flag of the first convolutional layer in each small network to 0 to obtain clipping small networks; uses a channel clipping algorithm to train each clipping small network to obtain the network weights and weight masks; records the clipping channel index numbers of each convolutional layer of the clipping small networks wherever the weight mask is 0; and decomposes each clipping small network according to the clipping channel index numbers. This technical solution directly reduces the amount of computation and the model size. During network deployment, the model can be loaded in one click, reducing the difficulty of use.
- Fig. 1 is a schematic flowchart of a method for model compression of a neural network according to an embodiment of the present invention.
- the first aspect of the embodiments of the present invention proposes an embodiment of a method for model compression of a neural network.
- Figure 1 shows a schematic flow chart of the method.
- the method may include the following steps:
- S1: record the input and output parameters of each layer in the network, labeling the layers from layer 0 to layer n;
- S2: divide the network layers into several small networks according to the input and output parameters, splitting the n-layer large network into m small networks according to the differences in those parameters;
- S3: set the clipping flag of the first convolutional layer in each small network to 0 to obtain a clipping small network.
- the clipping flags of all other layers are set to 1, where 0 means no clipping and 1 means clipping;
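the flag assignment in S3 can be sketched in a few lines of Python; the layer representation and the dictionary keys here are illustrative assumptions, not part of the patent:

```python
def set_clipping_flags(small_networks):
    """small_networks: list of small networks, each a list of layer dicts
    with 'name' and 'type' keys (an assumed, illustrative representation)."""
    flags = {}
    for net in small_networks:
        first_conv_seen = False
        for layer in net:
            if layer["type"] == "conv" and not first_conv_seen:
                flags[layer["name"]] = 0   # first convolutional layer: not clipped
                first_conv_seen = True
            else:
                flags[layer["name"]] = 1   # all other layers: eligible for clipping
    return flags
```

applied to a small network of conv, batchnorm, conv, only the first conv receives flag 0.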
- S4: use a channel clipping algorithm to train each clipping small network to obtain the network weights and weight masks.
- any channel clipping algorithm can be used to train the clipping small networks. Training produces many network parameters; here only the weights and weight masks are needed. If a weight is clipped, its mask is 0; otherwise it is 1;
- S5: record the clipping channel index number of each convolutional layer of the clipping small network wherever the weight mask is 0; that is, if a channel is clipped, its index number is recorded;
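step S5 can be sketched as follows, assuming (as an illustration, not stated in the patent) that each mask is a 0/1 array with the same shape as the convolution weight and that channels are clipped along the input-channel dimension:

```python
import numpy as np

def clipped_channel_indices(weight_mask):
    """weight_mask: 0/1 array shaped (out_ch, in_ch, kh, kw).
    An input channel is recorded as clipped when all of its mask values are 0."""
    o, i = weight_mask.shape[0], weight_mask.shape[1]
    per_channel = weight_mask.reshape(o, i, -1)
    clipped = np.all(per_channel == 0, axis=(0, 2))  # True where a whole channel is masked
    return np.where(clipped)[0].tolist()
```

for a mask in which input channel 1 is entirely zero, the function records index number 1.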
- S6: decompose and compute each clipping small network according to the clipping channel index numbers; that is, if the clipping small network has k layers, the corresponding index numbers in the output channels of the (k-1)-th layer are clipped according to the channel clipping index numbers of the k-th layer, and so on until every layer of the small network has been traversed.
- the final network parameters contain only the unclipped weights. For example, if the weight of the k-th layer is 5x10x3x3 and the index numbers recorded for its second dimension are 0, 1 and 2, then index numbers 0, 1 and 2 in the first dimension of the 10x8x3x2 weight of the (k-1)-th layer are clipped.
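a minimal NumPy sketch of this cross-layer clipping, using illustrative shapes chosen to be self-consistent (the patent's own example shapes are abbreviated):

```python
import numpy as np

w_k = np.random.rand(5, 10, 3, 3)    # layer k weight: (out=5, in=10, kh, kw)
w_km1 = np.random.rand(10, 8, 3, 3)  # layer k-1 weight: its 10 output channels feed layer k
clipped = [0, 1, 2]                  # clipping channel index numbers recorded for layer k

keep = [c for c in range(w_k.shape[1]) if c not in clipped]
w_k_small = w_k[:, keep]             # layer k loses input channels 0, 1 and 2
w_km1_small = w_km1[keep]            # layer k-1 loses the matching output channels

print(w_k_small.shape, w_km1_small.shape)  # (5, 7, 3, 3) (7, 8, 3, 3)
```

after the slice, layer k-1 produces exactly the 7 feature maps that layer k still consumes, so no extra convolution operation is needed at inference time.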
- in a network with a shortcut structure, the neural network compression algorithm provided by the present invention clips the network reasonably to ensure performance. After clipping, no specific convolution operation needs to be added, the model size is directly reduced, and during model inference the model is loaded in one click without decoding.
- the amount of calculation can be directly reduced, and the size of the model can be reduced.
- the model can be loaded with one click to reduce the difficulty of use.
- recording the input and output parameters of each layer in the network includes:
- dividing the network layer into several small networks according to input and output parameters includes:
- if the numbers of inputs and outputs of the current network layer are both 1, and its input name differs from the input names of all other network layers, the current network layer and the previous network layer are divided into the same small network; if the number of inputs or the number of outputs of the current network layer is not 1, the current network layer is divided into a small network of its own; if the numbers of inputs and outputs of the current network layer are both 1 but its input name is the same as the input name of one of the other network layers, the current network layer is also divided into a small network of its own.
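a hedged sketch of this three-case rule, assuming each layer is described by its input/output counts and input names as recorded in step S1 (the dictionary keys are illustrative, not from the patent):

```python
def split_into_small_networks(layers):
    """layers: ordered list of dicts with keys 'n_in', 'n_out', 'in_names'
    (an assumed representation of the parameters recorded in step S1)."""
    small_nets = []
    for i, layer in enumerate(layers):
        others = layers[:i] + layers[i + 1:]
        shares_input = any(set(layer["in_names"]) & set(o["in_names"]) for o in others)
        single_io = layer["n_in"] == 1 and layer["n_out"] == 1
        if single_io and not shares_input and small_nets:
            small_nets[-1].append(layer)   # case 1: join the previous layer's small network
        else:
            small_nets.append([layer])     # cases 2 and 3: a small network of its own
    return small_nets
```

in a plain chain, consecutive layers collapse into one small network; where one tensor feeds two layers (a shortcut branch), each consumer starts its own small network.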
- the channel clipping algorithm includes: a dynamic channel clipping algorithm and a channel clipping algorithm based on automatic machine learning.
- Other channel clipping algorithms can also be used. In theory, any algorithm can be used.
- decomposing each clipping small network according to the clipping channel index number includes:
- traversing from the last layer to the first, the corresponding index numbers in the output channels of the corresponding layer are clipped; that is, if the clipping small network has i layers, the corresponding index numbers in the output channels of the (i-1)-th layer are clipped according to the channel clipping index numbers of the i-th layer, until every layer of the small network has been traversed.
- the final network parameters only include weights (uncut).
- for example, if the index numbers recorded for the second dimension of the i-th layer are 0, 1 and 2, then index numbers 0, 1 and 2 in the first dimension of the 10x8x3x2 weight of the (i-1)-th layer are clipped.
- the present invention can add the following two modules to an existing quantization algorithm to implement the above method:
- Network segmentation module: during channel clipping, the input channels are selected for clipping rather than the output channels. This is because, first, if the convolutional layer contains a bias, the output channel indices of the clipped convolutional layer must remain consistent with the bias indices; and second, if the layer adjacent to the convolutional layer is a batchnorm layer (existing network structures are usually conv followed by batchnorm), the clipped channels would be restored in the batchnorm computation, losing the point of clipping.
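a small NumPy sketch illustrates the point (shapes and names are assumptions): clipping output channels forces matching cuts in the bias and in any per-output-channel batchnorm parameter, while clipping input channels touches only the weight tensor:

```python
import numpy as np

out_ch, in_ch = 8, 6
w = np.random.rand(out_ch, in_ch, 3, 3)   # conv weight
b = np.random.rand(out_ch)                # conv bias, indexed by output channel
gamma = np.random.rand(out_ch)            # batchnorm scale, also per output channel

# Clipping output channels 6 and 7: bias and batchnorm parameters must be
# sliced with exactly the same index numbers to stay consistent.
keep_out = [0, 1, 2, 3, 4, 5]
w_out, b_out, gamma_out = w[keep_out], b[keep_out], gamma[keep_out]

# Clipping input channels 4 and 5 instead: only the weight tensor changes.
keep_in = [0, 1, 2, 3]
w_in = w[:, keep_in]
```

this is why the segmentation module prefers input-channel clipping: the surrounding bias and batchnorm bookkeeping disappears entirely.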
- each layer of each subnet has only one input and one output;
- Subnet decomposition module: each subnet is decomposed and computed step by step upward from the last layer; that is, the clipped channel index numbers recorded for layer n are clipped from the output channels of layer n-1, until all layers have been traversed.
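this bottom-up traversal can be sketched as follows (the list-of-arrays layout and the index bookkeeping are illustrative assumptions):

```python
import numpy as np

def decompose(weights, clipped_indices):
    """weights: per-layer conv weights shaped (out_ch, in_ch, kh, kw);
    clipped_indices: per-layer lists of clipped input-channel index numbers."""
    for n in range(len(weights) - 1, 0, -1):      # last layer first, moving upward
        idx = set(clipped_indices[n])
        keep_in = [c for c in range(weights[n].shape[1]) if c not in idx]
        weights[n] = weights[n][:, keep_in]        # drop layer n's clipped input channels
        keep_out = [c for c in range(weights[n - 1].shape[0]) if c not in idx]
        weights[n - 1] = weights[n - 1][keep_out]  # drop the matching output channels of layer n-1
    return weights
```

after the loop, each layer's remaining output channels exactly match the next layer's remaining input channels, so the shrunken subnet runs without index remapping.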
- the above-mentioned programs can be stored in a computer readable storage medium.
- the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
- the foregoing computer program embodiment can achieve the same or similar effects as any of the foregoing method embodiments corresponding thereto.
- the method disclosed according to the embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium.
- the computer program executes the above-mentioned functions defined in the method disclosed in the embodiment of the present invention.
- the second aspect of the embodiments of the present invention proposes a neural network model compression device, characterized in that the device includes:
- at least one processor; and
- a memory storing program code runnable by the processor, the program code performing the following steps when run by the processor:
- each clipping small network is decomposed and calculated.
- recording the input and output parameters of each layer in the network includes:
- dividing the network layer into several small networks according to input and output parameters includes:
- if the numbers of inputs and outputs of the current network layer are both 1, and its input name differs from the input names of all other network layers, the current network layer and the previous network layer are divided into the same small network; if the number of inputs or the number of outputs of the current network layer is not 1, the current network layer is divided into a small network of its own; if the numbers of inputs and outputs of the current network layer are both 1 but its input name is the same as the input name of one of the other network layers, the current network layer is also divided into a small network of its own.
- the channel clipping algorithm includes: a dynamic channel clipping algorithm and a channel clipping algorithm based on automatic machine learning.
- decomposing each clipping small network according to the clipping channel index number includes:
- in each clipping small network, the corresponding index numbers in the output channels of the corresponding layer are clipped according to that layer's clipping channel index numbers.
- the above method steps and system units or modules can also be implemented using a controller and a computer-readable storage medium storing a computer program that enables the controller to implement the functions of the above steps, units or modules.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (10)
- 1. A method for model compression of a neural network, characterized by comprising the following steps: recording the input and output parameters of each network layer; dividing the network layers into several small networks according to the input and output parameters; setting the clipping flag of the first convolutional layer in each small network to 0 to obtain clipping small networks; training each clipping small network with a channel clipping algorithm to obtain network weights and weight masks; recording the clipping channel index number of each convolutional layer of each clipping small network wherever the weight mask is 0; and decomposing and computing each clipping small network according to the clipping channel index numbers.
- 2. The method according to claim 1, characterized in that recording the input and output parameters of each network layer comprises: recording the number of inputs, the number of outputs, the input name and the output name of each network layer.
- 3. The method according to claim 2, characterized in that dividing the network layers into several small networks according to the input and output parameters comprises: if the numbers of inputs and outputs of the current network layer are both 1 and its input name differs from the input names of all other network layers, dividing the current network layer and the previous network layer into the same small network; if the number of inputs or the number of outputs of the current network layer is not 1, dividing the current network layer into a small network of its own; and if the numbers of inputs and outputs of the current network layer are both 1 and its input name is the same as the input name of one of the other network layers, dividing the current network layer into a small network of its own.
- 4. The method according to claim 1, characterized in that the channel clipping algorithm comprises: a dynamic channel clipping algorithm and a channel clipping algorithm based on automatic machine learning.
- 5. The method according to claim 1, characterized in that decomposing and computing each clipping small network according to the clipping channel index numbers comprises: traversing each clipping small network from the last layer to the first layer, and clipping the corresponding index numbers in the output channels of the corresponding layer according to the clipping channel index numbers of that layer in each clipping small network.
- 6. A device for model compression of a neural network, characterized in that the device comprises: at least one processor; and a memory storing program code runnable by the processor, the program code performing the following steps when run by the processor: recording the input and output parameters of each network layer; dividing the network layers into several small networks according to the input and output parameters; setting the clipping flag of the first convolutional layer in each small network to 0 to obtain clipping small networks; training each clipping small network with a channel clipping algorithm to obtain network weights and weight masks; recording the clipping channel index number of each convolutional layer of each clipping small network wherever the weight mask is 0; and decomposing and computing each clipping small network according to the clipping channel index numbers.
- 7. The device according to claim 6, characterized in that recording the input and output parameters of each network layer comprises: recording the number of inputs, the number of outputs, the input name and the output name of each network layer.
- 8. The device according to claim 7, characterized in that dividing the network layers into several small networks according to the input and output parameters comprises: if the numbers of inputs and outputs of the current network layer are both 1 and its input name differs from the input names of all other network layers, dividing the current network layer and the previous network layer into the same small network; if the number of inputs or the number of outputs of the current network layer is not 1, dividing the current network layer into a small network of its own; and if the numbers of inputs and outputs of the current network layer are both 1 and its input name is the same as the input name of one of the other network layers, dividing the current network layer into a small network of its own.
- 9. The device according to claim 6, characterized in that the channel clipping algorithm comprises: a dynamic channel clipping algorithm and a channel clipping algorithm based on automatic machine learning.
- 10. The device according to claim 6, characterized in that decomposing and computing each clipping small network according to the clipping channel index numbers comprises: traversing each clipping small network from the last layer to the first layer, and clipping the corresponding index numbers in the output channels of the corresponding layer according to the clipping channel index numbers of that layer in each clipping small network.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227018397A KR20220091575A (ko) | 2019-11-29 | 2020-07-23 | 신경망 모델을 압축하는 방법 및 기기 |
US17/780,479 US11928599B2 (en) | 2019-11-29 | 2020-07-23 | Method and device for model compression of neural network |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911198187.X | 2019-11-29 | ||
CN201911198187.XA CN111126595A (zh) | 2019-11-29 | 2019-11-29 | 一种神经网络的模型压缩的方法和设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021103597A1 true WO2021103597A1 (zh) | 2021-06-03 |
Family
ID=70497075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/103697 WO2021103597A1 (zh) | 2019-11-29 | 2020-07-23 | 一种神经网络的模型压缩的方法和设备 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11928599B2 (zh) |
KR (1) | KR20220091575A (zh) |
CN (1) | CN111126595A (zh) |
WO (1) | WO2021103597A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126595A (zh) * | 2019-11-29 | 2020-05-08 | 苏州浪潮智能科技有限公司 | 一种神经网络的模型压缩的方法和设备 |
CN113255907B (zh) * | 2021-05-20 | 2024-05-14 | 广州广电运通金融电子股份有限公司 | 一种网络模型经裁剪以进行图像识别的方法 |
US11763082B2 (en) * | 2021-07-12 | 2023-09-19 | International Business Machines Corporation | Accelerating inference of transformer-based models |
CN113537490A (zh) * | 2021-07-13 | 2021-10-22 | 广州虎牙科技有限公司 | 一种神经网络裁剪方法及电子设备 |
CN116992946B (zh) * | 2023-09-27 | 2024-05-17 | 荣耀终端有限公司 | 模型压缩方法、装置、存储介质和程序产品 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832847A (zh) * | 2017-10-26 | 2018-03-23 | 北京大学 | 一种基于稀疏化后向传播训练的神经网络模型压缩方法 |
CN108304928A (zh) * | 2018-01-26 | 2018-07-20 | 西安理工大学 | 基于改进聚类的深度神经网络的压缩方法 |
US20190130250A1 (en) * | 2017-10-30 | 2019-05-02 | Samsung Electronics Co., Ltd. | Method and apparatus with neural network performing convolution |
CN109978142A (zh) * | 2019-03-29 | 2019-07-05 | 腾讯科技(深圳)有限公司 | 神经网络模型的压缩方法和装置 |
CN111126595A (zh) * | 2019-11-29 | 2020-05-08 | 苏州浪潮智能科技有限公司 | 一种神经网络的模型压缩的方法和设备 |
-
2019
- 2019-11-29 CN CN201911198187.XA patent/CN111126595A/zh active Pending
-
2020
- 2020-07-23 KR KR1020227018397A patent/KR20220091575A/ko active Search and Examination
- 2020-07-23 WO PCT/CN2020/103697 patent/WO2021103597A1/zh active Application Filing
- 2020-07-23 US US17/780,479 patent/US11928599B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US20230004809A1 (en) | 2023-01-05 |
KR20220091575A (ko) | 2022-06-30 |
US11928599B2 (en) | 2024-03-12 |
CN111126595A (zh) | 2020-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021103597A1 (zh) | 一种神经网络的模型压缩的方法和设备 | |
CN109978142B (zh) | 神经网络模型的压缩方法和装置 | |
US7689616B2 (en) | Techniques for specifying and collecting data aggregations | |
EP3076310B1 (en) | Variable virtual split dictionary for search optimization | |
CN110796251A (zh) | 基于卷积神经网络的图像压缩优化方法 | |
CN111178039B (zh) | 一种模型训练方法、装置、实现文本处理的方法及装置 | |
WO2022057468A1 (zh) | 一种深度学习模型推理加速的方法、系统、设备及介质 | |
CN110298446A (zh) | 面向嵌入式系统的深度神经网络压缩和加速方法及系统 | |
US20230289567A1 (en) | Data Processing Method, System and Device, and Readable Storage Medium | |
CN112488304A (zh) | 一种卷积神经网络中的启发式滤波器剪枝方法和系统 | |
CN114861907A (zh) | 数据计算方法、装置、存储介质和设备 | |
CN112200310B (zh) | 智能处理器、数据处理方法及存储介质 | |
CN111241204B (zh) | 一种梯度数据的同步方法、装置、设备及存储介质 | |
CN111027693A (zh) | 一种基于去权重剪枝的神经网络压缩方法及系统 | |
CN117009093A (zh) | 降低神经网络推理所需内存占用量的重计算方法和系统 | |
CN115774605A (zh) | Kubernetes的预测式弹性伸缩方法及系统 | |
CN115331690B (zh) | 一种用于通话语音的噪声实时消除的方法 | |
KR102393761B1 (ko) | 이미지 처리를 위한 인공 신경망 모델 학습 방법 및 시스템 | |
CN109993304B (zh) | 一种基于语义分割的检测模型压缩方法 | |
WO2021238289A1 (zh) | 序列处理的方法与装置 | |
CN112329923B (zh) | 一种模型压缩方法、装置、电子设备及可读存储介质 | |
CN111767204B (zh) | 溢出风险检测方法、装置及设备 | |
CN113298225A (zh) | 数据处理方法、音频降噪方法和神经网络模型 | |
CN111047013A (zh) | 卷积神经网络结构优化方法、装置和电子设备 | |
CN117540780B (zh) | 一种神经网络模型的压缩方法和相关装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20892240 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 20227018397 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20892240 Country of ref document: EP Kind code of ref document: A1 |
|