WO2018107414A1 - Apparatus, Device and Method for Compressing/Decompressing a Neural Network Model - Google Patents

Apparatus, Device and Method for Compressing/Decompressing a Neural Network Model

Info

Publication number
WO2018107414A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
parameters
compressed
model
parameter
Prior art date
Application number
PCT/CN2016/110053
Other languages
English (en)
French (fr)
Inventor
陈天石
韦洁
陈云霁
刘少礼
支天
郭崎
Original Assignee
上海寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海寒武纪信息科技有限公司 filed Critical 上海寒武纪信息科技有限公司
Priority to PCT/CN2016/110053 priority Critical patent/WO2018107414A1/zh
Publication of WO2018107414A1 publication Critical patent/WO2018107414A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • The present invention relates to the field of neural network model compression/decompression technology, and more particularly to an apparatus and a device for compressing/decompressing a neural network model, as well as a method for compressing/decompressing a neural network model.
  • PCA: Principal Component Analysis
  • The present invention provides a method for compressing/decompressing a neural network model, as well as an apparatus and a device for compressing/decompressing a neural network model.
  • A method for compressing/decompressing a neural network model, comprising the steps of:
  • S1: acquiring the parameters to be compressed of the neural network model;
  • S2: compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
  • S3: decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
  • An apparatus for compressing/decompressing a neural network model, comprising a parameter acquisition module, a model compression module, a model storage module and a model decompression module, wherein:
  • the parameter acquisition module is configured to acquire the parameters to be compressed of the neural network model;
  • the model compression module is configured to compress the parameters to be compressed with a neural network algorithm and train them to obtain low-dimensional neural network parameters;
  • the model decompression module is configured to decompress the low-dimensional neural network parameters to form recovered neural network parameters; and
  • the storage module is configured to store the parameters to be compressed, the low-dimensional neural network parameters and the recovered neural network parameters of the neural network model.
  • A device for compressing/decompressing a neural network model, comprising:
  • a memory for storing executable instructions; and
  • a processor for executing the executable instructions stored in the memory so as to perform the following operations:
  • acquiring the parameters to be compressed of the neural network model;
  • compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
  • decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
  • FIG. 1 is a block diagram showing an example of an overall structure of an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention
  • FIG. 2 is a block diagram showing an example of a parameter acquisition module in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention
  • FIG. 3 is a block diagram showing an example of an auto-encoded neural network structure in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention
  • FIG. 4 is a block diagram of an example of a model compression module in an apparatus for compressing/decompressing a neural network model in accordance with an embodiment of the present invention.
  • FIG. 5 is a block diagram of an example of a model decompression module in an apparatus for compressing/decompressing a neural network model in accordance with an embodiment of the present invention.
  • FIG. 6 is a flow chart of a method of compressing/decompressing a neural network model in accordance with an embodiment of the present invention.
  • FIG. 7 is a block diagram of an apparatus for compressing/decompressing a neural network model in accordance with an embodiment of the present invention.
  • An embodiment of the invention provides an apparatus for compressing/decompressing a neural network model, which can compress the parameters of a trained neural network model, saving the storage space of the model and making it easier to port the neural network to devices with small memory.
  • the device for compressing/decompressing a neural network model includes a parameter acquisition module, a model compression module, a model storage module, and a model decompression module.
  • The parameter acquisition module is configured to acquire the parameters to be compressed of the neural network model.
  • The acquisition may proceed by traversing the parameters to be compressed of the neural network model until the number of selected parameters equals the set dimension.
  • As pre-processing, the parameters to be compressed may be sparsified.
  • When the parameters are to be sparsified, the parameters to be compressed of the neural network model are traversed and sparsified: each selected parameter is judged, the parameters smaller than the set threshold are set to 0, and the non-zero elements remaining after sparsification are selected and their position coordinates marked, until the number of selected parameters equals the set dimension.
  • Sparsification effectively reduces the number of neural network model parameters, saves the memory space required to store the model, and facilitates model transmission and porting.
  • The parameters to be compressed may include network nodes, weights, training rates, excitation functions and offsets.
  • Preferably, the parameter acquisition module outputs a set dimension's worth of parameters each time, until all parameters of the neural network model have been traversed.
  • FIG. 2 is a schematic diagram of a parameter acquisition module in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention.
  • The module operates on a neural network model with a total of l parameters: the parameters stored in the parameter file are W[l], the model compression module takes an input vector X[l_input] each time, and a Boolean array Label[l] marks the sparsification.
  • Let the parameter currently read be w_i, let the threshold be threshold, and suppose the first j-1 entries of X[l_input] are already filled: if |w_i| >= threshold, w_i is stored into X[l_input] and Label[i] is set to 1; otherwise Label[i] is set to 0; reading continues until X[l_input] is full, i.e. the number of selected parameters equals the set dimension.
  • The model compression module is configured to compress the parameters to be compressed and train them with a neural network algorithm to obtain low-dimensional neural network parameters.
  • The compression is performed by an auto-encoder neural network algorithm built on a multi-layer perceptron (MLP); the auto-encoder neural network is divided into an encoder network, a hidden (coder) layer and a decoder network.
  • The input of the encoder network and the output of the decoder network have the same number of nodes, and the hidden layer has fewer nodes than either.
  • The encoder network takes the parameters to be compressed as input and outputs to the hidden layer, the number of input nodes being greater than the number of output nodes.
  • The decoder network likewise adopts the MLP structure; its input layer is the coder layer, and its output layer has the same number of nodes as the encoder's input layer.
  • In this way the number of parameters is effectively reduced and memory is saved, which facilitates model storage and transmission.
  • The neural network model is decompressed at the time of use while accuracy is preserved, so that the neural network algorithm is better applied in practice.
  • Figure 3 shows the structure of the auto-encoder neural network in this example.
  • The encoder network (corresponding to the input side) and the decoder network (corresponding to the output side) are both three-layer MLP networks; the input layer and output layer have the same number of nodes, l_input, and the middle hidden (coder) layer has the fewest nodes, l_compress, i.e. l_compress < l_input, with full connections between layers. The number of nodes after compression by the neural network algorithm is thereby reduced, which reduces the storage space.
  • Figure 4 shows a model compression process, i.e. the training process of the auto-encoder neural network.
  • After the auto-encoder neural network is built, the weights are initialized;
  • the X[l_input] obtained by parameter extraction is used as the input of the model compression module, and forward propagation computes the weights, offsets and output-layer results of each layer; the output-layer values are compared with X[l_input] to compute the residual;
  • the weights and offsets are updated by gradient descent; iteration stops once the error is small enough or the maximum number of training iterations is reached; the output Y[l_compress] of the intermediate coder layer is the low-dimensional representation of the input parameters X[l_input].
  • The structure file of the compressed neural network model, the Boolean array Label[l], the decoder network structure file, the decoder network parameters W_d and the coder-layer output Y[l_compress] are saved; these are the data required to decompress the neural network model.
  • The model decompression module is used to decompress the low-dimensional neural network parameters, form the recovered neural network parameters, and place them into the neural network; decompression is likewise performed by the auto-encoder neural network described above, using its decoder network.
  • The decoder network takes the low-dimensional neural network parameters as input, restores the original number of neural network parameters, and places the recovered neural network parameters into their corresponding positions in the network.
  • Figure 5 shows a process for decompressing a neural network model.
  • Y[l_compress] is input to the decoder network with parameters W_d to obtain an output X'[l_input] of length l_input.
  • Let the decompressed neural network model parameters be W'[l]; X'[l_input] is mapped into W'[l] by reading the values of the array Label[l]. If Label[i] is 0, the absolute value of W[i] was smaller than the sparsification threshold during parameter extraction and was omitted, so X'[l_input] has no corresponding entry and the value of W'[i] is 0. If Label[i] is 1, the value of X'[j] is assigned to W'[i]. After the array Label[l] has been traversed, the decompressed neural network model parameters are W'[l].
  • The storage module stores the parameters to be compressed, the low-dimensional neural network parameters (that is, the compressed parameters) and the recovered neural network parameters of the neural network model.
  • When sparsification is used, the storage module is further configured to store the markers produced during sparsification.
  • A typical overall workflow starts from a complete neural network model, including both parameters and structure.
  • A certain number of parameters are extracted by the parameter acquisition module; the low-dimensional representation of the parameters is obtained with the auto-encoder neural network algorithm in the model compression module, and the above process is repeated until all parameters are compressed; the corresponding parameters and network structure are stored.
  • For decompression, the low-dimensional parameters are used as the input of the decoder network, and the high-dimensional parameters are restored and returned to the compressed network model.
  • The above process is repeated until all parameters are decompressed; placing the decompressed parameters back into the compressed network model completes the decompression of the neural network model.
  • The apparatus of the above embodiment applies to the case where the parameters to be compressed are sparsified.
  • The other case performs no sparsification.
  • In that case the neural network model structure and parameters, as well as the auto-encoder network structure and parameters, can be the same as in the sparsified setting; only the parameter extraction, model storage and decompression processes differ.
  • The parameters of the compressed neural network model are W[l], and the auto-encoder input vector is X[l_input] each time; the parameters w_i are read in order with x_i = w_i until i = l_input.
  • Since the sparsification of the parameters need not be marked, the data to be saved are: the structure file of the compressed neural network model, the decoder network structure file, the decoder network parameters W_d and the hidden-layer output Y[l_compress].
  • The apparatus of the embodiments of the invention can effectively reduce the number of neural network model parameters, save the memory space required to store the model, and facilitate model transmission and porting.
  • an embodiment of the present invention further provides a method for compressing/decompressing a neural network model.
  • The method includes the following steps:
  • S1: acquiring the parameters to be compressed of the neural network model;
  • S2: compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
  • S3: decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
  • Step S1 may include: traversing the parameters to be compressed of the neural network model until the number of selected parameters equals the set dimension.
  • Optionally, the parameters to be compressed may be pre-processed; the pre-processing may be sparsification of the parameters to be compressed.
  • When sparsification is required, step S1 includes: traversing the parameters to be compressed of the neural network model, sparsifying them, judging each selected parameter, setting the parameters smaller than the set threshold to 0, and selecting the non-zero elements after sparsification and marking the position coordinates of the non-zero elements, until the number of selected parameters equals the set dimension. Sparsification effectively reduces the number of neural network model parameters, saves the memory required to store the model, and facilitates model transmission and porting.
  • Correspondingly, if the sparsification step is adopted, then in step S3 the neural network parameters must, after decompression, be placed according to the marked positions of the non-zero elements.
  • Whether sparsification is applied or not, the traversal acquires the parameters to be compressed of each layer in the order in which the neural network model was constructed.
  • Step S2 may include the sub-steps:
  • S21: building an auto-encoder neural network based on a multi-layer perceptron, in which the input layer and output layer of the auto-encoder have the same number of nodes, and the hidden layer has fewer nodes than the visible layers;
  • S22: inputting the parameters to be compressed and performing forward-propagation computation through the neurons of each layer of the auto-encoder neural network to obtain the activation values of each layer;
  • S23: setting the output equal to the input and using the back-propagation algorithm to obtain the residuals of the output layer and of the neurons in each layer;
  • S24: updating the weights W and offsets B by gradient descent so that the output comes ever closer to the input;
  • S25: after the weights and offsets have converged, outputting the values of the hidden layer, which are the low-dimensional neural network parameters.
  • Compared with general lossy compression methods, compressing a neural network model with a neural network algorithm allows the arithmetic units to be reused and saves memory.
  • Step S3 may include: decompressing the low-dimensional neural network parameters with the auto-encoder neural network (which comprises the encoder network, the hidden layer and the decoder network), restoring the original number of neural network parameters, and placing the recovered parameters into their corresponding positions in the network.
  • Preferably, the auto-encoder neural network built in step S21 can be used for the decompression, restoring the parameters into the output layer.
  • an apparatus for compressing/decompressing a neural network model is provided.
  • the device 700 includes:
  • the processor 701 is configured to execute the executable instructions stored in the memory so as to perform the following operations:
  • acquiring the parameters to be compressed of the neural network model;
  • compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
  • decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
  • The above executable instructions correspond to the respective steps of the above method; the method is carried out by the processor executing the executable instructions corresponding to those steps.
  • the above processor 701 may be a single CPU (Central Processing Unit), but may also include two or more processing units.
  • a processor can include a general-purpose microprocessor, an instruction-set processor and/or a related chipset and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)).
  • the processor may also include an onboard memory for caching purposes.
  • Preferably, a dedicated neural network processor is used; the neural network that the neural network processor already provides can be reused when the instructions are executed, saving storage space.
  • the above memory 702 may be a flash memory, a random access memory (RAM), a read only memory (ROM), or an EEPROM.
  • RAM: random access memory
  • ROM: read-only memory
  • EEPROM: electrically erasable programmable read-only memory
  • an on-chip memory device mounted on a chip can be employed.
  • the memory 702 may store parameters to be compressed, low-dimensional neural network parameters, and recovered neural network parameters during execution of the instructions.
  • The auto-encoder neural network algorithm overcomes the limitations of linear dimensionality-reduction methods by introducing the nonlinearity of a neural network, and its supervised target of making the output equal to the input makes the result more reliable.
  • The auto-encoder is an unsupervised learning method: it uses a multi-layer neural network whose input layer and output layer represent the same meaning and have the same number of nodes, and it learns an "identity function" whose output equals its input.
  • The significance of the auto-encoder neural network lies in learning the middle hidden layer, which usually has fewer nodes than the input and output layers and is a good representation of the input vector. This process acts as "dimensionality reduction", realizing a low-dimensional representation of a high-dimensional input.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An apparatus, a device and a method for compressing/decompressing a neural network model, comprising the steps of: acquiring the parameters to be compressed of a neural network model; compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters; and decompressing the low-dimensional neural network parameters to restore the parameters of the neural network model. The present invention implements an apparatus for compressing/decompressing a neural network model with an auto-encoder neural network algorithm, which reduces the number of parameters of the neural network model and facilitates model storage and transmission.

Description

Apparatus, Device and Method for Compressing/Decompressing a Neural Network Model
Technical Field
The present invention relates to the field of neural network model compression/decompression technology, and more particularly to an apparatus and a device for compressing/decompressing a neural network model, as well as a method for compressing/decompressing a neural network model.
Background Art
In recent years, neural network algorithms have been applied widely across many fields. As problem complexity and accuracy requirements keep rising, neural network models keep growing deeper, with an accompanying explosion in the number of parameters, which makes storing and transmitting neural network models highly inconvenient. Imagine that in the future every application on a mobile phone has deep learning capability, yet each application must transmit and store gigabytes of neural network model parameters; this is clearly unreasonable.
Traditional dimensionality-reduction methods are mostly linear. For example, PCA (Principal Component Analysis) selects the directions of greatest variance in high-dimensional data and, by choosing these directions, obtains the low-dimensional representation containing the most information. However, the linearity of the PCA method places severe limits on the types of features it can extract.
Summary of the Invention
The present invention provides a method for compressing/decompressing a neural network model, as well as an apparatus and a device for compressing/decompressing a neural network model.
A method for compressing/decompressing a neural network model, comprising the steps of:
S1: acquiring the parameters to be compressed of the neural network model;
S2: compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
S3: decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
An apparatus for compressing/decompressing a neural network model, comprising a parameter acquisition module, a model compression module, a model storage module and a model decompression module, wherein:
the parameter acquisition module is configured to acquire the parameters to be compressed of the neural network model;
the model compression module is configured to compress the parameters to be compressed with a neural network algorithm and train them to obtain low-dimensional neural network parameters;
the model decompression module is configured to decompress the low-dimensional neural network parameters to form recovered neural network parameters; and
the storage module is configured to store the parameters to be compressed, the low-dimensional neural network parameters and the recovered neural network parameters of the neural network model.
A device for compressing/decompressing a neural network model, comprising:
a memory for storing executable instructions; and
a processor for executing the executable instructions stored in the memory so as to perform the following operations:
acquiring the parameters to be compressed of the neural network model;
compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
For a better understanding of the above and other aspects of the present invention, preferred embodiments are described in detail below with reference to the accompanying drawings:
Brief Description of the Drawings
FIG. 1 is a block diagram of an example of the overall structure of an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention;
FIG. 2 is a block diagram of an example of a parameter acquisition module in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention;
FIG. 3 is a block diagram of an example of an auto-encoder neural network structure in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention;
FIG. 4 is a block diagram of an example of a model compression module in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention;
FIG. 5 is a block diagram of an example of a model decompression module in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention;
FIG. 6 is a flow chart of a method for compressing/decompressing a neural network model according to an embodiment of the present invention;
FIG. 7 is a block diagram of a device for compressing/decompressing a neural network model according to an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. From the following detailed description, other aspects, advantages and salient features of the present invention will become apparent to those skilled in the art.
In this specification, the various embodiments described below to explain the principles of the present invention are illustrative only and should not be construed in any way as limiting the scope of the invention. The following description with reference to the accompanying drawings is intended to assist in a comprehensive understanding of the exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes various specific details to aid understanding, but these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness. Throughout the drawings, the same reference numerals are used for similar functions and operations.
An embodiment of the present invention provides an apparatus for compressing/decompressing a neural network model, which can compress the parameters of a trained neural network model, saving the storage space of the model and making it easier to port the neural network to devices with small memory.
FIG. 1 is a block diagram of an example of the overall structure of the apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention. The apparatus includes a parameter acquisition module, a model compression module, a model storage module and a model decompression module.
The parameter acquisition module is configured to acquire the parameters to be compressed of the neural network model.
Specifically, it can be used to acquire the parameters to be compressed of the neural network model and to perform pre-processing (for example, sparsifying the parameters to be compressed), in preparation for the input of the model compression module. The acquisition may proceed by traversing the parameters to be compressed of the neural network model until the number of selected parameters equals the set dimension.
The pre-processing may be sparsification of the parameters to be compressed. When sparsification is required, it may include: traversing the parameters to be compressed of the neural network model, sparsifying them, judging each selected parameter, setting the parameters smaller than a set threshold to 0, and selecting the non-zero elements after sparsification and marking the position coordinates of the non-zero elements, until the number of selected parameters equals the set dimension. Sparsification effectively reduces the number of neural network model parameters, saves the memory required to store the model, and facilitates model transmission and porting.
The parameters to be compressed may include network nodes, weights, training rates, excitation functions and offsets.
Preferably, the parameter acquisition module outputs a set dimension's worth of parameters each time, until all parameters of the neural network model have been traversed.
FIG. 2 is a schematic diagram of a parameter acquisition module in an apparatus for compressing/decompressing a neural network model according to an embodiment of the present invention. The module operates on a neural network model with a total of l parameters. As shown in Fig. 2, for a neural network built in the convolutional neural network framework Caffe, the parameters stored in the parameter file are W[l], the model compression module takes an input vector X[l_input] each time, and a Boolean array Label[l] marks the sparsification. Suppose the parameter currently read is w_i, the threshold is threshold, and the first j-1 entries of X[l_input] are non-empty. Then: if the absolute value of w_i is greater than or equal to threshold, it is stored in the array X[l_input], Label[i] is set to 1, and the next parameter is read; if the absolute value of the read parameter is smaller than the threshold, Label[i] is set to 0 and the next parameter is read; this continues until the array X[l_input] is full (that is, the number of selected parameters to be compressed equals the set dimension).
That is: x_j = w_i and Label[i] = 1 if |w_i| >= threshold; Label[i] = 0 (w_i is discarded) if |w_i| < threshold.
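The extraction pass just described can be summarized in code. The following is a minimal Python sketch, not part of the patent itself; the function name, the flat parameter array W and the return convention are illustrative assumptions.

```python
import numpy as np

def extract_sparse_block(W, start, l_input, threshold):
    """Traverse W from index `start`, dropping sub-threshold weights and
    collecting up to l_input surviving values into X, as in Fig. 2."""
    X, labels = [], []
    i = start
    while i < len(W) and len(X) < l_input:
        if abs(W[i]) >= threshold:
            X.append(W[i])        # significant weight: keep it, Label[i] = 1
            labels.append(1)
        else:
            labels.append(0)      # below threshold: treated as 0, Label[i] = 0
        i += 1
    return np.asarray(X), labels, i   # i is where the next block starts
```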
The model compression module is configured to compress the parameters to be compressed and to train with a neural network algorithm to obtain low-dimensional neural network parameters.
Specifically, compression is performed with an auto-encoder neural network algorithm. The auto-encoder neural network is built on a multi-layer perceptron (MLP) and is divided into an encoder network, a hidden (coder) layer and a decoder network; the input of the encoder network and the output of the decoder network have the same number of nodes, and the hidden layer has fewer nodes than either. The encoder network takes the parameters to be compressed as input and outputs to the hidden layer, the number of input nodes being greater than the number of output nodes. The decoder network likewise adopts the MLP structure; its input layer is the coder layer, and its output layer has the same number of nodes as the encoder's input layer. Because the hidden layer has the fewest nodes, the number of parameters is effectively reduced and memory is saved, which facilitates model storage and transmission; the neural network model is decompressed at the time of use while accuracy is preserved, so that neural network algorithms can be better applied in practice.
Figure 3 shows the structure of the auto-encoder neural network in this example. The encoder network (corresponding to the input side) and the decoder network (corresponding to the output side) are both three-layer MLP networks; the input layer and output layer have the same number of nodes, l_input, and the middle hidden (coder) layer has the fewest nodes, l_compress, that is, l_compress < l_input, with full connections between layers. The number of nodes after compression by the neural network algorithm is thereby reduced, which reduces storage space.
Figure 4 shows a model compression process, i.e., the training process of the auto-encoder neural network.
After the auto-encoder neural network of Fig. 3 has been built, the weights are initialized;
the X[l_input] obtained by parameter extraction is used as the input of the model compression module, and forward propagation computes the weights, offsets and output-layer results of each layer;
the values of the output-layer nodes are compared with X[l_input] to compute the residual;
the weights and offsets are updated by gradient descent, and iteration stops when the error is small enough or the maximum number of training iterations is reached; the output Y[l_compress] of the intermediate coder layer is then the low-dimensional representation of the input parameters X[l_input].
The structure file of the compressed neural network model, the Boolean array Label[l], the decoder network structure file, the decoder network parameters W_d and the coder-layer output Y[l_compress] are saved; these are the data required to decompress the neural network model.
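For illustration, a compact Python sketch of this training loop follows. It is one simple reading of the procedure above, assuming a single input vector X, a sigmoid coder layer, a linear output layer and plain gradient descent; it is not the patent's definitive implementation.

```python
import numpy as np

def train_autoencoder(X, l_compress, lr=0.01, max_iter=10000, tol=1e-6):
    """Train a three-layer auto-encoder (l_input -> l_compress -> l_input)
    whose target output equals its input, as described for Fig. 4."""
    l_input = X.shape[0]
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, (l_compress, l_input)); b1 = np.zeros(l_compress)
    W2 = rng.normal(0.0, 0.1, (l_input, l_compress)); b2 = np.zeros(l_input)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(max_iter):
        y = sigmoid(W1 @ X + b1)      # coder-layer activations Y[l_compress]
        x_hat = W2 @ y + b2           # reconstruction of the input
        err = x_hat - X               # residual: output vs. input
        if np.mean(err ** 2) < tol:   # stop when the error is small enough
            break
        dW2 = np.outer(err, y)        # gradients of the squared error
        db2 = err
        dy = (W2.T @ err) * y * (1.0 - y)
        dW1 = np.outer(dy, X)
        db1 = dy
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return y, (W2, b2)  # low-dimensional code Y and decoder parameters W_d
```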
The model decompression module is used to decompress the low-dimensional neural network parameters, form the recovered neural network parameters, and place them into the neural network. Decompression is likewise performed through the auto-encoder neural network described above, using its decoder network: the decoder network takes the low-dimensional neural network parameters as input, restores the original number of neural network parameters, and places the recovered parameters into their corresponding positions in the network.
Figure 5 shows a process for decompressing the neural network model. Y[l_compress] is input to the decoder network with parameters W_d, yielding an output X'[l_input] of length l_input. Let the decompressed neural network model parameters be W'[l]; X'[l_input] is mapped into W'[l] by reading the values of the array Label[l]. If Label[i] is 0, the absolute value of W[i] was smaller than the sparsification threshold during parameter extraction and was omitted, so X'[l_input] contains no corresponding entry and the value of W'[i] is 0. If Label[i] is 1, the value of X'[j] is assigned to W'[i]. Once the array Label[l] has been traversed, the decompressed neural network model parameters W'[l] are obtained.
That is: W'[i] = 0 if Label[i] = 0; W'[i] = X'[j] (the next unconsumed entry of X'[l_input]) if Label[i] = 1.
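A matching Python sketch of this decoder-and-scatter step, under the same assumptions as above (the decoder is taken as linear, mirroring the training sketch, and the bias b_d is an assumption):

```python
import numpy as np

def decompress_block(Y, W_d, b_d, labels):
    """Run the decoder on the code Y, then scatter the recovered values
    back to their original positions using the Boolean array Label[l]."""
    X_rec = W_d @ Y + b_d              # X'[l_input], as produced by the decoder
    W_rec = np.zeros(len(labels))      # W'[l], zero-initialized
    j = 0
    for i, kept in enumerate(labels):
        if kept:                       # Label[i] == 1: restore X'[j] here
            W_rec[i] = X_rec[j]
            j += 1
        # Label[i] == 0: the weight was sparsified away and stays 0
    return W_rec
```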
The storage module, as shown in Fig. 1, is used to store the parameters to be compressed, the low-dimensional neural network parameters (i.e., the compressed parameters) and the recovered neural network parameters of the neural network model.
Optionally, when the parameter acquisition module applies sparsification, the storage module also stores the markers produced during sparsification.
A typical overall workflow of the above apparatus is as follows:
Starting from a complete neural network model, including parameters and structure, a certain number of parameters are first extracted by the parameter acquisition module; in the model compression module, the auto-encoder neural network algorithm produces a low-dimensional representation of the parameters, and the above process is repeated until all parameters are compressed; the corresponding parameters and network structure are stored. For decompression, the low-dimensional parameters are used as the input of the decoder network, the high-dimensional parameters are recovered and placed back into the compressed network model, and the process is repeated until all parameters are decompressed; placing the decompressed parameters back into the compressed network model completes the decompression of the neural network model.
The apparatus of the above embodiment applies to the case where the parameters to be compressed are sparsified. The other case performs no sparsification. In that case, the neural network model structure and parameters, as well as the auto-encoder network structure and parameters, can be the same as in the sparsified setting; only parameter extraction, model storage and decompression differ. The parameters of the compressed neural network model are W[l], and the auto-encoder input vector is X[l_input] each time. The parameters w_i are read in order, with x_i = w_i, until i = l_input; after one group has been compressed, reading of the parameters in W[l] continues.
Since the sparsification of the parameters need not be marked, the data to be saved are: the structure file of the compressed neural network model, the decoder network structure file, the decoder network parameters W_d and the hidden-layer output Y[l_compress].
During decompression, Y[l_compress] is first input to the decoder network with parameters W_d, yielding an output X'[l_input] of length l_input. Let the decompressed neural network model parameters be W'[l]; the values x'_i are read in order with w'_i = x'_i, and when i = l_input the next group of parameters is decompressed, until all of W'[l] has been assigned.
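The non-sparse variant thus reduces to plain sequential copies. A brief illustrative sketch, with assumed helper names not found in the patent:

```python
import numpy as np

def dense_blocks(W, l_input):
    """Non-sparse variant: read parameters in order, l_input at a time;
    no Label array is needed."""
    for start in range(0, len(W), l_input):
        yield np.asarray(W[start:start + l_input])

def dense_restore(decoded_blocks):
    """Concatenate decoded blocks back into W'[l] in the same order."""
    return np.concatenate(list(decoded_blocks))
```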
The apparatus of the embodiments of the present invention can effectively reduce the number of neural network model parameters, save the memory space required to store the model, and facilitate model transmission and porting.
Based on the same inventive concept, an embodiment of the present invention further provides a method for compressing/decompressing a neural network model. Referring to Fig. 6, it includes the steps:
S1: acquiring the parameters to be compressed of the neural network model;
S2: compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
S3: decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
Step S1 may specifically include: traversing the parameters to be compressed of the neural network model until the number of selected parameters equals the set dimension.
Optionally, the parameters to be compressed may also be pre-processed; the pre-processing may be sparsification of the parameters to be compressed. When sparsification is required, it may include: traversing the parameters to be compressed of the neural network model, sparsifying them, judging each selected parameter, setting the parameters smaller than the set threshold to 0, and selecting the non-zero elements after sparsification and marking the position coordinates of the non-zero elements, until the number of selected parameters equals the set dimension. Sparsification effectively reduces the number of neural network model parameters, saves the memory required to store the model, and facilitates model transmission and porting.
When sparsification of the parameters is required, step S1 includes: traversing the parameters to be compressed of the neural network model, sparsifying them, judging each selected parameter, setting the parameters smaller than the set threshold to 0, and selecting the non-zero elements after sparsification and marking the position coordinates of the non-zero elements, until the number of selected parameters equals the set dimension. Correspondingly, if the sparsification step is adopted, then in step S3 the neural network parameters must, after decompression, be placed according to the marked positions of the non-zero elements.
With or without sparsification, the traversal acquires the parameters to be compressed of each layer in the order in which the neural network model was constructed.
Step S2 may include the sub-steps:
S21: building an auto-encoder neural network based on a multi-layer perceptron, in which the input layer and output layer of the auto-encoder have the same number of nodes, and the hidden layer has fewer nodes than the visible layers;
S22: inputting the parameters to be compressed and performing forward-propagation computation through the neurons of each layer of the auto-encoder neural network to obtain the activation values of each layer;
S23: setting the output equal to the input and using the back-propagation algorithm to obtain the residuals of the output layer and of the neurons in each layer;
S24: updating the weights W and offsets B by gradient descent so that the output comes ever closer to the input;
S25: after the weights and offsets have converged, outputting the values of the hidden layer, which are the low-dimensional neural network parameters.
Compared with general lossy compression methods, compressing a neural network model with a neural network algorithm allows the arithmetic units to be reused and saves memory.
Step S3 may include: decompressing the low-dimensional neural network parameters with the auto-encoder neural network, which comprises the encoder network, the hidden layer and the decoder network; restoring the original number of neural network parameters; and placing the recovered parameters into their corresponding positions in the network. Preferably, the auto-encoder neural network built in step S21 can be used for the decompression, restoring the parameters into the output layer. With this method the network parameters can be recovered to a greater extent during decompression, giving higher accuracy than common linear dimensionality-reduction methods.
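Putting steps S1-S3 together, and reusing the hypothetical helpers sketched earlier (the file name and dimensions below are invented for illustration):

```python
import numpy as np

W = np.load("model_params.npy")        # hypothetical flat parameter file W[l]
X, labels, nxt = extract_sparse_block(W, 0, l_input=256, threshold=1e-3)  # S1
Y, (W_d, b_d) = train_autoencoder(X, l_compress=64)                       # S2
W_restored = decompress_block(Y, W_d, b_d, labels)                        # S3
```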
For details of steps S1-S3 not specifically described here, refer to the instructions executed by the corresponding modules of the apparatus above; they are not repeated here.
Based on the same inventive concept, a further aspect of an embodiment of the present invention provides a device for compressing/decompressing a neural network model.
FIG. 7 is a block diagram of a device for compressing/decompressing a neural network model according to an embodiment of the present invention. The device 700 includes:
a memory 702 for storing executable instructions; and
a processor 701 for executing the executable instructions stored in the memory so as to perform the following operations:
acquiring the parameters to be compressed of the neural network model;
compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
The above executable instructions correspond to the respective steps of the above method; the method is carried out by the processor executing the executable instructions corresponding to those steps.
The above processor 701 may be a single CPU (central processing unit), but may also include two or more processing units. For example, the processor may include a general-purpose microprocessor, an instruction-set processor and/or a related chipset and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)). The processor may also include onboard memory for caching. Preferably, a dedicated neural network processor is used, and the neural network that the processor already provides can be reused when the instructions are executed, saving storage space.
The above memory 702 may be a flash memory, a random access memory (RAM), a read-only memory (ROM) or an EEPROM. Preferably, an on-chip storage device mounted on the chip can be employed. Besides the above instructions, the memory 702 may also store, during instruction execution, the parameters to be compressed, the low-dimensional neural network parameters and the recovered neural network parameters.
Through the above embodiments, the auto-encoder neural network algorithm overcomes these limitations by introducing the nonlinearity of a neural network, and its supervised target of making the output equal to the input makes the result more reliable. The auto-encoder is an unsupervised learning method: it uses a multi-layer neural network whose input layer and output layer represent the same meaning and have the same number of nodes, and it learns an "identity function" whose output equals its input. The significance of the auto-encoder neural network lies in learning the middle hidden layer, which usually has fewer nodes than the input and output layers and is a good representation of the input vector. This process acts as "dimensionality reduction", realizing a low-dimensional representation of a high-dimensional input.
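In standard notation, this identity-function objective can be written as follows; it is a restatement of the training criterion above, not an additional formula from the patent:

```latex
\min_{W_1, b_1, W_2, b_2} \;
\bigl\| x - \underbrace{\bigl(W_2\,\sigma(W_1 x + b_1) + b_2\bigr)}_{\hat{x}} \bigr\|^2,
\qquad
y = \sigma(W_1 x + b_1) \in \mathbb{R}^{l_{\text{compress}}},\;
l_{\text{compress}} < l_{\text{input}}
```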
In the foregoing specification, embodiments of the present invention have been described with reference to specific exemplary embodiments thereof. It is evident that various modifications may be made to the embodiments without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims (11)

  1. A method for compressing/decompressing a neural network model, comprising the steps of:
    S1: acquiring the parameters to be compressed of the neural network model;
    S2: compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
    S3: decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
  2. The method according to claim 1, wherein step S1 comprises:
    traversing the parameters to be compressed of the neural network model until the number of selected parameters to be compressed equals the set dimension.
  3. The method according to claim 1, wherein step S1 comprises:
    traversing the parameters to be compressed of the neural network model, sparsifying the parameters to be compressed, judging the selected parameters to be compressed, setting the parameters smaller than a set threshold to 0, and selecting the non-zero elements after sparsification and marking the position coordinates of the non-zero elements, until the number of selected parameters to be compressed equals the set dimension.
  4. The method according to claim 2 or 3, wherein the traversal acquires the parameters to be compressed of each layer in the order in which the neural network model was constructed.
  5. The method according to claim 1, wherein step S2 comprises the sub-steps:
    S21: building an auto-encoder neural network based on a multi-layer perceptron, in which the input layer and output layer of the auto-encoder have the same number of nodes, and the hidden layer has fewer nodes than the input layer;
    S22: inputting the parameters to be compressed and performing forward-propagation computation through the neurons of each layer of the auto-encoder neural network to obtain the activation values of each layer;
    S23: setting the output equal to the input and using the back-propagation algorithm to obtain the residuals of the output layer and of the neurons in each layer;
    S24: updating the weights W and offsets B by gradient descent so that the output comes ever closer to the input;
    S25: after the weights and offsets have converged, outputting the values of the hidden layer, which are the low-dimensional neural network parameters.
  6. The method according to claim 5, wherein part of the auto-encoder neural network of step S21 is used for decompression, restoring the parameters into the output layer.
  7. An apparatus for compressing/decompressing a neural network model, comprising a parameter acquisition module, a model compression module, a model storage module and a model decompression module, wherein:
    the parameter acquisition module is configured to acquire the parameters to be compressed of the neural network model;
    the model compression module is configured to compress the parameters to be compressed with a neural network algorithm and train them to obtain low-dimensional neural network parameters;
    the model decompression module is configured to decompress the low-dimensional neural network parameters to form recovered neural network parameters; and
    the storage module is configured to store the parameters to be compressed, the low-dimensional neural network parameters and the recovered neural network parameters of the neural network model.
  8. The apparatus according to claim 7, wherein in the model compression module the parameters to be compressed are compressed by an auto-encoder neural network algorithm; the auto-encoder neural network is divided into an encoder network, an intermediate hidden layer and a decoder network; the encoder network takes the parameters to be compressed as input and outputs to the intermediate hidden layer, the number of input nodes being greater than the number of output nodes.
  9. The apparatus according to claim 8, wherein the auto-encoder neural network is built on the basis of a multi-layer perceptron.
  10. The apparatus according to claim 8, wherein in the model decompression module the low-dimensional neural network parameters are decompressed through the decoder network, which takes the low-dimensional neural network parameters as input and restores the original number of neural network parameters.
  11. A device for compressing/decompressing a neural network model, comprising:
    a memory for storing executable instructions; and
    a processor for executing the executable instructions stored in the memory so as to perform the following operations:
    acquiring the parameters to be compressed of the neural network model;
    compressing and training the parameters to be compressed with a neural network algorithm to obtain low-dimensional neural network parameters;
    decompressing the low-dimensional neural network parameters and restoring the parameters of the neural network model.
PCT/CN2016/110053 2016-12-15 2016-12-15 Apparatus, device and method for compressing/decompressing a neural network model WO2018107414A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/110053 WO2018107414A1 (zh) 2016-12-15 2016-12-15 Apparatus, device and method for compressing/decompressing a neural network model


Publications (1)

Publication Number Publication Date
WO2018107414A1 (zh)

Family

ID=62557784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110053 WO2018107414A1 (zh) 2016-12-15 2016-12-15 Apparatus, device and method for compressing/decompressing a neural network model

Country Status (1)

Country Link
WO (1) WO2018107414A1 (zh)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101094402A (zh) * 2007-07-13 2007-12-26 青岛大学 Image coding method based on neural network and SVM
CN101183873A (zh) * 2007-12-11 2008-05-21 中山大学 BP-neural-network-based data compression and decompression method for embedded systems
CN101795344A (zh) * 2010-03-02 2010-08-04 北京大学 Digital holographic image compression and decoding method and system, and transmission method and system
CN102665221A (zh) * 2012-03-26 2012-09-12 南京邮电大学 Cognitive radio spectrum sensing method based on compressed sensing and BP neural network

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929837B (zh) * 2018-09-19 2024-05-10 北京搜狗科技发展有限公司 Associative word prediction method and apparatus
CN110929837A (zh) * 2018-09-19 2020-03-27 北京搜狗科技发展有限公司 Neural network model compression method and apparatus
CN110163367A (zh) * 2018-09-29 2019-08-23 腾讯科技(深圳)有限公司 Model compression method and apparatus
CN110163367B (zh) * 2018-09-29 2023-04-07 腾讯科技(深圳)有限公司 Terminal deployment method and apparatus
CN111353591A (zh) * 2018-12-20 2020-06-30 中科寒武纪科技股份有限公司 Computing device and related product
CN111382848A (zh) * 2018-12-27 2020-07-07 中科寒武纪科技股份有限公司 Computing device and related product
CN113168554A (zh) * 2018-12-29 2021-07-23 华为技术有限公司 Neural network compression method and apparatus
CN113168554B (zh) * 2018-12-29 2023-11-28 华为技术有限公司 Neural network compression method and apparatus
CN110796281A (zh) * 2019-08-26 2020-02-14 广西电网有限责任公司电力科学研究院 Wind turbine state parameter prediction method based on an improved deep belief network
CN110889248A (zh) * 2019-11-06 2020-03-17 江苏科技大学 Air spring fatigue life prediction platform and prediction method
CN110889248B (zh) * 2019-11-06 2024-03-26 江苏科技大学 Air spring fatigue life prediction platform and prediction method
CN111713035A (zh) * 2020-04-07 2020-09-25 东莞理工学院 Artificial-intelligence-based MIMO multi-antenna signal transmission and detection technique
CN112102183A (zh) * 2020-09-02 2020-12-18 杭州海康威视数字技术股份有限公司 Sparse processing method, apparatus and device
CN114298277A (zh) * 2021-12-28 2022-04-08 四川大学 Layer-sparsification-based distributed deep learning training method and system
CN114298277B (zh) * 2021-12-28 2023-09-12 四川大学 Layer-sparsification-based distributed deep learning training method and system
CN117272688A (zh) * 2023-11-20 2023-12-22 四川省交通勘察设计研究院有限公司 Method, apparatus and system for compressing and decompressing structural mechanics simulation data
CN117272688B (zh) * 2023-11-20 2024-02-13 四川省交通勘察设计研究院有限公司 Method, apparatus and system for compressing and decompressing structural mechanics simulation data

Similar Documents

Publication Publication Date Title
WO2018107414A1 (zh) Apparatus, device and method for compressing/decompressing a neural network model
Chen et al. Efficient approximation of deep relu networks for functions on low dimensional manifolds
US20230140474A1 (en) Object recognition with reduced neural network weight precision
US10462476B1 (en) Devices for compression/decompression, system, chip, and electronic device
CN110059772B (zh) Remote sensing image semantic segmentation method based on a multi-scale decoding network
EP3438890B1 (en) Method and apparatus for generating fixed-point quantized neural network
US11531889B2 (en) Weight data storage method and neural network processor based on the method
JP6574503B2 (ja) Machine learning method and apparatus
US20190370658A1 (en) Self-Tuning Incremental Model Compression Solution in Deep Neural Network with Guaranteed Accuracy Performance
WO2017152499A1 (zh) Image compression system, decompression system, training method and apparatus, and display device
CN106157339A (zh) Animated mesh sequence compression algorithm based on low-rank vertex trajectory subspace extraction
US11836572B2 (en) Quantum inspired convolutional kernels for convolutional neural networks
CN111898461B (zh) Temporal action segment generation method
WO2022028197A1 (zh) Image processing method and device
CN112418292A (zh) Image quality evaluation method and apparatus, computer device and storage medium
CN111507100A (zh) Convolutional auto-encoder and word-embedding-vector compression method based on the encoder
KR20230072454A (ko) Apparatus, method and program for bidirectional image-text generation
CN115664899A (zh) Channel decoding method and system based on a graph neural network
CN108805280B (zh) Image retrieval method and apparatus
CN113989283B (zh) 3D human pose estimation method and apparatus, electronic device and storage medium
CN115022637A (zh) Image encoding method, image decompression method, and apparatus
US20080232682A1 (en) System and method for identifying patterns
CN116882469B (zh) Spiking neural network deployment method, apparatus and device for emotion recognition
WO2023051335A1 (zh) Data encoding method, data decoding method, and data processing apparatus
CN114501031B (zh) Compression encoding and decompression method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16923878

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16923878

Country of ref document: EP

Kind code of ref document: A1