WO2020233130A1 - Deep neural network compression method and related device - Google Patents

Deep neural network compression method and related device Download PDF

Info

Publication number
WO2020233130A1
WO2020233130A1 (PCT/CN2019/130560)
Authority
WO
WIPO (PCT)
Prior art keywords
tensor
decomposition
layer
neural network
deep neural
Prior art date
Application number
PCT/CN2019/130560
Other languages
English (en)
French (fr)
Inventor
周阳
张涌
王书强
邬晶晶
姜元爽
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2020233130A1 publication Critical patent/WO2020233130A1/zh

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • This application relates to the field of computer technology, and in particular to a deep neural network compression method, apparatus, device, and computer-readable medium.
  • The weights of the deep convolutional neural network are tensor-decomposed, and the resulting multiple low-rank sub-tensors are used to replace the original weights, so that large deep convolutional neural networks can be accelerated and compressed;
  • another Chinese patent application, No. 201711319853.1, entitled "Deep neural network compression method based on block-term tensor decomposition", provides a deep neural network compression method based on block-term tensor decomposition, which converts the weight matrices and input vectors into high-order tensors, performs block-term tensor decomposition on them, replaces the fully connected layers of the deep neural network with block-term tensor layers, and trains the replaced deep neural network with the back-propagation algorithm.
  • The decomposition ranks of neural network compression algorithms based on tensor-train tensor decomposition are set manually through experience and parameter tuning; during training, a suitable tensor-train decomposition rank must be explored repeatedly for each network layer, which requires a great deal of time and effort.
  • Embodiments of the present application provide a deep neural network compression method, apparatus, device, and computer-readable medium based on an adaptive tensor-train decomposition-rank algorithm.
  • this application provides a deep neural network compression method, which includes:
  • one layer of the network is selected at a time as the selected layer in a predetermined order, and the network parameters of the remaining selectable layers are kept fixed;
  • the tensor decomposition calculation is performed on the selected layer, multiple kernel matrices are obtained by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets a preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer includes:
  • the output vector y ∈ R^N is transformed into a high-order tensor with dimensions (n_1, ..., n_d);
  • the operation of the decomposed deep neural network model is expressed as:
  • Y(j_1, ..., j_d) = Σ_{i_1,...,i_d} G_1[i_1, j_1] G_2[i_2, j_2] ... G_d[i_d, j_d] · X(i_1, ..., i_d) + B(j_1, ..., j_d);
  • the tensor decomposition calculation is performed on the selected layer, multiple kernel matrices are obtained by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets a preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer includes:
  • when the selected layer is a convolutional layer,
  • the convolution kernel tensor K ∈ R^{k×k×M×N} is converted into a matrix of size k²M × N, where F is the side length of the input tensor (the side lengths being equal), M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel;
  • the method further includes:
  • the quantization operation on the deep neural network model includes: quantizing the 32-bit full-precision kernel-matrix parameters to 8-bit integers;
  • the step of selecting one layer of the network at a time as the selected layer in a predetermined order includes: selecting layers one at a time in order from the last network layer to the first network layer;
  • this application provides a deep neural network compression device based on an adaptive tensor-train decomposition algorithm; the device includes:
  • an acquiring unit, used to acquire the deep neural network model to be compressed;
  • a determining unit, configured to, when it is determined that the deep neural network model to be compressed has selectable network layers, select one layer of the network at a time as the selected layer in a predetermined order, and keep the network parameters of the remaining selectable layers fixed;
  • a tensor decomposition unit, configured to perform the tensor decomposition calculation on the selected layer, obtain multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets a preset condition, take that kernel matrix as the tensor-decomposition kernel matrix of the selected layer;
  • an execution unit, used to repeatedly select the next layer of the network as the selected layer for tensor decomposition, until all selectable network layers have completed kernel-matrix decomposition, obtaining a compressed deep neural network model.
  • this application also provides a computer device, the device including:
  • one or more processors;
  • a memory used to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the deep neural network compression method described above.
  • The present application also provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the deep neural network compression method described above is implemented.
  • With the deep neural network compression method, apparatus, device, and computer-readable medium based on the adaptive tensor-train decomposition-rank algorithm provided in this application, an adaptive decomposition-rank algorithm built on the tensor-train decomposition algorithm decomposes the parameter matrix of each layer of the deep neural network model layer by layer during network training, according to a set network accuracy threshold; the other network layers are held fixed while one layer is decomposed, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to recover accuracy; once the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer. This solves the tedious and uncertain problem of manually determining the decomposition rank, and the parameter matrices are compressed to achieve the compression effect on the neural network model.
  • FIG. 1 is a flowchart of a deep neural network compression method based on an adaptive tensor-train decomposition-rank algorithm provided in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of tensor decomposition in an adaptive tensor-train decomposition-rank algorithm provided in an embodiment of the present application;
  • FIG. 3 is a structural block diagram of a deep neural network compression device based on an adaptive tensor-train decomposition-rank algorithm provided in an embodiment of the present application;
  • FIG. 4 is a structural diagram of an embodiment of a computer device provided in an embodiment of the present application;
  • FIG. 5 is an example diagram of a computer device provided in an embodiment of the present application.
  • In one embodiment of the deep neural network compression method provided in the embodiments of the present application, the method includes:
  • the deep neural network model to be compressed can be VGGNet, GoogLeNet, ResNet, etc.; this is not limited.
  • It is determined whether selectable network layers exist in the neural network model to be compressed.
  • If they do, the selectable network layers can be decomposed layer by layer.
  • There are multiple selectable network layers.
  • The network layers are selected in order: for example, one layer at a time from the last network layer toward the first may be selected as the selected layer; of course, selection may also proceed from the first layer toward the last, and this is not limited. For the selected layer, the parameters of the remaining network layers remain unchanged.
  • n and α are both hyperparameters.
  • Hyperparameters are parameters whose values are set before the learning process begins, rather than parameter data obtained through training.
  • S104. Repeatedly select the next layer of the network as the selected layer for tensor decomposition, until all selectable network layers complete kernel-matrix decomposition, and a compressed deep neural network model is obtained.
  • The parameter matrix of each layer of the deep neural network model is decomposed layer by layer according to the set network accuracy threshold during network training; the other network layers are fixed during decomposition, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to recover accuracy.
  • The current rank is then determined as the decomposition rank of the selected layer, which solves the tedious and uncertain problem of manually determining the decomposition rank.
  • The parameter matrix is compressed, achieving the compression effect on the neural network model.
  • The tensor-train decomposition schematic uses the tensor-train tensor decomposition algorithm to compress the parameter matrices of the fully connected and convolutional layers of the neural network.
  • The principle of tensor-train decomposition is to express each element of a high-dimensional tensor as a product of several matrices, that is:
  • A(i_1, i_2, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d);
  • G_k(i_k) is a matrix of size r_{k-1} × r_k;
  • r_k denotes the ranks of the tensor-train decomposition (TT-ranks); to ensure that the final result is a scalar, r_0 = r_d = 1.
  • In a tensor-train decomposition of a 5×4×5 tensor A, any element of A, for example A_231, can be written as a product of 3 matrices.
  • The decomposition ranks of the tensor train are set to (1, 3, 3, 1),
  • so the matrices, each of size r_{k-1} × r_k, are 1×3, 3×3, and 3×1, respectively.
  • The matrix slice taken from each G_k is determined by the element's index i_k, here 2, 3, and 1, respectively.
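  • As a concrete illustration of the element-wise reconstruction just described, the following minimal NumPy sketch (not part of the patent; core values are random and names are illustrative) rebuilds one element of a 5×4×5 tensor from TT cores with ranks (1, 3, 3, 1):

```python
import numpy as np

# TT cores for a 5x4x5 tensor with TT-ranks (r_0, r_1, r_2, r_3) = (1, 3, 3, 1);
# core G_k stacks one r_{k-1} x r_k matrix per value of the index i_k.
G1 = np.random.randn(5, 1, 3)
G2 = np.random.randn(4, 3, 3)
G3 = np.random.randn(5, 3, 1)

def tt_element(cores, index):
    # A(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d): a chain of matrix products.
    result = np.eye(1)                 # left boundary: rank r_0 = 1
    for core, i in zip(cores, index):
        result = result @ core[i]      # (1 x r_{k-1}) times (r_{k-1} x r_k)
    return result.item()               # final product is 1 x 1, i.e. a scalar

# A_231 with 1-based subscripts (2, 3, 1) corresponds to 0-based indices (1, 2, 0):
a_231 = tt_element([G1, G2, G3], (1, 2, 0))

# Storage: 5*4*5 = 100 entries dense vs 1*3*5 + 3*3*4 + 3*1*5 = 66 in TT form.
n_tt = G1.size + G2.size + G3.size     # 66
```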
  • The input vector x ∈ R^M of the selected layer is converted into a high-dimensional tensor with dimensions (m_1, ..., m_d);
  • the output vector y ∈ R^N is transformed into a high-order tensor with dimensions (n_1, ..., n_d);
  • the operation of the decomposed deep neural network model is expressed as:
  • Y(j_1, ..., j_d) = Σ_{i_1,...,i_d} G_1[i_1, j_1] G_2[i_2, j_2] ... G_d[i_d, j_d] · X(i_1, ..., i_d) + B(j_1, ..., j_d);
  • the tensor-train tensor decomposition algorithm decomposes the parameter matrix into a product of matrices, which can significantly reduce the number of parameters in the fully connected layer.
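  • To make this parameter saving concrete, the sketch below counts the parameters of a TT-factorized fully connected layer; the 1024×1024 size, the mode factorization, and the TT-ranks are illustrative assumptions, not values from the patent:

```python
def tt_fc_params(m_dims, n_dims, ranks):
    # Each TT core G_k[m_k, n_k] holds m_k * n_k * r_{k-1} * r_k parameters;
    # ranks has length d+1 with ranks[0] = ranks[-1] = 1.
    return sum(m * n * r0 * r1
               for m, n, r0, r1 in zip(m_dims, n_dims, ranks[:-1], ranks[1:]))

# Hypothetical 1024x1024 fully connected layer, factorized as 4*4*8*8 on both sides:
m_dims = n_dims = (4, 4, 8, 8)
ranks = (1, 8, 8, 8, 1)                          # example TT-ranks (found adaptively)
dense_params = 1024 * 1024                       # 1,048,576
tt_params = tt_fc_params(m_dims, n_dims, ranks)  # 128 + 1024 + 4096 + 512 = 5760
```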
  • The convolution kernel tensor K ∈ R^{k×k×M×N} is converted into a matrix of size k²M × N, where F is the side length of the input tensor (the side lengths being equal), M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel;
  • tensor-train decomposition of the convolution kernel matrix on the basis of the im2col operation can reduce the parameter count of the convolutional layer.
  • The adaptive tensor-train decomposition-rank algorithm is used to automatically determine the decomposition ranks of the convolutional and fully connected layers; on the basis of the tensor decomposition, a weight quantization algorithm quantizes the 32-bit full-precision parameters to 8 bits, which can accelerate the inference speed of the neural network.
  • After the method repeatedly selects the next layer of the network as the selected layer for tensor decomposition, until all selectable network layers complete kernel-matrix decomposition and a compressed deep neural network model is obtained, the method further includes:
  • quantizing the parameters of the kernel matrices, which solves the problem that the increase in the number of network layers caused by tensor-train decomposition makes forward acceleration insignificant;
  • applying the weight quantization technique to the kernel matrices after tensor-train decomposition gives a faster forward inference speed than the existing technology.
  • In another embodiment, the method includes:
  • S202. Starting from the last network layer and moving toward the first, select one layer of the deep neural network model to be compressed at a time as the selected layer, and fix the parameters of all network layers except the selected layer;
  • the decomposed network operation is expressed as:
  • when the selected layer is a convolutional layer,
  • the im2col operation is used to convert the input tensor X ∈ R^{F×F×M} into a matrix of size F′F′ × k²M; the convolution kernel tensor K ∈ R^{k×k×M×N} is converted into a matrix of size k²M × N, where F is the side length of the input tensor (the side lengths being equal), M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel.
  • The adaptive tensor-train decomposition algorithm performs tensor-train decomposition on the convolution kernel parameter matrix.
  • S207. Perform accuracy-recovery adjustment on the deep neural network model after the quantization operation.
  • For the fully connected layers of the neural network, the tensor-train tensor decomposition algorithm is used to decompose the parameter matrix into a product of matrices, which can significantly reduce the number of fully-connected-layer parameters; for the convolutional layers of the neural network, tensor-train decomposition of the convolution kernel matrix on the basis of the im2col operation can reduce the parameter count of the convolutional layer;
  • the adaptive tensor-train decomposition-rank algorithm is used to automatically determine the decomposition ranks of the convolutional and fully connected layers; on the basis of the tensor decomposition, a weight quantization algorithm quantizes the 32-bit full-precision parameters to 8 bits, which can accelerate the inference speed of the neural network.
  • The technical solution provided in this application can effectively compress and accelerate existing mainstream neural networks.
  • The solution provided in this application has been tested on the fully connected layers of the VGG-16 network, and the experimental results show that the compression and acceleration effect of this application is good.
  • Adaptive tensor-train decomposition was applied to the fully connected layers of VGG-16 for parameter compression in a preliminary experiment.
  • The overall network parameter compression ratio is 3.9.
  • The top-5 error rate increased only from 11.8% to 12.3%.
  • The inference times before and after compression were compared on the CPU and GPU, as shown in Table 1.
  • The experimental results show that tensor-train decomposition accelerates the inference speed of fully-connected-layer networks; thanks to the parallel processing of matrix operations, the acceleration is more pronounced on the GPU.
  • The present application provides a deep neural network compression device based on an adaptive tensor-train decomposition algorithm, characterized in that the device includes:
  • an obtaining unit 301, used to obtain the deep neural network model to be compressed;
  • a determining unit 302, configured to, when it is determined that the deep neural network model to be compressed has selectable network layers, select one layer of the network at a time as the selected layer in a predetermined order, and keep the network parameters of the remaining selectable layers fixed;
  • a tensor decomposition unit 303, configured to perform tensor decomposition calculations on the selected layer, obtain multiple kernel matrices by adjusting the accuracy requirement values, and, when the accuracy difference of a kernel matrix meets a preset condition, take that kernel matrix as the tensor-decomposition kernel matrix of the selected layer;
  • an execution unit 304, configured to repeatedly select the next layer of the network as the selected layer for tensor decomposition, until all selectable network layers have completed kernel-matrix decomposition, obtaining a compressed deep neural network model.
  • With the deep neural network compression device based on the adaptive tensor-train decomposition-rank algorithm provided in this application, the adaptive decomposition-rank algorithm built on the tensor-train decomposition algorithm decomposes the parameter matrix of each layer of the deep neural network model layer by layer during network training according to the set network accuracy threshold; the other network layers are fixed during decomposition, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to recover accuracy; after the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer, which solves the tedious and uncertain problem of manually determining the decomposition rank, and the parameter matrices are compressed to achieve the compression effect on the neural network model.
  • FIG. 4 is a structural diagram of an embodiment of a computer device of the present invention.
  • The computer device of this embodiment includes one or more processors 30 and a memory 40.
  • The memory 40 is used to store one or more programs; when the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 implement the deep neural network compression method based on the adaptive tensor-train decomposition-rank algorithm of the embodiments shown in FIGS. 1-2 above.
  • The embodiment shown in FIG. 4 includes multiple processors 30 as an example.
  • FIG. 5 is an example diagram of a computer device provided by the present invention.
  • Figure 5 shows a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention.
  • the computer device 12a shown in FIG. 5 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.
  • the computer device 12a is represented in the form of a general-purpose computing device.
  • the components of the computer device 12a may include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a connecting different system components (including the system memory 28a and the processor 16a).
  • The bus 18a represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
  • These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • the computer device 12a typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the computer device 12a, including volatile and non-volatile media, removable and non-removable media.
  • the system memory 28a may include a computer system readable medium in the form of volatile memory, such as random access memory (RAM) 30a and/or cache memory 32a.
  • the computer device 12a may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • the storage system 34a can be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 5, usually referred to as a "hard drive").
  • Although not shown in FIG. 5, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) may be provided.
  • each drive can be connected to the bus 18a through one or more data media interfaces.
  • The system memory 28a may include at least one program product having a set (for example, at least one) of program modules configured to perform the functions of the embodiments of FIGS. 1 to 4 of the present invention described above.
  • A program/utility 40a having a set (at least one) of program modules 42a may be stored in, for example, the system memory 28a.
  • Such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • The program modules 42a generally perform the functions and/or methods of the embodiments of FIGS. 1 and 2 described in the present invention.
  • The computer device 12a may also communicate with one or more external devices 14a (such as a keyboard, pointing device, or display 24a), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card or modem) that enables the computer device 12a to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 22a.
  • the computer device 12a may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 20a.
  • The network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the processor 16a executes various functional applications and data processing by running programs stored in the system memory 28a, such as implementing the deep neural network compression method based on the adaptive tensor train decomposition rank algorithm shown in the foregoing embodiment.
  • the present invention also provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, it implements the deep neural network compression method based on the adaptive tensor train decomposition rank algorithm as shown in the above embodiment.
  • the computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a in the embodiment shown in FIG. 5 above.
  • the dissemination of computer programs is no longer limited to tangible media. It can also be downloaded directly from the Internet or obtained in other ways. Therefore, the computer-readable media in this embodiment may include not only tangible media, but also intangible media.
  • the computer-readable medium in this embodiment may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including, but not limited to, a magnetic signal, an optical signal, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable medium may send, propagate, or transmit the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • the computer program code used to perform the operations of the present invention can be written in one or more programming languages or a combination thereof.
  • The programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • The above-mentioned software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute part of the steps of the methods described in the various embodiments of the present invention.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Image Processing (AREA)

Abstract

A deep neural network compression method, apparatus, device, and computer-readable medium, relating to the field of computer technology. Through an adaptive decomposition-rank algorithm based on the tensor-train decomposition algorithm, the parameter matrix of each layer of a deep neural network model is decomposed layer by layer during network training according to a set network accuracy threshold; while one layer is being decomposed, the other network layers are held fixed, the decomposition rank of the current layer is adjusted in order, and the network is retrained to recover accuracy. Once the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer. This solves the tedious and uncertain problem of manually determining decomposition ranks, and compresses the parameter matrices so as to achieve compression of the neural network model.

Description

Deep neural network compression method and related device
Technical Field
This application relates to the field of computer technology, and in particular to a deep neural network compression method, apparatus, device, and computer-readable medium.
Background
In recent years, deep convolutional neural networks have been applied very widely in image recognition, natural language processing, speech recognition, and other fields; their powerful feature-extraction ability has brought breakthrough performance gains on many tasks. To improve the performance of neural network models, researchers generally design deeper and more complex networks, such as VGGNet, GoogLeNet, and ResNet. This greatly increases the parameter count and the computational load of the models, placing ever higher demands on hardware resources (such as CPU, GPU memory, and bandwidth) at a very high cost. Deploying such complex deep neural networks directly on mobile devices with limited computing power and battery life (such as mobile phones, drones, robots, and smart glasses) is very difficult to achieve. Deploying deep neural network systems on mobile devices and at low cost has important application demand and prospects, and is an important problem that the industrialization of deep learning will have to solve.
To deploy today's large convolutional neural networks on inexpensive devices, the problems of limited storage space and computing capacity must be solved, so the compactness of the model and the efficiency of the computation are very important. An oversized model occupies a large amount of memory and also hurts computational efficiency. Research shows that convolution is the most computation-intensive operation in deep convolutional networks, so the key to acceleration lies in improving the computational efficiency of the convolution operation. In addition, convolutional neural networks contain a large number of redundant structures and parameters, especially in the fully connected layers; these redundant structures and parameters contribute little to the final inference result, so the model size can be reduced by compressing the network structure and the parameter count, which also speeds up the network's computation.
Chinese patent application No. 201610387878.4, entitled "Acceleration and compression method for deep convolutional neural networks based on tensor decomposition", provides an acceleration and compression method for deep convolutional neural networks based on weight tensor decomposition: the weights of the deep convolutional neural network are tensor-decomposed, and the resulting multiple low-rank sub-tensors replace the original weights, so that large deep convolutional neural networks can be accelerated and compressed. Another Chinese patent application, No. 201711319853.1, entitled "Deep neural network compression method based on block-term tensor decomposition", provides a deep neural network compression method based on block-term tensor decomposition: the weight matrices and input vectors are both converted into high-order tensors and subjected to block-term tensor decomposition, the fully connected layers of the deep neural network are replaced with block-term tensor layers, and the replaced deep neural network is trained with the back-propagation algorithm.
At present, the decomposition ranks of neural network compression algorithms based on tensor-train tensor decomposition are all set manually through experience and parameter tuning; during training, a suitable tensor-train decomposition rank must be explored repeatedly for each network layer, which requires a great deal of time and effort.
Summary
To solve one of the above problems, the embodiments of this application provide a deep neural network compression method, apparatus, device, and computer-readable medium based on an adaptive tensor-train decomposition-rank algorithm.
In a first aspect, this application provides a deep neural network compression method, the method comprising:
obtaining a deep neural network model to be compressed;
when it is determined that the deep neural network model to be compressed has selectable network layers, selecting one layer of the network at a time as the selected layer in a predetermined order, and keeping the network parameters of the remaining selectable layers fixed;
performing a tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting an accuracy requirement value, and, when the accuracy difference of a kernel matrix meets a preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer;
repeatedly selecting the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition, obtaining the compressed deep neural network model.
As an optional scheme, performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer comprises:
obtaining the parameter matrix W and the network accuracy L of the selected layer, and converting the parameter matrix W into a high-dimensional tensor T;
setting an error requirement value ε and recording it in an array e[]; compressing the high-dimensional tensor T with the tensor algorithm and reconstructing it to obtain a tensor T′ satisfying:
||T - T′||_F ≤ ε · ||T||_F;
determining the singular-value-decomposition truncation threshold δ:
δ = ε · ||T||_F / √(d - 1),
where d is the number of dimensions of the high-dimensional tensor T;
unfolding the high-dimensional tensor T into matrices dimension by dimension, and performing singular value decomposition on the matrices with the truncation threshold δ to obtain the decomposition ranks r_k and the decomposed kernel matrices;
performing accuracy adjustment on the selected layer to obtain an accuracy L′, and determining the accuracy difference Δ = L - L′, recorded in an array l[];
adjusting the accuracy requirement value ε from large to small and repeating the determination of the accuracy difference until the accuracy difference Δ is not greater than α for n consecutive times, then stopping the loop and taking the resulting kernel matrices as the tensor-decomposition kernel matrices of the selected layer, where n and α are both hyperparameters.
As an optional scheme, performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer comprises:
when the selected layer is a fully connected layer, converting the input vector x ∈ R^M of the selected layer into a high-dimensional tensor X with dimensions (m_1, ..., m_d);
converting the output vector y ∈ R^N into a high-order tensor Y with dimensions (n_1, ..., n_d);
decomposing the bias b ∈ R^N into a high-dimensional tensor B with dimensions (n_1, ..., n_d);
performing tensor decomposition on the parameter matrix, the resulting kernel matrices being G_k[m_k, n_k];
the operation of the decomposed deep neural network model being expressed as:
Y(j_1, ..., j_d) = Σ_{i_1,...,i_d} G_1[i_1, j_1] G_2[i_2, j_2] ... G_d[i_d, j_d] · X(i_1, ..., i_d) + B(j_1, ..., j_d).
As an optional scheme, performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer comprises:
when the selected layer is a convolutional layer, using the im2col operation to convert the input tensor X ∈ R^{F×F×M} into a matrix of size F′F′ × k²M;
converting the convolution kernel tensor K ∈ R^{k×k×M×N} into a matrix of size k²M × N, where F is the side length of the input tensor (the side lengths being equal), M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel;
performing tensor decomposition on the convolution kernel parameter matrix, decomposing the input and output dimensions respectively:
k²M = ∏_{i=1}^{d} m_i,  N = ∏_{i=1}^{d} n_i;
tensorizing the convolution kernel matrix K into a tensor with dimensions (m_1 n_1, ..., m_d n_d);
obtaining after decomposition the kernel matrices G_k[m_k, n_k];
the decomposed convolution operation being expressed as:
Y(p; j_1, ..., j_d) = Σ_{i_1,...,i_d} X_col(p; i_1, ..., i_d) · G_1[i_1, j_1] ... G_d[i_d, j_d],
where p indexes the F′F′ spatial positions of the im2col matrix.
As an optional scheme, after repeatedly selecting the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition and the compressed deep neural network model is obtained, the method further comprises:
performing a quantization operation on the deep neural network model.
As an optional scheme, performing the quantization operation on the deep neural network model comprises:
quantizing the 32-bit full-precision kernel-matrix parameters to 8-bit integers.
As an optional scheme, selecting one layer of the network at a time as the selected layer in a predetermined order comprises:
selecting one layer of the network at a time in order from the last network layer to the first network layer.
In a second aspect, this application provides a deep neural network compression apparatus based on an adaptive tensor-train decomposition algorithm, the apparatus comprising:
an obtaining unit, configured to obtain the deep neural network model to be compressed;
a determining unit, configured to, when it is determined that the deep neural network model to be compressed has selectable network layers, select one layer of the network at a time as the selected layer in a predetermined order, and keep the network parameters of the remaining selectable layers fixed;
a tensor decomposition unit, configured to perform the tensor decomposition calculation on the selected layer, obtain multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, take that kernel matrix as the tensor-decomposition kernel matrix of the selected layer;
an execution unit, configured to repeatedly select the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition, obtaining the compressed deep neural network model.
In a third aspect, this application further provides a computer device, the device comprising:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the deep neural network compression method described above.
In a fourth aspect, this application further provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the deep neural network compression method described above is implemented.
With the deep neural network compression method, apparatus, device, and computer-readable medium based on the adaptive tensor-train decomposition-rank algorithm provided in this application, the adaptive decomposition-rank algorithm built on the tensor-train decomposition algorithm decomposes the parameter matrix of each layer of the deep neural network model layer by layer during network training according to the set network accuracy threshold; the other network layers are held fixed during decomposition, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to recover accuracy; once the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer. This solves the tedious and uncertain problem of manually determining decomposition ranks, and the parameter matrices are compressed, achieving compression of the neural network model.
Brief Description of the Drawings
FIG. 1 is a flowchart of a deep neural network compression method based on an adaptive tensor-train decomposition-rank algorithm provided in an embodiment of this application;
FIG. 2 is a schematic diagram of tensor decomposition in an adaptive tensor-train decomposition-rank algorithm provided in an embodiment of this application;
FIG. 3 is a structural block diagram of a deep neural network compression apparatus based on an adaptive tensor-train decomposition-rank algorithm provided in an embodiment of this application;
FIG. 4 is a structural diagram of an embodiment of a computer device provided in an embodiment of this application;
FIG. 5 is an example diagram of a computer device provided in an embodiment of this application.
Detailed Description
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. Moreover, the terms "comprise" and "have", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
With reference to FIG. 1, in one embodiment of the deep neural network compression method provided in the embodiments of this application, the method comprises:
S101. Obtain the deep neural network model to be compressed.
The deep neural network model to be compressed may be VGGNet, GoogLeNet, ResNet, or the like; this is not limited here.
S102. When it is determined that the deep neural network model to be compressed has selectable network layers, select one layer of the network at a time as the selected layer in a predetermined order, and keep the network parameters of the remaining selectable layers fixed.
It is determined whether selectable network layers exist in the neural network model to be compressed; if they do, layer-by-layer decomposition can be carried out. There are multiple selectable network layers, and they are selected in order: for example, one layer at a time from the last network layer toward the first may be selected as the selected layer; selection may of course also proceed from the first layer toward the last, and this is not limited. For the selected layer, the parameters of the remaining network layers are kept unchanged.
S103. Perform the tensor decomposition calculation on the selected layer, obtain multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, take that kernel matrix as the tensor-decomposition kernel matrix of the selected layer.
Specifically, the parameter matrix W and the network accuracy L of the selected layer are obtained, and the parameter matrix W is converted into a high-dimensional tensor T.
An error requirement value ε is set and recorded in an array e[]; the high-dimensional tensor T is compressed with the tensor algorithm and reconstructed to obtain a tensor T′ satisfying:
||T - T′||_F ≤ ε · ||T||_F;
the singular-value-decomposition truncation threshold δ is determined:
δ = ε · ||T||_F / √(d - 1),
where d is the number of dimensions of the high-dimensional tensor T.
The high-dimensional tensor T is unfolded into matrices dimension by dimension, and singular value decomposition is performed on these matrices with the truncation threshold δ, giving the decomposition ranks r_k and the decomposed kernel matrices.
Accuracy adjustment is performed on the selected layer to obtain an accuracy L′, and the accuracy difference Δ = L - L′ is determined and recorded in an array l[].
The accuracy requirement value ε is adjusted from large to small, and the determination of the accuracy difference is repeated until the accuracy difference Δ is not greater than α for n consecutive times, at which point the loop stops and the resulting kernel matrices are taken as the tensor-decomposition kernel matrices of the selected layer; n and α are both hyperparameters. In the context of machine learning, a hyperparameter is a parameter whose value is set before the learning process begins, rather than parameter data obtained through training.
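A minimal Python sketch of this adaptive rank-selection loop is given below for illustration. The tt_svd function follows the standard TT-SVD procedure with truncation threshold δ; the accuracy-evaluation callback, the ε schedule, and the concrete values of n and α are assumptions standing in for choices the text leaves open:

```python
import numpy as np

def tt_svd(T, delta):
    # Standard TT-SVD: unfold dimension by dimension; at each step keep the
    # smallest rank whose discarded singular values stay below delta.
    shape, d = T.shape, T.ndim
    cores, r_prev = [], 1
    C = T.reshape(shape[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        tail = np.cumsum(S[::-1] ** 2)[::-1]        # tail[r] = sum of S[r:]**2
        r = max(1, int(np.sum(tail > delta ** 2)))  # decomposition rank r_k
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = (S[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def adaptive_tt_rank(W, tensor_shape, evaluate, L, n=3, alpha=0.5,
                     eps_schedule=(0.5, 0.4, 0.3, 0.2, 0.1, 0.05)):
    # evaluate(cores) -> accuracy L' after the layer is replaced by the cores
    # and retrained (user-supplied); eps_schedule runs from large to small.
    T = W.reshape(tensor_shape)
    delta_base = np.linalg.norm(T) / np.sqrt(T.ndim - 1)
    cores, streak = None, 0
    for eps in eps_schedule:
        cores = tt_svd(T, eps * delta_base)  # delta = eps * ||T||_F / sqrt(d-1)
        streak = streak + 1 if L - evaluate(cores) <= alpha else 0
        if streak >= n:                      # drop <= alpha for n times in a row
            break
    return cores                             # last cores tried (sketch only)
```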
S104. Repeatedly select the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition, obtaining the compressed deep neural network model.
Through the adaptive decomposition-rank algorithm built on the tensor-train decomposition algorithm, the parameter matrix of each layer of the deep neural network model is decomposed layer by layer during network training according to the set network accuracy threshold; the other network layers are held fixed while a layer is decomposed, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to recover accuracy; once the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer. This solves the tedious and uncertain problem of manually determining decomposition ranks, and the parameter matrices are compressed, achieving compression of the neural network model.
With reference to FIG. 2, a schematic diagram of tensor-train decomposition: the tensor-train tensor decomposition algorithm is used to compress the parameter matrices of the fully connected and convolutional layers of the neural network. Specifically, the principle of tensor-train decomposition is to express each element of a high-dimensional tensor as a product of several matrices, that is:
A(i_1, i_2, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d);
where G_k(i_k) is a matrix of size r_{k-1} × r_k, and the r_k are the ranks of the tensor-train decomposition (TT-ranks); to ensure that the final result is a scalar, r_0 = r_d = 1. In a tensor-train decomposition of a 5×4×5 tensor A, any element of A, for example A_231, can be written as a product of 3 matrices. Here the TT-ranks are set to (1, 3, 3, 1), so the matrices, of size r_{k-1} × r_k, are 1×3, 3×3, and 3×1 respectively. The matrix slice taken from each G_k is determined by the element's index i_k, here 2, 3, and 1 respectively. The original tensor has 5×4×5 = 100 parameters in total; after compression there are 1×3×5 + 3×3×4 + 3×1×5 = 66 parameters.
Specifically, in S104, when the selected layer is a fully connected layer, the input vector x ∈ R^M of the selected layer is converted into a high-dimensional tensor X with dimensions (m_1, ..., m_d); the output vector y ∈ R^N is converted into a high-order tensor Y with dimensions (n_1, ..., n_d); and the bias b ∈ R^N is decomposed into a high-dimensional tensor B with dimensions (n_1, ..., n_d). Tensor decomposition is performed on the parameter matrix, and the resulting kernel matrices are G_k[m_k, n_k]. The operation of the decomposed deep neural network model is expressed as:
Y(j_1, ..., j_d) = Σ_{i_1,...,i_d} G_1[i_1, j_1] G_2[i_2, j_2] ... G_d[i_d, j_d] · X(i_1, ..., i_d) + B(j_1, ..., j_d).
For the fully connected layers of the neural network, using the tensor-train tensor decomposition algorithm to decompose the parameter matrix into a product of matrices can significantly reduce the number of fully-connected-layer parameters.
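For illustration, the decomposed forward pass above can be carried out by contracting the input tensor with the cores one mode at a time. The following sketch assumes a core layout G_k of shape (r_{k-1}, m_k, n_k, r_k) and uses illustrative shapes throughout; it is not the patent's implementation:

```python
import numpy as np

def tt_fc_forward(cores, x, m_dims, n_dims, b):
    # cores[k] has shape (r_{k-1}, m_k, n_k, r_k); x has length prod(m_dims).
    z = x.reshape((1,) + tuple(m_dims))                # prepend boundary rank r_0 = 1
    for G in cores:
        z = np.tensordot(G, z, axes=([0, 1], [0, 1]))  # contract (r_{k-1}, m_k) axes
        z = np.moveaxis(z, 0, -1)                      # move the new n_k axis to the back
    return z.reshape(tuple(n_dims)) + b.reshape(tuple(n_dims))

# Illustrative shapes: M = 4*8 = 32 inputs, N = 4*8 = 32 outputs, TT-ranks (1, 3, 1).
m_dims, n_dims, ranks = (4, 8), (4, 8), (1, 3, 1)
cores = [np.random.randn(ranks[k], m_dims[k], n_dims[k], ranks[k + 1])
         for k in range(len(m_dims))]
x, b = np.random.randn(32), np.random.randn(32)
Y = tt_fc_forward(cores, x, m_dims, n_dims, b)         # output tensor of shape (4, 8)
```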
Specifically, in S104, when the selected layer is a convolutional layer, the im2col operation is used to convert the input tensor X ∈ R^{F×F×M} into a matrix of size F′F′ × k²M; im2col is used here to optimize the convolution operation. The convolution kernel tensor K ∈ R^{k×k×M×N} is converted into a matrix of size k²M × N, where F is the side length of the input tensor (the side lengths being equal), M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel. Tensor decomposition is performed on the convolution kernel parameter matrix, decomposing the input and output dimensions respectively:
k²M = ∏_{i=1}^{d} m_i,  N = ∏_{i=1}^{d} n_i;
the convolution kernel matrix K is tensorized into a tensor with dimensions (m_1 n_1, ..., m_d n_d), and the kernel matrices obtained after decomposition are G_k[m_k, n_k]. The decomposed convolution operation is expressed as:
Y(p; j_1, ..., j_d) = Σ_{i_1,...,i_d} X_col(p; i_1, ..., i_d) · G_1[i_1, j_1] G_2[i_2, j_2] ... G_d[i_d, j_d],
where p indexes the F′F′ spatial positions of the im2col matrix.
For the convolutional layers of the neural network, performing tensor-train decomposition on the convolution kernel matrix on top of the im2col operation can reduce the number of convolutional-layer parameters. The adaptive tensor-train decomposition-rank algorithm is applied to determine the decomposition ranks of the convolutional and fully connected layers automatically; on top of the tensor decomposition, a weight quantization algorithm quantizes the 32-bit full-precision parameters to 8 bits, which can accelerate the inference speed of the neural network.
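The following sketch illustrates, under the assumptions of stride 1 and no padding (which the text does not fix), how im2col reduces convolution to the F′F′ × k²M by k²M × N matrix product whose right-hand factor is then tensor-train decomposed as in the fully connected case:

```python
import numpy as np

def im2col(X, k):
    # (F, F, M) input -> (F'F', k*k*M) matrix; stride 1, no padding, F' = F - k + 1.
    F, _, M = X.shape
    Fp = F - k + 1
    cols = np.empty((Fp * Fp, k * k * M))
    for i in range(Fp):
        for j in range(Fp):
            cols[i * Fp + j] = X[i:i + k, j:j + k, :].reshape(-1)
    return cols

def conv_as_matmul(X, K):
    # Convolution as (F'F' x k^2 M) @ (k^2 M x N); the right-hand matrix is the
    # one whose dimensions k^2 M = prod(m_i) and N = prod(n_i) get TT-decomposed.
    k, _, M, N = K.shape
    Fp = X.shape[0] - k + 1
    return (im2col(X, k) @ K.reshape(k * k * M, N)).reshape(Fp, Fp, N)

X = np.random.randn(8, 8, 3)       # F = 8, M = 3 (illustrative values)
K = np.random.randn(3, 3, 3, 16)   # k = 3, N = 16
Y = conv_as_matmul(X, K)           # output of shape (6, 6, 16)
```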
In S104, after the next layer of the network is repeatedly selected as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition and the compressed deep neural network model is obtained, the method further comprises:
performing a quantization operation on the deep neural network model; specifically, the 32-bit full-precision kernel-matrix parameters are quantized to 8-bit integers, completing the quantization of the deep neural network model.
Quantizing the kernel-matrix parameters solves the problem that the increase in the number of network layers brought by tensor-train decomposition would otherwise make the forward acceleration insignificant; at the same time, applying the weight quantization technique to the kernel matrices after tensor-train decomposition yields a faster forward inference speed than the existing technology.
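As an illustration of the quantization step, one common scheme is symmetric per-tensor scaling; the patent specifies only the 32-bit to 8-bit integer conversion, so the mapping below is an assumption:

```python
import numpy as np

def quantize_int8(G):
    # Map a float32 core to int8 with a single scale factor, kept for
    # dequantization at inference time (an assumed, common scheme).
    m = np.abs(G).max()
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(G / scale), -128, 127).astype(np.int8)
    return q, np.float32(scale)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

core = np.random.randn(3, 4, 4, 3).astype(np.float32)   # a TT core G_k[m_k, n_k]
q, s = quantize_int8(core)
err = np.abs(dequantize(q, s) - core).max()              # small reconstruction error
```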
In another embodiment of the deep neural network compression method provided in the embodiments of this application, the method comprises:
S201. Obtain the deep neural network model to be compressed.
S202. Starting from the last network layer and moving toward the first, select one layer of the deep neural network model to be compressed at a time as the selected layer, and fix the parameters of all network layers except the selected layer.
S203. When the selected layer is a fully connected layer, convert the input vector x ∈ R^M of the selected layer into a high-dimensional tensor X with dimensions (m_1, ..., m_d); convert the output vector y ∈ R^N into a high-order tensor Y with dimensions (n_1, ..., n_d); and decompose the bias b ∈ R^N into a high-dimensional tensor B with dimensions (n_1, ..., n_d). Use the adaptive tensor-train decomposition-rank algorithm to perform tensor-train decomposition on the parameter matrix; the resulting kernel matrices (TT-cores) are G_k[m_k, n_k]. The decomposed network operation is expressed as:
Y(j_1, ..., j_d) = Σ_{i_1,...,i_d} G_1[i_1, j_1] G_2[i_2, j_2] ... G_d[i_d, j_d] · X(i_1, ..., i_d) + B(j_1, ..., j_d).
S204. When the selected layer is a convolutional layer, use the im2col operation to convert the input tensor X ∈ R^{F×F×M} into a matrix of size F′F′ × k²M; convert the convolution kernel tensor K ∈ R^{k×k×M×N} into a matrix of size k²M × N, where F is the side length of the input tensor (the side lengths being equal), M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel. Use the adaptive tensor-train decomposition algorithm to perform tensor-train decomposition on the convolution kernel parameter matrix, first decomposing the input and output dimensions:
k²M = ∏_{i=1}^{d} m_i,  N = ∏_{i=1}^{d} n_i;
tensorize the convolution kernel matrix K into a tensor with dimensions (m_1 n_1, ..., m_d n_d); the kernel matrices obtained after decomposition are G_k[m_k, n_k], and the decomposed convolution operation can be expressed as:
Y(p; j_1, ..., j_d) = Σ_{i_1,...,i_d} X_col(p; i_1, ..., i_d) · G_1[i_1, j_1] ... G_d[i_d, j_d].
S205. Repeat steps S202, S203, and S204 until all network layers have been decomposed; the resulting network is the deep neural network model compressed by tensor-train decomposition.
S206. Perform a quantization operation on the resulting deep neural network model.
S207. Perform accuracy-recovery adjustment on the deep neural network model after the quantization operation.
For the fully connected layers of the neural network, using the tensor-train tensor decomposition algorithm to decompose the parameter matrix into a product of matrices can significantly reduce the number of fully-connected-layer parameters; for the convolutional layers, performing tensor-train decomposition on the convolution kernel matrix on top of the im2col operation can reduce the number of convolutional-layer parameters; the adaptive tensor-train decomposition-rank algorithm is applied to determine the decomposition ranks of the convolutional and fully connected layers automatically; and on top of the tensor decomposition, a weight quantization algorithm quantizes the 32-bit full-precision parameters to 8 bits, which can accelerate the inference speed of the neural network. The technical solution provided in this application can effectively compress and accelerate existing mainstream neural networks.
The solution provided in this application has been tested on the fully connected layers of the VGG-16 network, and the experimental results show that the compression and acceleration effect of this application is good. Adaptive tensor-train decomposition was applied to the fully connected layers of VGG-16 for parameter compression in a preliminary experiment. The overall network parameter compression ratio is 3.9, while the top-5 error rate rises only from 11.8% to 12.3%. The inference times before and after compression were compared on the CPU and GPU respectively, as shown in Table 1.
[Table 1: inference time before and after compression on CPU and GPU; the values appear only as an image in the original document and are not recoverable here.]
Table 1
The experimental results show that tensor-train decomposition accelerates the inference of fully-connected-layer networks; thanks to the parallel processing of matrix operations, the acceleration is more pronounced on the GPU.
With reference to FIG. 3, this application provides a deep neural network compression apparatus based on an adaptive tensor-train decomposition algorithm, characterized in that the apparatus comprises:
an obtaining unit 301, configured to obtain the deep neural network model to be compressed;
a determining unit 302, configured to, when it is determined that the deep neural network model to be compressed has selectable network layers, select one layer of the network at a time as the selected layer in a predetermined order, and keep the network parameters of the remaining selectable layers fixed;
a tensor decomposition unit 303, configured to perform the tensor decomposition calculation on the selected layer, obtain multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, take that kernel matrix as the tensor-decomposition kernel matrix of the selected layer;
an execution unit 304, configured to repeatedly select the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition, obtaining the compressed deep neural network model.
With the deep neural network compression apparatus based on the adaptive tensor-train decomposition-rank algorithm provided in this application, the adaptive decomposition-rank algorithm built on the tensor-train decomposition algorithm decomposes the parameter matrix of each layer of the deep neural network model layer by layer during network training according to the set network accuracy threshold; the other network layers are held fixed during decomposition, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to recover accuracy; once the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer. This solves the tedious and uncertain problem of manually determining decomposition ranks, and the parameter matrices are compressed, achieving compression of the neural network model.
FIG. 4 is a structural diagram of an embodiment of a computer device of the present invention. As shown in FIG. 4, the computer device of this embodiment includes one or more processors 30 and a memory 40; the memory 40 is used to store one or more programs, and when the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 implement the deep neural network compression method based on the adaptive tensor-train decomposition-rank algorithm of the embodiments shown in FIGS. 1-2 above. The embodiment shown in FIG. 4 includes multiple processors 30 as an example.
For example, FIG. 5 is an example diagram of a computer device provided by the present invention. FIG. 5 shows a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention. The computer device 12a shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in FIG. 5, the computer device 12a is embodied in the form of a general-purpose computing device. The components of the computer device 12a may include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a connecting the different system components (including the system memory 28a and the processors 16a).
The bus 18a represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12a typically includes a variety of computer-system-readable media. These media may be any available media accessible by the computer device 12a, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28a may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30a and/or cache memory 32a. The computer device 12a may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34a may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly called a "hard drive"). Although not shown in FIG. 5, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18a through one or more data-media interfaces. The system memory 28a may include at least one program product having a set (for example, at least one) of program modules configured to perform the functions of the embodiments of FIGS. 1 to 4 of the present invention described above.
A program/utility 40a having a set (at least one) of program modules 42a may be stored in, for example, the system memory 28a; such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42a generally perform the functions and/or methods of the embodiments of FIGS. 1 and 2 described in the present invention.
The computer device 12a may also communicate with one or more external devices 14a (such as a keyboard, pointing device, or display 24a), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card or modem) that enables the computer device 12a to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 22a. The computer device 12a may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20a. As shown in the figure, the network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processors 16a execute various functional applications and data processing by running the programs stored in the system memory 28a, for example implementing the deep neural network compression method based on the adaptive tensor-train decomposition-rank algorithm shown in the above embodiments.
The present invention also provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, it implements the deep neural network compression method based on the adaptive tensor-train decomposition-rank algorithm shown in the above embodiments.
The computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a of the embodiment shown in FIG. 5 above.
With the development of technology, the dissemination of computer programs is no longer limited to tangible media; they can also be downloaded directly from a network or obtained in other ways. Therefore, the computer-readable media in this embodiment may include not only tangible media but also intangible media.
The computer-readable medium of this embodiment may adopt any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including, but not limited to, a magnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable medium may send, propagate, or transmit the program for use by or in connection with the instruction execution system, apparatus, or device.
The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
The computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof; the programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in actual implementation.
The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The above software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

  1. A deep neural network compression method, characterized in that the deep neural network compression method comprises:
    obtaining a deep neural network model to be compressed;
    when it is determined that the deep neural network model to be compressed has selectable network layers, selecting one layer of the network at a time as the selected layer in a predetermined order, and keeping the network parameters of the remaining selectable layers fixed;
    performing a tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting an accuracy requirement value, and, when the accuracy difference of a kernel matrix meets a preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer;
    repeatedly selecting the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition, obtaining the compressed deep neural network model.
  2. The deep neural network compression method according to claim 1, characterized in that performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer comprises:
    obtaining the parameter matrix W and the network accuracy L of the selected layer, and converting the parameter matrix W into a high-dimensional tensor T;
    setting an error requirement value ε and recording it in an array e[]; compressing the high-dimensional tensor T with the tensor algorithm and reconstructing it to obtain a tensor T′ satisfying:
    ||T - T′||_F ≤ ε · ||T||_F;
    determining the singular-value-decomposition truncation threshold δ:
    δ = ε · ||T||_F / √(d - 1),
    where d is the number of dimensions of the high-dimensional tensor T;
    unfolding the high-dimensional tensor T into matrices dimension by dimension, and performing singular value decomposition on the matrices with the truncation threshold δ to obtain the decomposition ranks r_k and the decomposed kernel matrices;
    performing accuracy adjustment on the selected layer to obtain an accuracy L′, and determining the accuracy difference Δ = L - L′, recorded in an array l[];
    adjusting the accuracy requirement value ε from large to small and repeating the determination of the accuracy difference until the accuracy difference Δ is not greater than α for n consecutive times, then stopping the loop and taking the resulting kernel matrices as the tensor-decomposition kernel matrices of the selected layer, where n and α are both hyperparameters.
  3. The deep neural network compression method according to claim 1 or 2, characterized in that performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer comprises:
    when the selected layer is a fully connected layer, converting the input vector x ∈ R^M of the selected layer into a high-dimensional tensor X with dimensions (m_1, ..., m_d);
    converting the output vector y ∈ R^N into a high-order tensor Y with dimensions (n_1, ..., n_d);
    decomposing the bias b ∈ R^N into a high-dimensional tensor B with dimensions (n_1, ..., n_d);
    performing tensor decomposition on the parameter matrix, the resulting kernel matrices being G_k[m_k, n_k];
    the operation of the decomposed deep neural network model being expressed as:
    Y(j_1, ..., j_d) = Σ_{i_1,...,i_d} G_1[i_1, j_1] G_2[i_2, j_2] ... G_d[i_d, j_d] · X(i_1, ..., i_d) + B(j_1, ..., j_d).
  4. The deep neural network compression method according to claim 3, characterized in that performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, taking that kernel matrix as the tensor-decomposition kernel matrix of the selected layer comprises:
    when the selected layer is a convolutional layer, using the im2col operation to convert the input tensor X ∈ R^{F×F×M} into a matrix of size F′F′ × k²M;
    converting the convolution kernel tensor K ∈ R^{k×k×M×N} into a matrix of size k²M × N, where F is the side length of the input tensor (the side lengths being equal), M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel;
    performing tensor decomposition on the convolution kernel parameter matrix, decomposing the input and output dimensions respectively:
    k²M = ∏_{i=1}^{d} m_i,  N = ∏_{i=1}^{d} n_i;
    tensorizing the convolution kernel matrix K into a tensor with dimensions (m_1 n_1, ..., m_d n_d);
    obtaining after decomposition the kernel matrices G_k[m_k, n_k];
    the decomposed convolution operation being expressed as:
    Y(p; j_1, ..., j_d) = Σ_{i_1,...,i_d} X_col(p; i_1, ..., i_d) · G_1[i_1, j_1] ... G_d[i_d, j_d],
    where p indexes the F′F′ spatial positions of the im2col matrix.
  5. The deep neural network compression method according to claim 1, characterized in that after repeatedly selecting the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition and the compressed deep neural network model is obtained, the method further comprises:
    performing a quantization operation on the deep neural network model.
  6. The deep neural network compression method according to claim 5, characterized in that performing the quantization operation on the deep neural network model comprises:
    quantizing the 32-bit full-precision kernel-matrix parameters to 8-bit integers.
  7. The deep neural network compression method according to claim 1, characterized in that selecting one layer of the network at a time as the selected layer in a predetermined order comprises:
    selecting one layer of the network at a time in order from the last network layer to the first network layer.
  8. A deep neural network compression apparatus, characterized in that the apparatus comprises:
    an obtaining unit, configured to obtain the deep neural network model to be compressed;
    a determining unit, configured to, when it is determined that the deep neural network model to be compressed has selectable network layers, select one layer of the network at a time as the selected layer in a predetermined order, and keep the network parameters of the remaining selectable layers fixed;
    a tensor decomposition unit, configured to perform the tensor decomposition calculation on the selected layer, obtain multiple kernel matrices by adjusting the accuracy requirement value, and, when the accuracy difference of a kernel matrix meets the preset condition, take that kernel matrix as the tensor-decomposition kernel matrix of the selected layer;
    an execution unit, configured to repeatedly select the next layer of the network as the selected layer for tensor decomposition until all selectable network layers have completed kernel-matrix decomposition, obtaining the compressed deep neural network model.
  9. A computer device, characterized in that the device comprises:
    one or more processors;
    a memory for storing one or more programs;
    when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 7.
  10. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.
PCT/CN2019/130560 2019-05-23 2019-12-31 Deep neural network compression method and related device WO2020233130A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910435515.7 2019-05-23
CN201910435515.7A CN110263913A (zh) 2019-05-23 2019-05-23 Deep neural network compression method and related device

Publications (1)

Publication Number Publication Date
WO2020233130A1 (zh) 2020-11-26

Family

ID=67915263

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130560 WO2020233130A1 (zh) 2019-05-23 2019-12-31 一种深度神经网络压缩方法及相关设备

Country Status (2)

Country Link
CN (1) CN110263913A (zh)
WO (1) WO2020233130A1 (zh)


Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263913A (zh) 2019-05-23 2019-09-20 深圳先进技术研究院 Deep neural network compression method and related device
WO2021077283A1 (zh) 2019-10-22 2021-04-29 深圳鲲云信息科技有限公司 Neural network computation compression method, system, and storage medium
CN110852424B (zh) 2019-11-15 2023-07-25 广东工业大学 Processing method and apparatus for a generative adversarial network
KR20210136123A (ko) 2019-11-22 2021-11-16 텐센트 아메리카 엘엘씨 Method and apparatus for quantization, adaptive block partitioning, and codebook coding for neural network model compression
CN111210017B (zh) 2019-12-24 2023-09-26 北京迈格威科技有限公司 Method, apparatus, device, and storage medium for determining layout order and for data processing
CN113326930B (zh) 2020-02-29 2024-05-03 华为技术有限公司 Data processing method, neural network training method, and related apparatus and device
CN111401282B (zh) 2020-03-23 2024-10-01 上海眼控科技股份有限公司 Object detection method and apparatus, computer device, and storage medium
CN113537485B (zh) 2020-04-15 2024-09-06 北京金山数字娱乐科技有限公司 Neural network model compression method and apparatus
WO2021234967A1 (ja) 2020-05-22 2021-11-25 日本電信電話株式会社 Speech waveform generation model training apparatus, speech synthesis apparatus, methods thereof, and program
CN111898484A (zh) 2020-07-14 2020-11-06 华中科技大学 Model generation method and apparatus, readable storage medium, and electronic device
US11275671B2 (en) 2020-07-27 2022-03-15 Huawei Technologies Co., Ltd. Systems, methods and media for dynamically shaped tensors using liquid types
CN112541159A (zh) 2020-09-30 2021-03-23 华为技术有限公司 Model training method and related device
CN112184557A (zh) 2020-11-04 2021-01-05 上海携旅信息技术有限公司 Super-resolution network model compression method, system, device, and medium
WO2022141189A1 (zh) 2020-12-30 2022-07-07 南方科技大学 Automatic search method and apparatus for the precision and decomposition rank of a recurrent neural network
CN114692816B (zh) 2020-12-31 2023-08-25 华为技术有限公司 Neural network model processing method and device
US20230106213A1 (en) 2021-10-05 2023-04-06 Samsung Electronics Co., Ltd. Machine learning model compression using weighted low-rank factorization
CN116187401B (zh) 2023-04-26 2023-07-14 首都师范大学 Neural network compression method and apparatus, electronic device, and storage medium


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3293682A1 (en) 2016-09-13 2018-03-14 Alcatel Lucent Method and device for analyzing sensor data
CN107480770A (zh) 2017-07-27 2017-12-15 中国科学院自动化研究所 Method and apparatus for neural network quantization and compression with adjustable quantization bit width
CN107944556A (zh) 2017-12-12 2018-04-20 电子科技大学 Deep neural network compression method based on block-term tensor decomposition
CN109766995A (zh) 2018-12-28 2019-05-17 钟祥博谦信息科技有限公司 Compression method and apparatus for deep neural networks
CN110263913A (zh) 2019-05-23 2019-09-20 深圳先进技术研究院 Deep neural network compression method and related device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11657284B2 (en) 2019-05-16 2023-05-23 Samsung Electronics Co., Ltd. Neural network model apparatus and compressing method of neural network model
EP4241206A4 (en) 2020-12-01 2024-01-03 Huawei Technologies Co., Ltd. Device and method for implementing a tensor-train decomposition operation
CN114691627A (zh) 2020-12-30 2022-07-01 财团法人工业技术研究院 Data compression method, data compression system, and operation method for a deep learning accelerator chip
CN112990454A (zh) 2021-02-01 2021-06-18 国网安徽省电力有限公司检修分公司 Neural network computation acceleration method and apparatus based on integrated DPU multi-core heterogeneous architecture
CN112990454B (zh) 2021-02-01 2024-04-16 国网安徽省电力有限公司超高压分公司 Neural network computation acceleration method and apparatus based on integrated DPU multi-core heterogeneous architecture
CN113673694A (zh) 2021-05-26 2021-11-19 阿里巴巴新加坡控股有限公司 Data processing method and apparatus, electronic device, and computer-readable storage medium
WO2023125838A1 (zh) 2021-12-30 2023-07-06 深圳云天励飞技术股份有限公司 Data processing method and apparatus, terminal device, and computer-readable storage medium
CN114781650A (zh) 2022-04-28 2022-07-22 北京百度网讯科技有限公司 Data processing method, apparatus, device, and storage medium
CN114781650B (zh) 2022-04-28 2024-02-27 北京百度网讯科技有限公司 Data processing method, apparatus, device, and storage medium
WO2024159541A1 (en) 2023-02-03 2024-08-08 Huawei Technologies Co., Ltd. Systems and methods for compression of deep learning model using reinforcement learning for low rank decomposition
CN116167431A (zh) 2023-04-25 2023-05-26 之江实验室 Service processing method and apparatus based on mixed-precision model acceleration
CN117540780A (zh) 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Neural network model compression method and related apparatus
CN117973485A (zh) 2024-03-29 2024-05-03 苏州元脑智能科技有限公司 Model lightweighting method, apparatus, computer device, storage medium, and program product
CN118643884A (zh) 2024-08-12 2024-09-13 成都启英泰伦科技有限公司 On-device deep neural network model compression method based on fine-tuning training

Also Published As

Publication number Publication date
CN110263913A (zh) 2019-09-20

Similar Documents

Publication Publication Date Title
WO2020233130A1 (zh) 2020-11-26 Deep neural network compression method and related device
US11030522B2 (en) 2021-06-08 Reducing the size of a neural network through reduction of the weight matrices
KR102434726B1 (ko) 2022-08-19 Processing method and apparatus
WO2022105117A1 (zh) 2022-05-27 Image quality assessment method and apparatus, computer device, and storage medium
CN111488985A (zh) 2020-08-04 Deep neural network model compression training method, apparatus, device, and medium
CN110830807B (zh) 2021-04-06 Image compression method, apparatus, and storage medium
CN114374440B (zh) 2024-07-09 Method and apparatus for estimating the classical capacity of a quantum channel, electronic device, and medium
WO2023138188A1 (zh) 2023-07-27 Feature fusion model training and sample retrieval method and apparatus, and computer device
WO2020207174A1 (zh) 2020-10-15 Method and apparatus for generating a quantized neural network
WO2023231954A1 (zh) 2023-12-07 Data denoising method and related device
CN110751265A (zh) 2020-02-04 Lightweight neural network construction method and system, and electronic device
WO2023207039A1 (zh) 2023-11-02 Data processing method and apparatus, device, and storage medium
US11531695B2 (en) 2022-12-20 Multiscale quantization for fast similarity search
JP7408741B2 (ja) 2024-01-05 Multi-task deployment method and apparatus, electronic device, and storage medium
WO2024051655A1 (zh) 2024-03-14 Whole-slide histology image processing method and apparatus, medium, and electronic device
WO2021012691A1 (zh) 2021-01-28 Method and apparatus for image retrieval
JP2020008836A (ja) 2020-01-16 Vocabulary table selection method and apparatus, and computer-readable storage medium
CN113554149B (zh) 2021-12-04 Neural network processing unit (NPU), neural network processing method, and apparatus therefor
WO2022246986A1 (zh) 2022-12-01 Data processing method, apparatus, and device, and computer-readable storage medium
US20210342694A1 (en) 2021-11-04 Machine Learning Network Model Compression
CN109086819B (zh) 2023-07-25 Caffemodel model compression method, system, device, and medium
WO2024109907A1 (zh) 2024-05-30 Quantization method, recommendation method, and apparatus
US20200242467A1 (en) 2020-07-30 Calculation method and calculation device for sparse neural network, electronic device, computer readable storage medium, and computer program product
CN115953651B (zh) 2023-06-27 Cross-domain-device-based model training method, apparatus, device, and medium
CN117351299A (zh) 2024-01-05 Image generation and model training method and apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929575

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19929575

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 140622)

122 Ep: pct application non-entry in european phase

Ref document number: 19929575

Country of ref document: EP

Kind code of ref document: A1