WO2020233130A1 - Deep neural network compression method and related device - Google Patents
Deep neural network compression method and related device
- Publication number
- WO2020233130A1 (PCT/CN2019/130560)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tensor
- decomposition
- layer
- neural network
- deep neural
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- This application relates to the field of computer technology, in particular to a deep neural network compression method, device, equipment and computer readable medium.
- in the prior art, the weights of a deep convolutional neural network are tensor-decomposed, and the resulting low-rank sub-tensors replace the original weights, so that large-scale deep convolutional neural networks can be accelerated and compressed;
- another Chinese patent application, No. 201711319853.1, "Deep neural network compression method based on block-term tensor decomposition", provides a deep neural network compression method based on block-term tensor decomposition, which converts the weight matrix and input vector into high-order tensors, performs block-term tensor decomposition, replaces the fully connected layers of the deep neural network with block-term tensor layers, and trains the replaced deep neural network with the back-propagation algorithm.
- the decomposition rank in neural network compression algorithms based on tensor train decomposition is set manually through experience and parameter tuning; during training, a suitable tensor train decomposition rank must be repeatedly explored for each network layer, which costs a great deal of time and effort.
- embodiments of the present application provide a deep neural network compression method, device, equipment, and computer-readable medium based on an adaptive tensor train decomposition rank algorithm.
- this application provides a deep neural network compression method, which includes:
- one network layer at a time is selected as the selected layer in a predetermined order, and the network parameters of the remaining optional network layers are held fixed;
- performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the required precision value, and using the kernel matrices as the tensor decomposition kernel matrices of the selected layer when the precision difference of the kernel matrices meets a preset condition includes:
- the output vector y ∈ R^N is reshaped into a high-order tensor with dimensions (n_1, ..., n_d)
- the operation process of the decomposed deep neural network model is expressed as:
- performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the required precision value, and using the kernel matrices as the tensor decomposition kernel matrices of the selected layer when the precision difference of the kernel matrices meets a preset condition includes:
- the selected layer is a convolutional layer
- the convolution kernel tensor is converted into a matrix of size k^2 M × N, where F is the side length of the (square) input tensor, M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel;
- the method further includes:
- the quantization operation on the deep neural network model includes:
- the step of selecting one network layer at a time as the selected layer in a predetermined order includes:
- this application provides a deep neural network compression device based on an adaptive tensor train decomposition algorithm, the device includes:
- the acquiring unit is used to acquire the deep neural network model to be compressed
- the determining unit is configured to, when it is determined that the deep neural network model to be compressed has optional network layers, select one layer of networks as the selected layer in a predetermined order, and fix the network parameters of the remaining layers in the optional network layer unchanged;
- the tensor decomposition unit is configured to perform the tensor decomposition calculation on the selected layer, obtain multiple kernel matrices by adjusting the required precision value, and use the kernel matrices as the tensor decomposition kernel matrices of the selected layer when the precision difference of the kernel matrices meets a preset condition;
- the execution unit is used to repeatedly select the next layer of network as the selected layer to perform tensor decomposition, until all the optional network layers complete the kernel matrix decomposition to obtain a compressed deep neural network model.
- this application also provides a computer device, the device including:
- one or more processors;
- a memory for storing one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors implement the deep neural network compression method described above.
- the present application also provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the deep neural network compression method as described above is implemented.
- with the deep neural network compression method, device, equipment, and computer-readable medium based on the adaptive tensor train decomposition rank algorithm provided in this application, the parameter matrix of each layer in the deep neural network model is decomposed layer by layer during network training according to a designed network accuracy threshold; while one layer is decomposed, the other network layers are held fixed, the decomposition rank of the current layer is adjusted in order, and the network is retrained to restore accuracy. Once the accuracy threshold is reached, the current rank is taken as the decomposition rank of the selected layer. This removes the tedious and uncertain manual determination of the decomposition rank, and compressing the parameter matrix achieves the compression of the neural network model.
- FIG. 1 is a flowchart of a deep neural network compression method based on an adaptive tensor train decomposition rank algorithm provided in an embodiment of the present application;
- FIG. 2 is a schematic diagram of tensor decomposition in an adaptive tensor train decomposition rank algorithm based on an embodiment of the present application
- FIG. 3 is a structural block diagram of a deep neural network compression device based on an adaptive tensor train decomposition rank algorithm provided in an embodiment of the present application;
- FIG. 4 is a structural diagram of an embodiment of a computer device provided in an embodiment of the present application.
- FIG. 5 is an exemplary diagram of a computer device provided in an embodiment of the present application.
- in an embodiment of the deep neural network compression method provided in this application, the method includes:
- the deep neural network model to be compressed can be VGGNet, GoogLeNet, ResNet, etc., which is not limited.
- it is determined whether there are optional network layers in the neural network model to be compressed.
- the optional network layers can be decomposed layer by layer.
- the optional network layers have multiple layers.
- the network layers are selected in order, for example from the last network layer to the first; of course, they may also be selected from the first network layer to the last, which is not limited here. For the currently selected layer, the parameters of the remaining network layers remain unchanged.
- n and α are both hyperparameters.
- hyperparameters are parameters that are set before starting the learning process, rather than parameter data obtained through training.
- S104 Repeatedly selecting the next layer of network as the selected layer to perform tensor decomposition, until all the optional network layers complete the kernel matrix decomposition, and a compressed deep neural network model is obtained.
- the parameter matrix of each layer in the deep neural network model is decomposed layer by layer according to the set network accuracy threshold during network training; while a layer is decomposed, the other network layers are held fixed, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to restore accuracy.
- once the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer, which removes the tedious and uncertain manual determination of the decomposition rank.
- the parameter matrix is compressed to achieve the compression effect of the neural network model.
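- as an illustration, the following is a minimal Python sketch of this adaptive rank-selection loop; tt_svd follows the standard TT-SVD truncation rule, while evaluate, finetune, and the layer accessors are hypothetical stand-ins for the surrounding training framework, not interfaces fixed by this application:

```python
import numpy as np

def tt_svd(T, eps):
    # Standard TT-SVD: unfold T dimension by dimension and truncate each SVD
    # so the overall relative reconstruction error stays below eps (T.ndim >= 2).
    d, shape = T.ndim, T.shape
    delta = (eps / np.sqrt(d - 1)) * np.linalg.norm(T)  # per-step truncation value
    cores, r, C = [], 1, T
    for k in range(d - 1):
        C = C.reshape(r * shape[k], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        sq_tail = np.cumsum(S[::-1] ** 2)[::-1]         # energy lost if cut at rank i
        keep = np.nonzero(sq_tail > delta ** 2)[0]
        rk = int(keep[-1]) + 1 if keep.size else 1      # decomposition rank r_k
        cores.append(U[:, :rk].reshape(r, shape[k], rk))
        C, r = S[:rk, None] * Vt[:rk], rk               # carry the remainder forward
    cores.append(C.reshape(r, shape[-1], 1))
    return cores

def adaptive_tt_rank(model, layer, alpha, n, eps_schedule):
    # Decompose one selected layer while all other layers stay frozen;
    # alpha and n are the hyperparameters named in the text.
    baseline = evaluate(model)                  # network accuracy L (hypothetical helper)
    streak, cores = 0, None
    for eps in eps_schedule:                    # required precision values, large to small
        T = layer.weights().reshape(layer.tensor_shape)  # hypothetical accessors
        cores = tt_svd(T, eps)
        layer.set_tt_cores(cores)
        finetune(model, only=[layer])           # retrain to restore accuracy (hypothetical)
        delta_acc = baseline - evaluate(model)  # precision difference L - L'
        streak = streak + 1 if delta_acc <= alpha else 0
        if streak >= n:                         # stop after n consecutive passes
            break
    return cores
```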
- the tensor train decomposition schematic diagram illustrates how the tensor train decomposition algorithm compresses the parameter matrices of the fully connected and convolutional layers of the neural network.
- the principle of tensor train decomposition is to express each element of a high-dimensional tensor as a product of several matrices, that is:
- A(i_1, i_2, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d);
- G_k(i_k) is a matrix of size r_{k-1} × r_k;
- r_k denotes the ranks of the tensor train decomposition (TT-ranks); to ensure that the final result is a scalar, r_0 = r_d = 1;
- in a tensor train decomposition of a tensor A of size 5 × 4 × 5, any element of A, for example A_231, can be written as a product of three matrices;
- the decomposition ranks of the tensor train are set to (1, 3, 3, 1);
- the size of each matrix is r_{k-1} × r_k, namely 1 × 3, 3 × 3, and 3 × 1, respectively;
- the matrix taken from each G_k is determined by the element's subscript i_k, here 2, 3, and 1, respectively.
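- as a worked check of this example (random cores chosen only to show the index mechanics; the numbers are not from the application), the element A_231 is recovered as a 1 × 3 by 3 × 3 by 3 × 1 matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
shape, ranks = (5, 4, 5), (1, 3, 3, 1)

# One core G_k per dimension; G_k(i_k) is the r_{k-1} x r_k slice for index i_k.
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1])) for k in range(3)]

def tt_element(cores, idx):
    # A(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d)
    out = np.eye(1)
    for G, i in zip(cores, idx):
        out = out @ G[:, i, :]
    return out.item()            # 1 x 1 result, since r_0 = r_d = 1

# A_231 uses the subscripts 2, 3, 1 (1-based), i.e. indices (1, 2, 0) here.
print(tt_element(cores, (1, 2, 0)))
```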
- the input vector of the selected layer is reshaped into a high-dimensional tensor with dimensions (m_1, ..., m_d)
- the output vector y ∈ R^N is reshaped into a high-order tensor with dimensions (n_1, ..., n_d)
- the operation process of the decomposed deep neural network model is expressed as:
- the tensor train decomposition algorithm is used to decompose the parameter matrix into a product of matrices, which significantly reduces the number of parameters in the fully connected layer.
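- a minimal sketch of the forward pass of such a decomposed fully connected layer, assuming TT-matrix cores of shape (r_{k-1}, m_k, n_k, r_k) in the style of Novikov et al. (the core layout and the illustrative sizes below are assumptions, not values fixed by the application):

```python
import numpy as np

def tt_matvec(cores, x, m_dims):
    # cores[k]: (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1; len(x) == prod(m_dims).
    A = x.reshape((1,) + tuple(m_dims))                # (r_0, m_1, ..., m_d)
    for G in cores:
        A = np.tensordot(G, A, axes=([0, 1], [0, 1]))  # contract rank and m_k axes
        A = np.moveaxis(A, [0, 1], [-1, 0])            # -> (r_k, remaining m, n_1..n_k)
    return A[0].ravel()                                # output of length prod(n_dims)

# Illustrative sizes: a 4096 x 4096 layer with 4096 = 8^4 and assumed TT-ranks of 16.
m = n = (8, 8, 8, 8)
ranks = (1, 16, 16, 16, 1)
cores = [np.random.randn(ranks[k], m[k], n[k], ranks[k + 1]) for k in range(4)]
y = tt_matvec(cores, np.random.randn(4096), m)

dense_params = 4096 * 4096                             # 16,777,216 weights
tt_params = sum(ranks[k] * m[k] * n[k] * ranks[k + 1] for k in range(4))
print(dense_params, tt_params)                         # 16777216 vs 34816, ~482x fewer
```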
- the convolution kernel tensor is converted into a matrix of size k^2 M × N, where F is the side length of the (square) input tensor, M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel;
- the tensor train decomposition of the convolution kernel matrix based on the im2col operation can reduce the number of parameters of the convolutional layer.
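- a compact sketch of the im2col view described here (stride 1, no padding, and a channel-last layout are assumptions; the application does not fix them):

```python
import numpy as np

def conv_as_matmul(x, w):
    # x: input tensor (F, F, M), channel-last; w: kernels (k, k, M, N).
    F, _, M = x.shape
    k, _, _, N = w.shape
    Fp = F - k + 1                                   # output side length F'
    # im2col: each output position becomes one row of k*k*M input values.
    patches = np.empty((Fp * Fp, k * k * M))
    for i in range(Fp):
        for j in range(Fp):
            patches[i * Fp + j] = x[i:i + k, j:j + k, :].ravel()
    W = w.reshape(k * k * M, N)                      # the k^2 M x N kernel matrix
    y = patches @ W                                  # (F'F') x (k^2 M) times (k^2 M) x N
    return y.reshape(Fp, Fp, N)

# It is this k^2 M x N matrix W that the tensor train decomposition then factorizes.
x, w = np.random.randn(6, 6, 3), np.random.randn(3, 3, 3, 8)
print(conv_as_matmul(x, w).shape)                    # (4, 4, 8); W here is 27 x 8
```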
- the adaptive tensor train decomposition rank algorithm is used to automatically determine the decomposition ranks of the convolutional and fully connected layers; on the basis of the tensor decomposition, a weight quantization algorithm quantizes the 32-bit full-precision parameters to 8 bits, which accelerates the inference speed of the neural network.
- after repeatedly selecting the next network layer as the selected layer for tensor decomposition until all the optional network layers have completed kernel matrix decomposition and a compressed deep neural network model is obtained, the method further includes:
- quantizing the parameters of the kernel matrices addresses the problem that tensor train decomposition increases the number of network layers and makes the forward acceleration insignificant.
- applying the weight quantization technique to the kernel matrices after tensor train decomposition gives a faster forward inference speed than the existing technology.
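- a minimal sketch of this kind of weight quantization, assuming a symmetric per-tensor linear scheme (the application specifies the 32-bit to 8-bit conversion but not the exact mapping):

```python
import numpy as np

def quantize_int8(w):
    # Map 32-bit floats to int8 with a single scale max|w| / 127; inference can
    # then run in 8-bit arithmetic and dequantize the outputs afterwards.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 64).astype(np.float32)   # e.g. one decomposed kernel matrix
q, s = quantize_int8(w)
print(np.abs(dequantize(q, s) - w).max())         # worst-case rounding error, about s/2
```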
- the method includes:
- S202 Starting from the last network layer and proceeding to the first, select one network layer of the to-be-compressed deep neural network model at a time as the selected layer, and fix all network layer parameters except those of the selected layer;
- the decomposed network operation process is expressed as:
- the selected layer is a convolutional layer
- the im2col operation converts the input tensor into a matrix of size F′F′ × k^2 M, and the convolution kernel tensor into a matrix of size k^2 M × N, where F is the side length of the (square) input tensor, F′ is the side length of the output feature map, M is the number of input channels, N is the number of output channels, and k is the side length of the convolution kernel.
- the adaptive tensor train decomposition algorithm performs tensor train decomposition on the convolution kernel parameter matrix.
- S207 Perform accuracy-restoring adjustment on the deep neural network model after the quantization operation.
- for the fully connected layers of the neural network, the tensor train decomposition algorithm decomposes the parameter matrix into a product of matrices, which significantly reduces the number of parameters of the fully connected layer; for the convolutional layers, the tensor train decomposition of the convolution kernel matrix based on the im2col operation reduces the number of parameters of the convolutional layer;
- the adaptive tensor train decomposition rank algorithm is used to automatically determine the decomposition ranks of the convolutional and fully connected layers; on the basis of the tensor decomposition, a weight quantization algorithm quantizes the 32-bit full-precision parameters to 8 bits, which accelerates the inference speed of the neural network.
- the technical solution provided in this application can effectively compress and accelerate existing mainstream neural networks.
- the solution provided in this application has been tested on the fully connected layer of the VGG-16 network, and the experimental results show that the compression and acceleration effect of this application is good.
- the adaptive tensor train decomposition is applied to the fully connected layer of VGG-16 for parameter compression and preliminary experiments are carried out.
- the overall network parameter compression ratio is 3.9.
- the top-5 error rate only increased from 11.8% to 12.3%.
- the inference time before and after compression was compared on the CPU and GPU, as shown in Table 1.
- the experimental results show that the tensor train decomposition has an acceleration effect on the inference speed of the fully connected layer network. Thanks to the parallel processing of matrix operations, the acceleration effect on the GPU is more obvious.
- the present application provides a deep neural network compression device based on an adaptive tensor train decomposition algorithm, characterized in that the device includes:
- the obtaining unit 301 is used to obtain the deep neural network model to be compressed
- the determining unit 302 is configured to, when it is determined that the deep neural network model to be compressed has optional network layers, select a layer of networks as the selected layer in a predetermined order, and fix the network parameters of the remaining layers in the optional network layer unchanged;
- the tensor decomposition unit 303 is configured to perform the tensor decomposition calculation on the selected layer, obtain multiple kernel matrices by adjusting the required precision value, and use the kernel matrices as the tensor decomposition kernel matrices of the selected layer when the precision difference of the kernel matrices meets a preset condition;
- the execution unit 304 is configured to repeatedly select the next layer of network as the selected layer to perform tensor decomposition, until all the optional network layers complete the kernel matrix decomposition to obtain a compressed deep neural network model.
- with the deep neural network compression device based on the adaptive tensor train decomposition rank algorithm provided in this application, the parameter matrix of each layer in the deep neural network model is decomposed layer by layer during network training according to the set network accuracy threshold; the other network layers are fixed while a layer is decomposed, the decomposition rank of the current network layer is adjusted in order, and the network is retrained to restore accuracy. After the accuracy threshold is reached, the current rank is determined as the decomposition rank of the selected layer, which removes the tedious and uncertain manual determination of the decomposition rank; compressing the parameter matrix achieves the compression effect of the neural network model.
- Figure 4 is a structural diagram of an embodiment of a computer device of the present invention.
- the computer device of this embodiment includes: one or more processors 30, and a memory 40.
- the memory 40 is used to store one or more programs.
- when the one or more programs are executed by the one or more processors 30, the one or more processors 30 implement the deep neural network compression method based on the adaptive tensor train decomposition rank algorithm in the embodiments shown in FIGS. 1 to 2 above.
- multiple processors 30 are included as an example.
- FIG. 5 is an example diagram of a computer device provided by the present invention.
- Figure 5 shows a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention.
- the computer device 12a shown in FIG. 5 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.
- the computer device 12a is represented in the form of a general-purpose computing device.
- the components of the computer device 12a may include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a connecting different system components (including the system memory 28a and the processor 16a).
- the bus 18a represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
- these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
- the computer device 12a typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the computer device 12a, including volatile and non-volatile media, removable and non-removable media.
- the system memory 28a may include a computer system readable medium in the form of volatile memory, such as random access memory (RAM) 30a and/or cache memory 32a.
- the computer device 12a may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
- the storage system 34a can be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 5, usually referred to as a "hard drive").
- a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM or DVD-ROM), may also be provided.
- each drive can be connected to the bus 18a through one or more data media interfaces.
- the system memory 28a may include at least one program product.
- the program product has a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention shown in FIGS. 1 to 2.
- a program/utility 40a having a set of (at least one) program modules 42a may be stored in, for example, the system memory 28a.
- such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
- the program module 42a usually executes the functions and/or methods in the above-mentioned embodiments of FIG. 1 and FIG. 2 described in the present invention.
- the computer device 12a may also communicate with one or more external devices 14a (such as a keyboard, a pointing device, or a display 24a), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card or modem) that enables the computer device 12a to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 22a.
- the computer device 12a may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 20a.
- the network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
- the processor 16a executes various functional applications and data processing by running programs stored in the system memory 28a, such as implementing the deep neural network compression method based on the adaptive tensor train decomposition rank algorithm shown in the foregoing embodiment.
- the present invention also provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, it implements the deep neural network compression method based on the adaptive tensor train decomposition rank algorithm as shown in the above embodiment.
- the computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a in the embodiment shown in FIG. 5 above.
- the dissemination of computer programs is no longer limited to tangible media; they can also be downloaded directly from the network or obtained in other ways. Therefore, the computer-readable media in this embodiment may include not only tangible media but also intangible media.
- the computer-readable medium in this embodiment may adopt any combination of one or more computer-readable media.
- the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
- the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
- the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
- the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
- the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
- the computer program code used to perform the operations of the present invention can be written in one or more programming languages or a combination thereof.
- the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
- the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
- the above-mentioned software functional unit is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) or a processor execute some of the steps of the methods described in the various embodiments of the present invention.
- the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
Claims (10)
- A deep neural network compression method, characterized in that the deep neural network compression method comprises: acquiring a deep neural network model to be compressed; when it is determined that the deep neural network model to be compressed has optional network layers, selecting one network layer at a time as the selected layer in a predetermined order, and keeping the network parameters of the remaining optional network layers fixed; performing tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the required precision value, and, when the precision difference of the kernel matrices meets a preset condition, using the kernel matrices as the tensor decomposition kernel matrices of the selected layer; and repeatedly selecting the next network layer as the selected layer for tensor decomposition until all the optional network layers have completed kernel matrix decomposition, obtaining a compressed deep neural network model.
- The deep neural network compression method according to claim 1, characterized in that performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the required precision value, and using the kernel matrices as the tensor decomposition kernel matrices of the selected layer when the precision difference of the kernel matrices meets a preset condition comprises: acquiring the parameter matrix W and the network accuracy L of the selected layer, and converting the parameter matrix W into a high-dimensional tensor T; setting a required error value ε and recording it in an array e[], compressing the high-dimensional tensor T with the tensor algorithm and restoring it to obtain a tensor T′ that satisfies the error requirement; unfolding the high-dimensional tensor T into matrices dimension by dimension, and performing singular value decomposition on the matrices using the singular value decomposition truncation value δ to obtain the decomposition ranks r_k and the decomposed kernel matrices; performing accuracy adjustment on the selected layer to obtain an accuracy L′, and determining the accuracy difference Δ = L − L′, which is recorded in an array l[]; and adjusting the required precision value ε in descending order and repeating the determination of the accuracy difference until the accuracy difference Δ is not greater than α for n consecutive times, at which point the loop stops and the kernel matrices obtained are used as the tensor decomposition kernel matrices of the selected layer, where n and α are both hyperparameters.
- The deep neural network compression method according to claim 3, characterized in that performing the tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the required precision value, and using the kernel matrices as the tensor decomposition kernel matrices of the selected layer when the precision difference of the kernel matrices meets a preset condition comprises: the convolution operation after decomposition is expressed as:
- The deep neural network compression method according to claim 1, characterized in that, after repeatedly selecting the next network layer as the selected layer for tensor decomposition until all the optional network layers have completed kernel matrix decomposition and a compressed deep neural network model is obtained, the method further comprises: performing a quantization operation on the deep neural network model.
- The deep neural network compression method according to claim 5, characterized in that performing a quantization operation on the deep neural network model comprises: quantizing the 32-bit full-precision kernel matrix parameters to the 8-bit integer type.
- The deep neural network compression method according to claim 1, characterized in that selecting one network layer at a time as the selected layer in a predetermined order comprises: selecting one network layer at a time as the selected layer in order from the last network layer to the first network layer.
- A deep neural network compression device, characterized in that the device comprises: an acquiring unit for acquiring a deep neural network model to be compressed; a determining unit for, when it is determined that the deep neural network model to be compressed has optional network layers, selecting one network layer at a time as the selected layer in a predetermined order and keeping the network parameters of the remaining optional network layers fixed; a tensor decomposition unit for performing tensor decomposition calculation on the selected layer, obtaining multiple kernel matrices by adjusting the required precision value, and using the kernel matrices as the tensor decomposition kernel matrices of the selected layer when the precision difference of the kernel matrices meets a preset condition; and an execution unit for repeatedly selecting the next network layer as the selected layer for tensor decomposition until all the optional network layers have completed kernel matrix decomposition, obtaining a compressed deep neural network model.
- A computer device, characterized in that the device comprises: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 7.
- A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910435515.7 | 2019-05-23 | ||
CN201910435515.7A CN110263913A (zh) | 2019-05-23 | 2019-05-23 | Deep neural network compression method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020233130A1 true WO2020233130A1 (zh) | 2020-11-26 |
Family
ID=67915263
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/130560 WO2020233130A1 (zh) | 2019-05-23 | 2019-12-31 | 一种深度神经网络压缩方法及相关设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110263913A (zh) |
WO (1) | WO2020233130A1 (zh) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN110263913A (zh) * | 2019-05-23 | 2019-09-20 | Shenzhen Institutes of Advanced Technology | Deep neural network compression method and related device |
- WO2021077283A1 (zh) * | 2019-10-22 | 2021-04-29 | Shenzhen Kunyun Information Technology Co., Ltd. | Neural network computation compression method and system, and storage medium |
- CN110852424B (zh) * | 2019-11-15 | 2023-07-25 | Guangdong University of Technology | Processing method and device for generative adversarial networks |
- KR20210136123A (ko) * | 2019-11-22 | 2021-11-16 | Tencent America LLC | Method and apparatus for quantization, adaptive block partitioning and codebook coding for neural network model compression |
- CN111210017B (zh) * | 2019-12-24 | 2023-09-26 | Beijing Megvii Technology Co., Ltd. | Method, device, equipment, and storage medium for determining layout order and processing data |
- CN113326930B (zh) * | 2020-02-29 | 2024-05-03 | Huawei Technologies Co., Ltd. | Data processing method, neural network training method, and related devices and equipment |
- CN111401282B (zh) * | 2020-03-23 | 2024-10-01 | Shanghai Eye Control Technology Co., Ltd. | Object detection method and device, computer equipment, and storage medium |
- CN113537485B (zh) * | 2020-04-15 | 2024-09-06 | Beijing Kingsoft Digital Entertainment Technology Co., Ltd. | Neural network model compression method and device |
- WO2021234967A1 (ja) * | 2020-05-22 | 2021-11-25 | Nippon Telegraph and Telephone Corporation | Speech waveform generation model training device, speech synthesis device, methods therefor, and program |
- CN111898484A (zh) * | 2020-07-14 | 2020-11-06 | Huazhong University of Science and Technology | Method and device for generating a model, readable storage medium, and electronic device |
US11275671B2 (en) | 2020-07-27 | 2022-03-15 | Huawei Technologies Co., Ltd. | Systems, methods and media for dynamically shaped tensors using liquid types |
- CN112541159A (zh) * | 2020-09-30 | 2021-03-23 | Huawei Technologies Co., Ltd. | Model training method and related equipment |
- CN112184557A (zh) * | 2020-11-04 | 2021-01-05 | Shanghai Xielv Information Technology Co., Ltd. | Super-resolution network model compression method, system, equipment, and medium |
- WO2022141189A1 (zh) * | 2020-12-30 | 2022-07-07 | Southern University of Science and Technology | Automatic search method and device for the precision and decomposition rank of a recurrent neural network |
- CN114692816B (zh) * | 2020-12-31 | 2023-08-25 | Huawei Technologies Co., Ltd. | Processing method and device for neural network models |
US20230106213A1 (en) * | 2021-10-05 | 2023-04-06 | Samsung Electronics Co., Ltd. | Machine learning model compression using weighted low-rank factorization |
- CN116187401B (zh) * | 2023-04-26 | 2023-07-14 | Capital Normal University | Neural network compression method and device, electronic device, and storage medium |
-
2019
- 2019-05-23 CN CN201910435515.7A patent/CN110263913A/zh active Pending
- 2019-12-31 WO PCT/CN2019/130560 patent/WO2020233130A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3293682A1 (en) * | 2016-09-13 | 2018-03-14 | Alcatel Lucent | Method and device for analyzing sensor data |
- CN107480770A (zh) * | 2017-07-27 | 2017-12-15 | Institute of Automation, Chinese Academy of Sciences | Method and device for neural network quantization and compression with adjustable quantization bit width |
- CN107944556A (zh) * | 2017-12-12 | 2018-04-20 | University of Electronic Science and Technology of China | Deep neural network compression method based on block-term tensor decomposition |
- CN109766995A (zh) * | 2018-12-28 | 2019-05-17 | Zhongxiang Boqian Information Technology Co., Ltd. | Deep neural network compression method and device |
- CN110263913A (zh) * | 2019-05-23 | 2019-09-20 | Shenzhen Institutes of Advanced Technology | Deep neural network compression method and related device |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11657284B2 (en) | 2019-05-16 | 2023-05-23 | Samsung Electronics Co., Ltd. | Neural network model apparatus and compressing method of neural network model |
EP4241206A4 (en) * | 2020-12-01 | 2024-01-03 | Huawei Technologies Co., Ltd. | DEVICE AND METHOD FOR IMPLEMENTING A TENSOR STREAM DECOMPOSITION OPERATION |
- CN114691627A (zh) * | 2020-12-30 | 2022-07-01 | Industrial Technology Research Institute | Data compression method, data compression system, and operation method for a deep learning acceleration chip |
- CN112990454A (zh) * | 2021-02-01 | 2021-06-18 | State Grid Anhui Electric Power Co., Ltd. Maintenance Branch | Neural network computation acceleration method and device based on an integrated multi-core heterogeneous DPU |
- CN112990454B (zh) * | 2021-02-01 | 2024-04-16 | State Grid Anhui Electric Power Co., Ltd. Ultra-High Voltage Branch | Neural network computation acceleration method and device based on an integrated multi-core heterogeneous DPU |
- CN113673694A (zh) * | 2021-05-26 | 2021-11-19 | Alibaba Singapore Holding Pte. Ltd. | Data processing method and device, electronic device, and computer-readable storage medium |
- WO2023125838A1 (zh) * | 2021-12-30 | 2023-07-06 | Shenzhen Intellifusion Technologies Co., Ltd. | Data processing method and device, terminal device, and computer-readable storage medium |
- CN114781650A (zh) * | 2022-04-28 | 2022-07-22 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Data processing method, device, equipment, and storage medium |
- CN114781650B (zh) * | 2022-04-28 | 2024-02-27 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Data processing method, device, equipment, and storage medium |
WO2024159541A1 (en) * | 2023-02-03 | 2024-08-08 | Huawei Technologies Co., Ltd. | Systems and methods for compression of deep learning model using reinforcement learning for low rank decomposition |
- CN116167431A (zh) * | 2023-04-25 | 2023-05-26 | Zhejiang Lab | Service processing method and device based on mixed-precision model acceleration |
- CN117540780A (zh) * | 2024-01-09 | 2024-02-09 | Tencent Technology (Shenzhen) Co., Ltd. | Neural network model compression method and related device |
- CN117973485A (zh) * | 2024-03-29 | 2024-05-03 | Suzhou Yuannao Intelligent Technology Co., Ltd. | Model lightweighting method, device, computer equipment, storage medium, and program product |
- CN118643884A (zh) * | 2024-08-12 | 2024-09-13 | Chengdu Qiying Tailun Technology Co., Ltd. | On-device deep neural network model compression method based on fine-tuning training |
Also Published As
Publication number | Publication date |
---|---|
CN110263913A (zh) | 2019-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2020233130A1 (zh) | Deep neural network compression method and related device | |
US11030522B2 (en) | Reducing the size of a neural network through reduction of the weight matrices | |
- KR102434726B1 (ko) | Processing method and device | |
- WO2022105117A1 (zh) | Image quality assessment method and device, computer equipment, and storage medium | |
- CN111488985A (zh) | Deep neural network model compression training method, device, equipment, and medium | |
- CN110830807B (zh) | Image compression method, device, and storage medium | |
- CN114374440B (zh) | Method and device for estimating the classical capacity of a quantum channel, electronic device, and medium | |
- WO2023138188A1 (zh) | Feature fusion model training and sample retrieval method and device, and computer equipment | |
- WO2020207174A1 (zh) | Method and device for generating a quantized neural network | |
- WO2023231954A1 (zh) | Data denoising method and related equipment | |
- CN110751265A (zh) | Lightweight neural network construction method and system, and electronic device | |
- WO2023207039A1 (zh) | Data processing method, device, equipment, and storage medium | |
US11531695B2 (en) | Multiscale quantization for fast similarity search | |
- JP7408741B2 (ja) | Multi-task deployment method and device, electronic device, and storage medium | |
- WO2024051655A1 (zh) | Processing method and device for whole-field histology images, medium, and electronic device | |
- WO2021012691A1 (zh) | Method and device for retrieving images | |
- JP2020008836A (ja) | Vocabulary table selection method, device, and computer-readable storage medium | |
- CN113554149B (zh) | Neural network processing unit (NPU), neural network processing method, and device therefor | |
- WO2022246986A1 (zh) | Data processing method, device, equipment, and computer-readable storage medium | |
US20210342694A1 (en) | Machine Learning Network Model Compression | |
- CN109086819B (zh) | Caffemodel model compression method, system, equipment, and medium | |
- WO2024109907A1 (zh) | Quantization method, recommendation method, and device | |
US20200242467A1 (en) | Calculation method and calculation device for sparse neural network, electronic device, computer readable storage medium, and computer program product | |
- CN115953651B (zh) | Model training method and device based on cross-domain devices, equipment, and medium | |
- CN117351299A (zh) | Image generation and model training method, device, equipment, and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19929575 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19929575 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 140622) |
|