CN113850365A

CN113850365A - Method, device, equipment and storage medium for compressing and transplanting convolutional neural network

Info

Publication number: CN113850365A
Application number: CN202110858896.7A
Authority: CN
Inventors: 章金龙; 李合青; 陈小彪; 李建超; 孙璆琛
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2021-12-28

Abstract

The application discloses a method, a device, equipment and a storage medium for compressing and transplanting a convolutional neural network. The compression method of the convolutional neural network comprises the following steps: acquiring an initial convolutional neural network to be compressed, and adding a sparse convolutional layer after each convolutional layer to be compressed in the initial convolutional neural network; acquiring a weight parameter of each sparse convolutional layer, and pruning and decomposing each convolutional layer to be compressed based on the weight parameter to obtain a sparse convolutional neural network; and training the sparse convolutional neural network, and taking the trained convolutional neural network as a compressed convolutional neural network. By the scheme, the model parameters can be effectively reduced.

Description

Method, device, equipment and storage medium for compressing and transplanting convolutional neural network

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for compressing and transplanting a convolutional neural network.

Background

Over the past few years, convolutional neural networks have achieved state-of-the-art performance in a variety of computer vision tasks. However, recent advances in this area rely on millions of parameters and a heavy computational burden, which makes it impractical to deploy such networks on edge devices and mobile devices.

Therefore, a solution is needed that compresses the neural network model and facilitates migrating it to, and running it on, mobile devices, so that the size of the original model is reduced without reducing its accuracy. Current pruning-based and decomposition-based approaches can both compress a model while preserving accuracy, but each has its own limitations. Pruning takes effect only on the output feature channels: the pruned convolution kernel parameters no longer participate in convolution, and the pruning operation may prevent some key features from being passed on to later layers, which affects network accuracy. Convolution kernel decomposition introduces an additional 1 × 1 convolutional layer, which adds extra computation on the GPU.

Disclosure of Invention

The technical problem mainly solved by the application is to provide a method, a device, equipment and a storage medium for compressing and transplanting a convolutional neural network, which can effectively reduce model parameters.

In order to solve the above problem, a first aspect of the present application provides a method for compressing a convolutional neural network, including: acquiring an initial convolutional neural network to be compressed, and adding a sparse convolutional layer after each convolutional layer to be compressed in the initial convolutional neural network; acquiring a weight parameter of each sparse convolutional layer, and pruning and decomposing each convolutional layer to be compressed based on the weight parameter to obtain a sparse convolutional neural network; and training the sparse convolutional neural network, and taking the trained convolutional neural network as a compressed convolutional neural network.

In order to solve the above problem, a second aspect of the present application provides a migration method of a convolutional neural network, including: acquiring an initial convolutional neural network in first equipment; compressing the initial convolutional neural network model by using a convolutional neural network compression method to obtain a compressed convolutional neural network; transplanting the compressed convolutional neural network into a second device; wherein the method for compressing the convolutional neural network comprises the method for compressing the convolutional neural network of the first aspect.

To solve the above problem, a third aspect of the present application provides a compressing apparatus of a convolutional neural network, including: the acquisition module is used for acquiring an initial convolutional neural network to be compressed and adding a sparse convolutional layer after each convolutional layer to be compressed in the initial convolutional neural network; the processing module is used for acquiring weight parameters of each sparse convolutional layer, and pruning and decomposing each convolutional layer to be compressed based on the weight parameters to obtain a sparse convolutional neural network; and the training module is used for training the thinned convolutional neural network and taking the trained convolutional neural network as a compressed convolutional neural network.

In order to solve the above problem, a fourth aspect of the present application provides an electronic device, where the electronic device includes a processor and a memory connected to each other; the memory is configured to store program instructions, and the processor is configured to execute the program instructions to implement the method for compressing the convolutional neural network of the first aspect or the method for transplanting the convolutional neural network of the second aspect.

In order to solve the above-mentioned problems, a fifth aspect of the present application provides a computer-readable storage medium on which program instructions are stored, the program instructions, when executed by a processor, implementing the above-mentioned compression method of the convolutional neural network of the first aspect, or the above-mentioned migration method of the convolutional neural network of the second aspect.

The beneficial effects of the application are as follows. Unlike the prior art, a sparse convolutional layer is added after each convolutional layer to be compressed in the convolutional neural network, and each convolutional layer to be compressed is pruned and decomposed based on the weight parameters of each sparse convolutional layer, yielding a sparse convolutional neural network. The sparse convolutional layer fuses convolution kernel pruning and convolution kernel decomposition so that both are carried out simultaneously; the model can thus be compressed, its parameters reduced, and its computation accelerated.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a convolutional neural network compression method of the present application;

FIG. 2 is a flowchart illustrating an embodiment of step S12 in FIG. 1;

FIG. 3 is a schematic diagram illustrating convolution kernel pruning and decomposition integration in the compression method of the convolutional neural network according to the present application;

FIG. 4 is a schematic diagram illustrating a workflow of group sparse matrix in an application scenario of the present application;

FIG. 5 is a schematic diagram illustrating a selection of a convolution kernel ordering in an application scenario of the present application;

FIG. 6 is a schematic flow chart diagram illustrating an embodiment of a convolutional neural network migration method of the present application;

FIG. 7 is a schematic structural diagram of an embodiment of a convolutional neural network compression apparatus according to the present application;

FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of the present application;

FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.

Detailed Description

The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.

In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.

The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; e.g., "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship. Further, the term "plurality" herein means two or more.

Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a compression method of a convolutional neural network according to an embodiment of the present invention. The compression method of the convolutional neural network in the embodiment includes the following steps:

S11: Obtain an initial convolutional neural network to be compressed, and add a sparse convolutional layer after each convolutional layer to be compressed in the initial convolutional neural network.

In the application, for each convolution layer to be compressed, pruning and decomposition are required to be performed on convolution kernels in the convolution layer, a sparse convolution layer is added after each convolution layer to be compressed, the sparse convolution layer is utilized to perform sparsification on the convolution layer to be compressed, and output accuracy is guaranteed.

S12: Acquire a weight parameter of each sparse convolutional layer, and prune and decompose each convolutional layer to be compressed based on the weight parameter to obtain a sparse convolutional neural network.

Pruning removes redundant parameters from the network, thereby reducing the number of parameters and the unnecessary computation. Specifically, the pruning object may be a single convolution kernel, so that the number of channels of each convolutional layer of the neural network is reduced. Decomposition factorizes the original matrix into several matrices of simpler form and smaller size, and the parameters of these small matrices replace the original matrix, reducing storage space and computation. For example, suppose the input is a one-dimensional picture x ∈ R^{m×1} and the convolution kernels are W = {w_1, …, w_n} ∈ R^{m×n}. A pruning-based method removes some of the convolution kernels and uses x^T C as the convolution output in place of the original x^T W, where C ∈ R^{m×k} contains the convolution kernels remaining after pruning and there are k output channels. A method based on convolution kernel decomposition uses convolution kernels A ∈ R^{m×k} and B ∈ R^{m×k} to replace the original convolution kernel W, while ensuring that the rank of the convolution kernel matrix A × B equals the rank of W. These examples show that convolution kernel pruning and decomposition are complementary; fusing them overcomes the shortcomings of either method alone and increases the degree of compression of the convolutional neural network.
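The one-dimensional example above can be sketched numerically as follows. This is only an illustration under stated assumptions: the sizes m, n and k are arbitrary, pruning simply keeps the first k kernels, the decomposition is obtained with a truncated SVD (a technique chosen for the sketch, not taken from the application), and B is given shape k × n so that the product of A and B has the same shape as W.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 4, 2                    # input length, original kernels, kept kernels / rank

x = rng.standard_normal((m, 1))      # one-dimensional input, x in R^{m x 1}
W = rng.standard_normal((m, n))      # n convolution kernels stored as columns

# Pruning: keep k of the n kernels (here simply the first k columns).
C = W[:, :k]                         # C in R^{m x k}
pruned_out = x.T @ C                 # k output channels instead of n

# Decomposition: replace W by a rank-k product A @ B (truncated SVD).
# B has shape (k, n) so that A @ B matches the shape of W; this shape is an
# assumption of the sketch.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]                 # A in R^{m x k}
B = Vt[:k, :]                        # B in R^{k x n}
decomposed_out = (x.T @ A) @ B       # still n output channels

params_original = W.size             # m * n
params_pruned = C.size               # m * k
params_decomposed = A.size + B.size  # m * k + k * n
```

Pruning shrinks the output width, while decomposition keeps the output width and shrinks the inner dimension, which is why the two are described as complementary.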

Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of step S12 in fig. 1. In an embodiment, the step S12 specifically includes:

S121: Obtain a group sparse matrix corresponding to each sparse convolutional layer according to the weight parameter of each sparse convolutional layer.

S122: For a given convolution kernel, multiply the convolution kernel by the row-reduced matrix of the group sparse matrix to obtain a new convolution kernel, and, using the new convolution kernel and the column-reduced matrix of the group sparse matrix, convert the single heavy convolution kernel into a lightweight convolution kernel and a 1 × 1 convolution kernel to obtain the sparse convolutional neural network.

Referring to fig. 3, fig. 3 is a schematic diagram illustrating the integration of convolution kernel pruning and decomposition in the compression method of the convolutional neural network of the present application. In the convolution operation shown in fig. 3, the link between pruning and decomposition is the group sparse matrix: the sparsity of its rows and columns allows the convolution kernel matrix to be pruned and decomposed at the same time. Specifically, the row-reduced matrix of the group sparse matrix A is multiplied with the convolution kernel to obtain a new convolution kernel W^c; this reduces the output dimension and the parameters of the convolution kernel and is a pruning operation. Meanwhile, if a group sparsity constraint is applied to the rows of A, the internal channels of the matrix product W × A, namely the output channels of W and the input channels of A, can be reduced; to save computation, the reduced matrices A^r and W^r convert the single heavy convolution kernel W into a lightweight convolution kernel and a 1 × 1 convolution kernel, which is a convolution kernel decomposition operation. In the figure, the convolution kernel is a two-dimensional matrix W ∈ R^{features×outputs} and the group sparse matrix is A ∈ R^{n×n}. The group sparse matrix increases the sparsity of the convolution kernels: a group sparsity constraint is applied to the columns of the group sparse matrix, and the optimal solution of the group sparse matrix is obtained during model training. The sparse convolutional neural network is then obtained from this optimal solution.

Specifically, the sparse convolutional layer is a convolutional layer with a 1 × 1 convolution kernel. At the start of training, the number of output channels of the sparse convolutional layer equals the number of output channels of the corresponding convolutional layer to be compressed. During training, a group sparsity constraint is applied to the matrix of the sparse convolutional layer, so that its number of output channels becomes smaller than that of the convolutional layer to be compressed; through continued iterative training the number of output channels of the convolutional layer is reduced, and the convolutional layer is thereby sparsified.
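The effect of zeroed columns and zeroed rows in the matrix of the sparse 1 × 1 layer can be sketched as follows. The sizes, the identity initialization of A, and the choice of which columns and rows are zero are assumptions made for illustration; in the described method the zeros would result from training under the group sparsity constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
N, cwh, n = 5, 18, 4                  # samples, kernel size c*w*h, output channels

X = rng.standard_normal((N, cwh))     # unfolded input features
W = rng.standard_normal((cwh, n))     # kernels of the layer to be compressed
A = np.eye(n)                         # sparse 1 x 1 layer, identity at the start (assumed)

# With an identity A, appending the sparse layer leaves the output unchanged.
assert np.allclose(X @ W @ A, X @ W)

# Case 1: column 3 of A has been driven to zero, so output channel 3 is dead
# and can be pruned together with its output feature.
A_col = rng.standard_normal((n, n))
A_col[:, 3] = 0.0
keep_cols = np.linalg.norm(A_col, axis=0) > 1e-8
W_pruned = W @ A_col[:, keep_cols]    # merged kernel with n - 1 output channels

# Case 2: rows 1 and 2 of A have been driven to zero, so the matching output
# channels of W never reach the next layer and can be dropped.
A_row = rng.standard_normal((n, n))
A_row[[1, 2], :] = 0.0
keep_rows = np.linalg.norm(A_row, axis=1) > 1e-8
W_light = W[:, keep_rows]             # lightweight kernel, cwh x 2
A_light = A_row[keep_rows, :]         # remaining 1 x 1 kernel, 2 x n
```

In the second case the product of the two smaller matrices reproduces the output of the original pair exactly, since the zero rows of A contribute nothing to W × A.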

S13: Train the sparse convolutional neural network, and take the trained convolutional neural network as the compressed convolutional neural network.

After the sparse convolutional neural network is obtained, the sparse convolutional neural network is continuously trained, so that the purpose of reducing convolutional kernel parameters is achieved while the output precision of the finally obtained compressed convolutional neural network is guaranteed.

According to the scheme, a layer of sparse convolutional layer is added behind each convolutional layer to be compressed in the convolutional neural network, pruning and decomposition processing are carried out on each convolutional layer to be compressed based on the weight parameters of each sparse convolutional layer, and the convolutional neural network after sparsification can be obtained. The convolution kernel pruning and the convolution kernel decomposition are fused by the sparse convolution layer, so that the pruning and the decomposition are simultaneously carried out, the model can be compressed, the model parameters are reduced, and the model calculation is accelerated.

Specifically, {x, y} denotes the input and output of the network. Without loss of generality, x also denotes the input feature of a convolutional layer, z denotes the output feature of that convolutional layer, and W denotes its convolution kernels. The convolution between the input feature x and the convolution kernel W can then be converted into a matrix multiplication, i.e.:

Z＝X*W (1)

where X ∈ R^{N×cwh}, W ∈ R^{cwh×n} and Z ∈ R^{N×n}. The parameters c, n, w, h and N respectively denote the number of input channels, the number of convolution kernels (the number of output channels), the width and height of the input feature map, and the number of input pictures. Referring to fig. 4, fig. 4 is a schematic diagram of the workflow of the group sparse matrix in an application scenario of the present application. As shown in the figure, if the columns of the group sparse matrix A are regularized, the invalid convolution kernels and the corresponding output features are deleted; if the rows of the group sparse matrix A are regularized, part of the row weights in the convolution kernels are set to zero, so that convolution kernels and feature maps in the previous layer are deleted. The compression method of the convolutional neural network not only selects convolution kernels within a convolutional layer for compression, but also linearly combines the convolution kernels so as to minimize the error between the original convolution kernel and the decomposed convolution kernel. In addition, when the sparsity problem is optimized, the weights of the original convolution kernel W do not need to change much, so that the accuracy of the original network model is preserved.
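The conversion of a convolution into the matrix product of formula (1) can be checked with a small sketch. It assumes, for simplicity, that each kernel covers one whole c × h × w input block, so that every input picture contributes exactly one row of X; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, c, w, h, n = 3, 2, 3, 3, 4

inputs = rng.standard_normal((N, c, h, w))    # N input blocks
kernels = rng.standard_normal((n, c, h, w))   # n kernels, same spatial size as a block

# Direct evaluation: every kernel is correlated with every block, one value each.
Z_direct = np.einsum("bchw,ochw->bo", inputs, kernels)

# Matrix form of formula (1): flatten blocks into rows and kernels into columns.
X = inputs.reshape(N, c * h * w)              # X in R^{N x cwh}
W = kernels.reshape(n, c * h * w).T           # W in R^{cwh x n}
Z = X @ W                                     # Z in R^{N x n}
```

Both evaluations give the same N × n output, which is what allows the group sparse matrix to be applied as a plain matrix factor on the right of W.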

In an embodiment, the step S13 may specifically include: adding relevant regularization terms to the loss function of the convolutional neural network, and updating the parameters of the convolutional neural network according to the loss function.

It is understood that, during training, the parameters of the convolutional neural network may be updated according to its loss function. When overfitting occurs in the training stage, the training error of the model is small but the test error is large; that is, the model is complex enough to fit all the training data but performs poorly on new data. A relevant regularization term is therefore added to the loss function of the convolutional neural network to avoid overfitting.

In an embodiment, the relevant regularization term includes a weight decay regularization term. The weight decay regularization term is L2-norm regularization: adding an L2-norm regularization term to the loss function makes the learned model parameters smaller in value, where the L2-norm regularization term is the product of the sum of squares of the elements of the model weight parameters and a positive constant.
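The weight decay term described here can be written in a few lines. The function names and the example weights are hypothetical; the gradient helper only illustrates why the term pulls weights toward smaller values.

```python
import numpy as np

def weight_decay(weights, alpha):
    """L2 term: alpha times the sum of squared entries over all weight arrays."""
    return alpha * sum(float(np.sum(w ** 2)) for w in weights)

def weight_decay_grad(w, alpha):
    """Gradient of the L2 term with respect to one weight array."""
    return 2.0 * alpha * w

weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([3.0])]
penalty = weight_decay(weights, alpha=0.01)      # 0.01 * (5.25 + 9.0)
grad_first = weight_decay_grad(weights[0], alpha=0.01)
```

Because the gradient is proportional to the weight itself, each update step shrinks every weight by a small fraction of its current value.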

In an embodiment, the relevant regularization term comprises a sparse regularization term. In the training phase, the parameters of the group sparse matrix are constrained and the unimportant parameters are constrained to 0, giving a constrained group sparse matrix; this matrix can be used to constrain the convolution kernel parameters of the preceding convolutional layer, thereby achieving structured pruning of the convolution kernels. The loss function of a convolutional neural network can be expressed as:

Loss_Total＝L(y,f(x,W))+αD(W)+βR(W) (2)

where D(W) and R(W) are the weight decay and sparsity regularization terms, respectively, and α and β are regularization factors. A group sparse matrix A ∈ R^{n×n} is introduced to add the group sparsity constraint; the rows and columns of A are denoted A_i and A_j, respectively, and the matrix is implemented as a 1 × 1 convolution kernel placed after the original convolutional layer. The convolution of formula (1) then becomes Z = X*(W*A), and group sparsity regularization is applied to the group sparse matrix A, so that the loss function of formula (2) becomes:

loss₁＝L(y,f(x,W,A))+αD(W)+βR(A) (3)

The group sparse matrix A is optimized during training and is used to prune and decompose the convolution kernels at the same time, so that the parameters of the resulting sparse convolutional neural network are reduced and the model is compressed.
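A loss of the form of formula (3) can be sketched as follows. The text does not specify the form of L or R, so the sketch assumes a mean squared error for L, the sum of squares for D(W), and a group lasso penalty (the sum of the l2 norms of the columns of A) for R(A). The shrinkage helper is the standard proximal step for that penalty and is likewise an assumption, included only to show how whole columns can reach exactly zero.

```python
import numpy as np

def group_sparsity(A, axis):
    """Sum of the l2 norms of the groups of A.

    axis=0 treats each column as a group, axis=1 treats each row as a group.
    """
    return float(np.sum(np.linalg.norm(A, axis=axis)))

def loss_1(X, y, W, A, alpha, beta):
    """Data term plus alpha * D(W) plus beta * R(A), as in formula (3)."""
    Z = X @ (W @ A)
    data_term = float(np.mean((Z - y) ** 2))   # L(y, f(x, W, A)), MSE assumed
    D = float(np.sum(W ** 2))                  # weight decay term
    R = group_sparsity(A, axis=0)              # column-wise group sparsity, assumed
    return data_term + alpha * D + beta * R

def shrink_columns(A, step):
    """Shrink every column norm by `step`; columns with norm <= step become zero."""
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - step / np.maximum(norms, 1e-12))
    return A * scale

A = np.array([[3.0, 0.1], [4.0, 0.0]])         # column norms 5.0 and 0.1
shrunk = shrink_columns(A, step=0.5)           # second column is zeroed
example_loss = loss_1(np.eye(2), A, np.eye(2), A, alpha=0.1, beta=1.0)
```

Columns that become exactly zero mark the output channels that can be removed, which is the structured pruning effect the sparse regularization term is meant to produce.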

In an embodiment, the relevant regularization term further includes a convolution kernel ordering regularization term. Specifically, the step of adding a relevant regularization term to the loss function of the convolutional neural network and updating the parameters of the convolutional neural network according to the loss function may include: calculating the matrix 2-norm of each convolution kernel in each convolutional layer, and sorting the convolution kernels by the value of the matrix 2-norm; obtaining a convolution kernel screening threshold according to the sorted order of the convolution kernels of all convolutional layers; and removing convolution kernels in each convolutional layer according to the screening threshold to obtain an updated convolutional neural network.

A convolution kernel ordering regularization term is added to the loss function of formula (3). This term expresses the influence of each convolution kernel on the convolutional neural network: the convolution kernels are sorted by the value of their matrix 2-norm, and the kernels with little influence on the network output are then deleted, which reduces the network parameters and compresses the model.

Referring to fig. 5, fig. 5 is a schematic diagram illustrating the selection of convolution kernels by ordering in an application scenario of the present application. The correlation of the convolution kernels in the convolutional layers is expressed by the matrix 2-norm: the matrix 2-norm is computed for each convolution kernel in each convolutional layer, and the kernels are sorted by this value. While ensuring the output accuracy of the sparse convolutional neural network, the selection criterion for the convolution kernels of the current layer can be determined, namely the convolution kernel screening threshold I:

where i denotes the i-th convolutional layer. Convolution kernels whose matrix 2-norm is smaller than the convolution kernel screening threshold I can therefore be removed to obtain the updated convolutional neural network.
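Ranking and screening kernels by their matrix 2-norm can be sketched as follows. The expression for the screening threshold is not reproduced in this text, so the threshold value below is a hypothetical constant, and viewing each kernel as a c × (h·w) matrix before taking the spectral norm is also an assumption of the sketch.

```python
import numpy as np

def kernel_norms(kernels):
    """Spectral (matrix 2-) norm of each kernel, viewed as a (c, h*w) matrix."""
    n, c, h, w = kernels.shape
    return np.array([np.linalg.norm(k.reshape(c, h * w), 2) for k in kernels])

def prune_by_threshold(kernels, threshold):
    """Return the kept kernels, the ranking (largest norm first) and the norms."""
    norms = kernel_norms(kernels)
    order = np.argsort(norms)[::-1]
    keep = norms >= threshold
    return kernels[keep], order, norms

# Three kernels with one input channel and a 2 x 2 window.
kernels = np.zeros((3, 1, 2, 2))
kernels[0, 0] = [[3.0, 4.0], [0.0, 0.0]]   # norm 5.0
kernels[1, 0] = [[0.1, 0.0], [0.0, 0.0]]   # norm 0.1
kernels[2, 0] = [[1.0, 0.0], [0.0, 0.0]]   # norm 1.0

kept, order, norms = prune_by_threshold(kernels, threshold=0.5)  # hypothetical threshold
```

With this threshold the kernel with norm 0.1 is removed and the other two are kept; in the described method the threshold would instead come from the learned quantities of the layer.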

Further, the step of adding a relevant regularization term to the loss function of the convolutional neural network and updating the parameter of the convolutional neural network according to the loss function of the convolutional neural network may specifically include: adding a weight attenuation regular term, a sparse regular term and a convolution kernel sorting regular term into a loss function of the convolution neural network to obtain a total loss function; and training according to the total loss function, and determining the optimal solution of the weight parameters of each sparse convolution layer to obtain the sparse convolution neural network.

It can be understood that, during training of the network model, a global 2-norm loss function loss₂ can be established for learning the values of γ and k:

Learning the values of γ and k can be treated as a network model optimization problem; therefore, the total loss function of the convolutional neural network is:

loss_total＝loss₁+loss₂ (6)

The optimal values of γ and k can thus be obtained through learning, and the convolution kernel screening threshold I is calculated from them. The convolution kernels of each convolutional layer are then removed accordingly, and the network is trained again to adapt to the updated model, so that the convolution kernel parameters are reduced while accuracy is maintained.

In this scheme, the group sparse matrix is used to perform convolution kernel pruning and convolution kernel decomposition, so that both operations are completed with a single matrix product. In addition, the matrix 2-norm of the convolution kernels of each convolutional layer is computed during training to obtain a convolution kernel screening threshold, according to which convolution kernels are removed. The method as a whole effectively reduces the model parameters, lowers the memory the model occupies on hardware devices, and speeds up forward inference.

Referring to fig. 6, fig. 6 is a flowchart illustrating an embodiment of a convolutional neural network transplanting method according to the present application. The transplantation method of the convolutional neural network in the embodiment comprises the following steps:

s61: an initial convolutional neural network in a first device is obtained.

S62: and compressing the initial convolutional neural network by using a convolutional neural network compression method to obtain a compressed convolutional neural network. The compression method of the convolutional neural network comprises any one of the compression methods of the convolutional neural network.

S63: and transplanting the compressed convolutional neural network into a second device.

It can be understood that the first device has a larger storage space and a faster operation speed, and the initial convolutional neural network can be an image recognition network, so that a user can use abundant data resources in the first device to realize image recognition through a complex initial convolutional neural network. However, the initial convolutional neural network has higher requirements on the storage function and the operation function of hardware, and a user cannot directly transplant the initial convolutional neural network in the first device to a lightweight second device for image recognition, where the second device may be a mobile phone, an embedded device, or the like. Therefore, any one of the above-mentioned compression methods of the convolutional neural network can be used to compress the initial convolutional neural network in the first device, on the premise of ensuring the network accuracy, the redundant parameters are released, unnecessary operations are eliminated, and the compressed convolutional neural network is obtained, so that the compressed convolutional neural network can be transplanted to the second device and applied, power consumption, space and time resources can be saved, the application of the convolutional neural network to the lightweight second device is promoted, and various computer vision tasks based on the convolutional neural network can be closer to daily life without being limited to the high-performance first device.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a compression apparatus of a convolutional neural network according to an embodiment of the present application. The compressing apparatus 60 of the convolutional neural network in this embodiment includes an obtaining module 600, a processing module 602, and a training module 604, which are connected to each other; the obtaining module 600 is configured to obtain an initial convolutional neural network to be compressed, and add a sparse convolutional layer after each convolutional layer to be compressed in the initial convolutional neural network; the processing module 602 is configured to obtain a weight parameter of each sparse convolutional layer, and perform pruning and decomposition processing on each convolutional layer to be compressed based on the weight parameter to obtain a sparse convolutional neural network; the training module 604 is configured to train the sparse convolutional neural network, and use the trained convolutional neural network as a compressed convolutional neural network.

In one embodiment, the sparse convolutional layer is a convolutional layer with a 1 × 1 convolution kernel; the processing module 602 performs the step of obtaining a weight parameter of each sparse convolutional layer and pruning and decomposing each convolutional layer to be compressed based on the weight parameter to obtain a sparse convolutional neural network, including: obtaining a group sparse matrix corresponding to each sparse convolutional layer according to the weight parameters of each sparse convolutional layer; and, for a given convolution kernel, multiplying the convolution kernel by the row-reduced matrix of the group sparse matrix to obtain a new convolution kernel, and, using the new convolution kernel and the column-reduced matrix of the group sparse matrix, converting the single heavy convolution kernel W into a lightweight convolution kernel and a 1 × 1 convolution kernel to obtain the sparse convolutional neural network.

In an embodiment, the training module 604 performs the step of training the sparse convolutional neural network and taking the trained convolutional neural network as the compressed convolutional neural network, including: adding relevant regularization terms to the loss function of the convolutional neural network, and updating the parameters of the convolutional neural network according to the loss function.

In an embodiment, the correlation regularization term includes a weight decay regularization term.

In an embodiment, the correlation regularization term comprises a sparse regularization term; the sparse regular term is used for constraining the parameters of the group sparse matrix, and constraining part of the parameters to be 0.

In an embodiment, the correlation regularization term further includes a convolution kernel ordering regularization term; the training module 604 performs the steps of adding a relevant regular term to the loss function of the convolutional neural network, and updating the parameters of the convolutional neural network according to the loss function of the convolutional neural network, including: calculating the matrix 2 norm of each convolution kernel in each convolution layer, and sequencing the convolution kernels according to the value of the matrix 2 norm; acquiring a convolution kernel screening threshold according to the convolution kernel sorting relation of all convolution layers; and removing the convolution kernels in each convolution layer according to the convolution kernel screening threshold value to obtain an updated convolution neural network.

In an embodiment, the correlation regularization term further includes a convolution kernel ordering regularization term; the training module 604 performs the steps of adding a relevant regular term to the loss function of the convolutional neural network, and updating the parameters of the convolutional neural network according to the loss function of the convolutional neural network, including: adding a weight attenuation regular term, a sparse regular term and a convolution kernel sorting regular term into a loss function of the convolution neural network to obtain a total loss function; and training according to the total loss function, and determining the optimal solution of the weight parameters of each sparse convolution layer to obtain the sparse convolution neural network.

For details of how the compression device 60 of the convolutional neural network of the present application implements the compression method of the convolutional neural network, refer to the above embodiments of the compression method of the convolutional neural network; they are not repeated here.

Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application. The electronic device 70 in the present embodiment includes a processor 702 and a memory 701 connected to each other; the memory 701 is configured to store program instructions, and the processor 702 is configured to execute the program instructions stored in the memory 701 to implement the steps of any one of the above embodiments of the compression method of the convolutional neural network or the transplanting method of the convolutional neural network. In one specific implementation scenario, the electronic device 70 may include, but is not limited to, a microcomputer or a server.

Specifically, the processor 702 is configured to control itself and the memory 701 to implement the steps of any of the above embodiments of the convolutional neural network compression method or the convolutional neural network transplanting method. The processor 702 may also be referred to as a CPU (Central Processing Unit). The processor 702 may be an integrated circuit chip having signal processing capabilities. The processor 702 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 702 may be collectively implemented by an integrated circuit chip.

In the above scheme, the processor 702 obtains the sparse convolutional neural network by adding a sparse convolutional layer after each convolutional layer to be compressed in the convolutional neural network, and by pruning and decomposing each convolutional layer to be compressed based on the weight parameter of each sparse convolutional layer. The sparse convolutional layer fuses convolution kernel pruning with convolution kernel decomposition, so that pruning and decomposition are carried out simultaneously; this compresses the model, reduces the model parameters, and accelerates model computation.
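To illustrate why a sparse 1 x 1 layer placed after a convolutional layer allows kernels of that layer to be removed, the following sketch models a single spatial position: the preceding convolution is a matrix-vector product between the flattened kernels and a flattened input patch, and the 1 x 1 layer is a second matrix applied across channels. This covers only the pruning side of the scheme and is an assumption-laden simplification; the names matvec, forward and prune_zero_columns are hypothetical.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def forward(kernels, sparse_1x1, patch):
    """Convolution at one position followed by the 1 x 1 sparse layer."""
    return matvec(sparse_1x1, matvec(kernels, patch))

def prune_zero_columns(kernels, sparse_1x1, tol=0.0):
    """Drop every kernel whose column in the 1 x 1 weight matrix is all zero."""
    keep = [j for j in range(len(kernels))
            if any(abs(row[j]) > tol for row in sparse_1x1)]
    return ([kernels[j] for j in keep],
            [[row[j] for j in keep] for row in sparse_1x1])

# Three hypothetical flattened kernels; the 1 x 1 layer ignores the second one.
kernels = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sparse_1x1 = [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
patch = [1.0, 1.0]

before = forward(kernels, sparse_1x1, patch)
small_kernels, small_1x1 = prune_zero_columns(kernels, sparse_1x1)
after = forward(small_kernels, small_1x1, patch)
```

In this sketch the output is unchanged after pruning, because a kernel whose channel is multiplied by zero in every row of the 1 x 1 matrix contributes nothing to the result.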

Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. The computer-readable storage medium 80 of the present application stores program instructions 800 which, when executed by a processor, implement the steps in any of the above embodiments of the method of compressing a convolutional neural network or the method of transplanting a convolutional neural network.

The computer-readable storage medium 80 may be a medium that can store the program instructions 800, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk. It may also be a server that stores the program instructions 800; the server can send the stored program instructions 800 to another device for execution, or can execute the stored program instructions 800 itself.

In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and device may be implemented in other ways. The apparatus and device embodiments described above are merely illustrative. For example, the division into modules or units is only a logical division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.

Claims

1. A method of compressing a convolutional neural network, the method comprising:

acquiring an initial convolutional neural network to be compressed, and adding a sparse convolutional layer after each convolutional layer to be compressed in the initial convolutional neural network;

acquiring a weight parameter of each sparse convolutional layer, and pruning and decomposing each convolutional layer to be compressed based on the weight parameter to obtain a sparse convolutional neural network;

and training the sparse convolutional neural network, and taking the trained convolutional neural network as a compressed convolutional neural network.

2. The method of compressing a convolutional neural network as claimed in claim 1, wherein the sparse convolutional layer is a convolutional layer with a 1 × 1 convolution kernel;

the acquiring of the weight parameter of each sparse convolutional layer, pruning and decomposing of each convolutional layer to be compressed based on the weight parameter to obtain a sparse convolutional neural network, includes:

obtaining a group sparse matrix corresponding to each sparse convolution layer according to the weight parameter of each sparse convolution layer;

and, for a given convolution kernel, multiplying the convolution kernel by the row simplified matrix of the group sparse matrix to obtain a new convolution kernel, and converting the single heavyweight convolution kernel into a lightweight convolution kernel and a 1 × 1 convolution kernel by means of the new convolution kernel and the column simplified matrix of the group sparse matrix, so as to obtain the sparse convolutional neural network.

3. The method according to claim 2, wherein the training the sparse convolutional neural network, and taking the trained convolutional neural network as a compressed convolutional neural network, comprises:

adding a relevant regularization term to the loss function of the convolutional neural network, and updating the parameters of the convolutional neural network according to the loss function of the convolutional neural network.

4. The method of compressing a convolutional neural network as claimed in claim 3, wherein the relevant regularization term comprises a weight decay regularization term.

5. The method of compressing a convolutional neural network as claimed in claim 3, wherein the relevant regularization term comprises a sparse regularization term; the sparse regularization term constrains the parameters of the group sparse matrix so that some of the parameters are driven to 0.

6. The method of compressing a convolutional neural network as claimed in claim 3, wherein the relevant regularization term further comprises a convolution kernel sorting regularization term;

the adding a relevant regularization term to the loss function of the convolutional neural network, and updating the parameters of the convolutional neural network according to the loss function of the convolutional neural network, comprises:

calculating the matrix 2-norm of each convolution kernel in each convolutional layer, and sorting the convolution kernels by the value of the matrix 2-norm;

obtaining a convolution kernel screening threshold from the sorting of the convolution kernels across all convolutional layers;

and removing convolution kernels from each convolutional layer according to the convolution kernel screening threshold to obtain an updated convolutional neural network.

7. The method according to claim 3, wherein the adding a relevant regularization term to the loss function of the convolutional neural network, and updating the parameters of the convolutional neural network according to the loss function of the convolutional neural network, comprises:

adding a weight decay regularization term, a sparse regularization term and a convolution kernel sorting regularization term to the loss function of the convolutional neural network to obtain a total loss function;

and training according to the total loss function to determine the optimal solution of the weight parameters of each sparse convolutional layer, thereby obtaining the sparse convolutional neural network.

8. A method for transplanting a convolutional neural network, comprising:

acquiring an initial convolutional neural network model in first equipment;

compressing the initial convolutional neural network by using a convolutional neural network compression method to obtain a compressed convolutional neural network;

transplanting the compressed convolutional neural network into a second device;

wherein the method of compressing the convolutional neural network comprises a method of compressing the convolutional neural network of any one of claims 1 to 7.

9. An apparatus for compressing a convolutional neural network, comprising:

the acquisition module is used for acquiring an initial convolutional neural network to be compressed and adding a sparse convolutional layer after each convolutional layer to be compressed in the initial convolutional neural network;

the processing module is used for acquiring weight parameters of each sparse convolutional layer, and pruning and decomposing each convolutional layer to be compressed based on the weight parameters to obtain a sparse convolutional neural network;

and the training module is used for training the sparse convolutional neural network and taking the trained convolutional neural network as a compressed convolutional neural network.

10. An electronic device, characterized in that the electronic device comprises a processor and a memory connected to each other;

the memory is configured to store program instructions, and the processor is configured to execute the program instructions to implement the method of compressing a convolutional neural network as defined in any one of claims 1 to 7, or the method of transplanting a convolutional neural network as defined in claim 8.

11. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the method of compressing a convolutional neural network as claimed in any one of claims 1 to 7, or the method of transplanting a convolutional neural network as claimed in claim 8.