CN111967583A - Method, apparatus, device and medium for compressing neural network


Info

Publication number
CN111967583A
Authority
CN
China
Prior art keywords
neural network
objective function
parameter
training
clipping
Prior art date
Legal status
Pending
Application number
CN202010812188.5A
Other languages
Chinese (zh)
Inventor
刘宁
关玉烁
车正平
唐剑
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010812188.5A priority Critical patent/CN111967583A/en
Publication of CN111967583A publication Critical patent/CN111967583A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/048 - Activation functions
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

According to an embodiment of the present disclosure, a method, an apparatus, a device, and a storage medium for compressing a neural network are provided. The method comprises the following steps: determining a plurality of auxiliary parameters by training a neural network with training data, the plurality of auxiliary parameters corresponding to a plurality of output channels included in convolutional layers of the neural network; determining a plurality of clipping parameters corresponding to the plurality of output channels based on the plurality of auxiliary parameters and the number of iterations of the training, the clipping parameters indicating whether the corresponding output channels are to be clipped; and clipping at least one of the plurality of output channels based on the plurality of clipping parameters. In this manner, output channels in the neural network can be effectively clipped, thereby compressing the neural network.

Description

Method, apparatus, device and medium for compressing neural network
Technical Field
Implementations of the present disclosure relate to the field of artificial intelligence, and more particularly, to methods, apparatus, devices, and media for compressing neural networks.
Background
In recent years, with the development of artificial intelligence technology, neural networks have been widely used in many technical fields such as image processing and speech recognition, and have played an important role.
To perform more complex tasks, neural networks include more and more layers, and both the number of network parameters and the amount of computation grow accordingly. As a result, neural networks consume a large amount of computing resources during training and inference, and are difficult to deploy on devices with limited computing resources and memory (e.g., mobile devices and embedded systems).
Therefore, how to compress the size of a neural network model and reduce its computational cost while ensuring the accuracy of the neural network has become a focus of attention.
Disclosure of Invention
Embodiments of the present disclosure provide a scheme for compressing a neural network.
In a first aspect of the disclosure, a method of compressing a neural network is provided. The method comprises the following steps: determining a plurality of auxiliary parameters by training a neural network with training data, the plurality of auxiliary parameters corresponding to a plurality of output channels included in convolutional layers of the neural network; determining a plurality of clipping parameters corresponding to the plurality of output channels based on the plurality of auxiliary parameters and the number of iterations of the training, the clipping parameters indicating whether the corresponding output channels are to be clipped; and clipping at least one of the plurality of output channels based on the plurality of clipping parameters.
In a second aspect of the present disclosure, an apparatus for compressing a neural network is provided. The device includes: an auxiliary parameter determination module configured to determine a plurality of auxiliary parameters by training the neural network with the training data, the plurality of auxiliary parameters corresponding to a plurality of output channels included in convolutional layers of the neural network; a clipping parameter determination module configured to determine a plurality of clipping parameters corresponding to the plurality of output channels based on the plurality of auxiliary parameters and a number of iterations of the training, the clipping parameters indicating whether the corresponding output channels are to be clipped; and a clipping module configured to clip at least one output channel of the plurality of output channels based on the plurality of clipping parameters.
In a third aspect of the present disclosure, there is provided an electronic device comprising: a memory and a processor; wherein the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method according to the first aspect of the disclosure.
In a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon one or more computer instructions, wherein the one or more computer instructions are executed by a processor to implement a method according to the first aspect of the present disclosure.
According to various embodiments of the present disclosure, the output channels in the convolutional layers of the neural network can be effectively clipped, thereby compressing the size of the neural network, reducing its requirements on the computing devices to which it is deployed, and thus improving the applicability of the neural network.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
FIG. 2 shows a flow diagram of a process of compressing a network in accordance with multiple embodiments of the present disclosure;
FIG. 3 shows a flow diagram of an example process of determining an assistance parameter in accordance with various embodiments of the present disclosure;
FIGS. 4A-4C are schematic diagrams illustrating clipping parameters as a function of annealing temperature;
FIG. 5 illustrates a schematic block diagram of an apparatus for compressing a neural network, in accordance with some embodiments of the present disclosure; and
FIG. 6 illustrates a block diagram of a computing device capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
In describing embodiments of the present disclosure, the term "include" and its derivatives should be interpreted as open-ended, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also appear below.
As used herein, a "neural network" is capable of processing an input and providing a corresponding output, which generally includes an input layer and an output layer and one or more hidden layers between the input layer and the output layer. The layers in the neural network are connected in sequence such that the output of a previous layer is provided as the input of a subsequent layer, wherein the input layer receives the input of the neural network model and the output of the output layer is the final output of the neural network model. Each layer of the neural network model includes one or more nodes (also referred to as processing nodes or neurons), each node processing input from a previous layer. The terms "neural network," "model," "network," and "neural network model" are used interchangeably herein.
As discussed above, in recent years neural networks (e.g., convolutional neural networks) have become deeper and wider, which causes both the training and the prediction of these neural networks to consume large amounts of computational resources. This also makes such neural networks difficult to deploy on devices with only limited computational resources, such as robots, autonomous vehicles, mobile terminals, and the like.
Some existing schemes propose to reduce the size of the neural network and the computational load of the network by clipping the output channels of convolutional layers in the neural network; such schemes are also referred to as "channel clipping" (or channel pruning). For example, some conventional schemes filter out unimportant channels using hand-crafted filtering rules and then crop those channels. However, such clipping tends to be coarse-grained and the clipping effect is poor.
In accordance with an implementation of the present disclosure, a scheme for compressing a neural network is presented. In this approach, first, a plurality of auxiliary parameters are determined by training a neural network with training data, wherein the plurality of auxiliary parameters correspond to a plurality of output channels comprised by convolutional layers of the neural network. Then, a plurality of clipping parameters corresponding to the plurality of output channels are determined based on the plurality of auxiliary parameters and the number of iterations of the training, wherein the clipping parameters indicate whether the corresponding output channels are to be clipped. And, at least one of the plurality of output channels is clipped based on the plurality of clipping parameters.
By jointly training the weight parameters of the neural network and the auxiliary parameters corresponding to the output channels, clipping parameters indicating whether the output channels are to be clipped can be determined, and the clipping parameters converge to two distinct values (e.g., 0 and 1) as the number of iterations increases. In this way, one or more output channels in the neural network can be clipped at a fine granularity while ensuring the accuracy of the neural network, thereby reducing the size of the neural network and its demand for computing resources.
Various example implementations of this approach are described in further detail below in conjunction with the figures.
Referring initially to FIG. 1, a schematic diagram of an environment 100 is schematically illustrated in which an exemplary implementation according to the present disclosure may be implemented. As shown in fig. 1, environment 100 includes a computing device 135. In some implementations, the computing device 135 may be a computing device with sufficient computing resources.
The computing device 135 may receive the neural network 120 to be compressed. In some implementations, the neural network 120 is a Convolutional Neural Network (CNN) and includes one or more convolutional layers, such as convolutional layers 125-1 through 125-N (individually or collectively convolutional layers 125) shown in FIG. 1. Taking convolutional layer 125-2 as an example, it includes a plurality of output channels 130-1 through 130-M (individually or collectively referred to as output channels 130). These output channels will be provided as inputs to the next convolutional layer 125.
As shown in fig. 1, the computing device 135 receives the training data 110 and trains the neural network 120 to be compressed using the training data 110. In some implementations, examples of training data 110 may include, but are not limited to, image training data, text training data, and speech training data, among others. Accordingly, the neural network 120 may be a neural network for image processing, a neural network for text processing, or a neural network for voice processing.
In the training process, the computing device 135 can set corresponding auxiliary parameters for the output channels, and obtain the weight parameters and the auxiliary parameters of the neural network 120 through collaborative training. Subsequently, the computing device 135 can determine a clipping parameter corresponding to the output channel 130 according to the auxiliary parameters and the number of iterations of training, and clip one or more output channels 130 in the neural network 120 based on the clipping parameter.
In the example of FIG. 1, the output channels 130-2 and 130-3 in the neural network 120 to be compressed are clipped (as indicated by the dashed output channels 150-2 and 150-3) to yield the compressed neural network 140. The compressed neural network 140 still includes a plurality of convolutional layers 145-1 to 145-N; however, output channels 150-2 and 150-3 in convolutional layer 145-2 have been clipped, thereby reducing the computational load of the neural network.
It should be understood that the structure of the neural network shown in fig. 1 and the number of convolutional layers and output channels therein are illustrative and not limiting. In different applications, the neural network may be designed with other suitable architectures and/or a suitable number of convolutional layers and output channels, as desired.
A specific process of compressing the neural network will be described in more detail below with reference to fig. 2 to 4. Fig. 2 illustrates a flow diagram of a process 200 of compressing a neural network, according to some embodiments of the present disclosure. Process 200 may be implemented by computing device 135 of fig. 1. For ease of discussion, process 200 will be described in conjunction with fig. 1.
As shown in fig. 2, at block 202, the computing device 135 determines a plurality of auxiliary parameters by training the neural network with the training data 110, wherein the plurality of auxiliary parameters correspond to the plurality of output channels 130 included in the convolutional layers 125 of the neural network 120.
The specific process of block 202 will be described below in conjunction with fig. 3, fig. 3 showing a flow diagram of an example process of determining auxiliary parameters according to some embodiments of the present disclosure.
As shown in fig. 3, at block 302, the computing device 135 may determine at least one objective function of the neural network 120 based on the training data 110.
In some implementations, for a neural network with multiple convolutional layers, the computation of the l-th convolutional layer can be expressed as formula (1):

O_l = F_l(W_l, O_{l-1}),    (1)

where W_l denotes the convolution kernel of the l-th layer, which has c_{l-1} input channels and c_l output channels, F_l(·) denotes the convolution operation, and O_l denotes the output feature tensor of the l-th layer.

Some conventional schemes directly indicate whether each output channel is to be clipped by configuring a corresponding flag bit (e.g., 0 or 1) for it. Accordingly, the clipped output feature tensor can be expressed as formula (2):

O'_l = I_l ⊙ O_l,    (2)

where I_l ∈ {0, 1}^{c_l} denotes the vector formed by the flag bits, ⊙ denotes the channel-wise product, and O'_l denotes the output feature tensor obtained after clipping with I_l.
Because the values of the flag bits in I_l are discrete (0 and 1), I_l cannot be trained jointly with the weight parameters of the neural network. To overcome this problem, in implementations of the present disclosure, auxiliary parameters α_l corresponding to the output channels 130 are set and used to determine the corresponding clipping parameters v_l. In particular, the computing device 135 may express the clipping parameters v_l as formula (3):

v_l = sigmoid(α_l / T),    (3)

that is, the discrete flag vector I_l is replaced by a smooth function of the auxiliary parameters α_l, where T denotes a temperature variable that varies with the number of iterations of the training. For example, T can be expressed as formula (4):

T = T_0 / σ(n),    (4)

where T_0 denotes an initial temperature variable (e.g., it may be set to 1 or another value), n denotes the number of iterations of the training, and σ(·) denotes a temperature annealing function that becomes larger as the number of iterations increases.

As the number of iterations increases, T gradually becomes smaller and approaches 0. At that point, v_l approaches 0 or 1. In this manner, embodiments of the present disclosure convert the search over the discontinuous clipping parameters into a search over the continuous auxiliary parameters α_l.
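As an illustration of formulas (3) and (4), the following minimal sketch assumes a simple linear annealing function σ(n) = 1 + n (the disclosure does not fix a particular σ) and shows how the clipping parameters move towards 0 or 1 as the number of iterations grows:

```python
# Minimal sketch of formulas (3) and (4). The annealing function sigma(n) = 1 + n
# is an assumption; any function that grows with the iteration count would do.
import numpy as np

def clipping_parameters(alpha, n, T0=1.0):
    """v = sigmoid(alpha / T) with T = T0 / sigma(n)."""
    T = T0 / (1.0 + n)                        # formula (4) with an assumed sigma(n)
    z = np.clip(alpha / T, -60.0, 60.0)       # avoid overflow in exp for tiny T
    return 1.0 / (1.0 + np.exp(-z))           # formula (3)

alpha = np.array([-0.8, -0.1, 0.05, 1.2])     # one auxiliary parameter per output channel
for n in (0, 50, 5000):
    print(n, np.round(clipping_parameters(alpha, n), 3))
# As n grows, T approaches 0 and every clipping parameter is pushed towards 0 or 1.
```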
In some implementations, to jointly train the weight parameters and the auxiliary parameters of the neural network 120, the computing device 135 may determine one or more objective functions.
In some implementations, the objective function may indicate the prediction accuracy of the neural network on the training samples. In particular, the computing device 135 may weight the output of the convolutional layer based on a plurality of training auxiliary parameters corresponding to the plurality of output channels 130; for example, the weight v_l corresponding to an output channel 130 may be determined according to formula (3). Subsequently, the computing device 135 may determine a loss function of the neural network 120 based on the weighted outputs and take it as the first objective function. For example, the first objective function may be expressed as formula (5):

L_val(W, α),    (5)

where W denotes the weight parameters of the neural network 120, α denotes the auxiliary parameters of the neural network 120, and L_val denotes the loss function (i.e., the first objective function) computed on the validation data set in the training data 110. By setting the first objective function, the computing device 135 may ensure the accuracy of the clipped neural network.
In some implementations, the computing device 135 may also consider a sparsity constraint. In particular, the computing device 135 may determine a degree of model compression based on the plurality of training auxiliary parameters corresponding to the plurality of output channels 130. Illustratively, the degree of model compression may indicate the number of floating-point operations of the neural network 120, which may be expressed, for example, as formula (6):

FLOPs(α) = Σ_l k_l² · h_l · w_l · (Σ_i v_{l-1}^(i)) · (Σ_j v_l^(j)),    (6)

where k_l denotes the convolution kernel size of the l-th layer, and h_l and w_l denote the spatial size of the output feature map of the l-th layer.

Subsequently, the computing device 135 may also determine a second objective function based on the degree of model compression and the target degree of compression. Illustratively, the second objective function (formula (7)) penalizes the deviation of the estimated FLOPs(α) from F, where F denotes the target number of floating-point operations, i.e., the target degree of compression, and ε denotes a constant less than 1 used in the penalty.
By introducing the second objective function, the computing device 135 may prevent the neural network 120 from being compressed excessively, which would otherwise impair the robustness of the neural network 120.
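As an illustration of the sparsity constraint, the sketch below estimates the floating-point operations of formula (6) from the soft channel counts; the squared relative deviation used as the budget penalty is only an assumed stand-in for formula (7), whose exact form is not reproduced here:

```python
# Sketch of the sparsity constraint: the floating-point operations of formula (6)
# estimated from soft channel counts, plus an assumed budget penalty standing in
# for formula (7). Layer sizes and the target budget are illustrative.
import numpy as np

def soft_flops(v, k, h, w, c_in0):
    """v: per-layer vectors of clipping parameters (values in [0, 1]);
    k, h, w: per-layer kernel size and output spatial size; c_in0: network input channels."""
    flops, c_prev = 0.0, float(c_in0)
    for v_l, k_l, h_l, w_l in zip(v, k, h, w):
        c_l = float(np.sum(v_l))                          # soft number of kept output channels
        flops += (k_l ** 2) * h_l * w_l * c_prev * c_l    # formula (6), one layer
        c_prev = c_l
    return flops

def budget_penalty(flops, F):
    # Assumed penalty only: grows as the estimated FLOPs deviate from the target F.
    return (flops / F - 1.0) ** 2

v = [np.array([0.9, 0.8, 0.1, 0.7]), np.array([0.95, 0.05, 0.6])]
flops = soft_flops(v, k=[3, 3], h=[32, 16], w=[32, 16], c_in0=3)
print(flops, budget_penalty(flops, F=5e4))
```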
In some implementations, the computing device 135 may also consider a symmetry constraint on residual blocks. In some implementations, for a neural network 120 that includes a residual block, the computing device may determine a first number of input channels of the residual block based on the plurality of training auxiliary parameters corresponding to the plurality of output channels. For example, the first number may be expressed as formula (8):

ĉ_l = Σ_i v_l^(i),    (8)

Subsequently, the computing device 135 may also determine a second number of output channels of the residual block based on the plurality of training auxiliary parameters. For example, the second number may be expressed as formula (9):

ĉ_{l'} = Σ_i v_{l'}^(i),    (9)

Further, the computing device 135 may determine a third objective function based on the difference between the first number and the second number. For example, the third objective function (formula (10)) accumulates the differences ĉ_l - ĉ_{l'} over the residual blocks, where (l, l') denotes a residual block having c_l input channels and c_{l'} output channels.
By introducing the third objective function, for neural networks with residual blocks, the computing device 135 may ensure the symmetry of the residual blocks (i.e., that they have the same number of input channels and output channels), thereby avoiding making it more difficult for gradients to propagate across multiple layers.
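The symmetry constraint can be illustrated as follows; penalizing the squared difference between the soft input-channel count (formula (8)) and the soft output-channel count (formula (9)) of each residual block is an assumption, since the disclosure only states that the third objective function is based on their difference:

```python
# Sketch of the symmetry constraint on residual blocks: the soft input-channel
# count (formula (8)) should match the soft output-channel count (formula (9)).
# Using the squared difference below is an assumption.
import numpy as np

def soft_channel_count(v_l):
    return float(np.sum(v_l))                 # sum of the clipping parameters of one layer

def residual_symmetry_penalty(residual_blocks):
    """residual_blocks: list of (v_in, v_out) pairs, one pair per residual block (l, l')."""
    return sum((soft_channel_count(v_in) - soft_channel_count(v_out)) ** 2
               for v_in, v_out in residual_blocks)

blocks = [(np.array([0.9, 0.8, 0.2]), np.array([0.7, 0.95, 0.1]))]
print(residual_symmetry_penalty(blocks))      # small value: the block is nearly symmetric
```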
At block 304, the computing device 135 may determine an overall objective function for training the neural network 120 by combining at least a portion of the at least one objective function.
In some implementations, with the auxiliary parameters α set for the output channels 130, the overall objective function of the neural network 120 may be set, for example, as formula (11):

min_α  L_val(W, α) + λ · Ω(α),    (11)

where W denotes the weight parameters of the neural network 120, α denotes the auxiliary parameters of the neural network 120, L_val denotes the loss function (i.e., the first objective function) computed on the validation data set in the training data 110, λ denotes a weighting coefficient, and Ω(α) denotes the sparsity constraint, which may be either of the second objective function and the third objective function, or a combination thereof.
At block 306, the computing device 135 may determine a plurality of auxiliary parameters for the neural network by locally minimizing the overall objective function.
In some implementations, the computing device 135 may jointly learn the weight parameters W of the neural network 120 and the auxiliary parameters α. This can be regarded as a bilevel (2-level) optimization problem, in which the auxiliary parameters α are the upper-level variables and the weight parameters W are the lower-level variables. The weight parameters W may be determined, for example, by formula (12):

W* = argmin_W  L_train(W, α),    (12)

where L_train denotes the loss function on the training data set in the training data 110. The computing device 135 may learn the weight parameters W and the auxiliary parameters α in coordination so as to locally minimize the overall objective function (11). A specific procedure for jointly learning the weight parameters W and the auxiliary parameters α can be found in "DARTS: Differentiable Architecture Search", published in 2019 by Hanxiao Liu, Karen Simonyan and Yiming Yang, and is not described in detail herein.
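The bilevel optimization can be illustrated with the following runnable sketch, which alternates one gradient step on the weight parameters (lower level, formula (12)) with one gradient step on the auxiliary parameters (upper level, formula (11)); the toy network, the random data, the annealing schedule, the loss weight, and the learning rates are all assumptions made for illustration and do not reproduce the exact training procedure of the disclosure:

```python
# Runnable toy sketch of the bilevel training of formulas (11) and (12), in the
# spirit of the first-order DARTS alternation: the convolution weights are
# updated on a training split, the auxiliary parameters alpha on a validation split.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvNet(nn.Module):
    def __init__(self, c_out=8):
        super().__init__()
        self.conv = nn.Conv2d(3, c_out, kernel_size=3, padding=1)
        self.alpha = nn.Parameter(torch.zeros(c_out))     # one auxiliary parameter per output channel
        self.head = nn.Linear(c_out, 1)

    def forward(self, x, T):
        v = torch.sigmoid(self.alpha / T)                  # formula (3)
        h = F.relu(self.conv(x)) * v[None, :, None, None]  # weight each output channel by v
        return self.head(h.mean(dim=(2, 3)))               # global average pooling + linear head

def sparsity(v, target=4.0):
    return (v.sum() / target - 1.0) ** 2                   # assumed budget penalty (cf. formula (7))

net = GatedConvNet()
opt_w = torch.optim.SGD(list(net.conv.parameters()) + list(net.head.parameters()), lr=0.05)
opt_a = torch.optim.SGD([net.alpha], lr=0.05)
x_tr, y_tr = torch.randn(64, 3, 16, 16), torch.randn(64, 1)
x_val, y_val = torch.randn(64, 3, 16, 16), torch.randn(64, 1)

for n in range(200):
    T = 1.0 / (1.0 + 0.05 * n)                             # formula (4) with an assumed schedule
    # Lower level (formula (12)): update the weights on the training split.
    opt_w.zero_grad()
    F.mse_loss(net(x_tr, T), y_tr).backward()
    opt_w.step()
    # Upper level (formula (11)): update alpha on the validation split.
    opt_a.zero_grad()
    v = torch.sigmoid(net.alpha / T)
    (F.mse_loss(net(x_val, T), y_val) + 0.1 * sparsity(v)).backward()
    opt_a.step()

print(torch.sigmoid(net.alpha / T).detach())               # the (annealed) clipping parameters
```

The gate values printed at the end play the role of the clipping parameters of block 204; channels whose gate falls below the predetermined threshold would then be clipped in block 206.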
With continued reference to fig. 2, at block 204, the computing device 135 determines a plurality of clipping parameters corresponding to the plurality of output channels based on the plurality of auxiliary parameters and the number of iterations of training, the clipping parameters indicating whether the corresponding output channel is to be clipped.
Specifically, as discussed above in connection with formulas (3) and (4), the computing device 135 may first determine a first intermediate parameter (i.e., the temperature variable T discussed above) based on the number of iterations, where the first intermediate parameter indicates a temperature of the training.

Subsequently, the computing device 135 may determine a second intermediate parameter (i.e., α_l / T in formula (3)) based on the auxiliary parameter and the first intermediate parameter, where the second intermediate parameter is proportional to the auxiliary parameter and inversely proportional to the first intermediate parameter.

Further, the computing device 135 may determine the clipping parameter corresponding to the auxiliary parameter based on a sigmoid function of the second intermediate parameter. That is, after obtaining the trained auxiliary parameters α_l, the computing device 135 can determine the corresponding clipping parameters v_l according to formula (3).
FIGS. 4A-4C illustrate schematic diagrams of the clipping parameters as a function of the annealing temperature in some implementations according to the present disclosure. FIG. 4A shows the variation of the clipping parameter with the auxiliary parameter when the temperature variable T is 1 (e.g., at the first iteration); it can be seen that the clipping parameters take values between 0 and 1 and their distribution is relatively dispersed.
FIG. 4B shows the variation of the clipping parameter with the auxiliary parameter when the temperature variable T is 0.02; it can be seen that, compared with T = 1, the distribution of the clipping parameters is more concentrated and gradually approaches 0 or 1.
FIG. 4C shows the variation of the clipping parameter with the auxiliary parameter when the temperature variable T is 0.002; it can be seen that, compared with T = 0.02, the clipping parameters have been substantially binarized to 0 or 1.
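The effect shown in FIGS. 4A-4C can be reproduced numerically with a few lines; the range of auxiliary parameter values below is an illustrative assumption:

```python
# Numerical illustration of FIGS. 4A-4C: the same auxiliary parameters give
# dispersed gate values at T = 1 and essentially binary values at T = 0.002.
import numpy as np

alpha = np.linspace(-1.0, 1.0, 8)
for T in (1.0, 0.02, 0.002):
    v = 1.0 / (1.0 + np.exp(-np.clip(alpha / T, -60.0, 60.0)))
    print(f"T = {T:<6}", np.round(v, 3))
```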
As can be seen from the schematic diagrams of fig. 4A to 4C, by introducing the auxiliary parameter and the temperature variable, the embodiment of the present disclosure can overcome the problem that the clipping parameter value space is discrete and difficult to search.
At block 206, the computing device 135 clips at least one output channel of the plurality of output channels based on the plurality of clipping parameters. In particular, if a clipping parameter of the plurality of clipping parameters is less than a predetermined threshold, the computing device 135 may clip the output channel corresponding to that clipping parameter.
For example, taking FIG. 4C as an example, the clipping parameters corresponding to the output channels 130-2 and 130-3 are close to 0 and are determined to be less than the predetermined threshold. Accordingly, as shown in FIG. 1, the computing device 135 may clip the output channels 130-2 and 130-3 of the neural network 120 that correspond to these clipping parameters.
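As an illustration of block 206, the following sketch removes the output channels of a convolutional layer whose clipping parameters fall below a threshold; the layer sizes, the threshold value and the helper name are assumptions:

```python
# Sketch of block 206 in PyTorch: keep only the output channels whose clipping
# parameter reaches the threshold and copy their filters into a smaller layer.
import torch
import torch.nn as nn

def prune_conv_outputs(conv: nn.Conv2d, v: torch.Tensor, threshold: float = 0.5) -> nn.Conv2d:
    keep = torch.nonzero(v >= threshold).flatten()        # indices of channels to keep
    pruned = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])            # weight layout: (out, in, kH, kW)
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(16, 8, kernel_size=3, padding=1)
v = torch.tensor([1.0, 0.98, 0.01, 0.02, 0.99, 1.0, 0.97, 1.0])  # nearly binarized clipping parameters
smaller = prune_conv_outputs(conv, v)
print(tuple(conv.weight.shape), "->", tuple(smaller.weight.shape))  # (8, 16, 3, 3) -> (6, 16, 3, 3)
```

In practice, the input channels of the subsequent convolutional layer (and the corresponding slices of its kernels) would be removed with the same indices so that the clipped network remains consistent, as illustrated by the clipped output channels 150-2 and 150-3 in FIG. 1.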
Through the method described above, the embodiments of the present disclosure can perform fine-grained automatic clipping on the output channels included in the convolutional layers in the neural network while ensuring the accuracy of the neural network, thereby obtaining a neural network with a lower computation amount.
In some implementations, the computing device 135 may also deploy the clipped neural network 140 to a target computing device, where the number of computing resources of the target computing device is less than a threshold number. Examples of target computing devices include, but are not limited to, robots, autonomous vehicles, mobile terminals, and the like, which have relatively limited computing resources.
In this way, embodiments of the present disclosure can reduce the demand of the neural network for computing resources and improve the applicability of the neural network, so that it can be deployed on more devices.
Embodiments of the present disclosure also provide corresponding apparatuses for implementing the above methods or processes. Fig. 5 illustrates a schematic block diagram of an apparatus 500 for compressing a neural network, according to some embodiments of the present disclosure.
As shown in fig. 5, the apparatus 500 may include an auxiliary parameter determination module 510 configured to determine a plurality of auxiliary parameters by training a neural network with training data, the plurality of auxiliary parameters corresponding to a plurality of output channels included in convolutional layers of the neural network. Additionally, the apparatus 500 further includes a clipping parameter determination module 520 configured to determine a plurality of clipping parameters corresponding to the plurality of output channels based on the plurality of auxiliary parameters and the number of iterations of training, the clipping parameters indicating whether the corresponding output channels are to be clipped. The apparatus 500 further includes a clipping module 530 configured to clip at least one output channel of the plurality of output channels based on the plurality of clipping parameters.
In some implementations, the secondary parameter determination module 510 includes: an objective function determination module configured to determine at least one objective function of the neural network based on the training data; an objective function combining module configured to determine an overall objective function for training the neural network by combining at least a portion of the at least one objective function; and an objective function solving module configured to determine a plurality of auxiliary parameters of the neural network by locally minimizing the overall objective function.
In some implementations, the objective function determination module includes: a weighting module configured to weight an output of the convolutional layer based on a plurality of training auxiliary parameters corresponding to a plurality of output channels; and a first objective function determination module configured to determine a loss function of the neural network as a first objective function based on the weighted outputs.
In some implementations, the objective function determination module includes: a model compression degree determination module configured to determine a degree of model compression based on a plurality of training auxiliary parameters corresponding to a plurality of output channels; and a second objective function determination module configured to determine a second objective function based on a difference of the model compression degree and the target compression degree.
In some implementations, the degree of model compression indicates a number of floating-point operations of the neural network.
In some implementations, the neural network includes a residual block, and the objective function determination module includes: a first number determination module configured to determine a first number of input channels of a residual block based on a plurality of training auxiliary parameters corresponding to a plurality of output channels; a second number determination module configured to determine a second number of output channels of the residual block based on the plurality of training auxiliary parameters; and a third objective function determination module configured to determine a third objective function based on a difference between the first number and the second number.
In some implementations, the clipping parameter determination module 520 includes: a first intermediate parameter determination module configured to determine a first intermediate parameter based on the number of iterations, the first intermediate parameter indicating a temperature of the training; a second intermediate parameter determination module configured to determine a second intermediate parameter based on the auxiliary parameter and the first intermediate parameter, the intermediate parameter being proportional to the auxiliary parameter and inversely proportional to the first intermediate parameter; and a sigmoid function calculation module configured to determine a clipping parameter corresponding to the auxiliary parameter based on the sigmoid function of the intermediate parameter.
In some implementations, the clipping module 530 includes: a channel clipping module configured to clip an output channel corresponding to a clipping parameter if the clipping parameter of the plurality of clipping parameters is less than a predetermined threshold.
In some implementations, the apparatus 500 further includes: a deployment module configured to deploy the cropped neural network to a target computing device, the target computing device having a number of computing resources less than a threshold number.
In some implementations, the training data is image training data and the neural network is a neural network for image processing.
The elements included in apparatus 500 may be implemented in a variety of ways including software, hardware, firmware, or any combination thereof. In some embodiments, one or more of the units may be implemented using software and/or firmware, such as machine-executable instructions stored on a storage medium. In addition to, or in the alternative to, machine-executable instructions, some or all of the elements in apparatus 500 may be implemented at least in part by one or more hardware logic components. By way of example, and not limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.
Fig. 6 illustrates a block diagram of a computing device/server 600 in which one or more embodiments of the disclosure may be implemented. It should be understood that the computing device/server 600 illustrated in fig. 6 is merely exemplary, and should not be construed as limiting in any way the functionality and scope of the embodiments described herein.
As shown in fig. 6, computing device/server 600 is in the form of a general purpose computing device. Components of computing device/server 600 may include, but are not limited to, one or more processors or processing units 610, memory 620, storage 630, one or more communication units 640, one or more input devices 650, and one or more output devices 660. The processing unit 610 may be a real or virtual processor and can perform various processes according to programs stored in the memory 620. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of computing device/server 600.
Computing device/server 600 typically includes a number of computer storage media. Such media may be any available media that is accessible by computing device/server 600 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. Memory 620 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof. Storage 630 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that may be capable of being used to store information and/or data (e.g., training data for training) and that may be accessed within computing device/server 600.
Computing device/server 600 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. Memory 620 may include a computer program product 625 having one or more program modules configured to perform the various methods or acts of the various embodiments of the disclosure.
The communication unit 640 enables communication with other computing devices over a communication medium. Additionally, the functionality of the components of computing device/server 600 may be implemented in a single computing cluster or multiple computing machines capable of communicating over a communications connection. Thus, computing device/server 600 may operate in a networked environment using logical connections to one or more other servers, network Personal Computers (PCs), or another network node.
The input device 650 may be one or more input devices such as a mouse, keyboard, trackball, or the like. Output device 660 may be one or more output devices such as a display, speakers, printer, or the like. Computing device/server 600 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., as desired, through communication unit 640, with one or more devices that enable a user to interact with computing device/server 600, or with any device (e.g., network card, modem, etc.) that enables computing device/server 600 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the above-described method.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products implemented in accordance with the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing has described implementations of the present disclosure, and the above description is illustrative, not exhaustive, and not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The terminology used herein was chosen in order to best explain the principles of implementations, the practical application, or improvements to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the implementations disclosed herein.

Claims (22)

1. A method of compressing a neural network, comprising:
determining a plurality of auxiliary parameters by training the neural network with training data, the plurality of auxiliary parameters corresponding to a plurality of output channels comprised by convolutional layers of the neural network;
determining a plurality of clipping parameters corresponding to the plurality of output channels based on the plurality of auxiliary parameters and the number of iterations of the training, a clipping parameter indicating whether the corresponding output channel is to be clipped; and
clipping at least one output channel of the plurality of output channels based on the plurality of clipping parameters.
2. The method of claim 1, wherein determining the plurality of auxiliary parameters comprises:
determining at least one objective function of the neural network based on the training data;
determining an overall objective function for training the neural network by combining at least a portion of the at least one objective function; and
determining the plurality of auxiliary parameters of the neural network by locally minimizing the overall objective function.
3. The method of claim 2, wherein determining at least one objective function of the neural network comprises:
weighting the output of the convolutional layer based on a plurality of training auxiliary parameters corresponding to the plurality of output channels; and
determining, based on the weighted outputs, a loss function of the neural network as a first objective function.
4. The method of claim 2, wherein determining at least one objective function of the neural network comprises:
determining a degree of model compression based on a plurality of training auxiliary parameters corresponding to the plurality of output channels; and
determining a second objective function based on the degree of model compression and a target degree of compression.
5. The method of claim 4, wherein the degree of model compression indicates a number of floating-point operations of the neural network.
6. The method of claim 2, wherein the neural network comprises a residual block, and determining at least one objective function of the neural network comprises:
determining a first number of input channels of the residual block based on a plurality of training auxiliary parameters corresponding to the plurality of output channels;
determining a second number of output channels of the residual block based on the plurality of training auxiliary parameters; and
determining a third objective function based on the first number and the second number.
7. The method of claim 1, wherein determining a plurality of clipping parameters corresponding to the plurality of output channels comprises:
determining a first intermediate parameter based on the number of iterations, the first intermediate parameter indicating a temperature of the training;
determining a second intermediate parameter based on an auxiliary parameter and the first intermediate parameter, the intermediate parameter being proportional to the auxiliary parameter and inversely proportional to the first intermediate parameter; and
determining a clipping parameter corresponding to the auxiliary parameter based on a sigmoid function of the intermediate parameter.
8. The method of claim 1, wherein clipping at least one of the plurality of output channels comprises:
if a clipping parameter of the plurality of clipping parameters is smaller than a predetermined threshold, clipping an output channel corresponding to the clipping parameter.
9. The method of claim 1, further comprising:
deploying the trimmed neural network to a target computing device having a number of computing resources less than a threshold number.
10. The method of claim 1, wherein the training data is image training data and the neural network is a neural network for image processing.
11. An apparatus to compress a neural network, comprising:
an auxiliary parameter determination module configured to determine a plurality of auxiliary parameters corresponding to a plurality of output channels included in convolutional layers of the neural network by training the neural network with training data;
a clipping parameter determination module configured to determine a plurality of clipping parameters corresponding to the plurality of output channels based on the plurality of auxiliary parameters and the number of iterations of the training, a clipping parameter indicating whether the corresponding output channel is to be clipped; and
a clipping module configured to clip at least one of the plurality of output channels based on the plurality of clipping parameters.
12. The apparatus of claim 11, wherein the auxiliary parameter determination module comprises:
an objective function determination module configured to determine at least one objective function of the neural network based on the training data;
an objective function combining module configured to determine an overall objective function for training the neural network by combining at least a portion of the at least one objective function; and
an objective function solving module configured to determine the plurality of auxiliary parameters of the neural network by locally minimizing the overall objective function.
13. The apparatus of claim 12, wherein the objective function determination module comprises:
a weighting module configured to weight an output of the convolutional layer based on a plurality of training auxiliary parameters corresponding to the plurality of output channels; and
a first objective function determination module configured to determine a loss function of the neural network as a first objective function based on the weighted outputs.
14. The apparatus of claim 12, wherein the objective function determination module comprises:
a model compression degree determination module configured to determine a degree of model compression based on a plurality of training auxiliary parameters corresponding to the plurality of output channels; and
a second objective function determination module configured to determine a second objective function based on the model degree of compression and a target degree of compression.
15. The apparatus of claim 14, wherein the degree of model compression indicates a number of floating-point operations of the neural network.
16. The apparatus of claim 12, wherein the neural network comprises a residual block and the objective function determination module comprises:
a first number determination module configured to determine a first number of input channels of the residual block based on a plurality of training auxiliary parameters corresponding to the plurality of output channels;
a second number determination module configured to determine a second number of output channels of the residual block based on the plurality of training auxiliary parameters; and
a third objective function determination module configured to determine a third objective function based on the first number and the second number.
17. The device of claim 11, wherein the clipping parameter determination module comprises:
a first intermediate parameter determination module configured to determine a first intermediate parameter based on the number of iterations, the first intermediate parameter indicating a temperature of the training;
a second intermediate parameter determination module configured to determine a second intermediate parameter based on an auxiliary parameter and the first intermediate parameter, the intermediate parameter being proportional to the auxiliary parameter and inversely proportional to the first intermediate parameter; and
a sigmoid function calculation module configured to determine the clipping parameter corresponding to the auxiliary parameter based on the sigmoid function of the intermediate parameter.
18. The device of claim 11, wherein the cropping module comprises:
a channel clipping module configured to clip an output channel corresponding to a clipping parameter of the plurality of clipping parameters if the clipping parameter is less than a predetermined threshold.
19. The apparatus of claim 11, further comprising:
a deployment module configured to deploy the tailored neural network to a target computing device having a number of computing resources less than a threshold number.
20. The apparatus of claim 11, wherein the training data is image training data and the neural network is a neural network for image processing.
21. An electronic device, comprising:
a memory and a processor;
wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method of any one of claims 1 to 10.
22. A computer readable storage medium having one or more computer instructions stored thereon, wherein the one or more computer instructions are executed by a processor to implement the method of any one of claims 1 to 10.
CN202010812188.5A 2020-08-13 2020-08-13 Method, apparatus, device and medium for compressing neural network Pending CN111967583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010812188.5A CN111967583A (en) 2020-08-13 2020-08-13 Method, apparatus, device and medium for compressing neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010812188.5A CN111967583A (en) 2020-08-13 2020-08-13 Method, apparatus, device and medium for compressing neural network

Publications (1)

Publication Number Publication Date
CN111967583A true CN111967583A (en) 2020-11-20

Family

ID=73365455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010812188.5A Pending CN111967583A (en) 2020-08-13 2020-08-13 Method, apparatus, device and medium for compressing neural network

Country Status (1)

Country Link
CN (1) CN111967583A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950221A (en) * 2021-03-26 2021-06-11 支付宝(杭州)信息技术有限公司 Method and device for establishing wind control model and risk control method and device
CN113537490A (en) * 2021-07-13 2021-10-22 广州虎牙科技有限公司 Neural network cutting method and electronic equipment
CN113554169A (en) * 2021-07-28 2021-10-26 杭州海康威视数字技术股份有限公司 Model optimization method and device, electronic equipment and readable storage medium
CN113554169B (en) * 2021-07-28 2023-10-27 杭州海康威视数字技术股份有限公司 Model optimization method, device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US11797853B2 (en) Processing for multiple input data sets
Dai et al. Compressing neural networks using the variational information bottleneck
US11501192B2 (en) Systems and methods for Bayesian optimization using non-linear mapping of input
CN111967583A (en) Method, apparatus, device and medium for compressing neural network
US20180260710A1 (en) Calculating device and method for a sparsely connected artificial neural network
Park et al. Big/little deep neural network for ultra low power inference
US10853722B2 (en) Apparatus for executing LSTM neural network operation, and operational method
US11775832B2 (en) Device and method for artificial neural network operation
CN111368935B (en) SAR time-sensitive target sample amplification method based on generation countermeasure network
US10733498B1 (en) Parametric mathematical function approximation in integrated circuits
US20180293486A1 (en) Conditional graph execution based on prior simplified graph execution
CN112926570A (en) Adaptive bit network quantization method, system and image processing method
CN113919484A (en) Structured pruning method and device based on deep convolutional neural network model
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
CN116188878A (en) Image classification method, device and storage medium based on neural network structure fine adjustment
US11423313B1 (en) Configurable function approximation based on switching mapping table content
CN113222014A (en) Image classification model training method and device, computer equipment and storage medium
CN112016670A (en) Model optimization and compression method for lightweight neural network
Wang et al. GFR: Generic feature representations for class incremental learning
Chin et al. A high-performance adaptive quantization approach for edge CNN applications
CN114065913A (en) Model quantization method and device and terminal equipment
CN115879531A (en) Method and apparatus for compressing a classification neural network
CN113590720A (en) Data classification method and device, computer equipment and storage medium
CN113642592A (en) Training method of training model, scene recognition method and computer equipment
CN116992944B (en) Image processing method and device based on leavable importance judging standard pruning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination