WO2020133364A1

WO2020133364A1 - Neural network compression method and apparatus

Info

Publication number: WO2020133364A1
Application number: PCT/CN2018/125372
Authority: WO
Inventors: 朱佳峰; 魏巍; 卢惠莉
Original assignee: 华为技术有限公司
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2020-07-02
Also published as: CN113168565A

Abstract

A neural network compression method and apparatus, used to solve the problem in the prior art of inflexible model compression. The method comprises: according to an initial weight value threshold of an ith layer of an initial neural network model, cropping an initial weight value of the ith layer, and obtaining a cropped neural network model, wherein i is any positive integer from 1 to m, and m is the total number of layers of the neural network model; performing multiple rounds of training on the cropped neural network model, and during a tth round of training, determining a weight value threshold of the ith layer during the tth round of training according to a weight value threshold of the ith layer of the neural network model obtained from a tth-1 round of training, and cropping a current weight value of the ith layer during the tth round of training according to the weight value threshold of the ith layer during the tth round of training, wherein t is any positive number from 1 to q, and q is the total number of rounds of training. The present invention thus adaptively adjusts the weight value threshold of each cropping, and flexibly compresses the neural network.

Description

Neural network compression method and device

Technical field

This application relates to the field of neural networks, and in particular to a neural network compression method and device.

Background technique

With the continuous development of smart chips, deep learning technology has been increasingly applied to smart devices around people, such as mobile phones, smart home devices, wearable devices, and vehicle-mounted devices. However, the consumption of hardware resources by deep learning models (that is, neural network models) is still huge, such as storage space, memory, central processing unit (CPU) / neural network processor (network processing unit, NPU) calculation Resources, battery life, etc. For example, for embedded devices, even if an NPU specifically used to accelerate deep learning is used, the hardware resources are very limited, and it is difficult to carry the business needs of multiple models, which seriously restricts the performance and user experience of smart devices.

In order to solve the above problems, the compression technology for deep learning models came into being. This type of technology can eliminate a large amount of redundancy of the model's weight parameters within a tolerable drop in accuracy, and greatly reduce the size of the model. Sparse weights are currently commonly used compression techniques. Specifically, the weight sparseness achieves the weight sparseness by discarding a part of the small weight connections (weight is set to 0), thereby eliminating redundancy and speeding up operations. The weight sparseness can not only reduce the model volume by cutting out the redundant weights of the model, but more importantly, by excluding a part of the smaller weights, the redundant weak performance branches of the model can be cut out, and the residuals with superior performance can be strengthened through subsequent training. Branches to improve the final accuracy of the model.

In the current application of weight sparseness, the most commonly used method is to set the weights less than the threshold to zero based on the weight threshold to perform weight sparseness. However, in practical applications, the configuration of the weight threshold is relatively blind, usually relying on human experience to set, and iteratively needs to repeatedly train multiple settings to select a setting with better model accuracy.

Obviously, the above method takes a long time to achieve model sparseness, and the weight threshold is easily affected by human factors, resulting in unstable model performance. In other words, the above method is not flexible when implementing model compression.

Summary of the invention

Embodiments of the present application provide a neural network compression method and device to solve the problem of inflexible model compression in the prior art.

In the first aspect, the present application provides a neural network compression method, according to the initial weight threshold of the i-th layer of the initial neural network model, the initial weight of the i-th layer is clipped to obtain a clipped neural network model , The i takes any positive integer from 1 to m, where m is the total number of layers of the neural network model; multiple trainings are performed on the cropped neural network model, during the t-th training process , The weight threshold of the i-th layer during the t-th training is determined according to the weight threshold of the i-th layer of the neural network model obtained from the t-1th training, and the weight threshold of the i-th layer during the t-th training Crop the current weight value of the i-th layer during the t-th training; the t takes any positive number from 1 to q, and the q is the total number of trainings.

Through the above method, the weight threshold of each crop can be adjusted adaptively, which means that the neural network can be flexibly compressed, which can avoid the influence of human factors caused by artificially setting the weight threshold, thereby enhancing the stability of the performance of the neural network model. Sex.

In a possible design, before clipping the initial weight value of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model, obtain the i-th layer of the initial neural network model Initial weights; then determine the mean and standard deviation of the initial weights of the i-th layer according to the initial weights of the i-th layer; and finally determine the average weights and standard deviations of the initial weights of the i-th layer The initial weight threshold of the i-th layer.

In the above manner, the initial weight threshold of each layer can be accurately determined, so that the weight of each layer of the initial neural network can be tailored.

In a possible design, the initial weight threshold of the i-th layer may conform to the following formula:

T _i = μ _i -λ·σ _i

Where T _i is the initial weight threshold of the i-th layer of the initial neural network model; μ _i is the mean of the initial weight of the i-th layer; σ _i is the standard deviation of the initial weight of the i-th layer; λ is the set Fixed value, λ≥0.

Through the above formula, the initial weight threshold of each layer can be accurately determined, so that the weight of each layer of the initial neural network can be tailored.

In a possible design, the weight threshold of the i-th layer during the t-th training can be determined according to the weight threshold of the i-th layer of the neural network model obtained from the t-1th training, which can meet the following formula:

among them,

Is the weight threshold of the i-th layer during the t-th training;

Is the weight threshold of the i-th layer during the t-1th training;

The gradient mean of the non-zero weights in the i-th layer of the neural network model obtained from the t-1th training;

Is the gradient of the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training; N _i is the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training Number, N _i is a positive integer greater than 1.

Through the above method, the weight threshold of each layer can be adaptively obtained during each training process, so that neural network compression can be flexibly performed.

In a possible design, the weight of any layer is tailored according to the weight threshold of any layer. The specific method may be: the weight of any layer is less than the weight of the any layer The weight of the weight threshold is set to zero, and the weight of the weight of any layer that is greater than or equal to the weight threshold of the any layer is kept unchanged. This can successfully complete the weight cutting of each layer.

In a second aspect, the present application also provides a neural network compression device, which has the function of implementing the method of the first aspect described above. The function can be realized by hardware, or can also be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.

In a possible design, the structure of the neural network compression device may include a weight cutting unit and a training unit, and these units may perform the corresponding functions in the above method examples. For details, see the detailed description in the method examples. To repeat.

In a possible design, the structure of the neural network compression device may include a processor and a memory, and the processor is configured to perform the above-mentioned method. The memory is coupled to the processor, and stores necessary program instructions and data of the neural network compression device.

In a third aspect, the present application also provides a computer storage medium that stores computer-executable instructions, which when used by the computer are used to cause the computer to execute the first Any one of the methods mentioned on the one hand.

According to a fourth aspect, the present application also provides a computer program product containing instructions, which when executed on a computer, causes the computer to perform any of the methods mentioned in the first aspect above.

According to a fifth aspect, the present application further provides a chip coupled to a memory, and used to read and execute program instructions stored in the memory to implement any of the methods mentioned in the first aspect above.

BRIEF DESCRIPTION

1 is a schematic diagram of a neural network provided by an embodiment of this application;

2 is a structural diagram of a computer device provided by an embodiment of the present application;

3 is a flowchart of a neural network compression method provided by an embodiment of this application;

4 is a flowchart of an example of a neural network compression method provided by an embodiment of the present application;

5 is a schematic structural diagram of a neural network compression device provided by an embodiment of the present application;

6 is a structural diagram of a neural network compression device provided by an embodiment of the present application.

detailed description

The application will be described in further detail below with reference to the drawings.

Embodiments of the present application provide a neural network compression method and device to solve the problem of inflexible model compression in the prior art. Among them, the method and the device described in this application are based on the same inventive concept. Since the principles of the method and the device to solve the problem are similar, the implementation of the device and the method can be referred to each other, and the repetition is not repeated.

In the following, the neural network in this application will be explained to facilitate the understanding of those skilled in the art:

Neural network is to imitate the behavior characteristics of animal neural network, similar to the structure of brain synapse connection for data processing. As a mathematical operation model, a neural network consists of a large number of nodes (or neurons) connected to each other. The neural network consists of an input layer, a hidden layer, and an output layer, such as shown in Figure 1. Among them, the input layer is the input data of the neural network; the output layer is the output data of the neural network; and the hidden layer is composed of many nodes connected between the input layer and the output layer, and is used to perform arithmetic processing on the input data. Among them, the hidden layer may be composed of one or more layers. The number of hidden layers in the neural network and the number of nodes are directly related to the complexity of the problem actually solved by the neural network, the number of nodes in the input layer and the number of nodes in the output layer.

In the embodiment of the present application, the neural network compression method may be but not limited to a processor. When the neural network compression device is a processor, the processor may be a processor in a computer device or other The processor in the device (for example, a chip system) may also be a separate processor. In the embodiments of the present application, a description will be made by taking a processor in a computer device executing a neural network compression method as an example.

In order to more clearly describe the technical solutions of the embodiments of the present application, the neural network compression method and device provided by the embodiments of the present application will be described in detail below with reference to the drawings.

FIG. 2 shows a structural diagram of a possible computer device applicable to the neural network compression method provided by the embodiment of the present application. Referring to FIG. 2, the computer device includes: a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, a power supply 260 and other components. Those skilled in the art can understand that the structure of the computer device shown in FIG. 2 does not constitute a limitation on the computer device. The computer device provided in the embodiments of the present application may include more or less components than the illustration, or a combination of Components, or different component arrangements.

The following describes each component of the computer device in detail with reference to FIG. 2:

The communication module 230 may be connected to other devices through a wireless connection or a physical connection to implement data transmission and reception by a computer device. Optionally, the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, etc. This embodiment of the present application does not make any limited.

The memory 220 can be used to store program instructions and data. The processor 210 executes program instructions stored in the memory 220 to execute various functional applications and data processing of the computer device. Among the program instructions, there are program instructions that enable the processor 210 to execute the neural network compression method provided by the following embodiments of the present application.

Optionally, the memory 220 may mainly include a program storage area and a data storage area. Among them, the storage program area can store the operating system, various application programs, and program instructions; the storage data area can store various data such as neural networks. In addition, the memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.

The input unit 240 may be used to receive information such as data or operation instructions input by the user. Optionally, the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.

The display unit 250 can realize human-computer interaction, and is used to display information input by the user and information provided to the user through the user interface. Wherein, the display unit 250 may include a display panel 251. Optionally, the display panel 251 may be configured in the form of a liquid crystal display (liquid crystal) (LCD), an organic light-emitting diode (OLED), or the like.

Further, when the input unit includes a touch panel, the touch panel can cover the display panel 251, and when the touch panel detects a touch event on or near it, it is transmitted to the processor 210 To determine the type of touch event to perform the corresponding operation.

The processor 210 is a control center of a computer device, and uses various interfaces and lines to connect the above components. The processor 210 may execute program instructions stored in the memory 220 and call data stored in the memory 220 to complete various functions of the computer device and implement neural network compression provided by the embodiments of the present application method.

Optionally, the processor 210 may include one or more processing units. In an implementable manner, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application program, and the like, and the modem process The device mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 210. In the embodiment of the present application, the processing unit may compress the neural network. For example, the processor 210 may be a central processing unit (CPU), a graphics processor (Graphics Processing Unit, GPU), or a combination of CPU and GPU. The processor 210 may also be a network processor (network processor) unit (NPU), a tensor processor (tensor processing unit, TPU), and other artificial intelligence (AI) chips that support neural network processing. The processor 210 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing device (DSP), or a combination thereof. The PLD may be a complex programmable logic device (complex programmable logic device (CPLD), field programmable gate array (FPGA), general array logic (GAL) or any combination thereof.

The computer device also includes a power source 260 (such as a battery) for powering various components. Optionally, the power supply 260 may be logically connected to the processor 210 through a power management system, so as to realize functions such as charging and discharging the computer device through the power management system.

Although not shown, the computer device may further include components such as a camera, a sensor, and an audio collector, which will not be repeated here.

A neural network compression method provided by an embodiment of the present invention is applicable to the computer device shown in FIG. 2 and the neural network shown in FIG. 1. The method may be executed by a processor in the computer device. Referring to FIG. 3, the specific flow of the method may include:

Step 301: The processor cuts the initial weight of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model to obtain a cropped neural network model, and the i is taken from 1 to m Any positive integer of, m is the total number of layers of the neural network model.

As we all know, the neural network model has many layers, that is, m is a positive integer greater than 1. When training a neural network or processing data based on a neural network model, it is usually layered training or processing. For example, in the process in step 301, when the initial neural network model is trimmed, the same operations are performed on each layer of the initial neural network model, that is, one layer at a time, and each layer is based on The initial weight threshold of the layer trims the initial weight of the layer.

In an optional implementation manner, the processor may obtain the first weight of the initial weight of the i-th layer before trimming the initial weight of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model The initial weights of the i-th layer of the initial neural network model; then the mean and standard deviation of the initial weights of the i-th layer are determined according to the initial weights of the i-th layer, and according to the The mean and standard deviation of the initial weights determine the initial weight threshold of the i-th layer.

Exemplarily, the initial weight threshold of the i-th layer may meet the following formula 1:

T _i =μ _i -λ·σ _i formula one

Where, in the above formula 1, T _i is the initial weight threshold of the i-th layer of the initial neural network model; μ _i is the average value of the initial weight of the i-th layer; σ _i is the standard of the initial weight of the i-th layer Poor; λ is the set value, λ≥0.

In an optional embodiment, the mean value of the initial weights of the i-th layer may meet the following formula 2:

In the above formula 2, ω _in is the weight of the i-th layer of the initial neural network model; P _i is the number of weights of the i-th layer of the neural network model, and P _i is a positive integer greater than 1 .

In an optional implementation manner, the standard deviation σ _i of the initial weight of the i-th layer may meet the following formula 3:

In an optional embodiment, before acquiring the initial weight value of the i-th layer of the initial neural network model, the processor needs to train the neural network to obtain the ownership value of the neural network, and then obtain the initial Neural network model. Exemplarily, training the neural network to obtain the ownership value in the neural network may specifically include: building the neural network structure and the ownership value in the neural network through data input and neural network model construction.

In an optional implementation manner, when the processor trims the initial weight value of the i-th layer according to the initial weight threshold value of the i-th layer, the specific process may be that the processor will The weight in the i-th layer that is smaller than the initial weight threshold of the i-th layer is set to zero, and the weight in the i-th layer that is not smaller than the initial weight threshold of the i-th layer is kept unchanged. It should be noted that after the above-mentioned tailoring process, only some weights in the i-th layer are set to zero, not to delete the corresponding branches, that is to say, the branch with zero weights in the neural network model still exists, but the weights It's just zero. In the same way, the principle of zero-setting of weights (that is, the tailoring method) involved in the subsequent process is the same, and the specific follow-up will not be described in detail.

Step 302: The processor performs multiple trainings on the cropped neural network model. During the t-th training process, the first threshold is determined according to the weight threshold of the i-th layer of the neural network model obtained from the t-1 training The weight threshold of the i-th layer during t times of training, according to the weight threshold of the i-th layer during the t-th training, the current weight of the i-th layer during the t-th training is cropped; Pass any positive number from 1 to q, where q is the total number of times of training.

Based on step 301, after the processor trims the weights of all layers of the initial neural network, the processor executes step 302.

In an optional embodiment, the processor determines the weight threshold of the i-th layer during the t-th training according to the weight threshold of the i-th layer of the neural network model obtained from the t-1th training, which may meet the following Formula 4:

Among them, in the above formula 4,

Is the weight threshold of the i-th layer during the t-th training;

Is the weight threshold of the i-th layer during the t-1th training;

Is the gradient mean of the non-zero weights in the i-th layer of the neural network model obtained from the t-1th training;

In an optional embodiment, during each training process, the processor compares the current value of the i-th layer during the t-th training according to the weight threshold of the i-th layer during the t-th training When cutting weights, the specific formula 5 can be met:

In the above formula 5, ω ^t _in is the weight of the i-th layer during the t-th training.

It can be clearly seen from the above formula 5 that the processor sets zero when the absolute value of the weight in the i-th layer is less than the corresponding weight threshold, otherwise it remains unchanged. The specific principle is the same as that involved in step 301 The principle of trimming the weights of each layer in the initial neural network model is similar, and you can refer to each other for details.

After cutting the weights of each layer through the above, when training the neural network model (such as forward reasoning and back propagation update), you can exclude weights equal to 0 from matrix operations, for example, when a certain During the sub-matrix operation, if the input weight vector is a zero vector, the operation can be skipped directly to achieve the effect of speeding up the operation.

It can be understood that the above training process is a cyclic process, the weight of each training process is the weight obtained after the last training, and the weight threshold of each layer in each training is based on the last time The weight threshold of this layer is obtained during training. In each training, the weights of all layers of the neural network model are processed, and then the next training is performed until the training results meet certain conditions and the training ends. Exemplarily, when the training result meets certain conditions, the weight threshold may be converged, and so on. Through the above training method, in the neural network compression process, the weight threshold is no longer dependent on the setting and does not need to be repeatedly tried and trained, but the weight threshold can be adaptively adjusted according to the actual tailoring situation, which can be flexible Realize the compression of the neural network without being affected by human factors, which can make the final neural network model more stable.

It should be noted that in step 302, the first training is the first training of the cropped neural network model. When t is 1, the nerve obtained by training at the t-1th time (that is, the 0th time) The weight threshold of the i-th layer of the network model is the initial weight threshold used for the i-th layer of the initial neural network model during the cutting process. That is to say, the weight threshold of the i-th layer during the first training is obtained based on the initial weight threshold corresponding to the i-th layer.

Using the neural network compression method provided by the embodiment of the present application, the processor can adaptively adjust the weight threshold of each crop, that is, it can flexibly perform neural network compression, which can avoid the artificial factors caused by artificially setting the weight threshold. Influence, which can enhance the performance stability of neural network models.

Based on the above embodiments, the embodiments of the present application also provide an example of a neural network compression method, which is applicable to the computer device shown in FIG. 2 and the neural network shown in FIG. 1. The specific process of the example shown in FIG. 4 may include the following steps:

Step 401: The processor obtains a layer of initial weights from the initial neural network model.

Step 402: The processor determines the mean and variance of the initial weight of the layer according to the obtained initial weight of the layer.

Step 403: The processor determines the initial weight threshold of the layer according to the mean and variance of the initial weight of the layer.

Step 404: The processor trims the initial weight of the layer according to the initial weight threshold of the layer.

Step 405: The processor updates the initial weight value of the layer to the cropped weight value of the layer (that is, the processor writes the cropped weight value back to the neural network model).

Step 406: The processor determines whether the weights of all layers of the initial neural network model have been clipped. If yes, step 407 is executed to enter the neural network model training process, otherwise step 401 is executed.

Step 407: The processor obtains the gradient corresponding to the weight of a layer of the current neural network model and the weight threshold of the layer during the last training.

Step 408: The processor determines the weight threshold of the layer during the current training according to the gradient corresponding to the weight of the layer and the weight threshold of the layer during the previous training.

It should be noted that the weight threshold of the layer during the first training is determined based on the initial threshold of the layer when the weight of the layer of the initial neural network model is trimmed.

Step 409: The processor trims the current weight of the layer according to the determined weight threshold of the layer.

Step 410: The processor updates the weight of the layer in the neural network model. (That is, the processor writes back the weighted value of the layer after cropping back to the neural network model).

Step 411: The processor judges whether the weights of all layers have been processed in the current training. If yes, step 412 is entered; otherwise, step 407 is repeated.

Step 412: The processor completes a neural network model training.

Step 413: The processor judges whether the training of the neural network model is finished, if it is, then it is finished, otherwise step 407 is executed.

Based on the above example, the processor can adaptively adjust the weight threshold of each crop, that is to say, it can flexibly compress the neural network, which can avoid the influence of human factors caused by artificially setting the weight threshold, so that the neural network model can be enhanced Performance stability.

Based on the above embodiment, the embodiment of the present application further provides a neural network compression device, which is used to implement the embodiment shown in FIGS. 3 and 4 to provide a neural network compression method. Referring to FIG. 5, the neural network compression device 500 includes a weight trimming unit 501 and a training unit 502, where:

The weight cropping unit 501 is used to crop the initial weight of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model to obtain a cropped neural network model. Any positive integer from 1 to m, where m is the total number of layers of the neural network model;

The training unit 502 is used to perform multiple trainings on the cropped neural network model;

The weight cropping unit 501 is further used to determine the first weight threshold of the i-th layer of the neural network model obtained by the training unit 502 during the t-th training process during the t-th training process. The weight threshold of the i-th layer during t times of training, according to the weight threshold of the i-th layer during the t-th training, the current weight of the i-th layer during the t-th training is cropped; Any positive number from 1 to q, where q is the total number of trainings.

In an optional embodiment, the neural network compression device may further include a weight acquisition unit 503 and a threshold determination unit 504 as shown in FIG. 5, specifically:

The weight obtaining unit 503 is used to obtain the initial nerve before the weight cutting unit 501 crops the initial weight of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model The initial weight of the i-th layer of the network model; the threshold determination unit 504 is used to determine the mean and standard deviation of the initial weight of the i-th layer according to the initial weight of the i-th layer; The mean and standard deviation of the initial weight of the i-th layer determine the initial weight threshold of the i-th layer.

Exemplarily, the functions of the weight acquisition unit 503 and the threshold determination unit 504 may also be directly implemented by the weight clipping unit 501, which is not limited in this application.

In an optional implementation manner, the initial weight threshold of the i-th layer may conform to the following formula:

T _i = μ _i -λ·σ _i

In an optional embodiment, the weight clipping unit 501 determines the weight threshold of the i-th layer at the t-th training based on the weight threshold of the i-th layer of the neural network model obtained at the t-1th training Can meet the following formula:

among them,

Is the weight threshold of the i-th layer during the t-th training;

Is the weight threshold of the i-th layer during the t-1th training;

In an optional implementation manner, the weight clipping unit 501 is specifically used to: cut the weight of any layer when cutting the weight of the any layer according to the weight threshold of any layer The weight value less than the weight threshold value of any layer is set to zero, and the weight value of the weight value of any layer that is greater than or equal to the weight threshold value of the any layer is kept unchanged.

The neural network compression device provided by the embodiment of the present application can adaptively adjust the weight threshold of each crop, that is, the neural network compression can be flexibly performed, and the influence of human factors caused by artificially setting the weight threshold can be avoided. Therefore, the performance stability of the neural network model can be enhanced.

It should be noted that the division of the units in the embodiments of the present application is schematic, and is only a division of logical functions, and there may be another division manner in actual implementation. The functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or software function unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially or part of the contribution to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , Including several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk and other media that can store program codes .

Based on the above embodiments, an embodiment of the present application further provides a neural network compression device, which is used to implement the neural network compression method shown in FIG. 3 or FIG. 4. Referring to FIG. 6, the neural network compression device 600 may include: a processor 601 and a memory 602, where:

The processor 601 may be a CPU, GPU, or a combination of CPU and GPU. The processor 601 may also be an NPU, TPU, etc. AI chip that supports neural network processing. The processor 601 may further include a hardware chip. The above hardware chip may be ASIC, PLD, DSP or a combination thereof. The above PLD may be CPLD, FPGA, GAL or any combination thereof. It should be noted that the processor 601 is not limited to the above-mentioned cases, and the processor 601 may be any processing device capable of implementing neural network operations.

The processor 601 and the memory 602 are connected to each other. Optionally, the processor 601 and the memory 602 are connected to each other through a bus 603; the bus 603 may be a Peripheral Component Interconnect (PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture) , EISA) bus and so on. The bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only a thick line is used in FIG. 6, but it does not mean that there is only one bus or one type of bus.

In an implementation manner, when the processor 601 is used to implement the neural network compression method provided by the embodiment shown in FIG. 3 of the present application, it may specifically perform

steps

301 and 302 in the above embodiments shown in FIG. 3 Operation, and other operations in

steps

301 and 302 can be performed, the specific descriptions involved can refer to each other, and will not be repeated here.

In another implementation manner, when the processor 601 is used to implement the neural network compression method provided by the embodiment shown in FIG. 4 of the present application, it may specifically execute steps 401 to 413 in the above embodiments shown in FIG. 4 The operations in can be referred to each other, and will not be repeated here.

The memory 602 is used to store programs and data. Specifically, the program may include program code, and the program code includes instructions for computer operation. The memory 602 may include random access memory (random access memory, RAM), or may also include non-volatile memory (non-volatile memory), for example, at least one magnetic disk memory. The processor 601 executes the program stored in the memory 602 to realize the above functions, thereby implementing the neural network compression method shown in FIG. 3 or FIG. 4.

It should be noted that, when the neural network compression device shown in FIG. 6 can be applied to a computer device, the neural network compression device may be embodied as the computer device shown in FIG. 2. At this time, the processor 601 may be the same as the processor 210 shown in FIG. 2, and the memory 602 may be the same as the memory 220 shown in FIG. 2.

In summary, the neural network compression method and device provided by the embodiments of the present application can adaptively adjust the weight threshold of each crop, that is to say, the neural network compression can be flexibly performed, which can avoid artificially setting the weight threshold band The influence of human factors can enhance the performance stability of the neural network model.

Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.

This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowchart and/or block diagram and a combination of the flow and/or block in the flowchart and/or block diagram may be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processing machine, or other programmable data processing device to produce a machine that enables the generation of instructions executed by the processor of the computer or other programmable data processing device A device for realizing the functions specified in one block or multiple blocks of one flow or multiple flows of a flowchart and/or one block or multiple blocks of a block diagram.

These computer program instructions may also be stored in a computer readable memory that can guide a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer readable memory produce an article of manufacture including an instruction device, the instructions The device implements the functions specified in one block or multiple blocks of the flowchart one flow or multiple flows and/or block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, which is executed on the computer or other programmable device The instructions provide steps for implementing the functions specified in one block or multiple blocks of the flowchart one flow or multiple flows and/or block diagrams.

Although the preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.

Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the scope of the embodiments of the present application. In this way, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these modifications and variations.

Claims

A neural network compression method, characterized in that it includes:

According to the initial weight threshold of the i-th layer of the initial neural network model, the initial weights of the i-th layer are clipped to obtain a clipped neural network model, and i is taken through any positive integer from 1 to m , M is the total number of layers of the neural network model;

Perform multiple trainings on the cropped neural network model. During the t-th training process, determine the i-th layer during the t-th training based on the weight threshold of the i-th layer of the neural network model obtained from the t-1th training The weight threshold of, the current weight of the i-th layer during the t-th training is clipped according to the weight threshold of the i-th layer during the t-th training; the t takes any one of 1 to q Positive number, the q is the total number of times of training.
The method according to claim 1, wherein before the initial weight of the i-th layer is clipped according to the initial weight threshold of the i-th layer of the initial neural network model, the method further comprises:

Acquiring the initial weight of the i-th layer of the initial neural network model;

Determining the mean and standard deviation of the initial weight of the i-th layer according to the initial weight of the i-th layer;

The initial weight threshold of the i-th layer is determined according to the mean and standard deviation of the initial weight of the i-th layer.
The method according to claim 1 or 2, wherein the initial weight threshold of the i-th layer conforms to the following formula:

T i = μ i -λ·σ i

Where T i is the initial weight threshold of the i-th layer of the initial neural network model; μ i is the mean of the initial weight of the i-th layer; σ i is the standard deviation of the initial weight of the i-th layer; λ is the set Fixed value, λ≥0.
The method according to any one of claims 1 to 3, wherein the weight threshold of the i-th layer during the t-th training is determined according to the weight threshold of the i-th layer of the neural network model obtained by the t-1th training , In line with the following formula:

among them,
Is the weight threshold of the i-th layer during the t-th training;
Is the weight threshold of the i-th layer during the t-1th training;
Is the gradient mean of the non-zero weights in the i-th layer of the neural network model obtained from the t-1th training;
Is the gradient of the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training; N i is the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training Number, N i is a positive integer greater than 1.
The method according to any one of claims 1 to 4, wherein the weighting of any layer according to the weighting threshold of any layer includes:

Set the weights of the weights of any layer less than the weight threshold of the any layer to zero, and set the weights of any layer greater than or equal to the weight threshold of the any layer The weights remain unchanged.
A neural network compression device, characterized in that it includes:

The weight clipping unit is used to crop the initial weight of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model to obtain a cropped neural network model, where i is taken from 1 to any positive integer in m, where m is the total number of layers of the neural network model;

A training unit, configured to perform multiple trainings on the cropped neural network model;

The weight clipping unit is also used to determine the t-th time according to the weight threshold of the i-th layer of the neural network model obtained by the t-1th training of the training unit during the t-th training process of the training unit The weight threshold of the i-th layer during training, according to the weight threshold of the i-th layer during the t-th training, the current weight value of the i-th layer during the t-th training is clipped; Any positive number in q, where q is the total number of times of training.
The apparatus of claim 6, further comprising:

The weight acquisition unit is used to acquire the initial neural network model's initial weight before the weight clipping unit cuts the initial weight of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model The initial weight of the i-th layer;

The threshold determining unit is used to determine the mean and standard deviation of the initial weight of the i-th layer according to the initial weight of the i-th layer; determine the value of the mean and standard deviation of the initial weight of the i-th layer The initial weight threshold of the i-th layer.
The device according to claim 6 or 7, wherein the initial weight threshold of the i-th layer conforms to the following formula:

T i = μ i -λ·σ i

Where T i is the initial weight threshold of the i-th layer of the initial neural network model; μ i is the mean of the initial weight of the i-th layer; σ i is the standard deviation of the initial weight of the i-th layer; λ is the set Fixed value, λ≥0.
The device according to any one of claims 6-8, wherein the weight cutting unit determines the t-th training time based on the weight threshold of the i-th layer of the neural network model obtained from the t-1th training The weight threshold of the i-layer conforms to the following formula:

among them,
Is the weight threshold of the i-th layer during the t-th training;
Is the weight threshold of the i-th layer during the t-1th training;
Is the gradient mean of the non-zero weights in the i-th layer of the neural network model obtained from the t-1th training;
Is the gradient of the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training; N i is the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training Number, N i is a positive integer greater than 1.
The device according to any one of claims 6-9, wherein the weight clipping unit is specifically used when clipping the weight of any layer according to the weight threshold of any layer :

Set the weights of the weights of any layer less than the weight threshold of the any layer to zero, and set the weights of any layer greater than or equal to the weight threshold of the any layer The weights remain unchanged.
A neural network compression device, characterized in that it includes:

Memory, used to store program instructions;

A processor, configured to couple with the memory, call program instructions in the memory, and perform the following operations:

According to the initial weight threshold of the i-th layer of the initial neural network model, the initial weights of the i-th layer are clipped to obtain a clipped neural network model, and i is taken through any positive integer from 1 to m , M is the total number of layers of the neural network model;

Perform multiple trainings on the cropped neural network model. During the t-th training process, determine the i-th layer during the t-th training based on the weight threshold of the i-th layer of the neural network model obtained from the t-1th training The weight threshold of, the current weight of the i-th layer during the t-th training is clipped according to the weight threshold of the i-th layer during the t-th training; the t takes any one of 1 to q Positive number, the q is the total number of times of training.
The apparatus according to claim 11, wherein the processor is further configured to:

Acquiring the initial weight of the i-th layer of the initial neural network model before trimming the initial weight of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model;

Determining the mean and standard deviation of the initial weight of the i-th layer according to the initial weight of the i-th layer;

The initial weight threshold of the i-th layer is determined according to the mean and standard deviation of the initial weight of the i-th layer.
The apparatus according to claim 11 or 12, wherein the initial weight threshold of the i-th layer conforms to the following formula:

T i = μ i -λ·σ i

Where T i is the initial weight threshold of the i-th layer of the initial neural network model; μ i is the mean of the initial weight of the i-th layer; σ i is the standard deviation of the initial weight of the i-th layer; λ is the set Fixed value, λ≥0.
The device according to any one of claims 11-13, wherein the processor determines the i-th layer at the t-th training according to the weight threshold of the i-th layer of the neural network model obtained at the t-1th training The weight threshold of, meets the following formula:

among them,
Is the weight threshold of the i-th layer during the t-th training;
Is the weight threshold of the i-th layer during the t-1th training;
Is the gradient mean of the non-zero weights in the i-th layer of the neural network model obtained from the t-1th training;
Is the gradient of the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training; N i is the non-zero weights in the i-th layer of the neural network model obtained from the t-1 training Number, N i is a positive integer greater than 1.
The apparatus according to any one of claims 11-14, wherein the processor, when tailoring the weight of any layer according to the weight threshold of any layer, is specifically used to:

Set the weights of the weights of any layer less than the weight threshold of the any layer to zero, and set the weights of any layer greater than or equal to the weight threshold of the any layer The weights remain unchanged.
A computer program product containing instructions, characterized in that, when the computer program product runs on a computer, it causes the computer to execute the method according to any one of claims 1-5.
A computer storage medium, characterized in that a computer program is stored in the computer storage medium, and when the computer program is executed by a computer, the computer is caused to execute the method according to any one of claims 1-5.
A chip, characterized in that the chip is coupled to a memory, and is used to read and execute program instructions stored in the memory to implement the method according to any one of claims 1-5.