CN112183746A - Neural network pruning method, system and device for sensitivity analysis and reinforcement learning - Google Patents

Neural network pruning method, system and device for sensitivity analysis and reinforcement learning

Info

Publication number
CN112183746A
CN112183746A
Authority
CN
China
Prior art keywords
precision
reinforcement learning
weight
pruning
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011056171.8A
Other languages
Chinese (zh)
Inventor
陈海波
关翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenlan Artificial Intelligence Shenzhen Co Ltd
Original Assignee
Shenlan Artificial Intelligence Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenlan Artificial Intelligence Shenzhen Co Ltd filed Critical Shenlan Artificial Intelligence Shenzhen Co Ltd
Priority to CN202011056171.8A priority Critical patent/CN112183746A/en
Publication of CN112183746A publication Critical patent/CN112183746A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a neural network pruning method, system and device based on sensitivity analysis and reinforcement learning, comprising the following steps: setting sparsity thresholds and selecting the weights with low sensitivity for pruning; obtaining a clipping method and its accuracy by determining, from the sensitivity analysis, which weights to randomly prune, randomly clipping each selected weight, and putting the clipping methods and accuracies of multiple rounds of random clipping into a buffer; training a reinforcement learning agent on the data in the buffer, and putting the clipping method and accuracy produced after training back into the buffer; and repeating these steps until the network accuracy reaches a preset value. The invention selects the low-sensitivity weights for pruning and sets a sparsity threshold for each weight, so that after each selected weight is clipped at its current sparsity the accuracy drop of the network stays within a preset range. While guaranteeing the network accuracy, the compression ratio of the neural network is maximized.

Description

Neural network pruning method, system and device for sensitivity analysis and reinforcement learning
Technical Field
The application relates to the technical field of deep learning compression, in particular to a neural network pruning method based on sensitivity analysis and reinforcement learning.
Background
Pruning is a compression technique for convolutional neural networks (CNNs), used mainly to reduce their computational load. A pruning algorithm typically reduces the computation of the whole network by cutting away unimportant tensors from the network's weights.
Which tensors in the weights are unimportant is determined by their sparsity: the ratio of the number of zero elements in a tensor to its total size. Cutting away the weight tensors with higher sparsity therefore compresses the CNN.
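To make the definition concrete, sparsity can be computed as the fraction of zero elements in a tensor (a minimal sketch in plain Python; the function name is ours, not the patent's):

```python
def sparsity(tensor):
    """Sparsity = (number of zero elements) / (total number of elements)."""
    flat = [x for row in tensor for x in row]
    return flat.count(0.0) / len(flat)

# A 2x4 weight tensor with 4 zeros among its 8 elements:
w = [[0.0, 0.5, 0.0, -1.2],
     [0.3, 0.0, 0.0, 2.1]]
print(sparsity(w))  # 0.5
```

A tensor whose sparsity is high carries little information, so clipping it removes computation at small cost to accuracy.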
The criterion for CNN compression is to preserve the accuracy of the network while reducing its computation. Sensitivity analysis was proposed in document 1 to decide how sparse a weight tensor must be before it can be clipped: the tensors of each weight are clipped independently, and the accuracy of the resulting network is measured on a validation set. In this way the sensitivity of each weight can be analyzed to determine how much of the current weight's tensor may be clipped. However, pruning based on sensitivity analysis treats each weight in isolation and ignores the correlations between different weights, so it cannot reach the best compression efficiency.
Neural network pruning based on reinforcement learning is an automatic pruning technique: it automatically analyzes the sparsity of the network's weights and then makes a reasonable pruning decision. In most cases both the accuracy and the compression ratio of the pruned network are good.
Reinforcement-learning-based pruning proceeds in three steps. In the first step, multiple weights of the neural network are randomly clipped, the clipped network is fine-tuned, its accuracy is recorded, and the clipping method together with the resulting accuracy is put into a data buffer; this step is repeated n times. Once enough data has accumulated in the buffer, the second step trains a reinforcement learning agent on the buffered data. In the third step, the trained agent predicts a concrete clipping action, the network is pruned accordingly and fine-tuned, the post-fine-tuning accuracy is recorded, and the predicted action together with its accuracy is added to the buffer; the process then jumps back to the second step, and the loop stops when the accuracy after fine-tuning reaches the expected value.
The reinforcement-learning approach, in effect, teaches the agent which way of clipping the network yields a high return (network accuracy). This requires that the data used to train the agent be good enough and contain sufficiently complete information. In practice, however, the network accuracy obtained after the random clipping of the first step is sometimes poor, and it is difficult to train an effective agent on such "bad" data, or the training time of the agent increases. Thus, although reinforcement-learning-based pruning does consider the impact of clipping single and multiple weights on network accuracy, when valid data cannot be obtained a good agent cannot be trained, and a bad agent rarely produces a good pruning method.
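The three-step loop can be sketched in miniature. Everything below is a toy stand-in (the environment, agent, and accuracy function are ours, not the patent's): step 1 fills a buffer with random clipping actions and their accuracies; step 2 "trains" the agent, here reduced to remembering the best action seen; step 3 lets the agent act with a little exploration and feeds the result back into the buffer:

```python
import random

def accuracy_after(action):
    """Toy stand-in for clip + fine-tune + evaluate: accuracy peaks when
    the (scalar) sparsity action is near 0.5."""
    return 1.0 - abs(action - 0.5)

def rl_prune(n_random=20, target=0.95, max_iters=50, seed=0):
    rng = random.Random(seed)
    buffer = []
    # Step 1: n random clipping actions seed the data buffer.
    for _ in range(n_random):
        a = rng.random()
        buffer.append((a, accuracy_after(a)))
    # Steps 2 and 3, repeated until the accuracy reaches the target.
    for _ in range(max_iters):
        best_action, best_acc = max(buffer, key=lambda x: x[1])  # "training"
        if best_acc >= target:
            break
        a = min(1.0, max(0.0, best_action + rng.gauss(0, 0.05)))  # explore
        buffer.append((a, accuracy_after(a)))
    return max(buffer, key=lambda x: x[1])

action, acc = rl_prune()
print(round(action, 2), round(acc, 2))
```

The weakness the background describes shows up directly here: if the random phase happens to produce only poor actions, the "agent" starts from a bad region of the action space and needs many more iterations.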
Disclosure of Invention
1. Objects of the invention
The invention provides a neural network pruning method based on sensitivity analysis and reinforcement learning, aiming to solve the problem of low network accuracy caused by training data in reinforcement learning methods that does not contain complete information.
2. The technical scheme adopted by the invention
The invention provides a neural network pruning method based on sensitivity analysis and reinforcement learning, comprising the following steps:
setting sparsity thresholds, and selecting the weights with low sensitivity for pruning;
obtaining a clipping method and its accuracy: determining which weights to randomly prune according to the sensitivity analysis, randomly clipping each selected weight, and putting the clipping methods and accuracies of multiple rounds of random clipping into a buffer;
training reinforcement learning: training a reinforcement learning agent on the data in the buffer, and putting the clipping method and accuracy produced after training back into the buffer; repeating these steps until the network accuracy reaches a preset value.
Preferably, the sparsity-threshold-setting step selects the weights with low sensitivity for pruning, that is: it sets a sparsity threshold for every selected weight such that clipping at the current threshold keeps the accuracy drop of the network within a preset range.
Preferably, in the step of obtaining a clipping method and its accuracy, each selected weight is randomly clipped while its sparsity is kept below the sparsity threshold.
Preferably, the step of training reinforcement learning includes: training a reinforcement learning agent on the data in the buffer; using the trained agent to predict, for the currently selected weights, a corresponding clipping method; clipping each network weight with the generated method; fine-tuning the clipped network multiple times; recording the final network accuracy; and putting the trained clipping method and its accuracy into the buffer.
Preferably, the accuracy drop of the network is kept within 20%.
The invention further provides a neural network pruning system based on sensitivity analysis and reinforcement learning, comprising:
a sparsity-threshold-setting module, for selecting the weights with low sensitivity for pruning;
a clipping-method-and-accuracy acquisition module, for determining which weights to randomly prune according to the sensitivity analysis, randomly clipping each selected weight, and putting the clipping methods and accuracies of multiple rounds of random clipping into a buffer;
a reinforcement-learning training module, for training a reinforcement learning agent on the data in the buffer and putting the clipping method and accuracy produced after training back into the buffer; the steps are repeated until the network accuracy reaches a preset value.
Preferably, the sparsity-threshold-setting module selects the weights with low sensitivity for pruning, that is: it sets a sparsity threshold for every selected weight such that clipping at the current threshold keeps the accuracy drop of the network within a preset range.
Preferably, the clipping-method-and-accuracy acquisition module randomly clips each selected weight while keeping its sparsity below the sparsity threshold.
Preferably, the reinforcement-learning training module trains a reinforcement learning agent on the data in the buffer, uses the trained agent to predict a corresponding clipping method for the currently selected weights, clips each network weight with the generated method, fine-tunes the clipped network multiple times, records the final network accuracy, and puts the trained clipping method and its accuracy into the buffer.
Preferably, the accuracy drop of the network is kept within 20%.
The invention provides a neural network pruning device for sensitivity analysis and reinforcement learning, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the method when executing the computer program.
The invention proposes a computer-readable storage medium on which a computer program is stored which, when being executed by a processor, carries out the method steps.
3. Advantageous effects of the present invention
(1) According to the invention, the low-sensitivity weights are selected for pruning, and a sparsity threshold is set for each weight so that, after each selected weight is clipped at its current sparsity, the accuracy drop of the network stays within a preset range; high network accuracy is thus maintained throughout the pruning process.
(2) The invention adopts structure pruning rather than element pruning, which maximizes the compression ratio of the neural network while ensuring high accuracy. Specifically:
The compression ratio can be measured in two ways. One is the ratio based on the number of model parameters,
CR_para = Para_pruned / Para_unpruned,
where Para_pruned is the number of model parameters after pruning and Para_unpruned is the number of parameters of the original model. The other is the ratio based on the computation of the model,
CR_MAC = MAC_pruned / MAC_unpruned,
where MAC_pruned is the number of multiply-accumulate operations of the pruned model and MAC_unpruned is that of the original model. Most of the methods in the prior art use element pruning; although element pruning can greatly reduce CR_para, it requires the support of a specific chip architecture, and developing a specific chip is a long process. The invention instead uses structure pruning, which can be conveniently accelerated under existing hardware (such as ARM NEON, x86 SSE2, GPUs, and the like).
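Both compression-ratio measures can be computed directly. The sketch below (toy layer sizes and helper names of our own choosing) also shows why structure (channel) pruning reduces parameters and MACs together: removing output channels of one convolution also shrinks the input channels of the next:

```python
def conv_params(cin, cout, k):
    """Parameter count of a k x k convolution (bias ignored)."""
    return cin * cout * k * k

def conv_macs(cin, cout, k, h, w):
    """Multiply-accumulate count of that convolution over an h x w output map."""
    return cin * cout * k * k * h * w

# Two stacked 3x3 convolutions on 32x32 feature maps. Pruning layer 1
# from 64 to 32 output channels also halves layer 2's input channels.
orig = [(3, 64), (64, 64)]      # (cin, cout) per layer
pruned = [(3, 32), (32, 64)]

def para(layers):
    return sum(conv_params(ci, co, 3) for ci, co in layers)

def macs(layers):
    return sum(conv_macs(ci, co, 3, 32, 32) for ci, co in layers)

cr_para = para(pruned) / para(orig)   # CR_para = Para_pruned / Para_unpruned
cr_mac = macs(pruned) / macs(orig)    # CR_MAC  = MAC_pruned / MAC_unpruned
print(cr_para, cr_mac)  # 0.5 0.5
```

Element pruning could lower CR_para just as far on paper, but its zeros stay scattered inside dense tensors, so the MAC count only falls on hardware that can skip them; channel pruning shrinks the dense shapes themselves.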
(3) In the prior art, pruning is driven by model size: the size of the neural network model is easy to control, but the computation of the network, measured in MACs (multiply-accumulate counts), is hard to control well, and the pruned network must be deployed on specific hardware, lacking generality.
In conclusion, the method adopted by the invention prunes the model's computation and its storage footprint at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of the steps of obtaining a clipping approach and precision according to the present invention;
FIG. 3 illustrates the training reinforcement learning procedure of the present invention.
Detailed Description
The technical solutions in the examples of the present invention are clearly and completely described below with reference to the drawings in the examples of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without inventive step, are within the scope of the present invention.
The present invention will be described in further detail with reference to the accompanying drawings.
Example 1
The neural network is first analyzed by sensitivity analysis, so that the initial data placed randomly into the reinforcement-learning data buffer falls within a reasonable range.
Sensitivity analysis determines which weights are too sensitive to be pruned and which weights are insensitive enough to be pruned.
The neural network pruning method for sensitivity analysis and reinforcement learning, as shown in fig. 1, includes:
s100, selecting weights with low sensitivity for pruning, setting the selected weights as W (W0, W1, W2.. wn), and then setting sparsity threshold values T (T0, T1, T2.. tn) of the weights. The thresholds are selected to ensure that the accuracy of the network degradation is kept within 20% after the clipped weight is clipped by the current sparsity.
S200, determine the weights that need random pruning according to the W obtained in step S100. As shown in fig. 2, this includes:
S201, randomly clip each selected weight wi while keeping its sparsity below ti. Perform m rounds of experiments; in each round fine-tune the pruned network p times and record the post-fine-tuning accuracy. Put the clipping methods and accuracies of the m rounds into a buffer B.
S300, train an agent with the data in buffer B. As shown in fig. 3, this includes:
S301, use the trained agent to predict, for the currently selected weights, a corresponding clipping method;
S302, clip each network weight with the generated clipping method;
S303, fine-tune the clipped network p times, record the final network accuracy, and put the clipping method and accuracy into buffer B. Repeat step S300 until the network accuracy meets the requirement.
The invention further provides a neural network pruning system based on sensitivity analysis and reinforcement learning, comprising: a sparsity-threshold-setting module, a clipping-method-and-accuracy acquisition module, and a reinforcement-learning training module.
The sparsity-threshold-setting module selects the weights with low sensitivity for pruning; that is, it sets a sparsity threshold for every selected weight such that clipping at the current threshold keeps the accuracy drop of the network within a preset range.
The clipping-method-and-accuracy acquisition module determines which weights to randomly prune according to the sensitivity analysis, randomly clips each selected weight while keeping its sparsity below the sparsity threshold, and puts the clipping methods and accuracies of multiple rounds of random clipping into the buffer.
The reinforcement-learning training module trains a reinforcement learning agent on the data in the buffer, uses the trained agent to predict a corresponding clipping method for the currently selected weights, clips each network weight with the generated method, fine-tunes the clipped network multiple times, records the final network accuracy, and puts the trained clipping method and accuracy into the buffer. The steps are repeated until the network accuracy reaches a preset value.
The machine-readable storage medium is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and modules, such as program instructions/modules (the illustrated obtaining module, the first determining module, the second determining module, and the object control module) corresponding to the virtual reality object control method in the embodiment of the present application. The processor detects the software program, the instructions and the modules stored in the machine-readable storage medium, so as to execute various functional applications and data processing of the terminal device, that is, to implement the above virtual reality object control method, which is not described herein again.
The machine-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the machine-readable storage medium may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a Read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double data rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous link SDRAM (SLDRAM), and direct memory bus RAM (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to comprise, without being limited to, these and any other suitable memory of a publishing node. In some examples, the machine-readable storage medium may further include memory located remotely from the processor, which may be connected to the virtual reality device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website site, computer, virtual reality device, or data center to another website site, computer, virtual reality device, or data center by wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a virtual reality device, a data center, etc., that incorporates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A neural network pruning method for sensitivity analysis and reinforcement learning is characterized by comprising the following steps:
setting a sparsity threshold, and selecting a weight with low sensitivity for pruning;
obtaining a cutting method and precision, and determining the weight of random pruning according to the sensitivity weight; randomly cutting each selected weight, and putting a pruning method and precision of multiple times of random cutting into a buffer area;
training reinforcement learning, namely training a reinforcement learning agent by using data in a buffer area, and putting a cutting method and precision generated after training into the buffer area; and repeating the steps until the network precision reaches a preset value.
2. The neural network pruning method for sensitivity analysis and reinforcement learning according to claim 1, wherein the sparsity threshold setting step selects the weights with low sensitivity for pruning, namely: setting a sparsity threshold for every selected weight such that clipping at the current sparsity threshold keeps the accuracy drop of the network within a preset range.
3. The neural network pruning method for sensitivity analysis and reinforcement learning according to claim 2, wherein the obtaining a clipping method and precision step performs random clipping on each selected weight to ensure that the sparsity is less than a sparsity threshold.
4. The neural network pruning method for sensitivity analysis and reinforcement learning of claim 3, wherein the step of training reinforcement learning comprises: training a reinforcement learning agent using data in a buffer; using the agent generated after training to predict, for the currently selected weights, a corresponding clipping method; clipping each network weight using the generated clipping method; fine-tuning the clipped network a plurality of times; recording the final network precision; and putting the trained clipping method and precision into the buffer.
5. The neural network pruning method for sensitivity analysis and reinforcement learning of claim 1, wherein the accuracy drop of the network is kept within 20%.
6. A neural network pruning system for sensitivity analysis and reinforcement learning, characterized by comprising:
a sparsity threshold setting module for selecting a weight with low sensitivity to prune;
a cutting method and precision module are obtained and used for determining the weight of random pruning according to the sensitivity weight; randomly cutting each selected weight, and putting a pruning method and precision of multiple times of random cutting into a buffer area;
the training reinforcement learning module is used for training a reinforcement learning agent by using data in the buffer area, and a cutting method and precision generated after training are put into the buffer area; and repeating the steps until the network precision reaches a preset value.
7. The neural network pruning system for sensitivity analysis and reinforcement learning according to claim 6, wherein the sparsity threshold module is configured to select the weights with low sensitivity for pruning, namely: to set a sparsity threshold for every selected weight such that clipping at the current sparsity threshold keeps the accuracy drop of the network within a preset range.
8. The neural network pruning system for sensitivity analysis and reinforcement learning of claim 7, wherein the acquisition clipping approach and precision module is configured to perform random clipping on each selected weight to ensure that the sparsity is less than a sparsity threshold.
9. The neural network pruning system for sensitivity analysis and reinforcement learning of claim 8, wherein the reinforcement learning training module is configured to: train a reinforcement learning agent using the data in the buffer; use the trained agent to predict on the currently selected weights and determine a corresponding clipping method; clip each network weight with the generated clipping method; fine-tune the clipped network several times and record the final network precision; and put the resulting clipping method and precision into the buffer.
10. The neural network pruning system for sensitivity analysis and reinforcement learning according to claim 6, wherein the drop in network accuracy is kept within 20%.
11. A neural network pruning device for sensitivity analysis and reinforcement learning, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the method steps of any one of claims 1 to 5.
12. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 5.
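Claims 2 and 7 select low-sensitivity weights by finding a per-weight sparsity threshold that keeps the accuracy drop within a preset range (claim 5's 20%). The sketch below is one hedged reading of that search; the claims do not fix the pruning criterion or the evaluation procedure, so magnitude-based pruning and the `evaluate` callback are illustrative assumptions:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights (assumed criterion)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def sparsity_threshold(weights, evaluate, baseline_acc, max_drop=0.20, step=0.2):
    """Largest sparsity whose accuracy drop stays within `max_drop`."""
    best, s = 0.0, step
    while s < 1.0:
        acc = evaluate(magnitude_prune(weights, s))
        if baseline_acc - acc > max_drop:
            break  # this weight is too sensitive beyond the previous sparsity
        best, s = s, s + step
    return best
```

A weight tensor that tolerates little pruning before `evaluate` degrades gets a low threshold and is deprioritized, which is how "weights with low sensitivity" end up selected.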
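Claims 3 and 8 randomly clip each selected weight while keeping its sparsity below the threshold from the sensitivity analysis. A minimal sketch of one such random clipping, assuming a uniform sparsity draw (the claims do not specify the sampling distribution):

```python
import random

def random_clip(weights, max_sparsity, rng=None):
    """Prune a random subset of weights at a random sparsity below `max_sparsity`."""
    rng = rng or random.Random()
    sparsity = rng.uniform(0.0, max_sparsity)  # stays under the per-weight threshold
    k = int(len(weights) * sparsity)
    drop = set(rng.sample(range(len(weights)), k))
    pruned = [0.0 if i in drop else w for i, w in enumerate(weights)]
    return pruned, sparsity
```

Repeating this for each selected weight tensor and recording the resulting (clipping method, precision) pairs fills the buffer that later seeds the reinforcement learning agent.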
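Claims 4 and 9 iterate: train an agent on the buffer, let it propose a clipping method for the selected weights, clip and fine-tune, record the precision, and push the result back until a preset precision is reached. The sketch below substitutes a trivial perturb-the-best search for a real reinforcement learning agent, since the claims do not fix the algorithm; `evaluate` stands in for the clip, fine-tune, and measure-precision steps:

```python
import random

def prune_search(evaluate, n_weights, target_acc, warmup=20, steps=50, seed=0):
    """Buffer-driven search for per-weight sparsities in [0, 1]."""
    rng = random.Random(seed)
    buffer = []  # (clipping method, precision) pairs, as in claim 3
    for _ in range(warmup):  # random clipping phase seeds the buffer
        action = [rng.uniform(0.0, 1.0) for _ in range(n_weights)]
        buffer.append((action, evaluate(action)))
    for _ in range(steps):  # stand-in for the trained agent's predictions
        best_action, best_acc = max(buffer, key=lambda p: p[1])
        if best_acc >= target_acc:  # precision reached the preset value
            return best_action, best_acc
        action = [min(1.0, max(0.0, s + rng.gauss(0.0, 0.05))) for s in best_action]
        buffer.append((action, evaluate(action)))
    return max(buffer, key=lambda p: p[1])
```

In the patent's setting the proposal step would come from an agent trained on the buffer rather than from Gaussian perturbation, but the outer loop structure is the same.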
CN202011056171.8A 2020-09-30 2020-09-30 Neural network pruning method, system and device for sensitivity analysis and reinforcement learning Pending CN112183746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011056171.8A CN112183746A (en) 2020-09-30 2020-09-30 Neural network pruning method, system and device for sensitivity analysis and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011056171.8A CN112183746A (en) 2020-09-30 2020-09-30 Neural network pruning method, system and device for sensitivity analysis and reinforcement learning

Publications (1)

Publication Number Publication Date
CN112183746A true CN112183746A (en) 2021-01-05

Family

ID=73947039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011056171.8A Pending CN112183746A (en) 2020-09-30 2020-09-30 Neural network pruning method, system and device for sensitivity analysis and reinforcement learning

Country Status (1)

Country Link
CN (1) CN112183746A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128664A (en) * 2021-03-16 2021-07-16 Guangdong Electric Power Information Technology Co., Ltd. Neural network compression method, device, electronic equipment and storage medium
CN113011588A (en) * 2021-04-21 2021-06-22 Huaqiao University Pruning method, device, equipment and medium for convolutional neural network
CN113011588B (en) * 2021-04-21 2023-05-30 Huaqiao University Pruning method, device, equipment and medium of convolutional neural network
WO2023082278A1 (en) * 2021-11-15 2023-05-19 Intel Corporation Apparatus and method for reinforcement learning based post-training sparsification
CN114936078A (en) * 2022-05-20 2022-08-23 Tianjin University Micro-grid group edge scheduling and intelligent body lightweight cutting method

Similar Documents

Publication Publication Date Title
CN112183746A (en) Neural network pruning method, system and device for sensitivity analysis and reinforcement learning
CN108089814B (en) Data storage method and device
US11741339B2 (en) Deep neural network-based method and device for quantifying activation amount
CN110764715B (en) Bandwidth control method, device and storage medium
US10142252B2 (en) Server intelligence for network speed testing control
JP2004126595A5 (en)
US10416907B2 (en) Storage system, storage control apparatus, and method of controlling a storage device
CN110799959A (en) Data compression method, decompression method and related equipment
CN110061930B (en) Method and device for determining data flow limitation and flow limiting values
CN112861996A (en) Deep neural network model compression method and device, electronic equipment and storage medium
CN110851333B (en) Root partition monitoring method and device and monitoring server
CN107783990B (en) Data compression method and terminal
CN112764681B (en) Cache elimination method and device with weight judgment and computer equipment
CN109343792B (en) Storage space configuration method and device, computer equipment and storage medium
CN108831504A (en) Determination method, apparatus, computer equipment and the storage medium of pitch period
CN107305531B (en) Method and device for determining limit value of cache capacity and computing equipment
CN114281808A (en) Traffic big data cleaning method, device, equipment and readable storage medium
CN111598233A (en) Compression method, device and equipment of deep learning model
CN109982110B (en) Method and device for video playing
CN111352825B (en) Data interface testing method and device and server
US20140362895A1 (en) Method, program product, and test device for testing bit error rate of network module
CN109962857B (en) Flow control method, flow control device and computer readable storage medium
CN114816232B (en) Method and device for efficiently accessing geological disaster big data
CN114138179B (en) Method and device for dynamically adjusting write cache space
CN116506116B (en) Encryption control method combining soft and hard

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination