WO2023024407A1 - Model pruning method, apparatus, and storage medium based on adjacent convolutions - Google Patents

Model pruning method, apparatus, and storage medium based on adjacent convolutions Download PDF

Info

Publication number
WO2023024407A1
Authority
WO
WIPO (PCT)
Prior art keywords
pruning
filter
model
convolution
parameters
Prior art date
Application number
PCT/CN2022/071221
Other languages
English (en)
French (fr)
Inventor
王晓锐
郑强
高鹏
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023024407A1 publication Critical patent/WO2023024407A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a method, device and computer-readable storage medium for model pruning based on adjacent convolution.
  • Convolutional Neural Networks (CNN) are a class of Feedforward Neural Networks that contain convolution computations and have a deep structure, and are one of the representative algorithms of deep learning. Although convolutional neural networks perform well, a convolutional neural network model requires a large amount of computation and contains a large amount of redundant information, so the network needs to be compressed. In the field of smart medical care, convolutional neural networks can widely support functions such as assisted disease diagnosis, health management, and remote consultation; existing compression methods for convolutional neural network models applied to smart medical care include model pruning, quantization, and distillation. Model pruning in the prior art selects and removes the filters of relatively unimportant convolution kernels, then fine-tunes the model from which the unimportant filters have been removed, in order to recover the precision lost by the removal.
  • This application provides a model pruning method, system, electronic device, and storage medium based on adjacent convolutions. Its main purpose is to solve the problem that, in existing smart-medical scenarios, the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers.
  • The present application provides a model pruning method based on adjacent convolutions, applied to an electronic device, including: using an absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated, obtaining the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtaining the convolutional-layer channel-mode parameter through the channel Manhattan distance; forming, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability; sorting the filter pruning probability parameters according to a preset rule, and determining the filters to be pruned according to the sorting result of the filter pruning probability parameters; and pruning the determined filters to be pruned.
  • The present application also provides a model pruning apparatus based on adjacent convolutions, which includes: a filter-mode parameter and channel-mode parameter acquisition unit, configured to use an absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated, obtain the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtain the convolutional-layer channel-mode parameter through the channel Manhattan distance; a filter pruning probability parameter acquisition unit, configured to form, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability; a to-be-pruned filter determination unit, configured to sort the filter pruning probability parameters according to a preset rule and determine the filters to be pruned according to the sorting result; and a pruning unit, configured to prune the determined filters to be pruned.
  • The present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the steps of the foregoing adjacent-convolution-based model pruning method.
  • The present application also provides a computer-readable storage medium in which at least one instruction is stored; the at least one instruction is executed by a processor in the electronic device to implement the adjacent-convolution-based model pruning method described above.
  • The adjacent-convolution-based model pruning method provided by this application is based on fusing the importance of two adjacent convolutional layers. It solves the prior-art problem that the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers; it can truly identify the unimportant filters in the convolution, and thereby achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
  • FIG. 1 is a schematic flow diagram of a model pruning method based on adjacent convolution according to an embodiment of the present application
  • FIG. 2 is a logical structural block diagram of a model pruning device based on adjacent convolution according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of an internal structure of an electronic device implementing a model pruning method based on adjacent convolution according to an embodiment of the present application.
  • Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • the artificial intelligence software technology in this application is machine learning technology based on convolutional neural network. Convolutional neural networks can be used in many different fields, such as speech recognition, medical diagnosis, application testing, etc.
  • The prior-art methods for selecting the filters of relatively unimportant convolution kernels do not consider the relationships between filters, and therefore cannot fully exploit the redundant information within the model's convolution filters.
  • The adjacent-convolution-based model pruning method of the present application fully considers the redundant information within the convolution filters and the relationship between two convolutional layers, and achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
  • In the prior art, the methods for selecting the filters of relatively unimportant convolution kernels are: 1) judging the importance of a convolution kernel's filter by the size of the parameters (weights) of a batch normalization layer; although easy to understand and easy to implement, the weights of the BN layer can hardly measure the amount of information the corresponding filter really carries, so the information correlation between filters cannot be measured; 2) judging the importance of a convolution kernel's filter by the size of the filter's Manhattan distance or Euclidean norm; although easy to understand and easy to implement, this depends only on the magnitude of the values, so the information correlation between filters cannot be measured; 3) judging the importance of a convolution kernel's filter by the geometric median of the space the filters lie in, i.e., computing the filter closest to the median of the set of all filters, judging that filter to be unimportant, and pruning it. However, the information content of the geometric median cannot substitute for the information content of the filter.
  • The hardware for the adjacent-convolution-based model pruning method in this application is an NVIDIA V100 GPU, and all experiments use the PyTorch framework.
  • PyTorch is a Python package developed by Facebook for training neural networks, and it is also a deep learning framework built by Facebook.
  • PyTorch provides an abstract method similar to NumPy to represent tensors (or multidimensional arrays), which can use GPU (Graphics Processing Unit) to accelerate training.
  • Through a technique called reverse-mode auto-differentiation, PyTorch can arbitrarily change the behavior of the network with zero lag or overhead.
  • PyTorch is an end-to-end machine learning framework whose features include TorchScript, distributed training, mobile deployment (experimental), tools and libraries, native ONNX support, a C++ front end, and cloud partners. PyTorch is simple and easy to use and also performs excellently in model speed; compared with frameworks such as TensorFlow, many models may run faster when implemented in PyTorch. It is therefore suitable for the adjacent-convolution model pruning scenario of this application.
  • Common convolutional neural networks are built by repeatedly stacking, in order, a convolutional layer, a BN layer (batch normalization layer), and a ReLU layer (nonlinear activation layer).
  • Each group of convolutional layer, batch normalization layer, and nonlinear activation layer is used as the feature extraction unit of the convolutional neural network, and these feature extraction units are arranged in the depth direction of the convolutional neural network.
  • the output feature map of one set of feature extraction units is used as the input feature map of the next set of feature extraction units.
  • Each cuboid in the convolutional layer is a filter, and each filter has several channels from front to back. This application judges the importance of the convolutional layer from the two dimensions of filter and channel.
  • FIG. 1 is a schematic flowchart of a method for pruning a model based on adjacent convolution provided by an embodiment of the present application.
  • the present application provides a model pruning method based on adjacent convolution, which can be executed by a device, and the device can be implemented by software and/or hardware.
  • the model pruning method based on adjacent convolution includes: Steps S110-S140:
  • S110: Use the absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated; obtain the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtain the convolutional-layer channel-mode parameter through the channel Manhattan distance.
  • the filter-level sparsification of the convolutional layer is performed by adding a penalty term (group lasso) to the objective function of the network.
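  • The patent does not spell out the exact form of this penalty, so the following is only a minimal PyTorch sketch of a group-lasso term grouped over filters, one standard way to induce filter-level sparsity (the grouping and the L2 norm per group are assumptions):

```python
import torch

def group_lasso_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Group-lasso penalty over the filters of a conv weight of shape
    (N, c, k, k): the sum over filters of the L2 norm of each filter's
    parameters. Added to the task loss, it pushes whole filters to zero."""
    num_filters = weight.shape[0]
    return weight.reshape(num_filters, -1).norm(p=2, dim=1).sum()

# e.g. total = task_loss + lam * sum(group_lasso_penalty(m.weight)
#                                    for m in model.modules()
#                                    if isinstance(m, torch.nn.Conv2d))
```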
  • The input of the convolutional layer to be evaluated is an input feature map of size H×W×C_in; the output is the convolution kernel W, of size (C_in·k_h·k_w)×C_out, and an output feature map of size (H·W)×C_out, where H and W are the height and width of the output feature map, respectively. From matrix multiplication it can be seen that a given row of the convolution kernel is multiplied only with specific columns of the input feature-map matrix.
  • The loss function is obtained by the following formula: loss = -Σ_{i=0}^{C-1} y_i · log(p_i), where p = [p_0, …, p_{C-1}] is the probability distribution, each element p_i representing the probability that the sample belongs to class i; y = [y_0, …, y_{C-1}] is the one-hot representation of the sample label (y_i = 1 if the sample belongs to class i, otherwise y_i = 0); and C is the total number of classes.
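  • For reference, this loss is the standard cross-entropy; a minimal PyTorch sketch with illustrative shapes and names:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)          # batch of 8 samples, C = 10 classes
labels = torch.randint(0, 10, (8,))  # integer class labels (the one-hot y)
# loss = -sum_i y_i * log(p_i) with p = softmax(logits);
# F.cross_entropy fuses the softmax and the negative log-likelihood.
loss = F.cross_entropy(logits, labels)
```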
  • The filter-mode importance calculation focuses on the filter dimension: taking the filter as the unit, each filter is compared with the other filters on the relevant index.
  • Using the Manhattan distance (the L1 norm) as the importance index, the parameters in a filter serve as the basis for evaluation, and the differences in the mean parameter Manhattan distance among the filters are compared. That is, the smaller the mean Manhattan distance of a filter, the less important that filter is: it plays no significant role in the computation of the neural network, its information is redundant, and it can be deleted.
  • Conversely, if its mean Manhattan distance is large, the filter has a significant impact on the computation results and carries a significant amount of information, so it cannot be deleted.
  • Take as an example a convolutional layer to be evaluated whose convolution parameters form an N×c×k×k matrix, where N is the number of filters and c is the number of channels in each filter.
  • The absolute-value function is the L1 function used to compute absolute values.
  • Taking the Manhattan distance means taking the absolute values of the elements of the c×k×k matrix and then averaging these c×k×k absolute values; the resulting value is the index of the filter's pruning probability, that is, the index of the filter's importance.
  • In short, filter-level sparsification produces row and column sparsity in the 2D matrix; the all-zero rows and columns are then cut from the matrix to reduce its dimensions, thereby improving the computational efficiency of the convolutional model.
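  • A minimal PyTorch sketch of this filter-mode score for a conv weight of shape (N, c, k, k); the function name is illustrative, not from the patent:

```python
import torch

def filter_mode_importance(weight: torch.Tensor) -> torch.Tensor:
    """Mean absolute value (mean Manhattan distance) of each filter's
    c*k*k parameters; returns one importance score per filter, shape (N,)."""
    num_filters = weight.shape[0]
    return weight.abs().reshape(num_filters, -1).mean(dim=1)

conv = torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
scores = filter_mode_importance(conv.weight)  # shape (32,)
```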
  • The channel-mode importance calculation focuses on the channel dimension. Again taking a convolutional layer whose parameters form an N×c×k×k matrix as an example: filter-wise it splits into N combinations of size c×k×k, whereas channel-wise it splits into c combinations of size N×k×k.
  • As in the filter dimension, the mean Manhattan distance is used as the index for evaluating channel importance. That is, the importance of a layer is reflected by comparing the importance of the channels inside it. Different channels have different Manhattan distances; the larger the value, the more important the channel and the less it can be removed in pruning. After several channels of the current layer are cut, the output feature map is reconstructed so that the information loss is minimized.
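  • The channel-mode score can be sketched the same way by regrouping the same (N, c, k, k) tensor channel-wise (again an illustrative sketch, not the patent's code):

```python
import torch

def channel_mode_importance(weight: torch.Tensor) -> torch.Tensor:
    """Regroup an (N, c, k, k) conv weight into c groups of N*k*k
    parameters and return each group's mean absolute value, shape (c,)."""
    num_channels = weight.shape[1]
    return weight.abs().transpose(0, 1).reshape(num_channels, -1).mean(dim=1)
```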
  • S120 Multiply the filter mode parameter of the convolution layer with the channel mode parameter of the convolution layer to form a filter pruning probability parameter for judging the filter pruning probability.
  • The filter pruning probability parameter, which is also the filter importance parameter, is obtained by the following formula:
  • filter pruning probability parameter = convolutional-layer filter-mode parameter × convolutional-layer channel-mode parameter.
  • In two consecutive convolutions, the different channels of the second convolution are generated by the different filter convolution kernels of the first convolution. If the first convolution is evaluated in filter mode and the second convolution is evaluated in channel mode, both evaluations measure the importance of the first layer's filters, so the two results can be fused.
  • Specifically, filter pruning probability parameter = convolutional-layer filter-mode parameter × convolutional-layer channel-mode parameter; that is, the two results are multiplied and fused to obtain a new index that evaluates filter importance more accurately. This evaluation considers not only the importance of the filters used to generate feature maps but also the importance of the channels when the feature maps are used.
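  • A sketch of this fusion for a pair of adjacent conv layers: filter i of layer l produces input channel i of layer l+1, so the filter-mode scores of layer l and the channel-mode scores of layer l+1 are both length-N vectors and can be multiplied element-wise (names and layout are assumptions):

```python
import torch

def fused_filter_importance(w_l: torch.Tensor, w_next: torch.Tensor) -> torch.Tensor:
    """Element-wise product of layer l's filter-mode scores and layer
    l+1's channel-mode scores: the fused pruning-probability index."""
    n = w_l.shape[0]
    assert w_next.shape[1] == n, "layer l+1 must consume layer l's outputs"
    filter_scores = w_l.abs().reshape(n, -1).mean(dim=1)                      # (N,)
    channel_scores = w_next.abs().transpose(0, 1).reshape(n, -1).mean(dim=1)  # (N,)
    return filter_scores * channel_scores
```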
  • S130 Sort the filter pruning probability parameters according to preset rules, and determine the filter to be pruned according to the sorting result of the filter pruning probability parameters.
  • The step of determining the filters to be pruned according to the sorting result of the filter pruning probability parameters includes: sorting the filter pruning probability parameters from large to small, and taking the channels whose filter pruning probability parameter is smaller than a preset threshold as the filters to be pruned; the preset threshold is 1%.
  • After the importance results of the filter mode and the channel mode are fused, the filters are sorted by the fused value; with a pruning rate selected in advance, a certain number of filters with relatively small total value, together with their related weights, are removed, and the pruned model is obtained. That is, sort by the filter pruning probability parameter from large to small, then cut out all channels whose importance is below the preset threshold.
  • In a specific implementation, the preset threshold may be 1%.
  • In a specific embodiment, the importance values of a certain layer of the network can be used to determine the channels of that layer to be pruned.
  • In layer l, channels whose importance is less than p times the maximum importance in that layer are pruned; following the above notation, the set of pruned channels in layer l consists of exactly those channels, where p ∈ (0,1) is the threshold. For example, if a convolutional layer has four channels with computed importances {1.5, 2.1, 0.003, 0.02} and p = 0.01, the third and fourth channels are cut.
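  • A sketch of this threshold rule; on the example above ({1.5, 2.1, 0.003, 0.02} with p = 0.01) it selects indices 2 and 3, i.e. the third and fourth channels:

```python
import torch

def channels_to_prune(importance: torch.Tensor, p: float = 0.01) -> torch.Tensor:
    """Indices of channels whose importance is below p times the layer max."""
    return torch.nonzero(importance < p * importance.max()).flatten()

print(channels_to_prune(torch.tensor([1.5, 2.1, 0.003, 0.02])))  # tensor([2, 3])
```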
  • Filter-level pruning only changes the number of filter banks and feature channels in the network, and the resulting model can run without any special algorithm design; this is called structured pruning. A module that is redundant for the current stage is not necessarily redundant for other stages. By jointly considering the importance of adjacent convolutions, linkage between the layers of the convolutional framework is realized, and pruning of the entire convolutional model is then completed.
  • the layer to be pruned is determined through steps S110-S130, and then pruned according to a preset pruning threshold or ratio.
  • the layers that need to be pruned are generally fully connected layers.
  • The method for pruning the determined filters to be pruned includes:
  • S141: Obtain the filters to be pruned, and train the adjacent-convolution-based pruning model according to the filters to be pruned and a preset pruning threshold;
  • S142: Obtain a mask matrix from the original parameters of the adjacent-convolution-based pruning model, where the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix containing only 0s and 1s;
  • S143: Use the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model.
  • The method of using the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model includes: multiplying the parameters of the adjacent-convolution-based pruning model by the mask matrix; selecting the model parameters whose mask is 1, and training and adjusting those parameter values through backpropagation; storing the backpropagation-adjusted parameter values and their corresponding matrix positions; and obtaining the final parameters of the adjacent-convolution-based pruning model from the stored parameter values and their matrix positions, completing the adjustment of the pruning model's parameters.
  • S144: Perform pruning using the parameter-adjusted adjacent-convolution-based pruning model.
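  • A minimal sketch of this masked retraining: parameters are multiplied by the 0/1 mask, and the gradients of masked-out entries are zeroed so that backpropagation only adjusts the surviving (mask = 1) parameters. The per-parameter-name dictionary of masks is an assumed layout, not the patent's exact data structure:

```python
import torch

def masked_retraining_step(model, masks, loss_fn, batch, optimizer):
    """One retraining step under a 0/1 mask per named parameter."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    for name, param in model.named_parameters():
        if name in masks and param.grad is not None:
            param.grad.mul_(masks[name])      # freeze masked-out entries
    optimizer.step()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])       # keep pruned entries at zero
    return loss.item()
```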
  • Pruning includes cutting channels in the convolution kernel, cutting the corresponding channels in the input feature map, and cutting the corresponding filters of the upper-layer convolution kernel that outputs the current input feature map.
  • The specific implementation is to add, by modifying the code, a mask matrix with the same size as the parameter matrix; the mask matrix contains only 0s and 1s and is actually used for retraining the network. The ratio of the number of pruned channels to the total number of channels in the network is defined as the pruning rate, written pruned_ratio; the number of pruned channels is pruned_channels = pruned_ratio × total number of channels in the network.
  • the upper limit of pruning ratio upper_ratio is 1, and the lower limit of pruning ratio lower_ratio is 0.
  • the initial pruning ratio is 0.5.
  • The per-channel feature-map means are sorted in ascending order, i.e., sort_{min→max}{ch_avg}; the channel-selection-layer mask values corresponding to the first pruned_channels channels are set to 0, and the mask values corresponding to the remaining channels are set to 1.
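  • A sketch of building this channel-selection mask from the per-channel feature-map means; pruned_ratio defaults to the 0.5 initial value mentioned above (variable names follow the description):

```python
import torch

def build_channel_mask(ch_avg: torch.Tensor, pruned_ratio: float = 0.5) -> torch.Tensor:
    """0/1 mask over channels: the pruned_channels channels with the smallest
    feature-map means (sort min->max of ch_avg) get 0, the rest get 1."""
    pruned_channels = int(pruned_ratio * ch_avg.numel())
    mask = torch.ones_like(ch_avg)
    mask[torch.argsort(ch_avg)[:pruned_channels]] = 0.0
    return mask

print(build_channel_mask(torch.tensor([0.9, 0.1, 0.5, 0.7])))  # tensor([1., 0., 0., 1.])
```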
  • Pruning is an iterative process; model pruning is therefore usually called "iterative pruning", the iteration being an alternating, repeated process of pruning and model training.
  • The goal of model pruning is to keep only the important weights; the targets handled include fully connected layer pruning and convolutional layer pruning. Pruning affects deep neural networks in different ways. The biggest effect is reducing the computational cost while maintaining the same performance: deleting features that are not really used in the deep network also speeds up inference and training. The second effect is that reducing the number of parameters, i.e., reducing redundancy in the parameter space, can improve the generalization ability of the model.
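  • A sketch of the alternating prune-and-retrain loop; train_one_epoch stands for a user-supplied training routine, and zeroing the weights stands in for actual structural removal (both are simplifications for illustration):

```python
import torch

def iterative_pruning(model, train_one_epoch, rounds=3, p=0.01):
    """Alternate pruning and retraining ("iterative pruning")."""
    for _ in range(rounds):
        with torch.no_grad():
            for module in model.modules():
                if isinstance(module, torch.nn.Conv2d):
                    n = module.weight.shape[0]
                    scores = module.weight.abs().reshape(n, -1).mean(dim=1)
                    prune = scores < p * scores.max()
                    module.weight[prune] = 0.0  # zero out unimportant filters
        train_one_epoch(model)                  # retrain to recover accuracy
    return model
```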
  • The vgg16 model is used as an example to verify the effectiveness of the algorithm.
  • The dataset is cifar10, and each experiment performs model compression training for 500 epochs.
  • The hardware is an NVIDIA V100 GPU, and all experiments use the PyTorch framework.
  • The uncompressed model accuracy is 93.99%, and the accuracy of the pruned models is shown in Table 1 below.
  • Table 1: Algorithm comparison
    Compression method | Raw Model | APoZ  | MeanAct | FPGM  | TaylorFO | This application
    Accuracy (%)       | 93.99     | 91.89 | 92.77   | 93.45 | 93.54    | 93.65
  • APoZ is pruning based on the average percentage of zeros in the feature map; for details see Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. MeanAct is pruning based on the smallest average activation value; for details see Pruning Convolutional Neural Networks for Resource Efficient Inference.
  • FPGM is channel pruning based on the geometric median; for details see Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration.
  • TaylorFO is channel pruning based on a first-order Taylor expansion; for details see Importance Estimation for Neural Network Pruning. As the table shows, the method in this application obtained the highest accuracy, better than the other pruning methods. This indicates that this application has indeed found the unimportant filters in the convolution; it is a good pruning method that can effectively balance detection loss rate and accuracy during pruning.
  • In short, by fusing the importance of two adjacent convolutional layers, the adjacent-convolution-based model pruning method of this application solves the prior-art problem that the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers; it can truly identify the unimportant filters in the convolution, and thereby achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
  • the present application also provides an adjacent convolution-based model pruning device.
  • Fig. 2 shows the functional modules of the adjacent convolution-based model pruning device according to an embodiment of the present application.
  • the adjacent convolution-based model pruning apparatus 200 can be installed in an electronic device.
  • According to the implemented functions, the adjacent-convolution-based model pruning device 200 may include a filter-mode parameter and channel-mode parameter acquisition unit 210, a filter pruning probability parameter acquisition unit 220, a to-be-pruned filter determination unit 230, and a pruning unit 240.
  • the unit described in this application can also be called a module, which refers to a series of computer program segments that can be executed by the processor of the electronic device and can complete a certain fixed function, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the filter mode parameter and channel mode parameter acquisition unit 210 is used to obtain the filter Manhattan distance of the filter matrix in the convolutional layer to be evaluated and the channel Manhattan distance of the channel matrix by using an absolute value function, and acquire according to the filter Manhattan distance Convolutional layer filter mode parameters, and obtain convolutional layer channel mode parameters through the channel Manhattan distance;
  • the filter pruning probability parameter acquisition unit 220 is used to form the filter pruning probability parameter for judging the filter pruning probability by multiplying the convolutional layer filter mode parameter and the convolutional layer channel mode parameter;
  • the filter to be pruned determination unit 230 is configured to sort the filter pruning probability parameters according to preset rules, and determine the filter to be pruned according to the sorting result of the filter pruning probability parameters;
  • the pruning unit 240 is configured to prune the determined filter to be pruned.
  • the pruning unit 240 further includes a model training subunit, a parameter adjustment unit, and a model pruning subunit (not shown in the figure).
  • The model training subunit is used to obtain the filters to be pruned, train the adjacent-convolution-based pruning model according to the filters to be pruned and a preset pruning threshold, and obtain a mask matrix from the original parameters of the adjacent-convolution-based pruning model; the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix containing only 0s and 1s;
  • a parameter adjustment subunit configured to use the mask matrix to adjust the parameters of the adjacent convolution-based pruning model
  • the model pruning subunit is configured to perform pruning using the adjacent convolution-based pruning model after parameter adjustment.
  • In short, by fusing the importance of two adjacent convolutional layers, the adjacent-convolution-based model pruning device of the present application solves the prior-art problem that the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers; it can truly identify the unimportant filters in the convolution, and thereby achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
  • As can be seen from the above embodiments, the adjacent-convolution-based model pruning method proposed in this application solves the same prior-art limitation and achieves the same technical effect.
  • As shown in FIG. 3, the present application provides an electronic device 3 for the adjacent-convolution-based model pruning method.
  • the electronic device 3 may include a processor 30 , a memory 31 and a bus, and may also include a computer program stored in the memory 31 and operable on the processor 30 , such as a model pruning program 32 based on adjacent convolution.
  • the memory 31 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • In some embodiments, the memory 31 may be an internal storage unit of the electronic device 3, such as a mobile hard disk of the electronic device 3.
  • the memory 31 can also be an external storage device of the electronic device 3 in other embodiments, such as a plug-in mobile hard disk equipped on the electronic device 3, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital , SD) card, flash memory card (Flash Card), etc.
  • the memory 31 may also include both an internal storage unit of the electronic device 3 and an external storage device.
  • The memory 31 can be used not only to store application software installed in the electronic device 3 and various data, such as the code of the adjacent-convolution-based model pruning program, but also to temporarily store data that has been output or will be output.
  • the processor 30 may be composed of an integrated circuit, for example, may be composed of a single packaged integrated circuit, or may be composed of multiple integrated circuits with the same function or different functions, including one or more Central processing unit (Central Processing unit, CPU), microprocessor, digital processing chip, graphics processor and a combination of various control chips, etc.
  • The processor 30 is the control core (Control Unit) of the electronic device; it connects all components of the entire electronic device using various interfaces and lines, and executes the various functions of the electronic device 3 and processes data by running or executing programs or modules stored in the memory 31 (for example, the adjacent-convolution-based model pruning program) and calling data stored in the memory 31.
  • the bus may be a peripheral component interconnect standard (PCI for short) bus or an extended industry standard architecture (EISA for short) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to realize connection and communication between the memory 31 and at least one processor 30 and the like.
  • FIG. 3 shows only an electronic device with certain components; those skilled in the art will understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 3, which may include fewer or more components than shown, or combine certain components, or arrange the components differently.
  • the electronic device 3 may also include a power supply (such as a battery) for supplying power to various components.
  • The power supply may be logically connected to the at least one processor 30 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power-consumption management.
  • The power supply may also include one or more DC or AC power supplies, recharging devices, power-failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 3 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • The electronic device 3 may also include a network interface; optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is typically used to establish a communication connection between the electronic device 3 and other electronic devices.
  • the electronic device 3 may further include a user interface.
  • the user interface may be a display (Display) or an input unit (such as a keyboard (Keyboard)).
  • the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately referred to as a display screen or a display unit, and is used for displaying information processed in the electronic device 3 and for displaying a visualized user interface.
  • The adjacent-convolution-based model pruning program 32 stored in the memory 31 of the electronic device 3 is a combination of multiple instructions.
  • When run in the processor 30, it can implement: using the absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated; obtaining the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtaining the convolutional-layer channel-mode parameter through the channel Manhattan distance; forming, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability; sorting the filter pruning probability parameters according to preset rules, and determining the filters to be pruned according to the sorting result; and pruning the determined filters to be pruned.
  • the above-mentioned model pruning program based on adjacent convolution is stored in the node of the blockchain where the server cluster is located.
  • If the integrated modules/units of the electronic device 3 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM, Read-Only Memory) .
  • The embodiment of the present application also provides a computer-readable storage medium, which may be non-volatile or volatile; the storage medium stores a computer program which, when executed by a processor, implements: using the absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated; obtaining the convolutional-layer filter-mode parameters according to the filter Manhattan distance, and obtaining the convolutional-layer channel-mode parameters through the channel Manhattan distance; fusing the convolutional-layer filter-mode parameters and the convolutional-layer channel-mode parameters to form a filter pruning probability parameter for judging the filter pruning probability; sorting the filter pruning probability parameters according to the preset rule, and determining the filters to be pruned according to the sorting results; and pruning the determined filters to be pruned.
  • The method for pruning the determined filters to be pruned includes: obtaining the filters to be pruned, and training the adjacent-convolution-based pruning model according to the filters to be pruned and a preset pruning threshold; obtaining a mask matrix from the original parameters of the adjacent-convolution-based pruning model, where the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix containing only 0s and 1s; using the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model; and performing pruning with the parameter-adjusted adjacent-convolution-based pruning model.
  • The method of using the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model includes: multiplying the parameters of the adjacent-convolution-based pruning model by the mask matrix; selecting the model parameters whose mask is 1, and training and adjusting those parameter values through backpropagation; storing the backpropagation-adjusted parameter values and their corresponding matrix positions; and obtaining the final parameters of the adjacent-convolution-based pruning model from the parameter values and their matrix positions, completing the adjustment of the pruning model's parameters.
  • The method of determining the filters to be pruned according to the sorting result of the filter pruning probability parameters includes: sorting the filter pruning probability parameters from large to small, and taking the channels whose filter pruning probability parameter is smaller than a preset threshold as the filters to be pruned; the preset threshold is 1%.
  • The loss function is obtained by the following formula: loss = -Σ_{i=0}^{C-1} y_i · log(p_i), where p = [p_0, …, p_{C-1}] is the probability distribution, each element p_i representing the probability that the sample belongs to class i; y = [y_0, …, y_{C-1}] is the one-hot representation of the sample label (y_i = 1 if the sample belongs to class i, otherwise y_i = 0); and C is the total number of classes.
  • In the step of performing pruning with the parameter-adjusted adjacent-convolution-based pruning model, the pruning includes cutting channels in the convolution kernel, cutting the corresponding channels in the input feature map, and cutting the corresponding filters of the upper-layer convolution kernel that outputs the current input feature map.
  • The input of the convolutional layer to be evaluated is an input feature map of size H×W×C_in; the output is the convolution kernel W, of size (C_in·k_h·k_w)×C_out, and an output feature map of size (H·W)×C_out, where H and W are the height and width of the output feature map, respectively.
  • the specific implementation method can refer to the description of relevant steps in the adjacent convolution-based model pruning method in the embodiment, and details are not repeated here.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software function modules.
  • Blockchain, in essence a decentralized database, is a chain of data blocks generated in association with one another using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • The blockchain can store medical data, such as personal health records, prescriptions, and examination reports.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the technical field of artificial intelligence and discloses a model pruning method based on adjacent convolutions, including: using an absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated, and from these obtaining the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter; forming, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability; sorting the filter pruning probability parameters according to a preset rule, and determining the filters to be pruned according to the sorting result of the filter pruning probability parameters; and pruning the determined filters to be pruned. The convolutional model to be pruned in this application can be a neural network model for smart medical care; this application achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.

Description

Model pruning method, apparatus, and storage medium based on adjacent convolutions
This application claims priority to the Chinese invention patent application filed with the China Patent Office on August 24, 2021, with application number 202110975018.3, entitled "Model pruning method, apparatus, and storage medium based on adjacent convolutions", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the technical field of artificial intelligence, and in particular to a model pruning method and apparatus based on adjacent convolutions, and a computer-readable storage medium.
Background
Convolutional Neural Networks (CNN) are a class of Feedforward Neural Networks that contain convolution computations and have a deep structure, and are one of the representative algorithms of deep learning. Although convolutional neural networks perform well, a convolutional neural network model requires a large amount of computation and contains a large amount of redundant information, so the network needs to be compressed. In the field of smart medical care, convolutional neural networks can widely support functions such as assisted disease diagnosis, health management, and remote consultation; existing compression methods for convolutional neural network models applied to smart medical care include model pruning, quantization, and distillation. Model pruning in the prior art selects and removes the filters of relatively unimportant convolution kernels, then fine-tunes the model from which the unimportant filters have been removed, in order to recover the precision lost by the removal.
The inventors realized that the prior-art methods for selecting the filters of relatively unimportant convolution kernels do not consider the relationships between filters, and therefore cannot fully exploit the redundant information within the model's convolution filters; they are limited to considering a single convolutional layer and do not consider the relationship between two convolutional layers.
Therefore, a convolution pruning method that fully exploits the redundant information within the model's convolution filters and fully considers the relationship between two convolutional layers is urgently needed.
Summary
This application provides a model pruning method, system, electronic device, and storage medium based on adjacent convolutions, whose main purpose is to solve the problem that, in existing smart-medical scenarios, the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers.
To achieve the above purpose, this application provides a model pruning method based on adjacent convolutions, applied to an electronic apparatus, including: using an absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated, obtaining the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtaining the convolutional-layer channel-mode parameter through the channel Manhattan distance; forming, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability; sorting the filter pruning probability parameters according to a preset rule, and determining the filters to be pruned according to the sorting result of the filter pruning probability parameters; and pruning the determined filters to be pruned.
To solve the above problem, this application also provides a model pruning apparatus based on adjacent convolutions, the apparatus including: a filter-mode parameter and channel-mode parameter acquisition unit, configured to use an absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated, obtain the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtain the convolutional-layer channel-mode parameter through the channel Manhattan distance; a filter pruning probability parameter acquisition unit, configured to form, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability; a to-be-pruned filter determination unit, configured to sort the filter pruning probability parameters according to a preset rule and determine the filters to be pruned according to the sorting result of the filter pruning probability parameters; and a pruning unit, configured to prune the determined filters to be pruned.
To solve the above problem, this application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the steps of the foregoing adjacent-convolution-based model pruning method.
To solve the above problem, this application also provides a computer-readable storage medium in which at least one instruction is stored; the at least one instruction is executed by a processor in an electronic device to implement the adjacent-convolution-based model pruning method described above.
The adjacent-convolution-based model pruning method provided by this application is based on fusing the importance of two adjacent convolutional layers; it solves the prior-art problem that the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers, can truly identify the unimportant filters in the convolution, and thereby achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a model pruning method based on adjacent convolutions according to an embodiment of the present application;
FIG. 2 is a logical structural block diagram of a model pruning apparatus based on adjacent convolutions according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing a model pruning method based on adjacent convolutions according to an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described here are used only to explain this application, not to limit it.
The embodiments of this application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Artificial intelligence basic technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, and mechatronics. The artificial intelligence software technology in this application is machine learning technology based on convolutional neural networks. Convolutional neural networks can be applied to many different fields, such as speech recognition, medical diagnosis, and application testing.
The prior-art methods for selecting the filters of relatively unimportant convolution kernels do not consider the relationships between filters and therefore cannot fully exploit the redundant information within the model's convolution filters. The adjacent-convolution-based model pruning method of this application fully considers the redundant information within the convolution filters and the relationship between two convolutional layers, achieving the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
In the prior art, the methods for selecting the filters of relatively unimportant convolution kernels are: 1) judging the importance of a convolution kernel's filter by the size of the parameters (weights) of a batch normalization layer; although easy to understand and implement, the weights of the BN layer can hardly measure the amount of information the corresponding filter really carries, so the information correlation between filters cannot be measured; 2) judging the importance of a convolution kernel's filter by the size of the filter's Manhattan distance or Euclidean norm; although easy to understand and implement, this depends only on the magnitude of the values, so the information correlation between filters cannot be measured; 3) judging the importance of a convolution kernel's filter by the geometric median of the space the filters lie in, i.e., computing the filter closest to the median of the set of all filters, judging that filter to be unimportant, and pruning it. However, the information content of the geometric median cannot substitute for the information content of the filter.
The hardware for the adjacent-convolution-based model pruning method of this application is an NVIDIA V100 GPU, and all experiments use the PyTorch framework. PyTorch is a Python package developed by Facebook for training neural networks and is a deep learning framework built by Facebook. PyTorch provides a NumPy-like abstraction to represent tensors (multidimensional arrays) and can use a GPU (Graphics Processing Unit) to accelerate training. Through a technique called reverse-mode auto-differentiation, PyTorch can arbitrarily change the behavior of a network with zero lag or overhead. PyTorch is an end-to-end machine learning framework whose features include TorchScript, distributed training, mobile deployment (experimental), tools and libraries, native ONNX support, a C++ front end, and cloud partners. PyTorch is simple and easy to use and performs excellently in model speed; compared with frameworks such as TensorFlow, many models may run faster when implemented in PyTorch. It is therefore suitable for the adjacent-convolution model pruning scenario of this application.
Common convolutional neural networks are built by repeatedly stacking, in order, a convolutional layer, a BN layer (batch normalization layer), and a ReLU layer (nonlinear activation layer). Each unit composed of a convolutional layer, a batch normalization layer, and a nonlinear activation layer serves as a feature extraction unit of the convolutional neural network, and these feature extraction units are arranged in order along the depth direction of the network. The output feature map of one feature extraction unit serves as the input feature map of the next. Each cuboid in a convolutional layer is a filter, and each filter has several channels from front to back. This application judges the importance of a convolutional layer from the two dimensions of filter and channel.
Specifically, as an example, FIG. 1 is a schematic flowchart of the model pruning method based on adjacent convolutions provided by an embodiment of this application. Referring to FIG. 1, this application provides a model pruning method based on adjacent convolutions; the method can be executed by an apparatus, and the apparatus can be implemented by software and/or hardware.
In this embodiment, the model pruning method based on adjacent convolutions includes steps S110 to S140:
S110: Use the absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated; obtain the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtain the convolutional-layer channel-mode parameter through the channel Manhattan distance. Specifically, filter-level sparsification of the convolutional layer is performed by adding a penalty term (group lasso) to the objective function of the network.
The input of the convolutional layer to be evaluated is an input feature map of size H×W×C_in; the output is the convolution kernel W, of size (C_in·k_h·k_w)×C_out, and an output feature map of size (H·W)×C_out, where H and W are the height and width of the output feature map, respectively. From matrix multiplication it can be seen that a given row of the convolution kernel is multiplied only with specific columns of the input feature-map matrix.
The loss function is obtained by the following formula:
loss = -Σ_{i=0}^{C-1} y_i · log(p_i)
where p = [p_0, …, p_{C-1}] is the probability distribution, each element p_i representing the probability that the sample belongs to class i; y = [y_0, …, y_{C-1}] is the one-hot representation of the sample label (if the sample belongs to class i then y_i = 1, otherwise y_i = 0); and C is the total number of classes.
It should be noted that the filter-mode importance calculation focuses on the filter dimension: taking the filter as the unit, each filter is compared with the other filters on the relevant index. Using the Manhattan distance (the L1 norm) as the importance index, the parameters in a filter serve as the basis for evaluation, and the differences in the mean parameter Manhattan distance among the filters are compared. That is, the smaller the mean Manhattan distance of a filter, the less important that filter is: it plays no significant role in the computation of the neural network, its information is redundant, and it can be deleted during pruning. Conversely, if its mean Manhattan distance is large, the filter has a significant impact on the results and carries a significant amount of information, so it cannot be deleted.
Take as an example a convolutional layer to be evaluated whose convolution parameters form an N×c×k×k matrix, where N is the number of filters and c is the number of channels in each filter. The absolute-value function is the L1 function used to compute absolute values. Taking the Manhattan distance means taking the absolute values of the elements of the c×k×k matrix and then averaging these c×k×k absolute values; the resulting value is the index of the filter's pruning probability, that is, the index of the filter's importance. In short, filter-level sparsification produces row and column sparsity in the 2D matrix; the all-zero rows and columns are then cut from the matrix to reduce its dimensions, thereby improving the computational efficiency of the convolutional model.
The channel-mode importance calculation focuses on the channel dimension. Again taking a convolutional layer whose parameters form an N×c×k×k matrix as an example: filter-wise it splits into N combinations of size c×k×k, whereas channel-wise it splits into c combinations of size N×k×k. As in the filter dimension, the mean Manhattan distance is used as the index for evaluating channel importance. That is, the importance of a layer is reflected by comparing the importance of the channels inside it. Different channels have different Manhattan distances; the larger the value, the more important the channel and the less it can be removed in pruning. After several channels of the current layer are cut, the output feature map is reconstructed so that the information loss is minimized.
S120: Multiply the convolutional-layer filter-mode parameter by the convolutional-layer channel-mode parameter to form a filter pruning probability parameter for judging the filter pruning probability.
The filter pruning probability parameter, which is also the filter importance parameter, is obtained by the following formula:
filter pruning probability parameter = convolutional-layer filter-mode parameter × convolutional-layer channel-mode parameter.
In two consecutive convolutions, the different channels of the second convolution are generated by the different filter convolution kernels of the first convolution. If the first convolution is evaluated in filter mode and the second convolution is evaluated in channel mode, both evaluations measure the importance of the first layer's filters, so the two results can be fused. Specifically, filter pruning probability parameter = convolutional-layer filter-mode parameter × convolutional-layer channel-mode parameter; that is, the two results are multiplied and fused to obtain a new index that evaluates filter importance more accurately. This evaluation considers not only the importance of the filters used to generate the feature maps but also the importance of the channels when the feature maps are used.
S130: Sort the filter pruning probability parameters according to a preset rule, and determine the filters to be pruned according to the sorting result of the filter pruning probability parameters.
The method for determining the filters to be pruned according to the sorting result includes: sorting the filter pruning probability parameters from large to small, and taking the channels whose filter pruning probability parameter is smaller than a preset threshold as the filters to be pruned; the preset threshold is 1%.
After the importance results of the filter mode and the channel mode are fused, the filters are sorted by the fused value; with a pruning rate selected in advance, a certain number of filters with relatively small total value, together with their related weights, are removed, and the pruned model is obtained. That is, sort by the filter pruning probability parameter from large to small, then cut out all channels whose importance is below the preset threshold. In a specific implementation, the preset threshold may be 1%.
In a specific embodiment, the importance values of a certain layer of the network can be used to determine the channels of that layer to be pruned. In layer l, channels whose importance is less than p times the maximum importance in the layer are cut; following the above notation, the set of cut channels in layer l consists of exactly those channels, where p ∈ (0,1) is the threshold. For example, if a convolutional layer has four channels whose computed importances are {1.5, 2.1, 0.003, 0.02} and p = 0.01, then the third and fourth channels are cut.
Filter-level pruning only changes the number of filter banks and feature channels in the network, and the resulting model can run without any special algorithm design; this is called structured pruning. A module that is redundant for the current stage is not necessarily redundant for other stages. By jointly considering the importance of adjacent convolutions, linkage between the layers of the convolutional framework is realized, and pruning of the entire convolutional model is then completed.
S140: Prune the determined filters to be pruned.
The layers that need to be pruned are determined through steps S110 to S130, and pruning is then carried out according to a preset pruning threshold or ratio. Specifically, the layers that need to be pruned are generally fully connected layers.
The method for pruning the determined filters to be pruned includes:
S141: Obtain the filters to be pruned, and train the adjacent-convolution-based pruning model according to the filters to be pruned and a preset pruning threshold; S142: Obtain a mask matrix from the original parameters of the adjacent-convolution-based pruning model, where the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix containing only 0s and 1s; S143: Use the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model.
The method of using the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model includes: multiplying the parameters of the adjacent-convolution-based pruning model by the mask matrix; selecting the model parameters whose mask is 1, and training and adjusting those parameter values through backpropagation; storing the backpropagation-adjusted parameter values and their corresponding matrix positions; and obtaining the final parameters of the adjacent-convolution-based pruning model from the parameter values and their matrix positions, completing the adjustment of the pruning model's parameters.
S144: Perform pruning using the parameter-adjusted adjacent-convolution-based pruning model.
In the step of performing pruning with the parameter-adjusted adjacent-convolution-based pruning model, the pruning includes cutting channels in the convolution kernel, cutting the corresponding channels in the input feature map, and cutting the corresponding filters of the upper-layer convolution kernel that outputs the current input feature map.
It should be noted that the specific implementation is to add, by modifying the code, a mask matrix with the same size as the parameter matrix; the mask matrix contains only 0s and 1s and is actually used for retraining the network. The ratio of the number of pruned channels to the total number of channels in the network is defined as the pruning rate, written pruned_ratio. The number of pruned channels is pruned_channels = pruned_ratio × total number of channels in the network. The upper limit of the pruning rate, upper_ratio, is 1, and the lower limit, lower_ratio, is 0. The initial pruning rate is 0.5. The per-channel feature-map means are sorted in ascending order, i.e., sort_{min→max}{ch_avg}; the channel-selection-layer mask values corresponding to the first pruned_channels channels are set to 0, and the mask values corresponding to the remaining channels are set to 1.
It should be noted that pruning is an iterative process; model pruning is usually called "iterative pruning", the iteration being an alternating, repeated process of pruning and model training. The goal of model pruning is to keep only the important weights; the targets handled include fully connected layer pruning and convolutional layer pruning. Pruning affects deep neural networks in different ways. The biggest effect is reducing the computational cost while maintaining the same performance: deleting features that are not really used in the deep network also speeds up inference and training. The second effect is that reducing the number of parameters, i.e., reducing redundancy in the parameter space, can improve the generalization ability of the model.
This embodiment uses the vgg16 model as an example to verify the effectiveness of the algorithm. The dataset is cifar10, each experiment performs model compression training for 500 epochs, the hardware is an NVIDIA V100 GPU, and all experiments use the PyTorch framework. The accuracy of the uncompressed model is 93.99%, and the accuracy of the pruned models is shown in Table 1 below.
Table 1: Algorithm comparison
Compression method | Raw Model | APoZ  | MeanAct | FPGM  | TaylorFO | This application
Accuracy (%)       | 93.99     | 91.89 | 92.77   | 93.45 | 93.54    | 93.65
As can be seen from Table 1: APoZ is pruning based on the average percentage of zeros in the feature map; for details see Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. MeanAct is pruning based on the smallest average activation value; for details see Pruning Convolutional Neural Networks for Resource Efficient Inference. FPGM is channel pruning based on the geometric median; for details see Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration. TaylorFO is channel pruning based on a first-order Taylor expansion; for details see Importance Estimation for Neural Network Pruning. The table shows that the method in this application obtained the highest accuracy, better than all the other pruning methods, indicating that this application has indeed found the unimportant filters in the convolution; it is a good pruning method that can effectively balance detection loss rate and accuracy during pruning.
In short, by fusing the importance of two adjacent convolutional layers, the adjacent-convolution-based model pruning method of this application solves the prior-art problem that the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers; it can truly identify the unimportant filters in the convolution, and thereby achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
Corresponding to the above model pruning method based on adjacent convolutions, this application also provides a model pruning apparatus based on adjacent convolutions. FIG. 2 shows the functional modules of the apparatus according to an embodiment of this application.
As shown in FIG. 2, the adjacent-convolution-based model pruning apparatus 200 provided by this application can be installed in an electronic device. According to the implemented functions, the apparatus 200 may include a filter-mode parameter and channel-mode parameter acquisition unit 210, a filter pruning probability parameter acquisition unit 220, a to-be-pruned filter determination unit 230, and a pruning unit 240. The units of this application may also be called modules, referring to a series of computer program segments that can be executed by the processor of the electronic device, can complete a fixed function, and are stored in the memory of the electronic device.
In this embodiment, the functions of each module/unit are as follows:
The filter-mode parameter and channel-mode parameter acquisition unit 210 is used to obtain, using an absolute-value function, the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated, obtain the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtain the convolutional-layer channel-mode parameter through the channel Manhattan distance;
The filter pruning probability parameter acquisition unit 220 is used to form, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability;
The to-be-pruned filter determination unit 230 is used to sort the filter pruning probability parameters according to a preset rule and determine the filters to be pruned according to the sorting result of the filter pruning probability parameters;
The pruning unit 240 is used to prune the determined filters to be pruned.
The filter pruning probability parameter acquisition unit 220 obtains the filter pruning probability parameter by the formula: filter pruning probability parameter = convolutional-layer filter-mode parameter × convolutional-layer channel-mode parameter.
In a specific embodiment of this application, the pruning unit 240 further includes a model training subunit, a parameter adjustment subunit, and a model pruning subunit (not shown in the figure).
The model training subunit is used to obtain the filters to be pruned, train the adjacent-convolution-based pruning model according to the filters to be pruned and a preset pruning threshold, and obtain a mask matrix from the original parameters of the adjacent-convolution-based pruning model; the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix containing only 0s and 1s;
The parameter adjustment subunit is used to adjust the parameters of the adjacent-convolution-based pruning model using the mask matrix;
The model pruning subunit is used to perform pruning with the parameter-adjusted adjacent-convolution-based pruning model.
In short, by fusing the importance of two adjacent convolutional layers, the adjacent-convolution-based model pruning apparatus of this application solves the prior-art problem that the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers; it can truly identify the unimportant filters in the convolution, and thereby achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
For more specific implementations of the above adjacent-convolution-based model pruning apparatus provided by this application, refer to the descriptions of the embodiments of the adjacent-convolution-based model pruning method above; they are not enumerated one by one here.
As can be seen from the above embodiments, the adjacent-convolution-based model pruning method proposed by this application, by fusing the importance of two adjacent convolutional layers, solves the prior-art problem that the convolution pruning process is limited to a single convolutional layer and does not consider the relationship between two convolutional layers; it can truly identify the unimportant filters in the convolution, and thereby achieves the technical effect of reaching higher precision while maintaining relatively good model performance of the convolutional model.
As shown in FIG. 3, this application provides an electronic device 3 for the model pruning method based on adjacent convolutions.
The electronic device 3 may include a processor 30, a memory 31, and a bus, and may also include a computer program stored in the memory 31 and runnable on the processor 30, such as an adjacent-convolution-based model pruning program 32.
The memory 31 includes at least one type of readable storage medium, including flash memory, mobile hard disk, multimedia card, card-type memory (for example SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 31 may be an internal storage unit of the electronic device 3, such as a mobile hard disk of the electronic device 3. In other embodiments, the memory 31 may also be an external storage device of the electronic device 3, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the electronic device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 31 can be used not only to store application software installed in the electronic device 3 and various data, such as the code of the adjacent-convolution-based model pruning program, but also to temporarily store data that has been output or will be output.
In some embodiments, the processor 30 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPU), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 30 is the control core (Control Unit) of the electronic device; it connects all components of the entire electronic device using various interfaces and lines, and executes the various functions of the electronic device 3 and processes data by running or executing programs or modules stored in the memory 31 (for example, the adjacent-convolution-based model pruning program) and calling data stored in the memory 31.
The bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus can be divided into address bus, data bus, control bus, and so on. The bus is configured to realize connection and communication between the memory 31, the at least one processor 30, and so on.
FIG. 3 shows only an electronic device with certain components; those skilled in the art will understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 3, which may include fewer or more components than shown, or combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 3 may also include a power supply (such as a battery) powering the components; preferably, the power supply may be logically connected to the at least one processor 30 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power-consumption management. The power supply may also include one or more DC or AC power supplies, recharging devices, power-failure detection circuits, power converters or inverters, power status indicators, and any other components. The electronic device 3 may also include various sensors, a Bluetooth module, a Wi-Fi module, etc., which are not repeated here.
Further, the electronic device 3 may also include a network interface; optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), typically used to establish a communication connection between the electronic device 3 and other electronic devices.
Optionally, the electronic device 3 may also include a user interface, which may be a display (Display) or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, etc. The display may also appropriately be called a display screen or display unit, used to display information processed in the electronic device 3 and to display a visualized user interface.
It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The adjacent-convolution-based model pruning program 32 stored in the memory 31 of the electronic device 3 is a combination of multiple instructions; when run in the processor 30, it can implement: using the absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated; obtaining the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtaining the convolutional-layer channel-mode parameter through the channel Manhattan distance; forming, from the product of the convolutional-layer filter-mode parameter and the convolutional-layer channel-mode parameter, a filter pruning probability parameter for judging the filter pruning probability; sorting the filter pruning probability parameters according to a preset rule, and determining the filters to be pruned according to the sorting result of the filter pruning probability parameters; and pruning the determined filters to be pruned.
Specifically, for the specific implementation of the above instructions by the processor 30, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here. It should be emphasized that, to further ensure the privacy and security of the above adjacent-convolution-based model pruning program, it is stored in a node of the blockchain where the server cluster is located.
Further, if the integrated modules/units of the electronic device 3 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
An embodiment of this application also provides a computer-readable storage medium, which may be non-volatile or volatile; the storage medium stores a computer program which, when executed by a processor, implements: using the absolute-value function to obtain the filter Manhattan distance of the filter matrices and the channel Manhattan distance of the channel matrices in the convolutional layer to be evaluated; obtaining the convolutional-layer filter-mode parameter according to the filter Manhattan distance, and obtaining the convolutional-layer channel-mode parameter through the channel Manhattan distance; fusing the convolutional-layer filter-mode parameter with the convolutional-layer channel-mode parameter to form a filter pruning probability parameter for judging the filter pruning probability; sorting the filter pruning probability parameters according to a preset rule, and determining the filters to be pruned according to the sorting result of the filter pruning probability parameters; and pruning the determined filters to be pruned.
Further, preferably, the filter pruning probability parameter is obtained by the formula: filter pruning probability parameter = convolutional-layer filter-mode parameter × convolutional-layer channel-mode parameter.
Further, preferably, the method for pruning the determined filters to be pruned includes: obtaining the filters to be pruned, and training the adjacent-convolution-based pruning model according to the filters to be pruned and a preset pruning threshold; obtaining a mask matrix from the original parameters of the adjacent-convolution-based pruning model, where the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix containing only 0s and 1s; using the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model; and performing pruning with the parameter-adjusted adjacent-convolution-based pruning model.
Further, preferably, the method of using the mask matrix to adjust the parameters of the adjacent-convolution-based pruning model includes: multiplying the parameters of the adjacent-convolution-based pruning model by the mask matrix; selecting the model parameters whose mask is 1, and training and adjusting those parameter values through backpropagation; storing the backpropagation-adjusted parameter values and their corresponding matrix positions; and obtaining the final parameters of the adjacent-convolution-based pruning model from the parameter values and their matrix positions, completing the adjustment of the pruning model's parameters.
Further, preferably, the method for determining the filters to be pruned according to the sorting result of the filter pruning probability parameters includes: sorting the filter pruning probability parameters from large to small, and taking the channels whose filter pruning probability parameter is smaller than a preset threshold as the filters to be pruned; the preset threshold is 1%.
Further, preferably, the loss function is obtained by the following formula:
loss = -Σ_{i=0}^{C-1} y_i · log(p_i)
where p = [p_0, …, p_{C-1}] is the probability distribution, each element p_i representing the probability that the sample belongs to class i; y = [y_0, …, y_{C-1}] is the one-hot representation of the sample label (if the sample belongs to class i then y_i = 1, otherwise y_i = 0); and C is the total number of classes.
Further, preferably, in the step of performing pruning with the parameter-adjusted adjacent-convolution-based pruning model, the pruning includes cutting channels in the convolution kernel, cutting the corresponding channels in the input feature map, and cutting the corresponding filters of the upper-layer convolution kernel that outputs the current input feature map.
Further, preferably, the input of the convolutional layer to be evaluated is an input feature map of size H×W×C_in; the output is the convolution kernel W, of size (C_in·k_h·k_w)×C_out, and an output feature map of size (H·W)×C_out, where H and W are the height and width of the output feature map, respectively.
Specifically, for the specific implementation when the computer program is executed by the processor, refer to the description of the relevant steps in the embodiment of the adjacent-convolution-based model pruning method, which is not repeated here.
In the several embodiments provided by this application, it should be understood that the disclosed device, apparatus, and method can be implemented in other ways. For example, the apparatus embodiments described above are only schematic; the division of the modules is only a logical functional division, and there may be other ways of division in actual implementation.
The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from the spirit or basic features of this application.
Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-limiting; the scope of this application is defined by the appended claims rather than the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be included in this application. Any reference sign in the claims should not be regarded as limiting the claim concerned.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Blockchain, in essence a decentralized database, is a chain of data blocks generated in association with one another using cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer; the blockchain can store medical data such as personal health records, prescriptions, and examination reports.
In addition, it is obvious that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses stated in the system claims can also be implemented by one unit or apparatus through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate, not to limit, the technical solution of this application; although this application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of this application can be modified or equivalently replaced without departing from the spirit and scope of the technical solution of this application.

Claims (20)

  1. 一种基于相邻卷积的模型剪枝方法,应用于电子装置,其中,所述方法包括:
    利用绝对值函数获取待评价卷积层中的滤波器矩阵的滤波器曼哈顿距离以及通道矩阵的通道曼哈顿距离,根据所述滤波器曼哈顿距离获取卷积层滤波器方式参数,并通过所述通道曼哈顿距离获取卷积层通道方式参数;
    将所述卷积层滤波器方式参数与卷积层通道方式参数的乘积,形成用于判断滤波器剪枝概率的滤波器剪枝概率参数;
    根据预设的规则将所述滤波器剪枝概率参数进行排序,并根据所述滤波器剪枝概率参数的排序结果确定待剪枝滤波器;
    将所确定的待剪枝滤波器进行裁剪。
  2. 如权利要求1所述的基于相邻卷积的模型剪枝方法,其中,所述将所确定的待剪枝滤波器进行裁剪的方法包括:
    获取待剪枝滤波器,根据待剪枝滤波器以及预设的裁剪阈值训练基于相邻卷积的剪枝模型;根据所述基于相邻卷积的剪枝模型的原始参数获取掩码矩阵;其中,所述掩码矩阵与所述基于相邻卷积的剪枝模型的原始参数矩阵尺寸一致,且所述掩码矩阵为包括0和1的训练矩阵;
    利用所述掩码矩阵对所述基于相邻卷积的剪枝模型的参数进行调整;
    利用参数调整后的所述基于相邻卷积的剪枝模型进行剪枝。
  3. 如权利要求2中所述的基于相邻卷积的模型剪枝方法,其中,所述利用所述掩码矩阵对所述基于相邻卷积的剪枝模型的参数进行调整的方法包括:
    将所述基于相邻卷积的剪枝模型的参数与所述掩码矩阵相乘;
    筛选掩码为1的剪枝模型的模型参数,掩码并对所述模型参数值进行训练以及反向传播调整;
    储存通过反向传播调整后的模型参数值以及其对应的矩阵位置;
    通过模型参数值以及其对应的矩阵位置获取所述基于相邻卷积的剪枝模型的最终参数,完成对所述剪枝模型的参数进行调整。
  4. 如权利要求1所述的基于相邻卷积的模型剪枝方法,其中,所述步骤根据所述滤波器剪枝概率参数的排序结果确定待剪枝滤波器的方法包括:
    按照滤波器剪枝概率参数从大到小进行排序,将滤波器剪枝概率参数小于预设阈值的通道作为待剪枝滤波器;其中,预设阈值为1%。
  5. 如权利要求1所述的基于相邻卷积的模型剪枝方法,其中,所述损失函数通过以下公式获得:
    Figure PCTCN2022071221-appb-100001
    其中,p=[p 0,…,p C-1]为概率分布,每个元素p i表示样本属于第i类的概率,y= [y 0,…,y C-1]是样本标签的表示,当样本属于类别i,则y i=1,否则y i=0;C是总共的类别数。
  6. The model pruning method based on adjacent convolution according to claim 2, wherein,
    in the step of pruning with the parameter-adjusted adjacent-convolution-based pruning model, the pruning comprises pruning channels of a convolution kernel, pruning the channels of the input feature map that correspond to the pruned kernel channels, and pruning the corresponding kernel of the upper layer that outputs the current input feature map.
  7. The model pruning method based on adjacent convolution according to claim 2, wherein
    the input of the convolutional layer to be evaluated is an input feature map H*W*C_in, and its outputs are the convolution kernel W(C_in*k_h*k_w)*(C_out) and the output feature map (H*W)*(C_out), wherein H and W are the height and width of the output feature map, respectively.
  8. A model pruning apparatus based on adjacent convolution, wherein the apparatus comprises:
    a filter-wise and channel-wise parameter obtaining unit, configured to use an absolute value function to obtain the filter Manhattan distance of the filter matrix and the channel Manhattan distance of the channel matrix in the convolutional layer to be evaluated, obtain a filter-wise parameter of the convolutional layer from the filter Manhattan distance, and obtain a channel-wise parameter of the convolutional layer from the channel Manhattan distance;
    a filter pruning probability parameter obtaining unit, configured to take the product of the filter-wise parameter and the channel-wise parameter of the convolutional layer to form a filter pruning probability parameter used to judge the pruning probability of a filter;
    a filter-to-be-pruned determining unit, configured to sort the filter pruning probability parameters according to a preset rule and determine the filters to be pruned from the sorting result of the filter pruning probability parameters; and
    a pruning unit, configured to prune the determined filters.
  9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a model pruning method based on adjacent convolution, the method comprising:
    using an absolute value function to obtain the filter Manhattan distance of the filter matrix and the channel Manhattan distance of the channel matrix in the convolutional layer to be evaluated, obtaining a filter-wise parameter of the convolutional layer from the filter Manhattan distance, and obtaining a channel-wise parameter of the convolutional layer from the channel Manhattan distance;
    taking the product of the filter-wise parameter and the channel-wise parameter of the convolutional layer to form a filter pruning probability parameter used to judge the pruning probability of a filter;
    sorting the filter pruning probability parameters according to a preset rule, and determining the filters to be pruned from the sorting result of the filter pruning probability parameters; and
    pruning the determined filters.
  10. The computer-readable storage medium according to claim 9, wherein the method for pruning the determined filters comprises: obtaining the filters to be pruned, and training an adjacent-convolution-based pruning model from the filters to be pruned and a preset pruning threshold; obtaining a mask matrix from the original parameters of the adjacent-convolution-based pruning model, wherein the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix consisting of 0s and 1s; adjusting the parameters of the adjacent-convolution-based pruning model with the mask matrix; and pruning with the parameter-adjusted adjacent-convolution-based pruning model.
  11. The computer-readable storage medium according to claim 10, wherein the method for adjusting the parameters of the adjacent-convolution-based pruning model with the mask matrix comprises: multiplying the parameters of the adjacent-convolution-based pruning model by the mask matrix; selecting the model parameters of the pruning model whose mask is 1, and training those parameter values and adjusting them through back-propagation; storing the parameter values adjusted through back-propagation together with their corresponding matrix positions; and obtaining the final parameters of the adjacent-convolution-based pruning model from the parameter values and their corresponding matrix positions, thereby completing the adjustment of the pruning model's parameters.
  12. The computer-readable storage medium according to claim 9, wherein the step of determining the filters to be pruned from the sorting result of the filter pruning probability parameters comprises: sorting the filter pruning probability parameters from largest to smallest, and taking the channels whose filter pruning probability parameter is smaller than a preset threshold as the filters to be pruned, wherein the preset threshold is 1%.
  13. The computer-readable storage medium according to claim 10, wherein, in the step of pruning with the parameter-adjusted adjacent-convolution-based pruning model, the pruning comprises pruning channels of a convolution kernel, pruning the channels of the input feature map that correspond to the pruned kernel channels, and pruning the corresponding kernel of the upper layer that outputs the current input feature map.
  14. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of a model pruning method based on adjacent convolution, the method comprising:
    using an absolute value function to obtain the filter Manhattan distance of the filter matrix and the channel Manhattan distance of the channel matrix in the convolutional layer to be evaluated, obtaining a filter-wise parameter of the convolutional layer from the filter Manhattan distance, and obtaining a channel-wise parameter of the convolutional layer from the channel Manhattan distance;
    taking the product of the filter-wise parameter and the channel-wise parameter of the convolutional layer to form a filter pruning probability parameter used to judge the pruning probability of a filter;
    sorting the filter pruning probability parameters according to a preset rule, and determining the filters to be pruned from the sorting result of the filter pruning probability parameters; and
    pruning the determined filters.
  15. The electronic device according to claim 14, wherein the method for pruning the determined filters comprises:
    obtaining the filters to be pruned, and training an adjacent-convolution-based pruning model from the filters to be pruned and a preset pruning threshold; obtaining a mask matrix from the original parameters of the adjacent-convolution-based pruning model, wherein the mask matrix has the same size as the original parameter matrix of the adjacent-convolution-based pruning model and is a training matrix consisting of 0s and 1s;
    adjusting the parameters of the adjacent-convolution-based pruning model with the mask matrix; and
    pruning with the parameter-adjusted adjacent-convolution-based pruning model.
  16. The electronic device according to claim 15, wherein the method for adjusting the parameters of the adjacent-convolution-based pruning model with the mask matrix comprises:
    multiplying the parameters of the adjacent-convolution-based pruning model by the mask matrix;
    selecting the model parameters of the pruning model whose mask is 1, and training those parameter values and adjusting them through back-propagation;
    storing the parameter values adjusted through back-propagation together with their corresponding matrix positions; and
    obtaining the final parameters of the adjacent-convolution-based pruning model from the parameter values and their corresponding matrix positions, thereby completing the adjustment of the pruning model's parameters.
  17. The electronic device according to claim 14, wherein the step of determining the filters to be pruned from the sorting result of the filter pruning probability parameters comprises:
    sorting the filter pruning probability parameters from largest to smallest, and taking the channels whose filter pruning probability parameter is smaller than a preset threshold as the filters to be pruned, wherein the preset threshold is 1%.
  18. The electronic device according to claim 14, wherein
    the loss function is obtained by the following formula:
    loss = -∑_{i=0}^{C-1} y_i · log(p_i)
    wherein p = [p_0, …, p_{C-1}] is a probability distribution, each element p_i denoting the probability that the sample belongs to class i; y = [y_0, …, y_{C-1}] is the one-hot representation of the sample label, with y_i = 1 if the sample belongs to class i and y_i = 0 otherwise; and C is the total number of classes.
  19. The electronic device according to claim 15, wherein,
    in the step of pruning with the parameter-adjusted adjacent-convolution-based pruning model, the pruning comprises pruning channels of a convolution kernel, pruning the channels of the input feature map that correspond to the pruned kernel channels, and pruning the corresponding kernel of the upper layer that outputs the current input feature map.
  20. The electronic device according to claim 14, wherein
    the input of the convolutional layer to be evaluated is an input feature map H*W*C_in, and its outputs are the convolution kernel W(C_in*k_h*k_w)*(C_out) and the output feature map (H*W)*(C_out), wherein H and W are the height and width of the output feature map, respectively.
PCT/CN2022/071221 2021-08-24 2022-01-11 基于相邻卷积的模型剪枝方法、装置及存储介质 WO2023024407A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110975018.3 2021-08-24
CN202110975018.3A CN113673697A (zh) 2021-08-24 2021-08-24 基于相邻卷积的模型剪枝方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2023024407A1 true WO2023024407A1 (zh) 2023-03-02

Family

ID=78545610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071221 WO2023024407A1 (zh) 2021-08-24 2022-01-11 基于相邻卷积的模型剪枝方法、装置及存储介质

Country Status (2)

Country Link
CN (1) CN113673697A (zh)
WO (1) WO2023024407A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673697A (zh) * 2021-08-24 2021-11-19 平安科技(深圳)有限公司 基于相邻卷积的模型剪枝方法、装置及存储介质
CN114330713B (zh) * 2022-01-11 2023-05-02 平安科技(深圳)有限公司 卷积神经网络模型剪枝方法和装置、电子设备、存储介质
CN115170917B (zh) * 2022-06-20 2023-11-07 美的集团(上海)有限公司 图像处理方法、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210073643A1 (en) * 2019-09-05 2021-03-11 Vahid PARTOVI NIA Neural network pruning
CN112488304A (zh) * 2020-12-21 2021-03-12 湖南大学 一种卷积神经网络中的启发式滤波器剪枝方法和系统
CN113240085A (zh) * 2021-05-12 2021-08-10 平安科技(深圳)有限公司 模型剪枝方法、装置、设备及存储介质
CN113673697A (zh) * 2021-08-24 2021-11-19 平安科技(深圳)有限公司 基于相邻卷积的模型剪枝方法、装置及存储介质

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341645A (zh) * 2023-04-07 2023-06-27 陕西物流集团产业研究院有限公司 一种基于全局多源层间的联合剪枝方法及系统
CN116341645B (zh) * 2023-04-07 2024-03-19 陕西物流集团产业研究院有限公司 一种基于全局多源层间的联合剪枝方法及系统
CN116402116A (zh) * 2023-06-05 2023-07-07 山东云海国创云计算装备产业创新中心有限公司 神经网络的剪枝方法、系统、设备、介质及图像处理方法
CN116402116B (zh) * 2023-06-05 2023-09-05 山东云海国创云计算装备产业创新中心有限公司 神经网络的剪枝方法、系统、设备、介质及图像处理方法
CN116992944A (zh) * 2023-09-27 2023-11-03 之江实验室 基于可学习重要性评判标准剪枝的图像处理方法及装置
CN116992945A (zh) * 2023-09-27 2023-11-03 之江实验室 一种基于贪心策略反向通道剪枝的图像处理方法及装置
CN116992944B (zh) * 2023-09-27 2023-12-19 之江实验室 基于可学习重要性评判标准剪枝的图像处理方法及装置
CN116992945B (zh) * 2023-09-27 2024-02-13 之江实验室 一种基于贪心策略反向通道剪枝的图像处理方法及装置
CN117315722A (zh) * 2023-11-24 2023-12-29 广州紫为云科技有限公司 一种基于知识迁移剪枝模型的行人检测方法
CN117315722B (zh) * 2023-11-24 2024-03-15 广州紫为云科技有限公司 一种基于知识迁移剪枝模型的行人检测方法

Also Published As

Publication number Publication date
CN113673697A (zh) 2021-11-19

Similar Documents

Publication Publication Date Title
WO2023024407A1 (zh) 基于相邻卷积的模型剪枝方法、装置及存储介质
EP3467723B1 (en) Machine learning based network model construction method and apparatus
Bolón-Canedo et al. Feature selection for high-dimensional data
CN113822494A (zh) 风险预测方法、装置、设备及存储介质
CN111597348B (zh) 用户画像方法、装置、计算机设备和存储介质
EP3792840A1 (en) Neural network method and apparatus
CN110321805B (zh) 一种基于时序关系推理的动态表情识别方法
Bamakan et al. A novel feature selection method based on an integrated data envelopment analysis and entropy model
CN110310114A (zh) 对象分类方法、装置、服务器及存储介质
WO2023103527A1 (zh) 一种访问频次的预测方法及装置
CN111768096A (zh) 基于算法模型的评级方法、装置、电子设备及存储介质
CN111652278A (zh) 用户行为检测方法、装置、电子设备及介质
CN108875532A (zh) 一种基于稀疏编码和长度后验概率的视频动作检测方法
CN112699142A (zh) 冷热数据处理方法、装置、电子设备及存储介质
Mandal et al. Unsupervised non-redundant feature selection: a graph-theoretic approach
CN117155771B (zh) 一种基于工业物联网的设备集群故障溯源方法及装置
Wang et al. Towards efficient convolutional neural networks through low-error filter saliency estimation
CN113824580A (zh) 一种网络指标预警方法及系统
CN116629234A (zh) 基于层级动态图卷积网络的谣言检测方法及系统
CN115862653A (zh) 音频去噪方法、装置、计算机设备和存储介质
US20240152818A1 (en) Methods for mitigation of algorithmic bias discrimination, proxy discrimination and disparate impact
CN115114992A (zh) 分类模型训练的方法、装置、设备及存储介质
CN115099339A (zh) 欺诈行为识别方法、装置、电子设备及存储介质
Kumar et al. Extensive survey on feature extraction and feature selection techniques for sentiment classification in social media
CN114118411A (zh) 图像识别网络的训练方法、图像识别方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22859768

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE