CN115660071A - Model pruning method and device


Info

Publication number
CN115660071A
Authority
CN
China
Prior art keywords
filter
target
convolutional layer
filters
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211595555.6A
Other languages
Chinese (zh)
Inventor
兰婷婷
支涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunji Technology Co Ltd
Original Assignee
Beijing Yunji Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunji Technology Co Ltd filed Critical Beijing Yunji Technology Co Ltd
Priority to CN202211595555.6A priority Critical patent/CN115660071A/en
Publication of CN115660071A publication Critical patent/CN115660071A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The disclosure relates to the technical field of machine learning, and provides a model pruning method and device. The method comprises the following steps: screening out a target convolutional layer needing channel pruning from a target model; clustering the filters in the target convolutional layer into a preset number of filter groups by using a clustering algorithm; performing multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge toward one another, and ending the training once the difference between the parameters of same-group filters falls below a preset threshold; fusing each group of filters in the target convolutional layer into a single filter to obtain the channel-pruned target convolutional layer; and, according to the channel-pruned target convolutional layer, fusing the filters in the next convolutional layer of the target model to obtain the channel-pruned target model. These technical means address the loss of model accuracy caused by existing channel pruning techniques.

Description

Model pruning method and device
Technical Field
The disclosure relates to the technical field of machine learning, in particular to a model pruning method and device.
Background
As convolutional networks grow wider and deeper, their memory footprint, power consumption, and floating-point operations per second all increase sharply. Against this background, methods for compressing and accelerating convolutional networks have attracted wide attention. Compared with compression methods such as model quantization and sparsification, channel pruning (also called filter pruning) is independent of parameter precision and requires no special hardware structure, so it can achieve a better compression and acceleration effect, and it has been a research focus in recent years. Current channel pruning methods estimate filter importance with various hand-designed indexes, directly prune some filters (setting their weights to zero), and reconstruct the network from the remaining filters. Although the pruned filters are less important in some sense, they are not fully redundant, so the prune-and-reconstruct operation degrades model accuracy.
In the course of implementing the disclosed concept, the inventors found at least the following technical problem in the related art: existing model channel pruning techniques reduce model accuracy.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a model pruning method and apparatus, an electronic device, and a computer-readable storage medium, to solve the prior-art problem that existing channel pruning techniques reduce model accuracy.
In a first aspect of the embodiments of the present disclosure, a model pruning method is provided, including: screening out a target convolutional layer needing channel pruning from a target model; clustering the filters in the target convolutional layer into a preset number of filter groups by using a clustering algorithm; performing multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge toward one another, and ending the training once the difference between the parameters of same-group filters falls below a preset threshold; fusing each group of filters in the target convolutional layer into a single filter to obtain the channel-pruned target convolutional layer; and, according to the channel-pruned target convolutional layer, fusing the filters in the next convolutional layer of the target model to obtain the channel-pruned target model.
In a second aspect of the embodiments of the present disclosure, a model pruning device is provided, including: a screening module configured to screen out a target convolutional layer needing channel pruning from a target model; a clustering module configured to cluster the filters in the target convolutional layer into a preset number of filter groups using a clustering algorithm; a training module configured to perform multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge toward one another, ending the training once the difference between the parameters of same-group filters falls below a preset threshold; a first fusion module configured to fuse each group of filters in the target convolutional layer into a single filter to obtain the channel-pruned target convolutional layer; and a second fusion module configured to fuse, according to the channel-pruned target convolutional layer, the filters in the next convolutional layer of the target model to obtain the channel-pruned target model.
In a third aspect of the embodiments of the present disclosure, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects: a target convolutional layer needing channel pruning is screened out of the target model; the filters in the target convolutional layer are clustered into a preset number of filter groups; multiple rounds of training draw the parameters of same-group filters steadily closer until their difference falls below a preset threshold; each filter group is fused into a single filter, giving the channel-pruned target convolutional layer; and the filters in the next convolutional layer are fused accordingly, giving the channel-pruned target model. These technical means solve the prior-art problem that channel pruning reduces model accuracy, and provide a channel pruning method that preserves model accuracy.
Drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed for the embodiments or the prior-art description are briefly introduced below. The drawings described here show only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a scenario diagram of an application scenario of an embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of a model pruning method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a model pruning device provided by an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A model pruning method and apparatus according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include terminal devices 101, 102, and 103, a server 104, and a network 105.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting communication with the server 104, including but not limited to smartphones, robots, laptop computers, desktop computers, and the like (for example, 102 may be a robot); when they are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which the embodiments of the present disclosure do not limit. Further, various applications may be installed on the terminal devices 101, 102, and 103, such as data processing applications, instant messaging tools, social platform software, search applications, and shopping applications.
The server 104 may be a server providing various services, for example, a backend server that receives, parses, and processes requests sent by terminal devices that have established a communication connection with it, and generates processing results. The server 104 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center, which is not limited in this disclosure.
The server 104 may be hardware or software. When the server 104 is hardware, it may be various electronic devices that provide various services to the terminal devices 101, 102, and 103. When the server 104 is software, it may be multiple software or software modules providing various services for the terminal devices 101, 102, and 103, or may be a single software or software module providing various services for the terminal devices 101, 102, and 103, which is not limited by the embodiment of the present disclosure.
The network 105 may be a wired network using coaxial cable, twisted pair, or optical fiber, or a wireless network that interconnects communication devices without wiring, for example, Bluetooth, Near Field Communication (NFC), or infrared, which is not limited in the embodiments of the present disclosure.
The target user can establish a communication connection with the server 104 via the network 105 through the terminal devices 101, 102, and 103 to receive or transmit information or the like. It should be noted that the specific types, numbers and combinations of the terminal devices 101, 102 and 103, the server 104 and the network 105 may be adjusted according to the actual requirements of the application scenario, and the embodiment of the present disclosure does not limit this.
Fig. 2 is a schematic flow chart of a model pruning method according to an embodiment of the present disclosure. The model pruning method of Fig. 2 may be performed by the terminal device or the server of Fig. 1. As shown in Fig. 2, the model pruning method includes:
s201, screening out a target convolutional layer needing channel pruning from a target model;
s202, clustering the filters in the target convolution layer into a preset number of filter groups by using a clustering algorithm;
s203, performing multiple rounds of training on the target convolutional layer to enable parameters of the same group of filters in the target convolutional layer to be close to each other continuously until the difference value of the parameters of the same group of filters in the target convolutional layer is smaller than a preset threshold value, and finishing the training;
s204, fusing each group of filters in the target convolutional layer into one filter to obtain a target convolutional layer after channel pruning;
s205, according to the target convolutional layer after channel pruning, the filter in the convolutional layer next to the target convolutional layer in the target model is subjected to fusion processing, and the target model after channel pruning is obtained.
The target model may be any model used in machine learning. Channel pruning of the target model is performed on multiple target convolutional layers within it, so the embodiments of the present disclosure may involve several target convolutional layers. Any common clustering algorithm, such as k-means, may be used to cluster the filters. A filter in a convolutional layer is an m×n matrix used to detect specific features in an image; different filters have different parameters. Multiple rounds of training draw the parameters of same-group filters in the target convolutional layer steadily closer, and once they are sufficiently close (that is, the difference between same-group filter parameters is below the preset threshold), the group can be replaced by a single filter. Filter fusion can be understood as superposing the filters; after fusion, one filter can be used in place of the group. After channel pruning of the target convolutional layer is completed, because the number of output channels of one convolutional layer must match the number of input channels of the next (equivalently, the filter counts must match, since the number of filters equals the number of channels), the next convolutional layer of the target model is pruned according to the same principle. Proceeding in this way completes channel pruning of the target model.
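As a concrete illustration of the clustering step, the minimal sketch below flattens each filter of a PyTorch Conv2d layer and groups the filters with scikit-learn's k-means; the helper name and the returned grouping format are assumptions for illustration, not the patent's reference implementation.

```python
import torch
from sklearn.cluster import KMeans

def cluster_filters(conv: torch.nn.Conv2d, num_groups: int):
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # flatten each filter into one feature vector for clustering.
    flat = conv.weight.detach().cpu().reshape(conv.out_channels, -1).numpy()
    labels = KMeans(n_clusters=num_groups, n_init=10).fit_predict(flat)
    # Return {group_id: [indices of the filters in that group]}.
    return {g: [i for i, lbl in enumerate(labels) if lbl == g]
            for g in range(num_groups)}
```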
Optionally, the target convolutional layers to be channel pruned are screened from the target model according to the principle that channel-pruned convolutional layers must not influence one another. For example, channel pruning the first convolutional layer affects the second convolutional layer but not the third, so the target convolutional layers screened from the target model may be the first and third convolutional layers (channel pruning the first layer affects the second, and this effect is used to complete the channel pruning of the second layer, so the second layer need not be selected separately), and so on.
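One way this screening principle could look in code for a plain chain of convolutions is sketched below; selecting alternating layers is an assumption that matches the first/third-layer example above and is not the patent's general criterion.

```python
import torch

def screen_target_layers(model: torch.nn.Module):
    convs = [m for m in model.modules() if isinstance(m, torch.nn.Conv2d)]
    targets, skip_next = [], False
    for conv in convs:
        if skip_next:
            # This layer absorbs the previous layer's pruning, so it is
            # adjusted during fusion rather than selected for pruning.
            skip_next = False
            continue
        targets.append(conv)
        skip_next = True
    return targets
```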
According to the technical solution provided by the embodiments of the present disclosure, a target convolutional layer needing channel pruning is screened out of the target model; the filters in the target convolutional layer are clustered into a preset number of filter groups; multiple rounds of training draw the parameters of same-group filters steadily closer until their difference falls below a preset threshold; each filter group is fused into a single filter, giving the channel-pruned target convolutional layer; and the filters in the next convolutional layer are fused accordingly, giving the channel-pruned target model. These technical means solve the prior-art problem that channel pruning reduces model accuracy, and provide a channel pruning method that preserves model accuracy.
Performing multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge until their difference falls below the preset threshold comprises: calculating the center point matrix of each filter group from all the filters in that group; calculating, in turn, each filter's distance sum to the center point matrices of all other filter groups; determining the filter with the largest distance sum in each group as that group's target filter; and, according to each group's target filter, training each filter group for multiple rounds with a gradient descent function so that same-group filter parameters steadily converge, ending the training once their difference falls below the preset threshold.
The center point matrix of each filter group can be obtained by summing all the filters in the group and taking the average.
Calculating, in turn, each filter's distance sum to the center point matrices of all other filter groups comprises: for each filter group, sequentially calculating the distance sum of each of its filters to the center point matrices of all other groups with the following formula:
$$D_i = \sum_{j=1}^{K-1} \lVert P_i - M_j \rVert$$
wherein P_i is the i-th filter in the filter group, M_j is the center point matrix of the j-th filter group other than this group, K is the number of filter groups in the target convolutional layer, and ∥·∥ is the norm operator.
For example, if filter P_i in a filter group has the largest distance sum, then P_i is the target filter of that group. In this way, each filter group determines one target filter.
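As an illustration of how each group's target filter might be selected, the sketch below computes each group's center point matrix as the mean of its filters and sums Frobenius-norm distances to the other groups' centers; the function name and data layout are assumptions.

```python
import torch

def select_target_filters(weights: torch.Tensor, groups: dict):
    # weights: conv weights of shape (out_channels, in_channels, kH, kW)
    # groups:  {group_id: [filter indices]} from the clustering step
    centers = {g: weights[idx].mean(dim=0) for g, idx in groups.items()}
    targets = {}
    for g, idx in groups.items():
        other_centers = [c for h, c in centers.items() if h != g]
        # D_i = sum over the K-1 other groups of ||P_i - M_j||
        dist_sums = [sum(torch.norm(weights[i] - m).item()
                         for m in other_centers)
                     for i in idx]
        targets[g] = idx[dist_sums.index(max(dist_sums))]
    return targets  # {group_id: index of that group's target filter}
```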
Performing multiple rounds of training on each filter group with a gradient descent function according to its target filter, so that the parameters of same-group filters steadily converge until their difference is smaller than the preset threshold, comprises cyclically executing the following steps: training each filter group with the gradient descent function according to its target filter, and incrementing the training round by one; ending the training when the difference between the parameters of trained same-group filters is smaller than the preset threshold; and, when that difference is not smaller than the preset threshold, updating the target filter of each trained filter group and continuing with the next round.
When the difference between the parameters of trained same-group filters is not smaller than the preset threshold, the center point matrix of each filter group is updated from all the filters in the trained group; the distance sum of each filter to the center point matrices of all other groups is updated in turn; and the target filter is updated according to the updated distance sums.
The gradient descent function is:
$$\Delta P_h = -\frac{\partial L}{\partial P_h} - (P_h - P_0), \quad h = 1, \ldots, H_i$$
wherein P_0 is the target filter in each filter group, P_h is the h-th filter in the group, H_i is the number of filters in the group, and L is the loss function of the target model.
Training each filter group means training the filters in it. The gradient descent function gives the gradient of each filter in the group along the direction of steepest descent, and the parameters of the filters are then updated with the gradient back-propagation algorithm.
The gradient descent function may be derived as follows. Since the direction of steepest descent of the loss is
$$-\frac{\partial L}{\partial P_h},$$
and the descent direction along which P_h approaches P_0 is
$$-(P_h - P_0),$$
the gradient descent function is
$$\Delta P_h = -\frac{\partial L}{\partial P_h} - (P_h - P_0).$$
Further, introducing the attenuation factor and step size, the gradient descent function can be derived as the update
$$P_h \leftarrow P_h - \varepsilon\,\frac{\partial L}{\partial P_h} - \varepsilon\eta\,(P_h - P_0), \quad h = 1, \ldots, H_i$$
wherein P_0 is the target filter in each filter group, P_h is the h-th filter in the group, H_i is the number of filters in the group, L is the loss function of the target model, η is the attenuation factor, and ε is the step size.
Further, the gradient descent function can be derived as
$$P_h \leftarrow P_h - a\,\frac{\partial L}{\partial P_h} - b\,(P_h - P_0)$$
where a and b are adjustable constants (absorbing the step size and the attenuation factor).
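A sketch of one training round implementing the final update above by editing gradients in place after backpropagation; the SGD setup, the values of a and b, and the convergence helper are illustrative assumptions.

```python
import torch

def train_round(model, conv, groups, targets, loss_fn, loader, a=1.0, b=0.01):
    # One pass over the data: normal backward pass, then rewrite each
    # filter's gradient so the optimizer step realizes
    # P_h <- P_h - a * dL/dP_h - b * (P_h - P_0).
    opt = torch.optim.SGD(conv.parameters(), lr=1.0)  # step size folded into a, b
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for g, idx in groups.items():
                p0 = conv.weight[targets[g]].clone()  # target filter P_0
                for i in idx:
                    conv.weight.grad[i] = (a * conv.weight.grad[i]
                                           + b * (conv.weight[i] - p0))
        opt.step()

def groups_converged(weight, groups, threshold):
    # Training ends once every pair of same-group filters differs by
    # less than the preset threshold.
    return all(torch.norm(weight[i] - weight[j]) < threshold
               for idx in groups.values()
               for i in idx for j in idx if i < j)
```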
Fusing, according to the channel-pruned target convolutional layer, the filters in the next convolutional layer of the target model to obtain the channel-pruned target model comprises: fusing the filters in the input layer of the next convolutional layer of the target model according to the filters in the output layer of the channel-pruned target convolutional layer; and fusing the filters in the other layers of that next convolutional layer according to its fused input-layer filters, thereby obtaining the channel-pruned target model.
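A sketch of the two fusion steps, assuming each group is replaced by the mean of its (by now nearly identical) filters and the next layer's matching input channels are summed so the pruned network computes approximately the same function; bias terms are omitted for brevity and all names are illustrative.

```python
import torch

def fuse_layers(conv: torch.nn.Conv2d, next_conv: torch.nn.Conv2d, groups: dict):
    w = conv.weight.detach()        # (C_out, C_in, kH, kW)
    nw = next_conv.weight.detach()  # (C_out', C_out, kH', kW')
    fused_w, fused_nw = [], []
    for g, idx in sorted(groups.items()):
        # Same-group filters have converged, so their mean stands in for all.
        fused_w.append(w[idx].mean(dim=0))
        # Identical output channels feed the next layer identically, so the
        # matching input-channel slices of the next layer can be summed.
        fused_nw.append(nw[:, idx].sum(dim=1))
    pruned = torch.nn.Conv2d(conv.in_channels, len(groups), conv.kernel_size,
                             conv.stride, conv.padding, bias=False)
    pruned.weight = torch.nn.Parameter(torch.stack(fused_w))
    # New next-layer weight of shape (C_out', num_groups, kH', kW')
    next_weight = torch.nn.Parameter(torch.stack(fused_nw, dim=1))
    return pruned, next_weight
```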
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of a model pruning device provided in an embodiment of the present disclosure. As shown in fig. 3, the model pruning device includes:
the screening module 301 is configured to screen out a target convolutional layer which needs to be subjected to channel pruning from a target model;
a clustering module 302 configured to cluster the filters in the target convolutional layer into a preset number of filter groups using a clustering algorithm;
a training module 303 configured to perform multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge, ending the training once the difference between the parameters of same-group filters in the target convolutional layer is smaller than a preset threshold;
a first fusion module 304 configured to fuse each group of filters in the target convolutional layer into one filter, so as to obtain a channel pruned target convolutional layer;
a second fusion module 305, configured to perform fusion processing on the filter in the convolutional layer next to the target convolutional layer in the target model according to the target convolutional layer after channel pruning, to obtain the target model after channel pruning.
The target model may be any model used in machine learning. Channel pruning of the target model is performed on multiple target convolutional layers within it, so the embodiments of the present disclosure may involve several target convolutional layers. Any common clustering algorithm, such as k-means, may be used to cluster the filters. A filter in a convolutional layer is an m×n matrix used to detect specific features in an image; different filters have different parameters. Multiple rounds of training draw the parameters of same-group filters in the target convolutional layer steadily closer, and once they are sufficiently close (that is, the difference between same-group filter parameters is below the preset threshold), the group can be replaced by a single filter. Filter fusion can be understood as superposing the filters; after fusion, one filter can be used in place of the group. After channel pruning of the target convolutional layer is completed, because the number of output channels of one convolutional layer must match the number of input channels of the next (equivalently, the filter counts must match, since the number of filters equals the number of channels), the next convolutional layer of the target model is pruned according to the same principle. Proceeding in this way completes channel pruning of the target model.
Optionally, the screening module 301 is further configured to screen out the target convolutional layers to be channel pruned from the target model according to the principle that channel-pruned convolutional layers must not influence one another. For example, channel pruning the first convolutional layer affects the second convolutional layer but not the third, so the target convolutional layers screened from the target model may be the first and third convolutional layers (channel pruning the first layer affects the second, and this effect is used to complete the channel pruning of the second layer, so the second layer need not be selected separately), and so on.
According to the technical solution provided by the embodiments of the present disclosure, a target convolutional layer needing channel pruning is screened out of the target model; the filters in the target convolutional layer are clustered into a preset number of filter groups; multiple rounds of training draw the parameters of same-group filters steadily closer until their difference falls below a preset threshold; each filter group is fused into a single filter, giving the channel-pruned target convolutional layer; and the filters in the next convolutional layer are fused accordingly, giving the channel-pruned target model. These technical means solve the prior-art problem that channel pruning reduces model accuracy, and provide a channel pruning method that preserves model accuracy.
Optionally, the training module 303 is further configured to calculate the center point matrix of each filter group from all the filters in that group; calculate, in turn, each filter's distance sum to the center point matrices of all other filter groups; determine the filter with the largest distance sum in each group as that group's target filter; and, according to each group's target filter, train each filter group for multiple rounds with a gradient descent function so that same-group filter parameters steadily converge, ending the training once their difference falls below the preset threshold.
The center point matrix of each filter group can be obtained by summing all the filters in the group and taking the average.
Optionally, the training module 303 is further configured to calculate, for each filter group in turn, the distance sum of each filter in the group to the center point matrices of all other groups with the following formula:
Figure 511393DEST_PATH_IMAGE001
wherein, P i For the ith filter, M, in the filter bank j For the jth filter bank except the filter bank, K is the number of filter banks in the target convolution layer and | | is the norm operator.
For example, if filter P_i in a filter group has the largest distance sum, then P_i is the target filter of that group. In this way, each filter group determines one target filter.
Optionally, the training module 303 is further configured to perform the multiple rounds of training by cyclically executing the following steps: training each filter group with the gradient descent function according to its target filter, and incrementing the training round by one; ending the training when the difference between the parameters of trained same-group filters is smaller than the preset threshold; and, when that difference is not smaller than the preset threshold, updating the target filter of each trained filter group and continuing with the next round.
When the difference between the parameters of trained same-group filters is not smaller than the preset threshold, the center point matrix of each filter group is updated from all the filters in the trained group; the distance sum of each filter to the center point matrices of all other groups is updated in turn; and the target filter is updated according to the updated distance sums.
The gradient descent function is:
$$\Delta P_h = -\frac{\partial L}{\partial P_h} - (P_h - P_0), \quad h = 1, \ldots, H_i$$
wherein P_0 is the target filter in each filter group, P_h is the h-th filter in the group, H_i is the number of filters in the group, and L is the loss function of the target model.
Training each filter group means training the filters in it. The gradient descent function gives the gradient of each filter in the group along the direction of steepest descent, and the parameters of the filters are then updated with the gradient back-propagation algorithm.
The gradient descent function may be derived as follows. Since the direction of steepest descent of the loss is
$$-\frac{\partial L}{\partial P_h},$$
and the descent direction along which P_h approaches P_0 is
$$-(P_h - P_0),$$
the gradient descent function is
$$\Delta P_h = -\frac{\partial L}{\partial P_h} - (P_h - P_0).$$
Further, introducing the attenuation factor and step size, the gradient descent function can be derived as the update
$$P_h \leftarrow P_h - \varepsilon\,\frac{\partial L}{\partial P_h} - \varepsilon\eta\,(P_h - P_0), \quad h = 1, \ldots, H_i$$
wherein P_0 is the target filter in each filter group, P_h is the h-th filter in the group, H_i is the number of filters in the group, L is the loss function of the target model, η is the attenuation factor, and ε is the step size.
Further, it can be derived that the gradient descent function is
$$P_h \leftarrow P_h - a\,\frac{\partial L}{\partial P_h} - b\,(P_h - P_0)$$
where a and b are adjustable constants (absorbing the step size and the attenuation factor).
Optionally, the second fusion module 305 is further configured to fuse the filters in the input layer of the next convolutional layer of the target model according to the filters in the output layer of the channel-pruned target convolutional layer, and to fuse the filters in the other layers of that next convolutional layer according to its fused input-layer filters, thereby obtaining the channel-pruned target model.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 4 provided by an embodiment of the present disclosure. As shown in Fig. 4, the electronic device 4 of this embodiment includes: a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above apparatus embodiments when executing the computer program 403.
Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to implement the present disclosure. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which describe the execution of the computer program 403 in the electronic device 4.
The electronic device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or another electronic device. The electronic device 4 may include, but is not limited to, the processor 401 and the memory 402. Those skilled in the art will appreciate that Fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation of it; the device may include more or fewer components than shown, combine certain components, or use different components, e.g., the electronic device may also include input-output devices, network access devices, buses, etc.
The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device 4. Further, the memory 402 may include both an internal storage unit and an external storage device of the electronic device 4. The memory 402 is used to store computer programs and the other programs and data required by the electronic device, and may also temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the apparatus/electronic device embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, the present disclosure may implement all or part of the flow of the methods in the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of the above method embodiments. The computer program may comprise computer program code in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be suitably added to or subtracted from according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals.
The above examples are only intended to illustrate the technical solution of the present disclosure, not to limit it; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method of model pruning, comprising:
screening a target convolutional layer needing channel pruning from the target model;
clustering the filters in the target convolution layer into a preset number of filter groups by using a clustering algorithm;
performing multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge, and ending the training once the difference between the parameters of same-group filters in the target convolutional layer is smaller than a preset threshold;
fusing each group of filters in the target convolutional layer into one filter to obtain the channel-pruned target convolutional layer;
and fusing, according to the channel-pruned target convolutional layer, the filters in the next convolutional layer of the target model to obtain the channel-pruned target model.
2. The method of claim 1, wherein performing multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge until the difference between the parameters of same-group filters in the target convolutional layer is smaller than a preset threshold comprises:
calculating a center point matrix of each filter group according to all filters in the filter group;
sequentially calculating the distance sum of each filter in each filter group to the center point matrices of all other filter groups;
determining the filter with the largest distance sum in each filter group as the target filter of that group;
and performing, according to the target filter of each filter group, multiple rounds of training on each filter group with a gradient descent function so that the parameters of same-group filters steadily converge, ending the training once the difference between the parameters of same-group filters is smaller than the preset threshold.
3. The method of claim 2, wherein sequentially calculating the distance sum of each filter in each filter group to the center point matrices of all other filter groups comprises:
for each filter group, sequentially calculating the distance sum of each filter in the group to the center point matrices of all other groups with the following formula:
$$D_i = \sum_{j=1}^{K-1} \lVert P_i - M_j \rVert$$
wherein P_i is the i-th filter in the filter group, M_j is the center point matrix of the j-th filter group other than this group, K is the number of filter groups in the target convolutional layer, and ∥·∥ is the norm operator.
4. The method according to claim 2, wherein performing multiple rounds of training on each filter group with a gradient descent function according to the target filter of each filter group, so that the parameters of same-group filters steadily converge until the difference between the parameters of same-group filters is smaller than the preset threshold, comprises:
cyclically executing the following steps for the multiple rounds of training:
training each filter group with the gradient descent function according to the target filter of each filter group, and incrementing the training round by one;
ending the training when the difference between the parameters of trained same-group filters is smaller than the preset threshold;
and, when the difference between the parameters of trained same-group filters is not smaller than the preset threshold, updating the target filter of each trained filter group and continuing with the next round of training.
5. The method according to claim 2 or 4, wherein the gradient descent function is:
$$\Delta P_h = -\frac{\partial L}{\partial P_h} - (P_h - P_0), \quad h = 1, \ldots, H_i$$
wherein P_0 is the target filter in each filter group, P_h is the h-th filter in the group, H_i is the number of filters in the group, and L is the loss function of the target model.
6. The method according to claim 2 or 4, wherein the gradient descent function is:
$$P_h \leftarrow P_h - \varepsilon\,\frac{\partial L}{\partial P_h} - \varepsilon\eta\,(P_h - P_0), \quad h = 1, \ldots, H_i$$
wherein P_0 is the target filter in each filter group, P_h is the h-th filter in the group, H_i is the number of filters in the group, L is the loss function of the target model, η is the attenuation factor, and ε is the step size.
7. The method of claim 1, wherein fusing, according to the channel-pruned target convolutional layer, the filters in the next convolutional layer of the target model to obtain the channel-pruned target model comprises:
fusing the filters in the input layer of the next convolutional layer of the target model according to the filters in the output layer of the channel-pruned target convolutional layer;
and fusing the filters in the other layers of that next convolutional layer according to its fused input-layer filters, to obtain the channel-pruned target model.
8. A model pruning device, comprising:
the screening module is configured to screen out a target convolutional layer needing channel pruning from the target model;
a clustering module configured to cluster the filters in the target convolutional layer into a preset number of filter groups using a clustering algorithm;
a training module configured to perform multiple rounds of training on the target convolutional layer so that the parameters of same-group filters steadily converge, ending the training once the difference between the parameters of same-group filters in the target convolutional layer is smaller than a preset threshold;
a first fusion module configured to fuse each group of filters in the target convolutional layer into one filter, so as to obtain a target convolutional layer after channel pruning;
and the second fusion module is configured to perform fusion processing on a filter in a convolutional layer next to the target convolutional layer in the target model according to the target convolutional layer subjected to channel pruning to obtain the target model subjected to channel pruning.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202211595555.6A 2022-12-13 2022-12-13 Model pruning method and device Pending CN115660071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211595555.6A CN115660071A (en) 2022-12-13 2022-12-13 Model pruning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211595555.6A CN115660071A (en) 2022-12-13 2022-12-13 Model pruning method and device

Publications (1)

Publication Number Publication Date
CN115660071A true CN115660071A (en) 2023-01-31

Family

ID=85019230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211595555.6A Pending CN115660071A (en) 2022-12-13 2022-12-13 Model pruning method and device

Country Status (1)

Country Link
CN (1) CN115660071A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination