CN116863419A

CN116863419A - Method and device for lightening target detection model, electronic equipment and medium

Info

Publication number: CN116863419A
Application number: CN202311127930.9A
Authority: CN
Inventors: 王进; 刘明朝; 王明择; 石英
Original assignee: Hubei Changtou Smart Parking Co ltd; Wuhan University of Technology WUT
Current assignee: Hubei Changtou Smart Parking Co ltd; Wuhan University of Technology WUT
Priority date: 2023-09-04
Filing date: 2023-09-04
Publication date: 2023-10-10

Abstract

The application relates to a method, a device, electronic equipment and a medium for lightening a target detection model. The method comprises the following steps: replacing the backbone network in the target detection model with a GhostNet backbone network, the GhostNet backbone network being formed by GhostBottleneck modules; after retraining, pruning the GhostNet backbone network of the target detection model with a composite-index channel pruning algorithm to obtain optimized GhostBottleneck modules; and further pruning the optimized GhostBottleneck modules with a soft pruning algorithm to obtain an optimized lightweight target detection model, and detecting vehicles based on the lightweight target detection model. The application effectively reduces redundant features and improves detection efficiency.

Description

Method and device for lightening target detection model, electronic equipment and medium

Technical Field

The present application relates to the field of machine learning technologies, and in particular to a method, a device, an electronic device, and a medium for lightening a target detection model.

Background

A common deep learning-based vehicle target detection model generally consists of three parts: a backbone network (Backbone) for feature extraction, a neck network (Neck) for feature fusion, and a head network (Head) for outputting the detection results. Of these three components, the backbone network typically accounts for over 70% of the computing resources required by the whole vehicle target detection model. The degree to which the backbone network is lightened therefore largely determines how lightweight the entire vehicle target detection model can become.

Current target detection algorithms based on deep convolutional neural networks require excessive computing resources and consume a large amount of memory, which lowers detection efficiency and, in turn, the efficiency of vehicle detection.

Disclosure of Invention

In view of the foregoing, it is necessary to provide a method, a device, an electronic device and a medium for lightening a target detection model, so as to improve the detection efficiency of the target detection model and thereby the efficiency of vehicle detection.

In order to achieve the above object, the present application provides a method for lightening a target detection model, including:

acquiring a target detection model based on a GhostNet backbone network, wherein the GhostNet backbone network is formed by a GhostBottleneck module;

after retraining, pruning the GhostNet backbone network of the target detection model based on a composite-index channel pruning algorithm to obtain an optimized GhostBottleneck module;

and further pruning the optimized GhostBottleneck module based on a soft pruning algorithm to obtain an optimized lightweight target detection model, and detecting vehicles based on the lightweight target detection model.

In some possible implementations, the target detection model is ResNet-50 or YOLOv5s.

In some possible implementations, the GhostBottleneck module is composed of a stack of several Ghost modules.

In some possible implementations, the channel pruning algorithm of the composite index includes geometric median-based pruning and norm-based pruning.

In some possible implementations, the geometric-median-based pruning of the retrained GhostNet backbone network of the target detection model includes:

calculating the norms of all convolution kernels in the GhostNet backbone network, calculating, layer by layer, the geometric median in the data space of the convolution kernels of that layer, and finding the set of convolution kernels with the smallest Euclidean distance to the geometric median;

and, after training the GhostNet backbone network to a preset accuracy, calculating a distance threshold based on the proportion to be pruned, and pruning the convolution kernels whose distance from the geometric median is smaller than the threshold.

In some possible implementations, the geometric median is calculated by the formula:

$$x^{GM} = \underset{x \in \mathbb{R}^{W \times H \times C}}{\arg\min} \sum_{j \in [1, N_i]} \left\| x - F_{i,j} \right\|_2$$

wherein $x^{GM}$ represents the geometric median; $x$ represents an input feature map; $W \times H$ represents the width and height of the input feature map $x$; $C$ represents the number of channels of the feature map $x$; $F_{i,j}$ represents the j-th convolution kernel of the i-th layer; and $N_i$ represents the number of output channels of the i-th layer.

In some possible implementations, further pruning the optimized GhostBottleneck module based on the soft pruning algorithm to obtain an optimized lightweight target detection model, and performing vehicle detection based on the lightweight target detection model, includes:

setting the values of the pruned convolution kernels in the optimized GhostBottleneck module to zero to obtain an optimized lightweight target detection model, and detecting vehicles based on the lightweight target detection model.

On the other hand, the application also provides a device for lightening the target detection model, which comprises:

the light weight module is used for acquiring a target detection model based on a GhostNet backbone network, wherein the GhostNet backbone network is formed by a GhostBottleneck module;

the composite pruning module is used for pruning, after retraining, the GhostNet backbone network of the target detection model based on a composite-index channel pruning algorithm to obtain an optimized GhostBottleneck module;

the soft pruning module is used for further pruning the optimized GhostBottleneck module based on a soft pruning algorithm to obtain an optimized lightweight target detection model, and detecting vehicles based on the lightweight target detection model.

In another aspect, the application also provides an electronic device comprising a memory and a processor, wherein,

the memory is used for storing programs;

the processor is coupled to the memory and is configured to execute the program stored in the memory, so as to implement the steps in the method for lightening the target detection model in any one of the foregoing implementations.

In another aspect, the present application further provides a computer readable storage medium storing a computer readable program or instructions, where the program or instructions, when executed by a processor, implement the steps in the method for lightening a target detection model in any one of the above implementations.

The beneficial effects of the above embodiments are as follows: in the method for lightening the target detection model, a target detection model built on a GhostNet backbone network is obtained, the GhostNet backbone network being formed by GhostBottleneck modules; after retraining, the GhostNet backbone network of the target detection model is pruned with the composite-index channel pruning algorithm, and the result is then further optimized by soft pruning. By replacing the backbone network in the target detection model with the GhostNet backbone network, the application makes the target detection model lightweight; by further applying the composite-index channel pruning algorithm and soft pruning, it reduces redundant features and improves the detection efficiency of the target detection model.

Drawings

FIG. 1 is a flowchart illustrating a method for lightening a target detection model according to an embodiment of the present application;

FIG. 2 is a schematic diagram of the structure of the GhostBottleneck in the method for lightening a target detection model provided by the application;

FIG. 3 is a schematic structural diagram of an embodiment of a device for lightening a target detection model according to the present application;

fig. 4 is a schematic structural diagram of an embodiment of an electronic device according to the present application.

Detailed Description

The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.

Fig. 1 is a flowchart of a method for lightening a target detection model according to an embodiment of the present application, as shown in fig. 1, the method for lightening a target detection model includes:

Compared with the prior art, the method for lightening the target detection model provided by this embodiment obtains a target detection model built on a GhostNet backbone network, the GhostNet backbone network being formed by GhostBottleneck modules; after retraining, the GhostNet backbone network of the target detection model is pruned with the composite-index channel pruning algorithm, and the result is then further optimized by soft pruning. By replacing the backbone network in the target detection model with the GhostNet backbone network, the application makes the target detection model lightweight; by further applying the composite-index channel pruning algorithm and soft pruning, it reduces redundant features and improves the detection efficiency of the target detection model.

Note that GhostNet uses depthwise convolution as the simple linear operation for generating features.

In some embodiments of the application, the target detection model is ResNet-50 or YOLOv5s.

In some embodiments of the application, the GhostBottleneck module is made up of a stack of several Ghost modules.

In a specific embodiment of the present application, in step S101, GhostNet replaces ResNet-50: Ghost modules are stacked to obtain a GhostBottleneck, and GhostBottlenecks are further stacked to obtain the backbone network GhostNet, which replaces the original backbone network ResNet-50. The Ghost module mainly comprises two parts: a conventional convolution and a linear transformation. First, a conventional convolution generates a certain number of feature maps; the number of channels c' of these feature maps is smaller than the number of channels c of the original convolution, which keeps the computational cost of the model under control. Then a simple linear operation is applied to each feature map to generate similar feature maps, namely ghosts, and the generated ghosts are concatenated with the feature maps produced by the convolution to obtain the final output. The computational cost of the Ghost module is only about 1/s of that of a conventional convolution, where s is the ratio of the number of output feature maps to the number generated by the conventional convolution (c/c'); compared with the conventional convolution operation, the model parameters and the computational complexity are significantly reduced.
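As a rough illustration of the 1/s estimate, the following Python sketch counts multiply-accumulate operations for a conventional convolution and for a Ghost module producing the same number of output channels. The layer sizes, the ratio s = 2 and the 3×3 kernel size d of the cheap depthwise operation are assumed values chosen for illustration; they are not taken from the application.

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate count of a standard k x k convolution."""
    return c_in * c_out * k * k * h_out * w_out


def ghost_flops(c_in, c_out, k, h_out, w_out, s=2, d=3):
    """Multiply-accumulate count of a Ghost module.

    s: ratio of total output maps to intrinsic maps (assumed value).
    d: kernel size of the cheap depthwise linear operation (assumed value).
    """
    intrinsic = c_out // s  # c' feature maps from the conventional convolution
    primary = conv_flops(c_in, intrinsic, k, h_out, w_out)
    # Each intrinsic map yields (s - 1) ghosts through a d x d depthwise filter.
    cheap = (s - 1) * intrinsic * d * d * h_out * w_out
    return primary + cheap


# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 56x56 output.
standard = conv_flops(64, 128, 3, 56, 56)
ghost = ghost_flops(64, 128, 3, 56, 56, s=2, d=3)
ratio = standard / ghost
print(f"standard={standard}, ghost={ghost}, speed-up={ratio:.2f}")
```

For this assumed layer the speed-up comes out slightly below s, because the cheap linear operations still carry a small cost of their own.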

The Ghost module is used to lighten the backbone network of the original detection network. First, Ghost modules are stacked to obtain the GhostBottleneck. As shown in a of fig. 2, when no downsampling is performed, two Ghost modules are stacked directly to extract features, and the Ghost modules are followed by a batch normalization layer and a ReLU activation function, which makes network learning more stable and introduces nonlinearity. As shown in b of fig. 2, when the GhostBottleneck performs downsampling, the input and output resolutions differ and the input branch needs a corresponding transformation; therefore a depthwise convolution with a stride of 2 is added to the skip connection to reduce the resolution of the input, a 1×1 convolution adjusts the number of channels, and the result is added to the output of the main branch. In addition, a channel attention mechanism is introduced into some of the GhostBottleneck modules to help extract key information in vehicle target detection scenes.

GhostBottlenecks are further stacked to obtain GhostNet as the backbone network of the detection model; its specific structure is shown in Table 1, where G-bneck denotes GhostBottleneck. The channel numbers in the original GhostNet are {24, 40, 112, 160}, which gives too small a model capacity. The channel numbers are therefore increased to {24, 40, 112, 320}, which adds model capacity and strengthens the feature extraction capability of the lightweight backbone network.

TABLE 1 overall structure of GhostNet

To improve the performance of the pruning algorithm and make the pruning result more reasonable and representative, a channel pruning algorithm based on a composite index (Composite Index) is provided. It effectively reduces redundant features and, to a certain extent, removes the features of lowest importance. In some embodiments of the application, the channel pruning algorithm of the composite index includes geometric median-based pruning and norm-based pruning.

In a specific embodiment of the present application, the composite index uses the geometric median as the core index and the norm as a supplementary index. For example, when the pruning rate is 40%, the geometric median criterion selects 30% of the convolution kernels of the layer, and the norm criterion selects the remaining 10%.
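The 30% / 10% split in this example can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: kernels are represented as flattened lists, closeness to the geometric median is approximated by each kernel's total Euclidean distance to all kernels in the layer (a common stand-in, not necessarily the computation used in the application), the norm is taken to be the L2 norm, and the function name `composite_prune` and the toy data are hypothetical.

```python
import math


def composite_prune(kernels, gm_ratio=0.3, norm_ratio=0.1):
    """Pick kernels to prune: geometric-median criterion first, norm second."""
    n = len(kernels)
    n_gm = round(n * gm_ratio)
    n_norm = round(n * norm_ratio)

    # Core index: total Euclidean distance to all kernels in the layer.
    # A small total means the kernel sits near the geometric median.
    totals = [sum(math.dist(k, other) for other in kernels) for k in kernels]
    by_gm = sorted(range(n), key=lambda j: totals[j])[:n_gm]

    # Supplementary index: L2 norm (assumed) of the kernels not yet selected.
    rest = [j for j in range(n) if j not in by_gm]
    norms = {j: math.sqrt(sum(v * v for v in kernels[j])) for j in rest}
    by_norm = sorted(rest, key=lambda j: norms[j])[:n_norm]
    return sorted(by_gm), sorted(by_norm)


# Ten toy 2-element kernels: 0-2 are near-duplicates, 9 has a tiny norm.
kernels = [
    [3.0, 3.0], [3.1, 2.9], [2.9, 3.1],
    [6.0, 3.0], [0.0, 3.0], [3.0, 6.0],
    [3.0, 0.0], [5.0, 5.0], [1.0, 5.0],
    [0.05, 0.05],
]
by_gm, by_norm = composite_prune(kernels)
print("geometric-median picks:", by_gm, "norm picks:", by_norm)
```

In this toy layer the geometric-median criterion removes the three mutually redundant kernels, while the norm criterion catches the near-zero kernel that the distance criterion alone would keep.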

In some embodiments of the application, the geometric-median-based pruning of the retrained GhostNet backbone network of the target detection model comprises:

In some embodiments of the present application, the geometric median is calculated as:

$$x^{GM} = \underset{x \in \mathbb{R}^{W \times H \times C}}{\arg\min} \sum_{j \in [1, N_i]} \left\| x - F_{i,j} \right\|_2$$

wherein $x^{GM}$ represents the geometric median; $x$ represents an input feature map; $W \times H$ represents the width and height of the input feature map $x$; $C$ represents the number of channels of the feature map $x$; $F_{i,j}$ represents the j-th convolution kernel of the i-th layer; and $N_i$ represents the number of output channels of the i-th layer, i.e., the number of convolution kernels of the layer.
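To make the geometric-median criterion concrete, the sketch below approximates the geometric median of a layer's flattened kernels with Weiszfeld's iteration and then selects the kernels closest to it for pruning. Weiszfeld's iteration is a standard numerical method swapped in here for illustration; the application does not state how the minimization is solved. The function names, the iteration count and the toy kernels are assumptions.

```python
import math


def geometric_median(points, iters=100, eps=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the arithmetic mean of the points.
    x = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        # Inverse-distance weights; eps guards against division by zero.
        weights = [1.0 / max(math.dist(x, p), eps) for p in points]
        total = sum(weights)
        x = [sum(w * p[d] for w, p in zip(weights, points)) / total
             for d in range(dim)]
    return x


def select_by_geometric_median(kernels, prune_ratio):
    """Return indices of the kernels closest to the geometric median."""
    gm = geometric_median(kernels)
    dists = [math.dist(gm, k) for k in kernels]
    n_prune = round(len(kernels) * prune_ratio)
    order = sorted(range(len(kernels)), key=lambda j: dists[j])
    return sorted(order[:n_prune])


# Five flattened toy kernels; kernels 0-2 are near-duplicates.
kernels = [
    [1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0],
    [0.9, 1.1, 1.0],
    [5.0, -3.0, 2.0],
    [-4.0, 4.0, -2.0],
]
pruned = select_by_geometric_median(kernels, prune_ratio=0.4)
print("kernels selected for pruning:", pruned)
```

Choosing the `n_prune` closest kernels is equivalent to applying a distance threshold derived from the proportion to be pruned: the threshold is simply the distance of the last selected kernel.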

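In a specific embodiment of the present application, the norm-based pruning calculation formula is:

$$\left\| F_{i,j} \right\|_p = \left( \sum_{w \in F_{i,j}} |w|^p \right)^{1/p}$$

in the formula, $w$ is an element of the convolution kernel $F_{i,j}$ (the j-th convolution kernel of the i-th layer), and $p$ is the order of the norm.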

In a specific embodiment of the application, when the channel pruning algorithm based on the composite index is applied to a Ghost module, it effectively reduces the highly similar features produced by the conventional convolution in the first step. The specific flow is as follows: first, the conventional convolution in the Ghost module is pruned, which reduces its output channels; then features are generated quickly through linear operations, and because the number of output channels of the convolution has been reduced, the number of linear operations must be increased accordingly; finally, the two groups of features are concatenated to obtain the final output.

With the channel pruning algorithm based on the composite index, redundant features are reduced and the features of lowest importance are removed to a certain extent, which reduces the parameter count and computational cost of the model and effectively makes the model structure lightweight.
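The channel bookkeeping in this flow can be illustrated with a small Python calculation. It assumes that the Ghost module's total number of output channels is held fixed, so that every intrinsic channel removed by pruning has to be made up by additional ghost maps; that assumption, the function name `ghost_channel_plan` and the example sizes are illustrative and not stated in the application.

```python
import math


def ghost_channel_plan(out_channels, intrinsic, prune_ratio):
    """Channel counts of a Ghost module after pruning its primary convolution.

    Assumes the module must still emit `out_channels` feature maps.
    """
    # Step 1: prune the conventional convolution, reducing its output channels.
    kept = intrinsic - round(intrinsic * prune_ratio)
    # Step 2: the missing maps must now come from cheap linear operations.
    ghosts_needed = out_channels - kept
    ops_per_map = math.ceil(ghosts_needed / kept)
    # Step 3: kept + ghosts_needed maps are concatenated as the final output.
    return kept, ghosts_needed, ops_per_map


before = ghost_channel_plan(128, 64, 0.0)
after = ghost_channel_plan(128, 64, 0.4)
print("before pruning:", before)
print("after pruning: ", after)
```

Under these assumptions, pruning 40% of a 64-channel primary convolution raises the number of linear operations needed per remaining intrinsic map, which matches the observation that the amount of linear operation must increase.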

In some embodiments of the present application, further pruning the optimized GhostBottleneck module based on the soft pruning algorithm to obtain an optimized lightweight target detection model, and performing vehicle detection based on the lightweight target detection model, includes:

In a specific embodiment of the application, soft pruning is introduced into the channel pruning algorithm based on the composite index. In the k-th iteration, the model is first trained in the conventional way. After this iteration of training finishes, the pruning operation starts: each layer of the model is traversed, the distance of each convolution kernel from the geometric median is calculated, and the convolution kernels with the smallest values are cut. The pruning operation here, i.e. soft pruning, takes the form of unstructured pruning: the values of the selected convolution kernels are only set to zero, and the kernels and their corresponding channels are not actually removed, so the capacity of the model remains unchanged. This completes the soft pruning operation of the k-th iteration. In the (k+1)-th iteration, the zeroed convolution kernels still participate in training, which effectively prevents important convolution kernels from being cut by mistake and improves the robustness of the pruned model.
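The train, zero, train-again cycle can be sketched with a toy Python loop. Everything concrete here is an assumption made for illustration: training is replaced by a stand-in update with fixed, hypothetical gradients, kernels are flattened lists, and the selection uses each kernel's total distance to the other kernels as a simple proxy for closeness to the geometric median.

```python
import math


def closest_to_others(kernels, n_prune):
    """Indices of kernels with the smallest total distance to the rest."""
    totals = [sum(math.dist(k, other) for other in kernels) for k in kernels]
    return sorted(range(len(kernels)), key=lambda j: totals[j])[:n_prune]


def soft_prune(kernels, n_prune):
    """Zero the selected kernels in place; the layer shape stays unchanged."""
    chosen = closest_to_others(kernels, n_prune)
    for j in chosen:
        kernels[j] = [0.0] * len(kernels[j])
    return chosen


def fake_train_step(kernels, grads, lr=0.1):
    """Stand-in for one training iteration (hypothetical gradients)."""
    for k, g in zip(kernels, grads):
        for d in range(len(k)):
            k[d] -= lr * g[d]


kernels = [[1.0, 1.0], [1.1, 0.9], [4.0, -2.0], [-3.0, 3.0]]
grads = [[-1.0, 0.5], [0.2, 0.2], [0.1, -0.1], [0.3, 0.3]]

fake_train_step(kernels, grads)          # iteration k: ordinary training
zeroed = soft_prune(kernels, n_prune=1)  # soft pruning after iteration k
was_zero = all(v == 0.0 for v in kernels[zeroed[0]])
fake_train_step(kernels, grads)          # iteration k+1: zeroed kernel still updates
revived = any(v != 0.0 for v in kernels[zeroed[0]])
print(zeroed, was_zero, revived)
```

Because the zeroed kernel keeps its slot and keeps receiving updates, it can recover in later iterations if it turns out to matter, which is the property that protects important kernels from being cut by mistake.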

In order to better implement the method for lightening the target detection model in the embodiment of the present application, correspondingly, as shown in fig. 3, the embodiment of the present application further provides a device for lightening the target detection model, where the device 300 for lightening the target detection model includes:

a lightweight module 301, configured to replace a backbone network in the target detection model with a GhostNet backbone network, where the GhostNet backbone network is formed by a GhostBottleneck module;

the composite pruning module 302 is configured to prune the re-trained GhostNet backbone network of the target detection model based on a channel pruning algorithm of a composite index to obtain an optimized GhostBottleneck module;

the soft pruning module 303 is configured to further prune the optimized GhostBottleneck module based on a soft pruning algorithm to obtain an optimized lightweight target detection model, and to detect vehicles based on the lightweight target detection model.

The device 300 for light-weighting a target detection model provided in the foregoing embodiment may implement the technical solution described in the foregoing embodiment of a method for light-weighting a target detection model, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing embodiment of a method for light-weighting a target detection model, which is not described herein again.

As shown in fig. 4, the present application further provides an electronic device 400 accordingly. The electronic device 400 comprises a processor 401, a memory 402 and a display 403. Fig. 4 shows only some of the components of the electronic device 400, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.

The processor 401 may, in some embodiments, be a central processing unit (Central Processing Unit, CPU), a microprocessor or another data processing chip, and is used for running program code or processing data stored in the memory 402, for example the program of the method for lightening a target detection model in the present application.

In some embodiments, the processor 401 may be a single server or a server group. The server group may be centralized or distributed. In some embodiments, the processor 401 may be local or remote. In some embodiments, the processor 401 may be implemented on a cloud platform. In an embodiment, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.

The memory 402 may be an internal storage unit of the electronic device 400 in some embodiments, such as a hard disk or memory of the electronic device 400. The memory 402 may also be an external storage device of the electronic device 400 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 400.

Further, the memory 402 may also include both an internal storage unit and an external storage device of the electronic device 400. The memory 402 is used for storing the application software installed on the electronic device 400 and various types of data.

The display 403 may, in some embodiments, be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display 403 is used for displaying information on the electronic device 400 and for displaying a visual user interface. The components 401-403 of the electronic device 400 communicate with each other via a system bus.

In one embodiment, when the processor 401 executes a program for lightening an object detection model in the memory 402, the following steps may be implemented:

replacing a backbone network in the target detection model with a GhostNet backbone network, the GhostNet backbone network being formed by a GhostBottleneck module;

It should be understood that: the processor 401 may realize other functions in addition to the above functions when executing a program for lightening an object detection model in the memory 402, and in particular, reference may be made to the description of the corresponding method embodiments.

Further, the type of the electronic device 400 is not particularly limited, and the electronic device 400 may be a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a wearable device, a laptop computer, or another portable electronic device. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running iOS, Android, Microsoft or other operating systems. The portable electronic device described above may also be another portable electronic device, such as a laptop computer having a touch-sensitive surface, e.g. a touch panel. It should also be appreciated that in other embodiments of the application, the electronic device 400 may not be a portable electronic device, but rather a desktop computer having a touch-sensitive surface (e.g., a touch panel).

Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by way of a computer program to instruct associated hardware, where the program may be stored on a computer readable storage medium. Wherein the computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory, etc.

The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.

Claims

1. A method for lightening a target detection model, comprising:

2. The method for lightening a target detection model according to claim 1, wherein the target detection model is ResNet-50 or YOLOv5s.

3. The method for lightening a target detection model as claimed in claim 1, wherein said GhostBottleneck module is composed of a stack of several Ghost modules.

4. The method of claim 1, wherein the channel pruning algorithm of the composite index comprises geometric median based pruning and norm based pruning.

5. The method of claim 4, wherein the geometric-median-based pruning of the retrained GhostNet backbone network of the target detection model comprises:

after training the GhostNet backbone network to a preset accuracy, calculating a distance threshold based on the proportion to be pruned, and pruning the convolution kernels whose distance from the geometric median is smaller than the threshold.

6. The method for lightening a target detection model as claimed in claim 5, wherein the geometric median is calculated by the formula:

7. The method for lightening a target detection model according to claim 1, wherein the step of further pruning the optimized GhostBottleneck module based on a soft pruning algorithm to obtain an optimized lightweight target detection model, and performing vehicle detection based on the lightweight target detection model, comprises:

8. A device for lightening a target detection model, comprising:

9. An electronic device comprising a memory and a processor, wherein,

the memory is used for storing programs;

the processor, coupled to the memory, is configured to execute the program stored in the memory, so as to implement the steps in the method for lightening a target detection model according to any one of claims 1 to 7.

10. A computer readable storage medium storing a computer readable program or instructions which, when executed by a processor, implement the steps in the method for lightening a target detection model according to any one of claims 1 to 7.