CN117114053A - Convolutional neural network model compression method and device based on structure search and knowledge distillation - Google Patents

Convolutional neural network model compression method and device based on structure search and knowledge distillation

Info

Publication number
CN117114053A
CN117114053A
Authority
CN
China
Prior art keywords
network model
network
convolutional neural
student
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311073117.8A
Other languages
Chinese (zh)
Inventor
韩光洁
陈建杭
刁博宇
李超
郑新千
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311073117.8A
Publication of CN117114053A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a convolutional neural network model compression method and device based on structure search and knowledge distillation, comprising the following steps: acquiring a target task data set; dividing a training set from the target task data set and training a convolutional neural network model on it; using the trained convolutional neural network model as a teacher network; searching the convolutional neural network model with a neural structure search technique to obtain a lightweight network structure; taking the lightweight network structure obtained by the search as a student network; performing knowledge distillation by computing the difference between the Softmax-layer outputs of the teacher network and the student network, using this difference as part of the student network loss, and iteratively training the student network until convergence; and outputting the distilled student network model as the compressed model. The method effectively reduces redundant model parameters, automates the design of the student network model, and suits efficient, adaptive model compression application scenarios.

Description

Convolutional neural network model compression method and device based on structure search and knowledge distillation
Technical Field
The invention belongs to the technical field of artificial intelligence and deep learning, and particularly relates to a convolutional neural network model compression method and device based on structural search and knowledge distillation.
Background
In recent years, artificial intelligence techniques represented by deep learning have exhibited performance beyond human cognition in scenarios such as text recognition, speech recognition, object detection, natural language processing, autonomous driving, bioinformatics, computer games, recommendation systems, automated medical diagnosis, and financial analysis, and this performance continues to improve. It comes, however, with ever-larger parameter counts and increasingly complex network structures.
In practical applications, air-based informatized platform equipment, with its wide aerial monitoring range, is key to seizing air superiority and gaining the initiative in modern operations. Air-based platform equipment generally uses aircraft and floating platforms as carriers, carries mission systems that undertake various combat tasks, and provides functions such as information detection, transmission, control, guidance, and countermeasures. With the development of embedded platforms and the growing demand for unmanned operations, UAV-based edge air-based platforms have gradually taken the leading role. However, computing resources on edge air-based platforms are very limited. Compressing the model is therefore an indispensable step in deploying a model onto an air-based platform.
Currently, the main methods for model compression are pruning, quantization, lightweight network structure design, and knowledge distillation. Among them, knowledge distillation is widely used because of its low accuracy loss and ease of use. Knowledge distillation involves two network models: the one with more parameters is called the teacher network model and the one with fewer parameters is called the student network model. Knowledge distillation compresses the model by aligning the logits-layer outputs of the teacher and student network models so that the student network model approaches the teacher's level. However, in current knowledge distillation methods the student network model is mostly hand-designed by engineers and often carries substantial parameter redundancy. Moreover, designing a high-performance neural network requires a great deal of trial and error and engineering experience, and is extremely costly.
The Chinese patent application with publication number CN113159173A discloses a convolutional neural network model compression method combining pruning and knowledge distillation, which takes a trained convolutional neural network model as a teacher network, prunes the teacher network to obtain a student network, and then performs knowledge distillation between the teacher network and the student network. However, it sparsifies the teacher model structure with a threshold-based pruning method; the threshold must be tuned through repeated trials, demands considerable experience, and does not generalize well.
The Chinese patent application with publication number CN113011570A discloses an adaptive high-precision compression method and system for convolutional neural network models, which obtains an optimal model structure through a coarse-and-fine two-stage structure-search pruning, and finally restores the accuracy of the pruned network model through knowledge distillation. The method is still essentially model pruning, and redundant network parameters remain to be reduced.
Therefore, there is a need for a method that automatically designs a student network model structure to efficiently implement model compression through knowledge distillation.
Disclosure of Invention
In view of the above, the present invention aims to provide a convolutional neural network model compression method and apparatus based on structure search and knowledge distillation, which uses a trained convolutional neural network model as a teacher network, adopts neural structure search (NAS) technology to obtain a lightweight network structure as a student network, and implements model compression by performing knowledge distillation between the teacher network and the student network, thereby effectively reducing redundant model parameters, automating the design of the student network model, and suiting efficient, adaptive model compression application scenarios of resource-limited airborne systems, such as inspection and reconnaissance unmanned aerial vehicles, in air-based platform equipment mission systems.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
the embodiment of the invention provides a convolutional neural network model compression method based on structure search and knowledge distillation, which comprises the following steps:
step S1, acquiring a target task data set T;
step S2, dividing a training set T_train from the target task data set T, and training a convolutional neural network model using the training set T_train;
step S3, using the trained convolutional neural network model as a teacher network;
step S4, searching a lightweight network structure for the convolutional neural network model using a neural structure search technique;
step S5, taking the lightweight network structure obtained by the search as a student network;
step S6, performing knowledge distillation: inputting the training set T_train into the teacher network and the student network respectively, computing the difference between the outputs of their Softmax layers, using this difference as part of the student network loss, and iteratively training the student network until convergence;
and step S7, outputting the student network model after knowledge distillation, i.e. the compressed model.
Further, the step S2 includes:
S21, dividing the target task data set T to obtain a training set T_train, a validation set T_val, and a test set T_test;
S22, inputting the training set T_train into a convolutional neural network model;
S23, setting a loss function L for training the convolutional neural network model:
L = l_cross(softmax(f(x, W)), y)
where x represents the training input, y represents the corresponding label, W represents the target network weights, f(x, W) represents the output of the target network's logits layer, softmax(·) converts the logits-layer output into a probability distribution over categories, and l_cross(·) represents the cross entropy used as the training loss function of the convolutional neural network;
S24, training the target network parameters according to the set loss function;
S25, outputting the convolutional neural network model with trained parameters.
Further, the step S4 includes:
S41, defining a student network search space, constructing a hybrid model from the convolutional neural network model through the search space, wherein the hybrid model contains a plurality of independent models, and assigning a distinct structural weight v to each operation between nodes of the search space;
S42, dividing the training set T_train into a first training set T_train1 and a second training set T_train2, used respectively to train the structural weights v and the network model weights w of the hybrid model;
S43, fixing the network model weights w and training the hybrid model on the first training set T_train1 to optimize the structural weights v;
S44, fixing the structural weights v and training the hybrid model on the second training set T_train2 to optimize the network model weights w;
S45, repeating steps S43-S44 for a number of rounds, determining an independent model structure after each round according to the maximum structural weight of each operation, and evaluating the independent model on the validation set T_val; the independent model with the highest accuracy is the lightweight network structure obtained by the search.
Preferably, the search space is a chain structure, and includes M paths, each path has N nodes, and K operations can be adopted between every two nodes.
Preferably, the search space is designed based on VGGNet, and the operations between every two nodes consist of different VGGConv blocks, VGGConv being the building block of VGGNet; when a feature map from the target task data set T is input into the building block VGGConv, the input feature map first undergoes a convolution with kernel size k×k and stride s, is then activated by a ReLU function, and is finally batch-normalized to obtain the output feature map.
Preferably, the convolution stride is computed from the input feature map size fm_size1 and the output feature map size fm_size2, so that the number of stacked VGGConv operations of the building block can be searched adaptively in the search space while the output feature map sizes of different operations remain aligned; the stride s is calculated as:
s = fm_size1 / fm_size2
preferably, the search space is designed based on mobilet, the operation between every two nodes consists of different MBConv, which is a building block of mobilet, comprising an extended convolution, a depth separable convolution, and a mapped convolution; the extended convolution comprises a convolution layer with the size of 1 multiplied by 1, a BatchNorm layer and a ReLU activation layer, and expands the characteristic map channel according to the expansion coefficient t; the depth separable convolution comprises a k multiplied by k convolution layer, a BatchNorm layer and a ReLU activation layer, wherein each channel in the convolution layer only uses a single convolution kernel to reduce the number of parameters, and the feature extraction is carried out through the depth separable convolution; the mapping convolution comprises a 1×1 convolution layer, a BatchNorm layer and a ReLU activation layer, and the feature map is mapped back to the input channel number through the mapping convolution.
Further, the step S6 includes:
S61, inputting the training set T_train into the teacher network and the student network, and obtaining the outputs of the teacher network and the student network at the logits layer;
S62, converting the logits-layer output into a probability distribution over categories using the softmax function;
S63, calculating the difference between the class probability distributions of the teacher network and the student network using the KL divergence, and taking this difference as the soft target loss L_soft-Target(y_student, y_teacher) of the student network, calculated as:
L_soft-Target(y_student, y_teacher) = KL(y_teacher || y_student) = Σ_i y_teacher(i) · log(y_teacher(i) / y_student(i))
where y_student represents the class probability distribution predicted by the student network, and y_teacher represents the class probability distribution predicted by the teacher network;
S64, designing a knowledge validity verification weight H(y_teacher, y) to prevent the student network model from learning incorrect knowledge from the teacher network model:
H(y_teacher, y) = 1 if argmax(y_teacher) = argmax(y), otherwise H(y_teacher, y) = 0
where argmax(·) denotes the argument-of-the-maximum function; a validity verification weight of 1 is output when the result y_teacher predicted by the teacher network model is consistent with the label y, and a validity verification weight of 0 is output when the result y_teacher predicted by the teacher network model is inconsistent with the label y;
S65, defining the training loss function L_student of the student network:
L_student = α · L_hard-Target(y_student, y) + β · H(y_teacher, y) · L_soft-Target(y_student, y_teacher)
where α represents the weight of the hard target loss L_hard-Target(y_student, y), and β represents the weight of the soft target loss L_soft-Target(y_student, y_teacher);
and S66, performing iterative training on the student network according to the defined student network training loss function until the student network model converges.
Based on the above object, the embodiment of the present invention further provides a convolutional neural network model compression device based on structure search and knowledge distillation, which includes a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for implementing the convolutional neural network model compression method based on structure search and knowledge distillation when executing the computer program.
Based on the above object, the embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a computer, implements the above convolutional neural network model compression method based on structure search and knowledge distillation.
Compared with the prior art, the beneficial effects of the invention include at least the following:
(1) The invention can effectively reduce the redundant parameters of the convolutional neural network model and reduce the waste of computing resources such as computing power, memory, storage and the like of an operation platform.
(2) The invention adopts a neural structure search (NAS) algorithm to automatically design the student network model, which makes the method more adaptive, reduces dependence on engineers' experience, and reduces the time cost of student network design.
(3) The invention has clear advantages such as a high model compression rate and little accuracy loss, and is highly practical.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for convolutional neural network model compression based on structure search and knowledge distillation provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of Stage structure in a search space designed based on VGGNet according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of VGGConv building blocks in VGGNet according to an embodiment of the present invention;
fig. 4 is a schematic diagram of Stage structure in a search space based on a MobileNet design according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of MBConv building blocks in MobileNet provided by an embodiment of the present invention;
fig. 6 is a schematic diagram of knowledge distillation provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
The conception of the invention is as follows: aiming at the problems in the prior art that student network models carry a great amount of parameter redundancy, and that the automation of student network structure design and the model compression rate still need to be improved, the embodiment of the invention provides a convolutional neural network model compression method and device based on structure search and knowledge distillation.
FIG. 1 is a flow chart of a method for convolutional neural network model compression based on structural search and knowledge distillation provided by an embodiment of the invention. As shown in fig. 1, the embodiment provides a convolutional neural network model compression method based on structural search and knowledge distillation, applied to compressing the convolutional neural network models of resource-limited airborne systems, such as patrol and reconnaissance unmanned aerial vehicles, in air-based platform equipment mission systems; it addresses the problem of redundant convolutional neural network model parameters and reduces the waste of computing resources such as computing power, memory, and storage on the operating platform. The method comprises the following steps:
S1, acquiring a target task data set T.
In an embodiment, the Cifar-10 dataset is taken as the target dataset.
S2, dividing a training set T_train from the target task data set T and training the convolutional neural network model.
In the embodiment, the target task data set T is divided into a training set T_train, a validation set T_val, and a test set T_test in proportions of 60%, 20%, and 20%. The training set T_train is then input into the convolutional neural network model, and the loss function L for training the convolutional neural network model is set as:
L = l_cross(softmax(f(x, W)), y)
where x represents the training input, y represents the corresponding label, W represents the weights of the target network, f(x, W) represents the output of the target network's logits layer, softmax(·) converts the logits-layer output into a probability distribution over categories, and l_cross(·) represents the cross entropy used as the training loss of the convolutional neural network.
After the loss function is set, the target network parameters are trained according to it; finally, the convolutional neural network model with trained parameters is output.
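As a concrete illustration of step S2, the following PyTorch-style sketch trains a teacher network with the loss above; it is hypothetical code under assumed hyperparameters (batch size, learning rate, optimizer), not code from the patent. Note that torch.nn.CrossEntropyLoss applies the softmax to the logits f(x, W) internally.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_teacher(model: nn.Module, train_set, epochs: int = 100, lr: float = 0.01):
    # Train the teacher CNN on T_train with the cross-entropy loss of step S23.
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    criterion = nn.CrossEntropyLoss()              # l_cross(softmax(f(x, W)), y)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:                        # x: training input, y: label
            optimizer.zero_grad()
            logits = model(x)                      # f(x, W): logits-layer output
            loss = criterion(logits, y)            # softmax + cross entropy
            loss.backward()
            optimizer.step()
    return model                                   # trained teacher network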
S3, the convolutional neural network model is used as a teacher network after training is completed;
S4, searching the convolutional neural network model with a neural structure search technique to obtain a lightweight network structure.
Specifically, a student network search space is defined first, and a hybrid model is constructed from the convolutional neural network model through the search space; the hybrid model contains a plurality of independent models. The search space is a chain structure with M paths, N nodes on each path, and K candidate operations between every two nodes; a distinct structural weight v is assigned to each operation between nodes of the search space, the parameters M, N, and K are set according to the space to be searched, and the structural weights v are randomly initialized.
In an embodiment, a search space is designed based on VGGNet: the search space comprises 1 path with 6 nodes, the part between every two nodes is called one Stage, and each Stage has 6 optional operations; the structure of a Stage is shown in fig. 2. Each optional operation in a Stage is composed of VGGConv, the building block of VGGNet, whose structure is shown in fig. 3. When a feature map from the target task data set T is input into a building block VGGConv_ka_sd, the input feature map first undergoes a convolution with the set kernel size k×k and stride s, is then activated by a ReLU function, and is finally batch-normalized to obtain the output feature map, where the kernel size k=a is set manually and the stride s=d. To adaptively search the number of stacked VGGConv operations in a building block while keeping the output feature map sizes of different operations aligned, the stride s is obtained from the Stage input feature map size fm_size1 and output feature map size fm_size2, which makes the search space adaptive; the stride s is calculated as:
s = fm_size1 / fm_size2
As shown in fig. 2, the 6 operations differ in convolution kernel size, convolution stride, and stacking manner: [VGGConv_k3_sd], [VGGConv_k5_sd], [VGGConv_k3_sd, VGGConv_k3_s1], [VGGConv_k3_sd, VGGConv_k5_s1], [VGGConv_k5_sd, VGGConv_k3_s1], [VGGConv_k5_sd, VGGConv_k5_s1]. The structural weights v are randomly initialized.
The embodiment also provides a search space design based on MobileNet: the search space comprises 1 path with 8 nodes, the part between every two nodes is called one Stage, and each Stage has 7 optional operations; the structure of a Stage is shown in fig. 4. Each optional operation in a Stage is composed of MBConv, the building block of MobileNet, whose structure is shown in fig. 5. The building block MBConv_ka_tb contains three parts: an extended convolution, a depth separable convolution, and a mapping convolution, with convolution kernel size k=a and expansion coefficient t=b, where the kernel size k controls the kernel size of the depth separable convolution and the expansion coefficient t controls the number of output channels of the extended convolution.
The extended convolution comprises a 1×1 convolution layer, a BatchNorm layer, and a ReLU activation layer; its role is to expand the feature map channels to b times the number of input channels according to the expansion coefficient t. The depth separable convolution comprises an a×a convolution layer, a BatchNorm layer, and a ReLU activation layer, and is used for feature extraction; each channel in its convolution layer applies only a single convolution kernel to reduce the number of parameters. The mapping convolution comprises a 1×1 convolution layer, a BatchNorm layer, and a ReLU activation layer, and maps the feature map back to the input channel number.
As shown in fig. 4, the 7 operations differ in convolution kernel size and expansion coefficient; for example, the MBConv with kernel size k of 3 and expansion coefficient t of 1 is denoted MBConv_k3_t1. Under this nomenclature, the 7 operations are MBConv_k3_t1, MBConv_k3_t3, MBConv_k3_t6, MBConv_k5_t3, MBConv_k5_t6, MBConv_k7_t3, and MBConv_k7_t6. The structural weights v are randomly initialized.
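A corresponding sketch of the MBConv_ka_tb building block (extended convolution, depth separable convolution, mapping convolution, as in Fig. 5); the class and parameter names are assumptions made for illustration.

import torch.nn as nn

class MBConv(nn.Module):
    # MBConv_k{a}_t{b}: 1x1 extended conv -> k x k depth separable conv -> 1x1 mapping conv.
    def __init__(self, in_ch, k=3, t=1, stride=1):
        super().__init__()
        hidden = in_ch * t                          # expansion coefficient t widens the channels
        self.extend = nn.Sequential(                # extended convolution
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU(inplace=True))
        self.depthwise = nn.Sequential(             # depth separable convolution (groups=hidden)
            nn.Conv2d(hidden, hidden, k, stride=stride, padding=k // 2,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU(inplace=True))
        self.mapping = nn.Sequential(               # mapping convolution back to in_ch channels
            nn.Conv2d(hidden, in_ch, 1, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.mapping(self.depthwise(self.extend(x)))

# E.g. MBConv_k3_t1 = MBConv(in_ch=32, k=3, t=1); MBConv_k5_t6 = MBConv(in_ch=32, k=5, t=6).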
Then, the training set T_train is divided into a first training set T_train1 and a second training set T_train2, used respectively to train the structural weights v and the network model weights w of the hybrid model. The division ratio can be adjusted by engineering experience according to the size of the search space and the complexity of the network model: when the search space is small and the independent model structure is complex, T_train1 can be adjusted to 40% and T_train2 to 60%; when the search space is large and the independent model structure is simple, T_train1 can be adjusted to 60% and T_train2 to 40%. In the present embodiment, T_train1 and T_train2 are divided in a ratio of 50% and 50%.
During training, the network model weights w are first fixed and the hybrid model is trained on the first training set T_train1 to optimize the structural weights v; then the structural weights v are fixed and the hybrid model is trained on the second training set T_train2 to optimize the network model weights w.
Each optimization of the network model weights and the structural weights counts as one round (epoch), and the number of epochs can be set according to the complexity of the search space. In the embodiment, 150 epochs of iterative training are performed, and after each epoch an independent model structure is determined according to the maximum structural weight of each operation. The independent model is evaluated on the validation set T_val, and the independent model with the highest accuracy is the lightweight network structure obtained by the structure search.
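The alternating optimization of structural weights v and network model weights w could be sketched as follows. This is a simplified, DARTS-style illustration under assumed interfaces (a MixedOp that holds the candidate operations of one Stage and its structural-weight vector v), not the patent's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    # Weighted mixture of the K candidate operations of one Stage.
    def __init__(self, candidates):
        super().__init__()
        self.ops = nn.ModuleList(candidates)
        self.v = nn.Parameter(1e-3 * torch.randn(len(candidates)))   # structural weights, random init

    def forward(self, x):
        weights = F.softmax(self.v, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

def search(hybrid, loader_train1, loader_train2, criterion, epochs=150):
    # Alternately optimize structural weights v (on T_train1) and model weights w (on T_train2).
    v_params = [m.v for m in hybrid.modules() if isinstance(m, MixedOp)]
    w_params = [p for n, p in hybrid.named_parameters() if n.split('.')[-1] != 'v']
    opt_v = torch.optim.Adam(v_params, lr=3e-4)
    opt_w = torch.optim.SGD(w_params, lr=0.025, momentum=0.9)
    for _ in range(epochs):
        for (x1, y1), (x2, y2) in zip(loader_train1, loader_train2):
            # Fix w, update structural weights v on T_train1.
            opt_v.zero_grad()
            criterion(hybrid(x1), y1).backward()
            opt_v.step()
            # Fix v, update network model weights w on T_train2.
            opt_w.zero_grad()
            criterion(hybrid(x2), y2).backward()
            opt_w.step()
        # After each epoch: take the argmax of v in each Stage to form an independent model,
        # evaluate it on T_val, and keep the most accurate one (evaluation code omitted).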
S5, taking the light-weight network structure obtained by searching as a student network;
s6, performing knowledge distillation on the teacher network model and the student network model.
Specifically, as shown in fig. 6, the training set T_train is first input into the teacher network and the student network, and the outputs of the teacher network and the student network at the logits layer are obtained. The logits-layer output is then converted into a probability distribution over categories using the softmax function. The difference between the class probability distributions of the teacher network and the student network is then calculated using the KL divergence and used as the soft target loss L_soft-Target(y_student, y_teacher) of the student network; the KL divergence is calculated as:
L_soft-Target(y_student, y_teacher) = KL(y_teacher || y_student) = Σ_i y_teacher(i) · log(y_teacher(i) / y_student(i))
where y_student represents the class probability distribution predicted by the student network and y_teacher represents the class probability distribution predicted by the teacher network. The KL divergence measures the degree of difference between two distributions: the smaller the difference between the two distributions, the smaller the KL divergence, and vice versa. When the two distributions are identical, their KL divergence is 0.
In an embodiment, sample A obtains the class probability distributions y_teacher = [0.1, 0, 0, 0, 0.7, 0, 0, 0, 0, 0.2] and y_student = [0.2, 0.1, 0, 0, 0.5, 0, 0, 0, 0, 0.2], and its soft target loss can be obtained from the KL divergence formula above (a worked computation is sketched after this passage).
The smaller the soft target loss, the closer the class probability distribution predicted by the student network model is to that predicted by the teacher network model. When the class probability distribution predicted by the student network model is completely consistent with that predicted by the teacher network model, the soft target loss is 0. In the knowledge distillation process, the KL-divergence-based soft target loss lets the student network model quickly learn the class probability distribution predicted by the teacher network model and thereby approach the performance level of the teacher network model.
Specifically, in order to prevent the student network model from learning incorrect knowledge from the teacher network model, a knowledge validity verification weight H(y_teacher, y) is designed, calculated as:
H(y_teacher, y) = 1 if argmax(y_teacher) = argmax(y), otherwise H(y_teacher, y) = 0
where argmax(·) denotes the argument-of-the-maximum function. When the result y_teacher predicted by the teacher network model is consistent with the label y, the knowledge distilled from the teacher network model is considered valid, so a weight of 1 is output, i.e. the soft target loss of the sample is retained; when the result y_teacher predicted by the teacher network model is inconsistent with the label y, i.e. the teacher network model misclassifies, the distilled knowledge is considered invalid, and a weight of 0 is output, i.e. the soft target loss is not retained and the student network model only optimizes the hard target loss.
Combining the hard target loss, the soft target loss, and the knowledge validity weight, the training loss function L_student of the student network is defined as:
L_student = α · L_hard-Target(y_student, y) + β · H(y_teacher, y) · L_soft-Target(y_student, y_teacher)
where α represents the weight of the hard target loss L_hard-Target(y_student, y), and β represents the weight of the soft target loss L_soft-Target(y_student, y_teacher).
In an embodiment, the hard target loss weight is set to α = 1 and the soft target loss weight to β = 100; the purpose is to make the class probability distribution predicted by the student network model as close as possible to that of the teacher network model, thereby reducing the loss of accuracy. In practical applications, the weights α and β can be adjusted according to the learning progress of the student network model. In the embodiment, sample A is labeled [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], and the loss of sample A can then be calculated from the loss function above (a worked computation is sketched after this passage).
The larger L_student is, the larger the difference between the class probability distribution predicted by the student network model and both the class probability distribution predicted by the teacher network model and the true class distribution.
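The combined loss and the validity check could be sketched as follows; this is hypothetical code, and the assumed combination L_student = α·L_hard + β·H·L_soft and the value it prints (about 1·0.693 + 100·1·0.166 ≈ 17.3 with the natural log) are illustrative, not figures from the patent.

import torch

def validity_weight(y_teacher: torch.Tensor, y: torch.Tensor) -> float:
    # H(y_teacher, y): 1 if the teacher's predicted class matches the label, else 0.
    return 1.0 if torch.argmax(y_teacher) == torch.argmax(y) else 0.0

def student_loss(y_student, y_teacher, y, alpha=1.0, beta=100.0):
    # L_student = alpha * L_hard-Target + beta * H(y_teacher, y) * L_soft-Target (assumed combination).
    hard = -torch.sum(y * torch.log(y_student.clamp_min(1e-12)))       # cross entropy with the label
    mask = y_teacher > 0
    soft = torch.sum(y_teacher[mask] * torch.log(y_teacher[mask] / y_student[mask]))  # KL divergence
    return alpha * hard + beta * validity_weight(y_teacher, y) * soft

y = torch.tensor([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])             # label of sample A
y_teacher = torch.tensor([0.1, 0., 0., 0., 0.7, 0., 0., 0., 0., 0.2])
y_student = torch.tensor([0.2, 0.1, 0., 0., 0.5, 0., 0., 0., 0., 0.2])
print(float(student_loss(y_student, y_teacher, y)))                    # ~17.3 (illustrative)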
Specifically, the student network is iteratively trained according to the defined student network training loss function until the student network model converges.
In the embodiment, according to the defined student network training loss function, the student network is iteratively trained with the Adam optimization algorithm until the loss of the student network model converges, so that its accuracy approaches the level of the teacher network.
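Putting the pieces together, a minimal distillation training loop under the same assumptions (batched integer labels, softmax probabilities, Adam optimizer) might look like the following sketch.

import torch
import torch.nn.functional as F

def distill(student, teacher, loader, epochs=200, alpha=1.0, beta=100.0, lr=1e-3):
    # Iteratively train the student with hard-target, soft-target, and validity-weighted losses.
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, y in loader:                                    # y: integer class labels
            with torch.no_grad():
                p_teacher = F.softmax(teacher(x), dim=1)       # teacher class probabilities
            logits = student(x)
            p_student = F.softmax(logits, dim=1)
            hard = F.cross_entropy(logits, y)                  # hard target loss
            soft = F.kl_div(p_student.clamp_min(1e-12).log(), p_teacher,
                            reduction='none').sum(dim=1)       # per-sample KL(y_teacher || y_student)
            H = (p_teacher.argmax(dim=1) == y).float()         # knowledge validity verification weight
            loss = alpha * hard + beta * (H * soft).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student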
And S7, outputting a student network model after knowledge distillation, namely a model after compression.
In summary, the embodiment of the invention provides a convolutional neural network model compression method based on structure search and knowledge distillation, which solves the problem of convolutional neural network model parameter redundancy and reduces the waste of computing resources such as computing power, memory, and storage on the operating platform. It also addresses the problems of traditional knowledge distillation methods, in which the student network model is designed manually and a great deal of time and engineering experience is needed to design an efficient lightweight network model. In practical applications, compared with existing model compression methods, the method has advantages such as a high model compression rate and little accuracy loss, achieving a compression rate above 70% with an accuracy loss below 1%, and is therefore practical.
Based on the same inventive concept, the embodiment also provides a convolutional neural network model compression device based on structure search and knowledge distillation, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for realizing the convolutional neural network model compression method based on structure search and knowledge distillation when executing the computer program.
Based on the same inventive concept, the embodiment also provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a computer, implements the above convolutional neural network model compression method based on structure search and knowledge distillation.
It should be noted that, the convolutional neural network model compression device and the computer-readable storage medium based on the structure search and the knowledge distillation provided in the foregoing embodiments belong to the same concept as the convolutional neural network model compression method based on the structure search and the knowledge distillation, and detailed implementation processes of the convolutional neural network model compression method based on the structure search and the knowledge distillation are described in the embodiment, which is not repeated here.
The foregoing detailed description is merely illustrative of the presently preferred embodiments and advantages of the invention; any modifications, additions, substitutions, and equivalents made within the spirit and principles of the invention are intended to be included within the scope of protection of the invention.

Claims (10)

1. The convolutional neural network model compression method based on structure search and knowledge distillation is characterized by comprising the following steps of:
step S1, acquiring a target task data set T;
step S2, dividing a training set T_train from the target task data set T, and training a convolutional neural network model using the training set T_train;
step S3, using the trained convolutional neural network model as a teacher network;
step S4, searching a lightweight network structure for the convolutional neural network model using a neural structure search technique;
step S5, taking the lightweight network structure obtained by the search as a student network;
step S6, performing knowledge distillation: inputting the training set T_train into the teacher network and the student network respectively, computing the difference between the outputs of their Softmax layers, using this difference as part of the student network loss, and iteratively training the student network until convergence;
and step S7, outputting the student network model after knowledge distillation, i.e. the compressed model.
2. The convolutional neural network model compression method based on structural search and knowledge distillation of claim 1, wherein step S2 comprises:
S21, dividing the target task data set T to obtain a training set T_train, a validation set T_val, and a test set T_test;
S22, inputting the training set T_train into a convolutional neural network model;
S23, setting a loss function L for training the convolutional neural network model:
L = l_cross(softmax(f(x, W)), y)
where x represents the training input, y represents the corresponding label, W represents the weights of the target network, f(x, W) represents the output of the target network's logits layer, softmax(·) converts the logits-layer output into a probability distribution over categories, and l_cross(·) represents the cross entropy used as the training loss function of the convolutional neural network;
S24, training the target network parameters according to the set loss function;
S25, outputting the convolutional neural network model with trained parameters.
3. The convolutional neural network model compression method based on structural search and knowledge distillation of claim 2, wherein step S4 comprises:
S41, defining a student network search space, constructing a hybrid model from the convolutional neural network model through the search space, wherein the hybrid model contains a plurality of independent models, and assigning a distinct structural weight v to each operation between nodes of the search space;
S42, dividing the training set T_train into a first training set T_train1 and a second training set T_train2, used respectively to train the structural weights v and the network model weights w of the hybrid model;
S43, fixing the network model weights w and training the hybrid model on the first training set T_train1 to optimize the structural weights v;
S44, fixing the structural weights v and training the hybrid model on the second training set T_train2 to optimize the network model weights w;
S45, repeating steps S43-S44 for a number of rounds, determining an independent model structure after each round according to the maximum structural weight of each operation, and evaluating the independent model on the validation set T_val; the independent model with the highest accuracy is the lightweight network structure obtained by the search.
4. The convolutional neural network model compression method based on structure search and knowledge distillation of claim 3, wherein the search space is a chain structure and comprises M paths, each path has N nodes, and K operations can be adopted between every two nodes.
5. The method for compressing convolutional neural network model based on structural search and knowledge distillation as recited in claim 4, wherein the search space is designed based on VGGNet, the operations between every two nodes are composed of different VGGConv, VGGConv is a building block of VGGNet, when the feature map in the target task data set T is input into the building block VGGConv, the input feature map is first subjected to convolution with a convolution kernel size of k×k and a convolution step size of s, then activated by a ReLU function, and finally subjected to batch normalization to obtain the output feature map.
6. The convolutional neural network model compression method based on structural search and knowledge distillation according to claim 5, wherein the convolution step size is obtained by calculating an input feature map size fm_size1 and an output feature map size fm_size2, so that the number of operation stacks of the adaptive search building block VGGConv is realized in a search space, the alignment of different operation output feature map sizes is ensured, and a calculation formula of the convolution step size s is as follows:
7. The convolutional neural network model compression method based on structural search and knowledge distillation of claim 4, wherein the search space is designed based on MobileNet, the operation between every two nodes consists of different MBConv, which is a building block of MobileNet, the building block MBConv comprising extended convolution, depth separable convolution and mapping convolution; the extended convolution comprises a 1×1 convolution layer, a BatchNorm layer and a ReLU activation layer, and expands the feature map channels according to the expansion coefficient t; the depth separable convolution comprises a k×k convolution layer, a BatchNorm layer and a ReLU activation layer, wherein each channel in the convolution layer only uses a single convolution kernel to reduce the number of parameters, and feature extraction is carried out through the depth separable convolution; the mapping convolution comprises a 1×1 convolution layer, a BatchNorm layer and a ReLU activation layer, and the feature map is mapped back to the input channel number through the mapping convolution.
8. The convolutional neural network model compression method based on structure search and knowledge distillation of claim 2, wherein step S6 comprises:
S61, inputting the training set T_train into the teacher network and the student network, and obtaining the outputs of the teacher network and the student network at the logits layer;
S62, converting the logits-layer output into a probability distribution over categories using the softmax function;
S63, calculating the difference between the class probability distributions of the teacher network and the student network using the KL divergence, and taking this difference as the soft target loss L_soft-Target(y_student, y_teacher) of the student network, calculated as:
L_soft-Target(y_student, y_teacher) = KL(y_teacher || y_student) = Σ_i y_teacher(i) · log(y_teacher(i) / y_student(i))
where y_student represents the class probability distribution predicted by the student network, and y_teacher represents the class probability distribution predicted by the teacher network;
S64, designing a knowledge validity verification weight H(y_teacher, y) to prevent the student network model from learning incorrect knowledge from the teacher network model:
H(y_teacher, y) = 1 if argmax(y_teacher) = argmax(y), otherwise H(y_teacher, y) = 0
where argmax(·) denotes the argument-of-the-maximum function; a validity verification weight of 1 is output when the result y_teacher predicted by the teacher network model is consistent with the label y, and a validity verification weight of 0 is output when the result y_teacher predicted by the teacher network model is inconsistent with the label y;
S65, defining the training loss function L_student of the student network:
L_student = α · L_hard-Target(y_student, y) + β · H(y_teacher, y) · L_soft-Target(y_student, y_teacher)
where α represents the weight of the hard target loss L_hard-Target(y_student, y), and β represents the weight of the soft target loss L_soft-Target(y_student, y_teacher);
and S66, performing iterative training on the student network according to the defined student network training loss function until the student network model converges.
9. A convolutional neural network model compression device based on structure search and knowledge distillation, comprising a memory for storing a computer program, and a processor for implementing the convolutional neural network model compression method based on structure search and knowledge distillation of any one of claims 1-8 when the computer program is executed.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when used in a computer, implements the convolutional neural network model compression method based on structure search and knowledge distillation as claimed in any one of claims 1-8.
CN202311073117.8A 2023-08-24 2023-08-24 Convolutional neural network model compression method and device based on structure search and knowledge distillation Pending CN117114053A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311073117.8A CN117114053A (en) 2023-08-24 2023-08-24 Convolutional neural network model compression method and device based on structure search and knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311073117.8A CN117114053A (en) 2023-08-24 2023-08-24 Convolutional neural network model compression method and device based on structure search and knowledge distillation

Publications (1)

Publication Number Publication Date
CN117114053A true CN117114053A (en) 2023-11-24

Family

ID=88803395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311073117.8A Pending CN117114053A (en) 2023-08-24 2023-08-24 Convolutional neural network model compression method and device based on structure search and knowledge distillation

Country Status (1)

Country Link
CN (1) CN117114053A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117612214A (en) * 2024-01-23 2024-02-27 南京航空航天大学 Pedestrian search model compression method based on knowledge distillation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175628A (en) * 2019-04-25 2019-08-27 北京大学 A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN113159173A (en) * 2021-04-20 2021-07-23 北京邮电大学 Convolutional neural network model compression method combining pruning and knowledge distillation
CN115359353A (en) * 2022-08-19 2022-11-18 广东工业大学 Flower identification and classification method and device
CN115601660A (en) * 2022-10-26 2023-01-13 广东南方数码科技股份有限公司(Cn) Remote sensing image change detection method based on neural network structure search
CN116362325A (en) * 2023-03-28 2023-06-30 南京大学 Electric power image recognition model lightweight application method based on model compression
CN116362294A (en) * 2023-05-30 2023-06-30 深圳比特微电子科技有限公司 Neural network searching method and device and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175628A (en) * 2019-04-25 2019-08-27 北京大学 A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN113159173A (en) * 2021-04-20 2021-07-23 北京邮电大学 Convolutional neural network model compression method combining pruning and knowledge distillation
CN115359353A (en) * 2022-08-19 2022-11-18 广东工业大学 Flower identification and classification method and device
CN115601660A (en) * 2022-10-26 2023-01-13 广东南方数码科技股份有限公司(Cn) Remote sensing image change detection method based on neural network structure search
CN116362325A (en) * 2023-03-28 2023-06-30 南京大学 Electric power image recognition model lightweight application method based on model compression
CN116362294A (en) * 2023-05-30 2023-06-30 深圳比特微电子科技有限公司 Neural network searching method and device and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117612214A (en) * 2024-01-23 2024-02-27 南京航空航天大学 Pedestrian search model compression method based on knowledge distillation
CN117612214B (en) * 2024-01-23 2024-04-12 南京航空航天大学 Pedestrian search model compression method based on knowledge distillation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination