WO2022027937A1 - Neural network compression method, apparatus, device and storage medium - Google Patents

Neural network compression method, apparatus, device and storage medium

Info

Publication number
WO2022027937A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
target
meta
initial
neural network
Prior art date
Application number
PCT/CN2021/073498
Other languages
English (en)
French (fr)
Inventor
尹文枫
董刚
赵雅倩
曹其春
梁玲燕
刘海威
杨宏斌
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Priority to US18/005,620 priority Critical patent/US20230297846A1/en
Publication of WO2022027937A1 publication Critical patent/WO2022027937A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/0475 Generative networks
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/046 Forward inferencing; Production systems

Definitions

  • the present application relates to the technical field of computer applications, and in particular, to a neural network compression method, apparatus, device and storage medium.
  • Neural Architecture Search (NAS) is a branch of the field of automatic machine learning (AutoML). It addresses the optimization of the parameters of various neural network structures, such as the selection and combination of structural parameters like the operator type of each layer and the size of the convolution kernel, in order to search for the network structure with the best performance under specific requirements such as a limited amount of computation or a limited inference latency. Performance evaluation is a fundamental step of neural architecture search and guides its search process.
  • the purpose of this application is to provide a neural network compression method, apparatus, device and storage medium, so as to optimize the performance evaluation process of neural architecture search, reduce the amount of computation of the performance evaluation process, and realize flexible neural network compression.
  • a neural network compression method comprising:
  • performing forward inference on target data through a pre-trained target parameter-sharing network, and obtaining the output feature map of the last convolution module of the target parameter-sharing network;
  • predicting the optimal network structure under a target constraint condition through a target meta-generating network, and obtaining a compressed neural network model.
  • the target weakly supervised meta-learning framework includes the target meta-generating network and a target meta-evaluation network connected to the target meta-generating network, and the supervision information of the target meta-generating network is derived from gradient information of the target meta-evaluation network.
  • the target parameter sharing network and the target weakly supervised meta-learning framework are obtained through the following steps:
  • the initial weakly supervised meta-learning framework includes an initial meta-evaluation network and an initial meta-generating network
  • the target parameter sharing network and the target weakly supervised meta-learning framework are obtained through the following steps:
  • the initial weakly supervised meta-learning framework includes an initial meta-evaluation network and an initial meta-generating network
  • the steps of controlling the initial meta-evaluation network and the meta-generating network to learn in the verification phase are repeatedly performed until the set second end condition is reached, and the target weakly supervised meta-learning framework is obtained.
  • the initial meta-evaluation network is controlled to learn in the verification stage through the following steps:
  • the weight parameter of the last convolution module of the target neural network model is predicted by the initial meta-evaluation network according to the initial neural network structure
  • a replacement convolution module is constructed for the last convolution module of the target neural network model through the initial meta-evaluation network; the replacement convolution module takes the weight parameters predicted by the initial meta-evaluation network as its weights and takes the input data of the last convolution module of the target neural network model as its input;
  • the gradient is calculated according to the loss function, and its own parameters are updated.
  • the determining the loss function using the output feature map of the replacement convolution module includes:
  • a loss function is determined based on the classification error and the mean squared error.
  • the initial meta-generating network is controlled to learn in the verification stage by the following steps:
  • the loss function of the optimal network structure under the current constraints is obtained through the initial meta-evaluation network, and the gradient information is propagated backward, so that the initial meta-generating network can perform gradient calculation and parameter update of its own parameters based on the gradient information.
  • the network structures of the target meta-evaluation network and the target meta-generating network both include two fully connected layers, and the input layer of the target meta-generating network and the output layer of the target meta-evaluation network both adopt a parameter-sharing mechanism.
  • a neural network compression apparatus, comprising:
  • a feature map obtaining unit configured to perform forward reasoning on the target data through the target parameter sharing network obtained by pre-training, and obtain the output feature map of the last convolution module of the target parameter sharing network;
  • a feature extraction unit used for extracting channel-related features in the output feature map of the last convolution module of the target parameter sharing network
  • the feature input unit is used to input the extracted channel-related features and target constraints into the target meta-generating network of the target weakly supervised meta-learning framework obtained by pre-training;
  • the compressed model obtaining unit is configured to predict the optimal network structure under the target constraint condition through the target meta-generating network, and obtain the compressed neural network model.
  • a neural network compression device comprising:
  • the processor is configured to implement the steps of any one of the above neural network compression methods when executing the computer program.
  • a computer-readable storage medium storing a computer program on the computer-readable storage medium, when the computer program is executed by a processor, implements the steps of any one of the above-mentioned neural network compression methods.
  • by applying the provided technical solution, forward inference is performed on target data through the pre-trained target parameter-sharing network to obtain the output feature map of the last convolution module of the target parameter-sharing network; channel-related features are extracted from this output feature map; the extracted channel-related features and the target constraint condition are input into the target meta-generating network of the pre-trained target weakly supervised meta-learning framework; and the optimal network structure under the target constraint condition is predicted through the target meta-generating network to obtain a compressed neural network model.
  • for different resource constraints, a neural network structure that meets the requirements is generated as the compressed network model, and flexible neural network compression is realized.
  • FIG. 1 is a schematic diagram of a neural network structure search process based on an evolutionary algorithm in the related art
  • FIG. 2 is a schematic diagram of a neural network structure search process based on weakly supervised meta-learning in an embodiment of the application
  • FIG. 3 is an implementation flowchart of a neural network compression method in an embodiment of the present application
  • FIG. 4 is a schematic diagram of the synchronous training process of the initial weakly supervised meta-learning framework and the target neural network model in the embodiment of the application;
  • FIG. 5 is a schematic diagram of an asynchronous training process between an initial weakly supervised meta-learning framework and a target neural network model in an embodiment of the application;
  • FIG. 6 is a schematic diagram of a training process of an initial meta-evaluation network in an embodiment of the present application
  • FIG. 7 is a schematic diagram of a training process of an initial meta-generating network in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a network structure of an initial meta-evaluation network and an initial meta-generating network and a schematic diagram of the data flow during learning of the initial meta-generating network in the embodiment of the application;
  • FIG. 9 is a schematic diagram of a data flow of a target element generation network during inference in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the data flow during the initial meta-evaluation network learning in the embodiment of the application
  • FIG. 11 is a schematic structural diagram of a neural network compression apparatus in an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a neural network compression device in an embodiment of the present application.
  • the most straightforward performance evaluation strategy is to train each structure to be evaluated sampled from the full neural network from scratch, and then evaluate its performance on the validation set. This processing method will result in a waste of computing resources and time costs.
  • An efficient way to speed up the performance evaluation of neural network architecture search is weight sharing.
  • the weight of the new structure after sampling is initialized by the pre-trained neural network weight parameters.
  • one-shot NAS adopts the idea of weight sharing. All the sampled sub-structures share the weight of the common structure and inherit the weight in the complete network model.
  • One-shot NAS only needs to train a complete network model, avoiding the training of sub-network models, thus reducing the amount of computation in the performance evaluation process.
  • One-shot NAS starts from the large network and performs subtraction: compared with the original network model, the searched sub-network structure is reduced in the number of parameters, the number of network layers and so on, which is consistent with the goal of neural network model compression.
  • the metapruning method combines the neural network structure search method based on Hypernetworks (super network) with the pruning method to construct an automatic model compression method.
  • the core of the neural network structure search method based on Hypernetworks is to use meta-learning to train a meta network to generate parameters such as weights or gradients for another network.
  • the sub-network structure to be evaluated is given by a search method based on an evolutionary algorithm, and the meta-network is responsible for generating weights for the sub-network structure to be evaluated, and then the performance of the sub-network structure can be directly tested in the verification set without re-running training, and the training of the meta-network is meta-learning in a supervised manner.
  • the difference between supervised and unsupervised meta-learning is the training phase.
  • Supervised meta-learning can use label learning for training, while unsupervised meta-learning can only obtain unlabeled training data. In the testing phase, both supervised and unsupervised meta-learning need to utilize supervised information for effective learning.
  • One-shot NAS decouples and serializes the network training and search process. After a complete network model training, multiple search strategies can be used to repeatedly search for the best network structure to meet different constraints. Although it eliminates the calculation amount of model retraining through weight sharing, the performance evaluation process still needs to perform multiple model inference calculations to select the network structure with the best performance, and the calculation amount of model inference is not reduced. Moreover, it is found in experiments that each sub-network structure needs to be verified several times before testing to restore the accuracy. The computational cost of the performance evaluation process of neural network structure search is still relatively large.
  • an embodiment of the present application provides a neural network compression method, and the neural network compression method is based on a target weakly supervised meta-learning framework.
  • a high-performance network structure can be generated only through the reasoning of a single batch of target data, enabling fast neural network model compression and reducing the computational load in the performance evaluation process. That is to say, in this embodiment of the present application, there are only two forward inferences that need to be performed, and no iteration is required.
  • one is the forward inference of the target parameter-sharing network, which outputs the feature map of the last convolution module and from which, together with the target constraint condition, the input data of the target meta-generating network is constructed; the other is the forward inference of the target meta-generating network, which outputs the prediction of a high-performance neural network structure satisfying the target constraint condition.
  • Fig. 3 is an implementation flowchart of a neural network compression method provided by an embodiment of the present application, the method may include the following steps:
  • S310 Perform forward inference on the target data through the target parameter sharing network obtained by pre-training, and obtain the output feature map of the last convolution module of the target parameter sharing network.
  • the target parameter sharing network can be obtained through pre-training.
  • the target data may be a collection of image data currently to be classified.
  • the target data can be input into the target parameter sharing network, and the forward inference is performed on the target data through the target parameter sharing network, and the output feature map of the last convolution module of the target parameter sharing network can be obtained.
  • the last convolutional module of the target parameter sharing network can include convolutional layers, batch normalization layers and activation layers.
  • S320 Extract channel-related features from the output feature map of the last convolution module of the target parameter sharing network.
  • channel-related features can be extracted in this output feature map.
  • the channel-related feature may be the maximum value of the feature map on each channel; that is, after the feature map tensor of the N input data (a four-dimensional tensor of size N*C*H*W) is split along the channel dimension C, the maximum value of each sub-feature map (of size H*W) is taken, finally forming a feature tensor of dimension N*C*1.
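A minimal sketch of this channel-wise feature extraction follows (PyTorch is assumed here; the function name is illustrative, not from the patent):

```python
import torch

def channel_max_features(feature_map: torch.Tensor) -> torch.Tensor:
    # feature_map: (N, C, H, W) output feature map of the last convolution module
    n, c, h, w = feature_map.shape
    # maximum of each per-channel sub-feature map (size H*W)
    per_channel_max = feature_map.reshape(n, c, h * w).max(dim=2).values  # (N, C)
    return per_channel_max.unsqueeze(-1)  # (N, C, 1) feature tensor
```

For example, a batch of 8 feature maps of size 512*7*7 would yield an 8*512*1 feature tensor.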
  • S330 Input the extracted channel-related features and target constraints into the target meta-generating network of the target weakly supervised meta-learning framework obtained by pre-training.
  • the target weakly supervised meta-learning framework includes a target meta-generating network and a target meta-evaluation network connected to the target meta-generating network.
  • the supervision information of the target meta-generating network is derived from the gradient information of the target meta-evaluation network.
  • the target weakly supervised meta-learning framework can be obtained by pre-training.
  • the target weakly supervised meta-learning framework includes a target meta-generating network and a target meta-evaluation network connected with the target meta-generating network.
  • the target weakly-supervised meta-learning framework can be obtained by controlling the learning of the initial weakly-supervised meta-learning framework.
  • the initial weakly supervised meta-learning framework can include an initial meta-generating network and an initial meta-evaluating network. By controlling the initial meta-generating network and the initial meta-evaluation network to learn, the target meta-generating network and the target meta-evaluation network can be obtained, and the target weakly supervised meta-learning framework can be obtained.
  • the supervision information of the target meta generation network comes from the gradient information of the target meta evaluation network.
  • the extracted channel-related features and target constraints can be input into the target meta-generating network of the target weakly supervised meta-learning framework.
  • the target constraint condition can be a FLOPs (floating point operations per second) limit, or the upper and lower limits of the per-layer channel compression ratio corresponding to a latency limit.
  • S340 Predict the optimal network structure under the target constraint condition through the target meta-generating network, and obtain a compressed neural network model.
  • the target meta-generating network is obtained by pre-training. After the channel-related features extracted from the output feature map of the last convolution module of the target parameter-sharing network and the target constraint condition are input into the target meta-generating network, the target meta-generating network can predict the optimal network structure under the target constraint condition, so as to obtain the compressed neural network model.
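The compression step itself therefore reduces to two forward passes. A minimal sketch of that inference pipeline follows (PyTorch is assumed; `shared_net`, `meta_generator` and the `forward_to_last_conv` helper are illustrative names, not APIs described in the patent):

```python
import torch

@torch.no_grad()
def compress(shared_net, meta_generator, target_data, target_constraint):
    # 1) forward inference of the pre-trained target parameter-sharing network; take the
    #    output feature map of its last convolution module (e.g. captured with a forward hook)
    feature_map = shared_net.forward_to_last_conv(target_data)   # (N, C, H, W); assumed helper
    channel_feat = feature_map.flatten(2).max(dim=2).values      # per-channel maxima, (N, C)
    # 2) forward inference of the target meta-generating network: channel features plus the
    #    target constraint in, predicted per-layer channel configuration out
    structure = meta_generator(channel_feat, target_constraint)
    return structure  # used to build the compressed model from the shared weights
```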
  • the target parameter sharing network obtained by pre-training is used to perform forward reasoning on the target data, and the output feature map of the last convolution module of the target parameter sharing network is obtained.
  • channel-related features are extracted from this output feature map; the extracted channel-related features and the target constraint condition are input into the target meta-generating network of the pre-trained target weakly supervised meta-learning framework; and the optimal network structure under the target constraint condition is predicted through the target meta-generating network to obtain the compressed neural network model. A high-performance compressed network structure under a given target constraint can thus be generated in a single batch of data inference, which reduces the amount of computation of the performance evaluation process of neural architecture search and speeds up the search for high-performance neural network structures.
  • a neural network structure that meets the requirements is generated as a compressed network model based on the resource constraints, and flexible neural network compression is realized.
  • the target parameter sharing network and target weakly supervised meta-learning framework can be obtained through the following steps:
  • Step 1 Determine the target neural network model and the initial weakly supervised meta-learning framework.
  • the initial weakly supervised meta-learning framework includes an initial meta-evaluation network and an initial meta-generating network;
  • Step 2 control the target neural network model to learn in the training phase
  • Step 3 Control the initial meta-evaluation network and the initial meta-generating network to learn in the verification stage
  • Step 4 Repeat the steps of controlling the target neural network model to learn in the training stage and controlling the initial meta-evaluation network and the initial meta-generating network to learn in the verification stage, until the set first end condition is reached, so as to obtain the target parameter-sharing network and the target weakly supervised meta-learning framework.
  • the initial weakly supervised meta-learning framework includes the initial meta-evaluation network and the initial meta-generating network.
  • the target neural network model can use existing neural network models, such as resnet (residual network), lightweight network mobilenet, lightweight network shufflenet, etc.
  • the initial structure is an uncompressed model structure.
  • the training of the target neural network model can be carried out by using slimmable neural networks (SNN) technology, or by using universally slimmable network (USN) or once for all (OFA) technology. Performance verification can also be performed on the basis of using USN technology.
  • the training of the initial meta-generating network and the initial meta-evaluation network is performed alternately, and the training of the initial meta-generating network depends on the initial meta-evaluation network.
  • test data in the test phase and the verification data in the verification phase may be data in the same dataset.
  • the initial weakly-supervised meta-learning framework and the target neural network model are simultaneously trained, and the target weakly-supervised meta-learning framework and the target parameter sharing network are obtained at the same time, which can save training time.
  • the meta-learning implemented by the meta-learning framework of the embodiment of the present application is weakly supervised learning. It is similar to unsupervised meta-learning in that both use unlabeled data for training; the difference is that an unsupervised meta-learning method converts unlabeled data into labeled data for learning, for example CACTUs (Clustering to Automatically Construct Tasks for Unsupervised meta-learning), which applies clustering to the unlabeled data and then uses a supervised meta-learning method such as MAML (Model-Agnostic Meta-Learning) to learn the tasks, whereas the supervision information of a meta-network in the meta-learning framework of the embodiment of the present application comes from the gradient information of another meta-network rather than from label data. That is, the initial meta-generating network uses the gradient fed back by the initial meta-evaluation network as supervision information for weakly supervised learning, while knowledge distillation is used in the supervised learning of the initial meta-evaluation network to maintain the discriminative power of the compressed network.
  • the target parameter sharing network and target weakly supervised meta-learning framework can be obtained through the following steps:
  • the first step determine the target neural network model and the initial weakly supervised meta-learning framework.
  • the initial weakly supervised meta-learning framework includes the initial meta-evaluation network and the initial meta-generating network;
  • the second step perform parameter sharing training on the target neural network model to obtain the target parameter sharing network
  • the third step control the initial meta-evaluation network and the initial meta-generating network to learn in the verification stage;
  • the fourth step Repeat the steps of controlling the initial meta-evaluation network and the meta-generating network to learn in the verification phase until the set second end condition is reached, and the target weakly supervised meta-learning framework is obtained.
  • the initial weakly-supervised meta-learning framework includes the initial meta-evaluation network and the initial meta-generating network.
  • the target neural network model can use existing neural network models, such as resnet (residual network), lightweight network mobilenet, lightweight network shufflenet, etc.
  • the initial structure is an uncompressed model structure.
  • Parameter sharing training is performed on the target neural network model, and the target parameter sharing network can be obtained.
  • the training of the target neural network model can be carried out by using slimmable neural networks (SNN) technology, in which case the obtained network is the target parameter-sharing network, or it can be carried out by using universally slimmable network (USN) or once-for-all (OFA) technology. Performance verification can also be performed on the basis of USN technology.
  • the training of the initial meta-generating network and the initial meta-evaluation network is performed alternately, and the training of the initial meta-generating network depends on the initial meta-evaluation network.
  • the initial weakly-supervised meta-learning framework and the target neural network model are asynchronously trained to obtain the target weakly-supervised meta-learning framework and the target parameter sharing network, as shown in Figure 5.
  • test data in the test phase and the verification data in the verification phase may be data in the same dataset.
  • the first end condition and the second end condition can be set and adjusted according to the actual situation, for example, taking the accuracy meeting the set requirement as the end condition, etc.
  • the training process of the initial weakly supervised meta-learning framework can be performed in a synchronous or asynchronous manner with the training of the target neural network model.
  • the synchronous training method does not interfere with the training of the target neural network model, while the asynchronous training method can meet the need to migrate the target neural network model between different datasets.
  • the initial meta-evaluation network can be controlled to learn in the verification stage by the following steps:
  • Step 1 Generate a set of initial neural network structures
  • Step 2 Predict the weight parameters of the last convolution module of the target neural network model through the initial meta-evaluation network according to the initial neural network structure;
  • Step 3 Construct a replacement convolution module for the last convolution module of the target neural network model through the initial meta-evaluation network; the replacement convolution module takes the weight parameters predicted by the initial meta-evaluation network as its weights and takes the input data of the last convolution module of the target neural network model as its input;
  • Step 4 Use the output feature map of the replacement convolution module to determine the loss function
  • Step 5 Calculate the gradient according to the loss function through the initial meta-evaluation network, and update its own parameters.
  • a group of initial neural network structures may be generated first. For example, a set of initial neural network structures are randomly generated. That is, the combined data of different channel compression ratios of each layer is constructed.
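A minimal sketch of such random structure generation (PyTorch assumed; drawing each layer's ratio from the discrete set listed later for the FLOPs constraint is an assumption, and the function name is illustrative):

```python
import torch

def random_structures(num_structures: int, num_layers: int = 17,
                      ratios=(1.0, 0.875, 0.75, 0.625, 0.5)) -> torch.Tensor:
    # each structure is one channel-compression ratio per layer to be compressed
    choices = torch.tensor(ratios)
    idx = torch.randint(len(ratios), (num_structures, num_layers))
    return choices[idx]  # (num_structures, num_layers)
```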
  • the weight parameters of the last convolution module of the target neural network model can be generated through the initial meta-evaluation network according to the initial neural network structure, and a replacement convolution module is constructed for the last convolution module of the target neural network model.
  • the replacement convolution module takes the weight parameter predicted by the initial meta-evaluation network as the weight, and takes the input data of the last convolution module of the target neural network model as the input.
  • the output feature map of the replacement convolution module is an approximate estimation of the output feature map of the last convolution module after the target neural network model adopts the setting of the current initial neural network structure.
  • the advantage of this processing is that it does not require the target neural network model to perform forward inference according to the initial neural network structure, and can directly estimate the impact of the initial neural network structure on the output feature map of the last convolution module. This not only reduces the number of parameters that need to be predicted, but also avoids redoing the calculation operation of forward reasoning.
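A minimal sketch of such a replacement convolution module (PyTorch is assumed; the function and argument names, and the stride/padding defaults, are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def replacement_conv_forward(last_conv_input: torch.Tensor,
                             predicted_weight: torch.Tensor,
                             stride: int = 1, padding: int = 1) -> torch.Tensor:
    # predicted_weight: weights predicted by the initial meta-evaluation network, reshaped
    # to (out_channels, in_channels, kH, kW); last_conv_input: input data of the last
    # convolution module of the target neural network model
    return F.conv2d(last_conv_input, predicted_weight, bias=None,
                    stride=stride, padding=padding)
```

Its output feature map approximates what the last convolution module would produce under the candidate structure, without re-running the full forward pass.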
  • the output feature map of the replacement convolution module can be used to determine the loss function.
  • specifically, on the one hand, the output feature map of the replacement convolution module can be input into the classifier of the target neural network model, that is, the final fully connected layer, to obtain the classification error; on the other hand, the mean square error between the output feature map of the replacement convolution module and the output feature map of the last convolution module of the target neural network model can be calculated.
  • the classification error and the mean square error together form the calculation formula of the loss function.
  • through the initial meta-evaluation network, the gradient can then be calculated according to the loss function, and its own parameters can be updated.
  • the whole training process is shown in Figure 6.
  • in addition, the initial meta-evaluation network uses the output feature map of the last convolution module of the target neural network model at the maximum network width as the reference data for knowledge distillation; that is, the mean square error between the output feature map of the replacement convolution module of each predicted network structure and this reference data is used as the loss function term of the meta-evaluation network that guides its learning.
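A minimal sketch of this loss, assuming PyTorch and assuming the classification and distillation terms are simply summed (the weighting factor `alpha`, the global pooling before the classifier, and all names are assumptions rather than details given in the patent):

```python
import torch.nn.functional as F

def meta_evaluation_loss(replacement_out, reference_out, classifier, labels, alpha=1.0):
    # classification error: the replacement module's output feature map is globally pooled
    # (an assumption) and pushed through the target model's classifier (its final FC layer)
    pooled = F.adaptive_avg_pool2d(replacement_out, 1).flatten(1)
    ce = F.cross_entropy(classifier(pooled), labels)
    # knowledge-distillation term: mean squared error against the reference output feature
    # map of the target model at its maximum network width
    mse = F.mse_loss(replacement_out, reference_out)
    return ce + alpha * mse  # alpha is an assumed weighting factor, not given in the patent
```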
  • the initial meta-generating network can be controlled to learn in the verification phase by the following steps:
  • the first step perform forward inference through the target neural network model, and obtain the output feature map of the last convolution module of the target neural network model;
  • the second step extract channel-related features in the output feature map of the last convolution module of the target neural network model
  • the third step input the extracted channel-related features and current constraints into the initial meta-generating network
  • the fourth step predict the optimal network structure under the current constraints through the initial meta-generating network, and input it into the initial meta-evaluation network;
  • the fifth step obtain the loss function of the optimal network structure under the current constraints through the initial meta-evaluation network, and transfer the gradient information in the reverse direction, so that the initial meta-generating network can perform the gradient calculation and parameter update of its own parameters based on the gradient information.
  • forward reasoning can be performed by using the target neural network model to obtain the output feature map of the last convolution module of the target neural network model.
  • Channel-dependent features can be extracted from this output feature map.
  • the extracted channel-related features together with the current constraints are used as the input data of the initial meta-generating network, and are input into the initial meta-generating network.
  • the current constraints can be the upper and lower limits of the compression ratio of each layer corresponding to the FLOPs limit or the delay limit.
  • the optimal network structure under the current constraints is predicted by the initial meta-generating network and input into the initial meta-evaluation network. Through the initial meta-evaluation network, the loss function of the optimal network structure under the current constraints can be obtained, and the gradient information can be propagated backward.
  • the initial meta-generating network receives the gradient information, and performs the gradient calculation and parameter update of its own parameters based on the gradient information. The whole training process is shown in Figure 7.
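A minimal sketch of one such weakly supervised update step (a PyTorch-style training loop is assumed; `meta_evaluator.evaluate` is an assumed helper wrapping the replacement-module loss sketched above, and all other names are illustrative):

```python
def meta_generator_step(meta_generator, meta_evaluator, channel_feat, constraint,
                        eval_inputs, optimizer):
    optimizer.zero_grad()
    # the initial meta-generating network predicts the optimal structure under the current constraint
    structure = meta_generator(channel_feat, constraint)
    # the initial meta-evaluation network turns that structure into a loss value
    loss = meta_evaluator.evaluate(structure, *eval_inputs)
    # backward pass: the gradient flowing back from the meta-evaluation network is the only
    # supervision signal reaching the meta-generating network's parameters
    loss.backward()
    optimizer.step()
    return loss.item()
```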
  • the training of the initial weakly supervised meta-learning framework includes two stages.
  • the first stage is the supervised training of the initial meta-evaluation network, which is used to predict the performance of network structures;
  • the second stage is the weakly supervised training of the initial meta-generating network, which is used to generate high-performance network structures.
  • the training of the initial weakly supervised meta-learning framework in the embodiments of the present application can be combined with a parameter-sharing network training method in the related art, for example with the universally slimmable network (USN): the learning of the weakly supervised meta-learning framework proposed in the embodiments of the present application is carried out in the verification stage, while the learning of the USN is performed in the training stage, so the training of the weakly supervised meta-learning framework does not interfere with it.
  • the neural network compression method uses meta-learning to mine the correlation between the layers of the neural network, learns the rules for combining the number of channels of each layer of the neural network under different resource constraints, and jointly optimizes the channel pruning of all layers, so as to achieve a reasonable distribution of the limited amount of computation across the layers.
  • in contrast, compression that is performed layer by layer or only for a certain layer of the neural network ignores the correlation between layers; for example, the NetAdapt method progressively compresses the neural network model: the input neural network structure is compressed for N rounds, only one layer is selected for compression and fine-tuning in each round, the network structure with the highest accuracy is selected to enter the next iteration, and the iteration stops once the compressed neural network structure meets the resource limit requirement.
  • this application can generate a high-performance compressed network structure under the given target constraints in a single batch of data inference, which can reduce the amount of computation of the performance evaluation process of neural architecture search, speed up the search for high-performance neural network structures, and, for different resource constraints, generate a neural network structure that meets the requirements as the compressed network model, achieving flexible neural network compression.
  • in the target weakly supervised meta-learning framework, the target meta-evaluation network used for predicting the performance of network structures takes the data labels and the feature map of a specific output layer of the target parameter-sharing network as supervision information, and performs supervised learning through gradient descent.
  • the target meta-generating network used for generating high-performance network structures takes the gradient of the target meta-evaluation network as its supervision information. Since the network structure with the best performance under the given constraints is unknown, the gradient information received by the target meta-generating network does not come from the real optimal network structure and is therefore not strongly supervised information; the target meta-generating network completes weakly supervised learning through gradient descent.
  • the embodiment of the present application is based on reducing the computational load of the performance evaluation process of neural network structure search, and can avoid the iterative process of model reasoning and screening through a target weakly supervised meta-learning framework that can predict high-performance network structures under different constraints.
  • the high-performance network structure is directly generated by the meta-network as the compressed model, and the function of neural network model compression is realized.
  • the embodiments of the present application can be deployed in FPGA (Field-Programmable Gate Array, Field Programmable Gate Array)-based neural network acceleration applications or software platforms of AI acceleration chips to improve the speed of the neural network structure search process, Furthermore, fast and flexible neural network compression is realized on the basis of the target parameter sharing network, which promotes the application, implementation and promotion of FPGA-based deep learning in resource-constrained scenarios such as edge computing.
  • the network structures of the target meta-evaluation network and the target meta-generating network both include two fully connected layers, and the input layer of the target meta-generating network and the output layer of the target meta-evaluation network both adopt a parameter-sharing mechanism. This allows them to adapt to the variable number of channels of the target parameter-sharing network; that is, inputs or outputs of different sizes share one set of weight parameters, and the specific input or output size can change according to the compression ratio adopted by the target parameter-sharing network. In other words, the number of weights in the input layer of the target meta-generating network and in the output layer of the target meta-evaluation network is variable, and the parameter values are shared among the different sizes.
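A minimal sketch of such a parameter-sharing layer, assuming PyTorch and assuming the sharing is realized by slicing one full-size weight matrix to the width currently in use (the class name and initialization are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlicedSharedLinear(nn.Module):
    """Fully connected layer whose single weight matrix is sliced to the current input width,
    so inputs of different sizes share one set of parameters."""

    def __init__(self, max_in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, max_in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # use only the first x.size(1) columns of the shared weight matrix
        return F.linear(x, self.weight[:, : x.size(1)], self.bias)
```

An input of 512 features and an input of 256 features would then both be mapped through the same stored matrix, using its first 512 or first 256 columns respectively.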
  • the network structures of the target meta-generating network and the initial meta-generating network are the same, and the network structures of the target meta-evaluation network and the initial meta-evaluation network are the same.
  • the network structure of the initial meta-evaluation network consists of two fully connected (FC) layers; the number of neurons in the input layer can be equal to the number of layers to be compressed in the initial parameter-sharing network, the number of neurons in the hidden layer can be set to 32, and the number of neurons in the output layer is equal to the number of convolution kernel weight parameters of a specified layer in the initial parameter-sharing network.
  • for example, with a resnet50 as the initial parameter-sharing network, where the output feature map of the last convolution module provides the information for knowledge distillation to the evaluation network, the number of neurons in the input layer of the initial meta-evaluation network is 17, the number of neurons in the hidden layer is still 32, and the number of neurons in the output layer is equal to the number of convolution kernel weights of the middle convolution layer of the last convolution module, i.e. 3x3x512. Each convolution module contains 3 convolutional layers, and the convolution kernel weights output by the initial meta-evaluation network correspond to the second convolutional layer of the last convolution module of the resnet50 network structure.
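A minimal sketch of an initial meta-evaluation network with these dimensions (PyTorch assumed; the activation choice and the flat output layout, to be reshaped by the caller into the kernel shape, are assumptions):

```python
import torch.nn as nn

class MetaEvaluationNet(nn.Module):
    def __init__(self, num_layers: int = 17, hidden: int = 32, kernel_weights: int = 3 * 3 * 512):
        super().__init__()
        self.fc1 = nn.Linear(num_layers, hidden)       # input: one neuron per layer to be compressed
        self.fc2 = nn.Linear(hidden, kernel_weights)   # output: predicted conv kernel weights (3x3x512 here)

    def forward(self, structure):
        # structure: (batch, num_layers) channel compression ratios of a candidate sub-network
        return self.fc2(self.fc1(structure).relu())    # flat weight vector, reshaped by the caller
```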
  • the network structure of the initial meta-generating network consists of two fully connected layers and a batch normalization (BN) layer located between the two fully connected layers. The number of neurons in the input layer of the initial meta-generating network is equal to the sum of the number of convolution kernel output channels of a specified layer in the initial parameter-sharing network and the number of FLOPs-constraint thresholds; the number of hidden layer neurons of the initial meta-generating network is set to 32; the BN layer here is a list of ordinary BN layers for inputs of different dimensions; and the number of neurons in the output layer of the initial meta-generating network is the same as the number of layers to be compressed in the initial parameter-sharing network. In this example, the number of FLOPs-constraint thresholds is 1, the value of the FLOPs constraint is one of [1.0, 0.875, 0.75, 0.625, 0.5], and the BN layer contains a total of 5 ordinary BN layers with input data dimensions [32, 28, 24, 20, 16].
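A minimal sketch of an initial meta-generating network along these lines (PyTorch assumed). Several details are assumptions rather than statements of the patent: that the hidden width is slimmed in proportion to the FLOPs ratio so that the five BN layers cover the dimensions [32, 28, 24, 20, 16], that both fully connected layers share sliced weight matrices, and that the output is squashed with a sigmoid to give per-layer compression ratios:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaGeneratingNet(nn.Module):
    def __init__(self, max_in: int = 512 + 1, hidden: int = 32, num_layers: int = 17,
                 width_ratios=(1.0, 0.875, 0.75, 0.625, 0.5)):
        super().__init__()
        self.hidden = hidden
        # both FC layers keep one shared weight matrix and slice it to the width in use
        self.w1 = nn.Parameter(torch.randn(hidden, max_in) * 0.01)
        self.w2 = nn.Parameter(torch.randn(num_layers, hidden) * 0.01)
        # one ordinary BN layer per candidate width: dims [32, 28, 24, 20, 16] for these ratios
        self.bns = nn.ModuleDict(
            {str(int(hidden * r)): nn.BatchNorm1d(int(hidden * r)) for r in width_ratios})

    def forward(self, channel_feat: torch.Tensor, flops_ratio: float) -> torch.Tensor:
        # input = channel features of the specified layer + one FLOPs-constraint threshold
        constraint = torch.full_like(channel_feat[:, :1], flops_ratio)
        x = torch.cat([channel_feat, constraint], dim=1)
        width = int(self.hidden * flops_ratio)                  # assumed: hidden width follows the ratio
        h = F.linear(x, self.w1[:width, : x.size(1)])           # shared input-layer weights, sliced
        h = self.bns[str(width)](h)                             # pick the BN layer matching that width
        return torch.sigmoid(F.linear(h, self.w2[:, :width]))   # per-layer channel compression ratios
```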
  • Figure 8 also shows the data flow of the initial meta-generating network during learning
  • Figure 9 also shows the data flow of the target meta-generating network during inference
  • Figure 10 also shows the data flow of the initial meta-evaluation network learning.
  • the embodiments of the present application further provide a neural network compression apparatus, and the neural network compression apparatus described below and the neural network compression method described above can be referred to each other correspondingly.
  • the device may include the following units:
  • the feature map obtaining unit 1110 is used to perform forward reasoning on the target data through the target parameter sharing network obtained by pre-training, and obtain the output feature map of the last convolution module of the target parameter sharing network;
  • Feature extraction unit 1120 for extracting channel-related features in the output feature map of the last convolution module of the target parameter sharing network
  • the feature input unit 1130 is used to input the extracted channel-related features and target constraints into the target meta-generating network of the target weakly supervised meta-learning framework obtained by pre-training;
  • the compressed model obtaining unit 1140 is configured to predict the optimal network structure under the target constraint condition through the target meta-generating network, and obtain the compressed neural network model.
  • by applying the apparatus provided in the embodiments of the present application, forward inference is performed on the target data through the pre-trained target parameter-sharing network to obtain the output feature map of the last convolution module of the target parameter-sharing network; channel-related features are extracted from this output feature map; the extracted channel-related features and the target constraint condition are input into the target meta-generating network of the pre-trained target weakly supervised meta-learning framework; and the optimal network structure under the target constraint condition is predicted through the target meta-generating network to obtain the compressed neural network model. A high-performance compressed network structure under a given target constraint can be generated in a single batch of data inference, which reduces the amount of computation of the performance evaluation process of neural architecture search, speeds up the search for high-performance neural network structures, and, for different resource constraints, generates a neural network structure that meets the requirements as the compressed network model, realizing flexible neural network compression.
  • the target weakly supervised meta-learning framework includes a target meta-generating network and a target meta-evaluation network connected to the target meta-generating network, and the supervision information of the target meta-generating network is derived from the gradient of the target meta-evaluation network. information.
  • a first training unit which is used to obtain the target parameter sharing network and the target weakly supervised meta-learning framework through the following steps:
  • the initial weakly supervised meta-learning framework includes the initial meta-evaluation network and the initial meta-generating network;
  • a second training unit is also included, which is used to obtain the target parameter sharing network and the target weakly supervised meta-learning framework through the following steps:
  • the initial weakly supervised meta-learning framework includes the initial meta-evaluation network and the initial meta-generating network;
  • Parameter sharing training is performed on the target neural network model to obtain the target parameter sharing network
  • the steps of controlling the initial meta-evaluation network and the meta-generating network to learn in the verification phase are repeatedly executed until the set second end condition is reached, and the target weakly supervised meta-learning framework is obtained.
  • a meta-evaluation network learning unit is also included, which is used to control the initial meta-evaluation network to learn in the verification stage through the following steps:
  • the weight parameter of the last convolution module of the target neural network model is predicted by the initial meta-evaluation network according to the initial neural network structure
  • a replacement convolution module is constructed for the last convolution module of the target neural network model through the initial meta-evaluation network; the replacement convolution module takes the weight parameters predicted by the initial meta-evaluation network as its weights and takes the input data of the last convolution module of the target neural network model as its input;
  • the gradient is calculated according to the loss function, and its own parameters are updated.
  • the meta-evaluation network learning unit is used for:
  • a meta-generating network learning unit is also included, which is used to control the initial meta-generating network to learn in the verification stage through the following steps:
  • the optimal network structure under the current constraints is predicted by the initial meta-generating network and input into the initial meta-evaluation network;
  • the loss function of the optimal network structure under the current constraints is obtained through the initial meta-evaluation network, and the gradient information is transferred backwards, so that the initial meta-generating network can perform the gradient calculation and parameter update of its own parameters based on the gradient information.
  • the network structures of the target meta-evaluation network and the target meta-generating network both include two fully connected layers, and the input layer of the target meta-generating network and the output layer of the target meta-evaluation network both adopt a parameter-sharing mechanism.
  • the embodiments of the present application further provide a neural network compression device, including:
  • the processor is configured to implement the steps of the above neural network compression method when executing the computer program.
  • the neural network compression device may include: a processor 10 , a memory 11 , a communication interface 12 and a communication bus 13 .
  • the processor 10 , the memory 11 , and the communication interface 12 all communicate with each other through the communication bus 13 .
  • the processor 10 may be a central processing unit (Central Processing Unit, CPU), an application-specific integrated circuit, a digital signal processor, a field programmable gate array, or other programmable logic devices, and the like.
  • the processor 10 may call the program stored in the memory 11, and specifically, the processor 10 may execute the operations in the embodiments of the neural network compression method.
  • the memory 11 is used to store one or more programs, and the programs may include program codes, and the program codes include computer operation instructions.
  • the memory 11 at least stores a program for realizing the following functions:
  • the target parameter sharing network obtained by pre-training performs forward inference on the target data, and obtains the output feature map of the last convolution module of the target parameter sharing network;
  • the optimal network structure under the target constraints is predicted by the target meta-generating network, and the compressed neural network model is obtained.
  • the memory 11 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and the application program required for at least one function (such as an image display function and a feature extraction function), and the data storage area may store data created during use, such as feature map data and network structure data.
  • the memory 11 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other volatile solid-state storage device.
  • the communication interface 12 may be an interface of a communication module for connecting with other devices or systems.
  • the structure shown in FIG. 12 does not constitute a limitation on the neural network compression device in the embodiment of the present application.
  • in practical applications, the neural network compression device may include more or fewer components than those shown in FIG. 12, or a combination of certain components.
  • embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above neural network compression method are implemented.
  • a software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

A neural network compression method, apparatus, device and storage medium. The method includes the following steps: performing forward inference on target data through a target parameter-sharing network to obtain the output feature map of the last convolution module; extracting channel-related features from the output feature map; inputting the extracted channel-related features and a target constraint condition into a target meta-generating network; and predicting the optimal network structure under the target constraint condition through the target meta-generating network to obtain a compressed neural network model. By applying the technical solution provided in this application, the amount of computation of the performance evaluation process of neural architecture search can be reduced, and the search for high-performance neural network structures can be sped up.

Description

Neural network compression method, apparatus, device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 6, 2020, with application number 202010783365.1 and the invention title "Neural network compression method, apparatus, device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of computer applications, and in particular to a neural network compression method, apparatus, device and storage medium.
Background Art
With the rapid development of computer technology, neural network technology has gradually developed, and Neural Architecture Search (NAS) has received more and more attention. Neural architecture search is a branch of the field of automatic machine learning (AutoML); it addresses the optimization of the parameters of various neural network structures, such as the selection and combination of structural parameters like the operator type of each layer and the size of the convolution kernel, and searches for the network structure with the best performance under specific requirements such as a limited amount of computation or a limited inference latency. Performance evaluation is a fundamental step of neural architecture search and guides its search process.
At present, neural architecture search is mostly based on evolutionary algorithms. As shown in FIG. 1, the overall process is: after candidate structures are generated, the parameter-sharing network performs inference, the structures are screened, the candidate structures are optimized, and it is determined whether a termination condition is met; if so, the best structure is obtained, and if not, the parameter-sharing network inference is repeated. This method needs to iteratively execute operations such as candidate network structure generation, model inference, and candidate network structure screening and optimization, which consumes a large amount of computation and incurs a long latency; moreover, such a computation-intensive neural architecture search process has to be executed whenever neural network compression is performed under different resource constraint requirements, which is unfavorable for flexible deployment of the model compression method.
Summary of the Invention
The purpose of this application is to provide a neural network compression method, apparatus, device and storage medium, so as to optimize the performance evaluation process of neural architecture search, reduce the amount of computation of the performance evaluation process, and realize flexible neural network compression.
To solve the above technical problem, this application provides the following technical solutions:
A neural network compression method, including:
performing forward inference on target data through a pre-trained target parameter-sharing network, and obtaining an output feature map of the last convolution module of the target parameter-sharing network;
extracting channel-related features from the output feature map of the last convolution module of the target parameter-sharing network;
inputting the extracted channel-related features and a target constraint condition into a target meta-generating network of a pre-trained target weakly supervised meta-learning framework;
predicting the optimal network structure under the target constraint condition through the target meta-generating network, and obtaining a compressed neural network model.
In a specific implementation of this application, the target weakly supervised meta-learning framework includes the target meta-generating network and a target meta-evaluation network connected to the target meta-generating network, and the supervision information of the target meta-generating network is derived from gradient information of the target meta-evaluation network.
In a specific implementation of this application, the target parameter-sharing network and the target weakly supervised meta-learning framework are obtained through the following steps:
determining a target neural network model and an initial weakly supervised meta-learning framework, the initial weakly supervised meta-learning framework including an initial meta-evaluation network and an initial meta-generating network;
controlling the target neural network model to learn in a training stage;
controlling the initial meta-evaluation network and the initial meta-generating network to learn in a verification stage;
repeatedly performing the steps of controlling the target neural network model to learn in the training stage and controlling the initial meta-evaluation network and the initial meta-generating network to learn in the verification stage, until a set first end condition is reached, to obtain the target parameter-sharing network and the target weakly supervised meta-learning framework.
In a specific implementation of this application, the target parameter-sharing network and the target weakly supervised meta-learning framework are obtained through the following steps:
determining a target neural network model and an initial weakly supervised meta-learning framework, the initial weakly supervised meta-learning framework including an initial meta-evaluation network and an initial meta-generating network;
performing parameter-sharing training on the target neural network model to obtain the target parameter-sharing network;
controlling the initial meta-evaluation network and the initial meta-generating network to learn in a verification stage;
repeatedly performing the step of controlling the initial meta-evaluation network and the initial meta-generating network to learn in the verification stage, until a set second end condition is reached, to obtain the target weakly supervised meta-learning framework.
In a specific implementation of this application, the initial meta-evaluation network is controlled to learn in the verification stage through the following steps:
generating a group of initial neural network structures;
predicting, through the initial meta-evaluation network, weight parameters of the last convolution module of the target neural network model according to the initial neural network structures;
constructing, through the initial meta-evaluation network, a replacement convolution module for the last convolution module of the target neural network model, the replacement convolution module taking the weight parameters predicted by the initial meta-evaluation network as its weights and taking the input data of the last convolution module of the target neural network model as its input;
determining a loss function by using an output feature map of the replacement convolution module;
calculating, through the initial meta-evaluation network, a gradient according to the loss function, and updating its own parameters.
In a specific implementation of this application, the determining a loss function by using an output feature map of the replacement convolution module includes:
inputting the output feature map of the replacement convolution module into a classifier of the target neural network model to obtain a classification error;
calculating a mean square error between the output feature map of the replacement convolution module and the output feature map of the last convolution module of the target neural network model;
determining the loss function according to the classification error and the mean square error.
In a specific implementation of this application, the initial meta-generating network is controlled to learn in the verification stage through the following steps:
performing forward inference through the target neural network model to obtain an output feature map of the last convolution module of the target neural network model;
extracting channel-related features from the output feature map of the last convolution module of the target neural network model;
inputting the extracted channel-related features and a current constraint condition into the initial meta-generating network;
predicting, through the initial meta-generating network, the optimal network structure under the current constraint condition, and inputting it into the initial meta-evaluation network;
obtaining, through the initial meta-evaluation network, a loss function of the optimal network structure under the current constraint condition, and propagating gradient information backward, so that the initial meta-generating network performs gradient calculation and parameter update of its own parameters based on the gradient information.
In a specific implementation of this application, the network structures of the target meta-evaluation network and the target meta-generating network both include two fully connected layers, and the input layer of the target meta-generating network and the output layer of the target meta-evaluation network both adopt a parameter-sharing mechanism.
A neural network compression apparatus, including:
a feature map obtaining unit, configured to perform forward inference on target data through a pre-trained target parameter-sharing network, and obtain an output feature map of the last convolution module of the target parameter-sharing network;
a feature extraction unit, configured to extract channel-related features from the output feature map of the last convolution module of the target parameter-sharing network;
a feature input unit, configured to input the extracted channel-related features and a target constraint condition into a target meta-generating network of a pre-trained target weakly supervised meta-learning framework;
a compressed model obtaining unit, configured to predict the optimal network structure under the target constraint condition through the target meta-generating network, and obtain a compressed neural network model.
A neural network compression device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any one of the above neural network compression methods when executing the computer program.
A computer-readable storage medium, storing a computer program which, when executed by a processor, implements the steps of any one of the above neural network compression methods.
By applying the technical solution provided in the embodiments of this application, forward inference is performed on target data through a pre-trained target parameter-sharing network to obtain the output feature map of the last convolution module of the target parameter-sharing network; channel-related features are extracted from this output feature map; the extracted channel-related features and the target constraint condition are input into the target meta-generating network of the pre-trained target weakly supervised meta-learning framework; and the optimal network structure under the target constraint condition is predicted through the target meta-generating network to obtain a compressed neural network model. A high-performance compressed network structure under a given target constraint can be generated in a single batch of data inference, which reduces the amount of computation of the performance evaluation process of neural architecture search, speeds up the search for high-performance neural network structures, and, for different resource constraints, generates a neural network structure that meets the requirements as the compressed network model, realizing flexible neural network compression.
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为相关技术中基于进化算法的神经网络结构搜索过程示意图;
图2为本申请实施例中基于弱监督元学习的神经网络结构搜索过程示意图;
图3为本申请实施例中一种神经网络压缩方法的实施流程图;
图4为本申请实施例中初始弱监督元学习框架与目标神经网络模型同步训练过程示意图;
图5为本申请实施例中初始弱监督元学习框架与目标神经网络模型异步训练过程示意图;
图6为本申请实施例中初始元评价网络的训练过程示意图;
图7为本申请实施例中初始元生成网络的训练过程示意图;
图8为本申请实施例中初始元评价网络和初始元生成网络的一种网络结构及初始元生成网络学习时的数据流示意图;
图9为本申请实施例中目标元生成网络在推理时的数据流示意图;
图10为本申请实施例中初始元评价网络学习时的数据流示意图;
图11为本申请实施例中一种神经网络压缩装置的结构示意图;
图12为本申请实施例中一种神经网络压缩设备的结构示意图。
具体实施方式
首先,对当前相关技术做简单说明。
最简单直接的性能评估策略是从零开始训练由完整神经网络采样出的每个待评估结构,然后再在验证集上评估其性能,这种处理方式会造成计算资源和时间成本的浪费。加速神经网络结构搜索性能评估的一种高效方法是权重共享,即由预训练好的神经网络权重参数来初始化采样出的新结构的权重。例如one-shot NAS就采用了权重共享的思路,所有采样得到的子结构共享共同结构的权重,均从完整网络模型中继承权重。one-shot NAS仅需要训练一个完整的网络模型,避免了训练子网络模型,因而减小了性能评估过程的计算量。one-shot NAS由大网络开始做减法,搜索得到的子网络结构相比原网络模型,在参数量、网络层数等方面均有所减少,这与神经网络模型压缩的目标相一致。
考虑到神经网络模型压缩和神经网络结构搜索的领域有交叉,已经有研究将神经网络结构搜索中加速搜索的方法与神经网络模型压缩方法联合应用。例如metapruning方法,将基于Hypernetworks(超网络)的神经网络结构搜索方法与剪枝方法结合,构造出一个自动化模型压缩方法。其中,基于Hypernetworks的神经网络结构搜索方法的核心是利用元学习,训练出一个元网络(meta network)来为另一个网络生成权重或梯度等参数。在metapruning方法中,待评估的子网络结构由基于进化算法的搜索方法给出,元网络负责为待评估的子网络结构生成权重,进而可以直接在校验集测试子网络结构的性能,无需重训练,而元网络的训练是以有监督的方式进行元学习的。元学习的有监督方式和无监督方式的区别在于训练阶段,有监督元学习能够利用标签学习进行训练,而无监督元学习只能获取无标签的训练数据。在测试阶段,有监督和无监督元学习均需要利用监督信息来进行有效的学习。
one-shot NAS将网络训练和搜索过程解耦并序列化,经过一次完整网络模型的训练后,可以采用多种搜索策略来重复地搜索最佳网络结构,以满足不同的约束条件。虽然其通过权重共享,消除了模型重训练的计算量,但是性能评估过程仍需要进行多次模型推理计算,以选择性能最佳的网络结构,模型推理消耗的计算量并未减少。而且在实验中发现,每个子网络结构在测试前需要经过多次校验来恢复精度。神经网络结构搜索的性能评估过程的计算量仍然是较大开销。
鉴于此,本申请实施例提供了一种神经网络压缩方法,该神经网络压缩方法基于目标弱监督元学习框架。如图2所示,仅通过单批次目标数据的推理就能生成高性能网络结构,实现快速神经网络模型压缩,减少性能评估过程的计算量。也就是说,在本申请实施例中,只有两处需要进行前向推理,且无需迭代进行,一处是目标参数共享网络前向推理以输出最后一个卷积模块的特征图,并依据输出特征图和目标约束条件构造目标元生成网络的输入数据,另一处是目标元生成网络输出满足目标约束条件的高性能神经网络结构预测结果。
为了使本技术领域的人员更好地理解本申请方案,下面结合附图和具体实施方式对本申请作进一步的详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
参见图3所示,为本申请实施例所提供的一种神经网络压缩方法的实施流程图,该方法可以包括以下步骤:
S310:通过预先训练获得的目标参数共享网络在目标数据上进行前向推理,获得目标参数共享网络的最后一个卷积模块的输出特征图。
在本申请实施例中,目标参数共享网络可以通过预先训练获得。目标数据可以为当前待进行分类的图像数据的集合。
在实际应用中,可以将目标数据输入到目标参数共享网络中,通过目标参数共享网络在目标数据上进行前向推理,可以获得目标参数共享网络的最后一个卷积模块的输出特征图。
目标参数共享网络的最后一个卷积模块可以包括卷积层、批归一化层和激活层。
S320:在目标参数共享网络的最后一个卷积模块的输出特征图中提取通道相关特征。
获得目标参数共享网络的最后一个卷积模块的输出特征图后,可以在该输出特征图中提取通道相关特征。
具体的,通道相关特征可以是特征图在各个通道上的最大值,即N个输入数据的特征图张量(尺寸N*C*H*W的四维张量)沿着通道C拆分后,每个子特征图(尺寸H*W)的最大值,最终组成一个N*C*1维度的特征张量。
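为便于理解上述通道相关特征的提取方式,下面给出一个示意性的PyTorch代码片段(仅为帮助理解的草图,函数命名为示意假设,并非本申请限定的实现):

```python
import torch

def extract_channel_features(feature_map: torch.Tensor) -> torch.Tensor:
    # feature_map:最后一个卷积模块的输出特征图,形状为 (N, C, H, W)
    n, c, h, w = feature_map.shape
    # 沿通道C拆分后,对每个尺寸为 H*W 的子特征图取最大值,得到 (N, C)
    channel_max = feature_map.reshape(n, c, h * w).max(dim=2).values
    # 补充最后一维,组成 N*C*1 维度的特征张量
    return channel_max.unsqueeze(-1)
```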
S330:将提取到的通道相关特征与目标约束条件输入到预先训练获得的目标弱监督元学习框架的目标元生成网络中。
目标弱监督元学习框架包括目标元生成网络和与目标元生成网络连接的目标元评价网络,目标元生成网络的监督信息来源于目标元评价网络的梯度信息。
在本申请实施例中,可以预先训练获得目标弱监督元学习框架。目标弱监督元学习框架包括目标元生成网络和与目标元生成网络连接的目标元评价网络。可以通过控制初始弱监督元学习框架的学习获得目标弱监督元学习框架。初始弱监督元学习框架可以包括初始元生成网络和初始元评价网络。控制初始元生成网络和初始元评价网络进行学习,可以得到目标元生成网络和目标元评价网络,获得目标弱监督元学习框架。目标元生成网络的监督信息来源于目标元评价网络的梯度信息。
在目标参数共享网络的最后一个卷积模块的输出特征图中提取到通道相关特征后,可以将提取到的通道相关特征与目标约束条件输入到目标弱监督元学习框架的目标元生成网络中。目标约束条件可以是FLOPs(floating point operations,浮点运算次数)限制或者时延限制对应的每层通道压缩比上下限等。
S340:通过目标元生成网络预测目标约束条件下的最优网络结构,获得压缩后的神经网络模型。
目标元生成网络为预先训练获得。将在目标参数共享网络的最后一个卷积模块的输出特征图中提取到的通道相关特征与目标约束条件输入到目标元生成网络中后,可以通过目标元生成网络预测目标约束条件下的最优网络结构,获得压缩后的神经网络模型。
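结合上述S310至S340,压缩推理流程可用如下示意代码概括(草图性质:shared_net、meta_generator、forward_to_last_conv等接口均为示意命名,目标约束条件此处简化为单个标量,实际输入形式以具体实现为准):

```python
import torch

@torch.no_grad()
def compress(shared_net, meta_generator, target_data, constraint):
    # S310:目标参数共享网络在目标数据上前向推理,取最后一个卷积模块的输出特征图
    feature_map = shared_net.forward_to_last_conv(target_data)      # (N, C, H, W)
    # S320:提取通道相关特征,沿用前文示例中的 extract_channel_features
    channel_feat = extract_channel_features(feature_map)            # (N, C, 1)
    # S330:通道相关特征与目标约束条件拼接,作为目标元生成网络的输入
    gen_input = torch.cat(
        [channel_feat.flatten(1),
         torch.full((channel_feat.size(0), 1), float(constraint))], dim=1)
    # S340:目标元生成网络预测目标约束条件下的最优网络结构(如每层通道压缩比)
    return meta_generator(gen_input)
```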
应用本申请实施例所提供的方法,通过预先训练获得的目标参数共享网络在目标数据上进行前向推理,获得目标参数共享网络的最后一个卷积模块的输出特征图,在该输出特征图中提取通道相关特征,将提取到的通道相关特征与目标约束条件输入到预先训练获得的目标弱监督元学习框架的目标元生成网络中,通过目标元生成网络预测目标约束条件下的最优网络结构,获得压缩后的神经网络模型。能够在单批次数据推理中生成给定的目标约束条件下高性能的压缩网络结构,可以减少神经网络结构搜索的性能评价过程的计算量,加快搜索高性能神经网络结构的速度,能够针对不同的资源约束生成满足要求的神经网络结构作为压缩后网络模型,实现灵活的神经网络压缩。
在本申请的一个实施例中,可以通过以下步骤获得目标参数共享网络和目标弱监督元学习框架:
步骤一:确定目标神经网络模型和初始弱监督元学习框架,初始弱监督元学习框架包括初始元评价网络和初始元生成网络;
步骤二:控制目标神经网络模型在训练阶段进行学习;
步骤三:控制初始元评价网络和初始元生成网络在验证阶段进行学习;
步骤四:重复执行控制目标神经网络模型在训练阶段进行学习、控制初始元评价网络和初始元生成网络在验证阶段进行学习的步骤,直至达到设定的第一结束条件,获得目标参数共享网络和目标弱监督元学习框架。
为便于描述,将上述四个步骤结合起来进行说明。
先确定目标神经网络模型和初始弱监督元学习框架,初始弱监督元学习框架包括初始元评价网络和初始元生成网络。目标神经网络模型可以使用已有的神经网络模型,如resnet(残差网络)、轻量级网络mobilenet、轻量级网络shufflenet等,该初始的结构是没有压缩的模型结构。
然后控制目标神经网络模型在训练阶段进行学习。对目标神经网络模型的训练可以采用slimmable neural networks(SNN)技术进行,还可以利用通用可压缩网络(Universally slimmable network,USN)或者once for all(OFA)技术进行。在采用USN技术的基础上还可以进行性能验证。
控制初始元评价网络在验证阶段进行学习,控制初始元生成网络在验证阶段进行学习,重复执行控制目标神经网络模型在训练阶段进行学习及控制初始元评价网络和初始元生成网络在验证阶段进行学习的步骤,直至达到设定的第一结束条件,获得目标参数共享网络、目标元评价网络和目标元生成网络,从而获得目标弱监督元学习框架,如图4所示。初始元生成网络和初始元评价网络的训练交替进行,初始元生成网络的训练依赖于初始元评价网络。
测试阶段的测试数据和验证阶段的验证数据可以为同一数据集中的数据。
在这个过程中,对初始弱监督元学习框架和目标神经网络模型进行同步训练,同时获得目标弱监督元学习框架和目标参数共享网络,可以节省训练时间。
本申请实施例的元学习框架实现的元学习方式是一种弱监督学习,其与无监督元学习的相同点是均在训练中采用了无标签数据,区别是无监督元学习方法通过将无标签数据转化为有标签数据来进行学习,例如CACTUs(Clustering to Automatically Construct Tasks for Unsupervised meta-learning,基于聚类的自动为无监督元学习构建任务的方法),其利用聚类方法为无标签数据构造伪标签后采用MAML(Model-Agnostic Meta-Learning,模型无关的元学习方法)等有监督元学习方法进行任务的学习,而本申请实施例中元学习框架中一个元网络的监督信息来源于另一个元网络的梯度信息而非标签数据。即初始元生成网络利用初始元评价网络反馈的梯度作为监督信息进行弱监督学习,并在初始元评价网络的有监督学习中采用了知识蒸馏的手段来保持压缩后网络的鉴别力。
在本申请的另一个实施例中,可以通过以下步骤获得目标参数共享网络和目标弱监督元学习框架:
第一个步骤:确定目标神经网络模型和初始弱监督元学习框架,初始弱监督元学习框架包括初始元评价网络和初始元生成网络;
第二个步骤:对目标神经网络模型进行参数共享训练,获得目标参数共享网络;
第三个步骤:控制初始元评价网络和初始元生成网络在验证阶段进行学习;
第四个步骤:重复执行控制初始元评价网络和元生成网络在验证阶段进行学习的步骤,直至达到设定的第二结束条件,获得目标弱监督元学习框架。
为便于描述,将上述四个步骤结合起来进行说明。
确定目标神经网络模型和初始弱监督元学习框架,初始弱监督元学习框架包括初始元评价网络和初始元生成网络。目标神经网络模型可以使用已有的神经网络模型,如resnet(残差网络)、轻量级网络mobilenet、轻量级网络shufflenet等,该初始的结构是没有压缩的模型结构。
对目标神经网络模型进行参数共享训练,可以获得目标参数共享网络。对目标神经网络模型的训练可以采用slimmable neural networks(SNN)技术进行,得到的网络为目标参数共享网络,还可以利用通用可压缩网络(Universally slimmable network,USN)或者once for all(OFA)技术进行训练。在采用USN技术的基础上还可以进行性能验证。
控制初始元评价网络在验证阶段进行学习,控制初始元生成网络在验证阶段进行学习,重复执行控制初始元评价网络、初始元生成网络在验证阶段进行学习的步骤,直至达到设定的第二结束条件,获得目标元评价网络和目标元生成网络,从而获得目标弱监督元学习框架。初始元生成网络和初始元评价网络的训练交替进行,初始元生成网络的训练依赖于初始元评价网络。
在这个过程中,对初始弱监督元学习框架和目标神经网络模型进行异步训练,获得目标弱监督元学习框架和目标参数共享网络,如图5所示。先对目标神经网络模型进行参数共享训练,获得目标参数共享网络,方便后续初始元评价网络和初始元生成网络的学习,灵活性更强。
测试阶段的测试数据和验证阶段的验证数据可以为同一数据集中的数据。
第一结束条件和第二结束条件可以根据实际情况进行设定和调整,如将精度达到设定要求作为结束条件等。
也就是说,在本申请实施例中,初始弱监督元学习框架的训练流程可以与目标神经网络模型的训练以同步或者异步的方式进行,同步训练方式不会干扰目标神经网络模型的训练,异步训练方式能够满足目标神经网络模型在不同数据集间迁移的需求。
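以同步训练方式为例,其交替过程可概括为如下极简骨架(仅为示意草图:各step函数与结束条件均作为外部传入的回调,具体实现依所采用的参数共享训练技术而定):

```python
def synchronous_training(shared_train_step, evaluator_step, generator_step,
                         first_stop_condition, train_loader, val_loader, max_rounds):
    for _ in range(max_rounds):
        # 训练阶段:目标神经网络模型进行参数共享训练(如SNN/USN/OFA)
        for batch in train_loader:
            shared_train_step(batch)
        # 验证阶段:初始元评价网络与初始元生成网络交替学习
        for batch in val_loader:
            evaluator_step(batch)     # 初始元评价网络的有监督学习
            generator_step(batch)     # 初始元生成网络的弱监督学习
        # 达到设定的第一结束条件即停止
        if first_stop_condition():
            break
```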
在本申请的一个实施例中,可以通过以下步骤控制初始元评价网络在验证阶段进行学习:
步骤一:生成一组初始神经网络结构;
步骤二:通过初始元评价网络依据初始神经网络结构预测目标神经网络模型的最后一个卷积模块的权重参数;
步骤三:通过初始元评价网络为目标神经网络模型的最后一个卷积模块构造一个替换卷积模块,替换卷积模块以初始元评价网络预测的权重参数为权重,以目标神经网络模型的最后一个卷积模块的输入数据为输入;
步骤四:利用替换卷积模块的输出特征图,确定损失函数;
步骤五:通过初始元评价网络依据损失函数计算梯度,并进行自身的参数更新。
为便于描述,将上述五个步骤结合起来进行说明。
在本申请实施例中,可以先生成一组初始神经网络结构,例如随机生成一组初始神经网络结构,即构造每层不同通道压缩比的组合数据。通过初始元评价网络依据初始神经网络结构可以生成目标神经网络模型的最后一个卷积模块的权重参数,并为目标神经网络模型的最后一个卷积模块构造一个替换卷积模块。该替换卷积模块以初始元评价网络预测的权重参数为权重,以目标神经网络模型的最后一个卷积模块的输入数据为输入。该替换卷积模块的输出特征图是对目标神经网络模型采用当前初始神经网络结构的设定后在最后一个卷积模块的输出特征图的近似估计。这样处理的优点在于其不需要目标神经网络模型依据初始神经网络结构进行前向推理,可直接估计初始神经网络结构对最后一个卷积模块的输出特征图的影响。这样既减少了需要预测的参数量,也避免了重新进行前向推理的计算操作。
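替换卷积模块的构造可用如下示意代码表示(草图性质:weight_shape为指定卷积层卷积核的形状,padding取值等细节均为示意假设,实际以所压缩网络的该层配置为准):

```python
import torch.nn.functional as F

def replacement_conv_output(meta_evaluator, arch, last_block_input, weight_shape):
    # 初始元评价网络依据候选网络结构arch预测最后一个卷积模块的卷积核权重(展平形式)
    predicted_weight = meta_evaluator(arch).reshape(weight_shape)   # (C_out, C_in, k, k)
    # 以预测权重为权重、以最后一个卷积模块的输入数据为输入做一次卷积,
    # 近似估计该结构设定下最后一个卷积模块的输出特征图,无需整网重新前向推理
    k = weight_shape[-1]
    return F.conv2d(last_block_input, predicted_weight, padding=k // 2)
```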
构造替换卷积模块后,可利用替换卷积模块的输出特征图,确定损失函数。具体的,可以将替换卷积模块的输出特征图输入到目标神经网络模型的分类器中,获得分类误差,计算替换卷积模块的输出特征图与目标神经网络模型的最后一个卷积模块的输出特征图之间的均方误差,根据分类误差和均方误差,确定损失函数。
即构造替换卷积模块后,一方面可以将该替换卷积模块的输出特征图输入到目标神经网络模型的分类器,即最后的全连接层中获取分类误差,另一方面可以计算该替换卷积模块的输出特征图与目标神经网络模型的最后一个卷积模块的输出特征图之间的均方误差。分类误差和均方误差可以组成损失函数的计算式。
通过初始元评价网络依据该损失函数计算梯度,并进行自身的参数更新。整个训练过程如图6所示。
在本申请实施例中,为了加速初始弱监督元学习框架的训练,保证预测网络结构在最后一个卷积模块输出的特征图的可鉴别水平,初始元评价网络采用目标神经网络模型在最大网络宽度设定时最后一个卷积模块的输出特征图作为参照数据进行知识蒸馏,即将各个预测网络结构在替换卷积模块的输出特征图和此参照数据之间的均方误差作为元评价网络的损失函数的约束项,指导元评价网络的学习。
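初始元评价网络的损失函数可按如下方式示意(草图性质:此处假设分类器前需做全局平均池化并展平,权衡系数lambda_mse为示意超参数,reference_out为最大网络宽度设定下最后一个卷积模块的输出特征图):

```python
import torch.nn.functional as F

def evaluator_loss(replacement_out, reference_out, classifier, labels, lambda_mse=1.0):
    # 将替换卷积模块的输出特征图送入目标神经网络模型的分类器,获得分类误差
    pooled = F.adaptive_avg_pool2d(replacement_out, 1).flatten(1)
    ce = F.cross_entropy(classifier(pooled), labels)
    # 替换卷积模块输出特征图与参照特征图之间的均方误差,作为知识蒸馏约束项
    mse = F.mse_loss(replacement_out, reference_out)
    return ce + lambda_mse * mse
```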
在本申请的一个实施例中,可以通过以下步骤控制初始元生成网络在验证阶段进行学习:
第一个步骤:通过目标神经网络模型进行前向推理,获得目标神经网络模型的最后一个卷积模块的输出特征图;
第二个步骤:在目标神经网络模型的最后一个卷积模块的输出特征图中提取通道相关特征;
第三个步骤:将提取到的通道相关特征与当前约束条件输入到初始元生成网络中;
第四个步骤:通过初始元生成网络预测当前约束条件下的最优网络结构,并输入到初始元评价网络中;
第五个步骤:通过初始元评价网络获取当前约束条件下的最优网络结构的损失函数,反向传递梯度信息,以使初始元生成网络基于梯度信息进行自身参数的梯度计算和参数更新。
为便于描述,将上述五个步骤结合起来进行说明。
在本申请实施例中,可以通过目标神经网络模型进行前向推理,获得目标神经网络模型的最后一个卷积模块的输出特征图。在该输出特征图中可以提取通道相关特征。然后将提取到的通道相关特征与当前约束条件一起作为初始元生成网络的输入数据,输入到初始元生成网络中。当前约束条件可以为FLOPs限制或时延限制对应的每层通道压缩比上下限。
通过初始元生成网络预测当前约束下的最优网络结构,并输入到初始元评价网络中,通过初始元评价网络可以获取当前约束条件下的最优网络结构的损失函数,并反向传递梯度信息。初始元生成网络接收梯度信息,基于梯度信息进行自身参数的梯度计算和参数更新。整个训练过程如图7所示。
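初始元生成网络的一次弱监督更新可示意如下(草图性质:沿用前文各示例函数,张量与接口命名均为示意假设,优化器仅绑定初始元生成网络的参数):

```python
import torch

def generator_learning_step(meta_generator, meta_evaluator, gen_optimizer,
                            channel_feat, constraint, last_block_input,
                            reference_out, classifier, labels, weight_shape):
    # 通道相关特征与当前约束条件拼接,作为初始元生成网络的输入
    gen_input = torch.cat(
        [channel_feat.flatten(1),
         torch.full((channel_feat.size(0), 1), float(constraint))], dim=1)
    # 预测当前约束条件下的最优网络结构,并输入到初始元评价网络
    arch = meta_generator(gen_input).mean(dim=0)        # 此处对批内预测取均值,仅为示意处理
    replacement_out = replacement_conv_output(meta_evaluator, arch,
                                              last_block_input, weight_shape)
    loss = evaluator_loss(replacement_out, reference_out, classifier, labels)
    gen_optimizer.zero_grad()
    loss.backward()          # 梯度信息经初始元评价网络反向传递至初始元生成网络
    gen_optimizer.step()     # 初始元生成网络基于该梯度完成自身参数更新
    return loss.item()
```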
在本申请实施例中,初始弱监督元学习框架的训练包括两个阶段,第一个阶段是用于预测网络结构性能的元网络——初始元评价网络的有监督训练,第二个阶段是负责生成高性能网络结构的元网络——初始元生成网络的弱监督训练。本申请实施例中的初始弱监督元学习框架的训练可以与相关技术中参数共享网络训练方法结合,例如与通用可压缩网络USN结合,在验证阶段进行本申请实施例所提弱监督元学习框架的学习,在训练阶段进行通用可压缩网络USN的学习,本申请实施例的弱监督元学习框架的训练不会对其造成干扰。
本申请实施例所提供的神经网络压缩方法,利用元学习挖掘神经网络各层之间的关联性,学习神经网络在不同资源约束下各层通道数的组合规律,对各层的通道数进行联合优化裁剪,实现有限计算量在各层的合理分配。传统的神经网络压缩方法往往逐层或者只针对神经网络的某一层进行压缩,忽略层与层之间的关联性,例如NetAdapt方法,其对神经网络模型进行渐进式压缩,在每次迭代中对输入神经网络结构进行N轮压缩,每轮只选择一层来压缩并微调,N轮压缩完成后选择准确率最高的网络结构进入下一次迭代,直至压缩的神经网络结构达到资源限制的要求后停止迭代。相比之下,本申请能够在单批次数据推理中生成给定的目标约束条件下高性能的压缩网络结构,可以减少神经网络结构搜索的性能评价过程的计算量,加快搜索高性能神经网络结构的速度,能够针对不同的资源约束生成满足要求的神经网络结构作为压缩后网络模型,实现灵活的神经网络压缩。
在本申请实施例中,预测网络结构性能的目标元评价网络以数据标签和目标参数共享网络的特定输出层的特征图为监督信息,通过梯度下降进行有监督的学习,生成高性能网络结构的目标元生成网络以目标元评价网络反向传递的梯度为监督信息,由于在给定约束条件下性能最优的网络结构信息未知,目标元生成网络的梯度信息并非来自于真实最优网络结构确定的强监督信息,目标元生成网络通过梯度下降完成弱监督学习。
本申请实施例是以减小神经网络结构搜索的性能评估过程计算量为出发点,通过可以在不同约束条件下预测高性能网络结构的目标弱监督元学习框架,可以避免模型推理和筛选的迭代过程,直接由元网络生成高性能网络结构作为压缩后模型,实现神经网络模型压缩的功能。
在实际应用中,本申请实施例可以部署于基于FPGA(Field-Programmable Gate Array,现场可编程门阵列)的神经网络加速应用或者AI加速芯片的软件平台中,提高神经网络结构搜索过程的速度,进而在目标参数共享网络的基础上实现快速且灵活的神经网络压缩,促进基于FPGA的深度学习在边缘计算等资源受限场景中应用落实与推广。
在本申请的一个实施例中,目标元评价网络和目标元生成网络的网络结构均包含两层全连接层,目标元生成网络的输入层和目标元评价网络的输出层均采用参数共享机制,可适应目标参数共享网络可变的通道数量,即不同尺寸的输入或输出共享一组权重参数,具体的输入或输出尺寸可依据目标参数共享网络采用的压缩比变化。即目标元生成网络的输入层和目标元评价网络的输出层的权重数量可变,不同数量的权重间共享参数值。
目标元生成网络和初始元生成网络的网络结构相同,目标元评价网络和初始元评价网络的网络结构相同。
在本申请的一种具体实施方式中,初始元评价网络的网络结构由两层全连接(FC)层组成,其输入层神经元数目可以等于初始参数共享网络中待压缩的层数,隐藏层神经元数目可设定为32,输出层的神经元数目与初始参数共享网络中某一指定层的卷积核权重参数量相等。如图8、9、10所示,以基于resnet50模型的参数共享网络为例,假设对resnet50模型中总共17个卷积模块进行压缩,由resnet50模型的最后一个卷积模块的特征图为初始元评价网络提供知识蒸馏的信息,那么初始元评价网络的输入层神经元数目为17,隐藏层神经元数目仍为32,输出层神经元数目等于最后一个卷积模块的中间卷积层的卷积核权重数量,即3x3x512。每个卷积模块包含3个卷积层。初始元评价网络输出的卷积核权重对应于resnet50网络结构的最后一个卷积模块的第二个卷积层。
初始元生成网络的网络结构由两层全连接层和一个批归一化层(BN层)组成,BN层位于两个全连接层之间,初始元生成网络的输入层神经元数目等于初始参数共享网络中某一指定层的卷积核输出通道数与FLOPs约束的阈值个数之和,初始元生成网络的隐藏层神经元数目设定为32,此处的BN层为一组具有不同输入维度的普通BN层组成的列表,初始元生成网络的输出层的神经元数目与初始参数共享网络中待压缩的层数相同。如图8、9、10所示,以基于resnet50模型的参数共享网络为例,假设对resnet50模型中总共17个卷积模块进行压缩并且FLOPs约束的阈值个数为1,由resnet50模型的最后一个卷积模块的特征图为初始元生成网络提供输入数据,那么初始元生成网络的输入层神经元数目为2048+1=2049,隐藏层神经元数目仍为32,输出层神经元数目等于17。当FLOPs约束的取值为[1.0,0.875,0.75,0.625,0.5]中某个值时,BN层共包含5个输入数据维度分别为[32,28,24,20,16]的普通BN层。
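依据上述以resnet50为例给出的维度设定,两个元网络可示意性地定义如下(草图性质:为保持代码简洁,此处省略了输入层/输出层的参数共享切片机制,BN层列表也仅按约束索引选取,具体可变尺寸的处理以实际实现为准):

```python
import torch
import torch.nn as nn

class MetaEvaluator(nn.Module):
    # 两层全连接:输入层神经元数 = 待压缩层数(此例为17),隐藏层32,
    # 输出层神经元数 = 指定卷积层的卷积核权重数量(此例按文中举例取3*3*512)
    def __init__(self, num_layers=17, hidden=32, out_dim=3 * 3 * 512):
        super().__init__()
        self.fc1 = nn.Linear(num_layers, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, arch):
        return self.fc2(torch.relu(self.fc1(arch)))

class MetaGenerator(nn.Module):
    # 全连接-BN-全连接:输入层神经元数 = 指定层输出通道数 + FLOPs约束阈值个数
    # (此例为2048+1=2049),隐藏层32,输出层神经元数 = 待压缩层数(此例为17)
    def __init__(self, in_dim=2049, hidden=32, num_layers=17, num_bn=5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        # 文中BN层为一组不同输入维度的普通BN层列表,此处简化为等宽列表按约束索引选取
        self.bns = nn.ModuleList([nn.BatchNorm1d(hidden) for _ in range(num_bn)])
        self.fc2 = nn.Linear(hidden, num_layers)

    def forward(self, x, constraint_index=0):
        h = self.bns[constraint_index](self.fc1(x))
        # 输出可解释为各待压缩层的通道压缩比
        return torch.sigmoid(self.fc2(h))
```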
同时,图8还展示了初始元生成网络学习时的数据流,图9还展示了目标元生成网络在推理时的数据流,图10还展示了初始元评价网络学习时的数据流。
相应于上面的方法实施例,本申请实施例还提供了一种神经网络压缩装置,下文描述的神经网络压缩装置与上文描述的神经网络压缩方法可相互对应参照。
参见图11所示,该装置可以包括以下单元:
特征图获得单元1110,用于通过预先训练获得的目标参数共享网络在目标数据上进行前向推理,获得目标参数共享网络的最后一个卷积模块的输出特征图;
特征提取单元1120,用于在目标参数共享网络的最后一个卷积模块的输出特征图中提取通道相关特征;
特征输入单元1130,用于将提取到的通道相关特征与目标约束条件输入到预先训练获得的目标弱监督元学习框架的目标元生成网络中;
压缩模型获得单元1140,用于通过目标元生成网络预测目标约束条件下的最优网络结构,获得压缩后的神经网络模型。
应用本申请实施例所提供的装置,通过预先训练获得的目标参数共享网络在目标数据上进行前向推理,获得目标参数共享网络的最后一个卷积模块的输出特征图,在该输出特征图中提取通道相关特征,将提取到的通道相关特征与目标约束条件输入到预先训练获得的目标弱监督元学习框架的目标元生成网络中,通过目标元生成网络预测目标约束条件下的最优网络结构,获得压缩后的神经网络模型。能够在单批次数据推理中生成给定的目标约束条件下高性能的压缩网络结构,可以减少神经网络结构搜索的性能评价过程的计算量,加快搜索高性能神经网络结构的速度,能够针对不同的资源约束生成满足要求的神经网络结构作为压缩后网络模型,实现灵活的神经网络压缩。
在本申请的一种具体实施方式中,目标弱监督元学习框架包括目标元生成网络和与目标元生成网络连接的目标元评价网络,目标元生成网络的监督信息来源于目标元评价网络的梯度信息。
在本申请的一种具体实施方式中,还包括第一训练单元,用于通过以下步骤获得目标参数共享网络和目标弱监督元学习框架:
确定目标神经网络模型和初始弱监督元学习框架,初始弱监督元学习框架包括初始元评价网络和初始元生成网络;
控制目标神经网络模型在训练阶段进行学习;
控制初始元评价网络和初始元生成网络在验证阶段进行学习;
重复执行控制目标神经网络模型在训练阶段进行学习、控制初始元评价网络和初始元生成网络在验证阶段进行学习的步骤,直至达到设定的第一结束条件,获得目标参数共享网络和目标弱监督元学习框架。
在本申请的一种具体实施方式中,还包括第二训练单元,用于通过以下步骤获得目标参数共享网络和目标弱监督元学习框架:
确定目标神经网络模型和初始弱监督元学习框架,初始弱监督元学习框架包括初始元评价网络和初始元生成网络;
对目标神经网络模型进行参数共享训练,获得目标参数共享网络;
控制初始元评价网络和初始元生成网络在验证阶段进行学习;
重复执行控制初始元评价网络和元生成网络在验证阶段进行学习的步骤,直至达到设定的第二结束条件,获得目标弱监督元学习框架。
在本申请的一种具体实施方式中,还包括元评价网络学习单元,用于通过以下步骤控制初始元评价网络在验证阶段进行学习:
生成一组初始神经网络结构;
通过初始元评价网络依据初始神经网络结构预测目标神经网络模型的最后一个卷积模块的权重参数;
通过初始元评价网络为目标神经网络模型的最后一个卷积模块构造一个替换卷积模块,替换卷积模块以初始元评价网络预测的权重参数为权重,以目标神经网络模型的最后一个卷积模块的输入数据为输入;
利用替换卷积模块的输出特征图,确定损失函数;
通过初始元评价网络依据损失函数计算梯度,并进行自身的参数更新。
在本申请的一种具体实施方式中,元评价网络学习单元,用于:
将替换卷积模块的输出特征图输入到目标神经网络模型的分类器中,获得分类误差;
计算替换卷积模块的输出特征图与目标神经网络模型的最后一个卷积模块的输出特征图之间的均方误差;
根据分类误差和均方误差,确定损失函数。
在本申请的一种具体实施方式中,还包括元生成网络学习单元,用于通过以下步骤控制初始元生成网络在验证阶段进行学习:
通过目标神经网络模型进行前向推理,获得目标神经网络模型的最后一个卷积模块的输出特征图;
在目标神经网络模型的最后一个卷积模块的输出特征图中提取通道相关特征;
将提取到的通道相关特征与当前约束条件输入到初始元生成网络中;
通过初始元生成网络预测当前约束条件下的最优网络结构,并输入到初始元评价网络中;
通过初始元评价网络获取当前约束条件下的最优网络结构的损失函数,反向传递梯度信息,以使初始元生成网络基于梯度信息进行自身参数的梯度计算和参数更新。
在本申请的一种具体实施方式中,目标元评价网络和目标元生成网络的网络结构均包含两层全连接层,目标元生成网络的输入层和目标元评价网络的输出层均采用参数共享机制。
相应于上面的方法实施例,本申请实施例还提供了一种神经网络压缩设备,包括:
存储器,用于存储计算机程序;
处理器,用于执行计算机程序时实现上述神经网络压缩方法的步骤。
如图12所示,为神经网络压缩设备的组成结构示意图,神经网络压缩设备可以包括:处理器10、存储器11、通信接口12和通信总线13。处理器10、存储器11、通信接口12均通过通信总线13完成相互间的通信。
在本申请实施例中,处理器10可以为中央处理器(Central Processing Unit,CPU)、特定应用集成电路、数字信号处理器、现场可编程门阵列或者其他可编程逻辑器件等。
处理器10可以调用存储器11中存储的程序,具体的,处理器10可以执行神经网络压缩方法的实施例中的操作。
存储器11中用于存放一个或者一个以上程序,程序可以包括程序代码,程序代码包括计算机操作指令,在本申请实施例中,存储器11中至少存储有用于实现以下功能的程序:
通过预先训练获得的目标参数共享网络在目标数据上进行前向推理,获得目标参数共享网络的最后一个卷积模块的输出特征图;
在目标参数共享网络的最后一个卷积模块的输出特征图中提取通道相关特征;
将提取到的通道相关特征与目标约束条件输入到预先训练获得的目标弱监督元学习框架的目标元生成网络中;
通过目标元生成网络预测目标约束条件下的最优网络结构,获得压缩后的神经网络模型。
在一种可能的实现方式中,存储器11可包括存储程序区和存储数据区,其中,存储程序区可存储操作系统,以及至少一个功能(比如图像显示功能、特征提取功能)所需的应用程序等;存储数据区可存储使用过程中所创建的数据,如特征图数据、网络结构数据等。
此外,存储器11可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件或其他易失性固态存储器件。
通信接口12可以为通信模块的接口,用于与其他设备或者系统连接。
当然,需要说明的是,图12所示的结构并不构成对本申请实施例中神经网络压缩设备的限定,在实际应用中神经网络压缩设备可以包括比图12所示的更多或更少的部件,或者组合某些部件。
相应于上面的方法实施例,本申请实施例还提供了一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,计算机程序被处理器执行时实现上述神经网络压缩方法的步骤。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。
专业人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的技术方案及其核心思想。应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以对本申请进行若干改进和修饰,这些改进和修饰也落入本申请权利要求的保护范围内。

Claims (11)

  1. 一种神经网络压缩方法,其特征在于,包括:
    通过预先训练获得的目标参数共享网络在目标数据上进行前向推理,获得所述目标参数共享网络的最后一个卷积模块的输出特征图;
    在所述目标参数共享网络的最后一个卷积模块的输出特征图中提取通道相关特征;
    将提取到的通道相关特征与目标约束条件输入到预先训练获得的目标弱监督元学习框架的目标元生成网络中;
    通过所述目标元生成网络预测所述目标约束条件下的最优网络结构,获得压缩后的神经网络模型。
  2. 根据权利要求1所述的方法,其特征在于,所述目标弱监督元学习框架包括所述目标元生成网络和与所述目标元生成网络连接的目标元评价网络,所述目标元生成网络的监督信息来源于所述目标元评价网络的梯度信息。
  3. 根据权利要求2所述的方法,其特征在于,通过以下步骤获得所述目标参数共享网络和所述目标弱监督元学习框架:
    确定目标神经网络模型和初始弱监督元学习框架,所述初始弱监督元学习框架包括初始元评价网络和初始元生成网络;
    控制所述目标神经网络模型在训练阶段进行学习;
    控制所述初始元评价网络和所述初始元生成网络在验证阶段进行学习;
    重复执行所述控制所述目标神经网络模型在训练阶段进行学习、所述控制所述初始元评价网络和所述初始元生成网络在验证阶段进行学习的步骤,直至达到设定的第一结束条件,获得所述目标参数共享网络和所述目标弱监督元学习框架。
  4. 根据权利要求2所述的方法,其特征在于,通过以下步骤获得所述目标参数共享网络和所述目标弱监督元学习框架:
    确定目标神经网络模型和初始弱监督元学习框架,所述初始弱监督元学习框架包括初始元评价网络和初始元生成网络;
    对所述目标神经网络模型进行参数共享训练,获得所述目标参数共享网络;
    控制所述初始元评价网络和所述初始元生成网络在验证阶段进行学习;
    重复执行所述控制所述初始元评价网络和所述元生成网络在验证阶段进行学习的步骤,直至达到设定的第二结束条件,获得所述目标弱监督元学习框架。
  5. 根据权利要求3或4所述的方法,其特征在于,通过以下步骤控制所述初始元评价网络在所述验证阶段进行学习:
    生成一组初始神经网络结构;
    通过所述初始元评价网络依据所述初始神经网络结构预测所述目标神经网络模型的最后一个卷积模块的权重参数;
    通过所述初始元评价网络为所述目标神经网络模型的最后一个卷积模块构造一个替换卷积模块,所述替换卷积模块以所述初始元评价网络预测的权重参数为权重,以所述目标神经网络模型的最后一个卷积模块的输入数据为输入;
    利用所述替换卷积模块的输出特征图,确定损失函数;
    通过所述初始元评价网络依据所述损失函数计算梯度,并进行自身的参数更新。
  6. 根据权利要求5所述的方法,其特征在于,所述利用所述替换卷积模块的输出特征图,确定损失函数,包括:
    将所述替换卷积模块的输出特征图输入到所述目标神经网络模型的分类器中,获得分类误差;
    计算所述替换卷积模块的输出特征图与所述目标神经网络模型的最后一个卷积模块的输出特征图之间的均方误差;
    根据所述分类误差和所述均方误差,确定损失函数。
  7. 根据权利要求3或4所述的方法,其特征在于,通过以下步骤控制所述初始元生成网络在所述验证阶段进行学习:
    通过所述目标神经网络模型进行前向推理,获得所述目标神经网络模型的最后一个卷积模块的输出特征图;
    在所述目标神经网络模型的最后一个卷积模块的输出特征图中提取通道相关特征;
    将提取到的通道相关特征与当前约束条件输入到所述初始元生成网络中;
    通过所述初始元生成网络预测所述当前约束条件下的最优网络结构,并输入到所述初始元评价网络中;
    通过所述初始元评价网络获取所述当前约束条件下的最优网络结构的损失函数,反向传递梯度信息,以使所述初始元生成网络基于所述梯度信息进行自身参数的梯度计算和参数更新。
  8. 根据权利要求2至4、6之中任一项所述的方法,其特征在于,所述目标元评价网络和所述目标元生成网络的网络结构均包含两层全连接层,所述目标元生成网络的输入层和所述目标元评价网络的输出层均采用参数共享机制。
  9. 一种神经网络压缩装置,其特征在于,包括:
    特征图获得单元,用于通过预先训练获得的目标参数共享网络在目标数据上进行前向推理,获得所述目标参数共享网络的最后一个卷积模块的输出特征图;
    特征提取单元,用于在所述目标参数共享网络的最后一个卷积模块的输出特征图中提取通道相关特征;
    特征输入单元,用于将提取到的通道相关特征与目标约束条件输入到预先训练获得的目标弱监督元学习框架的目标元生成网络中;
    压缩模型获得单元,用于通过所述目标元生成网络预测所述目标约束条件下的最优网络结构,获得压缩后的神经网络模型。
  10. 一种神经网络压缩设备,其特征在于,包括:
    存储器,用于存储计算机程序;
    处理器,用于执行所述计算机程序时实现如权利要求1至8任一项所述神经网络压缩方法的步骤。
  11. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至8任一项所述神经网络压缩方法的步骤。
PCT/CN2021/073498 2020-08-06 2021-01-25 一种神经网络压缩方法、装置、设备及存储介质 WO2022027937A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/005,620 US20230297846A1 (en) 2020-08-06 2021-01-25 Neural network compression method, apparatus and device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010783365.1 2020-08-06
CN202010783365.1A CN111967594A (zh) 2020-08-06 2020-08-06 一种神经网络压缩方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022027937A1 true WO2022027937A1 (zh) 2022-02-10

Family

ID=73365042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073498 WO2022027937A1 (zh) 2020-08-06 2021-01-25 一种神经网络压缩方法、装置、设备及存储介质

Country Status (3)

Country Link
US (1) US20230297846A1 (zh)
CN (1) CN111967594A (zh)
WO (1) WO2022027937A1 (zh)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967594A (zh) * 2020-08-06 2020-11-20 苏州浪潮智能科技有限公司 一种神经网络压缩方法、装置、设备及存储介质
CN112396181A (zh) * 2020-12-31 2021-02-23 之江实验室 一种卷积神经网络通用压缩架构的自动剪枝方法及平台
CN113469938B (zh) * 2021-05-25 2024-02-02 长兴云尚科技有限公司 基于嵌入式前端处理服务器的管廊视频分析方法及系统
CN113779722B (zh) * 2021-09-08 2022-09-30 清华大学 一种压气机稳定性预测方法、装置和存储介质
CN113537400B (zh) * 2021-09-14 2024-03-19 浙江捷瑞电力科技有限公司 一种基于分支神经网络的边缘计算节点的分配与退出方法
CN113902099B (zh) * 2021-10-08 2023-06-02 电子科技大学 基于软硬件联合学习的神经网络设计与优化方法
CN114913441B (zh) * 2022-06-28 2024-04-16 湖南大学 通道剪枝方法、目标检测方法及遥感图像车辆检测方法
CN114861890B (zh) * 2022-07-05 2022-09-09 深圳比特微电子科技有限公司 构建神经网络的方法、装置、计算设备及存储介质


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11200483B2 (en) * 2016-08-30 2021-12-14 Lunit Inc. Machine learning method and apparatus based on weakly supervised learning
US20180260695A1 (en) * 2017-03-07 2018-09-13 Qualcomm Incorporated Neural network compression via weak supervision
US11836611B2 (en) * 2017-07-25 2023-12-05 University Of Massachusetts Method for meta-level continual learning
JP7213241B2 (ja) * 2017-11-14 2023-01-26 マジック リープ, インコーポレイテッド ニューラルネットワークに関するマルチタスク学習のためのメタ学習

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510083A (zh) * 2018-03-29 2018-09-07 国信优易数据有限公司 一种神经网络模型压缩方法以及装置
WO2020131968A1 (en) * 2018-12-18 2020-06-25 Movidius Ltd. Neural network compression
CN111382863A (zh) * 2018-12-28 2020-07-07 上海欧菲智能车联科技有限公司 一种神经网络压缩方法及装置
CN111008693A (zh) * 2019-11-29 2020-04-14 深动科技(北京)有限公司 一种基于数据压缩的网络模型构建方法、系统和介质
CN111967594A (zh) * 2020-08-06 2020-11-20 苏州浪潮智能科技有限公司 一种神经网络压缩方法、装置、设备及存储介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972334A (zh) * 2022-07-19 2022-08-30 杭州因推科技有限公司 一种管材瑕疵检测方法、装置、介质
CN114972334B (zh) * 2022-07-19 2023-09-15 杭州因推科技有限公司 一种管材瑕疵检测方法、装置、介质
WO2024074072A1 (zh) * 2022-10-08 2024-04-11 鹏城实验室 脉冲神经网络加速器学习方法、装置、终端及存储介质
CN115730654A (zh) * 2022-11-23 2023-03-03 湖南大学 层剪枝方法、厨余垃圾检测方法及遥感图像车辆检测方法
CN115730654B (zh) * 2022-11-23 2024-05-14 湖南大学 层剪枝方法、厨余垃圾检测方法及遥感图像车辆检测方法

Also Published As

Publication number Publication date
CN111967594A (zh) 2020-11-20
US20230297846A1 (en) 2023-09-21

Similar Documents

Publication Publication Date Title
WO2022027937A1 (zh) 一种神经网络压缩方法、装置、设备及存储介质
Shi et al. Bridging the gap between sample-based and one-shot neural architecture search with bonas
CN111382868B (zh) 神经网络结构搜索方法和神经网络结构搜索装置
WO2019201081A1 (zh) 用于估计观测变量之间的因果关系的方法、装置和系统
CN114503121A (zh) 资源约束的神经网络架构搜索
WO2018099084A1 (zh) 一种神经网络模型训练方法、装置、芯片和系统
EP4152154A1 (en) Adaptive artificial neural network selection techniques
Khalil et al. An efficient approach for neural network architecture
JP7178513B2 (ja) ディープラーニングに基づく中国語単語分割方法、装置、記憶媒体及びコンピュータ機器
US20220156508A1 (en) Method For Automatically Designing Efficient Hardware-Aware Neural Networks For Visual Recognition Using Knowledge Distillation
CN112509600A (zh) 模型的训练方法、装置、语音转换方法、设备及存储介质
CN112307048B (zh) 语义匹配模型训练方法、匹配方法、装置、设备及存储介质
CN115455171B (zh) 文本视频的互检索以及模型训练方法、装置、设备及介质
Zheng et al. Ddpnas: Efficient neural architecture search via dynamic distribution pruning
CN116401552A (zh) 一种分类模型的训练方法及相关装置
CN114647752A (zh) 基于双向可切分深度自注意力网络的轻量化视觉问答方法
US20230237337A1 (en) Large model emulation by knowledge distillation based nas
CN111931913B (zh) 基于Caffe的卷积神经网络在FPGA上的部署方法
JP7488375B2 (ja) ニューラルネットワークの生成方法、機器及びコンピュータ可読記憶媒体
CN113033653B (zh) 一种边-云协同的深度神经网络模型训练方法
CN115438784A (zh) 一种用于混合位宽超网络的充分训练方法
CN114298290A (zh) 一种基于自监督学习的神经网络编码方法及编码器
CN116560731A (zh) 一种数据处理方法及其相关装置
Zhao et al. Rapid model architecture adaption for meta-learning
CN116029261A (zh) 中文文本语法纠错方法及相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21852677

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21852677

Country of ref document: EP

Kind code of ref document: A1