WO2021159748A1 - Model compression method, apparatus, computer device, and storage medium - Google Patents

Model compression method, apparatus, computer device, and storage medium

Info

Publication number
WO2021159748A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
feature map
backbone network
test result
feature
Prior art date
Application number
PCT/CN2020/124813
Other languages
English (en)
French (fr)
Inventor
郑强
王晓锐
高鹏
王俊
李葛
谢国彤
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021159748A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/045 — Combinations of networks
    • G06N 3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a model compression method, device, computer equipment, and storage medium.
  • the model life cycle can usually be divided into two stages: model training and model inference.
  • in the model training stage, in order to pursue higher recognition accuracy, the model is often inevitably redundant.
  • in the model inference stage, due to the influence of different inference application environments, in addition to the accuracy of the model, the model also needs high-performance characteristics such as fast inference speed, small resource occupation, and small file size.
  • model compression is a commonly used optimization method for transforming a model from the model training stage to the model inference stage.
  • the inventor realized that current model compression performs distillation compression on the entire artificial intelligence model; since different models have complex and diverse application scenarios, compressing each model requires developing a customized compression scheme, which is inefficient and has low versatility.
  • the embodiments of the present application provide a model compression method, device, computer equipment, and storage medium to solve the current problem of low efficiency and low versatility of model compression.
  • a model compression method including:
  • the channel weight vector is used to describe the importance of the feature channel corresponding to the feature map output by the first backbone network
  • the training images are respectively input into the first backbone network and the second backbone network for feature extraction, and the first feature map output by the first backbone network and the second feature map output by the second backbone network are obtained.
  • a model compression device including:
  • a backbone network acquisition module for acquiring a pre-trained image recognition model and a second backbone network to be trained; wherein the image recognition model includes the first backbone network;
  • a model testing module which inputs multiple images to be tested into the image recognition model for testing, and obtains model testing results corresponding to the multiple images to be tested;
  • a channel weight calculation module configured to calculate a channel weight vector according to the model test result; wherein the channel weight vector is used to describe the importance of the feature channel corresponding to the feature map output by the first backbone network;
  • the model training module is used to input the training images into the first backbone network and the second backbone network for feature extraction, to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network.
  • a model loss calculation module configured to calculate a model loss based on the first feature map, the second feature map, and the channel weight vector
  • the model update module is used to update and optimize the second backbone network according to the model loss to obtain a compressed image recognition model.
  • a computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor, and when the processor executes the computer program, the following steps are implemented:
  • the channel weight vector is used to describe the importance of the feature channel corresponding to the feature map output by the first backbone network
  • the training images are respectively input into the first backbone network and the second backbone network for feature extraction, and the first feature map output by the first backbone network and the second feature map output by the second backbone network are obtained.
  • a computer storage medium stores a computer program, and when the computer program is executed by a processor, the following steps are implemented:
  • the channel weight vector is used to describe the importance of the feature channel corresponding to the feature map output by the first backbone network
  • the training images are respectively input into the first backbone network and the second backbone network for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network.
  • a model compression method, apparatus, computer device, and storage medium
  • the pre-trained image recognition model and the second backbone network to be trained are acquired, so as to perform knowledge distillation according to the first backbone network in the image recognition model and train the second backbone network. There is no need to perform model compression on the entire artificial intelligence model: knowledge distillation is applied to a local network of the original model, which reduces memory usage and computation, accelerates the model compression process, and effectively resolves the limitations on model compression caused by diverse application scenarios. The approach is more versatile, which facilitates toolization and can effectively reduce repeated investment.
  • the image to be tested is input into the image recognition model for testing to obtain the model test result, and the channel weight vector is calculated according to the model test result, so that the importance of each feature channel is determined by the model test result and the accuracy and practicality of the channel weight vector are ensured.
  • the training images are input into the first backbone network and the second backbone network for feature extraction to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network, so that the model loss is calculated based on the first feature map, the second feature map, and the channel weight vector.
  • FIG. 1 is a schematic diagram of an application environment of a model compression method in an embodiment of the present application
  • Fig. 2 is a flowchart of a model compression method in an embodiment of the present application
  • FIG. 3 is a schematic diagram of the structure of the model compression method in this embodiment
  • FIG. 4 is a specific flowchart of step S202 in FIG. 2;
  • FIG. 5 is a specific flowchart of step S203 in FIG. 2;
  • FIG. 6 is a specific flowchart of step S203 in FIG. 2;
  • FIG. 7 is a specific flowchart of step S205 in FIG. 2;
  • Fig. 8 is a schematic diagram of a model compression device in an embodiment of the present application.
  • Fig. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the model compression method can be applied in the application environment as shown in Fig. 1, where the computer equipment communicates with the server through the network.
  • the computer equipment can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented as an independent server.
  • a model compression method is provided, and the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • S201 Obtain a pre-trained image recognition model and a second backbone network to be trained, where the image recognition model includes the first backbone network.
  • the first backbone network refers to the feature extraction backbone network in the image recognition model, which can be understood as the Teacher network in the traditional knowledge distillation method.
  • the image recognition model is a pre-trained model for recognizing objects in the image, such as recognizing animals, people, etc. in the image.
  • the second backbone network is a pre-created feature extraction backbone network that is smaller in scale (such as the number of neurons or the number of network layers) than the first backbone network; it can be understood as the Student network in the traditional knowledge distillation method. Knowledge distillation is performed on the first backbone network to train the second backbone network and obtain a compressed image recognition model. It should be noted that the feature channels of the feature maps output by the first backbone network and the second backbone network are kept consistent, so as to unify the feature channel dimensions and facilitate subsequent calculation.
  • the current artificial intelligence model includes the feature extraction backbone network and other parts related to the application.
  • the feature extraction backbone network has a large number of neurons and the highest computational complexity. Therefore, in this embodiment, only the feature extraction backbone network needs knowledge distillation, and there is no need to perform model compression on the entire artificial intelligence model, which reduces the compression workload and effectively resolves the limitations on model compression caused by diverse application scenarios.
  • the image recognition model includes a mask layer connected to the first backbone network (i.e., the Teacher network) and a recognition network (i.e., the Head network) connected to the mask layer.
  • the mask layer is used to perform channel masking on the feature channels in the first feature map in order to calculate an importance evaluation parameter for each feature channel (that is, the channel weight vector); the recognition network is used to recognize the feature map output by the mask layer and obtain the corresponding recognition result.
  • the test image is input into the image recognition model for testing to obtain the channel weight vector corresponding to the feature map output by the first backbone network; then the training images used to train the image recognition model are simultaneously input into the first backbone network and the second backbone network to obtain the first feature map and the second feature map, so that the Loss (that is, the model loss) is calculated based on the first feature map, the second feature map, and the channel weight vector.
  • the model loss is passed to the second backbone network (that is, the Student network) for model optimization to achieve model compression.
  • S202 Input multiple images to be tested into the image recognition model for testing, and obtain model test results corresponding to the multiple images to be tested.
  • multiple images to be tested can be input into the image recognition model for batch testing, and the recognition accuracy of the multiple images to be tested can be counted as the model test result.
  • S203 Calculate the channel weight vector according to the model test result; where the channel weight vector is used to describe the importance of the feature channel in the first feature map.
  • the model test result (that is, the model recognition accuracy) is used to calculate the channel weight vector, so as to directly evaluate the importance of the feature channels in the first feature map through the channel weight vector. In this way, the importance of each feature channel in the feature map output by the first backbone network or the second backbone network is evaluated by the model recognition accuracy, which has higher accuracy and stronger practicability.
  • S204 Input the training image into the first backbone network and the second backbone network respectively for feature extraction, to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network.
  • the first feature map refers to the feature map output by the training image through the first backbone network for feature extraction.
  • the second feature map refers to the feature map output by the training image through the second backbone network for feature extraction.
  • the training images are respectively input into the first backbone network and the second backbone network for feature extraction to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network, for subsequent calculation of the model loss.
  • the execution order of step S203 and step S204 is not prioritized, and can be executed at the same time, which is not limited here.
  • S205 Calculate the model loss based on the first feature map, the second feature map, and the channel weight vector.
  • the feature map loss is calculated from the first feature map and the second feature map through a predefined loss function and then weighted by the channel weight vector, so that the second backbone network learns the features that have a greater impact on model accuracy. This improves the recognition accuracy of the target recognition model obtained after model compression and avoids the loss of important information due to model compression.
  • S206 Update and optimize the second backbone network according to the model loss to obtain a compressed image recognition model. Specifically, the partial derivatives of the model parameters (such as model weights) of each neuron in the second backbone network are obtained through the model update algorithm preset in the image recognition model, so that the model parameters of each neuron in the second backbone network can be optimized and the compressed image recognition model can be obtained.
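The update step above (obtaining partial derivatives of the model loss with respect to the Student parameters, then optimizing them) can be sketched as plain gradient descent. This is only an illustration: the learning rate, parameter layout, and function name are assumptions, not taken from the patent.

```python
def sgd_update(params, grads, lr=0.01):
    """One gradient-descent step on the second (Student) backbone's
    parameters, using the partial derivatives of the model loss."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy update: two scalar "weights" and their gradients.
new_params = sgd_update([1.0, -2.0], [0.5, -0.5], lr=0.1)
```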
  • knowledge distillation is performed according to the first backbone network in the image recognition model, and the second backbone network is trained, without performing model compression on the entire artificial intelligence model. This realizes knowledge distillation of a local network of the original model, reduces memory usage and computation, accelerates the model compression process, and effectively resolves the limitations on model compression caused by diverse application scenarios; the approach is more versatile, which facilitates toolization and can effectively reduce repeated investment.
  • the image to be tested is input into the image recognition model for testing to obtain the model test result, and the channel weight vector is calculated according to the model test result, so that the importance of each feature channel is determined by the model test result and the accuracy and practicality of the channel weight vector are ensured.
  • the training images are input into the first backbone network and the second backbone network for feature extraction to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network, so that the model loss is calculated based on the first feature map, the second feature map, and the channel weight vector.
  • step S202, that is, inputting a plurality of images to be tested into an image recognition model for testing and obtaining model test results corresponding to the images to be tested, specifically includes the following steps:
  • S301 Use the first backbone network to perform feature extraction on each image to be tested, and output a test feature map corresponding to each image to be tested; wherein the test feature map includes multiple feature channels.
  • the test feature map refers to the feature map output after feature extraction is performed on the image to be tested through the first backbone network. Specifically, feature extraction is performed on each image to be tested through the first backbone network, that is, through multi-layer convolution, activation, pooling, and other nonlinear transformations, to output a test feature map corresponding to each image to be tested.
  • the test feature map includes multiple feature channels, and different feature channels reflect different image features.
  • S302 Use a mask layer to perform channel shielding processing on the same feature channel in each test feature map to obtain a third feature map corresponding to each test image.
  • the mask layer refers to covering the original tensor with a layer of mask to shield or select some specific elements to obtain an image of the target area.
  • the third feature map refers to the feature map obtained after the same feature channel is masked in each test feature map.
  • a mapping matrix of 0 and 1 is constructed through the mask layer to retain the features of the target feature channel and remove the features of the non-target channel.
  • for example, suppose each test feature map includes feature channel a and feature channel b.
  • to mask feature channel a in each test feature map, the mapping matrix entries corresponding to feature channel a are filled with 0 and those corresponding to feature channel b are filled with 1, so that feature channel a is masked and feature channel b is retained.
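A minimal sketch of this channel-masking step, assuming feature maps of shape (channels, height, width); the function name and shapes are illustrative, not from the patent.

```python
import numpy as np

def mask_channel(feature_map, channel):
    """Apply the 0/1 mapping matrix described above: the masked channel's
    map is multiplied by 0, every other channel's map by 1."""
    mask = np.ones(feature_map.shape[0])
    mask[channel] = 0.0
    return feature_map * mask[:, None, None]

# Two-channel toy feature map: channel 0 plays the role of "a", channel 1 of "b".
fmap = np.arange(8, dtype=float).reshape(2, 2, 2)
masked = mask_channel(fmap, 0)  # channel a masked, channel b retained
```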
  • S303 Use the recognition network to recognize each third feature map, and obtain a recognition result corresponding to each third feature map.
  • each third feature map into the recognition network for recognition, the recognition result corresponding to each third feature map can be obtained.
  • the real result refers to the pre-labeled image classification result corresponding to the image to be tested.
  • for example, if the application scenario of the image recognition model is to identify the animal category in an image, the real result is the true category of the animal in the image to be tested.
  • S304 Obtain the test result component corresponding to each feature channel according to the recognition results and the real results corresponding to the images to be tested. Specifically, the recognition accuracy over the multiple third feature maps in which the same feature channel is masked is counted and used as the test result component corresponding to that feature channel.
  • each feature map includes two feature channels a and b.
  • masking feature channel a in the two test feature maps yields the corresponding third feature maps a1′ and a2′, which are input into the recognition network for recognition, giving the recognition results a1″ (representing cat) and a2″ (representing dog). Each recognition result is compared with the real result (such as dog), and the recognition accuracy of this round of testing is 50%, which is used as the test result component corresponding to feature channel a.
  • the test result component corresponding to each feature channel can be obtained.
  • S305 Use the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
  • the data set of the test result component corresponding to each feature channel is used as the model test result, so that when the loss is subsequently calculated, the data in the data set is used for calculation.
  • step S203 that is, calculating the channel weight vector according to the model test result, specifically includes the following steps:
  • S401 Use the difference between the maximum value of the test result component and each test result component in the model test result as the first difference value.
  • S402 Use the difference between the maximum value and the minimum value of the test result component in the model test result as the second difference value.
  • S403 Calculate the ratio of the first difference and the second difference, and add the ratio and the predefined constant term to obtain the channel weight component corresponding to each test feature map.
  • the channel weight component corresponding to each test feature map is stored in a data set, so that the data set is used as a channel weight vector, so that when the loss is subsequently calculated, the data in the data set is used for calculation.
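Steps S401–S403 amount to a min-max normalization of each channel's accuracy drop plus a constant term. A sketch, where the constant's default value and the guard for equal components are illustrative assumptions:

```python
def channel_weights(components, constant=1.0):
    """Channel weight component = (max - component) / (max - min) + constant.
    The channel whose masking hurts accuracy most (smallest test result
    component) receives the largest weight."""
    hi, lo = max(components), min(components)
    if hi == lo:  # degenerate case: every channel tested equally (assumption)
        return [constant for _ in components]
    return [(hi - t) / (hi - lo) + constant for t in components]

weights = channel_weights([0.5, 1.0], constant=1.0)
# masking the first channel hurt accuracy more, so it is weighted higher
```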
  • step S203 that is, calculating the channel weight vector according to the model test result, specifically includes the following steps:
  • S501 Use the difference between the maximum value of the test result component and each test result component in the model test result as the first difference value.
  • S502 Use the difference between the maximum value and the minimum value of the test result component in the model test result as the second difference value.
  • S503 Calculate the ratio of the first difference and the second difference, and calculate the product of the ratio and the preset scaling factor.
  • in this embodiment, the preset scaling factor is used to enlarge the gap between the channel weight components, so that the second backbone network can learn the more important features that affect model accuracy, ensuring the accuracy of model compression.
  • S504 Use a data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  • the channel weight component corresponding to each test feature map is stored in a data set, so that the data set is used as a channel weight vector, so that when the loss is subsequently calculated, the data in the data set is used for calculation.
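The variant in S501–S504 multiplies the normalized ratio by the preset scaling factor before adding the constant term, enlarging the gap between channel weight components. The gamma and constant values below are hypothetical choices, not values from the patent:

```python
def scaled_channel_weights(components, gamma=2.0, constant=1.0):
    """Channel weight = gamma * (max - component) / (max - min) + constant."""
    hi, lo = max(components), min(components)
    if hi == lo:  # degenerate case, guarded only for illustration
        return [constant for _ in components]
    return [gamma * (hi - t) / (hi - lo) + constant for t in components]

weights = scaled_channel_weights([0.5, 1.0], gamma=2.0, constant=1.0)
```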
  • step S205 that is, calculating the model loss based on the first feature map, the second feature map, and the channel weight vector, specifically includes the following steps:
  • S601 Use a predefined loss function to calculate the first feature map and the second feature map to obtain a loss of the feature map.
  • S602 Perform weighting processing on the feature map loss based on the channel weight vector to obtain the model loss.
  • steps S601-S602 can be expressed by the following formula: Loss = Σ_{i=1}^{c} W_i · f(Ft_i, Fs_i), where:
  • Loss represents the model loss;
  • Ft represents the first feature map and Fs represents the second feature map;
  • W represents the channel weight vector;
  • n represents the number of images to be tested;
  • c represents the number of feature channels in the first feature map;
  • Ft_i represents the i-th feature channel map in the first feature map and Fs_i represents the i-th feature channel map in the second feature map;
  • f represents a predefined loss function, such as L1 loss or MSE loss, which is not limited here.
  • the number of feature channels in the first feature map and the second feature map remain the same.
  • the feature loss and the channel weight vector are weighted, so that the calculation of the model loss integrates the influence of the feature channel importance parameter, so as to effectively reduce the model while compressing the model. Loss of compression accuracy.
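The weighted feature-map loss of steps S601-S602 can be sketched as follows, with per-channel mean squared error standing in for the predefined loss function f (L1 would fit equally) and feature maps assumed to have shape (channels, height, width):

```python
import numpy as np

def model_loss(ft, fs, w):
    """Loss = sum over channels i of W_i * f(Ft_i, Fs_i), with f chosen
    here as per-channel mean squared error."""
    f = lambda a, b: float(np.mean((a - b) ** 2))
    return sum(wi * f(ft[i], fs[i]) for i, wi in enumerate(w))

ft = np.zeros((2, 2, 2))      # teacher (first backbone) feature map
fs = np.ones((2, 2, 2))       # student (second backbone) feature map
w = [2.0, 1.0]                # channel weight vector
loss = model_loss(ft, fs, w)  # per-channel MSE is 1.0, so loss = 2.0 + 1.0 = 3.0
```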
  • a model compression device is provided, and the model compression device corresponds to the model compression method in the foregoing embodiment one-to-one.
  • the model compression device includes a backbone network acquisition module 10, a model testing module 20, a channel weight calculation module 30, a model training module 40, a model loss calculation module 50 and a model update module 60.
  • the detailed description of each functional module is as follows:
  • the backbone network acquisition module 10 is configured to acquire a pre-trained image recognition model and a second backbone network to be trained.
  • the image recognition model includes the first backbone network.
  • the model testing module 20 is used to input the image to be tested into the image recognition model for testing, obtain the model test result and the first feature map output by the first backbone network, and input the training image into the second backbone network for feature extraction , To obtain the second feature map output by the second backbone network.
  • the channel weight calculation module 30 is configured to calculate the channel weight vector corresponding to the first feature map according to the model test result; wherein, the channel weight vector is used to describe the importance of the feature channel in the first feature map.
  • the model training module 40 is configured to input training images into the first backbone network and the second backbone network for feature extraction, to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network.
  • the model loss calculation module 50 is configured to calculate the model loss based on the first feature map, the second feature map, and the channel weight vector.
  • the model update module 60 is used to update and optimize the second backbone network according to the model loss to obtain a compressed image recognition model.
  • the image recognition model includes a mask layer connected to the first backbone network and a recognition network connected to the mask layer.
  • the model testing module includes a feature extraction unit, a channel masking unit, an image recognition unit, a result statistics unit, and a test result acquisition unit.
  • the feature extraction unit is configured to use the first backbone network to perform feature extraction on each image to be tested, and output a test feature map corresponding to each image to be tested; wherein the test feature map includes multiple feature channels.
  • the channel shielding unit is used to perform channel shielding processing on the same feature channel in each test feature map by using a mask layer to obtain a third feature map corresponding to each test image;
  • the image recognition unit is used to recognize each third feature map by using a recognition network to obtain a recognition result corresponding to each third feature map.
  • the result statistics unit is used to obtain the test result component corresponding to each characteristic channel according to the recognition result and the real result corresponding to the image to be tested.
  • the test result acquisition unit is used to use the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
  • the channel weight calculation module includes a first difference calculation unit, a second difference calculation unit, a channel weight component calculation unit, and a channel weight vector acquisition unit.
  • the first difference calculation unit is configured to use the difference between the maximum value of the test result component and each test result component in the model test result as the first difference.
  • the second difference calculation unit is configured to use the difference between the maximum value and the minimum value of the test result component in the model test result as the second difference value.
  • the channel weight component calculation unit is used to calculate the ratio of the first difference and the second difference, and add the ratio and the predefined constant term to obtain the channel weight component corresponding to each test feature map.
  • the channel weight vector acquisition unit is used to use the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  • the channel weight calculation module includes a first difference calculation unit, a second difference calculation unit, a scaling unit, a channel weight component calculation unit, and a channel weight vector acquisition unit.
  • the first difference calculation unit is configured to use the difference between the maximum value of the test result component and each test result component in the model test result as the first difference.
  • the second difference calculation unit is configured to use the difference between the maximum value and the minimum value of the test result component in the model test result as the second difference value.
  • the scaling unit is used for calculating the ratio of the first difference and the second difference, and calculating the product of the ratio and the preset scaling factor.
  • the channel weight component calculation unit is used to add and process the product and the predefined constant term to obtain the channel weight component corresponding to each test feature map.
  • the channel weight vector acquisition unit is used to use the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  • the model update module includes a feature map loss calculation unit and a model loss calculation unit.
  • the feature map loss calculation unit is configured to calculate the first feature map and the second feature map by using a predefined loss function to obtain the feature map loss.
  • the model loss calculation unit is used for weighting the feature map loss based on the channel weight vector to obtain the model loss.
  • each module in the above-mentioned model compression device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a computer storage medium and an internal memory.
  • the computer storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the computer storage medium.
  • the database of the computer device is used to store the data generated or acquired during execution of the model compression method, such as the image recognition model.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program, when executed by the processor, implements a model compression method.
  • a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the steps of the model compression method in the above-mentioned embodiment are implemented, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7. Alternatively, when the processor executes the computer program, the functions of each module/unit in the embodiment of the model compression device are realized, such as the functions of the modules/units shown in FIG. 8.
  • a computer storage medium stores a computer program.
  • when the computer program is executed by a processor, it implements the steps of the model compression method in the foregoing embodiment, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7; to avoid repetition, they are not repeated here.
  • alternatively, when the computer program is executed by the processor, the functions of each module/unit in the embodiment of the model compression device are realized, such as the functions of the modules/units shown in FIG. 8. To avoid repetition, details are not described here.
  • the computer-readable storage medium may be non-volatile or volatile.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.


Abstract

A model compression method, apparatus, device, and storage medium, relating to the field of artificial intelligence. The model compression method includes: acquiring a pre-trained image recognition model and a second backbone network to be trained (S201); inputting multiple images to be tested into the image recognition model for testing to obtain a model test result (S202); calculating a channel weight vector according to the model test result (S203); inputting training images into a first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network (S204); calculating a model loss based on the first feature map, the second feature map, and the channel weight vector (S205); and updating and optimizing the second backbone network according to the model loss to obtain a compressed image recognition model (S206). The method addresses the low efficiency and poor generality of current model compression, which performs distillation compression on the entire artificial-intelligence model.

Description

Model compression method, apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202011007728.9, filed on September 23, 2020 and entitled "Model compression method, apparatus, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a model compression method, apparatus, computer device, and storage medium.
Background
In the field of artificial intelligence, a model's life cycle can generally be divided into two stages: model training and model inference. During training, the pursuit of higher recognition accuracy inevitably introduces redundancy into the model. During inference, depending on the deployment environment, the model must not only remain accurate but also offer high performance: fast inference, a small resource footprint, and a small file size.
At present, model compression is a common optimization for moving a model from the training stage to the inference stage. However, the inventors realized that current model compression performs distillation compression on the entire artificial-intelligence model; because different models serve complex and diverse application scenarios, compressing each model requires a custom-developed compression scheme, which is inefficient and lacks generality.
Summary
Embodiments of this application provide a model compression method, apparatus, computer device, and storage medium to address the low efficiency and poor generality of current model compression.
A model compression method, including:
acquiring an image recognition model pre-trained on training images and a second backbone network to be trained, where the image recognition model includes a first backbone network;
inputting multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
calculating a channel weight vector according to the model test result, where the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
inputting the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
calculating a model loss based on the first feature map, the second feature map, and the channel weight vector; and
updating and optimizing the second backbone network according to the model loss to obtain a compressed image recognition model.
A model compression apparatus, including:
a backbone network acquisition module, configured to acquire a pre-trained image recognition model and a second backbone network to be trained, where the image recognition model includes a first backbone network;
a model test module, configured to input multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
a channel weight calculation module, configured to calculate a channel weight vector according to the model test result, where the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
a model training module, configured to input the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
a model loss calculation module, configured to calculate a model loss based on the first feature map, the second feature map, and the channel weight vector; and
a model update module, configured to update and optimize the second backbone network according to the model loss to obtain a compressed image recognition model.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor performs the following steps:
acquiring an image recognition model pre-trained on training images and a second backbone network to be trained, where the image recognition model includes a first backbone network;
inputting multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
calculating a channel weight vector according to the model test result, where the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
inputting the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
calculating a model loss based on the first feature map, the second feature map, and the channel weight vector; and
updating and optimizing the second backbone network according to the model loss to obtain a compressed image recognition model.
A computer storage medium storing a computer program, where the computer program, when executed by a processor, implements the following steps:
acquiring an image recognition model pre-trained on training images and a second backbone network to be trained, where the image recognition model includes a first backbone network;
inputting multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
calculating a channel weight vector according to the model test result, where the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
inputting the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
calculating a model loss based on the first feature map, the second feature map, and the channel weight vector; and
updating and optimizing the second backbone network according to the model loss to obtain a compressed image recognition model.
In the above model compression method, apparatus, computer device, and storage medium, a pre-trained image recognition model and a second backbone network to be trained are acquired so that knowledge distillation can be performed from the first backbone network in the image recognition model to train the second backbone network. There is no need to compress the entire artificial-intelligence model: knowledge distillation targets only a local network of the original model, which reduces memory usage and computation, speeds up the compression process, effectively avoids the limitations that arise from compressing whole models across many application scenarios, offers better generality, lends itself to tooling, and reduces duplicated effort. Next, the images to be tested are input into the image recognition model for testing to obtain a model test result, from which the channel weight vector is calculated, so that the importance of each feature channel is determined by the model test result, ensuring the accuracy and practicality of the channel weight vector. Then, the training images are input into the first backbone network and the second backbone network respectively for feature extraction to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network, and the model loss is calculated based on the first feature map, the second feature map, and the channel weight vector. Finally, the second backbone network is updated and optimized according to the model loss to obtain a compressed image recognition model, so that the second backbone network learns the features that most affect model accuracy, improving the recognition accuracy of the target recognition model obtained after compression and avoiding the loss of important information caused by model compression.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of the model compression method in an embodiment of this application;
FIG. 2 is a flowchart of the model compression method in an embodiment of this application;
FIG. 3 is a schematic structural diagram of the model compression method in this embodiment;
FIG. 4 is a specific flowchart of step S202 in FIG. 2;
FIG. 5 is a specific flowchart of step S203 in FIG. 2;
FIG. 6 is another specific flowchart of step S203 in FIG. 2;
FIG. 7 is a specific flowchart of step S205 in FIG. 2;
FIG. 8 is a schematic diagram of the model compression apparatus in an embodiment of this application;
FIG. 9 is a schematic diagram of the computer device in an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The model compression method can be applied in the application environment shown in FIG. 1, in which a computer device communicates with a server over a network. The computer device may be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet, or a portable wearable device. The server may be implemented as a stand-alone server.
In an embodiment, as shown in FIG. 2, a model compression method is provided. Taking the method applied to the server in FIG. 1 as an example, it includes the following steps:
S201: Acquire a pre-trained image recognition model and a second backbone network to be trained, where the image recognition model includes a first backbone network.
Here, the first backbone network is the feature-extraction backbone of the image recognition model and can be understood as the teacher network in conventional knowledge distillation. The image recognition model is a pre-trained model for recognizing objects in images, such as animals or people. The second backbone network is a pre-built feature backbone that is smaller than the first backbone network (e.g., fewer neurons or fewer network layers) and can be understood as the student network in conventional knowledge distillation; knowledge is distilled from the first backbone network to train the second backbone network and obtain a compressed image recognition model. Note that the feature maps output by the first and second backbone networks keep the same feature channels, so the channel dimensions are aligned for subsequent calculation.
Understandably, because model training pursues higher accuracy, the resulting model is large; at deployment time its complexity demands substantial storage and computing resources, making it hard to land on various hardware platforms. The original model therefore needs knowledge distillation to minimize its consumption of computing space and time.
Current artificial-intelligence models consist of a feature-extraction backbone plus application-specific parts. The feature-extraction backbone has the largest number of neurons and the highest computational complexity, so this embodiment only distills the feature-extraction backbone rather than compressing the entire artificial-intelligence model, which reduces the compression workload and effectively avoids the limitations that arise from compressing whole models across many application scenarios.
Specifically, as shown in the structural diagram of model compression in FIG. 3, the image recognition model includes a mask layer connected to the first backbone network (i.e., the teacher network) and a recognition network (i.e., the head network) connected to the mask layer. The mask layer performs channel masking on the feature channels of the first feature map so that an importance score (i.e., the channel weight vector) can be computed for each feature channel; the recognition network recognizes the feature map output by the mask layer to obtain the corresponding recognition result.
As shown in FIG. 3, test images are input into the image recognition model for testing to obtain the channel weight vector corresponding to the feature map output by the first backbone network; the training images used to train the image recognition model are then input into both the first backbone network and the second backbone network to obtain the first feature map and the second feature map; the loss (i.e., the model loss) is calculated based on the first feature map, the second feature map, and the channel weight vector; and the model loss is passed to the second backbone network (i.e., the student network) for model optimization, achieving model compression.
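The distillation flow just described can be sketched end to end. The following is a toy, self-contained illustration rather than the patent's implementation: the two "backbones" are stand-in functions with one gain parameter per channel, a crude finite-difference update stands in for backpropagation in step S206, and all names are hypothetical.

```python
def teacher_backbone(image):
    # stand-in for the first (teacher) backbone: two feature channels,
    # each a scaled copy of the input "image" (a list of numbers)
    return [[x * 2.0 for x in image], [x * 0.5 for x in image]]

def student_backbone(image, params):
    # stand-in for the second (student) backbone: one gain per channel
    return [[x * p for x in image] for p in params]

def weighted_feature_loss(ft, fs, w):
    # channel-weighted MSE between teacher and student feature maps (S205)
    total = 0.0
    for wi, ti, si in zip(w, ft, fs):
        mse = sum((t - s) ** 2 for t, s in zip(ti, si)) / len(ti)
        total += wi * mse
    return total

def train_student(images, w, params, lr=0.1, steps=50):
    # finite-difference gradient step standing in for backprop (S206)
    eps = 1e-4
    for _ in range(steps):
        for img in images:
            ft = teacher_backbone(img)
            base = weighted_feature_loss(ft, student_backbone(img, params), w)
            grads = []
            for i in range(len(params)):
                bumped = list(params)
                bumped[i] += eps
                loss_b = weighted_feature_loss(ft, student_backbone(img, bumped), w)
                grads.append((loss_b - base) / eps)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

params = train_student([[1.0, 2.0, 3.0]], w=[1.5, 1.0], params=[0.0, 0.0])
# the student's per-channel gains approach the teacher's gains (2.0 and 0.5)
```

The teacher stays frozen throughout; only the student's parameters move, which is the essence of distilling the backbone alone.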
S202: Input multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested.
Specifically, according to a preset batchsize parameter (the number of test samples), multiple images to be tested can be input into the image recognition model for batch testing, and the recognition accuracy over these images is taken as the model test result.
S203: Calculate a channel weight vector according to the model test result, where the channel weight vector describes the importance of the feature channels in the first feature map.
Specifically, the channel weight vector is calculated from the model test result, i.e., the model's recognition accuracy, so that the channel weight vector directly evaluates the importance of the feature channels in the first feature map. Because the importance of every feature channel in the feature maps output by the first or second backbone network is evaluated via recognition accuracy, the evaluation is both accurate and practical.
S204: Input the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network.
Here, the first feature map is the feature map output when a training image passes through the first backbone network for feature extraction, and the second feature map is the feature map output when the same training image passes through the second backbone network. Specifically, the training images are input into both backbone networks for feature extraction to obtain the two feature maps for the subsequent loss calculation. Note that steps S203 and S204 have no required execution order and may be executed simultaneously; no limitation is imposed here.
S205: Calculate a model loss based on the first feature map, the second feature map, and the channel weight vector.
In this embodiment, a predefined loss function computes the feature-map loss from the first and second feature maps, which is then weighted by the channel weight vector, so that the second backbone network learns the features that most affect model accuracy. This improves the recognition accuracy of the target recognition model obtained after compression and avoids losing important information during compression.
S206: Update and optimize the second backbone network according to the model loss to obtain a compressed image recognition model.
Specifically, a model update algorithm preset in the image recognition model takes the partial derivative of the model loss with respect to the model parameters (e.g., the model weights w) of each neuron in the second backbone network,
∂Loss/∂w,
so as to optimize the model parameters of each neuron in the second backbone network; when the prediction accuracy of the second backbone network reaches a preset value, the compressed image recognition model is obtained.
In this embodiment, a pre-trained image recognition model and a second backbone network to be trained are acquired so that knowledge distillation can be performed from the first backbone network in the image recognition model to train the second backbone network. There is no need to compress the entire artificial-intelligence model: knowledge distillation targets only a local network of the original model, which reduces memory usage and computation, speeds up the compression process, effectively avoids the limitations that arise from compressing whole models across many application scenarios, offers better generality, lends itself to tooling, and reduces duplicated effort. Next, the images to be tested are input into the image recognition model for testing to obtain a model test result, from which the channel weight vector is calculated, so that the importance of each feature channel is determined by the model test result, ensuring the accuracy and practicality of the channel weight vector. Then, the training images are input into the first backbone network and the second backbone network respectively for feature extraction to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network, and the model loss is calculated based on the first feature map, the second feature map, and the channel weight vector. Finally, the second backbone network is updated and optimized according to the model loss to obtain a compressed image recognition model, so that the second backbone network learns the features that most affect model accuracy, improving the recognition accuracy of the target recognition model obtained after compression and avoiding the loss of important information caused by model compression.
In an embodiment, as shown in FIG. 4, step S202 — inputting multiple images to be tested into the image recognition model for testing to obtain the model test result corresponding to the multiple images to be tested — specifically includes the following steps:
S301: Use the first backbone network to extract features from each image to be tested and output a test feature map corresponding to each image to be tested, where the test feature map includes multiple feature channels.
Here, the test feature map is the feature map output when an image to be tested passes through the first backbone network for feature extraction. Specifically, the first backbone network extracts features from each image to be tested through multiple layers of nonlinear transformations such as convolution, activation, and pooling, and outputs the test feature map corresponding to each image. The test feature map includes multiple feature channels, and different feature channels reflect different image features.
S302: Use the mask layer to perform channel masking on the same feature channel in every test feature map to obtain a third feature map corresponding to each image to be tested.
Here, the mask layer overlays a mask on the original tensor so as to block or select particular elements and obtain the image of a target region. The third feature map is the feature map obtained after the same feature channel has been masked out of each test feature map.
Specifically, the mask layer builds a 0/1 mapping matrix to keep the features of target feature channels and remove the features of non-target channels. For example, suppose every test feature map includes feature channels a and b, and channel a in each test feature map is to be masked: filling channel a's mapping matrix with 0 and channel b's mapping matrix with 1 masks channel a while keeping channel b.
S303: Use the recognition network to recognize each third feature map and obtain the recognition result corresponding to each third feature map.
Specifically, inputting each third feature map into the recognition network for recognition yields the recognition result corresponding to each third feature map.
S304: Obtain the test result component corresponding to each feature channel according to the recognition results and the ground-truth results corresponding to the images to be tested.
Here, the ground-truth result is the pre-labeled image classification result of the image to be tested. For example, if the model's application scenario is recognizing the animal category in an image, the ground-truth result is the true category of the animal in the image to be tested.
Specifically, comparing each recognition result with the ground-truth result of the corresponding image to be tested yields the test result component for that round of testing: the recognition accuracy over the multiple third feature maps in which the same feature channel was masked is taken as the test result component corresponding to that feature channel.
For example, suppose feature extraction on the images to be tested through the first backbone network outputs feature map 1 and feature map 2, each including two feature channels a and b. Masking channel a in both feature maps yields the third feature maps a1' and a2'; inputting a1' and a2' into the recognition network yields the recognition results a1'' (cat) and a2'' (dog). Comparing each recognition result with the ground truth (e.g., dog) gives a recognition accuracy of 50% for this round, which becomes the test result component of feature channel a. By running multiple rounds of testing — masking each feature channel of the two feature maps in turn — the test result component of every feature channel is obtained.
S305: Use the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
Specifically, the test result components of all feature channels are stored in a data set, and this data set serves as the model test result, so that its data can be retrieved for the subsequent loss calculation.
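The masking rounds of steps S301-S305 can be sketched with a toy recognizer. Everything here is illustrative (the feature extractor, the head, and the two-channel "images" are invented for the sketch); the point is the loop structure: mask one channel at a time across all test feature maps, recognize, and record the accuracy as that channel's test result component.

```python
def extract_features(image):
    # stand-in for the first backbone: two feature channels per image
    return [image["bright"], image["shape"]]

def recognize(features):
    # toy head network: predicts "dog" only if the shape channel survives
    return "dog" if features[1] > 0.5 else "cat"

def channel_test_components(images, labels, num_channels=2):
    components = []
    for ch in range(num_channels):            # one masking round per channel
        correct = 0
        for img, label in zip(images, labels):
            feats = extract_features(img)
            # mask layer: zero out channel `ch` via a 0/1 mapping (S302)
            masked = [0.0 if i == ch else v for i, v in enumerate(feats)]
            if recognize(masked) == label:    # S303 / S304
                correct += 1
        components.append(correct / len(images))
    return components                         # model test result (S305)

imgs = [{"bright": 0.9, "shape": 0.8}, {"bright": 0.2, "shape": 0.9}]
components = channel_test_components(imgs, ["dog", "dog"])
# → [1.0, 0.0]: accuracy survives masking channel 0 but not channel 1
```

Here masking channel 1 collapses accuracy to 0, so channel 1 would later receive the largest weight.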
In an embodiment, as shown in FIG. 5, step S203 — calculating the channel weight vector according to the model test result — specifically includes the following steps:
S401: Use the difference between the maximum value of the test result components in the model test result and each test result component as the first difference.
S402: Use the difference between the maximum value and the minimum value of the test result components in the model test result as the second difference.
S403: Calculate the ratio of the first difference to the second difference, and add the ratio to a predefined constant term to obtain the channel weight component corresponding to each test feature map.
Specifically, for intuitive expression, the calculation of steps S401-S403 can be expressed by the formula Wi = 1 + (Amax - Ai)/(Amax - Amin), where 1 is the predefined constant term, Amax is the maximum value of the test result components, Amin is the minimum value, Ai is the test result component corresponding to feature channel i, i is the channel index, and Wi is the channel weight component corresponding to the test feature map, characterizing the importance of feature channel i.
S404: Use the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
Specifically, the channel weight components of all test feature maps are stored in a data set, and this data set serves as the channel weight vector, so that its data can be retrieved for the subsequent loss calculation.
In an embodiment, as shown in FIG. 6, step S203 — calculating the channel weight vector according to the model test result — specifically includes the following steps:
S501: Use the difference between the maximum value of the test result components in the model test result and each test result component as the first difference.
S502: Use the difference between the maximum value and the minimum value of the test result components in the model test result as the second difference.
S503: Calculate the ratio of the first difference to the second difference, and calculate the product of the ratio and a preset scaling factor.
S504: Add the product to a predefined constant term to obtain the channel weight component corresponding to each test feature map.
Specifically, for intuitive expression, the calculation of steps S501-S504 can be expressed by the formula Wi = 1 + α(Amax - Ai)/(Amax - Amin), where 1 is the predefined constant term, Amax is the maximum value of the test result components, Amin is the minimum value, Ai is the test result component corresponding to feature channel i, i is the channel index, α is the preset scaling factor (1 by default, customizable as needed), and Wi is the channel weight component corresponding to the test feature map, characterizing the importance of feature channel i.
Understandably, to make the gaps between channel weight components more pronounced — i.e., to amplify them so that the second backbone network learns the features most important to model accuracy and the precision of model compression is preserved — this embodiment uses the preset scaling factor to amplify the gaps between channel weight components.
S505: Use the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
Specifically, the channel weight components of all test feature maps are stored in a data set, and this data set serves as the channel weight vector, so that its data can be retrieved for the subsequent loss calculation.
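Both weight variants reduce to Wi = 1 + α·(Amax − Ai)/(Amax − Amin), with α = 1 recovering the unscaled formula of FIG. 5. A minimal sketch follows; the function name and the equal-components fallback are our assumptions, not taken from the patent:

```python
def channel_weights(components, alpha=1.0, const=1.0):
    # Wi = const + alpha * (Amax - Ai) / (Amax - Amin)
    a_max, a_min = max(components), min(components)
    span = a_max - a_min
    if span == 0:   # assumed fallback: all channels equally important
        return [const] * len(components)
    return [const + alpha * (a_max - a) / span for a in components]

w = channel_weights([0.9, 0.5, 0.7])
# ≈ [1.0, 2.0, 1.5]: the channel whose masking hurts accuracy the most
# (component 0.5) receives the largest weight
```

Raising α above 1 widens the spread between important and unimportant channels, which is exactly the amplification effect described above.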
In an embodiment, as shown in FIG. 7, step S205 — calculating the model loss based on the first feature map, the second feature map, and the channel weight vector — specifically includes the following steps:
S601: Use a predefined loss function to compute on the first feature map and the second feature map to obtain the feature-map loss.
S602: Weight the feature-map loss based on the channel weight vector to obtain the model loss.
Specifically, for intuitive expression, the calculation of steps S601-S602 can be expressed by the following formulas:
Loss_fm^i = f(Ft^i, Fs^i), i = 1, ..., c
Loss = (1/n) · Σ_{i=1}^{c} W_i · Loss_fm^i
where Loss is the model loss, Ft is the first feature map, Fs is the second feature map, W is the channel weight vector, n is the number of images to be tested, c is the number of feature channels in the first feature map, Ft^i is the i-th feature channel map of the first feature map, Fs^i is the i-th feature channel map of the second feature map, and f is a predefined loss function, such as L1 loss or MSE loss, which is not limited here. Note that the first feature map and the second feature map keep the same number of feature channels.
In this embodiment, when calculating the model loss, the feature-map loss is weighted by the channel weight vector, so that the loss calculation incorporates the influence of the channel-importance parameters, and the precision loss of model compression is effectively reduced while the model is being compressed.
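Steps S601-S602 can be sketched with f chosen as per-channel MSE (the patent leaves f open, e.g. L1 loss or MSE loss); all names here are illustrative:

```python
def mse(a, b):
    # per-channel feature-map loss with f = MSE (S601)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def model_loss(ft, fs, w):
    # weight each channel's loss by the channel weight vector (S602);
    # ft and fs hold c channel maps each, with len(w) == c
    return sum(wi * mse(ti, si) for wi, ti, si in zip(w, ft, fs))

ft = [[1.0, 2.0], [0.0, 0.0]]   # first (teacher) feature map, 2 channels
fs = [[1.0, 1.0], [0.0, 1.0]]   # second (student) feature map
loss = model_loss(ft, fs, w=[2.0, 1.0])
# channel 0: MSE 0.5 × weight 2.0; channel 1: MSE 0.5 × weight 1.0 → 1.5
```

The heavier weight on channel 0 makes its mismatch dominate the loss, steering the student toward the channels that matter most for accuracy.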
It should be understood that the numbering of the steps in the above embodiments does not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of this application.
In an embodiment, a model compression apparatus is provided, corresponding one-to-one to the model compression method in the above embodiments. As shown in FIG. 8, the model compression apparatus includes a backbone network acquisition module 10, a model test module 20, a channel weight calculation module 30, a model training module 40, a model loss calculation module 50, and a model update module 60. The functional modules are described in detail as follows:
The backbone network acquisition module 10 is configured to acquire a pre-trained image recognition model and a second backbone network to be trained, where the image recognition model includes a first backbone network.
The model test module 20 is configured to input the images to be tested into the image recognition model for testing to obtain the model test result and the first feature map output by the first backbone network, and to input the training images into the second backbone network for feature extraction to obtain the second feature map output by the second backbone network.
The channel weight calculation module 30 is configured to calculate the channel weight vector corresponding to the first feature map according to the model test result, where the channel weight vector describes the importance of the feature channels in the first feature map.
The model training module 40 is configured to input the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain the first feature map output by the first backbone network and the second feature map output by the second backbone network.
The model loss calculation module 50 is configured to calculate the model loss based on the first feature map, the second feature map, and the channel weight vector.
The model update module 60 is configured to update and optimize the second backbone network according to the model loss to obtain a compressed image recognition model.
Specifically, the image recognition model includes a mask layer connected to the first backbone network and a recognition network connected to the mask layer.
Specifically, the model test module includes a feature extraction unit, a channel masking unit, an image recognition unit, a result statistics unit, and a test result acquisition unit.
The feature extraction unit is configured to use the first backbone network to extract features from each image to be tested and output the test feature map corresponding to each image to be tested, where the test feature map includes multiple feature channels.
The channel masking unit is configured to use the mask layer to perform channel masking on the same feature channel in every test feature map to obtain the third feature map corresponding to each image to be tested.
The image recognition unit is configured to use the recognition network to recognize each third feature map to obtain the recognition result corresponding to each third feature map.
The result statistics unit is configured to obtain the test result component corresponding to each feature channel according to the recognition results and the ground-truth results corresponding to the images to be tested.
The test result acquisition unit is configured to use the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
Specifically, the channel weight calculation module includes a first difference calculation unit, a second difference calculation unit, a channel weight component calculation unit, and a channel weight vector acquisition unit.
The first difference calculation unit is configured to use the difference between the maximum value of the test result components in the model test result and each test result component as the first difference.
The second difference calculation unit is configured to use the difference between the maximum value and the minimum value of the test result components in the model test result as the second difference.
The channel weight component calculation unit is configured to calculate the ratio of the first difference to the second difference, and to add the ratio to a predefined constant term to obtain the channel weight component corresponding to each test feature map.
The channel weight vector acquisition unit is configured to use the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
Specifically, the channel weight calculation module includes a first difference calculation unit, a second difference calculation unit, a scaling unit, a channel weight component calculation unit, and a channel weight vector acquisition unit.
The first difference calculation unit is configured to use the difference between the maximum value of the test result components in the model test result and each test result component as the first difference.
The second difference calculation unit is configured to use the difference between the maximum value and the minimum value of the test result components in the model test result as the second difference.
The scaling unit is configured to calculate the ratio of the first difference to the second difference, and to calculate the product of the ratio and a preset scaling factor.
The channel weight component calculation unit is configured to add the product to a predefined constant term to obtain the channel weight component corresponding to each test feature map.
The channel weight vector acquisition unit is configured to use the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
Specifically, the model update module includes a feature map loss calculation unit and a model loss calculation unit.
The feature map loss calculation unit is configured to compute on the first feature map and the second feature map using a predefined loss function to obtain the feature-map loss.
The model loss calculation unit is configured to weight the feature-map loss based on the channel weight vector to obtain the model loss.
For the specific limitations of the model compression apparatus, reference can be made to the limitations of the model compression method above, which are not repeated here. Each module in the above model compression apparatus can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in the processor of the computer device or be independent of it, or may be stored, in software form, in the memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a computer storage medium and an internal memory. The computer storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the computer storage medium. The database of the computer device is used to store the data generated or acquired during execution of the model compression method, such as the image recognition model. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a model compression method.
In an embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the model compression method in the above embodiments are implemented, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7. Alternatively, when the processor executes the computer program, the functions of each module/unit in the embodiment of the model compression apparatus are realized, such as the functions of the modules/units shown in FIG. 8; to avoid repetition, they are not described here again.
In an embodiment, a computer storage medium is provided, storing a computer program. When the computer program is executed by a processor, the steps of the model compression method in the above embodiments are implemented, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 7; to avoid repetition, they are not described here again. Alternatively, when the computer program is executed by the processor, the functions of each module/unit in the embodiment of the model compression apparatus are realized, such as the functions of the modules/units shown in FIG. 8; to avoid repetition, they are not described here again. The computer-readable storage medium may be non-volatile or volatile.
Those of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium, and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division of the above functional units and modules is used only as an example; in practical applications, the above functions can be allocated to different functional units and modules as needed, i.e., the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. A model compression method, comprising:
    acquiring an image recognition model pre-trained on training images and a second backbone network to be trained, wherein the image recognition model comprises a first backbone network;
    inputting multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
    calculating a channel weight vector according to the model test result, wherein the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
    inputting the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
    calculating a model loss based on the first feature map, the second feature map, and the channel weight vector; and
    updating and optimizing the second backbone network according to the model loss to obtain a compressed image recognition model.
  2. The model compression method according to claim 1, wherein the image recognition model comprises a mask layer connected to the first backbone network and a recognition network connected to the mask layer.
  3. The model compression method according to claim 2, wherein inputting the multiple images to be tested into the image recognition model for testing to obtain the model test result corresponding to the multiple images to be tested comprises:
    using the first backbone network to perform feature extraction on each of the images to be tested, and outputting a test feature map corresponding to each image to be tested, wherein the test feature map comprises multiple feature channels;
    using the mask layer to perform channel masking on the same feature channel in each of the test feature maps to obtain a third feature map corresponding to each image to be tested;
    using the recognition network to recognize each of the third feature maps to obtain a recognition result corresponding to each third feature map;
    obtaining a test result component corresponding to each of the feature channels according to the recognition results and the ground-truth results corresponding to the images to be tested; and
    using the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
  4. The model compression method according to claim 3, wherein calculating the channel weight vector according to the model test result comprises:
    using the difference between the maximum value of the test result components in the model test result and each of the test result components as a first difference;
    using the difference between the maximum value and the minimum value of the test result components in the model test result as a second difference;
    calculating the ratio of the first difference to the second difference, and adding the ratio to a predefined constant term to obtain a channel weight component corresponding to each of the test feature maps; and
    using the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  5. The model compression method according to claim 3, wherein calculating the channel weight vector according to the model test result comprises:
    using the difference between the maximum value of the test result components in the model test result and each of the test result components as a first difference;
    using the difference between the maximum value and the minimum value of the test result components in the model test result as a second difference;
    calculating the ratio of the first difference to the second difference, and calculating the product of the ratio and a preset scaling factor;
    adding the product to a predefined constant term to obtain a channel weight component corresponding to each of the test feature maps; and
    using the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  6. The model compression method according to claim 1, wherein calculating the model loss based on the first feature map, the second feature map, and the channel weight vector comprises:
    using a predefined loss function to compute on the first feature map and the second feature map to obtain a feature-map loss; and
    weighting the feature-map loss based on the channel weight vector to obtain the model loss.
  7. A model compression apparatus, comprising:
    a backbone network acquisition module, configured to acquire a pre-trained image recognition model and a second backbone network to be trained, wherein the image recognition model comprises a first backbone network;
    a model test module, configured to input multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
    a channel weight calculation module, configured to calculate a channel weight vector according to the model test result, wherein the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
    a model training module, configured to input the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
    a model loss calculation module, configured to calculate a model loss based on the first feature map, the second feature map, and the channel weight vector; and
    a model update module, configured to update and optimize the second backbone network according to the model loss to obtain a compressed image recognition model.
  8. The model compression apparatus according to claim 7, wherein the model test module comprises:
    a feature extraction unit, configured to use the first backbone network to perform feature extraction on each of the images to be tested and output a test feature map corresponding to each image to be tested, wherein the test feature map comprises multiple feature channels;
    a channel masking unit, configured to use a mask layer to perform channel masking on the same feature channel in each of the test feature maps to obtain a third feature map corresponding to each image to be tested;
    an image recognition unit, configured to use a recognition network to recognize each of the third feature maps to obtain a recognition result corresponding to each third feature map;
    a result statistics unit, configured to obtain a test result component corresponding to each of the feature channels according to the recognition results and the ground-truth results corresponding to the images to be tested; and
    a test result acquisition unit, configured to use the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    acquiring an image recognition model pre-trained on training images and a second backbone network to be trained, wherein the image recognition model comprises a first backbone network;
    inputting multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
    calculating a channel weight vector according to the model test result, wherein the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
    inputting the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
    calculating a model loss based on the first feature map, the second feature map, and the channel weight vector; and
    updating and optimizing the second backbone network according to the model loss to obtain a compressed image recognition model.
  10. The computer device according to claim 9, wherein the image recognition model comprises a mask layer connected to the first backbone network and a recognition network connected to the mask layer.
  11. The computer device according to claim 10, wherein inputting the multiple images to be tested into the image recognition model for testing to obtain the model test result corresponding to the multiple images to be tested comprises:
    using the first backbone network to perform feature extraction on each of the images to be tested, and outputting a test feature map corresponding to each image to be tested, wherein the test feature map comprises multiple feature channels;
    using the mask layer to perform channel masking on the same feature channel in each of the test feature maps to obtain a third feature map corresponding to each image to be tested;
    using the recognition network to recognize each of the third feature maps to obtain a recognition result corresponding to each third feature map;
    obtaining a test result component corresponding to each of the feature channels according to the recognition results and the ground-truth results corresponding to the images to be tested; and
    using the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
  12. The computer device according to claim 11, wherein calculating the channel weight vector according to the model test result comprises:
    using the difference between the maximum value of the test result components in the model test result and each of the test result components as a first difference;
    using the difference between the maximum value and the minimum value of the test result components in the model test result as a second difference;
    calculating the ratio of the first difference to the second difference, and adding the ratio to a predefined constant term to obtain a channel weight component corresponding to each of the test feature maps; and
    using the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  13. The computer device according to claim 11, wherein calculating the channel weight vector according to the model test result comprises:
    using the difference between the maximum value of the test result components in the model test result and each of the test result components as a first difference;
    using the difference between the maximum value and the minimum value of the test result components in the model test result as a second difference;
    calculating the ratio of the first difference to the second difference, and calculating the product of the ratio and a preset scaling factor;
    adding the product to a predefined constant term to obtain a channel weight component corresponding to each of the test feature maps; and
    using the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  14. The computer device according to claim 9, wherein calculating the model loss based on the first feature map, the second feature map, and the channel weight vector comprises:
    using a predefined loss function to compute on the first feature map and the second feature map to obtain a feature-map loss; and
    weighting the feature-map loss based on the channel weight vector to obtain the model loss.
  15. A computer storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring an image recognition model pre-trained on training images and a second backbone network to be trained, wherein the image recognition model comprises a first backbone network;
    inputting multiple images to be tested into the image recognition model for testing to obtain a model test result corresponding to the multiple images to be tested;
    calculating a channel weight vector according to the model test result, wherein the channel weight vector describes the importance of the feature channels of the feature map output by the first backbone network;
    inputting the training images into the first backbone network and the second backbone network respectively for feature extraction to obtain a first feature map output by the first backbone network and a second feature map output by the second backbone network;
    calculating a model loss based on the first feature map, the second feature map, and the channel weight vector; and
    updating and optimizing the second backbone network according to the model loss to obtain a compressed image recognition model.
  16. The computer storage medium according to claim 15, wherein the image recognition model comprises a mask layer connected to the first backbone network and a recognition network connected to the mask layer.
  17. The computer storage medium according to claim 16, wherein inputting the multiple images to be tested into the image recognition model for testing to obtain the model test result corresponding to the multiple images to be tested comprises:
    using the first backbone network to perform feature extraction on each of the images to be tested, and outputting a test feature map corresponding to each image to be tested, wherein the test feature map comprises multiple feature channels;
    using the mask layer to perform channel masking on the same feature channel in each of the test feature maps to obtain a third feature map corresponding to each image to be tested;
    using the recognition network to recognize each of the third feature maps to obtain a recognition result corresponding to each third feature map;
    obtaining a test result component corresponding to each of the feature channels according to the recognition results and the ground-truth results corresponding to the images to be tested; and
    using the data set containing the test result component corresponding to each feature channel as the model test result corresponding to the multiple images to be tested.
  18. The computer storage medium according to claim 17, wherein calculating the channel weight vector according to the model test result comprises:
    using the difference between the maximum value of the test result components in the model test result and each of the test result components as a first difference;
    using the difference between the maximum value and the minimum value of the test result components in the model test result as a second difference;
    calculating the ratio of the first difference to the second difference, and adding the ratio to a predefined constant term to obtain a channel weight component corresponding to each of the test feature maps; and
    using the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  19. The computer storage medium according to claim 17, wherein calculating the channel weight vector according to the model test result comprises:
    using the difference between the maximum value of the test result components in the model test result and each of the test result components as a first difference;
    using the difference between the maximum value and the minimum value of the test result components in the model test result as a second difference;
    calculating the ratio of the first difference to the second difference, and calculating the product of the ratio and a preset scaling factor;
    adding the product to a predefined constant term to obtain a channel weight component corresponding to each of the test feature maps; and
    using the data set containing the channel weight component corresponding to each test feature map as the channel weight vector.
  20. The computer storage medium according to claim 15, wherein calculating the model loss based on the first feature map, the second feature map, and the channel weight vector comprises:
    using a predefined loss function to compute on the first feature map and the second feature map to obtain a feature-map loss; and
    weighting the feature-map loss based on the channel weight vector to obtain the model loss.
PCT/CN2020/124813 2020-09-23 2020-10-29 模型压缩方法、装置、计算机设备及存储介质 WO2021159748A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011007728.9 2020-09-23
CN202011007728.9A CN112132278A (zh) 2020-09-23 2020-09-23 模型压缩方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021159748A1 true WO2021159748A1 (zh) 2021-08-19

Family

ID=73842781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124813 WO2021159748A1 (zh) 2020-09-23 2020-10-29 模型压缩方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN112132278A (zh)
WO (1) WO2021159748A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936295A (zh) * 2021-09-18 2022-01-14 中国科学院计算技术研究所 基于迁移学习的人物检测方法和系统
CN115757745A (zh) * 2022-12-01 2023-03-07 潍坊羞摆信息科技有限公司 基于人工智能的业务场景控制方法、系统及云平台
CN116468102A (zh) * 2023-05-04 2023-07-21 杭州鄂达精密机电科技有限公司 刀具图像分类模型剪枝方法、装置、计算机设备
CN117218580A (zh) * 2023-09-13 2023-12-12 杭州像素元科技有限公司 一种结合多模型的高速公路跨摄像多车辆跟踪方法及系统

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990296B (zh) * 2021-03-10 2022-10-11 中科人工智能创新技术研究院(青岛)有限公司 基于正交相似度蒸馏的图文匹配模型压缩与加速方法及系统
US20230196067A1 (en) * 2021-12-17 2023-06-22 Lemon Inc. Optimal knowledge distillation scheme

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217369A1 (en) * 2015-01-22 2016-07-28 Qualcomm Incorporated Model compression and fine-tuning
CN110880036A (zh) * 2019-11-20 2020-03-13 腾讯科技(深圳)有限公司 神经网络压缩方法、装置、计算机设备及存储介质
CN111461212A (zh) * 2020-03-31 2020-07-28 中国科学院计算技术研究所 一种用于点云目标检测模型的压缩方法
CN111488985A (zh) * 2020-04-08 2020-08-04 华南理工大学 深度神经网络模型压缩训练方法、装置、设备、介质
CN111695375A (zh) * 2019-03-13 2020-09-22 上海云从企业发展有限公司 基于模型蒸馏的人脸识别模型压缩算法、介质及终端

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217369A1 (en) * 2015-01-22 2016-07-28 Qualcomm Incorporated Model compression and fine-tuning
CN111695375A (zh) * 2019-03-13 2020-09-22 上海云从企业发展有限公司 基于模型蒸馏的人脸识别模型压缩算法、介质及终端
CN110880036A (zh) * 2019-11-20 2020-03-13 腾讯科技(深圳)有限公司 神经网络压缩方法、装置、计算机设备及存储介质
CN111461212A (zh) * 2020-03-31 2020-07-28 中国科学院计算技术研究所 一种用于点云目标检测模型的压缩方法
CN111488985A (zh) * 2020-04-08 2020-08-04 华南理工大学 深度神经网络模型压缩训练方法、装置、设备、介质

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936295A (zh) * 2021-09-18 2022-01-14 中国科学院计算技术研究所 基于迁移学习的人物检测方法和系统
CN115757745A (zh) * 2022-12-01 2023-03-07 潍坊羞摆信息科技有限公司 基于人工智能的业务场景控制方法、系统及云平台
CN115757745B (zh) * 2022-12-01 2023-09-15 甘肃省招标咨询集团有限责任公司 基于人工智能的业务场景控制方法、系统及云平台
CN116468102A (zh) * 2023-05-04 2023-07-21 杭州鄂达精密机电科技有限公司 刀具图像分类模型剪枝方法、装置、计算机设备
CN117218580A (zh) * 2023-09-13 2023-12-12 杭州像素元科技有限公司 一种结合多模型的高速公路跨摄像多车辆跟踪方法及系统

Also Published As

Publication number Publication date
CN112132278A (zh) 2020-12-25

Similar Documents

Publication Publication Date Title
WO2021159748A1 (zh) 模型压缩方法、装置、计算机设备及存储介质
US11348249B2 (en) Training method for image semantic segmentation model and server
WO2021114625A1 (zh) 用于多任务场景的网络结构构建方法和装置
WO2022042123A1 (zh) 图像识别模型生成方法、装置、计算机设备和存储介质
CN111191791B (zh) 基于机器学习模型的图片分类方法、装置及设备
WO2019228122A1 (zh) 模型的训练方法、存储介质及计算机设备
WO2021151336A1 (zh) 基于注意力机制的道路图像目标检测方法及相关设备
US11854248B2 (en) Image classification method, apparatus and training method, apparatus thereof, device and medium
WO2020228446A1 (zh) 模型训练方法、装置、终端及存储介质
CN109784153B (zh) 情绪识别方法、装置、计算机设备及存储介质
EP4163831A1 (en) Neural network distillation method and device
CN112613515B (zh) 语义分割方法、装置、计算机设备和存储介质
WO2020006881A1 (zh) 蝴蝶识别网络构建方法、装置、计算机设备及存储介质
WO2021022521A1 (zh) 数据处理的方法、训练神经网络模型的方法及设备
WO2021189922A1 (zh) 用户画像生成方法、装置、设备及介质
WO2021114620A1 (zh) 病历质控方法、装置、计算机设备和存储介质
CN112926654A (zh) 预标注模型训练、证件预标注方法、装置、设备及介质
CN113435594B (zh) 安防检测模型训练方法、装置、设备及存储介质
WO2020062299A1 (zh) 一种神经网络处理器、数据处理方法及相关设备
US20240185086A1 (en) Model distillation method and related device
CN111062324A (zh) 人脸检测方法、装置、计算机设备和存储介质
CN111898735A (zh) 蒸馏学习方法、装置、计算机设备和存储介质
CN111598213A (zh) 网络训练方法、数据识别方法、装置、设备和介质
CN112308825A (zh) 一种基于SqueezeNet的农作物叶片病害识别方法
CN111832581A (zh) 肺部特征识别方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918438

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918438

Country of ref document: EP

Kind code of ref document: A1