WO2022198606A1 - Method, system, apparatus, and storage medium for acquiring a deep learning model - Google Patents

Method, system, apparatus, and storage medium for acquiring a deep learning model

Info

Publication number
WO2022198606A1
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
learning model
parameter
neural network
network layer
Application number
PCT/CN2021/083129
Other languages
English (en)
French (fr)
Inventor
张雪 (Zhang Xue)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2021/083129
Publication of WO2022198606A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • The present application relates to the technical field of deep learning, and in particular to a method, system, apparatus, and computer-readable storage medium for acquiring a deep learning model.
  • Deep learning models can be deployed on various platforms such as server clusters, servers, and mobile terminals, and applied in many different scenarios such as face recognition, face beautification, and semantic segmentation.
  • Different platforms have different computing performance, and different application scenarios often have different computing requirements. To enable a deep learning model to meet the running speed and running accuracy requirements of different platforms or different application scenarios, related technologies mainly focus on training multiple separate models.
  • However, training multiple separate deep learning models consumes a large amount of repeated computation, resulting in a huge waste of computing resources.
  • Training multiple separate deep learning models is also difficult to implement.
  • In view of this, the present application provides the following methods.
  • A first aspect provides a method for acquiring a deep learning model, comprising: acquiring a first deep learning model and an expected parameter characterizing the performance of the deep learning model, where the performance includes at least one of the following: the size, running speed, and running accuracy of the deep learning model; pruning the first deep learning model according to the expected parameter to obtain a second deep learning model; and fixing the parameters of the first deep learning model and jointly training the first deep learning model and the second deep learning model to obtain a first target deep learning model that satisfies the expected parameter.
  • A second aspect provides another method for acquiring a deep learning model, comprising: acquiring a first deep learning model and an expected cropping amount of the deep learning model; determining, according to the expected cropping amount, the trimming amount of the first parameter of each neural network layer of the first deep learning model, where the first parameter includes at least one of neurons, vectors, convolution kernels, or filters; removing, each time, a different specified number of first parameters from a first neural network layer that needs to be pruned in the first deep learning model, and obtaining a first feature map output by a second neural network layer after the first neural network layer, where the second neural network layer is a neural network layer whose output feature map size does not change before and after the first neural network layer removes the specified number of first parameters; obtaining the errors between a plurality of the first feature maps and the second feature maps corresponding to the first feature maps, where a second feature map is the feature map output by the second neural network layer before the first neural network layer removes the specified number of first parameters; determining, based on the expected cropping amount and the multiple errors, the first parameters that need to be cropped; and trimming the first parameters that need to be cropped to obtain a second target deep learning model.
  • A third aspect provides a deep learning model acquisition system that includes a first platform and a second platform. The first platform is used to obtain a target deep learning model by the method described in the second aspect, and the second platform is used to deploy the target deep learning model. A platform includes at least one of the following: a server cluster, a server, and a mobile terminal.
  • A fourth aspect provides an apparatus for acquiring a deep learning model, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the methods described in the first aspect and the second aspect of the embodiments of the present application.
  • A fifth aspect provides a computer-readable storage medium on which several computer instructions are stored; when the computer instructions are executed, the methods described in the first aspect and the second aspect of the embodiments of the present application are implemented.
  • In the above technical solutions, the pre-acquired first deep learning model is pruned based on the expected parameter characterizing the performance of the deep learning model; after the second deep learning model that satisfies the expected parameter is obtained, the parameters of the first deep learning model are fixed and the two models are jointly trained to restore the accuracy of the second deep learning model. The finally obtained first target deep learning model is therefore not only a lightweight deep learning model compared with the first deep learning model, but also has higher running accuracy.
  • In other words, retraining is not adopted; instead, the accuracy of the pruned deep learning model is restored based on the first deep learning model, and a deep learning model that meets the needs of different platforms or different application scenarios is obtained. This overcomes the waste of computing resources caused in the related art by separately and repeatedly training deep learning models for different platforms or different application scenarios, as well as the defect that a suitable deep learning model cannot be obtained when computing resources are tight.
  • FIG. 1 is a schematic flowchart of a first method for acquiring a deep learning model according to an exemplary embodiment of the present specification.
  • FIG. 2 is a schematic flowchart of a first pruning method according to an exemplary embodiment of the present specification.
  • FIG. 3 is a schematic diagram comparing four pruning methods of different granularities according to an exemplary embodiment of the present specification.
  • FIG. 4 is a schematic diagram showing the principle of clipping the connections and neurons of a deep learning model according to an exemplary embodiment of the present specification.
  • FIG. 5 is a schematic diagram showing the principle of the first pruning method according to an exemplary embodiment of the present specification.
  • FIG. 6 is a schematic diagram of a distillation network according to an exemplary embodiment of the present specification.
  • FIG. 7 is a schematic flowchart of a second pruning method according to an exemplary embodiment of the present specification.
  • FIG. 8 is a schematic diagram showing the principle of a third pruning method according to an exemplary embodiment of the present specification.
  • FIG. 9 is a schematic flowchart of a second method for acquiring a deep learning model according to an exemplary embodiment of the present specification.
  • FIG. 10 is a schematic flowchart of performing accuracy restoration on a second target deep learning model according to an exemplary embodiment of the present specification.
  • FIG. 11 is a schematic structural diagram of a deep learning model acquisition system according to an exemplary embodiment of the present specification.
  • FIG. 12 is a schematic structural diagram of an apparatus for acquiring a deep learning model according to an exemplary embodiment of the present specification.
  • Although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, without departing from the scope of the present application, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
  • The word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
  • The same deep learning model may need to be deployed on different platforms and applied to different scenarios.
  • Take a deep learning model used for feature recognition of images collected by drones as an example. In some cases, the deep learning model needs to be deployed on a mobile terminal or a dedicated device; for the drone manufacturer, the deep learning model may need to be deployed on the manufacturer's server or server cluster; and for a third-party regulatory agency that monitors the flight status of the UAV, the deep learning model may need to be deployed on that agency's server or server cluster.
  • The same deep learning model, however, has the problem that it cannot be directly deployed on different platforms.
  • For example, a deep learning model trained for a server cluster is a heavyweight deep learning model; that is, the trained deep learning model has a larger size, higher computing accuracy, and a faster running speed.
  • Because the computing performance of a mobile terminal is far lower than that of a server cluster, a deep learning model trained for the server cluster cannot be directly deployed on the mobile terminal. If it is forcibly deployed there, the mobile terminal may run very slowly, and its computing resources may even collapse.
  • Even when deployed on the same platform, the same deep learning model may face different performance requirements depending on the application scenario. For example, for the same deep learning model used for target recognition, in a real-time application scenario the model needs to run fast, while in an application scenario with low real-time requirements the running speed may not be demanding.
  • In the related art, a method of independently training multiple separate deep learning models is usually adopted: for different platforms or application scenarios, the same deep learning framework is used, training is performed based on the computing performance of each platform or the target requirements of each application scenario, and finally multiple deep learning models suitable for different platforms or different application scenarios are obtained.
  • In view of this, the present application provides a method for acquiring a deep learning model. As shown in FIG. 1, the method includes:
  • Step 101: Obtain a first deep learning model and an expected parameter characterizing the performance of the deep learning model, where the performance includes at least one of the following: the size, running speed, and running accuracy of the deep learning model.
  • Step 102: Prune the first deep learning model according to the expected parameter to obtain a second deep learning model.
  • Step 103: Fix the parameters of the first deep learning model, and jointly train the first deep learning model and the second deep learning model to obtain a first target deep learning model that satisfies the expected parameter.
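  • As an illustration only, the following is a minimal PyTorch-style sketch of steps 101 to 103. The pruning and joint-training procedures are passed in as caller-supplied functions (prune_fn and distill_fn are placeholder names, not part of this disclosure), since their details are given in the embodiments below.

```python
import torch

def acquire_first_target_model(first_model: torch.nn.Module,
                               prune_fn, distill_fn,
                               expected_params, train_loader):
    # Step 102: prune the first model according to the expected parameters
    # to obtain the second deep learning model.
    second_model = prune_fn(first_model, expected_params)
    # Step 103: fix the parameters of the first (teacher) model, then
    # jointly train both models so the second model recovers its accuracy.
    for p in first_model.parameters():
        p.requires_grad = False
    return distill_fn(first_model, second_model, train_loader)
```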
  • The first deep learning model may be trained, based on an existing or self-developed deep learning framework, on a platform with sufficient (especially strong) computing power; it may also be a model trained by a third party and obtained directly; or it may be acquired in other manners, which is not limited in this embodiment of the present application.
  • The expected parameter characterizing the performance of the deep learning model may be obtained in a preset way, obtained from the demand platform on which the first target deep learning model needs to be deployed, or obtained from a third party, which is not limited in the embodiments of the present application.
  • In some embodiments, the expected parameter characterizing the performance of the deep learning model may be determined based on the computing performance of the demand platform on which the first target deep learning model needs to be deployed.
  • For example, suppose the first deep learning model obtained in step 101 is used for image feature recognition, is trained on a server, and has a size of 10M, while the first target deep learning model described in step 103 needs to be deployed on a mobile terminal that has only a 5M cache. Then, based on the performance of the mobile terminal as the demand platform, it can be determined that the size of the first target deep learning model should be less than 5M, and accordingly the expected parameter characterizing the performance of the deep learning model is that the size of the deep learning model is no larger than 5M.
  • The computing performance of the demand platform can be characterized not only by the size of its cache but also by parameters such as its computing speed and computing accuracy. Accordingly, the expected parameter characterizing the performance of the deep learning model may also be the running speed, running accuracy, etc. of the deep learning model, determined from the computing speed, computing accuracy, etc. of the demand platform.
  • In some embodiments, the expected parameter characterizing the performance of the deep learning model may be determined based on the application scenario of the first target deep learning model.
  • For example, suppose the first deep learning model obtained in step 101 is used for image feature recognition, is trained on a server, and can recognize details with a resolution of 10 microns.
  • In medical application scenarios, this ultra-high resolution is very meaningful and can help doctors find the locations of tiny lesions.
  • If the deep learning model described in step 103 is still deployed on the same server but is applied to target object recognition for a UAV, then the target objects that are meaningful to the UAV usually will not be that small, and identifying micrometer-scale objects, such as floating dust, is not meaningful in itself.
  • Based on the application scenario, the required level of running accuracy of the deep learning model can therefore be determined.
  • The application scenario of the deep learning model can also be characterized by computing speed, model size, etc. Accordingly, the expected parameter characterizing the performance of the deep learning model may also be the running speed, size, etc. of the deep learning model, determined from the computing speed, model size, etc. required by the application scenario.
  • Deep learning models have a large number of redundant parameters from the convolutional layers to the fully connected layers, with a large number of neuron activation values, vectors, convolution kernels, filters, etc. approaching 0.
  • Even when these parameters are removed, the deep learning model can show the same or similar expressive ability as the original model; this situation is called over-parameterization of the deep learning model.
  • Removing the neurons, vectors, convolution kernels, and filters that have little influence on the expressive ability of the deep learning model is the pruning process.
  • Pruning the first deep learning model may be implemented in various manners, and the present application does not limit the specific manner used for the pruning.
  • The pruning may be implemented with reference to related technologies, or may use other pruning methods improved by those skilled in the art.
  • In the above process, retraining is not used; instead, based on the first deep learning model, the accuracy of the pruned deep learning model is restored, and a deep learning model that meets the needs of different platforms or different application scenarios is obtained. This overcomes the waste of computing resources caused in the related art by separately and repeatedly training deep learning models for different platforms or different application scenarios, as well as the defect that a suitable deep learning model cannot be obtained when computing resources are tight.
  • In some embodiments, pruning the first deep learning model according to the expected parameter in step 102 includes:
  • Step 201: Determine, according to the expected parameter, a trimming amount of the first parameter of each neural network layer of the first deep learning model, where the first parameter includes at least a neuron, a vector, a convolution kernel, or a filter.
  • Step 202: Each time, remove a different specified number of first parameters from the first neural network layer that needs to be pruned in the first deep learning model, and obtain the first feature map output by the second neural network layer after the first neural network layer, where the second neural network layer is a neural network layer whose output feature map size does not change before and after the first neural network layer removes the specified number of first parameters.
  • Step 203: Obtain the errors between a plurality of the first feature maps and the second feature maps corresponding to the first feature maps, where a second feature map is the feature map output by the second neural network layer before the first neural network layer removes the specified number of first parameters.
  • Step 204: Determine the first parameters to be trimmed based on the trimming amount and the multiple errors.
  • As shown in FIG. 3, Figure 3(A) is fine-grained pruning, which prunes neurons or the weight connections between neurons and is the smallest-granularity pruning; Figure 3(B) is vector-level pruning; Figure 3(C) is convolution kernel pruning; and Figure 3(D) is filter pruning.
  • Accordingly, the cropped first parameter may be a neuron, a vector, a convolution kernel, or a filter.
  • Figure 4 shows a schematic diagram of the principle of pruning the connections and neurons of a deep learning model.
  • In Fig. 4(A), the neurons r1, r2, and r3 are not 0, and the connections between the neural network layer and the neurons r1, r2, and r3 are not 0 either; in Fig. 4(B), the connections between the neural network layer and the neuron r2 are set to 0, so that the weight connection matrix becomes sparse; this is weight connection pruning.
  • Vector pruning, convolution kernel pruning, and filter pruning are similar to weight connection pruning: they remove certain vectors, convolution kernels, and filters in the convolutional layers, thereby "slimming down" the deep learning model and reducing its size. When the removed weight connections, neurons, vectors, convolution kernels, and filters have little impact on the performance of the entire deep learning model, removing these parameters can reduce the size of the deep learning model and increase its running speed while maintaining its computational performance.
  • Next, taking the filter as an example of the first parameter, steps 201 to 204 of pruning the first deep learning model according to the expected parameter in step 102 will be described. For other types of first parameters, such as neurons, vectors, and convolution kernels, the pruning process is similar to that of the filter and is not described in detail in this embodiment of the present application.
  • When the expected parameter is obtained in step 101, the trimming amount of the first parameter of each neural network layer of the first deep learning model, needed to obtain the first target deep learning model, can be determined in various ways.
  • For example, the trimming amount may be a fixed amount preset by the developer based on the expected parameter according to experience: the developer can preset removing 3 filters from the first convolutional layer of the first deep learning model, removing 2 filters from the second convolutional layer, and so on. Of course, determining the trimming amount in this way is inefficient and has poor reliability.
  • In some embodiments, determining the trimming amount of the first parameter of each neural network layer of the first deep learning model in step 201 may include:
  • Step 2011: Determine a ratio at which the first parameter of each neural network layer of the first deep learning model is to be trimmed.
  • Step 2012: Determine, according to the ratio and the first deep learning model, the trimming amount of the first parameter of each neural network layer of the first deep learning model.
  • For example, suppose the first deep learning model is a model for beautifying human faces, is trained on a server with high computing performance, and has a size of 10M. If the first deep learning model is to be deployed on a user's mobile phone, and the phone can provide only a 5M cache when running the model, then the first deep learning model needs to be pruned to "slim it down". From the proportional relationship between 10M and 5M, it can be seen that the clipping ratio of the first parameter of each neural network layer needs to be at least 50%. Based on this ratio, combined with the number of first parameters in each neural network layer of the first deep learning model, the trimming amount of the first parameters of each neural network layer can be determined.
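  • As an illustration of step 2012, under the simplifying assumption that model size scales with the number of filters, the following sketch turns the 10M-to-5M budget above into a per-layer number of filters to remove; the layer widths are made up for the example.

```python
def trim_counts(filters_per_layer, model_size_mb, budget_mb):
    # Global clipping ratio from the size budget: 1 - 5/10 = 50% above.
    ratio = max(0.0, 1.0 - budget_mb / model_size_mb)
    # Step 2012: per-layer trimming amount from the ratio and layer widths.
    return [int(round(n * ratio)) for n in filters_per_layer]

print(trim_counts([64, 128, 256], model_size_mb=10.0, budget_mb=5.0))
# -> [32, 64, 128]
```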
  • Pruning the neural network layers at the front of the model (hereinafter "shallow layers") can greatly improve the running speed of the deep learning model, but reduces the running accuracy of the pruned model; pruning the neural network layers at the end of the model (hereinafter "deep layers") can greatly reduce the number of parameters, and thus the size, of the pruned deep learning model, and the reduction in running accuracy is smaller than when pruning shallow layers.
  • Therefore, when determining the clipping ratio for pruning the first parameter of each neural network layer in step 2011, a preset allocation strategy may "uniformly" allocate the same clipping ratio to the first parameter of each neural network layer, in which case there is no need to weigh the size of the pruned deep learning model against its running speed; alternatively, different clipping ratios may be "non-uniformly" allocated to the neural network layers, thereby trading off reducing the size of the pruned deep learning model against improving its running speed, to suit different application scenarios and deployment platforms.
  • In some embodiments, determining the clipping ratio for pruning the first parameters of each neural network layer of the first deep learning model in step 2011 includes:
  • Step 2011a: Determine the total proportion of the first parameters of the first deep learning model to be trimmed.
  • Step 2011b: Based on a preset allocation strategy, allocate different clipping ratios to the multiple neural network layers of the first deep learning model, where the different clipping ratios make the total clipping ratio of the deep learning model obtained after clipping the first deep learning model fall within a preset error range of the total proportion.
  • The preset allocation strategy can be set based on the characteristics of the application scenario and the deployment platform, to express whether priority is given to improving the running speed of the pruned deep learning model or to reducing its size.
  • For example, suppose the first deep learning model is a model trained on a server for target object recognition, and the first target deep learning model obtained from it is to run on a user's mobile phone. In this case, priority should be given to reducing the size of the pruned deep learning model.
  • The preset allocation strategy may then be: the clipping ratio allocated to the first N neural network layers of the first deep learning model is a, and the clipping ratio allocated from the (N+1)-th neural network layer to the last neural network layer is b, where a < b, and the total clipping ratio of the pruned deep learning model based on these clipping ratios falls within the preset error range of the total proportion. In this way, more of the first parameters of the deep layers of the first deep learning model are pruned, ensuring that the pruned first deep learning model has a smaller size and is suitable for the mobile phone platform.
  • The above preset allocation strategy is only an exemplary illustration, and the preset allocation strategy may also be other content.
  • For example, an intermediate neural network layer of the first deep learning model may be determined first; the neural network layers before it are allocated clipping ratios that decrease by a fixed value, and the neural network layers after it are allocated clipping ratios that increase by a fixed value. Based on the total clipping proportion of the first deep learning model, the clipping ratio of the first parameter of each neural network layer can then be determined, so that the first parameters of the deep layers of the first deep learning model are pruned more, ensuring that the pruned first deep learning model has a smaller size and is suitable for mobile phone platforms.
  • The embodiment of the present application does not limit the specific content of the preset allocation strategy or the specific manner of allocating different clipping ratios to the multiple neural network layers of the first deep learning model.
  • In other embodiments, determining the clipping ratio for pruning the first parameter of each neural network layer of the first deep learning model in step 2011 includes:
  • Step 2011c: Determine the total proportion of the first parameters of the first deep learning model to be trimmed.
  • Step 2011d: Allocate the same clipping ratio to each of the multiple neural network layers of the first deep learning model, where the same clipping ratio makes the total clipping ratio of the deep learning model obtained after clipping the first deep learning model fall within a preset error range of the total proportion.
  • Allocating the same clipping ratio to the multiple neural network layers of the first deep learning model is simple and convenient, does not require much additional calculation, and allows the clipping ratio allocated to the multiple neural network layers to be determined quickly. A minimal sketch of both allocation strategies follows.
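  • The ratios and layer counts below are illustrative assumptions, not values prescribed by this disclosure:

```python
def uniform_ratios(num_layers: int, total_ratio: float):
    # Steps 2011c/2011d: every layer receives the same clipping ratio.
    return [total_ratio] * num_layers

def front_light_ratios(num_layers: int, n_shallow: int, a: float, b: float):
    # Steps 2011a/2011b: the first n_shallow (shallow) layers get the smaller
    # ratio a and the remaining (deep) layers the larger ratio b (a < b),
    # favouring a smaller pruned model as in the mobile-phone example.
    assert a < b
    return [a] * n_shallow + [b] * (num_layers - n_shallow)

print(uniform_ratios(6, 0.5))              # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(front_light_ratios(6, 2, 0.3, 0.6))  # [0.3, 0.3, 0.6, 0.6, 0.6, 0.6]
```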
  • After the trimming amount of the first parameter of each neural network layer of the first deep learning model is determined through step 201, the errors between a plurality of the first feature maps and the second feature maps corresponding to the first feature maps can be obtained based on steps 202 and 203, as described below.
  • Taking the filter as the first parameter, steps 202 and 203 of the pruning process are described in conjunction with FIG. 5.
  • Figure 5(A) is the feature map of the i-th layer (i is a positive integer not less than 1) of the first deep learning model before pruning, with dimension C*H*W, where C is the number of channels, H is the height, and W is the width.
  • Figure 5(B) is the i-th layer filter bank with dimension Oi*C*h*w, where Oi is the number of filters in the filter bank, C equals the channel number C of the i-th layer feature map, h is the filter height, and w is the filter width. The pruning operation for the i-th neural network layer removes a specified number of the Oi filters, shown in Figure 5(B) as dashed cuboids.
  • After one filter is removed, the dimension of the i-th layer filter bank becomes (Oi-1)*C*h*w, and the dimension of the corresponding feature map of the (i+1)-th layer becomes (Oi-1)*H*W, as shown in Figure 5(C). That is, cutting out one filter means the feature map output by the next layer of the deep learning model has one less channel; the dashed box in Figure 5(C) represents the pruned channel.
  • Correspondingly, the number of channels of the filter bank of the (i+1)-th layer becomes Oi-1, the same as the channel number of the (i+1)-th layer feature map.
  • Although the filter bank of the (i+1)-th layer is pruned by one channel, this has no effect on the output dimension of the feature map of the (i+2)-th layer; Figure 5(E) is the feature map of the (i+2)-th layer after pruning. A minimal PyTorch sketch of this dimension bookkeeping follows.
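  • The layer sizes below (8 filters in layer i, 16 in layer i+1) are assumed for illustration: removing one filter of layer i also removes the matching input channel of layer i+1, and the feature map after layer i+1 keeps its size.

```python
import torch
import torch.nn as nn

conv_i = nn.Conv2d(3, 8, kernel_size=3, padding=1)    # layer i weight: (8, 3, 3, 3)
conv_i1 = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # layer i+1 weight: (16, 8, 3, 3)

keep = [k for k in range(8) if k != 5]  # remove filter 5 of layer i
conv_i.weight = nn.Parameter(conv_i.weight[keep].detach().clone())       # (7, 3, 3, 3)
conv_i.bias = nn.Parameter(conv_i.bias[keep].detach().clone())
conv_i.out_channels = 7
conv_i1.weight = nn.Parameter(conv_i1.weight[:, keep].detach().clone())  # (16, 7, 3, 3)
conv_i1.in_channels = 7

x = torch.randn(1, 3, 32, 32)
fm_next = conv_i(x)          # (i+1)-th layer feature map: (1, 7, 32, 32), one channel fewer
fm_after = conv_i1(fm_next)  # unchanged size after layer i+1: (1, 16, 32, 32)
print(fm_next.shape, fm_after.shape)
```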
  • By comparing the feature maps of the (i+2)-th layer before and after pruning, the error of this layer's feature map before and after pruning can be calculated. Following this process, for each neural network layer in the first deep learning model, different filters are removed each time, and the errors between the corresponding feature maps before and after pruning can be obtained for multiple filters.
  • The above description removes one filter at a time from a neural network layer of the first deep learning model. The removed first parameters may also be neurons, vectors, or convolution kernels; when these first parameters are removed, the error can likewise be obtained from a feature map whose dimension does not change before and after pruning and that is located after the pruned neural network layer.
  • In one pruning pass, only one first parameter may be removed, or multiple first parameters may be removed simultaneously; for example, three filters of one filter bank may be removed at the same time, which is not limited in this embodiment of the present application.
  • After the errors are obtained, the first parameters to be trimmed can be determined in combination with the trimming amount determined in step 201.
  • In some embodiments, determining the first parameters to be trimmed in step 204 includes:
  • Step 2041: Sort the multiple errors.
  • Step 2042: Based on the sorting result, select as the first parameters to be trimmed the trimming amount of first parameters whose removal produces the smallest errors.
  • In this embodiment, the errors between the plurality of first feature maps and the corresponding second feature maps, obtained by removing a plurality of the first parameters, can be sorted. If an error is small, removing the corresponding first parameter has little impact on the subsequent neural network layers; if the error is large, removing it has a large impact. Therefore, by sorting the multiple errors, the impact of each removed first parameter on the first deep learning model is measured by the sorting result, and the first parameters to be removed are determined on that basis; this is scientifically effective, computationally cheap, and easy to implement.
  • The first parameters to be retained or removed may also be determined by other error-based methods. For example, each of the multiple errors can be compared with the remaining errors; when an error is determined to be the smallest, the first parameter corresponding to it can be removed, and so on, until the number of removable first parameters equals the trimming amount determined in step 201.
  • In some embodiments, the error between the first feature map and the second feature map may be determined based on the distance between them; the distance may be the Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, Mahalanobis distance, etc., which is not limited in the embodiments of the present application.
  • Using distance to measure the error between the first feature map and the second feature map is convenient to calculate and easy to determine.
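  • A minimal sketch of steps 202 to 204 under simplifying assumptions (a single pair of convolutional layers, one calibration batch, and the Euclidean distance as the error): zeroing a candidate filter's output channel yields exactly the feature map the following layer would output if that filter and its matching input channel were removed, and the filters whose removal gives the smallest errors are selected for trimming.

```python
import torch
import torch.nn as nn

def select_filters_to_prune(conv_i: nn.Conv2d, conv_next: nn.Conv2d,
                            x: torch.Tensor, trim: int):
    with torch.no_grad():
        base = conv_i(x)
        second_fm = conv_next(base)          # second feature map (before pruning)
        errors = []
        for k in range(conv_i.out_channels):
            fm = base.clone()
            fm[:, k] = 0                     # equivalent to removing filter k
            first_fm = conv_next(fm)         # first feature map (after pruning)
            errors.append((torch.dist(first_fm, second_fm).item(), k))
    errors.sort()                            # step 2041: smallest error first
    return [k for _, k in errors[:trim]]     # step 2042: filters to trim

# e.g. pick 2 of 8 filters to cut:
# select_filters_to_prune(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 16, 3),
#                         torch.randn(1, 3, 32, 32), trim=2)
```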
  • Through the above process, the first parameters that need to be removed when the first deep learning model is pruned can be determined, and the second deep learning model can then be obtained.
  • Next, based on step 103, the parameters of the first deep learning model can be fixed, and the first deep learning model and the second deep learning model can be jointly trained to obtain the first target deep learning model that satisfies the expected parameter.
  • The joint training of the first deep learning model and the second deep learning model described in step 103 may be implemented in various specific manners, which is not limited in this embodiment of the present application.
  • For example, step 103 can be implemented based on the distillation technique in knowledge transfer: the first deep learning model is used as the teacher model and the second deep learning model as the student model; the parameters of the first deep learning model are fixed; a loss function relating the two models is established; the same training data is input to both models; and the parameters of the second deep learning model are adjusted based on the loss function, so that the accuracy of the second deep learning model is restored under the guidance of the first deep learning model.
  • FIG. 6 gives a specific distillation network: the training data is input into the first deep learning model, which participates only in the forward pass; its output is softened by a temperature (/T) and passed through the first Softmax to obtain the soft target. The same training data is input into the second deep learning model; its output, after the same temperature (/T) as the first model and the second Softmax, undergoes a KL-divergence calculation with the soft target to obtain the distillation loss. The second deep learning model's output also passes through the third Softmax and undergoes a cross-entropy calculation with the hard target to obtain the student loss. The distillation loss and the student loss are combined to construct a joint loss for training the second deep learning model; in this way, the accuracy of the second deep learning model can be restored under the guidance of the first deep learning model.
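  • A minimal sketch of the joint loss in the distillation network above, using the standard knowledge-distillation formulation; the temperature T, the weight alpha, and the T-squared scaling of the distillation loss are conventional choices assumed here, not values prescribed by this disclosure.

```python
import torch
import torch.nn.functional as F

def joint_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
               hard_target: torch.Tensor, T: float = 4.0, alpha: float = 0.7):
    soft_target = F.softmax(teacher_logits / T, dim=1)        # first Softmax
    log_student = F.log_softmax(student_logits / T, dim=1)    # second Softmax
    distill_loss = F.kl_div(log_student, soft_target,
                            reduction="batchmean") * (T * T)  # distillation loss
    student_loss = F.cross_entropy(student_logits, hard_target)  # third Softmax + CE
    return alpha * distill_loss + (1 - alpha) * student_loss     # joint loss
```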
  • In the above embodiments, the first deep learning model obtained in advance is pruned, and after the second deep learning model that meets the expected parameters is obtained, the parameters of the first deep learning model are fixed; the joint training of the first deep learning model and the second deep learning model can restore the accuracy of the second deep learning model, so that the finally obtained first target deep learning model, compared with the first deep learning model, is not only a lightweight deep learning model but also has higher running accuracy.
  • Retraining is not used; instead, based on the first deep learning model, the precision of the pruned deep learning model is restored, and a deep learning model that meets the requirements of different platforms or different application scenarios is obtained. This overcomes the waste of computing resources caused in the related art by separately and repeatedly training deep learning models for different platforms or different application scenarios, as well as the defect that a suitable deep learning model cannot be obtained when computing resources are tight.
  • Pruning the first deep learning model may also use other pruning methods in addition to those described in the foregoing embodiments.
  • In some embodiments, pruning the first deep learning model according to the expected parameter in step 102, as shown in FIG. 7, includes:
  • Step 701: Each time, for multiple neural network layers in the first deep learning model, remove a different specified number of first parameters to obtain a plurality of third deep learning models, where the first parameters include at least neurons, vectors, convolution kernels, or filters.
  • Step 702: Obtain evaluation parameters characterizing the performance of each third deep learning model, where the performance includes at least the size, running speed, and running accuracy of the third deep learning model.
  • Step 703: Determine, based on the evaluation parameters and the expected parameter, the first parameters that need to be trimmed in the first deep learning model.
  • Removing a different specified number of first parameters from multiple neural network layers of the first deep learning model in step 701 can be performed in a preset order: for example, starting from the first neural network layer of the first deep learning model, one first parameter is removed at a time to obtain the corresponding third deep learning model; the evaluation parameters of each third deep learning model's performance are then obtained, for example its size, together with the running time, running accuracy, etc. obtained by inputting the same unlabeled data to each third deep learning model.
  • In some embodiments, determining, based on the evaluation parameters and the expected parameter, the first parameters that need to be trimmed in the first deep learning model in step 703 includes:
  • Step 7031: Obtain the distances between the evaluation parameters and the expected parameter.
  • Step 7032: Sort the distances.
  • Step 7033: Based on the distance sorting result, determine the first parameters that need to be trimmed.
  • Based on step 702, the performance evaluation parameters of the plurality of third deep learning models, obtained after removing a different specified number of first parameters from the neural network layers of the first deep learning model, can be acquired; based on step 7031, the distances between those evaluation parameters and the expected parameters can be obtained. These distances quantify the degree to which each first parameter removed in a single pass of step 701 affects the performance of the deep learning model.
  • For example, suppose the first deep learning model before pruning has been trained on a server, its size is 10M, and its running time is 10ms when used for face beautification (here "running time" is used to measure "running speed").
  • Suppose the first target deep learning model to be obtained needs to be deployed on the user's mobile terminal, with an expected model size of 5M and an expected running time of 18ms.
  • After the first filter is removed from the first neural network layer of the first deep learning model, the third deep learning model obtained has a size of 9.8M and a running time of 30ms when used for face beautification; after the second filter is removed from the first neural network layer, the third deep learning model obtained has a size of 9.9M and a running time of 34.7ms for face beautification; and so on. For yet another removed filter, the third deep learning model obtained has a size of 6.6M and a running time of 40.1ms.
  • The obtained performance evaluation parameters of the multiple third deep learning models (i.e., their size, running speed, running accuracy, etc.) can be compared with the expected parameters (i.e., the expected model size, running time, running accuracy, etc.) to obtain a plurality of distances between the evaluation parameters and the expected parameters, as shown in Table 1.
  • Table 1: Distances between the evaluation parameters of the pruned third deep learning models and the expected parameters
  • The distances between the plurality of evaluation parameters and the expected parameters are sorted, and based on the distance sorting result, the first parameters to be trimmed can be determined.
  • The model size distance and the running time distance can also be weighted to obtain a composite distance, comprehensively considering the influence of both the size of the deep learning model and its running time when determining the first parameters that need to be trimmed, as sketched below.
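  • A minimal sketch of such a weighted composite distance, using the example measurements above; the weights are assumptions expressing how much model size matters relative to running time.

```python
def composite_distance(eval_params: dict, expected: dict,
                       w_size: float = 0.5, w_time: float = 0.5) -> float:
    # Step 7031: per-metric distances to the expected parameters,
    # combined into one weighted distance.
    d_size = abs(eval_params["size_mb"] - expected["size_mb"])
    d_time = abs(eval_params["time_ms"] - expected["time_ms"])
    return w_size * d_size + w_time * d_time

# The two candidates measured in the example above:
candidates = [{"filter": 1, "size_mb": 9.8, "time_ms": 30.0},
              {"filter": 2, "size_mb": 9.9, "time_ms": 34.7}]
expected = {"size_mb": 5.0, "time_ms": 18.0}
ranked = sorted(candidates, key=lambda c: composite_distance(c, expected))
print([c["filter"] for c in ranked])  # steps 7032/7033: closest candidate first
```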
  • The evaluation parameter may also be the running accuracy, running speed, etc. of the deep learning model, which is not limited in this embodiment of the present application.
  • By removing a number of different specified first parameters from the first deep learning model to obtain a plurality of third deep learning models, and combining the evaluation parameters of their performance with the expected parameters, the first parameters that need to be pruned can be determined in a purpose-oriented, simple, and effective way, yielding deep learning models that meet the needs of different platforms or different application scenarios. This also overcomes the waste of computing resources caused in the related art by separately and repeatedly training deep learning models for different platforms or different application scenarios, and the defect that a suitable deep learning model cannot be obtained when computing resources are tight.
  • Through the above process, the first deep learning model is pruned to obtain the second deep learning model. Steps 201 to 204 and steps 701 to 703 give different pruning methods.
  • In the above embodiments, the specified number may be any integer; that is, for a given neural network layer of the first deep learning model, no first parameter may be trimmed, or multiple first parameters may be removed at once. In other words, each time the first neural network layer that needs to be pruned is processed, a group of first parameters (a group including more than one first parameter) may be removed, and each group of first parameters is pruned in turn. The number of first parameters in each group may be the same or different, which is not limited in this embodiment of the present application.
  • In some embodiments, the specified number is 1; that is, each time one first parameter is removed from the first neural network layer that needs to be pruned in the first deep learning model, where the first parameters include at least neurons, vectors, convolution kernels, or filters.
  • By removing one first parameter from the first neural network layer at a time, the impact of clipping each first parameter on the performance of the first deep learning model can be accurately determined, so that the first parameters subsequently selected for clipping are those with less impact on performance. This ensures that while the first deep learning model is "slimmed down", its performance does not decrease significantly.
  • After the first deep learning model is pruned, a second deep learning model can be obtained; then, through step 103, the first deep learning model and the second deep learning model are jointly trained to achieve accuracy recovery.
  • The joint training adopted in step 103 may be implemented in various specific manners, for example, based on the distillation technique in the related art, which is not limited in this embodiment of the present application.
  • Pruning the first deep learning model according to the expected parameter may also be implemented with reference to related technologies.
  • In some embodiments, pruning the first deep learning model according to the expected parameter includes: pruning by NAS (Neural Architecture Search).
  • The principle of the NAS method is to give a set of candidate neural network structures, called the search space; search network structures from the search space based on a preset search strategy; and evaluate the advantages and disadvantages of each searched network structure based on a preset performance evaluation strategy, thereby determining whether it is the optimal network structure.
  • Measuring a searched network structure by certain indicators, such as running accuracy and running speed, according to the preset performance evaluation strategy is called performance evaluation.
  • In this embodiment, the search space may be the set of all neural network structures included in the first deep learning model.
  • The performance evaluation strategy can set the running speed, running accuracy, and size required of the searched network structure according to the requirements of the deployment platform and the application scenario.
  • In this way, network structures can be automatically combined until one satisfies the performance evaluation strategy, at which point the pruning process of the first deep learning model is complete. A minimal sketch of such a search loop follows.
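  • The sampling, evaluation, and acceptance functions below are caller-supplied placeholders (sample_fn, evaluate_fn, and meets_strategy are not names from this disclosure), standing in for the search strategy and the performance evaluation strategy.

```python
def nas_prune(first_model, sample_fn, evaluate_fn, meets_strategy,
              max_trials: int = 1000):
    # Search space: substructures of the first model. sample_fn draws a
    # candidate under the search strategy; evaluate_fn returns metrics such
    # as size, running speed, and running accuracy; meets_strategy encodes
    # the preset performance evaluation strategy.
    for _ in range(max_trials):
        candidate = sample_fn(first_model)
        if meets_strategy(evaluate_fn(candidate)):
            return candidate  # pruning of the first deep learning model is complete
    return None               # no sampled structure satisfied the strategy
```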
  • Based on the above process, the first deep learning model can be pruned to obtain the second deep learning model. Then, through step 103, joint training is performed on the first deep learning model and the second deep learning model, so that accuracy recovery can be achieved.
  • The joint training adopted in step 103 may be implemented in various specific manners, for example, based on the distillation technique in the related art, which is not limited in this embodiment of the present application.
  • In the above embodiments, retraining is not used; instead, the first deep learning model and the pruned deep learning model are jointly trained to restore the accuracy of the pruned deep learning model, obtaining deep learning models that meet the needs of different platforms or application scenarios. This overcomes the waste of computing resources caused in the related art by separately and repeatedly training deep learning models for different platforms or different application scenarios, and the defect that a suitable deep learning model cannot be obtained when computing resources are tight.
  • The embodiment of the present application also provides another method for acquiring a deep learning model. As shown in FIG. 9, the method includes:
  • Step 901: Obtain the first deep learning model and the expected cropping amount of the deep learning model.
  • Step 902: According to the expected cropping amount, determine the trimming amount of the first parameter of each neural network layer of the first deep learning model, where the first parameter includes at least neurons, vectors, convolution kernels, or filters.
  • Step 903: Each time, remove a different specified number of first parameters from the first neural network layer that needs to be pruned in the first deep learning model, and obtain the first feature map output by the second neural network layer after the first neural network layer, where the second neural network layer is a neural network layer whose output feature map size does not change before and after the first neural network layer removes the specified number of first parameters.
  • Step 904: Obtain the errors between a plurality of the first feature maps and the second feature maps corresponding to the first feature maps, where a second feature map is the feature map output by the second neural network layer before the first neural network layer removes the specified number of first parameters.
  • Step 905: Based on the expected trimming amount and the multiple errors, determine the first parameters to be trimmed.
  • Step 906: Trim the first parameters to be trimmed to obtain a second target deep learning model.
  • The first deep learning model may be trained, based on an existing or self-developed deep learning framework, on a platform with strong computing performance; it may also be a model trained by a third party and obtained directly; or it may be obtained in other manners, which is not limited in this embodiment of the present application.
  • The expected cropping amount of the deep learning model may be a preset fixed amount; that is, regardless of the performance of the first deep learning model, a fixed proportion or fixed number of the first parameters is cropped for every first deep learning model, where the performance includes the size, running speed, and running accuracy of the first deep learning model.
  • It can also be a preset cropping amount related to the performance parameters of the first deep learning model: for example, for a first deep learning model whose size is smaller than a first threshold, a first proportion or first number of the first parameters is cropped; for a first deep learning model whose size is between the first threshold and a second threshold, a second proportion or second number of the first parameters is cropped; for a first deep learning model whose size is between the second threshold and a third threshold, a third proportion or third number of the first parameters is cropped; and so on. The embodiment of the present application does not limit the manner of obtaining the expected cropping amount of the deep learning model.
  • In some embodiments, the expected cropping amount is determined according to expected parameters characterizing the performance of the deep learning model, where the performance includes at least the size, running speed, and running accuracy of the deep learning model. That is, in these embodiments, the expected trimming amount is not a preset amount but is determined according to an expected parameter characterizing the performance of the deep learning model.
  • The expected parameter characterizing the performance of the deep learning model may be obtained in a preset manner, obtained from the demand platform on which the second target deep learning model needs to be deployed, or obtained from a third party, which is not limited in the embodiments of the present application.
  • The expected parameter characterizing the performance of the deep learning model may be determined based on the computing performance of the platform on which the second target deep learning model needs to be deployed, or based on the application scenario of the second target deep learning model.
  • In some embodiments, determining the expected cropping amount according to expected parameters characterizing the performance of the deep learning model includes:
  • Step 9011: Determine the proportion of the first parameters of each neural network layer of the first deep learning model to be trimmed.
  • Step 9012: Determine, according to the proportion and the first deep learning model, the trimming amount of the first parameters of each neural network layer of the first deep learning model.
  • Steps 9011 and 9012 described in this embodiment are similar to steps 2011 and 2012 in the first method for acquiring a deep learning model provided by the embodiments of the present application; this part has been introduced in detail above and is not repeated here.
  • In some embodiments, determining the trimming amount of the first parameter of each neural network layer of the first deep learning model according to the expected cropping amount in step 902 may include:
  • Step 9021: Based on a preset allocation strategy, allocate different trimming amounts to the neural network layers of the first deep learning model, where the different trimming amounts make the total trimming amount of the deep learning model obtained after trimming the first deep learning model fall within a preset error range of the expected cropping amount; or
  • Step 9022: Allocate the same trimming amount to each of the multiple neural network layers of the first deep learning model, where the same trimming amount makes the total trimming amount of the deep learning model obtained after trimming the first deep learning model fall within a preset error range of the expected trimming amount.
  • Step 9021 described in this embodiment is similar to step 2011b in the first method for acquiring a deep learning model provided by this embodiment of the present application, and step 9022 is similar to step 2011d in that method. The relevant content has been introduced in detail above and is not repeated here.
  • Determining the trimming amount of the first parameter of each neural network layer through step 9021 makes it possible to trade off whether to give priority to reducing the size of the deep learning model or to ensuring its running speed, so that the pruned deep learning model can meet the requirements of different platforms and different application scenarios.
  • Determining the trimming amount of the first parameter of each neural network layer through step 9022 is simple to compute, easy to implement, and saves computing resources.
  • In some embodiments, determining the first parameters to be trimmed in step 905 includes:
  • Step 9051: Sort the multiple errors.
  • Step 9052: Based on the sorting result, select as the first parameters to be trimmed the expected cropping amount of first parameters whose removal produces the smallest errors.
  • Steps 9051 and 9052 described in this embodiment are similar to steps 2041 and 2042 in the first method for acquiring a deep learning model provided by this embodiment of the present application; this part has been introduced in detail above and is not repeated here.
  • Measuring, through the sorting result, the influence of each removed first parameter on the first deep learning model, and determining the first parameters to be trimmed on that basis, is scientifically effective, has low computational complexity, and is easy to realize.
  • As in the previous embodiments, the specified number in step 903 can be any integer; that is, for a given neural network layer of the first deep learning model, no first parameter may be trimmed, or multiple first parameters may be removed at once, which is not limited in this embodiment of the present application.
  • In some embodiments, the specified number is one; that is, each time one first parameter is removed from the first neural network layer that needs to be pruned in the first deep learning model, where the first parameters include at least neurons, vectors, convolution kernels, or filters.
  • By removing one first parameter from the first neural network layer at a time, the impact of clipping each first parameter on the performance of the first deep learning model can be accurately determined, so that the first parameters subsequently selected for clipping are those with less impact on performance. This ensures that while the first deep learning model is "slimmed down", its performance does not decrease significantly.
The error between the first feature map and the second feature map may be determined based on the distance between them; the distance may be the Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, Mahalanobis distance, and so on, which is not limited in the embodiments of the present application. Using distance to measure the error between the first feature map and the second feature map is convenient to compute and easy to determine.
In some embodiments, the second method for acquiring a deep learning model provided by the embodiments of the present application further includes: training the second target deep learning model obtained at step 906, based on the same training data and loss function as the first deep learning model, to obtain a deep learning model with recovered accuracy.
Alternatively, the parameters of the first deep learning model can be fixed and the first deep learning model and the second target deep learning model trained jointly to obtain a deep learning model with recovered accuracy.
In this way, the pre-acquired first deep learning model is pruned to obtain the second target deep learning model, whose accuracy is then recovered, so that the finally obtained deep learning model, compared with the first deep learning model, is not only lightweight but also retains high running accuracy.
The embodiments of the present application also provide a deep learning model acquisition system as shown in FIG. 11. The system includes a first platform 1101 and a second platform 1102; the first platform 1101 is used to acquire the target deep learning model based on the acquisition methods provided in the foregoing embodiments of this application, and the second platform 1102 is used to deploy the target deep learning model.
A platform includes at least one of the following: a server cluster, a server, a mobile terminal, and so on; it may also be any other platform capable of acquiring or deploying the deep learning model, which is not limited in the embodiments of the present application.
Based on this system, the target deep learning model can be acquired on the first platform and deployed on the second platform, overcoming the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
Correspondingly, an embodiment of the present application further provides an apparatus corresponding to the method for acquiring a deep learning model. FIG. 12 is a hardware structure diagram of an apparatus for acquiring a deep learning model provided by an embodiment of the present application. The apparatus includes a memory 1201, a processor 1202, and a computer program stored in the memory and executable on the processor; when the processor executes the program, any method embodiment provided by the embodiments of this application is implemented.
The memory 1201 may be an internal storage unit of the deep learning model acquisition apparatus, such as a hard disk or memory of the device. The memory 1201 may also be an external storage device of the apparatus, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the device. Further, the memory 1201 may include both an internal storage unit and an external storage device of the apparatus. The memory is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
When the stored program is executed, the processor 1202 calls the program stored in the memory 1201 to perform the methods of the foregoing embodiments, which have been described in detail above and are not repeated here.
Depending on the actual functions of the deep learning model acquisition apparatus, it may also include other hardware, such as a network interface, which is not described further in this application.
The embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements all the embodiments of the above methods of the present application; this is not repeated here.
The computer-readable storage medium may be an internal storage unit of any electronic device, such as a hard disk or memory of the electronic device. It may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the device; it may further include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been or will be output.
The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.


Abstract

The present application provides a method for acquiring a deep learning model, the method comprising: acquiring a first deep learning model and an expected parameter characterizing deep learning model performance, the deep learning model performance including at least one of the following: the size, running speed and running accuracy of the deep learning model; pruning the first deep learning model according to the expected parameter to obtain a second deep learning model; and fixing the parameters of the first deep learning model and jointly training the first deep learning model and the second deep learning model to obtain a first target deep learning model that satisfies the expected parameter. With the method provided by the embodiments of the present application, deep learning models meeting the requirements of different platforms or different application scenarios can be acquired, overcoming the waste of computing resources in the related art and the inability to obtain a suitable deep learning model when computing resources are scarce.

Description

Method, system and apparatus for acquiring deep learning model, and storage medium
Technical Field
The present application relates to the technical field of deep learning, and in particular to a method, system and apparatus for acquiring a deep learning model, and a computer-readable storage medium.
Background
With the rapid development of deep learning technology, deep learning models are being applied ever more widely. At present, deep learning models can be deployed on various platforms such as server clusters, servers and mobile terminals, and applied in many different scenarios such as face recognition, beautification and semantic segmentation. However, because different platforms have different computing capabilities and different application scenarios often impose different computing requirements, in order for a deep learning model to meet the running-speed and running-accuracy requirements of different platforms or application scenarios, development in the related art for the same kind of deep learning model under different requirements has mainly relied on training multiple separate models.
Although the above approach used in the related art can produce deep learning models that meet the requirements of different platforms or application scenarios, training multiple separate deep learning models consumes a great deal of redundant computation, causing an enormous waste of computing resources. Moreover, when computing resources are scarce, training multiple separate deep learning models is also difficult to carry out.
Summary of the Invention
To overcome the many shortcomings of the related art, in which multiple deep learning models are trained separately in order to obtain deep learning models suitable for different platforms or different application scenarios, wasting computing resources and being difficult to implement, the present application provides a method, system and apparatus for acquiring a deep learning model, and a computer-readable storage medium.
According to a first aspect of the embodiments of the present application, there is provided a method for acquiring a deep learning model, the method comprising: acquiring a first deep learning model and an expected parameter characterizing deep learning model performance, the deep learning model performance including at least one of the following: the size, running speed and running accuracy of the deep learning model; pruning the first deep learning model according to the expected parameter to obtain a second deep learning model; and fixing the parameters of the first deep learning model and jointly training the first deep learning model and the second deep learning model to obtain a first target deep learning model that satisfies the expected parameter.
According to a second aspect of the embodiments of the present application, there is provided another method for acquiring a deep learning model, the method comprising: acquiring a first deep learning model and a desired pruning amount of the deep learning model; determining, according to the desired pruning amount, the pruning amount of a first parameter of each neural network layer of the first deep learning model, the first parameter including at least one of a neuron, a vector, a convolution kernel or a filter; each time removing a different specified number of first parameters from a first neural network layer of the first deep learning model that needs pruning, and obtaining a first feature map output by a second neural network layer after the first neural network layer, the second neural network layer being a neural network layer whose output feature map size does not change before and after the specified number of first parameters are removed from the first neural network layer; obtaining errors between multiple first feature maps and second feature maps corresponding to the first feature maps, a second feature map being the feature map output by the second neural network layer before the specified number of first parameters were removed from the first neural network layer; determining, based on the desired pruning amount and the multiple errors, the first parameters to be pruned; and pruning the first parameters to be pruned to obtain a second target deep learning model.
According to a third aspect of the embodiments of the present application, there is provided a deep learning model acquisition system, the system comprising a first platform and a second platform; the first platform is used to acquire a target deep learning model based on the methods of the first and second aspects of the embodiments of the present application; the second platform is used to deploy the target deep learning model; and each platform includes at least one of the following: a server cluster, a server, a mobile terminal.
According to a fourth aspect of the embodiments of the present application, there is provided a deep learning model acquisition apparatus, the apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the methods of the first and second aspects of the embodiments of the present application when executing the program.
According to a fifth aspect of the embodiments of the present application, there is provided a computer-readable storage medium storing a number of computer instructions which, when executed, implement the methods of the first and second aspects of the embodiments of the present application.
The technical solutions provided by the embodiments of the present application may have the following beneficial effects:
In the embodiments of the present application, the pre-acquired first deep learning model is pruned based on an expected parameter characterizing deep learning model performance to obtain a second deep learning model satisfying the expected parameter; the parameters of the first deep learning model are then fixed and the first and second deep learning models are trained jointly, which recovers the accuracy of the second deep learning model, so that the finally obtained first target deep learning model is not only lightweight relative to the first deep learning model but also retains high running accuracy. Thus, with the method provided by the embodiments of the present application, after a pruned deep learning model is obtained from the first deep learning model, instead of retraining from scratch, the accuracy of the pruned model is recovered based on the first deep learning model, yielding deep learning models that meet the requirements of different platforms or different application scenarios. This overcomes the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit this specification.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a first method for acquiring a deep learning model according to an exemplary embodiment of this specification.
FIG. 2 is a schematic flowchart of a first pruning method according to an exemplary embodiment.
FIG. 3 is a schematic comparison of four pruning methods of different granularity according to an exemplary embodiment.
FIG. 4 is a schematic diagram of the principle of pruning the connections and neurons of a deep learning model according to an exemplary embodiment.
FIG. 5 is a schematic diagram of the principle of the first pruning method according to an exemplary embodiment.
FIG. 6 is a schematic diagram of a distillation network according to an exemplary embodiment.
FIG. 7 is a schematic flowchart of a second pruning method according to an exemplary embodiment.
FIG. 8 is a schematic diagram of the principle of a third pruning method according to an exemplary embodiment.
FIG. 9 is a schematic flowchart of a second method for acquiring a deep learning model according to an exemplary embodiment of this specification.
FIG. 10 is a schematic flowchart of recovering the accuracy of a second target deep learning model according to an exemplary embodiment of this specification.
FIG. 11 is a schematic structural diagram of a deep learning model acquisition system according to an exemplary embodiment of this specification.
FIG. 12 is a schematic structural diagram of a deep learning model acquisition apparatus according to an exemplary embodiment of this specification.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
The terms used in the present application are for the purpose of describing particular embodiments only and are not intended to limit the present application. The singular forms "a", "said" and "the" used in the specification and appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present application to describe various pieces of information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "as soon as" or "in response to determining".
In recent years, because technical solutions based on deep learning models have many advantages over traditional algorithms, such as strong learning ability, strong adaptability and good portability, deep learning technology has developed rapidly.
At present, the same kind of deep learning model may need to be deployed on different platforms and applied in different scenarios. For example, for a deep learning model used to recognize features in images captured by an unmanned aerial vehicle (UAV), the model needs to be deployed on the user's mobile terminal or a dedicated control terminal for ease of use and a better user experience; for the UAV manufacturer, to facilitate monitoring of the UAV's flight safety, the model may need to be deployed on the manufacturer's server or server cluster; and for a third-party regulator, to monitor the UAV's flight status, the model may need to be deployed on the regulator's server or server cluster.
Different mobile terminals, dedicated control terminals, servers and server clusters often have different computing capabilities, so the same deep learning model cannot simply be deployed on all of them. For example, to exploit the full computing power of a server cluster, the deep learning model trained for the cluster is a heavyweight model: it is large, highly accurate and fast. However, because the computing capability of a mobile terminal is far below that of a server cluster, a model trained for the cluster cannot be deployed directly on the mobile terminal; if it were deployed by force, the terminal might run extremely slowly or even exhaust its computing resources.
Even when the same deep learning model is deployed on the same platform, different application scenarios may impose different performance requirements on it. For example, for the same deep learning model used for target recognition, a real-time application scenario requires the model to run fast, whereas a scenario with loose real-time requirements may not demand high speed.
In the related art, to solve the problem that the same kind of deep learning model cannot be deployed directly on different platforms or applied well in different scenarios, multiple separate deep learning models are usually trained independently. That is, for different platforms or application scenarios, the same deep learning framework is used and training is carried out according to each platform's own computing capability or each scenario's target requirements, finally producing multiple deep learning models suited to the different platforms or scenarios.
Although the technical solution in the related art can in theory produce multiple deep learning models suited to different platforms or application scenarios, because separate training is used, every training run for a different platform or scenario must start over from the deep learning framework, consuming a great deal of repeated computation and causing an enormous waste of computing resources. In addition, when computing resources are scarce, the solution adopted in the related art may even be impossible to carry out.
To solve the waste of repeated computing resources in the related art when acquiring deep learning models for different platforms or application scenarios, and the problem that, when computing resources are scarce, the related art may not be able to acquire such models at all, the present application provides a method for acquiring a deep learning model. As shown in FIG. 1, the first method for acquiring a deep learning model provided by an embodiment of the present application includes:
Step 101: acquiring a first deep learning model and an expected parameter characterizing deep learning model performance, the deep learning model performance including at least one of the following: the size, running speed and running accuracy of the deep learning model;
Step 102: pruning the first deep learning model according to the expected parameter to obtain a second deep learning model;
Step 103: fixing the parameters of the first deep learning model, and jointly training the first deep learning model and the second deep learning model to obtain a first target deep learning model that satisfies the expected parameter.
The first deep learning model may be trained on a platform with computing capability, especially a powerful one, based on an existing or self-developed deep learning framework; it may also be a deep learning model trained by a third party and obtained elsewhere; of course, other ways of obtaining it are possible, which is not limited in the embodiments of the present application.
The expected parameter characterizing deep learning model performance may be obtained by being preset, by querying the target platform on which the first target deep learning model is to be deployed, or from a third party, which is not limited in the embodiments of the present application.
In some embodiments, the expected parameter characterizing deep learning model performance may be determined based on the computing capability of the target platform on which the first target deep learning model is to be deployed. For example, the first deep learning model acquired at step 101 is used for image feature recognition, was trained on a server, and is 10 MB in size, while the first target deep learning model of step 103 is to be deployed on a mobile terminal that has only 5 MB of cache. Then, based on the capability of the mobile terminal as the target platform, it can be determined that the size of the first target deep learning model should be below 5 MB, and accordingly the expected parameter can be set so that the size of the deep learning model is no more than 5 MB. Of course, those skilled in the art should understand that this example is merely illustrative: the computing capability of the target platform may be characterized not only by cache size but also by computing speed, computing accuracy and other parameters, and correspondingly the expected parameter may also be the running speed, running accuracy and so on of the deep learning model, determined from the target platform's computing speed, computing accuracy, etc.
In some embodiments, the expected parameter characterizing deep learning model performance may be determined based on the application scenario of the first target deep learning model. For example, the first deep learning model acquired at step 101 is used for image feature recognition, was trained on a server, and can recognize details at a resolution of 10 micrometers. In some application scenarios, such as intelligent disease diagnosis, this ultra-high resolution is very meaningful and can help doctors find tiny lesions. However, if the deep learning model of step 103 is still deployed on the same server but applied to target recognition for UAVs, then meaningful targets for a UAV are usually not very small, and recognizing micrometer-scale objects, such as floating dust, is itself of little value. In that case, the level of running accuracy in the expected parameter can be determined based on the application scenario of the first target deep learning model. Of course, those skilled in the art should understand that this example is merely illustrative: the application scenario can also be translated into computing speed, model size and the like, and correspondingly the expected parameter may also be the running speed, size and so on of the deep learning model determined from the computing speed, model size and other requirements of the application scenario.
A deep learning model contains a large number of redundant parameters from the convolutional layers to the fully connected layers; in each layer of the neural network, large numbers of neuron activations, vectors, convolution kernels and filters are close to zero. After these neurons, vectors, convolution kernels and filters are removed, the deep learning model can exhibit expressive power the same as or close to that of the original model; this situation is called over-parameterization of the deep learning model. Removing from each neural network layer the neurons, vectors, convolution kernels and filters that contribute little to the model's expressive power is the pruning process.
In the embodiments of the present application, step 102, pruning the first deep learning model according to the expected parameter, can be implemented in many ways, and the present application does not limit the specific pruning technique. Several specific pruning methods provided by the present application are given below; however, those skilled in the art should understand that the following embodiments are merely illustrative and do not limit the pruning technique, which may be implemented with reference to the related art or with other pruning methods improved by those skilled in the art.
It can be seen from the above embodiments that, with the method provided by the embodiments of the present application, after a pruned deep learning model is obtained from the first deep learning model, instead of retraining, the accuracy of the pruned model is recovered based on the first deep learning model, yielding deep learning models that meet the requirements of different platforms or application scenarios. This overcomes the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
In some embodiments, as shown in FIG. 2, step 102 of pruning the first deep learning model according to the expected parameter includes:
Step 201: determining, according to the expected parameter, the pruning amount of a first parameter of each neural network layer of the first deep learning model, the first parameter including at least a neuron, a vector, a convolution kernel or a filter;
Step 202: each time removing a different specified number of first parameters from a first neural network layer of the first deep learning model that needs pruning, and obtaining a first feature map output by a second neural network layer after the first neural network layer, the second neural network layer being a neural network layer whose output feature map size does not change before and after the specified number of first parameters are removed from the first neural network layer;
Step 203: obtaining errors between multiple first feature maps and second feature maps corresponding to the first feature maps, a second feature map being the feature map output by the second neural network layer before the specified number of first parameters were removed from the first neural network layer;
Step 204: determining, based on the pruning amount and the multiple errors, the first parameters to be pruned.
Pruning of deep learning models can be roughly divided into at least four granularities, as shown in FIG. 3. FIG. 3(A) shows fine-grained pruning, which prunes neurons or the weight connections between neurons and is the finest granularity; FIG. 3(B) shows vector-level pruning, which is coarser than fine-grained pruning and operates inside a convolution kernel (intra-kernel); FIG. 3(C) shows kernel-level pruning, i.e., removing a convolution kernel from a convolutional layer, which discards the response to the corresponding computing channel of the input channels; FIG. 3(D) shows filter-level pruning, which prunes an entire group of convolution kernels. Therefore, in step 201 the first parameter to be pruned may be a neuron, a vector, a convolution kernel or a filter.
FIG. 4 is a schematic diagram of pruning the connections and neurons of a deep learning model. In FIG. 4(A), neurons r1, r2 and r3 are nonzero, and the connections between the neural network layer and neurons r1, r2 and r3 are also nonzero; in FIG. 4(B), the connections between the neural network layer and neuron r2 are set to zero, making the weight connection matrix sparse; this is weight-connection pruning. Vector pruning, kernel pruning and filter pruning are similar to weight-connection pruning: they remove certain vectors, convolution kernels or filters from the convolutional layers, thereby slimming down the deep learning model and reducing its size. When the removed weight connections, neurons, vectors, convolution kernels and filters have little influence on the performance of the whole deep learning model, removing them reduces the model's size and increases its running speed while preserving its computing performance.
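As an illustration of weight-connection pruning, the following is a minimal sketch that zeroes out the smallest-magnitude connections so that the weight matrix becomes sparse; the pruning ratio and the magnitude-based criterion are assumptions made for this example, not requirements of the present application.

```python
import numpy as np

def prune_connections(weights: np.ndarray, ratio: float) -> np.ndarray:
    """Zero out the fraction `ratio` of connections with the smallest magnitude."""
    k = int(weights.size * ratio)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0  # removed connections
    return pruned

w = np.random.randn(4, 6)
w_sparse = prune_connections(w, ratio=0.5)
print((w_sparse == 0).mean())  # roughly half of the connections are removed
```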
Here, taking the first parameter to be a filter, steps 201 to 204 of step 102 (pruning the first deep learning model according to the expected parameter) are explained. Pruning other types of first parameters, such as neurons, vectors and convolution kernels, is similar to pruning filters and is not described again in the embodiments of the present application.
Once the expected parameter has been acquired at step 101, the number of first parameters to be pruned from each neural network layer of the first deep learning model in order to obtain the first target deep learning model can be determined in several ways.
In some embodiments, the pruning amount may be a fixed amount preset by developers from experience based on the expected parameter. For example, a developer may preset that three filters are removed from the first convolutional layer of the first deep learning model, two filters from the second convolutional layer, and so on. Of course, determining the pruning amount this way is inefficient and unreliable.
Therefore, in some embodiments, step 201 of determining, according to the expected parameter, the pruning amount of the first parameter of each neural network layer of the first deep learning model may include:
Step 2011: determining, according to the expected parameter, the ratio at which the first parameters of each neural network layer of the first deep learning model are to be pruned;
Step 2012: determining, according to the ratio and the first deep learning model, the pruning amount of the first parameters of each neural network layer of the first deep learning model.
An example is given below. Suppose the first deep learning model is a model for beautifying faces, trained on a server with high computing capability, with a size of 10 MB. To deploy the first deep learning model on a user's mobile phone that can provide only 5 MB of cache when running the model, the first deep learning model needs to be pruned to slim it down. From the ratio between 10 MB and 5 MB, it follows that at least 50% of the first parameters of each neural network layer of the first deep learning model must be pruned. From this ratio, combined with the number of first parameters in each neural network layer of the first deep learning model, the number of first parameters to prune from each layer can be determined.
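A minimal sketch of this ratio-to-count computation follows; the per-layer filter counts are made up for the example.

```python
def per_layer_prune_counts(filters_per_layer, ratio):
    """Number of filters to remove from each neural network layer when the
    same pruning ratio is applied across the whole model."""
    return [int(n * ratio) for n in filters_per_layer]

# The 10 MB model must fit into a 5 MB cache, so at least 50% is pruned.
print(per_layer_prune_counts([64, 128, 256, 512], ratio=0.5))
# -> [32, 64, 128, 256]
```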
It can be seen from the above embodiment that determining the number of first parameters to prune from each neural network layer based on the expected parameter and the first deep learning model requires no developer intervention, can be automated, and is simple, convenient and easy to implement.
Because training a deep learning model uses many convolutional layers to extract features of the input data from shallow to deep, pruning most deep learning models has the following characteristics: pruning the neural network layers near the input (hereinafter "shallow layers") can greatly increase the model's running speed, but the pruned model's running accuracy decreases somewhat; pruning the neural network layers near the output (hereinafter "deep layers") can greatly reduce the number of parameters and thus the size of the pruned model, and the accuracy loss is smaller than when pruning shallow layers.
Based on these characteristics of pruning, step 2011, determining according to the expected parameter the ratio at which the first parameters of each neural network layer are pruned, may "uniformly" assign the same pruning ratio to the first parameters of every neural network layer according to different preset allocation strategies, in which case no trade-off between the pruned model's size and running speed needs to be considered; or it may "non-uniformly" assign different pruning ratios to the different neural network layers, trading off reducing the pruned model's size against increasing its running speed, so as to suit different application scenarios and deployment platforms.
Therefore, in some embodiments, step 2011 of determining, according to the expected parameter, the pruning ratio of the first parameters of each neural network layer of the first deep learning model includes:
Step 2011a: determining, according to the expected parameter, the total ratio at which the first parameters of the first deep learning model are to be pruned;
Step 2011b: based on a preset allocation strategy, allocating different pruning ratios to the multiple neural network layers of the first deep learning model, the different pruning ratios being such that, after the first deep learning model is pruned, the total pruning ratio of the resulting deep learning model is within a preset error range of the total ratio.
The preset allocation strategy may be set according to the characteristics of the application scenario and deployment platform, to express whether increasing the pruned model's running speed or reducing its size takes priority. For example, suppose the first deep learning model is a model for target object recognition trained on a server, and the first target deep learning model obtained from it is to be deployed on a user's mobile phone. In this case, reducing the size of the pruned model should take priority. The preset allocation strategy may therefore assign a pruning ratio a to the first N neural network layers of the first deep learning model and a pruning ratio b to the (N+1)-th through last neural network layers, with a < b, such that the total pruning ratio of the model pruned at these ratios is within the preset error range of the total ratio. In this way, the first parameters of the deep layers are pruned more heavily, ensuring the pruned first deep learning model is small and suited to the mobile phone platform.
Those skilled in the art should understand that the above preset allocation strategy is merely illustrative. The strategy may take other forms: for example, still prioritizing reducing the pruned model's size, one may first pick a middle neural network layer of the first deep learning model, allocate ratios to the layers before it decreasing by a fixed step and to the layers after it increasing by a fixed step, and then solve an equation based on the model's total pruning ratio to determine the pruning ratio of each layer's first parameters, again pruning the deep layers more heavily so that the pruned model is small and suited to the mobile phone platform. The embodiments of the present application do not limit the specific content of the preset allocation strategy or the specific way different pruning ratios are allocated to the multiple neural network layers of the first deep learning model.
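By way of illustration, the sketch below implements one possible two-segment allocation strategy of the kind described above (ratio a for shallow layers, a larger ratio b for deep layers); the concrete layer widths, ratios and tolerance are assumptions for the example.

```python
def allocate_ratios(layer_sizes, n_shallow, a, b, total_target, tol=0.05):
    """Assign ratio a to the first n_shallow layers and a larger ratio b to the
    remaining (deeper) layers, then check that the overall pruning ratio stays
    within tol of the target derived from the expected parameter."""
    ratios = [a] * n_shallow + [b] * (len(layer_sizes) - n_shallow)
    pruned = sum(int(n * r) for n, r in zip(layer_sizes, ratios))
    overall = pruned / sum(layer_sizes)
    if abs(overall - total_target) > tol:
        raise ValueError("allocation misses the total pruning ratio")
    return ratios

# Hypothetical model: prune deep layers harder (b > a) to shrink model size.
print(allocate_ratios([64, 128, 256, 512], n_shallow=2, a=0.3, b=0.55,
                      total_target=0.5))
# -> [0.3, 0.3, 0.55, 0.55]
```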
For some application scenarios and platforms where the size and running speed of the pruned first deep learning model matter roughly equally, step 2011 of determining, according to the expected parameter, the pruning ratio of the first parameters of each neural network layer includes:
Step 2011c: determining, according to the expected parameter, the total ratio at which the first parameters of the first deep learning model are to be pruned;
Step 2011d: allocating the same pruning ratio to each of the multiple neural network layers of the first deep learning model, the same pruning ratio being such that, after the first deep learning model is pruned, the total pruning ratio of the resulting deep learning model is within a preset error range of the total ratio.
In this embodiment, allocating the same pruning ratio to the multiple neural network layers of the first deep learning model is simple and convenient, requires little extra computation, and allows the ratios to be determined quickly.
After the pruning amount of the first parameters of each neural network layer of the first deep learning model has been determined at step 201, the errors between multiple first feature maps and their corresponding second feature maps can be obtained through steps 202 and 203. Below, taking the first parameter to be a filter, steps 202 and 203 of the pruning process are explained with reference to FIG. 5.
As shown in FIG. 5, FIG. 5(A) is the layer-i feature map of the first deep learning model before pruning (i is a positive integer not less than 1), with dimensions C*H*W, where C is the number of channels, H the height and W the width. FIG. 5(B) is the layer-i filter bank with dimensions Oi*C*h*w, where Oi is the number of filters in the bank, C equals the number of channels of the layer-i feature map, and h and w are the height and width of the filter bank. Pruning the i-th neural network layer means removing a specified number of the Oi filters, shown as the dashed cuboid in FIG. 5(B). Taking the removal of one filter from the layer-i filter bank as an example: after one filter is cut, the dimensions of the layer-i filter bank become (Oi-1)*C*h*w, and the dimensions of the corresponding layer-(i+1) feature map become (Oi-1)*H*W, as shown in FIG. 5(C); that is, cutting one filter shows up as the feature map output by the next layer having one channel fewer, the dashed box in FIG. 5(C) indicating the pruned channel. Because the dimensions of the layer-(i+1) feature map have changed, the number of channels of the layer-(i+1) filter bank must correspondingly become Oi-1, the same as the layer-(i+1) feature map, as indicated by the dashed lines on the cuboid in FIG. 5(D), which represent the removed channel. After the layer-(i+1) filter bank loses one channel, the output dimensions of the layer-(i+2) feature map are unaffected, as shown in FIG. 5(E), the layer-(i+2) feature map after pruning. Since the layer-(i+2) feature map after pruning has exactly the same dimensions as before pruning, the error of that layer's feature map before and after pruning can be computed. Following this procedure, for each neural network layer of the first deep learning model, removing a different filter each time yields the before-and-after errors of the feature maps corresponding to multiple filters.
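The following runnable sketch mimics the FIG. 5 procedure using 1x1 convolutions, i.e., plain channel-mixing matrices, so the shapes stay easy to follow; it is a simplified stand-in for real convolutional layers, not the implementation prescribed by this application.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, O1, O2 = 3, 8, 8, 6, 4
x  = rng.standard_normal((C, H, W))   # layer-i feature map, C*H*W
f1 = rng.standard_normal((O1, C))     # layer-i   filters (1x1), Oi*C
f2 = rng.standard_normal((O2, O1))    # layer-(i+1) filters (1x1)

def forward(f1, f2, x):
    y = np.einsum('oc,chw->ohw', f1, x)     # layer-(i+1) feature map
    return np.einsum('po,ohw->phw', f2, y)  # layer-(i+2) feature map

ref = forward(f1, f2, x)                    # before pruning
errors = []
for j in range(O1):                         # try removing each filter in turn
    f1_p = np.delete(f1, j, axis=0)         # (Oi-1)*C, as in FIG. 5(C)
    f2_p = np.delete(f2, j, axis=1)         # drop matching channel, FIG. 5(D)
    out = forward(f1_p, f2_p, x)            # same size as `ref`, FIG. 5(E)
    errors.append(np.linalg.norm(out - ref))
print(np.argsort(errors))  # filters whose removal perturbs layer i+2 least
```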
The above example removes one filter from a neural network layer of the first deep learning model; those skilled in the art should understand that first parameters of other types, such as neurons, vectors and convolution kernels, can also be removed, in which case the error of the same feature map before and after pruning can likewise be obtained from the feature maps whose dimensions are unchanged and which lie after the pruned neural network layer.
In addition, as for the number of first parameters pruned, a single pruning pass may remove only one first parameter, or multiple first parameters at once, for example removing three filters of a filter bank in one pass, which is not limited in the embodiments of the present application.
Having obtained, through steps 202 and 203 of the above embodiments, the errors of the same feature maps before and after pruning, the first parameters to be pruned can be determined in combination with the pruning amount determined at step 201.
In some embodiments, step 204 of determining, based on the pruning amount and the multiple errors, the first parameters to be pruned includes:
Step 2041: sorting the multiple errors;
Step 2042: based on the sorting result, retaining the pruning amount's worth of first parameters with the smallest errors.
Through steps 202 and 203, the errors between the multiple first and second feature maps corresponding to removing each of multiple first parameters can be obtained, and these errors can be sorted. A small error means removing that first parameter has little influence on the subsequent neural network layers; a large error means the influence is large. Therefore, sorting the multiple errors, using the sorting result to measure the influence of each removed first parameter on the first deep learning model, and determining on that basis which first parameters to remove is sound and effective, computationally cheap, and easy to implement.
Of course, besides determining the first parameters to keep or remove from the sorting result as in the above embodiment, they can also be determined from the errors in other ways. For example, each of the multiple errors can be compared with the remaining errors; when an error is found to be the smallest, the corresponding first parameter can be removed, thereby determining the first removable first parameter, and so on, until the number of removable first parameters determined equals the pruning amount determined at step 201.
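A minimal sketch of the sorting-based selection of steps 2041 and 2042 follows; the error values are made up for the example.

```python
def select_parameters_to_prune(errors, prune_count):
    """Sort the per-parameter errors and keep, as pruning candidates, the
    prune_count parameters whose removal perturbs the network least."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    return order[:prune_count]

# e.g. errors measured as in the previous sketch, pruning 2 filters
print(select_parameters_to_prune([0.9, 0.1, 0.5, 0.05, 0.7], prune_count=2))
# -> [3, 1]
```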
In some embodiments, the error between the first feature map and the second feature map may be determined based on the distance between the first and second feature maps; the distance may be the Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, Mahalanobis distance and so on, which is not limited in the embodiments of the present application.
It can be seen from the above embodiment that using distance to measure the error between the first feature map and the second feature map is convenient to compute and easy to determine.
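For illustration, a few of the distances named above can be computed as in the sketch below; this is only a sample, and other distances such as the Minkowski or Mahalanobis distance could be substituted.

```python
import numpy as np

def feature_map_error(a: np.ndarray, b: np.ndarray, metric: str = "euclidean") -> float:
    """Error between two feature maps of identical size, under a few of the
    distances named in the text (Euclidean, Manhattan, Chebyshev)."""
    d = (a - b).ravel()
    if metric == "euclidean":
        return float(np.sqrt(np.sum(d ** 2)))
    if metric == "manhattan":
        return float(np.sum(np.abs(d)))
    if metric == "chebyshev":
        return float(np.max(np.abs(d)))
    raise ValueError(f"unsupported metric: {metric}")

before = np.ones((4, 8, 8))        # second feature map (before pruning)
after = np.full((4, 8, 8), 1.1)    # first feature map (after pruning)
print(feature_map_error(before, after, "manhattan"))
```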
Through the above steps, the first parameters to remove when pruning the first deep learning model can be determined, yielding the second deep learning model. Once the second deep learning model has been obtained, step 103 can be carried out: fixing the parameters of the first deep learning model and jointly training the first and second deep learning models to obtain a first target deep learning model satisfying the expected parameter.
The joint training of the first and second deep learning models described in step 103 can be implemented in many specific ways, which the embodiments of the present application do not limit.
In some embodiments, step 103 can be implemented with the distillation technique from knowledge transfer: the first deep learning model serves as the teacher model and the second deep learning model as the student model; the parameters of the first deep learning model are fixed; a loss function involving both models is constructed; the same training data are fed to both models; and the parameters of the second deep learning model are adjusted according to the loss function, so that, guided by the first deep learning model, the accuracy of the second deep learning model is recovered.
As shown in FIG. 6, a specific distillation network is given. Training data are input to the first deep learning model, which participates only in the forward pass; its output, after being softened by a temperature (/T), passes through a first Softmax to produce a softened soft target. The same training data are input to the second deep learning model; its output, after the same temperature (/T) as the first deep learning model, passes through a second Softmax and a KL-divergence computation against the soft target to produce the distillation loss. The same training data are also input to the second deep learning model, whose output passes through a third Softmax and a cross-entropy computation against the hard target to produce the student loss. A joint loss is built from the distillation loss and the student loss, and the second deep learning model is trained with it, achieving accuracy recovery of the second deep learning model under the guidance of the first deep learning model.
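The following sketch computes the joint loss of the distillation network in FIG. 6 for a single sample; the temperature T and the weighting coefficient alpha are hypothetical hyper-parameter choices for illustration, not values fixed by this application.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_loss(teacher_logits, student_logits, hard_label, T=4.0, alpha=0.5):
    """Distillation loss (KL between temperature-softened teacher and student
    outputs) combined with the student's cross-entropy on the hard target."""
    p_t = softmax(teacher_logits, T)   # soft target (first Softmax)
    p_s = softmax(student_logits, T)   # second Softmax, same temperature
    distill = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))  # KL
    q_s = softmax(student_logits)      # third Softmax, T = 1
    student = -np.log(q_s[hard_label] + 1e-12)  # cross-entropy, hard target
    return alpha * distill + (1 - alpha) * student

print(joint_loss(np.array([2.0, 0.5, -1.0]),
                 np.array([1.2, 0.8, -0.5]), hard_label=0))
```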
Those skilled in the art should understand that the above joint training based on distillation and the distillation network described are merely illustrative; of course, other ways of jointly training the second deep learning model based on the first deep learning model can be used, which is not limited in the embodiments of the present application.
It can be seen from the above embodiments that pruning the pre-acquired first deep learning model based on an expected parameter characterizing deep learning model performance to obtain a second deep learning model satisfying the expected parameter, then fixing the parameters of the first deep learning model and jointly training the two models, recovers the accuracy of the second deep learning model, so that the finally obtained first target deep learning model is not only lightweight relative to the first deep learning model but also retains high running accuracy. Thus, with the method provided by the embodiments of the present application, after a pruned deep learning model is obtained from the first deep learning model, instead of retraining, the accuracy of the pruned model is recovered based on the first deep learning model, yielding deep learning models that meet the requirements of different platforms or application scenarios. This overcomes the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
For step 102, pruning the first deep learning model according to the expected parameter, besides the method of the above embodiments, other pruning approaches may be used.
In some embodiments, step 102 of pruning the first deep learning model according to the expected parameter may, as shown in FIG. 7, include:
Step 701: each time removing a different specified number of first parameters from multiple neural network layers of the first deep learning model to obtain multiple third deep learning models, the first parameters including at least neurons, vectors, convolution kernels or filters;
Step 702: acquiring evaluation parameters characterizing the performance of each third deep learning model, the performance including at least the size, running speed and running accuracy of the third deep learning model;
Step 703: determining, based on the evaluation parameters and the expected parameter, the first parameters of the first deep learning model to be pruned.
In step 701, removing a different specified number of first parameters from multiple neural network layers of the first deep learning model each time may proceed in a preset order. For example, starting from the first neural network layer of the first deep learning model, one first parameter is removed at a time to obtain a corresponding third deep learning model, and then the evaluation parameters of each third deep learning model are acquired, such as the size of each third model and, by feeding each third model the same unlabeled data, its running time, running accuracy and so on. Having obtained the evaluation parameters of the third deep learning models produced by pruning the first deep learning model, the first parameters to be pruned in the first deep learning model can be determined from the evaluation parameters and the expected parameter.
In some embodiments, step 703 of determining, based on the evaluation parameters and the expected parameter, the first parameters of the first deep learning model to be pruned includes:
Step 7031: obtaining the distances between the evaluation parameters and the expected parameter;
Step 7032: sorting the distances;
Step 7033: determining, based on the distance sorting result, the first parameters to be pruned.
Through step 702 above, the performance evaluation parameters of the multiple third deep learning models obtained by removing different specified numbers of first parameters from the multiple neural network layers of the first deep learning model can be acquired. Through step 7031, obtaining the distance between the evaluation parameters and the expected parameter quantifies how strongly the first parameter removed from a neural network layer of the first deep learning model in a single pass of step 701 affects the model's performance. For example, the first deep learning model before pruning was trained on a server, is 10 MB in size, and has a running time of 10 ms when used for face beautification (here "running time" stands in for "running speed"). The first target deep learning model obtained by pruning the first deep learning model and recovering its accuracy is to be deployed on the user's mobile terminal, with an expected model size of 5 MB and running time of 18 ms. Applying the method of the above embodiment: after removing the 1st filter of the first neural network layer of the first deep learning model, the resulting third deep learning model is 9.8 MB in size and runs face beautification in 30 ms; after removing the 2nd filter of the first layer, the resulting third model is 9.9 MB and runs in 34.7 ms; ... after removing the n-th filter of the N-th layer, the resulting third model is 6.6 MB and runs in 40.1 ms. For the performance evaluation parameters of the multiple third deep learning models obtained (i.e., their sizes, running speeds, running accuracies, etc.) and the expected parameter (i.e., the expected model size, running time, running accuracy, etc.), the distances between the multiple evaluation parameters and the expected parameter are obtained, as shown in Table 1.
Table 1: Distances between the evaluation parameters of the pruned third deep learning models and the expected parameters
Trial removal | Model size | Size distance | Running time | Time distance
Layer 1, 1st filter removed | 9.8 MB | 4.8 MB | 30 ms | 12 ms
Layer 1, 2nd filter removed | 9.9 MB | 4.9 MB | 34.7 ms | 16.7 ms
... | ... | ... | ... | ...
Layer N, n-th filter removed | 6.6 MB | 1.6 MB | 40.1 ms | 22.1 ms
(Expected parameters: model size 5 MB, running time 18 ms.)
The distances between the multiple evaluation parameters and the expected parameter are sorted, and the first parameters to be pruned can be determined from the distance ranking. Referring to Table 1: the first parameters to prune can be determined by ranking on the model-size distance or the running-time distance alone; alternatively, the model-size and running-time distances can be weighted to obtain a combined distance that considers the influence of both model size and running time when determining the first parameters to prune.
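A sketch of this ranking, using the example values from Table 1, is given below; the 0.5/0.5 weights of the combined distance are an assumption made for illustration.

```python
candidates = [            # (size in MB, runtime in ms) after one trial removal
    ("layer1-filter1", 9.8, 30.0),
    ("layer1-filter2", 9.9, 34.7),
    ("layerN-filtern", 6.6, 40.1),
]
expected_size, expected_time = 5.0, 18.0
w_size, w_time = 0.5, 0.5  # hypothetical weighting of the two distances

def combined_distance(size, time):
    return w_size * abs(size - expected_size) + w_time * abs(time - expected_time)

ranked = sorted(candidates, key=lambda c: combined_distance(c[1], c[2]))
for name, size, time in ranked:
    print(name, round(combined_distance(size, time), 2))
```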
Those skilled in the art should understand that the above example is merely illustrative; the evaluation parameters may also be the running accuracy, running speed and so on of the deep learning model, which is not limited in the embodiments of the present application.
It can be seen from the above embodiment that removing different specified numbers of first parameters from the first deep learning model to obtain multiple third deep learning models and then, from the evaluation parameters of these models combined with the expected parameter, determining the first parameters to prune is goal-directed, simple and effective, yielding deep learning models that meet the requirements of different platforms or application scenarios, and also overcoming the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
In addition, for step 102 of pruning the first deep learning model according to the expected parameter to obtain the second deep learning model, steps 201 to 204 and steps 701 to 703 give different pruning methods. Those skilled in the art should understand that the specified number in steps 202 and 701 can be any integer; that is, for each neural network layer of the first deep learning model, no first parameter need be pruned, or multiple first parameters may be removed at once. In other words, each pass over the first neural network layer of the first deep learning model that needs pruning may remove one group of first parameters (a group containing one or more first parameters). The errors of multiple feature maps whose sizes do not change before and after pruning are then obtained, and whether each group of first parameters is pruned is determined from these errors. The number of first parameters in each group may be the same or different, which is not limited in the embodiments of the present application.
In some embodiments, the specified number in steps 202 and 701 is 1. That is, each pass removes one first parameter from the first neural network layer of the first deep learning model that needs pruning, the first parameters including at least neurons, vectors, convolution kernels or filters.
It can be seen from the above embodiment that removing one first parameter at a time from the first neural network layer that needs pruning makes it possible to determine precisely how pruning each first parameter affects the performance of the first deep learning model, so that the first parameters subsequently selected for pruning are those with less impact on performance, ensuring that while the first deep learning model is slimmed down, its performance does not degrade significantly.
After the first deep learning model is pruned, the second deep learning model is obtained; then, through step 103, jointly training the first and second deep learning models can achieve accuracy recovery. The joint training used in step 103 can be implemented in many specific ways, for example based on the distillation technique in the related art, which is not limited in the embodiments of the present application.
Step 102, pruning the first deep learning model according to the expected parameter, can also be implemented with reference to the related art. In some embodiments, pruning the first deep learning model according to the expected parameter includes:
automatically pruning the first deep learning model based on NAS. NAS, Neural Architecture Search, is a technique for designing neural networks automatically: an algorithm automatically designs a high-performance network structure from a sample set, in some tasks even matching the level of human experts and sometimes discovering network structures humans had not previously proposed, which can effectively reduce the cost of using and implementing neural networks.
FIG. 8 shows the principle of the NAS approach. Given a set of candidate neural network structures called the search space, a network structure is searched out of the search space according to a preset search strategy and assessed for quality according to a preset performance evaluation policy, thereby determining whether the searched-out structure is the optimal one. The preset performance evaluation policy measures the structure with certain metrics, such as running accuracy and running speed; this is called performance evaluation.
For the embodiments of the present application, the search space may be the set of all neural network structures contained in the first neural network. The performance evaluation policy may set the required running speed, running accuracy and size of the searched-out structure according to the requirements of the deployment platform and application scenario. Based on NAS, network structures can be assembled automatically until a structure satisfies the performance evaluation policy, at which point the pruning of the first deep learning model is complete.
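As a highly simplified illustration of the NAS idea, the sketch below randomly samples pruning configurations from a small search space and keeps the first one that passes a stand-in performance evaluation policy; a real implementation would evaluate the pruned model's actual size, speed and accuracy, and the search space, budget and strategy here are assumptions for the example.

```python
import random

search_space = [0.0, 0.25, 0.5, 0.75]  # candidate per-layer pruning ratios
layer_sizes = [64, 128, 256, 512]

def evaluate(config):
    """Stand-in performance evaluation policy: keep configs under a size budget."""
    kept = sum(int(n * (1 - r)) for n, r in zip(layer_sizes, config))
    return kept <= 480  # e.g. at most half of the original parameters remain

random.seed(0)
for _ in range(1000):                   # search strategy: random sampling
    config = [random.choice(search_space) for _ in layer_sizes]
    if evaluate(config):
        print("selected structure:", config)
        break
```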
Based on NAS, the first deep learning model can be pruned to obtain the second deep learning model. Then, through step 103, jointly training the first and second deep learning models can achieve accuracy recovery. The joint training used in step 103 can be implemented in many specific ways, for example based on the distillation technique in the related art, which is not limited in the embodiments of the present application.
It can be seen from the above embodiments that, with the method provided by the embodiments of the present application, after a pruned deep learning model is obtained from the first deep learning model, instead of retraining, the first deep learning model and the pruned model are trained jointly to recover the accuracy of the pruned model, yielding deep learning models that meet the requirements of different platforms or application scenarios. This overcomes the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
Corresponding to the method for acquiring a deep learning model described above, an embodiment of the present application further provides another method for acquiring a deep learning model. As shown in FIG. 9, the method includes:
Step 901: acquiring the first deep learning model and a desired pruning amount of the deep learning model;
Step 902: determining, according to the desired pruning amount, the pruning amount of a first parameter of each neural network layer of the first deep learning model, the first parameter including at least a neuron, a vector, a convolution kernel or a filter;
Step 903: each time removing a different specified number of first parameters from a first neural network layer of the first deep learning model that needs pruning, and obtaining a first feature map output by a second neural network layer after the first neural network layer, the second neural network layer being a neural network layer whose output feature map size does not change before and after the specified number of first parameters are removed from the first neural network layer;
Step 904: obtaining errors between multiple first feature maps and second feature maps corresponding to the first feature maps, a second feature map being the feature map output by the second neural network layer before the specified number of first parameters were removed from the first neural network layer;
Step 905: determining, based on the desired pruning amount and the multiple errors, the first parameters to be pruned;
Step 906: pruning the first parameters to be pruned to obtain a second target deep learning model.
The first deep learning model may be trained on a platform with strong computing capability based on an existing or self-developed deep learning framework, or it may be a deep learning model trained by a third party and obtained elsewhere; of course, other ways of obtaining it are possible, which is not limited in the embodiments of the present application.
The desired pruning amount of the deep learning model may be a preset fixed amount, i.e., regardless of the performance of the first deep learning model, a fixed ratio or fixed number of first parameters is pruned from every first deep learning model, where the performance includes the size, running speed and running accuracy of the first deep learning model. It may also be a preset amount related to the performance parameters of the first deep learning model: for example, for a first deep learning model smaller than a first threshold, a first ratio or first number of first parameters is pruned; for one between the first and second thresholds, a second ratio or second number; for one between the second and third thresholds, a third ratio or third number; and so on. The embodiments of the present application do not limit how the desired pruning amount of the deep learning model is obtained.
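A minimal sketch of such a threshold-based preset rule follows; the size brackets and ratios are hypothetical values chosen for the example.

```python
def expected_prune_ratio(model_size_mb: float) -> float:
    """Hypothetical preset rule: the desired pruning ratio depends on which
    size bracket the first deep learning model falls into."""
    thresholds = [(10.0, 0.2), (50.0, 0.4), (200.0, 0.6)]  # (upper bound MB, ratio)
    for bound, ratio in thresholds:
        if model_size_mb < bound:
            return ratio
    return 0.8  # very large models are pruned most aggressively

print(expected_prune_ratio(10.0))  # -> 0.4
```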
In some embodiments, the desired pruning amount is determined from an expected parameter characterizing deep learning model performance, the deep learning model performance including at least the size, running speed and running accuracy of the deep learning model. That is, in these embodiments the desired pruning amount is not a preset amount but is determined from the expected parameter. The expected parameter may be obtained by being preset, by querying the target platform on which the second target deep learning model is to be deployed, or from a third party, which is not limited in the embodiments of the present application.
In some embodiments, the expected parameter characterizing deep learning model performance may be determined based on the computing capability of the platform on which the second target deep learning model is to be deployed, or based on the application scenario of the second target deep learning model. For the relevant content, see the corresponding parts of the first method for acquiring a deep learning model provided by the embodiments of the present application, which are not repeated here.
In some embodiments, determining the desired pruning amount from the expected parameter characterizing deep learning model performance includes:
Step 9011: determining, according to the expected parameter, the ratio at which the first parameters of each neural network layer of the first deep learning model are to be pruned;
Step 9012: determining, according to the ratio and the first deep learning model, the number of first parameters of each neural network layer of the first deep learning model to be pruned.
Steps 9011 and 9012 of this embodiment are similar to steps 2011 and 2012 in the first method for acquiring a deep learning model provided by the embodiments of the present application; the relevant content has been introduced in detail above and is not repeated here.
It can be seen from the above embodiment that determining the number of first parameters to prune from each neural network layer based on the expected parameter and the first deep learning model requires no developer intervention, can be automated, and is simple, convenient and easy to implement.
In some embodiments, step 902 of determining, according to the desired pruning amount, the pruning amount of the first parameters of each neural network layer of the first deep learning model may include:
Step 9021: based on a preset allocation strategy, allocating different pruning amounts to each neural network layer of the first deep learning model, the different pruning amounts being such that, after the first deep learning model is pruned, the total pruning amount of the resulting deep learning model is within a preset error range of the desired pruning amount; or,
Step 9022: allocating the same pruning amount to each of the multiple neural network layers of the first deep learning model, the same pruning amount being such that, after the first deep learning model is pruned, the total pruning amount of the resulting deep learning model is within a preset error range of the desired pruning amount.
Step 9021 of this embodiment is similar to step 2011b in the first method for acquiring a deep learning model provided by the embodiments of the present application, and step 9022 is similar to step 2011d in that method. The relevant content has been introduced in detail above and is not repeated here.
It can be seen from this embodiment that, given the desired pruning amount, using step 9021 to determine the pruning amount of the first parameters of each neural network layer makes it possible to trade off between prioritizing the size of the deep learning model and prioritizing its running speed, so that the pruned model can meet the application requirements of different platforms and different application scenarios; using step 9022 to determine the pruning amount is computationally simple, easy to implement, and saves computing resources.
In some embodiments, step 905 of determining, based on the desired pruning amount and the multiple errors, the first parameters to be pruned includes:
Step 9051: sorting the multiple errors;
Step 9052: based on the sorting result, retaining the desired pruning amount's worth of first parameters with the smallest errors.
Steps 9051 and 9052 of this embodiment are similar to steps 2041 and 2042 in the first method for acquiring a deep learning model provided by the embodiments of the present application; the relevant content has been introduced in detail above and is not repeated here.
It can be seen from the above embodiment that sorting the multiple errors, using the sorting result to measure the influence of each removed first parameter on the first deep learning model, and determining from the sorting result which first parameters to remove is sound and effective, computationally cheap, and easy to implement.
As in the first method for acquiring a deep learning model described above, the specified number in step 903 can be any integer; that is, for each neural network layer of the first deep learning model, no first parameter need be pruned, or multiple first parameters may be removed at once, which is not limited in the embodiments of the present application.
In some embodiments, the specified number in step 903 is 1. That is, each pass removes one first parameter from the first neural network layer of the first deep learning model that needs pruning, the first parameters including at least neurons, vectors, convolution kernels or filters.
It can be seen from the above embodiment that removing one first parameter at a time from the first neural network layer that needs pruning makes it possible to determine precisely how pruning each first parameter affects the performance of the first deep learning model, so that the first parameters subsequently selected for pruning are those with less impact on performance, ensuring that while the first deep learning model is slimmed down, its performance does not degrade significantly.
In some embodiments, the error between the first feature map and the second feature map may be determined based on the distance between the first and second feature maps; the distance may be the Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, Mahalanobis distance and so on, which is not limited in the embodiments of the present application.
It can be seen from the above embodiment that using distance to measure the error between the first feature map and the second feature map is convenient to compute and easy to determine.
In some embodiments, as shown in FIG. 10, the second method for acquiring a deep learning model provided by the embodiments of the present application further includes: training the second target deep learning model obtained at step 906, based on the same training data and loss function as the first deep learning model, to obtain a deep learning model with recovered accuracy.
Of course, those skilled in the art should understand that, as described above, the parameters of the first deep learning model can also be fixed and the first deep learning model and the second target deep learning model trained jointly to obtain a deep learning model with recovered accuracy.
It can be seen from the above embodiments that pruning the pre-acquired first deep learning model based on the desired pruning amount to obtain the second target deep learning model, and then recovering the accuracy of the second target deep learning model, makes the finally obtained deep learning model, compared with the first deep learning model, not only lightweight but also highly accurate. Thus, with the method provided by the embodiments of the present application, deep learning models meeting the requirements of different platforms or application scenarios can be obtained from the first deep learning model, overcoming the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
Corresponding to the methods for acquiring a deep learning model provided by the embodiments of the present application, an embodiment of the present application further provides a deep learning model acquisition system as shown in FIG. 11. The system includes a first platform 1101 and a second platform 1102; the first platform 1101 is used to acquire the target deep learning model based on the acquisition methods provided in the foregoing embodiments of the present application, and the second platform 1102 is used to deploy the target deep learning model.
A platform includes at least one of the following: a server cluster, a server, a mobile terminal, and so on; of course, it may also be any other platform capable of acquiring or deploying the deep learning model, which is not limited in the embodiments of the present application.
It can be seen from the above embodiment that, with the deep learning model acquisition system provided by the embodiments of the present application, the target deep learning model can be acquired on the first platform and the second platform can deploy the target deep learning model acquired by the first platform, overcoming the waste of computing resources caused in the related art by the separate, repeated training of deep learning models for different platforms or application scenarios, as well as the inability to obtain a suitable deep learning model when computing resources are scarce.
Correspondingly, an embodiment of the present application further provides an apparatus corresponding to the method for acquiring a deep learning model. FIG. 12 is a hardware structure diagram of a deep learning model acquisition apparatus provided by an embodiment of the present application. The apparatus includes a memory 1201, a processor 1202, and a computer program stored in the memory and executable on the processor; when the processor executes the program, any method embodiment provided by the embodiments of the present application is implemented. The memory 1201 may be an internal storage unit of the deep learning model acquisition apparatus, such as a hard disk or memory of the device. The memory 1201 may also be an external storage device of the apparatus, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the device. Further, the memory 1201 may include both an internal storage unit and an external storage device of the apparatus. The memory is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output. When the stored program is executed, the processor 1202 calls the program stored in the memory 1201 to perform the methods of the foregoing embodiments, which have been described in detail above and are not repeated here.
Of course, those skilled in the art should understand that, depending on the actual functions of the deep learning model acquisition apparatus, it may also include other hardware, such as a network interface, which is not described further in the present application.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements all the embodiments of the above methods of the present application, which are not repeated here.
The computer-readable storage medium may be an internal storage unit of any electronic device, such as a hard disk or memory of the electronic device. It may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. It is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been or will be output.
Those of ordinary skill in the art can understand that all or part of the flows in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Specific embodiments of the present application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Other embodiments of the present application will readily occur to those skilled in the art from consideration of the specification and practice of the invention applied for here. The present application is intended to cover any variations, uses or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the art not applied for in the present application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present application is limited only by the appended claims.
The above are only preferred embodiments of the present application and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (22)

  1. A method for acquiring a deep learning model, characterized by comprising:
    acquiring a first deep learning model and an expected parameter characterizing deep learning model performance, the deep learning model performance including at least one of the following: the size, running speed and running accuracy of the deep learning model;
    pruning the first deep learning model according to the expected parameter to obtain a second deep learning model;
    fixing the parameters of the first deep learning model, and jointly training the first deep learning model and the second deep learning model to obtain a first target deep learning model that satisfies the expected parameter.
  2. The method according to claim 1, characterized in that the expected parameter characterizing deep learning model performance is determined based on the computing capability of the target platform on which the first target deep learning model is to be deployed and/or the application scenario of the first target deep learning model.
  3. The method according to claim 1, characterized in that pruning the first deep learning model according to the expected parameter comprises:
    determining, according to the expected parameter, the pruning amount of a first parameter of each neural network layer of the first deep learning model, the first parameter including at least a neuron, a vector, a convolution kernel or a filter;
    each time removing a different specified number of first parameters from a first neural network layer of the first deep learning model that needs pruning, and obtaining a first feature map output by a second neural network layer after the first neural network layer, the second neural network layer being a neural network layer whose output feature map size does not change before and after the specified number of first parameters are removed from the first neural network layer;
    obtaining errors between multiple first feature maps and second feature maps corresponding to the first feature maps, a second feature map being the feature map output by the second neural network layer before the specified number of first parameters were removed from the first neural network layer;
    determining, based on the pruning amount and the multiple errors, the first parameters to be pruned.
  4. The method according to claim 3, characterized in that determining, according to the expected parameter, the pruning amount of the first parameter of each neural network layer of the first deep model comprises:
    determining, according to the expected parameter, the ratio at which the first parameters of each neural network layer of the first deep learning model are pruned;
    determining, according to the ratio and the first deep learning model, the number of first parameters of each neural network layer of the first deep learning model to be pruned.
  5. The method according to claim 4, characterized in that determining the pruning ratio at which the first parameters of each neural network layer of the first deep learning model are pruned comprises:
    determining, according to the expected parameter, the total ratio at which the first parameters of the first deep learning model are pruned;
    based on a preset allocation strategy, allocating different pruning ratios to the multiple neural network layers of the first deep learning model, the different pruning ratios being such that, after the first deep learning model is pruned, the total pruning ratio of the resulting deep learning model is within a preset error range of the total ratio;
    or,
    determining, according to the expected parameter, the total ratio at which the first parameters of the first deep learning model are pruned;
    allocating the same pruning ratio to each of the multiple neural network layers of the first deep learning model, the same pruning ratio being such that, after the first deep learning model is pruned, the total pruning ratio of the resulting deep learning model is within a preset error range of the total ratio.
  6. The method according to claim 3, characterized in that determining, based on the pruning amount and the multiple errors, the first parameters to be pruned comprises:
    sorting the multiple errors;
    based on the sorting result, retaining the pruning amount's worth of first parameters with the smallest errors.
  7. The method according to claim 3, characterized in that the error between the first feature map and the second feature map corresponding to the first feature map is determined based on the distance between the first feature map and the second feature map.
  8. The method according to claim 1, characterized in that pruning the first deep learning model according to the expected parameter comprises:
    each time removing a different specified number of first parameters from multiple neural network layers of the first deep learning model to obtain multiple third deep learning models, the first parameters including at least neurons, vectors, convolution kernels or filters;
    acquiring evaluation parameters characterizing the performance of each third deep learning model, the performance including at least the size, running speed and running accuracy of the third deep learning model;
    determining, based on the evaluation parameters and the expected parameter, the first parameters of the first deep learning model to be pruned.
  9. The method according to claim 8, characterized in that determining, based on the evaluation parameters and the expected parameter, the first parameters of the first deep learning model to be pruned comprises:
    obtaining the distances between the evaluation parameters and the expected parameter;
    sorting the distances;
    determining, based on the distance sorting result, the first parameters to be pruned.
  10. The method according to claim 3 or 8, characterized in that the specified number is 1.
  11. The method according to claim 1, characterized in that pruning the first deep learning model according to the expected parameter comprises:
    automatically pruning the first deep learning model based on NAS.
  12. A method for acquiring a deep learning model, characterized by comprising:
    acquiring a first deep learning model and a desired pruning amount of the deep learning model;
    determining, according to the desired pruning amount, the pruning amount of a first parameter of each neural network layer of the first deep learning model, the first parameter including at least one of a neuron, a vector, a convolution kernel or a filter;
    each time removing a different specified number of first parameters from a first neural network layer of the first deep learning model that needs pruning, and obtaining a first feature map output by a second neural network layer after the first neural network layer, the second neural network layer being a neural network layer whose output feature map size does not change before and after the specified number of first parameters are removed from the first neural network layer;
    obtaining errors between multiple first feature maps and second feature maps corresponding to the first feature maps, a second feature map being the feature map output by the second neural network layer before the specified number of first parameters were removed from the first neural network layer;
    determining, based on the desired pruning amount and the multiple errors, the first parameters to be pruned;
    pruning the first parameters to be pruned to obtain a second target deep learning model.
  13. The method according to claim 12, characterized in that the desired pruning amount is determined according to an expected parameter characterizing deep learning model performance, the deep learning model performance including at least the size, running speed and running accuracy of the deep learning model.
  14. The method according to claim 13, characterized in that determining the desired pruning amount according to the expected parameter characterizing deep learning model performance comprises:
    determining, according to the expected parameter, the ratio at which the first parameters of each neural network layer of the first deep learning model are pruned;
    determining, according to the ratio and the first deep learning model, the number of first parameters of each neural network layer of the first deep learning model to be pruned.
  15. The method according to claim 12, characterized in that determining, according to the desired pruning amount, the pruning amount of the first parameter of each neural network layer of the first deep learning model comprises:
    based on a preset allocation strategy, allocating different pruning amounts to each neural network layer of the first deep learning model, the different pruning amounts being such that, after the first deep learning model is pruned, the total pruning amount of the resulting deep learning model is within a preset error range of the desired pruning amount;
    or,
    allocating the same pruning amount to each of the multiple neural network layers of the first deep learning model, the same pruning amount being such that, after the first deep learning model is pruned, the total pruning amount of the resulting deep learning model is within a preset error range of the desired pruning amount.
  16. The method according to claim 12, characterized in that determining, based on the desired pruning amount and the multiple errors, the first parameters to be pruned comprises:
    sorting the multiple errors;
    based on the sorting result, retaining the desired pruning amount's worth of first parameters with the smallest errors.
  17. The method according to claim 12, characterized in that the specified number is 1.
  18. The method according to claim 12, characterized in that the error between the first feature map and the second feature map corresponding to the first feature map is determined based on the distance between the first feature map and the second feature map.
  19. The method according to claim 12, characterized in that the method further comprises:
    training the second target deep learning model based on the same training data and loss function as the first deep learning model.
  20. A deep learning model acquisition system, characterized in that the system comprises a first platform and a second platform;
    the first platform is configured to acquire the target deep learning model based on the method according to any one of claims 1 to 19;
    the second platform is configured to deploy the target deep learning model;
    each platform includes at least one of the following: a server cluster, a server, a mobile terminal.
  21. A deep learning model acquisition apparatus, characterized in that the apparatus comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any one of claims 1 to 19 when executing the program.
  22. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a number of computer instructions which, when executed, implement the steps of the method according to any one of claims 1 to 19.
PCT/CN2021/083129 2021-03-26 2021-03-26 Method, system and apparatus for acquiring deep learning model, and storage medium WO2022198606A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083129 WO2022198606A1 (zh) 2021-03-26 2021-03-26 Method, system and apparatus for acquiring deep learning model, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083129 WO2022198606A1 (zh) 2021-03-26 2021-03-26 Method, system and apparatus for acquiring deep learning model, and storage medium

Publications (1)

Publication Number Publication Date
WO2022198606A1 true WO2022198606A1 (zh) 2022-09-29

Family

ID=83396229

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083129 WO2022198606A1 (zh) 2021-03-26 2021-03-26 Method, system and apparatus for acquiring deep learning model, and storage medium

Country Status (1)

Country Link
WO (1) WO2022198606A1 (zh)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334934A (zh) * 2017-06-07 2018-07-27 北京深鉴智能科技有限公司 Convolutional neural network compression method based on pruning and distillation
CN109657780A (zh) * 2018-06-15 2019-04-19 清华大学 Model compression method based on pruning-order active learning
CN109711528A (zh) * 2017-10-26 2019-05-03 北京深鉴智能科技有限公司 Method for pruning convolutional neural network based on feature map changes
CN110555417A (zh) * 2019-09-06 2019-12-10 福建中科亚创动漫科技股份有限公司 Video image recognition system and method based on deep learning
CN110633747A (zh) * 2019-09-12 2019-12-31 网易(杭州)网络有限公司 Compression method and apparatus for target detector, medium and electronic device
US20200104716A1 (en) * 2018-08-23 2020-04-02 Samsung Electronics Co., Ltd. Method and system with deep learning model generation
CN111091177A (zh) * 2019-11-12 2020-05-01 腾讯科技(深圳)有限公司 Model compression method and apparatus, electronic device and storage medium
CN111695375A (zh) * 2019-03-13 2020-09-22 上海云从企业发展有限公司 Face recognition model compression algorithm based on model distillation, medium and terminal
CN112016674A (zh) * 2020-07-29 2020-12-01 魔门塔(苏州)科技有限公司 Quantization method for convolutional neural network based on knowledge distillation


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932231

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932231

Country of ref document: EP

Kind code of ref document: A1