CN114358254A - Model processing method and related product - Google Patents


Info

Publication number
CN114358254A
Authority
CN
China
Prior art keywords
pruning
processing
processing layer
layer
network model
Prior art date
Legal status
Pending
Application number
CN202210010948.XA
Other languages
Chinese (zh)
Inventor
刘瑞
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210010948.XA priority Critical patent/CN114358254A/en
Publication of CN114358254A publication Critical patent/CN114358254A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a model processing method and a related product, which can be used in the fields of intelligent traffic and automatic driving. The model processing method comprises the following steps: acquiring a deep network model to be processed, wherein the deep network model to be processed comprises N processing layers, any processing layer comprises one or more pruning objects, and N is a positive integer; acquiring prior knowledge of a pruning structure corresponding to the deep network model to be processed, wherein the prior knowledge of the pruning structure comprises a distribution rule presented by the N pruning rates of the N processing layers; performing pruning processing on the pruning objects contained in each processing layer according to the prior knowledge of the pruning structure to obtain a simplified deep network model; and outputting the simplified deep network model. By the method and the device, the amount of computation of the deep learning model and the storage space occupied by the model can both be reduced.

Description

Model processing method and related product
Technical Field
The application relates to the field of intelligent transportation and automatic driving, in particular to a model processing method and a related product.
Background
Deep learning has become one of the most mainstream branches of machine learning. However, as deep learning keeps being optimized, model depth has grown from a few layers to hundreds of layers, and the amount of computation has grown with it. For a terminal device with limited computing power, directly deploying a deep learning model raises problems such as the amount of computation and the size of the storage space the model occupies. Moreover, in specific fields (for example, the medical field or the high-frequency video field), image resolution has reached 2k × 2k or even 5k × 5k, and the increase in image resolution further increases the amount of model computation.
Therefore, how to reduce the calculation amount of the deep learning model and reduce the memory space occupied by the model becomes an important problem to be solved urgently.
Disclosure of Invention
The embodiment of the application provides a model processing method and a related product, which can reduce the calculation amount of a deep learning model and reduce the storage space occupied by the model.
An aspect of the present embodiment provides a model processing method, including:
acquiring a deep network model to be processed, wherein the deep network model to be processed comprises N processing layers, any processing layer comprises one or more pruning objects, and N is a positive integer;
acquiring prior knowledge of a pruning structure corresponding to the deep network model to be processed, wherein the prior knowledge of the pruning structure comprises a distribution rule presented by the N pruning rates of the N processing layers;
performing pruning processing on the pruning objects contained in each processing layer according to the prior knowledge of the pruning structure to obtain a simplified deep network model;
and outputting the simplified deep network model.
An aspect of an embodiment of the present application provides a model processing apparatus, including:
the device comprises an acquisition module, a pruning module and an output module, wherein the acquisition module is used for acquiring a deep network model to be processed, the deep network model to be processed comprises N processing layers, any processing layer comprises one or more pruning objects, and N is a positive integer;
the acquisition module is further configured to acquire prior knowledge of a pruning structure corresponding to the deep network model to be processed, where the prior knowledge of the pruning structure includes a distribution rule presented by the N pruning rates of the N processing layers;
the pruning module is used for respectively performing pruning processing on the pruning objects contained in each processing layer according to the prior knowledge of the pruning structure to obtain a simplified deep network model;
and the output module is used for outputting the simplified deep network model.
An aspect of the embodiments of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute the method in the foregoing embodiments.
An aspect of the embodiments of the present application provides a computer storage medium, in which a computer program is stored, where the computer program includes program instructions, and when the program instructions are executed by a processor, the method in the foregoing embodiments is performed.
An aspect of the embodiments of the present application provides a computer program product, where the computer program product includes a computer program/instruction, where the computer program/instruction is stored in a computer-readable storage medium, and when the computer program/instruction is executed by a processor of a computer device, the computer program/instruction performs the method in the foregoing embodiments.
According to the method, the deep network model is pruned and the redundant part of the model is cut off, so that the storage space occupied by the model is compressed and fewer parameters take part in the forward propagation calculation; the amount of computation can therefore be reduced and the running efficiency of the model improved. Moreover, each processing layer is pruned according to the prior knowledge, namely the distribution rule presented by the N pruning rates; compared with randomly pruning the pruning objects of each processing layer, this both simplifies the model and preserves the precision of the pruned model. Furthermore, compared with modifying the loss function of the model, performing sparse training and then determining the objects to prune in each layer, pruning each processing layer according to the prior knowledge is simpler and more flexible, which helps broaden the application range of the method.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a system architecture diagram of a model process provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of a model process provided by an embodiment of the present application;
fig. 3a is a schematic diagram of a parabolic distribution rule provided in an embodiment of the present application;
FIG. 3b is a schematic diagram of a straight-line distribution rule provided in an embodiment of the present application;
FIG. 3c is a schematic diagram of a logarithmic distribution rule provided in an embodiment of the present application;
FIG. 3d is a schematic diagram of an exponential distribution rule provided in the embodiment of the present application;
FIG. 4 is a schematic diagram of determining a priori knowledge of a pruning structure according to an embodiment of the present application;
FIG. 5 is a graph illustrating pruning rate provided by an embodiment of the present application;
FIG. 6 is a schematic flow chart of a cropping model provided in an embodiment of the present application;
fig. 7 is a first schematic flowchart of a pruning process provided in an embodiment of the present application;
FIG. 8 is a diagram illustrating a clipping convolution kernel according to an embodiment of the present application;
FIG. 9 is a diagram illustrating a clipping convolution kernel channel according to an embodiment of the present application;
FIG. 10 is a second schematic flowchart of a pruning process provided in an embodiment of the present application;
FIG. 11 is a diagram illustrating a cropping zoom factor provided by an embodiment of the present application;
FIG. 12 is a third schematic flowchart of a pruning process provided in an embodiment of the present application;
FIGS. 13 a-13 b are schematic diagrams of clipping connection weights provided by embodiments of the present application;
fig. 14 is a fourth schematic flowchart of a pruning process provided in an embodiment of the present application;
FIGS. 15 a-15 b are schematic diagrams of a trimmed neuron according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of a model processing apparatus according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level technology and software-level technology. The model processing method provided by the application belongs to software-level artificial intelligence technology; it achieves the purpose of simplifying a model by pruning an existing model to obtain a model with fewer parameters, thereby reducing the storage space occupied by the model and reducing the amount of model computation.
The method and the device can be applied to various acceleration scenarios of deep network models; for example, they can be applied to accelerating a deep network model whose input data are super-resolution images. Optimizing the structure of the deep network model (that is, pruning the model) can greatly reduce the size of the model and speed up the model's processing of super-resolution images. The deep network model is a model constructed by imitating an actual human neural network, and may be a BP (Back Propagation) neural network, a convolutional neural network (CNN) model, or the like.
Please refer to fig. 1, which is a system architecture diagram of a model process according to an embodiment of the present application. The model reduction server 10f establishes a connection, through the switch 10e and the communication bus 10d, with a service server cluster that stores deep network models, and the service server cluster may include: the service server 10a, the service server 10b and the service server 10c, where the deep network model stored in each service server corresponds to one service. Taking the service server 10a as an example, when the deep network model in the service server 10a needs to be pruned, the service server 10a may send the deep network model to be processed and the prior knowledge of the pruning structure of the model to the model reduction server 10f. The model reduction server 10f prunes each processing layer of the deep network model to be processed according to the prior knowledge of the pruning structure to obtain a reduced deep network model, and then fine-tunes the reduced deep network model to obtain a target deep network model. Subsequently, the model reduction server 10f may return the target deep network model to the service server 10a, the service server 10a may replace the deep network model originally stored in the server with the target deep network model, and the service server 10a then provides its service based on the new target deep network model.
The servers (including the service server and the thin server) may be independent physical servers, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be cloud servers providing basic cloud computing services such as cloud services, a cloud database, cloud computing, cloud functions, cloud storage, Network services, cloud communication, middleware services, domain name services, security services, Content Delivery Networks (CDNs), smart traffic platforms, auto-driving clouds, big data and artificial intelligence platforms, and the like.
Please refer to fig. 2, which is a schematic flow diagram of a model process provided in an embodiment of the present application, where the model process in the present application mainly refers to pruning a model, and the following embodiment is described with a server as an execution subject, where the model process may include the following steps:
step S201, obtaining a deep network model to be processed, wherein the deep network model to be processed comprises N processing layers, any processing layer comprises one or more pruning objects, and N is a positive integer.
Specifically, the server obtains a deep network model to be processed, where the deep network model to be processed may be a BP network model or a convolutional neural network model.
The deep network model to be processed may include 1 input layer and N processing layers, each processing layer including one or more pruning objects.
When the deep network model to be processed is a BP network model, the processing layer may be a hidden layer, and the pruning object may be a connection weight of the hidden layer or a neuron of the hidden layer; when the deep network model to be processed is a convolutional neural network model, the processing layer may be a convolutional layer or a Batch Normalization (BN) layer, and the pruning object may be a convolution kernel in the convolutional layer or a scaling factor γ of the normalization layer.
Step S202, obtaining the prior knowledge of the pruning structure corresponding to the deep network model to be processed, wherein the prior knowledge of the pruning structure comprises the distribution rule presented by the N pruning rates of the N processing layers.
Specifically, the pruning rate refers to the proportion of the number of pruned objects to the total number of pruning objects; the pruning rate of a processing layer is the ratio of the number of pruning objects cut from that processing layer to the total number of pruning objects in that processing layer.
The prior knowledge of the pruning structure means that the N pruning rates of the N processing layers present a regular distribution. The distribution rule may include: a parabolic distribution rule, a straight-line distribution rule, a logarithmic distribution rule, and an exponential distribution rule. In plain terms, the prior knowledge of the pruning structure describes the overall distribution of the N pruning rates, but does not specify the pruning rate of any individual processing layer.
If the distribution rule is the downward-opening parabola distribution rule, the N pruning rates first increase steadily up to a vertex and then decrease steadily as the processing layers deepen in the deep network model to be processed. Referring to fig. 3a, fig. 3a is a schematic diagram of a parabolic distribution rule provided in an embodiment of the present application; as shown in fig. 3a, as the processing layers deepen, the pruning rate first becomes larger and then smaller.
If the distribution rule is the straight-line distribution rule, the N pruning rates basically remain unchanged as the processing layers deepen in the deep network model to be processed; or the N pruning rates keep increasing at a constant rate as the processing layers deepen; or the N pruning rates keep decreasing at a constant rate as the processing layers deepen. Referring to fig. 3b, fig. 3b is a schematic diagram of a straight-line distribution rule provided in an embodiment of the present application; as shown in fig. 3b, the pruning rate remains unchanged as the processing layers deepen in the deep network model to be processed.
If the distribution rule is the logarithmic distribution rule, the N pruning rates keep increasing as the processing layers deepen in the deep network model to be processed, and the increase is large at first and then becomes smaller. Referring to fig. 3c, fig. 3c is a schematic diagram of a logarithmic distribution rule provided in an embodiment of the present application; as shown in fig. 3c, the pruning rate keeps increasing as the processing layers deepen in the deep network model to be processed, and the amplitude of the increase is large at first and then becomes smaller.
If the distribution rule is the exponential distribution rule, the N pruning rates keep decreasing as the processing layers deepen in the deep network model to be processed, and the decrease is large at first and then becomes smaller. Referring to fig. 3d, fig. 3d is a schematic diagram of an exponential distribution rule provided in an embodiment of the present application; as shown in fig. 3d, the pruning rate keeps decreasing as the processing layers deepen in the deep network model to be processed, and the amplitude of the decrease is large at first and then becomes smaller.
Referring to fig. 4, fig. 4 is a schematic diagram of determining the prior knowledge of a pruning structure according to an embodiment of the present application. To verify the validity of the prior knowledge of the pruning structure, several groups of experiments were carried out on an image classification task, a target detection task and a semantic segmentation task. Taking as an example that the deep network model to be processed is a YOLO-V3 model, the N processing layers are all the convolutional layers in the YOLO-V3 model, and the pruning objects are convolution kernels: after the YOLO-V3 model is pruned by a network weight-reduction pruning method, the numbers of the convolutional layers in the YOLO-V3 model are taken as the horizontal axis and the pruning rate as the vertical axis. As shown in fig. 4, each pruning-rate distribution follows a downward-opening parabola distribution rule, and the analytic expression of the parabola can be written as y = a(x − h)^2 + k, where x = h is the axis of symmetry and h is the median of the convolutional-layer numbers. This distribution rule is called the prior knowledge of the pruning structure, that is, the model structure after pruning is known a priori before pruning.
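As an illustration of how such a prior could be extracted, the short sketch below fits observed per-layer pruning rates with the downward-opening parabola y = a(x − h)^2 + k described above; the layer numbers, the sample rates and the use of NumPy are illustrative assumptions rather than details taken from the patent.

```python
# Hypothetical per-layer pruning rates observed after pruning a trained model.
import numpy as np

layer_ids = np.arange(1, 11)                       # numbers of the convolutional layers
observed_rates = np.array([0.35, 0.50, 0.62, 0.70, 0.74,
                           0.73, 0.68, 0.60, 0.48, 0.33])

h = np.median(layer_ids)                           # axis of symmetry x = h (median layer number)
# Least-squares fit of a and k in y = a * (x - h)**2 + k.
A = np.stack([(layer_ids - h) ** 2, np.ones(len(layer_ids))], axis=1)
(a, k), *_ = np.linalg.lstsq(A, observed_rates, rcond=None)

print(f"fitted parabola: y = {a:.4f}*(x - {h})^2 + {k:.4f}")   # a < 0, i.e. it opens downward
```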
And step S203, according to the prior knowledge of the pruning structure, performing pruning processing on the pruning objects contained in each processing layer to obtain a simplified deep network model.
Specifically, the server may divide the N processing layers into M stages according to the positions of the processing layers in the deep network model to be processed, where M is not greater than N and M is a positive integer. The number of processing layers contained in each stage is basically the same, or the numbers of processing layers contained in the stages are different.
Further, the server may divide the N processing layers into M stages according to the positions of the processing layers in the deep network model to be processed and the prior knowledge of the pruning structure. For example, if the distribution rule is the parabolic distribution rule or the straight-line distribution rule, uniform division may be adopted; if the distribution rule is the logarithmic distribution rule or the exponential distribution rule, non-uniform division may be adopted.
Although the prior knowledge of the pruning structure determines the distribution rule of the N pruning rates, the pruning rate of each processing layer depends on the mathematical analytic expression of the distribution rule. The prior knowledge of the pruning structure therefore further comprises the mathematical analytic expression of the distribution rule; the server determines the pruning rate of each stage according to the mathematical analytic expression and takes the pruning rate of a stage as the pruning rate of every processing layer contained in that stage. Processing layers belonging to the same stage share one pruning rate.
When the distribution rule is the downward-opening parabola distribution rule among the parabolic distribution rules, the prior knowledge of the pruning structure further comprises the mathematical analytic expression of the downward-opening parabola distribution rule and the maximum pruning rate Cm, together with the value range of the maximum pruning rate Cm; the expression and the value ranges are given only as formula images in the original publication. Ct represents the pruning rate of the t-th stage, and Step is a constant that may be set to an integer multiple of 1/16. From the mathematical analytic expression it can be seen that the pruning rate of the middle stage is the maximum pruning rate Cm, and that the difference between the pruning rate of the t-th stage and the maximum pruning rate Cm is determined by the distance between the t-th stage and the middle stage.
Referring to fig. 5, fig. 5 is a schematic diagram of a pruning rate according to an embodiment of the present application, in which the N processing layers are the convolutional layers in the deep network model to be processed and the pruning objects are convolution kernels. Assume the deep network model to be processed has 40 convolutional layers; after division into 5 stages (i.e. M = 5), each stage contains 8 convolutional layers. According to the mathematical analytic expression of the downward-opening parabola distribution rule (with Cm = 1), it can be determined that the pruning rate of the first stage (the first to the eighth convolutional layers) is 0.4, i.e. the pruning rates of the first to the eighth convolutional layers are all 0.4; the pruning rate of the second stage (the ninth to the sixteenth convolutional layers) is 0.7, i.e. the pruning rates of the ninth to the sixteenth convolutional layers are all 0.7; the pruning rate of the third stage (the seventeenth to the twenty-fourth convolutional layers) is 1, i.e. the pruning rates of the seventeenth to the twenty-fourth convolutional layers are all 1; the pruning rate of the fourth stage (the twenty-fifth to the thirty-second convolutional layers) is 0.7, i.e. the pruning rates of the twenty-fifth to the thirty-second convolutional layers are all 0.7; and the pruning rate of the fifth stage (the thirty-third to the fortieth convolutional layers) is 0.4, i.e. the pruning rates of the thirty-third to the fortieth convolutional layers are all 0.4.
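The analytic expression itself appears only as formula images in the original text, so the sketch below uses an assumed symmetric, linearly falling profile for the stage pruning rates; it is a hedged illustration that happens to reproduce the fig. 5 example (Cm = 1, 5 stages, Step = 0.3 giving 0.4, 0.7, 1.0, 0.7, 0.4), not the patent's exact formula.

```python
# Assign one pruning rate per stage so the rates peak at the middle stage and fall off
# symmetrically towards both ends, then share each stage's rate among its layers.
def stage_pruning_rates(num_stages, c_max, step):
    mid = (num_stages + 1) / 2                     # index of the middle stage
    return [round(c_max - step * abs(t - mid), 4) for t in range(1, num_stages + 1)]

def layer_pruning_rates(num_layers, num_stages, c_max, step):
    rates = stage_pruning_rates(num_stages, c_max, step)
    per_stage = -(-num_layers // num_stages)       # ceil division: layers per stage
    # Layers belonging to the same stage share that stage's pruning rate.
    return [rates[min(i // per_stage, num_stages - 1)] for i in range(num_layers)]

print(stage_pruning_rates(5, 1.0, 0.3))            # [0.4, 0.7, 1.0, 0.7, 0.4]
print(layer_pruning_rates(40, 5, 1.0, 0.3)[:10])   # first 8 layers get 0.4, the next ones 0.7
```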
After the pruning rate of each processing layer is determined, the server can prune the pruning objects of the N processing layers according to the pruning rate of each layer. Specifically, a description will be given taking one processing layer (referred to as the first processing layer) of the N processing layers as an example: the server determines the first pruning quantity according to the number of pruning objects contained in the first processing layer and the pruning rate of the first processing layer, and then prunes the pruning objects of the first processing layer according to the first pruning quantity, that is, cuts a first pruning quantity of pruning objects from the first processing layer, obtaining the first processing layer after pruning.
Of course, if the pruning object is a connection weight or a neuron, the pruning processing is to cut the connection weight or the neuron; if the pruning object is a convolution kernel, the pruning processing is cutting the convolution kernel; if the pruning object is a scaling factor, the pruning process is a clipping scaling factor.
The server can combine the N processing layers after pruning into a simplified deep network model, the simplified deep network model is a model obtained by simplifying the deep network model to be processed, and the storage space occupied by the model and the calculation amount are smaller than those of the deep network model to be processed.
And step S204, outputting the simplified deep network model.
Optionally, although pruning each processing layer according to the prior knowledge of the pruning structure does make the storage space occupied by the reduced deep network model smaller than that of the deep network model to be processed, the occupied storage space may still not meet the preset condition. The specific process is as follows:
The server obtains the storage space occupied by the pruned reduced deep network model. If the occupied storage space is not smaller than a preset storage-space threshold (for example, the threshold may be equal to 1M), the server takes the reduced deep network model as a new deep network model to be processed and continues to prune the new deep network model to be processed according to the prior knowledge of the pruning structure, obtaining a new reduced deep network model. The server then checks again whether the storage space occupied by the new reduced deep network model is smaller than the storage-space threshold, and the loop continues; once it detects that the storage space occupied by the reduced deep network model is smaller than the storage-space threshold, the reduced deep network model at that point meets the requirement and the server can output it.
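A minimal sketch of this loop is given below; the `prune_model` callback and the 1 MB threshold stand in for whatever pruning routine and storage-space threshold are actually configured, and serializing the state dict to a temporary file is just one way to measure the occupied storage space.

```python
import os
import tempfile

import torch

def storage_size_bytes(model: torch.nn.Module) -> int:
    """Approximate the storage space the model would occupy on disk."""
    with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
        path = f.name
    torch.save(model.state_dict(), path)
    size = os.path.getsize(path)
    os.remove(path)
    return size

def prune_until_small_enough(model, prune_model, threshold_bytes=1 << 20):
    # Keep treating the reduced model as a new model to be processed until it is small enough.
    while storage_size_bytes(model) >= threshold_bytes:
        model = prune_model(model)      # prune again with the same structural prior
    return model
```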
The deep network model to be processed is used for recognizing multimedia data, and the multimedia data may include: images, video, audio, and text. Subsequently, the server can train the output reduced deep network model with sample multimedia data until the trained reduced deep network model meets the model convergence condition. It should be noted that, because the deep network model to be processed is already a trained model from which pruning merely removes part of the parameters, training the reduced deep network model costs little before the model convergence condition is reached; in other words, the reduced deep network model only needs to be fine-tuned to converge.
Meeting the model convergence condition means that the number of training iterations reaches a preset threshold, or that the variation of the model parameters before and after training is smaller than a preset variation threshold, or that the recognition accuracy of the trained model on test multimedia data reaches a preset accuracy threshold.
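A hedged fine-tuning sketch consistent with the convergence conditions above is shown below; the data loader, the SGD optimizer, the learning rate and the 0.9 accuracy threshold are illustrative assumptions, not values taken from the patent.

```python
import torch

def fine_tune(model, train_loader, eval_fn, max_epochs=10, acc_threshold=0.9, lr=1e-4):
    """Fine-tune the reduced model until an epoch budget or an accuracy threshold is reached."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(max_epochs):                       # convergence condition 1: training budget
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
        if eval_fn(model) >= acc_threshold:           # convergence condition 2: test accuracy
            break
    return model
```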
The server can take the trained simplified deep network which meets the model convergence condition as a target deep network model. Subsequently, the server can perform recognition processing on the multimedia data based on the target deep network model, that is, the server can acquire the multimedia data to be recognized, and call the target deep network model to perform recognition processing on the multimedia data to be recognized, so as to obtain a recognition result of the multimedia data to be recognized.
It can be seen that, compared with the deep network model to be processed, some pruning objects have been cut from the target deep network model, so the target deep network model takes less time to perform forward propagation computation on the multimedia data to be recognized.
Referring to fig. 6, fig. 6 is a schematic flow chart of a clipping model provided in an embodiment of the present application, and the following embodiment is described by taking an example in which a deep network model to be processed is a convolutional neural network model, and N processing layers are convolutional layers in the convolutional neural network model, where the clipping model includes the following steps:
in step S301, the flow starts.
Step S302, reading the convolutional neural network model to be cut.
Step S303, numbers are distributed to convolutional layers in the convolutional neural network model.
Specifically, the server may regard each convolutional layer as a node in a graph, traverse from the output node to the input node, and construct a directed acyclic graph. The directed acyclic graph is topologically sorted, and the sorted convolutional layers are numbered sequentially from small to large. The purpose of the topological sorting here is to ensure that the numbering follows the direction of data flow from input to output. Of course, if the convolutional neural network model has a single-input single-output structure, the convolutional layers can also be numbered sequentially in a direct manner.
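The numbering step can be sketched as follows; the adjacency dictionary of convolutional layers is assumed to be given (building it from an actual model graph is outside the scope of this illustration), and Python's standard graphlib is used for the topological sort.

```python
from graphlib import TopologicalSorter

def number_conv_layers(edges: dict) -> dict:
    """edges maps each convolutional layer name to the set of layers it feeds into."""
    # graphlib expects predecessor sets, so invert the data-flow edges first.
    predecessors = {node: set() for node in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            predecessors.setdefault(dst, set()).add(src)
    order = TopologicalSorter(predecessors).static_order()
    # Number the layers in the order of data flow from input to output.
    return {layer: i + 1 for i, layer in enumerate(order)}

print(number_conv_layers({"conv1": {"conv2"}, "conv2": {"conv3"}, "conv3": set()}))
# {'conv1': 1, 'conv2': 2, 'conv3': 3}
```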
Step S304, the convolutional layers are divided into stages.
Specifically, the server divides the convolutional layers into stages according to the sizes of their numbers and assigns clipping rates by stage, with the layers in one stage sharing one clipping rate. In experiments, dividing the convolutional layers into 7 stages gave the best pruning effect. Therefore, when M is 7, each stage contains N/7 convolutional layers, rounded up if N/7 is not an integer. For example, with uniform division, 42 convolutional layers are divided into 7 stages, each containing 6 processing layers: the first stage contains processing layers 1-6, the second stage contains processing layers 7-12, the third stage contains processing layers 13-18, the fourth stage contains processing layers 19-24, and so on.
Step S305, a clipping rate is assigned to each stage.
Specifically, in a convolutional neural network the number of convolution kernels in each layer is generally a multiple of 8 and is at least 16, because such a setting favours inference acceleration of the model. To ensure that this premise still holds after pruning, the clipping rate assigned to each stage is set to an integer multiple of 1/16. Assume the distribution rule is the downward-opening parabola distribution rule; its mathematical analytic expression, together with the value range of the maximum pruning rate Cm, is given only as formula images in the original publication. Ct represents the pruning rate of the t-th stage, and Step is a constant that may be set to an integer multiple of 1/16. From the mathematical analytic expression it can be seen that the pruning rate of the middle stage is the maximum pruning rate Cm, and that the difference between the pruning rate of the t-th stage and the maximum pruning rate Cm is determined by the distance between the t-th stage and the middle stage. After the server determines the pruning rate of each stage according to the mathematical analytic expression, it takes the pruning rate of a stage as the pruning rate of every processing layer contained in that stage.
In step S306, pruning is performed on each convolutional layer.
And step S307, fine tuning the pruned convolutional neural network model.
After pruning, fine tuning training is carried out on the model, and the precision of the model can be recovered.
In step S308, the flow ends.
The method can significantly reduce the number of parameters of the network model and improve the inference speed of the network model, and can be applied to image classification tasks, target detection tasks, semantic segmentation and other tasks in the field of computer vision. Moreover, the method and the device prune the network model on the basis of the prior knowledge, and preserve the precision of the network model to the greatest extent on the premise of reducing the number of model parameters.
Referring to fig. 7, fig. 7 is a first schematic flowchart of a pruning process according to an embodiment of the present application, which mainly describes how the pruning objects of the first processing layer are pruned when the deep network model to be processed is a convolutional neural network model, the first processing layer is a convolutional layer (the first processing layer is one of the N processing layers, which may all be convolutional layers or may include both convolutional layers and normalization layers), and the pruning objects are convolution kernels. The pruning processing comprises the following steps:
step S701, determining a convolution kernel coefficient of each convolution kernel included in the first processing layer.
Specifically, the convolution kernel coefficient is the L1 norm or the L2 norm of the convolution kernel: the L1 norm is the sum of the absolute values of the elements in the convolution kernel, and the L2 norm is here taken as the sum of the squares of the elements in the convolution kernel.
Step S702, sorting the plurality of convolution kernel coefficients corresponding to the first processing layer in order from small to large.
Step S703, performing pruning processing on the pruning object of the first processing layer according to the sorting result and the first pruning quantity, to obtain the first processing layer after the pruning processing.
Specifically, it is judged whether the first processing layer is the first one of the N processing layers. If the first processing layer is the first one of the N processing layers, the server cuts out the convolution kernels corresponding to the smallest first-pruning-quantity convolution kernel coefficients, and the first processing layer after pruning is obtained.
For example, if the first processing layer is a first processing layer of the N processing layers, the first processing layer includes 4 convolution kernels, which are convolution kernel 1, convolution kernel 2, convolution kernel 3, and convolution kernel 4, respectively, and if the convolution kernel coefficient of convolution kernel 1 is 3, the convolution kernel coefficient of convolution kernel 2 is 4, the convolution kernel coefficient of convolution kernel 3 is 5, the convolution kernel coefficient of convolution kernel 4 is 1, and the first pruning number is 2, then the convolution kernel 4 and convolution kernel 1 are pruned, and the first processing layer including only convolution kernel 2 and convolution kernel 3 is used as the first processing layer after pruning.
If the first processing layer is not the first one of the N processing layers, the server still cuts out the convolution kernels corresponding to the smallest first-pruning-quantity convolution kernel coefficients, obtaining a convolutional layer to be processed.
The server then acquires the object positions of the clipped pruning objects in the processing layer preceding the first processing layer (called the second processing layer, which may be a convolutional layer or a normalization layer). For example, if the second processing layer contains 4 pruning objects and a certain pruning object is the first of the 4 pruning objects, the object position of that pruning object may be represented as 1000.
After the object positions of the pruned objects in the second processing layer are determined, the server clips, in the convolutional layer to be processed, the convolution kernel channels corresponding to those object positions, and the first processing layer after pruning is obtained. The reason is as follows: whether the second processing layer is a convolutional layer or a normalization layer, pruning its objects directly reduces the number of feature maps it outputs (which may also be called the number of channels); if the first processing layer behind it is a convolutional layer, the number of channels of its convolution kernels should then be adjusted accordingly. Comparing the two operation flows: when the first processing layer is the first one of the N processing layers, the channels of the convolution kernels do not need to be further clipped; when it is not, the channels of the convolution kernels do need to be further clipped.
For example, suppose the second processing layer and the first processing layer are both convolutional layers, the second processing layer originally contains 4 convolution kernels and the first processing layer originally contains 3 convolution kernels. Since the second processing layer contains 4 convolution kernels, the feature map output after processing by the second processing layer has 4 channels, so the number of channels of each convolution kernel of the first processing layer is 4, and the 3 convolution kernels of the first processing layer can be represented as: a1 × b1 × 4, a2 × b2 × 4, and a3 × b3 × 4. Suppose the clipped convolution kernels of the second processing layer are the first convolution kernel and the fourth convolution kernel, and the clipped convolution kernel of the first processing layer is the first convolution kernel. Then the first processing layer has 2 convolution kernels left, and 2 channels of each of these 2 convolution kernels need to be cut off (namely the first channel and the fourth channel), so the 2 convolution kernels of the first processing layer after clipping can be expressed as: a2 × b2 × 2 and a3 × b3 × 2.
Referring to fig. 8, fig. 8 is a schematic diagram of clipping convolution kernels according to an embodiment of the present application. As shown in fig. 8, a convolutional layer contains 5 convolution kernels, i.e., convolution kernel 1, convolution kernel 2, convolution kernel 3, convolution kernel 4, and convolution kernel 5. In the forward calculation of the model, each convolution kernel generates a corresponding feature map, so the convolutional layer shown in fig. 8 generates 5 feature maps (i.e., the output data has 5 channels). If convolution kernels 2 and 4 are the convolution kernels to be clipped, then after clipping them, 3 convolution kernels (convolution kernel 1, convolution kernel 3, and convolution kernel 5) remain in the convolutional layer. Assume that the processing layer before this convolutional layer is also a convolutional layer that originally had 3 convolution kernels, of which 1 (the middle one of the 3) has been clipped. Then, for convolution kernels 1, 3, and 5 in the convolutional layer shown in fig. 8, the number of channels of each of the 3 convolution kernels is 3, and the middle channel of each of the 3 convolution kernels is cut off, so the clipped convolutional layer is obtained. Referring to fig. 9, fig. 9 is a schematic diagram of clipping convolution kernel channels according to an embodiment of the present application: taking convolution kernel 1 in fig. 8 as an example, before the number of channels is adjusted, convolution kernel 1 originally has 3 channels; after the middle one of the 3 channels is clipped, convolution kernel 1 has only 2 channels.
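A hedged PyTorch sketch of the two operations above (cutting the filters with the smallest L1 norms and then removing the matching input channels of the next convolutional layer) is given below; it rebuilds plain nn.Conv2d modules, ignores dilation and groups, and is an illustration rather than the patent's implementation.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, next_conv: nn.Conv2d, prune_num: int):
    # One L1 norm (convolution kernel coefficient) per filter of the layer being pruned.
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.argsort(l1)[prune_num:].sort().values          # indices of the kept filters

    pruned = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()

    # The next convolutional layer loses the input channels produced by the clipped filters.
    next_pruned = nn.Conv2d(len(keep), next_conv.out_channels, next_conv.kernel_size,
                            next_conv.stride, next_conv.padding,
                            bias=next_conv.bias is not None)
    next_pruned.weight.data = next_conv.weight.data[:, keep].clone()
    if next_conv.bias is not None:
        next_pruned.bias.data = next_conv.bias.data.clone()
    return pruned, next_pruned
```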
Referring to fig. 10, fig. 10 is a second schematic flowchart of a pruning process according to an embodiment of the present application, which mainly describes how the pruning objects of the first processing layer are pruned when the deep network model to be processed is a convolutional neural network model, the first processing layer is a normalization layer (the first processing layer is one of the N processing layers, which may all be normalization layers or may include both convolutional layers and normalization layers), and the pruning objects are scaling factors. The pruning processing comprises the following steps:
Whether the first processing layer is the first one of the N processing layers is judged; if it is, steps S1001 to S1002 are performed; if it is not, steps S1003 to S1006 are performed.
Step S1001, if the first processing layer is a first processing layer of the N processing layers, sorting the scaling factors corresponding to the first processing layer in order from small to large.
Step S1002, the smallest first-pruning-quantity scaling factors are pruned, and the first processing layer after pruning is obtained.
For example, if the first processing layer is the first one of the N processing layers and contains 4 scaling factors, γ1, γ2, γ3 and γ4, with γ1 = 0.2, γ2 = 0.4, γ3 = 0.6, γ4 = 0.8, and the first pruning quantity is 2, then γ1 and γ2 are cut out, and the first processing layer containing only γ3 and γ4 is used as the first processing layer after pruning.
Step S1003, if the first processing layer is not the first processing layer of the N processing layers, obtaining an object position of the pruned object in the second processing layer, where the second processing layer is the previous processing layer of the first processing layer of the N processing layers, and the second processing layer is a convolutional layer.
Specifically, if the first processing layer is not the first processing layer of the N processing layers and the second processing layer is the convolutional layer (the second processing layer is the previous processing layer of the first processing layer of the N processing layers), the server first obtains the object position of the pruned pruning object in the second processing layer.
Step S1004, the scaling factors corresponding to the object positions are clipped in the first processing layer, so as to obtain a normalization layer to be processed.
Specifically, the server clips, in the first processing layer, the scaling factors corresponding to the object positions, and the normalization layer to be processed is obtained.
Step S1005, using a difference value between the first pruning quantity and the second pruning quantity as a target second pruning quantity, where the second pruning quantity is a pruning quantity of a second processing layer.
Specifically, the server takes a difference between the first pruning quantity and the second pruning quantity as a target second pruning quantity, and the second pruning quantity is the quantity of the pruning objects cut by the second processing layer.
Step S1006, the scaling factors contained in the normalization layer to be processed are sorted in order from small to large, and the smallest target-second-pruning-quantity scaling factors in the normalization layer to be processed are pruned, so that the first processing layer after pruning is obtained.
Specifically, the server sorts the scaling factors contained in the normalization layer to be processed in order from small to large and prunes the smallest target-second-pruning-quantity scaling factors in the normalization layer to be processed, so that the first processing layer after pruning is obtained. Assuming the first processing layer originally contains a scaling factors, the second processing layer clips b pruning objects, and the first pruning quantity is c, then the normalization layer to be processed contains a − b scaling factors and the target second pruning quantity is c − b.
It should be noted that, if the first pruning quantity is smaller than the second pruning quantity, the server directly takes the normalization layer to be processed as the first processing layer after pruning.
If the first processing layer is not the first one of the N processing layers, the first processing layer is a normalization layer, and the second processing layer is also a normalization layer, the server can directly select, according to the sizes of the scaling factors in the first processing layer, the smallest first-pruning-quantity scaling factors for clipping, and the normalization layer after pruning is obtained. This is because, in a convolutional neural network model, a normalization layer follows a convolutional layer, so an earlier normalization layer does not affect a later normalization layer; a normalization layer only affects the number of convolution-kernel channels of the convolutional layer adjacent to it. It should be noted that, when the scaling factors of the first processing layer are clipped and the processing layer following the first processing layer is a convolutional layer, the number of channels of the convolution kernels of that convolutional layer should also be adjusted accordingly.
Referring to fig. 11, fig. 11 is a schematic diagram of a cropping scaling factor according to an embodiment of the present disclosure, and as shown in fig. 11, the normalization layer only changes the value of the input data and does not change the data size and the data dimension. If 5 feature maps are input into the normalization layer, the output data processed by the normalization layer is also 5 feature maps, and the size of the feature maps is not changed. The normalization layer shown in fig. 11 includes 5 scaling factors, and assuming that what is to be clipped is scaling factor 2 and scaling factor 4, after scaling factor 2 and scaling factor 4 are clipped, in the model forward calculation process, the input is still 5 feature maps, but the output is only 3 feature maps, that is, the number of channels is changed from 5 channels to 3 channels.
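A hedged sketch of pruning a normalization layer by its scaling factors is shown below; it removes the channels whose gamma values are smallest, returns the indices of the surviving channels (so the kernel channels of the following convolutional layer can be adjusted as in the previous sketch), and is an illustration rather than the patent's implementation. Sorting by raw gamma values follows the text; sorting by magnitude would be an equally reasonable variant.

```python
import torch
import torch.nn as nn

def prune_bn_by_gamma(bn: nn.BatchNorm2d, prune_num: int):
    gamma = bn.weight.detach()                             # scaling factor of each channel
    keep = torch.argsort(gamma)[prune_num:].sort().values  # keep the largest scaling factors

    pruned = nn.BatchNorm2d(len(keep))
    pruned.weight.data = bn.weight.data[keep].clone()
    pruned.bias.data = bn.bias.data[keep].clone()
    pruned.running_mean = bn.running_mean[keep].clone()
    pruned.running_var = bn.running_var[keep].clone()
    return pruned, keep                                    # keep = positions of surviving channels
```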
Referring to fig. 12, fig. 12 is a third schematic flowchart of a pruning process according to an embodiment of the present application, which mainly describes how connection weights in hidden layers are clipped when the deep network model to be processed is a non-convolutional deep network model (e.g. a BP neural network model), that is, a neural network model (or deep network model) that is not a convolutional neural network model; in this case the N processing layers are all hidden layers and the pruning objects are connection weights. Since the rough pruning process is the same for every processing layer, one of the N processing layers (referred to as the first processing layer, which contains a plurality of pruning objects) is taken as an example below to describe how its pruning objects are pruned. The pruning processing comprises the following steps:
step S1201, sorting the plurality of connection weights corresponding to the first processing layer in order from small to large.
Step S1202, the values of the smallest first-pruning-quantity connection weights are adjusted to the sparse threshold, and the first processing layer after this adjustment is used as the first processing layer after pruning.
Specifically, the server adjusts the values of the smallest first-pruning-quantity connection weights to the sparse threshold (the sparse threshold may be equal to 0), and the adjusted first processing layer is the first processing layer after pruning.
Note that, unlike clipping a convolution kernel or a scaling factor, clipping a connection weight in a hidden layer does not delete anything outright; instead, the connection weight is set to 0. This is because the smaller a connection weight is, the smaller its contribution to the output; setting it to 0 does not change the dimensionality of the input and output data, a connection weight of 0 occupies far less storage space than a non-zero value, and in the forward calculation a weight of 0 multiplied by any value gives 0, so the multiplication can simply be skipped, which also greatly speeds up the calculation.
Please refer to fig. 13a-13b, which are schematic diagrams illustrating the clipping of connection weights according to an embodiment of the present application, where fig. 13a shows the model before clipping and fig. 13b shows the model after clipping. As can be seen from fig. 13a, hidden layer 1 contains 5 neurons, hidden layer 2 contains 4 neurons, hidden layer 3 contains 3 neurons, and the neurons between layers are fully connected. After the scheme of the present application is adopted, the number of neurons does not change, but the connections between neurons of adjacent layers become sparse, i.e. the number of connection weights (the connection weights are the connection lines between neurons) decreases, so that when the pruned model performs forward calculation there are fewer parameters and the calculation speed increases. Although fig. 13b shows the connection weights as directly deleted, in actual operation the values of the redundant connection weights are simply set to 0.
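A hedged sketch of this weight-level pruning for a fully connected hidden layer follows; the smallest-magnitude connection weights are not removed but set to the sparse threshold 0, so the input and output dimensions of the layer stay unchanged. The layer sizes and the pruning quantity are illustrative.

```python
import torch
import torch.nn as nn

def prune_connection_weights(fc: nn.Linear, prune_num: int) -> nn.Linear:
    flat = fc.weight.detach().abs().flatten()
    threshold = torch.kthvalue(flat, prune_num).values      # prune_num-th smallest |weight|
    mask = fc.weight.detach().abs() > threshold
    fc.weight.data.mul_(mask)                               # set the pruned connection weights to 0
    return fc

layer = nn.Linear(5, 4)                                     # 20 connection weights
prune_connection_weights(layer, prune_num=10)
print((layer.weight == 0).sum().item())                     # about 10 weights are now 0
```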
Referring to fig. 14, fig. 14 is a fourth schematic flowchart of a pruning process according to an embodiment of the present application, which mainly describes how neurons in hidden layers are clipped when the deep network model to be processed is a non-convolutional deep network model (e.g. a BP neural network model), that is, a neural network model (or deep network model) that is not a convolutional neural network model; in this case the N processing layers are all hidden layers and the pruning objects are neurons. Since the rough pruning process is the same for every processing layer, one of the N processing layers (referred to as the first processing layer, which contains a plurality of pruning objects) is taken as an example below to describe how its pruning objects are pruned. The pruning processing comprises the following steps:
step S1401, a sum of connection weights of each neuron element of the first processing layer is acquired, respectively.
Specifically, each neuron is connected with a plurality of connection weights, and the server can sum the connection weights of each neuron to obtain the sum of the connection weights of each neuron. Of course, the sum of the connection weights of a neuron is the sum of one row or one column of the connection weight matrix.
Step S1402, sorting the sum of the plurality of connection weights corresponding to the first processing layer in order from small to large.
Specifically, the server sorts the sum of the connection weights of all neurons included in the first processing layer in order from small to large.
Step S1403, the neurons corresponding to the smallest first-pruning-quantity sums of connection weights are pruned, and the first processing layer after pruning is obtained.
Specifically, the server clips the neurons corresponding to the smallest first-pruning-quantity sums of connection weights, and the first processing layer after pruning is obtained.
It can be known that directly clipping neurons changes the dimension of the output data, for example, if the first processing layer before clipping includes 4 neurons, the dimension of the output data of the first processing layer is 4 × 1, and if 1 neuron is clipped, the dimension of the output data of the first processing layer after clipping is 3 × 1.
Since neurons are directly connected by connection weights, in addition to clipping the neurons of the first processing layer, all connection weights attached to the clipped neurons need to be deleted, namely both the forward connection weights and the backward connection weights of the clipped neurons. Whether a neuron is to be clipped is judged on the basis of its sum of connection weights, where the sum of connection weights is the sum of the weights of its forward (incoming) connections.
Optionally, instead of directly cutting off a neuron, the server may set the forward connection weights and the backward connection weights connected to that neuron to 0, so that the output of the neuron equals 0. Specifically, the server determines the connection weights corresponding to the smallest first-pruning-quantity sums of connection weights and adjusts the values of those connection weights to the sparse threshold (the sparse threshold may be equal to 0), obtaining the first processing layer after pruning.
Please refer to fig. 15a-15b, which are schematic diagrams of clipping neurons according to an embodiment of the present application, where fig. 15a shows the model before clipping and fig. 15b shows the model after clipping. As can be seen from fig. 15a, hidden layer 1 contains 5 neurons, hidden layer 2 contains 4 neurons, hidden layer 3 contains 3 neurons, and the neurons between layers are fully connected. After the scheme of the present application is adopted, neuron 2 and neuron 3 of hidden layer 2 are determined to be deleted. After neuron 2 and neuron 3 are deleted, the connection weights connected to neuron 2 (a connection weight is the connection line between two neurons) should be deleted correspondingly, and so should the connection weights connected to neuron 3. As can be seen from fig. 15b, after the model is clipped the number of neurons becomes smaller, but the neurons between layers are still fully connected. It should be noted that deleting a neuron and the connection weights attached to it may be done by removing them outright or by directly setting them to 0 (if the output of a neuron is set to 0, the neuron effectively does not participate in the forward calculation, because its forward-calculation result is 0). Therefore, when the pruned model performs forward calculation there are fewer parameters, and the calculation speed is increased.
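A hedged sketch of this neuron-level pruning follows; each neuron of a hidden layer is scored by the sum of its incoming (forward) connection weights, i.e. one row of the weight matrix, the lowest-scoring neurons are removed, and the corresponding columns (backward connections) of the next layer are removed with them. Scoring by the plain sum follows the text; scoring by absolute values would be an equally plausible reading.

```python
import torch
import torch.nn as nn

def prune_neurons(fc: nn.Linear, next_fc: nn.Linear, prune_num: int):
    score = fc.weight.detach().sum(dim=1)                    # sum of incoming weights per neuron
    keep = torch.argsort(score)[prune_num:].sort().values    # neurons that survive

    pruned = nn.Linear(fc.in_features, len(keep), bias=fc.bias is not None)
    pruned.weight.data = fc.weight.data[keep].clone()
    if fc.bias is not None:
        pruned.bias.data = fc.bias.data[keep].clone()

    # The next layer loses the columns that carried the clipped neurons' outputs.
    next_pruned = nn.Linear(len(keep), next_fc.out_features, bias=next_fc.bias is not None)
    next_pruned.weight.data = next_fc.weight.data[:, keep].clone()
    if next_fc.bias is not None:
        next_pruned.bias.data = next_fc.bias.data.clone()
    return pruned, next_pruned
```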
The method and the device can noticeably reduce the parameter count of the deep network model to be processed and improve the inference speed of the model, and can be applied to image classification, target detection, semantic segmentation and other tasks in the field of computer vision. As shown in table 1 below, on the PASCAL VOC2012+2017 dataset, the fast neural network pruning method based on the priori knowledge of the pruning structure proposed in the present application reduces the parameter count of the YOLO-V5 model by 79.6%, while the mAP (Mean Average Precision) drops by only 0.5%.
TABLE 1
Model                     Parameter count    Precision (mAP)
YOLO-V5 model             21114417           0.824
Pruned YOLO-V5 model      4539351            0.819
According to the method, pruning the deep network model removes its redundant parts and compresses the storage space occupied by the model; when the model then performs forward-propagation calculation, fewer parameters participate, so the calculation amount is reduced and the running efficiency of the model is improved. Moreover, each processing layer is clipped according to the prior knowledge, that is, the distribution rule followed by the N pruning rates; compared with randomly clipping the pruning objects of each processing layer, this simplifies the model while preserving the accuracy of the clipped model. Furthermore, compared with changing the loss function of the model to perform sparse training and then determining the objects to clip in each layer, clipping each processing layer according to the prior knowledge is simpler and more flexible, which helps broaden the applicability of the method.
Further, please refer to fig. 16, which is a schematic structural diagram of a model processing apparatus according to an embodiment of the present application. As shown in fig. 16, the model processing apparatus 1 can be applied to the server in the embodiment corresponding to fig. 2 to fig. 15b described above. Specifically, the model processing apparatus 1 may be a computer program (including program code) running in a server, for example, the model processing apparatus 1 is an application software; the model processing apparatus 1 may be used to perform corresponding steps in the methods provided by the embodiments of the present application.
The model processing apparatus 1 may include: the device comprises an acquisition module 11, a pruning module 12 and an output module 13.
An obtaining module 11, configured to obtain a deep network model to be processed, where the deep network model to be processed includes N processing layers, each processing layer includes one or more pruning objects, and N is a positive integer;
the obtaining module 11 is further configured to obtain priori knowledge of a pruning structure corresponding to the to-be-processed depth network model, where the priori knowledge of the pruning structure includes a distribution rule presented by N pruning rates of the N processing layers;
the pruning module 12 is configured to perform pruning processing on the pruning objects included in each processing layer respectively according to the priori knowledge of the pruning structure, so as to obtain a simplified depth network model;
and the output module 13 is used for outputting the reduced deep network model.
In a possible implementation manner, the pruning module 12 is specifically configured to, when configured to perform pruning processing on the pruning objects included in each processing layer according to the priori knowledge of the pruning structure to obtain the simplified depth network model:
dividing the N processing layers into M orders according to the positions of the processing layers in the deep network model to be processed, wherein M is not more than N and is a positive integer;
respectively determining the pruning rate of the processing layers contained in each order according to the priori knowledge of the pruning structure, wherein the pruning rates of the processing layers belonging to the same order are the same;
and according to the pruning rate of each processing layer, pruning objects of the N processing layers respectively, and combining the N processing layers after pruning into the simplified deep network model.
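As an illustrative sketch only, the following Python snippet shows one straightforward way of grouping the N processing layers into M orders by their position; the even split and the helper name split_layers_into_orders are assumptions, since the patent only requires grouping by position in the model.

```python
def split_layers_into_orders(layers, num_orders):
    """Group the N processing layers into M orders by their position.

    A simple proportional split along the network depth is assumed here;
    the original text only requires grouping by position in the model.
    """
    n = len(layers)
    orders = [[] for _ in range(num_orders)]
    for idx, layer in enumerate(layers):
        # Map layer index 0..n-1 onto order index 0..num_orders-1 proportionally.
        order_idx = min(idx * num_orders // n, num_orders - 1)
        orders[order_idx].append(layer)
    return orders
```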
In a possible implementation manner, the priori knowledge of the pruning structure further includes a maximum pruning rate; when the distribution rule is a parabolic distribution rule, the pruning module 12, when configured to respectively determine the pruning rate of the processing layers contained in each order according to the priori knowledge of the pruning structure, is specifically configured to:
taking the maximum pruning rate as the pruning rate of the ⌈M/2⌉-th order;
determining the pruning rate of the t-th order according to the maximum pruning rate and the distance between the t-th order and the ⌈M/2⌉-th order, where t is a positive integer and t ≤ M;
and taking the pruning rate of the t-th order as the pruning rate of the processing layers contained in the t-th order.
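The snippet below sketches one possible parabolic assignment of per-order pruning rates, assuming the rate peaks at the middle order and falls off quadratically with the distance from it; the exact fall-off constant and the minimum rate are assumptions, since the original formula images are not reproduced in the extracted text.

```python
def parabolic_pruning_rates(num_orders, max_rate, min_rate=0.0):
    """Assign per-order pruning rates along an assumed parabolic profile.

    The middle order receives `max_rate`; the rate of the t-th order decreases
    quadratically with its distance from the middle order, down to `min_rate`
    at the two ends.  The exact fall-off constant is an assumption.
    """
    middle = (num_orders + 1) // 2          # 1-based index of the middle order
    half_span = max(middle - 1, 1)          # largest possible distance
    rates = []
    for t in range(1, num_orders + 1):      # t-th order, 1 <= t <= M
        dist = abs(t - middle)
        rate = max_rate - (max_rate - min_rate) * (dist / half_span) ** 2
        rates.append(max(rate, min_rate))
    return rates

# Example: 5 orders with a maximum pruning rate of 0.8
# -> approximately [0.0, 0.6, 0.8, 0.6, 0.0] (hypothetical values of this sketch)
print(parabolic_pruning_rates(5, 0.8))
```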
In a possible implementation manner, the first processing layer is one processing layer of the N processing layers, the first processing layer includes a plurality of pruning objects, and the pruning module 12, when being configured to perform the pruning processing on the pruning objects of the first processing layer according to the pruning rate of the first processing layer to obtain the first processing layer after the pruning processing, is specifically configured to:
determining a first pruning quantity according to the quantity of pruning objects contained in the first processing layer and the pruning rate of the first processing layer;
and according to the first pruning quantity, carrying out pruning treatment on the pruning objects of the first processing layer to obtain the first processing layer after the pruning treatment.
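A tiny worked example with illustrative numbers only; rounding down via int() is an assumption, as the patent does not state the rounding rule.

```python
# Illustrative numbers only: a first processing layer with 64 pruning objects
# and a pruning rate of 0.25 gives a first pruning quantity of 16.
num_objects = 64
prune_rate = 0.25
first_prune_count = int(num_objects * prune_rate)   # -> 16 (rounding down assumed)
```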
In a possible implementation manner, when the to-be-processed deep network model is a convolutional neural network model, the first processing layer is a convolutional layer, and the pruning object of the first processing layer is a convolution kernel, the pruning module 12 is specifically configured to, when performing pruning processing on the pruning objects of the first processing layer according to the first pruning quantity to obtain the first processing layer after pruning:
respectively determining a convolution kernel coefficient of each convolution kernel contained in the first processing layer;
sequencing the coefficients of the plurality of convolution kernels corresponding to the first processing layer from small to large;
and pruning the pruning objects of the first processing layer according to the sequencing result and the first pruning quantity to obtain the first processing layer after pruning.
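A minimal PyTorch sketch of this selection step, assuming the convolution kernel coefficient is the L1 norm of each output kernel (the text only requires one coefficient per kernel, sorted in ascending order); the function name is hypothetical.

```python
import torch
import torch.nn as nn

def select_kernels_to_prune(conv: nn.Conv2d, prune_count: int):
    """Return indices of the `prune_count` convolution kernels with the
    smallest convolution kernel coefficients.

    The coefficient is computed here as the L1 norm of each output kernel,
    which is an assumption; the text only requires one coefficient per kernel,
    sorted in ascending order.
    """
    # conv.weight has shape [out_channels, in_channels, kh, kw];
    # summing over dims 1..3 yields one coefficient per output kernel.
    coeffs = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    return torch.argsort(coeffs)[:prune_count].tolist()
```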
In a possible implementation manner, when the pruning module 12 is configured to perform pruning processing on the pruning objects of the first processing layer according to the sorting result and the first pruning quantity to obtain the first processing layer after pruning, it is specifically configured to:
if the first processing layer is the first of the N processing layers, clipping the convolution kernels corresponding to the first pruning quantity of smallest convolution kernel coefficients, so as to obtain the first processing layer after pruning;
if the first processing layer is not the first of the N processing layers, clipping the convolution kernels corresponding to the first pruning quantity of smallest convolution kernel coefficients, so as to obtain the convolutional layer to be processed;
acquiring an object position of a pruned object in a second processing layer in the second processing layer, wherein the second processing layer is a previous processing layer of the first processing layer in the N processing layers;
and in the convolutional layer to be processed, cutting a convolutional kernel channel corresponding to the position of the object to obtain a first processing layer after pruning.
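The following sketch, under the same assumptions, shows how the convolutional layer to be processed can be rebuilt so that both its clipped output kernels and the kernel channels matching the pruned positions of the previous processing layer are removed; dilation and grouped convolutions are ignored for brevity, and the helper name is hypothetical.

```python
import torch
import torch.nn as nn

def rebuild_pruned_conv(conv: nn.Conv2d, keep_out, keep_in):
    """Build a smaller Conv2d that keeps only the selected output kernels and
    the kernel channels matching the channels kept in the previous layer.

    `keep_out` lists the output kernels kept after clipping the smallest
    kernel coefficients; `keep_in` lists the positions NOT pruned in the
    previous processing layer.  Dilation and grouped convolutions are ignored.
    """
    new_conv = nn.Conv2d(
        in_channels=len(keep_in),
        out_channels=len(keep_out),
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # Select surviving output kernels, then surviving input channels.
        new_conv.weight.copy_(conv.weight[keep_out][:, keep_in])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep_out])
    return new_conv
```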
In a possible implementation manner, when the to-be-processed deep network model is a convolutional neural network model, the first processing layer is a normalization layer, and the pruning object of the first processing layer is a scaling factor, the pruning module 12 is specifically configured to, when performing pruning processing on the pruning objects of the first processing layer according to the first pruning quantity to obtain the first processing layer after pruning:
if the first processing layer is the first processing layer of the N processing layers, sequencing a plurality of scaling factors corresponding to the first processing layer from small to large;
and clipping the first pruning quantity of smallest scaling factors, so as to obtain the first processing layer after pruning.
In a possible embodiment, the pruning module 12 is further configured to:
if the first processing layer is not the first processing layer of the N processing layers, acquiring the object position of the pruned object in the second processing layer, wherein the second processing layer is the previous processing layer of the first processing layer of the N processing layers and is a convolutional layer;
cutting a scaling factor corresponding to the position of the object in the second processing layer to obtain a normalization layer to be processed;
taking the difference value between the first pruning quantity and the second pruning quantity as a target second pruning quantity, wherein the second pruning quantity is the pruning quantity of a second processing layer;
and sorting the scaling factors contained in the normalization layer to be processed in ascending order, and clipping, in the normalization layer to be processed, the target second pruning quantity of smallest scaling factors, so as to obtain the first processing layer after pruning.
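An illustrative PyTorch sketch of this two-stage clipping for a normalization layer that follows a pruned convolutional layer; the helper name and the use of absolute scaling-factor values are assumptions.

```python
import torch
import torch.nn as nn

def prune_bn_after_conv(bn: nn.BatchNorm2d, pruned_conv_positions, first_prune_count):
    """Select the scaling factors to keep in a normalization layer that
    follows a pruned convolutional layer.

    First the factors at the positions already pruned in the previous
    convolutional layer are dropped; then the smallest remaining factors are
    clipped until `first_prune_count` factors have been removed in total.
    Rebuilding the BatchNorm layer from the returned indices is omitted.
    """
    pruned_set = set(pruned_conv_positions)
    num_channels = bn.weight.numel()
    remaining = [i for i in range(num_channels) if i not in pruned_set]
    # Target second pruning quantity: factors still to clip in this layer.
    target_extra = max(first_prune_count - len(pruned_set), 0)
    gamma = bn.weight.detach().abs()   # absolute values are an assumption
    # Ascending sort of the remaining factors; drop the smallest `target_extra`.
    remaining_sorted = sorted(remaining, key=lambda i: gamma[i].item())
    keep = sorted(remaining_sorted[target_extra:])
    return keep
```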
In a possible implementation manner, when the to-be-processed deep network model is a non-convolution deep network model and the pruning object of the first processing layer is a connection weight, the pruning module 12 is specifically configured to, when performing pruning processing on the pruning object of the first processing layer according to the first pruning quantity to obtain a first processing layer after the pruning processing, perform:
sequencing a plurality of connection weights corresponding to the first processing layer according to a sequence from small to large;
and adjusting the first pruning quantity of smallest connection weights to the sparse threshold, and taking the first processing layer after this adjustment as the first processing layer after pruning.
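A minimal sketch of this sparsification for a non-convolutional (fully connected) layer, assuming PyTorch and a sparse threshold of 0; sorting by weight magnitude is an assumption, since the text only says the weights are sorted from small to large.

```python
import torch
import torch.nn as nn

def sparsify_smallest_weights(layer: nn.Linear, prune_count: int, sparse_value: float = 0.0):
    """Set the `prune_count` smallest connection weights of a layer to the
    sparse threshold instead of removing them.

    The layer keeps its shape, so downstream dimensions are unchanged; the
    zeroed weights simply no longer contribute to the forward calculation.
    """
    with torch.no_grad():
        flat = layer.weight.view(-1)                 # view shares storage
        idx = torch.argsort(flat.abs())[:prune_count]
        flat[idx] = sparse_value
```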
In a possible implementation manner, when the to-be-processed depth network model is a non-convolution depth network model and the pruning object of the first processing layer is a neuron, the pruning module 12 is specifically configured to, when performing pruning processing on the pruning object of the first processing layer according to the first pruning quantity to obtain a first processing layer after the pruning processing, perform:
respectively obtaining the sum of the connection weights of each neuron of the first processing layer;
sorting the connection-weight sums corresponding to the first processing layer in ascending order;
and clipping the neurons corresponding to the first pruning quantity of smallest connection-weight sums, so as to obtain the first processing layer after pruning.
In a possible embodiment, the distribution rule includes a parabolic distribution rule, a straight-line distribution rule, a logarithmic distribution rule, and an exponential distribution rule.
In a possible embodiment, the device 1 further comprises: a training module 14 and a recognition module 15.
A training module 14, configured to train the reduced deep network model with sample multimedia data to obtain the target deep network model;
and the identification module 15 is configured to acquire multimedia data to be identified, and call the target deep network model to perform identification processing on the multimedia data to be identified, so as to obtain an identification result of the multimedia data to be identified.
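As an illustrative sketch of the behaviour of the training and recognition modules; the loss function, optimizer and epoch count are arbitrary choices, not taken from the patent, and the function name is hypothetical.

```python
import torch
import torch.nn as nn

def finetune_and_recognize(pruned_model: nn.Module, train_loader, query_batch,
                           epochs=5, lr=1e-4):
    """Fine-tune the reduced model on sample multimedia data, then run it on
    the multimedia data to be recognized.

    Loss function, optimizer and epoch count are illustrative choices.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(pruned_model.parameters(), lr=lr, momentum=0.9)
    pruned_model.train()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(pruned_model(inputs), labels)
            loss.backward()
            optimizer.step()
    pruned_model.eval()
    with torch.no_grad():
        # Recognition result: predicted class index per sample.
        return pruned_model(query_batch).argmax(dim=1)
```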
In a possible embodiment, the device 1 further comprises: a judging module 16.
A judging module 16, configured to, if the storage space occupied by the simplified deep network model is not less than the storage space threshold, use the simplified deep network model as the deep network model to be processed, and continue to perform pruning on the deep network model to be processed to obtain a simplified deep network model;
the determining module 16 is further configured to notify the output module 13 to execute the step of outputting the reduced deep network model if the storage space occupied by the reduced deep network model is smaller than the storage space threshold.
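The judging module's loop can be sketched as follows; prune_once and model_size_bytes are hypothetical placeholders for the pruning step and the storage-size measurement described above.

```python
def prune_until_small_enough(model, prune_once, model_size_bytes, size_threshold_bytes):
    """Keep pruning until the reduced model occupies less storage than the threshold.

    `prune_once` applies one round of pruning guided by the pruning-structure
    prior knowledge; `model_size_bytes` measures the storage the model would
    occupy.  Both are hypothetical placeholders for the steps described above.
    """
    reduced = prune_once(model)
    while model_size_bytes(reduced) >= size_threshold_bytes:
        # Not small enough: treat the reduced model as the model to be
        # processed and prune it again.
        reduced = prune_once(reduced)
    return reduced   # small enough: this is the model to output
```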
The steps involved in the methods shown in fig. 2 to fig. 15b may each be performed by the corresponding modules in the model processing apparatus shown in fig. 16, according to embodiments of the present application. For example, steps S201 to S204 shown in fig. 2, steps S701 to S703 shown in fig. 7, steps S1001 to S1006 of fig. 10, steps S1201 to S1202 of fig. 12, and steps S1401 to S1403 of fig. 14 may be performed by the acquisition module 11, the pruning module 12, the output module 13, the training module 14, the recognition module 15, and the judgment module 16 shown in fig. 16.
Further, please refer to fig. 17, which is a schematic structural diagram of a computer device according to an embodiment of the present application. The server in the embodiments corresponding to fig. 2 to fig. 15b described above may be a computer device 1000. As shown in fig. 17, the computer device 1000 may include: a user interface 1002, a processor 1004, an encoder 1006, and a memory 1008. A signal receiver 1016 is used to receive or transmit data via a cellular interface 1010 or a WIFI interface 1012. The encoder 1006 encodes the received data into a format that the computer can process. The memory 1008 stores a computer program, and the processor 1004 is arranged to run the computer program to perform the steps of any of the method embodiments described above. The memory 1008 may include volatile memory (e.g., dynamic random access memory, DRAM) and may also include non-volatile memory (e.g., one-time programmable read-only memory, OTPROM). In some instances, the memory 1008 may further include memory located remotely from the processor 1004, which can be connected to the computer device 1000 via a network. The user interface 1002 may include: a keyboard 1018 and a display 1020.
In the computer device 1000 shown in fig. 17, the processor 1004 may be configured to call the computer program stored in the memory 1008 to implement:
acquiring a deep network model to be processed, wherein the deep network model to be processed comprises N processing layers, any processing layer comprises one or more pruning objects, and N is a positive integer;
acquiring prior knowledge of a pruning structure corresponding to the depth network model to be processed, wherein the prior knowledge of the pruning structure comprises a distribution rule presented by N pruning rates of the N processing layers;
according to the priori knowledge of the pruning structure, pruning objects contained in each processing layer to obtain a simplified depth network model;
and outputting the simplified deep network model.
In an embodiment, when performing pruning processing on the pruning objects contained in each processing layer according to the priori knowledge of the pruning structure to obtain the reduced depth network model, the processor 1004 specifically executes the following steps:
dividing the N processing layers into M orders according to the positions of the processing layers in the deep network model to be processed, wherein M is not more than N and is a positive integer;
respectively determining the pruning rate of the processing layers contained in each order according to the priori knowledge of the pruning structure, wherein the pruning rates of the processing layers belonging to the same order are the same;
and according to the pruning rate of each processing layer, pruning objects of the N processing layers respectively, and combining the N processing layers after pruning into the simplified deep network model.
In an embodiment, the priori knowledge of the pruning structure further includes a maximum pruning rate; when the distribution rule is a parabolic distribution rule, the processor 1004 specifically performs the following steps when respectively determining the pruning rate of the processing layers contained in each order according to the priori knowledge of the pruning structure:
taking the maximum pruning rate as the pruning rate of the ⌈M/2⌉-th order;
determining the pruning rate of the t-th order according to the maximum pruning rate and the distance between the t-th order and the ⌈M/2⌉-th order, where t is a positive integer and t ≤ M;
and taking the pruning rate of the t-th order as the pruning rate of the processing layers contained in the t-th order.
In an embodiment, the first processing layer is one of N processing layers, the first processing layer includes a plurality of pruning objects, and when the processor 1004 performs pruning processing on the pruning objects of the first processing layer according to the pruning rate of the first processing layer to obtain the first processing layer after the pruning processing, the following steps are specifically performed:
determining a first pruning quantity according to the quantity of pruning objects contained in the first processing layer and the pruning rate of the first processing layer;
and according to the first pruning quantity, carrying out pruning treatment on the pruning objects of the first processing layer to obtain the first processing layer after the pruning treatment.
In an embodiment, when the to-be-processed deep network model is a convolutional neural network model, the first processing layer is a convolutional layer, and the pruning object of the first processing layer is a convolution kernel, the processor 1004 performs the following steps when performing pruning processing on the pruning objects of the first processing layer according to the first pruning quantity to obtain the first processing layer after pruning:
respectively determining a convolution kernel coefficient of each convolution kernel contained in the first processing layer;
sequencing the coefficients of the plurality of convolution kernels corresponding to the first processing layer from small to large;
and pruning the pruning objects of the first processing layer according to the sequencing result and the first pruning quantity to obtain the first processing layer after pruning.
In an embodiment, when the processor 1004 performs pruning processing on the pruning object of the first processing layer according to the sorting result and the first pruning quantity to obtain the first processing layer after pruning processing, the following steps are specifically performed:
if the first processing layer is the first of the N processing layers, clipping the convolution kernels corresponding to the first pruning quantity of smallest convolution kernel coefficients to obtain the first processing layer after pruning;
if the first processing layer is not the first of the N processing layers, clipping the convolution kernels corresponding to the first pruning quantity of smallest convolution kernel coefficients to obtain the convolutional layer to be processed;
acquiring an object position of a pruned object in a second processing layer in the second processing layer, wherein the second processing layer is a previous processing layer of the first processing layer in the N processing layers;
and in the convolutional layer to be processed, cutting a convolutional kernel channel corresponding to the position of the object to obtain a first processing layer after pruning.
In an embodiment, when the to-be-processed deep network model is a convolutional neural network model, the first processing layer is a normalization layer, and the pruning object of the first processing layer is a scaling factor, the processor 1004 performs the following steps when performing pruning processing on the pruning objects of the first processing layer according to the first pruning quantity to obtain the first processing layer after pruning:
if the first processing layer is the first processing layer of the N processing layers, sequencing a plurality of scaling factors corresponding to the first processing layer from small to large;
and clipping the first pruning quantity of smallest scaling factors, so as to obtain the first processing layer after pruning.
In one embodiment, the processor 1004 further performs the following steps:
if the first processing layer is not the first processing layer of the N processing layers, acquiring the object position of the pruned object in the second processing layer, wherein the second processing layer is the previous processing layer of the first processing layer of the N processing layers and is a convolutional layer;
cutting a scaling factor corresponding to the position of the object in the second processing layer to obtain a normalization layer to be processed;
taking the difference value between the first pruning quantity and the second pruning quantity as a target second pruning quantity, wherein the second pruning quantity is the pruning quantity of a second processing layer;
and sorting the scaling factors contained in the normalization layer to be processed in ascending order, and clipping, in the normalization layer to be processed, the target second pruning quantity of smallest scaling factors, so as to obtain the first processing layer after pruning.
In an embodiment, when the to-be-processed deep network model is a non-convolution deep network model and the pruning object of the first processing layer is a connection weight, the processor 1004 performs pruning processing on the pruning object of the first processing layer according to the first pruning quantity to obtain a first processing layer after the pruning processing, and specifically performs the following steps:
sequencing a plurality of connection weights corresponding to the first processing layer according to a sequence from small to large;
and adjusting the first pruning quantity of smallest connection weights to the sparse threshold, and taking the first processing layer after this adjustment as the first processing layer after pruning.
In an embodiment, when the to-be-processed deep network model is a non-convolution deep network model and the pruning object of the first processing layer is a neuron, the processor 1004 performs pruning processing on the pruning object of the first processing layer according to the first pruning quantity to obtain a first processing layer after pruning processing, and specifically performs the following steps:
respectively obtaining the sum of the connection weights of each neuron of the first processing layer;
sorting the connection-weight sums corresponding to the first processing layer in ascending order;
and clipping the neurons corresponding to the first pruning quantity of smallest connection-weight sums, so as to obtain the first processing layer after pruning.
In one embodiment, the distribution rule includes a parabolic distribution rule, a straight-line distribution rule, a logarithmic distribution rule, and an exponential distribution rule.
In one embodiment, the processor 1004 further performs the following steps:
training the simplified deep network model by adopting sample multimedia data to obtain the target deep network model;
and acquiring multimedia data to be recognized, calling the target deep network model to recognize the multimedia data to be recognized, and obtaining a recognition result of the multimedia data to be recognized.
In one embodiment, the processor 1004 further performs the following steps:
if the storage space occupied by the simplified deep network model is not less than the storage space threshold, taking the simplified deep network model as the deep network model to be processed, and continuing pruning the deep network model to be processed to obtain a simplified deep network model;
and if the storage space occupied by the simplified deep network model is smaller than the storage space threshold value, executing the step of outputting the simplified deep network model.
It should be understood that the computer device 1000 described in this embodiment of the present application may perform the description of the model processing method in the embodiment corresponding to fig. 2-15 b, and may also perform the description of the model processing apparatus 1 in the embodiment corresponding to fig. 16, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: an embodiment of the present application further provides a computer storage medium, and the computer storage medium stores the aforementioned computer program executed by the model processing apparatus 1, and the computer program includes program instructions, and when the processor executes the program instructions, the description of the model processing method in the embodiment corresponding to fig. 2 to fig. 15b can be performed, so that details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer storage medium referred to in the present application, reference is made to the description of the embodiments of the method of the present application. By way of example, program instructions may be deployed to be executed on one computer device or on multiple computer devices at one site or distributed across multiple sites and interconnected by a communication network, and the multiple computer devices distributed across the multiple sites and interconnected by the communication network may be combined into a blockchain network.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device can perform the method in the embodiment corresponding to fig. 2 to fig. 15b, and therefore, the detailed description thereof will not be repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application, so that the present application is not limited thereto, and all equivalent variations and modifications can be made to the present application.

Claims (17)

1. A method of model processing, the method comprising:
acquiring a deep network model to be processed, wherein the deep network model to be processed comprises N processing layers, any processing layer comprises one or more pruning objects, and N is a positive integer;
acquiring prior knowledge of a pruning structure corresponding to the depth network model to be processed, wherein the prior knowledge of the pruning structure comprises a distribution rule presented by N pruning rates of the N processing layers;
according to the priori knowledge of the pruning structure, pruning objects contained in each processing layer to obtain a simplified depth network model;
and outputting the simplified deep network model.
2. The method according to claim 1, wherein performing pruning processing on the pruning objects contained in each processing layer respectively according to the priori knowledge of the pruning structure to obtain the reduced depth network model comprises:
dividing the N processing layers into M orders according to the positions of the processing layers in the deep network model to be processed, wherein M is not more than N and is a positive integer;
respectively determining the pruning rate of the processing layers contained in each order according to the priori knowledge of the pruning structure, wherein the pruning rates of the processing layers belonging to the same order are the same;
and according to the pruning rate of each processing layer, pruning objects of the N processing layers respectively, and combining the N processing layers after pruning into the simplified deep network model.
3. The method of claim 2, wherein the priori knowledge of the pruning structure further includes a maximum pruning rate, and when the distribution rule is a parabolic distribution rule, the determining the pruning rate of the processing layers contained in each order according to the priori knowledge of the pruning structure includes:
taking the maximum pruning rate as the pruning rate of the ⌈M/2⌉-th order;
determining the pruning rate of the t-th order according to the maximum pruning rate and the distance between the t-th order and the ⌈M/2⌉-th order, where t is a positive integer and t ≤ M;
and taking the pruning rate of the t-th order as the pruning rate of the processing layers contained in the t-th order.
4. The method according to claim 2, wherein the first processing layer is one of N processing layers, the first processing layer includes a plurality of pruning objects, and the process of performing pruning processing on the pruning objects of the first processing layer according to the pruning rate of the first processing layer to obtain the first processing layer after the pruning processing includes:
determining a first pruning quantity according to the quantity of pruning objects contained in the first processing layer and the pruning rate of the first processing layer;
and according to the first pruning quantity, carrying out pruning treatment on the pruning objects of the first processing layer to obtain the first processing layer after the pruning treatment.
5. The method according to claim 4, wherein when the deep network model to be processed is a convolutional neural network model, the first processing layer is a convolutional layer, and the pruning object of the first processing layer is a convolution kernel, performing pruning processing on the pruning objects of the first processing layer according to the first pruning quantity to obtain the pruned first processing layer includes:
respectively determining a convolution kernel coefficient of each convolution kernel contained in the first processing layer;
sequencing the coefficients of the plurality of convolution kernels corresponding to the first processing layer from small to large;
and pruning the pruning objects of the first processing layer according to the sequencing result and the first pruning quantity to obtain the first processing layer after pruning.
6. The method according to claim 5, wherein the performing pruning processing on the pruning objects of the first processing layer according to the sorting result and the first pruning quantity to obtain the first processing layer after pruning includes:
if the first processing layer is the first of the N processing layers, clipping the convolution kernels corresponding to the first pruning quantity of smallest convolution kernel coefficients to obtain the first processing layer after pruning;
if the first processing layer is not the first of the N processing layers, clipping the convolution kernels corresponding to the first pruning quantity of smallest convolution kernel coefficients to obtain the convolutional layer to be processed;
acquiring an object position of a pruned object in a second processing layer in the second processing layer, wherein the second processing layer is a previous processing layer of the first processing layer in the N processing layers;
and in the convolutional layer to be processed, cutting a convolutional kernel channel corresponding to the position of the object to obtain a first processing layer after pruning.
7. The method according to claim 4, wherein when the deep network model to be processed is a convolutional neural network model, the first processing layer is a normalization layer, and the pruning object of the first processing layer is a scaling factor, performing pruning processing on the pruning objects of the first processing layer according to the first pruning quantity to obtain the pruned first processing layer comprises:
if the first processing layer is the first processing layer of the N processing layers, sequencing a plurality of scaling factors corresponding to the first processing layer from small to large;
and clipping the first pruning quantity of smallest scaling factors, so as to obtain the first processing layer after pruning.
8. The method of claim 7, further comprising:
if the first processing layer is not the first processing layer of the N processing layers, acquiring the object position of the pruned object in the second processing layer, wherein the second processing layer is the previous processing layer of the first processing layer of the N processing layers and is a convolutional layer;
cutting a scaling factor corresponding to the position of the object in the second processing layer to obtain a normalization layer to be processed;
taking the difference value between the first pruning quantity and the second pruning quantity as a target second pruning quantity, wherein the second pruning quantity is the pruning quantity of a second processing layer;
and sorting the scaling factors contained in the normalization layer to be processed in ascending order, and clipping, in the normalization layer to be processed, the target second pruning quantity of smallest scaling factors, so as to obtain the first processing layer after pruning.
9. The method according to claim 4, wherein when the deep network model to be processed is a non-convolution deep network model and the pruning object of the first processing layer is a connection weight, performing pruning processing on the pruning object of the first processing layer according to the first pruning quantity to obtain a pruned first processing layer comprises:
sequencing a plurality of connection weights corresponding to the first processing layer according to a sequence from small to large;
and adjusting the first pruning quantity of smallest connection weights to the sparse threshold, and taking the first processing layer after this adjustment as the first processing layer after pruning.
10. The method according to claim 4, wherein when the deep network model to be processed is a non-convolution deep network model and the pruning object of the first processing layer is a neuron, performing pruning processing on the pruning object of the first processing layer according to the first pruning quantity to obtain a pruned first processing layer comprises:
respectively obtaining the sum of the connection weights of each neuron of the first processing layer;
sorting the connection-weight sums corresponding to the first processing layer in ascending order;
and clipping the neurons corresponding to the first pruning quantity of smallest connection-weight sums, so as to obtain the first processing layer after pruning.
11. The method of claim 1, wherein the distribution rule includes a parabolic distribution rule, a straight-line distribution rule, a logarithmic distribution rule, and an exponential distribution rule.
12. The method of claim 1, further comprising:
training the simplified deep network model by adopting sample multimedia data to obtain the target deep network model;
and acquiring multimedia data to be recognized, calling the target deep network model to recognize the multimedia data to be recognized, and obtaining a recognition result of the multimedia data to be recognized.
13. The method of claim 1, further comprising:
if the storage space occupied by the simplified deep network model is not less than the storage space threshold, taking the simplified deep network model as the deep network model to be processed, and continuing pruning the deep network model to be processed to obtain a simplified deep network model;
and if the storage space occupied by the simplified deep network model is smaller than the storage space threshold value, executing the step of outputting the simplified deep network model.
14. A model processing apparatus, comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a deep network model to be processed, the deep network model to be processed comprises N processing layers, any processing layer comprises one or more pruning objects, and N is a positive integer;
the acquisition module is further configured to acquire priori knowledge of a pruning structure corresponding to the to-be-processed depth network model, where the priori knowledge of the pruning structure includes a distribution rule presented by N pruning rates of the N processing layers;
the pruning module is used for respectively carrying out pruning processing on the pruning objects contained in each processing layer according to the priori knowledge of the pruning structure to obtain a simplified depth network model;
and the output module is used for outputting the simplified deep network model.
15. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1-13.
16. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause a computer device having the processor to perform the steps of the method of any one of claims 1-13.
17. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the method according to any of claims 1-13.
CN202210010948.XA 2022-01-05 2022-01-05 Model processing method and related product Pending CN114358254A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210010948.XA CN114358254A (en) 2022-01-05 2022-01-05 Model processing method and related product

Publications (1)

Publication Number Publication Date
CN114358254A true CN114358254A (en) 2022-04-15

Family

ID=81106866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210010948.XA Pending CN114358254A (en) 2022-01-05 2022-01-05 Model processing method and related product

Country Status (1)

Country Link
CN (1) CN114358254A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548234A (en) * 2016-11-17 2017-03-29 北京图森互联科技有限责任公司 A kind of neural networks pruning method and device
US20210081797A1 (en) * 2018-02-05 2021-03-18 Intel Corporation Automated selection of priors for training of detection convolutional neural networks
US20210081798A1 (en) * 2019-09-16 2021-03-18 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN112560881A (en) * 2019-09-25 2021-03-26 北京四维图新科技股份有限公司 Object identification method and device and data processing method
CN110969240A (en) * 2019-11-14 2020-04-07 北京达佳互联信息技术有限公司 Pruning method, device, equipment and medium for deep convolutional neural network
CN113128676A (en) * 2019-12-30 2021-07-16 广州慧睿思通科技股份有限公司 Pruning method and device based on target detection model and storage medium
CN111612144A (en) * 2020-05-22 2020-09-01 深圳金三立视频科技股份有限公司 Pruning method and terminal applied to target detection
CN112819157A (en) * 2021-01-29 2021-05-18 商汤集团有限公司 Neural network training method and device and intelligent driving control method and device
CN113255910A (en) * 2021-05-31 2021-08-13 浙江宇视科技有限公司 Pruning method and device for convolutional neural network, electronic equipment and storage medium
CN113516638A (en) * 2021-06-25 2021-10-19 中南大学 Neural network internal feature importance visualization analysis and feature migration method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SUN Wenyu et al.: "Pruning and Fine-tuning Optimization Method for Convolutional Neural Network Models Based on Global Information", Journal of Peking University (Natural Science Edition), vol. 57, no. 04, 20 July 2021 (2021-07-20), pages 790-794 *
WANG Yulong: "PyTorch Deep Learning: Introduction and Practice", 31 August 2020, Beijing: China Railway Publishing House, pages: 169-171 *

Similar Documents

Publication Publication Date Title
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN111259783A (en) Video behavior detection method and system, highlight video playback system and storage medium
CN112288087A (en) Neural network pruning method and device, electronic equipment and storage medium
CN116403094B (en) Embedded image recognition method and system
CN114283350B (en) Visual model training and video processing method, device, equipment and storage medium
WO2022228142A1 (en) Object density determination method and apparatus, computer device and storage medium
CN111639230B (en) Similar video screening method, device, equipment and storage medium
CN112801132A (en) Image processing method and device
CN113781510A (en) Edge detection method and device and electronic equipment
CN112784549A (en) Method, device and storage medium for generating chart
CN112232505A (en) Model training method, model processing method, model training device, electronic equipment and storage medium
CN113902113A (en) Convolutional neural network channel pruning method
CN113256793A (en) Three-dimensional data processing method and system
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN114358254A (en) Model processing method and related product
CN113988148A (en) Data clustering method, system, computer equipment and storage medium
CN114385876B (en) Model search space generation method, device and system
CN116468102A (en) Pruning method and device for cutter image classification model and computer equipment
CN112200275B (en) Artificial neural network quantification method and device
CN112132215B (en) Method, device and computer readable storage medium for identifying object type
CN115982634A (en) Application program classification method and device, electronic equipment and computer program product
CN114492797A (en) Model pruning method, device, equipment and storage medium
CN116542311A (en) Neural network model compression method and system
CN114120053A (en) Image processing method, network model training method and device and electronic equipment
CN113128660A (en) Deep learning model compression method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40070924

Country of ref document: HK