CN110378470B - Optimization method and device for neural network model and computer storage medium

Optimization method and device for neural network model and computer storage medium

Info

Publication number
CN110378470B
CN110378470B
Authority
CN
China
Prior art keywords
layer
neural network
network model
activation
optimized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910656769.1A
Other languages
Chinese (zh)
Other versions
CN110378470A (en)
Inventor
陈岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910656769.1A priority Critical patent/CN110378470B/en
Publication of CN110378470A publication Critical patent/CN110378470A/en
Application granted granted Critical
Publication of CN110378470B publication Critical patent/CN110378470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a neural network model optimization method, a neural network model optimization device and a computer storage medium, wherein the neural network model optimization method comprises the following steps: determining a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer; when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, fusing the depth separable convolution layer, the FusedBatchNorm layer and the activation layer to obtain a fused convolution layer; and optimizing the neural network model to be optimized through the fusion convolution layer to obtain an optimized neural network model.

Description

Optimization method and device for neural network model and computer storage medium
Technical Field
The present application relates to the field of neural networks, and in particular, to a method and apparatus for optimizing a neural network model, and a computer storage medium.
Background
With the rapid development of artificial intelligence (AI), deep learning is attracting more and more attention. Specifically, deep learning (DL) is one of the technical and research fields of machine learning; it implements artificial intelligence in a computer system by establishing artificial neural networks (Artificial Neural Networks, ANNs) with a hierarchical structure. Against the background of the development of deep learning, convolutional neural networks have grown increasingly mature and popular.
Many optimization schemes for neural network models have been proposed, such as optimization of the mobile framework, operator optimization, hardware optimization, preprocessing optimization, and the like. Although these schemes can make a framework run faster, they either rely on dedicated neural network hardware structures that most manufacturers cannot develop, or they perform online optimization during forward inference of the neural network model, which brings no improvement in memory occupation; the operation speed achievable through model optimization therefore still has room for improvement.
Disclosure of Invention
The application provides a method and device for optimizing a neural network model, and a computer storage medium, which can improve the operation speed of the neural network and reduce memory occupation, on the premise of ensuring no loss of precision, by fusing a depth separable convolution layer, a FusedBatchNorm layer and an activation layer.
The technical scheme of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a method for optimizing a neural network model, where the method includes:
determining a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer;
When the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, fusing the depth separable convolution layer, the FusedBatchNorm layer and the activation layer to obtain a fused convolution layer;
and optimizing the neural network model to be optimized through the fusion convolution layer to obtain an optimized neural network model.
In a second aspect, an embodiment of the present application provides an optimization apparatus for a neural network model, including: a determining unit, a fusing unit and an optimizing unit, wherein,
the determining unit is configured to determine a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer;
the fusion unit is configured to fuse the depth separable convolution layer, the FusedBatchNorm layer and the activation layer when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are sequentially connected to obtain a fused convolution layer;
the optimizing unit is configured to optimize the neural network model to be optimized through the fusion convolution layer, and obtain an optimized neural network model.
In a third aspect, an embodiment of the present application provides an optimization apparatus for a neural network model, including: a memory and a processor; wherein,
the memory is used for storing a computer program capable of running on the processor;
the processor is configured to perform the method according to the first aspect when the computer program is run.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing an optimization program of a neural network model, which when executed by at least one processor implements a method according to the first aspect.
The embodiment of the application provides a method, a device and a computer storage medium for optimizing a neural network model. A neural network model to be optimized is first determined, where the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer; when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, the three layers are fused to obtain a fused convolution layer; the neural network model to be optimized is then optimized through the fused convolution layer to obtain an optimized neural network model. In this way, the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are fused in the optimized neural network model; that is, part of the online operations (such as those of the FusedBatchNorm layer and the activation layer) are moved to the operations performed when the model is converted, so that the operation speed of the neural network can be obviously improved on the premise of ensuring no loss of precision, and memory occupation can be reduced at the same time.
Drawings
FIG. 1 is a schematic flow chart of an optimization method of a neural network model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a sequential connection of a depth separable convolutional layer, a FusedBatchNorm layer, and an active layer according to one embodiment of the present application;
FIG. 3 is a flowchart of another method for optimizing a neural network model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an activation function according to an embodiment of the present application;
FIG. 5 is a flowchart of another method for optimizing a neural network model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an optimizing apparatus for a neural network model according to an embodiment of the present application;
fig. 7 is a schematic diagram of a specific hardware structure of an optimizing apparatus for a neural network model according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to be limiting.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are adopted only to facilitate the description of the present application and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
In practical applications, deploying a neural network model on a terminal device is inseparable from various optimizations of the model. Current optimization schemes may include optimization of the mobile framework, such as optimization for multithreading; operator optimization, such as optimization of convolution-kernel operators; and hardware optimization, such as optimization of hardware structures using digital signal processors (Digital Signal Processor, DSP), embedded neural network processors (Neural-network Processing Unit, NPU), and the like. Specifically, as for hardware optimization, most manufacturers cannot develop a dedicated neural network hardware structure, and the stability of such a hardware structure also requires long-term verification. Operator optimization is mostly online optimization: whatever the mobile framework, operators must be optimized to the extreme, operators with the highest operation performance must be developed for specific hardware, and parallelization optimization is also required so that the parallelism of the hardware is fully utilized to make the framework run faster. However, for the deployment of a neural network model, the model is converted only once but run many times, so optimization performed at model conversion is where the operation speed can still be improved.
The embodiment of the application provides an optimization method of a neural network model, which is applied to an optimization device of the neural network model; the optimization device may be located in a terminal device. The terminal device may be a user device, a mobile device, a smart phone, a palm computer, a notebook computer, a personal digital assistant, a navigation device, a desktop computer, or a wearable device. A neural network model to be optimized is determined, where the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer; when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, the three layers are fused to obtain a fused convolution layer; the neural network model to be optimized is optimized through the fused convolution layer to obtain an optimized neural network model. In this way, the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are fused in the optimized neural network model; that is, part of the online operations (such as those of the FusedBatchNorm layer and the activation layer) are moved to the operations performed when the model is converted. Optimization of the neural network model is most effective at this point: the operation speed of the neural network can be obviously improved on the premise of ensuring no loss of precision, and memory occupation can be reduced at the same time.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In an embodiment of the present application, referring to fig. 1, a schematic flow chart of a method for optimizing a neural network model according to an embodiment of the present application is shown. As shown in fig. 1, the method may include:
s101: determining a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer;
the neural network model is applied to various aspects (such as feature extraction, face recognition, or object recognition) of computer vision and natural language processing, and achieves good effects. In the embodiment of the application, the neural network model may specifically refer to a convolutional neural network (Convolutional Neural Networks, CNN) model, which is a feedforward neural network (Feedforward Neural Networks) with a deep structure and including convolutional operation, and is one of the representative algorithms of deep learning. Here, the neural network model may include a convolutional layer, a fusedbachnum layer, an active layer, and the like, in addition to the input layer, the full connection layer, and the output layer.
In the neural network model, a depth separable convolution layer can be used to replace an ordinary convolution layer: on the premise of keeping the channels separated, a depthwise convolution structure is followed by a pointwise convolution that mixes the channels, thereby reducing the parameters required by the neural network model. Assume a convolution kernel of size 3×3 with 16 input channels and 32 output channels: with a depth separable convolution layer, only 3×3×16+1×1×16×32=656 parameters are required, far fewer than the 16×32×3×3=4608 parameters required by an ordinary convolution layer. It will be appreciated that, if the convolution kernel is two-dimensional, the depth separable convolution layer may also be denoted as a DepthwiseConv2DNative layer.
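For illustration only (this sketch is not part of the patent disclosure), the parameter counts above can be verified as follows:

```python
def separable_conv_params(k, c_in, c_out):
    """Parameters of a depth separable convolution: one k x k depthwise
    kernel per input channel, then a 1 x 1 pointwise convolution
    mapping c_in channels to c_out channels."""
    depthwise = k * k * c_in           # 3*3*16 = 144
    pointwise = 1 * 1 * c_in * c_out   # 16*32  = 512
    return depthwise + pointwise

def standard_conv_params(k, c_in, c_out):
    """Parameters of an ordinary k x k convolution."""
    return k * k * c_in * c_out        # 3*3*16*32 = 4608

print(separable_conv_params(3, 16, 32))  # 656
print(standard_conv_params(3, 16, 32))   # 4608
```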
In addition, in the neural network model, a FusedBatchNorm layer is usually connected after the convolution layer to accelerate convergence of the neural network model; normalizing the data with the FusedBatchNorm layer effectively controls overfitting and effectively alleviates the problems of vanishing and exploding gradients. Moreover, since the expressive power of a linear model is insufficient, an activation layer may be added after the FusedBatchNorm layer to introduce nonlinear factors. The activation layer is generally represented by an activation function, which may include the rectified linear unit (Rectified Linear Unit, Relu) function, the Relu1 function, the Relu6 function, the hyperbolic tangent (Tanh) function, the logistic (Logistic) function, the normalized exponential (Softmax) function, and the like.
Thus, while the FusedBatchNorm layer and the activation layer play very positive roles in training the neural network model, they add the operations of extra layers during forward inference of the model, which affects the performance of the neural network model and occupies more memory. Therefore, at model conversion time, the depth separable convolution layer, the FusedBatchNorm layer and the activation layer can be fused, so that the operations of the FusedBatchNorm layer and the activation layer are omitted, thereby saving memory and improving the performance of the neural network model.
S102: when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, fusing the depth separable convolution layer, the FusedBatchNorm layer and the activation layer to obtain a fused convolution layer;
it should be noted that the connection of these three layers in the neural network model is sequentially divided for the depth separable convolutional layer, the fusedbachnum layer, and the active layer. That is, at the time of model conversion, it is first necessary to identify or judge whether the depth separable convolutional layer, the fusetbochnum layer, and the active layer are sequentially connected. Only when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, the three layers of the depth separable convolution layer, the FusedBatchNorm layer and the activation layer can be fused.
Specifically, assume that the depth separable convolution layer is represented by a DepthwiseConv2DNative layer and that the activation function selected by the activation layer is the Relu6 function; the sequential connection of these three layers then specifically refers to DepthwiseConv2DNative layer→FusedBatchNorm layer→Relu6 layer. Referring to fig. 2, a schematic structural diagram of the sequential connection of a depth separable convolution layer, a FusedBatchNorm layer, and an activation layer according to an embodiment of the present application is shown. In fig. 2, the output of the depth separable convolution layer is connected as the input of the FusedBatchNorm layer, i.e., output=x; and the output of the FusedBatchNorm layer serves as the input of the activation layer, i.e., y=features. This indicates that the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are sequentially connected; in this case, the three layers can be combined into a single-layer structure to obtain the fused convolution layer.
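As an illustrative sketch only (the patent does not prescribe an implementation; the Layer structure and its field names below are assumptions), the sequential-connection check at model conversion time could be expressed as:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    type: str    # e.g. "DepthwiseConv2DNative", "FusedBatchNorm", "Relu6"
    input: str   # name of the tensor this layer reads
    output: str  # name of the tensor this layer writes

FUSABLE_PATTERN = ("DepthwiseConv2DNative", "FusedBatchNorm", "Relu6")

def find_fusable_triples(layers):
    """Scan a topologically ordered layer list and return the index triples
    (i, i+1, i+2) at which a DepthwiseConv2DNative layer, a FusedBatchNorm
    layer and a Relu6 layer are sequentially connected, i.e. each layer's
    output tensor is the next layer's input tensor."""
    triples = []
    for i in range(len(layers) - 2):
        a, b, c = layers[i], layers[i + 1], layers[i + 2]
        if ((a.type, b.type, c.type) == FUSABLE_PATTERN
                and a.output == b.input and b.output == c.input):
            triples.append((i, i + 1, i + 2))
    return triples
```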
S103: and optimizing the neural network model to be optimized through the fusion convolution layer to obtain an optimized neural network model.
It should be noted that, for the neural network model to be optimized, after the fused convolution layer is obtained, the optimized neural network model may be obtained according to the fused convolution layer. In this way, the data to be processed can be processed through computation with the optimized neural network model.
In some embodiments, after S103, the method may further include:
and inputting the data to be processed into the optimized neural network model for operation processing, and outputting target data.
That is, after the optimized neural network model is obtained, the data to be processed may be input into the optimized neural network model, and the target data may be output through the operation of the optimized neural network model. The target data is the result expected to be output by the optimized neural network model. Therefore, the running time of the model can be reduced by utilizing the optimized neural network model, so that the operation speed of the neural network is improved, and the memory occupation is reduced.
This embodiment provides an optimization method for a neural network model: a neural network model to be optimized is determined, where it at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer; when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, the three layers are fused to obtain a fused convolution layer; the neural network model to be optimized is optimized through the fused convolution layer to obtain an optimized neural network model. In this way, the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are fused in the optimized neural network model; that is, part of the online operations (such as those of the FusedBatchNorm layer and the activation layer) are moved to model conversion time and become offline operations, so the running time of the model can be reduced and the operation speed of the neural network is obviously improved on the premise of ensuring no loss of precision. In addition, since optimization at model conversion is most effective, the performance of the neural network model can be improved and memory occupation can be reduced.
In another embodiment of the present application, for the fusion of the depth separable convolution layer, the FusedBatchNorm layer and the activation layer, the depth separable convolution layer and the FusedBatchNorm layer may be merged first, and the result then merged with the activation layer to obtain the fused convolution layer. Referring to fig. 3, a flow chart of another method for optimizing a neural network model according to an embodiment of the present application is shown. As shown in fig. 3, S102 may include:
s301: when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, merging the depth separable convolution layer with the FusedBatchNorm layer to obtain a merged convolution layer;
it should be noted that the depth separable convolution layer is used for extracting features of input data of the convolution layer, and includes a plurality of convolution kernels, where each pixel that forms the convolution kernels corresponds to a weight parameter and a bias parameter. Thus, the output of the depth separable convolutional layer is denoted by y, whose calculation formula is as follows,
y=x[0]×w[0]+x[1]×w[1]+...+x[n-1]×w[n-1]+b…………………………(1)
here, x[i] represents an input parameter of the depth separable convolution layer, w[i] represents an initial weight parameter of the depth separable convolution layer, b represents an initial bias parameter of the depth separable convolution layer, and i=0, 1, …, n-1.
The FusedBatchNorm layer is connected after the depth separable convolution layer. The FusedBatchNorm layer is used to normalize the data and accelerate the convergence of the neural network model. Denoting the output of the FusedBatchNorm layer by z, its calculation formula is as follows,
z=gamma×(y-mean)/sqrt(variance+epsilon)+beta…………………………(2)
here, y represents the output of the previous layer, i.e., the output of the depth separable convolution layer, such as an output feature map. When calculating z, the mean (denoted mean) is first subtracted from y, the result is then divided by the standard deviation (denoted sqrt(variance+epsilon)) to normalize the data, and finally the result is multiplied by a scaling factor (denoted gamma) and added to an offset value (denoted beta). Here mean, gamma, beta and sqrt(variance+epsilon) are the normalization parameters of the FusedBatchNorm layer. It should be noted that epsilon is mainly used to ensure that the standard deviation is not 0. In the FusedBatchNorm layer, epsilon is not mandatory, but in an engineering implementation it is necessary to check whether the variance is 0; if it equals 0, epsilon is set to a small value such as 0.001.
Each channel has its own normalization parameters (including mean, gamma, beta and sqrt(variance+epsilon)); if the output of the depth separable convolution layer has C channels, the FusedBatchNorm layer therefore holds C×4 parameters. Since equation (2) is applied to each channel of the output feature map of the depth separable convolution layer, the depth separable convolution layer and the FusedBatchNorm layer can be combined by some mathematical operations: the depth separable convolution layer typically employs the dot-product operation shown in equation (1), which is a linear transformation, and the FusedBatchNorm layer is also a linear transformation, so the two layers can be combined into the single linear transformation shown in equation (3),
z=gamma×(x[0]×w[0]+x[1]×w[1]+...+x[n-1]×w[n-1]+b-mean)/sqrt(variance+epsilon)+beta…………(3)
Rearranging expression (3) term by term, expression (3) may be rewritten as follows,
z=x[0]×(gamma×w[0]/sqrt(variance+epsilon))+...+x[n-1]×(gamma×w[n-1]/sqrt(variance+epsilon))+gamma×(b-mean)/sqrt(variance+epsilon)+beta…………(4)
As can be seen from equation (4), the combination of the depth separable convolution layer and the FusedBatchNorm layer can be realized simply by updating the initial weight parameters and the initial bias parameter of the depth separable convolution layer; the updated weight parameters and the updated bias parameter are shown in equation (5),
w'[i]=gamma×w[i]/sqrt(variance+epsilon), b'=gamma×(b-mean)/sqrt(variance+epsilon)+beta…………(5)
thus, in some embodiments, for S301, the merging the depth separable convolutional layer with the fusedbachnum layer to obtain a merged convolutional layer may include:
acquiring initial weight parameters and initial bias parameters of the depth separable convolutional layer and normalization parameters of the FusedBatchNorm layer;
updating the initial weight parameter and the initial bias parameter based on the normalization parameter to obtain an updated weight parameter and an updated bias parameter;
and obtaining the merged convolution layer based on the updated weight parameters and the updated bias parameter.
That is, after the initial weight parameters (denoted by w[i]) and the initial bias parameter (denoted by b) of the depth separable convolution layer and the normalization parameters (denoted by mean, gamma, beta and sqrt(variance+epsilon)) of the FusedBatchNorm layer are obtained, in order to merge the FusedBatchNorm layer into the depth separable convolution layer, the normalization parameters of the FusedBatchNorm layer may be folded into the initial weight parameters and the initial bias parameter of the depth separable convolution layer. That is, after the initial weight parameters and the initial bias parameter of the depth separable convolution layer are updated with the normalization parameters of the FusedBatchNorm layer, the updated weight parameters and the updated bias parameter are as shown in equation (5); the parameters of the FusedBatchNorm layer are thus completely merged into the depth separable convolution layer, so that the calculation of the FusedBatchNorm layer is omitted in the forward inference of the neural network model.
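A minimal numerical sketch of this folding, assuming per-channel parameters stored as NumPy arrays (illustrative only; the patent prescribes no particular implementation):

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, variance, epsilon=1e-3):
    """Fold FusedBatchNorm parameters into depthwise weights and bias per
    equation (5): w'[i] = gamma*w[i]/sqrt(variance+epsilon),
    b' = gamma*(b-mean)/sqrt(variance+epsilon) + beta.
    w has shape (C, k, k), one k x k kernel per channel;
    b, gamma, beta, mean, variance each have shape (C,)."""
    std = np.sqrt(variance + epsilon)           # per-channel standard deviation
    w_fused = w * (gamma / std)[:, None, None]  # scale each channel's kernel
    b_fused = gamma * (b - mean) / std + beta   # fold the normalization into the bias
    return w_fused, b_fused
```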
S302: fusing the merged convolution layer with the activation layer to obtain the fused convolution layer.
It should be noted that, after the FusedBatchNorm layer and the depth separable convolution layer are merged, the resulting merged convolution layer may then be fused with the activation layer. Since the activation layer is a nonlinear transformation while the merged convolution layer is a linear transformation, the parameters of the activation layer cannot be folded directly into the weight parameters of the merged convolution layer. To omit the activation layer from the neural network model, an activation flag may instead be added to the merged convolution layer, where the flag indicates the activation type of the neural network model.
Thus, in some embodiments, for S302, the fusing the merged convolution layer with the activation layer to obtain the fused convolution layer may include:
setting an activation identifier in the merged convolution layer; the activation identifier is used for determining the activation type of the neural network model to be optimized;
and performing an activation operation on the output value of the merged convolution layer according to the activation type determined by the activation identifier, so as to realize the fusion of the merged convolution layer and the activation layer and obtain the fused convolution layer.
Further, the activation type includes at least one of the following: a Relu type, a Relu1 type, a Relu6 type, a Tanh type, a Logistic type, and a Softmax type.
It should be noted that, taking the Relu6 type as an example, the calculation formula of the active layer is as follows,
y=min(6,max(0,x))…………………………………………………(6)
here, x represents the output of the previous layer, i.e., the output of the combined convolutional layer obtained by combining the depth separable convolutional layer with the fusedbachnum layer. Referring to fig. 4, a schematic diagram of an activation function according to an embodiment of the present application is shown. The activation layer comprises a plurality of activation functions such as a Relu function, a Relu1 function, a Relu6 function, a Tanh function, a Logistic function, a Softmax function and the like, and each activation function corresponds to one activation type. In fig. 4, the activation function is a Relu6 function, the corresponding activation type is a Relu6 type, and it can be seen from fig. 4 that the activation function is a nonlinear transformation.
Specifically, to remove the activation layer from the neural network model, a flag may be added to the merged convolution layer, where the flag represents the activation type of the neural network model. The activation type of the neural network model is determined from the flag; then, whenever a value is computed by the merged convolution layer, the activation operation can be applied to that value. For example, as the window of the merged convolution layer slides, a number of values are obtained; before each value is written into the target memory, the activation operation indicated by the flag is applied to obtain a new value, and the new value is then written into the target memory.
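The following sketch illustrates that idea (the ACTIVATIONS table, function names and data layout are assumptions for illustration, not taken from the patent):

```python
import numpy as np

# Hypothetical mapping from activation flag to activation operation.
ACTIVATIONS = {
    "Relu":  lambda v: max(0.0, v),
    "Relu6": lambda v: min(6.0, max(0.0, v)),  # y = min(6, max(0, x)), equation (6)
    "Tanh":  np.tanh,
}

def fused_depthwise_conv(x, w, b, flag="Relu6"):
    """Depthwise convolution with the activation applied inside the same
    loop: every output value is activated before being written back, so
    no separate activation pass (or activation output buffer) is needed.
    x: (C, H, W); w: (C, k, k); b: (C,); 'valid' padding, stride 1."""
    act = ACTIVATIONS[flag]
    c, h, wd = x.shape
    k = w.shape[1]
    out = np.empty((c, h - k + 1, wd - k + 1))
    for ch in range(c):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                v = np.sum(x[ch, i:i+k, j:j+k] * w[ch]) + b[ch]
                out[ch, i, j] = act(v)  # activate before the write to target memory
    return out
```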
Typically, after the FusedBatchNorm layer is merged into the depth separable convolution layer, the resulting merged convolution layer is still a depth separable convolution layer. Before the depth separable convolution layer (i.e., the merged convolution layer) and the activation layer are fused, three blocks of memory are needed: (1) the input memory of the depth separable convolution layer, (2) the output memory of the depth separable convolution layer, and (3) the output memory of the activation layer. After the activation layer is also merged into the depth separable convolution layer, only two blocks are needed: (1) the input memory of the depth separable convolution layer and (2) the output memory of the depth separable convolution layer. That is, the optimized neural network model achieves the effect of saving memory; moreover, the convolution operation and the activation operation can be completed in a single for loop, whereas the neural network model to be optimized needs two for loops to complete them, so the operation performance of the operator is effectively improved.
According to the embodiment of the application, by studying the calculation formulas of the depth separable convolution layer, the FusedBatchNorm layer and the activation layer, the FusedBatchNorm layer and the activation layer are merged into the depth separable convolution layer at model conversion, which does not affect the implementation of the depth separable convolution layer; and since the computation is converted into an offline operation, the running time of the neural network model is shortened and memory occupation is reduced.
In yet another embodiment of the present application, after determining the neural network model to be optimized, a determination is made as to whether the depth separable convolutional layer, the FusedBatchNorm layer, and the active layer are connected sequentially. Referring to fig. 5, a flow chart of another method for optimizing a neural network model according to an embodiment of the present application is shown. As shown in fig. 5, after S101, the method may further include:
s501: determining whether the depth separable convolutional layer, the fusedbachnum layer, and the active layer are sequentially connected.
Further, after S501, the method may further include:
s502: when the depth separable convolutional layer, the fusedbachnum layer and the active layer are not connected in sequence, the step of fusing the depth separable convolutional layer, the fusedbachnum layer and the active layer is not performed, and the neural network model to be optimized is maintained;
s503: and inputting the data to be processed into the neural network model to be optimized for operation processing.
It should be noted that, after step S501, if the depth separable convolution layer, the FusedBatchNorm layer, and the activation layer are not sequentially connected, steps S502 and S503 will be performed; if they are sequentially connected, steps S102 and S103 will be performed, followed by step S504:
S504: and inputting the data to be processed into the optimized neural network model for operation processing.
Thus, the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer. In order to determine whether to optimize the neural network model to be optimized, it is necessary to determine whether the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are sequentially connected. When they are not connected in sequence, the neural network model to be optimized is not optimized according to the flow shown in fig. 1, and the original neural network model to be optimized is maintained; when they are connected in sequence, the neural network model to be optimized is optimized according to the flow shown in fig. 1 to obtain an optimized neural network model. The optimized neural network model has moved part of the online operations (such as those of the FusedBatchNorm layer and the activation layer) to the operations performed when the model is converted; that is, through offline optimization at model conversion, the run-time calculation of the FusedBatchNorm layer and the activation layer is shifted into the conversion step. The FusedBatchNorm layer and the activation layer can therefore be removed, the operation speed of the neural network is obviously improved on the premise of ensuring no loss of precision, memory is saved, and memory-copy work is reduced.
In the embodiment of the application, if the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are not all connected in sequence, but the depth separable convolution layer and the FusedBatchNorm layer are connected in sequence, with the FusedBatchNorm layer arranged after the depth separable convolution layer, the depth separable convolution layer and the FusedBatchNorm layer can still be fused or merged, so that the FusedBatchNorm layer can be removed during model conversion; on the premise of ensuring no loss of precision, the purposes of improving the operation speed of the neural network model and saving memory can thus still be achieved.
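As a hedged sketch of this two-layer case only (the conv and bn objects and their fields are hypothetical), the conversion-time rewrite could reuse the fold_batchnorm routine shown earlier:

```python
# Hypothetical conversion-time rewrite for the Conv -> FusedBatchNorm case
# (no activation layer follows), reusing fold_batchnorm from the sketch above.
def fuse_conv_bn_only(conv, bn):
    """Fold a FusedBatchNorm layer that directly follows a depth separable
    convolution into that convolution; the BN layer can then be dropped
    from the graph at model conversion."""
    conv.w, conv.b = fold_batchnorm(conv.w, conv.b,
                                    bn.gamma, bn.beta,
                                    bn.mean, bn.variance, bn.epsilon)
    return conv  # the FusedBatchNorm layer is removed from the model
```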
This embodiment provides an optimization method for a neural network model and describes the specific implementation of the foregoing embodiment in detail. It can be seen that, since the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are fused in the optimized neural network model, part of the online operations (such as those of the FusedBatchNorm layer and the activation layer) are moved to the operations performed when the model is converted; because these become offline operations at model conversion, the running time of the model can be reduced, and the operation speed of the neural network is remarkably improved on the premise that no precision is lost. In addition, optimization at model conversion is most effective, so the performance of the neural network model can be improved and memory occupation can be reduced.
In still another embodiment of the present application, based on the same inventive concept as the previous embodiment, referring to fig. 6, a schematic structural diagram of an optimizing apparatus 60 of a neural network model according to an embodiment of the present application is shown. As shown in fig. 6, the optimizing means 60 of the neural network model may include: a determination unit 601, a fusion unit 602 and an optimization unit 603, wherein,
the determining unit 601 is configured to determine a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer;
the fusion unit 602 is configured to fuse the depth separable convolution layer, the FusedBatchNorm layer and the activation layer when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are sequentially connected, so as to obtain a fused convolution layer;
the optimizing unit 603 is configured to optimize the neural network model to be optimized through the fusion convolutional layer, and obtain an optimized neural network model.
In the above-mentioned aspect, referring to fig. 6, the optimizing apparatus 60 of the neural network model may further include an operation unit 604 configured to input data to be processed into the optimized neural network model for operation processing, and output target data.
In the above solution, the fusion unit 602 is specifically configured to merge the depth separable convolution layer with the FusedBatchNorm layer to obtain a merged convolution layer, and to fuse the merged convolution layer with the activation layer to obtain the fused convolution layer.
In the above-described scheme, referring to fig. 6, the optimizing apparatus 60 of the neural network model may further include an acquiring unit 605 and an updating unit 606, wherein,
the acquiring unit 605 is configured to acquire an initial weight parameter and an initial bias parameter of the depth separable convolution layer and a normalization parameter of the FusedBatchNorm layer;
the updating unit 606 is configured to update the initial weight parameter and the initial bias parameter based on the normalization parameter, so as to obtain an updated weight parameter and an updated bias parameter;
the fusion unit 602 is specifically configured to obtain the merged convolution layer based on the updated weight parameter and the updated bias parameter.
In the above-mentioned scheme, referring to fig. 6, the optimizing apparatus 60 of the neural network model may further include an activating unit 607 configured to set an activation identifier in the merged convolution layer, the activation identifier being used for determining the activation type of the neural network model to be optimized, and to perform an activation operation on the output value of the merged convolution layer according to the activation type determined by the activation identifier, so as to realize the fusion of the merged convolution layer and the activation layer and obtain the fused convolution layer.
In the above scheme, the activation type at least comprises one of the following: a Relu type, a Relu1 type, a Relu6 type, a Tanh type, a Logistic type, and a Softmax type.
In the above-described aspect, referring to fig. 6, the optimizing apparatus 60 of the neural network model may further include a determining unit 608 configured to determine whether the depth separable convolution layer, the FusedBatchNorm layer, and the activation layer are sequentially connected.
In the above solution, the determining unit 608 is further configured to, when the depth separable convolution layer, the FusedBatchNorm layer, and the activation layer are not sequentially connected, not perform the step of fusing the depth separable convolution layer, the FusedBatchNorm layer, and the activation layer, and maintain the neural network model to be optimized;
the operation unit 604 is further configured to input the data to be processed into the neural network model to be optimized for operation processing.
It will be appreciated that in this embodiment, the "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and may of course be a module, or may be non-modular. Furthermore, the components in the present embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional modules.
The integrated unit, if implemented in the form of a software functional module and not sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of this embodiment may be embodied essentially in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
Accordingly, the present embodiment provides a computer storage medium storing an optimization program of a neural network model, which when executed by at least one processor implements the method of any of the preceding embodiments.
Based on the composition of the optimization apparatus 60 for the neural network model and the computer storage medium described above, and referring to fig. 7, a specific example of the hardware structure of the optimization apparatus 60 for the neural network model according to an embodiment of the present application is shown, which may include: a communication interface 701, a memory 702, and a processor 703, the components being coupled together by a bus system 704. It is appreciated that the bus system 704 is used to enable connection and communication between these components. In addition to the data bus, the bus system 704 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are labeled as the bus system 704 in fig. 7. The communication interface 701 is configured to receive and send signals in the process of receiving and sending information with other external network elements;
a memory 702 for storing a computer program capable of running on the processor 703;
a processor 703 for executing, when running the computer program:
determining a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer;
when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, fusing the depth separable convolution layer, the FusedBatchNorm layer and the activation layer to obtain a fused convolution layer;
And optimizing the neural network model to be optimized through the fusion convolution layer to obtain an optimized neural network model.
It is to be appreciated that the memory 702 in embodiments of the application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 702 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The processor 703 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 703 or by instructions in the form of software. The processor 703 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 702, and the processor 703 reads the information in the memory 702 and completes the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP devices, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof. For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein, and software code may be stored in memory and executed by a processor.
Optionally, as another embodiment, the processor 703 is further configured to perform the method of any of the preceding embodiments when running the computer program.
It should be noted that, in the present application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The methods disclosed in the method embodiments provided by the application can be arbitrarily combined under the condition of no conflict to obtain a new method embodiment.
The features disclosed in the several product embodiments provided by the application can be combined arbitrarily under the condition of no conflict to obtain new product embodiments.
The features disclosed in the embodiments of the method or the apparatus provided by the application can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method of optimizing a neural network model for face recognition or object recognition, the method comprising:
Determining a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer;
when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are connected in sequence, fusing the depth separable convolution layer, the FusedBatchNorm layer and the activation layer to obtain a fused convolution layer;
optimizing the neural network model to be optimized through the fusion convolution layer to obtain an optimized neural network model;
and inputting the data to be identified into the optimized neural network model for identification processing, and outputting a face identification result or an object identification result.
2. The method of claim 1, wherein the fusing the depth separable convolutional layer, the fusedbachnum layer, and the active layer to obtain a fused convolutional layer comprises:
combining the depth separable convolutional layer with the FusedBatchNorm layer to obtain a combined convolutional layer;
and merging the combined convolutional layer with the activation layer to obtain the fused convolutional layer.
3. The method of claim 2, wherein said combining the depth separable convolutional layer with the FusedBatchNorm layer to obtain a combined convolutional layer comprises:
Acquiring initial weight parameters and initial bias parameters of the depth separable convolutional layer and normalization parameters of the FusedBatchNorm layer;
updating the initial weight parameter and the initial bias parameter based on the normalization parameter to obtain an updated weight parameter and an updated bias parameter;
and obtaining the combined convolutional layer based on the updated weight parameter and the updated bias parameter.
4. The method of claim 2, wherein the merging the combined convolutional layer with the activation layer to obtain the fused convolutional layer comprises:
setting an activation identifier in the combined convolutional layer; the activation identifier is used for determining the activation type of the neural network model to be optimized;
and performing an activation operation on the output value of the combined convolutional layer according to the activation type determined by the activation identifier, so as to realize the fusion of the combined convolutional layer and the activation layer and obtain the fused convolutional layer.
5. The method of claim 4, wherein the activation type comprises at least one of: a Relu type, a Relu1 type, a Relu6 type, a Tanh type, a Logistic type, and a Softmax type.
6. The method according to any one of claims 1 to 5, wherein after said determining a neural network model to be optimized, the method further comprises:
determining whether the depth separable convolutional layer, the FusedBatchNorm layer, and the activation layer are sequentially connected.
7. The method of claim 6, wherein after said determining whether said depth separable convolutional layer, said FusedBatchNorm layer, and said activation layer are sequentially connected, said method further comprises:
when the depth separable convolutional layer, the FusedBatchNorm layer and the activation layer are not connected in sequence, the step of fusing the depth separable convolutional layer, the FusedBatchNorm layer and the activation layer is not performed, and the neural network model to be optimized is maintained;
and inputting the data to be processed into the neural network model to be optimized for operation processing.
8. An optimization apparatus for a neural network model, wherein the neural network model is used for face recognition or object recognition, the apparatus comprising: a determining unit, a fusion unit, an optimizing unit and an operation unit, wherein,
the determining unit is configured to determine a neural network model to be optimized; the neural network model to be optimized at least comprises a depth separable convolution layer, a FusedBatchNorm layer and an activation layer;
The fusion unit is configured to fuse the depth separable convolution layer, the FusedBatchNorm layer and the activation layer when the depth separable convolution layer, the FusedBatchNorm layer and the activation layer are sequentially connected to obtain a fused convolution layer;
the optimizing unit is configured to optimize the neural network model to be optimized through the fusion convolution layer, and obtain an optimized neural network model;
the operation unit is configured to input the data to be identified into the optimized neural network model for identification processing and output a face identification result or an object identification result.
9. The apparatus of claim 8, wherein the fusion unit is specifically configured to combine the depth separable convolutional layer with the FusedBatchNorm layer to obtain a combined convolutional layer, and to merge the combined convolutional layer with the activation layer to obtain the fused convolutional layer.
10. The apparatus of claim 9, further comprising an acquisition unit and an update unit, wherein,
the acquisition unit is configured to acquire initial weight parameters and initial bias parameters of the depth separable convolution layer and normalization parameters of the FusedBatchNorm layer;
the update unit is configured to update the initial weight parameter and the initial bias parameter based on the normalization parameter to obtain an updated weight parameter and an updated bias parameter;
the fusion unit is specifically configured to obtain the merged convolutional layer based on the updated weight parameter and the updated bias parameter.
11. The apparatus of claim 9, further comprising an activation unit configured to set an activation identifier in the merged convolutional layer; the activation identifier is used for determining the activation type of the neural network model to be optimized; and to perform an activation operation on the output value of the merged convolutional layer according to the activation type determined by the activation identifier, so as to fuse the merged convolutional layer with the activation layer and obtain the fused convolutional layer.
12. The apparatus of claim 11, wherein the activation type comprises at least one of: a Relu type, a Relu1 type, a Relu6 type, a Tanh type, a Logistic type, and a Softmax type.
13. The apparatus according to any one of claims 8 to 12, further comprising a determination unit configured to determine whether the depth separable convolution layer, the FusedBatchNorm layer, and the activation layer are sequentially connected.
14. The apparatus of claim 13, wherein the determination unit is further configured to, when the depth separable convolution layer, the FusedBatchNorm layer, and the activation layer are not sequentially connected, skip the step of fusing the depth separable convolution layer, the FusedBatchNorm layer, and the activation layer and maintain the neural network model to be optimized;
the operation unit is further configured to input the data to be processed into the neural network model to be optimized for operation processing.
15. An apparatus for optimizing a neural network model, the apparatus comprising: a memory and a processor; wherein,
the memory is used for storing a computer program capable of running on the processor;
the processor is configured to perform the method of any one of claims 1 to 7 when running the computer program.
16. A computer storage medium storing an optimization program for a neural network model, wherein the optimization program, when executed by at least one processor, implements the method of any one of claims 1 to 7.
CN201910656769.1A 2019-07-19 2019-07-19 Optimization method and device for neural network model and computer storage medium Active CN110378470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910656769.1A CN110378470B (en) 2019-07-19 2019-07-19 Optimization method and device for neural network model and computer storage medium

Publications (2)

Publication Number Publication Date
CN110378470A CN110378470A (en) 2019-10-25
CN110378470B true CN110378470B (en) 2023-08-18

Family

ID=68254267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910656769.1A Active CN110378470B (en) 2019-07-19 2019-07-19 Optimization method and device for neural network model and computer storage medium

Country Status (1)

Country Link
CN (1) CN110378470B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116081B (en) * 2020-09-29 2023-09-08 杭州海康威视数字技术股份有限公司 Optimization method and device for deep learning network
CN113128670B (en) * 2021-04-09 2024-03-19 南京大学 Neural network model optimization method and device
CN113297860A (en) * 2021-06-24 2021-08-24 上海携旅信息技术有限公司 Method, system, electronic device and storage medium for optimizing machine translation model
CN113935470A (en) * 2021-10-27 2022-01-14 安谋科技(中国)有限公司 Method for operating neural network model, medium, and electronic device
CN115601242B (en) * 2022-12-13 2023-04-18 电子科技大学 Lightweight image super-resolution reconstruction method suitable for hardware deployment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018140294A1 (en) * 2017-01-25 2018-08-02 Microsoft Technology Licensing, Llc Neural network based on fixed-point operations
CN108009634A (en) * 2017-12-21 2018-05-08 美的集团股份有限公司 A kind of optimization method of convolutional neural networks, device and computer-readable storage medium
WO2019127838A1 (en) * 2017-12-29 2019-07-04 国民技术股份有限公司 Method and apparatus for realizing convolutional neural network, terminal, and storage medium
CN110009648A (en) * 2019-03-04 2019-07-12 东南大学 Trackside image Method of Vehicle Segmentation based on depth Fusion Features convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
He Zhichao; Zhao Longzhang; Chen Chuang. Multi-resolution feature fusion convolutional neural network for facial expression recognition. Laser & Optoelectronics Progress, 2018, (07), full text. *

Similar Documents

Publication Publication Date Title
CN110378470B (en) Optimization method and device for neural network model and computer storage medium
EP3460792B1 (en) Optimization method and terminal device suitable for model of pattern recognition
WO2020199693A1 (en) Large-pose face recognition method and apparatus, and device
CN112634170B (en) Method, device, computer equipment and storage medium for correcting blurred image
US20220020064A1 (en) Feature processing method and apparatus for artificial intelligence recommendation model, electronic device, and storage medium
CN110162766B (en) Word vector updating method and device
EP3889846A1 (en) Deep learning model training method and system
WO2022166069A1 (en) Deep learning network determination method and apparatus, and electronic device and storage medium
CN111459964B (en) Log anomaly detection method and device based on Word2vec for template
EP4239585A1 (en) Video loop recognition method and apparatus, computer device, and storage medium
CN112687266B (en) Speech recognition method, device, computer equipment and storage medium
CN113505797B (en) Model training method and device, computer equipment and storage medium
CN111242273B (en) Neural network model training method and electronic equipment
CN110598210B (en) Entity recognition model training, entity recognition method, entity recognition device, entity recognition equipment and medium
EP4060526A1 (en) Text processing method and device
CN111898735A (en) Distillation learning method, distillation learning device, computer equipment and storage medium
CN113241064B (en) Speech recognition, model training method and device, electronic equipment and storage medium
CN113705929B (en) Spring festival holiday load prediction method based on load characteristic curve and typical characteristic value fusion
CN111311485A (en) Image processing method and related device
EP4206989A1 (en) Data processing method, neural network training method, and related device
CN111401032B (en) Text processing method, device, computer equipment and storage medium
CN113052208B (en) Vision-based coal rock identification method, storage medium and electronic equipment
CN110780850B (en) Requirement case auxiliary generation method and device, computer equipment and storage medium
CN112489687A (en) Speech emotion recognition method and device based on sequence convolution
CN112241786B (en) Determination method and device for model super-parameters, computing device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant