CN110033083B - Convolutional neural network model compression method and device, storage medium and electronic device - Google Patents


Info

Publication number
CN110033083B
CN110033083B
Authority
CN
China
Prior art keywords
neural network
network model
target
convolutional
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910251951.9A
Other languages
Chinese (zh)
Other versions
CN110033083A (en)
Inventor
金坤
李峰
赵世杰
左小祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910251951.9A priority Critical patent/CN110033083B/en
Publication of CN110033083A publication Critical patent/CN110033083A/en
Application granted granted Critical
Publication of CN110033083B publication Critical patent/CN110033083B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network model compression method and device, a storage medium and an electronic device. The method comprises the following steps: merging parameters of a first batch normalization layer in the convolutional neural network model into a first convolutional layer of the model to generate a first target neural network model containing a first target convolutional layer, where the convolutional neural network model and the first target neural network model produce the same output for the same input; deleting the convolution kernels in the first target convolutional layer whose norms are smaller than a first threshold to obtain a second target neural network model; and compressing the second target neural network model to obtain a third target neural network model. The method solves the technical problems of low use efficiency and poor flexibility of neural network models in the related art.

Description

Convolutional neural network model compression method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to a convolutional neural network model compression method and apparatus, a storage medium, and an electronic apparatus.
Background
In the related art, neural network models have a huge number of parameters and are difficult to apply directly in end products.
In addition, because the number of parameters is huge, memory consumption is high and the neural network model is used inefficiently. Moreover, the high memory consumption restricts the model to fixed scenarios, so the flexibility of using the neural network model is poor.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a convolutional neural network model compression method and device, a storage medium and an electronic device, so as to at least solve the technical problems of low use efficiency and poor flexibility of neural network models in the related art.
According to an aspect of an embodiment of the present invention, there is provided a convolutional neural network model compression method, including: merging parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer in the convolutional neural network model to generate a first target neural network model containing a first target convolutional layer, where the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer located after the first convolutional layer in the convolutional neural network model and connected to it; deleting the convolution kernels in the first target convolutional layer of the first target neural network model whose norms are smaller than a first threshold to obtain a second target neural network model, where the memory occupancy of the second target neural network model is smaller than that of the first target neural network model; and compressing the second target neural network model to obtain a third target neural network model.
According to another aspect of an embodiment of the present invention, there is also provided a method of using a convolutional neural network model, including: acquiring parameters to be input; inputting the parameters into a third target neural network model, where the third target neural network model is obtained by merging parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer of the model, deleting target convolution kernels in the first target convolutional layer after generating a first target neural network model containing the first target convolutional layer, and compressing the second target neural network model from which the target convolution kernels were deleted, where the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer located after the first convolutional layer and connected to it; and obtaining the result output by the third target neural network model.
According to still another aspect of an embodiment of the present invention, there is also provided a convolutional neural network model compression device, including: a merging unit, configured to merge parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer in the convolutional neural network model and generate a first target neural network model including a first target convolutional layer, where the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer located after the first convolutional layer and connected to it; a deleting unit, configured to delete the convolution kernels in the first target convolutional layer whose norms are smaller than a first threshold to obtain a second target neural network model, where the memory occupancy of the second target neural network model is smaller than that of the first target neural network model; and a compression unit, configured to compress the second target neural network model to obtain a third target neural network model.
As an optional example, the above device further includes: a first determining unit, configured to determine the P-norm of each group of convolution kernels in the first target convolutional layer before obtaining the second target neural network model, where P is 0, 1, 2, or positive infinity; the deleting unit includes: a first setting module, configured to set a pruning rate for the groups of convolution kernels; a sorting module, configured to sort the P-norms of the groups of convolution kernels in descending order; and a deleting module, configured to delete the target groups of convolution kernels whose P-norm is smaller than the first threshold until the pruning rate requirement is met, so as to obtain the second target neural network model, where the ratio of the number of convolution kernels in the second target neural network model to the number of convolution kernels in the first target neural network model is smaller than a preset threshold.
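The pruning procedure described above might be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: the function name, the use of the 2-norm, and the interpretation of the pruning rate as a maximum fraction of groups to drop are assumptions.

```python
import numpy as np

def prune_groups(weight, threshold, prune_rate):
    """Delete kernel groups whose 2-norm is below `threshold`, dropping at
    most `prune_rate` of the groups (illustrative reading of the claims).

    weight: array of shape (M, N, kh, kw) — group i is the N kernels that
    produce output feature map i. Returns the pruned weights and the
    indices of the surviving groups.
    """
    M = weight.shape[0]
    norms = np.linalg.norm(weight.reshape(M, -1), axis=1)
    order = np.argsort(norms)                 # weakest groups first
    max_drop = int(prune_rate * M)            # pruning-rate budget
    drop = [i for i in order[:max_drop] if norms[i] < threshold]
    keep = np.array(sorted(set(range(M)) - set(drop)))
    return weight[keep], keep
```

A group survives either because its norm clears the threshold or because the pruning-rate budget is already spent, matching the "delete until the pruning rate requirement is met" wording.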
As an optional example, the deleting module includes: a deleting submodule, configured to set the weights of the target groups of convolution kernels whose P-norm is smaller than the first threshold to 0.
As an optional example, the above device further includes: a second determining unit, configured to, before the parameters of the first batch normalization layer are merged into the first convolutional layer, take every convolutional layer except the last convolutional layer on each branch as a first convolutional layer in the case where the convolutional neural network model includes cross-layer connection branches and/or multiple series branches formed by convolutional layers.
As an optional example, the merging unit includes: a first calculating module, configured to calculate the N convolution kernels of the first target convolutional layer corresponding to the ith output feature map, according to the N convolution kernels of the first convolutional layer corresponding to its ith output feature map and the parameters of the first batch normalization layer corresponding to that feature map; a second calculating module, configured to calculate the offset corresponding to the ith output feature map of the first target convolutional layer, according to the offset corresponding to the ith output feature map of the first convolutional layer and the corresponding parameters of the first batch normalization layer; and a third calculating module, configured to determine the ith output feature map of the first target convolutional layer from the convolution sum of the N input feature maps of the first convolutional layer with the N convolution kernels of the first target convolutional layer corresponding to the ith output feature map, plus the offset of the first target convolutional layer corresponding to the ith output feature map, where N is the number of input feature maps of the first convolutional layer, i is greater than or equal to 1 and less than or equal to M, and M is the number of output feature maps of the first target convolutional layer.
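The merge described above is the standard batch-normalization folding computation: each output channel's kernels and offset are rescaled by the BN parameters so a plain convolution reproduces conv followed by BN. A minimal numpy sketch (the function name, shape convention, and `eps` default are illustrative assumptions, not from the patent):

```python
import numpy as np

def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch normalization layer into the preceding convolution.

    weight: (M, N, kh, kw) conv kernels, bias: (M,) offsets.
    gamma, beta, mean, var: (M,) BN parameters, one per output feature map.
    Returns (weight', bias') such that conv(weight', bias') applied alone
    equals conv(weight, bias) followed by BN.
    """
    scale = gamma / np.sqrt(var + eps)        # per-output-channel rescaling
    w = weight * scale[:, None, None, None]   # K'_ij = scale_i * K_ij
    b = (bias - mean) * scale + beta          # b'_i = scale_i*(b_i - mu_i) + beta_i
    return w, b
```

Because the fold is exact, the merged model produces the same output as the original for the same input, as the claims require.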
As an optional example, the first determining unit includes: a first determining module, configured to determine the P-norms of the N convolution kernels of the first target convolutional layer corresponding to the ith output feature map, where i is greater than or equal to 1 and less than or equal to M, M is the number of output feature maps of the first target convolutional layer, N is the number of input feature maps of the first convolutional layer, and the ith group of convolution kernels in the first target convolutional layer consists of the N convolution kernels corresponding to the ith output feature map; a second determining module, configured to determine the sum of the Pth powers of the P-norms of the N convolution kernels corresponding to the ith output feature map; and a third determining module, configured to take the (1/P)th power of that sum as the P-norm of the ith group of convolution kernels of the first target convolutional layer.
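The three modules above compute a group norm: summing the Pth powers of the per-kernel P-norms and taking the (1/P)th power of the sum is the same as taking the P-norm of the flattened group. A numpy sketch under that reading (the helper name and shape convention are assumptions):

```python
import numpy as np

def group_p_norm(weight, p=2):
    """P-norm of each group of kernels, one group per output feature map.

    weight: (M, N, kh, kw). For group i, sum(|K|^p) over its N kernels,
    then raise to 1/p — identical to the p-norm of the flattened group.
    Returns an array of M group norms.
    """
    M = weight.shape[0]
    flat = np.abs(weight).reshape(M, -1)      # one row per group
    return (flat ** p).sum(axis=1) ** (1.0 / p)
```

For p = 2 this reduces to the familiar Euclidean norm of each group, the usual choice in filter-pruning work.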
As an optional example, the compression unit includes: a second setting module, configured to set a compression rate ρ, where 0 < ρ < 1; and a compression module, configured to compress the second target neural network model layer by layer until the set compression rate ρ is met.
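The layer-by-layer compression toward a target rate ρ might look like the following sketch. The round-robin layer order, the weakest-group-first deletion, and the one-group-per-layer floor are illustrative assumptions; the patent only specifies compressing layer by layer until ρ is met.

```python
import numpy as np

def compress_until(layers, rho):
    """Delete the weakest kernel group from one layer at a time until the
    surviving/original parameter ratio is <= rho (0 < rho < 1).

    layers: list of (M, N, kh, kw) weight arrays. Returns compressed copies.
    """
    total = sum(w.size for w in layers)
    layers = [w.copy() for w in layers]
    kept = total
    progress = True
    while kept / total > rho and progress:
        progress = False
        for li, w in enumerate(layers):
            if kept / total <= rho:
                break
            if w.shape[0] > 1:                # keep at least one group per layer
                M = w.shape[0]
                norms = np.linalg.norm(w.reshape(M, -1), axis=1)
                weakest = int(np.argmin(norms))
                kept -= w[weakest].size
                layers[li] = np.delete(w, weakest, axis=0)
                progress = True
    return layers
```

The `progress` flag stops the loop if every layer is down to a single group before ρ is reached, rather than spinning forever.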
As an optional example, the above device further includes: a first acquisition unit, configured to acquire a face image of a person after the second target neural network model is compressed into the third target neural network model; an input unit, configured to input the face image into the third target neural network model; and a second acquisition unit, configured to obtain the result output by the third target neural network model, where the output result is the age of the person in the face image.
According to still another aspect of an embodiment of the present invention, there is also provided a device for using a convolutional neural network model, including: a first acquisition unit, configured to acquire parameters to be input; an input unit, configured to input the parameters into a third target neural network model, where the third target neural network model is obtained by merging parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer of the model, deleting target convolution kernels in the first target convolutional layer after generating a first target neural network model containing the first target convolutional layer, and compressing the second target neural network model from which the target convolution kernels were deleted, where the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer located after the first convolutional layer and connected to it; and a second acquisition unit, configured to obtain the result output by the third target neural network model.
As an optional example, the above device further includes: a determining unit, configured to, before the parameters of the first batch normalization layer are merged into the first convolutional layer, take every convolutional layer except the last convolutional layer on each branch as a first convolutional layer in the case where the convolutional neural network model includes cross-layer connection branches and/or multiple series branches formed by convolutional layers.
As an optional example, the first acquisition unit includes: a first acquisition module, configured to acquire a face image of a person; the input unit includes: an input module, configured to input the face image into the third target neural network model; and the second acquisition unit includes: a second acquisition module, configured to obtain the age of the person in the face image as output by the third target neural network model.
According to yet another aspect of an embodiment of the present invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is configured to perform the convolutional neural network model compression method described above when run.
According to still another aspect of the embodiment of the present invention, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the convolutional neural network model compression method described above through the computer program.
In the embodiment of the invention, the parameters of a first batch normalization layer in the convolutional neural network model are merged into the first convolutional layer to generate a first target neural network model containing a first target convolutional layer; the convolution kernels in the first target convolutional layer whose norms are smaller than a first threshold are deleted to obtain a second target neural network model; and the second target neural network model is compressed to obtain a third target neural network model. In this method, merging the first batch normalization layer into the first convolutional layer generates the first target neural network model; deleting the convolution kernels whose norms are smaller than the first threshold yields the second target neural network model with a smaller memory occupancy; and compressing the second target neural network model yields the converged third target neural network model. The convolutional neural network model is thereby clipped, its volume and memory occupancy are reduced, and its use efficiency is improved. Further, because the volume of the model is reduced, it is no longer limited to a specific use environment, which improves its flexibility. The method solves the technical problems of low use efficiency and poor flexibility of neural network models in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic illustration of an application environment of an alternative convolutional neural network model compression method in accordance with an embodiment of the present application;
FIG. 2 is a flow chart of an alternative convolutional neural network model compression method in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative convolutional neural network model compression method in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of another alternative convolutional neural network model compression method in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another alternative convolutional neural network model compression method in accordance with an embodiment of the present application;
FIG. 6 is a schematic diagram of yet another alternative convolutional neural network model compression method in accordance with an embodiment of the present application;
FIG. 7 is a flow chart of another alternative convolutional neural network model compression method in accordance with an embodiment of the present application;
FIG. 8 is a flow chart of an alternative convolutional neural network model usage method in accordance with an embodiment of the present application;
FIG. 9 is a schematic diagram of an alternative convolutional neural network model compression device in accordance with an embodiment of the present invention;
FIG. 10 is a schematic diagram of another alternative convolutional neural network model compression device in accordance with an embodiment of the present invention;
FIG. 11 is a schematic diagram of a further alternative convolutional neural network model compression device in accordance with an embodiment of the present invention;
FIG. 12 is a schematic diagram of an alternative convolutional neural network model compression device in accordance with an embodiment of the present invention;
FIG. 13 is a schematic diagram of an alternative convolutional neural network model-using device in accordance with an embodiment of the present invention;
FIG. 14 is a schematic diagram of another alternative convolutional neural network model-using device in accordance with an embodiment of the present invention;
FIG. 15 is a schematic diagram of an alternative electronic device according to an embodiment of the invention;
FIG. 16 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiment of the present invention, there is provided a convolutional neural network model compression method, optionally, as an optional implementation manner, the convolutional neural network model compression method may be applied, but not limited to, in the environment shown in fig. 1.
As shown in fig. 1, the environment includes a terminal 104 and a server 112. The terminal 104 is responsible for human-computer interaction with the user 102 and includes a memory 106 for storing interaction information input by the user 102 and a processor 108 for processing that information or transmitting it to the server 112 via a network 110. The server 112 includes a database 114 for storing the interaction information of the user 102 and a processing engine 116 for processing or forwarding it.
The terminal 104 displays a user interface 104-2 that includes a content input button 104-4 and a result display area 104-6. The content input button is used to detect a data input instruction; after detecting the instruction, the terminal 104 receives a parameter, inputs the parameter into the target convolutional neural network model, and displays the result output by the model in the result display area 104-6.
Alternatively, the terminal 104 may be, but is not limited to, a mobile phone, a tablet, a notebook, a PC, or another physical terminal, and the network 110 may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, Wi-Fi, and other networks that enable wireless communication. The wired network may include, but is not limited to, a wide area network, a metropolitan area network, or a local area network. The server 112 may include, but is not limited to, any hardware device capable of performing calculations.
It should be noted that fig. 1 is only an example; the manner of displaying the result is not limited to the interface display, and the result may also be output as sound. Likewise, the input of parameters is not limited to the content input button 104-4; other modes, such as sensor acquisition, camera capture, or voice input, may be used. This embodiment is not limited in this regard.
According to this embodiment, the first batch normalization layer in the convolutional neural network model is merged into the first convolutional layer to generate the first target neural network model; the convolution kernels in the first target neural network model whose norms are smaller than the first threshold are deleted to obtain the second target neural network model with a smaller memory occupancy; and the second target neural network model is trained to obtain the converged third target neural network model. Further, because the volume of the convolutional neural network model is reduced, it is not limited to a specific use environment, which improves its flexibility.
Optionally, as an optional embodiment, as shown in fig. 2, the convolutional neural network model compression method includes:
S202, merging parameters of a first batch normalization layer in the convolutional neural network model into a first convolutional layer in the convolutional neural network model to generate a first target neural network model containing a first target convolutional layer, where the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer located after the first convolutional layer and connected to it;
S204, deleting the convolution kernels in the first target convolutional layer of the first target neural network model whose norms are smaller than a first threshold to obtain a second target neural network model, where the memory occupancy of the second target neural network model is smaller than that of the first target neural network model;
S206, compressing the second target neural network model to obtain a third target neural network model.
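For a single conv+BN pair, the three steps above can be sketched end to end in numpy: fold the BN parameters into the convolution (S202), delete the kernel groups whose norm falls below the threshold (S204), and keep the smaller model (S206). The function name, shapes, and threshold are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def compress_model(conv_w, conv_b, bn, threshold=0.5, eps=1e-5):
    """Sketch of S202-S206 for one conv layer followed by one BN layer.

    conv_w: (M, N, kh, kw) kernels, conv_b: (M,) offsets,
    bn: (gamma, beta, mean, var), each of shape (M,).
    """
    gamma, beta, mean, var = bn
    # S202: fold BN into the conv -> first target convolutional layer
    scale = gamma / np.sqrt(var + eps)
    w = conv_w * scale[:, None, None, None]
    b = (conv_b - mean) * scale + beta
    # S204: delete kernel groups whose 2-norm is below the threshold
    M = w.shape[0]
    norms = np.linalg.norm(w.reshape(M, -1), axis=1)
    keep = norms >= threshold
    # S206: the surviving weights form the compressed model
    return w[keep], b[keep]
```

In practice S204 would also drop the matching input channels of the next layer, since the deleted output feature maps no longer exist.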
Alternatively, the above compression method can be applied, without limitation, to any field that uses a convolutional neural network model. For example, consider applying a convolutional neural network model to a scene that automatically identifies fruit quality. With the prior art, the model's volume is large and its memory occupancy high, so using it to identify whether fruit is good consumes many resources and identification efficiency is low. Compressing the model with the convolutional neural network model compression method in this scheme reduces its volume and memory consumption. For example, the compressed model can be installed in a mobile phone, so that the phone can judge whether fruit is good from pictures of the fruit, improving the use efficiency and flexibility of the convolutional neural network model.
Alternatively, the above convolutional neural network model is applied to a scene that identifies a person's age. Compressing the model reduces its volume and memory consumption, so the compressed model can be applied flexibly: for example, a mobile phone carrying the compressed model can take a photograph, identify the photo, and output the age of the photographed person. This improves the use efficiency and flexibility of the convolutional neural network model.
The above is merely an example of the use of the compressed convolutional neural network model, and the present embodiment is not limited to the use form of the convolutional neural network model. Any scene using the convolutional neural network model can be used for compressing the convolutional neural network model, so that the volume of the convolutional neural network model is reduced, the memory consumption of the convolutional neural network model is reduced, and the use efficiency and the use flexibility of the convolutional neural network model are improved.
Alternatively, the first convolutional layer may be any convolutional layer of the convolutional neural network model. A convolutional neural network model may contain cross-layer connection branches and/or multiple series branches. For such a model, all convolutional layers on all branches, except the last convolutional layer of each branch, are determined as first convolutional layers. That is, except for the last convolutional layer of each branch, which is not merged, all remaining convolutional layers are merged as first convolutional layers, compressing the convolutional neural network model while ensuring that the output results of the different branches can still be fused.
Optionally, after the parameters of the first batch normalization layer in the convolutional neural network model are merged into the first convolutional layer, unimportant convolution kernels in the first target convolutional layer obtained after merging are deleted, so that the first target convolutional layer is compressed and the convolutional neural network model is compressed. At this point, optionally, the importance of each group of convolution kernels may be determined using a P-norm. For example, as shown in fig. 3(a) and 3(b), fig. 3(a) includes a first target convolutional layer 302, in which the first output feature map Y′₁ corresponds to a group of convolution kernels K′₁₁, K′₂₁, K′₃₁. After the P-norm of each group of convolution kernels is calculated for each target convolutional layer, the P-norms of the groups are sorted, and a target group of convolution kernels whose P-norm is smaller than the first threshold is deleted. For example, if the P-norm corresponding to the target group K′₁₁, K′₂₁, K′₃₁ is smaller than the first threshold, that group is deleted; that is, the output feature map Y′₁ is deleted, thereby completing the compression of the convolutional neural network model. It should be noted that, because the output feature map Y′₁ is deleted, the subsequent convolution kernels associated with Y′₁ are also deleted. As shown by the compressed first target convolutional layer 304 in fig. 3(b), the convolution kernels following Y′₁ are deleted (all dashed lines in fig. 3(a) are deleted).
In practical applications, the weights of the target group of convolution kernels K′₁₁, K′₂₁, K′₃₁ may be set to 0, so that the output feature map Y′₁ no longer carries valid parameters, achieving the effect of deleting the target group of convolution kernels. The above P may be 1 or 2; that is, the 1-norm or 2-norm of each group of convolution kernels is calculated, and one or more groups of convolution kernels are deleted according to the size of the 1-norm or 2-norm.
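The zero-weight deletion described here can be illustrated with a minimal NumPy sketch. The array layout and the function name are assumptions of this illustration (and zeroing the bias alongside the kernels is an added assumption so the output map becomes identically zero), not part of the embodiment:

```python
import numpy as np

def zero_out_group(K, b, i):
    """Soft-delete the i-th group of convolution kernels by setting its weights to 0.

    K: (M, N, kh, kw) convolution kernels, one group of N kernels per output map.
    b: (M,) per-output-map offsets.
    After zeroing, the i-th output feature map contributes no valid parameters.
    """
    K = K.copy()
    b = b.copy()
    K[i] = 0.0  # the group's kernels no longer transfer data
    b[i] = 0.0  # assumption: also zero the offset so the output map is all zeros
    return K, b
```

Once a group is zeroed, the corresponding output feature map is constant zero, which matches the deletion effect described above.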
After deleting the target set of convolution kernels having a P-norm less than the first threshold, a ratio of a number of all remaining convolution kernels in the second target neural network model including the first target convolution layer to a number of all convolution kernels in the first target neural network model is less than a predetermined threshold. For example, after the target set of convolution kernels is deleted, 500 convolution kernels remain in the second target neural network model, while the total convolution kernels in the first target neural network model are 1000. And setting the preset threshold value to be 60%, wherein the ratio of the number of all the residual convolution kernels in the second target neural network model to the number of all the convolution kernels in the first target neural network model is 50%, and the ratio is smaller than the preset threshold value.
It should be noted that, in the process of deleting target groups of convolution kernels from the first target convolutional layer, a preset compression rate needs to be set for each target convolutional layer, where the preset compression rate determines the degree of compression of the first target convolutional layer. For example, if the preset compression rate is set to 80%, the corresponding 20% of the convolution kernels should be deleted. After the P-norms are sorted, the target group of convolution kernels corresponding to the smallest P-norm is deleted, and then whether the compression rate of the pruned first target convolutional layer is less than 80% is detected. If the compression rate of the first target convolutional layer is less than 80%, the layer has been compressed successfully. If the compression rate is greater than 80%, the layer still needs further compression; the target group corresponding to the current smallest P-norm is deleted again, and this repeats until the compression rate is less than 80%, completing the compression of the first target convolutional layer. It should also be noted that the preset compression rates set for different first target convolutional layers may be the same or different, and specific values are set as needed.
Optionally, after compressing the first target convolutional layer in the convolutional neural network model to obtain a second target neural network model, training the second target neural network model until the second target neural network model converges. At this time, the converged second target neural network model is taken as a third target neural network model, thereby completing the compression of the convolutional neural network model. In the compression process, the second target neural network model may be compressed layer by layer.
By combining parameters of a first batch of standardization layers in the convolutional neural network model into the first convolutional layer, a first target neural network model containing a first target convolutional layer is generated; deleting convolution kernels with norms smaller than a first threshold in a first target convolution layer in a first target neural network model; and compressing the second target neural network model after deleting the target convolution kernel to obtain a third target neural network model, thereby realizing the compression of the convolution neural network model, reducing the volume of the convolution neural network model, and improving the use efficiency and the use flexibility of the convolution neural network model.
As an alternative to this embodiment of the present invention, the method further comprises, before obtaining the second target neural network model: S1, determining the P-norm of each group of convolution kernels in the first target convolutional layer, where P is equal to 0, 1, 2, or positive infinity;
deleting convolution kernels having norms less than a first threshold in a first target convolutional layer in a first target neural network model includes: S1, setting a pruning rate for each group of convolution kernels; S2, arranging the P-norms of the groups of convolution kernels in order from large to small; and S3, deleting the target groups of convolution kernels whose P-norms are smaller than the first threshold until the pruning rate requirement is met, so as to obtain the second target neural network model, wherein the ratio of the number of all remaining convolution kernels in the second target neural network model to the number of all convolution kernels in the first target neural network model is smaller than a preset threshold.
The pruning rate may be used to represent the degree of deletion of a group of convolution kernels and may be represented by a value between 0 and 1. For example, if the pruning rate is 0.6, the group of convolution kernels needs to be compressed to 40% of the original number.
Alternatively, after the P-norms of each set of convolution kernels in the first target convolution layer are obtained, the P-norms may be ordered by set, in order from large to small or in order from small to large. The larger the P-norm, the more important the corresponding set of convolution kernels. After ordering the P-norms, a first threshold is obtained, and a set of convolution kernels with P-norms less than the first threshold are deleted, thereby deleting unimportant convolution kernels. P is preferably 1 or 2.
According to the embodiment, by determining the P-norm of each group of convolution kernels, important convolution kernels are determined according to the P-norms, unimportant convolution kernels are deleted, the effect of improving the accuracy of deleting the convolution kernels is achieved, and the compression efficiency of compressing the convolution neural network model is further guaranteed.
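Steps S1–S3 above might be sketched as follows in NumPy. The array layout, function name, and the convention that the pruning rate is the fraction of groups to delete are assumptions of this illustration, not part of the embodiment:

```python
import numpy as np

def prune_layer(K, b, pruning_rate, p=1):
    """Delete the groups of convolution kernels with the smallest P-norms.

    K: (M, N, kh, kw) kernels of the first target convolutional layer.
    b: (M,) per-output-map offsets.
    pruning_rate: fraction of groups (output feature maps) to delete, in (0, 1).
    Returns pruned kernels/offsets and the indices of the kept groups.
    """
    M = K.shape[0]
    # group P-norm: (sum of |element|^p over the whole group) ** (1/p)
    norms = np.sum(np.abs(K).reshape(M, -1) ** p, axis=1) ** (1.0 / p)
    n_keep = M - int(round(pruning_rate * M))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # largest norms survive
    return K[keep], b[keep], keep
```

Because the deleted output feature maps disappear, the returned `keep` indices also identify which input channels of the following convolutional layer must be retained, matching the deletion of the related subsequent convolution kernels described above.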
As an alternative, deleting the target group convolution kernels corresponding to the P-norm less than the first threshold from the P-norms of each group of convolution kernels includes:
s1, setting the weight of a target group convolution kernel with the P-norm smaller than a first threshold to 0.
Optionally, each convolution kernel is configured with a certain weight during the transfer of data. If the weight is set to 0, the data transferred by the convolution kernel is zero, and valid data is not transferred any more. Therefore, the weight of the target group convolution kernel is set to 0, so that the target group convolution kernel can be prevented from continuously transmitting data, and the purpose of deleting the target group convolution kernel is achieved.
According to the embodiment, the target group convolution kernel is deleted by the method, so that the unimportant convolution kernel in the first target neural network model is deleted, and the compression efficiency of the convolution neural network model is improved.
As an alternative embodiment, before incorporating the parameters of the first batch of normalized layers in the convolutional neural network model into the first convolutional layers in the convolutional neural network model, further comprising:
s1, taking a convolution layer except the last convolution layer on each branch in the convolution neural network model as a first convolution layer when the convolution neural network model comprises cross-layer connection branches formed by the convolution layers and/or multiple layers of series branches.
Optionally, in compressing the convolutional neural network model, it is necessary to determine which convolutional layers of the convolutional neural network model are first convolutional layers. For a convolutional neural network model that has no cross-layer connected branches and no multi-layer series branches, such as a VGGNet model, all convolutional layers of the model may be determined as first convolutional layers, i.e., all convolutional layers are compressed. If the model contains residual modules with cross-layer connections, such as a ResNet or ResNeXt model, only a part of the convolutional layers in the model can be selected as first convolutional layers.
For example, as shown in fig. 4, for a model comprising cross-layer connected branches, the main branch is formed by concatenating 1×1, 3×3, and 1×1 convolutions whose numbers of input channels are C₀, C₁, and C₂ respectively, and the cross-layer branch is the output feature map of the preceding residual module. To avoid affecting the fusion of the output results of the different branches, the number of channels of the output feature map of the main branch must remain unchanged; therefore only the first two convolutional layers are compressed, and the last 1×1 convolutional layer of the main branch is not compressed. In an InceptionNet, where the feature maps output by several branches are fused, the last convolutional layer in each branch is likewise not compressed, and the other convolutional layers are compressed according to the method above.
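This selection rule can be sketched in a few lines of Python. The representation of a model as a list of branches, each an ordered list of convolutional-layer names, is an assumption of this illustration:

```python
def select_first_conv_layers(branches):
    """Select the layers eligible for merging/pruning.

    branches: list of branches, each an ordered list of conv-layer names.
    Returns all convolutional layers except the last one on each branch,
    so the channel count at each branch's fusion point stays unchanged.
    """
    prunable = []
    for branch in branches:
        prunable.extend(branch[:-1])  # last conv of each branch is left uncompressed
    return prunable
```

For a residual main branch of three convolutions, only the first two are returned, matching the example above.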
By the embodiment, the last convolution layer of the convolution layers of the cross-layer connection branches and/or the multi-layer series connection branches is not compressed, so that the compression accuracy of the convolution neural network model is improved.
As an alternative embodiment, incorporating parameters of a first batch of normalized layers in a convolutional neural network model into a first convolutional layer in the convolutional neural network model comprises:
s1, calculating N convolution kernels of the first target convolution layer corresponding to the ith output feature map according to N convolution kernels of the first convolution layer corresponding to the ith output feature map of the first convolution layer and parameters of the first batch of standardization layers corresponding to the ith output feature map of the first convolution layer;
S2, calculating the offset of the first target convolution layer corresponding to the ith output feature map according to the offset of the first convolution layer corresponding to the ith output feature map of the first convolution layer and the parameters of the first batch of standardization layers corresponding to the ith output feature map of the first convolution layer;
S3, determining the ith output feature map of the first target convolutional layer according to the convolutions between the N input feature maps of the first convolutional layer and the N convolution kernels of the first target convolutional layer corresponding to the ith output feature map, together with the offset of the first target convolutional layer corresponding to the ith output feature map, where N is the number of input feature maps of the first convolutional layer, 1 ≤ i ≤ M, and M is the number of output feature maps of the first target convolutional layer.
Alternatively, the first convolutional layer is expressed as:

Yᵢ = Σⱼ Xⱼ ⊛ Kⱼᵢ + bᵢ

and the first batch normalization layer is expressed as:

Ŷᵢ = γᵢ · (Yᵢ − μᵢ)/√(σᵢ² + ε) + βᵢ

where Yᵢ is the ith output feature map of the first convolutional layer, Xⱼ is the jth input feature map of the first convolutional layer, ⊛ is the convolution operation, Kⱼᵢ is the jth convolution kernel of the ith group of the first convolutional layer, bᵢ is the offset of the ith group of convolution kernels of the first convolutional layer, 1 ≤ i ≤ M, 1 ≤ j ≤ N, M is the number of output feature maps, and N is the number of input feature maps; Ŷᵢ is the ith output feature map of the first batch normalization layer, γᵢ and βᵢ are the parameters in the first batch normalization layer corresponding to Yᵢ, μᵢ is the mean saved by the first batch normalization layer corresponding to Yᵢ, σᵢ² is the variance saved by the first batch normalization layer corresponding to Yᵢ, and ε is a non-zero constant.

The parameters of the first batch normalization layer in the convolutional neural network model are merged into the first convolutional layer by the following formulas:

K′ⱼᵢ = γᵢ · Kⱼᵢ/√(σᵢ² + ε)

b′ᵢ = γᵢ · (bᵢ − μᵢ)/√(σᵢ² + ε) + βᵢ

where K′ⱼᵢ is the jth convolution kernel of the ith output feature map of the first target convolutional layer, and b′ᵢ is the offset of the ith output feature map of the first target convolutional layer.
For example, the description is given with reference to fig. 5 and 6. As shown in fig. 5, the jth input feature map of the first convolutional layer is Xⱼ, including X₁, X₂, and X₃; the ith output feature map is Yᵢ, including Y₁, Y₂, and Y₃; the jth convolution kernel of the ith group is Kⱼᵢ, including K₁₁, K₂₁, K₃₁, K₁₂, K₂₂, K₃₂, K₁₃, K₂₃, and K₃₃; the offset of the ith group of convolution kernels is bᵢ, including b₁, b₂, and b₃; and ⊛ represents the convolution operation. The convolution process can be expressed as:

Yᵢ = Σⱼ Xⱼ ⊛ Kⱼᵢ + bᵢ

In the first batch normalization layer, γᵢ and βᵢ represent the parameters corresponding to the ith channel, γᵢ including γ₁, γ₂, γ₃ and βᵢ including β₁, β₂, β₃; μᵢ is the mean saved by the first batch normalization layer corresponding to Yᵢ, including μ₁, μ₂, μ₃; and σᵢ² is the variance saved by the first batch normalization layer corresponding to Yᵢ, including σ₁², σ₂², σ₃². The process of the first batch normalization layer can be expressed as:

Ŷᵢ = γᵢ · (Yᵢ − μᵢ)/√(σᵢ² + ε) + βᵢ

Combining the two formulas above gives:

Ŷᵢ = Σⱼ Xⱼ ⊛ K′ⱼᵢ + b′ᵢ, where K′ⱼᵢ = γᵢ · Kⱼᵢ/√(σᵢ² + ε) and b′ᵢ = γᵢ · (bᵢ − μᵢ)/√(σᵢ² + ε) + βᵢ

Here K′ⱼᵢ is the jth convolution kernel of the ith group of the first target convolutional layer and b′ᵢ is the offset of the ith group of convolution kernels in the first target convolutional layer. As shown in fig. 6, replacing the original convolution kernels Kⱼᵢ with K′ⱼᵢ and the original offsets bᵢ with b′ᵢ, and removing the first batch normalization layer at the same time, yields the target model in which the first batch normalization layer has been fused into the first convolutional layer; this target model is completely equivalent to the original convolutional neural network model in terms of calculation results.
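The fusion K′ⱼᵢ = γᵢ·Kⱼᵢ/√(σᵢ² + ε), b′ᵢ = γᵢ·(bᵢ − μᵢ)/√(σᵢ² + ε) + βᵢ can be illustrated with a minimal NumPy sketch. The array layout and function name are assumptions of this illustration, not part of the embodiment:

```python
import numpy as np

def fold_bn_into_conv(K, b, gamma, beta, mu, var, eps=1e-5):
    """Fold batch-normalization parameters into the preceding convolution.

    K:  (M, N, kh, kw) kernels; K[i, j] is the j-th kernel of the i-th group.
    b:  (M,) per-output-map offsets.
    gamma, beta, mu, var: (M,) BN scale, shift, saved mean, saved variance.
    Returns (K', b') such that conv(X, K') + b' equals BN(conv(X, K) + b).
    """
    scale = gamma / np.sqrt(var + eps)            # gamma_i / sqrt(sigma_i^2 + eps)
    K_folded = K * scale[:, None, None, None]     # K'_ji = scale_i * K_ji
    b_folded = (b - mu) * scale + beta            # b'_i = scale_i * (b_i - mu_i) + beta_i
    return K_folded, b_folded
```

With 1×1 kernels the convolution reduces to a channel-wise linear map, which makes the equivalence easy to verify numerically; the same algebra holds for any kernel size because the scale is constant per output map.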
According to the embodiment, the parameters of the first batch of standardized layers are combined into the first convolution layer through the method, so that the combination of the first batch of standardized layers and the first convolution layer of the convolution neural network model is realized, and the first target convolution layer is obtained. The unimportant convolution kernels in the first target convolution layer are further deleted through the P-norm, so that compression of the convolution neural network model is achieved, the volume of the convolution neural network model is reduced, and the use efficiency and the use flexibility of the convolution neural network model are improved.
As an alternative embodiment, determining the P-norm of each set of convolution kernels in the first target convolution layer comprises:
s1, determining P-norms of N convolution kernels of the first target convolution layer, which correspond to the ith output feature image, wherein i is more than or equal to 1 and less than or equal to M, M is the number of the output feature images of the first target convolution layer, N is the number of the input feature images of the first convolution layer, and the ith group of convolution kernels in the first target convolution layer is the N convolution kernels of the first target convolution layer, which correspond to the ith output feature image;
S2, determining the sum of the P-th powers of the P-norms of the N convolution kernels of the first target convolutional layer corresponding to the ith output feature map;

S3, taking the 1/P power of that sum as the P-norm of the ith group of convolution kernels of the first target convolutional layer corresponding to the ith output feature map.
Optionally, the P-norm of each group of convolution kernels in the first target convolutional layer is determined using the following formula:

‖K′ᵢ‖_P = (Σⱼ₌₁ᴺ Σᵣ |K′ⱼᵢ(r)|^P)^(1/P)

where K′ⱼᵢ is the jth convolution kernel of the ith group of the first target convolutional layer, K′ⱼᵢ(r) traverses the elements of that kernel, P is equal to 0, 1, 2, or positive infinity, 1 ≤ i ≤ M, 1 ≤ j ≤ N, M is the number of output feature maps, N is the number of input feature maps, and ‖K′ᵢ‖_P is the P-norm of the group of convolution kernels corresponding to the ith output feature map. P is preferably 1 or 2.
The P-norm of the convolution kernel of each group in the first target convolution layer is determined through the formula, so that the first target convolution layer can be compressed according to the P-norm, and the compression efficiency of compressing the convolution neural network model is improved.
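A minimal sketch of this group-norm computation (NumPy; the function name and array layout are assumptions of this illustration):

```python
import numpy as np

def group_p_norm(K_group, p=1):
    """P-norm of one group of convolution kernels.

    K_group: (N, kh, kw) — the N kernels producing one output feature map.
    Computes (sum over all elements of |k|^p) ** (1/p).
    """
    return float(np.sum(np.abs(K_group) ** p) ** (1.0 / p))
```

For P = 1 this is the sum of absolute values over the group; for P = 2 it is the group's Euclidean norm.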
As an alternative embodiment, compressing the second target neural network model to obtain a compressed third target neural network model, including:
setting a compression rate rho, wherein rho is more than 0 and less than 1; and
and carrying out layer-by-layer compression on the second target neural network model until the set compression rate rho is met.
By setting the compression rate, the compression degree of the second target neural network model can be limited, and the compression accuracy is improved.
As an alternative embodiment, after training the second target neural network model to obtain the third target neural network model, the method further includes:
s1, acquiring a face image of a person;
s2, inputting the face image of the person into a third target neural network model;
and S3, obtaining a result output by the third target neural network model, wherein the output result is the age of the person corresponding to the face image of the person.
For example, since the third target neural network model in the present solution is a compressed model, it has a small volume and low memory consumption and can be flexibly applied to various terminals or applications. Taking the case where the third target neural network model is applied to a mobile phone, when the mobile phone photographs a person to obtain a face image, or the mobile phone already contains a face image of the person, the third target neural network model identifies the face image and then outputs the age of the person corresponding to the face image.
Through the embodiment, the flexible application of the third target neural network model is realized through the method.
The above-described method is described in its entirety below in connection with steps S702 to S712 in fig. 7. As shown in fig. 7, for a convolutional neural network model trained in advance on an arbitrary data set, in step S704 all combinable first batch normalization layer parameters in the model are merged into the corresponding first convolutional layers to generate a first target neural network model, whose output is equivalent to the output of the original convolutional neural network model. In step S706, for each first target convolutional layer of the first target neural network model generated in step S704, the P-norm of each group of convolution kernels is calculated and the norms are sorted from large to small. In step S708, the groups of convolution kernels with the smallest P-norms are removed from the first target neural network model until the requirement of the set compression rate ρ (0 < ρ < 1) is met, and the parameters related to the removed convolution kernels in the subsequent convolutional layers of the first target convolutional layer are removed from the target model at the same time. Steps S706 and S708 are repeated to obtain a second target neural network model. The second target neural network model is then retrained on the given data set until the model converges, yielding a third target neural network model.
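Under stated assumptions (NumPy arrays as the weight representation; retraining omitted as framework-specific; all names invented for this sketch), the S704–S708 portion of this flow might be sketched as:

```python
import numpy as np

def compress_model(layers, rho, p=1, eps=1e-5):
    """layers: list of dicts with conv kernels 'K' (M, N, kh, kw), offsets 'b' (M,),
    and batch-norm parameters 'gamma', 'beta', 'mu', 'var' (each (M,)).
    rho: compression rate, 0 < rho < 1 — here taken as the fraction of groups kept.
    Returns per-layer pruned kernels/offsets; retraining to convergence (S710)
    is framework-specific and omitted.
    """
    compressed = []
    for layer in layers:
        # S704: fold the batch normalization layer into the convolution
        scale = layer["gamma"] / np.sqrt(layer["var"] + eps)
        K = layer["K"] * scale[:, None, None, None]
        b = (layer["b"] - layer["mu"]) * scale + layer["beta"]
        # S706/S708: rank groups by P-norm and keep the strongest rho * M groups
        M = K.shape[0]
        norms = np.sum(np.abs(K).reshape(M, -1) ** p, axis=1) ** (1.0 / p)
        keep = np.sort(np.argsort(norms)[::-1][: max(1, int(np.ceil(rho * M)))])
        compressed.append({"K": K[keep], "b": b[keep], "kept": keep})
    # S710/S712: the pruned model would then be retrained until convergence
    return compressed
```

The `kept` indices for each layer indicate which input channels of the next layer must be retained, mirroring the removal of related parameters in subsequent convolutional layers.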
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiment of the present invention, there is further provided a convolutional neural network model using method, as an alternative, as shown in fig. 8, the method includes:
s802, acquiring parameters to be input;
s804, inputting parameters to be input into a third target neural network model, wherein the third target neural network model is a model obtained by merging parameters of a first batch of standardization layers in the convolutional neural network model into a first convolutional layer in the convolutional neural network model, deleting target convolutional kernels in the first target convolutional layer in the first target neural network model after the first target neural network model containing the first target convolutional layer is generated, and compressing a second target neural network model after the target convolutional kernels are deleted, wherein the convolutional neural network model and the first target neural network model have the same output for the same input, and the first batch of standardization layers are batch of standardization layers which are positioned behind the first convolutional layer in the convolutional neural network model and are connected with the first convolutional layer;
S806, obtaining a result output by the third target neural network model.
Alternatively, the convolutional neural network model using method can be applied to any scene which can use the compressed convolutional neural network model, but is not limited to the application.
For example, the above method is applied to a scene in which facial features are recognized using a mobile phone. With the prior art, the convolutional neural network model has a large volume and occupies much memory, and cannot be used normally on a terminal with weak computing capability such as a mobile phone. With the method in the present solution, the convolutional neural network model is a compressed model, so it can run smoothly on a mobile phone. After the compressed convolutional neural network model is installed in the mobile phone, the mobile phone acquires face parameters and then outputs a recognition result. For example, the recognition result is "your facial features are well proportioned", or the recognition result is "your eyes are very large".
It should be noted that the application scenario is only an example, and the present embodiment does not limit the application scenario of the convolutional neural network model using method.
By the method, the convolutional neural network model is compressed, so that the volume of the convolutional neural network model is reduced, and the use efficiency and flexibility of the convolutional neural network model are improved.
As an alternative, before incorporating the parameters of the first batch of normalized layers into the first convolutional layer in the convolutional neural network model, further comprising:
s1, taking a convolution layer except the last convolution layer on each branch in the convolution neural network model as a first convolution layer when the convolution neural network model comprises cross-layer connection branches formed by the convolution layers and/or multiple layers of series branches.
In compressing the convolutional neural network model, it is necessary to determine which convolutional layers of the convolutional neural network model are first convolutional layers. For a convolutional neural network model that has no cross-layer connected branches and no multi-layer series branches, such as a VGGNet model, all convolutional layers of the model may be determined as first convolutional layers, i.e., all convolutional layers are compressed. If the model contains residual modules with cross-layer connections, such as a ResNet or ResNeXt model, only a part of the convolutional layers in the model can be selected as first convolutional layers.
According to the embodiment, the convolutional layer to be compressed is determined by the method, so that the compression accuracy of compressing the convolutional neural network is improved.
As an alternative to this embodiment of the present invention,
the obtaining the parameters to be input includes: s1, acquiring a face image of a person;
the inputting the parameters to be input into the third target neural network model includes: s1, inputting a person face image into a third target neural network model;
the obtaining the result output by the third target neural network model includes: s1, acquiring the age of the person corresponding to the person face image output by the third target neural network model.
For example, since the third target neural network model in the present solution is a compressed model, it has a small volume and low memory consumption and can be flexibly applied to various terminals or applications. Taking the case where the third target neural network model is applied to a mobile phone, when the mobile phone photographs a person to obtain a face image, or the mobile phone already contains a face image of the person, the third target neural network model identifies the face image and then outputs the age of the person corresponding to the face image.
Through the embodiment, the flexible application of the third target neural network model is realized through the method.
Other aspects of the present embodiment may refer to examples of the convolutional neural network model compression method described above, and will not be repeated here.
According to still another aspect of the embodiment of the present invention, there is also provided a convolutional neural network model compression device for implementing the convolutional neural network model compression method described above. As shown in fig. 9, the apparatus includes:
(1) A merging unit 902, configured to merge parameters of a first batch of normalization layers in the convolutional neural network model into a first convolutional layer in the convolutional neural network model, and generate a first target neural network model including a first target convolutional layer, where the convolutional neural network model and the first target neural network model have the same output for the same input, and the first batch of normalization layers is a batch of normalization layers that is located after the first convolutional layer in the convolutional neural network model and is connected with the first convolutional layer;
(2) A deleting unit 904, configured to delete a convolution kernel in a first target convolution layer in the first target neural network model, where the norm is smaller than a first threshold, to obtain a second target neural network model, where a memory occupancy rate of the second target neural network model is smaller than a memory occupancy rate of the first target neural network model;
(3) And the compressing unit 906 is configured to compress the second target neural network model to obtain a third target neural network model.
Alternatively, the compression device for the convolutional neural network model can be applied to any field using a convolutional neural network model, but is not limited thereto. For example, the convolutional neural network model is applied to a scene for automatically identifying fruit quality. With the prior art, the convolutional neural network model has a large volume and high memory occupation, so using it to identify whether fruit is good consumes considerable resources and the identification efficiency is low. With the convolutional neural network model compression method in the present solution, the convolutional neural network model is compressed, reducing its volume and its memory consumption. For example, the compressed convolutional neural network model is installed in a mobile phone, so that the mobile phone can judge whether fruit is good by acquiring a picture of the fruit, improving the use efficiency and use flexibility of the convolutional neural network model.
Alternatively, the convolutional neural network model described above may be applied to a scene in which a person's age is identified. Compressing the convolutional neural network model reduces its volume and its memory consumption. When the compressed convolutional neural network model is applied, its small volume and low memory consumption allow it to be used flexibly. For example, a mobile phone carrying the compressed convolutional neural network model can take a photograph, identify the photograph, and output the age of the photographed person. The method therefore improves the use efficiency and use flexibility of the convolutional neural network model.
The above is merely an example of the use of the compressed convolutional neural network model, and the present embodiment is not limited to the use form of the convolutional neural network model. Any scene using the convolutional neural network model can be used for compressing the convolutional neural network model, so that the volume of the convolutional neural network model is reduced, the memory consumption of the convolutional neural network model is reduced, and the use efficiency and the use flexibility of the convolutional neural network model are improved.
Alternatively, the first convolutional layer of the convolutional neural network model may be any layer of the convolutional neural network model. A convolutional neural network model may contain cross-layer connection branches and/or multi-layer series branches. For a convolutional neural network model in which such branches exist, all convolutional layers of all branches, except the last convolutional layer of each branch, are determined as first convolutional layers. That is, except that the last convolutional layer of each branch is not merged, all remaining convolutional layers are used as first convolutional layers and are merged, thereby compressing the convolutional neural network model while ensuring that the output results of different branches can still be fused.
Optionally, after the parameters of the first batch of standardization layers in the convolutional neural network model are combined into the first convolutional layer, unimportant convolution kernels in the first target convolutional layer obtained after combination are deleted, so that the first target convolutional layer is compressed, and the convolutional neural network model is compressed. At this time, the importance of each set of convolution kernels may optionally be determined using a P-norm method. For example, as shown in FIG. 3(a) and FIG. 3(b), FIG. 3(a) includes a first target convolutional layer 302; the first output feature map of the first target convolutional layer 302 corresponds to a set of convolution kernels K′₁₁, K′₂₁, K′₃₁. After the P-norm of each set of convolution kernels of each target convolutional layer is calculated, the P-norms of the sets are sorted, and the target sets of convolution kernels whose P-norm is less than the first threshold are deleted. For example, if the P-norm corresponding to the target set of convolution kernels K′₁₁, K′₂₁, K′₃₁ is smaller than the first threshold, the target set K′₁₁, K′₂₁, K′₃₁ is deleted, i.e., the corresponding output feature map is deleted, thereby completing the compression of the convolutional neural network model. It should be noted that, because that output feature map is deleted, the subsequent convolution kernels associated with it are also deleted. In the compressed first target convolutional layer 304 of FIG. 3(b), the convolution kernels that follow the deleted output feature map are removed (all dashed lines in FIG. 3(a) are deleted).
In practical applications, the weights of the target set of convolution kernels K′₁₁, K′₂₁, K′₃₁ may be set to 0, so that the corresponding output feature map no longer carries effective parameters, achieving the effect of deleting the target set of convolution kernels. The above P may be 1 or 2, i.e., the 1-norm or 2-norm of each set of convolution kernels is calculated, and one or more sets of convolution kernels are deleted based on the size of the 1-norm or 2-norm.
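As a rough illustration (not part of the patent itself), the group deletion described above can be sketched in NumPy: each output feature map's set of convolution kernels is zeroed out when its P-norm falls below the first threshold. The `(M, N, kh, kw)` weight layout and the threshold value are assumptions.

```python
import numpy as np

def prune_groups_by_norm(W, threshold, p=1):
    """Zero out every set of convolution kernels whose P-norm falls below
    `threshold`, mimicking deletion of the target sets of convolution kernels.
    W: (M, N, kh, kw) -- M output feature maps, N input feature maps.
    Returns the pruned copy and a boolean mask of the kept groups."""
    # One P-norm per output feature map's group of N kernels.
    norms = (np.abs(W).reshape(W.shape[0], -1) ** p).sum(axis=1) ** (1.0 / p)
    keep = norms >= threshold
    W_pruned = W.copy()
    W_pruned[~keep] = 0.0  # weight set to 0 -> no effective parameters flow through
    return W_pruned, keep
```

Zeroing rather than physically removing the rows keeps the tensor shapes intact, which matches the "set the weight to 0" variant described above.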
After the target sets of convolution kernels whose P-norm is less than the first threshold are deleted, the ratio of the number of all remaining convolution kernels in the second target neural network model (which includes the first target convolutional layer) to the number of all convolution kernels in the first target neural network model is less than a predetermined threshold. For example, after the target sets of convolution kernels are deleted, 500 convolution kernels remain in the second target neural network model, while the first target neural network model contains 1000 convolution kernels in total. If the predetermined threshold is set to 60%, the ratio of the number of all remaining convolution kernels in the second target neural network model to the number of all convolution kernels in the first target neural network model is 50%, which is smaller than the predetermined threshold.
It should be noted that, in the process of deleting target sets of convolution kernels from the first target convolutional layer, a preset compression rate needs to be set for each target convolutional layer; the preset compression rate determines the degree to which the first target convolutional layer is compressed. If the preset compression rate is set to 80%, the corresponding 20% of the convolution kernels should be deleted. After the P-norms are sorted, the target set of convolution kernels corresponding to the smallest P-norm is deleted, and it is then checked whether the compression rate of the pruned first target convolutional layer is less than 80%. If the compression rate of the first target convolutional layer is less than 80%, its compression is qualified. If the compression rate of the first target convolutional layer is still greater than 80%, the layer needs further compression: the target set of convolution kernels corresponding to the current smallest P-norm is deleted again, until the compression rate is less than 80%, completing the compression of the first target convolutional layer. It should be noted that the preset compression rates set for different first target convolutional layers may be the same or different; specific values are set as needed.
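The iterative deletion loop described above — repeatedly removing the group with the smallest P-norm until the preset compression rate is satisfied — might be sketched as follows. This is a hypothetical illustration under one reading of the compression-rate semantics (compression rate = fraction of groups kept); the norm values in the usage are arbitrary.

```python
def prune_to_compression_rate(norms, target_rate):
    """Delete the group with the smallest P-norm, one at a time, until the
    fraction of kept groups drops below the preset compression rate
    (e.g. target_rate=0.8 ends with fewer than 80% of the groups kept).
    norms: one P-norm per set of convolution kernels.
    Returns the indices of the deleted groups, in deletion order."""
    kept = list(range(len(norms)))
    deleted = []
    while len(kept) / len(norms) >= target_rate:
        smallest = min(kept, key=lambda i: norms[i])  # current minimum P-norm
        kept.remove(smallest)
        deleted.append(smallest)
    return deleted
```

For example, with per-group norms `[5.0, 1.0, 3.0, 2.0, 4.0]` and a preset rate of 80%, the groups at indices 1 and 3 (the two smallest norms) are deleted before the kept fraction falls below 80%.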
Optionally, after compressing the first target convolutional layer in the convolutional neural network model to obtain a second target neural network model, training the second target neural network model until the second target neural network model converges. At this time, the converged second target neural network model is taken as a third target neural network model, thereby completing the compression of the convolutional neural network model.
By combining parameters of a first batch of standardization layers in the convolutional neural network model into the first convolutional layer, a first target neural network model containing a first target convolutional layer is generated; deleting convolution kernels with norms smaller than a first threshold in a first target convolution layer in a first target neural network model; the second target neural network model after the target convolution kernel is deleted is trained to obtain a third target neural network model, so that the convolution neural network model is compressed, the volume of the convolution neural network model is reduced, and the use efficiency and the use flexibility of the convolution neural network model are improved.
As an alternative embodiment, as shown in figure 10,
the device further comprises: (1) A first determining unit 1002, configured to determine a P-norm of each set of convolution kernels in the first target convolutional layer before the second target neural network model is obtained, where P is equal to 0, 1, 2, or positive infinity;
the deleting unit 904 includes: (1) A first setting module 1004, configured to set a pruning rate for each set of convolution kernels; (2) A sorting module 1006, configured to sort the P-norms of each set of convolution kernels in order from large to small; (3) A deleting module 1008, configured to delete the target sets of convolution kernels whose P-norm is less than the first threshold until the pruning rate requirement is met, so as to obtain the second target neural network model, where a ratio of the number of all convolution kernels in the second target neural network model to the number of all convolution kernels in the first target neural network model is less than a predetermined threshold.
According to the embodiment, by determining the P-norm of each group of convolution kernels, important convolution kernels are determined according to the P-norms, unimportant convolution kernels are deleted, the effect of improving the accuracy of deleting the convolution kernels is achieved, and the compression efficiency of compressing the convolution neural network model is further guaranteed.
As an alternative embodiment, as shown in fig. 11, the deletion module 1008 includes:
a setting submodule 1102 is configured to set a weight of a target set of convolution kernels with a P-norm less than a first threshold to 0.
Optionally, each convolution kernel is configured with a certain weight during the transfer of data. If the weight is set to 0, the data transferred by the convolution kernel is zero, and valid data is not transferred any more. Therefore, the weight of the target group convolution kernel is set to 0, so that the target group convolution kernel can be prevented from continuously transmitting data, and the purpose of deleting the target group convolution kernel is achieved.
According to the embodiment, the target group convolution kernel is deleted by the method, so that the unimportant convolution kernel in the first target neural network model is deleted, and the compression efficiency of the convolution neural network model is improved.
As an alternative embodiment, as shown in fig. 12, the above apparatus further includes:
(1) A second determining unit 1202, configured to take, before the parameters of the first batch of normalization layers in the convolutional neural network model are merged into the first convolutional layer, the convolutional layers in the convolutional neural network model other than the last convolutional layer on each branch as first convolutional layers, in the case that the convolutional neural network model includes cross-layer connection branches and/or multi-layer series branches formed by convolutional layers.
By the embodiment, the last convolution layer containing cross-layer connection or multi-layer fusion is not compressed, so that the compression accuracy of the convolution neural network model is improved.
As an alternative, the merging unit 902 includes:
(1) The first calculation module is used for calculating N convolution kernels of the first target convolution layer corresponding to the ith output feature map according to N convolution kernels of the first convolution layer corresponding to the ith output feature map of the first convolution layer and parameters of the first batch of standardization layers corresponding to the ith output feature map of the first convolution layer;
(2) The second calculation module is used for calculating the offset of the first target convolution layer corresponding to the ith output feature map according to the offset of the first convolution layer corresponding to the ith output feature map of the first convolution layer and the parameter of the first batch normalization layer corresponding to the ith output feature map of the first convolution layer;
(3) A third calculation module, configured to determine an ith output feature map of the first target convolution layer according to a convolution sum between N input feature maps of the first convolution layer and N convolution kernels of the first target convolution layer corresponding to the ith output feature map, and an offset of the first target convolution layer corresponding to the ith output feature map,
where N is the number of input feature maps of the first convolutional layer, 1 ≤ i ≤ M, and M is the number of output feature maps of the first target convolutional layer.
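A hedged sketch of the merging computation the three modules describe: folding a batch normalization layer (scale γ, shift β, running mean μ, variance σ²) into the preceding convolutional layer so that the first target convolutional layer produces the same output as convolution followed by batch normalization. The NumPy weight layout `(M, N, kh, kw)` and the `eps` value are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-norm layer that follows a conv layer into the conv
    weights and bias.
    W: (M, N, kh, kw) conv kernels; b: (M,) conv bias;
    gamma, beta, mean, var: (M,) per-output-feature-map BN parameters."""
    scale = gamma / np.sqrt(var + eps)           # per-output-channel scale
    W_folded = W * scale[:, None, None, None]    # K'_ij = K_ij * scale_i
    b_folded = (b - mean) * scale + beta         # b'_i = (b_i - mu_i) * scale_i + beta_i
    return W_folded, b_folded
```

Because batch normalization is an affine transform of each output feature map, the folded layer computes exactly the same output for the same input, which is the equivalence the claim requires.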
According to the embodiment, the parameters of the first batch of standardized layers are combined into the first convolution layer through the method, so that the combination of the first batch of standardized layers and the first convolution layer of the convolution neural network model is realized, and the first target convolution layer is obtained. The unimportant convolution kernels in the first target convolution layer are further deleted through the P-norm, so that compression of the convolution neural network model is achieved, the volume of the convolution neural network model is reduced, and the use efficiency and the use flexibility of the convolution neural network model are improved.
As an alternative, the first determining unit 1002 includes:
(1) A first determining module, configured to determine a P-norm of N convolution kernels of the first target convolution layer corresponding to an i-th output feature map, where i is 1-M, where M is the number of output feature maps of the first target convolution layer, N is the number of input feature maps of the first convolution layer, and an i-th set of convolution kernels in the first target convolution layer is the N convolution kernels of the first target convolution layer corresponding to the i-th output feature map;
(2) A second determining module, configured to determine a sum of P-norms of N convolution kernels of the first target convolution layer corresponding to the i-th output feature map;
(3) A third determining module, configured to take the 1/P power of the sum of the P-norms as the P-norm of the ith set of convolution kernels of the first target convolutional layer corresponding to the ith output feature map.
The P-norm of each set of convolution kernels in the first target convolutional layer is determined through the above formula, so that the first target convolutional layer can be compressed according to the P-norms, improving the compression efficiency of compressing the convolutional neural network model.
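One consistent reading of the three determining modules, sketched in NumPy (the `(M, N, kh, kw)` weight layout is an assumption): the P-norm of the ith set of kernels is the 1/P power of the summed P-th powers of all entries in the N kernels that produce the ith output feature map.

```python
import numpy as np

def group_p_norm(W, i, p=1):
    """P-norm of the i-th set of convolution kernels, i.e. the N kernels
    of the first target convolutional layer corresponding to the i-th
    output feature map. W: (M, N, kh, kw)."""
    group = W[i]  # the N kernels for output feature map i
    return np.sum(np.abs(group) ** p) ** (1.0 / p)
```

With p=1 this is the sum of absolute values of the group; with p=2 it is the Euclidean norm of the flattened group.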
As an alternative embodiment, the compression unit 906 includes:
(1) The second setting module is used for setting the compression rate ρ, where 0 < ρ < 1;
(2) And the compression module is used for carrying out layer-by-layer compression on the second target neural network model until the set compression rate rho is met.
By setting the compression rate, the compression degree of the second target neural network model can be limited, and the compression accuracy is improved.
As an alternative, the apparatus further includes:
(1) The first acquisition unit is used for acquiring a character face image after training the second target neural network model to obtain a third target neural network model;
(2) A second input unit for inputting the face image of the person into a third target neural network model;
(3) And the second acquisition unit is used for acquiring a result output by the third target neural network model, wherein the output result is the age of the person corresponding to the face image of the person.
According to the embodiment, the device is used for realizing flexible application of the third target neural network model.
According to still another aspect of the embodiment of the present invention, there is also provided a convolutional neural network model using apparatus for implementing the convolutional neural network model using method described above. Alternatively, as shown in fig. 13, the above-described apparatus includes:
(1) A first obtaining unit 1302, configured to obtain a parameter to be input;
(2) An input unit 1304, configured to input parameters to be input into a third target neural network model, where the third target neural network model is a model obtained by merging parameters of a first batch of normalization layers in the convolutional neural network models into a first convolutional layer in the convolutional neural network models, deleting target convolutional kernels in the first target convolutional layers in the first target neural network models after generating the first target neural network models including the first target convolutional layers, and compressing a second target neural network model after deleting the target convolutional kernels, where the convolutional neural network model and the first target neural network model have the same output for the same input, and the first batch of normalization layers is a batch of normalization layers that is located behind the first convolutional layers in the convolutional neural network models and is connected with the first convolutional layers;
(3) A second obtaining unit 1306, configured to obtain a result output by the third target neural network model.
Alternatively, the convolutional neural network model using device can be applied, without limitation, to any scene in which the compressed convolutional neural network model can be used.
For example, the device is a mobile phone. With the prior art, the convolutional neural network model has a large volume and occupies much memory, and cannot be used normally, or at all, on a terminal with weak computing capability such as a mobile phone. With the method in this scheme, the convolutional neural network model is a compressed model, so it can run smoothly on a mobile phone. After the compressed convolutional neural network model is installed on the mobile phone, the phone acquires face parameters and then outputs a recognition result. For example, the recognition result is "your facial features are very well-proportioned", or the recognition result is "your eyes are very large".
It should be noted that the application scenario is only an example, and the present embodiment does not limit the application scenario of the convolutional neural network model using method.
By the method, the convolutional neural network model is compressed, so that the volume of the convolutional neural network model is reduced, and the use efficiency and flexibility of the convolutional neural network model are improved.
As an alternative, as shown in figure 14,
the first acquiring unit 1302 includes: (1) A first acquisition module 1402 for acquiring a person face image;
the input unit 1304 includes: (1) An input module 1404 for inputting a person face image into a third target neural network model;
the second acquisition unit 1306 includes: (1) The second obtaining module 1406 is configured to obtain an age of a person corresponding to the face image of the person output by the third target neural network model.
Through the embodiment, the flexible application of the third target neural network model is realized through the method.
As an alternative, the apparatus further includes:
(1) A determining unit, configured to take, before the parameters of the first batch of normalization layers are merged into the first convolutional layer in the convolutional neural network model, the convolutional layers in the convolutional neural network model other than the last convolutional layer on each branch as first convolutional layers, in the case that the convolutional neural network model includes cross-layer connection branches and/or multi-layer series branches formed by convolutional layers.
According to the embodiment, the convolutional layer to be compressed is determined by the method, so that the compression accuracy of compressing the convolutional neural network is improved.
Other aspects of the present embodiment may refer to examples of the convolutional neural network model compression method described above, and will not be repeated here.
According to a further aspect of an embodiment of the present invention, there is also provided an electronic device for implementing the convolutional neural network model compression method described above, as shown in fig. 15, the electronic device comprising a memory 1502 and a processor 1504, the memory 1502 having stored therein a computer program, the processor 1504 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
s1, merging parameters of a first batch of standardization layers in a convolutional neural network model into a first convolutional layer in the convolutional neural network model to generate a first target neural network model containing the first target convolutional layer, wherein the convolutional neural network model and the first target neural network model have the same output for the same input, and the first batch of standardization layers are batch standardization layers which are positioned behind the first convolutional layer in the convolutional neural network model and are connected with the first convolutional layer;
S2, deleting convolution kernels with norms smaller than a first threshold value in a first target convolution layer in the first target neural network model to obtain a second target neural network model, wherein the memory occupancy rate of the second target neural network model is smaller than that of the first target neural network model;
and S3, compressing the second target neural network model to obtain a third target neural network model.
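Steps S1-S2 above can be strung together in a toy end-to-end sketch (illustrative only; the dict-based layer representation, threshold, and `eps` are assumptions, and the training of step S3 is omitted):

```python
import numpy as np

def compress_model(layers, threshold=0.1, p=1, eps=1e-5):
    """Hypothetical sketch of steps S1-S2: fold each batch normalization
    layer into the preceding conv layer (S1), then zero out kernel groups
    whose P-norm is below `threshold` (S2). Each layer is a dict with conv
    weights 'W' (M, N, kh, kw), bias 'b', and BN parameters."""
    compressed = []
    for layer in layers:
        # S1: merge BN parameters into the conv weights and bias.
        scale = layer['gamma'] / np.sqrt(layer['var'] + eps)
        W = layer['W'] * scale[:, None, None, None]
        b = (layer['b'] - layer['mean']) * scale + layer['beta']
        # S2: delete (zero) groups with small P-norms.
        norms = (np.abs(W).reshape(W.shape[0], -1) ** p).sum(1) ** (1.0 / p)
        W[norms < threshold] = 0.0
        compressed.append({'W': W, 'b': b})
    return compressed
```

Fine-tuning the pruned model until convergence (step S3) would follow in a training framework and is outside the scope of this sketch.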
Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 15 is only schematic; the electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, or a mobile internet device (Mobile Internet Device, MID) such as a PAD. Fig. 15 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 15, or have a different configuration than shown in fig. 15.
The memory 1502 may be used to store software programs and modules, such as program instructions/modules corresponding to the convolutional neural network model compression method and apparatus in the embodiments of the present invention, and the processor 1504 executes the software programs and modules stored in the memory 1502 to perform various functional applications and data processing, i.e. implement the convolutional neural network model compression method described above. The memory 1502 may include high-speed random access memory, but may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1502 may further include memory located remotely from the processor 1504, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1502 may be used to store, but is not limited to, information such as a target convolutional neural network model. As an example, as shown in fig. 15, the memory 1502 may include, but is not limited to, a merging unit 902, a deleting unit 904, and a compressing unit 906 in the convolutional neural network model compressing device. In addition, other module units in the convolutional neural network model compression device may be included, but are not limited to, and are not described in detail in this example.
Optionally, the transmission device 1506 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1506 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1506 is a Radio Frequency (RF) module that is configured to communicate wirelessly with the internet.
In addition, the electronic device further includes: a display 1508 for displaying an output result of the target convolutional neural network model; and a connection bus 1510 for connecting the individual module components in the electronic device.
According to a further aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above convolutional neural network model using method, as shown in fig. 16, the electronic device comprising a memory 1602 and a processor 1604, the memory 1602 having stored therein a computer program, the processor 1604 being arranged to perform the steps of any of the method embodiments described above by the computer program.
Alternatively, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
s1, acquiring parameters to be input;
s2, inputting parameters to be input into a third target neural network model, wherein the third target neural network model is a model obtained by merging parameters of a first batch of standardization layers in the convolutional neural network model into a first convolutional layer in the convolutional neural network model, deleting target convolutional kernels in the first target convolutional layer in the first target neural network model after the first target neural network model containing the first target convolutional layer is generated, and compressing a second target neural network model after the target convolutional kernels are deleted, wherein the convolutional neural network model and the first target neural network model have the same output for the same input, and the first batch of standardization layers are batch of standardization layers which are positioned behind the first convolutional layer in the convolutional neural network model and are connected with the first convolutional layer;
s3, obtaining a result output by the third target neural network model.
Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 16 is only schematic; the electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, or a mobile internet device (Mobile Internet Device, MID) such as a PAD. Fig. 16 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 16, or have a different configuration than shown in fig. 16.
The memory 1602 may be used to store software programs and modules, such as program instructions/modules corresponding to the convolutional neural network model using method and apparatus in the embodiments of the present invention, and the processor 1604 executes the software programs and modules stored in the memory 1602 to perform various functional applications and data processing, i.e., to implement the convolutional neural network model using method described above. Memory 1602 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1602 may further include memory located remotely from the processor 1604, which may be connected to the terminal by a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1602 may be used to store information such as, but not limited to, a target convolutional neural network model. As an example, as shown in fig. 16, the memory 1602 may include, but is not limited to, the first acquiring unit 1302, the input unit 1304, and the second acquiring unit 1306 in the convolutional neural network model using apparatus. In addition, other module units in the convolutional neural network model using device may be included, but are not limited to, and are not described in detail in this example.
Optionally, the transmission device 1606 is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission means 1606 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1606 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a display 1608 for displaying the output result of the target convolutional neural network model; and a connection bus 1610 for connecting the respective module components in the above-described electronic device.
According to a further aspect of embodiments of the present invention there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1, merging parameters of a first batch of standardization layers in a convolutional neural network model into a first convolutional layer in the convolutional neural network model to generate a first target neural network model containing the first target convolutional layer, wherein the convolutional neural network model and the first target neural network model have the same output for the same input, and the first batch of standardization layers are batch standardization layers which are positioned behind the first convolutional layer in the convolutional neural network model and are connected with the first convolutional layer;
s2, deleting convolution kernels with norms smaller than a first threshold value in a first target convolution layer in the first target neural network model to obtain a second target neural network model, wherein the memory occupancy rate of the second target neural network model is smaller than that of the first target neural network model;
and S3, compressing the second target neural network model to obtain a third target neural network model.
Alternatively, in this embodiment, the above storage medium may be configured to store a computer program for performing the following steps:
S1, acquiring parameters to be input;
S2, inputting the parameters into a third target neural network model, wherein the third target neural network model is obtained by merging parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer of the convolutional neural network model to generate a first target neural network model containing a first target convolutional layer, deleting target convolution kernels from the first target convolutional layer to obtain a second target neural network model, and compressing the second target neural network model; the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer that is located after, and connected to, the first convolutional layer in the convolutional neural network model;
S3, obtaining the result output by the third target neural network model.
Alternatively, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be performed by a program instructing the relevant hardware of a terminal device, and the program may be stored in a computer-readable storage medium. The storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The numbering of the foregoing embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division of units is merely a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be implemented through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (13)

1. A method for compressing a convolutional neural network model, comprising:
merging parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer of the convolutional neural network model to generate a first target neural network model containing a first target convolutional layer, wherein the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer that is located after, and connected to, the first convolutional layer in the convolutional neural network model;
deleting, from the first target convolutional layer in the first target neural network model, the convolution kernels whose norms are smaller than a first threshold to obtain a second target neural network model, wherein the memory occupancy of the second target neural network model is smaller than that of the first target neural network model;
compressing the second target neural network model to obtain a third target neural network model;
acquiring a face image of a person;
inputting the face image into the third target neural network model;
and obtaining an output result of the third target neural network model, wherein the output result is the age of the person in the face image.
2. The method according to claim 1, wherein before the second target neural network model is obtained, the method further comprises: determining a P-norm of each group of convolution kernels in the first target convolutional layer, wherein P equals 0, 1, 2, or positive infinity;
and wherein deleting the convolution kernels whose norms are smaller than the first threshold from the first target convolutional layer in the first target neural network model comprises:
setting a pruning rate for each group of convolution kernels;
arranging the P-norms of the groups of convolution kernels in descending order;
and deleting the target groups of convolution kernels whose P-norms are smaller than the first threshold until the pruning-rate requirement is met, to obtain the second target neural network model, wherein the ratio of the number of convolution kernels in the second target neural network model to the number of convolution kernels in the first target neural network model is smaller than a preset threshold.
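The ordering-and-deletion procedure in claim 2 can be sketched as follows; `prune_by_rate` is a hypothetical helper name, and the example norm values are made up for illustration.

```python
import numpy as np

def prune_by_rate(norms, prune_rate):
    # Sort group P-norms from large to small (as in the claim) and drop
    # the smallest-norm groups until the requested pruning rate is met.
    n_remove = int(len(norms) * prune_rate)
    order = np.argsort(norms)[::-1]               # large-to-small ranking
    keep = np.sort(order[:len(norms) - n_remove])
    return keep                                   # indices of surviving groups

norms = np.array([5.0, 0.2, 3.1, 0.9, 4.4, 1.7])
keep = prune_by_rate(norms, prune_rate=0.5)       # removes the 3 smallest
```

The claim phrases the cut as a threshold on the P-norm; with the norms sorted in descending order the two views coincide, since the effective threshold is simply the norm of the last group kept.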
3. The method according to claim 2, wherein deleting the target groups of convolution kernels whose P-norms are smaller than the first threshold comprises: setting the weights of the target groups of convolution kernels whose P-norms are smaller than the first threshold to 0.
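A minimal sketch of the zero-setting variant in claim 3, in which weights are masked rather than removed so tensor shapes are preserved; the names and example values are illustrative.

```python
import numpy as np

def soft_prune(weight, norms, threshold):
    # Zero the weights of filter groups whose group norm is below the
    # threshold, keeping the original tensor shape (claim 3).
    mask = (norms >= threshold).astype(weight.dtype)
    return weight * mask[:, None, None, None]

w = np.ones((4, 2, 1, 1))
pruned = soft_prune(w, norms=np.array([3.0, 0.5, 2.0, 0.1]), threshold=1.0)
```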
4. The method according to any one of claims 1 to 3, further comprising, before merging the parameters of the first batch normalization layer into the first convolutional layer of the convolutional neural network model:
in the case where the convolutional neural network model includes cross-layer connection branches and/or multiple serial branches formed by convolutional layers, taking, as the first convolutional layer, every convolutional layer of the convolutional neural network model except the last convolutional layer on each branch.
5. The method according to any one of claims 1 to 3, wherein merging the parameters of the first batch normalization layer in the convolutional neural network model into the first convolutional layer comprises:
calculating the N convolution kernels of the first target convolutional layer corresponding to the i-th output feature map, according to the N convolution kernels of the first convolutional layer corresponding to its i-th output feature map and the parameters of the first batch normalization layer corresponding to that output feature map;
calculating the offset of the first target convolutional layer corresponding to the i-th output feature map, according to the offset of the first convolutional layer corresponding to its i-th output feature map and the parameters of the first batch normalization layer corresponding to that output feature map;
and determining the i-th output feature map of the first target convolutional layer according to the convolutions between the N input feature maps of the first convolutional layer and the N convolution kernels of the first target convolutional layer corresponding to the i-th output feature map, together with the offset of the first target convolutional layer corresponding to the i-th output feature map,
wherein N is the number of input feature maps of the first convolutional layer, 1 ≤ i ≤ M, and M is the number of output feature maps of the first target convolutional layer.
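With the usual batch-normalization parameters (scale γ_i, shift β_i, running mean μ_i, running variance σ_i², small constant ε; these symbols are assumed here, since the claim does not name them), the per-output-map merge described in claim 5 can be written as:

```latex
W'_{i,n} = \frac{\gamma_i}{\sqrt{\sigma_i^2 + \epsilon}}\, W_{i,n},
\qquad
b'_i = \frac{\gamma_i\,(b_i - \mu_i)}{\sqrt{\sigma_i^2 + \epsilon}} + \beta_i,
\qquad
Y_i = \sum_{n=1}^{N} X_n * W'_{i,n} + b'_i .
```

Substituting the definitions back shows that Y_i equals batch normalization applied to the original convolution output, which is why the first target neural network model produces the same output as the original model for any input.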
6. The method according to claim 2 or 3, wherein determining the P-norm of each group of convolution kernels in the first target convolutional layer comprises:
determining the P-norms of the N convolution kernels of the first target convolutional layer corresponding to the i-th output feature map, wherein 1 ≤ i ≤ M, M is the number of output feature maps of the first target convolutional layer, N is the number of input feature maps of the first convolutional layer, and the i-th group of convolution kernels in the first target convolutional layer consists of the N convolution kernels of the first target convolutional layer corresponding to the i-th output feature map;
determining the sum of the P-norms of the N convolution kernels of the first target convolutional layer corresponding to the i-th output feature map;
and taking a power of this sum as the P-norm of the i-th group of convolution kernels of the first target convolutional layer corresponding to the i-th output feature map.
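A sketch of the group-norm computation in claim 6. The exponent applied to the sum is garbled in the translation; reading it as the P-norm of the concatenated group, i.e. summing the P-th powers of the per-kernel norms and taking the 1/P power, is an assumption.

```python
import numpy as np

def group_p_norm(group, p=2):
    # group: the N kernels of one output feature map, shape (N, kH, kW).
    # First the per-kernel P-norms, then one combined group-level P-norm.
    per_kernel = np.sum(np.abs(group) ** p, axis=(1, 2)) ** (1.0 / p)
    return float(np.sum(per_kernel ** p) ** (1.0 / p))

group = np.array([[[3.0]], [[4.0]]])   # two 1x1 kernels
norm = group_p_norm(group, p=2)        # sqrt(3^2 + 4^2) = 5.0
```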
7. The method according to claim 1, wherein compressing the second target neural network model to obtain the third target neural network model comprises:
setting a compression rate ρ, wherein 0 < ρ < 1;
and compressing the second target neural network model layer by layer until the set compression rate ρ is met.
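One way to read the layer-by-layer loop in claim 7, sketched over per-layer parameter counts only; the fixed 10% step per visit and the function name are assumptions, not taken from the patent.

```python
def compress_to_rate(layer_params, rho):
    # Repeatedly shrink each layer in turn until the total parameter
    # count is at most rho times the original total.
    target = rho * sum(layer_params)
    counts = list(layer_params)
    i = 0
    while sum(counts) > target:
        counts[i] = int(counts[i] * 0.9)   # drop roughly 10% of this layer
        i = (i + 1) % len(counts)          # move on to the next layer
    return counts

compressed = compress_to_rate([1200, 800, 400], rho=0.25)
```

Cycling over the layers spreads the pruning budget instead of hollowing out a single layer, which matches the "layer-by-layer" wording of the claim.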
8. A method for using a convolutional neural network model, comprising:
acquiring a face image of a person;
inputting the face image into a third target neural network model, wherein the third target neural network model is obtained by merging parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer of the convolutional neural network model to generate a first target neural network model containing a first target convolutional layer, deleting target convolution kernels from the first target convolutional layer to obtain a second target neural network model, and compressing the second target neural network model; the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer that is located after, and connected to, the first convolutional layer in the convolutional neural network model;
and acquiring the age, output by the third target neural network model, of the person in the face image.
9. The method according to claim 8, further comprising, before the parameters of the first batch normalization layer are merged into the first convolutional layer of the convolutional neural network model:
in the case where the convolutional neural network model includes cross-layer connection branches and/or multiple serial branches formed by convolutional layers, taking, as the first convolutional layer, every convolutional layer of the convolutional neural network model except the last convolutional layer on each branch.
10. An apparatus for compressing a convolutional neural network model, comprising:
a merging unit, configured to merge parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer of the convolutional neural network model to generate a first target neural network model containing a first target convolutional layer, wherein the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer that is located after, and connected to, the first convolutional layer in the convolutional neural network model;
a deleting unit, configured to delete target convolution kernels from the first target convolutional layer in the first target neural network model to obtain a second target neural network model, wherein the memory occupancy of the second target neural network model after the target convolution kernels are deleted is smaller than that of the first target neural network model;
and a compression unit, configured to compress the second target neural network model to obtain a third target neural network model;
wherein the apparatus is further configured to: acquire a face image of a person;
input the face image into the third target neural network model;
and obtain an output result of the third target neural network model, wherein the output result is the age of the person in the face image.
11. An apparatus for using a convolutional neural network model, comprising:
a first acquisition unit, configured to acquire a face image of a person;
an input unit, configured to input the face image into a third target neural network model, wherein the third target neural network model is obtained by merging parameters of a first batch normalization layer in a convolutional neural network model into a first convolutional layer of the convolutional neural network model to generate a first target neural network model containing a first target convolutional layer, deleting target convolution kernels from the first target convolutional layer to obtain a second target neural network model, and compressing the second target neural network model; the convolutional neural network model and the first target neural network model produce the same output for the same input, and the first batch normalization layer is the batch normalization layer that is located after, and connected to, the first convolutional layer in the convolutional neural network model;
and a second acquisition unit, configured to acquire the age, output by the third target neural network model, of the person in the face image.
12. A storage medium storing a computer program, wherein the computer program, when run, performs the method according to any one of claims 1 to 7 or claims 8 to 9.
13. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the method according to any one of claims 1 to 7 or claims 8 to 9 by means of the computer program.
CN201910251951.9A 2019-03-29 2019-03-29 Convolutional neural network model compression method and device, storage medium and electronic device Active CN110033083B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910251951.9A CN110033083B (en) 2019-03-29 2019-03-29 Convolutional neural network model compression method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910251951.9A CN110033083B (en) 2019-03-29 2019-03-29 Convolutional neural network model compression method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110033083A CN110033083A (en) 2019-07-19
CN110033083B true CN110033083B (en) 2023-08-29

Family

ID=67237019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910251951.9A Active CN110033083B (en) 2019-03-29 2019-03-29 Convolutional neural network model compression method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110033083B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781948A (en) * 2019-10-22 2020-02-11 北京市商汤科技开发有限公司 Image processing method, device, equipment and storage medium
CN111126456B (en) * 2019-12-05 2023-06-23 杭州飞步科技有限公司 Neural network model processing method, device, equipment and storage medium
CN113128660A (en) * 2019-12-31 2021-07-16 深圳云天励飞技术有限公司 Deep learning model compression method and related equipment
CN111382867B (en) * 2020-02-20 2024-04-16 华为技术有限公司 Neural network compression method, data processing method and related devices
CN111382839B (en) * 2020-02-23 2024-05-07 华为技术有限公司 Method and device for pruning neural network
CN111275059B (en) * 2020-02-26 2021-02-02 腾讯科技(深圳)有限公司 Image processing method and device and computer readable storage medium
CN112598020A (en) * 2020-11-24 2021-04-02 深兰人工智能(深圳)有限公司 Target identification method and system
CN112528940B (en) * 2020-12-23 2022-07-01 苏州科达科技股份有限公司 Training method, recognition method and device of driver behavior recognition model
CN114692816B (en) * 2020-12-31 2023-08-25 华为技术有限公司 Processing method and equipment of neural network model
CN112734036B (en) * 2021-01-14 2023-06-02 西安电子科技大学 Target detection method based on pruning convolutional neural network
CN112836751A (en) * 2021-02-03 2021-05-25 歌尔股份有限公司 Target detection method and device
CN113469277A (en) * 2021-07-21 2021-10-01 浙江大华技术股份有限公司 Image recognition method and device
CN113705775A (en) * 2021-07-29 2021-11-26 浪潮电子信息产业股份有限公司 Neural network pruning method, device, equipment and storage medium
CN114494424A (en) * 2022-04-14 2022-05-13 常州市新创智能科技有限公司 Welding guiding method and device based on vision

Citations (4)

Publication number Priority date Publication date Assignee Title
EP3301617A1 (en) * 2016-09-30 2018-04-04 Safran Identity & Security Methods for secure learning of parameters of a convolutional neural network, and secure classification of input data
CN108898168A (en) * 2018-06-19 2018-11-27 清华大学 The compression method and system of convolutional neural networks model for target detection
CN108961137A (en) * 2018-07-12 2018-12-07 中山大学 A kind of image latent writing analysis method and system based on convolutional neural networks
CN109523017A (en) * 2018-11-27 2019-03-26 广州市百果园信息技术有限公司 Compression method, device, equipment and the storage medium of deep neural network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11157814B2 (en) * 2016-11-15 2021-10-26 Google Llc Efficient convolutional neural networks and techniques to reduce associated computational costs


Non-Patent Citations (1)

Title
Convolutional neural network model compression method based on statistical analysis; Yang Yang; Lan Zhangli; Chen Wei; Computer Systems & Applications (08); full text *

Also Published As

Publication number Publication date
CN110033083A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN110033083B (en) Convolutional neural network model compression method and device, storage medium and electronic device
CN111950638B (en) Image classification method and device based on model distillation and electronic equipment
CN109685202B (en) Data processing method and device, storage medium and electronic device
CN108830235B (en) Method and apparatus for generating information
CN110276406B (en) Expression classification method, apparatus, computer device and storage medium
CN110309847B (en) Model compression method and device
CN109308681A (en) Image processing method and device
CN107529098A (en) Real-time video is made a summary
US10445586B2 (en) Deep learning on image frames to generate a summary
US10210627B1 (en) Image processing system for determining metrics of objects represented therein
CN111428660B (en) Video editing method and device, storage medium and electronic device
CN112527115A (en) User image generation method, related device and computer program product
CN111598176B (en) Image matching processing method and device
CN107590460A (en) Face classification method, apparatus and intelligent terminal
CN111199540A (en) Image quality evaluation method, image quality evaluation device, electronic device, and storage medium
CN109934845B (en) Time sequence behavior capturing frame generation method and device based on self-attention network
CN109657535B (en) Image identification method, target device and cloud platform
CN110276283B (en) Picture identification method, target identification model training method and device
CN111191065B (en) Homologous image determining method and device
CN108335008A (en) Web information processing method and device, storage medium and electronic device
CN111461228B (en) Image recommendation method and device and storage medium
CN116823869A (en) Background replacement method and electronic equipment
CN113763080A (en) Method and device for determining recommended article, electronic equipment and storage medium
CN110673737A (en) Display content adjusting method and device based on smart home operating system
CN116452881B (en) Food nutritive value detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant