CN111414990B - Convolutional neural network processing method and device, electronic equipment and storage medium


Info

Publication number
CN111414990B
CN111414990B (application number CN202010105457.4A)
Authority
CN
China
Prior art keywords
neural network, target, sub, convolution, convolutional neural
Prior art date
Legal status
Active
Application number
CN202010105457.4A
Other languages
Chinese (zh)
Other versions
CN111414990A (en)
Inventor
张选杨
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202010105457.4A
Publication of CN111414990A
Application granted
Publication of CN111414990B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming


Abstract

Embodiments of the invention provide a convolutional neural network processing method and apparatus, an electronic device, and a storage medium. The method includes: creating a target convolutional neural network in which an identity transformation branch and a convolution processing branch are arranged between the input end and the output end of each layer, the output of the identity transformation branch being identical to its input; assigning corresponding weights to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network to obtain a plurality of sub-convolutional neural networks; searching, according to a preset model search algorithm, a search space consisting of at least some of the plurality of sub-convolutional neural networks to obtain a target sub-convolutional neural network; and determining a performance parameter value of the preset model search algorithm according to the number of layers of the target sub-convolutional neural network.

Description

Convolutional neural network processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular to a convolutional neural network processing method and apparatus, an electronic device, and a storage medium.
Background
In recent years, with the development of deep learning technology, convolutional neural networks have made breakthrough progress in various visual recognition and speech recognition tasks. For example, in a speech recognition task, a piece of audio is input into a convolutional neural network, and the identity of a user to which the piece of audio belongs can be recognized through the output of the convolutional neural network.
The model structure of a convolutional neural network strongly influences how accurately a recognition task is completed; a well-chosen model structure can significantly improve accuracy. Model structures can be designed manually, but manual design consumes excessive labor and is error-prone.
In the related art, automated model structure design has been proposed to replace manual design. Specifically, a convolutional neural network of a certain depth is taken as a search space, and a search method is used to find an optimal model structure within that space. A suitable model structure can then be designed simply by specifying the search space, the search algorithm, and the evaluation index for model structures.
However, searching a fixed search space tends to yield model structures of limited quality. Moreover, different search algorithms may return different model structures, and there is currently no standard for evaluating the performance of a search algorithm. In other words, in the related art the searched model structures perform poorly and the search performance of a search algorithm cannot be accurately measured, so automated model structure design needs to be improved.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a convolutional neural network processing method, apparatus, electronic device, and storage medium, so as to overcome or at least partially solve the foregoing problems.
In a first aspect, an embodiment of the present invention provides a convolutional neural network processing method, including:
creating a target convolutional neural network, wherein an identity transformation branch and a convolution processing branch are arranged between the input end and the output end of each layer of the target convolutional neural network, and the output of the identity transformation branch is identical to its input;
assigning corresponding weights to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network to obtain a plurality of sub-convolutional neural networks;
searching, according to a preset model search algorithm, a search space consisting of at least some of the plurality of sub-convolutional neural networks to obtain a target sub-convolutional neural network;
and determining a performance parameter value of the preset model search algorithm according to the number of layers of the target sub-convolutional neural network.
Optionally, creating the target convolutional neural network includes:
adding an identity transformation unit in each layer of an original convolutional neural network to obtain a convolution processing branch, wherein the output of the identity transformation unit is identical to its input;
and adding an identity transformation branch, alongside the convolution processing branch, between the input end and the output end of each layer of the original convolutional neural network to obtain the target convolutional neural network.
Optionally, each layer of the original convolutional neural network includes at least a convolution processing unit and a batch normalization unit, and adding an identity transformation unit in each layer of the original convolutional neural network to obtain a convolution processing branch includes:
adding an identity transformation unit in parallel with the convolution processing unit of each layer of the original convolutional neural network to obtain the convolution processing branch;
wherein the sum of the output of the convolution processing unit and the output of the identity transformation unit is the input of the batch normalization unit.
Optionally, searching, according to a preset model search algorithm, a search space consisting of at least some of the plurality of sub-convolutional neural networks to obtain a target sub-convolutional neural network includes:
screening, from the plurality of sub-convolutional neural networks, a candidate sub-convolutional neural network whose parameter value for completing the target task is greater than a preset parameter value;
and retaining or discarding the identity transformation branch in each layer of the candidate sub-convolutional neural network according to its weight, to obtain a target sub-convolutional neural network adapted to the target task.
Optionally, the preset model search algorithm is based on an evolutionary algorithm, and assigning corresponding weights to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network to obtain a plurality of sub-convolutional neural networks includes:
retaining or discarding the identity transformation branch in each layer of the target convolutional neural network with a preset probability, to obtain the plurality of sub-convolutional neural networks;
and searching the search space according to the preset model search algorithm to obtain the target sub-convolutional neural network includes:
screening, from the plurality of sub-convolutional neural networks, the K sub-convolutional neural networks ranked highest by parameter value for completing the target task;
and taking the K sub-convolutional neural networks as an initial population and performing multiple rounds of selection with the evolutionary algorithm, according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
Optionally, the preset model search algorithm is based on a reinforcement learning model, and assigning corresponding weights to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network to obtain a plurality of sub-convolutional neural networks includes:
retaining or discarding the identity transformation branch in each layer of the target convolutional neural network with a preset probability, to obtain the plurality of sub-convolutional neural networks;
and searching the search space according to the preset model search algorithm to obtain the target sub-convolutional neural network includes:
sampling the plurality of sub-convolutional neural networks multiple times, taking the weights of the identity transformation branches and convolution processing branches in each layer of the sampled sub-convolutional neural networks as training samples, and training the reinforcement learning model multiple times;
and screening the plurality of sub-convolutional neural networks with the trained reinforcement learning model, according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
Optionally, the preset model search algorithm is based on an end-to-end model, and searching the search space according to the preset model search algorithm to obtain the target sub-convolutional neural network includes:
updating, multiple times, the weights assigned to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network, according to the parameter values with which the target convolutional neural network completes the target task;
and retaining or discarding the identity transformation branch in each layer of the updated target convolutional neural network according to its weight, to obtain the target sub-convolutional neural network.
Optionally, the number of layers of the target convolutional neural network is a first preset number of layers, and before determining the performance parameter value of the preset model search algorithm according to the number of layers of the target sub-convolutional neural network, the method further includes:
screening, from the original convolutional neural networks whose maximum number of layers is the first preset number, a target original convolutional neural network whose parameter value for completing the target task is greater than the preset parameter value;
and determining the performance parameter value of the preset model search algorithm according to the number of layers of the target sub-convolutional neural network includes:
comparing the number of layers of the target sub-convolutional neural network with the number of layers of the target original convolutional neural network;
and determining the performance parameter value of the preset model search algorithm according to the comparison result.
In a second aspect, an embodiment of the present invention provides a convolutional neural network processing apparatus, including:
a network construction module, configured to create a target convolutional neural network in which an identity transformation branch and a convolution processing branch are arranged between the input end and the output end of each layer, the output of the identity transformation branch being identical to its input;
a sub-network construction module, configured to assign corresponding weights to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network to obtain a plurality of sub-convolutional neural networks;
a target network determining module, configured to search, according to a preset model search algorithm, a search space consisting of at least some of the plurality of sub-convolutional neural networks to obtain a target sub-convolutional neural network;
and a performance verification module, configured to determine a performance parameter value of the preset model search algorithm according to the number of layers of the target sub-convolutional neural network.
Optionally, the network construction module includes:
a first transformation unit, configured to add an identity transformation unit in each layer of an original convolutional neural network to obtain a convolution processing branch, wherein the output of the identity transformation unit is identical to its input;
and a second transformation unit, configured to add an identity transformation branch, alongside the convolution processing branch, between the input end and the output end of each layer of the original convolutional neural network to obtain the target convolutional neural network.
Optionally, each layer of the original convolutional neural network includes at least a convolution processing unit and a batch normalization unit, and the first transformation unit is specifically configured to add an identity transformation unit in parallel with the convolution processing unit of each layer of the original convolutional neural network to obtain the convolution processing branch;
wherein the sum of the output of the convolution processing unit and the output of the identity transformation unit is the input of the batch normalization unit.
Optionally, the target network determining module includes:
a first screening unit, configured to screen, from the plurality of sub-convolutional neural networks, a candidate sub-convolutional neural network whose parameter value for completing the target task is greater than a preset parameter value;
and a first network retaining unit, configured to retain or discard the identity transformation branch in each layer of the candidate sub-convolutional neural network according to its weight, to obtain a target sub-convolutional neural network adapted to the target task.
Optionally, the preset model search algorithm is based on an evolutionary algorithm, and the sub-network construction module is specifically configured to retain or discard the identity transformation branch in each layer of the target convolutional neural network with a preset probability, to obtain the plurality of sub-convolutional neural networks;
the target network determining module includes:
a second screening unit, configured to screen, from the plurality of sub-convolutional neural networks, the K sub-convolutional neural networks ranked highest by parameter value for completing the target task;
and a second retaining unit, configured to take the K sub-convolutional neural networks as an initial population and perform multiple rounds of selection with the evolutionary algorithm, according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
Optionally, the preset model search algorithm is based on a reinforcement learning model, and the sub-network construction module is specifically configured to retain or discard the identity transformation branch in each layer of the target convolutional neural network with a preset probability, to obtain the plurality of sub-convolutional neural networks;
the target network determining module includes:
a model training unit, configured to sample the plurality of sub-convolutional neural networks multiple times, take the weights of the identity transformation branches and convolution processing branches in each layer of the sampled sub-convolutional neural networks as training samples, and train the reinforcement learning model multiple times;
and a third screening unit, configured to screen the plurality of sub-convolutional neural networks with the trained reinforcement learning model, according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
Optionally, the preset model search algorithm is based on an end-to-end model, and the target network determining module includes:
a weight updating unit, configured to update, multiple times, the weights assigned to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network, according to the parameter values with which the target convolutional neural network completes the target task;
and a fourth screening unit, configured to retain or discard the identity transformation branch in each layer of the updated target convolutional neural network according to its weight, to obtain the target sub-convolutional neural network.
Optionally, the number of layers of the target convolutional neural network is a first preset number of layers, and the apparatus further includes:
an original network obtaining module, configured to screen, from the original convolutional neural networks whose maximum number of layers is the first preset number, a target original convolutional neural network whose parameter value for completing the target task is greater than the preset parameter value;
and the performance verification module includes:
a layer number comparison unit, configured to compare the number of layers of the target sub-convolutional neural network with the number of layers of the target original convolutional neural network;
and a result determining unit, configured to determine the performance parameter value of the preset model search algorithm according to the comparison result.
In a third aspect, an embodiment of the present invention discloses an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the convolutional neural network processing method according to the first aspect.
In a fourth aspect, an embodiment of the present invention discloses a computer-readable storage medium storing a computer program that, when executed, causes a processor to perform the convolutional neural network processing method according to the first aspect.
The embodiments of the invention have the following advantages:
In the embodiments of the invention, an identity transformation branch and a convolution processing branch are arranged between the input end and the output end of each layer of the created target convolutional neural network, the output of the identity transformation branch being identical to its input; corresponding weights are then assigned to the identity transformation branch and the convolution processing branch in each layer to obtain a plurality of sub-convolutional neural networks; at least some of these sub-convolutional neural networks are taken as a search space, which is searched according to a preset model search algorithm to obtain a target sub-convolutional neural network; and finally a performance parameter value of the preset model search algorithm is determined according to the number of layers of the target sub-convolutional neural network.
With these embodiments, on the one hand, determining a performance parameter value for a model search algorithm makes the algorithm measurable, so its search performance can be evaluated; evaluating the search algorithm in turn indicates the quality of the model structures it finds. On the other hand, because corresponding weights can be assigned to the identity transformation branch and the convolution processing branch in each layer, and different weights yield different sub-convolutional neural networks, a dynamically changing search space is constructed; this improves the quality of the model structure found in that search space, and the higher quality of the searched structure in turn improves the accuracy with which the search algorithm is evaluated.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a convolutional neural network processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps for processing a target convolutional neural network in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram of a layer of network architecture of a target convolutional neural network in accordance with one embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of each layer in yet another target convolutional neural network in accordance with an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a convolutional neural network processing device according to an embodiment of the present invention.
Detailed Description
In order to make the above objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present application will be clearly described in conjunction with the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Referring to fig. 1, a general flow diagram of a convolutional neural network processing method according to an embodiment of the present application is shown. As shown in fig. 1, the method includes the following steps: obtaining an optimal network model structure for a target task by enumeration; creating a target convolutional neural network; constructing, within the target convolutional neural network, a plurality of sub-convolutional neural networks of different depths; searching the sub-convolutional neural networks of different depths with a model search algorithm to obtain a target sub-convolutional neural network; and verifying the performance of the model search algorithm from the optimal network model structure and the target sub-convolutional neural network.
In this embodiment, a convolutional neural network (CNN) is a class of deep neural network whose model structures include, but are not limited to: LeNet, AlexNet, VGG, GoogLeNet, ResNet, and DenseNet. The convolutional neural network processing method is applicable to convolutional neural networks of any structure.
In this embodiment, the number of layers of the target convolutional neural network may be a first preset number of layers; the number of layers characterizes the depth of the network, so a 100-layer target convolutional neural network has a depth of 100.
A target task can first be determined, such as an image classification task or a speech recognition task; an original convolutional neural network structure of the depth optimal for completing that task can then be obtained by enumeration and used as the reference structure for evaluating the model search algorithm.
In this embodiment, the optimal original convolutional neural network structure may be obtained by enumeration, which specifically includes the following steps:
Step S11: a target original convolutional neural network whose parameter value for completing the target task is greater than the preset parameter value is screened from the original convolutional neural networks whose maximum number of layers is the first preset number.
In this embodiment, among the original convolutional neural networks whose maximum number of layers is the first preset number, different networks have different numbers of layers. For example, if the first preset number is 100, there may be 80 original convolutional neural networks whose depths range from 20 layers at the minimum to 100 layers at the maximum.
In implementation, on the ImageNet dataset, the original convolutional neural networks whose maximum number of layers is the first preset number are taken as a search space; original networks of different depths are enumerated and trained to find the optimal depth on ImageNet, and the original convolutional neural network at that depth is the target original convolutional neural network that best completes the target task. The optimal depth is the optimal number of layers: if, for example, the optimal number is 25, the target original convolutional neural network is the 25-layer original network. ImageNet is a large visual database for visual object recognition research.
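To make the enumeration concrete, the following is an illustrative sketch, not part of the patent text; `build_plain_cnn` and `train_and_evaluate` are assumed placeholder callables.

```python
def enumerate_optimal_depth(max_layers, build_plain_cnn, train_and_evaluate):
    """Enumeration baseline: train one original CNN per candidate depth and
    return the depth whose task metric (e.g. ImageNet top-1 accuracy) is best."""
    best_depth, best_score = None, float("-inf")
    for depth in range(1, max_layers + 1):       # e.g. max_layers = 100
        model = build_plain_cnn(depth)           # placeholder: original CNN of `depth` layers
        score = train_and_evaluate(model)        # placeholder: train, then evaluate on validation data
        if score > best_score:
            best_depth, best_score = depth, score
    return best_depth, best_score                # depth and score of the target original CNN
```

The network found this way serves only as the reference answer against which the search algorithm is later scored.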
In this embodiment, before, at the same time as, or after obtaining the optimal target original convolutional neural network, a target convolutional neural network may be created and processed to search out a target sub-convolutional neural network that completes the target task, so as to evaluate the model search algorithm. Referring to fig. 2, a flowchart of the steps for processing a target convolutional neural network is shown; as shown in fig. 2, the method may include the following steps:
step S12: a target convolutional neural network is created, and an identity transformation branch and a convolutional processing branch are arranged between an input end and an output end of each layer of the target convolutional neural network.
Wherein the output and input of the identity transformation branch are the same.
Referring to fig. 3, a one-layer network structure of the target convolutional neural network is shown. As shown in fig. 3, the i-th layer represents any layer of the target convolutional neural network and includes an identity transformation branch 301 and a convolution processing branch 302 between the layer's input and output; the two branches are in parallel within each layer.
Taking information S as an example: when S is input to the i-th layer of the target convolutional neural network, the identity transformation branch 301 outputs S unchanged, while the convolution processing branch 302 outputs the convolution-processed information S'.
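As a concrete illustration of such a layer, the following is a minimal PyTorch sketch, an assumption of this rewrite rather than code from the patent: a channel-preserving 3x3 convolution stands in for the convolution processing branch, and the two branch outputs are combined by the weights introduced in step S13 below.

```python
import torch
import torch.nn as nn

class DualBranchLayer(nn.Module):
    """One layer of the target CNN: an identity transformation branch and a
    convolution processing branch in parallel between the layer's input and output."""

    def __init__(self, channels):
        super().__init__()
        self.conv_branch = nn.Sequential(        # assumed form of the convolution branch
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, w_id, w_conv):
        # Identity branch: output == input (information S stays S).
        # Convolution branch: convolution-processed output (S').
        return w_id * x + w_conv * self.conv_branch(x)

layer = DualBranchLayer(16)
x = torch.randn(1, 16, 8, 8)
assert torch.allclose(layer(x, 1.0, 0.0), x)     # pure identity branch: output equals input
```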
Step S13: and respectively distributing corresponding weights for the identity transformation branches and the convolution processing branches in each layer of the target convolution neural network to obtain a plurality of sub-convolution neural networks.
In this embodiment, the weight of the identity transformation branch may represent the proportion of the output of the identity transformation branch, and similarly, the weight of the convolution processing branch may represent the proportion of the output of the convolution processing branch. Wherein the sum of the weights of the identity transformation branches and the convolution processing branches may be 1.
For example, when the weight of an identity transformation branch in one layer is 0 and the weight of a convolution processing branch is 1, it means that the output of the identity transformation branch is reserved to be 0, and the output of the convolution processing branch is reserved entirely; when the weight of the identity transformation branch in one layer is 0.2 and the weight of the convolution processing branch is 0.8, the output of the identity transformation branch is only 20%, and the output of the convolution processing branch is only 80%.
In the implementation, corresponding weights can be respectively allocated to the identity transformation branch and the convolution processing branch in each layer of the target convolution neural network for multiple times, and the value of the weight allocated each time can be different, so that multiple different sub convolution neural networks can be obtained. That is, the weights respectively assigned to the identity transformation branch and the convolution processing branch of each layer of the target convolutional neural network at the nth time may be different from the weights respectively assigned to the identity transformation branch and the convolution processing branch of each layer of the target convolutional neural network at the (n+1) th time. Thus, the nth resulting sub-convolution neural network is different from the n+1th resulting sub-convolution neural network.
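A brief sketch of this repeated assignment follows; it is illustrative only, and the uniform random weights are an assumption (any scheme in which a layer's two weights sum to 1 fits the description).

```python
import torch

def sample_subnetwork_weights(num_layers):
    """One round of weight assignment: per layer, (w_id, w_conv) with w_id + w_conv = 1."""
    w_id = torch.rand(num_layers)                         # identity-branch weight per layer
    return list(zip(w_id.tolist(), (1.0 - w_id).tolist()))

# Repeating the assignment yields distinct sub-convolutional neural networks:
population = [sample_subnetwork_weights(100) for _ in range(8)]
```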
Step S14: and searching according to a preset model searching algorithm by taking at least part of the plurality of sub-convolution neural networks as a searching space to obtain a target sub-convolution neural network.
In this embodiment, when the number of selected partial sub-convolutional neural networks changes, this means that the search space changes accordingly. In this way, the search space is made changeable, and by searching in the changeable search space, the probability of obtaining the optimal network model structure can be improved, that is, the probability of obtaining the network model structure which is the optimal network structure and can be searched in the changeable search space is higher.
In specific implementation, the target sub-convolution neural network can be searched out from at least part of the sub-convolution neural networks by adopting a preset model searching algorithm with the highest accuracy of completing the target task as a target, so that the obtained target sub-convolution neural network is the optimal convolution neural network for completing the target task.
The preset model searching algorithm can be an end-to-end model searching algorithm, a model searching algorithm based on a reinforcement learning model or a model searching algorithm based on evolutionary learning.
Step S15: and determining the performance parameter value of the preset model searching algorithm according to the layer number of the target sub-convolution neural network.
Since the optimal network model structure is obtained through enumeration, when determining the performance parameter value of the preset model search algorithm, the method may include the following steps:
step S15-1: and comparing the layer number of the target sub-convolution neural network with the layer number of the target original convolution neural network.
The number of layers of the target sub-convolution neural network may be the number of reserved convolution branches, for example, the target sub-convolution neural network includes 20 reserved convolution branches, and then the number of layers of the target sub-convolution neural network is 20.
In specific implementation, the absolute value of the difference between the number of layers of the target sub-convolution neural network and the number of layers of the target original convolution neural network can be determined, and the absolute value is used as a comparison result. For example, the number of layers of the target sub-convolution neural network is 20, the number of layers of the target original convolution neural network is 22, and the absolute value of the difference of the number of layers is 2.
Step S15-2: and determining the performance parameter value of the preset model searching algorithm according to the comparison result.
In this embodiment, the absolute value of the difference between the number of layers of the target sub-convolutional neural network and the number of layers of the target original convolutional neural network may be used as a performance parameter value of a preset model search algorithm. Because the target original convolutional neural network is an optimal convolutional neural network obtained based on enumeration and is equivalent to a standard answer, the performance parameter value can reflect the difference between the searched target convolutional neural network and the optimal convolutional neural network and can be used for evaluating the searching performance of a preset model searching algorithm.
When the performance parameter value is the absolute value of the difference between the number of layers of the target sub-convolutional neural network and the number of layers of the target original convolutional neural network, the larger the performance parameter value is, the worse the searching performance of the preset model searching algorithm is indicated, that is, the preset model searching algorithm cannot accurately search out the optimal network model structure. Conversely, the smaller the performance parameter value is, the higher the searching performance of the preset model searching algorithm is, that is, the preset model searching algorithm can accurately search out the optimal network model structure.
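The metric itself is a one-liner; the sketch below simply restates the definition in code, with the 20-versus-22-layer example from the text.

```python
def search_performance(target_sub_layers: int, optimal_layers: int) -> int:
    """Performance parameter value of the search algorithm: the absolute depth gap
    between the searched sub-network and the enumerated optimal network.
    Smaller is better; 0 means the search recovered the optimal depth exactly."""
    return abs(target_sub_layers - optimal_layers)

assert search_performance(20, 22) == 2
```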
With this embodiment, corresponding weights can be assigned to the identity transformation branch and the convolution processing branch of each layer, and different weights yield different sub-convolutional neural networks, so the search space can change dynamically; this improves the quality of the model structure found within it, and the higher quality of the searched structure in turn improves the accuracy with which the search algorithm is evaluated. On the other hand, determining a performance parameter value for the model search algorithm makes the algorithm measurable, so its search performance can be evaluated, and evaluating the search algorithm indicates the quality of the model structures it finds.
Building on the above, the following embodiment A describes in detail how a search space consisting of at least some of the plurality of sub-convolutional neural networks is searched according to a preset model search algorithm to obtain a target sub-convolutional neural network.
In embodiment A, the weights of the identity transformation branch and convolution processing branch of each layer differ across sub-convolutional neural networks, and the target sub-convolutional neural network may be obtained as follows:
step S14-1: and screening candidate sub-convolution neural networks with parameter values of which the target tasks are completed being larger than preset parameter values from the plurality of sub-convolution neural networks.
In this embodiment a, parameter values of the target tasks of the plurality of sub-convolutional neural networks may be determined respectively, where the parameter values may be used to reflect accuracy of predicting the target tasks by the sub-convolutional neural networks, for example, the target tasks are tasks performed on objects in one picture, and the parameter values may reflect accuracy of classifying the objects by the sub-convolutional neural networks.
In specific implementation, the sub-convolution neural network with the parameter value of the completed target task being larger than the preset parameter value can be used as a candidate sub-convolution neural network. In practice, the number of the sub-convolution neural networks with the parameter values larger than the preset parameter value of the target task may be one or more, and when the number of the sub-convolution neural networks is more than one, the sub-convolution neural network with the largest parameter value for completing the target task may be used as the candidate sub-convolution neural network.
Step S14-2: and reserving or discarding the identity transformation branches in each layer of the candidate sub-convolution neural network according to the weights of the identity transformation branches in each layer of the candidate sub-convolution neural network to obtain the target sub-convolution neural network which is adapted to the target task.
In the embodiment a, since the identity transformation branches and the convolution processing branches of each layer in the candidate sub-convolution neural network have corresponding weights, the weights can be used as the basis of whether to reserve the identity transformation branches or discard the identity transformation branches. Furthermore, each convolution branch that is finally reserved may be constructed as a target sub-convolution neural network, where each layer is a convolution branch, and the number of layers of the target sub-convolution neural network is the number of convolution branches.
When the weight of the identity transformation branch is larger than that of the convolution processing branch, the identity transformation branch can be reserved, and the convolution processing branch can be discarded correspondingly, so that the input and the output of the layer network are the same. When the weight of the identity transformation branch is smaller than that of the convolution processing branch, the identity transformation branch can be discarded, and then the convolution processing branch is reserved, so that the input of the layer network is output after being processed by the convolution processing branch, and finally the target sub-convolution neural network matched with the target task is obtained.
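The retention rule can be summarized as the following sketch; it is illustrative, and `branch_weights` is an assumed representation of the per-layer weight pairs.

```python
def prune_to_target_subnetwork(branch_weights):
    """Keep a layer's convolution processing branch only when it outweighs the
    identity transformation branch; layers whose identity branch wins are removed.
    `branch_weights` is a list of (w_id, w_conv) pairs, one per layer."""
    kept = [i for i, (w_id, w_conv) in enumerate(branch_weights) if w_conv > w_id]
    return kept  # depth of the target sub-network == len(kept)

# A 3-layer candidate where the middle layer's identity branch dominates:
assert prune_to_target_subnetwork([(0.2, 0.8), (0.9, 0.1), (0.4, 0.6)]) == [0, 2]
```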
Building on the above, in one implementation B, step S13 can be realized by a process matched to the model search algorithm adopted, that is, the plurality of sub-convolutional neural networks are obtained by corresponding steps, and step S14 obtains the target sub-convolutional neural network by a correspondingly different process.
Specifically, in embodiment B1 the model search algorithm is based on an evolutionary algorithm. This algorithm first requires a plurality of sub-convolutional neural networks of different depths, that is, the sub-convolutional neural networks in embodiment B1 have different depths. Step S13 may then be the following step:
step S13': and reserving or discarding identity transformation branches in each layer of the target convolutional neural network with preset probability to obtain a plurality of sub convolutional neural networks.
The preset probability may be understood as a preset sampling probability, and the preset probability may be 0.5. In particular implementations, the identity transformation branches in the various layers of the target convolutional neural network may be randomly discarded per iteration through a bernoulli distribution of p=0.5, such that the weight of the identity transformation branches in each layer is 0 or 1. When the identity transformation branch is discarded, the weight of the identity transformation branch is set to 0, and then the weight of the convolution processing branch is set to 1, which means that the convolution processing branch is reserved. However, when the identity transformation branch is retained, the weight of the identity transformation branch is set to 1, and then the weight of the convolution processing branch is set to 0, which means that the convolution processing branch is discarded.
In the implementation, after a plurality of iterations, a plurality of sub-convolution neural networks with different layers are formed. For example, in a 100-layer target convolutional neural network, 20 convolutional processing branches remain at the first iteration, and then the 20 convolutional processing branches form one sub-convolutional neural network 1. In the second iteration, 40 convolution branches are reserved, and then the 40 convolution branches form a sub-convolution neural network 2. The number of layers of the sub-convolution neural network 1 is 20, and the number of layers of the sub-convolution neural network 2 is 40.
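A sketch of this Bernoulli sampling, illustrative only:

```python
import torch

def sample_depth_variant(num_layers, p=0.5):
    """One iteration: per layer, keep the identity branch with probability p
    (identity weight 1, convolution weight 0) or keep the convolution branch."""
    keep_identity = torch.bernoulli(torch.full((num_layers,), p)).bool()
    conv_layers = (~keep_identity).nonzero(as_tuple=True)[0].tolist()
    return conv_layers  # indices of retained convolution branches == the sub-network's layers

subnet_1 = sample_depth_variant(100)   # e.g. ~50 retained convolution branches
subnet_2 = sample_depth_variant(100)   # generally a different depth than subnet_1
```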
Accordingly, a model search algorithm based on an evolutionary algorithm can search these sub-convolutional neural networks of different depths for the target sub-convolutional neural network, as follows:
Step S14-1: the K sub-convolutional neural networks ranked highest by parameter value for completing the target task are screened from the plurality of sub-convolutional neural networks.
The parameter value again reflects the accuracy of the sub-network's predictions on the target task; for example, if the target task is classifying objects in a picture, the parameter value reflects classification accuracy.
In this embodiment, the sub-convolutional neural networks may be sorted by parameter value in descending order and the first K selected.
Step S14-2: the K sub-convolutional neural networks are taken as the initial population, and the evolutionary algorithm performs multiple rounds of selection according to the parameter values for completing the target task to obtain the target sub-convolutional neural network.
In specific implementation, starting from the K networks as the initial population, crossover and mutation are applied; poorly adapted sub-networks (those with small parameter values for completing the target task) are eliminated while well-adapted ones (those with large parameter values) are retained, and the sub-convolutional neural network finally retained is the target sub-convolutional neural network.
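A minimal sketch of such an evolutionary loop, under the assumption that each individual is a binary gene per layer (1 = keep the convolution branch) and that a `fitness` callable returns the parameter value for completing the target task:

```python
import random

def evolutionary_search(population, fitness, generations=50, mutation_rate=0.1):
    """Evolve a population of depth genes: crossover, mutate, drop the least fit."""
    for _ in range(generations):
        p1, p2 = random.sample(population, 2)      # pick two parents
        cut = random.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]                # crossover at a random point
        child = [g ^ 1 if random.random() < mutation_rate else g
                 for g in child]                   # mutation: flip keep/discard decisions
        population.append(child)
        population.remove(min(population, key=fitness))   # eliminate the weakest
    return max(population, key=fitness)            # the target sub-convolutional network
```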
Specifically, in another embodiment B2, a model search algorithm based on a reinforcement learning model may search the plurality of sub-convolutional neural networks in the search space for the target sub-convolutional neural network. This algorithm likewise requires sub-convolutional neural networks of different depths, that is, the sub-convolutional neural networks in embodiment B2 have different depths. Step S13 may then be the following step:
Step S13': the identity transformation branch in each layer of the target convolutional neural network is retained or discarded with a preset probability, to obtain a plurality of sub-convolutional neural networks.
This step S13' proceeds as in embodiment B1 and is not described again.
Accordingly, a model search algorithm based on a reinforcement learning model can search these sub-convolutional neural networks of different depths for the target sub-convolutional neural network, as follows:
Step S14-1': the plurality of sub-convolutional neural networks are sampled multiple times, the weights of the identity transformation branches and convolution processing branches in each layer of the sampled networks are taken as training samples, and the reinforcement learning model is trained multiple times.
In particular, an LSTM (Long Short-Term Memory) network may be defined to construct the reinforcement learning model, where the input of the first LSTM time step is an all-zero vector and each other time step receives the output of the previous step. The weights of the identity transformation branches and convolution processing branches in each layer are input to the reinforcement learning model as training samples. The model samples one sub-convolutional neural network per sampling step; a reward function is designed from the sub-convolutional neural network's accuracy on the validation dataset and its computational complexity, and a policy gradient algorithm updates the reinforcement learning model. This is repeated until the model converges.
Step S14-2': the plurality of sub-convolutional neural networks are screened with the trained reinforcement learning model, according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
Once the reinforcement learning model is trained, the depth of the neural network model is determined, and the trained model can be used to search the target sub-convolutional neural network out of the plurality of sub-convolutional neural networks.
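The following is a hedged sketch of such an LSTM-based controller with a REINFORCE-style update, in the spirit of (but not copied from) the description above; the reward value, hidden size, and two-way keep/discard action space are assumptions.

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """LSTM controller: the first time step receives an all-zero vector, each later
    step receives the previous step's output, and each step emits one layer's decision."""

    def __init__(self, num_layers, hidden=64):
        super().__init__()
        self.num_layers, self.hidden = num_layers, hidden
        self.cell = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, 2)           # logits: keep identity vs. keep convolution

    def sample(self):
        x = torch.zeros(1, self.hidden)            # all-zero input at the first time step
        h = c = torch.zeros(1, self.hidden)
        decisions, log_probs = [], []
        for _ in range(self.num_layers):
            h, c = self.cell(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()                      # sample one layer's keep/discard decision
            decisions.append(a.item())
            log_probs.append(dist.log_prob(a))
            x = h                                  # next step consumes the previous output
        return decisions, torch.stack(log_probs).sum()

# One policy-gradient update; `reward` would combine validation accuracy and
# model complexity as described above (0.7 is a placeholder value):
controller = Controller(num_layers=100)
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)
decisions, log_prob = controller.sample()
reward = 0.7
loss = -reward * log_prob
optimizer.zero_grad(); loss.backward(); optimizer.step()
```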
Specifically, in a further embodiment B3, a model search algorithm based on an end-to-end model may search the target convolutional neural network directly for the target sub-convolutional neural network; the plurality of intermediate-state sub-convolutional neural networks arise during this search. That is, with embodiment B3, the plurality of sub-convolutional neural networks of step S13 are obtained while the end-to-end model search algorithm searches the target convolutional neural network. The steps are as follows:
Step S14-1'': the weights assigned to the identity transformation branch and the convolution processing branch in each layer of the target convolutional neural network are updated multiple times according to the parameter values with which the target convolutional neural network completes the target task.
The specific implementation can proceed as follows:
First, the model structure parameters of the end-to-end model are defined and an objective function is constructed for the target task; the weights assigned to the identity transformation branch and convolution processing branch in each layer of the target convolutional neural network are computed from the model structure parameters through a softmax function.
Then, the outputs of the two branches are combined by a weighted sum: the product of the identity transformation branch's output and its weight is added to the product of the convolution processing branch's output and its weight.
Then, the objective function is evaluated on the weighted sums, and the model structure parameters, and hence the per-layer weights of the identity transformation and convolution processing branches, are updated by backpropagation.
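A minimal sketch of one such differentiable layer, an assumption of this rewrite (the patent does not fix the convolution shape or parameterization): two learnable structure parameters per layer pass through softmax to become the branch weights, and the weighted sum keeps the whole computation differentiable so backpropagation updates the structure parameters.

```python
import torch
import torch.nn as nn

class SoftGatedLayer(nn.Module):
    """End-to-end scheme: softmax over per-layer structure parameters yields the
    branch weights, and the layer outputs the weighted sum of both branches."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.alpha = nn.Parameter(torch.zeros(2))  # [identity, convolution] structure logits

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)       # weights sum to 1
        return w[0] * x + w[1] * self.conv(x)      # weighted sum of branch outputs

    def keep_convolution(self):
        w = torch.softmax(self.alpha, dim=0)       # after convergence: keep the heavier branch
        return bool(w[1] > w[0])
```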
Step S14-2'': the identity transformation branch in each layer of the multiply-updated target convolutional neural network is retained or discarded according to its weight, to obtain the target sub-convolutional neural network.
In this embodiment, the weights of the identity transformation branches may be updated repeatedly until the objective function converges, yielding a final target convolutional neural network whose per-layer identity transformation and convolution processing branches carry final weights; these weights decide whether each convolution processing branch is retained or discarded.
In specific implementation, the branch with the larger weight in each layer of the final network is retained: the identity transformation branch's weight is compared with the convolution processing branch's, and the identity branch is retained when its weight is larger and discarded otherwise.
Since an identity transformation branch's input and output are the same, retaining a layer's identity branch means that layer's input is passed directly to the next layer (the layer does no processing), that is, the layer is removed from the target neural network; discarding the identity branch means the layer's convolution processing branch is retained. The target sub-convolutional neural network thus obtained contains the retained convolution processing branches.
For example, take a 100-layer target convolutional neural network containing an i-th layer and a j-th layer, where the i-th layer's identity branch has weight 0.4 and its convolution branch 0.6, while the j-th layer's identity branch has weight 0.8 and its convolution branch 0.2. The i-th layer's identity branch is then discarded and the j-th layer's retained; the final target sub-convolutional neural network keeps the i-th layer's convolution branch but drops the j-th layer, that is, it is one layer shallower.
With this approach, obtaining the target sub-convolutional neural network is folded into the search process itself, which improves the efficiency of obtaining it.
Building on the above, in one implementation, the convolution processing branch of the target convolutional neural network may itself contain an identity transformation unit when the network is created. Specifically, creating the target convolutional neural network may include the following steps:
Step S12-1: an identity transformation unit is added in each layer of the original convolutional neural network to obtain a convolution processing branch, where the output of the identity transformation unit is identical to its input.
In this embodiment, each layer of the target convolutional neural network may thus contain two identity transformations: besides the identity transformation branch itself, one identity transformation unit may sit inside the convolution processing branch.
In one embodiment, the location in the convolution processing branch where the identity transformation unit is added is described in detail. Wherein each layer in the original convolutional neural network comprises at least: a convolution processing unit and a batch normalization unit. The step of deriving the convolution processing branches may be the steps of:
step S12-1': and adding an identity transformation unit on the basis of a convolution processing unit of each layer in the original convolution neural network to obtain convolution processing branches.
Wherein the sum of the output of the convolution processing unit and the output of the identity transformation unit is the input of a batch normalization unit.
In this embodiment, each layer of network in the original convolutional neural network may include: convolution processing unit, batch normalization unit, and ReLU (Rectified Linear Unit, modified linear unit). Wherein an identity transforming unit is added between the input of each layer and the input of the batch normalizing unit, the identity transforming unit can be a unit parallel to the convolution processing unit, and the input and the output of the identity transforming unit are the same. Wherein the input of each layer of batch normalization unit is the sum of the output of the convolution processing unit and the output of the identity transformation unit.
Taking the input of the information S as an example, when S is input to the convolution processing branch, the S is input to the convolution processing unit and the identity transformation unit, respectively, so that the convolution processing unit outputs S1, the identity transformation unit still outputs S, and then the sum of S and S1 is input to the batch normalization layer.
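As an illustration only (the embodiments fix no framework), one such convolution processing branch might be sketched in PyTorch-style Python as follows; the class and parameter names are hypothetical, and the input and output channel counts are assumed equal so that the element-wise sum is well defined:

```python
import torch.nn as nn

class ConvBranchWithIdentityUnit(nn.Module):
    """Hypothetical sketch of the convolution processing branch described
    above: the identity transformation unit contributes the unchanged
    input S, which is added to the convolution output S1 before the
    batch normalization unit and the ReLU."""

    def __init__(self, channels):
        super().__init__()
        # Convolution processing unit (3x3, padding keeps spatial size).
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)   # batch normalization unit
        self.relu = nn.ReLU()

    def forward(self, s):
        s1 = self.conv(s)   # convolution processing unit outputs S1
        # Identity transformation unit: still outputs S, summed with S1.
        return self.relu(self.bn(s1 + s))
```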
Step S12-2: add an identity transformation branch in parallel with the convolution processing branch between the input end and the output end of each layer of the original convolutional neural network to obtain the target convolutional neural network.
Referring to fig. 4, which shows a schematic diagram of each layer in a further target convolutional neural network: the i-th layer stands for any layer of the target convolutional neural network and comprises an identity transformation branch 402 and a convolution processing branch, the latter containing a convolution processing unit, a batch normalization layer, a ReLU, and an identity transformation unit 401 parallel to the convolution processing unit. As shown in fig. 4, the identity transformation branch 402 outputs its input directly, while within the convolution processing branch the output of the identity transformation unit 401 and the output of the convolution processing unit are added element-wise before being input to the batch normalization layer.
By adopting this embodiment, the identity transformation unit placed in parallel with the convolution processing unit inside the convolution processing branch overcomes the problem that training stalls at great depth because the parameters of shallow layers cannot be updated, and it does so without turning the target convolutional neural network into a residual network. The depth of the target convolutional neural network can therefore be increased, that is, its number of layers can be raised to a higher count; a larger layer count enlarges the search space, which increases the probability that the searched target sub-convolutional neural network is the optimal network structure. In other words, the optimal target sub-convolutional neural network can be found more accurately within a larger search space.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of acts, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, since some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily required by the embodiments of the invention.
Based on the same inventive concept, referring to fig. 5, there is shown a schematic block diagram of a convolutional neural network processing device according to an embodiment of the present invention; the device may include the following modules:
the network construction module 501 is configured to create a target convolutional neural network, where an identity transformation branch and a convolutional processing branch are arranged between an input end and an output end of each layer of the target convolutional neural network, and an output and an input of the identity transformation branch are the same;
the sub-network construction module 502 is configured to allocate corresponding weights to the identity transformation branches and the convolution processing branches in each layer of the target convolution neural network, so as to obtain a plurality of sub-convolution neural networks;
the target network determining module 503 is configured to search, with at least part of the plurality of sub-convolutional neural networks as a search space, according to a preset model search algorithm, to obtain a target sub-convolutional neural network;
and the performance verification module 504 is configured to determine a performance parameter value of the preset model search algorithm according to the number of layers of the target sub-convolutional neural network.
Optionally, the network construction module 501 may specifically include the following units:
the first transformation unit is used for adding an identity transformation unit in each layer in the original convolutional neural network to obtain a convolutional processing branch, wherein the output and the input of the identity transformation unit are the same;
and the second transformation unit is used for adding an identity transformation branch on the basis of the convolution processing branch between the input end and the output end of each layer in the original convolutional neural network, to obtain the target convolutional neural network.
Optionally, each layer in the original convolutional neural network includes at least: a convolution processing unit and a batch normalization unit; the first transformation unit is specifically configured to add an identity transformation unit on the basis of a convolution processing unit in each layer in the original convolution neural network, so as to obtain a convolution processing branch;
Wherein the sum of the output of the convolution processing unit and the output of the identity transformation unit is the input of a batch normalization unit.
Optionally, the target network determining module 503 may specifically include the following units:
the first screening unit is used for screening, from the plurality of sub-convolutional neural networks, candidate sub-convolutional neural networks whose parameter values for completing the target task are larger than a preset parameter value;
and the first network retention unit is used for retaining or discarding the identity transformation branches in each layer of the candidate sub-convolutional neural networks according to the weights of those identity transformation branches, to obtain the target sub-convolutional neural network adapted to the target task.
Optionally, the preset model search algorithm is a model search algorithm based on an evolutionary algorithm; the sub-network construction module is specifically configured to retain or discard the identity transformation branches in each layer of the target convolutional neural network with a preset probability, to obtain a plurality of sub-convolutional neural networks;
the target network determining module 503 may specifically include the following units:
the second screening unit is used for screening, from the plurality of sub-convolutional neural networks, the K sub-convolutional neural networks whose parameter values for completing the target task rank highest;
and the second retention unit is used for taking the K sub-convolutional neural networks as an initial population and performing multiple rounds of screening with an evolutionary algorithm according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
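A compact sketch of the evolutionary screening performed by these two units might look as follows in Python; the genome encoding, the score() function, and the numeric defaults are illustrative assumptions, not part of the embodiments:

```python
import random

def evolutionary_screening(candidates, score, k=10, generations=20, p_mutate=0.1):
    """Hypothetical sketch of the screening performed by these units.

    Each candidate encodes, per layer, whether the identity transformation
    branch is retained (True) or discarded (False); `score` is an assumed
    function returning the parameter value for completing the target task.
    """
    # Seed the initial population with the K best-scoring sub-networks.
    population = sorted(candidates, key=score, reverse=True)[:k]
    for _ in range(generations):
        # Mutate each individual by flipping branch choices with probability p_mutate.
        offspring = [
            [(not gene) if random.random() < p_mutate else gene for gene in genome]
            for genome in population
        ]
        # Screen again: keep the K fittest of parents and offspring.
        population = sorted(population + offspring, key=score, reverse=True)[:k]
    return population[0]  # the target sub-convolutional neural network
```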
Optionally, the preset model search algorithm is a model search algorithm based on a reinforcement learning model; the sub-network construction module is specifically configured to retain or discard the identity transformation branches in each layer of the target convolutional neural network with a preset probability, to obtain a plurality of sub-convolutional neural networks;
the target network determining module 503 may specifically include the following units:
the model training unit is used for sampling the plurality of sub-convolutional neural networks multiple times, taking the weights of the identity transformation branches and the convolution processing branches in each layer of the sampled sub-convolutional neural networks as training samples, and training the reinforcement learning model multiple times;
and the third screening unit is used for screening the plurality of sub-convolutional neural networks with the trained reinforcement learning model according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
Optionally, the preset model searching algorithm is an end-to-end model based model searching algorithm; the target network determining module 503 may specifically include the following units:
the weight updating unit is used for updating, multiple times, the weights respectively assigned to the identity transformation branches and the convolution processing branches in each layer of the target convolutional neural network according to the parameter values of the target convolutional neural network for completing the target task;
and the fourth screening unit is used for retaining or discarding the identity transformation branches in each layer of the multiply-updated target convolutional neural network according to those branches' weights, to obtain the target sub-convolutional neural network.
Optionally, the number of layers of the target convolutional neural network is a first preset number of layers; the apparatus may further comprise the following modules:
the original network obtaining module is used for screening, from the original convolutional neural networks whose maximum layer number is the first preset layer number, a target original convolutional neural network whose parameter value for completing the target task is larger than the preset parameter value;
the performance verification module 504 may specifically include the following units:
the layer number comparison unit is used for comparing the layer number of the target sub-convolution neural network with the layer number of the target original convolution neural network;
And the result determining unit is used for determining the performance parameter value of the preset model searching algorithm according to the comparison result.
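Because the performance parameter value represents the layer-number difference between the optimal network model structure and the target sub-convolutional neural network, the verification step reduces to a depth comparison. A minimal sketch follows, with hypothetical names; taking the difference as an absolute value is an assumption here:

```python
def performance_parameter(searched_depth, optimal_depth):
    """Sketch of the verification step: the performance parameter value is
    the layer-number difference between the searched target sub-network
    and the enumerated optimal network model structure (absolute value
    assumed); the smaller the difference, the better the search algorithm
    performed."""
    return abs(searched_depth - optimal_depth)

# e.g. a searched 18-layer sub-network against a 20-layer optimal structure
print(performance_parameter(18, 20))  # -> 2
```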
Since the embodiment of the convolutional neural network processing device is substantially similar to the embodiment of the convolutional neural network processing method, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
An embodiment of the invention also provides an electronic device, which may comprise: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the device to perform one or more of the convolutional neural network processing methods described in the embodiments of the present invention.
An embodiment of the invention also provides a computer-readable storage medium whose stored computer program causes a processor to execute the convolutional neural network processing method according to the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, with each embodiment focusing on its differences from the others; for identical or similar parts among the embodiments, the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should further be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal device comprising that element.
The convolutional neural network processing method and device, electronic device, and storage medium provided by the present invention have been described in detail above, with specific examples used to illustrate the principles and implementation of the invention; the description of these examples is intended only to aid understanding of the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the present invention; in view of the above, this description should not be construed as limiting the invention.

Claims (11)

1. A convolutional neural network processing method, comprising:
determining a target task, wherein the target task is an image classification task; training, for the target task, an original convolutional neural network based on the ImageNet data set; enumerating and training original convolutional neural networks of different depths on the ImageNet data set, taking the original convolutional neural networks whose maximum layer number is a first preset layer number as a search space, so as to find the corresponding optimal depth on the ImageNet data set; and taking the original convolutional neural network at the optimal depth as an optimal network model structure;
creating a target convolutional neural network, wherein an identity transformation branch and a convolutional processing branch are arranged between an input end and an output end of each layer of the target convolutional neural network, and the output and the input of the identity transformation branch are identical;
respectively distributing corresponding weights for the identity transformation branches and the convolution processing branches in each layer of the target convolution neural network to obtain a plurality of sub-convolution neural networks;
searching according to a preset model searching algorithm by taking at least part of the sub-convolution neural networks in the plurality of sub-convolution neural networks as a searching space to obtain a target sub-convolution neural network;
and determining a performance parameter value of the preset model search algorithm according to a result of comparing the layer numbers of the optimal network model structure and the target sub-convolutional neural network, so as to verify the performance of the model search algorithm, wherein the performance parameter value represents the difference in layer number between the optimal network model structure and the target sub-convolutional neural network.
2. The method of claim 1, wherein creating the target convolutional neural network comprises:
adding an identity transformation unit in each layer in an original convolutional neural network to obtain a convolutional processing branch, wherein the output and the input of the identity transformation unit are the same;
and adding an identity transformation branch on the basis of a convolution processing branch between an input end and an output end of each layer in the original convolution neural network to obtain the target convolution neural network.
3. The method of claim 2, wherein each layer in the original convolutional neural network comprises at least: a convolution processing unit and a batch normalization unit; adding an identity transformation unit in each layer of the original convolutional neural network to obtain convolutional processing branches, wherein the method comprises the following steps of:
adding an identity transformation unit on the basis of a convolution processing unit of each layer in the original convolution neural network to obtain convolution processing branches;
Wherein the sum of the output of the convolution processing unit and the output of the identity transformation unit is the input of a batch normalization unit.
4. The method of claim 1, wherein searching according to a preset model search algorithm with at least part of the plurality of sub-convolutional neural networks as a search space to obtain a target sub-convolutional neural network comprises:
screening, from the plurality of sub-convolutional neural networks, candidate sub-convolutional neural networks whose parameter values for completing the target task are larger than a preset parameter value;
and retaining or discarding the identity transformation branches in each layer of the candidate sub-convolutional neural networks according to the weights of those identity transformation branches, to obtain the target sub-convolutional neural network adapted to the target task.
5. The method according to claim 1, wherein the preset model search algorithm is an evolutionary algorithm-based model search algorithm; respectively distributing corresponding weights to the identity transformation branches and the convolution processing branches in each layer of the target convolution neural network to obtain a plurality of sub-convolution neural networks, wherein the method comprises the following steps:
retaining or discarding the identity transformation branches in each layer of the target convolutional neural network with a preset probability, to obtain a plurality of sub-convolutional neural networks;
taking at least part of the plurality of sub-convolution neural networks as a search space, searching according to a preset model search algorithm to obtain a target sub-convolution neural network, wherein the method comprises the following steps:
screening, from the plurality of sub-convolutional neural networks, the K sub-convolutional neural networks whose parameter values for completing the target task rank highest;
and taking the K sub-convolutional neural networks as an initial population and performing multiple rounds of screening with the evolutionary algorithm according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
6. The method of claim 1, wherein the predetermined model search algorithm is a reinforcement learning model-based model search algorithm; respectively distributing corresponding weights to the identity transformation branches and the convolution processing branches in each layer of the target convolution neural network to obtain a plurality of sub-convolution neural networks, wherein the method comprises the following steps:
retaining or discarding the identity transformation branches in each layer of the target convolutional neural network with a preset probability, to obtain a plurality of sub-convolutional neural networks;
Taking at least part of the plurality of sub-convolution neural networks as a search space, searching according to a preset model search algorithm to obtain a target sub-convolution neural network, wherein the method comprises the following steps:
sampling the plurality of sub-convolutional neural networks multiple times, taking the weights of the identity transformation branches and the convolution processing branches in each layer of the sampled sub-convolutional neural networks as training samples, and training the reinforcement learning model multiple times;
and screening the plurality of sub-convolutional neural networks with the trained reinforcement learning model according to the parameter values for completing the target task, to obtain the target sub-convolutional neural network.
7. The method according to claim 1, wherein the preset model search algorithm is an end-to-end model based model search algorithm; taking at least part of the plurality of sub-convolution neural networks as a search space, searching according to a preset model search algorithm to obtain a target sub-convolution neural network, wherein the method comprises the following steps:
according to the parameter values of the target convolutional neural network for completing the target task, the weights respectively distributed to the identity transformation branches and the convolutional processing branches in each layer of the target convolutional neural network are updated for a plurality of times;
and retaining or discarding the identity transformation branches in each layer of the multiply-updated target convolutional neural network according to the weights of those identity transformation branches, so as to obtain the target sub-convolutional neural network.
8. The method of any of claims 4-7, wherein the number of layers of the target convolutional neural network is a first preset number of layers; before determining the performance parameter value of the preset model search algorithm according to the layer number of the target sub-convolution neural network, the method further comprises:
screening, from the original convolutional neural networks whose maximum layer number is the first preset layer number, a target original convolutional neural network whose parameter value for completing the target task is larger than the preset parameter value;
determining the performance parameter value of the preset model search algorithm according to the layer number of the target sub-convolution neural network, wherein the method comprises the following steps:
comparing the layer number of the target sub-convolution neural network with the layer number of the target original convolution neural network;
and determining the performance parameter value of the preset model searching algorithm according to the comparison result.
9. A convolutional neural network processing device, comprising:
determining a target task, wherein the target task is an image classification task; training, for the target task, an original convolutional neural network based on the ImageNet data set; enumerating and training original convolutional neural networks of different depths on the ImageNet data set, taking the original convolutional neural networks whose maximum layer number is a first preset layer number as a search space, so as to find the corresponding optimal depth on the ImageNet data set; and taking the original convolutional neural network at the optimal depth as an optimal network model structure;
the network construction module is used for creating a target convolutional neural network, and an identity transformation branch and a convolutional processing branch are arranged between the input end and the output end of each layer of the target convolutional neural network, wherein the output and the input of the identity transformation branch are the same;
the sub-network construction module is used for respectively distributing corresponding weights to the identity transformation branches and the convolution processing branches in each layer of the target convolution neural network to obtain a plurality of sub-convolution neural networks;
the target network determining module is used for searching according to a preset model searching algorithm by taking at least part of the sub-convolution neural networks in the plurality of sub-convolution neural networks as a searching space to obtain a target sub-convolution neural network;
and the performance verification module is used for determining a performance parameter value of the preset model search algorithm according to a result of comparing the layer numbers of the optimal network model structure and the target sub-convolutional neural network, so as to verify the performance of the model search algorithm, wherein the performance parameter value represents the difference in layer number between the optimal network model structure and the target sub-convolutional neural network.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the convolutional neural network processing method according to any one of claims 1-8.
11. A computer-readable storage medium, characterized in that it stores a computer program causing a processor to execute the convolutional neural network processing method according to any one of claims 1 to 8.