CN115345303A - Convolutional neural network weight tuning method, device, storage medium and electronic equipment - Google Patents

Convolutional neural network weight tuning method, device, storage medium and electronic equipment

Info

Publication number
CN115345303A
CN115345303A
Authority
CN
China
Prior art keywords
learning rate
rate parameter
neural network
convolutional neural
parameter group
Prior art date
Legal status
Pending
Application number
CN202211113628.3A
Other languages
Chinese (zh)
Inventor
Li Yanzhou (李言洲)
Current Assignee
Hangzhou Hikrobot Co Ltd
Original Assignee
Hangzhou Hikrobot Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikrobot Co Ltd filed Critical Hangzhou Hikrobot Co Ltd
Priority to CN202211113628.3A
Publication of CN115345303A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/12 - Computing arrangements based on biological models using genetic models
    • G06N 3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The application discloses a weight tuning method for a convolutional neural network, which comprises the following steps: forming a learning rate parameter group from the learning rate parameters corresponding to each designated layer of a first convolutional neural network, using the learning rate parameter group as an individual of a genetic algorithm and the learning rate parameters as genes of the genetic algorithm; and processing all learning rate parameter groups of each generation in sequence with the genetic algorithm until the optimization termination condition of the genetic algorithm is met, so as to determine the optimal learning rate parameter group for training the first convolutional neural network to generate a second convolutional neural network. By applying the method and device, the optimal or locally optimal learning rate parameter group in transfer learning can be found without relying on any prior knowledge, improving the performance of the target convolutional neural network of the transfer learning.

Description

Convolutional neural network weight tuning method, device, storage medium and electronic equipment
Technical Field
The present disclosure relates to neural network technologies, and in particular, to a convolutional neural network weight tuning method, an apparatus, a storage medium, and an electronic device.
Background
With the development of deep learning and improvements in neural network technology, convolutional neural network models are being applied more and more widely.
Deep learning realized with convolutional neural networks is a method in the field of machine learning that accomplishes tasks such as image classification, object detection, image segmentation and machine translation by building neural networks with deeper hierarchies. The application of a convolutional neural network generally comprises two stages, training and testing: the training stage trains the convolutional neural network with large-scale data, and the testing stage applies the trained convolutional neural network to specific tasks.
Transfer learning is a machine learning method in which a model developed for task A (generally called a pre-trained base network) is reused as the starting point for developing a model for task B. Its main idea is to migrate annotated data or knowledge structures from a related domain (called the source domain) to complete or improve the learning effect on the target domain or task. There are currently several mainstream approaches to transfer learning: 1) Transfer Learning: freeze all convolutional layers of the pre-trained model and train only a custom fully-connected layer; 2) Extract Feature Vector: first compute the feature vectors produced by the convolutional layers of the pre-trained model for all training and test data, then discard the pre-trained model and train only a custom, simplified fully-connected network; 3) Fine-tune: freeze part of the convolutional layers of the pre-trained model (usually most of the convolutional layers near the input) and train the remaining convolutional layers (usually some convolutional layers near the output) and the fully-connected layers. The source domain refers to a field with a large amount of labeled data, and the target domain refers to a field with a small amount of labeled data.
Weight tuning is a transfer learning strategy that studies how to transfer the knowledge learned in the source domain to the target domain by changing the learning rate parameter values of each designated layer in the base network. At present, the learning rate parameters (also called learning weights) are changed mainly by setting them manually, typically following either a layer-wise increasing or a layer-wise decreasing learning weight strategy. This approach relies on the guidance of extensive prior knowledge and is inefficient, and the effectiveness of the learning rate parameter combination it finds depends on the accuracy of that prior knowledge, so the performance of the target convolutional neural network cannot be guaranteed.
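For orientation only, here is a minimal sketch, not taken from the patent, of how per-layer learning rate parameters of this kind are commonly realized as per-parameter-group learning rates, assuming PyTorch; the network, the choice of designated layers and the multiplier values are all hypothetical.

    import torch
    import torch.nn as nn

    base_net = nn.Sequential(                 # stand-in for a pre-trained base network
        nn.Conv2d(3, 16, 3), nn.ReLU(),
        nn.Conv2d(16, 32, 3), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 10),
    )
    base_lr = 0.01
    # One learning rate parameter per designated layer: 0 freezes the layer,
    # values near 1 train it at (almost) the full base rate.
    lr_params = [0.0, 0.1, 1.0]               # hypothetical learning rate parameter group
    designated = [base_net[0], base_net[2], base_net[6]]
    optimizer = torch.optim.SGD(
        [{"params": layer.parameters(), "lr": base_lr * w}
         for layer, w in zip(designated, lr_params)],
        lr=base_lr, momentum=0.9,
    )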
Disclosure of Invention
The application provides a weight tuning method and apparatus for a convolutional neural network, a storage medium and an electronic device, which can find an optimal or locally optimal learning rate parameter group in transfer learning without relying on any prior knowledge, thereby improving the performance of the target convolutional neural network in transfer learning.
To achieve this purpose, the application adopts the following technical solution:
a weight tuning method of a convolutional neural network comprises the following steps:
forming a learning rate parameter group by using learning rate parameters corresponding to each designated layer of a first convolutional neural network, using the learning rate parameter group as an individual of a genetic algorithm, and using the learning rate parameters as genes of the genetic algorithm;
processing all the learning rate parameter sets of each generation in sequence by using the genetic algorithm until the optimization termination condition of the genetic algorithm is met, so as to determine the optimal learning rate parameter set for generating a second convolutional neural network by training the first convolutional neural network;
wherein, the processing all the learning rate parameter sets of each generation comprises:
respectively training on the basis of the first convolutional neural network by utilizing all the learning rate parameter groups of the current generation to generate network models of the second convolutional neural network corresponding to the corresponding learning rate parameter groups, and performing performance evaluation corresponding to each network model; taking the performance evaluation result of the network model as an evaluation standard of a corresponding learning rate parameter group, determining a current optimal learning rate parameter group, judging whether the optimization termination condition is met, if so, taking the current optimal learning rate parameter group as the optimal learning rate parameter group, and taking a network model obtained by training by using the learning rate parameter group as a trained second convolutional neural network; and if the optimization termination condition is not met, determining all learning rate parameter sets of the next generation through the genetic algorithm.
Preferably, the generating a network model of the second convolutional neural network corresponding to the corresponding learning rate parameter set, and performing performance evaluation corresponding to each network model includes:
for all the learning rate parameter groups of the current generation, respectively training and generating the second convolutional neural network by using each learning rate parameter group in a transfer learning manner on the basis of the first convolutional neural network, and respectively obtaining a network model corresponding to each learning rate parameter group;
and respectively performing performance evaluation on the network model corresponding to each learning rate parameter group to obtain a performance evaluation result of the network model corresponding to each learning rate parameter group, wherein the performance evaluation result is used as the performance evaluation result of each learning rate parameter group.
Preferably, the determining whether the optimization termination condition is satisfied includes:
if the performance evaluation result corresponding to the current optimal learning rate parameter group is better than a preset target performance, or if the iteration count of the genetic algorithm reaches a preset first maximum iteration count, determining that the optimization termination condition is met; otherwise, determining that the optimization termination condition is not met; and the performance evaluation result corresponding to each learning rate parameter group is the performance evaluation result of the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group.
Preferably, the method for determining the current optimal learning rate parameter set and the corresponding performance evaluation result thereof includes:
if the current generation is a first generation, selecting a learning rate parameter group with the highest performance evaluation result from all learning rate parameter groups of the first generation as the current optimal learning rate parameter group;
if the current generation is not the first generation, in all learning rate parameter sets of the current generation, if a performance evaluation result corresponding to any one learning rate parameter set is better than a performance evaluation result corresponding to a current optimal learning rate parameter set before the current generation, taking any one learning rate parameter set as a latest current optimal learning rate parameter set, and if not, keeping the current optimal learning rate parameter set unchanged.
Preferably, the determining all learning rate parameter sets of the next generation by the genetic algorithm comprises:
and judging whether variation occurs to each learning rate parameter in each learning rate parameter group of the current generation, if so, updating the corresponding learning rate parameter of the next generation, and otherwise, determining that the corresponding learning rate parameter of the next generation is kept unchanged.
Preferably, the updating the corresponding learning rate parameter of the next generation comprises:
for the kth learning rate parameter in each learning rate parameter group, updating the corresponding learning rate parameter of the next generation to max(min(P[index[0]]_k + 0.5 * (P[index[1]]_k - P[index[2]]_k), up_bound), low_bound); where index = random.sample(range(Np), 3) draws three learning rate parameter groups at random from the Np learning rate parameter groups of the current generation, P[index[i]]_k denotes the kth learning rate parameter value of the ith of these three groups, 3 ≤ Np, low_bound is the preset minimum value of the learning rate parameter, and up_bound is the maximum value of the learning rate parameter.
Preferably, after determining that the optimization termination condition is not satisfied, the method further comprises:
and judging whether the number of the learning rate parameters higher than a first set value in the current optimal learning rate parameter group is larger than a set threshold value, if so, increasing the maximum value of the learning rate parameters.
Preferably, the first set value is the midpoint of the learning rate parameter value interval; and/or,
the set threshold value is total _ modules high _ value _ ratio; the total _ modules is the number of convolution groups of the first convolution neural network, and the high _ value _ ratio is a proportion that learning rate parameters included in a learning rate parameter combination corresponding to an optimal performance evaluation result in all the learning rate parameter combinations of the current generation are larger than a second set value.
Preferably, the training on the basis of the first convolutional neural network to generate the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter set includes:
and respectively training on the basis of the first convolutional neural network, finishing the training when the training iteration number reaches a set second maximum iteration number, and taking the trained network model as the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group.
A weight tuning apparatus of a convolutional neural network, comprising: a setting unit and a genetic algorithm processing unit;
the setting unit is used for forming a learning rate parameter group by using learning rate parameters corresponding to each designated layer of the first convolutional neural network, using the learning rate parameter group as an individual of a genetic algorithm, and using the learning rate parameters as genes of the genetic algorithm;
the genetic algorithm processing unit is used for processing all the learning rate parameter sets of each generation in sequence by using the genetic algorithm until the optimization termination condition of the genetic algorithm is met so as to determine the optimal learning rate parameter set for generating a second convolutional neural network by training the first convolutional neural network;
wherein, in the genetic algorithm processing unit, the processing of all sets of learning rate parameters for each generation includes:
respectively training on the basis of the first convolutional neural network by utilizing all the learning rate parameter groups of the current generation to generate network models of the second convolutional neural network corresponding to the corresponding learning rate parameter groups, and performing performance evaluation corresponding to each network model; taking the performance evaluation result of the network model as an evaluation standard of a corresponding learning rate parameter group, determining a current optimal learning rate parameter group, judging whether the optimization termination condition is met, if so, taking the current optimal learning rate parameter group as the optimal learning rate parameter group, and taking a network model obtained by training by using the learning rate parameter group as a trained second convolutional neural network; and if the optimization termination condition is not met, determining all learning rate parameter sets of the next generation through the genetic algorithm.
Preferably, in the genetic algorithm processing unit, the generating a network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group, and performing performance evaluation corresponding to each network model includes:
for all the learning rate parameter groups of the current generation, respectively utilizing each learning rate parameter group, on the basis of the first convolutional neural network, to train and generate the second convolutional neural network in a transfer learning manner, and respectively obtaining a network model corresponding to each learning rate parameter group;
and respectively performing performance evaluation on the network model corresponding to each learning rate parameter group to obtain a performance evaluation result of the network model corresponding to each learning rate parameter group as the performance evaluation result of each learning rate parameter group.
Preferably, in the genetic algorithm processing unit, the determining whether the optimization termination condition is satisfied includes:
if the performance evaluation result corresponding to the current optimal learning rate parameter group is better than a preset target performance, or if the iteration count of the genetic algorithm reaches a preset first maximum iteration count, determining that the optimization termination condition is met; otherwise, determining that the optimization termination condition is not met.
Preferably, in the genetic algorithm processing unit, the determining a current optimal learning rate parameter set and a corresponding performance evaluation result thereof includes:
if the current generation is a first generation, selecting a learning rate parameter group with the highest performance evaluation result from all learning rate parameter groups of the first generation as the current optimal learning rate parameter group;
if the current generation is not the first generation, in all learning rate parameter sets of the current generation, if a performance evaluation result corresponding to any learning rate parameter set is better than a performance evaluation result corresponding to a current optimal learning rate parameter set before the current generation, taking any learning rate parameter set as a latest current optimal learning rate parameter set, and if not, keeping the current optimal learning rate parameter set unchanged.
Preferably, in the genetic algorithm processing unit, the determining all the sets of learning rate parameters of the next generation by the genetic algorithm includes:
and judging whether variation occurs to each learning rate parameter in each learning rate parameter group of the current generation, if so, updating the corresponding learning rate parameter of the next generation, and otherwise, determining that the corresponding learning rate parameter of the next generation remains unchanged.
Preferably, in the genetic algorithm processing unit, the updating the respective learning rate parameters of the next generation includes:
for the kth learning rate parameter in each learning rate parameter group, updating the corresponding learning rate parameter of the next generation to max(min(P[index[0]]_k + 0.5 * (P[index[1]]_k - P[index[2]]_k), up_bound), low_bound); where index = random.sample(range(Np), 3) draws three learning rate parameter groups at random from the Np learning rate parameter groups of the current generation, P[index[i]]_k denotes the kth learning rate parameter value of the ith of these three groups, 3 ≤ Np, low_bound is the preset minimum value of the learning rate parameter, and up_bound is the maximum value of the learning rate parameter.
Preferably, after determining that the optimization termination condition is not satisfied, the genetic algorithm processing unit is further configured to determine whether the number of learning rate parameters higher than a first set value in the current optimal learning rate parameter set is greater than a set threshold, and if so, increase the maximum value of the learning rate parameters.
Preferably, the first set value is the midpoint of the learning rate parameter value interval; and/or,
the set threshold value is total _ modules high _ value _ ratio; the total _ modules is the number of convolution groups of the first convolution neural network, and the high _ value _ ratio is the proportion that the learning rate parameter in the first convolution neural network is larger than a second set value.
Preferably, in the genetic algorithm processing unit, the training performed on the basis of the first convolutional neural network to generate the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group includes:
and respectively training on the basis of the first convolutional neural network, finishing the training when the training iteration number reaches a set second maximum iteration number, and taking the trained network model as the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group.
A computer readable storage medium having computer instructions stored thereon, wherein the instructions when executed by a processor implement any of the above convolutional neural network weight tuning methods.
An electronic device comprising at least a computer-readable storage medium, and a processor;
the processor is configured to read the executable instructions from the computer-readable storage medium and execute the instructions to implement any one of the above convolutional neural network weight tuning methods.
According to the technical scheme, the learning rate parameters corresponding to each designated layer of the first convolutional neural network of the source domain form a learning rate parameter group, the learning rate parameter group is used as an individual of the genetic algorithm, and the learning rate parameters are used as genes of the genetic algorithm; and optimizing a learning rate parameter group consisting of the learning rate parameters by using a genetic algorithm, and performing optimization processing by using a performance evaluation result of a second convolutional neural network model obtained by training by using the learning rate parameter group as an evaluation standard of the corresponding learning rate parameter group in the optimization process. The optimization method of the learning rate parameters can continuously update and optimize the learning rate parameters in a continuous search space through a genetic algorithm without depending on any prior knowledge, so that the performance of the target convolutional neural network is improved.
Further, when the next generation learning rate combination is determined each time, if the learning rate parameters reaching a certain proportion in the current optimal learning rate parameter combination are closer to the interval upper limit of the current learning rate parameter combination, the maximum value of the learning rate parameters can be increased, so as to optimize the optimal or local optimal learning rate parameter combination in a wider range. Compared with the mode that the learning rate parameter interval is fixed (generally 0-1), the processing mode can find out a better learning rate parameter combination, thereby further improving the performance of the target convolutional neural network.
Drawings
FIG. 1 is a schematic diagram illustrating a basic flow of a weight tuning method for a convolutional neural network according to the present application;
FIG. 2 is a logic flow diagram of a genetic algorithm;
FIG. 3 is a flowchart illustrating a weight tuning method for a convolutional neural network according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a basic structure of a weight tuning apparatus for a convolutional neural network according to the present application;
fig. 5 is a schematic diagram of a basic structure of an electronic device provided in the present application.
Detailed Description
For the purpose of making the objects, technical means and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings.
Currently, the common strategy for weight tuning (i.e., optimizing the learning rate parameter group) is to freeze the lower layers of the network and train the upper layers. However, some literature shows experimentally that good results can also be obtained with the opposite strategy; that is, the optimal learning rate parameter distribution may even fluctuate across layers. In short, in transfer learning there is no consensus strategy for how the learning rate parameter of each layer should be set. At present, learning rate parameters are changed mainly by manual setting (layer-wise increasing or decreasing learning weight strategies); manual adjustment usually relies on prior knowledge drawn from published experimental results and must then be validated through a large number of experiments, which is inefficient, time-consuming and labor-intensive, and makes an optimal solution hard to find.
In view of the above problems of the learning rate parameter setting method, the basic idea of the present application is: and (3) carrying out weight tuning in transfer learning by utilizing a genetic algorithm, thereby obtaining the optimal or local optimal learning rate parameter combination in transfer learning and improving the performance of the target convolutional neural network.
Fig. 1 is a schematic basic flow chart of a weight tuning method of a convolutional neural network in an embodiment of the present application, as shown in fig. 1, the method includes:
step 101, composing the learning rate parameters corresponding to each designated layer of the first convolutional neural network into a learning rate parameter group, using the learning rate parameter group as an individual of the genetic algorithm, and using the learning rate parameters as genes of the genetic algorithm.
The convolutional neural network of the source domain in the migration learning (i.e., the convolutional neural network that has been trained) is referred to as a first convolutional neural network in the present application. On the basis of the first convolutional neural network, a second convolutional neural network suitable for a target domain, namely the target convolutional neural network, is generated through training through setting of learning rate parameters.
Based on the setting, learning rate parameters are correspondingly set corresponding to each designated layer of the first convolutional neural network, and all the learning rate parameters are combined together to form a learning rate parameter group. The designated layer may be set as needed, and is typically a layer that the second convolutional neural network needs to learn from the first convolutional neural network, for example, all convolutional layers. Each learning rate parameter is used as a gene of the genetic algorithm, and the learning rate parameter group is used as an individual in the genetic algorithm, so that the genetic algorithm can be used for searching an optimal learning rate parameter group in the following step 102, and the optimal learning rate parameter group is used for training and generating the second convolutional neural network on the basis of the first convolutional neural network.
And 102, sequentially processing all the learning rate parameter sets of each generation by using a genetic algorithm until the optimization termination condition of the genetic algorithm is met, so as to determine the optimal learning rate parameter set for generating a second convolutional neural network by training a first convolutional neural network.
FIG. 2 is a schematic of a genetic algorithm logic flow. In the genetic algorithm, each generation has a plurality of individuals, each individual comprises a plurality of genes, for each generation of individuals, some genes are mutated and updated to new gene values, some genes are not mutated, the gene values are not changed (namely, the genes are inherited to the next generation), and the inherited or mutated genes form the next generation of individuals.
When the genetic algorithm is used for optimizing the learning rate parameters, all the learning rate parameter sets of each generation are processed in sequence, whether the optimization termination condition of the genetic algorithm is met or not is judged after the processing of each generation, and the processing of the genetic algorithm is ended until the learning rate parameter set of a certain generation meets the optimization termination condition.
Wherein, processing all learning rate parameter sets of each generation may specifically include:
respectively training on the basis of the first convolutional neural network by utilizing all the learning rate parameter groups of the current generation to generate network models of a second convolutional neural network corresponding to the corresponding learning rate parameter groups, and performing performance evaluation corresponding to each network model; taking the performance evaluation result of the network model as an evaluation standard of a corresponding learning rate parameter group, determining a current optimal learning rate parameter group, judging whether the optimization termination condition is met, if the optimization termination condition is met, taking the current optimal learning rate parameter group as the optimal learning rate parameter group, and taking a network model obtained by training by using the learning rate parameter group as a trained second convolutional neural network; and if the optimization termination condition is not met, determining all learning rate parameter sets of the next generation through a genetic algorithm.
In the above processing, the one-generation learning rate parameter group currently being processed is referred to as a current-generation learning rate parameter group. And for all the learning rate parameter groups of the current generation, training the neural network by taking the learning rate parameter groups as units. Specifically, taking a learning rate parameter group a as an example, each learning rate parameter in the learning rate parameter group a is correspondingly set to a corresponding designated layer of a first convolutional neural network, and is used as a learning rate parameter of the layer, then, training of the neural network is performed based on the first convolutional neural network, and a network model (hereinafter, referred to as a network model B) of a second convolutional neural network, which may also be referred to as a second convolutional neural network, is obtained by training, where the network model B corresponds to the learning rate parameter group a; next, the network model B is subjected to performance evaluation, and the performance evaluation result is used as an evaluation criterion of the learning rate parameter group a in the genetic algorithm. The processing for each learning rate parameter group is the same as the processing for the learning rate parameter group a, so that a corresponding network model can be generated by training corresponding to each learning rate parameter group, and the performance evaluation result of the network model is obtained as the evaluation criterion of the corresponding learning rate parameter group. After obtaining corresponding network models and performance evaluation results thereof for all the learning rate parameter sets of the current generation, determining the current optimal learning rate parameter set, judging whether the optimization termination condition of the genetic algorithm is met or not at present, if so, ending the iteration of the genetic algorithm, and taking the current optimal learning rate parameter set as an optimization result, namely the optimal learning rate parameter set; if the parameter set does not meet the optimization termination condition of the genetic algorithm, determining all the learning rate parameter sets of the next generation through the genetic algorithm, and continuously processing the learning rate parameter sets of the next generation until the optimization termination condition of the genetic algorithm is met.
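Read as pseudocode, the per-generation processing just described can be outlined as follows; this is an illustrative sketch under assumptions, not the patent's reference implementation. Here train_and_evaluate stands for setting a group's learning rate parameters on the designated layers of the first convolutional neural network, training for a fixed number of iterations and returning the performance evaluation result of the resulting network model, and mutate stands for the genetic-algorithm step that derives the next generation.

    def optimize_learning_rates(population, train_and_evaluate, mutate,
                                target_score, max_generations):
        best_group, best_score = None, float("-inf")
        for generation in range(max_generations):
            # one network model per learning rate parameter group; its performance
            # evaluation result serves as the group's score
            scores = [train_and_evaluate(group) for group in population]
            for group, score in zip(population, scores):
                if score > best_score:                 # current optimal group
                    best_group, best_score = list(group), score
            if best_score >= target_score:             # optimization termination condition
                return best_group, best_score
            population = mutate(population)            # next generation via the GA
        return best_group, best_score                  # iteration budget exhausted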
This concludes the basic flow shown in fig. 1. As can be seen from the above processing, the learning rate parameter is treated as a gene of the genetic algorithm, the learning rate parameter group is treated as an individual of the genetic algorithm, and the optimization of the learning rate parameter group is realized by the genetic algorithm. Therefore, the learning rate parameters can be continuously updated and optimized in a continuous search space by the genetic algorithm without relying on any prior knowledge, thereby improving the performance of the target convolutional neural network.
The following describes a specific implementation of the weight tuning method of the convolutional neural network in the present application by using a specific embodiment.
Fig. 3 is a flowchart illustrating a convolutional neural network weight tuning method according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes:
step 301, collecting and dividing a training set and a verification set of a target domain.
Collecting and dividing a target domain training set and a validation set means collecting the training data and validation data required by the deep learning network in the target task scenario addressed by the transfer learning, and dividing them in a certain proportion; in this embodiment, the collected data are divided into a training set and a validation set in a 1:1 ratio.
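A minimal sketch of such a 1:1 split; the helper and the placeholder sample data are hypothetical, not from the patent.

    import random

    def split_half(samples, seed=0):
        shuffled = list(samples)
        random.Random(seed).shuffle(shuffled)
        mid = len(shuffled) // 2
        return shuffled[:mid], shuffled[mid:]          # training set, validation set

    train_set, val_set = split_half([("sample_%d" % i, i % 2) for i in range(100)])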
Step 302, initializing the hyper-parameters of the genetic algorithm.
Genetic algorithms require setting of necessary hyper-parameters. The following hyper-parameters are set in this embodiment:
1. The number of genes per individual, popu_element;
As mentioned above, in the present application the learning rate parameter is a gene of the genetic algorithm and the learning rate parameter group is an individual of the genetic algorithm, so the number of genes per individual, popu_element, is also the number of learning rate parameters included in a learning rate parameter group.
2. The number of convolution groups of the first convolutional neural network, total_modules;
This hyper-parameter is optional: when the maximum value of the learning rate parameter is allowed to be updated, it is used to judge whether that maximum needs to be updated. It is introduced for the scenario of the present application and is not a hyper-parameter of a conventional genetic algorithm. In this embodiment, the first convolutional neural network implements cross-layer connection of convolutional layers through residual modules, so the number of convolution groups total_modules is the number of residual modules.
3. The probability CR of mutation of the gene;
in determining the next generation set of learning rate parameters, some genes (i.e., the learning rate parameters) may be mutated according to genetic algorithms, and the probability CR is used to control the probability of the mutation of the genes. Generally, a fixed value can be preset according to actual needs.
4. The number of individuals per generation Np;
in the present application, the number Np of individuals is the same for different generations, and as shown in fig. 2, it can be set as needed. Generally, the larger the scale of the first convolutional neural network, the larger the number Np of individuals per generation, so that the optimal learning rate parameter set can be found more accurately. In the present embodiment, np is assumed to be 6.
5. Maximum and minimum values of the learning rate parameter;
the maximum value (which may also be referred to as the maximum value) and the minimum value (which may also be referred to as the minimum value) of the learning rate parameter, which are desirable for the learning rate parameter, also define the value interval of the learning rate parameter. In this embodiment, the minimum value low _ bound of the learning rate parameter is kept unchanged, and may generally be 0; in order to find a better learning rate parameter, the maximum value of the learning rate parameter is set to be updatable in the present embodiment, and the processing of updating will be described in detail in the subsequent step 308. Naturally, in consideration of the limitations of processing capacity and resources or the requirements of network performance, the maximum value of the learning rate parameter may be set to a fixed value to simplify the processing, and of course, the performance of the target convolutional neural network may be affected to some extent.
6. Maximum number of iterations N_max of the genetic algorithm;
In the genetic algorithm, the processing of each generation of individuals amounts to one iteration, and the maximum iteration number N_max limits the total number of generations of the genetic algorithm. (A minimal initialization sketch of the hyper-parameters above is given below.)
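Gathering the hyper-parameters of step 302 in one place, assuming Python; the names follow the patent's notation and every concrete value is illustrative only.

    from dataclasses import dataclass

    @dataclass
    class GAConfig:
        popu_element: int = 20    # genes per individual = learning rate params per group
        total_modules: int = 20   # number of convolution (residual) groups
        CR: float = 0.5           # gene mutation probability
        Np: int = 6               # individuals per generation
        low_bound: float = 0.0    # minimum learning rate parameter value
        up_bound: float = 1.0     # maximum learning rate parameter value (updatable)
        N_max: int = 50           # maximum number of GA iterations (generations)

    config = GAConfig()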
Step 303, setting the number of iterations, ITERATION_NUM, of each training run of the convolutional neural network.
To ensure fairness when training the convolutional neural network with different learning rate parameter groups, each training run in this weight tuning method is ended after ITERATION_NUM iterations. That is, the loss function is not used as the training termination condition; instead, the iteration count ITERATION_NUM is used uniformly, so that every learning rate parameter group yields its network model after the same ITERATION_NUM iterations before performance evaluation, ensuring fair comparison of the performance evaluation results. ITERATION_NUM is a preset constant.
The processing of the foregoing steps 301 to 303 belongs to the initialization processing of the weight tuning method in this embodiment, wherein the order of the three steps may be arbitrarily adjusted or may be performed simultaneously, which is not limited in this application, and in this embodiment, the description is given by taking the order of the three steps as an example. Next, the processing of steps 304 to 308 belongs to the operation part of the genetic algorithm, and is performed cyclically for each generation of learning rate parameter set, and the generation of learning rate parameter set being processed is hereinafter referred to as the current generation of learning rate parameter set, and the processing performed on the current generation of learning rate parameter set is taken as an example and described in detail.
And 304, for all the learning rate parameter groups of the current generation, respectively training and generating a second convolutional neural network by using each learning rate parameter group in a transfer learning manner on the basis of the first convolutional neural network, and respectively obtaining a network model corresponding to each learning rate parameter group.
This step is the training process of the convolutional neural network. And training the convolutional neural network once corresponding to each learning rate parameter group of the current generation respectively to generate a network model of a second convolutional neural network corresponding to each learning rate parameter group.
If the current generation is the first generation of the genetic algorithm, the learning rate parameter values of all the learning rate parameter sets of the first generation, that is, the initial values of the learning rate parameters, need to be calculated first.
In the present embodiment, within any one first-generation learning rate parameter group, all learning rate parameter values are the same. For the ith learning rate parameter group of the first generation, the value of its learning rate parameters is calculated according to equation (1). (Equation (1) is given only as an image in the original publication.)
if the current generation is not the first generation of the genetic algorithm, the value of each learning rate parameter in the learning rate parameter set of the current generation is determined after the previous generation processing is finished, which will be described in step 308.
After the values of all the learning rate parameters in all the learning rate parameter groups of the current generation are determined, the convolutional neural network can be trained by the learning rate parameter groups one by one. The following description will be given by taking the learning rate parameter set x as an example.
Each learning rate parameter in the learning rate parameter group x is correspondingly set on a designated layer of the first convolutional neural network, and the convolutional neural network is trained on the basis of the first convolutional neural network using the training set divided in step 301, generating a network model y of the second convolutional neural network. During training, when the number of training iterations reaches the ITERATION_NUM set in step 303, training ends and the network model y is obtained, which ensures fairness across the different learning rate parameter groups. The network model y corresponds to the learning rate parameter group x. After this step, a corresponding network model has been obtained by training for each learning rate parameter group of the current generation.
Note that each time the flow returns to this step after step 308 ends, training of the convolutional neural network is restarted on the basis of the first convolutional neural network.
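As an illustration of steps 303 and 304 (every helper name here is an assumption), each candidate group trains a fresh copy of the first convolutional neural network for exactly ITERATION_NUM iterations, assuming a PyTorch-style model, optimizer and loss:

    import copy
    import itertools

    ITERATION_NUM = 1000                      # preset constant from step 303

    def train_with_group(first_cnn, lr_group, batches, make_optimizer, loss_fn):
        model = copy.deepcopy(first_cnn)      # restart from the pre-trained network
        optimizer = make_optimizer(model, lr_group)  # per-layer rates from the group
        for x, y in itertools.islice(batches, ITERATION_NUM):
            optimizer.zero_grad()             # the iteration count, not the loss value,
            loss_fn(model(x), y).backward()   # is the training end condition
            optimizer.step()
        return model                          # network model for this lr_group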
Step 305, performing performance evaluation on the network model corresponding to each learning rate parameter set of the current generation respectively, and obtaining a performance evaluation result of the network model corresponding to each learning rate parameter set as a performance evaluation result of the corresponding learning rate parameter set.
For each learning rate parameter group of the current generation, a corresponding network model is obtained through the training of step 304. In this embodiment, Np of each generation is 6, so 6 network models are obtained. For each of these network models, performance is evaluated with the validation set divided in step 301 to obtain a corresponding performance evaluation result. The evaluation index for performance evaluation can be selected according to task requirements, for example: recall, precision, F1, etc. The F1 index is calculated according to formula (2), where precision refers to the precision rate and recall refers to the recall rate, both of which can be determined in the existing manner.
F1 = 2 * precision * recall / (precision + recall)    (2)
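Formula (2) as a plain function:

    def f1_score(precision: float, recall: float) -> float:
        if precision + recall == 0.0:
            return 0.0
        return 2.0 * precision * recall / (precision + recall)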
Step 306, determining the current best learning rate parameter group according to the performance evaluation results of all the learning rate parameter groups of the current generation.
This step is to determine the current best individual based on genetic algorithms. Specifically, if the current generation is the first generation, selecting a learning rate parameter group with the highest performance evaluation result from all learning rate parameter groups of the first generation as the current optimal learning rate parameter group;
If the current generation is not the first generation, then among all the learning rate parameter groups of the current generation, if the performance evaluation result corresponding to some learning rate parameter group a is better than that of the current optimal learning rate parameter group, the learning rate parameter group a becomes the latest current optimal learning rate parameter group; otherwise the current optimal learning rate parameter group remains unchanged. Of course, in practical applications, different implementations may be adopted to determine the current optimal learning rate parameter group. For example, the learning rate parameter group with the best performance evaluation result may be selected from all groups of the current generation, its performance evaluation result compared with that of the latest current optimal learning rate parameter group, and the group with the better result taken as the latest current optimal learning rate parameter group; or, the performance evaluation result of each learning rate parameter group of the current generation may be compared one by one with that of the current optimal learning rate parameter group, each time keeping the group with the better result, until all groups of the current generation have been compared and the current optimal learning rate parameter group is determined.
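Either variant reduces to a single pass that keeps the best-scoring group seen so far; a hedged sketch, assuming higher performance scores are better:

    def update_best(best_group, best_score, groups, scores):
        for group, score in zip(groups, scores):
            if best_group is None or score > best_score:
                best_group, best_score = list(group), score   # new current optimum
        return best_group, best_score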
And 307, judging whether the optimization termination condition of the genetic algorithm is met, if not, executing step 308, otherwise, executing step 309.
After the processing of steps 304-306 is performed for all the learning rate parameter sets of each generation, it is determined by this step whether to end the iteration of the genetic algorithm.
The specific manner of judging whether the optimization termination condition is satisfied may be:
if the performance evaluation result F corresponding to the current optimal learning rate parameter group is better than the preset target performance F_target, or if the iteration count of the genetic algorithm reaches the preset first maximum iteration count N_max, it is determined that the optimization termination condition of the genetic algorithm is satisfied; otherwise, it is determined that the optimization termination condition is not satisfied. The target performance F_target is a preset performance index value that the convolutional neural network is expected to reach.
When it is determined that the optimization termination condition is not satisfied, the specific value of the next generation learning rate parameter set is determined by the processing of step 308 using a genetic algorithm.
When it is determined that the optimization termination condition is satisfied, that is, the iteration of the genetic algorithm is ended, the optimal learning rate parameter is determined through step 309.
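In code form, the test of step 307 is a single expression (illustrative; assumes a numeric performance score where larger is better):

    def should_terminate(best_score, generation, F_target, N_max):
        # target performance reached, or the GA has used up its iteration budget
        return best_score > F_target or generation + 1 >= N_max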
And 308, determining the next generation learning rate parameter group and the maximum value of the learning rate parameter by using a genetic algorithm.
As mentioned above, in the genetic algorithm, when determining the next generation of individuals and genes, the genes may be mutated, that is, the learning rate parameters in the present application may need to be updated. This step is used to determine the specific value of each learning rate parameter in all the learning rate parameter sets of the next generation. The specific value determination method for each learning rate parameter may be the same, and here, the kth learning rate parameter in a certain learning rate parameter group b is taken as an example for description.
And the value of the kth learning rate parameter in the next generation learning rate parameter group b is determined according to whether the learning rate parameter is mutated or not. Specifically, if the kth learning rate parameter varies, the value of the learning rate parameter is updated, and a specific updated value determination mode can be set as required; if the k-th learning rate parameter has no variation, the value of the learning rate parameter remains unchanged, i.e. the value of the learning rate parameter is the same as the value of the corresponding learning rate parameter of the current generation.
The determination method of whether the learning rate parameter has a variation may be: and judging whether the random number generated corresponding to the learning rate parameter is smaller than the mutation probability CR set in the step 302, if so, determining that mutation occurs, and otherwise, determining that no mutation occurs.
In this embodiment, a way of updating the learning rate parameter value when a variation occurs is provided, that is, the value Trail_k of the kth learning rate parameter is determined according to the following formula (3):

Trail_k = max(min(P[index[0]]_k + 0.5 * (P[index[1]]_k - P[index[2]]_k), up_bound), low_bound), if random() < CR; otherwise Trail_k keeps the value of the corresponding learning rate parameter of the current generation.    (3)

Where index = random.sample(range(Np), COUNT) draws COUNT learning rate parameter groups at random from the Np learning rate parameter groups of the current generation, P[index[i]]_k denotes the kth learning rate parameter value of the ith of these groups, COUNT = 3 ≤ Np, low_bound is the preset minimum value of the learning rate parameter, up_bound is the maximum value of the learning rate parameter, and random() represents the random number between 0 and 1 generated for the kth learning rate parameter.
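Putting formula (3) and the mutation test together, a hedged sketch of deriving the next generation; P is the current population, a list of Np learning rate parameter groups, and the differential weight 0.5 and the clamping to [low_bound, up_bound] follow the formula above.

    import random

    def next_generation(P, CR, low_bound, up_bound):
        Np = len(P)                                    # individuals per generation
        new_P = []
        for i in range(Np):
            child = []
            for k in range(len(P[i])):
                if random.random() < CR:               # the gene mutates
                    a, b, c = random.sample(range(Np), 3)
                    trail = P[a][k] + 0.5 * (P[b][k] - P[c][k])
                    child.append(max(min(trail, up_bound), low_bound))
                else:                                  # the gene is inherited unchanged
                    child.append(P[i][k])
            new_P.append(child)
        return new_P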
For each learning rate parameter, the value of the learning rate parameter of the next generation can be determined according to the above mode.
In addition, to further select a more optimal learning rate parameter combination, optionally, the present embodiment further determines whether to update the maximum value of the learning rate parameter in step 308.
Specifically, in the conventional weight tuning method, the interval of the learning rate parameter is fixed. In this embodiment, when determining the next generation learning rate combination each time, if the learning rate parameters that have reached a certain proportion in the current optimal learning rate parameter combination are closer to the upper limit of the interval of the current learning rate parameter combination, the maximum value of the learning rate parameters may be increased to optimize the optimal learning rate parameter combination in a wider range.
More specifically, in this embodiment it is determined whether the number of learning rate parameters in the current optimal learning rate parameter group that are higher than a first set value is greater than a set threshold; if so, the maximum value of the learning rate parameter is increased, otherwise it is kept unchanged. The first set value may be the midpoint of the learning rate parameter value interval, i.e. (up_bound + low_bound)/2.0, and the set threshold may be total_modules * high_value_ratio, where total_modules is the number of convolution groups of the first convolutional neural network (in this embodiment, the number of residual modules in the first convolutional neural network) and high_value_ratio is the proportion of learning rate parameters greater than a second set value in the learning rate parameter combination with the best performance evaluation result among all combinations of the current generation.
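A sketch of that check; the amount by which up_bound is raised is an assumption, since the patent only states that the maximum value is increased.

    def maybe_expand_up_bound(best_group, low_bound, up_bound,
                              total_modules, high_value_ratio, step=0.5):
        first_set_value = (up_bound + low_bound) / 2.0   # midpoint of the interval
        threshold = total_modules * high_value_ratio     # the set threshold
        high_count = sum(1 for w in best_group if w > first_set_value)
        return up_bound + step if high_count > threshold else up_bound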
The values of all the learning rate parameter sets of the next generation and the maximum value of the latest learning rate parameter can be obtained through the processing of the step. Next, the process returns to step 304 to perform the next iteration of the genetic algorithm.
Step 309, using the current optimal learning rate parameter group as the optimal learning rate parameter group, and using the network model obtained by training with the learning rate parameter group as the trained second convolutional neural network.
When the step is executed, the iteration of the genetic algorithm is ended to obtain the optimal learning rate parameter set, and then the network model corresponding to the optimal learning rate parameter set obtained through the training in the step 304 is the trained second convolutional neural network.
This concludes the flow shown in fig. 3. Through the processing of the present application, the optimal or locally optimal learning rate parameter group in transfer learning can be found accurately, improving the performance of the target convolutional neural network of the transfer learning. Further, each time the next-generation learning rate combination is determined, if a certain proportion of the learning rate parameters in the current optimal learning rate parameter combination are close to the upper limit of the current learning rate parameter interval, the maximum value of the learning rate parameter can be increased, so as to search for the optimal or locally optimal learning rate parameter combination over a wider range. Compared with a fixed learning rate parameter interval (generally 0-1), this processing can find a better learning rate parameter combination, thereby further improving the performance of the target convolutional neural network.
The foregoing is a specific implementation of the convolutional neural network weight tuning method in this application. The application also provides a weight tuning device of the convolutional neural network, which can be used for implementing the weight tuning method. Fig. 4 is a schematic diagram of a basic structure of a convolutional neural network weight tuning apparatus according to the present application, as shown in fig. 4, the apparatus includes: a setting unit and a genetic algorithm processing unit.
The setting unit is used for forming a learning rate parameter group by using learning rate parameters corresponding to each designated layer of the first convolutional neural network, using the learning rate parameter group as an individual of a genetic algorithm, and using the learning rate parameters as genes of the genetic algorithm;
the genetic algorithm processing unit is used for processing all the learning rate parameter sets of each generation in sequence by utilizing the genetic algorithm until the optimization termination condition of the genetic algorithm is met so as to determine the optimal learning rate parameter set for generating a second convolutional neural network by training the first convolutional neural network;
in more detail, in the genetic algorithm processing unit, all the sets of learning rate parameters of each generation are processed, which may specifically include:
respectively training on the basis of the first convolutional neural network by utilizing all the learning rate parameter groups of the current generation to generate network models of a second convolutional neural network corresponding to the corresponding learning rate parameter groups, and performing performance evaluation corresponding to each network model; taking the performance evaluation result of the network model as an evaluation basis of a corresponding learning rate parameter group, determining a current optimal learning rate parameter group, judging whether an optimization termination condition is met, if so, taking the current optimal learning rate parameter group as the optimal learning rate parameter group, and taking a network model obtained by training by using the learning rate parameter group as a trained second convolutional neural network; and if the optimization termination condition is not met, determining all learning rate parameter sets of the next generation through a genetic algorithm.
Optionally, in the genetic algorithm processing unit, generating a network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group, and performing performance evaluation corresponding to each network model, including:
for all learning rate parameter sets of the current generation, respectively utilizing each learning rate parameter set to train and generate the second convolutional neural network in a transfer learning manner on the basis of the first convolutional neural network, and respectively obtaining a network model corresponding to each learning rate parameter set;
and respectively carrying out performance evaluation on the network model corresponding to each learning rate parameter group to obtain a performance evaluation result of the network model corresponding to each learning rate parameter group as the performance evaluation result of each learning rate parameter group.
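As one possible realization of this training-and-evaluation step (a sketch only: the application does not mandate a framework, and PyTorch, the base learning rate, the loaders, and the iteration budget are assumptions here), each gene can scale the learning rate of one designated layer by giving every layer its own optimizer parameter group, with fitness taken as validation accuracy:

```python
import copy
import torch
import torch.nn.functional as F

def train_and_evaluate(pretrained, lr_group, train_loader, val_loader,
                       base_lr=0.01, max_iters=500):
    # Fine-tune a copy of the first network (transfer learning); each gene
    # in the learning rate parameter group scales one designated layer's lr.
    model = copy.deepcopy(pretrained)
    groups = [{"params": layer.parameters(), "lr": base_lr * gene}
              for layer, gene in zip(model.children(), lr_group)]
    optimizer = torch.optim.SGD(groups, momentum=0.9)
    it = 0
    while it < max_iters:                      # second maximum iteration count
        for x, y in train_loader:
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()
            optimizer.step()
            it += 1
            if it >= max_iters:
                break
    # Performance evaluation of the resulting second-network model.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total                     # fitness of this parameter group
```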
Optionally, in the genetic algorithm processing unit, the processing of determining whether the optimization termination condition is satisfied may specifically include:
if the performance evaluation result corresponding to the current optimal learning rate parameter set is superior to a preset target performance, or if the iteration times of the genetic algorithm reach a preset first maximum iteration time, determining that the optimization termination condition is met; otherwise, determining that the optimization termination condition is not met; and the performance evaluation result corresponding to each learning rate parameter group is the performance evaluation result of the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group.
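In code the termination test is a one-liner; target_fitness and max_generations below are illustrative names standing in for the preset target performance and the first maximum iteration count:

```python
def termination_met(best_fitness, generation, target_fitness, max_generations):
    # Stop when the best second network beats the target performance,
    # or when the GA has iterated for the first maximum iteration count.
    return best_fitness > target_fitness or generation >= max_generations
```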
Optionally, in the genetic algorithm processing unit, the manner of determining the current optimal learning rate parameter set and the performance evaluation result corresponding to the current optimal learning rate parameter set may specifically include:
if the current generation is the first generation, selecting a learning rate parameter group with the highest performance evaluation result from all the learning rate parameter groups of the first generation as a current optimal learning rate parameter group;
if the current generation is not the first generation, in all the learning rate parameter sets of the current generation, if the performance evaluation result corresponding to any one learning rate parameter set is better than the performance evaluation result corresponding to the current optimal learning rate parameter set, taking any one learning rate parameter set as the latest current optimal learning rate parameter set, otherwise, keeping the current optimal learning rate parameter set unchanged.
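The bookkeeping this describes is plain elitism; a compact sketch, assuming higher evaluation results are better:

```python
def update_current_best(population, fitnesses, best, best_fit):
    # First generation: best is None, so the top-scoring group is taken.
    # Later generations: replace only when strictly better, else keep unchanged.
    for group, fit in zip(population, fitnesses):
        if best is None or fit > best_fit:
            best, best_fit = list(group), fit
    return best, best_fit
```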
Optionally, in the genetic algorithm processing unit, the processing of determining all the sets of learning rate parameters of the next generation by using a genetic algorithm may specifically include:
and judging, for each learning rate parameter in each learning rate parameter group of the current generation, whether mutation occurs; if so, updating the corresponding learning rate parameter of the next generation, and otherwise determining that the corresponding learning rate parameter of the next generation remains unchanged.
Optionally, in the genetic algorithm processing unit, the process of updating the corresponding learning rate parameter of the next generation may specifically include:
for the kth learning rate parameter in each learning rate parameter group, the corresponding learning rate parameter of the next generation is updated to max(min(P[index[0]]_k + 0.5*(P[index[1]]_k - P[index[2]]_k), up_bound), low_bound); wherein index = random.sample(range(Np), 3), P[index[i]]_k represents the kth learning rate parameter value of the ith of the three learning rate parameter groups randomly extracted from the Np learning rate parameter groups of the current generation, 3 ≤ Np, low_bound is the preset minimum value of the learning rate parameter, and up_bound is the maximum value of the learning rate parameter.
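This update is the classic differential-evolution mutation: a randomly chosen base vector plus a scaled difference of two other vectors, clipped to [low_bound, up_bound]. A minimal sketch; the per-gene probability mutation_prob is an assumption standing in for the "whether mutation occurs" test above:

```python
import random

def mutate_population(population, low_bound=0.0, up_bound=1.0, mutation_prob=0.5):
    Np = len(population)                            # requires 3 <= Np
    next_gen = []
    for group in population:
        child = list(group)
        for k in range(len(group)):
            if random.random() < mutation_prob:     # does this gene mutate?
                i0, i1, i2 = random.sample(range(Np), 3)
                trial = population[i0][k] + 0.5 * (population[i1][k] - population[i2][k])
                child[k] = max(min(trial, up_bound), low_bound)
        next_gen.append(child)
    return next_gen
```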
Optionally, after determining that the optimization termination condition is not satisfied, the genetic algorithm processing unit is further configured to determine whether the number of the learning rate parameters higher than the first set value in the current optimal learning rate parameter set is greater than a set threshold, and if so, increase the maximum value of the learning rate parameters.
Optionally, the first set value may be the midpoint value of the learning rate parameter value interval; and/or,
the set threshold may be total_modules · high_value_ratio, where total_modules may be the number of convolution groups of the first convolutional neural network, and high_value_ratio may be the proportion of learning rate parameters, in the learning rate parameter combination corresponding to the best performance evaluation result among all learning rate parameter combinations of the current generation, that are greater than a second set value.
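Putting the two optional defaults together (midpoint as the first set value; the expansion factor below is an illustrative assumption, since the application only states that the maximum value is increased):

```python
def maybe_expand_upper_bound(best_group, low_bound, up_bound,
                             total_modules, high_value_ratio, factor=1.5):
    # Count genes of the current optimal group above the interval midpoint.
    midpoint = (low_bound + up_bound) / 2.0         # first set value
    n_high = sum(1 for gene in best_group if gene > midpoint)
    if n_high > total_modules * high_value_ratio:   # set threshold exceeded
        up_bound = up_bound * factor                # widen the search interval upward
    return up_bound
```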
Optionally, in the genetic algorithm processing unit, the training is performed on the basis of the first convolutional neural network, and the processing of generating the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group specifically includes:
and respectively training on the basis of the first convolutional neural network, finishing the training when the training iteration number reaches a set second maximum iteration number, and taking the network model obtained by training as the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group.
The present application also provides a computer-readable storage medium storing instructions that, when executed by a processor, perform the steps of the convolutional neural network weight tuning method described above. In practical applications, the computer-readable medium may be included in each of the apparatuses/devices/systems of the above embodiments, or may exist separately without being assembled into those apparatuses/devices/systems.
According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the present disclosure. In the embodiments disclosed herein, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Fig. 5 is a schematic structural diagram of an electronic device according to still another embodiment of the present application. Specifically, as shown in fig. 5:
the electronic device may include a processor 501 of one or more processing cores, memory 502 of one or more computer-readable storage media, and a computer program stored on the memory and executable on the processor. When the program of the memory 502 is executed, a weight tuning method of a convolutional neural network may be implemented.
Specifically, in practical applications, the electronic device may further include a power supply 503, an input/output unit 504, and other components. Those skilled in the art will appreciate that the configuration of the electronic device shown in fig. 5 is not intended to be limiting of the electronic device and may include more or fewer components than shown, or some components in combination, or a different arrangement of components. Wherein:
the processor 501 is the control center of the electronic device: it connects the various parts of the whole electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the electronic device as a whole.
The memory 502 may be used to store software programs and modules, i.e., the computer-readable storage media described above. The processor 501 executes various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the device, and the like. Further, the memory 502 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The electronic device further comprises a power source 503 for supplying power to each component, and the power source can be logically connected with the processor 501 through a power management system, so that functions of charging, discharging, power consumption management and the like can be managed through the power management system. The power supply 503 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The electronic device may also include an input/output unit 504, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input/output unit 504 may also be used to display information input by or provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A weight tuning method of a convolutional neural network is characterized by comprising the following steps:
forming a learning rate parameter group by using learning rate parameters corresponding to each designated layer of a first convolutional neural network, using the learning rate parameter group as an individual of a genetic algorithm, and using the learning rate parameters as genes of the genetic algorithm;
processing all the learning rate parameter sets of each generation in sequence by utilizing the genetic algorithm until the optimization termination condition of the genetic algorithm is met, so as to determine the optimal learning rate parameter set for generating a second convolutional neural network by training the first convolutional neural network;
wherein, the processing all the learning rate parameter sets of each generation comprises:
respectively training on the basis of the first convolutional neural network by utilizing all the learning rate parameter groups of the current generation to generate network models of the second convolutional neural network corresponding to the corresponding learning rate parameter groups, and performing performance evaluation corresponding to each network model; taking the performance evaluation result of the network model as an evaluation basis of a corresponding learning rate parameter group, determining a current optimal learning rate parameter group, judging whether the optimization termination condition is met, if so, taking the current optimal learning rate parameter group as the optimal learning rate parameter group, and taking a network model obtained by training by using the learning rate parameter group as a trained second convolutional neural network; and if the optimization termination condition is not met, determining all learning rate parameter sets of the next generation through the genetic algorithm.
2. The method of claim 1, wherein the generating network models of the second convolutional neural network corresponding to the respective sets of learning rate parameters and performing performance evaluation corresponding to each network model comprises:
for all the learning rate parameter groups of the current generation, respectively training and generating the second convolutional neural network by using each learning rate parameter group in a transfer learning manner on the basis of the first convolutional neural network, and respectively obtaining a network model corresponding to each learning rate parameter group;
and respectively performing performance evaluation on the network model corresponding to each learning rate parameter group to obtain a performance evaluation result of the network model corresponding to each learning rate parameter group as the performance evaluation result of each learning rate parameter group.
3. The method according to claim 1 or 2, wherein the determining whether the optimization termination condition is satisfied comprises:
if the performance evaluation result corresponding to the current optimal learning rate parameter set is superior to a preset target performance, or if the iteration frequency of the genetic algorithm reaches a preset first maximum iteration frequency, determining that the optimization termination condition is met; otherwise, determining that the optimization termination condition is not met; and the performance evaluation result corresponding to each learning rate parameter group is the performance evaluation result of the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group.
4. The method of claim 3, wherein determining the current optimal learning rate parameter set and the corresponding performance evaluation result comprises:
if the current generation is a first generation, selecting a learning rate parameter group with the highest performance evaluation result from all learning rate parameter groups of the first generation as the current optimal learning rate parameter group;
if the current generation is not the first generation, in all learning rate parameter sets of the current generation, if a performance evaluation result corresponding to any one learning rate parameter set is better than a performance evaluation result corresponding to the current optimal learning rate parameter set, taking the any one learning rate parameter set as the latest current optimal learning rate parameter set, otherwise, keeping the current optimal learning rate parameter set unchanged.
5. The weight tuning method according to claim 1, wherein the determining all the sets of learning rate parameters of the next generation by the genetic algorithm comprises:
and judging, for each learning rate parameter in each learning rate parameter group of the current generation, whether mutation occurs; if so, updating the corresponding learning rate parameter of the next generation, and otherwise determining that the corresponding learning rate parameter of the next generation remains unchanged.
6. The weight tuning method of claim 5, wherein the updating the corresponding learning rate parameters of the next generation comprises:
for the kth learning rate parameter in each learning rate parameter set, updating the corresponding learning rate parameter of the next generation to max(min(P[index[0]]_k + 0.5*(P[index[1]]_k - P[index[2]]_k), up_bound), low_bound); wherein index = random.sample(range(Np), 3), P[index[i]]_k represents the kth learning rate parameter value of the ith of the three learning rate parameter groups randomly extracted from the Np learning rate parameter groups of the current generation, 3 ≤ Np, low_bound is the preset minimum value of the learning rate parameter, and up_bound is the maximum value of the learning rate parameter.
7. The weight tuning method of claim 1, wherein after determining that the optimization termination condition is not satisfied, the method further comprises:
and judging whether the number of the learning rate parameters higher than a first set value in the current optimal learning rate parameter group is larger than a set threshold value, if so, increasing the maximum value of the learning rate parameters.
8. The weight tuning method according to claim 7, wherein the first set value is the midpoint value of the learning rate parameter value interval; and/or,
the set threshold is total_modules · high_value_ratio, where total_modules is the number of convolution groups of the first convolutional neural network, and high_value_ratio is the proportion of learning rate parameters, in the learning rate parameter combination corresponding to the best performance evaluation result among all learning rate parameter combinations of the current generation, that are greater than a second set value.
9. The method of claim 1, wherein the training on the basis of the first convolutional neural network to generate the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter set comprises:
and respectively training on the basis of the first convolutional neural network, finishing the training when the training iteration number reaches a set second maximum iteration number, and taking the trained network model as the network model of the second convolutional neural network corresponding to the corresponding learning rate parameter group.
10. A weight tuning apparatus for a convolutional neural network, comprising: a setting unit and a genetic algorithm processing unit;
the setting unit is used for forming a learning rate parameter group from learning rate parameters corresponding to each designated layer of the first convolutional neural network, using the learning rate parameter group as an individual of a genetic algorithm, and using the learning rate parameters as genes of the genetic algorithm;
the genetic algorithm processing unit is used for processing all the learning rate parameter sets of each generation in sequence by using the genetic algorithm until the optimization termination condition of the genetic algorithm is met so as to determine the optimal learning rate parameter set for generating a second convolutional neural network by training the first convolutional neural network;
wherein, the processing all the learning rate parameter sets of each generation comprises:
respectively training on the basis of the first convolutional neural network by utilizing all the learning rate parameter groups of the current generation to generate network models of the second convolutional neural network corresponding to the corresponding learning rate parameter groups, and performing performance evaluation corresponding to each network model; taking the performance evaluation result of the network model as an evaluation basis of the corresponding learning rate parameter group, determining a current optimal learning rate parameter group, judging whether the optimization termination condition is met, if so, taking the current optimal learning rate parameter group as the optimal learning rate parameter group, and taking a network model obtained by training by using the learning rate parameter group as a trained second convolutional neural network; and if the optimization termination condition is not met, determining all learning rate parameter sets of the next generation through the genetic algorithm.
11. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the method of weight tuning of a convolutional neural network as defined in any one of claims 1 to 9.
12. An electronic device, comprising at least a computer-readable storage medium, and further comprising a processor;
the processor is configured to read the executable instructions from the computer-readable storage medium and execute the instructions to implement the weight tuning method of the convolutional neural network according to any one of claims 1 to 9.
CN202211113628.3A 2022-09-14 2022-09-14 Convolutional neural network weight tuning method, device, storage medium and electronic equipment Pending CN115345303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211113628.3A CN115345303A (en) 2022-09-14 2022-09-14 Convolutional neural network weight tuning method, device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN115345303A (en) 2022-11-15

Family

ID=83956104


Country Status (1)

Country Link
CN (1) CN115345303A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116644662A (en) * 2023-05-19 2023-08-25 之江实验室 Well-arrangement optimization method based on knowledge embedded neural network proxy model
CN116644662B (en) * 2023-05-19 2024-03-29 之江实验室 Well-arrangement optimization method based on knowledge embedded neural network proxy model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination