CN116090536A - Neural network optimization method, device, computer equipment and storage medium - Google Patents

Neural network optimization method, device, computer equipment and storage medium

Info

Publication number
CN116090536A
CN116090536A
Authority
CN
China
Prior art keywords
network
super
sub
parameter
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310096446.8A
Other languages
Chinese (zh)
Inventor
李梦圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202310096446.8A
Publication of CN116090536A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The application discloses a neural network optimization method, a neural network optimization device, computer equipment and a storage medium, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring a first super network; sampling the first super network to obtain a plurality of sub-networks; testing the network performance of each sub-network by using a test sample set corresponding to the target prediction task to obtain a first network performance parameter of each sub-network; and carrying out iterative optimization on the network structure parameters of the first super network based on the first network performance parameters of each sub network until the first optimization condition is met, and obtaining the optimized first super network as a second super network. Therefore, based on the network performance parameters of the plurality of sub-networks, iterative optimization is carried out on the network structure parameters corresponding to the first super-network, so that the effectiveness of search space design and the performance of the searched network structure can be improved, and the prediction performance of a target model based on second super-network training for a target prediction task is improved.

Description

Neural network optimization method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for optimizing a neural network, a computer device, and a storage medium.
Background
Artificial intelligence (artificial intelligence, AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. With the rapid development of artificial intelligence technology, neural networks (e.g., deep neural networks) have achieved great success in the processing and analysis of a variety of media signals, such as images, video, and speech.
However, a neural network with excellent performance often has a sophisticated network structure, and constructing one typically requires a great deal of effort from highly skilled and experienced human experts. In the related art, in order to better construct a neural network, the network structure is usually searched automatically by neural architecture search (Neural Architecture Search, NAS), thereby obtaining a neural network structure with better performance. However, the performance of the neural network structures found in this way still needs to be improved, so how to further optimize the neural network structure is a problem to be solved.
Disclosure of Invention
The application provides an optimization method, an optimization device, computer equipment and a storage medium of a neural network, so as to further optimize the performance of the neural network.
In a first aspect, an embodiment of the present application provides a method for optimizing a neural network, where the method includes: acquiring a first super network; sampling the first super network to obtain a plurality of sub-networks; testing the network performance of each sub-network by using a test sample set corresponding to a target prediction task to obtain a first network performance parameter of each sub-network; and carrying out iterative optimization on the network structure parameters of the first super network based on the first network performance parameters of each sub network until a first optimization condition is met, obtaining the optimized first super network as a second super network, wherein the second super network is used for obtaining a target model corresponding to the target prediction task after carrying out iterative training through a training sample set.
In a second aspect, embodiments of the present application provide an optimization apparatus for a neural network, where the apparatus includes: the device comprises a super network acquisition module, a first sampling module, a performance test module and a super network optimization module. The super network acquisition module is used for acquiring a first super network; the first sampling module is used for sampling the first super network to obtain a plurality of sub-networks; the performance testing module is used for testing the network performance of each sub-network by utilizing a testing sample set corresponding to the target prediction task to obtain a first network performance parameter of each sub-network; the super-network optimization module is used for carrying out iterative optimization on network structure parameters of the first super-network based on the first network performance parameters of each sub-network until a first optimization condition is met, obtaining the optimized first super-network as a second super-network, and the second super-network is used for obtaining a target model corresponding to the target prediction task after carrying out iterative training through a training sample set.
In a third aspect, embodiments of the present application provide a computer device, including: one or more processors; a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the methods described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having program code stored therein, the program code being callable by a processor to perform the method described above.
In the scheme provided by the application, a first super network is acquired; sampling the first super network to obtain a plurality of sub-networks; testing the network performance of each sub-network by using a test sample set corresponding to the target prediction task to obtain a first network performance parameter of each sub-network; and carrying out iterative optimization on network structure parameters of the first super network based on the first network performance parameters of each sub network until the first optimization condition is met, and obtaining the optimized first super network as a second super network, wherein the second super network is used for obtaining a target model corresponding to a target prediction task after carrying out iterative training through a training sample set. In this way, before searching the neural network architecture, based on the network performance of a plurality of sub-networks sampled by aiming at the first super-network, iterative optimization is firstly carried out on the initial search space corresponding to the first super-network to obtain the optimal search space, namely the second super-network with more optimized network structure parameters is obtained, so that the effectiveness of the search space design and the performance of the searched network structure can be improved, and the prediction performance of a target model aiming at a target prediction task based on the second super-network training can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an optimization method of a neural network according to an embodiment of the present application.
Fig. 2 shows a network architecture schematic of a first super network according to an embodiment of the present application.
Fig. 3 shows a flow diagram of the sub-steps of step S120 in fig. 1 in one embodiment.
Fig. 4 shows a schematic flow chart of steps S150 to S170 after step S140 in fig. 1 in one embodiment.
Fig. 5 shows a schematic flow chart of progressive contraction training according to an embodiment of the present application.
Fig. 6 shows a flowchart of a method for optimizing a neural network according to an embodiment of the present application.
Fig. 7 shows a flow diagram of the sub-steps of step S230 in fig. 6 in one embodiment.
Fig. 8 shows a flow diagram of the sub-steps of step S250 in fig. 6 in one embodiment.
Fig. 9 is a block diagram of an optimizing apparatus for a neural network according to an embodiment of the present application.
Fig. 10 is a block diagram of a computer device for performing the optimization method of the neural network according to the embodiment of the present application.
Fig. 11 is a memory unit for storing or carrying program codes for implementing the optimization method of the neural network according to the embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments herein, all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the present application.
It should be noted that, in some of the processes described in the specification, claims and drawings above, a plurality of operations appearing in a specific order are included, and the operations may be performed out of the order in which they appear herein or in parallel. The sequence numbers of operations such as S110, S120, etc. are merely used to distinguish between the different operations, and the sequence numbers themselves do not represent any execution order. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. And the terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or sub-modules is not necessarily limited to those steps or sub-modules that are expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or sub-modules that are not expressly listed.
The inventor provides a neural network optimization method, a neural network optimization device, a neural network optimization computer device and a neural network storage medium, wherein the first super network is sampled to obtain a plurality of sub networks, and the parameter values of network structure parameters of the first super network are updated based on the network performances of the plurality of sub networks, so that the optimization of the search space corresponding to the first super network is realized. The following describes in detail the method for optimizing the neural network provided in the embodiment of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a neural network optimization method according to an embodiment of the present application. The method for optimizing the neural network according to the embodiment of the present application will be described in detail below with reference to fig. 1. The optimization method of the neural network may include the steps of:
step S110: a first super network is acquired.
In this embodiment, a Super Network (Super Network) may be understood as comprising a set of all possible sub-networks during the model search. The super network is generated based on the set search space, and the weight of the super network model is the weight of all the sub network models. Weights for the sub-network model may be obtained from the super-network model. Each layer in the super network model comprises a plurality of operators, namely, a plurality of candidate operators are arranged at each node, the operators between the layers are connected in a full connection mode, and one path in the full connection is a sub network model. For example, an operator is selected in each layer, and the neural network model composed of the selected operators in the plurality of layers is a sub-network model. And updating the weights in the paths, namely updating the weights of the sub-network models, and updating the weights of the super-network models, namely achieving the effect of training the super-network models.
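As an illustration of the path-per-layer idea described above, the following Python sketch (a toy structure; all names are hypothetical and not taken from the embodiment) shows how selecting one candidate operator per layer yields a sub-network whose weights are simply references into the super network's weights:

```python
import random

# A toy super network: each layer holds several candidate operators,
# and every candidate keeps its own (shared) weight object.
super_network = [
    {"conv3x3": {"weight": "w_1_3x3"}, "conv5x5": {"weight": "w_1_5x5"}},
    {"conv3x3": {"weight": "w_2_3x3"}, "conv5x5": {"weight": "w_2_5x5"},
     "skip": {"weight": None}},
]

def sample_path(supernet):
    """Pick one candidate operator per layer; the chosen operators form a sub-network."""
    return [random.choice(list(layer.keys())) for layer in supernet]

def subnetwork_weights(supernet, path):
    """The sub-network owns no weights of its own: it reuses the super network's weights."""
    return [supernet[i][op]["weight"] for i, op in enumerate(path)]

path = sample_path(super_network)                 # e.g. ["conv5x5", "skip"]
weights = subnetwork_weights(super_network, path) # shared with the super network
print(path, weights)
```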
Alternatively, the first super network may be a neural network extended from the network structure of the lightweight deep neural network MobileNetV2. Referring to fig. 2, fig. 2 shows the super network structure of a first super network constructed by extending the network structure of MobileNetV2; the super network structure includes one convolution block, a plurality of convolution units, and a final fully connected layer, where each convolution unit contains 3 convolution blocks (conv blocks), and each convolution block consists of a point-wise conv + BatchNorm + ReLU, a depth-wise conv + BatchNorm + ReLU, and a point-wise conv + BatchNorm + ReLU. Of course, the first super network may also be a neural network extended from a network structure such as EfficientNet-B0, ResNet, Inception, or DenseNet, which is not limited in this embodiment. In this embodiment, the convolution kernel size of the depth-wise conv is enlarged to 7×7 to support the search of the convolution kernel size.
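For illustration only, the following PyTorch-style sketch (the framework choice and the centered-sub-kernel slicing are assumptions, not details of the embodiment) shows one such conv block built as point-wise conv + BatchNorm + ReLU, depth-wise conv + BatchNorm + ReLU, point-wise conv + BatchNorm + ReLU, with the depth-wise kernel stored at the maximum 7×7 size so that smaller kernel sizes can also be evaluated:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expand_ratio=6, max_kernel=7):
        super().__init__()
        mid_ch = in_ch * expand_ratio
        self.pw1 = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 1, bias=False),
                                 nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        # depth-wise conv stored with the maximum (7x7) kernel
        self.dw = nn.Conv2d(mid_ch, mid_ch, max_kernel, padding=max_kernel // 2,
                            groups=mid_ch, bias=False)
        self.dw_bn = nn.Sequential(nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.pw2 = nn.Sequential(nn.Conv2d(mid_ch, out_ch, 1, bias=False),
                                 nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x, kernel_size=7):
        x = self.pw1(x)
        if kernel_size < self.dw.kernel_size[0]:
            # slice a centered sub-kernel out of the 7x7 weight (assumed handling of smaller kernels)
            start = (self.dw.kernel_size[0] - kernel_size) // 2
            w = self.dw.weight[:, :, start:start + kernel_size, start:start + kernel_size]
            x = nn.functional.conv2d(x, w, padding=kernel_size // 2, groups=self.dw.groups)
            x = self.dw_bn(x)
        else:
            x = self.dw_bn(self.dw(x))
        return self.pw2(x)

block = ConvBlock(16, 16)
y = block(torch.randn(1, 16, 32, 32), kernel_size=5)   # run the block with a 5x5 sub-kernel
```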
The search space can be set from network structure parameters of multiple dimensions, where the network structure parameters include, but are not limited to, the convolution kernel size, the expansion rate of the number of input channels in a convolution block, the depth of a convolution unit, and the like; the search space may also be characterized in terms of at least one of the number of layers included in the neural network architecture, the number of unit blocks included in each layer, and the number of neurons included in each unit block. It can be appreciated that the search space defines a range for searching the neural network architecture, a set of candidate neural network architectures can be provided based on this range, and different search spaces can be characterized by the value ranges of the network structure parameters.
Alternatively, in the following, the search space is described as a non-limiting example in terms of three network structure parameters: the depth (depth) of a convolution unit, the expansion rate (expansion ratio) of the number of input channels in a convolution block, and the size (kernel size) of the depth-wise conv convolution kernel in the convolution block. The search space provided for generating the first super network may be understood as an initial search space, for example S = {D, ε, K}, where D = [3, 4, 5] characterizes the depth of the convolution unit, ε = [2, 4, 6] characterizes the expansion rate of the number of input channels in the convolution block, and K = [3, 5, 7] characterizes the size of the depth-wise conv convolution kernel in the convolution block.
Step S120: and sampling the first super network to obtain a plurality of sub networks.
Further, after the first super network is obtained, iterative optimization can be performed on the initial search space corresponding to the first super network to obtain an optimal search space, so that the effectiveness of search space design and the performance of a network structure searched based on the optimal search space are improved. Specifically, for each round of optimization, training is required to be performed on the current first super network, and after convergence, the first super network after the round of training is performed is sampled to obtain a plurality of sub networks; and then, the network performance parameters of the multiple sub-networks sampled after each round of optimization can be gradually optimized, and the performance of the optimized first super-network is gradually improved. When training is performed on the current first super network, the first super network can be trained by adopting a sandwich rule, namely, each round of training samples a largest sub-network, a smallest sub-network and two sub-networks which are randomly sampled from the first super network, trains all the sub-networks to be converged, and shares the weight parameters of the sub-networks into the first super network to obtain the trained first super network. Of course, the training strategy may be, besides the above-mentioned sandwich rule, also be by means of uniform sampling or independent sampling, which is not limited in this embodiment.
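A minimal sketch of the sandwich rule described above (a toy super network with a single searchable "width" parameter stands in for the real one; all names and the framework choice are assumptions):

```python
import random
import torch
import torch.nn as nn

class TinySupernet(nn.Module):
    """Stand-in super network: 'width' is the single searchable structure parameter."""
    def __init__(self, max_width=8):
        super().__init__()
        self.fc = nn.Linear(4, max_width)
        self.head = nn.Linear(max_width, 1)

    def forward(self, x, width):
        h = torch.relu(self.fc(x))
        # zero out the unused channels so a narrower sub-network reuses the shared weights
        h = torch.cat([h[:, :width], torch.zeros(h.size(0), h.size(1) - width)], dim=1)
        return self.head(h)

supernet = TinySupernet()
opt = torch.optim.SGD(supernet.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(10):
    x, y = torch.randn(16, 4), torch.randn(16, 1)
    opt.zero_grad()
    # sandwich rule: the largest, the smallest, and two randomly sampled sub-networks per step
    widths = [8, 1] + [random.randint(2, 7) for _ in range(2)]
    for w in widths:
        loss = loss_fn(supernet(x, w), y)
        loss.backward()        # accumulate gradients over the four sub-networks
    opt.step()                 # a single shared update of the super network's weights
```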
It can be understood that the first super network in step S120 is the first super network after the training of each round is completed; this can be regarded as a pre-training of the first super-network, by means of which a preliminary increase in the network performance of the first super-network is achieved.
In some embodiments, referring to fig. 3, step S120 may include the following steps S121 to S122:
step S121: and acquiring a search space corresponding to the first super network, wherein the search space comprises a first value range of each network structure parameter in a plurality of network structure parameters of the first super network.
Step S122: and based on the search space, sampling a plurality of sub-networks, wherein the parameter value of each network structure parameter in each sub-network is positioned in the first value range of each network structure parameter.
The search space corresponding to the first super network is a preset, stored range for searching the neural network architecture, for example the search space S = {D, ε, K} mentioned above, where D characterizes the depth of the convolution unit, ε characterizes the expansion rate of the number of input channels in the convolution block, and K characterizes the size of the depth-wise conv convolution kernel in the convolution block. That is, based on this search space, the depth of the convolution units of each sampled sub-network does not exceed the maximum value in [3, 4, 5], the expansion rate of the number of input channels in a convolution block does not exceed the maximum value in [2, 4, 6], and the size of the depth-wise conv convolution kernel in a convolution block does not exceed the maximum value in [3, 5, 7]. The plurality of sub-networks may be obtained by random sampling based on the search space corresponding to the first super network, and at least one network structure parameter differs between the sub-networks.
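A minimal sketch (the dictionary representation and the number of convolution units are assumptions) of randomly sampling sub-network configurations within the value ranges above:

```python
import random

search_space = {
    "depth": [3, 4, 5],          # depth of each convolution unit
    "expand_ratio": [2, 4, 6],   # expansion rate of the number of input channels
    "kernel_size": [3, 5, 7],    # size of the depth-wise conv kernel
}

def sample_subnetwork(space, num_units=5):
    """Sample one sub-network configuration; every value stays within its range."""
    config = {"units": []}
    for _ in range(num_units):
        depth = random.choice(space["depth"])
        blocks = [{"expand_ratio": random.choice(space["expand_ratio"]),
                   "kernel_size": random.choice(space["kernel_size"])}
                  for _ in range(depth)]
        config["units"].append({"depth": depth, "blocks": blocks})
    return config

subnetworks = [sample_subnetwork(search_space) for _ in range(4)]
```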
Step S130: and testing the network performance of each sub-network by using a test sample set corresponding to the target prediction task to obtain a first network performance parameter of each sub-network.
In practical applications, a super network has the attribute of consistency, where consistency refers to the degree of agreement between the network performance of a set of candidate networks taken from the super network and the network performance obtained by training those candidate networks separately; these sub-networks can therefore generally be used to characterize the network performance of the first super network. Accordingly, for the plurality of sub-networks obtained by sampling the pre-trained first super network, the network performance of each sub-network can be tested by using the test sample set corresponding to the target prediction task, so as to obtain the first network performance parameter of each sub-network. In this way, the network performance of the first super network can be indirectly estimated from the first network performance parameters of the plurality of sub-networks, so that iterative optimization can be performed on each network structure parameter of the first super network, that is, the search space corresponding to the first super network is optimized and the network performance of the first super network is gradually improved.
The first network performance parameter may be a task prediction error rate, and may also include other indicators such as the task prediction delay, which is not limited in this embodiment. The target prediction task can be determined according to actual application requirements. For example, if the actual application requirement involves face alignment, face recognition, eye tracking or the like, that is, the face pose needs to be estimated and the estimation result is used to assist face-related tasks, the target prediction task may be a face pose prediction task. Correspondingly, the test sample set corresponding to the target prediction task may be a set of sample images containing faces, where each sample image carries pose information of the face, so that the prediction error rate of the face pose prediction task can be determined based on the pose information carried by the sample images. Of course, other performance index parameter values can also be set and used to evaluate other aspects of the network performance of each sub-network, so that the network performance of the first super network is evaluated from multiple dimensions, laying a foundation for the subsequent optimization of the network structure parameters of the first super network and realizing that optimization to the greatest extent.
Step S140: and carrying out iterative optimization on the network structure parameters of the first super network based on the first network performance parameters of each sub network until a first optimization condition is met, obtaining the optimized first super network as a second super network, wherein the second super network is used for obtaining a target model corresponding to the target prediction task after carrying out iterative training through a training sample set.
Further, after the first network performance parameter of each sub-network is obtained, the target network performance parameter of the first super-network of the round can be determined based on the first network performance parameter of each sub-network of the round; the target network performance parameter value of the first super network may be determined by performing weighted average on the first network performance parameter value, and of course, the determination manner may be adjusted according to the actual requirement, which is not limited in this embodiment. For example, taking the first network performance parameter as the task prediction error rate as an example, an average prediction error rate of task prediction errors of a plurality of sub-networks may be obtained as the target prediction error rate of the first super-network. Further, the first super network structure parameter can be optimized based on the target network performance parameter of the first super network determined by the round of operation, and the optimized first super network is obtained. It can be understood that, each round of optimization of the network structure parameters is performed according to steps S120 to S140, that is, the optimization of the network structure parameters of the initial first super-network is performed for multiple rounds until the first optimization condition is satisfied, so that the optimized first super-network is obtained and used as the second super-network. It may be appreciated that the second super-network may be used as a target model corresponding to a target prediction task, for example, the target prediction task is a face pose prediction task, where the second super-network may be used as a face pose prediction model. In addition, at this time, the performance of task prediction of the second super network is greatly improved compared with that of the first super network which is not optimized.
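A minimal sketch of the aggregation step mentioned above, in which the target network performance parameter of the first super network is obtained from the per-sub-network performance parameters (the equal default weights are an assumption):

```python
def target_error_rate(errors, weights=None):
    """Weighted average of the sub-network error rates; defaults to an equal weighting."""
    weights = weights or [1.0 / len(errors)] * len(errors)
    return sum(e * w for e, w in zip(errors, weights))

print(target_error_rate([0.12, 0.15, 0.10, 0.13]))  # equally weighted -> 0.125
```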
In some embodiments, referring to fig. 4, after step S140, the following steps S150 to S170 may be further included:
step S150: and performing iterative training on the second super network by using the training sample set until a first target condition is met, so as to obtain the trained second super network.
It can be understood that, the second super-network is a super-network after searching for the pre-trained first super-network, so as to further improve the network performance of the second super-network, and further, the second super-network can be iteratively trained through a training sample set corresponding to the target prediction task until the first target condition is met, and a trained second super-network is obtained, where the performance of the trained second super-network is also greatly improved compared with that of the second super-network before training.
In some embodiments, the second super network may be trained in a progressive shrinking manner. The progressive shrinking training includes three stages, as shown in fig. 5. In the first stage, the search space corresponding to the second super network keeps only one of the sub-search spaces searchable, with ε and the other sub-search space fixed to the maximum value in their corresponding sets. After the training of this stage converges, the second stage is entered, in which the search space corresponding to the second super network keeps that sub-search space and ε searchable, with the remaining sub-search space fixed to the maximum value in its corresponding set. After this stage converges, the third stage is entered, in which the search space corresponding to the second super network contains all of D, ε and K. The second super network in each stage inherits the weights of the previous stage, and this progressive shrinking training method alleviates the large training oscillation and slow convergence caused by the mutual coupling among sub-networks. In addition, the sandwich rule is adopted during training: in each round of training, the largest sub-network and the smallest sub-network are sampled together with two intermediate-size sub-networks, the gradients are accumulated, and a single update is performed during back propagation, so that the large network can supervise the small networks during retraining and improve their performance, which in turn improves the performance of the trained second super network; for example, it improves the prediction accuracy of the second super network when applied to face pose prediction. Furthermore, because the sub-networks share weights with the super network during training, this one-shot training manner can accelerate network convergence, and the weight-inheriting sub-networks sampled during the search can reach high accuracy without fine-tuning or training from scratch.
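A schematic sketch of the three-stage progressive shrinking schedule; the ordering that releases the kernel size first and the depth last is an assumption made for illustration, and the training routine is reduced to a placeholder:

```python
def train_until_converged(active_space, fixed):
    # placeholder for training the second super network until this stage converges
    print("training with elastic:", sorted(active_space), "| fixed to max:", fixed)

full_space = {"kernel_size": [3, 5, 7], "expand_ratio": [2, 4, 6], "depth": [3, 4, 5]}

# stage 1: one dimension elastic; stage 2: add the expansion rate; stage 3: all three elastic
stages = [["kernel_size"],
          ["kernel_size", "expand_ratio"],
          ["kernel_size", "expand_ratio", "depth"]]

for active in stages:
    active_space = {k: full_space[k] for k in active}
    fixed = {k: max(v) for k, v in full_space.items() if k not in active}
    train_until_converged(active_space, fixed)   # each stage inherits the previous stage's weights
```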
Wherein the first target condition may be: the corresponding loss value is smaller than a preset value when the second super network is trained, the loss value is not changed, or the training times reach the preset times, and the like. It can be understood that after performing iterative training for a plurality of training periods on the second super-network according to the training sample set, where each training period includes a plurality of iterative training, the weight parameter in the second super-network is continuously optimized, so that the loss value is smaller and smaller, and finally becomes a fixed value or smaller than the preset value, where the loss value indicates that the second super-network has converged; of course, it may also be determined that the second super network has converged after the training number reaches the preset number. The preset value and the preset times are preset, and the numerical value of the preset value and the preset times can be adjusted according to different application scenes, which is not limited in this embodiment.
Step S160: and sampling the trained second super network according to the target performance index value to obtain a second sub-network corresponding to the second super network.
Step S170: predicting a second network performance parameter of the second sub-network by using a pre-trained performance prediction model until the second network performance parameter of the second sub-network obtained by sampling meets a second target condition, and taking the second sub-network meeting the second target condition as the target model.
Further, after training of the second super network is completed, the trained second super network may be sampled based on the target performance index value, to obtain a second sub-network corresponding to the second super network. Wherein, the target performance index value may be a performance index value related to hardware, and the performance index value includes, but is not limited to, any one or more of model inference delay (latency), activation amount, throughput, power consumption (power) and video memory occupancy rate; the target performance index value may be preset according to a hardware condition of the model deployment platform, which is not limited in this embodiment.
In some embodiments, a sub-network meeting the target performance index value can be quickly searched out from the second super network as the target model by using a search algorithm based on evolutionary computation in combination with the target performance index value. Specifically, the performance of each second sub-network in the population formed by the sampled second sub-networks can be predicted by using a pre-trained performance prediction model; evolutionary operations (crossover and mutation) are performed under a constraint (such as a limit on the number of floating point operations executed per second), the second sub-networks in the population are evaluated and replaced generation by generation, and after the target number of iterations is reached, the second sub-network with the best performance in the population is output; this best-performing second sub-network can be used as the target model. The performance prediction model may be obtained by training a multi-layer perceptron (Multilayer Perceptron, MLP) on a target training sample set, where the target training sample set may be obtained by sampling a plurality of sub-networks from the trained second super network and performing structural encoding on them, that is, generating the sub-search space corresponding to each sub-network, while testing the network performance parameter value corresponding to each sub-network, thereby generating training sample pairs of {sub-network structural encoding, sub-network performance}. In this way, by pre-training the performance prediction model, test-set inference on a complex convolutional neural network can be omitted, a large amount of time can be saved, the search for the network structure is accelerated, and the speed of finding a second sub-network meeting the target performance index value as the target model is improved. In addition, because the second super network is the super network trained after the search space has been optimized, the search efficiency when searching for the second sub-network is greatly improved, and the performance of the searched second sub-network is also improved.
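A compact sketch of predictor-assisted evolutionary search in the spirit described above; the structural encoding, the cost proxy and the predictor below are simplified stand-ins (assumptions), not the embodiment's actual components:

```python
import random

SPACE = {"depth": [3, 4, 5], "expand_ratio": [2, 4, 6], "kernel_size": [3, 5, 7]}
KEYS = list(SPACE)

def random_arch():
    return {k: random.choice(v) for k, v in SPACE.items()}

def encode(arch):
    return [arch[k] for k in KEYS]   # structural encoding fed to the predictor

def cost(arch):                      # stand-in for a FLOPs/latency constraint
    return arch["depth"] * arch["expand_ratio"] * arch["kernel_size"] ** 2

def predict_error(arch):             # stand-in for the pre-trained MLP performance predictor
    return 1.0 / (1.0 + sum(encode(arch)))

def mutate(arch):
    child = dict(arch)
    key = random.choice(KEYS)
    child[key] = random.choice(SPACE[key])
    return child

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in KEYS}

def evolutionary_search(generations=20, population=16, budget=300):
    pop = []
    while len(pop) < population:                 # seed with architectures meeting the budget
        arch = random_arch()
        if cost(arch) <= budget:
            pop.append(arch)
    for _ in range(generations):
        pop.sort(key=predict_error)              # rank by predicted performance
        parents = pop[: max(2, population // 4)]
        children = [mutate(random.choice(parents)) for _ in range(population // 2)]
        children += [crossover(*random.sample(parents, 2)) for _ in range(population // 2)]
        pop = parents + [c for c in children if cost(c) <= budget]
    return min(pop, key=predict_error)

best = evolutionary_search()
print(best)
```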
Alternatively, the search for the network structure may also be performed by a random and grid search, a gradient-based strategy or a reinforcement learning strategy, or the like search algorithm, which is not limited in this embodiment.
In other embodiments, steps S160 to S170 may also be performed on other computer devices, i.e. after the training of the second super network is completed, the other computer devices may obtain the trained second super network from the computer device for training the second super network; then searching for the neural network structure is performed through the embodiments of step S160 to step S170.
In this embodiment, before the neural network architecture is searched, iterative optimization is first performed on the initial search space corresponding to the first super network based on the network performance of the plurality of sub-networks sampled from the first super network, so as to obtain an optimal search space, that is, the second super network, which improves the effectiveness of the search space design and the performance of the searched network structure. The second super network is then trained in a progressive shrinking manner; because the sub-networks share weights with the super network during training, this one-shot training manner can accelerate network convergence, and the weight-inheriting sub-networks sampled during the search can reach high accuracy without fine-tuning or training from scratch. After the second super network is trained, a search algorithm based on evolutionary computation can quickly search out and deploy the most suitable sub-network for different hardware platforms and efficiency constraints.
Referring to fig. 6, fig. 6 is a flowchart of a neural network optimization method according to another embodiment of the present application. The method for optimizing the neural network according to the embodiment of the present application will be described in detail below with reference to fig. 6. The optimization method of the neural network may include the steps of:
step S210: a first super network is acquired.
Step S220: and sampling the first super network to obtain a plurality of sub networks.
In this embodiment, the specific implementation of step S210 to step S220 may refer to the content in the foregoing embodiment, and will not be described herein.
Step S230: and testing the network performance of each sub-network by using a test sample set corresponding to the target prediction task to obtain a first network performance parameter of each sub-network, wherein the first network performance parameter at least comprises the task prediction error rate aiming at the target prediction task, and each test sample in the test sample set carries target tag information.
In some embodiments, referring to fig. 7, step S230 may include the following steps S231 to S232:
step S231: and inputting each test sample in the test sample set to each sub-network to obtain the predictive label information which is output by each sub-network and is specific to each test sample.
In this embodiment, each test sample in the test sample set for the test sub-network carries target tag information, based on which each test sample can be input to each sub-network, each sub-network can perform feature extraction based on the input test sample, and based on the extracted feature information, output predicted tag information corresponding to the test sample.
Step S232: and determining the error rate of label prediction of each sub-network based on the predicted label information of each test sample output by each sub-network and the target label information carried by each test sample.
Further, whether the two match can be determined from the predicted label information for each test sample and the target label information carried by that test sample; if they do not match, the label prediction is wrong, and if they match, the label prediction is correct. Then, according to the prediction and matching results of each sub-network for each test sample, the error rate of label prediction of each sub-network over the test samples in the test sample set is counted. The error rate of label prediction of each sub-network can be understood as the average prediction error (mean absolute error, MAE) of that sub-network over the test sample set.
Illustratively, taking the target prediction task as a face pose prediction task as an example, the test sample set is a set of sample images containing faces, where the target label information carried by each sample image includes the target pose information of the face, and the target pose information includes a target yaw angle (yaw), a target pitch angle (pitch) and a target roll angle (roll). Each sample image is input into each sub-network, and the sub-network predicts the predicted label information of the face in the sample image; the predicted label information includes the predicted pose information of the face, which in turn includes a predicted yaw angle, a predicted pitch angle and a predicted roll angle. Further, it can be determined whether the pose error between the target pose information and the predicted pose information is within a preset error value; if so, the label prediction is determined to be correct, and if not, the label prediction is determined to be incorrect.
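A minimal sketch of this matching rule (the 5-degree tolerance and the tuple representation are assumptions):

```python
def pose_prediction_correct(pred, target, tolerance=5.0):
    """pred/target: (yaw, pitch, roll) in degrees; tolerance is an assumed preset error value."""
    return all(abs(p - t) <= tolerance for p, t in zip(pred, target))

def error_rate(predictions, targets, tolerance=5.0):
    wrong = sum(not pose_prediction_correct(p, t, tolerance)
                for p, t in zip(predictions, targets))
    return wrong / len(predictions)

print(error_rate([(10.0, 2.0, 1.0), (30.0, 0.0, 0.0)],
                 [(12.0, 3.0, 0.5), (20.0, 0.0, 0.0)]))   # -> 0.5
```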
Step S240: and generating a target mapping relation between the error rate and the parameter value according to the task prediction error rate of each sub-network aiming at the target prediction task and the parameter value of the network structure parameter of each sub-network, wherein the parameter value of the network structure parameter of each sub-network is the maximum value of the network structure parameter in the sub-search space corresponding to each sub-network.
In this embodiment, the parameter value of the network structure parameter of each sub-network is the maximum value of the network structure parameter in the sub-search space corresponding to each sub-network, so that the generated target mapping relationship between the error rate and the parameter value, that is, the mapping relationship between the error rate and the value of the sub-search space, is generated.
Specifically, according to the counted task prediction error rate of each sub-network for the target prediction task and the parameter value of the network structure parameter of each sub-network, the mapping relation between the two may be fitted by the following linear function:
y=ωx+b
where y characterizes the error rate and x characterizes the parameter value of the network structure parameter of each sub-network; ω is a positive number, i.e., the error rate is positively correlated with the size of the sub-network.
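A minimal sketch of fitting this linear relation by least squares from the sampled sub-networks, using NumPy (the numeric values are purely illustrative):

```python
import numpy as np

x = np.array([3.0, 4.0, 5.0, 4.0])       # e.g. maximum depth of each sampled sub-network
y = np.array([0.10, 0.13, 0.16, 0.12])   # measured task prediction error rates (illustrative)

omega, b = np.polyfit(x, y, deg=1)        # least-squares fit of y = omega*x + b
print(omega, b)                           # omega is the rate of change used later
```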
Step S250: and carrying out iterative optimization on the parameter values of the network structure parameters of the first super network according to the target mapping relation until the preset iterative times are reached, obtaining the optimized first super network, wherein the error rate of task prediction of the second super network for the target prediction task is smaller than that of task prediction of the first super network for the target prediction task as the second super network, and the parameter values of the network structure parameters of the first super network are the maximum value of the network structure parameters in the search space corresponding to the first super network.
In some embodiments, referring to fig. 8, step S250 may include the following steps S251 to S252:
step S251: and determining a first parameter adjustment value based on the target mapping relation.
Specifically, based on the linear mapping relation, the rate of change of the error rate with respect to the parameter value is determined; this rate of change is ω in the linear function. The first parameter adjustment value is then determined according to the rate of change and a second parameter adjustment value, where the second parameter adjustment value is the current evolution step used to optimize the search space; that is, the product of the rate of change and the second parameter adjustment value is taken as this round's evolution step for the search space. It can be appreciated that if the current iteration is the first one, the current evolution step is the preset initial evolution step, for example γ_d = γ_e = γ_k = 1, where γ_d characterizes the evolution step for the depth of the convolution unit, γ_e characterizes the evolution step for the expansion rate of the number of input channels in the convolution block, and γ_k characterizes the evolution step for the size of the depth-wise conv convolution kernel in the convolution block. If the current iteration is the T-th iteration, the current evolution step is the evolution step at the (T-1)-th iteration; in other words, the second parameter adjustment value is the current actual evolution step.
Step S252: and iteratively adjusting the parameter value of the network structure parameter of the first super-network based on the first parameter adjustment value until the preset iteration times are reached, so as to obtain the adjusted first super-network as the second super-network, wherein the parameter value of the network structure parameter of the second super-network is smaller than the parameter value of the network structure parameter of the first super-network.
Further, during parameter adjustment of each round, reducing a network structure parameter value of the first super network based on the first parameter adjustment value, specifically, obtaining a difference value between the current parameter value of the network structure parameter of the first super network and the first parameter adjustment value, and taking the difference value as the parameter value of the network structure parameter of the first super network after the update of the round of adjustment; and gradually shrinking the corresponding search space of the first super network until the iteration times reach the preset iteration times, stopping the iterative adjustment of the parameter values, and obtaining the first super network corresponding to the finally optimized search space as the second super network. It can be understood that the value range of the search space corresponding to the second super-network is smaller than the value range of the search space corresponding to the first super-network, i.e. the parameter value of the network structure parameter of the second super-network is smaller than the parameter value of the network structure parameter of the first super-network. Therefore, the optimization of the search space is realized by gradually narrowing the value range of the search space, so that the gradual reduction of the error rate of task prediction of the first super network is realized, namely the improvement of the network performance of the first super network is realized gradually.
Specifically, step S250 may characterize the iterative update of the parameter values of the network structure parameters of the first super-network by the following formula:
x_i^(t+1) = x_i^t - ω·γ_i^t
where x_i^(t+1) characterizes the parameter value of the i-th dimension network structure parameter in the search space corresponding to the first super network at round t+1, x_i^t characterizes the parameter value of the i-th network structure parameter in the search space corresponding to the first super network at round t, ω is the rate of change of the error rate with respect to the parameter value of the search space in the linear function, and γ_i^t characterizes the actual evolution step of the i-th network structure parameter in the search space corresponding to the first super network at round t.
It will be appreciated that there is a minimum threshold for the size of the search space; for example, the size of the convolution kernel cannot be less than 1×1. Therefore, after x_i^(t+1) is obtained, it is compared with the minimum threshold to ensure that the optimized search space can still support the search of the network structure. In addition, in this embodiment, the parameter values of the network structure parameters of the first super network are not updated with a fixed evolution step; that is, the larger the rate of change of the error rate, the larger the evolution step, so that the parameter values of the network structure parameters of the current first super network can be reduced to the greatest extent in order to quickly reduce the error rate of task prediction of the first super network. Therefore, by iteratively updating the parameter values of the network structure parameters of the first super network based on the first parameter adjustment value dynamically determined from the error rate of task prediction, the parameter values can be adjusted to suitable positions more quickly and accurately, so that the optimization of the search space corresponding to the first super network can be realized more quickly and accurately and an optimal search space is finally obtained; that is, the efficiency of setting the search space is effectively improved, and the performance of the searched network can be improved.
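A minimal sketch of the per-round shrinking update x_i^(t+1) = x_i^t - ω·γ_i^t with the minimum-threshold check; ω, the evolution steps and the minima below are illustrative values, and rounding the updated values back onto the discrete candidate sets is an assumed post-processing step:

```python
def shrink_search_space(params, omega, steps, minima):
    """One round of search-space shrinking, clamped to the minimum thresholds."""
    return {k: max(params[k] - omega * steps[k], minima[k]) for k in params}

params = {"depth": 5, "expand_ratio": 6, "kernel_size": 7}   # maxima of the current search space
steps = {"depth": 1, "expand_ratio": 1, "kernel_size": 1}    # current evolution steps (gamma_i^t)
minima = {"depth": 1, "expand_ratio": 1, "kernel_size": 1}   # e.g. kernel size cannot go below 1

for t in range(3):                                           # a few rounds of iterative shrinking
    params = shrink_search_space(params, omega=0.5, steps=steps, minima=minima)
    print(t, params)   # in practice the values may be rounded back to the candidate sets
```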
In this embodiment, based on the first parameter adjustment value dynamically determined by the error rate of the task prediction, the parameter value of the network structure parameter of the first super-network is iteratively updated, so that the parameter value of the network structure parameter of the first super-network can be adjusted to a suitable position more quickly, that is, optimization of the search space corresponding to the first super-network can be realized more quickly, and an optimal search space is finally obtained. In other words, the initial search space is effectively optimized without prior knowledge of people, so that the effectiveness of the search space design is improved, and the performance of the search architecture is improved.
Referring to fig. 9, a block diagram of a neural network optimization apparatus 300 according to an embodiment of the present application is shown. The apparatus 300 may include: the device comprises a super network acquisition module 310, a first sampling module 320, a performance test module 330 and a super network optimization module 340.
The super network acquisition module 310 is configured to acquire a first super network.
The first sampling module 320 is configured to sample the first super network to obtain a plurality of sub-networks.
The performance testing module 330 is configured to test the network performance of each of the sub-networks by using a test sample set corresponding to the target prediction task, so as to obtain a first network performance parameter of each of the sub-networks.
The super network optimization module 340 is configured to perform iterative optimization on the network structure parameters of the first super network based on the first network performance parameter of each sub-network until a first optimization condition is satisfied, so as to obtain the optimized first super network as a second super network, where the second super network is used to obtain, after iterative training with a training sample set, a target model corresponding to the target prediction task.
In some embodiments, the first network performance parameter includes at least a task prediction error rate for the target prediction task, each test sample in the set of test samples carries target tag information, and performance test module 330 includes: a tag prediction unit and an error rate prediction unit. The label prediction unit may be configured to input each test sample in the test sample set to each sub-network, to obtain predicted label information for each test sample output by each sub-network. The error rate prediction unit may be configured to determine an error rate of label prediction of each sub-network based on the predicted label information for each test sample output by each sub-network and the target label information carried by each test sample.
In some embodiments, the first network performance parameter includes at least an error rate of task prediction for the target prediction task, and the super network optimization module 340 may include: and the mapping relation generating unit and the iterative optimization unit. The mapping relation generating unit may be configured to generate, according to the error rate of task prediction of each sub-network for the target prediction task and the parameter value of the network structure parameter of each sub-network, a target mapping relation between the error rate and the parameter value, where the parameter value of the network structure parameter of each sub-network is the maximum value of the network structure parameter in the sub-search space corresponding to each sub-network. The iterative optimization unit may be configured to iteratively optimize a parameter value of the network structure parameter of the first super network according to the target mapping relationship until a preset iteration number is reached, so as to obtain, as the second super network, an optimized first super network, where an error rate of task prediction of the second super network for the target prediction task is smaller than an error rate of task prediction of the first super network for the target prediction task, and a parameter value of the network structure parameter of the first super network is a maximum value of network structure parameters in a search space corresponding to the first super network.
In this manner, the error rate in the target mapping relationship is positively correlated with the parameter value, and the iterative optimization unit may include: a parameter adjustment value determination subunit and an iterative optimization subunit. The parameter adjustment value determination subunit may be configured to determine the first parameter adjustment value based on the target mapping relationship. The iterative optimization subunit may be configured to iteratively adjust the parameter value of the network structure parameter of the first super network based on the first parameter adjustment value until the preset number of iterations is reached, so as to obtain the adjusted first super network as the second super network, where the parameter value of the network structure parameter of the second super network is smaller than the parameter value of the network structure parameter of the first super network.
In some embodiments, the target mapping relationship is a linear mapping relationship, and the parameter adjustment value determining subunit may be specifically configured to: determining a rate of change of the error rate compared to the parameter value based on the linear mapping relationship; and determining the first parameter adjustment value according to the change rate and the second parameter adjustment value.
In some implementations, the first sampling module 320 may include: search space acquisition unit and sampling unit. The search space obtaining unit may be configured to obtain a search space corresponding to the first super network, where the search space includes a first value range of each of a plurality of network structure parameters of the first super network. The sampling unit may be configured to sample a plurality of the sub-networks based on the search space, where a parameter value of each of the network configuration parameters in each of the sub-networks is located within the first value range of each of the network configuration parameters.
In some embodiments, the optimization apparatus 300 of the neural network may further include: a super network training module, a second sampling module and an iterative optimization module. The super network training module may be configured to, after iterative optimization has been performed on the network structure parameters of the first super network based on the first network performance parameter of each sub-network until the first optimization condition is met and the optimized first super network has been obtained as the second super network, iteratively train the second super network using the training sample set until the first target condition is met, thereby obtaining the trained second super network. The second sampling module may be configured to sample the trained second super network according to the target performance index value, to obtain a second sub-network corresponding to the second super network. The iterative optimization module may be configured to predict a second network performance parameter of the second sub-network by using a pre-trained performance prediction model, until the second network performance parameter of a sampled second sub-network meets a second target condition, and take the second sub-network meeting the second target condition as the target model.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In several embodiments provided herein, the coupling of the modules to each other may be electrical, mechanical, or other.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
In summary, a first super network is acquired; the first super network is sampled to obtain a plurality of sub-networks; the network performance of each sub-network is tested by using a test sample set corresponding to the target prediction task to obtain a first network performance parameter of each sub-network; and the network structure parameters of the first super network are iteratively optimized based on the first network performance parameter of each sub-network until the first optimization condition is met, and the optimized first super network is obtained as a second super network, where the second super network is used to obtain, after iterative training with a training sample set, a target model corresponding to the target prediction task. In this way, before the neural network architecture search is performed, the initial search space corresponding to the first super network is first iteratively optimized based on the network performance of the plurality of sub-networks sampled from the first super network, so as to obtain a better search space, that is, a second super network with more suitable network structure parameters. This improves both the effectiveness of the search space design and the performance of the searched network structure, and thus improves the prediction performance, for the target prediction task, of the target model trained on the basis of the second super network.
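To make the testing step of this pipeline concrete, the following minimal sketch (the helper label_error_rate and the toy thresholding "sub-network" are hypothetical) computes a sub-network's task-prediction error rate from a labeled test sample set, i.e. the first network performance parameter that feeds the search-space optimization:

```python
def label_error_rate(subnet_predict, test_samples):
    """Compute a sub-network's task-prediction error rate: the fraction
    of test samples whose predicted label differs from the target label
    carried by the sample."""
    wrong = 0
    for features, target_label in test_samples:
        if subnet_predict(features) != target_label:
            wrong += 1
    return wrong / max(len(test_samples), 1)

# Hypothetical usage: a toy "sub-network" that thresholds a scalar input.
samples = [(0.2, 0), (0.8, 1), (0.6, 0)]
err = label_error_rate(lambda x: int(x > 0.5), samples)  # -> 1/3
```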
A computer device provided in the present application will be described with reference to fig. 10.
Referring to fig. 10, fig. 10 shows a block diagram of a computer device 400 according to an embodiment of the present application; the method described in the embodiments of the present application may be performed by the computer device 400. The computer device may be an electronic terminal with a data processing function, including but not limited to a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart watch, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a smart home device, and the like. The computer device may also be a server, and the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (Content Delivery Network, CDN) services, big data, and artificial intelligence platforms.
The computer device 400 in embodiments of the present application may include one or more of the following components: a processor 401, a memory 402, and one or more application programs, where the one or more application programs may be stored in the memory 402 and configured to be executed by the one or more processors 401, and the one or more programs are configured to perform the method described in the foregoing method embodiments.
Processor 401 may include one or more processing cores. The processor 401 connects the various parts of the computer device 400 by using various interfaces and lines, and performs the various functions of the computer device 400 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 402 and invoking data stored in the memory 402. Optionally, the processor 401 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 401 may integrate one of, or a combination of several of, a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It will be appreciated that the modem may alternatively not be integrated into the processor 401 and may instead be implemented solely by a separate communication chip.
The memory 402 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 402 may be used to store instructions, programs, code sets, or instruction sets. The memory 402 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing method embodiments, and the like. The data storage area may also store data created by the computer device 400 during use (such as the various correspondences described above), and the like.
Referring to fig. 11, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable medium 500 has stored therein program code which may be invoked by a processor to perform the methods described in the method embodiments above.
The computer readable storage medium 500 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 500 comprises a non-transitory computer-readable storage medium. The computer readable storage medium 500 has storage space for program code 510 for performing any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 510 may, for example, be compressed in a suitable form.
In some embodiments, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the electronic device to perform the steps of the method embodiments described above.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of optimizing a neural network, the method comprising:
acquiring a first super network;
sampling the first super network to obtain a plurality of sub-networks;
testing the network performance of each sub-network by using a test sample set corresponding to a target prediction task to obtain a first network performance parameter of each sub-network;
and iteratively optimizing the network structure parameters of the first super network based on the first network performance parameters of each sub network until a first optimization condition is met, to obtain the optimized first super network as a second super network, wherein the second super network is used for obtaining, after iterative training with a training sample set, a target model corresponding to the target prediction task.
2. The method according to claim 1, wherein the first network performance parameter includes at least a task prediction error rate for the target prediction task, each test sample in the test sample set carries target label information, and the testing the network performance of each sub-network by using the test sample set corresponding to the target prediction task to obtain the first network performance parameter of each sub-network includes:
inputting each test sample in the test sample set to each sub-network to obtain predictive label information which is output by each sub-network and aims at each test sample;
and determining the error rate of label prediction of each sub-network based on the predicted label information of each test sample output by each sub-network and the target label information carried by each test sample.
3. The method according to claim 1, wherein the first network performance parameter includes at least an error rate of task prediction for the target prediction task, and the iteratively optimizing the network structure parameter of the first super-network based on the first network performance parameter of each sub-network until a first optimization condition is satisfied, to obtain the optimized first super-network as a second super-network, includes:
generating a target mapping relationship between the error rate and the parameter value according to the task prediction error rate of each sub-network for the target prediction task and the parameter value of the network structure parameter of each sub-network, wherein the parameter value of the network structure parameter of each sub-network is the maximum value of the network structure parameter in the sub-search space corresponding to each sub-network;
and iteratively optimizing the parameter value of the network structure parameter of the first super-network according to the target mapping relationship until a preset number of iterations is reached, to obtain the optimized first super-network as the second super-network, wherein the error rate of task prediction of the second super-network for the target prediction task is smaller than the error rate of task prediction of the first super-network for the target prediction task, and the parameter value of the network structure parameter of the first super-network is the maximum value of the network structure parameter in the search space corresponding to the first super-network.
4. A method according to claim 3, wherein the error rate in the target mapping relationship is positively correlated with the parameter value;
and the iteratively optimizing the parameter value of the network structure parameter of the first super-network according to the target mapping relationship until the preset number of iterations is reached, to obtain the optimized first super-network as the second super-network, comprises:
determining a first parameter adjustment value based on the target mapping relationship;
and iteratively adjusting the parameter value of the network structure parameter of the first super-network based on the first parameter adjustment value until the preset number of iterations is reached, to obtain the adjusted first super-network as the second super-network, wherein the parameter value of the network structure parameter of the second super-network is smaller than the parameter value of the network structure parameter of the first super-network.
5. The method of claim 4, wherein the target mapping relationship is a linear mapping relationship, and wherein the determining a first parameter adjustment value based on the target mapping relationship comprises:
determining a rate of change of the error rate with respect to the parameter value based on the linear mapping relationship;
and determining the first parameter adjustment value according to the rate of change and a second parameter adjustment value.
6. The method of claim 1, wherein the sampling the first super network to obtain a plurality of sub-networks comprises:
acquiring a search space corresponding to the first super network, wherein the search space comprises a first value range of each network structure parameter in a plurality of network structure parameters of the first super network;
and sampling a plurality of sub-networks based on the search space, wherein the parameter value of each network structure parameter in each sub-network lies within the first value range of each network structure parameter.
7. The method according to any one of claims 1-6, wherein after the iteratively optimizing the network structure parameters of the first super-network based on the first network performance parameters of each sub-network until a first optimization condition is met, to obtain the optimized first super-network as a second super-network, the method further comprises:
performing iterative training on the second super network by using the training sample set until a first target condition is met, so as to obtain the trained second super network;
sampling the trained second super network according to a target performance index value to obtain a second sub-network corresponding to the second super network;
predicting a second network performance parameter of the second sub-network by using a pre-trained performance prediction model until the second network performance parameter of the second sub-network obtained by sampling meets a second target condition, and taking the second sub-network meeting the second target condition as the target model.
8. An optimization apparatus for a neural network, the apparatus comprising:
the super network acquisition module is used for acquiring a first super network;
the first sampling module is used for sampling the first super network to obtain a plurality of sub-networks;
the performance testing module is used for testing the network performance of each sub-network by utilizing a testing sample set corresponding to the target prediction task to obtain a first network performance parameter of each sub-network;
the super-network optimization module is used for iteratively optimizing the network structure parameters of the first super-network based on the first network performance parameters of each sub-network until a first optimization condition is met, to obtain the optimized first super-network as a second super-network, wherein the second super-network is used for obtaining, after iterative training with a training sample set, a target model corresponding to the target prediction task.
9. A computer device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, which is callable by a processor for performing the method according to any one of claims 1-7.
CN202310096446.8A 2023-02-07 2023-02-07 Neural network optimization method, device, computer equipment and storage medium Pending CN116090536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310096446.8A CN116090536A (en) 2023-02-07 2023-02-07 Neural network optimization method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310096446.8A CN116090536A (en) 2023-02-07 2023-02-07 Neural network optimization method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116090536A true CN116090536A (en) 2023-05-09

Family

ID=86186760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310096446.8A Pending CN116090536A (en) 2023-02-07 2023-02-07 Neural network optimization method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116090536A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218120A (en) * 2023-11-08 2023-12-12 安徽大学 Entity surface defect recognition system
CN117574983A (en) * 2024-01-16 2024-02-20 腾讯科技(深圳)有限公司 Operator processing model training method and related device
CN117574983B (en) * 2024-01-16 2024-04-30 腾讯科技(深圳)有限公司 Operator processing model training method and related device

Similar Documents

Publication Publication Date Title
CN110168578B (en) Multi-tasking neural network with task-specific paths
WO2021143883A1 (en) Adaptive search method and apparatus for neural network
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
CN113222700B (en) Session-based recommendation method and device
CN113505883A (en) Neural network training method and device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN112446888A (en) Processing method and processing device for image segmentation model
KR102293791B1 (en) Electronic device, method, and computer readable medium for simulation of semiconductor device
CN111310918B (en) Data processing method, device, computer equipment and storage medium
CN111357051A (en) Speech emotion recognition method, intelligent device and computer readable storage medium
CN116090536A (en) Neural network optimization method, device, computer equipment and storage medium
CN112200296A (en) Network model quantification method and device, storage medium and electronic equipment
CN113705276A (en) Model construction method, model construction device, computer apparatus, and medium
JP2019036112A (en) Abnormal sound detector, abnormality detector, and program
CN112580369A (en) Sentence repeating method, method and device for training sentence repeating model
CN116595356B (en) Time sequence signal prediction method and device, electronic equipment and storage medium
US20230229896A1 (en) Method and computing device for determining optimal parameter
CN113822291A (en) Image processing method, device, equipment and storage medium
CN116415624A (en) Model training method and device, and content recommendation method and device
CN113570044A (en) Customer loss analysis model training method and device
CN113763922A (en) Audio synthesis method and device, storage medium and electronic equipment
Purbanugraha et al. Improvement Accuracy Identification and Learning Speed of Offline Signatures Based on SqueezeNet with ADAM Backpropagation
CN114418122A (en) Hyper-parameter configuration method and device of machine learning model and readable storage medium
CN116468072A (en) Deep learning model deployment method, device and equipment
CN117010480A (en) Model training method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination