WO2022126448A1

WO2022126448A1 - Neural architecture search method and system based on evolutionary learning

Info

Publication number: WO2022126448A1
Application number: PCT/CN2020/136950
Authority: WO
Inventors: 程然; 谭浩; 何成; 侯章禄; 邱畅啸; 杨帆
Original assignee: 华为技术有限公司; 南方科技大学
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2022-06-23
Also published as: CN116964594A

Abstract

A neural architecture search method and system based on evolutionary learning. The method comprises: S101, initializing a population, wherein each neural architecture in the population is an architecture code; S102, randomly selecting two architecture codes in the population, decoding the two architecture codes into two neural architectures for pairing, and inheriting corresponding weights from a supernet, so as to obtain first and second neural network models; S103, evaluating the first and second neural network models which have been trained, so as to obtain a winner and a loser; S104, updating the supernet according to the trained first and second neural network models; S105, calculating a pseudo-gradient value, such that the loser learns from the winner, and obtaining an architecture code of a third neural architecture; S106, replacing, in the population, the architecture code of the loser with the architecture code of the third neural architecture, and updating the population; and S107, outputting an optimal neural network model from the population, and performing iterative evolution on the updated population.

Description

A neural network structure search method and system based on evolutionary learning

technical field

The present application relates to the field of artificial intelligence, and in particular, to a method and system for searching neural network structures based on evolutionary learning.

Background technique

As the learning task becomes more complex, the neural network model design becomes more and more complex. Designing a high-performance neural network requires a lot of professional knowledge and repeated manual experiments, which greatly increases computing resources and time costs. Using algorithms to automatically search for neural network structure models can save labor costs and optimize neural network models. Neural Network Architecture Search (NAS) is a technology for automatically designing neural networks, which enables a computer algorithm to automatically search for neural network preference models according to deep learning tasks. NAS is one of the hotspots in the field of automatic machine learning (Auto-ML). By designing cost-effective search methods to automatically obtain neural network structures with strong generalization capabilities and friendly hardware requirements, a large number of researchers' creativity can be liberated.

The three main components of the core design decisions related to the NAS method are: search space definition, search strategy, and search target evaluation. A neural network usually consists of many nodes. When searching for a neural network structure, a completely arbitrary combination of nodes can be used, that is, each node can be connected to any other node, and there are different operations between nodes to choose from. . The search space increases exponentially with the number of nodes, the search space is huge, and the search speed is very slow. Since the search space involved in NAS is huge, and its performance evaluation often involves model training, it consumes a lot of resources.

SUMMARY OF THE INVENTION

In order to solve the above problems, embodiments of the present application provide a method, system, electronic device and storage medium for searching a neural network structure based on evolutionary learning.

In a first aspect, the present application provides a method for searching a neural network structure based on evolutionary learning, the method comprising: S101 , initializing a population, where the population is a structure code set including multiple different neural network structures, the structure code A mapping relationship used to indicate the connection and operation between any two nodes of the neural network structure through a continuous real number interval; S102, randomly select two structural codes in the population, and decode the two structural codes to Obtaining two neural network structures, and pairing the two neural network structures; respectively inheriting the corresponding weights from the supernet to obtain the first neural network model and the second neural network model; The supernet includes a set of multiple operations and the weight of each operation; S103, train the first and second neural network models respectively to obtain the trained first and second neural network models; The voice, video or graphic samples are input into the trained first and second neural network models, and the error value between the output result and the label is calculated to obtain the winner and the loser, and the error value of the winner is less than that of the loser. error value; S104, update the supernet according to the trained first and second neural network models; S105, calculate the pseudo gradient value between the structural code of the loser and the structural code of the winner, Evolve the structural code of the loser to the structural code of the winner based on the pseudo gradient value, so as to obtain a third neural network structural code; the pseudo gradient is the gradient of the structural code update; S106, use the third neural network structural code The neural network structure code replaces the structure code of the neural network structure corresponding to the loser in the population to obtain an updated population; S107, output the optimal neural network model in the updated population, thereby completing the neural network structure. search.

This embodiment uses a continuous real number space to represent the neural network structure, which can reduce the search space corresponding to the operation selection, improve the NAS search efficiency, and increase the diversity of the neural network structure in the population to match the second-order learning evolution of the subsequent neural network. It can solve the problems of poor effect of existing neural network structure search methods and high consumption of computing resources.

In an embodiment, the outputting the optimal neural network model in the updated population, so as to complete the search of the neural network structure, includes: outputting the optimal neural network model in the updated population when the termination condition is satisfied. The neural network model to complete the search of the neural network structure.

In this embodiment, the population is iteratively evolved by setting termination conditions, which improves the reliability of the neural network structure search.

In an embodiment, the outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure; including: if the termination condition is not satisfied, returning to S102, and performing a search on the updated population. The population is iteratively evolved until the termination condition is met, and the optimal neural network model in the updated population is output, thereby completing the search of the neural network structure.

This embodiment finds a set of optimal neural network models through an iterative method based on the characteristics of the population, which can provide decision makers with multiple choices.

In an embodiment, the step of inheriting the corresponding weights of the two neural network structures respectively from the supernet to obtain the first neural network model and the second neural network model includes: converting the first neural network structure Inheriting the same connection as the first neural network structure and the first weight corresponding to the same operation corresponding to the connection from the supernet to obtain the first neural network model; The network structure inherits the same connection as the second neural network structure and the second weight corresponding to the same operation corresponding to the connection from the supernet to obtain the second neural network model.

This embodiment can speed up the speed of obtaining the model by inheriting the weights of the supernet, and significantly reduce the computational cost and running time required for searching the neural network; in the iterative process, the weights inherited by the neural network structure from the supernet are optimized weights , which significantly reduces the computational cost and running time required for searching neural networks.

In an embodiment, the separately training the first and second neural network models to obtain the trained first and second neural network models includes: training the first neural network at least once by using a stochastic gradient descent method The weight value of the network model is used to obtain the optimized first neural network model; the weight value of the second neural network model is trained at least once by using the stochastic gradient descent method to obtain the optimized second neural network model.

In this embodiment, first and second neural network models optimized for weight values are obtained by training the first and second neural network models.

In one embodiment, the first and second neural network models with labels are input into the trained first and second neural network models, and the error value between the output result and the label is calculated to obtain the winner and the loser The method includes: inputting the labeled voice, video or graphic samples into the trained first neural network model and the trained second neural network model respectively; an output result, calculate the first error value between the first output result and the label of the sample; according to the second output result of the trained second neural network model, calculate the difference between the second output result and the sample The second error value between labels; compare the first error value and the second error value, take the first/second neural network model with the smaller error value as the winner, and use the first/second neural network model with the larger error value as the winner. /Second neural network model as loser, get winner and loser.

In this embodiment, the paired first and second neural network models are trained by the labeled samples, and the performance of the trained first and second models is evaluated, so as to speed up the speed of finding the optimal model.

In an embodiment, the updating the supernet according to the trained first and second neural network models includes: two nodes of the first and second neural network models contain the same connection and the operation corresponding to the connection is the same, update the supernet by using the weight of the winner as the weight of the corresponding operation in the supernet.

This embodiment can synchronously optimize the operation weights of the supernet, so as to speed up the search; the weight update of the supernet can significantly reduce the computational cost and running time required for searching the neural network.

In an embodiment, the updating of the supernet according to the trained first and second neural network models includes: connecting two nodes of the first and second neural network models and the Under the condition that the operations corresponding to the connections are not the same, take the weight of the first neural network model as the weight of the connection in the supernet that has the same structure as the first neural network and the same operation corresponding to the connection; The weight of the second neural network model is used as the weight of the connection in the supernet with the same structure as the second neural network and the same operation corresponding to the connection; the supernet is updated.

In one embodiment, the calculating a pseudo gradient value between the structural code of the loser and the structural code of the winner, and based on the pseudo gradient value, the structural code of the loser is moved toward the winner. The structure coding evolution of the winner, to obtain the structure coding of the third neural network structure, comprising: calculating the difference between the structure coding value of the loser and the structure coding value of the winner, and multiplying the difference by a random coefficient, Accumulate and sum up the historical pseudo gradients under the random coefficient multiplier to obtain the value of the pseudo gradient updated by the structural code of the loser; sum the structural code value of the loser and the value of the pseudo gradient to obtain the The structure code of the third neural network structure is used to realize the evolution of the structure code of the loser to the structure code of the winner.

According to the evaluation results of the paired first and second neural network models, this embodiment enables the loser to perform structural evolution update by learning from the winner, so as to find the optimal neural network model more quickly.

In one embodiment, the termination condition includes whether all structural codes in the population participate in pairing or whether a set number of iterations is reached.

In this embodiment, the population is fully iteratively evolved by setting termination conditions, which improves the reliability of the neural network structure search.

In a second aspect, the present application provides a search system for a neural network structure based on evolutionary learning, the system includes: a population initialization module for initializing a population, where the population is a structure encoding set including a plurality of different neural network structures, The structure code is used to indicate the connection and operation mapping relationship between any two nodes of the neural network structure through a continuous real number interval; the individual pairing module is used to randomly select two structure codes in the population, Decode the two structure codes to obtain two neural network structures, and pair the two neural network structures; the weight inheritance module inherits the corresponding weights of the two neural network structures from the supernet respectively, and obtains a first neural network model and a second neural network model; wherein the supernet includes a set of multiple operations and a weight for each operation; a training module for training the first and second neural network models respectively to obtain training Good first and second neural network models; an evaluation module for inputting labeled voice, video or graphic samples into the trained first and second neural network models, and calculating the difference between the output result and the label. The error value obtains the winner and the loser, and the error value of the winner is smaller than that of the loser; the supernet weight update module is used to update the supernet according to the trained first and second neural network models; structure coding an evolution module, configured to calculate a pseudo gradient value between the structural code of the loser and the structural code of the winner, and make the structural code of the loser to the structural code of the winner based on the pseudo gradient value evolution to obtain a third neural network structure code; the pseudo gradient is the gradient of the structure code update; and a population update module for replacing the neural network corresponding to the loser in the population with the third neural network structure code The structure coding of the network structure obtains the updated population; the model output module outputs the optimal neural network model in the updated population, thereby completing the search of the neural network structure.

In one embodiment, the model output module is configured to output the optimal neural network model in the updated population under the condition that the termination condition is satisfied, so as to complete the search of the neural network structure.

In one embodiment, the model output module is configured to: if the termination condition is not met, return to S102, perform iterative evolution on the updated population, and output the updated population after the termination condition is met. The optimal neural network model is used to complete the search of the neural network structure.

In one embodiment, the weight inheritance module is configured to: inherit the first neural network structure from the supernet to the same connection as the first neural network structure and the same connection corresponding to the connection Operate the corresponding first weight to obtain the first neural network model; inherit the second neural network structure from the supernet and inherit the same connection as the second neural network structure and corresponding to the connection The second weight corresponding to the same operation is obtained to obtain the second neural network model.

In one embodiment, the training module is used for: training the weight value of the first neural network model at least once by using stochastic gradient descent, to obtain the optimized first neural network model; training by using stochastic gradient descent The weight value of the second neural network model is obtained at least once to obtain the optimized second neural network model.

In one embodiment, the evaluation module is used to: input the labeled voice, video or graphic samples into the trained first neural network model and the trained second neural network model respectively; The first output result of the trained first neural network model is calculated, and the first error value between the first output result and the label of the sample is calculated; according to the second output result of the trained second neural network model, Calculate the second error value between the second output result and the label of the sample; compare the first error value and the second error value, and record the first/second neural network model with a smaller error value as For the winner, the first/second neural network model with larger error value is recorded as the loser, and the winner and loser are obtained.

In one embodiment, the supernet weight update module is configured to: under the condition that the two nodes of the first and second neural network models contain the same connection and the corresponding operations of the connection are the same, The weight of the winner is used as the weight of the corresponding operation in the supernet to update the supernet.

In an embodiment, the supernet weight update module is configured to: under the condition that the connection of the two nodes of the first and second neural network models and the operation corresponding to the connection are different, the The weight of the first neural network model is used as the weight of the connection in the supernet that has the same structure as the first neural network and the same operation corresponding to the connection; the weight of the second neural network model is used as the supernet. in the connection with the same structure as the second neural network and the weight of the same operation corresponding to the connection; update the supernet.

In one embodiment, the structure coding evolution module is configured to: calculate the difference between the structure coding value of the loser and the structure coding value of the winner, multiply the difference by a random coefficient, and randomly The historical pseudo-gradients under the coefficient magnification are accumulated and summed to obtain the value of the pseudo-gradient updated by the structural code of the loser; the value of the structural code of the loser and the value of the pseudo-gradient are summed to obtain the first The structure coding of the three neural network structures realizes the evolution of the structure coding of the loser to the structure coding of the winner.

In a third aspect, the present application provides an electronic device, including a memory and a processor; the processor is configured to execute computer-executable instructions stored in the memory, and the processor executes the computer-executable instructions to execute any one of the above implementations. The search method of the neural network structure based on evolutionary learning described in the method.

In a fourth aspect, the present application provides a storage medium, including a readable storage medium and a computer program stored in the readable storage medium, where the computer program is used to implement the evolution-based learning described in any of the foregoing embodiments The search method of the neural network structure.

A search method, system, electronic device and storage medium for a neural network structure based on evolutionary learning provided by the embodiments of the present application map a continuous space to a neural network structure so as to perform continuous mathematical operations on the structure, which can give the algorithm a better global Search ability; the optimal solution can be found faster through the structure update method based on the population-based paired second-order learning; at the same time, a set of solutions can be found based on the characteristics of the population, which can provide multiple choices for decision makers and improve the algorithm at the same time and the weight inheritance and update of the supernet can speed up the model evaluation and significantly reduce the computational cost and running time required for searching the neural network.

Description of drawings

1 is a schematic diagram of an application environment of a neural network structure search provided by an embodiment of the present application;

Figure 2 is a flowchart of the population-based neural network structure search proposed by the first scheme;

FIG. 3 provides a basic framework diagram of a neural network structure search based on evolutionary learning for a system embodiment of the present application;

FIG. 4 is a general flowchart of a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the present application;

5a is a block diagram of a specific embodiment of a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the application;

Figure 5b is a flow chart of population initialization;

Figure 5c is a block diagram of the initialization flow of the supernet;

6 is a schematic diagram of operations between two nodes in a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the present application;

7 is a schematic diagram of connections and operations between two nodes of a supernet in a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the present application;

8 is a flowchart of a method for updating a supernet weight provided by an embodiment of the present application;

9 is a flowchart of a population pairing-based structure updating method provided by an embodiment of the present application;

10 is a system block diagram of a neural network structure search based on evolutionary learning provided by an embodiment of the application;

11 is a block diagram of a system for updating a supernet provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed ways

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" can be the same or a different subset of all possible embodiments, and Can be combined with each other without conflict.

In the following description, the terms "first\second\third, etc." or module A, module B, module C, etc., are only used to distinguish similar objects, and do not represent a specific ordering of objects. It is understood that Indeed, where permitted, the specific order or sequence may be interchanged to enable the embodiments of the application described herein to be practiced in sequences other than those illustrated or described herein.

In the following description, the reference numerals indicating steps, such as S110, S120, etc., do not necessarily mean that this step will be performed, and the sequence of the preceding and following steps may be interchanged or performed simultaneously if permitted.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application, and are not intended to limit the present application.

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.

The neural network structure search (NAS) technology is applied in a wide range of scenarios. For example, in the field of image recognition, using algorithms to automatically design neural network structure models can achieve better performance than manually designed neural network structures; another example, in the field of medical image processing, using neural network structure search to generate neural network structure models To process data such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT) and Ultrasound to determine whether a patient has a disease. This application is applicable to all occasions involving image classification and segmentation, as well as to other scenarios related to video processing.

FIG. 1 is a schematic diagram of an application environment of a neural network structure search provided by an embodiment of the present application; as shown in FIG. 1 , the application scenario includes at least one background server 10 and a smart device 11 . The smart device 11 can be connected to the backend server 10 through the Internet; the smart device 11 can include smart devices capable of outputting medical images, voice, video or pictures, such as magnetic resonance imagers, smart speakers, smart cameras, and smart phones.

The intelligent device 11 is provided with a picture, voice, medical image or video collection device, and the collected picture, voice, medical image or video data can be sent to the background server 10, so that the background server 10 can input the picture, voice, medical image or video. Use neural network structure search to generate neural network structure models for classification, segmentation or identification.

NAS is a subset of hyperparameter optimization. Customized NAS methods are not actually fully automated, they rely on neural network structures specially hand-coded for the application or learning task as the starting point for the search. In general, the goal of the neural network structure search method is defined as:

In formula (1), α is defined as structural coding, ω is defined as weight information, and ω ^* is the corresponding optimal weight.

is the loss value on the training set,

is the loss value on the validation set. The meaning expressed by formula (1) is that the neural network structure search needs to find a structure code α, which can make the loss value on the validation set under the optimal weight ω ^*

as small as possible.

Current NAS research can be divided into three main categories: population-based neural network structure search, reinforcement learning-based neural network structure search, and differentiable neural network structure search.

The first solution is population-based neural network structure search. This method is one of the most common methods in current neural network structure search research. The general process is to initialize a population, and then select parent individuals to use crossover, mutation, etc. The operator updates the topology of the parent to obtain the topology of the child, and finally uses the idea of "survival of the fittest" to eliminate individuals with low fitness and retain the better individuals. By iterating the above process, the population can continuously evolve to obtain the global/local optimal solution.

Figure 2 shows the population-based neural network structure search proposed by the first scheme; as shown in Figure 2, the steps of population-based neural network structure search include: initializing the population, and the population is a collection of individuals containing different neural network structures; The individual of the neural network structure is trained, and its accuracy on the validation set is obtained as the fitness of the individual; it is judged whether the termination conditions set by the algorithm are met; if the judgment result is "No", the parent neural network structure is passed. Different crossover and mutation operators generate the offspring neural network structure; if the judgment result is "yes", the preference model is output; the offspring neural network structure is trained to obtain its accuracy on the validation set as the offspring fitness value; According to the size of the fitness value of the offspring, individuals with at least one neural network structure are selected from the neural network structure of the parent and the neural network structure of the offspring; the set of individuals containing the selected at least one neural network structure is a new population Output the selected new population.

The main difference between different population-based structure search algorithms lies in the steps of generating the child neural network structure from the parent neural network structure through different crossover and mutation operators. There are many designs of crossover and mutation operators. For example, the AmoebaNet (source link) algorithm defines a macro template of the neural network structure, and designs two mutation operators: an operator that changes different operations between nodes and an operator that changes the connections between different nodes. The Large-Scaleevolution algorithm does not define a macro template, and proposes eleven different mutation operators, including changing the learning rate operator, inserting the convolutional layer operator, removing the convolutional layer operator, changing the number of channels, etc. By executing different mutation operators, the method can automatically evolve a complex neural network structure from a simple neural network structure.

Although the population-based neural network structure search method has the advantages of being suitable for parallelism and high reliability, because a large number of individuals in the population need to evaluate their fitness, it needs to consume more GPU resources and time. For example, AmoebaNet needs 3150 GPUdays to complete the search. Task. Therefore, it is difficult for this method to obtain a balance between structure search accuracy and resource consumption.

Differentiable Architecture Search (DARTS) maps the neural network into a continuous space, and uses the gradient descent method to solve it, and the parameters such as the structure and weight of the neural network can be obtained at the same time. Specifically, the gradient information of the structure

for:

In formula (2), α is defined as the structure of the neural network model, ω represents the current weight, ω ^* (α) is the corresponding optimal weight, and ξ represents the learning rate for one step of internal optimization.

is the loss value on the validation set. This method approximates ω ^* (α) by training ω once, instead of continuously training ω to reach its convergence. This method searches the neural network structure along the direction of the gradient, so it can quickly search for a better neural network structure.

Although the structure search method based on differentiable neural network has the advantage of being fast. However, since this method only searches for a single body, compared with a population, because only a single structure is searched each time, the reliability is low. Moreover, this method only uses the gradient information of the monomer, and cannot avoid the local optimal structure; and this method uses a probability to encode each possible connection and operation, resulting in a huge search space corresponding to the encoding, and the cost of optimization is high. .

The following introduces the concept of a method and system for searching for a neural network structure based on evolutionary learning provided by the embodiments of the present application.

FIG. 3 provides a basic framework diagram of a neural network structure search based on evolutionary learning according to a system embodiment of the present application. As shown in FIG. 3 , an embodiment of the present application provides a method and system for neural network structure search based on evolutionary learning. On the basis of custom coding the structure of the neural network, the solution uses a population-based pairing mechanism to use two The method of order learning generates a new neural network model for population update; uses the supernet weight to train the newly generated neural network model using the gradient descent method, and uses the trained neural network model to update the weight of the supernet model to complete the neural network. Automatic search of network structures. In this process, performance evaluation is performed on the trained paired neural network models, and the losers of the evaluation learn from the winners to generate new neural network models for population update. The solution can solve the problems of poor effect of existing neural network structure search methods and high consumption of computing resources.

In the above scheme, self-defined coding refers to coding the neural network structure according to the learning task or applying artificially set coding rules. In this application, the nodes in the neural network structure can be represented by multiple real variables respectively, and the connection and operation between any two nodes are unified and independent codes.

In the above scheme, a supernet is a directly defined neural network with the same number of nodes as the neural network model in the initialized population, including all connection relationships and operation relationships, and the corresponding weights of its operations are used for sharing. The structure of the supernet is fixed, and its optimal weight can be optimized by standard backpropagation. The optimized weight value is applicable to all neural network models to improve the recognition performance.

FIG. 4 is a flowchart of a method for searching a neural network structure based on evolutionary learning according to an embodiment of the present application. As shown in Figure 2, the flow of the method is: S101, initialize the population and the supernet; each neural network structure of the population is a structural code, and the population initialization is to randomly initialize these codes. S102 , for each code initialized in the population, first decode it into a neural network structure and then perform random pairing, and the two paired neural network structures respectively inherit weights from the initialized supernet. S103: Perform training and optimization on the two paired neural network structures after inheriting the weights according to the learning task, respectively, to obtain a neural network model, and evaluate the performance of the two trained neural network models on the verification set respectively, and obtain according to the evaluation results. Losers and winners. S104, update the corresponding weight value of the supernet according to the neural network model obtained after training in S103 and the evaluation result; S105, according to the evaluation result, make the structural code of the loser learn from the structural code of the winner to obtain a new structure of the neural network coding, and then replace the structural coding of the loser in the population with the structural coding of the new neural network to update the population; S106, judge whether the termination condition is met, if the termination condition is met, execute S107; otherwise, return to S102 to update the population Iterative evolution. The termination condition is that all individuals in the population participate in pairing and reach the set number of iterations; S107 , output the preference model in the new population. The preference model is the optimal neural network that meets the needs of the learning task.

FIG. 5a is a block diagram of an embodiment of a method for searching a neural network structure based on evolutionary learning according to an embodiment of the present application. As shown in Figure 5a, the method is implemented by performing the following steps.

S201, initialize the population and supernet.

Specifically, in the population initialization step, the coding rules are customized for the application or learning task, and the structure coding of the neural network structure is generated according to the coding rules, and the continuous real number intervals are respectively mapped to the neural network structure. Applications or learning tasks here include classifying, segmenting, or recognizing input pictures, speech, medical images, or videos.

The population initialization process is shown in Figure 5b. S2011: Set the nodes of the neural network structure, express the connection between the set nodes as continuous real numbers, randomly connect the nodes, and encode the connection of the nodes and the operation corresponding to the connection into the structure code α of the neural network , α is set as a vector, including the connections between nodes and the operations in these connections, so as to map continuous real-number intervals to the neural network structure respectively.

For example, the neural network structure can be set to have m nodes, and the continuous real number space [0, 1), [1, 2), [2, 3), [3, 4).. .[m-1, m) is mapped to m nodes, the first two nodes represent the input, and the latter node randomly selects the two nodes in front of it to connect. Therefore, each node needs to store four variables except the first two, two of which represent the node codes corresponding to the connected nodes, and the other two variables represent the operation codes corresponding to the operations represented by the two connections. Each node is represented by four variables, the operation code α is a vector containing a plurality of four variables, and the value of the operation code is defined in the real number space with a difference between the upper and lower limits of 1. Exemplarily, suppose that the neural network has four nodes, m=4, the number of the nodes is 0, 1, 2, 3; the coding range of each node is [0, 1), [1, 2), [2 , 3), [3, 4); For example, the connection code of a certain node is 0.5, 2.3, which means that the node is connected with node 0 and node 2.

S2012 , generating N structural codes by initializing the coding rules in S2011 to form a population, where N is a natural number greater than 2. N structure codes can be decoded into N neural network structures with the same number of nodes, different connection relationships and different operations.

Using continuous real number space to represent the neural network structure can increase the diversity of the neural network structure within the population to match the second-order learning evolution of the subsequent neural network.

The initialization flow of the supernet is shown in Figure 5c.

S2013 , set the nodes of the supernet, and the number of nodes of the supernet is the same as the number of nodes of the neural network structure in the population.

S2014, set multiple operations between every two nodes, and set the weight of each operation.

S2015 , representing the operations between each node as a weight value set, where the weight value set covers all possible operations required for the application or learning task; the weight value set is shared.

The neural network structure represented by any possible structural code in the population is a sub-network of the supernet, and the sub-network is recorded as a network unit. Only one operation can be selected between every two nodes of the neural network structure in the population.

For example, as shown in Figure 6, suppose there are three possible operations, the first operation is the operation of the 3*3 average pooling layer, the second operation 3*3 is the operation of the max pooling layer, and the third operation is the operation of 3*3 convolutional layers; the continuous real number space of [0, 1) can be mapped to the first operation, and the operation between node 0 and node 1 is represented by the operation code ∈ [0, 1) as the average The operation of the pooling layer; the continuous real number space of [1, 2) can be mapped into the second operation, and the operation code ∈ [1, 2) represents the operation as the operation of the maximum pooling layer; the [2, 3) The continuous real number space of is mapped into a third operation, denoted by the operation code ∈ [2, 3) as the operation of the convolutional layer. Finally choose an operation as the operation between these two nodes. The schematic diagram of the connection between any two nodes of the supernet is shown in Fig. 7. The supernet does not involve structural coding, and every two nodes of the supernet can include all possible connections required for application or learning tasks in parallel. Operations, including but not limited to operations in average pooling layers, operations in max pooling layers, and operations in convolutional layers. It is set that each operation contains its own weight information and needs to be trained separately.

In the present application, the step of initializing the population is coded using the coding rules of the connection between two nodes and the operation search, which can map independent continuous variable intervals to the connection between the two nodes and the operation corresponding to the connection, which can reduce the The operation selects the corresponding search space, improves the NAS search efficiency, and can convert discrete real numbers, combination numbers, and probability values into continuous real numbers.

Returning to Fig. 5a, execute S202, randomly select the structure codes corresponding to the two neural network structures in the population, decode into two neural network structures for pairing, and the paired two neural network structures inherit weights from the supernet. The weight includes the weight value of the operation.

Specifically, taking any pair of neural network structures randomly paired in the population as an example, the nth neural network structure is recorded as the first neural network structure, and the n+1th neural network structure is recorded as the second neural network structure, The first neural network inherits the same connection as the first neural network structure and the weight corresponding to the same operation corresponding to the connection from the initialized supernet to obtain the first neural network model; the second neural network structure is obtained from the initialized supernet Inheriting the same connection as the second neural network structure and the weight corresponding to the same operation corresponding to the connection, the second neural network model is obtained.

For example, in the supernet with the same connection as the first neural network structure and the same operation corresponding to this connection, the operation weight of the convolutional layer is 2.6. The first neural network structure inherits the same connection as the first neural network structure and the weight value corresponding to the same operation corresponding to the connection from the initialized supernet, so that the weight of the operation of the convolutional layer of the first neural network structure is 2.6. As above, the second neural network structure inherits from the supernet a connection with the same structure and a weight value corresponding to the same operation corresponding to the connection.

It should be noted that the weights inherited by the paired two neural network structures from the supernet for the first time are the weight values corresponding to the connection with the same structure in the initialization supernet and the same operation corresponding to the connection, and in the process of each subsequent iteration. The weights inherited by the paired two neural network structures from the supernet are the weight values corresponding to the connection in the updated supernet with the same structure and the same operation corresponding to the connection.

S203 , in combination with the learning task, perform one or more gradient descent training on the two neural network models after inheriting the weight, optimize the weight value, and verify the trained first neural network model and the second neural network model on the verification set, respectively, Obtain the error value of the first neural network model and the error value of the second neural network model, compare the two error values, and record the neural network model with the smaller error value as the winner, and the neural network model with the larger error value as the loser , get the evaluation result.

Specifically, in S2031, combined with the learning task, the stochastic gradient descent method is used to train the two neural network models respectively, and the calculation formula (3) obtains the weight drop value of the current neural network model, and the calculation formula (4) obtains the optimized neural network model. Weight ω:

ω(t)=ω(t-1)-η(t)*Δω(t) (4)

Among them, t is the number of iterations of the stochastic gradient descent method, Δω(t) represents the weight drop value of the t-th generation neural network model, ω(t) represents the weight value of the t-th iteration neural network model after optimization, β is the momentum, η(t) is the learning rate,

The error value (loss) for the neural network on the training set, which is obtained by calculating the accuracy of the current neural network model on the validation set.

In the embodiment of the present application, the first neural network model is respectively trained along the gradient descent direction to operate the weights, and the optimized weight value ω ₁ is calculated according to the calculated weight drop value Δω ₁ (t) to obtain the first neural network model. The first neural network model for one optimization. Similarly, the second neural network model is trained along the gradient direction to operate the weight, and the optimized weight value ω ₂ is calculated according to the calculated weight drop value Δω ₂ (t) to obtain the optimized second neural network model.

S2032, respectively verify the error values of the first neural network model optimized for the first time and the second neural network model optimized for the first time on the validation set, and record the neural network model with the first/second smaller error value as the winner , the first/second neural network model with large error value is recorded as the loser, and the evaluation result is obtained.

S204, update the weight value of the supernet according to the weight value obtained after training and the evaluation result.

Specifically, it is judged whether two nodes of the first and second neural networks have the same connection and the connection has the same operation, if so, according to the weight value and evaluation result obtained after training, the corresponding operation in the supernet The weight value is updated to the weight value of the winner; otherwise, according to the weight value obtained after training and the evaluation result, the same connection as the structure of the first neural network model in the supernet and the weight of the same operation corresponding to the connection are updated For the optimized weight value ω ₁ of the first neural network model, update the weight of the same connection in the supernet as the structure of the second neural network model and the same operation corresponding to the connection to the optimized weight of the second neural network model. Weight value ω ₂ .

S205, according to the evaluation result, make the structural code of the loser learn from the structural code of the winner to obtain the structural code of the new neural network structure, and then use the structural code of the new neural network structure to replace the neural network of the loser in the population. Structural coding for population update.

Specifically, step S2051 is first performed, and according to the evaluation results of the two neural network models optimized after training in S204, the loser learns from the winner to obtain a new neural network model. Specifically, the structural code α of the loser is optimized by adopting pseudo-gradient-based learning update for the loser, so that the structural code of the loser is close to the structural code of the winner, and then the structural code of the new neural network structure is used to replace the failure in the population By.

Pseudo-gradient-based learning and updating algorithms can include first-order gradient learning and updating, second-order gradient learning and updating, or both first-order and second-order gradient learning updates, or even constant terms or multiples based on gradient information. expand. Specifically, set the structure code of the winner as α _w and the structure code of the loser as α _l in the paired two neural network structures, then the pseudo gradient Δα _l updated by the structure code of the neural network model of the loser is as follows:

Δα _l (t)=a*μ*(α _w (t)-α _l (t))+b*γ*Δα _l (t-1)+c (5)

where Δα _l (t) represents the pseudo-gradient value of the structural encoding of the t-th generation loser, μ and γ represent two real values randomly sampled from a [0,1] uniform distribution, and α, b are two [- A given real value between 1, 1], indicating the degree of confidence in gradients of different orders, c is a given real number between [-1, 1], indicating the bias effect on the pseudo gradient, Δα _l (t -1) is the historically accumulated pseudo gradient value before the structure update of the loser, and the initial value Δα _l (0) is 0.

Then calculate the updated structural code α _l ' of the loser as:

α _l '(t)=α _l (t)+Δα _l (t) (6)

Assuming that the structural code α _w of the winner is 0.2, and the structural code α _l of the loser is 0.9, assign values to a, b, c, for example, a=1, b=1, c=0, then by formula (6) The calculated value of α _l ' will be less than 0.9, and the updated structural code of the loser will realize the learning of the structural code of the loser to the structural code of the winner, so as to obtain a new neural network structure.

Then step S2052 is performed, and the structure code of the new neural network structure is used to replace the loser in the population, and the population is updated.

S206, judge whether the termination condition is met, if so, execute S207; otherwise, repeat steps 202-206, perform pairing and iterative learning according to the population, and continue to evolve and update the population until the set termination condition is reached. The termination condition can be paired and learned for all structures in the population that encode the corresponding neural network structures.

Specifically, it is judged whether all the neural network structures corresponding to the structure codes in the population participate in the pairing; if the judgment result is "No", execute S202; if the judgment result is "Yes", execute S208.

When executing step S206, it can be judged that if n<N-1, add 2 to the value of n, and return to step 202, and if n≥N-1, execute S207.

The termination condition can also be reaching a set number of iterations.

Specifically, let t be the algebra performed by the current evolution, and T be the set number of iterations, and judge whether the algebra performed by the current evolution has reached the set number of iterations; if the judgment result is "No", execute S202; if the judgment result is "Yes", execute S207.

When executing step S206, it can be judged that if t<T and n<N-1, then add 1 to the value of t, add 2 to the value of n, and return to step 202; if t<T, and n≥N- 1, add 1 to the value of t, and the value of n is 1, return to step 202; if t≥T, execute S207.

S207, in the updated population, select a preference model.

Based on the above embodiment, the present application also proposes a supernet weight update mechanism, which uses the neural network structures in the population for evolution without repeated pairing. During the evolution process, the weight values of the losers and winners are jointly The supernet weights are updated. Specifically, FIG. 8 is a flowchart of a method for updating supernet weights proposed by the application, and as shown in FIG. 6 , it includes:

S301: Randomly initialize the supernet weight, and steps S2013-S2015 may be referred to for the specific implementation manner.

S302: Randomly pair the decoded neural network structures in the population, and the paired two neural network structures inherit the connection with the same structure and the weight corresponding to the same operation corresponding to the connection from the initialized supernet to generate two neural networks Model. It should be noted that the weight value of the paired two neural network structures inherited from the supernet for the first time is the weight value of the connection with the same structure in the initialization supernet and the weight value of the same operation corresponding to the connection, in the process of each subsequent iteration. The weight values inherited by the paired two neural network structures from the supernet are the updated supernet's connection with the same structure and the weight value of the same operation corresponding to the connection.

S303, train one or more gradient descents on the two neural network models after inheriting the weights in S302.

S304, the loser and the winner are obtained by calculating the error values of the two neural network models on the validation set, the neural network model with the smaller error value is the winner, and the neural network model with the larger error value is the loser.

S305, it is judged whether the two paired neural network models contain the same connection and the corresponding operations of the connection are the same; if the judgment result is "yes", execute S306; if the judgment result is "no", execute S307.

S306 , the weight value of the same connection in the supernet with the two neural network models and the same operation corresponding to the connection is updated to the weight value of the winner.

S307 , the weight values of the operations corresponding to the connections of the two neural network models in the supernet respectively use the optimized weight values of the two neural network models.

S308 , output the supernet with the updated weight value.

Based on the above-mentioned embodiment, the present application also proposes a structure update mechanism based on population pairing, which is used to realize the non-repetitive pairing of neural network structures in the population to compete, and the loser to the winner to perform a second-order based on pseudo-gradient. Learning, generating new individuals to replace the original losers.

FIG. 9 is a flowchart of the method for updating the structure based on population pairing proposed by the present application. As shown in Figure 7, including:

S401, initialize the population, where N is the total number of neural network structures encoded in the population. For specific implementations, steps S2011-2012 can be referred to.

S402, randomly perform non-repetitive pairing of the decoded neural network structures in the population, wherein n and n+1 are the numbers of the paired two neural network structures; record the nth neural network structure as the first neural network structure , the n+1th neural network structure is recorded as the second neural network structure.

S403, the paired two neural network structures inherit weight values from the supernet.

It should be noted that the weight value inherited from the supernet for the first time by the paired two neural network structures is the weight value of the corresponding connection between the two neural network structures in the initialization supernet and the same operation corresponding to the connection. The weight value inherited from the supernet by the two neural network structures paired in the process is the weight value of the updated supernet.

S404, according to the learning task, train the paired neural network structure to obtain the loser and the winner.

S405 , according to the evaluation result, learn the structural code of the loser from the structural code of the winner to obtain a new structural code of the neural network.

S406, replace the losers in the population with the structural encoding of the new neural network.

S407, judging whether all individuals in the population participate in the pairing; if the judgment result is "No", then execute S402; if the judgment result is "Yes", execute S408;

Specifically, if n≥N-1, the iteration is ended, and S508 is executed; if n<N-1, the value of n is incremented by 2, and the execution returns to step 402.

S408, output a new population.

In this application, the continuous space is mapped to the neural network structure in order to perform continuous mathematical operations on the structure, which can endow the algorithm with better global search ability; through the population-based paired second-order learning structure update method, the optimal solution can be found faster At the same time, based on the characteristics of the population, a set of solutions can finally be found, which can provide decision makers with multiple choices and improve the reliability of the algorithm at the same time; and the weight inheritance and update of the supernet can speed up the speed of model evaluation and significantly reduce the search for neural networks. the required computational cost and running time.

An embodiment of the present application provides a system for searching neural network structures based on evolutionary learning. As shown in FIG. 10, the system includes: a population initialization module 801, an individual pairing module 802, a training evaluation module 803, a supernet weight update 804, a population Update module 805 and model output module 806.

The system initializes the population through the population initialization module 801, wherein each neural network structure in the population is a structure code, and the structure code uses a continuous real number interval to map the connections and corresponding operations between the nodes of the neural network structure. The individual pairing module 802 randomly selects two structural codes in the population, and decodes them into two neural network structures for pairing; the paired two neural network structures inherit the corresponding weights from the supernet respectively to obtain the first neural network model and the first neural network structure. Two neural network models; wherein the supernet includes a set of all operations, and the weight is the weight information corresponding to all operations; the first and second neural network models are trained respectively by the training evaluation module 803, and the trained first and second neural network models are evaluated. Network model to obtain winners and losers; update the supernet according to the first and second neural network models after training through the supernet weight update 804; calculate the difference between the structural code of the loser and the structural code of the winner through the population update module 805. Based on the pseudo gradient value, the structural code of the loser is evolved to the structural code of the winner based on the pseudo gradient value, and the structural code of the third neural network structure is obtained; the structural code of the third neural network structure is used to replace the corresponding structure of the loser in the population. The structure coding of the neural network structure obtained by the updated population; the model output module 806 outputs the optimal neural network model in the updated population under the condition that the termination condition is satisfied; thus completing the search of the neural network structure; otherwise, execute the individual The pairing module 802 performs iterative evolution on the updated population.

Specifically, the population initialization module 801 can also generate N neural network structures with the same number of nodes by manual coding according to the self-defined coding rules; by coding, the continuous real number interval is mapped to the nodes between the nodes of a single neural network structure. Connection and corresponding discrete operation, N is a natural number.

The system for searching neural network structures based on evolutionary learning provided by the embodiments of the present application further includes a supernet initialization module, which sets up a supernet according to a learning task, and the supernet includes N network units and a set of all operations.

Specifically, in the system for searching neural network structures based on evolutionary learning provided by the embodiments of the present application, the individual pairing module 802 inherits the first neural network structure from the supernet and inherits the same connection as the first neural network structure and the connection is related to the first neural network structure. The first weight corresponding to the corresponding same operation is obtained, and the first neural network model is obtained; the second neural network structure is inherited from the supernet with the same connection as the second neural network structure and the second corresponding to the same operation corresponding to the connection. The weights of the second neural network model are obtained. The training evaluation module 803 uses the stochastic gradient descent method to train the weight value of the first neural network model at least once in combination with the learning task to obtain an optimized first neural network model; in combination with the learning task, uses the stochastic gradient descent method to train the second neural network model once , obtain the optimized second neural network model; evaluate the optimized first neural network model and the optimized second neural network model respectively on the verification set; calculate the error value of the first neural network model according to the optimized first neural network model; The optimized second neural network model calculates the error value of the second neural network model; compares the error value of the first neural network model and the error value of the second neural network model; The network model is recorded as the winner, the first/second with the largest error value is recorded as the loser, and the evaluation result is obtained. The supernet weight update 804 takes the operation weight of the winner as the weight of the supernet under the condition that some two nodes have the same connection in the first and second neural networks and the connection has the same operation; Under the condition that the second neural network has different node connections or the same node connection but corresponds to different operations, the weight of the first neural network model is taken as the connection in the supernet that has the same structure as the first neural network and the corresponding value of the connection. The weight of the same operation; take the weight of the second neural network model as the weight of the connection in the supernet that has the same structure as the second neural network and the weight of the same operation corresponding to the connection; obtain the updated supernet. The population update module calculates the difference between the structural code value of the loser and the structural code value of the winner, multiplies the difference by a random coefficient, and accumulates and sums the historical pseudo-gradient under the random coefficient multiple to obtain the structural code update of the loser The value of the pseudo gradient of ; sum the value of the structure code of the loser and the value of the pseudo gradient to obtain the structure code of the third neural network structure. The model output module 806 judges whether all the neural network structures in the population are involved in pairing; if the judgment result is "no", it returns to the individual pairing module 802 to iteratively evolve the updated population; if the judgment result is "yes", it outputs the updated The optimal neural network model in the population; thus completing the search of the neural network structure. Model output module 806: Set the number of iterations as T, where T is a natural number greater than 0, and determine whether the current number of executions is less than T; if the judgment result is "yes", return to the individual pairing module 802 to iteratively evolve the updated population; If the judgment result is "No", output the optimal neural network model in the updated population; thus completing the search of the neural network structure. The model output module 806 may also inherit the corresponding weight values from the updated supernet when the number of execution iterations is greater than 1.

An embodiment of the present application provides a system for updating a supernet. As shown in FIG. 11 , the system includes: randomly initializing a supernet through a supernet initialization module 901, and the supernet includes N network units and a set of all operations; 802 Randomly select two neural network structures in the population for pairing; the paired two neural network structures respectively inherit the corresponding weights from the supernet to obtain the first neural network model and the second neural network model; Train the first and second neural network models, evaluate the trained first and second neural network models, and obtain winners and losers; there are two nodes in the first and second neural networks that have the same connection and the connection Under the condition of having the same operation, the weight of the operation of the winner is taken as the weight of the supernet. Taking the weight of the winner as the weight of the supernet; and by updating the module 904, under the condition that the first and second neural networks have different node connections or the same node connections are corresponding to different operations, the first neural network model The weight is taken as the weight of the same connection as the first neural network structure in the supernet and the same operation corresponding to the connection; the weight of the second neural network model is taken as the same connection as the second neural network structure in the supernet and this connection The corresponding weights of the same operation; get the updated supernet.

An embodiment of the present application provides an electronic device 1000, as shown in FIG. 12, including a processor 1001 and a memory 1002; the processor 1001 is configured to execute computer-executed instructions stored in the memory 1002, and the processor 1001 runs The computer executes the instructions to execute the method for searching for a neural network structure based on evolutionary learning described in any of the foregoing embodiments.

An embodiment of the present application provides a storage medium, including a readable storage medium and a computer program stored in the readable storage medium, where the computer program is used to implement the evolutionary learning-based neural network described in any of the foregoing embodiments method for structure search.

Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Experts may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of the embodiments of the present application. Furthermore, various aspects or features of the embodiments of the present application may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used in this application encompasses a computer program accessible from any computer readable device, carrier or medium. For example, computer-readable media may include, but are not limited to, magnetic storage devices (eg, hard disks, floppy disks, or magnetic tapes, etc.), optical disks (eg, compact discs (CDs), digital versatile discs (DVDs, DVDs), etc.), Smart cards and flash memory devices (eg, erasable programmable read-only memory (EPROM), cards, stick or key drives, etc.). Additionally, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.

It should be understood that, in various embodiments of the embodiments of the present application, the size of the sequence numbers of the above-mentioned processes does not mean the sequence of execution, and the execution sequence of each process should be determined by its functions and internal logic, and should not be The implementation process of the embodiments of the present application constitutes any limitation.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

The functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application can be embodied in the form of software products in essence, or the parts that make contributions to the prior art or the parts of the technical solutions, and the computer software products are stored in a storage medium , including several instructions to cause a computer device (which may be a personal computer, a server, or an access network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the embodiments of this application. The aforementioned storage medium includes: U disk, removable hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited to this. should be covered within the scope of protection of this application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims

A search method for a neural network structure based on evolutionary learning, wherein the method is used to include:

S101: Initialize a population, where the population is a set of structure codes including a plurality of different neural network structures, and the structure codes are used to indicate the connection and operation between any two nodes of the neural network structure through a continuous real number interval Mapping relations;

S102: Randomly select two structural codes in the population, decode the two structural codes to obtain two neural network structures, and pair the two neural network structures; separate the two neural network structures Inheriting the corresponding weights from the supernet to obtain the first neural network model and the second neural network model; wherein the supernet includes a set of multiple operations and the weight of each operation;

S103, train the first and second neural network models respectively to obtain the trained first and second neural network models; input the labeled voice, video or graphic samples into the trained first and second neural network models. The second neural network model calculates the error value between the output result and the label to obtain the winner and the loser, and the error value of the winner is less than the error value of the loser;

S104, update the supernet according to the trained first and second neural network models;

S105: Calculate a pseudo gradient value between the structural code of the loser and the structural code of the winner, and based on the pseudo gradient value, make the structural code of the loser evolve to the structural code of the winner, to obtain The third neural network structure code; the pseudo gradient is the gradient of the structure code update;

S106, use the third neural network structure code to replace the structure code of the neural network structure corresponding to the loser in the population to obtain an updated population;

S107, output the optimal neural network model in the updated population, and complete the search of the neural network structure.
The method according to claim 1, wherein the outputting the optimal neural network model in the updated population, so as to complete the search of the neural network structure, comprises: when a termination condition is satisfied, outputting the The optimal neural network model in the updated population is used to complete the search of the neural network structure.
The method according to claim 1, wherein the outputting the optimal neural network model in the updated population, so as to complete the search of the neural network structure; comprising: returning to S102 if the termination condition is not satisfied , performing iterative evolution on the updated population until the termination condition is satisfied, and outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
The method according to any one of claims 1-3, wherein the two neural network structures inherit corresponding weights from a supernet respectively to obtain a first neural network model and a second neural network model, include:

Inheriting the first neural network structure from the supernet with the same connection as the first neural network structure and the first weight corresponding to the same operation corresponding to the connection to obtain the first neural network Model;

Inheriting the second neural network structure from the supernet with the same connection as the second neural network structure and the second weight corresponding to the same operation corresponding to the connection to obtain the second neural network Model.
The method according to any one of claims 1-3, wherein the training the first and second neural network models respectively to obtain the trained first and second neural network models comprises:

Use the stochastic gradient descent method to train the weight value of the first neural network model at least once to obtain the optimized first neural network model;

The weight value of the second neural network model is trained at least once by using the stochastic gradient descent method to obtain the optimized second neural network model.
The method according to any one of claims 1-3, characterized in that, inputting the labeled voice, video or graphic samples into the trained first and second neural network models, and calculating an output result The error values from the labels get winners and losers, including:

Input the voice, video or graphic samples with labels into the trained first neural network model and the trained second neural network model respectively;

Calculate the first error value between the first output result and the label of the sample according to the first output result of the trained first neural network model;

Calculate the second error value between the second output result and the label of the sample according to the second output result of the trained second neural network model;

Comparing the first error value and the second error value, the first/second neural network model with smaller error value is regarded as the winner, and the first/second neural network model with larger error value is regarded as the loser , get winners and losers.
The method according to any one of claims 1-3, wherein the updating the supernet according to the trained first and second neural network models comprises:

Under the condition that the two nodes of the first and second neural network models contain the same connection and the corresponding operation of the connection is the same, the weight of the winner is taken as the weight of the corresponding operation in the supernet, Update the supernet.
The method according to any one of claims 1-3, wherein the updating the supernet according to the trained first and second neural network models comprises:

Under the condition that the connection of the two nodes of the first and second neural network models and the operations corresponding to the connection are different, the weight of the first neural network model is used as the weight of the first neural network model in the supernet. A connection with the same neural network structure and the weight of the same operation corresponding to the connection; the weight of the second neural network model is taken as the connection in the supernet with the same structure as the second neural network and the connection corresponding to the connection the weights of the same operations; update the supernet.
The method according to any one of claims 1 to 3, wherein the calculating a pseudo gradient value of the structure code between the loser's structure code and the winner's structure code, based on the pseudo gradient value, make the The structural code of the loser evolves to the structural code of the winner, and the structural code of the third neural network structure is obtained, including:

Calculate the difference between the structural code value of the loser and the structural code value of the winner, multiply the difference by a random coefficient, accumulate and sum the historical pseudo-gradients under the random coefficient multiplier, and obtain the failure The structure of the author encodes the value of the updated pseudo-gradient;

Summing the value of the structure code of the loser and the value of the pseudo gradient to obtain the structure code of the third neural network structure, and realizing the evolution of the structure code of the loser to the structure code of the winner.
The method according to claim 2 or 3, wherein the termination condition includes whether all structural codes in the population participate in pairing or whether a set number of iterations is reached.
A search system based on an evolutionary learning neural network structure, characterized in that the system comprises:

A population initialization module is used to initialize a population, where the population is a set of structure codes including a plurality of different neural network structures, and the structure codes are used to indicate the relationship between any two nodes of the neural network structure through a continuous real number interval. The mapping relationship between connections and operations;

an individual pairing module for randomly selecting two structural codes in the population, decoding the two structural codes to obtain two neural network structures, and pairing the two neural network structures;

A weight inheritance module, which inherits the corresponding weights from the supernet respectively for the two neural network structures to obtain a first neural network model and a second neural network model; wherein the supernet includes a set of multiple operations and each operation the weight of;

A training module for training the first and second neural network models respectively to obtain the trained first and second neural network models;

The evaluation module is used to input the voice, video or graphic samples with labels into the trained first and second neural network models, and calculate the error value between the output result and the label to obtain winners and losers. The winner's margin of error is smaller than the loser's;

a supernet weight update module, used for updating the supernet according to the trained first and second neural network models;

A structure encoding evolution module, configured to calculate a pseudo-gradient value between the loser's structure code and the winner's structure code, and make the loser's structure code move toward the winner's structure code based on the pseudo-gradient value. Structural encoding evolution to obtain a third neural network structural encoding; the pseudo-gradient is the gradient of the structural encoding update; and

a population update module, configured to replace the structure code of the neural network structure corresponding to the loser in the population with the third neural network structure code to obtain an updated population;

The model output module outputs the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
The system of claim 11, wherein the model output module is used to:

When the termination condition is satisfied, the optimal neural network model in the updated population is output, thereby completing the search of the neural network structure.
The system of claim 11, wherein the model output module is used to:

If the termination condition is not satisfied, return to S102, perform iterative evolution on the updated population, and output the optimal neural network model in the updated population after the termination condition is satisfied, thereby completing the search of the neural network structure.
The system according to claim 11, wherein the weight inheritance module is used for:

Inheriting the first neural network structure from the supernet with the same connection as the first neural network structure and the first weight corresponding to the same operation corresponding to the connection to obtain the first neural network Model;

Inheriting the second neural network structure from the supernet with the same connection as the second neural network structure and the second weight corresponding to the same operation corresponding to the connection to obtain the second neural network Model.
The system according to any one of claims 11, wherein the training module is used for:

Use the stochastic gradient descent method to train the weight value of the first neural network model at least once to obtain the optimized first neural network model;

The weight value of the second neural network model is trained at least once by using the stochastic gradient descent method to obtain the optimized second neural network model.
The system according to any one of claims 11, wherein the evaluation module is used for:

Input the voice, video or graphic samples with labels into the trained first neural network model and the trained second neural network model respectively;

Calculate the first error value between the first output result and the label of the sample according to the first output result of the trained first neural network model;

Calculate the second error value between the second output result and the label of the sample according to the second output result of the trained second neural network model;

Comparing the first error value and the second error value, the first/second neural network model with a smaller error value is recorded as the winner, and the first/second neural network model with a larger error value is recorded as Losers, get winners and losers.
The system according to claim 11, wherein the supernet weight update module is used for:

Under the condition that the two nodes of the first and second neural network models contain the same connection and the corresponding operation of the connection is the same, the weight of the winner is taken as the weight of the corresponding operation in the supernet, Update the supernet.
The system according to claim 11, wherein the supernet weight update module is used for:

Under the condition that the connection of the two nodes of the first and second neural network models and the operations corresponding to the connection are different, the weight of the first neural network model is used as the weight of the first neural network model in the supernet. A connection with the same neural network structure and the weight of the same operation corresponding to the connection; the weight of the second neural network model is taken as the connection in the supernet with the same structure as the second neural network and the connection corresponding to the connection the weights of the same operations; update the supernet.
The system according to claim 11, wherein the structural coding evolution module is used for:

Calculate the difference between the structural code value of the loser and the structural code value of the winner, multiply the difference by a random coefficient, accumulate and sum the historical pseudo-gradients under the random coefficient multiplier, and obtain the failure The structure of the author encodes the value of the updated pseudo-gradient;

Summing the value of the structure code of the loser and the value of the pseudo gradient to obtain the structure code of the third neural network structure, and realizing the evolution of the structure code of the loser to the structure code of the winner.
The system according to claim 12 or 13, wherein the termination condition includes whether all structural codes in the population participate in pairing or whether a set number of iterations is reached.
An electronic device, characterized in that it includes a memory and a processor; the processor is configured to execute computer-executable instructions stored in the memory, and the processor executes the computer-executable instructions to execute any one of claims 1-10 The described search method of neural network structure based on evolution learning.
A storage medium, characterized in that it includes a readable storage medium and a computer program stored in the readable storage medium, and the computer program is used to implement the evolutionary learning-based algorithm described in any one of claims 1-10. Search methods for neural network structures.