WO2022126448A1 - Neural architecture search method and system based on evolutionary learning


Info

Publication number
WO2022126448A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
supernet
network model
population
weight
Application number
PCT/CN2020/136950
Other languages
French (fr)
Chinese (zh)
Inventor
程然
谭浩
何成
侯章禄
邱畅啸
杨帆
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
南方科技大学 (Southern University of Science and Technology)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.) and 南方科技大学 (Southern University of Science and Technology)
Priority to PCT/CN2020/136950
Priority to CN202080107589.9A (published as CN116964594A)
Publication of WO2022126448A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/12 - Computing arrangements based on biological models using genetic models

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a method and system for searching neural network structures based on evolutionary learning.
  • NAS: neural architecture search
  • Auto-ML: automatic machine learning
  • A neural network usually consists of many nodes. When searching for a neural network structure, a completely arbitrary combination of nodes can be used; that is, each node can be connected to any other node, and there are different operations between nodes to choose from.
  • As a result, the search space grows exponentially with the number of nodes, so the search space is huge and the search is very slow. Because the search space involved in NAS is huge and performance evaluation often involves model training, NAS consumes a large amount of resources.
  • embodiments of the present application provide a method, system, electronic device and storage medium for searching a neural network structure based on evolutionary learning.
  • The present application provides a method for searching a neural network structure based on evolutionary learning, the method comprising: S101, initializing a population, where the population is a set of structure codes of multiple different neural network structures, and a structure code uses a continuous real-number interval to indicate the mapping relationship of the connections between any two nodes of a neural network structure and the operations on those connections; S102, randomly selecting two structure codes in the population, decoding the two structure codes to obtain two neural network structures, and pairing the two neural network structures, which respectively inherit the corresponding weights from the supernet to obtain a first neural network model and a second neural network model, where the supernet includes a set of multiple operations and the weight of each operation; S103, training the first and second neural network models respectively to obtain trained first and second neural network models, inputting labeled voice, video or graphic samples into the trained first and second neural network models, and calculating the error value between each output result and its label to obtain a winner and a loser, where the error value of the winner is smaller than that of the loser.
  • S104, updating the supernet according to the trained first and second neural network models;
  • S105, calculating a pseudo-gradient value between the structure code of the loser and the structure code of the winner, and evolving the structure code of the loser toward the structure code of the winner based on the pseudo-gradient value, so as to obtain a third neural network structure code;
  • the pseudo-gradient is the gradient of the structure code update;
  • S106, using the third neural network structure code to replace the structure code of the neural network structure corresponding to the loser in the population, to obtain an updated population;
  • S107, outputting the optimal neural network model in the updated population, thereby completing the neural network structure search.
  • This embodiment uses a continuous real-number space to represent the neural network structure, which reduces the search space corresponding to operation selection, improves NAS search efficiency, and increases the diversity of neural network structures within the population to match the subsequent paired second-order learning evolution. It can solve the problems of the poor results and the high computing-resource consumption of existing neural network structure search methods.
  • In some embodiments, outputting the optimal neural network model in the updated population to complete the search of the neural network structure includes: outputting the optimal neural network model in the updated population when the termination condition is satisfied.
  • the population is iteratively evolved by setting termination conditions, which improves the reliability of the neural network structure search.
  • In some embodiments, outputting the optimal neural network model in the updated population to complete the search of the neural network structure includes: if the termination condition is not satisfied, returning to S102 and iteratively evolving the updated population until the termination condition is met, and then outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  • This embodiment finds a set of optimal neural network models through an iterative method based on the characteristics of the population, which can provide decision makers with multiple choices.
  • In some embodiments, inheriting the corresponding weights of the two neural network structures from the supernet to obtain the first and second neural network models includes: for the first neural network structure, inheriting from the supernet the first weights corresponding to the same connections as the first neural network structure and the same operations on those connections, to obtain the first neural network model; and for the second neural network structure, inheriting from the supernet the second weights corresponding to the same connections as the second neural network structure and the same operations on those connections, to obtain the second neural network model.
  • By inheriting the weights of the supernet, this embodiment speeds up obtaining the models; in the iterative process, the weights a neural network structure inherits from the supernet are already optimized weights, which significantly reduces the computational cost and running time required for searching neural networks.
  • In some embodiments, separately training the first and second neural network models to obtain the trained first and second neural network models includes: training the weight values of the first neural network model at least once using stochastic gradient descent to obtain an optimized first neural network model, and training the weight values of the second neural network model at least once using stochastic gradient descent to obtain an optimized second neural network model.
  • first and second neural network models optimized for weight values are obtained by training the first and second neural network models.
  • In some embodiments, inputting the labeled samples into the trained first and second neural network models and calculating the error values between the output results and the labels to obtain the winner and the loser includes: inputting the labeled voice, video or graphic samples into the trained first neural network model and the trained second neural network model respectively; calculating, from the first output result of the trained first neural network model, a first error value between the first output result and the label of the sample; calculating, from the second output result of the trained second neural network model, a second error value between the second output result and the label of the sample; and comparing the first error value and the second error value, taking the model with the smaller error value as the winner and the model with the larger error value as the loser, to obtain the winner and the loser.
  • In this embodiment, the paired first and second neural network models are trained on the labeled samples, and the performance of the trained models is evaluated, which speeds up finding the optimal model.
  • In some embodiments, updating the supernet according to the trained first and second neural network models includes: when two nodes of the first and second neural network models contain the same connection and the operation corresponding to that connection is the same, updating the supernet by using the weight of the winner as the weight of the corresponding operation in the supernet.
  • This embodiment synchronously optimizes the operation weights of the supernet, which speeds up the search; updating the supernet weights significantly reduces the computational cost and running time required for searching neural networks.
  • In some embodiments, updating the supernet according to the trained first and second neural network models includes: when the connections between two nodes of the first and second neural network models, or the operations corresponding to those connections, are not the same, taking the weight of the first neural network model as the weight in the supernet of the connection that matches the first neural network structure and of the same operation corresponding to that connection, and taking the weight of the second neural network model as the weight in the supernet of the connection that matches the second neural network structure and of the same operation corresponding to that connection, thereby updating the supernet.
  • This embodiment synchronously optimizes the operation weights of the supernet, which speeds up the search; updating the supernet weights significantly reduces the computational cost and running time required for searching neural networks.
  • In some embodiments, calculating a pseudo-gradient value between the structure code of the loser and the structure code of the winner, and evolving the structure code of the loser toward the structure code of the winner based on the pseudo-gradient value to obtain the structure code of the third neural network structure, includes: calculating the difference between the structure code value of the loser and the structure code value of the winner, multiplying the difference by a random coefficient, and accumulating it with the historical pseudo-gradient scaled by another random coefficient, to obtain the pseudo-gradient value for the loser's structure code update; and summing the structure code value of the loser and the pseudo-gradient value to obtain the structure code of the third neural network structure, thereby evolving the structure code of the loser toward the structure code of the winner.
  • this embodiment enables the loser to perform structural evolution update by learning from the winner, so as to find the optimal neural network model more quickly.
  • the termination condition includes whether all structural codes in the population participate in pairing or whether a set number of iterations is reached.
  • the population is fully iteratively evolved by setting termination conditions, which improves the reliability of the neural network structure search.
  • The present application further provides a system for searching a neural network structure based on evolutionary learning.
  • The system includes: a population initialization module for initializing a population, where the population is a set of structure codes of a plurality of different neural network structures, and a structure code uses a continuous real-number interval to indicate the mapping relationship of the connections and operations between any two nodes of a neural network structure; an individual pairing module for randomly selecting two structure codes in the population, decoding the two structure codes to obtain two neural network structures, and pairing the two neural network structures; a weight inheritance module that inherits the corresponding weights of the two neural network structures from the supernet to obtain a first neural network model and a second neural network model, where the supernet includes a set of multiple operations and a weight for each operation; a training module for training the first and second neural network models respectively to obtain trained first and second neural network models; an evaluation module for inputting labeled voice, video or graphic samples into the trained first and second neural network models and calculating the error values between the output results and the labels to obtain the winner and the loser, where the error value of the winner is smaller than that of the loser;
  • the supernet weight update module is used to update the supernet according to the trained first and second neural network models;
  • a structure code evolution module configured to calculate a pseudo-gradient value between the structure code of the loser and the structure code of the winner, and to evolve the structure code of the loser toward the structure code of the winner based on the pseudo-gradient value, so as to obtain a third neural network structure code;
  • the pseudo gradient is the gradient of the structure code update;
  • a population update module for replacing, with the third neural network structure code, the structure code of the neural network structure corresponding to the loser in the population, to obtain an updated population;
  • the model output module outputs the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  • the model output module is configured to output the optimal neural network model in the updated population under the condition that the termination condition is satisfied, so as to complete the search of the neural network structure.
  • In some embodiments, the model output module is configured to: if the termination condition is not met, return to the individual pairing step (S102), perform iterative evolution on the updated population, and, after the termination condition is met, output the optimal neural network model in the updated population to complete the search of the neural network structure.
  • In some embodiments, the weight inheritance module is configured to: inherit, for the first neural network structure, the first weights from the supernet corresponding to the same connections as the first neural network structure and the same operations on those connections, to obtain the first neural network model; and inherit, for the second neural network structure, the second weights from the supernet corresponding to the same connections as the second neural network structure and the same operations on those connections, to obtain the second neural network model.
  • the training module is used for: training the weight value of the first neural network model at least once by using stochastic gradient descent, to obtain the optimized first neural network model; training by using stochastic gradient descent The weight value of the second neural network model is obtained at least once to obtain the optimized second neural network model.
  • In some embodiments, the evaluation module is used to: input the labeled voice, video or graphic samples into the trained first neural network model and the trained second neural network model respectively; calculate, from the first output result of the trained first neural network model, the first error value between the first output result and the label of the sample; calculate, from the second output result of the trained second neural network model, the second error value between the second output result and the label of the sample; and compare the first error value and the second error value, recording the model with the smaller error value as the winner and the model with the larger error value as the loser, to obtain the winner and the loser.
  • the supernet weight update module is configured to: under the condition that the two nodes of the first and second neural network models contain the same connection and the corresponding operations of the connection are the same, The weight of the winner is used as the weight of the corresponding operation in the supernet to update the supernet.
  • In some embodiments, the supernet weight update module is configured to: under the condition that the connections between two nodes of the first and second neural network models, or the operations corresponding to those connections, are not the same, take the weight of the first neural network model as the weight in the supernet of the connection that matches the first neural network structure and of the same operation corresponding to that connection, take the weight of the second neural network model as the weight in the supernet of the connection that matches the second neural network structure and of the same operation corresponding to that connection, and update the supernet.
  • In some embodiments, the structure code evolution module is configured to: calculate the difference between the structure code value of the loser and the structure code value of the winner, multiply the difference by a random coefficient, and accumulate it with the historical pseudo-gradient scaled by another random coefficient, to obtain the pseudo-gradient value for the loser's structure code update; and sum the structure code value of the loser and the pseudo-gradient value to obtain the structure code of the third neural network structure, realizing the evolution of the structure code of the loser toward the structure code of the winner.
  • the termination condition includes whether all structural codes in the population participate in pairing or whether a set number of iterations is reached.
  • The present application further provides an electronic device, including a memory and a processor; the processor is configured to execute computer-executable instructions stored in the memory, and by running the computer-executable instructions the processor performs the method described in any one of the foregoing embodiments.
  • The present application further provides a storage medium, including a readable storage medium and a computer program stored in the readable storage medium, where the computer program is used to implement the method for searching a neural network structure based on evolutionary learning described in any of the foregoing embodiments.
  • The method, system, electronic device and storage medium for searching a neural network structure based on evolutionary learning provided by the embodiments of the present application map a continuous space to the neural network structure so that continuous mathematical operations can be performed on the structure, which gives the algorithm better global search ability. The population-based paired second-order learning structure update finds the optimal solution faster. At the same time, a set of solutions can be found based on the characteristics of the population, which provides decision makers with multiple choices while improving the reliability of the algorithm. Weight inheritance and updating of the supernet speed up model evaluation and significantly reduce the computational cost and running time required for searching neural networks.
  • FIG. 1 is a schematic diagram of an application environment of a neural network structure search provided by an embodiment of the present application
  • Figure 2 is a flowchart of the population-based neural network structure search proposed by the first scheme
  • FIG. 3 is a basic framework diagram of neural network structure search based on evolutionary learning provided by a system embodiment of the present application;
  • FIG. 4 is a general flowchart of a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the present application
  • FIG. 5a is a block diagram of a specific embodiment of a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the application;
  • FIG. 5b is a flowchart of population initialization;
  • FIG. 5c is a block diagram of the initialization flow of the supernet;
  • FIG. 6 is a schematic diagram of operations between two nodes in a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of connections and operations between two nodes of a supernet in a method for searching a neural network structure based on evolutionary learning provided by an embodiment of the present application;
  • FIG. 8 is a flowchart of a method for updating a supernet weight provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of a population pairing-based structure updating method provided by an embodiment of the present application.
  • FIG. 10 is a system block diagram of a neural network structure search based on evolutionary learning provided by an embodiment of the application;
  • FIG. 11 is a block diagram of a system for updating a supernet provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of an electronic device according to an embodiment of the present application.
  • the neural network structure search (NAS) technology is applied in a wide range of scenarios.
  • using algorithms to automatically design neural network structure models can achieve better performance than manually designed neural network structures;
  • For example, neural network structure search can be used to generate neural network models that process data such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT) and ultrasound images to determine whether a patient has a disease.
  • FIG. 1 is a schematic diagram of an application environment of a neural network structure search provided by an embodiment of the present application; as shown in FIG. 1 , the application scenario includes at least one background server 10 and a smart device 11 .
  • the smart device 11 can be connected to the backend server 10 through the Internet; the smart device 11 can include smart devices capable of outputting medical images, voice, video or pictures, such as magnetic resonance imagers, smart speakers, smart cameras, and smart phones.
  • The intelligent device 11 is provided with a picture, voice, medical image or video collection apparatus, and the collected picture, voice, medical image or video data can be sent to the background server 10, so that the background server 10 can input the picture, voice, medical image or video into a neural network model generated by neural network structure search for classification, segmentation or identification.
  • NAS is a subset of hyperparameter optimization. Customized NAS methods are not actually fully automated; they rely on neural network structures hand-coded for the application or learning task as the starting point of the search. In general, the goal of the neural network structure search method is defined as follows.
  • min_α L_val(ω*(α), α)    (1)
  • s.t. ω*(α) = argmin_ω L_train(ω, α)    (2)
  • where α is defined as the structure code, ω is defined as the weight information, ω*(α) is the corresponding optimal weight, and L_val and L_train are the loss values on the validation set and the training set respectively.
  • The first solution is population-based neural network structure search, which is one of the most common methods in current neural network structure search research. The general process is to initialize a population, select parent individuals, and update the parents' topologies with crossover, mutation and other operators to obtain the topologies of the children; finally, following the idea of "survival of the fittest", individuals with low fitness are eliminated and the better individuals are retained. By iterating this process, the population continuously evolves toward the global/local optimal solution.
  • Figure 2 shows the population-based neural network structure search proposed by the first scheme. As shown in Figure 2, the steps are: initialize the population, where the population is a collection of individuals with different neural network structures; train each individual and take its accuracy on the validation set as the individual's fitness; judge whether the termination conditions set by the algorithm are met; if the judgment result is "No", generate offspring neural network structures from the parent neural network structures and train the offspring to obtain their accuracy on the validation set as the offspring fitness values; according to the fitness values, select at least one individual from the parent and offspring neural network structures; the set containing the selected individuals is the new population; output the selected new population.
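  • For illustration only, the general loop of this first scheme can be sketched as follows in Python; the helper callables (train_and_evaluate, crossover, mutate) are hypothetical placeholders for the operators described above, not part of any cited implementation.

```python
import random

def population_based_nas(init_population, generations, population_size,
                         train_and_evaluate, crossover, mutate):
    # Fitness of each individual: accuracy on the validation set.
    population = [(ind, train_and_evaluate(ind)) for ind in init_population]
    for _ in range(generations):          # termination: generation budget
        # Select parents and apply crossover/mutation to get a child topology.
        (p1, _), (p2, _) = random.sample(population, 2)
        child = mutate(crossover(p1, p2))
        population.append((child, train_and_evaluate(child)))
        # "Survival of the fittest": keep only the best individuals.
        population.sort(key=lambda pair: pair[1], reverse=True)
        del population[population_size:]
    return population                     # the evolved new population
```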
  • The main difference between population-based structure search algorithms lies in how the child neural network structures are generated from the parent structures, that is, in the design of the crossover and mutation operators, of which there are many.
  • For example, the AmoebaNet algorithm defines a macro template for the neural network structure and designs two mutation operators: an operator that changes the operations between nodes and an operator that changes the connections between nodes.
  • The Large-Scale Evolution algorithm does not define a macro template and proposes eleven different mutation operators, including an operator that changes the learning rate, an operator that inserts a convolutional layer, an operator that removes a convolutional layer, an operator that changes the number of channels, and so on.
  • the method can automatically evolve a complex neural network structure from a simple neural network structure.
  • Although the population-based neural network structure search method has the advantages of being suitable for parallelism and highly reliable, the large number of individuals in the population whose fitness must be evaluated consumes substantial GPU resources and time; for example, AmoebaNet needs 3150 GPU-days to complete the search task. It is therefore difficult for this method to balance structure search accuracy against resource consumption.
  • Differentiable Architecture Search maps the neural network into a continuous space, and uses the gradient descent method to solve it, and the parameters such as the structure and weight of the neural network can be obtained at the same time.
  • the gradient of the validation loss with respect to the structure is: ∇_α L_val(ω*(α), α) ≈ ∇_α L_val(ω − ξ∇_ω L_train(ω, α), α)
  • where α is the structure of the neural network model, ω represents the current weights, ω*(α) is the corresponding optimal weight, ξ represents the learning rate of one step of inner optimization, and L_val is the loss value on the validation set.
  • This method approximates ω*(α) by training ω once, instead of training ω to convergence. It searches the neural network structure along the gradient direction, so it can quickly find a better neural network structure.
  • the structure search method based on differentiable neural network has the advantage of being fast.
  • However, since this method searches only a single individual rather than a population, only one structure is explored at a time, and the reliability is low.
  • Moreover, this method uses only the gradient information of the single individual and cannot avoid locally optimal structures; and because it encodes each possible connection and operation with a probability, the search space corresponding to the encoding is huge and the cost of optimization is high.
  • the following introduces the concept of a method and system for searching for a neural network structure based on evolutionary learning provided by the embodiments of the present application.
  • FIG. 3 provides a basic framework diagram of a neural network structure search based on evolutionary learning according to a system embodiment of the present application.
  • an embodiment of the present application provides a method and system for neural network structure search based on evolutionary learning.
  • The solution uses a population-based pairing mechanism and a second-order learning method to generate new neural network models for population update; it trains the newly generated neural network models with gradient descent using the supernet weights, and uses the trained neural network models to update the weights of the supernet model, thereby completing the automatic search of neural network structures. In this process, performance evaluation is performed on the trained paired neural network models, and the loser of the evaluation learns from the winner to generate a new neural network model for population update.
  • the solution can solve the problems of poor effect of existing neural network structure search methods and high consumption of computing resources.
  • Self-defined coding refers to coding the neural network structure according to coding rules manually set for the learning task or application.
  • The nodes in the neural network structure can each be represented by multiple real variables, and the connections and operations between any two nodes are encoded in a unified and independent manner.
  • A supernet is a directly defined neural network with the same number of nodes as the neural network models in the initialized population; it includes all connection relationships and operation relationships, and the weights corresponding to its operations are shared.
  • the structure of the supernet is fixed, and its optimal weight can be optimized by standard backpropagation.
  • the optimized weight value is applicable to all neural network models to improve the recognition performance.
  • FIG. 4 is a flowchart of a method for searching a neural network structure based on evolutionary learning according to an embodiment of the present application.
  • the flow of the method is: S101, initialize the population and the supernet; each neural network structure of the population is a structural code, and the population initialization is to randomly initialize these codes.
  • S102 for each code initialized in the population, first decode it into a neural network structure and then perform random pairing, and the two paired neural network structures respectively inherit weights from the initialized supernet.
  • S103, train and optimize the two paired, weight-inheriting neural network structures according to the learning task to obtain two neural network models, evaluate the performance of the two trained neural network models on the validation set, and obtain the loser and the winner according to the evaluation results.
  • S104 update the corresponding weight value of the supernet according to the neural network model obtained after training in S103 and the evaluation result;
  • S105, according to the evaluation result, the structure code of the loser learns from the structure code of the winner to obtain a new neural network structure code, and the structure code of the loser in the population is then replaced with the new structure code to update the population;
  • S106, judge whether the termination condition is met; if so, execute S107; otherwise, return to S102 to iteratively evolve the updated population.
  • the termination condition is that all individuals in the population participate in pairing and reach the set number of iterations;
  • S107 output the preference model in the new population.
  • the preference model is the optimal neural network that meets the needs of the learning task.
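  • The overall flow S101-S107 can be summarized with the following minimal Python sketch; every function name here (initialize_population, inherit_weights, pseudo_gradient_step, and so on) is a hypothetical placeholder for the corresponding step described above, and sequential pairing stands in for the random pairing of S102.

```python
def evolutionary_nas(initialize_population, initialize_supernet, decode,
                     inherit_weights, train, evaluate, update_supernet,
                     pseudo_gradient_step, select_best, T):
    population = initialize_population()        # S101: random structure codes
    supernet = initialize_supernet()
    for t in range(T):                          # termination: iteration budget
        for n in range(0, len(population) - 1, 2):
            code_a, code_b = population[n], population[n + 1]   # S102: pairing
            model_a = inherit_weights(decode(code_a), supernet)
            model_b = inherit_weights(decode(code_b), supernet)
            train(model_a)                      # S103: gradient-descent training
            train(model_b)
            # Smaller validation error wins.
            winner_idx = n if evaluate(model_a) <= evaluate(model_b) else n + 1
            loser_idx = n + 1 if winner_idx == n else n
            update_supernet(supernet, model_a, model_b)         # S104
            population[loser_idx] = pseudo_gradient_step(       # S105/S106
                population[loser_idx], population[winner_idx])
    return select_best(population, supernet)    # S107: preference model
```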
  • FIG. 5a is a block diagram of an embodiment of a method for searching a neural network structure based on evolutionary learning according to an embodiment of the present application. As shown in Figure 5a, the method is implemented by performing the following steps.
  • the coding rules are customized for the application or learning task, and the structure coding of the neural network structure is generated according to the coding rules, and the continuous real number intervals are respectively mapped to the neural network structure.
  • Applications or learning tasks here include classifying, segmenting, or recognizing input pictures, speech, medical images, or videos.
  • S2011, set the nodes of the neural network structure, express the connections between the set nodes as continuous real numbers, randomly connect the nodes, and encode the connections of the nodes and the operations corresponding to the connections into the structure code α of the neural network; α is a vector containing the connections between nodes and the operations on these connections, so that continuous real-number intervals are mapped to the neural network structure.
  • For example, the neural network structure can be set to have m nodes, and the continuous real-number spaces [0, 1), [1, 2), [2, 3), [3, 4) ... [m-1, m) are mapped to the m nodes; the first two nodes represent the input, and each later node randomly selects two nodes in front of it to connect to. Therefore, each node except the first two stores four variables: two are the node codes of the connected nodes, and the other two are the operation codes of the operations on those two connections.
  • Each node is thus represented by four variables, the structure code α is a vector containing these groups of four variables, and each code value is defined on a real interval whose upper and lower limits differ by 1.
  • For example, if the connection codes of a node are 0.5 and 2.3, the node is connected to node 0 and node 2.
  • N structure codes can be decoded into N neural network structures with the same number of nodes, different connection relationships and different operations.
  • Using continuous real number space to represent the neural network structure can increase the diversity of the neural network structure within the population to match the second-order learning evolution of the subsequent neural network.
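  • To make the encoding concrete, the following is a minimal decoding sketch under the layout described above (two connection codes and two operation codes per non-input node); the flat-vector format is an assumption made for illustration.

```python
def decode_structure(code):
    """Decode a continuous structure code into (src, dst, op) edges.

    For each node after the two input nodes, `code` holds four reals:
    two connection codes and two operation codes. The integer part of a
    connection code identifies the predecessor node (e.g. 0.5 -> node 0,
    2.3 -> node 2); the integer part of an operation code identifies the
    operation on that connection.
    """
    edges = []
    for i in range(0, len(code), 4):
        conn_a, conn_b, op_a, op_b = code[i:i + 4]
        dst = i // 4 + 2                  # nodes 0 and 1 are the inputs
        edges.append((int(conn_a), dst, int(op_a)))
        edges.append((int(conn_b), dst, int(op_b)))
    return edges

# Node 2 connects to nodes 0 and 1; node 3 connects to nodes 0 and 2.
print(decode_structure([0.5, 1.3, 0.7, 2.1, 0.5, 2.3, 1.2, 0.4]))
# [(0, 2, 0), (1, 2, 2), (0, 3, 1), (2, 3, 0)]
```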
  • S2014 set multiple operations between every two nodes, and set the weight of each operation.
  • the neural network structure represented by any possible structural code in the population is a sub-network of the supernet, and the sub-network is recorded as a network unit. Only one operation can be selected between every two nodes of the neural network structure in the population.
  • the first operation is the operation of the 3*3 average pooling layer
  • the second operation is the operation of the 3*3 max pooling layer
  • the third operation is the operation of 3*3 convolutional layers
  • For example, the continuous real-number space [0, 1) can be mapped to the first operation, so an operation code α ∈ [0, 1) between node 0 and node 1 represents the operation of the average pooling layer; the continuous real-number space [1, 2) can be mapped to the second operation, so an operation code α ∈ [1, 2) represents the operation of the max pooling layer; and the continuous real-number space [2, 3) can be mapped to the third operation, so an operation code α ∈ [2, 3) represents the operation of the convolutional layer.
  • The schematic diagram of the connections between any two nodes of the supernet is shown in FIG. 7.
  • The supernet does not involve structure coding; every two nodes of the supernet can include, in parallel, all possible connections required by the application or learning task, with operations including but not limited to the operations of the average pooling layer, the max pooling layer and the convolutional layer. Each operation contains its own weight information and is trained separately.
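  • A minimal sketch of this interval-to-operation mapping; the operation names are placeholders for the three candidate operations listed above.

```python
import math

# Each half-open interval [k, k+1) of the operation code maps to one operation.
OPERATIONS = ["avg_pool_3x3", "max_pool_3x3", "conv_3x3"]

def operation_from_code(op_code):
    """Map a continuous operation code to a discrete candidate operation."""
    index = math.floor(op_code)
    if not 0 <= index < len(OPERATIONS):
        raise ValueError(f"operation code {op_code} is out of range")
    return OPERATIONS[index]

print(operation_from_code(1.4))  # "max_pool_3x3" (1.4 lies in [1, 2))
```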
  • In the population initialization step, the connections between two nodes and the operation search are encoded using these coding rules, which maps independent continuous variable intervals to the connections between nodes and the operations corresponding to those connections. This reduces the search space corresponding to operation selection, improves NAS search efficiency, and converts discrete real numbers, combination numbers and probability values into continuous real numbers.
  • Then execute S202: randomly select the structure codes corresponding to two neural network structures in the population, decode them into two neural network structures for pairing, and let the paired neural network structures inherit weights from the supernet.
  • the weight includes the weight value of the operation.
  • the nth neural network structure is recorded as the first neural network structure
  • the n+1th neural network structure is recorded as the second neural network structure
  • The first neural network structure inherits from the initialized supernet the weights corresponding to the same connections as the first neural network structure and the same operations on those connections, obtaining the first neural network model; the second neural network structure inherits from the initialized supernet the weights corresponding to the same connections as the second neural network structure and the same operations on those connections, obtaining the second neural network model.
  • For example, suppose the operation weight of a convolutional layer in the supernet is 2.6.
  • The first neural network structure inherits from the initialized supernet the weight values corresponding to its matching connections and operations, so the weight of the convolutional layer operation of the first neural network structure is 2.6.
  • The second neural network structure likewise inherits from the supernet the weight values corresponding to its matching connections and operations.
  • The weights the paired neural network structures inherit from the supernet for the first time are the weight values in the initialized supernet corresponding to the same connections and the same operations on those connections; in each subsequent iteration, the weights the paired structures inherit from the supernet are the weight values in the updated supernet corresponding to the same connections and the same operations on those connections.
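  • A minimal sketch of this weight inheritance, assuming the supernet weights are stored as a mapping {(src, dst, op): weight} with one entry per parallel operation on each supernet edge; this data layout is an assumption made for illustration.

```python
def inherit_weights(structure_edges, supernet_weights):
    """Copy from the supernet only the weights of the connections and
    operations that the decoded structure actually uses."""
    return {edge: supernet_weights[edge] for edge in structure_edges}

# Edges of node 2 from the decoding example; the value 2.6 echoes the
# convolutional-layer weight used in the example above.
supernet_weights = {(0, 2, 0): 0.1, (0, 2, 1): 0.8, (1, 2, 2): 2.6}
print(inherit_weights([(0, 2, 0), (1, 2, 2)], supernet_weights))
# {(0, 2, 0): 0.1, (1, 2, 2): 2.6}
```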
  • S203, in combination with the learning task, perform one or more gradient-descent training steps on the two weight-inheriting neural network models to optimize the weight values; verify the trained first and second neural network models on the validation set to obtain the error value of each; compare the two error values, record the model with the smaller error value as the winner and the model with the larger error value as the loser, and obtain the evaluation result.
  • Specifically, stochastic gradient descent is used to train the two neural network models respectively; formula (3) gives the weight update value of the current neural network model, and formula (4) gives the optimized weight ω:
  • Δω(t) = μ·Δω(t-1) - η(t)·∇_ω L_train(ω(t-1))    (3)
  • ω(t) = ω(t-1) + Δω(t)    (4)
  • where t is the iteration number of stochastic gradient descent, Δω(t) is the weight update value of the t-th iteration, ω(t) is the optimized weight value of the t-th iteration, μ is the momentum, η(t) is the learning rate, and L_train is the error value (loss) of the neural network on the training set. The error value used for evaluation is obtained by computing the accuracy of the current neural network model on the validation set.
  • The first neural network model is trained along the gradient-descent direction on its operation weights, and the optimized weight value ω1 is computed from the calculated weight update value Δω1(t), yielding the once-optimized first neural network model.
  • The second neural network model is trained along the gradient direction on its operation weights, and the optimized weight value ω2 is computed from the calculated weight update value Δω2(t), yielding the optimized second neural network model.
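  • A minimal sketch of one such momentum update for a single weight vector, following the reconstructed formulas (3) and (4) above; the concrete values are illustrative only.

```python
import numpy as np

def sgd_momentum_step(w, delta_prev, grad, lr, momentum):
    """One stochastic-gradient-descent step with momentum:
    delta(t) = momentum * delta(t-1) - lr * grad,  w(t) = w(t-1) + delta(t).
    """
    delta = momentum * delta_prev - lr * grad
    return w + delta, delta

w = np.array([0.5, -0.2])                 # current operation weights
delta = np.zeros_like(w)                  # accumulated update, initially 0
grad = np.array([0.1, -0.3])              # gradient of the training loss
w, delta = sgd_momentum_step(w, delta, grad, lr=0.01, momentum=0.9)
print(w)                                  # [ 0.499 -0.197]
```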
  • S2032, verify the error values of the once-optimized first and second neural network models on the validation set; record the model with the smaller error value as the winner and the model with the larger error value as the loser, obtaining the evaluation result.
  • Update the weights of the connections in the supernet that match the first neural network model and of the same operations on those connections to the optimized weight value ω1 of the first neural network model, and update the weights of the connections in the supernet that match the second neural network model and of the same operations on those connections to the optimized weights of the second neural network model.
  • step S2051 is first performed, and according to the evaluation results of the two neural network models optimized after training in S204, the loser learns from the winner to obtain a new neural network model.
  • Specifically, a pseudo-gradient-based learning update is applied to the loser so that its structure code α moves closer to the structure code of the winner, and the structure code of the new neural network structure then replaces the loser in the population.
  • Pseudo-gradient-based learning and updating algorithms can include first-order gradient learning updates, second-order gradient learning updates, both first- and second-order learning updates, and even extensions with constant terms or multiples based on the gradient information. Specifically, in the paired neural network structures, let the structure code of the winner be α_w and the structure code of the loser be α_l; then the pseudo-gradient Δα_l for updating the structure code of the loser's neural network model is as follows:
  • Δα_l(t) = a·r1·(α_w(t) - α_l(t)) + b·r2·Δα_l(t-1) + c    (5)
  • where Δα_l(t) represents the pseudo-gradient value of the structure code of the t-th generation loser; r1 and r2 represent two real values randomly sampled from a uniform distribution on [0, 1]; a and b are two given real values in [-1, 1], indicating the degree of confidence in the gradients of different orders; c is a given real number in [-1, 1], indicating a bias effect on the pseudo-gradient; and Δα_l(t-1) is the historically accumulated pseudo-gradient value before the loser's structure update, with initial value Δα_l(0) = 0.
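  • A minimal sketch of the pseudo-gradient update of formula (5), treating structure codes as NumPy vectors; the values of a, b and c are assumed hyperparameters within the stated [-1, 1] ranges.

```python
import numpy as np

def pseudo_gradient_step(alpha_loser, alpha_winner, delta_prev,
                         a=0.5, b=0.3, c=0.0, rng=None):
    """Evolve the loser's structure code toward the winner's, formula (5):
    delta(t) = a*r1*(alpha_w - alpha_l) + b*r2*delta(t-1) + c.
    Returns the third structure code and the accumulated pseudo-gradient.
    """
    rng = rng or np.random.default_rng()
    r1, r2 = rng.uniform(0.0, 1.0, size=2)   # random coefficients in [0, 1]
    delta = a * r1 * (alpha_winner - alpha_loser) + b * r2 * delta_prev + c
    return alpha_loser + delta, delta

alpha_w = np.array([0.5, 1.3, 0.7, 2.1])     # winner's structure code
alpha_l = np.array([1.5, 0.3, 1.7, 0.1])     # loser's structure code
alpha_new, delta = pseudo_gradient_step(alpha_l, alpha_w,
                                        delta_prev=np.zeros(4))
```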
  • step S2052 is performed, and the structure code of the new neural network structure is used to replace the loser in the population, and the population is updated.
  • S206 judge whether the termination condition is met, if so, execute S207; otherwise, repeat steps 202-206, perform pairing and iterative learning according to the population, and continue to evolve and update the population until the set termination condition is reached.
  • The termination condition can be that all structure codes of neural network structures in the population have participated in pairing and learning.
  • When executing step S206, it can be judged that if n < N-1, the value of n is increased by 2 and execution returns to step S202; if n ≥ N-1, S207 is executed.
  • the termination condition can also be reaching a set number of iterations.
  • When executing step S206, it can be judged that if t < T and n < N-1, the value of t is increased by 1, the value of n is increased by 2, and execution returns to step S202; if t < T and n ≥ N-1, the value of t is increased by 1, the value of n is reset to 1, and execution returns to step S202; if t ≥ T, S207 is executed.
  • FIG. 8 is a flowchart of the method for updating supernet weights proposed by the application; as shown in FIG. 8, it includes:
  • S302 Randomly pair the decoded neural network structures in the population, and the paired two neural network structures inherit the connection with the same structure and the weight corresponding to the same operation corresponding to the connection from the initialized supernet to generate two neural networks Model.
  • The weight values the paired neural network structures inherit from the supernet for the first time are the weight values of the same connections and the same corresponding operations in the initialized supernet; in each subsequent iteration, the weight values the paired structures inherit from the supernet are those of the same connections and the same corresponding operations in the updated supernet.
  • S303, perform one or more gradient-descent training steps on the two neural network models that inherited weights in S302.
  • the loser and the winner are obtained by calculating the error values of the two neural network models on the validation set, the neural network model with the smaller error value is the winner, and the neural network model with the larger error value is the loser.
  • When the two neural network models contain the same connection with the same corresponding operation, the weight value of that connection and operation in the supernet is updated to the weight value of the winner.
  • Otherwise, the weight values in the supernet of the operations corresponding to the connections of the two neural network models are updated to the respective optimized weight values of the two models.
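  • A minimal sketch of this update rule, reusing the {(src, dst, op): weight} supernet layout assumed earlier; `winner_weights` and `loser_weights` stand for the trained weight dictionaries of the two models.

```python
def update_supernet(supernet_weights, winner_weights, loser_weights):
    """Write trained weights back into the supernet: on edges shared by
    both models the winner's weight is kept; edges used by only one
    model simply write back that model's optimized weight."""
    shared = set(winner_weights) & set(loser_weights)
    for edge, w in loser_weights.items():
        if edge not in shared:              # loser-only connection/operation
            supernet_weights[edge] = w
    for edge, w in winner_weights.items():  # includes all shared edges
        supernet_weights[edge] = w
    return supernet_weights
```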
  • In addition, the present application proposes a structure update mechanism based on population pairing, which realizes non-repetitive pairing of the neural network structures in the population for competition; the loser performs second-order learning toward the winner based on the pseudo-gradient, generating a new individual to replace the original loser.
  • FIG. 9 is a flowchart of the population-pairing-based structure update method proposed by the present application; as shown in FIG. 9, it includes:
  • Reference can be made to steps S2011-S2012.
  • Here n and n+1 are the numbers of the two paired neural network structures; the n-th neural network structure is recorded as the first neural network structure, and the (n+1)-th neural network structure is recorded as the second neural network structure.
  • the paired two neural network structures inherit weight values from the supernet.
  • The weight values the paired neural network structures inherit from the supernet for the first time are the weight values of the corresponding connections and the same operations in the initialized supernet; in each subsequent iteration, the paired neural network structures inherit the weight values of the updated supernet.
  • If n ≥ N-1, the iteration ends and S508 is executed; if n < N-1, the value of n is increased by 2 and execution returns to step S402.
  • In this way, the continuous space is mapped to the neural network structure so that continuous mathematical operations can be performed on the structure, endowing the algorithm with better global search ability. Through the population-based paired second-order learning structure update, the optimal solution can be found faster; at the same time, a set of solutions can finally be found based on the characteristics of the population, providing decision makers with multiple choices while improving the reliability of the algorithm. Weight inheritance and updating of the supernet speed up model evaluation and significantly reduce the computational cost and running time required for searching neural networks.
  • An embodiment of the present application provides a system for searching neural network structures based on evolutionary learning.
  • The system includes: a population initialization module 801, an individual pairing module 802, a training evaluation module 803, a supernet weight update module 804, a population update module 805 and a model output module 806.
  • the system initializes the population through the population initialization module 801, wherein each neural network structure in the population is a structure code, and the structure code uses a continuous real number interval to map the connections and corresponding operations between the nodes of the neural network structure.
  • The individual pairing module 802 randomly selects two structure codes in the population and decodes them into two neural network structures for pairing; the two paired neural network structures inherit the corresponding weights from the supernet respectively to obtain the first neural network model and the second neural network model. The training evaluation module 803 trains and evaluates the two models to obtain the winner and the loser; the supernet weight update module 804 updates the supernet according to the trained first and second neural network models. The population update module 805 calculates the pseudo-gradient value between the structure code of the loser and the structure code of the winner, evolves the structure code of the loser toward the structure code of the winner based on the pseudo-gradient value to obtain the structure code of the third neural network structure, and replaces the structure code of the neural network structure corresponding to the loser in the population with it. The individual pairing module 802 then performs iterative evolution on the updated population.
  • The population initialization module 801 can also generate N neural network structures with the same number of nodes by manual coding according to the self-defined coding rules; through this coding, continuous real-number intervals are mapped to the connections between the nodes of a single neural network structure and the corresponding discrete operations, where N is a natural number.
  • the system for searching neural network structures based on evolutionary learning further includes a supernet initialization module, which sets up a supernet according to a learning task, and the supernet includes N network units and a set of all operations.
  • The individual pairing module 802 inherits, for the first neural network structure, the first weights from the supernet corresponding to the same connections as the first neural network structure and the same operations on those connections, obtaining the first neural network model; and inherits, for the second neural network structure, the second weights from the supernet corresponding to the same connections as the second neural network structure and the same operations on those connections, obtaining the second neural network model.
  • The training evaluation module 803, in combination with the learning task, trains the weight values of the first neural network model at least once using stochastic gradient descent to obtain an optimized first neural network model, and likewise trains the second neural network model using stochastic gradient descent to obtain an optimized second neural network model; evaluates the optimized first and second neural network models on the validation set; calculates the error value of each model; compares the error values of the first and second neural network models; records the model with the smaller error value as the winner and the model with the larger error value as the loser; and obtains the evaluation result.
  • The supernet weight update module 804 takes the winner's operation weight as the supernet weight when, between some two nodes, the first and second neural networks have the same connection with the same operation on it; when the first and second neural networks have different node connections, or the same connection but different corresponding operations, it takes the weight of the first neural network model as the weight in the supernet of the connection matching the first neural network and of the same operation on that connection, and the weight of the second neural network model as the weight in the supernet of the connection matching the second neural network and of the same operation on that connection, obtaining the updated supernet.
  • The population update module calculates the difference between the structure code value of the loser and the structure code value of the winner, multiplies the difference by a random coefficient, and accumulates it with the historical pseudo-gradient scaled by another random coefficient to obtain the pseudo-gradient value for the loser's structure code update; it then sums the structure code value of the loser and the pseudo-gradient value to obtain the structure code of the third neural network structure.
  • The model output module 806 judges whether all neural network structures in the population have participated in pairing; if the judgment result is "No", it returns to the individual pairing module 802 to iteratively evolve the updated population; if the judgment result is "Yes", it outputs the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  • Alternatively, the model output module 806 sets the number of iterations as T, where T is a natural number greater than 0, and judges whether the current number of executions is less than T; if the judgment result is "Yes", it returns to the individual pairing module 802 to iteratively evolve the updated population; if the judgment result is "No", it outputs the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  • the model output module 806 may also inherit the corresponding weight values from the updated supernet when the number of execution iterations is greater than 1.
  • An embodiment of the present application provides a system for updating a supernet.
  • the system includes: a supernet initialization module 901 that randomly initializes a supernet, where the supernet includes N network units and the set of all operations; and an individual pairing module 802 that randomly selects two neural network structures in the population for pairing, where the two paired neural network structures respectively inherit the corresponding weights from the supernet to obtain a first neural network model and a second neural network model; the first and second neural network models are trained, the trained first and second neural network models are evaluated, and a winner and a loser are obtained; under the condition that two nodes have the same connection in both the first and second neural networks and the connection carries the same operation, the weight of the winner's operation is taken as the weight of the supernet.
  • under the condition that the first and second neural networks have different node connections, or the same node connection but corresponding to different operations, the weights of the first neural network model are taken as the weights of the connections in the supernet identical to those of the first neural network structure and of the identical operations corresponding to those connections, and the weights of the second neural network model are taken as the weights of the connections in the supernet identical to those of the second neural network structure and of the identical operations corresponding to those connections; the updated supernet is thereby obtained.
  • An embodiment of the present application provides an electronic device 1000, as shown in FIG. 12, including a processor 1001 and a memory 1002; the processor 1001 is configured to execute computer-executable instructions stored in the memory 1002, and by running these instructions the processor 1001 performs the method for searching a neural network structure based on evolutionary learning described in any of the foregoing embodiments.
  • An embodiment of the present application provides a storage medium, including a readable storage medium and a computer program stored in the readable storage medium, where the computer program is used to implement the method for searching a neural network structure based on evolutionary learning described in any of the foregoing embodiments.
  • computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, or magnetic tape), optical discs (e.g., compact discs (CDs) and digital versatile discs (DVDs)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
  • various storage media described herein can represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium” may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, or through indirect coupling or communication connections between devices or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the parts contributing to the prior art, or parts of those technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, an access network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.


Abstract

A neural architecture search method and system based on evolutionary learning. The method comprises: S101, initializing a population, wherein each neural architecture in the population is an architecture code; S102, randomly selecting two architecture codes in the population, decoding the two architecture codes into two neural architectures for pairing, and inheriting corresponding weights from a supernet, so as to obtain first and second neural network models; S103, evaluating the first and second neural network models which have been trained, so as to obtain a winner and a loser; S104, updating the supernet according to the trained first and second neural network models; S105, calculating a pseudo-gradient value, such that the loser learns from the winner, and obtaining an architecture code of a third neural architecture; S106, replacing, in the population, the architecture code of the loser with the architecture code of the third neural architecture, and updating the population; and S107, outputting an optimal neural network model from the population, and performing iterative evolution on the updated population.

Description

A neural network structure search method and system based on evolutionary learning

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a method and system for searching neural network structures based on evolutionary learning.

Background

As learning tasks become more complex, the design of neural network models becomes increasingly complicated. Designing a high-performance neural network requires extensive expertise and repeated manual experiments, which greatly increases computing resources and time costs, whereas using algorithms to automatically search for neural network structure models can save labor costs and optimize the models. Neural architecture search (NAS) is a technology for automatically designing neural networks, enabling a specified computer algorithm to automatically search out a preferred neural network model according to the deep learning task. NAS is one of the hotspots in the field of automatic machine learning (Auto-ML): by designing cost-effective search methods, neural network structures with strong generalization capability and friendly hardware requirements can be obtained automatically, greatly freeing up researchers' creativity.

The three main components of the core design decisions of a NAS method are: search space definition, search strategy, and evaluation of search targets. A neural network usually consists of many nodes. When searching for a neural network structure, the nodes may be combined in a completely arbitrary manner: each node can be connected to any other node, and different operations between nodes can be chosen. The search space grows exponentially with the number of nodes, so the search space is huge and the search is very slow. Because the search space involved in NAS is huge and its performance evaluation often involves model training, resource consumption is high.
Summary of the Invention

In order to solve the above problems, embodiments of the present application provide a method, system, electronic device, and storage medium for searching a neural network structure based on evolutionary learning.

In a first aspect, the present application provides a method for searching a neural network structure based on evolutionary learning. The method comprises: S101, initializing a population, where the population is a set of structure codes of multiple different neural network structures, and a structure code uses continuous real-number intervals to indicate the mapping relationship of the connections and operations between any two nodes of a neural network structure; S102, randomly selecting two structure codes in the population, decoding the two structure codes to obtain two neural network structures, and pairing the two neural network structures; having the two neural network structures respectively inherit the corresponding weights from a supernet to obtain a first neural network model and a second neural network model, where the supernet includes a set of multiple operations and the weight of each operation; S103, training the first and second neural network models respectively to obtain trained first and second neural network models; inputting labeled speech, video, or image samples into the trained first and second neural network models, and calculating the error values between the output results and the labels to obtain a winner and a loser, where the error value of the winner is smaller than that of the loser; S104, updating the supernet according to the trained first and second neural network models; S105, calculating a pseudo-gradient value between the structure code of the loser and the structure code of the winner, and evolving the structure code of the loser toward the structure code of the winner based on the pseudo-gradient value to obtain a third neural network structure code, where the pseudo-gradient is the gradient of the structure-code update; S106, replacing the structure code of the neural network structure corresponding to the loser in the population with the third neural network structure code to obtain an updated population; S107, outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
This embodiment represents the neural network structure in a continuous real-number space, which reduces the search space corresponding to operation selection, improves NAS search efficiency, and increases the diversity of neural network structures within the population to match the second-order learning evolution of the subsequent neural networks; it can solve the problems of poor performance and high computing-resource consumption of existing neural network structure search methods.

In an embodiment, outputting the optimal neural network model in the updated population to complete the search of the neural network structure includes: when the termination condition is satisfied, outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.

In this embodiment, the population is iteratively evolved subject to a termination condition, which improves the reliability of the neural network structure search.

In an embodiment, outputting the optimal neural network model in the updated population to complete the search of the neural network structure includes: when the termination condition is not satisfied, returning to S102 and iteratively evolving the updated population until the termination condition is satisfied, and then outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.

This embodiment finds a set of optimized neural network models iteratively based on the characteristics of the population, which can provide decision makers with multiple choices.
In an embodiment, having the two neural network structures respectively inherit the corresponding weights from the supernet to obtain the first neural network model and the second neural network model includes: the first neural network structure inheriting, from the supernet, first weights corresponding to the connections identical to those of the first neural network structure and to the identical operations corresponding to those connections, to obtain the first neural network model; and the second neural network structure inheriting, from the supernet, second weights corresponding to the connections identical to those of the second neural network structure and to the identical operations corresponding to those connections, to obtain the second neural network model.

By inheriting the weights of the supernet, this embodiment speeds up model acquisition and significantly reduces the computational cost and running time required for searching neural networks; during iteration, the weights that a neural network structure inherits from the supernet are already optimized, further reducing the required computational cost and running time.

In an embodiment, training the first and second neural network models respectively to obtain trained first and second neural network models includes: training the weight values of the first neural network model at least once using stochastic gradient descent to obtain the optimized first neural network model; and training the weight values of the second neural network model at least once using stochastic gradient descent to obtain the optimized second neural network model.

In this embodiment, first and second neural network models with optimized weight values are obtained by training the first and second neural network models.

In an embodiment, inputting the labeled speech, video, or image samples into the trained first and second neural network models and calculating the error values between the output results and the labels to obtain the winner and the loser includes: inputting the labeled speech, video, or image samples into the trained first neural network model and the trained second neural network model respectively; calculating, from the first output result of the trained first neural network model, a first error value between the first output result and the label of the sample; calculating, from the second output result of the trained second neural network model, a second error value between the second output result and the label of the sample; and comparing the first error value with the second error value, taking the neural network model with the smaller error value as the winner and the one with the larger error value as the loser.

In this embodiment, the paired first and second neural network models are trained with labeled samples and the performance of the trained models is evaluated, which speeds up finding the optimal model.
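A minimal PyTorch sketch of this pairwise train-then-compare step; the function name, batch layout, and single-SGD-step schedule are illustrative assumptions, not the application's API:

```python
import torch

def train_and_compare(model_a, model_b, train_batch, val_batch, lr=0.01):
    """Train each paired model with at least one SGD step, then compare
    validation errors: the model with the smaller error is the winner."""
    criterion = torch.nn.CrossEntropyLoss()
    x, y = train_batch
    for model in (model_a, model_b):
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        opt.zero_grad()
        criterion(model(x), y).backward()   # error between output and label
        opt.step()                          # one stochastic gradient step
    with torch.no_grad():
        xv, yv = val_batch
        err_a = criterion(model_a(xv), yv).item()
        err_b = criterion(model_b(xv), yv).item()
    winner, loser = (model_a, model_b) if err_a <= err_b else (model_b, model_a)
    return winner, loser
```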
In an embodiment, updating the supernet according to the trained first and second neural network models includes: under the condition that two nodes of the first and second neural network models contain the same connection and the operation corresponding to that connection is the same, taking the weight of the winner as the weight of the corresponding operation in the supernet, and updating the supernet.

This embodiment enables the operation weights of the supernet to be optimized synchronously, which helps speed up the search; updating the supernet weights can significantly reduce the computational cost and running time required for searching neural networks.

In an embodiment, updating the supernet according to the trained first and second neural network models includes: under the condition that the connections between two nodes of the first and second neural network models, or the operations corresponding to those connections, are not the same, taking the weights of the first neural network model as the weights, in the supernet, of the connections identical to those of the first neural network structure and of the identical operations corresponding to those connections; taking the weights of the second neural network model as the weights, in the supernet, of the connections identical to those of the second neural network structure and of the identical operations corresponding to those connections; and updating the supernet.

This embodiment enables the operation weights of the supernet to be optimized synchronously, which helps speed up the search; updating the supernet weights can significantly reduce the computational cost and running time required for searching neural networks.
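Read together, the two update cases above amount to a write-back of trained weights into the supernet's shared table. A sketch under the assumption that weights are keyed by (connection, operation) pairs (the dict layout and names are ours, not the application's data structures):

```python
def update_supernet(supernet_w, weights_a, weights_b, winner_w):
    """supernet_w: dict (edge, op) -> weight tensor; weights_a/weights_b:
    the two trained models' weights under the same keys; winner_w: the
    winner's weight dict (one of the two)."""
    shared = set(weights_a) & set(weights_b)   # same connection, same operation
    for key in shared:
        supernet_w[key] = winner_w[key]        # winner overwrites shared entries
    for key in set(weights_a) - shared:        # entries unique to model A
        supernet_w[key] = weights_a[key]
    for key in set(weights_b) - shared:        # entries unique to model B
        supernet_w[key] = weights_b[key]
    return supernet_w
```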
In an embodiment, calculating the pseudo-gradient value between the structure code of the loser and that of the winner, and evolving the loser's structure code toward the winner's structure code based on the pseudo-gradient value to obtain the structure code of the third neural network structure, includes: calculating the difference between the structure code value of the loser and that of the winner, multiplying the difference by a random coefficient, and accumulating it with the historical pseudo-gradient scaled by another random coefficient to obtain the value of the pseudo-gradient for updating the loser's structure code; and summing the loser's structure code value with the pseudo-gradient value to obtain the structure code of the third neural network structure, thereby evolving the loser's structure code toward the winner's structure code.
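Written out, one plausible reading of this update is the following (the symbols $r_1$, $r_2$, and $\Delta_t$ are our notation, not the application's):

$$\Delta_{t+1} = r_1\,\Delta_t + r_2\,\big(\alpha_{\text{win}} - \alpha_{\text{lose}}\big), \qquad \alpha_{\text{lose}}^{(t+1)} = \alpha_{\text{lose}}^{(t)} + \Delta_{t+1},$$

where $\alpha$ denotes structure codes, $\Delta_t$ is the accumulated historical pseudo-gradient, and $r_1, r_2$ are random coefficients. In code, again as a hedged sketch:

```python
import numpy as np

rng = np.random.default_rng()

def evolve_loser(loser_code, winner_code, history):
    """Move the loser's structure code toward the winner's. `history`
    is the loser's accumulated pseudo-gradient from earlier rounds;
    r1/r2 are fresh random coefficients."""
    r1, r2 = rng.random(), rng.random()
    pseudo_grad = r1 * history + r2 * (winner_code - loser_code)
    return loser_code + pseudo_grad, pseudo_grad  # third code, new history
```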
Based on the evaluation result of the paired first and second neural network models, this embodiment lets the loser perform a structure evolution update by learning from the winner, so that the optimal neural network model can be found more quickly.

In an embodiment, the termination condition includes whether all structure codes in the population have participated in pairing, or whether a set number of iterations has been reached.

In this embodiment, the population is comprehensively and iteratively evolved by setting termination conditions, which improves the reliability of the neural network structure search.
In a second aspect, the present application provides a search system for a neural network structure based on evolutionary learning. The system includes: a population initialization module for initializing a population, where the population is a set of structure codes of multiple different neural network structures and a structure code uses continuous real-number intervals to indicate the mapping relationship of the connections and operations between any two nodes of a neural network structure; an individual pairing module for randomly selecting two structure codes in the population, decoding the two structure codes to obtain two neural network structures, and pairing the two neural network structures; a weight inheritance module for having the two neural network structures respectively inherit the corresponding weights from a supernet to obtain a first neural network model and a second neural network model, where the supernet includes a set of multiple operations and the weight of each operation; a training module for training the first and second neural network models respectively to obtain trained first and second neural network models; an evaluation module for inputting labeled speech, video, or image samples into the trained first and second neural network models and calculating the error values between the output results and the labels to obtain a winner and a loser, where the error value of the winner is smaller than that of the loser; a supernet weight update module for updating the supernet according to the trained first and second neural network models; a structure code evolution module for calculating a pseudo-gradient value between the structure code of the loser and the structure code of the winner and evolving the loser's structure code toward the winner's structure code based on the pseudo-gradient value to obtain a third neural network structure code, where the pseudo-gradient is the gradient of the structure-code update; a population update module for replacing the structure code of the neural network structure corresponding to the loser in the population with the third neural network structure code to obtain an updated population; and a model output module for outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.

In an embodiment, the model output module is configured to output the optimal neural network model in the updated population when the termination condition is satisfied, thereby completing the search of the neural network structure.

In an embodiment, the model output module is configured to return to S102 when the termination condition is not satisfied, iteratively evolve the updated population until the termination condition is satisfied, and then output the optimal neural network model in the updated population, thereby completing the search of the neural network structure.

In an embodiment, the weight inheritance module is configured to: have the first neural network structure inherit, from the supernet, the first weights corresponding to the connections identical to those of the first neural network structure and to the identical operations corresponding to those connections, to obtain the first neural network model; and have the second neural network structure inherit, from the supernet, the second weights corresponding to the connections identical to those of the second neural network structure and to the identical operations corresponding to those connections, to obtain the second neural network model.

In an embodiment, the training module is configured to: train the weight values of the first neural network model at least once using stochastic gradient descent to obtain the optimized first neural network model; and train the weight values of the second neural network model at least once using stochastic gradient descent to obtain the optimized second neural network model.

In an embodiment, the evaluation module is configured to: input labeled speech, video, or image samples into the trained first neural network model and the trained second neural network model respectively; calculate, from the first output result of the trained first neural network model, a first error value between the first output result and the label of the sample; calculate, from the second output result of the trained second neural network model, a second error value between the second output result and the label of the sample; and compare the first error value with the second error value, recording the neural network model with the smaller error value as the winner and the one with the larger error value as the loser.

In an embodiment, the supernet weight update module is configured to: under the condition that two nodes of the first and second neural network models contain the same connection and the operation corresponding to that connection is the same, take the weight of the winner as the weight of the corresponding operation in the supernet and update the supernet.

In an embodiment, the supernet weight update module is configured to: under the condition that the connections between two nodes of the first and second neural network models, or the operations corresponding to those connections, are not the same, take the weights of the first neural network model as the weights, in the supernet, of the connections identical to those of the first neural network structure and of the identical operations corresponding to those connections; take the weights of the second neural network model as the weights, in the supernet, of the connections identical to those of the second neural network structure and of the identical operations corresponding to those connections; and update the supernet.

In an embodiment, the structure code evolution module is configured to: calculate the difference between the structure code value of the loser and that of the winner, multiply the difference by a random coefficient, and accumulate it with the historical pseudo-gradient scaled by another random coefficient to obtain the value of the pseudo-gradient for updating the loser's structure code; and sum the loser's structure code value with the pseudo-gradient value to obtain the structure code of the third neural network structure, thereby evolving the loser's structure code toward the winner's structure code.

In an embodiment, the termination condition includes whether all structure codes in the population have participated in pairing, or whether a set number of iterations has been reached.
In a third aspect, the present application provides an electronic device including a memory and a processor; the processor is configured to execute computer-executable instructions stored in the memory, and by running these instructions the processor performs the method for searching a neural network structure based on evolutionary learning described in any of the above embodiments.

In a fourth aspect, the present application provides a storage medium, including a readable storage medium and a computer program stored in the readable storage medium, where the computer program is used to implement the method for searching a neural network structure based on evolutionary learning described in any of the above embodiments.

The method, system, electronic device, and storage medium for searching a neural network structure based on evolutionary learning provided by the embodiments of the present application map a continuous space onto the neural network structure so that continuous mathematical operations can be performed on the structure, giving the algorithm better global search capability; the structure update method based on population pairing and second-order learning finds the optimal solution faster; at the same time, owing to the characteristics of the population, a set of solutions can ultimately be found, providing decision makers with multiple choices and improving the reliability of the algorithm; and the weight inheritance and update of the supernet speed up model evaluation, significantly reducing the computational cost and running time required for searching neural networks.
Description of the Drawings
FIG. 1 is a schematic diagram of an application environment for neural network structure search provided by an embodiment of the present application;
FIG. 2 is a flowchart of the population-based neural network structure search proposed in the first scheme;
FIG. 3 is a basic framework diagram of neural network structure search based on evolutionary learning according to a system embodiment of the present application;
FIG. 4 is a general flowchart of a method for neural network structure search based on evolutionary learning provided by an embodiment of the present application;
FIG. 5a is a block diagram of a specific embodiment of a method for neural network structure search based on evolutionary learning provided by an embodiment of the present application;
FIG. 5b is a flowchart of population initialization;
FIG. 5c is a flowchart of supernet initialization;
FIG. 6 is a schematic diagram of operations between two nodes in a method for neural network structure search based on evolutionary learning provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the connections and operations between two nodes of a supernet in a method for neural network structure search based on evolutionary learning provided by an embodiment of the present application;
FIG. 8 is a flowchart of a supernet weight update method provided by an embodiment of the present application;
FIG. 9 is a flowchart of a population-pairing-based structure update method provided by an embodiment of the present application;
FIG. 10 is a system block diagram of neural network structure search based on evolutionary learning provided by an embodiment of the present application;
FIG. 11 is a block diagram of a supernet update system provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
In the following description, reference is made to "some embodiments," which describe a subset of all possible embodiments; it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.

In the following description, the terms "first/second/third, etc." or module A, module B, module C, etc. are only used to distinguish similar objects and do not represent a specific ordering of the objects; it is understood that, where permitted, the specific order or sequence may be interchanged so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.

In the following description, the reference numerals indicating steps, such as S110, S120, etc., do not mean that the steps must be executed in that order; where permitted, the order of the steps may be interchanged, or the steps may be executed simultaneously.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.

The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
Neural architecture search (NAS) technology is applicable to a wide range of scenarios. For example, in the field of image recognition, using algorithms to automatically design neural network structure models can achieve better performance than manually designed neural network structures. As another example, in the field of medical image processing, neural network structure search is used to generate neural network structure models that process magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound data to determine whether a patient has a disease. The present application is applicable to all scenarios involving image classification and segmentation, as well as to other scenarios related to video processing.

FIG. 1 is a schematic diagram of an application environment for neural network structure search provided by an embodiment of the present application. As shown in FIG. 1, the application scenario includes at least one background server 10 and a smart device 11. The smart device 11 can be connected to the background server 10 through the Internet; the smart device 11 may include smart devices capable of outputting medical images, speech, video, or pictures, such as magnetic resonance imagers, smart speakers, smart cameras, and smartphones.

The smart device 11 is provided with a picture, speech, medical image, or video collection apparatus, and sends the collected picture, speech, medical image, or video data to the background server 10, so that the background server 10 can feed the pictures, speech, medical images, or video into a neural network structure model generated by neural network structure search for classification, segmentation, or recognition.

NAS is a subset of hyperparameter optimization. Customized NAS methods are not actually fully automated; they rely on neural network structures specifically hand-coded for the application or learning task as the starting point of the search. In general, the objective of a neural network structure search method is defined as:
$$\min_{\alpha}\ \mathcal{L}_{val}\big(\omega^{*}(\alpha),\ \alpha\big) \qquad \text{s.t.} \qquad \omega^{*}(\alpha) = \arg\min_{\omega}\ \mathcal{L}_{train}(\omega,\ \alpha) \tag{1}$$

In formula (1), $\alpha$ is defined as the structure code, $\omega$ as the weight information, and $\omega^{*}$ as the corresponding optimal weights; $\mathcal{L}_{train}(\omega, \alpha)$ is the loss value on the training set and $\mathcal{L}_{val}(\omega^{*}(\alpha), \alpha)$ is the loss value on the validation set. Formula (1) expresses that neural network structure search needs to find a structure code $\alpha$ that, under the optimal weights $\omega^{*}$, makes the validation loss $\mathcal{L}_{val}(\omega^{*}(\alpha), \alpha)$ as small as possible.
Current NAS research can be divided into three main categories: population-based neural network structure search, reinforcement-learning-based neural network structure search, and differentiable neural network structure search.

The first scheme is population-based neural network structure search, one of the most common approaches in current neural network structure search research. The general flow is to initialize a population, select parent individuals, update the parents' topologies with operators such as crossover and mutation to obtain offspring topologies, and finally apply the idea of "survival of the fittest" to eliminate individuals with low fitness and retain the better ones. By iterating this process, the population continuously evolves toward a global/local optimal solution.

FIG. 2 shows the population-based neural network structure search proposed in the first scheme. As shown in FIG. 2, the steps include: initializing a population, which is a set of individuals containing different neural network structures; training the individuals of the different neural network structures and taking their accuracy on the validation set as each individual's fitness; judging whether the termination condition set by the algorithm is satisfied; if the judgment result is "no", generating offspring neural network structures from the parent structures through different crossover and mutation operators; if the judgment result is "yes", outputting the preferred model; training the offspring neural network structures and taking their accuracy on the validation set as the offspring fitness values; selecting individuals of at least one neural network structure from the parent and offspring structures according to the offspring fitness values; the set of individuals containing the selected structures forms the new population, which is then output.

The main difference between population-based structure search algorithms lies in how the parent neural network structures generate offspring structures through different crossover and mutation operators, of which there can be many designs. For example, the AmoebaNet algorithm defines a macro template of the neural network structure and designs two mutation operators: one that changes the operations between nodes and one that changes the connections between nodes. The Large-Scale Evolution algorithm does not define a macro template and proposes eleven different mutation operators, including operators for changing the learning rate, inserting a convolutional layer, removing a convolutional layer, changing the number of channels, and so on. By executing different mutation operators, this method can automatically evolve complex neural network structures from simple ones.

Although population-based neural network structure search has the advantages of being suitable for parallelism and highly reliable, the large number of individuals in the population whose fitness must be evaluated consumes considerable GPU resources and time; for example, AmoebaNet needs 3150 GPU-days to complete the search task. It is therefore difficult for this method to strike a balance between structure search accuracy and resource consumption.
Differentiable Architecture Search (DARTS) maps the neural network into a continuous space and solves it by gradient descent, obtaining the structure and the weights of the neural network at the same time. Specifically, the gradient information of the structure is:

$$\nabla_{\alpha}\ \mathcal{L}_{val}\big(\omega - \xi\,\nabla_{\omega}\mathcal{L}_{train}(\omega,\ \alpha),\ \alpha\big) \tag{2}$$

In formula (2), $\alpha$ is defined as the structure of the neural network model, $\omega$ denotes the current weights, $\omega^{*}(\alpha)$ the corresponding optimal weights, and $\xi$ the learning rate of one step of the inner optimization; $\mathcal{L}_{val}$ is the loss value on the validation set. The method approximates $\omega^{*}(\alpha)$ with $\omega$ after a single training step instead of training $\omega$ to convergence. It searches the neural network structure along the direction of the gradient and can therefore quickly find a good neural network structure.
Although the differentiable neural network structure search method has the advantage of speed, it performs only a single-individual search; compared with a population, only a single structure is found each time, so reliability is low. Moreover, the method only uses the gradient information of that single individual and cannot avoid locally optimal structures; and because it encodes every possible connection and operation with a probability, the search space corresponding to the encoding is huge and the optimization cost is high.

The following introduces the concept of the method and system for neural network structure search based on evolutionary learning provided by the embodiments of the present application.

FIG. 3 is a basic framework diagram of neural network structure search based on evolutionary learning according to a system embodiment of the present application. As shown in FIG. 3, an embodiment of the present application provides a method and system for neural network structure search based on evolutionary learning. On the basis of custom coding of the neural network structure, the scheme uses a population-based pairing mechanism and a second-order learning method to generate new neural network models for population update; the newly generated neural network models are trained by gradient descent using the supernet weights, and the trained neural network models are used to update the weights of the supernet, thereby completing the automatic search of the neural network structure. In this process, the performance of the trained paired neural network models is evaluated, the loser of the evaluation learns from the winner, and a new neural network model is generated for population update. The scheme can solve the problems of poor performance and high computing-resource consumption of existing neural network structure search methods.

In the above scheme, custom coding means coding the neural network structure according to coding rules manually set for the learning task or application. In the present application, the nodes in the neural network structure can each be represented by multiple real-valued variables, and the connection between any two nodes and its operation are unified into an independent code.

In the above scheme, the supernet is a directly defined neural network that has the same number of nodes as the neural network models in the initialized population and includes all connection relationships and operation relationships, with the weights corresponding to its operations being shared. The structure of the supernet is fixed, and its optimal weights can be optimized by standard backpropagation; the optimized weight values are applicable to all the neural network models to improve recognition performance.
FIG. 4 is a flowchart of a method for neural network structure search based on evolutionary learning according to an embodiment of the present application. As shown in FIG. 4, the flow of the method is: S101, initialize the population and the supernet; each neural network structure of the population is a structure code, and population initialization randomly initializes these codes. S102, for each code initialized in the population, first decode it into a neural network structure and then randomly pair; the two paired neural network structures respectively inherit weights from the initialized supernet. S103, train and optimize the two paired, weight-inheriting neural network structures according to the learning task to obtain neural network models, evaluate the performance of the two trained neural network models on the validation set, and obtain the loser and the winner from the evaluation results. S104, update the corresponding weight values of the supernet according to the neural network models obtained after training in S103 and the evaluation results. S105, according to the evaluation results, let the loser's structure code learn from the winner's structure code to obtain the structure code of a new neural network, then replace the loser's structure code in the population with the new structure code to update the population. S106, judge whether the termination condition is satisfied; if so, execute S107; otherwise return to S102 and iteratively evolve the updated population. The termination condition is that all individuals in the population have participated in pairing and the set number of iterations has been reached. S107, output the preferred model in the new population, where the preferred model is the neural network that is optimal for the learning task.
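A minimal Python sketch of the S101–S107 loop; `decode`, `inherit_weights`, `train`, `validation_error`, `update_supernet`, `evolve_loser`, and `best_model` are stand-ins for the steps described in the text, not an API defined by the application:

```python
import random

def evolutionary_nas(population, supernet, max_iters):
    """Structural sketch of the evolutionary-learning NAS loop."""
    history = [0.0] * len(population)            # per-individual pseudo-gradient
    for _ in range(max_iters):                   # S106: iteration-count termination
        i, j = random.sample(range(len(population)), 2)  # S102: random pairing
        net_i = inherit_weights(decode(population[i]), supernet)
        net_j = inherit_weights(decode(population[j]), supernet)
        train(net_i)                             # S103: train both models
        train(net_j)
        if validation_error(net_i) <= validation_error(net_j):
            win, lose, win_net = i, j, net_i     # smaller error wins
        else:
            win, lose, win_net = j, i, net_j
        update_supernet(supernet, net_i, net_j, win_net)   # S104
        population[lose], history[lose] = evolve_loser(    # S105, S106
            population[lose], population[win], history[lose])
    return best_model(population, supernet)      # S107: output preferred model
```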
FIG. 5a is a block diagram of an embodiment of a method for neural network structure search based on evolutionary learning according to an embodiment of the present application. As shown in FIG. 5a, the method is implemented by performing the following steps.

S201, initialize the population and the supernet.

Specifically, in the population initialization step, coding rules are customized for the application or learning task, the structure codes of the neural network structures are generated according to the coding rules, and continuous real-number intervals are mapped into the neural network structures. The application or learning task here includes classifying, segmenting, or recognizing input pictures, speech, medical images, or video.

The population initialization flow is shown in FIG. 5b. S2011, set the nodes of the neural network structure, represent the connections between the set nodes as continuous real numbers, connect the nodes randomly, and encode the node connections and the operations corresponding to those connections into the structure code α of the neural network; α is set as a vector that includes the connections between nodes and the operations on these connections, thereby mapping continuous real-number intervals into the neural network structure.
For example, the neural network structure can be set to have m nodes, and the continuous real-number intervals [0, 1), [1, 2), [2, 3), [3, 4), ..., [m-1, m) of [0, m) are mapped to the m nodes; the first two nodes represent inputs, and each subsequent node randomly selects two of the nodes preceding it to connect to. Therefore, every node except the first two needs to store four variables: two variables represent the node codes of the connected nodes, and the other two represent the operation codes of the operations carried by the two connections. Each node is thus represented by four variables, the structure code α is a vector containing multiple such groups of four variables, and the value of each operation code is defined in a real interval whose upper and lower bounds differ by 1. Illustratively, suppose the neural network has four nodes, m = 4, numbered 0, 1, 2, 3; the coding ranges of the nodes are [0, 1), [1, 2), [2, 3), [3, 4). For example, if the connection codes of a certain node are 0.5 and 2.3, the node is connected to node 0 and node 2.
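A sketch of this encoding under the layout just described; the function names and the NumPy representation are our assumptions, not the application's implementation:

```python
import numpy as np

rng = np.random.default_rng()

def init_structure_code(m, num_ops=3):
    """One random structure code for an m-node network: every node
    k >= 2 stores four reals -- two connection codes in [0, k) naming
    two earlier nodes, and two operation codes in [0, num_ops)."""
    code = []
    for k in range(2, m):
        code += [rng.uniform(0, k), rng.uniform(0, k)]              # connections
        code += [rng.uniform(0, num_ops), rng.uniform(0, num_ops)]  # operations
    return np.array(code)

def decode_connections(code, m):
    """Floor each connection code to recover node indices, e.g.
    codes 0.5 and 2.3 mean 'connect to node 0 and node 2'."""
    return [(k, int(code[4 * i]), int(code[4 * i + 1]))
            for i, k in enumerate(range(2, m))]

population = [init_structure_code(4) for _ in range(20)]  # e.g. N = 20 codes
```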
S2012,通过S2011的编码规则初始化生成N个结构编码,形成一个种群,N为大于2的自然数。N个结构编码可以解码成N个神经网络结构,这些神经网络结构具有相同的节点数量,不同的连接关系和不同的操作。S2012 , generating N structural codes by initializing the coding rules in S2011 to form a population, where N is a natural number greater than 2. N structure codes can be decoded into N neural network structures with the same number of nodes, different connection relationships and different operations.
用连续实数空间表示神经网络结构,能够增加种群内的神经网络结构的多样性,以匹配后续神经网络的二阶学习演化。Using continuous real number space to represent the neural network structure can increase the diversity of the neural network structure within the population to match the second-order learning evolution of the subsequent neural network.
The initialization flow of the supernet is shown in FIG. 5c.
S2013: set the nodes of the supernet; the number of supernet nodes is the same as the number of nodes of the neural network structures in the population.
S2014: set multiple operations between every two nodes, and set the weight of each operation.
S2015: represent the operations between the nodes as a set of weight values that covers all possible operations required by the application or learning task, and share this set of weight values.
The neural network structure represented by any possible structure code in the population is a sub-network of the supernet; a sub-network is called a network unit. Between every two nodes of a neural network structure in the population, only one operation can be selected.
For example, as shown in FIG. 6, suppose there are three possible operations: the first is a 3*3 average pooling layer, the second is a 3*3 max pooling layer, and the third is a 3*3 convolutional layer. The continuous real-number interval [0,1) can be mapped to the first operation, so that an operation code ∈ [0,1) means the operation between node 0 and node 1 is average pooling; [1,2) can be mapped to the second operation, so that an operation code ∈ [1,2) means max pooling; and [2,3) can be mapped to the third operation, so that an operation code ∈ [2,3) means convolution. Finally, one operation is selected as the operation between these two nodes. FIG. 7 is a schematic diagram of the connection between any two nodes of the supernet: the supernet itself does not involve structure coding, and every two nodes of the supernet can include, in parallel, all possible operations required by the application or learning task, including but not limited to average pooling, max pooling and convolution. Each operation contains its own weight information and needs to be trained separately.
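Such a supernet edge can be pictured as a module holding all candidate operations in parallel, each with its own independently trained weights, while a subnetwork picks exactly one per edge by flooring its operation code. The following is a hedged PyTorch sketch; the class and constant names are illustrative.

```python
import torch.nn as nn

# Candidate operations on one supernet edge; operation codes in [0, 1),
# [1, 2) and [2, 3) select average pooling, max pooling and convolution.
CANDIDATE_OPS = [
    lambda c: nn.AvgPool2d(3, stride=1, padding=1),
    lambda c: nn.MaxPool2d(3, stride=1, padding=1),
    lambda c: nn.Conv2d(c, c, 3, stride=1, padding=1),
]

class SupernetEdge(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # every candidate operation keeps its own weights and is trained
        # separately, as described above
        self.ops = nn.ModuleList(f(channels) for f in CANDIDATE_OPS)

    def forward(self, x, op_code: float):
        # a subnetwork selects exactly one operation per edge by flooring
        # its real-valued operation code
        return self.ops[int(op_code)](x)
```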
In the present application, the population initialization step encodes both the connection between two nodes and the operation search under one coding rule, so that independent continuous variable intervals can be mapped to the connection between two nodes and to the operation corresponding to that connection. This reduces the search space of operation selection, improves NAS search efficiency, and makes it possible to convert discrete real numbers, combination numbers, probability values and the like into continuous real numbers.
Returning to FIG. 5a, execute S202: randomly select the structure codes of two neural network structures in the population, decode them into two neural network structures, and pair them; the paired neural network structures inherit weights from the supernet. The weights include the weight values of the operations.
Specifically, taking any pair of randomly paired neural network structures in the population as an example, the n-th neural network structure is recorded as the first neural network structure and the (n+1)-th as the second. The first neural network structure inherits, from the initialized supernet, the weights of the connections identical to its own and of the identical operations corresponding to those connections, obtaining the first neural network model; the second neural network structure likewise inherits the corresponding connections and operation weights from the initialized supernet, obtaining the second neural network model.
For example, suppose that in the supernet, for a connection identical to one in the first neural network structure and the identical operation on that connection, the operation weight of the convolutional layer is 2.6. The first neural network structure inherits that weight value from the initialized supernet, so that the weight of the convolutional-layer operation of the first neural network structure is 2.6. Similarly, the second neural network structure inherits from the supernet the weight values of the connections identical to its own structure and of the identical operations corresponding to those connections.
It should be noted that the weights that the two paired neural network structures inherit from the supernet for the first time are the weight values, in the initialized supernet, of the connections identical to their structures and of the identical operations corresponding to those connections; in each subsequent iteration, the paired neural network structures inherit the corresponding weight values from the updated supernet.
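The inheritance step amounts to copying, for every (connection, operation) pair used by a decoded structure, the weights currently stored in the supernet. The following is a minimal sketch assuming a hypothetical dictionary layout keyed by (source node, destination node, operation index); this layout is an assumption for illustration only.

```python
import copy

def inherit_weights(edges, supernet):
    """edges: iterable of (src, dst, op_index) tuples from a decoded
    structure code; supernet: dict mapping such tuples to weight tensors.
    A deep copy is returned so the two paired models can be trained
    independently before their weights are written back in S204."""
    return {e: copy.deepcopy(supernet[e]) for e in edges}
```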
S203: in combination with the learning task, perform one or more rounds of gradient-descent training on the two weight-inheriting neural network models to optimize their weight values; verify the trained first and second neural network models on the validation set to obtain the error value of each; compare the two error values, record the model with the smaller error value as the winner and the model with the larger error value as the loser, and obtain the evaluation result.
Specifically, S2031: in combination with the learning task, train the two neural network models separately by stochastic gradient descent; formula (3) gives the weight descent value of the current neural network model, and formula (4) gives the optimized weight ω of the model:
Δω(t)=β*Δω(t-1)+∂L_train(ω(t-1))/∂ω        (3)

ω(t)=ω(t-1)-η(t)*Δω(t)        (4)
where t is the iteration index of the stochastic gradient descent, Δω(t) is the weight descent value of the neural network model at iteration t, ω(t) is the optimized weight value of the model after iteration t, β is the momentum, η(t) is the learning rate, and L_train(ω) is the error value (loss) of the neural network on the training set, obtained by evaluating the accuracy of the current neural network model during verification.
In the embodiment of the present application, the operation weights of the first neural network model are trained along the direction of gradient descent, and the optimized weight value ω1 is computed from the calculated weight descent value Δω1(t), yielding the first neural network model after one optimization. Likewise, the operation weights of the second neural network model are trained along the gradient direction, and the optimized weight value ω2 is computed from the calculated weight descent value Δω2(t), yielding the optimized second neural network model.
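Formulas (3) and (4) are a standard momentum update; the following numeric sketch illustrates one step, with the gradient term of formula (3) following the reconstruction above and illustrative default values for β and η.

```python
import numpy as np

def sgd_momentum_step(w, dw_prev, grad, beta=0.9, lr=0.025):
    """One update following formulas (3)-(4):
        dw(t) = beta * dw(t-1) + grad     # weight descent value, Eq. (3)
        w(t)  = w(t-1) - lr * dw(t)       # optimized weight,     Eq. (4)
    w, dw_prev and grad may be floats or numpy arrays."""
    dw = beta * dw_prev + grad
    return w - lr * dw, dw

# e.g. one step on the inherited convolution weight of 2.6
w, dw = sgd_momentum_step(np.array([2.6]), np.zeros(1), grad=np.array([0.4]))
```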
S2032: verify, on the validation set, the error values of the first-optimized first neural network model and the first-optimized second neural network model; record the first/second neural network model with the smaller error value as the winner and the one with the larger error value as the loser, and obtain the evaluation result.
S204: update the weight values of the supernet according to the weight values obtained after training and the evaluation result.
Specifically, judge whether some two nodes of the first and second neural networks have the same connection and that connection carries the same operation. If so, according to the weight values obtained after training and the evaluation result, update the weight value of the corresponding operation in the supernet to the winner's weight value. Otherwise, update the weights, in the supernet, of the connections identical to the structure code of the first neural network model and of the identical operations on those connections to the optimized weight value ω1 of the first neural network model, and update the weights, in the supernet, of the connections identical to the structure code of the second neural network model and of the identical operations on those connections to the optimized weight value ω2 of the second neural network model.
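Under the dictionary layout assumed in the inheritance sketch above, the write-back rule of S204 can be sketched as follows; this is a hedged illustration, not the application's literal implementation. Weights on (connection, operation) pairs shared by both models take the winner's value, and all remaining pairs take each model's own optimized value.

```python
def update_supernet(supernet, w1, w2, winner):
    """w1, w2: dicts mapping (src, dst, op_index) to the trained weights of
    the first / second model; winner is 1 or 2 from the evaluation result."""
    shared = set(w1) & set(w2)           # same connection and same operation
    best = w1 if winner == 1 else w2
    for e in shared:
        supernet[e] = best[e]            # the winner's weight wins out
    for e in set(w1) - shared:
        supernet[e] = w1[e]              # connections/ops unique to model 1
    for e in set(w2) - shared:
        supernet[e] = w2[e]              # connections/ops unique to model 2
```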
S205: according to the evaluation result, make the loser's structure code learn from the winner's structure code to obtain the structure code of a new neural network structure, and then replace the loser's structure code in the population with the new structure code, updating the population.
Specifically, first execute step S2051: according to the evaluation results of the two neural network models optimized by the training of S204, make the loser learn from the winner to obtain a new neural network model. Concretely, the loser's structure code α is optimized by a pseudo-gradient-based learning update, so that the loser's structure code approaches the winner's structure code, and the new structure code then replaces the loser in the population.
A pseudo-gradient-based learning update algorithm may contain a first-order learning update, a second-order learning update, or both, and may even be extended with constant or multiplicative terms on top of the gradient information. Specifically, let the structure code of the winner in a paired couple be α_w and that of the loser be α_l; then the pseudo-gradient Δα_l for updating the structure code of the loser's neural network model is:
Δα_l(t)=a*μ*(α_w(t)-α_l(t))+b*γ*Δα_l(t-1)+c        (5)
where Δα_l(t) is the pseudo-gradient value of the structure code of the t-th generation loser; μ and γ are two real values sampled uniformly at random from [0,1]; a and b are two given real values in [-1,1] expressing the degree of confidence in the gradients of different orders; c is a given real number in [-1,1] that biases the pseudo-gradient; and Δα_l(t-1) is the pseudo-gradient value historically accumulated before this update of the loser's structure, with initial value Δα_l(0)=0.
The updated structure code α_l′ of the loser is then computed as:

α_l′(t)=α_l(t)+Δα_l(t)        (6)
Suppose the winner's structure code α_w is 0.2 and the loser's structure code α_l is 0.9, and assign values to a, b and c, for example a=1, b=1, c=0. The value of α_l′ computed by formula (6) will then be less than 0.9: the loser's structure code is updated so that it learns toward, and moves closer to, the winner's structure code, yielding a new neural network structure.
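Formulas (5) and (6) translate directly into code. The sketch below assumes structure codes stored as numpy arrays and scalar μ, γ redrawn on each update, as described above; the names are illustrative.

```python
import numpy as np

def pseudo_gradient_step(alpha_w, alpha_l, d_prev, a=1.0, b=1.0, c=0.0):
    """Move the loser's code toward the winner's per formulas (5)-(6).
    d_prev is the accumulated pseudo-gradient, initially all zeros."""
    mu, gamma = np.random.uniform(0, 1), np.random.uniform(0, 1)
    d = a * mu * (alpha_w - alpha_l) + b * gamma * d_prev + c   # Eq. (5)
    return alpha_l + d, d                                       # Eq. (6)

# the worked example above: with a=1, b=1, c=0 the update is -0.7*mu <= 0,
# so the new code is at most 0.9 and moves toward the winner's 0.2
new_alpha, d = pseudo_gradient_step(np.array([0.2]), np.array([0.9]),
                                    np.zeros(1))
```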
Then execute step S2052: replace the loser in the population with the structure code of the new neural network structure, updating the population.
S206: judge whether the termination condition is satisfied; if so, execute S207; otherwise, repeat steps S202-S206, performing pairing and iterative learning on the population and continuing to evolve and update it until the set termination condition is reached. The termination condition may be that the neural network structures corresponding to all structure codes in the population have completed pairing and learning.
Specifically, judge whether the neural network structures corresponding to all structure codes in the population have participated in pairing; if the result is "no", execute S202; if the result is "yes", execute S207.
When executing step S206, it may be judged that if n<N-1, the value of n is increased by 2 and execution returns to step S202; if n≥N-1, S207 is executed.
The termination condition may also be that a set number of iterations has been reached.
Specifically, let t be the generation of the current evolution and T the set number of iterations, and judge whether the current generation has reached the set number of iterations; if the result is "no", execute S202; if the result is "yes", execute S207.
When executing step S206, it may be judged that if t<T and n<N-1, the value of t is increased by 1, the value of n is increased by 2, and execution returns to step S202; if t<T and n≥N-1, the value of t is increased by 1, n is set to 1, and execution returns to step S202; if t≥T, S207 is executed.
S207: select the preferred model from the updated population.
Based on the above embodiment, the present application further proposes a supernet weight update mechanism, which lets the neural network structures in the population evolve through pairwise, non-repeating pairing; during evolution, the supernet weights are updated jointly by the weight values of the losers and the winners. Specifically, FIG. 8 is a flowchart of the supernet weight update method proposed by the present application; as shown in FIG. 8, it includes:
S301: randomly initialize the supernet weights; for a specific implementation, refer to steps S2013-S2015.
S302: randomly pair the decoded neural network structures in the population; the two paired structures inherit, from the initialized supernet, the weights of the connections identical to their own and of the identical operations corresponding to those connections, generating two neural network models. It should be noted that the weight values the paired structures inherit from the supernet for the first time come from the initialized supernet; in each subsequent iteration, they inherit the weight values of the corresponding connections and operations from the updated supernet.
S303: perform one or more rounds of gradient-descent training on the two weight-inheriting neural network models of S302.
S304: obtain the loser and the winner from the error values of the two neural network models computed on the validation set; the model with the smaller error value is the winner and the model with the larger error value is the loser.
S305: judge whether the two paired neural network models contain the same connection with the same corresponding operation; if the result is "yes", execute S306; if the result is "no", execute S307.
S306: update the weight value, in the supernet, of the connection shared by the two neural network models and of the identical operation on that connection to the winner's weight value.
S307: for the operations on the connections in the supernet corresponding to the two neural network models, use the optimized weight values of the two neural network models respectively.
S308: output the supernet with the updated weight values.
Based on the above embodiment, the present application further proposes a structure update mechanism based on population pairing, in which the neural network structures in the population compete through pairwise, non-repeating pairing, the loser performs second-order, pseudo-gradient-based learning toward the winner, and a new individual is generated to replace the original loser.
FIG. 9 is a flowchart of the structure update method based on population pairing proposed by the present application. As shown in FIG. 9, the method includes:
S401: initialize the population, where N is the total number of neural network structures encoded in the population. For a specific implementation, refer to steps S2011-S2012.
S402: randomly pair the decoded neural network structures in the population without repetition, where n and n+1 are the numbers of the two paired structures; the n-th neural network structure is recorded as the first neural network structure and the (n+1)-th as the second.
S403: the two paired neural network structures inherit weight values from the supernet.
It should be noted that the weight values the paired structures inherit from the supernet for the first time are those, in the initialized supernet, of the connections corresponding to the two structures and of the identical operations on those connections; in each subsequent iteration, the paired structures inherit the weight values of the updated supernet.
S404: according to the learning task, train the paired neural network structures to obtain the loser and the winner.
S405: according to the evaluation result, make the loser's structure code learn from the winner's structure code to obtain the structure code of a new neural network.
S406: replace the loser in the population with the structure code of the new neural network.
S407: judge whether all individuals in the population have participated in pairing; if the result is "no", execute S402; if the result is "yes", execute S408.
Specifically, if n≥N-1, end the iteration and execute S408; if n<N-1, increase the value of n by 2 and return to step S402.
S408: output the new population.
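Putting the pieces together, the S401-S408 loop can be sketched with the helpers shown earlier. Here, train_and_compare is a hypothetical helper that performs inheritance (S403) plus training and evaluation (S404) and returns the winner index with both trained weight dictionaries; update_supernet and pseudo_gradient_step are the sketches given above.

```python
import numpy as np

def evolve_population(codes, supernet, generations=1):
    """codes: list of structure codes stored as numpy arrays; returns the
    codes after pairwise competition and pseudo-gradient learning."""
    history = [np.zeros_like(c) for c in codes]  # accumulated pseudo-gradients
    for _ in range(generations):
        for n in range(0, len(codes) - 1, 2):    # pair (n, n+1), no repeats
            # train_and_compare is assumed, not defined in the application
            winner, w1, w2 = train_and_compare(codes[n], codes[n + 1],
                                               supernet)
            update_supernet(supernet, w1, w2, winner)              # S305-S307
            win, lose = (n, n + 1) if winner == 1 else (n + 1, n)
            codes[lose], history[lose] = pseudo_gradient_step(     # S405-S406
                codes[win], codes[lose], history[lose])
    return codes
```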
In the present application, a continuous space is mapped onto the neural network structure so that continuous mathematical operations can be performed on the structure, giving the algorithm better global search ability. The structure update method based on population pairing and second-order learning finds the optimal solution faster; at the same time, owing to the population-based nature, a set of solutions is finally found, providing decision makers with multiple choices while improving the reliability of the algorithm. Moreover, the weight inheritance and updating of the supernet speed up model evaluation and significantly reduce the computational cost and running time required to search for neural networks.
An embodiment of the present application provides a system for searching neural network structures based on evolutionary learning. As shown in FIG. 10, the system includes: a population initialization module 801, an individual pairing module 802, a training and evaluation module 803, a supernet weight update module 804, a population update module 805 and a model output module 806.
The system initializes the population through the population initialization module 801, where each neural network structure in the population is a structure code that uses continuous real-number intervals to map the connections between the nodes of the neural network structure and the corresponding operations. The individual pairing module 802 randomly selects two structure codes in the population and decodes them into two neural network structures for pairing; the paired structures inherit the corresponding weights from the supernet, obtaining the first neural network model and the second neural network model, where the supernet includes the set of all operations and the weights are the weight information corresponding to all operations. The training and evaluation module 803 trains the first and second neural network models separately and evaluates the trained models, obtaining the winner and the loser. The supernet weight update module 804 updates the supernet according to the trained first and second neural network models. The population update module 805 computes the pseudo-gradient value between the loser's structure code and the winner's structure code, makes the loser's structure code evolve toward the winner's based on the pseudo-gradient value to obtain the structure code of a third neural network structure, and replaces the structure code of the loser's neural network structure in the population with the structure code of the third neural network structure, obtaining an updated population. If the termination condition is satisfied, the model output module 806 outputs the optimal neural network model in the updated population, completing the search of the neural network structure; otherwise, the individual pairing module 802 is executed to iteratively evolve the updated population.
Specifically, the population initialization module 801 may also generate, by manual encoding according to custom coding rules, N neural network structures with the same number of nodes; through this encoding, continuous real-number intervals are mapped to the connections between the nodes of a single neural network structure and to the corresponding discrete operations, where N is a natural number.
The system for searching neural network structures based on evolutionary learning provided by the embodiments of the present application further includes a supernet initialization module, which sets up the supernet according to the learning task; the supernet includes N network units and the set of all operations.
Specifically, in the system for searching neural network structures based on evolutionary learning provided by the embodiments of the present application, the individual pairing module 802 makes the first neural network structure inherit, from the supernet, the first weights corresponding to the connections identical to the first neural network structure and to the identical operations on those connections, obtaining the first neural network model; and makes the second neural network structure inherit, from the supernet, the second weights corresponding to the connections identical to the second neural network structure and to the identical operations on those connections, obtaining the second neural network model. The training and evaluation module 803, in combination with the learning task, trains the weight values of the first neural network model at least once by stochastic gradient descent to obtain the optimized first neural network model, and trains the second neural network model once by stochastic gradient descent to obtain the optimized second neural network model; it evaluates the optimized first and second neural network models on the validation set, computes the error value of the first neural network model from the optimized first model and the error value of the second from the optimized second, compares the two error values, records the first/second neural network model with the smaller error value as the winner and the one with the larger error value as the loser, and obtains the evaluation result. The supernet weight update module 804, under the condition that some two nodes of the first and second neural networks have the same connection carrying the same operation, takes the winner's weight for that operation as the weight of the supernet; under the condition that the first and second neural networks have different node connections, or the same node connections but corresponding to different operations, it takes the weight of the first neural network model as the weight, in the supernet, of the connections identical to the first neural network structure and of the identical operations on those connections, and the weight of the second neural network model as the weight, in the supernet, of the connections identical to the second neural network structure and of the identical operations on those connections, obtaining the updated supernet. The population update module computes the difference between the loser's structure code value and the winner's structure code value, multiplies the difference by a random coefficient, and sums it with the historical pseudo-gradient under its own random coefficient, obtaining the value of the pseudo-gradient for updating the loser's structure code; the loser's structure code value and the pseudo-gradient value are then summed to obtain the structure code of the third neural network structure. The model output module 806 judges whether all neural network structures in the population have participated in pairing; if the result is "no", it returns to the individual pairing module 802 to iteratively evolve the updated population; if the result is "yes", it outputs the optimal neural network model in the updated population, completing the search of the neural network structure. Alternatively, the model output module 806 sets the number of iterations to T, T being a natural number greater than 0, and judges whether the current number of executions is less than T; if the result is "yes", it returns to the individual pairing module 802 to iteratively evolve the updated population; if the result is "no", it outputs the optimal neural network model in the updated population, completing the search of the neural network structure. The model output module 806 may also, when the number of executed iterations is greater than 1, make the two paired neural network structures inherit the corresponding weight values from the updated supernet.
An embodiment of the present application provides a supernet update system. As shown in FIG. 11, the system includes: a supernet initialization module 901 that randomly initializes the supernet, the supernet including N network units and the set of all operations; an individual pairing module 802 that randomly selects two neural network structures in the population for pairing, the paired structures inheriting the corresponding weights from the supernet to obtain the first neural network model and the second neural network model; a training and evaluation module 803 that trains the first and second neural network models separately and evaluates the trained models to obtain the winner and the loser, and, under the condition that some two nodes of the first and second neural networks have the same connection carrying the same operation, takes the winner's weight for that operation as the weight of the supernet; and an update module 904 that, under the condition that the first and second neural networks have different node connections, or the same node connections but corresponding to different operations, takes the weight of the first neural network model as the weight, in the supernet, of the connections identical to the first neural network structure and of the identical operations on those connections, and the weight of the second neural network model as the weight, in the supernet, of the connections identical to the second neural network structure and of the identical operations on those connections, obtaining the updated supernet.
An embodiment of the present application provides an electronic device 1000. As shown in FIG. 12, it includes a processor 1001 and a memory 1002; the processor 1001 is configured to execute the computer-executable instructions stored in the memory 1002, and by running these instructions the processor 1001 performs the method for searching neural network structures based on evolutionary learning described in any of the above embodiments.
An embodiment of the present application provides a storage medium, including a readable storage medium and a computer program stored in the readable storage medium, the computer program being used to implement the method for searching neural network structures based on evolutionary learning described in any one of the above embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods for each particular application to implement the described functions, but such implementations should not be considered beyond the scope of the embodiments of the present application. Furthermore, aspects or features of the embodiments of the present application may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used in this application covers a computer program accessible from any computer-readable device, carrier or medium. For example, computer-readable media may include, but are not limited to: magnetic storage devices (e.g., hard disks, floppy disks or magnetic tapes), optical disks (e.g., compact discs (CD), digital versatile discs (DVD)), smart cards and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks or key drives). In addition, the various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, an access network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that can readily occur to those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of this application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

  1. A method for searching a neural network structure based on evolutionary learning, characterized in that the method comprises:
    S101, initializing a population, the population being a set of structure codes containing a plurality of different neural network structures, the structure codes being used to indicate, through continuous real-number intervals, the mapping relationship of the connections and operations between any two nodes of the neural network structure;
    S102, randomly selecting two structure codes in the population, decoding the two structure codes to obtain two neural network structures, and pairing the two neural network structures; inheriting corresponding weights for the two neural network structures from a supernet to obtain a first neural network model and a second neural network model, wherein the supernet comprises a set of multiple operations and the weight of each operation;
    S103, training the first and second neural network models respectively to obtain trained first and second neural network models; inputting labeled speech, video or graphic samples into the trained first and second neural network models, and calculating the error values between the output results and the labels to obtain a winner and a loser, the winner's error value being smaller than the loser's error value;
    S104, updating the supernet according to the trained first and second neural network models;
    S105, calculating a pseudo-gradient value between the loser's structure code and the winner's structure code, and making the loser's structure code evolve toward the winner's structure code based on the pseudo-gradient value to obtain a third neural network structure code, the pseudo-gradient being the gradient of the structure code update;
    S106, replacing, in the population, the structure code of the neural network structure corresponding to the loser with the third neural network structure code to obtain an updated population;
    S107, outputting the optimal neural network model in the updated population to complete the search of the neural network structure.
  2. The method according to claim 1, characterized in that outputting the optimal neural network model in the updated population to complete the search of the neural network structure comprises: when a termination condition is satisfied, outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  3. The method according to claim 1, characterized in that outputting the optimal neural network model in the updated population to complete the search of the neural network structure comprises: when the termination condition is not satisfied, returning to S102 and iteratively evolving the updated population until the termination condition is satisfied, and then outputting the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  4. The method according to any one of claims 1-3, characterized in that inheriting the corresponding weights for the two neural network structures from the supernet to obtain the first neural network model and the second neural network model comprises:
    inheriting, for the first neural network structure from the supernet, first weights corresponding to connections identical to the first neural network structure and to identical operations corresponding to those connections, to obtain the first neural network model;
    inheriting, for the second neural network structure from the supernet, second weights corresponding to connections identical to the second neural network structure and to identical operations corresponding to those connections, to obtain the second neural network model.
  5. The method according to any one of claims 1-3, characterized in that training the first and second neural network models respectively to obtain the trained first and second neural network models comprises:
    training the weight values of the first neural network model at least once by stochastic gradient descent to obtain the optimized first neural network model;
    training the weight values of the second neural network model at least once by stochastic gradient descent to obtain the optimized second neural network model.
  6. The method according to any one of claims 1-3, characterized in that inputting labeled speech, video or graphic samples into the trained first and second neural network models and calculating the error values between the output results and the labels to obtain the winner and the loser comprises:
    inputting labeled speech, video or graphic samples into the trained first neural network model and the trained second neural network model respectively;
    calculating, according to a first output result of the trained first neural network model, a first error value between the first output result and the label of the sample;
    calculating, according to a second output result of the trained second neural network model, a second error value between the second output result and the label of the sample;
    comparing the first error value and the second error value, taking the first/second neural network model with the smaller error value as the winner and the first/second neural network model with the larger error value as the loser, to obtain the winner and the loser.
  7. The method according to any one of claims 1-3, characterized in that updating the supernet according to the trained first and second neural network models comprises:
    under the condition that two nodes of the first and second neural network models contain the same connection and the operation corresponding to that connection is the same, taking the winner's weight as the weight of the corresponding operation in the supernet, and updating the supernet.
  8. The method according to any one of claims 1-3, characterized in that updating the supernet according to the trained first and second neural network models comprises:
    under the condition that the connections of two nodes of the first and second neural network models, or the operations corresponding to those connections, are different, taking the weight of the first neural network model as the weight, in the supernet, of the connection identical to the first neural network structure and of the identical operation corresponding to that connection; taking the weight of the second neural network model as the weight, in the supernet, of the connection identical to the second neural network structure and of the identical operation corresponding to that connection; and updating the supernet.
  9. The method according to any one of claims 1-3, characterized in that calculating the pseudo-gradient value between the loser's structure code and the winner's structure code, and making the loser's structure code evolve toward the winner's structure code based on the pseudo-gradient value to obtain the structure code of the third neural network structure, comprises:
    calculating the difference between the loser's structure code value and the winner's structure code value, multiplying the difference by a random coefficient, and summing it with the historical pseudo-gradient under its random coefficient multiplier, to obtain the value of the pseudo-gradient for updating the loser's structure code;
    summing the loser's structure code value and the pseudo-gradient value to obtain the structure code of the third neural network structure, thereby making the loser's structure code evolve toward the winner's structure code.
  10. The method according to claim 2 or 3, characterized in that the termination condition comprises whether all structure codes in the population have participated in pairing, or whether a set number of iterations has been reached.
  11. A system for searching a neural network structure based on evolutionary learning, characterized in that the system comprises:
    a population initialization module, configured to initialize a population, the population being a set of structure codes containing a plurality of different neural network structures, the structure codes being used to indicate, through continuous real-number intervals, the mapping relationship of the connections and operations between any two nodes of the neural network structure;
    an individual pairing module, configured to randomly select two structure codes in the population, decode the two structure codes to obtain two neural network structures, and pair the two neural network structures;
    a weight inheritance module, configured to make the two neural network structures inherit corresponding weights from a supernet to obtain a first neural network model and a second neural network model, wherein the supernet comprises a set of multiple operations and the weight of each operation;
    a training module, configured to train the first and second neural network models respectively to obtain trained first and second neural network models;
    an evaluation module, configured to input labeled speech, video or graphic samples into the trained first and second neural network models, and calculate the error values between the output results and the labels to obtain a winner and a loser, the winner's error value being smaller than the loser's;
    a supernet weight update module, configured to update the supernet according to the trained first and second neural network models;
    a structure code evolution module, configured to calculate a pseudo-gradient value between the loser's structure code and the winner's structure code, and make the loser's structure code evolve toward the winner's structure code based on the pseudo-gradient value to obtain a third neural network structure code, the pseudo-gradient being the gradient of the structure code update; and
    a population update module, configured to replace, in the population, the structure code of the neural network structure corresponding to the loser with the third neural network structure code to obtain an updated population;
    a model output module, configured to output the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  12. The system according to claim 11, characterized in that the model output module is configured to:
    when a termination condition is satisfied, output the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  13. The system according to claim 11, characterized in that the model output module is configured to:
    when the termination condition is not satisfied, return to S102 and iteratively evolve the updated population until the termination condition is satisfied, and then output the optimal neural network model in the updated population, thereby completing the search of the neural network structure.
  14. The system according to claim 11, wherein the weight inheritance module is configured to:
    have the first neural network structure inherit from the supernet the first weights corresponding to the connections identical to those of the first neural network structure and to the identical operations corresponding to those connections, to obtain the first neural network model; and
    have the second neural network structure inherit from the supernet the second weights corresponding to the connections identical to those of the second neural network structure and to the identical operations corresponding to those connections, to obtain the second neural network model.
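    A minimal sketch of such inheritance, assuming the supernet's weights are kept in a dictionary keyed by (edge, operation) pairs and that a decoded structure is a list of such pairs; both representations are assumptions for illustration, not claim language.

```python
import copy

def inherit_weights(structure, supernet_weights):
    """Copy, for every (edge, operation) pair of the structure, the weight the
    supernet stores for that identical connection and operation."""
    return {
        (edge, op): copy.deepcopy(supernet_weights[(edge, op)])
        for edge, op in structure
    }
```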
  15. The system according to claim 11, wherein the training module is configured to:
    train the weight values of the first neural network model at least once using stochastic gradient descent to obtain the trained first neural network model; and
    train the weight values of the second neural network model at least once using stochastic gradient descent to obtain the trained second neural network model.
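    One stochastic-gradient-descent step in PyTorch, as a sketch of what training "at least once" can mean; the cross-entropy loss and the learning rate are illustrative assumptions, not requirements of the claim.

```python
import torch
import torch.nn as nn

def sgd_step(model: nn.Module, inputs, labels, lr: float = 0.01) -> nn.Module:
    """Update the model's weight values once by stochastic gradient descent."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    optimizer.zero_grad()
    loss = nn.CrossEntropyLoss()(model(inputs), labels)
    loss.backward()
    optimizer.step()
    return model
```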
  16. The system according to claim 11, wherein the evaluation module is configured to:
    input labeled voice, video or graphic samples into the trained first neural network model and the trained second neural network model, respectively;
    compute, from a first output result of the trained first neural network model, a first error value between the first output result and the labels of the samples;
    compute, from a second output result of the trained second neural network model, a second error value between the second output result and the labels of the samples; and
    compare the first error value with the second error value, record whichever of the first and second neural network models has the smaller error value as the winner and the one with the larger error value as the loser, thereby obtaining the winner and the loser.
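    A sketch of this comparison, again assuming PyTorch models and a cross-entropy error; any error measure between outputs and labels would fit the claim equally well.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def pick_winner(model_1: nn.Module, model_2: nn.Module, samples, labels):
    """Return (winner, loser): the model with the smaller error value wins."""
    criterion = nn.CrossEntropyLoss()
    err_1 = criterion(model_1(samples), labels).item()
    err_2 = criterion(model_2(samples), labels).item()
    return (model_1, model_2) if err_1 < err_2 else (model_2, model_1)
```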
  17. The system according to claim 11, wherein the supernet weight update module is configured to:
    take, under the condition that two nodes of the first and second neural network models contain the same connection and the operations corresponding to that connection are the same, the winner's weight as the weight of the corresponding operation in the supernet, so as to update the supernet.
  18. The system according to claim 11, wherein the supernet weight update module is configured to:
    under the condition that the connections between two nodes of the first and second neural network models, and the operations corresponding to those connections, are not the same, take the weights of the first neural network model as the weights, in the supernet, of the connections identical to those of the first neural network structure and of the identical operations corresponding to those connections; take the weights of the second neural network model as the weights, in the supernet, of the connections identical to those of the second neural network structure and of the identical operations corresponding to those connections; and update the supernet.
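    Claims 17 and 18 together reduce to a simple write-back rule, sketched below with the dictionary representation assumed earlier: entries present in only one model are written back unchanged, and on entries both models share, the winner's weight prevails.

```python
def update_supernet(supernet_weights, winner, loser):
    """winner / loser: dicts mapping (edge, operation) -> trained weight."""
    supernet_weights.update(loser)   # write all of the loser's weights back first
    supernet_weights.update(winner)  # the winner then overwrites any shared entry
    return supernet_weights
```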
  19. The system according to claim 11, wherein the structure encoding evolution module is configured to:
    compute the difference between the loser's structure encoding value and the winner's structure encoding value, multiply the difference by a random coefficient, and sum the result with the historical pseudo-gradient scaled by its own random coefficient, to obtain the value of the pseudo-gradient for updating the loser's structure encoding; and
    sum the loser's structure encoding value and the value of the pseudo-gradient to obtain the structure encoding of the third neural network structure, thereby evolving the loser's structure encoding toward the winner's structure encoding.
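    A per-dimension sketch of this update. The sign convention (winner minus loser) and the fresh uniform random coefficients r1 and r2 drawn at each update are assumptions chosen so that the step moves the loser toward the winner; the claim's wording echoes the velocity update of competitive swarm optimizers, and this is one illustrative reading rather than the only one.

```python
import random

def evolve_loser(loser_code, winner_code, prev_pseudo_grad):
    """pseudo_grad = r1 * historical pseudo-gradient + r2 * (winner - loser);
    the third structure encoding is loser + pseudo_grad."""
    r1, r2 = random.random(), random.random()
    pseudo_grad = [
        r1 * g + r2 * (w - l)
        for g, w, l in zip(prev_pseudo_grad, winner_code, loser_code)
    ]
    third_code = [l + g for l, g in zip(loser_code, pseudo_grad)]
    return third_code, pseudo_grad
```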
  20. The system according to claim 12 or 13, wherein the termination condition comprises all structure encodings in the population having participated in pairing, or a set number of iterations having been reached.
  21. An electronic device, comprising a memory and a processor, wherein the processor is configured to execute computer-executable instructions stored in the memory, and the processor, by running the computer-executable instructions, performs the evolutionary-learning-based neural network architecture search method according to any one of claims 1-10.
  22. A storage medium, comprising a readable storage medium and a computer program stored in the readable storage medium, wherein the computer program is used to implement the evolutionary-learning-based neural network architecture search method according to any one of claims 1-10.