CN111882048A - Neural network structure searching method and related equipment - Google Patents

Neural network structure searching method and related equipment

Info

Publication number
CN111882048A
CN111882048A
Authority
CN
China
Prior art keywords
neural network
network structure
target
loss function
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011043732.0A
Other languages
Chinese (zh)
Inventor
王铭正
胡毅奇
刘云峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd filed Critical Shenzhen Zhuiyi Technology Co Ltd
Priority to CN202011043732.0A priority Critical patent/CN111882048A/en
Publication of CN111882048A publication Critical patent/CN111882048A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a neural network structure searching method, which comprises the following steps: acquiring a first neural network structure; acquiring a second neural network structure, the second neural network structure being a network structure to be evaluated obtained according to a neural network structure search algorithm; acquiring a target data set; inputting the target data set into the first neural network structure to obtain a first output result; inputting the target data set into the second neural network structure to obtain a second output result; calculating an output gap between the first output result and the second output result; and evaluating the second neural network structure based on the output gap. By obtaining the output gap between the second neural network structure to be evaluated and the first neural network structure, and evaluating the second neural network structure based on that gap, the scheme allows the network structures obtained by the search to be understood more comprehensively.

Description

Neural network structure searching method and related equipment
Technical Field
The embodiment of the application relates to the field of communication, in particular to a neural network structure searching method and related equipment.
Background
With the flourishing development of deep learning, and of neural networks in particular, neural network structures with excellent effects on different problems have emerged one after another, such as ResNet, which shines on image classification tasks, and the Transformer, which dominates machine translation tasks. Behind these mature neural network structures lie deep theoretical research and a large number of extensive experiments, and completing an efficient neural network structure design still requires a large investment of labor and energy.
Based on this background, Neural Architecture Search (NAS) has been proposed as an alternative to manually designing a network structure. The neural network structure search process mainly includes: 1. defining a search space; 2. executing a search algorithm; 3. evaluating the candidate networks. Of these three steps, the definition of the search space and the specific search algorithm have received a great deal of research in recent years; for the search algorithm in particular, approaches based on different methods such as reinforcement learning or evolutionary algorithms have been proposed and have achieved good effects on different problems.
However, in the prior art, the link of evaluating the candidate network mostly verifies the candidate network on a validation set to obtain the candidate network's loss on the validation set; that is, the candidate network structure is evaluated by its accuracy on the validation set. Evaluating candidate structures by validation-set accuracy alone cannot reveal the full nature of the candidate structures.
Disclosure of Invention
The embodiment of the application provides a neural network structure searching method for searching for neural network structures suited to different purposes.
A first aspect of an embodiment of the present application provides a neural network structure search method, including:
acquiring a first neural network structure;
obtaining a second neural network structure, wherein the second neural network structure is a network structure to be evaluated obtained according to a neural network structure search algorithm;
acquiring a target data set;
inputting the target data set into the first neural network structure to obtain a first output result;
inputting the target data set into the second neural network structure to obtain a second output result;
calculating an output gap between the first output result and the second output result;
evaluating the second neural network structure based on the output gap.
Based on the neural network structure searching method provided by the first aspect of the embodiments of the present application, optionally,
the method further comprises the following steps:
obtaining an initial loss function of the second neural network structure;
taking a sum of the initial loss function and the output gap as a target loss function for the second neural network structure;
the evaluating the second neural network structure based on the output gap includes:
evaluating the second neural network structure based on the target loss function.
Based on the neural network structure search method provided in the first aspect of the embodiments of the present application, optionally, the evaluating the second neural network structure based on the target loss function includes:
updating the structure parameters of the second neural network structure based on the target loss function, so as to obtain the target structure parameters of the second neural network structure when the target loss function is at its minimum value.
Based on the neural network structure searching method provided by the first aspect of the embodiments of the present application, optionally,
the updating structural parameters of the second neural network structure based on the objective loss function, the method further comprising:
training a second neural network structure with the target structure parameters to update the model parameters of the second neural network structure with the target structure parameters.
Based on the neural network structure search method provided in the first aspect of the embodiment of the present application, optionally, the updating the structure parameter of the second neural network structure based on the target loss function to obtain the target structure parameter of the second neural network structure when the target loss function is the minimum value includes:
updating model parameters of the second neural network structure based on the initial loss function;
updating structural parameters of the second neural network structure based on the target loss function;
and alternately carrying out the updating process of the model parameters and the structure parameters to obtain the target structure parameters of the second neural network structure when the target loss function is the minimum value.
Based on the neural network structure search method provided in the first aspect of the embodiment of the present application, optionally, the output gap is a KL divergence.
Based on the neural network structure search method provided in the first aspect of the embodiments of the present application, optionally, the neural network structure search algorithm is a gradient-based differentiable neural architecture search method.
Based on the neural network structure search method provided in the first aspect of the embodiment of the present application, optionally, the obtaining a target data set includes:
acquiring an original data set;
performing data enhancement on the original data set to obtain the target data set, wherein the data enhancement modes include: geometric transformation or color transformation.
A second aspect of the embodiments of the present application provides a neural network structure search device, including:
a first network acquisition unit for acquiring a first neural network structure;
the second network acquisition unit is used for acquiring a second neural network structure, and the second neural network structure is a network structure to be evaluated obtained according to a neural network structure search algorithm;
a target data set acquisition unit for acquiring a target data set;
a first input unit, configured to input the target data set into the first neural network structure, so as to obtain a first output result;
the second input unit is used for inputting the target data set into the second neural network structure to obtain a second output result;
a calculating unit for calculating an output gap between the first output result and the second output result;
an evaluation unit for evaluating the second neural network structure based on the output gap.
Based on the neural network structure search device provided in the second aspect of the embodiment of the present application, optionally, the neural network structure search device further includes:
an initial loss function obtaining unit, configured to obtain an initial loss function of the second neural network structure;
a target loss function acquisition unit for taking the sum of the initial loss function and the output gap as a target loss function of the second neural network structure;
the evaluation unit is specifically configured to: evaluate the second neural network structure based on the target loss function.
Based on the neural network structure search device provided in the second aspect of the embodiments of the present application, optionally, the evaluation unit is specifically configured to: update the structure parameters of the second neural network structure based on the target loss function, so as to obtain the target structure parameters of the second neural network structure when the target loss function is at its minimum value.
Based on the neural network structure search device provided in the second aspect of the embodiment of the present application, optionally, the neural network structure search device further includes:
and the model parameter training unit is used for training the second neural network structure with the target structure parameters so as to update the model parameters of the second neural network structure with the target structure parameters.
Based on the neural network structure search device provided in the second aspect of the embodiment of the present application, optionally, the evaluation unit is specifically configured to: updating model parameters of the second neural network structure based on the initial loss function;
updating structural parameters of the second neural network structure based on the target loss function;
and alternately carrying out the updating process of the model parameters and the structure parameters to obtain the target structure parameters of the second neural network structure when the target loss function is the minimum value.
Based on the neural network structure search device provided in the second aspect of the embodiment of the present application, optionally, the output gap is a KL divergence.
Based on the neural network structure search device provided in the second aspect of the embodiments of the present application, optionally, the neural network structure search algorithm is a gradient-based differentiable neural architecture search method.
Based on the neural network structure search device provided in the second aspect of the embodiment of the present application, optionally, the target data set obtaining unit is specifically configured to:
acquiring an original data set;
performing data enhancement on the original data set to obtain the target data set, wherein the data enhancement modes include: geometric transformation or color transformation.
A third aspect of the embodiments of the present application provides a neural network structure search device, including:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory, and to execute the operations of the instructions in the memory on the neural network structure search device to perform the method according to any one of the first aspect of the embodiments of the present application.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, including instructions, which, when executed on a computer, cause the computer to perform the method according to any one of the first aspects of embodiments of the present application.
A fifth aspect of embodiments of the present application provides a computer program product containing instructions, which when executed on a computer, cause the computer to perform the method according to any one of the first aspect of embodiments of the present application.
According to the technical scheme, the embodiments of the application have the following advantage: a first neural network structure is acquired, along with a second neural network structure to be evaluated that is obtained according to a neural network structure search algorithm; the same data set is input into both structures to obtain the output gap between the second neural network structure and the first, and the second neural network structure is then evaluated based on that output gap, so that the network structures obtained by the search can be understood more comprehensively.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of a neural network structure searching method according to the present application;
FIG. 2 is another schematic flow chart of an embodiment of a neural network structure searching method of the present application;
FIG. 3 is another schematic flow chart of an embodiment of a neural network structure searching method of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a neural network structure search device according to the present application;
FIG. 5 is another schematic structural diagram of an embodiment of the neural network structure search device of the present application;
fig. 6 is another schematic structural diagram of the neural network structure search device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions in the embodiments of the present application better understood, the technical solutions in the embodiments of the present application are clearly and completely described below, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Some terms used in the embodiments of the present application will be explained below.
The embodiments of the present application relate to related applications of neural networks, and in order to better understand the solution of the embodiments of the present application, the following first introduces related terms and other related concepts of neural networks that may be related to the embodiments of the present application.
(1) A neural network.
Neural Networks (NNs), also referred to as connection models, are algorithmic mathematical models that imitate the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnection relationships among a large number of simple internal processing units (called neurons).
(2) A deep neural network.
Deep Neural Networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple hidden layers. According to the positions of the different layers, the layers of a DNN can be divided into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the middle layers are hidden layers. The layers are fully connected; that is, any neuron of the i-th layer is necessarily connected with any neuron of the (i+1)-th layer. In deep neural networks, more hidden layers make the network better able to depict complex situations in the real world. Theoretically, a model with more parameters is more complex and can perform more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; its final purpose is to obtain the weight matrices of all layers of the deep neural network.
(3) A recurrent neural network.
The purpose of a Recurrent Neural Network (RNN) is to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. But such an ordinary neural network is powerless for many problems. For example, to predict the next word in a sentence, one typically needs to use the previous words, because the words in a sentence are not independent of each other. The RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes of the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, RNNs can process sequence data of any length.
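As a minimal illustrative sketch (assuming PyTorch, which the application does not prescribe), the recurrence described above can be written as a single cell whose new hidden state depends on both the current input and the hidden state from the previous moment:

```python
import torch
import torch.nn as nn

class MinimalRNNCell(nn.Module):
    """One recurrent step: the hidden state carries the memorized
    previous information into the calculation of the current output."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.in2hid = nn.Linear(input_size, hidden_size)
        self.hid2hid = nn.Linear(hidden_size, hidden_size)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # h_t = tanh(W_ih x_t + W_hh h_{t-1}): current input plus previous hidden state
        return torch.tanh(self.in2hid(x_t) + self.hid2hid(h_prev))
```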
(4) And searching a neural network structure.
With the advent of a wide variety of neural network structures, the design of neural network structures is transitioning from manual design to automatic design by machine, a natural consequence of which is Neural Architecture Search (NAS). The classical NAS method uses an RNN as a controller to generate a subnetwork (child network), trains and evaluates the subnetwork to obtain its performance (e.g., accuracy), and finally updates the parameters of the controller. Neural network structure search has become a research trend. Although the methods are diverse, they basically comprise the following three major parts.
1. Defining a search space, i.e., defining the candidate set of network structures to be searched. Search spaces are roughly divided into global search spaces and cell-based search spaces: a global search space searches the whole network structure, whereas a cell-based search space searches only several small structures, which are combined into a complete large network by stacking and splicing.
2. Executing a search algorithm; search algorithms, i.e. how to select in a search space, are roughly classified into three types according to different methods: reinforcement learning based methods, evolutionary algorithm based methods, and gradient based methods.
3. Performing performance evaluation on the sampled network; i.e. to evaluate the performance of the network structure on the target data set.
The scheme can be practically applied to environments such as a server or a terminal in the implementation process. The server may be a conventional server, or may be a cloud server composed of a large number of network servers, which is not specifically limited herein.
The terminal device may be various electronic devices that have a display screen, a data processing module, a shooting camera, an audio input/output function, and the like, and support data input, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, self-service terminals, wearable electronic devices, and the like. Specifically, the data input may be inputting voice based on a voice module provided on the electronic device, inputting characters based on a character input module, and the like.
The terminal device may have a corresponding data transmission or networking function so as to obtain external files or computing resources required in the execution process of the scheme, and the specific details are not limited herein.
Several exemplary application scenarios of the neural network architecture obtained as a result of the operation of the solution provided in the present application are described below.
Application scenario 1: online question-answering robot
In the scene, in the development process of the question-answering robot, the neural network structure searching method provided by the scheme can be used for searching for the corresponding neural network structure suitable for the question-answering robot. The question-answering robot can answer the questions proposed by the users according to the corresponding neural network structure, so that the problems of the users are solved quickly, and the efficiency of the question-answering robot is improved.
Application scenario 2: voice robot
In the process of business promotion, a user may adopt methods such as telemarketing and the like to correspondingly popularize products, and for the telemarketing, a voice robot can be adopted to execute corresponding voice services, wherein the used voice robot adopts a semantic processing neural network structure and can search and obtain the semantic processing neural network structure by using the neural network structure searching method provided by the scheme. Therefore, the voice robot can accurately understand the intention of the client and improve the popularization efficiency of the service.
Referring to fig. 1, an embodiment of a neural network structure searching method according to the present application includes:
101. a first neural network structure is obtained.
A first neural network structure is obtained. The first neural network structure is a network used for reference and comparison, so it is selected based on the problem targeted by the user. For example, for an image recognition problem, a ResNet network structure that performs well on that problem can be adopted as the first neural network structure; for a machine translation task, a mature Transformer network structure can be used. The specific choice of the first neural network structure can be determined according to the user's actual usage scenario, selecting a network structure aimed at the problem to be solved. It can be understood that the first neural network structure should be a relatively mature neural network structure with a certain reference value, so that it can play its role in the evaluation of candidate network structures; the specific choice can be determined according to the actual situation, and is not limited here.
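For illustration only (the scheme does not prescribe any particular library, and the pretrained-model call is an assumption), a mature reference network such as ResNet could be obtained with torchvision:

```python
import torchvision.models as models

# A mature, well-studied network with pretrained weights serves as the
# first (reference) neural network structure described above.
reference_net = models.resnet50(pretrained=True)
reference_net.eval()  # used only for inference and comparison, not training
```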
102. A second neural network structure is obtained.
A second neural network structure is obtained, where the second neural network structure is a network structure to be evaluated obtained according to a neural network structure search algorithm. Existing neural architecture search algorithms are various, mainly divided into search algorithms based on reinforcement learning, search algorithms based on evolutionary algorithms, and gradient-based Differentiable Neural Architecture Search (DNAS); efficiency-oriented variants such as Efficient Neural Architecture Search (ENAS) have also appeared. A single neural network structure to be evaluated is obtained and evaluated, and the search algorithm is then executed again according to the evaluation result; that is, another neural network structure to be evaluated is obtained based on the evaluation result, so the previous evaluation result influences the next neural network structure to be evaluated, and the process repeats in this way. The second neural network structure can therefore differ according to the neural network structure search algorithm used, and how it is obtained in a specific implementation can be determined according to the actual situation, without limitation here.
103. A target data set is acquired.
A target data set is acquired. The target data set is a data set used for evaluating the gap between the first and second neural network structures. Specifically, the target data set may be a small part selected from a large data set; to ensure data consistency, that large data set may have been used in advance for training the first neural network structure and in the search algorithm process that produced the second neural network structure. The target data set can also be enhanced in advance, broadening the range of scenes and data types it covers and thereby improving the accuracy of the evaluation of the second neural network structure. The specific situation may be determined according to actual circumstances, and is not limited here.
It should be understood that there is no logical relationship in time sequence between the steps 101 to 103, and therefore the execution sequence of the steps 101 to 103 in the actual execution process may be determined according to the actual situation, and is not limited herein.
104. The target data set is input into a first neural network structure to obtain a first output result.
105. And inputting the target data set into a second neural network structure to obtain a second output result.
And respectively inputting the target data set into the first neural network structure and the second neural network structure, and correspondingly obtaining a first output result and a second output result. It can be understood that there is no logical relationship in time sequence between the steps 104 and 105, and therefore the execution sequence of the steps is not limited in the actual implementation process, and the steps may be executed sequentially according to a certain order, or may be executed simultaneously.
It will be appreciated that the target data set may include multiple kinds of data; to ensure that the comparison is meaningful, the first output result and the second output result should be produced from the same data in the target data set, so as to facilitate subsequent comparison. The output result may reflect each neural network structure's predictions on the target data set and may be expressed in the form of a vector, which may be determined according to the actual situation and is not limited here.
106. Calculating an output gap between the first output result and the second output result.
An output gap between the first output result and the second output result is calculated. Specifically, when the outputs are represented in vector form, the calculation may take the form of KL divergence (relative entropy), Earth Mover's Distance (EMD), cosine distance, or the like, which may be determined according to the actual situation and is not limited here. The output gap expresses the effect gap between the second neural network structure and the first neural network structure, and can be used to evaluate the second neural network structure obtained by the neural network structure search algorithm.
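A minimal sketch of steps 104 to 106, assuming PyTorch and softmax-normalized class outputs (all names are illustrative):

```python
import torch
import torch.nn.functional as F

def output_gap(first_net, second_net, batch):
    """Feed the same target-data batch through both structures and
    return the KL divergence between their class distributions."""
    with torch.no_grad():
        p1 = F.softmax(first_net(batch), dim=-1)   # first output result
    p2 = F.softmax(second_net(batch), dim=-1)      # second output result
    # KL(p1 || p2), averaged over the batch; kl_div takes log-probs first
    return F.kl_div(p2.log(), p1, reduction="batchmean")
```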
107. A second neural network structure is evaluated based on the output gap.
A second neural network structure is evaluated based on the output gap. Specifically, the user can obtain from the output gap the effect gap between the second neural network structure and the first neural network structure, and then judge whether the second neural network structure reaches the required accuracy. If it does, the model parameters of the second neural network structure can be further trained under its structure parameters; if it does not, the neural network structure search algorithm can be executed again using the output gap, so as to obtain a more accurate neural network structure. This is not limited here.
According to the technical scheme, the embodiments of the application have the following advantage: a first neural network structure is acquired, along with a second neural network structure to be evaluated that is obtained according to a neural network structure search algorithm; the same data set is input into both structures to obtain the output gap between the second neural network structure and the first, and the second neural network structure is then evaluated based on that output gap, so that the network structures obtained by the search can be understood more comprehensively.
Based on the embodiment described in fig. 1, another implementation manner of the neural network structure searching method provided by the present application may refer to fig. 2, which includes: step 201 to step 211.
201. A first neural network structure is obtained.
A first neural network structure is acquired, where the first neural network structure is a mature network with a good recognition effect on the target data set. Here the first neural network structure is exemplified by the DenseNet network structure, whose basic idea is consistent with ResNet but which establishes dense connections from every layer to all subsequent layers, from which its name derives. Another feature of DenseNet is feature reuse, implemented through the concatenation of features on the channel dimension. These features allow DenseNet to achieve better performance than ResNet with fewer parameters and lower computational cost. After the first neural network structure is determined, it may be trained with a corresponding data set in order to improve its ability to address the particular problem.
202. A second neural network structure is obtained.
A second neural network structure is obtained. The neural network structure search algorithm here is exemplified by gradient-based Differentiable Neural Architecture Search (DNAS), which relaxes the representation of the neural network structure to be continuous and searches the structure efficiently through gradient descent. First, each neuron of the model to be searched is used as a node of a directed acyclic graph, and each node represents a layer of the network. The operations that can be selected between nodes include: the zero operation, cross-layer links, convolution, pooling, and the like.
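A minimal sketch of this continuous relaxation (the candidate operation list and all names are illustrative assumptions): each edge of the graph computes a softmax-weighted mixture of candidate operations, so the discrete choice becomes differentiable in the structure parameters α.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One edge of the directed acyclic graph: a softmax-weighted
    mixture of candidate operations, differentiable in alpha."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                  # cross-layer link
            nn.Conv2d(channels, channels, 3, padding=1),    # convolution
            nn.MaxPool2d(3, stride=1, padding=1),           # pooling
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # structure params

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        # after the search, the operation with the greatest weight is kept
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```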
It is understood that, in order to obtain the second neural network structure, the method may further include the basic steps of the neural network structure search method, that is, determining the search space and determining the neural network structure search algorithm, where the determination of the search space may be determined according to the actual requirements of the user, so that the second neural network structure obtained by the method meets the requirements, and is not limited herein.
203. An original data set is acquired.
A raw data set to be used is acquired; the specific raw data set can be determined by the problem the user expects to solve, so that the neural network structure search and evaluation are targeted. It should be understood that the step of acquiring the data set is placed at this point in the sequence only for convenience of description and does not affect the actual implementation of the scheme; there is no temporal ordering constraint among steps 201 to 203, which may be determined according to the actual situation.
204. And performing data enhancement on the original data set.
Data enhancement is performed on the original data set to obtain the target data set. To improve the usefulness of the data set, the original data set can be enhanced: enhanced samples strongly correlated with the original samples are obtained through geometric transformations such as cropping, flipping, rotating, scaling and distortion, and through color transformations such as pixel perturbation, noise addition, illumination adjustment, contrast adjustment, and sample addition. That is, processing the data set increases the sample size of the network's input and further strengthens the network's ability to represent the target data set. The specific enhancement modes can be determined according to the actual situation, and are not limited here.
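Such an enhancement pipeline might look like the following sketch, assuming torchvision (the specific transforms and magnitudes are illustrative):

```python
import torchvision.transforms as T

# Geometric transformations (crop, flip, rotate) and color transformations
# (illumination/contrast jitter) enlarge the effective target data set.
augment = T.Compose([
    T.RandomResizedCrop(224),                      # geometric: crop + scale
    T.RandomHorizontalFlip(),                      # geometric: flip
    T.RandomRotation(15),                          # geometric: rotate
    T.ColorJitter(brightness=0.4, contrast=0.4),   # color: light and contrast
    T.ToTensor(),
])
```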
205. The target data set is input into a first neural network structure to obtain a first output result.
206. And inputting the target data set into a second neural network structure to obtain a second output result.
And respectively inputting the enhanced target data set into the first neural network structure and the second neural network structure, and correspondingly obtaining a first output result and a second output result. This step is similar to steps 104 to 105 in the embodiment corresponding to fig. 1, and detailed description thereof is omitted here. It can be understood that there is no logical relationship in time sequence between the steps 205 and 206, so that the execution sequence of the steps is not limited in the actual implementation process, and the steps may be executed sequentially according to a certain order, or may be executed simultaneously.
207. Calculating an output gap between the first output result and the second output result.
An output gap between the first output result and the second output result is calculated; here the output gap is represented by the KL divergence. For a neural network structure, the output for an input sample is the probability over all classes. Denoting the second output as p2 and the first output as p1, the KL divergence between the two outputs is calculated as:
$$\mathrm{KL}(p_1 \,\|\, p_2) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} p_1(X_i)_j \,\log\frac{p_1(X_i)_j}{p_2(X_i)_j}$$
where N is the number of samples, M is the number of classes, and X_i denotes the i-th sample.
The KL divergence is the relative entropy between the two, also called Kullback-Leibler divergence or information divergence; it is an asymmetric measure of the difference between two probability distributions, and the degree of the performance difference between the two network structures can be obtained using it.
208. Obtaining an initial loss function for the second neural network structure.
An initial loss function of the second neural network structure is obtained. A loss function evaluates the degree of difference between the predicted value and the true value of a network structure model; the initial loss function of the second neural network structure reflects the accuracy of the second neural network structure on the target validation set. Specifically, the categorical cross-entropy function can be adopted; other types of loss functions, such as the hinge loss function or the exponential loss function, can also be used in the actual implementation, and are not limited here.
It can be understood that the obtaining process of the initial loss function of the second neural network structure does not have a time-series logical relationship with other steps in the method, and only needs to be executed after the second neural network structure is determined, which is only for convenience of understanding and is not intended to limit the specific implementation sequence of each step in the scheme.
209. Taking the sum of the initial loss function and the output gap as a target loss function for the second neural network structure.
The sum of the initial loss function and the output gap is taken as the target loss function of the second neural network structure. In each step of the neural network search algorithm, whether the second neural network structure is optimal is judged with the initial loss function as the reference; however, using the initial loss function alone makes the network structure prone to instability. For example, in a search using the gradient-based differentiable neural architecture search algorithm, each neuron of the model to be searched is treated as a node of a directed acyclic graph, and each node represents a layer of the network. The operations each node may choose include: the zero operation, cross-layer links, convolution, pooling, and the like. The flow of information from a front node to a back node represents an operation on the front node, and which operation is selected is determined by the structure parameter α. Each choice has a weight, and in the end the choice with the greatest weight determines the operation at that node.
The weights over the candidate operations are calculated from the structure parameters α with the softmax function; reference may be made to the prior art for details, which are not repeated here.
The neural network structure as a whole has not only the structure parameters α but also the model parameters w; the purpose of the neural network structure search process is to obtain the structure parameters α and model parameters w with the least loss, so the optimization is a two-level (bilevel) optimization. In the optimization for finding the optimal α, the learnable parameters are few, which is a disadvantage in the bilevel optimization and can cause the collapse phenomenon of neural network structure search; a structure containing very many cross-layer links is one such collapse phenomenon. In this scheme, during the optimization of the structure parameters α, the target loss function comprising the initial loss function and the KL divergence is used to optimize α, so that the optimization is less likely to fail due to too few learnable parameters, improving the stability of the network.
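One α update under this scheme might be sketched as follows (assuming PyTorch; the plain sum of the two terms follows the description above, and all names are illustrative):

```python
import torch
import torch.nn.functional as F

def alpha_step(second_net, alpha_optimizer, batch, labels, p1):
    """One structure-parameter update on the target loss:
    initial (cross-entropy) loss plus the KL-divergence output gap."""
    logits = second_net(batch)
    initial_loss = F.cross_entropy(logits, labels)         # accuracy term
    gap = F.kl_div(F.log_softmax(logits, dim=-1), p1,
                   reduction="batchmean")                  # similarity term
    target_loss = initial_loss + gap                       # plain sum per the scheme
    alpha_optimizer.zero_grad()
    target_loss.backward()
    alpha_optimizer.step()
    return target_loss.item()
```

Here p1 denotes the first neural network structure's (detached) output probabilities on the same batch.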
210. Updating model parameters of the second neural network structure based on the initial loss function.
The model parameters of the second neural network structure are updated based on its original initial loss function. In the optimization of the second neural network structure, adjusting only the structure parameters tends not to correctly reflect the structure's performance on the target data set, and this would affect the update of the structure parameters of the second neural network structure; therefore, the model parameters of the second neural network structure are also updated correspondingly during the update process. The specific update of the model parameters may follow the existing update process for the model parameters of a neural network structure, and is not limited here.
211. Updating structural parameters of the second neural network structure based on the objective loss function.
The structure parameters of the second neural network structure are updated based on the target loss function, so as to obtain the target structure parameters of the second neural network structure when the target loss function is at its minimum value. When the second neural network structure conforms to the structure represented by α at that minimum, the loss of the network is smallest and the candidate neural network obtained is the most accurate.
212. And obtaining the target structure parameters of the second neural network structure when the target loss function is the minimum value.
The updates of the model parameters and the structure parameters are carried out alternately and cyclically, so as to obtain the target structure parameters of the second neural network structure when the target loss function is at its minimum value. The specific loop conditions can be set following the current practice of differentiable neural architecture search, finally obtaining the target structure parameters matched to the user's requirements. The structure parameters obtained in this way are influenced by two factors: the first is the initial loss function, representing the accuracy of the obtained second neural network structure; the second is the output gap between the second neural network structure and the first, i.e., their similarity. When the target structure parameters are obtained at the minimum of the target loss function, both the similarity to the first neural network structure and the accuracy of the second neural network structure are well reflected, so the second neural network structure has higher practicability and its potential can be better realized.
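The alternating loop of steps 210 to 212 might be sketched as follows; the split of parameters into model weights w and structure parameters α mirrors the DARTS-style bilevel scheme, and the optimizers, learning rates and names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def search(second_net, reference_net, train_loader, val_loader, epochs=50):
    """Alternate updates: w minimizes the initial loss on training data,
    alpha minimizes the target loss (initial loss + KL gap) on validation data."""
    alphas = [p for n, p in second_net.named_parameters() if "alpha" in n]
    weights = [p for n, p in second_net.named_parameters() if "alpha" not in n]
    w_opt = torch.optim.SGD(weights, lr=0.025, momentum=0.9)
    a_opt = torch.optim.Adam(alphas, lr=3e-4)

    for _ in range(epochs):
        for (x_tr, y_tr), (x_val, y_val) in zip(train_loader, val_loader):
            # (1) model-parameter step on the initial loss
            w_opt.zero_grad()
            F.cross_entropy(second_net(x_tr), y_tr).backward()
            w_opt.step()
            # (2) structure-parameter step on the target loss
            with torch.no_grad():
                p1 = F.softmax(reference_net(x_val), dim=-1)
            logits = second_net(x_val)
            target_loss = F.cross_entropy(logits, y_val) + F.kl_div(
                F.log_softmax(logits, dim=-1), p1, reduction="batchmean")
            a_opt.zero_grad()
            target_loss.backward()
            a_opt.step()
    return second_net
```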
Based on the embodiment described in fig. 1, another implementation manner of the neural network structure searching method provided by the present application may refer to fig. 3, which includes: step 301 to step 311.
301. A first neural network structure is obtained.
302. A second neural network structure is obtained.
303. An original data set is acquired.
304. And performing data enhancement on the original data set.
305. The target data set is input into a first neural network structure to obtain a first output result.
306. And inputting the target data set into a second neural network structure to obtain a second output result.
307. Calculating an output gap between the first output result and the second output result.
308. Obtaining an initial loss function for the second neural network structure.
309. Taking the sum of the initial loss function and the output gap as a target loss function for the second neural network structure.
Steps 301 to 309 are similar to the embodiment of fig. 2, and detailed description thereof is omitted here.
310. And updating the structural parameters of the second neural network structure based on the target loss function so as to obtain the target structural parameters of the second neural network structure when the target loss function is the minimum value.
The structure parameters of the second neural network structure are updated based on the target loss function, so as to obtain the target structure parameters of the second neural network structure when the target loss function is at its minimum value. It can be understood that the update of the structure parameters may also alternate with the update of the model parameters of the second neural network structure; this is not limited here.
311. Training a second neural network structure with the target structure parameters to update the model parameters of the second neural network structure with the target structure parameters.
The second neural network structure having the target structure parameters is trained, so as to update its model parameters. After the structure parameters of the second neural network structure are determined, its model parameters can be updated using a suitable data set; the specific training may follow the existing training process for the model parameters of a neural network structure, and is not limited here. With the structure of the neural network determined, adjusting the model parameters improves the second neural network structure's performance on the specific problem, further improving its performance and yielding a second neural network structure that meets the user's requirements.
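A sketch of this final retraining stage (assuming PyTorch; names and hyperparameters are illustrative), in which the structure stays fixed and only the model parameters of the derived network are updated:

```python
import torch
import torch.nn.functional as F

def retrain(derived_net, train_loader, epochs=100):
    """Update only the model parameters of the network derived from the
    target structure parameters; the structure itself is already fixed."""
    opt = torch.optim.SGD(derived_net.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            F.cross_entropy(derived_net(x), y).backward()
            opt.step()
    return derived_net
```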
As can be seen from the above embodiments, the neural network structure searching method of the present application adjusts the loss function involved in updating the neural network structure, so that it further includes a value representing the output gap between the second neural network structure to be evaluated and the first neural network structure, namely the KL divergence, which represents the performance gap between them. Thus, in the update of the structure parameters of the neural network, not only the accuracy of the structure to be evaluated but also its performance gap relative to a mature network is considered. This improves the stability of the neural network structure to be evaluated, ensures that the search process considers multiple aspects of the structure, and improves the effect of the obtained neural network structure.
Referring to fig. 4, an embodiment of the neural network structure search apparatus of the present application includes:
a first network obtaining unit 401, configured to obtain a first neural network structure;
a second network obtaining unit 402, configured to obtain a second neural network structure, where the second neural network structure is a network structure to be evaluated obtained according to a neural network structure search algorithm;
a target data set acquisition unit 403 for acquiring a target data set;
a first input unit 404, configured to input the target data set into the first neural network structure, so as to obtain a first output result;
a second input unit 405, configured to input the target data set into the second neural network structure, so as to obtain a second output result;
a calculating unit 406, configured to calculate an output gap between the first output result and the second output result;
an evaluation unit 407 for evaluating the second neural network structure based on the output gap.
In this embodiment, the flow executed by each unit in the neural network structure search device is similar to the method flow described in the embodiment corresponding to fig. 1, and is not described herein again.
Referring to fig. 5, another embodiment of the neural network structure search apparatus of the present application includes: a first network acquisition unit 501, a second network acquisition unit 502, a target data set acquisition unit 503, a first input unit 505, a second input unit 506, a calculation unit 507, and an evaluation unit 510. The above units are similar to the corresponding units in the embodiment of fig. 4 of the present application, and are not described again here.
Based on the neural network structure search device provided in the second aspect of the embodiment of the present application, optionally, the neural network structure search device further includes:
an initial loss function obtaining unit 508, configured to obtain an initial loss function of the second neural network structure;
a target loss function obtaining unit 509, configured to use a sum of the initial loss function and the output gap as a target loss function of the second neural network structure;
optionally, the evaluation unit 510 may be specifically configured to: evaluating the second neural network structure based on the objective loss function.
Optionally, the evaluation unit 510 may be specifically configured to: and updating the structure parameters of the second neural network structure based on the target loss function so as to obtain the target structure parameters of the second neural network structure when the target loss function is the minimum value.
Optionally, the evaluation unit 510 is specifically configured to: updating model parameters of the second neural network structure based on the initial loss function;
updating structural parameters of the second neural network structure based on the target loss function;
and alternately carrying out the updating process of the model parameters and the structure parameters to obtain the target structure parameters of the second neural network structure when the target loss function is the minimum value.
Optionally, the neural network structure search device further includes:
a model parameter training unit 511, configured to train the second neural network structure having the target structure parameters, so as to update the model parameters of the second neural network structure having the target structure parameters.
Optionally, the output gap is a KL divergence.
Optionally, the neural network structure search algorithm is a gradient-based differentiable neural architecture search method.
Optionally, the target data set obtaining unit is specifically configured to:
acquiring an original data set;
performing data enhancement on the original data set to obtain a target data set, wherein the data enhancement mode comprises the following steps: geometric transformation or color transformation.
In this embodiment, the flow executed by each unit in the neural network structure search device is similar to the method flow described in the embodiment corresponding to fig. 2 or fig. 3, and is not described again here.
Fig. 6 is a schematic structural diagram of a neural network structure search device according to an embodiment of the present disclosure, where the server 600 may include one or more Central Processing Units (CPUs) 601 and a memory 605, where the memory 605 stores one or more applications or data.
In this embodiment, the specific functional module division in the central processing unit 601 may be similar to the functional module division manner of each unit described in fig. 4 or fig. 5, and is not described herein again.
The memory 605 may be volatile storage or persistent storage, among other things. The program stored in the memory 605 may include one or more modules, each of which may include a sequence of instructions operating on a server. Still further, the central processor 601 may be configured to communicate with the memory 605 to execute a series of instruction operations in the memory 605 on the server 600.
The server 600 may also include one or more power supplies 602, one or more wired or wireless network interfaces 603, one or more input-output interfaces 604, and/or one or more operating systems, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
The central processing unit 601 may perform the operations of any one of the neural network structure searching methods in the embodiments shown in fig. 1 to fig. 3, and details are not repeated here.
Embodiments of the present application also provide a computer storage medium for storing computer software instructions for use as described above, including a program designed for performing a neural network structure search method.
The neural network structure searching method may be any one of the neural network structure searching methods described in the foregoing embodiments of fig. 1 to 3.
The embodiment of the present application further provides a computer program product, where the computer program product includes computer software instructions that can be loaded by a processor to implement the flow of the neural network structure searching method in any one of fig. 1 to fig. 3.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (19)

1. A neural network structure search method, comprising:
acquiring a first neural network structure;
obtaining a second neural network structure, wherein the second neural network structure is a network structure to be evaluated obtained according to a neural network structure search algorithm;
acquiring a target data set;
inputting the target data set into the first neural network structure to obtain a first output result;
inputting the target data set into the second neural network structure to obtain a second output result;
calculating an output gap between the first output result and the second output result;
evaluating the second neural network structure based on the output gap.
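By way of illustration only (not part of the claims), the following PyTorch sketch shows the flow of claim 1; the names first_net, second_net, target_loader, and gap_fn are assumptions made for this example.

```python
import torch

def evaluate_candidate(first_net, second_net, target_loader, gap_fn):
    """Score a searched structure (second_net) by the gap between its
    outputs and those of a reference structure (first_net) on the
    target data set (claim 1)."""
    first_net.eval()
    second_net.eval()
    total_gap, num_batches = 0.0, 0
    with torch.no_grad():
        for inputs, _labels in target_loader:
            first_out = first_net(inputs)    # first output result
            second_out = second_net(inputs)  # second output result
            total_gap += gap_fn(first_out, second_out).item()
            num_batches += 1
    return total_gap / num_batches  # a smaller gap means closer to the reference
```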
2. The neural network structure searching method according to claim 1,
wherein the method further comprises:
obtaining an initial loss function of the second neural network structure;
taking a sum of the initial loss function and the output gap as a target loss function for the second neural network structure;
wherein the evaluating the second neural network structure based on the output gap comprises:
evaluating the second neural network structure based on the target loss function.
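A minimal sketch of the loss composition in claim 2, assuming the initial loss function is an ordinary task loss such as cross-entropy; gap_fn is the hypothetical output-gap function from the previous sketch.

```python
import torch.nn.functional as F

def target_loss(second_out, labels, first_out, gap_fn):
    """Claim 2: target loss = initial loss + output gap (a plain sum)."""
    initial_loss = F.cross_entropy(second_out, labels)  # initial loss function
    output_gap = gap_fn(first_out, second_out)          # output gap
    return initial_loss + output_gap                    # target loss function
```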
3. The neural network structure searching method according to claim 2,
wherein the evaluating the second neural network structure based on the target loss function comprises:
updating the structural parameters of the second neural network structure based on the target loss function, so as to obtain the target structural parameters of the second neural network structure when the target loss function reaches its minimum value.
4. The neural network structure searching method according to claim 3,
wherein after the updating of the structural parameters of the second neural network structure based on the target loss function, the method further comprises:
training the second neural network structure having the target structural parameters, to update the model parameters of the second neural network structure having the target structural parameters.
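A hedged sketch of the retraining step of claim 4: once the target structural parameters are fixed, the model (weight) parameters of the derived network are trained in the ordinary way. The optimizer choice and hyperparameters below are assumptions, not taken from the claims.

```python
import torch
import torch.nn as nn

def retrain(second_net, train_loader, epochs=50, lr=0.025):
    """Update the model parameters of the second network that carries
    the target structural parameters (claim 4)."""
    optimizer = torch.optim.SGD(second_net.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    second_net.train()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            criterion(second_net(inputs), labels).backward()
            optimizer.step()
    return second_net
```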
5. The neural network structure searching method according to claim 3,
wherein the updating the structural parameters of the second neural network structure based on the target loss function, so as to obtain the target structural parameters of the second neural network structure when the target loss function reaches its minimum value, comprises:
updating model parameters of the second neural network structure based on the initial loss function;
updating structural parameters of the second neural network structure based on the target loss function;
alternately performing the updating of the model parameters and the structural parameters, so as to obtain the target structural parameters of the second neural network structure when the target loss function reaches its minimum value.
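A schematic of the alternating update of claim 5, in the spirit of DARTS-style differentiable search. The train/validation batch split, the cross-entropy initial loss, and the assumption that weight_opt and arch_opt optimize the model parameters and the structural parameters respectively are choices of this sketch, not requirements of the claim; gap_fn is the hypothetical output-gap function from the earlier sketches.

```python
import torch
import torch.nn.functional as F

def alternate_step(first_net, second_net, train_batch, valid_batch,
                   weight_opt, arch_opt, gap_fn):
    """One round of claim 5: update model parameters with the initial
    loss, then update structural parameters with the target loss."""
    # Step 1: model (weight) parameters, driven by the initial loss function.
    x_train, y_train = train_batch
    weight_opt.zero_grad()
    F.cross_entropy(second_net(x_train), y_train).backward()
    weight_opt.step()

    # Step 2: structural (architecture) parameters, driven by the target
    # loss function (initial loss plus output gap to the first network).
    x_valid, y_valid = valid_batch
    second_out = second_net(x_valid)
    with torch.no_grad():
        first_out = first_net(x_valid)  # reference outputs, not trained here
    loss = F.cross_entropy(second_out, y_valid) + gap_fn(first_out, second_out)
    arch_opt.zero_grad()
    loss.backward()
    arch_opt.step()
```

Repeating these two steps until convergence, and keeping the structural parameters at the lowest observed target loss, matches the alternation described in the claim.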
6. The neural network structure searching method according to claim 1,
wherein the output gap is a KL (Kullback-Leibler) divergence.
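One way to realize the output gap of claim 6 as a KL divergence. Which network plays the role of the reference distribution is not fixed by the claim; the direction below, KL(first || second), is an assumption of this sketch. Note that PyTorch's F.kl_div expects log-probabilities as its first argument and probabilities as its second.

```python
import torch.nn.functional as F

def kl_gap(first_out, second_out):
    """KL divergence between the two networks' softmax class distributions."""
    second_log_prob = F.log_softmax(second_out, dim=-1)
    first_prob = F.softmax(first_out, dim=-1)
    # F.kl_div(input, target) computes KL(target || input), input in log space.
    return F.kl_div(second_log_prob, first_prob, reduction='batchmean')
```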
7. The neural network structure searching method according to claim 1,
wherein the neural network structure search algorithm is a gradient-based differentiable neural architecture search method.
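For context only: gradient-based differentiable search relaxes the discrete choice among candidate operations into a softmax over learnable structural parameters, so the structure can be optimized by gradient descent. A minimal sketch follows; MixedOp and alpha are illustrative names, not from the claims.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Blend candidate operations with a softmax over structural parameters."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # structural parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

After the search, the operation with the largest alpha is typically kept, which is how a discrete structure is recovered from the relaxed one.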
8. The neural network structure searching method according to claim 1,
wherein the acquiring a target data set comprises:
acquiring an original data set;
performing data enhancement on the original data set to obtain the target data set, wherein the data enhancement includes geometric transformation or color transformation.
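An illustrative torchvision sketch of the enhancement in claim 8; the particular transforms (flip, rotation, color jitter) and their magnitudes are examples of the geometric and color transformations the claim names, not mandated values.

```python
from torchvision import transforms

# Geometric transformations: random flips and small rotations.
geometric = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
])

# Color transformation: jitter brightness, contrast, and saturation.
color = transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)

# Target data set = original data set with the enhancement applied.
augment = transforms.Compose([geometric, color, transforms.ToTensor()])
```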
9. A neural network structure search device, characterized by comprising:
a first network acquisition unit for acquiring a first neural network structure;
a second network acquisition unit for acquiring a second neural network structure, wherein the second neural network structure is a network structure to be evaluated obtained according to a neural network structure search algorithm;
a target data set acquisition unit for acquiring a target data set;
a first input unit for inputting the target data set into the first neural network structure to obtain a first output result;
a second input unit for inputting the target data set into the second neural network structure to obtain a second output result;
a calculating unit for calculating an output gap between the first output result and the second output result;
an evaluation unit for evaluating the second neural network structure based on the output gap.
10. The neural network structure search device according to claim 9, further comprising:
an initial loss function obtaining unit, configured to obtain an initial loss function of the second neural network structure;
a target loss function acquisition unit for taking the sum of the initial loss function and the output gap as a target loss function of the second neural network structure;
wherein the evaluation unit is specifically configured to: evaluate the second neural network structure based on the target loss function.
11. The neural network structure searching device according to claim 10,
wherein the evaluation unit is specifically configured to: update the structural parameters of the second neural network structure based on the target loss function, so as to obtain the target structural parameters of the second neural network structure when the target loss function reaches its minimum value.
12. The neural network structure searching device according to claim 11,
wherein the neural network structure search device further comprises:
a model parameter training unit for training the second neural network structure having the target structural parameters, to update the model parameters of the second neural network structure having the target structural parameters.
13. The neural network structure searching device according to claim 11,
wherein the evaluation unit is specifically configured to:
update the model parameters of the second neural network structure based on the initial loss function;
update the structural parameters of the second neural network structure based on the target loss function; and
alternately perform the updating of the model parameters and the structural parameters, so as to obtain the target structural parameters of the second neural network structure when the target loss function reaches its minimum value.
14. The neural network structure searching device according to claim 9,
wherein the output gap is a KL (Kullback-Leibler) divergence.
15. The neural network structure searching device according to claim 9,
wherein the neural network structure search algorithm is a gradient-based differentiable neural architecture search method.
16. The neural network structure searching device according to claim 9,
wherein the target data set acquisition unit is specifically configured to:
acquire an original data set; and
perform data enhancement on the original data set to obtain the target data set, wherein the data enhancement includes geometric transformation or color transformation.
17. A neural network structure search device, characterized by comprising:
a central processing unit, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
wherein the memory is a volatile memory or a persistent memory;
wherein the central processing unit is configured to communicate with the memory and execute the instructions in the memory, so as to perform, on the neural network structure search device, the method according to any one of claims 1 to 8.
18. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method according to any one of claims 1 to 8.
19. A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8.
CN202011043732.0A 2020-09-28 2020-09-28 Neural network structure searching method and related equipment Pending CN111882048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011043732.0A CN111882048A (en) 2020-09-28 2020-09-28 Neural network structure searching method and related equipment

Publications (1)

Publication Number Publication Date
CN111882048A true CN111882048A (en) 2020-11-03

Family

ID=73199177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011043732.0A Pending CN111882048A (en) 2020-09-28 2020-09-28 Neural network structure searching method and related equipment

Country Status (1)

Country Link
CN (1) CN111882048A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709917A (en) * 2017-01-03 2017-05-24 青岛海信医疗设备股份有限公司 Neural network model training method, device and system
CN110517335A (en) * 2018-02-07 2019-11-29 深圳市腾讯计算机系统有限公司 A kind of dynamic texture video generation method, device, server and storage medium
CN110533749A (en) * 2018-02-07 2019-12-03 深圳市腾讯计算机系统有限公司 A kind of dynamic texture video generation method, device, server and storage medium
CN109800821A (en) * 2019-01-31 2019-05-24 北京市商汤科技开发有限公司 Method, image processing method, device, equipment and the medium of training neural network
CN110020667A (en) * 2019-02-21 2019-07-16 广州视源电子科技股份有限公司 Searching method, system, storage medium and the equipment of neural network structure
CN110751267A (en) * 2019-09-30 2020-02-04 京东城市(北京)数字科技有限公司 Neural network structure searching method, training method, device and storage medium
CN110766142A (en) * 2019-10-30 2020-02-07 北京百度网讯科技有限公司 Model generation method and device
CN111126564A (en) * 2019-11-27 2020-05-08 东软集团股份有限公司 Neural network structure searching method, device and equipment
CN111145786A (en) * 2019-12-17 2020-05-12 深圳追一科技有限公司 Speech emotion recognition method and device, server and computer readable storage medium
CN111488476A (en) * 2020-04-03 2020-08-04 北京爱芯科技有限公司 Image pushing method, model training method and corresponding device
CN111612134A (en) * 2020-05-20 2020-09-01 鼎富智能科技有限公司 Neural network structure searching method and device, electronic equipment and storage medium
CN111553480A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network searching method and device, computer readable medium and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076962A (en) * 2021-05-14 2021-07-06 电子科技大学 Multi-scale target detection method based on micro neural network search technology
CN113469078A (en) * 2021-07-07 2021-10-01 西安电子科技大学 Hyperspectral image classification method based on automatic design long-time and short-time memory network

Similar Documents

Publication Publication Date Title
JP7079309B2 (en) Question answering processing methods, devices, electronic devices and storage media
WO2022068623A1 (en) Model training method and related device
CN112288086B (en) Neural network training method and device and computer equipment
US20180018555A1 (en) System and method for building artificial neural network architectures
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
WO2019232772A1 (en) Systems and methods for content identification
CN111382868A (en) Neural network structure search method and neural network structure search device
WO2022016556A1 (en) Neural network distillation method and apparatus
CN111178458A (en) Training of classification model, object classification method and device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN116010684A (en) Article recommendation method, device and storage medium
CN113128622B (en) Multi-label classification method and system based on semantic-label multi-granularity attention
CN112307048B (en) Semantic matching model training method, matching method, device, equipment and storage medium
CN111882048A (en) Neural network structure searching method and related equipment
CN112380421A (en) Resume searching method and device, electronic equipment and computer storage medium
CN115879508A (en) Data processing method and related device
CN114072809A (en) Small and fast video processing network via neural architectural search
US20240005157A1 (en) Methods and systems for unstructured pruning of a neural network
CN114490923A (en) Training method, device and equipment for similar text matching model and storage medium
Wang et al. Deep convolutional cross-connected kernel mapping support vector machine based on SelectDropout
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN113569059A (en) Target user identification method and device
CN111274787B (en) User intention prediction method and system
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
CN113378866B (en) Image classification method, system, storage medium and electronic device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201103)