WO2022063247A1 - Neural architecture search method and apparatus - Google Patents

Neural architecture search method and apparatus

Info

Publication number
WO2022063247A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
neural network
super
loss function
delay
Application number
PCT/CN2021/120434
Other languages
French (fr)
Chinese (zh)
Inventor
李明阳
周振坤
徐羽琼
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022063247A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

A neural architecture search method and apparatus in the field of AI, which can determine a high-performing neural network architecture in a short time using a small amount of computing resources while ensuring that the theoretical latency is consistent with the true latency. The method comprises: obtaining a supernetwork according to a target task; obtaining the latency of each deep learning operator in the supernetwork when run on an electronic device; determining a latency loss function according to those latencies; performing a training operation on the supernetwork and updating its model parameters according to the latency loss function and a network loss function until the updated supernetwork satisfies the conditions for running the target task on the electronic device; and determining a target neural network architecture according to the updated architecture parameters of each network layer. The supernetwork comprises a plurality of network layers, each network layer comprises a plurality of nodes, any two nodes of a network layer are connected by a deep learning operator, and the model parameters comprise the architecture parameters of each network layer.

Description

Neural network structure search method and apparatus
This application claims priority to Chinese Patent Application No. 202011043055.2, entitled "Neural Network Structure Search Method and Apparatus", filed with the State Intellectual Property Office on September 28, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence (AI), and in particular to a neural network structure search method and apparatus.
Background
With the rapid development of AI technology, neural network models of all kinds keep emerging. The performance of a neural network structure has a major influence on how well the resulting neural network model executes its task: the better the structure performs, the better the model executes the task. Therefore, when building a neural network model, determining a high-performing neural network structure is a research hotspot for those skilled in the art.
Neural architecture search (NAS) technology emerged to address this: NAS can automatically find the best-performing neural network structure within a predefined search space. However, when NAS is used in the prior art to search for a neural network structure, it consumes a large amount of computing resources and cannot guarantee that the theoretical latency is consistent with the true latency.
Summary of the Invention
This application provides a neural network structure search method and apparatus that can determine a high-performing neural network structure in a short time using few computing resources, while ensuring that the theoretical latency is consistent with the true latency.
In a first aspect, this application provides a neural network structure search method. A neural network structure search apparatus obtains a supernetwork according to a target task, obtains the latency of each deep learning operator in the supernetwork when run on an electronic device, and determines a latency loss function of the supernetwork according to those latencies. It then performs a training operation on the supernetwork, updating the model parameters of the supernetwork according to the latency loss function and a network loss function obtained during training, until the updated supernetwork satisfies the conditions for running the target task on the electronic device; finally, it determines the target neural network structure according to the updated architecture parameters of each network layer. The supernetwork comprises a plurality of network layers, each network layer comprises a plurality of nodes, any two nodes of a network layer are connected by a deep learning operator, and the model parameters comprise the architecture parameters of each of the network layers.
In this way, the supernetwork obtained according to the target task is a structure comprising multiple network layers, each with multiple nodes, where any two nodes are connected by deep learning operators; it contains every sub-network that could be used to execute the target task. The embodiments of this application train the supernetwork and update its model parameters, which include the architecture parameters of each network layer, until the updated supernetwork satisfies the conditions; the target neural network structure, i.e. the best-performing structure, can then be determined from the updated architecture parameters of each network layer. Compared with the prior art, in which a large number of sub-networks must be trained to obtain the target neural network structure, the embodiments of this application only need to train the supernetwork, which saves a large amount of computing resources, shortens the search time, and improves search efficiency. Moreover, because the model parameters are updated with reference to a latency loss function derived from the true latency of each deep learning operator on the electronic device, the theoretical latency and the true latency remain consistent when the target neural network structure is determined.
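As an illustrative sketch of the flow of the first aspect (Python/PyTorch, the interface of the supernetwork object, and all hyperparameter values below are assumptions introduced for this description, not part of the application):

```python
import torch

def search_architecture(supernet, train_loader, op_latency, lambda_la, max_steps):
    """Train the supernetwork and update its model parameters with a combined
    latency/network loss, then derive the target structure.
    supernet is assumed to expose expected_latency() and derive_architecture();
    op_latency maps each deep learning operator to its measured on-device latency.
    """
    optimizer = torch.optim.SGD(supernet.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()  # network loss: prediction vs. labels
    for step, (x, y) in enumerate(train_loader):
        loss_net = criterion(supernet(x), y)
        # latency loss built from the true per-operator latencies
        loss_la = lambda_la * supernet.expected_latency(op_latency)
        optimizer.zero_grad()
        (loss_net + loss_la).backward()  # one overall loss drives both objectives
        optimizer.step()                 # updates weights and architecture parameters
        if step + 1 >= max_steps:        # stand-in for "target-task conditions satisfied"
            break
    # keep the connection weights that satisfy the preset condition
    return supernet.derive_architecture()
```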
Optionally, in one possible implementation of this application, the above method of "determining the latency loss function of the supernetwork according to the latency of each deep learning operator on the electronic device" may include: the neural network structure search apparatus determines, according to a prestored correspondence between operators and network embedding coefficients, the network embedding coefficient corresponding to each deep learning operator; computes, for each deep learning operator, the product of its on-device latency and its corresponding network embedding coefficient; determines the sum of all the products; and then determines the latency loss function according to the sum and a latency consistency coefficient.
In this way, using the true on-device latency of each deep learning operator and its corresponding network embedding coefficient, the latencies of the discrete deep learning operators are assembled into a continuous latency constraint function, thereby ensuring latency consistency.
Optionally, in another possible implementation of this application, the architecture parameters of a network layer include the connection weight of each deep learning operator in that layer. In this case, the above method of "determining the target neural network structure according to the updated architecture parameters of each network layer" may include: the neural network structure search apparatus obtains, from the updated architecture parameters of each network layer, the connection weights whose values satisfy a preset condition, and determines the target neural network structure according to all the obtained connection weights.
In the prior art, the target neural network structure is determined from the single largest connection weight in each network layer's architecture parameters, whereas this application determines it from the connection weights whose values satisfy a preset condition, without limiting the number of connection weights kept per layer. When multiple connection weights are obtained from each network layer, the more connection weights are retained compared to the single one of the prior art, the more stable the resulting target neural network structure, and the better the task execution of the neural network model determined from it.
Optionally, in another possible implementation of this application, the above method of "updating the model parameters of the supernetwork according to the latency loss function and the network loss function obtained during training" may include: the neural network structure search apparatus determines an overall loss function of the supernetwork according to the latency loss function and the network loss function, and updates the model parameters of the supernetwork according to the overall loss function.
In this way, updating the model parameters of the supernetwork according to both the latency loss function and the network loss function ensures that the target neural network structure meets both the latency consistency and the network accuracy requirements.
Optionally, in another possible implementation of this application, the above method of "updating the model parameters of the supernetwork according to the overall loss function" may include: the neural network structure search apparatus determines gradient information for each model parameter according to the overall loss function, and adjusts each model parameter according to its gradient information, where the gradient information represents the adjustment coefficient of the corresponding model parameter.
This realizes updating the model parameters by means of gradients.
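As an illustrative worked form (the application does not fix a specific update rule; the gradient-descent form, the learning rate $\eta$, and the names $\mathrm{loss}_{net}$ and $\mathrm{loss}_{la}$ are assumptions), each model parameter $\theta$, including the architecture parameters of each network layer, can be adjusted by its gradient information:

$$\theta \leftarrow \theta - \eta \cdot \frac{\partial\,(\mathrm{loss}_{net} + \mathrm{loss}_{la})}{\partial \theta}$$

where $\mathrm{loss}_{net}$ is the network loss function, $\mathrm{loss}_{la}$ is the latency loss function, their sum is the overall loss function, and the partial derivative is the gradient information serving as the adjustment coefficient of $\theta$.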
In a second aspect, a neural network structure search apparatus is provided, comprising modules for executing the neural network structure search method of the first aspect or any of its possible implementations.
In a third aspect, a neural network structure search apparatus is provided, comprising a memory and a processor coupled to each other. The memory stores computer program code comprising computer instructions. When the processor executes the computer instructions, the neural network structure search apparatus performs the neural network structure search method of the first aspect or any of its possible implementations.
In a fourth aspect, a chip system is provided, applied to a neural network structure search apparatus. The chip system comprises one or more interface circuits and one or more processors, interconnected by lines. The interface circuits receive signals from the memory of the neural network structure search apparatus and send them to the processors; the signals include the computer instructions stored in the memory. When the processors execute the computer instructions, the apparatus performs the neural network structure search method of the first aspect or any of its possible implementations.
In a fifth aspect, a computer-readable storage medium is provided, comprising computer instructions that, when run on a neural network structure search apparatus, cause the apparatus to perform the neural network structure search method of the first aspect or any of its possible implementations.
In a sixth aspect, this application provides a computer program product comprising computer instructions that, when run on a neural network structure search apparatus, cause the apparatus to perform the neural network structure search method of the first aspect or any of its possible implementations.
For specific descriptions of the second to sixth aspects and their various implementations, reference may be made to the detailed description of the first aspect and its implementations; likewise, for their beneficial effects, reference may be made to the analysis of the beneficial effects of the first aspect, which is not repeated here.
These and other aspects of this application will be more clearly understood from the following description.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of a neural network structure search system provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of a computing apparatus provided by an embodiment of this application;
Fig. 3 is a first schematic flowchart of a neural network structure search method provided by an embodiment of this application;
Fig. 4 is a schematic structural diagram of a supernetwork provided by an embodiment of this application;
Fig. 5 is a second schematic flowchart of a neural network structure search method provided by an embodiment of this application;
Fig. 6 is a third schematic flowchart of a neural network structure search method provided by an embodiment of this application;
Fig. 7 is a schematic structural diagram of a neural network structure search apparatus provided by an embodiment of this application.
Detailed Description
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs; rather, such words are intended to present the related concepts in a concrete manner.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of this application, unless otherwise specified, "a plurality of" means two or more.
At present, a neural network model is constructed as follows: a neural network structure is built, then trained and evaluated to obtain a high-performing structure, and the neural network model is determined according to that structure.
Most existing neural network structures are designed by hand. For example, network structures such as ResNet, which excels at image classification, and the Transformer, which dominates machine translation, were designed by experts in the field. However, such designs rely on rich expert experience and large numbers of experiments, and suffer from long design time, low accuracy, and latency inconsistency. Here, latency inconsistency means that the theoretical latency of the neural network model differs from its true latency, where the true latency is the latency actually measured when the model runs on an electronic device.
NAS technology can automatically search a predefined search space for a high-performing neural network structure, thereby solving the problems of manually designed network structures.
In a first prior-art solution, reinforcement learning is used to search for the neural network structure. Specifically, the neural network structure search apparatus uses a recurrent neural network (RNN) as a controller and, within a preset search space, samples a sub-network using the controller parameters. The sub-network is trained to convergence to obtain model evaluation metrics, such as its accuracy and its floating-point operations per second (FLOPs), and the controller parameters are then updated according to these metrics. The apparatus then repeats the above operations: it samples another sub-network with the updated controller parameters, trains it to obtain new evaluation metrics, and updates the previously updated controller parameters according to the new metrics. This cycle continues until a high-performing sub-network is obtained, which is used as the network structure of the neural network model to be determined.
However, because the apparatus must train a large number of sub-networks to find the best-performing one, and the network weights must be initialized for every sub-network training, the computing-resource cost is high. Moreover, since the controller parameters are updated with reference to FLOPs, which cannot reflect a sub-network's true latency on different electronic devices, the consistency between the sub-network's theoretical latency and its true latency cannot be guaranteed.
In a second prior-art solution, an evolutionary algorithm and reinforcement learning are used to search for the neural network structure. On top of the first solution, the search process adds the following: the neural network structure search apparatus sends each sub-network to the electronic device and receives the true latency of running that sub-network returned by the device. In this way, the apparatus can refer to the true latency rather than FLOPs when updating the controller parameters, which solves the latency inconsistency problem of the first solution.
However, the second solution still consumes a large amount of computing resources, and sending the sub-networks to the electronic device adds a further large overhead, resulting in low search efficiency for the neural network structure.
In summary, neural network structure search in the prior art consumes a large amount of computing resources and cannot guarantee that the theoretical latency is consistent with the true latency.
To determine a high-performing neural network structure in a short time using few computing resources while keeping the theoretical latency consistent with the true latency, the embodiments of this application provide a neural network structure search method: a supernetwork is obtained according to the target task, and its latency loss function is determined according to the on-device latency of each deep learning operator in the supernetwork. While training the supernetwork, its model parameters are updated according to the latency loss function and the network loss function until the updated supernetwork satisfies the conditions for running the target task on the electronic device; finally, the target neural network structure is determined according to the updated architecture parameters of each network layer. The supernetwork obtained according to the target task comprises multiple network layers, each with multiple nodes connected pairwise by deep learning operators, and contains every sub-network that could be used to execute the target task. By training only the supernetwork and updating its model parameters, which include each layer's architecture parameters, until the conditions are met, the best-performing target neural network structure can be determined from the updated architecture parameters. Compared with the prior art, in which a large number of sub-networks must be trained, this saves a large amount of computing resources, shortens the search time, and improves search efficiency. Moreover, because the latency loss function used when updating the model parameters is derived from each deep learning operator's true on-device latency, the theoretical latency and the true latency remain consistent when the target neural network structure is determined.
The neural network structure search method provided by the embodiments of this application is executed by a neural network structure search apparatus.
In one scenario, the neural network structure search apparatus may be an electronic device, which may be a server or a terminal device. That is, the electronic device itself initiates the target task and determines the best-performing target neural network structure by executing the neural network structure search method provided by the embodiments of this application, thereby determining the neural network model. The electronic device then runs the neural network model to execute the target task.
In another scenario, the neural network structure search apparatus may be a server, and the neural network model is run by a terminal device. That is, the server determines the best-performing target neural network structure by executing the method provided by the embodiments of this application, thereby determines the neural network model, and sends the model to the terminal device; the terminal device runs the received neural network model to execute the target task. Specifically, the neural network structure search method provided by the embodiments of this application is applicable to a neural network structure search system.
Fig. 1 shows one structure of the neural network structure search system. As shown in Fig. 1, the system may include a server 11 and a terminal device 12, which establish a connection by wired or wireless communication.
The server 11 is the execution body of the neural network structure search method provided by the embodiments of this application. It is mainly used to train the supernetwork and update its model parameters according to the latency loss function and the network loss function until the updated supernetwork satisfies the conditions for running the target task on the terminal device 12. It is also used to determine the target neural network structure according to the updated architecture parameters of each network layer, determine the neural network model accordingly, and send the model to the terminal device 12.
In some embodiments, the server 11 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center. The embodiments of this application do not limit the specific form of the server; Fig. 1 shows a single server as an example.
The terminal device 12 is used to run the neural network model from the server 11 to execute the target task.
In some embodiments, the terminal device 12 may be: a mobile phone, a tablet computer, a notebook computer, a handheld computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, an internet of things (IoT) device, and so on. The embodiments of this application do not limit the specific form of the terminal device; Fig. 1 shows a mobile phone as an example of the terminal device 12.
The embodiments of this application do not limit the specific scenario to which the neural network structure search method is applied.
The basic hardware structures of the server 11 and the terminal device 12 are similar; both include the elements of the computing apparatus shown in Fig. 2. The hardware structures of the server 11 and the terminal device 12 are described below taking the computing apparatus of Fig. 2 as an example.
As shown in Fig. 2, the computing apparatus may include a processor 21, a memory 22, a communication interface 23, and a bus 24. The processor 21, the memory 22, and the communication interface 23 may be connected by the bus 24.
The processor 21 is the control center of the computing apparatus and may be a single processor or the collective name of multiple processing elements. For example, the processor 21 may be a general-purpose central processing unit (CPU) or another general-purpose processor, where a general-purpose processor may be a microprocessor or any conventional processor, e.g. a graphics processing unit (GPU) or a digital signal processor (DSP).
As an embodiment, the processor 21 may include one or more CPUs, such as CPU 0 and CPU 1 shown in Fig. 2.
The memory 22 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
In one possible implementation, the memory 22 may exist independently of the processor 21 and be connected to it through the bus 24, for storing instructions or program code. When the processor 21 calls and executes the instructions or program code stored in the memory 22, the neural network structure search method provided by the following embodiments of this application can be implemented.
In the embodiments of this application, the software programs stored in the memory 22 differ between the server 11 and the terminal device 12, so the functions implemented by the server 11 and the terminal device 12 differ. The functions executed by each device are described with reference to the following flowcharts.
In another possible implementation, the memory 22 may be integrated with the processor 21.
The communication interface 23 is used to connect the computing apparatus with other devices through a communication network, which may be an Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or the like. The communication interface 23 may include a receiving unit for receiving data and a sending unit for sending data.
The bus 24 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 2, but this does not mean that there is only one bus or one type of bus.
It should be noted that the structure shown in Fig. 2 does not constitute a limitation on the computing apparatus; besides the components shown in Fig. 2, the computing apparatus may include more or fewer components, combine certain components, or arrange the components differently.
Based on the hardware structure of the above computing apparatus, an embodiment of this application provides a neural network structure search method, described below with reference to the accompanying drawings. The embodiments of this application take as an example the scenario in which the server executes the neural network structure search method and determines the neural network model, and the terminal device receives and runs the neural network model.
When the neural network structure search method is applied to the neural network structure search system shown in Fig. 1, as shown in Fig. 3, the method may include the following steps 301-305.
301. The server obtains a supernetwork according to the target task.
The target task is used to instruct the construction of a neural network model to run on the terminal device. The supernetwork includes multiple network layers, each including multiple nodes; any two nodes in a network layer are connected by one or more deep learning operators. The operator types may include convolution, separable convolution, dilated convolution, average pooling, and so on. Every neural network structure sampled from the supernetwork can be used to execute the target task.
Typically, each network layer includes at least two nodes. The more nodes a network layer includes, the more deep learning operators it contains, the more computing resources are required, and the higher the accuracy of the output results.
When the server obtains the target task instructing construction of a neural network model to run on the terminal device, it may first determine the best-performing target neural network structure for that model. Specifically, the server may first obtain a supernetwork according to the target task.
It can be understood that the server obtains the supernetwork according to the target task as follows: the server determines whether a historical task identical or similar to the target task exists locally. If so, the server has previously built a supernetwork for that historical task and can obtain it directly from local storage. If not, the server has not previously built a supernetwork for the target task, and it builds one according to the target task and a preset search space. Of these two ways of obtaining the supernetwork, obtaining it directly from local storage reduces the workload of searching for the target neural network structure and thus improves search efficiency.
In addition, the target task may include the output type of the neural network model. For example, the target task may be to build a face recognition neural network model to run on the terminal device, which recognizes a face and outputs the corresponding person's name. As another example, the target task may be to build a hand pose estimation model to run on the terminal device, which recognizes the hand pose of a person in a picture.
Exemplarily, Fig. 4 is a schematic structural diagram of a supernetwork provided by an embodiment of this application. As shown in Fig. 4, the supernetwork includes three network layers as an example. The first network layer includes three nodes, connected by the following deep learning operators: 3×3 standard convolution, 5×5 standard convolution, and a skip connection. The second network layer includes three nodes, connected by 3×3 standard convolution, 5×5 standard convolution, and 3×3 separable convolution. The third network layer includes four nodes, connected by 3×3 standard convolution, 5×5 separable convolution, 3×3 dilated convolution, and a skip connection. Thus, as can be seen from Fig. 4, the supernetwork includes six deep learning operators in total: 3×3 standard convolution, 5×5 standard convolution, skip connection, 3×3 separable convolution, 5×5 separable convolution, and 3×3 dilated convolution. It can be understood that the node connections shown in each network layer of Fig. 4 are merely exemplary; the embodiments of this application do not limit how the nodes of each layer are connected or which deep learning operator connects two nodes.
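As a hedged sketch of how one edge of such a supernetwork could be represented in code (PyTorch, the class and operator names, and the softmax mixing are illustrative assumptions in the style of differentiable-NAS implementations, not the application's own code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate deep learning operators matching the six listed above; each
# lambda builds an operator that preserves the spatial size for c channels.
CANDIDATE_OPS = {
    "conv3x3":     lambda c: nn.Conv2d(c, c, 3, padding=1),
    "conv5x5":     lambda c: nn.Conv2d(c, c, 5, padding=2),
    "sep_conv3x3": lambda c: nn.Sequential(
        nn.Conv2d(c, c, 3, padding=1, groups=c), nn.Conv2d(c, c, 1)),
    "sep_conv5x5": lambda c: nn.Sequential(
        nn.Conv2d(c, c, 5, padding=2, groups=c), nn.Conv2d(c, c, 1)),
    "dil_conv3x3": lambda c: nn.Conv2d(c, c, 3, padding=2, dilation=2),
    "skip":        lambda c: nn.Identity(),
}

class MixedEdge(nn.Module):
    """One connection between two nodes: all candidate operators in parallel,
    mixed by learnable connection weights (the architecture parameters)."""
    def __init__(self, channels, op_names):
        super().__init__()
        self.ops = nn.ModuleList([CANDIDATE_OPS[n](channels) for n in op_names])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # connection weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)  # normalize the connection weights
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

# Example: one edge carrying all six candidates on a 16-channel feature map.
edge = MixedEdge(16, list(CANDIDATE_OPS))
out = edge(torch.randn(1, 16, 32, 32))  # shape preserved: (1, 16, 32, 32)
```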
302. The server obtains the latency of each deep learning operator in the supernetwork when run on the terminal device.
After constructing the supernetwork, the server may send each deep learning operator in it to the terminal device. The terminal device runs each received deep learning operator and returns the latency of running each operator to the server. In this way, the server obtains the on-device latency of every deep learning operator.
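A hedged sketch of what the measurement on the terminal device could look like (PyTorch and the warm-up/averaging scheme are assumptions; the application only requires that the device run each operator and return its latency):

```python
import time
import torch

def measure_op_latency(op, input_shape, warmup=10, runs=100):
    """Average wall-clock latency of a single deep learning operator.
    On accelerator hardware the timed region would additionally need device
    synchronization (e.g. torch.cuda.synchronize)."""
    op.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(warmup):           # warm-up to stabilize caches/clocks
            op(x)
        start = time.perf_counter()
        for _ in range(runs):
            op(x)
    return (time.perf_counter() - start) / runs  # seconds per run

# Example: latency of a 3x3 convolution on a 1x16x32x32 input.
lat = measure_op_latency(torch.nn.Conv2d(16, 16, 3, padding=1), (1, 16, 32, 32))
```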
303. The server determines the latency loss function of the supernetwork according to the on-device latency of each deep learning operator.
After obtaining the on-device latency of each deep learning operator, the server can determine the latency loss function of the whole supernetwork from these latencies. For details, refer to the description of steps 303A-303C below.
304. The server performs a training operation on the supernetwork and updates its model parameters according to the latency loss function and the network loss function obtained during training, until the updated supernetwork satisfies the conditions for running the target task on the terminal device.
The network loss function characterizes the difference between the supernetwork's predicted output and the data labels: the larger its output value, the larger the difference. The training process of the supernetwork can be understood as the process of reducing the output values of the latency loss function and the network loss function as much as possible.
After obtaining the supernetwork in step 301, the server may train it and update its model parameters according to the latency loss function determined in step 303 and the network loss function obtained during training, stopping the training process once the updated supernetwork satisfies the conditions for running the target task on the terminal device. The model parameters may include the architecture parameters of each of the multiple network layers.
It can be understood that the above conditions may include accuracy requirements and latency requirements. For example, the conditions may include that the accuracy of the output produced by running the supernetwork reaches a preset percentage, that the latency of running the supernetwork is less than a preset time value, and so on. These conditions are specified in advance for the target task and the hardware structure of the terminal device.
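As a hedged illustration of such a stopping condition (the function name and the preset percentage and time value below are made-up assumptions, not values from the application), the check could look like:

```python
def meets_conditions(accuracy, latency_ms, min_accuracy=0.95, max_latency_ms=20.0):
    """True once the supernetwork's output accuracy reaches the preset
    percentage and its running latency is below the preset time value."""
    return accuracy >= min_accuracy and latency_ms <= max_latency_ms
```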
305. The server determines the target neural network structure according to the updated architecture parameters of each network layer.
The target neural network structure is the best-performing network structure.
In a specific implementation, the architecture parameters of each network layer may include the connection weight of each deep learning operator among all the deep learning operators of that layer. In this case, the server may determine the target neural network structure as follows: it first obtains, from the updated architecture parameters of each network layer, the connection weights whose values satisfy a preset condition, and then determines the target neural network structure according to all the obtained connection weights.
It can be understood that the preset condition may be implemented in various ways.
In one possible implementation, the preset condition may be a preset number of connection weights per network layer: the connection weights satisfying the condition are the top preset number of weights after all connection weights in the layer are sorted in descending order. The preset number of connection weights may be the same or different for different network layers.
In another possible implementation, the preset condition may be connection weights greater than a preset weight value. In this way, the server can obtain, from the architecture parameters of each network layer, the connection weights greater than the preset weight value, and determine the target neural network structure according to all the obtained connection weights. Of course, the preset condition may also set a corresponding preset weight value for each network layer, and the preset weight values corresponding to different network layers may be the same or different.
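A hedged sketch covering both of the preset conditions just described (keep a preset number of weights per layer, or keep weights above a preset value); the function name and interface are illustrative assumptions:

```python
import torch

def select_connection_weights(layer_alphas, top_k=None, threshold=None):
    """layer_alphas: one 1-D tensor per network layer, holding the updated
    connection weight of each deep learning operator in that layer.
    Exactly one of top_k/threshold is assumed to be given.
    Returns, per layer, the indices of the operators to retain."""
    kept = []
    for alpha in layer_alphas:
        if top_k is not None:
            idx = torch.topk(alpha, top_k).indices         # preset-number condition
        else:
            idx = (alpha > threshold).nonzero().flatten()  # preset-value condition
        kept.append(sorted(idx.tolist()))
    return kept

# Example: keep the two largest connection weights in each of two layers.
arch = select_connection_weights([torch.rand(4), torch.rand(6)], top_k=2)
```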
In the prior art, the target neural network structure is determined from the single largest connection weight in each network layer's architecture parameters, whereas this application determines it from the connection weights whose values satisfy the preset condition, without limiting the number of connection weights kept per layer. When multiple connection weights are obtained from each network layer, the more connection weights are retained compared to the single one of the prior art, the more stable the resulting target neural network structure, and the better the task execution of the neural network model determined from it.
The neural network structure search method provided by the embodiments of this application obtains a supernetwork according to the target task and determines the supernetwork's latency loss function according to the on-device latency of each deep learning operator in it. During training of the supernetwork, its model parameters are updated according to the latency loss function and the network loss function until the updated supernetwork satisfies the conditions for running the target task on the electronic device; finally, the target neural network structure is determined according to the updated architecture parameters of each network layer. The supernetwork obtained according to the target task comprises multiple network layers, each with multiple nodes connected pairwise by deep learning operators, and contains every sub-network that could be used to execute the target task. Because only the supernetwork needs to be trained to obtain the best-performing target neural network structure, rather than the large numbers of sub-networks required in the prior art, a large amount of computing resources is saved, the search time is shortened, and search efficiency is improved. And because the model parameters are updated with reference to a latency loss function derived from each operator's true on-device latency, the theoretical latency and the true latency remain consistent when the target neural network structure is determined.
For example, suppose the target task is to build a hand pose estimation model to run on the terminal device, with the model running on the terminal device's GPU. If the target neural network structure must be determined within one day, a prior-art solution might require thousands of GPUs, whereas the method provided by the embodiments of this application might require only one or two GPUs. The method thus greatly reduces the computing resources needed to search for the target neural network structure.
Optionally, in the embodiments of this application, based on Fig. 3 and as shown in Fig. 5, step 303 may specifically include the following steps 303A-303C.
303A. The server determines the network embedding coefficient corresponding to each deep learning operator according to a prestored correspondence between operators and network embedding coefficients.
The purpose of the network embedding coefficients is to keep the latency loss function obtained from them consistent in meaning with the network loss function.
303B. The server computes, for each deep learning operator, the product of its on-device latency and its corresponding network embedding coefficient, and determines the sum of all the products.
After determining the network embedding coefficient corresponding to each deep learning operator, the server can compute the product of each operator's on-device latency and its corresponding network embedding coefficient, and add all the products to obtain the sum.
In a specific implementation, suppose the server uses $\mathrm{LAT}(S)$ to denote the set of delays of the deep learning operators in the super-network running on the terminal device:

$$\mathrm{LAT}(S) = \{\, lat_{operator_i} \mid operator_i \in S \,\}$$

where $S$ denotes the set of all deep learning operators in the super-network, $operator_i$ denotes a deep learning operator in the set $S$, and $lat_{operator_i}$ denotes the delay of the $i$-th deep learning operator in $S$ on the terminal device.
The server can then calculate, for each deep learning operator, the product of the delay of the deep learning operator running on the terminal device and the network embedding coefficient corresponding to the deep learning operator, and calculate the sum of all of the products. The sum satisfies the following formula:

$$lat_{sum} = \sum_{operator_i \in S} \alpha_i \cdot lat_{operator_i}$$

where $lat_{operator_i}$ denotes the delay of the $i$-th deep learning operator in $S$ on the terminal device, $\alpha_i$ denotes the network embedding coefficient corresponding to the $i$-th deep learning operator, and $lat_{sum}$ denotes the weighted sum over all deep learning operators in $S$, that is, the sum of the products of each operator's delay and its corresponding network embedding coefficient.
303C. The server determines the delay loss function according to the sum and a delay consistency coefficient.
After obtaining the sum $lat_{sum}$ of all of the products, the server can determine the delay loss function. The delay loss function satisfies the following formula:

$$loss_{la} = \lambda_{la} \cdot lat_{sum} = \lambda_{la} \cdot \sum_{operator_i \in S} \alpha_i \cdot lat_{operator_i}$$

where $\lambda_{la}$ denotes the delay consistency coefficient and $loss_{la}$ denotes the delay loss function.
It can be understood that the foregoing $\lambda_{la}$ is a variable: it is a matrix formed by the connection weights of each deep learning operator in each of the plurality of network layers of the super-network, and it is continuously updated during the training process.
In this way, by using the real delay of each deep learning operator running on the electronic device and the network embedding coefficient corresponding to each deep learning operator, the delays corresponding to the discrete deep learning operators are built into a continuous delay constraint function, thereby ensuring delay consistency.
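Continuing the earlier sketch, the delay loss can be assembled from the measured per-operator delays. The latency values, the use of the softmaxed connection weights as the network embedding coefficients $\alpha_i$, and the treatment of $\lambda_{la}$ as a scalar (the application describes it as a trainable quantity) are simplifying assumptions for illustration.

```python
# Measured on-device delays per candidate operator, in milliseconds
# (illustrative values; in practice these come from profiling on the device).
MEASURED_LAT_MS = torch.tensor([0.42, 0.81, 0.09, 0.01])  # conv3x3, conv5x5, pool, identity

def delay_loss(edges: list, lambda_la: float) -> torch.Tensor:
    """loss_la = lambda_la * sum_i alpha_i * lat_i over all operators in the super-network."""
    total = torch.zeros(())
    for edge in edges:  # every MixedEdge in the super-network
        coeffs = F.softmax(edge.alpha, dim=0)  # network embedding coefficients
        total = total + (coeffs * MEASURED_LAT_MS).sum()
    return lambda_la * total
```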
Optionally, in this embodiment of this application, based on FIG. 5 and as shown in FIG. 6, the foregoing step 304 may specifically include the following steps 304A-304B.
304A. The server performs a training operation on the super-network, and determines an overall loss function of the super-network according to the delay loss function and the network loss function.
The delay loss function ensures the delay consistency of the target neural network structure, and the network loss function ensures the precision requirement of the target neural network structure, that is, the accuracy requirement.
It can be understood that, to avoid overfitting of the super-network, the server also needs to take a network regularization term into account when determining the overall loss function.
Specifically, the server can determine the overall loss function of the super-network. The overall loss function satisfies the following formula:

$$loss = loss_{mse} + loss_{la} + loss_{reg}$$

where $loss_{la}$ denotes the delay loss function, $loss_{mse}$ denotes the network loss function, $loss_{reg}$ denotes the network regularization term, and $loss$ denotes the overall loss function.
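A sketch of the combination follows. The application names the network loss $loss_{mse}$, so mean squared error is used here; the L2 form of the regularization term and the function signature are illustrative assumptions.

```python
def overall_loss(pred, target, edges, lambda_la, weight_decay):
    loss_mse = F.mse_loss(pred, target)                 # network loss: accuracy requirement
    loss_la = delay_loss(edges, lambda_la)              # delay loss: delay consistency
    loss_reg = weight_decay * sum(                      # network regularization term
        p.pow(2).sum() for edge in edges for p in edge.ops.parameters())
    return loss_mse + loss_la + loss_reg
```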
304B. The server updates the model parameters of the super-network according to the overall loss function, until the updated super-network satisfies the condition for the target task to run on the terminal device.
In a specific implementation, the server may determine, according to the overall loss function, gradient information of each model parameter, where the gradient information is used to represent an adjustment coefficient of the corresponding model parameter. The server may then adjust each model parameter according to its gradient information.
In a case in which the model parameters include network parameters and architecture parameters of each network layer, and the network parameters of a network layer include the weight of each deep learning operator of that network layer, the server may first update the network parameters of each network layer. The updated network parameters satisfy the following formula:

$$W_N = W_N' - \nabla_w loss$$

where $w$ denotes the weight of one deep learning operator among the network parameters of a network layer, $W_N'$ denotes the value of the network parameter $w$ after the previous training iteration, $\nabla_w loss$ denotes the gradient information of the network parameter $w$, and $W_N$ denotes the updated value of the network parameter $w$.
The server can then update the architecture parameters of each network layer. The updated architecture parameters satisfy the following formula:
$$W_A = W_A' - \nabla_a loss$$

where $a$ denotes the connection weight of one deep learning operator among the architecture parameters of a network layer, $W_A'$ denotes the value of the architecture parameter $a$ after the previous training iteration, $\nabla_a loss$ denotes the gradient information of the architecture parameter $a$, and $W_A$ denotes the updated value of the architecture parameter $a$.
In this way, by updating the model parameters of the super-network according to the delay loss function and the network loss function, it is ensured that the target neural network structure meets both the delay consistency requirement and the network precision requirement.
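A minimal training-loop sketch of steps 304A-304B, continuing the earlier snippets, is given below. The alternating two-optimizer scheme, the optimizer choices, the hyperparameter values, and the dummy data are illustrative assumptions; the application specifies only that each model parameter is adjusted according to its gradient information.

```python
# A toy super-network of three mixed edges stacked sequentially.
edges = [MixedEdge(16) for _ in range(3)]
model = nn.Sequential(*edges)
lambda_la, weight_decay = 0.1, 3e-4  # illustrative hyperparameters

net_params = [p for edge in edges for p in edge.ops.parameters()]  # network parameters w
arch_params = [edge.alpha for edge in edges]                       # architecture parameters a
w_optimizer = torch.optim.SGD(net_params, lr=0.025)   # gradient step on w
a_optimizer = torch.optim.Adam(arch_params, lr=3e-4)  # gradient step on a

for _ in range(100):  # stand-in for iterating over a training data loader
    x = torch.randn(8, 16, 32, 32)  # dummy batch; a real task supplies data
    y = torch.randn(8, 16, 32, 32)
    # Update the network parameters w from the gradient of the overall loss.
    w_optimizer.zero_grad()
    overall_loss(model(x), y, edges, lambda_la, weight_decay).backward()
    w_optimizer.step()
    # Then update the architecture parameters a (connection weights) likewise.
    a_optimizer.zero_grad()
    overall_loss(model(x), y, edges, lambda_la, weight_decay).backward()
    a_optimizer.step()
```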
The foregoing mainly describes the solutions provided in the embodiments of this application from the perspective of the method. To implement the foregoing functions, corresponding hardware structures and/or software modules for performing each function are included. A person skilled in the art should be easily aware that, with reference to the algorithm steps of the examples described in the embodiments disclosed in this specification, this application can be implemented in a form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
FIG. 7 is a schematic structural diagram of a neural network structure search apparatus 70 according to an embodiment of this application. The neural network structure search apparatus 70 is configured to perform the neural network structure search method shown in any one of FIG. 3, FIG. 5, and FIG. 6. The neural network structure search apparatus 70 may include an obtaining unit 71, a determining unit 72, a training unit 73, and an updating unit 74.
The obtaining unit 71 is configured to obtain a super-network according to a target task, where the super-network includes a plurality of network layers, each network layer includes a plurality of nodes, and any two nodes of a network layer are connected by a deep learning operator; and is further configured to obtain a delay of each deep learning operator in the super-network running on an electronic device. For example, with reference to FIG. 3, the obtaining unit 71 may be configured to perform step 301 and step 302. The determining unit 72 is configured to determine a delay loss function of the super-network according to the delay, obtained by the obtaining unit 71, of each deep learning operator running on the electronic device. For example, with reference to FIG. 3, the determining unit 72 may be configured to perform step 303. The training unit 73 is configured to perform a training operation on the super-network obtained by the obtaining unit 71. For example, with reference to FIG. 3, the training unit 73 may be configured to perform the training operation on the super-network described in step 304. The updating unit 74 is configured to update model parameters of the super-network according to the delay loss function determined by the determining unit 72 and the network loss function obtained in the training process of the training unit 73, until the updated super-network satisfies a condition for the target task to run on the electronic device, where the model parameters include architecture parameters of each of the plurality of network layers. For example, with reference to FIG. 3, the updating unit 74 may be configured to perform, in step 304, the updating of the model parameters of the super-network according to the delay loss function and the network loss function obtained in the training process. The determining unit 72 is further configured to determine a target neural network structure according to the architecture parameters of each network layer updated by the updating unit 74. For example, with reference to FIG. 3, the determining unit 72 may be configured to perform step 305.
Optionally, the determining unit 72 is specifically configured to: determine, according to a pre-stored correspondence between operators and network embedding coefficients, a network embedding coefficient corresponding to each deep learning operator; determine, for each deep learning operator, the product of the delay of the deep learning operator running on the electronic device and the network embedding coefficient corresponding to the deep learning operator, and determine the sum of all of the products; and determine the delay loss function according to the sum and a delay consistency coefficient.
Optionally, the architecture parameters of a network layer include the connection weight of each deep learning operator of the network layer, and the determining unit 72 is specifically configured to: obtain, from the updated architecture parameters of each network layer, the connection weights whose values satisfy a preset condition; and determine the target neural network structure according to all of the obtained connection weights.
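For illustration, a sketch of this derivation step follows: on each edge, the operator whose connection weight satisfies the preset condition is kept, the condition being taken here, as one plausible choice, to be the largest softmax weight.

```python
OP_NAMES = ["conv3x3", "conv5x5", "max_pool", "identity"]  # matches the illustrative pool

def derive_architecture(edges: list) -> list:
    """Select, on each edge, the operator whose connection weight satisfies the
    preset condition (here: the maximum weight), yielding the target structure."""
    chosen = []
    for edge in edges:
        weights = F.softmax(edge.alpha, dim=0)
        chosen.append(OP_NAMES[int(weights.argmax().item())])
    return chosen
```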
Optionally, the updating unit 74 is specifically configured to: determine the overall loss function of the super-network according to the delay loss function and the network loss function; and update the model parameters of the super-network according to the overall loss function.
Optionally, the updating unit 74 is specifically configured to: determine, according to the overall loss function, gradient information of each model parameter, where the gradient information is used to represent an adjustment coefficient of the corresponding model parameter; and adjust each model parameter according to its gradient information.
Certainly, the neural network structure search apparatus 70 provided in this embodiment of this application includes, but is not limited to, the foregoing modules.
In actual implementation, the obtaining unit 71, the determining unit 72, the training unit 73, and the updating unit 74 may be implemented by the processor shown in FIG. 2. For the specific execution process, refer to the description of the neural network structure search method shown in FIG. 3, FIG. 5, or FIG. 6; details are not described here again.
Another embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions. When the computer instructions are run on a neural network structure search apparatus, the neural network structure search apparatus is enabled to perform the steps performed by the neural network structure search apparatus in the method procedure shown in the foregoing method embodiments.
Another embodiment of this application further provides a chip system, applied to a neural network structure search apparatus. The chip system includes one or more interface circuits and one or more processors, interconnected through lines. The interface circuits are configured to receive signals from a memory of the neural network structure search apparatus and send the signals to the processors, where the signals include the computer instructions stored in the memory. When the processors execute the computer instructions, the neural network structure search apparatus performs the steps performed by the neural network structure search apparatus in the method procedure shown in the foregoing method embodiments.
In another embodiment of this application, a computer program product is further provided. The computer program product includes computer instructions. When the computer instructions are run on a neural network structure search apparatus, the neural network structure search apparatus is enabled to perform the steps performed by the neural network structure search apparatus in the method procedure shown in the foregoing method embodiments.
In the foregoing embodiments, all or some of the functions may be implemented by software, hardware, firmware, or any combination thereof. When a software program is used for implementation, the functions may be implemented completely or partially in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer-executable instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are completely or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
The foregoing descriptions are merely specific implementations of this application. Any variation or replacement that a person skilled in the art can readily conceive of based on the specific implementations provided in this application shall fall within the protection scope of this application.

Claims (12)

  1. A neural network structure search method, comprising:
    obtaining a super-network according to a target task, wherein the super-network comprises a plurality of network layers, each network layer comprises a plurality of nodes, and any two nodes of a network layer are connected by a deep learning operator;
    obtaining a delay of each deep learning operator in the super-network running on an electronic device;
    determining a delay loss function of the super-network according to the delay of each deep learning operator running on the electronic device;
    performing a training operation on the super-network, and updating model parameters of the super-network according to the delay loss function and a network loss function obtained in the training process, until the updated super-network satisfies a condition for the target task to run on the electronic device, wherein the model parameters comprise architecture parameters of each of the plurality of network layers; and
    determining a target neural network structure according to the updated architecture parameters of each network layer.
  2. The neural network structure search method according to claim 1, wherein the determining a delay loss function of the super-network according to the delay of each deep learning operator running on the electronic device comprises:
    determining, according to a pre-stored correspondence between operators and network embedding coefficients, a network embedding coefficient corresponding to each deep learning operator;
    determining, for each deep learning operator, a product of the delay of the deep learning operator running on the electronic device and the network embedding coefficient corresponding to the deep learning operator, and determining a sum of all of the products; and
    determining the delay loss function according to the sum and a delay consistency coefficient.
  3. The neural network structure search method according to claim 1 or 2, wherein the architecture parameters of a network layer comprise a connection weight of each deep learning operator of the network layer, and the determining a target neural network structure according to the updated architecture parameters of each network layer comprises:
    obtaining, from the updated architecture parameters of each network layer, connection weights whose values satisfy a preset condition; and
    determining the target neural network structure according to all of the obtained connection weights.
  4. The neural network structure search method according to any one of claims 1 to 3, wherein the updating model parameters of the super-network according to the delay loss function and a network loss function obtained in the training process comprises:
    determining an overall loss function of the super-network according to the delay loss function and the network loss function; and
    updating the model parameters of the super-network according to the overall loss function.
  5. The neural network structure search method according to claim 4, wherein the updating the model parameters of the super-network according to the overall loss function comprises:
    determining, according to the overall loss function, gradient information of each model parameter, wherein the gradient information is used to represent an adjustment coefficient of the corresponding model parameter; and
    adjusting each model parameter according to the gradient information of the model parameter.
  6. A neural network structure search apparatus, comprising:
    an obtaining unit, configured to obtain a super-network according to a target task, wherein the super-network comprises a plurality of network layers, each network layer comprises a plurality of nodes, and any two nodes of a network layer are connected by a deep learning operator, and further configured to obtain a delay of each deep learning operator in the super-network running on an electronic device;
    a determining unit, configured to determine a delay loss function of the super-network according to the delay, obtained by the obtaining unit, of each deep learning operator running on the electronic device;
    a training unit, configured to perform a training operation on the super-network obtained by the obtaining unit; and
    an updating unit, configured to update model parameters of the super-network according to the delay loss function determined by the determining unit and a network loss function obtained in the training process of the training unit, until the updated super-network satisfies a condition for the target task to run on the electronic device, wherein the model parameters comprise architecture parameters of each of the plurality of network layers;
    wherein the determining unit is further configured to determine a target neural network structure according to the architecture parameters of each network layer updated by the updating unit.
  7. The neural network structure search apparatus according to claim 6, wherein the determining unit is specifically configured to:
    determine, according to a pre-stored correspondence between operators and network embedding coefficients, a network embedding coefficient corresponding to each deep learning operator;
    determine, for each deep learning operator, a product of the delay of the deep learning operator running on the electronic device and the network embedding coefficient corresponding to the deep learning operator, and determine a sum of all of the products; and
    determine the delay loss function according to the sum and a delay consistency coefficient.
  8. The neural network structure search apparatus according to claim 6 or 7, wherein the architecture parameters of a network layer comprise a connection weight of each deep learning operator of the network layer, and the determining unit is specifically configured to:
    obtain, from the updated architecture parameters of each network layer, connection weights whose values satisfy a preset condition; and
    determine the target neural network structure according to all of the obtained connection weights.
  9. The neural network structure search apparatus according to any one of claims 6 to 8, wherein the updating unit is specifically configured to:
    determine an overall loss function of the super-network according to the delay loss function and the network loss function; and
    update the model parameters of the super-network according to the overall loss function.
  10. The neural network structure search apparatus according to claim 9, wherein the updating unit is specifically configured to:
    determine, according to the overall loss function, gradient information of each model parameter, wherein the gradient information is used to represent an adjustment coefficient of the corresponding model parameter; and
    adjust each model parameter according to the gradient information of the model parameter.
  11. A neural network structure search apparatus, wherein the neural network structure search apparatus comprises a memory and a processor, the memory is coupled to the processor, the memory is configured to store computer program code, and the computer program code comprises computer instructions; when the processor executes the computer instructions, the neural network structure search apparatus performs the neural network structure search method according to any one of claims 1 to 5.
  12. A computer-readable storage medium, comprising computer instructions, wherein when the computer instructions are run on a neural network structure search apparatus, the neural network structure search apparatus is enabled to perform the neural network structure search method according to any one of claims 1 to 5.