CN111428854A - Structure searching method and structure searching device

Structure searching method and structure searching device

Info

Publication number
CN111428854A
Authority
CN
China
Prior art keywords
delay
unit
construction unit
construction
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010055831.4A
Other languages
Chinese (zh)
Inventor
Xiao An
Xu Yuhui
Xie Lingxi
Zhang Xiaopeng
Wei Longhui
Tian Qi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010055831.4A
Publication of CN111428854A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The application discloses a structure searching method and a structure searching apparatus in the field of artificial intelligence, which are used for constructing a neural network: by taking the predicted delay of a construction unit on hardware as a constraint condition, a neural network meeting the delay requirement is determined. The method comprises the following steps: acquiring a target task requesting creation of a neural network that runs on preset hardware; acquiring a super unit according to the target task, wherein the super unit comprises a plurality of nodes and any two of the nodes are connected through a plurality of basic operations; searching the super unit with the output of a delay prediction model as a constraint condition to determine at least one first construction unit, wherein the delay prediction model outputs a predicted delay, i.e. the predicted duration for a construction unit included in the super unit to produce an output result when run on the preset hardware; and stacking the at least one first construction unit to obtain the neural network.

Description

Structure searching method and structure searching device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a structure search method and a structure search apparatus.
Background
In recent years, deep neural networks have achieved great success in processing and analyzing various media signals such as images, videos, and voices. A well-performing neural network often has a delicate network structure that requires highly skilled and experienced human experts to spend a great deal of effort designing. Neural network structure search replaces this manual design process: it automatically searches for a neural network structure, obtains structures with excellent performance, and achieves excellent results on tasks such as image recognition, image semantic segmentation, and natural language processing.
Generally, neural network structure search may specifically include searching for building units in a search space and then building a deep neural network by stacking the searched building units. However, when searching for a neural network structure, the delay of running the neural network on hardware needs to be considered in addition to the output accuracy of the neural network. Especially in scenarios with strict delay requirements, a neural network with a large running delay takes longer to output a result, which degrades the user experience.
Disclosure of Invention
The application discloses a structure searching method and a structure searching apparatus in the field of artificial intelligence, which are used for constructing a neural network: by taking the predicted delay of a construction unit on hardware as a constraint condition, a neural network meeting the delay requirement is determined.
In a first aspect, the present application provides a structure search method, including:
acquiring a target task, wherein the target task is used for requesting to create a neural network running on preset hardware; acquiring a super unit according to the target task, wherein the super unit comprises a plurality of nodes, and any two nodes in the plurality of nodes are connected through a plurality of basic operations; searching the super unit by taking the output of a delay prediction model as a constraint condition to determine at least one first construction unit, wherein the delay prediction model is used for outputting a predicted delay, the predicted delay is the predicted duration for a construction unit included in the super unit to obtain an output result when running on the preset hardware, the delay prediction model is obtained by training according to information of a plurality of second construction units, the information of the plurality of second construction units comprises the structure of each second construction unit and the operation delay of each second construction unit, the operation delay is the duration for each second construction unit to obtain an output result when running on the preset hardware, each of the at least one first construction unit, each construction unit included in the super unit, and each second construction unit comprises a plurality of nodes, any two nodes in the plurality of nodes are connected through at most one of the plurality of basic operations, and each node in the plurality of nodes is connected with at least one of the plurality of basic operations; and stacking the at least one first construction unit to obtain the neural network.
In the embodiment of the application, after the target task is obtained, the super unit is searched with the output of the delay prediction model as a constraint condition to obtain at least one construction unit, and the at least one construction unit is stacked to obtain the neural network requested by the target task. The neural network takes the predicted delay output by the delay prediction model as a constraint condition, and the delay prediction model is trained on operation delays measured when the second construction units run on the preset hardware. Therefore, the application searches for construction units with the predicted delay as an added constraint condition, and further obtains a neural network that takes the predicted delay as a constraint condition. For example, for scenes with strict delay requirements, construction units with low delay can be determined, so that the time the neural network spends outputting a result is reduced, and the user experience is improved.
In a possible embodiment, before searching the super unit with the output of the delay prediction model as a constraint condition, the method may further include: acquiring the structure of each second construction unit; measuring the operation delay of each second construction unit; and training based on the structure of each second construction unit, the operation delay of each second construction unit, and a preset regression model to obtain the delay prediction model.
In the embodiment of the application, before the super unit is searched, the delay prediction model for outputting the predicted delay of a construction unit is obtained by training on the structures and operation delays of the second construction units. The delay prediction model can then output predicted delays directly, so the operation delay of a construction unit on the preset hardware does not need to be measured each time the super unit is updated, which improves the efficiency of searching the super unit for construction units meeting the constraint condition.
In a possible embodiment, the training based on the structure of each second construction unit, the operation delay of each second construction unit, and the preset regression model may include: dividing the plurality of second construction units into two types to obtain first-type construction units and second-type construction units; and performing M iterations of training based on the preset regression model according to the structures of the first-type construction units and the second-type construction units to obtain the delay prediction model, where M is a positive integer. The K-th of the M iterations may include: training the delay prediction model obtained in the (K-1)-th iteration according to the structures of the first-type construction units and the operation delay of each second construction unit among the first-type construction units, to obtain a temporary delay prediction model for the K-th iteration, where K is a positive integer not greater than M; and verifying the temporary delay prediction model according to the structure and operation delay of each second construction unit among the second-type construction units, and updating the temporary delay prediction model according to the verification result to obtain the delay prediction model of the K-th iteration.
In the embodiment of the application, the training of the delay prediction model can be performed in an iterative training mode, so that the prediction delay which can be output by the trained delay prediction model is closer to the actually measured delay, and the accuracy of the prediction delay is improved.
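For illustration, the train-and-verify loop above can be sketched in Python. The sketch below is not part of the application: it assumes a multi-layer perceptron regressor as the preset regression model and uses synthetic structure encodings and latencies, since the application does not fix a particular regression model or encoding.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    # Hypothetical data: each row encodes the structure of one second
    # construction unit (e.g. a flattened one-hot matrix of the basic
    # operation chosen on each edge), and y holds the latency measured
    # on the preset hardware (ms).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 14 * 8)).astype(np.float32)  # 14 edges x 8 ops
    y = X.sum(axis=1) * 0.1 + rng.normal(0, 0.05, size=500)        # placeholder latencies

    # Split the second construction units into the two types described
    # above: a first type for fitting and a second type for verification.
    X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = MLPRegressor(hidden_layer_sizes=(64, 64), warm_start=True, max_iter=50)

    M = 10  # number of training iterations
    for k in range(M):
        model.fit(X_fit, y_fit)  # K-th training pass (warm start keeps weights)
        val_err = np.mean((model.predict(X_val) - y_val) ** 2)
        print(f"iteration {k}: validation MSE = {val_err:.4f}")
        # A real implementation could use val_err to update the temporary
        # delay prediction model or decide early stopping; here it is
        # simply reported.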
In a possible embodiment, the searching the super unit with the output of the delay prediction model as a constraint condition to determine at least one first building unit may include: performing N times of iterative updating on the structural parameter set of the super unit by taking the output of the delay prediction model as a constraint condition, wherein N is a positive integer; and determining at least one first construction unit according to the structure parameter set of the super unit updated by the Nth iteration.
In the embodiment of the present application, the output of the delay prediction model may be used as a constraint condition to iteratively update the structure parameter set of the super unit until convergence. The subsequently determined at least one first construction unit is thus also associated with the predicted delay, resulting in a neural network that satisfies the delay constraint.
In one possible implementation, the P-th of the N iterations may include: sampling the super unit according to the structure parameter set of the super unit to obtain a plurality of third construction units, wherein the structure parameter set comprises a plurality of structure parameters, and each structure parameter is the weight of a basic operation connecting two nodes of the super unit; outputting the predicted delays of the plurality of third construction units through the delay prediction model; and updating the structure parameter set obtained in the (P-1)-th update with the predicted delays of the plurality of third construction units as a constraint condition to obtain the structure parameter set of the P-th update, where P is a positive integer not greater than N.
In the embodiment of the present application, the super unit may be sampled to obtain a plurality of third construction units, and then the structure parameter set obtained in the last update is updated with the predicted delays of the plurality of third construction units as a constraint condition to obtain the structure parameter set of the current iteration. After N iterations, a converged structure parameter set can be obtained, yielding one or more first construction units that conform to the delay constraint strength, and thus a neural network that conforms to the delay constraint strength.
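As a minimal sketch of this iteration, the following Python fragment samples third construction units from a toy super unit and queries a stand-in delay predictor. All dimensions, costs, and function names are hypothetical; the update of the structure parameter set with these delays as a constraint is sketched after the joint loss function below.

    import torch
    import torch.nn.functional as F

    # Illustrative dimensions: E edges in the super unit, O candidate basic
    # operations per edge. alpha is the structure parameter set (one weight
    # per basic operation on each edge).
    E, O = 14, 8
    alpha = torch.zeros(E, O, requires_grad=True)

    def sample_third_units(alpha, num_samples=4):
        """Sample discrete third construction units from the super unit:
        each unit keeps one basic operation per edge, drawn according to
        the softmax of the structure parameters."""
        probs = F.softmax(alpha.detach(), dim=-1)
        return [torch.multinomial(probs, 1).squeeze(-1) for _ in range(num_samples)]

    def delay_prediction_model(unit):
        """Stand-in for the trained delay prediction model. A real
        implementation would encode the unit and run the trained
        regressor; a fixed per-operation cost table is used here
        purely for illustration."""
        op_cost = torch.linspace(0.1, 0.8, O)
        return op_cost[unit].sum()

    units = sample_third_units(alpha)
    pred_delays = [delay_prediction_model(u) for u in units]
    print([f"{d:.2f}" for d in map(float, pred_delays)])
    # Updating alpha with these predicted delays as a constraint is
    # sketched after the joint loss function below.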
In a possible implementation manner, updating the structure parameter set obtained in the (P-1)-th update with the predicted delays of the plurality of third construction units as a constraint condition includes: calculating an expected delay of the super unit from the predicted delays of the plurality of third construction units; updating the joint loss function of the super unit with the expected delay as a constraint condition; and updating the structure parameter set according to the joint loss function.
In the embodiment of the application, the expected delay of the super unit can be calculated from the predicted delays of the plurality of third construction units, the joint loss function of the super unit is updated with the expected delay as a constraint condition, and the structure parameter set is then updated according to the joint loss function. Thus, when the structure parameter set of the super unit is updated, the expected delay can be taken into account, so that the updated structure parameter set is associated with the predicted delays of the plurality of third construction units, yielding at least one first construction unit that conforms to the delay constraint strength.
In one possible embodiment, the joint loss function is Ltotal(α) = Lval(α) + λ·LPM(α), where Lval(α) is a preset loss function, LPM(α) is the expected delay, α is the structure parameter set, λ is the delay constraint strength, and λ is determined according to the target task.
In the embodiment of the present application, the expected delay and the delay constraint strength are added to the loss, so that the subsequently updated structure parameter set is directly associated with the expected delay and the delay constraint strength; the obtained at least one first construction unit is thus associated with the expected delay and the delay constraint strength, yielding at least one construction unit that conforms to the delay constraint strength.
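A minimal sketch of this joint loss is given below, assuming a differentiable expected delay computed from per-operation predicted costs and a placeholder validation loss; the actual Lval(α) and delay prediction model of the application are not reproduced here.

    import torch
    import torch.nn.functional as F

    E, O = 14, 8
    alpha = torch.zeros(E, O, requires_grad=True)
    op_cost = torch.linspace(0.1, 0.8, O)  # illustrative per-op predicted delays

    def expected_delay(alpha):
        """LPM(alpha): expectation of the predicted delay under the softmax
        distribution defined by the structure parameters, a differentiable
        stand-in for averaging the predicted delays of sampled third units."""
        probs = F.softmax(alpha, dim=-1)   # weights sum to 1 on each edge
        return (probs * op_cost).sum()

    def joint_loss(alpha, lam):
        # Lval(alpha) would normally be the validation loss of the super
        # unit; a placeholder quadratic term is used so the example runs
        # standalone.
        l_val = (alpha ** 2).mean()
        return l_val + lam * expected_delay(alpha)  # Ltotal = Lval + lambda*LPM

    lam = 0.5  # delay constraint strength, determined by the target task
    opt = torch.optim.SGD([alpha], lr=0.1)
    for _ in range(50):
        opt.zero_grad()
        loss = joint_loss(alpha, lam)
        loss.backward()
        opt.step()
    print(f"expected delay after update: {float(expected_delay(alpha)):.3f}")

Running this sketch, the expected delay decreases over the iterations because the softmax weights shift toward the cheaper operations, which is the qualitative behavior the constraint is meant to produce.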
In one possible embodiment, after the N-th update, the operation delay of the first construction unit derived from the N-th updated structure parameter set is close to or equal to the expected delay in the N-th update. It is to be understood that the operation delay of the first construction unit is equal to the expected delay, or that the difference between the operation delay of the first construction unit and the expected delay is smaller than a threshold. Therefore, with the expected delay as a constraint, the present application can make the resulting operation delay of the first construction unit close to or equal to the expected delay, obtaining a first construction unit that conforms to the delay expectation and thus a neural network that conforms to the delay expectation.
In a possible implementation, outputting the predicted delays of the plurality of third construction units through the delay prediction model may include: encoding the structures of the plurality of third construction units to obtain encoded data of the plurality of third construction units; and inputting the encoded data of the plurality of third construction units into the delay prediction model to obtain the predicted delay of each third construction unit.
In the embodiment of the present application, encoding the third construction units and inputting the encoded data to the delay prediction model provides a specific way of outputting the predicted delays.
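One straightforward encoding, given purely as an illustration and not prescribed by the application, is a flattened one-hot matrix over edges and candidate operations:

    import numpy as np

    def encode_unit(chosen_ops, num_ops):
        """chosen_ops[i] is the index of the basic operation used on edge i
        (or -1 if the edge is absent). Returns a flat vector that can be
        fed to the regression-based delay prediction model."""
        code = np.zeros((len(chosen_ops), num_ops), dtype=np.float32)
        for edge, op in enumerate(chosen_ops):
            if op >= 0:
                code[edge, op] = 1.0
        return code.reshape(-1)

    unit = [2, 0, -1, 5, 3]                     # toy unit: 5 edges, edge 2 unused
    print(encode_unit(unit, num_ops=8).shape)   # (40,) -> predictor input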
In one possible embodiment, the method further comprises: and constructing based on a search space to obtain the super unit, wherein the search space comprises a plurality of basic operations, and the plurality of basic operations comprise operations corresponding to the target task.
In the embodiment of the application, the super unit can be constructed based on the search space, and a specific mode for acquiring the super unit is provided. In different scenarios, the required basic operations are different, and the type and number of basic operations are associated with the target task.
In one possible implementation, the search space is a differentiable search space. The method and the apparatus are suitable for differentiable search spaces: for the differentiable-search-space scenario, the output of the delay prediction model is added as a constraint condition and at least one delay-associated first construction unit is searched for, realizing a neural network in the differentiable search space that is more efficient, more accurate, and more hardware-friendly.
In a second aspect, the present application provides a structure search apparatus having a function of implementing the structure search method of the first aspect. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.
In a third aspect, the present application provides a structure search apparatus, comprising: a processor and a memory, wherein the processor and the memory are interconnected by a line, and the processor calls the program code in the memory to execute the processing-related functions in the structure search method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a digital processing chip, where the chip includes a processor and a memory, where the memory and the processor are interconnected by a line, and the memory stores instructions, and the processor is configured to perform functions related to processing as in the first aspect or any one of the optional implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method in the first aspect or any optional implementation manner of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product containing instructions, which when run on a computer, cause the computer to perform the method of the first aspect or any one of the optional embodiments of the first aspect.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence main framework provided by an embodiment of the present application;
FIG. 2 is a system architecture diagram according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another convolutional neural network structure provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a neural network processor according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a cloud system according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of a structure searching method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a super unit according to an embodiment of the present application;
FIG. 9 is a schematic diagram of structure parameters provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a building unit according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a first building unit according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a second building unit according to an embodiment of the present application;
FIG. 13A is a schematic diagram illustrating a delay prediction model training method according to an embodiment of the present application;
FIG. 13B is a schematic diagram of structure encoding according to an embodiment of the present application;
FIG. 13C is a schematic diagram of network parameters provided in an embodiment of the present application;
FIG. 14 is a schematic structural diagram of another super unit provided in an embodiment of the present application;
FIG. 15 is a schematic structural diagram of another first building unit provided in an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a neural network according to an embodiment of the present application;
FIG. 17 is a schematic flowchart of another structure searching method according to an embodiment of the present application;
FIG. 18 is a schematic flowchart of another structure searching method according to an embodiment of the present application;
FIG. 19A is a schematic flowchart of another structure searching method according to an embodiment of the present application;
FIG. 19B is a schematic flowchart of another structure searching method according to an embodiment of the present application;
FIG. 20 is a schematic structural diagram of a structure searching apparatus according to an embodiment of the present application;
FIG. 21 is a schematic structural diagram of another structure searching apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the field of artificial intelligence.
The artificial intelligence main framework is set forth below in terms of two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition onward, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement.
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (provision and processing technology implementation) of artificial intelligence to the industrial ecology of the system.
(1) Infrastructure:
the infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform. Communicating with the outside through a sensor; the computing power is provided by intelligent chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA and the like); the basic platform comprises distributed computing framework, network and other related platform guarantees and supports, and can comprise cloud storage and computing, interconnection and intercommunication networks and the like. For example, sensors and external communications acquire data that is provided to intelligent chips in a distributed computing system provided by a base platform for computation.
(2) Data of
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphics, images, voice, video and text, and also relates to the data of the internet of things of traditional equipment, including service data of the existing system and sensing data of force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference refers to the process of simulating the intelligent reasoning of humans in a computer or intelligent system, using formalized information to think about and solve problems according to an inference control strategy; a typical function is searching and matching.
Decision making refers to the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities can be formed, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing (e.g., image recognition, object detection, etc.), voice recognition, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, commercialize intelligent information decision making, and realize practical applications. The application fields mainly include: intelligent manufacturing, intelligent transportation, smart home, intelligent medical treatment, intelligent security, automatic driving, safe city, intelligent terminals, and the like.
Referring to FIG. 2, a system architecture 200 is provided in an embodiment of the present application. The system architecture includes a database 230 and a client device 240. The data collection device 260 is used to collect data and store it in the database 230, and the training module 220 generates the target model/rule 201 based on the data maintained in the database 230. How the training module 220 obtains the target model/rule 201 based on the data is described in more detail below; in the following embodiments of the present application, the target model/rule 201 is a neural network formed by stacking one or more first building units, as described in step 704 below.
The operation of each layer in a deep neural network can be described by the mathematical expression y = a(W·x + b). At the physical level, the work of each layer can be understood as completing a transformation from input space to output space (i.e., from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering dimensions; 2. zooming in/out; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are completed by W·x, operation 4 by +b, and operation 5 by a(). The word "space" is used here because the objects being classified are not single things but a class of things; space refers to the collection of all individuals of that class. W is a weight vector, in which each value represents the weight of one neuron in that layer of the neural network. The vector W determines the spatial transformation from input space to output space described above; that is, the weights of each layer control how the space is transformed. The purpose of training a deep neural network is ultimately to obtain the weight matrices of all layers of the trained neural network. The training process of a neural network is therefore essentially learning how to control the spatial transformation, and more specifically learning the weight matrices, which in the following embodiments of the present application are refined into a structure parameter set and a network parameter set; refer to the related description of FIG. 2 below.
Because the output of the deep neural network should be as close as possible to the target value, the weight vector of each layer can be updated according to the difference between the current predicted value and the target value (of course, an initialization process is usually performed before the first update, i.e., parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight values in the weight matrices are adjusted to lower the prediction, and the adjustment continues until the value output by the neural network approaches or equals the target value. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured; this is done by a loss function or an objective function, important equations for measuring that difference. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, and training the neural network can then be understood as the process of reducing this loss as much as possible.
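As a small numeric illustration of y = a(W·x + b) and the loss comparison described above (tanh and squared error are chosen arbitrarily here, and all values are toy stand-ins):

    import numpy as np

    # One layer computes y = a(W*x + b): W*x scales/rotates/changes
    # dimension, +b translates, and the activation a(.) "bends" the space.
    def layer(x, W, b):
        return np.tanh(W @ x + b)  # tanh as an example activation a(.)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 3))    # maps a 3-d input space to a 2-d output space
    b = np.zeros(2)
    x = np.array([1.0, 0.5, -0.2])
    y_pred = layer(x, W, b)

    # The loss function measures how far the prediction is from the target;
    # training adjusts W and b (the weight matrices of all layers) to reduce it.
    y_target = np.array([0.3, -0.1])
    loss = np.mean((y_pred - y_target) ** 2)  # squared-error loss as an example
    print(f"prediction {y_pred}, loss {loss:.4f}")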
The calculation module may include the training module 220, and the target model/rule obtained by the training module 220 may be applied to different systems or devices. In FIG. 2, the execution device 210 is configured with a transceiver 212, which may be a wireless transceiver, an optical transceiver, a wired interface (such as an I/O interface), or the like, and which exchanges data with external devices. A "user" may input data to the transceiver 212 through the client device 240; for example, in the following embodiments of the present application, the client device 240 may send a target task to the execution device 210, requesting the execution device to construct a neural network, and may send a database for training to the execution device 210.
The execution device 210 may call data, code, etc. in the data storage system 250 or may store data, instructions, etc. in the data storage system 250.
The calculation module 211 processes the input data using the target model/rule 201. Specifically, the calculation module 211 is configured to: acquire a target task, wherein the target task is used for requesting to create a neural network running on preset hardware; acquire a super unit according to the target task, wherein the super unit comprises a plurality of nodes and any two nodes in the plurality of nodes are connected through a plurality of basic operations; search the super unit with the output of a delay prediction model as a constraint condition to determine at least one first construction unit, wherein the delay prediction model is used for outputting a predicted delay, the predicted delay is the predicted duration for a construction unit included in the super unit to obtain an output result when running on the preset hardware, the delay prediction model is obtained by training according to information of a plurality of second construction units, the information comprises the structure of each second construction unit and the operation delay of each second construction unit, the operation delay is the duration for each second construction unit to obtain an output result when running on the preset hardware, each of the at least one first construction unit, each construction unit included in the super unit, and each second construction unit comprises a plurality of nodes, any two nodes in the plurality of nodes are connected through at most one of the plurality of basic operations, and each node is connected with at least one of the plurality of basic operations; and stack the at least one first construction unit to obtain the neural network.
The correlation function module 21 may specifically be a module for training a delay prediction model. More specifically, the specific process of training the delay prediction model may refer to the following related description in step 703, and is not described herein again.
The correlation function 214 may be configured to perform search construction according to basic operations included in the search space, so as to obtain a super cell.
Finally, the transceiver 212 returns the constructed neural network to the client device 240 for deployment in the client device 240 or other devices.
Further, the training module 220 may derive corresponding target models/rules 201 based on different data for different target tasks to provide better results to the user.
In the case shown in FIG. 2, the user may manually specify the data to be input into the execution device 210, for example by operating in an interface provided by the transceiver 212. Alternatively, the client device 240 may automatically input data to the transceiver 212 and obtain the results; if the automatic data input of the client device 240 requires the user's authorization, the user may set corresponding permissions in the client device 240. The user can view the result output by the execution device 210 at the client device 240, presented for example as display, sound, or action. The client device 240 may also act as a data collector and store collected data associated with the target task in the database 230.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided in an embodiment of the present application, and a positional relationship between devices, modules, and the like shown in the diagram does not constitute any limitation. For example, in FIG. 2, the data storage system 250 is an external memory with respect to the execution device 210, and in other scenarios, the data storage system 250 may be disposed in the execution device 210.
Illustratively, a Convolutional Neural Network (CNN) is taken as an example below.
CNN is a deep neural network with a convolution structure and a deep learning architecture; deep learning refers to multiple levels of learning at different abstraction levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons respond to overlapping regions in the image input to it.
As shown in FIG. 3, a convolutional neural network (CNN) 100 may include an input layer 110, a convolutional layer/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130.
As shown in FIG. 3, convolutional layer/pooling layer 120 may include, for example, 121-126 layers, in one implementation, 121 layers are convolutional layers, 122 layers are pooling layers, 123 layers are convolutional layers, 124 layers are pooling layers, 125 layers are convolutional layers, and 126 layers are pooling layers; in another implementation, 121, 122 are convolutional layers, 123 are pooling layers, 124, 125 are convolutional layers, and 126 are pooling layers. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
Taking convolutional layer 121 as an example, convolutional layer 121 may include a plurality of convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually slid over the input image pixel by pixel in the horizontal direction (or two pixels at a time, depending on the value of the stride), completing the extraction of a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends through the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases multiple weight matrices of the same size are applied instead of a single one. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image. Different weight matrices may be used to extract different features of the image: for example, one weight matrix extracts image edge information, another extracts a particular color of the image, yet another blurs unwanted noise in the image, and so on. Since the weight matrices have the same size, the feature maps they extract also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
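The effect of applying several same-sized kernels can be illustrated with a toy convolution layer; the channel counts and image size below are arbitrary examples, not values from this application:

    import torch
    import torch.nn as nn

    # Each weight matrix (kernel) spans the full input depth and produces
    # one output channel; stacking the outputs of 8 kernels forms the
    # depth dimension of the convolved image.
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
    image = torch.randn(1, 3, 32, 32)  # one RGB image, 32x32 pixels
    features = conv(image)
    print(features.shape)              # torch.Size([1, 8, 32, 32]): 8 feature maps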
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can extract information from the input image, thereby helping the convolutional neural network 100 to make correct prediction.
When convolutional neural network 100 has multiple convolutional layers, the initial convolutional layer (e.g., 121) tends to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 100 increases, the features extracted by the convolutional layers (e.g., 126) further down are more complex, such as features with high levels of semantics, and the more highly semantic features are more suitable for the problem to be solved.
A pooling layer:
since it is often desirable to reduce the number of training parameters, it is often desirable to periodically introduce pooling layers after the convolutional layer, i.e., layers 121-126 as illustrated by 120 in fig. 3, which may be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to smaller sized images. The average pooling operator may calculate pixel values in the image over a particular range to produce an average. The max pooling operator may take the pixel with the largest value in a particular range as the result of the max pooling. In addition, just as the size of the weighting matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after the processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel point in the image output by the pooling layer represents an average value or a maximum value of a corresponding sub-region of the image input to the pooling layer.
The neural network layer 130:
after processing by convolutional layer/pooling layer 120, convolutional neural network 100 is not sufficient to output the required output information. Because, as previously described, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (class information or other relevant information as needed), the convolutional neural network 100 needs to generate one or a set of outputs of the number of classes as needed using the neural network layer 130. Accordingly, a plurality of hidden layers (131, 132 to 13n as shown in fig. 3) and an output layer 140 may be included in the neural network layer 130. In this application, the convolutional neural network is: and searching the super unit by taking the output of the delay prediction model as a constraint condition to obtain at least one first construction unit, and stacking the at least one first construction unit. The convolutional neural network can be used for image recognition, image classification, image super-resolution reconstruction and the like.
After the hidden layers in the neural network layer 130, i.e. the last layer of the whole convolutional neural network 100 is the output layer 140, the output layer 140 has a loss function similar to the class cross entropy, and is specifically used for calculating the prediction error, once the forward propagation (i.e. the propagation from 110 to 140 in fig. 3 is the forward propagation) of the whole convolutional neural network 100 is completed, the backward propagation (i.e. the propagation from 140 to 110 in fig. 3 is the backward propagation) starts to update the weight values and the bias of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
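A minimal training step illustrating this forward-then-backward cycle is sketched below; the model, batch, and labels are toy stand-ins, not the networks of this application:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
    criterion = nn.CrossEntropyLoss()      # class cross entropy, as above
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    images = torch.randn(4, 3, 32, 32)     # toy batch
    labels = torch.tensor([1, 0, 3, 7])

    logits = model(images)                 # forward propagation
    loss = criterion(logits, labels)       # prediction error
    opt.zero_grad()
    loss.backward()                        # back propagation
    opt.step()                             # weight/bias update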
It should be noted that the convolutional neural network 100 shown in fig. 3 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models, for example, as shown in fig. 4, a plurality of convolutional layers/pooling layers are parallel, and the features extracted respectively are all input to the overall neural network layer 130 for processing.
FIG. 5 is a diagram of a chip hardware structure according to an embodiment of the present application.
The neural network processor (NPU) 50 is mounted on a host CPU as a coprocessor, and tasks are allocated by the host CPU. The core portion of the NPU is the operation circuit 503; the controller 504 controls the operation circuit 503 to extract matrix data from memory and perform multiplication.
In some implementations, the operation circuit 503 internally includes a plurality of processing elements (PEs). In some implementations, the operation circuit 503 is a two-dimensional systolic array. The operation circuit 503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 502 and buffers it on each PE in the operation circuit. The operation circuit performs a matrix operation on the matrix A data from the input memory 501 and matrix B, and the partial or final results of the obtained matrix are stored in the accumulator 508.
The unified memory 506 is used to store input data as well as output data. The weight data is transferred directly to the weight memory 502 through the direct memory access controller (DMAC) 505. The input data is also carried into the unified memory 506 through the DMAC.
The bus interface unit (BIU) 510 is used for the interaction among the AXI bus, the DMAC, and the instruction fetch buffer 509. It is used by the instruction fetch buffer 509 to obtain instructions from an external memory, and by the storage unit access controller 505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data from the external memory DDR to the unified memory 506, to transfer weight data into the weight memory 502, or to transfer input data into the input memory 501.
The vector calculation unit 507 is provided with a plurality of operation processing units, and further processes the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like, if necessary.
In some implementations, the vector calculation unit 507 can store the processed output vector to the unified memory 506. For example, the vector calculation unit 507 may apply a non-linear function to the output of the operation circuit 503, such as to a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation input to the operation circuit 503, for example for use in subsequent layers of the neural network.
An instruction fetch buffer 509 connected to the controller 504 for storing instructions used by the controller 504;
The unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories; the external memory is a memory outside the NPU hardware architecture.
The operations of the layers in the convolutional neural networks shown in fig. 3 and 4 may be performed by the matrix calculation unit or the vector calculation unit 507.
Referring to FIG. 6, the present embodiment provides a system architecture 300. The execution device 210 is implemented by one or more servers, optionally cooperating with other computing devices, such as data storage devices, routers, and load balancers. The execution device 210 may be disposed at one physical site or distributed across multiple physical sites. The execution device 210 may use the data in the data storage system 250, or call the program code in the data storage system 250, to implement the steps of the structure searching method corresponding to FIGS. 2 to 19B below.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and so forth.
The local devices of each user may interact with the execution device 210 via a communication network of any communication mechanism/standard, such as a wide area network, a local area network, a point-to-point connection, or any combination thereof. In particular, the communication network may include a wireless network, a wired network, or a combination of the two. The wireless network includes, but is not limited to: a fifth-generation mobile communication technology (5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM) or code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, the Zigbee protocol, a radio frequency identification (RFID) network, other wireless communication networks, or a combination of several of these.
In another implementation, one or more aspects of the execution device 210 may be implemented by each local device, e.g., the local device 301 may provide local data or feedback calculations for the execution device 210.
It is noted that all of the functions of the execution device 210 may also be performed by a local device. For example, the local device 301 may implement the functions of the execution device 210 and provide services to its own user, or provide services to the user of the local device 302.
The structure search method provided by the present application is explained below based on the aforementioned application scenarios.
Referring to fig. 7, a flow chart of a structure searching method provided in the present application includes the following steps.
701. Acquire a target task.
A target task is acquired, where the target task is used for requesting the creation of a neural network running on preset hardware.
The preset hardware may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), or a general purpose processor. For example, the target task may be for requesting creation of a neural network running in the CPU.
The target task may be determined by the terminal according to its own requirements or according to a user's operation. Illustratively, the target task may include: the type of neural network, the delay constraint strength, and the like. The type of neural network includes the output type of the neural network requested to be constructed. For example, the target task may be initiated by the terminal and request construction of a face recognition neural network for recognizing a face and outputting the corresponding person's name. For another example, the target task may be initiated by the terminal to construct a vehicle identification neural network for identifying information about vehicles included in a picture obtained by a sensor. The delay constraint strength reflects how strongly the predicted delay output by the delay prediction model constrains the search for construction units. Generally, the greater the delay constraint strength, the lower the delay of the neural network requested to be constructed, and the shorter the time taken to obtain an output result. More specifically, the role of the delay constraint strength is described below in step 704 and is not repeated here.
The neural network in the present application may be a convolutional neural network, a cyclic neural network, a perceptron neural network, or the like, and may be specifically adjusted according to an actual application scenario, which is not limited in the present application.
In one possible implementation, corresponding training data may also be acquired at the same time as, or after, the target task is acquired. The training data is data associated with the target task and may include input data and real measurement data for the target task. For example, if the target task is to construct a face recognition neural network, the training data includes a face picture library with a large number of face pictures and the person information corresponding to each picture. If the face picture library includes pictures of Xiao Ming's face, the real measurement data of those pictures include Xiao Ming's name, height, date of birth, profile, and other information.
702. Acquire a super unit according to the target task.
After the target task is determined, the super unit is obtained according to the target task. The super unit comprises a plurality of nodes, and any two nodes are connected through a plurality of basic operations.
Wherein the number of nodes included in the super cell is at least two and is associated with the target task. Generally, the more nodes included in a super cell, the more corresponding parameters, and the more computing resources required. Accordingly, the fewer nodes the superunit includes, the fewer corresponding parameters, and the fewer computing resources required. Therefore, when the computing resources of the preset hardware are more, the super unit including more nodes can be determined according to the computing resources of the preset hardware. In addition, within a certain quantity range, the more nodes included in the super unit, the higher the accuracy of the output result is, and the better the performance is. Generally, a super cell may include 7 nodes, including 2 input nodes, 4 intermediate nodes, and 1 output node, which may balance accuracy and computational resources.
In one possible implementation, the super unit may be directly extracted from locally stored data. For example, if a historical task was acquired before step 701 and the neural network requested by the historical task is similar to the neural network requested by the target task (for example, both request construction of a face recognition neural network), then the super unit used or updated when constructing the neural network for the historical task can be acquired directly from the historical data, reducing the workload of constructing the super unit.
In another possible implementation, the super cell may be constructed based on a search space. The search space includes a plurality of basic operations. After determining the number of nodes included in the super cell, any two nodes are connected through the plurality of basic operations.
Specifically, the search space may be a differentiable search space, such as a differentiable architecture search (DARTS) space, an MBConv-based space, or a MobileNet-V2-based space. The basic operations included in the search space may include convolution, pooling, or combinations of convolution and pooling operations. For example, the basic operations included in the search space include, but are not limited to, one or more of: average pooling with a pooling kernel size of 3×3 (avg_pool_3x3), max pooling with a pooling kernel size of 3×3 (max_pool_3x3), separable convolution with a convolution kernel size of 3×3 (sep_conv_3x3), separable convolution with a convolution kernel size of 5×5 (sep_conv_5x5), dilated separable convolution with a convolution kernel size of 3×3 and a dilation rate of 2 (dil_conv_3x3), dilated separable convolution with a convolution kernel size of 5×5 and a dilation rate of 2 (dil_conv_5x5), a skip connection (skip_connect), and the zero operation (zero), which indicates that two nodes are not connected.
Illustratively, taking 4 nodes and 3 basic operations as an example, the structure of the super unit can be as shown in FIG. 8. Any two nodes are connected through three basic operations, such as basic operation 1, basic operation 2, and basic operation 3 shown in FIG. 8. The three basic operations may be any three of the operations described above, or combinations of them.
In addition, the sum of the weights of various basic operations between any two nodes connected in the super cell is 1. In the embodiment of the present application, the weight occupied by each basic operation in multiple basic operations between any two nodes is referred to as a structural parameter. For example, as shown in fig. 9, node 0 and node 2 are connected by base operation 1, base operation 2 and base operation 3. The weight occupied by base operation 1 is 0.3, the weight occupied by base operation 2 is 0.4, and the weight occupied by base operation 3 is 0.3. The sum of the weights occupied by the three basis operations is 1. Wherein, the larger the weight occupied by the basic operation is, the larger the probability of adopting the basic operation between the two nodes is. For example, between node 0 and node 2, the weight occupied by the basic operation 2 is the largest, i.e., the probability that node 0 and node 2 are connected through the basic operation 2 is higher.
In the following embodiments of the present application, the weights occupied by the basic operations connected between every two nodes in a super cell are collectively referred to as a structural parameter set. For example, for the super cell shown in fig. 8, the weights occupied by each of the three basic operations connecting every two of node 0, node 1, node 2, and node 3 together constitute a structural parameter set.
703. And searching the super unit by taking the output of the delay prediction model as a constraint condition to determine at least one first construction unit.
When searching for the super unit, at least one first building unit may be determined with a predicted delay output by the delay prediction model as a constraint condition.
The delay prediction model is used for outputting the prediction delay of a construction unit included in the super unit, that is, the predicted duration for the construction unit to run on the preset hardware and produce an output result. The prediction delay is obtained by the delay prediction model from the structure of the construction unit, so the running delay of the construction unit does not need to be measured each time the super unit is searched. The delay prediction model is obtained by training on the structures of a plurality of second construction units and the running delay of each second construction unit, where the running delay is the duration for the second construction unit to run on the preset hardware and produce an output result. For example, the running delay may be the duration of one forward inference of the corresponding second construction unit on the CPU of a terminal. Therefore, the delay of a construction unit included in the super unit can be predicted by the trained delay prediction model without measuring its running delay, which greatly reduces the workload and improves the efficiency of searching for the first construction unit.
A super unit may comprise a plurality of construction units; a construction unit can be understood as a subset of the super unit. A construction unit comprises a plurality of nodes, any two of which are connected through at most one of the plurality of basic operations. Illustratively, the structure of a construction unit can be seen in fig. 10, where node 0 and node 1 are connected by basic operation 3, node 0 and node 2 by basic operation 2, and so on.
The first construction unit and the second construction units are similar in structure to the construction unit described above in connection with fig. 10; illustratively, the structure of a first construction unit can be seen in fig. 11, and that of a second construction unit in fig. 12.
It should be understood that, in the embodiments of the present application, the first construction unit, the construction units included in the super unit, and the second construction units each include a plurality of nodes, and the number of nodes is the same as in the super unit. Any two of these nodes are connected by at most one of the plurality of basic operations, and at least two nodes are connected by one basic operation. Typically, no isolated node exists among the plurality of nodes: each node is connected to at least one other node through at least one basic operation.
Some possible or specific embodiments of the delay prediction model and the search of the super cell are described below, respectively.
First, the construction units and the training process involved in training the delay prediction model are described.
In a specific embodiment, the second construction units may be obtained by sampling the search space. For example, the basic operations included in the search space are randomly sampled and assembled to obtain 100,000 second construction units, and the delay prediction model is obtained by training on the structures of these 100,000 second construction units and the measured running delay of each.
In one possible implementation, the delay prediction model may be trained before step 703. The specific training may include: obtaining the structure of each of the plurality of second construction units, measuring the running delay of each second construction unit on the preset hardware, and then training based on the structure of each second construction unit, its running delay, and a preset regression model to obtain the delay prediction model. In effect, the functional relationship between the structure of a construction unit and its running duration on the preset hardware is learned from a large number of second construction unit structures and the corresponding running delays, so that for an input construction unit structure the predicted delay can be output without a trial run on the preset hardware, improving efficiency.
The preset regression model may be a regression-type model, such as a linear regression model, a gradient descent regression model, a polynomial regression model, a Lasso regression model, and the like.
In a specific embodiment, the plurality of second construction units may be divided into at least two classes, a first type of construction unit and a second type of construction unit. M iterations of training are then performed on the first type and second type of construction units based on the preset regression model to obtain the delay prediction model, where M is a positive integer. It should be understood that the value of M may be adjusted according to the actual application scenario; it suffices to train until the delay prediction model converges.
Specifically, taking the K-th iteration of the M iterations as an example, where K is a positive integer not greater than M: the delay prediction model obtained in the (K-1)-th training is trained according to the structure and running delay of each second construction unit in the first type of construction units, to obtain the temporary delay prediction model of the K-th training. The temporary delay prediction model is then verified according to the structure and running delay of each second construction unit in the second type of construction units, and is updated according to the verification result to obtain the delay prediction model of the K-th training. The model trained over the M iterations until convergence is taken as the finally output delay prediction model.
Illustratively, as shown in fig. 13A, the delay prediction model may be a regression model composed of a multi-layer perceptron network, where the four layers consist of 112, 256, 64, and 1 neurons respectively. The input is a structure code obtained by encoding the structure of a second construction unit, and the output is the prediction delay of that second construction unit. In the stage of training on the first type of construction units, the model is trained by gradient descent on a loss function; the loss function may include the cross entropy function, the Smooth L1 loss function commonly used in target detection, and the like. Taking the minimum mean square error as an example:

MSE = (1/M) Σ_{m=1}^{M} (y_m - ŷ_m)^2

where y_m is the measured delay of the m-th second construction unit and ŷ_m is the delay output by the delay prediction model. The smaller the MSE, the closer the predicted delay is to the measured true delay, i.e., the more accurate the predicted delay.
For example, the second construction unit may be encoded using a one-hot code, converting its structural parameters into a binary structure code. The second construction unit shown in fig. 13B, for instance, may be converted into 010001010001100000, where basic operation 1 is represented as 100, basic operation 2 as 010, and basic operation 3 as 001. During model training, the network parameters of the delay prediction model are first trained on the first type of construction units to obtain a temporary delay prediction model. For example, as shown in fig. 13C, the outputs of two nodes serve as the input of one node, with a1 = w1·x1 + w2·x2, where w1 and w2 can be understood as network parameters of the delay prediction model. The temporary delay prediction model is then verified on the second type of construction units: when the difference between the predicted delay and the measured true delay is large, the temporary delay prediction model is updated to obtain the delay prediction model of the current training round; when the difference is small, or the model converges, the temporary delay prediction model can be taken as the trained delay prediction model.
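For illustration, a minimal sketch of this one-hot structure encoding follows; the fixed edge ordering is an assumption chosen so that the example reproduces the code of fig. 13B:

```python
# One-hot structure encoding: each edge of the cell contributes a 3-bit
# code (basic operation 1 -> 100, 2 -> 010, 3 -> 001, no connection -> 000),
# and the per-edge codes are concatenated in a fixed edge order.
OP_CODES = {0: "000", 1: "100", 2: "010", 3: "001"}

def encode_cell(edge_ops: list) -> str:
    """edge_ops lists the chosen basic operation (0 for none) per edge."""
    return "".join(OP_CODES[op] for op in edge_ops)

# Six edges, reproducing the example code of fig. 13B:
print(encode_cell([2, 3, 2, 3, 1, 0]))  # -> "010001010001100000"
```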
Therefore, in the embodiment of the present application, construction units formed from the basic operations included in the search space are sampled, and the running delays of the sampled second construction units are measured. A delay prediction model for predicting the delay of an input construction unit is then trained on the structures of the second construction units and the measured running delays. In this way, the prediction delays of construction units included in the super unit can be output rapidly, without sampling and testing the super unit each time construction units are searched, which improves search efficiency.
Next, the construction units and the specific search process involved in searching the super unit are described.
In one possible implementation, with the output of the delay prediction model as a constraint condition, the structural parameter set of the super unit is iteratively updated N times, where N is a positive integer, and at least one first construction unit is determined according to the structural parameter set of the super unit after the N iterative updates.
It should be understood that the value of N may be different in different scenarios, and the value of N may be specifically adjusted according to the actual application scenario. For example, the structural parameter set of the super cell is iteratively updated until the super cell converges, and a final structural parameter set can be obtained.
In a specific embodiment, the P-th iteration of the N iterations (P is a positive integer not greater than N) may include: sampling the super unit according to its structural parameter set to obtain a plurality of third construction units (their structure is similar to that of fig. 10, and the structural parameter set is as described in step 702, so neither is repeated here); encoding the structures of the plurality of third construction units to obtain the structure code of each third construction unit; inputting the structure code of each third construction unit into the delay prediction model and outputting its prediction delay; and then updating the structural parameter set obtained in the (P-1)-th update with the prediction delays of the plurality of third construction units as a constraint condition, to obtain the structural parameter set of the P-th update.
Specifically, updating the structural parameter set obtained in the (P-1)-th update with the prediction delays of the plurality of third construction units as a constraint condition may include: calculating the expected delay of the super unit from the prediction delays of the plurality of third construction units, updating the joint loss function of the super unit with the expected delay as a constraint condition, and updating the structural parameter set according to the joint loss function to obtain the structural parameter set of the P-th update.
More specifically, the expected delay may be calculated as:

LPM(α) = (1/K) Σ_{k=1}^{K} LPM(γ_k)

where LPM(γ_k) is the prediction delay output by the delay prediction model for the sampled third construction unit γ_k, LPM(α) is the expected delay, K is the number of sampled third construction units, and α is the structural parameter set.
A specific way of updating the joint loss function of the super unit with the expected delay as a constraint condition may include: Ltotal(α) = Lval(α) + λ·LPM(α), where Ltotal(α) is the value of the updated joint loss function, Lval(α) is the value calculated according to a preset loss function, LPM(α) is the expected delay, α is the structural parameter set (which can also be understood as a structural vector), and λ is the delay constraint strength, determined according to the target task. Generally, the larger the value of λ, the stronger the predicted delay acts as a constraint condition, and the shorter the duration for the searched neural network to produce an output result on the preset hardware. To guarantee the accuracy of the neural network output, the value of λ is generally not greater than 0.2. For example, when λ is 0.2, the delay can be reduced by 30% without reducing accuracy; within the range (0, 0.2], the larger the value of λ, the lower the delay, while accuracy remains stable.
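For illustration, a minimal sketch of the expected delay and the joint loss function follows; the numeric values are illustrative, and in practice the predicted delays may be normalized before being added to the loss:

```python
import torch

LAMBDA = 0.2  # delay constraint strength; the text suggests values <= 0.2

def expected_delay(pred_delays: torch.Tensor) -> torch.Tensor:
    """LPM(alpha): mean predicted delay over the K sampled third units."""
    return pred_delays.mean()

def joint_loss(l_val: torch.Tensor, pred_delays: torch.Tensor) -> torch.Tensor:
    """Ltotal(alpha) = Lval(alpha) + lambda * LPM(alpha)."""
    return l_val + LAMBDA * expected_delay(pred_delays)

l_val = torch.tensor(0.85)                 # preset loss on validation data
delays = torch.tensor([21.3, 23.1, 25.2])  # predicted delays of sampled units (ms)
print(joint_loss(l_val, delays))           # 0.85 + 0.2 * 23.2 = 5.49
```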
The specific calculation of Lval(α) may include: sampling the super unit based on the structural parameter set obtained in the last iteration to obtain one or more construction units, stacking the one or more construction units to obtain a temporary neural network, taking data from the training data mentioned in step 701 as the input of the temporary neural network to obtain its output, and calculating the difference between that output and the true value through a preset loss function to obtain Lval(α).
Updating the structural parameter set according to the joint loss function may specifically include: calculating the derivative of Ltotal(α) with respect to the super unit's structural parameter set α, and then updating α by gradient descent until convergence, obtaining the updated structural parameter set α.
It can be understood that after the N-th update, the running delay of the first construction unit derived from the N-th updated structural parameter set is close or equal to the expected delay of the N-th update; that is, the running delay of the first construction unit equals the expected delay, or their difference is less than a threshold. Taking the expected delay as a constraint condition therefore makes the running delay of the obtained first construction unit close or equal to the expected delay, yielding a first construction unit, and hence a neural network, that meets expectations.
Illustratively, as shown in fig. 14, in the process of updating the structural parameter set of the super unit according to the joint loss function, the weight occupied by each basic operation between any two nodes is updated. In the super unit shown in fig. 14, the weight of each basic operation is represented by the thickness of its line; for example, between node 0 and node 1, basic operation 2 occupies the largest weight. After the N iterative updates, the structure of the converged super unit is obtained, and at least one first construction unit is determined according to the super unit's structural parameter set. If the basic operation with the largest weight is taken as the operation connecting two nodes, one of the determined first construction units is as shown in fig. 15. A converged super unit can be understood as one for which the difference between the output value of a neural network stacked from construction units sampled based on the structural parameter set and the true output value is smaller than a threshold, or is 0.
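For illustration, a minimal sketch of this discretization step (taking, for every edge, the basic operation with the largest structural parameter) follows; the converged weights shown are illustrative:

```python
import torch

def derive_cell(alpha: torch.Tensor) -> list:
    """Pick, per edge, the basic operation with the largest structural
    parameter. alpha has shape (num_edges, num_ops) and holds the
    converged structural parameter set of the super unit."""
    return alpha.argmax(dim=1).tolist()

# Illustrative converged weights for a 6-edge, 3-operation super unit:
alpha = torch.tensor([[0.2, 0.6, 0.2],   # edge 0: basic operation 2 wins
                      [0.1, 0.3, 0.6],   # edge 1: basic operation 3 wins
                      [0.5, 0.3, 0.2],
                      [0.3, 0.2, 0.5],
                      [0.2, 0.7, 0.1],
                      [0.6, 0.3, 0.1]])
print(derive_cell(alpha))  # [1, 2, 0, 2, 1, 0] (0-based operation indices)
```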
Referring to fig. 13C, w1 and w2 can be understood as two network parameters of the super unit. Unlike structural parameters, network parameters represent the weight with which the data output by a previous node serves as the input of the next node, whereas structural parameters represent the weight of a basic operation as the operation connecting two nodes.
For example, in the process of the N iterative updates until convergence, the structural parameter set α is updated by gradient descent on:

∇_α Lval(ω - ξ·∇_ω Ltrain(ω, α), α)

where ξ is the inner-layer optimization learning rate; when ξ is 0 (corresponding to the first-order approximation update), this reduces to ∇_α Lval(ω, α). The network parameter set ω is updated by:

ω ← ω - ξ·∇_ω Ltrain(ω, α)

where Lval is the loss function on the validation set and Ltrain is the loss function of the model on the training set. The training data mentioned in step 701 is divided into a validation set and a training set: the validation set is the data set used to evaluate the model, and the training set is the data set used to train it. If the training data is a face picture library, the pictures it includes can be divided into two parts, one serving as the validation set and the other as the training set.
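For illustration, a minimal sketch of one first-order search step (ξ = 0) follows, alternating the update of the network parameters on the training set and of the structural parameters on the validation set; loss_train, loss_val, and expected_delay are assumed callables, and the optimizers are illustrative:

```python
def search_step(w_opt, alpha_opt, loss_train, loss_val, expected_delay,
                lam=0.2):
    """One first-order search step (xi = 0).

    w_opt / alpha_opt: optimizers over the network parameter set w and
    the structural parameter set alpha; loss_train(), loss_val() and
    expected_delay() return scalar tensors.
    """
    # 1) update the network parameter set w on the training set
    w_opt.zero_grad()
    loss_train().backward()
    w_opt.step()
    # 2) update the structural parameter set alpha on the validation set,
    #    with the expected delay term acting as the constraint
    alpha_opt.zero_grad()
    (loss_val() + lam * expected_delay()).backward()
    alpha_opt.step()
```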
Therefore, in addition to the structural parameter set of the super unit, the network parameter set is also updated, realizing a comprehensive update of the super unit. With the prediction delay as a constraint condition, updating both the network parameter set and the structural parameter set associates the obtained neural network with the delay of running on hardware, adapting it to more scenarios.
704. And stacking at least one first construction unit to obtain the neural network.
After the at least one first construction unit is obtained, it is stacked to obtain the neural network requested by the target task.
In particular, one or more first construction units may be stacked to obtain the input layer, intermediate layers, and output layer of the neural network. Illustratively, the neural network may be as shown in fig. 16: one or more first construction units are stacked as the input layer, a plurality of first construction units as the intermediate layers, and one first construction unit as the output layer.
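For illustration, a minimal sketch of stacking a searched first construction unit into a network follows; FirstCell is assumed to be the module derived from the converged structural parameter set, and the stacking depth and classification head are illustrative assumptions:

```python
import torch.nn as nn

def build_network(FirstCell, num_cells: int, num_classes: int) -> nn.Module:
    """Stack copies of the searched first construction unit into a network.
    FirstCell is assumed to build one cell module; the pooling/linear head
    is an illustrative classification head."""
    cells = [FirstCell() for _ in range(num_cells)]  # input/intermediate/output
    return nn.Sequential(
        *cells,
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.LazyLinear(num_classes),
    )
```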
Therefore, in the embodiment of the present application, the super unit is searched with the output of the delay prediction model as a constraint condition, and the one or more first construction units obtained by the search are stacked to obtain the neural network. The resulting neural network is thereby associated with the delay of running on specific hardware, satisfying scenarios with different latency requirements. For example, for a target task with a stricter delay requirement, first construction units with shorter running times can be searched out through a larger delay constraint strength, yielding a neural network that produces its output in a shorter running time.
The foregoing structure search method provided by the present application is described in detail, and for convenience of understanding, the structure search method provided by the present application is exemplarily described below with reference to specific scenarios.
First, the flow of the search method provided in the present application can refer to fig. 17.
The super unit may refer to the related description in step 702. The super unit is sampled to obtain at least one construction unit, namely construction unit γ1 to construction unit γK (i.e., the third construction units described above).
Construction units γ1 to γK are then each encoded, and the encoded data of each construction unit is input into the delay prediction model to obtain its prediction delay, namely LPM(γ1) (21.3 ms) to LPM(γK) (25.2 ms).
The expected delay is then calculated as LPM(α) = (1/K) Σ_{k=1}^{K} LPM(γ_k), for example 23.5 ms.
In addition, sampling is performed based on the structural parameter set obtained in the last update of the super unit to obtain one or more temporary construction units; for example, the basic operation with the highest weight between two nodes is taken as the operation connecting them. The temporary construction units are stacked into a temporary neural network. Data in a database associated with the target task is taken as the input of the temporary neural network to obtain its output. The difference between the output value of the temporary neural network and the true output value is obtained by comparing the output against the true values included in the database, and the value of the loss function Lval(α) is calculated using a preset loss function formula (e.g., the cross entropy function or the minimum mean square error). The value of the joint loss function Ltotal(α) = Lval(α) + λ·LPM(α) is then calculated, and the structural parameter set and the network parameter set are updated by gradient descent on the joint loss function, resulting in an updated super unit.
The foregoing process is repeated until the super unit converges, i.e., until the difference between the output value of the neural network stacked from construction units sampled from the super unit based on the structural parameter set and the true output value is less than a threshold, or is 0.
After the converged super cell is determined, the first building unit is output based on the structure parameter set of the converged super cell (only one is exemplified in fig. 17). Illustratively, in the first building unit, node 0 and node 1 are connected by basic operation 2, node 0 and node 2 are connected by basic operation 3, node 0 and node 3 are connected by basic operation 2, node 1 and node 2 are connected by basic operation 3, node 1 and node 3 are connected by basic operation 2, and node 2 and node 3 are connected by basic operation 2.
After the first construction unit is obtained, the first construction unit is stacked to obtain a final neural network, such as a face recognition neural network, a vehicle recognition neural network, and the like.
Therefore, in the embodiment of the present application, in addition to calculating the loss function Lval(α), the expected delay LPM(α) is calculated in combination with the delay prediction model. Using the expected delay obtained from the predicted delays as a constraint condition, the structural parameter set and the network parameter set of the super unit are updated, so that after the iterative updates they are associated with the delay; a first construction unit meeting the actual delay requirement can thus be determined, and a neural network meeting the delay requirement obtained.
For ease of understanding, the structure search method provided in the present application is described in more detail below.
Referring to fig. 18, a scene diagram of an application of the structure search method provided by the present application is shown.
A database and a target task are obtained, where the target task is used to request construction of a neural network running on preset hardware, and the database includes a large amount of input data together with the true running results corresponding to that data.
Taking the DARTS space as an example, the super unit is constructed using the basic operations included in the DARTS space. The delay prediction model is likewise obtained by training on the structures of construction units sampled from the DARTS space and their running delays.
With the output of the delay prediction model as a constraint condition, the super unit is searched and trained based on the database, and one or more first construction units are output.
And then stacking one or more first building units to obtain the neural network constructed by the target task request.
For example, as shown in fig. 19A.
The target task may be a face recognition task, and the database may be a face picture library. The face picture library includes a plurality of face pictures and the information of the persons corresponding to the face pictures, such as names, genders, or introductions.
The super unit is constructed using the basic operations included in the DARTS space. The super unit is searched with the output of the delay prediction model as a constraint condition to obtain one or more first construction units for picture identification.
And then stacking one or more first construction units to obtain the face recognition neural network.
For example, as shown in fig. 19A, inputting one of the face pictures outputs the information of the person corresponding to that picture: inputting a face picture of a given person outputs that person's name, gender, biography, and the like (the name is taken as the example for illustration).
For another example, as shown in fig. 19B, the target task may be a vehicle identification task, and the database may be a sensor data set collected by the sensor.
The search for the super cell results in the first building block in a manner similar to that described above in connection with FIG. 19A.
After obtaining the one or more first building units, stacking the one or more first building units to obtain the vehicle identification neural network.
The vehicle identification neural network is used for identifying vehicles included in the acquired picture data. As shown in fig. 19B, one of the pictures acquired by the sensor is used as input data, and the output data is the vehicle included in the picture. Specifically, a prompt box may be added to the picture, enclosing the identified vehicle.
Therefore, in the present application, with the prediction delay output by the delay prediction model as a constraint condition, the super unit is searched to obtain a first construction unit associated with the delay, and hence a neural network associated with the delay. Neural networks suited to scenarios with various latency requirements can thus be obtained, improving user experience.
The structure searching apparatus provided by the present application for performing the steps in the structure searching method of fig. 7-19B is described below with reference to the structure searching method provided by fig. 7-19B.
Referring to fig. 20, a structure of a structure searching apparatus provided in the present application is schematically illustrated.
The structure search apparatus may include: a task acquisition module 2001, a superunit acquisition module 2002, a search module 2003, and a stack module 2004.
A task obtaining module 2001, configured to obtain a target task, where the target task is used to request to create a neural network running on preset hardware;
a super cell obtaining module 2002, configured to obtain a super cell according to a target task, where the super cell includes a plurality of nodes, and any two nodes in the plurality of nodes are connected through multiple basic operations;
the searching module 2003 is configured to search the super unit using the output of the delay prediction model as a constraint condition, and determine at least one first building unit, where the delay prediction model is used to output a prediction delay, where the prediction delay is a prediction duration of an output result obtained when a building unit included in the super unit operates on preset hardware, the delay prediction model is obtained by training according to information of a plurality of second building units, where the information of the plurality of second building units includes a structure of each second building unit in the plurality of second building units and an operation delay of each second building unit, the operation delay is a duration of an output result obtained when each second building unit operates on the preset hardware, and each first building unit, the building unit included in the super unit, and each second building unit in the at least one first building unit include a plurality of nodes, any two nodes in the plurality of nodes are connected through at most one of a plurality of basic operations, and each node in the plurality of nodes is connected with at least one of the plurality of basic operations;
the stacking module 2004 is configured to stack at least one first building unit to obtain the neural network.
In one possible embodiment, the apparatus may further include: the training module 2005 is specifically configured to:
before the search module searches the super unit by taking the output of the delay prediction model as a constraint condition, acquiring the structure of each second construction unit;
measuring a running delay of each second building element;
and training based on the structure of each second construction unit, the operation delay of each second construction unit and a preset regression model to obtain a delay prediction model.
In one possible implementation, the training module 2005 is specifically configured to: dividing the plurality of second construction units into two types to obtain a first type construction unit and a second type construction unit; performing iterative training for M times based on a preset regression model according to the structure of the first type of construction unit, the structure of the second type of construction unit and the preset regression model to obtain a delay prediction model, wherein M is a positive integer;
wherein the Kth iteration of the M iterations comprises: training the delay prediction model obtained by the K-1 training according to the structure of the first type of construction unit and the operation delay of each second construction unit in the first type of construction unit to obtain a temporary delay prediction model for the K training, wherein K is a positive integer not greater than M; and verifying the temporary delay prediction model according to the structure of each second construction unit included by the second type construction unit and the operation delay of each second construction unit in the second type construction unit, and updating the temporary delay prediction model according to the verification result to obtain the K-th training delay prediction model.
In a possible implementation, the search module 2003 is specifically configured to:
performing N times of iterative updating on the structural parameter set of the super unit by taking the output of the delay prediction model as a constraint condition, wherein N is a positive integer; and determining at least one first construction unit according to the structure parameter set of the super unit updated by the Nth iteration.
In one possible embodiment, the pth of the N iterations includes: sampling the super unit according to a structural parameter set of the super unit to obtain a plurality of third construction units, wherein the structural parameter set comprises a plurality of structural parameters, and each structural parameter in the plurality of structural parameters is the weight of basic operation of two nodes connecting the super unit; outputting, by the delay prediction model, predicted delays for the plurality of third building units;
and updating the structural parameter set obtained by updating the P-1 st time by taking the prediction delay of the plurality of third construction units as a constraint condition to obtain a structural parameter set updated for the P time, wherein P is a positive integer not greater than N.
In a possible implementation, the search module 2003 is specifically configured to: calculating an expected delay of the super cell from the delays of the plurality of third building cells; updating the joint loss function of the super cell with the expected delay as a constraint condition; and updating the structure parameter set according to the joint loss function.
In one possible embodiment, the joint loss function includes Ltotal(α) = Lval(α) + λ·LPM(α), where Lval(α) is a preset loss function, LPM(α) is the expected delay, α is the structural parameter set, λ is the delay constraint strength, and λ is determined according to the target task.
In a possible implementation, the search module 2003 is specifically configured to: coding the structures of the plurality of third construction units to obtain coded data of the plurality of third construction units; and inputting the coded data of the plurality of third construction units into the delay prediction model to obtain the prediction delay of each third construction unit in the plurality of third construction units.
In a possible implementation manner, the super cell obtaining module 2002 is specifically configured to construct a super cell based on a search space, where the search space includes a plurality of basic operations, and the plurality of basic operations include operations corresponding to a target task.
In a possible embodiment, the aforementioned search space is a differentiable search space.
Fig. 21 is a schematic structural diagram of a structure search apparatus provided in the present application. The structure search apparatus may include a processor 2101, a memory 2102, and a transceiver 2103, with the processor 2101 and the memory 2102 interconnected by a line. The memory 2102 stores program instructions and data corresponding to the steps performed by the structure search apparatus in fig. 7 to 19B described above. The processor 2101 is configured to perform the steps performed by the structure search apparatus in any of the embodiments of fig. 7 to 19B.
The transceiver 2103 is configured to perform the data transceiving steps performed by the structure search apparatus in any of the embodiments of fig. 7 to 19B. Specifically, the transceiver 2103 passes received data to the processor 2101 and transmits data sent by the processor 2101.
The embodiment of the application also provides a digital processing chip. The digital processing chip integrates circuitry and one or more interfaces for implementing the functions of the processor 2101 described above. When a memory is integrated, the digital processing chip may perform the method steps of any one or more of the preceding embodiments. When no memory is integrated, the digital processing chip can be connected to an external memory through an interface and implements the actions performed by the structure search apparatus in the above embodiments according to the program codes stored in the external memory.
The present application provides a chip system comprising a processor for enabling a structure finding apparatus to implement the functionality of the controller involved in the above method, e.g. to process data and/or information involved in the above method. In one possible design, the system-on-chip further includes a memory for storing necessary program instructions and data. The chip system may be formed by a chip, and may also include a chip and other discrete devices.
In another possible design, when the chip system is a chip in a user equipment or an access network, the chip includes: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute computer-executable instructions stored by the storage unit to cause a chip within the structure finding apparatus or the like to perform the steps performed by the structure finding apparatus in any one of the embodiments of fig. 7-19B described above. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the structure search apparatus and the like, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a computer, implements the method flows executed by the controller of the structure search apparatus in any of the above method embodiments. Correspondingly, the computer can be the structure searching device.
It should be understood that the controller or processor mentioned in the above embodiments of the present application may be a Central Processing Unit (CPU), and may also be one or more of a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should also be understood that the number of the processors or controllers in the structure searching apparatus or the chip system in the above embodiments in the present application may be one or more, and may be adjusted according to the actual application scenario, and this is merely an exemplary illustration and is not limited. The number of the memories in the embodiment of the present application may be one or multiple, and may be adjusted according to an actual application scenario, and this is merely an exemplary illustration and is not limited.
It should also be understood that the memories or readable storage media mentioned in the structure search apparatus and the like in the above embodiments of the present application may be volatile memories or non-volatile memories, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of exemplary but not limiting illustration, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
It will be understood by those of ordinary skill in the art that all or part of the steps performed by the structure search apparatus or the processor 2101 to implement the above embodiments may be performed by hardware, or by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a random access memory, or the like. Specifically, for example: the processing unit or processor may be a central processing unit, a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another via a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrated with one or more available media. The available media may be magnetic media (e.g., a floppy disk, a hard disk, or a magnetic tape), optical media (e.g., a DVD), or semiconductor media (e.g., a solid state disk (SSD)).
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished from one another. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the embodiments of the present application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that in the description of the present application, unless otherwise indicated, "/" indicates a relationship where the objects associated before and after are an "or", e.g., a/B may indicate a or B; in the present application, "and/or" is only an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, and may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to determining" or "when detected (the stated condition or event)" or "in response to detecting (the stated condition or event)", depending on the context.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (22)

1. A structure search method, comprising:
acquiring a target task, wherein the target task is used for requesting to create a neural network running on preset hardware;
acquiring a super unit according to the target task, wherein the super unit comprises a plurality of nodes, and any two nodes in the plurality of nodes are connected through various basic operations;
searching the super unit by taking the output of a delay prediction model as a constraint condition, and determining at least one first construction unit, wherein the delay prediction model is used for outputting a prediction delay, the prediction delay is a prediction time length of an output result obtained when a construction unit included in the super unit operates on the preset hardware, the delay prediction model is obtained by training according to information of a plurality of second construction units, the information of the plurality of second construction units comprises a structure of each second construction unit in the plurality of second construction units and an operation delay of each second construction unit, the operation delay is a time length of an output result obtained when each second construction unit operates on the preset hardware, and each first construction unit in the at least one first construction unit, the construction unit included in the super unit and each second construction unit comprise the plurality of nodes, any two nodes in the plurality of nodes are connected through at most one of the plurality of basic operations, and each node in the plurality of nodes is connected with at least one of the plurality of basic operations;
and stacking the at least one first construction unit to obtain the neural network.
2. The method of claim 1, wherein prior to searching the super cell with the output of the delay prediction model as a constraint, the method further comprises:
acquiring the structure of each second construction unit;
testing the operational delay of each second building element;
and training based on the structure of each second construction unit, the operation delay of each second construction unit and a preset regression model to obtain the delay prediction model.
3. The method of claim 2, wherein the training based on the structure of each second building unit, the operating delay of each second building unit, and a pre-set regression model comprises:
dividing the plurality of second construction units into two types to obtain a first type construction unit and a second type construction unit;
performing iterative training for M times according to the structures of the first type of building units, the second type of building units and the preset regression model based on the preset regression model to obtain the delay prediction model, wherein M is a positive integer;
wherein the Kth one of the M iterations comprises:
training a delay prediction model obtained by training the K-1 st time according to the structure of the first type of construction unit and the operation delay of each second construction unit in the first type of construction unit to obtain a temporary delay prediction model of the training of the K time, wherein K is a positive integer not greater than M;
and verifying the temporary delay prediction model according to the structure of each second construction unit included by the second type construction unit and the operation delay of each second construction unit in the second type construction unit, and updating the temporary delay prediction model according to a verification result to obtain the delay prediction model trained at the Kth time.
4. The method according to any one of claims 1-3, wherein searching the super unit with the output of the delay prediction model as a constraint condition to determine at least one first building unit comprises:
performing N times of iterative updating on the structural parameter set of the super unit by taking the output of the delay prediction model as a constraint condition, wherein N is a positive integer;
and determining the at least one first construction unit according to the structure parameter set of the super unit updated by the Nth iteration.
5. The method of claim 4, wherein the P-th iteration of the N iterations comprises:
sampling the super unit according to a structure parameter set of the super unit to obtain a plurality of third construction units, wherein the structure parameter set comprises a plurality of structure parameters, and each structure parameter in the plurality of structure parameters is the weight of basic operation of two nodes connected with the super unit;
outputting, by the delay prediction model, predicted delays for the plurality of third building units;
and updating the structural parameter set obtained by updating the P-1 st time by taking the prediction delay of the plurality of third construction units as a constraint condition to obtain the structural parameter set updated for the P time, wherein P is a positive integer not greater than N.
6. The method according to claim 5, wherein the updating the structure parameter set obtained by updating the P-1 st time with the predicted delay of the third building units as a constraint condition comprises:
calculating an expected delay of the super cell from the predicted delays of the plurality of third building blocks;
updating a joint loss function of the super cell with the expected delay as the constraint;
and updating the structural parameter set according to the joint loss function.
7. The method of claim 6,
the joint loss function includes Ltotal(α)=Lval(α) + λ. L PM (α), wherein said Lval(α) is a predetermined loss function, the L PM (α) is the desired delay, the α is the set of structural parameters, the λ is a delay constraint strength, and the λ is determined according to the target task.
8. The method according to any one of claims 5-7, wherein said outputting, by said delay prediction model, the delays of said plurality of third building elements comprises:
coding the structures of the plurality of third construction units to obtain coded data of the plurality of third construction units;
and inputting the coded data of the plurality of third construction units into the delay prediction model to obtain the prediction delay of each third construction unit in the plurality of third construction units.
9. The method according to any one of claims 1-8, further comprising:
and constructing based on a search space to obtain the super unit, wherein the search space comprises a plurality of basic operations, and the plurality of basic operations comprise operations corresponding to the target task.
10. The method of claim 9, wherein the search space is a differentiable search space.
11. A structure search apparatus, characterized by comprising:
the task acquisition module is used for acquiring a target task, and the target task is used for requesting to create a neural network running on preset hardware;
the super unit obtaining module is used for obtaining a super unit according to the target task, wherein the super unit comprises a plurality of nodes, and any two nodes in the plurality of nodes are connected through multiple basic operations;
the search module is further configured to search the super unit by using an output of a delay prediction model as a constraint condition, and determine at least one first construction unit, where the delay prediction model is used to output a prediction delay, where the prediction delay is a prediction duration of an output result obtained when a construction unit included in the super unit operates on the preset hardware, the delay prediction model is obtained by training according to information of a plurality of second construction units, where the information of the plurality of second construction units includes a structure of each second construction unit in the plurality of second construction units and an operation delay of each second construction unit, the operation delay is a duration of an output result obtained when each second construction unit operates on the preset hardware, and each first construction unit in the at least one first construction unit, the construction unit included in the super unit, and each second construction unit include the plurality of nodes, any two nodes in the plurality of nodes are connected through at most one of the plurality of basic operations, and each node in the plurality of nodes is connected with at least one of the plurality of basic operations;
and the stacking module is further used for stacking the at least one first construction unit to obtain the neural network.
12. The apparatus of claim 11, further comprising: the training module is specifically configured to:
before the search module searches the super unit by taking the output of the delay prediction model as a constraint condition, acquiring the structure of each second construction unit;
measuring the operating delay of each second building element;
and training based on the structure of each second construction unit, the operation delay of each second construction unit and a preset regression model to obtain the delay prediction model.
13. The apparatus of claim 12, wherein the training module is specifically configured to:
dividing the plurality of second construction units into two types to obtain a first type construction unit and a second type construction unit;
performing iterative training for M times according to the structures of the first type of building units, the second type of building units and the preset regression model based on the preset regression model to obtain the delay prediction model, wherein M is a positive integer;
wherein the Kth one of the M iterations comprises:
training a delay prediction model obtained by training the K-1 st time according to the structure of the first type of construction unit and the operation delay of each second construction unit in the first type of construction unit to obtain a temporary delay prediction model of the training of the K time, wherein K is a positive integer not greater than M;
and verifying the temporary delay prediction model according to the structure of each second construction unit included by the second type construction unit and the operation delay of each second construction unit in the second type construction unit, and updating the temporary delay prediction model according to a verification result to obtain the delay prediction model trained at the Kth time.
14. The apparatus according to any one of claims 11-13, wherein the search module is specifically configured to:
performing N times of iterative updating on the structural parameter set of the super unit by taking the output of the delay prediction model as a constraint condition, wherein N is a positive integer;
and determining the at least one first construction unit according to the structure parameter set of the super unit updated by the Nth iteration.
15. The apparatus of claim 14, wherein the pth of the N iterations comprises:
sampling the super unit according to a structure parameter set of the super unit to obtain a plurality of third construction units, wherein the structure parameter set comprises a plurality of structure parameters, and each structure parameter in the plurality of structure parameters is the weight of basic operation of two nodes connected with the super unit;
outputting, by the delay prediction model, predicted delays for the plurality of third building units;
and updating the structural parameter set obtained by updating the P-1 st time by taking the prediction delay of the plurality of third construction units as a constraint condition to obtain the structural parameter set updated for the P time, wherein P is a positive integer not greater than N.
16. The apparatus of claim 15, wherein the search module is specifically configured to:
calculating an expected delay of the super unit according to the prediction delays of the plurality of third construction units;
updating a joint loss function of the super unit by taking the expected delay as the constraint condition;
and updating the structure parameter set according to the joint loss function.
17. The apparatus of claim 16, wherein the joint loss function includes L_total(α) = L_val(α) + λ·L_PM(α), wherein L_val(α) is a preset loss function, L_PM(α) is the expected delay, α is the structure parameter set, and λ is a delay constraint strength, λ being determined according to the target task.
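A minimal differentiable rendering of claim 17's joint loss, assuming (as in gradient-based NAS methods generally) that the expected delay L_PM(α) can be written as a softmax-weighted sum of per-operation predicted delays; the per-operation delays, the validation-loss value, and the λ value are placeholders.

```python
import torch

NUM_EDGES, NUM_OPS = 14, 8
alpha = torch.randn(NUM_EDGES, NUM_OPS, requires_grad=True)  # structure parameters
op_delay = torch.rand(NUM_OPS)  # placeholder per-operation predicted delays

def joint_loss(val_loss, a, lam=0.1):
    probs = torch.softmax(a, dim=-1)
    l_pm = (probs * op_delay).sum()   # L_PM(alpha): expected delay
    return val_loss + lam * l_pm      # L_total(alpha) = L_val(alpha) + lambda * L_PM(alpha)

loss = joint_loss(torch.tensor(1.5), alpha)
loss.backward()              # delay-constraint gradients flow into alpha
print(alpha.grad.shape)      # torch.Size([14, 8])
```

Because the delay term is differentiable in α, the structure parameter set can be updated by ordinary gradient descent on L_total.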
18. The apparatus according to any one of claims 15 to 17, wherein the search module is specifically configured to:
encoding the structures of the plurality of third construction units to obtain coded data of the plurality of third construction units;
and inputting the coded data of the plurality of third construction units into the delay prediction model to obtain the prediction delay of each third construction unit in the plurality of third construction units.
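Claim 18's encoding step could, for example, amount to a one-hot encoding of which basic operation each edge selects; the sketch below assumes that representation, which is an illustration rather than the claimed coding scheme.

```python
import numpy as np

NUM_OPS = 8

def encode_cell(chosen_ops):
    """chosen_ops[e] = index of the basic operation selected on edge e."""
    code = np.zeros((len(chosen_ops), NUM_OPS))
    code[np.arange(len(chosen_ops)), chosen_ops] = 1.0
    return code.ravel()  # coded data to feed into the delay prediction model

print(encode_cell([0, 3, 7, 1]).shape)  # (32,) for a 4-edge cell
```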
19. The apparatus according to any one of claims 11-18, wherein the super unit obtaining module is specifically configured to construct a search space to obtain the super unit, wherein the search space includes a plurality of basic operations, and the plurality of basic operations include operations corresponding to the target task.
20. The apparatus of claim 19, wherein the search space is a differentiable search space.
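A differentiable search space (claim 20) is commonly realized by having every edge of the super unit compute a softmax-weighted mixture of its candidate basic operations, as in the following sketch; the particular operation set shown is hypothetical.

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """One edge of the super unit: a softmax-weighted mix of candidate operations."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([          # hypothetical basic-operation set
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # structure parameters

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

x = torch.randn(1, 16, 32, 32)
print(MixedOp(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```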
21. A structure search apparatus, comprising: a processor and a memory;
wherein the memory and the processor are interconnected by a line, the memory stores instructions, and the processor is configured to execute the structure searching method according to any one of claims 1-10.
22. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method according to any one of claims 1-10.
CN202010055831.4A 2020-01-17 2020-01-17 Structure searching method and structure searching device Withdrawn CN111428854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010055831.4A CN111428854A (en) 2020-01-17 2020-01-17 Structure searching method and structure searching device

Publications (1)

Publication Number Publication Date
CN111428854A true CN111428854A (en) 2020-07-17

Family

ID=71551483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010055831.4A Withdrawn CN111428854A (en) 2020-01-17 2020-01-17 Structure searching method and structure searching device

Country Status (1)

Country Link
CN (1) CN111428854A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882045A (en) * 2020-08-12 2020-11-03 北京师范大学 Brain time-space network decomposition method and system based on micro neural structure search
CN111882045B (en) * 2020-08-12 2023-10-17 北京师范大学 Brain time-space network decomposition method and system based on micronerve structure search
WO2022063247A1 (en) * 2020-09-28 2022-03-31 华为技术有限公司 Neural architecture search method and apparatus
WO2022116924A1 (en) * 2020-12-03 2022-06-09 华为技术有限公司 Neural network acquisition method and device
CN112530505A (en) * 2020-12-29 2021-03-19 苏州元核云技术有限公司 Hard disk delay detection method and device and computer readable storage medium
WO2022199261A1 (en) * 2021-03-23 2022-09-29 华为技术有限公司 Model recommendation method and apparatus, and computer device

Similar Documents

Publication Publication Date Title
CN110175671B (en) Neural network construction method, image processing method and device
WO2022083536A1 (en) Neural network construction method and apparatus
WO2021120719A1 (en) Neural network model update method, and image processing method and device
WO2021238366A1 (en) Neural network construction method and apparatus
CN111428854A (en) Structure searching method and structure searching device
CN112445823A (en) Searching method of neural network structure, image processing method and device
CN112183718A (en) Deep learning training method and device for computing equipment
CN113705769A (en) Neural network training method and device
CN111783937A (en) Neural network construction method and system
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
CN112990211A (en) Neural network training method, image processing method and device
CN114255361A (en) Neural network model training method, image processing method and device
CN113065635A (en) Model training method, image enhancement method and device
CN113505883A (en) Neural network training method and device
CN113592060A (en) Neural network optimization method and device
CN111797992A (en) Machine learning optimization method and device
CN111612215A (en) Method for training time sequence prediction model, time sequence prediction method and device
CN113240079A (en) Model training method and device
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN111931901A (en) Neural network construction method and device
WO2021036397A1 (en) Method and apparatus for generating target neural network model
CN112883149A (en) Natural language processing method and device
CN113379045B (en) Data enhancement method and device
CN115018039A (en) Neural network distillation method, target detection method and device
CN114004383A (en) Training method of time series prediction model, time series prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200717