CN113344186A - Neural network architecture searching method and image classification method and device

Neural network architecture searching method and image classification method and device

Info

Publication number
CN113344186A
CN113344186A
Authority
CN
China
Prior art keywords
unit
architecture
neural network
network architecture
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110641506.0A
Other languages
Chinese (zh)
Inventor
初祥祥
范裕达
王晓星
魏晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202110641506.0A priority Critical patent/CN113344186A/en
Publication of CN113344186A publication Critical patent/CN113344186A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The specification discloses a neural network architecture searching method and an image classification method and apparatus. A search model contains a first weight and a second weight. For an architecture unit in the neural network architecture to be generated, the first weight represents the importance of the connection between that architecture unit and each other architecture unit in the architecture to be generated, and the second weight represents the importance of implementing that architecture unit with each candidate unit in a search space. When the search model is trained, the other architecture units connected to the architecture unit are selected according to the first weight, the candidate units used to implement the architecture unit are selected according to the second weight, a neural network architecture is generated from the selection results, the quality of the generated architecture is predicted, and the search model is trained on the prediction result to adjust the first and second weights. Generating the neural network architecture through the trained search model can effectively improve the quality of the generated architecture.

Description

Neural network architecture searching method and image classification method and device
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a neural network architecture search method and an image classification method and apparatus.
Background
At present, machine learning techniques based on neural networks are widely applied in various technical fields, such as image classification and unmanned-vehicle decision making. The processing layers, modules, and sub-networks that can serve as architecture units in a neural network keep growing in variety, yet neural network architectures usually need to be designed manually; Neural Network Architecture Search (NAS) methods, which can generate neural network architectures automatically, were therefore developed.
GDAS is one such NAS method. When searching for a neural network architecture with GDAS, a search space must be preset; the search space contains various candidate units, and the neural network architecture is then searched for within that space.
Specifically, a neural network architecture search has two main concerns: first, for an architecture unit in the neural network architecture, which candidate unit in the search space is used to implement it; second, for an architecture unit in the neural network architecture, which other architecture units in the architecture it is connected to.
The ultimate purpose of NAS is to output a high-quality neural network architecture, and the quality of an architecture can only be evaluated after the corresponding neural network has been trained. A plain NAS therefore has to train the neural network corresponding to every architecture it outputs and evaluate the quality afterwards, which undoubtedly increases the time cost of the search. GDAS instead trains the neural network of each output architecture only lightly, then predicts the quality of the architecture from the lightly trained network, saving time.
However, the architectures searched out by GDAS still perform unsatisfactorily in actual use, so how to obtain an architecture with satisfactory performance through an automatic search method is a problem to be solved urgently.
Disclosure of Invention
The embodiments of the present disclosure provide a method for searching a neural network architecture, and a method and an apparatus for classifying images, so as to partially solve the problems in the prior art.
The embodiment of the specification adopts the following technical scheme:
the present specification provides a method for searching a neural network architecture, including:
aiming at a current architecture unit in a to-be-generated neural network architecture, selecting a first target unit connected with the current architecture unit from other architecture units according to a first weight corresponding to the connection relation between each other architecture unit in the to-be-generated neural network architecture and the current architecture unit, wherein the first weight is contained in a search model; selecting a second target unit for realizing the current architecture unit from all the candidate units according to each candidate unit contained in a preset search space and a second weight corresponding to each candidate unit contained in the search model;
generating a neural network architecture according to the first target unit and the second target unit selected aiming at each architecture unit in the neural network architecture to be generated;
predicting the quality of the generated neural network architecture according to the training samples, and determining the loss of the search model according to the predicted quality;
and training the search model by taking the loss minimization as an optimization target, adjusting at least a first weight and a second weight in the search model until a preset training condition is met, and generating a neural network architecture according to the first weight and the second weight contained in the trained search model.
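The generation step of the claimed method can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the dictionary layout of the weights, the greedy (largest-weight) selection of the candidate unit, and all names are assumptions, since the claims only specify that the selections are made "according to" the first and second weights.

```python
def generate_architecture(first_weights, second_weights, k):
    """For each architecture unit, pick the k predecessors with the largest
    first weights and the candidate unit with the largest second weight.
    (Greedy selection is one option; the claims also allow sampling.)"""
    architecture = {}
    for unit in sorted(first_weights):
        # First weights: importance of each predecessor's connection.
        preds = sorted(first_weights[unit],
                       key=first_weights[unit].get, reverse=True)[:k]
        # Second weights: importance of each candidate unit for this unit.
        op = max(range(len(second_weights[unit])),
                 key=second_weights[unit].__getitem__)
        architecture[unit] = (sorted(preds), op)
    return architecture

# Hypothetical weights for a single unit (unit 4) with three candidates.
first_w = {4: {1: 0.8, 2: 0.3, 3: 0.6}}   # connections into unit 4
second_w = {4: [1.2, 0.4, -0.3]}          # e.g. conv3x3, conv5x5, pool
arch = generate_architecture(first_w, second_w, k=2)
```

In a full search, the generated architecture would then be evaluated on training samples and the resulting loss used to adjust both weight families.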
Optionally, selecting the first target unit connected to the current architecture unit from the other architecture units specifically includes:
and sequentially selecting, in descending order of the first weights corresponding to their connection relationships with the current architecture unit, at least one other architecture unit from the other architecture units of the to-be-generated neural network architecture as a first target unit.
Optionally, selecting a second target unit for implementing the current architecture unit from the candidate units according to each candidate unit included in a preset search space and a second weight corresponding to each candidate unit included in the search model, specifically including:
querying each candidate unit contained in the preset search space;
for each candidate unit, querying, among the second weights included in the search model, the second weight of selecting that candidate unit to implement the current architecture unit under the condition that the current architecture unit is connected to the first target unit, and taking the queried weight as the second weight corresponding to the candidate unit;
and selecting, according to the second weight corresponding to each candidate unit, a second target unit for implementing the current architecture unit from the candidate units.
Optionally, selecting a second target unit for implementing the current architecture unit from the candidate units specifically includes:
for each candidate unit, determining, according to the second weight corresponding to that candidate unit, the probability of selecting it to implement the current architecture unit, wherein the second weight is positively correlated with the probability;
repeatedly executing the following specified step a specified number of times:
selecting, according to the selection probabilities of the candidate units, a second target unit for implementing the current architecture unit from the candidate units.
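The repeated execution of the specified step can be illustrated as follows (a sketch assuming the selection probabilities have already been derived from the second weights; all names are hypothetical):

```python
import random

def sample_second_targets(candidates, probs, times, rng):
    # Execute the specified step the specified number of times: each
    # execution samples one candidate unit according to its probability.
    return [rng.choices(candidates, weights=probs)[0] for _ in range(times)]

picks = sample_second_targets(["conv3x3", "conv5x5", "pool"],
                              [0.7, 0.2, 0.1], times=5,
                              rng=random.Random(0))
```

Each execution can yield a different second target unit, so repeating the step explores several realizations of the same architecture skeleton.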
Optionally, generating a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated, specifically including:
for each architecture unit in the neural network architecture to be generated, taking the output of the first target unit selected for the architecture unit as the input of the architecture unit, and connecting the first target unit selected for the architecture unit and the architecture unit;
and for each execution of the specified step, implementing each architecture unit in the to-be-generated neural network architecture with the second target unit selected for it in that execution, to obtain the neural network architecture corresponding to that execution of the specified step.
Optionally, predicting quality of the generated neural network architecture according to the training samples, and determining loss of the search model according to the predicted quality, specifically including:
for each execution of the specified step, predicting, according to a training sample, the quality of the neural network architecture corresponding to that execution, and determining, according to the predicted quality, the loss of the search model for that execution;
determining the average of the losses of the search model over the executions of the specified step as the loss of the search model.
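The loss aggregation described above reduces to a simple mean over the executions of the specified step (function name hypothetical):

```python
def search_model_loss(per_execution_losses):
    # Loss of the search model = average of the losses observed over
    # all executions of the specified step.
    return sum(per_execution_losses) / len(per_execution_losses)
```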
Optionally, the method further comprises:
for each execution of the specified step, determining, according to the training sample, the loss of the neural network corresponding to that execution;
and for each second target unit, adjusting the model parameters in that second target unit according to the average of the losses of the neural networks over the executions of the specified step in which that second target unit was selected.
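The per-unit averaging in this optional step can be sketched as follows (names hypothetical; the patent does not specify the parameter update rule beyond its use of this average):

```python
def per_unit_loss_average(selected_units, network_losses):
    """For each second target unit, average the network losses over the
    executions of the specified step in which that unit was selected."""
    sums, counts = {}, {}
    for unit, loss in zip(selected_units, network_losses):
        sums[unit] = sums.get(unit, 0.0) + loss
        counts[unit] = counts.get(unit, 0) + 1
    return {unit: sums[unit] / counts[unit] for unit in sums}

# Three executions: conv3x3 was selected twice, pool once.
avg = per_unit_loss_average(["conv3x3", "pool", "conv3x3"], [0.2, 0.5, 0.4])
```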
The present specification provides a method for image classification, including:
aiming at a current architecture unit in a to-be-generated neural network architecture, selecting a first target unit connected with the current architecture unit from other architecture units according to a first weight corresponding to the connection relation between each other architecture unit in the to-be-generated neural network architecture and the current architecture unit, wherein the first weight is contained in a search model; selecting a second target unit for realizing the current architecture unit from all the candidate units according to each candidate unit contained in a preset search space and a second weight corresponding to each candidate unit contained in the search model;
generating a neural network architecture according to the first target unit and the second target unit selected aiming at each architecture unit in the neural network architecture to be generated;
predicting a quality of the generated neural network architecture from the sample image, and determining a loss of the search model from the predicted quality;
training the search model by taking the loss minimization as an optimization target, adjusting at least a first weight and a second weight in the search model until a preset training condition is met, and generating a neural network architecture as an image classification model according to the first weight and the second weight contained in the trained search model;
and inputting the image to be classified into the image classification model to obtain a classification result of the image to be classified by the image classification model.
The present specification provides an apparatus for neural network architecture search, the apparatus comprising:
the first selection module is used for selecting a first target unit connected with the current architecture unit from other architecture units according to a first weight corresponding to the connection relation between each other architecture unit and the current architecture unit in the to-be-generated neural network architecture and contained in a search model aiming at the current architecture unit in the to-be-generated neural network architecture;
the second selection module is used for selecting a second target unit for realizing the current architecture unit from all the alternative units according to each alternative unit contained in a preset search space and a second weight corresponding to each alternative unit contained in the search model;
the training module is used for generating a neural network architecture according to the first target unit and the second target unit which are selected aiming at each architecture unit in the neural network architecture to be generated; predicting the quality of the generated neural network architecture according to the training samples, and determining the loss of the search model according to the predicted quality; training the search model with the loss minimization as an optimization goal to adjust at least a first weight and a second weight in the search model;
and the generating module is used for generating a neural network architecture according to the first weight and the second weight contained in the trained search model when the preset training condition is met.
The present specification provides an apparatus for image classification, the apparatus comprising:
the first selection module is used for selecting a first target unit connected with the current architecture unit from other architecture units according to a first weight corresponding to the connection relation between each other architecture unit and the current architecture unit in the to-be-generated neural network architecture and contained in a search model aiming at the current architecture unit in the to-be-generated neural network architecture;
the second selection module is used for selecting a second target unit for realizing the current architecture unit from all the alternative units according to each alternative unit contained in a preset search space and a second weight corresponding to each alternative unit contained in the search model;
the training module is used for generating a neural network architecture according to the first target unit and the second target unit which are selected aiming at each architecture unit in the neural network architecture to be generated; predicting a quality of the generated neural network architecture from the sample image, and determining a loss of the search model from the predicted quality; training the search model with the loss minimization as an optimization goal to adjust at least a first weight and a second weight in the search model;
the generating module is used for generating, when a preset training condition is met, a neural network architecture as an image classification model according to the first weight and the second weight contained in the trained search model;
and the classification module is used for inputting the image to be classified into the image classification model to obtain the classification result of the image to be classified by the image classification model.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above neural network architecture search method or image classification method.
The present specification provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the method for neural network architecture search or the method for image classification.
The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects:
the search model in the embodiments of this specification includes a first weight and a second weight. For an architecture unit in the neural network architecture to be generated, the first weight represents the importance of the connection between that architecture unit and each other architecture unit in the architecture to be generated, and the second weight represents the importance of implementing that architecture unit with each candidate unit in a search space. When the search model is trained, the other architecture units connected to the architecture unit can be selected according to the first weight, the candidate units used to implement it selected according to the second weight, the neural network architecture generated from the selection results, and the quality of the generated architecture predicted from training samples; the search model is then trained on the prediction result to adjust the first and second weights. After training is completed, generating the neural network architecture through the trained search model can effectively improve the quality of the generated architecture.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and constitute a part of it, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:
fig. 1 is a schematic diagram of a method for searching a neural network architecture provided in an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a neural network architecture to be generated according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an apparatus for neural network architecture search according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an image classification apparatus provided in an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of this specification.
Detailed Description
At present, when NAS is used to generate a neural network architecture, for an architecture unit in the network to be generated, the theoretically optimal connection scheme is to connect every other architecture unit to that unit. Therefore, when the search model is trained, the architecture units in the architecture it generates are often pairwise connected. After the architecture is generated, however, it must be trained to some extent in order to predict its quality, and because every pair of architecture units is connected, training it is very difficult. The connections between architecture units are therefore simplified after generation, that is, some connections are removed, and only then is the architecture trained and the quality of the simplified architecture predicted. The predicted quality is thus not the quality of the architecture the search model actually generated but that of its simplified version, so the loss derived from the predicted quality is not the true loss of the search model. As a result, the trained search model cannot generate high-quality architectures; the architectures it generates are often dominated by parameter-free operators (architecture units in the search space without parameters), such as pooling and skip connections, and their quality is poor.
The neural network architecture searching method provided in the embodiments of this specification introduces a first weight and a second weight into the search model. For an architecture unit in the neural network architecture to be generated, the first weight represents the importance of the connection between that unit and each other architecture unit, and the second weight represents the importance of implementing that unit with each candidate unit in the search space. The other architecture units connected to the unit are selected according to the first weight, the candidate units implementing it are selected according to the second weight, the architecture is generated from the selection results, and the quality of the generated architecture is then predicted from training samples. The predicted quality is therefore the quality of the architecture the search model actually generated, and the loss derived from it is the true loss of the search model; once the search model is trained, the quality of the architectures it generates can thus be effectively improved.
In order to make the objects, technical solutions and advantages of the present disclosure more clear, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort belong to the protection scope of the present specification.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a neural network architecture search method provided in an embodiment of the present disclosure, including:
s100: and aiming at a current architecture unit in the neural network architecture to be generated, selecting a first target unit connected with the current architecture unit from other architecture units according to a first weight corresponding to the connection relation between each other architecture unit in the neural network architecture to be generated and the current architecture unit, wherein the first weight is contained in a search model.
A neural network architecture is composed of a plurality of architecture units, and one of the problems neural network architecture search must solve is the connection relationship between these units, that is, which architecture unit's output serves as which architecture unit's input. In the embodiments of this specification, a first weight representing the importance of the connection relationships between architecture units is therefore introduced into the search model used to search for the neural network architecture.
The architecture units of the neural network architecture to be generated can be taken in turn, in front-to-back order, as the current architecture unit. For the current architecture unit, a first target unit connected to it is selected from the other architecture units arranged before it, according to the first weights corresponding to the connection relationships between those units and the current architecture unit.
Specifically, at least one other architecture unit of the neural network architecture to be generated may be selected in turn as a first target unit, in descending order of the first weights corresponding to their connection relationships with the current architecture unit. That is, two or more other architecture units may be connected to the current architecture unit.
For two architecture units, the greater the first weight corresponding to their connection relationship, the more important that connection is; that is, connecting the two units makes it easier to obtain a higher-quality neural network architecture.
It should be noted that, the first target unit connected to the current architecture unit in the embodiment of the present specification refers to: the output of the first target unit is directly used as the input of the current architecture unit.
For example, the neural network architecture to be generated shown in fig. 2 contains 5 architecture units, numbered 1 to 5 from front to back. Suppose architecture unit 4 is the current architecture unit. For the architecture units 1 to 3 arranged before unit 4, the search model can be queried for the first weight q14 corresponding to the connection relationship between unit 1 and unit 4, the first weight q24 corresponding to the connection relationship between unit 2 and unit 4, and the first weight q34 corresponding to the connection relationship between unit 3 and unit 4. Assuming two first target units need to be selected, the two largest of q14, q24, and q34 are chosen; supposing these are q14 and q34, architecture units 1 and 3 are taken as the two selected first target units, unit 1 is connected to unit 4, unit 3 is connected to unit 4, and the outputs of units 1 and 3 both serve as inputs of unit 4.
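The selection in this example can be reproduced in a few lines of Python. The weight values below are assumed for illustration; the patent gives no concrete numbers, only that the two largest first weights win.

```python
# Assumed first weights for the connections into architecture unit 4.
q = {(1, 4): 0.8, (2, 4): 0.3, (3, 4): 0.6}   # q14, q24, q34

# Keep the two connections with the largest first weights.
top2 = sorted(q, key=q.get, reverse=True)[:2]
first_targets = sorted(src for src, dst in top2)
# Units 1 and 3 are selected, so their outputs become inputs of unit 4.
```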
S102: and selecting a second target unit for realizing the current architecture unit from the candidate units according to each candidate unit contained in a preset search space and a second weight corresponding to each candidate unit contained in the search model.
Another problem to be solved by neural network architecture search is how each architecture unit in the architecture is implemented. The embodiments of this specification therefore also use a search space storing the various candidate units capable of implementing an architecture unit, such as a 3 × 3 convolutional layer, a 5 × 5 convolutional layer, a pooling layer, and so on. The candidate units carry model parameters, but those parameters have not yet been trained and adjusted, so the units cannot yet realize certain functions well.
In the embodiments of this specification, a second weight is also introduced to represent the importance of implementing a given architecture unit in the neural network architecture to be generated with each candidate unit in the search space. The larger the second weight of a candidate unit, the more important it is to implement the current architecture unit with that candidate unit, i.e., the easier it is to obtain a better-quality neural network architecture by doing so. For the current architecture unit, the candidate unit with the largest second weight can therefore be selected directly from the search space as the second target unit; in this case, steps S100 and S102 may be executed in either order.
However, for the same candidate unit, the importance of using it to implement architecture units at different positions in the network is not the same. To further improve the quality of the generated network, the second target unit for the current architecture unit therefore needs to be selected according to the connection relationships of the current architecture unit in the architecture to be generated. Specifically, each candidate unit contained in the search space can be queried; for each candidate unit, among the second weights included in the search model, the second weight of selecting that candidate unit to implement the current architecture unit under the condition that the current architecture unit is connected to the first target unit is queried and taken as the second weight corresponding to the candidate unit; the second target unit implementing the current architecture unit is then selected from the candidate units according to their corresponding second weights. In this case, step S100 must be executed before step S102.
Continuing with the above example, assume that three candidate units exist in the search space, namely a 3 × 3 convolutional layer, a 5 × 5 convolutional layer, and a pooling layer, and that the current architecture unit is architecture unit 4. Since the first target units determined for the current architecture unit are architecture unit 1 and architecture unit 3, the second weights of selecting each of the three candidate units to implement the current architecture unit when architecture unit 1 and architecture unit 3 are respectively connected to the current architecture unit can be queried, and the second target unit for the current architecture unit can be selected among the three candidate units according to the queried second weights.
In addition, for each candidate unit, the probability of selecting that candidate unit to implement the current architecture unit can be determined according to the second weight corresponding to it, and the second target unit can then be selected in the search space according to these probabilities. The larger the second weight corresponding to a candidate unit, the larger the probability of selecting it.
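The probability-proportional selection described above can be sketched as follows. The unit names, weight values, and the softmax normalization are illustrative assumptions, not details fixed by this specification:

```python
import math
import random

# Hypothetical second weights, indexed by (first target unit, candidate unit).
second_weights = {
    ("unit1", "conv3x3"): 1.2, ("unit1", "conv5x5"): 0.4, ("unit1", "pool"): -0.3,
    ("unit3", "conv3x3"): 0.1, ("unit3", "conv5x5"): 0.9, ("unit3", "pool"): 0.2,
}

def softmax(xs):
    # Normalize raw weights into probabilities; larger weight -> larger probability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_second_target(first_targets, candidates, weights, rng):
    """For each connected first target unit, sample one candidate unit with
    probability proportional to its normalized second weight."""
    chosen = {}
    for src in first_targets:
        probs = softmax([weights[(src, op)] for op in candidates])
        chosen[src] = rng.choices(candidates, weights=probs, k=1)[0]
    return chosen

ops = ["conv3x3", "conv5x5", "pool"]
picked = sample_second_target(["unit1", "unit3"], ops, second_weights, random.Random(0))
```

With these weights, the 3 × 3 convolutional layer is the most likely choice for the connection from unit 1, but the other candidate units can still be sampled, which matters for the repeated-selection variant described later.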
S104: and generating the neural network architecture according to the first target unit and the second target unit selected aiming at each architecture unit in the neural network architecture to be generated.
Through the above steps S100 and S102, a first target unit and a second target unit are determined for each architecture unit in the to-be-generated neural network architecture. The first target units represent the connection relationships (the relationships between inputs and outputs) among the architecture units, and the second target units represent how each architecture unit is implemented, so the neural network architecture can be generated according to the first target unit and the second target unit selected for each architecture unit.
S106: predicting a quality of the generated neural network architecture from the training samples, and determining a loss of the search model from the predicted quality.
In this embodiment, the training sample may also be input into the search model. After the neural network architecture is generated in step S104, the generated neural network architecture may be trained a preset number of times according to the input training sample, the quality of the generated neural network architecture may be predicted from the trained architecture, and the loss of the search model may then be determined according to the predicted quality.
The input training sample is related to the function to be realized by the neural network architecture to be generated.
For example, when the function to be implemented by the neural network architecture to be generated is to classify images, the input training sample may specifically be a sample image. When the generated neural network architecture is trained, the generated neural network architecture can be trained for a preset number of times according to the sample images and the labels of the sample images.
Regardless of the function to be realized by the to-be-generated neural network architecture, when the generated neural network architecture is trained, the training samples can be divided into a training set and a validation set, and the generated neural network architecture can be trained with the training set. Moreover, since each architecture unit in the generated neural network architecture is implemented by a candidate unit in the search space, when the generated neural network architecture is trained, the model parameters of each architecture unit can be adjusted, and the model parameters of the corresponding candidate units in the search space can be updated with the adjusted model parameters.
After the generated neural network architecture is trained for a preset number of times, the neural network architecture trained for the preset number of times can be verified through a verification set, the quality of the neural network architecture is predicted according to a verification result, and finally the loss of a search model is determined according to the predicted quality. The loss is inversely related to the predicted quality of the neural network architecture. The method for predicting the quality of the neural network architecture is not limited in the embodiments of the present specification.
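A minimal sketch of the split-and-evaluate procedure above. The split ratio, the accuracy-based quality measure, and the `1 - quality` loss are assumptions for illustration; as noted, the specification does not fix a particular method for predicting quality:

```python
import random

def split_samples(samples, val_fraction, rng):
    """Divide the training samples into a training set and a validation set."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # training set, validation set

def predict_quality(predictions, labels):
    # One possible quality measure: accuracy on the validation set.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def search_model_loss(quality):
    # The loss is inversely related to the predicted quality.
    return 1.0 - quality

train_set, val_set = split_samples(list(range(10)), 0.2, random.Random(0))
```

Any quality measure that increases as the validated architecture improves would fit the same scheme, as long as the loss decreases when the quality increases.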
S108: and training the search model by taking the loss minimization as an optimization target, adjusting at least a first weight and a second weight in the search model until a preset training condition is met, and generating a neural network architecture according to the first weight and the second weight contained in the trained search model.
In this embodiment of the present specification, the connection relationship of each architecture unit in the neural network architecture generated by the search model is determined by the first weights, and how each architecture unit is implemented is determined by the second weights. Therefore, after the loss of the search model is determined in step S106, the search model can be trained with loss minimization as the training target so as to adjust the first weights and the second weights in the search model, and whether a preset training condition is satisfied can be determined. If not, the process returns to step S100 to continue training the search model; if so, the neural network architecture can be generated by the trained search model.
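One iteration of the weight adjustment can be sketched as a simple reward-style update. This is illustrative only: a real implementation would typically follow gradients of the loss, and the baseline value and learning rate below are assumptions:

```python
def update_weights(first_w, second_w, arch, loss, baseline, lr=0.1):
    """Nudge the first weights of the connections and the second weights of the
    candidate units appearing in the sampled architecture: if the measured loss
    is below the baseline, those weights grow (making the same choices more
    likely next time); otherwise they shrink."""
    advantage = baseline - loss  # positive when this architecture beats the baseline
    for edge in arch["edges"]:
        first_w[edge] += lr * advantage
    for op in arch["ops"]:
        second_w[op] += lr * advantage
    return first_w, second_w

first_w = {"unit1->unit4": 0.0, "unit2->unit4": 0.0}
second_w = {"conv3x3": 0.0, "pool": 0.0}
arch = {"edges": ["unit1->unit4"], "ops": ["conv3x3"]}
first_w, second_w = update_weights(first_w, second_w, arch, loss=0.2, baseline=0.5)
```

Repeating this until the preset training condition is met (for example, a loss threshold or an iteration budget) corresponds to the loop from step S100 back through step S108.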
Continuing with the above example, after the search model is trained, a neural network architecture can be generated through the above steps S100 to S104 according to the first weight and the second weight included in the trained search model, and the neural network architecture is used as an image classification model, and an image to be classified is input into the image classification model, so as to obtain a classification result of the image to be classified by the image classification model.
Further, by the method shown in fig. 1, in the training process of the search model, not only are the first weights and the second weights adjusted, but the model parameters of the candidate units included in the search space are also adjusted, so the model parameters of each architecture unit in the neural network architecture generated by the trained search model have been adjusted as well. Nevertheless, the model parameters of the candidate units are adjusted in step S106 in the course of training the generated neural network architecture to predict its quality, and that training is only performed a preset number of times, which is not sufficient; that is, the model parameters of each architecture unit in the neural network architecture generated by the trained search model have not been adjusted to the optimum. Therefore, after the neural network architecture is generated by the trained search model and used as an image classification model, the image classification model can be further trained with sample images, and after this training is finished, an image to be classified can be input into the trained image classification model to obtain a classification result.
It can be seen from the above method that, in the embodiment of the present specification, when the search model is trained, the architecture units in the neural network architecture generated by the search model are not connected pairwise but are connected selectively according to the learned weights. It is therefore unnecessary to prune the connections between some architecture units for the sake of simplification: the generated neural network architecture can be trained directly and its quality predicted. The loss determined from that quality is thus the actual loss of the search model, so the search model can be trained accurately and the quality of the neural network architecture it generates is improved.
In addition, in step S102, when the second target unit is selected for the current architecture unit, although for each candidate unit the probability of selecting it to implement the current architecture unit may be determined according to its corresponding weight and the second target unit selected according to that probability, it is inevitable that certain candidate units with higher probabilities are selected as the second target unit again and again during training, which is obviously not conducive to training the search model. To train the search model better, in the embodiment of the present specification, after the probability of selecting each candidate unit in the search space to implement the current architecture unit is determined, the following specified step may be repeatedly performed a specified number of times:
and selecting a second target unit for realizing the current architecture unit from the alternative units according to the probability of selecting the alternative units.
That is, after the probability of selecting each candidate unit to implement the current architecture unit is determined, the selection of the second target unit may be repeated a specified number of times. The number of times can be set as needed, for example, to 10.
In step S104, when the neural network architecture is generated, a corresponding neural network architecture may be generated for each execution of the above specified step, according to the second target units selected in that execution. That is, for each architecture unit in the to-be-generated neural network architecture, the output of the first target unit selected for that architecture unit is used as the input of the architecture unit, and the selected first target unit is connected with the architecture unit; then, for each execution of the specified step, the second target units selected in that execution for the architecture units are used to implement them, yielding the neural network architecture corresponding to that execution of the specified step.
Correspondingly, in step S106, for each execution of the specified step, the quality of the corresponding neural network architecture is predicted according to the training samples, and the loss of the search model for that execution is determined according to the predicted quality; the average of the losses of the search model over all executions of the specified step is then taken as the loss of the search model.
In addition, when the model parameters of the candidate units in the search space are adjusted, the loss of the neural network corresponding to each execution of the specified step can be determined according to the training samples; then, for each second target unit, the model parameters of that unit are adjusted according to the average of the losses of the neural networks corresponding to the executions in which that second target unit was selected. For example, assume the second target unit is selected for a certain architecture unit 10 times in total, and the 3 × 3 convolutional layer is selected in 3 of those executions. The losses of the neural network architectures generated in those 3 executions of the specified step can be determined, their average computed, and the model parameters of the 3 × 3 convolutional layer in the search space adjusted according to that average loss.
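The per-unit loss averaging in the example above can be sketched as follows; the unit names and loss values are illustrative:

```python
from collections import defaultdict

def per_unit_average_losses(selections, losses):
    """selections[i] is the candidate unit chosen at repetition i of the
    specified step, and losses[i] is the loss of the architecture generated at
    that repetition. Returns the mean loss per selected unit (used to adjust
    that unit's model parameters) and the overall mean (the search-model loss)."""
    by_unit = defaultdict(list)
    for unit, loss in zip(selections, losses):
        by_unit[unit].append(loss)
    per_unit = {unit: sum(ls) / len(ls) for unit, ls in by_unit.items()}
    return per_unit, sum(losses) / len(losses)

selections = ["conv3x3", "pool", "conv3x3", "pool", "conv3x3"]
losses = [0.3, 0.6, 0.2, 0.4, 0.1]
per_unit, overall = per_unit_average_losses(selections, losses)
```

Here the 3 × 3 convolutional layer was selected in three of the five repetitions, so its parameters would be adjusted according to the average of those three losses, while the overall mean serves as the loss of the search model.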
Based on the same idea, the present specification further provides a corresponding apparatus, a storage medium, and an electronic device.
Fig. 3 is a schematic structural diagram of an apparatus for neural network architecture search provided in an embodiment of the present disclosure, where the apparatus includes:
a first selecting module 301, configured to select, for a current architecture unit in a to-be-generated neural network architecture, a first target unit connected to the current architecture unit from among other architecture units according to a first weight corresponding to a connection relationship between each other architecture unit and the current architecture unit in the to-be-generated neural network architecture, where the first weight is included in a search model;
the second selecting module 302 is configured to select, according to each candidate unit included in a preset search space and a second weight corresponding to each candidate unit included in the search model, a second target unit for implementing the current architecture unit from the candidate units;
a training module 303, configured to generate a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated; predicting the quality of the generated neural network architecture according to the training samples, and determining the loss of the search model according to the predicted quality; training the search model with the loss minimization as an optimization goal to adjust at least a first weight and a second weight in the search model;
the generating module 304 is configured to generate a neural network architecture according to the first weight and the second weight included in the trained search model when a preset training condition is met.
Optionally, the first selecting module 301 is specifically configured to sequentially select, in order from large to small according to the first weight corresponding to the connection relationship of the current architecture unit, at least one other architecture unit from among the other architecture units of the to-be-generated neural network architecture, as the first target unit.
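The descending-order selection performed by the first selecting module can be sketched as follows; the weight values and the choice of `k` are illustrative assumptions:

```python
def select_first_targets(first_weights, k):
    """Select the k other architecture units whose connections to the current
    architecture unit carry the largest first weights."""
    ranked = sorted(first_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [unit for unit, _ in ranked[:k]]

# Hypothetical first weights for connections into the current architecture unit.
first_weights = {"unit1": 0.8, "unit2": 0.1, "unit3": 0.6}
targets = select_first_targets(first_weights, k=2)
```

With these values the two first target units are unit 1 and unit 3, matching the earlier worked example in which architecture unit 4 is connected to architecture units 1 and 3.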
Optionally, the second selecting module 302 is specifically configured to query each candidate unit included in a preset search space; for each candidate unit, look up, among the second weights included in the search model, the second weight of selecting that candidate unit to implement the current architecture unit when the current architecture unit is connected to the first target unit, as the second weight corresponding to that candidate unit; and select a second target unit for implementing the current architecture unit from the candidate units according to the second weight corresponding to each candidate unit.
Optionally, the second selecting module 302 is specifically configured to, for each candidate unit, determine, according to a second weight corresponding to the candidate unit, a probability that the candidate unit is selected to implement the current architecture unit; wherein the second weight is positively correlated with the probability; repeatedly executing the following specified steps for specified times: and selecting a second target unit for realizing the current architecture unit from the alternative units according to the probability of selecting the alternative units.
Optionally, the training module 303 is specifically configured to, for each architecture unit in the to-be-generated neural network architecture, take the output of the first target unit selected for that architecture unit as its input and connect the selected first target unit with the architecture unit; and, for each execution of the specified step, use the second target units selected in that execution for the architecture units to implement them, obtaining the neural network architecture corresponding to that execution of the specified step.
Optionally, the training module 303 is specifically configured to, for each execution of the specified step, predict the quality of the corresponding neural network architecture according to the training samples, and determine the loss of the search model for that execution according to the predicted quality; and determine the average of the losses of the search model over all executions of the specified step as the loss of the search model.
Optionally, the training module 303 is further configured to, for each execution of the specified step, determine the loss of the corresponding neural network according to the training samples; and, for each second target unit, adjust the model parameters of that second target unit according to the average of the losses of the neural networks corresponding to the executions in which that second target unit was selected.
Fig. 4 is a schematic structural diagram of an apparatus for image classification provided in an embodiment of the present specification, where the apparatus includes:
a first selection module 401, configured to select, for a current architecture unit in a to-be-generated neural network architecture, a first target unit connected to the current architecture unit from among other architecture units according to a first weight corresponding to a connection relationship between each other architecture unit and the current architecture unit in the to-be-generated neural network architecture, where the first weight is included in a search model;
a second selecting module 402, configured to select, according to each candidate unit included in a preset search space and a second weight corresponding to each candidate unit included in the search model, a second target unit for implementing the current architecture unit from the candidate units;
a training module 403, configured to generate a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated; predicting a quality of the generated neural network architecture from the sample image, and determining a loss of the search model from the predicted quality; training the search model with the loss minimization as an optimization goal to adjust at least a first weight and a second weight in the search model;
a generating module 404, configured to generate a neural network architecture according to the first weight and the second weight included in the trained search model when a preset training condition is met, as an image classification model;
the classification module 405 is configured to input the image to be classified into the image classification model, so as to obtain a classification result of the image to be classified by the image classification model.
The present specification also provides a computer readable storage medium storing a computer program which, when executed by a processor, is operable to perform the method of neural network architecture search or the method of image classification provided above.
Based on the neural network architecture search method or the image classification method provided above, the embodiment of the present specification further provides a schematic structural diagram of the electronic device shown in fig. 5. As shown in fig. 5, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may of course also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the neural network architecture search method or the image classification method described above.
Of course, besides the software implementation, the present specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may be hardware or logic devices.
In the 1990s, it was possible to clearly distinguish whether an improvement to a technology was an improvement in hardware (for example, an improvement in circuit structures such as diodes, transistors, and switches) or an improvement in software (an improvement in a method flow). However, as technology advances, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming, without needing a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is now mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development, and the original code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), with VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog currently being the most commonly used.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, and an embedded microcontroller, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; the memory controller may also be implemented as part of the control logic for the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (12)

1. A method of neural network architecture search, comprising:
aiming at a current architecture unit in a to-be-generated neural network architecture, selecting a first target unit connected with the current architecture unit from other architecture units according to a first weight corresponding to the connection relation between each other architecture unit in the to-be-generated neural network architecture and the current architecture unit, wherein the first weight is contained in a search model; selecting a second target unit for realizing the current architecture unit from all the candidate units according to each candidate unit contained in a preset search space and a second weight corresponding to each candidate unit contained in the search model;
generating a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated;
predicting the quality of the generated neural network architecture according to training samples, and determining a loss of the search model according to the predicted quality;
and training the search model with minimizing the loss as an optimization objective, adjusting at least the first weight and the second weight in the search model until a preset training condition is met, and generating a neural network architecture according to the first weight and the second weight contained in the trained search model.
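As an illustrative sketch only (not part of the claim language), the per-unit selection in claim 1 can be read as two arg-max choices over the search model's weights; every function name, weight value, and operation name below is an assumption introduced for illustration:

```python
def select_targets(first_weights, second_weights, op_names):
    """Sketch of claim 1's selection step. Picks the predecessor unit with
    the largest connection (first) weight, and the candidate operation with
    the largest operation (second) weight; names are illustrative."""
    first_target = max(range(len(first_weights)),
                       key=lambda i: first_weights[i])
    second_target = op_names[max(range(len(second_weights)),
                                 key=lambda i: second_weights[i])]
    return first_target, second_target

# Hypothetical weights for three predecessor units and three operations.
first, op = select_targets([0.1, 0.7, 0.2], [0.3, 0.6, 0.1],
                           ["conv3x3", "conv5x5", "skip"])
```

During search these weights are the trainable architecture parameters; after training, the same deterministic selection yields the final architecture.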
2. The method of claim 1, wherein selecting the first target unit connected to the current architecture unit from the other architecture units specifically comprises:
selecting at least one of the other architecture units of the neural network architecture to be generated as a first target unit, in descending order of the first weights corresponding to their connection relationships with the current architecture unit.
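A minimal sketch of claim 2's descending-weight selection (the function name and the choice of returning indices are assumptions, not from the patent):

```python
def select_first_targets(first_weights, k):
    """Pick the k predecessor units with the largest connection (first)
    weights, in descending weight order, as claim 2 describes."""
    order = sorted(range(len(first_weights)),
                   key=lambda i: first_weights[i], reverse=True)
    return order[:k]

# Hypothetical connection weights for three predecessor units.
top2 = select_first_targets([0.2, 0.9, 0.5], 2)
```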
3. The method of claim 1, wherein selecting, from the candidate units, a second target unit for implementing the current architecture unit according to each candidate unit contained in a preset search space and a second weight contained in the search model and corresponding to each candidate unit specifically comprises:
querying each candidate unit contained in the preset search space;
for each candidate unit, taking, from the second weights contained in the search model, the second weight of implementing the current architecture unit with the candidate unit under the condition that the current architecture unit is connected to the first target unit, as the second weight corresponding to the candidate unit;
and selecting, from the candidate units, a second target unit for implementing the current architecture unit according to the second weight corresponding to each candidate unit.
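Claim 3 makes the operation weights conditional on the chosen connection. A hedged sketch, where the table keyed by (first target, operation) pairs is an assumed layout, not one specified by the patent:

```python
def conditional_second_weights(second_weight_table, first_target, op_names):
    """Sketch of claim 3: for each candidate operation, look up the second
    weight conditioned on the current unit being connected to the chosen
    first target unit. The (predecessor, op) keyed table is an assumption."""
    return {op: second_weight_table[(first_target, op)] for op in op_names}

# Hypothetical table: weights differ depending on which predecessor is chosen.
w = conditional_second_weights({(0, "conv3x3"): 0.7, (0, "skip"): 0.3,
                                (1, "conv3x3"): 0.2, (1, "skip"): 0.8},
                               1, ["conv3x3", "skip"])
```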
4. The method of claim 1, wherein selecting, from the candidate units, a second target unit for implementing the current architecture unit specifically comprises:
for each candidate unit, determining, according to the second weight corresponding to the candidate unit, the probability of selecting that candidate unit to implement the current architecture unit, wherein the second weight is positively correlated with the probability;
and repeatedly executing, a specified number of times, the following specified step:
selecting, from the candidate units, a second target unit for implementing the current architecture unit according to the probabilities of selecting the candidate units.
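The claim only requires the probability to be positively correlated with the second weight; a softmax, as sketched below, is one assumed way to satisfy that, and the function name and example weights are likewise illustrative:

```python
import math
import random

def sample_second_targets(second_weights, op_names, n_samples, seed=0):
    """Sketch of claim 4: turn second weights into probabilities (softmax
    assumed) and repeat the sampling step a specified number of times."""
    rng = random.Random(seed)  # fixed seed for reproducibility of the sketch
    exps = [math.exp(w) for w in second_weights]
    probs = [e / sum(exps) for e in exps]
    return [rng.choices(op_names, weights=probs, k=1)[0]
            for _ in range(n_samples)]

# Five repetitions of the specified (sampling) step with hypothetical weights.
samples = sample_second_targets([2.0, 0.5, -1.0],
                                ["conv3x3", "conv5x5", "skip"], 5)
```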
5. The method of claim 4, wherein generating a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated specifically comprises:
for each architecture unit in the neural network architecture to be generated, taking the output of the first target unit selected for that architecture unit as the input of that architecture unit, and connecting the first target unit selected for that architecture unit to that architecture unit;
and for each execution of the specified step, implementing each architecture unit in the neural network architecture to be generated with the second target unit selected for it in that execution, to obtain the neural network architecture corresponding to that execution of the specified step.
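One way to picture claim 5's assembly step is as an edge list plus an operation assignment per unit; this representation, and all names in it, are assumptions made for illustration:

```python
def build_architecture(first_targets, second_targets):
    """Sketch of claim 5: each unit takes the outputs of its selected first
    target units as inputs (edges) and is implemented by its selected second
    target unit (ops). Edge-list form is an assumed representation."""
    edges = []   # (predecessor, successor) connections
    ops = {}     # unit index -> operation implementing that unit
    for unit, (preds, op) in enumerate(zip(first_targets, second_targets)):
        ops[unit] = op
        for p in preds:
            edges.append((p, unit))
    return edges, ops

# Hypothetical three-unit architecture: unit 2 reads from units 0 and 1.
edges, ops = build_architecture([[], [0], [0, 1]],
                                ["input", "conv3x3", "skip"])
```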
6. The method of claim 5, wherein predicting the quality of the generated neural network architecture according to training samples and determining the loss of the search model according to the predicted quality specifically comprises:
for each execution of the specified step, predicting, according to training samples, the quality of the neural network architecture corresponding to that execution, and determining, according to the predicted quality, the loss of the search model for that execution;
and determining the average of the losses of the search model over all executions of the specified step as the loss of the search model.
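The averaging in claim 6 is a plain arithmetic mean over the per-execution losses; a one-function sketch (illustrative loss values):

```python
def search_loss(per_step_losses):
    """Claim 6: the search model's loss is the mean of the losses obtained
    for each execution of the specified (sampling) step."""
    return sum(per_step_losses) / len(per_step_losses)

# Hypothetical losses from three executions of the specified step.
avg = search_loss([0.4, 0.6, 0.5])
```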
7. The method of claim 5, further comprising:
for each execution of the specified step, determining, according to training samples, the loss of the neural network corresponding to that execution;
and for each second target unit, adjusting model parameters in that second target unit according to the average of the losses of the neural networks corresponding to the executions of the specified step in which that second target unit was selected.
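Claim 7 averages losses per selected unit; the grouping step can be sketched as below (the actual parameter update, e.g. a gradient step, is omitted, and the names are assumptions):

```python
def per_unit_mean_loss(selected_ops, step_losses):
    """Sketch of claim 7's grouping: for each second target unit, average
    the losses of the executions of the specified step in which that unit
    was selected. The update using this average is not shown."""
    sums, counts = {}, {}
    for op, loss in zip(selected_ops, step_losses):
        sums[op] = sums.get(op, 0.0) + loss
        counts[op] = counts.get(op, 0) + 1
    return {op: sums[op] / counts[op] for op in sums}

# Hypothetical: "conv3x3" was sampled in executions 1 and 3, "skip" in 2.
means = per_unit_mean_loss(["conv3x3", "skip", "conv3x3"], [0.2, 0.9, 0.4])
```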
8. A method of image classification, comprising:
for a current architecture unit in a neural network architecture to be generated, selecting, from the other architecture units, a first target unit to be connected to the current architecture unit, according to a first weight contained in a search model and corresponding to the connection relationship between each of the other architecture units in the neural network architecture to be generated and the current architecture unit; and selecting, from candidate units contained in a preset search space, a second target unit for implementing the current architecture unit, according to a second weight contained in the search model and corresponding to each candidate unit;
generating a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated;
predicting the quality of the generated neural network architecture according to sample images, and determining a loss of the search model according to the predicted quality;
training the search model with minimizing the loss as an optimization objective, adjusting at least the first weight and the second weight in the search model until a preset training condition is met, and generating a neural network architecture as an image classification model according to the first weight and the second weight contained in the trained search model;
and inputting an image to be classified into the image classification model to obtain a classification result output by the image classification model for the image to be classified.
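Claim 8's final inference step reduces to running the searched model and taking the top-scoring class; in this sketch the model is stubbed out, and the image shape, score values, and class names are all assumptions:

```python
def classify(image, model_fn, class_names):
    """Sketch of claim 8's last step: feed the image to the trained image
    classification model and return the highest-scoring class."""
    scores = model_fn(image)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best]

# Hypothetical 8x8 image and a stub model returning fixed class scores.
label = classify([[0.0] * 8] * 8, lambda x: [0.1, 0.8, 0.1],
                 ["cat", "dog", "bird"])
```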
9. An apparatus for neural network architecture search, comprising:
a first selection module, configured to, for a current architecture unit in a neural network architecture to be generated, select, from the other architecture units, a first target unit to be connected to the current architecture unit, according to a first weight contained in a search model and corresponding to the connection relationship between each of the other architecture units in the neural network architecture to be generated and the current architecture unit;
a second selection module, configured to select, from candidate units contained in a preset search space, a second target unit for implementing the current architecture unit, according to a second weight contained in the search model and corresponding to each candidate unit;
a training module, configured to generate a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated; predict the quality of the generated neural network architecture according to training samples, and determine a loss of the search model according to the predicted quality; and train the search model with minimizing the loss as an optimization objective, to adjust at least the first weight and the second weight in the search model;
and a generating module, configured to generate a neural network architecture according to the first weight and the second weight contained in the trained search model when a preset training condition is met.
10. An apparatus for image classification, comprising:
a first selection module, configured to, for a current architecture unit in a neural network architecture to be generated, select, from the other architecture units, a first target unit to be connected to the current architecture unit, according to a first weight contained in a search model and corresponding to the connection relationship between each of the other architecture units in the neural network architecture to be generated and the current architecture unit;
a second selection module, configured to select, from candidate units contained in a preset search space, a second target unit for implementing the current architecture unit, according to a second weight contained in the search model and corresponding to each candidate unit;
a training module, configured to generate a neural network architecture according to the first target unit and the second target unit selected for each architecture unit in the neural network architecture to be generated; predict the quality of the generated neural network architecture according to sample images, and determine a loss of the search model according to the predicted quality; and train the search model with minimizing the loss as an optimization objective, to adjust at least the first weight and the second weight in the search model;
a generating module, configured to generate, when a preset training condition is met, a neural network architecture as an image classification model according to the first weight and the second weight contained in the trained search model;
and a classification module, configured to input an image to be classified into the image classification model to obtain a classification result output by the image classification model for the image to be classified.
11. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7 or claim 8.
12. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 7 or claim 8.
CN202110641506.0A 2021-06-09 2021-06-09 Neural network architecture searching method and image classification method and device Pending CN113344186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110641506.0A CN113344186A (en) 2021-06-09 2021-06-09 Neural network architecture searching method and image classification method and device


Publications (1)

Publication Number Publication Date
CN113344186A (en) 2021-09-03

Family

ID=77475709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110641506.0A Pending CN113344186A (en) 2021-06-09 2021-06-09 Neural network architecture searching method and image classification method and device

Country Status (1)

Country Link
CN (1) CN113344186A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210903