CN114648091A - Method for establishing neural network and readable storage medium - Google Patents

Method for establishing neural network and readable storage medium

Info

Publication number
CN114648091A
Authority
CN
China
Prior art keywords
control gate
loss
layer
output
gate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011510457.9A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Cambricon Information Technology Co Ltd filed Critical Anhui Cambricon Information Technology Co Ltd
Priority to CN202011510457.9A priority Critical patent/CN114648091A/en
Publication of CN114648091A publication Critical patent/CN114648091A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention relates to a method for establishing a neural network and a readable storage medium. First, a supernet is set, wherein the supernet comprises a plurality of stages, each stage comprises a plurality of layers, and each layer comprises a plurality of units; an input control gate is set for each unit to reflect the probability of selecting that unit; an output control gate is set for each layer to determine the number of channels of that layer; image data is input into the supernet for forward propagation so as to calculate a loss function; the input control gates and output control gates are updated according to the loss function; and the structure of the neural network is determined according to the updated input control gates and output control gates. The model size is compressed as much as possible while ensuring no loss of precision.

Description

Method for establishing neural network and readable storage medium
Technical Field
The present invention relates generally to the field of neural networks. More particularly, the present invention relates to a method of establishing a neural network and a readable storage medium.
Background
With the development of artificial intelligence, neural networks are widely used for image classification and detection. Traditionally, the backbone network for classification and detection tasks is designed by developers according to their own experience, and a network with acceptable performance is selected only after numerous trial-and-error experiments. Moreover, most networks are designed for the classification task; the network and its parameters are then migrated to the detection task and retrained to recover accuracy. The resulting network therefore requires a large amount of trial and error and is not universal.
When designing a network, developers often use task accuracy as the main design index and give less consideration to performance metrics such as network latency and frames per second (FPS).
Therefore, a neural network establishment scheme that is automatic and can comprehensively evaluate multiple performance indexes is urgently needed.
Disclosure of Invention
To at least partially solve the technical problems mentioned in the background, an aspect of the present invention provides a method and a readable storage medium related to establishing a neural network.
In one aspect, the present invention discloses a method for establishing a neural network, comprising: setting a supernet, wherein the supernet comprises a plurality of stages, each stage comprises a plurality of layers, and each layer comprises a plurality of units; setting an input control gate for each unit to reflect the probability of selecting that unit; setting an output control gate for each layer to determine the number of channels of that layer; inputting image data into the supernet for forward propagation so as to calculate a loss function; updating the input control gates and output control gates according to the loss function; and determining the structure of the neural network according to the updated input control gates and output control gates.
In another aspect, the present invention discloses a computer readable storage medium having stored thereon computer program code for establishing a neural network, which when executed by a processing device, performs the aforementioned method.
The invention utilizes automated machine learning and optimizes for both the number of channels and the latency; the arrangement of the search space makes the searched network structure beneficial to classification and detection tasks, and the model size is compressed as much as possible without loss of precision.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the accompanying drawings, several embodiments of the present invention are illustrated by way of example and not by way of limitation, and like reference numerals designate like or corresponding parts throughout the several views, in which:
FIG. 1 is a schematic diagram showing a plurality of units preset for specific hardware in an embodiment of the present invention;
FIG. 2 is a flow chart illustrating the establishment of a neural network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a supernet of an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a neural network model of an embodiment of the present invention;
FIG. 5 is a flow chart illustrating the establishment of a neural network according to another embodiment of the present invention; and
FIG. 6 is a flow chart illustrating the establishment of a neural network according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the terms "first", "second", "third" and "fourth", etc. in the claims, the description and the drawings of the present invention are used for distinguishing different objects and are not used for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification and claims of this application, the singular form of "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this application refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection".
The following detailed description of embodiments of the invention refers to the accompanying drawings.
Building a neural network model in the field of machine learning is a very tedious process: a network model with 10 layers already yields 10^10 possible network combinations, not to mention network models with hundreds of layers. These models have traditionally been tuned manually by developers, consuming a significant amount of time and resources. In order to automate the process of designing machine learning models, systems for automated machine learning (AutoML) have emerged.
For AutoML, the first task is to construct the neural network architecture, because once there is a problem in the architecture, the model cannot be significantly improved by parameter tuning alone. The neural architecture search of AutoML can be divided into three stages: a search space stage, a search strategy stage and an evaluation strategy stage.
The search space stage, based on certain premises, assumptions and existing experience, presets some units; these units are like building blocks that are stacked to form the search space. Given the search space, AutoML follows an algorithm designed by the developer and, through continuous trials guided by certain evaluation indexes, generates a relatively good network structure. That is, the neural network model is filtered out of this search space.
AutoML regards the neural network structure as a combination of many small, repeated units; when constructing the whole neural network, only these units need to be searched, arranged and combined, and the whole network structure does not need to be searched directly.
The purpose of the search strategy stage is to determine which algorithm can quickly and accurately find the optimal network structure and parameter configuration. Common search methods include random search, Bayesian optimization, evolutionary algorithms, reinforcement learning, gradient-based algorithms, and the like.
The evaluation strategy stage is used for evaluating the performance of the generated network structure, and is similar to a surrogate model in engineering optimization: the effect of a deep learning model depends heavily on the scale of the training data, and model training on large-scale data is time-consuming, so the designed architecture needs to be evaluated approximately. Several evaluation strategies are commonly used.
The simplest and most intuitive strategy is to train and validate on a data set to evaluate the performance of the network structure. Low-fidelity strategies reduce training overhead by relaxing one or a few evaluation conditions, such as shorter training time, fewer training samples, lower image resolution, or fewer filters per layer; although this may introduce deviations, such deviations are generally acceptable. The core idea of the learning-curve extrapolation strategy is to build multiple learning curves early and terminate those that do not perform well, to speed up the architecture search. The weight transfer strategy directly applies the parameters of a trained model to the current network structure to shorten the training time, provided that a similar trained model is available for migration.
This disclosure uses AutoML to build the neural network model: units are defined in the search space stage; in the search strategy stage, coefficients are set for the units and for the number of channels of each layer, and the parameter update scheme is planned; in the evaluation strategy stage, the network structure is trained and verified with a data set to evaluate its performance, and an ideal neural network model is finally obtained.
An embodiment of the present invention is a method for building a neural network using AutoML. The neural network comprises a backbone network and a sorter: the backbone network is used for extracting features, and the sorter is connected after the backbone network and comprises a classifier and/or a detector, so that the neural network model can perform classification and/or detection tasks.
In this embodiment, some units are preset in the search space stage. These units can be designed according to the artificial intelligence chip that will run the neural network model, so that they are more friendly to the specific artificial intelligence chip and run faster on it. FIG. 1 shows a plurality of units preset for specific hardware, including a first unit 101, a second unit 102, a third unit 103, a fourth unit 104 and a fifth unit 105.
The first unit 101 includes 4 network operators, namely a 1×1 convolution-batch normalization-ReLU activation (1×1 Conv-BN-ReLU) operator, a 3×3 convolution-batch normalization-ReLU activation (3×3 Conv-BN-ReLU) operator, a 1×1 convolution-batch normalization (1×1 Conv-BN) operator, and a ReLU activation operator, arranged in the specific order shown in FIG. 1.
The second unit 102 includes 4 network operators, namely a 1×1 convolution-batch normalization-ReLU activation (1×1 Conv-BN-ReLU) operator, a 3×3 dilated convolution-batch normalization-ReLU activation (3×3 dilation=2 Conv-BN-ReLU) operator with a dilation rate of 2, a 1×1 convolution-batch normalization (1×1 Conv-BN) operator, and a ReLU activation operator, arranged in the specific order shown in FIG. 1.
The third unit 103 includes 4 network operators, namely a 1×1 convolution-batch normalization-ReLU activation (1×1 Conv-BN-ReLU) operator, a 3×3 dilated convolution-batch normalization-ReLU activation (3×3 dilation=4 Conv-BN-ReLU) operator with a dilation rate of 4, a 1×1 convolution-batch normalization (1×1 Conv-BN) operator, and a ReLU activation operator, arranged in the specific order shown in FIG. 1.
The fourth unit 104 includes 3 network operators, namely a 3×3 convolution-batch normalization-ReLU activation (3×3 Conv-BN-ReLU) operator, a 3×3 convolution-batch normalization (3×3 Conv-BN) operator, and a ReLU activation operator, arranged in the specific order shown in FIG. 1.
The fifth unit 105 is a null operator: it contains no operator, performs no calculation, and its output equals its input. The purpose of the fifth unit 105 is to make the number of layers of the neural network model variable; when a layer selects the fifth unit 105, no operator is computed for that layer, which is equivalent to the layer not existing.
These 5 units are the "bricks" of the search space stage from which the search space is built; in other words, the search space is combined only from these 5 units, and no other undefined units appear in it. The method for establishing the neural network in this embodiment is shown in FIG. 2.
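For illustration only, the five candidate units described above could be sketched in PyTorch-style Python as follows; the helper names (`conv_bn_relu`, `make_candidate_units`), the equal input/output channel counts and the module layout are assumptions for illustration, not the reference implementation of this disclosure.

```python
import torch.nn as nn

def conv_bn_relu(c_in, c_out, k, dilation=1):
    # k x k convolution, batch normalization, then ReLU activation
    pad = dilation * (k // 2)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=pad, dilation=dilation),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

def make_candidate_units(channels):
    """The five selectable units of one supernet layer (channel sizes assumed equal)."""
    def bottleneck(dilation):
        return nn.Sequential(
            conv_bn_relu(channels, channels, 1),                         # 1x1 Conv-BN-ReLU
            conv_bn_relu(channels, channels, 3, dilation=dilation),      # 3x3 (dilated) Conv-BN-ReLU
            nn.Conv2d(channels, channels, 1), nn.BatchNorm2d(channels),  # 1x1 Conv-BN
            nn.ReLU(inplace=True),                                       # final ReLU
        )
    return nn.ModuleList([
        bottleneck(1),                                   # first unit 101
        bottleneck(2),                                   # second unit 102, dilation = 2
        bottleneck(4),                                   # third unit 103, dilation = 4
        nn.Sequential(                                   # fourth unit 104
            conv_bn_relu(channels, channels, 3),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        ),
        nn.Identity(),                                   # fifth unit 105: null operator
    ])
```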
In step 201, the delay of each unit is measured.
In step 202, all the delays are made into a lookup table, i.e. the delays of the first unit 101 to the fifth unit 105 are compiled into a lookup table.
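By way of illustration, such a lookup table could be built by timing every candidate unit for every selectable channel number on the target hardware; the timing loop, the `(unit, channel coefficient) -> milliseconds` dictionary layout and the reuse of the `make_candidate_units` sketch above are assumptions.

```python
import time
import torch

def measure_latency_ms(unit, channels, size=56, repeats=50):
    """Average forward latency of one unit on a dummy input (assumed measurement setup)."""
    x = torch.randn(1, channels, size, size)
    unit.eval()
    with torch.no_grad():
        for _ in range(10):                     # warm-up runs
            unit(x)
        start = time.perf_counter()
        for _ in range(repeats):
            unit(x)
    return (time.perf_counter() - start) / repeats * 1000.0

channel_choices = [64, 128, 192, 256]           # the selectable channel numbers
lookup_table = {}                               # (h, c) -> latency of unit h at channel choice c
for c, channels in enumerate(channel_choices):
    units = make_candidate_units(channels)      # sketch from above
    for h, unit in enumerate(units):
        lookup_table[(h, c)] = measure_latency_ms(unit, channels)
```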
In step 203, a supernet is set. The supernet is a large network formed by combining many small networks. FIG. 3 shows a schematic diagram of the supernet of this embodiment: the supernet comprises any number of stages, of which i stages (stage 0 to stage i-1) are shown; each stage comprises a plurality of layers, of which j layers (layer 0 to layer j-1) are shown; and each layer comprises the first unit 101 to the fifth unit 105 in parallel, where i and j are positive integers. For example, if i is 4 and j is 10, the supernet has 4 stages of 10 layers each, i.e. 4×10 layers in total, and each layer has 5 selectable units (the first unit 101 to the fifth unit 105). The supernet is the backbone network, i.e. the so-called search space.
When the search space phase is completed, the search strategy phase is entered.
In step 204, an input control gate is set for each unit to reflect the probability of selecting that unit. Each unit is provided with an input control gate $g_{i,j}^{h}$, which denotes the h-th unit of the j-th layer of the i-th stage of the supernet and reflects the probability that the corresponding unit is selected as the operator of that layer. For example, $g_{1,5}^{1}$ represents the probability that the first unit 101 of stage 1, layer 5 is the operator of stage 1, layer 5; $g_{1,5}^{2}$ represents the probability that the second unit 102 of stage 1, layer 5 is the operator of that layer; $g_{1,5}^{3}$ represents the probability that the third unit 103 of stage 1, layer 5 is the operator of that layer; $g_{1,5}^{4}$ represents the probability that the fourth unit 104 of stage 1, layer 5 is the operator of that layer; and $g_{1,5}^{5}$ represents the probability that the fifth unit 105 of stage 1, layer 5 is the operator of that layer. The sum of the values of the input control gates of each layer is 1, i.e.

$$\sum_{h} g_{i,j}^{h} = 1$$

In this step, the initial values of the input control gates may be set randomly or distributed evenly; for example, the initial values of all 5 input control gates are 0.2. However, the input control gate of the fifth unit 105 (the null operator) of the start layer (layer 0) of each stage is fixed at 0 and is not updated during training, in order to ensure that each stage retains at least one layer in the evaluation strategy stage.
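A minimal sketch of this initialization, assuming the input control gates are stored as a (stage, layer, unit) probability tensor; the tensor layout, the sizes and the renormalization of layer 0 are assumptions for illustration:

```python
import torch

num_stages, num_layers, num_units = 4, 10, 5

# Even initialization: every unit of a layer starts with probability 1/5 = 0.2.
input_gates = torch.full((num_stages, num_layers, num_units), 1.0 / num_units)

# The null operator (here unit index 4) of layer 0 of each stage is fixed at 0
# and excluded from training, so every stage keeps at least one real layer.
input_gates[:, 0, 4] = 0.0
# Renormalize layer 0 so its gates still sum to 1.
input_gates[:, 0] = input_gates[:, 0] / input_gates[:, 0].sum(dim=-1, keepdim=True)
```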
In step 205, an output control gate is set for each layer to determine the number of channels of that layer. The number of channels here refers to the number of output channels of each layer. The output control gate comprises a plurality of channel coefficients $o_{i,j}^{c}$, where $o_{i,j}^{c}$ denotes the c-th channel coefficient of the j-th layer of the i-th stage, and each channel coefficient corresponds to one channel number. The output control gate reflects the probability that the layer selects a particular number of channels.
In this embodiment, the number of channels of each layer is at most 256 and the output control gate has 4 channel coefficients: for example, coefficient 0 corresponds to 64 channels, coefficient 1 to 128 channels, coefficient 2 to 192 channels, and coefficient 3 to 256 channels, so each layer of the supernet has 4 selectable channel numbers. The sum of the values of the output control gates of each layer is 1, i.e.

$$\sum_{c} o_{i,j}^{c} = 1$$

The output control gate corresponding to the fifth unit 105 (the null operator) of each layer is set to 1 and is not updated during training; in this embodiment a value of 1 indicates that the null operator has no channel to select.
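A companion sketch for the output control gates, assuming four channel coefficients mapped to 64/128/192/256 channels as described above; the tensor layout is again an assumption:

```python
import torch

num_stages, num_layers = 4, 10
channel_choices = [64, 128, 192, 256]        # coefficient 0..3 -> channel number
num_coeffs = len(channel_choices)

# Each layer holds a probability distribution over the four candidate channel numbers.
output_gates = torch.full((num_stages, num_layers, num_coeffs), 1.0 / num_coeffs)

# Expected number of output channels of every layer under the current gates.
expected_channels = (output_gates * torch.tensor(channel_choices, dtype=torch.float32)).sum(dim=-1)
```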
In step 206, a sorter is added after the supernet. The sorter comprises a classifier and a detector. The classifier is used for image classification: given a set of images each labeled with a single class, the neural network predicts the class of a set of new test images. The detector is used for image detection: objects in an image are identified, and bounding boxes and labels are output.
The supernet is only a backbone network used for extracting image features; the specific function of the neural network model is determined by the sorter. If a classifier is added after the supernet, the neural network model has the function of image classification (such as the neural network model 401 of FIG. 4); if a detector is added after the supernet, the neural network model has the function of image detection (such as the neural network model 402 of FIG. 4); if both a classifier and a detector are added after the supernet, the neural network model has the functions of image classification and detection (such as the neural network model 403 of FIG. 4).
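As an illustration of how a sorter could be attached after the backbone, here is a hedged sketch; the pooling-plus-linear classifier head, the 256-channel assumption and the placeholder for the detector head are illustrative only.

```python
import torch.nn as nn

class SearchedModel(nn.Module):
    """Backbone (the supernet or the searched network) plus optional classifier/detector heads."""
    def __init__(self, backbone, num_classes, with_classifier=True, with_detector=False):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Sequential(          # simple classification head (assumed)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, num_classes)
        ) if with_classifier else None
        # A real detection head (anchors, box regression, ...) is omitted; the identity
        # module only marks where such a head would be attached.
        self.detector = nn.Identity() if with_detector else None

    def forward(self, x):
        features = self.backbone(x)
        cls_out = self.classifier(features) if self.classifier is not None else None
        det_out = self.detector(features) if self.detector is not None else None
        return cls_out, det_out
```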
In step 207, the output feature map of each layer in the supernet is calculated. In this embodiment, the output feature map is:

$$O_{i,j} = \sum_{h} g_{i,j}^{h} \sum_{c} o_{i,j}^{c} \, F_{i,j}^{h}\left(I_{i,j}; c\right)$$

where $I_{i,j}$ is the input feature map of the j-th layer of the i-th stage, $g_{i,j}^{h}$ is the input control gate of the h-th unit of the j-th layer of the i-th stage, $F_{i,j}^{h}$ is the output function of the h-th unit of the j-th layer of the i-th stage (evaluated with the channel number corresponding to the c-th channel coefficient), and $o_{i,j}^{c}$ is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient.
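Purely as a sketch of how such a soft layer output could be computed, assuming all candidate units produce feature maps with the maximum channel number and the channel coefficients act as a cumulative soft mask over the channels (this masking scheme is an assumption, not the patent's exact formula):

```python
import torch

def layer_output(x, units, g, o, channel_choices, max_channels=256):
    """Soft output of one supernet layer.

    x:      input feature map, shape (N, C, H, W), with C == max_channels (assumed)
    units:  the five candidate modules of this layer
    g:      input control gates of this layer, shape (num_units,)
    o:      output control gates (channel coefficients), shape (num_coeffs,)
    """
    # Weighted sum over the candidate units (input control gates).
    mixed = sum(g[h] * unit(x) for h, unit in enumerate(units))
    # Soft channel mask from the output control gates: channel m is kept with the
    # total probability of choosing a channel number larger than m.
    idx = torch.arange(max_channels)
    mask = sum(o[c] * (idx < k).float() for c, k in enumerate(channel_choices))
    return mixed * mask.view(1, -1, 1, 1)
```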
The strategy evaluation stage can be divided into evaluation during the search and evaluation after the search. Evaluation during the search runs model inference on a training set synchronously with the search strategy stage to obtain a loss function or model accuracy (for reinforcement-learning-based search). Evaluation after the search selects a network structure from the input/output control gates, retrains the model on a training set, and finally obtains the model accuracy on a validation set. This embodiment employs evaluation during the search, so the search strategy stage and the evaluation strategy stage proceed at the same time.
As mentioned above, the performance of the supernet can be evaluated in a variety of ways during the strategy evaluation stage; this embodiment chooses to train the supernet directly with the data set to find the best input control gates, output control gates and other network parameters.
Training a neural network means adjusting the parameters of each layer with training samples so that the result calculated by the neural network is as close as possible to the true result. Neural network training comprises forward propagation and backward propagation: forward propagation passes the input training sample through each layer of the neural network based on the current model, gradually extracting the input feature map into abstract features; backward propagation computes a loss function from the forward propagation result and the true value, and uses gradient descent with the chain rule to calculate the partial derivative of the loss function with respect to each parameter and update the parameters. Training then continues with the updated parameters, and after many repetitions the forward propagation result eventually meets expectations.
In practice, training a neural network model may span multiple epochs. One epoch is a training pass over all training samples; the set of these training samples is the training set, and processing one batch (batch size) of training samples is one iteration. For example, if the training set has 1000 training samples and the batch size is set to 10, then each iteration uses 10 training samples and one epoch comprises 100 iterations.
In step 208, the input image data are propagated forward through the supernet to compute the loss function. The training set contains a large number of image training samples, which are fed into the supernet from its input; each layer of the supernet produces an output feature map as in step 207, the output feature map of the last layer of the supernet is input into the sorter, and the loss function is the deviation between the actual output value and the estimated output value of the sorter. In other words, the loss function is obtained based on the output feature map of the supernet and the output feature map of the sorter.
In this embodiment, the loss function includes a control gate loss term, a time loss term, and a sorter loss term.
The gate loss term reflects the L1 regularization loss of the input control gates and output control gates. Regularization is an effective way in machine learning to avoid overfitting and to preserve generalization by explicitly controlling model complexity. L1 regularization constrains the magnitude of the parameters and also makes them sparser: after optimization, part of the parameters are 0 and the rest are non-zero real values, and the non-zero parameters select the important parameters or feature dimensions while removing noise. Because the input control gates correspond to the probability of each unit becoming the operator of its layer, the gate loss term helps select the operator of each layer; and because the output control gates correspond to the number of channels of each layer, the gate loss term also helps select the number of channels of each layer.
The time loss term uses the lookup table obtained in step 202 to compute the delay of each unit, so it allows the neural network model to take unit delay into account when selecting units.
The sorter loss term includes the loss of the classifier and the loss of the detector, i.e. the deviation caused by the sorter. As mentioned above, the sorter may include only the classifier, only the detector, or both. If it includes only the classifier or only the detector, the sorter loss term reflects the deviation of the classifier or of the detector, respectively; if it includes both, the sorter loss term reflects the deviation of the classifier and the deviation of the detector.
Specifically, the loss function of this embodiment is:

$$\lambda_0 \times L_{cls} + \lambda_1 \times L_{det} + \lambda_2 \times L_{gate} + \lambda_3 \times L_{lat}$$

where $\lambda_0$ is the hyperparameter of the classifier loss, $L_{cls}$ is the classifier loss, $\lambda_1$ is the hyperparameter of the detector loss, $L_{det}$ is the detector loss, $\lambda_2$ is the hyperparameter of the control gate loss, $L_{gate}$ is the control gate loss, $\lambda_3$ is the hyperparameter of the time loss, and $L_{lat}$ is the time loss. Here $\lambda_0$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are greater than or equal to 0, are set empirically by the developer, and are not updated during training. If the sorter includes only a classifier, $\lambda_1$ is 0; if the sorter includes only a detector, $\lambda_0$ is 0. The time loss $L_{lat}$ penalizes the deviation of the expected delay T from a preset target delay $T_0$, which is set empirically by the developer, where T is:

$$T = \sum_{i,j} \sum_{h} \sum_{c} g_{i,j}^{h} \, o_{i,j}^{c} \, \mathrm{lat}_{h,c}$$

where $g_{i,j}^{h}$ is the input control gate of the h-th unit of the j-th layer of the i-th stage, $\mathrm{lat}_{h,c}$ is the delay of the h-th unit with the channel number corresponding to the c-th channel coefficient, which may be obtained from the lookup table established in step 202, and $o_{i,j}^{c}$ is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient.
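A hedged sketch of how this composite loss could be assembled; the penalty form `relu(T - T0)` for the time loss and the example hyperparameter values are assumptions consistent with, but not prescribed by, the description above:

```python
import torch

def expected_latency(input_gates, output_gates, lookup_table):
    """Expected delay T: gate-weighted sum of the per-unit, per-channel latencies."""
    num_stages, num_layers, num_units = input_gates.shape
    num_coeffs = output_gates.shape[-1]
    T = input_gates.new_zeros(())
    for i in range(num_stages):
        for j in range(num_layers):
            for h in range(num_units):
                for c in range(num_coeffs):
                    T = T + input_gates[i, j, h] * output_gates[i, j, c] * lookup_table[(h, c)]
    return T

def total_loss(L_cls, L_det, input_gates, output_gates, lookup_table,
               T0=20.0, lambdas=(1.0, 1.0, 0.01, 0.1)):
    """lambda0*Lcls + lambda1*Ldet + lambda2*Lgate + lambda3*Llat (example weights assumed)."""
    l0, l1, l2, l3 = lambdas
    L_gate = input_gates.abs().sum() + output_gates.abs().sum()   # L1 regularization of the gates
    T = expected_latency(input_gates, output_gates, lookup_table)
    L_lat = torch.relu(T - T0)        # assumed: penalize exceeding the target delay T0
    return l0 * L_cls + l1 * L_det + l2 * L_gate + l3 * L_lat
```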
In step 209, all parameters, including the input control gates and output control gates, are updated according to the loss function. Backward propagation uses gradient descent, computing the partial derivative of the loss function with respect to each network parameter via the chain rule. Optionally, the learning rate of the input control gates and output control gates differs from that of the other network parameters.
In step 210, the structure of the neural network is determined from the updated input control gates and output control gates. After training, the value of the input control gate of each unit has been updated; the unit corresponding to the maximum input control gate of each layer is selected and set as the operator of that layer. For example, suppose that among the input control gates of the j-th layer of the i-th stage the maximum value is $g_{i,j}^{3}$, which corresponds to the third unit 103; the operator of the j-th layer of the i-th stage is then set to the third unit 103, and the other units of that layer are discarded. If the input control gate of the fifth unit 105 of a layer is the maximum, it indicates that the layer performs no calculation and the layer can be deleted directly. If more than one input control gate takes the maximum value, one of them is selected at random. In this step, the operator of each layer is selected by the input control gates.
In this step, the number of channels of each layer is also set to the channel number whose output control gate in that layer is greater than a channel threshold, where the channel threshold is between 0 and 1, for example 0.25. Assuming that the 4 channel coefficients of a layer are 0.2, 0.4, 0.2 and 0.2 respectively, only the second channel coefficient exceeds the channel threshold, so the number of output channels of that layer is set to 128. If several channel coefficients are greater than the channel threshold, the larger channel number is selected. In this step, the number of channels of each layer is selected by the output control gates.
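A sketch of reading the final structure off the trained gates; the use of unit index 4 for the null operator and the 0.25 threshold follow the text, while everything else is illustrative:

```python
def derive_structure(input_gates, output_gates, channel_choices, channel_threshold=0.25):
    """Pick one unit and one channel number per layer from the trained control gates."""
    structure = []
    num_stages, num_layers, _ = input_gates.shape
    for i in range(num_stages):
        stage = []
        for j in range(num_layers):
            h = int(input_gates[i, j].argmax())
            if h == 4:                       # the null operator wins: delete this layer
                continue
            # Among the coefficients above the threshold, take the largest channel number.
            selected = [channel_choices[c] for c in range(len(channel_choices))
                        if float(output_gates[i, j, c]) > channel_threshold]
            channels = max(selected) if selected else channel_choices[-1]
            stage.append((h, channels))
        structure.append(stage)
    return structure
```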
At this point, the supernet has passed the strategy evaluation stage and yields the backbone network of the neural network model; with the post-positioned sorter added, a complete neural network model with classification/detection functions is established. The neural network model may be retrained to improve accuracy.
Another embodiment of the present invention is likewise a method of building a neural network using AutoML. The neural network comprises a backbone network and a sorter: the backbone network is used for extracting features, and the sorter is connected after the backbone network and comprises a classifier and/or a detector, so that the neural network model can perform classification and/or detection tasks.
This embodiment also presets the units shown in fig. 1 during the search space phase. The method of establishing the neural network of this embodiment is shown in fig. 5.
In step 501, the delay of each unit is measured.
In step 502, all the delays are made into a lookup table.
In step 503, the supernet shown in FIG. 3 is set. The supernet is the backbone network and is also the search space.
When the search space phase is completed, the search strategy phase is entered.
In step 504, an input control gate is set for each unit to reflect the probability of selecting that unit. Each unit is provided with an input control gate $g_{i,j}^{h}$, which denotes the h-th unit of the j-th layer of the i-th stage of the supernet and reflects the probability that the corresponding unit is selected as the operator of that layer. The sum of the values of the input control gates of each layer is 1. In this step, the initial values of the input control gates may be set randomly or distributed evenly, and the input control gate of the fifth unit 105 (the null operator) of the start layer (layer 0) of each stage is fixed at 0 and is not updated during training, in order to ensure that each stage retains at least one layer in the evaluation strategy stage.
In step 505, an output control gate is set for each layer to determine the number of channels of that layer. The number of channels here refers to the number of output channels of each layer; the output control gate comprises a plurality of channel coefficients, each channel coefficient corresponds to one channel number, and the sum of the values of the output control gates of each layer is 1.
The output control gate corresponding to the fifth unit 105 (the null operator) of each layer is set to 1 and is not updated during training; in this embodiment a value of 1 indicates that the null operator has no channel to select.
In step 506, a sorter is added after the supernet. The sorter of this embodiment is the same as that of the previous embodiment and is not described again.
In step 507, the output feature map of each layer in the supernet is calculated. In this embodiment, the output feature map is:

$$O_{i,j} = \sum_{h} g_{i,j}^{h} \sum_{c} o_{i,j}^{c} \, F_{i,j}^{h}\left(I_{i,j}; c\right)$$

where $I_{i,j}$ is the input feature map of the j-th layer of the i-th stage, $g_{i,j}^{h}$ is the input control gate of the h-th unit of the j-th layer of the i-th stage, $F_{i,j}^{h}$ is the output function of the h-th unit of the j-th layer of the i-th stage, and $o_{i,j}^{c}$ is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient.
This embodiment also employs evaluation during the search, so the search strategy stage and the evaluation strategy stage proceed at the same time.
In step 508, the input image data are propagated forward through the supernet to compute the loss function. The training set contains a large number of image training samples, which are fed into the supernet from its input; each layer of the supernet produces an output feature map as in step 507, the output feature map of the last layer of the supernet is input into the sorter, and the loss function is the deviation between the actual output value and the estimated output value of the sorter. In other words, the loss function is obtained based on the output feature map of the supernet and the output feature map of the sorter.
In this embodiment, the loss function also includes a control gate loss term, a time loss term and a sorter loss term; these are the same as in the loss function of the previous embodiment and are not described again.
In step 509, the network parameters other than the input control gates and output control gates are updated according to the loss function. Backward propagation uses gradient descent, computing the partial derivative of the loss function with respect to each network parameter via the chain rule, and the values of the network parameters other than the input control gates and output control gates are updated with a certain learning rate. The input control gates and output control gates are not updated in this step because the other parameters are trained first; once those parameters approach reasonable values, the input control gates and output control gates are then updated, so the whole network converges faster.
To inhibit updating the input control gates and output control gates, one specific way is to set a gate probability P1 for the input control gates and a gate probability P2 for the output control gates in the loss function; in this step P1 = P2 = 0, which masks the input control gates and output control gates.
In step 510, it is determined whether the training progress has reached a progress threshold. The progress threshold may be set to 10%, 20%, 25% or 30%; in this embodiment it is 25%. When the training progress has not reached the progress threshold, the process returns to step 509 and the network parameters other than the input control gates and output control gates are updated. When the training progress reaches the progress threshold, step 511 is performed.
In step 511, the input control gates and output control gates are updated according to the loss function. The initial values of the input control gates and output control gates are set randomly; backward propagation uses gradient descent, the partial derivatives of the loss function with respect to the input control gates and output control gates are computed via the chain rule, the values of the input control gates and output control gates are updated, and training continues with the updated gates until training ends. In this step P1 = P2 = 1, thereby introducing the input control gates and output control gates into the loss function.
Optionally, the learning rate of the input control gates and output control gates differs from that of the other network parameters. It should be noted that in this step the network parameters other than the input control gates and output control gates are still updated as well.
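The staged schedule of steps 509 to 511 could be sketched as follows; the optimizer choices, learning rates and the way the gates are frozen are assumptions, while the 25% progress threshold comes from the text:

```python
import torch

def train_supernet(model, weight_params, gate_params, data_loader, loss_fn,
                   epochs=100, progress_threshold=0.25):
    """Two-phase training: the control gates stay frozen for the first 25% of training."""
    opt_w = torch.optim.SGD(weight_params, lr=0.1, momentum=0.9)
    opt_g = torch.optim.Adam(gate_params, lr=0.01)     # assumed separate learning rate for gates
    total_steps = epochs * len(data_loader)
    step = 0
    for _ in range(epochs):
        for images, labels in data_loader:
            loss = loss_fn(model(images), labels)
            opt_w.zero_grad()
            opt_g.zero_grad()
            loss.backward()
            opt_w.step()                               # ordinary network parameters: always updated
            if step / total_steps >= progress_threshold:
                opt_g.step()                           # gates: updated only after the threshold
            step += 1
```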
In step 512, the structure of the neural network is determined from the updated input control gates and output control gates. After training, the value of the input control gate of each unit has been updated; the unit corresponding to the maximum input control gate of each layer is selected, set as the operator of that layer, and the other units of the same layer are discarded. If the input control gate of the fifth unit 105 of a layer is the maximum, it indicates that the layer has no operator and the layer can be deleted directly. If more than one input control gate takes the maximum value, one of them is selected at random. In this step, the operator of each layer is selected by the input control gates.
At the same time, the number of channels of each layer is set to the channel number whose output control gate in that layer is greater than the channel threshold, where the channel threshold is between 0 and 1; if several output control gates are greater than the channel threshold, the largest channel number is selected. In this step, the number of channels of each layer is selected by the output control gates.
At this point, the supernet has completed the strategy evaluation stage and yields the backbone network of the neural network model; with the post-positioned sorter added, a complete neural network model with classification/detection functions is established. The neural network model may be retrained to improve accuracy.
Another embodiment of the present invention is likewise a method of building a neural network using AutoML. The neural network comprises a backbone network and a sorter: the backbone network is used for extracting features, and the sorter is connected after the backbone network and comprises a classifier and a detector, so that the neural network model can perform classification and/or detection tasks.
This embodiment also presets the units shown in fig. 1 during the search space phase. The method for establishing the neural network in this embodiment is shown in fig. 6.
In step 601, the delay of each unit is measured.
In step 602, all the delays are made into a lookup table.
In step 603, the supernet shown in FIG. 3 is set. The supernet is the backbone network and is also the search space.
When the search space phase is completed, the search strategy phase is entered.
In step 604, an input control gate is set for each unit to reflect the probability of selecting that unit. Each unit is provided with an input control gate $g_{i,j}^{h}$, which denotes the h-th unit of the j-th layer of the i-th stage of the supernet and reflects the probability that the corresponding unit is selected as the operator of that layer. The sum of the values of the input control gates of each layer is 1. In this step, the initial values of the input control gates may be set randomly or distributed evenly, and the input control gate of the fifth unit 105 (the null operator) of the start layer (layer 0) of each stage is fixed at 0 and is not updated during training, in order to ensure that each stage retains at least one layer in the evaluation strategy stage.
In step 605, an output control gate is set for each layer to determine the number of channels of that layer. The number of channels here refers to the number of output channels of each layer; the output control gate comprises a plurality of channel coefficients, each channel coefficient corresponds to one channel number, and the sum of the values of the output control gates of each layer is 1.
The output control gate corresponding to the fifth unit 105 (the null operator) of each layer is set to 1 and is not updated during training; in this embodiment a value of 1 indicates that the null operator has no channel to select.
In step 606, a sorter is added after the supernet. The sorter of this embodiment is the same as that of the previous embodiment and is not described again.
In step 607, the output feature map of each layer in the supernet is calculated. In this embodiment, the output feature map is:

$$O_{i,j} = \sum_{h} g_{i,j}^{h} \sum_{c} o_{i,j}^{c} \, F_{i,j}^{h}\left(I_{i,j}; c\right)$$

where $I_{i,j}$ is the input feature map of the j-th layer of the i-th stage, $g_{i,j}^{h}$ is the input control gate of the h-th unit of the j-th layer of the i-th stage, $F_{i,j}^{h}$ is the output function of the h-th unit of the j-th layer of the i-th stage, and $o_{i,j}^{c}$ is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient.
This embodiment also employs evaluation during the search, so the search strategy stage and the evaluation strategy stage proceed at the same time.
In step 608, the input image data are propagated forward through the supernet to compute the loss function. The training set contains a large number of image training samples, which are fed into the supernet from its input; each layer of the supernet produces an output feature map as in step 607, the output feature map of the last layer of the supernet is input into the sorter, and the loss function is the deviation between the actual output value and the estimated output value of the sorter. In other words, the loss function is obtained based on the output feature map of the supernet and the output feature map of the sorter.
In this embodiment, the loss function includes a control gate loss term, a time loss term, and a sorter accuracy term.
The gate loss term reflects the L1 regularization loss of the input control gates and output control gates. The time loss term uses the lookup table obtained in step 602 to compute the delay of each unit, so it allows the neural network model to take unit delay into account when selecting units.
The sorter accuracy term includes the accuracy of the classifier and/or the accuracy of the detector. As mentioned above, the sorter may include only the classifier, only the detector, or both. If it includes only the classifier or only the detector, the sorter accuracy term reflects the accuracy of the classifier or of the detector, respectively; if it includes both, the sorter accuracy term reflects the accuracy of the classifier and the accuracy of the detector.
Specifically, the loss function of this embodiment is:

$$\lambda_0 \times Q_{cls} + \lambda_1 \times Q_{det} + \lambda_2 \times L_{gate} + \lambda_3 \times L_{lat}$$

where $\lambda_0$ is the hyperparameter of the classifier accuracy, $Q_{cls}$ is the classifier accuracy, $\lambda_1$ is the hyperparameter of the detector accuracy, $Q_{det}$ is the detector accuracy, $\lambda_2$ is the hyperparameter of the control gate loss, $L_{gate}$ is the control gate loss, $\lambda_3$ is the hyperparameter of the time loss, and $L_{lat}$ is the time loss. Here $\lambda_0$ and $\lambda_1$ are greater than or equal to 0, $\lambda_2$ and $\lambda_3$ are less than 0, and their values are set empirically by the developer and are not updated during training. If the sorter includes only a classifier, $\lambda_1$ is 0; if the sorter includes only a detector, $\lambda_0$ is 0. The time loss $L_{lat}$ penalizes the deviation of the expected delay T from a preset target delay $T_0$, which is set empirically by the developer, where T is:

$$T = \sum_{i,j} \sum_{h} \sum_{c} g_{i,j}^{h} \, o_{i,j}^{c} \, \mathrm{lat}_{h,c}$$

where $g_{i,j}^{h}$ is the input control gate of the h-th unit of the j-th layer of the i-th stage, $\mathrm{lat}_{h,c}$ is the delay of the h-th unit with the channel number corresponding to the c-th channel coefficient, which may be obtained from the lookup table established in step 602, and $o_{i,j}^{c}$ is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient.
In step 609, all parameters, including the input control gates and output control gates, are updated according to the loss function. In this embodiment, the parameters are updated using reinforcement learning: the supernet is regarded as a reinforcement learning agent model and trained iteratively, the neural architecture search space is explored through the actions of the agent model in that space, and the generated model is rewarded or punished according to its performance on a test set, so that the agent model can adjust the direction of architecture generation according to the rewards and punishments. In this embodiment, the network structure parameters and the input/output control gates are updated alternately: the network structure parameters are updated by the traditional gradient descent method, and the input/output control gates are updated by reinforcement learning.
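As a very rough illustration of such an alternating scheme, the gate update could be expressed as a REINFORCE-style step; sampling from gate logits, the reward definition and the alternation itself are assumptions rather than the prescribed algorithm of this disclosure.

```python
import torch

def sample_architecture(gate_logits):
    """Sample one concrete unit / channel choice per layer from the current gate distribution."""
    dist = torch.distributions.Categorical(logits=gate_logits)
    actions = dist.sample()
    return actions, dist.log_prob(actions).sum()

def reinforce_gate_step(gate_logits, gate_optimizer, evaluate_reward):
    """One reinforcement-learning update of the gates.

    evaluate_reward: callable mapping the sampled structure to a scalar reward,
    e.g. test-set accuracy minus a latency penalty (assumed).
    """
    actions, log_prob = sample_architecture(gate_logits)
    reward = evaluate_reward(actions)           # reward or punishment of the generated model
    loss = -reward * log_prob                   # push the gates toward high-reward choices
    gate_optimizer.zero_grad()
    loss.backward()
    gate_optimizer.step()
    return actions
```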
In step 610, the structure of the neural network is determined from the updated input control gates and output control gates. After training, the unit corresponding to the maximum input control gate of each layer is selected and set as the operator of that layer, and the other units of the same layer are discarded. If more than one input control gate takes the maximum value, one of them is selected at random. The number of channels of each layer is then set to the channel number whose output control gate in that layer is greater than the channel threshold, where the channel threshold is between 0 and 1; if several channel coefficients are greater than the channel threshold, the larger channel number is selected.
At this point, the supernet has completed the strategy evaluation stage and yields the backbone network of the neural network model; with the post-positioned sorter added, a complete neural network model with classification/detection functions is established. The neural network model may be retrained to improve accuracy.
After the neural network model has been established according to the foregoing embodiments, it can be used for image classification, where image data are input into the neural network for inference to perform predictive classification, and also for image detection, where image data are input into the neural network for inference to identify objects in the image and output their bounding boxes and labels.
Another embodiment of the present invention is a computer readable storage medium having stored thereon computer program code for establishing a neural network, which when executed by a processor, performs the method of the embodiments as described above. When the aspects of the present invention are embodied in a software product (e.g., a computer-readable storage medium), the software product may be stored in a memory, and may include instructions for causing a computer device (e.g., a personal computer, a server, a network device, etc.) to perform some or all of the steps of the method described in the embodiments of the present invention. The Memory may include, but is not limited to, a usb disk, a flash disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The invention utilizes automated machine learning and optimizes for both the number of channels and the latency; the arrangement of the search space makes the searched network structure beneficial to classification and detection tasks, and the model size is compressed as much as possible while ensuring no loss of precision.
It is noted that for the sake of simplicity, the present invention sets forth some methods and embodiments thereof as a series of acts or combinations thereof, but those skilled in the art will appreciate that the inventive arrangements are not limited by the order of acts described. Accordingly, persons skilled in the art may appreciate that certain steps may be performed in other sequences or simultaneously, in accordance with the disclosure or teachings of the invention. Further, those skilled in the art will appreciate that the described embodiments of the invention are capable of being practiced in other than the described embodiments, i.e., that the acts or elements described herein are not necessarily required to practice one or more aspects of the invention. In addition, the description of some embodiments of the present invention is also focused on different schemes. In view of this, those skilled in the art will understand that portions of the present invention that are not described in detail in one embodiment may also refer to related descriptions of other embodiments.
In particular implementations, based on the disclosure and teachings of the present invention, one of ordinary skill in the art will appreciate that the several embodiments disclosed herein can be practiced in other ways not disclosed herein. For example, as for the units in the foregoing embodiments of the electronic device or apparatus, the units are split based on the logic function, and there may be another splitting manner in the actual implementation. Also for example, multiple units or components may be combined or integrated with another system or some features or functions in a unit or component may be selectively disabled. The connections discussed above in connection with the figures may be direct or indirect couplings between the units or components in terms of connectivity between the different units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
The foregoing may be better understood in light of the following clauses:
clause a1, a method of establishing a neural network, comprising: setting a super-network, wherein the super-network comprises a plurality of stages, each stage comprises a plurality of layers, and each layer comprises a plurality of units; setting an input control gate for each cell to reflect the probability of selecting the cell; setting an output control gate for each layer to determine the number of channels of the layer; inputting image data to the ultra-network for forward propagation so as to calculate a loss function; updating the input control gate and the output control gate according to a loss function; and determining the structure of the neural network according to the updated input control gate and the updated output control gate.
Clause a2, the method of clause a1, wherein the plurality of units comprises: a plurality of network operators arranged in a specific order; and a null operator.
Clause A3, the method of clause a2, further comprising: setting the input control gate of the null operator corresponding to the start layer of each stage to 0.
Clause a4, the method of clause a2, further comprising: setting the output control gate corresponding to the null operator to 1.
Clause a5, the method of clause a1, further comprising: measuring the delay of each cell; and making all delays into a look-up table.
Clause A6, the method of clause A5, further comprising: adding a sorter after the supernet, wherein the sorter comprises at least one of a classifier and a detector; wherein the inputting step further calculates the loss function from an output feature map of the sorter.
Clause A7, the method of clause A6, wherein the loss function comprises: a sorter loss, comprising the loss of the classifier and the loss of the detector; a gate loss, corresponding to the L1 regularization loss of the input control gate and the output control gate; and a time loss, wherein the delay of each unit is calculated according to the lookup table.
Clause A8, the method of clause A7, wherein the loss function is:

$$\lambda_0 \times L_{cls} + \lambda_1 \times L_{det} + \lambda_2 \times L_{gate} + \lambda_3 \times L_{lat}$$

wherein $\lambda_0$ is the hyperparameter of the classifier loss, $L_{cls}$ is the classifier loss, $\lambda_1$ is the hyperparameter of the detector loss, $L_{det}$ is the detector loss, $\lambda_2$ is the hyperparameter of the control gate loss, $L_{gate}$ is the control gate loss, $\lambda_3$ is the hyperparameter of the time loss, and $L_{lat}$ is the time loss.
Clause A9, the method of clause A8, wherein $\lambda_0$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are greater than or equal to 0.
Clause A10, the method of clause A6, wherein the loss function comprises: a sorter accuracy, comprising the accuracy of the classifier and the accuracy of the detector; a gate loss, corresponding to the L1 regularization loss of the input control gate and the output control gate; and a time loss, wherein the delay of each unit is calculated according to the lookup table.
Clause A11, the method of clause A10, wherein the loss function is:

$$\lambda_0 \times Q_{cls} + \lambda_1 \times Q_{det} + \lambda_2 \times L_{gate} + \lambda_3 \times L_{lat}$$

wherein $\lambda_0$ is the hyperparameter of the classifier accuracy, $Q_{cls}$ is the classifier accuracy, $\lambda_1$ is the hyperparameter of the detector accuracy, $Q_{det}$ is the detector accuracy, $\lambda_2$ is the hyperparameter of the control gate loss, $L_{gate}$ is the control gate loss, $\lambda_3$ is the hyperparameter of the time loss, and $L_{lat}$ is the time loss.
Clause A12, the method of clause A11, wherein $\lambda_0$ and $\lambda_1$ are greater than or equal to 0, and $\lambda_2$ and $\lambda_3$ are less than 0.
Clause A13, the method of clause A7 or A10, wherein the time loss penalizes the deviation of the expected delay T from a preset target delay $T_0$, wherein T is:

$$T = \sum_{i,j} \sum_{h} \sum_{c} g_{i,j}^{h} \, o_{i,j}^{c} \, \mathrm{lat}_{h,c}$$

wherein $g_{i,j}^{h}$ is the input control gate of the h-th unit of the j-th layer of the i-th stage, $\mathrm{lat}_{h,c}$ is the delay in the lookup table corresponding to the c-th channel coefficient of the h-th unit, and $o_{i,j}^{c}$ is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient.
clause a14, the method of clause a1, wherein the sum of the values of the input control gates of each layer is 1.
Clause a15, the method of clause a1, wherein the output control gate comprises a plurality of channel coefficients, each channel coefficient corresponding to a channel number.
Clause A16, the method of clause A15, wherein the number of channel coefficients is 4.
Clause A17, the method of clause A15, further comprising: calculating an output feature map of each layer, wherein the output feature map is:

$$O_{i,j} = \sum_{h} g_{i,j}^{h} \sum_{c} o_{i,j}^{c} \, F_{i,j}^{h}\left(I_{i,j}; c\right)$$

wherein $I_{i,j}$ is the input feature map of the j-th layer of the i-th stage, $g_{i,j}^{h}$ is the input control gate of the h-th unit of the j-th layer of the i-th stage, $F_{i,j}^{h}$ is the output function of the h-th unit of the j-th layer of the i-th stage, and $o_{i,j}^{c}$ is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient;

wherein the loss function in the inputting step is derived based on the output feature map.
Clause a18, the method of clause a1, further comprising: judging whether the training progress reaches a progress threshold value; and if so, performing the updating step.
Clause A19, the method of clause A18, wherein the progress threshold is 25%.
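Clauses A18 and A19 defer the gate update until training has progressed far enough; a schematic training loop under that reading might look as follows, where the 25% threshold comes from clause A19 and the step count and helper functions are placeholders.

```python
def run_weight_update(step):
    pass  # placeholder: forward propagation and ordinary weight update

def run_gate_update(step):
    pass  # placeholder: update the input/output control gates from the loss

def train(total_steps=1000, progress_threshold=0.25):
    for step in range(total_steps):
        run_weight_update(step)                   # the network weights are always trained
        progress = (step + 1) / total_steps
        if progress >= progress_threshold:        # clause A19: gate updates start at 25%
            run_gate_update(step)

train()
```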
Clause A20, the method of clause A1, wherein the determining step comprises: setting the operator of each layer to the unit whose input control gate has the maximum value in that layer.
Clause A21, the method of clause A1, wherein the determining step comprises: setting the number of channels of each layer to the number of channels corresponding to the channel coefficients of the layer's output control gate that are greater than a channel threshold.
Clause A22, the method of clause A21, wherein the channel threshold is between 0 and 1.
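By way of a non-limiting sketch, clauses A20 to A22 can be read as the following selection rule; the data layout is hypothetical, and taking the largest channel count among the coefficients that clear the threshold is one possible interpretation of clause A21.

```python
def derive_layer(in_gates, out_gates, channel_counts, channel_threshold=0.5):
    """Pick the unit with the largest input control gate (clause A20) and keep
    the channels whose output-gate coefficients exceed the threshold
    (clauses A21 and A22).

    in_gates       : one gate value per candidate unit
    out_gates      : one gate value per channel coefficient
    channel_counts : channel number associated with each channel coefficient
    """
    operator_index = max(range(len(in_gates)), key=lambda h: in_gates[h])
    kept = [n for g, n in zip(out_gates, channel_counts) if g > channel_threshold]
    num_channels = max(kept) if kept else min(channel_counts)
    return operator_index, num_channels


# Example: unit 2 has the largest input gate; coefficients 0.9 and 0.7 clear the
# 0.5 threshold, so 32 channels (the larger of 16 and 32) are kept.
print(derive_layer([0.1, 0.2, 0.6, 0.1], [0.9, 0.7, 0.3, 0.1], [16, 32, 48, 64]))
```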
Clause A23, a computer readable storage medium having stored thereon computer program code for establishing a neural network, wherein the computer program code, when executed by a processing device, performs the method of any one of clauses A1 to A22.
The above embodiments of the present invention have been described in detail, and specific examples have been used herein to explain the principle and implementation of the present invention; the above description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (23)

1. A method of establishing a neural network, comprising:
setting a super-network, wherein the super-network comprises a plurality of stages, each stage comprises a plurality of layers, and each layer comprises a plurality of units;
setting an input control gate for each unit to reflect the probability of selecting the unit;
setting an output control gate for each layer to determine the number of channels of the layer;
inputting image data into the super-network for forward propagation so as to calculate a loss function;
updating the input control gate and the output control gate according to the loss function; and
determining the structure of the neural network according to the updated input control gate and the updated output control gate.
2. The method of claim 1, wherein the plurality of units comprises:
a plurality of network operators arranged in a specific order; and
a null operator.
3. The method of claim 2, further comprising:
setting the input control gate of the null operator corresponding to the start layer of each stage to 0.
4. The method of claim 2, further comprising:
setting the output control gate corresponding to the null operator to 1.
5. The method of claim 1, further comprising:
measuring the delay of each unit; and
compiling all the delays into a lookup table.
6. The method of claim 5, further comprising:
adding a sorter after the super-network, wherein the sorter comprises at least one of a classifier and a detector;
wherein the inputting step further calculates the loss function from an output feature map of the sorter.
7. The method of claim 6, wherein the loss function comprises:
a sorter loss, comprising the loss amount of the classifier and the loss amount of the detector;
a gate loss, corresponding to an L1 regularization loss amount of the input control gate and the output control gate; and
a time loss, calculated from the delay of each unit according to the lookup table.
8. The method of claim 7, wherein the loss function is:
λ_0 × L_cls + λ_1 × L_det + λ_2 × L_gate + λ_3 × L_lat
wherein λ_0 is a hyperparameter of the classifier loss, L_cls is the loss of the classifier, λ_1 is a hyperparameter of the detector loss, L_det is the loss of the detector, λ_2 is a hyperparameter of the gate loss, L_gate is the gate loss, λ_3 is a hyperparameter of the time loss, and L_lat is the time loss.
9. The method of claim 8, wherein λ_0, λ_1, λ_2 and λ_3 are each greater than or equal to 0.
10. The method of claim 6, wherein the loss function comprises:
a sorter accuracy, comprising the accuracy of the classifier and the accuracy of the detector;
a gate loss, corresponding to an L1 regularization loss amount of the input control gate and the output control gate; and
a time loss, calculated from the delay of each unit according to the lookup table.
11. The method of claim 10, wherein the loss function is:
λ_0 × Q_cls + λ_1 × Q_det + λ_2 × L_gate + λ_3 × L_lat
wherein λ_0 is a hyperparameter of the classifier accuracy, Q_cls is the accuracy of the classifier, λ_1 is a hyperparameter of the detector accuracy, Q_det is the accuracy of the detector, λ_2 is a hyperparameter of the gate loss, L_gate is the gate loss, λ_3 is a hyperparameter of the time loss, and L_lat is the time loss.
12. The method of claim 11, wherein λ_0 and λ_1 are greater than or equal to 0, and λ_2 and λ_3 are less than 0.
13. The method of claim 7 or 10, wherein the time loss L_lat is computed from a preset target delay T_0 and a total delay T of the super-network, wherein T is:
T = Σ_{i,j} Σ_h Σ_c g^{in}_{i,j,h} × t_{h,c} × g^{out}_{i,j,c}
wherein g^{in}_{i,j,h} is the input control gate of the h-th unit of the j-th layer of the i-th stage, t_{h,c} is the delay in the lookup table corresponding to the c-th channel coefficient of the h-th unit, and g^{out}_{i,j,c} is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient.
14. The method of claim 1, wherein the sum of the values of the input control gates of each layer is 1.
15. The method of claim 1, wherein the output control gate comprises a plurality of channel coefficients, each channel coefficient corresponding to a number of channels.
16. The method of claim 15, wherein the number of the channel coefficients is 4.
17. The method of claim 15, further comprising:
calculating an output feature map of each layer, wherein the output feature map is:
O_{i,j} = Σ_h Σ_c g^{in}_{i,j,h} × g^{out}_{i,j,c} × F_{i,j,h}(I_{i,j})
wherein I_{i,j} is the input feature map of the j-th layer of the i-th stage, g^{in}_{i,j,h} is the input control gate of the h-th unit of the j-th layer of the i-th stage, F_{i,j,h} is the output function of the h-th unit of the j-th layer of the i-th stage, and g^{out}_{i,j,c} is the output control gate of the j-th layer of the i-th stage corresponding to the c-th channel coefficient;
wherein the forward propagation in the inputting step is performed based on the output feature maps.
18. The method of claim 1, further comprising:
judging whether the training progress reaches a progress threshold value; and
if so, performing the updating step.
19. The method of claim 18, wherein the progress threshold is 25%.
20. The method of claim 1, wherein the determining step comprises:
setting the operator of each layer to the unit whose input control gate has the maximum value in that layer.
21. The method of claim 1, wherein the determining step comprises:
setting the number of channels of each layer to the number of channels corresponding to the channel coefficients of the layer's output control gate that are greater than a channel threshold.
22. The method of claim 21, wherein the channel threshold is between 0 and 1.
23. A computer readable storage medium having stored thereon computer program code for establishing a neural network, which, when executed by a processing device, performs the method of any one of claims 1 to 22.
CN202011510457.9A 2020-12-18 2020-12-18 Method for establishing neural network and readable storage medium Pending CN114648091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011510457.9A CN114648091A (en) 2020-12-18 2020-12-18 Method for establishing neural network and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011510457.9A CN114648091A (en) 2020-12-18 2020-12-18 Method for establishing neural network and readable storage medium

Publications (1)

Publication Number Publication Date
CN114648091A true CN114648091A (en) 2022-06-21

Family

ID=81990795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011510457.9A Pending CN114648091A (en) 2020-12-18 2020-12-18 Method for establishing neural network and readable storage medium

Country Status (1)

Country Link
CN (1) CN114648091A (en)

Similar Documents

Publication Publication Date Title
Hsu et al. Monas: Multi-objective neural architecture search using reinforcement learning
CN109829057B (en) Knowledge graph entity semantic space embedding method based on graph second-order similarity
CN109120462B (en) Method and device for predicting opportunistic network link and readable storage medium
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN112036084A (en) Similar product life migration screening method and system
CN111340227A (en) Method and device for compressing business prediction model through reinforcement learning model
CN114651261A (en) Conditional calculation for continuous learning
US20220156508A1 (en) Method For Automatically Designing Efficient Hardware-Aware Neural Networks For Visual Recognition Using Knowledge Distillation
CN112766496B (en) Deep learning model safety guarantee compression method and device based on reinforcement learning
CN113283426A (en) Embedded target detection model generation method based on multi-target neural network search
CN108961460B (en) Fault prediction method and device based on sparse ESGP (Enterprise service gateway) and multi-objective optimization
CN114373101A (en) Image classification method for neural network architecture search based on evolution strategy
CN114584406B (en) Industrial big data privacy protection system and method for federated learning
CN113570007B (en) Method, device and equipment for optimizing construction of part defect identification model and storage medium
US20220027739A1 (en) Search space exploration for deep learning
CN113128689A (en) Entity relationship path reasoning method and system for regulating knowledge graph
CN114648091A (en) Method for establishing neural network and readable storage medium
CN116861957A (en) Operator automatic tuning method and related device based on reinforcement learning
CN111507499A (en) Construction method, test method, device and system of model for prediction
KR102110316B1 (en) Method and device for variational interference using neural network
CN115345303A (en) Convolutional neural network weight tuning method, device, storage medium and electronic equipment
Duggal et al. High Performance SqueezeNext for CIFAR-10
CN111026661B (en) Comprehensive testing method and system for software usability
CN114297053A (en) Software program safety detection method based on multi-layer perceptron smoothing
CN112464960A (en) Target detection method based on rapid neural architecture search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination