CN112364981A - Differentiable searching method and device of mixed precision neural network - Google Patents

Differentiable searching method and device of mixed precision neural network

Info

Publication number
CN112364981A
Authority
CN
China
Prior art keywords: network; hyper; super; sub; updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011249481.1A
Other languages
Chinese (zh)
Other versions
CN112364981B (en)
Inventor
常成
朱雪娟
余浩
毛伟
代柳瑶
李凯
王宇航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Maitexin Technology Co ltd
Original Assignee
Southern University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern University of Science and Technology filed Critical Southern University of Science and Technology
Priority to CN202011249481.1A
Publication of CN112364981A
Application granted
Publication of CN112364981B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a differentiable searching method and device for a mixed-precision neural network. The method comprises the following steps: acquiring an initialized super network, wherein the super network comprises a plurality of sub-networks and each sub-network carries hyper-parameters; updating the hyper-parameters based on a differentiable search method to obtain a first super network; carrying out hardware performance evaluation on the sub-networks contained in the first super network, and updating the hyper-parameters of the first super network according to the evaluation result to obtain a second super network; and judging whether an update termination condition is met: if so, determining the second super network as the target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first super network. By using the method, automatic model quantization can be performed, and a neural network can be searched and constructed for a specific hardware platform.

Description

Differentiable searching method and device of mixed precision neural network
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a differentiable searching method and a differentiable searching device for a mixed precision neural network.
Background
Deep learning can automatically learn useful features, removing the dependence on feature engineering, and has obtained results exceeding those of other algorithms on tasks such as image recognition, video understanding and natural language processing. This success is in large part due to the advent of new neural network architectures such as ResNet, Inception, DenseNet and MobileNet.
Neural Architecture Search (NAS) is a technology for automatically designing neural networks. Through an algorithm, a high-precision, high-performance network structure can be designed automatically according to a data set; on certain tasks it can match or even exceed the level of human experts, and it can find network structures that humans have not previously proposed. Compared with manually designing the network structure and hyper-parameters, neural architecture search can effectively reduce the design and use cost of neural networks.
However, as the complexity of actual tasks increases, neural networks with larger and deeper architectures need to be designed, and at the same time the models need to be deployed and applied more widely on different hardware platforms. In addition, although traditional NAS methods based on reinforcement learning and NAS methods based on evolutionary algorithms can design networks with high precision and high performance, the search algorithms themselves consume too many computing resources, which is unfavorable for the large-scale popularization and application of NAS methods.
Therefore, there is an urgent need for a neural network architecture search method that searches quickly, consumes less computing and memory resources, can automatically perform model quantization, and can search for a specific hardware platform.
Disclosure of Invention
The embodiment of the invention provides a differentiable searching method and device for a mixed-precision neural network, which can perform automatic model quantization and can search and construct a neural network for a specific hardware platform.
In a first aspect, an embodiment of the present invention provides a differentiable search method for a mixed-precision neural network, including:
acquiring an initialization hyper network; the super network comprises a plurality of sub networks, and each sub network carries a super parameter;
updating the hyper-parameters based on a differentiable search method to obtain a first hyper-network;
hardware performance evaluation is carried out on sub-networks contained in the first super-network, and the super-parameters of the first super-network are updated according to the evaluation result to obtain a second super-network;
judging whether an updating termination condition is met, and if so, determining the second hyper-network as a target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first hyper-network.
In a second aspect, an embodiment of the present invention further provides a differentiable search apparatus for a mixed-precision neural network, including:
an acquisition module for acquiring an initialized hyper-network; the super network comprises a plurality of sub networks, and each sub network carries a super parameter;
the first updating module is used for updating the hyper-parameters based on a differentiable searching method to obtain a first hyper-network;
a second updating module, configured to perform hardware performance evaluation on a sub-network included in the first super-network, and update a super-parameter of the first super-network according to an evaluation result, so as to obtain a second super-network;
the judging module is used for judging whether the updating termination condition is met or not, and if so, determining the second hyper-network as a target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first hyper-network.
In a third aspect, an embodiment of the present invention further provides a computer device, including:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the differentiable search method of a mixed-precision neural network as described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for differentiable search of a mixed-precision neural network according to any embodiment of the present invention.
The embodiment of the invention provides a differentiable searching method and device for a mixed-precision neural network. An initialized super network is first acquired, wherein the super network comprises a plurality of sub-networks and each sub-network carries hyper-parameters. The hyper-parameters are then updated based on a differentiable search method to obtain a first super network. Hardware performance evaluation is next carried out on the sub-networks contained in the first super network, and the hyper-parameters of the first super network are updated according to the evaluation result to obtain a second super network. Finally, it is judged whether an update termination condition is met: if so, the second super network is determined to be the target neural network; otherwise, the operation of updating the hyper-parameters based on the differentiable search method to obtain the first super network is executed again. By using this technical scheme, automatic model quantization can be performed, and a neural network can be searched and constructed for a specific hardware platform.
Drawings
Fig. 1 is a schematic flowchart of a method for performing differential search on a hybrid-precision neural network according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a differentiable searching method of a hybrid-precision neural network according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a differentiable searching apparatus of a hybrid-precision neural network according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The term "include" and variations thereof as used herein are intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
Example one
Fig. 1 is a schematic flowchart of a differentiable searching method of a hybrid-precision neural network according to an embodiment of the present invention. The method is applicable to searching for a high-performance neural network and can be performed by a differentiable searching apparatus of a hybrid-precision neural network, where the apparatus can be implemented by software and/or hardware and is generally integrated on a computer device.
As shown in fig. 1, a method for differentiable search of a hybrid-precision neural network according to an embodiment of the present invention includes the following steps:
s110, acquiring an initialized hyper-network; the super network comprises a plurality of sub-networks, and each sub-network carries a super parameter.
In this embodiment, the initialized hyper-network may be a hyper-network having initial value settings, and the initialized hyper-network may be obtained from a search space, for example.
A sub-network can be understood as a component of the above initialized super network: a super network may be formed by a plurality of sub-networks, and a sub-network may include a plurality of network layers.
The sub-networks carry hyper-parameters, and the hyper-parameters are continuously differentiable. The hyper-parameters of one sub-network can be a spatial set formed by a plurality of groups of vectors, and the hyper-parameters of one network layer in a sub-network can be formed by two different groups of vectors: one group can be formed by the influence factors of the convolution kernel configuration parameters of the network layer, and the other group by the influence factors of the quantized bit values of the network layer. The number of influence factors in the two groups is determined by the number of convolution kernel configuration parameters and the number of quantized bit values respectively; if a network layer has three convolution kernel configuration parameters, the corresponding number of influence factors for that layer is also three.
Wherein the convolution kernel configuration parameters may include one or more of: a number of convolution kernels per layer, a convolution kernel height per layer per convolution kernel, a convolution kernel width per layer per convolution kernel, a stride height per layer per convolution kernel, and a stride width per layer per convolution kernel. The convolution kernel configuration parameters of each layer network may be a vector of any number of parameters selected from the above parameters.
Wherein the quantized bit values may comprise one or more of: the data bit width of the characteristic graph of each layer of the convolutional neural network, the data bit width of the weight of each layer of the convolutional neural network and the data bit width of the activation function of each layer. The quantized bit values of each layer of the network may be a vector of at least one value arbitrarily selected from the above-mentioned values.
The influence factor of the convolution kernel configuration parameter may represent the degree of influence on each parameter in the convolution kernel configuration parameter, and for example, the larger the influence factor corresponding to one parameter is, the larger the influence of the parameter in all the convolution kernel configuration parameters of the network layer is; correspondingly, the influence factor of the quantized bit value may characterize the degree of influence on the quantized bit value, and for example, the larger the influence factor corresponding to a certain value of the quantized bit value, the larger the influence factor indicating that the value has a larger influence on all the quantized bit values of the network layer.
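To make the structure above concrete, the per-layer hyper-parameters can be sketched as two influence-factor vectors, one entry per candidate convolution kernel configuration and one entry per candidate quantized bit value. The candidate sets, names, and initial values below are hypothetical, not taken from the embodiment:

```python
import numpy as np

# Hypothetical candidate sets for one network layer.
kernel_candidates = [(16, 3, 3), (32, 3, 3), (32, 5, 5)]  # (num_kernels, height, width)
bit_candidates = [4, 8, 16]                               # quantized bit widths

# One influence factor per candidate; the factor counts match the candidate
# counts, as described above (three configurations -> three factors).
alpha_kernel = np.ones(len(kernel_candidates))
alpha_bits = np.ones(len(bit_candidates))

layer_hyper_params = {"kernel": alpha_kernel, "bits": alpha_bits}
```

A full super network would hold one such pair of vectors for every layer of every sub-network.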
And S120, updating the hyper-parameters based on a differentiable search method to obtain a first hyper-network.
In this embodiment, the differentiable search method may be a neural network search method that can receive accuracy feedback of the neural network. Obtaining the first super network can be understood as updating, based on the differentiable search method, the hyper-parameters carried by several sub-networks sampled from the super network.
A plurality of sub-networks may be sampled from the super network using Gumbel-softmax. For example, the hyper-parameters are transformed into probability vectors by Gumbel-softmax, and a plurality of sub-networks are sampled from the super network according to the magnitudes of the probability vectors. A probability vector can be understood as a vector formed by the products of the influence factors of the convolution kernel configuration parameters of each network layer and the influence factors of the quantized bit values; the larger the probability, the more likely the sub-network corresponding to that network layer is to be sampled.
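A minimal sketch of this sampling step, assuming the per-layer influence factors serve as logits that Gumbel-softmax relaxes into a probability vector (all names and values are illustrative):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Turn influence factors into a differentiable probability vector by
    adding Gumbel noise and applying a temperature-scaled softmax."""
    rng = np.random.default_rng() if rng is None else rng
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=len(logits))))
    y = (np.asarray(logits, dtype=float) + gumbel) / tau
    y = np.exp(y - y.max())                 # numerically stable softmax
    return y / y.sum()

# Per-layer influence factors for three candidate choices.
rng = np.random.default_rng(42)
alpha = [0.2, 1.5, 0.3]
probs = gumbel_softmax(alpha, tau=1.0, rng=rng)
sampled = int(np.argmax(probs))             # index of the sampled candidate
```

The temperature `tau` controls how close the relaxed vector is to a one-hot choice; annealing it toward zero makes the sampling increasingly discrete.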
Updating the hyper-parameters based on the differentiable search method to obtain the first super network can be understood as follows: first, keeping the hyper-parameters carried by the sub-networks unchanged, the sub-networks are trained on a training set to update their weight parameters; then, keeping the weight parameters unchanged, the trained sub-networks are forward-propagated on a verification set to obtain the values of the network target loss function; finally, the hyper-parameters are differentiated according to the values of the network target loss function to obtain gradient values, and the hyper-parameters of the sub-networks are updated according to the gradient values.
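The alternating procedure just described can be illustrated numerically. The toy model below is purely a sketch: a single "layer" mixes two candidate operations with softmax-weighted hyper-parameters, `w` plays the role of the sub-network weight parameters, and finite differences stand in for the automatic differentiation a real implementation would use.

```python
import numpy as np

rng = np.random.default_rng(0)
x_tr, x_val = rng.normal(size=32), rng.normal(size=32)
y_tr, y_val = 2.0 * x_tr, 2.0 * x_val        # target operation: doubling

def forward(x, w, alpha):
    p = np.exp(alpha - np.max(alpha))
    p = p / p.sum()                          # softmax over the two candidate ops
    return p[0] * (w * x) + p[1] * (2.0 * w * x)

def loss(x, y, w, alpha):
    return float(np.mean((forward(x, w, alpha) - y) ** 2))

w, alpha, eps, lr = 0.5, np.zeros(2), 1e-5, 0.1
for _ in range(100):
    # Step 1: hyper-parameters fixed, train the weight on the training set.
    gw = (loss(x_tr, y_tr, w + eps, alpha) - loss(x_tr, y_tr, w - eps, alpha)) / (2 * eps)
    w -= lr * gw
    # Step 2: weight fixed, update the hyper-parameters with the gradient
    # of the loss measured on the verification set.
    ga = np.zeros(2)
    for k in range(2):
        d = np.zeros(2); d[k] = eps
        ga[k] = (loss(x_val, y_val, w, alpha + d) - loss(x_val, y_val, w, alpha - d)) / (2 * eps)
    alpha -= lr * ga

final_val_loss = loss(x_val, y_val, w, alpha)
```

After the loop, the validation loss is near zero and `alpha` has shifted toward the candidate operation that fits the data.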
The training set is used for fitting the model: the classification model is trained by setting the parameters of the classifier. When the verification set is later combined, different values of the same parameter can be selected to fit a plurality of classifiers.
The verification set is used, after a plurality of models have been trained on the training set, to validate each model on the verification-set data; the hyper-parameters are then updated based on the loss on the verification set.
The first super network may be the new super network formed by the updated hyper-parameters after the hyper-parameters are updated by the differentiable search method.
S130, evaluating the hardware performance of the sub-networks contained in the first super-network, and updating the super-parameters of the first super-network according to the evaluation result to obtain a second super-network.
In this embodiment, before performing the hardware performance evaluation, the first super network needs to be sampled to obtain a plurality of second sampling sub-networks, wherein the sampling method is the same as the sampling method in step S120.
This step can be realized by an evolutionary algorithm. The evolutionary algorithm in this embodiment can be used to receive and process non-differentiable hardware feedback and to update the hyper-parameters of the first super network based on the received hardware feedback; hardware feedback can be understood as the feedback of a hardware evaluation model on the hardware performance indexes of each layer of a sub-network.
The hardware performance evaluation model may be used to evaluate the hardware performance of the network, and the hardware performance evaluation model may evaluate the hardware performance of the second sampling subnetwork in the first super network. The hardware performance evaluation may be understood as evaluating a hardware performance indicator of the target hardware.
The hardware evaluation model may be configured to evaluate a hardware performance indicator of the neural network model on the target hardware, and in particular, the hardware evaluation model may be configured to evaluate a hardware performance indicator of the second sampling subnetwork model on the target hardware, where the hardware performance indicator may include one or more of power consumption, latency, and model size, for example.
The evaluation may be performed in either of the following two modes:
Mode one: deploying the plurality of second sampling sub-network models onto target hardware to obtain the hardware performance indexes;
Mode two: determining the hardware performance indexes according to the convolution kernel configuration parameters and the quantized bit values of the plurality of second sampling sub-network models.
The hyper-parameters of the first super network are updated according to the evaluation result, where the evaluation result can be understood as the evaluation result of the hardware performance indexes of the second sampling sub-networks; if the evaluation result of one of the second sampling sub-networks is optimal, it indicates that the convolution kernel configuration parameters and the quantized bit values of that second sampling sub-network are optimal.
After the optimal second sampling sub-network is determined, the values of the influence factors corresponding to the convolution kernel configuration parameters and the quantized bit values of the specific network layer in the optimal second sampling sub-network can be increased, and after the values of the influence factors are increased, the optimal second sampling sub-network can be changed into a new sub-network, and the super-network in which the optimal second sampling sub-network is located is correspondingly updated into a new super-network.
The second super network may be a network in which the influence factors of the convolution kernel configuration parameters and the influence factors of the quantized bit values of the internal network layer of the optimal second sampling sub-network are updated.
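A hypothetical sketch of this hardware-driven update (the function name, step size, and cost values are invented for illustration): the second sampling sub-network with the best hardware evaluation result has the influence factors of its chosen per-layer candidates increased.

```python
import numpy as np

def reward_best_subnetwork(alpha_kernel, alpha_bits, sampled, costs, step=0.1):
    """sampled: per sub-network, the (kernel_idx, bits_idx) chosen for a layer;
    costs: measured hardware cost of each sub-network (lower is better).
    Returns updated influence-factor vectors forming the new super network."""
    best = int(np.argmin(costs))             # optimal second sampling sub-network
    k_idx, b_idx = sampled[best]
    new_kernel, new_bits = alpha_kernel.copy(), alpha_bits.copy()
    new_kernel[k_idx] += step                # raise the winning configuration's
    new_bits[b_idx] += step                  # influence for future sampling
    return new_kernel, new_bits

ak, ab = np.zeros(3), np.zeros(3)
ak2, ab2 = reward_best_subnetwork(ak, ab, sampled=[(0, 2), (1, 1)], costs=[5.0, 3.2])
```

Here the second sampled sub-network has the lower cost, so the influence factors at its chosen indices are increased while all others are left unchanged.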
S140, judging whether the update termination condition is met; if so, executing step S150; otherwise, returning to execute step S120.
In this embodiment, the update termination condition may be the condition under which the super network terminates updating, that is, the condition under which the neural network architecture search stops. Specifically, the update termination condition may include the following two cases: first, the update termination condition may be that both the verification-set feedback and the hardware feedback of the sub-networks satisfy a preset index; second, the update termination condition may be that the number of super network updates reaches a set threshold.
The verification set feedback can be understood as feedback of accuracy of the sub-network model in the differentiable search method, and the hardware feedback can be understood as feedback of a result of hardware performance evaluation.
The preset index may be a value set by a user according to an actual situation before the method provided by the embodiment is used. And when the updated hyper-parameters of the hyper-network meet the preset indexes, stopping updating the hyper-parameters of the hyper-network, and determining the hyper-network as a target neural network, wherein the target neural network can be understood as the hyper-network meeting the user requirements.
The set threshold value can be the updating times of the hyper-network hyper-parameters set by a user, and when the updating times reach the set threshold value, the neural network construction method provided by the embodiment can automatically stop updating the hyper-parameters, and determine the hyper-network obtained by the last updating as the target neural network.
If the second hyper-network does not meet the condition of updating termination, the steps of S120 and S130 are continuously executed, the hyper-network is updated accordingly, and finally, whether the updated hyper-network meets the condition of updating termination is judged. And circulating the steps until the updated hyper-network meets the update termination condition, and stopping updating the hyper-network.
S150, determining the second super network as the target neural network.
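Schematically, steps S120 to S150 form the loop below. The update and evaluation callables are placeholders standing in for the differentiable update (S120), the hardware-driven update (S130), and the feedback check (S140); all names and numbers are illustrative.

```python
def search(hyper_params, differentiable_update, hardware_update, evaluate,
           target_metric=0.9, max_updates=50):
    """Repeat S120/S130 until the feedback meets the preset index or the
    number of updates reaches the set threshold (S140), then return the
    final super network (S150)."""
    for step in range(1, max_updates + 1):
        hyper_params = differentiable_update(hyper_params)  # S120
        hyper_params = hardware_update(hyper_params)        # S130
        if evaluate(hyper_params) >= target_metric:         # S140
            break
    return hyper_params, step                               # S150

# Toy run: each update nudges a scalar "quality" toward the preset index.
final_hp, steps = search(
    0.0,
    differentiable_update=lambda h: h + 0.1,
    hardware_update=lambda h: h + 0.05,
    evaluate=lambda h: h,
)
```

The loop terminates either because the evaluated quality crosses `target_metric` or because `max_updates` is exhausted, mirroring the two termination cases described above.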
The differentiable searching method of the mixed precision neural network provided by the embodiment of the invention comprises the following steps of firstly, obtaining an initialized hyper-network; the super network comprises a plurality of sub networks, and each sub network carries a super parameter; secondly, updating the hyper-parameters based on a differentiable search method to obtain a first hyper-network; then, hardware performance evaluation is carried out on sub-networks contained in the first super-network, and the super-parameters of the first super-network are updated according to the evaluation result to obtain a second super-network; finally, whether an updating termination condition is met is judged, and if yes, the second hyper-network is determined to be a target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first hyper-network. By utilizing the method, the design and use cost of the neural network can be effectively reduced, and compared with the traditional neural network architecture searching method, the searching speed is obviously improved; in addition, compared with a full-precision neural architecture searching method, the method can search a lightweight network with lower complexity.
Further, the super network is a convolutional neural network, and the sub-network comprises at least one layer of network; the super-parameter includes an influence factor of a convolution kernel configuration parameter and an influence factor of a quantization bit value of each layer network, and is continuously differentiable.
Specifically, the super network may be a convolutional neural network, and the sub-network may include one or more layers of networks, where the number of network layers of the sub-network and the super network is not particularly limited.
The better the hardware feedback and verification-set feedback results of a sub-network are, the more optimal the convolution kernel configuration parameters and quantized bit values of that sub-network are; the corresponding hyper-parameters in the sub-network can then be increased, so that a new updated super network is obtained.
It should be noted that the hyper-parameters may be continuously differentiable for verification set feedback. For example, the loss function of the neural network can be used for deriving the hyper-parameter, and then the hyper-parameter can be iteratively optimized by using a gradient descent method.
Further, the manner of obtaining the first super network by updating the super parameters based on the differentiable search method may be: sampling sub-networks included in the initialized super-network to obtain at least one first sampling sub-network; training a first sampling subnetwork based on a training set; forward propagating the verification set in the trained first sampling sub-network to obtain a target loss function value; carrying out differential calculation on the target loss function value to obtain a gradient value; and updating the hyper-parameters according to the gradient values to obtain a first hyper-network.
After sampling the sub-networks included in the initialized super network, one or more sampling sub-networks can be obtained according to the sampling probability; a sampling sub-network can be understood as a sub-network sampled from the initialized super network using Gumbel-softmax.
Training the first sampling subnetwork based on a training set can comprise: one or more first sampling sub-networks are trained several times on the training set to update the weight parameters of these first sampling sub-networks, and then the weight parameters are kept unchanged.
The target loss function value may be the value of the function that ultimately needs to be optimized. Differentiating the target loss function value can be understood as taking the derivative of the target loss function with respect to the hyper-parameters to obtain the gradient values of the hyper-parameters. The hyper-parameters may then be iteratively optimized using gradient descent.
For example, an existing network may be verified on a verification set to obtain a value of a network loss function; then, the value is subjected to derivation on the hyper-parameter to obtain a gradient value; and finally updating the hyper-parameter according to the gradient value.
Further, the process of training the first sampling sub-network based on the training set may be: the first sampling sub-network is trained based on a training set to update weight parameters of the first sampling sub-network.
The weight parameters are updated to be new weight parameters after each training, the first sampling sub-network is trained through a training set, and the weight parameters of the trained first sampling sub-network can be updated.
Further, the hardware performance evaluation of the sub-network included in the first super-network, and updating the super-parameters of the first super-network according to the evaluation result to obtain the second super-network, includes: sampling sub-networks included in the first super-network to obtain at least one second sampling sub-network; performing hardware evaluation on at least one second sampling sub-network to obtain an optimal second sampling sub-network; and increasing the influence factors of the convolution kernel configuration parameters and the influence factors of the quantized bit values of each layer of the optimal second sampling sub-network to obtain a second super-network.
The hardware evaluation of the second sampling sub-network can be performed through a hardware evaluation model, and the sub-network with the optimal evaluation result is the optimal second sampling sub-network.
It should be noted that the obtained hardware performance index of the optimal second sampling sub-network is optimal, which indicates that the hyper-parameters of the optimal second sampling sub-network, that is, the convolution kernel configuration parameters and the quantization bit values are optimal. Here the convolution kernel configuration parameters and the quantized bit values may be hyper-parameters for each layer network within the optimal second sampling sub-network.
After determining the optimal convolution kernel configuration parameters and quantization bit values, the impact factors of the optimal convolution kernel configuration parameters and the impact factors of the optimal quantization bit values may be increased. The increased hyper-parameters enable the corresponding optimal sub-network to be updated, and the hyper-network of the optimal sub-network is also updated.
Further, performing hardware evaluation on the at least one second sampling sub-network includes: deploying at least one second sampling sub-network to target hardware to obtain a hardware performance index; alternatively, the hardware performance indicator is determined based on the convolution kernel configuration parameters and the size of the quantized bit values of the at least one second sampling sub-network.
There may be two methods of hardware evaluation for the sampling sub-networks. The first may include directly deploying the plurality of sampled second sampling sub-networks to the target hardware, thereby obtaining hardware performance indexes; in this way the hardware performance index of each layer of each second sampling sub-network is obtained.
The second method may include estimating, when searching for the neural network architecture, a hardware performance index of each layer of the second sampling sub-network according to the network layer of the second sampling sub-network and the correspondence table. The obtained hardware performance indicator may be used to characterize the merits of the convolution kernel configuration parameters and the quantized bit values in the second sampling sub-network.
Further, before performing hardware evaluation on the at least one second sampling sub-network, the method further includes: establishing a correspondence table between the selections of convolution kernel configuration parameters and quantization bit values of the at least one second sampling sub-network and the hardware performance indexes. Correspondingly, determining the hardware performance index according to the convolution kernel configuration parameters and quantization bit values of each layer of the at least one second sampling sub-network includes: looking up the corresponding hardware performance index in the correspondence table according to the convolution kernel configuration parameters and quantization bit values of the at least one second sampling sub-network.
The correspondence table may record the mapping between the selections of convolution kernel configuration parameters and quantization bit values of the plurality of second sampling sub-networks and the hardware performance indexes, and the hardware performance indexes may be the estimated indexes of all layer networks in all the second sampling sub-networks. For example, a certain network layer of a second sampling sub-network may admit several combinations of convolution kernel configuration parameters and quantization bit values; each combination has a hardware performance index in one-to-one correspondence with it, and this one-to-one correspondence is what the table records.
It should be noted that the correspondence table is established before updating the hyper-parameters of the second sampling sub-networks and before the neural architecture search: a large number of different second sampling sub-networks with known structures are deployed to the target hardware, the hardware performance index of each is determined, and the table is then built from the determined indexes.
When evaluating the hardware of a second sampling sub-network, any one or more of the power consumption, latency, and model size of each layer can be estimated by looking up the correspondence table according to the network layers of the sub-network to be estimated.
Based on the selection of convolution kernel configuration parameters and quantization bit values of each layer network in the second sampling sub-network, the hardware performance index corresponding one-to-one to that layer's combination can be looked up directly in the correspondence table. For example, a layer may admit several convolution kernel configuration parameters and several quantization bit values; any parameter may be paired with any bit value, so there are many combinations, and each combination has its own hardware performance index.
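A minimal sketch of such a correspondence table, with invented per-layer latency values and an assumed key structure of `(kernel size, bit value)` (the patent does not specify the table's concrete form):

```python
# Hypothetical correspondence (lookup) table: each (kernel config, bit value)
# pair maps to a pre-measured per-layer performance index, here latency in ms.
# All values below are invented for illustration only.

LATENCY_TABLE = {
    (3, 4): 1.2, (3, 8): 2.0,   # 3x3 kernel at 4-bit / 8-bit
    (5, 4): 2.5, (5, 8): 4.1,   # 5x5 kernel at 4-bit / 8-bit
}

def estimate_latency(subnet_layers):
    """subnet_layers: list of (kernel_size, bit_value), one entry per layer.
    Sums the table entries to estimate the whole sub-network's latency."""
    return sum(LATENCY_TABLE[(k, b)] for k, b in subnet_layers)
```

Because the table is filled once by measuring known structures on the target hardware, later searches avoid redeploying every candidate and simply sum the per-layer entries.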
Further, after obtaining the second super network, the method further includes: determining hardware resources required by a second hyper-network and preset limited resources; and if the required hardware resources exceed the preset limited resources, reducing the quantized bit value of each sub-network so that the required hardware resources are less than or equal to the preset limited resources.
The required hardware resources may be understood as the performance of the sub-network on the target hardware, and may include, for example, power consumption and latency.
The preset limited resource can be understood as the resource limit of the updated super-network model on the target hardware, that is, the hardware resources required by the updated super-network cannot exceed the preset limited resource. The preset limited resource can be customized by the user according to actual requirements before the super-network is updated.
For example, when a network model runs on target hardware, it inevitably consumes power; different network models consume different amounts of power, and in general a more complex model consumes more.
After the required hardware resources exceed the preset limited resources, the quantization bit value of every layer network in each sub-network may be reduced simultaneously; for example, the quantization bit value of each layer may be reduced from 10 bits to 8 bits, then to 6 bits, and so on in sequence, until the required hardware resources are less than or equal to the preset limited resources. Reducing the quantization bit value may also be understood as reducing the bit width of the network.
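The simultaneous bit-reduction loop could be sketched as follows; the bit ladder and the `resource_cost` callback are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch: shrink every layer's quantization bit value in
# lockstep until the estimated resource cost fits the preset budget.
# `resource_cost`, `budget`, and the descending bit ladder are assumed.

def fit_to_budget(bits_per_layer, resource_cost, budget, ladder=(10, 8, 6, 4, 2)):
    """bits_per_layer: current bit value per layer;
    resource_cost(bits) -> float: estimated hardware resource usage."""
    while resource_cost(bits_per_layer) > budget:
        lowered = []
        for b in bits_per_layer:
            # step each layer down to the next rung of the ladder
            smaller = [r for r in ladder if r < b]
            if not smaller:
                raise RuntimeError("cannot fit budget even at minimum bits")
            lowered.append(smaller[0])
        bits_per_layer = lowered
    return bits_per_layer
```

For instance, with a cost model that simply sums the bit values and a budget of 14, two 10-bit layers would step down to 8 bits and then 6 bits before fitting.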
Example two
Fig. 2 is a schematic flowchart of a differentiable search method of a hybrid-precision neural network according to a second embodiment of the present invention. With the development of artificial intelligence applications, the demand for deploying neural network models to edge hardware devices (such as mobile phones and Internet-of-Things devices) keeps growing. However, hardware deployment of neural networks often faces multiple hardware resource limitations, such as power consumption and latency, so lightweight networks must be designed that maintain model accuracy while performing well in hardware. Model quantization is a neural network compression and optimization technique: data in a neural network is represented with a lower bit width, which greatly reduces the computation and complexity of the model without significant loss of accuracy, allowing the model to be deployed to resource-limited hardware platforms. Model quantization currently divides into manual and automated approaches. In manual model quantization, human experts must determine the quantization bit value of each network layer by experience and repeated experiments; although significant optimization gains are achieved, this consumes a large amount of labor and time and easily falls into local optima. Automated model quantization treats low-bit quantization of the neural network as an optimization problem and quantizes each layer automatically by means of various optimization algorithms, achieving better results at lower cost than manual quantization.
Traditional NAS methods based on reinforcement learning or on evolutionary algorithms consume large amounts of time, computation, and memory during the search. The differentiable search method of the mixed-precision neural network can perform automated model quantization and can search for a specific hardware platform, offering high performance, high speed, and low resource consumption.
As shown in fig. 2, the differentiable search method of the mixed-precision neural network provided by this embodiment adopts a search strategy combining a differentiable search method with an evolutionary algorithm. First, the hyper-parameters of the super-network are acquired, that is, the initialized hyper-parameters of the super-network; a plurality of sub-networks, namely first sampling sub-networks, are sampled using Gumbel-softmax based on those hyper-parameters; and then the hyper-parameters of the super-network are updated by the differentiable search method, which includes: keeping the hyper-parameters of the sub-networks unchanged, training the sub-networks on a training set several times and updating their weight parameters to obtain trained sub-networks, sending the trained sub-networks to a network validation-set evaluator for verification and evaluation to obtain validation-set feedback, and updating the hyper-parameters of the super-network according to the validation-set feedback to obtain the first super-network.
After updating the hyper-parameters of the initialized super-network with the differentiable search method, the hyper-parameters of the super-network are then updated with an evolutionary algorithm, which includes: sampling a plurality of sub-networks, namely second sampling sub-networks, again using Gumbel-softmax based on the updated hyper-parameters; then, by deploying the model to hardware or by querying a lookup table (the correspondence table), evaluating the hardware performance index of the second sampling sub-networks on target hardware with the hardware evaluation model to obtain hardware feedback; and updating the second sampling sub-networks, that is, the super-network hyper-parameters, according to the hardware feedback.
The two updating modes are used alternately and cyclically to update the hyper-parameters of the super-network: a gradient-based mode for the accuracy feedback from the validation set, and an evolutionary algorithm for the hardware feedback.
The differentiable search method is used for receiving network precision feedback and updating the hyper-network hyper-parameter based on the network precision feedback. The network validation set evaluator is configured to evaluate the loss/accuracy of the trained subnetwork on the validation set.
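As an illustrative aside, the Gumbel-softmax sampling step mentioned above can be sketched in plain Python. The temperature `tau`, the logits, and the candidate bit widths are assumed values; a real implementation would typically use a framework routine such as PyTorch's `gumbel_softmax`:

```python
# Hypothetical sketch of Gumbel-softmax sampling: draw a discrete per-layer
# bit-width choice while keeping the selection differentiable in the
# influence factors (logits). Pure Python; tau is the temperature.

import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    # perturb each logit with Gumbel(0, 1) noise, then apply a softmax
    g = [l - math.log(-math.log(rng.random())) for l in logits]
    m = max(x / tau for x in g)                  # subtract max for stability
    exps = [math.exp(x / tau - m) for x in g]
    s = sum(exps)
    return [e / s for e in exps]

def sample_bit_width(logits, bit_choices=(2, 4, 6, 8)):
    probs = gumbel_softmax(logits)
    return bit_choices[probs.index(max(probs))]  # hard selection (argmax)
```

The noise term makes each sampling round stochastic, so sub-networks with larger influence factors are drawn more often without ever being chosen deterministically.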
Compared with other methods, when exploring the network architecture space, the differentiable search method of the mixed-precision neural network provided by this embodiment explores the search space with a gradient-based differentiable search algorithm and combines mixed-precision quantization with architecture search, delivering superior performance and efficiency in practical tasks. During the network search, the method adds feedback from actual hardware as constraint information and handles the non-differentiable hardware feedback with an evolutionary algorithm, so it can find architectures that meet the hardware resource requirements and more easily reaches the global optimum. The method achieves high accuracy on computer vision tasks with few computing resources, making automated network architecture design practical for individual researchers, small companies, and university research teams.
The differentiable search method of the mixed-precision neural network provided by the embodiment of the present invention adopts a differentiable search mode, makes the search space continuous and differentiable, treats the selection of each layer's quantization bit value as part of the neural architecture search problem, and searches the architecture parameters, namely the convolution kernel configuration parameters and the layer quantization bit values, with a gradient-based algorithm. On the one hand, compared with the reinforcement learning and evolutionary algorithms adopted by traditional neural architecture search, the gradient-based differentiable search algorithm is significantly faster. On the other hand, the network obtained by the search is a quantized mixed-precision network; compared with a full-precision neural network, its computation and memory consumption are greatly reduced, making deep learning possible on resource-limited hardware.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a differentiable searching apparatus for a hybrid-precision neural network according to a third embodiment of the present invention, which can be applied to the case of searching a high-performance neural network, wherein the apparatus can be implemented by software and/or hardware and is generally integrated on a computer device.
As shown in fig. 3, the apparatus includes:
an obtaining module 310, configured to obtain an initialized hyper-network; the super network comprises a plurality of sub networks, and each sub network carries a super parameter;
a first updating module 320, configured to update the hyper-parameter based on a differentiable search method to obtain a first hyper-network;
a second updating module 330, configured to perform hardware performance evaluation on the first super network, and update the super parameters of the first super network according to an evaluation result to obtain a second super network;
a determining module 340, configured to determine whether an update termination condition is met, and if so, determine the second super network as a target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first hyper-network.
In this embodiment, the apparatus first uses the acquisition module to acquire an initialized super-network, where the super-network includes a plurality of sub-networks and each sub-network carries hyper-parameters; next, the first updating module updates the hyper-parameters based on the differentiable search method to obtain a first super-network; then, the second updating module performs hardware performance evaluation on the sub-networks contained in the first super-network and updates the hyper-parameters of the first super-network according to the evaluation result to obtain a second super-network; finally, the judgment module judges whether the update termination condition is met and, if so, determines the second super-network as the target neural network; otherwise, the operation of updating the hyper-parameters based on the differentiable search method to obtain the first super-network is performed again.
The embodiment provides a differentiable searching device of a mixed precision neural network, which can perform automatic model quantization and can search and construct the neural network aiming at a specific hardware platform.
Further, the super network is a convolutional neural network, and the sub-network comprises at least one layer of network; the super-parameters include an influence factor of a convolution kernel configuration parameter and an influence factor of a quantization bit value of each layer network, and are continuously differentiable.
Further, the first updating module 320 is further configured to: sampling sub-networks included in the initialization hyper-network to obtain at least one first sampling sub-network; training the first sampling subnetwork based on a training set; forward propagating the verification set in the trained first sampling sub-network to obtain a target loss function value; carrying out differential calculation on the target loss function value to obtain a gradient value; and updating the hyper-parameters according to the gradient values to obtain a first hyper-network.
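A hedged pure-Python sketch of such a gradient step on one layer's influence factors, using a toy quadratic loss over the softmax of the factors (the loss, the learning rate, and all names are illustrative assumptions, not the patent's actual target loss function):

```python
# Hypothetical sketch: one gradient-descent step on a layer's influence
# factors (alpha). The toy loss L = sum((softmax(alpha) - target)^2) stands
# in for the validation loss; its gradient is computed by the chain rule.

import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def grad_step(alpha, target, lr=0.1):
    p = softmax(alpha)
    dl_dp = [2 * (pi - ti) for pi, ti in zip(p, target)]   # dL/dp
    grads = []
    for i in range(len(alpha)):
        # chain rule through the softmax: dL/da_i = p_i*(dL/dp_i - sum_j dL/dp_j * p_j)
        g = p[i] * (dl_dp[i] - sum(d * pj for d, pj in zip(dl_dp, p)))
        grads.append(g)
    return [a - lr * g for a, g in zip(alpha, grads)]
```

Starting from uniform factors with a target favoring the third choice, one step raises the third factor and lowers the others, which is exactly the direction the differentiable update pushes the architecture parameters.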
Further, the training module is configured to train the first sampling subnetwork based on a training set to update the weight parameter of the first sampling subnetwork.
Further, the second updating module 330 is further configured to sample subnetworks included in the first super network to obtain at least one second sampling subnetwork; performing hardware evaluation on the at least one second sampling sub-network to obtain an optimal second sampling sub-network; and increasing the influence factors of the convolution kernel configuration parameters and the influence factors of the quantized bit values of each layer of the optimal second sampling sub-network to obtain a second super-network.
Further, the hardware evaluation module is configured to deploy the at least one second sampling subnetwork to target hardware to obtain a hardware performance index; alternatively, the hardware performance indicator is determined based on the convolution kernel configuration parameters and the size of the quantized bit values of the at least one second sampling sub-network.
Further, before the hardware evaluation module performs the evaluation, the second updating module 330 is further configured to: establish a correspondence table between the selections of convolution kernel configuration parameters and quantization bit values of the at least one second sampling sub-network and the hardware performance indexes; and look up the corresponding hardware performance index in the correspondence table according to the convolution kernel configuration parameters and quantization bit values of each layer of the at least one second sampling sub-network.
Further, the second updating module 330 is further configured to determine the hardware resources required by the second super-network and the preset limited resources; and if the required hardware resources exceed the preset limited resources, reduce the quantization bit value of each sub-network so that the required hardware resources are less than or equal to the preset limited resources.
The differentiable searching device of the mixed precision neural network can execute the differentiable searching method of the mixed precision neural network provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the executing method.
Example four
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in fig. 4, a computer device provided in the fourth embodiment of the present invention includes: one or more processors 41 and storage 42; the processor 41 in the device may be one or more, and one processor 41 is taken as an example in fig. 4; storage 42 is used to store one or more programs; the one or more programs are executable by the one or more processors 41 to cause the one or more processors 41 to implement the differentiable search method of a mixed-precision neural network described in any one of the embodiments of the present invention.
The computer device may further include: an input device 43 and an output device 44.
The processor 41, the storage device 42, the input device 43 and the output device 44 in the computer apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 4.
The storage device 42 in the computer device is used as a computer-readable storage medium for storing one or more programs, which may be software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the methods provided in embodiments one or two of the present invention (for example, the modules in the device shown in fig. 3, including the first updating module 320 and the second updating module 330). The processor 41 executes various functional applications and data processing of the terminal device, that is, implements the differentiable search method of the mixed-precision neural network in the above method embodiments, by running the software programs, instructions, and modules stored in the storage device 42.
The storage device 42 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the storage 42 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, storage 42 may further include memory located remotely from processor 41, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 43 may be used to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 44 may include a display device such as a display screen.
And, when the one or more programs included in the above-mentioned apparatus are executed by the one or more processors 41, the programs perform the following operations:
acquiring an initialization hyper network; the super network comprises a plurality of sub networks, and each sub network carries a super parameter;
updating the hyper-parameters based on a differentiable search method to obtain a first hyper-network;
hardware performance evaluation is carried out on sub-networks contained in the first super-network, and the super-parameters of the first super-network are updated according to the evaluation result to obtain a second super-network;
judging whether an updating termination condition is met, and if so, determining the second hyper-network as a target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first hyper-network.
EXAMPLE five
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, is configured to perform a differentiable search method of a mixed-precision neural network, the method including:
acquiring an initialization hyper network; the super network comprises a plurality of sub networks, and each sub network carries a super parameter;
updating the hyper-parameters based on a differentiable search method to obtain a first hyper-network;
hardware performance evaluation is carried out on sub-networks contained in the first super-network, and the super-parameters of the first super-network are updated according to the evaluation result to obtain a second super-network;
judging whether an updating termination condition is met, and if so, determining the second hyper-network as a target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first hyper-network.
Optionally, the program, when executed by the processor, may be further configured to perform the differentiable search method of a mixed-precision neural network provided in any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for differentiable searching of a mixed-precision neural network, comprising:
acquiring an initialization hyper network; the super network comprises a plurality of sub networks, and each sub network carries a super parameter;
updating the hyper-parameters based on a differentiable search method to obtain a first hyper-network;
hardware performance evaluation is carried out on sub-networks contained in the first super-network, and the super-parameters of the first super-network are updated according to the evaluation result to obtain a second super-network;
judging whether an updating termination condition is met, and if so, determining the second hyper-network as a target neural network; otherwise, returning to execute the operation of updating the hyper-parameters based on the differentiable search method to obtain the first hyper-network.
2. The method of claim 1, wherein the super network is a convolutional neural network, and wherein the sub-network comprises at least one layer of network; the super-parameters include an influence factor of a convolution kernel configuration parameter and an influence factor of a quantization bit value of each layer network, and are continuously differentiable.
3. The method of claim 2, wherein updating the hyper-parameter based on a differentiable search method to obtain a first hyper-network comprises:
sampling sub-networks included in the initialization hyper-network to obtain at least one first sampling sub-network;
training the first sampling subnetwork based on a training set;
forward propagating the verification set in the trained first sampling sub-network to obtain a target loss function value;
carrying out differential calculation on the target loss function value to obtain a gradient value;
and updating the hyper-parameters according to the gradient values to obtain a first hyper-network.
4. The method of claim 3, wherein training the first sampling subnetwork based on a training set comprises:
training the first sampling sub-network based on a training set to update weight parameters of the first sampling sub-network.
5. The method of claim 1, wherein evaluating hardware performance of subnetworks included in the first super network, and updating a super parameter of the first super network according to an evaluation result to obtain a second super network comprises:
sampling sub-networks included in the first super-network to obtain at least one second sampling sub-network;
performing hardware evaluation on the at least one second sampling sub-network to obtain an optimal second sampling sub-network;
and increasing the influence factors of the convolution kernel configuration parameters and the influence factors of the quantized bit values of each layer of the optimal second sampling sub-network to obtain a second super-network.
6. The method of claim 5, wherein performing a hardware evaluation of the at least one second sampling sub-network comprises:
deploying the at least one second sampling sub-network to target hardware to obtain a hardware performance index; alternatively, the hardware performance indicator is determined based on the convolution kernel configuration parameters and the size of the quantized bit values for each layer of the at least one second sampling sub-network.
7. The method of claim 6, further comprising, prior to hardware evaluation of the at least one second sampling subnetwork:
establishing a correspondence table between the selections of convolution kernel configuration parameters and quantization bit values of the at least one second sampling sub-network and the hardware performance indexes;
correspondingly, determining the hardware performance index according to the convolution kernel configuration parameters and quantization bit values of the at least one second sampling sub-network comprises:
looking up the corresponding hardware performance index in the correspondence table according to the convolution kernel configuration parameters and quantization bit values of the at least one second sampling sub-network.
8. The method of claim 2, further comprising, after obtaining the second super-network:
determining the hardware resources required by the second super-network and a preset resource limit; and
if the required hardware resources exceed the preset resource limit, reducing the quantization bit value of each sub-network so that the required hardware resources are less than or equal to the preset resource limit.
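Claim 8's fallback can be sketched as a simple loop that steps bit widths down until the budget is met. This is an illustrative reading, not the patent's implementation; the linear cost model, the candidate bit widths, and the names (`enforce_budget`, `cost_per_bit`) are assumptions.

```python
def enforce_budget(bit_widths, cost_per_bit, budget):
    """Hypothetical sketch of claim 8: while the estimated resource need
    exceeds the preset limit, step a layer's quantization bit value down.

    bit_widths:   mutable list of per-layer bit widths, e.g. [8, 8, 4].
    cost_per_bit: resource units consumed per bit per layer (assumed linear).
    budget:       the preset limited resource.
    """
    allowed = [2, 4, 8]  # assumed candidate bit widths, smallest first
    while sum(b * cost_per_bit for b in bit_widths) > budget:
        # Reduce the widest layer first; stop if everything is at minimum.
        i = max(range(len(bit_widths)), key=lambda j: bit_widths[j])
        idx = allowed.index(bit_widths[i])
        if idx == 0:
            break  # cannot shrink any further
        bit_widths[i] = allowed[idx - 1]
    return bit_widths
```

Reducing the widest layer first spends accuracy where the marginal resource saving is largest, a common heuristic for fitting mixed-precision networks into a fixed memory or compute envelope.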
9. A differentiable search apparatus for a mixed-precision neural network, comprising:
an acquisition module, configured to acquire an initialized super-network, wherein the super-network comprises a plurality of sub-networks and each sub-network carries hyper-parameters;
a first updating module, configured to update the hyper-parameters based on a differentiable search method to obtain a first super-network;
a second updating module, configured to evaluate the hardware performance of the first super-network and update the hyper-parameters of the first super-network according to the evaluation result to obtain a second super-network; and
a judging module, configured to judge whether an update termination condition is met and, if so, to determine the second super-network as the target neural network; otherwise, to return to the operation of updating the hyper-parameters based on the differentiable search method to obtain the first super-network.
10. The apparatus of claim 9, wherein the super-network is a convolutional neural network and the sub-network comprises at least one network layer; the hyper-parameters comprise an influence factor of the convolution kernel configuration parameter and an influence factor of the quantization bit value of each network layer, and are continuously differentiable.
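What makes the influence factors of claim 10 "continuously differentiable" is, in typical differentiable-NAS formulations, a softmax relaxation: each candidate configuration's output is weighted by the softmax of its influence factor, so architecture choices admit gradients. A minimal sketch under that assumption (scalar outputs for brevity; real outputs are tensors, and names like `mixed_op` are hypothetical):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of influence factors.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_op(candidate_outputs, influence_factors):
    """Weight each candidate configuration's output by the softmax of its
    influence factor, yielding a differentiable mixture instead of a hard
    architectural choice."""
    weights = softmax(influence_factors)
    return sum(w * y for w, y in zip(weights, candidate_outputs))
```

As training pushes one influence factor well above the others, its softmax weight approaches 1 and the mixture collapses toward a single discrete choice of kernel configuration and bit width.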
CN202011249481.1A 2020-11-10 2020-11-10 Differentiable searching method and device for mixed precision neural network Active CN112364981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011249481.1A CN112364981B (en) 2020-11-10 2020-11-10 Differentiable searching method and device for mixed precision neural network


Publications (2)

Publication Number Publication Date
CN112364981A true CN112364981A (en) 2021-02-12
CN112364981B CN112364981B (en) 2022-11-22

Family

ID=74509525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011249481.1A Active CN112364981B (en) 2020-11-10 2020-11-10 Differentiable searching method and device for mixed precision neural network

Country Status (1)

Country Link
CN (1) CN112364981B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488971A (en) * 2020-04-09 2020-08-04 北京百度网讯科技有限公司 Neural network model searching method and device, and image processing method and device
CN111553480A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network searching method and device, computer readable medium and electronic equipment
CN111582453A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and device for generating neural network model
CN111639797A (en) * 2020-05-26 2020-09-08 北京师范大学 Gumbel-softmax technology-based combined optimization method
WO2020190542A1 (en) * 2019-03-18 2020-09-24 Microsoft Technology Licensing, Llc Quantization-aware neural architecture search
CN111783951A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Model obtaining method, device, equipment and storage medium based on hyper network
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAN CAI et al.: "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware", arXiv *
SHANG Diya et al.: "Survey of Neural Architecture Search Algorithms Based on Gradient-Free Evolution", Computer Engineering *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112908446A (en) * 2021-03-20 2021-06-04 张磊 Automatic mixing control method for liquid medicine in endocrinology department
CN112908446B (en) * 2021-03-20 2022-03-22 张磊 Automatic mixing control method for liquid medicine in endocrinology department
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment
WO2023015674A1 (en) * 2021-08-12 2023-02-16 北京交通大学 Multi-bit-width quantization method for deep convolutional neural network
CN113902099A (en) * 2021-10-08 2022-01-07 电子科技大学 Neural network design and optimization method based on software and hardware joint learning
CN113902099B (en) * 2021-10-08 2023-06-02 电子科技大学 Neural network design and optimization method based on software and hardware joint learning
CN115271043A (en) * 2022-07-28 2022-11-01 小米汽车科技有限公司 Model tuning method, model tuning device and storage medium
CN115271043B (en) * 2022-07-28 2023-10-20 小米汽车科技有限公司 Model tuning method, device and storage medium
CN115017377A (en) * 2022-08-05 2022-09-06 深圳比特微电子科技有限公司 Method, device and computing equipment for searching target model
CN117173551A (en) * 2023-11-02 2023-12-05 佛山科学技术学院 Scene self-adaptive unsupervised underwater weak and small target detection method and system
CN117173551B (en) * 2023-11-02 2024-02-09 佛山科学技术学院 Scene self-adaptive unsupervised underwater weak and small target detection method and system


Similar Documents

Publication Publication Date Title
CN112364981B (en) Differentiable searching method and device for mixed precision neural network
CN108304921B (en) Convolutional neural network training method and image processing method and device
Li et al. Edge AI: On-demand accelerating deep neural network inference via edge computing
Alippi et al. Moving convolutional neural networks to embedded systems: the alexnet and VGG-16 case
WO2022027937A1 (en) Neural network compression method, apparatus and device, and storage medium
EP4152154A1 (en) Adaptive artificial neural network selection techniques
JP2023510566A (en) Adaptive search method and apparatus for neural networks
CN111382868A (en) Neural network structure search method and neural network structure search device
CN112860411B (en) Edge computing method and system based on model compression and service distribution
CN112988285B (en) Task unloading method and device, electronic equipment and storage medium
CN113505883A (en) Neural network training method and device
CN105573737A (en) Method for increasing operating efficiency of rule engines
CN110531996A (en) Calculating task discharging method based on particle group optimizing under a kind of more thin cloud environment
CN114580636A (en) Neural network lightweight deployment method based on three-target joint optimization
CN115562756A (en) Multi-access edge computing vehicle task unloading method and system
CN114418121A (en) Model training method, object processing method and device, electronic device and medium
CN115001937B (en) Smart city Internet of things-oriented fault prediction method and device
CN112149805B (en) Acceleration and compression method and system of deep neural network based on frame search
CN111406263A (en) Method and device for searching neural network architecture
CN114648103A (en) Automatic multi-objective hardware optimization for processing deep learning networks
CN117914690A (en) Edge node network fault prediction method based on deep learning GCN-LSTM
CN112862021A (en) Content labeling method and related device
CN117521752A (en) Neural network acceleration method and system based on FPGA
CN116796821A (en) Efficient neural network architecture searching method and device for 3D target detection algorithm
US20240281930A1 (en) Network model compression method, apparatus and device, image generation method, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240124

Address after: 518000, Building 307, Building 2, Nanshan Zhiyuan Chongwen Park, No. 3370 Liuxian Avenue, Fuguang Community, Taoyuan Street, Nanshan District, Shenzhen, Guangdong Province

Patentee after: Shenzhen Maitexin Technology Co.,Ltd.

Country or region after: China

Address before: No. 1088, Xili Xueyuan Avenue, Nanshan District, Shenzhen, Guangdong Province

Patentee before: Southern University of Science and Technology

Country or region before: China
