CN112149809A - Model hyper-parameter determination method and device, computing device and medium


Info

Publication number
CN112149809A
Authority
CN
China
Prior art keywords
hyper-parameter
neural network model
parameter value
codes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011148224.9A
Other languages
Chinese (zh)
Inventor
希滕 (Xi Teng)
张刚 (Zhang Gang)
温圣召 (Wen Shengzhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2020-10-23
Filing date: 2020-10-23
Publication date: 2020-12-29
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011148224.9A
Publication of CN112149809A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The present disclosure provides a hyper-parameter determination method and apparatus for a neural network model, a computing device, and a medium, relating to the field of artificial intelligence, in particular to deep learning and computer vision, and applicable to image processing scenarios. The hyper-parameter determination method of the neural network model comprises the following steps: constructing a plurality of search spaces in a neural network model; acquiring, for each search space in the plurality of search spaces, a corresponding hyper-parameter value set; acquiring a set of codes generated by an encoder, wherein the number of codes in the set of codes is the same as the number of the plurality of search spaces; and determining the hyper-parameter value corresponding to each search space according to the set of codes and the acquired hyper-parameter value sets.

Description

Model hyper-parameter determination method and device, computing device and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to deep learning and computer vision applicable to image processing scenarios, and specifically to a method and an apparatus for determining hyper-parameters of a neural network model, a computing device, and a medium.
Background
Training a deep neural network is complicated by the fact that the distribution of each layer's inputs changes during training as the parameters of the preceding layers change. This requires low learning rates and careful parameter initialization, which slows down training and makes it difficult to train models with saturating nonlinearities. This phenomenon is known as internal covariate shift, and it can be addressed by normalizing the layer inputs. The strength of the approach comes from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization (BN) allows much higher learning rates to be used and makes training far less sensitive to parameter initialization.
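For reference, the standard BN transform over a mini-batch $\mathcal{B}=\{x_1,\dots,x_m\}$ (following the original batch normalization formulation) can be written as:

$$\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_{\mathcal{B}}^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^2,$$

$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta,$$

where $\gamma$ and $\beta$ are learned scale and shift parameters, while quantities such as $\epsilon$ and the momentum of the running statistics are BN hyper-parameters that must be set rather than learned; these are the kind of per-layer values the present method searches over.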
BN hyper-parameters strongly influence both the training speed of a model and its final accuracy; however, optimal BN parameters are difficult to find by manual tuning. In existing applications, every layer of a model usually uses the same BN parameters, yet a model structure often has tens or even hundreds of layers; applying the same BN strategy to all layers results in poor model accuracy.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
According to an aspect of the present disclosure, there is provided a method for determining hyper-parameters of a neural network model, including: constructing a plurality of search spaces in the neural network model; acquiring, for each search space in the plurality of search spaces, a corresponding hyper-parameter value set; obtaining a set of codes generated by an encoder, wherein the number of codes in the set of codes is the same as the number of the plurality of search spaces; and determining a hyper-parameter value corresponding to each search space according to the set of codes and the acquired hyper-parameter value sets.
According to another aspect of the present disclosure, there is provided a hyper-parameter determination apparatus of a neural network model, including: a search space construction unit configured to construct a plurality of search spaces in the neural network model; a first obtaining unit configured to obtain, for each of the plurality of search spaces, a corresponding hyper-parameter value set; a second obtaining unit configured to obtain a set of codes generated by an encoder, wherein the number of codes in the set of codes is the same as the number of the plurality of search spaces; and a first determining unit configured to determine a hyper-parameter value corresponding to each search space according to the set of codes and the acquired hyper-parameter value sets.
According to yet another aspect of the present disclosure, there is provided a computing device comprising: a processor; and a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform a method of hyper-parameter determination of a neural network model according to one aspect of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of a computing device, cause the computing device to perform a method of hyper-parameter determination of a neural network model according to one aspect of the present disclosure.
According to one aspect of the disclosure, the method for determining the hyper-parameters of the neural network model automatically determines the relevant hyper-parameter values in different search spaces through the introduced search spaces, so that a group of hyper-parameter values are automatically set for the model, and the speed and the precision of the model on specific hardware are improved.
These and other aspects of the disclosure will be apparent from and elucidated with reference to the embodiments described hereinafter.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain the exemplary implementations. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 shows a flow diagram of a method of hyper-parameter determination of a neural network model according to an example embodiment of the present disclosure;
FIG. 2 shows a schematic flow diagram of a method of hyper-parameter determination of a neural network model according to an exemplary embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a hyper-parameter determination apparatus of a neural network model according to an exemplary embodiment of the present disclosure; and
FIG. 4 illustrates a block diagram of an exemplary computing device that can be used to implement embodiments of the present disclosure.
Detailed Description
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
The method or apparatus of the present disclosure may be applied to a server, or to a system architecture including a terminal device, a network, and a server. The network serves as the medium providing communication links between the terminal devices and the server, and may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The terminal device may be a user-side device on which various client applications may be installed, such as image processing applications, search applications, and voice service applications. The terminal device may be hardware or software. When the terminal device is hardware, it may be any of various electronic devices, including but not limited to a smart phone, a tablet computer, an e-book reader, a laptop computer, a desktop computer, and the like. When the terminal device is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module; this is not specifically limited herein.
The server may be a server running various services, for example services for object detection and recognition based on data such as images, video, voice, text, and digital signals, or for text or voice recognition, signal conversion, and the like. The server can acquire deep learning task data to construct training samples and train a neural network model for executing a deep learning task.
The server may be a backend server providing backend support for applications installed on the terminal device. For example, the server may search for a neural network model structure suitable for operation on the terminal device according to the terminal's operating environment; specifically, it may construct a super network, train the super network, and evaluate the performance of neural network models of different structures based on the trained super network, thereby determining the structure of the neural network model matched with the terminal device. The server can also receive data to be processed sent by the terminal device, process the data using a neural network model searched out based on the trained super network, and return the processing result to the terminal device.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., for providing distributed services), or as a single piece of software or software module; this is not specifically limited herein.
There are generally two types of parameters in machine learning: one type can be learned and estimated from data, while the other type cannot be estimated from data and can only be designed and set based on human experience; the latter are called hyper-parameters. Framework parameters in a machine learning model, such as the number of classes in a clustering method, the number of topics in a topic model, or the related parameters in BN, are all hyper-parameters. They differ from the parameters (weights) learned during training: hyper-parameters are set before the learning process starts, usually manually, and are adjusted by continuous trial and error or enumerated by exhaustively trying a series of parameter sets (known as grid search). Hyper-parameter optimization selects a group of optimal hyper-parameters for the learner so as to improve learning performance and effect. In general, every layer in a model is tuned with the same hyper-parameters, but a model structure often has tens or even hundreds of layers; applying the same strategy to all layers results in poor model accuracy.
Therefore, according to an aspect of the present disclosure, as shown in fig. 1, there is provided a method for hyper-parameter determination of a neural network model, comprising: constructing a plurality of search spaces in the neural network model (step 110); acquiring a hyper-parameter value set corresponding to each of the plurality of search spaces (step 120); acquiring a set of codes generated by an encoder, wherein the number of codes in the set of codes is the same as the number of the search spaces (step 130); and determining a hyper-parameter value corresponding to each search space according to the set of codes and the acquired hyper-parameter value sets (step 140).
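The four steps can be pictured with a minimal Python sketch; everything below (the space names, `encoder_generate`, the value sets) is a hypothetical stand-in used only to make the data flow concrete, not an API of the disclosure:

```python
import random

# Illustrative sketch of steps 110-140; all names are hypothetical stand-ins.
search_spaces = ["layer1", "layers2-3", "layers4-7"]               # step 110
value_sets = {s: [1e-4, 1e-3, 1e-2, 1e-1] for s in search_spaces}  # step 120

def encoder_generate(spaces, sets):
    """Stand-in encoder: emits one code (an index into the value set)
    per search space; a real encoder would be a learned model."""
    return [random.randrange(len(sets[s])) for s in spaces]        # step 130

codes = encoder_generate(search_spaces, value_sets)
hyper_params = {s: value_sets[s][c]                                # step 140
                for s, c in zip(search_spaces, codes)}
print(hyper_params)  # e.g. {'layer1': 0.001, 'layers2-3': 0.1, ...}
```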
According to the hyper-parameter determining method of the neural network model, relevant hyper-parameter values in different search spaces are automatically determined through the introduced search spaces, so that a group of hyper-parameter values are automatically set for the model, and the speed and the precision of the model on specific hardware are improved.
Manually designing a network topology requires extensive experience and numerous attempts, and the many parameters involved produce a combinatorial explosion, making conventional random search hardly feasible. Neural network Architecture Search (NAS) replaces this tedious manual process with an algorithm that automatically searches a massive search space for an optimal neural network architecture. NAS mainly comprises a search space, a search strategy, and a performance evaluation strategy. Its principle is to search for the optimal network structure, using a certain strategy, from a set of candidate neural network structures called the search space.
Here, the search space refers to the candidate set of network structures to be searched. Search spaces roughly divide into global search spaces and cell-based search spaces: a global search space covers the entire network structure, whereas a cell-based search space searches only several small structures, which are then stacked and spliced into a complete large network. Common search methods include random search, Bayesian optimization, evolutionary algorithms, reinforcement learning, gradient-based algorithms, and the like.
In some examples, search space design rules may be formulated using NAS-related algorithms to construct the multiple search spaces. According to some embodiments, the plurality of search spaces comprises a coarse-grained space and/or a fine-grained space, wherein in the coarse-grained space one or more layers of the network share the same hyper-parameter value, and in the fine-grained space one or more channels share the same hyper-parameter value.
In some examples, the search space may be constructed by layers: when constructing the search spaces in the neural network model, each layer may be constructed as its own search space, or several layers may be grouped together into one search space. It should be understood that the number of layers need not be the same across search spaces; for example, the first layer of the neural network model may be constructed as a first search space, the second and third layers together as a second search space, the fourth, fifth, sixth, and seventh layers together as a third search space, and so on.
In some examples, the search space may also be constructed by channels: in a neural network model, a layer may include multiple channels, so multiple search spaces may be constructed from channels. For example, each channel may be placed in its own search space, or the number of channels included in each search space may be customized. Furthermore, search spaces may be constructed sequentially in the order of layers and channels, or spaced channels within a certain layer and/or channels in different layers may be combined into one search space in a customized manner; that is, the specific structure of the search space is not limited here, as long as the method according to the present disclosure can be implemented.
In some examples, the search space may also be constructed from both layers and channels. For example, a search space may be configured to include one or more complete layers together with one or more individual channels; alternatively, part of the search spaces may be constructed by layers and another part by channels. That is, the specific structure of the search space is not limited here, as long as it can be used to implement the method according to the present disclosure.
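As an illustration of the grouping options above, a sketch under the assumption that layers and channels are simply identified by indices (the grouping itself is the example from the text; the data representation is ours):

```python
# Hypothetical grouping of a 7-layer model into three coarse-grained
# search spaces, matching the layer example given above.
coarse_spaces = [
    [1],           # first search space: layer 1
    [2, 3],        # second search space: layers 2 and 3
    [4, 5, 6, 7],  # third search space: layers 4 through 7
]

# A fine-grained alternative: group individual channels, here the first
# and second halves of a hypothetical 32-channel layer 2.
fine_spaces = [
    [("layer2", ch) for ch in range(0, 16)],
    [("layer2", ch) for ch in range(16, 32)],
]
print(len(coarse_spaces), len(fine_spaces))  # 3 coarse spaces, 2 fine spaces
```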
According to some embodiments, the hyper-parameters comprise batch normalization (BN) hyper-parameters.
Training a neural network model is a complex process: a slight change in the first layers of the network is cumulatively amplified in the later layers. Once the distribution of the input data to some layer changes, that layer needs to adapt to the new data distribution, so if the distribution of the training data keeps changing during training, the training speed of the model suffers. Take the second layer of the network as an example: its input is computed from the parameters and input of the first layer, and the first layer's parameters change throughout training, which inevitably changes the distribution of the input data of every later layer. This phenomenon is called internal covariate shift and can be addressed by normalizing the layer inputs; for example, a normalization layer is inserted at the input of each layer of the model, i.e., normalization is performed first and the result is then passed to the next layer. As the network deepens, or as training proceeds, the distribution of the activation inputs before the nonlinear transformation gradually shifts; typically the overall distribution drifts toward the two saturated ends of the nonlinear function's input range, so the gradients of the lower layers vanish during backpropagation. This is the essential reason deep neural networks converge more and more slowly. Batch normalization forcibly pulls the input distribution of each hidden-layer neuron, which would otherwise drift toward the saturated limits of the nonlinear mapping, back to a standard normal distribution with mean 0 and variance 1, so that the inputs to the nonlinear transformation fall in the region where the function is sensitive to its input, avoiding the vanishing-gradient problem. Moreover, larger gradients mean faster learning convergence, which greatly accelerates training.
BN hyper-parameters can be determined well using the method according to the present disclosure, without having to be specified manually prior to model training.
According to some embodiments, the code values generated by the encoder correspond to the set size of the hyper-parameter value set. For example, suppose 5 search spaces are constructed in the neural network model and the hyper-parameter value set for each search space is preset to {0.0001, 0.001, 0.01, 0.1}. Each code in a set of codes generated by the encoder is then an index into the corresponding value set, so the range of code values corresponds to the number of values the set contains. If the encoder outputs a set of codes {0, 1, 3, 1, 0}, the hyper-parameter value of the first search space is the first entry of its value set, i.e., 0.0001; that of the second search space is the second entry, i.e., 0.001; that of the third search space is the fourth entry, i.e., 0.1; that of the fourth search space is the second entry, i.e., 0.001; and that of the fifth search space is the first entry, i.e., 0.0001. It should be understood that different search spaces may have different hyper-parameter value sets; for example, the set for the first search space may be {0.0001, 0.001, 0.01, 0.1} while the set for the second is {0.001, 0.002, 0.02, 0.2}, and so on.
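The worked example above is small enough to replay directly; a sketch (the list-of-lists representation is an assumption of ours):

```python
# Replaying the example: 5 search spaces, each with the preset value set
# {0.0001, 0.001, 0.01, 0.1}; each code indexes into its space's set.
value_sets = [[0.0001, 0.001, 0.01, 0.1] for _ in range(5)]
codes = [0, 1, 3, 1, 0]  # one code per search space

hyper_params = [vs[c] for vs, c in zip(value_sets, codes)]
print(hyper_params)  # [0.0001, 0.001, 0.1, 0.001, 0.0001]
```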
According to some embodiments, the method according to the present disclosure further comprises: performing iterative training on the neural network model for a preset number of times based on the determined hyper-parameter values, to obtain the precision loss of the neural network model after quantization; in response to an encoder-update preset condition not being met, updating the encoder according to the precision loss to obtain a newly generated set of codes; and determining a hyper-parameter value corresponding to each search space according to the newly generated set of codes and the acquired hyper-parameter value sets.
According to some embodiments, training the neural network model for a predetermined number of iterations based on the determined hyper-parameter values to obtain the quantized precision loss of the neural network model comprises: in response to the number of iterative-training passes not reaching the predetermined number, performing an iterative training process comprising the following operations: converting the floating-point parameters of the neural network model into integer parameters for forward propagation, either in every training pass or after a preset number of training passes; obtaining the precision loss of the neural network model as the post-quantization precision loss; and performing backpropagation according to the precision loss to update the floating-point parameters of the neural network model.
Here, model quantization is the process of approximating the continuous-valued (or very-many-valued) floating-point weights of a model, or the tensor data flowing through it, by a finite number of (or fewer) discrete values, generally int8, with low loss of inference precision. It approximates 32-bit finite-range floating-point data with a data type of fewer bits, while the model's inputs and outputs remain floating point, thereby reducing the model's size, reducing its memory consumption, and speeding up model inference.
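A minimal sketch of the idea, assuming simple symmetric per-tensor int8 quantization (one of several possible schemes, chosen here only for illustration):

```python
import numpy as np

def fake_quantize_int8(x: np.ndarray) -> np.ndarray:
    """Approximate a float32 tensor with symmetric int8 values:
    quantize to integers in [-127, 127], then dequantize back."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
w_q = fake_quantize_int8(w)
print("max quantization error:", float(np.abs(w - w_q).max()))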
After the codes generated by the encoder are converted into a group of BN hyper-parameters, the neural network model can be iteratively trained with those BN hyper-parameters. In some examples, one iteration of training may be completed by converting the floating-point (e.g., float32, float16) model parameters into integer (e.g., int8) model parameters for forward propagation in each training pass, evaluating the precision loss of the model, and updating the floating-point model parameters through backpropagation according to that loss. In some examples, the above training that accounts for post-quantization precision loss may instead be performed after one or more passes of normal model training, thereby completing one iteration. When the number of training iterations reaches the preset number, the precision loss output by the last iteration is fed back to the encoder to update it; a new set of codes generated by the updated encoder is obtained, and the BN hyper-parameter value corresponding to each search space is determined from the newly generated codes and the acquired hyper-parameter value sets, so that the preset number of training iterations is executed again, and so on until the encoder-update preset condition is reached.
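A self-contained toy version of this inner loop, as a sketch: a plain-NumPy linear model stands in for the neural network, and the point illustrated is using quantized weights in the forward pass while backpropagation updates the underlying float weights; every name here is a stand-in, not the disclosure's implementation.

```python
import numpy as np

def fake_quantize_int8(x):
    """Symmetric int8 fake quantization: quantize, then dequantize."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))
y = X @ rng.standard_normal(8)         # targets from a hidden "true" model
w = np.zeros(8)                        # float master weights
lr, num_iterations = 0.05, 200         # predetermined number of iterations

for step in range(num_iterations):
    w_q = fake_quantize_int8(w)        # integer-type parameters...
    pred = X @ w_q                     # ...used for forward propagation
    precision_loss = np.mean((pred - y) ** 2)
    grad = 2 * X.T @ (pred - y) / len(y)  # straight-through gradient
    w -= lr * grad                     # backprop updates the *float* weights

print(f"post-quantization precision loss: {precision_loss:.5f}")
# precision_loss is what would be fed back to update the encoder.
```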
According to some embodiments, the encoder is based on a neural network model. For example, the encoder is a deep learning based neural network model.
A neural network is a complex network system formed by a large number of simple processing units (called neurons) widely connected to each other, reflects many basic features of human brain functions, and is a highly complex nonlinear dynamical learning system. The neural network has the capabilities of large-scale parallel, distributed storage and processing, self-organization, self-adaptation and self-learning, and is particularly suitable for processing inaccurate and fuzzy information processing problems which need to consider many factors and conditions simultaneously. Learning is one of the most important and most compelling features of neural networks, which can perform learning training on pattern samples provided by external environments. In an example according to the present disclosure, the training accuracy of the model is fed back to the encoder model, so that the encoder model learns to further optimize the generated code, thereby continuously optimizing the value of the hyper-parameter.
In some examples, as shown in fig. 2, after the plurality of search spaces are constructed in the neural network model and the hyper-parameter value set corresponding to each search space is obtained (step 210), an encoder is initialized (step 220) so that it generates a set of codes, which is then obtained (step 230). The generated set of codes is decoded according to each search space's hyper-parameter value set (step 240) to obtain the hyper-parameter value corresponding to each search space. The search spaces constructed in the neural network model are set to the decoded hyper-parameter values, the neural network model is iteratively trained for a preset number of times, and quantization loss is added during the iterative training to obtain the post-quantization precision loss of the model (step 250). In response to the preset number of iterations not being reached (step 260, no), one more iteration of training is executed and its precision loss obtained; otherwise (step 260, yes), the next step is performed. In response to the encoder-update preset condition not being met (step 270, no), the precision loss is fed back to the encoder model, which is updated based on a deep learning algorithm (step 280). The updated encoder generates a new set of codes; the hyper-parameter values obtained by decoding the new codes against each search space's value set are used to iteratively train the neural network model for the preset number of times and obtain the post-quantization precision loss again, and so on, until the encoder-update preset condition is met (step 270, yes), after which the encoder is no longer updated and the hyper-parameter values finally determined by the encoder are used for training and further use of the neural network model.
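Putting the whole fig. 2 flow together as a toy sketch: the encoder below is a trivial random stand-in with a no-op update (the disclosure's encoder is a learned, deep-learning-based model), and `train_and_measure` abstracts away the quantization-aware training of step 250; all names are hypothetical.

```python
import random

value_sets = [[1e-4, 1e-3, 1e-2, 1e-1] for _ in range(5)]   # step 210

class ToyEncoder:                                           # step 220
    def generate(self):                                     # step 230
        return [random.randrange(len(vs)) for vs in value_sets]
    def update(self, precision_loss):                       # step 280
        pass  # a learned encoder would train on this feedback

def train_and_measure(hyper_params):
    # Stand-in for the preset number of quantization-aware training
    # iterations (step 250); returns a fake post-quantization loss.
    return sum(abs(h - 1e-3) for h in hyper_params)

encoder = ToyEncoder()
best_loss, best_hp = float("inf"), None
for update_round in range(20):      # stop condition: preset update count
    codes = encoder.generate()
    hp = [vs[c] for vs, c in zip(value_sets, codes)]        # step 240
    loss = train_and_measure(hp)
    if loss < best_loss:
        best_loss, best_hp = loss, hp
    encoder.update(loss)                                    # steps 270-280
print("hyper-parameters finally determined:", best_hp)
```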
According to some embodiments, the encoder-update preset conditions include one or more of the following: the model training precision reaches a preset precision, and the number of encoder updates reaches a preset number. That is, it may be set that the encoder is not updated further once the training precision of the neural network model reaches the preset precision; or once the encoder has been updated the preset number of times; or once both the training precision has reached the preset precision and the encoder has been updated the preset number of times.
It should be understood that other preset conditions consistent with the present disclosure are also possible and not limited herein.
In some examples, the quantization comprises one or more of: online quantization and offline quantization.
According to the method of the present disclosure, different BN hyper-parameters can be customized for different layers or channels of the neural network model, improving the training precision of the neural network model; consequently, the running speed and accuracy of the model on specific hardware are improved, the core competitiveness of the product is enhanced, and the running cost of the product is reduced.
By the above method, the training precision of the model can be improved remarkably; equivalently, the original precision can be achieved with a smaller model, so that images are processed faster on hardware by the trained model, reducing product cost and improving the product's core competitiveness.
According to another aspect of the present disclosure, as shown in fig. 3, there is provided a hyper-parameter determining apparatus 300 of a neural network model, including: a search space construction unit 310 configured to construct a plurality of search spaces in the neural network model; a first obtaining unit 320, configured to obtain, for each of the plurality of search spaces, a corresponding hyper-parameter value set; a second obtaining unit 330 configured to obtain a set of codes generated by an encoder, wherein the number of codes in the set of codes is the same as the number of the plurality of search spaces; and a first determining unit 340 configured to determine a hyper-parameter value corresponding to each of the search spaces according to the set of codes and the obtained hyper-parameter value set.
In some embodiments, the apparatus according to the present disclosure further comprises: a training unit configured to iteratively train the neural network model for a preset number of times based on the determined hyper-parameter values, so as to obtain the precision loss of the quantized neural network model; an updating unit configured to, in response to the encoder-update preset condition not being met, update the encoder according to the precision loss to obtain a newly generated set of codes; and a second determining unit configured to determine a hyper-parameter value corresponding to each search space according to the newly generated set of codes and the acquired hyper-parameter value sets.
According to some embodiments, training the neural network model for a predetermined number of iterations based on the determined hyper-parameter values to obtain the quantized precision loss of the neural network model comprises: in response to the number of iterative-training passes not reaching the predetermined number, performing an iterative training process comprising the following operations: converting the floating-point parameters of the neural network model into integer parameters for forward propagation, either in every training pass or after a preset number of training passes; obtaining the precision loss of the neural network model as the post-quantization precision loss; and performing backpropagation according to the precision loss to update the floating-point parameters of the neural network model.
According to some embodiments, the plurality of search spaces comprise a coarse-grained space and/or a fine-grained space, wherein in the coarse-grained space one or more layers of the network structure share the same hyper-parameter value, and in the fine-grained space one or more channels share the same hyper-parameter value.
According to some embodiments, the encoder is based on a neural network model.
According to some embodiments, the preset conditions comprise one or more of the following: the model training precision reaches a preset precision, and the number of encoder updates reaches a preset number.
According to some embodiments, the code values generated by the encoder correspond to the set size of the hyper-parameter value set.
According to some embodiments, the hyper-parameters comprise batch normalization (BN) hyper-parameters.
According to some embodiments, the quantization comprises one or more of: online quantization and offline quantization.
Here, the operations of the units 310 to 340 of the apparatus 300 for determining hyper-parameters of a neural network model are similar to the operations of the steps 110 to 140 described above, and are not described herein again.
According to yet another aspect of the present disclosure, there is provided a computing device comprising: a processor; and a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method of hyper-parameter determination of a neural network model as described in the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of a computing device, cause the computing device to perform the method of hyper-parameter determination of a neural network model described in the present disclosure.
Referring to fig. 4, a computing device 2000, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. The computing device 2000 may be any machine configured to perform processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a robot, a smart phone, an on-board computer, or any combination thereof. The above-described hyper-parameter determination methods of the neural network model may each be implemented in whole or at least in part by a computing device 2000 or similar device or system.
Computing device 2000 may include elements connected to bus 2002 (possibly via one or more interfaces) or in communication with bus 2002. For example, computing device 2000 may include a bus 2002, one or more processors 2004, one or more input devices 2006, and one or more output devices 2008. The one or more processors 2004 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., special processing chips). Input device 2006 may be any type of device capable of inputting information to computing device 2000 and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote control. Output device 2008 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The computing device 2000 may also include or be connected with a non-transitory storage device 2010, which may be any storage device that is non-transitory and capable of data storage, and may include, but is not limited to, a magnetic disk drive, an optical storage device, solid state memory, a floppy disk, a flexible disk, a hard disk, a magnetic tape, or any other magnetic medium, an optical disk or any other optical medium, a ROM (read only memory), a RAM (random access memory), a cache memory, and/or any other memory chip or cartridge, and/or any other medium from which a computer may read data, instructions, and/or code. The non-transitory storage device 2010 may be removable from the interface. The non-transitory storage device 2010 may have data/programs (including instructions)/code for implementing the above-described methods and steps. Computing device 2000 may also include a communication device 2012. The communication device 2012 may be any type of device or system that enables communication with external devices and/or with a network and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication devices, and/or chipsets such as Bluetooth(TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing device 2000 may also include a working memory 2014, which may be any type of working memory that can store programs (including instructions) and/or data useful for the operation of the processor 2004, and may include, but is not limited to, random access memory and/or read only memory devices.
Software elements (programs) may be located in the working memory 2014, including, but not limited to, an operating system 2016, one or more application programs 2018, drivers, and/or other data and code. Instructions for performing the above-described methods and steps may be included in the one or more applications 2018, and the above-described method of hyper-parameter determination of a neural network model may be implemented by the processor 2004 reading and executing the instructions of the one or more applications 2018. More specifically, steps 110 to 140 of the above-described hyper-parameter determination method may be implemented, for example, by the processor 2004 executing the application 2018 having the instructions of steps 110 to 140. Further, other steps in the above-described method may be implemented, for example, by the processor 2004 executing an application 2018 having instructions to perform the respective steps. Executable code or source code of the instructions of the software elements (programs) may be stored in a non-transitory computer-readable storage medium (such as the storage device 2010 described above) and, upon execution, may be stored in the working memory 2014 (possibly compiled and/or installed). Executable code or source code for the instructions of the software elements (programs) may also be downloaded from a remote location.
It will also be appreciated that various modifications may be made according to specific requirements. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the disclosed methods and apparatus may be implemented by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or a hardware programming language such as VERILOG, VHDL, or C++, using logic and algorithms according to the present disclosure.
It should also be understood that the components of computing device 2000 may be distributed across a network. For example, some processes may be performed using one processor while other processes may be performed by another processor that is remote from the one processor. Other components of the computing system 2000 may also be similarly distributed. As such, the computing device 2000 may be interpreted as a distributed computing system that performs processing at multiple locations.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples and that the scope of the present invention is not limited by these embodiments or examples, but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. It is important that as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims (20)

1. A hyper-parameter determination method of a neural network model comprises the following steps:
constructing a plurality of search spaces in the neural network model;
acquiring, for each search space in the plurality of search spaces, a corresponding hyper-parameter value set;
obtaining a set of codes generated by an encoder, wherein the number of codes in the set of codes is the same as the number of the plurality of search spaces; and
determining a hyper-parameter value corresponding to each search space according to the set of codes and the acquired hyper-parameter value sets.
2. The method of claim 1, further comprising:
performing iterative training on the neural network model for a preset number of times based on the determined hyper-parameter values to obtain the precision loss of the neural network model after quantization;
in response to an encoder-update preset condition not being met, updating the encoder according to the precision loss to obtain a newly generated set of codes; and
determining a hyper-parameter value corresponding to each search space according to the newly generated set of codes and the acquired hyper-parameter value sets.
3. The method of claim 2, wherein performing iterative training on the neural network model for the preset number of times based on the determined hyper-parameter values to obtain the precision loss of the neural network model after quantization comprises:
in response to the number of iterative training times not reaching the predetermined number of times, performing an iterative training process, the iterative training process comprising the operations of:
converting the floating-point parameters of the neural network model into integer parameters for forward propagation, in each model training pass or after a preset number of model training passes;
obtaining the precision loss of the neural network model as the precision loss after quantization; and
performing back propagation according to the precision loss to update the floating-point parameters of the neural network model.
4. The method of claim 1, wherein the plurality of search spaces comprise a coarse-grained space and/or a fine-grained space, wherein,
in the coarse-grained space, one or more layers of the network structure share the same hyper-parameter value; and
in the fine-grained space, one or more channels share the same hyper-parameter value.
5. The method of claim 2, wherein the encoder is based on a neural network model.
6. The method of claim 2, wherein the preset conditions include one or more of the following: the model training precision reaches a preset precision, and the number of encoder updates reaches a preset number.
7. The method of claim 1, wherein the code values generated by the encoder correspond to the set size of the hyper-parameter value set.
8. The method of claim 1, wherein the hyper-parameter comprises a batch normalization (BN) hyper-parameter.
9. The method of claim 2, wherein the quantization comprises one or more of: online quantization and offline quantization.
10. A hyper-parameter determination apparatus of a neural network model, comprising:
a search space construction unit configured to construct a plurality of search spaces in the neural network model;
a first obtaining unit, configured to obtain, for each of the plurality of search spaces, a corresponding hyper-parameter value set;
a second obtaining unit configured to obtain a set of codes generated by an encoder, wherein the number of codes in the set of codes is the same as the number of the plurality of search spaces; and
a first determining unit configured to determine a hyper-parameter value corresponding to each search space according to the set of codes and the acquired hyper-parameter value sets.
11. The apparatus of claim 10, further comprising:
a training unit configured to perform iterative training on the neural network model for a preset number of times based on the determined hyper-parameter values, so as to obtain the precision loss of the quantized neural network model;
an updating unit configured to, in response to an encoder-update preset condition not being met, update the encoder according to the precision loss to obtain a newly generated set of codes; and
a second determining unit configured to determine a hyper-parameter value corresponding to each search space according to the newly generated set of codes and the acquired hyper-parameter value sets.
12. The apparatus of claim 11, wherein performing iterative training on the neural network model for the preset number of times based on the determined hyper-parameter values to obtain the precision loss of the quantized neural network model comprises:
in response to the number of iterative training times not reaching the predetermined number of times, performing an iterative training process, the iterative training process comprising the operations of:
converting the floating-point parameters of the neural network model into integer parameters for forward propagation, in each model training pass or after a preset number of model training passes;
obtaining the precision loss of the neural network model as the precision loss after quantization; and
performing back propagation according to the precision loss to update the floating-point parameters of the neural network model.
13. The apparatus of claim 10, wherein the plurality of search spaces comprise a coarse-grained space and/or a fine-grained space, wherein,
in the coarse-grained space, one or more layers of the network structure share the same hyper-parameter value; and
in the fine-grained space, one or more channels share the same hyper-parameter value.
14. The apparatus of claim 11, wherein the encoder is based on a neural network model.
15. The apparatus of claim 11, wherein the preset conditions include one or more of the following: the model training precision reaches a preset precision, and the number of encoder updates reaches a preset number.
16. The apparatus of claim 10, wherein the code values generated by the encoder correspond to the set size of the hyper-parameter value set.
17. The apparatus of claim 10, wherein the hyper-parameter comprises a batch normalization (BN) hyper-parameter.
18. The apparatus of claim 11, wherein the quantization comprises one or more of: online quantization and offline quantization.
19. A computing device, comprising:
a processor; and
a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-9.
20. A computer-readable storage medium storing a program, the program comprising instructions that when executed by a processor of a computing device cause the computing device to perform the method of any of claims 1-9.
CN202011148224.9A (priority date 2020-10-23; filing date 2020-10-23): Model hyper-parameter determination method and device, computing device and medium. Status: Pending. Publication: CN112149809A (en).

Priority Applications (1)

Application: CN202011148224.9A · Priority date: 2020-10-23 · Filing date: 2020-10-23 · Title: Model hyper-parameter determination method and device, computing device and medium

Applications Claiming Priority (1)

Application: CN202011148224.9A · Priority date: 2020-10-23 · Filing date: 2020-10-23 · Title: Model hyper-parameter determination method and device, computing device and medium

Publications (1)

CN112149809A · Publication date: 2020-12-29

Family

ID=73954882

Family Applications (1)

Application: CN202011148224.9A · Priority/filing date: 2020-10-23 · Status: Pending · Title: Model hyper-parameter determination method and device, computing device and medium

Country Status (1)

Country: CN · Publication: CN112149809A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861951A (en) * 2021-02-01 2021-05-28 上海依图网络科技有限公司 Method for determining image neural network parameters and electronic equipment
CN112861951B (en) * 2021-02-01 2024-03-26 上海依图网络科技有限公司 Image neural network parameter determining method and electronic equipment
CN114663662A (en) * 2022-05-23 2022-06-24 深圳思谋信息科技有限公司 Hyper-parameter searching method, device, computer equipment and storage medium
CN116992253A (en) * 2023-07-24 2023-11-03 中电金信软件有限公司 Method for determining value of super-parameter in target prediction model associated with target service

Similar Documents

Wang et al. Multi-objective feature selection based on artificial bee colony: An acceleration approach with variable sample size
JP7462623B2 (en) System and method for accelerating and embedding neural networks using activity sparsification
US20200265301A1 (en) Incremental training of machine learning tools
EP3711000B1 (en) Regularized neural network architecture search
CN110852438B (en) Model generation method and device
US20190130249A1 (en) Sequence-to-sequence prediction using a neural network model
EP3467723A1 (en) Machine learning based network model construction method and apparatus
WO2019212878A1 (en) Design flow for quantized neural networks
WO2022068623A1 (en) Model training method and related device
CN112149809A (en) Model hyper-parameter determination method and device, calculation device and medium
EP3788559A1 (en) Quantization for dnn accelerators
JP2021521505A (en) Application development platform and software development kit that provides comprehensive machine learning services
CN111406267A (en) Neural architecture search using performance-predictive neural networks
CN110807515A (en) Model generation method and device
CN113692594A (en) Fairness improvement through reinforcement learning
CN111985601A (en) Data identification method for incremental learning
WO2020214396A1 (en) Automatic feature subset selection based on meta-learning
US20190228297A1 (en) Artificial Intelligence Modelling Engine
WO2023150912A1 (en) Operator scheduling operation time comparison method and device, and storage medium
US20230267307A1 (en) Systems and Methods for Generation of Machine-Learned Multitask Models
CN112200296A (en) Network model quantification method and device, storage medium and electronic equipment
CN114286985A (en) Method and apparatus for predicting kernel tuning parameters
US20230108177A1 (en) Hardware-Aware Progressive Training Of Machine Learning Models
Basterrech et al. Evolutionary Echo State Network: A neuroevolutionary framework for time series prediction
CN112241786A (en) Model hyper-parameter determination method and device, calculation device and medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination