CN110633797B - Network model structure searching method and device and electronic equipment - Google Patents


Info

Publication number: CN110633797B
Authority: CN (China)
Prior art keywords: model structure, network model, hyper-parameter, performance, search
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201910863417.3A
Other languages: Chinese (zh)
Other versions: CN110633797A (en)
Inventors: 希滕, 张刚, 温圣召
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910863417.3A (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Publication of CN110633797A
Application granted; publication of CN110633797B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The application discloses a network model structure searching method and apparatus, and an electronic device, relating to the field of neural network architecture search. The implementation scheme is as follows: a sampling model structure is selected from a plurality of network model structures in a search space; the hyper-parameters of a probability distribution model are updated using the performance of the sampling model structure to obtain predicted values of the hyper-parameters; the performance of any network model structure is obtained from the probability distribution model with the predicted hyper-parameter values; and the network model structures that meet the search task are screened out according to their performance. When the search task changes, the network model structures in the search space do not need to be searched again: only those whose performance meets the constraint condition of the search task need to be selected. The search process is thus completely decoupled from the constraint condition, which improves search efficiency and reduces search cost.

Description

Network model structure searching method and device and electronic equipment
Technical Field
The present application relates to the field of computers, and more particularly, to the field of architecture search for neural networks.
Background
Deep learning techniques have achieved tremendous success in many directions, and Neural Architecture Search (NAS) has become a research hotspot in recent years. NAS replaces tedious manual design with an algorithm that automatically searches for a neural network architecture in a massive search space. Architecture search for a neural network proceeds as follows: first, a search space is defined and determined; then, a search strategy is determined according to the optimization algorithm adopted, such as reinforcement learning, an evolutionary algorithm, or Bayesian optimization; finally, the search yields a model structure together with its speed and performance.
Currently, architecture search methods for neural networks include automatic model structure search based on reinforcement learning, based on evolutionary algorithms, and based on gradients. All three methods generate a search strategy by treating the model structure as a black box. This brings several technical problems, the most important being that when various search tasks are performed, such as a classification task, a detection task, or a face recognition task, each search is based on the single constraint condition of a particular search task and yields a different search result. When a plurality of search tasks are performed, they can only be searched one by one; that is, the search process is strongly coupled to the constraint condition, and when the constraint condition changes, the search must be restarted.
Disclosure of Invention
The embodiments of the present application provide a network model structure searching method and apparatus, and an electronic device, so as to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present application provides a method for searching a network model structure, including:
selecting a sampling model structure from a plurality of network model structures in a search space;
updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
obtaining the performance of any network model structure according to the probability distribution model with the predicted values of the hyper-parameters;
and screening out the network model structure which accords with the search task according to the performance of the network model structure.
In this embodiment, a small number of network model structures are selected from the search space as sampling model structures, and their performance is used to update the hyper-parameters of the probability distribution model, from which the performance of any network model structure in the search space can be derived. When the search task changes, i.e., the constraint condition changes, the network model structures in the search space do not need to be searched again: since the performance of any network model structure is available, the structures that meet the constraint condition of the search task are selected directly according to their performance. The search process is completely decoupled from the constraint condition, search speed and accuracy are improved, and search cost is reduced.
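The four-step flow above can be sketched as a toy program. This is an illustrative sketch only, not the patent's implementation: the toy search space, the stand-in surrogate (a trivial linear fit in place of the probability distribution model), and all function names are assumptions.

```python
import random

# Toy search space: each structure is a tuple of per-layer channel choices.
SEARCH_SPACE = [(a, b) for a in (64, 128, 256) for b in (64, 128, 256)]

def true_performance(structure):
    """Stand-in for measured performance (would require training in practice)."""
    return sum(structure) / 512.0

def search(constraint, n_samples=3, seed=0):
    rng = random.Random(seed)
    # Step 1: select a few sampling model structures from the search space.
    samples = rng.sample(SEARCH_SPACE, n_samples)
    # Step 2: "update" the surrogate from their measured performance
    # (a trivial linear fit standing in for the probability distribution model).
    scale = sum(true_performance(s) / sum(s) for s in samples) / n_samples
    # Step 3: predict the performance of every structure in the search space.
    predicted = {s: scale * sum(s) for s in SEARCH_SPACE}
    # Step 4: screen out the structures whose predicted performance
    # meets the constraint of the current search task.
    return [s for s, p in predicted.items() if p >= constraint]

results = search(constraint=0.9)
```

Changing `constraint` reuses the same predictions without re-running steps 1 to 3, which is the decoupling the embodiment describes.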
In one embodiment, updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters includes:
obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
inputting the predicted performance, the real performance and the performance deviation of the sampling model, together with the prior distribution of the hyper-parameters, into a Bayesian estimation algorithm model, and outputting the posterior distribution of the hyper-parameters;
taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters obtained next time, and carrying out iterative calculation of preset times to obtain the predicted values of the hyper-parameters;
the preset times are preset iteration times according to the search task.
In this embodiment, the predicted values of the hyper-parameters approach the true values ever more closely through iterative updates. The introduction of a prior distribution over the hyper-parameters also makes transfer learning convenient between different search tasks, which improves search efficiency.
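As an illustration of how each round's posterior can serve as the next round's prior, the sketch below performs conjugate Gaussian updates of a single scalar hyper-parameter. The patent does not specify this particular form, so the conjugate-normal model and all numeric values are assumptions.

```python
def bayes_update(prior_mean, prior_var, observation, noise_var):
    """One conjugate Gaussian update: combine the prior with a noisy observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + observation / noise_var)
    return post_mean, post_var

# The posterior of each round is fed back in as the prior of the next round.
mean, var = 0.0, 10.0              # loose initial prior on the hyper-parameter
for obs in [0.9, 1.1, 1.0, 0.95]:  # noisy measurements of a "true" value near 1.0
    mean, var = bayes_update(mean, var, obs, noise_var=0.1)
```

After a few rounds the mean is close to the truth and the variance has shrunk, mirroring the "closer and closer to the true value" behavior described above.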
In one embodiment, obtaining the performance of any network model structure according to the probability distribution model with the predicted values of the hyper-parameters comprises:
constructing a kernel function and a mean function of the probability distribution model, wherein the mean function represents the relation between a network model structure and its performance, and the kernel function represents the relation between different network model structures;
respectively inputting digital codes corresponding to the network model structure into a kernel function and a mean function to obtain a variance and a mean of the network model structure;
wherein, the performance of the network model structure comprises variance and mean.
In the present embodiment, a kernel function and a mean function of the probability distribution model are constructed. Both the relation between a network model structure and its performance and the relation between different network model structures are taken into account, which improves the accuracy of the hyper-parameter updates and in turn the search precision.
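One possible (hypothetical) realization of such a pair of functions is a Gaussian (RBF) kernel over network codes, relating different structures, and a linear mean function, relating a structure's code to its expected performance. The exact forms and parameter values below are illustrative assumptions.

```python
import math

def gaussian_kernel(code_a, code_b, length_scale=1.0):
    """Kernel: similarity between two network model structures (their codes)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(code_a, code_b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def linear_mean(code, weights, bias):
    """Mean function: maps a structure's code to its expected performance."""
    return sum(w * c for w, c in zip(weights, code)) + bias

k_same = gaussian_kernel([0, 0, 1], [0, 0, 1])   # identical structures: 1.0
k_far  = gaussian_kernel([0, 0, 0], [2, 2, 2])   # dissimilar: close to 0
mu     = linear_mean([0, 0, 1], weights=[0.1, 0.1, 0.2], bias=0.5)
```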
In one embodiment, screening out the network model structure which accords with the search task according to the performance of the network model structure comprises the following steps:
acquiring a constraint condition of a search task;
obtaining a confidence space according to the variance and the mean of the network model structure;
and under the condition that the confidence space meets the constraint condition, outputting the network model structure as a search result which accords with the search task.
In this embodiment, when the search task is replaced, the constraint condition changes but no re-search is needed, and transfer learning is easily realized between different search tasks, so search efficiency is markedly improved.
In one embodiment, the probability distribution model comprises a Gaussian random field model.
In a second aspect, the present application provides a device for searching a network model structure, including:
a sampling model structure selection module for selecting a sampling model structure from a plurality of network model structures in a search space;
the super-parameter updating module is used for updating the super-parameters of the probability distribution model by utilizing the performance of the sampling model structure to obtain the predicted values of the super-parameters;
the model structure performance acquisition module is used for acquiring the performance of any network model structure in the search space according to the probability distribution model with the predicted values of the hyper-parameters;
and the model structure screening module is used for screening out the network model structure which accords with the search task according to the performance of the network model structure.
In one embodiment, the hyper-parameter update module comprises:
the prediction performance acquisition unit is used for obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
the posterior distribution calculating unit is used for inputting the predicted performance, the real performance and the performance deviation of the sampling model, together with the prior distribution of the hyper-parameters, into the Bayesian estimation algorithm model and outputting the posterior distribution of the hyper-parameters;
and the iteration calculation unit is used for performing iteration calculation of preset times by taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters acquired next time to obtain the predicted values of the hyper-parameters, wherein the preset times are the preset iteration times according to the search task.
In one embodiment, the model structure performance acquisition module comprises:
the function building unit is used for building a kernel function and a mean function of the probability distribution model, wherein the mean function represents the relation between the structure and the performance of the same network model, and the kernel function represents the relation between different network model structures;
and the variance-mean calculating unit is used for respectively inputting the digital codes corresponding to the network model structure into the kernel function and the mean function to obtain the variance and the mean of the network model structure, wherein the performance of the network model structure comprises the variance and the mean.
In one embodiment, the model structure screening module comprises:
a constraint condition obtaining unit for obtaining a constraint condition of the search task;
the confidence space acquisition unit is used for obtaining a confidence space according to the variance and the mean of the network model structure;
and the search result output unit is used for outputting the network model structure as a search result which accords with the search task under the condition that the confidence space meets the constraint condition.
An embodiment in the above application has the following advantage or benefit: because a small number of model structures are used as sampling model structures, whose performance is used to update the hyper-parameters of the probability distribution model and thereby deduce the performance of any network model structure in the search space, the technical problem that the search must be redone whenever the search task is replaced is solved, improving search efficiency and reducing search cost.
Other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic flow chart of a network model structure searching method according to the present application;
FIG. 2 is a schematic flow chart of another network model structure searching method according to the present application;
FIG. 3 is a block diagram of a network model structure searching apparatus according to the present application;
FIG. 4 is a block diagram of another network model structure searching apparatus according to the present application;
FIG. 5 is a block diagram of an electronic device for implementing the network model structure searching method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Example one
In a specific implementation manner, as shown in fig. 1, an embodiment of the present application provides a method for searching a network model structure, including:
step S10: selecting a sampling model structure from a plurality of network model structures in a search space;
step S20: updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
step S30: obtaining the performance of any network model structure according to the probability distribution model with the predicted value of the hyper-parameter;
step S40: and screening out the network model structure which accords with the search task according to the performance of the network model structure.
In one example, the probability distribution model does not require training a network in order to compute (predict) the performance of a network model structure, and is therefore a white-box model. A black-box model, by contrast, must train a network model structure each time it executes a search task before further network model structure performances can be obtained. Owing to the predictive capability of the probability distribution model, search tasks such as classification, face recognition and detection do not require re-searching when the task target is replaced.
Since the hyper-parameters of the probability distribution model are unknown, they are estimated first. In this embodiment, the prior distribution of the hyper-parameters may be iteratively updated using Bayesian estimation theory to obtain the posterior distribution of the hyper-parameters and thereby estimate them. The specific process is as follows. First, the prior distribution of the hyper-parameters in the probability distribution model is initialized. This prior distribution is a probabilistic expression of the prior information available before the true hyper-parameters are computed by stepwise iteration; it can be obtained from statistics over a small data set, or set according to human experience. Because the procedure is a stepwise iterative process, the convergence of the sampling of network model structures does not depend on the prior distribution. Then, in the search space, a plurality of network model structures may be randomly selected as sampling model structures. Updating the hyper-parameters is likewise a stepwise iterative process, and each iteration uses the performance of the sampling model structures. After each iteration, the hyper-parameter covariance matrix is obtained; the main diagonal elements of the covariance matrix reflect the fluctuation range around the mean.
The performance of any network model structure in the search space is then estimated based on the posterior distribution of the hyper-parameters. The performance of a network model structure includes a mean and a variance, which reflect the stability of the structure. A confidence interval for the network model structure is obtained from the mean and the variance. Whether the confidence interval meets the constraint condition of the search task is then judged: if so, the search terminates; otherwise, the hyper-parameters of the probability distribution model continue to be updated. Note that the posterior distribution of the hyper-parameters obtained in the current estimation round is used as the prior distribution of the next round. The network model structure obtained by the search has a high mean, a small variance, and a small covariance matrix.
In this embodiment, a small number of model structures may be randomly selected from the search space as sampling model structures, or they may be selected according to a sampling strategy that maximizes mutual information. Under this strategy, each iteration selects the next sampling model structures so as to maximize the mutual information between the candidate model structures and the remaining model structures. After the sampling model structures are selected, their performance is used to update the hyper-parameters of the probability distribution model, and the probability distribution model with the updated hyper-parameters is used to deduce the performance of any model structure in the search space. When the search task changes, i.e., the constraint condition changes, the network model structures in the search space do not need to be searched again: since the performance of any network model structure is available, only those whose performance meets the constraint condition of the search task need to be selected. The search process is thus completely decoupled from the constraint condition, which improves search efficiency and reduces search cost.
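A common greedy proxy for the mutual-information-maximizing sampling strategy is to pick, in each iteration, the not-yet-sampled candidate about which the surrogate is most uncertain. The maximum-predictive-variance rule below is an illustrative approximation of that idea, not necessarily the patent's exact criterion, and all names are assumptions.

```python
def select_next(candidates, predicted_variance, already_sampled):
    """Greedily pick the unsampled candidate with the largest predictive variance.

    Maximizing predictive variance (entropy) is a standard greedy surrogate for
    maximizing information gain about the remaining structures.
    """
    remaining = [c for c in candidates if c not in already_sampled]
    return max(remaining, key=lambda c: predicted_variance[c])

variances = {"net_a": 0.02, "net_b": 0.30, "net_c": 0.15}
pick = select_next(["net_a", "net_b", "net_c"], variances, already_sampled={"net_b"})
```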
In one embodiment, as shown in FIG. 2, step S20: the method comprises the following steps:
step S201: obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
step S202: inputting the predicted performance, the real performance and the performance deviation of the sampling model and the prior distribution of the hyperparameter into a Bayes estimation algorithm model, and outputting the posterior distribution of the hyperparameter;
step S203: taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters obtained next time, and carrying out iterative calculation of preset times to obtain the predicted values of the hyper-parameters;
the preset times are preset iteration times according to the search task.
In one example, the prior distribution of the hyper-parameters of the probability distribution model is first initialized, and the sampling model structures are selected based on this prior. Before each round of hyper-parameter updates, the predicted performance of the sampling model is estimated from the prior distribution of the hyper-parameters. The predicted performance is compared with the real performance of the sampling model to obtain the performance deviation. Bayesian estimation uses Bayes' theorem to combine new evidence with the previous prior probability to obtain a new probability. In this embodiment, the posterior distribution of the hyper-parameters is obtained by combining the Bayesian estimation algorithm with the performance deviation. The posterior distribution of the hyper-parameters is then taken as the prior distribution for the next round, and steps S201-S202 are executed in a loop. Through multiple iterations the hyper-parameters are continuously updated, so that the initialized values approach the true values ever more closely, finally yielding the predicted values of the hyper-parameters. The introduction of the prior distribution of the hyper-parameters enables transfer learning between different search tasks and improves search efficiency.
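The loop over steps S201-S203 can be sketched as follows. The deviation-driven update used here is a toy stand-in for the Bayesian estimation algorithm model; the `predict` and `update` callables and all numeric values are assumptions.

```python
def run_iterations(prior, real_perf, predict, update, n_iter):
    """S201-S203: predict, compare with real performance, update, repeat."""
    for _ in range(n_iter):                     # preset number of iterations
        predicted_perf = predict(prior)         # S201: predict from current prior
        deviation = real_perf - predicted_perf  # S202: performance deviation
        posterior = update(prior, deviation)    # S202: Bayesian-style update
        prior = posterior                       # S203: posterior becomes next prior
    return prior                                # final predicted hyper-parameter value

# Toy instantiation: the hyper-parameter *is* the predicted performance,
# and each update moves it a fraction of the deviation toward the target.
theta = run_iterations(
    prior=0.0,
    real_perf=0.8,
    predict=lambda h: h,
    update=lambda h, d: h + 0.5 * d,
    n_iter=20,
)
```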
In one embodiment, as shown in fig. 2, step S30 includes:
step S301: constructing a kernel function and an average function of the probability distribution model, wherein the average function represents the relation between the structure and the performance of the same network model, and the kernel function represents the relation between different network model structures;
step S302: respectively inputting digital codes corresponding to the network model structure into a kernel function and a mean function to obtain a variance and a mean of the network model structure;
wherein, the performance of the network model structure comprises variance and mean.
In one example, the mean function and kernel function may be designed according to the characteristics of different search tasks. The kernel function and mean function of the constructed probability distribution model contain the hyper-parameters to be estimated. The digital code corresponding to a network model structure, also called a network code, encodes the model structure with numbers. A real task is complex; taking a 6-layer convolutional neural network as an example, the digits 0, 1 and 2 represent 64, 128 and 256 convolution channels, respectively. Then [0, 0, 1, 1, 2, 2] represents the network model structure whose layers 1 through 6 have 64, 64, 128, 128, 256 and 256 channels. Each digital code uniquely determines a network model structure. If only the number of channels of the searched network is encoded, a Gaussian kernel function may be used; if both the number of channels and the convolution kernel size are encoded, a Gaussian mixture kernel function may be adopted. The mean function may be designed as a linear function. Alternatively, a more complex mean function may be constructed by first projecting the codes through an affine transformation into a higher- or lower-dimensional space and then applying a linear function in the new space. Because both the relation between a network model structure and its performance and the relation between different network model structures are considered, the accuracy of the hyper-parameter updates is improved, and in turn the search precision.
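The channel encoding described above can be made concrete. The snippet below reproduces the example from the text: digits 0, 1 and 2 stand for 64, 128 and 256 convolution channels of a 6-layer network.

```python
CHANNELS = {0: 64, 1: 128, 2: 256}  # digit -> number of convolution channels

def decode(network_code):
    """Map a digital code to the per-layer channel counts it represents."""
    return [CHANNELS[digit] for digit in network_code]

layers = decode([0, 0, 1, 1, 2, 2])
# layers 1..6 have 64, 64, 128, 128, 256, 256 channels, as in the example
```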
In one embodiment, as shown in fig. 2, step S40 includes:
step S401: acquiring a constraint condition of a search task;
step S402: obtaining a confidence space according to the variance and the mean of the network model structure;
step S403: and under the condition that the confidence space meets the constraint condition of the search task, outputting the network model structure as a search result conforming to the search task.
In one example, the constraint conditions of a search task may include the latency of the network model structure on a client (e.g., a mobile phone), the latency of the network model structure on a server, the size of the network model structure, and so on. The latency of the network model structure refers to the speed of the model, for example on a server or on a mobile phone. When the search task is replaced, the constraint condition changes but no re-search is needed, and transfer learning is easily realized between different search tasks, so search efficiency is markedly improved.
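A minimal sketch of this screening step: build a confidence interval from a structure's predicted mean and variance, and keep the structure only if the interval satisfies the task constraint. The 95% normal interval and the rule that the interval's lower bound must clear the threshold are illustrative assumptions, not the patent's stated criterion.

```python
def confidence_interval(mean, variance, z=1.96):
    """Approximate 95% confidence interval from a predicted mean and variance."""
    half_width = z * variance ** 0.5
    return (mean - half_width, mean + half_width)

def meets_task_constraint(mean, variance, min_performance):
    """Accept a structure only if even the interval's lower bound clears the bar."""
    lower, _ = confidence_interval(mean, variance)
    return lower >= min_performance

ok  = meets_task_constraint(mean=0.92, variance=0.0001, min_performance=0.90)
bad = meets_task_constraint(mean=0.92, variance=0.01,   min_performance=0.90)
```

Note that the two candidates share the same mean; only the variance, i.e., the stability of the structure, separates them.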
In one embodiment, the probability distribution model comprises a Gaussian random field model.
In one example, a conventional Gaussian random field has only the two dimensions of time and space. In this embodiment, the conventional Gaussian process may be extended, using the similarity between different network structures as a bridge, to a high-dimensional space. A kernel function designed in the high-dimensional space represents the correlation between different network model structures, and a mean function represents the relation between a network model structure and its performance. Once the parameters of the Gaussian random field model are known, the performance and confidence interval of any model structure can be deduced.
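Inference with a Gaussian process surrogate follows the standard GP regression equations for the posterior mean and variance. The sketch below assumes a zero mean function and an RBF kernel, one possible choice rather than the patent's specific design, and solves the small (2x2) linear systems by Cramer's rule to stay dependency-free.

```python
import math

def rbf(a, b, length_scale=1.0):
    """RBF kernel between two network codes."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b))
                    / (2.0 * length_scale ** 2))

def solve2(K, v):
    """Solve a 2x2 linear system K x = v by Cramer's rule."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [(v[0] * K[1][1] - K[0][1] * v[1]) / det,
            (K[0][0] * v[1] - v[0] * K[1][0]) / det]

def gp_predict(X, y, x_star, noise=1e-6):
    """Posterior mean and variance of performance at an unseen code x_star."""
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    k_star = [rbf(xi, x_star) for xi in X]
    alpha = solve2(K, y)
    mean = sum(k * a for k, a in zip(k_star, alpha))   # zero prior mean assumed
    beta = solve2(K, k_star)
    var = rbf(x_star, x_star) - sum(k * b for k, b in zip(k_star, beta))
    return mean, var

X = [(0.0, 0.0), (1.0, 1.0)]  # network codes of two sampled structures
y = [0.70, 0.90]              # their measured performances
mu, var = gp_predict(X, y, (0.0, 0.0))
```

At a code that was already sampled, the posterior mean recovers the measured performance and the variance collapses toward zero, matching the confidence-interval behavior described above.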
Example two
In another embodiment, as shown in fig. 3, a searching apparatus 100 of a network model structure is provided, which includes:
a sampling model structure selection module 110 for selecting a sampling model structure from a plurality of network model structures in a search space;
the hyper-parameter updating module 120 is configured to update the hyper-parameters of the probability distribution model by using the performance of the sampling model structure, so as to obtain predicted values of the hyper-parameters;
a model structure performance obtaining module 130, configured to obtain performance of any network model structure in the search space according to the probability distribution model with the predicted value of the hyper-parameter;
and the model structure screening module 140 is used for screening out the network model structure which accords with the search task according to the performance of the network model structure.
In one embodiment, as shown in fig. 4, the searching apparatus 200 of the network model structure is formed on the basis of the searching apparatus 100 of the network model structure, and the hyper-parameter updating module 120 includes:
a prediction performance obtaining unit 1201, configured to obtain prediction performance of the sampling model according to prior distribution of the hyper-parameters of the probability distribution model;
a posterior distribution calculating unit 1202, configured to input the predicted performance, the true performance, and the performance deviation of the sampling model, and the prior distribution of the hyper-parameters into a bayesian estimation algorithm model, and output the posterior distribution of the hyper-parameters;
the iterative calculation unit 1203 is configured to perform iterative calculation for a preset number of times by using the posterior distribution of the hyper-parameter as a prior distribution of the hyper-parameter obtained next time, so as to obtain a predicted value of the hyper-parameter, where the preset number of times is a preset number of iterations according to the search task.
In one embodiment, as shown in fig. 4, the model structure performance obtaining module 130 includes:
a function building unit 1301, configured to build a kernel function and a mean function of the probability distribution model, where the mean function represents a relationship between the same network model structure and performance, and the kernel function represents a relationship between different network model structures;
the variance-mean calculating unit 1302 is configured to input the digital codes corresponding to the network model structure into the kernel function and the mean function respectively to obtain a variance and a mean of the network model structure, where the variance and the mean constitute a performance of the network model structure.
In one embodiment, as shown in FIG. 4, the model structure screening module 140 includes:
a constraint condition acquisition unit 1401 for acquiring a constraint condition of a search task;
a confidence space obtaining unit 1402, configured to obtain a confidence space according to the variance and the mean of the network model structure;
a search result output unit 1403, configured to output the network model structure as a search result that meets the search task when the confidence space meets the constraint condition.
FIG. 5 is a block diagram of an electronic device for the network model structure searching method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in FIG. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). FIG. 5 illustrates an example with one processor 501.
The memory 502 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so as to cause the at least one processor to perform the network model structure search method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the network model structure search method provided by the present application.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the network model structure search method in the embodiments of the present application (e.g., the sampling model structure selection module 110, the hyper-parameter update module 120, the model structure performance acquisition module 130, and the model structure screening module 140 shown in fig. 3). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes various functional applications of the server and performs data processing, that is, implements the network model structure search method in the above method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created by the use of the electronic device for searching the network model structure, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memory located remotely from the processor 501, and such remote memory may be connected via a network to the electronic device for searching the network model structure. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the search method of the network model structure may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for searching the network model structure; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light-Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Application-Specific Integrated Circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, a small number of model structures are selected from the search space as sampling model structures, these sampling model structures are used to update the hyper-parameters of the probability distribution model, and the performance of any model structure in the search space can then be inferred from the updated model. When the search task changes, that is, when the constraint condition changes, the search space does not need to be searched again: since the performance of any model structure is already available, the model structures that satisfy the constraint condition of the new search task are selected directly according to their predicted performance. The search process is thus completely decoupled from the constraint condition, which improves search efficiency and reduces search cost.
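The decoupling described above can be illustrated with a minimal sketch. It is an assumption-laden toy, not the patented implementation: a Gaussian-process-style probability distribution model with a fixed RBF kernel and zero mean function stands in for the model whose hyper-parameters the method would actually learn, and the structure encodings, performance values, and the 0.5 selection threshold are all invented for illustration:

```python
import numpy as np

# Hypothetical 3-dimensional numeric encodings of 8 sampled structures (cube corners)
# and their measured performance (a toy stand-in for accuracy on a validation set).
X_sampled = np.array([[i, j, k] for i in (0.0, 1.0)
                                for j in (0.0, 1.0)
                                for k in (0.0, 1.0)])
y_sampled = np.sin(X_sampled.sum(axis=1))

def rbf_kernel(A, B, length_scale=0.5):
    # Kernel function: models the relationship between *different* structure encodings.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_query, noise=1e-4):
    # Posterior mean and variance of performance for every queried structure
    # (a zero mean function is assumed here for simplicity).
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train)
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ y_train
    var = 1.0 - np.einsum('ij,jk,ik->i', K_s, K_inv, K_s)
    return mean, np.maximum(var, 0.0)

# Predict performance for every candidate in the search space without training any of them.
rng = np.random.default_rng(0)
X_space = rng.uniform(0.0, 1.0, size=(100, 3))
mean, var = gp_predict(X_sampled, y_sampled, X_space)

# Confidence space: mean +/- 2 std. When the constraint changes, only this cheap
# filtering step is re-run, not the search itself.
lower = mean - 2.0 * np.sqrt(var)
selected = np.flatnonzero(lower > 0.5)
```

Re-running only the last two lines with a different threshold yields the candidates for a new search task, which is the sense in which the search is decoupled from the constraint condition.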
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and the present application is not limited in this regard, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for searching a network model structure, comprising:
selecting a sampling model structure from network model structures of a plurality of neural networks in a search space;
updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
obtaining the performance of any network model structure according to a probability distribution model with the predicted values of the hyper-parameters; the performance of the network model structure comprises a variance and a mean of the network model structure;
acquiring a constraint condition of a search task; the search task comprises at least one of a classification task, a face recognition task and a detection task, and the constraint conditions of the search task comprise: at least one constraint condition of time delay of the network model structure on the client, time delay of the network model structure on the server and size of the network model structure;
obtaining a confidence space according to the variance and the mean of the network model structure; the variance and the mean of the network model structure are obtained based on a kernel function and a mean function of the probability distribution model, and the mean function and the kernel function are designed according to the characteristics of different search tasks;
and under the condition that the confidence space meets the constraint condition, outputting the network model structure as a search result which accords with the search task.
2. The method of claim 1, wherein updating the hyper-parameters of the probability distribution model using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters comprises:
obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
inputting the predicted performance, the real performance and the performance deviation of the sampling model and the prior distribution of the hyper-parameters into a Bayesian estimation algorithm model, and outputting the posterior distribution of the hyper-parameters;
taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters acquired next time, and performing iterative computation for preset times to obtain a predicted value of the hyper-parameters;
and the preset times are iteration times preset according to the search task.
3. The method of claim 1, wherein deriving the performance of any of the network model structures from a probability distribution model having predicted values of the hyper-parameters comprises:
constructing a kernel function and a mean function of the probability distribution model, wherein the kernel function and the mean function comprise predicted values of the hyper-parameters, the mean function represents the relationship between the same network model structure and performance, and the kernel function represents the relationship between different network model structures;
and respectively inputting the digital codes corresponding to the network model structure into the kernel function and the mean function to obtain the variance and the mean of the network model structure.
4. A method according to any one of claims 1 to 3, wherein the probability distribution model comprises a gaussian random field modeling model.
5. A network model structure search apparatus, comprising:
a sampling model structure selection module for selecting a sampling model structure from network model structures of a plurality of neural networks in a search space;
the hyper-parameter update module is used for updating the hyper-parameters of the probability distribution model by utilizing the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
the model structure performance acquisition module is used for acquiring the performance of any network model structure in the search space according to a probability distribution model with the predicted values of the hyper-parameters; the performance of the network model structure comprises a variance and a mean of the network model structure;
the model structure screening module is used for acquiring a constraint condition of a search task, obtaining a confidence space according to the variance and the mean value of the network model structure, and outputting the network model structure as a search result conforming to the search task under the condition that the confidence space meets the constraint condition;
the search task comprises at least one of a classification task, a face recognition task and a detection task;
the constraints of the search task include: at least one constraint condition of time delay of the network model structure on the client, time delay of the network model structure on the server and size of the network model structure;
the variance and the mean of the network model are obtained based on a kernel function and a mean function of the probability distribution model, and the mean function and the kernel function are designed according to the characteristics of different search tasks.
6. The apparatus of claim 5, wherein the hyper-parameter update module comprises:
the prediction performance obtaining unit is used for obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
the posterior distribution calculating unit is used for inputting the predicted performance, the real performance and the performance deviation of the sampling model and the prior distribution of the hyper-parameters into a Bayesian estimation algorithm model and outputting the posterior distribution of the hyper-parameters;
and the iteration calculation unit is used for performing iteration calculation of preset times by taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters acquired next time to obtain the predicted values of the hyper-parameters, wherein the preset times are iteration times preset according to the search task.
7. The apparatus of claim 5, wherein the model structure performance obtaining module comprises:
the function building unit is used for building a kernel function and a mean function of the probability distribution model, the kernel function and the mean function comprise predicted values of the hyper-parameters, the mean function represents the relationship between the structure and the performance of the same network model, and the kernel function represents the relationship between different network model structures;
and the variance-mean calculating unit is used for respectively inputting the digital codes corresponding to the network model structure into the kernel function and the mean function to obtain the variance and the mean of the network model structure.
8. The apparatus of claim 7, wherein the model structure screening module comprises:
a constraint condition obtaining unit, configured to obtain a constraint condition of the search task;
the confidence space acquisition unit is used for obtaining a confidence space according to the variance and the mean of the network model structure;
and the search result output unit is used for outputting the network model structure as a search result which accords with the search task under the condition that the confidence space meets the constraint condition.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-4.
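As a rough illustration of the iterative Bayesian update recited in claims 2 and 6 (prior distribution → likelihood of the deviation between predicted and real performance → posterior distribution, with the posterior reused as the next prior for a preset number of iterations), the following sketch maintains a discretized distribution over a single hypothetical kernel hyper-parameter. The toy predictor, the measured performances, and all numeric values are invented for illustration and are not taken from the patent:

```python
import numpy as np

# Hypothetical discretized prior over one hyper-parameter (e.g., a kernel length scale).
theta_grid = np.linspace(0.1, 2.0, 40)
prior = np.full_like(theta_grid, 1.0 / len(theta_grid))

# Real (measured) performance of three sampled structures, and a toy stand-in for the
# performance that the probability distribution model would predict for a given theta.
y_true = np.array([0.71, 0.65, 0.80])
def predict_perf(theta):
    return np.array([0.70, 0.66, 0.78]) * np.tanh(theta)  # assumed, illustrative only

def bayes_update(prior_dist, sigma=0.05):
    # Likelihood: Gaussian model of the deviation between predicted and real performance.
    log_like = np.array([
        -0.5 * np.sum((predict_perf(t) - y_true) ** 2) / sigma**2
        for t in theta_grid
    ])
    post = prior_dist * np.exp(log_like - log_like.max())  # shift for numerical stability
    return post / post.sum()

# The posterior becomes the next prior; iterate a preset number of times.
dist = prior
for _ in range(5):
    dist = bayes_update(dist)
theta_hat = theta_grid[np.argmax(dist)]  # predicted value of the hyper-parameter
```

Each iteration concentrates the distribution around hyper-parameter values whose predicted performance best matches the measured performance; `theta_hat` plays the role of the "predicted value of the hyper-parameter" that the kernel and mean functions would then use.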
CN201910863417.3A 2019-09-11 2019-09-11 Network model structure searching method and device and electronic equipment Active CN110633797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910863417.3A CN110633797B (en) 2019-09-11 2019-09-11 Network model structure searching method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110633797A CN110633797A (en) 2019-12-31
CN110633797B true CN110633797B (en) 2022-12-02

Family

ID=68970985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910863417.3A Active CN110633797B (en) 2019-09-11 2019-09-11 Network model structure searching method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110633797B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340222B (en) * 2020-02-25 2023-06-13 北京百度网讯科技有限公司 Neural network model searching method and device and electronic equipment
CN111340219A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Neural network model searching method and device, image processing method and processor
CN111340221B (en) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 Neural network structure sampling method and device
CN111488971B (en) * 2020-04-09 2023-10-24 北京百度网讯科技有限公司 Neural network model searching method and device, and image processing method and device
CN111582478B (en) * 2020-05-09 2023-09-22 北京百度网讯科技有限公司 Method and device for determining model structure
CN111612134B (en) * 2020-05-20 2024-04-12 鼎富智能科技有限公司 Neural network structure searching method and device, electronic equipment and storage medium
CN111680599B (en) * 2020-05-29 2023-08-08 北京百度网讯科技有限公司 Face recognition model processing method, device, equipment and storage medium
CN113807376A (en) * 2020-06-15 2021-12-17 富泰华工业(深圳)有限公司 Network model optimization method and device, electronic equipment and storage medium
EP4148623A4 (en) * 2020-09-10 2024-02-07 Aizoth Inc Hyperparameter adjustment device, non-transitory recording medium in which hyperparameter adjustment program is recorded, and hyperparameter adjustment program
CN112348188B (en) * 2020-11-13 2023-04-07 北京市商汤科技开发有限公司 Model generation method and device, electronic device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 A kind of deep neural network training and optimization algorithm based on evolution algorithmic
CN108780519A (en) * 2016-03-11 2018-11-09 奇跃公司 Structure learning in convolutional neural networks
CN109214605A (en) * 2018-11-12 2019-01-15 国网山东省电力公司电力科学研究院 Power-system short-term Load Probability prediction technique, apparatus and system
CN109598332A (en) * 2018-11-14 2019-04-09 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium
CN109716346A (en) * 2016-07-18 2019-05-03 河谷生物组学有限责任公司 Distributed machines learning system, device and method
CN110197258A (en) * 2019-05-29 2019-09-03 北京市商汤科技开发有限公司 Neural network searching method, image processing method and device, equipment and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Heuristic analysis method for multi-layer client-server queueing network models; Zheng Yujie; Coal Technology; 2011-05-10 (No. 05); pp. 233-235 *
Research on population-based hyper-parameter optimization of neural networks; Zhu Huilong et al.; Information Technology; 2018-11-20 (No. 11); pp. 105-110 *
Network security situation prediction method based on harmony search algorithm and relevance vector machine; Li Jie et al.; Journal of Computer Applications; 2016-01-10 (No. 01); pp. 199-203 *
Bayesian neural network modeling and prediction method and its application; Fan Chunling et al.; Journal of Chinese Inertial Technology; 2009-02-15 (No. 01); pp. 89-92 *

Similar Documents

Publication Publication Date Title
CN110633797B (en) Network model structure searching method and device and electronic equipment
CN111539479B (en) Method and device for generating sample data
CN111738414B (en) Recommendation model generation method, content recommendation method, device, equipment and medium
CN111242306B (en) Method, apparatus, electronic device, and computer-readable storage medium for quantum principal component analysis
CN111582479B (en) Distillation method and device for neural network model
CN110795569B (en) Method, device and equipment for generating vector representation of knowledge graph
CN113723278B (en) Training method and device for form information extraction model
CN111639753B (en) Method, apparatus, device and storage medium for training image processing super network
CN112559870B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111461343B (en) Model parameter updating method and related equipment thereof
CN111563593B (en) Training method and device for neural network model
CN110852379B (en) Training sample generation method and device for target object recognition
CN110569969A (en) Network model structure sampling method and device and electronic equipment
CN110580520A (en) model structure sampling device based on hyper-network and electronic equipment
CN111667057A (en) Method and apparatus for searching model structure
CN111767833A (en) Model generation method and device, electronic equipment and storage medium
CN111079945A (en) End-to-end model training method and device
CN111695698A (en) Method, device, electronic equipment and readable storage medium for model distillation
CN111652354B (en) Method, apparatus, device and storage medium for training super network
CN110569973A (en) Network structure searching method and device and electronic equipment
CN111539224A (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111340219A (en) Neural network model searching method and device, image processing method and processor
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111640103A (en) Image detection method, device, equipment and storage medium
CN111753759A (en) Model generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant