CN110633797B - Network model structure searching method and device and electronic equipment - Google Patents


Info

Publication number: CN110633797B
Authority: CN (China)
Prior art keywords: model structure, network model, hyper-parameter, performance, search
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201910863417.3A
Other languages: Chinese (zh)
Other versions: CN110633797A (en)
Inventors: 希滕, 张刚, 温圣召
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910863417.3A (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Publication of CN110633797A
Application granted; publication of CN110633797B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The application discloses a network model structure searching method and apparatus, and an electronic device, relating to the field of neural network architecture search. The implementation scheme is as follows: a sampling model structure is selected from a plurality of network model structures in a search space; the hyper-parameters of a probability distribution model are updated using the performance of the sampling model structure to obtain predicted values of the hyper-parameters; the performance of any network model structure is obtained from the probability distribution model with the predicted hyper-parameter values; and the network model structures that meet the search task are screened out according to their performance. When the search task changes, the network model structures in the search space do not need to be searched again: only those whose performance meets the constraint condition of the search task need to be selected. The search process is thus completely decoupled from the constraint condition, which improves search efficiency and reduces search cost.

Description

Network model structure searching method and device and electronic equipment
Technical Field
The present application relates to the field of computers, and more particularly, to the field of architecture search for neural networks.
Background
Deep learning techniques have achieved tremendous success in many directions, and Neural Architecture Search (NAS) has become a research hotspot in recent years. NAS replaces tedious manual design with an algorithm that automatically searches for a neural network architecture in a massive search space. Architecture search for a neural network proceeds as follows: first, a search space is defined and determined; then, a search strategy is determined according to the optimization algorithm adopted, such as reinforcement learning, an evolutionary algorithm, or Bayesian optimization; finally, the search yields a model structure together with its speed and performance.
Currently, architecture search methods for neural networks include automatic model structure search based on reinforcement learning, based on evolutionary algorithms, and based on gradients. All three methods generate a search strategy by treating the model structure as a black box. This brings several technical problems, the most important being that when various search tasks are performed, such as a classification task, a detection task, or a face recognition task, each search is based on the single constraint condition of a particular search task and yields a different search result. When a plurality of search tasks are performed, they can only be searched one by one; that is, the search process is strongly coupled to the constraint condition, and when the constraint condition changes, the search must be restarted.
Disclosure of Invention
The embodiments of the present application provide a network model structure searching method and apparatus, and an electronic device, so as to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present application provides a method for searching a network model structure, including:
selecting a sampling model structure from a plurality of network model structures in a search space;
updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
obtaining the performance of any network model structure according to the probability distribution model with the predicted values of the hyper-parameters;
and screening out the network model structure which accords with the search task according to the performance of the network model structure.
In this embodiment, a small number of network model structures are selected from the search space as sampling model structures, and their performance is used to update the hyper-parameters of the probability distribution model, from which the performance of any network model structure in the search space can be derived. When the search task changes, i.e., the constraint condition changes, the network model structures in the search space do not need to be searched again: since the performance of any network model structure is available, the structures that meet the constraint condition of the search task are selected directly according to their performance. The search process is completely decoupled from the constraint condition, search speed and accuracy are improved, and search cost is reduced.
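The four-step flow above can be sketched as a toy program. This is an illustrative sketch only, not the patent's implementation: the toy search space, the stand-in surrogate (a trivial linear fit in place of the probability distribution model), and all function names are assumptions.

```python
import random

# Toy search space: each structure is a tuple of per-layer channel choices.
SEARCH_SPACE = [(a, b) for a in (64, 128, 256) for b in (64, 128, 256)]

def true_performance(structure):
    """Stand-in for measured performance (would require training in practice)."""
    return sum(structure) / 512.0

def search(constraint, n_samples=3, seed=0):
    rng = random.Random(seed)
    # Step 1: select a few sampling model structures from the search space.
    samples = rng.sample(SEARCH_SPACE, n_samples)
    # Step 2: "update" the surrogate from their measured performance
    # (a trivial linear fit standing in for the probability distribution model).
    scale = sum(true_performance(s) / sum(s) for s in samples) / n_samples
    # Step 3: predict the performance of every structure in the search space.
    predicted = {s: scale * sum(s) for s in SEARCH_SPACE}
    # Step 4: screen out the structures whose predicted performance
    # meets the constraint of the current search task.
    return [s for s, p in predicted.items() if p >= constraint]

results = search(constraint=0.9)
```

Changing `constraint` reuses the same predictions without re-running steps 1 to 3, which is the decoupling the embodiment describes.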
In one embodiment, updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters includes:
obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
inputting the predicted performance, the real performance and the performance deviation of the sampling model, together with the prior distribution of the hyper-parameters, into a Bayesian estimation algorithm model, and outputting the posterior distribution of the hyper-parameters;
taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters obtained next time, and carrying out iterative calculation of preset times to obtain the predicted values of the hyper-parameters;
the preset times are preset iteration times according to the search task.
In this embodiment, the predicted values of the hyper-parameters approach the true values ever more closely through iterative updates. The introduction of a prior distribution over the hyper-parameters also makes transfer learning convenient between different search tasks, which improves search efficiency.
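As an illustration of how each round's posterior can serve as the next round's prior, the sketch below performs conjugate Gaussian updates of a single scalar hyper-parameter. The patent does not specify this particular form, so the conjugate-normal model and all numeric values are assumptions.

```python
def bayes_update(prior_mean, prior_var, observation, noise_var):
    """One conjugate Gaussian update: combine the prior with a noisy observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + observation / noise_var)
    return post_mean, post_var

# The posterior of each round is fed back in as the prior of the next round.
mean, var = 0.0, 10.0              # loose initial prior on the hyper-parameter
for obs in [0.9, 1.1, 1.0, 0.95]:  # noisy measurements of a "true" value near 1.0
    mean, var = bayes_update(mean, var, obs, noise_var=0.1)
```

After a few rounds the mean is close to the truth and the variance has shrunk, mirroring the "closer and closer to the true value" behavior described above.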
In one embodiment, obtaining the performance of any network model structure according to the probability distribution model with the predicted values of the hyper-parameters comprises:
constructing a kernel function and a mean function of the probability distribution model, wherein the mean function represents the relation between a network model structure and its performance, and the kernel function represents the relation between different network model structures;
respectively inputting digital codes corresponding to the network model structure into a kernel function and a mean function to obtain a variance and a mean of the network model structure;
wherein, the performance of the network model structure comprises variance and mean.
In the present embodiment, a kernel function and a mean function of the probability distribution model are constructed. Both the relation between a network model structure and its performance and the relation between different network model structures are taken into account, which improves the accuracy of the hyper-parameter updates and in turn the search precision.
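One possible (hypothetical) realization of such a pair of functions is a Gaussian (RBF) kernel over network codes, relating different structures, and a linear mean function, relating a structure's code to its expected performance. The exact forms and parameter values below are illustrative assumptions.

```python
import math

def gaussian_kernel(code_a, code_b, length_scale=1.0):
    """Kernel: similarity between two network model structures (their codes)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(code_a, code_b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def linear_mean(code, weights, bias):
    """Mean function: maps a structure's code to its expected performance."""
    return sum(w * c for w, c in zip(weights, code)) + bias

k_same = gaussian_kernel([0, 0, 1], [0, 0, 1])   # identical structures: 1.0
k_far  = gaussian_kernel([0, 0, 0], [2, 2, 2])   # dissimilar: close to 0
mu     = linear_mean([0, 0, 1], weights=[0.1, 0.1, 0.2], bias=0.5)
```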
In one embodiment, screening out the network model structure which accords with the search task according to the performance of the network model structure comprises the following steps:
acquiring a constraint condition of a search task;
obtaining a confidence space according to the variance and the mean of the network model structure;
and under the condition that the confidence space meets the constraint condition, outputting the network model structure as a search result which accords with the search task.
In this embodiment, when the search task is replaced, the constraint condition changes but no re-search is needed, and transfer learning is easily realized between different search tasks, so search efficiency is markedly improved.
In one embodiment, the probability distribution model comprises a Gaussian random field model.
In a second aspect, the present application provides a device for searching a network model structure, including:
a sampling model structure selection module for selecting a sampling model structure from a plurality of network model structures in a search space;
the super-parameter updating module is used for updating the super-parameters of the probability distribution model by utilizing the performance of the sampling model structure to obtain the predicted values of the super-parameters;
the model structure performance acquisition module is used for acquiring the performance of any network model structure in the search space according to the probability distribution model with the predicted values of the hyper-parameters;
and the model structure screening module is used for screening out the network model structure which accords with the search task according to the performance of the network model structure.
In one embodiment, the hyper-parameter update module comprises:
the prediction performance acquisition unit is used for obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
the posterior distribution calculating unit is used for inputting the predicted performance, the real performance and the performance deviation of the sampling model, together with the prior distribution of the hyper-parameters, into the Bayesian estimation algorithm model and outputting the posterior distribution of the hyper-parameters;
and the iteration calculation unit is used for performing iteration calculation of preset times by taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters acquired next time to obtain the predicted values of the hyper-parameters, wherein the preset times are the preset iteration times according to the search task.
In one embodiment, the model structure performance acquisition module comprises:
the function building unit is used for building a kernel function and a mean function of the probability distribution model, wherein the mean function represents the relation between the structure and the performance of the same network model, and the kernel function represents the relation between different network model structures;
and the variance-mean calculating unit is used for respectively inputting the digital codes corresponding to the network model structure into the kernel function and the mean function to obtain the variance and the mean of the network model structure, wherein the performance of the network model structure comprises the variance and the mean.
In one embodiment, the model structure screening module comprises:
a constraint condition obtaining unit for obtaining a constraint condition of the search task;
the confidence space acquisition unit is used for obtaining a confidence space according to the variance and the mean of the network model structure;
and the search result output unit is used for outputting the network model structure as a search result which accords with the search task under the condition that the confidence space meets the constraint condition.
An embodiment in the above application has the following advantage or benefit: because a small number of model structures are used as sampling model structures, whose performance is used to update the hyper-parameters of the probability distribution model and thereby deduce the performance of any network model structure in the search space, the technical problem that the search must be redone whenever the search task is replaced is solved, improving search efficiency and reducing search cost.
Other effects of the above alternatives will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic flow chart of a network model structure searching method according to the present application;
FIG. 2 is a schematic flow chart of another network model structure searching method according to the present application;
FIG. 3 is a block diagram of a network model structure searching apparatus according to the present application;
FIG. 4 is a block diagram of another network model structure searching apparatus according to the present application;
FIG. 5 is a block diagram of an electronic device for implementing the network model structure searching method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Example one
In a specific implementation manner, as shown in fig. 1, an embodiment of the present application provides a method for searching a network model structure, including:
step S10: selecting a sampling model structure from a plurality of network model structures in a search space;
step S20: updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
step S30: obtaining the performance of any network model structure according to the probability distribution model with the predicted value of the hyper-parameter;
step S40: and screening out the network model structure which accords with the search task according to the performance of the network model structure.
In one example, the probability distribution model does not require training a network in order to compute (predict) the performance of a network model structure, and is therefore a white-box model. A black-box model, by contrast, must train a network model structure each time it executes a search task before further network model structure performances can be obtained. Owing to the predictive capability of the probability distribution model, search tasks such as classification, face recognition and detection do not require re-searching when the task target is replaced.
Since the hyper-parameters of the probability distribution model are unknown, they are estimated first. In this embodiment, the prior distribution of the hyper-parameters may be iteratively updated using Bayesian estimation theory to obtain the posterior distribution of the hyper-parameters and thereby estimate them. The specific process is as follows. First, the prior distribution of the hyper-parameters in the probability distribution model is initialized. This prior distribution is a probabilistic expression of the prior information available before the true hyper-parameters are computed by stepwise iteration; it can be obtained from statistics over a small data set, or set according to human experience. Because the procedure is a stepwise iterative process, the convergence of the sampling of network model structures does not depend on the prior distribution. Then, in the search space, a plurality of network model structures may be randomly selected as sampling model structures. Updating the hyper-parameters is likewise a stepwise iterative process, and each iteration uses the performance of the sampling model structures. After each iteration, the hyper-parameter covariance matrix is obtained; the main diagonal elements of the covariance matrix reflect the fluctuation range around the mean.
The performance of any network model structure in the search space is then estimated based on the posterior distribution of the hyper-parameters. The performance of a network model structure includes a mean and a variance, which reflect the stability of the structure. A confidence interval for the network model structure is obtained from the mean and the variance. Whether the confidence interval meets the constraint condition of the search task is then judged: if so, the search terminates; otherwise, the hyper-parameters of the probability distribution model continue to be updated. Note that the posterior distribution of the hyper-parameters obtained in the current estimation round is used as the prior distribution of the next round. The network model structure obtained by the search has a high mean, a small variance, and a small covariance matrix.
In this embodiment, a small number of model structures may be randomly selected from the search space as sampling model structures, or they may be selected according to a sampling strategy that maximizes mutual information. Under this strategy, each iteration selects the next sampling model structures so as to maximize the mutual information between the candidate model structures and the remaining model structures. After the sampling model structures are selected, their performance is used to update the hyper-parameters of the probability distribution model, and the probability distribution model with the updated hyper-parameters is used to deduce the performance of any model structure in the search space. When the search task changes, i.e., the constraint condition changes, the network model structures in the search space do not need to be searched again: since the performance of any network model structure is available, only those whose performance meets the constraint condition of the search task need to be selected. The search process is thus completely decoupled from the constraint condition, which improves search efficiency and reduces search cost.
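A common greedy proxy for the mutual-information-maximizing sampling strategy is to pick, in each iteration, the not-yet-sampled candidate about which the surrogate is most uncertain. The maximum-predictive-variance rule below is an illustrative approximation of that idea, not necessarily the patent's exact criterion, and all names are assumptions.

```python
def select_next(candidates, predicted_variance, already_sampled):
    """Greedily pick the unsampled candidate with the largest predictive variance.

    Maximizing predictive variance (entropy) is a standard greedy surrogate for
    maximizing information gain about the remaining structures.
    """
    remaining = [c for c in candidates if c not in already_sampled]
    return max(remaining, key=lambda c: predicted_variance[c])

variances = {"net_a": 0.02, "net_b": 0.30, "net_c": 0.15}
pick = select_next(["net_a", "net_b", "net_c"], variances, already_sampled={"net_b"})
```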
In one embodiment, as shown in FIG. 2, step S20: the method comprises the following steps:
step S201: obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
step S202: inputting the predicted performance, the real performance and the performance deviation of the sampling model and the prior distribution of the hyperparameter into a Bayes estimation algorithm model, and outputting the posterior distribution of the hyperparameter;
step S203: taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters obtained next time, and carrying out iterative calculation of preset times to obtain the predicted values of the hyper-parameters;
the preset times are preset iteration times according to the search task.
In one example, the prior distribution of the hyper-parameters of the probability distribution model is first initialized, and the sampling model structures are selected based on this prior. Before each round of hyper-parameter updates, the predicted performance of the sampling model is estimated from the prior distribution of the hyper-parameters. The predicted performance is compared with the real performance of the sampling model to obtain the performance deviation. Bayesian estimation uses Bayes' theorem to combine new evidence with the previous prior probability to obtain a new probability. In this embodiment, the posterior distribution of the hyper-parameters is obtained by combining the Bayesian estimation algorithm with the performance deviation. The posterior distribution of the hyper-parameters is then taken as the prior distribution for the next round, and steps S201-S202 are executed in a loop. Through multiple iterations the hyper-parameters are continuously updated, so that the initialized values approach the true values ever more closely, finally yielding the predicted values of the hyper-parameters. The introduction of the prior distribution of the hyper-parameters enables transfer learning between different search tasks and improves search efficiency.
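The loop over steps S201-S203 can be sketched as follows. The deviation-driven update used here is a toy stand-in for the Bayesian estimation algorithm model; the `predict` and `update` callables and all numeric values are assumptions.

```python
def run_iterations(prior, real_perf, predict, update, n_iter):
    """S201-S203: predict, compare with real performance, update, repeat."""
    for _ in range(n_iter):                     # preset number of iterations
        predicted_perf = predict(prior)         # S201: predict from current prior
        deviation = real_perf - predicted_perf  # S202: performance deviation
        posterior = update(prior, deviation)    # S202: Bayesian-style update
        prior = posterior                       # S203: posterior becomes next prior
    return prior                                # final predicted hyper-parameter value

# Toy instantiation: the hyper-parameter *is* the predicted performance,
# and each update moves it a fraction of the deviation toward the target.
theta = run_iterations(
    prior=0.0,
    real_perf=0.8,
    predict=lambda h: h,
    update=lambda h, d: h + 0.5 * d,
    n_iter=20,
)
```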
In one embodiment, as shown in fig. 2, step S30 includes:
step S301: constructing a kernel function and an average function of the probability distribution model, wherein the average function represents the relation between the structure and the performance of the same network model, and the kernel function represents the relation between different network model structures;
step S302: respectively inputting digital codes corresponding to the network model structure into a kernel function and a mean function to obtain a variance and a mean of the network model structure;
wherein, the performance of the network model structure comprises variance and mean.
In one example, the mean function and kernel function may be designed according to the characteristics of different search tasks. The kernel function and mean function of the constructed probability distribution model contain the hyper-parameters to be estimated. The digital code corresponding to a network model structure, also called a network code, encodes the model structure with numbers. A real task is complex; taking a 6-layer convolutional neural network as an example, the digits 0, 1 and 2 represent 64, 128 and 256 convolution channels, respectively. Then [0, 0, 1, 1, 2, 2] represents the network model structure whose layers 1 through 6 have 64, 64, 128, 128, 256 and 256 channels. Each digital code uniquely determines a network model structure. If only the number of channels of the searched network is encoded, a Gaussian kernel function may be used; if both the number of channels and the convolution kernel size are encoded, a Gaussian mixture kernel function may be adopted. The mean function may be designed as a linear function. Alternatively, a more complex mean function may be constructed by first projecting the codes through an affine transformation into a higher- or lower-dimensional space and then applying a linear function in the new space. Because both the relation between a network model structure and its performance and the relation between different network model structures are considered, the accuracy of the hyper-parameter updates is improved, and in turn the search precision.
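The channel encoding described above can be made concrete. The snippet below reproduces the example from the text: digits 0, 1 and 2 stand for 64, 128 and 256 convolution channels of a 6-layer network.

```python
CHANNELS = {0: 64, 1: 128, 2: 256}  # digit -> number of convolution channels

def decode(network_code):
    """Map a digital code to the per-layer channel counts it represents."""
    return [CHANNELS[digit] for digit in network_code]

layers = decode([0, 0, 1, 1, 2, 2])
# layers 1..6 have 64, 64, 128, 128, 256, 256 channels, as in the example
```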
In one embodiment, as shown in fig. 2, step S40 includes:
step S401: acquiring a constraint condition of a search task;
step S402: obtaining a confidence space according to the variance and the mean of the network model structure;
step S403: and under the condition that the confidence space meets the constraint condition of the search task, outputting the network model structure as a search result conforming to the search task.
In one example, the constraint conditions of a search task may include the latency of the network model structure on a client (e.g., a mobile phone), the latency of the network model structure on a server, the size of the network model structure, and so on. The latency of the network model structure refers to the speed of the model, for example on a server or on a mobile phone. When the search task is replaced, the constraint condition changes but no re-search is needed, and transfer learning is easily realized between different search tasks, so search efficiency is markedly improved.
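A minimal sketch of this screening step: build a confidence interval from a structure's predicted mean and variance, and keep the structure only if the interval satisfies the task constraint. The 95% normal interval and the rule that the interval's lower bound must clear the threshold are illustrative assumptions, not the patent's stated criterion.

```python
def confidence_interval(mean, variance, z=1.96):
    """Approximate 95% confidence interval from a predicted mean and variance."""
    half_width = z * variance ** 0.5
    return (mean - half_width, mean + half_width)

def meets_task_constraint(mean, variance, min_performance):
    """Accept a structure only if even the interval's lower bound clears the bar."""
    lower, _ = confidence_interval(mean, variance)
    return lower >= min_performance

ok  = meets_task_constraint(mean=0.92, variance=0.0001, min_performance=0.90)
bad = meets_task_constraint(mean=0.92, variance=0.01,   min_performance=0.90)
```

Note that the two candidates share the same mean; only the variance, i.e., the stability of the structure, separates them.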
In one embodiment, the probability distribution model comprises a Gaussian random field model.
In one example, a conventional Gaussian random field has only the two dimensions of time and space. In this embodiment, the conventional Gaussian process may be extended, using the similarity between different network structures as a bridge, to a high-dimensional space. A kernel function designed in the high-dimensional space represents the correlation between different network model structures, and a mean function represents the relation between a network model structure and its performance. Once the parameters of the Gaussian random field model are known, the performance and confidence interval of any model structure can be deduced.
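Inference with a Gaussian process surrogate follows the standard GP regression equations for the posterior mean and variance. The sketch below assumes a zero mean function and an RBF kernel, one possible choice rather than the patent's specific design, and solves the small (2x2) linear systems by Cramer's rule to stay dependency-free.

```python
import math

def rbf(a, b, length_scale=1.0):
    """RBF kernel between two network codes."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b))
                    / (2.0 * length_scale ** 2))

def solve2(K, v):
    """Solve a 2x2 linear system K x = v by Cramer's rule."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [(v[0] * K[1][1] - K[0][1] * v[1]) / det,
            (K[0][0] * v[1] - v[0] * K[1][0]) / det]

def gp_predict(X, y, x_star, noise=1e-6):
    """Posterior mean and variance of performance at an unseen code x_star."""
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    k_star = [rbf(xi, x_star) for xi in X]
    alpha = solve2(K, y)
    mean = sum(k * a for k, a in zip(k_star, alpha))   # zero prior mean assumed
    beta = solve2(K, k_star)
    var = rbf(x_star, x_star) - sum(k * b for k, b in zip(k_star, beta))
    return mean, var

X = [(0.0, 0.0), (1.0, 1.0)]  # network codes of two sampled structures
y = [0.70, 0.90]              # their measured performances
mu, var = gp_predict(X, y, (0.0, 0.0))
```

At a code that was already sampled, the posterior mean recovers the measured performance and the variance collapses toward zero, matching the confidence-interval behavior described above.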
Example two
In another embodiment, as shown in fig. 3, a searching apparatus 100 of a network model structure is provided, which includes:
a sampling model structure selection module 110 for selecting a sampling model structure from a plurality of network model structures in a search space;
the hyper-parameter updating module 120 is configured to update the hyper-parameters of the probability distribution model by using the performance of the sampling model structure, so as to obtain predicted values of the hyper-parameters;
a model structure performance obtaining module 130, configured to obtain performance of any network model structure in the search space according to the probability distribution model with the predicted value of the hyper-parameter;
and the model structure screening module 140 is used for screening out the network model structure which accords with the search task according to the performance of the network model structure.
In one embodiment, as shown in fig. 4, the searching apparatus 200 of the network model structure is formed on the basis of the searching apparatus 100 of the network model structure, and the hyper-parameter updating module 120 includes:
a prediction performance obtaining unit 1201, configured to obtain prediction performance of the sampling model according to prior distribution of the hyper-parameters of the probability distribution model;
a posterior distribution calculating unit 1202, configured to input the predicted performance, the true performance, and the performance deviation of the sampling model, and the prior distribution of the hyper-parameters into a bayesian estimation algorithm model, and output the posterior distribution of the hyper-parameters;
the iterative calculation unit 1203 is configured to perform iterative calculation for a preset number of times by using the posterior distribution of the hyper-parameter as a prior distribution of the hyper-parameter obtained next time, so as to obtain a predicted value of the hyper-parameter, where the preset number of times is a preset number of iterations according to the search task.
In one embodiment, as shown in fig. 4, the model structure performance obtaining module 130 includes:
a function building unit 1301, configured to build a kernel function and a mean function of the probability distribution model, where the mean function represents a relationship between the same network model structure and performance, and the kernel function represents a relationship between different network model structures;
the variance-mean calculating unit 1302 is configured to input the digital codes corresponding to the network model structure into the kernel function and the mean function respectively to obtain a variance and a mean of the network model structure, where the variance and the mean constitute a performance of the network model structure.
In one embodiment, as shown in FIG. 4, the model structure screening module 140 includes:
a constraint condition acquisition unit 1401 for acquiring a constraint condition of a search task;
a confidence space obtaining unit 1402, configured to obtain a confidence space according to the variance and the mean of the network model structure;
a search result output unit 1403, configured to output the network model structure as a search result that meets the search task when the confidence space meets the constraint condition.
FIG. 5 is a block diagram of an electronic device for the network model structure searching method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in FIG. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). FIG. 5 illustrates an example with one processor 501.
The memory 502 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so as to cause the at least one processor to perform the network model structure search method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the network model structure search method provided by the present application.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the network model structure search method in the embodiments of the present application (e.g., the sampling model structure selection module 110, the hyper-parameter update module 120, the model structure performance acquisition module 130, and the model structure screening module 140 shown in fig. 3). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes various functional applications of the server and performs data processing, that is, implements the network model structure search method in the above method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created by the use of the electronic device for searching the network model structure, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memory located remotely from the processor 501, and such remote memory may be connected via a network to the electronic device for searching the network model structure. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the search method of the network model structure may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for searching the network model structure; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light-Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Application-Specific Integrated Circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, a small number of model structures are selected from the search space as sampling model structures, these sampling model structures are used to update the hyper-parameters of the probability distribution model, and the performance of any model structure in the search space can then be inferred from the updated model. When the search task changes, that is, when the constraint condition changes, the search space does not need to be searched again: since the performance of any model structure is already available, the model structures that satisfy the constraint condition of the new search task are selected directly according to their predicted performance. The search process is thus completely decoupled from the constraint condition, which improves search efficiency and reduces search cost.
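The decoupling described above can be illustrated with a minimal sketch. It is an assumption-laden toy, not the patented implementation: a Gaussian-process-style probability distribution model with a fixed RBF kernel and zero mean function stands in for the model whose hyper-parameters the method would actually learn, and the structure encodings, performance values, and the 0.5 selection threshold are all invented for illustration:

```python
import numpy as np

# Hypothetical 3-dimensional numeric encodings of 8 sampled structures (cube corners)
# and their measured performance (a toy stand-in for accuracy on a validation set).
X_sampled = np.array([[i, j, k] for i in (0.0, 1.0)
                                for j in (0.0, 1.0)
                                for k in (0.0, 1.0)])
y_sampled = np.sin(X_sampled.sum(axis=1))

def rbf_kernel(A, B, length_scale=0.5):
    # Kernel function: models the relationship between *different* structure encodings.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_query, noise=1e-4):
    # Posterior mean and variance of performance for every queried structure
    # (a zero mean function is assumed here for simplicity).
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train)
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ y_train
    var = 1.0 - np.einsum('ij,jk,ik->i', K_s, K_inv, K_s)
    return mean, np.maximum(var, 0.0)

# Predict performance for every candidate in the search space without training any of them.
rng = np.random.default_rng(0)
X_space = rng.uniform(0.0, 1.0, size=(100, 3))
mean, var = gp_predict(X_sampled, y_sampled, X_space)

# Confidence space: mean +/- 2 std. When the constraint changes, only this cheap
# filtering step is re-run, not the search itself.
lower = mean - 2.0 * np.sqrt(var)
selected = np.flatnonzero(lower > 0.5)
```

Re-running only the last two lines with a different threshold yields the candidates for a new search task, which is the sense in which the search is decoupled from the constraint condition.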
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and the present application is not limited in this regard, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for searching a network model structure, comprising:
selecting a sampling model structure from network model structures of a plurality of neural networks in a search space;
updating the hyper-parameters of the probability distribution model by using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
obtaining the performance of any network model structure according to a probability distribution model with the predicted values of the hyper-parameters; the performance of the network model structure comprises a variance and a mean of the network model structure;
acquiring a constraint condition of a search task; the search task comprises at least one of a classification task, a face recognition task and a detection task, and the constraint conditions of the search task comprise: at least one constraint condition of time delay of the network model structure on the client, time delay of the network model structure on the server and size of the network model structure;
obtaining a confidence space according to the variance and the mean of the network model structure; the variance and the mean of the network model structure are obtained based on a kernel function and a mean function of the probability distribution model, and the mean function and the kernel function are designed according to the characteristics of different search tasks;
and under the condition that the confidence space meets the constraint condition, outputting the network model structure as a search result which accords with the search task.
2. The method of claim 1, wherein updating the hyper-parameters of the probability distribution model using the performance of the sampling model structure to obtain the predicted values of the hyper-parameters comprises:
obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
inputting the predicted performance, the real performance and the performance deviation of the sampling model and the prior distribution of the hyper-parameters into a Bayesian estimation algorithm model, and outputting the posterior distribution of the hyper-parameters;
taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters acquired next time, and performing iterative computation for preset times to obtain a predicted value of the hyper-parameters;
and the preset times are iteration times preset according to the search task.
3. The method of claim 1, wherein deriving the performance of any of the network model structures from a probability distribution model having predicted values of the hyper-parameters comprises:
constructing a kernel function and a mean function of the probability distribution model, wherein the kernel function and the mean function comprise predicted values of the hyper-parameters, the mean function represents the relationship between the same network model structure and performance, and the kernel function represents the relationship between different network model structures;
and respectively inputting the digital codes corresponding to the network model structure into the kernel function and the mean function to obtain the variance and the mean of the network model structure.
4. A method according to any one of claims 1 to 3, wherein the probability distribution model comprises a gaussian random field modeling model.
5. A network model structure search apparatus, comprising:
a sampling model structure selection module for selecting a sampling model structure from network model structures of a plurality of neural networks in a search space;
the hyper-parameter update module is used for updating the hyper-parameters of the probability distribution model by utilizing the performance of the sampling model structure to obtain the predicted values of the hyper-parameters;
the model structure performance acquisition module is used for acquiring the performance of any network model structure in the search space according to a probability distribution model with the predicted values of the hyper-parameters; the performance of the network model structure comprises a variance and a mean of the network model structure;
the model structure screening module is used for acquiring a constraint condition of a search task, obtaining a confidence space according to the variance and the mean value of the network model structure, and outputting the network model structure as a search result conforming to the search task under the condition that the confidence space meets the constraint condition;
the search task comprises at least one of a classification task, a face recognition task and a detection task;
the constraints of the search task include: at least one constraint condition of time delay of the network model structure on the client, time delay of the network model structure on the server and size of the network model structure;
the variance and the mean of the network model are obtained based on a kernel function and a mean function of the probability distribution model, and the mean function and the kernel function are designed according to the characteristics of different search tasks.
6. The apparatus of claim 5, wherein the hyper-parameter update module comprises:
the prediction performance obtaining unit is used for obtaining the prediction performance of the sampling model according to the prior distribution of the hyper-parameters of the probability distribution model;
the posterior distribution calculating unit is used for inputting the predicted performance, the real performance and the performance deviation of the sampling model and the prior distribution of the hyper-parameters into a Bayesian estimation algorithm model and outputting the posterior distribution of the hyper-parameters;
and the iteration calculation unit is used for performing iteration calculation of preset times by taking the posterior distribution of the hyper-parameters as the prior distribution of the hyper-parameters acquired next time to obtain the predicted values of the hyper-parameters, wherein the preset times are iteration times preset according to the search task.
7. The apparatus of claim 5, wherein the model structure performance obtaining module comprises:
the function building unit is used for building a kernel function and a mean function of the probability distribution model, the kernel function and the mean function comprise predicted values of the hyper-parameters, the mean function represents the relationship between the structure and the performance of the same network model, and the kernel function represents the relationship between different network model structures;
and the variance-mean calculating unit is used for respectively inputting the digital codes corresponding to the network model structure into the kernel function and the mean function to obtain the variance and the mean of the network model structure.
8. The apparatus of claim 7, wherein the model structure screening module comprises:
a constraint condition obtaining unit, configured to obtain a constraint condition of the search task;
the confidence space acquisition unit is used for obtaining a confidence space according to the variance and the mean of the network model structure;
and the search result output unit is used for outputting the network model structure as a search result which accords with the search task under the condition that the confidence space meets the constraint condition.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-4.
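As a rough illustration of the iterative Bayesian update recited in claims 2 and 6 (prior distribution → likelihood of the deviation between predicted and real performance → posterior distribution, with the posterior reused as the next prior for a preset number of iterations), the following sketch maintains a discretized distribution over a single hypothetical kernel hyper-parameter. The toy predictor, the measured performances, and all numeric values are invented for illustration and are not taken from the patent:

```python
import numpy as np

# Hypothetical discretized prior over one hyper-parameter (e.g., a kernel length scale).
theta_grid = np.linspace(0.1, 2.0, 40)
prior = np.full_like(theta_grid, 1.0 / len(theta_grid))

# Real (measured) performance of three sampled structures, and a toy stand-in for the
# performance that the probability distribution model would predict for a given theta.
y_true = np.array([0.71, 0.65, 0.80])
def predict_perf(theta):
    return np.array([0.70, 0.66, 0.78]) * np.tanh(theta)  # assumed, illustrative only

def bayes_update(prior_dist, sigma=0.05):
    # Likelihood: Gaussian model of the deviation between predicted and real performance.
    log_like = np.array([
        -0.5 * np.sum((predict_perf(t) - y_true) ** 2) / sigma**2
        for t in theta_grid
    ])
    post = prior_dist * np.exp(log_like - log_like.max())  # shift for numerical stability
    return post / post.sum()

# The posterior becomes the next prior; iterate a preset number of times.
dist = prior
for _ in range(5):
    dist = bayes_update(dist)
theta_hat = theta_grid[np.argmax(dist)]  # predicted value of the hyper-parameter
```

Each iteration concentrates the distribution around hyper-parameter values whose predicted performance best matches the measured performance; `theta_hat` plays the role of the "predicted value of the hyper-parameter" that the kernel and mean functions would then use.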
CN201910863417.3A 2019-09-11 2019-09-11 Network model structure searching method and device and electronic equipment Active CN110633797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910863417.3A CN110633797B (en) 2019-09-11 2019-09-11 Network model structure searching method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110633797A CN110633797A (en) 2019-12-31
CN110633797B true CN110633797B (en) 2022-12-02

Family

ID=68970985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910863417.3A Active CN110633797B (en) 2019-09-11 2019-09-11 Network model structure searching method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110633797B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340222B (en) * 2020-02-25 2023-06-13 北京百度网讯科技有限公司 Neural network model searching method and device and electronic equipment
CN111340219A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Neural network model searching method and device, image processing method and processor
CN111340221B (en) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 Neural network structure sampling method and device
CN111488971B (en) * 2020-04-09 2023-10-24 北京百度网讯科技有限公司 Neural network model searching method and device, and image processing method and device
CN111582478B (en) * 2020-05-09 2023-09-22 北京百度网讯科技有限公司 Method and device for determining model structure
CN111612134B (en) * 2020-05-20 2024-04-12 鼎富智能科技有限公司 Neural network structure searching method and device, electronic equipment and storage medium
CN111680599B (en) * 2020-05-29 2023-08-08 北京百度网讯科技有限公司 Face recognition model processing method, device, equipment and storage medium
CN113807376A (en) * 2020-06-15 2021-12-17 富泰华工业(深圳)有限公司 Network model optimization method and device, electronic equipment and storage medium
EP4148623A4 (en) * 2020-09-10 2024-02-07 Aizoth Inc Hyperparameter adjustment device, non-transitory recording medium in which hyperparameter adjustment program is recorded, and hyperparameter adjustment program
CN112348188B (en) * 2020-11-13 2023-04-07 北京市商汤科技开发有限公司 Model generation method and device, electronic device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 A kind of deep neural network training and optimization algorithm based on evolution algorithmic
CN108780519A (en) * 2016-03-11 2018-11-09 奇跃公司 Structure learning in convolutional neural networks
CN109214605A (en) * 2018-11-12 2019-01-15 国网山东省电力公司电力科学研究院 Power-system short-term Load Probability prediction technique, apparatus and system
CN109598332A (en) * 2018-11-14 2019-04-09 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium
CN109716346A (en) * 2016-07-18 2019-05-03 河谷生物组学有限责任公司 Distributed machines learning system, device and method
CN110197258A (en) * 2019-05-29 2019-09-03 北京市商汤科技开发有限公司 Neural network searching method, image processing method and device, equipment and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Heuristic analysis method for multi-layer client-server queueing network models; Zheng Yujie; Coal Technology; 2011-05-10 (No. 05); pp. 233-235 *
Research on population-based hyper-parameter optimization of neural networks; Zhu Huilong et al.; Information Technology; 2018-11-20 (No. 11); pp. 105-110 *
Network security situation prediction method based on harmony search algorithm and relevance vector machine; Li Jie et al.; Journal of Computer Applications; 2016-01-10 (No. 01); pp. 199-203 *
Bayesian neural network modeling and prediction method and its application; Fan Chunling et al.; Journal of Chinese Inertial Technology; 2009-02-15 (No. 01); pp. 89-92 *

Similar Documents

Publication Publication Date Title
CN110633797B (en) Network model structure searching method and device and electronic equipment
CN111539479B (en) Method and device for generating sample data
CN111738414B (en) Recommendation model generation method, content recommendation method, device, equipment and medium
CN111242306B (en) Method, apparatus, electronic device, and computer-readable storage medium for quantum principal component analysis
CN111582479B (en) Distillation method and device for neural network model
CN110795569B (en) Method, device and equipment for generating vector representation of knowledge graph
CN113723278B (en) Training method and device for form information extraction model
CN111639753B (en) Method, apparatus, device and storage medium for training image processing super network
CN112559870B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111461343B (en) Model parameter updating method and related equipment thereof
CN111563593B (en) Training method and device for neural network model
CN110852379B (en) Training sample generation method and device for target object recognition
CN110569969A (en) Network model structure sampling method and device and electronic equipment
CN110580520A (en) model structure sampling device based on hyper-network and electronic equipment
CN111667057A (en) Method and apparatus for searching model structure
CN111767833A (en) Model generation method and device, electronic equipment and storage medium
CN111079945A (en) End-to-end model training method and device
CN111695698A (en) Method, device, electronic equipment and readable storage medium for model distillation
CN111652354B (en) Method, apparatus, device and storage medium for training super network
CN110569973A (en) Network structure searching method and device and electronic equipment
CN111539224A (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111340219A (en) Neural network model searching method and device, image processing method and processor
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111640103A (en) Image detection method, device, equipment and storage medium
CN111753759A (en) Model generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant