CN111340220A - Method and apparatus for training a predictive model - Google Patents

Method and apparatus for training a predictive model Download PDF

Info

Publication number
CN111340220A
CN111340220A (application CN202010116709.3A; granted as CN111340220B)
Authority
CN
China
Prior art keywords
network
sub
trained
sampling
sampling operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010116709.3A
Other languages
Chinese (zh)
Other versions
CN111340220B (en)
Inventor
希滕
张刚
温圣召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010116709.3A priority Critical patent/CN111340220B/en
Publication of CN111340220A publication Critical patent/CN111340220A/en
Application granted granted Critical
Publication of CN111340220B publication Critical patent/CN111340220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the field of artificial intelligence. Embodiments of the present disclosure disclose methods and apparatus for training a prediction model. The prediction model is used for predicting the performance of a neural network structure, and the method includes training the prediction model through sampling operations. A sampling operation includes: sampling sub-networks from a trained super-network, and training the sampled sub-networks to obtain performance information of the trained sub-networks; constructing sample data based on the trained sub-networks and the corresponding performance information, and training the prediction model with the sample data; and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet a preset condition, executing the next sampling operation and increasing the number of sub-networks sampled in the next sampling operation. The method can reduce the search cost of neural network model structures.

Description

Method and apparatus for training a predictive model
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, in particular to the field of artificial intelligence, and more particularly to a method and apparatus for training a prediction model.
Background
With the development of artificial intelligence and data storage technology, deep neural networks have achieved significant results in many fields. The design of a deep neural network's architecture has a direct impact on its performance. Traditionally, deep neural network structures have been designed by hand based on expert experience. Manually designing a network structure requires extensive expert knowledge, and the structure must be designed separately for each task or application scenario, which is costly.
Neural architecture search (NAS) replaces this tedious manual design by automatically searching for an optimal neural network architecture. Existing automatic model structure search can only search under a specific constraint, for example for a particular hardware device model. However, the constraints in real scenarios are complex and varied, involving multiple classes of hardware, such as processors of different models, and each class of hardware may impose numerous search constraints, such as different latency requirements. Existing methods must run a separate network structure search for each constraint, and the large number of repeated search tasks consumes substantial computing resources at very high cost.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatuses, electronic devices, and computer-readable media for training a predictive model.
In a first aspect, embodiments of the present disclosure provide a method for training a predictive model for predicting performance of a neural network structure, the method for training the predictive model including training the predictive model by a sampling operation; the sampling operation comprises the following steps: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; constructing sample data based on the trained sub-network and the corresponding performance information, and training a prediction model by using the sample data; and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation.
In some embodiments, the sampling a sub-network from the trained super-network includes: sampling a sub-network from the trained super-network by adopting an initial recurrent neural network; and before training the sampled subnetwork, the sampling operation further comprises: generating feedback information based on the trained performance information of the sub-network to iteratively update the recurrent neural network based on the feedback information; and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.
In some embodiments, the sampling a sub-network from the trained super-network includes: sampling a sub-network which is not sampled from the trained super-network; and constructing sample data based on the trained sub-network and the corresponding performance information, comprising: and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation.
In some embodiments, the sampling operation further comprises: and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition.
In some embodiments, the above method further comprises: and searching a neural network model structure meeting the performance constraint condition in the model structure search space based on the performance prediction result of the trained prediction model on the model structure in the preset model structure search space and the performance constraint condition of the preset deep learning task scene.
In a second aspect, an embodiment of the present disclosure provides an apparatus for training a prediction model, the prediction model being used for predicting performance of a neural network structure, the apparatus for training the prediction model including a sampling unit configured to train the prediction model by a sampling operation; the sampling operation performed by the sampling unit includes: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; constructing sample data based on the trained sub-network and the corresponding performance information, and training a prediction model by using the sample data; and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation.
The sampling unit for training the prediction model samples the sub-network from the trained super-network as follows: sampling a sub-network from the trained super-network by adopting an initial recurrent neural network; and before training the sampled sub-network, the sampling operation performed by the sampling unit further includes: generating feedback information based on the trained performance information of the sub-network to iteratively update the recurrent neural network based on the feedback information; and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.
The sampling unit for training the prediction model samples the sub-network from the trained super-network as follows: sampling a sub-network which is not sampled from the trained super-network; and the sampling unit constructs sample data according to the following mode: and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation.
The sampling operations for training the predictive model further include: and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition.
The apparatus for training a predictive model further comprises: and the searching unit is configured to search out the neural network model structure meeting the performance constraint condition in the model structure searching space based on the performance prediction result of the trained prediction model on the model structure in the preset model structure searching space and the performance constraint condition of the preset deep learning task scene.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method for training a predictive model as provided in the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for training a predictive model provided in the first aspect.
The method and apparatus for training a prediction model of the above-described embodiments of the present disclosure train the prediction model by a sampling operation; wherein the sampling operation comprises: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; constructing sample data based on the trained sub-network and the corresponding performance information, and training a prediction model by using the sample data; and in response to determining that the accuracy of the prediction model does not meet the preset condition, executing next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation, wherein the prediction model is used for predicting the performance of the neural network structure. The method and the device can obtain the prediction model for predicting the performance of any model structure, so that the model structure with the optimal performance can be obtained only by searching for different constraint conditions once when the method and the device are applied to automatic search of the model structure, resources consumed by model structure search are effectively reduced, and the model structure search cost is reduced.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for training a predictive model according to the present disclosure;
FIG. 3 is a flow diagram of another embodiment of a method for training a predictive model according to the present disclosure;
FIG. 4 is a schematic block diagram illustrating an embodiment of an apparatus for training a predictive model according to the present disclosure;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an example system architecture 100 to which the method for training a prediction model or the apparatus for training a prediction model of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be user-side devices on which various client applications may be installed, such as image processing applications, information analysis applications, voice assistant applications, shopping applications, and financial applications.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server running various services, such as a server running an image or voice data based target tracking, voice processing service. The server 105 may obtain deep learning task data from the terminal devices 101, 102, 103 or obtain deep learning task data from a database to construct training samples, and automatically search and optimize a model structure of a neural network for performing a deep learning task. The server 105 may further run a prediction model for predicting the performance of the neural network structure, and predict the performance of different neural network structures based on the prediction model when performing automatic search of the model structure, thereby quickly determining the neural network model structure with the optimal performance.
In an application scenario of an embodiment of the present disclosure, the server 105 may implement automatic search of a model structure of a neural network through a super network. The server 105 may train the super network based on the acquired deep learning task data, such as media data of images, texts, voices, and the like, and after the super network training is completed, the server 105 may sample a sub-network structure from the super network to execute a corresponding task.
The server 105 may also be a backend server providing backend support for applications installed on the terminal devices 101, 102, 103. For example, the server 105 may receive data to be processed sent by the terminal devices 101, 102, 103, process the data using the neural network model, and return the processing result to the terminal devices 101, 102, 103.
In a real scenario, the terminal devices 101, 102, 103 may send deep learning task requests related to tasks such as voice interaction, text classification, dialogue behavior classification, image recognition, key point detection, etc. to the server 105. A neural network model, which has been trained for a corresponding deep learning task, may be run on the server 105, with which information is processed.
It should be noted that the method for training the prediction model provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for training the prediction model is generally disposed in the server 105.
In some scenarios, the server 105 may obtain from a database, memory, or other device the source data (e.g., training samples, trained completed hyper-networks, etc.) needed to train the predictive model, in which case the example system architecture 100 may be absent of the terminal devices 101, 102, 103 and the network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a predictive model in accordance with the present disclosure is shown.
The predictive model of the present disclosure is used to predict the performance of a neural network structure. The performance of the neural network structure may include at least one of: the accuracy of the neural network structure in executing the corresponding deep learning task, the running power consumption of the neural network structure in the specified hardware or software environment, the running delay of the neural network structure in the specified hardware or software environment, the memory occupancy rate of the neural network structure in the specified hardware or software environment, and the like. It should be noted that, for different hardware or software environments, the corresponding prediction models may be trained separately. Different prediction models can be trained respectively aiming at different deep learning tasks.
The process 200 of the method for training a predictive model in this embodiment includes training a predictive model through a sampling operation. The sampling operation includes the following steps 201 to 203.
In step 201, a sub-network is sampled from the trained super-network, and the sampled sub-network is trained to obtain the performance information of the trained sub-network.
In this embodiment, the executing body of the method for training the prediction model may acquire a pre-trained super-network. The structure of the super-network may be predetermined and contains all the network structures in the network structure search space; each layer of the super-network may include multiple network structure units from the search space. Here, a network structure unit may be a single network layer, such as a single convolutional layer or a single recurrent unit of a recurrent neural network, or a combination of several network layers, such as a convolutional block formed by connecting convolutional layers, batch normalization layers, and nonlinear layers. In the super-network, each network structure unit may be connected to all network structure units in the layers above and below it. After training of the super-network is completed, all internal network structures share its parameters when different sub-networks are constructed.
The sub-networks may be sampled randomly in the super-network or may be sampled out of the super-network using a trained recurrent neural network. It should be noted that in each sampling operation, a plurality of sub-networks may be sampled.
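As an illustration of the parameter-sharing super-network described above, the following is a minimal sketch (not the patented implementation) in which each layer holds several candidate structure units, a sub-network is a choice of one unit per layer, and sampled sub-networks reuse the super-network's weights. The class and function names (`SuperNetLayer`, `SuperNet`, `sample_subnet`), the candidate operations, and all sizes are illustrative assumptions.

```python
import random

import torch
import torch.nn as nn


class SuperNetLayer(nn.Module):
    """One super-network layer holding several candidate network structure units."""

    def __init__(self, channels):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.BatchNorm2d(channels), nn.ReLU()),
            nn.Sequential(nn.Conv2d(channels, channels, 5, padding=2),
                          nn.BatchNorm2d(channels), nn.ReLU()),
            nn.Identity(),  # a skip connection as one candidate unit
        ])

    def forward(self, x, choice):
        # 'choice' selects which candidate unit the current sub-network uses;
        # the unit's parameters are shared by every sub-network that selects it.
        return self.candidates[choice](x)


class SuperNet(nn.Module):
    """Super-network containing all structures of the search space."""

    def __init__(self, channels=16, num_layers=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList([SuperNetLayer(channels) for _ in range(num_layers)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, arch):
        # 'arch' is a list with one candidate index per layer, i.e. a sub-network.
        x = self.stem(x)
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)


def sample_subnet(num_layers=4, num_candidates=3):
    """Randomly sample a sub-network as one candidate index per layer."""
    return [random.randrange(num_candidates) for _ in range(num_layers)]
```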
Training data may be acquired to train the sampled subnetwork. The training data may be media data such as images, texts, voices and videos, or digital data such as positions, prices and time, and may be determined according to the performed deep learning task, for example, if the deep learning task is an image classification task, the training data is image data.
In this embodiment, the training data may be data with label information, and during the training process of the sub-network, the error of the sub-network is determined based on the label information of the training data input to the sub-network, and then the parameters of the sub-network are iteratively adjusted by means of back propagation of the error, so that the parameters of the sub-network are gradually optimized during the training process.
After training of the sub-network is complete, the performance of the sub-network may be tested using the test data. The test data may also have label information, and the performance information of the trained sub-network is obtained according to the processing result of the trained sub-network on the test data and the corresponding label information.
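A hedged sketch of how the performance information of a trained sub-network might be collected is shown below: it measures test accuracy and mean inference latency over labeled test data. The `model` and `test_loader` arguments are assumed to exist (for example, a sampled sub-network and a PyTorch DataLoader), and the latency measurement is only illustrative.

```python
import time

import torch


@torch.no_grad()
def evaluate_subnet(model, test_loader, device="cpu"):
    """Return (accuracy, mean latency per batch in seconds) for a trained sub-network."""
    model.to(device).eval()
    correct, total, elapsed, batches = 0, 0, 0.0, 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        start = time.perf_counter()
        logits = model(images)
        elapsed += time.perf_counter() - start
        batches += 1
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / total, elapsed / max(batches, 1)
```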
In step 202, sample data is constructed based on the trained sub-networks and corresponding performance information, and the predictive model is trained using the sample data.
In this embodiment, the sub-network trained in step 201 and the performance information thereof may be used to construct a pair of sample data, in which the sub-network is the input information and the corresponding performance information is the label information corresponding to the input information.
Alternatively, a part of the sample data may be used as training samples, and another part of the sample data may be used as test samples.
The predictive model to be trained may be trained using the sample data. The structure of the prediction model to be trained may be pre-constructed, for example, it may be automatically searched out from the search space based on the NAS method, or it may be a network structure such as a preset convolutional neural network, a cyclic neural network, etc. In this embodiment, the sub-networks in the sample data may be encoded and then input into the prediction model to be trained, and the prediction model to be trained may predict the performance information of the input sub-networks. And constructing an objective function according to the difference between the prediction result of the prediction model to be trained on the performance information of the sub-network and the labeling information of the sub-network, and iteratively adjusting the parameters of the prediction model to be trained by minimizing the objective function. When the value of the objective function converges to a preset range, or the number of times of iteratively adjusting the parameter of the prediction model to be trained reaches a preset number threshold, the parameter of the prediction model to be trained can be fixed, and the prediction model generated in the current sampling operation is obtained.
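A minimal sketch of the predictor training described above is given below, assuming each sub-network is encoded as a one-hot vector per layer and the predictor is a small fully connected regressor trained with a mean-squared-error objective. The encoding scheme, the predictor architecture, and all hyperparameters are assumptions for illustration, not the required implementation.

```python
import torch
import torch.nn as nn

NUM_LAYERS, NUM_CANDIDATES = 4, 3


def encode_arch(arch):
    """One-hot encode a sub-network given as one candidate index per layer."""
    code = torch.zeros(NUM_LAYERS, NUM_CANDIDATES)
    for layer, choice in enumerate(arch):
        code[layer, choice] = 1.0
    return code.flatten()


class PerformancePredictor(nn.Module):
    """Predicts a scalar performance value (e.g. accuracy) from an encoded structure."""

    def __init__(self, in_dim=NUM_LAYERS * NUM_CANDIDATES, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)


def train_predictor(samples, epochs=200, lr=1e-3):
    """'samples' is a list of (arch, performance) pairs built from trained sub-networks."""
    x = torch.stack([encode_arch(arch) for arch, _ in samples])
    y = torch.tensor([perf for _, perf in samples], dtype=torch.float32)
    predictor = PerformancePredictor()
    optimizer = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(predictor(x), y)
        loss.backward()
        optimizer.step()
    return predictor
```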
Because the sub-networks are sampled from the trained super-network, the initial parameters of each sub-network already have relatively high accuracy, so the sub-networks converge quickly when trained in step 202. The scheme of this embodiment can therefore complete each sampling operation quickly, which accelerates the training of the prediction model.
In step 203, in response to determining that the accuracy of the prediction model does not satisfy the preset condition, a next sampling operation is performed, and the number of sampled sub-networks is increased in the next sampling operation.
After training of the prediction model is stopped in the current sampling operation, the prediction accuracy of the prediction model trained in the current sampling operation can be tested using the test samples. Specifically, the performance information of each sub-network in the test samples can be predicted with the prediction model trained in the current sampling operation, and the prediction accuracy of the prediction model is obtained by comparing the prediction results with the labeling information of each sub-network in the test samples.
If the precision of the prediction model trained in the current sampling operation does not meet a preset condition, for example, does not meet a preset precision threshold, the next sampling operation may be performed, and the number of sub-networks to be sampled is increased in the next sampling operation. The number of sub-networks to be added may be predetermined, for example, by adding 500 sub-networks to each sampling operation compared to the previous sampling operation. Therefore, in two adjacent sampling operations, the number of samples of the prediction model in the next sampling operation is increased, and the accuracy can be better than that in the previous sampling operation. And by gradually increasing the number of the sampled sub-networks, excessive memory resources consumed by the training of the prediction model due to the excessive number of samples can be avoided.
The method for training a prediction model of the above embodiment trains the prediction model through a sampling operation, where the sampling operation includes: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; constructing sample data based on the trained sub-network and the corresponding performance information, and training a prediction model by using the sample data; and in response to determining that the accuracy of the prediction model does not meet the preset condition, performing a next sampling operation, and increasing the number of sub-networks sampled in the next sampling operation. The method can obtain the prediction model for predicting the performance of any model structure, so that the model structure with the optimal performance can be obtained only by searching for different constraint conditions once when the method is applied to the automatic search of the model structure, the resources consumed by the search of the model structure are effectively reduced, and the search cost of the model structure is reduced.
Optionally, the sampling operation may further include: and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition. If the precision of the prediction model obtained by training in the current sampling operation reaches a preset precision threshold, the prediction model can be used as the trained prediction model. Therefore, after the number of sub-networks to be sampled is gradually increased through a plurality of sampling operations and the number of samples of the prediction model is increased to gradually optimize the prediction model, the sampling operation can be stopped when the precision of the prediction model meets the preset condition, and the excessive sampling operation is prevented from consuming memory resources.
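Putting the pieces together, the sketch below shows one possible form of the overall training loop: each sampling operation adds previously unsampled sub-networks to the sample pool, retrains the predictor, tests its precision on held-out samples, and either stops or enlarges the next sampling operation. The helper functions (`sample_subnet`, `train_and_evaluate_subnet`, `train_predictor`, `predictor_precision`) and the concrete numbers are assumptions used only for illustration.

```python
def train_predictor_by_sampling(supernet,
                                initial_count=100,
                                increment=500,
                                precision_threshold=0.95,
                                max_operations=10):
    """Train the performance predictor through repeated sampling operations."""
    sample_pool = []   # (arch, performance) pairs accumulated across sampling operations
    seen = set()       # architectures already sampled; they are not sampled again
    count = initial_count
    predictor = None
    for _ in range(max_operations):
        # Sample only sub-networks that have not been sampled in earlier operations.
        while len(sample_pool) < count:
            arch = tuple(sample_subnet())
            if arch in seen:
                continue
            seen.add(arch)
            performance = train_and_evaluate_subnet(supernet, list(arch))
            sample_pool.append((list(arch), performance))
        # Train on most of the pool and test the predictor's precision on the rest.
        split = int(0.8 * len(sample_pool))
        predictor = train_predictor(sample_pool[:split])
        if predictor_precision(predictor, sample_pool[split:]) >= precision_threshold:
            break              # precision meets the preset condition: stop sampling
        count += increment     # otherwise enlarge the next sampling operation
    return predictor
```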
The trained prediction model can predict the performance of a given neural network model. In a practical scenario, before a function in an application goes online, the trained prediction model can be used to predict the performance of the neural network model implementing that function, and the prediction result can serve as reference information for evaluating the stability and reliability of the function.
In some optional implementations of the foregoing embodiment, the step of sampling a subnetwork from the trained super-network includes: and sampling the sub-networks which are not sampled from the trained super-network. That is, in each sampling operation, the sub-networks that have been sampled in the previous sampling operation are not repeatedly sampled, and each sampling operation samples a new batch of sub-networks. At this time, the constructing of sample data based on the trained sub-network and the corresponding performance information includes: and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation. That is, the sub-network sampled by the current sampling operation may be added to the sample data constructed in the previous sampling operation to expand the sample data. Therefore, the consumption of operation resources caused by sub-network sampling can be minimized, the number of sample data can be increased step by step, and the memory resources occupied by the prediction model can be reduced.
Optionally, in the above sampling operation, the sampling of the sub-network of step 201 may be implemented as follows: and sampling a sub-network from the trained super-network by adopting an initial recurrent neural network. Before performing step 202 to train the sampled subnetwork, the sampling operation may further include: generating feedback information based on the trained performance information of the sub-network to iteratively update the recurrent neural network based on the feedback information; and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.
In particular, in the process of training the prediction model, a recurrent neural network for sampling a sub-network from the super-network may also be trained. In each sampling operation, a recurrent neural network to be trained can be adopted to sample the sub-networks, parameters of the recurrent neural network to be trained can be initialized randomly, and then information such as errors of the sub-networks sampled by the recurrent neural network to be trained is fed back to the recurrent neural network as feedback information, so that the recurrent neural network updates the parameters according to the feedback information and re-samples the sub-networks.
Therefore, the recurrent neural network can be optimized by training the recurrent neural network based on the sub-network sampling result, so that the sub-network sampling result is optimized, and the prediction accuracy of the prediction model trained based on the sub-network sampling result is improved.
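The optional recurrent-neural-network sampler described above could be sketched as a small controller that emits one candidate index per layer and is updated by feeding back the sampled sub-network's performance as a reward. A REINFORCE-style policy-gradient update is assumed here purely for illustration; the patent does not prescribe this particular update rule, and all sizes and names are assumptions.

```python
import torch
import torch.nn as nn


class RNNController(nn.Module):
    """Recurrent controller that samples one candidate index per super-network layer."""

    def __init__(self, num_layers=4, num_candidates=3, hidden=32):
        super().__init__()
        self.num_layers = num_layers
        self.embed = nn.Embedding(num_candidates, hidden)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.logits = nn.Linear(hidden, num_candidates)

    def sample(self):
        """Return a sampled architecture and the log-probability of sampling it."""
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros_like(h)
        inp = torch.zeros(1, self.cell.hidden_size)
        arch, log_prob = [], torch.zeros(())
        for _ in range(self.num_layers):
            h, c = self.cell(inp, (h, c))
            dist = torch.distributions.Categorical(logits=self.logits(h))
            choice = dist.sample()
            log_prob = log_prob + dist.log_prob(choice).sum()
            arch.append(choice.item())
            inp = self.embed(choice)
        return arch, log_prob


def update_controller(controller, optimizer, log_prob, reward, baseline=0.0):
    """Feed the sub-network's performance back to the controller as a reward signal."""
    loss = -(reward - baseline) * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In use, one would create `controller = RNNController()` and `optimizer = torch.optim.Adam(controller.parameters(), lr=3e-4)`, sample an architecture, train and evaluate the corresponding sub-network, and pass its performance to `update_controller` before sampling again.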
Referring to FIG. 3, a flow diagram of another embodiment of a method for training a predictive model of the present disclosure is shown. As shown in fig. 3, a flow 300 of the method for training a prediction model of the present embodiment includes:
step 301, training a prediction model through a sampling operation.
The sampling operation includes the following steps 3011, 3012, and 3013.
And step 3011, sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network.
And step 3012, constructing sample data based on the trained sub-networks and the corresponding performance information, and training the prediction model by using the sample data.
Step 3013, in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, performing the next sampling operation, and increasing the number of sub-networks to be sampled in the next sampling operation.
Optionally, in step 3011, an initial recurrent neural network may be used to sample a subnetwork from the trained super-network; and before performing step 3012 to train the sampled subnetwork, the sampling operation further comprises: generating feedback information based on the trained performance information of the sub-network to iteratively update the recurrent neural network based on the feedback information; and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.
Optionally, the step 3011 of sampling out the sub-network from the training completed super-network may include: and sampling the sub-networks which are not sampled from the trained super-network. And in the step 3012, sample data may be constructed as follows: and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation.
Optionally, the sampling operation may further include: and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition.
Step 3011, step 3012, and step 3013 in the sampling operation 301 are respectively the same as step 201, step 202, and step 203 in the foregoing embodiment, and the specific implementation manner of step 3011, step 3012, and step 3013 and the optional implementation manner of the sampling operation may refer to the description of the corresponding step in the foregoing embodiment, which is not described herein again.
In this embodiment, the method for training a prediction model further includes:
and step 302, searching a neural network model structure meeting the performance constraint condition in the model structure search space based on the performance prediction result of the trained prediction model on the model structure in the preset model structure search space and the performance constraint condition of the preset deep learning task scene.
The preset model structure search space may be a search space constructed for a specified deep learning task, such as a search space containing convolutional layers constructed for an image processing task, a search space containing Attention units (Attention) constructed for sequence data of text or voice, or the like. The performance of each model structure in the search space can be predicted by using the prediction model trained in step 301. And then matching the performance prediction result of each model structure with a preset performance constraint condition of a deep learning task scene, and taking the successfully matched model structure as a searched neural network model structure meeting the performance constraint condition. The searched neural network model structure can be used for executing the task data in the preset deep learning task scene.
The performance constraints described above may be determined by the hardware or software environment of the device running the neural network model structure. For example, if the latency requirement for running the neural network model on a certain chip is 0.2 seconds, network structures satisfying this latency condition can be searched in the search space. Alternatively, the performance constraints may be determined by the requirements of the task performed by the neural network model. For example, if a function in an application needs to reach an accuracy of 95%, a neural network model structure with an accuracy of not less than 95% can be searched from the search space.
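As a sketch of how this search step might look with the trained predictor(s), the code below enumerates candidate structures, predicts their performance, discards candidates that violate an assumed latency constraint, and keeps the one with the best predicted accuracy. The two-predictor setup, the `encode_arch` helper (as sketched earlier), and the constraint values are illustrative assumptions.

```python
import itertools


def search_best_structure(accuracy_predictor, latency_predictor,
                          num_layers=4, num_candidates=3, max_latency=0.2):
    """Pick the structure with the best predicted accuracy under a latency constraint."""
    best_arch, best_acc = None, float("-inf")
    for arch in itertools.product(range(num_candidates), repeat=num_layers):
        code = encode_arch(list(arch)).unsqueeze(0)  # batch of one encoded structure
        if latency_predictor(code).item() > max_latency:
            continue                                 # violates the performance constraint
        accuracy = accuracy_predictor(code).item()
        if accuracy > best_acc:
            best_arch, best_acc = list(arch), accuracy
    return best_arch, best_acc
```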
Based on the performance prediction result of the prediction model on the network structure in the search space and the preset performance constraint condition, the appropriate neural network model structure can be quickly searched out. Therefore, the prediction model can be flexibly applied to searching for the neural network model structures suitable in different scenes.
Referring to fig. 4, as an implementation of the method for training a prediction model, the present disclosure provides an embodiment of an apparatus for training a prediction model, which corresponds to the method embodiments shown in fig. 2 and fig. 3, and which can be applied to various electronic devices. Wherein the prediction model is used to predict the performance of the neural network structure.
As shown in fig. 4, the apparatus 400 for training a prediction model of the present embodiment includes a sampling unit 401. The sampling unit 401 is configured to train a prediction model by a sampling operation; the sampling operation performed by the sampling unit includes: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; constructing sample data based on the trained sub-network and the corresponding performance information, and training a prediction model by using the sample data; and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation.
In some embodiments, the sampling unit 401 samples the sub-network from the trained super-network as follows: sampling a sub-network from the trained super-network by adopting an initial recurrent neural network; and before training the sampled sub-network, the sampling operation performed by the sampling unit further includes: generating feedback information based on the trained performance information of the sub-network to iteratively update the recurrent neural network based on the feedback information; and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.
In some embodiments, the sampling unit 401 samples the sub-network from the trained super-network as follows: sampling a sub-network which is not sampled from the trained super-network; the sampling unit 401 constructs sample data as follows: and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation.
In some embodiments, the sampling operation further comprises: and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition.
In some embodiments, the above apparatus further comprises: and the searching unit is configured to search out the neural network model structure meeting the performance constraint condition in the model structure searching space based on the performance prediction result of the trained prediction model on the model structure in the preset model structure searching space and the performance constraint condition of the preset deep learning task scene.
The sampling unit 401 in the above-described apparatus 400 corresponds to the steps in the method described with reference to fig. 2 and 3. Thus, the operations, features and technical effects described above for the method for training the prediction model are also applicable to the apparatus 400 and the units included therein, and are not described herein again.
Referring now to FIG. 5, a schematic diagram of an electronic device (e.g., the server shown in FIG. 1) 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: training a prediction model through sampling operation; the sampling operation comprises the following steps: sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network; constructing sample data based on the trained sub-network and the corresponding performance information, and training a prediction model by using the sample data; and in response to the fact that the precision of the prediction model obtained by training in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks to be sampled in the next sampling operation, wherein the prediction model is used for predicting the performance of the neural network structure.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a sampling unit. Where the names of these units do not in some cases constitute a limitation of the unit itself, for example, a sampling unit may also be described as a "unit that trains a prediction model through a sampling operation".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept as defined above. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for training a predictive model for predicting the performance of a neural network structure, the method comprising training the predictive model by a sampling operation;
the sampling operation comprises:
sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network;
constructing sample data based on the trained sub-network and the corresponding performance information, and training the prediction model by using the sample data;
and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation.
2. The method of claim 1, wherein said sampling a subnetwork from a trained super-network comprises:
sampling a sub-network from the trained super-network by adopting an initial recurrent neural network; and
before training the sampled subnetwork, the sampling operation further comprises:
generating feedback information based on the performance information of the trained sub-network to iteratively update the recurrent neural network based on the feedback information;
and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.
3. The method of claim 1, wherein said sampling a subnetwork from a trained super-network comprises:
sampling a sub-network which is not sampled from the trained super-network; and
the constructing of sample data based on the trained sub-networks and corresponding performance information comprises:
and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation.
4. The method of claim 1, wherein the sampling operation further comprises:
and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition.
5. The method of any of claims 1-4, wherein the method further comprises:
and searching a neural network model structure meeting the performance constraint condition in the model structure search space based on the performance prediction result of the trained prediction model on the model structure in the preset model structure search space and the performance constraint condition of the preset deep learning task scene.
6. An apparatus for training a prediction model for predicting performance of a neural network structure, the apparatus comprising a sampling unit configured to train the prediction model by a sampling operation;
the sampling operation performed by the sampling unit comprises:
sampling a sub-network from the trained super-network, and training the sampled sub-network to obtain the performance information of the trained sub-network;
constructing sample data based on the trained sub-network and the corresponding performance information, and training the prediction model by using the sample data;
and in response to determining that the precision of the prediction model trained in the current sampling operation does not meet the preset condition, executing the next sampling operation, and increasing the number of sub-networks for sampling in the next sampling operation.
7. The apparatus of claim 6, wherein the sampling unit samples a subnetwork from a trained super-network as follows:
sampling a sub-network from the trained super-network by adopting an initial recurrent neural network; and
before training the sampled sub-network, the sampling operation performed by the sampling unit further includes:
generating feedback information based on the performance information of the trained sub-network to iteratively update the recurrent neural network based on the feedback information;
and re-sampling the sub-networks from the trained super-network based on the iteratively updated recurrent neural network.
8. The apparatus of claim 6, wherein the sampling unit samples a subnetwork from a trained super-network as follows:
sampling a sub-network which is not sampled from the trained super-network; and
the sampling unit constructs sample data according to the following mode:
and constructing sample data based on the sub-networks and the corresponding performance information sampled in the current sampling operation and the sub-networks and the corresponding performance information sampled in the last sampling operation.
9. The apparatus of claim 6, wherein the sampling operation further comprises:
and generating a trained prediction model based on a training result of the current sampling operation in response to determining that the precision of the prediction model meets a preset condition.
10. The apparatus of any one of claims 6-9, wherein the apparatus further comprises:
a searching unit configured to search out, from a preset model structure search space, a neural network model structure that meets a performance constraint condition of a preset deep learning task scenario, based on performance prediction results of the trained prediction model for the model structures in the search space and on the performance constraint condition.
11. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN202010116709.3A 2020-02-25 2020-02-25 Method and apparatus for training predictive models Active CN111340220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116709.3A CN111340220B (en) 2020-02-25 2020-02-25 Method and apparatus for training predictive models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116709.3A CN111340220B (en) 2020-02-25 2020-02-25 Method and apparatus for training predictive models

Publications (2)

Publication Number Publication Date
CN111340220A (en) 2020-06-26
CN111340220B CN111340220B (en) 2023-10-20

Family

ID=71183586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116709.3A Active CN111340220B (en) 2020-02-25 2020-02-25 Method and apparatus for training predictive models

Country Status (1)

Country Link
CN (1) CN111340220B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254347A1 (en) * 2014-03-04 2015-09-10 Palo Alto Research Center Incorporated System and method for direct storage access in a content-centric network
CN106777006A (en) * 2016-12-07 2017-05-31 重庆邮电大学 A kind of sorting algorithm based on parallel super-network under Spark
CN110288084A (en) * 2019-06-06 2019-09-27 北京小米智能科技有限公司 Super-network training method and device
CN110490303A (en) * 2019-08-19 2019-11-22 北京小米智能科技有限公司 Super-network construction method, application method, device and medium
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device
CN110782034A (en) * 2019-10-31 2020-02-11 北京小米智能科技有限公司 Neural network training method, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MUYUAN FANG et al.: "BETANAS: Balanced Training and Selective Drop for Neural Architecture Search", arXiv, pages 1-11 *
ZHANG Xuanyang (张选杨): "Optimization and Design of Deep Neural Network Architectures", China Master's Theses Full-text Database, Information Science and Technology, No. 01, pages 140-260 *
JIANG Bingqing (蒋冰青): "Research on a Multi-objective Convolutional Neural Network Search Method Based on the QUATRE Algorithm", China Master's Theses Full-text Database, Information Science and Technology, No. 02, pages 140-231 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022126448A1 (en) * 2020-12-16 2022-06-23 华为技术有限公司 Neural architecture search method and system based on evolutionary learning
CN112633471A (en) * 2020-12-17 2021-04-09 苏州浪潮智能科技有限公司 Method, system, device and medium for constructing neural network architecture search framework
CN112633471B (en) * 2020-12-17 2023-09-26 苏州浪潮智能科技有限公司 Method, system, equipment and medium for constructing neural network architecture search framework
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment
CN112949842A (en) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium
CN112949662A (en) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 Image processing method and device, computer equipment and storage medium
CN112949842B (en) * 2021-05-13 2021-09-14 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium

Also Published As

Publication number Publication date
CN111340220B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN110807515B (en) Model generation method and device
US11836576B2 (en) Distributed machine learning at edge nodes
CN110288049B (en) Method and apparatus for generating image recognition model
CN111340220B (en) Method and apparatus for training predictive models
CN110852438B (en) Model generation method and device
CN108520470B (en) Method and apparatus for generating user attribute information
CN110766142A (en) Model generation method and device
CN111523640B (en) Training method and device for neural network model
CN110688528B (en) Method, apparatus, electronic device, and medium for generating classification information of video
CN111340221B (en) Neural network structure sampling method and device
CN111368973B (en) Method and apparatus for training a super network
CN109829164B (en) Method and device for generating text
CN112650841A (en) Information processing method and device and electronic equipment
CN111061956A (en) Method and apparatus for generating information
CN111104599B (en) Method and device for outputting information
CN111353601A (en) Method and apparatus for predicting delay of model structure
WO2022188534A1 (en) Information pushing method and apparatus
US11410023B2 (en) Lexicographic deep reinforcement learning using state constraints and conditional policies
CN111767290B (en) Method and apparatus for updating user portraits
CN111523639A (en) Method and apparatus for training a hyper-network
CN109857838B (en) Method and apparatus for generating information
CN111353585A (en) Structure searching method and device of neural network model
CN113361677A (en) Quantification method and device of neural network model
CN110942306A (en) Data processing method and device and electronic equipment
CN113361678A (en) Training method and device of neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant