WO2022083624A1 - Method and device for acquiring a model (一种模型的获取方法及设备)

Publication number: WO2022083624A1
Authority: WO - WIPO (PCT)
Prior art keywords: model, trained, target, models, derivative
Application number: PCT/CN2021/124924
Other languages: English (en), French (fr)
Inventors: 王波超, 康宁, 徐航, 黄国位, 张维, 李震国
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication: WO2022083624A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/04 - Constraint-based CAD

Definitions

  • the present application relates to the field of machine learning, and in particular, to a method and device for acquiring a model.
  • Artificial Intelligence is the use of computers or computer-controlled machines to simulate, extend and expand human intelligence. Artificial intelligence includes the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Transfer learning is a machine learning method in which a model developed for task A (whose data may be called the first data set) is taken as the starting point and reused for a new task B (whose data may be called the second data set); that is, a pre-trained model obtained on task A is reused on another task B.
  • In practice there are a large number of models trained on existing tasks, i.e., pre-trained models. How to efficiently select, from this large number of models, a model suitable for the new task together with a suitable set of hyperparameters is an urgent problem.
  • A common approach is to take a model trained on an existing related task (such as the ImageNet classification task) and apply transfer learning (such as fine-tuning) to the new task.
  • An existing solution is to manually select, based on experience, a model pre-trained on an open data set (such as ImageNet or another original data set) and a set of hyperparameters (possibly manually fine-tuned) for transfer learning. The selected model is then retrained on the new task in an attempt to reach the target accuracy. However, a model with high output accuracy on the original data set is not necessarily well suited to the new learning task (i.e., the second data set), so the whole process may involve multiple rounds of model selection and hyperparameter selection (and may even require manually designing new models), and each round of training costs substantial time and computing power.
  • The embodiments of the present application provide a method and device for acquiring a model. The method jointly considers model selection and hyperparameter selection: a constructed first predictor is used to rapidly predict the performance of each model in a constraint-based model set on the new task under different hyperparameters, and the model and hyperparameters that satisfy a preset condition (for example, the maximum predicted output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set). For a new task, the method can thus efficiently select an appropriate model and hyperparameters based on the constraints given by the user, saving training time and computing power.
  • an embodiment of the present application first provides a method for acquiring a model, which can be used in the field of artificial intelligence.
  • The method includes: first, a model set (also referred to as a model library; hereinafter collectively a model set) is constructed based on constraints, the model set including at least two models that have been pre-trained on the first data set. After that, random sampling is performed in the hyperparameter space to obtain a set of hyperparameters, which may be called the first hyperparameters. The constructed first predictor then predicts the first output accuracy of any model in the model set on the second data set under the first hyperparameters, where each model corresponds to a first output accuracy; the prediction may cover one model in the model set, several models, or every model, which is not limited here, and each model has its corresponding hyperparameters (i.e., the first hyperparameters). In other words, with a model's hyperparameters set to the first hyperparameters, the output accuracy of any model in the model set on the second data set (which may be called the first output accuracy) is predicted by a constructed predictor (which may be called the first predictor), where the second data set is the data set of the new task.
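  • For concreteness, the following is a minimal sketch (not the patent's implementation) of this joint selection loop; the hyperparameter space contents, the callable `first_predictor`, and `n_trials` are illustrative assumptions.

```python
import random

# Hypothetical hyperparameter space; the patent does not fix its contents.
HP_SPACE = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128], "epochs": [10, 20, 40]}

def sample_hyperparams():
    """Randomly sample one set of hyperparameters (the 'first hyperparameters')."""
    return {name: random.choice(values) for name, values in HP_SPACE.items()}

def select_target(model_set, dataset_code, first_predictor, n_trials=100):
    """Predict the first output accuracy of each (model, hyperparameters) pair
    and keep the pair with the largest predicted accuracy."""
    best_model, best_hp, best_acc = None, None, float("-inf")
    for _ in range(n_trials):
        hp = sample_hyperparams()
        for model in model_set:
            acc = first_predictor(model, hp, dataset_code)  # predicted, not trained
            if acc > best_acc:
                best_model, best_hp, best_acc = model, hp, acc
    return best_model, best_hp  # the target model and target hyperparameters
```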
  • A target output accuracy that satisfies a first preset condition is then selected from the first output accuracies, and the model and hyperparameters corresponding to the target output accuracy are called the target model and the target hyperparameters.
  • The target model and the target hyperparameters are used as the model and hyperparameters for finally processing the second data set; that is, the selected target model and target hyperparameters are used for transfer learning on the new second data set. After the target model and target hyperparameters have been determined from the model set and the hyperparameter space through the above steps, the target model can be trained on the second data set using the target hyperparameters to obtain a trained target model.
  • In the above manner, model selection and hyperparameter selection are considered jointly: the constructed first predictor quickly predicts how each model in the constraint-based model set performs on the new task under different hyperparameters, and the model and hyperparameters satisfying the preset condition (such as maximum output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set). The method can therefore efficiently select an appropriate model and hyperparameters based on the constraints given by the user, saving training time and computing power.
  • The technical effect achieved by the embodiments of the present application is that, in an actual service delivery process, a suitable model is found for a new task (i.e., the second data set) within a limited time and trained to the accuracy required for delivery; in other words, the best model and the best set of hyperparameters are chosen for the new task.
  • In one possible implementation, the input data of the constructed first predictor are a set of hyperparameters sampled from the hyperparameter space (i.e., the first hyperparameters), any model in the model set, and the second data set; the output is a prediction of that model's output accuracy on the second data set under the first hyperparameters. Specifically, the first hyperparameters, the model and the second data set are each encoded to obtain a hyperparameter encoding, a model encoding and a second data set encoding; these three encodings are input to the first predictor, which outputs the predicted first output accuracy of the model on the second data set under the first hyperparameters.
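  • As a sketch of one way such a predictor could combine the three encodings (the embedding dimensions and the MLP architecture are assumptions; the source only specifies that the three encodings are input and a predicted accuracy is output):

```python
import torch
import torch.nn as nn

class FirstPredictor(nn.Module):
    """Maps (hyperparameter encoding, model encoding, data set encoding)
    to one predicted output accuracy in [0, 1]."""
    def __init__(self, hp_dim=8, model_dim=32, data_dim=16, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hp_dim + model_dim + data_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # squash to an accuracy-like value in [0, 1]
        )

    def forward(self, hp_code, model_code, data_code):
        # Concatenate the three encodings and regress the first output accuracy.
        x = torch.cat([hp_code, model_code, data_code], dim=-1)
        return self.mlp(x).squeeze(-1)
```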
  • In one possible implementation, since the constructed first predictor is initially untrained, it may be initialized using existing tasks. After the second data set, as a new task, has been processed, it can in turn be treated as the next existing task and used to update the parameters of the first predictor, thereby improving its prediction accuracy. Specifically, the parameters of the first predictor can be updated according to the second output accuracy, the second data set, the target hyperparameters and the target model, where the second output accuracy is the output accuracy of the trained target model on the second data set. The first output accuracy is only a rough prediction by the predictor, whereas the second output accuracy is obtained by actual training; updating the parameters of the first predictor with the accuracy from actual training therefore improves its prediction accuracy.
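  • A minimal sketch of this update step, assuming the predictor above and a squared-error objective (the loss and optimizer choice are not specified in the source):

```python
def update_first_predictor(predictor, optimizer, hp_code, model_code, data_code, second_acc):
    """One regression step: fit the predictor to the second output accuracy
    actually measured after training the target model on the second data set."""
    optimizer.zero_grad()
    predicted = predictor(hp_code, model_code, data_code)
    loss = ((predicted - second_acc) ** 2).mean()  # assumed squared-error objective
    loss.backward()
    optimizer.step()
    return loss.item()
```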
  • In one possible implementation, the target output accuracy satisfying the first preset condition includes: the target output accuracy is the largest of the first output accuracies. It should be noted that model performance can be evaluated by metrics other than output accuracy, for example error rate (the smaller, the better) or precision (the higher, the better); in the embodiments of the present application, output accuracy is used as an example. The target model may thus be determined by selecting the model corresponding to the largest of all first output accuracies: in general, the larger the output accuracy, the better the model performs under the corresponding hyperparameters, so the optimal model and hyperparameters can be selected accordingly.
  • In one possible implementation, an initial model set may first be constructed based on the constraints; the initial model set includes at least two trained initial models, each of which has been trained on the existing open first data set.
  • After the initial model set is constructed, a group of derivative models corresponding to each initial model can be obtained by an evolutionary algorithm (EA), each group including at least one derivative model; how many derivative models each group contains can be configured in the evolutionary algorithm and is not specifically limited here. Each derivative model produced by the evolutionary algorithm is an untrained model.
  • Because the derivative models are untrained, the present application also constructs a second predictor, whose function is to predict the output accuracy of each derivative model on the first data set (which may be called the third output accuracy). The third output accuracy is a rough prediction, not the real output accuracy of the derivative model on the first data set. The constructed second predictor is itself initially untrained; it is trained by taking each trained initial model in the initial model set as input, yielding a trained second predictor. The trained second predictor then processes each derivative model to predict its third output accuracy on the first data set. According to the third output accuracies, one or more target derivative models are selected from all the derivative models and trained on the first data set to obtain trained target derivative models. The trained initial models and the trained target derivative models together constitute the model set described in the embodiments of the present application.
  • This describes how a model set is constructed based on constraints: an initial model set is first built from the constraints, then a series of derivative models is derived by an evolutionary algorithm using the initial models as seeds, and target derivative models are selected from them for training, so that the trained target derivative models and the trained initial models together form the model set. In this way, the model set can accumulate models that meet the constraints, and the second predictor can quickly filter out suitable models, saving search time.
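  • A minimal sketch of the derivation step, assuming architectures are encoded as lists of (operation, channel-ratio) genes; the gene encoding, operation set and mutation rate are illustrative, not the patent's:

```python
import copy
import random

OPS = ["1x1_conv", "3x3_conv", "max_pool"]    # assumed operation set
RATIOS = [0.25, 0.5, 1.0, 2.0, 4.0]           # assumed channel-change ratios

def derive_group(seed_arch, group_size=5, p_mut=0.3):
    """Derive one group of (untrained) derivative models from a trained seed
    by randomly mutating node operations and channel ratios."""
    group = []
    for _ in range(group_size):
        child = copy.deepcopy(seed_arch)       # seed_arch: list of (op, ratio) genes
        for i, _gene in enumerate(child):
            if random.random() < p_mut:
                child[i] = (random.choice(OPS), random.choice(RATIOS))
        group.append(child)
    return group
```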
  • In one possible implementation, the initial model set may be constructed as follows: first, a search space is determined according to the constraints. The search space includes a variety of network structural units (blocks) and the connection relationships between them, where each block contains one or more nodes and an operation (OP) on each node. Operations are basic operation units of a neural network, such as convolution and pooling, and the nodes can be understood as the layers of a neural network model, such as an input layer, output layer, convolutional layer, pooling layer or fully connected layer. The combined structure formed after the blocks are connected is the initial model described in the embodiments of this application.
  • At least two initial models can then be obtained by random sampling from the search space, each determined by several block structures and the connection relationships between them. The sampled initial models are pre-trained on the first data set, and the trained initial models constitute the initial model set.
  • This describes one way of constructing the initial model set from the constraints: a search space is first determined according to the constraints, and initial models are then obtained by sampling and combining from that search space. Such combination can also produce architectures that do not yet exist or are unexpected, and the search is complete.
  • In one possible implementation, training the initial models on the first data set may specifically be: first, all initial models in the initial model set are fused into one supernet model (which may be called the first model); the first model is then trained on the first data set to obtain a trained first model; finally, the trained first model is disassembled again into the trained initial models.
  • Similarly, training the target derivative models on the first data set may specifically be: the multiple target derivative models are fused into one supernet model (which may be called the second model); the second model is trained on the first data set to obtain a trained second model; and the trained second model is then disassembled again into the multiple trained target derivative models.
  • In one possible implementation, the second predictor may be a "GCN + Bayesian regressor". Specifically, training the constructed second predictor on the trained initial models may proceed as follows: first, the graph structure (also called the topological graph) of each trained initial model is encoded to obtain a graph encoding; each graph encoding is used as the input of the GCN, which extracts features from the encoding and thereby avoids manually designing kernel functions to evaluate the distance between network architectures; the output of the GCN is then used as the input of the Bayesian regressor, whose main role is to estimate the mean and variance of model performance, specifically by using an upper confidence bound to evaluate a model's performance.
  • In one possible implementation, selecting a target derivative model from all derivative models according to their third output accuracies includes but is not limited to: selecting the derivative models whose third output accuracy is greater than a preset value as target derivative models; or selecting the top n derivative models by third output accuracy as target derivative models, n ≥ 1; or computing the upper confidence bound (UCB) of each derivative model from the mean and variance of its third output accuracy and selecting the top m derivative models by upper confidence bound as target derivative models, m ≥ 1.
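  • The third strategy, sketched under the assumption that the second predictor returns the mean and variance of the third output accuracy; κ is an assumed exploration weight:

```python
import math

def ucb_select(derivative_models, second_predictor, m=5, kappa=1.96):
    """Keep the m derivative models with the largest upper confidence bound
    UCB = mean + kappa * sqrt(variance) of the predicted third output accuracy."""
    scored = []
    for model in derivative_models:
        mean, var = second_predictor(model)   # assumed (mean, variance) interface
        scored.append((mean + kappa * math.sqrt(var), model))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in scored[:m]]
```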
  • In one possible implementation, the model set constructed above may be used as a new initial model set, with the target derivative models as new initial models, and the above model set construction steps may be re-executed until a preset condition (which may be called the second preset condition) is reached.
  • That is, each model in the model set can in turn serve as a new initial model from which new derivative models are constructed and new target derivative models are selected, until the preset condition is reached, so that the model set accumulates enough models that meet the requirements.
  • In one possible implementation, the second preset condition may be set according to user requirements. For example, it may be that the number of models in the model set reaches a preset number: assuming the preset number is 13 and the model set obtained in the current round includes 14 models, the second preset condition is met and the model set of 14 models is the finally constructed model set. Alternatively, the second preset condition may be that the constraints satisfied by the models in the model set meet preset requirements: for example, with three types of constraints, the user may require a certain number of models for each type, so that the model set accumulates models satisfying different constraints.
  • the constraint conditions include any one or more of: model size, model inference delay, model training delay, hardware deployment conditions, and on-chip memory size.
  • For example, for new tasks such as picture or audio data sets collected by autonomous vehicles, the model inference delay matters because autonomous vehicles have high real-time requirements; for new tasks on terminal devices such as mobile phones, the on-chip memory size matters because the storage space of handheld terminals is limited.
  • the trained target model may also be deployed on the execution device, so that the execution device processes the input target data through the trained target model.
  • the execution device can be deployed on smart terminals such as mobile phones, personal computers, and smart watches, and can also be deployed on mobile terminal devices such as autonomous vehicles, connected cars, and smart cars, which are not specifically limited here.
  • the target model obtained by training based on the second data set can be deployed on the execution device for practical application.
  • a second aspect of the embodiments of the present application provides a computer device, where the computer device has a function of implementing the method of the first aspect or any possible implementation manner of the first aspect.
  • This function can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • A third aspect of an embodiment of the present application provides a computer device, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to call the program stored in the memory to execute the method of the first aspect or any possible implementation of the first aspect.
  • A fourth aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the method of the first aspect or any possible implementation of the first aspect.
  • a fifth aspect of the embodiments of the present application provides a computer program, which, when running on a computer, causes the computer to execute the above-mentioned first aspect or any one of the possible implementation methods of the first aspect.
  • A sixth aspect of an embodiment of the present application provides a chip including at least one processor and at least one interface circuit coupled to the processor. The at least one interface circuit is configured to perform transceiving functions and send instructions to the at least one processor, and the at least one processor is configured to run a computer program or instructions to implement the method of the first aspect or any possible implementation of the first aspect. The function can be implemented by hardware, by software, or by a combination of hardware and software, where the hardware or software includes one or more modules corresponding to the above functions.
  • the interface circuit is used to communicate with other modules outside the chip.
  • For example, the interface circuit can send the target model obtained by the processor on the chip to various intelligent driving agents (such as unmanned driving or assisted driving systems) for application.
  • FIG. 1 is a schematic diagram of a process of selecting appropriate models and hyperparameters for a new task;
  • FIG. 2 is a schematic flowchart of GCN processing graph-structured data;
  • FIG. 3 is a schematic structural diagram of an artificial intelligence main body framework provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for acquiring a model provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of a block structure and an internal operation relationship of the block structure provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a connection relationship between a plurality of identical or different blocks provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of constructing an initial model set based on a search space provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a graph structure of a model provided by an embodiment of the present application and a corresponding graph encoding
  • FIG. 9 is a schematic diagram, provided by an embodiment of the present application, of multiple initial models being fused into the first model for training and then disassembled again into multiple trained initial models;
  • FIG. 10 is a schematic diagram of the first predictor provided by the embodiment of the application obtaining the prediction of the first output accuracy of the second data set by each model;
  • FIG. 11 is a schematic diagram of a framework of a method for acquiring a model provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the comparison between the model set ET-NAS provided by the embodiment of the application and the manually designed model in terms of training step time;
  • FIG. 13 is a schematic diagram of a performance comparison between a D-chip friendly network model and a commonly used network model provided by an embodiment of the present application;
  • FIG. 14 is a schematic diagram of the performance comparison between a GPU V100-friendly network model and a commonly used network model provided by an embodiment of the application;
  • FIG. 15 is a schematic diagram of a comparison of sampling efficiency on a neural network architecture search benchmark data set provided by an embodiment of the present application;
  • FIG. 16 is a schematic structural diagram of a computer device provided by an embodiment of the application;
  • FIG. 17 is another schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • The embodiments of the present application provide a method and device for acquiring a model. The method jointly considers model selection and hyperparameter selection: a constructed first predictor is used to rapidly predict the performance of each model in a constraint-based model set on the new task under different hyperparameters, and the model and hyperparameters that satisfy a preset condition (for example, the maximum predicted output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set). For a new task, the method can efficiently select an appropriate model and hyperparameters based on the constraints given by the user, saving training time and computing power.
  • Transfer learning is a machine learning method that takes a model developed for task A as the starting point and reuses it when developing a model for task B. That is, knowledge learned by a model trained on an existing task (task A) is transferred to the new task (task B) to help retrain the model. Through transfer learning, the knowledge already learned (contained in the model parameters) is shared with the new task in some way to speed up and optimize learning, so that the model does not have to learn from scratch.
  • Fine-tuning is a simple and efficient transfer learning method. For example, when training an object detection task, using a model trained on the ImageNet data set as the backbone of the new task can significantly improve training efficiency.
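  • A minimal fine-tuning sketch in PyTorch (the frozen backbone, torchvision ResNet-50 and 10-class head are illustrative choices, not taken from the patent):

```python
import torch.nn as nn
import torchvision.models as models

# Reuse a backbone pre-trained on ImageNet (the existing task A).
backbone = models.resnet50(pretrained=True)

# Optionally freeze the transferred knowledge so only the new head is learned at first.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classifier head for the new task B (here, an assumed 10 classes),
# then train as usual; only the new head's parameters receive gradients.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)
```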
  • the essential purpose of GCN is to extract the spatial features of the graph structure.
  • the graph structure here refers to the topology graph in which the corresponding relationship is established with vertices and edges in mathematics (ie, graph theory).
  • The spatial features of a graph structure have two characteristics: (a) node features, i.e., each node has its own features, reflected in the node itself; and (b) structural features, i.e., the connections between nodes in the graph structure, reflected in the edges (the connecting lines between nodes).
  • Figure 2 is a schematic diagram of the process of GCN processing graph structure data.
  • GCN can be regarded as a natural extension of the convolutional neural network (CNN) to graph structures. It can learn node features and structural features end to end at the same time, and is currently a leading choice for learning tasks on graph-structured data. Moreover, GCN has wide applicability and is suitable for graphs of any topology.
  • A Bayesian regressor (also known as Bayesian regression or Bayesian linear regression) is a linear regression model solved with Bayesian inference.
  • Bayesian linear regression treats the parameters of a linear model as random variables, and calculates its posterior through the prior of the model parameters (weight coefficients).
  • Bayesian linear regression can be solved numerically, and under certain conditions, the posterior or its related statistics in analytical form can also be obtained.
  • Bayesian linear regression has the basic properties of a Bayesian statistical model: the probability density function of the weight coefficients can be solved, online learning can be performed, and model hypothesis testing can be done based on Bayes factors.
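  • As a concrete textbook form (a standard result, not taken from the patent): with a Gaussian prior $\mathbf{w}\sim\mathcal{N}(\mathbf{0},\alpha^{-1}\mathbf{I})$ on the weight coefficients and Gaussian observation noise of precision $\beta$, the posterior over the weights is itself Gaussian:

$$p(\mathbf{w}\mid X,\mathbf{y})=\mathcal{N}(\mathbf{w}\mid\boldsymbol{\mu}_N,\boldsymbol{\Sigma}_N),\qquad \boldsymbol{\Sigma}_N^{-1}=\alpha\mathbf{I}+\beta X^{\top}X,\qquad \boldsymbol{\mu}_N=\beta\,\boldsymbol{\Sigma}_N X^{\top}\mathbf{y}$$

This closed form is what lets the regressor report both a mean and a variance for a model's predicted performance.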
  • The confidence limit is a general term for the bound of a one-sided confidence interval and the upper and lower bounds of a two-sided confidence interval; the confidence interval is the interval between the confidence limits. A confidence interval describes a range (interval) that contains the parameter to be estimated, which may be a mean, a standard error, a proportion, or any other point estimate, for the purpose of determining higher and lower confidence bounds. The higher confidence bound is called the upper confidence bound (also known as the upper confidence limit), and the lower one the lower confidence bound (also known as the lower confidence limit).
  • An estimate of a parameter can be obtained by calculation from random samples of a population. The interval that contains the true value with a stated probability is the confidence interval: a 95% confidence interval can be understood as containing the true value with 95% probability, and 99% or 99.9% confidence intervals can likewise be calculated.
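  • As a standard textbook example (not from the patent): for a sample mean computed from $n$ observations with known standard deviation $\sigma$, the two-sided 95% confidence interval is

$$\bar{x}\pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

and the one-sided analogue $\mu+\kappa\sigma$ is the upper-confidence-bound form used below when ranking models.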
  • An evolutionary algorithm is a population-based random search technique produced by simulating the evolution of organisms in nature. It is really a family of algorithms: although its variants differ in genetic representation, crossover and mutation operators, special operators, and methods of regeneration and selection, they all draw their inspiration from biological evolution in nature. Compared with traditional calculus-based methods, exhaustive search and other optimization algorithms, evolutionary computation is a mature global optimization method with high robustness and wide applicability; it is self-organizing, self-adaptive and self-learning, is not limited by the nature of the problem, and can effectively handle complex problems that traditional optimization algorithms find difficult to solve.
  • Figure 3 shows a schematic structural diagram of the main frame of artificial intelligence.
  • The above artificial intelligence framework is explained along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, the data goes through a "data - information - knowledge - wisdom" refinement. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technology for providing and processing information) to the industrial ecology of the system.
  • The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through a basic platform. Communication with the outside world goes through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs and FPGAs); the basic platform includes distributed computing frameworks and network-related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The application areas mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, safe city, and so on.
  • the present application can be applied to the field of computer vision in the field of artificial intelligence.
  • The data acquired by the infrastructure in the embodiments of the present application constitutes the new task (i.e., the second data set) described herein; the new task can consist of data such as pictures, text or speech.
  • Based on the model determination method provided in the embodiments of the present application, a target model suitable for the new task and a set of target hyperparameters are selected from the constructed model set, and a target model trained on the new task is obtained; it should be noted that the target model has already been pre-trained on an existing task (for example, the first data set described in the embodiments of the present application).
  • FIG. 4 is a schematic flowchart of the method for acquiring the model provided by the embodiment of the present application. The method may include the following steps:
  • a model set is constructed based on the constraints, and the model set includes at least two models that have been pre-trained on the first dataset, eg, the open ImageNet dataset.
  • The constraints include specific business requirements of users; they may be one or more of model size, model inference delay, model training delay, specific hardware deployment conditions, on-chip memory size, and so on.
  • For example, for new tasks such as picture or audio data sets collected by autonomous vehicles, inference delay matters because of high real-time requirements, while for new tasks on terminal devices such as mobile phones, on-chip memory size matters. Different new tasks thus impose different constraints on the model, and different constraints can be derived from the different application scenarios of the new tasks (which may be one or more), so that the model set constructed from the constraints satisfies each new task.
  • Constructing a model set based on constraints may have different implementations. First, an initial model set may be constructed based on the constraints, the initial model set including at least two trained initial models, each trained on the existing open first data set. The initial model set may be constructed by neural network architecture search over a constructed search space; depending on how the search space is constructed, the specific implementation differs, as follows:
  • In the first way, the constructed search space includes a variety of network structural units (blocks) and the connection relationships between them. A search space is determined according to the constraints, where each block contains one or more nodes and an operation (OP) on each node. Operations are basic operation units of a neural network, such as convolution and pooling, and nodes can be understood as the layers of the neural network model, such as the input layer, output layer, convolutional layer, pooling layer and fully connected layer.
  • The internal structure of a block and the connection relationships between blocks are described below.
  • FIG. 5 shows a block structure and the operation relationships inside it. Each block structure can set the number of nodes, the operation on each node, and the change in the number of channels. FIG. 5 shows 5 nodes, including an input node and an output node (also known as the input layer and output layer); C represents the number of input channels, and (0.25~4)×C means that the number of channels of the middle three nodes can vary between 0.25×C and 4×C. FIG. 5 is only an illustration. It should be noted that, in general, the input node and the output node have the same number of channels, and by default there is a skip connection (i.e., a flow direction of the data stream) between the two; if the resolutions of the input layer and the output layer are inconsistent, a 1x1 convolution can be inserted in between. When the outputs of different nodes are combined, either direct addition (add) or channel concatenation (concat) can be used; these are two different operations, which are not limited here.
  • For example, a block structure may consider 1-5 nodes, 7 different operations per node, and 5 changes in the number of channels (expressed as channel-change ratios), as shown in Table 1, where c in Table 1 represents the number of input channels of the current operation. In practical applications, the operations on the nodes and the channel changes may also take other forms, which are not specifically limited here.
  • FIG. 6 illustrates a connection relationship (also referred to as a stacking relationship) between a plurality of identical or different blocks, and a combined structure formed after each block is connected is the initial model described in the embodiment of the present application.
  • FIG. 6 shows a stack structure in the form 4-4-3-2: in the stacked initial model, the first stage (stage 1) includes 4 blocks with channel number c; the second stage (stage 2) includes 4 blocks, of which 2 have channel number c and the other 2 have channel number 2c; the third stage (stage 3) includes 3 blocks with channel number 2c; and the fourth stage (stage 4) includes 2 blocks with channel number 4c. A stacked initial model can include multiple stages (4 in FIG. 6), and each stage can include blocks with the same or different internal structures: in FIG. 6, stage 1 includes 4 blocks with the same internal structure, while the 4 blocks of stage 2 include 2 blocks with different internal structures. In practical applications, the number of stages in a stacked initial model, and the type of block structure and number of channels in each stage, can be configured, which is not limited here.
  • each stage may include 1-10 same or different blocks.
  • The search space determined according to the constraints can thus be decomposed into a two-level search: first, block structures that meet the requirements are searched based on the constraints, and then the connection relationships between the block structures are searched based on the constraints, yielding a search space that satisfies the constraints. At least two initial models can then be obtained by random sampling from this search space, each determined by several block structures and the connection relationships between them; the sampled initial models are pre-trained on the first data set, and the trained initial models constitute the initial model set.
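  • A minimal sketch of this two-level random sampling, using the example numbers above (1-5 nodes, 7 operations, 5 channel ratios, a 4-4-3-2 stage layout); the operation names are assumptions:

```python
import random

OPS = ["1x1_conv", "3x3_conv", "5x5_conv", "max_pool", "avg_pool",
       "skip_connect", "sep_conv"]            # 7 assumed operations
RATIOS = [0.25, 0.5, 1.0, 2.0, 4.0]           # 5 channel changes relative to c

def sample_block():
    """Level 1: one block = 1-5 nodes, each with an operation and channel ratio."""
    return [(random.choice(OPS), random.choice(RATIOS))
            for _ in range(random.randint(1, 5))]

def sample_initial_model(stage_layout=(4, 4, 3, 2)):
    """Level 2: stack sampled blocks over stages (e.g. the 4-4-3-2 layout of FIG. 6)."""
    return [[sample_block() for _ in range(n_blocks)] for n_blocks in stage_layout]

# At least two initial models are sampled, then pre-trained on the first data set.
initial_models = [sample_initial_model() for _ in range(2)]
```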
  • For example, if the new task is a classification task, a Pareto frontier can be constructed from the accuracy on the ImageNet data set and the training step duration of the models obtained from the various blocks and connection relationships in the search space (each circle in the figure represents a model), and based on this Pareto frontier, trained initial models that migrate well to new tasks are obtained; these trained initial models constitute the initial model set.
  • Table 2 shows the number of picture categories, the number of training set pictures and the number of test set pictures included in the ImageNet dataset.
  • In the second way, the constructed search space includes existing mature initial models, and an initial model set is constructed by directly searching, based on the constraints, for existing mature models that meet them; the mature models are then trained, and the trained mature models are the trained initial models. The advantage of this way is that existing models can be obtained directly, saving some search time compared with the first way. The advantages of the first way are that, on the one hand, all possible blocks and connection relationships can be traversed to find the optimal structure for the new task; on the other hand, it can break the limitations of human design and find architecture organizations that do not yet exist.
  • After the initial model set is constructed based on the constraints, a group of derivative models corresponding to each initial model can be obtained by the evolutionary algorithm (EA), each group including at least one derivative model. How many derivative models each group contains can be configured in the evolutionary algorithm and is not specifically limited here.
  • Since each derivative model produced by the evolutionary algorithm is untrained, the present application also constructs a second predictor whose function is to predict the output accuracy of each derivative model on the first data set (the third output accuracy); this is a rough prediction, not the real output accuracy of the derivative model on the first data set. The constructed second predictor is initially untrained and is trained with the trained initial models in the initial model set as input, yielding a trained second predictor. The trained second predictor then processes each derivative model to predict its third output accuracy on the first data set; according to these third output accuracies, one or more target derivative models are selected from all derivative models and trained on the first data set, and the trained initial models together with the trained target derivative models constitute the model set described in the embodiments of the present application.
  • Continuing the example above: there are 3 initial models in the initial model set, and each initial model derives 5 derivative models, 15 derivative models in total. Because these 15 derivative models have evolved away from their initial models, the present application constructs a second predictor to roughly predict their output accuracies on the first data set (the third output accuracies). Then, according to each third output accuracy, target derivative models that meet the requirements are selected from the 15 derivative models; assuming 5 target derivative models are selected from the 15, these 5 are trained on the first data set to obtain trained target derivative models. The 5 trained target derivative models and the original 3 trained initial models together constitute the model set described in the embodiments of the present application.
  • For example, if the third output accuracies of three derivative models, say d, e and f, are greater than the preset value, those three derivative models are taken as target derivative models; if the top two third output accuracies are selected, the two derivative models e and f are the target derivative models; and if selection is by confidence upper bound, the derivative models with the largest confidence upper bounds, for example four derivative models including d, e and f, are the target derivative models.
  • In addition, the model set constructed above can also be used as a new initial model set, with the target derivative models as new initial models, and the above model set construction steps can be re-executed until a preset condition (which may be called the second preset condition) is reached.
  • Continuing the example: the initial model set (which may be called the first-round initial model set) contained 3 initial models, each of which derived 5 derivative models, 15 in total, from which 5 target derivative models were selected as described above; the 5 trained target derivative models and the original 3 trained initial models together constitute the model set. This model set of 8 models (3 initial models + 5 target derivative models) is then used as a new initial model set, with each target derivative model as a new initial model, so the second-round initial model set has 8 trained initial models. The evolutionary algorithm is again used to derive from these 8 initial models, giving a corresponding group of derivative models for each; assuming 40 derivative models in total, the second predictor is used to predict the third output accuracy of each of these 40 derivative models on the first data set, and target derivative models meeting the requirements are selected from the 40 according to each third output accuracy. Assuming 6 are selected and trained, the model set then includes 14 models in total (3 first-round initial models + 5 first-round target derivative models + 6 current-round target derivative models).
  • If the second preset condition is reached, the cycle stops, and the model set obtained in the second round is used as the final model set (which may be called the target model set); if the model set obtained in a round still does not meet the second preset condition, the cycle continues until the second preset condition is reached.
  • The second preset condition can be set according to user needs. For example, it can be that the number of models in the model set reaches a preset number: assuming the preset number is 13 and the model set obtained in the second round includes 14 models, the second preset condition is met, so the model set of 14 models is the finally constructed model set. As another example, the second preset condition may be that the constraints satisfied by the models in the model set meet preset requirements: assuming there are three types of constraints, the user may require a certain number of models for each type, so that the model set accumulates models satisfying different constraints.
  • the trained target derived model obtained in each round can be used to update the second predictor, so as to improve the prediction accuracy of the second predictor.
  • In one possible implementation, the second predictor may be a "GCN + Bayesian regressor". Specifically, training the constructed second predictor with the trained initial models may proceed as follows: first, the graph structure (also called the topological graph) of each trained initial model is encoded to obtain its graph encoding; each graph encoding is then used as the input of the GCN, which extracts features from the encoding, thereby avoiding hand-designed kernel functions for evaluating the distance between network architectures; the output of the GCN is then used as the input of the Bayesian regressor, which is mainly used to estimate the mean and variance of model performance, specifically evaluating a model's performance by its upper confidence bound.
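  • A minimal sketch of such a predictor: one hand-rolled GCN layer pools each graph encoding into an embedding, and scikit-learn's BayesianRidge supplies the mean and standard deviation; the layer width, toy data and untrained GCN weights are illustrative assumptions, not the patent's implementation.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

def gcn_embed(adj, feats, weights):
    """One GCN layer, H = ReLU(D^-1/2 (A+I) D^-1/2 X W), mean-pooled
    into a fixed-size graph embedding."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    h = np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ weights, 0.0)
    return h.mean(axis=0)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(6, 16))       # node-type features (6) -> embedding (16)

# Toy stand-ins for the trained initial models' graphs and true accuracies.
graphs = [(rng.integers(0, 2, (8, 8)).astype(float), np.eye(8, 6)) for _ in range(10)]
accs = rng.uniform(0.6, 0.8, size=10)

# Fit the Bayesian regressor on the embeddings of the trained initial models.
X = np.stack([gcn_embed(a, f, w) for a, f in graphs])
reg = BayesianRidge().fit(X, accs)

# Predict mean and std of the third output accuracy for one derivative model.
adj_new, feats_new = graphs[0]
mean, std = reg.predict(gcn_embed(adj_new, feats_new, w)[None, :], return_std=True)
```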
  • Each model can be regarded as a graph structure; FIG. 8 shows the graph structure of one model and the corresponding graph encoding.
  • The graph structure of the model shown in FIG. 8 includes 3 operations, 6 node types and 7 nodes (including the input node Node1 and the output node Node7). The three operations are a 1×1 convolution (1×1 Conv), a 3×3 convolution (3×3 Conv) and max pooling (Max Pooling); the 6 node types are input, 1×1 convolution, 3×3 convolution, max pooling, output and global; and the 7 nodes are Node1-Node7. In addition, a global node (Global Node8) is introduced and connected to all nodes of the graph structure so that the entire graph structure can be encoded, finally forming a graph structure with 8 nodes and 6 node types.
  • The graph structure of each model can be uniquely encoded to obtain a graph encoding. Each graph encoding consists of an adjacency matrix and a one-hot encoding, as shown in FIG. 8, and a graph encoding uniquely identifies a model.
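  • A minimal sketch of the FIG. 8 encoding scheme (the concrete edge list and node-type assignment below are illustrative, not the exact graph of FIG. 8):

```python
import numpy as np

NODE_TYPES = ["input", "1x1_conv", "3x3_conv", "max_pool", "output", "global"]

def encode_graph(edges, node_types):
    """Encode a model's topology as (adjacency matrix, one-hot node-type matrix),
    adding a global node connected to every other node, as in FIG. 8."""
    n = len(node_types)                       # last entry is the global node
    adj = np.zeros((n, n), dtype=int)
    for src, dst in edges:
        adj[src, dst] = 1
    for node in range(n - 1):
        adj[node, n - 1] = 1                  # connect all nodes to the global node
    onehot = np.zeros((n, len(NODE_TYPES)), dtype=int)
    for node, kind in enumerate(node_types):
        onehot[node, NODE_TYPES.index(kind)] = 1
    return adj, onehot

# 7 model nodes (Node1..Node7) plus Global Node8; topology is illustrative.
types = ["input", "1x1_conv", "3x3_conv", "max_pool",
         "3x3_conv", "1x1_conv", "output", "global"]
edges = [(0, 1), (1, 2), (2, 3), (3, 6), (0, 4), (4, 5), (5, 6)]
adjacency, one_hot = encode_graph(edges, types)   # together: the graph encoding
```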
  • For any sampled model, whether an initial model or a target derivative model derived from it, multiple models can be fused into one supernet that uses parameter sharing for fast training, which can greatly reduce model training time. The shared parameters here are the parameters of the network structure itself, such as the convolution operations of the subnets that constitute the supernet, the sizes of the convolution kernels, and the values of the convolution kernels. How models are fused into a supernet for training is described below:
  • Training the initial models on the first data set may be: first, all initial models in the initial model set are fused into one supernet model (which may be called the first model); the first model is then trained on the first data set to obtain a trained first model; finally, the trained first model is disassembled again into the trained initial models.
  • FIG. 9 is a schematic diagram, provided by an embodiment of the present application, of multiple initial models being fused into the first model for training and then disassembled again into multiple trained initial models. Assume there are 3 initial models, A1, A2 and A3, whose network structures are shown in FIG. 9, where each circle represents one layer of the network structure (e.g., a pooling layer or a convolutional layer). Note that FIG. 9 draws each initial model with 4 layers; in practical applications the initial models need not have the same number of layers, and the number of layers need not be 4; this is only an illustration and is not specifically limited.
  • fusing A1, A2, and A3 means embodying all the layer-to-layer connection relationships of the initial models in one model, namely the model super-A in FIG. 9. The fused model super-A is then trained on the first data set, so that the accuracy of all initial models can be obtained by training one model. Afterwards, super-A is disassembled according to the original connection relationships to obtain the trained A1', A2', and A3'.
  • the target derivative models are trained on the first data set as follows: the target derivative models are fused into one supernet model (which may be called the second model); the second model is then trained on the first data set to obtain the trained second model; and the trained second model is disassembled back into multiple trained target derivative models. The specific fusion and disassembly process for the target derivative models is similar to that in FIG. 9 and is not repeated here.
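  • as an illustration of the parameter-sharing idea, here is a minimal PyTorch sketch (assumed names, not the patent's implementation) of a supernet whose candidate operations are shared by every subnet routed through them; "disassembling" a trained subnet then amounts to reading off the weights of the operations along its path.

```python
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    """Toy supernet: each layer holds a pool of candidate ops whose weights are
    shared by every subnet that routes through them (parameter sharing)."""
    def __init__(self, channels=16, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.ModuleDict({
                "conv1x1": nn.Conv2d(channels, channels, 1, padding=0),
                "conv3x3": nn.Conv2d(channels, channels, 3, padding=1),
                "max_pool": nn.MaxPool2d(3, stride=1, padding=1),
            }) for _ in range(num_layers)
        ])

    def forward(self, x, path):
        # `path` picks one op per layer, e.g. ["conv3x3", "conv1x1", ...].
        for layer, op_name in zip(self.layers, path):
            x = layer[op_name](x)
        return x

supernet = SuperNet()
subnets = [["conv3x3"] * 4, ["conv1x1", "conv3x3", "max_pool", "conv1x1"]]
x = torch.randn(2, 16, 32, 32)
# Training alternates over subnets; ops reused by several subnets are updated jointly.
for path in subnets:
    y = supernet(x, path)
```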
  • each model corresponds to one first output accuracy and one set of hyperparameters, and the hyperparameters are obtained by sampling the hyperparameter space.
  • after the model set is constructed based on the constraints, the model set includes at least two models pre-trained on the first data set (namely, the trained initial models and the trained target derivative models).
  • the hyperparameter space is then randomly sampled to obtain a set of hyperparameters; the hyperparameters obtained by random sampling may be called the first hyperparameters.
  • the constructed first predictor is used to predict the first output accuracy of any model in the model set on the second data set under the first hyperparameters, where each model corresponds to one first output accuracy. For example, the output accuracy of one model in the model set may be predicted, or the output accuracies corresponding to several models, or the output accuracy of every model in the model set; this is not limited here. Each model has a corresponding hyperparameter setting (i.e., the first hyperparameters), that is, the model's hyperparameters are set to the first hyperparameters.
  • with the hyperparameters so set, the output accuracy of any model in the model set on the second data set (which may be called the first output accuracy) is predicted by another constructed predictor (which may be called the first predictor), where the second data set is the data set of the new task.
  • the input data of the constructed first predictor are the first hyperparameters, any model in the model set, and the second data set, and the output is a prediction of that model's output accuracy on the second data set under the first hyperparameters. To this end, the first hyperparameters, the model, and the second data set are encoded separately to obtain the hyperparameter code, the model code, and the second data set code; these codes are then input into the first predictor, which outputs the prediction of the model's first output accuracy on the second data set under the first hyperparameters.
  • FIG. 10 is a schematic diagram of the constructed first predictor obtaining predictions of each model's first output accuracy on the second data set.
  • the first predictor can be initialized with an existing task. After the second data set, as a new task, has been predicted, it can in turn serve as the next existing task to update the parameters of the first predictor, thereby improving the prediction accuracy of the first predictor. Specifically, in some embodiments of the present application, the parameters of the first predictor can be updated according to the second output accuracy, the second data set, the target hyperparameters, and the target model, where the second output accuracy is the output accuracy of the trained target model on the second data set.
  • the initialization process of the first predictor may specifically be: randomly sample a subset from the training data set of the first predictor, randomly sample a pre-trained model from the constructed model set and a set of hyperparameters from the hyperparameter space, and perform transfer learning to obtain the classification accuracy (one classification accuracy per model, per set of hyperparameters, and per sampled subset). In this way 30K groups of data can be collected, of which 24K groups are used as the training set and 6K groups as the validation set, and the classification accuracies are recorded.
  • Table 3 shows a training data set that can be used to initialize the first predictor, and Table 4 shows a test data set for testing the first predictor. Tables 3 and 4 are for illustration only; the training data set and the test data set may also be other types of data sets. For example, when the models in the model set are used to process text data, the training data set and the test data set may be text data sets; when the models are used to process speech data, the training data set, the test data set, and the first and second data sets described above in the embodiments of the present application may all be speech data sets. The applicable scenarios and data set types of the models in the model set are not limited here, as long as the models in the model set correspond to the data sets.
  • the network structure of the first predictor may be denoted as P and consists of multiple fully connected layers; the predictor maps its input data to its output data (the accuracy prediction). Regime(FT) denotes the model features, which may specifically include the one-hot encoding of the model, the first output accuracy of the model on the first data set, and so on; state(D) denotes the encoded features of the second data set (assuming the data type of the second data set is pictures), which may include the number of data categories (e.g., the number of picture categories), the mean and variance of the number of pictures per category, the similarity between the second data set and the first data set (e.g., the ImageNet data set), and so on. Within the predictor, l indexes the layers, a_l denotes the feature weight of layer l, f_l denotes the feature value of layer l, W_l and φ_l are the learnable parameters of each layer, and h denotes the input and output of each layer.
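  • the exact layer equations of P are not recoverable from this text, so the following is only a hedged sketch of a predictor of this kind: a stack of fully connected layers mapping the concatenated model, hyperparameter, and data set codes to an accuracy estimate. All dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class AccuracyPredictor(nn.Module):
    """Minimal sketch of a first predictor P: fully connected layers mapping
    [model code | hyperparameter code | dataset code] to a predicted accuracy
    in [0, 1]. Dimensions are illustrative."""
    def __init__(self, model_dim=64, hp_dim=16, data_dim=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(model_dim + hp_dim + data_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # predicted first output accuracy
        )

    def forward(self, model_code, hp_code, data_code):
        return self.net(torch.cat([model_code, hp_code, data_code], dim=-1))

pred = AccuracyPredictor()
acc = pred(torch.randn(1, 64), torch.randn(1, 16), torch.randn(1, 32))
```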
  • a model whose first output accuracy satisfies the first preset condition is the target model, and the hyperparameters corresponding to the target model are the target hyperparameters. The output accuracy that satisfies the first preset condition is called the target output accuracy, and the model and the first hyperparameters corresponding to the target output accuracy are called the target model and the target hyperparameters. The target model and the target hyperparameters are then used as the model and hyperparameters for finally processing the second data set; that is, the target model and the target hyperparameters are selected for transfer learning on the new second data set.
  • one way to judge that the target output accuracy satisfies the first preset condition is to select the largest of all the first output accuracies as the target output accuracy. Note that the performance of a model can be evaluated not only by output accuracy but also by other measures; for example, the smaller the error rate, the better the performance. The embodiments of this application use output accuracy only as an example.
  • for the second data set, since the data in the data set is fixed, its data set features (i.e., the data encoding) are first extracted; a model is randomly selected from the model set, and hyperparameters are randomly sampled from the hyperparameter space and encoded; the initialized first predictor is then used to predict the detection accuracy (i.e., the first output accuracy) of the second data set under the various configurations, and finally the configuration with the highest first output accuracy (i.e., the corresponding model and hyperparameters) can be selected.
  • the meta-feature information obtained after the transfer learning is completed can be used to update the relevant parameters of the first predictor.
  • the target model can then be trained on the second data set based on the target hyperparameters to obtain a trained target model.
  • the trained target model may also be deployed on an execution device, so that the execution device processes input target data through the trained target model.
  • the execution device can be deployed on smart terminals such as mobile phones, personal computers, and smart watches, and can also be deployed on mobile terminal devices such as autonomous vehicles, connected cars, and smart cars, which is not specifically limited here.
  • the selection of the model and the selection of hyperparameters are considered jointly: the constructed first predictor quickly predicts how each model in the set of models built under the constraints will perform on the new task with different hyperparameters, and the model and hyperparameters that satisfy the preset condition (e.g., the largest model output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set).
  • the method can efficiently select appropriate models and hyperparameters based on the constraints given by the user, thereby saving training time and computing power costs.
  • the technical effect achieved by the embodiments of the present application is that, in the actual service delivery process, a suitable model is found for a new task (i.e., the second data set) within a limited time and trained to the accuracy required for delivery; that is, the best model and the best set of hyperparameters are selected for the new task.
  • the constructed first predictor is not used to process a new task only once; each new task can be processed in the above manner, so the model acquisition method provided by the embodiments of the present application can be applied to continuous, multi-task delivery scenarios to achieve transfer learning across tasks.
  • FIG. 11 is a schematic framework diagram of the model acquisition method provided by the embodiments of the present application.
  • Step 1: define a search space based on the constraints, where the search space includes various network structural units (blocks) and the connection relationships between the various network structural units.
  • Step 2: randomly sample the search space to obtain several initial models (for example, three initial models) to form an initial model set.
  • Step 3: fuse the multiple initial models to construct a supernet (i.e., the first model described above) and, through parameter sharing, train it on the first data set (i.e., an existing data set such as ImageNet). The shared parameters mentioned here refer to the parameters inside the network structure, such as the convolution operations and the sizes and values of the convolution kernels of the subnets that constitute the supernet. In this way the detection accuracies of multiple initial models can be obtained at the same time, saving training time; the detection accuracy here generally refers to the accuracy of the prediction results output by an initial model for the first data set. An initial model set is formed from the initial models trained on the first data set.
  • Step 4: extract the graph codes of each trained initial model in the initial model set, and train and initialize the GCN and the Bayesian regressor.
  • Step 5: based on the existing initial model set, use EA sampling in the search space to construct multiple groups of new models (i.e., derivative models), where each initial model can produce several new derivative models through EA sampling. For example, with 3 initial models each deriving 5 new models after EA sampling, a total of 15 derivative models are obtained (the number of evolutions per initial model can also differ). The derivative models produced by EA sampling are untrained.
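  • a minimal sketch of this EA sampling step (illustrative only, assuming a model is represented as a list of operation names): each seed model is mutated several times to produce untrained derivative models, e.g., 3 seeds x 5 mutations = 15 derivatives as in the example above.

```python
import copy
import random

OPS = ["conv1x1", "conv3x3", "max_pool"]

def mutate(model_spec, num_children=5):
    """EA-style sampling: derive new (untrained) models from a seed model by
    randomly mutating one layer's operation. `model_spec` is a list of op names."""
    children = []
    for _ in range(num_children):
        child = copy.deepcopy(model_spec)
        idx = random.randrange(len(child))
        child[idx] = random.choice([op for op in OPS if op != child[idx]])
        children.append(child)
    return children

initial_models = [["conv3x3"] * 4, ["conv1x1"] * 4,
                  ["max_pool", "conv3x3", "conv3x3", "conv1x1"]]
# 3 seeds x 5 mutations = 15 derivative models.
derivatives = [child for seed in initial_models for child in mutate(seed, 5)]
```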
  • Step 6: encode the graph structure of each derivative model to obtain its graph code, use the GCN trained in step 4 above to extract the features of each derivative model's graph code, and input the extracted features into the Bayesian regressor trained in step 4 to predict the detection accuracy of each derivative model for the first data set (i.e., the third output accuracy described above). For example, if there are 15 derivative models in total, 15 predicted accuracies are obtained correspondingly.
  • Step 7: obtain the mean and variance of the predicted output accuracy of each derivative model and further calculate the upper confidence bound (UCB) of each derivative model, so that a total of 15 upper confidence bounds can be obtained; the upper confidence bound represents the upper limit of the detection accuracy of each derivative model.
  • Step 8: sort the upper confidence bounds of the derivative models in descending order and select the top-m derivative models with the largest upper confidence bounds as the target derivative models. Assuming the value of m is 5, the 5 derivative models with the largest upper confidence bounds are selected from the 15 derivative models as the target derivative models.
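  • steps 7-8 can be summarized in a few lines; the sketch below (assumed names) computes UCB = mean + kappa * std from the regressor's predicted mean and variance and keeps the top-m derivative models. The weight kappa trades off exploitation (mean) against exploration (variance).

```python
import numpy as np

def ucb_top_m(means, variances, m=5, kappa=1.0):
    """Rank derivative models by UCB = mean + kappa * std and keep the top m."""
    means, variances = np.asarray(means), np.asarray(variances)
    ucb = means + kappa * np.sqrt(variances)
    order = np.argsort(ucb)[::-1]  # descending by UCB
    return order[:m], ucb

# 15 predicted (mean, variance) pairs from the Bayesian regressor (dummy values).
means = np.random.uniform(0.6, 0.8, size=15)
variances = np.random.uniform(1e-4, 1e-2, size=15)
top_idx, scores = ucb_top_m(means, variances, m=5)
```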
  • Step 9: similarly, fuse the m selected target derivative models to construct a supernet (i.e., the second model described above) and, through parameter sharing, train it on the first data set (i.e., an existing data set such as ImageNet). The constructed model set then includes the original 3 initial models trained on the first data set and the 5 target derivative models trained on the first data set.
  • Step 10: execute steps 5-9 in a loop until a preset condition (i.e., the second preset condition described above) is reached, for example, until models meeting different constraints have accumulated in the model set, or until the model set has accumulated a sufficient number of models.
  • Step 1: based on an existing task (e.g., the first data set), encode the models in the model set, the hyperparameters randomly sampled from the hyperparameter space, and the first data set, and use these codes together with the obtained detection accuracies of the models to initialize and train the first predictor.
  • Step 2: for a new task (i.e., the second data set), encode the data set of the new task and extract the corresponding features, sample models from the model set and hyperparameters (i.e., the first hyperparameters) from the hyperparameter sampling space, input the second data set code, the model codes, and the hyperparameter codes into the first predictor, and output the predicted output accuracy of each model on the second data set under the first hyperparameters. Finally, from the multiple prediction results, select the best model and training hyperparameter configuration and perform transfer learning on the new task.
  • Step 3: once the current new task has been completed, the data set code, the target model code, the target hyperparameter code, the output accuracy of the target model on the new task under the target hyperparameters (i.e., the second output accuracy described above), and other meta-information can be further extracted for the new task, and the first predictor can be updated with this information, thereby improving the prediction accuracy of the first predictor.
  • Table 5 shows the comparison between the model acquisition method provided in the embodiments of the present application and existing methods.
  • model sets suitable for other types of tasks can also be obtained by modifying the constraints.
  • for example, the inference time of the model on Huawei's D chip can be introduced as a constraint.
  • likewise, the inference time of the model on the GPU V100 can be introduced as a constraint to search for network models that are friendly to the GPU V100; the search space can also be changed to verify the sampling efficiency on different benchmarks.
  • FIG. 14 shows the performance comparison between the GPU V100-friendly network models and commonly used network models.
  • the application first builds a model conversion tool that can quickly convert a pytorch model into a caffe model. The tool first exports the pytorch model as an onnx model and then converts it into a caffe model by analyzing the graph structure of the onnx model; further, through the tools that come with the D chip, the caffe model is packaged into an om model that can run on the D chip.
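  • the pytorch-to-onnx step of such a tool can be done with PyTorch's built-in exporter, as sketched below; the onnx-to-caffe conversion and the om packaging rely on the custom tool and the D-chip toolchain mentioned above and are not shown. The model choice and file names are placeholders.

```python
import torch
import torchvision

# Export a PyTorch model to ONNX; the ONNX graph is then analyzed and converted
# to Caffe by the conversion tool (not shown), and finally packaged into an .om
# model with the D-chip toolchain.
model = torchvision.models.resnet18(pretrained=False).eval()
dummy = torch.randn(1, 3, 224, 224)  # example input shape
torch.onnx.export(model, dummy, "model.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])
```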
  • in this way a closed loop of model sampling, model training, and model hardware evaluation is constructed: the application can quickly obtain the inference time on the D chip during the search process, selectively build the model set, and finally obtain network model structures that are friendly to the D chip.
  • each model is run randomly 100 times, the running times are sorted, the data in the middle segment is selected, and its average is taken as the final evaluated performance of the model. Finally, network models that are friendly to the GPU V100 are obtained.
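  • a hedged sketch of this latency protocol (illustrative names; shown for CPU, since GPU timing would additionally require synchronizing the device before reading the clock): run the model 100 times, sort the latencies, and average the middle segment to suppress outliers.

```python
import time
import numpy as np
import torch

def benchmark(model, x, runs=100, keep=(0.25, 0.75)):
    """Run the model `runs` times, sort the latencies, keep the middle segment,
    and average it as the final evaluated performance."""
    model.eval()
    times = []
    with torch.no_grad():
        for _ in range(runs):
            start = time.perf_counter()
            model(x)
            times.append(time.perf_counter() - start)
    times = np.sort(times)
    lo, hi = int(len(times) * keep[0]), int(len(times) * keep[1])
    return float(times[lo:hi].mean())  # trimmed-mean latency in seconds
```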
  • this application also uses the benchmark search spaces NAS-Bench-101 and NAS-Bench-201 in place of the search space customized in the embodiments of this application, with other conditions and methods unchanged, to verify the efficiency of the sampling algorithm of this application, as shown in FIG. 15.
  • FIG. 15 shows the comparison of sampling efficiency on the neural architecture search benchmark data sets. It can be seen from FIG. 15 that on the benchmark data sets NAS-Bench-101 and NAS-Bench-201, higher accuracy can be obtained with the sampling method of the embodiments of the present application (for the same number of samples).
  • the model acquisition method provided by the embodiments of the present application can be used in fields such as intelligent security, safe city, and intelligent terminals to migrate the target model to a new task (i.e., the second data set) for learning. For example, it can be applied to continuous multi-task delivery scenarios (scenarios with only one new task are also possible), such as cloud training platforms, terminal vision, and unmanned driving projects; several product application scenarios are introduced below.
  • in the fields of terminal vision and unmanned driving, more attention is paid to deploying the model on a specific hardware platform, and an artificially designed network may not meet the hardware constraints well. Using the model acquisition method provided by the embodiments of the present application, a series of network models that meet the requirements can be built quickly for business trainers to choose from.
  • the above descriptions cover only a few specific scenarios to which the model acquisition method of the embodiments of the present application applies; the method is not limited to the above scenarios when applied and can be used in any scenario that requires selecting a model and hyperparameters and training a target model for a task, which are not enumerated here.
  • FIG. 16 is a schematic structural diagram of a computer device provided by an embodiment of the application.
  • the computer device 1600 includes: a building module 1601, a prediction module 1602, a selection module 1603, and a training module 1604.
  • the building module 1601 is used to build a model set based on constraints, where the model set includes at least two models pre-trained on the first data set; the prediction module 1602 is used to predict, through the constructed first predictor, the first output accuracy of any model in the model set on the second data set, where each model corresponds to one first output accuracy and one set of hyperparameters, the hyperparameters being obtained by sampling the hyperparameter space. That is, with the model's hyperparameters set to the first hyperparameters, the first output accuracy of any model in the model set on the second data set is predicted by the constructed first predictor, the first hyperparameters being sampled from the hyperparameter space, and the second data set may be any collected data set; the selection module 1603 is used to determine that the model whose first output accuracy satisfies the first preset condition is the target model, and the hyperparameters corresponding to the target model are the target hyperparameters; the training module 1604 is configured to train the target model on the second data set based on the target hyperparameters to obtain a trained target model.
  • the selection of the model and the selection of hyperparameters are considered jointly: the constructed first predictor quickly predicts how each model in the set of models built under the constraints will perform on the new task with different hyperparameters, and the model and hyperparameters that satisfy the preset condition (e.g., the largest model output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set).
  • this method can efficiently select appropriate models and hyperparameters based on the constraints given by the user, thus saving training time and computing power costs.
  • the technical effect achieved by the embodiments of the present application is that, in the actual service delivery process, a suitable model is found for a new task (i.e., the second data set) within a limited time and trained to the accuracy required for delivery; that is, the best model and the best set of hyperparameters are selected for the new task.
  • the prediction module 1602 is specifically configured to: encode the hyperparameters (i.e., the first hyperparameters described above), any model in the model set, and the second data set separately, to obtain the hyperparameter code, the model code, and the second data set code; and input the hyperparameter code, the model code, and the second data set code into the first predictor, which outputs the first output accuracy of that model on the second data set under the first hyperparameters.
  • the training module 1604 is further configured to: after the trained target model is obtained, update the parameters of the first predictor according to the second output accuracy, the second data set, the target hyperparameters, and the target model, where the second output accuracy is the output accuracy of the trained target model on the second data set.
  • for a processed second data set, the first predictor can thus be updated according to the second output accuracy, the second data set, and so on, improving the prediction accuracy of the first predictor: the first output accuracy is a rough prediction by the predictor, while the second output accuracy is obtained by real training, so updating the predictor's parameters with the real training accuracy correspondingly improves its accuracy.
  • the selection module 1603 is specifically configured to: select the model with the largest first output accuracy as the target model; that is, the target output accuracy is the largest among the first output accuracies.
  • note that the performance of a model can be evaluated not only by output accuracy but also in other ways; for example, the smaller the error rate, the better the performance. The embodiments of the present application use output accuracy only as an example.
  • the target model may thus be determined by selecting the model corresponding to the largest of all the first output accuracies as the target model described in the embodiments of the present application; in general, the higher the output accuracy, the better the model's detection performance under the corresponding hyperparameters, so the optimal model and hyperparameters can be selected accordingly.
  • the building module 1601 is specifically configured to: first build an initial model set based on the constraints, the initial model set including at least two trained initial models, each obtained by training on an existing open first data set; then train the constructed second predictor with the trained initial models to obtain a trained second predictor; obtain, through an evolutionary algorithm (EA), a group of derivative models corresponding to each initial model, each group including at least one derivative model; process each derivative model with the trained second predictor to obtain its third output accuracy on the first data set; and select target derivative models from the derivative models according to the third output accuracies.
  • the training module 1604 is further configured to train the target derivative models on the first data set to obtain trained target derivative models; the trained initial models and the trained target derivative models form the model set.
  • this specifies how the model set is built based on the constraints: an initial model set is first built based on the constraints, then a series of derivative models is derived through an evolutionary algorithm with the initial models in the initial model set as seeds, and target derivative models are selected from them for training, so that the trained target derivative models and the trained initial models obtained at the beginning together form the model set described in the embodiments of the present application. This construction can accumulate models that meet the constraints, and the second predictor can quickly filter out suitable models, saving search time.
  • the building module 1601 is further configured to: determine a search space according to the constraints, where the search space includes multiple network structural units (blocks) and the connection relationships between the multiple network structural units; and then randomly sample at least two initial models from the search space.
  • the training module 1604 is further configured to train the initial models on the first data set to obtain trained initial models, where the initial model set includes the trained initial models.
  • that is, to build the initial model set according to the constraints, the search space is first determined according to the constraints, and the initial models are then obtained by sampling and combination from the search space. Besides traversing all possible architectures, this construction can also combine architecture organizations that do not currently exist or are unexpected, and it is complete.
  • the training module 1604 is specifically configured to: fuse the at least two initial models into a first model; train the first model on the first data set to obtain a trained first model, so that the accuracy of all initial models can be obtained by training one model; and finally disassemble the trained first model into trained initial models.
  • the training module 1604 is further configured to: fuse multiple target derivative models into a second model; train the second model on the first data set to obtain a trained second model; and disassemble the trained second model into trained target derivative models.
  • the building module 1601 is further configured to: encode the graph structures of the trained initial models to obtain graph codes; and then train a graph convolutional network (GCN) and a Bayesian regressor according to the graph codes to obtain a trained GCN and a trained Bayesian regressor, where the GCN and the Bayesian regressor constitute the second predictor, and the trained GCN and the trained Bayesian regressor constitute the trained second predictor.
  • the second predictor can be "GCN + Bayesian regressor". In that case, the graph structures of the trained initial models need to be encoded; the graph codes corresponding to each initial model can then be used as the input data of the GCN, which extracts the features of each graph code, avoiding hand-designed kernel functions for evaluating the distance between network architectures. The output of the GCN is used as the input of the Bayesian regressor, which is mainly used to evaluate the mean and variance of model performance; this is implementable.
  • the building module 1601 is further configured to: select, from all the derivative models, the derivative models whose third output accuracy is greater than a preset value as the target derivative models; or select, from all the derivative models, the top n derivative models with the largest third output accuracies as the target derivative models, n ≥ 1; or obtain the upper confidence bound (UCB) corresponding to each derivative model from the mean and variance of the third output accuracy and select, from all the derivative models, the top m derivative models with the largest upper confidence bounds as the target derivative models, m ≥ 1.
  • the computer device 1600 may further include a triggering module 1605, configured to use the model set as a new initial model set and the target derivative models as new initial models, and to repeat the steps performed by the building module 1601 until the second preset condition is reached.
  • in this way, each model in the model set can serve again as a new initial model to continue constructing new derivative models and selecting new target derivative models until the preset condition is reached, so that the model set can accumulate enough models that meet the requirements.
  • the second preset condition can be set according to user requirements. For example, it can be that the number of models in the model library reaches a preset number: assuming the preset number is 13 and the model set obtained in the current round includes 14 models, the second preset condition is reached, so the model set including 14 models is the finally constructed model set. As another example, the second preset condition can also be that the constraints satisfied by the models in the model set meet preset requirements: for example, suppose there are three types of constraints and the user requires a certain number of models for each type; the purpose of this is to make the model set accumulate models satisfying different constraints.
  • the constraints include any one or more of: model size, model inference latency, model training latency, hardware deployment conditions, and on-chip memory size.
  • for example, some new tasks (such as data sets of pictures, audio, etc. obtained by autonomous vehicles) have high requirements on model inference latency, because autonomous vehicles have high real-time requirements; other new tasks (such as mobile phones and other terminal devices) have higher requirements on on-chip memory size, because the storage space of handheld terminals such as mobile phones is limited.
  • the computer device 1600 may further include a deployment module 1606, configured to deploy the trained target model on an execution device, so that the execution device processes input target data through the trained target model.
  • the execution device can be deployed on smart terminals such as mobile phones, personal computers, and smart watches, and can also be deployed on mobile terminal devices such as autonomous vehicles, connected cars, and smart cars, which is not specifically limited here.
  • in other words, the target model obtained by training on the second data set can be deployed on the execution device for practical application.
  • FIG. 17 is another schematic structural diagram of the computer device provided by the embodiments of the present application; the computer device 1600 described above is used to implement the functions of each step in the embodiment corresponding to FIG. 4.
  • the computer device 1700 is implemented by one or more servers and may vary greatly with configuration or performance; it may include one or more central processing units (CPUs) 1722 (e.g., one or more processors), memory 1732, and one or more storage media 1730 (e.g., one or more mass storage devices) that store application programs 1742 or data 1744.
  • the memory 1732 and the storage medium 1730 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instructions to operate on the computer device 1700 .
  • the central processing unit 1722 may be configured to communicate with the storage medium 1730 to execute a series of instruction operations in the storage medium 1730 on the computer device 1700 .
  • the computer device 1700 may also include one or more power supplies 1726, one or more wired or wireless network interfaces 1750, one or more input/output interfaces 1758, and/or one or more operating systems 1741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
  • the central processing unit 1722 is configured to execute the method for acquiring the target model in the embodiment corresponding to FIG. 4 .
  • specifically, the central processing unit 1722 is configured to: first, build a model set based on the constraints, the model set including at least two models pre-trained on the first data set (e.g., the open ImageNet data set), namely the trained initial models and the trained target derivative models; then randomly sample the hyperparameter space to obtain a set of hyperparameters, called the first hyperparameters; and then predict, through the constructed first predictor, the first output accuracy of any model in the model set on the second data set, where each model corresponds to one first output accuracy. For example, the output accuracy of one model in the model set may be predicted, or the output accuracies corresponding to several models, or the output accuracy of every model in the model set; each model has a corresponding hyperparameter setting (i.e., the first hyperparameters), that is, the model's hyperparameters are set to the first hyperparameters.
  • with the hyperparameters so set, the output accuracy (which may be called the first output accuracy) of any model in the model set on the second data set is predicted by another constructed predictor (which may be called the first predictor), where the second data set is the data set of the new task.
  • the output accuracy that satisfies the first preset condition is called the target output accuracy, and the model and hyperparameters corresponding to the target output accuracy are called the target model and target hyperparameters; the target model and the target hyperparameters are then used as the model and hyperparameters for finally processing the second data set, that is, the target model and the target hyperparameters are selected for transfer learning on the new second data set.
  • after the target model and target hyperparameters are determined from the model set and the hyperparameter space through the above steps, the target model can be trained on the second data set based on the target hyperparameters to obtain a trained target model.
  • embodiments of the present application further provide a computer-readable storage medium storing a program for signal processing; when the program runs on a computer, it causes the computer to perform the steps performed by the computer device described in the foregoing embodiments.
  • the computer device provided in the embodiment of the present application may be a chip, and the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the computer device executes the method for acquiring the model described in the embodiment shown in FIG. 4 above.
  • the storage unit may be a storage unit in the chip, such as a register or a cache, and may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the application.
  • the chip can be represented as a neural network processor NPU 200; the NPU 200 is mounted as a co-processor onto the main CPU (Host CPU), and the Host CPU allocates tasks.
  • the core part of the NPU is the arithmetic circuit 2003, which is controlled by the controller 2004 to extract the matrix data in the memory and perform multiplication operations.
  • the arithmetic circuit 2003 includes multiple processing units (process engines, PEs). In some implementations, the arithmetic circuit 2003 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 2002 and buffers it on each PE in the arithmetic circuit; it fetches the data of matrix A from the input memory 2001, performs matrix operations with matrix B, and stores the partial or final results of the matrix in the accumulator 2008.
  • Unified memory 2006 is used to store input data and output data.
  • the weight data is transferred to the weight memory 2002 directly through the direct memory access controller (DMAC) 2005.
  • Input data is also transferred to unified memory 2006 via the DMAC.
  • the BIU (bus interface unit) 2010 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 2009.
  • the bus interface unit 2010 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 2009 to obtain instructions from the external memory, and also for the storage unit access controller 2005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2006 , the weight data to the weight memory 2002 , or the input data to the input memory 2001 .
  • the vector calculation unit 2007 includes multiple operation processing units and, if necessary, further processes the output of the arithmetic circuit, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 2007 can store the processed output vectors to the unified memory 2006 .
  • the vector calculation unit 2007 may apply a linear function and/or a nonlinear function to the output of the operation circuit 2003, such as linear interpolation of the feature plane extracted by the convolutional layer, such as a vector of accumulated values, to generate activation values.
  • the vector computation unit 2007 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 2003, eg, for use in subsequent layers in a neural network.
  • the instruction fetch memory (instruction fetch buffer) 2009 connected to the controller 2004 is used to store the instructions used by the controller 2004;
  • Unified memory 2006, input memory 2001, weight memory 2002 and instruction fetch memory 2009 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in the first aspect.
  • the device embodiments described above are only schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be A physical unit, which can be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the aforementioned readable storage media include: a USB flash drive (U disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, or data center to another website, computer, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless manner (e.g., infrared, radio, microwave).
  • the computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital versatile discs (DVDs)), or semiconductor media (e.g., solid state drives (SSDs)), among others.

Abstract

The present application discloses a model acquisition method and device, applicable to the field of computer vision within the field of artificial intelligence. The method includes: using a constructed first predictor to quickly predict the performance, on a new task, of each model in a model set built under constraints (each model pre-trained on a first data set) under different hyperparameters, and selecting the model and hyperparameters that satisfy a preset condition (e.g., the largest model output accuracy) as the target model and target hyperparameters for finally processing the new task (i.e., the second data set). For a new task, the method can efficiently select a suitable model and hyperparameters based on user-given constraints, saving training time and computing power cost. In the actual service delivery process, a suitable model can be found for a new task within a limited time for transfer learning and trained to the accuracy required for delivery.

Description

Model acquisition method and device
This application claims priority to Chinese Patent Application No. 202011131434.7, entitled "一种模型的获取方法及设备" (Model acquisition method and device), filed with the Chinese Patent Office on October 21, 2020, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of machine learning, and in particular to a model acquisition method and device.
Background
Artificial intelligence (AI) uses computers or computer-controlled machines to simulate, extend, and expand human intelligence. AI includes studying the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
Transfer learning is a machine learning method in which a model developed for task A (which may be called the first data set) is taken as the starting point and reused in developing a model for a new task B (which may be called the second data set); that is, a pre-trained model obtained on task A is reused on another task B. However, for a new task, there is a huge number of models trained on existing tasks (i.e., pre-trained models), and how to efficiently select, from this huge number of models, a model suitable for the new task together with a suitable set of hyperparameters is a problem that urgently needs to be solved. At present, for a new task, transfer learning (e.g., fine-tuning) with a model trained on an existing related task (e.g., the ImageNet classification task) is an efficient technique and is widely used in the field of computer vision.
One existing solution is to manually select, based on experience, a model pre-trained on an open data set (an original data set such as ImageNet), select a set of hyperparameters based on experience (or hand-tune them), and transfer-learn to the new task: based on the selected hyperparameters, the selected model is retrained on the new task in the hope of reaching the target accuracy. However, a model with high output accuracy on the original data set is not necessarily equally good on the transfer learning task (i.e., the second data set); if the training result does not reach the target accuracy, the model or the hyperparameters may need to be reselected and training repeated. As shown in FIG. 1, the whole process may involve multiple rounds of model selection and hyperparameter selection (and may even require manually designing a new model), and each round of training costs a large amount of time and computing power.
Summary
Embodiments of the present application provide a model acquisition method and device. The method jointly considers model selection and hyperparameter selection: a constructed first predictor quickly predicts the performance, on a new task, of each model in a model set built under constraints with different hyperparameters, and the model and hyperparameters satisfying a preset condition (e.g., the largest output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set). For a new task, the method can efficiently select a suitable model and hyperparameters based on user-given constraints, thereby saving training time and computing power cost.
On this basis, embodiments of the present application provide the following technical solutions:
In a first aspect, embodiments of the present application first provide a model acquisition method usable in the field of artificial intelligence. The method includes: first, building a model set (also called a model collection; hereafter "model set") based on constraints, the model set including at least two models pre-trained on a first data set. After the model set is built based on the constraints, it includes at least two models pre-trained on the first data set. The hyperparameter space is then randomly sampled to obtain a set of hyperparameters, which may be called the first hyperparameters. A constructed first predictor then predicts the first output accuracy of any model in the model set on a second data set, where each model corresponds to one first output accuracy. For example, the output accuracy of one model in the model set may be predicted, or the output accuracies corresponding to several models, or the output accuracy of every model in the model set; this is not limited here. Each model has a corresponding hyperparameter setting (i.e., the first hyperparameters): with the model's hyperparameters set to the first hyperparameters, the constructed predictor (which may be called the first predictor) predicts the output accuracy (which may be called the first output accuracy) of any model in the model set on the second data set, where the second data set is the data set of the new task. If, among all obtained first output accuracies, there is one that satisfies a preset condition (which may be called the first preset condition), that output accuracy is called the target output accuracy, and the model and hyperparameters corresponding to the target output accuracy are called the target model and the target hyperparameters. The target model and the target hyperparameters are then used as the model and hyperparameters for finally processing the second data set; that is, the target model and the target hyperparameters are selected for transfer learning on the new second data set. After the target model and target hyperparameters are determined from the model set and the hyperparameter space through the above steps, the target model can be trained on the second data set based on the target hyperparameters to obtain a trained target model.
In the above implementation of the present application, model selection and hyperparameter selection are jointly considered: the constructed first predictor quickly predicts the performance, on the new task, of each model in the model set built under the constraints with different hyperparameters, and the model and hyperparameters satisfying the preset condition (e.g., the largest output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set). For a new task, the method can efficiently select a suitable model and hyperparameters based on user-given constraints, thereby saving training time and computing power cost. That is, the technical effect achieved by the embodiments of the present application is: in the actual service delivery process, within a limited time, a suitable model is found for a new task (i.e., the second data set) and trained to the accuracy required for delivery, which means selecting the best model and the best set of hyperparameters for the new task.
In a possible implementation of the first aspect, the input data of the constructed first predictor are a set of hyperparameters sampled from the hyperparameter space (i.e., the first hyperparameters), any model in the model set, and the second data set, and the output is a prediction of the output accuracy of that model on the second data set under the first hyperparameters. Specifically, the first hyperparameters, the model, and the second data set are encoded separately to obtain the hyperparameter code, the model code, and the second data set code; the hyperparameter code, the model code, and the second data set code are then input into the first predictor, which outputs the prediction of the model's first output accuracy on the second data set under the first hyperparameters.
The above implementation specifies the input and output data of the constructed first predictor, so the scheme is implementable.
In a possible implementation of the first aspect, since the constructed first predictor is untrained, it may be initialized with existing tasks; after the second data set has been predicted as a new task, that task may in turn serve as the next existing task to update the parameters of the first predictor, thereby improving the prediction accuracy of the first predictor. Specifically, the parameters of the first predictor may be updated according to a second output accuracy, the second data set, the target hyperparameters, and the target model, where the second output accuracy is the output accuracy of the trained target model on the second data set.
In the above implementation, for a processed second data set, the first predictor can be updated according to the second output accuracy, the second data set, and so on, improving the prediction accuracy of the first predictor: the first output accuracy is a rough prediction by the predictor, while the second output accuracy is obtained by real training, so updating the predictor's parameters with the real training output accuracy correspondingly improves the detection accuracy of the first predictor.
In a possible implementation of the first aspect, the target output accuracy satisfying the first preset condition includes: the target output accuracy being the largest among the first output accuracies. Note also that model performance can be evaluated not only by output accuracy but also by other measures, e.g., the smaller the error rate, the better the performance, or the higher the accuracy rate, the better the performance; in the embodiments of the present application, output accuracy is used only as an example.
In the above implementation, the target model may be determined by selecting the model corresponding to the largest of all the first output accuracies as the target model described in the embodiments of the present application. In general, the higher the output accuracy, the better the model's detection performance under the corresponding hyperparameters, so the optimal model and hyperparameter configuration can be selected accordingly.
In a possible implementation of the first aspect, building the model set based on constraints can be implemented in different ways. An initial model set may first be built based on the constraints, the initial model set including at least two trained initial models, each trained on an existing open first data set. After the initial model set is built based on the constraints, an evolutionary algorithm (EA) may be used to obtain a group of derivative models corresponding to each initial model, each group including at least one derivative model; how many derivative models each initial model spawns can be set by the evolutionary algorithm and is not limited here. Since the derivative models produced by the evolutionary algorithm are untrained, the present application further constructs a predictor (which may be called the second predictor) to predict the output accuracy of each derivative model on the first data set (which may be called the third output accuracy); the third output accuracy is a rough prediction, not the true output accuracy of the derivative model on the first data set. Note also that the constructed second predictor is itself untrained; in the embodiments of the present application, its input is the trained initial models in the initial model set, from which a trained second predictor is obtained. The trained second predictor can then process each derivative model and predict its third output accuracy on the first data set. Target derivative models (one or more) are then selected from all derivative models according to the obtained third output accuracies, and the selected target derivative models are trained on the first data set to obtain trained target derivative models; the trained initial models and the trained derivative models then form the model set described in the embodiments of the present application.
The above implementation specifies how to build the model set based on constraints: an initial model set is first built based on the constraints, then a series of derivative models is derived through an evolutionary algorithm with the initial models in the initial model set as seeds, and target derivative models are selected from them for training, so that the trained target derivative models and the trained initial models obtained at the beginning together form the model set described in the embodiments of the present application. This construction can accumulate various models that meet the constraints, and the second predictor can quickly filter out suitable models, saving search time.
In a possible implementation of the first aspect, building the initial model set based on constraints may specifically be: first, a search space is determined according to the constraints, the search space including multiple network structural units (blocks) and the connection relationships among the multiple network structural units, where each block contains one or more nodes and the operations (OP) on each node; an operation refers to a basic operation unit of a neural network, e.g., convolution or pooling, and a node can be understood as a layer of the neural network model, e.g., an input layer, output layer, convolutional layer, pooling layer, fully connected layer, etc. The combined structure formed after the blocks are connected is the initial model described in the embodiments of the present application. After the search space satisfying the constraints is determined in this way, at least two initial models can be obtained by random sampling from the search space, each determined by several block structures and the connection relationships among them; after at least two initial models are obtained, they can be pre-trained on the first data set to obtain trained initial models, and the trained initial models form the very first initial model set.
The above implementation explains how to build the initial model set according to the constraints, i.e., the search space is first determined according to the constraints and the initial models are then obtained by sampling and combination from the search space. Besides being able to traverse all possible architectures, this construction can also combine architecture organizations that do not currently exist or have not been thought of, and it is complete.
In a possible implementation of the first aspect, since the constructed initial model set includes at least two initial models, training the initial models on the first data set to obtain trained initial models may specifically be: first, fusing all initial models in the initial model set into one supernet model (which may be called the first model); then training the first model on the first data set to obtain a trained first model; and finally disassembling the trained first model back into trained initial models.
The above implementation explains how to jointly train multiple initial models: the at least two sampled initial models are fused into one supernet (i.e., the first model), which can be trained with parameter sharing and disassembled after training, so that the detection accuracies of all initial models are obtained by training one model. This speeds up the training of all initial models and saves training time compared with training each initial model separately.
In a possible implementation of the first aspect, if multiple target derivative models are obtained, training the target derivative models on the first data set to obtain trained target derivative models may specifically be: first, fusing the multiple target derivative models into one supernet model (which may be called the second model); then training the second model on the first data set to obtain a trained second model; and finally disassembling the trained second model back into multiple trained target derivative models.
The above implementation explains, when there are multiple target derivative models, how to jointly train them: the multiple target derivative models are fused into one supernet (i.e., the second model), which can still be trained with parameter sharing and disassembled after training, so that the detection accuracies of all target derivative models are obtained by training one model, speeding up training and saving time compared with training each target derivative model separately.
In a possible implementation of the first aspect, the second predictor may be "GCN + Bayesian regressor". Specifically, the process of training the constructed second predictor with the trained initial models may be: first, encode the graph structure (also called topology graph) of each trained initial model to obtain the graph code of each trained initial model; then use each graph code as the input of the GCN, whose output serves as the input of the Bayesian regressor. The Bayesian regressor is mainly used to evaluate the mean and variance of model performance, specifically by using the upper confidence bound to evaluate model performance.
The above implementation explains that the second predictor may be "GCN + Bayesian regressor". In that case, the graph structures of the trained initial models need to be encoded, and the resulting graph codes corresponding to each initial model serve as the input data of the GCN, which extracts the features of each graph code, thereby avoiding hand-designed kernel functions for evaluating the distance between network architectures. The output of the GCN then serves as the input of the Bayesian regressor, which is mainly used to evaluate the mean and variance of model performance; this is implementable.
In a possible implementation of the first aspect, there are multiple ways to select the target derivative models from all derivative models according to their third output accuracies, including but not limited to the following: selecting, from all derivative models, the derivative models whose third output accuracy is greater than a preset value as the target derivative models; or selecting the top n derivative models with the largest third output accuracies as the target derivative models, n ≥ 1; or obtaining the upper confidence bound (UCB) corresponding to each derivative model from the mean and variance of the third output accuracy and selecting, from all derivative models, the top m derivative models with the largest upper confidence bounds as the target derivative models, m ≥ 1.
The above implementation explains that there are multiple ways to select the target derivative models from all derivative models according to their third output accuracies, providing choice and flexibility.
In a possible implementation of the first aspect, the model set built above may further serve as a new initial model set, and the target derivative models as new initial models, and the above model-set construction steps may be repeated until a preset condition (which may be called the second preset condition) is reached.
The above implementation explains that the models in the model set can serve again as new initial models to continue constructing new derivative models and selecting new target derivative models until the preset condition is reached, so that the model set accumulates enough models that meet the requirements.
In a possible implementation of the first aspect, the second preset condition can be set according to user requirements. For example, the second preset condition can be that the number of models in the model library reaches a preset number: assuming the preset number is 13 and the model set obtained in the current round includes 14 models, the second preset condition is reached, so the model set including 14 models is the finally constructed model set. As another example, the second preset condition can also be that the constraints satisfied by the models in the model set meet preset requirements: for example, assuming there are three types of constraints in total and the user requires a certain number of models for each type of constraint, the purpose is to make the model set accumulate models satisfying different constraints.
The above implementation describes several specific forms of the second preset condition, providing flexibility.
In a possible implementation of the first aspect, the constraints include any one or more of: model size, model inference latency, model training latency, hardware deployment conditions, and on-chip memory size. For example, some new tasks (e.g., data sets of pictures, audio, etc. obtained by autonomous vehicles) have high requirements on model inference latency, because autonomous vehicles have high real-time requirements; other new tasks (e.g., terminal devices such as mobile phones) have high requirements on occupied on-chip memory size, because the storage space of handheld terminals such as mobile phones is limited.
The above implementation describes the possible types of constraints. Different new tasks impose different constraints on the model; in the embodiments of the present application, different constraints can be obtained based on the different application scenarios of the new tasks (one or more), so that a model set satisfying each new task is built based on the constraints; this is complete.
In a possible implementation of the first aspect, the trained target model may also be deployed on an execution device, so that the execution device processes input target data through the trained target model. For example, it may be deployed on smart terminals such as mobile phones, personal computers, and smart watches, or on mobile terminal devices such as autonomous vehicles, connected cars, and smart cars, which is not specifically limited here.
The above implementation explains that the target model obtained by training on the second data set can be deployed on the execution device for practical application.
In a second aspect, embodiments of the present application provide a computer device having the function of implementing the method of the first aspect or any possible implementation of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the above function.
In a third aspect, embodiments of the present application provide a computer device, which may include a memory, a processor, and a bus system, where the memory is used to store a program and the processor is used to call the program stored in the memory to perform the method of the first aspect or any possible implementation of the first aspect of the embodiments of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium storing instructions that, when run on a computer, enable the computer to perform the method of the first aspect or any possible implementation of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program that, when run on a computer, causes the computer to perform the method of the first aspect or any possible implementation of the first aspect.
In a sixth aspect, embodiments of the present application provide a chip including at least one processor and at least one interface circuit coupled to the processor. The at least one interface circuit is used to perform transceiving functions and send instructions to the at least one processor, and the at least one processor is used to run a computer program or instructions having the function of implementing the method of the first aspect or any possible implementation of the first aspect. The function may be implemented by hardware, by software, or by a combination of hardware and software, and the hardware or software includes one or more modules corresponding to the above function. In addition, the interface circuit is used to communicate with modules other than the chip; for example, the interface circuit may send the target model obtained by the on-chip processor to various intelligent driving agents (e.g., unmanned driving, assisted driving) for application.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of selecting a suitable model and hyperparameters for a new task;
FIG. 2 is a schematic flowchart of a GCN processing graph-structured data;
FIG. 3 is a schematic structural diagram of the main artificial intelligence framework provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of the model acquisition method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a block structure and the internal operation relationships of the block structure provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the connection relationships among multiple identical or different blocks provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of building an initial model set based on the search space provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the graph structure of a model and the corresponding graph code provided by an embodiment of the present application;
FIG. 9 is a schematic diagram, provided by an embodiment of the present application, of fusing multiple initial models into a first model for training and disassembling it back into multiple initial models after training;
FIG. 10 is a schematic diagram, provided by an embodiment of the present application, of the first predictor obtaining predictions of each model's first output accuracy on the second data set;
FIG. 11 is a schematic framework diagram of the model acquisition method provided by an embodiment of the present application;
FIG. 12 is a schematic diagram, provided by an embodiment of the present application, of the comparison in training step time between the model set ET-NAS and hand-designed models;
FIG. 13 is a schematic diagram, provided by an embodiment of the present application, of the performance comparison between D-chip-friendly network models and commonly used network models;
FIG. 14 is a schematic diagram, provided by an embodiment of the present application, of the performance comparison between GPU V100-friendly network models and commonly used network models;
FIG. 15 is a schematic diagram, provided by an embodiment of the present application, of the comparison of sampling efficiency on neural architecture search benchmark data sets;
FIG. 16 is a schematic structural diagram of a computer device provided by an embodiment of the present application;
FIG. 17 is another schematic structural diagram of the computer device provided by an embodiment of the present application;
FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
Detailed Description
Embodiments of the present application provide a model acquisition method and device. The method jointly considers model selection and hyperparameter selection: a constructed first predictor quickly predicts the performance, on a new task, of each model in a model set built under constraints with different hyperparameters, and the model and hyperparameters satisfying a preset condition (e.g., the largest output accuracy) are selected as the target model and target hyperparameters for finally processing the new task (i.e., the second data set). For a new task, the method can efficiently select a suitable model and hyperparameters based on user-given constraints, thereby saving training time and computing power cost.
The terms "first", "second", and so on in the description, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the way objects with the same properties are distinguished when describing the embodiments of the present application. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units not clearly listed or inherent to such a process, method, product, or device.
The embodiments of the present application involve much knowledge related to transfer learning. To better understand the solutions of the embodiments of the present application, related terms and concepts that may be involved are introduced first. It should be understood that the related concept explanations may be limited by the specific circumstances of the embodiments of the present application, but this does not mean the present application is limited to those circumstances; the specific circumstances of different embodiments may also differ, which is not limited here.
(1) Transfer learning
Transfer learning is a machine learning method in which a model developed for task A is taken as the starting point and reused in the process of developing a model for task B. That is, the knowledge learned by a model trained on an existing task (e.g., task A) is transferred to a new task (e.g., task B) to help retrain the model; through transfer learning, the knowledge already learned (contained in the model parameters) is shared with the new task in some way to speed up and optimize the model's learning efficiency, so that the model does not have to learn from scratch. Fine-tuning is a simple and efficient transfer learning method; for example, when training an object detection task, using a model trained on the ImageNet data set as the backbone of the new task can significantly improve training efficiency.
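As a concrete illustration of fine-tuning (a generic sketch, not the patent's specific procedure; the 10-class head and the learning rates are hypothetical): load an ImageNet-pretrained backbone, replace the classification head for the new task, and train the pretrained layers with a smaller learning rate than the new head.

```python
import torch
import torchvision

# Reuse an ImageNet-pretrained backbone for a new task: replace the classifier
# head and fine-tune, with a lower learning rate for the pretrained layers.
model = torchvision.models.resnet50(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # hypothetical 10-class task

optimizer = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters() if not n.startswith("fc")],
     "lr": 1e-3},                                     # pretrained layers: small lr
    {"params": model.fc.parameters(), "lr": 1e-2},    # new head: larger lr
], momentum=0.9)
```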
(2) Graph convolutional network (GCN)
The essential purpose of a GCN is to extract the spatial features of a graph structure, where "graph" refers to a topology graph in mathematics (graph theory) in which relationships are established with vertices and edges. The spatial features of a graph structure have the following two characteristics: (a) node features, i.e., each node has its own features, embodied in the node itself; (b) structural features, i.e., the connections between nodes in the graph structure, embodied in the edges (the connecting lines between nodes). A GCN must consider both node information and structural information. As shown in FIG. 2, which is a schematic flowchart of a GCN processing graph-structured data, a GCN can be regarded as the natural generalization of the convolutional neural network (CNN) to graph structures; it can learn node features and structural features end to end simultaneously and is currently the best choice for learning tasks on graph-structured data. Moreover, GCNs are extremely widely applicable, being suitable for graphs of arbitrary topology.
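For illustration, a single GCN layer can be written in the standard form H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), combining node features H with the structure A; below is a minimal PyTorch sketch under that standard formulation (names are illustrative, and this is not the patent's specific network).

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), learning jointly
    from node features H and the graph structure A."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, feats):
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
        return torch.relu(norm_adj @ self.linear(feats))

# Usage with the adjacency matrix and one-hot node features of a graph code:
layer = GCNLayer(in_dim=6, out_dim=16)
out = layer(torch.eye(8), torch.randn(8, 6))
```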
(3) Bayesian regressor
A Bayesian regressor, also called Bayesian regression or Bayesian linear regression, is a linear regression model solved with the Bayesian inference method of statistics. Bayesian linear regression treats the parameters of the linear model as random variables and computes the posterior of the model parameters (weight coefficients) from their prior. Bayesian linear regression can be solved with numerical methods, and under certain conditions, the posterior or its related statistics can also be obtained in analytical form. Bayesian linear regression has the basic properties of a Bayesian statistical model: the probability density function of the weight coefficients can be solved, online learning can be performed, and model hypothesis testing based on Bayes factors can be conducted.
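A minimal numpy sketch of Bayesian linear regression with a Gaussian weight prior N(0, alpha^{-1} I) and noise precision beta (standard closed-form equations; the hyperparameter values are illustrative): the posterior yields both a predictive mean and a predictive variance, which is exactly what the UCB criterion described below consumes.

```python
import numpy as np

def bayesian_linear_regression(X, y, alpha=1.0, beta=25.0):
    """Closed-form posterior for Bayesian linear regression with prior
    N(0, alpha^-1 I) on the weights and observation noise precision beta."""
    d = X.shape[1]
    S_inv = alpha * np.eye(d) + beta * X.T @ X
    S = np.linalg.inv(S_inv)       # posterior covariance of the weights
    m = beta * S @ X.T @ y         # posterior mean of the weights
    return m, S

def predict(x, m, S, beta=25.0):
    """Predictive mean and variance for a feature vector x."""
    mean = x @ m
    var = 1.0 / beta + x @ S @ x   # the variance feeds the UCB criterion
    return mean, var
```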
(4) Upper confidence bound (UCB)
Confidence limits is the collective term for the limit of a one-sided confidence interval and the upper and lower limits of a two-sided confidence interval, and a confidence interval is the interval between the confidence limits. A confidence interval is a range (interval) that can be stated with a specified probability (confidence level or confidence coefficient); it has a parameter to be measured, which may be a mean, a standard error, a proportion, or any other measurement point, the purpose being to determine higher and lower confidence limits. The higher confidence limit is called the upper confidence bound (also called the upper confidence limit), and the lower confidence limit is called the lower confidence bound (also called the lower confidence limit).
Specifically, a parameter estimate can be obtained by random sampling from a population; an interval around the estimate, calculated from the sampled values, reflects to a certain degree the probability that the true value appears in this interval, and this interval is the confidence interval. Usually a 95% confidence interval is calculated, which can be understood as a 95% probability that the true value lies in this interval; 99% or 99.9% confidence intervals, etc., can also be calculated.
(5) Evolutionary algorithm (EA)
An evolutionary algorithm is a population-oriented random search technique and method produced by simulating the evolution process of living beings in nature. It is an "algorithm family": although it has many variants, with different genetic expression schemes, different crossover and mutation operators, references to special operators, and different regeneration and selection methods, they are all inspired by biological evolution in nature. Compared with traditional optimization algorithms such as calculus-based methods and exhaustive search, evolutionary computation is a mature global optimization method with high robustness and wide applicability; it is self-organizing, adaptive, and self-learning, and can effectively handle complex problems that traditional optimization algorithms find difficult to solve, without being restricted by the nature of the problem.
(6) Pareto front
Originally a concept in economics, the Pareto front is widely used in multi-objective optimization problems. When multiple objectives are optimized, conflicts between objectives and incomparability may arise: a solution may be best on one objective and worst on others. Given two solutions S1 and S2, if S1 is better than S2 on all objectives, S1 dominates S2. If S1 is not dominated by any other solution, S1 is called a non-dominated solution, also called a Pareto solution. The set formed by the Pareto solutions is called the Pareto front.
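A small sketch (illustrative only; both objectives are assumed to be minimized) of extracting the Pareto front from a set of candidate models scored on two objectives, e.g., training step time and error rate:

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated (Pareto) points, assuming every objective is
    to be minimized, e.g. (training step time, 1 - accuracy)."""
    points = np.asarray(points)
    keep = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(i)
    return points[keep]

# Example: trade-off between step time (s) and error rate.
front = pareto_front([(0.8, 0.25), (1.0, 0.20), (1.2, 0.21), (0.9, 0.22)])
```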
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
首先对人工智能系统总体工作流程进行描述,请参见图3,图3示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主体框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一系列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人工智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶、平安城市等。
本申请可应用于人工智能领域中的计算机视觉领域,具体的,结合图3来讲,本申请实施例中基础设施获取的数据是本申请实施例所述的新任务(即第二数据集),具体可以是图片、文本、语音等数据,之后,再基于本申请实施例提供的模型确定方法从构建的模型集中选出一个适合该新任务的目标模型和一组目标超参数对该新任务进行处理,从而得到基于该新任务训练后的目标模型,其中需要注意的是,该目标模型是已经过已有的任务(如,本申请实施例所述的第一数据集)预训练过的模型。
接下来对本申请实施例提供的模型的获取方法进行介绍,具体请参阅图4,图4为本申请实施例提供的模型的获取方法的一种流程示意图,该方法可以包括如下步骤:
401、基于约束条件构建模型集,该模型集包括至少两个在第一数据集上预训练过的模型。
首先,基于约束条件构建模型集,该模型集中包括至少两个已经在第一数据集上,如,开放的ImageNet数据集,预训练过的模型。
需要说明的是,在本申请的一些实施方式中,约束条件包括用户的一些具体业务需求,例如,约束条件可以是模型大小、模型推理时延、模型训练时延、特定的硬件部署条件、片上内存大小等中的一项或多项。举例来说,有些新任务(如,自动驾驶车辆获取的图片、音频等数据集)对模型推理时延要求比较高,因为自动驾驶车辆对实时性要求高;而有些新任务(如,手机等终端设备)对占据片上内存大小有较高要求,这是因为手机等手持终端的存储空间有限。因此,不同的新任务对模型有不同的约束条件,在本申请实施例中,可基于新任务(可以是一个或多个)的不同应用场景得到不同的约束条件,从而基于约束条件构建满足各个新任务的模型集。
还需要说明的是,在本申请的一些实施方式中,基于约束条件构建模型集可以有不同的实现方式,可以是基于约束条件先构建初始模型集,该初始模型集就包括至少两个训练后的初始模型,其中,每个初始模型是根据已有的开放的第一数据集训练得到的。具体地,在本申请实施例中,可以是基于构建的搜索空间,通过神经网络架构搜索的方式构建该初始模型集,其中,构建的搜索空间不同,那么基于约束条件构建初始模型集的具体实现方式也会有所不同,下面分别进行介绍:
一、构建的搜索空间包括多种网络结构单元(block)及所述多种网络结构单元之间的连接关系。
首先,根据约束条件确定一个搜索空间,该搜索空间就包括多种网络结构单元(block)及所述多种网络结构单元之间的连接关系,其中,每个block内部包含一个或多个节点以及各个节点上的操作(operation,OP),该操作指的是神经网络的一些基本操作单元,例如,卷积、池化等操作,这里的节点可以理解为神经网络模型的层,如,输入层、输出层、卷积层、池化层、全连接层等。为便于理解,下面对block与block之间的连接关系进行说明。图5示意的是一个block结构以及该block结构的内部操作关系,每个block结构内部可自行设置节点数量、各节点上的操作以及通道数变化,图5中示意的是5个节点,这5个节点包括输入节点和输出节点(也可称为输入层和输出层),C表示输入通道数大小,(0.25~4)×C表示的是中间的这3个节点的通道数可以是按照C的0.25~4的比例变化,该变化比例区间可自行设置,且每个中间节点的通道数也可以不一样,图5仅为示意,这里需要注意的是,一般来说,输入节点和输出节点的通道数相同,两者之间默认包含一个跳连(即数据流的流向),如果输入层和输出层分辨率不一致时,中间可以插入一个1×1的卷积,不同节点的输出结果合并时,可以是直接相加(add)方式或通道合并(concat)方式这两种不同的操作,此处不做限定。
在实际应用中,一般来说,一个block结构内部考虑1-5个节点,每个节点考虑7种不同的操作,通道数一般为5种变化,例如,可以是如表1中所示的7种操作和5种通道数的变化(以通道数变化的比例表征),其中表1中的c表示当前操作的输入通道数,需要注意的是,表1仅为对操作和通道数变化的一种示意,在具体应用中,节点上的操作和通道数变化还可以是其他形式,具体此处不做限定。
表1:7种操作和5种通道数的变化
(表1的具体内容在公开文本中以图片形式给出,此处从略;表中列出了7种候选操作以及以当前操作的输入通道数c为基准的5种通道数变化比例。)
图6示意的则是多个相同或不同的block之间的连接关系(也可称为堆叠关系),各个block连接后形成的组合结构就为本申请实施例所述的初始模型。图6示意的是4432形式的堆叠结构,也就是说,在堆叠的初始模型中,第一阶段(即阶段1)包括4个通道数均为c的block,第二阶段(即阶段2)包括4个block,其中2个block的通道数为c,另外2个block的通道数为2c,第三阶段(即阶段3)包括3个通道数为2c的block,第四阶段(即阶段4)包括2个通道数为4c的block。这里需要注意的是,堆叠的初始模型可以包括多个阶段,如图6示意的是4个阶段,每个阶段可以包括相同或不同内部结构的block,如图6中的阶段1包括的就是4个内部结构均相同的block,图6中的阶段2的4个block就包括2种内部结构不同的block,在本申请实施例中,堆叠的初始模型包括多少个阶段以及每个阶段中包括的block结构的类型和通道数均可设置,此处不做限定。在图6中,示意的则是每个阶段可包括1~10个相同或不同的block。
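上述堆叠关系可以用一个简单的配置来描述,下面给出一个示意(字段组织方式为假设,仅用于说明图6中4432堆叠与通道数的对应关系):

```python
# 每个阶段用(block数量, 各block的通道数)描述,对应图6的4432堆叠结构
stages = [
    (4, ["c", "c", "c", "c"]),     # 阶段1:4个通道数均为c的block
    (4, ["c", "c", "2c", "2c"]),   # 阶段2:2个通道数为c、2个通道数为2c的block
    (3, ["2c", "2c", "2c"]),       # 阶段3:3个通道数为2c的block
    (2, ["4c", "4c"]),             # 阶段4:2个通道数为4c的block
]
```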
还需要注意的是,在本申请实施例中,根据约束条件确定搜索空间可以分解为两个层次的搜索过程,可以先基于约束条件搜索符合要求的block结构,再基于约束条件搜索block结构之间的连接关系。通过这两个层次的搜索,得到符合约束条件的搜索空间。
根据上述所述的方式确定好符合约束条件的搜索空间后,就可从搜索空间随机采样,得到至少两个初始模型,每个初始模型是由多个block结构和各block结构之间的连接关系确定的,得到至少两个初始模型后,就可根据第一数据集对初始模型进行预先训练,从而得到训练后的初始模型,各个训练后的初始模型就构成最开始的初始模型集。具体地,在本申请的一些实施方式中,如图7所示,若新任务是分类任务,那么可以基于搜索空间中的各个模型(图7中每个圈圈表示的是一个基于block结构和连接关系得到的模型)在ImageNet数据集上训练的精度以及训练单步的时长构建帕累托前沿,并根据帕累托前沿构建能够友好迁移到新任务的训练过的初始模型,各个训练过的初始模型就构成初始模型集。其中,表2示意的是ImageNet数据集包括的图片类别数、训练集图片数量和测试集图片数量。
表2:ImageNet数据集
数据集名称 类别数 训练集图片数量 测试集图片数量
ImageNet 1000 12.8M 50K
二、构建的搜索空间包括已有的成熟的初始模型。
另一种构建初始模型集的方式是直接基于约束条件搜索是否存在符合该约束条件的已有的成熟的模型,若有,就直接将该成熟模型纳入初始模型集,并通过第一数据集对该成熟模型进行训练,训练后的成熟模型就是训练后的初始模型。这种方式的好处在于:可直接得到已有的初始模型,相对第一种方式节省了一些搜索时间,而第一种方式的好处则在于:一方面可以遍历所有可能的block以及所有的连接关系,从而找到性能最优的架构;另一方面是可以打破人类思维的局限,找到现有技术中没有的架构组织方式。
基于约束条件构建好初始模型集之后,就可以根据演化算法(EA)得到每个初始模型各自对应的一组衍生模型,其中,每组衍生模型包括至少一个衍生模型,需要注意的是,每个初始模型衍生的一组衍生模型中具体包括几个衍生模型可根据演化算法自行设置,具体此处不做限定。
为便于理解,这里举例说明:假设初始模型集中有3个训练过的初始模型,那么根据演化算法,这3个初始模型均可各自衍生出一组衍生模型,那么一共可衍生3组衍生模型,这3组衍生模型中各自包括的衍生模型的数量可以相同,也可以不同,此处不做限定。例如,通过演化算法这3个初始模型均各自衍生出5个衍生模型,那么就一共可得到15个衍生模型。
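演化算法的衍生过程可以粗略地理解为对初始模型的编码做随机变异,下面是一个极简示意(这里把模型抽象成block操作序号的列表,变异规则与概率均为假设,并非本申请的实际实现):

```python
import random

def mutate(parent, num_ops=7, p=0.2):
    """对父代模型编码中的每个block操作以概率p随机替换,得到一个衍生模型。"""
    return [random.randrange(num_ops) if random.random() < p else op
            for op in parent]

initial_models = [[0, 1, 2, 3], [2, 2, 4, 1], [5, 0, 3, 3]]      # 3个初始模型的编码
derived = [mutate(m) for m in initial_models for _ in range(5)]  # 每个衍生5个,共15个
print(len(derived))  # -> 15
```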
得到初始模型的衍生模型之后,由于通过演化算法衍生出来的各个衍生模型是没有经过训练的模型,因此,本申请还需要构建一个预测器(可称为第二预测器),该第二预测器的作用是预测各个衍生模型对第一数据集的输出精度(可称为第三输出精度),该第三输出精度是一种粗略的预测结果,并不是衍生模型针对第一数据集真正的输出精度。这里还需要注意的是,构建的第二预测器也是未经过训练的,在本申请实施例中,该第二预测器的输入为初始模型集中各个训练过的初始模型,根据各个训练过的初始模型,可以得到训练后的第二预测器。之后,经过训练的第二预测器就可用于对每个衍生模型进行处理,预测每个衍生模型对第一数据集的第三输出精度。随后根据得到的各个衍生模型对应的第三输出精度从所有衍生模型中选取目标衍生模型(可以是一个或多个),并根据第一数据集对选出的目标衍生模型进行训练,从而得到训练后的目标衍生模型,那么上述训练后的初始模型和该训练后的目标衍生模型就构成本申请实施例所述的模型集。
为便于理解,依然以上述例子为例进行说明:初始模型集中有3个初始模型,并且每个初始模型衍生出5个衍生模型,共15个衍生模型,由于这15个衍生模型是由初始模型经过演化算法得到的,衍生模型的网络参数有些是未初始化和未训练过的,因此,本申请构建了一个第二预测器,该第二预测器用于粗略预测这15个衍生模型对第一数据集的输出精度(即第三输出精度)是怎样的,之后,再根据各个第三输出精度从这15个衍生模型中选择符合要求的目标衍生模型,假设从这15个衍生模型中选择出了5个目标衍生模型,就根据第一数据集对这5个目标衍生模型进行训练,得到训练后的目标衍生模型,那么训练后的这5个目标衍生模型和原来的那3个训练后的初始模型就共同构成本申请实施例所述的模型集。
需要说明的是,在本申请的一些实施方式中,如何根据各个衍生模型对应的第三输出精度从所有衍生模型中选取目标衍生模型有多种实现方式,包括但不限于如下几种:
A、从所有衍生模型中选取第三输出精度大于预设值的衍生模型作为该目标衍生模型。
举例示意:假设一共有6个衍生模型,各个衍生模型对应的第三输出精度取值分别为85%(对应衍生模型a)、87%(对应衍生模型b)、89%(对应衍生模型c)、91%(对应衍生模型d)、93%(对应衍生模型e)、94%(对应衍生模型f),其中,假设预设值为90%,那么从这6个衍生模型中选取衍生模型d、e、f这3个衍生模型作为目标衍生模型。
B、从所有衍生模型中选取第三输出精度取值较大的前n个衍生模型作为该目标衍生模型,n≥1。
举例示意:依然假设一共有6个衍生模型,各个衍生模型对应的第三输出精度取值分别为85%(对应衍生模型a)、87%(对应衍生模型b)、89%(对应衍生模型c)、91%(对应衍生模型d)、93%(对应衍生模型e)、94%(对应衍生模型f),其中,假设n=2,也就是选择所有第三输出精度由大到小排序的话取值排在前两位的2个衍生模型为目标衍生模型,也就是选择衍生模型e、f这2个衍生模型作为目标衍生模型。
C、根据第三输出精度的均值和方差得到每个衍生模型对应的置信上界(UCB),并从所有衍生模型中选取置信上界取值较大的前m个衍生模型作为该目标衍生模型,m≥1。
举例示意:依然假设一共有6个衍生模型,各个衍生模型对应的第三输出精度取值分别为85%(对应衍生模型a)、87%(对应衍生模型b)、89%(对应衍生模型c)、91%(对应衍生模型d)、93%(对应衍生模型e)、94%(对应衍生模型f),假设根据第三输出精度的均值和方差得到上述6个衍生模型对应的置信上界分别为87%(对应衍生模型a)、91%(对应衍生模型b)、90%(对应衍生模型c)、92%(对应衍生模型d)、95%(对应衍生模型e)、97%(对应衍生模型f),其中,假设m=4,也就是选择所有置信上界由大到小排序的话取值排在前四位的4个衍生模型为目标衍生模型,也就是选择衍生模型b、d、e、f这4个衍生模型作为目标衍生模型。
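上述A、B、C三种选取方式可以统一写成如下示意代码(threshold、n、m、kappa等均为假设的可配置参数,数据沿用上面的例子):

```python
import numpy as np

def select_targets(accs, mode="topn", threshold=0.9, n=2, m=4, stds=None, kappa=1.0):
    accs = np.asarray(accs)
    if mode == "threshold":                # 方式A:第三输出精度大于预设值
        return np.where(accs > threshold)[0]
    if mode == "topn":                     # 方式B:第三输出精度最大的前n个
        return np.argsort(accs)[::-1][:n]
    if mode == "ucb":                      # 方式C:置信上界最大的前m个
        ucb = accs + kappa * np.asarray(stds)
        return np.argsort(ucb)[::-1][:m]
    raise ValueError(mode)

accs = [0.85, 0.87, 0.89, 0.91, 0.93, 0.94]        # 衍生模型a~f的第三输出精度
stds = [0.01, 0.02, 0.005, 0.005, 0.01, 0.015]     # 假设的标准差
print(select_targets(accs, mode="threshold"))      # 方式A -> [3 4 5],即d、e、f
print(select_targets(accs, mode="topn", n=2))      # 方式B -> [5 4],即f、e
print(select_targets(accs, mode="ucb", stds=stds)) # 方式C:按置信上界取前4个
```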
还需要说明的是,在本申请的一些实施方式中,还可以将上述构建的模型集作为新的初始模型集,目标衍生模型作为新的初始模型,重新执行上述构建模型集的步骤直至达到一个预设条件(可称为第二预设条件)。为便于理解该重复执行的步骤,这里依然以上述例子为例进行说明:假设一开始的初始模型集(可称为第一轮初始模型集)中有3个初始模型,并且每个初始模型衍生出5个衍生模型,共15个衍生模型,并根据上述所述的方式从这15个衍生模型中选出了5个目标衍生模型,那么训练后的这5个目标衍生模型和原来的那3个训练后的初始模型就共同构成本申请实施例所述的模型集,之后,这个包括8个模型(3个初始模型+5个目标衍生模型)的模型集作为新的初始模型集,每个目标衍生模型作为初始模型,那么第二轮的初始模型集就有8个训练后的初始模型,之后,依然利用演化算法对这8个初始模型进行衍生,又可以得到每个初始模型各自对应的一组衍生模型,假设一共得到了40个衍生模型,那么再利用第二预测器继续预测这40个衍生模型对第一数据集的输出精度(即第三输出精度),之后再根据各个第三输出精度从这40个衍生模型中选择符合要求的目标衍生模型,假设从这40个衍生模型中又选择出了6个目标衍生模型,就根据第一数据集对这6个目标衍生模型进行训练,得到训练后的目标衍生模型,训练后的这6个目标衍生模型和第二轮的那8个训练后的初始模型就共同构成本申请实施例所述的模型集,该模型集中就一共包括14个模型(3个第一轮的初始模型+5个第一轮的目标衍生模型+6个当前轮次的目标衍生模型)。假设第二轮次得到的模型集满足上述第二预设条件,则不再循环,此时第二轮得到的模型集就作为最终构建的模型集(可称为目标模型集),假设第二轮次得到的模型集依然未满足上述第二预设条件,则继续循环,直至达到该第二预设条件。
需要说明的是,在本申请的一些实施方式中,该第二预设条件可根据用户需求自行设置,例如,该第二预设条件可以是模型集内的模型数量达到预设数量,以上述例子为例:假设预设数量为13,而第二轮次得到的模型集包括14个模型,那么说明达到了第二预设条件,因此该包括14个模型的模型集就为最终构建得到的模型集;又例如,该第二预设条件还可以是模型集内的模型满足的约束条件达到预设要求,例如,假设约束条件一共有3种类型,用户要求每种类型的约束条件都需要达到一定数量,这样做的目的是为了使得模型集累积到满足不同约束条件的模型。
还需要说明的是,在本申请的一些实施方式中,每一轮得到的训练后的目标衍生模型都可以用来更新第二预测器,以提高第二预测器的预测精度。
还需要说明的是,在本申请的一些实施方式中,第二预测器可以是“GCN+贝叶斯回归器”,具体地,根据训练后的初始模型对构建的第二预测器进行训练的过程可以是:首先,对训练后的初始模型的图结构(也可称为拓扑图)进行编码,得到每个训练后的初始模型的图编码,然后将每个图编码作为GCN的输入,利用GCN提取每个图编码的特征,从而避免手工设计核函数来评估网络架构之间的距离。之后GCN的输出作为贝叶斯回归器的输入,该贝叶斯回归器的作用主要是用来评估模型性能的均值和方差,具体为通过使用置信上界来评估模型的性能。
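下面给出“GCN+贝叶斯回归器”这一第二预测器的一个极简流程示意(这里用单层GCN与随机构造的图编码代替真实实现,所有数据均为示意,并非本申请的实际实现):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

def gcn_feature(adj, feat, W):
    # 单层GCN示意:对称归一化的 D^(-1/2)·A·D^(-1/2)·X·W,经ReLU后均值池化得到图级特征
    a = adj + np.eye(adj.shape[0])            # 加自环
    d = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    h = np.maximum((d @ a @ d) @ feat @ W, 0.0)
    return h.mean(axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 16))                   # 假设节点类型6种、隐藏维度16
graphs = [(rng.integers(0, 2, size=(8, 8)),    # 邻接矩阵(8个节点,含全局节点)
           np.eye(6)[rng.integers(0, 6, size=8)])  # 节点类型的独热编码
          for _ in range(20)]
X = np.stack([gcn_feature(a, f, W) for a, f in graphs])
y = rng.uniform(0.7, 0.95, size=20)            # 各训练后初始模型的输出精度(示意)
reg = BayesianRidge().fit(X, y)                # 贝叶斯回归器学习精度的分布
mean, std = reg.predict(X[:3], return_std=True)  # 均值与标准差,用于计算置信上界
```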
为了便于理解什么是模型的图结构以及图编码,下面举例进行示意,如图8所示,图8为一个模型的图结构以及对应的图编码,模型实质是由众多节点以及节点与节点之间的连接关系构成的,因此,每个模型都可以看成是一个图结构,如图8示意的模型的图结构就包括3种操作、6种节点类型和7个节点(包括输入节点Input Node1和输出节点Output Node7),其中,3种操作分别为1×1的卷积操作(1×1Conv)、3×3的卷积操作(3×3Conv)以及最大池化操作(Max Pooling),6种节点类型分别为输入、1×1卷积、3×3卷积、最大池化、输出、全局,7个节点则分别为Node1-Node7,在本申请实施例中,为了编码整个图结构的特征,额外引入了一个全局节点Global Node8,该全局节点用于连接图结构的所有节点,这样才能编码整个图结构,最终构成这8个节点、6种节点类型的图结构。针对每个模型的图结构,都可以对其进行唯一编码,得到图编码,每个图编码由邻接矩阵和独热(one-hot)编码构成,具体如图8所示,由此,一个图编码就唯一确定了一个模型。
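按上述描述,一个模型的图编码由邻接矩阵和节点类型的独热编码构成,下面以图8的8个节点(含全局节点)、6种节点类型为例给出一个编码示意(连边关系为举例构造,并非图8的真实连边):

```python
import numpy as np

NODE_TYPES = ["input", "conv1x1", "conv3x3", "maxpool", "output", "global"]

def encode_graph(edges, node_types, num_nodes=8):
    adj = np.zeros((num_nodes, num_nodes), dtype=np.int64)   # 邻接矩阵
    for src, dst in edges:
        adj[src, dst] = 1
    one_hot = np.eye(len(NODE_TYPES), dtype=np.int64)[node_types]  # 节点类型独热编码
    return adj, one_hot

# 节点0为输入,节点6为输出,节点7为全局节点(与其余所有节点相连)
edges = [(0, 1), (1, 2), (2, 6)] + [(7, i) for i in range(7)]
node_types = [0, 1, 2, 3, 2, 1, 4, 5]
adj, feat = encode_graph(edges, node_types)
print(adj.shape, feat.shape)   # -> (8, 8) (8, 6)
```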
还需要说明的是,在本申请的一些实施方式中,对于采样得到的模型,不管是初始模型还是基于初始模型衍生得到的目标衍生模型,为了更高效的进行训练,可以将多个模型融合成一个超网,通过使用参数共享的方式进行快速训练,这样可以大大减少模型训练的时长。需要注意的是,这里所述的共享的参数是指网络结构内部本身具有的参数,比如构成超网的子网的卷积操作、卷积核大小和卷积核取值等,下面分别介绍如何将模型融合成一个超网进行训练:
A、初始模型的融合训练。
本申请实施例中,由于构建的初始模型集中包括至少两个初始模型,因此根据第一数据集对初始模型进行训练,得到训练后的初始模型具体可以是:首先,将初始模型集中的所有初始模型融合成一个超网模型(可称为第一模型),之后根据第一数据集对该第一模型进行训练,从而得到训练后的第一模型,最后,将训练后的第一模型又重新拆解为训练后的初始模型。
为便于理解,下面举例进行示意:请参阅图9,图9为本申请实施例提供的将多个初始模型融合成第一模型进行训练且训练后重新拆解为多个初始模型的示意图,假设初始模型有3个,分别为A1、A2和A3,A1、A2和A3的网络结构如图9所示,其中,图9中的每个圈圈表示网络结构的一层(如,池化层、卷积层等),需要注意的是,图9示意的是每个初始模型均表示为4层,在实际应用中,每个初始模型的层数不一定相同,层的数量也不一定是4层,此处仅为示意,具体不做限定。对A1、A2和A3的融合就是将各个初始模型层与层之间的连接关系全部体现在一个模型中,即图9中的模型super-A中,之后根据第一数据集对该融合后的模型super-A进行训练,这样通过对一个模型的训练就可得到所有初始模型的模型精度,当训练完成后,再将该模型super-A按照原来的连接关系拆解开,这样就得到训练后的A1'、A2'和A3'。
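参数共享式的融合训练可以粗略地理解为:各子网按层索引共用同一组层参数,训练超网即同时更新所有子网的共享参数。下面是一个极简的PyTorch示意(结构、层数与共享方式均为假设,仅用于说明思路):

```python
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    """所有子网共享同一个层池,每个子网是层池中若干层索引的序列。"""
    def __init__(self, num_layers=6, dim=16):
        super().__init__()
        self.pool = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x, subnet):
        for idx in subnet:                 # 按子网给定的索引依次经过共享层
            x = torch.relu(self.pool[idx](x))
        return x

supernet = SuperNet()
subnets = [[0, 1, 2, 3], [0, 4, 2, 5], [1, 4, 3, 5]]   # 3个子网(对应A1、A2、A3)
x = torch.randn(8, 16)
loss = sum(supernet(x, s).mean() for s in subnets)     # 一次前向同时覆盖所有子网
loss.backward()                                        # 共享参数被一并更新
```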
B、目标衍生模型的融合训练。
类似地,在本申请实施例中,如果得到的目标衍生模型有多个,那么根据第一数据集对目标衍生模型进行训练,得到训练后的目标衍生模型具体可以是:首先,将这多个目标衍生模型融合成一个超网模型(可称为第二模型),之后根据第一数据集对该第二模型进行训练,从而得到训练后的第二模型,最后,将训练后的第二模型又重新拆解为训练后的多个目标衍生模型。目标衍生模型具体的融合与拆解的过程与图9类似,此处不予赘述。
402、通过构建的第一预测器预测该模型集中任意一个模型对第二数据集的第一输出精度,其中,每个模型对应一个第一输出精度,每个模型对应一组超参数,该超参数通过在超参数空间采样得到。
在基于约束条件构建好模型集之后,那么该模型集就包括至少两个在第一数据集上预训练过的模型(即训练后的初始模型和训练后的目标衍生模型),之后,在超参数空间进行随机采样,得到一组超参数,这组随机采样得到的超参数可称为第一超参数,之后通过构建的第一预测器预测模型集中的任意一个模型对第二数据集的第一输出精度,其中,每个模型对应一个第一输出精度,例如,可以是预测模型集中的一个模型的输出精度,也可以是预测模型集中的多个模型各自对应的输出精度,还可以是预测模型集中的每个模型的输出精度,此处不做限定,且每个模型都对应会有一个超参数(即第一超参数),也就是说,在模型的超参数设置为该第一超参数的情况下,通过构建的另一预测器(可称为第一预测器)预测该模型集里的任意一个模型对第二数据集的输出精度(可称为第一输出精度),其中,该第二数据集即为新任务的数据集。
为便于理解上述步骤,下面举例进行示意:假设超参数空间一共有30组超参数,构建的模型集里一共有10个训练过的模型,那么针对每个训练过的模型和每组超参数,该构建的第一预测器都可预测出对第二数据集的输出精度,这样针对该第二数据集,就可得到30×10=300个第一输出精度,每个第一输出精度对应模型集里的一个模型和超参数空间里的一组超参数。
这里需要注意的是,在本申请实施例中,构建的第一预测器的输入数据是第一超参数、模型集中的任意一个模型和第二数据集,输出是该任意一个模型在该第一超参数情况下对第二数据集的输出精度的预测。具体地,在本申请的一些实施方式中,需要对该第一超参数、该模型以及该第二数据集分别进行编码,从而分别得到该超参数编码、该模型编码以及第二数据集编码,之后,将该超参数编码、该模型编码及第二数据集编码输入第一预测器,输出该模型在第一超参数情况下对第二数据集的第一输出精度的预测结果,该过程具体可参阅图10,图10表示的是构建的第一预测器得到各个模型对第二数据集的第一输出精度的预测的示意图。
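结合前文30组超参数、10个模型的例子,第一预测器的使用方式可示意如下(各编码维度、拼接方式与预测器结构均为假设,仅用于说明“三段编码拼接后逐一预测、再取最大者”的流程):

```python
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(24, 32), nn.ReLU(), nn.Linear(32, 1))  # 假设的第一预测器

def first_accuracy(model_code, hp_code, dataset_code):
    # 将模型编码、超参数编码、第二数据集编码拼接后送入第一预测器
    return predictor(torch.cat([model_code, hp_code, dataset_code], dim=-1))

models = [torch.randn(8) for _ in range(10)]   # 10个模型的模型编码(示意)
hps = [torch.randn(8) for _ in range(30)]      # 30组超参数的超参数编码(示意)
d = torch.randn(8)                             # 第二数据集编码(示意)
scores = [(i, j, first_accuracy(m, h, d).item())
          for i, m in enumerate(models) for j, h in enumerate(hps)]
best = max(scores, key=lambda t: t[2])         # 共10x30=300个第一输出精度,取最大者
```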
这里需要注意的是,如图10所示,由于构建的第一预测器也是未经过训练的,因此,在本申请实施例中,可通过已有的任务对该第一预测器进行初始化,当该第二数据集作为新任务完成预测后,也可将该新任务作为下一个已有的任务对该第一预测器的参数进行更新,从而提高第一预测器的检测精度。具体来说,在本申请的一些实施方式中,可以根据第二输出精度、第二数据集、目标超参数及目标模型更新该第一预测器的参数,其中,该第二输出精度为该训练后的目标模型对第二数据集的输出精度。
还需要说明的是,在本申请实施例中,对第一预测器的初始化过程具体可以是:在第一预测器的训练数据集上随机采样子集,并从构建的模型集以及超参空间随机采样预训练过的模型和一组超参数进行迁移学习,从而得到分类准确率(每个模型、每组超参数、一个采样子集对应一个分类准确率)。例如,可用此方法采集30K组数据,其中24K组作为训练集,6K组作验证集,记录分类准确率。其中,表3示意的是可以用于初始化该第一预测器的训练数据集,表4示意的是测试该第一预测器的测试数据集。表3和表4仅为示意,还可以是其他种类的数据集。此外,在本申请其他的一些实施方式中,训练数据集和测试数据集也可以是其他类型的数据集,例如,当模型集里的模型是用于处理文本数据的,那么训练数据集和测试数据集以及本申请实施例上述所述的第一数据集和第二数据集就可以是文本类的数据集;又例如,当模型集里的模型是用于处理语音数据的,那么训练集数据和测试集数据以及本申请实施例上述所述的第一数据集和第二数据集就可以是语音类的数据集,具体此处对模型集里模型可应用的场景以及数据库类型不做限定,只要模型集里的模型与数据集对应即可。
表3:一些训练数据集的示意
数据集(即用于训练的新任务) 类别数 训练集图片数量 测试集图片数量
Flowers102 102 2.0K 6.1K
Stanford-Car 196 8.1K 8.0K
Caltech101 101 3.1K 6.1K
Places365 365 1.8M 36.5K
CUB-Birds 200 6.0K 5.8K
表4:一些测试集数据的示意
数据集(即模拟真实的新任务) 类别数 训练集图片数量 测试集图片数量
Aircrafts 100 6.7K 3.3K
MIT67 67 5.4K 1.3K
Stanford-Car 196 8.1K 8.0K
Stanford-Dog 257 12K 8.6K
在本申请实施例中,第一预测器的网络结构可记为P,该网络结构包括多个全连接层,且该第一预测器的输入数据和输出数据可以记为如下所述的方式:
(Regime_FT, state(D), λ) → Σ_l a_l·f_l
(该公式在公开文本中以图片形式给出,此处按上下文示意性地恢复。)其中,公式左边的数据为输入数据,公式右边的数据为输出数据;Regime_FT表示表征的模型特征,具体可以包括模型的独热编码、模型在第一数据集上的输出精度等;state(D)表示第二数据集(假设第二数据集的数据类型为图片)编码表征的数据类别数(如,图片类别数)、每类图片数量的平均值及方差、第二数据集与第一数据集(如,ImageNet数据集)的相似度等;λ(原文以图片形式给出的超参数编码符号)表示学习率、训练轮数、模型中固定参数的阶段数(即训练过程中某个阶段固定哪些参数不变)等;l则表示第一预测器中不同的层,a_l表示每层特征的权重,f_l表示的是每层的特征值。此外,还有:
f_l = h_l·W_l,
h_l = ReLU(φ_l·h_{l-1}),
(原文此处另有一个以图片形式给出的公式,无法从上下文恢复,此处从略。)其中,W_l和φ_l是每层的可学习参数,h是每层的输入和输出。
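按上述公式,可以给出第一预测器的一个最小PyTorch示意(层数、维度均为假设,输入维度与前文三段8维编码拼接成24维的例子保持一致;a_l此处简化为可学习标量):

```python
import torch
import torch.nn as nn

class FirstPredictor(nn.Module):
    def __init__(self, in_dim=24, hidden=32, num_layers=3):
        super().__init__()
        self.phis = nn.ModuleList(
            nn.Linear(in_dim if i == 0 else hidden, hidden) for i in range(num_layers))
        self.Ws = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(num_layers))
        self.a = nn.Parameter(torch.ones(num_layers))   # 每层特征的权重 a_l

    def forward(self, x):
        h, out = x, 0.0
        for phi, W, a in zip(self.phis, self.Ws, self.a):
            h = torch.relu(phi(h))      # h_l = ReLU(phi_l · h_{l-1})
            out = out + a * W(h)        # f_l = h_l · W_l,累加 a_l · f_l
        return out

pred = FirstPredictor()(torch.randn(4, 24))   # 对4组(模型,超参,数据集)编码预测精度
```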
403、确定所述模型中第一输出精度满足第一预设条件的模型为目标模型,该目标模型对应的超参数为目标超参数。
当得到的所有第一输出精度中,存在一个满足预设条件(可称为第一预设条件)的输出精度,则该满足第一预设条件的输出精度就称为目标输出精度,与该目标输出精度对应的模型和第一超参数则称为目标模型及目标超参数,之后,就将该目标模型和该目标超参数作为最终处理该第二数据集的模型和超参数,也就是说,选择该目标模型和该目标超参数在新的第二数据集上进行迁移学习。
需要说明的是,在本申请实施例中,判断目标输出精度满足第一预设条件的方式可以是:在所有第一输出精度中选择取值最大的那个作为目标输出精度,一般来说,输出精度越大,说明该模型在对应超参数情况下性能越好。这里还需要注意的是,评价一个模型的性能,除了可以是通过输出精度,还可以是其他的,比如,错误率越小,则性能越好等,在本申请实施例中,仅是以输出精度为例进行说明。
此外,对于一个给定的新任务的数据集(即第二数据集),由于其数据集内的数据是固定的,先提取其数据集特征(即数据编码),在模型集中随机选择模型,另外从超参数空间随机选择超参数进行编码,最后利用该初始化后的第一预测器预测各种配置下对该第二数据集的检测精度(即第一输出精度),最后可从中选择第一输出精度最高的配置(即对应的模型和超参数)进行迁移学习,也就是将其作为最终处理该第二数据集的目标模型和目标超参数。迁移学习结束后得到的元特征信息,就可以用来更新该第一预测器的相关参数。
404、基于该目标超参数,根据该第二数据集对该目标模型进行训练,得到训练后的目标模型。
通过上述步骤从模型集和超参数空间确定出目标模型和目标超参数后,就可基于该目标超参数,根据该第二数据集对该目标模型进行训练,从而得到训练后的目标模型。
需要说明的是,在本申请的一些实施方式中,训练后的目标模型还可以部署在执行设备上,以使得执行设备通过该训练后的目标模型对输入的目标数据进行处理。例如,可以部署在手机、个人电脑、智能手表等智能终端上,也可以部署在自动驾驶车辆、网联汽车、智能汽车等可移动终端设备上,具体此处不做限定。
在本申请上述实施方式中,综合考虑了模型的选择和超参数的选择,用于通过构建的第一预测器快速预测基于约束条件构建的模型集中每个模型在不同超参数情况下针对新任务的性能表现,并从中选择满足预设条件(如,模型的输出精度取值最大)的模型和超参数作为最终处理新任务(即第二数据集)的目标模型和目标超参数。针对新任务,该方法基于用户给定的约束条件,可高效选择出合适的模型和超参数,从而节约了训练时间和算力成本。
也就是说,本申请实施例所达到的技术效果是:在实际业务交付过程中,在有限的时间内对一个新任务(即第二数据集),找到合适的模型,并将其训练到达到交付要求的精度(也就是针对新任务要选择出一个最好的模型和一组最好的超参)。
此外,由于不同业务的应用场景不同,对应的约束条件也不尽相同,选择合适的网络结构是非常耗时的。在实际业务中,往往是针对具体的问题,人工设计满足要求的网络结构,然后通过手工调参方式,使其达到业务交付的目标。整个周期非常长,需要大量人工介入,而且这些业务之间彼此独立,没有充分挖掘之间的相关信息。对于新的任务,利用现有的模型直接迁移学习(如,fine-tune)是一种非常高效的解决方案,但是无法适配不同的应用场景和约束条件。谷歌、微软提供的AutoML服务平台能够提供一个解决方案,但是用户不能根据自己的需求,如交付时间、部署平台等进行选择。因此,在本申请实施例中,该构建的第一预测器并不仅用于处理一次新任务,针对每个新任务,均可通过上述方式进行处理,从而使得本申请实施例提供的模型的获取方法可应用于持续性、多任务的交付场景,达到跨任务进行迁移学习的目的。
为便于理解上述图4对应的实施例所述的模型的获取方法,下面以一个实例分别从模型集构建阶段和迁移学习阶段对上述实施例的框架进行示意,请参阅图11,图11为本申请实施例提供的模型的获取方法的一个框架示意图,该框架示意图包括模型集构建阶段和迁移学习阶段,下面分别进行介绍:
一、模型集构建阶段
步骤1、基于约束条件定义搜索空间,该搜索空间包括多种网络结构单元(block)及所述多种网络结构单元之间的连接关系。
步骤2、从搜索空间随机采样,得到若干个初始模型(比如,3个初始模型),构成初始的模型集合。
步骤3、将多个初始模型进行融合,构建成一个超网(即上述所述的第一模型),通过参数共享的方式,根据第一数据集(即已有的如ImageNet等数据集)对各个初始模型进行训练,这里所述的共享的参数是指网络结构内部的参数,比如构成该超网的子网的卷积操作、卷积核大小和取值等。这样一次训练过程就可同时得到多个初始模型的检测精度,从而节省了训练时间,需要注意的是,这里的检测精度一般是指初始模型输出的针对第一数据集的预测结果的准确率。根据第一数据集训练好的初始模型,构成初始模型集。
步骤4、提取初始模型集中各个训练好的初始模型的图编码,训练和初始化GCN和贝叶斯回归器。
步骤5、基于已有的该初始模型集,在搜索空间内,采用EA采样的方式,构建多组新的模型(即衍生模型),其中,每个初始模型都可以通过EA采样方式得到若干个衍生模型,比如初始模型3个,EA采样后每个初始模型衍生出5个新的模型,那么就一共衍生出15个衍生模型(也可以每个初始模型演化的数量不一样),这里EA采样得到的衍生模型是没有经过训练的。
步骤6、针对每个衍生模型的图结构进行编码,得到图编码,之后再利用上述步骤4训练过的GCN提取每个衍生模型对应的图编码的特征,将提取的特征输入到经过上述步骤4训练过的贝叶斯回归器中,得到每个衍生模型针对第一数据集的检测精度(即第三输出精度),例如,一共有15个衍生模型,那么就可对应得到15个第三输出精度。
步骤7、根据预测的每个衍生模型的第三输出精度得到第三输出精度的均值和方差,并进一步计算得到每个衍生模型的置信上界(UCB),这样一共就可得到15个置信上界,该置信上界表示的是每个衍生模型检测精度能达到的上限。
步骤8、对每个衍生模型的置信上界按从大到小的顺序进行排序,选取置信上界排在前m个(Top-m)衍生模型作为目标衍生模型,假设m取值为5,那么就是从这15个衍生模型中选取置信上界取值较大的那5个衍生模型作为目标衍生模型。
步骤9、对于选出的这m个目标衍生模型,同样地,将这m个目标衍生模型进行融合,构建成一个超网(即上述所述的第二模型),通过参数共享的方式,根据第一数据集(即已有的如ImageNet等数据集)对各个目标衍生模型进行训练,训练好的目标衍生模型用于更新上述GCN和贝叶斯回归器,并且该训练好的目标衍生模型,同时更新到模型集中。以m取值为5为例,那么该构建的模型集就包括原来的3个根据第一数据集训练后的初始模型以及5个根据第一数据集训练后的目标衍生模型。
步骤10、对步骤5-9循环执行直至达到一个预设条件(即上述所述的第二预设条件),如,直至模型集中累积到满足不同约束条件的模型,或者,直至模型集中累积到足够数量的模型。
二、迁移学习阶段
步骤1、基于已有的任务(如,第一数据集),对模型集里的模型、超参数空间里随机采样的超参数以及该第一数据集进行编码,并结合得到的模型的检测精度等数据,对第一预测器进行初始化和训练。
步骤2、对于一个新任务(即第二数据集),对该新任务的数据集进行编码,提取相应的特征,并从模型集中采样模型,以及从超参数采样空间采样超参数(即第一超参数),将第二数据集编码、模型编码、超参数编码输入到该第一预测器中,输出每个模型在第一超参数情况下对第二数据集的输出精度的预测结果,最后从多个预测结果中,选取最好的模型和训练超参数配置,在该新的任务上进行迁移学习。
步骤3、若当前新任务已完成,那么可以进一步提取该新任务的数据集编码、目标模型编码、目标超参数编码以及该目标模型在目标超参数情况下对该新任务的输出精度(即上述所述的第二输出精度)等元信息,利用这些信息来更新该第一预测器,从而提升该第一预测器的预测精度。
为了对本申请实施例所带来的有益效果有更为直观的认识,以下对本申请实施例所带来的技术效果作进一步的对比,在基于约束条件构建模型集的过程中,引入训练单步时长作为约束条件,构建高效训练的模型集(ET-NAS)。图12示意的是本申请实施例提供的模型集ET-NAS与手工设计的模型在训练步长时间上的比较,从图12中可以看到,在ImageNet数据集相同输出精度下,ET-NAS-G比RegNetY-16GF的训练步长快6倍,ET-NAS-I比EfficientNet-B3快1.5倍。
此外,本申请实施例提供的模型的获取方法和现有方法的组合结果的对比结果如表5所示。
表5:本申请实施例提供的模型的获取方法和现有方法的组合结果对比
(表5的具体内容在公开文本中以图片形式给出,此处从略;表中比较了常用模型集与ET-NAS模型集在随机搜索、BOHB以及本申请在线自适应(OA)等不同超参数搜索算法组合下的精度。)
将表5前两行结果与其他结果对比可以看出,如果利用在ImageNet数据集上性能表现好的模型,只搜索超参数,性能远远低于其他方法或组合,这也就表明不同的训练任务之间有一些差异,直接进行迁移学习达不到最优性能。在不同的超参数搜索算法,随机搜索(random search)、BOHB和本申请在线自适应(OA)下,使用ET-NAS模型作为模型集,结果均明显好于常用模型构成的模型集。相同的模型集(常用模型、ET-NAS模型),用OA预测的超参数得到的精度与随机搜索40组超参数得到的精度相当。基于本申请实施例构建的模型集,OA预测的超参数得到的精度与BOHB在常用模型上搜索40组训练参数得到的精度相当。
需要说明的是,在本申请的一些实施方式中,修改约束条件也可以得到适配其他类型任务的模型集,例如,在搜索过程中,引入模型在华为的D芯片上的推理时间作为约束条件,使用对D芯片友好的算子进行搜索,最后可以得到D芯片友好的网络结构模型,具体可如图13所示,图13示意的是对D芯片友好的网络模型和常用网络模型的性能比较。又例如,在搜索过程中,引入模型在GPU V100上推理的时间作为约束条件,搜索对GPU V100友好的网络模型。更改搜索空间,在不同的benchmark上验证采样的高效性,具体可如图14所示,图14示意的是对GPU V100友好的网络模型和常用网络模型的性能比较。
这里需要注意的是,由于搜索出的模型可能不能直接被搭载在芯片或设备上,以D芯片为例,在获取D芯片上推理的时间过程中,本申请首先构建一个模型转换工具,能够快速地将pytorch模型转化成caffe模型。该工具先将pytorch模型导出成onnx模型,然后通过解析onnx模型的图结构,转换成caffe模型。进一步通过D芯片自带的工具,将caffe模型打包成D芯片上能够运行的om模型。通过上述步骤就构建了一个模型采样、模型训练、模型硬件评估的闭环,本申请能够在搜索的过程中快速获取在D芯片上的推理时间,有选择地构建模型集,最后得到D芯片友好的模型的网络结构。
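其中“将pytorch模型导出成onnx模型”这一步可示意如下(模型与输入尺寸均为假设;onnx到caffe、再到om模型的转换依赖专用解析与打包工具,此处从略):

```python
import torch
import torchvision

model = torchvision.models.resnet18().eval()   # 假设待转换的模型
dummy = torch.randn(1, 3, 224, 224)            # 假设的输入尺寸
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])
```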
类似地,在获取GPU V100上推理的时间中,对每个模型,随机运行100次,对运行时间进行排序,选取中间段的数据,求平均值作为该模型的最后评估性能。最后筛选得到对于GPU V100友好的网络模型。
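上述“随机运行100次、排序后取中间段求平均”的评估方式可示意如下(需在GPU环境下运行;预热次数与中间段的取法为假设的常见做法):

```python
import time
import torch

def gpu_latency(model, inp, runs=100, keep=(25, 75)):
    model, inp = model.cuda().eval(), inp.cuda()
    with torch.no_grad():
        for _ in range(10):                 # 预热,排除首次运行的初始化开销
            model(inp)
        times = []
        for _ in range(runs):               # 随机运行runs次并逐次计时
            torch.cuda.synchronize()
            t0 = time.perf_counter()
            model(inp)
            torch.cuda.synchronize()
            times.append(time.perf_counter() - t0)
    times.sort()
    mid = times[keep[0]:keep[1]]             # 对运行时间排序后取中间段
    return sum(mid) / len(mid)               # 取平均值作为最终评估性能
```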
为了对比其他的采样方法,本申请使用基准的搜索空间NAS-Bench-101和NAS-Bench-201代替本申请实施例自定义的搜索空间,其他条件和方法都不变,来验证本申请的采样算法的高效性,如图15所示,图15示意的是在神经网络架构搜索基准数据集上采样效率比较的示意图,从图15中可以看出,在神经网络架构搜索基准数据集NAS-Bench-101、NAS-Bench-201上,使用本申请实施例的采样方法(采样次数相同的情况下),均能获得更高的精度。
由于智能安防、平安城市、智能终端等领域中都可以用到本申请实施例提供的模型的获取方法将目标模型迁移到新任务(即第二数据集)上进行学习,例如,可应用于持续性、多任务交付场景(只有一个新任务的场景也是可以用的),如云训练平台、终端视觉、无人驾驶等项目,下面将对多个落地到产品的多个应用场景进行介绍。
应用场景1:云训练平台
平台上有大量训练好的任务及模型,可基于本申请实施例提供的模型的获取方法以充分利用这些信息,提供AutoML服务。另外还可基于本申请实施例提供的模型的获取方法充分挖掘这些任务之间的相关性,对新任务提供更多、性能更高以及部署硬件友好的模型选择,对于选择的模型,可以推荐合适的超参数,从而简化业务训练人员的工作。
应用场景2:终端视觉及无人驾驶
终端视觉和无人驾驶等领域,更多的是关注模型在特定硬件平台上的部署,人工设计的网络未必能够很好地满足硬件约束,因此使用本申请实施例提供的模型的获取方法可以快速构建出一系列满足要求的网络模型,以供业务训练人员选择。
应理解,以上介绍的只是本申请实施例的模型的获取方法所应用的几个具体场景,本申请实施例提供的模型的获取方法在应用时并不限于上述场景,其能够应用到任何需要选择模型进行图像分类或者图像识别等的场景中,只要能使用模型的领域和设备,都可应用本申请实施例提供的模型的获取方法以及最终基于任务训练好的目标模型,此处不再举例示意。
在上述所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图16,图16为本申请实施例提供的计算机设备的一种结构示意图,计算机设备1600包括:构建模块1601、预测模块1602、选择模块1603、训练模块1604,其中,构建模块1601,用于基于约束条件构建模型集,所述模型集包括至少两个在第一数据集上预训练过的模型;预测模块1602,用于通过构建的第一预测器预测所述模型集中任意一个模型对第二数据集的第一输出精度,其中,每个模型对应一个第一输出精度,所述每个模型对应一组超参数,所述超参数通过在超参数空间采样得到,也就是说,在模型的超参数为第一超参数的情况下,通过构建的第一预测器预测模型集中的任意一个模型对第二数据集的第一输出精度,该第一超参数为在超参数空间采样得到的任意一组超参数,所述第二数据集包括采集到的任意一个数据集;选择模块1603,用于确定所述模型中第一输出精度满足第一预设条件的模型为目标模型,所述目标模型对应的超参数为目标超参数;训练模块1604,用于基于所述目标超参数,根据所述第二数据集对所述目标模型进行训练,得到训练后的目标模型。
在本申请上述实施方式中,综合考虑了模型的选择和超参数的选择,用于通过构建的第一预测器快速预测基于约束条件构建的模型集中每个模型在不同超参数情况下针对新任务的性能表现,并从中选择满足预设条件(如,模型的输出精度取值最大)的模型和超参数作为最终处理新任务(即第二数据集)的目标模型和目标超参数。针对新任务,该方法基于用户给定的约束条件,可高效选择出合适的模型和超参数,从而节约了训练时间和算力成本。也就是说,本申请实施例所达到的技术效果是:在实际业务交付过程中,在有限的时间内对一个新任务(即第二数据集),找到合适的模型,并将其训练到达到交付要求的精度,也就是针对新任务要选择出一个最好的模型和一组最好的超参。
在一种可能的设计中,所述预测模块1602,具体用于:对所述超参数(即上述的第一超参数)、模型集里的任意一个模型及所述第二数据集分别进行编码,分别得到超参数编码、该模型编码及第二数据集编码;将所述超参数编码、该模型编码及所述第二数据集编码输入所述第一预测器,输出所述任意一个模型在所述第一超参数情况下对所述第二数据集的第一输出精度。
在本申请上述实施方式中,具体阐述了构建的第一预测器的输入数据和输出数据分别是什么,具备可实现性。
在一种可能的设计中,所述训练模块1604,还用于:在得到训练后的目标模型之后,根据第二输出精度、所述第二数据集、所述目标超参数及所述目标模型更新所述第一预测器的参数,所述第二输出精度为所述训练后的目标模型对所述第二数据集的输出精度。
在本申请上述实施方式中,对于已处理完的第二数据集,可根据第二输出精度、第二数据集等更新该第一预测器,从而可提升该第一预测器的预测精度,第一输出精度是预测器粗略预测的,第二输出精度就是真实训练得到的,通过真实训练的输出精度去更新第一预测器的参数,那么第一预测器的检测精度相应就会提高。
在一种可能的设计中,该选择模块1603,具体用于:从所述模型中选取第一输出精度取值最大的模型为所述目标模型。也就是说,目标输出精度在所述第一输出精度中取值最大,这里还需要注意的是,评价一个模型的性能,除了可以是通过输出精度,还可以是其他的,比如,错误率越小,则性能越好等,在本申请实施例中,仅是以输出精度为例进行说明。
在本申请上述实施方式中,从模型中确定出目标模型的方式可以是:在所有第一输出精度中选择取值最大的那个第一输出精度对应的模型作为本申请实施例所述的目标模型,一般来说,输出精度越大,说明该模型在对应超参数情况下的检测性能越好,据此可选择出配置最优的模型和超参数。
在一种可能的设计中,所述构建模块1601,具体用于:首先基于约束条件先构建初始模型集,该初始模型集就包括至少两个训练后的初始模型,其中,每个初始模型是根据已有的开放的第一数据集训练得到的;之后,根据所述训练后的初始模型对构建的第二预测器进行训练,得到训练后的第二预测器;通过演化算法(EA)得到每个初始模型各自对应的一组衍生模型,每组衍生模型包括至少一个衍生模型;通过所述训练后的第二预测器对每个衍生模型进行处理,得到每个衍生模型对所述第一数据集的第三输出精度;根据所述第三输出精度从所述衍生模型中选取目标衍生模型。所述训练模块1604,还用于根据所述第一数据集对所述目标衍生模型进行训练,得到训练后的目标衍生模型,所述训练后的初始模型及所述训练后的目标衍生模型构成所述模型集。
在本申请上述实施方式中,具体阐述了如何基于约束条件构建模型集,即先基于约束条件构建初始模型集,再以初始模型集中的初始模型作为种子,通过演化算法衍生出一系列衍生模型,并从中选择出目标衍生模型进行训练,从而训练后的目标衍生模型和开始得到的训练后的初始模型共同构成本申请实施例所述的模型集,这种构建方式可累积到满足约束条件的各种模型,并且第二预测器可快速筛选出合适的模型,节省了搜索时间。
在一种可能的设计中,所述构建模块1601,具体还用于:根据约束条件确定搜索空间,所述搜索空间包括多种网络结构单元(block)及所述多种网络结构单元之间的连接关系;之后,从所述搜索空间随机采样至少两个初始模型。所述训练模块1604,还用于根据所述第一数据集对所述初始模型进行训练,得到训练后的初始模型,所述初始模型集包括所述训练后的初始模型。
在本申请上述实施方式中,阐述了如何根据约束条件构建初始模型集,即先根据约束条件确定搜索空间,然后从搜索空间采样组合得到初始模型。这种构建方式一方面除了可以遍历到所有可能的架构之外,还可以组合得到目前没有或大家想不到的模型的架构组织方式,具备完备性。
在一种可能的设计中,所述训练模块1604,具体用于:将所述至少两个初始模型融合成一个第一模型;根据所述第一数据集对所述第一模型进行训练,得到训练后的第一模型,这样通过对一个模型的训练就可得到所有初始模型的模型精度;最后,将所述训练后的第一模型拆解为训练后的初始模型。
在本申请上述实施方式中,阐述了如何对多个初始模型进行联合训练,即将采样得到的至少两个初始模型融合成一个超网(即第一模型),这样可以采用参数共享的方式进行训练,训练完后再拆解开,从而通过对一个模型的训练就可得到所有初始模型的检测精度,加快了对所有初始模型训练进度,相比一个一个单独训练各个初始模型节约了训练时间。
在一种可能的设计中,所述目标衍生模型为多个,所述训练模块1604,具体还用于:将多个目标衍生模型融合成一个第二模型;根据所述第一数据集对所述第二模型进行训练,得到训练后的第二模型;将所述训练后的第二模型拆解为训练后的目标衍生模型。
在本申请上述实施方式中,当目标衍生模型有多个时,阐述了如何对多个目标衍生模型进行联合训练,即将采样得到的多个目标衍生模型融合成一个超网(即第二模型),这样依然可以采用参数共享的方式进行训练,训练完后再拆解开,从而通过对一个模型的训练就可得到所有目标衍生模型的检测精度,加快了对所有目标衍生模型训练进度,相比一个一个单独训练各个目标衍生模型节约了训练时间。
在一种可能的设计中,所述构建模块1601,具体还用于:对所述训练后的初始模型的图结构进行编码,得到图编码;之后,根据所述图编码训练图卷积神经网络(GCN)和贝叶斯回归器,得到训练后的GCN和训练后的贝叶斯回归器,所述GCN和所述贝叶斯回归器构成所述第二预测器,所述训练后的GCN和训练后的贝叶斯回归器构成所述训练后的第二预测器。
在本申请上述实施方式中,阐述了第二预测器可以是“GCN+贝叶斯回归器”,当第二预测器是“GCN+贝叶斯回归器”时,需要对训练后的初始模型的图结构进行编码,编码得到的各个初始模型对应的图编码才能作为GCN的输入数据,利用GCN提取每个图编码的特征,从而避免手工设计核函数来评估网络架构之间的距离。之后GCN的输出作为贝叶斯回归器的输入,该贝叶斯回归器的作用主要是用来评估模型性能的均值和方差,具备可实现性。
在一种可能的设计中,所述构建模块1601,具体还用于:从所有衍生模型中选取第三输出精度大于预设值的衍生模型作为该目标衍生模型;或,从所有衍生模型中选取第三输出精度取值较大的前n个衍生模型作为该目标衍生模型,n≥1;或,根据第三输出精度的均值和方差得到每个衍生模型对应的置信上界(UCB),并从所有衍生模型中选取置信上界取值较大的前m个衍生模型作为该目标衍生模型,m≥1。
在本申请上述实施方式中,阐述了根据各个衍生模型对应的第三输出精度从所有衍生模型中选取目标衍生模型有多种实现方式,具备可选择性和灵活性。
在一种可能的设计中,该计算机设备1600还可以包括:触发模块1605,该触发模块1605用于将所述模型集作为新的初始模型集,并将所述目标衍生模型作为新的初始模型,重复执行上述构建模块1601所执行的步骤直至达到第二预设条件。
在本申请上述实施方式中,阐述可将模型集内的各个模型重新作为新的初始模型继续构建新的衍生模型以及选择新的目标衍生模型,直至达到预设条件,可使得模型集累积到足够满足要求的模型。
在一种可能的设计中,该第二预设条件可根据用户需求自行设置,例如,该第二预设条件可以是模块库内的模型数量达到预设数量,假设预设数量为13,而当前轮次得到的模型集包括14个模型,那么说明达到了第二预设条件,因此该包括14个模型的模型集就为最终构建得到的模型集;又例如,该第二预设条件还可以是模型集内的模型满足的约束条件达到预设要求,例如,假设约束条件一共有3种类型,用户要求每种类型的约束条件都需要达到一定数量,这样做的目的是为了使得模型集累积到满足不同约束条件的模型。
在本申请上述实施方式中,阐述了第二预设条件的几种具体表现形式,具备灵活性。
在一种可能的设计中,所述约束条件包括:模型大小、模型推理时延、模型训练时延、硬件部署条件、片上内存大小中的任意一个或多个。举例来说,有些新任务(如,自动驾驶车辆获取的图片、音频等数据集)对模型推理时延要求比较高,因为自动驾驶车辆对实时性要求高;而有些新任务(如,手机等终端设备)对占据片上内存大小有较高要求,这是因为手机等手持终端的存储空间有限。
在本申请上述实施方式中,阐述了约束条件可以是哪些类型,这是因为不同的新任务对模型有不同的约束条件,在本申请实施例中,可基于新任务(可以是一个或多个)的不同应用场景得到不同的约束条件,从而基于约束条件构建满足各个新任务的模型集,具备完备性。
在一种可能的设计中,该计算机设备1600还可以包括:部署模块1606,该部署模块1606,用于将所述训练后的目标模型部署在执行设备上,以使得所述执行设备通过所述训练后的目标模型对输入的目标数据进行处理。例如,可以部署在手机、个人电脑、智能手表等智能终端上,也可以部署在自动驾驶车辆、网联汽车、智能汽车等可移动终端设备上,具体此处不做限定。
在本申请上述实施方式中,阐述了基于第二数据集训练得到的目标模型可以部署在执行设备上进行实际应用。
需要说明的是,图16对应实施例所述的计算机设备1600中各模块/单元之间的信息交互、执行过程等内容,与本申请中图4对应的实施例基于同一构思,具体内容可参见本申请前述所示实施例中的叙述,此处不再赘述。
接下来介绍本申请实施例提供的另一种计算机设备,请参阅图17,图17为本申请实施例提供的计算机设备的一种结构示意图,计算机设备1700上可以部署有图16对应实施例中所描述的计算机设备1600,用于实现图4对应实施例中各步骤的功能,具体的,计算机设备1700由一个或多个服务器实现,计算机设备1700可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1722(例如,一个或一个以上中央处理器)和存储器1732,一个或一个以上存储应用程序1742或数据1744的存储介质1730(例如一个或一个以上海量存储设备)。其中,存储器1732和存储介质1730可以是短暂存储或持久存储。存储在存储介质1730的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对计算机设备1700中的一系列指令操作。更进一步地,中央处理器1722可以设置为与存储介质1730通信,在计算机设备1700上执行存储介质1730中的一系列指令操作。
计算机设备1700还可以包括一个或一个以上电源1726,一个或一个以上有线或无线网络接口1750,一个或一个以上输入输出接口1758,和/或,一个或一个以上操作系统1741,例如Windows Server™、Mac OS X™、Unix™、Linux™、FreeBSD™等等。
本申请实施例中,中央处理器1722,用于执行图4对应实施例中的目标模型的获取方法。具体地,中央处理器1722用于:首先,基于约束条件构建模型集,该模型集中包括至少两个已经在第一数据集上(如,开放的ImageNet数据集)预训练过的模型,在基于约束条件构建好模型集之后,那么该模型集就包括至少两个在第一数据集上预训练过的模型(即训练后的初始模型和训练后的目标衍生模型),之后,在超参数空间进行随机采样,得到一组超参数,这组随机采样得到的超参数就称为第一超参数,之后通过构建的第一预测器预测模型集中的任意一个模型对第二数据集的第一输出精度,其中,每个模型对应一个第一输出精度,例如,可以是预测模型集中的一个模型的输出精度,也可以是预测模型集中的多个模型各自对应的输出精度,还可以是预测模型集中的每个模型的输出精度,此处不做限定,且每个模型都对应会有一个超参数(即第一超参数),也就是说,在模型的超参数设置为该第一超参数的情况下,通过构建的另一预测器(可称为第一预测器)预测该模型集里的任意一个模型对第二数据集的输出精度(可称为第一输出精度),其中,该第二数据集即为新任务的数据集。当得到的所有第一输出精度中,存在一个满足预设条件(可称为第一预设条件)的输出精度,则该满足第一预设条件的输出精度就称为目标输出精度,与该目标输出精度对应的模型和超参数则称为目标模型及目标超参数,之后,就将该目标模型和该目标超参数作为最终处理该第二数据集的模型和超参数,也就是说,选择该目标模型和该目标超参数在新的第二数据集上进行迁移学习。通过上述步骤从模型集和超参数空间确定出目标模型和目标超参数后,就可基于该目标超参数,根据该第二数据集对该目标模型进行训练,从而得到训练后的目标模型。
需要说明的是,中央处理器1722执行上述各个步骤的具体方式,与本申请中图4对应的方法实施例基于同一构思,其带来的技术效果与本申请中图4对应的实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述所示实施例描述中执行设备所执行的步骤。
本申请实施例提供的计算机设备具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使计算机设备内的芯片执行上述图4所示实施例描述的模型的获取方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体的,请参阅图18,图18为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 200,NPU 200作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路2003,通过控制器2004控制运算电路2003提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路2003内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路2003是二维脉动阵列。运算电路2003还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路2003是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器2002中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器2001中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)2008中。
统一存储器2006用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(Direct Memory Access Controller,DMAC)2005被搬运到权重存储器2002中。输入数据也通过DMAC被搬运到统一存储器2006中。
BIU即Bus Interface Unit,总线接口单元2010,用于AXI总线与DMAC和取指存储器(Instruction Fetch Buffer,IFB)2009的交互。
总线接口单元2010(Bus Interface Unit,简称BIU),用于取指存储器2009从外部存储器获取指令,还用于存储单元访问控制器2005从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器2006或将权重数据搬运到权重存储器2002中或将输入数据搬运到输入存储器2001中。
向量计算单元2007包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如Batch Normalization(批归一化),像素级求和,对特征平面进行上采样等。
在一些实现中,向量计算单元2007能将经处理的输出的向量存储到统一存储器2006。例如,向量计算单元2007可以将线性函数和/或非线性函数应用到运算电路2003的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元2007生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路2003的激活输入,例如用于在神经网络中的后续层中的使用。
控制器2004连接的取指存储器(instruction fetch buffer)2009,用于存储控制器2004使用的指令;
统一存储器2006,输入存储器2001,权重存储器2002以及取指存储器2009均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(digital video disc,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。

Claims (31)

  1. 一种模型的获取方法,其特征在于,包括:
    基于约束条件构建模型集,所述模型集包括至少两个在第一数据集上预训练过的模型;
    通过构建的第一预测器预测所述模型集中任意一个模型对第二数据集的第一输出精度,其中,每个模型对应一个第一输出精度,所述每个模型对应一组超参数,所述超参数通过在超参数空间采样得到;
    确定所述模型中第一输出精度满足第一预设条件的模型为目标模型,所述目标模型对应的超参数为目标超参数;
    基于所述目标超参数,根据所述第二数据集对所述目标模型进行训练,得到训练后的目标模型。
  2. 根据权利要求1所述的方法,其特征在于,所述通过构建的第一预测器预测所述模型集中任意一个模型对第二数据集的第一输出精度包括:
    对所述超参数、所述模型集中的任意一个模型及所述第二数据集分别进行编码,分别得到超参数编码、模型编码及第二数据集编码;
    将所述超参数编码、所述模型编码及所述第二数据集编码输入所述第一预测器,输出所述任意一个模型在所述超参数情况下对所述第二数据集的第一输出精度。
  3. 根据权利要求1-2中任一项所述的方法,其特征在于,在所述基于所述目标超参数,根据所述第二数据集对所述目标模型进行训练,得到训练后的目标模型之后,所述方法还包括:
    根据第二输出精度、所述第二数据集、所述目标超参数及所述目标模型更新所述第一预测器的参数,所述第二输出精度为所述训练后的目标模型对所述第二数据集的输出精度。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述确定所述模型中第一输出精度满足第一预设条件的模型为目标模型包括:
    从所述模型中选取第一输出精度取值最大的模型为所述目标模型。
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,所述基于约束条件构建模型集包括:
    基于约束条件构建初始模型集,所述初始模型集包括至少两个训练后的初始模型,所述训练后的初始模型为根据所述第一数据集对初始模型训练得到;
    根据所述训练后的初始模型对构建的第二预测器进行训练,得到训练后的第二预测器;
    通过演化算法(EA)得到每个初始模型各自对应的一组衍生模型,每组衍生模型包括至少一个衍生模型;
    通过所述训练后的第二预测器对每个衍生模型进行处理,得到每个衍生模型对所述第一数据集的第三输出精度;
    根据所述第三输出精度从所述衍生模型中选取目标衍生模型,并根据所述第一数据集对所述目标衍生模型进行训练,得到训练后的目标衍生模型,所述训练后的初始模型及所述训练后的目标衍生模型构成所述模型集。
  6. 根据权利要求5所述的方法,其特征在于,所述基于约束条件构建初始模型集包括:
    根据约束条件确定搜索空间,所述搜索空间包括多种网络结构单元(block)及所述多种网络结构单元之间的连接关系;
    从所述搜索空间随机采样至少两个初始模型,并根据所述第一数据集对所述初始模型进行训练,得到训练后的初始模型,所述初始模型集包括所述训练后的初始模型。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述第一数据集对所述初始模型进行训练,得到训练后的初始模型包括:
    将所述至少两个初始模型融合成一个第一模型;
    根据所述第一数据集对所述第一模型进行训练,得到训练后的第一模型;
    将所述训练后的第一模型拆解为至少两个训练后的初始模型。
  8. 根据权利要求5-7中任一项所述的方法,其特征在于,所述目标衍生模型为多个,所述根据所述第一数据集对所述目标衍生模型进行训练,得到训练后的目标衍生模型包括:
    将多个所述目标衍生模型融合成一个第二模型;
    根据所述第一数据集对所述第二模型进行训练,得到训练后的第二模型;
    将所述训练后的第二模型拆解为多个训练后的目标衍生模型。
  9. 根据权利要求5-8中任一项所述的方法,其特征在于,所述根据所述训练后的初始模型对构建的第二预测器进行训练,得到训练后的第二预测器包括:
    对所述训练后的初始模型的图结构进行编码,得到图编码;
    根据所述图编码训练图卷积神经网络(GCN)和贝叶斯回归器,得到训练后的GCN和训练后的贝叶斯回归器,其中,所述第二预测器包括所述GCN和所述贝叶斯回归器,所述训练后的第二预测器包括所述训练后的GCN和训练后的贝叶斯回归器。
  10. 根据权利要求5-9中任一项所述的方法,其特征在于,所述根据所述第三输出精度从所述衍生模型中选取目标衍生模型包括:
    从所述衍生模型中选取第三输出精度大于预设值的衍生模型作为所述目标衍生模型;
    或,
    从所述衍生模型中选取第三输出精度取值较大的前n个衍生模型作为所述目标衍生模型,n≥1;
    或,
    根据所述第三输出精度的均值和方差得到每个衍生模型对应的置信上界(UCB),并从所述衍生模型中选取置信上界取值较大的前m个衍生模型作为所述目标衍生模型,m≥1。
  11. 根据权利要求5-10中任一项所述的方法,其特征在于,所述方法还包括:
    将所述模型集作为新的初始模型集,并将所述目标衍生模型作为新的初始模型,重复执行上述基于约束条件构建模型集的步骤直至达到第二预设条件。
  12. 根据权利要求11所述的方法,其特征在于,所述第二预设条件包括:
    所述模型集内的模型数量达到预设数量;
    或,
    所述模型集内的模型满足的所述约束条件达到预设要求。
  13. 根据权利要求1-12中任一项所述的方法,其特征在于,所述约束条件包括:
    模型大小、模型推理时延、模型训练时延、硬件部署条件、片上内存大小中的任意一个或多个。
  14. 根据权利要求1-13中任一项所述的方法,其特征在于,在所述得到训练后的目标模型之后,所述方法还包括:
    将所述训练后的目标模型部署在执行设备上,以使得所述执行设备通过所述训练后的目标模型对输入的目标数据进行处理。
  15. 一种计算机设备,其特征在于,包括:
    构建模块,用于基于约束条件构建模型集,所述模型集包括至少两个在第一数据集上预训练过的模型;
    预测模块,用于通过构建的第一预测器预测所述模型集中任意一个模型对第二数据集的第一输出精度,其中,每个模型对应一个第一输出精度,所述每个模型对应一组超参数,所述超参数通过在超参数空间采样得到;
    选择模块,用于确定所述模型中第一输出精度满足第一预设条件的模型为目标模型,所述目标模型对应的超参数为目标超参数;
    训练模块,用于基于所述目标超参数,根据所述第二数据集对所述目标模型进行训练,得到训练后的目标模型。
  16. 根据权利要求15所述的设备,其特征在于,所述预测模块,具体用于:
    对所述超参数、所述模型集中的任意一个模型及所述第二数据集分别进行编码,分别得到超参数编码、模型编码及第二数据集编码;
    将所述超参数编码、所述模型编码及所述第二数据集编码输入所述第一预测器,输出所述任意一个模型在所述超参数情况下对所述第二数据集的第一输出精度。
  17. 根据权利要求15-16中任一项所述的设备,其特征在于,所述训练模块,还用于:
    在得到训练后的目标模型之后,根据第二输出精度、所述第二数据集、所述目标超参数及所述目标模型更新所述第一预测器的参数,所述第二输出精度为所述训练后的目标模型对所述第二数据集的输出精度。
  18. 根据权利要求15-17中任一项所述的设备,其特征在于,所述选择模块,具体用于:
    从所述模型中选取第一输出精度取值最大的模型为所述目标模型。
  19. 根据权利要求15-18中任一项所述的设备,其特征在于,所述构建模块,具体用于:
    基于约束条件构建初始模型集,所述初始模型集包括至少两个训练后的初始模型,所述训练后的初始模型为根据所述第一数据集对初始模型训练得到;
    根据所述训练后的初始模型对构建的第二预测器进行训练,得到训练后的第二预测器;
    通过演化算法(EA)得到每个初始模型各自对应的一组衍生模型,每组衍生模型包括至少一个衍生模型;
    通过所述训练后的第二预测器对每个衍生模型进行处理,得到每个衍生模型对所述第一数据集的第三输出精度;
    根据所述第三输出精度从所述衍生模型中选取目标衍生模型;
    所述训练模块,还用于根据所述第一数据集对所述目标衍生模型进行训练,得到训练后的目标衍生模型,所述训练后的初始模型及所述训练后的目标衍生模型构成所述模型集。
  20. 根据权利要求19所述的设备,其特征在于,所述构建模块,具体还用于:
    根据约束条件确定搜索空间,所述搜索空间包括多种网络结构单元(block)及所述多种网络结构单元之间的连接关系;
    从所述搜索空间随机采样至少两个初始模型;
    所述训练模块,还用于根据所述第一数据集对所述初始模型进行训练,得到训练后的初始模型,所述初始模型集包括所述训练后的初始模型。
  21. 根据权利要求20所述的设备,其特征在于,所述训练模块,具体用于:
    将所述至少两个初始模型融合成一个第一模型;
    根据所述第一数据集对所述第一模型进行训练,得到训练后的第一模型;
    将所述训练后的第一模型拆解为至少两个训练后的初始模型。
  22. 根据权利要求19-21中任一项所述的设备,其特征在于,所述目标衍生模型为多个,所述训练模块,具体还用于:
    将多个所述目标衍生模型融合成一个第二模型;
    根据所述第一数据集对所述第二模型进行训练,得到训练后的第二模型;
    将所述训练后的第二模型拆解为多个训练后的目标衍生模型。
  23. 根据权利要求19-22中任一项所述的设备,其特征在于,所述构建模块,具体还用于:
    对所述训练后的初始模型的图结构进行编码,得到图编码;
    根据所述图编码训练图卷积神经网络(GCN)和贝叶斯回归器,得到训练后的GCN和训练后的贝叶斯回归器,其中,所述第二预测器包括所述GCN和所述贝叶斯回归器,所述训练后的第二预测器包括所述训练后的GCN和训练后的贝叶斯回归器。
  24. 根据权利要求19-23中任一项所述的设备,其特征在于,所述构建模块,具体还用于:
    从所述衍生模型中选取第三输出精度大于预设值的衍生模型作为所述目标衍生模型;
    或,
    从所述衍生模型中选取第三输出精度取值较大的前n个衍生模型作为所述目标衍生模型,n≥1;
    或,
    根据所述第三输出精度的均值和方差得到每个衍生模型对应的置信上界(UCB),并从所述衍生模型中选取置信上界取值较大的前m个衍生模型作为所述目标衍生模型,m≥1。
  25. 根据权利要求19-24中任一项所述的设备,其特征在于,所述设备还包括:
    触发模块,用于将所述模型集作为新的初始模型集,并将所述目标衍生模型作为新的初始模型,重复执行上述构建模块所执行的步骤直至达到第二预设条件。
  26. 根据权利要求25所述的设备,其特征在于,所述第二预设条件包括:
    所述模型集内的模型数量达到预设数量;
    或,
    所述模型集内的模型满足的所述约束条件达到预设要求。
  27. 根据权利要求15-26中任一项所述的设备,其特征在于,所述约束条件包括:
    模型大小、模型推理时延、模型训练时延、硬件部署条件、片上内存大小中的任意一个或多个。
  28. 根据权利要求15-27中任一项所述的设备,其特征在于,所述设备还包括:
    部署模块,用于将所述训练后的目标模型部署在执行设备上,以使得所述执行设备通过所述训练后的目标模型对输入的目标数据进行处理。
  29. 一种计算机设备,包括处理器和存储器,所述处理器与所述存储器耦合,其特征在于,
    所述存储器,用于存储程序;
    所述处理器,用于执行所述存储器中的程序,使得所述计算机设备执行如权利要求1-14中任一项所述的方法。
  30. 一种计算机可读存储介质,包括程序,当其在计算机上运行时,使得计算机执行如权利要求1-14中任一项所述的方法。
  31. 一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如权利要求1-14中任一项所述的方法。