CN112990423A

CN112990423A - Artificial intelligence AI model generation method, system and equipment

Info

Publication number: CN112990423A
Application number: CN201911294745.2A
Authority: CN
Inventors: 胡琪; 田行辉; 郭兴泽
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2021-06-18

Abstract

The application provides an artificial intelligence (AI) model generation method, system and equipment in the technical field of AI. The method includes the following. The AI model generation system receives a data set and a task target, where the task target represents the function that the user requires of a target AI model. The system obtains N candidate AI models according to the task target and the data set. The N candidate AI models are generated through N iterations, each candidate AI model is composed of a plurality of nodes, and each node includes one or more modules. In the Mth iteration, a module is selected in a search space for each node in an initial architecture according to a sampling parameter set, which is determined from the sampling data of the iterations before the Mth iteration. An initial candidate AI model is generated, and it is trained with the data set to obtain a candidate AI model. Finally, a target AI model is determined among the N candidate AI models. The method can improve the accuracy of the generated target AI model.

Description

Artificial intelligence AI model generation method, system and equipment

Technical Field

The embodiments of this application relate to the technical field of artificial intelligence (AI), and in particular to an AI model generation method, system and device.

Background

Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. It is a branch of computer science. AI currently attracts wide attention in academia and industry and is increasingly applied, for example, to face recognition, image classification, object detection and speech recognition. In many application fields, the recognition, classification and detection capabilities of AI exceed those of ordinary humans. For example, applying AI in machine vision (face recognition, image classification, object detection, etc.) has made machine vision more accurate than humans in these tasks.

Machine learning is one way of implementing AI. In machine learning, a computer first constructs or selects an AI model suited to the technical problem to be solved. The computer then uses the AI model to parse and learn from sample data related to the problem. In this way it acquires the capabilities needed to solve the problem (e.g., cognition, discrimination or classification); that is, it trains and optimizes the AI model. An AI model is essentially an algorithm containing a large number of parameters and calculation formulas (also called calculation rules). Training and optimizing the AI model with a data set improves its parameter combination, so that the model learns the patterns in the data and can be used to solve a specific technical problem. The computer can then use the trained AI model to make predictions on real-world data, for example, to perform speech recognition, image processing or classification. Common AI models currently include neural network models, support vector machine (SVM) models, and the like.

As the above description shows, applying an AI model to a technical problem requires constructing or selecting an appropriate AI model and then training and optimizing it. In practice, however, many users (e.g., enterprises or organizations) who want to use AI models to improve their technical capabilities or service levels lack the expertise to do so. Against this background, automated machine learning (AutoML) platforms have emerged. An AutoML platform automatically generates an AI model by applying techniques such as transfer learning, automatic hyperparameter optimization and neural architecture search. This spares the user from designing the AI model and tuning its hyperparameters. That is, the AutoML platform takes a task target determined by the user and a data set collected by the user. It can then select, construct, train, verify and tune an AI model that achieves the task target. This greatly simplifies the use of AI models and improves the user's research, development and operational efficiency. The task target can be understood as the technical problem the AI model is to solve, or the function the AI model is to provide.

Currently, an AutoML platform can generate AI models automatically through an AI model generation system, but the AI models found by such a system have relatively low accuracy.

Disclosure of Invention

This application provides an AI model generation method, system and equipment. The method can improve the accuracy of the AI models found by an AI model generation system.

In a first aspect, this application provides an AI model generation method applied to an AI model generation system. The method may include the following. First, a task target and a data set are received, where the task target represents the function that the user requires the target AI model to have. Second, N candidate AI models are obtained according to the task target and the data set, where N is a positive integer and the N candidate AI models are generated through N iterations. Each candidate AI model is composed of a plurality of nodes, and each node includes one or more modules. In the Mth iteration, a module is selected in a search space for each node in the initial architecture according to a sampling parameter set, and an initial candidate AI model is generated. The initial candidate AI model is then trained with the data set to obtain a candidate AI model. Here M is an integer greater than 1 and less than or equal to N. The sampling parameter set is determined from the sampling data of the AI model generation system in the iterations before the Mth iteration. Finally, a target AI model is determined among the N candidate AI models.

For example, the sampling data of the iterations before the Mth iteration includes the per-module sampling data of each node in the initial candidate AI models generated in those iterations. Such data may be the number of times each module was sampled, or other data from which a sampling probability can be determined.

For example, the sampling parameter set includes a plurality of sampling parameters, each used to determine the sampling probability of a node in the initial architecture for each module.

With this method, the sampling parameter set used in the current iteration is determined from the sampling data of the preceding iterations while the N candidate AI models are being obtained. A module is then selected in the search space for each node in the initial architecture based on this sampling parameter set, which makes module sampling more balanced. Unbalanced module sampling leads to inaccurate evaluation of the candidate AI models and, in turn, to inaccurate search results; this method mitigates both problems.
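The iteration structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The module names in `MODULES`, the uniform probabilities in the first iteration, and the callables `train_and_eval` and `update_probs` are all hypothetical placeholders for the real training, evaluation and sampling-parameter logic.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical search space: every node may use one of these modules.
MODULES = ["conv3x3", "conv5x5", "pool", "fc"]

Probs = List[List[float]]   # probs[node][module] = sampling probability
Counts = List[List[int]]    # counts[node][module] = sampling data so far


def sample_architecture(probs: Probs, rng: random.Random) -> List[int]:
    """Select one module index per node according to that node's probabilities."""
    return [rng.choices(range(len(p)), weights=p, k=1)[0] for p in probs]


def generate(num_nodes: int, n_iters: int,
             train_and_eval: Callable[[List[int]], float],
             update_probs: Callable[[Counts, Probs], Probs],
             seed: int = 0) -> Tuple[float, List[int]]:
    rng = random.Random(seed)
    # Assumption: iteration 1 has no history, so every module starts equally likely.
    probs = [[1.0 / len(MODULES)] * len(MODULES) for _ in range(num_nodes)]
    counts = [[0] * len(MODULES) for _ in range(num_nodes)]
    candidates = []
    for m in range(1, n_iters + 1):
        if m > 1:
            # Sampling parameter set for iteration M, derived from iterations 1..M-1.
            probs = update_probs(counts, probs)
        arch = sample_architecture(probs, rng)   # initial candidate AI model
        for node, op in enumerate(arch):
            counts[node][op] += 1                # record sampling data
        score = train_and_eval(arch)             # train on the data set, then evaluate
        candidates.append((score, arch))
    # Target AI model: the best-scoring of the N candidates.
    return max(candidates, key=lambda c: c[0])
```

For instance, `generate(num_nodes=3, n_iters=5, train_and_eval=lambda a: float(sum(a)), update_probs=lambda c, p: p)` runs five iterations with a dummy score. The `update_probs` hook is where a sampling-parameter adjustment based on historical sampling data would be plugged in.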

In a possible implementation of the first aspect, the Mth iteration selects a module in the search space for each node in the initial architecture according to the sampling parameter set and generates the initial candidate AI model. This step includes the following. First, an initial sampling parameter set for the Mth iteration is obtained according to the evaluation result of the candidate AI model obtained in the (M-1)th iteration. The initial sampling parameter set includes the initial sampling parameter of each node of the initial candidate AI model for each module. Second, a sampling parameter adjustment value is obtained for at least one node whose sampling parameters are to be adjusted, according to the sampling data of the iterations before the Mth iteration. Finally, the initial sampling parameters of each such node for each module are adjusted according to that node's sampling parameter adjustment value, yielding the sampling parameter set.
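The three steps can be illustrated with a small Python sketch. It makes two hypothetical assumptions that the text here leaves open. First, each node's sampling parameters are treated as unnormalized scores (logits) and converted into sampling probabilities with a softmax. Second, the adjustment is simply added to those scores. The function name `adjust_parameter_set` is illustrative.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Convert one node's sampling parameters into sampling probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjust_parameter_set(initial: List[List[float]],
                         adjustments: Dict[int, List[float]]) -> List[List[float]]:
    """Add each flagged node's per-module adjustment values; other nodes keep their initial parameters."""
    adjusted = [row[:] for row in initial]   # copy so the initial set is left unchanged
    for node, delta in adjustments.items():
        adjusted[node] = [a + d for a, d in zip(adjusted[node], delta)]
    return adjusted
```

For example, take an initial set `[[2.0, 0.0], [0.0, 0.0]]` with an adjustment of `[-1.0, 1.0]` for node 0. Node 0's parameters become `[1.0, 1.0]`, so its two modules become equally likely, while node 1 is left untouched.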

For example, the sampling parameter adjustment value of at least one node whose sampling parameters are to be adjusted is obtained from the sampling data of the iterations before the Mth iteration as follows. First, that sampling data is used to obtain three quantities: the sampling probability of each node for each module, the mean of these sampling probabilities, and the value of a first parameter. The first parameter is derived from the sampling probabilities of each node for the modules over the historical iterations. For example, it is a sampling dispersion coefficient (coefficient of variation) or a sampling range. Then, at least one node whose sampling parameters are to be adjusted is determined. Such a node is one whose first-parameter value is greater than or equal to a preset threshold. Finally, the sampling parameter adjustment value of each such node is obtained according to its sampling probability for each module and the mean sampling probability.
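A minimal Python sketch of these steps follows. It uses the coefficient of variation (population standard deviation divided by the mean) as the first parameter. The threshold value and the adjustment rule `alpha * (mean - p)` are hypothetical choices for illustration only; the rule lowers the probability of over-sampled modules and raises that of under-sampled ones.

```python
import statistics
from typing import Dict, List, Tuple


def sampling_stats(counts: List[int]) -> Tuple[List[float], float, float]:
    """Return (sampling probabilities, their mean, coefficient of variation) for one node."""
    total = sum(counts)
    probs = [c / total for c in counts]
    mean = statistics.fmean(probs)
    cv = statistics.pstdev(probs) / mean  # first parameter: coefficient of variation
    return probs, mean, cv


def adjustment_values(all_counts: List[List[int]], threshold: float,
                      alpha: float = 1.0) -> Dict[int, List[float]]:
    """Map each node whose coefficient of variation reaches the threshold to per-module adjustments."""
    result: Dict[int, List[float]] = {}
    for node, counts in enumerate(all_counts):
        if sum(counts) == 0:
            continue  # no sampling data for this node yet
        probs, mean, cv = sampling_stats(counts)
        if cv >= threshold:
            # Hypothetical rule: move each module's probability toward the mean.
            result[node] = [alpha * (mean - p) for p in probs]
    return result
```

Consider the counts of node m from the example later in this description (40, 30, 10, 15, 5). The probabilities are 0.4, 0.3, 0.1, 0.15 and 0.05, the mean is 0.2, and the coefficient of variation is about 0.65. With a threshold of 0.3, that node is flagged, while a uniformly sampled node (coefficient 0) is not. Replacing `cv` with `max(probs) - min(probs)` gives the sampling-range variant.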

With this implementation, nodes with unbalanced sampling are first identified from the sampling data. Their sampling parameters are then adjusted with the corresponding adjustment values, which changes each node's sampling probabilities over the modules. Modules that were sampled with high probability become less likely to be sampled, and modules that were sampled with low probability become more likely. As a result, sampling is more balanced when a module is selected in the search space for each node in the initial architecture based on the sampling parameter set. Modules sampled in a balanced way receive similar amounts of training. This improves the accuracy of evaluating the candidate AI models and of updating the candidate AI model in the next iteration, and ultimately the accuracy of the final model search result.

In a possible implementation of the first aspect, the method further includes outputting the target AI model.

In a possible implementation of the first aspect, the method further includes outputting information about the initial candidate AI models and/or the candidate AI models in the N iterations. This helps the user better understand the structure of the AI model and makes the generated AI model more credible. It also improves the user experience and user satisfaction.

In a possible implementation of the first aspect, the method further includes determining, in the Mth iteration, training parameters of the initial candidate AI model according to the sampling data of the Mth iteration. The training parameters are used to train the initial candidate AI model and may include, for example, the learning rate. The sampling data of the Mth iteration is used to calculate the training parameters of each node for each module. These training parameters therefore change with each node's sampling data for each module and are adjusted adaptively. This reduces differences in the degree of training among modules and improves the accuracy of training and evaluating the candidate AI models and of updating the AI model. It also reduces jitter in the AI model generation system. In addition, it can speed up convergence of the initial candidate AI model. This shortens the time needed to generate each candidate AI model and the target AI model, and thus the user's waiting time.
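As a hedged illustration of adapting a training parameter to sampling data, the sketch below scales a base learning rate per module. The scale factor is the square root of (mean sampling count / module sampling count), capped at `max_scale`. This scaling rule and the function name are hypothetical; the text above only states that the learning rate may depend on the sampling data.

```python
import math
from typing import List


def module_learning_rates(counts: List[int], base_lr: float,
                          max_scale: float = 4.0) -> List[float]:
    """Hypothetical rule: rarely sampled modules get a larger learning rate so they can catch up."""
    mean = sum(counts) / len(counts)
    rates = []
    for c in counts:
        # Unsampled modules, or very rare ones, get the capped maximum boost.
        scale = max_scale if c == 0 else min(max_scale, math.sqrt(mean / c))
        rates.append(base_lr * scale)
    return rates
```

For example, `module_learning_rates([40, 10], base_lr=0.1)` gives roughly `[0.079, 0.158]`, so the module sampled a quarter as often is trained with twice the step size.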

In a second aspect, this application provides an AI model generation system that includes a user input/output (I/O) module and a model generation module. The user I/O module is configured to receive a task target and a data set, where the task target represents the function that the user requires of the target AI model. The model generation module is configured to obtain N candidate AI models according to the task target and the data set, and to determine a target AI model among them. The N candidate AI models are generated through N iterations. Each candidate AI model is composed of a plurality of nodes, and each node includes one or more modules. In the Mth iteration, a module is selected in a search space for each node in the initial architecture according to a sampling parameter set, and an initial candidate AI model is generated. The initial candidate AI model is then trained with the data set to obtain a candidate AI model. Here M is an integer greater than 1 and less than or equal to N. The sampling parameter set is determined from the sampling data of the AI model generation system in the iterations before the Mth iteration.

In a possible implementation of the second aspect, the sampling data of the iterations before the Mth iteration includes the per-module sampling data of each node in the initial candidate AI models generated in those iterations.

In a possible implementation of the second aspect, the sampling parameter set includes a plurality of sampling parameters, each used to determine the sampling probability of a node in the initial architecture for each module.

In a possible implementation of the second aspect, the model generation module selects, in the Mth iteration, a module in the search space for each node in the initial architecture according to the sampling parameter set and generates the initial candidate AI model. To do so, it is specifically configured to perform the following steps. First, it obtains an initial sampling parameter set for the Mth iteration according to the evaluation result of the candidate AI model obtained in the (M-1)th iteration. The initial sampling parameter set includes the initial sampling parameter of each node of the initial candidate AI model for each module. Second, it obtains a sampling parameter adjustment value for at least one node whose sampling parameters are to be adjusted, according to the sampling data of the iterations before the Mth iteration. Third, it adjusts the initial sampling parameters of each such node for each module according to that node's sampling parameter adjustment value, to obtain the sampling parameter set.

In a possible implementation of the second aspect, the model generation module obtains the sampling parameter adjustment value of at least one node whose sampling parameters are to be adjusted according to the sampling data of the iterations before the Mth iteration. To do so, it is specifically configured to perform the following steps. First, it uses that sampling data to obtain the sampling probability of each node for each module, the mean of these sampling probabilities, and the value of a first parameter. The first parameter is derived from the sampling probabilities of each node for each module over the historical iterations. Second, it determines at least one node whose sampling parameters are to be adjusted, namely a node whose first-parameter value is greater than or equal to a preset threshold. Third, it obtains the sampling parameter adjustment value of each such node according to its sampling probability for each module and the mean sampling probability.

In a possible implementation of the second aspect, the first parameter is a sampling dispersion coefficient (coefficient of variation) or a sampling range.

In a possible implementation of the second aspect, the user I/O module is further configured to output the target AI model.

In a possible implementation of the second aspect, the user I/O module is further configured to output information about the initial candidate AI models and/or the candidate AI models in the N iterations.

In a possible implementation of the second aspect, the model generation module is further configured to determine, in the Mth iteration, training parameters of the initial candidate AI model according to the sampling data of the Mth iteration. The training parameters are used to train the initial candidate AI model.

In a third aspect, this application provides a computing device that includes a memory and a processor. The memory stores a set of computer instructions, and the processor executes them. This causes the computing device to perform the method provided in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, this application provides a non-transitory readable storage medium that stores computer program code. When executed by a computing device, the code performs the method provided in the first aspect or any possible implementation of the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory and non-volatile memory such as flash memory, a hard disk drive (HDD) or a solid state drive (SSD).

In a fifth aspect, this application provides a computer program product that includes computer program code. When executed by a computing device, the code performs the method provided in the first aspect or any possible implementation of the first aspect. The computer program product may be a software installation package. When the method provided in the first aspect or any possible implementation of the first aspect is needed, the package can be downloaded and executed on a computing device.

Drawings

FIG. 1 is a schematic diagram of the architecture of an AutoML platform;

FIG. 2 is a schematic diagram of a neural network architecture;

FIG. 3 is a schematic diagram of the system architecture of an AI model generation system;

FIG. 4 is a schematic structural diagram of the AI model generation system 100 according to an embodiment of this application;

FIG. 5 is a schematic diagram of an application scenario of the AI model generation system 100 according to an embodiment of this application;

FIG. 6 is a schematic diagram of another application scenario of the AI model generation system 100 according to an embodiment of this application;

FIG. 7 is a hardware configuration diagram of a computing apparatus 200 on which the AI model generation system 100 is deployed;

FIG. 8 is a schematic flowchart of an AI model generation method according to an embodiment of this application;

FIG. 9 is a schematic structural diagram of a computing device according to an embodiment of this application.

Detailed Description

Currently, applying an AI model to solve a technical problem requires constructing or selecting an appropriate AI model and then training and optimizing it. In practice, however, many users (e.g., enterprises or organizations) who want to use AI models to improve their technical capabilities or service levels lack the expertise to do so. Against this background, automated machine learning (AutoML) platforms have emerged.

FIG. 1 is a schematic diagram of the architecture of an AutoML platform. As shown in FIG. 1, the AutoML platform may provide users with the following services: data processing and archiving, model training (train), model evaluation (evaluate), model management, model hyperparameter tuning, and automatic model generation (AutoNet). The AutoNet service automatically generates an AI model for the user.

The AutoNet service may be implemented by an AI model generation system. The system searches for AI models using an AI model search algorithm to generate a target AI model that satisfies the user's task target. For example, the target neural network model may be generated using neural architecture search (NAS) or efficient neural architecture search (ENAS). NAS and ENAS are emerging techniques in the neural network field. They apply methods such as reinforcement learning and genetic algorithms to search for and generate, from scratch, a target neural network model that meets the task target of a specific scenario. In other words, the design of the target neural network model can be searched for and completed automatically using sample data from that scenario. The user does not need to understand how neural network models work. The system can still extract features from the sample data and build, optimize and evaluate the neural network model. The result is a target neural network model that meets the requirements of the application scenario (i.e., the task target). This lowers the barrier to using neural network technology.

To generate an AI model that meets the user's requirements, the AI model generation system involves the following three aspects:

Aspect 1: search space. The search space includes all modules, or combinations of modules, from which AI models can be generated. Each module implements one operation. Modules connected in a particular combination form an initial candidate AI model. Such a model may include a plurality of nodes, each of which includes at least one module (block).

Take a neural network model as an example of an AI model. A neural network model is a mathematical or computational model that mimics the structure and function of a biological neural network (i.e., the central nervous system of an animal, particularly the brain). FIG. 2 is a schematic diagram of a neural network. As shown in FIG. 2, a neural network consists mainly of neurons and the connections between them. Neurons are the most basic units of a neural network, and each circle in FIG. 2 represents a neuron. Each neuron may be connected to one or more other neurons to form a network. In other words, a neural network is a complex network formed by a large number of simple, widely interconnected neurons.

A neural network may include an input layer, an output layer and hidden layers. Each neuron of the input layer receives one item of feature information from the sample data. That is, the input layer receives information only from the external environment. Each of its neurons corresponds to one input variable, performs no calculation, and only passes information to the next layer. At least one hidden layer lies between the input layer and the output layer, and a neural network may have several hidden layers; FIG. 2 shows an example with one hidden layer. The hidden layers analyze the information. The function computed by each hidden-layer neuron connects the variables of the previous layer to those of the next layer, which lets the network fit the data better. Finally, the output layer produces the final result. For example, in a classification neural network, each neuron in the output layer corresponds to a particular class.
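The layer roles described above can be seen in a tiny fully connected network written in plain Python. The input layer just supplies the feature values. One hidden layer computes weighted sums followed by a ReLU activation, and the output layer produces one score per class. The weights are arbitrary illustrative numbers, and ReLU is one common activation choice, not one prescribed here.

```python
from typing import Callable, List

Vector = List[float]


def dense(x: Vector, weights: List[Vector], biases: Vector,
          activation: Callable[[float], float]) -> Vector:
    """One fully connected layer: each neuron weights every input, adds a bias, applies activation."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


# Illustrative parameters: 2 input features -> 3 hidden neurons -> 2 output classes.
W1, b1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0]
W2, b2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0]


def forward(features: Vector) -> Vector:
    hidden = dense(features, W1, b1, relu)   # hidden layer
    return dense(hidden, W2, b2, identity)   # output layer: one score per class


print(forward([2.0, 3.0]))  # [2.0, 7.0]
```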

Common hidden-layer operations currently include convolution, pooling and full connection. A hidden layer that implements convolution is called a convolutional layer and is mainly used for feature extraction; common convolutions include 3 × 3 and 5 × 5 convolutions. A hidden layer that implements pooling is called a pooling layer. It is mainly used to compress features and reduce the computational complexity of the neural network. A hidden layer that implements full connection is called a fully connected layer and is mainly used to connect all features.
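The convolution and pooling operations can be illustrated on small 2D arrays in plain Python. As in most deep learning libraries, the "convolution" below is implemented as cross-correlation (the kernel is not flipped), with stride 1 and no padding. The pooling is non-overlapping max pooling. These are common conventions chosen for the sketch, not details taken from this application.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(img: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1, no-padding 2D cross-correlation: slide the kernel and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def max_pool(img: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    return [[max(img[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(img[0]) - size + 1, size)]
            for i in range(0, len(img) - size + 1, size)]


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool(grid))  # [[6, 8], [14, 16]]
```

A 3 × 3 kernel shrinks a 4 × 4 input to 2 × 2 (feature extraction). 2 × 2 pooling also halves each dimension, which is the feature compression described above.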

In this example, a node of the initial candidate AI model may correspond to a group of neurons in the neural network. Each node includes at least one module, and each module implements one operation, such as 3 × 3 convolution, 5 × 5 convolution, pooling or full connection.
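A search space of this kind can be represented as a mapping from each node to its candidate modules. In the hypothetical sketch below, each node selects exactly one module, although the text allows a node to contain more than one, and the node names are illustrative. Under this simplification, the number of distinct initial candidate AI models is the product of the per-node choice counts.

```python
import itertools
import math
from typing import Dict, Iterator, List

# Hypothetical search space: node name -> modules that node may use.
SEARCH_SPACE: Dict[str, List[str]] = {
    "node_0": ["conv3x3", "conv5x5", "pool", "fc"],
    "node_1": ["conv3x3", "conv5x5", "pool", "fc"],
    "node_2": ["conv3x3", "pool"],
}


def count_architectures(space: Dict[str, List[str]]) -> int:
    """Number of distinct models when each node picks exactly one module."""
    return math.prod(len(options) for options in space.values())


def all_architectures(space: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Enumerate every node -> module assignment (feasible only for tiny spaces)."""
    names = list(space)
    for combo in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, combo))


print(count_architectures(SEARCH_SPACE))  # 32
```

Because the count grows exponentially with the number of nodes, practical systems sample from the space with a search strategy instead of enumerating it.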

Aspect 2: search strategy. The search strategy is the algorithm that generates initial candidate AI models.

Aspect 3: performance evaluation method. This is the method used to evaluate the performance of a candidate AI model. A candidate AI model here is the AI model obtained by training an initial candidate AI model. The candidate AI model has the same structure as the initial candidate AI model; that is, it may include a plurality of nodes, each of which includes at least one module.

The AI model generation system is described in detail below in terms of these three aspects. FIG. 3 shows the system architecture of an AI model generation system. As shown in FIG. 3, the AI model generation system mainly includes a controller module and a training and evaluation module. The search space and search strategy described above are preset in the controller module, and the performance evaluation method is preset in the training and evaluation module. Optionally, in some embodiments, the AI model generation system may further include a data presentation module. This module presents information about the currently generated candidate AI model to the user.

The AI model generation system shown in FIG. 3 automatically generates the target AI model through the following steps:

Step 1: The controller module performs a model search in the search space using the search strategy according to the task target. It generates an instruction set and sends it to the training and evaluation module. The task target represents the function that the user requires of the target AI model. The instruction set can also be regarded as the controller's decisions on the structure of the initial candidate AI model. These decisions determine the modules (e.g., convolution or pooling) that make up each node of the initial candidate AI model, from which the model can be constructed. The constructed initial candidate AI model is one of the AI models that can be generated from the search space. In some embodiments, the instruction set may also be referred to as code for generating the initial candidate AI model, or as information describing its structure.

Step 2: The training and evaluation module constructs the initial candidate AI model based on the instruction set. It trains the model until its parameters converge or a specified number of training iterations has been completed, and the trained model is the candidate AI model. The training and evaluation module may then evaluate the performance of the trained candidate AI model using the performance evaluation method described above (also referred to as validating the trained candidate AI model). It feeds the evaluation result back to the controller module. The controller module adjusts and updates its own parameters according to this result, so that the initial candidate AI model decided by the next instruction set performs better.
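The step 1 / step 2 feedback loop can be sketched with a toy controller that keeps one categorical distribution (a list of logits) per node. It samples an instruction set from these distributions and updates them from the evaluation result. The update is a REINFORCE-style policy-gradient step with a moving-average baseline, a technique commonly used by NAS/ENAS-style controllers. It is swapped in here for illustration and is not the controller update of this application. The class name, learning rate, baseline decay and toy reward are hypothetical.

```python
import math
import random
from typing import List


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ToyController:
    """One independent categorical distribution (logits) per node."""

    def __init__(self, num_nodes: int, num_modules: int, lr: float = 0.5, seed: int = 0):
        self.logits = [[0.0] * num_modules for _ in range(num_nodes)]
        self.lr = lr
        self.baseline = 0.0
        self.rng = random.Random(seed)

    def sample(self) -> List[int]:
        """Step 1: decide one module index per node (the 'instruction set')."""
        return [self.rng.choices(range(len(z)), weights=softmax(z), k=1)[0]
                for z in self.logits]

    def update(self, arch: List[int], reward: float) -> None:
        """Feedback from step 2: make well-rewarded decisions more likely."""
        advantage = reward - self.baseline
        self.baseline = 0.9 * self.baseline + 0.1 * reward
        for node, op in enumerate(arch):
            probs = softmax(self.logits[node])
            for k, p in enumerate(probs):
                # Gradient of log p(op) with respect to logit k is 1[k == op] - p_k.
                grad = (1.0 if k == op else 0.0) - p
                self.logits[node][k] += self.lr * advantage * grad


# Toy reward standing in for the evaluation result: fraction of nodes using module 0.
ctrl = ToyController(num_nodes=2, num_modules=3)
for _ in range(300):
    arch = ctrl.sample()
    ctrl.update(arch, reward=sum(op == 0 for op in arch) / len(arch))
```

After these 300 controller iterations, module 0 has become the most likely choice for each node, because decisions that earn an above-baseline reward are reinforced.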

Take as an example generating an AI model for an image classification task (i.e., the task target). The controller module may perform a model search in the search space using the search strategy according to the task target. It generates an instruction set for constructing an initial candidate AI model and sends it to the training and evaluation module. The training and evaluation module may generate the initial candidate AI model according to the instruction set and train it with the constructed data set. For example, the initial candidate AI model is a convolutional neural network (CNN) model.

Assume that the constructed data set contains three classes of images: apples, pears and bananas. The images are stored in three folders by class, and each folder name is the label of all images in that folder. That is, all apple images are in the folder named apple, all pear images in the folder named pear, and all banana images in the folder named banana. The data set may be constructed by the training and evaluation module, the controller module, or another module of the AI model generation system (e.g., a data processing module). This is not limited herein.
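Deriving labels from folder names can be sketched with Python's standard `pathlib`. The accepted file extensions and the function names are assumptions for illustration; the application does not prescribe how the data set is read.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: only these file types are treated as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def label_of(image_path: str) -> str:
    """The label of an image is the name of the folder that contains it."""
    return Path(image_path).parent.name


def build_dataset(root: str) -> List[Tuple[str, str]]:
    """Collect (image path, label) pairs from root/<label>/<image> files."""
    samples = []
    for path in sorted(Path(root).glob("*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            samples.append((str(path), label_of(str(path))))
    return samples
```

Consider a hypothetical folder `data/` that contains `apple/`, `pear/` and `banana/` subfolders. On a POSIX system, `build_dataset("data")` would return pairs such as `("data/apple/001.jpg", "apple")`.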

The training and evaluation module inputs the images of the data set into the initial candidate AI model. The nodes of the initial candidate AI model perform feature extraction and feature classification on each image. The model then outputs the confidence that the image belongs to each class. A loss function computes a loss value from these confidences and the label of the image. The parameters of the initial candidate AI model are updated according to the loss value and the structure of the model. Training continues until the loss value converges or all images in the data set have been used, and then it ends.

The loss function measures how well the AI model is trained, i.e., it computes the difference between the AI model's prediction and the true target. During training, the output of the initial candidate AI model should be as close as possible to the value that is actually to be predicted. The model's current prediction for an input image is therefore compared with the true target value (i.e., the label of the input image). The parameters of the initial candidate AI model are then updated according to the difference between the two. There is usually an initialization step before the first update, in which initial values are preconfigured for the model's parameters. In each training step, the loss function measures the difference between the current prediction and the true target value, and the parameters are updated. This repeats until the model predicts the true target value or a value very close to it, at which point the initial candidate AI model is considered fully trained. The trained model is the candidate AI model.
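The loss computation described above can be made concrete with softmax confidences and a cross-entropy loss, a common choice for classification. The application does not name a specific loss function, so this is an illustrative assumption; the class order and logits are made up. The gradient function shows the "difference between prediction and target" that drives each parameter update.

```python
import math
from typing import List

CLASSES = ["apple", "pear", "banana"]  # labels taken from the folder names


def softmax(logits: List[float]) -> List[float]:
    """Turn raw model outputs into confidences that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: str) -> float:
    """Loss = -log(confidence assigned to the true label); 0 for a perfect prediction."""
    return -math.log(softmax(logits)[CLASSES.index(label)])


def logit_gradient(logits: List[float], label: str) -> List[float]:
    """d(loss)/d(logit_k) = confidence_k - 1[k is the true class]; used to update parameters."""
    conf = softmax(logits)
    target = CLASSES.index(label)
    return [c - (1.0 if k == target else 0.0) for k, c in enumerate(conf)]
```

With equal logits `[0.0, 0.0, 0.0]`, every class gets confidence 1/3, so `cross_entropy([0.0, 0.0, 0.0], "pear")` equals ln 3 ≈ 1.0986. Raising the pear logit lowers the loss.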

The training and evaluation module may then feed the loss function value of the trained candidate AI model back to the controller module as the evaluation result. The controller module adjusts and updates its own parameters according to this value, so that the initial candidate AI model specified by the next generated instruction set performs better.

As described above, an initial candidate AI model may include a plurality of nodes (each corresponding to a set of neurons in a neural network), and each node includes at least one module, each module implementing one operation. The parameters of the controller module may therefore include, for each node, a sampling parameter for each module. Through these sampling parameters, the controller module determines which module or modules are used to generate each node the next time a candidate AI model is generated. The sampling parameters are described further below.
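A minimal sketch of how per-node sampling parameters could drive module selection, assuming (this is not stated in the text) that each node's sampling parameters are unnormalized scores turned into probabilities with a softmax, and that exactly one module is drawn per node. `sample_architecture` and the parameter layout are hypothetical.

```python
import math
import random


def module_probabilities(sampling_params):
    """Softmax over one node's sampling parameters (one entry per candidate module)."""
    m = max(sampling_params)
    exps = [math.exp(p - m) for p in sampling_params]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(sampling_param_set, rng):
    """Draw one module index per node; returns the list of chosen module indices."""
    choices = []
    for node_params in sampling_param_set:
        probs = module_probabilities(node_params)
        choices.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return choices


# Hypothetical controller state: 3 nodes, 5 candidate modules (op0..op4) each.
params = [[0.0] * 5, [2.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 50.0]]
print(sample_architecture(params, random.Random(0)))
```

Raising one module's sampling parameter makes the controller pick that module for the node more often, which is why adjusting these parameters can change how often each module is sampled.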

It should be understood that the training iterations performed by the training and evaluation module in step 2 above may be referred to as child iterations (child epochs), while steps 1 and 2 together form one overall iteration, which may be referred to as a controller iteration (controller epoch).

This process is repeated (that is, steps 1 and 2 are executed in a loop) until a specific condition is met, for example, a specified number of controller iterations such as 2000. Finally, the optimal candidate AI model among all candidate AI models generated during the controller iterations is selected and output as the final target AI model. Here, the optimal candidate AI model may be understood as the candidate AI model with the highest accuracy.
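The loop of steps 1 and 2 can be sketched as follows. This is an outline under stated assumptions, not the system's implementation: `sample_model`, `train_and_evaluate`, and `update_controller` are hypothetical callables standing in for the controller module and the training and evaluation module, and the evaluation is assumed to return an accuracy-like score where higher is better.

```python
import random


def controller_search(num_epochs, sample_model, train_and_evaluate, update_controller):
    """Repeat steps 1-2 for num_epochs controller iterations; return the best candidate."""
    best_model, best_score = None, float("-inf")
    for _ in range(num_epochs):
        model = sample_model()                # step 1: controller proposes an initial candidate
        score = train_and_evaluate(model)     # step 2: child epochs, then evaluation
        update_controller(model, score)       # evaluation result is fed back to the controller
        if score > best_score:                # keep the most accurate candidate so far
            best_model, best_score = model, score
    return best_model, best_score


# Toy stand-ins (hypothetical): a "model" is an integer, and 7 scores best.
rng = random.Random(1)
history = []
best, score = controller_search(
    20,
    sample_model=lambda: rng.randrange(10),
    train_and_evaluate=lambda m: 1.0 - abs(m - 7) / 10,
    update_controller=lambda m, s: history.append((m, s)),
)
print(best, score)
```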

Currently, when the target AI model is generated by the AI model generation system shown in fig. 3, the sampling frequency of each node for each module is biased during the model search process (that is, the controller iteration process). The sampling frequency of a module is the frequency with which the controller module of the AI model generation system uses that module when generating the node. For example, consider 100 controller iterations, and assume that node m has 5 possible modules op_i, i ∈ [0, 4], each implementing a different operation. Among the 100 initial candidate AI models generated in the controller iterations, the controller module generates node m using the 5 modules 40, 30, 10, 15, and 5 times respectively (that is, the sampling times of node m for each module), so the sampling frequencies of node m for the modules are 0.4, 0.3, 0.1, 0.15, and 0.05. This relationship is shown in Table 1 below:

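TABLE 1

Module                        | op0 | op1 | op2 | op3  | op4
Sampling times of node m      | 40  | 30  | 10  | 15   | 5
Sampling frequency of node m  | 0.4 | 0.3 | 0.1 | 0.15 | 0.05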

By counting the sampling data (such as sampling times) of each node for each module at different numbers of controller iterations, the sampling discrete coefficient and the sampling range of each node at those points can be obtained. The sampling discrete coefficient is the ratio of the standard deviation of the sampling data to its mean (that is, the coefficient of variation), and the sampling range is the ratio of the maximum value to the minimum value in the sampling data. The larger the sampling discrete coefficient and the sampling range, the more unbalanced the sampling of the node across modules.
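Applied to node m in Table 1, the two statistics can be computed as below. This sketch assumes the population standard deviation; the text does not say whether the population or sample form is used, and the sample form would give a larger coefficient.

```python
import statistics


def sampling_dispersion(counts):
    """Return (sampling discrete coefficient, sampling range) for one node's counts."""
    mean = statistics.mean(counts)
    cv = statistics.pstdev(counts) / mean  # population std / mean
    # Ratio of max to min, as defined in the text; infinite if some module was never sampled.
    ratio = max(counts) / min(counts) if min(counts) > 0 else float("inf")
    return cv, ratio


# Sampling times of node m for op0..op4 (Table 1).
cv, ratio = sampling_dispersion([40, 30, 10, 15, 5])
print(f"{cv:.1%}", ratio)  # 65.2% 8.0
```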

Experimental statistics on the sampling discrete coefficient and the sampling range of each node at different numbers of controller iterations lead to the following conclusion: sampling imbalance is most pronounced in the initial stage of the controller iterations; as the number of controller iterations increases, the imbalance is relieved compared with the initial stage, but it persists. In one set of experimental statistics, for example, the sampling discrete coefficient of a node exceeds 60% and the sampling range exceeds 6 when controller iteration starts, and after 100 controller iterations the sampling discrete coefficient is still above 20% and the sampling range is still above 2. The example in Table 1 likewise shows that the sampling of node m across modules is unbalanced.

Sampling imbalance causes different modules to be trained to different degrees, which makes the evaluation of the trained candidate AI models inaccurate. When the inaccurate evaluation result is fed back to the controller module, the controller module updates its parameters based on it, which in turn reduces the accuracy of the controller update and ultimately the accuracy of the model search result. That is, the target AI model found by the search is less accurate in achieving the task target.

In addition, when the AI model generation system generates the target AI model, the training and evaluation module trains the initial candidate AI model with stochastic gradient descent until the parameters converge. Gradient descent repeatedly determines and follows the locally steepest direction of decrease of the current objective, so as to approach the optimum quickly. Consider a person who wants to reach the bottom of a hill as quickly as possible: the simplest approach is to walk down along the steepest direction at the current position and, after reaching a new position, find the steepest direction again. This step-by-step re-evaluation of the best direction is the essence of the stochastic gradient descent algorithm. Therefore, in each child iteration, the training and evaluation module can use stochastic gradient descent to calculate the adjustment direction of the parameters of each module of each node of the initial candidate AI model, and then adjust and update the parameters. The parameter update can be expressed as formula (1):

$$\theta' = \theta - lr_{cur} \cdot \frac{\partial L}{\partial \theta} \qquad (1)$$

where θ is a parameter of the module, θ' is the parameter after the update, L is the loss function, and lr_cur is the learning rate of the module in the current child iteration, which may also be referred to as the descent coefficient or descent step size. The larger the learning rate, the larger the change of the parameter in each update; the smaller the learning rate, the smaller the change, but the iterative calculation takes correspondingly longer.
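The effect of the learning rate in the parameter update can be seen on a toy problem. This is a minimal sketch, assuming a deterministic loss L(θ) = (θ - 3)² with exact gradients; in stochastic gradient descent the gradient would instead be estimated from a mini-batch of the data set. `sgd_step` and `minimize` are hypothetical names.

```python
def sgd_step(theta, grad, lr_cur):
    """Update rule: move each parameter against its gradient, scaled by lr_cur."""
    return [t - lr_cur * g for t, g in zip(theta, grad)]


def minimize(lr_cur, steps):
    """Run `steps` updates on the toy loss L(theta) = (theta - 3)^2, starting at 0."""
    theta = [0.0]
    for _ in range(steps):
        grad = [2.0 * (theta[0] - 3.0)]  # dL/dtheta
        theta = sgd_step(theta, grad, lr_cur)
    return theta[0]


# With the same number of steps, the smaller learning rate ends much farther from 3.
print(minimize(0.1, 20), minimize(0.01, 20))
```

This mirrors the trade-off described above: a smaller learning rate changes the parameter less per step, so more iterations are needed to converge.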

When generating the target AI model, the AI model generation system applies a uniform learning rate schedule, such as cosine learning rate decay, to the entire search space. The learning rate is calculated as shown in formula (2):

where lr_min is the minimum learning rate, lr_max is the maximum learning rate, step_count is the number of controller iterations (that is, the number of iterations performed so far), batch_num is the number of times the initial candidate AI model needs to be trained, and lr_dec_T is a preset constant.

As the learning rate formula shows, if the sampling frequencies of the modules were balanced, the step_count values in formula (2) (that is, the number of training iterations of each module) would be similar, and so would the learning rates calculated from them. In practice, however, the sampling frequencies of the modules differ greatly (that is, module sampling is unbalanced), which leads to large deviations in the learning rates of the modules and therefore in how thoroughly each module is trained.

Take op0 and op4 in Table 1 as an example. Assume that the 40 samples of op0 are evenly distributed over the 100 samples, while 3 of the 5 samples of op4 fall in the early stage of sampling and 2 in the late stage. op0 is trained for 40 rounds, and its learning rate decreases smoothly from 0.1 to 0.01 through 0.09, 0.08, and so on. op4 is trained for only 5 rounds, and its learning rate does not decrease smoothly but drops sharply from 0.09 to 0.02. In this case op4 is trained inaccurately, so both the evaluation of op4 and the comparison between op4 and op0 are inaccurate.
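The op0/op4 example can be reproduced qualitatively with a short simulation. Because formula (2) is not reproduced here, this sketch assumes a standard cosine annealing schedule driven by the global controller iteration count, lr(t) = lr_min + 0.5 · (lr_max - lr_min) · (1 + cos(π · t / T)), with lr_max = 0.1 and lr_min = 0.01; the exact schedule in formula (2) may differ. The sampling positions of op0 and op4 are hypothetical choices matching the description above.

```python
import math


def cosine_lr(step, total_steps, lr_min=0.01, lr_max=0.1):
    """Assumed cosine annealing: lr_max at step 0, decaying to lr_min at total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))


def largest_drop(sample_steps, total_steps):
    """Largest learning-rate drop between two consecutive trainings of one module."""
    lrs = [cosine_lr(s, total_steps) for s in sample_steps]
    return max(a - b for a, b in zip(lrs, lrs[1:]))


TOTAL = 99                                      # controller iterations 0..99
op0_steps = [int(k * 2.5) for k in range(40)]   # 40 samples spread evenly
op4_steps = [1, 2, 3, 97, 98]                   # 3 early samples, 2 late samples
print(largest_drop(op0_steps, TOTAL), largest_drop(op4_steps, TOTAL))
```

Under these assumptions, op0 never sees a learning-rate drop larger than a few thousandths between consecutive trainings, while op4 jumps from close to lr_max to close to lr_min between its third and fourth trainings, an order of magnitude larger.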

This example shows that, for under-sampled modules, prematurely forcing the learning rate down slows the training of a module that is already under-trained, because a smaller learning rate produces smaller parameter changes and requires longer iterative calculation. In addition, insufficient sampling and insufficient training can prevent the module from converging in time, widening the training gap between it and other modules and further increasing the evaluation deviation of the candidate AI models. In other words, the large differences in sampling frequency between modules (that is, unbalanced module sampling) also distort the learning rate of each module, further aggravating the inaccuracy of candidate AI model evaluation.

In conclusion, unbalanced sampling of modules by nodes leads to inaccurate evaluation of candidate AI models, which affects the accuracy of the final model search result.

To solve this problem, an embodiment of the present application provides an AI model generation method that, when an AI model generation system automatically generates an AI model, addresses the inaccurate evaluation and inaccurate updating of candidate AI models caused by unbalanced module sampling.

Fig. 4 is a schematic structural diagram of the AI model generation system 100 in the embodiment of the present application. It should be understood that fig. 4 is only an exemplary diagram illustrating the structural configuration of the AI model generation system 100, and the present application does not limit the partitioning of the modules in the AI model generation system 100. As shown in fig. 4, the AI model generation system 100 includes a user input/output (I/O) module 11 and a model generation module 12.

The functions of the respective modules in the AI model generation system 100 are briefly described below:

User I/O module 11: configured to receive a task target entered or selected by a user and a data set uploaded by the user, and to provide the user with the target AI model, information about the initial candidate AI models in the iterative process, and/or information about the candidate AI models. The user I/O module 11 may be, for example, a graphical user interface (GUI).

Model generation module 12: configured to automatically generate, according to the task target and the data set, a target AI model that meets the user's task target.

Optionally, in some embodiments, the model generation module 12 may include a controller unit 121, a training and evaluation unit 122, a controller monitoring analysis unit 123, a controller correction unit 124, a model correction unit 125, and a data presentation unit 126.

Controller unit 121: configured to, in the Mth iteration process, select in the search space a module for each node in the initial architecture according to the sampling parameter set, and generate an instruction set that is used to generate an initial candidate AI model.

Training and evaluation unit 122: configured to construct an initial candidate AI model according to the instruction set and train it until its parameters converge, obtaining a candidate AI model, and then to evaluate the performance of the trained candidate AI model and feed the evaluation result back to the controller unit 121.

Controller monitoring analysis unit 123: configured to determine, according to sampling data in the iterations before the Mth iteration process, at least one node whose sampling parameters are to be adjusted, that is, a node whose first parameter is greater than or equal to a preset threshold.

Controller correction unit 124: configured to obtain a sampling parameter adjustment value for each of the at least one node whose sampling parameters are to be adjusted, adjust the initial sampling parameters of that node for its modules according to the adjustment value to obtain a sampling parameter set, and provide the sampling parameter set to the controller unit 121.

Optionally, in some embodiments, the model generation module 12 may further include a model correction unit 125 and/or a data presentation unit 126.

Model correction unit 125: configured to determine training parameters of the initial candidate AI model according to the sampling data in the Mth iteration process and provide them to the training and evaluation unit 122, which uses them to train the initial candidate AI model.

Data presentation unit 126: configured to provide information about the initial candidate AI models and/or the candidate AI models in the N iterative processes.

With the functions of the above modules, the AI model generation system 100 provided in the embodiment of the present application can provide users with a service for automatically generating AI models.

The division of the modules of the AI model generation system 100 shown in fig. 4 is merely an example, and the present application does not limit the division of the modules and the names of the modules. For example, in some embodiments, the initial candidate AI model may also be referred to as an initial sub-model, and the candidate AI model may be referred to as a sub-model, and thus the training and evaluation unit for training the initial candidate AI model may also be referred to as a sub-model training and evaluation unit.

Fig. 5 is a schematic diagram of an application scenario of the AI model generation system 100 according to an embodiment of the present application. As shown in fig. 5, in one embodiment, the AI model generation system 100 may be deployed entirely in a cloud environment. A cloud environment is an entity that provides cloud services to users by using basic resources in a cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform. The cloud data center includes a large number of infrastructure resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, and its computing resources may include a large number of computing devices (for example, servers). For example, if the computing resources of the cloud data center are servers running virtual machines, the AI model generation system 100 may be deployed independently on a server or virtual machine in the cloud data center, or it may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on a combination of servers and virtual machines in the cloud data center.

As shown in fig. 5, the cloud service provider may, for example, abstract the AI model generation system 100 into an AI model generation service on the cloud service platform and provide it to users. After a user purchases the cloud service on the cloud service platform (for example, by pre-paying and then settling according to the final resource usage), the AI model generation system 100 deployed in the cloud data center provides the AI model generation service to the user. When using the service, the user may specify the task to be completed by the AI model (that is, the task target) through an application programming interface (API) or a GUI and upload a data set to the cloud environment. The AI model generation system 100 in the cloud environment receives the user's task target and data set, automatically generates the AI model, and returns the generated target AI model to the user through the API or GUI. The user may download the target AI model or use it online to accomplish a specific task.

Fig. 6 is a schematic diagram of another application scenario of the AI model generation system 100 according to an embodiment of the present application. The AI model generation system 100 can be deployed flexibly: as shown in fig. 6, in another embodiment, it may be deployed in a distributed manner across different environments. The AI model generation system 100 can be logically divided into multiple parts, each with a different function, and these parts may be deployed in any two or all three of a terminal computing device (on the user side), an edge environment, and a cloud environment. The terminal computing device on the user side may include, for example, at least one of the following: a terminal server, a smartphone, a notebook computer, a tablet computer, a personal desktop computer, a smart camera, and the like. An edge environment includes a set of edge computing devices that are close to the terminal computing device, such as edge servers and edge kiosks with computing power. The parts of the AI model generation system 100 deployed in different environments or devices cooperate to provide users with the function of automatically generating AI models. It should be understood that this embodiment does not restrict which parts of the AI model generation system 100 are deployed in which environments; in practice, the deployment may be adapted to the computing capability of the terminal computing device, the resource occupancy of the edge environment and the cloud environment, or specific application requirements. Fig. 6 shows an application scenario in which the AI model generation system 100 is deployed across an edge environment and a cloud environment.

The AI model generation system 100 can also be deployed on a single computing device in any environment (for example, on a single edge server in an edge environment). Fig. 7 is a schematic diagram of the hardware structure of a computing device 200 on which the AI model generation system 100 is deployed. The computing device 200 shown in fig. 7 includes a memory 201, a processor 202, and a communication interface 203, which are communicatively connected to each other, for example through a network connection. Optionally, the computing device 200 may also include a bus 204 through which the memory 201, the processor 202, and the communication interface 203 are connected; fig. 7 shows a computing device 200 in which these components are communicatively coupled via the bus 204.

The memory 201 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 201 may store a program; when the program stored in the memory 201 is executed by the processor 202, the processor 202 and the communication interface 203 perform the method by which the AI model generation system 100 automatically generates a target AI model for a user. The memory 201 may also store data needed by the AI model generation system 100 to automatically generate the target AI model, such as the data set used to train the initial candidate AI models and the search space.

The processor 202 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits.

The processor 202 may also be an integrated circuit chip with signal processing capabilities. In implementation, the functions of the AI model generation system 100 of the present application may be performed by integrated logic circuits of hardware in the processor 202 or by instructions in the form of software. The processor 202 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application below. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments described below may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 201; the processor 202 reads the information in the memory 201 and, in combination with its hardware, implements the functions of the AI model generation system 100 according to the embodiments of the present application.

The communication interface 203 uses a transceiver module, such as but not limited to a transceiver, to implement communication between the computing device 200 and other devices or communication networks. For example, the data set may be acquired through the communication interface 203.

When computing device 200 includes bus 204, as described above, bus 204 may include a pathway to transfer information between various components of computing device 200 (e.g., memory 201, processor 202, communication interface 203).

The following describes in detail how to automatically generate an AI model according to the embodiment of the present application with reference to specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 8 is a flowchart illustrating a method for generating an AI model according to an embodiment of the present disclosure. The method may be performed by the AI model generation system 100, as shown in fig. 8, and includes:

S101, the AI model generation system receives a task target and a data set.

In this embodiment, the AI model generation system may receive the user's task target and data set through, for example, an API or a GUI. The task target represents the function of the target AI model required by the user, that is, the technical problem the target AI model is to solve. For example, a task target may be a neural network model that can detect and recognize the characters of an express waybill number, or a neural network model that can accurately recognize images containing various fruits. The data set may be used to train and evaluate the initial candidate AI models generated while the target AI model is being generated. In one possible implementation, the data set includes two subsets: a training data subset and a test data subset. The training data subset is used to train an initial candidate AI model, and the test data subset is used to test the trained candidate AI model so as to evaluate its prediction accuracy. In another possible implementation, the AI model generation system automatically divides the data set uploaded by the user into a training data subset and a test data subset.
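For the second implementation, where the system divides the uploaded data set itself, a simple random split could look like the sketch below. The 80/20 ratio, the fixed seed, and the file names are hypothetical choices for illustration; the text does not specify how the division is made.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data set and split it into training and test subsets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


images = [f"img_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
train, test = split_dataset(images)
print(len(train), len(test))  # 80 20
```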

The embodiment of the present application does not limit the manner in which the AI model generation system receives the task target and the data set. For example, the AI model generation system may first receive the task target from the user and then prompt the user through an API or a GUI to upload the collected data set, thereby receiving the data set uploaded by the user. Optionally, the AI model generation system may also receive other information related to generating the target AI model, such as at least one of the following: a target AI model structure selected or entered by the user, configurable parameters of the search strategy used to generate initial candidate AI models, configurable parameters used to train initial candidate AI models, and the user's expectations for the performance of the finally generated target AI model (for example, that its detection accuracy and recognition accuracy reach 99% or higher).

S102, the AI model generation system obtains N candidate AI models according to the task target and the data set, where N is a positive integer. The N candidate AI models are generated through N iterations, each candidate AI model consists of a plurality of nodes, and each node includes one or more modules. In the Mth iteration, the controller first generates an initial architecture that includes an arrangement rule for the plurality of nodes, selects in the search space a module for each node in the initial architecture according to a sampling parameter set to generate an initial candidate AI model, and trains the initial candidate AI model with the data set to obtain a candidate AI model. M is an integer greater than 1 and less than or equal to N, and the sampling parameter set is determined from the sampling data of the AI model generation system in the iterations before the Mth iteration. The value of N can be set according to user requirements.

After receiving the task target and the data set, the AI model generation system may generate N initial candidate AI models over N iterations based on the task target, each intended to achieve the user's task target. In each iteration, the AI model generation system may use the data set to train the initial candidate AI model until convergence, obtaining a trained candidate AI model, and then evaluate the trained candidate AI model. Based on the evaluation result of the candidate AI model and the sampling data of the AI model generation system in the previous iterations, the AI model generation system may then obtain the sampling parameter set, of each node for each module, that is used to generate the initial candidate AI model in the current iteration, so that the generated initial candidate AI model performs better. The sampling data in the previous iterations may include, for example, the number of times each node has sampled each module, and/or other sampling data from which the sampling probability of each node for each module can be determined.

For how, in one iteration, the system selects in the search space a module for each node in the initial architecture according to the sampling parameter set and generates an initial candidate AI model, see the description of the subsequent embodiments.

S103, the AI model generation system determines a target AI model in the N candidate AI models.

For example, the AI model generation system may select the optimal candidate AI model from all the trained candidate AI models generated in the N iterations and output it to the user as the final target AI model. The optimal candidate AI model may be understood as the candidate AI model with the smallest loss function value, a candidate AI model that satisfies a condition set by the user, or a candidate AI model selected according to another rule. If multiple candidate AI models satisfy the user-set condition, any one of them may be used as the target AI model.
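One way to combine two of the rules above is sketched here: if the user set an accuracy condition, restrict the choice to candidates that meet it, and among the remaining candidates pick the one with the smallest loss. Both the combination and the fallback to all candidates when none meets the condition are hypothetical choices for illustration; the `Candidate` record is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    loss: float
    accuracy: float


def select_target(candidates: List[Candidate], min_accuracy: Optional[float] = None) -> Candidate:
    """Pick the target AI model from the trained candidates (hypothetical rule)."""
    pool = candidates
    if min_accuracy is not None:
        meeting = [c for c in candidates if c.accuracy >= min_accuracy]
        if meeting:           # otherwise fall back to all candidates
            pool = meeting
    return min(pool, key=lambda c: c.loss)


cands = [Candidate("a", 0.30, 0.970), Candidate("b", 0.35, 0.991), Candidate("c", 0.25, 0.985)]
print(select_target(cands).name, select_target(cands, min_accuracy=0.99).name)  # c b
```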

The following uses the Mth iteration as an example to describe how the AI model generation system selects, in the search space and according to the sampling parameters, a module for each node in the initial architecture and generates an initial candidate AI model. The method includes the following steps:

Step A, the AI model generation system obtains, according to the evaluation result of the candidate AI model obtained in the (M-1)th iteration, an initial sampling parameter set used to generate the initial candidate AI model in the Mth iteration, where the initial sampling parameter set includes the initial sampling parameters of each node of the initial candidate AI model for each module.

For the implementation of this step, reference may be made to the prior art, which is not described in detail herein.

Step B, the AI model generation system obtains, according to the sampling data in the iterations before the Mth iteration, the offset adjustment values of the sampling parameters, for each module, of the at least one node whose sampling parameters are to be adjusted.

Taking the number of times each node has sampled each module as the sampling data, the sampling data of a node may be, for example, the sampling times shown in Table 1: the number of times, from the first iteration to the (M-1)th iteration, that each module was used to generate the node when the AI model generation system generated each initial candidate AI model (that is, the sampling times of the node for each module). In another possible implementation, the sampling data of a node may be the number of times each module was sampled for the node when the AI model generation system generated the structure of each initial candidate AI model within a preset time length. The preset time length may end at the moment the sampling data is acquired, and its length may be set according to actual conditions.
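Both ways of collecting the sampling data can be sketched with one counter. The history format below (one timestamped record per generated initial candidate AI model, mapping node IDs to the module used) is a hypothetical representation chosen for illustration.

```python
import time
from collections import Counter


def count_samples(history, node, now=None, window_seconds=None):
    """Count how often `node` was generated with each module.

    history: list of (timestamp, {node_id: module_id}) records, one per initial
    candidate AI model generated so far (iterations 1..M-1).
    If window_seconds is given, only records in [now - window_seconds, now] count.
    """
    now = time.time() if now is None else now
    counts = Counter()
    for timestamp, choices in history:
        if window_seconds is not None and timestamp < now - window_seconds:
            continue  # outside the preset time length
        if node in choices:
            counts[choices[node]] += 1
    return counts


history = [(0.0, {"m": "op0"}), (10.0, {"m": "op1"}), (20.0, {"m": "op0"}), (30.0, {"m": "op4"})]
print(count_samples(history, "m"))                                 # all iterations
print(count_samples(history, "m", now=30.0, window_seconds=15.0))  # last 15 seconds only
```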

As a possible implementation, after acquiring the sampling data of each node for each module, the AI model generation system may calculate the sampling information of each node for each module, including but not limited to: the sampling probability of each node for each module, the mean of these sampling probabilities for each node, and the first parameter of each node. The first parameter is related to the sampling probabilities of the node for each module in the historical iterations and measures whether the node's sampling across modules is balanced; it may be, for example, the sampling discrete coefficient and/or the sampling range.

This information can be calculated by existing methods. For example, the sampling probability can be calculated by the following formula (3):

$$f_{ij} = \frac{count_{ij}}{\sum_{k} count_{ik}} \quad (3)$$

where f_ij is the sampling probability of the ith node for the jth module, count_ij is the number of times the ith node has sampled the jth module, and the sum in the denominator runs over all modules k available to the ith node.
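A minimal Python sketch of the sampling probability and its per-node mean follows, assuming formula (3) normalizes each node's sampling counts so that they sum to 1 across modules. Treating a node that has not been sampled yet as uniform is an assumption added here, not part of the source.

```python
from typing import List


def sampling_probabilities(counts: List[List[int]]) -> List[List[float]]:
    """counts[i][j] is count_ij; returns f[i][j] = count_ij / sum_k count_ik."""
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a node with no samples yet is treated as uniform.
            probs.append([1.0 / len(row)] * len(row))
        else:
            probs.append([c / total for c in row])
    return probs


def mean_probability(node_probs: List[float]) -> float:
    """mean_i: average of the ith node's sampling probabilities over all modules."""
    return sum(node_probs) / len(node_probs)


f = sampling_probabilities([[6, 2, 2], [0, 0, 0]])
print(f[0], mean_probability(f[0]))
```

Because each row sums to 1, mean_i here always equals 1 divided by the number of modules, so it serves as the "perfectly balanced" reference value against which each f_ij is compared.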

The AI model generation system may then determine, based on the first parameter of each node, whether that node is a node whose sampling parameters are to be adjusted. For example, the system checks whether the first parameter of each node is greater than or equal to a preset threshold, and takes every node whose first parameter meets or exceeds the threshold as a node whose sampling parameters are to be adjusted. Taking the sampling discrete coefficient of the ith node as an example: when it is greater than or equal to the preset threshold, the ith node has sampled some specific modules too rarely, that is, its sampling across the modules is unbalanced, and the correction direction is to increase the number of times the ith node samples those specific modules. In this case the ith node is a node whose sampling parameters are to be adjusted. When the sampling discrete coefficient of the ith node is less than the preset threshold, the balance of its sampling across the modules is within the allowed range and the current parameter settings can be kept; that is, the ith node is not a node whose sampling parameters are to be adjusted. The preset threshold may be set according to actual conditions, for example according to the user's requirement for sampling balance.
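The threshold test can be sketched as follows. This Python sketch assumes the sampling discrete coefficient is the coefficient of variation (population standard deviation divided by mean) of a node's sampling probabilities and the sampling range is their maximum minus minimum; the threshold 0.3 and the function names are hypothetical.

```python
import statistics
from typing import Callable, List


def discrete_coefficient(probs: List[float]) -> float:
    """Sampling discrete coefficient (coefficient of variation): std / mean."""
    mean = statistics.fmean(probs)
    return statistics.pstdev(probs) / mean if mean > 0 else 0.0


def sampling_range(probs: List[float]) -> float:
    """Sampling range: largest minus smallest sampling probability."""
    return max(probs) - min(probs)


def nodes_to_adjust(
    prob_table: List[List[float]],
    threshold: float,
    first_parameter: Callable[[List[float]], float] = discrete_coefficient,
) -> List[int]:
    """Indices of nodes whose first parameter is >= the preset threshold."""
    return [i for i, probs in enumerate(prob_table) if first_parameter(probs) >= threshold]


table = [[0.6, 0.2, 0.2], [1 / 3, 1 / 3, 1 / 3]]
print(nodes_to_adjust(table, threshold=0.3))
print(nodes_to_adjust(table, threshold=0.3, first_parameter=sampling_range))
```

Either measure flags node 0, which favours its first module, and leaves the perfectly balanced node 1 alone. The coefficient of variation is scale-free, so the same threshold works whether it is applied to raw counts or to normalized probabilities.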

The AI model generation system can then obtain, for each node whose sampling parameters are to be adjusted, the sampling parameter offset adjustment value for each module according to the node's sampling probability and mean sampling probability for the modules. The sampling parameter offset adjustment value decreases as the sampling probability increases.

For example, the AI model generation system may calculate the offset adjustment value of the sampling parameter of the at least one node for each module by the following formula (4):

$$\Delta bias_{ij} = fun(f_{ij} - mean_i) \quad (4)$$

where Δbias_ij is the sampling parameter offset adjustment value of the ith node for the jth module, mean_i is the mean sampling probability of the ith node over all modules, and fun() is a monotonically decreasing function.
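Formula (4) only requires fun() to be monotonically decreasing. The Python sketch below uses fun(x) = -x as a hypothetical default; any other decreasing function, such as x -> -tanh(x), can be passed in instead.

```python
import math
from typing import Callable, List


def offset_adjustments(
    probs: List[float],
    fun: Callable[[float], float] = lambda x: -x,  # hypothetical decreasing fun()
) -> List[float]:
    """Delta bias_ij = fun(f_ij - mean_i) for every module j of one node i."""
    mean_i = sum(probs) / len(probs)
    return [fun(f_ij - mean_i) for f_ij in probs]


node_probs = [0.6, 0.2, 0.2]
print(offset_adjustments(node_probs))
print(offset_adjustments(node_probs, fun=lambda x: -math.tanh(x)))
```

With either choice, the over-sampled module 0 receives a negative adjustment and the under-sampled modules receive positive ones. The bounded tanh variant caps how large a single adjustment can become.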

As another example, the AI model generation system may calculate the sampling parameter offset adjustment value of the at least one node for each module by the following formula (5):

where Δbias_ij is the sampling parameter offset adjustment value of the ith node for the jth module, mean_i is the mean sampling probability of the ith node over all modules, and z is an adjustable coefficient.

It should be understood that formulas (4) and (5) are merely illustrative. Any other function that satisfies the rule "the sampling parameter offset adjustment value decreases as the sampling probability increases" may also be used, and this application imposes no limitation thereon.

Step C: the AI model generation system adjusts the sampling parameters of the at least one node whose sampling parameters are to be adjusted, according to the sampling parameter offset adjustment values of that node, to obtain a sampling parameter set.

For each node whose sampling parameters are to be adjusted, the AI model generation system may use the node's offset adjustment value for each module to adjust the node's sampling parameter for that module, as acquired in step A. For example, the adjustment can be made by the following formula (6):

$$b'_{ij} = b_{ij} + c \cdot \Delta bias_{ij} \quad (6)$$

where b'_ij is the adjusted sampling parameter of the ith node for the jth module, b_ij is the sampling parameter of the ith node for the jth module before adjustment, and c is an adjustable coefficient. As a possible implementation, the adjustable coefficient may be input to the AI model generation system by the user or set by the AI model generation system.
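Formula (6) only touches the nodes selected for adjustment. A minimal Python sketch, assuming a hypothetical layout in which the sampling parameter set is a dictionary from node index to that node's list of b_ij values; c = 2.0 in the example is arbitrary.

```python
from typing import Dict, Iterable, List


def apply_offsets(
    params: Dict[int, List[float]],
    deltas: Dict[int, List[float]],
    nodes: Iterable[int],
    c: float = 1.0,
) -> Dict[int, List[float]]:
    """Return a new sampling parameter set with b'_ij = b_ij + c * delta_ij
    applied to the listed nodes; all other nodes keep their parameters."""
    adjusted = {i: list(b) for i, b in params.items()}  # copy, do not mutate input
    for i in nodes:
        if len(params[i]) != len(deltas[i]):
            raise ValueError(f"node {i}: parameter/offset length mismatch")
        adjusted[i] = [b + c * d for b, d in zip(params[i], deltas[i])]
    return adjusted


params = {0: [0.0, 0.0, 0.0], 1: [0.5, 0.5, 0.5]}
deltas = {0: [-0.3, 0.15, 0.15]}
print(apply_offsets(params, deltas, nodes=[0], c=2.0))
```

Returning a copy keeps the parameters from step A available, for example for logging the "calibrated sampling parameters" alongside the originals.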

Take the sampling parameter of the ith node for the jth module as an example. Because fun(), which is used to calculate the offset adjustment value, is a monotonically decreasing function, the larger the sampling probability f_ij of the ith node for the jth module, the smaller the offset adjustment value Δbias_ij. Consequently, when the ith node is generated using the adjusted sampling parameter for the jth module, the probability that the jth module is used to generate the ith node decreases; that is, the sampling probability of the ith node for the jth module is reduced. Conversely, the smaller f_ij, the larger Δbias_ij, so the probability that the jth module is used to generate the ith node increases; that is, the sampling probability of the ith node for the jth module is raised. In other words, a node's sampling bias for a given module decreases as its sampling probability for that module increases, and increases as that probability decreases.
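The text states that each sampling parameter determines a sampling probability but does not say how. To show the direction of the effect, the Python sketch below assumes a softmax over one node's sampling parameters, a common way to turn real-valued scores into a categorical distribution, together with fun(x) = -x for formula (4) and c = 1.0 for formula (6). All numbers are illustrative, not results.

```python
import math
from typing import List


def softmax(params: List[float]) -> List[float]:
    """Map one node's sampling parameters to module sampling probabilities."""
    m = max(params)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in params]
    total = sum(exps)
    return [e / total for e in exps]


b = [1.0, 0.0, 0.0]                 # module 0 currently favoured
f = [0.6, 0.2, 0.2]                 # observed sampling probabilities
mean = sum(f) / len(f)
delta = [-(fj - mean) for fj in f]  # formula (4) with hypothetical fun(x) = -x
b_new = [bj + 1.0 * dj for bj, dj in zip(b, delta)]  # formula (6), c = 1.0

before, after = softmax(b), softmax(b_new)
print([round(p, 3) for p in before])
print([round(p, 3) for p in after])
```

The over-sampled module 0 loses probability mass and the two under-sampled modules gain it, which is the rebalancing described above.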

Thus, for a node with unbalanced sampling, adjusting the node's sampling parameters for each module, which are used when generating the initial candidate AI model in the Mth iteration, changes its sampling probability for each module: the probability is reduced for modules that were originally sampled often and increased for modules that were originally sampled rarely. This improves the balance of the node's sampling across modules when the module corresponding to each node in the initial architecture is selected in the search space based on the sampling parameter set. With balanced sampling, the modules are trained to a similar degree. This improves the accuracy with which candidate AI models are evaluated and the accuracy with which the candidate AI model is updated in the next iteration, and ultimately improves the accuracy of the final model search result.

This example has been illustrated for the case in which at least one node samples the modules unevenly. When every node samples the modules in a balanced way, steps B and C are not performed in the above flow.

Step D: the AI model generation system uses the sampling parameter set to select, in the search space, the module corresponding to each node in the initial architecture, thereby generating an initial candidate AI model.

Step E: the AI model generation system trains the initial candidate AI model iteratively using the data set to obtain a trained candidate AI model, and evaluates the trained candidate AI model.

Regarding the implementation manner of step D and step E, reference may be made to the prior art, which is not described herein again.

Step F: the AI model generation system determines whether M is equal to N. If so, the flow ends; otherwise, it proceeds to step G.

Step G: M is incremented by 1, and the flow returns to step A.
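Steps A to G can be combined into a single loop. The Python sketch below is a toy simulation built on several assumptions that do not come from the source. Training and evaluation (step E) are replaced by a caller-supplied scoring function. The controller update in step A is omitted, so parameters simply carry over between iterations. fun(x) = -x is used for formula (4), a softmax maps sampling parameters to probabilities, and the threshold 0.3 and c = 1.0 are arbitrary. search, train_and_evaluate and the other names are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple

Architecture = List[int]  # index of the module chosen for each node


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_coefficient(probs: List[float]) -> float:
    mean = statistics.fmean(probs)
    return statistics.pstdev(probs) / mean if mean > 0 else 0.0


def search(
    n_nodes: int,
    n_modules: int,
    n_iters: int,
    train_and_evaluate: Callable[[Architecture], float],  # hypothetical stub for step E
    threshold: float = 0.3,
    c: float = 1.0,
    seed: int = 0,
) -> Tuple[Architecture, float, List[List[int]]]:
    rng = random.Random(seed)
    # Step A stand-in: no controller update, parameters carry over between iterations.
    params = [[0.0] * n_modules for _ in range(n_nodes)]
    counts = [[0] * n_modules for _ in range(n_nodes)]
    candidates: List[Tuple[Architecture, float]] = []
    for m in range(1, n_iters + 1):  # steps F and G: M runs from 1 to N
        if m > 1:  # sampling history exists only from the 2nd iteration on
            for i in range(n_nodes):
                total = sum(counts[i])
                f = [cnt / total for cnt in counts[i]]  # formula (3)
                if discrete_coefficient(f) < threshold:
                    continue  # balanced node: steps B and C are skipped
                mean = statistics.fmean(f)
                # Step B with fun(x) = -x, then step C (formula (6)).
                params[i] = [b - c * (fj - mean) for b, fj in zip(params[i], f)]
        # Step D: draw one module per node from its sampling probabilities.
        arch = [rng.choices(range(n_modules), weights=softmax(params[i]))[0]
                for i in range(n_nodes)]
        for i, j in enumerate(arch):
            counts[i][j] += 1
        candidates.append((arch, train_and_evaluate(arch)))  # step E (stubbed)
    best_arch, best_score = max(candidates, key=lambda t: t[1])  # target AI model
    return best_arch, best_score, counts


arch, score, counts = search(
    n_nodes=4, n_modules=3, n_iters=200,
    train_and_evaluate=lambda a: -float(sum(a)),  # toy score
    seed=1,
)
print(arch, score)
print(counts)
```

Here n_iters plays the role of N (for example, 2000). The returned counts let you check that no node drifts far from balanced sampling across its modules.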

The above description takes the Mth iteration as an example to show how the AI model generation system calibrates the sampling parameters, for each module, of nodes with unbalanced sampling based on those nodes' sampling parameter offset adjustment values. The AI model generation system may apply the same adjustment to nodes with unbalanced sampling in every iteration, repeating until a maximum number of iterations (for example, 2000) is reached.

When the AI model generation system trains the initial candidate AI model iteratively using the data set, that is, in step E, it may use the uniform learning rate change strategy described above in each training iteration to calculate the adjustment values of the parameters of each module of each node of the initial candidate AI model. Optionally, as a possible implementation, the learning rate used by each sub-module of the initial candidate AI model may instead be determined from the sampling data of each node of the initial candidate AI model for each module, and the adjustment values of the parameters of each module of each node are then calculated using that learning rate.

For example, the AI model generation system may calculate the learning rate of each node for each module using the following equation (7):

In equation (7), the learning rate of the ith node for the jth module is determined from the relative training period of the ith node for the jth module, where count_ij is the number of times the ith node has sampled the jth module, batch_num is the number of times the candidate AI model needs to be trained, and lr_dec_T is a preset constant. Because the number of times the ith node samples the jth module equals the number of times the ith node is trained with the jth module, count_ij may also be referred to as the number of times the ith node is trained with the jth module. count_ij may be determined from the sampling data (for example, the number of samples) of the ith node for the jth module.

Because the sampling count of the ith node for the jth module is one of the inputs used to calculate the learning rate, the learning rate of each node for each module changes with that count. In this implementation, the larger the sampling count, the smaller the learning rate and the smaller the parameter change calculated from it; the smaller the sampling count, the larger the learning rate and the larger the parameter change. Adaptively adjusting the learning rate according to the sampling data of each node for each module reduces the learning-rate deviation among modules and the difference in their training degree. This improves the accuracy of training and evaluating candidate AI models and of updating the AI model, and reduces jitter in the AI model generation system. Jitter here refers to instability of the candidate AI models generated across loop iterations. In addition, the initial candidate AI model can converge faster, which shortens the time needed to generate candidate AI models and obtain the target AI model, and reduces the user's waiting time.
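The stated property, a learning rate that falls as count_ij grows, can be illustrated with a simple schedule. The Python sketch below substitutes a cosine-annealing schedule over the relative training period t = count_ij / batch_num for equation (7), purely for illustration; it is not the source's formula, and base_lr, min_lr and the function names are hypothetical.

```python
import math
from typing import List


def module_learning_rate(
    count_ij: int, batch_num: int, base_lr: float = 0.1, min_lr: float = 0.0
) -> float:
    """Hypothetical per-(node, module) learning rate that decays with count_ij.

    Cosine annealing over the relative training period t = count_ij / batch_num;
    a stand-in for equation (7), not taken from the source.
    """
    if batch_num <= 0:
        raise ValueError("batch_num must be positive")
    t = min(count_ij / batch_num, 1.0)  # relative training period, clipped to [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))


def sgd_step(weights: List[float], grads: List[float], count_ij: int, batch_num: int) -> List[float]:
    """Update one module's weights at one node using that pair's own learning rate."""
    lr = module_learning_rate(count_ij, batch_num)
    return [w - lr * g for w, g in zip(weights, grads)]


for n in (0, 25, 50, 100):
    print(n, round(module_learning_rate(n, batch_num=100), 4))
```

A rarely sampled module therefore still takes large steps when it is finally trained, while a frequently sampled one takes small steps, which narrows the gap in training degree between them.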

The learning rate has been used above as an example of how a training parameter can be adjusted adaptively based on the sampling data of each node for each module. Those skilled in the art will understand that any training parameter related to the sampling data of each node for each module can be adjusted adaptively on the same principle; details are not repeated here.

Optionally, in some embodiments, after the step S103, the method may further include: the AI model generation system outputs a target AI model. For example, the AI model generation system outputs the target AI model and information about the target AI model (e.g., the prediction accuracy of the target AI model) via an API or GUI. For example, the AI model generation system may output the target AI model to another device, or a display device (e.g., a display screen, etc.), or the like.

Optionally, in some embodiments, the AI model generation system may further output information of the initial candidate AI model in the process of N iterations and/or information of the candidate AI model in the process of generating the target AI model. For example, the AI model generation system may output information of the initial candidate AI model and/or information of the candidate AI model in the N iterations to other devices, or a display device (e.g., a display screen, etc.), and/or the like.

The information of the initial candidate AI model in the above-mentioned N iterations, and/or the information of the candidate AI model may include at least one of the following information:

the preset threshold, sampling frequency, sampling probability, sampling extreme values, sampling discrete coefficient, number of samples, sampling parameter bias, calibrated sampling parameters, number of parameters in each layer of the candidate AI model, feature map shape of each layer of the candidate AI model, inference time of the candidate AI model, precision of the candidate AI model, loss function value of the candidate AI model after convergence, training control parameters of each node for each module, and the like.

Optionally, the AI model generation system may also output computed analyses, for example a detailed analysis and prediction of the overall model generation process (such as convergence speed and predicted convergence time) in terms of model generation, adjustment amplitude, adjustment direction, and the trend of precision improvement. The AI model generation system can also output a hint indicating when the task can be terminated early, which can be determined based on the convergence of the candidate AI models.
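One way to derive such an early-termination hint from convergence is a patience rule on the candidate AI models' evaluation scores. This Python sketch is a hypothetical criterion, not the source's method; patience and tol are illustrative parameters.

```python
from typing import Sequence


def can_terminate_early(
    scores: Sequence[float], patience: int = 5, tol: float = 1e-3
) -> bool:
    """Hypothetical convergence test on candidate AI model scores (in iteration
    order): suggest stopping when the best score of the last `patience`
    iterations beats the best earlier score by less than `tol`."""
    if patience <= 0:
        raise ValueError("patience must be positive")
    if len(scores) <= patience:
        return False  # not enough history to judge convergence
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent - best_before < tol


history = [0.50, 0.60, 0.70, 0.7001, 0.7002, 0.7002, 0.7001, 0.70]
print(can_terminate_early(history))
```

The rule only produces a hint; whether to stop the task remains the user's decision.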

Outputting this information helps the user better understand the structure of the AI model, enhances the credibility of the AI model generation process, and improves both the user experience and user satisfaction.

It should be understood that the above method embodiments may be applied to a scenario in which the AI model generation system uses ENAS to generate a neural network model, a scenario in which the AI model generation system uses NAS to generate a neural network model, and an application scenario in which the AI model generation system generates other AI models, which is not limited in this application.

The present application further provides an AI model generation system 100 as shown in fig. 4. The modules and functions of the AI model generation system are as described above and are not repeated here. In an embodiment, the user I/O module 11 in the AI model generation system 100 is specifically configured to perform the receiving and output actions in the foregoing method embodiments, and the model generation module 12 is configured to perform the actions of automatically generating the target AI model in the foregoing method embodiments. The implementation principles are similar and are not described again.

The present application also provides a computing device 200 as shown in fig. 7, and a processor 202 in the computing device 200 reads programs and data sets stored in a memory 201 to execute the AI model generation method performed by the aforementioned AI model generation system.

Fig. 9 is a schematic structural diagram of another computing device according to an embodiment of the present application. Since the respective modules in the AI model generation system 100 provided by the present application can be distributively deployed on a plurality of computers in the same environment or different environments, the present application also provides a computing device as shown in fig. 9, which includes a plurality of computers 300, each computer 300 including a memory 301, a processor 302, a communication interface 303, and a bus 304. The memory 301, the processor 302 and the communication interface 303 are connected to each other by a bus 304.

The memory 301 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 301 may store a program. When the program stored in the memory 301 is executed by the processor 302, the processor 302 and the communication interface 303 are used to perform part of the method by which the AI model generation system automatically generates AI models for a user.

The processor 302 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more Integrated circuits.

The processor 302 may also be an integrated circuit chip with signal processing capability. In implementation, some or all of the functions of the AI model generation system of the present application may be performed by integrated logic circuits of hardware in the processor 302 or by instructions in the form of software. The processor 302 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the above embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or registers. The storage medium is located in the memory 301; the processor 302 reads information in the memory 301 and, in combination with its hardware, completes part of the functions of the AI model generation system according to the embodiments of the present application.

The communication interface 303 enables communication between the computer 300 and other devices or communication networks using transceiver modules such as, but not limited to, transceivers. For example, the data set may be acquired through the communication interface 303.

Bus 304 may include a path that transfers information between components of computer 300 (e.g., memory 301, processor 302, communication interface 303).

A communication path is established between the computers 300 through a communication network. Each computer 300 runs one or more of the user I/O module 11 and the model generation module 12. Any computer 300 may be a computer (for example, a server) in a cloud data center, a computer in an edge data center, or a terminal computing device.

The descriptions of the flows corresponding to the above-mentioned figures have respective emphasis, and for parts not described in detail in a certain flow, reference may be made to the related descriptions of other flows.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partially in the form of a computer program product. The computer program product providing the AI model generation function includes one or more computer program instructions for AI model generation; when these instructions are loaded and executed on a computer, the processes or functions described in fig. 8 according to the embodiments of the invention are performed wholly or partially.

The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium stores the computer program instructions that provide AutoML, and may be any available medium accessible to a computer, or a data storage device that includes one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).

Claims

1. An artificial intelligence (AI) model generation method, applied to an AI model generation system, the method comprising:

receiving a task target and a data set, wherein the task target is used for representing functions of a target AI model required by a user;

obtaining N candidate AI models according to the task target and the data set, wherein N is a positive integer, the N candidate AI models are generated through N iterative processes, each candidate AI model is composed of a plurality of nodes, and each node comprises one or more modules; in the Mth iteration process, selecting a module corresponding to each node in an initial architecture in a search space according to a sampling parameter set, generating an initial candidate AI model, and training the initial candidate AI model by using the data set to obtain a candidate AI model; wherein M is an integer greater than 1 and less than or equal to N, and the sampling parameter set is determined by sampling data in the iteration processes before the Mth iteration process; and

determining a target AI model among the N candidate AI models.

2. The method of claim 1, wherein the sampling data in the iteration processes before the Mth iteration process comprises: sampling data, for the modules, of each node in the initial candidate AI models generated in the iteration processes before the Mth iteration process.

3. The method of claim 1 or 2, wherein the sampling parameter set comprises a plurality of sampling parameters, each sampling parameter being used to determine a sampling probability of each node in the initial architecture for each module.

4. The method according to claim 1 or 2, wherein the selecting, during the Mth iteration process, a module corresponding to each node in the initial architecture in the search space according to the sampling parameter set to generate the initial candidate AI model comprises:

obtaining an initial sampling parameter set for the Mth iteration process according to the evaluation result of the candidate AI model obtained in the (M-1)th iteration process, wherein the initial sampling parameter set comprises initial sampling parameters of each node of the initial candidate AI model for the modules;

acquiring a sampling parameter adjustment value of at least one node whose sampling parameters are to be adjusted, according to the sampling data in the iteration processes before the Mth iteration process; and

adjusting the initial sampling parameters, for the modules, of the at least one node whose sampling parameters are to be adjusted, according to the sampling parameter adjustment value of that node, to obtain the sampling parameter set.

5. The method according to claim 4, wherein the acquiring a sampling parameter adjustment value of at least one node whose sampling parameters are to be adjusted according to the sampling data in the iteration processes before the Mth iteration process comprises:

acquiring, according to the sampling data in the iteration processes before the Mth iteration process, the sampling probability of each node for each module, the mean sampling probability, and a value of a first parameter, wherein the first parameter is related to the sampling probability of each node for each module in historical iteration processes;

determining at least one node whose sampling parameters are to be adjusted, wherein a node whose sampling parameters are to be adjusted is a node whose value of the first parameter is greater than or equal to a preset threshold; and

acquiring the sampling parameter adjustment value of the at least one node whose sampling parameters are to be adjusted, according to that node's sampling probability and mean sampling probability for the modules.

6. The method of claim 5, wherein the first parameter is a sampling discrete coefficient or a sampling range.

7. The method of any one of claims 1-6, further comprising:

outputting the target AI model.

8. The method of any one of claims 1-7, further comprising:

outputting information of the initial candidate AI model in the N iterative processes, and/or,

outputting information of the candidate AI model in the N iterative processes.

9. The method of any one of claims 1-8, wherein, during the Mth iteration process, the method further comprises:

determining training parameters of the initial candidate AI model according to the sampling data in the Mth iteration process, wherein the training parameters are used to train the initial candidate AI model.

10. An Artificial Intelligence (AI) model generation system, the system comprising:

a user input/output (I/O) module, configured to receive a task target and a data set, wherein the task target is used to represent functions of a target AI model required by a user; and

a model generation module, configured to obtain N candidate AI models according to the task target and the data set and to determine a target AI model among the N candidate AI models, wherein N is a positive integer, the N candidate AI models are generated through N iterative processes, each candidate AI model is composed of a plurality of nodes, and each node comprises one or more modules; in the Mth iteration process, a module corresponding to each node in an initial architecture is selected in a search space according to a sampling parameter set to generate an initial candidate AI model, and the initial candidate AI model is trained by using the data set to obtain a candidate AI model; wherein M is an integer greater than 1 and less than or equal to N, and the sampling parameter set is determined by sampling data in the iteration processes before the Mth iteration process.

11. The system of claim 10, wherein the sampling data in the iteration processes before the Mth iteration process comprises: sampling data, for the modules, of each node in the initial candidate AI models generated in the iteration processes before the Mth iteration process.

12. The system of claim 10 or 11, wherein the sampling parameter set comprises a plurality of sampling parameters, each sampling parameter being used to determine a sampling probability of each node in the initial architecture for each module.

13. The system according to claim 10 or 11, wherein, when selecting, during the Mth iteration process, a module corresponding to each node in the initial architecture in the search space according to the sampling parameter set to generate the initial candidate AI model, the model generation module is specifically configured to:

14. The system according to claim 13, wherein, when acquiring a sampling parameter adjustment value of at least one node whose sampling parameters are to be adjusted according to the sampling data in the iteration processes before the Mth iteration process, the model generation module is specifically configured to:

15. The system of claim 14, wherein the first parameter is a sampling discrete coefficient or a sampling range.

16. The system of any one of claims 10-15,

the user I/O module is further configured to output the target AI model.

17. The system of any one of claims 10-16,

the user I/O module is further configured to output information of the initial candidate AI model in the N iteration processes, and/or output information of the candidate AI model in the N iteration processes.

18. The system of any one of claims 10-17,

the model generation module is further configured to determine training parameters of the initial candidate AI model according to the sampling data in the Mth iteration process, wherein the training parameters are used to train the initial candidate AI model.

19. A computing device, comprising a memory to store a set of computer instructions and a processor;

the processor executes a set of computer instructions stored by the memory to perform the method of any of the above claims 1-9.

20. A non-transitory readable storage medium storing computer program code, wherein, when the computer program code is executed by a computing device, the computing device performs the method of any one of claims 1 to 9.