CN110766090A - Model training method, device, equipment, system and storage medium - Google Patents

Model training method, device, equipment, system and storage medium Download PDF

Info

Publication number
CN110766090A
Authority
CN
China
Prior art keywords
training
model
hyper
stage
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911048084.5A
Other languages
Chinese (zh)
Inventor
欧阳显斌
周飞虎
魏杰乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911048084.5A priority Critical patent/CN110766090A/en
Publication of CN110766090A publication Critical patent/CN110766090A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The model training method provided by this application divides model training into a plurality of continuous training stages. Each training stage combines iterative updating of the model's network parameters with an optimization search over specific hyper-parameters, so that the model produced by each stage attains the optimal capability associated with those hyper-parameters. After each stage finishes, the optimal model it produced automatically becomes the initial model of the next stage. Training in this hierarchical, progressive manner lets the optimal model produced by the last stage integrate the optimal capabilities of all the targeted hyper-parameters. Because the hyper-parameter optimization is carried out within the normal training process, no extra time need be spent, training time and training effect are well balanced, and a model with optimal comprehensive performance can be trained within a specified time.

Description

Model training method, device, equipment, system and storage medium
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a model training method, apparatus, device, system, and storage medium.
Background
Reinforcement Learning (RL), also known as evaluative learning or reward-based learning, is one of the paradigms and methodologies of machine learning. It describes and solves the problem of how an agent, while interacting with its environment, can learn a strategy that maximizes its cumulative return or achieves a specific goal.
However, current reinforcement learning methods still face several problems, among which hyper-parameter setting has particular research value because it directly affects learning efficiency and quality. Training a model with excellent performance by deep reinforcement learning typically requires tens of days and considerable hardware resources. The traditional training approach fixes one set of hyper-parameter values from beginning to end, i.e., the hyper-parameters remain unchanged throughout training; as a result, the trained model has a single, narrow capability, because the hyper-parameters never vary during training.
Of course, beyond reinforcement learning, the same hyper-parameter setting problem arises in supervised learning, unsupervised learning, and similar scenarios. A solution is therefore urgently needed across machine learning scenarios that finds a balance between training effect and training time, so that the comprehensive capability contributed by the various hyper-parameters — that is, the capability of the model — can be improved while training efficiency is also improved.
Disclosure of Invention
The embodiments of this application provide a model training method, apparatus, device, system, and storage medium that balance training effect against training time by optimizing different hyper-parameters in stages during model training, so that the model integrates the capabilities of multiple hyper-parameters to the greatest extent and its final performance is improved.
In a first aspect of the present application, there is provided a model training method, the method comprising:
determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used for synchronously optimizing different hyper-parameters in the training process of the model;
in the current training stage of the plurality of continuous training stages, carrying out hyper-parameter optimization search according to a hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and obtaining an optimal model obtained by training in the current training stage as an optimal model in the current training stage;
and taking the optimal model in the current training stage as an initial model in the next training stage, performing hyper-parameter optimization search according to a hyper-parameter search range corresponding to hyper-parameters to be optimized in the next training stage, and obtaining the optimal model obtained by training in the next training stage as the optimal model in the next training stage until obtaining the optimal model obtained by training in the last training stage in the plurality of continuous training stages.
In a second aspect of the present application, there is provided a model training apparatus, the apparatus comprising:
the determining module is used for determining a plurality of continuous training stages corresponding to the model, and different training stages are used for synchronously optimizing different hyper-parameters in the training process of the model;
the first training module is used for carrying out hyper-parameter optimization search according to a hyper-parameter search range corresponding to a hyper-parameter to be optimized in the current training stage of the plurality of continuous training stages to obtain an optimal model obtained by training in the current training stage as the optimal model in the current training stage;
and the second training module is used for taking the optimal model in the current training stage as an initial model in the next training stage, performing hyper-parameter optimization search according to a hyper-parameter search range corresponding to hyper-parameters to be optimized in the next training stage, and obtaining the optimal model obtained by training in the next training stage as the optimal model in the next training stage until obtaining the optimal model obtained by training in the last training stage in the plurality of continuous training stages.
In a third aspect of the present application, there is provided an apparatus comprising:
a memory and a processor;
the memory for storing a computer program;
the processor is configured to run the computer program to perform the method provided in the first aspect of the present application.
In a fourth aspect of the present application, there is provided a system for model training, the system comprising:
a plurality of first servers and a second server;
the first server is used for operating the model of the agent to obtain operation data;
the second server is configured to train the model in the following manner by using the operation data generated by the plurality of first servers as sample data:
determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used for synchronously optimizing different hyper-parameters in the model training process;
in the current training stage of the plurality of continuous training stages, carrying out hyper-parameter optimization search according to a hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and obtaining an optimal model obtained by training in the current training stage as an optimal model in the current training stage;
and taking the optimal model in the current training stage as an initial model in the next training stage, performing hyper-parameter optimization search according to a hyper-parameter search range corresponding to hyper-parameters to be optimized in the next training stage, and obtaining the optimal model obtained by training in the next training stage as the optimal model in the next training stage until obtaining the optimal model obtained by training in the last training stage in the plurality of continuous training stages.
In a fifth aspect of the present application, a computer-readable storage medium for storing a computer program for performing the method provided by the first aspect of the present application is provided.
According to the technical scheme, the scheme provided by the application has the following advantages:
the model training method provided by this application divides model training into a plurality of continuous training stages. The purpose of this division is that, during subsequent training, each stage combines iterative updating of the model's network parameters with an optimization search over specific hyper-parameters, so that the model produced by each stage attains the optimal capability associated with those hyper-parameters. After each stage finishes, the optimal model it produced automatically becomes the initial model of the next stage. Training in this hierarchical, progressive manner ensures that the optimal model produced by the last stage integrates the capabilities of all the optimized hyper-parameters, which improves model performance. Because the hyper-parameter optimization runs within the normal training process, no extra time need be spent, training time and training effect are well balanced, and a model with optimal comprehensive performance can be trained within a specified time.
Drawings
Fig. 1 is an exemplary diagram of an application scenario of a model training method provided in an embodiment of the present application;
FIG. 2 is a flow chart of a model training method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an implementation process of a model training method provided in an embodiment of the present application;
FIG. 4 is a block diagram of a model training apparatus provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a server according to an embodiment of the present application;
fig. 6 is a block diagram of a model training system according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
At present, engineers usually configure hyper-parameters manually according to personal experience, and once configured, the hyper-parameters stay fixed throughout model training. Research shows, however, that fixed hyper-parameters limit the final performance of the model: the optimal values of the hyper-parameters shift as the learning process advances and should be adjusted adaptively rather than held to one fixed configuration from beginning to end. Because model training is time-consuming and expensive, training time and training efficiency must both be weighed, and how to train a model of excellent performance within limited time is a long-standing concern in this field. This application considers both training time and training effect and proposes a solution to the need for adaptive hyper-parameter optimization during training: normal model training is divided into stages, and different stages are configured to run an optimization search for specific hyper-parameters synchronously with the training of the model's network parameters. The finally trained model thus undergoes hyper-parameter optimization searches in multiple stages to determine the optimal hyper-parameters, combining the optimal capabilities of multiple hyper-parameters and improving model performance without consuming additional training time or resources, so that training time and training effect are well balanced.
The model training method provided by this application can be applied on any device with data processing capability, such as a server. In a concrete deployment, the servers can be organized as a cluster and the training process carried out by the cluster, which improves training efficiency along with the training effect.
The model training method provided by this application is suitable for training any model that involves hyper-parameter configuration in a machine learning scenario, allowing the model to update its network parameters while optimizing specific hyper-parameters in different training stages. Machine learning scenarios can be roughly divided into supervised learning, unsupervised learning, reinforcement learning, and so on; the method applies to any of them — that is, to any training in which the model involves hyper-parameters that need to be optimized. Since the nature and implementation logic of the method are essentially the same across these scenarios, the following description uses the reinforcement learning scenario for convenience of explanation, without limitation thereto.
For ease of understanding, the reinforcement learning scenario is briefly described below. Reinforcement learning is a special type of machine learning algorithm that addresses how an agent (i.e., the entity running the reinforcement learning algorithm) should act in an environment so as to obtain the maximum accumulated reward. It is a method for solving decision problems: the algorithm must learn, from samples, a mapping function called a policy, whose input is the environment information at the current time and whose output is the action to be executed.
For example, when developing an intelligent vehicle in the field of automatic driving, the vehicle's actions must be controlled through a reinforcement learning algorithm to ensure safe driving to the destination. The agent must decide, via the model, the next best operation of the vehicle according to the current environmental state; concretely, the model must decide the driving behavior of the unmanned vehicle — such as controlling the steering wheel, accelerator, and brake — according to the current road conditions and the vehicle's own state (such as speed and acceleration), thereby keeping the vehicle driving safely.
For another example, when developing a robot, its motion must be controlled through a reinforcement learning algorithm so that the robot can decide the action to execute according to the current environment and its own state, thereby reaching a highly intelligent level of behavior.
For another example, during game development, a model for a game agent (game AI for short) must be trained based on reinforcement learning. In essence, the trained model must decide the actions to execute — such as presses of the keyboard, gamepad, or mouse — according to the current game picture and state. The performance of the model directly determines the performance of the game AI; in other words, it directly affects whether the game plays out normally and in an orderly fashion.
An application scenario of the model training method provided by this application is described below, taking the training of a game AI as an example. Referring to FIG. 1, the scenario includes a plurality of first servers 101 and a second server 102. The first servers 101 obtain combat data through the game AI; the second server uses the combat data as training data and performs reinforcement learning on the model of the game AI, finally training a model that combines the optimal capabilities of multiple hyper-parameters, which is then deployed in the game AI.
Specifically, the second server is configured to execute the model training method provided by the present application, and execute the following steps: determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used for synchronously optimizing and searching different hyper-parameters in the training process of the model; in the current training stage of the plurality of continuous training stages, carrying out hyper-parameter optimization search according to a hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and obtaining an optimal model obtained by training in the current training stage as an optimal model in the current training stage; and taking the optimal model in the current training stage as an initial model in the next training stage, performing hyper-parameter optimization search according to a hyper-parameter search range corresponding to hyper-parameters to be optimized in the next training stage, and obtaining the optimal model obtained by training in the next training stage as the optimal model in the next training stage until obtaining the optimal model obtained by training in the last training stage in the plurality of continuous training stages.
The above is an application scenario of the model training method provided by the present application, and in a specific implementation, the model training method may also be applied to other application scenarios, for example, the whole training process may be completed by only one server, or the whole training process may be completed by only one terminal device, or the whole training process may be completed by an interactive manner between the terminal device and the server. Of course, there are other application scenarios, which are not listed here.
For ease of understanding, one model training method provided herein is explained below from a server perspective. Referring to fig. 2, fig. 2 is a flowchart illustrating a model training method provided in an embodiment of the present application, where the method includes:
s201, determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used for synchronously optimizing different hyper-parameters in the training process of the model;
Determining a plurality of continuous training stages for the model means dividing the model's training time into those stages. The training time of a model can generally be preset: it is the time, estimated from historical experience, approximately required to train the model to its optimal state. It can be characterized by the model's maximum number of training steps (e.g., Maxstep), or equivalently by the amount of batch data to be trained, i.e., the number of batches of a given batch size required for training.
In determining the continuous training stages, the maximum training step count can be divided evenly, giving each stage the same number of steps. For example, if the model's Maxstep is 500,000 steps and two stages are needed, the first stage's training length is 500,000 / 2 = 250,000 steps, and the second stage's is likewise 250,000 steps. Besides even division, in a concrete implementation the stages may be divided according to the importance of the hyper-parameter to be optimized in each stage, so the step counts of different stages may be the same or different.
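The step-budget division above can be sketched as a small helper (the function name and weighting scheme are illustrative, not prescribed by the patent); with no weights it divides evenly, and with weights it allocates steps proportionally to each stage's importance:

```python
def split_training_steps(max_steps, num_stages, weights=None):
    """Split a total step budget (Maxstep) into per-stage budgets.

    With weights=None the budget is divided evenly; otherwise each stage
    receives a share proportional to its weight (e.g. the importance of
    the hyper-parameter optimized in that stage).
    """
    if weights is None:
        weights = [1] * num_stages
    total = sum(weights)
    budgets = [max_steps * w // total for w in weights]
    budgets[-1] += max_steps - sum(budgets)  # any rounding remainder goes to the last stage
    return budgets

# The example from the text: 500,000 steps split over 2 stages
print(split_training_steps(500_000, 2))         # → [250000, 250000]
# Importance-weighted division is also possible
print(split_training_steps(500_000, 2, [3, 2])) # → [300000, 200000]
```

The remainder handling guarantees the per-stage budgets always sum exactly to the maximum step count, whatever the weights.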
In a concrete implementation, determining the continuous training stages of the model depends mainly on the number N of hyper-parameter capability crossovers required for this training run, and on the hyper-parameter search range corresponding to the hyper-parameters involved in the i-th crossover, where N is an integer greater than 1 and i takes all positive integers from 1 to N. N continuous training stages are then set for model training, the i-th stage being used to optimize the hyper-parameters involved in the i-th crossover.
The crossover count N of hyper-parameter capability is the number of stages in which hyper-parameter optimization must be executed during training. Each optimization stage may target a single specified hyper-parameter, or a group of specified hyper-parameters — simply understood as several hyper-parameters — optimized synchronously. Setting the crossover count expresses the expectation that, through N rounds of crossover training, the model will blend the optimal capabilities of multiple hyper-parameters and so improve its comprehensive ability.
In a concrete implementation, among the plurality of continuous training stages, at least one stage may search over a single hyper-parameter while at least one other stage searches over several hyper-parameters synchronously. Alternatively, every stage may be limited to one specific hyper-parameter, or every stage may search over several. In all cases, the hyper-parameters targeted by different stages differ, so repeated optimization is avoided.
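One way to encode the stage layout just described — some stages tuning a single hyper-parameter, others a group — is a simple configuration list. The parameter names and ranges below are illustrative placeholders, not values taken from the patent:

```python
# Each entry is one training stage: the hyper-parameters it optimizes and
# the (low, high) search range for each. Different stages target different
# hyper-parameters, so no parameter is tuned twice.
stage_plan = [
    {"learning_rate": (1e-5, 1e-2)},                      # stage 1: one hyper-parameter
    {"entropy_coef": (0.0, 0.1), "gamma": (0.9, 0.999)},  # stage 2: a group, tuned jointly
    {"clip_ratio": (0.1, 0.4)},                           # stage 3: one hyper-parameter
]

# Sanity check: every hyper-parameter appears in exactly one stage
tuned = [name for stage in stage_plan for name in stage]
assert len(tuned) == len(set(tuned)), "a hyper-parameter may appear in only one stage"
print(len(stage_plan), "stages,", len(tuned), "hyper-parameters")
```

A plan like this fixes, before training starts, both the number of stages N and each stage's search ranges, which is exactly the information S201 is said to determine.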
The following is a brief explanation of the hyper-parameter.
A model generally includes network parameters and hyper-parameters. Network parameters can be simply understood as the parameters iteratively updated by training; hyper-parameters are parameters whose values are set before the learning process begins, rather than obtained through training. In general, the hyper-parameters need to be optimized, selecting an optimal set of them for the learner.
Some common hyper-parameters are, for example, the learning rate, the number of hidden layers in a deep neural network, and the number of clusters in K-means clustering; different models of course have different numbers of hyper-parameters, and the hyper-parameters to be optimized differ as well. During training, optimization can be applied selectively to the designated hyper-parameters, while the remaining hyper-parameters are configured in the traditional way — by engineers according to experience, before learning begins.
For example, a model may have 10 hyper-parameters, of which research shows only 3 need optimization; the remaining 7 are configured in advance by engineers according to personal experience, and the hyper-parameter optimization performed synchronously during training is then based on those 3 hyper-parameters.
In a concrete implementation, since a model draws on different hyper-parameter capabilities in different periods of operation, the order of the hyper-parameter optimization searches can be set according to the order in which those capabilities matter in the model's application — that is, which hyper-parameter is optimized synchronously in which training stage. Each training stage is thus assigned its hyper-parameters to be optimized, and model training and hyper-parameter search proceed in that order, so that the finally trained model meets the demands of the actual application environment.
The essence of hyper-parameter optimization search is to search for hyper-parameter values within the search range corresponding to each hyper-parameter, seeking the values that make model performance optimal. Before the training process is executed, it must be clear how many training stages the process is divided into, and, for each stage, which hyper-parameters are to be optimized and what their corresponding search ranges are.
The hyper-parameter search range may be a value range set according to experience. For example, one hyper-parameter's search range might be [0, 1] and another's [0.2, 0.6]; the ranges of different hyper-parameters may be the same or different.
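A minimal way to realize "search within a range" is random search: sample candidate values uniformly from each hyper-parameter's range and train one intermediate model per candidate. The sampler below is an illustrative sketch of that idea, not the patent's prescribed search algorithm:

```python
import random

def sample_candidates(search_ranges, num_candidates, seed=None):
    """Draw candidate hyper-parameter settings from per-parameter ranges.

    search_ranges: dict mapping hyper-parameter name -> (low, high).
    Returns a list of dicts, each a full candidate setting.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in search_ranges.items()}
        for _ in range(num_candidates)
    ]

# Ranges like those in the text: [0, 1] for one parameter, [0.2, 0.6] for another
candidates = sample_candidates({"alpha": (0.0, 1.0), "beta": (0.2, 0.6)}, 4, seed=0)
for c in candidates:
    assert 0.0 <= c["alpha"] <= 1.0 and 0.2 <= c["beta"] <= 0.6
```

More sophisticated strategies (grid search, Bayesian optimization) slot into the same place; only the way candidates are drawn changes.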
After S201 determines the number of continuous training stages required, the step length of each stage, and the search range of each stage's hyper-parameters to be optimized, the following step S202 is executed based on these basic parameters.
S202, in the current training stage of the plurality of continuous training stages, carrying out hyper-parameter optimization search according to a hyper-parameter search range corresponding to the hyper-parameter to be optimized in the current training stage, and obtaining an optimal model obtained by training in the current training stage as the optimal model in the current training stage;
to aid understanding of the whole process, the essence of model training is briefly explained: in general, training continuously adjusts the network parameters in the model using sample data until the model reaches a convergent state, i.e., a relatively stable state with excellent performance.
In the model training method of this embodiment, the whole training process is divided into a plurality of continuous training stages, for example N of them, and each stage simultaneously accommodates the optimization search over that stage's hyper-parameters to be optimized, so that the optimal model finally trained in the stage possesses the optimal capability of those hyper-parameters.
The first training stage serves as the initial training stage of the model, and the optimal model obtained by training in the first training stage is used as the initial model of the second training stage; that is, the second training stage continues model training on the basis of the first-stage optimal model, so that the training result, namely the capability trained in the first stage, is inherited and continued. Similarly, the optimal model obtained by training in the second training stage is used as the initial model of the third training stage, so that the training results of the first two stages are inherited. Training thus proceeds stage by stage until the last training stage.
When the current training stage ends, a plurality of models, namely intermediate models, have been generated by the hyper-parameter optimization search process, and the intermediate model with the strongest hyper-parameter capability needs to be selected from them as the optimal model of the current training stage. Specifically, when the number of training steps in the current training stage reaches the preset number, the plurality of intermediate models obtained by training after the hyper-parameter optimization search in the current stage are acquired; then the intermediate model whose performance meets the preset condition and is optimal is selected from them and taken as the optimal model obtained by training in the current training stage.
Whether the current training stage is finished is determined by the number of training steps. For example, when the number of training steps corresponding to the current training stage is 100,000 (10w), the current training stage is determined to be finished once 100,000 batches of data have been trained on, namely the number of training steps in the current stage has reached the preset number. At this point, the plurality of intermediate models obtained by training after the hyper-parameter optimization search in the current stage can be acquired; whether the performance of each intermediate model meets the preset condition is then evaluated, and the best-performing intermediate model among those meeting the condition is selected as the optimal model of the current training stage.
The performance evaluation parameters adopted by different models may differ. For example, for some classification models the performance evaluation parameter may be classification accuracy, so that during performance evaluation the intermediate model whose classification accuracy is greater than a preset threshold and is the highest is selected as the optimal model; for the battle models of some game scenes the performance evaluation parameter may be a hit rate, so that the intermediate model whose hit rate is greater than a preset threshold and is the highest is selected as the optimal model. Of course, implementations of the embodiments of the present application are not limited to the above examples; other means of evaluating model performance may be adopted, and no limitation is imposed here.
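For ease of understanding, the stage-end selection just described can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: `metric_fn` stands in for whichever performance evaluation parameter applies (classification accuracy, hit rate, etc.), and all names are assumptions introduced here.

```python
def select_optimal_model(intermediate_models, metric_fn, threshold):
    """Return the intermediate model whose metric exceeds `threshold`
    and is the highest, or None if no model meets the preset condition
    (in which case the stage's training is considered failed)."""
    scored = [(metric_fn(m), m) for m in intermediate_models]
    qualified = [(score, m) for score, m in scored if score > threshold]
    if not qualified:
        return None
    # Pick the best-performing model among those meeting the condition.
    return max(qualified, key=lambda pair: pair[0])[1]
```

Returning `None` on failure lets the caller trigger the retry behavior described later in this document.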
In the model training process, not every training stage can obtain an optimal model meeting the condition. To ensure that training can continue effectively, the embodiment of the application further provides a corresponding means: when the performance of none of the plurality of intermediate models meets the preset condition, namely when no optimal model of the current stage can be selected, the training of the current stage is considered to have failed and must be repeated. However, the training is not simply repeated directly; instead, the initial model relied on by the current stage is re-selected. Specifically, the suboptimal model obtained by training in the previous training stage is used as the initial model of the current training stage, and the following step is executed again: performing hyper-parameter optimization search according to the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and obtaining the optimal model trained in the current stage as the optimal model of the current training stage.
Of course, if the optimal model of the stage still cannot be obtained after the current training stage is repeated, the next-best suboptimal model obtained by training in the previous stage can be selected as the initial model in the same manner, and training is performed again, until the optimal model of the stage is obtained.
As a special case, if no optimal model meeting the condition is obtained when the first training stage ends, the training process of the first training stage is repeated. During the repetition, the network parameters of the model are re-initialized and the hyper-parameter optimization search is performed again; training based on the newly searched hyper-parameters is then most likely to yield the optimal model. If the training fails again, namely the optimal model of the stage still cannot be obtained, the repetition continues until the training of this stage succeeds, namely the optimal model of the stage is obtained.
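The retry policy described above can be sketched as follows. This is an illustrative reading of the text, with hypothetical names: `train_stage` performs one stage's hyper-parameter search plus training and returns `None` on failure; `prev_stage_models_ranked` lists the previous stage's models from best to worst; `reinit_fn` produces a freshly re-initialized model, covering the first-stage special case.

```python
def run_stage_with_retries(train_stage, prev_stage_models_ranked, reinit_fn):
    """Run one training stage, falling back to progressively less
    optimal initial models from the previous stage on failure."""
    for init_model in prev_stage_models_ranked:
        best = train_stage(init_model)  # hyper-param search + training
        if best is not None:
            return best  # this stage's optimal model
    # All previous-stage candidates exhausted (or this is the first
    # stage): keep re-initializing and re-searching until success.
    while True:
        best = train_stage(reinit_fn())
        if best is not None:
            return best
```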
S203, taking the optimal model in the current training stage as an initial model in the next training stage, performing hyper-parameter optimization search according to a hyper-parameter search range corresponding to hyper-parameters to be optimized in the next training stage, and obtaining the optimal model obtained by training in the next training stage as the optimal model in the next training stage until obtaining the optimal model obtained by training in the last training stage in the plurality of continuous training stages.
In the specific implementation process, the model training is divided into N stages, that is, the model training performs periodic training in stages 1 to N in total; each of the N training stages in turn serves as the current training stage, executing step S202 once and then step S203 once, until the Nth training stage is completed.
Specifically, training is executed stage by stage. The first training stage is executed first: the model is trained based on training sample data, the network parameters in the model are updated iteratively, and during this processing a hyper-parameter optimization search algorithm is synchronously adopted to perform an optimization search over the hyper-parameters to be optimized in the first training stage, so that when the first training stage ends, the optimal model obtained by training in the first training stage, namely the first-stage optimal model, is obtained.
An important link in this embodiment is that a hyper-parameter optimization search algorithm is used to search for optimal hyper-parameters during the periodic training of the model. In specific implementations, various hyper-parameter optimization search algorithms (also referred to as hyper-parameter tuning algorithms) can be used, such as a grid search algorithm, a random search algorithm, a Bayesian search algorithm, and the like. The grid search algorithm is simple to implement: it determines an optimal value by examining all points in the search range; if a larger search range and a smaller step size are adopted, grid search has a very high probability of finding the global optimum. The idea of the random search algorithm is similar to that of grid search, except that it no longer tests all values between the upper and lower bounds of the search range but randomly selects sample points within it; its theoretical basis is that if the sample point set is large enough, random sampling can find the global optimum or an approximation of it with high probability. Experiments have shown that random search is generally faster than grid search. When the Bayesian optimization algorithm searches for the optimal hyper-parameter value, it adopts an approach completely different from grid search and random search: when testing a new point, grid search and random search ignore the information of previously tested points, whereas the Bayesian optimization algorithm makes full use of previous information, learning the shape of the objective function in order to find the parameter that drives the objective function toward the global optimum.
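The contrast between grid search and random search just described can be sketched over a single hyper-parameter range. This is a minimal illustration under stated assumptions: `evaluate` stands in for training a candidate model with the given hyper-parameter value and scoring it, and none of the names come from the patent.

```python
import random

def grid_search(low, high, step, evaluate):
    """Try every point in [low, high] at the given step size."""
    candidates, v = [], low
    while v <= high + 1e-12:
        candidates.append(round(v, 10))  # round away float drift
        v += step
    return max(candidates, key=evaluate)

def random_search(low, high, n_samples, evaluate, seed=0):
    """Try n_samples uniformly random points in [low, high]."""
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_samples)]
    return max(candidates, key=evaluate)
```

With a fine step, grid search covers the range exhaustively; random search trades exhaustiveness for far fewer evaluations, matching the speed observation above.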
During specific implementation, random hyper-parameter optimization search is carried out in a hyper-parameter search range according to preset search times through a random hyper-parameter optimization algorithm, model training in the current stage is executed in parallel based on searched hyper-parameter values, and therefore a plurality of models are obtained.
The process of hyper-parameter optimization search is illustrated below, taking the first training stage described above as an example.
In the first training stage, the designated hyper-parameter to be optimized is the first hyper-parameter, whose corresponding search range is [0, 0.4]. When the set number of searches is 5, a random hyper-parameter search algorithm randomly searches out 5 numerical values, specifically [0.15, 0.3, 0.23, 0.19, 0.36]. Five initial models are then obtained by assigning the first hyper-parameter each of these 5 values in turn while the other hyper-parameters in the model take their corresponding preset or default values, and the 5 initial models are each trained in the first training stage based on the training samples.
When the training of the first training stage is finished, 5 intermediate models are correspondingly obtained, and then the performance of the 5 intermediate models is evaluated, so that the intermediate model which has the optimal performance and meets the preset performance condition is selected from the 5 intermediate models to serve as the optimal model of the first training stage.
If performance evaluation shows that, among the 5 intermediate models, the one whose first hyper-parameter value is 0.36 has the best effect and meets the preset performance condition, the intermediate model with the first hyper-parameter value of 0.36 is taken as the optimal model of the first training stage.
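The configuration step of this first-stage example can be sketched as follows. It is a hedged illustration only: `build_stage_configs` and its parameter names are assumptions introduced here, and the training/evaluation that would consume these configurations is omitted.

```python
import random

def build_stage_configs(param_name, search_range, n_searches, defaults, seed=None):
    """Randomly sample n_searches values of one hyper-parameter within
    search_range; the other hyper-parameters keep their default values.
    Each returned dict configures one initial model for the stage."""
    rng = random.Random(seed)
    low, high = search_range
    configs = []
    for _ in range(n_searches):
        cfg = dict(defaults)  # other hyper-parameters at defaults
        cfg[param_name] = round(rng.uniform(low, high), 2)
        configs.append(cfg)
    return configs
```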
Of course, the training process of other training phases is similar to the implementation process of the first training phase, and is not described in detail here.
It can be seen from the above embodiments that the model training method provided by the present application divides model training into a plurality of continuous training stages and, in each stage, attends both to the iterative update of the model's network parameters and to the optimization search for specific hyper-parameters, so that the model trained in each stage can have the optimal capability associated with those hyper-parameters. After each training stage, the optimal model obtained by that stage is automatically used as the initial model of the next stage, so that training proceeds progressively and the optimal model of the last stage integrates the capabilities of all the optimized hyper-parameters. In this way, the method integrates the capabilities corresponding to a plurality of hyper-parameters through multi-stage automatic machine learning search training, improving model performance. Moreover, since the hyper-parameter optimization is carried out within the normal training process of the model, no excessive extra time is needed, the training time and training effect can be well estimated, and a model with optimal comprehensive performance can be obtained within a specified time.
The practical application process of the model training method provided by the present application is explained with reference to fig. 3. As shown in fig. 3, a random initial model is required first; "random initial model" means randomly initializing the network parameters of the model to be trained. Based on this initialized model, the hyper-parameters to be trained in the first stage are selected, the search range corresponding to those hyper-parameters is determined, a hyper-parameter optimization search is performed over that range to determine a plurality of sets of hyper-parameters, and automatic machine learning training is performed based on these sets of hyper-parameters to obtain the optimal model of the first training stage. It is then determined whether a next stage of training is needed; after the first training stage, training is not yet complete and a second training stage is required. In the second training stage, the optimal model obtained by the previous stage, i.e. the first training stage, is used as the initial model to start training; the process is similar to that of the first stage, except that the hyper-parameters to be optimized in the second stage differ from those of the first stage. After the second training stage, the third training stage proceeds, and the iterative training continues in this way until the last training stage is finished, at which point the optimal model obtained by the last training stage is used as the final target model.
For convenience of understanding, the following takes a model training scenario for a game AI as an example to give an exemplary description of the model training method provided by the present application.
In the model of the game AI, there are 5 hyper-parameters, 3 of which need to be optimized: a monetary hyper-parameter, a survival hyper-parameter, and a tower-push hyper-parameter. The monetary hyper-parameter refers to a hyper-parameter capable of guiding the game AI to master the ability to make money; the survival hyper-parameter refers to a hyper-parameter capable of guiding the game AI to master the ability to survive and avoid being killed; the tower-push hyper-parameter refers to a hyper-parameter capable of guiding the game AI to knock down enemy buildings, and can also be understood as a hyper-parameter giving the game AI the ability to fight.
Considering that the requirement on monetary capability is high in the early stage of the game AI's operation, while survival capability and tower-pushing capability matter more in the later stage, the monetary hyper-parameter is synchronously optimized and searched in the early stage of model training, and the survival and tower-push hyper-parameters are synchronously optimized and searched in the later stage. The model training process of the game AI is therefore divided into two training stages: the first stage synchronously optimizes and searches the monetary hyper-parameter during model training, and the second stage synchronously optimizes and searches the survival and tower-push hyper-parameters. Thus, after the two training stages, a game AI model hybridized with optimal monetary capability, optimal survival capability, and optimal tower-pushing capability is trained.
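The two-stage schedule just described can be written down as a hypothetical configuration. The structure and key names below are assumptions introduced for illustration; the ranges, defaults, and search counts follow the worked example given in this document.

```python
# Stage 1 searches the monetary hyper-parameter while dead/tower stay
# at their training defaults; stage 2 searches dead and tower, with
# money inherited from the stage-1 optimal model.
STAGES = [
    {
        "steps": 500_000,  # "50w" training steps
        "search": {"money": (0.0, 0.4)},
        "fixed": {"dead": 0.5, "tower": 0.2},  # training defaults
        "n_searches": 3,
    },
    {
        "steps": 500_000,
        "search": {"dead": (0.0, 0.4), "tower": (0.2, 0.8)},
        "fixed": {},  # money carried over from stage-1 optimum
        "n_searches": 5,
    },
]
```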
The following describes a specific training process of the game AI model.
Assuming that the total training step budget StepMax of the game AI model is 1,000,000 (100w) steps, the training process of the game AI model is divided into two training stages, each corresponding to a training step count of 500,000 (50w).
First, an initial model of the game AI is created and recorded as checkpoint0, and a suitable hyper-parameter optimization algorithm for automatic machine learning is selected, for example, in the present embodiment, a random hyper-parameter optimization algorithm is selected as an example.
Secondly, the hyper-parameters to be optimized in each training stage (for example, the monetary hyper-parameter), the hyper-parameter search range corresponding to each, and the hyper-parameters not to be optimized together with their default values are determined. These parameters are shown in table 1:
TABLE 1 model training parameter Table
(Table 1 appears as an image in the original publication and is not reproduced here.)
Then the first training stage begins, training from the initial model checkpoint0 of the game AI. First, the monetary hyper-parameter is optimized and searched with the random hyper-parameter optimization algorithm according to the preset number of searches; during this search, the other hyper-parameters in the model take their default values, for example the training default of the survival hyper-parameter is 0.5 and that of the tower-push hyper-parameter is 0.2 (the other hyper-parameters also have corresponding defaults, not enumerated here).
Specifically, the random hyper-parameter optimization algorithm may be used to search the monetary hyper-parameter money 3 times within [0, 0.4] (the number of searches in the first training stage may be set according to requirements; 3 is used here only as an example), with dead and tower at their default values (dead = 0.5, tower = 0.2), yielding the three sets of hyper-parameters shown in table 2 below.
TABLE 2 first stage hyper-parametric optimization search results
Number of searches    Searched hyper-parameters
1                     money=0.12, dead=0.5, tower=0.2
2                     money=0.3,  dead=0.5, tower=0.2
3                     money=0.23, dead=0.5, tower=0.2
It should be noted that the values of the survival hyper-parameter dead and the tower-push hyper-parameter tower in table 2 are their training default values; only the monetary hyper-parameter money undergoes the hyper-parameter optimization search. Table 2 shows the values of all three hyper-parameters only for convenience in understanding the hyper-parameter configuration.
Model training in the first training stage is performed based on the hyper-parameter configurations in table 2. An optimal model is selected from the 3 intermediate models obtained after the first training stage; assuming the selected optimal model has money = 0.23, it is recorded as checkpoint1.
Next, training is performed in a second training phase.
The model training of the second training stage continues from the optimal model checkpoint1 obtained in the first stage as the initial model, i.e. model training is performed on the basis of checkpoint1. Since the second stage is used to perform optimization search on the survival hyper-parameter dead and the tower-push hyper-parameter tower, a random hyper-parameter optimization search is performed 5 times over [0, 0.4] for dead and [0.2, 0.8] for tower, based on the initial model checkpoint1 (the number of searches in the second stage may likewise be set according to requirements; 5 is used only as an example). The search results are shown in table 3.
TABLE 3 second stage hyper-parametric optimization search results
(Table 3 appears as images in the original publication and is not reproduced here.)
It should be noted that the value of the monetary hyper-parameter money in table 3 is the optimal value trained in the first stage, that is, the value of money in the optimal model of the first training stage. In table 3, only the survival hyper-parameter dead and the tower-push hyper-parameter tower undergo the hyper-parameter optimization search; the values of all three hyper-parameters are shown only for convenience in understanding the hyper-parameter configuration.
The second training stage performs model training on checkpoint1 based on the 5 hyper-parameter configurations in table 3. An optimal model is selected from the 5 intermediate models obtained after the second stage; assuming the selected optimal model has money = 0.23, dead = 0.45, and tower = 0.32, it is recorded as checkpoint2.
Because the whole training process is divided into only two training stages, after both stages have been executed, the optimal model checkpoint2 obtained in the second stage is directly used as the finally trained model; that is, the searched checkpoint2 is a model possessing money-making, tower-pushing, and survival capabilities, with the comprehensive performance of the game AI optimized.
The above is only explained by taking the example of dividing the model training process into two training phases.
In practical applications the model may be divided into more training stages. In that case training proceeds stage by stage from the first training stage, the optimal model obtained by each stage directly serving as the model base of the next, iterating until the last training stage; the optimal model obtained by the last stage is taken as the final model, which is the optimal model hybridized with the optimal capabilities of multiple hyper-parameters.
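The overall staged procedure (steps S202/S203) can be sketched compactly as follows. This is a hedged outline only: `sample`, `train`, and `score` are stand-ins introduced here for the hyper-parameter search, the per-candidate stage training, and the performance evaluation respectively.

```python
def staged_training(stages, initial_model, sample, train, score):
    """Run the continuous training stages in order: each stage searches
    its own hyper-parameters, trains candidates starting from the
    previous stage's optimal model, and passes the winner forward."""
    model = initial_model
    for stage in stages:
        candidates = [train(model, hp, stage) for hp in sample(stage)]
        model = max(candidates, key=score)  # optimal model of this stage
    return model  # optimal model of the last stage = final target model
```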
To facilitate practical deployment of the method provided by the present application, the embodiment of the present application further provides a model training apparatus, which is explained below with reference to fig. 4.
Referring to fig. 4, fig. 4 is a block diagram illustrating a model training apparatus according to an embodiment of the present application, where the apparatus 400 includes:
a determining module 401, configured to determine multiple continuous training phases corresponding to a model, where different training phases are used to synchronously optimize different hyper-parameters in a model training process;
a first training module 402, configured to perform, in a current training stage of the multiple continuous training stages, a hyper-parameter optimization search according to a hyper-parameter search range corresponding to a hyper-parameter to be optimized in the current training stage, and obtain an optimal model obtained by training in the current training stage as an optimal model in the current training stage;
a second training module 403, configured to use the optimal model in the current training phase as an initial model in a next training phase, perform a hyper-parameter optimization search according to a hyper-parameter search range corresponding to a hyper-parameter to be optimized in the next training phase, obtain an optimal model obtained by training in the next training phase as the optimal model in the next training phase, until obtaining an optimal model obtained by training in a last training phase of the multiple consecutive training phases.
Optionally, the determining module 401 is specifically configured to determine the number N of hyper-parameter capability crosses and the hyper-parameter search range corresponding to the hyper-parameters involved in the ith cross, where N is an integer greater than 1 and i takes every positive integer from 1 to N; and,
setting the N continuous training stages for model training, wherein the ith training stage is used to optimize the hyper-parameters involved in the ith cross.
Optionally, the first training module 402 and the second training module 403 perform a hyper-parameter optimization search, specifically configured to: and performing random hyper-parameter optimization search according to preset search times in the hyper-parameter search range through a random hyper-parameter optimization algorithm.
Optionally, when obtaining the optimal model obtained by training in the current training stage, the first training module 402 is specifically configured to:
when the training step number in the current training stage reaches a preset step number, acquiring a plurality of intermediate models obtained by training after carrying out hyper-parameter optimization search in the current training stage;
and selecting one intermediate model with the model performance meeting the preset condition and the optimal performance from the plurality of intermediate models, and taking the selected intermediate model as the optimal model obtained by training in the current training stage.
Optionally, the first training module 402 is further configured to, when the performances of the multiple intermediate models do not meet a preset condition, use a suboptimal model obtained by training in a previous training stage as an initial model of the current training model, and return to repeatedly execute the steps of: and performing hyper-parameter optimization search according to a hyper-parameter search range corresponding to the hyper-parameter to be optimized in the current training stage to obtain an optimal model obtained by training in the current training stage as the optimal model in the current training stage.
Optionally, at least one of the plurality of consecutive training phases is used to optimize one hyper-parameter, and at least another training phase is used to optimize multiple hyper-parameters synchronously.
Optionally, the model is a fight decision model of the game agent, and the fight decision model is used for deciding the next action of the game agent according to the current state.
Optionally, the value of N is 2, the first training stage is configured to optimize the first hyper-parameter, and the second training stage is configured to optimize the second hyper-parameter and the third hyper-parameter synchronously; the importance degree of the first super-parameter at the early stage of the running of the game agent is higher than the importance degrees of the second super-parameter and the third super-parameter.
For specific implementation of the model training apparatus shown in fig. 4, reference may be made to the related description in the above embodiment of the model training method, and details are not repeated here.
Next, a device provided in the embodiment of the present application is explained, and the device provided in the embodiment of the present application is mainly used for implementing the model training method in the embodiment of the present application.
Specifically, the apparatus may include: a memory and a processor;
the memory for storing a computer program;
the processor is configured to run the computer program to perform the method steps described in the embodiments of the present application.
The apparatus provided by the embodiment of the present application is explained by an example, referring to fig. 5, fig. 5 is a schematic structural diagram of a server provided by the embodiment of the present application, and the server 500 may have a relatively large difference due to different configurations or performances, and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors) and a memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing an application program 542 or data 544. Memory 532 and storage media 530 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the server 500.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, and/or one or more operating systems 541, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so forth.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 5.
The CPU 522 is configured to perform the following steps:
determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used for synchronously optimizing different hyper-parameters in the training process of the model;
in the current training stage of the plurality of continuous training stages, carrying out hyper-parameter optimization search according to a hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and obtaining an optimal model obtained by training in the current training stage as an optimal model in the current training stage;
and taking the optimal model in the current training stage as an initial model in the next training stage, performing hyper-parameter optimization search according to a hyper-parameter search range corresponding to hyper-parameters to be optimized in the next training stage, and obtaining the optimal model obtained by training in the next training stage as the optimal model in the next training stage until obtaining the optimal model obtained by training in the last training stage in the plurality of continuous training stages.
Optionally, the CPU 522 may also execute the method steps of any specific implementation of the model training method provided in this embodiment, which are not described here again; refer to the description above.
In addition, in view of the research and development requirements of the actual game AI, the embodiment of the present application further provides a model training system, and referring to fig. 6, fig. 6 shows a structural diagram of the model training system provided by the embodiment of the present application, and the system 600 includes:
a plurality of first servers 601 and second servers 602;
the first server 601 is configured to run a model of an agent to obtain run data;
the second server 602 is configured to train the model by using the operation data generated by the plurality of first servers as sample data in the following manner:
determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used for synchronously optimizing different hyper-parameters in the model training process;
in the current training stage of the plurality of continuous training stages, performing a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and taking the best model obtained by training in the current training stage as the optimal model of the current training stage;
and taking the optimal model of the current training stage as the initial model of the next training stage, performing a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the next training stage, and taking the best model obtained by training in the next training stage as the optimal model of the next training stage, until the optimal model trained in the last of the plurality of continuous training stages is obtained.
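The division of labor between the first servers and the second server can be illustrated with the sketch below. The queue-and-thread wiring, the shape of the run-data records, and the function names are assumptions made for illustration; a real deployment would use network transport between separate machines and feed the collected samples into the staged hyper-parameter search.

```python
from queue import Queue
from threading import Thread

def first_server(model, n_episodes, out_queue):
    """Run the agent's model and push one run-data record per episode."""
    for episode in range(n_episodes):
        # Placeholder rollout: a real first server would step a game
        # environment with the model and record states, actions and rewards.
        out_queue.put({"episode": episode, "model_version": model["version"]})

def second_server(in_queue, n_samples):
    """Collect run data from the first servers for use as training samples."""
    samples = [in_queue.get() for _ in range(n_samples)]
    # A real second server would train the model on these samples using
    # the staged hyper-parameter optimization described above.
    return samples

# Three first servers produce run data concurrently; the second server consumes it.
run_data_queue = Queue()
workers = [Thread(target=first_server, args=({"version": 1}, 5, run_data_queue))
           for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
samples = second_server(run_data_queue, 15)
```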
Of course, the second server may also execute the method steps of any specific implementation of the model training method provided in this embodiment, which are not repeated here; refer to the description above.
It should be noted that, for the structures of the first server and the second server in the system 600, reference may be made to fig. 4 and its related description, which are not repeated here.
In addition, to facilitate putting the method provided by the embodiments of the present application into industrial application, the present application further provides a computer-readable storage medium storing a computer program, the computer program being used to execute the model training method provided by the embodiments of the present application; for the specific method, refer to fig. 2 and the description corresponding to fig. 2.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a logical function division, and other divisions may exist in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (11)

1. A method of model training, the method comprising:
determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used to optimize different hyper-parameters in synchronization with the training of the model;
in the current training stage of the plurality of continuous training stages, performing a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and taking the best model obtained by training in the current training stage as the optimal model of the current training stage;
and taking the optimal model of the current training stage as the initial model of the next training stage, performing a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the next training stage, and taking the best model obtained by training in the next training stage as the optimal model of the next training stage, until the optimal model trained in the last of the plurality of continuous training stages is obtained.
2. The method of claim 1, wherein determining the plurality of continuous training stages corresponding to the model comprises:
determining a number N of hyper-parameter optimization rounds and the hyper-parameter search range corresponding to the hyper-parameters involved in the ith optimization round, wherein N is an integer greater than 1, and i takes all positive integer values from 1 to N;
and setting N continuous training stages for model training, wherein the ith training stage is used to optimize the hyper-parameters involved in the ith optimization round.
3. The method of claim 1, wherein performing the hyper-parameter optimization search comprises: performing, by means of a random hyper-parameter optimization algorithm, a random hyper-parameter optimization search within the hyper-parameter search range for a preset number of search iterations.
4. The method of claim 1, wherein obtaining the optimal model trained in the current training stage comprises:
when the number of training steps in the current training stage reaches a preset number of steps, acquiring a plurality of intermediate models obtained by training after the hyper-parameter optimization search in the current training stage;
and selecting, from the plurality of intermediate models, the intermediate model whose performance satisfies a preset condition and is the best, and taking the selected intermediate model as the optimal model trained in the current training stage.
5. The method of claim 1, wherein when none of the plurality of intermediate models satisfies the preset condition, the method further comprises:
taking the suboptimal model obtained by training in the previous training stage as the initial model of the current training stage, and returning to repeat the step of: performing the hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, to obtain the best model trained in the current training stage as the optimal model of the current training stage.
6. The method of claim 1, wherein at least one of the plurality of continuous training stages is used to optimize a single hyper-parameter, and at least one other training stage is used to optimize a plurality of hyper-parameters simultaneously.
7. The method of any one of claims 1 to 6, wherein the model is a match decision model for a game agent, the match decision model being used to decide the next action of the game agent according to its current state.
8. A model training apparatus, the apparatus comprising:
the determining module is configured to determine a plurality of continuous training stages corresponding to the model, wherein different training stages are used to optimize different hyper-parameters in synchronization with the training of the model;
the first training module is configured to, in the current training stage of the plurality of continuous training stages, perform a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and take the best model obtained by training in the current training stage as the optimal model of the current training stage;
and the second training module is configured to take the optimal model of the current training stage as the initial model of the next training stage, perform a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the next training stage, and take the best model obtained by training in the next training stage as the optimal model of the next training stage, until the optimal model trained in the last of the plurality of continuous training stages is obtained.
9. An apparatus, comprising:
a memory and a processor;
the memory for storing a computer program;
the processor is configured to execute the computer program to perform the method of any one of claims 1 to 7.
10. A model training system, comprising:
a plurality of first servers and a second server;
the first server is used for operating the model of the agent to obtain operation data;
the second server is configured to train the model, using the run data generated by the plurality of first servers as sample data, in the following manner:
determining a plurality of continuous training stages corresponding to the model, wherein different training stages are used to optimize different hyper-parameters in synchronization with the training of the model;
in the current training stage of the plurality of continuous training stages, performing a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the current training stage, and taking the best model obtained by training in the current training stage as the optimal model of the current training stage;
and taking the optimal model of the current training stage as the initial model of the next training stage, performing a hyper-parameter optimization search within the hyper-parameter search range corresponding to the hyper-parameters to be optimized in the next training stage, and taking the best model obtained by training in the next training stage as the optimal model of the next training stage, until the optimal model trained in the last of the plurality of continuous training stages is obtained.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any of claims 1 to 7.
CN201911048084.5A 2019-10-30 2019-10-30 Model training method, device, equipment, system and storage medium Pending CN110766090A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911048084.5A CN110766090A (en) 2019-10-30 2019-10-30 Model training method, device, equipment, system and storage medium

Publications (1)

Publication Number Publication Date
CN110766090A true CN110766090A (en) 2020-02-07

Family

ID=69333279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911048084.5A Pending CN110766090A (en) 2019-10-30 2019-10-30 Model training method, device, equipment, system and storage medium

Country Status (1)

Country Link
CN (1) CN110766090A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639759A (en) * 2020-06-01 2020-09-08 深圳前海微众银行股份有限公司 Neural network model protection method, device, equipment and readable storage medium
CN112073517A (en) * 2020-09-09 2020-12-11 鹏城实验室 Distributed data transmission optimization method, system and related equipment
CN112073517B (en) * 2020-09-09 2023-07-11 鹏城实验室 Distributed data transmission optimization method, system and related equipment
CN112257561A (en) * 2020-10-20 2021-01-22 广州云从凯风科技有限公司 Human face living body detection method and device, machine readable medium and equipment
CN113327461A (en) * 2021-08-03 2021-08-31 杭州海康威视数字技术股份有限公司 Cooperative unmanned aerial vehicle detection method, device and equipment
CN113327461B (en) * 2021-08-03 2021-11-23 杭州海康威视数字技术股份有限公司 Cooperative unmanned aerial vehicle detection method, device and equipment
CN113780575A (en) * 2021-08-30 2021-12-10 征图智能科技(江苏)有限公司 Super-parameter optimization method of progressive deep learning model
CN113780575B (en) * 2021-08-30 2024-02-20 征图智能科技(江苏)有限公司 Visual classification method based on progressive deep learning model
TWI835638B (en) * 2022-05-04 2024-03-11 國立清華大學 Master policy training method of hierarchical reinforcement learning with asymmetrical policy architecture

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40020931

Country of ref document: HK

SE01 Entry into force of request for substantive examination