CN111481935A - Configuration method, device, equipment and medium for AI models of games with different styles - Google Patents
- Publication number
- CN111481935A CN111481935A CN202010273066.3A CN202010273066A CN111481935A CN 111481935 A CN111481935 A CN 111481935A CN 202010273066 A CN202010273066 A CN 202010273066A CN 111481935 A CN111481935 A CN 111481935A
- Authority
- CN
- China
- Prior art keywords
- optimization
- target
- game
- algorithm
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application provides a method, apparatus, device, and medium for configuring game AI models of different styles, relating to the technical field of software algorithms. The configuration method includes: determining a plurality of different optimization objectives, where different optimization objectives correspond to different styles of the game AI model; determining multiple sets of parameter values of the optimization objectives for the n-th generation according to a multi-objective evolutionary algorithm; training the game AI model according to the multiple sets of parameter values to determine corresponding multiple sets of optimization target values; and, when at least one target optimization target value meeting an optimization threshold exists among the multiple sets of optimization target values, retaining the trained game AI model corresponding to that at least one target optimization target value.
Description
Technical Field
The application relates to the technical field of software algorithms, in particular to a configuration method, a device, equipment and a medium for AI models of games with different styles.
Background
Online games require a large variety of in-game objects. Non-player characters (NPCs) with different styles can greatly improve players' sense of freshness, thereby improving the game experience and entertainment value.
In the prior art, NPC training in games is generally based on Reinforcement Learning (RL): the weight ratio of the various excitation (reward) signals is adjusted through manual parameter tuning to guide NPC training, so as to produce NPCs of various styles.
However, because the existing NPC-training method requires manual parameter tuning, it suffers from limited optimization capability and low output efficiency.
Disclosure of Invention
The present application aims to provide a configuration method, apparatus, device, and medium for game AI models of different styles that address the deficiencies in the prior art, namely the limited optimization capability and low output efficiency of manual parameter tuning.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a method for configuring AI models of games with different styles, including:
determining a plurality of different optimization objectives, wherein the different optimization objectives correspond to different styles of the game AI model;
determining parameter values of a plurality of groups of optimization targets of the nth generation according to a multi-target evolutionary algorithm;
training the game AI model according to the multiple sets of parameter values of the optimization objectives to determine the corresponding multiple sets of optimization target values;
and when at least one target optimization target value meeting the optimization threshold exists in the multiple groups of optimization target values, reserving the trained game AI model corresponding to the at least one target optimization target value.
Optionally, the parameter values of the optimization objective include neural network parameters and excitation signal weights, wherein the neural network parameters are calculated according to a reinforcement learning algorithm, and each excitation signal weight corresponds to a corresponding excitation signal.
Optionally, training the game AI model according to the multiple sets of parameter values of the optimization objectives to determine the corresponding multiple sets of optimization target values includes:
using the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm, substituting the multiple sets of parameter values of the optimization objectives into it, and training the game AI model to determine the corresponding multiple sets of optimization target values.
Optionally, using the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm, substituting the multiple sets of parameter values of the optimization objectives, and training the game AI model to determine the corresponding multiple sets of optimization target values includes:
using the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm and substituting the multiple sets of parameter values of the optimization objectives, to obtain the substituted reinforcement learning algorithm;
acquiring an excitation scalar according to each excitation signal weight and its corresponding excitation signal;
and training the game AI model according to the substituted reinforcement learning algorithm and the excitation scalar to determine the corresponding multiple sets of optimization target values.
Optionally, the method further comprises:
and when no optimization target value meeting the optimization threshold exists among the multiple sets of optimization target values, adjusting the excitation signal weights according to the multi-objective evolutionary algorithm.
In a second aspect, an embodiment of the present application provides an apparatus for configuring AI models of games with different styles, including: the device comprises a first determining module, a second determining module, a training module and a reserving module.
The first determining module is used for determining a plurality of different optimization targets, wherein the different optimization targets correspond to different styles of the game AI model;
the second determining module is used for determining the parameter values of the multiple groups of optimization targets of the nth generation according to the multi-target evolutionary algorithm;
the training module is configured to train the game AI model according to the multiple sets of parameter values of the optimization objectives to determine the corresponding multiple sets of optimization target values;
the reservation module is configured to, when at least one target optimization target value that meets an optimization threshold exists in the multiple sets of optimization target values, reserve the trained game AI model corresponding to the at least one target optimization target value.
Optionally, the parameter values of the optimization objective include neural network parameters and excitation signal weights, wherein the neural network parameters are calculated according to a reinforcement learning algorithm, and each excitation signal weight corresponds to a corresponding excitation signal.
Optionally, the training module is specifically configured to use the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm, substitute the multiple sets of parameter values of the optimization objectives, and train the game AI model to determine the corresponding multiple sets of optimization target values.
Optionally, the training module is specifically configured to: substitute the multiple sets of parameter values of the optimization objectives into the reinforcement learning algorithm serving as the mutation operator of the multi-objective evolutionary algorithm, to obtain the substituted reinforcement learning algorithm;
acquire an excitation scalar by weighting each excitation signal with its corresponding excitation signal weight;
and train the game AI model according to the substituted reinforcement learning algorithm and the excitation scalar to determine the corresponding multiple sets of optimization target values.
Optionally, the apparatus further comprises: and the adjusting module is used for adjusting the excitation signal weight according to a multi-objective evolutionary algorithm when at least one target optimization target value meeting an optimization threshold value does not exist in the multiple groups of optimization target values.
In a third aspect, an embodiment of the present application provides an electronic device, including: the game device comprises a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to execute the steps of the configuration method of the different-style game AI model of the first aspect.
In a fourth aspect, the present application provides a storage medium, where a computer program is stored on the storage medium, and the computer program is executed by a processor to perform the steps of the configuration method for AI models of games with different styles according to the first aspect.
The beneficial effect of this application is:
in the configuration method, the device, the equipment and the medium for the game AI models with different styles, provided by the embodiment of the application, a plurality of different optimization targets are determined, wherein the different optimization targets correspond to different styles of the game AI models; determining parameter values of a plurality of groups of optimization targets of the nth generation according to a multi-target evolutionary algorithm; training a game AI model according to the parameter values of the multiple groups of optimization targets to determine corresponding multiple groups of optimization target values; when at least one target optimization target value meeting the optimization threshold exists in the multiple groups of optimization target values, the trained game AI model corresponding to the at least one target optimization target value is reserved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of a configuration method of AI models of games of different styles according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another configuration method of AI models of different styles according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a configuration method of AI models of games with different styles according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating another configuration method of AI models for games of different styles according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an apparatus for configuring AI models of games of different styles according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another configuration apparatus for AI models of games with different styles according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Fig. 1 is a schematic flowchart of a configuration method of AI models of games of different styles according to an embodiment of the present application, where an execution subject of the method may be a computer, a server, a processor, or other devices that can perform data processing, and as shown in fig. 1, the method includes:
s101, determining a plurality of different optimization targets, wherein the different optimization targets correspond to different styles of the game AI model.
The game AI (Artificial Intelligence) model may be a model corresponding to a non-player character (NPC) in a game. The game AI model may correspond to different styles, which refer to in-game behavior styles, and each style may correspond to its own optimization objective. For example, a game AI model may include an aggressive style and a conservative style: the optimization objective for the aggressive style may be the accumulated damage inflicted on opponents by the attacks of the NPC corresponding to the game AI model, while the optimization objective for the conservative style may be that NPC's average escape speed over the whole match. The styles and objectives are not limited to these; others may be included according to the actual application scenario.
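As an illustrative sketch (not taken from the patent), the two style objectives above can be written as functions of hypothetical per-match statistics; the field names and data layout here are assumptions made for the example:

```python
# Hypothetical match statistics -> optimization target values (maximize both).

def aggressive_objective(match_stats):
    """Accumulated damage the NPC dealt over the match."""
    return sum(match_stats["damage_per_step"])

def conservative_objective(match_stats):
    """Average escape speed of the NPC over the match."""
    speeds = match_stats["escape_speeds"]
    return sum(speeds) / len(speeds)

stats = {"damage_per_step": [5.0, 0.0, 12.5], "escape_speeds": [1.2, 3.4]}
print(aggressive_objective(stats))    # 17.5
print(conservative_objective(stats))  # 2.3
```

An aggressive model scores high on the first function and a conservative one on the second, which is what makes the two objectives conflict.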
Optionally, the multiple optimization objectives may conflict with one another; for example, the optimization objectives under the aggressive and conservative styles are in conflict.
S102, determining parameter values of multiple groups of optimization targets of the nth generation according to a multi-target evolutionary algorithm.
Before introducing the multi-objective evolutionary algorithm, consider the Evolutionary Algorithm (EA), an optimization algorithm that simulates natural selection. A population is formed by multiple individuals; in each generation, parents produce offspring through genetic operations (crossover and mutation), and through survival of the fittest the population continuously searches for the optimal solution of the problem. At the algorithm level, an individual may include decision variables (e.g., the parameter values of the aforementioned optimization objectives) and other associated data (e.g., the optimization target values corresponding to the aforementioned multiple different optimization objectives). Note that "parents" and "offspring" are both "individuals" in the evolutionary algorithm: offspring are generated from parents through crossover and mutation, and differ from, but largely resemble, their parents.
Since the multiple different optimization objectives reflect the multiple styles of the game AI model, together they constitute a Multi-objective Optimization Problem (MOP), represented by multiple different optimization objective functions. Each objective function takes the parameter values of the corresponding optimization objective as input, and those parameter values determine the function's value, i.e., the optimization target value of that objective. Built on EA, the Multi-Objective Evolutionary Algorithm (MOEA) is a black-box optimization algorithm that simulates natural selection and has unique advantages for handling multi-objective optimization problems; it is therefore selected here. An evolutionary population of MOEA is composed of multiple individuals. In each MOEA generation, crossover and mutation produce the multiple sets of parameter values of the optimization objectives for the n-th generation; the corresponding multiple sets of optimization target values can then be obtained from the multiple different objective functions and those parameter values. The value of n may be any integer greater than 0, chosen according to the actual application scenario.
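The generational scheme just described can be sketched as a minimal, elitist evolutionary loop; this is an illustrative toy (single decision variable, made-up operators), not the patent's exact procedure:

```python
import random

def evolve(population, evaluate, mutate, crossover, n_generations):
    """Minimal elitist generational loop: each generation breeds
    offspring via crossover + mutation, then keeps the best half
    of the combined parent/offspring pool."""
    for _ in range(n_generations):
        offspring = [mutate(crossover(*random.sample(population, 2)))
                     for _ in range(len(population))]
        pool = population + offspring
        population = sorted(pool, key=evaluate, reverse=True)[:len(population)]
    return population

# Toy demo: maximize f(x) = -(x - 3)^2, optimum at x = 3.
random.seed(0)
final = evolve(
    population=[0.0, 10.0, -5.0, 8.0],
    evaluate=lambda x: -(x - 3.0) ** 2,
    mutate=lambda x: x + random.gauss(0.0, 0.5),
    crossover=lambda a, b: (a + b) / 2.0,
    n_generations=30,
)
print(final[0])  # best individual after evolution
```

In the patent's setting the single fitness value would be replaced by a vector of optimization target values and the sort by a non-dominated selection such as NSGA-II.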
Optionally, MOEA algorithms such as NSGA-II or SMS-EMOA may be used: NSGA-II reduces the complexity of non-dominated sorting genetic algorithms and offers fast execution and good solution-set convergence, while SMS-EMOA covers as large a hypervolume as possible with a limited number of points. The choice may be made according to the actual application scenario, and the application is not limited herein.
S103, training the game AI model according to the multiple sets of parameter values of the optimization objectives to determine the corresponding multiple sets of optimization target values.
After the parameter values of the multiple groups of optimization targets are determined, the game AI model can be trained according to the parameter values of the multiple groups of optimization targets to determine corresponding multiple groups of optimization target values, and the optimization target values can be used for counting and quantifying the performance and behavior of the game AI model so as to measure the style corresponding to the game AI model.
It should be noted that, compared with a conventional single-objective optimization algorithm, which can only obtain one optimal solution, the MOEA provided in this embodiment can obtain a non-dominated solution set, i.e., a set of parameter values of the optimization objectives, each with its own strengths and weaknesses across the different objectives. For example, under one set of parameter values, the style of the game AI model corresponding to the first optimization objective may be an aggressive type that attacks relentlessly but rarely escapes; the style corresponding to the second objective may be a conservative type that always evades and never attacks; and the style corresponding to the third objective may be a normal type that attacks while escaping. After MOEA runs for multiple generations, multiple non-dominated solutions can be obtained, corresponding to multiple sets of parameter values of the optimization objectives.
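The notion of a non-dominated solution set can be made concrete with a short sketch (standard Pareto-dominance check, not code from the patent):

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (maximization):
    a is at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(solutions):
    """Keep the solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# (damage, escape_speed) for four hypothetical trained models
front = non_dominated([(10, 1), (2, 8), (1, 1), (6, 5)])
print(front)  # [(10, 1), (2, 8), (6, 5)] — (1, 1) is dominated by (10, 1)
```

Each surviving vector represents a different trade-off between the styles, which is exactly why the MOEA yields models of several styles at once.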
And S104, when at least one target optimization target value meeting the optimization threshold exists in the multiple groups of optimization target values, reserving the trained game AI model corresponding to the at least one target optimization target value.
Alternatively, the multiple different optimization objectives may be represented by multiple different optimization objective functions, each with a corresponding optimization target value. For example, suppose the objectives include a first optimization objective with objective function f1 and a second optimization objective with objective function f2, written y1 = max f1(x) and y2 = max f2(x), where y1 and y2 respectively represent the optimization target values corresponding to the first and second optimization objective functions.
Taking the first optimization objective as an example: the optimization threshold may be a preset optimization target value that reflects the expected behavior style. Suppose two sets of parameter values x1 and x2 are determined according to the multi-objective evolutionary algorithm. For the first objective, the game AI model is trained under x1 and x2 to determine two corresponding first optimization target values, y11 = max f1(x1) and y12 = max f1(x2); whether to retain the game AI model corresponding to the first objective can then be decided from y11, y12, and the optimization threshold. Note that different optimization objectives may correspond to different optimization thresholds, say Y1 for the first objective and Y2 for the second. Optionally, y11 and y12 are each compared with Y1: if y11 is greater than Y1 and y12 is less than Y1, then at least one target optimization target value (y11) meeting the threshold exists among the first optimization target values, so the trained game AI model corresponding to y11 is retained and the one corresponding to y12 is discarded.
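The retention step above amounts to a simple per-objective filter; the following is a minimal sketch with hypothetical model handles and made-up numbers, not the patent's implementation:

```python
def retain_models(models, target_values, threshold):
    """Keep the trained models whose optimization target value
    exceeds the per-objective optimization threshold."""
    return [m for m, y in zip(models, target_values) if y > threshold]

models = ["model_x1", "model_x2"]   # trained under parameter sets x1, x2
y_values = [0.9, 0.4]               # y11, y12 for the first objective
print(retain_models(models, y_values, threshold=0.6))  # ['model_x1']
```

Running the same filter per objective, with that objective's own threshold, reproduces the retain/discard decision described above.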
In addition, the game AI model can be continuously trained to determine the corresponding first optimization target value according to the parameter values of other optimization targets (for example, x3, x4, etc.) at a later stage, so that the optimal target value can be obtained. According to the process, the trained game AI model corresponding to each optimization target can be reserved according to a plurality of different optimization targets, and each game AI model can correspond to different styles to realize the diversity of the styles of the game AI models.
In practical use, a user only needs to select, according to the desired style, the target game AI model from among those retained; the selected model then runs under its corresponding set of optimization-objective parameter values to exhibit the corresponding target style. For example, when using the game service, a player may choose a suitable game AI model from the three styles of aggressive, conservative, and normal (e.g., selecting the aggressive-style game AI model as an opponent).
In summary, the configuration method of the game AI models with different styles provided by the embodiment of the present application determines a plurality of different optimization objectives, wherein the different optimization objectives correspond to different styles of the game AI models; determining parameter values of a plurality of groups of optimization targets of the nth generation according to a multi-target evolutionary algorithm; training a game AI model according to the parameter values of the multiple groups of optimization targets to determine corresponding multiple groups of optimization target values; when at least one target optimization target value meeting the optimization threshold exists in the multiple groups of optimization target values, the trained game AI model corresponding to the at least one target optimization target value is reserved.
Optionally, the parameter values of the optimization objective include neural network parameters and excitation signal weights; the neural network parameters are calculated according to a reinforcement learning algorithm, and each excitation signal weight corresponds to a corresponding excitation signal.
If x1 and x2 each represent a set of parameter values of the optimization objectives (also called decision variables in MOEA), then x1 may include two parts, the neural network parameters θ1 and the excitation signal weight W1, and x2 likewise includes the neural network parameters θ2 and the excitation signal weight W2. The parameters θ1 and θ2 can be calculated according to the reinforcement learning algorithm, and each excitation signal weight corresponds to an excitation signal: for example, W1 corresponds to excitation signal r1, and W2 corresponds to excitation signal r2.
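The two-part decision variable can be sketched as a small container type; the field names and values are illustrative assumptions, not the patent's data structures:

```python
from dataclasses import dataclass

@dataclass
class DecisionVariable:
    """An MOEA individual's decision variable: the neural network
    parameters theta (found by the inner RL training loop) plus one
    weight per excitation (reward) signal."""
    theta: list            # flattened neural network parameters
    signal_weights: dict   # e.g. {"attack": w1, "defense": w2}

x1 = DecisionVariable(theta=[0.1, -0.3],
                      signal_weights={"attack": 0.8, "defense": 0.2})
```

Crossover and mutation act on such objects, while the RL inner loop refreshes `theta` whenever `signal_weights` changes.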
It should be noted that, in this embodiment, θ1 and θ2 do not refer to a single parameter of the neural network model but to all of its parameters. Optionally, the neural network model may be a deep neural network model, but it is not limited thereto.
Fig. 2 is a schematic flowchart of another configuration method of AI models of games with different styles according to an embodiment of the present application. Optionally, as shown in fig. 2, the training the game AI model according to the parameter values of the multiple sets of optimization targets to determine the corresponding multiple sets of optimization target values includes:
s201, taking the reinforced algorithm as a mutation operator of the multi-objective evolutionary algorithm, substituting parameter values of multiple groups of optimization targets, and training the game AI model to determine corresponding multiple groups of optimization target values.
Using the reinforcement learning algorithm as the mutation operator gives the multi-objective evolutionary algorithm a degree of local random-search capability: it accelerates convergence to the optimal solution in the later stages of solving while maintaining solution diversity, so that multiple sets of optimization target values can be obtained. Specifically, the multiple sets of parameter values of the optimization objectives are substituted into the reinforcement learning algorithm to train the game AI model, and the multiple sets of optimization target values corresponding to the multiple different objectives are determined through training.
For example, for a given excitation signal weight, the weight and its corresponding excitation signal may be fed into the reinforcement learning algorithm; after the neural network model converges through repeated learning and parameter adjustment, the neural network parameters are obtained. The parameter values of the optimization objective then follow from the neural network parameters and the excitation signal weight, and the game AI model can be trained to determine the corresponding sets of optimization target values. Note that during training the neural network parameters may be adjusted by gradient descent, but this is not limiting.
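The RL-as-mutation idea can be sketched as follows. Here `train_rl` is a hypothetical callable standing in for the full inner RL training run (weights in, converged network parameters out); the perturbation scheme and all names are assumptions for illustration:

```python
import random

def rl_mutation(individual, train_rl, noise=0.1):
    """Mutation operator built from RL: perturb the excitation-signal
    weights, then let an inner RL training run re-derive converged
    network parameters theta under the new weights."""
    new_weights = {name: w + random.gauss(0.0, noise)
                   for name, w in individual["weights"].items()}
    return {"weights": new_weights, "theta": train_rl(new_weights)}

# Stand-in for a real RL training loop (assumption, not the patent's code).
fake_train_rl = lambda weights: [sum(weights.values())]

random.seed(1)
parent = {"weights": {"attack": 0.7, "defense": 0.3}, "theta": [1.0]}
child = rl_mutation(parent, fake_train_rl)
```

The offspring keeps the parent's structure but differs in its weights, matching the description of children that differ from yet resemble their parents.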
Fig. 3 is a schematic flowchart of a configuration method of AI models of games with different styles according to an embodiment of the present application. Optionally, as shown in fig. 3, using the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm, substituting the multiple sets of parameter values of the optimization objectives, and training the game AI model to determine the corresponding multiple sets of optimization target values includes:
S301, using the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm and substituting the parameter values of the multiple groups of optimization targets to obtain the substituted reinforcement learning algorithm.
RL may be used as the mutation operator of the MOEA. The method may substitute the parameter values of the multiple groups of optimization targets respectively to obtain the substituted reinforcement learning algorithm, may then evaluate and learn the multiple different optimization targets based on the substituted algorithm, and finally determines which MOEA offspring are retained according to the optimization target values of the multiple different optimization targets.
S302, acquiring an excitation scalar according to the excitation signal weights and the excitation signals corresponding to the excitation signal weights.
It should be noted that the reinforcement learning process may involve excitation signals (rewards) for a plurality of different game styles, which may be collected while the game is running; RL then uses these excitation signals to guide the training of the parameter values of the optimization targets. Correspondingly, there may be one excitation signal weight per excitation signal, and the corresponding excitation scalar may be calculated from the plurality of excitation signals and their weights. This is necessary because gradient-descent-based RL is essentially a single-objective optimization algorithm, so the reward must be a single floating-point value.
For example, suppose the excitation signals include r1 and r2, where r1 represents the attack score and r2 represents the defense score, with excitation signal weights w1 and w2 respectively; the excitation scalar can then be expressed as r = w1·r1 + w2·r2.
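In code, the weighted sum above is a one-liner; the score and weight values here are illustrative assumptions, not values from the patent.

```python
# Illustrative values for the example above.
r1, r2 = 3.0, 1.0        # r1: attack score, r2: defense score
w1, w2 = 0.75, 0.25      # excitation signal weights (one per signal)
r = w1 * r1 + w2 * r2    # excitation scalar: a single floating-point reward
print(r)                 # 2.5
```

Raising w1 relative to w2 makes attacking behavior dominate the reward, which is exactly how the weight ratio encodes a game style.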
S303, training the game AI model according to the substituted reinforcement learning algorithm and the excitation scalar to determine the corresponding multiple groups of optimization target values.
After the excitation scalar is obtained, it can be used in the learning of the reinforcement learning algorithm, providing the feedback signal for RL during the learning process, so that the multiple groups of optimization target values can be obtained.
Fig. 4 is a schematic flowchart of another configuration method of AI models of games with different styles according to an embodiment of the present application. Optionally, as shown in fig. 4, the method further includes:
S401, when no target optimization target value satisfying the optimization threshold exists among the multiple groups of optimization target values, adjusting the excitation signal weights according to the multi-objective evolutionary algorithm.
When no target optimization target value satisfying the optimization threshold exists among the multiple groups of optimization target values, the excitation signal weights can be adjusted again according to the multi-objective evolutionary algorithm and training continued, until at least one target optimization target value satisfying the optimization threshold exists among the multiple groups of optimization target values; the trained game AI model corresponding to that at least one target optimization target value is then retained, so that the game AI model exhibits the expected style during the game and diversity of game AI models is achieved.
In addition, it should be noted that when the reinforcement learning algorithm is actually used as the mutation operator of the multi-objective evolutionary algorithm and the parameter values of the multiple groups of optimization targets are substituted in, the property that MOEA does not require the optimization problem to be differentiable can be exploited: the excitation signal weights are adjusted automatically by the MOEA, so that RL learns the different styles, that is, automated machine learning (AutoML) is realized. Parameter values of multiple groups of optimization targets under different game styles can thus be learned, which not only reduces the cost of manual parameter tuning but also, by virtue of the diversity-preserving capability of the MOEA, generates more varied game AI model styles, solving the technical problem that game AI models produced by the limited optimization capability of manual parameter tuning lack diversity.
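The outer MOEA loop described above can be sketched as follows. This is only a sketch of the control flow under simplifying assumptions: the RL training run that the patent uses as the mutation step is replaced by Gaussian noise on the weights, and `evaluate` is a hypothetical stand-in for running the trained game AI and measuring its objective values.

```python
import random

def evaluate(weights):
    # Stand-in for RL training plus evaluation: returns the two
    # optimization target values (attack score, defense score) for one
    # excitation-weight setting. Purely illustrative.
    w_attack, w_defense = weights
    return (w_attack * 2.0, w_defense * 2.0)

def dominates(a, b):
    # Pareto dominance for maximization: a is no worse in every objective
    # and strictly better in at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def moea_adjust_weights(pop_size=8, generations=20, seed=0):
    """Skeleton of the outer MOEA loop that mutates excitation weights."""
    rng = random.Random(seed)
    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for w1, w2 in pop:
            # Mutation step (in the patent, this is where RL is applied).
            children.append((max(0.0, w1 + rng.gauss(0, 0.1)),
                             max(0.0, w2 + rng.gauss(0, 0.1))))
        union = pop + children
        scored = [(evaluate(w), w) for w in union]
        # Environmental selection: keep non-dominated individuals first,
        # which is what preserves a diversity of styles across generations.
        survivors = [w for f, w in scored
                     if not any(dominates(g, f) for g, _ in scored)]
        while len(survivors) < pop_size:
            survivors.append(rng.choice(union))
        pop = survivors[:pop_size]
    return pop

front = moea_adjust_weights()
print(len(front))  # 8
```

Because selection keeps the whole non-dominated front rather than a single best individual, the surviving weight settings span a range of attack/defense trade-offs, each corresponding to a different game style.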
Compared with manual parameter tuning, the configuration method for game AI models with different styles can achieve diversity of game AI models. Manual parameter tuning can typically only obtain extreme styles (such as always attacking or always escaping), whereas intermediate styles (such as attacking while health is sufficient and escaping when health is insufficient) can deliver a better game experience and win rate; the corresponding excitation signal weight ratio, however, is difficult to find by manual tuning. The present method is based on MOEA, improves optimization efficiency by combining it with RL, and exploits the property that MOEA does not require the optimization problem to be differentiable to adjust the excitation signal weights automatically and realize automated machine learning. Therefore, compared with manual parameter tuning, diversity of game AI models can be achieved.
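As an illustration of such an intermediate style, the sketch below encodes a health-dependent switch between attacking and escaping. The threshold rule, function name, and parameters are hypothetical and only show how a learned weight ratio could shape where the behavior switches; they are not taken from the patent.

```python
def act(health, attack_bias):
    """Illustrative intermediate-style policy.

    A larger attack_bias (e.g. a larger learned attack-weight share)
    lowers the health threshold at which the AI stops attacking.
    """
    threshold = 1.0 - attack_bias
    return "attack" if health >= threshold else "escape"

print(act(0.9, 0.3))  # attack  (health 0.9 is above the 0.7 threshold)
print(act(0.2, 0.3))  # escape  (health 0.2 is below it)
```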
Fig. 5 is a schematic structural diagram of a configuration apparatus for AI models of games with different styles according to an embodiment of the present application. The basic principle and technical effect of the apparatus are the same as those of the corresponding method embodiment; for brevity, reference may be made to the corresponding content of the method embodiment for parts not mentioned in this embodiment. As shown in fig. 5, the apparatus includes: a first determining module 110, a second determining module 120, a training module 130, and a retaining module 140.
A first determining module 110, configured to determine a plurality of different optimization targets, where the different optimization targets correspond to different styles of the game AI model; a second determining module 120, configured to determine the parameter values of multiple sets of optimization targets of the nth generation according to the multi-objective evolutionary algorithm; a training module 130, configured to train the game AI model according to the parameter values of the multiple sets of optimization targets to determine corresponding multiple sets of optimization target values; and a retaining module 140, configured to, when at least one target optimization target value that satisfies the optimization threshold exists among the multiple sets of optimization target values, retain the trained game AI model corresponding to the at least one target optimization target value.
Optionally, the parameter values of the optimization target include neural network parameters and excitation signal weights, where the neural network parameters are calculated according to the reinforcement learning algorithm and each excitation signal weight corresponds to a respective excitation signal.
Optionally, the training module 130 is specifically configured to use the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm, substitute the parameter values of the multiple sets of optimization targets, and train the game AI model to determine the corresponding multiple sets of optimization target values.
Optionally, the training module 130 is specifically configured to: use the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm and substitute the parameter values of the multiple sets of optimization targets to obtain the substituted reinforcement learning algorithm; acquire the excitation scalar by weighting the excitation signals with their corresponding excitation signal weights; and train the game AI model according to the substituted reinforcement learning algorithm and the excitation scalar to determine the corresponding sets of optimization target values.
Fig. 6 is a schematic structural diagram of another configuration apparatus for AI models of games with different styles according to an embodiment of the present application. Optionally, as shown in fig. 6, the apparatus further includes: an adjusting module 150, configured to adjust the excitation signal weight according to the multi-objective evolutionary algorithm when at least one target optimization target value satisfying the optimization threshold does not exist in the plurality of sets of optimization target values.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs), among others. As another example, when one of the above modules is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of invoking program code. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device may include: a processor 210, a storage medium 220, and a bus 230, wherein the storage medium 220 stores machine-readable instructions executable by the processor 210, and when the electronic device is operated, the processor 210 communicates with the storage medium 220 via the bus 230, and the processor 210 executes the machine-readable instructions to perform the steps of the above-mentioned method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the present application further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program performs the steps of the above method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to perform some steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that like reference numbers and letters refer to like items in the figures; once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Claims (10)
1. A configuration method of AI models of games with different styles is characterized by comprising the following steps:
determining a plurality of different optimization objectives, wherein the different optimization objectives correspond to different styles of the game AI model;
determining parameter values of a plurality of groups of optimization targets of the nth generation according to a multi-target evolutionary algorithm;
training the game AI model according to the multiple groups of parameter values of the optimization target to determine corresponding multiple groups of optimization target values;
and when at least one target optimization target value meeting the optimization threshold exists in the multiple groups of optimization target values, reserving the trained game AI model corresponding to the at least one target optimization target value.
2. The method of claim 1, wherein the parameter values of the optimization target comprise neural network parameters and excitation signal weights, wherein the neural network parameters are calculated according to a reinforcement learning algorithm and each excitation signal weight corresponds to a respective excitation signal.
3. The method of claim 2, wherein training the game AI model to determine the corresponding sets of the optimization objective values according to the sets of the optimization objective parameter values comprises:
using the reinforcement learning algorithm as a mutation operator of the multi-objective evolutionary algorithm, substituting the parameter values of the multiple groups of optimization targets, and training the game AI model to determine the corresponding multiple groups of optimization target values.
4. The method of claim 3, wherein using the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm, substituting the parameter values of the multiple groups of optimization targets, and training the game AI model to determine the corresponding multiple groups of optimization target values comprises:
using the reinforcement learning algorithm as the mutation operator of the multi-objective evolutionary algorithm and substituting the parameter values of the multiple groups of optimization targets to obtain the substituted reinforcement learning algorithm;
acquiring an excitation scalar according to the excitation signal weight and the excitation signal corresponding to the excitation signal weight;
and training the game AI model according to the substituted reinforcement learning algorithm and the excitation scalar to determine the corresponding groups of optimization target values.
5. The method of claim 2, further comprising:
and when at least one target optimization target value meeting an optimization threshold does not exist in the multiple groups of optimization target values, adjusting the excitation signal weight according to the multi-target evolutionary algorithm.
6. An apparatus for configuring AI models for different styles of games, comprising: the device comprises a first determining module, a second determining module, a training module and a reserving module;
the first determining module is used for determining a plurality of different optimization targets, wherein the different optimization targets correspond to different styles of the game AI model;
the second determining module is used for determining the parameter values of the multiple groups of optimization targets of the nth generation according to the multi-target evolutionary algorithm;
the training module is used for training the game AI model according to the parameter values of the multiple groups of optimization targets to determine corresponding multiple groups of optimization target values;
the reservation module is configured to, when at least one target optimization target value that meets an optimization threshold exists in the multiple sets of optimization target values, reserve the trained game AI model corresponding to the at least one target optimization target value.
7. The apparatus of claim 6, wherein the parameter values of the optimization target comprise neural network parameters and excitation signal weights, wherein the neural network parameters are calculated according to a reinforcement learning algorithm and each excitation signal weight corresponds to a respective excitation signal.
8. The apparatus of claim 7, wherein the training module is specifically configured to use the reinforcement learning algorithm as a mutation operator of the multi-objective evolutionary algorithm, substitute the parameter values of the multiple sets of optimization targets, and train the game AI model to determine the corresponding multiple sets of optimization target values.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the method for configuring the AI models of different styles according to any one of claims 1 to 5.
10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of configuring different-style game AI models according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010273066.3A CN111481935B (en) | 2020-04-08 | 2020-04-08 | Configuration method, device, equipment and medium for AI models of games with different styles |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111481935A true CN111481935A (en) | 2020-08-04 |
CN111481935B CN111481935B (en) | 2023-04-18 |
Family
ID=71790100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010273066.3A Active CN111481935B (en) | 2020-04-08 | 2020-04-08 | Configuration method, device, equipment and medium for AI models of games with different styles |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111481935B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100261527A1 (en) * | 2009-04-10 | 2010-10-14 | Sony Computer Entertainment America Inc., a Delaware Corporation | Methods and systems for enabling control of artificial intelligence game characters |
CN109934352A (en) * | 2019-03-06 | 2019-06-25 | 北京深度奇点科技有限公司 | The automatic evolvement method of model of mind |
CN110533221A (en) * | 2019-07-29 | 2019-12-03 | 西安电子科技大学 | Multipurpose Optimal Method based on production confrontation network |
CN110882542A (en) * | 2019-11-13 | 2020-03-17 | 广州多益网络股份有限公司 | Training method, device, equipment and storage medium for game agent |
Also Published As
Publication number | Publication date |
---|---|
CN111481935B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110404264B (en) | Multi-person non-complete information game strategy solving method, device and system based on virtual self-game and storage medium | |
CN111282267B (en) | Information processing method, information processing apparatus, information processing medium, and electronic device | |
CN110141867B (en) | Game intelligent agent training method and device | |
CN107970608A (en) | The method to set up and device, storage medium, electronic device of outpost of the tax office game | |
Ponsen et al. | Integrating opponent models with monte-carlo tree search in poker | |
CN113688977A (en) | Confrontation task oriented man-machine symbiosis reinforcement learning method and device, computing equipment and storage medium | |
CN112685921B (en) | Mahjong intelligent decision method, system and equipment for efficient and accurate search | |
CN111701240B (en) | Virtual article prompting method and device, storage medium and electronic device | |
CN114404975B (en) | Training method, device, equipment, storage medium and program product of decision model | |
Preuss et al. | Integrated balancing of an rts game: Case study and toolbox refinement | |
CN116090549A (en) | Knowledge-driven multi-agent reinforcement learning decision-making method, system and storage medium | |
CN115577795A (en) | Policy model optimization method and device and storage medium | |
Kim et al. | Evolving population method for real-time reinforcement learning | |
CN113893547A (en) | Fitness function-based data processing method and system and storage medium | |
Reis et al. | An adversarial approach for automated Pokémon team building and meta-game balance | |
CN111481935B (en) | Configuration method, device, equipment and medium for AI models of games with different styles | |
Salge et al. | Relevant information as a formalised approach to evaluate game mechanics | |
Yang et al. | Deck building in collectible card games using genetic algorithms: A case study of legends of code and magic | |
CN111882072A (en) | Intelligent model automatic course training method for playing chess with rules | |
CN115659054A (en) | Game level recommendation method and device based on reinforcement learning | |
CN114307124A (en) | Intelligent decision method and system based on card touching mode and computer equipment | |
CN114146401A (en) | Mahjong intelligent decision method, device, storage medium and equipment | |
Gaina et al. | General win prediction from agent experience | |
Khatri | The gaming experience with AI | |
CN113426109A (en) | Method for cloning chess and card game behaviors based on factorization machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||