CN112870722A - Method, device, equipment and medium for generating a fighting AI (Artificial Intelligence) game model

Method, device, equipment and medium for generating a fighting AI (Artificial Intelligence) game model

Info

Publication number
CN112870722A
Authority
CN
China
Prior art keywords
game
fighting
training
game model
probability distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110265501.2A
Other languages
Chinese (zh)
Other versions
CN112870722B (en)
Inventor
杨敬文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110265501.2A priority Critical patent/CN112870722B/en
Publication of CN112870722A publication Critical patent/CN112870722A/en
Application granted granted Critical
Publication of CN112870722B publication Critical patent/CN112870722B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/55 Controlling game characters or game objects based on the game progress
    • A63F 13/58 Controlling game characters or game objects based on the game progress by computing conditions of game characters, e.g. stamina, strength, motivation or energy level
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F 13/63 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor, by the player, e.g. authoring using a level editor
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/70 Game security or game management aspects
    • A63F 13/79 Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/80 Special adaptations for executing a specific game genre or game mode
    • A63F 13/833 Hand-to-hand fighting, e.g. martial arts competition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F 2300/80 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
    • A63F 2300/8029 Fighting without shooting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method, device, equipment and medium for generating a fighting type AI game model. The method includes: acquiring a first fighting type AI game model; generating a plurality of training samples for the first fighting type AI game model through the self-play of two AI game characters based on the first fighting type AI game model; and training the first fighting type AI game model on the plurality of training samples to obtain a second fighting type AI game model, where each training sample includes game situation information, an action probability distribution and a winning rate. The generation efficiency of fighting type AI game models can thereby be improved.

Description

Method, device, equipment and medium for generating a fighting AI (Artificial Intelligence) game model
Technical Field
The embodiments of the application relate to the technical field of Artificial Intelligence (AI), and in particular to a method, device, equipment and medium for generating a fighting type AI game model.
Background
Fighting games are characterized by short decision times, large decision spaces and rich strategy variation. During a battle, constrained by the battle time and range defined by the gameplay, the player needs to avoid danger and damage the opponent character as much as possible through reasonable movement. Because the opponent's behavior strategy is rich and changeable, formulating, selecting and executing strategies is a crucial link in the game intelligence system, given the huge decision space and the real-time requirements of decision-making.
Behavior trees are currently the main means of formulating, selecting and executing strategies. However, a behavior-tree-based fighting type AI game model contains a large number of judgment branch descriptions that need to be adjusted manually and repeatedly, which creates a heavy workload; as a result, the generation efficiency of fighting type AI game models with this approach is currently low.
Disclosure of Invention
The application provides a method, device, equipment and medium for generating a fighting type AI game model, thereby improving the generation efficiency of fighting type AI game models.
In a first aspect, the present application provides a method for generating a fighting type AI game model, including: acquiring a first fighting type AI game model; generating a plurality of training samples for the first fighting type AI game model through the self-play of two AI game characters based on the first fighting type AI game model; and training the first fighting type AI game model on the plurality of training samples to obtain a second fighting type AI game model, where each training sample includes game situation information, an action probability distribution and a winning rate.
In a second aspect, the present application provides a device for generating a fighting type AI game model, comprising a first acquisition module, a generation module and a training module. The first acquisition module is configured to acquire a first fighting type AI game model; the generation module is configured to generate a plurality of training samples for the first fighting type AI game model through the self-play of two AI game characters based on the first fighting type AI game model; and the training module is configured to train the first fighting type AI game model on the plurality of training samples to obtain a second fighting type AI game model, where each training sample includes game situation information, an action probability distribution and a winning rate.
In a third aspect, there is provided a generation device of a fighting type AI game model, comprising: a processor and a memory, the memory for storing a computer program, the processor for invoking and executing the computer program stored in the memory to perform the method of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium for storing a computer program for causing a computer to perform the method of the first aspect.
The technical scheme provided by the application can achieve the following technical effects. First, this approach does not depend on manual effort, so the generation efficiency of the fighting type AI game model can be improved. Second, training the fighting type AI game model through the self-play of two AI game characters can be applied to any scenario, including scenarios that are difficult to describe with rules. Third, for different AI game characters, there is no need to manually design a dedicated fighting type AI game model; the model can be generated solely through the self-play of the AI game characters, which reduces manual maintenance costs. Fourth, because this approach does not depend on manual effort, it is difficult for players to find game vulnerabilities. Fifth, rich strategy variation can be produced in this scheme, so as to meet the competitive sparring needs of high-level players. In addition, with this technical scheme the server does not need to obtain real-player data, so the fighting type AI game model can be generated before the product goes online.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
Fig. 1 is a flowchart of a generation method of a fighting AI game model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an action probability distribution according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for generating training samples according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of another method for generating training samples according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of one round of play between two AI game characters according to an embodiment of the present application;
FIG. 6 is a schematic diagram of generating training samples according to an embodiment of the present application;
FIG. 7 is a flowchart of a training method for a first fighting type AI game model according to an embodiment of the present disclosure;
fig. 8 is an interaction flowchart of an action determination method for an AI game character according to an embodiment of the present application;
fig. 9 is a schematic diagram of a device for generating a fighting AI game model according to an embodiment of the present application;
fig. 10 is a schematic block diagram of a generation device 1000 of a fighting AI game model according to an embodiment of the present application;
fig. 11 is a schematic diagram of an AI game system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The present application relates to the field of AI technology. AI is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
As mentioned above, behavior trees are currently the main means of formulating, selecting and executing strategies. A behavior tree is a tree with a distinct node hierarchy that controls a series of execution strategies for an AI game character. Generally, a behavior tree includes four major types of nodes:
(1) Composite Node, generally comprising: a Selector Node, which selects among its child nodes; a Sequence Node, which traverses and executes all of its child nodes in order; and a Parallel Node, which executes all of its child nodes in parallel. Further node types, such as weighted random nodes and priority nodes, can also be derived.
(2) Decorator Node, which performs additional processing on the result returned after its child node executes and returns the result to the Parent Node.
(3) Condition Node (Condition Node).
(4) Action nodes (Action nodes), also called leaf nodes.
Through various combinations of the above nodes, a series of execution strategies of the AI game character can be controlled.
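By way of illustration only (this background structure is not part of the scheme provided by the application), a minimal behavior-tree skeleton might look like the following Python sketch; the node classes, the condition and the concrete actions are hypothetical examples.

    # Minimal behavior-tree sketch (illustrative only; all names are hypothetical).
    SUCCESS, FAILURE = "success", "failure"

    class SelectorNode:
        """Composite node: runs its children in order until one succeeds."""
        def __init__(self, *children): self.children = children
        def tick(self, ctx):
            for child in self.children:
                if child.tick(ctx) == SUCCESS:
                    return SUCCESS
            return FAILURE

    class SequenceNode:
        """Composite node: runs all children in order; fails as soon as one fails."""
        def __init__(self, *children): self.children = children
        def tick(self, ctx):
            for child in self.children:
                if child.tick(ctx) == FAILURE:
                    return FAILURE
            return SUCCESS

    class ConditionNode:
        def __init__(self, predicate): self.predicate = predicate
        def tick(self, ctx): return SUCCESS if self.predicate(ctx) else FAILURE

    class ActionNode:  # leaf node
        def __init__(self, action): self.action = action
        def tick(self, ctx):
            self.action(ctx)
            return SUCCESS

    # Toy strategy: block when the opponent is attacking at close range, otherwise attack.
    tree = SelectorNode(
        SequenceNode(ConditionNode(lambda ctx: ctx["opponent_attacking"] and ctx["distance"] < 1.0),
                     ActionNode(lambda ctx: ctx["actions"].append("block"))),
        ActionNode(lambda ctx: ctx["actions"].append("attack")),
    )
    ctx = {"opponent_attacking": True, "distance": 0.5, "actions": []}
    tree.tick(ctx)
    print(ctx["actions"])  # ['block']

Even in this toy form, every new behavior requires another hand-written branch, which is the maintenance burden discussed below.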
Constructing a behavior tree relies mainly on the developers' understanding and modeling of the game mechanics and fighting situations. As described above, such behavior trees contain a large number of judgment branch descriptions and need to be adjusted manually and repeatedly, which results in a heavy workload and low behavior-tree generation efficiency. In addition, the following problems exist:
In a fighting game there are many scenarios that are difficult to describe with rules, so the capability of a behavior tree has a certain bottleneck.
For different AI game characters, developers need to design a separate behavior tree for each character, which results in high manual maintenance costs.
The branch logic in a behavior tree is limited, so the behavior patterns it can describe are also limited, and game vulnerabilities can easily be discovered by players.
Fighting games feature short decision times and rich strategy variation, so an AI game character designed with a behavior tree has limited capability and can hardly meet the competitive sparring needs of high-level players.
To solve this technical problem, the present application constructs a fighting type AI game model and trains it by having two AI game characters play against each other based on that model.
Optionally, the technical scheme of the application can be applied to fighting games, for example, a game scenario of human-versus-machine battle.
The technical scheme of the application is explained in detail as follows:
fig. 1 is a flowchart of a method for generating a fighting type AI game model according to an embodiment of the present application. The method may be executed by a server or another electronic device, where the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing cloud computing services; the present application is not limited in this respect. The method is described below by way of example with the server as the executing entity. As shown in fig. 1, the method includes the following steps:
S110: Acquire a first fighting type AI game model.
S120: Generate a plurality of training samples for the first fighting type AI game model through the play of two AI game characters based on the first fighting type AI game model.
S130: Train the first fighting type AI game model on the plurality of training samples to obtain a second fighting type AI game model.
It should be understood that the fighting type AI game models involved in the embodiments of the present application all implement execution strategies for AI game characters, and all of them are deep neural network models.
It should be appreciated that training the first fighting type AI game model means training its model parameters; that is, what is trained are the model parameters.
It should be understood that the first fighting type AI game model, also referred to as the initial fighting type AI game model, takes game situation information as input and outputs an action probability distribution and a winning rate.
For example: let s denote game situation information, p denote an action probability distribution, v denote a winning rate, and θ denote model parameters relating to the first fighting class AI game model. The first fighting type AI game model can be expressed by the following formula (1):
model_θ(s) = (p, v)    (1)
optionally, the game situation information may be game situation information that can be acquired by any AI game character, and may include at least one of the following, but is not limited to this: position information, blood volume information, skill information, distance information between the AI game character and another game character, obstacle information, time information, and score information of the AI game character.
Optionally, the time information may be a game duration, and the like, which is not limited in the present application.
Optionally, the game situation information further includes at least one of the following, but is not limited thereto: the AI game character can acquire the position information, blood volume information, score information, skill information, and the like of the opponent side.
It should be understood that in a combat scenario of human-computer interaction, the AI game characters involved in the game situation information refer to machine-side AI game characters. And other game characters involved in the game situation information may be player-side game characters.
The actions involved for an AI game character, or for any fighting type AI game model in the present application, include but are not limited to: move left, move right, move up, attack, jump, block, and so on.
The action probability distribution output by the first fighting type AI game model is the probability distribution of each action, for example: fig. 2 is a schematic diagram of an action probability distribution provided in the embodiment of the present application, and as shown in fig. 2, the execution probabilities corresponding to left movement, right movement, upward movement, attack, jump, and block are 0.2, 0.3, 0.6, 0.8, 0.3, and 0.2, respectively.
The action probability distribution output by the first fighting type AI game model is used by the client where the AI game character is located to determine the action to execute in the current game situation; for example, the client executes the action with the highest probability according to the action probability distribution.
The winning rate output by the first fighting type AI game model is the winning rate of the AI game role under the current game situation information. Therefore, the winning rate of the AI game character may be different under different game situations.
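As an illustrative sketch only, a model of the form model_θ(s) = (p, v) could be implemented as a small policy-value network such as the following; the application does not prescribe a network architecture, and the layer sizes, the 32-dimensional state encoding, the softmax policy head and the tanh value head are all assumptions made here for the example.

    import torch
    import torch.nn as nn

    ACTIONS = ["move_left", "move_right", "move_up", "attack", "jump", "block"]

    class FightingAIGameModel(nn.Module):
        """Sketch of model_theta(s) = (p, v): shared trunk with a policy head and a value head."""
        def __init__(self, state_dim=32, hidden_dim=128, num_actions=len(ACTIONS)):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                                       nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
            self.policy_head = nn.Linear(hidden_dim, num_actions)  # action probability distribution p
            self.value_head = nn.Linear(hidden_dim, 1)              # winning rate v in [-1, 1]

        def forward(self, state):
            h = self.trunk(state)
            p = torch.softmax(self.policy_head(h), dim=-1)
            v = torch.tanh(self.value_head(h)).squeeze(-1)
            return p, v

    # Game situation information s encoded as a fixed-length feature vector (assumed encoding).
    s = torch.randn(1, 32)
    p, v = FightingAIGameModel()(s)
    print(p.shape, float(v))  # torch.Size([1, 6]) and a scalar winning-rate estimate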
Optionally, the server may generate the plurality of training samples for the first fighting type AI game model in either of the following alternative ways, but is not limited thereto:
The first alternative is as follows: fig. 3 is a flowchart of a method for generating training samples according to an embodiment of the present application; as shown in fig. 3, the method includes the following steps:
S310: Initialize the parameters of the first fighting type AI game model to obtain a third fighting type AI game model.
S320: In each round of play between the two AI game characters, input the game situation information corresponding to the current game situation into the third fighting type AI game model to obtain the action probability distribution of each of the two AI game characters, control the two AI game characters to execute the corresponding actions according to their respective action probability distributions so as to enter the next game situation, and take the next game situation as the new current game situation, until a winner and a loser are determined between the two AI game characters.
S330: Generate a plurality of first training samples corresponding to the winner and a plurality of second training samples corresponding to the loser among the two AI game characters.
Any one of the first training samples comprises: first game situation information, a first action probability distribution, and 1. The first game situation information is any game situation information acquired by the winner during the round of play; the first action probability distribution is the winner's action probability distribution under the first game situation information; and 1 indicates that the winner's winning rate is 1.
Any one of the second training samples comprises: second game situation information, a second action probability distribution, and -1. The second game situation information is any game situation information acquired by the loser during the round of play; the second action probability distribution is the loser's action probability distribution under the second game situation information; and -1 indicates that the loser's winning rate is -1.
The second alternative is as follows: fig. 4 is a flowchart of another method for generating training samples according to an embodiment of the present application; as shown in fig. 4, the method includes the following steps:
S410: Initialize the parameters of the first fighting type AI game model to obtain a third fighting type AI game model.
S420: In each round of play between the two AI game characters, input the game situation information corresponding to the current game situation into the third fighting type AI game model to obtain the action probability distribution of each of the two AI game characters, control the two AI game characters to execute the corresponding actions according to their respective action probability distributions so as to enter the next game situation, and take the next game situation as the new current game situation, until a winner and a loser are determined between the two AI game characters.
S430: Generate a plurality of first training samples corresponding to the winner and a plurality of second training samples corresponding to the loser among the two AI game characters.
S440: at least one second training sample is selected among the plurality of second training samples.
S450: and aiming at any one second training sample in the at least one second training sample, according to the second game situation information corresponding to the second training sample, adjusting the second action probability distribution corresponding to the second training sample to obtain a third action probability distribution.
S460: a third training sample is generated.
Wherein the third training sample comprises: and the second game situation information, the third action probability distribution and 0 corresponding to the second training sample, wherein 0 represents that the winning rate is 0.
The following describes the first alternative:
optionally, the server may generate a random value to initialize a parameter in the first fighting type AI game model to obtain a third fighting type AI game model. Alternatively, the server may assign a preset value to the first fighting type AI game model for the purpose of initializing parameters in the first fighting type AI game model. In summary, the present application does not limit how the parameters in the first fighting type AI game model are initialized.
Optionally, the server may obtain the plurality of training samples for the first fighting type AI game model through at least one round of play between the two AI game characters.
For example, for each round of play between the two AI game characters, the process by which the server obtains the plurality of training samples for the first fighting type AI game model is as follows. Fig. 5 is a schematic diagram of one round of play between two AI game characters provided in an embodiment of the present application. As shown in fig. 5, in the initial game situation, assume that the game situation information acquired by game character A is s_A1 and the game situation information acquired by game character B is s_B1. Inputting s_A1 into the third fighting type AI game model yields game character A's action probability distribution π_A1, and game character A executes action a1 based on this distribution; similarly, inputting s_B1 into the third fighting type AI game model yields game character B's action probability distribution π_B1, and game character B executes action b1 based on this distribution. The game then enters the next situation, in which game character A executes action a2 and game character B executes action b2, and so on, until a winner and a loser are determined between the two AI game characters. Assuming game character A finally wins after n exchanges, the corresponding first training samples are (s_A1, π_A1, 1), (s_A2, π_A2, 1), ..., (s_An, π_An, 1), and the second training samples corresponding to game character B are (s_B1, π_B1, -1), (s_B2, π_B2, -1), ..., (s_Bn, π_Bn, -1). These first and second training samples can finally constitute the plurality of training samples for the first fighting type AI game model described above.
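The round of play described above might be sketched as follows; the environment interface (observe, step, is_over, winner) and the choice of the highest-probability action are assumptions made for the example, not details fixed by the application.

    import torch

    def generate_samples(model, env):
        """One round of play between characters A and B; returns (s, pi, z) training samples."""
        records = {"A": [], "B": []}
        env.reset()
        while not env.is_over():
            for character in ("A", "B"):
                s = torch.as_tensor(env.observe(character), dtype=torch.float32).unsqueeze(0)
                with torch.no_grad():
                    pi, _ = model(s)                        # third fighting type AI game model
                action = int(torch.argmax(pi, dim=-1))      # e.g. execute the highest-probability action
                env.step(character, action)
                records[character].append((s.squeeze(0), pi.squeeze(0)))
                if env.is_over():
                    break
        winner = env.winner()                               # "A" or "B"
        samples = []
        for character, record in records.items():
            z = 1.0 if character == winner else -1.0        # winner's samples labelled 1, loser's -1
            samples += [(s, pi, z) for s, pi in record]
        return samples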
The following describes the second alternative:
it should be understood that S410 to S430 are the same as S310 to S330, and the content thereof can refer to the explanation of S310 to S330, which is not limited in the present application.
The second training samples from the loser often contain much information about how the failure could have been avoided. Therefore, as shown in fig. 6, the server may select at least one second training sample from the plurality of second training samples for mutation. The server may select the at least one second training sample at random or according to a certain rule; the present application does not limit how the server makes this selection. For example, the server may randomly select 50% of the plurality of second training samples for mutation. The server may adjust the second action probability distribution corresponding to a second training sample according to its second game situation information to obtain a third action probability distribution. For example, when game character B is attacked, the server determines that the character can choose to jump or block; if the probabilities of jumping and blocking in the action probability distribution for the current situation are low, the server can apply a mutation adjustment to them, such as increasing the probability of jumping or blocking. These mutated samples cannot guarantee that game character B would win, but they can lead to better outcomes than the original failure, so the server can mark game character B's winning rate as 0, between winning and losing. On this basis, the server may combine the plurality of first training samples, the plurality of second training samples and the third training samples into the plurality of training samples used to train the first fighting type AI game model.
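A possible sketch of this mutation step is given below; the 50% selection rate follows the example above, while the action indices and the rule of boosting the jump and block probabilities are assumptions.

    import random

    def mutate_loser_samples(second_samples, select_ratio=0.5, boost=0.3):
        """Mutate some of the loser's samples: raise defensive-action probabilities, relabel the win rate as 0."""
        JUMP, BLOCK = 4, 5  # assumed indices into the action list used above
        chosen = random.sample(second_samples, int(len(second_samples) * select_ratio))
        third_samples = []
        for s, pi, _ in chosen:
            pi_mutated = pi.clone()
            pi_mutated[JUMP] += boost                    # e.g. increase the probability of jumping
            pi_mutated[BLOCK] += boost                   # ... and/or of blocking
            pi_mutated = pi_mutated / pi_mutated.sum()   # renormalize to a distribution
            third_samples.append((s, pi_mutated, 0.0))   # winning rate 0: between winning and losing
        return third_samples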
It should be understood that the second fighting type AI game model may be the fighting type AI game model obtained after final training, or an intermediate fighting type AI game model formed during the training process, which is not limited in the present application.
It should be understood that, for the input and output of the second fighting type AI game model, reference may be made to the first fighting type AI game model, which is not limited by the present application.
In summary, in the present application the server can train the fighting type AI game model through the play of two AI game characters based on the fighting type AI game model. First, this approach does not depend on manual effort, so the generation efficiency of the fighting type AI game model can be improved. Second, training the fighting type AI game model through the self-play of two AI game characters can be applied to any scenario, including scenarios that are difficult to describe with rules. Third, for different AI game characters, there is no need to manually design a dedicated fighting type AI game model; the model can be generated solely through the self-play of the AI game characters, which reduces manual maintenance costs. Fourth, because this approach does not depend on manual effort, it is difficult for players to find game vulnerabilities. Fifth, rich strategy variation can be produced in this scheme, so as to meet the competitive sparring needs of high-level players. In addition, with this technical scheme the server does not need to obtain real-player data, so the fighting type AI game model can be generated before the product goes online.
Optionally, after obtaining the plurality of training samples, the server may train the first fighting type AI game model on each training sample in turn to obtain a second fighting type AI game model; in this case, the second fighting type AI game model may be understood as either the intermediate fighting type AI game model or the finally trained fighting type AI game model. If the second fighting type AI game model is an intermediate fighting type AI game model, training samples for the second fighting type AI game model can be obtained by referring to the process of obtaining training samples for the first fighting type AI game model, and the second fighting type AI game model can be further trained on those training samples; by analogy, when the training end condition is met, the final fighting type AI game model is obtained.
Optionally, the training end condition is that the training result is optimal, or that the number of training iterations reaches a preset number, which is not limited in the present application.
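Putting the pieces together, the overall generation procedure described above (generate samples with the current model, train on them, and repeat until the end condition is met) might be sketched as follows; generate_samples, train_on_samples and training_result_is_optimal are hypothetical stand-ins for the sample-generation and training steps discussed in this description.

    def generate_fighting_ai_game_model(model, env, max_iterations=100):
        """Alternate self-play sample generation and training until an end condition is met."""
        for iteration in range(max_iterations):          # preset number of training iterations
            samples = generate_samples(model, env)       # (game situation, action distribution, winning rate)
            model = train_on_samples(model, samples)     # update the model parameters on these samples
            if training_result_is_optimal(model):        # alternative end condition from the description
                break
        return model                                     # the final fighting type AI game model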
The following describes the training process of the first fighting type AI game model by way of example:
Fig. 7 is a flowchart of a training method for the first fighting type AI game model according to an embodiment of the present application; as shown in fig. 7, the method includes the following steps:
S710: Train the first fighting type AI game model on a fourth training sample to obtain a fourth fighting type AI game model, where the fourth training sample is any one of the plurality of training samples.
S720: Take any training sample other than the fourth training sample among the plurality of training samples as a new fourth training sample, take the fourth fighting type AI game model as a new first fighting type AI game model, and train the new first fighting type AI game model on the new fourth training sample, and so on until training ends, so as to obtain the second fighting type AI game model.
The fourth training sample comprises: third game situation information, a fourth action probability distribution, and a first winning rate.
It should be understood that the fourth action probability distribution and the first winning rate in the fourth training sample are both deterministic values.
Optionally, the server may input the third game situation information into the first fighting type AI game model to obtain a fifth action probability distribution and a second winning rate, and train the first fighting type AI game model according to the fourth action probability distribution, the fifth action probability distribution, the first winning rate and the second winning rate to obtain the fourth fighting type AI game model.
It should be understood that the fifth action probability distribution here relates to the model parameters of the first fighting type AI game model, and similarly, the second winning rate also relates to the model parameters of the first fighting type AI game model.
Optionally, the server may train the first fighting type AI game model to obtain the fourth fighting type AI game model in either of the following ways, but is not limited thereto:
The first alternative is as follows: the server calculates the cross entropy of the fourth action probability distribution and the fifth action probability distribution, and calculates the mean square error of the first winning rate and the second winning rate. The first fighting type AI game model is then trained according to the cross entropy and the mean square error, so as to obtain the fourth fighting type AI game model.
The second alternative is as follows: the server calculates the mean square error of the fourth action probability distribution and the fifth action probability distribution, and calculates the mean square error of the first winning rate and the second winning rate. The first fighting type AI game model is then trained according to the two mean square errors, so as to obtain the fourth fighting type AI game model.
The following describes the first alternative:
the server can calculate the sum of the cross entropy and the mean square error to obtain a summation result, and the summation result is minimized to train the first fighting type AI game model and obtain the fourth fighting type AI game model.
Illustratively, the objective function is as in equation (2):

L = (z - v)^2 - π^T log p    (2)

The objective function here consists of two parts. One is the mean square error (z - v)^2 of the first winning rate and the second winning rate, where z denotes the first winning rate and v denotes the second winning rate; since the second winning rate is related to the parameters of the first fighting type AI game model, the smaller (z - v)^2 is, the better. The other is the cross entropy -π^T log p of the fourth action probability distribution and the fifth action probability distribution, where π denotes the fourth action probability distribution and p denotes the fifth action probability distribution; the cross entropy measures the difference between the two distributions, and the smaller the difference between the two distributions, the better.
In the present application, the server may optimize the parameter θ of the first fighting type AI game model using a gradient descent method, i.e., by the following formula (3):

θ ← θ - α · ∂L/∂θ    (3)

where α is the update step size of the parameter θ, which may be an artificially set hyper-parameter, and ∂L/∂θ is the gradient of the objective function L, calculated from the training samples, for example by software such as TensorFlow or PyTorch.
It should be noted that the parameter θ may be understood as a parameter set including at least one parameter.
It should be understood that the server may also transform the objective function described above, and optimize the parameter θ of the first fighting type AI game model based on the transformed objective function.
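The following sketch illustrates one gradient-descent step on objective (2) over a batch of training samples, using PyTorch as mentioned above; the plain SGD optimizer, the batch handling and the step size α = 0.01 are assumptions for the example, not the prescribed implementation.

    import torch

    def training_step(model, samples, alpha=0.01):
        """One gradient-descent step on L = (z - v)^2 - pi^T log p over a batch of samples."""
        optimizer = torch.optim.SGD(model.parameters(), lr=alpha)    # plain gradient descent, step size alpha
        s = torch.stack([s for s, _, _ in samples])                  # third game situation information
        pi = torch.stack([pi for _, pi, _ in samples])               # fourth action probability distribution
        z = torch.tensor([z for _, _, z in samples])                 # first winning rate
        p, v = model(s)                                              # fifth action probability distribution, second winning rate
        value_loss = ((z - v) ** 2).mean()                           # mean square error of the two winning rates
        policy_loss = -(pi * torch.log(p + 1e-8)).sum(dim=-1).mean() # cross entropy of the two distributions
        loss = value_loss + policy_loss                              # summation result to be minimized
        optimizer.zero_grad()
        loss.backward()                                              # gradient dL/dtheta via autograd
        optimizer.step()                                             # theta <- theta - alpha * dL/dtheta
        return float(loss)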
The following describes the second alternative:
the server can calculate the sum of the two mean square errors to obtain a summation result, and the summation result is minimized to train the first fighting type AI game model and obtain the fourth fighting type AI game model.
Illustratively, the objective function is as in equation (4):

L = (z - v)^2 + (π - p)^2    (4)

The objective function here consists of two parts. One is the mean square error (z - v)^2 of the first winning rate and the second winning rate, where z denotes the first winning rate and v denotes the second winning rate; since the second winning rate is related to the parameters of the first fighting type AI game model, the smaller (z - v)^2 is, the better. The other is the mean square error (π - p)^2 of the fourth action probability distribution and the fifth action probability distribution, where π denotes the fourth action probability distribution and p denotes the fifth action probability distribution; this mean square error measures the difference between the two distributions, and the smaller the difference between the two distributions, the better.
In the present application, the server may optimize the parameter θ of the first fighting type AI game model using a gradient descent method, i.e., by the following formula (5):

θ ← θ - α · ∂L/∂θ    (5)

where α is the update step size of the parameter θ, which may be an artificially set hyper-parameter, and ∂L/∂θ is the gradient of the objective function L, calculated from the training samples, for example by software such as TensorFlow or PyTorch.
It should be noted that the parameter θ may be understood as a parameter set including at least one parameter.
It should be understood that the server may also transform the objective function described above, and optimize the parameter θ of the first fighting type AI game model based on the transformed objective function.
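For the second alternative, only the policy term of the objective changes; a corresponding loss sketch for objective (4), under the same assumptions as the previous sketch, is given below.

    def loss_mse_variant(pi, p, z, v):
        """L = (z - v)^2 + (pi - p)^2, with both terms averaged over the batch."""
        value_loss = ((z - v) ** 2).mean()
        policy_loss = ((pi - p) ** 2).sum(dim=-1).mean()
        return value_loss + policy_loss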
To sum up, in the present application the server can train the fighting type AI game model through the play of two AI game characters based on the fighting type AI game model; for the resulting effects, reference may be made to the technical effects described in the previous embodiments, which are not repeated here.
It should be appreciated that after the server obtains the trained second fighting type AI game model, it can be used as follows:
Fig. 8 is an interaction flowchart of a method for determining an action of an AI game character according to an embodiment of the present application. The method involves a server (or another electronic device) and the client where a target AI game character is located. As shown in fig. 8, the method includes the following steps:
s810: the server acquires fourth game situation information of the target AI game role.
S820: the server inputs the fourth game situation information into the second fighting type AI game model to output a sixth action probability distribution of the target AI game role.
S830: and the server outputs the sixth action probability distribution to the client corresponding to the target AI game role.
S840: and the client executes the corresponding action according to the sixth action probability distribution.
Optionally, the fourth game situation information includes at least one of: position information, blood volume information, skill information, distance information between the target AI game character and another game character, obstacle information, time information, and score information of the target AI game character.
Optionally, the fourth game situation information further includes at least one of the following, but is not limited thereto: the target AI game character can acquire the position information, blood volume information, score information, skill information, and the like of the opponent side.
Optionally, after obtaining the sixth action probability distribution, the client may select the action with the highest probability and execute it. For example, suppose the execution probabilities corresponding to moving left, moving right, moving up, attacking, jumping and blocking determined from the sixth action probability distribution are 0.2, 0.3, 0.6, 0.8, 0.3 and 0.2, respectively. On this basis, the client selects the attack action corresponding to the maximum probability of 0.8 and executes the attack action.
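A minimal sketch of the server-client interaction of S810 to S840 is given below; the transport between server and client is omitted, the function names are illustrative, and the state encoding and action names repeat the assumptions of the earlier sketches.

    import torch

    ACTIONS = ["move_left", "move_right", "move_up", "attack", "jump", "block"]

    def server_side(model, fourth_game_situation):
        """S810-S830: encode the situation, run the second fighting type AI game model, return the distribution."""
        s = torch.as_tensor(fourth_game_situation, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            sixth_action_probs, _ = model(s)
        return sixth_action_probs.squeeze(0).tolist()    # sent to the client

    def client_side(sixth_action_probs):
        """S840: the client executes the action with the highest probability."""
        best = max(range(len(sixth_action_probs)), key=lambda i: sixth_action_probs[i])
        return ACTIONS[best]                             # e.g. "attack" when its probability is 0.8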
In summary, in the present application the server may determine the action probability distribution of the target AI game character based on the trained fighting type AI game model. Since the model is obtained through self-play training of AI game characters, its accuracy is high, and on this basis the client can obtain an accurate action decision.
The above model generation process and usage process are described below by way of example:
Suppose two game characters, game character A and game character B, are selected in a game. A first fighting type AI game model is then initialized randomly, and the two characters are controlled to play against each other until the match ends, in order to generate samples. If game character A wins, a plurality of first training samples can be obtained: (s_A1, π_A1, 1), (s_A2, π_A2, 1), ..., (s_An, π_An, 1), and a plurality of second training samples corresponding to game character B can be obtained: (s_B1, π_B1, -1), (s_B2, π_B2, -1), ..., (s_Bn, π_Bn, -1). The server may select at least one second training sample, either at random or according to a certain rule, and mutate the selected samples to obtain samples of the form (s_B1, π'_B1, 0), (s_B2, π'_B2, 0), ..., where π'_Bk denotes the mutated (third) action probability distribution and 0 is the corresponding winning rate.
The server trains the first fighting type AI game model on the first training samples, the second training samples and the mutated training samples to update the parameter θ of the first fighting type AI game model, finally obtaining a second fighting type AI game model.
On this basis, the server may use the second fighting type AI game model. Specifically, the server may obtain game state data s from the client, obtain the action probability distribution p and the winning rate v via model_θ(s) = (p, v), and send the action probability distribution p to the client; the client then selects the action with the highest probability and outputs it, thereby completing the interactive match against the player.
Fig. 9 is a schematic diagram of a fighting AI game model generation device according to an embodiment of the present application, and as shown in fig. 9, the fighting AI game model generation device includes:
the first obtaining module 910 is configured to obtain a first fighting type AI game model.
A generating module 920, configured to generate a plurality of training samples of the first fighting type AI game model through playing of two AI game roles based on the first fighting type AI game model.
The training module 930 is configured to train the first fighting type AI game model according to the plurality of training samples to obtain a second fighting type AI game model.
Wherein each training sample comprises: game situation information, action probability distribution and winning rate.
Optionally, the actions related to the action probability distribution include: left shift, right shift, up shift, attack, jump, block.
In one implementation, the generating module 920 is specifically configured to: initialize the parameters of the first fighting type AI game model to obtain a third fighting type AI game model; in each round of play between the two AI game characters, input the game situation information corresponding to the current game situation into the third fighting type AI game model to obtain the action probability distribution of each of the two AI game characters, control the two AI game characters to execute the corresponding actions according to their respective action probability distributions so as to enter the next game situation, and take the next game situation as the new current game situation, until a winner and a loser are determined; and generate a plurality of first training samples corresponding to the winner and a plurality of second training samples corresponding to the loser among the two AI game characters. Any one of the first training samples comprises: first game situation information, a first action probability distribution, and 1; the first game situation information is any game situation information acquired by the winner during the round of play; the first action probability distribution is the winner's action probability distribution under the first game situation information; and 1 indicates that the winner's winning rate is 1. Any one of the second training samples comprises: second game situation information, a second action probability distribution, and -1; the second game situation information is any game situation information acquired by the loser during the round of play; the second action probability distribution is the loser's action probability distribution under the second game situation information; and -1 indicates that the loser's winning rate is -1.
In another implementation, the generating module 920 is specifically configured to: initialize the parameters of the first fighting type AI game model to obtain a third fighting type AI game model; in each round of play between the two AI game characters, input the game situation information corresponding to the current game situation into the third fighting type AI game model to obtain the action probability distribution of each of the two AI game characters, control the two AI game characters to execute the corresponding actions according to their respective action probability distributions so as to enter the next game situation, and take the next game situation as the new current game situation, until a winner and a loser are determined; generate a plurality of first training samples corresponding to the winner and a plurality of second training samples corresponding to the loser among the two AI game characters; select at least one second training sample from the plurality of second training samples; for any one of the at least one second training sample, adjust the second action probability distribution corresponding to that second training sample according to its second game situation information to obtain a third action probability distribution; and generate a third training sample. The third training sample comprises the second game situation information corresponding to the second training sample, the third action probability distribution, and 0, where 0 indicates that the winning rate is 0.
Optionally, the training module 930 is specifically configured to: train the first fighting type AI game model on a fourth training sample to obtain a fourth fighting type AI game model, where the fourth training sample is any one of the plurality of training samples; and take any training sample other than the fourth training sample among the plurality of training samples as a new fourth training sample, take the fourth fighting type AI game model as a new first fighting type AI game model, and train the new first fighting type AI game model on the new fourth training sample, and so on until training ends, so as to obtain the second fighting type AI game model. The fourth training sample comprises: third game situation information, a fourth action probability distribution, and a first winning rate.
Optionally, the training module 930 is specifically configured to: input the third game situation information into the first fighting type AI game model to obtain a fifth action probability distribution and a second winning rate; and train the first fighting type AI game model according to the fourth action probability distribution, the fifth action probability distribution, the first winning rate and the second winning rate to obtain the fourth fighting type AI game model.
Optionally, the training module 930 is specifically configured to: calculate the cross entropy of the fourth action probability distribution and the fifth action probability distribution; calculate the mean square error of the first winning rate and the second winning rate; and train the first fighting type AI game model according to the cross entropy and the mean square error, so as to obtain the fourth fighting type AI game model.
Optionally, the training module 930 is specifically configured to: calculate the sum of the cross entropy and the mean square error to obtain a summation result, and minimize the summation result to train the first fighting type AI game model and obtain the fourth fighting type AI game model.
Optionally, the generation device of the fighting AI game model further includes:
the second obtaining module 940 is configured to obtain fourth game situation information of the target AI game role.
The processing module 950 is configured to input the fourth game situation information into the second fighting combat type AI game model to output a sixth action probability distribution of the target AI game role.
An output module 960, configured to output the sixth action probability distribution to the client corresponding to the target AI game role.
Optionally, the fourth game situation information includes at least one of: position information, blood volume information, skill information, distance information between the target AI game character and another game character, obstacle information, time information, and score information of the target AI game character.
It is to be understood that apparatus embodiments and method embodiments may correspond to one another and that similar descriptions may refer to method embodiments. To avoid repetition, further description is omitted here. Specifically, the generation apparatus of the fighting type AI game model shown in fig. 9 may execute the method embodiment corresponding to the server, and the foregoing and other operations and/or functions of each module in the generation apparatus of the fighting type AI game model are respectively for implementing corresponding flows in each method in the method embodiment, and are not repeated herein for brevity.
The generation device of the fighting type AI game model according to the embodiment of the present application is described above from the perspective of the functional block with reference to the drawings. It should be understood that the functional modules may be implemented by hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, the steps of the method embodiments in the present application may be implemented by integrated logic circuits of hardware in a processor and/or instructions in the form of software, and the steps of the method disclosed in conjunction with the embodiments in the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in random access memory, flash memory, read only memory, programmable read only memory, electrically erasable programmable memory, registers, and the like, as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps in the above method embodiments in combination with hardware thereof.
Fig. 10 is a schematic block diagram of a device 1000 for generating a fighting AI game model according to an embodiment of the present application, where the device may be a server or other electronic devices, where the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services, and the present application is not limited thereto.
As shown in fig. 10, the generation apparatus 1000 of the fighting type AI game model may include:
a memory 1010 and a processor 1020, the memory 1010 being adapted to store a computer program and to transfer the program code to the processor 1020. In other words, the processor 1020 can call and run the computer program from the memory 1010 to implement the method in the embodiment of the present application.
For example, the processor 1020 may be configured to perform the above-described method embodiments according to instructions in the computer program.
In some embodiments of the present application, the processor 1020 may include, but is not limited to:
general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like.
In some embodiments of the present application, the memory 1010 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double Data Rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program can be partitioned into one or more modules that are stored in the memory 1010 and executed by the processor 1020 to perform the methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution process of the computer program in the generation device of the fighting type AI game model.
As shown in fig. 10, the generation device of the fighting type AI game model may further include:
a transceiver 1030, the transceiver 1030 being connectable to the processor 1020 or the memory 1010.
The processor 1020 may control the transceiver 1030 to communicate with other devices, and specifically, may transmit information or data to the other devices or receive information or data transmitted by the other devices. The transceiver 1030 may include a transmitter and a receiver. The transceiver 1030 may further include an antenna, and the number of antennas may be one or more.
It should be understood that the various components of the device for generating the fighting type AI game model are connected by a bus system, where the bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus.
Fig. 11 is a schematic diagram of an AI game system provided in an embodiment of the present application. As shown in fig. 11, the AI game system includes a server 1110 and a client 1120, where the server 1110 is configured to implement the above method for generating a fighting AI game model; for its content and effects, reference may be made to the method embodiments, and details are not repeated here.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. In other words, the present application also provides a computer program product containing instructions, which when executed by a computer, cause the computer to execute the method of the above method embodiments.
When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of modules is merely a logical function division, and other division manners may be used in actual implementation; for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be in electrical, mechanical, or other forms.
Modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. For example, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and all the changes or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method for generating a fighting AI game model, characterized by comprising the following steps:
acquiring a first fighting artificial intelligence (AI) game model;
generating a plurality of training samples of the first fighting type AI game model through play between two AI game characters based on the first fighting type AI game model;
training the first fighting type AI game model according to the training samples to obtain a second fighting type AI game model;
wherein each of the training samples comprises: game situation information, action probability distribution and winning rate.
2. The method of claim 1, wherein the actions covered by the action probability distribution comprise: moving left, moving right, moving up, attacking, jumping, and blocking.
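For illustration only and not part of the claims: a minimal Python sketch of the training-sample structure described in claims 1-2. The field names, the state encoding, and the ordering of the six actions are assumptions introduced here, not definitions from the application.

from dataclasses import dataclass
from typing import List

# Hypothetical ordering of the six actions named in claim 2.
ACTIONS = ["move_left", "move_right", "move_up", "attack", "jump", "block"]

@dataclass
class TrainingSample:
    game_situation: List[float]   # encoded game situation information
    action_probs: List[float]     # one probability per entry in ACTIONS
    win_rate: float               # 1 for a winner's sample, -1 for a loser's sample (claim 3)

# Example: a sample in which the character mostly attacked and later won the round.
sample = TrainingSample(
    game_situation=[0.2, 0.8, 0.5],                     # placeholder state encoding
    action_probs=[0.05, 0.05, 0.05, 0.70, 0.10, 0.05],
    win_rate=1.0,
)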
3. The method of claim 1, wherein the generating the plurality of training samples of the first fighting type AI game model through play between two AI game characters based on the first fighting type AI game model comprises:
initializing parameters in the first fighting type AI game model to obtain a third fighting type AI game model;
in each round of play between the two AI game characters, inputting game situation information corresponding to a current game situation into the third fighting type AI game model to obtain respective action probability distributions of the two AI game characters, controlling the two AI game characters to execute corresponding actions according to their respective action probability distributions so as to enter a next game situation, and taking the next game situation as a new current game situation until a winner and a loser are determined between the two AI game characters;
generating a plurality of first training samples corresponding to the winner and a plurality of second training samples corresponding to the loser of the two AI game characters;
wherein any one of the first training samples comprises: first game situation information, a first action probability distribution, and 1; the first game situation information is any game situation information acquired by the winner during each round of play; the first action probability distribution is an action probability distribution of the winner under the first game situation information; and 1 represents that the winning rate of the winner is 1;
any one of the second training samples comprises: second game situation information, a second action probability distribution, and -1; the second game situation information is any game situation information acquired by the loser during each round of play; the second action probability distribution is an action probability distribution of the loser under the second game situation information; and -1 represents that the winning rate of the loser is -1.
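For illustration only: a hedged Python sketch of the self-play sample generation of claim 3. The model callable, the GameEnv interface (reset, step, winner), and the simultaneous-action convention are assumptions, not interfaces defined by the application.

import random

def self_play_round(model, env):
    """Play one round between two AI game characters driven by the same model and
    return the winner's first training samples and the loser's second training samples."""
    records = {0: [], 1: []}                  # per-character (game situation, action probs) pairs
    state = env.reset()                       # the initial current game situation
    while env.winner() is None:
        # Obtain each character's action probability distribution for the current situation.
        probs = {c: model(state, c) for c in (0, 1)}
        actions = {c: random.choices(range(len(probs[c])), weights=probs[c])[0] for c in (0, 1)}
        for c in (0, 1):
            records[c].append((state, probs[c]))
        state = env.step(actions)             # execute both actions and enter the next game situation
    winner = env.winner()
    loser = 1 - winner
    first_samples = [(s, p, 1.0) for s, p in records[winner]]    # winning rate 1 for the winner
    second_samples = [(s, p, -1.0) for s, p in records[loser]]   # winning rate -1 for the loser
    return first_samples, second_samples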
4. The method of claim 3, wherein after the generating the plurality of second training samples corresponding to the loser, the method further comprises:
selecting at least one second training sample from the plurality of second training samples;
for any one of the at least one second training sample, adjusting, according to the second game situation information corresponding to the second training sample, the second action probability distribution corresponding to the second training sample to obtain a third action probability distribution;
generating a third training sample;
wherein the third training sample comprises: the second game situation information corresponding to the second training sample, the third action probability distribution, and 0, wherein 0 represents that the winning rate is 0.
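For illustration only: a sketch of the third-sample construction in claim 4. The claim does not fix a concrete adjustment rule, so the uniform re-weighting and the selection of the first few losing samples below are purely illustrative stand-ins.

def make_third_samples(second_samples, num_to_adjust=8):
    third_samples = []
    for situation, probs, _ in second_samples[:num_to_adjust]:   # select at least one second sample
        adjusted = [1.0 / len(probs)] * len(probs)               # placeholder adjustment of the distribution
        third_samples.append((situation, adjusted, 0.0))         # relabel with winning rate 0, per claim 4
    return third_samples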
5. The method of claim 3 or 4, wherein the training the first fighting type AI game model according to the plurality of training samples to obtain the second fighting type AI game model comprises:
training the first fighting type AI game model through a fourth training sample to obtain a fourth fighting type AI game model; the fourth training sample is any one of the plurality of training samples;
taking any training sample except the fourth training sample in the plurality of training samples as a new fourth training sample, taking the fourth fighting type AI game model as a new first fighting type AI game model, and training the new first fighting type AI game model through the new fourth training sample until the training is finished to obtain the second fighting type AI game model;
wherein the fourth training sample comprises: third game situation information, a fourth action probability distribution, and a first winning rate.
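For illustration only: the iterative update of claim 5 reduces to a loop in which each sample in turn trains the current model and the result becomes the model for the next sample. The train_on_sample callable stands in for the single-sample update detailed in claims 6-8 and is an assumption, not an interface from the application.

def train_second_model(first_model, training_samples, train_on_sample):
    model = first_model
    for sample in training_samples:              # each remaining sample becomes the new fourth training sample
        model = train_on_sample(model, sample)   # the result becomes the new first model for the next step
    return model                                 # the second fighting type AI game model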
6. The method of claim 5, wherein the training the first fighting type AI game model with a fourth training sample to obtain a fourth fighting type AI game model comprises:
inputting the third game situation information into the first fighting type AI game model to obtain a fifth action probability distribution and a second winning rate;
and training the first fighting type AI game model according to the fourth action probability distribution, the fifth action probability distribution, the first winning rate and the second winning rate to obtain the fourth fighting type AI game model.
7. The method of claim 6, wherein the training the first fighting type AI game model based on the fourth action probability distribution, the fifth action probability distribution, the first winning rate, and the second winning rate to obtain the fourth fighting type AI game model comprises:
calculating a cross entropy of the fourth action probability distribution and the fifth action probability distribution;
calculating a mean square error of the first winning rate and the second winning rate;
and training the first fighting type AI game model according to the cross entropy and the mean square error so as to obtain the fourth fighting type AI game model.
8. The method of claim 7, wherein the training the first fighting type AI game model to obtain the fourth fighting type AI game model based on the cross entropy and the mean square error comprises:
calculating the sum of the cross entropy and the mean square error to obtain a summation result;
and minimizing the summation result to train the first fighting type AI game model to obtain the fourth fighting type AI game model.
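For illustration only: a minimal sketch of the loss in claims 6-8, assuming the model predicts an action probability distribution and a winning rate for a given game situation. The epsilon guard is an added numerical safeguard, not part of the claims.

import math

def single_sample_loss(pred_probs, pred_win, target_probs, target_win, eps=1e-8):
    # Cross entropy between the sample's action probability distribution and the prediction (claim 7).
    cross_entropy = -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))
    # Mean square error between the sample's winning rate and the predicted winning rate (claim 7).
    mse = (target_win - pred_win) ** 2
    # Claim 8: sum the two terms and minimize the summation result during training.
    return cross_entropy + mse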
9. The method according to any one of claims 1-4, further comprising:
acquiring fourth game situation information of a target AI game role;
inputting the fourth game situation information into the second fighting type AI game model to output a sixth action probability distribution of the target AI game role;
and outputting the sixth action probability distribution to the client corresponding to the target AI game role.
10. The method of claim 9, wherein the fourth game situation information comprises at least one of the following information of the target AI game character: position information, blood volume information, skill information, distance information between the target AI game character and other game characters, obstacle information, time information, and score information.
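For illustration only: a sketch of serving a trained model according to claims 9-10. The dictionary keys follow the information types listed in claim 10, while the feature encoding and the send_to_client callback are assumptions introduced here.

def serve_action_distribution(model, situation, send_to_client):
    features = [
        *situation["position"],              # position information of the target AI game character
        situation["blood_volume"],           # blood volume information
        *situation["skills"],                # skill information
        situation["distance_to_opponent"],   # distance to other game characters
        situation["obstacle"],               # obstacle information
        situation["time_left"],              # time information
        situation["score"],                  # score information
    ]
    action_probs, _win_rate = model(features)   # sixth action probability distribution (claim 9)
    send_to_client(action_probs)                # output to the client of the target AI game character
    return action_probs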
11. An apparatus for generating a fighting AI game model, comprising:
a first acquisition module, configured to acquire a first fighting AI game model;
a generation module, configured to generate a plurality of training samples of the first fighting type AI game model through play between two AI game characters based on the first fighting type AI game model;
a training module, configured to train the first fighting type AI game model according to the plurality of training samples to obtain a second fighting type AI game model;
wherein each of the training samples comprises: game situation information, action probability distribution and winning rate.
12. A server, comprising:
a processor and a memory for storing a computer program, the processor for invoking and executing the computer program stored in the memory to perform the method of any one of claims 1 to 10.
13. A computer-readable storage medium for storing a computer program which causes a computer to perform the method of any one of claims 1 to 10.
CN202110265501.2A 2021-03-11 2021-03-11 Method, device, equipment and medium for generating fighting AI (AI) game model Active CN112870722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265501.2A CN112870722B (en) 2021-03-11 2021-03-11 Method, device, equipment and medium for generating fighting AI (AI) game model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110265501.2A CN112870722B (en) 2021-03-11 2021-03-11 Method, device, equipment and medium for generating fighting AI (AI) game model

Publications (2)

Publication Number Publication Date
CN112870722A true CN112870722A (en) 2021-06-01
CN112870722B CN112870722B (en) 2022-07-22

Family

ID=76040980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265501.2A Active CN112870722B (en) 2021-03-11 2021-03-11 Method, device, equipment and medium for generating fighting AI (AI) game model

Country Status (1)

Country Link
CN (1) CN112870722B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130246318A1 (en) * 2011-10-12 2013-09-19 Sony Corporation Information processing apparatus, information processing method, and program
CN111330279A (en) * 2020-02-24 2020-06-26 网易(杭州)网络有限公司 Strategy decision model training method and device for game AI
CN111389011A (en) * 2020-03-12 2020-07-10 网易(杭州)网络有限公司 Game model training method and device, electronic equipment and medium
CN111111220A (en) * 2020-03-26 2020-05-08 腾讯科技(深圳)有限公司 Self-chess-playing model training method and device for multiplayer battle game and computer equipment
CN112221152A (en) * 2020-10-27 2021-01-15 腾讯科技(深圳)有限公司 Artificial intelligence AI model training method, device, equipment and medium
CN112402986A (en) * 2020-11-19 2021-02-26 腾讯科技(深圳)有限公司 Training method and device for reinforcement learning model in battle game

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023138156A1 (en) * 2022-01-20 2023-07-27 腾讯科技(深圳)有限公司 Decision model training method and apparatus, device, storage medium and program product

Also Published As

Publication number Publication date
CN112870722B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN110404264B (en) Multi-person non-complete information game strategy solving method, device and system based on virtual self-game and storage medium
CN111282267B (en) Information processing method, information processing apparatus, information processing medium, and electronic device
CN111291890B (en) Game strategy optimization method, system and storage medium
CN109513215B (en) Object matching method, model training method and server
CN109908591B (en) Virtual object decision method, model construction method and device
CN112016704B (en) AI model training method, model using method, computer device and storage medium
CN108629422A (en) A kind of intelligent body learning method of knowledge based guidance-tactics perception
CN111738294B (en) AI model training method, AI model using method, computer device, and storage medium
CN111957047B (en) Checkpoint configuration data adjustment method, computer equipment and storage medium
CN112402986B (en) Training method and device for reinforcement learning model in battle game
CN112870721B (en) Game interaction method, device, equipment and storage medium
Weber et al. Case-based reasoning for build order in real-time strategy games
CN111569429A (en) Model training method, model using method, computer device and storage medium
WO2023138156A1 (en) Decision model training method and apparatus, device, storage medium and program product
WO2022247791A1 (en) Chess self-learning method and apparatus based on machine learning
CN112870722B (en) Method, device, equipment and medium for generating fighting AI (AI) game model
CN111282272B (en) Information processing method, computer readable medium and electronic device
CN115238891A (en) Decision model training method, and target object strategy control method and device
CN115577795A (en) Policy model optimization method and device and storage medium
CN110598853A (en) Model training method, information processing method and related device
CN116090549A (en) Knowledge-driven multi-agent reinforcement learning decision-making method, system and storage medium
CN112870727B (en) Training and control method for intelligent agent in game
CN112044076B (en) Object control method and device and computer readable storage medium
Baby et al. Implementing artificial intelligence agent within connect 4 using unity3d and machine learning concepts
Armanto et al. Evolutionary Algorithm in Game–A Systematic Review

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40046382

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant