CN112755538B - Real-time strategy game match method based on multiple intelligent agents - Google Patents
- Publication number
- CN112755538B (application CN202110370381.2A)
- Authority
- CN
- China
- Prior art keywords
- search
- node
- value
- strategy
- winning probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/80—Special adaptations for executing a specific game genre or game mode
- A63F13/822—Strategy games; Role-playing games
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
Abstract
The invention provides a multi-agent-based real-time strategy game match method comprising an AERUCT search algorithm: a forward search is carried out with an exploration ratio adaptively adjusted according to the current blood volume (hit points) and win rate, an evaluation value is calculated for each search direction from the current state, and the next search direction is selected according to that evaluation value. The AERUCT search algorithm is an improved UCT search algorithm. AERUCT improves performance in small-scale game scenarios; in large-scale scenarios, however, the number of nodes involved in a search decision grows while the available time is limited, so the UCTRL algorithm stores and updates well-performing strategies, compares them with the AERUCT search result using an evaluation function, selects the child node with the higher winning probability, and reversely updates the state information. Repeating these steps ensures that the current strategy is no worse than the previous one, making each agent more intelligent and improving its learning capability.
Description
Technical Field
The application relates to the fields of reinforcement learning, human-machine confrontation, and multi-agent games, and in particular to a multi-agent-based real-time strategy game match method.
Background
A real-time strategy (RTS) game is a video game that, unlike a turn-based game, unfolds continuously. Players manage resources, build different types of structures, and command their units against opponents. Current research focuses mainly on micro-operations, game strategies, optimal paths, and similar aspects. The game strategy becomes particularly important when both sides have the same number of agents and the same attack capability, so researchers have studied multi-agent game strategies extensively.
Script-based and search-tree algorithms are commonly used in real-time strategy games. A classical script-based strategy algorithm applies one fixed policy throughout a round of play, such as attacking the nearest enemy or attacking the weakest enemy first. The PGS algorithm selects the best action by evaluating multiple scripts. Script-based strategy algorithms make decisions quickly and suit game scenarios with many agents, but they cannot update their policy as the real-time scene changes; once the enemy knows the script, such an algorithm cannot win. Search-tree algorithms obtain better strategies as the search depth increases, examples being the MCTS, Alpha-Beta, and UCT algorithms. The MCTS algorithm fixes the search-tree depth and traverses all possible child nodes to select the best one. The Alpha-Beta algorithm prunes child nodes that cannot yield the best result, which improves search efficiency, but the optimal value is obtained only after the search completes. The UCT algorithm combines the UCB and MCTS algorithms and has advantages in time and space over traditional search algorithms in very large-scale games. Game strategies based on search-tree algorithms usually make better decisions from the real-time scene, but as the number of agents increases, the search depth becomes shallower and the resulting decisions degrade.
Application publication No. CN 111111220 A relates to a self-play model training method, apparatus, computer device, and storage medium for a multiplayer battle game. The method comprises: acquiring historical battle video data; extracting training battle-state features from each state-feature region of the battle video frames and the corresponding operation labels from each battle-operation region; training a battle strategy model on these state features and operation labels; using the battle strategy model to predict operations during battles; acquiring the in-battle state features and the corresponding predicted operation-value labels; training a battle operation-value model on them; and constructing and training a self-play model from the battle strategy model and the battle operation-value model. This method improves the training efficiency of the self-play model.
Application publication No. CN 111437608 A provides an artificial-intelligence-based game match method, apparatus, device, and storage medium. The method comprises: in response to a received instruction to join a game match, acquiring the game match data streams of all participants; performing prediction on the data stream with a trained neural network model that contains at least a self-attention coding module; determining a target game strategy from the prediction result; and sending the target game strategy to a server. In this way the accuracy of the game strategy can be improved.
Disclosure of Invention
In view of the above, in a first aspect, the present invention provides a multi-agent based real-time strategy game match method, including:
AERUCT search algorithm: a forward search is carried out with an exploration ratio adaptively adjusted according to the current blood volume; an evaluation value of each search direction is calculated from the value, traversal count, and exploration ratio of the current node, the evaluation value being the winning probability value calculated by the AERUCT search algorithm; and the next search direction is selected according to that evaluation value.
Preferably, the AERUCT search algorithm is an improved UCT search algorithm, and specifically includes:
(1) starting from the root node, the forward search selects a child node for each non-leaf node;
(2) an exploration ratio is calculated according to the current blood volume;
(3) the evaluation value of each child node (search direction) is calculated from the value, traversal count, and exploration ratio of the current node;
(4) if the current step requires the child node with the maximum node value, the child node with the maximum evaluation value is selected; if it requires the child node with the minimum node value, the child node with the minimum evaluation value is selected;
(5) after the forward search finishes, the values and traversal counts of the nodes on all search paths are updated by backward value propagation.
Preferably, the exploration ratio is a positive correlation function of the blood volume.
Preferably, in the forward search the evaluation value of each child node (search direction) is calculated by the following formula:
where c is a constant that adjusts the exploration ratio.
Preferably, the values and traversal counts of the nodes on all search paths are updated by backward value propagation.
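As a rough illustration of steps (1) through (5), the Python sketch below assumes a UCB1-style evaluation whose exploration term is scaled by a blood-volume-dependent factor; the scaling function, the helper names, and the node attributes are illustrative assumptions rather than the AERUCT formula itself.

```python
# Illustrative sketch only: the blood-volume scaling and the UCB1-style
# evaluation below are assumptions, not the patent's actual AERUCT formulas.
import math

def exploration_ratio(blood_volume, max_blood_volume, c=1.4):
    """Assumed positive-correlation function of the current blood volume."""
    return c * (blood_volume / max_blood_volume)

def evaluate_child(child_value, child_visits, parent_visits, ratio):
    """Evaluation value of one child node (search direction)."""
    if child_visits == 0:
        return float("inf")  # unvisited children are explored first
    exploitation = child_value / child_visits
    exploration = ratio * math.sqrt(math.log(max(parent_visits, 1)) / child_visits)
    return exploitation + exploration

def select_child(children, parent_visits, blood_volume, max_blood_volume,
                 maximize=True):
    """Pick the child with the maximum (or minimum) evaluation value."""
    ratio = exploration_ratio(blood_volume, max_blood_volume)
    scored = [(evaluate_child(ch.value, ch.visits, parent_visits, ratio), ch)
              for ch in children]
    best = max(scored, key=lambda t: t[0]) if maximize \
        else min(scored, key=lambda t: t[0])
    return best[1]
```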
In a second aspect, the present invention provides another multi-agent-based real-time strategy game match method, comprising:
the UCTRL algorithm:
(1) an AERUCT search module is applied: the exploration ratio is adaptively adjusted according to the current blood volume, a forward search is carried out, and an evaluation value of each search direction is calculated from the value, traversal count, and exploration ratio of the current node, the evaluation value being the winning probability value calculated by the AERUCT search algorithm; the AERUCT search module applies part of the AERUCT search algorithm;
(2) an evaluation function selects the winning probability value of the current state's search direction from a strategy pool;
(3) the evaluation function compares the winning probability value selected from the strategy pool with the winning probability value calculated by the AERUCT search algorithm, selects the node with the larger winning probability value as the update node, and updates the state of the strategy pool;
(4) the currently selected action and the update node are then passed to the AERUCT search module, and a new search is started from this update node.
Preferably, the partial AERUCT search algorithm comprises:
(1) starting from the root node, the forward search selects a child node for each non-leaf node;
(2) an exploration ratio is calculated according to the current blood volume;
(3) the evaluation value of each child node (search direction) is calculated from the value, traversal count, and exploration ratio of the current node.
Preferably, the exploration ratio is a positive correlation function of the blood volume and the win rate.
Preferably, in the forward search the evaluation value of each child node (search direction) is calculated by the following formula:
where c is a constant that adjusts the exploration ratio.
Preferably, the strategy pool comprises a memory pool and a forgetting pool; the winning probability value of the current state's search direction is calculated by the memory pool, as follows:
the memory pool records the state s′ most similar to the current state s, and the winning probability value of s′ is taken as the winning probability value of the current state s; the winning probability value of s′ is stored in the memory pool.
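A minimal sketch of this memory-pool lookup, assuming states are represented as numeric feature vectors and that "most similar" means smallest Euclidean distance; both the representation and the similarity measure are assumptions made for illustration.

```python
# Sketch of the memory-pool lookup; the feature-vector representation and the
# Euclidean similarity measure are illustrative assumptions.
import math

class MemoryPool:
    def __init__(self):
        self.entries = []  # list of (state_features, winning_probability)

    def store(self, state_features, winning_probability):
        self.entries.append((list(state_features), winning_probability))

    def winning_probability(self, state_features):
        """Return the winning probability of the stored state s' most similar
        to the current state s, used as the probability of s itself."""
        if not self.entries:
            return 0.5  # assumed neutral default when the pool is empty
        def distance(entry):
            stored, _ = entry
            return math.dist(stored, state_features)
        _, prob = min(self.entries, key=distance)
        return prob
```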
Preferably, the child node with the largest winning probability value is selected as follows: the winning probability value of the selected child node's search direction obtained by the AERUCT search algorithm is compared with the winning probability value of that search direction in the strategy pool, and the child node with the higher winning probability value is selected as the optimal child node.
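One way to read this comparison is that, for each candidate child, the evaluation function keeps whichever of the two winning-probability estimates (AERUCT search versus strategy pool) is larger and then picks the child with the highest resulting value; the sketch below assumes exactly that reading.

```python
# Sketch of the evaluation function's comparison step; the tie handling and
# the data layout (dicts keyed by child node) are illustrative assumptions.
def select_best_child(aeruct_probs, pool_probs):
    """aeruct_probs / pool_probs: {child_node: winning probability}.
    For each child, take the larger of the AERUCT estimate and the strategy
    pool estimate, then return the child with the highest resulting value."""
    best_child, best_prob = None, float("-inf")
    for child, p_search in aeruct_probs.items():
        p = max(p_search, pool_probs.get(child, float("-inf")))
        if p > best_prob:
            best_child, best_prob = child, p
    return best_child, best_prob
```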
Preferably, when the strategy pool state is updated, the values of all child nodes passed through are updated.
Compared with the prior art, the technical scheme provided by the embodiments of the application has the following advantages:
(1) in small-scale game scenarios, the AERUCT search algorithm makes better decisions and can update the exploration-exploitation ratio according to the real-time state of the game;
(2) the UCTRL algorithm introduces reinforcement learning and a memory pool, so that the agent can continuously learn from the reward values obtained while interacting with the environment, adapt to that environment, and store better strategies for subsequent decisions in game scenarios of all scales.
Drawings
Fig. 1 is a diagram of a UCTRL algorithm structure according to an embodiment of the present invention;
fig. 2 is a data flow diagram of the UCTRL algorithm provided by the embodiment of the present invention;
in the figure: 1-AERUCT search module, 2-strategy pool, 21-memory pool, 22-forgetting pool, 3-evaluation function, and 4-reverse update module.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The UCT search algorithm suits continuous real-time strategy games and can give action feedback to multiple agents in a continuous space; however, its exploration ratio during search decisions is fixed and cannot adapt to changes in the real-time scene.
In some embodiments, for small-scale game scenarios, the present application provides a multi-agent-based real-time strategy game match method whose AERUCT search algorithm comprises:
a forward search carried out with an exploration ratio adaptively adjusted according to the current blood volume and win rate; an evaluation value of each search direction is calculated from the value, traversal count, and exploration ratio of the current node, and the next search direction is selected according to that evaluation value.
The AERUCT search algorithm is an improved UCT search algorithm and specifically comprises the following steps:
(1) starting from the root node, the forward search selects a child node for each non-leaf node;
(2) an exploration ratio is calculated according to the current blood volume and win rate; the exploration ratio is a positive correlation function of the blood volume and the win rate;
(3) the evaluation value of each child node (search direction) is calculated from the value, traversal count, and exploration ratio of the current node;
The current state is the environment state, for example information such as the game situation, and each node corresponds to a different state. The value of a node is the value of the current state; the value and traversal count of each node are stored so that new node values can be calculated in subsequent updates.
In the forward search, the evaluation value of each child node (search direction) is calculated by the following formula:
where c is a constant that adjusts the exploration ratio.
(4) If the current step requires the child node with the maximum node value, the child node with the maximum evaluation value is selected; if it requires the child node with the minimum node value, the child node with the minimum evaluation value is selected;
(5) after the forward search finishes, the values and traversal counts of the nodes on all search paths are updated by backward value propagation, according to the following formula:
where T is the traversal count of the parent node.
The depth of the search tree is limited. A value v and a count T must be initialized when child nodes are expanded: the initial v of an expanded child node is taken as the average of multiple simulation results of the search tree, and T is initialized to 0.
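Under this initialization rule (v starts as the average of several simulation results and T starts at 0), child expansion and the backward value propagation of step (5) might look like the sketch below, which reuses the SearchNode layout sketched earlier; the simulation helper and the incremental-mean update rule are assumptions.

```python
# Sketch of child expansion and backward value propagation; the simulation
# helper and the incremental-mean update rule are illustrative assumptions.
def expand_child(parent, child_state, simulate, num_simulations=10):
    """Create a child whose initial v is the mean of several simulation
    results from child_state, with its traversal count T initialized to 0."""
    results = [simulate(child_state) for _ in range(num_simulations)]
    child = SearchNode(state=child_state,
                       value=sum(results) / len(results),
                       visits=0,
                       parent=parent)
    parent.children.append(child)
    return child

def backpropagate(leaf, result):
    """Update value and traversal count of every node on the search path."""
    node = leaf
    while node is not None:
        node.visits += 1
        # incremental mean of observed results (assumed update rule)
        node.value += (result - node.value) / node.visits
        node = node.parent
```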
In some embodiments, in large-scale game scenarios the number of nodes involved in a search decision increases while the available time is limited, so the search depth decreases and a good decision strategy is hard to obtain. To address this, well-performing strategies are stored and updated, compared with the result of the AERUCT search, the child node with the higher winning probability is selected, and the state information is updated backward; repeating these operations ensures that the current strategy is no worse than the previous one, making each agent more intelligent and improving its learning capability. In the multi-agent-based real-time strategy game match method, the UCTRL algorithm introduces reinforcement learning and a memory pool, so that knowledge is learned continuously from the reward values obtained while the agent interacts with the environment and the algorithm adapts to that environment. A reinforcement learning algorithm updates its own model from previous sample experience and uses the current model to guide the next action; after that action the model is updated again, and the iteration continues until the model converges. In a reinforcement learning algorithm each agent has a definite goal: it perceives its environment and directs its behavior toward that goal, so the algorithm treats the agent and the uncertain environment as one complete problem. Each action depends not only on the current action and environment of the current time period but also on historical feedback from previous time periods.
As shown in fig. 1 and 2, the UCTRL algorithm includes:
(1) an AERUCT search module 1 is applied: the exploration ratio is adaptively adjusted according to the current blood volume, a forward search is carried out, and an evaluation value of each search direction is calculated from the value, traversal count, and exploration ratio of the current node; the AERUCT search module 1 applies part of the AERUCT search algorithm;
In some embodiments, the partial AERUCT search algorithm comprises:
(a) starting from the root node, the forward search selects a child node for each non-leaf node;
(b) an exploration ratio is calculated according to the current blood volume; the exploration ratio is a positive correlation function of the blood volume and the win rate;
(c) the evaluation value of each child node (search direction) is calculated from the value, traversal count, and exploration ratio of the current node.
The evaluation value of each child node (search direction) is calculated by the following formula:
where c is a constant that adjusts the exploration ratio.
(2) The winning probability value of the current state's search direction is selected from the strategy pool 2. The strategy pool comprises a memory pool 21 and a forgetting pool 22; the winning probability value of the current state's search direction is calculated by the memory pool 21, as follows:
the memory pool 21 records the state s′ most similar to the current state s, and the winning probability value of s′ is taken as the winning probability value of the current state s; the winning probability value of s′ is stored in the memory pool;
(3) the evaluation function compares the winning probability value selected from the strategy pool with the winning probability value calculated by the AERUCT search algorithm, selects the node with the larger winning probability value as the update node, and updates the state of the strategy pool;
when the strategy pool state is updated, the values of all child nodes passed through are updated by the following formula:
where T is the traversal count of the parent node.
(4) The currently selected action and the update node are then passed to the AERUCT search module, and a new search is started from this update node.
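Combining the helper sketches above, one possible reading of the UCTRL data flow shown in figs. 1 and 2 is the loop below; every interface name, and the use of the AERUCT evaluation value as a winning-probability estimate, is an illustrative assumption.

```python
# High-level sketch of one UCTRL decision step; the module interfaces and the
# treatment of the AERUCT evaluation value as a winning probability are
# illustrative assumptions about how the modules in figs. 1 and 2 interact.
def uctrl_step(root, blood_volume, max_blood_volume, memory_pool):
    parent_visits = max(root.visits, 1)
    ratio = exploration_ratio(blood_volume, max_blood_volume)
    # (1) AERUCT search module: forward search with adaptive exploration ratio
    aeruct_probs = {
        child: evaluate_child(child.value, child.visits, parent_visits, ratio)
        for child in root.children
    }
    # (2) evaluation function reads the strategy pool (its memory pool part)
    pool_probs = {child: memory_pool.winning_probability(child.state)
                  for child in root.children}
    # (3) compare the two estimates, keep the node with the larger value,
    #     and refresh the strategy pool with that result
    update_node, prob = select_best_child(aeruct_probs, pool_probs)
    memory_pool.store(update_node.state, prob)
    # (4) hand the chosen action / update node back to the AERUCT module,
    #     which starts its next search from this update node
    return update_node
```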
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (6)
1. A multi-agent-based real-time strategy game match method, wherein the exploration ratio is updated according to the real-time state of the game while the agent interacts with the environment, comprising:
(1) applying an AERUCT search module, adaptively adjusting an exploration ratio according to the current blood volume, carrying out a forward search, and calculating an evaluation value of each search direction, namely a winning probability value, from the value, traversal count, and exploration ratio of the current node; the AERUCT search module applies an AERUCT search algorithm;
(2) applying an evaluation function to select the winning probability value of the current state's search direction from a strategy pool;
(3) using the evaluation function to compare the winning probability value selected from the strategy pool with the winning probability value calculated by the AERUCT search algorithm, selecting the node with the larger winning probability value as an update node, and updating the state of the strategy pool;
(4) passing the currently selected action and the update node to the AERUCT search module, and starting a new search from the update node;
wherein the AERUCT search algorithm comprises:
(11) starting from the root node, the forward search selects a child node for each non-leaf node;
(12) an exploration ratio is calculated according to the current blood volume;
(13) the evaluation value of each child node (search direction) is calculated from the value, traversal count, and exploration ratio of the current node.
2. The multi-agent-based real-time strategy game match method of claim 1, wherein the exploration ratio is a positive correlation function of the blood volume.
3. The multi-agent-based real-time strategy game match method of claim 2, wherein in the forward search the evaluation value of each child node (search direction) is calculated by the following formula:
where c is a constant that adjusts the exploration ratio.
4. The multi-agent-based real-time strategy game match method of claim 3, wherein the strategy pool comprises a memory pool and a forgetting pool; the winning probability value of the current state's search direction is calculated by the memory pool, as follows:
the memory pool records the state s′ most similar to the current state s, and the winning probability value of s′ is taken as the winning probability value of the current state s; the winning probability value of s′ is stored in the memory pool.
5. The multi-agent-based real-time strategy game match method of claim 4, wherein the node with the highest winning probability value is selected by comparing the winning probability value of the selected child node's search direction obtained by the AERUCT search algorithm with the winning probability value of that search direction in the strategy pool, and selecting the child node with the higher winning probability value as the optimal child node.
6. The multi-agent-based real-time strategy game match method of claim 5, wherein when the strategy pool state is updated, the values of all child nodes passed through are updated by the following formula:
where T is the traversal count of the parent node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110370381.2A CN112755538B (en) | 2021-04-07 | 2021-04-07 | Real-time strategy game match method based on multiple intelligent agents |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110370381.2A CN112755538B (en) | 2021-04-07 | 2021-04-07 | Real-time strategy game match method based on multiple intelligent agents |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112755538A CN112755538A (en) | 2021-05-07 |
CN112755538B (en) | 2021-08-31
Family
ID=75691416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110370381.2A Active CN112755538B (en) | 2021-04-07 | 2021-04-07 | Real-time strategy game match method based on multiple intelligent agents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112755538B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420226A (en) * | 2021-07-20 | 2021-09-21 | NetEase (Hangzhou) Network Co., Ltd. | Card recommendation method and device, electronic equipment and computer readable medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105999689A (en) * | 2016-05-30 | 2016-10-12 | 北京理工大学 | AI algorithm for game of the Amazons based on computer game playing |
CN110083748A (en) * | 2019-04-30 | 2019-08-02 | 南京邮电大学 | A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6415274B1 (en) * | 1999-06-24 | 2002-07-02 | Sandia Corporation | Alpha-beta coordination method for collective search |
US9147316B2 (en) * | 2012-07-19 | 2015-09-29 | David Hardcastle | Method and apparatus that facilitates pooling lottery winnings via a relational structure |
CN107038477A (en) * | 2016-08-10 | 2017-08-11 | Harbin Institute of Technology Shenzhen Graduate School | Estimation method combining a neural network with Q-learning under incomplete information |
CN107050839A (en) * | 2017-04-14 | 2017-08-18 | Anhui University | Amazons computer game-playing system based on the UCT algorithm |
-
2021
- 2021-04-07 CN CN202110370381.2A patent/CN112755538B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105999689A (en) * | 2016-05-30 | 2016-10-12 | 北京理工大学 | AI algorithm for game of the Amazons based on computer game playing |
CN110083748A (en) * | 2019-04-30 | 2019-08-02 | 南京邮电大学 | A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree |
Also Published As
Publication number | Publication date |
---|---|
CN112755538A (en) | 2021-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wu et al. | Training agent for first-person shooter game with actor-critic curriculum learning | |
Justesen et al. | Learning macromanagement in starcraft from replays using deep learning | |
CN108211362B (en) | Non-player character combat strategy learning method based on deep Q learning network | |
CN110141867B (en) | Game intelligent agent training method and device | |
CA3060900A1 (en) | System and method for deep reinforcement learning | |
Andrade et al. | Challenge-sensitive action selection: an application to game balancing | |
Liu et al. | Evolving game skill-depth using general video game ai agents | |
CN104102522B (en) | The artificial emotion driving method of intelligent non-player roles in interactive entertainment | |
CN112870721B (en) | Game interaction method, device, equipment and storage medium | |
CN109925717B (en) | Game victory rate prediction method, model generation method and device | |
Knegt et al. | Opponent modelling in the game of Tron using reinforcement learning | |
Tang et al. | A review of computational intelligence for StarCraft AI | |
Gemine et al. | Imitative learning for real-time strategy games | |
Gajurel et al. | Neuroevolution for rts micro | |
CN112755538B (en) | Real-time strategy game match method based on multiple intelligent agents | |
Nam et al. | Generation of diverse stages in turn-based role-playing game using reinforcement learning | |
CN114404975A (en) | Method, device, equipment, storage medium and program product for training decision model | |
Singal et al. | Modeling decisions in games using reinforcement learning | |
Zhen et al. | Neuroevolution for micromanagement in the real-time strategy game StarCraft: Brood War | |
CN114344889B (en) | Game strategy model generation method and control method of intelligent agent in game | |
Justesen et al. | Learning a behavioral repertoire from demonstrations | |
CN111882072A (en) | Intelligent model automatic course training method for playing chess with rules | |
CN114581834A (en) | Curling decision method for deep reinforcement learning based on Monte Carlo tree search | |
Ansó et al. | Deep reinforcement learning for pellet eating in Agar.io | |
Langenhoven et al. | Swarm tetris: Applying particle swarm optimization to tetris |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |