CN112755538B - Real-time strategy game match method based on multiple intelligent agents - Google Patents

Real-time strategy game match method based on multiple intelligent agents Download PDF

Info

Publication number
CN112755538B
CN112755538B CN202110370381.2A CN202110370381A CN112755538B CN 112755538 B CN112755538 B CN 112755538B CN 202110370381 A CN202110370381 A CN 202110370381A CN 112755538 B CN112755538 B CN 112755538B
Authority
CN
China
Prior art keywords
search
node
value
strategy
winning probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110370381.2A
Other languages
Chinese (zh)
Other versions
CN112755538A (en
Inventor
张俊格
尹奇跃
于彤彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110370381.2A priority Critical patent/CN112755538B/en
Publication of CN112755538A publication Critical patent/CN112755538A/en
Application granted granted Critical
Publication of CN112755538B publication Critical patent/CN112755538B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/80Special adaptations for executing a specific game genre or game mode
    • A63F13/822Strategy games; Role-playing games
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)

Abstract

The invention provides a real-time strategy game match method based on multiple intelligent agents, which comprises the following steps: AERUCT search algorithm: carrying out forward search according to the current blood volume and the adaptive adjustment exploration ratio of the winning rate, calculating an evaluation value of a search direction according to the current state, and selecting the next search direction according to the evaluation value of the search direction; the AERUCT search algorithm is an improved UCT search algorithm; the performance of the AERUCT search algorithm is improved in a small-scale game scene, but because the number of nodes for large-scale game scene search decision is increased and is limited by time, the UCTRL algorithm evaluates and selects child nodes with high winning rate by comparing a strategy with good storage and update performance with the result of the AERUCT search, and reversely updates state information, and the steps are repeated, so that the current strategy is not inferior to the previous strategy, each intelligent agent is more intelligent, and the learning capacity is improved.

Description

Real-time strategy game match method based on multiple intelligent agents
Technical Field
The application relates to the field of reinforcement learning, man-machine confrontation and multi-agent games, in particular to a real-time strategy game match method based on multiple agents.
Background
Real-time strategy (RTS) game is not a turn-based game, but a video game. Players manage resources, build different types of structures, and instruct them how to fight opponents. The current research is mainly focused on the aspects of micro-operations, game strategies, optimal paths and the like. The game strategy is particularly important when the number of agents and the attack capabilities of both parties are the same. Therefore, researchers have made a great deal of research into multi-agent gaming strategies.
Script-based and search tree algorithms are commonly used in real-time strategic games, and classical script-based strategic algorithms use a strategy in a round of game play, such as attacking the nearest enemy or attacking the weakest enemy first, etc. The PGS algorithm selects the best action by evaluating multiple scripts. The strategy algorithm based on the script can make a decision quickly and is suitable for a game scene with a plurality of intelligent agents. However, it cannot update the policy according to the change of the real-time scene. In this case, you cannot win once the enemy knows your script algorithm. Search tree algorithms get better strategies as the search depth deepens, such as MCTS, Alpha-Beta, and UCT algorithms. The MCTS algorithm specifies the search tree depth and traverses all possible child nodes to select the best child node. The Alpha-beta algorithm prunes child nodes that are unlikely to have the best results, which improves search efficiency. However, the optimum value can be obtained only after the search is completed. The UCT algorithm is a combination of the UCB and MCTS algorithms. It has advantages in time and space over traditional search algorithms in the course of very large-scale games. Search tree algorithm based game strategies typically make better decisions based on real-time scenarios. But as the number of agents increases, the search depth will become shallower and the search decision obtained will be degraded.
Application publication No. CN 111111220 a relates to a self-playing model training method, device, computer device and storage medium for a multiplayer battle game. The method comprises the following steps: acquiring historical battle video data; acquiring training fight state characteristics from each state characteristic region in a fight video frame of historical fight video data, and acquiring operation labels corresponding to the training fight state characteristics from each fight operation region in the fight video frame; training based on the training fighting state characteristics and the operation labels to obtain a fighting strategy model; predicting operation to carry out the fight based on the fight state characteristics in the fight through a fight strategy model; acquiring the fighting state characteristics in the fighting and the corresponding predicted operation value labels; training a fighting operation value model based on the fighting state characteristics and the operation value labels; and constructing a self-playing model according to the fighting strategy model and the fighting operation value model and training. By adopting the method, the training efficiency of the self-playing model can be improved.
Application publication No. CN 111437608A provides a game match method, device, equipment and storage medium based on artificial intelligence; the method comprises the following steps: responding to a received operation instruction for joining the game play, and acquiring game play data streams of all participants in the game play; carrying out prediction operation on the game match data stream through a trained neural network model to obtain a prediction result, wherein the trained neural network model at least comprises a self-attention coding module; determining a target game strategy based on the prediction result; and sending the target game strategy to a server. Thus, the accuracy of the game strategy can be improved.
Disclosure of Invention
In view of the above, in a first aspect, the present invention provides a multi-agent based real-time strategy game match method, including:
AERUCT search algorithm: and carrying out forward search according to the current blood volume self-adaptive adjustment search ratio, calculating an evaluation value of a search direction according to the value, the traversal times and the search ratio of the current node, wherein the evaluation value is a winning probability value calculated by an AERUCT search algorithm, and selecting the next search direction according to the evaluation value of the search direction.
Preferably, the AERUCT search algorithm is an improved UCT search algorithm, and specifically includes:
(1) the forward search selects a child node for each non-leaf node starting from a root node;
(2) calculating an exploration ratio according to the current blood volume;
(3) calculating the evaluation value of each child node in the searching direction according to the value, the traversal times and the exploration ratio of the current node;
(4) if the child node with the maximum value of the node is required to be selected at present, the child node with the maximum evaluation value is selected; if the child node with the minimum value of the node is required to be selected at present, the child node with the minimum evaluation value is selected;
(5) and after the forward search is finished, updating the values and the traversal times of the nodes on all the search paths in a reverse value transmission mode.
Preferably, the exploratory ratio is a positive correlation function of blood volume.
Preferably, the forward search specifically calculates the evaluation value of each child node in the search direction by:
Figure DEST_PATH_IMAGE001
Figure 941225DEST_PATH_IMAGE002
wherein,
Figure DEST_PATH_IMAGE003
: an evaluation value of a search direction;
Figure 575338DEST_PATH_IMAGE004
: node point
Figure DEST_PATH_IMAGE005
The value of (D);
Figure 437115DEST_PATH_IMAGE006
: current child node
Figure 55047DEST_PATH_IMAGE005
The number of traversals;
Figure DEST_PATH_IMAGE007
: searching ratio;
Figure 293261DEST_PATH_IMAGE008
: the sum of all blood volumes;
Figure DEST_PATH_IMAGE009
: current child node
Figure 63771DEST_PATH_IMAGE005
Blood volume of
c, adjusting a constant of the exploration ratio.
Preferably, the specific method for updating the values and the traversal times of the nodes on all the search paths in the backward value transfer manner includes:
Figure 431471DEST_PATH_IMAGE010
wherein,
Figure DEST_PATH_IMAGE011
: nodes on updated search path
Figure 287432DEST_PATH_IMAGE005
The value of (D);
Figure 594916DEST_PATH_IMAGE012
: the traversal times of the father node;
Figure 219932DEST_PATH_IMAGE004
: child node
Figure 282566DEST_PATH_IMAGE005
The value of (D);
Figure 343932DEST_PATH_IMAGE006
: current child node
Figure 783004DEST_PATH_IMAGE005
The number of traversals.
In a second aspect, the present invention provides another multi-agent based real-time strategy game play method, comprising:
the UCTRL algorithm:
(1) applying an AERUCT search module, adaptively adjusting an exploration ratio according to the current blood volume, carrying out forward search, and calculating an evaluation value of a search direction according to the value, the traversal times and the exploration ratio of the current node, wherein the evaluation value is a winning probability value calculated by an AERUCT search algorithm; the AERUCT search module applies part of an AERUCT search algorithm;
(2) selecting winning probability values of the search directions of the current state in a strategy pool by the evaluation function;
(3) comparing the winning probability value of the selected search direction in the strategy pool with the winning probability value calculated by the AERUCT search algorithm by using an evaluation function, selecting the node with the maximum winning probability value as an update node, and updating the state of the strategy pool;
(4) the currently selected action and the updated node are then passed to the AERUCT search module, and a new search is initiated from this updated node.
Preferably, the partial AERUCT search algorithm comprises:
(1) the forward search selects a child node for each non-leaf node starting from a root node;
(2) calculating an exploration ratio according to the current blood volume;
(3) and calculating the evaluation value of each child node in the searching direction according to the value, the traversal times and the exploration ratio of the current node.
Preferably, the exploration ratio is a positive correlation function of blood volume and odds ratio.
Preferably, the forward search specifically calculates the evaluation value of each child node in the search direction by:
Figure DEST_PATH_IMAGE013
Figure 465789DEST_PATH_IMAGE002
wherein,
Figure 902587DEST_PATH_IMAGE003
: an evaluation value of a search direction;
Figure 982407DEST_PATH_IMAGE004
: node point
Figure 897273DEST_PATH_IMAGE005
The value of (D);
Figure 231303DEST_PATH_IMAGE006
: current child node
Figure 839002DEST_PATH_IMAGE005
The number of traversals;
Figure 688009DEST_PATH_IMAGE007
: searching ratio;
Figure 655834DEST_PATH_IMAGE014
: the sum of all blood volumes;
Figure DEST_PATH_IMAGE015
: current child node
Figure 313211DEST_PATH_IMAGE005
The blood volume of (c);
c, adjusting a constant of the exploration ratio.
Preferably, the policy pool comprises: a memory pool and a forgetting pool; the winning probability value of the search direction of the current state is calculated by a memory pool; the method for calculating the winning probability value of the search direction of the current state comprises the following steps:
recording a state s 'most similar to the current state s in the memory pool, and taking the winning probability value of s' as the winning probability value of the current s; the winning probability value of s' is stored in a memory pool.
Preferably, the method for selecting the child node with the largest winning probability value is as follows: and comparing the winning probability value of the searching direction of the selected child node with the winning probability value of the searching direction in the strategy pool through the AERUCT searching algorithm, and selecting the child node with the high winning probability value as the optimal child node.
Preferably, the method for updating the value of all child nodes passing through the policy pool state includes:
Figure 888549DEST_PATH_IMAGE010
wherein,
Figure 162535DEST_PATH_IMAGE011
: nodes on updated search path
Figure 934051DEST_PATH_IMAGE005
The value of (D);
Figure 39411DEST_PATH_IMAGE012
: the traversal times of the father node;
Figure 988912DEST_PATH_IMAGE004
: child node
Figure 937145DEST_PATH_IMAGE005
The value of (D);
Figure 997505DEST_PATH_IMAGE006
: current child node
Figure 895054DEST_PATH_IMAGE005
The number of traversals.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
the method provided by the embodiment of the application has the following advantages that:
(1) in a small-scale game scene, the AERUCT search algorithm has better decision effect and can update the exploration and development ratio according to the real-time state of the game;
(2) the UCTRL algorithm introduces a reinforcement learning idea and a memory pool, so that knowledge can be continuously learned according to the acquired reward value when an agent interacts with the environment, the UCTRL algorithm adapts to the environment, and better strategies can be stored in game scenes of various scales for subsequent decisions.
Drawings
Fig. 1 is a diagram of a UCTRL algorithm structure according to an embodiment of the present invention;
fig. 2 is a data flow diagram of the UCTRL algorithm provided by the embodiment of the present invention;
in the figure: 1-AERUCT search module, 2-strategy pool, 21-memory pool, 22-forgetting pool, 3-evaluation function, and 4-reverse update module.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The UCT search algorithm is suitable for continuous real-time strategy games and can give action feedback to a plurality of agents in a continuous space, however, the exploration ratio of the UCT search algorithm in search decision is fixed and cannot be changed adaptively according to the change of a real-time scene.
In some embodiments, in a small-scale game scenario, the embodiment of the present application provides a multi-agent-based real-time strategy game match method, wherein the AERUCT search algorithm comprises:
and carrying out forward search according to the current blood volume and the adaptive search ratio adjustment of the winning rate, calculating an evaluation value of a search direction according to the value, the traversal times and the search ratio of the current node, and selecting the next search direction according to the evaluation value of the search direction.
The AERUCT search algorithm is an improved UCT search algorithm, and specifically comprises the following steps:
(1) the forward search selects a child node for each non-leaf node starting from a root node;
(2) calculating an exploration ratio according to the current blood volume and the winning rate; the exploration ratio is a positive correlation function of blood volume and the winning rate;
(3) calculating the evaluation value of each child node in the searching direction according to the value, the traversal times and the exploration ratio of the current node;
the current state is the ambient state. If the game environment is information such as the situation of the game, each node is in a different state. The value of a node is the value of the current state and the preservation of the value and traversal times of each node is the calculation of the new value of the node for subsequent updates.
The specific method for calculating the evaluation value of each child node in the search direction in the forward search comprises the following steps:
Figure 999145DEST_PATH_IMAGE016
Figure 310041DEST_PATH_IMAGE002
wherein,
Figure DEST_PATH_IMAGE017
: an evaluation value of a search direction;
Figure 377354DEST_PATH_IMAGE004
: node point
Figure 191726DEST_PATH_IMAGE005
The value of (D);
Figure 483030DEST_PATH_IMAGE018
: current child node
Figure 468173DEST_PATH_IMAGE005
The number of traversals;
Figure 870335DEST_PATH_IMAGE007
: searching ratio;
Figure 804793DEST_PATH_IMAGE008
: the sum of all blood volumes;
Figure 532578DEST_PATH_IMAGE009
: current child node
Figure 224590DEST_PATH_IMAGE005
The blood volume of (c);
c, adjusting a constant of the exploration ratio.
(4) If the child node with the maximum value of the node is required to be selected at present, the child node with the maximum evaluation value is selected; if the child node with the minimum value of the node is required to be selected at present, the child node with the minimum evaluation value is selected;
(5) after the forward search is finished, updating the values and the traversal times of the nodes on all the search paths in a reverse value transmission mode; the specific method comprises the following steps:
Figure 492760DEST_PATH_IMAGE010
wherein,
Figure 468676DEST_PATH_IMAGE011
: nodes on updated search path
Figure 898520DEST_PATH_IMAGE005
The value of (D);
t: the traversal times of the father node;
Figure 608987DEST_PATH_IMAGE004
: child node
Figure 352952DEST_PATH_IMAGE005
The value of (D);
Figure 465264DEST_PATH_IMAGE006
: current child node
Figure 518540DEST_PATH_IMAGE005
The number of traversals.
The depth of a search tree is limited, a v value and a T value need to be initialized while child nodes are expanded, the initial value of v for expanding the child nodes is assumed to be the average value of multiple simulation results of the search tree, and T is initialized to be 0.
In some embodiments, because the number of nodes for large-scale game scene search decision is increased and limited by time, the search depth is reduced, and a better decision strategy is difficult to make, aiming at the problems, the strategy with good updating performance is stored and compared with the result of AERUCT search, the child nodes with high selection rate are evaluated, the state information is updated reversely, and the repeated operation is repeated to ensure that the current strategy is not inferior to the previous strategy, so that each intelligent agent is more intelligent, and the learning capability is improved. According to the real-time strategy game match method based on the multiple intelligent agents, a UCTRL algorithm, such as a reinforcement learning and memory pool, can continuously learn knowledge according to obtained reward values when the intelligent agents interact with the environment, and is suitable for the environment; reinforcement learning algorithms learn to update their own models based on previous sample experience and use the current model to guide the next action. The algorithm then updates the model after the next action. Finally, the algorithm iterates until the model converges. In reinforcement learning algorithms, agents have a definite goal. All agents recognize their environment and direct their behavior towards their goals. Therefore, the reinforcement learning algorithm considers the agent and the uncertain environment as a complete problem. Each action of the algorithm is not only related to the current action and environment of the current time period, but also to historical feedback information of previous time periods.
As shown in fig. 1 and 2, the UCTRL algorithm includes:
(1) applying an AERUCT searching module 1, adaptively adjusting an exploration ratio according to the current blood volume, carrying out forward search, and calculating an evaluation value of a searching direction according to the value, the traversal times and the exploration ratio of the current node; the AERUCT searching module 1 applies partial AERUCT searching algorithm;
in some embodiments, the partial AERUCT search algorithm comprises:
(a) the forward search selects a child node for each non-leaf node starting from a root node;
(b) calculating an exploration ratio according to the current blood volume; said exploration ratio is a positive correlation function of said blood volume and said odds;
(c) and calculating the evaluation value of each child node in the searching direction according to the value, the traversal times and the exploration ratio of the current node.
The specific method for calculating the evaluation value of each child node search direction is as follows:
Figure 450724DEST_PATH_IMAGE013
Figure 263959DEST_PATH_IMAGE002
wherein,
Figure 230778DEST_PATH_IMAGE003
: an evaluation value of a search direction;
Figure 454955DEST_PATH_IMAGE004
: node point
Figure 405593DEST_PATH_IMAGE005
The value of (D);
Figure 225782DEST_PATH_IMAGE018
: current child node
Figure 30796DEST_PATH_IMAGE005
The number of traversals;
Figure 176606DEST_PATH_IMAGE007
: searching ratio;
Figure DEST_PATH_IMAGE019
: the sum of all blood volumes;
Figure 83382DEST_PATH_IMAGE020
: current child node
Figure 238420DEST_PATH_IMAGE005
The blood volume of (c);
c, adjusting a constant of the exploration ratio.
(2) Selecting winning probability values of the search directions of the current state in the strategy pool 2; the policy pool includes: a memory pool 21 and a forgetting pool 22; the winning probability value of the search direction of the current state is calculated by a memory pool 21; the method for calculating the winning probability value of the search direction of the current state comprises the following steps:
the memory pool 21 records the state s 'most similar to the current state s, and then the winning probability value of s' is taken as the winning probability value of the current s; the winning probability value of s' is stored in a memory pool;
(3) comparing the winning probability value of the selected search direction in the strategy pool with the winning probability value calculated by the AERUCT search algorithm by using an evaluation function, selecting the node with the maximum winning probability value as an update node, and updating the state of the strategy pool;
the method for updating the value of all the child nodes passing through the strategy pool state comprises the following steps:
Figure 710990DEST_PATH_IMAGE010
wherein,
Figure 542549DEST_PATH_IMAGE011
: nodes on updated search path
Figure 671042DEST_PATH_IMAGE005
The value of (D);
T: the traversal times of the father node;
Figure 364191DEST_PATH_IMAGE004
: child node
Figure 956846DEST_PATH_IMAGE005
The value of (D);
Figure 444460DEST_PATH_IMAGE006
: current child node
Figure 856986DEST_PATH_IMAGE005
The number of traversals.
(4) The currently selected action and the updated node are then passed to the AERUCT search module, and a new search is initiated from this updated node.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A multi-agent based real-time strategy game match method, wherein when an agent interacts with the environment, the exploration ratio is updated according to the real-time status of the game, comprising:
(1) applying an AERUCT search module, adaptively adjusting an exploration ratio according to the current blood volume, carrying out forward search, and calculating an evaluation value of a search direction, namely a winning probability value according to the value, the traversal times and the exploration ratio of the current node; the AERUCT search module applies an AERUCT search algorithm;
(2) selecting a winning probability value of the search direction of the current state in the strategy pool by applying an evaluation function;
(3) comparing the winning probability value of the selected search direction in the strategy pool with the winning probability value calculated by the AERUCT search algorithm by using an evaluation function, selecting the node with the maximum winning probability value as an update node, and updating the state of the strategy pool;
(4) then the currently selected action and the updating node are transmitted to an AERUCT searching module, and new searching is started from the updating node;
the AERUCT search algorithm comprises:
(11) the forward search selects a child node for each non-leaf node starting from a root node;
(12) calculating an exploration ratio according to the current blood volume;
(13) and calculating the evaluation value of each child node in the searching direction according to the value, the traversal times and the exploration ratio of the current node.
2. The multi-agent based real-time strategic game strategy approach of claim 1, wherein said heuristic ratio is a positive correlation function of blood volume.
3. The multi-agent based real-time strategy game play-alignment method according to claim 2, wherein the forward search, the specific method of calculating the evaluation value of each child node search direction is:
Figure 706590DEST_PATH_IMAGE001
Figure 307335DEST_PATH_IMAGE002
wherein,
Figure 239519DEST_PATH_IMAGE003
: an evaluation value of a search direction;
Figure 583913DEST_PATH_IMAGE004
: child node
Figure 285153DEST_PATH_IMAGE005
The value of (D);
Figure 322379DEST_PATH_IMAGE006
: current child node
Figure 476280DEST_PATH_IMAGE005
The number of traversals;
Figure 624364DEST_PATH_IMAGE007
: searching ratio;
Figure 432308DEST_PATH_IMAGE008
: the sum of all blood volumes;
Figure 640435DEST_PATH_IMAGE009
: current child node
Figure 281632DEST_PATH_IMAGE005
The blood volume of (c);
c, adjusting a constant of the exploration ratio.
4. The multi-agent based real-time strategic game strategy approach of claim 3, wherein said strategy pool comprises: a memory pool and a forgetting pool; the winning probability value of the search direction of the current state is calculated by a memory pool; the method for calculating the winning probability value of the search direction of the current state comprises the following steps:
recording a state s 'most similar to the current state s in the memory pool, and taking the winning probability value of s' as the winning probability value of the current state s; the winning probability value of s' is stored in a memory pool.
5. The multi-agent based real-time strategic game strategy approach of claim 4, wherein said method of selecting the node with the highest winning probability value is: and comparing the winning probability value of the searching direction of the selected child node with the winning probability value of the searching direction in the strategy pool through the AERUCT searching algorithm, and selecting the child node with the high winning probability value as the optimal child node.
6. The multi-agent based real-time strategy game strategy method of claim 5, wherein the value of all the child nodes passing through is updated when the strategy pool status is updated, by:
Figure 967829DEST_PATH_IMAGE010
wherein,
Figure 643660DEST_PATH_IMAGE011
: nodes on updated search path
Figure 757110DEST_PATH_IMAGE005
The value of (D);
T: the traversal times of the father node;
Figure 947920DEST_PATH_IMAGE004
: child node
Figure 375490DEST_PATH_IMAGE005
The value of (D);
Figure 702566DEST_PATH_IMAGE006
: current child node
Figure 439447DEST_PATH_IMAGE005
The number of traversals.
CN202110370381.2A 2021-04-07 2021-04-07 Real-time strategy game match method based on multiple intelligent agents Active CN112755538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110370381.2A CN112755538B (en) 2021-04-07 2021-04-07 Real-time strategy game match method based on multiple intelligent agents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110370381.2A CN112755538B (en) 2021-04-07 2021-04-07 Real-time strategy game match method based on multiple intelligent agents

Publications (2)

Publication Number Publication Date
CN112755538A CN112755538A (en) 2021-05-07
CN112755538B true CN112755538B (en) 2021-08-31

Family

ID=75691416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110370381.2A Active CN112755538B (en) 2021-04-07 2021-04-07 Real-time strategy game match method based on multiple intelligent agents

Country Status (1)

Country Link
CN (1) CN112755538B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420226A (en) * 2021-07-20 2021-09-21 网易(杭州)网络有限公司 Card recommendation method and device, electronic equipment and computer readable medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105999689A (en) * 2016-05-30 2016-10-12 北京理工大学 AI algorithm for game of the Amazons based on computer game playing
CN110083748A (en) * 2019-04-30 2019-08-02 南京邮电大学 A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415274B1 (en) * 1999-06-24 2002-07-02 Sandia Corporation Alpha-beta coordination method for collective search
US9147316B2 (en) * 2012-07-19 2015-09-29 David Hardcastle Method and apparatus that facilitates pooling lottery winnings via a relational structure
CN107038477A (en) * 2016-08-10 2017-08-11 哈尔滨工业大学深圳研究生院 A kind of neutral net under non-complete information learns the estimation method of combination with Q
CN107050839A (en) * 2017-04-14 2017-08-18 安徽大学 Amazon chess game playing by machine system based on UCT algorithms

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105999689A (en) * 2016-05-30 2016-10-12 北京理工大学 AI algorithm for game of the Amazons based on computer game playing
CN110083748A (en) * 2019-04-30 2019-08-02 南京邮电大学 A kind of searching method based on adaptive Dynamic Programming and the search of Monte Carlo tree

Also Published As

Publication number Publication date
CN112755538A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
Wu et al. Training agent for first-person shooter game with actor-critic curriculum learning
Justesen et al. Learning macromanagement in starcraft from replays using deep learning
CN108211362B (en) Non-player character combat strategy learning method based on deep Q learning network
CN110141867B (en) Game intelligent agent training method and device
CA3060900A1 (en) System and method for deep reinforcement learning
Andrade et al. Challenge-sensitive action selection: an application to game balancing
Liu et al. Evolving game skill-depth using general video game ai agents
CN104102522B (en) The artificial emotion driving method of intelligent non-player roles in interactive entertainment
CN112870721B (en) Game interaction method, device, equipment and storage medium
CN109925717B (en) Game victory rate prediction method, model generation method and device
Knegt et al. Opponent modelling in the game of Tron using reinforcement learning
Tang et al. A review of computational intelligence for StarCraft AI
Gemine et al. Imitative learning for real-time strategy games
Gajurel et al. Neuroevolution for rts micro
CN112755538B (en) Real-time strategy game match method based on multiple intelligent agents
Nam et al. Generation of diverse stages in turn-based role-playing game using reinforcement learning
CN114404975A (en) Method, device, equipment, storage medium and program product for training decision model
Singal et al. Modeling decisions in games using reinforcement learning
Zhen et al. Neuroevolution for micromanagement in the real-time strategy game StarCraft: Brood War
CN114344889B (en) Game strategy model generation method and control method of intelligent agent in game
Justesen et al. Learning a behavioral repertoire from demonstrations
CN111882072A (en) Intelligent model automatic course training method for playing chess with rules
CN114581834A (en) Curling decision method for deep reinforcement learning based on Monte Carlo tree search
Ansó et al. Deep reinforcement learning for pellet eating in agar. IO
Langenhoven et al. Swarm tetris: Applying particle swarm optimization to tetris

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant