CN111047053A - Monte Carlo search game decision method and system for opponents with unknown strategies

Info

Publication number
CN111047053A
Authority
CN
China
Prior art keywords
decision
game
monte carlo
unknown
situation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911142537.0A
Other languages
Chinese (zh)
Inventor
芦维宁
杨君
赵千川
梁斌
谢鸣洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201911142537.0A
Publication of CN111047053A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G06N 20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]


Abstract

The invention discloses a Monte Carlo search game decision method and system for opponents with unknown strategies. The method comprises the following steps: designing a state evaluation function based on expert experience, according to the rules of the air-combat game, to estimate the situations of the enemy side and our side in the engagement; and providing a Monte Carlo search algorithm fused with the minimax game idea, which outputs game decisions when facing an air-combat opponent whose decision method is unknown. By borrowing the game-strategy idea of alternating decisions between the two sides from the minimax algorithm, the Monte Carlo search game decision method can fully consider the various possibilities of the opponent's decisions and make corresponding counter-decisions without needing to know the opponent's strategy, thereby improving the win rate in the confrontation.

Description

Monte Carlo search game decision method and system for opponents with unknown strategies
Technical Field
The invention relates to the technical field of game decision-making, and in particular to a game decision method and system for an opponent with an unknown strategy in a one-to-one air-combat game environment.
Background
Machine gaming is a discipline that studies how to solve adversarial decision problems using machine learning methods. With the rapid development of artificial intelligence technology, machine game theory and its applications have spread into many fields of society, such as politics, finance and the military. For example, the AlphaGo and AlphaGo Zero systems introduced by DeepMind in 2016 and 2017 defeated top human players at the game of Go, which demonstrated the promise of the machine gaming discipline and its potential to provide decision support to humans in many areas of future life.
Depending on whether the opponent's information is complete during play, game problems can be roughly divided into perfect-information games and imperfect-information games. Under imperfect information, the missing game information increases the difficulty of solving the problem. In practical applications, however, missing game information is the norm for a variety of subjective and objective reasons, so studying games with imperfect information has practical significance.
The one-to-one air-combat game is a typical game decision problem, and an adversarial environment in which the opponent's strategy is unknown is its usual setting, so it is a very representative experimental platform for imperfect-information game algorithms.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one objective of the present invention is to provide a Monte Carlo search game decision method for an opponent with an unknown strategy, which can fully consider the various possibilities of the opponent's decisions and make corresponding counter-decisions to improve the win rate, without needing to learn the opponent's game strategy.
Another objective of the present invention is to provide a Monte Carlo search game decision system for an opponent with an unknown strategy.
In order to achieve the above object, an embodiment of the present invention provides a Monte Carlo search game decision method for an opponent with an unknown strategy, including the following steps: designing a composite state evaluation function in the air-combat game environment, taking as reference criteria the influence of the current situation of the two sides on the subsequent confrontation and the physical constraints of the two sides; describing, through the composite state evaluation function and from multiple dimensions, how the situations of the two sides change as the confrontation progresses; and outputting game decisions, according to these changes, through a Monte Carlo search algorithm fused with the minimax game idea when facing an air-combat opponent whose decision method is unknown.
According to the Monte Carlo search game decision method for opponents with unknown strategies of the embodiment of the invention, by borrowing the game-strategy idea of alternating decisions between the two sides from the minimax algorithm, our decision rounds select the action with the largest upper confidence bound under our own evaluation system, and the enemy decision rounds select the action with the largest upper confidence bound under the enemy evaluation system. On the premise of not knowing the opponent's game strategy, the various possibilities of the opponent's decisions can thus be fully considered and corresponding counter-decisions made, improving the win rate in the confrontation.
In addition, the Monte Carlo search game decision method for opponents with unknown strategies according to the embodiment of the invention may also have the following additional technical features:
Further, in an embodiment of the present invention, the composite state evaluation function includes an immediate return term, a persistent return term and a physical-relationship rationality constraint term.
Further, in one embodiment of the present invention, the persistent return term includes: a return calculated from the duration of the superior or inferior situation of the two sides, wherein the longer one side's superiority or inferiority lasts, the larger the reward/penalty value given by the situation evaluation function.
Further, in an embodiment of the present invention, the method further includes: selecting, in our decision round, the action with the largest upper confidence bound under our evaluation system, and selecting, in the enemy decision round, the action with the largest upper confidence bound under the enemy evaluation system.
In order to achieve the above object, another embodiment of the present invention provides a Monte Carlo search game decision system for an opponent with an unknown strategy, including: a design module for designing a composite state evaluation function in the air-combat game environment, taking as reference criteria the influence of the current situation of the two sides on the subsequent confrontation and the physical constraints of the two sides; a description module for describing, through the composite state evaluation function and from multiple dimensions, how the situations of the two sides change as the confrontation progresses; and an output module for outputting game decisions, according to these changes, through a Monte Carlo search algorithm fused with the minimax game idea when facing an air-combat opponent whose decision method is unknown.
According to the Monte Carlo search game decision system for opponents with unknown strategies of the embodiment of the invention, by borrowing the game-strategy idea of alternating decisions between the two sides from the minimax algorithm, our decision rounds select the action with the largest upper confidence bound under our own evaluation system, and the enemy decision rounds select the action with the largest upper confidence bound under the enemy evaluation system. On the premise of not needing to know the opponent's game strategy, the various possibilities of the opponent's decisions can thus be fully considered and corresponding counter-decisions made, improving the win rate in the confrontation.
In addition, the Monte Carlo search game decision system for opponents with unknown strategies according to the above embodiment of the present invention may also have the following additional technical features:
Further, in an embodiment of the present invention, the composite state evaluation function includes an immediate return term, a persistent return term and a physical-relationship rationality constraint term.
Further, in one embodiment of the present invention, the persistent return term includes: a return calculated from the duration of the superior or inferior situation of the two sides, wherein the longer one side's superiority or inferiority lasts, the larger the reward/penalty value given by the situation evaluation function.
Optionally, in an embodiment of the present invention, the system further includes: a selection module for selecting, in our decision round, the action with the largest upper confidence bound under our evaluation system, and for selecting, in the enemy decision round, the action with the largest upper confidence bound under the enemy evaluation system.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a Monte Carlo search game decision method for an opponent with an unknown strategy according to one embodiment of the invention;
FIG. 2 is a flow chart of the model construction of the Monte Carlo search game decision method according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of the aircraft combat model in the one-to-one air-combat game platform employed in accordance with one embodiment of the present invention;
FIG. 4 is a diagram illustrating the variable definitions associated with the positional relationship of the two sides in accordance with an embodiment of the invention;
FIG. 5 is a schematic structural diagram of a Monte Carlo search game decision system for an opponent with an unknown strategy according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The Monte Carlo search game decision method and system for opponents with unknown strategies proposed by the embodiments of the invention are described below with reference to the accompanying drawings, starting with the method.
FIG. 1 is a flow chart of a Monte Carlo search game decision method for an opponent with an unknown strategy according to an embodiment of the present invention.
As shown in FIG. 1, the Monte Carlo search game decision method for an opponent with an unknown strategy includes the following steps:
In step S101, a composite state evaluation function is designed in the air-combat game environment, taking as reference criteria the influence of the current situation of the two sides on the subsequent confrontation and the physical constraints of the two sides.
In step S102, how the situations of the two sides change as the confrontation progresses is described from multiple dimensions by the composite state evaluation function.
That is to say, as shown in FIG. 2, with the influence of the current situation of the two sides and their physical constraints as the main reference criteria, a composite state evaluation function is designed, and the changes in the two sides' situations as the engagement progresses are described from multiple dimensions, which ensures that the optimization direction of our side's game decision method is reasonable.
Further, designing the state evaluation function is one of the important links in realizing the game decision method. On the basis of comprehensively considering the influence of the two sides' situation on the final result in terms of return timeliness, physical constraints and other aspects, the embodiment of the invention proposes a composite state evaluation function comprising an immediate return term, a persistent return term and a physical-relationship rationality constraint term.
The immediate return term is a real-time evaluation of the current positions of the two sides; the persistent return term is a return given on the basis of how long the advantage or disadvantage situation of the two sides has lasted; and the physical-relationship rationality constraint term is a penalty for a collision between the two aircraft.
Further, the persistent return term includes: a return calculated from the duration of the superior or inferior situation of the two sides, wherein the longer one side's superiority or inferiority lasts, the larger the reward/penalty value given by the situation evaluation function.
In step S103, according to these changes, game decisions are output through a Monte Carlo search algorithm fused with the minimax game idea when facing an air-combat opponent whose decision method is unknown.
Further, in an embodiment of the present invention, the method further includes: selecting, in our decision round, the action with the largest upper confidence bound under our evaluation system, and selecting, in the enemy decision round, the action with the largest upper confidence bound under the enemy evaluation system.
Specifically, through the Monte Carlo search algorithm fused with the minimax game idea, our decision rounds select the action with the largest upper confidence bound under our evaluation system, and the enemy decision rounds select the action with the largest upper confidence bound under the enemy evaluation system. With the opponent's strategy unknown, the various possible influences of both sides' decisions on the subsequent confrontation are fully considered, so that corresponding decisions are made and the win rate of the confrontation is improved.
The embodiment of the invention will be further explained, without being limited, by the exemplary implementation described below with reference to the drawings.
The two-dimensional air-combat simulation platform adopted by the embodiment of the invention comprises a dynamic model and a combat model of the aircraft. In the two-dimensional plane, the state of each aircraft is represented by the five-tuple s = (x, y, v, θ, σ), whose variables have the following meanings:
aircraft position (x, y): the position of the aircraft in the top view;
aircraft velocity v: the current flight speed of the aircraft;
aircraft yaw angle θ: the current nose heading of the aircraft;
fuselage roll angle σ: the angle by which the fuselage is rolled about its longitudinal axis away from the horizontal plane.
These variables have their respective range limits, matched to the game simulation platform adopted in the embodiment of the invention; since the details of the simulation platform are not part of the invention, they are not expanded on here.
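For concreteness, a minimal sketch of how such a five-tuple state could be represented is given below; the field names, units and range limits are assumptions, since the actual limits depend on the simulation platform used in the embodiment.

```python
from dataclasses import dataclass
import math

@dataclass
class AircraftState:
    """Five-tuple s = (x, y, v, theta, sigma) for one aircraft in the
    two-dimensional air-combat simulation (illustrative sketch)."""
    x: float       # position, top-view x coordinate
    y: float       # position, top-view y coordinate
    v: float       # current flight speed
    theta: float   # yaw angle: current nose heading, in radians
    sigma: float   # roll angle about the longitudinal axis, in radians

    def clipped(self, v_min=50.0, v_max=400.0, sigma_max=math.radians(80)):
        """Return a copy with speed and roll clipped to assumed platform limits."""
        return AircraftState(
            self.x, self.y,
            min(max(self.v, v_min), v_max),
            self.theta % (2 * math.pi),
            min(max(self.sigma, -sigma_max), sigma_max),
        )
```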
As shown in FIG. 3, each aircraft has a fan-shaped attack zone directly in front of it, with length r_atk and angle θ_atk, and a fan-shaped vulnerable blind zone directly behind it, with length r_df and angle θ_df.
As shown in FIG. 4, the relative position of the two sides is described by their centroid distance r, azimuth angle AA and antenna deflection angle ATA; the relative position (r, AA, ATA) can be calculated from the positions of the two aircraft.
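As an illustration of how (r, AA, ATA) might be computed from two aircraft states, the sketch below uses the line of sight between the two aircraft; the exact sign and reference conventions of AA and ATA are assumptions, since the patent defines them only through FIG. 4.

```python
import math

def relative_geometry(own, enemy):
    """Compute (r, AA, ATA) of the own aircraft relative to the enemy.

    r   : centroid distance between the two aircraft
    ATA : angle between the own nose direction and the line of sight to the enemy
    AA  : angle between the enemy heading and the same line of sight
    Angle conventions are illustrative assumptions (absolute values, radians).
    """
    dx, dy = enemy.x - own.x, enemy.y - own.y
    r = math.hypot(dx, dy)
    los = math.atan2(dy, dx)            # bearing of the line of sight

    def angle_diff(a, b):
        # smallest absolute difference between two angles
        d = (a - b + math.pi) % (2 * math.pi) - math.pi
        return abs(d)

    ata = angle_diff(own.theta, los)
    aa = angle_diff(enemy.theta, los)
    return r, aa, ata
```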
Consider the one-to-one engagement of two aircraft. In the two-dimensional plane, the ultimate goal of each aircraft's actions is: 1) to place the enemy aircraft inside its own attack zone; and 2) at the same time to be positioned inside the enemy aircraft's rear blind zone. If both conditions are met simultaneously, the aircraft can be considered to have entered the advantageous state in which it can attack the enemy while being difficult to counterattack.
The essence of the Monte Carlo tree search (MCTS) algorithm is to obtain more information through sampling in order to approximate the optimal solution. Discretizing the continuous decision space of the two-dimensional air-combat problem yields a discrete decision set:

D = {d_1, d_2, d_3, ..., d_n}

where d_i is the i-th discretized action for controlling the aircraft.
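Purely as an illustration, the discrete decision set D could consist of a handful of commanded roll angles held for one decision interval; the specific actions and their number n below are assumptions, not values from the patent.

```python
import math

# Illustrative discrete decision set D = {d_1, ..., d_n}: each action is a
# commanded roll angle (and hence turn direction) held for one decision step.
DECISIONS = [
    math.radians(-60),   # hard left roll
    math.radians(-30),   # gentle left roll
    0.0,                 # keep wings level
    math.radians(30),    # gentle right roll
    math.radians(60),    # hard right roll
]
n = len(DECISIONS)       # the patent's n; here n = 5 by assumption
```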
At each decision time t, both sides make decisions based on the current state (s_self^t, s_enemy^t), each picking an action from the decision set:

d_self^t ∈ D,  d_enemy^t ∈ D

At the next decision time t+1, the state transitions according to the decisions made, giving

(s_self^{t+1}, s_enemy^{t+1}) = f(s_self^t, s_enemy^t, d_self^t, d_enemy^t)
The state at the next moment after the decisions are made can be determined by simulation if the transfer function f is known. Given any state (s_self, s_enemy), there are n² possible combinations of decisions that the two sides can make; that is, starting from one state, n² new states can be reached through different decision combinations.
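A sketch of how, given a known transfer function f, the n² joint successor states could be enumerated by simulation is shown below; the simple kinematic step (reusing the AircraftState and DECISIONS sketches above) is an assumption standing in for the platform's actual dynamics.

```python
import math

def step(state, decision, dt=1.0):
    """Illustrative one-step transfer function for a single aircraft: the
    commanded roll sets a turn rate, which updates heading and position.
    This kinematic model is an assumed stand-in for the platform dynamics."""
    g = 9.81
    turn_rate = g * math.tan(decision) / max(state.v, 1e-6)
    theta = (state.theta + turn_rate * dt) % (2 * math.pi)
    return AircraftState(
        state.x + state.v * math.cos(theta) * dt,
        state.y + state.v * math.sin(theta) * dt,
        state.v,
        theta,
        decision,
    )

def successors(s_self, s_enemy, decisions):
    """Enumerate all n^2 joint successor states reachable from (s_self, s_enemy)."""
    return [
        (step(s_self, d_self), step(s_enemy, d_enemy), d_self, d_enemy)
        for d_self in decisions
        for d_enemy in decisions
    ]
```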
Under the ideal condition of unlimited computing resources, starting from any initial state at t = 0, all decisions could be exhaustively enumerated to obtain all possible states at t = 1; exhaustively enumerating the decisions for each possible state at t = 1 then gives all possible states at t = 2, and so on, either until a time limit is reached or until final states corresponding to victory or defeat are obtained, from which one could backtrack to determine the specific decision to take at each decision time.
This approach essentially traverses a decision tree whose root node is the initial state in order to derive the optimal decision. However, it has two limitations:
First, computing resources are limited, so the number of states that can be explored by simulation is limited; the depth and breadth of exploration in the decision tree are therefore restricted, and it is difficult to explore all the way to the terminal states.
Second, each decision step is composed of the decisions of both sides, (d_self^t, d_enemy^t). Our side can only decide d_self^t and cannot control the opponent's d_enemy^t; that is, because a game is being played, our side cannot completely control how the decision state evolves.
For the first limitation, the embodiment of the present invention uses the UCT (Upper Confidence Bound applied to Trees) algorithm to balance the depth and breadth of decision-tree exploration under limited computing resources. UCT is a classic game-tree search algorithm, and since its content is outside the scope of the present invention it is not expanded on here. For the second limitation, the embodiment of the present invention proposes a corresponding improvement to MCTS.
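As background, the UCT selection rule scores each candidate child with the standard UCB1 formula, trading off average return against an exploration bonus; a minimal sketch is given below (the function name and argument layout are assumptions).

```python
import math

def ucb_score(total_reward, visits, parent_visits, c):
    """Standard UCB1 score used by UCT: average reward plus an exploration
    bonus that grows for rarely visited children."""
    if visits == 0:
        return float("inf")          # always try unvisited children first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)
```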
During its search over the game strategy space, the UCT algorithm needs to evaluate the nodes to be expanded through a scoring system. In the MCTS algorithm of this embodiment, scoring is performed by an evaluation function based on expert experience. The evaluation function is the basis of the UCT search and needs to reflect the quality of the current state.
Ideally, the evaluation function would also reflect the quality of the subsequent states reachable from the current state. This property is not necessary, however, because the UCT algorithm can revise the score of a state by expanding its nodes. Therefore, the MCTS algorithm can achieve the expected effect with an evaluation function that only reflects the current state.
The evaluation function actually adopted in the embodiment of the invention is denoted R(s_self, s_enemy) and can be split into three parts:

R(s_self, s_enemy) = R_imd + R_adv + R_col
R_imd is an immediate return based on the current relative positions of the two sides; its concrete expression, reproduced only as a formula image in the original publication, is a function of r, AA and ATA, the relative orientation parameters of the two sides, and of R_d, the collision radius of the aircraft.
R_adv is a return based on the duration of the advantage state of either side; its concrete form is likewise reproduced only as a formula image in the original publication.
the IsAdvance is used for judging whether the I plane or the enemy plane is in a judgment function of an advantage state, and the judgment conditions are as follows:
1) the distance between the two enemy and my machines is less than a certain range (relevant to the adopted confrontation simulation platform);
2) the AA value of the two machines is less than 60 degrees;
3) the ATA value of the two machines is less than 120 degrees;
t is the duration of time after the enemy or my plane enters the dominant state.
R_col is the penalty applied when the two aircraft collide; its concrete form is likewise reproduced only as a formula image in the original publication.
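Because the algebraic forms of R_imd, R_adv and R_col appear only as images in the original, the sketch below uses placeholder forms that merely respect the stated structure: an immediate term driven by (r, AA, ATA), a duration-scaled advantage term gated by the IsAdvance conditions, and a collision penalty inside the radius R_d. All constants and exact functional shapes are assumptions.

```python
import math

def is_advantage(r, aa, ata, r_max=500.0):
    """Judgment function IsAdvance: distance below a platform-dependent
    range (r_max is an assumed value), AA below 60 degrees, ATA below 120 degrees."""
    return r < r_max and aa < math.radians(60) and ata < math.radians(120)

def evaluate(r, aa, ata, advantage_duration, r_d=10.0,
             w_adv=0.1, collision_penalty=-100.0):
    """Composite evaluation R = R_imd + R_adv + R_col (placeholder forms)."""
    # R_imd: immediate return from the current relative geometry; smaller
    # angles off the enemy's tail score higher (assumed form).
    r_imd = 1.0 - (aa + ata) / (2.0 * math.pi)
    # R_adv: return that grows with how long the advantage state has lasted.
    r_adv = w_adv * advantage_duration if is_advantage(r, aa, ata) else 0.0
    # R_col: penalty if the two aircraft are within the collision radius R_d.
    r_col = collision_penalty if r < r_d else 0.0
    return r_imd + r_adv + r_col
```

A symmetric penalty term for the case where the enemy holds the advantage would mirror r_adv with a negative weight; it is omitted here for brevity.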
if the adversary strategy is unknown, in a decision tree in which both the enemy and the my participate in decision making, the nodes at the odd level make decisions d for the enemyselfThis decision is made only so that my state sselfAfter the state is transferred, the decision tree is expanded one layer; when the enemy comes to the even layer, the enemy carries out decision again under the condition that the enemy knows the decision of the enemy. In the decision tree, the paths of the odd layers are determined by the policy of our party, the policies of the even layers are determined by the policy of the enemy party, and both parties cannot completely control the expansion direction of the decision tree. The MiniMax (MiniMax) algorithm proposed by McGrew provides a way to make decisions in this case. The MiniMax algorithm is based on finite-step search and evaluation:
in decision making, a search depth is determined first, and then each node of the decision tree at the search depth is traversed. The scores of these nodes reflect the state of the war after a finite look-ahead. At the odd level of the decision tree, my party will choose an action that is favorable to my party; on even layers, the adversary chooses the action that is not good for my party as much as possible. Thus, when both the enemy and the my party select the action, the user wants to 'make the next state transition to the state favorable to my party as far as possible no matter how the other party selects'. Thus, the action is evaluated as "worst result per action" and "best result among worst results" is selected to obtain assurance of the results.
The MiniMax strategy described above is inherently very conservative and limited. Nevertheless, borrowing its idea that our side decides at the odd layers and the enemy decides at the even layers, the embodiment of the present invention applies the UCT algorithm when expanding the decision tree:
For each node state, each of the two sides has its own evaluation, and these evaluations are back-propagated to every node on the decision path, so each node accumulates returns under both sides' evaluations. During selection and expansion, at the odd layers our side selects the action with the largest upper confidence bound computed from our own evaluation; at the even layers, the enemy selects the action with the largest upper confidence bound computed from the enemy's evaluation. In this way the MiniMax idea is fused with MCTS, yielding an MCTS game decision algorithm for the case where the opponent's strategy is unknown. The specific algorithm flow is as follows:
Input:
Root node state s_root = (s_self, s_enemy); decision set D = {d_1, d_2, ..., d_n}; number of expansion nodes N; our evaluation function R_self; enemy evaluation function R_enemy; constant C balancing exploration depth and breadth.
(The step-by-step pseudocode of the algorithm is reproduced only as images in the original publication.)
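Since the pseudocode itself is reproduced only as images in the original, the following sketch reconstructs the described flow under stated assumptions: the tree alternates between our (odd) layers and enemy (even) layers, each node accumulates returns under both evaluation functions, selection uses the upper confidence bound computed from the statistics of the side to move, and after N expansions the most-visited root action is returned. All class, function and field names here are illustrative assumptions.

```python
import math, random

class Node:
    def __init__(self, s_self, s_enemy, own_turn, parent=None, action=None):
        self.s_self, self.s_enemy = s_self, s_enemy
        self.own_turn = own_turn            # True on odd (own) layers
        self.parent, self.action = parent, action
        self.children = []
        self.visits = 0
        self.q_self = 0.0                   # accumulated return under R_self
        self.q_enemy = 0.0                  # accumulated return under R_enemy

def ucb(child, parent_visits, c, own_turn):
    if child.visits == 0:
        return float("inf")
    q = child.q_self if own_turn else child.q_enemy
    return q / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts_decision(s_root_self, s_root_enemy, decisions, n_expansions,
                  r_self, r_enemy, step_fn, c=1.4):
    """MCTS fused with the MiniMax idea: own layers select by the own-side
    evaluation, enemy layers by the enemy-side evaluation (illustrative)."""
    root = Node(s_root_self, s_root_enemy, own_turn=True)

    for _ in range(n_expansions):
        node = root
        # 1. Selection: descend while fully expanded, taking the child with
        #    the largest UCB under the evaluation of the side to move.
        while node.children and len(node.children) == len(decisions):
            node = max(node.children,
                       key=lambda ch: ucb(ch, node.visits, c, node.own_turn))
        # 2. Expansion: add one untried action of the side to move.
        tried = {ch.action for ch in node.children}
        untried = [d for d in decisions if d not in tried]
        if untried:
            d = random.choice(untried)
            if node.own_turn:
                child_state = (step_fn(node.s_self, d), node.s_enemy)
            else:
                child_state = (node.s_self, step_fn(node.s_enemy, d))
            node = Node(*child_state, own_turn=not node.own_turn,
                        parent=node, action=d)
            node.parent.children.append(node)
        # 3. Evaluation: score the new state under both evaluation systems.
        v_self = r_self(node.s_self, node.s_enemy)
        v_enemy = r_enemy(node.s_self, node.s_enemy)
        # 4. Backpropagation: both returns are propagated along the path.
        while node is not None:
            node.visits += 1
            node.q_self += v_self
            node.q_enemy += v_enemy
            node = node.parent

    # Output: the root (own) action of the most visited child.
    best = max(root.children, key=lambda ch: ch.visits)
    return best.action
```

In use, r_self and r_enemy would wrap the composite evaluation computed from each side's point of view, and step_fn would be the platform's one-step transfer function f; a call such as mcts_decision(s_self, s_enemy, DECISIONS, N, r_self, r_enemy, step_fn) then yields our action for the current decision time.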
To demonstrate the effectiveness of the method, the embodiment of the invention was tested on the one-to-one air-combat game platform. The experimental setting was as follows: our side uses the proposed Monte Carlo game decision method for the case of an unknown opponent strategy, while the enemy uses the Minimax algorithm as its game decision method. With identical aircraft models for both sides, the test was run from 2048 initial states (initial positions and orientations of the two sides). The test results are shown in the following table:
(The win/loss statistics of the test are reproduced only as table images in the original publication.)
from the above experimental results, compared with the classic Minimax game decision algorithm, the game algorithm proposed by the embodiment of the invention has more or less victory results, and the effectiveness of the method proposed by the embodiment of the invention is shown.
According to the Monte Carlo search game decision method for opponents with unknown strategies of the embodiment of the invention, by borrowing the game-strategy idea of alternating decisions between the two sides from the minimax algorithm, our decision rounds select the action with the largest upper confidence bound under our own evaluation system and the enemy decision rounds select the action with the largest upper confidence bound under the enemy evaluation system. On the premise of not obtaining the opponent's game strategy, the various possibilities of the opponent's decisions can thus be fully considered and corresponding counter-decisions made, improving the win rate.
The Monte Carlo search game decision system for opponents with unknown strategies proposed by the embodiment of the invention is described next with reference to the accompanying drawings.
FIG. 5 is a structural diagram of a Monte Carlo search game decision system for an opponent with an unknown strategy according to an embodiment of the invention.
As shown in FIG. 5, the Monte Carlo search game decision system 10 for an opponent with an unknown strategy includes: a design module 100, a description module 200 and an output module 300.
The design module 100 is configured to design a composite state evaluation function in the air-combat game environment, taking as reference criteria the influence of the current situation of the two sides on the subsequent confrontation and the physical constraints of the two sides. The description module 200 is configured to describe, through the composite state evaluation function and from multiple dimensions, how the situations of the two sides change as the confrontation progresses. The output module 300 is configured to output game decisions, according to these changes, through a Monte Carlo search algorithm fused with the minimax game idea when facing an air-combat opponent whose decision method is unknown.
Further, in an embodiment of the present invention, the composite state evaluation function includes an immediate return term, a persistent return term and a physical-relationship rationality constraint term.
Further, in one embodiment of the present invention, the persistent return term includes: a return calculated from the duration of the superior or inferior situation of the two sides, wherein the longer one side's superiority or inferiority lasts, the larger the reward/penalty value given by the situation evaluation function.
Further, in an embodiment of the present invention, the system further includes: a selection module 400 configured to select, in our decision round, the action with the largest upper confidence bound under our evaluation system, and to select, in the enemy decision round, the action with the largest upper confidence bound under the enemy evaluation system.
According to the Monte Carlo search game decision system for opponents with unknown strategies of the embodiment of the invention, by borrowing the game-strategy idea of alternating decisions between the two sides from the minimax algorithm, our decision rounds select the action with the largest upper confidence bound under our evaluation system and the enemy decision rounds select the action with the largest upper confidence bound under the enemy evaluation system. On the premise of not needing to obtain the opponent's game strategy, the various possibilities of the opponent's decisions can thus be fully considered and corresponding counter-decisions made, improving the win rate.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A Monte Carlo search game decision method for opponents with unknown strategies, characterized by comprising the following steps:
designing a composite state evaluation function in the air-combat game environment, taking as reference criteria the influence of the current situation of the two sides on the subsequent confrontation and the physical constraints of the two sides;
describing, through the composite state evaluation function and from multiple dimensions, how the situations of the two sides change as the confrontation progresses; and
outputting game decisions, according to these changes, through a Monte Carlo search algorithm fused with the minimax game idea when facing an air-combat opponent whose decision method is unknown.
2. The Monte Carlo search game decision method for opponents with unknown strategies according to claim 1, wherein the composite state evaluation function comprises an immediate return term, a persistent return term and a physical-relationship rationality constraint term.
3. The Monte Carlo search game decision method for opponents with unknown strategies according to claim 2, wherein the persistent return term comprises:
a return calculated from the duration of the superior or inferior situation of the two sides, wherein the longer one side's superiority or inferiority lasts, the larger the reward/penalty value given by the situation evaluation function.
4. The Monte Carlo search game decision method for opponents with unknown strategies according to claim 1, further comprising:
selecting, in our decision round, the action with the largest upper confidence bound under our evaluation system, and selecting, in the enemy decision round, the action with the largest upper confidence bound under the enemy evaluation system.
5. A Monte Carlo search game decision system for opponents with unknown strategies, characterized by comprising:
a design module for designing a composite state evaluation function in the air-combat game environment, taking as reference criteria the influence of the current situation of the two sides on the subsequent confrontation and the physical constraints of the two sides;
a description module for describing, through the composite state evaluation function and from multiple dimensions, how the situations of the two sides change as the confrontation progresses; and
an output module for outputting game decisions, according to these changes, through a Monte Carlo search algorithm fused with the minimax game idea when facing an air-combat opponent whose decision method is unknown.
6. The Monte Carlo search game decision system for opponents with unknown strategies according to claim 5, wherein the composite state evaluation function comprises an immediate return term, a persistent return term and a physical-relationship rationality constraint term.
7. The Monte Carlo search game decision system for opponents with unknown strategies according to claim 6, wherein the persistent return term comprises:
a return calculated from the duration of the superior or inferior situation of the two sides, wherein the longer one side's superiority or inferiority lasts, the larger the reward/penalty value given by the situation evaluation function.
8. The Monte Carlo search game decision system for opponents with unknown strategies according to claim 6, further comprising:
a selection module for selecting, in our decision round, the action with the largest upper confidence bound under our evaluation system, and for selecting, in the enemy decision round, the action with the largest upper confidence bound under the enemy evaluation system.
CN201911142537.0A 2019-11-20 2019-11-20 Monte Carlo search game decision method and system for opponents with unknown strategies Pending CN111047053A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911142537.0A CN111047053A (en) Monte Carlo search game decision method and system for opponents with unknown strategies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911142537.0A CN111047053A (en) Monte Carlo search game decision method and system for opponents with unknown strategies

Publications (1)

Publication Number Publication Date
CN111047053A true CN111047053A (en) 2020-04-21

Family

ID=70232482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911142537.0A Pending CN111047053A (en) Monte Carlo search game decision method and system for opponents with unknown strategies

Country Status (1)

Country Link
CN (1) CN111047053A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329348A (en) * 2020-11-06 2021-02-05 东北大学 Intelligent decision-making method for military countermeasure game under incomplete information condition
CN112612298A (en) * 2020-11-27 2021-04-06 合肥工业大学 Multi-target game method and device for multi-unmanned aerial vehicle tactical decision under countermeasure environment
CN113599832A (en) * 2021-07-20 2021-11-05 北京大学 Adversary modeling method, apparatus, device and storage medium based on environment model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463094A (en) * 2017-07-13 2017-12-12 江西洪都航空工业集团有限责任公司 A kind of multiple no-manned plane air battle dynamic game method under uncertain information
CN108446801A (en) * 2018-03-22 2018-08-24 成都大象分形智能科技有限公司 A kind of more people's Under Asymmetry Information game decision making systems

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463094A (en) * 2017-07-13 2017-12-12 江西洪都航空工业集团有限责任公司 A kind of multiple no-manned plane air battle dynamic game method under uncertain information
CN108446801A (en) * 2018-03-22 2018-08-24 成都大象分形智能科技有限公司 A kind of more people's Under Asymmetry Information game decision making systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何旭 et al.: "Air combat maneuver decision-making based on the Monte Carlo tree search method", Journal of Air Force Engineering University (Natural Science Edition) *
陈鹏: "Intelligent game decision methods for micro-management in real-time strategy games", China Masters' Theses Full-text Database (Electronic Journal), Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329348A (en) * 2020-11-06 2021-02-05 东北大学 Intelligent decision-making method for military countermeasure game under incomplete information condition
CN112329348B (en) * 2020-11-06 2023-09-15 东北大学 Intelligent decision-making method for military countermeasure game under incomplete information condition
CN112612298A (en) * 2020-11-27 2021-04-06 合肥工业大学 Multi-target game method and device for multi-unmanned aerial vehicle tactical decision under countermeasure environment
CN112612298B (en) * 2020-11-27 2023-06-09 合肥工业大学 Multi-target game method and device for tactical decisions of multiple unmanned aerial vehicles in countermeasure environment
CN113599832A (en) * 2021-07-20 2021-11-05 北京大学 Adversary modeling method, apparatus, device and storage medium based on environment model
CN113599832B (en) * 2021-07-20 2023-05-16 北京大学 Opponent modeling method, device, equipment and storage medium based on environment model

Similar Documents

Publication Publication Date Title
CN110119773B (en) Global situation assessment method, system and device of strategic gaming system
CN111047053A (en) Monte Carlo search game decision method and system facing to opponents with unknown strategies
CN113791634A (en) Multi-aircraft air combat decision method based on multi-agent reinforcement learning
US6763325B1 (en) Heightened realism for computer-controlled units in real-time activity simulation
US6195626B1 (en) Heightened realism for computer-controlled units in real-time simulation
CN106861190B (en) AI construction method and device, game control method and device and AI system
Smith et al. RETALIATE: Learning winning policies in first-person shooter games
CN110928329A (en) Multi-aircraft track planning method based on deep Q learning algorithm
CN106779210A (en) Algorithm of Firepower Allocation based on ant group algorithm
Jaidee et al. Case-based goal-driven coordination of multiple learning agents
Straatman et al. Hierarchical AI for multiplayer bots in Killzone 3
CN112906233A (en) Distributed near-end strategy optimization method based on cognitive behavior knowledge and application thereof
CN116858039A (en) Hypersonic aircraft game guidance method, system, equipment and medium
CN113282100A (en) Unmanned aerial vehicle confrontation game training control method based on reinforcement learning
CN113435598A (en) Knowledge-driven intelligent strategy deduction decision method
CN116700079A (en) Unmanned aerial vehicle countermeasure occupation maneuver control method based on AC-NFSP
CN115951695A (en) Dynamic tactical control domain resolving method based on three-party game in air combat simulation environment
CN115933717A (en) Unmanned aerial vehicle intelligent air combat maneuver decision training system and method based on deep reinforcement learning
US6179618B1 (en) Heightened realism for computer-controlled units in real-time activity simulation
CN113741186A (en) Double-machine air combat decision method based on near-end strategy optimization
Mora et al. Evolving the cooperative behaviour in Unreal™ bots
Sheeba et al. Optimal resource allocation and redistribution strategy in military conflicts with Lanchester square law attrition
Adams Fundamentals of strategy game design
CN114202175A (en) Combat mission planning method and system based on artificial intelligence
CN113705828A (en) Battlefield game strategy reinforcement learning training method based on cluster influence degree

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200421

RJ01 Rejection of invention patent application after publication