CN111176892A - Adversarial search method based on a backup strategy - Google Patents

Adversarial search method based on a backup strategy

Info

Publication number
CN111176892A
CN111176892A (application CN201911333317.6A)
Authority
CN
China
Prior art keywords
val
value
node
state
backup
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911333317.6A
Other languages
Chinese (zh)
Other versions
CN111176892B (en)
Inventor
刘婵娟 (Liu Chanjuan)
闫俊名 (Yan Junming)
张强 (Zhang Qiang)
魏小鹏 (Wei Xiaoping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911333317.6A
Publication of CN111176892A
Application granted
Publication of CN111176892B
Active legal status
Anticipated expiration legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an adversarial search method based on a backup strategy, belonging to the field of zero-sum game strategy search. The invention proposes an iterative optimal minimax (IOM) algorithm by optimizing the backup rule of the classical minimax algorithm. The method comprises the following steps: first, the evaluation value of any given node is calculated using a static evaluation function; then the final value of each node is updated by back-propagation according to the backup rule, i.e., the final backup value of each node equals its evaluation value minus twice the maximum backup value among its child nodes. The backup rule used when calculating the final state values of intermediate nodes provides a way to reduce the influence of pathological nodes in the game tree on decision quality. Compared with the error-minimizing minimax algorithm and the classical minimax algorithm, the iterative optimal minimax algorithm improves decision quality under limited search depth.

Description

Adversarial search method based on a backup strategy
Technical Field
The invention belongs to the field of machine-game strategy search, and in particular relates to an adversarial search method that improves the backup rule of the minimax algorithm.
Background
Search algorithms are an important area of machine-game research. For two-player zero-sum games, algorithms based on the minimax theorem are among the most effective adversarial search methods: when the entire game tree can be searched, they yield the optimal solution of the perfect-information two-player zero-sum game. In many games, however, the state space is so large that the game tree cannot be searched exhaustively. In the minimax implementation proposed by Shannon, the game tree is therefore expanded selectively to a limited depth, a heuristic function is used as a static evaluation function for the state values of leaf nodes, the values it computes are treated as true state values during the search, and the resulting computed optimal solution is returned.
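For reference, Shannon's depth-limited scheme can be sketched in negamax form (a compact formulation of minimax). The `evaluate` and `children` callbacks below are hypothetical stand-ins for a concrete game, and every value is expressed from the perspective of the side to move:

```python
def negamax(node, depth, evaluate, children):
    # Depth-limited minimax in negamax form: each value is taken from the
    # perspective of the side to move, so child values are negated.
    kids = children(node)
    if depth == 0 or not kids:
        # Shannon's scheme: the heuristic evaluation is treated as the
        # true state value at the search frontier.
        return evaluate(node)
    return max(-negamax(child, depth - 1, evaluate, children) for child in kids)
```

The backup rule here is the plain maximum over negated child values; it is exactly this rule that the invention described below replaces.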
Initially, improvements to the minimax algorithm focused on developing game-tree pruning methods, raising decision quality through deeper search. In practice, many game programs do achieve higher decision quality by searching deeper; most famously, the chess program Deep Blue defeated world champion Garry Kasparov in 1997. Scholars have nevertheless questioned whether deeper search always yields better decisions. Nau and Beal independently found that for certain classes of game trees, the deeper the search under the minimax backup rule, the worse the decision quality, and named this phenomenon pathology. Wilson et al. proposed the error-minimizing minimax algorithm as a remedy for pathology in game trees. The minimax algorithm has also faced theoretical criticism from another direction: Pearl argued that treating heuristic evaluations as true values commits a statistically fatal error, which led researchers to propose alternative backup methods such as product propagation.
Although the error-minimizing minimax algorithm can avoid local pathology, it still has several shortcomings. During its search, the type of a node can be determined only after all of the node's children have been traversed, so the time complexity of the algorithm is high and difficult to reduce. Moreover, because the static evaluation function inevitably contains error, the algorithm may frequently misjudge node types and propagate values according to the wrong backup rule.
Disclosure of Invention
To solve the problem of local pathology in game trees while avoiding the shortcomings of the error-minimizing minimax algorithm in doing so, the invention provides a game-tree search method that assigns different weights to state values from different sources during the backtracking update of backup values. When the backup value of a state is updated by backtracking, its final backup value is set to the evaluation value that the evaluation function assigns to the state, minus twice the largest backup value among its sub-states. This backup method retains the minimax idea of using deeper descendant nodes to obtain a more accurate value for the current node, while also using the evaluation value of the node's own state to reduce the influence of pathological nodes on search quality. Compared with the error-minimizing minimax algorithm, the method avoids the risk of misjudging node types, and pruning methods can more easily be added to the search to reduce the algorithm's complexity.
The invention adopts the following technical scheme:
a countermeasure type search method based on a backup strategy is disclosed, the method adopts a backtracking search method to search a game tree, and specifically adopts a depth-first strategy to traverse nodes in the game tree; and a new strategy is given to the updating of the backup value of the situation state corresponding to the nodes in the game tree in the searching process. The method assumes that both game parties adopt the same static evaluation function in the executing process, and both game parties adopt action strategies for maximizing own interests and minimizing opposite interests, and specifically comprises the following steps:
step S1: initializing a backup value b _ val of a current situation state s to + ∞; calculating an evaluation value e _ val of the current situation state s by using a static evaluation function and recording the evaluation value e _ val; judging whether the current searching node is a game leaf node (namely whether the depth of the current searching node in the game tree reaches the maximum searching depth or whether the local state corresponding to the node is a game ending state); if yes, returning the evaluation value e _ val of the node as the final backup value b _ val of the node; otherwise, the process proceeds to step S2.
Step S2: obtain all actions that are legal in the current state s according to the rules of the particular game, then proceed to step S3.
Step S3: judge whether any action remains unvisited; if not, proceed to step S7; otherwise, proceed to step S4.
Step S4: select an action mv from the unvisited actions and simulate its execution, so that the state changes into a sub-state s' of s; recursively apply the adversarial search method of steps S1 to S7 to the sub-state s' (with the search depth reduced by 1 and the moving side switched to the opponent) to obtain the final backup value b_val_s' of the sub-state s'; then proceed to step S5.
Step S5: undo the simulated action of step S4 so that the state is restored to s; compute the temporary backup value temp_val of state s under the action mv from the evaluation value e_val of s and the final backup value b_val_s' of its sub-state s'; then proceed to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, update b_val to temp_val and return to step S3; in addition, if the node is the root node, record the action mv corresponding to the temporary backup value temp_val; otherwise, return directly to step S3.
Step S7: judge whether the current search node is the root node; if so, return the best action in the current state; otherwise, return the final backup value b_val of the current state.
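Steps S1 to S7 can be sketched as a single recursive routine. The `evaluate`, `legal_moves`, `apply_move`, and `undo_move` callbacks are hypothetical placeholders for a concrete game implementation; only the backup rule itself is taken from the steps above:

```python
import math

def iom_search(state, depth, evaluate, legal_moves, apply_move, undo_move):
    """Adversarial search with the backup rule of steps S1-S7.
    Returns (final backup value b_val, best action at this node)."""
    # Step S1: evaluate the current state; leaves return e_val directly.
    e_val = evaluate(state)
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return e_val, None
    b_val, best_mv = math.inf, None         # b_val initialized to +infinity
    for mv in moves:                        # Steps S2-S4: try each legal action
        apply_move(state, mv)
        child_val, _ = iom_search(state, depth - 1, evaluate,
                                  legal_moves, apply_move, undo_move)
        undo_move(state, mv)                # Step S5: restore the position
        temp_val = e_val - 2 * child_val    # backup rule: e_val - 2*b_val_s'
        if temp_val < b_val:                # Step S6: keep the smallest temp_val
            b_val, best_mv = temp_val, mv
    # Step S7: the caller uses best_mv at the root, b_val at inner nodes.
    return b_val, best_mv
```

At the root the caller reads off the returned best action; at inner nodes only the backup value b_val is used.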
Further, the specific content of step S1 is:
when the countermeasure type search method based on the backup strategy is called, parameters needing to be input comprise a situation state s, an action player to act in the current state and a maximum search depth d. The value of the maximum search DEPTH DEPTH of the game tree is the same as the value of the maximum search DEPTH d given when the iterative optimum maximum minimum algorithm is called for the first time, and the DEPTH is used as a global variable in a function and used for judging whether the node is a root node or not.
The function that evaluates the state s is a static evaluation function chosen for the particular game; each e_val it produces is an evaluation value from the viewpoint of the player whose action produced the state s.
Further, the specific content of step S5 is:
in the process of updating the temporary backup value of a state, the temporary backup value of state s is computed as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of state s; e_val denotes the evaluation value of the static evaluation function on state s; and b_val_s' denotes the final backup value of the sub-state s' of the current state s. Because the final decision depends on the relative sizes of the node backup values in the game tree, nodes at different depths carry different decision weights under this update rule, and deeper nodes carry higher weight.
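The depth weighting can be made concrete by unrolling the rule along a single line of play: applying b = e_val - 2*b_child repeatedly gives the evaluation at depth k a coefficient of (-2)^k, so every additional ply doubles a node's weight in the final value. A small illustrative sketch (not part of the patent's steps):

```python
def unrolled_backup(evals):
    """Backup value of a single chain of states whose static evaluation
    values, from the root down to the leaf, are evals[0], ..., evals[d].
    Equivalent to the weighted sum of evals[k] * (-2)**k over k."""
    b = evals[-1]              # leaf: backup value equals evaluation value
    for e in reversed(evals[:-1]):
        b = e - 2 * b          # the backup rule applied bottom-up
    return b
```

For instance, `unrolled_backup([e0, e1, e2])` equals `e0 - 2*e1 + 4*e2`, showing that the deepest evaluations dominate the decision.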
Further, the specific content of step S7 is:
judge the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), record the action mv adopted whenever the backup value is updated, and after all actions have been visited, output the recorded action mv as the best action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, the mover, seeking a higher payoff, selects the action that leaves the opponent with the lowest value; therefore, whenever the temporary backup value corresponding to an action is smaller than the current backup value, b_val must be updated.
The beneficial effects of the invention are as follows: the invention improves the classical minimax algorithm by considering both the static evaluation value of a node and the backup values from its children when backing up the node's value. The backup rule assigns different decision weights to nodes at different depths: the deeper the node, the higher the weight. Under this rule, the evaluation values of deeper nodes remain the main basis of the final decision, while including the evaluation values of shallower nodes in the calculation reduces the influence of pathological nodes on decision quality. In game environments that cannot be searched exhaustively, the method attains higher decision quality than the classical minimax algorithm. Compared with the error-minimizing minimax algorithm, it avoids the risk of misjudging node types, and pruning methods can more easily be added to the search to reduce the algorithm's complexity.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a diagram illustrating an example of a value backup process according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in fig. 1, this embodiment provides an adversarial search method based on a backup strategy, comprising the following steps:
step S1: inputting a parameter situation state s, an action player to act in the current situation state and a maximum search depth d; initializing a backup value b _ val of a current situation state s to + ∞; calculating an evaluation value e _ val of the current situation state s by using a static evaluation function and recording the evaluation value e _ val; judging whether the current searching node is a game leaf node (namely the depth of the current searching node in the game tree reaches the maximum searching depth or the local state corresponding to the node is a game ending state); if yes, returning the evaluation value e _ val of the node as the final backup value b _ val of the node; otherwise, the process proceeds to step S2.
Step S2: obtain all actions that are legal in the current state according to the rules of the particular game, then proceed to step S3.
Step S3: judge whether any action remains unvisited; if not, proceed to step S7; otherwise, proceed to step S4.
Step S4: select an action mv from the unvisited actions and simulate its execution, so that the state changes into a sub-state s' of s; recursively apply the adversarial search method of steps S1 to S7 to the sub-state s' (with the search depth reduced by 1 and the moving side switched to the opponent) to obtain the final backup value b_val_s' of the sub-state s'; then proceed to step S5.
Step S5: undo the simulated action of step S4 so that the state is restored to s; compute the temporary backup value temp_val of state s under the action mv from the evaluation value e_val of s and the final backup value b_val_s' of its sub-state s'; then proceed to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, update b_val to temp_val and return to step S3 (in addition, if the node is the root node, record the action mv corresponding to the temporary backup value temp_val); otherwise, return directly to step S3.
Step S7: judge whether the current search node is the root node; if so, return the best action in the current state; otherwise, return the final backup value b_val of the current state.
In this embodiment, the specific content of step S5 is:
in the process of updating the temporary backup value of a state, the temporary backup value of state s is computed as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of state s; e_val denotes the evaluation value of the static evaluation function on state s; and b_val_s' denotes the final backup value of the sub-state s' of the current state s.
In this embodiment, the specific content of step S7 is:
judge the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), record the action mv adopted whenever the backup value is updated, and after all actions have been visited, output the recorded action mv as the best action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, the mover, seeking a higher payoff, selects the action that leaves the opponent with the lowest value; therefore, whenever the temporary backup value corresponding to an action is smaller than the current backup value, b_val must be updated.
Fig. 2 illustrates an example of the value update in this embodiment, using a game tree with search depth 2 and branching factor 2. The evaluation value (EV) recorded at each node of the game tree is the value of that state for the player whose action produced it; the final backup value of a non-leaf node is denoted BV.
As shown in fig. 2, node A is the root of the game tree, and player1 is assumed to be the player making the decision at node A. Node A is a state produced by an action of player2, and from player2's viewpoint the static evaluation function assigns node A an evaluation value EV of 7. A depth-first search of node A's children then gives its child node B an evaluation value EV of -5 (computed from player1's viewpoint using the same static evaluation function).
A depth-first search of node B finds that its child node D has evaluation value EV = 10. Since node D is a leaf node, its parent node B is updated by backtracking according to the backup rule temp_val = e_val - 2*b_val_s': the temporary backup value of node B is -5 - 2*10 = -25, and node B's backup value is updated to -25.
The search continues with node B's child node E, giving node B a temporary backup value of -15 via node E. Since this value is greater than node B's current backup value, the backup value of node B is unchanged. All children of node B have now been searched, so the current backup value is returned as node B's final backup value, BV = -25 (node B's backup value is the value obtained by player1 in the state corresponding to node B).
The same procedure yields the final backup value BV = -23 for node C (likewise, node C's backup value is the value obtained by player1 in the state corresponding to node C).
The backup values of nodes B and C are propagated up to node A according to the backup rule b_val_A = e_val_A - 2*max(b_val_Ai), giving node A the final backup value 7 - 2*(-23) = 53. Since node A is the root of the game tree, the action leading to node C is selected as player1's best next action.
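The backup computation of fig. 2 can be reproduced with a short recursion over (EV, children) pairs. Node E's evaluation value of 5 follows from the temporary backup value -15 reported above; node C's subtree is not given in the description, so a hypothetical subtree (EV -3 with leaf children 10 and 8) consistent with BV = -23 is used:

```python
def iom_backup(e_val, children):
    """Final backup value of a node: its evaluation value minus twice the
    maximum backup value among its children (leaves return e_val)."""
    if not children:
        return e_val
    return e_val - 2 * max(iom_backup(ev, ch) for ev, ch in children)

# Nodes are (EV, children). EV of E (5) is implied by the text; C's subtree
# (EV -3, leaf children 10 and 8) is a hypothetical choice giving BV = -23.
D, E = (10, []), (5, [])
B = (-5, [D, E])
C = (-3, [(10, []), (8, [])])
A = (7, [B, C])
```

With these inputs, `iom_backup(*B)` gives -25, `iom_backup(*C)` gives -23, and `iom_backup(*A)` gives 53, matching the values in the description.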

Claims (3)

1. An adversarial search method based on a backup strategy, characterized in that the adversarial search method searches a game tree by backtracking search; during execution of the method, both players use the same static evaluation function and adopt action strategies that maximize their own payoff and minimize the opponent's; the parameters input when the method is called comprise a state s, the player to move in the current state, and a maximum search depth d; the maximum search depth DEPTH of the game tree is set equal to the input maximum search depth d and is used to judge whether a search node is the root node; the method specifically comprises the following steps:
step S1: initialize the backup value b_val of the current state s to +∞; compute and record the evaluation value e_val of the current state s using the static evaluation function; judge whether the current search node is a leaf node of the game tree; if so, return the node's evaluation value e_val as its final backup value b_val; otherwise, proceed to step S2;
step S2: obtain all actions that are legal in the current state s according to the rules of the particular game, and proceed to step S3;
step S3: judge whether any action remains unvisited; if not, proceed to step S7; otherwise, proceed to step S4;
step S4: select an action mv from the unvisited actions and simulate its execution, so that the state changes into a sub-state s' of s; recursively apply the adversarial search method of steps S1 to S7 to the sub-state s' to obtain the final backup value b_val_s' of the sub-state s'; then proceed to step S5;
step S5: undo the simulated action of step S4 so that the state is restored to s; compute the temporary backup value temp_val of state s under the action mv from the evaluation value e_val of s and the final backup value b_val_s' of its sub-state s'; then proceed to step S6;
step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, update b_val to temp_val and return to step S3, and, for the root node, record the action mv corresponding to the temporary backup value temp_val; otherwise, return directly to step S3;
step S7: judge whether the current search node is the root node; if so, return the best action in the current state; otherwise, return the final backup value b_val of the current state.
2. The adversarial search method based on a backup strategy according to claim 1, characterized in that the specific content of step S5 is:
in the process of updating the temporary backup value of a state, the temporary backup value of state s is computed as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of state s; e_val denotes the evaluation value of the static evaluation function on state s; and b_val_s' denotes the final backup value of the sub-state s' of the current state s.
3. The adversarial search method according to claim 1 or 2, characterized in that the specific content of step S7 is:
judge the type of the search node: if the node's maximum search depth d equals DEPTH, i.e., the node is the root of the game tree, record the action mv adopted whenever the backup value is updated, and after all actions have been visited, output the recorded action mv as the best action.
CN201911333317.6A 2019-12-23 2019-12-23 Adversarial search method based on a backup strategy Active CN111176892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911333317.6A CN111176892B (en) 2019-12-23 2019-12-23 Adversarial search method based on a backup strategy

Publications (2)

Publication Number Publication Date
CN111176892A true CN111176892A (en) 2020-05-19
CN111176892B CN111176892B (en) 2023-06-09

Family

ID=70654121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911333317.6A Active CN111176892B (en) 2019-12-23 2019-12-23 Adversarial search method based on a backup strategy

Country Status (1)

Country Link
CN (1) CN111176892B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113050686A (en) * 2021-03-19 2021-06-29 北京航空航天大学 Combat strategy optimization method and system based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050451A1 (en) * 2015-11-20 2019-02-14 Big Sky Sorcery, Llc System and method for searching structured data files
CN106407670A (en) * 2016-09-06 2017-02-15 中国矿业大学 Game algorithm-based black and white chess game method and system
US20180309779A1 (en) * 2017-04-21 2018-10-25 Raytheon Bbn Technologies Corp. Multi-dimensional heuristic search as part of an integrated decision engine for evolving defenses
CN108985458A (en) * 2018-07-23 2018-12-11 东北大学 A kind of double tree monte carlo search algorithms of sequential synchronous game

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FU Qiang; CHEN Huanwen: "Design and implementation of a self-learning game program based on the RL algorithm", Journal of Changsha University of Science and Technology (Natural Science), no. 04 *

Also Published As

Publication number Publication date
CN111176892B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
Zimmermann et al. Cooperation, social networks, and the emergence of leadership in a prisoner’s dilemma with adaptive local interactions
CN110515303A (en) A kind of adaptive dynamic path planning method based on DDQN
CN109697512B (en) Personal data analysis method based on Bayesian network and computer storage medium
CN108363478A (en) For wearable device deep learning application model load sharing system and method
Haufe et al. Automated verification of state sequence invariants in general game playing
CN113361279A (en) Medical entity alignment method and system based on double neighborhood map neural network
CN111176892A (en) Countermeasure type searching method based on backup strategy
US11461656B2 (en) Genetic programming for partial layers of a deep learning model
CN113221390B (en) Training method and device for scheduling model
Jaleel et al. Robustness of stochastic learning dynamics to player heterogeneity in games
Avin et al. Preferential attachment as a unique equilibrium
CN113010437B (en) Software system reliability management method and system based on fault analysis
CN108304929A (en) A kind of determination method and system of the best tactics of lattice chess
Desmarchelier et al. Kibs and the dynamics of industrial clusters
Brown et al. On the feasibility of local utility redesign for multiagent optimization
CN108197186B (en) Dynamic graph matching query method applied to social network
CN113449869A (en) Learning method of easy-reasoning Bayesian network
Collenette et al. On the role of mobility and interaction topologies in social dilemmas
CN113762469B (en) Neural network structure searching method and system
CN110442690A (en) A kind of query optimization method, system and medium based on probability inference
CN111652369A (en) Novel node value hybrid updating method
CN110162400B (en) Method and system for realizing cooperation of intelligent agents in MAS system in complex network environment
CN118297353B (en) Industrial production process multi-objective optimization method based on branch non-dominant sorting algorithm
JP7338858B2 (en) Behavior learning device, behavior learning method, behavior determination device, and behavior determination method
Li et al. Genetic network programming with automatic program generation for agent control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant