CN111176892B - Adversarial search method based on backup strategy - Google Patents

Adversarial search method based on backup strategy

Info

Publication number
CN111176892B
Authority
CN
China
Prior art keywords
val
node
value
backup
state
Prior art date
Legal status
Active
Application number
CN201911333317.6A
Other languages
Chinese (zh)
Other versions
CN111176892A (en)
Inventor
刘婵娟
闫俊名
张强
魏小鹏
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911333317.6A
Publication of CN111176892A
Application granted
Publication of CN111176892B
Legal status: Active
Anticipated expiration

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 11/00: Error detection; error correction; monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14: Error detection or correction of the data by redundancy in operation
    • G06F 11/1402: Saving, restoring, recovering or retrying
    • G06F 11/1446: Point-in-time backing up or restoration of persistent data
    • G06F 11/1448: Management of the data involved in backup or backup restore
    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an adversarial search method based on a backup strategy, belonging to the field of strategy search for zero-sum games. By optimizing the backup rule of the classical minimax algorithm, the invention provides an iterative optimal minimax (IOM) algorithm. The method comprises the following steps: first, the evaluation value of any given node is computed with a static evaluation function; then the final value of each node is updated by back-propagation according to the backup rule, i.e., the final backup value of each node equals its own evaluation value minus twice the maximum backup value among its child nodes. This backup rule, used when computing the final state values of intermediate nodes, reduces the influence of pathological nodes in the game tree on decision quality. Compared with the error-minimizing minimax algorithm and the classical minimax algorithm, the iterative optimal minimax algorithm improves decision quality under a limited search depth.

Description

Adversarial search method based on backup strategy
Technical Field
The invention belongs to the field of machine game-playing strategy search, and in particular relates to an adversarial search method that improves the backup method of the minimax algorithm.
Background
Search algorithms are an important area of machine game-playing research. In two-player zero-sum games, algorithms based on the minimax theorem are among the most advanced adversarial search algorithms; when the whole game tree can be searched, they obtain the optimal solution of a complete-information two-player zero-sum game. However, because the state space of many games is very large, the game tree cannot be searched completely. The minimax algorithm proposed by Shannon therefore expands the game tree only to a limited depth, uses a heuristic function as a static evaluation function to estimate the state values of leaf nodes, and treats these evaluated values as true state values during the search, finally producing a computed optimal solution.
Initially, improvements to the minimax algorithm focused mainly on developing game-tree pruning methods, improving decision quality through deeper search. In practice, many game-playing programs do achieve higher decision quality through deeper search; the most famous is the chess program Deep Blue, which defeated world champion Kasparov in 1997. However, researchers have questioned whether deeper search always yields higher decision quality. Nau and Beal independently found that, for certain classes of game trees, deeper search under the minimax backup rule produces worse decisions, a phenomenon called pathology (pathological behavior). Wilson et al. proposed the error-minimizing minimax algorithm (error minimizing minimax) to address pathology in game trees. Separately, the theoretical validity of the minimax algorithm has long been questioned: Pearl argued that treating heuristic evaluations as true values in the minimax algorithm commits a statistically fatal error, and researchers have therefore proposed alternative backup methods such as product propagation.
Although the error-minimizing minimax algorithm can avoid local pathology, it still has many drawbacks. During its search, the type of a node can be judged only after all of its child nodes have been traversed, so the algorithm's time complexity is high and difficult to reduce. Moreover, because errors of the static evaluation function are unavoidable, the algorithm may frequently misjudge node types and then propagate values according to the wrong backup rule.
Disclosure of Invention
To solve the local pathology that exists in game trees while avoiding the drawbacks of the error-minimizing minimax algorithm, the invention provides a game-tree search method that assigns different weights to state values from different sources during backup updating. When the backup value of a state is updated by backtracking, its final backup value is its own static evaluation value minus twice the maximum backup value among its sub-states. This backup method retains the minimax idea of using deeper descendant nodes to obtain a more accurate value for the current node, while using the node's own evaluation value to reduce the influence of pathological nodes on search quality. Compared with the error-minimizing minimax algorithm, the method avoids the risk of misjudging node types, and pruning methods are easier to add during the search to reduce algorithm complexity.
The invention adopts the following technical scheme:
a countermeasure search method based on backup strategy, the method searches game tree by backtracking search method, specifically uses depth priority strategy to traverse nodes in game tree; and in the searching process, a new strategy is given to updating the backup value of the situation state corresponding to the node in the game tree. In the execution process, the method assumes that both game sides adopt the same static evaluation function, and both game sides adopt an action strategy for maximizing own benefits and minimizing other benefits, and specifically comprises the following steps:
step S1: initializing a backup value b_val of the current situation state s to ++infinity; calculating and recording an evaluation value e_val of the current situation state s by using a static evaluation function; judging whether the current searching node is a game leaf node (namely, judging whether the depth of the current searching node in a game tree reaches the maximum searching depth or whether the state corresponding to the node is a game ending state or not); if yes, returning the evaluation value e_val of the node as a final backup value b_val of the node; otherwise, step S2 is entered.
Step S2: for the specific game problem, obtaining all actions that are legal in the current game state s according to the corresponding game rules, and proceeding to step S3.
Step S3: judging whether any unvisited action remains; if not, proceeding to step S7; otherwise, proceeding to step S4.
Step S4: selecting an action mv from the unvisited actions and simulating its execution, so that the game state becomes the sub-state s' of state s; recursively invoking steps S1 to S7 of this adversarial search method on sub-state s' (with the search depth reduced by 1 and the acting player switched to the opponent) to obtain the final backup value b_val_s' of sub-state s'; then proceeding to step S5.
Step S5: undoing the simulated action of step S4 to restore game state s; computing the temporary backup value temp_val of state s under action mv using the evaluation value e_val of state s and the final backup value b_val_s' of its sub-state s'; proceeding to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, updating b_val to temp_val and returning to step S3; in addition, if the node is the root node, recording the action mv corresponding to temp_val; otherwise, returning directly to step S3.
Step S7: judging whether the current search node is the root node; if so, returning the optimal action for the current game state; otherwise, returning the final backup value b_val of the current game state. A minimal code sketch of steps S1 to S7 follows.
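The seven steps above can be rendered as a short program. The following Python code is a minimal, illustrative sketch only: the Node structure and all identifiers are assumptions introduced here, and the patent's simulate/undo of actions in steps S4 and S5 is replaced by walking explicit child nodes.

import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # Static evaluation value e_val of this state, from the perspective of
    # the player whose action produced it (see the elaboration of step S1 below).
    e_val: float
    children: List["Node"] = field(default_factory=list)

def iom_search(node, depth, max_depth):
    """Iterative optimal minimax (IOM) sketch of steps S1 to S7.
    Returns the index of the best child at the root node (where
    depth == max_depth, the DEPTH check of step S7) and the final
    backup value b_val at every other node."""
    b_val = math.inf              # step S1: initialize b_val to +infinity
    e_val = node.e_val            # step S1: static evaluation of state s
    if depth == 0 or not node.children:
        return e_val              # step S1: leaf node, e_val is the backup value
    best_action = None
    for i, child in enumerate(node.children):   # steps S2-S4: visit each action
        b_val_child = iom_search(child, depth - 1, max_depth)
        temp_val = e_val - 2 * b_val_child      # step S5: backup rule
        if temp_val < b_val:                    # step S6: keep the minimum
            b_val = temp_val
            best_action = i                     # recorded; used only at the root
    if depth == max_depth:        # step S7: root node returns the optimal action
        return best_action
    return b_val                  # step S7: interior node returns b_val

A call such as iom_search(root, depth=d, max_depth=d) mirrors the first invocation described in the elaboration of step S1 below.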
Further, the specific content of step S1 is as follows:
When the adversarial search method based on the backup strategy is invoked, the input parameters comprise the game state s, the acting player layer in the current state, and the maximum search depth d. A global variable DEPTH holds the maximum search depth of the game tree, equal to the maximum search depth d given when the iterative optimal minimax algorithm is first called; inside the function, DEPTH is used to judge whether a node is the root node.
The function that evaluates game state s is a static evaluation function determined by the specific game problem; each e_val obtained is the evaluation from the perspective of the player whose action produced state s.
Further, the specific content of step S5 is as follows:
In updating the temporary backup value of the game state, the temporary backup value of state s is:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of game state s; e_val denotes the static evaluation value of state s; and b_val_s' denotes the final backup value of the sub-state s' of state s. Because the final decision depends on the relative magnitudes of node backup values in the game tree, this update rule gives nodes at different depths different decision weights, with deeper nodes weighted more heavily.
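Since step S6 retains the smallest temp_val obtained over all legal actions, the final backup value of an interior node can equivalently be written in closed form; this restates the rule above and is the formulation used in the abstract:

b_val = min over mv of (e_val - 2*b_val_s') = e_val - 2*max over s' of (b_val_s')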
Further, the specific content of step S7 is as follows:
Judging the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), then each time the backup value is updated, the action mv that produced the update is also recorded; after all actions have been visited, the recorded action mv is output as the optimal action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, a player obtains a higher payoff by choosing the action that gives the opponent the lowest value; therefore, whenever the temporary backup value of some action is smaller than the current backup value, b_val must be updated.
The invention has the beneficial effects that: it improves the classical minimax algorithm by considering both the node's own static evaluation value and the backup values from its child nodes when backing up node values. The backup rule used in the invention assigns different decision weights to nodes at different depths, with deeper nodes weighted more heavily, so under this rule the evaluation values of deeper nodes remain the main basis for the final decision. At the same time, including the evaluation values of shallower nodes in the calculation reduces the influence of pathological nodes on decision quality. In a game environment that cannot be searched exhaustively, the method achieves higher decision quality than the conventional minimax algorithm. Compared with the error-minimizing minimax algorithm, it avoids the risk of misjudging node types, and pruning methods are easier to add during the search to reduce algorithm complexity.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a diagram illustrating an example of a value backup process according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
As shown in FIG. 1, the present embodiment provides an adversarial search method based on a backup strategy, which comprises the following steps:
step S1: inputting a parameter situation state s, an action party layer to be acted under the current situation state and a maximum search depth d; initializing a backup value b_val of the current situation state s to ++infinity; calculating and recording an evaluation value e_val of the current situation state s by using a static evaluation function; judging whether the current searching node is a game leaf node (namely, the depth of the current searching node in a game tree reaches the maximum searching depth or the state corresponding to the situation of the current searching node is a game ending state); if yes, returning the evaluation value e_val of the node as a final backup value b_val of the node; otherwise, step S2 is entered.
Step S2: for the specific game problem, obtaining all actions that are legal in the current game state according to the corresponding game rules, and proceeding to step S3.
Step S3: judging whether any unvisited action remains; if not, proceeding to step S7; otherwise, proceeding to step S4.
Step S4: selecting an action mv from the unvisited actions and simulating its execution, so that the game state becomes the sub-state s' of state s; recursively invoking steps S1 to S7 of the adversarial search method on sub-state s' (with the search depth reduced by 1 and the acting player switched to the opponent) to obtain the final backup value b_val_s' of sub-state s'; then proceeding to step S5.
Step S5: undoing the simulated action of step S4 to restore game state s; computing the temporary backup value temp_val of state s under action mv using the evaluation value e_val of state s and the final backup value b_val_s' of its sub-state s'; proceeding to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, updating b_val to temp_val and returning to step S3 (in addition, if the node is the root node, recording the action mv corresponding to temp_val); otherwise, returning directly to step S3.
Step S7: judging whether the current search node is the root node; if so, returning the optimal action for the current game state; otherwise, returning the final backup value b_val of the current game state.
In this embodiment, the specific content of step S5 is as follows:
In updating the temporary backup value of the game state, the temporary backup value of state s is calculated as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of game state s; e_val denotes the static evaluation value of state s; and b_val_s' denotes the final backup value of the sub-state s' of state s.
In this embodiment, the specific content of step S7 is as follows:
Judging the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), then each time the backup value is updated, the action mv that produced the update is also recorded; after all actions have been visited, the recorded action mv is output as the optimal action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, a player obtains a higher payoff by choosing the action that gives the opponent the lowest value; therefore, whenever the temporary backup value of some action is smaller than the current backup value, b_val must be updated.
FIG. 2 shows an example of the value-update process in this embodiment. The example uses a game tree with a search depth of 2 and a branching factor of 2. The evaluation values (EV) recorded in the game tree are all from the perspective of the player whose action produced the state. The final backup value of a non-leaf node is denoted by BV.
As shown in FIG. 2, node A is the root of the game tree, and player layer1 is assumed to act at node A. Node A is a state produced by layer2's action, so the static evaluation function gives node A an evaluation value EV of 7 from layer2's perspective. A depth-first search then visits its child node B, whose evaluation value EV is -5 (computed with the same static evaluation function from layer1's perspective).
A depth-first search on node B yields an evaluation value EV of 10 for its child node D. Since node D is a leaf node, the parent node B is updated according to the backup rule temp_val = e_val - 2*b_val_s', giving node B a temporary backup value of -5 - 2*10 = -25, and node B's backup value is updated to -25.
The search continues with child node E of node B, yielding a temporary backup value of -15 for node B (node E's backup value is 5, so -5 - 2*5 = -15). Since this value is greater than node B's current backup value, the backup value of node B is unchanged. All child nodes of node B have now been searched, so the current backup value becomes the final backup value of node B: BV = -25 (node B's backup value is the value layer1 obtains in the game state corresponding to node B).
The same procedure yields a final backup value BV of -23 for node C (likewise, node C's backup value is the value layer1 obtains in the game state corresponding to node C).
The backup values of nodes B and C are propagated up to node A according to the backup rule b_val_A = e_val_A - 2*max(b_val_Ai), so the final backup value of node A is 7 - 2*(-23) = 53. Meanwhile, since node A is the root of the game tree, the action that leads to node C is returned as layer1's optimal next action.
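The arithmetic of this example can be checked with the Node and iom_search sketch given after step S7. The text specifies the evaluation values in node B's subtree but not those inside node C's subtree, so the values below (EV(C) = -3 with leaf values 10 and 8) are assumptions chosen only to reproduce the stated BV(C) = -23:

# Fig. 2 tree: EVs in node B's subtree come from the text; node C's
# internal values (e_val -3, leaves 10 and 8) are assumed for illustration.
A = Node(e_val=7, children=[
    Node(e_val=-5, children=[Node(e_val=10), Node(e_val=5)]),  # node B: D, E
    Node(e_val=-3, children=[Node(e_val=10), Node(e_val=8)]),  # node C (assumed)
])
print(iom_search(A, depth=2, max_depth=2))  # prints 1: the action to node C
# Intermediate values: BV(B) = -5 - 2*10 = -25, BV(C) = -3 - 2*10 = -23,
# BV(A) = 7 - 2*max(-25, -23) = 53, matching the walkthrough above.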

Claims (3)

1. An adversarial search method based on a backup strategy, characterized in that the adversarial search method searches the game tree by a backtracking search method; in execution, both players are set to use the same static evaluation function, and both adopt action strategies that maximize their own payoff and minimize the opponent's; the parameters input when the method is invoked comprise a game state s, the acting player layer in the current state, and a maximum search depth d; the maximum search DEPTH of the game tree is set equal to the input maximum search depth d and is used to judge whether a search node is the root node; the method specifically comprises the following steps:
step S1: initializing the backup value b_val of the current game state s to +∞; computing and recording the evaluation value e_val of the current game state s using the static evaluation function; judging whether the current search node is a leaf node of the game tree; if so, returning the node's evaluation value e_val as its final backup value b_val; otherwise, proceeding to step S2;
step S2: for the specific game problem, obtaining all actions that are legal in the current game state s according to the corresponding game rules, and proceeding to step S3;
step S3: judging whether any unvisited action remains; if not, proceeding to step S7; otherwise, proceeding to step S4;
step S4: selecting an action mv from the unvisited actions and simulating its execution, so that the game state becomes the sub-state s' of state s; recursively invoking steps S1 to S7 of the adversarial search method on sub-state s' to obtain the final backup value b_val_s' of sub-state s'; then proceeding to step S5;
step S5: undoing the simulated action of step S4 to restore game state s; computing the temporary backup value temp_val of state s under action mv using the evaluation value e_val of state s and the final backup value b_val_s' of its sub-state s', and proceeding to step S6;
step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, updating b_val to temp_val and returning to step S3, recording at the root node the action mv corresponding to temp_val; otherwise, returning directly to step S3;
step S7: judging whether the current search node is the root node; if so, returning the optimal action for the current game state; otherwise, returning the final backup value b_val of the current game state.
2. The adversarial search method based on a backup strategy according to claim 1, wherein the specific content of step S5 is as follows:
in updating the temporary backup value of the game state, the temporary backup value of state s is calculated as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of game state s; e_val denotes the static evaluation value of state s; and b_val_s' denotes the final backup value of the sub-state s' of state s.
3. The adversarial search method based on a backup strategy according to claim 1 or 2, characterized in that the specific content of step S7 is as follows:
judging the type of the search node; if the node's maximum search depth d equals DEPTH, i.e., the node is the root of the game tree, then each time the backup value is updated, the action mv that produced the update is also recorded, and after all actions have been visited, the recorded action mv is output as the optimal action.
CN201911333317.6A 2019-12-23 2019-12-23 Adversarial search method based on backup strategy Active CN111176892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911333317.6A CN111176892B (en) Adversarial search method based on backup strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911333317.6A CN111176892B (en) Adversarial search method based on backup strategy

Publications (2)

Publication Number Publication Date
CN111176892A CN111176892A (en) 2020-05-19
CN111176892B true CN111176892B (en) 2023-06-09

Family

ID=70654121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911333317.6A Active CN111176892B (en) Adversarial search method based on backup strategy

Country Status (1)

Country Link
CN (1) CN111176892B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113050686B (en) * 2021-03-19 2022-03-25 北京航空航天大学 Combat strategy optimization method and system based on deep reinforcement learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407670A (en) * 2016-09-06 2017-02-15 中国矿业大学 Game algorithm-based black and white chess game method and system
CN108985458A (en) * 2018-07-23 2018-12-11 Northeastern University Double-tree Monte Carlo search algorithm for sequential synchronous games

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050451A1 (en) * 2015-11-20 2019-02-14 Big Sky Sorcery, Llc System and method for searching structured data files
US10862918B2 (en) * 2017-04-21 2020-12-08 Raytheon Bbn Technologies Corp. Multi-dimensional heuristic search as part of an integrated decision engine for evolving defenses

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407670A (en) * 2016-09-06 2017-02-15 中国矿业大学 Game algorithm-based black and white chess game method and system
CN108985458A (en) * 2018-07-23 2018-12-11 Northeastern University Double-tree Monte Carlo search algorithm for sequential synchronous games

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and implementation of a self-learning game-playing program based on the RL algorithm; Fu Qiang; Chen Huanwen; Journal of Changsha University of Science and Technology (Natural Science Edition), No. 04; full text *

Also Published As

Publication number Publication date
CN111176892A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
Guez et al. Learning to search with mctsnets
CN106964156B (en) Path finding method and device
CN111144570B (en) Knowledge representation method combining logic rules and confidence degrees
CN111176892B (en) Adversarial search method based on backup strategy
CN108809697A (en) Social networks key node recognition methods based on maximizing influence and system
KR102460485B1 (en) Neural architecture search apparatus and method based on policy vector
US11461656B2 (en) Genetic programming for partial layers of a deep learning model
Germano et al. Uncertain rationality, depth of reasoning and robustness in games with incomplete information
CN115062570A (en) Formal verification method, device and equipment and computer storage medium
CN113221390B (en) Training method and device for scheduling model
Avin et al. Preferential attachment as a unique equilibrium
CN111258911A (en) Software test case generation method, system and storage medium based on data driving and multiple coverage strategies
Brown et al. On the feasibility of local utility redesign for multiagent optimization
JP2011107885A (en) Preprocessor in neural network learning
Song et al. Probability based Proof Number Search.
Ricordeau Q-concept-learning: generalization with concept lattice representation in reinforcement learning
Collenette et al. On the role of mobility and interaction topologies in social dilemmas
CN113762469B (en) Neural network structure searching method and system
CN113626721B (en) Regrettful exploration-based recommendation method and device, electronic equipment and storage medium
CN111652369A (en) Novel node value hybrid updating method
CN112580803B (en) Model acquisition method, apparatus, electronic device, storage medium, and program product
JP7338858B2 (en) Behavior learning device, behavior learning method, behavior determination device, and behavior determination method
Pothitos et al. The dilemma between arc and bounds consistency
CN116415541A (en) End-to-end reinforcement learning mixed scale layout method based on post-processing
CN115392469A (en) Quantum line mapping method and system based on dynamic deep search and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant