CN111176892B - Adversarial search method based on backup strategy - Google Patents

Adversarial search method based on backup strategy

Info

Publication number
CN111176892B
Authority
CN
China
Prior art keywords
val
node
value
backup
state
Prior art date
Legal status
Active
Application number
CN201911333317.6A
Other languages
Chinese (zh)
Other versions
CN111176892A (en)
Inventor
刘婵娟
闫俊名
张强
魏小鹏
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911333317.6A
Publication of CN111176892A
Application granted
Publication of CN111176892B
Legal status: Active
Anticipated expiration

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F 11/00: Error detection; error correction; monitoring
    • G06F 11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14: Error detection or correction of the data by redundancy in operation
    • G06F 11/1402: Saving, restoring, recovering or retrying
    • G06F 11/1446: Point-in-time backing up or restoration of persistent data
    • G06F 11/1448: Management of the data involved in backup or backup restore
    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an adversarial search method based on a backup strategy, belonging to the field of strategy search for zero-sum games. By optimizing the backup rule of the classical minimax algorithm, the invention provides an iterative optimal minimax (IOM) algorithm. The method comprises the following steps: first, the evaluation value of any given node is computed with a static evaluation function; then the final value of each node is updated by back-propagation according to the backup rule, i.e., the final backup value of each node equals its own evaluation value minus twice the maximum backup value among its child nodes. This backup rule, used when computing the final state values of intermediate nodes, reduces the influence of pathological nodes in the game tree on decision quality. Compared with the error-minimizing minimax algorithm and the classical minimax algorithm, the iterative optimal minimax algorithm improves decision quality under a limited search depth.

Description

Adversarial search method based on backup strategy
Technical Field
The invention belongs to the field of machine game-playing strategy search, and in particular relates to an adversarial search method that improves the backup method of the minimax algorithm.
Background
Search algorithms are an important area of machine game-playing research. In two-player zero-sum games, algorithms based on the minimax theorem are among the most advanced adversarial search algorithms; when the whole game tree can be searched, they obtain the optimal solution of a complete-information two-player zero-sum game. However, because the state space of many games is very large, the game tree cannot be searched completely. The minimax algorithm proposed by Shannon therefore expands the game tree only to a limited depth, uses a heuristic function as a static evaluation function to estimate the state values of leaf nodes, and treats these evaluated values as true state values during the search, finally producing a computed optimal solution.
Initially, improvements to the minimax algorithm focused mainly on developing game-tree pruning methods, improving decision quality through deeper search. In practice, many game-playing programs do achieve higher decision quality through deeper search; the most famous is the chess program Deep Blue, which defeated world champion Kasparov in 1997. However, researchers have questioned whether deeper search always yields higher decision quality. Nau and Beal independently found that, for certain classes of game trees, deeper search under the minimax backup rule produces worse decisions, a phenomenon called pathology (pathological behavior). Wilson et al. proposed the error-minimizing minimax algorithm (error minimizing minimax) to address pathology in game trees. Separately, the theoretical validity of the minimax algorithm has long been questioned: Pearl argued that treating heuristic evaluations as true values in the minimax algorithm commits a statistically fatal error, and researchers have therefore proposed alternative backup methods such as product propagation.
Although the error-minimizing minimax algorithm can avoid local pathology, it still has many drawbacks. During its search, the type of a node can be judged only after all of its child nodes have been traversed, so the algorithm's time complexity is high and difficult to reduce. Moreover, because errors of the static evaluation function are unavoidable, the algorithm may frequently misjudge node types and then propagate values according to the wrong backup rule.
Disclosure of Invention
To solve the local pathology that exists in game trees while avoiding the drawbacks of the error-minimizing minimax algorithm, the invention provides a game-tree search method that assigns different weights to state values from different sources during backup updating. When the backup value of a state is updated by backtracking, its final backup value is its own static evaluation value minus twice the maximum backup value among its sub-states. This backup method retains the minimax idea of using deeper descendant nodes to obtain a more accurate value for the current node, while using the node's own evaluation value to reduce the influence of pathological nodes on search quality. Compared with the error-minimizing minimax algorithm, the method avoids the risk of misjudging node types, and pruning methods are easier to add during the search to reduce algorithm complexity.
The invention adopts the following technical scheme:
a countermeasure search method based on backup strategy, the method searches game tree by backtracking search method, specifically uses depth priority strategy to traverse nodes in game tree; and in the searching process, a new strategy is given to updating the backup value of the situation state corresponding to the node in the game tree. In the execution process, the method assumes that both game sides adopt the same static evaluation function, and both game sides adopt an action strategy for maximizing own benefits and minimizing other benefits, and specifically comprises the following steps:
step S1: initializing a backup value b_val of the current situation state s to ++infinity; calculating and recording an evaluation value e_val of the current situation state s by using a static evaluation function; judging whether the current searching node is a game leaf node (namely, judging whether the depth of the current searching node in a game tree reaches the maximum searching depth or whether the state corresponding to the node is a game ending state or not); if yes, returning the evaluation value e_val of the node as a final backup value b_val of the node; otherwise, step S2 is entered.
Step S2: for the specific game problem, obtaining all actions that are legal in the current game state s according to the corresponding game rules, and proceeding to step S3.
Step S3: judging whether any unvisited action remains; if not, proceeding to step S7; otherwise, proceeding to step S4.
Step S4: selecting an action mv from the unvisited actions and simulating its execution, so that the game state becomes the sub-state s' of state s; recursively invoking steps S1 to S7 of this adversarial search method on sub-state s' (with the search depth reduced by 1 and the acting player switched to the opponent) to obtain the final backup value b_val_s' of sub-state s'; then proceeding to step S5.
Step S5: undoing the simulated action of step S4 to restore game state s; computing the temporary backup value temp_val of state s under action mv using the evaluation value e_val of state s and the final backup value b_val_s' of its sub-state s'; proceeding to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, updating b_val to temp_val and returning to step S3; in addition, if the node is the root node, recording the action mv corresponding to temp_val; otherwise, returning directly to step S3.
Step S7: judging whether the current search node is the root node; if so, returning the optimal action for the current game state; otherwise, returning the final backup value b_val of the current game state. A minimal code sketch of steps S1 to S7 follows.
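The seven steps above can be rendered as a short program. The following Python code is a minimal, illustrative sketch only: the Node structure and all identifiers are assumptions introduced here, and the patent's simulate/undo of actions in steps S4 and S5 is replaced by walking explicit child nodes.

import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # Static evaluation value e_val of this state, from the perspective of
    # the player whose action produced it (see the elaboration of step S1 below).
    e_val: float
    children: List["Node"] = field(default_factory=list)

def iom_search(node, depth, max_depth):
    """Iterative optimal minimax (IOM) sketch of steps S1 to S7.
    Returns the index of the best child at the root node (where
    depth == max_depth, the DEPTH check of step S7) and the final
    backup value b_val at every other node."""
    b_val = math.inf              # step S1: initialize b_val to +infinity
    e_val = node.e_val            # step S1: static evaluation of state s
    if depth == 0 or not node.children:
        return e_val              # step S1: leaf node, e_val is the backup value
    best_action = None
    for i, child in enumerate(node.children):   # steps S2-S4: visit each action
        b_val_child = iom_search(child, depth - 1, max_depth)
        temp_val = e_val - 2 * b_val_child      # step S5: backup rule
        if temp_val < b_val:                    # step S6: keep the minimum
            b_val = temp_val
            best_action = i                     # recorded; used only at the root
    if depth == max_depth:        # step S7: root node returns the optimal action
        return best_action
    return b_val                  # step S7: interior node returns b_val

A call such as iom_search(root, depth=d, max_depth=d) mirrors the first invocation described in the elaboration of step S1 below.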
Further, the specific content of step S1 is as follows:
When the adversarial search method based on the backup strategy is invoked, the input parameters comprise the game state s, the acting player layer in the current state, and the maximum search depth d. A global variable DEPTH holds the maximum search depth of the game tree, equal to the maximum search depth d given when the iterative optimal minimax algorithm is first called; inside the function, DEPTH is used to judge whether a node is the root node.
The function that evaluates game state s is a static evaluation function determined by the specific game problem; each e_val obtained is the evaluation from the perspective of the player whose action produced state s.
Further, the specific content of step S5 is as follows:
In updating the temporary backup value of the game state, the temporary backup value of state s is:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of game state s; e_val denotes the static evaluation value of state s; and b_val_s' denotes the final backup value of the sub-state s' of state s. Because the final decision depends on the relative magnitudes of node backup values in the game tree, this update rule gives nodes at different depths different decision weights, with deeper nodes weighted more heavily.
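Since step S6 retains the smallest temp_val obtained over all legal actions, the final backup value of an interior node can equivalently be written in closed form; this restates the rule above and is the formulation used in the abstract:

b_val = min over mv of (e_val - 2*b_val_s') = e_val - 2*max over s' of (b_val_s')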
Further, the specific content of step S7 is as follows:
Judging the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), then each time the backup value is updated, the action mv that produced the update is also recorded; after all actions have been visited, the recorded action mv is output as the optimal action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, a player obtains a higher payoff by choosing the action that gives the opponent the lowest value; therefore, whenever the temporary backup value of some action is smaller than the current backup value, b_val must be updated.
The invention has the beneficial effects that: it improves the classical minimax algorithm by considering both the node's own static evaluation value and the backup values from its child nodes when backing up node values. The backup rule used in the invention assigns different decision weights to nodes at different depths, with deeper nodes weighted more heavily, so under this rule the evaluation values of deeper nodes remain the main basis for the final decision. At the same time, including the evaluation values of shallower nodes in the calculation reduces the influence of pathological nodes on decision quality. In a game environment that cannot be searched exhaustively, the method achieves higher decision quality than the conventional minimax algorithm. Compared with the error-minimizing minimax algorithm, it avoids the risk of misjudging node types, and pruning methods are easier to add during the search to reduce algorithm complexity.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a diagram illustrating an example of a value backup process according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
As shown in FIG. 1, the present embodiment provides an adversarial search method based on a backup strategy, which comprises the following steps:
step S1: inputting a parameter situation state s, an action party layer to be acted under the current situation state and a maximum search depth d; initializing a backup value b_val of the current situation state s to ++infinity; calculating and recording an evaluation value e_val of the current situation state s by using a static evaluation function; judging whether the current searching node is a game leaf node (namely, the depth of the current searching node in a game tree reaches the maximum searching depth or the state corresponding to the situation of the current searching node is a game ending state); if yes, returning the evaluation value e_val of the node as a final backup value b_val of the node; otherwise, step S2 is entered.
Step S2: for the specific game problem, obtaining all actions that are legal in the current game state according to the corresponding game rules, and proceeding to step S3.
Step S3: judging whether any unvisited action remains; if not, proceeding to step S7; otherwise, proceeding to step S4.
Step S4: selecting an action mv from the unvisited actions and simulating its execution, so that the game state becomes the sub-state s' of state s; recursively invoking steps S1 to S7 of the adversarial search method on sub-state s' (with the search depth reduced by 1 and the acting player switched to the opponent) to obtain the final backup value b_val_s' of sub-state s'; then proceeding to step S5.
Step S5: undoing the simulated action of step S4 to restore game state s; computing the temporary backup value temp_val of state s under action mv using the evaluation value e_val of state s and the final backup value b_val_s' of its sub-state s'; proceeding to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, updating b_val to temp_val and returning to step S3 (in addition, if the node is the root node, recording the action mv corresponding to temp_val); otherwise, returning directly to step S3.
Step S7: judging whether the current search node is the root node; if so, returning the optimal action for the current game state; otherwise, returning the final backup value b_val of the current game state.
In this embodiment, the specific content of step S5 is as follows:
In updating the temporary backup value of the game state, the temporary backup value of state s is calculated as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of game state s; e_val denotes the static evaluation value of state s; and b_val_s' denotes the final backup value of the sub-state s' of state s.
In this embodiment, the specific content of step S7 is as follows:
Judging the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), then each time the backup value is updated, the action mv that produced the update is also recorded; after all actions have been visited, the recorded action mv is output as the optimal action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, a player obtains a higher payoff by choosing the action that gives the opponent the lowest value; therefore, whenever the temporary backup value of some action is smaller than the current backup value, b_val must be updated.
FIG. 2 shows an example of the value-update process in this embodiment. The example uses a game tree with a search depth of 2 and a branching factor of 2. The evaluation values (EV) recorded in the game tree are all from the perspective of the player whose action produced the state. The final backup value of a non-leaf node is denoted by BV.
As shown in FIG. 2, node A is the root of the game tree, and player layer1 is assumed to act at node A. Node A is a state produced by layer2's action, so the static evaluation function gives node A an evaluation value EV of 7 from layer2's perspective. A depth-first search then visits its child node B, whose evaluation value EV is -5 (computed with the same static evaluation function from layer1's perspective).
A depth-first search on node B yields an evaluation value EV of 10 for its child node D. Since node D is a leaf node, the parent node B is updated according to the backup rule temp_val = e_val - 2*b_val_s', giving node B a temporary backup value of -5 - 2*10 = -25, and node B's backup value is updated to -25.
The search continues with child node E of node B, yielding a temporary backup value of -15 for node B (node E's backup value is 5, so -5 - 2*5 = -15). Since this value is greater than node B's current backup value, the backup value of node B is unchanged. All child nodes of node B have now been searched, so the current backup value becomes the final backup value of node B: BV = -25 (node B's backup value is the value layer1 obtains in the game state corresponding to node B).
The same procedure yields a final backup value BV of -23 for node C (likewise, node C's backup value is the value layer1 obtains in the game state corresponding to node C).
The backup values of nodes B and C are propagated up to node A according to the backup rule b_val_A = e_val_A - 2*max(b_val_Ai), so the final backup value of node A is 7 - 2*(-23) = 53. Meanwhile, since node A is the root of the game tree, the action that leads to node C is returned as layer1's optimal next action.
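The arithmetic of this example can be checked with the Node and iom_search sketch given after step S7. The text specifies the evaluation values in node B's subtree but not those inside node C's subtree, so the values below (EV(C) = -3 with leaf values 10 and 8) are assumptions chosen only to reproduce the stated BV(C) = -23:

# Fig. 2 tree: EVs in node B's subtree come from the text; node C's
# internal values (e_val -3, leaves 10 and 8) are assumed for illustration.
A = Node(e_val=7, children=[
    Node(e_val=-5, children=[Node(e_val=10), Node(e_val=5)]),  # node B: D, E
    Node(e_val=-3, children=[Node(e_val=10), Node(e_val=8)]),  # node C (assumed)
])
print(iom_search(A, depth=2, max_depth=2))  # prints 1: the action to node C
# Intermediate values: BV(B) = -5 - 2*10 = -25, BV(C) = -3 - 2*10 = -23,
# BV(A) = 7 - 2*max(-25, -23) = 53, matching the walkthrough above.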

Claims (3)

1. An adversarial search method based on a backup strategy, characterized in that the adversarial search method searches the game tree by a backtracking search method; in execution, both players are set to use the same static evaluation function, and both adopt action strategies that maximize their own payoff and minimize the opponent's; the parameters input when the method is invoked comprise a game state s, the acting player layer in the current state, and a maximum search depth d; the maximum search DEPTH of the game tree is set equal to the input maximum search depth d and is used to judge whether a search node is the root node; the method specifically comprises the following steps:
step S1: initializing the backup value b_val of the current game state s to +∞; computing and recording the evaluation value e_val of the current game state s using the static evaluation function; judging whether the current search node is a leaf node of the game tree; if so, returning the node's evaluation value e_val as its final backup value b_val; otherwise, proceeding to step S2;
step S2: for the specific game problem, obtaining all actions that are legal in the current game state s according to the corresponding game rules, and proceeding to step S3;
step S3: judging whether any unvisited action remains; if not, proceeding to step S7; otherwise, proceeding to step S4;
step S4: selecting an action mv from the unvisited actions and simulating its execution, so that the game state becomes the sub-state s' of state s; recursively invoking steps S1 to S7 of the adversarial search method on sub-state s' to obtain the final backup value b_val_s' of sub-state s'; then proceeding to step S5;
step S5: undoing the simulated action of step S4 to restore game state s; computing the temporary backup value temp_val of state s under action mv using the evaluation value e_val of state s and the final backup value b_val_s' of its sub-state s', and proceeding to step S6;
step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, updating b_val to temp_val and returning to step S3, recording at the root node the action mv corresponding to temp_val; otherwise, returning directly to step S3;
step S7: judging whether the current search node is the root node; if so, returning the optimal action for the current game state; otherwise, returning the final backup value b_val of the current game state.
2. The adversarial search method based on a backup strategy according to claim 1, wherein the specific content of step S5 is as follows:
in updating the temporary backup value of the game state, the temporary backup value of state s is calculated as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of game state s; e_val denotes the static evaluation value of state s; and b_val_s' denotes the final backup value of the sub-state s' of state s.
3. The adversarial search method based on a backup strategy according to claim 1 or 2, characterized in that the specific content of step S7 is as follows:
judging the type of the search node; if the node's maximum search depth d equals DEPTH, i.e., the node is the root of the game tree, then each time the backup value is updated, the action mv that produced the update is also recorded, and after all actions have been visited, the recorded action mv is output as the optimal action.
CN201911333317.6A 2019-12-23 2019-12-23 Adversarial search method based on backup strategy Active CN111176892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911333317.6A CN111176892B (en) Adversarial search method based on backup strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911333317.6A CN111176892B (en) Adversarial search method based on backup strategy

Publications (2)

Publication Number Publication Date
CN111176892A CN111176892A (en) 2020-05-19
CN111176892B true CN111176892B (en) 2023-06-09

Family

ID=70654121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911333317.6A Active CN111176892B (en) Adversarial search method based on backup strategy

Country Status (1)

Country Link
CN (1) CN111176892B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113050686B (en) * 2021-03-19 2022-03-25 北京航空航天大学 Combat strategy optimization method and system based on deep reinforcement learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407670A (en) * 2016-09-06 2017-02-15 中国矿业大学 Game algorithm-based black and white chess game method and system
CN108985458A (en) * 2018-07-23 2018-12-11 Northeastern University Double-tree Monte Carlo search algorithm for sequential synchronous games

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050451A1 (en) * 2015-11-20 2019-02-14 Big Sky Sorcery, Llc System and method for searching structured data files
US10862918B2 (en) * 2017-04-21 2020-12-08 Raytheon Bbn Technologies Corp. Multi-dimensional heuristic search as part of an integrated decision engine for evolving defenses

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407670A (en) * 2016-09-06 2017-02-15 中国矿业大学 Game algorithm-based black and white chess game method and system
CN108985458A (en) * 2018-07-23 2018-12-11 Northeastern University Double-tree Monte Carlo search algorithm for sequential synchronous games

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and implementation of a self-learning game-playing program based on the RL algorithm; Fu Qiang; Chen Huanwen; Journal of Changsha University of Science and Technology (Natural Science Edition), No. 04; full text *

Also Published As

Publication number Publication date
CN111176892A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
Guez et al. Learning to search with mctsnets
CN106964156B (en) Path finding method and device
CN111144570B (en) Knowledge representation method combining logic rules and confidence degrees
CN111176892B (en) Adversarial search method based on backup strategy
CN108809697A (en) Social networks key node recognition methods based on maximizing influence and system
KR102460485B1 (en) Neural architecture search apparatus and method based on policy vector
US11461656B2 (en) Genetic programming for partial layers of a deep learning model
Germano et al. Uncertain rationality, depth of reasoning and robustness in games with incomplete information
CN115062570A (en) Formal verification method, device and equipment and computer storage medium
CN113221390B (en) Training method and device for scheduling model
Avin et al. Preferential attachment as a unique equilibrium
CN111258911A (en) Software test case generation method, system and storage medium based on data driving and multiple coverage strategies
Brown et al. On the feasibility of local utility redesign for multiagent optimization
JP2011107885A (en) Preprocessor in neural network learning
Song et al. Probability based Proof Number Search.
Ricordeau Q-concept-learning: generalization with concept lattice representation in reinforcement learning
Collenette et al. On the role of mobility and interaction topologies in social dilemmas
CN113762469B (en) Neural network structure searching method and system
CN113626721B (en) Regrettful exploration-based recommendation method and device, electronic equipment and storage medium
CN111652369A (en) Novel node value hybrid updating method
CN112580803B (en) Model acquisition method, apparatus, electronic device, storage medium, and program product
JP7338858B2 (en) Behavior learning device, behavior learning method, behavior determination device, and behavior determination method
Pothitos et al. The dilemma between arc and bounds consistency
CN116415541A (en) End-to-end reinforcement learning mixed scale layout method based on post-processing
CN115392469A (en) Quantum line mapping method and system based on dynamic deep search and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant