CN111176892A - Adversarial search method based on a backup strategy - Google Patents

Adversarial search method based on a backup strategy

Info

Publication number
CN111176892A
CN111176892A (application CN201911333317.6A)
Authority
CN
China
Prior art keywords
val
value
node
state
backup
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911333317.6A
Other languages
Chinese (zh)
Other versions
CN111176892B (en)
Inventor
刘婵娟 (Liu Chanjuan)
闫俊名 (Yan Junming)
张强 (Zhang Qiang)
魏小鹏 (Wei Xiaoping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911333317.6A
Publication of CN111176892A
Application granted
Publication of CN111176892B
Active legal status
Anticipated expiration legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an adversarial search method based on a backup strategy, belonging to the field of zero-sum game strategy search. The invention proposes an iterative optimal minimax (IOM) algorithm by optimizing the backup rule of the classical minimax algorithm. The method comprises the following steps: first, the evaluation value of any given node is calculated using a static evaluation function; then the final value of each node is updated by back-propagation according to the backup rule, i.e., the final backup value of each node equals its evaluation value minus twice the maximum backup value among its child nodes. The backup rule used when calculating the final state values of intermediate nodes provides a way to reduce the influence of pathological nodes in the game tree on decision quality. Compared with the error-minimizing minimax algorithm and the classical minimax algorithm, the iterative optimal minimax algorithm improves decision quality under limited search depth.

Description

Adversarial search method based on a backup strategy
Technical Field
The invention belongs to the field of machine-game strategy search, and in particular relates to an adversarial search method that improves the backup rule of the minimax algorithm.
Background
Search algorithms are an important area of machine-game research. For two-player zero-sum games, algorithms based on the minimax theorem are among the most effective adversarial search methods: when the entire game tree can be searched, they yield the optimal solution of the perfect-information two-player zero-sum game. In many games, however, the state space is so large that the game tree cannot be searched exhaustively. In the minimax implementation proposed by Shannon, the game tree is therefore expanded selectively to a limited depth, a heuristic function is used as a static evaluation function for the state values of leaf nodes, the values it computes are treated as true state values during the search, and the resulting computed optimal solution is returned.
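For reference, Shannon's depth-limited scheme can be sketched in negamax form (a compact formulation of minimax). The `evaluate` and `children` callbacks below are hypothetical stand-ins for a concrete game, and every value is expressed from the perspective of the side to move:

```python
def negamax(node, depth, evaluate, children):
    # Depth-limited minimax in negamax form: each value is taken from the
    # perspective of the side to move, so child values are negated.
    kids = children(node)
    if depth == 0 or not kids:
        # Shannon's scheme: the heuristic evaluation is treated as the
        # true state value at the search frontier.
        return evaluate(node)
    return max(-negamax(child, depth - 1, evaluate, children) for child in kids)
```

The backup rule here is the plain maximum over negated child values; it is exactly this rule that the invention described below replaces.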
Initially, improvements to the minimax algorithm focused on developing game-tree pruning methods, raising decision quality through deeper search. In practice, many game programs do achieve higher decision quality by searching deeper; most famously, the chess program Deep Blue defeated world champion Garry Kasparov in 1997. Scholars have nevertheless questioned whether deeper search always yields better decisions. Nau and Beal independently found that for certain classes of game trees, the deeper the search under the minimax backup rule, the worse the decision quality, and named this phenomenon pathology. Wilson et al. proposed the error-minimizing minimax algorithm as a remedy for pathology in game trees. The minimax algorithm has also faced theoretical criticism from another direction: Pearl argued that treating heuristic evaluations as true values commits a statistically fatal error, which led researchers to propose alternative backup methods such as product propagation.
Although the error-minimizing minimax algorithm can avoid local pathology, it still has several shortcomings. During its search, the type of a node can be determined only after all of the node's children have been traversed, so the time complexity of the algorithm is high and difficult to reduce. Moreover, because the static evaluation function inevitably contains error, the algorithm may frequently misjudge node types and propagate values according to the wrong backup rule.
Disclosure of Invention
To solve the problem of local pathology in game trees while avoiding the shortcomings of the error-minimizing minimax algorithm in doing so, the invention provides a game-tree search method that assigns different weights to state values from different sources during the backtracking update of backup values. When the backup value of a state is updated by backtracking, its final backup value is set to the evaluation value that the evaluation function assigns to the state, minus twice the largest backup value among its sub-states. This backup method retains the minimax idea of using deeper descendant nodes to obtain a more accurate value for the current node, while also using the evaluation value of the node's own state to reduce the influence of pathological nodes on search quality. Compared with the error-minimizing minimax algorithm, the method avoids the risk of misjudging node types, and pruning methods can more easily be added to the search to reduce the algorithm's complexity.
The invention adopts the following technical scheme:
a countermeasure type search method based on a backup strategy is disclosed, the method adopts a backtracking search method to search a game tree, and specifically adopts a depth-first strategy to traverse nodes in the game tree; and a new strategy is given to the updating of the backup value of the situation state corresponding to the nodes in the game tree in the searching process. The method assumes that both game parties adopt the same static evaluation function in the executing process, and both game parties adopt action strategies for maximizing own interests and minimizing opposite interests, and specifically comprises the following steps:
step S1: initializing a backup value b _ val of a current situation state s to + ∞; calculating an evaluation value e _ val of the current situation state s by using a static evaluation function and recording the evaluation value e _ val; judging whether the current searching node is a game leaf node (namely whether the depth of the current searching node in the game tree reaches the maximum searching depth or whether the local state corresponding to the node is a game ending state); if yes, returning the evaluation value e _ val of the node as the final backup value b _ val of the node; otherwise, the process proceeds to step S2.
Step S2: obtain all actions that are legal in the current state s according to the rules of the particular game, then proceed to step S3.
Step S3: judge whether any action remains unvisited; if not, proceed to step S7; otherwise, proceed to step S4.
Step S4: select an action mv from the unvisited actions and simulate its execution, so that the state changes into a sub-state s' of s; recursively apply the adversarial search method of steps S1 to S7 to the sub-state s' (with the search depth reduced by 1 and the moving side switched to the opponent) to obtain the final backup value b_val_s' of the sub-state s'; then proceed to step S5.
Step S5: undo the simulated action of step S4 so that the state is restored to s; compute the temporary backup value temp_val of state s under the action mv from the evaluation value e_val of s and the final backup value b_val_s' of its sub-state s'; then proceed to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, update b_val to temp_val and return to step S3; in addition, if the node is the root node, record the action mv corresponding to the temporary backup value temp_val; otherwise, return directly to step S3.
Step S7: judge whether the current search node is the root node; if so, return the best action in the current state; otherwise, return the final backup value b_val of the current state.
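Steps S1 to S7 can be sketched as a single recursive routine. The `evaluate`, `legal_moves`, `apply_move`, and `undo_move` callbacks are hypothetical placeholders for a concrete game implementation; only the backup rule itself is taken from the steps above:

```python
import math

def iom_search(state, depth, evaluate, legal_moves, apply_move, undo_move):
    """Adversarial search with the backup rule of steps S1-S7.
    Returns (final backup value b_val, best action at this node)."""
    # Step S1: evaluate the current state; leaves return e_val directly.
    e_val = evaluate(state)
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return e_val, None
    b_val, best_mv = math.inf, None         # b_val initialized to +infinity
    for mv in moves:                        # Steps S2-S4: try each legal action
        apply_move(state, mv)
        child_val, _ = iom_search(state, depth - 1, evaluate,
                                  legal_moves, apply_move, undo_move)
        undo_move(state, mv)                # Step S5: restore the position
        temp_val = e_val - 2 * child_val    # backup rule: e_val - 2*b_val_s'
        if temp_val < b_val:                # Step S6: keep the smallest temp_val
            b_val, best_mv = temp_val, mv
    # Step S7: the caller uses best_mv at the root, b_val at inner nodes.
    return b_val, best_mv
```

At the root the caller reads off the returned best action; at inner nodes only the backup value b_val is used.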
Further, the specific content of step S1 is:
when the countermeasure type search method based on the backup strategy is called, parameters needing to be input comprise a situation state s, an action player to act in the current state and a maximum search depth d. The value of the maximum search DEPTH DEPTH of the game tree is the same as the value of the maximum search DEPTH d given when the iterative optimum maximum minimum algorithm is called for the first time, and the DEPTH is used as a global variable in a function and used for judging whether the node is a root node or not.
The function that evaluates the state s is a static evaluation function chosen for the particular game; each e_val it produces is an evaluation value from the viewpoint of the player whose action produced the state s.
Further, the specific content of step S5 is:
in the process of updating the temporary backup value of a state, the temporary backup value of state s is computed as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of state s; e_val denotes the evaluation value of the static evaluation function on state s; and b_val_s' denotes the final backup value of the sub-state s' of the current state s. Because the final decision depends on the relative sizes of the node backup values in the game tree, nodes at different depths carry different decision weights under this update rule, and deeper nodes carry higher weight.
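The depth weighting can be made concrete by unrolling the rule along a single line of play: applying b = e_val - 2*b_child repeatedly gives the evaluation at depth k a coefficient of (-2)^k, so every additional ply doubles a node's weight in the final value. A small illustrative sketch (not part of the patent's steps):

```python
def unrolled_backup(evals):
    """Backup value of a single chain of states whose static evaluation
    values, from the root down to the leaf, are evals[0], ..., evals[d].
    Equivalent to the weighted sum of evals[k] * (-2)**k over k."""
    b = evals[-1]              # leaf: backup value equals evaluation value
    for e in reversed(evals[:-1]):
        b = e - 2 * b          # the backup rule applied bottom-up
    return b
```

For instance, `unrolled_backup([e0, e1, e2])` equals `e0 - 2*e1 + 4*e2`, showing that the deepest evaluations dominate the decision.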
Further, the specific content of step S7 is:
judge the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), record the action mv adopted whenever the backup value is updated, and after all actions have been visited, output the recorded action mv as the best action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, the mover, seeking a higher payoff, selects the action that leaves the opponent with the lowest value; therefore, whenever the temporary backup value corresponding to an action is smaller than the current backup value, b_val must be updated.
The beneficial effects of the invention are as follows: the invention improves the classical minimax algorithm by considering both the static evaluation value of a node and the backup values from its children when backing up the node's value. The backup rule assigns different decision weights to nodes at different depths: the deeper the node, the higher the weight. Under this rule, the evaluation values of deeper nodes remain the main basis of the final decision, while including the evaluation values of shallower nodes in the calculation reduces the influence of pathological nodes on decision quality. In game environments that cannot be searched exhaustively, the method attains higher decision quality than the classical minimax algorithm. Compared with the error-minimizing minimax algorithm, it avoids the risk of misjudging node types, and pruning methods can more easily be added to the search to reduce the algorithm's complexity.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a diagram illustrating an example of a value backup process according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in fig. 1, this embodiment provides an adversarial search method based on a backup strategy, comprising the following steps:
step S1: inputting a parameter situation state s, an action player to act in the current situation state and a maximum search depth d; initializing a backup value b _ val of a current situation state s to + ∞; calculating an evaluation value e _ val of the current situation state s by using a static evaluation function and recording the evaluation value e _ val; judging whether the current searching node is a game leaf node (namely the depth of the current searching node in the game tree reaches the maximum searching depth or the local state corresponding to the node is a game ending state); if yes, returning the evaluation value e _ val of the node as the final backup value b _ val of the node; otherwise, the process proceeds to step S2.
Step S2: obtain all actions that are legal in the current state according to the rules of the particular game, then proceed to step S3.
Step S3: judge whether any action remains unvisited; if not, proceed to step S7; otherwise, proceed to step S4.
Step S4: select an action mv from the unvisited actions and simulate its execution, so that the state changes into a sub-state s' of s; recursively apply the adversarial search method of steps S1 to S7 to the sub-state s' (with the search depth reduced by 1 and the moving side switched to the opponent) to obtain the final backup value b_val_s' of the sub-state s'; then proceed to step S5.
Step S5: undo the simulated action of step S4 so that the state is restored to s; compute the temporary backup value temp_val of state s under the action mv from the evaluation value e_val of s and the final backup value b_val_s' of its sub-state s'; then proceed to step S6.
Step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, update b_val to temp_val and return to step S3 (in addition, if the node is the root node, record the action mv corresponding to the temporary backup value temp_val); otherwise, return directly to step S3.
Step S7: judge whether the current search node is the root node; if so, return the best action in the current state; otherwise, return the final backup value b_val of the current state.
In this embodiment, the specific content of step S5 is:
in the process of updating the temporary backup value of a state, the temporary backup value of state s is computed as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of state s; e_val denotes the evaluation value of the static evaluation function on state s; and b_val_s' denotes the final backup value of the sub-state s' of the current state s.
In this embodiment, the specific content of step S7 is:
judge the type of the search node: if the node's maximum search depth d equals DEPTH (i.e., the node is the root of the game tree), record the action mv adopted whenever the backup value is updated, and after all actions have been visited, output the recorded action mv as the best action.
b_val is the final backup value of the state corresponding to the node. In a zero-sum game, the mover, seeking a higher payoff, selects the action that leaves the opponent with the lowest value; therefore, whenever the temporary backup value corresponding to an action is smaller than the current backup value, b_val must be updated.
Fig. 2 illustrates an example of the value update in this embodiment, using a game tree with search depth 2 and branching factor 2. The evaluation value (EV) recorded at each node of the game tree is the value of that state for the player whose action produced it; the final backup value of a non-leaf node is denoted BV.
As shown in fig. 2, node A is the root of the game tree, and player1 is assumed to be the player making the decision at node A. Node A is a state produced by an action of player2, and from player2's viewpoint the static evaluation function assigns node A an evaluation value EV of 7. A depth-first search of node A's children then gives its child node B an evaluation value EV of -5 (computed from player1's viewpoint using the same static evaluation function).
A depth-first search of node B finds that its child node D has evaluation value EV = 10. Since node D is a leaf node, its parent node B is updated by backtracking according to the backup rule temp_val = e_val - 2*b_val_s': the temporary backup value of node B is -5 - 2*10 = -25, and node B's backup value is updated to -25.
The search continues with node B's child node E, giving node B a temporary backup value of -15 via node E. Since this value is greater than node B's current backup value, the backup value of node B is unchanged. All children of node B have now been searched, so the current backup value is returned as node B's final backup value, BV = -25 (node B's backup value is the value obtained by player1 in the state corresponding to node B).
The same procedure yields the final backup value BV = -23 for node C (likewise, node C's backup value is the value obtained by player1 in the state corresponding to node C).
The backup values of nodes B and C are propagated up to node A according to the backup rule b_val_A = e_val_A - 2*max(b_val_Ai), giving node A the final backup value 7 - 2*(-23) = 53. Since node A is the root of the game tree, the action leading to node C is selected as player1's best next action.
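The backup computation of fig. 2 can be reproduced with a short recursion over (EV, children) pairs. Node E's evaluation value of 5 follows from the temporary backup value -15 reported above; node C's subtree is not given in the description, so a hypothetical subtree (EV -3 with leaf children 10 and 8) consistent with BV = -23 is used:

```python
def iom_backup(e_val, children):
    """Final backup value of a node: its evaluation value minus twice the
    maximum backup value among its children (leaves return e_val)."""
    if not children:
        return e_val
    return e_val - 2 * max(iom_backup(ev, ch) for ev, ch in children)

# Nodes are (EV, children). EV of E (5) is implied by the text; C's subtree
# (EV -3, leaf children 10 and 8) is a hypothetical choice giving BV = -23.
D, E = (10, []), (5, [])
B = (-5, [D, E])
C = (-3, [(10, []), (8, [])])
A = (7, [B, C])
```

With these inputs, `iom_backup(*B)` gives -25, `iom_backup(*C)` gives -23, and `iom_backup(*A)` gives 53, matching the values in the description.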

Claims (3)

1. An adversarial search method based on a backup strategy, characterized in that the adversarial search method searches a game tree by backtracking search; during execution of the method, both players use the same static evaluation function and adopt action strategies that maximize their own payoff and minimize the opponent's; the parameters input when the method is called comprise a state s, the player to move in the current state, and a maximum search depth d; the maximum search depth DEPTH of the game tree is set equal to the input maximum search depth d and is used to judge whether a search node is the root node; the method specifically comprises the following steps:
step S1: initialize the backup value b_val of the current state s to +∞; compute and record the evaluation value e_val of the current state s using the static evaluation function; judge whether the current search node is a leaf node of the game tree; if so, return the node's evaluation value e_val as its final backup value b_val; otherwise, proceed to step S2;
step S2: obtain all actions that are legal in the current state s according to the rules of the particular game, and proceed to step S3;
step S3: judge whether any action remains unvisited; if not, proceed to step S7; otherwise, proceed to step S4;
step S4: select an action mv from the unvisited actions and simulate its execution, so that the state changes into a sub-state s' of s; recursively apply the adversarial search method of steps S1 to S7 to the sub-state s' to obtain the final backup value b_val_s' of the sub-state s'; then proceed to step S5;
step S5: undo the simulated action of step S4 so that the state is restored to s; compute the temporary backup value temp_val of state s under the action mv from the evaluation value e_val of s and the final backup value b_val_s' of its sub-state s'; then proceed to step S6;
step S6: if the temporary backup value temp_val computed in step S5 is smaller than the current backup value b_val of state s, update b_val to temp_val and return to step S3, and, for the root node, record the action mv corresponding to the temporary backup value temp_val; otherwise, return directly to step S3;
step S7: judge whether the current search node is the root node; if so, return the best action in the current state; otherwise, return the final backup value b_val of the current state.
2. The adversarial search method based on a backup strategy according to claim 1, characterized in that the specific content of step S5 is:
in the process of updating the temporary backup value of a state, the temporary backup value of state s is computed as:
temp_val = e_val - 2*b_val_s'
where temp_val denotes the temporary backup value of state s; e_val denotes the evaluation value of the static evaluation function on state s; and b_val_s' denotes the final backup value of the sub-state s' of the current state s.
3. The adversarial search method according to claim 1 or 2, characterized in that the specific content of step S7 is:
judge the type of the search node: if the node's maximum search depth d equals DEPTH, i.e., the node is the root of the game tree, record the action mv adopted whenever the backup value is updated, and after all actions have been visited, output the recorded action mv as the best action.
CN201911333317.6A 2019-12-23 2019-12-23 Adversarial search method based on a backup strategy Active CN111176892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911333317.6A CN111176892B (en) 2019-12-23 2019-12-23 Adversarial search method based on a backup strategy

Publications (2)

Publication Number Publication Date
CN111176892A true CN111176892A (en) 2020-05-19
CN111176892B CN111176892B (en) 2023-06-09

Family

ID=70654121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911333317.6A Active CN111176892B (en) 2019-12-23 2019-12-23 Adversarial search method based on a backup strategy

Country Status (1)

Country Link
CN (1) CN111176892B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113050686A (en) * 2021-03-19 2021-06-29 北京航空航天大学 Combat strategy optimization method and system based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050451A1 (en) * 2015-11-20 2019-02-14 Big Sky Sorcery, Llc System and method for searching structured data files
CN106407670A (en) * 2016-09-06 2017-02-15 中国矿业大学 Game algorithm-based black and white chess game method and system
US20180309779A1 (en) * 2017-04-21 2018-10-25 Raytheon Bbn Technologies Corp. Multi-dimensional heuristic search as part of an integrated decision engine for evolving defenses
CN108985458A (en) * 2018-07-23 2018-12-11 东北大学 A kind of double tree monte carlo search algorithms of sequential synchronous game

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FU Qiang; CHEN Huanwen: "Design and implementation of a self-learning game program based on the RL algorithm", Journal of Changsha University of Science and Technology (Natural Science), no. 04 *

Also Published As

Publication number Publication date
CN111176892B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
Zimmermann et al. Cooperation, social networks, and the emergence of leadership in a prisoner’s dilemma with adaptive local interactions
CN110515303A (en) A kind of adaptive dynamic path planning method based on DDQN
CN109697512B (en) Personal data analysis method based on Bayesian network and computer storage medium
CN108363478A (en) For wearable device deep learning application model load sharing system and method
Haufe et al. Automated verification of state sequence invariants in general game playing
CN113361279A (en) Medical entity alignment method and system based on double neighborhood map neural network
CN111176892A (en) Countermeasure type searching method based on backup strategy
US11461656B2 (en) Genetic programming for partial layers of a deep learning model
CN113221390B (en) Training method and device for scheduling model
Jaleel et al. Robustness of stochastic learning dynamics to player heterogeneity in games
Avin et al. Preferential attachment as a unique equilibrium
CN113010437B (en) Software system reliability management method and system based on fault analysis
CN108304929A (en) A kind of determination method and system of the best tactics of lattice chess
Desmarchelier et al. Kibs and the dynamics of industrial clusters
Brown et al. On the feasibility of local utility redesign for multiagent optimization
CN108197186B (en) Dynamic graph matching query method applied to social network
CN113449869A (en) Learning method of easy-reasoning Bayesian network
Collenette et al. On the role of mobility and interaction topologies in social dilemmas
CN113762469B (en) Neural network structure searching method and system
CN110442690A (en) A kind of query optimization method, system and medium based on probability inference
CN111652369A (en) Novel node value hybrid updating method
CN110162400B (en) Method and system for realizing cooperation of intelligent agents in MAS system in complex network environment
CN118297353B (en) Industrial production process multi-objective optimization method based on branch non-dominant sorting algorithm
JP7338858B2 (en) Behavior learning device, behavior learning method, behavior determination device, and behavior determination method
Li et al. Genetic network programming with automatic program generation for agent control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant