CN114492749B

CN114492749B - Game decision method with action space decoupling for time-limited red-blue confrontation problems

Info

Publication number: CN114492749B
Application number: CN202210079797.3A
Authority: CN
Inventors: 耿虎军; 孟楠; 张加佳; 姜岩松; 张文宝; 楚博策; 韩长兴; 高晓倩; 王梅瑞; 高峰
Original assignee: CETC 54 Research Institute
Current assignee: CETC 54 Research Institute
Priority date: 2022-01-24
Filing date: 2022-01-24
Publication date: 2023-09-15
Anticipated expiration: 2042-01-24
Also published as: CN114492749A

Abstract

The invention discloses a game decision method with action space decoupling for time-limited red-blue confrontation problems, and belongs to the field of game decision-making in artificial intelligence. The method comprises the following steps: (1) abstracting the game problem, completing the game problem modeling, and constructing a structured abstraction and a semantics-based confrontation scene; (2) constructing a pre-trained blue-side information prediction model under incomplete information, which supports mapping from incomplete information to complete information from the red side's perspective; (3) constructing the action space of each single group, designing an evaluation function, and making Monte Carlo tree decisions based on action space decoupling; (4) designing a path association degree and an influence discriminant function, performing Monte Carlo tree decision post-processing based on the degree of influence between groups, completing the joint action space design for highly correlated groups and the decision output for weakly correlated groups, and finally obtaining the game decision result. The method can solve complex game problems under a time limit and can quickly search a large-scale action space, thereby supporting efficient and accurate game decisions.

Description

Game decision method with action space decoupling for time-limited red-blue confrontation problems

Technical Field

The invention belongs to the field of game decision-making in artificial intelligence, and particularly relates to a game decision method with action space decoupling for time-limited red-blue confrontation problems.

Background

Semantic data, as a form of information description, can effectively express situation information. Its transmission requires low bandwidth and places little pressure on communication, so it can effectively support information transmission under conditions of intense confrontation, narrow bandwidth and heavy damage. Machine game decision-making is a key technology for supporting auxiliary decisions on economic, political and other problems, and is receiving extensive attention. With the development of artificial intelligence, game decision-making technology has been put into practical use on some problems. However, as the games to be solved become more complex, problems such as incomplete information, large action spaces and limited decision time become prominent, and existing game decision technology cannot meet the requirements. How to better optimize game decision algorithms so as to solve complex game problems with limited decision time against a semantic situation background has become a focus of current research.

Current game decision algorithms fall roughly into three categories. The first uses reinforcement learning to construct a policy or value network based on accumulated game rewards, and makes decisions through a large amount of simulation. The second is game-theoretic objective optimization, which sets a current reward objective and obtains the decision result by solving it with an optimization method. The third is search, which obtains the decision result by searching the possible situations of the game problem in combination with rewards. Most methods of the first category construct a neural network and train it on a large amount of game-record data to make decisions, but the results are poorly interpretable and a large amount of training is required. Methods of the second category find the optimal solution for the current reward, but often fail to consider future rewards, which limits them. Methods of the third category can solve for the cumulative reward and are well interpretable. For complex game decision problems with a large action space, the Monte Carlo tree in the third category is well suited: it is interpretable and does not need a large amount of historical game data. However, it cannot be applied to incomplete-information problems such as wargaming or real-time strategy games, and its decision quality under time limits needs improvement. The invention therefore improves on the third category of methods to solve complex, time-limited game decision problems under incomplete information based on the semantic situation.

Disclosure of Invention

The invention aims to provide a game decision method with action space decoupling for time-limited red-blue confrontation problems. By modeling semantic situation information, the method can make decisions for game confrontation problems under the "fog of war", simplify the action space, be applied to complex confrontation problems, greatly reduce the search space and improve search accuracy.

The invention adopts the technical scheme that:

A game decision method with action space decoupling for time-limited red-blue confrontation problems comprises the following steps:

Step 1, building a red-blue confrontation scene with a fog-of-war effect, providing semantic information on the red-blue situation from each camp's perspective, abstracting the environment based on buildings, introducing connectivity and nodes to discretize the map, and defining the principles for formulating the action space;

Step 2, parsing the semantic information on the red-blue situation from each camp's perspective, constructing game data to train a blue-side information prediction model under incomplete information, and obtaining a pre-trained model $BN_r$;

Step 3, parsing the fog-affected incomplete blue-side situation semantic information and the red-side information from the red side's perspective, feeding them into the blue-side information prediction model $BN_r$, and predicting the unknown blue-side information to obtain complete blue-side information from the red side's perspective;

Step 4, establishing a joint Monte Carlo tree evaluation system based on a connectivity benefit matrix, a radiation probability matrix and the red and blue unit distribution matrices;

Step 5, based on the complete blue-side information obtained in step 3 and the evaluation system obtained in step 4, constructing an action space for each red-side group and establishing separate Monte Carlo tree decisions; at the initial moment each red-side unit is defined as one group, and during the Monte Carlo tree decision of each group only the current group is searched while the other groups remain stationary;

Step 6, performing post-processing on the Monte Carlo tree decision results obtained in step 5: checking the final state produced by each group's Monte Carlo tree search result, judging whether the results influence each other, merging mutually influencing groups into a new group, and returning to step 5 to construct a joint action space, until a set time threshold is reached, whereupon the current decision result is output; for groups with no association, the current decision result is output directly;

Step 7, generating decision semantics based on the current decision result, and executing the current scheme.
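The control flow of steps 5 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `decoupled_decision`, `search` and `find_coupled` are hypothetical names, `search` stands in for the per-group Monte Carlo tree search, and `find_coupled` stands in for the influence check of step 6 and is assumed to return disjoint clusters of current groups.

```python
import time
from typing import Callable, Dict, FrozenSet, List

Group = FrozenSet[int]   # a group is a set of red unit ids
Moves = Dict[int, int]   # unit id -> chosen node id (hypothetical encoding)


def decoupled_decision(
    units: List[int],
    search: Callable[[Group, float], Moves],
    find_coupled: Callable[[List[Group], Dict[Group, Moves]], List[List[Group]]],
    time_budget_s: float,
) -> Moves:
    """Search each group separately, then merge groups that influence each other."""
    deadline = time.monotonic() + time_budget_s
    # Step 5: at the initial moment every red unit is its own group.
    groups: List[Group] = [frozenset([u]) for u in units]
    results: Dict[Group, Moves] = {g: search(g, deadline) for g in groups}

    # Step 6: merge mutually influencing groups and search them jointly,
    # until nothing is coupled any more or the time threshold is reached.
    while time.monotonic() < deadline:
        clusters = [c for c in find_coupled(groups, results) if len(c) > 1]
        if not clusters:
            break
        for cluster in clusters:
            merged: Group = frozenset().union(*cluster)
            for g in cluster:
                groups.remove(g)
                results.pop(g)
            groups.append(merged)
            results[merged] = search(merged, deadline)  # joint action space

    # Output the current decision result of every group.
    decision: Moves = {}
    for moves in results.values():
        decision.update(moves)
    return decision
```

Passing the deadline into `search` lets each per-group search respect the same overall time limit; how the budget is actually divided between groups is not specified in the text above.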

Further, the step 1 specifically includes the following steps:

101, constructing a red-blue confrontation scene based on a semantic situation with a fog-of-war effect, in which the red and blue sides acquire information according to their visual range and the game inputs of both sides are semantic information;

102, extracting the city buildings and discretizing the game map to construct the passable area, wherein the map is represented by a 0/1 matrix of dimension $(M_1, N_1)$, in which 0 denotes a passable area and 1 denotes an impassable area;

103, discretizing the problem based on the passable area: passable locations in the city are represented by connected nodes $v$, and node connectivity is determined by a distance threshold $d_{thr}$. Let $v_{i,j}$ denote the node in row $i$, column $j$ of the passable area and $v_{p,q}$ the node in row $p$, column $q$. If the distance $d(v_{i,j}, v_{p,q})$ is less than $d_{thr}$, nodes $v_{i,j}$ and $v_{p,q}$ are connected; otherwise they are not. The adjacency matrix of each node is constructed accordingly.

The adjacency matrix $A(v_{i,j})$ of node $v_{i,j}$ is a matrix with $M_1$ rows and $N_1$ columns, whose element $a(v_{i,j})_{(p,q)}$ indicates whether node $v_{i,j}$ and node $v_{p,q}$ are connected:

$$a(v_{i,j})_{(p,q)} = \begin{cases} 1, & d(v_{i,j}, v_{p,q}) < d_{thr} \\ 0, & \text{otherwise} \end{cases}$$

The connected nodes of each node are thereby obtained, and these form the movement action space of the red and blue units located at that node.
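The discretization of steps 102 and 103 can be illustrated with a short sketch. It assumes that the distance $d$ is the Euclidean distance between grid indices, which the text does not specify; the function name `neighbors` is hypothetical.

```python
import numpy as np


def neighbors(grid: np.ndarray, d_thr: float) -> dict:
    """Map each passable cell (i, j) to the passable cells closer than d_thr.

    grid is the 0/1 map matrix: 0 = passable, 1 = impassable.
    ASSUMPTION: d is the Euclidean distance between grid indices.
    """
    rows, cols = grid.shape
    reach = int(np.floor(d_thr))  # no cell further than this can qualify
    result = {}
    for cell in np.argwhere(grid == 0):
        i, j = int(cell[0]), int(cell[1])
        connected = []
        for p in range(max(0, i - reach), min(rows, i + reach + 1)):
            for q in range(max(0, j - reach), min(cols, j + reach + 1)):
                if (p, q) == (i, j) or grid[p, q] != 0:
                    continue
                if np.hypot(p - i, q - j) < d_thr:
                    connected.append((p, q))
        result[(i, j)] = connected  # movement action space at node v_{i,j}
    return result
```

For example, with `grid = np.array([[0, 0, 1], [1, 0, 0]])` a threshold of 1.1 links only horizontally or vertically adjacent passable cells, while a threshold of 1.5 also links diagonal neighbours.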

Further, the step 4 specifically includes the following steps:

401, calculating the connectivity $l_{ij}$ of each node $v_{i,j}$ with the other nodes $v_{p,q}$ based on the adjacency matrix:

The higher the connectivity value, the better the node is connected to other nodes, where $M_1$ and $N_1$ are respectively the length and width of the map, and $A_{mn}(v_{ij})$ is the element in row $m$, column $n$ of matrix $A(v_{ij})$;

402, constructing a connectivity benefit matrix based on the connectivity of the nodes

403, constructing a radiation probability matrix for each game unit $k$, the game units comprising red-side units and blue-side units, wherein the radiation probability matrix is calculated as follows:

(1) initializing the radiation probability matrix of game unit $k$

(2) updating the radiation probability matrix $\tau^k$ based on the position of the game unit: if game unit $k$ is located at $v_{i,j}$, the radiation probability value at $v_{i,j}$ is 1; the radiation probability value at every other position $v_{p,q}$ is given by the attenuation function $G(v_{i,j}, v_{p,q})$:

where $\alpha$ is the attenuation coefficient and $C(v_{i,j}, v_{p,q})$ is the shortest path length between nodes $v_{i,j}$ and $v_{p,q}$, obtained by breadth-first or depth-first traversal; radiation stops when the radiation depth $C(v_{i,j}, v_{p,q})$ reaches a set threshold or the radiation probability value falls below a set threshold;

404, superposing the radiation probability matrices of the game units to obtain the overall radiation probability matrix $D$ of the red camp or the blue camp:

where $n$ is the number of red or blue units (the case $n = 0$ is handled separately). The final evaluation scores of the red and blue sides are expressed as follows:

$$S_r = Z \cdot D_b$$

$$S_b = Z \cdot D_r$$

where $S_r$ is the evaluation score of the red camp, $S_b$ is the evaluation score of the blue camp, $\cdot$ denotes element-wise multiplication of the matrices, $D_r$ is the overall radiation probability matrix of the red camp, and $D_b$ is the overall radiation probability matrix of the blue camp.
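The evaluation system of step 4 can be sketched as below. Several parts are assumptions, because the corresponding formulas are not reproduced in the text: the attenuation function is taken to be $G = \alpha^{C}$, nodes are taken to be 4-connected grid cells, superposition is taken to be a plain sum, $Z$ is taken to be the connectivity benefit matrix of step 402, and the element-wise product is reduced to a scalar by summation. All function names are hypothetical.

```python
from collections import deque

import numpy as np


def radiation_matrix(grid, pos, alpha=0.5, max_depth=3, min_prob=0.05):
    """Radiation probability matrix tau^k of one unit located at `pos`.

    Breadth-first traversal gives the shortest path length C to each cell.
    ASSUMPTIONS: G = alpha ** C, 4-connected moves over passable (0) cells.
    """
    rows, cols = grid.shape
    tau = np.zeros(grid.shape, dtype=float)   # (1) initialise
    tau[pos] = 1.0                            # (2) value 1 at the unit's node
    depth = {pos: 0}
    queue = deque([pos])
    while queue:
        i, j = queue.popleft()
        c = depth[(i, j)] + 1                 # path length of the next ring
        prob = alpha ** c
        if c > max_depth or prob < min_prob:  # stop radiating
            continue
        for p, q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            inside = 0 <= p < rows and 0 <= q < cols
            if inside and grid[p, q] == 0 and (p, q) not in depth:
                depth[(p, q)] = c
                tau[p, q] = prob
                queue.append((p, q))
    return tau


def overall_matrix(grid, positions, **kwargs):
    """Superpose the per-unit matrices of one camp (ASSUMED: plain sum)."""
    total = np.zeros(grid.shape, dtype=float)
    for pos in positions:
        total += radiation_matrix(grid, pos, **kwargs)
    return total


def scores(Z, D_r, D_b):
    """S_r = Z . D_b and S_b = Z . D_r, element-wise.

    ASSUMPTION: the product matrix is summed to obtain a scalar score.
    """
    return float(np.sum(Z * D_b)), float(np.sum(Z * D_r))
```

A typical call would build `D_r` and `D_b` with `overall_matrix` from the red and blue unit positions and then pass them to `scores` together with a benefit matrix `Z` of the same shape as the map.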

Further, the step 6 specifically includes the following steps:

601, defining a path association degree based on node similarity:

where $V_i = [V_{i,1}, V_{i,2}, \ldots, V_{i,L}]$ and $V_j = [V_{j,1}, V_{j,2}, \ldots, V_{j,L}]$ denote the paths of red units $i$ and $j$ respectively, and $L$ is the number of connected nodes contained in the path to the next-moment move point output by the red-side Monte Carlo tree; based on association-degree clustering, a set $S = [S_1, S_2, \ldots, S_i, \ldots, S_G]$ of $G$ red-side groups whose association degree is higher than a threshold is screened out, where each red-side group $S_i$ comprises $u$ red units $[i_1, \ldots, i_u]$;

602, quantifying the degree of decision coupling between units based on the path association degree and the force distribution of the red and blue sides, and constructing an influence discriminant function $f$:

where $D(S_i)$ is the path repetition rate of the high-density red-side group $S_i = [i_1, \ldots, i_u]$:

The degree of association of the units within a group is measured by the average repetition rate of the red units in the set $S_i$; the calculation involves the paths of units $i_o$ and $i_l$ in group $S_i$ and the paths of all units in the red-side group $S_i$. In addition, the number of red units $N_r$ in $S_i$ and the number of enemy units $N_b$ contained in the red paths are counted, and the actual disparity in strength between the two sides is defined by the distribution of the numbers of red and blue units:

603, for each group $S_i = [i_1, \ldots, i_u]$ in $S = [S_1, S_2, \ldots, S_i, \ldots, S_G]$, calculating the discriminant function to determine whether the red units in the group influence each other: if $f(S_i)$ is less than or equal to the threshold $d_{dis}$, the units of group $S_i$ influence each other, so the units in $S_i$ are combined to construct a joint action space and the method returns to step 5 to recompute; if $f(S_i)$ is greater than $d_{dis}$, the units do not influence each other, and the independently computed result is retained as the decision result of each unit.
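The post-processing of steps 601 to 603 can be sketched as follows. The association and discriminant formulas are not reproduced in the text, so this sketch substitutes assumptions: the association degree is taken to be the fraction of shared nodes between two paths, clustering is done with connected components, and the discriminant function $f$ is passed in as a caller-supplied function. All names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def association(path_i: Sequence, path_j: Sequence) -> float:
    """ASSUMED similarity: share of nodes of path_i that also lie on path_j."""
    shared = len(set(path_i) & set(path_j))
    return shared / max(len(path_i), 1)


def cluster_units(paths: Dict[int, Sequence], threshold: float) -> List[List[int]]:
    """Step 601: group units whose pairwise association exceeds `threshold`."""
    parent = {u: u for u in paths}

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in combinations(paths, 2):
        if association(paths[a], paths[b]) > threshold:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for u in paths:
        groups.setdefault(find(u), []).append(u)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def repetition_rate(group: Sequence[int], paths: Dict[int, Sequence]) -> float:
    """Step 602: average pairwise repetition rate D(S_i) within one group."""
    pairs = list(combinations(group, 2))
    return sum(association(paths[a], paths[b]) for a, b in pairs) / len(pairs)


def split_by_influence(
    groups: List[List[int]], f: Callable[[List[int]], float], d_dis: float
) -> Tuple[List[List[int]], List[List[int]]]:
    """Step 603: f(S_i) <= d_dis means coupled, f(S_i) > d_dis means independent."""
    coupled = [g for g in groups if f(g) <= d_dis]
    independent = [g for g in groups if f(g) > d_dis]
    return coupled, independent
```

Groups returned as `coupled` would be searched again with a joint action space, while units in `independent` groups keep their separately computed decisions.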

The invention has the following beneficial effects:

(1) The invention provides a method for modeling semantic-situation confrontation game problems and for processing incomplete information, which makes it possible to solve incomplete-information problems with a Monte Carlo tree.

(2) The invention provides an action space decoupling method, which addresses the excessively large search space caused by the combined action space of multiple agents in a complex game environment and is suited to game decision problems with limited time.

Drawings

Fig. 1 is a schematic diagram of an embodiment of the present invention.

Detailed Description

The invention will be described in further detail with reference to the accompanying drawings and detailed description.

Referring to fig. 1, a game decision method with action space decoupling for time-limited red-blue confrontation problems includes the following steps:

Step 1, for the red-blue attack-and-defense problem in an urban environment, clarifying the approach and modeling the scene. A red-blue confrontation scene with a fog-of-war effect is built, semantic information on the red-blue situation is provided from each camp's perspective, the environment is abstracted based on buildings, connectivity and nodes are introduced to discretize the map, and the principles for formulating the action space are defined.

Step 2, constructing a blue-side information prediction model under incomplete information: taking the fog-affected incomplete blue-side semantic situation information and the red-side information from the red side's perspective as input, a blue-side strategy prediction network $BN_r$ is trained to obtain the currently unknown blue-side information. First, the red side is assumed to be the decision-making side, and the semantic information on the red-blue situation from each camp's perspective is parsed and converted into situation vector data pairs. Secondly, taking the semantic information on the red-blue situation from the blue side's perspective as input, Monte Carlo tree self-play is performed to obtain the blue-side strategy, which is combined with the red and blue information from the red side's perspective to obtain a large number of situation vector data pairs (the current state of both sides from the red side's perspective, and the blue-side strategy result of the Monte Carlo tree). Finally, the blue-side information prediction model $BN_r$ is trained on the situation vector data; the network takes as input the red-side information and the already acquired blue-side positions from the red side's perspective, and outputs the position information of the unknown blue units, supporting variable-length output.

Step 3, parsing the fog-affected blue-side situation semantic information and the red-side information from the red side's perspective, then feeding them into the pre-trained blue-side position model $BN_r$ to predict the unknown blue-side information and obtain a complete blue-side information environment from the red side's perspective.

Step 4, establishing a joint Monte Carlo tree evaluation system based on the connectivity benefit matrix, the radiation probability matrix and the red and blue unit distribution matrices. This specifically comprises the following steps:

402, constructing a connectivity benefit matrix based on the connectivity of the nodes

(1) initializing the radiation probability matrix of game unit $k$

where $n$ is the number of red or blue units (the case $n = 0$ is handled separately). The final evaluation scores of the red and blue sides are expressed as follows:

$$S_r = Z \cdot D_b$$

$$S_b = Z \cdot D_r$$

Step 5, based on the complete blue-side information obtained in step 3 and the evaluation system obtained in step 4, decoupling the action space for each red-side group and establishing separate Monte Carlo tree decisions. At the initial moment each red-side unit is defined as one group, and during the Monte Carlo tree decision of each group only the current group is searched while the other groups remain stationary. Specifically:

(1) Construct the action space of a single group: for each red-side group, determine the feasible action points at the next moment, and at the same time determine the decision time interval.

(2) Evaluate the Monte Carlo tree nodes based on the evaluation system of step 4, that is, score the situation resulting from each action selection, to assist in constructing the Monte Carlo tree.

(3) Build the Monte Carlo tree through the four steps of selection, expansion, simulation and backpropagation. If the specified search time limit is reached, select the node with the largest current estimated value; otherwise, continue the Monte Carlo tree search steps and search deeper.
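The time-limited search of step 5 can be illustrated with a generic Monte Carlo tree search loop. This is a minimal sketch assuming a UCT selection rule and random rollouts, neither of which is specified above; `actions`, `step` and `evaluate` are hypothetical callbacks, with `evaluate` standing in for the evaluation system of step 4.

```python
import math
import random
import time


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0


def mcts(root_state, actions, step, evaluate, time_limit_s,
         rollout_depth=5, c=1.4, rng=None):
    """Return the root action with the largest mean value when time runs out."""
    rng = rng or random.Random(0)
    root = Node(root_state)
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        node = root
        # Selection: descend through fully expanded nodes (ASSUMED: UCT rule).
        while node.children and len(node.children) == len(actions(node.state)):
            node = max(
                node.children,
                key=lambda ch: ch.value / ch.visits
                + c * math.sqrt(math.log(node.visits) / ch.visits),
            )
        # Expansion: add one untried action, if any remain.
        tried = {ch.action for ch in node.children}
        untried = [a for a in actions(node.state) if a not in tried]
        if untried:
            a = rng.choice(untried)
            node.children.append(Node(step(node.state, a), node, a))
            node = node.children[-1]
        # Simulation: random rollout, then score the resulting situation.
        state = node.state
        for _ in range(rollout_depth):
            available = actions(state)
            if not available:
                break
            state = step(state, rng.choice(available))
        reward = evaluate(state)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    if not root.children:
        return None
    return max(root.children, key=lambda ch: ch.value / ch.visits).action
```

In the setting described above, `actions` would return the connected nodes of the current group's position and `step` would move only the current group while all other groups stay in place.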

Step 6, performing post-processing on the Monte Carlo tree decision results obtained in step 5: checking the final state produced by each group's Monte Carlo tree search result, judging whether the results influence each other, merging mutually influencing groups into a new group, and returning to step 5 to construct a joint action space, until a set time threshold is reached, whereupon the current decision result is output; for groups with no association, the current decision result is output directly. This specifically comprises the following steps:

601, defining a path association degree based on node similarity:

where $V_i = [V_{i,1}, V_{i,2}, \ldots, V_{i,L}]$ and $V_j = [V_{j,1}, V_{j,2}, \ldots, V_{j,L}]$ denote the paths of red units $i$ and $j$ respectively, and $L$ is the number of connected nodes contained in the path to the next-moment move point output by the red-side Monte Carlo tree; based on association-degree clustering, a set $S = [S_1, S_2, \ldots, S_i, \ldots, S_G]$ of $G$ red-side groups whose association degree is higher than a threshold is screened out, where each red-side group $S_i$ comprises $u$ red units $[i_1, \ldots, i_u]$;

603, for each group $S_i = [i_1, \ldots, i_u]$ in $S = [S_1, S_2, \ldots, S_i, \ldots, S_G]$, calculating the discriminant function to determine whether the red units in the group influence each other: if $f(S_i)$ is less than or equal to the threshold $d_{dis}$, the units of group $S_i$ influence each other, so the units in $S_i$ are combined to construct a joint action space and the method returns to step 5 to recompute; if $f(S_i)$ is greater than $d_{dis}$, the units do not influence each other, and the independently computed result is retained as the decision result of each unit.

The invention realizes an improved Monte Carlo tree method with action space decoupling for red-blue confrontation problems. Based on the semantic situation and aimed at game decision-making for groups of intelligent agents in complex environments, it studies techniques for searching and solving ultra-large-scale information space states in incomplete-information game environments and methods for efficient information space search and evaluation, improves the ability to understand and abstract incomplete information, and finally forms an effective game strategy solving method that provides effective support for the auxiliary decision-making of agents.

While illustrative embodiments of the invention have been described above to help those skilled in the art understand it, it should be understood that the invention is not limited to the scope of these embodiments. All changes that fall within the spirit and scope of the invention as defined by the appended claims are covered, and all inventions that make use of the inventive concept are protected.

Claims

1. A game decision method with action space decoupling for time-limited red-blue confrontation problems, characterized by comprising the following steps:

Step 2, parsing the semantic information on the red-blue situation from each camp's perspective, constructing game data, and training a blue-side information prediction model under incomplete information to obtain a pre-trained model $BN_r$;

Step 7, generating decision semantics based on the current decision result, and executing the current scheme;

wherein, the step 4 specifically comprises the following steps:

401, calculating the connectivity $l_{ij}$ of each node $v_{i,j}$ with the other nodes $v_{p,q}$ based on the adjacency matrix:

The higher the connectivity value, the better the node is connected to other nodes, where $M_1$ and $N_1$ are respectively the length and width of the map, and $A_{mn}(v_{ij})$ is the element in row $m$, column $n$ of matrix $A(v_{ij})$;

402, constructing a connectivity benefit matrix based on the connectivity of the nodes

(1) initializing the radiation probability matrix of game unit $k$

(2) updating the radiation probability matrix $\tau^k$ based on the position of the game unit: if game unit $k$ is located at $v_{i,j}$, the radiation probability value at $v_{i,j}$ is 1; the radiation probability value at every other position $v_{p,q}$ is given by the attenuation function $G(v_{i,j}, v_{p,q})$:

where $\alpha$ is the attenuation coefficient and $C(v_{i,j}, v_{p,q})$ is the shortest path length between nodes $v_{i,j}$ and $v_{p,q}$, obtained by breadth-first or depth-first traversal; radiation stops when the radiation depth $C(v_{i,j}, v_{p,q})$ reaches a set threshold or the radiation probability value falls below a set threshold;

where $n$ is the number of red or blue units (the case $n = 0$ is handled separately). The final evaluation scores of the red and blue sides are expressed as follows:

$$S_r = Z \cdot D_b$$

$$S_b = Z \cdot D_r$$

2. The game decision method with action space decoupling for time-limited red-blue confrontation problems according to claim 1, wherein step 1 specifically comprises the following steps:

102, extracting the city buildings and discretizing the game map to construct the passable area, wherein the map is represented by a 0/1 matrix of dimension $(M_1, N_1)$, in which 0 denotes a passable area and 1 denotes an impassable area;

103, discretizing the problem based on the passable area: passable locations in the city are represented by connected nodes $v$, and node connectivity is determined by a distance threshold $d_{thr}$. Let $v_{i,j}$ denote the node in row $i$, column $j$ of the passable area and $v_{p,q}$ the node in row $p$, column $q$. If the distance $d(v_{i,j}, v_{p,q})$ is less than $d_{thr}$, nodes $v_{i,j}$ and $v_{p,q}$ are connected; otherwise they are not. The adjacency matrix of each node is constructed accordingly.

The adjacency matrix $A(v_{i,j})$ of node $v_{i,j}$ is a matrix with $M_1$ rows and $N_1$ columns, whose element $a(v_{i,j})_{(p,q)}$ indicates whether node $v_{i,j}$ and node $v_{p,q}$ are connected:

$$a(v_{i,j})_{(p,q)} = \begin{cases} 1, & d(v_{i,j}, v_{p,q}) < d_{thr} \\ 0, & \text{otherwise} \end{cases}$$

3. The game decision method with action space decoupling for time-limited red-blue confrontation problems according to claim 1, wherein step 6 specifically comprises the following steps:

601, defining a path association degree based on node similarity:

where $V_i = [V_{i,1}, V_{i,2}, \ldots, V_{i,L}]$ and $V_j = [V_{j,1}, V_{j,2}, \ldots, V_{j,L}]$ denote the paths of red units $i$ and $j$ respectively, and $L$ is the number of connected nodes contained in the path to the next-moment move point output by the red-side Monte Carlo tree; based on association-degree clustering, a set $S = [S_1, S_2, \ldots, S_i, \ldots, S_G]$ of $G$ red-side groups whose association degree is higher than a threshold is screened out, where each red-side group $S_i$ comprises $u$ red units $[i_1, \ldots, i_u]$;

where $D(S_i)$ is the path repetition rate of the high-density red-side group $S_i = [i_1, \ldots, i_u]$:

603, for each group $S_i = [i_1, \ldots, i_u]$ in $S = [S_1, S_2, \ldots, S_i, \ldots, S_G]$, calculating the discriminant function to determine whether the red units in the group influence each other: if $f(S_i)$ is less than or equal to the threshold $d_{dis}$, the units of group $S_i$ influence each other, so the units in $S_i$ are combined to construct a joint action space and the method returns to step 5 to recompute; if $f(S_i)$ is greater than $d_{dis}$, the units do not influence each other, and the independently computed result is retained as the decision result of each unit.