CN111275174A - Game-oriented radar countermeasure strategy generation method

Publication number: CN111275174A (granted as CN111275174B)
Application number: CN202010091616.XA
Authority: CN (China)
Filing / priority date: 2020-02-13
Publication dates: 2020-06-12 (CN111275174A), 2020-09-18 (CN111275174B, grant)
Legal status: Granted, Active
Original language: Chinese (zh)
Inventors: 杨健, 王沙飞, 李岩, 肖德政, 田震, 张丁
Applicants/Assignees: 32802 Troops of People's Liberation Army of China; Beijing Institute of Technology (BIT)

Classifications

    • G06N3/02 Neural networks
    • G06N3/045 Architecture: combinations of networks
    • G06N3/084 Learning methods: backpropagation, e.g. using gradient descent


Abstract

The invention provides a game-oriented radar countermeasure strategy generation method, which comprises the following steps: set up the countermeasure scenario, treating the radar and the interference system as the two players of a game; construct the radar countermeasure game tree; for each player, set up a regret neural network and a countermeasure strategy neural network together with corresponding buffers, and initialize the neural network parameters. Within the set number of iterations, traverse the game tree K times per iteration, training the radar side and the interferer alternately: train the regret neural network with the regret-buffer data, and then train the countermeasure strategy neural networks of the radar and the interferer with the collected strategy-buffer data until the strategy networks converge. Compared with research on static radar countermeasure games, the method constructs a dynamic game model with incomplete information; with both sides being intelligent, the Nash equilibrium of the game between the radar and the interference system is approximately solved with neural networks, and the respective Nash equilibrium strategies are obtained through repeated dynamic iterative updates.

Description

Game-oriented radar countermeasure strategy generation method
Technical Field
The invention relates to the interdisciplinary field of radar electronic countermeasures, game theory and artificial intelligence, and in particular to a game-oriented radar countermeasure strategy generation method.
Background
Artificial intelligence technology is being applied ever more deeply in the field of electronic countermeasures, and radar countermeasures are becoming intelligent: both the radar and the interference system are developing intelligent algorithms with adaptive and even cognitive capabilities. Studying the radar electronic countermeasure problem from the perspective of a dynamic game between the radar and the interference system is therefore an important direction of development.
At present, the field of cognitive electronic countermeasures focuses mainly on intelligent algorithms on the radar side or the interferer side, such as waveform optimization, target detection, recognition and tracking for cognitive radar, and adaptive jamming decision-making and jamming-effect evaluation for the interferer. Current countermeasure strategy generation only considers the optimization of a one-sided algorithm, for example using reinforcement learning to produce one side's optimal strategy through repeated countermeasure interactions.
Game theory studies the strategies and decision-making of two or more players. A two-player zero-sum game is a game in which two players compete and the sum of their payoffs is zero. The goal of a two-player zero-sum game is to solve for the Nash equilibrium, an equilibrium state in which any player who unilaterally changes strategy reduces his own payoff. The countermeasure process between the radar and the interferer can be regarded as a two-player zero-sum game, and the strategies of both sides can be obtained by solving for the Nash equilibrium.
Existing research on radar countermeasure games mainly studies static two-player zero-sum games with complete information, in which both sides have full knowledge of the opponent's action space and payoff function and the resulting countermeasure strategy is fixed. In a realistic, complex electromagnetic environment, such a static strategy cannot cope with a changing opponent, and the opponent can easily infer one's own strategy, so the countermeasure cannot be won.
Disclosure of Invention
The invention aims to provide a game-oriented radar countermeasure strategy generation method which overcomes the shortcomings of the prior art, is applied to radar countermeasure scenarios, allows both players of the game to dynamically update their own strategies in a scenario with incomplete information, and is of significance for the development of cognitive radar and cognitive countermeasure technology.
The technical scheme of the invention is as follows. The game-oriented radar countermeasure strategy generation method comprises the following specific steps:
Step 1: regard the radar and the interference system as the two players of the game and set up the countermeasure scenario: the radar has N_radar operating states and the interferer has N_jam interference patterns; set the payoffs for the transitions of the radar operating states; construct the radar countermeasure game tree from the root node root; N_info nodes need to derive a countermeasure strategy, each such node has a corresponding player information set I, and at each node a unique, fixed player makes the action selection; the selectable actions are a_i, i = 1, 2, …, n, where n is the number of selectable actions; I_l, l = 1, …, N_info denotes the information set of a node that needs to derive a countermeasure strategy; at each terminal node of the tree, set the utility values of the radar and the interference system according to the state-transition payoffs;
Set the number of iterations N_iter, the number of traversals K of the game tree from the root node in each iteration, the training frequency N_s of the countermeasure strategy neural network, the number of neural-network training steps N_nn, and the threshold θ for judging whether the countermeasure strategies have converged;
For each player, set up two neural networks, a regret neural network and a countermeasure strategy neural network, each with a corresponding training-sample buffer, denoted M_0^r, M_1^r and M_0^s, M_1^s; also set up a countermeasure strategy vector buffer M. Each buffer entry has the form (I_l, [d(I_l,a_1), …, d(I_l,a_n)] or [s(I_l,a_1), …, s(I_l,a_n)], t). The input to a neural network is I_l, and its output is the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] or the predicted countermeasure strategy vector [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] computed by the network. Before training begins, empty the buffers M_0^r, M_1^r, M_0^s, M_1^s and M, initialize the neural network parameters, set t = 1, and go to step 2;
Here the training frequency N_s means that after the regret neural network has been trained N_s times, the countermeasure strategy neural networks are trained once, and N_iter % N_s = 0 must hold; the superscripts r and s denote regret and (countermeasure) strategy respectively, and the subscripts 0 and 1 denote the radar and the interference system respectively. The buffer contents are as follows: I_l denotes the information set of a game-tree node that needs to derive a countermeasure strategy; d(I_l,a_i), i = 1, …, n is the regret value of taking action a_i under the current information set I_l; \hat{d}(I_l,a_i) is the regret value predicted by the neural network; s(I_l,a_i), i = 1, …, n is the countermeasure strategy, i.e. the probability of taking action a_i under the current information set; \hat{s}(I_l,a_i) is the countermeasure strategy predicted by the neural network; t is the iteration index of the current game; and u(I_l,a_i), i = 1, …, n denotes the utility value of action a_i under information set I_l;
Step 2: select the current player p = t % 2 and traverse the radar countermeasure game tree K times from the root node in this iteration. At a node that needs to derive a countermeasure strategy, if the player of the corresponding information set I_l is the current player p of this iteration, input I_l into player p's regret neural network to obtain the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of this information set from the predicted regrets, accumulate the utility values and countermeasure strategy vectors obtained during the traversal to obtain the regret vector [d(I_l,a_1), …, d(I_l,a_n)], and store the node's information set I_l, the regret vector [d(I_l,a_1), …, d(I_l,a_n)] and the current iteration index t into the regret buffer M_p^r of the current player p. If the player of the information set I_l is 1-p, i.e. not the current player of this iteration, input I_l into player 1-p's regret neural network to obtain its output [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] of this information set from it, and store the information set I_l, the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] and the current iteration index t into player 1-p's countermeasure strategy buffer M_{1-p}^s; a condensed sketch of this traversal is given after this step.
After the game tree has been traversed K times from the root node, i.e. after one iteration, train the current player p's regret neural network N_nn times with the data in the regret buffer M_p^r, so that the output vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] becomes as close as possible to the expected vector [d(I_l,a_1), …, d(I_l,a_n)], i.e. the regret-network loss function L^r approaches 0.
After N_s iterations, i.e. when t % N_s = 0, go to step 3;
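The following is a condensed, non-normative Python sketch of one step-2 traversal. It assumes hypothetical helpers: regret_net, a per-player callable returning the predicted regret vector over the node's available actions; strategy_from_regrets, the regret-matching rule of step 2 (a sketch of it accompanies the formula further below); and a node interface with player, actions, child(), is_terminal, utility() and info_set attributes. It only illustrates how the regret and strategy buffers are filled:

```python
import random

def traverse(node, p, t, regret_buf, strategy_buf, regret_net):
    """One recursive pass of step 2: returns the utility of `node` for player p."""
    if node.is_terminal:
        return node.utility(p)                       # terminal payoff from the payoff table

    d_hat = regret_net[node.player](node.info_set)   # predicted regret vector (available actions only)
    sigma = strategy_from_regrets(d_hat)             # countermeasure strategy by regret matching

    if node.player == p:
        # Current player: expand every available action and accumulate regrets.
        u = [traverse(node.child(a), p, t, regret_buf, strategy_buf, regret_net)
             for a in node.actions]
        baseline = sum(si * ui for si, ui in zip(sigma, u))
        regrets = [ui - baseline for ui in u]        # d(I_l,a_i) = u(I_l,a_i) - sum_k s(I_l,a_k)*u(I_l,a_k)
        regret_buf.append((node.info_set, regrets, t))
        return baseline
    else:
        # Opponent: store its current strategy and follow one sampled action.
        strategy_buf.append((node.info_set, sigma, t))
        a = random.choices(node.actions, weights=sigma, k=1)[0]
        return traverse(node.child(a), p, t, regret_buf, strategy_buf, regret_net)
```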
Step 3: use the data in the two countermeasure strategy buffers M_0^s and M_1^s to train and update the two countermeasure strategy neural networks N_nn times, so that the output vectors [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] become as close as possible to the expected vectors [s(I_l,a_1), …, s(I_l,a_n)], i.e. the countermeasure strategy network loss function L^s approaches 0.
Input the information set I_l, l = 1, …, N_info corresponding to each node that needs to derive a countermeasure strategy into the countermeasure strategy network of the corresponding player, obtain N_info groups of countermeasure strategies, and store them in the buffer M.
If the accumulated number of iterations of step 2 has not reached N_iter, return to step 2 and iterate again.
Repeat step 2 and step 3 until the number of iterations reaches N_iter; then take the difference between the countermeasure strategies recorded in the buffer M at the N_iter-th iteration and at the (N_iter - N_s)-th iteration, take the absolute value, and find the largest element. If it is smaller than the threshold θ, the outputs of the two countermeasure strategy neural networks have converged, the two sides have reached a Nash equilibrium, and the respective Nash equilibrium strategies are obtained; if it is larger than the threshold θ, convergence has not been reached, i.e. the number of iterations N_iter is too small, and it is necessary to go back to step 1, change the value of N_iter and start again.
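As an illustration of this convergence test, the following minimal sketch (function and variable names are assumptions; M is assumed to be indexable by iteration) compares the strategies recorded at iteration N_iter with those recorded N_s iterations earlier:

```python
def strategies_converged(M, n_iter, n_s, theta):
    """Return True if the largest absolute change between the strategies recorded
    at iteration n_iter and at iteration n_iter - n_s is below the threshold theta."""
    last = M[n_iter]          # list of N_info strategy vectors recorded at iteration n_iter
    prev = M[n_iter - n_s]    # strategies recorded N_s iterations earlier
    max_diff = max(abs(a - b)
                   for vec_last, vec_prev in zip(last, prev)
                   for a, b in zip(vec_last, vec_prev))
    return max_diff < theta
```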
Preferably, the radar countermeasure game tree in step 1 is constructed as follows: starting from the root node root, the radar selects an operating state; in each operating state the interferer has a set of available interference patterns; after the interference is applied, the radar has a set of operating states it can transition to; nodes that terminate the game are set, all branch situations of the game tree are exhausted, and the construction of the game tree is complete.
Preferably, the information set I in step 1 specifically includes: the number of the player to which the information set belongs, the history of the sequence of actions of the game before the information set, and the position of the information set in the game tree.
Further, the specific method for computing the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] in step 2 is:

\[
s(I_l, a_i) =
\begin{cases}
\dfrac{\hat{d}^{+}(I_l, a_i)}{\sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k)}, & \text{if } \sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k) > 0,\\[2mm]
\mathbf{1}\{a_i = \arg\max_{k} \hat{d}(I_l, a_k)\}, & \text{otherwise,}
\end{cases}
\]

where \hat{d}^{+}(I_l,a_i) = max(\hat{d}(I_l,a_i), 0) denotes the positive part of the predicted regret. The formula also means that if the regret values of all actions under information set I_l are negative, the action with the largest regret value is selected with probability 1.
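A minimal Python sketch of this regret-matching computation (names are assumptions; None entries mark unavailable actions, as in the embodiment below):

```python
def strategy_from_regrets(d_hat):
    """Map a predicted regret vector [d^(I_l,a_1),...,d^(I_l,a_n)] to a strategy
    [s(I_l,a_1),...,s(I_l,a_n)] by regret matching; None entries (unavailable
    actions) are kept as None."""
    valid = [i for i, d in enumerate(d_hat) if d is not None]
    positive = {i: max(d_hat[i], 0.0) for i in valid}
    total = sum(positive.values())
    sigma = [None] * len(d_hat)
    if total > 0:
        for i in valid:
            sigma[i] = positive[i] / total
    else:
        # all regrets non-positive: put probability 1 on the action with the largest regret
        best = max(valid, key=lambda i: d_hat[i])
        for i in valid:
            sigma[i] = 1.0 if i == best else 0.0
    return sigma
```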
Further, the regret vector [d(I_l,a_1), …, d(I_l,a_n)] of the current player p in step 2 is computed as follows: according to the utility-value vector [u(I_l,a_1), …, u(I_l,a_n)] of the actions under the node corresponding to the input I_l and the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of that node, the regret value d(I_l,a_i) of each action under the node is

\[
d(I_l, a_i) = u(I_l, a_i) - \sum_{k=1}^{n} s(I_l, a_k)\, u(I_l, a_k), \qquad i = 1, \dots, n,
\]

from which [d(I_l,a_1), …, d(I_l,a_n)] is obtained.
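A corresponding Python sketch of the regret computation (function name assumed), followed by the numbers used in the embodiment below:

```python
def regrets_from_utilities(u, sigma):
    """Compute [d(I_l,a_1),...,d(I_l,a_n)] from the action utilities u(I_l,a_i)
    and the strategy s(I_l,a_i) of the same node:
    d(I_l,a_i) = u(I_l,a_i) - sum_k s(I_l,a_k) * u(I_l,a_k)."""
    baseline = sum(s * ui for s, ui in zip(sigma, u))
    return [ui - baseline for ui in u]

# Example reproducing the embodiment below: utilities [2, 2, -1] for (j1, j3, j4)
# with strategy [0.51, 0.49, 0] give regrets approximately [0, 0, -3].
print(regrets_from_utilities([2, 2, -1], [0.51, 0.49, 0.0]))
```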
Further, the training method of the regret neural network in step 2 is specifically as follows:
2.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^r of the regret neural network is therefore set as:

\[
L^r = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( d_i(I_i, a_k) - \hat{d}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the regret buffer as training samples; t_i is the iteration index associated with the regret value d_i(I_i,a_k) in the regret buffer; and \hat{d}_i(I_i,a_k) is the output of the regret neural network;
2.2 Perform gradient descent on the loss function L^r and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new regret neural network.
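A minimal PyTorch-style sketch of one such training step, assuming a network net that maps an encoded information set to an n-dimensional regret vector; the iteration-weighted squared-error form of L^r shown here follows the reconstruction above and is an assumption about the exact weighting:

```python
import torch

def train_regret_net(net, batch, optimizer):
    """One gradient step on the regret network.
    `batch` is a list of (encoded_info_set, target_regrets, t) drawn from M_p^r."""
    x = torch.stack([torch.as_tensor(info, dtype=torch.float32) for info, _, _ in batch])
    y = torch.stack([torch.as_tensor(d, dtype=torch.float32) for _, d, _ in batch])
    w = torch.as_tensor([t for _, _, t in batch], dtype=torch.float32)

    pred = net(x)                                     # predicted regrets \hat d(I_l, a_1..a_n)
    loss = (w * ((pred - y) ** 2).sum(dim=1)).mean()  # iteration-weighted squared error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage, e.g. with the Adam optimizer as in the embodiment below:
#   optimizer = torch.optim.Adam(net.parameters())
# The countermeasure strategy networks of step 3 are trained in the same way,
# with targets s(I_l, a_i) drawn from the strategy buffers instead of regret values.
```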
Further, the countermeasure strategy neural network in step 3 is trained as follows:
3.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^s of the countermeasure strategy neural network is therefore set as:

\[
L^s = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( s_i(I_i, a_k) - \hat{s}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the countermeasure strategy buffer as training samples; t_i is the iteration index associated with the countermeasure strategy s_i(I_i,a_k) in the strategy buffer; and \hat{s}_i(I_i,a_k) is the output of the countermeasure strategy neural network;
3.2 Perform gradient descent on the loss function L^s and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new countermeasure strategy neural network.
Preferably, the batch of N_batch training samples in the loss function L^r or L^s is selected by drawing entries from the regret buffer or the countermeasure strategy buffer with probability weighted in proportion to their iteration index t, i.e. entries with a later iteration index are more likely to be selected, until N_batch entries have been drawn.
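A small sketch of this iteration-weighted batch selection (assuming the buffer entries carry their iteration index t in the third field, as in the sketches above):

```python
import random

def sample_batch(buffer, n_batch):
    """Draw n_batch entries from a regret or strategy buffer, with selection
    probability proportional to the iteration index t of each entry, so that
    entries produced in later iterations are more likely to be chosen."""
    weights = [entry[2] for entry in buffer]          # entry = (info_set, vector, t)
    return random.choices(buffer, weights=weights, k=n_batch)
```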
The beneficial effects of the invention are as follows: the method regards the radar countermeasure process as a two-player zero-sum dynamic game with incomplete information. Compared with research on static radar countermeasure games, it constructs a dynamic game model with incomplete information; with both sides being intelligent, the Nash equilibrium of the game between the radar and the interference system is approximately solved with neural networks, and the respective Nash equilibrium strategies are obtained through repeated dynamic iterative updates.
Drawings
Fig. 1 is a game tree of a radar countermeasure simulation scene in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a radar state transition matrix according to an embodiment of the present invention.
FIG. 3 is a flow chart of an experiment according to the present invention.
FIG. 4 is a diagram of a neural network according to an embodiment of the present invention.
Fig. 5 is a convergence curve of each action probability of the radar under the root node in the embodiment of the present invention.
Fig. 6 is a convergence curve of the probability of each operation of the interference side under the node s0 according to the embodiment of the present invention.
FIG. 7 is a table of simulation results of action utility values and probabilities of various sets of information of a radar according to an embodiment of the present invention.
Fig. 8 is a table of action utility values and probability simulation results of each information set of an interferer according to the embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The experiment sets the countermeasure scenario as a one-on-one countermeasure, i.e. a multifunction radar with switchable states against a jammer with switchable interference patterns. Set N_radar = 3, i.e. the radar has three operating states, denoted s_0 to s_2; the radar selecting a different operating state corresponds to selecting a different action. Set N_jam = 5, i.e. the interferer has five interference patterns, denoted j_0 to j_4; likewise, the interferer selecting a different interference pattern corresponds to selecting a different action. Set the number of iterations N_iter = 1000, the number of game-tree traversals per iteration K = 10, the training frequency of the countermeasure strategy neural network N_s = 2, the number of neural-network training steps N_nn = 30, and the threshold θ = 0.01 for judging whether convergence has been reached; the countermeasure strategy buffer M is cleared.
The radar countermeasure game tree is set as shown in Fig. 1; the total number of nodes of the game tree is N_tree = 23, and the number of nodes that need to derive a countermeasure strategy is N_info = 7. Starting from the root node, the radar selects an operating state according to its current countermeasure strategy, the interferer executes a corresponding interference pattern against the radar state, and after the interference is sent the radar selects its next operating state. The seven nodes that need to derive a countermeasure strategy are shown in dark in the figure. To make the simulation more realistic, the number and content of the selectable actions of the radar and the interferer are deliberately not identical at different nodes. For example, at node s_0 the only actions available to the interferer are j_1, j_3 and j_4, and after the interferer selects j_1 the radar can only jump to the two states s_1 and s_2.
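For illustration, the fragment of this game tree that is spelled out in the text (root leads to s_0/s_1/s_2; s_0 offers j_1, j_3, j_4; j_1 is followed by s_1 or s_2) could be encoded as a nested dictionary; this is only a partial, hypothetical encoding, and the full 23-node tree and the payoffs follow Figs. 1 and 2, which are not reproduced here:

```python
# Partial encoding of the game tree of Fig. 1, covering only the branches that the
# text describes explicitly; the remaining nodes and all terminal payoffs are
# omitted (hypothetical encoding, "T" marks a terminal node).
game_tree_fragment = {
    "root":   {"player": "radar",      "children": {"s0": "s0", "s1": "s1", "s2": "s2"}},
    "s0":     {"player": "interferer", "children": {"j1": "j1(s0)", "j3": "T", "j4": "T"}},
    "j1(s0)": {"player": "radar",      "children": {"s1": "...", "s2": "..."}},
    # s1, s2, j2(s1), j2(s2) and the remaining nodes follow Fig. 1
}
```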
As shown in Fig. 2, the threat levels of the three radar states are evaluated from the perspective of the interferer, and the payoffs of the two sides after a state transition are set according to the threat level of the radar state; the first entry is the radar's payoff and the second is the interferer's payoff, and the payoffs of the two players satisfy the zero-sum property. In the subsequent experiment, both the radar and the interference system obtain the corresponding utility values during the game according to the state-transition payoffs of Fig. 2, from which data such as regret values are computed.
The experimental flow is shown in Fig. 3: initialize the neural networks and clear the four buffers M_0^r, M_1^r, M_0^s and M_1^s; within the set number of iterations, traverse the game tree K times per iteration, training the radar side and the interferer alternately; train the regret neural network with the regret-buffer data, and then train the countermeasure strategy neural networks of the radar and the interferer with the collected strategy-buffer data until the strategy networks converge. The specific details and calculations are described below.
The structure of the deep neural network used in the experiment is shown in Fig. 4. The network takes the data of an information set as input, including the player of the current information set (radar or interferer), the numbered sequence of actions performed, and the progress of the game. For example, the information set of node s_0 in Fig. 1 is represented as ([1], [0, -1], [1, -1]): [1] means that at information set s_0 it is player 1's turn, i.e. the interferer selects the action; [0, -1] means that the radar selected action 0 at the previous information set root, while the action not yet selected is marked -1; and [1, -1] indicates the progress, i.e. only one information-set selection has been made since the root node.
The output of the neural network is a 3-dimensional (radar) or 5-dimensional (interferer) vector, i.e. the regret values or the countermeasure strategy of the radar or the interferer. Since the available actions differ between information sets, the regret value and the strategy probability of an unavailable action are both represented by the invalid value None.
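A minimal sketch of how such an information set could be flattened into a network input vector; the flattening itself is an assumption, since the patent only specifies the three components described above:

```python
def encode_info_set(player, action_history, progress):
    """Flatten the information-set triple into one numeric input vector,
    e.g. s0 = ([1], [0, -1], [1, -1]) -> [1, 0, -1, 1, -1]."""
    return list(player) + list(action_history) + list(progress)

# Examples from the embodiment:
s0_input = encode_info_set([1], [0, -1], [1, -1])   # interferer to act at s0
j1_input = encode_info_set([0], [0, 1], [1, 1])     # radar to act at j1
```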
For example, suppose that in the 500th iteration the selected current player is the interferer and the traversal reaches node s_0 from the root node; the following computation is then performed:
1) The player of node s_0 is the interferer, so the corresponding information-set data, i.e. s_0 = ([1], [0, -1], [1, -1]), is input into the interferer's regret neural network, which outputs the predicted regret vector [\hat{d}(s_0,j_1), \hat{d}(s_0,j_3), \hat{d}(s_0,j_4)] (note: a 5-dimensional vector is output here, but since only j_1, j_3 and j_4 are available actions under information set s_0, the two invalid None values belonging to j_0 and j_2 are not written out, and are similarly omitted below). The countermeasure strategy is then computed by regret matching:

s(s_0,j_1) = \hat{d}^{+}(s_0,j_1) / (\hat{d}^{+}(s_0,j_1) + \hat{d}^{+}(s_0,j_3) + \hat{d}^{+}(s_0,j_4)) = 0.51
s(s_0,j_3) = \hat{d}^{+}(s_0,j_3) / (\hat{d}^{+}(s_0,j_1) + \hat{d}^{+}(s_0,j_3) + \hat{d}^{+}(s_0,j_4)) = 0.49
s(s_0,j_4) = \hat{d}^{+}(s_0,j_4) / (\hat{d}^{+}(s_0,j_1) + \hat{d}^{+}(s_0,j_3) + \hat{d}^{+}(s_0,j_4)) = 0

i.e. the countermeasure strategy of node s_0 is [s(s_0,j_1), s(s_0,j_3), s(s_0,j_4)] = [0.51, 0.49, 0].
2) Traverse the 3 actions under node s_0, reaching nodes j_1, j_3 and j_4 respectively, and compute the utility values of the three actions:
① Since each of the latter two nodes has only one successor, which is a terminal state, for node s_0 the interferer obtains utility value 2 by selecting action j_3, i.e. u(s_0,j_3) = 2, and utility value -1 by selecting action j_4, i.e. u(s_0,j_4) = -1;
② The traversal reaches node j_1. Since the player of j_1 is the radar, the information-set data of node j_1, i.e. j_1 = ([0], [0, 1], [1, 1]), is input into the radar's regret neural network to obtain its output [\hat{d}(j_1,s_1), \hat{d}(j_1,s_2)], from which the countermeasure strategy is computed: [s(j_1,s_1), s(j_1,s_2)] = [1, 0]. The associated data (j_1, [s(j_1,s_1), s(j_1,s_2)], t), i.e. (([0], [0, 1], [1, 1]), [1, 0], 500), is stored into the radar's countermeasure strategy buffer M_0^s;
③ The two actions under node j_1 yield utility values of 2 and 4 respectively for the interferer. According to the countermeasure strategy mentioned in ②, one of s_1 and s_2 is selected and its utility value returned; s_1 is selected, so it can be concluded that for the interferer the utility value of selecting j_1 in this traversal is 2, i.e. u(s_0,j_1) = 2;
3) From the utility values of the actions under node s_0 and the countermeasure strategy of s_0, the regret values of the actions under s_0 are computed:

d(s_0,j_1) = u(s_0,j_1) - (u(s_0,j_1)·s(s_0,j_1) + u(s_0,j_3)·s(s_0,j_3) + u(s_0,j_4)·s(s_0,j_4))
           = 2 - (2·0.51 + 2·0.49 + (-1)·0)
           = 0
d(s_0,j_3) = u(s_0,j_3) - (u(s_0,j_1)·s(s_0,j_1) + u(s_0,j_3)·s(s_0,j_3) + u(s_0,j_4)·s(s_0,j_4))
           = 2 - (2·0.51 + 2·0.49 + (-1)·0)
           = 0
d(s_0,j_4) = u(s_0,j_4) - (u(s_0,j_1)·s(s_0,j_1) + u(s_0,j_3)·s(s_0,j_3) + u(s_0,j_4)·s(s_0,j_4))
           = -1 - (2·0.51 + 2·0.49 + (-1)·0)
           = -3

The associated data (s_0, [d(s_0,j_1), d(s_0,j_3), d(s_0,j_4)], t), i.e. (([1], [0, -1], [1, -1]), [0, 0, -3], 500), is stored into the interferer's regret buffer M_1^r.
After 10 game trees have been traversed from the root, since the current player of the 500th iteration is the interferer, the interferer's regret neural network is trained:
According to the set N_batch = 30, 30 entries are drawn from the regret buffer M_1^r each time; when drawing, the entries are sampled with probability weighted in proportion to their iteration index t, i.e. entries from later iterations are more likely to be selected. With these 30 entries, the Adam gradient-descent method is applied to the loss function L^r, and the weights and biases of the neural network are continuously adjusted by back-propagation so as to minimize the loss; after training N_nn = 30 times, the loss L^r = 0.00034 is obtained.
The 500th iteration satisfies t % N_s = 0, so after the interferer's regret neural network has been trained, the two countermeasure strategy neural networks are also trained, each as follows:
According to the set N_batch = 30, 30 entries are drawn from the countermeasure strategy buffers M_0^s and M_1^s each time; when drawing, the entries are sampled with probability weighted in proportion to their iteration index t, i.e. entries from later iterations are more likely to be selected. With these 30 entries, the Adam gradient-descent method is applied to the loss function L^s, and the weights and biases of the neural networks are continuously adjusted by back-propagation so as to minimize the loss; after training N_nn times, the loss functions of the two countermeasure strategy neural networks are 17.5465 and 0.00161 respectively. The information-set data corresponding to root, j_1 under s_0, j_2 under s_1 and j_2 under s_2 are input into the radar's countermeasure strategy neural network, and the information-set data corresponding to s_0, s_1 and s_2 are input into the interferer's countermeasure strategy neural network, obtaining N_info = 7 countermeasure strategy vectors, which are stored in the buffer M.
The experiment iterates to 1000, and the countermeasure strategies of the two sides are obtained as shown in Figs. 7 and 8. Taking the difference between the 1000th and the 998th recorded countermeasure strategies and its absolute value, the largest element is found to be 0.003572, which is smaller than the threshold θ = 0.01, so it can be concluded that the countermeasure strategy neural networks have converged.
For the experimental results, the two relatively complex information sets root and s_0 are selected for analysis, and the convergence curves of the action probabilities over the 1000 games are plotted. As can be seen from Fig. 5, the countermeasure strategy at information set root basically converges after 750 games; as can be seen from Fig. 6, the countermeasure strategy at information set s_0 converges faster, essentially after 500 games.
With reference to the payoff table in Fig. 2, the utility (value or range) of each action under the information set of every node at which the radar or the interferer needs to derive a countermeasure strategy, together with the action probabilities output after 1000 games, are listed in Figs. 7 and 8. The information set j_1(s_0) in Fig. 7 means that after the interference action j_1 has been carried out in radar state s_0 it is the radar's turn to select the next action; j_2(s_1) and j_2(s_2) have analogous meanings. For some information sets the optimal action can be seen directly from the utilities, and the corresponding optimal actions are marked in bold in the tables.
The countermeasure strategies of the information sets for which the radar side needs to derive a strategy are analysed with reference to Fig. 7. Comparing the utility values of the actions, j_1(s_0), j_2(s_1) and j_2(s_2) each have an absolutely dominant selectable action, which the simulation results confirm: the probability of the dominant action is approximately 1. The countermeasure strategy of the information set root, however, has to take the interferer's corresponding countermeasure strategies during the game into account. Analysing the utility data of both sides, at Nash equilibrium the utility values of the actions s_1 and s_2 are approximately equal and greater than that of s_0, i.e. the two actions s_1 and s_2 are roughly equivalent and both better than s_0; the simulation results agree with this analysis.
The countermeasure strategies of the information sets for which the interferer needs to derive a strategy are analysed with reference to Fig. 8. Information sets s_1 and s_2 each have an absolutely dominant selectable action, which the simulation results confirm: the probability of the dominant action is approximately 1. It is worth noting that in the simulation results the absolute dominance of the interference action j_1 does not appear. Analysis shows that, similarly to root, the strategy at s_0 is closely related to the radar's countermeasure strategy at j_1(s_0): at Nash equilibrium, the utility the interferer obtains by selecting j_1 is approximately equal to that obtained by selecting j_3, and both are better than selecting j_4, so in the simulation results the probabilities of j_1 and j_3 are both about 0.5.
The foregoing is only an embodiment of the game-oriented countermeasure strategy generation of the present invention and is not intended to limit the present invention; various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A game-oriented radar countermeasure strategy generation method, characterized by comprising the following specific steps:
Step 1: regard the radar and the interference system as the two players of the game and set up the countermeasure scenario: the radar has N_radar operating states and the interferer has N_jam interference patterns; set the payoffs for the transitions of the radar operating states; construct the radar countermeasure game tree from the root node root; N_info nodes need to derive a countermeasure strategy, each such node has a corresponding player information set I, and at each node a unique, fixed player makes the action selection; the selectable actions are a_i, i = 1, 2, …, n, where n is the number of selectable actions; I_l, l = 1, …, N_info denotes the information set of a node that needs to derive a countermeasure strategy; at each terminal node of the tree, set the utility values of the radar and the interference system according to the state-transition payoffs;
set the number of iterations N_iter, the number of traversals K of the game tree from the root node in each iteration, the training frequency N_s of the countermeasure strategy neural network, the number of neural-network training steps N_nn, and the threshold θ for judging whether the countermeasure strategies have converged;
for each player, set up two neural networks, a regret neural network and a countermeasure strategy neural network, each with a corresponding training-sample buffer, denoted M_0^r, M_1^r and M_0^s, M_1^s; also set up a countermeasure strategy vector buffer M; each buffer entry has the form (I_l, [d(I_l,a_1), …, d(I_l,a_n)] or [s(I_l,a_1), …, s(I_l,a_n)], t); the input to a neural network is I_l, and its output is the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] or the predicted countermeasure strategy vector [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] computed by the network; before training begins, empty the buffers M_0^r, M_1^r, M_0^s, M_1^s and M, initialize the neural network parameters, set t = 1, and go to step 2;
wherein the training frequency N_s means that after the regret neural network has been trained N_s times, the countermeasure strategy neural networks are trained once, and N_iter % N_s = 0 must hold; the superscripts r and s denote regret and (countermeasure) strategy respectively, and the subscripts 0 and 1 denote the radar and the interference system respectively; the buffer contents are as follows: I_l denotes the information set of a game-tree node that needs to derive a countermeasure strategy; d(I_l,a_i), i = 1, …, n is the regret value of taking action a_i under the current information set I_l; \hat{d}(I_l,a_i) is the regret value predicted by the neural network; s(I_l,a_i), i = 1, …, n is the countermeasure strategy, i.e. the probability of taking action a_i under the current information set; \hat{s}(I_l,a_i) is the countermeasure strategy predicted by the neural network; t is the iteration index of the current game; and u(I_l,a_i), i = 1, …, n denotes the utility value of action a_i under information set I_l;
Step 2: select the current player p = t % 2 and traverse the radar countermeasure game tree K times from the root node in this iteration; at a node that needs to derive a countermeasure strategy, if the player of the corresponding information set I_l is the current player p of this iteration, input I_l into player p's regret neural network to obtain the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of this information set from the predicted regrets, accumulate the utility values and countermeasure strategy vectors obtained during the traversal to obtain the regret vector [d(I_l,a_1), …, d(I_l,a_n)], and store the node's information set I_l, the regret vector [d(I_l,a_1), …, d(I_l,a_n)] and the current iteration index t into the regret buffer M_p^r of the current player p; if the player of the information set I_l is 1-p, i.e. not the current player of this iteration, input I_l into player 1-p's regret neural network to obtain its output [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] of this information set from it, and store the information set I_l, the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] and the current iteration index t into player 1-p's countermeasure strategy buffer M_{1-p}^s;
after the game tree has been traversed K times from the root node, i.e. after one iteration, train the current player p's regret neural network N_nn times with the data in the regret buffer M_p^r, so that the output vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] becomes as close as possible to the expected vector [d(I_l,a_1), …, d(I_l,a_n)], i.e. the regret-network loss function L^r approaches 0;
after N_s iterations, i.e. when t % N_s = 0, go to step 3;
Step 3: use the data in the two countermeasure strategy buffers M_0^s and M_1^s to train and update the two countermeasure strategy neural networks N_nn times, so that the output vectors [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] become as close as possible to the expected vectors [s(I_l,a_1), …, s(I_l,a_n)], i.e. the countermeasure strategy network loss function L^s approaches 0;
input the information set I_l, l = 1, …, N_info corresponding to each node that needs to derive a countermeasure strategy into the countermeasure strategy network of the corresponding player, obtain N_info groups of countermeasure strategies, and store them in the buffer M;
if the accumulated number of iterations of step 2 has not reached N_iter, return to step 2 and iterate again;
repeat step 2 and step 3 until the number of iterations reaches N_iter; then take the difference between the countermeasure strategies recorded in the buffer M at the N_iter-th iteration and at the (N_iter - N_s)-th iteration, take the absolute value, and find the largest element; if it is smaller than the threshold θ, the outputs of the two countermeasure strategy neural networks have converged, the two sides have reached a Nash equilibrium, and the respective Nash equilibrium strategies are obtained; if it is larger than the threshold θ, convergence has not been reached, i.e. the number of iterations N_iter is too small, and it is necessary to go back to step 1, change the value of N_iter and start again.
2. The game-oriented radar countermeasure strategy generation method according to claim 1, characterized in that the radar countermeasure game tree in step 1 is constructed as follows: starting from the root node root, the radar selects an operating state; in each operating state the interferer has a set of available interference patterns; after the interference is applied, the radar has a set of operating states it can transition to; nodes that terminate the game are set, all branch situations of the game tree are exhausted, and the construction of the game tree is complete.
3. The game-oriented radar countermeasure generation method according to claim 1, wherein the information set I in step 1 specifically includes: the number of the player to which the information set belongs, the history of the sequence of actions of the game before the information set, and the position of the information set in the game tree.
4. The game-oriented radar countermeasure strategy generation method according to claim 1, characterized in that the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] in step 2 is computed as follows:

\[
s(I_l, a_i) =
\begin{cases}
\dfrac{\hat{d}^{+}(I_l, a_i)}{\sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k)}, & \text{if } \sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k) > 0,\\[2mm]
\mathbf{1}\{a_i = \arg\max_{k} \hat{d}(I_l, a_k)\}, & \text{otherwise,}
\end{cases}
\]

where \hat{d}^{+}(I_l,a_i) = max(\hat{d}(I_l,a_i), 0) denotes the positive part of the predicted regret; the formula also means that if the regret values of all actions under information set I_l are negative, the action with the largest regret value is selected with probability 1.
5. The game-oriented radar countermeasure strategy generation method according to claim 4, characterized in that the regret vector [d(I_l,a_1), …, d(I_l,a_n)] of the current player p in step 2 is computed as follows: according to the utility-value vector [u(I_l,a_1), …, u(I_l,a_n)] of the actions under the node corresponding to the input I_l and the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of that node, the regret value d(I_l,a_i) of each action under the node is

\[
d(I_l, a_i) = u(I_l, a_i) - \sum_{k=1}^{n} s(I_l, a_k)\, u(I_l, a_k), \qquad i = 1, \dots, n,
\]

from which [d(I_l,a_1), …, d(I_l,a_n)] is obtained.
6. The game-oriented radar countermeasure strategy generation method according to claim 5, characterized in that the regret neural network in step 2 is trained as follows:
2.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^r of the regret neural network is therefore set as:

\[
L^r = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( d_i(I_i, a_k) - \hat{d}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the regret buffer as training samples, t_i is the iteration index associated with the regret value d_i(I_i,a_k) in the regret buffer, and \hat{d}_i(I_i,a_k) is the output of the regret neural network;
2.2 Perform gradient descent on the loss function L^r and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new regret neural network.
7. The game-oriented radar countermeasure strategy generation method according to claim 4, characterized in that the countermeasure strategy neural network in step 3 is trained as follows:
3.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^s of the countermeasure strategy neural network is therefore set as:

\[
L^s = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( s_i(I_i, a_k) - \hat{s}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the countermeasure strategy buffer as training samples, t_i is the iteration index associated with the countermeasure strategy s_i(I_i,a_k) in the strategy buffer, and \hat{s}_i(I_i,a_k) is the output of the countermeasure strategy neural network;
3.2 Perform gradient descent on the loss function L^s and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new countermeasure strategy neural network.
8. The game-oriented radar countermeasure strategy generation method according to claim 6 or 7, characterized in that the batch of N_batch training samples in the loss function L^r or L^s is selected by drawing entries from the regret buffer or the countermeasure strategy buffer with probability weighted in proportion to their iteration index t, i.e. entries with a later iteration index are more likely to be selected, until N_batch entries have been drawn.