CN111275174A - Game-oriented radar countermeasure strategy generation method

Publication number: CN111275174A (granted as CN111275174B)
Application number: CN202010091616.XA
Authority: CN (China)
Filing / priority date: 2020-02-13
Publication dates: 2020-06-12 (CN111275174A), 2020-09-18 (CN111275174B, grant)
Legal status: Granted, Active
Original language: Chinese (zh)
Inventors: 杨健, 王沙飞, 李岩, 肖德政, 田震, 张丁
Applicants/Assignees: 32802 Troops of People's Liberation Army of China; Beijing Institute of Technology (BIT)

Classifications

    • G06N3/02 Neural networks
    • G06N3/045 Architecture: combinations of networks
    • G06N3/084 Learning methods: backpropagation, e.g. using gradient descent


Abstract

The invention provides a game-oriented radar countermeasure strategy generation method, which comprises the following steps: set up the countermeasure scenario, treating the radar and the interference system as the two players of a game; construct the radar countermeasure game tree; for each player, set up a regret neural network and a countermeasure strategy neural network together with corresponding buffers, and initialize the neural network parameters. Within the set number of iterations, traverse the game tree K times per iteration, training the radar side and the interferer alternately: train the regret neural network with the regret-buffer data, and then train the countermeasure strategy neural networks of the radar and the interferer with the collected strategy-buffer data until the strategy networks converge. Compared with research on static radar countermeasure games, the method constructs a dynamic game model with incomplete information; with both sides being intelligent, the Nash equilibrium of the game between the radar and the interference system is approximately solved with neural networks, and the respective Nash equilibrium strategies are obtained through repeated dynamic iterative updates.

Description

Game-oriented radar countermeasure strategy generation method
Technical Field
The invention relates to the interdisciplinary field of radar electronic countermeasures, game theory and artificial intelligence, and in particular to a game-oriented radar countermeasure strategy generation method.
Background
Artificial intelligence technology is being applied ever more deeply in the field of electronic countermeasures, and radar countermeasures are becoming intelligent: both the radar and the interference system are developing intelligent algorithms with adaptive and even cognitive capabilities. Studying the radar electronic countermeasure problem from the perspective of a dynamic game between the radar and the interference system is therefore an important direction of development.
At present, the field of cognitive electronic countermeasures focuses mainly on intelligent algorithms on the radar side or the interferer side, such as waveform optimization, target detection, recognition and tracking for cognitive radar, and adaptive jamming decision-making and jamming-effect evaluation for the interferer. Current countermeasure strategy generation only considers the optimization of a one-sided algorithm, for example using reinforcement learning to produce one side's optimal strategy through repeated countermeasure interactions.
Game theory studies the strategies and decision-making of two or more players. A two-player zero-sum game is a game in which two players compete and the sum of their payoffs is zero. The goal of a two-player zero-sum game is to solve for the Nash equilibrium, an equilibrium state in which any player who unilaterally changes strategy reduces his own payoff. The countermeasure process between the radar and the interferer can be regarded as a two-player zero-sum game, and the strategies of both sides can be obtained by solving for the Nash equilibrium.
Existing research on radar countermeasure games mainly studies static two-player zero-sum games with complete information, in which both sides have full knowledge of the opponent's action space and payoff function and the resulting countermeasure strategy is fixed. In a realistic, complex electromagnetic environment, such a static strategy cannot cope with a changing opponent, and the opponent can easily infer one's own strategy, so the countermeasure cannot be won.
Disclosure of Invention
The invention aims to provide a game-oriented radar countermeasure strategy generation method which overcomes the shortcomings of the prior art, is applied to radar countermeasure scenarios, allows both players of the game to dynamically update their own strategies in a scenario with incomplete information, and is of significance for the development of cognitive radar and cognitive countermeasure technology.
The technical scheme of the invention is as follows. The game-oriented radar countermeasure strategy generation method comprises the following specific steps:
Step 1: regard the radar and the interference system as the two players of the game and set up the countermeasure scenario: the radar has N_radar operating states and the interferer has N_jam interference patterns; set the payoffs for the transitions of the radar operating states; construct the radar countermeasure game tree from the root node root; N_info nodes need to derive a countermeasure strategy, each such node has a corresponding player information set I, and at each node a unique, fixed player makes the action selection; the selectable actions are a_i, i = 1, 2, …, n, where n is the number of selectable actions; I_l, l = 1, …, N_info denotes the information set of a node that needs to derive a countermeasure strategy; at each terminal node of the tree, set the utility values of the radar and the interference system according to the state-transition payoffs;
Set the number of iterations N_iter, the number of traversals K of the game tree from the root node in each iteration, the training frequency N_s of the countermeasure strategy neural network, the number of neural-network training steps N_nn, and the threshold θ for judging whether the countermeasure strategies have converged;
For each player, set up two neural networks, a regret neural network and a countermeasure strategy neural network, each with a corresponding training-sample buffer, denoted M_0^r, M_1^r and M_0^s, M_1^s; also set up a countermeasure strategy vector buffer M. Each buffer entry has the form (I_l, [d(I_l,a_1), …, d(I_l,a_n)] or [s(I_l,a_1), …, s(I_l,a_n)], t). The input to a neural network is I_l, and its output is the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] or the predicted countermeasure strategy vector [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] computed by the network. Before training begins, empty the buffers M_0^r, M_1^r, M_0^s, M_1^s and M, initialize the neural network parameters, set t = 1, and go to step 2;
Here the training frequency N_s means that after the regret neural network has been trained N_s times, the countermeasure strategy neural networks are trained once, and N_iter % N_s = 0 must hold; the superscripts r and s denote regret and (countermeasure) strategy respectively, and the subscripts 0 and 1 denote the radar and the interference system respectively. The buffer contents are as follows: I_l denotes the information set of a game-tree node that needs to derive a countermeasure strategy; d(I_l,a_i), i = 1, …, n is the regret value of taking action a_i under the current information set I_l; \hat{d}(I_l,a_i) is the regret value predicted by the neural network; s(I_l,a_i), i = 1, …, n is the countermeasure strategy, i.e. the probability of taking action a_i under the current information set; \hat{s}(I_l,a_i) is the countermeasure strategy predicted by the neural network; t is the iteration index of the current game; and u(I_l,a_i), i = 1, …, n denotes the utility value of action a_i under information set I_l;
Step 2: select the current player p = t % 2 and traverse the radar countermeasure game tree K times from the root node in this iteration. At a node that needs to derive a countermeasure strategy, if the player of the corresponding information set I_l is the current player p of this iteration, input I_l into player p's regret neural network to obtain the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of this information set from the predicted regrets, accumulate the utility values and countermeasure strategy vectors obtained during the traversal to obtain the regret vector [d(I_l,a_1), …, d(I_l,a_n)], and store the node's information set I_l, the regret vector [d(I_l,a_1), …, d(I_l,a_n)] and the current iteration index t into the regret buffer M_p^r of the current player p. If the player of the information set I_l is 1-p, i.e. not the current player of this iteration, input I_l into player 1-p's regret neural network to obtain its output [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] of this information set from it, and store the information set I_l, the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] and the current iteration index t into player 1-p's countermeasure strategy buffer M_{1-p}^s; a condensed sketch of this traversal is given after this step.
After the game tree has been traversed K times from the root node, i.e. after one iteration, train the current player p's regret neural network N_nn times with the data in the regret buffer M_p^r, so that the output vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] becomes as close as possible to the expected vector [d(I_l,a_1), …, d(I_l,a_n)], i.e. the regret-network loss function L^r approaches 0.
After N_s iterations, i.e. when t % N_s = 0, go to step 3;
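The following is a condensed, non-normative Python sketch of one step-2 traversal. It assumes hypothetical helpers: regret_net, a per-player callable returning the predicted regret vector over the node's available actions; strategy_from_regrets, the regret-matching rule of step 2 (a sketch of it accompanies the formula further below); and a node interface with player, actions, child(), is_terminal, utility() and info_set attributes. It only illustrates how the regret and strategy buffers are filled:

```python
import random

def traverse(node, p, t, regret_buf, strategy_buf, regret_net):
    """One recursive pass of step 2: returns the utility of `node` for player p."""
    if node.is_terminal:
        return node.utility(p)                       # terminal payoff from the payoff table

    d_hat = regret_net[node.player](node.info_set)   # predicted regret vector (available actions only)
    sigma = strategy_from_regrets(d_hat)             # countermeasure strategy by regret matching

    if node.player == p:
        # Current player: expand every available action and accumulate regrets.
        u = [traverse(node.child(a), p, t, regret_buf, strategy_buf, regret_net)
             for a in node.actions]
        baseline = sum(si * ui for si, ui in zip(sigma, u))
        regrets = [ui - baseline for ui in u]        # d(I_l,a_i) = u(I_l,a_i) - sum_k s(I_l,a_k)*u(I_l,a_k)
        regret_buf.append((node.info_set, regrets, t))
        return baseline
    else:
        # Opponent: store its current strategy and follow one sampled action.
        strategy_buf.append((node.info_set, sigma, t))
        a = random.choices(node.actions, weights=sigma, k=1)[0]
        return traverse(node.child(a), p, t, regret_buf, strategy_buf, regret_net)
```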
Step 3: use the data in the two countermeasure strategy buffers M_0^s and M_1^s to train and update the two countermeasure strategy neural networks N_nn times, so that the output vectors [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] become as close as possible to the expected vectors [s(I_l,a_1), …, s(I_l,a_n)], i.e. the countermeasure strategy network loss function L^s approaches 0.
Input the information set I_l, l = 1, …, N_info corresponding to each node that needs to derive a countermeasure strategy into the countermeasure strategy network of the corresponding player, obtain N_info groups of countermeasure strategies, and store them in the buffer M.
If the accumulated number of iterations of step 2 has not reached N_iter, return to step 2 and iterate again.
Repeat step 2 and step 3 until the number of iterations reaches N_iter; then take the difference between the countermeasure strategies recorded in the buffer M at the N_iter-th iteration and at the (N_iter - N_s)-th iteration, take the absolute value, and find the largest element. If it is smaller than the threshold θ, the outputs of the two countermeasure strategy neural networks have converged, the two sides have reached a Nash equilibrium, and the respective Nash equilibrium strategies are obtained; if it is larger than the threshold θ, convergence has not been reached, i.e. the number of iterations N_iter is too small, and it is necessary to go back to step 1, change the value of N_iter and start again.
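As an illustration of this convergence test, the following minimal sketch (function and variable names are assumptions; M is assumed to be indexable by iteration) compares the strategies recorded at iteration N_iter with those recorded N_s iterations earlier:

```python
def strategies_converged(M, n_iter, n_s, theta):
    """Return True if the largest absolute change between the strategies recorded
    at iteration n_iter and at iteration n_iter - n_s is below the threshold theta."""
    last = M[n_iter]          # list of N_info strategy vectors recorded at iteration n_iter
    prev = M[n_iter - n_s]    # strategies recorded N_s iterations earlier
    max_diff = max(abs(a - b)
                   for vec_last, vec_prev in zip(last, prev)
                   for a, b in zip(vec_last, vec_prev))
    return max_diff < theta
```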
Preferably, the radar countermeasure game tree in step 1 is constructed as follows: starting from the root node root, the radar selects an operating state; in each operating state the interferer has a set of available interference patterns; after the interference is applied, the radar has a set of operating states it can transition to; nodes that terminate the game are set, all branch situations of the game tree are exhausted, and the construction of the game tree is complete.
Preferably, the information set I in step 1 specifically includes: the number of the player to which the information set belongs, the history of the sequence of actions of the game before the information set, and the position of the information set in the game tree.
Further, the specific method for computing the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] in step 2 is:

\[
s(I_l, a_i) =
\begin{cases}
\dfrac{\hat{d}^{+}(I_l, a_i)}{\sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k)}, & \text{if } \sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k) > 0,\\[2mm]
\mathbf{1}\{a_i = \arg\max_{k} \hat{d}(I_l, a_k)\}, & \text{otherwise,}
\end{cases}
\]

where \hat{d}^{+}(I_l,a_i) = max(\hat{d}(I_l,a_i), 0) denotes the positive part of the predicted regret. The formula also means that if the regret values of all actions under information set I_l are negative, the action with the largest regret value is selected with probability 1.
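A minimal Python sketch of this regret-matching computation (names are assumptions; None entries mark unavailable actions, as in the embodiment below):

```python
def strategy_from_regrets(d_hat):
    """Map a predicted regret vector [d^(I_l,a_1),...,d^(I_l,a_n)] to a strategy
    [s(I_l,a_1),...,s(I_l,a_n)] by regret matching; None entries (unavailable
    actions) are kept as None."""
    valid = [i for i, d in enumerate(d_hat) if d is not None]
    positive = {i: max(d_hat[i], 0.0) for i in valid}
    total = sum(positive.values())
    sigma = [None] * len(d_hat)
    if total > 0:
        for i in valid:
            sigma[i] = positive[i] / total
    else:
        # all regrets non-positive: put probability 1 on the action with the largest regret
        best = max(valid, key=lambda i: d_hat[i])
        for i in valid:
            sigma[i] = 1.0 if i == best else 0.0
    return sigma
```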
Further, the regret vector [d(I_l,a_1), …, d(I_l,a_n)] of the current player p in step 2 is computed as follows: according to the utility-value vector [u(I_l,a_1), …, u(I_l,a_n)] of the actions under the node corresponding to the input I_l and the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of that node, the regret value d(I_l,a_i) of each action under the node is

\[
d(I_l, a_i) = u(I_l, a_i) - \sum_{k=1}^{n} s(I_l, a_k)\, u(I_l, a_k), \qquad i = 1, \dots, n,
\]

from which [d(I_l,a_1), …, d(I_l,a_n)] is obtained.
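A corresponding Python sketch of the regret computation (function name assumed), followed by the numbers used in the embodiment below:

```python
def regrets_from_utilities(u, sigma):
    """Compute [d(I_l,a_1),...,d(I_l,a_n)] from the action utilities u(I_l,a_i)
    and the strategy s(I_l,a_i) of the same node:
    d(I_l,a_i) = u(I_l,a_i) - sum_k s(I_l,a_k) * u(I_l,a_k)."""
    baseline = sum(s * ui for s, ui in zip(sigma, u))
    return [ui - baseline for ui in u]

# Example reproducing the embodiment below: utilities [2, 2, -1] for (j1, j3, j4)
# with strategy [0.51, 0.49, 0] give regrets approximately [0, 0, -3].
print(regrets_from_utilities([2, 2, -1], [0.51, 0.49, 0.0]))
```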
Further, the training method of the regret neural network in step 2 is specifically as follows:
2.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^r of the regret neural network is therefore set as:

\[
L^r = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( d_i(I_i, a_k) - \hat{d}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the regret buffer as training samples; t_i is the iteration index associated with the regret value d_i(I_i,a_k) in the regret buffer; and \hat{d}_i(I_i,a_k) is the output of the regret neural network;
2.2 Perform gradient descent on the loss function L^r and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new regret neural network.
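A minimal PyTorch-style sketch of one such training step, assuming a network net that maps an encoded information set to an n-dimensional regret vector; the iteration-weighted squared-error form of L^r shown here follows the reconstruction above and is an assumption about the exact weighting:

```python
import torch

def train_regret_net(net, batch, optimizer):
    """One gradient step on the regret network.
    `batch` is a list of (encoded_info_set, target_regrets, t) drawn from M_p^r."""
    x = torch.stack([torch.as_tensor(info, dtype=torch.float32) for info, _, _ in batch])
    y = torch.stack([torch.as_tensor(d, dtype=torch.float32) for _, d, _ in batch])
    w = torch.as_tensor([t for _, _, t in batch], dtype=torch.float32)

    pred = net(x)                                     # predicted regrets \hat d(I_l, a_1..a_n)
    loss = (w * ((pred - y) ** 2).sum(dim=1)).mean()  # iteration-weighted squared error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage, e.g. with the Adam optimizer as in the embodiment below:
#   optimizer = torch.optim.Adam(net.parameters())
# The countermeasure strategy networks of step 3 are trained in the same way,
# with targets s(I_l, a_i) drawn from the strategy buffers instead of regret values.
```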
Further, the countermeasure strategy neural network in step 3 is trained as follows:
3.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^s of the countermeasure strategy neural network is therefore set as:

\[
L^s = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( s_i(I_i, a_k) - \hat{s}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the countermeasure strategy buffer as training samples; t_i is the iteration index associated with the countermeasure strategy s_i(I_i,a_k) in the strategy buffer; and \hat{s}_i(I_i,a_k) is the output of the countermeasure strategy neural network;
3.2 Perform gradient descent on the loss function L^s and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new countermeasure strategy neural network.
Preferably, the batch of N_batch training samples in the loss function L^r or L^s is selected by drawing entries from the regret buffer or the countermeasure strategy buffer with probability weighted in proportion to their iteration index t, i.e. entries with a later iteration index are more likely to be selected, until N_batch entries have been drawn.
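A small sketch of this iteration-weighted batch selection (assuming the buffer entries carry their iteration index t in the third field, as in the sketches above):

```python
import random

def sample_batch(buffer, n_batch):
    """Draw n_batch entries from a regret or strategy buffer, with selection
    probability proportional to the iteration index t of each entry, so that
    entries produced in later iterations are more likely to be chosen."""
    weights = [entry[2] for entry in buffer]          # entry = (info_set, vector, t)
    return random.choices(buffer, weights=weights, k=n_batch)
```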
The beneficial effects of the invention are as follows: the method regards the radar countermeasure process as a two-player zero-sum dynamic game with incomplete information. Compared with research on static radar countermeasure games, it constructs a dynamic game model with incomplete information; with both sides being intelligent, the Nash equilibrium of the game between the radar and the interference system is approximately solved with neural networks, and the respective Nash equilibrium strategies are obtained through repeated dynamic iterative updates.
Drawings
Fig. 1 is a game tree of a radar countermeasure simulation scene in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a radar state transition matrix according to an embodiment of the present invention.
FIG. 3 is a flow chart of an experiment according to the present invention.
FIG. 4 is a diagram of a neural network according to an embodiment of the present invention.
Fig. 5 is a convergence curve of each action probability of the radar under the root node in the embodiment of the present invention.
Fig. 6 is a convergence curve of the probability of each operation of the interference side under the node s0 according to the embodiment of the present invention.
FIG. 7 is a table of simulation results of action utility values and probabilities of various sets of information of a radar according to an embodiment of the present invention.
Fig. 8 is a table of action utility values and probability simulation results of each information set of an interferer according to the embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The experiment sets the countermeasure scenario as a one-on-one countermeasure, i.e. a multifunction radar with switchable states against a jammer with switchable interference patterns. Set N_radar = 3, i.e. the radar has three operating states, denoted s_0 to s_2; the radar selecting a different operating state corresponds to selecting a different action. Set N_jam = 5, i.e. the interferer has five interference patterns, denoted j_0 to j_4; likewise, the interferer selecting a different interference pattern corresponds to selecting a different action. Set the number of iterations N_iter = 1000, the number of game-tree traversals per iteration K = 10, the training frequency of the countermeasure strategy neural network N_s = 2, the number of neural-network training steps N_nn = 30, and the threshold θ = 0.01 for judging whether convergence has been reached; the countermeasure strategy buffer M is cleared.
The radar countermeasure game tree is set as shown in Fig. 1; the total number of nodes of the game tree is N_tree = 23, and the number of nodes that need to derive a countermeasure strategy is N_info = 7. Starting from the root node, the radar selects an operating state according to its current countermeasure strategy, the interferer executes a corresponding interference pattern against the radar state, and after the interference is sent the radar selects its next operating state. The seven nodes that need to derive a countermeasure strategy are shown in dark in the figure. To make the simulation more realistic, the number and content of the selectable actions of the radar and the interferer are deliberately not identical at different nodes. For example, at node s_0 the only actions available to the interferer are j_1, j_3 and j_4, and after the interferer selects j_1 the radar can only jump to the two states s_1 and s_2.
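For illustration, the fragment of this game tree that is spelled out in the text (root leads to s_0/s_1/s_2; s_0 offers j_1, j_3, j_4; j_1 is followed by s_1 or s_2) could be encoded as a nested dictionary; this is only a partial, hypothetical encoding, and the full 23-node tree and the payoffs follow Figs. 1 and 2, which are not reproduced here:

```python
# Partial encoding of the game tree of Fig. 1, covering only the branches that the
# text describes explicitly; the remaining nodes and all terminal payoffs are
# omitted (hypothetical encoding, "T" marks a terminal node).
game_tree_fragment = {
    "root":   {"player": "radar",      "children": {"s0": "s0", "s1": "s1", "s2": "s2"}},
    "s0":     {"player": "interferer", "children": {"j1": "j1(s0)", "j3": "T", "j4": "T"}},
    "j1(s0)": {"player": "radar",      "children": {"s1": "...", "s2": "..."}},
    # s1, s2, j2(s1), j2(s2) and the remaining nodes follow Fig. 1
}
```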
As shown in Fig. 2, the threat levels of the three radar states are evaluated from the perspective of the interferer, and the payoffs of the two sides after a state transition are set according to the threat level of the radar state; the first entry is the radar's payoff and the second is the interferer's payoff, and the payoffs of the two players satisfy the zero-sum property. In the subsequent experiment, both the radar and the interference system obtain the corresponding utility values during the game according to the state-transition payoffs of Fig. 2, from which data such as regret values are computed.
The experimental flow is shown in Fig. 3: initialize the neural networks and clear the four buffers M_0^r, M_1^r, M_0^s and M_1^s; within the set number of iterations, traverse the game tree K times per iteration, training the radar side and the interferer alternately; train the regret neural network with the regret-buffer data, and then train the countermeasure strategy neural networks of the radar and the interferer with the collected strategy-buffer data until the strategy networks converge. The specific details and calculations are described below.
The structure of the deep neural network used in the experiment is shown in Fig. 4. The network takes the data of an information set as input, including the player of the current information set (radar or interferer), the numbered sequence of actions performed, and the progress of the game. For example, the information set of node s_0 in Fig. 1 is represented as ([1], [0, -1], [1, -1]): [1] means that at information set s_0 it is player 1's turn, i.e. the interferer selects the action; [0, -1] means that the radar selected action 0 at the previous information set root, while the action not yet selected is marked -1; and [1, -1] indicates the progress, i.e. only one information-set selection has been made since the root node.
The output of the neural network is a 3-dimensional (radar) or 5-dimensional (interferer) vector, i.e. the regret values or the countermeasure strategy of the radar or the interferer. Since the available actions differ between information sets, the regret value and the strategy probability of an unavailable action are both represented by the invalid value None.
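A minimal sketch of how such an information set could be flattened into a network input vector; the flattening itself is an assumption, since the patent only specifies the three components described above:

```python
def encode_info_set(player, action_history, progress):
    """Flatten the information-set triple into one numeric input vector,
    e.g. s0 = ([1], [0, -1], [1, -1]) -> [1, 0, -1, 1, -1]."""
    return list(player) + list(action_history) + list(progress)

# Examples from the embodiment:
s0_input = encode_info_set([1], [0, -1], [1, -1])   # interferer to act at s0
j1_input = encode_info_set([0], [0, 1], [1, 1])     # radar to act at j1
```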
For example, suppose that in the 500th iteration the selected current player is the interferer and the traversal reaches node s_0 from the root node; the following computation is then performed:
1) The player of node s_0 is the interferer, so the corresponding information-set data, i.e. s_0 = ([1], [0, -1], [1, -1]), is input into the interferer's regret neural network, which outputs the predicted regret vector [\hat{d}(s_0,j_1), \hat{d}(s_0,j_3), \hat{d}(s_0,j_4)] (note: a 5-dimensional vector is output here, but since only j_1, j_3 and j_4 are available actions under information set s_0, the two invalid None values belonging to j_0 and j_2 are not written out, and are similarly omitted below). The countermeasure strategy is then computed by regret matching:

s(s_0,j_1) = \hat{d}^{+}(s_0,j_1) / (\hat{d}^{+}(s_0,j_1) + \hat{d}^{+}(s_0,j_3) + \hat{d}^{+}(s_0,j_4)) = 0.51
s(s_0,j_3) = \hat{d}^{+}(s_0,j_3) / (\hat{d}^{+}(s_0,j_1) + \hat{d}^{+}(s_0,j_3) + \hat{d}^{+}(s_0,j_4)) = 0.49
s(s_0,j_4) = \hat{d}^{+}(s_0,j_4) / (\hat{d}^{+}(s_0,j_1) + \hat{d}^{+}(s_0,j_3) + \hat{d}^{+}(s_0,j_4)) = 0

i.e. the countermeasure strategy of node s_0 is [s(s_0,j_1), s(s_0,j_3), s(s_0,j_4)] = [0.51, 0.49, 0].
2) Traverse the 3 actions under node s_0, reaching nodes j_1, j_3 and j_4 respectively, and compute the utility values of the three actions:
① Since each of the latter two nodes has only one successor, which is a terminal state, for node s_0 the interferer obtains utility value 2 by selecting action j_3, i.e. u(s_0,j_3) = 2, and utility value -1 by selecting action j_4, i.e. u(s_0,j_4) = -1;
② The traversal reaches node j_1. Since the player of j_1 is the radar, the information-set data of node j_1, i.e. j_1 = ([0], [0, 1], [1, 1]), is input into the radar's regret neural network to obtain its output [\hat{d}(j_1,s_1), \hat{d}(j_1,s_2)], from which the countermeasure strategy is computed: [s(j_1,s_1), s(j_1,s_2)] = [1, 0]. The associated data (j_1, [s(j_1,s_1), s(j_1,s_2)], t), i.e. (([0], [0, 1], [1, 1]), [1, 0], 500), is stored into the radar's countermeasure strategy buffer M_0^s;
③ The two actions under node j_1 yield utility values of 2 and 4 respectively for the interferer. According to the countermeasure strategy mentioned in ②, one of s_1 and s_2 is selected and its utility value returned; s_1 is selected, so it can be concluded that for the interferer the utility value of selecting j_1 in this traversal is 2, i.e. u(s_0,j_1) = 2;
3) From the utility values of the actions under node s_0 and the countermeasure strategy of s_0, the regret values of the actions under s_0 are computed:

d(s_0,j_1) = u(s_0,j_1) - (u(s_0,j_1)·s(s_0,j_1) + u(s_0,j_3)·s(s_0,j_3) + u(s_0,j_4)·s(s_0,j_4))
           = 2 - (2·0.51 + 2·0.49 + (-1)·0)
           = 0
d(s_0,j_3) = u(s_0,j_3) - (u(s_0,j_1)·s(s_0,j_1) + u(s_0,j_3)·s(s_0,j_3) + u(s_0,j_4)·s(s_0,j_4))
           = 2 - (2·0.51 + 2·0.49 + (-1)·0)
           = 0
d(s_0,j_4) = u(s_0,j_4) - (u(s_0,j_1)·s(s_0,j_1) + u(s_0,j_3)·s(s_0,j_3) + u(s_0,j_4)·s(s_0,j_4))
           = -1 - (2·0.51 + 2·0.49 + (-1)·0)
           = -3

The associated data (s_0, [d(s_0,j_1), d(s_0,j_3), d(s_0,j_4)], t), i.e. (([1], [0, -1], [1, -1]), [0, 0, -3], 500), is stored into the interferer's regret buffer M_1^r.
After 10 game trees have been traversed from the root, since the current player of the 500th iteration is the interferer, the interferer's regret neural network is trained:
According to the set N_batch = 30, 30 entries are drawn from the regret buffer M_1^r each time; when drawing, the entries are sampled with probability weighted in proportion to their iteration index t, i.e. entries from later iterations are more likely to be selected. With these 30 entries, the Adam gradient-descent method is applied to the loss function L^r, and the weights and biases of the neural network are continuously adjusted by back-propagation so as to minimize the loss; after training N_nn = 30 times, the loss L^r = 0.00034 is obtained.
The 500th iteration satisfies t % N_s = 0, so after the interferer's regret neural network has been trained, the two countermeasure strategy neural networks are also trained, each as follows:
According to the set N_batch = 30, 30 entries are drawn from the countermeasure strategy buffers M_0^s and M_1^s each time; when drawing, the entries are sampled with probability weighted in proportion to their iteration index t, i.e. entries from later iterations are more likely to be selected. With these 30 entries, the Adam gradient-descent method is applied to the loss function L^s, and the weights and biases of the neural networks are continuously adjusted by back-propagation so as to minimize the loss; after training N_nn times, the loss functions of the two countermeasure strategy neural networks are 17.5465 and 0.00161 respectively. The information-set data corresponding to root, j_1 under s_0, j_2 under s_1 and j_2 under s_2 are input into the radar's countermeasure strategy neural network, and the information-set data corresponding to s_0, s_1 and s_2 are input into the interferer's countermeasure strategy neural network, obtaining N_info = 7 countermeasure strategy vectors, which are stored in the buffer M.
The experiment iterates to 1000, and the countermeasure strategies of the two sides are obtained as shown in Figs. 7 and 8. Taking the difference between the 1000th and the 998th recorded countermeasure strategies and its absolute value, the largest element is found to be 0.003572, which is smaller than the threshold θ = 0.01, so it can be concluded that the countermeasure strategy neural networks have converged.
For the experimental results, the two relatively complex information sets root and s_0 are selected for analysis, and the convergence curves of the action probabilities over the 1000 games are plotted. As can be seen from Fig. 5, the countermeasure strategy at information set root basically converges after 750 games; as can be seen from Fig. 6, the countermeasure strategy at information set s_0 converges faster, essentially after 500 games.
With reference to the payoff table in Fig. 2, the utility (value or range) of each action under the information set of every node at which the radar or the interferer needs to derive a countermeasure strategy, together with the action probabilities output after 1000 games, are listed in Figs. 7 and 8. The information set j_1(s_0) in Fig. 7 means that after the interference action j_1 has been carried out in radar state s_0 it is the radar's turn to select the next action; j_2(s_1) and j_2(s_2) have analogous meanings. For some information sets the optimal action can be seen directly from the utilities, and the corresponding optimal actions are marked in bold in the tables.
The countermeasure strategies of the information sets for which the radar side needs to derive a strategy are analysed with reference to Fig. 7. Comparing the utility values of the actions, j_1(s_0), j_2(s_1) and j_2(s_2) each have an absolutely dominant selectable action, which the simulation results confirm: the probability of the dominant action is approximately 1. The countermeasure strategy of the information set root, however, has to take the interferer's corresponding countermeasure strategies during the game into account. Analysing the utility data of both sides, at Nash equilibrium the utility values of the actions s_1 and s_2 are approximately equal and greater than that of s_0, i.e. the two actions s_1 and s_2 are roughly equivalent and both better than s_0; the simulation results agree with this analysis.
The countermeasure strategies of the information sets for which the interferer needs to derive a strategy are analysed with reference to Fig. 8. Information sets s_1 and s_2 each have an absolutely dominant selectable action, which the simulation results confirm: the probability of the dominant action is approximately 1. It is worth noting that in the simulation results the absolute dominance of the interference action j_1 does not appear. Analysis shows that, similarly to root, the strategy at s_0 is closely related to the radar's countermeasure strategy at j_1(s_0): at Nash equilibrium, the utility the interferer obtains by selecting j_1 is approximately equal to that obtained by selecting j_3, and both are better than selecting j_4, so in the simulation results the probabilities of j_1 and j_3 are both about 0.5.
The foregoing is only an embodiment of the game-oriented countermeasure strategy generation of the present invention and is not intended to limit the present invention; various modifications and variations will occur to those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A game-oriented radar countermeasure strategy generation method, characterized by comprising the following specific steps:
Step 1: regard the radar and the interference system as the two players of the game and set up the countermeasure scenario: the radar has N_radar operating states and the interferer has N_jam interference patterns; set the payoffs for the transitions of the radar operating states; construct the radar countermeasure game tree from the root node root; N_info nodes need to derive a countermeasure strategy, each such node has a corresponding player information set I, and at each node a unique, fixed player makes the action selection; the selectable actions are a_i, i = 1, 2, …, n, where n is the number of selectable actions; I_l, l = 1, …, N_info denotes the information set of a node that needs to derive a countermeasure strategy; at each terminal node of the tree, set the utility values of the radar and the interference system according to the state-transition payoffs;
set the number of iterations N_iter, the number of traversals K of the game tree from the root node in each iteration, the training frequency N_s of the countermeasure strategy neural network, the number of neural-network training steps N_nn, and the threshold θ for judging whether the countermeasure strategies have converged;
for each player, set up two neural networks, a regret neural network and a countermeasure strategy neural network, each with a corresponding training-sample buffer, denoted M_0^r, M_1^r and M_0^s, M_1^s; also set up a countermeasure strategy vector buffer M; each buffer entry has the form (I_l, [d(I_l,a_1), …, d(I_l,a_n)] or [s(I_l,a_1), …, s(I_l,a_n)], t); the input to a neural network is I_l, and its output is the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] or the predicted countermeasure strategy vector [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] computed by the network; before training begins, empty the buffers M_0^r, M_1^r, M_0^s, M_1^s and M, initialize the neural network parameters, set t = 1, and go to step 2;
wherein the training frequency N_s means that after the regret neural network has been trained N_s times, the countermeasure strategy neural networks are trained once, and N_iter % N_s = 0 must hold; the superscripts r and s denote regret and (countermeasure) strategy respectively, and the subscripts 0 and 1 denote the radar and the interference system respectively; the buffer contents are as follows: I_l denotes the information set of a game-tree node that needs to derive a countermeasure strategy; d(I_l,a_i), i = 1, …, n is the regret value of taking action a_i under the current information set I_l; \hat{d}(I_l,a_i) is the regret value predicted by the neural network; s(I_l,a_i), i = 1, …, n is the countermeasure strategy, i.e. the probability of taking action a_i under the current information set; \hat{s}(I_l,a_i) is the countermeasure strategy predicted by the neural network; t is the iteration index of the current game; and u(I_l,a_i), i = 1, …, n denotes the utility value of action a_i under information set I_l;
Step 2: select the current player p = t % 2 and traverse the radar countermeasure game tree K times from the root node in this iteration; at a node that needs to derive a countermeasure strategy, if the player of the corresponding information set I_l is the current player p of this iteration, input I_l into player p's regret neural network to obtain the predicted regret vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of this information set from the predicted regrets, accumulate the utility values and countermeasure strategy vectors obtained during the traversal to obtain the regret vector [d(I_l,a_1), …, d(I_l,a_n)], and store the node's information set I_l, the regret vector [d(I_l,a_1), …, d(I_l,a_n)] and the current iteration index t into the regret buffer M_p^r of the current player p; if the player of the information set I_l is 1-p, i.e. not the current player of this iteration, input I_l into player 1-p's regret neural network to obtain its output [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)], compute the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] of this information set from it, and store the information set I_l, the countermeasure strategy [s(I_l,a_1), …, s(I_l,a_n)] and the current iteration index t into player 1-p's countermeasure strategy buffer M_{1-p}^s;
after the game tree has been traversed K times from the root node, i.e. after one iteration, train the current player p's regret neural network N_nn times with the data in the regret buffer M_p^r, so that the output vector [\hat{d}(I_l,a_1), …, \hat{d}(I_l,a_n)] becomes as close as possible to the expected vector [d(I_l,a_1), …, d(I_l,a_n)], i.e. the regret-network loss function L^r approaches 0;
after N_s iterations, i.e. when t % N_s = 0, go to step 3;
Step 3: use the data in the two countermeasure strategy buffers M_0^s and M_1^s to train and update the two countermeasure strategy neural networks N_nn times, so that the output vectors [\hat{s}(I_l,a_1), …, \hat{s}(I_l,a_n)] become as close as possible to the expected vectors [s(I_l,a_1), …, s(I_l,a_n)], i.e. the countermeasure strategy network loss function L^s approaches 0;
input the information set I_l, l = 1, …, N_info corresponding to each node that needs to derive a countermeasure strategy into the countermeasure strategy network of the corresponding player, obtain N_info groups of countermeasure strategies, and store them in the buffer M;
if the accumulated number of iterations of step 2 has not reached N_iter, return to step 2 and iterate again;
repeat step 2 and step 3 until the number of iterations reaches N_iter; then take the difference between the countermeasure strategies recorded in the buffer M at the N_iter-th iteration and at the (N_iter - N_s)-th iteration, take the absolute value, and find the largest element; if it is smaller than the threshold θ, the outputs of the two countermeasure strategy neural networks have converged, the two sides have reached a Nash equilibrium, and the respective Nash equilibrium strategies are obtained; if it is larger than the threshold θ, convergence has not been reached, i.e. the number of iterations N_iter is too small, and it is necessary to go back to step 1, change the value of N_iter and start again.
2. The game-oriented radar countermeasure strategy generation method according to claim 1, characterized in that the radar countermeasure game tree in step 1 is constructed as follows: starting from the root node root, the radar selects an operating state; in each operating state the interferer has a set of available interference patterns; after the interference is applied, the radar has a set of operating states it can transition to; nodes that terminate the game are set, all branch situations of the game tree are exhausted, and the construction of the game tree is complete.
3. The game-oriented radar countermeasure generation method according to claim 1, wherein the information set I in step 1 specifically includes: the number of the player to which the information set belongs, the history of the sequence of actions of the game before the information set, and the position of the information set in the game tree.
4. The game-oriented radar countermeasure strategy generation method according to claim 1, characterized in that the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] in step 2 is computed as follows:

\[
s(I_l, a_i) =
\begin{cases}
\dfrac{\hat{d}^{+}(I_l, a_i)}{\sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k)}, & \text{if } \sum_{k=1}^{n} \hat{d}^{+}(I_l, a_k) > 0,\\[2mm]
\mathbf{1}\{a_i = \arg\max_{k} \hat{d}(I_l, a_k)\}, & \text{otherwise,}
\end{cases}
\]

where \hat{d}^{+}(I_l,a_i) = max(\hat{d}(I_l,a_i), 0) denotes the positive part of the predicted regret; the formula also means that if the regret values of all actions under information set I_l are negative, the action with the largest regret value is selected with probability 1.
5. The game-oriented radar countermeasure strategy generation method according to claim 4, characterized in that the regret vector [d(I_l,a_1), …, d(I_l,a_n)] of the current player p in step 2 is computed as follows: according to the utility-value vector [u(I_l,a_1), …, u(I_l,a_n)] of the actions under the node corresponding to the input I_l and the countermeasure strategy vector [s(I_l,a_1), …, s(I_l,a_n)] of that node, the regret value d(I_l,a_i) of each action under the node is

\[
d(I_l, a_i) = u(I_l, a_i) - \sum_{k=1}^{n} s(I_l, a_k)\, u(I_l, a_k), \qquad i = 1, \dots, n,
\]

from which [d(I_l,a_1), …, d(I_l,a_n)] is obtained.
6. The game-oriented radar countermeasure strategy generation method according to claim 5, characterized in that the regret neural network in step 2 is trained as follows:
2.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^r of the regret neural network is therefore set as:

\[
L^r = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( d_i(I_i, a_k) - \hat{d}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the regret buffer as training samples, t_i is the iteration index associated with the regret value d_i(I_i,a_k) in the regret buffer, and \hat{d}_i(I_i,a_k) is the output of the regret neural network;
2.2 Perform gradient descent on the loss function L^r and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new regret neural network.
7. The game-oriented radar countermeasure strategy generation method according to claim 4, characterized in that the countermeasure strategy neural network in step 3 is trained as follows:
3.1 The training process of a neural network is the process of minimizing a loss function in order to update the network parameters; the loss function L^s of the countermeasure strategy neural network is therefore set as:

\[
L^s = \frac{1}{N_{batch}} \sum_{i=1}^{N_{batch}} t_i \sum_{k=1}^{n} \big( s_i(I_i, a_k) - \hat{s}_i(I_i, a_k) \big)^2,
\]

where N_batch is the number of samples used in one training step, i.e. N_batch entries are drawn from the countermeasure strategy buffer as training samples, t_i is the iteration index associated with the countermeasure strategy s_i(I_i,a_k) in the strategy buffer, and \hat{s}_i(I_i,a_k) is the output of the countermeasure strategy neural network;
3.2 Perform gradient descent on the loss function L^s and continuously adjust the network parameters by back-propagation so as to minimize the loss function, obtaining a new countermeasure strategy neural network.
8. The game-oriented radar countermeasure strategy generation method according to claim 6 or 7, characterized in that the batch of N_batch training samples in the loss function L^r or L^s is selected by drawing entries from the regret buffer or the countermeasure strategy buffer with probability weighted in proportion to their iteration index t, i.e. entries with a later iteration index are more likely to be selected, until N_batch entries have been drawn.