CN116755046B - Multifunctional radar interference decision-making method based on imperfect expert strategy - Google Patents

Multifunctional radar interference decision-making method based on imperfect expert strategy

Info

Publication number
CN116755046B
Authority
CN
China
Prior art keywords
interference
expert
decision
radar
state
Prior art date
Legal status
Active
Application number
CN202311029543.1A
Other languages
Chinese (zh)
Other versions
CN116755046A (en
Inventor
周峰 (Zhou Feng)
田甜 (Tian Tian)
李建鑫 (Li Jianxin)
刘磊 (Liu Lei)
樊伟伟 (Fan Weiwei)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202311029543.1A
Publication of CN116755046A
Application granted
Publication of CN116755046B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/38 Jamming means, e.g. producing false echoes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention relates to a multifunctional radar interference decision method based on an imperfect expert strategy, comprising the following steps: acquiring a radar state; when the radar state is inconsistent with the radar target state, judging whether the radar state belongs to an expert decision error state set by using an expert intervention discriminant function module; when the radar state is judged to belong to the expert decision error state set, selecting an interference pattern by using a main decision network in the interference decision network; when the radar state is judged not to belong to the expert decision error state set, judging whether an expert strategy participates in the interference decision by using an expert interference exploration function module according to the radar state; when the expert strategy is judged to participate in the interference decision, selecting an interference pattern by using an expert decision network module; and when the expert strategy is judged not to participate in the interference decision, selecting an interference pattern by using the main decision network in the interference decision network. The method effectively improves the learning efficiency and decision accuracy of the interference decision algorithm, and reduces the trial-and-error cost of the countermeasure game.

Description

Multifunctional radar interference decision-making method based on imperfect expert strategy
Technical Field
The invention belongs to the technical field of radars, and particularly relates to a multifunctional radar interference decision method based on an imperfect expert strategy.
Background
With the wide application of digital technology and artificial intelligence technology in the radar field, the multifunctional radar with multiple tasks and waveforms has become an important means of modern microwave detection. However, for the party being detected, the radar's high intelligence and high flexibility pose a more serious threat and challenge to the protected targets and areas: the jammer finds it difficult to take effective interference measures flexibly and rapidly and is therefore at a disadvantage. Interference decision-making is an important part of the radar countermeasure game and is the key to whether effective interference can be implemented. Therefore, research on intelligent interference decision methods with self-adaptive characteristics is of great significance for improving interference effectiveness against radars with cognitive abilities. At present, intelligent interference decision-making combined with reinforcement learning is one of the research hotspots in the radar field, and is also considered one of the effective means of solving the multifunctional radar interference problem. In recent years, with the continuous development of deep learning technology, deep reinforcement learning, which combines the advantages of deep learning and reinforcement learning, has performed excellently in intelligent control and decision-making, and is widely applied in robot control, automatic driving, game AI, natural language processing, and other fields. Therefore, many researchers have introduced reinforcement learning into the radar interference decision field, and various interference decision methods based on reinforcement learning have been proposed.
Li Yunjie et al. combined reinforcement learning with the radar countermeasure process for the first time, providing a new idea for radar interference decision methods. Zhang Bakai et al. proposed an interference decision method based on DQN (Deep Q-Network) to address the reduced efficiency of Q-learning-based interference decisions as the number of radar states increases, realizing interference decisions for a variety of radar states. Li Huiqin et al. improved the Q-learning algorithm with ideas from the simulated annealing algorithm and stochastic gradient descent with warm restarts, improving the exploration and utilization of the interference strategy and obtaining faster convergence. Weiqi et al. introduced A3C (Asynchronous Advantage Actor-Critic) into the interference decision field, improving decision time efficiency. Zhu Bakun et al. addressed the slow convergence of the Q-learning algorithm by introducing prior knowledge and combining it with the Dyna architecture, improving the convergence rate. Liu Hongdi et al. proposed a two-stage interference decision algorithm that realizes strategy optimization of both interference patterns and interference parameters. Li Yongfeng, by analyzing the sample sampling method, proposed a DDQN (Double Deep Q-Network) interference decision method based on supervised sampling, improving decision efficiency and stability. Liu Songtao et al. further analyzed the DQN algorithm and proposed a D3QN (Dueling Double Deep Q-Network) based radar interference decision algorithm that improves the efficiency and accuracy of interference decisions.
However, as the radar state and interference action spaces grow, the traditional Q-learning algorithm faces enormous computation and storage pressure, its complexity increases exponentially, the timeliness of the jammer's decisions is seriously affected, and the decision efficiency of the jammer is low. In addition, existing intelligent jammer decision-learning methods based on reinforcement learning still suffer from low learning efficiency and high trial-and-error cost in the countermeasure game.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a multifunctional radar interference decision method based on an imperfect expert strategy. The technical problems to be solved by the invention are realized by the following technical scheme:
the embodiment of the invention provides a multifunctional radar interference decision method of an imperfect expert strategy, which is based on a multifunctional radar interference decision model, and utilizes a trained decision network to select an interference pattern according to radar states so as to implement interference on the radar, so that the radar generates a new radar state; the trained decision network comprises an expert decision network module, an interference decision network, an expert interference discrimination function module and an expert interference exploration function module; the method comprises the steps of:
acquiring a radar state;
when the radar state is inconsistent with the radar target state, judging whether the radar state belongs to an expert decision error state set or not by utilizing the expert intervention discriminant function module;
when the radar state is judged to belong to the expert decision error state set, selecting an interference pattern by using a main decision network in the interference decision network; when the radar state is judged not to belong to the expert decision error state set, judging whether an expert strategy participates in the interference decision by using the expert interference exploration function module according to the radar state;
when judging that the expert strategy participates in the interference decision, selecting an interference pattern by using the expert decision network module; and when judging that the expert strategy does not participate in the interference decision, selecting an interference pattern by using a main decision network in the interference decision network.
In one embodiment of the present invention, the multi-functional radar interference decision model is:
<S,J,P,R,T>;
wherein S is the radar state space, S = {s_1, s_2, …, s_N}, s_n (n = 1, 2, …, N) is a radar state in S, and N is the number of radar states; J is the interference pattern space, J = {j_1, j_2, …, j_M}, j_m (m = 1, 2, …, M) is an interference pattern of the jammer in J, and M is the number of interference patterns; P: S × J × S → [0,1] is the state transition probability, T is the ending signal, and R is the reward function;
the reward function R is defined as:

$$r_t = \begin{cases} 100, & s_{t+1} = s_{target} \\ -1, & L_{t+1} < L_t,\ s_{t+1} \neq s_{target} \\ 1, & L_{t+1} \geq L_t \end{cases}$$

wherein r_t is the interference gain at time t, t is the time before interference is implemented, t+1 is the time after interference is implemented and the radar signal is detected again, L_t is the pre-interference radar threat level, L_{t+1} is the post-interference radar threat level, s_target is the radar target state, s_t is the radar state when interference is implemented, and s_{t+1} is the radar state after interference.
In one embodiment of the present invention, the method for constructing the expert decision network module includes the steps of:
taking expert data as a training sample set; the method comprises the steps of using a current multifunctional radar state in expert data as input of an initial expert decision network, using probability distribution of interference patterns adopted in the current multifunctional radar state as output of the initial expert decision network, and using interference patterns corresponding to the multifunctional radar state in the expert data as labels through a behavior cloning method;
constructing an initial expert decision network represented by a deep neural network;
and training the initial expert decision network by using the training sample set, and back-propagating a cross entropy loss function to update parameters of the initial expert decision network to obtain the expert decision network module.
In one embodiment of the present invention, the expert data is represented in the form of a trace as:
Γ = {(s_i, j_i) | i = 1, 2, …, N_e};
wherein s_i is the multifunctional radar state, j_i is the interference pattern of the jammer, j_i ∈ J, J is the interference pattern space, (s_i, j_i) is the i-th piece of expert knowledge, and N_e represents the number of pieces of expert knowledge;
the cross entropy loss function is:

$$Loss = -\sum_{k=1}^{M} p_e(j_k \mid s)\, \log \hat{p}(j_k \mid s)$$

wherein M is the number of interference patterns, j_k is the k-th interference pattern, \hat{p}(j_k \mid s) is the probability distribution of the interference pattern adopted in the current multifunctional radar state, and p_e(·|s) is the probability distribution of the interference pattern corresponding to the multifunctional radar state in the expert data.
In one embodiment of the present invention, the interference decision network includes a main decision network and a target decision network, and the main decision network and the target decision network have the same structure, and each includes: a third fully-connected layer, a second active layer, a fourth fully-connected layer, a third active layer, a fifth fully-connected layer, a sixth fully-connected layer, and an addition module, wherein,
the third full-connection layer, the second activation layer, the fourth full-connection layer and the third activation layer are sequentially connected;
the input end of the fifth full-connection layer and the input end of the sixth full-connection layer are connected with the output end of the third full-connection layer;
the output end of the fifth full-connection layer and the output end of the sixth full-connection layer are both connected with the input end of the adding module;
the output end of the adding module is used as the output end of the main decision network or the target decision network.
In one embodiment of the present invention, the trained decision network is obtained by training a decision network, and the training method of the decision network includes the steps of:
s2041, setting the total number of training games, the maximum number of single-round games and an exploration factor, and initializing network parameters, learning rate, experience pool and sampling batch size of the interference decision network;
s2042, acquiring a current radar state, and inputting the current radar state into the expert intervention discriminant function module when the current radar state is inconsistent with the radar target state; when the current radar state is judged to be consistent with the radar target state, starting a new training round until the training times reach the total training game times;
s2043, when the expert intervention discriminant function module judges that the current radar state belongs to an expert decision error state set, selecting a current interference pattern by using a main decision network in the interference decision network to implement interference; when the expert intervention discriminant function module judges that the current radar state does not belong to an expert decision error state set, judging the participation degree of an expert strategy in the interference decision by utilizing the expert interference exploration function module according to the current radar state; when judging that the expert strategy participates in the interference decision, selecting a current interference pattern by utilizing the expert decision network module to implement interference; when judging that the expert strategy does not participate in the interference decision, selecting a current interference pattern by using a main decision network in the interference decision network to implement interference;
s2044, acquiring a radar state after interference, evaluating current interference benefits by combining the current radar state, and storing the current radar state, the current interference pattern, the current interference benefits, the radar state after interference and a current ending signal as a combination into an experience pool;
s2045, sampling experience samples from the experience pool according to the size of the sampling batch by adopting a sampling mode of preferential experience playback so as to train and update the main decision network, and back-propagating by utilizing a target loss function so as to update main decision network parameters;
s2046, updating target decision network parameters by combining the super parameters and the main decision network parameters;
s2047, storing expert strategy error information into the expert decision error state set;
s2048, when the number of single-round games is smaller than the maximum number of single-round games, returning to the step S2042; and when the number of single-round games is larger than or equal to the maximum number of single-round games, returning to the step S2041 until the training number reaches the total number of training games, and obtaining the trained decision network.
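For orientation only, the following Python sketch shows one way steps S2041 to S2048 could be organized as an outer training loop. Every name in it (env, expert_net, main_net, buffer, update_networks, the expert-error criterion, the decay rate) is an illustrative assumption rather than something prescribed by this embodiment.

```python
import random

def train_decision_network(env, expert_net, main_net, target_net, buffer, update_networks,
                           n_round=500, n_max=50, eps_e=0.9, eps_decay=0.99, s_target=15):
    """Illustrative outer loop for steps S2041-S2048 (all collaborators are assumed objects)."""
    error_set = set()                                   # expert decision error state set (S2047)
    for episode in range(n_round):                      # S2041: total number of training games
        s = env.reset()
        for step in range(n_max):                       # S2048: maximum number of single-round games
            if s == s_target:                           # S2042: target state reached, start new round
                break
            # S2043: decide whether the expert strategy or the main decision network acts
            used_expert = (s not in error_set) and (random.random() < eps_e)
            j = expert_net.select_pattern(s) if used_expert else main_net.select_pattern(s)
            # S2044: implement interference, observe the new radar state and interference gain
            s_next, r, done = env.step(j)
            buffer.store(s, j, r, s_next, done)
            # S2045/S2046: sample from the experience pool and update main/target networks
            update_networks(main_net, target_net, buffer)
            # S2047: record states where the expert decision turned out to be wrong
            # (the concrete error criterion is not specified here; this call is an assumption)
            if used_expert and env.expert_decision_was_wrong(s, j, s_next):
                error_set.add(s)
            s = s_next
        eps_e *= eps_decay                              # expert participation decays over training
```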
In one embodiment of the invention, the objective loss function is:
$$Loss_{D3QN}(\theta) = \mathbb{E}\Big[\big(r_t + \gamma\, Q_T\big(s_{t+1}, \arg\max_{j \in J} Q(s_{t+1}, j; \theta);\, \theta^-\big) - Q(s_t, j_t; \theta)\big)^2\Big]$$

wherein D3QN denotes the D3QN algorithm, θ is the main decision network parameter, Q(s_t, j_t; θ) is the interference value output by the main decision network for taking j_t in s_t, s_t is the current radar state, j_t is the current interference pattern, r_t is the current interference gain, γ is the discount factor, Q_T is the target decision network, s_{t+1} is the post-interference radar state, J is the interference pattern space, j is an interference pattern of the jammer, Q(s_{t+1}, j; θ) is the interference value output by the main decision network for taking j in s_{t+1}, and θ^- is the target decision network parameter;
the update formula of the main decision network parameter is as follows:
$$\theta_{new} = \theta_{old} - l \cdot \nabla_{\theta_{old}} Loss_{D3QN}(\theta_{old})$$

wherein θ_new is the updated main decision network parameter, θ_old is the main decision network parameter before updating, l is the learning rate, and ∇_{θ_old} Loss_{D3QN}(θ_old) is the gradient of the target loss function with respect to θ_old;
the updating formula of the target decision network parameter is as follows:
$$\theta^{-}_{new} = \tau\, \theta_{new} + (1 - \tau)\, \theta^{-}_{old}$$

wherein θ^-_new is the updated target decision network parameter, θ^-_old is the target decision network parameter before updating, and τ is the hyperparameter.
In one embodiment of the present invention, the output strategy of the trained decision network is:
$$\pi(\cdot \mid s) = \mathcal{T}(s)\,\mathcal{G}(s)\,\pi_e(\cdot \mid s) + \big(1 - \mathcal{T}(s)\,\mathcal{G}(s)\big)\,\pi_q(\cdot \mid s)$$

wherein \mathcal{T}(s) is the expert intervention discriminant function module, \mathcal{G}(s) is the expert interference exploration function module, π_e is the expert strategy in the expert decision network module, π_q is the interference strategy in the interference decision network, and s is the radar state.
In one embodiment of the present invention, the expert intervention discriminant function module is defined as:
$$\mathcal{T}(s) = \begin{cases} 1, & s \notin \Omega_e \\ 0, & s \in \Omega_e \end{cases}$$

wherein s is the radar state and Ω_e is the expert decision error state set; \mathcal{T}(s) = 1 represents making the decision with the expert strategy π_e, and \mathcal{T}(s) = 0 represents making the decision with the interference strategy π_q.
In one embodiment of the present invention, the expert interference exploration function module is defined as:

$$\mathcal{G}(s) = \begin{cases} 1, & \xi < \varepsilon_e \\ 0, & \xi \geq \varepsilon_e \end{cases}$$

wherein s is the radar state, Ω_e is the expert decision error state set, ξ is a random number between 0 and 1, and ε_e is the exploration factor; \mathcal{G}(s) = 1 represents that the expert strategy π_e participates in the decision, and \mathcal{G}(s) = 0 represents that the interference strategy π_q is used to explore an interference pattern in the radar state s.
Compared with the prior art, the invention has the beneficial effects that:
1. according to the multifunctional radar interference decision method based on the imperfect expert strategy, on the basis of a traditional deep reinforcement learning interference decision algorithm, expert knowledge accumulated in the radar countermeasure field is integrated into the interference decision method, and the expert decision network module and the interference decision network are combined to jointly carry out interference decision, so that the learning efficiency and decision accuracy of the interference decision algorithm are effectively improved, and the countermeasure game trial-and-error cost is reduced.
2. The multifunctional radar interference decision method takes into account how demanding the optimality requirement on an expert strategy is. On the premise that the expert strategy provides safety guarantees and exploration guidance for the decision network, the expert intervention discriminant function module and the expert interference exploration function module are introduced, and the decision network is used to learn from and correct expert knowledge of different qualities, improving the decision accuracy and the timeliness of decisions.
Drawings
Fig. 1 is a schematic flow chart of a multifunctional radar interference decision method of an imperfect expert strategy according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a method for acquiring a trained decision network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an initial expert decision network according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an interference decision network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a training implementation framework of a decision network according to an embodiment of the present invention;
FIG. 6 is a diagram of a radar state transition relationship provided in an embodiment of the present invention;
FIG. 7 is a graph of decision accuracy provided by an embodiment of the present invention;
fig. 8 is a graph of another decision accuracy provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a multifunctional radar interference decision method of an imperfect expert strategy according to an embodiment of the present invention. The method is based on a constructed multifunctional radar interference decision model, and a trained decision network is utilized to select an interference pattern according to radar states so as to implement interference on the radar, so that the radar generates new radar states. The trained decision network comprises an expert decision network module, an interference decision network, an expert interference discriminant function module and an expert interference exploration function module. The method comprises the following steps:
s101, acquiring radar states.
S102, when the radar state is inconsistent with the radar target state, judging whether the radar state belongs to an expert decision error state set or not by using an expert intervention discriminant function module. When the radar state is judged to be consistent with the radar target state, interference decision is not needed.
S103, when the radar state is judged to belong to the expert decision error state set, selecting an interference pattern by using a main decision network in the interference decision network; and judging the participation degree of the expert strategy for interference decision by utilizing the expert interference exploration function module according to the radar state when the radar state is judged not to belong to the expert decision error state set, namely judging whether the expert strategy participates in decision.
S104, when judging that the expert strategy participates in interference decision, selecting an interference pattern by using the expert decision network module; and when judging that the expert strategy does not participate in the interference decision, selecting an interference pattern by using a main decision network in the interference decision network.
Referring to fig. 2, fig. 2 is a schematic diagram of a method for acquiring a trained decision network according to an embodiment of the present invention. The method comprises the following steps:
s201, constructing a multifunctional radar interference decision model.
Specifically, in a complex electromagnetic environment the non-cooperative radar exhibits uncertainty and dynamics during the game between the two countering parties, and the radar state sequence satisfies the Markov property, i.e. the countermeasure system depends only on the current radar state and the jammer's interference strategy. Thus, the multifunctional radar interference decision problem is modeled as a Markov decision process (MDP). MDPs are a framework for formalizing sequential decision problems for random dynamic systems with the Markov property and are described by a five-tuple <S, J, P, R, T>, wherein S is the state set, used here to represent the radar state space; J is the action set, used here to represent the interference pattern space; P: S × J × S → [0,1] is the state transition probability, i.e. the probability p(s_{t+1} | s_t, j_t) that the environment transitions to state s_{t+1} after the agent takes action j_t in state s_t; R is the reward function; and T is the end signal. Under the MDP framework, the core objective is to find the optimal strategy, i.e. the optimal mapping π: S → J from the state space to the action space that obtains the maximum expected reward:
$$\pi^{*} = \arg\max_{\pi}\, \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T_{end}} \gamma^{t}\, r_t\Big]$$

wherein π* represents the optimal strategy, π represents a strategy, γ is the discount factor, t represents the time step, T_end represents the step at which a round of interaction ends, and π_t represents the policy taken at time t.
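As a small illustration of this objective, the following sketch (an assumed example, not part of this embodiment) computes the discounted return of one round of interaction from a list of interference gains r_t:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over one round of interaction (t = 0 .. T_end)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example: a round that ends by reaching the radar target state (final reward 100).
print(discounted_return([1, -1, -1, 100], gamma=0.9))
```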
In the multifunctional radar interference decision model, the radar state space is defined as S = {s_1, s_2, …, s_N}, characterizing radar threat information, i.e. the radar states; the radar target state is denoted s_target, indicating that the interference purpose has been achieved; and N represents the number of radar states. The interference pattern space is defined as J = {j_1, j_2, …, j_M}, the set of interference patterns that the jammer can adopt, e.g. noise suppression interference, dense decoy interference, smart noise interference, comb spectrum interference, etc., where M represents the number of interference patterns. The reward function R represents the evaluation of interference effectiveness after a specific interference is taken, and R is defined according to the radar threat level variation before and after interference:
(1) The threat level is minimized, i.e. the target radar state s_target is reached: the reward is set to r = 100;
(2) The threat level decreases, but not to the minimum: the reward is set to r = -1;
(3) The threat level is unchanged or raised: the reward is set to r = 1.
Based on the definition, the multifunctional radar interference decision model based on the Markov decision process is as follows:
<S,J,P,R,T>;
wherein S is the radar state space, S = {s_1, s_2, …, s_N}, s_n (n = 1, 2, …, N) is a radar state in S, and N is the number of radar states; J is the interference pattern space, J = {j_1, j_2, …, j_M}, j_m (m = 1, 2, …, M) is an interference pattern of the jammer in J, and M is the number of interference patterns; P: S × J × S → [0,1] is the state transition probability, T is the ending signal, and R is the reward function. The reward function R is defined as:

$$r_t = \begin{cases} 100, & s_{t+1} = s_{target} \\ -1, & L_{t+1} < L_t,\ s_{t+1} \neq s_{target} \\ 1, & L_{t+1} \geq L_t \end{cases}$$

wherein r_t is the interference gain at time t, t is the time before interference is implemented, t+1 is the time after interference is implemented and the radar signal is detected again, L_t is the pre-interference radar threat level, L_{t+1} is the post-interference radar threat level, s_target is the radar target state, s_t is the radar state when interference is implemented, and s_{t+1} is the radar state after interference.
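The reward definition above maps directly onto a small function; the sketch below is illustrative only, and the way threat levels are encoded as comparable numbers is an assumption.

```python
def interference_reward(threat_before, threat_after, next_state, target_state):
    """Reward r_t from the radar threat level before/after interference (cases (1)-(3) above)."""
    if next_state == target_state:          # (1) threat level minimized: target state reached
        return 100
    if threat_after < threat_before:        # (2) threat level decreases, but not to the minimum
        return -1
    return 1                                # (3) threat level unchanged or raised
```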
Based on the above-described multi-functional radar interference decision model, the entire radar countermeasure process can be described as: the jammer determines radar state information by detecting signals, then adopts a certain interference pattern according to an interference strategy, analyzes the threat level change of the radar to obtain rewards, adopts an anti-interference means after interference, and the radar state is transferred to a new state. In the continuous countermeasure process, the jammer continuously updates the interference strategy through rewarding benefits to obtain maximum interference benefits, and self-adaptive interference to the multifunctional radar is realized.
S202, parameterizing expert knowledge information into an expert decision network model.
Parameterizing the radar countermeasure domain expert knowledge information by behavior cloning, and converting the parameterized radar countermeasure domain expert knowledge information into an expert decision network model represented by a deep neural network. The method specifically comprises the following steps:
s2021, taking expert data as a training sample set; the method comprises the steps of taking a current multifunctional radar state in expert data as input of an initial expert decision network, taking probability distribution of interference patterns adopted in the current multifunctional radar state as output of the initial expert decision network, and taking the interference patterns corresponding to the multifunctional radar state in the expert data as labels through a behavior cloning method.
Specifically, behavior cloning is one of the three main imitation learning methods, and its learning task is as follows: the policy that solves the problem, i.e. the expert strategy, is imitated from expert data, which is essentially supervised learning. Assuming that the expert knowledge information (i.e. expert data) in the radar countermeasure field is Γ, the expert data is expressed as:
Γ = {(s_i, j_i) | i = 1, 2, …, N_e};
wherein s_i is the multifunctional radar state, j_i is the interference pattern of the jammer, j_i ∈ J, J is the interference pattern space, (s_i, j_i) is the i-th piece of expert knowledge, i.e. taking interference action j_i in state s_i, and N_e represents the number of pieces of expert knowledge.
Behavior cloning takes the expert data Γ as the training sample set: the current multifunctional radar state s in the expert data Γ is used as the input of the initial expert decision network model, the probability distribution \hat{p}(·|s) of the interference pattern adopted in state s is used as the output, and the interference pattern corresponding to the multifunctional radar state in the expert data Γ is used as the label, wherein the label is in one-hot coding form and can also be recorded as a probability distribution p_e(·|s).
S2022, constructing an initial expert decision network represented by the deep neural network.
The initial expert decision network is represented by a deep neural network and comprises a full-connection layer, an activation layer and a normalization layer, wherein the layer numbers and the hierarchical structures of the full-connection layer, the activation layer and the normalization layer can be set according to actual conditions.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an initial expert decision network according to an embodiment of the present invention, where the initial expert decision network in fig. 3 includes a first full connection layer, a first activation layer, a second full connection layer, and a normalization layer that are sequentially connected. Specifically, the first activation layer comprises a ReLU activation layer and the normalization layer comprises a softmax layer.
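For illustration, the structure of Fig. 3 (fully-connected layer, ReLU activation, fully-connected layer, softmax normalization) could be written as the following PyTorch module; the layer width and the one-hot state encoding are assumptions of the sketch, not requirements of the embodiment.

```python
import torch
import torch.nn as nn

class ExpertDecisionNet(nn.Module):
    """Initial expert decision network sketch: FC -> ReLU -> FC -> softmax (cf. Fig. 3)."""
    def __init__(self, n_states=16, n_patterns=10, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(n_states, hidden)     # first fully-connected layer
        self.act1 = nn.ReLU()                      # first activation layer
        self.fc2 = nn.Linear(hidden, n_patterns)   # second fully-connected layer
        self.norm = nn.Softmax(dim=-1)             # normalization layer

    def forward(self, state_onehot):
        return self.norm(self.fc2(self.act1(self.fc1(state_onehot))))

# Example: probability distribution over interference patterns for radar state 3.
net = ExpertDecisionNet()
s = torch.nn.functional.one_hot(torch.tensor(3), num_classes=16).float()
print(net(s))
```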
S2023, training the initial expert decision network by using the training sample set, and back-propagating the cross entropy loss function to update parameters of the initial expert decision network to obtain an expert decision network module. The specific training steps comprise:
2023a) Randomly extracting expert data samples and sending them into the initial expert decision network to obtain the probability distribution \hat{p}(·|s) of the interference pattern adopted in the multifunctional radar state s;
2023b) Calculating a cross entropy loss function:
$$Loss = -\sum_{k=1}^{M} p_e(j_k \mid s)\, \log \hat{p}(j_k \mid s)$$

wherein M is the number of interference patterns, j_k is the k-th interference pattern, \hat{p}(j_k \mid s) is the probability distribution of the interference pattern adopted in the current multifunctional radar state, and p_e(·|s) is the probability distribution of the interference pattern corresponding to the multifunctional radar state in the expert data.
2023c) Back-propagating the cross entropy loss function Loss to update the parameters of the initial expert decision network; when the loss function converges, the expert decision network module is obtained.
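Steps 2023a) to 2023c) amount to a standard supervised (behavior cloning) loop. The sketch below is a hedged illustration: it reuses the ExpertDecisionNet sketch above, and the synthetic expert data, the Adam optimizer and all hyperparameter values are assumptions.

```python
import torch
import torch.nn.functional as F

def train_expert_net(net, expert_states, expert_patterns, n_states=16,
                     epochs=200, lr=1e-3, batch_size=32):
    """Behavior cloning: minimize cross entropy between expert labels and network output."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    states = F.one_hot(expert_states, num_classes=n_states).float()
    for _ in range(epochs):
        idx = torch.randint(0, len(states), (batch_size,))        # 2023a) sample expert data
        probs = net(states[idx])                                   # predicted p_hat(.|s)
        # 2023b) cross entropy against one-hot expert labels p_e(.|s)
        loss = F.nll_loss(torch.log(probs + 1e-8), expert_patterns[idx])
        opt.zero_grad()
        loss.backward()                                            # 2023c) back-propagate
        opt.step()
    return net

# Example with synthetic expert knowledge (s_i, j_i); real data would come from domain experts.
states = torch.randint(0, 16, (200,))
patterns = torch.randint(0, 10, (200,))
expert_net = train_expert_net(ExpertDecisionNet(), states, patterns)
```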
S203, constructing an interference decision network based on D3 QN.
The interference decision network is constructed based on the D3QN algorithm and comprises a main decision network Q(s, j; θ) and a target decision network Q_T(s, j; θ^-). The main decision network Q(s, j; θ) is used for interference decision, and the target decision network Q_T(s, j; θ^-) is used for updating the main decision network parameters. The main decision network Q(s, j; θ) and the target decision network Q_T(s, j; θ^-) have the same network structure: both contain fully-connected layers and activation layers, and the number of layers and the hierarchical structure can be set according to the actual situation.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an interference decision network according to an embodiment of the present invention. As shown in fig. 4, the main decision network and the target decision network each include: the third full connection layer, the second activation layer, the fourth full connection layer, the third activation layer, the fifth full connection layer, the sixth full connection layer and the addition module. The third full-connection layer, the second activation layer, the fourth full-connection layer and the third activation layer are sequentially connected; the input end of the fifth full-connection layer and the input end of the sixth full-connection layer are both connected with the output end of the third full-connection layer; the output end of the fifth full-connection layer and the output end of the sixth full-connection layer are both connected with the input end of the adding module, and the output of the fifth full-connection layer and the output of the sixth full-connection layer are added by using addition operation; the output end of the adding module is used as the output end of the main decision network or the target decision network. Specifically, the second activation layer and the third activation layer both comprise a ReLU activation layer; the fifth full connection layer has 1 node, the sixth full connection layer has M nodes, and M represents the number of interference patterns.
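One possible reading of Fig. 4 is the dueling structure sketched below in PyTorch: a shared trunk of two fully-connected layers with ReLU activations, a one-node value head, an M-node advantage head, and an addition module. The branch point of the two heads, the layer widths, and the plain addition (a common D3QN variant also subtracts the mean advantage) follow this reading and are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Main/target decision network sketch: shared trunk, value head, advantage head, addition."""
    def __init__(self, n_states=16, n_patterns=10, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_states, hidden), nn.ReLU(),   # third FC layer + second activation layer
            nn.Linear(hidden, hidden), nn.ReLU(),     # fourth FC layer + third activation layer
        )
        self.value_head = nn.Linear(hidden, 1)          # fifth FC layer: state value (1 node)
        self.adv_head = nn.Linear(hidden, n_patterns)   # sixth FC layer: advantages (M nodes)

    def forward(self, state_onehot):
        h = self.trunk(state_onehot)
        return self.value_head(h) + self.adv_head(h)    # addition module: Q(s, j) per pattern

# Example: interference values for every pattern in radar state 3.
q_net = DuelingQNet()
s = torch.nn.functional.one_hot(torch.tensor(3), num_classes=16).float()
print(q_net(s))
```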
S204, constructing a decision network, and training the decision network to obtain a trained decision network.
The decision network constructed in this embodiment includes: (1) the expert decision network module constructed in step S202; (2) the interference decision network constructed in step S203; (3) the expert intervention discriminant function module \mathcal{T}(s); and (4) the expert interference exploration function module \mathcal{G}(s). The expert decision network module characterizes the expert strategy π_e; the interference decision network characterizes the online-learned interference strategy π_q; the expert intervention discriminant function module \mathcal{T}(s) determines, according to the radar state s currently detected by the jammer, whether the interference action is decided by the expert decision network module; and the expert interference exploration function module \mathcal{G}(s) controls the participation degree of the expert strategy according to the radar state s currently detected by the jammer.
In this embodiment, the expert decision network module and the interference decision network decide jointly, forming a hybrid interference decision mechanism. The purpose of this mechanism is twofold: first, wrong expert knowledge is re-explored and re-learned, improving the efficiency of interference decision-making; second, while the expert participates in decisions, possibilities beyond the expert strategy are explored with a certain probability, avoiding locking onto an expert strategy that is only a suboptimal solution. The final output strategy π of the decision network (i.e. the output strategy of the trained decision network) is expressed as:
$$\pi(\cdot \mid s) = \mathcal{T}(s)\,\mathcal{G}(s)\,\pi_e(\cdot \mid s) + \big(1 - \mathcal{T}(s)\,\mathcal{G}(s)\big)\,\pi_q(\cdot \mid s)$$

wherein π_e is the expert strategy in the expert decision network module, π_q is the interference strategy in the interference decision network, and s is the radar state. \mathcal{T}(s) is the expert intervention discriminant function module: a knowledge base established during the game between the two parties records the error information of the expert strategy, the expert decision error state set is denoted Ω_e, and the expert intervention discriminant function module is defined as

$$\mathcal{T}(s) = \begin{cases} 1, & s \notin \Omega_e \\ 0, & s \in \Omega_e \end{cases}$$

where \mathcal{T}(s) = 1 represents making the decision with the expert strategy π_e and, conversely, \mathcal{T}(s) = 0 represents making the decision with the interference strategy π_q. \mathcal{G}(s) is the expert interference exploration function module, defined as

$$\mathcal{G}(s) = \begin{cases} 1, & \xi < \varepsilon_e \\ 0, & \xi \geq \varepsilon_e \end{cases}$$

wherein ξ represents a random number between 0 and 1 and ε_e represents the exploration factor; \mathcal{G}(s) = 1 represents that the expert strategy π_e participates in the decision and, conversely, \mathcal{G}(s) = 0 represents that the interference strategy π_q is used to explore interference patterns in state s.
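Putting the two gating functions and the two networks together, the output strategy can be sketched as follows; the select_pattern interfaces and the representation of the error state set as a Python set are assumptions.

```python
import random

def hybrid_select_pattern(s, expert_net, main_net, error_set, eps_e):
    """Output strategy pi: expert strategy pi_e when T(s)*G(s) = 1, otherwise interference strategy pi_q."""
    t = 0 if s in error_set else 1            # expert intervention discriminant function T(s)
    g = 1 if random.random() < eps_e else 0   # expert interference exploration function G(s)
    if t * g == 1:
        return expert_net.select_pattern(s)   # pi_e: expert decision network module
    return main_net.select_pattern(s)         # pi_q: main decision network of the interference decision network
```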
It should be noted that in the hybrid interference decision mechanism, the interference decision network is trained with data collected through interaction under the final policy π, and in some states that data is generated by the expert strategy π_e. In those states there is a policy difference between the expert strategy π_e and the interference strategy π_q. This policy difference affects the state distributions induced by the hybrid policy and by the interference policy, D_π(s) and D_{π_q}(s); when the difference between them is too large, the training of the decision network becomes unstable and the decision performance degrades. The state distribution difference between the hybrid strategy and the interference strategy is constrained by the policy difference, from which an upper bound on the state distribution difference between the final output strategy and the interference strategy can be obtained; in this bound, E denotes the expectation, s ~ π denotes that s obeys the distribution induced by policy π, γ is the discount factor, and the remaining coefficient is referred to as the intervention factor.
As can be seen from the above bound, the upper limit of the state distribution difference is constrained by the policy difference and the intervention factor. The policy difference is not controllable, but the state distribution difference can be reduced by reducing the participation degree of the expert policy. Therefore, this embodiment gradually reduces the proportion of decisions in which the expert strategy participates, improving the stability of decision network training; concretely, the exploration factor ε_e is gradually decreased.
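The gradual reduction of ε_e can be realized with any monotonically decreasing schedule; the exponential decay below is only one assumed choice.

```python
def exploration_factor(episode, eps_start=0.9, eps_min=0.05, decay=0.99):
    """Expert participation probability eps_e, shrinking as training rounds accumulate."""
    return max(eps_min, eps_start * (decay ** episode))

# Example: eps_e over the first training rounds.
print([round(exploration_factor(ep), 3) for ep in (0, 50, 100, 200)])
```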
Referring to fig. 5, fig. 5 is a schematic diagram of a training implementation framework of a decision network according to an embodiment of the present invention, and a training method of the decision network includes the steps of:
s2041, setting the total number of training games N_round, the maximum number of single-round games N_max and the exploration factor ε_e, and initializing the network parameters of the interference decision network, the learning rate l, the experience pool H and the sampling batch size N_b.
S2042, obtaining the current radar state s_t; when the current radar state s_t is judged to be inconsistent with the radar target state s_target, i.e. s_t ≠ s_target, inputting the current radar state into the expert intervention discriminant function module and turning to step S2043; when the current radar state s_t is judged to be consistent with the radar target state s_target, starting a new training round, until the number of training rounds reaches the total number of training games N_round.
S2043, if the current radar state s_t belongs to the expert decision error state set Ω_e, the current interference pattern j_t is selected by the interference decision network and interference is implemented; otherwise, the expert interference exploration function controls whether the expert strategy participates in the decision.
Specifically, when the expert intervention discriminant function module judges that the current radar state s_t belongs to the expert decision error state set Ω_e, the current interference pattern j_t is selected by the main decision network in the interference decision network to implement interference; when the expert intervention discriminant function module judges that the current radar state s_t does not belong to the expert decision error state set Ω_e, the participation degree of the expert strategy in the interference decision is judged by the expert interference exploration function module according to the current radar state s_t; when it is judged that the expert strategy participates in the interference decision, the current interference pattern j_t is selected by the expert decision network module to implement interference; when it is judged that the expert strategy does not participate in the interference decision, the current interference pattern j_t is selected by the main decision network in the interference decision network to implement interference.
S2044, acquiring the post-interference radar state s_{t+1}, evaluating the current interference gain r_t in combination with the current radar state, and storing the current radar state s_t, the current interference pattern j_t, the current interference gain r_t, the post-interference radar state s_{t+1} and the current end signal done as one combination, namely <s_t, j_t, r_t, s_{t+1}, done>, into the experience pool H.
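The experience pool H above, which step S2045 below samples with preferential experience playback (prioritized experience replay), can be illustrated with the compact proportional-priority buffer below; it omits the sum-tree and importance-sampling weights of the full technique, and all names and default values are assumptions.

```python
import random

class PrioritizedReplayBuffer:
    """Experience pool H storing <s_t, j_t, r_t, s_{t+1}, done> with proportional priorities."""
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def store(self, s, j, r, s_next, done):
        if len(self.data) >= self.capacity:          # drop the oldest transition when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append((s, j, r, s_next, done))
        self.priorities.append(max(self.priorities, default=1.0))  # new samples get max priority

    def sample(self, batch_size):
        weights = [p ** self.alpha for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=weights, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + 1e-6     # priority proportional to the TD error
```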
S2045, sampling experience samples from the experience pool H according to the sampling batch size by adopting a sampling mode of preferential experience playback (prioritized experience replay), the number of sampled experience samples being equal to the sampling batch size N_b; training and updating the main decision network Q(s, j; θ), and back-propagating the target loss function to update the main decision network parameters θ. The calculation formula of the target loss function is as follows:
$$Loss_{D3QN}(\theta) = \mathbb{E}\Big[\big(r_t + \gamma\, Q_T\big(s_{t+1}, \arg\max_{j \in J} Q(s_{t+1}, j; \theta);\, \theta^-\big) - Q(s_t, j_t; \theta)\big)^2\Big]$$

wherein D3QN denotes the D3QN algorithm, θ is the main decision network parameter, Q(s_t, j_t; θ) is the interference value output by the main decision network for taking j_t in s_t, s_t is the current radar state, j_t is the current interference pattern, r_t is the current interference gain, γ is the discount factor, Q_T is the target decision network, s_{t+1} is the post-interference radar state, J is the interference pattern space, j is an interference pattern of the jammer, Q(s_{t+1}, j; θ) is the interference value output by the main decision network for taking j in s_{t+1}, and θ^- is the target decision network parameter.
The update formula of the main decision network parameter is as follows:
$$\theta_{new} = \theta_{old} - l \cdot \nabla_{\theta_{old}} Loss_{D3QN}(\theta_{old})$$

wherein θ_new is the updated main decision network parameter, θ_old is the main decision network parameter before updating, l is the learning rate, and ∇_{θ_old} Loss_{D3QN}(θ_old) is the gradient of the target loss function with respect to θ_old.
S2046, updating the target decision network parameters by combining the super parameters and the main decision network parameters.
For example, the hyperparameter τ is set and used as a weight to form a weighted average with the main decision network parameters θ, updating the target decision network parameters θ^-; the update formula is as follows:
$$\theta^{-}_{new} = \tau\, \theta_{new} + (1 - \tau)\, \theta^{-}_{old}$$

wherein θ^-_new is the updated target decision network parameter, θ^-_old is the target decision network parameter before updating, and τ is the hyperparameter.
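Steps S2045 and S2046 together correspond to one D3QN gradient step followed by a soft target-network update. The PyTorch sketch below is illustrative only: it assumes the DuelingQNet and PrioritizedReplayBuffer sketches above, one-hot state encoding, an Adam optimizer in place of the plain gradient step of the update formula, and a standard episode-termination mask; none of these choices are prescribed here.

```python
import torch
import torch.nn.functional as F

def d3qn_update(main_net, target_net, buffer, optimizer, n_states=16,
                batch_size=32, gamma=0.9, tau=0.01):
    """One training step: double-Q target loss, gradient descent on theta, soft update of theta^-."""
    idx, batch = buffer.sample(batch_size)
    s, j, r, s_next, done = zip(*batch)
    s = F.one_hot(torch.tensor(s), num_classes=n_states).float()
    s_next = F.one_hot(torch.tensor(s_next), num_classes=n_states).float()
    j = torch.tensor(j)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    q_sa = main_net(s).gather(1, j.unsqueeze(1)).squeeze(1)           # Q(s_t, j_t; theta)
    with torch.no_grad():
        best_j = main_net(s_next).argmax(dim=1, keepdim=True)          # argmax_j Q(s_{t+1}, j; theta)
        q_target = target_net(s_next).gather(1, best_j).squeeze(1)     # Q_T(s_{t+1}, best_j; theta^-)
        y = r + gamma * (1.0 - done) * q_target                        # bootstrapped target (mask is a standard addition)

    loss = F.mse_loss(q_sa, y)                                         # target loss function
    optimizer.zero_grad()
    loss.backward()                                                    # back-propagate to update theta
    optimizer.step()

    buffer.update_priorities(idx, (y - q_sa).detach().tolist())        # refresh sample priorities

    # S2046: soft update theta^- <- tau * theta + (1 - tau) * theta^-
    with torch.no_grad():
        for p, p_t in zip(main_net.parameters(), target_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
    return loss.item()

# Example wiring (assumed): optimizer = torch.optim.Adam(main_net.parameters(), lr=1e-3)
```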
S2047, recording expert strategy error information and storing it into the expert decision error state set Ω_e.
S2048, when the number of single-round games is less than the maximum number of single-round games N_max, returning to step S2042; when the number of single-round games is greater than or equal to the maximum number of single-round games, returning to step S2041, until the number of training rounds reaches the total number of training games N_round, thereby obtaining the trained decision network. When returning to step S2041 to start a new training round, the exploration factor ε_e is smaller than the exploration factor ε_e of the previous round of training, i.e. the exploration factor ε_e gradually decreases as the number of training rounds increases.
Further, the trained decision network is utilized to carry out the multi-functional radar interference decision according to the method shown in fig. 1.
The embodiment provides a multifunctional radar interference decision method of an imperfect expert strategy, integrates expert knowledge accumulated in the radar countermeasure field into the interference decision method on the basis of a traditional deep reinforcement learning interference decision algorithm, combines an expert decision network module and an interference decision network to jointly carry out interference decision, effectively improves the learning efficiency and decision precision of the interference decision algorithm, and reduces the countermeasure game trial-error cost.
The multifunctional radar interference decision method of the embodiment considers the severity of the optimal condition of the expert strategy, introduces the expert knowledge discrimination processing module and the expert interference exploration function module on the premise of guaranteeing that the expert strategy provides safety guarantee and exploration guidance for the decision network, learns and corrects expert knowledge of different qualities by utilizing the decision network, improves the decision accuracy and improves the timeliness of decision.
The effects of the present embodiment can be further described by the following simulation experiments.
To verify the performance of the method proposed in this embodiment in complex situations, it is assumed that the jammer can transmit 10 different interference patterns, J = {j_1, j_2, …, j_10}, and that the multifunctional radar state space contains 16 states, S = {s_1, s_2, …, s_16}, corresponding to 16 threat levels, where s_1 has the highest threat level and s_16 is the target radar state s_target with the lowest threat level. The radar state transition relationships are shown in fig. 6, which is a radar state transition relationship diagram provided by the embodiment of the invention. The interference task of the jammer is to transfer the multifunctional radar from any state to the radar target state s_target. The reward function is expressed as:

$$r_t = \begin{cases} 100, & s_{t+1} = s_{target} \\ -1, & L_{t+1} < L_t,\ s_{t+1} \neq s_{target} \\ 1, & L_{t+1} \geq L_t \end{cases}$$

wherein L_t and L_{t+1} indicate the radar threat levels before and after the interference.
In order to compare the performance improvement of the method of this embodiment (abbreviated ED3QN_PER) over current interference decision algorithms based on deep reinforcement learning, the decision results are analyzed along the two dimensions of the number of single-round games and the decision accuracy against decision algorithms based on deep Q-learning (DQN), deep Q-learning with preferential experience playback (DQN_PER), double deep Q-learning with preferential experience playback (DDQN_PER), and dueling double deep Q-learning with preferential experience playback (D3QN_PER); fig. 7 is a decision accuracy graph provided by an embodiment of the present invention. From the decision accuracy curves, the method provided by this embodiment guides the interference decision by introducing an expert strategy, and its interference decision accuracy is essentially superior to that of the other algorithms throughout the whole training period, improving safety during the game process.
In order to analyze the performance of the method provided by the embodiment under expert strategies of different quality, the expert strategies of four grades of high, medium, low and poor are taken as references, the influence of the expert strategies of different quality on the performance of the algorithm is analyzed, the decision result is shown in fig. 8, and fig. 8 is another decision precision graph provided by the embodiment of the invention. From the point of view of the decision accuracy curve, under the influence of different quality expert strategies, the convergence speed and accuracy of the decision are excellent, and along with the improvement of the quality of the expert strategies, the decision convergence speed and accuracy are also improved.
In summary, aiming at the difficulty of obtaining an optimal expert interference strategy in the non-cooperative game between radar and jammer in a complex electromagnetic environment, this embodiment introduces expert knowledge to address the long training period and high trial-and-error cost of existing reinforcement-learning-based interference decision algorithms, improves the learning efficiency of the interference decision algorithm, reduces trial-and-error risk, and improves interference decision efficiency and accuracy under imperfect expert knowledge-aided decision conditions.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. A multifunctional radar interference decision method based on an imperfect expert strategy, characterized in that the method is based on a multifunctional radar interference decision model and utilizes a trained decision network to select an interference pattern according to the radar state so as to implement interference on the radar, so that the radar generates a new radar state; the trained decision network comprises an expert decision network module, an interference decision network, an expert intervention discriminant function module and an expert interference exploration function module; the method comprises the steps of:
acquiring a radar state;
when the radar state is inconsistent with the radar target state, judging whether the radar state belongs to an expert decision error state set or not by utilizing the expert intervention discriminant function module;
when the radar state is judged to belong to an expert decision error state set, selecting an interference pattern by using a main decision network in the interference decision network; judging whether an expert strategy participates in interference decision or not by utilizing the expert interference exploration function module according to the radar state when the radar state is judged not to belong to the expert decision error state set;
when judging that the expert strategy participates in the interference decision, selecting an interference pattern by using the expert decision network module; and when judging that the expert strategy does not participate in the interference decision, selecting an interference pattern by using a main decision network in the interference decision network.
2. The imperfect expert policy multi-functional radar interference decision method according to claim 1, wherein the multi-functional radar interference decision model is:
<S,J,P,R,T>;
wherein S is the radar state space, S = {s_1, s_2, …, s_N}, s_n (n = 1, 2, …, N) is a radar state in S, and N is the number of radar states; J is the interference pattern space, J = {j_1, j_2, …, j_M}, j_m (m = 1, 2, …, M) is an interference pattern of the jammer in J, and M is the number of interference patterns; P: S × J × S → [0,1] is the state transition probability, T is the ending signal, and R is the reward function;
the reward function R is defined as:

$$r_t = \begin{cases} 100, & s_{t+1} = s_{target} \\ -1, & L_{t+1} < L_t,\ s_{t+1} \neq s_{target} \\ 1, & L_{t+1} \geq L_t \end{cases}$$

wherein r_t is the interference gain at time t, t is the time before interference is implemented, t+1 is the time after interference is implemented and the radar signal is detected again, L_t is the pre-interference radar threat level, L_{t+1} is the post-interference radar threat level, s_target is the radar target state, s_t is the radar state when interference is implemented, and s_{t+1} is the radar state after interference.
3. The method for multi-functional radar interference decision making with imperfect expert strategy according to claim 1, wherein the method for constructing the expert decision network module comprises the steps of:
taking expert data as a training sample set; the method comprises the steps of using a current multifunctional radar state in expert data as input of an initial expert decision network, using probability distribution of interference patterns adopted in the current multifunctional radar state as output of the initial expert decision network, and using interference patterns corresponding to the multifunctional radar state in the expert data as labels through a behavior cloning method;
constructing an initial expert decision network represented by a deep neural network;
and training the initial expert decision network by using the training sample set, and back-propagating a cross entropy loss function to update parameters of the initial expert decision network to obtain the expert decision network module.
4. A multifunctional radar interference decision method according to claim 3, characterized in that the expert data is represented in trajectory form as:
Γ = {(s_i, j_i) | i = 1, 2, …, N_e};
wherein s_i is the multifunctional radar state, j_i is the interference pattern of the jammer, j_i ∈ J, J is the interference pattern space, (s_i, j_i) is the i-th piece of expert knowledge, and N_e represents the number of pieces of expert knowledge;
the cross entropy loss function is:

$$Loss = -\sum_{k=1}^{M} p_e(j_k \mid s)\, \log \hat{p}(j_k \mid s)$$

wherein M is the number of interference patterns, j_k is the k-th interference pattern, \hat{p}(j_k \mid s) is the probability distribution of the interference pattern adopted in the current multifunctional radar state, and p_e(·|s) is the probability distribution of the interference pattern corresponding to the multifunctional radar state in the expert data.
5. The multifunctional radar interference decision-making method based on an imperfect expert strategy according to claim 1, wherein the interference decision network comprises a main decision network and a target decision network that are identical in structure, each comprising: a third fully-connected layer, a second activation layer, a fourth fully-connected layer, a third activation layer, a fifth fully-connected layer, a sixth fully-connected layer, and an addition module, wherein
the third fully-connected layer, the second activation layer, the fourth fully-connected layer, and the third activation layer are connected in sequence;
the input of the fifth fully-connected layer and the input of the sixth fully-connected layer are both connected to the output of the third fully-connected layer;
the outputs of the fifth fully-connected layer and the sixth fully-connected layer are both connected to the input of the addition module;
the output of the addition module serves as the output of the main decision network or of the target decision network.
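For illustration only: a PyTorch sketch of the branch-and-add structure of claim 5, read as a dueling Q-network head (fifth fully-connected layer as a state-value stream, sixth as an advantage stream, combined by the addition module). The layer widths, the choice to branch after the trunk's last activation, and the mean-subtraction inside the addition are assumptions.

```python
import torch
import torch.nn as nn

class DuelingDecisionNet(nn.Module):
    """Sketch of the main/target decision network of claim 5.
    Widths and the branch point are assumptions for illustration."""
    def __init__(self, n_states: int, n_patterns: int, hidden: int = 64):
        super().__init__()
        self.fc3 = nn.Linear(n_states, hidden)    # third fully-connected layer
        self.act2 = nn.ReLU()                     # second activation layer
        self.fc4 = nn.Linear(hidden, hidden)      # fourth fully-connected layer
        self.act3 = nn.ReLU()                     # third activation layer
        self.fc5 = nn.Linear(hidden, 1)           # fifth FC layer: state value V(s)
        self.fc6 = nn.Linear(hidden, n_patterns)  # sixth FC layer: advantages A(s, j)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.act3(self.fc4(self.act2(self.fc3(s))))  # shared trunk
        v = self.fc5(h)                                   # value stream
        a = self.fc6(h)                                   # advantage stream
        # Addition module: combine both streams into Q(s, j).
        # The mean-subtraction is the usual dueling-network choice, assumed here.
        return v + a - a.mean(dim=-1, keepdim=True)
```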
6. The multifunctional radar interference decision-making method based on an imperfect expert strategy according to claim 5, wherein the trained decision network is obtained by training the interference decision network, the training comprising the steps of:
S2041, setting the total number of training games, the maximum number of single-round games, and an exploration factor, and initializing the network parameters, learning rate, experience pool, and sampling batch size of the interference decision network;
S2042, acquiring the current radar state; when the current radar state is inconsistent with the radar target state, inputting the current radar state into the expert intervention discriminant function module; when the current radar state is judged to be consistent with the radar target state, starting a new training round, until the number of training games reaches the total number of training games;
S2043, when the expert intervention discriminant function module judges that the current radar state belongs to the expert decision error state set, selecting the current interference pattern with the main decision network in the interference decision network to implement interference; when the expert intervention discriminant function module judges that the current radar state does not belong to the expert decision error state set, judging, with the expert interference exploration function module and according to the current radar state, the degree to which the expert strategy participates in the interference decision; when it is judged that the expert strategy participates in the interference decision, selecting the current interference pattern with the expert decision network module to implement interference; when it is judged that the expert strategy does not participate in the interference decision, selecting the current interference pattern with the main decision network in the interference decision network to implement interference;
S2044, acquiring the post-interference radar state, evaluating the current interference gain in combination with the current radar state, and storing the current radar state, the current interference pattern, the current interference gain, the post-interference radar state, and the current ending signal as one tuple in the experience pool;
S2045, sampling experience samples from the experience pool according to the sampling batch size by prioritized experience replay to train and update the main decision network, and back-propagating a target loss function to update the main decision network parameters;
S2046, updating the target decision network parameters by combining the hyperparameter and the main decision network parameters;
S2047, storing expert strategy error information in the expert decision error state set;
S2048, when the number of single-round games is smaller than the maximum number of single-round games, returning to step S2042; when the number of single-round games equals the maximum number of single-round games, returning to step S2041, until the number of training games reaches the total number of training games, thereby obtaining the trained decision network.
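For illustration only: the branching of step S2043 is the distinctive part of the loop, so it is sketched below in Python; steps S2044–S2048 are left as comments because the update rules are sketched after claim 7. The comparison direction against the exploration factor and the way states are keyed into the error set are assumptions.

```python
import random

def select_pattern(s_key, q_main, q_expert, error_set, eps_e):
    """Step S2043 (sketch): choose the current interference pattern.
    s_key     -- hashable encoding of the current radar state
    q_main    -- main decision network scores over the M patterns (list)
    q_expert  -- expert decision network scores over the M patterns (list)
    error_set -- expert decision error state set (claim 9)
    eps_e     -- exploration factor (claim 10); comparison direction assumed"""
    best = lambda q: max(range(len(q)), key=q.__getitem__)
    if s_key in error_set:              # expert already known to fail here
        return best(q_main)             # -> main decision network decides
    if random.random() >= eps_e:        # assumed: expert strategy participates
        return best(q_expert)           # -> expert decision network decides
    return best(q_main)                 # -> interference decision network explores

# S2044: apply the pattern, observe the post-interference state, compute the
#        interference gain and push (s, j, r, s', done) into the experience pool.
# S2045: sample a prioritised batch and update the main decision network
#        (see the loss/update sketch after claim 7).
# S2046: soft-update the target decision network with tau.
# S2047: if the expert's choice proved wrong, add the state to error_set.
# S2048: repeat until the single-round and total game counters are exhausted.
```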
7. The multifunctional radar interference decision-making method based on an imperfect expert strategy according to claim 6, wherein the target loss function is:

wherein D3QN denotes the dueling double deep Q-network algorithm on which the target loss is based, $\theta$ is the main decision network parameter, $Q(s_t,j_t;\theta)$ is the interference value output by the main decision network for taking $j_t$ in state $s_t$, $s_t$ is the current radar state, $j_t$ is the current interference pattern, $r_t$ is the current interference gain, $Q_T$ is the target decision network, $s_{t+1}$ is the post-interference radar state, $J$ is the interference pattern space, $j$ is an interference pattern of the jammer, $Q(s_{t+1},j;\theta)$ is the interference value output by the main decision network for taking $j$ in state $s_{t+1}$, and $\theta^-$ is the target decision network parameter;
the update formula of the main decision network parameters is:

$\theta_{new}=\theta_{old}-l\,\nabla_{\theta_{old}}L(\theta_{old})$

wherein $\theta_{new}$ is the updated main decision network parameter, $\theta_{old}$ is the main decision network parameter before updating, $l$ is the learning rate, and $\nabla_{\theta_{old}}L(\theta_{old})$ is the gradient of the target loss function with respect to $\theta_{old}$;
the update formula of the target decision network parameters is:

$\theta^{-}_{new}=\tau\,\theta+(1-\tau)\,\theta^{-}_{old}$

wherein $\theta^{-}_{new}$ is the updated target decision network parameter, $\theta^{-}_{old}$ is the target decision network parameter before updating, $\theta$ is the updated main decision network parameter, and $\tau$ is the hyperparameter.
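For illustration only: the loss of claim 7 follows the double/dueling DQN pattern (the main network selects the arg-max pattern at $s_{t+1}$, the target network evaluates it), and the parameter updates are a gradient step plus a soft update with $\tau$. The sketch below assumes a discount factor `gamma`, which is not spelled out in the claim text as reproduced here, and plain MSE without prioritised-replay weighting.

```python
import torch
import torch.nn.functional as F

def update_main_net(main_net, target_net, optimizer, batch, gamma=0.9):
    """One gradient step on the claim-7 style target loss (sketch; gamma assumed).
    batch: tensors s [B,d] float, j [B] long, r [B] float, s_next [B,d] float,
    done [B] float (1.0 when the ending signal is set)."""
    s, j, r, s_next, done = batch
    q_sj = main_net(s).gather(1, j.unsqueeze(1)).squeeze(1)        # Q(s_t, j_t; theta)
    with torch.no_grad():
        j_star = main_net(s_next).argmax(dim=1, keepdim=True)      # argmax_j Q(s_{t+1}, j; theta)
        q_next = target_net(s_next).gather(1, j_star).squeeze(1)   # Q_T(s_{t+1}, j*; theta^-)
        y = r + gamma * (1.0 - done) * q_next                      # bootstrapped target
    loss = F.mse_loss(q_sj, y)          # target loss
    optimizer.zero_grad()
    loss.backward()                     # theta_new = theta_old - l * grad(loss)
    optimizer.step()
    return float(loss)

def soft_update(target_net, main_net, tau):
    """Target update: theta^-_new = tau * theta + (1 - tau) * theta^-_old."""
    for p_t, p_m in zip(target_net.parameters(), main_net.parameters()):
        p_t.data.copy_(tau * p_m.data + (1.0 - tau) * p_t.data)
```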
8. The multifunctional radar interference decision-making method based on an imperfect expert strategy according to claim 1, wherein the output policy of the trained decision network is:
wherein $\mathcal{G}(s)$ is the expert intervention discriminant function module, $\mathcal{T}(s)$ is the expert interference exploration function module, $\pi_e$ is the expert policy in the expert decision network module, $\pi_q$ is the interference policy in the interference decision network, and $s$ is the radar state.
9. The multifunctional radar interference decision-making method based on an imperfect expert strategy according to claim 1, wherein the expert intervention discriminant function module is defined as:

$\mathcal{G}(s)=\begin{cases}\pi_q(\cdot\mid s), & s\in S_{err}\\ \pi_e(\cdot\mid s), & s\notin S_{err}\end{cases}$

wherein $s$ is the radar state, $S_{err}$ is the expert decision error state set, $\pi_e(\cdot\mid s)$ denotes making the decision with the expert policy $\pi_e$, and $\pi_q(\cdot\mid s)$ denotes making the decision with the interference policy $\pi_q$.
10. The multifunctional radar interference decision-making method based on an imperfect expert strategy according to claim 1, wherein the expert interference exploration function module is defined as:

wherein $s$ is the radar state, $S_{err}$ is the expert decision error state set, $\xi$ is a random number in $[0,1]$, $\epsilon_e$ is the exploration factor, $\pi_e(\cdot\mid s)$ denotes that the expert policy $\pi_e$ participates in the decision, and $\pi_q(\cdot\mid s)$ denotes exploring an interference pattern in radar state $s$ with the interference policy $\pi_q$.
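For illustration only: a Python sketch of how the modules of claims 8–10 compose into the deployed policy. The direction of the $\xi$ versus $\epsilon_e$ comparison and all function names are assumptions, since the claim formulas are not reproduced here.

```python
import random

def output_policy(s_key, pi_e, pi_q, error_set, eps_e):
    """Composed output policy of claim 8 (sketch).
    pi_e, pi_q -- callables mapping a state key to an interference pattern
    error_set  -- expert decision error state set (claim 9)
    eps_e      -- exploration factor of claim 10 (comparison direction assumed)"""
    # Claim 9: expert intervention discriminant -- never trust the expert
    # in a state where it is already known to decide wrongly.
    if s_key in error_set:
        return pi_q(s_key)
    # Claim 10: expert interference exploration -- draw xi in [0, 1] and let
    # the expert participate, otherwise explore with the interference policy.
    xi = random.random()
    if xi >= eps_e:                      # assumed direction of the comparison
        return pi_e(s_key)
    return pi_q(s_key)
```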
CN202311029543.1A 2023-08-16 2023-08-16 Multifunctional radar interference decision-making method based on imperfect expert strategy Active CN116755046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311029543.1A CN116755046B (en) 2023-08-16 2023-08-16 Multifunctional radar interference decision-making method based on imperfect expert strategy

Publications (2)

Publication Number Publication Date
CN116755046A CN116755046A (en) 2023-09-15
CN116755046B (en) 2023-11-14

Family

ID=87953600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311029543.1A Active CN116755046B (en) 2023-08-16 2023-08-16 Multifunctional radar interference decision-making method based on imperfect expert strategy

Country Status (1)

Country Link
CN (1) CN116755046B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8860602B2 (en) * 2012-10-09 2014-10-14 Accipiter Radar Technologies Inc. Device and method for cognitive radar information network
US20220209885A1 (en) * 2020-12-24 2022-06-30 Viettel Group Method and apparatus for adaptive anti-jamming communications based on deep double-q reinforcement learning
US20230130863A1 (en) * 2021-06-25 2023-04-27 Bae Systems Information And Electronic Systems Integration Inc. Method for signal representation and reconstruction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275174A (en) * 2020-02-13 2020-06-12 中国人民解放军32802部队 Game-oriented radar countermeasure generating method
CN114236477A (en) * 2021-09-01 2022-03-25 西安电子科技大学 Radar interference game strategy design method based on neural network virtual self-alignment
WO2023122629A1 (en) * 2021-12-20 2023-06-29 A10 Systems LLC Waveform agnostic learning-enhanced decision engine for any radio
CN114415126A (en) * 2022-04-02 2022-04-29 中国人民解放军军事科学院国防科技创新研究院 Radar compression type interference decision method based on reinforcement learning
CN114814741A (en) * 2022-05-31 2022-07-29 中国地质大学(武汉) DQN radar interference decision method and device based on priority important sampling fusion
CN115932752A (en) * 2023-01-06 2023-04-07 中国人民解放军海军大连舰艇学院 Radar cognitive interference decision method based on incomplete information game

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A DECEPTIVE JAMMING TEMPLATE SYNTHESIS METHOD FOR SAR USING GENERATIVE ADVERSARIAL NETS;Weiwei Fan et al.;《IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium》;6926-6929 *
A Multi-Domain Anti-Jamming Scheme Based on Bayesian Stackelberg Game With Imperfect Information;YONGCHENG LI et al.;《IEEE Access》;132250-132259 *
Survey and prospect of radar intelligent game anti-jamming technology; Li Kang et al.; Modern Radar; Vol. 45, No. 5; 15-26 *


Similar Documents

Publication Publication Date Title
CN111310915B (en) Data anomaly detection defense method oriented to reinforcement learning
CN110958680B (en) Energy efficiency-oriented unmanned aerial vehicle cluster multi-agent deep reinforcement learning optimization method
CN111695690B (en) Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning
CN112052936B (en) Reinforced learning exploration method and device based on generation countermeasure mechanism
CN114199248B (en) AUV co-location method for optimizing ANFIS based on mixed element heuristic algorithm
CN113570039B (en) Block chain system based on reinforcement learning optimization consensus
CN114896899B (en) Multi-agent distributed decision method and system based on information interaction
CN115952736A (en) Multi-agent target collaborative search method and system
CN113759709A (en) Method and device for training strategy model, electronic equipment and storage medium
CN113780576A (en) Cooperative multi-agent reinforcement learning method based on reward self-adaptive distribution
CN116205298A (en) Opponent behavior strategy modeling method and system based on deep reinforcement learning
CN116755046B (en) Multifunctional radar interference decision-making method based on imperfect expert strategy
CN112257648A (en) Signal classification and identification method based on improved recurrent neural network
CN117319232A (en) Multi-agent cluster consistency cooperative control method based on behavior prediction
CN116896422A (en) Intelligent interference resistant channel decision method based on interference consciousness learning
CN116340737A (en) Heterogeneous cluster zero communication target distribution method based on multi-agent reinforcement learning
CN116866048A (en) Anti-interference zero-and Markov game model and maximum and minimum depth Q learning method
CN116128028A (en) Efficient deep reinforcement learning algorithm for continuous decision space combination optimization
CN115660052A (en) Group intelligent learning method integrating postwitness ideas
CN113344071B (en) Intrusion detection algorithm based on depth strategy gradient
Zhou et al. An evolutionary approach toward dynamic self-generated fuzzy inference systems
CN111445005A (en) Neural network control method based on reinforcement learning and reinforcement learning system
CN117391153A (en) Multi-agent online learning method based on decision-making attention mechanism
CN109670602B (en) Group standard rapid emerging method for social reinforcement learning with teacher-student mechanism
CN115018042A (en) Multi-agent reinforcement learning method based on point-to-point communication and time sequence feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant