CN105388461A - Radar adaptive behavior Q learning method - Google Patents

Info

Publication number
CN105388461A
CN105388461A CN201510729398.7A CN201510729398A CN105388461A CN 105388461 A CN105388461 A CN 105388461A CN 201510729398 A CN201510729398 A CN 201510729398A CN 105388461 A CN105388461 A CN 105388461A
Authority
CN
China
Prior art keywords
waveform
radar
probability
state
bayes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510729398.7A
Other languages
Chinese (zh)
Other versions
CN105388461B (en
Inventor
彭晓燕
杨金金
袁晓垒
张花国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201510729398.7A priority Critical patent/CN105388461B/en
Publication of CN105388461A publication Critical patent/CN105388461A/en
Application granted granted Critical
Publication of CN105388461B publication Critical patent/CN105388461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention belongs to the field of radar signal processing and, in particular, relates to learning and identifying radar adaptive behavior with a Q-learning method based on Bayesian-table updating. The invention provides a Q-learning method for radar adaptive behavior. An improved Q-learning algorithm learns the target radar's time-domain waveform selection behavior (under the minimum mutual information criterion). This goes a significant step beyond traditional jamming, which relies only on the direct information obtained at the receiving end: the proposed machine learning algorithm identifies the radar's time-domain adaptive behavior and produces a learning result. The method applies the Q-learning algorithm based on Bayesian-table updating to radar behavior learning and identification for the first time and, compared with the prior art, achieves better learning performance for time-domain waveform selection (under the minimum mutual information criterion).

Description

A Q-learning method for radar adaptive behavior
Technical field
The invention belongs to the field of radar signal processing and, in particular, relates to learning and identifying radar adaptive behavior with a Q-learning method based on Bayesian-table updating.
Background art
With the emergence of adaptive systems and adaptive signal processing in the 1960s, adaptive radar systems were born. Their adaptive capability has grown steadily, progressing from adaptive processing at the radar receiver alone to joint receiver-transmitter processing. Current radar adaptive behavior is mainly manifested in the behavioral characteristics of time-, frequency-, and spatial-domain operating parameters, signal processing, and operating modes; time-domain adaptive waveform selection is one example. Radar waveform selection is an important means of radar waveform adaptation: the target radar can build a waveform library and choose the transmitted waveform from it according to some criterion in order to improve radar performance. The waveform selection criterion is closely tied to the radar's operating mode (or radar task). According to the existing literature, when the radar task is target recognition, the waveform selection criteria include the maximum mutual information criterion (the goal is for the next transmitted signal to be optimally matched to the target), the minimum mutual information criterion (the goal is for the next transmitted signal to obtain as much new information as possible, so that the shared information is minimal), the maximum Kullback-Leibler information criterion, and others. The present invention takes the minimum mutual information criterion as its object.
In the current radar signal processing field, the jamming side generally performs identification against fixed radar targets. However, intelligence is a trend of future development, and both sides are gradually acquiring cognitive capability. For a target with adaptive behavior such as that described above, a more intelligent algorithm is needed to learn the adaptive behavior; only then can the learning result be used to attack efficiently and in real time.
The Q-learning algorithm is a reinforcement learning algorithm, first proposed by C. Watkins in his 1989 doctoral dissertation "Learning from Delayed Rewards". It effectively combines dynamic programming theory with the psychology of animal learning, and aims to solve sequential decision problems with delayed rewards. In Q-learning, the action-value function of a Markov decision process is computed iteratively by temporal differences, with the update formula: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a \in A_s} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$, where the parameter $\alpha$ is the learning rate (or learning step size) and $\gamma$ is the discount rate. $Q(s_t, a_t)$ is the value function of the state-action pair: it represents the reward obtained by performing action $a_t$ in state $s_t$ and thereafter choosing actions according to the policy $\pi$. The goal of Q-learning is for every subsequent step to be greedy.
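The update rule above can be illustrated with a minimal tabular sketch. The function name `q_update`, the list-of-lists table layout, and the numeric values are illustrative assumptions, not part of the patent.

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one temporal-difference update to the table entry Q[s][a].

    Q is a list of rows, one row of action values per state (assumed layout).
    """
    best_next = max(Q[s_next])                      # max over actions in the next state
    td_error = reward + gamma * best_next - Q[s][a]
    Q[s][a] += alpha * td_error
    return Q[s][a]


# Hypothetical 2-state, 2-action table used only for illustration.
Q = [[0.0, 0.0], [1.0, 2.0]]
updated = q_update(Q, s=0, a=1, reward=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

Here the target value is 1.0 + 0.9 * 2.0 = 2.8, so with a learning rate of 0.5 the entry moves halfway from 0 toward 2.8.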
A Bayesian network represents knowledge graphically. It is a directed acyclic graph in which nodes represent the variables of the domain, directed arcs represent the relationships between variables, and conditional probabilities represent the strength of the influence between variables; a Bayesian network can therefore clearly reflect the dependencies between variables in a practical application. A Bayesian network, also called a belief network, is a graphical model that represents the joint probability distribution of a set of variables. It comprises a structural model and an associated set of conditional probability distribution functions.
When the number of data features (variables) in a Bayesian network is $K$, the joint distribution $p(x_1, \dots, x_K)$ of these $K$ variables can be written in the following form and simplified using the conditional independence properties of the Bayesian network: $p(x_1, \dots, x_K) = p(x_K \mid x_1, \dots, x_{K-1}) \cdots p(x_2 \mid x_1)\, p(x_1) = \prod_{k=1}^{K} p(x_k \mid pa_k)$, where $pa_k$ is the set of parent nodes of node $x_k$. It follows that when the Bayesian network is sparse, the form of the joint probability density is greatly simplified.
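As a small worked illustration of this factorization, consider a hypothetical three-node network (not necessarily the structure of Fig. 3) in which $x_1$ and $x_2$ have no parents and $x_3$ has parents $x_1$ and $x_2$, so that $pa_1 = pa_2 = \emptyset$ and $pa_3 = \{x_1, x_2\}$:

```latex
\begin{align}
p(x_1, x_2, x_3)
  &= p(x_3 \mid x_1, x_2)\, p(x_2 \mid x_1)\, p(x_1) && \text{(chain rule)} \\
  &= p(x_3 \mid x_1, x_2)\, p(x_2)\, p(x_1)          && \text{(since } pa_2 = \emptyset \text{)} \\
  &= \prod_{k=1}^{3} p(x_k \mid pa_k).
\end{align}
```

The only simplification in this small case is that $p(x_2 \mid x_1)$ reduces to $p(x_2)$; in larger sparse networks many more conditioning variables drop out in the same way.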
Q-learning is an unsupervised machine learning algorithm through which the learning side gradually adapts to the environment to be learned, here the adaptive behavior of the target radar. Bayesian learning treats unknown information as random variables from the standpoint of probability (degree of belief) and has good adaptability and extensibility. Applying Bayesian learning theory to Q-learning and adding a heuristic strategy gives better results in learning the target radar's adaptive behavior.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by providing a Q-learning method for radar adaptive behavior. An improved Q-learning algorithm learns the time-domain waveform selection behavior (under the minimum mutual information criterion), going a significant step beyond traditional jamming, which relies only on the direct information obtained at the receiving end. The proposed machine learning algorithm identifies the radar's time-domain adaptive behavior and produces a learning result.
The technical solution of the invention is as follows. On the basis of the Q-learning algorithm, waveform selection under the time-domain minimum mutual information criterion is taken as the learning object. First, before the target radar's waveform adaptive behavior is learned, its waveform library information must be obtained. Second, the adaptive behavior object is modeled, and the modeled object interacts with the learning side to obtain the waveform state transitions under different jamming, that is, the laboratory training data. Then the training data are used for Bayesian network parameter learning to obtain the Bayesian posterior probability table and the Bayesian record table. Finally, with the Bayesian posterior probability table as prior knowledge, the proposed algorithm learns iteratively and produces a learning result.
A Q-learning method for radar adaptive behavior comprises the following steps:
S1, the learning side continuously transmits probing jamming signals to force the target radar to change its transmitted signal, and the learning side's receiver captures the target radar's next transmitted signal, which is used to complete the learning side's dynamic waveform library. Specifically: the waveform information obtained at the learning side's receiver is compared with the known waveforms; if the waveform is not in the dynamic waveform library, it is stored in the library; probing jamming signals then continue to be transmitted until all the target radar waveforms obtained in m interactions can be found in the dynamic waveform library, where m is an empirical value;
S2, waveform selection under the time-domain minimum mutual information criterion is taken as the learning object and modeled, and the modeled object interacts with the learning side to obtain the waveform state transitions under different jamming, that is, the laboratory training data;
S3, the training data of S2 are used for Bayesian network parameter learning: using the Bayesian toolbox in the Matlab environment with a Dirichlet prior distribution, the maximum a posteriori probability table of the new radar waveform and the Bayesian record table are obtained, where the Bayesian record table records, for each current waveform and current jamming signal, the index of the new waveform with the maximum posterior probability;
S4, on the basis of the original Q-learning algorithm, with the Bayesian record table of S3 as prior knowledge, iterative learning is carried out according to the Bayesian table update algorithm, and a learning result is produced.
Further, the modeling of S2 is specifically:
S21, when the target radar's waveform selection criterion is the minimum mutual information criterion, the radar echo signal is modeled as $b = a + w = S\alpha + w$, where $S$ is the waveform convolution matrix containing the waveform parameters, $\alpha$ is the scattering coefficient vector, and $w$ is the receiver noise vector;
S22, waveform selection is carried out, specifically: the next transmitted waveform is chosen so that it obtains as much new information as possible, that is, so that the mutual information between two successive radar echo signals is minimal: $MI = \min_{s_i(m)} \{ MI(b_1, b_i) \}$, $MI(b_1, b_i) = H(b_1) - H(b_1 \mid b_i) = H(b_i) - H(b_i \mid b_1)$. Under the assumption that $w$ is white Gaussian noise, the mutual information between waveform 1 and waveform $i$ is [formula missing], where $\{ d_k \mid k = 1, 2, \dots, K \}$ are the singular values of the cross-correlation matrix $R_{xz}$, the matrix $R_{xz}$ is defined as [formula missing], the singular values satisfy $1 \ge d_1 \ge d_2 \ge \dots \ge d_K \ge 0$, and the correlation matrices $R_{11}$, $R_{i1}$, $R_{ii}$ are defined as $E(b_1 b_1^H) = R_{11} = S_1 R_{\alpha\alpha} S_1^H + R_{ww}$, $E(b_i b_1^H) = R_{i1} = S_i R_{\alpha\alpha} S_1^H$, $E(b_i b_i^H) = R_{ii} = S_i R_{\alpha\alpha} S_i^H + R_{ww}$, from which the mutual information between different waveforms is obtained;
S23, the waveform selection object of S22 is modeled: different radar waveform states are characterized by parameters such as signal waveform type and bandwidth, and the waveform in the waveform library with the minimum mutual information relative to the previous transmitted waveform is selected as the new radar waveform state;
S24, different jamming signals are set to influence the target radar's waveform selection, and interaction is carried out continuously in this way to obtain the waveform state transitions under different jamming, that is, the laboratory training data.
Further, the Bayesian network parameter learning of S3 is specifically:
S31, the training data of S2 are used to obtain the conditional probabilities in the Bayesian network, to which Bayes' theorem is applied;
S32, from the conditional probabilities and Bayes' theorem of S31, the posterior probability of the output node and root node is obtained: $P(s_{k+1} \mid s_k, r_k) = \frac{P(r_k \mid s_{k+1}, s_k)\, P(s_{k+1} \mid s_k)}{P(r_k \mid s_k)}$, where $s_k$ is the radar state at time $k$, $r_k$ is the attack taken by the learning side at time $k$, and $s_{k+1}$ is the new radar state at time $k+1$. The left-hand side is the probability that the radar transitions to the new state $s_{k+1}$ at time $k+1$ given that it is in state $s_k$ at time $k$ and the learning side takes attack $r_k$; it is the posterior probability estimate of the new radar state. In the numerator on the right-hand side, $P(s_{k+1} \mid s_k)$ is the state transition probability of the radar, which is also the prior probability of the state at time $k+1$, and $P(r_k \mid s_{k+1}, s_k)$ is the conditional probability that the learning side takes action $r_k$ given that the radar is in state $s_k$ at time $k$ and in state $s_{k+1}$ at time $k+1$; in other words, with the radar in state $s_k$ and a desired state $s_{k+1}$ set, it is the probability with which the learning side selects each attack in order to move the radar from $s_k$ to $s_{k+1}$. The denominator $P(r_k \mid s_k)$ is the numerator integrated or summed over the new state $s_{k+1}$; it is the probability that the learning side selects attack $r_k$, still conditioned on the current state $s_k$.
Further, the Bayesian table update algorithm of S4 is specifically as follows:
S41, after the waveform selection object under minimum mutual information has been modeled and the radar waveform library parameters and jamming signal parameters have been set, the waveform transitions, that is, the laboratory training data, are obtained by transmitting different jamming signals to the target radar. The specific procedure is: starting from waveform 1, waveform selection is carried out with the attack chosen at random from the 4 jamming indices; the new waveform is obtained, the state is updated, and the loop repeats until 100 training samples are obtained;
S42, a Bayesian network is constructed, a prior distribution is added, and the maximum a posteriori solution is found: the Bayesian toolbox in Matlab is used to solve for the conditional probabilities in the Bayesian network, finally giving the posterior probability of the root node. The prior is set to a Dirichlet distribution with equal probabilities. The Bayesian record table collects the waveform transitions under different jamming on the basis of the maximum a posteriori solution: for each current waveform and jamming signal, the new waveform with the maximum posterior probability is chosen as the output and recorded in the table, that is, $S_{t+1}^{\max} = \arg\max_{S_{t+1}} \{ P(S_t, r_t, S_{t+1}) \}$;
S43, after the Bayesian posterior probability table has been obtained, the algorithm interacts with the waveform selection object under the minimum mutual information criterion according to the flowchart of the Bayesian table update algorithm, iterating and learning.
The beneficial effects of the invention are:
The method of the invention applies the Q-learning algorithm based on Bayesian-table updating to radar behavior learning and identification for the first time, and achieves better learning performance than the prior art for time-domain waveform selection (under the minimum mutual information criterion).
Brief description of the drawings
Fig. 1 is a schematic diagram of the method for obtaining the target radar waveform library.
Fig. 2 is a schematic diagram of waveform adaptive behavior modeling.
Fig. 3 is the Bayesian network structure for the waveform selection object.
Fig. 4 is the flowchart of the Q-learning algorithm.
Fig. 5 is the flowchart of the Bayesian table update algorithm.
Fig. 6 is the convergence curve of the Q-learning algorithm.
Fig. 7 is the convergence curve of the Bayesian table update algorithm.
Fig. 8 is the state transition diagram for jamming-effect verification after learning.
Fig. 9 shows verification of the algorithm's learning effect for different initial waveforms.
Detailed description of the embodiments
The technical solution of the invention is described in detail below with reference to the embodiments and the accompanying drawings.
S1, the learning side continuously transmits probing jamming signals to force the target radar to change its transmitted signal, and the learning side's receiver captures the target radar's next transmitted signal, continually completing the learning side's dynamic waveform library. The waveform information obtained at the learning side's receiver is first compared with the known waveforms; if the waveform is not in the dynamic waveform library, it is stored in the library. Probing jamming then continues until all the target radar waveforms obtained in m interactions can be found in the local waveform library; the value of m is adjustable.
As shown in Fig. 1, with waveform selection under the minimum mutual information criterion as the object, jamming is transmitted continuously to traverse the target radar's waveform library and obtain the learning side's waveform library. All subsequent learning and simulation are carried out on the condition that the target radar's waveform library has been traversed completely and correctly.
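The library-building loop of S1 can be sketched as follows. This is only an illustration under stated assumptions: `observe_next_waveform` is a hypothetical callback standing in for "transmit a probing jamming signal and receive the radar's next waveform", the stopping rule is read as m consecutive interactions that yield only known waveforms, and `max_probes` is an added safety limit.

```python
def build_waveform_library(observe_next_waveform, m, max_probes=10_000):
    """Grow the learner's dynamic waveform library until m consecutive
    observed waveforms are all already known (assumed reading of S1)."""
    library = []
    known_streak = 0
    probes = 0
    while known_streak < m and probes < max_probes:
        waveform = observe_next_waveform()   # probe once, capture the response
        probes += 1
        if waveform in library:
            known_streak += 1                # nothing new this time
        else:
            library.append(waveform)         # store the unseen waveform
            known_streak = 0                 # restart the stopping count
    return library


# Toy stand-in for the target radar: it cycles through five waveform labels.
responses = iter(list("abcde") * 3)
learned = build_waveform_library(lambda: next(responses), m=5)
```

A larger m makes it less likely that a rarely used waveform is missed, at the cost of more probing interactions.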
S2, waveform selection under the time-domain minimum mutual information criterion is taken as the learning object and modeled, and the modeled object interacts with the learning side to obtain the waveform state transitions under different jamming, that is, the laboratory training data, as follows:
S21, feature analysis and modeling of the waveform adaptive behavior.
When the target radar's waveform selection criterion is the minimum mutual information criterion, the radar echo signal is modeled as $b = a + w = S\alpha + w$, where $S$ is the waveform convolution matrix containing the waveform parameters, $\alpha$ is the scattering coefficient vector, and $w$ is the receiver noise vector, generally assumed to be white Gaussian noise.
To describe the region of interest accurately, the radar uses a more effective method of collecting information. The waveform selection criterion is therefore to ensure that the next transmitted waveform obtains as much new information as possible, that is, that the mutual information between two successive radar echo signals is minimal: $MI = \min_{s_i(m)} \{ MI(b_1, b_i) \}$, $MI(b_1, b_i) = H(b_1) - H(b_1 \mid b_i) = H(b_i) - H(b_i \mid b_1)$.
Under the assumption that $w$ is white Gaussian noise, the mutual information between waveform 1 and waveform $i$ is [formula missing], where $\{ d_k \mid k = 1, 2, \dots, K \}$ are the singular values of the cross-correlation matrix $R_{xz}$, the matrix $R_{xz}$ is defined as [formula missing], the singular values satisfy $1 \ge d_1 \ge d_2 \ge \dots \ge d_K \ge 0$, and the correlation matrices $R_{11}$, $R_{i1}$, $R_{ii}$ are defined as $E(b_1 b_1^H) = R_{11} = S_1 R_{\alpha\alpha} S_1^H + R_{ww}$, $E(b_i b_1^H) = R_{i1} = S_i R_{\alpha\alpha} S_1^H$, $E(b_i b_i^H) = R_{ii} = S_i R_{\alpha\alpha} S_i^H + R_{ww}$. The mutual information between different waveforms can then be obtained from the above.
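The mutual-information expression and the definition of $R_{xz}$ are not legible in the text above. For orientation only, the following is the standard result for jointly Gaussian vectors; it is an assumption that the patent uses this form, and the symbol $C$ is introduced here purely for illustration. The singular values of the normalized (coherence) matrix are the canonical correlations, which lie in $[0, 1]$, consistent with the stated ordering of the $d_k$:

```latex
\begin{align}
C &= R_{11}^{-1/2}\, R_{i1}^{H}\, R_{ii}^{-1/2},
  \qquad d_k = \text{$k$-th singular value of } C, \\
MI(b_1, b_i) &= -\sum_{k=1}^{K} \log\!\left(1 - d_k^{2}\right)
  \quad \text{(circularly symmetric complex Gaussian)}, \\
MI(b_1, b_i) &= -\tfrac{1}{2} \sum_{k=1}^{K} \log\!\left(1 - d_k^{2}\right)
  \quad \text{(real Gaussian)}.
\end{align}
```

Under this form, $d_k = 0$ for all $k$ (uncorrelated echoes) gives zero mutual information, and the value grows without bound as any $d_k$ approaches 1.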
Then the waveform selection object under the minimum mutual information criterion is modeled. Different radar waveform states are characterized by parameters such as signal waveform type and bandwidth, and according to this criterion the waveform in the waveform library with the minimum mutual information relative to the previous transmitted waveform is selected as the new radar waveform state. The state transitions under a given criterion thus constitute the radar's waveform adaptive behavior.
The specific modeling is shown in Fig. 2: the radar waveform state is characterized by parameters such as signal waveform type, signal bandwidth, and signal pulse width. In the simulation of the present invention, the target radar waveform parameters are set as follows: the waveform library contains 32 waveforms in 8 classes, namely linear FM with positive slope, linear FM with negative slope, concave-up FM with positive slope, concave-up FM with negative slope, convex-down FM with positive slope, convex-down FM with negative slope, logarithmic FM with positive slope, and logarithmic FM with negative slope; each class has 4 bandwidths: 10 MHz, 15 MHz, 20 MHz, and 25 MHz.
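The 32-state waveform library (8 classes times 4 bandwidths) can be enumerated as below. The short class labels and the zero-based state numbering are illustrative assumptions; the text does not say how the waveforms are indexed.

```python
from itertools import product

# Shorthand labels for the 8 waveform classes (names are illustrative).
WAVEFORM_CLASSES = [
    "LFM+", "LFM-",
    "concave-up+", "concave-up-",
    "convex-down+", "convex-down-",
    "logFM+", "logFM-",
]
BANDWIDTHS_MHZ = [10, 15, 20, 25]

# Each radar waveform state is a (class, bandwidth) pair.
WAVEFORM_LIBRARY = list(product(WAVEFORM_CLASSES, BANDWIDTHS_MHZ))

# Assumed numbering: state index = position in the enumerated library.
STATE_INDEX = {waveform: i for i, waveform in enumerate(WAVEFORM_LIBRARY)}
```

Mapping each waveform to an integer index lets the later tables (mutual information, posterior probabilities, Q-values) be stored as plain arrays.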
After the waveform selection object under the minimum mutual information criterion has been modeled, the learning side's jamming signal parameters must also be set. There are 4 jamming signals: single-frequency signals with jamming-to-signal ratios of 30 dB and 50 dB, and linear FM signals with a bandwidth of 30 MHz and jamming-to-signal ratios of 30 dB and 55 dB.
With the learning object and the jamming signals in place, the mutual information between waveforms under different jamming must be computed during simulation to obtain the newly selected waveform. Under the assumption that $w$ is white Gaussian noise, the mutual information between waveform 1 and waveform $i$ is [formula missing], where $\{ d_k \mid k = 1, 2, \dots, K \}$ are the singular values of the cross-correlation matrix $R_{xz}$, the matrix $R_{xz}$ is defined as [formula missing], the singular values satisfy $1 \ge d_1 \ge d_2 \ge \dots \ge d_K \ge 0$, and the correlation matrices $R_{11}$, $R_{i1}$, $R_{ii}$ are defined as $E(b_1 b_1^H) = R_{11} = S_1 R_{\alpha\alpha} S_1^H + R_{ww}$, $E(b_i b_1^H) = R_{i1} = S_i R_{\alpha\alpha} S_1^H$, $E(b_i b_i^H) = R_{ii} = S_i R_{\alpha\alpha} S_i^H + R_{ww}$. The mutual information between different waveforms can then be obtained from the above.
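Once the pairwise mutual information has been computed for a given jamming signal, the selection step itself is an argmin over the library. The sketch below assumes the values are already available as a matrix; the numbers in `mi_under_jam` are made up for illustration, and excluding the current waveform from the candidates is an assumption.

```python
def select_next_waveform(mi_matrix, current):
    """Return the index of the waveform whose echo shares the least
    mutual information with the echo of the current waveform."""
    row = mi_matrix[current]
    candidates = [i for i in range(len(row)) if i != current]
    return min(candidates, key=lambda i: row[i])


# Hypothetical mutual-information matrix for one jamming signal
# (4 waveforms only, values invented for the example).
mi_under_jam = [
    [9.0, 2.5, 0.7, 1.1],
    [2.5, 9.0, 1.9, 0.4],
    [0.7, 1.9, 9.0, 3.2],
    [1.1, 0.4, 3.2, 9.0],
]
next_state = select_next_waveform(mi_under_jam, current=0)
```

A different jamming signal would give a different matrix, which is how the jamming influences the radar's state transition in this model.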
S22, after the minimum mutual information waveform selection object has been modeled, different jamming signals are set to influence the target radar's waveform selection, and interaction is carried out continuously in this way to obtain the waveform state transitions under different jamming, that is, the laboratory training data. The radar waveform parameters and jamming signal parameters used in the interaction are those detailed above.
S3, the training data are used for Bayesian network parameter learning: using the Bayesian toolbox in the Matlab environment with a Dirichlet prior distribution, the maximum a posteriori probability table of the new radar waveform is obtained. The Bayesian record table records, for each current waveform and current jamming signal, the index of the new waveform with the maximum posterior probability.
The structure of the Bayesian network is shown in Fig. 3. The training data are used to obtain the conditional probabilities in the Bayesian network, to which Bayes' theorem is applied, giving the posterior probability of the output node and root node: $P(s_{k+1} \mid s_k, r_k) = \frac{P(r_k \mid s_{k+1}, s_k)\, P(s_{k+1} \mid s_k)}{P(r_k \mid s_k)}$, where $s_k$ is the radar state at time $k$, $r_k$ is the attack taken by the learning side at time $k$, and $s_{k+1}$ is the new radar state at time $k+1$. The left-hand side is the probability that the radar transitions to the new state $s_{k+1}$ at time $k+1$ given that it is in state $s_k$ at time $k$ and the learning side takes attack $r_k$; it is the posterior probability estimate of the new radar state. In the numerator on the right-hand side, $P(s_{k+1} \mid s_k)$ is the state transition probability of the radar, which is also the prior probability of the state at time $k+1$, and $P(r_k \mid s_{k+1}, s_k)$ is the conditional probability that the learning side takes action $r_k$ given that the radar is in state $s_k$ at time $k$ and in state $s_{k+1}$ at time $k+1$; in other words, with the radar in state $s_k$ and a desired state $s_{k+1}$ set, it is the probability with which the learning side selects each attack in order to move the radar from $s_k$ to $s_{k+1}$. The denominator $P(r_k \mid s_k)$ is the numerator integrated or summed over the new state $s_{k+1}$; it is the probability that the learning side selects attack $r_k$, still conditioned on the current state $s_k$.
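A posterior table of this kind can be estimated directly from (state, attack, new state) samples. The sketch below is not the Matlab toolbox computation: it assumes a symmetric Dirichlet prior with pseudo-count `alpha` per outcome and returns the posterior-mean estimate, which is one common way to combine counts with such a prior. The function name and the sample data are illustrative.

```python
from collections import Counter


def posterior_table(transitions, n_states, n_actions, alpha=1.0):
    """Estimate P(s_next | s, r) from (s, r, s_next) samples using a
    symmetric Dirichlet prior (posterior mean, assumed estimator)."""
    counts = Counter(transitions)
    table = {}
    for s in range(n_states):
        for r in range(n_actions):
            total = sum(counts[(s, r, s2)] for s2 in range(n_states))
            denom = total + alpha * n_states
            table[(s, r)] = [
                (counts[(s, r, s2)] + alpha) / denom for s2 in range(n_states)
            ]
    return table


# Invented training samples: (current state, attack index, new state).
data = [(0, 0, 1), (0, 0, 1), (0, 0, 2), (1, 1, 0)]
table = posterior_table(data, n_states=3, n_actions=2)
```

For state-attack pairs that never occur in the data, the estimate falls back to the uniform prior, which matches the "equal probabilities" prior setting described in the text.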
S4, on the basis of the original Q-learning algorithm, with the Bayesian posterior probability table as prior knowledge, iterative learning is carried out according to the flowchart of the Bayesian table update algorithm in Fig. 5, and a learning result is produced.
Fig. 4 and Fig. 5 are the flowcharts of the Q-learning algorithm and of the Q-learning algorithm based on Bayesian-table updating, respectively. The main difference between the two is that the Bayesian table update algorithm uses the laboratory training data to obtain the Bayesian posterior probability table and uses this table as prior knowledge and as guidance toward the target waveform, and only then learns and iterates during interaction with the object.
The Q-learning algorithm is implemented in the following steps:
Step 1, after the waveform selection object under minimum mutual information has been modeled as above and the radar waveform library parameters and jamming signal parameters have been set, the Q-learning algorithm can interact with the target object by means of the jamming signals.
Step 2, following the Q-learning flowchart in Fig. 4, the Q-learning algorithm iterates and learns through repeated interaction with the target object; Fig. 6 shows its convergence curve. The horizontal axis (episodes) is the number of times the target state has been reached, and the vertical axis (iterations per episode) is the number of attacks needed to reach the target state each time, that is, the number of interactions between the learning side and the target radar needed to pull the target radar to the target state. As the figure shows, in the early episodes many iterations are needed, sometimes reaching the iteration limit; as the episodes progress, building on the knowledge gained in earlier episodes, the number of iterations needed to reach the target waveform gradually decreases and finally stabilizes.
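The episode structure behind a curve like Fig. 6 can be sketched with plain tabular Q-learning. Everything specific here is an assumption for illustration: the epsilon-greedy exploration, the reward values, the hyperparameters, and the 8-state `toy_radar` transition function, which stands in for the modeled waveform selection object.

```python
import random


def run_q_learning(step, n_states, n_actions, start, target,
                   episodes=200, max_steps=50, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning; returns the Q-table and the number of attacks
    used in each episode (the quantity plotted on the vertical axis)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = start, 0
        while s != target and steps < max_steps:
            if rng.random() < epsilon:                       # explore
                a = rng.randrange(n_actions)
            else:                                            # exploit
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s_next = step(s, a)
            reward = 1.0 if s_next == target else -0.1       # assumed rewards
            Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
            s, steps = s_next, steps + 1
        steps_per_episode.append(steps)
    return Q, steps_per_episode


def toy_radar(s, a):
    """Hypothetical deterministic response of an 8-waveform radar."""
    return (s + a + 1) % 8


Q, curve = run_q_learning(toy_radar, n_states=8, n_actions=4, start=0, target=5)
```

Plotting `curve` against the episode index gives the same kind of picture as the convergence curve described above: the per-episode attack count is capped by `max_steps` and is expected to shrink as the table fills in.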
The Bayesian table update algorithm is implemented in the following steps:
Step 1, after the waveform selection object under minimum mutual information has been modeled as above and the radar waveform library parameters and jamming signal parameters have been set, the waveform transitions, that is, the laboratory training data, are obtained by transmitting different jamming signals to the target radar. The specific procedure is: starting from waveform 1, waveform selection is carried out with the attack chosen at random from the 4 jamming indices; the new waveform is obtained, the state is updated, and the loop repeats until 100 training samples are obtained.
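The data-collection loop of this step can be sketched as follows. The callback `step` stands in for the modeled radar's response to a jamming signal, and `toy_radar` is a made-up transition rule used only so the example runs; states are numbered from 0 here, so "waveform 1" corresponds to `start=0`.

```python
import random


def generate_training_data(step, start, n_jammers=4, n_samples=100, seed=0):
    """Collect (waveform, attack, new waveform) triples by attacking
    with a uniformly random jamming index at every interaction."""
    rng = random.Random(seed)
    data = []
    s = start
    for _ in range(n_samples):
        r = rng.randrange(n_jammers)     # random choice among the jamming indices
        s_next = step(s, r)              # radar selects its new waveform
        data.append((s, r, s_next))
        s = s_next                       # update the state and loop
    return data


def toy_radar(s, r):
    """Hypothetical deterministic response of a 32-waveform radar."""
    return (5 * s + 3 * r + 1) % 32


samples = generate_training_data(toy_radar, start=0)
```

Because each sample starts where the previous one ended, the data form a single trajectory through the waveform states rather than independent draws.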
Step 2, a Bayesian network is constructed as shown in Fig. 3, a prior distribution is added, and the maximum a posteriori solution is found. The prior is set to a Dirichlet distribution with equal probabilities. The Bayesian toolbox in Matlab is used to solve for the conditional probabilities in the Bayesian network according to Bayes' theorem, $P(s_{k+1} \mid s_k, r_k) = \frac{P(r_k \mid s_{k+1}, s_k)\, P(s_{k+1} \mid s_k)}{P(r_k \mid s_k)}$, finally giving the posterior probability of the root node.
The Bayesian record table collects the waveform transitions under different jamming on the basis of the maximum a posteriori solution: for each current waveform and jamming signal, the new waveform with the maximum posterior probability is chosen as the output and recorded in the table, that is, $S_{t+1}^{\max} = \arg\max_{S_{t+1}} \{ P(S_t, r_t, S_{t+1}) \}$.
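Building the record table from a posterior table is an argmax per (waveform, jamming) pair. In the sketch below the posterior values are invented, and the table is keyed by the conditional distribution over the new waveform; for a fixed current waveform and jamming signal, maximizing the conditional and maximizing the joint probability select the same new waveform.

```python
def bayes_record_table(posterior):
    """For every (waveform, jamming) key, record the index of the new
    waveform with the largest posterior probability."""
    record = {}
    for (s, r), probs in posterior.items():
        record[(s, r)] = max(range(len(probs)), key=lambda s2: probs[s2])
    return record


# Invented posterior table: key (current waveform, jamming index),
# value is the distribution over the new waveform.
posterior = {
    (0, 0): [0.1, 0.7, 0.2],
    (0, 1): [0.2, 0.2, 0.6],
    (1, 0): [0.5, 0.3, 0.2],
}
record = bayes_record_table(posterior)
```

The resulting dictionary is a compact prediction of how the radar is most likely to react, which is what makes it usable as guidance during later learning.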
Step 3, after the Bayesian posterior probability table has been obtained, the algorithm interacts with the waveform selection object under the minimum mutual information criterion according to the flowchart of the Bayesian table update algorithm in Fig. 5, iterating and learning. Fig. 7 shows the convergence curve of the algorithm, and Figs. 8 and 9 show the learning results of the Bayesian table update algorithm.
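The flowchart of Fig. 5 is not reproduced in the text, so the exact way the table guides learning is not known here. The sketch below shows only one plausible heuristic, stated as an assumption: if the record table predicts that some jamming signal moves the radar from the current waveform straight to the target waveform, use it; otherwise fall back to epsilon-greedy selection on the Q-table. All names are hypothetical.

```python
import random


def choose_jammer(state, target, record, Q, epsilon=0.1, rng=random):
    """Pick a jamming index, preferring one that the record table
    predicts will lead directly to the target waveform (assumed heuristic)."""
    n_actions = len(Q[state])
    for r in range(n_actions):
        if record.get((state, r)) == target:     # prior knowledge says: go
            return r
    if rng.random() < epsilon:                   # otherwise explore
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda r: Q[state][r])  # or exploit


# Invented example: from waveform 0, jamming index 2 is recorded
# as most likely to produce waveform 5.
record = {(0, 2): 5}
Q = [[0.0, 0.3, 0.1, 0.0]]
guided = choose_jammer(0, target=5, record=record, Q=Q, epsilon=0.0)
fallback = choose_jammer(0, target=7, record=record, Q=Q, epsilon=0.0)
```

With prior guidance of this kind, early episodes need less blind exploration, which is consistent with the faster convergence the text attributes to the Bayesian table update algorithm.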
The convergence of the Q-learning algorithm and of the Bayesian table update algorithm is shown in Fig. 6 and Fig. 7. It can be seen that the Q-learning algorithm based on Bayesian-table updating proposed by the invention converges better, that is, it learns the radar's waveform adaptive behavior better. After learning with the Bayesian table update algorithm, the iteration-count curve in Fig. 7 shows the number of iterations decreasing gradually until it stabilizes; at that point the algorithm has learned the target object. In the subsequent battlefield verification stage, the Bayesian posterior probability table obtained iteratively in the laboratory learning stage and the algorithm's jamming selection strategy are used; for the same initial state and target state, the jamming signal types selected on the battlefield and the resulting state transitions are shown in Fig. 8. When the initial waveform is not waveform 1, the posterior probability table learned by the Bayesian table update algorithm gives the average number of iterations needed to reach the target waveform from different initial waveforms, as shown in Fig. 9: the horizontal axis is the index of the initial waveform at which each run is initialized, and the vertical axis is the number of iterations the algorithm needs to reach the target waveform from that initial waveform. It can be seen that, after iterative learning, the number of attacks needed to reach the target waveform from every initial waveform is greatly reduced, to within 10 in all cases.

Claims (4)

1. A Q-learning method for radar adaptive behavior, characterized in that it comprises the following steps:
S1, the learning side continuously transmits probing jamming signals to force the target radar to change its transmitted signal, and the learning side's receiver captures the target radar's next transmitted signal, which is used to complete the learning side's dynamic waveform library. Specifically: the waveform information obtained at the learning side's receiver is compared with the known waveforms; if the waveform is not in the dynamic waveform library, it is stored in the library; probing jamming signals then continue to be transmitted until all the target radar waveforms obtained in m interactions can be found in the dynamic waveform library, where m is an empirical value;
S2, waveform selection under the time-domain minimum mutual information criterion is taken as the learning object and modeled, and the modeled object interacts with the learning side to obtain the waveform state transitions under different jamming, that is, the laboratory training data;
S3, the training data of S2 are used for Bayesian network parameter learning: using the Bayesian toolbox in the Matlab environment with a Dirichlet prior distribution, the maximum a posteriori probability table of the new radar waveform and the Bayesian record table are obtained, where the Bayesian record table records, for each current waveform and current jamming signal, the index of the new waveform with the maximum posterior probability;
S4, on the basis of the original Q-learning algorithm, with the Bayesian record table of S3 as prior knowledge, iterative learning is carried out according to the Bayesian table update algorithm, and a learning result is produced.
2. The radar adaptive behavior Q learning method according to claim 1, characterized in that the modeling of S2 is carried out as follows:
S21: when the waveform selection criterion of the target radar is the minimum mutual information criterion, the radar echo signal is modeled as $b = a + w = S\alpha + w$, where $S$ is the waveform convolution matrix containing the waveform parameters, $\alpha$ is the scattering coefficient vector, and $w$ is the receiver noise vector;
S22: waveform selection is performed so that the next transmitted waveform yields more new information, i.e., so that the mutual information between two successive radar echo signals is minimal: $MI = \min_{s_i(n)} \{ MI(b_1, b_i) \}$, with $MI(b_1, b_i) = H(b_1) - H(b_1 \mid b_i) = H(b_i) - H(b_i \mid b_1)$. Under the assumption that $w$ is white Gaussian noise, the mutual information between waveform 1 and waveform i is expressed through $\{ d_k \mid k = 1, 2, \ldots, K \}$, the singular values of the cross-correlation matrix $R_{xz}$, which satisfy $1 \ge d_1 \ge d_2 \ge \cdots \ge d_K \ge 0$ (the closed-form expression and the definition of $R_{xz}$ are not reproduced in this text). The correlation matrices $R_{11}$, $R_{i1}$, $R_{ii}$ are defined as $E[b_1 b_1^H] = R_{11} = S_1 R_{\alpha\alpha} S_1^H + R_{ww}$, $E[b_i b_1^H] = R_{i1} = S_i R_{\alpha\alpha} S_1^H$, $E[b_i b_i^H] = R_{ii} = S_i R_{\alpha\alpha} S_i^H + R_{ww}$, from which the mutual information between different waveforms is obtained;
S23: the waveform selection object of S22 is modeled: different radar waveform states are characterized by parameters such as signal waveform and bandwidth, and the waveform in the waveform library that has the minimum mutual information with the previously transmitted waveform is selected as the new radar waveform state;
S24: different jamming signals are set to influence the waveform selection of the target radar, and the interaction is repeated, thereby obtaining the waveform transitions under different jamming, i.e., the laboratory training data.
3. The radar adaptive behavior Q learning method according to claim 1, characterized in that the Bayesian network parameter learning of S3 is carried out as follows:
S31: the training data of S2 are used to obtain the conditional probabilities in the Bayesian network, and Bayes' theorem is applied;
S32: from the conditional probabilities and Bayes' theorem of S31, the posterior probability of the output node and root node is obtained as $P(s_{k+1} \mid s_k, r_k) = \dfrac{P(s_{k+1} \mid s_k)\, P(r_k \mid s_{k+1}, s_k)}{P(r_k \mid s_k)}$, where $s_k$ is the state of the radar at time k, $r_k$ is the attack taken by the learner at time k, and $s_{k+1}$ is the new state of the radar at time k+1. The left-hand side is the probability that the radar moves to the new state $s_{k+1}$ at time k+1 given that it is in state $s_k$ at time k and the learner takes attack $r_k$; it is the posterior probability estimate of the new radar state. In the numerator on the right-hand side, $P(s_{k+1} \mid s_k)$ is the state transition probability of the radar, which is also the prior probability of the state at time k+1, and $P(r_k \mid s_{k+1}, s_k)$ is the conditional probability that the learner takes action $r_k$ given that the radar is in state $s_k$ at time k and in state $s_{k+1}$ at time k+1; in other words, for a desired state $s_{k+1}$ set while the radar is in state $s_k$, it is the probability with which the learner selects each attack in order to drive the radar from $s_k$ to $s_{k+1}$. The denominator $P(r_k \mid s_k)$ is the integral or sum of the numerator over the new state $s_{k+1}$, and is the probability that the learner selects attack $r_k$ conditioned on the current state $s_k$.
4. The radar adaptive behavior Q learning method according to claim 1, characterized in that the Bayes table update algorithm of S4 is as follows:
S41: after the waveform selection object under minimum mutual information has been modeled and the radar waveform library parameters and jamming signal parameters have been set, the waveform transitions are obtained by transmitting different jamming signals to the target radar, i.e., the laboratory training data are obtained. The specific process is: starting from waveform 1, waveform selection is performed with the attack chosen at random from the 4 jamming numbers; the new waveform is obtained, the state is updated, and the cycle is repeated until 100 training samples are obtained;
S42: the Bayesian network is constructed, the prior distribution is added, and the maximum a posteriori solution is computed: the Bayes toolbox in Matlab is used to solve for the conditional probabilities in the Bayesian network, finally giving the posterior probability of the root node, where the prior is set to a Dirichlet distribution with equal probabilities. The Bayes record table records the waveform transitions under different jamming, compiled from the maximum a posteriori solution: for the current waveform and a given jamming signal, the new waveform with the maximum posterior probability is chosen as the output and recorded in the table, i.e., $S_{t+1}^{\max} = \arg\max_{S_{t+1}} \{ P(S_t, r_t, S_{t+1}) \}$;
S43: after the Bayes posterior probability table has been obtained, the learner interacts with the waveform selection object under the minimum mutual information criterion according to the flow chart of the Bayes table update algorithm, and the algorithm then iterates and learns.
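Steps S41 and S42 of claim 4 can be sketched as follows. This is a minimal illustration and not the Matlab Bayes toolbox procedure named in the claims: the posterior table is computed as the posterior mean under a symmetric Dirichlet prior, which gives the same arg max per (waveform, jamming) pair as the maximum a posteriori solution for such a prior. The function `toy_radar`, the number of waveforms (6) and the pseudo-count `alpha` are hypothetical; only the 4 jamming numbers, the start at waveform 1 (index 0 here) and the 100 training samples come from the claim.

```python
import numpy as np


def bayes_record_table(samples, n_waveforms, n_jammers, alpha=1.0):
    """Estimate P(s_next | s, r) with a symmetric Dirichlet prior and
    return it together with the arg-max record table.

    samples: iterable of (s, r, s_next) integer triples, 0-based.
    alpha:   pseudo-count added to every outcome (assumed value).
    """
    counts = np.zeros((n_waveforms, n_jammers, n_waveforms))
    for s, r, s_next in samples:
        counts[s, r, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    posterior = (counts + alpha) / (totals + alpha * n_waveforms)
    record = posterior.argmax(axis=2)  # most probable new waveform
    return posterior, record


def toy_radar(s, r, n_waveforms):
    """Hypothetical deterministic radar reaction, for illustration only."""
    return (s + r + 1) % n_waveforms


# S41: start from waveform 1 (index 0), pick one of 4 jammers at random,
# observe the new waveform, update, and repeat for 100 samples.
rng = np.random.default_rng(0)
n_waveforms, n_jammers = 6, 4
samples, s = [], 0
for _ in range(100):
    r = int(rng.integers(n_jammers))
    s_next = toy_radar(s, r, n_waveforms)
    samples.append((s, r, s_next))
    s = s_next

# S42: posterior table and Bayes record table.
posterior, record = bayes_record_table(samples, n_waveforms, n_jammers)
print(record)
```

For (waveform, jamming) pairs that never occur in the training data the posterior stays uniform, so the corresponding record entry carries no information; how such entries are treated in the original method is not stated in the claims.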
CN201510729398.7A 2015-10-31 2015-10-31 Radar adaptive behavior Q learning method Active CN105388461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510729398.7A CN105388461B (en) 2015-10-31 2015-10-31 Radar adaptive behavior Q learning method

Publications (2)

Publication Number Publication Date
CN105388461A (en) 2016-03-09
CN105388461B (en) 2017-12-01

Family

ID=55420950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510729398.7A Active CN105388461B (en) 2015-10-31 2015-10-31 Radar adaptive behavior Q learning method

Country Status (1)

Country Link
CN (1) CN105388461B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204368A1 (en) * 2002-03-29 2003-10-30 Emre Ertin Adaptive sequential detection network
CN104794359A (en) * 2015-04-29 2015-07-22 University of Electronic Science and Technology of China Multi-step Q learning adaptive algorithm with variable iteration step size

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
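Yang Wenfang: "Research on radar waveform design based on information theory", China Master's Theses Full-text Database, Information Science and Technology *
Niu Jiandong: "Design and implementation of MIMO cognitive radar waveforms", China Master's Theses Full-text Database, Information Science and Technology *
Wang Bin et al.: "Adaptive waveform selection algorithm based on Q learning in cognitive radar", Systems Engineering and Electronics *
Wang Bin: "Research on optimal waveform design for cognitive radar", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Ju Moran: "Adaptive waveform design for cognitive radar", China Master's Theses Full-text Database, Information Science and Technology *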

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110494762A (en) * 2017-04-10 2019-11-22 Denso Corporation Periphery monitoring radar apparatus
CN110494762B (en) * 2017-04-10 2022-11-11 Denso Corporation Periphery monitoring radar apparatus
CN110031807A (en) * 2019-04-19 2019-07-19 University of Electronic Science and Technology of China Multi-stage smart noise jamming method based on model-free reinforcement learning
CN110031807B (en) * 2019-04-19 2021-01-12 University of Electronic Science and Technology of China Multi-stage smart noise jamming method based on model-free reinforcement learning
CN110308432A (en) * 2019-07-12 2019-10-08 University of Electronic Science and Technology of China Neural-network-based method for recognizing radar adaptive waveform selection behavior
CN110533192A (en) * 2019-08-30 2019-12-03 JD City (Beijing) Digital Technology Co., Ltd. Reinforcement learning method and apparatus, computer-readable medium and electronic device
CN111337918A (en) * 2020-02-17 2020-06-26 Nanjing University of Aeronautics and Astronautics Airborne radar radio frequency stealth waveform selection method based on neural network
CN115718536A (en) * 2023-01-09 2023-02-28 Suzhou Inspur Intelligent Technology Co., Ltd. Frequency modulation method and device, electronic equipment and readable storage medium
CN115718536B (en) * 2023-01-09 2023-04-18 Suzhou Inspur Intelligent Technology Co., Ltd. Frequency modulation method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN105388461B (en) 2017-12-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant