CN109302262A - Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning - Google Patents
- Publication number
- CN109302262A (application CN201811129485.9A)
- Authority
- CN
- China
- Prior art keywords
- interference
- neural network
- policy
- actor
- estimation
- Legal status
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/20—Countermeasures against jamming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/40—Jamming having variable characteristics
Abstract
The invention belongs to the field of wireless communication technology and relates to a communication anti-jamming method based on deep deterministic policy gradient (DDPG) reinforcement learning. The invention first constructs a jamming environment model from the number of jamming sources and the wireless channel model; it constructs a utility function from the legitimate user's communication quality index and uses that utility function as the reward in learning; and it builds the spectrum information sampled in different time slots into a spectrum-time-slot matrix that describes the jamming environment state. Convolutional neural networks are then constructed according to the DDPG mechanism. When an anti-jamming decision is made, the environment state matrix is passed through the target actor convolutional neural network, which selects the anti-jamming strategy for that state in a continuous space. Based on the DDPG reinforcement learning mechanism, the invention performs continuous anti-jamming strategy selection in communication. It avoids the quantization error introduced by discretizing the strategy space, reduces the number of neural network output units and the network complexity, and improves anti-jamming algorithm performance.
Description
Technical field
The invention belongs to the field of wireless communication technology and relates to a communication anti-jamming method based on deep deterministic policy gradient reinforcement learning.
Background
With the development of wireless communication technology, the electromagnetic environment faced by wireless communication systems is becoming increasingly complex and harsh. A system may suffer unintentional interference from its own side's communications, and it may also be affected by jamming signals deliberately emitted by an adversary. Traditional anti-jamming measures target the static jamming patterns of a jamming source and adopt a fixed anti-jamming strategy. As jamming becomes more intelligent, a jammer can dynamically adjust its jamming according to changes in the legitimate user's communication state, so traditional anti-jamming methods can no longer guarantee the legitimate user's normal communication in a dynamic jamming environment. It is therefore necessary to adopt intelligent anti-jamming strategies that respond to the jammer's dynamic jamming strategy and so guarantee the legitimate user's normal communication in a dynamic jamming environment.
At present, countermeasures against a jammer's dynamic jamming mainly adjust the anti-jamming strategy dynamically with reinforcement learning. Such a method first discretizes the anti-jamming strategy space to construct an anti-jamming strategy set. It then constructs a utility function related to the legitimate user's communication quality. An environment state matrix is obtained through spectrum sampling and preprocessing, and discrete strategy selection is performed by passing the environment state matrix through a deep neural network. Finally, the selected strategy is applied to the environment and the environment state transition is estimated. Through repeated learning, the optimal communication strategy under the dynamic jamming strategy is obtained. See, for example: Xin Liu et al., "Anti-jamming Communications Using Spectrum Waterfall: A Deep Reinforcement Learning Approach," IEEE Communications Letters, vol. 22, no. 5, May 2018. That method quantizes (discretizes) the power selection strategy to form a power selection set. It then constructs a deep neural network, and the spectrum-time-slot matrix sampled from the over-the-air jamming environment is passed through the network, which outputs the state-action value of each discrete power strategy. Finally, the power strategy is selected with an ε-greedy policy. However, quantizing the power introduces quantization error, so the power selection result cannot reach the optimum. Moreover, when the discretized power is used as the transmit power on different subchannels, the quantization rule makes the constructed strategy set involve N × L elements, where N is the number of channels and L is the number of quantization levels, and the corresponding deep neural network needs L^N outputs. When the number of channels and the number of quantization levels are large, the number of network outputs grows exponentially, which increases the complexity of training the neural network and of ε-greedy strategy selection.
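The exponential growth described above can be made concrete with a small calculation. The sketch below assumes, as a simplification, L evenly spaced power levels on [0, p_max] for every channel and a DQN head with one output per joint power combination; the function names are hypothetical and not from the patent.

```python
def discrete_head_outputs(num_channels: int, num_levels: int) -> int:
    """Joint power actions when each of N channels picks one of L levels (L^N)."""
    return num_levels ** num_channels


def continuous_head_outputs(num_channels: int) -> int:
    """A deterministic actor emits one real-valued power per channel (N outputs)."""
    return num_channels


def quantize(p: float, p_max: float, levels: int) -> float:
    """Snap p in [0, p_max] to the nearest of `levels` evenly spaced power values."""
    step = p_max / (levels - 1)
    return round(p / step) * step


for n, l in [(2, 4), (4, 10), (8, 10)]:
    print(f"N={n}, L={l}: DQN outputs={discrete_head_outputs(n, l)}, "
          f"actor outputs={continuous_head_outputs(n)}")

# Quantization error is at most half a step: here |0.37 - 0.25| = 0.12 <= 0.125.
print(quantize(0.37, 1.0, 5))
```

With N = 8 channels and L = 10 levels the discrete head already needs 10^8 outputs, while the continuous actor needs 8.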
Summary of the invention
To address the above technical problems, the present invention proposes a communication anti-jamming power selection method based on the deep deterministic policy gradient (DDPG) reinforcement learning mechanism. Without discretizing the power strategy space, the method selects a deterministic anti-jamming power strategy, which improves anti-jamming performance and reduces the complexity of strategy selection.
The invention first constructs the jamming environment from the number of jamming sources and the wireless channel model. A utility function is constructed from the legitimate user's communication quality index and used as the reward in learning. The spectrum information sampled in different time slots is built into a spectrum-time-slot matrix that describes the jamming environment state. The invention constructs four deep neural networks: the target actor (target_actor), the estimation actor (evaluate_actor), the target critic (target_critic), and the estimation critic (evaluate_critic). They are used, respectively, for strategy selection from the environment state matrix, training of the strategy selection network, evaluation of the selected strategy, and training of the evaluation network. The target actor and the estimation actor have the same network structure, and the target critic and the estimation critic have the same network structure. The environment state matrix is passed through the target actor network, which outputs the anti-jamming strategy. The legitimate user adjusts its transmit power and channel selection accordingly, achieving intelligent anti-jamming strategy adjustment. The reward value and the next environment state matrix are calculated from the over-the-air jamming environment model and the anti-jamming strategy. The current environment state, the current anti-jamming strategy, the reward value, and the next environment state form an experience group, which is stored in the experience pool. Finally, experience groups sampled from the experience pool are used to train the estimation actor and the estimation critic. When the number of learning steps reaches a set number, the parameters of the estimation actor and the estimation critic are used to update the target actor and the target critic, respectively. This learning mechanism continues until the learning result converges.
Implementing the legitimate user's intelligent anti-jamming scheme with the proposed invention comprises the following steps:
S1. Define each algorithm module of the intelligent anti-jamming scheme: the jamming environment, the jamming environment state, the reward function, the anti-jamming strategy, and the experience storage pool.
S2. Construct four deep neural networks: the target actor network (target_actor), the estimation actor network (evaluate_actor), the target critic network (target_critic), and the estimation critic network (evaluate_critic). The target actor and the estimation actor have the same network structure, and the target critic and the estimation critic have the same network structure.
S3. Pass the environment state information, i.e., the spectrum-time-slot matrix, through the target actor network to obtain the anti-jamming strategy; apply the strategy to the jamming environment; calculate the reward value and the next state matrix of the strategy under the current jamming environment; and store them.
S4. Sample experience groups from the experience pool to train the estimation actor and estimation critic networks and update their parameters.
S5. Determine whether the learning mechanism meets the stopping condition. If it does, stop learning and obtain the final anti-jamming strategy; otherwise, return to S2 and continue learning.
According to an embodiment of the invention, step S1 comprises the following steps:
S1.1. Jamming environment definition: define the jamming environment according to the number of jammers, the jamming modes, and the wireless channel model.
S1.2. Jamming environment state definition: the spectrum information measured in different time slots forms the spectrum-time-slot matrix, whose size is determined by the observed spectrum range and the observation time-slot length.
S1.3. Reward function definition: construct the feedback reward function from the legitimate user's communication quality index.
S1.4. Anti-jamming strategy definition: define the combinations of transmit powers on the different subchannels as the anti-jamming strategy set. The transmit power on each subchannel can take any value in a continuous range.
S1.5. Experience storage pool definition: preset an experience storage pool of fixed size for storing experience groups, each consisting of the current environment state matrix, the anti-jamming strategy, the reward value, and the next environment state matrix.
According to an embodiment of the invention, step S2 comprises the following steps:
S2.1. Construct the target actor and estimation actor networks as convolutional neural networks with the same structure. The convolutional neural network contains multiple convolutional layers, multiple pooling layers, and multiple fully connected layers. The target actor selects the anti-jamming strategy from the input spectrum-time-slot state matrix. The estimation actor is trained and its parameters are updated using sampled experience groups. When the number of training steps reaches a preset value, the estimation actor's parameters overwrite the target actor's parameters, completing the target actor's parameter update.
S2.2. Construct the target critic and estimation critic as conventional deep neural networks with the same structure. The deep neural network contains multiple layers, each with multiple neurons and an activation function. The output of the target critic is used to evaluate how good the actor's strategy selection is. The estimation critic is trained and its parameters are updated using sampled experience. When the number of training steps reaches a preset value, the estimation critic's parameters overwrite the target critic's parameters, completing the parameter update.
According to an embodiment of the invention, step S3 comprises the following steps:
S3.1. Using the environment state defined in step S1.2, pass the environment state matrix through the target actor network constructed in step S2.1 to obtain the anti-jamming strategy. Apply the strategy to the jamming environment defined in step S1.1, and calculate the reward value and the next state matrix.
S3.2. Define an experience pool of capacity M, and store the current environment state, the strategy selected in S3.1, the resulting reward value, and the next environment state as an experience group {S, A, R, S_} in the experience pool.
According to an embodiment of the invention, step S4 comprises the following steps:
S4.1. Randomly sample a certain number of experience groups from the experience pool obtained in S3.2 for training and updating the neural network parameters.
S4.2. Pass the current state S and the next state S_ of the experience groups sampled in S4.1 through the target network and the estimation network to obtain two corresponding state-action values. Construct a loss function from the current reward value and the two state-action values, and train and update the estimation critic by minimizing the loss function.
S4.3. Pass the current state S of the experience groups sampled in S4.1 through the estimation critic to obtain its state-action value, and pass the current state S and strategy A of the sampled experience groups through the target actor to obtain the corresponding state-action value. Construct a loss function from the two state-action values to train and update the estimation actor.
The beneficial effects of the invention are as follows:
Based on the DDPG reinforcement learning mechanism, the invention performs continuous anti-jamming strategy selection in communication. It avoids the quantization error introduced by discretizing the strategy space, reduces the number of neural network output units and the network complexity, and improves anti-jamming algorithm performance.
Description of the drawings
Fig. 1 shows the framework of the anti-jamming strategy selection algorithm based on the DDPG reinforcement learning mechanism designed in the present invention.
Fig. 2 shows the structure of the target actor and estimation actor networks designed in the present invention.
Fig. 3 shows the structure of the target critic and estimation critic networks designed in the present invention.
Fig. 4 compares the performance of the designed algorithm with optimal strategy selection, random strategy selection, and a DQN-based discretized decision method.
Specific embodiment
To make the steps of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.
Embodiment one
Fig. 1 shows the specific implementation of the algorithm; each step and its principle are described in detail below with reference to Fig. 1.
The framework of the proposed DDPG-based continuous-strategy-selection anti-jamming algorithm is shown in Fig. 1. Step S1.1 models the jamming and wireless environment. In the scenario, multiple jammers jam the legitimate communication link; the jamming modes may include, but are not limited to, five types: single-tone jamming, multi-tone jamming, linear sweep jamming, partial-band jamming, and noise frequency-modulation jamming. A jammer can dynamically adjust its jamming against the legitimate user by adjusting its jamming parameters or switching jamming modes. The mathematical models of the five jamming modes are as follows:
(1) Single-tone jamming
The complex baseband expression of the single-tone jamming signal is:
$$x_J(t) = A\,e^{j(2\pi f_J t + \varphi_J)}$$
where A is the amplitude of the single-tone jamming signal, $f_J$ is its frequency, and $\varphi_J$ is its initial phase.
(2) Multi-tone jamming
The complex baseband expression of the multi-tone jamming signal is:
$$x_J(t) = \sum_{m=1}^{M} A_m\,e^{j(2\pi f_m t + \varphi_m)}$$
where M is the number of tones, $A_m$ is the amplitude of the m-th single-tone component, $f_m$ is its frequency, and $\varphi_m$ is its initial phase.
(3) Linear sweep jamming
The complex baseband expression of the linear sweep jamming signal is:
$$x_J(t) = A\,e^{j(2\pi f_0 t + \pi k t^2 + \varphi_0)}, \quad 0 \le t \le T$$
where A is the amplitude, $f_0$ is the initial frequency, k is the frequency modulation (sweep) coefficient, $\varphi_0$ is the initial phase, and T is the signal duration.
(4) Partial-band jamming
Partial-band Gaussian noise jamming appears as white Gaussian noise within part of the band. Its complex baseband expression is:
$$x_J(t) = U_n(t)\,e^{j(2\pi f_J t + \varphi)}$$
where $U_n(t)$ is baseband noise with zero mean and variance $\sigma_n^2$, $f_J$ is the center frequency of the signal, and $\varphi$ is a phase uniformly distributed on $[0, 2\pi]$ and independent of $U_n(t)$.
(5) Noise frequency-modulation jamming
The complex baseband of the noise FM jamming signal can be expressed as:
$$x_J(t) = A\,e^{j\left(2\pi f_0 t + 2\pi k_{fm}\int_0^t \xi(\tau)\,d\tau\right)}$$
where A is the amplitude of the noise FM signal, $f_0$ is its carrier frequency, $k_{fm}$ is the frequency modulation index (FM index), and $\xi(t)$ is zero-mean narrowband Gaussian white noise whose variance $\sigma_\xi^2$ is a fixed value. The integral $\int_0^t \xi(\tau)\,d\tau$ is a Wiener process and is Gaussian distributed. The FM index $k_{fm}$ and the variance $\sigma_\xi^2$ jointly determine the effective bandwidth of the noise FM signal.
Each jammer dynamically selects its jamming mode and the corresponding parameters so as to maximize the jamming effect.
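To make the five jamming models easier to experiment with, here is a minimal sketch that generates complex-baseband samples. It assumes the standard textbook forms (single tone, sum of tones, linear chirp with instantaneous frequency f0 + k t, phase-randomized Gaussian noise, and noise FM driven by the running integral of ξ(t)). The function names, the sampling rate `fs`, and the Riemann-sum integration are illustrative choices, not taken from the patent, and no band-limiting filter is applied to the partial-band noise.

```python
import cmath
import math
import random


def single_tone(t, A, f_j, phi):
    # A * exp(j(2*pi*f_J*t + phi))
    return A * cmath.exp(1j * (2 * math.pi * f_j * t + phi))


def multi_tone(t, amps, freqs, phases):
    # Sum of single tones, each with its own amplitude, frequency and phase.
    return sum(single_tone(t, a, f, p) for a, f, p in zip(amps, freqs, phases))


def linear_sweep(t, A, f0, k, phi0, T):
    # Chirp with instantaneous frequency f0 + k*t on 0 <= t <= T; zero outside.
    if not 0 <= t <= T:
        return 0j
    return A * cmath.exp(1j * (2 * math.pi * f0 * t + math.pi * k * t * t + phi0))


def partial_band_samples(n, fs, f_j, sigma_n, rng):
    # Zero-mean complex Gaussian noise U_n (total variance sigma_n**2), shifted to
    # centre f_J, with one phase drawn uniformly from [0, 2*pi).
    phi = rng.uniform(0, 2 * math.pi)
    scale = sigma_n / math.sqrt(2)
    out = []
    for i in range(n):
        u = complex(rng.gauss(0, scale), rng.gauss(0, scale))
        out.append(u * cmath.exp(1j * (2 * math.pi * f_j * i / fs + phi)))
    return out


def noise_fm_samples(n, fs, A, f0, k_fm, sigma_xi, rng):
    # The integral of xi(t) is approximated by a running Riemann sum with step 1/fs.
    dt = 1.0 / fs
    integral = 0.0
    out = []
    for i in range(n):
        t = i * dt
        out.append(A * cmath.exp(1j * (2 * math.pi * f0 * t + 2 * math.pi * k_fm * integral)))
        integral += rng.gauss(0, sigma_xi) * dt
    return out


rng = random.Random(0)
print(abs(single_tone(0.3, 2.0, 5.0, 0.1)))  # constant envelope equal to A
print(len(partial_band_samples(64, 1000.0, 100.0, 1.0, rng)))
```

A jammer that switches modes, as described above, could be simulated by calling a different generator (or changing its parameters) from one time slot to the next.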
The legitimate user's anti-jamming scheme computes the reward value R and the environment state matrix S from samples of the wireless spectrum signal in the environment. It constructs historical experience groups from the reward, the environment state, the current anti-jamming strategy, and the next state matrix, and stores them in the experience pool. The neural network selects the next anti-jamming action from the current environment state matrix and applies that strategy to the environment, while updating its parameters from historical experience. The whole algorithm iterates until it converges. The specific implementation steps of the algorithm are as follows:
Steps S1.2, S1.3, and S1.4 of the invention design the environment state, the reward function, and the anti-jamming strategy, respectively. With multiple subchannels, the signal received by the legitimate receiver on subchannel m can be expressed as:
$$y_{t,m} = h_{t,m}\,x_t + \sum_{j=1}^{J} g_{j,t,m}\,x_j + n_{t,m}$$
where $m \in \{1, \dots, N\}$ is the channel index and N is the number of channels; $x_t$ is the useful transmitted signal and $x_j$ is the jamming signal; $n_{t,m}$ is white Gaussian noise on the subchannel; $j \in \{1, \dots, J\}$ is the jammer index and J is the number of jammers; t is the time index; $h_{t,m}$ denotes the channel between the legitimate communicating users, and $g_{j,t,m}$ denotes the jamming channel from jammer j to the legitimate receiver. The signal-to-interference-plus-noise ratio (SINR) and the achievable rate at the legitimate receiver are then expressed in terms of the equivalent channel gain on each subchannel and the corresponding noise power. The achievable rate of the receiver at time t is the sum of the rates over the N subchannels:
$$C_t = \sum_{m=1}^{N} C_{t,m}$$
where $C_{t,m}$ denotes the achievable rate on subchannel m.
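The per-subchannel SINR and the summed achievable rate can be sketched as follows. This assumes the usual definitions, SINR = (transmit power × equivalent channel gain) / (received jamming power + noise power) and rate = log2(1 + SINR) in bit/s/Hz; the exact expressions in the patent did not survive extraction, and the function names are hypothetical.

```python
import math


def subchannel_sinr(p_tx, gain, jam_power, noise_power):
    """SINR on one subchannel: useful received power over jamming plus noise."""
    return p_tx * gain / (jam_power + noise_power)


def sum_rate(p_tx, gains, jam_powers, noise_powers):
    """Achievable rate summed over the N subchannels, log2(1 + SINR) each."""
    return sum(
        math.log2(1 + subchannel_sinr(p, g, j, n))
        for p, g, j, n in zip(p_tx, gains, jam_powers, noise_powers)
    )


# Two subchannels with equal power; the second one is jammed.
print(sum_rate([1.0, 1.0], [1.0, 1.0], [0.0, 3.0], [1.0, 1.0]))  # ≈ 1.32
```

The jammed subchannel contributes only log2(1.25) ≈ 0.32 bit/s/Hz, which is why raising power there, within the power budget, can pay off.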
Before each anti-jamming decision, the power on every subchannel is first obtained by sampling the wireless environment; the powers of all subchannels form the power vector $P_t = [p_{t,1}, p_{t,2}, \dots, p_{t,N}]$, where N is the number of subchannels. The state matrix consists of several historical power vectors, $S_t = [P_{t-1}\; P_{t-2}\; \cdots\; P_{t-W}]^T$, where W is the length of the observation time window. Because the anti-jamming strategy is also constrained in transmit power, the reward function designed in the invention accounts for both the SINR gain and the power overhead of the adopted anti-jamming strategy. The reward expression contains the jamming power of each jammer on the channel; an indicator function of $f_j$, which outputs 1 when $f_j = m$ and 0 otherwise; and the transmit power overhead.
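The state matrix and the reward can be sketched together. The state part keeps the last W sampled power vectors with the newest first, matching $S_t = [P_{t-1} \cdots P_{t-W}]^T$. The reward's exact formula did not survive extraction, so `reward` below is a hypothetical form built only from the ingredients listed above: a per-subchannel SINR gain in which the indicator 1[f_j = m] adds jammer j's power to the subchannel it occupies, minus a linear transmit-power cost. All names and the log2 form are assumptions.

```python
import math
from collections import deque


class SpectrumWaterfall:
    """Keeps the last W power vectors; row 0 is P_{t-1}, row W-1 is P_{t-W}."""

    def __init__(self, num_channels, window):
        self.num_channels = num_channels
        self.rows = deque([[0.0] * num_channels for _ in range(window)], maxlen=window)

    def push(self, power_vector):
        assert len(power_vector) == self.num_channels
        self.rows.appendleft(list(power_vector))  # newest first; the oldest row drops off

    def matrix(self):
        return [row[:] for row in self.rows]


def reward(p_tx, gains, noise_power, jammers, cost):
    """Hypothetical reward: SINR gain summed over subchannels minus a power cost.

    jammers is a list of (f_j, jamming_power); the indicator 1[f_j == m]
    adds jammer j's power only to subchannel m.
    """
    total = 0.0
    for m, (p, g) in enumerate(zip(p_tx, gains)):
        jam = sum(power for f_j, power in jammers if f_j == m)
        total += math.log2(1 + p * g / (noise_power + jam))
    return total - cost * sum(p_tx)


w = SpectrumWaterfall(num_channels=3, window=2)
w.push([1.0, 2.0, 3.0])
w.push([4.0, 5.0, 6.0])
print(w.matrix())  # [[4.0, 5.0, 6.0], [1.0, 2.0, 3.0]]
print(reward([1.0, 1.0], [1.0, 1.0], 1.0, [(1, 3.0)], cost=0.1))
```

The cost coefficient trades rate against power: a larger `cost` pushes the learned strategy toward lower transmit power.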
Because of the jammers, the jamming strength on some subchannels is large; by adjusting the transmit power on the corresponding channels, the link communication quality can be maximized within the allowed power range. Therefore, in the invention the anti-jamming strategy on each subchannel is the transmit power on that subchannel. Assuming the maximum transmit power on subchannel m is $p_m^{\max}$, where $m \in \{1, \dots, N\}$, the anti-jamming strategy set can be expressed as
$$\mathcal{A} = \{(p_1, \dots, p_N) : 0 \le p_m \le p_m^{\max},\ m = 1, \dots, N\}.$$
Step S1.5 of the invention defines the experience group and the experience pool; storing and sampling historical experience supplies the training and parameter updates of the neural networks in the subsequent steps. Following the algorithm structure of Fig. 1, the invention defines an experience pool of capacity $M_e$, which can store $M_e$ historical experiences. The current environment state S, the reward value R, the current anti-jamming strategy $A_t$, and the next environment state S_ obtained in S1.2 to S1.5 form an experience group {S, R, A_t, S_}. Experience groups are stored in the pool one by one; when the number of stored groups reaches the maximum capacity, the oldest stored group is overwritten by the newly arriving one.
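The first-in, first-out overwrite rule of the experience pool maps directly onto a bounded deque. The sketch below is one possible implementation, assuming uniform sampling without replacement for step S4.1; the class and field names are hypothetical.

```python
import random
from collections import deque, namedtuple

# One experience group {S, R, A_t, S_} as defined in step S1.5.
Experience = namedtuple("Experience", ["state", "reward", "action", "next_state"])


class ExperiencePool:
    """Fixed-capacity pool: once M_e groups are stored, the oldest is overwritten."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # appending to a full deque drops the oldest

    def store(self, state, reward, action, next_state):
        self.buffer.append(Experience(state, reward, action, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement for mini-batch training.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


pool = ExperiencePool(capacity=3)
for i in range(5):
    pool.store(state=i, reward=float(i), action=i, next_state=i + 1)
print(len(pool), [e.state for e in pool.buffer])  # 3 [2, 3, 4]
```

Using `deque(maxlen=...)` gives the overwrite behavior for free; a preallocated ring buffer with a write index would be the equivalent choice for large arrays.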
In step S2.1 of the invention, convolutional neural networks are used to construct the target actor network $\mu(\cdot \mid \theta^{\mu})$ and the estimation actor network $\mu'(\cdot \mid \theta^{\mu'})$. The two networks have the same structure, shown in Fig. 2; the specific parameters are given in Embodiment two. The current environment state matrix obtained in step S1.2 is passed through the target actor network, which selects the transmit power vector of the subchannels from the continuous anti-jamming strategy space, $\mu(S_t \mid \theta^{\mu})$. To explore unknown strategies and avoid falling into a local optimum, this power vector is superimposed with random exploration noise of the same dimension, $A_t = \mu(S_t \mid \theta^{\mu}) + \mathcal{N}_t$, forming the current anti-jamming strategy $A_t$. The strategy is applied to the environment, completing the interaction with the jamming environment, and the next environment state and the reward value are calculated. In step S2.2 of the invention, deep neural networks of the same structure are used to construct the target critic network $Q(\cdot \mid \theta^{Q})$ and the estimation critic network $Q'(\cdot \mid \theta^{Q'})$. The target actor selects the anti-jamming strategy from the input spectrum-time-slot state matrix. The estimation actor is trained and its parameters are updated using sampled experience groups. When the number of training steps reaches a preset value, the estimation actor's parameters overwrite the target actor's parameters, completing the target actor's parameter update. The output of the target critic is used to evaluate how good the actor's strategy selection is. The estimation critic is trained and its parameters are updated using sampled experience. When the number of training steps reaches a preset value, the estimation critic's parameters overwrite the target critic's parameters, completing its parameter update.
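Two mechanics from this paragraph are easy to isolate: adding exploration noise to the actor's power vector, and periodically overwriting target parameters with a copy of the estimation parameters. In this sketch the Gaussian noise and the clipping to $[0, p^{\max}]$ are assumptions (the text only specifies random exploration noise of the same dimension), and parameters are represented as plain dictionaries for illustration.

```python
import copy
import random


def explore(actor_output, p_max, sigma, rng):
    """A_t = mu(S_t) + noise, clipped per subchannel to the feasible range [0, p_max]."""
    return [min(max(p + rng.gauss(0.0, sigma), 0.0), p_max) for p in actor_output]


def hard_update(target_params, estimate_params, step, update_every):
    """Every `update_every` training steps (counting from 1), replace the target
    parameters with an independent copy of the estimation parameters."""
    if step % update_every == 0:
        return copy.deepcopy(estimate_params)
    return target_params


rng = random.Random(0)
print(explore([0.5, 0.9, 0.1], p_max=1.0, sigma=0.2, rng=rng))

target = {"w": [1.0]}
estimate = {"w": [2.0]}
for step in range(1, 11):
    target = hard_update(target, estimate, step, update_every=5)
print(target)  # {'w': [2.0]}
```

`copy.deepcopy` matters here: without it, later training steps on the estimation network would silently change the target network too.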
In step S3.1, the strategy obtained in S2.2 is used as the transmit power on the current channel m, and the next environment state is calculated from the new transmit power and the jamming model. In step S3.2, following the capacity and structure of the experience storage pool defined in S1.5, the current environment state from S2.1, the strategy selected in S2.2, the reward value obtained in S2.2, and the next environment state obtained in S3.1 form an experience group {S, A_t, R, S_}, which is stored in the experience pool. When the stored experience groups reach the pool's maximum capacity, the newest experience group is written into the storage unit holding the oldest one, overwriting it.
In step S4.1, a preset batch_size number of experience groups are sampled from the experience storage pool of step S3 to train the parameters of the estimation critic network $Q'(\cdot \mid \theta^{Q'})$. As shown in Fig. 1, step S4.2 trains the estimation critic network $Q'(\cdot \mid \theta^{Q'})$ by minimizing its loss function, defined as follows:
$$L(\theta^{Q'}) = \frac{1}{N}\sum_i \left(y_i - Q'(S_i, A_i \mid \theta^{Q'})\right)^2 \qquad (10)$$
$$y_i = R_i + \gamma\, Q\left(S_{i+1}, \mu'(S_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q}\right) \qquad (11)$$
where $Q'(S_i, A_i \mid \theta^{Q'})$ is the state-action value function of the estimation critic network with parameters $\theta^{Q'}$, N is the number of sampled experience groups, and γ is the long-term reward discount factor. When the number of training steps reaches the update step count I, the parameters of the estimation critic network are copied into the target critic network, completing its parameter update. Step S4.3 trains the estimation actor network $\mu'(\cdot \mid \theta^{\mu'})$ along the direction in which the target critic's value improves with respect to the selected strategy, combined with the direction in which the estimation actor's parameters change its output under the current environment state; that is, the update follows the deterministic policy gradient.
When the number of training steps reaches the update step count I, the parameters of the estimation actor network are copied into the target actor network, completing its parameter update.
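Equations (10) and (11) and the actor step can be illustrated without a deep-learning framework. The sketch below computes the targets $y_i$ and the mean-squared critic loss for a sampled batch of (S, A, R, S_) tuples, using arbitrary callables for the critic and the next-action network. It then shows the deterministic policy gradient chain rule, $\partial J / \partial \theta = \text{mean}_i\, \partial Q / \partial a \cdot \partial \mu / \partial \theta$, on a toy one-parameter linear actor. The toy critic $Q(s, a) = -(a - s)^2$ and all function names are assumptions for illustration only.

```python
def td_targets(batch, gamma, target_q, next_action):
    """Eq. (11): y_i = R_i + gamma * Q(S_{i+1}, next_action(S_{i+1}))."""
    return [r + gamma * target_q(s_next, next_action(s_next)) for (_, _, r, s_next) in batch]


def critic_loss(batch, targets, estimate_q):
    """Eq. (10): mean squared error between y_i and Q'(S_i, A_i) over the mini-batch."""
    return sum((y - estimate_q(s, a)) ** 2 for (s, a, _, _), y in zip(batch, targets)) / len(batch)


def actor_step(theta, states, dq_da, lr):
    """One deterministic-policy-gradient ascent step for a toy actor mu(s) = theta * s.

    dJ/dtheta = mean_i dQ/da evaluated at a = mu(s_i), times dmu/dtheta = s_i.
    """
    grad = sum(dq_da(s, theta * s) * s for s in states) / len(states)
    return theta + lr * grad


batch = [(1.0, 0.5, 1.0, 2.0), (2.0, 1.0, 0.0, 3.0)]  # (S, A, R, S_)
ys = td_targets(batch, 0.9, target_q=lambda s, a: s + a, next_action=lambda s: 0.0)
print(ys, critic_loss(batch, ys, estimate_q=lambda s, a: 0.0))

# Toy critic Q(s, a) = -(a - s)^2 prefers a = s, so theta should approach 1.
theta = 0.0
for _ in range(200):
    theta = actor_step(theta, [1.0, 2.0], lambda s, a: -2.0 * (a - s), lr=0.1)
print(round(theta, 6))  # 1.0
```

In a real implementation both gradients would come from automatic differentiation through the critic and actor networks; the toy keeps them analytic so the chain rule stays visible.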
In step S5, as training continues, the reward R gradually converges to its optimal value. The invention records the change in the mean of R over ζ steps. When this change is sufficiently small, training is considered to have converged, the algorithm stops, and the final output strategy is taken as the final anti-jamming strategy. The convergence criterion compares the change in this mean with a threshold υ, which is set to a very small positive value.
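The stopping rule can be sketched as a small monitor. The exact criterion did not survive extraction, so this assumes one plausible reading: compare the mean reward over the latest ζ steps with the mean over the preceding ζ steps, and declare convergence when the absolute difference falls below υ. The class name and the two-window comparison are assumptions.

```python
from collections import deque


class ConvergenceMonitor:
    """Hypothetical stop rule: converged when the mean reward over the latest zeta steps
    differs from the mean over the preceding zeta steps by less than upsilon."""

    def __init__(self, zeta, upsilon):
        self.zeta = zeta
        self.upsilon = upsilon
        self.history = deque(maxlen=2 * zeta)

    def update(self, reward):
        self.history.append(reward)
        if len(self.history) < 2 * self.zeta:
            return False  # not enough rewards yet to compare two windows
        values = list(self.history)
        older = sum(values[: self.zeta]) / self.zeta
        newer = sum(values[self.zeta:]) / self.zeta
        return abs(newer - older) < self.upsilon


monitor = ConvergenceMonitor(zeta=2, upsilon=1e-3)
print([monitor.update(1.0) for _ in range(4)])  # [False, False, False, True]
```

A larger ζ smooths out reward noise caused by exploration, at the cost of detecting convergence later.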
Embodiment two
The convolutional neural network structure for anti-jamming decisions proposed in the invention is shown in Fig. 2. The simulation assumes that the system is divided into 128 subchannels. A 128 × 128 spectrum-time-slot state matrix constructed from the spectrum samples serves as the input of the convolutional neural network, which then passes through three convolutional layers, two pooling layers, and two fully connected layers to output a 1 × 128 power vector. Specifically, the convolution and pooling operations in the convolutional neural network are as follows:
Assume the input data of the convolution is I and the corresponding convolution kernel K has the same number of dimensions as the input data. Taking three-dimensional input data as an example (two-dimensional input can be regarded as having a third dimension of size 1), the convolution requires the third dimension of K to equal the third dimension of I. Using $w_1, w_2, w_3$ to index the three dimensions, the output of the convolution is:
$$O(x, y) = \sum_{w_1}\sum_{w_2}\sum_{w_3} I(x + w_1,\, y + w_2,\, w_3)\, K(w_1, w_2, w_3)$$
Pooling in convolutional neural networks generally includes max pooling and mean pooling: mean pooling outputs the average of the values in each pooling window, and max pooling outputs the maximum value in each window. The invention uses max pooling.
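A naive reference implementation helps pin down what the convolution and pooling layers compute. This sketch handles a single channel (the 2-D case, with the third dimension treated as size 1), computes a valid cross-correlation without flipping the kernel, as CNN layers do, and uses non-overlapping pooling windows. The function names and the list-of-lists representation are illustrative choices.

```python
def conv2d(image, kernel, stride=1):
    """Valid (unpadded) single-channel convolution as used in CNNs:
    O(x, y) = sum_{u, v} I(x*stride + u, y*stride + v) * K(u, v)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(image[x * stride + u][y * stride + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for y in range(out_w)
        ]
        for x in range(out_h)
    ]


def pool2d(image, size, mode="max"):
    """Non-overlapping pooling (window and stride both `size`); mode is "max" or "mean"."""
    out = []
    for x in range(len(image) // size):
        row = []
        for y in range(len(image[0]) // size):
            window = [image[x * size + u][y * size + v] for u in range(size) for v in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d(img, [[1, 0], [0, 1]]))        # [[6, 8], [12, 14]]
print(pool2d([[1, 2], [3, 4]], 2))          # [[4]]
print(pool2d([[1, 2], [3, 4]], 2, "mean"))  # [[2.5]]
```

Max pooling keeps the strongest response in each window, which suits spectrum matrices where a jammer shows up as a few high-power cells.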
In this embodiment, the structure of each layer is shown in Fig. 2 and described in detail below.
The first layer of the convolutional neural network is the input layer; its size is determined by the number of subchannels and the observation time-slot length. In the network model, the available spectrum is divided into 128 subchannels and the observation window is 128 time slots long, so the input state matrix has dimension 128 × 128.
The second layer of the convolutional neural network consists of a convolution, a ReLU activation, and a pooling operation. Specifically, the state matrix from the input layer first passes through a convolution with 20 kernels of size 3 × 3 and stride 1, using ReLU as the activation function; the output of this operation has dimension 126 × 126 × 20. The ReLU activation function is:
$$y = \max\{0, x\} \qquad (17)$$
This output then undergoes 2 × 2 max pooling. After the convolution and pooling of this layer, the output has dimension 63 × 63 × 20.
In the third layer, the output of the second layer's convolution and pooling passes through a convolution that yields a 31 × 31 × 30 output. The kernel size is 3 × 3, there are 30 kernels, the activation function is ReLU, and the stride is 2.
The fourth layer of the convolutional network convolves the output of the third layer using 30 kernels of size 4 × 4 with a stride of 2, zero-padding the w1 and w2 dimensions by 1. The output of this convolution has dimension 15 × 15 × 30.
This output is then max-pooled with a 3 × 3 pooling window, which gives an output of dimension 5 × 5 × 30.
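In this layer both the zero padding P = 1 and the floor in the output-size rule ⌊(W + 2P − F)/S⌋ + 1 matter, because the division is not exact. The check below assumes non-overlapping 3 × 3 pooling (stride 3), which the text does not state:

```latex
\left\lfloor \frac{31 + 2 \cdot 1 - 4}{2} \right\rfloor + 1 = \lfloor 14.5 \rfloor + 1 = 15,
\qquad
\left\lfloor \frac{15}{3} \right\rfloor = 5
```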
The fifth layer of the convolutional network is a fully connected layer with 1024 neurons and ReLU activation. The 5 × 5 × 30 output of the fourth layer is reshaped into a 1 × 750 vector. After processing by this fully connected layer, a vector of dimension 1 × 360 is output.
The sixth layer of the convolutional network is a fully connected layer with 128 neurons and ReLU activation. It processes the output of the fifth layer and outputs a Q(·|θt) value vector whose dimension matches that of the anti-interference strategy set, namely 1 × 128.
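A short shape trace checks the dimensions of layers 2 to 6 end to end. This sketch assumes the output-size rule floor((W + 2P − F)/S) + 1 for each convolution and non-overlapping pooling (stride equal to the window size, which the text does not state). The fully connected shapes follow the stated neuron counts of 1024 and 128, because a fully connected layer produces one output per neuron. The function names are illustrative.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1


def trace_actor_shapes(height=128, width=128):
    """Return (layer name, output shape) pairs for the actor CNN described in the text."""
    h, w = height, width
    shapes = [("input", (h, w, 1))]
    # Layer 2: 20 kernels of 3x3, stride 1, then 2x2 max pooling.
    h, w = conv_out(h, 3), conv_out(w, 3)
    shapes.append(("conv2", (h, w, 20)))
    h, w = h // 2, w // 2
    shapes.append(("pool2", (h, w, 20)))
    # Layer 3: 30 kernels of 3x3, stride 2, no pooling.
    h, w = conv_out(h, 3, stride=2), conv_out(w, 3, stride=2)
    shapes.append(("conv3", (h, w, 30)))
    # Layer 4: 30 kernels of 4x4, stride 2, zero padding 1, then 3x3 max pooling.
    h, w = conv_out(h, 4, stride=2, pad=1), conv_out(w, 4, stride=2, pad=1)
    shapes.append(("conv4", (h, w, 30)))
    h, w = h // 3, w // 3
    shapes.append(("pool4", (h, w, 30)))
    # Flatten, then the two fully connected layers (1024 and 128 neurons).
    shapes.append(("flatten", (h * w * 30,)))
    shapes.append(("fc5", (1024,)))
    shapes.append(("fc6", (128,)))
    return shapes
```

Calling `trace_actor_shapes()` returns the layers in order, so the list can be compared line by line against a framework model's summary.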
Fig. 3 shows the structure of the neural network used to implement both the estimation critic network and the target critic network. The first layer is the input layer, with dimension 128 × (128 + 1). It contains the state matrix St, which represents the channel power information, and the action vector At, which represents the strategy. The second layer is neural layer 1, with 1024 neurons, an output dimension of 1024 × 1 and ReLU activation. The third layer is neural layer 2, with 128 neurons, an output dimension of 128 × 1 and ReLU activation. The fourth layer is neural layer 3, with 32 neurons, an output dimension of 32 × 1 and ReLU activation. The fifth layer is neural layer 4, with a single neuron that outputs the Q value used to evaluate how good the actor network's strategy selection is.
Further, Fig. 4 shows the performance of the continuous-power-selection anti-interference strategy based on deep deterministic policy gradient reinforcement learning proposed in the present invention. The figure compares four strategies: a random power selection strategy, a DQN-based discrete power selection strategy, the proposed continuous power selection strategy based on deep deterministic policy gradient, and the ideal optimal power selection strategy. The figure shows that the reward achieved by the proposed algorithm is substantially higher than that of the random power selection strategy.
Claims (3)
1. A communication anti-interference method based on deep deterministic policy gradient reinforcement learning, characterized by comprising the following steps:
S1. Initialization definitions, comprising:
Interference environment: the interference environment is defined according to the number of interference sources, the interference mode and the wireless channel model;
Interference environment state: the spectrum information measured in different time slots forms a spectrum-slot matrix, whose size is determined by the observed spectral range and the observation slot length;
Reward function: a feedback reward function is constructed from the communication quality index of the legitimate user;
Anti-interference strategy: the combinations of transmit power on the different subchannels are defined as the anti-interference strategy set;
Deep neural networks: four deep neural networks are constructed, namely a target actor, an estimation actor, a target critic and an estimation critic. The target actor network and the estimation actor network have the same structure, and the target critic network and the estimation critic network have the same structure;
Experience pool: an experience pool of preset fixed size, used to store experience tuples composed of the environment state, the current interference-suppression strategy and the environment reward;
S2. Pass the interference environment state, i.e. the spectrum-slot matrix, through the target actor convolutional neural network to obtain an anti-interference strategy, and apply the strategy to the interference environment. Under the current anti-interference strategy, observe the reward value of the interference environment according to the reward function, together with the state matrix after a one-step transition. The output of the target critic neural network is used to help evaluate how good the actor network's strategy selection is;
S3. Form an experience tuple from the interference environment state, the current anti-interference strategy, the strategy's reward value and the next environment state, and store it in the experience pool;
S4. Sample experience tuples from the experience pool to train the estimation actor network and the estimation critic network. When the number of training steps reaches a preset value, overwrite the target actor network parameters with the estimation actor network parameters, and overwrite the target critic network parameters with the estimation critic network parameters, thereby completing the parameter update of the target actor network;
S5. Determine whether the learning mechanism meets a preset stopping condition. If it does, stop learning and obtain the final anti-interference strategy; otherwise, return to S2 and continue learning.
2. The imperfect-information intelligent anti-interference method based on reinforcement learning according to claim 1, characterized in that the reward function described in step S1 is:
where m ∈ {1, ..., N} is the channel index and N is the number of channels. The formula includes the jamming power of each interference source on channel m, where j ∈ {1, ..., J} is the interference source index and J is the number of interference sources, and t is the time-slot index. The remaining terms are the channel between the legitimate communication users, the subchannel transmit power, an indicator function that outputs 1 when f_j = m and 0 otherwise, and the transmit power cost.
3. The imperfect-information intelligent anti-interference method based on reinforcement learning according to claim 2, characterized in that in step S4 the convolutional neural network parameters are updated as follows:
The current state and the next state in each sampled experience tuple are passed through the convolutional neural networks to obtain the corresponding state-action values. A corresponding loss function is then constructed, and the network parameters are updated by minimizing this loss function.
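The update in claim 3 can be sketched in Python together with the experience pool of step S1 and the periodic parameter overwrite of step S4. The sketch assumes the standard DDPG critic target y = r + γ·Q′(s′, μ′(s′)) with a discount factor γ, which the claims do not specify. The networks are passed in as plain callables, and all names (`ExperiencePool`, `td_targets`, `critic_loss`, `hard_update`) are hypothetical. Step S4 describes a periodic full copy of the parameters rather than the soft (Polyak-averaged) target update of the original DDPG algorithm, and the sketch follows the claim.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-size experience pool; the oldest tuples are discarded when it is full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_actor, target_critic, gamma):
    """Assumed DDPG target y = r + gamma * Q'(s', mu'(s')) for each sampled tuple."""
    return [r + gamma * target_critic(s2, target_actor(s2)) for _, _, r, s2 in batch]


def critic_loss(batch, critic, targets):
    """Mean squared error between the estimation critic's Q(s, a) and the targets."""
    errors = [(critic(s, a) - y) ** 2 for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)


def hard_update(target_params, estimate_params):
    """Step S4: overwrite target-network parameters with copies of the estimation parameters."""
    for name, value in estimate_params.items():
        target_params[name] = list(value)
```

In a training loop, `hard_update` would be called only when the step counter reaches the preset interval of step S4, for example with a hypothetical `if step % sync_every == 0`.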
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811129485.9A CN109302262B (en) | 2018-09-27 | 2018-09-27 | Communication anti-interference method based on depth determination gradient reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109302262A true CN109302262A (en) | 2019-02-01 |
CN109302262B CN109302262B (en) | 2020-07-10 |
Family
ID=65164716
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811129485.9A Expired - Fee Related CN109302262B (en) | 2018-09-27 | 2018-09-27 | Communication anti-interference method based on depth determination gradient reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109302262B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104581738A (en) * | 2015-01-30 | 2015-04-29 | 厦门大学 | Cognitive radio hostile interference resisting method based on Q learning |
CN104994569A (en) * | 2015-06-25 | 2015-10-21 | 厦门大学 | Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method |
US20190004518A1 (en) * | 2017-06-30 | 2019-01-03 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and system for training unmanned aerial vehicle control model based on artificial intelligence |
Non-Patent Citations (3)
Title |
---|
GUOAN HAN: "Two-dimensional anti-jamming communication based on deep reinforcement learning", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) * |
XIN LIU: "A heterogeneous information fusion deep reinforcement learning for intelligent frequency selection of HF communication", China Communications * |
XIN LIU: "Anti-Jamming Communications Using Spectrum Waterfall: A Deep Reinforcement Learning Approach", IEEE Communications Letters * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109861720A (en) * | 2019-03-15 | 2019-06-07 | 中国科学院上海高等研究院 | WSN anti-interference method, device, equipment and medium based on intensified learning |
CN110113418A (en) * | 2019-05-08 | 2019-08-09 | 电子科技大学 | A kind of collaboration buffering updating method of Che Lian information centre network |
CN110611619A (en) * | 2019-09-12 | 2019-12-24 | 西安电子科技大学 | Intelligent routing decision method based on DDPG reinforcement learning algorithm |
CN110611619B (en) * | 2019-09-12 | 2020-10-09 | 西安电子科技大学 | Intelligent routing decision method based on DDPG reinforcement learning algorithm |
CN110944354A (en) * | 2019-11-12 | 2020-03-31 | 广州丰石科技有限公司 | Base station interference monitoring method and system based on waveform analysis and deep learning |
CN111181618A (en) * | 2020-01-03 | 2020-05-19 | 东南大学 | Intelligent reflection surface phase optimization method based on deep reinforcement learning |
CN111526592A (en) * | 2020-04-14 | 2020-08-11 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN111526592B (en) * | 2020-04-14 | 2022-04-08 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN111835453B (en) * | 2020-07-01 | 2022-09-20 | 中国人民解放军空军工程大学 | Communication countermeasure process modeling method |
CN111835453A (en) * | 2020-07-01 | 2020-10-27 | 中国人民解放军空军工程大学 | Communication countermeasure process modeling method |
CN112087749A (en) * | 2020-08-27 | 2020-12-15 | 华北电力大学(保定) | Cooperative active eavesdropping method for realizing multiple listeners based on reinforcement learning |
CN112087749B (en) * | 2020-08-27 | 2023-06-02 | 华北电力大学(保定) | Cooperative active eavesdropping method for realizing multiple listeners based on reinforcement learning |
CN112188004A (en) * | 2020-09-28 | 2021-01-05 | 精灵科技有限公司 | Obstacle call detection system based on machine learning and control method thereof |
CN112202527A (en) * | 2020-10-01 | 2021-01-08 | 西北工业大学 | Intelligent electromagnetic signal identification system interference method based on momentum gradient disturbance |
CN112202527B (en) * | 2020-10-01 | 2022-09-13 | 西北工业大学 | Intelligent electromagnetic signal identification system interference method based on momentum gradient disturbance |
CN112492691A (en) * | 2020-11-26 | 2021-03-12 | 辽宁工程技术大学 | Downlink NOMA power distribution method of deep certainty strategy gradient |
CN112492691B (en) * | 2020-11-26 | 2024-03-26 | 辽宁工程技术大学 | Downlink NOMA power distribution method of depth deterministic strategy gradient |
CN114696925A (en) * | 2020-12-31 | 2022-07-01 | 华为技术有限公司 | Channel quality assessment method and related device |
CN114696925B (en) * | 2020-12-31 | 2023-12-15 | 华为技术有限公司 | Channel quality assessment method and related device |
CN113038616A (en) * | 2021-03-16 | 2021-06-25 | 电子科技大学 | Frequency spectrum resource management and allocation method based on federal learning |
CN113038616B (en) * | 2021-03-16 | 2022-06-03 | 电子科技大学 | Frequency spectrum resource management and allocation method based on federal learning |
CN112906640A (en) * | 2021-03-19 | 2021-06-04 | 电子科技大学 | Space-time situation prediction method and device based on deep learning and readable storage medium |
CN113098565A (en) * | 2021-04-02 | 2021-07-09 | 甘肃工大舞台技术工程有限公司 | Stage carrier communication self-adaptive frequency hopping anti-interference technology based on deep network |
CN113098565B (en) * | 2021-04-02 | 2022-06-07 | 甘肃工大舞台技术工程有限公司 | Stage carrier communication self-adaptive frequency hopping anti-interference method based on deep network |
CN113221454A (en) * | 2021-05-06 | 2021-08-06 | 西北工业大学 | Electromagnetic radiation source identification method based on deep reinforcement learning |
CN113221454B (en) * | 2021-05-06 | 2022-09-13 | 西北工业大学 | Electromagnetic radiation source identification method based on deep reinforcement learning |
CN113411099A (en) * | 2021-05-28 | 2021-09-17 | 杭州电子科技大学 | Double-change frequency hopping pattern intelligent decision method based on PPER-DQN |
CN113411099B (en) * | 2021-05-28 | 2022-04-29 | 杭州电子科技大学 | Double-change frequency hopping pattern intelligent decision method based on PPER-DQN |
CN113890564A (en) * | 2021-08-24 | 2022-01-04 | 浙江大学 | Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning |
CN113890564B (en) * | 2021-08-24 | 2023-04-11 | 浙江大学 | Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning |
CN114417939B (en) * | 2022-01-27 | 2022-06-28 | 中国人民解放军32802部队 | Interference strategy generation method based on knowledge graph |
CN114417939A (en) * | 2022-01-27 | 2022-04-29 | 中国人民解放军32802部队 | Interference strategy generation method based on knowledge graph |
Also Published As
Publication number | Publication date |
---|---|
CN109302262B (en) | 2020-07-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109302262A (en) | A kind of communication anti-interference method determining Gradient Reinforcement Learning based on depth | |
Jiang et al. | Deep learning for fading channel prediction | |
CN108777872B (en) | Intelligent anti-interference method and intelligent anti-interference system based on deep Q neural network anti-interference model | |
Liu et al. | Anti-jamming communications using spectrum waterfall: A deep reinforcement learning approach | |
CN109274456A (en) | A kind of imperfect information intelligence anti-interference method based on intensified learning | |
Jiang et al. | Recurrent neural networks with long short-term memory for fading channel prediction | |
CN111970072B (en) | Broadband anti-interference system and method based on deep reinforcement learning | |
CN111726217B (en) | Deep reinforcement learning-based autonomous frequency selection method and system for broadband wireless communication | |
Jiang et al. | Multi-antenna fading channel prediction empowered by artificial intelligence | |
CN109845310A (en) | The method and unit of wireless resource management are carried out using intensified learning | |
Jiang et al. | A deep learning method to predict fading channel in multi-antenna systems | |
CN108712748B (en) | Cognitive radio anti-interference intelligent decision-making method based on reinforcement learning | |
Ak et al. | Avoiding jammers: A reinforcement learning approach | |
WO2021036414A1 (en) | Co-channel interference prediction method for satellite-to-ground downlink under low earth orbit satellite constellation | |
Çavdar | PSO tuned ANFIS equalizer based on fuzzy C-means clustering algorithm | |
CN108401254A (en) | A kind of wireless network resource distribution method based on intensified learning | |
KR20210124897A (en) | Method and system of channel esimiaion for precoded channel | |
CN113420495B (en) | Active decoy type intelligent anti-interference method | |
Zhou et al. | Deep deterministic policy gradient with prioritized sampling for power control | |
CN114051252A (en) | Multi-user intelligent transmitting power control method in wireless access network | |
CN116866048A (en) | Anti-interference zero-and Markov game model and maximum and minimum depth Q learning method | |
Zappone et al. | Complexity-aware ANN-based energy efficiency maximization | |
Evmorfos et al. | Deep actor-critic for continuous 3D motion control in mobile relay beamforming networks | |
Sriharipriya et al. | Artifical neural network based multi dimensional spectrum sensing in full duplex cognitive radio networks | |
CN113747447A (en) | Double-action reinforcement learning frequency spectrum access method and system based on priori knowledge |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2020-07-10; termination date: 2021-09-27 |