CN109302262A - Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning - Google Patents

Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning

Info

Publication number
CN109302262A
CN109302262A (application CN201811129485.9A; granted publication CN109302262B)
Authority
CN
China
Prior art keywords
interference
neural network
strategy
actor
estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811129485.9A
Other languages
Chinese (zh)
Other versions
CN109302262B (en)
Inventor
黎伟
王军
李黎
党泽
王杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
CETC 54 Research Institute
Original Assignee
University of Electronic Science and Technology of China
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China and CETC 54 Research Institute
Priority to CN201811129485.9A priority Critical patent/CN109302262B/en
Publication of CN109302262A publication Critical patent/CN109302262A/en
Application granted granted Critical
Publication of CN109302262B publication Critical patent/CN109302262B/en
Expired - Fee Related
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04K - SECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00 - Jamming of communication; Counter-measures
    • H04K3/20 - Countermeasures against jamming
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04K - SECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00 - Jamming of communication; Counter-measures
    • H04K3/40 - Jamming having variable characteristics

Abstract

The invention belongs to the field of wireless communication technology and relates to a communication anti-jamming method based on deep deterministic policy gradient (DDPG) reinforcement learning. The invention first constructs an interference environment model from the number of interference sources and the wireless channel model; constructs a utility function from the legitimate user's communication quality indicators and uses this utility function as the reward in learning; and assembles the spectrum information sampled at different time slots into a spectrum-time slot matrix that describes the interference environment state. Convolutional neural networks are then constructed according to the deep deterministic policy gradient reinforcement learning mechanism; when an anti-jamming decision is made, the environment state matrix is passed through the target actor convolutional neural network, which selects an anti-jamming strategy for the corresponding state over a continuous action space. Based on the deep deterministic policy gradient reinforcement learning mechanism, the invention completes continuous anti-jamming strategy selection in communication, overcomes the quantization error introduced by quantizing and discretizing the strategy space, reduces the number of neural network output cells and the network complexity, and improves the performance of the anti-jamming algorithm.

Description

Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning
Technical field
The invention belongs to the field of wireless communication technology and relates to a communication anti-jamming method based on deep deterministic policy gradient reinforcement learning.
Background art
With the development of wireless communication technology, the electromagnetic environment faced by wireless communication systems has become increasingly complex and hostile: a system may suffer unintentional interference from friendly communications, and may also be affected by interference signals deliberately released by an adversary. Traditional anti-jamming measures assume a static interference pattern of the interference source and adopt a fixed anti-jamming strategy. As jamming becomes intelligent, an interference source can dynamically adjust its jamming strategy according to changes in the legitimate user's communication state, so traditional anti-jamming methods can no longer guarantee the legitimate user's normal communication in a dynamic interference environment. It is therefore necessary to adopt intelligent anti-jamming strategies that respond to the dynamic jamming strategies of the interference source, so as to guarantee the legitimate user's normal communication in a dynamic interference environment.
At present, dynamic adjustment of the anti-jamming strategy against an interference source's dynamic jamming is mainly carried out by means of reinforcement learning. This method first discretizes the anti-jamming strategy space to construct an anti-jamming strategy set; it then constructs a utility function related to the legitimate user's communication quality; the environment state matrix is obtained by spectrum sampling and preprocessing, and a discrete strategy is selected from the environment state matrix by a deep neural network; finally, the selected strategy is applied to the environment and the environment state transition is estimated. Through repeated learning, the optimal communication strategy under a dynamic jamming strategy is obtained. See: Xin Liu, et al., "Anti-jamming Communications Using Spectrum Waterfall: A Deep Reinforcement Learning Approach", IEEE Communications Letters, vol. 22, no. 5, May 2018. This method quantizes and discretizes the power selection strategy to form a power selection set. It then constructs a deep neural network which, from the spectrum-time slot matrix sampled from the air-interface interference environment, outputs the state-action value of each discrete power strategy. Power strategy selection is finally carried out by an ε-greedy policy. However, quantizing and discretizing the power introduces quantization error, so the power selection result cannot reach the optimum. Moreover, when the transmit power on each of the subchannels is discretized, the constructed strategy set must contain L^N elements, where N is the number of channels and L is the number of quantization levels, and the corresponding deep neural network needs L^N outputs. When the number of channels or the number of quantization levels is large, the number of neural network outputs grows exponentially, which increases the complexity of training the neural network and of carrying out ε-greedy strategy selection.
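The scale problem can be made concrete with a quick calculation: with N subchannels and L quantization levels per subchannel, a discrete joint power action space has L^N entries, while a continuous actor needs only N outputs. A tiny Python sketch, with illustrative numbers:

```python
# Discrete joint action space vs. continuous actor output size (illustrative values).
N = 8   # number of subchannels
L = 10  # quantization levels per subchannel

discrete_actions = L ** N   # 100_000_000 outputs for a DQN-style head
continuous_outputs = N      # one continuous power value per subchannel
print(discrete_actions, continuous_outputs)
```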
Summary of the invention
In view of the above technical problems, the invention proposes a communication anti-jamming power selection method based on the deep deterministic policy gradient (Deep Deterministic Policy Gradient, DDPG) reinforcement learning mechanism. Without discretizing the power strategy space, it completes deterministic anti-jamming power strategy selection, improves anti-jamming performance, and reduces strategy selection complexity.
The invention first constructs the interference environment from the number of interference sources and the wireless channel model. A utility function is constructed from the legitimate user's communication quality indicators and used as the reward in learning. The spectrum information sampled at different time slots is assembled into a spectrum-time slot matrix, which describes the interference environment state. The invention constructs four deep neural networks, a target actor (target_actor), an estimation actor (evaluate_actor), a target critic (target_critic) and an estimation critic (evaluate_critic), which are used respectively for strategy selection from the environment state matrix, strategy-selection network training, strategy-selection evaluation and evaluation-network training. The target actor network and the estimation actor network have the same network structure, and the target critic network and the estimation critic network have the same network structure. The environment state matrix is passed through the target actor network, which outputs an anti-jamming strategy; the legitimate user adjusts its transmit power and channel selection to realize intelligent anti-jamming strategy adjustment. The reward value and the next environment state matrix are computed from the air-interface interference environment model and the anti-jamming strategy. The current environment state, the current anti-jamming strategy, the reward value and the next environment state form an experience tuple, which is stored in an experience pool. Finally, experience tuples drawn from the experience pool are used to complete the training of the estimation actor network and the estimation critic network. When the number of learning steps reaches a certain amount, the parameters of the estimation actor network and the estimation critic network are copied to the target actor network and the target critic network, respectively, completing their updates. This learning mechanism continues until the learning result converges.
The legitimate user's intelligent anti-jamming scheme proposed by the invention is realized by the following steps (a schematic sketch of the overall learning loop is given after the step list):
S1, define each algorithm module of the intelligent anti-jamming scheme: the interference environment, the interference environment state, the reward function, the anti-jamming strategy and the experience storage pool.
S2, construct four deep neural networks: a target actor network (target_actor), an estimation actor network (evaluate_actor), a target critic network (target_critic) and an estimation critic network (evaluate_critic). The target actor network and the estimation actor network have the same network structure, and the target critic network and the estimation critic network have the same structure.
S3, pass the environment state information, i.e. the spectrum-time slot matrix, through the target actor network to obtain an anti-jamming strategy; apply this strategy to the interference environment; compute the reward value of the anti-jamming strategy under the current interference environment and the next state matrix; and store them.
S4, sample experience tuples from the experience pool to train the estimation actor network and the estimation critic network and update their parameters.
S5, judge whether the learning mechanism meets the stop condition; if so, stop learning and obtain the final anti-jamming strategy; otherwise return to S2 and continue learning.
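For concreteness, the following Python sketch outlines how steps S1-S5 fit together in one training loop. It is an illustrative reconstruction, not the patented implementation: the environment object `env`, the helper functions `select_action`, `ddpg_update`, `hard_update`, `has_converged` and `to_tensors`, and all hyperparameter values are assumptions (sketches for most of these helpers appear later in this description).

```python
# Minimal skeleton of the S1-S5 loop; env, networks and helpers are
# illustrative assumptions, sketched later in this description.
def train(env, actor, critic, target_actor, target_critic,
          pool, actor_opt, critic_opt,
          batch_size=32, update_interval=100, warmup=500):
    state = env.reset()                   # S1: initial spectrum-time slot matrix
    step, rewards = 0, []
    while not has_converged(rewards):     # S5: stop condition
        action = select_action(target_actor, state)    # S2/S3: pick strategy
        next_state, reward = env.step(action)          # apply to environment
        pool.store(state, action, reward, next_state)  # S3: experience pool
        rewards.append(reward)
        if len(pool) >= warmup:                        # S4: network training
            batch = to_tensors(pool.sample(batch_size))
            ddpg_update(batch, actor, critic, target_actor, target_critic,
                        actor_opt, critic_opt)
            if step % update_interval == 0:            # hard target-net copy
                hard_update(target_actor, actor)
                hard_update(target_critic, critic)
        state = next_state
        step += 1
    return target_actor                   # final anti-jamming strategy
```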
According to an embodiment of the invention, the above step S1 comprises the following steps:
S1.1, interference environment definition: the interference environment is defined according to the number of jammers, the jamming patterns and the wireless channel model.
S1.2, interference environment state definition: the spectrum information measured at different time slots forms the spectrum-time slot matrix; the size of the spectrum-time slot matrix is determined by the observed spectrum range and the observation slot length.
S1.3, reward function definition: a feedback reward function is constructed according to the legitimate user's communication quality indicators.
S1.4, anti-jamming strategy definition: the combination of transmit powers on the different subchannels is defined as the anti-jamming strategy set. The transmit power on each subchannel can take any value in the continuous interval $[0, P_{max}^m]$.
S1.5, experience storage pool definition: an experience storage pool of fixed size is preset for storing experience tuples composed of the current environment state matrix, the anti-jamming strategy, the reward value and the next environment state matrix.
According to an embodiment of the invention, the above step S2 comprises the following steps:
S2.1, construct the target actor network and the estimation actor network with convolutional neural networks of identical structure. The convolutional neural network comprises multiple convolutional layers, multiple pooling layers and multiple fully connected layers. The target actor network completes anti-jamming strategy selection from the input spectrum-time slot state matrix. The estimation actor network completes network training and parameter updates from the sampled experience tuples. When the number of training steps reaches a preset value, the target actor network parameters are overwritten with the estimation actor network parameters, completing the parameter update of the target actor network.
S2.2, construct the target critic network and the estimation critic network with conventional deep neural networks of identical structure. The deep neural network comprises multiple neural layers, each containing multiple neurons and an activation function. The output of the target critic network is used to evaluate the quality of the actor network's strategy selection. The estimation critic network performs network training and parameter updates from the sampled experience information. When the number of training steps reaches a preset value, the target critic network parameters are overwritten with the estimation critic network parameters, completing the parameter update.
According to an embodiment of the invention, the above step S3 comprises the following steps:
S3.1, according to the environment state definition in step S1.2, pass the environment state matrix through the target actor network constructed in step S2.1 to obtain an anti-jamming strategy. Apply the anti-jamming strategy to the interference environment defined in step S1.1, and compute the reward value and the next state matrix after the transition.
S3.2, define an experience pool of capacity M, and store the experience tuple {S, A, R, S_} formed by the current environment state, the strategy selected in S3.1, the reward value obtained from the strategy-environment interaction and the next environment state into the experience pool.
According to an embodiment of the invention, the above step S4 comprises the following steps:
S4.1, randomly draw a certain number of experience tuples from the experience pool obtained in S3.2 for training and updating the neural network parameters.
S4.2, for the current state S and the next state S_ in the experience tuples drawn in step S4.1, obtain the two corresponding state-action values through the target networks and the estimation networks. Construct the loss function from the current reward value and the two state-action values, and complete the training and update of the estimation critic network by minimizing the loss function.
S4.3, pass the current state S in the experience tuples drawn in step S4.1 through the estimation critic network to obtain one state-action value, and pass the current state S and the strategy A through the target actor network to obtain the corresponding state-action value. Construct a loss function from the two state-action values and carry out the training and parameter update of the estimation actor network.
The beneficial effects of the invention are:
Based on the deep deterministic policy gradient reinforcement learning mechanism, the invention completes continuous anti-jamming strategy selection in communication. It overcomes the quantization error introduced by quantizing and discretizing the strategy space, reduces the number of neural network output cells and the network complexity, and improves the performance of the anti-jamming algorithm.
Description of the drawings
Fig. 1 is the processing framework of the anti-jamming strategy selection algorithm based on the deep deterministic policy gradient reinforcement learning mechanism designed by the invention.
Fig. 2 shows the structure of the target actor network and the estimation actor network designed by the invention.
Fig. 3 shows the structure of the target critic network and the estimation critic network designed by the invention.
Fig. 4 compares the performance of the algorithm designed by the invention with the optimal policy selection, random policy selection and DQN-based discretized decision methods.
Specific embodiments
To make the steps of the invention clearer, the invention is explained in further detail below with reference to the attached drawings and implementation examples.
Embodiment one
Fig. 1 shows the specific implementation of the algorithm of the invention; each step and its principle are described in detail below with reference to Fig. 1.
The implementation framework of the continuous-strategy-selection anti-jamming method based on deep deterministic policy gradient reinforcement learning proposed by the invention is shown in Fig. 1. Interference and wireless environment modeling is completed in step S1.1 of step S1. In the scenario, multiple interference sources jam the legitimate communication link; the jamming patterns may include, but are not limited to, five types: single-tone jamming, multi-tone jamming, linear sweep jamming, partial-band jamming and noise frequency-modulation jamming. An interference source can dynamically adjust its jamming of the legitimate user by adjusting jamming parameters or switching jamming patterns. The concrete mathematical models of the five jamming patterns are as follows:
(1) Single-tone jamming
The complex baseband expression of the single-tone jamming signal is:
$$J(t) = A\, e^{j(2\pi f_J t + \varphi)}$$
where A is the single-tone jamming signal amplitude, $f_J$ is the single-tone jamming signal frequency, and $\varphi$ is the single-tone jamming initial phase.
(2) Multi-tone jamming
The complex baseband expression of the multi-tone jamming signal is:
$$J(t) = \sum_{m=1}^{M} A_m\, e^{j(2\pi f_m t + \varphi_m)}$$
where $A_m$ is the amplitude of the m-th tone in the multi-tone jamming, $f_m$ is the frequency of the m-th tone, and $\varphi_m$ is the initial phase of the m-th tone.
(3) Linear sweep jamming
The complex baseband expression of the linear sweep jamming signal is:
$$J(t) = A\, e^{j(2\pi f_0 t + \pi k t^2 + \varphi)}, \quad 0 \le t \le T$$
where A is the amplitude, $f_0$ is the starting frequency, k is the frequency-modulation (chirp) coefficient, $\varphi$ is the initial phase, and T is the signal duration.
(4) Partial-band jamming
Partial-band Gaussian noise jamming behaves as white Gaussian noise within a partial band; its complex baseband expression is:
$$J(t) = U_n(t)\, e^{j(2\pi f_J t + \varphi)}$$
where $U_n(t)$ is baseband noise with zero mean and variance $\sigma_n^2$, $f_J$ is the center frequency of the signal, and $\varphi$ is a uniformly distributed and mutually independent phase on $[0, 2\pi]$.
(5) Noise frequency-modulation jamming
The complex baseband of the noise frequency-modulation (noise FM) signal can be expressed as:
$$J(t) = A\, e^{j\left(2\pi f_0 t + 2\pi k_{fm} \int_0^t \xi(\tau)\,d\tau\right)}$$
where A is the amplitude of the noise FM signal, $f_0$ is the carrier frequency of the noise FM signal, $k_{fm}$ is the frequency-modulation index, and $\xi(t)$ is zero-mean narrowband Gaussian white noise with variance $\sigma^2$. The integral $\int_0^t \xi(\tau)\,d\tau$ is a Wiener process and follows a $\mathcal{N}(0, \sigma^2 t)$ Gaussian distribution. The frequency-modulation index $k_{fm}$ and the variance $\sigma^2$ jointly determine the effective bandwidth of the noise FM signal.
The interference source dynamically selects the jamming pattern and the corresponding parameters to maximize the jamming effect.
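To make the five jamming models concrete, the following NumPy sketch generates each complex-baseband waveform as defined above. The sample rate, duration and all default parameter values are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1e6                          # sample rate (Hz), illustrative
t = np.arange(0, 1e-3, 1 / fs)    # 1 ms of samples

def single_tone(A=1.0, fJ=100e3, phi=0.0):
    return A * np.exp(1j * (2 * np.pi * fJ * t + phi))

def multi_tone(A=(1.0, 0.8, 0.6), f=(50e3, 120e3, 200e3), phi=(0.0, 0.5, 1.0)):
    return sum(Am * np.exp(1j * (2 * np.pi * fm * t + pm))
               for Am, fm, pm in zip(A, f, phi))

def linear_sweep(A=1.0, f0=10e3, k=2e8, phi=0.0):
    # instantaneous frequency sweeps linearly: f0 + k * t
    return A * np.exp(1j * (2 * np.pi * f0 * t + np.pi * k * t**2 + phi))

def partial_band(sigma2=1.0, fJ=150e3):
    # zero-mean complex Gaussian baseband noise shifted to center frequency fJ
    Un = (rng.normal(0, np.sqrt(sigma2 / 2), t.size)
          + 1j * rng.normal(0, np.sqrt(sigma2 / 2), t.size))
    phi = rng.uniform(0, 2 * np.pi)
    return Un * np.exp(1j * (2 * np.pi * fJ * t + phi))

def noise_fm(A=1.0, f0=100e3, kfm=5e4, sigma2=1.0):
    xi = rng.normal(0, np.sqrt(sigma2), t.size)
    integral = np.cumsum(xi) / fs     # discrete Wiener-process approximation
    return A * np.exp(1j * (2 * np.pi * f0 * t + 2 * np.pi * kfm * integral))
```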
In the environment, the legitimate user's anti-jamming strategy computes the reward value R and the environment state matrix S through intelligent sampling of the wireless spectrum; a historical experience tuple is constructed from the reward, the environment state, the current anti-jamming strategy and the next state matrix, and stored in the experience pool; the neural network selects the next anti-jamming action from the current environment state matrix and applies this anti-jamming strategy to the environment, while updating its parameters from the historical experience; the whole algorithm iterates until convergence. Specifically, the implementation steps of the algorithm are as follows:
Steps S1.2, S1.3 and S1.4 of the invention complete the design of the environment state, the reward function and the anti-jamming strategy, respectively. In a multi-subchannel system, the signal received on subchannel m at the receiving end of the legitimate link can be expressed as:
$$y_t^m = h_t^m x_t + \sum_{j=1}^{J} h_{j,t}^m x_j + n_t^m$$
where $m \in \{1, \ldots, N\}$ is the channel index and N is the number of channels; $x_t$ is the useful transmitted signal and $x_j$ is the interference signal; $n_t^m$ is the white Gaussian noise in the subchannel; $j \in \{1, \ldots, J\}$ is the interference source index and J is the number of interference sources; t is the time index; $h_t^m$ denotes the channel between the legitimate communication users, and $h_{j,t}^m$ denotes the interference channel from interference source j to the legitimate user's receiver. Therefore, the signal-to-interference-plus-noise ratio (SINR) obtained at the legitimate user's receiving end and the achievable rate can be expressed as:
$$\mathrm{SINR}_t^m = \frac{p_t^m\, |h_t^m|^2}{\sum_{j=1}^{J} p_{j,t}^m\, |h_{j,t}^m|^2 + \sigma_m^2}, \qquad R_t^m = \log_2\!\left(1 + \mathrm{SINR}_t^m\right)$$
where $|h_t^m|^2$ is the equivalent channel gain on the subchannel and $\sigma_m^2$ is the corresponding noise power. The achievable rate of the receiving end at time t can be expressed as the sum of the rates over the N subchannels:
$$R_t = \sum_{m=1}^{N} \log_2\!\left(1 + \mathrm{SINR}_t^m\right)$$
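The per-subchannel SINR and sum achievable rate above can be computed as in the following NumPy sketch; the array shapes and the function name `achievable_rate` are assumptions chosen for illustration.

```python
import numpy as np

def achievable_rate(p_tx, h_legit, p_jam, h_jam, noise_power):
    """Sum achievable rate over N subchannels.

    p_tx        : (N,)   transmit power per subchannel
    h_legit     : (N,)   legitimate channel gains |h|^2
    p_jam       : (J, N) jamming power of each of J sources per subchannel
    h_jam       : (J, N) interference channel gains |h_j|^2
    noise_power : (N,)   noise power per subchannel
    """
    interference = np.sum(p_jam * h_jam, axis=0)          # (N,)
    sinr = p_tx * h_legit / (interference + noise_power)  # (N,)
    return np.sum(np.log2(1.0 + sinr))
```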
Before the anti-jamming decision, the power on each subchannel is first obtained by sampling the wireless environment; the powers of all subchannels form the power vector $P_t = [p_{t,1}, p_{t,2}, \ldots, p_{t,N}]$, where N is the number of subchannels. The state matrix S is formed from multiple historical power vectors, $S_t = [P_{t-1}\ P_{t-2}\ \cdots\ P_{t-\tau}]^T$, where $\tau$ is the observation time window. The anti-jamming strategy also takes the transmit power limits into account; the reward function designed in the invention considers both the SINR gain of the adopted anti-jamming strategy and the power overhead, and is expressed as follows:
$$R_t = \sum_{m=1}^{N} \log_2\!\left(1 + \frac{p_t^m\, |h_t^m|^2}{\sum_{j=1}^{J} p_j\, f(f_j, m)\, |h_{j,t}^m|^2 + \sigma_m^2}\right) - \lambda \sum_{m=1}^{N} p_t^m$$
where $p_j$ is the jamming power of interference source j on channel $f_j$; the function $f(f_j, m)$ outputs 1 when $f_j = m$ and 0 otherwise; and $\lambda \sum_{m} p_t^m$ is the transmit power overhead.
Because some subchannels are affected by interference sources, the interference strength on those subchannels is large; link communication quality can be maximized within the allowed power range by adjusting the transmit power on the corresponding channels. Therefore, in the invention, the anti-jamming strategy on each subchannel is the transmit power on that subchannel. The invention assumes the maximum transmit power of subchannel m is $P_{max}^m$, where $m \in \{1, \ldots, N\}$, so the anti-jamming strategy set can be expressed as $\mathcal{A} = \{(p^1, \ldots, p^N) \mid 0 \le p^m \le P_{max}^m\}$.
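As an illustration of the state construction, the sketch below accumulates the sampled per-subchannel power vectors of the last τ slots into the spectrum-time slot state matrix. The constants (128 subchannels, window length 128, matching embodiment two), the row ordering (newest last) and the helper name are assumptions.

```python
import numpy as np
from collections import deque

N_CHANNELS = 128    # subchannels, as in embodiment two
WINDOW = 128        # observation time window tau

power_history = deque(maxlen=WINDOW)

def update_state(sampled_power, history=power_history):
    """Append the newest per-subchannel power vector and build S_t.

    sampled_power : (N_CHANNELS,) spectrum sample of the current slot.
    Returns the (WINDOW, N_CHANNELS) spectrum-time slot state matrix,
    zero-padded until enough history has accumulated.
    """
    history.append(np.asarray(sampled_power, dtype=np.float32))
    state = np.zeros((WINDOW, N_CHANNELS), dtype=np.float32)
    state[-len(history):] = np.stack(history)   # newest rows last
    return state
```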
Step S1.5 of step S1 of the invention defines the experience tuple and the experience pool; the storage and sampling of historical experience provide the training and parameter updates of the neural networks in the subsequent steps. Following the algorithm structure of Fig. 1, the invention defines an experience pool of capacity $M_e$, which can store $M_e$ historical experiences. The current environment state S obtained through S1.2-S1.5 of step S1, the reward value R, the current anti-jamming strategy $A_t$ and the next environment state S_ form the experience tuple {S, $A_t$, R, S_}. The experience tuples are stored in the experience pool one by one; when the number of stored tuples reaches the maximum capacity, the tuple stored for the longest time is overwritten by the newly arriving tuple.
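A minimal sketch of such a fixed-capacity experience pool, assuming the FIFO overwrite policy described above; the class name and interface are illustrative.

```python
import random
from collections import deque

class ExperiencePool:
    """FIFO experience pool of capacity M_e; oldest tuples are overwritten."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # deque drops the oldest entry

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)
```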
In step S2.1 of step S2 of the invention, the target actor network $\mu(\cdot\,|\,\theta^\mu)$ and the estimation actor network $\mu'(\cdot\,|\,\theta^{\mu'})$ are constructed with convolutional neural networks. The target actor network and the estimation actor network have the same network structure; the specific structure is shown in Fig. 2, and the specific parameters are given in embodiment two. The current environment state matrix obtained in step S1.2 is passed through the target actor network, which selects the transmit power vector of the corresponding subchannels from the continuous anti-jamming strategy space: $A = \mu(S_t\,|\,\theta^\mu)$. In order to explore unknown strategies and avoid falling into local optima, a random exploration noise of the same dimension is superimposed on the power vector, i.e. $A_t = \mu(S_t\,|\,\theta^\mu) + \mathcal{N}_t$, forming the current anti-jamming strategy $A_t$. This strategy is applied to the environment, completing the interaction between the strategy and the interference environment, so that the next environment state and the reward value can be computed. In step S2.2 of step S2 of the invention, the target critic network $Q(\cdot\,|\,\theta^Q)$ and the estimation critic network $Q'(\cdot\,|\,\theta^{Q'})$ are constructed with the same deep neural network structure. The target actor network completes anti-jamming strategy selection from the input spectrum-time slot state matrix. The estimation actor network completes network training and parameter updates from the sampled experience tuples; when the number of training steps reaches a preset value, the target actor network parameters are overwritten with the estimation actor network parameters, completing the parameter update of the target actor network. The output of the target critic network is used to evaluate the quality of the actor network's strategy selection. The estimation critic network performs network training and parameter updates from the sampled experience information; when the number of training steps reaches a preset value, the target critic network parameters are overwritten with the estimation critic network parameters, completing the parameter update.
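Strategy selection with superimposed exploration noise, as described above, might look like the following sketch. The Gaussian noise model, the noise standard deviation and the power limit `p_max` are assumptions; the patent does not specify the noise distribution.

```python
import numpy as np
import torch

def select_action(actor, state, noise_std=0.1, p_max=1.0):
    """Pass the state matrix through the actor and add exploration noise.

    actor : torch.nn.Module mapping a (1, 1, H, W) state to (1, N) powers
    state : (H, W) spectrum-time slot matrix
    """
    with torch.no_grad():
        s = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0).unsqueeze(0)
        a = actor(s).squeeze(0).numpy()
    a = a + np.random.normal(0.0, noise_std, size=a.shape)  # exploration noise
    return np.clip(a, 0.0, p_max)   # respect per-subchannel power limits
```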
In step S3.1 of step S3, the strategy obtained in step S2 is applied as the transmit power on each channel m; the next environment state is then computed from the new transmit power and the interference model. In step S3.2 of step S3, according to the capacity and structure of the experience storage pool defined in S1.5, the current environment state, the selected strategy, the reward value obtained from the strategy-environment interaction and the next environment state obtained in S3.1 form the experience tuple {S, $A_t$, R, S_}, which is stored in the experience pool. When the number of stored experience tuples reaches the maximum capacity, the newest tuple is written into the storage unit of the oldest tuple, overwriting the oldest tuple.
In step S4.1 of step S4, a number of experience tuples given by the preset batch_size is drawn from the experience storage pool of step S3 to train the parameters of the estimation critic network $Q'(\cdot\,|\,\theta^{Q'})$. As shown in Fig. 1, in step S4.2 of step S4 the estimation critic network $Q'(\cdot\,|\,\theta^{Q'})$ is trained by minimizing its loss function, defined as follows:
$$L(\theta^{Q'}) = \frac{1}{N_b}\sum_{i}\left(y_i - Q'(S_i, A_i\,|\,\theta^{Q'})\right)^2 \qquad (10)$$
$$y_i = R_i + \gamma\, Q\!\left(S_{i+1},\, \mu'(S_{i+1}\,|\,\theta^{\mu'})\,\big|\,\theta^{Q}\right) \qquad (11)$$
where $N_b$ is the number of sampled experience tuples.
Here $Q'(S_i, A_i\,|\,\theta^{Q'})$ denotes the state-action value function depending on the estimation critic network parameters $\theta^{Q'}$, and $\gamma$ denotes the long-term reward discount factor. When the number of training steps reaches the update step number I, the network parameters of the estimation critic network are copied into the target critic network, completing the network parameter update. In step S4.3 of step S4, the estimation actor network $\mu'(\cdot\,|\,\theta^{\mu'})$ is trained by reinforcing, under the current environment state, the parameter direction in which the target critic network evaluates the strategy choice as optimal, i.e. by the deterministic policy gradient:
$$\nabla_{\theta^{\mu'}} J \approx \frac{1}{N_b}\sum_{i} \nabla_{a}\, Q(S_i, a\,|\,\theta^{Q})\Big|_{a=\mu'(S_i)}\, \nabla_{\theta^{\mu'}}\, \mu'(S_i\,|\,\theta^{\mu'})$$
When the number of training steps reaches the update step number I, the network parameters of the estimation actor network are copied into the target actor network, completing the network parameter update.
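The S4.2-S4.3 updates correspond to the standard DDPG critic and actor steps; the following PyTorch sketch shows one such update under that assumption, together with the hard target-network copy performed every I steps. The network, optimizer and batch objects are assumed to exist (the actor and critic sketches appear in embodiment two below).

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99):
    """One training step on a sampled batch (S, A, R, S_)."""
    S, A, R, S_next = batch   # tensors: (B,1,H,W), (B,N), (B,1), (B,1,H,W)

    # --- critic update: minimize (y - Q(S, A))^2, eqs. (10)-(11) ---
    with torch.no_grad():
        y = R + gamma * target_critic(S_next, target_actor(S_next))
    critic_loss = F.mse_loss(critic(S, A), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # --- actor update: ascend Q(S, mu(S)) via the deterministic policy gradient ---
    actor_loss = -critic(S, actor(S)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

def hard_update(target, source):
    """Overwrite target network parameters with estimation network parameters."""
    target.load_state_dict(source.state_dict())
```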
In step S5, as training continues, the reward R gradually converges to its optimal value. The invention records the change in the mean of R over ζ steps; when this change is sufficiently small, training is considered to have converged, the algorithm is stopped, and the finally output strategy is taken as the final anti-jamming strategy. The convergence decision criterion is:
$$\left|\,\frac{1}{\zeta}\sum_{i=t-\zeta+1}^{t} R_i \;-\; \frac{1}{\zeta}\sum_{i=t-2\zeta+1}^{t-\zeta} R_i\,\right| < \upsilon$$
where υ is the termination threshold that determines convergence, set to a very small positive value.
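A small sketch of this convergence test, comparing the mean reward of the two most recent ζ-step windows; the sliding-window form is an assumption consistent with the criterion above.

```python
import numpy as np

def has_converged(rewards, zeta=100, upsilon=1e-3):
    """True when the mean reward change between consecutive windows is below upsilon."""
    if len(rewards) < 2 * zeta:
        return False
    recent = np.mean(rewards[-zeta:])
    previous = np.mean(rewards[-2 * zeta:-zeta])
    return abs(recent - previous) < upsilon
```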
Embodiment two
The structure of the convolutional neural network proposed by the invention for anti-jamming decisions is shown in Fig. 2. In the simulation, the system is assumed to be divided into 128 subchannels, and a 128 × 128 spectrum-time slot state matrix constructed from the spectrum samples serves as the input of the convolutional neural network; three convolutional layers, two pooling layers and two fully connected layers then output a 1 × 128 power vector. Concretely, the convolution and pooling operations in the convolutional neural network are as follows:
Assume the input data of the convolution operation is I and the corresponding convolution kernel is K, with a dimensionality matching that of the input data. Taking three-dimensional input data as an example (when the input data is two-dimensional, the third dimension can be regarded as 1), the convolution operation requires the third dimension of the kernel K to be identical to the third dimension of the input data I. Denoting the three kernel dimensions by $w_1, w_2, w_3$, the output after the convolution operation is:
$$S(i,j) = (I * K)(i,j) = \sum_{a=1}^{w_1}\sum_{b=1}^{w_2}\sum_{c=1}^{w_3} I(i+a-1,\, j+b-1,\, c)\, K(a, b, c)$$
Pooling operations in convolutional neural networks generally include max pooling and mean pooling, computed over a pooling window W as follows:
Mean pooling: $S(i,j) = \frac{1}{|W|}\sum_{(a,b)\in W} I(i+a,\, j+b)$
Max pooling: $S(i,j) = \max_{(a,b)\in W} I(i+a,\, j+b)$
Max pooling is used in the invention.
Specifically, the structure of each layer in this embodiment is shown in Fig. 2; each layer is described in detail below:
The first layer of the convolutional neural network is the input layer, whose input size is determined by the number of subchannels and the observation slot length. In the network model, the usable spectrum is divided into 128 subchannels and the observation slot length is 128, so the input state matrix dimension is 128 × 128.
The second layer of the convolutional neural network consists of a convolution, a ReLU activation function and a pooling operation. Specifically, the state matrix from the input layer first passes through a convolution with kernel size 3 × 3, 20 kernels and stride 1, with ReLU as the activation function. The output dimension after this operation is 126 × 126 × 20. The ReLU activation is:
$$y = \max\{0, x\} \qquad (17)$$
This output then undergoes a max pooling operation with pooling size 2 × 2. After the convolution and pooling operations of this layer, the output dimension is 63 × 63 × 20.
The third layer of the convolutional network applies a convolution to the output of the second layer's convolution and pooling operations, producing a 31 × 31 × 30 output. The kernel size is 3 × 3, the number of kernels is 30, the activation function is ReLU, and the convolution stride is 2.
The fourth layer of the convolutional network applies a convolution to the output of the third layer, with kernel size 4 × 4, 30 kernels and stride 2, and zero-padding of 1 on the $w_1, w_2$ dimensions. The output dimension after this convolution is 15 × 15 × 30. The output of the convolution then undergoes a max pooling operation with pooling size 3 × 3; the output dimension after pooling is 5 × 5 × 30.
The fifth layer of the convolutional network is a fully connected layer with 1024 neurons and ReLU activation. The 5 × 5 × 30 output of the fourth layer is reshaped into a vector of dimension 1 × 750, and a vector of dimension 1 × 1024 is output after this fully connected layer.
The sixth layer of the convolutional network is a fully connected layer with 128 neurons and ReLU activation. The output of the fifth layer is processed by this fully connected layer into the output vector whose dimension matches the anti-jamming strategy set; the output dimension is 1 × 128.
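Assembling the layer dimensions above gives the following PyTorch sketch of the actor network. It is an illustrative reconstruction that reproduces the stated tensor shapes (126×126×20 → 63×63×20 → 31×31×30 → 15×15×30 → 5×5×30 → 750 → 1024 → 128), not the patented code.

```python
import torch.nn as nn

class ActorCNN(nn.Module):
    """Actor network following the layer sizes in embodiment two."""

    def __init__(self, n_channels=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=3, stride=1), nn.ReLU(),   # 126x126x20
            nn.MaxPool2d(2),                                        # 63x63x20
            nn.Conv2d(20, 30, kernel_size=3, stride=2), nn.ReLU(),  # 31x31x30
            nn.Conv2d(30, 30, kernel_size=4, stride=2, padding=1),  # 15x15x30
            nn.ReLU(),
            nn.MaxPool2d(3),                                        # 5x5x30
        )
        self.head = nn.Sequential(
            nn.Flatten(),                             # 5*5*30 = 750
            nn.Linear(750, 1024), nn.ReLU(),
            nn.Linear(1024, n_channels), nn.ReLU(),   # 1x128 power vector
        )

    def forward(self, state):                         # state: (B, 1, 128, 128)
        return self.head(self.features(state))
```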
Fig. 3 shows the multilayer neural network structure used to realize the estimation critic network and the target critic network. The first layer is the input layer, with dimension 128 × (128 + 1), containing the state matrix $S_t$ representing the channel power information and the action vector $A_t$ representing the strategy. The second layer is neural layer 1, with 1024 neurons, output dimension 1024 × 1 and ReLU activation. The third layer is neural layer 2, with 128 neurons, output dimension 128 × 1 and ReLU activation. The fourth layer is neural layer 3, with 32 neurons, output dimension 32 × 1 and ReLU activation. The fifth layer is neural layer 4, with a single neuron, which outputs the Q value used to evaluate the quality of the actor network's strategy selection.
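Correspondingly, a PyTorch sketch of the critic of Fig. 3, under the assumption that the 128 × (128+1) input is realized by flattening the state matrix and concatenating the action vector as an extra column:

```python
import torch
import torch.nn as nn

class CriticMLP(nn.Module):
    """Critic network following Fig. 3: state matrix plus action vector in, Q value out."""

    def __init__(self, window=128, n_channels=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window * (n_channels + 1), 1024), nn.ReLU(),
            nn.Linear(1024, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 1),                     # scalar Q value
        )

    def forward(self, state, action):
        # state: (B, 1, 128, 128) or (B, 128, 128); action: (B, 128)
        s = state.flatten(start_dim=1)            # (B, 128*128)
        x = torch.cat([s, action], dim=1)         # (B, 128*(128+1))
        return self.net(x)
```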
Further, Fig. 4 illustrates the anti-jamming performance of the continuous power selection based on deep deterministic policy gradient reinforcement learning in the invention. The figure compares the performance of the random power selection strategy, the DQN-based discrete power selection strategy, the continuous power selection strategy based on the deep deterministic policy gradient proposed by the invention, and the ideal optimal power selection strategy. As can be seen from the figure, the reward of the algorithm proposed in the invention is greatly improved compared with the random power selection strategy.

Claims (3)

1. A communication anti-jamming method based on deep deterministic policy gradient reinforcement learning, characterized by comprising the following steps:
S1, initialization definitions, comprising:
Interference environment: the interference environment is defined according to the number of jammers, the jamming patterns and the wireless channel model;
Interference environment state: the spectrum information measured at different time slots forms the spectrum-time slot matrix, whose size is determined by the observed spectrum range and the observation slot length;
Reward function: a feedback reward function is constructed according to the legitimate user's communication quality indicators;
Anti-jamming strategy: the combination of transmit powers on the different subchannels is defined as the anti-jamming strategy set;
Deep neural networks: four deep neural networks are constructed, namely a target actor, an estimation actor, a target critic and an estimation critic, wherein the target actor network and the estimation actor network have the same network structure, and the target critic network and the estimation critic network have the same network structure;
Experience storage pool: an experience storage pool of fixed size is preset for storing experience tuples composed of the current environment state, the current anti-jamming strategy, the reward and the next environment state;
S2, pass the interference environment state, i.e. the spectrum-time slot matrix, through the target actor convolutional neural network to obtain an anti-jamming strategy, apply the strategy to the interference environment, and observe, according to the reward function, the reward value of the current anti-jamming strategy in the interference environment and the state matrix after the next transition; the output of the target critic network is used to evaluate the quality of the actor network's strategy selection;
S3, store the experience tuple formed by the current anti-jamming strategy, the interference environment state, the reward value of the anti-jamming strategy and the next environment state into the experience pool;
S4, sample experience tuples from the experience pool to train the estimation actor network and the estimation critic network; when the number of training steps reaches a preset value, overwrite the target actor network parameters with the estimation actor network parameters and the target critic network parameters with the estimation critic network parameters, completing the parameter update of the target networks;
S5, judge whether the learning mechanism meets the preset stop condition; if so, stop learning and obtain the final anti-jamming strategy; otherwise return to S2 and continue learning.
2. The communication anti-jamming method based on deep deterministic policy gradient reinforcement learning according to claim 1, characterized in that the reward function in step S1 is:
$$R_t = \sum_{m=1}^{N} \log_2\!\left(1 + \frac{p_t^m\, |h_t^m|^2}{\sum_{j=1}^{J} p_j\, f(f_j, m)\, |h_{j,t}^m|^2 + \sigma_m^2}\right) - \lambda \sum_{m=1}^{N} p_t^m$$
where $m \in \{1, \ldots, N\}$ is the channel index and N is the number of channels; $p_j$ is the jamming power of interference source j on channel $f_j$, $j \in \{1, \ldots, J\}$ is the interference source index and J is the number of interference sources; t is the time index; $h_t^m$ denotes the channel between the legitimate communication users and $p_t^m$ is the subchannel transmit power; the function $f(f_j, m)$ outputs 1 when $f_j = m$ and 0 otherwise; and $\lambda \sum_{m} p_t^m$ is the transmit power overhead.
3. The communication anti-jamming method based on deep deterministic policy gradient reinforcement learning according to claim 2, characterized in that, in step S4, the method for updating the convolutional neural network parameters is:
the convolutional neural network parameters are trained by passing the current state and the next state in the drawn experience tuples through the convolutional neural networks to obtain the corresponding state-action values, constructing the corresponding loss function, and updating the network parameters by minimizing the loss function.
CN201811129485.9A 2018-09-27 2018-09-27 Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning Expired - Fee Related CN109302262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811129485.9A CN109302262B (en) 2018-09-27 2018-09-27 Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811129485.9A CN109302262B (en) 2018-09-27 2018-09-27 Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning

Publications (2)

Publication Number Publication Date
CN109302262A true CN109302262A (en) 2019-02-01
CN109302262B CN109302262B (en) 2020-07-10

Family

ID=65164716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811129485.9A Expired - Fee Related CN109302262B (en) 2018-09-27 2018-09-27 Communication anti-jamming method based on deep deterministic policy gradient reinforcement learning

Country Status (1)

Country Link
CN (1) CN109302262B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104581738A (en) * 2015-01-30 2015-04-29 厦门大学 Cognitive radio hostile interference resisting method based on Q learning
CN104994569A (en) * 2015-06-25 2015-10-21 厦门大学 Multi-user reinforcement learning-based cognitive wireless network anti-hostile interference method
US20190004518A1 (en) * 2017-06-30 2019-01-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and system for training unmanned aerial vehicle control model based on artificial intelligence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Guoan Han: "Two-dimensional anti-jamming communication based on deep reinforcement learning", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
Xin Liu: "A heterogeneous information fusion deep reinforcement learning for intelligent frequency selection of HF communication", China Communications *
Xin Liu: "Anti-Jamming Communications Using Spectrum Waterfall: A Deep Reinforcement Learning Approach", IEEE Communications Letters *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109861720A (en) * 2019-03-15 2019-06-07 中国科学院上海高等研究院 WSN anti-interference method, device, equipment and medium based on intensified learning
CN110113418A (en) * 2019-05-08 2019-08-09 电子科技大学 A kind of collaboration buffering updating method of Che Lian information centre network
CN110611619A (en) * 2019-09-12 2019-12-24 西安电子科技大学 Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN110611619B (en) * 2019-09-12 2020-10-09 西安电子科技大学 Intelligent routing decision method based on DDPG reinforcement learning algorithm
CN110944354A (en) * 2019-11-12 2020-03-31 广州丰石科技有限公司 Base station interference monitoring method and system based on waveform analysis and deep learning
CN111181618A (en) * 2020-01-03 2020-05-19 东南大学 Intelligent reflection surface phase optimization method based on deep reinforcement learning
CN111526592A (en) * 2020-04-14 2020-08-11 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN111526592B (en) * 2020-04-14 2022-04-08 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN111835453B (en) * 2020-07-01 2022-09-20 中国人民解放军空军工程大学 Communication countermeasure process modeling method
CN111835453A (en) * 2020-07-01 2020-10-27 中国人民解放军空军工程大学 Communication countermeasure process modeling method
CN112087749A (en) * 2020-08-27 2020-12-15 华北电力大学(保定) Cooperative active eavesdropping method for realizing multiple listeners based on reinforcement learning
CN112087749B (en) * 2020-08-27 2023-06-02 华北电力大学(保定) Cooperative active eavesdropping method for realizing multiple listeners based on reinforcement learning
CN112188004A (en) * 2020-09-28 2021-01-05 精灵科技有限公司 Obstacle call detection system based on machine learning and control method thereof
CN112202527A (en) * 2020-10-01 2021-01-08 西北工业大学 Intelligent electromagnetic signal identification system interference method based on momentum gradient disturbance
CN112202527B (en) * 2020-10-01 2022-09-13 西北工业大学 Intelligent electromagnetic signal identification system interference method based on momentum gradient disturbance
CN112492691A (en) * 2020-11-26 2021-03-12 辽宁工程技术大学 Downlink NOMA power distribution method of deep certainty strategy gradient
CN112492691B (en) * 2020-11-26 2024-03-26 辽宁工程技术大学 Downlink NOMA power distribution method of depth deterministic strategy gradient
CN114696925A (en) * 2020-12-31 2022-07-01 华为技术有限公司 Channel quality assessment method and related device
CN114696925B (en) * 2020-12-31 2023-12-15 华为技术有限公司 Channel quality assessment method and related device
CN113038616A (en) * 2021-03-16 2021-06-25 电子科技大学 Frequency spectrum resource management and allocation method based on federal learning
CN113038616B (en) * 2021-03-16 2022-06-03 电子科技大学 Frequency spectrum resource management and allocation method based on federal learning
CN112906640A (en) * 2021-03-19 2021-06-04 电子科技大学 Space-time situation prediction method and device based on deep learning and readable storage medium
CN113098565A (en) * 2021-04-02 2021-07-09 甘肃工大舞台技术工程有限公司 Stage carrier communication self-adaptive frequency hopping anti-interference technology based on deep network
CN113098565B (en) * 2021-04-02 2022-06-07 甘肃工大舞台技术工程有限公司 Stage carrier communication self-adaptive frequency hopping anti-interference method based on deep network
CN113221454A (en) * 2021-05-06 2021-08-06 西北工业大学 Electromagnetic radiation source identification method based on deep reinforcement learning
CN113221454B (en) * 2021-05-06 2022-09-13 西北工业大学 Electromagnetic radiation source identification method based on deep reinforcement learning
CN113411099A (en) * 2021-05-28 2021-09-17 杭州电子科技大学 Double-change frequency hopping pattern intelligent decision method based on PPER-DQN
CN113411099B (en) * 2021-05-28 2022-04-29 杭州电子科技大学 Double-change frequency hopping pattern intelligent decision method based on PPER-DQN
CN113890564A (en) * 2021-08-24 2022-01-04 浙江大学 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning
CN113890564B (en) * 2021-08-24 2023-04-11 浙江大学 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning
CN114417939B (en) * 2022-01-27 2022-06-28 中国人民解放军32802部队 Interference strategy generation method based on knowledge graph
CN114417939A (en) * 2022-01-27 2022-04-29 中国人民解放军32802部队 Interference strategy generation method based on knowledge graph

Also Published As

Publication number Publication date
CN109302262B (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN109302262A (en) A kind of communication anti-interference method determining Gradient Reinforcement Learning based on depth
Jiang et al. Deep learning for fading channel prediction
CN108777872B (en) Intelligent anti-interference method and intelligent anti-interference system based on deep Q neural network anti-interference model
Liu et al. Anti-jamming communications using spectrum waterfall: A deep reinforcement learning approach
CN109274456A (en) A kind of imperfect information intelligence anti-interference method based on intensified learning
Jiang et al. Recurrent neural networks with long short-term memory for fading channel prediction
CN111970072B (en) Broadband anti-interference system and method based on deep reinforcement learning
CN111726217B (en) Deep reinforcement learning-based autonomous frequency selection method and system for broadband wireless communication
Jiang et al. Multi-antenna fading channel prediction empowered by artificial intelligence
CN109845310A (en) The method and unit of wireless resource management are carried out using intensified learning
Jiang et al. A deep learning method to predict fading channel in multi-antenna systems
CN108712748B (en) Cognitive radio anti-interference intelligent decision-making method based on reinforcement learning
Ak et al. Avoiding jammers: A reinforcement learning approach
WO2021036414A1 (en) Co-channel interference prediction method for satellite-to-ground downlink under low earth orbit satellite constellation
Çavdar PSO tuned ANFIS equalizer based on fuzzy C-means clustering algorithm
CN108401254A (en) A kind of wireless network resource distribution method based on intensified learning
KR20210124897A (en) Method and system of channel esimiaion for precoded channel
CN113420495B (en) Active decoy type intelligent anti-interference method
Zhou et al. Deep deterministic policy gradient with prioritized sampling for power control
CN114051252A (en) Multi-user intelligent transmitting power control method in wireless access network
CN116866048A (en) Anti-interference zero-and Markov game model and maximum and minimum depth Q learning method
Zappone et al. Complexity-aware ANN-based energy efficiency maximization
Evmorfos et al. Deep actor-critic for continuous 3D motion control in mobile relay beamforming networks
Sriharipriya et al. Artifical neural network based multi dimensional spectrum sensing in full duplex cognitive radio networks
CN113747447A (en) Double-action reinforcement learning frequency spectrum access method and system based on priori knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200710

Termination date: 20210927