CN115296709A - Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning - Google Patents

Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning

Info

Publication number
CN115296709A
CN115296709A
Authority
CN
China
Prior art keywords
interference
layer
learning
function
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210785303.3A
Other languages
Chinese (zh)
Inventor
郝传辉 (Hao Chuanhui)
孙绪保 (Sun Xubao)
王胜利 (Wang Shengli)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Science and Technology
Original Assignee
Shandong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Science and Technology filed Critical Shandong University of Science and Technology
Priority to CN202210785303.3A priority Critical patent/CN115296709A/en
Publication of CN115296709A publication Critical patent/CN115296709A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0613Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B7/0615Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B7/0617Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal for beam forming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/391Modelling the propagation channel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00Baseband systems
    • H04L25/02Details ; arrangements for supplying electrical power along data transmission lines
    • H04L25/0202Channel estimation
    • H04L25/024Channel estimation channel estimation algorithms
    • H04L25/0254Channel estimation channel estimation algorithms using neural network algorithms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Power Engineering (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Abstract

The invention discloses a self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning, belonging to the technical field of navigation and comprising the following steps: constructing a GPS terminal signal model, which comprises a 2 × 2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise; constructing a deep learning convolutional neural network (CNN), which comprises a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer; carrying out reinforcement learning processing and decision implementation of the Q network on the deep learning convolutional neural network to obtain a deep reinforcement learning Q network; and training the deep reinforcement learning Q network to obtain a trained deep reinforcement learning convolutional neural network.
The invention performs beam control on the array signals and, by adjusting the manipulated variables, achieves main-lobe pointing and side-lobe nulling of interference signals; the deep Q network learns the interference data characteristics and automatically determines the next action to execute without human intervention, which greatly improves the autonomy of the smart antenna.

Description

Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning
Technical Field
The invention discloses a self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning, and belongs to the technical field of navigation.
Background
Adaptive anti-interference beam forming is one of the main research topics of smart-antenna anti-interference technology. It is widely applied in transportation, surveying and mapping, telecommunications, water conservancy, fishery, natural disaster relief, aerospace and other fields, and has very high commercial value. Current adaptive anti-interference beam forming techniques adjust the weighting factor of each array element signal according to a rule-based algorithm, thereby adjusting the radiation pattern of the antenna array so as to enhance the desired signal and suppress interference signals. However, in practical wireless channel transmission, because of the complex electromagnetic environment and dynamically varying interference, array beam forming suffers from non-uniform dispersion imbalance, multi-directional anisotropy and non-deterministic variation. These adverse factors distort the wavefront of the antenna array and produce an angle-spreading phenomenon, so that beam components are scattered along the transmission channel direction of an interference source and give rise to variable virtual interference signals, i.e. spatial-domain superposition interference. Research on array beam forming algorithms for wavefront distortion caused by dynamic interference is still relatively scarce; with the development of artificial intelligence, however, machine-learning anti-interference strategies have been applied to beam forming, in which the desired signal under wavefront distortion is modeled, learned and constrained in a dynamic interference environment.
Related publications include CN202110568887.4, a fast adaptive anti-interference method for large arrays based on a convolutional neural network, which uses a convolutional neural network to address the large computational load and poor beam-shape preservation of adaptive beam forming for large phased arrays in the prior art, and controls the beam forming to achieve anti-interference. Other work studies machine-learning methods for anti-interference, for example: [1] Z. Xiao, B. Gao, S. Liu and L. Xiao, "Learning-Based Power Control for mmWave Massive MIMO Against Jamming," IEEE Global Communications Conference (GLOBECOM), pp. 1-6, 2018, in which Xiao et al. adopt a DQN learning method to improve the sum rate of an anti-interference system in an unknown, low-complexity environment; [2] H. Yang, Z. Xiong, J. Zhao, D. Niyato, L. Xiao and Q. Wu, "Deep Reinforcement Learning-Based Intelligent Reflecting Surface for Secure Wireless Communications," IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 375-388, Jan. 2021; and a deep reinforcement learning approach in IEEE Communications Letters, vol. 22, no. 5, pp. 998-1001, May 2018, which realizes an anti-interference beam forming strategy through a DRL algorithm that obtains the optimal convex solution of the anti-interference beam and maximizes the system sum rate. Learning-based anti-interference decisions have two limitations: i) part of the information in the direction of the interfering beam may be lost because the environment is unknown; and ii) although the anti-interference strategy is switched intelligently according to the dynamic environment, the interference signal is difficult to track in real time.
Reinforcement learning mainly comprises two types of methods: value-based and policy-based (probability-based) methods. Value-based methods optimize, through accumulated experience, the estimation functions of the action values in different states so as to obtain the optimal action control strategy; reinforcement learning has exceeded human performance in most Atari games. Deep neural networks have achieved remarkable results in computing, particularly in computer vision, where convolutional neural networks effectively extract the convolutional features of an image, and deep-neural-network-based methods have achieved excellent results in nonlinear fitting, object localization, object recognition and image semantic segmentation. However, when facing phase disturbance of the antenna array, random system errors or channel-transmission clutter, an interference source can still corrupt the array beam and produce distortion, so that the spatial superposition of interference is strengthened and the steering vector used for beam forming undergoes phase spreading, which causes a rank-bias phenomenon in the signal covariance matrix. This phenomenon is equivalent to the interference signals being split in the channel environment, which greatly consumes the degrees of freedom of the array, loses part of the information in the interference-beam direction, and severely degrades the transmission performance of the system. Therefore, eliminating wavefront distortion in practical engineering applications and establishing a stable anti-interference strategy are of great significance for maximizing the total transmission rate of the system under the array-beam spatial superposition interference model.
Disclosure of Invention
The invention provides a self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning, which solves the prior-art problem that interference signals are split in the channel environment and part of the information in the interference-beam direction is lost.
The self-adaptive anti-interference beam forming method based on the preprocessing deep reinforcement learning comprises the following steps:
S1, constructing a GPS terminal signal model, wherein the GPS terminal signal model comprises a 2 × 2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise;
S2, constructing a deep learning convolutional neural network (CNN), wherein the CNN comprises a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer;
S3, carrying out reinforcement learning processing and decision implementation of the Q network on the deep learning convolutional neural network to obtain a deep reinforcement learning Q network;
S4, training the deep reinforcement learning Q network to obtain a trained deep reinforcement learning convolutional neural network.
Preferably, in S1, the 2 × 2 dual-polarized antenna array obtains 8 polarization port gains, the manipulated variables realize beam patterns at different azimuth angles, the time-varying gain interference variables satisfy time-varying Rayleigh attenuation, and Gaussian noise is used as an interference auxiliary quantity;
the 2 × 2 dual-polarized antenna array is composed of a double-layer substrate of "Rogers RO3010" material, with a relative dielectric constant of 10.2 and a dielectric loss tangent of 0.0035.
Preferably, in S1, the manipulated variables are specified for the i-th pitch and azimuth of the downlink navigation transmission 1# link; their expressions are rendered as equation images in the original filing. In these expressions, θ is the azimuth angle, the second angular argument is the initial pitch angle, and the quantity they parameterize is the beam signal steering vector; diag() denotes a diagonal matrix, exp() the exponential function, j the imaginary unit, ω the angular frequency, sin()·cos() a product of trigonometric functions, and R = d/λ is the ratio of the array element spacing d to the resonant wavelength λ.
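Because the manipulated-variable and steering-vector expressions survive only as images, the following NumPy sketch shows one conventional way such a steering vector and its diagonal manipulated-variable matrix could be formed; the planar-array phase model, the element indexing and the names steering_vector and Phi_1x are illustrative assumptions, not the filing's exact formula.

```python
import numpy as np

def steering_vector(theta, phi, R):
    """Illustrative steering vector for a 2 x 2 planar array.

    theta : azimuth angle (rad); phi : initial pitch angle (rad);
    R     : ratio of array element spacing d to the resonant wavelength (R = d / lambda).
    The common planar-array phase term exp(j*2*pi*R*(m*sin(theta)*cos(phi) + n*sin(theta)*sin(phi)))
    is assumed here; the filing's exact expression is given only as an image.
    """
    phases = [2.0 * np.pi * R * (m * np.sin(theta) * np.cos(phi) + n * np.sin(theta) * np.sin(phi))
              for m in range(2) for n in range(2)]
    return np.exp(1j * np.array(phases))          # 4-element steering vector

# Manipulated variable built as a diagonal matrix from the steering vector (diag() in the text).
a = steering_vector(np.deg2rad(30.0), np.deg2rad(10.0), R=0.5)
Phi_1x = np.diag(a)
```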
Preferably, in S1, the time-varying gain interference variable and the Gaussian noise are the interference quantities of the total GPS received-signal model on the downlink navigation transmission 3# and 4# links; the structural formula of the time-varying gain interference variable is rendered as an equation image in the original filing, in which one term is an interference quantity with a structure similar to that of the desired signal and E_j H_t is the Rayleigh-fading interference quantity at time t. The total GPS signal model is then obtained from the downlink navigation transmission links (also rendered as an equation image), in which the left-hand side is the overall received time-varying GPS signal, one term is the expected GPS signal on the 1# link, two terms are the manipulated variables of the expected GPS signal on the 1# and 2# links respectively, another term is the manipulated variable of the interference signals on the 3# and 4# links, and n(t) is a Gaussian noise signal. The manipulated variables take the form of an IRS sensor phase-keying N × 1 matrix (equation image in the original filing), where i = 1x, 2x or j, j is the imaginary unit, and the amplitude and phase of each IRS phase-keying unit satisfy η ∈ [0, 1] and φ ∈ [0, 2π].
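As a concrete illustration of the total received-signal model described above (whose exact formula is an equation image), the following NumPy sketch sums the terms named in the text: manipulated desired 1#/2# link signals, Rayleigh-faded 3#/4# link interference, and Gaussian noise n(t). The port count N, the function names and the random draws are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4   # number of ports used in this toy example (the filing's exact dimensioning is not recoverable)

def irs_phase_matrix(eta, phi):
    """IRS phase-keying N x 1 vector with amplitudes eta in [0, 1] and phases phi in [0, 2*pi)."""
    return (eta * np.exp(1j * phi)).reshape(-1, 1)

def received_signal(s1, s2, jam, Phi_1x, Phi_2x, Phi_j, noise_std=0.1):
    """Sum of the terms named in the text: manipulated desired 1#/2# link signals,
    manipulated 3#/4# link interference, and Gaussian noise n(t)."""
    n_t = noise_std * (rng.standard_normal((N, 1)) + 1j * rng.standard_normal((N, 1))) / np.sqrt(2)
    return Phi_1x * s1 + Phi_2x * s2 + Phi_j * jam + n_t

Phi_1x = irs_phase_matrix(rng.uniform(0.0, 1.0, N), rng.uniform(0.0, 2.0 * np.pi, N))
Phi_2x = irs_phase_matrix(rng.uniform(0.0, 1.0, N), rng.uniform(0.0, 2.0 * np.pi, N))
Phi_j = irs_phase_matrix(rng.uniform(0.0, 1.0, N), rng.uniform(0.0, 2.0 * np.pi, N))
E_jH_t = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)   # time-varying Rayleigh gain
y_t = received_signal(1.0 + 0j, 1.0 + 0j, E_jH_t * (1.0 + 0j), Phi_1x, Phi_2x, Phi_j)
```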
Preferably, in S2, the data feature extraction network layer is connected to a convolutional network layer, the convolutional network layer is connected to a pooling layer, and the pooling layer and the activation function layer are respectively connected to a full link layer;
the data feature extraction network layer is the network layer, following the input layer, that performs dimension-reduction and standardization of the data: it extracts the feature values of the input data and feeds them to the data feature extraction network, whose output is the corresponding convolutional feature value after standardization at time t;
the convolutional network layer progresses from local (regional) perception to higher-level perception regions; its convolution output formula is rendered as an equation image in the original filing, in which the operator is the Cartesian inner product and X_t, W_c and b are, respectively, the input, weight and bias variables of the convolutional layer;
the pooling layer reduces dimensionality and compresses the data to avoid overfitting, and the maximum-pooling pattern is selected to filter excessive noise: s = s_x · δ(s_x ≥ n_th), where s_x is the raw network input, δ(s_x ≥ n_th) is the impulse (indicator) function of the condition s_x ≥ n_th, and n_th is the minimum noise threshold;
the activation function layer selects the linear rectification function ReLU as the activation function to realize the nonlinear conversion from convolution pooling to the full link layer, with expression f_ReLu(x) = max(0, x), where f_ReLu(x) denotes the ReLU activation function and max(0, x) is the maximum of 0 and x;
the full link layer combines all features of the previous layer's nodes at time t: Y_t = W_f · X_t and Y_{t+1} = W_f · X_{t+1} = Y_t + W_f · ΔX_{t+1}, where Y_t and Y_{t+1} are the output data at times t and t+1, W_f is the weight vector of the full link layer, and ΔX_{t+1} = X_{t+1} − X_t is the error variable between the input data X at times t and t+1; by the recursive property of the Markov decision process, ΔX_{t+1} = {X_{t+1} − X_t, X_t − X_{t−1}, …, X_{t−T+2} − X_{t−T+1}}, so a change of the input information only affects times t and t+1 and remains unchanged at time t−1.
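The layer operations spelled out above in closed form can be checked with a short NumPy sketch; the array sizes and names below are illustrative, while the threshold gate s = s_x·δ(s_x ≥ n_th), the rectifier f_ReLu(x) = max(0, x) and the incremental full-link update Y_{t+1} = Y_t + W_f·ΔX_{t+1} follow the text.

```python
import numpy as np

def threshold_pool(s_x, n_th):
    """Noise-gated pooling s = s_x * delta(s_x >= n_th): values below the threshold are set to zero."""
    return s_x * (s_x >= n_th)

def relu(x):
    """Linear rectification f_ReLu(x) = max(0, x)."""
    return np.maximum(0.0, x)

def full_link_update(Y_t, W_f, X_t, X_t1):
    """Incremental full-link output Y_{t+1} = Y_t + W_f . (X_{t+1} - X_t)."""
    return Y_t + W_f @ (X_t1 - X_t)

s_x = np.array([0.05, 0.4, -0.1, 0.9])
s = relu(threshold_pool(s_x, n_th=0.2))            # [0., 0.4, 0., 0.9]

W_f = 0.1 * np.ones((1, 8))                        # toy full-link weight vector
X_t, X_t1 = np.zeros((8, 1)), np.ones((8, 1))
Y_t = W_f @ X_t
Y_t1 = full_link_update(Y_t, W_f, X_t, X_t1)       # equals W_f @ X_t1
```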
Preferably, S3 comprises:
S3.1. Under the π strategy, based on the state s_t and the available action a_t at time t, a low-variance estimate of the desired Q-function is formed, and deep Q-learning is implemented with the expectation operator (both expressions are rendered as equation images in the original filing); s denotes the initial state of Q-learning, a denotes the initial learning action, and the π strategy is the mapping from the system state at time t to the available actions, expressed as the probability distribution π(s_t, a_t);
S3.2. From the state s_{t+1} and the available action a_{t+1} at time t+1, the optimal Q-value is acquired, and the desired target Q-value is obtained from the i-th weight w_i under the π strategy (equation image in the original filing), where r_t is the cost function at time t, μ is a controllable factor, and max() is the maximum solver;
S3.3. The nonlinear CNN is preprocessed by randomly selecting uniformly distributed elements so as to approximate it to the target desired value (equation image in the original filing), with reward function r(a_t) = α(a_t) − γ·δ(a_t ≠ a_{t−1}), where the reward factor is α(a_t) = δ(ξ_k(a_t) ≥ ξ_th), ξ_k(a_t) is the reward amount of the available action a_t at time t, γ is the loss-cost coefficient of the transmitted signal power, and ξ_th is the threshold set for data transmission of the GNSS transmitter (a worked sketch of this reward follows S3.6);
S3.4. For the input state, a cross-entropy error is used to prevent the gradient from vanishing each time the loss function is trained; the loss function is rendered as an equation image in the original filing, where log() is the logarithmic function;
S3.5. The gradient descent of the loss function in S3.4 is optimized at learning rate β to obtain the updated Q-function algorithm (equation image in the original filing), where ∇_t is the gradient value at time t;
S3.6. To avoid local convergence, an ε-greedy strategy is adopted to select the expected target action: the maximum Q-table value is selected when the random probability satisfies ε ≤ p ≤ 1 − ε, and the current action is selected when p ≤ ε; the ε-greedy strategy of the agent action is expressed by a formula rendered as an equation image in the original filing.
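To make steps S3.3 and S3.6 concrete, the following Python sketch implements the reward r(a_t) = α(a_t) − γ·δ(a_t ≠ a_{t−1}) with the embodiment's values γ = 0.2 and ξ_th = 15 dB, together with a conventional ε-greedy action selection; the function names, the reading of ξ_k(a_t) as an achieved quality measure, and the exploration-probability convention are illustrative assumptions rather than the filing's exact formulas.

```python
import numpy as np

rng = np.random.default_rng(1)

def reward(a_t, a_prev, xi_k, xi_th=15.0, gamma=0.2):
    """r(a_t) = alpha(a_t) - gamma * delta(a_t != a_{t-1}), with alpha(a_t) = delta(xi_k(a_t) >= xi_th).

    xi_k  : reward amount earned by action a_t (e.g. an achieved SINR in dB -- illustrative);
    xi_th : GNSS transmitter data-transmission threshold (15 dB in the embodiment);
    gamma : loss-cost coefficient of the transmitted signal power (0.2 in the embodiment).
    """
    alpha = 1.0 if xi_k >= xi_th else 0.0
    return alpha - (gamma if a_t != a_prev else 0.0)

def epsilon_greedy(q_row, explore_prob):
    """Conventional epsilon-greedy rule: explore with probability explore_prob, otherwise take the
    maximum Q-table value (the filing states the equivalent p-threshold form epsilon <= p <= 1 - epsilon)."""
    if rng.random() < explore_prob:
        return int(rng.integers(len(q_row)))   # explore: random action
    return int(np.argmax(q_row))               # exploit: maximum Q-table value

q_row = np.array([0.1, 0.7, 0.3, 0.2])
a_t = epsilon_greedy(q_row, explore_prob=0.1)
r_t = reward(a_t, a_prev=0, xi_k=16.4)         # 1.0 - 0.2 when the action changes, 1.0 otherwise
```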
preferably, S4 comprises:
S4.1. At each learning time t, the learning agent uses the CNN to preprocess the state information s_t and observes the system state at each event step in order to perform an action, the action comprising an anti-interference weight and a phase shift;
S4.2. The optimal Q-function is selected with the ε-greedy strategy to balance exploration and exploitation;
S4.3. The maximum Q-function is obtained in each time slot according to the probability range ε ≤ p ≤ 1 − ε; after executing (s_t, a_t), the obtained reward r(s_t, a_t) and the next state s_{t+1} are stored in the set D to test the sample at the next time slot t + 1;
and S4.4, feeding back the updated Q value to the CNN so as to take the next action until the loop reaches the maximum iteration number.
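Steps S4.1–S4.4 describe a standard Q-learning loop with experience storage. The condensed Python sketch below follows that loop with a toy stand-in environment; the state/action sizes, the env_step function and the tabular Q stand-in for the CNN are assumptions, while the discount μ = 0.7 and the 200-iteration limit come from the embodiment.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 8, 4
Q = np.zeros((n_states, n_actions))     # stand-in for the deep Q network output
D = []                                  # experience set D of (s_t, a_t, r_t, s_{t+1}) tuples
beta, mu, explore_prob = 0.1, 0.7, 0.1  # learning rate, discount factor, exploration probability

def env_step(s, a):
    """Toy stand-in for the GNSS anti-jamming environment (illustrative only)."""
    s_next = int(rng.integers(n_states))
    r = 1.0 if a == s % n_actions else -0.2
    return r, s_next

s = int(rng.integers(n_states))
for t in range(200):                    # maximum number of iterations (200 in the embodiment)
    a = int(rng.integers(n_actions)) if rng.random() < explore_prob else int(np.argmax(Q[s]))
    r, s_next = env_step(s, a)          # execute (s_t, a_t), obtain the reward and the next state
    D.append((s, a, r, s_next))         # store the transition in set D for the next time slot
    # feed the updated Q value back: target = r_t + mu * max_a Q(s_{t+1}, a)
    Q[s, a] += beta * (r + mu * np.max(Q[s_next]) - Q[s, a])
    s = s_next
```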
Compared with the prior art, the invention has the following beneficial effects. The invention uses deep reinforcement learning to perform beam control on the array signals, and realizes beam main-lobe pointing and side-lobe nulling of interference signals by controlling the manipulated variables. In the anti-interference process, the deep Q network learns the characteristics of the interference data to determine the next action to execute, without manual intervention, which greatly improves the autonomy of the smart antenna. The representation-learning capability of the deep convolutional network is used to perform translation-invariant classification of the input information (beam forming), the input data are max-pooled, and, thanks to convolution-kernel parameter sharing in the hidden layers and the sparsity of inter-layer connections, the convolutional neural network can process the beam-wavefront covariance matrix with a smaller amount of computation so that it approaches the target value of the Q-function. The reinforcement learning method reinforces the target Q-value, weakens the spatial superposition of the interference signals and thereby eliminates wavefront-distortion data, seeks to maximize the total transmission rate of the target anti-interference system, and reduces the consumption of the degrees of freedom of the array antenna.
Drawings
FIG. 1 is a technical flow chart of the present invention;
FIG. 2 is a schematic diagram of a deep learning convolutional neural network CNN operation according to the method of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning Q network training phase of the method of the present invention;
FIG. 4 is a graph showing comparison of the results of examples of the present invention.
Detailed Description
The following description will further illustrate embodiments of the present invention with reference to specific examples:
A method for adaptive anti-interference beam forming based on preprocessing deep reinforcement learning, as shown in FIG. 1, includes:
S1, constructing a GPS terminal signal model, wherein the GPS terminal signal model comprises a 2 × 2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise;
S2, constructing a deep learning convolutional neural network CNN, as shown in FIG. 2, comprising a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer;
S3, carrying out reinforcement learning processing and decision implementation of the Q network on the deep learning convolutional neural network to obtain a deep reinforcement learning Q network;
S4, training the deep reinforcement learning Q network, as shown in FIG. 3, to obtain a trained deep reinforcement learning convolutional neural network.
In S1, the 2 × 2 dual-polarized antenna array obtains 8 polarization port gains, the manipulated variables realize beam patterns at different azimuth angles, the time-varying gain interference variables satisfy time-varying Rayleigh attenuation, and Gaussian noise serves as an interference auxiliary quantity;
the 2 × 2 dual-polarized antenna array is composed of a double-layer substrate of "Rogers RO3010" material, with a relative dielectric constant of 10.2 and a dielectric loss tangent of 0.0035.
In S1, the manipulated variables are specified for the i-th pitch and azimuth of the downlink navigation transmission 1# link; their expressions are rendered as equation images in the original filing. In these expressions, θ is the azimuth angle, the second angular argument is the initial pitch angle, and the quantity they parameterize is the beam signal steering vector; diag() denotes a diagonal matrix, exp() the exponential function, j the imaginary unit, ω the angular frequency, sin()·cos() a product of trigonometric functions, and R = d/λ is the ratio of the array element spacing d to the resonant wavelength λ.
In S1, the time-varying gain interference variable and the Gaussian noise are the interference quantities of the total GPS received-signal model on the downlink navigation transmission 3# and 4# links; the structural formula of the time-varying gain interference variable is rendered as an equation image in the original filing, in which one term is an interference quantity with a structure similar to that of the desired signal and E_j H_t is the Rayleigh-fading interference quantity at time t. The total GPS signal model is then obtained from the downlink navigation transmission links (also rendered as an equation image), in which the left-hand side is the overall received time-varying GPS signal, one term is the expected GPS signal on the 1# link, two terms are the manipulated variables of the expected GPS signal on the 1# and 2# links respectively, another term is the manipulated variable of the interference signals on the 3# and 4# links, and n(t) is a Gaussian noise signal. The manipulated variables take the form of an IRS sensor phase-keying N × 1 matrix (equation image in the original filing), where i = 1x, 2x or j, j is the imaginary unit, and the amplitude and phase of each IRS phase-keying unit satisfy η ∈ [0, 1] and φ ∈ [0, 2π].
In S2, the data feature extraction network layer is connected with a convolution network layer, the convolution network layer is connected with a pooling layer, and the pooling layer and the activation function layer are respectively connected with a full link layer;
the data feature extraction network layer is the network layer, following the input layer, that performs dimension-reduction and standardization of the data: it extracts the feature values of the input data and feeds them to the data feature extraction network, whose output is the corresponding convolutional feature value after standardization at time t;
the convolutional network layer progresses from local (regional) perception to higher-level perception regions; its convolution output formula is rendered as an equation image in the original filing, in which the operator is the Cartesian inner product and X_t, W_c and b are, respectively, the input, weight and bias variables of the convolutional layer;
the pooling layer reduces dimensionality and compresses the data to avoid overfitting, and the maximum-pooling pattern is selected to filter excessive noise: s = s_x · δ(s_x ≥ n_th), where s_x is the raw network input, δ(s_x ≥ n_th) is the impulse (indicator) function of the condition s_x ≥ n_th, and n_th is the minimum noise threshold;
the activation function layer selects the linear rectification function ReLU as the activation function to realize the nonlinear conversion from convolution pooling to the full link layer, with expression f_ReLu(x) = max(0, x), where f_ReLu(x) denotes the ReLU activation function and max(0, x) is the maximum of 0 and x;
the full link layer combines all features of the previous layer's nodes at time t: Y_t = W_f · X_t and Y_{t+1} = W_f · X_{t+1} = Y_t + W_f · ΔX_{t+1}, where Y_t and Y_{t+1} are the output data at times t and t+1, W_f is the weight vector of the full link layer, and ΔX_{t+1} = X_{t+1} − X_t is the error variable between the input data X at times t and t+1; by the recursive property of the Markov decision process, ΔX_{t+1} = {X_{t+1} − X_t, X_t − X_{t−1}, …, X_{t−T+2} − X_{t−T+1}}, so a change of the input information only affects times t and t+1 and remains unchanged at time t−1.
S3 comprises the following steps:
S3.1. Under the π strategy, based on the state s_t and the available action a_t at time t, a low-variance estimate of the desired Q-function is formed, and deep Q-learning is implemented with the expectation operator (both expressions are rendered as equation images in the original filing); s denotes the initial state of Q-learning, a denotes the initial learning action, and the π strategy is the mapping from the system state at time t to the available actions, expressed as the probability distribution π(s_t, a_t);
S3.2. From the state s_{t+1} and the available action a_{t+1} at time t+1, the optimal Q-value is acquired, and the desired target Q-value is obtained from the i-th weight w_i under the π strategy (equation image in the original filing), where r_t is the cost function at time t, μ is a controllable factor, and max() is the maximum solver;
S3.3. The nonlinear CNN is preprocessed by randomly selecting uniformly distributed elements so as to approximate it to the target desired value (equation image in the original filing), with reward function r(a_t) = α(a_t) − γ·δ(a_t ≠ a_{t−1}), where the reward factor is α(a_t) = δ(ξ_k(a_t) ≥ ξ_th), ξ_k(a_t) is the reward amount of the available action a_t at time t, γ is the loss-cost coefficient of the transmitted signal power, and ξ_th is the threshold set for data transmission of the GNSS transmitter;
S3.4. For the input state, a cross-entropy error is used to prevent the gradient from vanishing each time the loss function is trained; the loss function is rendered as an equation image in the original filing, where log() is the logarithmic function;
S3.5. The gradient descent of the loss function in S3.4 is optimized at learning rate β to obtain the updated Q-function algorithm (equation image in the original filing), where ∇_t is the gradient value at time t;
S3.6. To avoid local convergence, an ε-greedy strategy is adopted to select the expected target action: the maximum Q-table value is selected when the random probability satisfies ε ≤ p ≤ 1 − ε, and the current action is selected when p ≤ ε; the ε-greedy strategy of the agent action is expressed by a formula rendered as an equation image in the original filing.
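Steps S3.4 and S3.5 above use a cross-entropy error and a gradient-descent update whose exact expressions are equation images in the original filing. The sketch below shows one conventional reading of them; the softmax normalisation of the Q outputs and the function names are assumptions, not the filing's formulas.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_loss(q_pred, target_action):
    """Cross-entropy error -log p(target_action) over softmax-normalised Q outputs
    (an assumed reading of the loss in S3.4, used to prevent vanishing gradients)."""
    p = softmax(q_pred)
    return -np.log(p[target_action] + 1e-12)

def gradient_step(w, grad, beta=0.1):
    """Plain gradient-descent update at learning rate beta, standing in for the updated Q-function rule of S3.5."""
    return w - beta * grad

loss = cross_entropy_loss(np.array([0.2, 1.5, -0.3]), target_action=1)
```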
S4 comprises the following steps:
S4.1. At each learning time t, the learning agent uses the CNN to preprocess the state information s_t and observes the system state at each event step in order to perform an action, the action comprising an anti-interference weight and a phase shift;
S4.2. The optimal Q-function is selected with the ε-greedy strategy to balance exploration and exploitation;
S4.3. The maximum Q-function is obtained in each time slot according to the probability range ε ≤ p ≤ 1 − ε; after executing (s_t, a_t), the obtained reward r(s_t, a_t) and the next state s_{t+1} are stored in the set D to test the sample at the next time slot t + 1;
and S4.4, feeding back the updated Q value to the CNN so as to take the next action until the loop reaches the maximum iteration number.
The data calculation results are shown in FIG. 4. The simulation environment is the TensorFlow v1 and Keras learning libraries on Python v3.6.6 (Win64), and the simulation analysis is performed on a computer with a 4-core Intel(R) Core(TM) i5-6500 CPU, a 1258 GPU and 8 GB of memory. In addition, Ansoft HFSS 15.0 software was used to simulate the 2 × 2 dual-polarized GNSS smart antenna array. All experimental procedures were run in PyCharm 2018 to evaluate the anti-interference performance of the proposed DRL.
The convolutional neural network uses the Adam method to update the network parameters; the convolution depth is 32 and the deviation number is 8 bits; the maximum-pooling settings are: pooling layer 1, tensor 1, padding = 'VALID'; the input data are reshaped so that the number of samples is 1024 and each sample is an 8 × 1 matrix; the initial learning rate is 0.01, and every time the training loss stops decreasing the learning rate is divided by 2; the number of training iterations is 200; the mini-batch size is 32; the number of hidden-layer nodes is set to 1024.
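A tf.keras reconstruction of this network under the stated settings might look as follows; the 8 × 1 input, 1 × 3 kernels, 'VALID' max pooling, 1024 hidden nodes, Adam optimiser, mini-batch of 32, 200 iterations and halve-on-plateau schedule follow the listed figures, while the output width and the mse loss are assumptions (the filing trains against a cross-entropy loss).

```python
import tensorflow as tf

# Illustrative reconstruction of the embodiment's CNN; dimensions follow the listed settings,
# other details (output width, mse loss) are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=32, kernel_size=3, activation="relu", input_shape=(8, 1)),
    tf.keras.layers.MaxPooling1D(pool_size=1, padding="valid"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu"),   # hidden-layer nodes = 1024
    tf.keras.layers.Dense(4),                         # one Q value per candidate action (assumed)
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss="mse")

# Divide the learning rate by 2 whenever the training loss stops decreasing.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5, patience=1)
# model.fit(x, y, batch_size=32, epochs=200, callbacks=[reduce_lr])
```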
Simulation parameters: the learning rate λ = 0.1, the maximum ε-greedy value ε = 0.9, the ε increment coefficient Δε = 0.1, the discount coefficient in the reward function μ = 0.7, the consumption-cost coefficient γ = 0.2, and the threshold ξ_th = 15 dB; the Rayleigh fading gain parameters are set to λ = 1, f_c = 1.5 GHz, c = 3 × 10^8 m/s, α = 2.8 and d_l = 2 m; in the channel loss L (dB), PL_0 = 30 dB, d_0 = 1 m and γ = 0.8. In addition, the number of CNN convolution kernels is set to 5, where the convolutional layer (CL) has 8 kernels of size 1 × 3, the fully connected layer (FCL) has 32 neurons, and maximum pooling (PL) is used. To compare the performance of the proposed method, the following three methods were set:
i) The state space of the GNSS transmission system is optimized by a neural-network-based RL method (Q-learning), where the neural network uses 8 neurons, 1 ReLU layer and 1 output layer (named reinforcement learning);
ii) A greedy learning method without Q-learning, which uses a CNN and an ε-greedy strategy to optimize the 1# transmit power sent by the GNSS dual-polarized antenna array to the user (named greedy learning);
iii) A minimum-variance distortionless adaptive anti-interference beam forming method (spatial filtering), in which the weight vector is adaptively changed for the received data in a minimum variance distortionless response manner so that the average power output by the array is minimized (named minimum distortionless response).
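For reference, baseline iii) follows the classical minimum variance distortionless response beamformer, whose standard weight formula w = R⁻¹a / (aᴴR⁻¹a) can be sketched as follows; the snapshot generation and look direction are illustrative, and this is the textbook MVDR form rather than code taken from the filing.

```python
import numpy as np

def mvdr_weights(R, a):
    """Classical MVDR beamformer w = R^{-1} a / (a^H R^{-1} a): minimise the average array output
    power subject to a distortionless response toward the steering vector a."""
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj().T @ Ri_a)

rng = np.random.default_rng(3)
X = (rng.standard_normal((4, 256)) + 1j * rng.standard_normal((4, 256))) / np.sqrt(2)  # array snapshots
R = (X @ X.conj().T) / X.shape[1]                                                      # sample covariance
a = np.exp(1j * np.pi * np.arange(4) * np.sin(np.deg2rad(20.0))).reshape(-1, 1)        # look direction
w = mvdr_weights(R, a)
y = w.conj().T @ X                                                                     # beamformer output
```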
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (7)

1. The self-adaptive anti-interference beam forming method based on the preprocessing deep reinforcement learning is characterized by comprising the following steps of:
S1, constructing a GPS terminal signal model, wherein the GPS terminal signal model comprises a 2 × 2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise;
S2, constructing a deep learning convolutional neural network (CNN), wherein the CNN comprises a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer;
S3, carrying out reinforcement learning processing and decision implementation of the Q network on the deep learning convolutional neural network to obtain a deep reinforcement learning Q network;
S4, training the deep reinforcement learning Q network to obtain a trained deep reinforcement learning convolutional neural network.
2. The adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning of claim 1, wherein in S1, the 2 × 2 dual-polarized antenna array obtains 8 polarization port gains, the manipulated variables realize beam patterns at different azimuth angles, the time-varying gain interference variables satisfy time-varying Rayleigh attenuation, and Gaussian noise is used as an interference auxiliary quantity;
the 2 × 2 dual-polarized antenna array is composed of a double-layer substrate made of Rogers RO3010 material, with a relative dielectric constant of 10.2 and a dielectric loss tangent of 0.0035.
3. The adaptive interference rejection beamforming method based on preprocessing deep reinforcement learning of claim 2, wherein in S1, the manipulated variables are specified for the i-th pitch and azimuth of the downlink navigation transmission 1# link; their expressions are rendered as equation images in the original filing. In these expressions, θ is the azimuth angle, the second angular argument is the initial pitch angle, and the quantity they parameterize is the beam signal steering vector; diag() denotes a diagonal matrix, exp() the exponential function, j the imaginary unit, ω the angular frequency, sin()·cos() a product of trigonometric functions, and R = d/λ is the ratio of the array element spacing d to the resonance wavelength λ.
4. The adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning as claimed in claim 3, wherein in S1, the time-varying gain interference variable and the Gaussian noise are the interference quantities of the total GPS received-signal model on the downlink navigation transmission 3# and 4# links; the structural formula of the time-varying gain interference variable is rendered as an equation image in the original filing, in which one term is an interference quantity with a structure similar to that of the desired signal and E_j H_t is the Rayleigh-fading interference quantity at time t. The total GPS signal model is then obtained from the downlink navigation transmission links (also rendered as an equation image), in which the left-hand side is the overall received time-varying GPS signal, one term is the expected GPS signal on the 1# link, two terms are the manipulated variables of the expected GPS signal on the 1# and 2# links respectively, another term is the manipulated variable of the interference signals on the 3# and 4# links, and n(t) is a Gaussian noise signal. The manipulated variables take the form of an IRS sensor phase-keying N × 1 matrix (equation image in the original filing), where i = 1x, 2x or j, j is the imaginary unit, and the amplitude and phase of each IRS phase-keying unit satisfy η ∈ [0, 1] and φ ∈ [0, 2π].
5. The adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning as claimed in claim 4, wherein in S2, the data feature extraction network layer is connected with a convolution network layer, the convolution network layer is connected with a pooling layer, and the pooling layer and the activation function layer are respectively connected with a full link layer;
the data feature extraction network layer is the network layer, following the input layer, that performs dimension-reduction and standardization of the data: it extracts the feature values of the input data and feeds them to the data feature extraction network, whose output is the corresponding convolutional feature value after standardization at time t;
the convolutional network layer progresses from local (regional) perception to higher-level perception regions; its convolution output formula is rendered as an equation image in the original filing, in which the operator is the Cartesian inner product and X_t, W_c and b are, respectively, the input, weight and bias variables of the convolutional layer;
the pooling layer reduces dimensionality and compresses the data to avoid overfitting, and the maximum-pooling pattern is selected to filter excessive noise: s = s_x · δ(s_x ≥ n_th), where s_x is the raw network input, δ(s_x ≥ n_th) is the impulse (indicator) function of the condition s_x ≥ n_th, and n_th is the minimum noise threshold;
the activation function layer selects the linear rectification function ReLU as the activation function to realize the nonlinear conversion from convolution pooling to the full link layer, with expression f_ReLu(x) = max(0, x), where f_ReLu(x) denotes the ReLU activation function and max(0, x) is the maximum of 0 and x;
the full link layer combines all features of the previous layer's nodes at time t: Y_t = W_f · X_t and Y_{t+1} = W_f · X_{t+1} = Y_t + W_f · ΔX_{t+1}, where Y_t and Y_{t+1} are the output data at times t and t+1, W_f is the weight vector of the full link layer, and ΔX_{t+1} = X_{t+1} − X_t is the error variable between the input data X at times t and t+1; by the recursive property of the Markov decision process, ΔX_{t+1} = {X_{t+1} − X_t, X_t − X_{t−1}, …, X_{t−T+2} − X_{t−T+1}}, so a change of the input information only affects times t and t+1 and remains unchanged at time t−1.
6. The adaptive interference rejection beamforming method based on preprocessing deep reinforcement learning according to claim 5, wherein S3 comprises:
S3.1. Under the π strategy, based on the state s_t and the available action a_t at time t, a low-variance estimate of the desired Q-function is formed, and deep Q-learning is implemented with the expectation operator (both expressions are rendered as equation images in the original filing); s denotes the initial state of Q-learning, a denotes the initial learning action, and the π strategy is the mapping from the system state at time t to the available actions, expressed as the probability distribution π(s_t, a_t);
S3.2. From the state s_{t+1} and the available action a_{t+1} at time t+1, the optimal Q-value is acquired, and the desired target Q-value is obtained from the i-th weight w_i under the π strategy (equation image in the original filing), where r_t is the cost function at time t, μ is a controllable factor, and max() is the maximum solver;
S3.3. The nonlinear CNN is preprocessed by randomly selecting uniformly distributed elements so as to approximate it to the target desired value (equation image in the original filing), with reward function r(a_t) = α(a_t) − γ·δ(a_t ≠ a_{t−1}), where the reward factor is α(a_t) = δ(ξ_k(a_t) ≥ ξ_th), ξ_k(a_t) is the reward amount of the available action a_t at time t, γ is the loss-cost coefficient of the transmitted signal power, and ξ_th is the threshold set for data transmission of the GNSS transmitter;
S3.4. For the input state, a cross-entropy error is used to prevent the gradient from vanishing each time the loss function is trained; the loss function is rendered as an equation image in the original filing, where log() is the logarithmic function;
S3.5. The gradient descent of the loss function in S3.4 is optimized at learning rate β to obtain the updated Q-function algorithm (equation image in the original filing), where ∇_t is the gradient value at time t;
S3.6. To avoid local convergence, an ε-greedy strategy is adopted to select the expected target action: the maximum Q-table value is selected when the random probability satisfies ε ≤ p ≤ 1 − ε, and the current action is selected when p ≤ ε; the ε-greedy strategy of the agent action is expressed by a formula rendered as an equation image in the original filing.
7. The adaptive interference rejection beamforming method based on preprocessing deep reinforcement learning according to claim 6, wherein S4 comprises:
S4.1. At each learning time t, the learning agent uses the CNN to preprocess the state information s_t and observes the system state at each event step in order to perform an action, the action comprising an anti-interference weight and a phase shift;
S4.2. The optimal Q-function is selected with the ε-greedy strategy to balance exploration and exploitation;
S4.3. The maximum Q-function is obtained in each time slot according to the probability range ε ≤ p ≤ 1 − ε; after executing (s_t, a_t), the obtained reward r(s_t, a_t) and the next state s_{t+1} are stored in the set D to test the sample at the next time slot t + 1;
S4.4. The updated Q-value is fed back to the CNN to take the next action, and the loop continues until the maximum number of iterations is reached.
CN202210785303.3A 2022-06-30 2022-06-30 Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning Pending CN115296709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210785303.3A CN115296709A (en) 2022-06-30 2022-06-30 Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210785303.3A CN115296709A (en) 2022-06-30 2022-06-30 Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115296709A true CN115296709A (en) 2022-11-04

Family

ID=83822862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210785303.3A Pending CN115296709A (en) 2022-06-30 2022-06-30 Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115296709A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116996919A (en) * 2023-09-26 2023-11-03 中南大学 Single-node multi-domain anti-interference method based on reinforcement learning
CN116996919B (en) * 2023-09-26 2023-12-05 中南大学 Single-node multi-domain anti-interference method based on reinforcement learning
CN117610317A (en) * 2024-01-19 2024-02-27 湖北工业大学 Multi-bit super-surface phase arrangement optimization method based on deep learning
CN117610317B (en) * 2024-01-19 2024-04-12 湖北工业大学 Multi-bit super-surface phase arrangement optimization method based on deep learning
CN118446119A (en) * 2024-06-19 2024-08-06 中国人民解放军国防科技大学 Terahertz flat-top beam forming method and device based on cone optimization and deep learning

Similar Documents

Publication Publication Date Title
CN115296709A (en) Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning
CN107024681A (en) MIMO radar transmit-receive combination optimization method under the conditions of not known based on clutter knowledge
CN109004970A (en) A kind of adaptive sparse array beams forming method of zero norm constraint
Wang et al. Optimal pattern synthesis of linear array and broadband design of whip antenna using grasshopper optimization algorithm
Yang et al. A learning-aided flexible gradient descent approach to MISO beamforming
He et al. GBLinks: GNN-based beam selection and link activation for ultra-dense D2D mmWave networks
CN105282761B (en) A kind of method of quick LMS Adaptive beamformers
CN115942494A (en) Multi-target safe Massive MIMO resource allocation method based on intelligent reflecting surface
Dudczyk et al. Adaptive forming of the beam pattern of microstrip antenna with the use of an artificial neural network
Jiang et al. Active sensing for two-sided beam alignment and reflection design using ping-pong pilots
Omondi et al. Variational autoencoder-enhanced deep neural network-based detection for MIMO systems
Mallipeddi et al. Near optimal robust adaptive beamforming approach based on evolutionary algorithm
CN116192206B (en) Large-scale conformal array real-time wave beam synthesis method based on generalized regression neural network
Omid et al. Deep Reinforcement Learning-Based Secure Standalone Intelligent Reflecting Surface Operation
Haider et al. GAN-based Channel Estimation for IRS-aided Communication Systems
CN117318769A (en) Beam searching method and device and electronic equipment
CN110346766B (en) Null broadening method based on sparse constraint control side lobe
Hsu et al. Memetic algorithms for optimizing adaptive linear array patterns by phase-position perturbation
Papari et al. Robust adaptive beamforming algorithm based on sampling function neural network
Elpidio et al. Comparison of evolutionary algorithms for synthesis of linear array of antennas with minimal level of sidelobe
Mallioras et al. Zero Forcing Beamforming With Sidelobe Suppression Using Neural Networks
Hao et al. Adaptive anti-jamming beamforming based on the preprocessing deep reinforcement learning for downlink navigation communication
Shelim et al. Learning wireless power allocation through graph convolutional regression networks over Riemannian manifolds
CN118118069B (en) Robust self-adaptive beam forming method based on deep expansion network
CN115102589B (en) Deep learning hybrid precoding method of terahertz large-scale MIMO system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination