CN115296709A - Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning - Google Patents
Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning
- Publication number
- CN115296709A (application number CN202210785303.3A)
- Authority
- CN
- China
- Prior art keywords
- interference
- layer
- learning
- function
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/06—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
- H04B7/0613—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
- H04B7/0615—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
- H04B7/0617—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal for beam forming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/391—Modelling the propagation channel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L25/00—Baseband systems
- H04L25/02—Details ; arrangements for supplying electrical power along data transmission lines
- H04L25/0202—Channel estimation
- H04L25/024—Channel estimation channel estimation algorithms
- H04L25/0254—Channel estimation channel estimation algorithms using neural network algorithms
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Electromagnetism (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Power Engineering (AREA)
- Filters That Use Time-Delay Elements (AREA)
Abstract
The invention discloses a self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning, belonging to the technical field of navigation and comprising the following steps: constructing a GPS terminal signal model comprising a 2×2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise; constructing a deep learning convolutional neural network (CNN) comprising a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer; applying Q-network reinforcement learning processing and decision making to the deep learning convolutional neural network to obtain a deep reinforcement learning Q network; and training the deep reinforcement learning Q network to obtain a trained deep reinforcement learning convolutional neural network. The invention performs beam control on array signals and, by adjusting the manipulated variables, realizes main-lobe pointing and sidelobe nulling of interference signals. The deep Q network learns the characteristics of the interference data and automatically decides the next action without human intervention, greatly improving the autonomy of the smart antenna.
Description
Technical Field
The invention discloses a self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning, and belongs to the technical field of navigation.
Background
Self-adaptive anti-interference beam forming is one of the main research topics in smart antenna anti-interference technology. It is widely applied in transportation, surveying and mapping, telecommunications, water conservancy, fishery, natural disaster relief, aerospace and other fields, and has very high commercial value. Current adaptive anti-interference beam forming techniques adjust the weighting factor of each array element signal according to a rule-based algorithm, thereby shaping the radiation pattern of the antenna array so as to enhance the desired signal and suppress interference signals. In practical wireless channels, however, the complex electromagnetic environment and dynamically varying interference cause non-uniform dispersion imbalance, multi-directional anisotropy and non-deterministic variation during array beam forming. These adverse factors distort the wavefront of the antenna array and produce angle spread, so that beam components are scattered along the transmission channel direction of the interference source, creating variable virtual interference signals, i.e., spatial-domain superimposed interference. Research on array beam forming algorithms for wavefront distortion caused by dynamic interference is still relatively scarce; with the development of artificial intelligence, however, machine-learning anti-interference strategies have been applied to beam forming, modeling and learning the desired signal under wavefront distortion and constraining it in a dynamic interference environment.
Related publications include CN202110568887.4, a fast adaptive anti-interference method for large arrays based on a convolutional neural network, which uses a convolutional neural network to address the large computational load and poor beam-shape preservation of adaptive beam forming in large phased arrays, thereby controlling beam forming for anti-interference purposes. Other works also study machine-learning-based anti-interference, for example: [1] Z. Xiao, B. Gao, S. Liu and L. Xiao, "Learning Based Power Control for mmWave Massive MIMO against Jamming," IEEE Global Communications Conference (GLOBECOM), pp. 1-6, 2018, in which Xiao et al. adopt a DQN learning method to improve the sum rate of an anti-jamming system in an unknown environment with low complexity; [2] H. Yang, Z. Xiong, J. Zhao, D. Niyato, L. Xiao and Q. Wu, "Deep Reinforcement Learning-Based Intelligent Reflecting Surface for Secure Wireless Communications," IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 375-388, Jan. 2021; and a deep reinforcement learning approach in IEEE Communications Letters, vol. 22, no. 5, pp. 998-1001, May 2018, which realizes an anti-interference beam forming strategy through a DRL algorithm that obtains the optimal convex solution of the anti-interference beam and maximizes the system sum rate. Such learning-based anti-interference decisions have two limitations: i) part of the information in the direction of the interfering beam may be lost in an unknown environment; and ii) although the anti-interference strategy is switched intelligently according to the dynamic environment, the interference signal is difficult to track in real time.
Reinforcement learning, also known as evaluative or reward-based learning, mainly includes two classes of methods: value-based and probability (policy)-based. Value-based methods optimize the estimate of the action-value function in different states through experience, thereby obtaining the optimal action control strategy; reinforcement learning has exceeded human performance in most Atari games. Deep neural networks have achieved remarkable results in computing, especially in computer vision, where convolutional neural networks effectively extract the convolutional features of an image and deep-network-based methods achieve excellent results in nonlinear fitting, object localization, object recognition and semantic segmentation. However, in the presence of antenna array phase disturbance, random system errors, or channel transmission clutter, an interference source can still corrupt the array beam and introduce distortion, strengthening the spatial superposition of interference; the beam forming steering vector then undergoes phase spreading, and the signal covariance matrix exhibits rank bias. This is equivalent to the interference signal splitting in the channel environment, which heavily consumes the array degrees of freedom, loses part of the information in the interference beam direction, and severely degrades the transmission performance of the system. Therefore, eliminating wavefront distortion in practical engineering applications and establishing a stable anti-interference strategy are of great significance for maximizing the total transmission rate of the system under the array spatial-superposition interference model.
Disclosure of Invention
The invention provides a self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning, which solves the prior-art problem that interference signals split in the channel environment and cause partial information loss in the direction of the interference beam.
The self-adaptive anti-interference beam forming method based on the preprocessing deep reinforcement learning comprises the following steps:
S1, constructing a GPS terminal signal model, wherein the GPS terminal signal model comprises a 2×2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise;
S2, constructing a deep learning Convolutional Neural Network (CNN), wherein the CNN comprises a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer;
S3, applying Q-network reinforcement learning processing and decision making to the deep learning convolutional neural network to obtain a deep reinforcement learning Q network;
S4, training the deep reinforcement learning Q network to obtain a trained deep reinforcement learning convolutional neural network.
Preferably, in S1, the 2×2 dual-polarized antenna array obtains 8 polarization port gains, the manipulated variables realize beam patterns at different azimuth angles, the time-varying gain interference variable satisfies time-varying Rayleigh attenuation, and Gaussian noise is used as an auxiliary interference quantity;
the 2×2 dual-polarized antenna array is built on a double-layer Rogers RO3010 substrate with a relative dielectric constant of 10.2 and a dielectric loss tangent of 0.0035.
Preferably, in S1, the manipulated variables are specifically the i-th pitch and azimuth manipulated variables of the downlink navigation transmission 1# link;
in the steering-vector formula, θ and the pitch variable are respectively the azimuth angle and the initial pitch angle of the beam signal steering vector, diag() denotes a diagonal matrix, exp() the exponential function, j the imaginary unit, ω the angular frequency, sin()·cos() a product of trigonometric functions, and R = d/λ the ratio of the array element spacing d to the resonant wavelength λ.
Preferably, in S1, the time-varying gain interference variable and the Gaussian noise are the interference quantities of the total GPS received-signal model on the downlink navigation transmission 3# and 4# links; in the structural formula of the time-varying gain interference variable, one term is an interference quantity whose structure is similar to that of the desired signal, and E_j·H_t is the time-varying Rayleigh attenuation interference quantity at time t; the total GPS signal model is then obtained from the downlink navigation transmission links.
In that formula, the left-hand side is the overall received time-varying GPS signal, the first term is the desired GPS signal on the 1# link, the two associated factors are the manipulated variables of the desired GPS signal on the 1# and 2# links, a further factor is the manipulated variable of the interference signal on the 3# and 4# links, and n(t) is the Gaussian noise signal, where i = 1x, 2x or j; the phase-keying term is an IRS sensor phase-keying N×1 matrix, j is the imaginary unit, and the amplitude and phase of each IRS phase-keying unit are η ∈ [0, 1] and φ ∈ [0, 2π], respectively.
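As an illustration of how the pieces of this signal model fit together, the NumPy sketch below assembles a total received signal from a desired 1# link component weighted by steering (manipulated) variables, a time-varying Rayleigh-faded interference term shaped by an IRS phase-keying vector, and Gaussian noise; the port count, waveforms and gain values are assumptions introduced for illustration, not the exact expressions of the filing.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8          # polarization ports of the 2x2 dual-polarized array (assumed)
T = 1024       # number of time samples (assumed)

# Desired GPS signal on the 1# link (unit-power placeholder waveform).
s_des = np.exp(1j * 2 * np.pi * 0.1 * np.arange(T))

# Manipulated variables: per-port steering phases for the desired links (illustrative).
theta, phi = np.deg2rad(30.0), np.deg2rad(60.0)
R = 0.5                                      # d / lambda
port_idx = np.arange(N)
w_steer = np.exp(1j * 2 * np.pi * R * port_idx * np.sin(theta) * np.cos(phi))

# Time-varying Rayleigh attenuation E_j * H_t for the 3#/4# interference links.
E_j = 10.0                                   # interference gain (assumed)
H_t = (rng.normal(size=T) + 1j * rng.normal(size=T)) / np.sqrt(2)
s_jam = np.exp(1j * 2 * np.pi * 0.27 * np.arange(T))   # interference waveform (assumed)

# IRS phase-keying vector with amplitude eta in [0, 1] and phase in [0, 2*pi].
eta = rng.uniform(0.0, 1.0, size=N)
phi_irs = rng.uniform(0.0, 2 * np.pi, size=N)
v_irs = eta * np.exp(1j * phi_irs)

# Gaussian noise as the auxiliary interference quantity.
noise = (rng.normal(size=(N, T)) + 1j * rng.normal(size=(N, T))) * np.sqrt(0.05)

# Total received time-varying GPS signal: desired + Rayleigh-faded interference + noise.
x_total = (w_steer[:, None] * s_des[None, :]
           + v_irs[:, None] * (E_j * H_t * s_jam)[None, :]
           + noise)
print(x_total.shape)   # (8, 1024)
```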
Preferably, in S2, the data feature extraction network layer is connected to a convolutional network layer, the convolutional network layer is connected to a pooling layer, and the pooling layer and the activation function layer are respectively connected to a full link layer;
the data feature extraction network layer follows the input layer and performs dimension-reduction normalization: it extracts the feature values of the input data and feeds them into the feature extraction network, whose output is the corresponding normalized convolution feature value at time t;
the convolution network layer progresses from local (regional) perception to higher-level perception regions, with convolution output X_t ⊗ W_c + b, where ⊗ denotes the Cartesian inner product and X_t, W_c and b are respectively the input, weight and bias variables of the convolutional layer;
the pooling layer reduces dimensionality and compresses the data to avoid overfitting, and the max-pooling mode is selected to filter excessive noise, s = s_x·δ(s_x ≥ n_th), where s_x is the raw network input data, δ(s_x ≥ n_th) is the impulse (indicator) function of the condition s_x ≥ n_th, and n_th is the minimum noise threshold;
the activation function layer selects the rectified linear unit ReLU() as the activation function to realize the nonlinear mapping from convolution-pooling to the full link layer, with expression f_ReLU(x) = max(0, x), where f_ReLU(x) denotes the ReLU activation function and max(0, x) is the larger of 0 and x;
the full link layer combines all the features of the previous layer's nodes at time t: Y_t = W_f·X_t and Y_{t+1} = W_f·X_{t+1} = Y_t + W_f·ΔX_{t+1}, where Y_t and Y_{t+1} are the output data at times t and t+1, W_f is the weight vector of the full link layer, and ΔX_{t+1} = X_{t+1} − X_t is the error variable between the input data X at times t and t+1; by the recursive property of the Markov decision process, ΔX_{t+1} = {X_{t+1} − X_t, X_t − X_{t−1}, …, X_{t−T+2} − X_{t−T+1}}, so a change of the input information only affects times t and t+1 and leaves time t−1 unchanged.
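To make the layer operations above concrete, the following NumPy sketch implements the convolution-as-inner-product output, the threshold-gated max pooling, the ReLU activation, and the incremental full-link update Y_{t+1} = Y_t + W_f·ΔX_{t+1}; the vector sizes, kernel values and noise threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def conv_layer(x_t, w_c, b):
    """Convolution output: sliding inner product of the input with the kernel, plus bias."""
    k = len(w_c)
    return np.array([x_t[i:i + k] @ w_c + b for i in range(len(x_t) - k + 1)])

def max_pool_with_threshold(s_x, n_th):
    """s = s_x * delta(s_x >= n_th): keep only values above the minimum noise threshold, then max-pool."""
    return np.max(np.where(s_x >= n_th, s_x, 0.0))

def relu(x):
    """f_ReLU(x) = max(0, x)."""
    return np.maximum(0.0, x)

# Example forward pass through the three layers.
x = rng.normal(size=8)
h = relu(conv_layer(x, w_c=np.array([0.5, -0.2, 0.1]), b=0.05))
pooled = max_pool_with_threshold(h, n_th=0.1)

# Full link layer with the recursive (Markov) update Y_{t+1} = Y_t + W_f . dX_{t+1}.
W_f = rng.normal(size=8)
X_t = rng.normal(size=8)
X_t1 = rng.normal(size=8)
Y_t = W_f @ X_t
Y_t1 = Y_t + W_f @ (X_t1 - X_t)          # equals W_f @ X_t1 up to rounding
assert np.allclose(Y_t1, W_f @ X_t1)
```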
Preferably, S3 comprises:
S3.1. Under the policy π, estimate the expected Q-function Q^π(s_t, a_t) with low variance based on the state s_t and the available action a_t at time t, implementing deep Q-learning for the expectation operator, where s denotes the initial state of Q-learning and a the initial learning action; the policy π is the probability distribution π(s_t, a_t) that maps the system state at time t to the available actions;
S3.2. From the state s_{t+1} and the available action a_{t+1} at time t+1, obtain the optimal Q-value; based on the i-th weight w_i of the policy π, the desired target Q-value is r_t + μ·max Q(s_{t+1}, a_{t+1}; w_i), where r_t is the cost function at time t, μ is a controllable factor, and max() is the maximization operator;
S3.3. Preprocess the nonlinear CNN by randomly selecting uniformly distributed elements so that it approximates the target expected value, with reward function r(a_t) = α(a_t) − γ·δ(a_t ≠ a_{t−1}), where the reward factor α(a_t) = δ(ξ_k(a_t) ≥ ξ_th), ξ_k(a_t) is the reward amount of the available action a_t at time t, γ is the loss cost coefficient of the transmitted signal power, and ξ_th is the threshold set for data transmission of the GNSS transmitter;
S3.4. For the input state, use the cross-entropy error as the loss function at each training step to prevent the gradient from vanishing, where log() is the logarithmic function;
S3.5. Optimize the loss function of S3.4 by gradient descent under the learning rate β to obtain the updated Q-function algorithm;
S3.6. To avoid local convergence, select the expected target action with an ε-greedy policy: when the random probability p satisfies ε ≤ p ≤ 1 − ε, select the maximum Q-table value, and when p ≤ ε, select the current action; the ε-greedy policy of the agent action is expressed accordingly (a minimal numerical sketch of these rules follows).
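The rules of S3.1–S3.6 can be summarized numerically as in the sketch below, where a small Q table stands in for the CNN approximation; the state/action space sizes, the learning rate and the reward threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 64, 16          # assumed sizes of the discretized state/action spaces
Q = np.zeros((n_states, n_actions))

beta, mu = 0.01, 0.7                  # learning rate and controllable (discount) factor
epsilon, gamma, xi_th = 0.1, 0.2, 15.0

def reward(xi_k, a_t, a_prev):
    """r(a_t) = alpha(a_t) - gamma * delta(a_t != a_{t-1}), with alpha(a_t) = delta(xi_k >= xi_th)."""
    alpha = 1.0 if xi_k >= xi_th else 0.0
    return alpha - gamma * (1.0 if a_t != a_prev else 0.0)

def choose_action(s_t):
    """Epsilon-greedy: exploit the maximum Q-table entry with probability 1 - epsilon, else explore."""
    p = rng.uniform()
    if p <= epsilon:
        return int(rng.integers(n_actions))      # select a (random) current action
    return int(np.argmax(Q[s_t]))                # select the maximum Q-table value

def q_update(s_t, a_t, r_t, s_next):
    """Gradient-descent-style step toward the target r_t + mu * max_a Q(s_{t+1}, a)."""
    target = r_t + mu * np.max(Q[s_next])
    Q[s_t, a_t] += beta * (target - Q[s_t, a_t])

# One illustrative update.
s, a_prev = 0, 0
a = choose_action(s)
r = reward(xi_k=17.0, a_t=a, a_prev=a_prev)
q_update(s, a, r, s_next=1)
```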
preferably, S4 comprises:
S4.1. At each learning time t, the learning agent uses the CNN to preprocess the state information s_t, observing the system state of each event step in order to perform an action, the action comprising an anti-interference weight and a phase shift;
S4.2. Select the optimal Q-function using the ε-greedy policy to balance exploration and exploitation;
S4.3. In each time slot, obtain the maximum Q-function according to the probability range ε ≤ p ≤ 1 − ε; after executing (s_t, a_t), the obtained reward r(s_t, a_t) and the next state s_{t+1} are stored in the set D to test the sample at the next time slot t+1;
S4.4. Feed the updated Q-value back to the CNN to take the next action, until the loop reaches the maximum number of iterations (a training-loop sketch follows).
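Combining S4.1–S4.4, a minimal DQN-style training-loop sketch is shown below; the environment interface (env.reset, env.step), the agent methods (preprocess, epsilon_greedy, update_q), and the replay-set size are assumptions introduced only for illustration, since the filing does not specify them.

```python
import random
from collections import deque

def train_dqn(env, agent, episodes=200, batch_size=32, replay_size=10_000):
    """Sketch of S4: preprocess the state with the CNN, act epsilon-greedily,
    store (s, a, r, s') in the set D, and feed the updated Q back to the CNN."""
    D = deque(maxlen=replay_size)                     # experience set D
    for _ in range(episodes):
        s_t = env.reset()                             # initial system state
        done = False
        while not done:
            features = agent.preprocess(s_t)          # S4.1: CNN preprocessing of state info
            a_t = agent.epsilon_greedy(features)      # S4.2: balance exploration/exploitation
            s_next, r_t, done = env.step(a_t)         # apply anti-interference weight / phase shift
            D.append((s_t, a_t, r_t, s_next))         # S4.3: store transition for slot t+1
            if len(D) >= batch_size:
                batch = random.sample(list(D), batch_size)
                agent.update_q(batch)                 # S4.4: feed the updated Q back to the CNN
            s_t = s_next
```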
Compared with the prior art, the invention has the following beneficial effects: the invention uses deep reinforcement learning to perform beam control on array signals, and realizes main-lobe pointing and sidelobe nulling of interference signals by adjusting the manipulated variables. During anti-interference operation, the deep Q network learns the characteristics of the interference data and determines the next action without human intervention, greatly improving the autonomy of the smart antenna. The representation-learning capability of the deep convolutional network is used to perform translation-invariant classification of the input information (beam forming); the input data is max-pooled, and through convolution-kernel parameter sharing in the hidden layers and sparse inter-layer connections the convolutional neural network can process the beam wavefront covariance matrix with a smaller computational load so that it approaches the target value of the Q-function. The reinforcement learning of the target Q-value weakens the spatial superposition of interference signals, eliminates wavefront-distortion data, seeks to maximize the total transmission rate of the target anti-interference system, and thereby reduces the consumption of the degrees of freedom of the array antenna.
Drawings
FIG. 1 is a technical flow chart of the present invention;
FIG. 2 is a schematic diagram of a deep learning convolutional neural network CNN operation according to the method of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning Q network training phase of the method of the present invention;
FIG. 4 is a graph comparing the results of the examples of the present invention.
Detailed Description
The following description will further illustrate embodiments of the present invention with reference to specific examples:
A method for adaptive anti-interference beam forming based on preprocessing deep reinforcement learning, as shown in FIG. 1, includes:
S1, constructing a GPS terminal signal model, wherein the GPS terminal signal model comprises a 2×2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise;
S2, constructing a deep learning convolutional neural network CNN, as shown in FIG. 2, including a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer;
S3, applying Q-network reinforcement learning processing and decision making to the deep learning convolutional neural network to obtain a deep reinforcement learning Q network;
S4, training the deep reinforcement learning Q network to obtain a trained deep reinforcement learning convolutional neural network, as shown in FIG. 3.
In S1, the 2×2 dual-polarized antenna array obtains 8 polarization port gains, the manipulated variables realize beam patterns at different azimuth angles, the time-varying gain interference variable satisfies time-varying Rayleigh attenuation, and Gaussian noise is used as an auxiliary interference quantity;
the 2×2 dual-polarized antenna array is built on a double-layer Rogers RO3010 substrate with a relative dielectric constant of 10.2 and a dielectric loss tangent of 0.0035.
In S1, the manipulated variables are specifically the i-th pitch and azimuth manipulated variables of the downlink navigation transmission 1# link;
in the steering-vector formula, θ and the pitch variable are respectively the azimuth angle and the initial pitch angle of the beam signal steering vector, diag() denotes a diagonal matrix, exp() the exponential function, j the imaginary unit, ω the angular frequency, sin()·cos() a product of trigonometric functions, and R = d/λ the ratio of the array element spacing d to the resonant wavelength λ.
In S1, the time-varying gain interference variable and the Gaussian noise are the interference quantities of the total GPS received-signal model on the downlink navigation transmission 3# and 4# links; in the structural formula of the time-varying gain interference variable, one term is an interference quantity whose structure is similar to that of the desired signal, and E_j·H_t is the time-varying Rayleigh attenuation interference quantity at time t; the total GPS signal model is then obtained from the downlink navigation transmission links.
In that formula, the left-hand side is the overall received time-varying GPS signal, the first term is the desired GPS signal on the 1# link, the two associated factors are the manipulated variables of the desired GPS signal on the 1# and 2# links, a further factor is the manipulated variable of the interference signal on the 3# and 4# links, and n(t) is the Gaussian noise signal, where i = 1x, 2x or j; the phase-keying term is an IRS sensor phase-keying N×1 matrix, j is the imaginary unit, and the amplitude and phase of each IRS phase-keying unit are η ∈ [0, 1] and φ ∈ [0, 2π], respectively.
In S2, the data feature extraction network layer is connected with a convolution network layer, the convolution network layer is connected with a pooling layer, and the pooling layer and the activation function layer are respectively connected with a full link layer;
the data feature extraction network layer follows the input layer and performs dimension-reduction normalization: it extracts the feature values of the input data and feeds them into the feature extraction network, whose output is the corresponding normalized convolution feature value at time t;
the convolution network layer progresses from local (regional) perception to higher-level perception regions, with convolution output X_t ⊗ W_c + b, where ⊗ denotes the Cartesian inner product and X_t, W_c and b are respectively the input, weight and bias variables of the convolutional layer;
the pooling layer reduces dimensionality and compresses the data to avoid overfitting, and the max-pooling mode is selected to filter excessive noise, s = s_x·δ(s_x ≥ n_th), where s_x is the raw network input data, δ(s_x ≥ n_th) is the impulse (indicator) function of the condition s_x ≥ n_th, and n_th is the minimum noise threshold;
the activation function layer selects the rectified linear unit ReLU() as the activation function to realize the nonlinear mapping from convolution-pooling to the full link layer, with expression f_ReLU(x) = max(0, x), where f_ReLU(x) denotes the ReLU activation function and max(0, x) is the larger of 0 and x;
the full link layer combines all the features of the previous layer's nodes at time t: Y_t = W_f·X_t and Y_{t+1} = W_f·X_{t+1} = Y_t + W_f·ΔX_{t+1}, where Y_t and Y_{t+1} are the output data at times t and t+1, W_f is the weight vector of the full link layer, and ΔX_{t+1} = X_{t+1} − X_t is the error variable between the input data X at times t and t+1; by the recursive property of the Markov decision process, ΔX_{t+1} = {X_{t+1} − X_t, X_t − X_{t−1}, …, X_{t−T+2} − X_{t−T+1}}, so a change of the input information only affects times t and t+1 and leaves time t−1 unchanged.
S3 comprises the following steps:
S3.1. Under the policy π, estimate the expected Q-function Q^π(s_t, a_t) with low variance based on the state s_t and the available action a_t at time t, implementing deep Q-learning for the expectation operator, where s denotes the initial state of Q-learning and a the initial learning action; the policy π is the probability distribution π(s_t, a_t) that maps the system state at time t to the available actions;
S3.2. From the state s_{t+1} and the available action a_{t+1} at time t+1, obtain the optimal Q-value; based on the i-th weight w_i of the policy π, the desired target Q-value is r_t + μ·max Q(s_{t+1}, a_{t+1}; w_i), where r_t is the cost function at time t, μ is a controllable factor, and max() is the maximization operator;
S3.3. Preprocess the nonlinear CNN by randomly selecting uniformly distributed elements so that it approximates the target expected value, with reward function r(a_t) = α(a_t) − γ·δ(a_t ≠ a_{t−1}), where the reward factor α(a_t) = δ(ξ_k(a_t) ≥ ξ_th), ξ_k(a_t) is the reward amount of the available action a_t at time t, γ is the loss cost coefficient of the transmitted signal power, and ξ_th is the threshold set for data transmission of the GNSS transmitter;
S3.4. For the input state, use the cross-entropy error as the loss function at each training step to prevent the gradient from vanishing, where log() is the logarithmic function;
S3.5. Optimize the loss function of S3.4 by gradient descent under the learning rate β to obtain the updated Q-function algorithm;
S3.6. To avoid local convergence, select the expected target action with an ε-greedy policy: when the random probability p satisfies ε ≤ p ≤ 1 − ε, select the maximum Q-table value, and when p ≤ ε, select the current action; the ε-greedy policy of the agent action is expressed accordingly.
S4 comprises the following steps:
S4.1. At each learning time t, the learning agent uses the CNN to preprocess the state information s_t, observing the system state of each event step in order to perform an action, the action comprising an anti-interference weight and a phase shift;
S4.2. Select the optimal Q-function using the ε-greedy policy to balance exploration and exploitation;
S4.3. In each time slot, obtain the maximum Q-function according to the probability range ε ≤ p ≤ 1 − ε; after executing (s_t, a_t), the obtained reward r(s_t, a_t) and the next state s_{t+1} are stored in the set D to test the sample at the next time slot t+1;
S4.4. Feed the updated Q-value back to the CNN to take the next action, until the loop reaches the maximum number of iterations.
The data calculation results are shown in FIG. 4. The simulation environment uses the TensorFlow v1 and Keras learning libraries on Python v3.6.6 (Win64), and the simulation analysis is performed on a computer with a 4-core Intel(R) Core(TM) i5-6500 CPU, a 1258 GPU and 8 GB of memory. In addition, Ansoft HFSS 15.0 software was used to simulate the 2×2 dual-polarized GNSS smart antenna array. All experimental procedures were run in PyCharm 2018 to evaluate the anti-interference performance of the proposed DRL.
The convolutional neural network uses the Adam method to update the network parameters; the convolution depth is 32 and the number of bias terms is 8; the max-pooling setting is: pooling layer 1, tensor 1, padding = 'VALID'; the input data is reshaped into 1024 samples of 8×1 matrices; the initial learning rate is 0.01 and is halved whenever the training loss stops decreasing; the number of training iterations is 200; the mini-batch size is 32; and the hidden layer has 1024 nodes.
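As one possible realization of the stated configuration (Adam optimizer, convolution depth 32, padding = 'VALID', 1024-node hidden layer, 8×1 input matrices, initial learning rate 0.01 halved when the training loss stops decreasing, 200 iterations, mini-batch size 32), the Keras sketch below could be used; the mean-squared-error loss, the output width and the use of the tf.keras 2.x API are assumptions not fixed by this description.

```python
from tensorflow.keras import layers, models, optimizers, callbacks

# Sketch only: mirrors the stated hyperparameters; loss and output width are assumed.
model = models.Sequential([
    layers.Input(shape=(8, 1)),                          # 8x1 input matrices
    layers.Conv1D(32, kernel_size=3, padding='valid'),   # convolution depth 32, padding 'VALID'
    layers.MaxPooling1D(pool_size=1),                    # max-pooling setting: pooling layer 1, tensor 1
    layers.ReLU(),
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),               # hidden layer with 1024 nodes
    layers.Dense(16),                                    # Q-value outputs (action count assumed)
])
model.compile(optimizer=optimizers.Adam(learning_rate=0.01), loss='mse')

# Halve the learning rate whenever the training loss stops decreasing.
lr_schedule = callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=1)
# model.fit(x_train, y_train, epochs=200, batch_size=32, callbacks=[lr_schedule])
```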
Simulation parameters: the reference configuration sets the learning rate λ = 0.1, the maximum ε of the ε-greedy policy to 0.9, the ε increment coefficient Δε = 0.1, the discount coefficient μ = 0.7 in the reward function, the consumption cost coefficient γ = 0.2, and the threshold ξ_th = 15 dB; the Rayleigh fading gain parameters are set to λ = 1, f_c = 1.5 GHz, c = 3 × 10^8 m/s, α = 2.8, d_l = 2 m; the channel loss L (dB) is set with PL_0 = 30 dB, d_0 = 1 m and γ = 0.8. In addition, the number of CNN convolution kernels is set to 5, where the convolutional layer (CL) has 8 kernels of size 1 × 3, the fully connected layer (FCL) has 32 neurons, and max pooling (PL) is used. To compare the performance of the proposed method, the following three baseline methods are set:
i) An RL method (Q-learning) based on a neural network is used to optimize the state space of the GNSS transmission system, where the neural network uses 8 neural units, 1 ReLU layer and 1 output layer (named reinforcement learning);
ii) A greedy learning method without Q-learning is set, which uses a CNN and an ε-greedy strategy to optimize the 1# transmit power sent by the GNSS dual-polarized antenna array to the user (named greedy learning);
iii) A minimum-variance distortionless adaptive anti-interference beam forming method (spatial filtering) is adopted, which adaptively adjusts the weight vector for the received data in the minimum variance distortionless response (MVDR) manner so that the average array output power is minimized (named minimum distortionless response); a minimal MVDR weight sketch is given after this list.
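For reference, the MVDR weights used by baseline iii) are commonly computed as w = R⁻¹a / (aᴴ R⁻¹ a); the sketch below illustrates this under the assumption that a sample covariance matrix R and a steering vector a are available, with the array size and angle chosen only for illustration.

```python
import numpy as np

def mvdr_weights(R, a):
    """MVDR / Capon weights: minimize the average array output power subject to w^H a = 1."""
    Ri_a = np.linalg.solve(R, a)                 # R^{-1} a without forming the inverse explicitly
    return Ri_a / (a.conj() @ Ri_a)

# Illustrative use with a sample covariance matrix built from snapshots X (N x T).
rng = np.random.default_rng(3)
N, T = 8, 1024
X = (rng.normal(size=(N, T)) + 1j * rng.normal(size=(N, T))) / np.sqrt(2)
R = X @ X.conj().T / T
a = np.exp(1j * 2 * np.pi * 0.5 * np.arange(N) * np.sin(np.deg2rad(30)))
w = mvdr_weights(R, a)
print(np.abs(w.conj() @ a))                      # distortionless constraint, approximately 1
```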
It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.
Claims (7)
1. The self-adaptive anti-interference beam forming method based on the preprocessing deep reinforcement learning is characterized by comprising the following steps of:
S1, constructing a GPS terminal signal model, wherein the GPS terminal signal model comprises a 2×2 dual-polarized antenna array, manipulated variables, a time-varying gain interference variable and Gaussian noise;
S2, constructing a deep learning Convolutional Neural Network (CNN), wherein the CNN comprises a data feature extraction network layer, a convolutional network layer, a pooling layer, an activation function layer and a full link layer;
S3, applying Q-network reinforcement learning processing and decision making to the deep learning convolutional neural network to obtain a deep reinforcement learning Q network;
S4, training the deep reinforcement learning Q network to obtain a trained deep reinforcement learning convolutional neural network.
2. The adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning of claim 1, wherein in S1, the 2×2 dual-polarized antenna array obtains 8 polarization port gains, the manipulated variables realize beam patterns at different azimuth angles, the time-varying gain interference variable satisfies time-varying Rayleigh attenuation, and Gaussian noise is used as an auxiliary interference quantity;
the 2×2 dual-polarized antenna array is built on a double-layer Rogers RO3010 substrate with a relative dielectric constant of 10.2 and a dielectric loss tangent of 0.0035.
3. The adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning of claim 2, wherein in S1, the manipulated variables are specifically the i-th pitch and azimuth manipulated variables of the downlink navigation transmission 1# link;
in the steering-vector formula, θ and the pitch variable are respectively the azimuth angle and the initial pitch angle of the beam signal steering vector, diag() denotes a diagonal matrix, exp() the exponential function, j the imaginary unit, ω the angular frequency, sin()·cos() a product of trigonometric functions, and R = d/λ the ratio of the array element spacing d to the resonant wavelength λ.
4. The adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning as claimed in claim 3, wherein in S1, the time-varying gain interference variable and the Gaussian noise are the interference quantities of the total GPS received-signal model on the downlink navigation transmission 3# and 4# links; in the structural formula of the time-varying gain interference variable, one term is an interference quantity whose structure is similar to that of the desired signal, and E_j·H_t is the time-varying Rayleigh attenuation interference quantity at time t; the total GPS signal model is then obtained from the downlink navigation transmission links.
In that formula, the left-hand side is the overall received time-varying GPS signal, the first term is the desired GPS signal on the 1# link, the two associated factors are the manipulated variables of the desired GPS signal on the 1# and 2# links, a further factor is the manipulated variable of the interference signal on the 3# and 4# links, and n(t) is the Gaussian noise signal, where i = 1x, 2x or j; the phase-keying term is an IRS sensor phase-keying N×1 matrix, j is the imaginary unit, and the amplitude and phase of each IRS phase-keying unit are η ∈ [0, 1] and φ ∈ [0, 2π], respectively.
5. The adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning as claimed in claim 4, wherein in S2, the data feature extraction network layer is connected with a convolution network layer, the convolution network layer is connected with a pooling layer, and the pooling layer and the activation function layer are respectively connected with a full link layer;
the data feature extraction network layer follows the input layer and performs dimension-reduction normalization: it extracts the feature values of the input data and feeds them into the feature extraction network, whose output is the corresponding normalized convolution feature value at time t;
the convolution network layer progresses from local (regional) perception to higher-level perception regions, with convolution output X_t ⊗ W_c + b, where ⊗ denotes the Cartesian inner product and X_t, W_c and b are respectively the input, weight and bias variables of the convolutional layer;
the pooling layer reduces dimensionality and compresses the data to avoid overfitting, and the max-pooling mode is selected to filter excessive noise, s = s_x·δ(s_x ≥ n_th), where s_x is the raw network input data, δ(s_x ≥ n_th) is the impulse (indicator) function of the condition s_x ≥ n_th, and n_th is the minimum noise threshold;
the activation function layer selects the rectified linear unit ReLU() as the activation function to realize the nonlinear mapping from convolution-pooling to the full link layer, with expression f_ReLU(x) = max(0, x), where f_ReLU(x) denotes the ReLU activation function and max(0, x) is the larger of 0 and x;
the full link layer combines all the features of the previous layer's nodes at time t: Y_t = W_f·X_t and Y_{t+1} = W_f·X_{t+1} = Y_t + W_f·ΔX_{t+1}, where Y_t and Y_{t+1} are the output data at times t and t+1, W_f is the weight vector of the full link layer, and ΔX_{t+1} = X_{t+1} − X_t is the error variable between the input data X at times t and t+1; by the recursive property of the Markov decision process, ΔX_{t+1} = {X_{t+1} − X_t, X_t − X_{t−1}, …, X_{t−T+2} − X_{t−T+1}}, so a change of the input information only affects times t and t+1 and leaves time t−1 unchanged.
6. The adaptive interference rejection beamforming method based on preprocessing deep reinforcement learning according to claim 5, wherein S3 comprises:
S3.1. Under the policy π, estimate the expected Q-function Q^π(s_t, a_t) with low variance based on the state s_t and the available action a_t at time t, implementing deep Q-learning for the expectation operator, where s denotes the initial state of Q-learning and a the initial learning action; the policy π is the probability distribution π(s_t, a_t) that maps the system state at time t to the available actions;
S3.2. From the state s_{t+1} and the available action a_{t+1} at time t+1, obtain the optimal Q-value; based on the i-th weight w_i of the policy π, the desired target Q-value is r_t + μ·max Q(s_{t+1}, a_{t+1}; w_i), where r_t is the cost function at time t, μ is a controllable factor, and max() is the maximization operator;
S3.3. Preprocess the nonlinear CNN by randomly selecting uniformly distributed elements so that it approximates the target expected value, with reward function r(a_t) = α(a_t) − γ·δ(a_t ≠ a_{t−1}), where the reward factor α(a_t) = δ(ξ_k(a_t) ≥ ξ_th), ξ_k(a_t) is the reward amount of the available action a_t at time t, γ is the loss cost coefficient of the transmitted signal power, and ξ_th is the threshold set for data transmission of the GNSS transmitter;
S3.4. For the input state, use the cross-entropy error as the loss function at each training step to prevent the gradient from vanishing, where log() is the logarithmic function;
S3.5. Optimize the loss function of S3.4 by gradient descent under the learning rate β to obtain the updated Q-function algorithm;
S3.6. To avoid local convergence, select the expected target action with an ε-greedy policy: when the random probability p satisfies ε ≤ p ≤ 1 − ε, select the maximum Q-table value, and when p ≤ ε, select the current action; the ε-greedy policy of the agent action is expressed accordingly.
7. the adaptive interference rejection beamforming method based on preprocessing deep reinforcement learning according to claim 6, wherein S4 comprises:
S4.1. At each learning time t, the learning agent uses the CNN to preprocess the state information s_t, observing the system state of each event step in order to perform an action, the action comprising an anti-interference weight and a phase shift;
S4.2. Select the optimal Q-function using the ε-greedy policy to balance exploration and exploitation;
S4.3. In each time slot, obtain the maximum Q-function according to the probability range ε ≤ p ≤ 1 − ε; after executing (s_t, a_t), the obtained reward r(s_t, a_t) and the next state s_{t+1} are stored in the set D to test the sample at the next time slot t+1;
S4.4. Feed the updated Q-value back to the CNN to take the next action, until the loop reaches the maximum number of iterations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210785303.3A CN115296709A (en) | 2022-06-30 | 2022-06-30 | Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210785303.3A CN115296709A (en) | 2022-06-30 | 2022-06-30 | Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115296709A true CN115296709A (en) | 2022-11-04 |
Family
ID=83822862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210785303.3A Pending CN115296709A (en) | 2022-06-30 | 2022-06-30 | Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115296709A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116996919A (en) * | 2023-09-26 | 2023-11-03 | 中南大学 | Single-node multi-domain anti-interference method based on reinforcement learning |
CN116996919B (en) * | 2023-09-26 | 2023-12-05 | 中南大学 | Single-node multi-domain anti-interference method based on reinforcement learning |
CN117610317A (en) * | 2024-01-19 | 2024-02-27 | 湖北工业大学 | Multi-bit super-surface phase arrangement optimization method based on deep learning |
CN117610317B (en) * | 2024-01-19 | 2024-04-12 | 湖北工业大学 | Multi-bit super-surface phase arrangement optimization method based on deep learning |
CN118446119A (en) * | 2024-06-19 | 2024-08-06 | 中国人民解放军国防科技大学 | Terahertz flat-top beam forming method and device based on cone optimization and deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115296709A (en) | Self-adaptive anti-interference beam forming method based on preprocessing deep reinforcement learning | |
CN107024681A (en) | MIMO radar transmit-receive combination optimization method under the conditions of not known based on clutter knowledge | |
CN109004970A (en) | A kind of adaptive sparse array beams forming method of zero norm constraint | |
Wang et al. | Optimal pattern synthesis of linear array and broadband design of whip antenna using grasshopper optimization algorithm | |
Yang et al. | A learning-aided flexible gradient descent approach to MISO beamforming | |
He et al. | GBLinks: GNN-based beam selection and link activation for ultra-dense D2D mmWave networks | |
CN105282761B (en) | A kind of method of quick LMS Adaptive beamformers | |
CN115942494A (en) | Multi-target safe Massive MIMO resource allocation method based on intelligent reflecting surface | |
Dudczyk et al. | Adaptive forming of the beam pattern of microstrip antenna with the use of an artificial neural network | |
Jiang et al. | Active sensing for two-sided beam alignment and reflection design using ping-pong pilots | |
Omondi et al. | Variational autoencoder-enhanced deep neural network-based detection for MIMO systems | |
Mallipeddi et al. | Near optimal robust adaptive beamforming approach based on evolutionary algorithm | |
CN116192206B (en) | Large-scale conformal array real-time wave beam synthesis method based on generalized regression neural network | |
Omid et al. | Deep Reinforcement Learning-Based Secure Standalone Intelligent Reflecting Surface Operation | |
Haider et al. | GAN-based Channel Estimation for IRS-aided Communication Systems | |
CN117318769A (en) | Beam searching method and device and electronic equipment | |
CN110346766B (en) | Null broadening method based on sparse constraint control side lobe | |
Hsu et al. | Memetic algorithms for optimizing adaptive linear array patterns by phase-position perturbation | |
Papari et al. | Robust adaptive beamforming algorithm based on sampling function neural network | |
Elpidio et al. | Comparison of evolutionary algorithms for synthesis of linear array of antennas with minimal level of sidelobe | |
Mallioras et al. | Zero Forcing Beamforming With Sidelobe Suppression Using Neural Networks | |
Hao et al. | Adaptive anti-jamming beamforming based on the preprocessing deep reinforcement learning for downlink navigation communication | |
Shelim et al. | Learning wireless power allocation through graph convolutional regression networks over Riemannian manifolds | |
CN118118069B (en) | Robust self-adaptive beam forming method based on deep expansion network | |
CN115102589B (en) | Deep learning hybrid precoding method of terahertz large-scale MIMO system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||