CN114884547A - Active monitoring method based on deep reinforcement learning - Google Patents
- Publication number
- CN114884547A (application CN202210312148.3A)
- Authority
- CN
- China
- Prior art keywords
- transmitter
- listener
- receiver
- power
- monitoring
- Prior art date
- Legal status
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/06—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
- H04B7/0613—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
- H04B7/0615—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
- H04B7/0619—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal using feedback from receiving side
- H04B7/0636—Feedback format
- H04B7/0639—Using selective indices, e.g. of a codebook, e.g. pre-distortion matrix index [PMI] or for beam selection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/022—Site diversity; Macro-diversity
- H04B7/024—Co-operative use of antennas of several sites, e.g. in co-ordinated multipoint or co-operative multiple-input multiple-output [MIMO] systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/0413—MIMO systems
- H04B7/0456—Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/06—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
- H04B7/0613—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
- H04B7/0615—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
- H04B7/0619—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal using feedback from receiving side
- H04B7/0621—Feedback content
- H04B7/0626—Channel coefficients, e.g. channel state information [CSI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses an active monitoring method based on deep reinforcement learning, belonging to the field of communication. In massive MIMO-OFDM systems, conventional passive and active listening schemes become inefficient or even ineffective when the listener E and the suspicious receiver D are not covered by the same communication beam. To achieve lawful interception in a massive MIMO-OFDM system, the listener acts as a pseudo relay to perform beam induction and data listening. While the transmitter S performs beam scanning, the listener E induces it to select a beam favorable for listening by optimizing the relay precoding matrix. In the data listening phase, the listener E increases the listening rate by optimizing the relay power allocation factor and the power gain factor. Because the channel state information of the suspicious communication link is unknown, the optimal precoding matrix and power allocation factors are found with a deep reinforcement learning algorithm, MADDPG. Computer simulations verify the effectiveness of the proposed design.
Description
Technical Field
The invention belongs to the field of communication and relates to an active monitoring method based on deep reinforcement learning, and more particularly to an active monitoring method, based on deep reinforcement learning, for massive MIMO-OFDM (Multiple Input Multiple Output - Orthogonal Frequency Division Multiplexing) systems.
Background
MIMO-OFDM technology is considered a key technology of fifth-generation (5G) mobile networks. However, when advanced beamforming techniques are employed at 5G base stations, the directional narrow beams render conventional listening methods inefficient or even ineffective. Therefore, to achieve lawful interception of suspicious links, it is important to study interception schemes for narrow-beam scenarios.
Existing work on interception can be divided into three categories: passive listening, jamming-based active listening, and spoofed-relay active listening. In passive listening, the listener remains silent and only receives the data sent by the transmitter; this is effective only when the listening channel is better than the suspicious channel. To overcome this drawback, jamming-based active listening was introduced: the listener sends interfering signals toward the suspicious receiver, forcing the transmitter to lower its rate so that the information can be decoded by the listener. To implement active listening more flexibly, a listening method known as the spoofed relay was proposed; when the listening channel is better than the suspicious channel, it maximizes the listening rate by disguising the listener as a relay. However, when the transmitter sends information to the suspicious user over a directional beam, none of the above schemes allows a listener outside the beam coverage to listen successfully.
Disclosure of Invention
The purpose of the invention: aiming at the shortcomings of the prior art, the invention studies a scheme by which a listener outside the beam coverage in a MIMO-OFDM system can successfully monitor a suspicious communication link, and provides an active monitoring method for massive MIMO-OFDM systems based on deep reinforcement learning, so that communication data can be successfully intercepted even when the transmitter uses narrow beams to communicate with the suspicious receiver.
The technical scheme is as follows: an active monitoring method based on deep reinforcement learning comprises the following steps:
(1) the transmitter S performs analog beam scanning in a time-division manner according to the beam precoding codebook;
(2) during the beam scanning phase performed by the transmitter S, the listener E determines, from its own beam quality report and the beam report fed back by the receiver D to the transmitter S, the best beam index j* favorable to itself;
(3) the listener E induces the transmitter S to select the best beam index j* by optimizing the forward precoding matrix;
(4) in the communication stage determined by beam j*, the listener E serves as a pseudo relay for data forwarding, maintaining the communication beam and improving the data interception rate.
Further, the step (2) comprises the following steps:
1) the receiver D and the listener E each receive the beam quality measurement reference signal sent by the transmitter S and calculate the beam quality from the received signal; the receiver D then forms a beam quality report and feeds it back to the transmitter S as the beam selection reference;
2) the listener E determines the best beam index j* according to its own beam quality report and the beam quality report fed back by the receiver D to the transmitter S, taking the power consumption factor into account through a trade-off formula between the beam induction success rate and the power consumption.
The step (3) comprises the following steps:
(1) the listener formulates an optimization problem: minimize the total transmit power of the listener under the constraint of successful beam induction; the form of the optimal precoding matrix is derived from this optimization problem, showing that the optimal precoding matrix is related to the channel state information of the transmitter S and the receiver D;
(2) the listener E trains a first fitting network using the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm to determine the transmission parameters of a first forwarding matrix, and then forwards the beam quality measurement reference signal to the receiver D using the first forwarding matrix determined by these transmission parameters, so as to induce the receiver D to send an erroneous beam measurement report and thereby cause the transmitter S to select a beam favorable to the listener E.
The step (4) comprises the following steps:
i. the listener E receives the transmission data sent by the transmitter S and formulates an optimization problem: maximize the data listening rate under the conditions that listening is successful and the transmit power is smaller than the forwarding power upper limit;
ii. the listener trains a second fitting network using the MADDPG algorithm to determine the power allocation factor and the power gain factor, with part of the power used for decoding and part for forwarding the signal, and then forwards the communication data to the receiver D using a second forwarding matrix, thereby maintaining the communication beam and improving the data listening rate.
Further, the step (2) in the above step (3) comprises the following steps:
firstly, the beam induction problem is modeled as a first multi-agent cooperative MDP (Markov Decision Process) problem;
secondly, according to the form of the optimal precoding matrix, the search for the optimal precoding matrix is converted into a search for a pair of constants, which accelerates the training process; at a given moment, the action on a single subcarrier is the angle and the amplitude of the precoding matrix, so that the action of all subcarriers is the set of the single-subcarrier actions;
thirdly, at a given moment, the state on a single subcarrier is the combination of the beam report information obtained by monitoring and analyzing the feedback channel and the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
fourthly, the reward function at a given moment is designed to encourage successful beam induction and to penalize behavior that consumes excessive energy.
Further, the step ii in the above step (4) comprises the following steps:
I. the data listening problem is modeled as a second multi-agent cooperative MDP problem;
II. according to the form of the optimal precoding matrix, the search for the optimal precoding matrix is converted into a search for a pair of constants, which accelerates the training process; at a given moment, the action on a single subcarrier is the power gain factor and the power allocation factor, so that the action of all subcarriers is the set of the single-subcarrier actions;
III. at a given moment, the state on a single subcarrier is the combination of the signal-to-interference-plus-noise ratio obtained by monitoring the feedback channel and the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
IV. the reward design at a given moment encourages the subcarriers to maximize the listening rate under the constraints of successful listening and limited power.
Beneficial effects: the method is suitable for monitoring a suspicious communication link in a narrow-beam massive MIMO-OFDM system. During the transmitter's beam scanning and beam determination process, the listener achieves beam induction by optimizing a precoding matrix; while the transmitter sends data, the listening rate is maximized by optimizing the power allocation factor and the power gain factor. Considering that it is difficult for the listener to obtain the channel information between the suspicious nodes, the invention provides a MADDPG-based learning scheme to help the listener perform beam induction and data listening. The active monitoring method for massive MIMO-OFDM systems based on deep reinforcement learning can effectively induce the transmitter S to select a beam favorable to the listener E, laying the foundation for the subsequent data listening process; the listener E then readjusts the power allocation factor and the power gain factor, effectively maintaining the communication link and improving the data listening rate.
Drawings
FIG. 1 is a diagram of the active listening model in a massive MIMO-OFDM system according to the invention;
FIG. 2 is a diagram of the role of the listener E in the different transmission phases of the transmitter S (BS and DT denote the beam scanning and data transmission phases);
FIG. 3 is a diagram of the transceiver of the listener E according to the invention;
FIG. 4 is a graph of the beam induction success rate versus the transmit power under different N_te configurations;
FIG. 5 is a graph of the transmit power of the listener E versus the transmit power of the transmitter S under different N_te configurations;
FIG. 6 is a graph of the average interception rate under different P_S and N_ts conditions;
FIG. 7 is a graph of the average interception rate of the various listening methods.
Detailed Description
On the basis of conventional spoofed-relay listening, the invention provides a listening method for massive MIMO-OFDM systems in which a lawful full-duplex relay is used to realize beam induction and data listening. The invention assumes that analog beamforming is employed at the suspicious transmitter and that beam scanning is used to select the optimal beam vector. Beam induction is performed during the beam scanning phase; its purpose is to induce the suspicious receiver to report a beam that is favorable to the listener. To achieve this, the listener acts as a relay, amplifying and forwarding the measurement reference signal of the desired beam to the suspicious receiver. At this stage, the objective is to minimize the total transmit power of the listener under the constraint of successful beam induction by optimizing the listener's precoding matrix. Through mathematical derivation, a closed-form expression of the optimal precoding matrix is obtained, which depends on the CSI (Channel State Information) of the suspicious communication pair. Since the listener does not know this CSI, the invention uses a DRL (Deep Reinforcement Learning) algorithm, MADDPG (Multi-Agent Deep Deterministic Policy Gradient), to determine the transmission parameters of all subcarriers. Once beam induction is achieved, the listener performs data listening and improves the listening rate by continuing to act as a pseudo relay. At this stage, the power allocation factor and the power gain factor are optimized to maximize the listening rate; again, because the listener is unaware of the CSI of the suspicious pair, MADDPG is used to optimize the listener's relay parameters.
Embodiments of the invention are described in detail below with reference to the accompanying drawings:
the application scenario of the present invention is shown in fig. 1: the invention concerns a legal interception system consisting of a pair of suspicious communication nodes (transmitter S and receiver D) and a legal interceptor E. The transmitter S and the receiver D are each provided with N ts Root transmitting antenna and N rd The root receives the antenna. The invention assumes that both the transmitter S and the receiver D employ a large gauge of analog beamformingA modulo MIMO-OFDM array to transmit and receive information. The analog beam is selected from a predefined discrete codebook, and the invention represents the codebook of the transmitter S asThe listener E acts as a full duplex pseudo-relay, passing N re The antenna receives the signal from the transmitter S while passing N te The antenna forwards the signal to the receiver D. In order to improve the monitoring quality, the monitor E employs a digital beamforming technique on each subcarrier. The present invention assumes that all channels in the system remain unchanged in each RB (Resource Block), but may vary from RB to RB according to a markov model.
In the solution of the invention, as shown in FIG. 2, for the transmitter S the processing of each transport block is divided into two phases: a BS (Beam Scanning) phase and a DT (Data Transmission) phase. For the listener E, however, the listening process comprises three phases: beam selection, beam induction, and spoofed data forwarding. In the beam selection phase, the listener acquires beam quality information by listening to the feedback channel. Specifically, when the transmitter S transmits with the beamforming vector f_j, the signals received on the k-th subcarrier by the receiver D and by the listener E can be written as

y_{D,k} = √(P_{S,k}) v_D^H H_{sd,k} f_j s_k + v_D^H n_{D,k}    (1)

and

y_{E,k} = √(P_{S,k}) u_k^H H_{se,k} f_j s_k + u_k^H n_{E,k}    (2)

where s_k is the transmitted symbol with E[|s_k|^2] = 1 and E[·] denoting expectation; f_j is the beamforming vector with |f_j(n)| = 1, n = 1, ..., N_ts, and j is the beam index; P_{S,k} is the transmit power on the k-th subcarrier; H_{sd,k} and H_{se,k} are the channel matrices between the transmitter S and the receiver D and between the transmitter S and the listener E, respectively; and n_{D,k} and n_{E,k} are zero-mean additive white Gaussian noise vectors with covariance matrix σ^2 I. At the receiver D, the analog beamformer v_D in (1) processes the received signal and satisfies ‖v_D‖^2 = N_rd, where ‖·‖ denotes the Frobenius norm of a matrix or the Euclidean norm of a vector. At the listener E, the received signal on the k-th subcarrier is processed with the digital beamformer u_k. In the BS phase of the transmitter S, the SNRs (Signal-to-Noise Ratios) on the k-th subcarrier at the receiver D and at the listener E are

β_{D,k} = P_{S,k} |v_D^H H_{sd,k} f_j|^2 / (‖v_D‖^2 σ^2)    (3)

and

β_{E,k} = P_{S,k} |u_k^H H_{se,k} f_j|^2 / (‖u_k‖^2 σ^2)    (4)

The receiver D calculates the average SNR over all subcarriers,

β_D = (1/K) Σ_{k=1}^{K} β_{D,k}    (5)

where K is the number of subcarriers (the listener E computes its own average SNR β_E in the same way), selects the J candidate beams with the largest β_D values, and feeds β_D and the corresponding indices back to S, where J is the maximum number of beams that can be fed back. The invention assumes that the listener E can obtain this feedback information by listening to the feedback channel between the transmitter S and the receiver D. When the beam selected by the transmitter S results in a low β_E, it is difficult for the listener E to intercept the communication sent by the transmitter S; therefore, the listener E, acting as a pseudo relay, induces the beam selection of the transmitter S. For the listener E, the ideal beam should provide a high signal-to-noise ratio for both the listener E and the receiver D, because a low β_D would consume more of the listener E's forwarding power. The listener E therefore determines the required optimal beam index j* among the fed-back candidates by jointly weighing β_E and β_D, where δ is a trade-off factor used to balance the listening success rate against the power consumption of the listener.
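As a concrete illustration of this beam-selection step, the following Python sketch averages the per-subcarrier SNRs of each fed-back candidate beam and ranks the candidates. The weighted sum β_E + δ·β_D used for the ranking, the helper names, and the numerical values are illustrative assumptions; the patent text does not reproduce the exact trade-off expression.

```python
import numpy as np

def average_snr(per_subcarrier_snr):
    """Average SNR over the K subcarriers for one candidate beam."""
    return float(np.mean(per_subcarrier_snr))

def select_best_beam(beta_E, beta_D, candidates, delta=0.5):
    """Rank the J fed-back candidate beams for the listener E.

    beta_E, beta_D : dicts mapping beam index -> average SNR at E and at D.
    delta          : trade-off factor between listening quality and the extra
                     forwarding power that a weak S-D beam would cost.
    The weighted-sum score below is an assumed stand-in for the trade-off formula.
    """
    scores = {j: beta_E[j] + delta * beta_D[j] for j in candidates}
    return max(scores, key=scores.get)

# Example with three fed-back candidates (J = 3), linear-scale SNRs:
beta_E = {3: 6.0, 7: 11.5, 12: 9.0}   # listener-side average SNRs
beta_D = {3: 14.0, 7: 8.0, 12: 9.5}   # receiver-side average SNRs from the overheard report
j_star = select_best_beam(beta_E, beta_D, candidates=[3, 7, 12], delta=0.5)
print("induce beam index j* =", j_star)
```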
After determining the required optimal beam index j*, the listener E induces the transmitter S to select j* during the next BS phase. In the DT phase, the transmitter S transmits the communication data to the receiver D, and the listener E acts as an AF (Amplify-and-Forward) spoofing relay, listening while forwarding the data.
As shown in FIG. 3, the listener E applies on subcarrier k a receive beamforming vector, a transmit beamforming vector, a power allocation factor α_k, and a power gain factor g_k. In the beam induction phase, the received signal is amplified and forwarded with α_k = 1, i.e., no power is reserved for decoding. In the data forwarding phase, the received signal power is split into two parts, one for decoding and one for forwarding. The invention analyzes below how to optimize these relay parameters to achieve the maximum listening rate.
During the beam scanning of the transmitter S, the listener E amplifies and forwards the pilot signal sent by the transmitter S, which is used to measure the beam quality between the transmitter S and the receiver D. The invention assumes that the delay of the AF relay used by the listener is much smaller than the symbol duration and is therefore negligible. Owing to the full-duplex operation of the listener E, the signal received by the listener E on subcarrier k is

y_{E,k} = √(P_{S,k}) H_{se,k} f_j s_k + H_{ee,k} W_k ỹ_{E,k} + n_{E,k}    (6)

where H_{ee,k} is the self-interference channel, W_k is the precoding matrix of the listener E, and ỹ_{E,k} is the signal received by the listener at the previous instant, i.e., the signal currently being forwarded. As can be seen from (6), if W_k lies in the null space of H_{ee,k}, then H_{ee,k} W_k = 0 and the self-interference term in (6) vanishes. Let T_0 be the matrix whose columns are the right singular vectors of H_{ee,k} corresponding to its zero singular values; the precoding matrix can then be written as

W_k = T_0 W̃_k    (7)

where W̃_k is the new matrix to be optimized. To ensure that the null-space dimension r_0 is greater than 0, the invention requires N_te > N_re, i.e., the listener E needs more transmit antennas than receive antennas in order to suppress the self-interference. After the self-interference is eliminated, the transmitted signal of the listener E can be expressed as x_{E,k} = W_k y_{E,k}, and the transmit power of the listener E is P_{E,k} = E[‖W_k y_{E,k}‖^2] = P_{S,k} ‖W_k H_{se,k} f_j‖^2 + σ^2 ‖W_k‖^2.
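The null-space construction of T_0 in (7) can be sketched directly with a singular value decomposition; the snippet below is a minimal illustration under the assumption N_te > N_re, with illustrative variable names.

```python
import numpy as np

def null_space_basis(H_ee, tol=1e-10):
    """Return T0 (N_te x r0): an orthonormal basis of the null space of H_ee,
    built from the right singular vectors of zero singular values."""
    _, s, Vh = np.linalg.svd(H_ee)            # H_ee has shape (N_re, N_te)
    rank = int(np.sum(s > tol))
    return Vh.conj().T[:, rank:]               # r0 = N_te - rank columns

def build_precoder(T0, W_tilde):
    """Lift the reduced-dimension matrix W_tilde into the full precoder W_k = T0 @ W_tilde."""
    return T0 @ W_tilde                        # H_ee @ (T0 @ W_tilde) = 0 by construction

# Quick check with illustrative dimensions N_re = 2, N_te = 4:
H_ee = np.random.randn(2, 4) + 1j * np.random.randn(2, 4)
T0 = null_space_basis(H_ee)
assert np.allclose(H_ee @ T0, 0)               # self-interference channel is nulled
```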
The signal received by the receiver D after receive beamforming can then be written as

y_{D,k} = √(P_{S,k}) v_D^H H_{sd,k} f_j s_k + v_D^H H_{ed,k} W_k y_{E,k} + v_D^H n_{D,k}    (8)

where H_{ed,k} is the channel matrix between the listener E and the receiver D on the k-th subcarrier, and h̃_{ed,k} = (v_D^H H_{ed,k} T_0)^H and h̃_{se,k} = H_{se,k} f_j are the newly constructed equivalent channels. The received signal-to-noise ratio of the receiver D on the k-th subcarrier can then be expressed as

β_k(W̃_k) = P_{S,k} |v_D^H H_{sd,k} f_j + h̃_{ed,k}^H W̃_k h̃_{se,k}|^2 / ( σ^2 ( N_rd + ‖h̃_{ed,k}^H W̃_k‖^2 ) )    (9)

To induce the transmitter S to select the best beam index j* with the minimum transmit power, the problem can be expressed as

min_{ {W̃_k} } P_E = Σ_{k=1}^{K} P_{E,k}   s.t.   β = (1/K) Σ_{k=1}^{K} β_k ≥ β_D    (10)

where β is the average SNR that the receiver D measures for the desired beam j* with the listener's forwarding, and β_D is the average SNR of the strongest competing candidate beam, so that the receiver D reports j* as the best beam. To obtain a tractable solution, the invention decomposes problem (10) into K independent subproblems by assigning each subcarrier an SNR target β̄_k with (1/K) Σ_k β̄_k = β_D. The K subproblems can therefore be expressed as

min_{W̃_k} P_{E,k}   s.t.   β_k(W̃_k) ≥ β̄_k    (11)
To solve (11), the invention first proves the following lemma: the optimal solution of problem (11) has the rank-one form

W̃_k* = w_k e^{jθ_k} ( h̃_{ed,k} / ‖h̃_{ed,k}‖ ) ( h̃_{se,k} / ‖h̃_{se,k}‖ )^H    (12)

where w_k ≥ 0 is an amplitude and θ_k is a phase that aligns the relayed signal with the direct S-D path.
For brevity, the subcarrier index k is omitted in the proof of the lemma. Assume that W̃' is a feasible solution of (11) with amplitude parameter w' and power consumption P(W̃'). The invention then constructs a matrix W̃'' of the rank-one form (12) whose amplitude is chosen so that its useful relayed term equals |h̃_ed^H W̃' h̃_se|; this choice uses the inequality ‖AB‖ ≤ ‖A‖‖B‖, which holds for any matrices (or vectors) A and B. It is shown below that the new matrix W̃'' is not only feasible for problem (11) but also yields an objective value no larger than P(W̃'). Substituting W̃'' into the numerator and the denominator of β in (9) gives (13) and (14), which follow from the triangle inequality; based on (13) and (14), the invention infers that β(W̃'') ≥ β(W̃') ≥ β̄. These results show that W̃'' is feasible for problem (11). Substituting W̃'' into the objective function of (11) and applying the Cauchy-Schwarz inequality yields (15), i.e., P(W̃'') ≤ P(W̃'). In summary, for any feasible solution W̃' of problem (11) one can always construct a matrix of the form (12) that achieves an objective value no larger than P(W̃'), which proves the lemma.
Substituting (12) into the objective function of (11) shows that P_{E,k} is an increasing function of w_k. Therefore, by gradually increasing w_k in (12) from a small value until the constraint in (11) is satisfied, the unique unknown variable w_k can be found. The lemma applies to any given per-subcarrier target β̄_k, so the optimal solution of (10) has the same form as (12); in theory, the invention could obtain the optimal solution of (10) by considering all possible combinations of {β̄_k}, and for a given {β̄_k} problem (11) provides an upper bound on the optimum of (10). The listener E can use the precoding matrix in (12) only if it knows all the required channels. The invention assumes that the listener E can obtain the equivalent channel h̃_{se,k} by listening to the pilot signal; however, owing to the non-cooperative relationship between the transmitter S and the listener E, the channel information involving the receiver D is difficult to obtain. Therefore, the invention relies on DRL and the fed-back β and β_D to adjust (w_k, θ_k) so as to minimize P_E. By adopting the MADDPG learning framework, (w_k, θ_k) is determined in real time and the transmitter S is induced to select the beam required by the listener E. Finally, W_k in (7) can be expressed as the product of a column vector and a row vector, as shown in FIG. 3.
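For illustration, the rank-one structure in (12) and the monotone search over the amplitude w_k can be sketched as follows. The function snr_of stands in for the SNR expression β_k(·) in (9); all names and search parameters are illustrative assumptions rather than values from the patent.

```python
import numpy as np

def rank_one_precoder(h_ed, h_se, w, theta=0.0):
    """W_tilde = w * e^{j*theta} * (h_ed/||h_ed||) (h_se/||h_se||)^H, the form of (12)."""
    col = h_ed / np.linalg.norm(h_ed)
    row = np.conj(h_se) / np.linalg.norm(h_se)   # Hermitian transpose of the S->E direction
    return w * np.exp(1j * theta) * np.outer(col, row)

def smallest_feasible_amplitude(h_ed, h_se, theta, snr_of, snr_target,
                                w0=1e-3, growth=1.05, w_max=1e3):
    """Grow w from a small value until beta_k(W) meets the induction target;
    since the transmit power increases monotonically with w, the first
    feasible amplitude is the one that minimizes P_{E,k}."""
    w = w0
    while w < w_max:
        W = rank_one_precoder(h_ed, h_se, w, theta)
        if snr_of(W) >= snr_target:
            return w, W
        w *= growth
    return None, None   # target not reachable within the search range
```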
Successful beam induction does not mean that listening can then be carried out successfully. In the data transmission phase, if the listener E does not forward data to the receiver D, the error rate at the receiver D may rise above the threshold and trigger a beam recovery procedure, causing a beam switch. Therefore, in order for the listener E to relay and listen to the data under AF relay operation, the received signal is split into two parts: one part is used to forward information and increase the signal-to-noise ratio of the receiver D, and the other part is used to decode the information and listen to the message sent by the transmitter S. Because of the introduction of α_k, the power gain factor must be re-optimized; it is denoted g_k in this phase. Let W̄_k denote the normalized forwarding matrix (the column-times-row structure of FIG. 3 with unit amplitude); the transmitted signal of the listener E is then expressed as

x_{E,k} = g_k √(α_k) W̄_k y_{E,k}    (16)

where g_k is the power gain factor used to control the transmit power during the data listening phase and α_k is the power allocation factor, i.e., the fraction of the received power that is forwarded. It should be noted that the receive and transmit beamforming directions are kept consistent with those of the beam induction phase, since the invention aims to improve the signal-to-noise ratio of the receiver D in both stages. Similarly to (8), the received signal of the receiver D in the data transmission phase can be written as

y_{D,k} = √(P_{S,k}) v_D^H H_{sd,k} f_j s_k + v_D^H H_{ed,k} x_{E,k} + v_D^H n_{D,k}    (17)

For given g_k and α_k, the received signal-to-noise ratios of the receiver D and the listener E can be calculated and are denoted γ_{D,k} and γ_{E,k}, respectively. The goal of the listener E is then to optimize {g_k, α_k} so as to maximize the listening rate under the transmit power constraint. The optimization problem can therefore be expressed as

max_{ {g_k, α_k} } R_D   s.t.   R_E ≥ R_D,   P_E ≤ P_M    (18)

where R_D = (1/K) Σ_k log2(1 + γ_{D,k}) and R_E = (1/K) Σ_k log2(1 + γ_{E,k}) denote the average achievable rates of the receiver D and the listener E, and P_E = Σ_k P_{E,k} and P_M are the total transmit power and the power constraint of the listener E, respectively. The invention assumes that listening can only be realized when R_E ≥ R_D, in which case the corresponding listening rate is R_D. If the listener E knew the global CSI, (18) could be solved with the Lagrange multiplier method; however, when the CSI of the suspicious link is unknown, the optimal {g_k, α_k} cannot be obtained directly. A more reasonable assumption is that the listener can obtain the SINR fed back by the receiver D by listening to the uplink control channel between the transmitter S and the receiver D. Therefore, DRL is adopted: the overheard feedback is taken as the observed state and, by interacting with the system and training the neural network with MADDPG, {g_k, α_k} is provided in real time, thereby improving the listening rate of the listener E at a controllable transmit power.
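The quantities that drive this optimization can be sketched as follows. The (1 - α_k) decoding split and the helper names are assumptions used only for illustration, with γ denoting the per-subcarrier SNRs defined above.

```python
import numpy as np

def rates(gamma_D, gamma_E):
    """Average achievable rates at D and E over the K subcarriers (bits/s/Hz)."""
    R_D = float(np.mean(np.log2(1.0 + np.asarray(gamma_D))))
    R_E = float(np.mean(np.log2(1.0 + np.asarray(gamma_E))))
    return R_D, R_E

def listener_decode_snr(alpha_k, gamma_se_k):
    """Assumed split: the (1 - alpha_k) fraction of received power feeds E's decoder."""
    return (1.0 - alpha_k) * gamma_se_k

def listening_feasible(R_D, R_E, P_E, P_M):
    """Listening succeeds only if E decodes at least as fast as D (R_E >= R_D)
    and the total forwarding power respects the budget P_M, as required in (18)."""
    return (R_E >= R_D) and (P_E <= P_M)
```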
Based on the above analysis, when the CSI between the transmitter S and the receiver D is unknown, the beam induction and data listening problems are formulated as MDP (Markov Decision Process) problems. Treating all subcarriers as a single agent and obtaining the policy through the Actor-Critic network of a single DDPG is the most intuitive deep learning solution. In practice, however, it is usually harder to train one policy with a large action space than to train multiple policies with small action spaces. Therefore, in both phases the invention treats each subcarrier as a separate agent, and the agents cooperate to achieve a common goal. The invention accordingly adopts the MADDPG learning architecture, which consists of K Actors (policies) and a centralized Critic (value function). During training, the Actors and the Critic are updated with global data, including the global state, the shared reward, and all actions, which are defined below.
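A minimal sketch of the K-actor/centralized-Critic layout just described is given below, assuming a PyTorch implementation; the layer sizes, the 2-D action per agent, and the dimensions are illustrative, and the per-agent states, actions, and rewards are the ones defined in the following paragraphs.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Per-subcarrier policy: local state -> bounded 2-D action
    (amplitude/angle during beam induction, gain/allocation during data listening)."""
    def __init__(self, state_dim, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),   # actions scaled to [-1, 1]
        )

    def forward(self, local_state):
        return self.net(local_state)

class CentralCritic(nn.Module):
    """Centralized value function: (global state, all K actions) -> scalar Q."""
    def __init__(self, global_state_dim, num_agents, action_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(global_state_dim + num_agents * action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, global_state, all_actions):
        return self.net(torch.cat([global_state, all_actions], dim=-1))

# K actors share one critic during training; at execution time each actor only needs its
# own local state, so the listener can run the learned policy online per subcarrier.
K, state_dim = 64, 6                          # illustrative sizes
actors = [Actor(state_dim) for _ in range(K)]
critic = CentralCritic(global_state_dim=K * state_dim, num_agents=K)
```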
The beam induction problem is modeled as the first multi-agent cooperative MDP problem. According to the form of the optimal precoding matrix, the search for the optimal precoding matrix is converted into a search for a pair of constants (w_k, θ_k), where θ_k is the phase adjusted by the MADDPG algorithm, which accelerates the training process. At time t, the action of the k-th subcarrier is (w_{k,t}, θ_{k,t}), so the action of all subcarriers is the set of these per-subcarrier actions. At time t, the state on each subcarrier k contains β and β_D, obtained by listening to and analyzing the beam reports on the feedback channel, together with the known channel information; the global state s_t is the union of the non-overlapping information of all subcarrier states. The reward r_t at time t is defined as r_t = -a_1 P_E - a_2 (β - β_D - B)^2 + a_3 I(β, β_D), where a_1, a_2 and a_3 are positive coefficients balancing the induction success rate and the power consumption of the listener, B is a constant used to increase the probability of selecting the best beam index j*, and I(x, y) is a Boolean function with I(x, y) = 1 when x ≥ y and I(x, y) = 0 otherwise. The reward function encourages successful beam induction while penalizing behavior that consumes excessive energy.
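This reward can be transcribed directly; the coefficient values below are placeholders, since a_1, a_2, a_3 and B are not given numerically in the text.

```python
def indicator(x, y):
    """I(x, y) = 1 when x >= y, else 0."""
    return 1.0 if x >= y else 0.0

def beam_induction_reward(P_E, beta, beta_D, a1=1.0, a2=0.1, a3=10.0, B=1.0):
    """r_t = -a1*P_E - a2*(beta - beta_D - B)^2 + a3*I(beta, beta_D):
    rewards successful induction, penalizes transmit power and overshoot."""
    return -a1 * P_E - a2 * (beta - beta_D - B) ** 2 + a3 * indicator(beta, beta_D)
```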
The data listening problem is modeled as the second multi-agent cooperative MDP problem. According to the form of the optimal precoding matrix, the search for the optimal precoding matrix is converted into a search for a pair of constants (g_k, α_k), where g_k and α_k denote the power gain factor and the power allocation factor of the listener on subcarrier k, which accelerates the training process. At time t, the action of the k-th subcarrier is (g_{k,t}, α_{k,t}), so the action of all subcarriers is the set of these per-subcarrier actions. At time t, the state on each subcarrier k contains the SINR obtained by listening to the feedback channel together with the known channel information; the global state s_t is the union of the non-overlapping information of all subcarrier states. In the reward r_t at time t, C is a constant used to boost the listening rate and the remaining positive coefficients balance the listening rate against the power consumption of the relay; the reward function encourages the subcarriers to maximize R_D under the constraints R_E > R_D and P_E ≤ P_M.
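The exact expression of this second reward is not legible in the source text, so the sketch below only assumes a shape consistent with the stated behavior: reward the forwarded rate R_D (plus the boosting constant C) while R_E ≥ R_D and P_E ≤ P_M hold, and otherwise penalize the relay power.

```python
def data_listening_reward(R_D, R_E, P_E, P_M, b1=1.0, b2=0.1, C=1.0):
    """Assumed reward shape for the data-listening MDP (not verbatim from the text)."""
    if R_E >= R_D and P_E <= P_M:
        return b1 * R_D + C      # listening succeeds: push the forwarded rate up
    return -b2 * P_E             # infeasible: discourage wasting relay power
```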
FIG. 4 shows, for the beam induction phase, the beam induction success rate versus the transmit power P_S under different N_te configurations. The invention allocates the same transmit power to each subcarrier, i.e., P_{S,k} = P_S / K. The induction success rate is obtained by counting the events β ≥ β_D over 10^5 Monte Carlo runs. In the passive method, the listener E remains silent while the transmitter S performs the beam scanning; in this case, when the listener E and the receiver D are far apart, the receiver D selects the best beam index j* only with low probability. The results show that the success rate of the proposed MADDPG-based method is close to 100%, which verifies the effectiveness of the method under different system configurations.
FIG. 5 shows, for the beam induction phase, the relationship between P_E and P_S under different N_te configurations. In the optimal case, P_E is the optimal objective value of (10) computed with known CSI; in the MADDPG-based scheme, P_E is calculated using the parameters learned by MADDPG. It can be seen that although P_E increases with P_S, configuring more transmit antennas N_te effectively reduces P_E. Combining FIG. 4 and FIG. 5, it can be seen that even though the CSI of the suspicious link is unknown, the beam induction strategy learned by MADDPG still achieves beam induction with a transmit power only slightly higher than the theoretical minimum.
FIG. 6 shows the data listening phase, where the optimal solution is obtained by solving (18). The passive approach with SBM (successful beam steering) means that the agent achieves beam induction in the BS phase but remains silent in the DT phase. As shown in FIG. 6, after successful induction the interception rate increases with P_S or N_ts, and the proposed scheme can guarantee R_E ≥ R_D by adjusting the transmission parameters. Meanwhile, the listening rate of the proposed method is close to the optimal solution and clearly outperforms the passive listening method with SBM.
FIG. 7 compares the proposed listening schemes with the conventional active jamming scheme under different power constraints P_M. The results show that the listening rate obtained by the proposed MADDPG scheme is close to the optimal solution and increases with P_M; when P_M > 55 dBm, the interception rate approaches the maximum R_E. The performance of the passive listening scheme without SBM is independent of the transmit power of the listener E, and passive listening with SBM outperforms the scheme without SBM. The average interception rate of the jamming scheme is limited by the power constraint of the listener E, since it cannot guarantee R_E ≥ R_D when the power limit P_M is relatively low.
Simulations demonstrate that the proposed active monitoring method for massive MIMO-OFDM systems based on deep reinforcement learning can effectively induce the transmitter S to select a beam favorable to the listener E, laying the foundation for the subsequent data listening process; the listener E then readjusts the power allocation factor and the power gain factor, effectively maintaining the communication link and improving the data listening rate. The combination of the two stages realizes the interception of narrow-beam communication in a massive MIMO-OFDM system.
Claims (6)
1. An active monitoring method based on deep reinforcement learning comprises the following steps:
(1) the transmitter S performs analog beam scanning in a time-division manner according to the beam precoding codebook;
(2) during the beam scanning phase performed by the transmitter S, the listener E determines, from its own beam quality report and the beam report fed back by the receiver D to the transmitter S, the best beam index j* favorable to itself;
(3) the listener E induces the transmitter S to select the best beam index j* by optimizing the forward precoding matrix;
(4) in the communication stage determined by the best beam index j*, the listener E serves as a pseudo relay for data forwarding, maintaining the communication beam and improving the data interception rate.
2. The active listening method based on deep reinforcement learning of claim 1, wherein the step (2) comprises the following steps:
1) the receiver D and the listener E each receive the beam quality measurement reference signal sent by the transmitter S and calculate the beam quality from the received signal; the receiver D then forms a beam quality report and feeds it back to the transmitter S as the beam selection reference;
2) the listener E determines the best beam index j* according to its own beam quality report and the beam quality report fed back by the receiver D to the transmitter S, taking the power consumption factor into account through a trade-off formula between the beam induction success rate and the power consumption.
3. The active listening method based on deep reinforcement learning of claim 1, wherein the step (3) comprises the following steps:
(1) the listener formulates an optimization problem: minimize the total transmit power of the listener under the constraint of successful beam induction; the form of the optimal precoding matrix is derived from this optimization problem, showing that the optimal precoding matrix is related to the channel state information of the transmitter S and the receiver D;
(2) the listener E trains a first fitting network using the MADDPG algorithm to determine the transmission parameters of a first forwarding matrix, and then forwards the beam quality measurement reference signal to the receiver D using the first forwarding matrix determined by these transmission parameters, so as to induce the receiver D to send an erroneous beam measurement report and thereby cause the transmitter S to select a beam favorable to the listener E.
4. The active listening method based on deep reinforcement learning of claim 1, wherein the step (4) comprises the following steps:
i. the listener E receives the transmission data sent by the transmitter S and formulates an optimization problem: maximize the data listening rate under the conditions that listening is successful and the transmit power is smaller than the forwarding power upper limit;
ii. the listener trains a second fitting network using the MADDPG algorithm to determine the power allocation factor and the power gain factor, with part of the power used for decoding and part for forwarding the signal, and then forwards the communication data to the receiver D using a second forwarding matrix, thereby maintaining the communication beam and improving the data listening rate.
5. The active listening method based on deep reinforcement learning of claim 3, wherein the step (2) comprises the following steps:
firstly, the beam induction problem is modeled as a first multi-agent cooperative MDP problem;
secondly, according to the form of the optimal precoding matrix, the search for the optimal precoding matrix is converted into a search for a pair of constants, which accelerates the training process; at a given moment, the action on a single subcarrier is the angle and the amplitude of the precoding matrix, so that the action of all subcarriers is the set of the single-subcarrier actions;
thirdly, at a given moment, the state on a single subcarrier is the combination of the beam report information obtained by monitoring and analyzing the feedback channel and the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
fourthly, the reward function at a given moment is designed to encourage successful beam induction and to penalize behavior that consumes excessive energy.
6. The active listening method based on deep reinforcement learning of claim 4, wherein the step ii comprises the following steps:
I. the data listening problem is modeled as a second multi-agent cooperative MDP problem;
II. according to the form of the optimal precoding matrix, the search for the optimal precoding matrix is converted into a search for a pair of constants, which accelerates the training process; at a given moment, the action on a single subcarrier is the power gain factor and the power allocation factor, so that the action of all subcarriers is the set of the single-subcarrier actions;
III. at a given moment, the state on a single subcarrier is the combination of the signal-to-interference-plus-noise ratio obtained by monitoring the feedback channel and the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
IV. the reward design at a given moment encourages the subcarriers to maximize the listening rate under the constraints of successful listening and limited power.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210312148.3A | 2022-03-28 | 2022-03-28 | Active monitoring method based on deep reinforcement learning

Publications (1)

Publication Number | Publication Date
---|---
CN114884547A | 2022-08-09

Family

ID=82669000

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210312148.3A (pending) | Active monitoring method based on deep reinforcement learning | 2022-03-28 | 2022-03-28

Country Status (1)

Country | Link
---|---
CN | CN114884547A
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||