CN115296705B - Active monitoring method in MIMO communication system - Google Patents

Active monitoring method in MIMO communication system

Info

Publication number
CN115296705B
CN115296705B (application CN202210470392.2A)
Authority
CN
China
Prior art keywords
antenna
listener
transmitter
parameter
policy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210470392.2A
Other languages
Chinese (zh)
Other versions
CN115296705A (en)
Inventor
唐岚
郭德邻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202210470392.2A priority Critical patent/CN115296705B/en
Publication of CN115296705A publication Critical patent/CN115296705A/en
Application granted granted Critical
Publication of CN115296705B publication Critical patent/CN115296705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413MIMO systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • H04W72/044Wireless resource allocation based on the type of the allocated resource
    • H04W72/0473Wireless resource allocation based on the type of the allocated resource the resource being transmission power
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application discloses an active monitoring method in a MIMO communication system comprising a suspicious transmitter A, a suspicious receiver B and a legitimate monitor E. The transmitter A sends information to the receiver B; the monitor E makes decisions based on partially known channels to improve its monitoring performance, while A makes corresponding decisions to evade monitoring, giving rise to a monitoring and anti-monitoring game between the source node A and the monitor E. The application designs a reinforcement learning algorithm that optimizes the transmit-power strategies of the monitor E and the transmitter A and reaches a Nash equilibrium of the monitoring and anti-monitoring game between them.

Description

Active monitoring method in MIMO communication system
Technical Field
The application belongs to the field of wireless communication, and in particular relates to an active monitoring method in a MIMO (multiple-input multiple-output) system; more particularly, it relates to a method for optimizing power allocation strategies based on the multi-agent reinforcement learning algorithm FSP-SAC (Fictitious Self-Play Soft Actor-Critic).
Background
In recent decades, wireless communication has played a very important role in people's daily lives by providing efficient and convenient connectivity.
At present, physical-layer monitoring can be divided into two types: passive monitoring and active monitoring. Passive monitoring simply receives leaked wireless signals with a silent receiver. However, with the deployment of high-band channels and MIMO in fifth-generation (5G) networks, transmission beams are becoming increasingly directional, so passive monitoring will have difficulty capturing useful information. According to information theory, information can be decoded in the physical-layer sense as long as the capacity of the communication channel is smaller than the capacity of the monitoring channel. Therefore, to improve monitoring efficiency, active monitoring, which reduces the capacity of the communication channel by means of interference signals, is widely used.
Current common active-monitoring schemes consider only static monitored targets. With the development of anti-monitoring measures, more and more illegal information transmitters intelligently adjust their transmit power and use noise signals to reduce the capacity of the monitoring channel. This creates a game between monitoring and anti-monitoring and makes effective active monitoring considerably more difficult. It is therefore important to construct a method that seeks the Nash equilibrium solution of this monitoring and anti-monitoring game.
Disclosure of Invention
The application aims to: in view of the problems and shortcomings of the prior art, the application aims to provide an active monitoring method in a MIMO communication system that simultaneously optimizes the transmit-power strategy of the monitor and that of the suspicious source node so that the two strategies reach a Nash equilibrium.
The technical scheme is as follows: to achieve the above object, the application adopts the following technical scheme. An active monitoring method in a MIMO communication system includes the following steps:
(1) At each time slot t, a multi-antenna transmitter A transmits an information signal s_A(t) to a multi-antenna receiver B and transmits an artificial noise (interference) signal n_A(t) toward a multi-antenna listener E, so as to reduce the channel capacity from A to E and thus hinder listening by E. The action of transmitter A, a_A(t), is denoted as a_A(t) = {Σ_s(t), Σ_n(t)}, where Σ_s(t) and Σ_n(t) are the covariance matrices of s_A(t) and n_A(t), respectively. Based on its local channel information o_A(t), transmitter A selects an action using policy π_A: π_A(·|o_A(t)) is a conditional probability distribution conditioned on o_A(t), and the value of one sample drawn from it is the selected action a_A(t);
(2) At each time slot t, listener E transmits an interference signal x_E(t) to receiver B to reduce the channel capacity between transmitter A and receiver B and thereby improve the listening success rate. The action of listener E, a_E(t), is denoted as a_E(t) = Σ_E(t), where Σ_E(t) is the covariance matrix of x_E(t). Based on its local channel information o_E(t), listener E selects an action using policy π_E: π_E(·|o_E(t)) is a conditional probability distribution conditioned on o_E(t), and the value of one sample drawn from it is the selected action a_E(t);
(3) At each time slot t, after transmitter A and listener E have executed their actions, they obtain rewards r_A(t) and r_E(t), respectively. Let π = {π_A, π_E}. The average reward function of transmitter A is defined as J_A(π) = E_π[r_A(t)], where E_π[·] denotes the mathematical expectation over the time slots t under the joint policy π, and the average reward function of listener E is J_E(π) = E_π[r_E(t)]. Policy π_A is optimized to maximize J_A(π) and policy π_E is optimized to maximize J_E(π), so as to reach a Nash equilibrium of the listening and anti-listening game.
Further, in the step (3), the method further comprises the following steps:
1) For any device n, where n ∈ {A, E}, initialize a parameterized policy π_θn with parameter θ_n, a policy Π_ψn with parameter ψ_n, a value function V_ωn with parameter ω_n, and a value function Q_φn with parameter φ_n; assign the value of parameter φ_n to the target parameter φ̄_n;
2) At each time slot t, device n uses policy π_θn with probability 0.1 to select an action and uses policy Π_ψn with probability 0.9 to select an action. The collected data {o_n(t), a_n(t), r_n(t), o_n(t+1)} is stored in a first storage area D_RL^n, where, when n = A, the data is {o_A(t), a_A(t), r_A(t), o_A(t+1)}, and when n = E, the data is {o_E(t), a_E(t), r_E(t), o_E(t+1)}. If the action was selected by policy π_θn, the data {o_n(t), a_n(t)} is additionally stored in a second storage area D_SL^n;
3) Randomly select a sample batch τ_RL = {(o_n(t), a_n(t), r_n(t), o_n(t+1))} of length L from D_RL^n and calculate the gradient
Δθ_n = ∇_θn (1/L) Σ_t [ Q_φn(o_n(t), ã_n(t)) − α·log π_θn(ã_n(t) | o_n(t)) ],
where ∇_x denotes the gradient with respect to the variable x, ã_n(t) is sampled from the policy π_θn(·|o_n(t)), and the temperature parameter α ∈ [0, 1]. Calculate the gradient of V_ωn,
Δω_n = −∇_ωn (1/L) Σ_t ( V_ωn(o_n(t)) − Q_φ̄n(o_n(t), ã_n(t)) + α·log π_θn(ã_n(t) | o_n(t)) )²,
and the gradient of Q_φn,
Δφ_n = −∇_φn (1/L) Σ_t ( Q_φn(o_n(t), a_n(t)) − r_n(t) − γ·V_ωn(o_n(t+1)) )²,
where the discount factor γ ∈ (0, 1). Randomly select a sample batch τ_SL = {(o_n(t), a_n(t))} of length L from D_SL^n and calculate the gradient
Δψ_n = ∇_ψn (1/L) Σ_t log Π_ψn(a_n(t) | o_n(t)).
Then update the parameters θ_n ← θ_n + ηΔθ_n, ω_n ← ω_n + ηΔω_n, φ_n ← φ_n + ηΔφ_n, φ̄_n ← νφ_n + (1 − ν)φ̄_n, ψ_n ← ψ_n + ηΔψ_n, where η is the learning rate with η ∈ (0, 1), ν is the moving-average parameter with ν ∈ (0, 1), and the symbol ← assigns the value on the right of the arrow to the variable on the left. Then return to step 2) until the policy parameter θ_n no longer changes.
The beneficial effects are that: by designing the algorithm FSP-SAC and introducing deep reinforcement learning, the application overcomes the curse of dimensionality faced by high-dimensional games; by combining fictitious self-play with deep reinforcement learning, it overcomes the difficulty that ordinary single-agent reinforcement learning algorithms have in converging in game problems, so that the algorithm gradually converges to a Nash equilibrium.
Drawings
FIG. 1 is a system model diagram of the present application;
FIG. 2 is a graph comparing the learning performance of the method of the present application with that of other methods for the transmitter A;
FIG. 3 is a graph comparing the learning performance of the method of the present application with that of other methods for the listener E.
Detailed Description
The present application is further illustrated by the accompanying drawings and the following detailed description. It should be understood that these embodiments are merely illustrative of the application and do not limit its scope; after reading the application, various equivalent modifications made by those skilled in the art fall within the scope defined by the appended claims.
As shown in FIG. 1, the communication system considered here consists of a multi-antenna transmitter A, a multi-antenna receiver B, and a multi-antenna listener E. Let the number of transmit antennas of transmitter A be N_A and the number of receive antennas of receiver B be N_B. Listener E has two sets of antennas: one set of N_E^t antennas is used to transmit interference signals, and the other set of N_E^r antennas is used to listen to the signal from A. At time slot t, the channel matrices between transmitter A and receiver B, between transmitter A and listener E, and between listener E and receiver B are denoted H_AB(t) ∈ C^{N_B×N_A}, H_AE(t) ∈ C^{N_E^r×N_A}, and H_EB(t) ∈ C^{N_B×N_E^t}, respectively, where C^{i×j} denotes the space of complex matrices of size i×j.
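As a concrete illustration of these dimensions, the following minimal Python/NumPy sketch draws Rayleigh-fading channel matrices of the stated sizes for one time slot. The antenna counts and the i.i.d. complex Gaussian fading model are illustrative assumptions, not values fixed by the application.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative antenna counts (assumptions for this sketch only).
N_A, N_B, N_E_t, N_E_r = 4, 2, 2, 2

def rayleigh(rows, cols):
    """i.i.d. CN(0, 1) channel matrix of size rows x cols."""
    return (rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

# Channel matrices for one time slot t.
H_AB = rayleigh(N_B, N_A)      # transmitter A -> receiver B
H_AE = rayleigh(N_E_r, N_A)    # transmitter A -> listener E (listening antennas)
H_EB = rayleigh(N_B, N_E_t)    # listener E (jamming antennas) -> receiver B

print(H_AB.shape, H_AE.shape, H_EB.shape)   # (2, 4) (2, 4) (2, 2)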
At each time slot t, the signal transmitted by transmitter A consists of an information signal s_A(t) ∈ C^{N_B} and an artificial noise signal n_A(t) ∈ C^{N_A−N_B}. The information signal is precoded as V_s(t)s_A(t), where V_s(t) is composed of the N_B right singular vectors of H_AB(t) corresponding to its non-zero singular values, and the artificial noise is precoded as V_n(t)n_A(t), where V_n(t) is composed of the remaining N_A − N_B right singular vectors corresponding to zero singular values, so that the artificial noise does not interfere with the information signal at receiver B. The total signal transmitted by transmitter A in time slot t is denoted x_A(t) = V_s(t)s_A(t) + V_n(t)n_A(t). The signal received by receiver B is
y_B(t) = H_AB(t)x_A(t) + H_EB(t)x_E(t) + n_B(t),    (1)
where x_E(t) is the interference signal transmitted by listener E to B and n_B(t) is Gaussian white noise. The signal received by listener E is
y_E(t) = H_AE(t)x_A(t) + n_E(t),    (2)
where n_E(t) is Gaussian white noise at the listener.
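The precoders V_s(t) and V_n(t) can be obtained from a singular value decomposition of H_AB(t). The sketch below, continuing the NumPy example with the same assumed antenna counts, shows one way to split the right singular vectors and verifies that the artificial-noise directions are invisible to receiver B; it illustrates the construction described above and is not the authors' implementation.

import numpy as np

rng = np.random.default_rng(1)
N_A, N_B = 4, 2                                   # assumed antenna counts
H_AB = (rng.standard_normal((N_B, N_A)) + 1j * rng.standard_normal((N_B, N_A))) / np.sqrt(2)

# SVD: H_AB = U diag(s) Vh, where the rows of Vh are the (conjugated) right singular vectors.
U, s, Vh = np.linalg.svd(H_AB)
V = Vh.conj().T                                   # columns are right singular vectors

V_s = V[:, :N_B]                                  # directions seen by B (non-zero singular values)
V_n = V[:, N_B:]                                  # null-space directions (zero singular values of H_AB)

# Artificial noise sent along V_n does not reach receiver B:
print(np.allclose(H_AB @ V_n, 0))                 # True (up to numerical precision)

# Total transmitted signal x_A(t) = V_s s_A(t) + V_n n_A(t)
s_A = rng.standard_normal(N_B) + 1j * rng.standard_normal(N_B)
n_A = rng.standard_normal(N_A - N_B) + 1j * rng.standard_normal(N_A - N_B)
x_A = V_s @ s_A + V_n @ n_A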
the covariance matrix of the signal received by B, which is obtained by equation (1), is:
wherein the method comprises the steps ofSuperscript () H Representing the conjugate transpose of the matrix or vector. The covariance matrix of the interference received by B is:
wherein the method comprises the steps ofIs n B Covariance matrix sigma of (a) 2 Is the noise figure, I x Representing an identity matrix of size x. The channel capacity between the transmitter a and the receiver B is according to the formulas (3) and (4)
Wherein the function det represents a matrix determinant, superscript of matrix () -1 Representing the inverse of the matrix.
From equation (2), the covariance matrix of the desired signal at listener E is
W_E(t) = H_AE(t)V_s(t)Σ_s(t)V_s(t)^H H_AE(t)^H,    (6)
and the covariance matrix of the interference plus noise at E is
Q_E(t) = H_AE(t)V_n(t)Σ_n(t)V_n(t)^H H_AE(t)^H + σ²I_{N_E^r},    (7)
where Σ_n(t) is the covariance matrix of n_A(t). From equations (6) and (7), the capacity of the listening channel between transmitter A and listener E is
C_E(t) = log det( I_{N_E^r} + Q_E(t)^{-1} W_E(t) ).    (8)
at time slot t, both transmitter a and listener E can only obtain partial channel information, which we call local observation information (or local channel information). At time slot t, the local observation information of transmitter A is defined asWherein->Is the space of the local observation information of a. The local observation information of listener E is defined as +.> Is the local observation information space of E. And the global state is defined as +.>Wherein->Is a global state space.
At each time slot t, transmitter A decides the covariance matrices Σ_s(t) and Σ_n(t) of the transmitted signals s_A(t) and n_A(t). The signal power is tr(Σ_s(t)), where tr denotes the trace of a matrix, and the artificial noise power is tr(Σ_n(t)). If transmitter A does not know the channel H_AE(t), the noise power P_n(t) can be assumed to be distributed equally across the artificial noise streams, i.e. Σ_n(t) = (P_n(t)/(N_A − N_B)) I_{N_A−N_B}. If the streams of the signal are assumed to be mutually uncorrelated, Σ_s(t) and Σ_n(t) are positive semidefinite symmetric matrices. The signal power and the artificial noise power must satisfy the total power constraint tr(Σ_s(t)) + tr(Σ_n(t)) ≤ P_A^max, where P_A^max is the maximum transmit power of transmitter A. The action of transmitter A in time slot t is defined as a_A(t) = {Σ_s(t), Σ_n(t)}. The covariance matrix Σ_E(t) of the interference signal x_E(t) of listener E is likewise a positive semidefinite symmetric matrix under the assumption that its streams are mutually uncorrelated, and the interference power must satisfy the total power constraint tr(Σ_E(t)) ≤ P_E^max, where P_E^max is the maximum transmit power of listener E. The action of E in time slot t is defined as a_E(t) = Σ_E(t), and the joint action is defined as a_t = {a_A(t), a_E(t)}. Based on information theory, if C_E(t) is greater than C_B(t), the listener can decode the information transmitted from transmitter A to receiver B with arbitrarily small error in the physical-layer sense, so the reward function of listener E is defined as:
r_E(s_t, a_t) = 1{C_E(t) ≥ C_B(t)} · e^{C_B(t)},    (9)
where 1{·} denotes the Boolean indicator function, which outputs 1 when its argument is true and 0 otherwise; the exponential form amplifies the effect of action changes on the reward. For transmitter A, each time its information is intercepted, a penalty proportional to the amount of intercepted data is imposed. Transmitter A aims to maximize its transmission rate while reducing the amount of information that is intercepted, so the reward function of transmitter A is defined as:
r_A(s_t, a_t) = C_B(t) − ζ · 1{C_E(t) ≥ C_B(t)} · e^{C_B(t)},    (10)
where ζ > 0 is a coefficient balancing the transmission rate against the information-leakage penalty. We abbreviate r_E(s_t, a_t) and r_A(s_t, a_t) as r_E(t) and r_A(t), respectively.
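Given C_B(t) and C_E(t), the rewards (9) and (10) reduce to a few lines of Python. The sketch below follows the exponential form reconstructed above, which is an interpretation of the original formulas and therefore an assumption, with ζ = 2 as in the simulation section.

import numpy as np

def reward_listener(C_B, C_E):
    """Equation (9): the listener is rewarded only when it can decode (C_E >= C_B)."""
    return float(C_E >= C_B) * np.exp(C_B)

def reward_transmitter(C_B, C_E, zeta=2.0):
    """Equation (10): transmission rate minus a leakage penalty weighted by zeta > 0."""
    return C_B - zeta * float(C_E >= C_B) * np.exp(C_B)

# Example: the listener's jamming has pushed C_B below C_E, so A is penalized.
C_B, C_E = 1.3, 1.8
print(reward_listener(C_B, C_E), reward_transmitter(C_B, C_E))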
The strategy of transmitter A is defined as π_A: π_A(·|o_A(t)) is a conditional probability distribution conditioned on o_A(t); when the observation information is o_A(t), A selects its action a_A(t) using this probability distribution. The strategy of listener E is defined as π_E: π_E(·|o_E(t)) is a conditional probability distribution conditioned on o_E(t); when the observation information is o_E(t), E selects its action a_E(t) using this probability distribution. The joint policy of the two is written π = (π_A, π_E). The objective function of transmitter A is J_A(π) = E_π[r_A(t)], i.e. the mathematical expectation of the reward in the time dimension under policy π (the average reward value). Likewise, the objective function of listener E is J_E(π) = E_π[r_E(t)].
The optimization objective of transmitter A is
max_{π_A} J_A(π_A, π_E).    (11)
the optimization objective of listener E is:
to solve the problems (11) and (12), a common reinforcement learning algorithm can be applied to the transmitter a and the listener E, respectively, to solve the problems, but the learning result is difficult to converge due to the fact that both strategies are changing. For this, we designed a multi-agent reinforcement learning algorithm FSP-SAC to learn the respective optimal strategy pi for emitter A and listener E A And pi E . The method stabilizes the learning process by training an average strategy and an optimal response strategy.
We now present the procedure for solving problems (11) and (12) with FSP-SAC. Since problems (11) and (12) are both solved with FSP-SAC, for simplicity of description n denotes either transmitter A or listener E, i.e. n ∈ {A, E}. The algorithm proceeds as follows:
1) For any n ∈ {A, E}, initialize a parameterized best-response policy π_θn with parameter θ_n, an average policy Π_ψn with parameter ψ_n, a state value function V_ωn with parameter ω_n, and an action value function Q_φn with parameter φ_n. To stabilize the learning process, also initialize a target action value function Q_φ̄n and assign the value of parameter φ_n to its parameter φ̄_n.
2) Data collection: the notation x ~ p(x) means that x obeys the probability distribution p(x). At each time slot t, given the local observation information o_n(t), n uses the best-response policy with probability 0.1 to select its action, a_n(t) ~ π_θn(·|o_n(t)), and uses the average policy with probability 0.9, a_n(t) ~ Π_ψn(·|o_n(t)). We call this probabilistic selection a hybrid strategy; the mixture of π_θn and Π_ψn is the overall policy π_n. After the actions of both transmitter A and listener E have been executed, the system transitions to the next state and n obtains a reward r_n(t) and observes the local observation information o_n(t+1) of the next slot. The collected data {o_n(t), a_n(t), r_n(t), o_n(t+1)} is stored in the storage area D_RL^n; if the action was selected by policy π_θn, the data {o_n(t), a_n(t)} is also stored in another storage area D_SL^n. The data collection phase lasts T steps; when t = T, data collection ends and the optimization learning phase begins. A code sketch of this data-collection step is given after step 4) below.
3) Reinforcement learning phase: the data in D_RL^n are used to update π_θn, V_ωn and Q_φn. Randomly select a sample batch τ_RL = {(o_n(t), a_n(t), r_n(t), o_n(t+1))} of length L from D_RL^n and compute the gradient with respect to θ_n:
Δθ_n = ∇_θn (1/L) Σ_t [ Q_φn(o_n(t), ã_n(t)) − α·log π_θn(ã_n(t) | o_n(t)) ],    (13)
where ∇_x denotes the gradient with respect to the variable x, ã_n(t) is sampled from the policy π_θn(·|o_n(t)) rather than taken from the sample τ_RL, and the temperature parameter α ∈ [0, 1]. Then compute the gradient with respect to ω_n:
Δω_n = −∇_ωn (1/L) Σ_t ( V_ωn(o_n(t)) − Q_φ̄n(o_n(t), ã_n(t)) + α·log π_θn(ã_n(t) | o_n(t)) )².    (14)
Compute the gradient with respect to φ_n:
Δφ_n = −∇_φn (1/L) Σ_t ( Q_φn(o_n(t), a_n(t)) − r_n(t) − γ·V_ωn(o_n(t+1)) )²,    (15)
where the discount factor γ ∈ (0, 1). The parameters are then updated: θ_n ← θ_n + ηΔθ_n, ω_n ← ω_n + ηΔω_n, φ_n ← φ_n + ηΔφ_n, φ̄_n ← νφ_n + (1 − ν)φ̄_n, where η is the learning rate with η ∈ (0, 1), ν is the moving-average parameter with ν ∈ (0, 1), and the symbol ← assigns the value on the right of the arrow to the variable on the left. A code sketch of this phase is also given after step 4).
4) Supervised learning phase: the data in D_SL^n are used to update the average policy Π_ψn. Randomly select a sample batch τ_SL = {(o_n(t), a_n(t))} of length B from D_SL^n and compute the gradient
Δψ_n = ∇_ψn (1/B) Σ_t log Π_ψn(a_n(t) | o_n(t)).    (16)
The parameter is then updated: ψ_n ← ψ_n + ηΔψ_n. Then return to step 2) until the policy Π_ψn converges to a steady state. Code sketches of steps 2), 3) and 4) follow.
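To make the data-collection step 2) concrete, the following Python sketch reproduces the 0.1/0.9 hybrid selection between the best-response policy and the average policy, and the two memories D_RL and D_SL. The environment dynamics, observation and action sizes, and the stand-in policy functions are assumptions introduced only to keep the sketch self-contained; they are not part of the application.

import numpy as np

rng = np.random.default_rng(5)
OBS_DIM, ACT_DIM, T = 8, 4, 1000          # illustrative sizes; T as in the simulation section

D_RL, D_SL = [], []                        # replay memory and supervised-learning memory

def best_response_action(obs):            # stand-in for a ~ pi_theta(.|obs)
    return rng.normal(size=ACT_DIM)

def average_policy_action(obs):           # stand-in for a ~ Pi_psi(.|obs)
    return rng.normal(size=ACT_DIM)

def env_step(obs, act):                   # stand-in for the MIMO listening-game dynamics
    return rng.normal(size=OBS_DIM), float(-np.sum(act ** 2))   # (next_obs, reward)

obs = rng.normal(size=OBS_DIM)
for t in range(T):
    use_best_response = rng.random() < 0.1          # hybrid strategy: 0.1 / 0.9 mixture
    act = best_response_action(obs) if use_best_response else average_policy_action(obs)
    next_obs, reward = env_step(obs, act)
    D_RL.append((obs, act, reward, next_obs))       # always stored for reinforcement learning
    if use_best_response:
        D_SL.append((obs, act))                     # best-response actions also stored for supervised learning
    obs = next_obs

print(len(D_RL), len(D_SL))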
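One gradient step of the reinforcement-learning phase 3) can be written with PyTorch automatic differentiation as below. The Gaussian policy head, the network sizes and the random batch standing in for τ_RL are illustrative assumptions; the three losses correspond to the reconstructed gradients (13)–(15), plain gradient steps with learning rate η are taken, and the final loop applies the moving-average target update with parameter ν.

import torch
import torch.nn as nn

torch.manual_seed(0)
OBS, ACT, L = 8, 4, 128                     # illustrative sizes; L as in the simulation section
alpha, gamma, eta, nu = 0.05, 0.99, 3e-4, 0.005

def mlp(inp, out):                          # two hidden layers of 128 ReLU units
    return nn.Sequential(nn.Linear(inp, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, out))

policy = mlp(OBS, 2 * ACT)                  # mean and log-std of a Gaussian policy (assumption)
V = mlp(OBS, 1)                             # state value function, parameters omega
Q = mlp(OBS + ACT, 1)                       # action value function, parameters phi
Q_target = mlp(OBS + ACT, 1)                # target network, parameters phi-bar
Q_target.load_state_dict(Q.state_dict())

opt_pi = torch.optim.SGD(policy.parameters(), lr=eta)
opt_V = torch.optim.SGD(V.parameters(), lr=eta)
opt_Q = torch.optim.SGD(Q.parameters(), lr=eta)

def sample_action(obs):
    mean, log_std = policy(obs).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
    a = dist.rsample()                      # reparameterized sample, a ~ pi_theta(.|obs)
    return a, dist.log_prob(a).sum(-1, keepdim=True)

# A random batch standing in for tau_RL drawn from D_RL.
o = torch.randn(L, OBS); a = torch.randn(L, ACT); r = torch.randn(L, 1); o2 = torch.randn(L, OBS)

# Policy step, equation (13): maximize Q - alpha * log pi.
a_new, logp = sample_action(o)
loss_pi = (alpha * logp - Q(torch.cat([o, a_new], dim=-1))).mean()
opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()

# Value step, equation (14): target uses the target Q network and a freshly sampled action.
with torch.no_grad():
    a_new, logp = sample_action(o)
    v_target = Q_target(torch.cat([o, a_new], dim=-1)) - alpha * logp
loss_V = ((V(o) - v_target) ** 2).mean()
opt_V.zero_grad(); loss_V.backward(); opt_V.step()

# Q step, equation (15): Bellman target with discount gamma.
with torch.no_grad():
    q_target = r + gamma * V(o2)
loss_Q = ((Q(torch.cat([o, a], dim=-1)) - q_target) ** 2).mean()
opt_Q.zero_grad(); loss_Q.backward(); opt_Q.step()

# Moving-average target update: phi_bar <- nu * phi + (1 - nu) * phi_bar.
with torch.no_grad():
    for p_t, p in zip(Q_target.parameters(), Q.parameters()):
        p_t.mul_(1 - nu).add_(nu * p)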
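The supervised-learning phase 4) fits the average policy to the state-action pairs collected from the best-response policy. A minimal PyTorch sketch, again with an assumed Gaussian policy head and a random batch standing in for the samples drawn from D_SL:

import torch
import torch.nn as nn

torch.manual_seed(1)
OBS, ACT, B, eta = 8, 4, 128, 3e-4          # illustrative sizes and learning rate

avg_policy = nn.Sequential(nn.Linear(OBS, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 2 * ACT))
opt = torch.optim.SGD(avg_policy.parameters(), lr=eta)

# A random batch standing in for tau_SL = {(o, a)} drawn from D_SL.
o = torch.randn(B, OBS)
a = torch.randn(B, ACT)

# Maximize the log-likelihood of the stored best-response actions (behaviour cloning), equation (16).
mean, log_std = avg_policy(o).chunk(2, dim=-1)
dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
loss = -dist.log_prob(a).sum(-1).mean()      # = -(1/B) * sum log Pi_psi(a | o)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))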
Finally, we simulate the system. The simulation parameters are set as follows: σ² = 10⁻⁸ mW, N_A = 4, the distances between transmitter A, receiver B and listener E are 200 m, and the path-loss exponent is 3.48. The coefficient ζ = 2 in formula (10). The strategies and the value functions are parameterized by multi-layer perceptrons (a type of artificial neural network) with ReLU (Rectified Linear Unit) activation and two hidden layers of 128 neurons each. η = 0.0003, α = 0.05, ν = 0.005, γ = 0.99, T = 1000, L = 128.
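The network shape used in the simulation (two hidden layers of 128 ReLU neurons) translates directly into a small PyTorch module. The input and output dimensions below are placeholders, since the exact observation and action encodings are not spelled out in the text.

import torch
import torch.nn as nn

class TwoLayerMLP(nn.Module):
    """Multi-layer perceptron with two hidden layers of 128 ReLU units each."""
    def __init__(self, in_dim, out_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

# Example instantiation with placeholder dimensions.
net = TwoLayerMLP(in_dim=16, out_dim=8)
print(net(torch.randn(5, 16)).shape)       # torch.Size([5, 8])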
In FIGS. 2 and 3 we compare the proposed method with several other methods, where the SAC (Soft Actor-Critic) method is from "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor" and the WoLF-PPO method is from "Win or Learn Fast Proximal Policy Optimisation". FIGS. 2 and 3 show the learning curves of transmitter A and listener E, respectively. The result curves of the multi-agent reinforcement learning algorithm FSP-SAC gradually converge to steady-state values, whereas the other learning algorithms used for comparison fluctuate severely; the application therefore solves the difficulty that other reinforcement learning methods have in converging in game problems. According to the relationship between convergence and rationality in games, and since FSP-SAC inherits rationality from SAC, it can be concluded that the result of the FSP-SAC method converges to a Nash equilibrium.

Claims (1)

1. An active monitoring method in a MIMO communication system, comprising the steps of:
(1) At each time slot t, the multi-antenna transmitter A transmits an information signal s_A(t) to the multi-antenna receiver B and transmits an artificial noise signal n_A(t) toward the multi-antenna listener E to reduce the channel capacity from the multi-antenna transmitter A to the multi-antenna listener E and thereby prevent listening by the multi-antenna listener E; the action of the multi-antenna transmitter A, a_A(t), is denoted as a_A(t) = {Σ_s(t), Σ_n(t)}, wherein Σ_s(t) and Σ_n(t) are the covariance matrices of s_A(t) and n_A(t), respectively; based on the local channel information o_A(t), the multi-antenna transmitter A selects an action using policy π_A, wherein π_A(·|o_A(t)) is a conditional probability distribution conditioned on o_A(t) and the value of one sample drawn from it is the selected action a_A(t);
(2) At each time slot t, the multi-antenna listener E transmits an interference signal x_E(t) to the multi-antenna receiver B to reduce the channel capacity between the multi-antenna transmitter A and the multi-antenna receiver B and thereby increase the listening success rate; the action of the multi-antenna listener E, a_E(t), is denoted as a_E(t) = Σ_E(t), wherein Σ_E(t) is the covariance matrix of x_E(t); based on the local channel information o_E(t), the multi-antenna listener E selects an action using policy π_E, wherein π_E(·|o_E(t)) is a conditional probability distribution conditioned on o_E(t) and the value of one sample drawn from it is the selected action a_E(t);
(3) At each time slot t, after the multi-antenna transmitter A and the multi-antenna listener E have each executed their actions, they obtain rewards r_A(t) and r_E(t), respectively; let π = {π_A, π_E}; the average reward function of the multi-antenna transmitter A is defined as J_A(π) = E_π[r_A(t)], wherein E_π[·] denotes the mathematical expectation over the time slots t under the joint policy π, and the average reward function of the multi-antenna listener E is J_E(π) = E_π[r_E(t)]; policy π_A is optimized to maximize J_A(π) and policy π_E is optimized to maximize J_E(π), so as to reach a Nash equilibrium of the listening and anti-listening game;
the step (3) further comprises the following steps:
1) For any device n, where n ∈ {A, E}, initialize a parameterized policy π_θn with parameter θ_n, a policy Π_ψn with parameter ψ_n, a value function V_ωn with parameter ω_n, and a value function Q_φn with parameter φ_n; assign the value of parameter φ_n to the target parameter φ̄_n;
2) At each time slot t, device n uses policy π_θn with probability 0.1 to select an action and uses policy Π_ψn with probability 0.9 to select an action; the collected data {o_n(t), a_n(t), r_n(t), o_n(t+1)} is stored in a first storage area D_RL^n, wherein, when n = A, the data is {o_A(t), a_A(t), r_A(t), o_A(t+1)}, and when n = E, the data is {o_E(t), a_E(t), r_E(t), o_E(t+1)}; if the action was selected by policy π_θn, the data {o_n(t), a_n(t)} is also stored in a second storage area D_SL^n;
3) Randomly select a sample batch τ_RL = {(o_n(t), a_n(t), r_n(t), o_n(t+1))} of length L from D_RL^n and calculate the gradient
Δθ_n = ∇_θn (1/L) Σ_t [ Q_φn(o_n(t), ã_n(t)) − α·log π_θn(ã_n(t) | o_n(t)) ],
wherein ∇_x denotes the gradient with respect to the variable x, ã_n(t) is sampled from the policy π_θn(·|o_n(t)), and the temperature parameter α ∈ [0, 1]; calculate the gradient
Δω_n = −∇_ωn (1/L) Σ_t ( V_ωn(o_n(t)) − Q_φ̄n(o_n(t), ã_n(t)) + α·log π_θn(ã_n(t) | o_n(t)) )²;
calculate the gradient
Δφ_n = −∇_φn (1/L) Σ_t ( Q_φn(o_n(t), a_n(t)) − r_n(t) − γ·V_ωn(o_n(t+1)) )²,
wherein the discount factor γ ∈ (0, 1); randomly select a sample batch τ_SL = {(o_n(t), a_n(t))} of length L from D_SL^n and calculate the gradient
Δψ_n = ∇_ψn (1/L) Σ_t log Π_ψn(a_n(t) | o_n(t));
then update the parameters θ_n ← θ_n + ηΔθ_n, ω_n ← ω_n + ηΔω_n, φ_n ← φ_n + ηΔφ_n, φ̄_n ← νφ_n + (1 − ν)φ̄_n, ψ_n ← ψ_n + ηΔψ_n, wherein η is the learning rate with η ∈ (0, 1), ν is the moving-average parameter with ν ∈ (0, 1), and the symbol ← assigns the value on the right of the arrow to the variable on the left; then return to step 2) until the policy parameter θ_n no longer changes.
CN202210470392.2A 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system Active CN115296705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210470392.2A CN115296705B (en) 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210470392.2A CN115296705B (en) 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system

Publications (2)

Publication Number Publication Date
CN115296705A CN115296705A (en) 2022-11-04
CN115296705B (en) 2023-11-21

Family

ID=83819503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210470392.2A Active CN115296705B (en) 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system

Country Status (1)

Country Link
CN (1) CN115296705B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10069592B1 (en) * 2015-10-27 2018-09-04 Arizona Board Of Regents On Behalf Of The University Of Arizona Systems and methods for securing wireless communications
CN111726845A (en) * 2020-07-01 2020-09-29 南京大学 Base station switching selection and power distribution method in multi-user heterogeneous network system
CN112087749A (en) * 2020-08-27 2020-12-15 华北电力大学(保定) Cooperative active eavesdropping method for realizing multiple listeners based on reinforcement learning
CN112840600A (en) * 2018-08-20 2021-05-25 瑞典爱立信有限公司 Immune system for improving sites using generation of countermeasure networks and reinforcement learning
WO2021136070A1 (en) * 2019-12-30 2021-07-08 三维通信股份有限公司 Resource allocation method for simultaneous wireless information and power transfer, device, and computer
CN114363908A (en) * 2022-01-13 2022-04-15 重庆邮电大学 A2C-based unlicensed spectrum resource sharing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7813739B2 (en) * 2007-09-27 2010-10-12 Koon Hoo Teo Method for reducing inter-cell interference in wireless OFDMA networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10069592B1 (en) * 2015-10-27 2018-09-04 Arizona Board Of Regents On Behalf Of The University Of Arizona Systems and methods for securing wireless communications
CN112840600A (en) * 2018-08-20 2021-05-25 瑞典爱立信有限公司 Immune system for improving sites using generation of countermeasure networks and reinforcement learning
WO2021136070A1 (en) * 2019-12-30 2021-07-08 三维通信股份有限公司 Resource allocation method for simultaneous wireless information and power transfer, device, and computer
CN111726845A (en) * 2020-07-01 2020-09-29 南京大学 Base station switching selection and power distribution method in multi-user heterogeneous network system
CN112087749A (en) * 2020-08-27 2020-12-15 华北电力大学(保定) Cooperative active eavesdropping method for realizing multiple listeners based on reinforcement learning
CN114363908A (en) * 2022-01-13 2022-04-15 重庆邮电大学 A2C-based unlicensed spectrum resource sharing method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Proactive Eavesdropping Game in MIMO Systems Based on Multiagent Deep Reinforcement Learning; Delin Guo; IEEE Transactions on Wireless Communications; full text *
Eavesdropping Game Based on Multi-Agent Deep Reinforcement Learning; Delin Guo; 2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC); full text *
Design of a legitimate communication rate maximization method in a full-duplex-based proactive monitoring system; Wu Wei; Hu Bing; Hu Feng; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) (No. 02); full text *
Intrusion detection model for mobile Ad hoc networks based on game theory; Li Yinan; Journal of Electronics & Information Technology; full text *

Also Published As

Publication number Publication date
CN115296705A (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
Zhang et al. NAS-AMR: Neural architecture search-based automatic modulation recognition for integrated sensing and communication systems
CN109617584B (en) MIMO system beam forming matrix design method based on deep learning
CN105790813B (en) Code book selection method based on deep learning under a kind of extensive MIMO
CN108712748B (en) Cognitive radio anti-interference intelligent decision-making method based on reinforcement learning
Zhao et al. Cognitive radio engine design based on ant colony optimization
CN109274456B (en) Incomplete information intelligent anti-interference method based on reinforcement learning
CN117010534B (en) Dynamic model training method, system and equipment based on annular knowledge distillation and meta federal learning
CN114449584B (en) Distributed computing unloading method and device based on deep reinforcement learning
CN118054828B (en) Intelligent super-surface-oriented beam forming method, device, equipment and storage medium
Chen et al. SGPL: An intelligent game-based secure collaborative communication scheme for metaverse over 5G and beyond networks
CN115296705B (en) Active monitoring method in MIMO communication system
Panahi et al. Optimal channel-sensing policy based on fuzzy q-learning process over cognitive radio systems
Zhang et al. How Often Channel Estimation is Required for Adaptive IRS Beamforming: A Bilevel Deep Reinforcement Learning Approach
Wu et al. Online learning to optimize transmission over an unknown gilbert-elliott channel
Li et al. Piecewise-drl: Joint beamforming optimization for ris-assisted mu-miso communication system
Zhou et al. QoS-aware power management with deep learning
Sriharipriya et al. Artifical neural network based multi dimensional spectrum sensing in full duplex cognitive radio networks
Zhang et al. Beyond supervised power control in massive MIMO network: Simple deep neural network solutions
Dai et al. Power allocation for multiple transmitter-receiver pairs under frequency-selective fading based on convolutional neural network
CN112087275A (en) Cooperative spectrum sensing method based on birth and death process and viscous hidden Markov model
CN102055540B (en) Nyman Pearson rule based noise enhancement distributed detection method and system
Miao et al. A Graph Neural Network Power Allocation Algorithm Based on Fully Unrolled WMMSE
Li et al. CWGAN-Based Channel Modeling of Convolutional Autoencoder-Aided SCMA for Satellite-Terrestrial Communication
Du et al. Modulation Recognition Based on Denoising Bidirectional Recurrent Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant