CN117750525A - Frequency domain anti-interference method and system based on reinforcement learning - Google Patents

Frequency domain anti-interference method and system based on reinforcement learning

Info

Publication number
CN117750525A
CN117750525A (application CN202410182440.7A)
Authority
CN
China
Prior art keywords
interference
channel
receiver
communication
transmitter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410182440.7A
Other languages
Chinese (zh)
Inventor
李刚
吴麒
王翔
董珊珊
罗浩
乔冠华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 10 Research Institute
Original Assignee
CETC 10 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 10 Research Institute filed Critical CETC 10 Research Institute
Priority to CN202410182440.7A
Publication of CN117750525A
Pending legal-status Critical Current

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a frequency domain anti-interference method and system based on reinforcement learning. In the method, a transmitter and a receiver transmit data through a communication link and transmit control information through a control link; while the communication user performs data transmission, a plurality of patterned jammers generate interference signals to interfere with the communication user. An intelligent agent is embedded in the receiver, one communication period of the transmitter and the receiver is divided into a plurality of subframes, each subframe comprises a plurality of time slots, and the avoidance rate of all time slot channels is calculated. Whether the avoidance rate reaches a preset threshold value is judged; if not, training is carried out with the WDQL algorithm, the channel strategy is updated, the updated channel strategy and a NACK are sent to the transmitter through the control link, and data transmission of the next communication period starts. The invention not only ensures low iteration time and computational complexity, but also achieves fast training and decision speed and excellent anti-interference performance.

Description

Frequency domain anti-interference method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of wireless communication, in particular to a frequency domain anti-interference method and system based on reinforcement learning.
Background
The openness of the wireless communication channel makes it vulnerable to interference attacks, which in turn cause loss of communication performance and reduce the reliability of the wireless communication system. Anti-interference technology has therefore become a crucial research direction in the communication field.
Traditional anti-interference technologies, such as Frequency Hopping Spread Spectrum (FHSS) and Direct Sequence Spread Spectrum (DSSS), can provide a certain anti-interference capability to a communication system, but because of their fixed modes they cannot flexibly optimize the anti-interference strategy according to the real-time spectrum environment and interference pattern. Thus, a more intelligent method of selecting communication frequencies is needed to counter malicious interference effectively.
With the development of machine learning technology, scholars have in recent years proposed anti-interference channel selection methods based on Q learning (reference: S. Liu, Y. Xu, X. Chen, M. Wang, W. Li, Y. Li and Y. Xu, "Pattern-Aware Intelligent Anti-Jamming Communication: A Sequential Deep Reinforcement Learning Approach," IEEE Access, vol. 7, pp. 169204-169216, 2019). However, adjusting only the frequency domain parameters of the system does not take full advantage of the multi-domain flexibility of the wireless communication system. Thus, some scholars focused on the joint anti-interference problem of the frequency domain and the power domain and proposed a multi-parameter Q learning anti-interference algorithm (reference: Z. Pu, Y. Niu and G. Zhang, "A Multi-Parameter Intelligent Communication Anti-Jamming Method Based on Three-Dimensional Q-Learning," 2022 IEEE 2nd International Conference on Computer Communication and Artificial Intelligence (CCAI), Beijing, China, 2022, pp. 205-210). In addition, other scholars have combined Q learning with deep learning, fitting the Q-value table with deep reinforcement learning algorithms to achieve dynamic spectrum anti-jamming (reference: X. Liu, Y. Xu, L. Jia, et al., "Anti-Jamming Communications Using Spectrum Waterfall: A Deep Reinforcement Learning Approach," IEEE Communications Letters, vol. 22, no. 5, pp. 998-1001, May 2018).
However, while anti-jamming algorithms employing deep reinforcement learning successfully address the "dimensional explosion" of a huge state-decision space, in many cases they have long convergence times and are difficult to train effectively. Reinforcement learning anti-jamming algorithms based on Q learning, although able to converge in a shorter time than deep reinforcement learning, do not adequately account for the overestimation problem that can arise when a single estimator is used to update the Q value. This problem may make the resulting anti-interference strategy less than optimal.
Thus, how to achieve rapid convergence and good interference immunity in the face of an unknown communication interference environment is a challenge that needs to be addressed by practitioners of the art.
Disclosure of Invention
The invention aims to provide a frequency domain anti-interference method and system based on reinforcement learning that are particularly suitable for situations involving unknown patterned interference. The method can avoid interference rapidly to obtain good anti-interference performance while reducing the frequency of channel switching as much as possible to lower the communication cost. It thereby addresses the problems in the prior art that anti-interference research based on deep reinforcement learning is difficult to train and slow to converge, and that the strategies obtained by anti-interference research based on Q learning are not optimal.
The invention discloses a frequency domain anti-interference method based on reinforcement learning, which comprises the following steps:
step 1: a transmitter and a receiver which communicate with each other are used as communication users; the transmitter and the receiver transmit data through a communication link and transmit control information through a control link, wherein the control information comprises the channel strategy and NACK; while the communication user performs data transmission, a plurality of patterned jammers generate interference signals to interfere with the communication user;
step 2: the intelligent agent is embedded into the receiver, one communication period of the transmitter and the receiver is divided into a plurality of subframes, each subframe comprises a plurality of time slots, and the avoidance rate of all time slot channels is calculated;
step 3: judging whether the avoidance rate reaches a preset threshold value, if not, training by using a WDQL algorithm, updating the channel strategy, transmitting the updated channel strategy and NACK to a transmitter through a control link, and starting data transmission of the next communication period.
Further, the step 2 includes:
within one communication cycle, data transmission is carried out during the first portion of the cycle, namely the cycle length minus $t_0$ when no retraining is needed, or minus $t_0 + t_1$ when retraining is needed; at the same time, during this data transmission portion, an agent located in the receiver perceives the spectrum environment in real time and generates the signal-to-interference-and-noise ratio subframes; after the data transmission is completed, the avoidance rate of all time slot channels in the current period is calculated; the communication cycle is thus composed of the data transmission portion together with the training period $t_1$ and the control feedback period $t_0$, where $t_0$ and $t_1$ both represent time periods.
Further, in each time slot, the communication user obtains the signal-to-interference-and-noise ratio of each channel through spectrum sensing, and combines the signal-to-interference-and-noise ratio information of a plurality of time slots into a plurality of signal-to-interference-and-noise ratio subframes;
the signal-to-interference-plus-noise ratio acquisition method comprises the following steps:
describing channel state information by adopting a block fading channel model, wherein channel parameters are kept unchanged in each time slot; modeling the channel gain between the transmitter and the receiver and the channel gain between the jammer and the receiver respectively;
and calculating the signal-to-interference-and-noise ratio of the receiver based on the channel gain model between the transmitter and the receiver and the channel gain model between the jammer and the receiver.
Further, the channel gain model between the transmitter and the receiver is
$g_s(t) = d_{sr}^{-\alpha_s}\,|h_s(t)|^2$
The channel gain model between the receiver and the j-th jammer is
$g_j(t) = d_{rj}^{-\alpha_j}\,|h_j(t)|^2$
Wherein, $d_{sr}$ is the Euclidean distance between the transmitter and the receiver, $d_{rj}$ is the Euclidean distance between the receiver and the j-th jammer, $\alpha_s$ and $\alpha_j$ are path fading factors, and the instantaneous fading coefficients $h_s(t)$ and $h_j(t)$ are complex Gaussian variables with mean 0 and variance $\sigma^2$;
The signal-to-interference-and-noise ratio of the receiver is
$\mathrm{SINR}(t) = \frac{p_s\, g_s(t)}{\int_{f_t - B/2}^{f_t + B/2} \big[\, n(f) + \sum_{j=1}^{J} g_j(t)\, J_j(f, f_{j,t}) \big]\, \mathrm{d}f}$
Wherein, $f_t$ denotes the center frequency of the channel selected by the communication user in time slot t, $B$ is the baseband signal bandwidth, $p_s$ is the communication signal power, $J_j(\cdot)$ and $n(\cdot)$ represent the interference power spectral density function and the noise power spectral density function respectively, $f$ represents the frequency variable, $n(f)$ represents the power spectral density of the noise, $J$ indicates the number of jammers, and $f_{j,t}$ represents the center frequency of the channel selected by the j-th jammer in time slot t.
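To make the block-fading channel and SINR model above concrete, the following minimal Python sketch computes the per-slot SINR of the receiver. It assumes a flat (rectangular) interference power spectral density for each jammer; all function names, distances, powers and other numerical values are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(42)

def channel_gain(distance, path_loss_exp, sigma=1.0):
    """Block-fading gain d^(-alpha) * |h|^2, h complex Gaussian with mean 0 and variance sigma^2."""
    h = rng.normal(0.0, sigma / np.sqrt(2)) + 1j * rng.normal(0.0, sigma / np.sqrt(2))
    return distance ** (-path_loss_exp) * np.abs(h) ** 2

def sinr_db(user_fc, jammer_fcs, p_s=1.0, p_j=10.0, bandwidth=2.0, noise_psd=1e-3,
            d_sr=100.0, d_rj=120.0, alpha_s=2.0, alpha_j=2.0):
    """Receiver SINR on the channel centred at user_fc; all units and values illustrative.

    Interference is integrated over the user band [user_fc - B/2, user_fc + B/2]; each
    jammer is assumed to spread its power p_j uniformly over a band of the same width B
    around its own centre frequency (a flat interference power spectral density).
    """
    signal = p_s * channel_gain(d_sr, alpha_s)
    interference = 0.0
    for f_j in jammer_fcs:
        overlap = max(0.0, bandwidth - abs(user_fc - f_j))   # spectral overlap of the two bands
        interference += channel_gain(d_rj, alpha_j) * (p_j / bandwidth) * overlap
    noise = noise_psd * bandwidth                            # integral of n(f) over the user band
    return 10.0 * np.log10(signal / (interference + noise))

# Example: 10 channels of width B = 2, user on channel 3, comb jammers on channels 2, 5 and 8.
print(sinr_db(user_fc=3 * 2.0, jammer_fcs=[2 * 2.0, 5 * 2.0, 8 * 2.0]))
```

The SINR values sensed in this way over the time slots of a subframe form one signal-to-interference-and-noise ratio subframe used by the agent.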
Further, the step 3 includes:
step 31: setting the number of training subframes N; for the first N subframes of the current communication period, training with the WDQL algorithm to obtain N pairs of Q value tables $(Q^U_i, Q^V_i)$, $i = 1, \dots, N$; $Q^U_i$ and $Q^V_i$ are both Q value tables;
step 32: averaging the N pairs of Q value tables and extracting, in each time slot, the action with the maximum averaged Q value, so as to form an action sequence whose length equals the number of time slots in a subframe, which serves as the optimal channel strategy (see the sketch following step 33);
step 33: sending the optimal channel strategy together with the NACK to the transmitter to guide the channel selection of the transmitter in the next communication period, wherein N is a positive integer greater than 1.
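A minimal sketch of steps 31 to 33 is given below. It assumes the state is the time-slot index within a subframe, so each Q value table has one row per time slot and one column per channel; `wdql_train_subframe` is a placeholder for the per-subframe WDQL training routine sketched after the weighted update formulas later in this section.

```python
import numpy as np

def extract_channel_strategy(q_pairs):
    """Steps 32-33: average the N pairs of Q value tables and take the per-slot argmax.

    q_pairs -- list of N (Q_U, Q_V) pairs, each table of shape (T_slots, M_channels)
    returns -- length-T array of channel indices, i.e. the optimal channel strategy
    """
    stacked = np.stack([q for pair in q_pairs for q in pair])   # shape (2N, T, M)
    q_mean = stacked.mean(axis=0)                               # averaged table, shape (T, M)
    return q_mean.argmax(axis=1)

# Usage sketch (step 31 trains one pair of tables per training subframe):
# q_pairs = [wdql_train_subframe(sf, n_channels) for sf in sinr_subframes[:N]]
# strategy = extract_channel_strategy(q_pairs)   # sent to the transmitter together with the NACK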
Further, the step 32 includes:
first, in the state $s_t$ of time slot t, an action is selected randomly with probability $\varepsilon$, or the action with the maximum Q value is selected with probability $1-\varepsilon$, namely $a_t = \arg\max_{a \in A}\big[Q^U(s_t, a) + Q^V(s_t, a)\big]$, where $a_t$ denotes the action taken in time slot t;
then the reward $r_t$ of action $a_t$ is calculated, and either the Q value table $Q^U$ or the Q value table $Q^V$ is randomly selected for updating.
Further, the calculation process of the reward $r_t$ comprises:
the agent in the receiver avoids the interference through a Markov decision process; the Markov decision process includes states, actions, state transition probabilities and rewards; the state of time slot t is expressed as $s_t$, and all states constitute the state space $S$; the action in time slot t is denoted $a_t$, $a_t \in A$, $A = \{1, 2, \dots, M\}$, where $M$ is the number of available channels and m represents the m-th channel among the available channels; all actions constitute the action space $A$; the state transition probability $p(s_{t+1} \mid s_t, a_t)$ satisfies $\Pr\{s_{t+1} \mid s_t, a_t\} = p(s_{t+1} \mid s_t, a_t)$ and indicates the probability that, when the agent selects action $a_t$ in the current environment state $s_t$ of time slot t, the environment transfers to the state $s_{t+1}$ of the next time slot t+1, i.e. the instantaneous channel avoidance rate; $a_t \in A$ ranges over the actions of all available channels in the action space $A$ at time slot t; $s_t$ and $s_{t+1}$ belong to the state spaces of time slots t and t+1 respectively, and Pr and p both represent probabilities;
if the communication user has already perceived the signal-to-interference-and-noise ratio information of each time slot, all the state transition probabilities are determined values; the reward represents the gain obtained when the agent selects action $a_t$ in state $s_t$ and the environment transfers to state $s_{t+1}$; it is determined by the demodulation thresholds $\eta_1$, $\eta_2$ and $\eta_3$ corresponding to three preset modulation modes and by the switching cost $c_s$ incurred when a channel switch is generated; the gain $r_t$ is expressed as:
$r_t = R\big(\mathrm{SINR}(t);\, \eta_1, \eta_2, \eta_3\big) - c_s\,\mathbb{1}\big(f_{t+1} \neq f_t\big)$
Wherein, $R(\cdot)$ assigns the gain of the highest preset modulation mode whose demodulation threshold is reached by $\mathrm{SINR}(t)$; $\mathbb{1}(\cdot)$ is an indication function expressing that when $f_{t+1} = f_t$, i.e. when no channel switch occurs in the front and rear time slots, no cost is lost, and otherwise a cost of size $c_s$ is incurred; $f_{t+1}$ represents the center frequency of the channel selected by the communication user at time slot t+1.
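A minimal sketch of this reward is shown below. The per-mode gains (1, 2 and 3 units for the three preset modulation modes) and the threshold and cost values are illustrative assumptions; the patent only states that the gain is determined by the three demodulation thresholds and that a switching cost is charged when the channel changes.

```python
def reward(sinr_db, prev_channel, channel, thresholds=(2.0, 5.0, 8.0), switch_cost=0.4):
    """Illustrative reward r_t: gain of the best demodulable mode minus the switching cost.

    thresholds  -- assumed demodulation thresholds (eta_1, eta_2, eta_3), ascending
    switch_cost -- cost c_s charged only when the selected channel changes between slots
    """
    gain = sum(1 for eta in thresholds if sinr_db >= eta)     # 0..3, piecewise constant in the SINR
    cost = switch_cost if channel != prev_channel else 0.0    # indicator 1(f_{t+1} != f_t)
    return gain - cost
```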
Further, the goal of the Markov decision is to maximize the total revenue within one subframe:
$\pi^{*} = \arg\max_{\pi} \sum_{\tau=1}^{T} r_{\tau}\big(s_{\tau}, \pi(\tau)\big)$
Wherein, $\pi^{*}$ represents the optimal channel policy of the communication user, $\arg\max_{\pi}$ represents selecting the policy that maximizes the total revenue of the communication user, $\sum_{\tau=1}^{T} r_{\tau}(\cdot)$ represents the sum of the rewards earned over all time slots of a single subframe, $a_{\tau}$ represents an action, $\pi(\tau)$ represents the action corresponding to policy $\pi$ in time slot $\tau$, and $T$ represents the number of time slots within a single subframe.
Further, randomly selecting to update $Q^U$ or $Q^V$ comprises the following steps:
if $Q^U$ is updated:
$a^{*} = \arg\max_{a} Q^{U}(s_{t+1}, a)$, $a_{L} = \arg\min_{a} Q^{U}(s_{t+1}, a)$
$\beta^{U} = \frac{\big|Q^{V}(s_{t+1}, a^{*}) - Q^{V}(s_{t+1}, a_{L})\big|}{c + \big|Q^{V}(s_{t+1}, a^{*}) - Q^{V}(s_{t+1}, a_{L})\big|}$
$Q^{U}(s_{t}, a_{t}) \leftarrow Q^{U}(s_{t}, a_{t}) + \alpha\big[r_{t} + \gamma\big(\beta^{U} Q^{U}(s_{t+1}, a^{*}) + (1-\beta^{U})\, Q^{V}(s_{t+1}, a^{*})\big) - Q^{U}(s_{t}, a_{t})\big]$
if $Q^V$ is updated, the symmetric update is applied with the roles of $Q^U$ and $Q^V$ exchanged:
$a^{*} = \arg\max_{a} Q^{V}(s_{t+1}, a)$, $a_{L} = \arg\min_{a} Q^{V}(s_{t+1}, a)$
$\beta^{V} = \frac{\big|Q^{U}(s_{t+1}, a^{*}) - Q^{U}(s_{t+1}, a_{L})\big|}{c + \big|Q^{U}(s_{t+1}, a^{*}) - Q^{U}(s_{t+1}, a_{L})\big|}$
$Q^{V}(s_{t}, a_{t}) \leftarrow Q^{V}(s_{t}, a_{t}) + \alpha\big[r_{t} + \gamma\big(\beta^{V} Q^{V}(s_{t+1}, a^{*}) + (1-\beta^{V})\, Q^{U}(s_{t+1}, a^{*})\big) - Q^{V}(s_{t}, a_{t})\big]$
Wherein, $a^{*}$ and $a_{L}$ respectively represent, in the next state $s_{t+1}$ and based on the Q value table being updated, the action with the maximum Q value and the action with the minimum Q value; $\beta^{U}$ and $\beta^{V}$ serve as weights to balance the overestimation problem of a single estimator against the underestimation problem of a double estimator; when updating the Q values, the two Q value tables $Q^{U}$ and $Q^{V}$ are both used; $\alpha$ is the learning rate, $\gamma$ is the discount factor, $c$ is the weight parameter, and $\max$ denotes the maximum-value function.
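The epsilon-greedy selection, reward and weighted double Q-learning update described above can be combined into the following per-subframe training sketch. It follows the standard WDQL formulation; treating the state as the time-slot index, the table shapes, the episode count and the learning rate, discount factor and weight parameter values are assumptions made for illustration, and `reward` refers to the reward sketch given earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q_U, Q_V, s, n_actions, eps=0.1):
    """Behaviour policy: random action with probability eps, else argmax of Q_U + Q_V."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q_U[s] + Q_V[s]))

def wdql_step(Q_U, Q_V, s, a, r, s_next, alpha=0.1, gamma=0.9, c=1.0):
    """One weighted double Q-learning update; Q_U and Q_V have shape (n_states, n_actions)."""
    # Randomly choose which table is updated this step (the other provides the cross-estimate).
    Q_a, Q_b = (Q_U, Q_V) if rng.random() < 0.5 else (Q_V, Q_U)
    a_star = int(np.argmax(Q_a[s_next]))          # action with the maximum Q value in s_{t+1}
    a_low = int(np.argmin(Q_a[s_next]))           # action with the minimum Q value in s_{t+1}
    gap = abs(Q_b[s_next, a_star] - Q_b[s_next, a_low])
    beta = gap / (c + gap)                        # weight balancing over- and under-estimation
    target = r + gamma * (beta * Q_a[s_next, a_star] + (1.0 - beta) * Q_b[s_next, a_star])
    Q_a[s, a] += alpha * (target - Q_a[s, a])     # in-place update of the chosen table

def wdql_train_subframe(sinr_subframe, n_channels, n_episodes=200):
    """Train one (Q_U, Q_V) pair on one SINR subframe (state = slot index, an assumption).

    sinr_subframe -- array of shape (T_slots, n_channels) with the sensed per-channel SINR in dB
    """
    n_slots = len(sinr_subframe)
    Q_U = np.zeros((n_slots, n_channels))
    Q_V = np.zeros((n_slots, n_channels))
    for _ in range(n_episodes):
        prev_a = 0
        for t in range(n_slots - 1):
            a = epsilon_greedy(Q_U, Q_V, t, n_channels)
            r = reward(sinr_subframe[t][a], prev_a, a)     # reward sketch defined earlier
            wdql_step(Q_U, Q_V, t, a, r, t + 1)
            prev_a = a
    return Q_U, Q_V
```

One such pair of tables would be produced per training subframe, yielding the N pairs averaged in step 32.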
The invention also discloses a frequency domain anti-interference system based on reinforcement learning, which is used for realizing the above frequency domain anti-interference method based on reinforcement learning and comprises:
the communication module is used for taking a transmitter and a receiver which communicate with each other as communication users, transmitting data through a communication link and transmitting control information through a control link; while the communication user performs data transmission, a plurality of patterned jammers generate interference signals to interfere with the communication user;
the computing module is used for embedding an agent into the receiver, dividing one communication period of the transmitter and the receiver into a plurality of subframes, wherein each subframe comprises a plurality of time slots, and computing the avoidance rate of all time slot channels;
and the decision module is used for judging whether the avoidance rate reaches a preset threshold value, if the avoidance rate does not reach the preset threshold value, training by using a WDQL algorithm, updating the channel strategy, transmitting the updated channel strategy and NACK to the transmitter through the control link, and starting data transmission of the next communication period.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. Through a continuously optimized weighted double Q learning algorithm, the system can update the Q value tables, formulate the optimized channel strategy and communicate it to the transmitter through a stable control link. The model of the invention is fully designed and its algorithm is sound: it not only ensures low iteration time and computational complexity, but also achieves a fast training and decision speed. In particular, when facing fixed-mode interference, the system converges quickly and shows excellent anti-interference performance, providing a powerful guarantee for the reliability of the wireless communication system.
2. Based on reinforcement learning technology, an intelligent communication anti-interference method is designed. According to the method, the frequency spectrum environment is periodically sensed, the signal-to-interference-and-noise ratio subframes are generated, and a channel strategy is formulated through training of the subframes so as to achieve the purpose of avoiding interference.
3. The invention does not need to estimate the interference pattern and parameters of the jammer in advance, i.e. it is model-free, so it can be widely applied to various patterned anti-interference scenarios.
4. By adopting a Q learning type algorithm as the reinforcement learning algorithm, the invention avoids both the long training and convergence time of deep reinforcement learning algorithms and the overestimation of actions. By effectively solving the proposed model, good anti-interference performance can be obtained.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art may obtain other drawings from these drawings without inventive effort.
FIG. 1 is a diagram of a frequency domain anti-interference system of the present invention;
FIG. 2 is a structural design diagram of a single communication cycle of the frequency domain anti-interference system of the present invention;
FIG. 3 is a flow chart of the frequency domain anti-interference method of the present invention;
FIG. 4 (a) is a signal-to-interference-and-noise ratio heat map for resisting comb interference in an embodiment of the present invention;
FIG. 4 (b) is a further signal-to-interference-and-noise ratio heat map for resisting comb interference in an embodiment of the present invention;
FIG. 4 (c) is a further signal-to-interference-and-noise ratio heat map for resisting comb interference in an embodiment of the present invention;
FIG. 5 (a) is a signal-to-interference-and-noise ratio heat map for resisting swept-frequency interference in an embodiment of the present invention;
FIG. 5 (b) is a further signal-to-interference-and-noise ratio heat map for resisting swept-frequency interference in an embodiment of the present invention;
FIG. 5 (c) is a further signal-to-interference-and-noise ratio heat map for resisting swept-frequency interference in an embodiment of the present invention;
FIG. 6 is a reward variation curve against comb interference in an embodiment of the present invention;
FIG. 7 is a reward variation curve against swept-frequency interference in an embodiment of the present invention.
Detailed Description
The present invention will be further described below with reference to the accompanying drawings and embodiments; the embodiments described represent only a part, and not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments are intended to fall within the scope of protection of the present invention.
Fig. 1 is a model diagram of the frequency domain anti-interference system. In this model, a pair of communication transceivers form a communication user: the transmitter and the receiver transmit data over a communication link while transmitting control information over a control link. The agent is embedded in the receiver, obtains channel information using spectrum sensing, and optimizes the channel strategy using a reinforcement learning algorithm. Meanwhile, a plurality of patterned jammers generate high-power interference signals to interfere with the communication user.
FIG. 2 shows the structural design within a single communication cycle of the frequency domain anti-interference system. In this structure, the communication user performs the following operations: data transmission is carried out during the first portion of the cycle (the cycle length minus $t_0$, or minus $t_0 + t_1$ when retraining is needed). Meanwhile, during this period, the intelligent agent in the receiver perceives the spectrum environment in real time, generating the signal-to-interference-and-noise ratio subframes, and after the data transmission is completed, the communication user calculates the avoidance rate of all time slot channels in the current period. If the avoidance rate is greater than the set threshold value, the channel strategy does not need to be optimized through reinforcement learning, and an ACK (Acknowledgement) is sent to the transmitter within the last $t_0$. Conversely, during the following $t_1$, the Q value tables are updated by the reinforcement learning algorithm to give the optimal channel strategy, and a NACK (Negative Acknowledgement) and the latest channel strategy are sent to the transmitter within the last $t_0$. Finally, the communication user transmits data with the latest channel strategy after the next communication period starts. A minimal control-flow sketch of one such period is given after this paragraph.
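The following sketch summarizes one communication period as described above. The `env` object, its methods and the definition of the avoidance rate (assumed here to be provided by the environment as the share of undisturbed slots) are hypothetical placeholders, and `wdql_train_subframe` and `extract_channel_strategy` refer to the earlier sketches.

```python
def run_communication_period(env, n_channels, n_subframes, n_train, threshold=0.9):
    """One communication period: transmit and sense, then either ACK or retrain and NACK."""
    # Data transmission phase: the agent senses the spectrum and builds the SINR subframes.
    sinr_subframes = [env.transmit_and_sense_subframe() for _ in range(n_subframes)]
    # Avoidance rate over all time slots of the current period (assumed definition).
    avoidance_rate = env.avoidance_rate(sinr_subframes)
    if avoidance_rate >= threshold:
        env.send_control("ACK")                    # keep the current channel strategy
        return None
    # Otherwise: retrain with WDQL during t1 and feed the new strategy back during t0.
    q_pairs = [wdql_train_subframe(sf, n_channels) for sf in sinr_subframes[:n_train]]
    strategy = extract_channel_strategy(q_pairs)
    env.send_control("NACK", strategy)             # the transmitter applies it next period
    return strategy
```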
The invention provides an embodiment of a frequency domain anti-interference method based on reinforcement learning, which comprises the following steps:
step 1: a transmitter and a receiver which communicate with each other are used as communication users; the transmitter and the receiver transmit data through a communication link and transmit control information through a control link, wherein the control information comprises the channel strategy and NACK; while the communication user performs data transmission, a plurality of patterned jammers generate interference signals to interfere with the communication user;
step 2: the intelligent agent is embedded into the receiver, one communication period of the transmitter and the receiver is divided into a plurality of subframes, each subframe comprises a plurality of time slots, and the avoidance rate of all time slot channels is calculated;
step 3: judging whether the avoidance rate reaches a preset threshold value; if not, training with a WDQL (weighted double Q-learning) algorithm and updating the channel strategy, then sending the updated channel strategy and NACK to the transmitter through the control link, and starting data transmission of the next communication period.
The model of the invention is fully designed and its algorithm is sound: it not only ensures low iteration time and computational complexity, but also achieves a fast training and decision speed. In particular, when facing fixed-mode interference, the system converges quickly and shows excellent anti-interference performance, providing a powerful guarantee for the reliability of the wireless communication system.
In this embodiment, step 2 includes:
within one communication cycle, data transmission is carried out during the first portion of the cycle, namely the cycle length minus $t_0$ when no retraining is needed, or minus $t_0 + t_1$ when retraining is needed; at the same time, during this data transmission portion, an agent located in the receiver perceives the spectrum environment in real time and generates the signal-to-interference-and-noise ratio subframes; after the data transmission is completed, the avoidance rate of all time slot channels in the current period is calculated; the communication cycle is thus composed of the data transmission portion together with the training period $t_1$ and the control feedback period $t_0$, where $t_0$ and $t_1$ both represent time periods.
In this embodiment, in each time slot, a communication user obtains the signal-to-interference-and-noise ratio of each channel through spectrum sensing, and combines the signal-to-interference-and-noise ratio information of a plurality of time slots into a plurality of signal-to-interference-and-noise ratio subframes;
the signal-to-interference-and-noise ratio acquisition method comprises the following steps:
describing channel state information by adopting a block fading channel model, wherein channel parameters are kept unchanged in each time slot; modeling the channel gain between the transmitter and the receiver and the channel gain between the jammer and the receiver respectively;
and calculating the signal-to-interference-and-noise ratio of the receiver based on the channel gain model between the transmitter and the receiver and the channel gain model between the jammer and the receiver.
In this embodiment, the channel gain model between the transmitter and the receiver is
$g_s(t) = d_{sr}^{-\alpha_s}\,|h_s(t)|^2$
The channel gain model between the receiver and the j-th jammer is
$g_j(t) = d_{rj}^{-\alpha_j}\,|h_j(t)|^2$
Wherein, $d_{sr}$ is the Euclidean distance between the transmitter and the receiver, $d_{rj}$ is the Euclidean distance between the receiver and the j-th jammer, $\alpha_s$ and $\alpha_j$ are path fading factors, and the instantaneous fading coefficients $h_s(t)$ and $h_j(t)$ are complex Gaussian variables with mean 0 and variance $\sigma^2$;
The signal-to-interference-and-noise ratio of the receiver is
$\mathrm{SINR}(t) = \frac{p_s\, g_s(t)}{\int_{f_t - B/2}^{f_t + B/2} \big[\, n(f) + \sum_{j=1}^{J} g_j(t)\, J_j(f, f_{j,t}) \big]\, \mathrm{d}f}$
Wherein, $f_t$ denotes the center frequency of the channel selected by the communication user in time slot t, $B$ is the baseband signal bandwidth, $p_s$ is the communication signal power, $J_j(\cdot)$ and $n(\cdot)$ represent the interference power spectral density function and the noise power spectral density function respectively, $f$ represents the frequency variable, $n(f)$ represents the power spectral density of the noise, $J$ indicates the number of jammers, and $f_{j,t}$ represents the center frequency of the channel selected by the j-th jammer in time slot t.
In this embodiment, step 3 includes:
step 31: setting the number of training subframes N; for the first N subframes of the current communication period, training with the WDQL algorithm to obtain N pairs of Q value tables $(Q^U_i, Q^V_i)$, $i = 1, \dots, N$; $Q^U_i$ and $Q^V_i$ are both Q value tables;
step 32: averaging the N pairs of Q value tables and extracting, in each time slot, the action with the maximum averaged Q value, so as to form an action sequence whose length equals the number of time slots in a subframe, which serves as the optimal channel strategy;
step 33: sending the optimal channel strategy together with the NACK to the transmitter to guide the channel selection of the transmitter in the next communication period, wherein N is a positive integer greater than 1.
In this embodiment, step 32 includes:
first, in the state $s_t$ of time slot t, an action is selected randomly with probability $\varepsilon$, or the action with the maximum Q value is selected with probability $1-\varepsilon$, namely $a_t = \arg\max_{a \in A}\big[Q^U(s_t, a) + Q^V(s_t, a)\big]$, where $a_t$ denotes the action taken in time slot t;
then the reward $r_t$ of action $a_t$ is calculated, and either the Q value table $Q^U$ or the Q value table $Q^V$ is randomly selected for updating.
In the present embodiment, the calculation process of the reward $r_t$ comprises:
the agent in the receiver avoids the interference through a Markov decision process; the Markov decision process includes states, actions, state transition probabilities and rewards; the state of time slot t is expressed as $s_t$, and all states constitute the state space $S$; the action in time slot t is denoted $a_t$, $a_t \in A$, $A = \{1, 2, \dots, M\}$, where $M$ is the number of available channels and m represents the m-th channel among the available channels; all actions constitute the action space $A$; the state transition probability $p(s_{t+1} \mid s_t, a_t)$ satisfies $\Pr\{s_{t+1} \mid s_t, a_t\} = p(s_{t+1} \mid s_t, a_t)$ and indicates the probability that, when the agent selects action $a_t$ in the current environment state $s_t$ of time slot t, the environment transfers to the state $s_{t+1}$ of the next time slot t+1, i.e. the instantaneous channel avoidance rate; $a_t \in A$ ranges over the actions of all available channels in the action space $A$ at time slot t; $s_t$ and $s_{t+1}$ belong to the state spaces of time slots t and t+1 respectively, and Pr and p both represent probabilities;
if the communication user has already perceived the signal-to-interference-and-noise ratio information of each time slot, all the state transition probabilities are determined values; the reward represents the gain obtained when the agent selects action $a_t$ in state $s_t$ and the environment transfers to state $s_{t+1}$; it is determined by the demodulation thresholds $\eta_1$, $\eta_2$ and $\eta_3$ corresponding to three preset modulation modes and by the switching cost $c_s$ incurred when a channel switch is generated; the gain $r_t$ is expressed as:
$r_t = R\big(\mathrm{SINR}(t);\, \eta_1, \eta_2, \eta_3\big) - c_s\,\mathbb{1}\big(f_{t+1} \neq f_t\big)$
Wherein, $R(\cdot)$ assigns the gain of the highest preset modulation mode whose demodulation threshold is reached by $\mathrm{SINR}(t)$; $\mathbb{1}(\cdot)$ is an indication function expressing that when $f_{t+1} = f_t$, i.e. when no channel switch occurs in the front and rear time slots, no cost is lost, and otherwise a cost of size $c_s$ is incurred; $f_{t+1}$ represents the center frequency of the channel selected by the communication user at time slot t+1.
In this embodiment, the goal of the Markov decision is to maximize the total benefit within one subframe:
$\pi^{*} = \arg\max_{\pi} \sum_{\tau=1}^{T} r_{\tau}\big(s_{\tau}, \pi(\tau)\big)$
Wherein, $\pi^{*}$ represents the optimal channel policy of the communication user, $\arg\max_{\pi}$ represents selecting the policy that maximizes the total benefit of the communication user, $\sum_{\tau=1}^{T} r_{\tau}(\cdot)$ represents the sum of the rewards earned over all time slots of a single subframe, $a_{\tau}$ represents an action, $\pi(\tau)$ represents the action corresponding to policy $\pi$ in time slot $\tau$, and $T$ represents the number of time slots within a single subframe.
In this embodiment, randomly selecting to update $Q^U$ or $Q^V$ comprises the following steps:
if $Q^U$ is updated:
$a^{*} = \arg\max_{a} Q^{U}(s_{t+1}, a)$, $a_{L} = \arg\min_{a} Q^{U}(s_{t+1}, a)$
$\beta^{U} = \frac{\big|Q^{V}(s_{t+1}, a^{*}) - Q^{V}(s_{t+1}, a_{L})\big|}{c + \big|Q^{V}(s_{t+1}, a^{*}) - Q^{V}(s_{t+1}, a_{L})\big|}$
$Q^{U}(s_{t}, a_{t}) \leftarrow Q^{U}(s_{t}, a_{t}) + \alpha\big[r_{t} + \gamma\big(\beta^{U} Q^{U}(s_{t+1}, a^{*}) + (1-\beta^{U})\, Q^{V}(s_{t+1}, a^{*})\big) - Q^{U}(s_{t}, a_{t})\big]$
if $Q^V$ is updated, the symmetric update is applied with the roles of $Q^U$ and $Q^V$ exchanged:
$a^{*} = \arg\max_{a} Q^{V}(s_{t+1}, a)$, $a_{L} = \arg\min_{a} Q^{V}(s_{t+1}, a)$
$\beta^{V} = \frac{\big|Q^{U}(s_{t+1}, a^{*}) - Q^{U}(s_{t+1}, a_{L})\big|}{c + \big|Q^{U}(s_{t+1}, a^{*}) - Q^{U}(s_{t+1}, a_{L})\big|}$
$Q^{V}(s_{t}, a_{t}) \leftarrow Q^{V}(s_{t}, a_{t}) + \alpha\big[r_{t} + \gamma\big(\beta^{V} Q^{V}(s_{t+1}, a^{*}) + (1-\beta^{V})\, Q^{U}(s_{t+1}, a^{*})\big) - Q^{V}(s_{t}, a_{t})\big]$
Wherein, $a^{*}$ and $a_{L}$ respectively represent, in the next state $s_{t+1}$ and based on the Q value table being updated, the action with the maximum Q value and the action with the minimum Q value; $\beta^{U}$ and $\beta^{V}$ serve as weights to balance the overestimation problem of a single estimator against the underestimation problem of a double estimator; when updating the Q values, the two Q value tables $Q^{U}$ and $Q^{V}$ are both used; $\alpha$ is the learning rate, $\gamma$ is the discount factor, $c$ is the weight parameter, and $\max$ denotes the maximum-value function.
Step 3 further comprises:
if the avoidance rate is greater than the preset threshold value, the receiver sends ACK to the sender through the control link, the channel strategy of the sender is unchanged, and data transmission in the next communication period is started.
The invention also provides an embodiment of a frequency domain anti-interference system based on reinforcement learning, which is used for realizing the frequency domain anti-interference method based on reinforcement learning described in the above embodiments and comprises:
the communication module is used for taking a transmitter and a receiver which communicate with each other as communication users, transmitting data through a communication link and transmitting control information through a control link; while the communication user performs data transmission, a plurality of patterned jammers generate interference signals to interfere with the communication user;
the computing module is used for embedding an agent into the receiver, dividing one communication period of the transmitter and the receiver into a plurality of subframes, wherein each subframe comprises a plurality of time slots, and computing the avoidance rate of all time slot channels;
and the decision module is used for judging whether the avoidance rate reaches a preset threshold value, if the avoidance rate does not reach the preset threshold value, training by using a WDQL algorithm, updating the channel strategy, transmitting the updated channel strategy and NACK to the transmitter through the control link, and starting data transmission of the next communication period.
The invention is further illustrated by the following examples:
Under the Windows 10 64-bit operating system, using a 12th Gen Intel(R) Core(TM) i3-12100 CPU (3.30 GHz), the simulations were completed in PyCharm using the Python language. To analyze the effectiveness of the system, it is compared with a random channel selection algorithm. The relevant parameter settings for reinforcement learning are shown in Table 1.
Table 1 simulation parameter settings
In the embodiment, one communication period is divided into 20 signal-to-interference-and-noise ratio subframes, and two fixed interference modes are considered: comb interference and swept-frequency interference. Fig. 4 (a), Fig. 4 (b) and Fig. 4 (c) respectively show signal-to-interference-and-noise ratio heat maps of the communication user obtained by system simulation with the WDQL algorithm in a comb interference environment. Each block represents a channel, and the black blocks represent the optimal channel strategy given by the reinforcement learning algorithm for the current communication cycle. The shade of a grey block represents the magnitude of the signal-to-interference-and-noise value: the darker the color, the smaller the value, which indicates that the corresponding channel is disturbed to a greater extent and is unsuitable for communication. Fig. 4 (a), Fig. 4 (b) and Fig. 4 (c) correspond to the first, second and third SINR subframes of the current communication period, respectively; since the perceived SINR information differs from time slot to time slot, the heat maps of the different subframes have different color shades, but the interference patterns are consistent. It can be observed that after the reinforcement learning algorithm is trained, the channel strategy given by the intelligent agent essentially avoids the interference of the jammers, achieving the purpose of interference avoidance. Similarly, Fig. 5 (a), Fig. 5 (b) and Fig. 5 (c) show SINR heat maps of the communication user obtained by system simulation with the WDQL algorithm in a swept-frequency interference environment. Swept-frequency interference is more complex than comb interference, resulting in more frequent channel switching.
Fig. 6 and Fig. 7 show the reward variation curves of the reinforcement learning based channel selection algorithm and the random channel selection algorithm in comb interference and swept-frequency interference environments. It can be observed from the figures that as the number of training rounds increases, the reward per round of the reinforcement learning based algorithm continuously increases, so that the interference is effectively avoided, and the final reward tends to a stable value. Conversely, the reward value of the random channel selection algorithm does not increase, and the interference is naturally not effectively avoided.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. The frequency domain anti-interference method based on reinforcement learning is characterized by comprising the following steps of:
step 1: a transmitter and a receiver which communicate with each other are used as communication users; the transmitter and the receiver transmit data through a communication link and transmit control information through a control link, wherein the control information comprises the channel strategy and NACK; while the communication user performs data transmission, a plurality of patterned jammers generate interference signals to interfere with the communication user;
step 2: the intelligent agent is embedded into the receiver, one communication period of the transmitter and the receiver is divided into a plurality of subframes, each subframe comprises a plurality of time slots, and the avoidance rate of all time slot channels is calculated;
step 3: judging whether the avoidance rate reaches a preset threshold value, if not, training by using a WDQL algorithm, updating the channel strategy, transmitting the updated channel strategy and NACK to a transmitter through a control link, and starting data transmission of the next communication period.
2. The reinforcement learning-based frequency domain interference rejection method according to claim 1, wherein the step 2 comprises:
within one communication cycle, data transmission is carried out during the first portion of the cycle, namely the cycle length minus $t_0$ when no retraining is needed, or minus $t_0 + t_1$ when retraining is needed; at the same time, during this data transmission portion, an agent located in the receiver perceives the spectrum environment in real time and generates the signal-to-interference-and-noise ratio subframes; after the data transmission is completed, the avoidance rate of all time slot channels in the current period is calculated; the communication cycle is thus composed of the data transmission portion together with the training period $t_1$ and the control feedback period $t_0$, where $t_0$ and $t_1$ both represent time periods.
3. The reinforcement learning-based frequency domain anti-interference method according to claim 1 or 2, wherein in each time slot, a communication user obtains the signal-to-interference-and-noise ratio of each channel through spectrum sensing, and combines the signal-to-interference-and-noise ratio information of a plurality of time slots into a plurality of signal-to-interference-and-noise ratio subframes;
the signal-to-interference-plus-noise ratio acquisition method comprises the following steps:
describing channel state information by adopting a block fading channel model, wherein channel parameters are kept unchanged in each time slot; modeling the channel gain between the transmitter and the receiver and the channel gain between the jammer and the receiver respectively;
and calculating the signal-to-interference-and-noise ratio of the receiver based on the channel gain model between the transmitter and the receiver and the channel gain model between the jammer and the receiver.
4. The reinforcement learning-based frequency domain interference rejection method according to claim 3, wherein the channel gain model between the transmitter and the receiver is
$g_s(t) = d_{sr}^{-\alpha_s}\,|h_s(t)|^2$
the channel gain model between the receiver and the j-th jammer is
$g_j(t) = d_{rj}^{-\alpha_j}\,|h_j(t)|^2$
wherein $d_{sr}$ is the Euclidean distance between the transmitter and the receiver, $d_{rj}$ is the Euclidean distance between the receiver and the j-th jammer, $\alpha_s$ and $\alpha_j$ are path fading factors, and the instantaneous fading coefficients $h_s(t)$ and $h_j(t)$ are complex Gaussian variables with mean 0 and variance $\sigma^2$;
the signal-to-interference-and-noise ratio of the receiver is
$\mathrm{SINR}(t) = \frac{p_s\, g_s(t)}{\int_{f_t - B/2}^{f_t + B/2} \big[\, n(f) + \sum_{j=1}^{J} g_j(t)\, J_j(f, f_{j,t}) \big]\, \mathrm{d}f}$
wherein $f_t$ denotes the center frequency of the channel selected by the communication user in time slot t, $B$ is the baseband signal bandwidth, $p_s$ is the communication signal power, $J_j(\cdot)$ and $n(\cdot)$ represent the interference power spectral density function and the noise power spectral density function respectively, $f$ represents the frequency variable, $n(f)$ represents the power spectral density of the noise, $J$ indicates the number of jammers, and $f_{j,t}$ represents the center frequency of the channel selected by the j-th jammer in time slot t.
5. The reinforcement learning-based frequency domain interference rejection method according to claim 1, wherein the step 3 comprises:
step 31: setting the number of training subframes N; for the first N subframes of the current communication period, training with the WDQL algorithm to obtain N pairs of Q value tables $(Q^U_i, Q^V_i)$, $i = 1, \dots, N$; $Q^U_i$ and $Q^V_i$ are both Q value tables;
step 32: averaging the N pairs of Q value tables and extracting, in each time slot, the action with the maximum averaged Q value, so as to form an action sequence whose length equals the number of time slots in a subframe, which serves as the optimal channel strategy;
step 33: sending the optimal channel strategy together with the NACK to the transmitter to guide the channel selection of the transmitter in the next communication period, wherein N is a positive integer greater than 1.
6. The reinforcement learning-based frequency domain interference rejection method according to claim 5, wherein said step 32 comprises:
first, in the state $s_t$ of time slot t, an action is selected randomly with probability $\varepsilon$, or the action with the maximum Q value is selected with probability $1-\varepsilon$, namely $a_t = \arg\max_{a \in A}\big[Q^U(s_t, a) + Q^V(s_t, a)\big]$, where $a_t$ denotes the action taken in time slot t;
then the reward $r_t$ of action $a_t$ is calculated, and either the Q value table $Q^U$ or the Q value table $Q^V$ is randomly selected for updating.
7. The reinforcement learning-based frequency domain interference avoidance method according to claim 6, wherein the calculation process of the reward $r_t$ comprises:
the agent in the receiver avoids the interference through a Markov decision process; the Markov decision process includes states, actions, state transition probabilities and rewards; the state of time slot t is expressed as $s_t$, and all states constitute the state space $S$; the action in time slot t is denoted $a_t$, $a_t \in A$, $A = \{1, 2, \dots, M\}$, where $M$ is the number of available channels and m represents the m-th channel among the available channels; all actions constitute the action space $A$; the state transition probability $p(s_{t+1} \mid s_t, a_t)$ satisfies $\Pr\{s_{t+1} \mid s_t, a_t\} = p(s_{t+1} \mid s_t, a_t)$ and indicates the probability that, when the agent selects action $a_t$ in the current environment state $s_t$ of time slot t, the environment transfers to the state $s_{t+1}$ of the next time slot t+1, i.e. the instantaneous channel avoidance rate; $a_t \in A$ ranges over the actions of all available channels in the action space $A$ at time slot t; $s_t$ and $s_{t+1}$ belong to the state spaces of time slots t and t+1 respectively, and Pr and p both represent probabilities;
if the communication user has already perceived the signal-to-interference-and-noise ratio information of each time slot, all the state transition probabilities are determined values; the reward represents the gain obtained when the agent selects action $a_t$ in state $s_t$ and the environment transfers to state $s_{t+1}$; it is determined by the demodulation thresholds $\eta_1$, $\eta_2$ and $\eta_3$ corresponding to three preset modulation modes and by the switching cost $c_s$ incurred when a channel switch is generated; the gain $r_t$ is expressed as:
$r_t = R\big(\mathrm{SINR}(t);\, \eta_1, \eta_2, \eta_3\big) - c_s\,\mathbb{1}\big(f_{t+1} \neq f_t\big)$
wherein $R(\cdot)$ assigns the gain of the highest preset modulation mode whose demodulation threshold is reached by $\mathrm{SINR}(t)$; $\mathbb{1}(\cdot)$ is an indication function expressing that when $f_{t+1} = f_t$, i.e. when no channel switch occurs in the front and rear time slots, no cost is lost, and otherwise a cost of size $c_s$ is incurred; $f_{t+1}$ represents the center frequency of the channel selected by the communication user at time slot t+1.
8. The reinforcement learning-based frequency domain interference rejection method according to claim 7, wherein the goal of the Markov decision is to maximize the total gain within one subframe:
$\pi^{*} = \arg\max_{\pi} \sum_{\tau=1}^{T} r_{\tau}\big(s_{\tau}, \pi(\tau)\big)$
wherein $\pi^{*}$ represents the optimal channel policy of the communication user, $\arg\max_{\pi}$ represents selecting the policy that maximizes the total gain of the communication user, $\sum_{\tau=1}^{T} r_{\tau}(\cdot)$ represents the sum of the rewards earned over all time slots of a single subframe, $a_{\tau}$ represents an action, $\pi(\tau)$ represents the action corresponding to policy $\pi$ in time slot $\tau$, and $T$ represents the number of time slots within a single subframe.
9. The reinforcement learning-based frequency domain interference rejection method according to claim 8, wherein randomly selecting to update $Q^U$ or $Q^V$ comprises the following steps:
if $Q^U$ is updated:
$a^{*} = \arg\max_{a} Q^{U}(s_{t+1}, a)$, $a_{L} = \arg\min_{a} Q^{U}(s_{t+1}, a)$
$\beta^{U} = \frac{\big|Q^{V}(s_{t+1}, a^{*}) - Q^{V}(s_{t+1}, a_{L})\big|}{c + \big|Q^{V}(s_{t+1}, a^{*}) - Q^{V}(s_{t+1}, a_{L})\big|}$
$Q^{U}(s_{t}, a_{t}) \leftarrow Q^{U}(s_{t}, a_{t}) + \alpha\big[r_{t} + \gamma\big(\beta^{U} Q^{U}(s_{t+1}, a^{*}) + (1-\beta^{U})\, Q^{V}(s_{t+1}, a^{*})\big) - Q^{U}(s_{t}, a_{t})\big]$
if $Q^V$ is updated, the symmetric update is applied with the roles of $Q^U$ and $Q^V$ exchanged:
$a^{*} = \arg\max_{a} Q^{V}(s_{t+1}, a)$, $a_{L} = \arg\min_{a} Q^{V}(s_{t+1}, a)$
$\beta^{V} = \frac{\big|Q^{U}(s_{t+1}, a^{*}) - Q^{U}(s_{t+1}, a_{L})\big|}{c + \big|Q^{U}(s_{t+1}, a^{*}) - Q^{U}(s_{t+1}, a_{L})\big|}$
$Q^{V}(s_{t}, a_{t}) \leftarrow Q^{V}(s_{t}, a_{t}) + \alpha\big[r_{t} + \gamma\big(\beta^{V} Q^{V}(s_{t+1}, a^{*}) + (1-\beta^{V})\, Q^{U}(s_{t+1}, a^{*})\big) - Q^{V}(s_{t}, a_{t})\big]$
wherein $a^{*}$ and $a_{L}$ respectively represent, in the next state $s_{t+1}$ and based on the Q value table being updated, the action with the maximum Q value and the action with the minimum Q value; $\beta^{U}$ and $\beta^{V}$ serve as weights to balance the overestimation problem of a single estimator against the underestimation problem of a double estimator; when updating the Q values, the two Q value tables $Q^{U}$ and $Q^{V}$ are both used; $\alpha$ is the learning rate, $\gamma$ is the discount factor, $c$ is the weight parameter, and $\max$ denotes the maximum-value function.
10. A reinforcement learning-based frequency domain interference suppression system for implementing the reinforcement learning-based frequency domain interference suppression method of any one of claims 1-9, comprising:
the communication module is used for taking a transmitter and a receiver which communicate with each other as communication users, transmitting data through a communication link and transmitting control information through a control link; while the communication user performs data transmission, a plurality of patterned jammers generate interference signals to interfere with the communication user;
the computing module is used for embedding an agent into the receiver, dividing one communication period of the transmitter and the receiver into a plurality of subframes, wherein each subframe comprises a plurality of time slots, and computing the avoidance rate of all time slot channels;
and the decision module is used for judging whether the avoidance rate reaches a preset threshold value, if the avoidance rate does not reach the preset threshold value, training by using a WDQL algorithm, updating the channel strategy, transmitting the updated channel strategy and NACK to the transmitter through the control link, and starting data transmission of the next communication period.
CN202410182440.7A 2024-02-19 2024-02-19 Frequency domain anti-interference method and system based on reinforcement learning Pending CN117750525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410182440.7A CN117750525A (en) 2024-02-19 2024-02-19 Frequency domain anti-interference method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410182440.7A CN117750525A (en) 2024-02-19 2024-02-19 Frequency domain anti-interference method and system based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN117750525A true CN117750525A (en) 2024-03-22

Family

ID=90259480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410182440.7A Pending CN117750525A (en) 2024-02-19 2024-02-19 Frequency domain anti-interference method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117750525A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109586820A (en) * 2018-12-28 2019-04-05 中国人民解放军陆军工程大学 The anti-interference model of dynamic spectrum and intensified learning Anti-interference algorithm in fading environment
US20210123741A1 (en) * 2019-10-29 2021-04-29 Loon Llc Systems and Methods for Navigating Aerial Vehicles Using Deep Reinforcement Learning
US20210241090A1 (en) * 2020-01-31 2021-08-05 At&T Intellectual Property I, L.P. Radio access network control with deep reinforcement learning
US20220209885A1 (en) * 2020-12-24 2022-06-30 Viettel Group Method and apparatus for adaptive anti-jamming communications based on deep double-q reinforcement learning
CN114280558A (en) * 2021-12-23 2022-04-05 北京邮电大学 Interference signal waveform optimization method based on reinforcement learning
CN115103446A (en) * 2022-05-25 2022-09-23 南京邮电大学 Multi-user communication anti-interference intelligent decision-making method based on deep reinforcement learning
CN115236607A (en) * 2022-06-30 2022-10-25 北京邮电大学 Radar anti-interference strategy optimization method based on double-layer Q learning
CN116744311A (en) * 2023-05-24 2023-09-12 中国人民解放军国防科技大学 User group spectrum access method based on PER-DDQN

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FANGMIN XU; FAN YANG; CHENGLIN ZHAO; SHENG WU: "Deep Reinforcement Learning Based Joint Edge Resource Management in Maritime Network", China Communications, no. 05, 15 May 2020 (2020-05-15) *
WANLING LI et al.: "D2D Communication Power Control Based on Deep Q Learning and Fractional Frequency Reuse", 2023 15th International Conference on Communication Software and Networks (ICCSN), 6 November 2023 (2023-11-06) *
TAN JUNJIE; LIANG YINGCHANG: "Deep Reinforcement Learning Methods for Intelligent Communication", Journal of University of Electronic Science and Technology of China, no. 02, 30 March 2020 (2020-03-30) *

Similar Documents

Publication Publication Date Title
CN109586820A (en) The anti-interference model of dynamic spectrum and intensified learning Anti-interference algorithm in fading environment
CN111726217B (en) Deep reinforcement learning-based autonomous frequency selection method and system for broadband wireless communication
CN108712748B (en) Cognitive radio anti-interference intelligent decision-making method based on reinforcement learning
CN104994569B (en) Multi-user reinforcement learning-based method for resisting hostile interference of cognitive wireless network
CN109274456B (en) Incomplete information intelligent anti-interference method based on reinforcement learning
Kong et al. A reinforcement learning approach for dynamic spectrum anti-jamming in fading environment
Liu et al. A heterogeneous information fusion deep reinforcement learning for intelligent frequency selection of HF communication
Kim Adaptive online power control scheme based on the evolutionary game theory
Van Huynh et al. DeepFake: Deep dueling-based deception strategy to defeat reactive jammers
Ilahi et al. LoRaDRL: Deep reinforcement learning based adaptive PHY layer transmission parameters selection for LoRaWAN
CN113225794A (en) Full-duplex cognitive communication power control method based on deep reinforcement learning
CN113423110A (en) Multi-user multi-channel dynamic spectrum access method based on deep reinforcement learning
Han et al. Primary-user-friendly dynamic spectrum anti-jamming access: A GAN-enhanced deep reinforcement learning approach
CN115567148A (en) Intelligent interference method based on cooperative Q learning
CN115766089A (en) Energy acquisition cognitive Internet of things anti-interference optimal transmission method
Pei et al. Joint time-frequency anti-jamming communications: A reinforcement learning approach
Li et al. Intelligent dynamic spectrum anti-jamming communications: A deep reinforcement learning perspective
Zhou et al. A countermeasure against random pulse jamming in time domain based on reinforcement learning
CN113271119B (en) Anti-interference cooperative frequency hopping method based on transmission scheduling
Courjault et al. How robust is a LoRa communication against impulsive noise?
CN113038567B (en) Anti-interference method of anti-interference system in multi-relay communication
CN117750525A (en) Frequency domain anti-interference method and system based on reinforcement learning
CN111741520A (en) Cognitive underwater acoustic communication system power distribution method based on particle swarm
CN116866048A (en) Anti-interference zero-and Markov game model and maximum and minimum depth Q learning method
CN114978388B (en) Unmanned aerial vehicle time-frequency domain combined cognition anti-interference intelligent decision-making method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination