CN113395129B - Decoy-assisted hidden anti-interference method, device and storage medium - Google Patents

Decoy-assisted hidden anti-interference method, device and storage medium

Info

Publication number
CN113395129B
CN113395129B (application CN202110547565.1A)
Authority
CN
China
Prior art keywords
frequency
communication
preset
decoy
spoofing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110547565.1A
Other languages
Chinese (zh)
Other versions
CN113395129A (en)
Inventor
刘鑫
王一凡
王玫
陈璐瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Technology
Original Assignee
Guilin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Technology filed Critical Guilin University of Technology
Priority to CN202110547565.1A priority Critical patent/CN113395129B/en
Publication of CN113395129A publication Critical patent/CN113395129A/en
Application granted granted Critical
Publication of CN113395129B publication Critical patent/CN113395129B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04K: SECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00: Jamming of communication; Counter-measures
    • H04K3/40: Jamming having variable characteristics
    • H04K3/42: Jamming having variable characteristics characterized by the control of the jamming frequency or wavelength
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04K: SECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00: Jamming of communication; Counter-measures
    • H04K3/60: Jamming involving special techniques
    • H04K3/65: Jamming involving special techniques using deceptive jamming or spoofing, e.g. transmission of false signals for premature triggering of RCIED, for forced connection or disconnection to/from a network or for generation of dummy target signal
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04K: SECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00: Jamming of communication; Counter-measures
    • H04K3/80: Jamming or countermeasure characterized by its function
    • H04K3/82: Jamming or countermeasure characterized by its function related to preventing surveillance, interception or detection
    • H04K3/825: Jamming or countermeasure characterized by its function related to preventing surveillance, interception or detection by jamming
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Noise Elimination (AREA)

Abstract

The invention provides a decoy-assisted covert anti-interference method, device and storage medium. The method comprises: acquiring the current time through a user receiver and judging whether it is earlier than a preset time; if so, randomly selecting a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency; if not, acquiring the communication frequency and the decoy frequency directly from the user receiver; and obtaining a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and storing the spectrum sample sequence into a preset spectrum sample sequence table. The invention ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. By sacrificing the decoy signal, covert communication is achieved, anti-interference performance is improved, and leakage of the user's own information is avoided.

Description

Decoy-assisted hidden anti-interference method, device and storage medium
Technical Field
The invention mainly relates to the technical field of communication anti-interference, and in particular to a decoy-assisted covert anti-interference method, device and storage medium.
Background
With the rapid development of wireless networks, their share in people's daily life and in military communication grows year by year. The openness of the wireless propagation medium brings many benefits to people's life and work, but it also makes wireless networks vulnerable to interference. With advances in science and technology, intelligent jamming with learning ability has a serious impact on wireless communication, and countering it has become one of the popular directions for anti-interference research. At present, most anti-interference technologies are based on an evasion strategy, that is, they avoid jamming attacks as far as possible. Although this currently works well, it leaks the communication user's own information, and as the jammer keeps learning from that information, the anti-interference performance drops markedly.
Some researchers have therefore proposed the idea of covert anti-interference: using ambient signals to conceal one's own communication signal, so that one's information cannot be sensed and learned by the jammer and targeted jamming cannot be applied. However, this model must assume a strong ambient signal nearby, i.e., a sensing blind area for the jammer. In reality, ambient signals that can mask one's own information are not always present, so realizing covert anti-interference when the jammer is not hampered by the environment is a significant research direction.
Disclosure of Invention
The invention aims to solve the above technical problem of the prior art, and provides a decoy-assisted covert anti-interference method, device and storage medium.

The technical solution of the present invention for solving the above technical problem is as follows: a decoy-assisted covert anti-interference method comprises the following steps:
S1: acquiring the current time through a user receiver, and judging whether the current time is earlier than a preset time; if so, executing step S2, and if not, executing step S3;

S2: randomly selecting a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency, and executing step S4;

S3: directly acquiring the communication frequency and the decoy frequency from the user receiver, and executing step S4;

S4: acquiring an interference signal from a jammer, obtaining a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and storing the spectrum sample sequence into a preset spectrum sample sequence table;

S5: judging again whether the current time is earlier than the preset time; if so, acquiring the next time through the user receiver and judging whether that time is earlier than the preset time, returning to step S2 if it is and to step S3 otherwise; once the current time reaches or is later than the preset time, taking the preset spectrum sample sequence table as the environment state information of the current time, fitting the environment state information, the preset start frequency and the preset end frequency with a deep reinforcement learning network, outputting the initial communication frequency and the initial decoy frequency of the current time, storing the communication frequency in a preset historical communication frequency sequence table, and storing the decoy frequency in a preset historical decoy frequency sequence table;

S6: sending the communication frequency and the decoy frequency to the user receiver, the communication frequency being used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency being used by the user receiver to control the user transmitter to transmit a decoy signal;

S7: fitting, with an interference decision evaluation network, the communication frequencies in the preset historical communication frequency sequence table and the decoy frequencies in the preset historical decoy frequency sequence table respectively together with the environment state information, outputting the communication evaluation network error value and the decoy evaluation network error value of the current time, calculating the instantaneous reward of the communication evaluation network error value to obtain the first instantaneous reward of the current time, and calculating the instantaneous reward of the decoy evaluation network error value to obtain the second instantaneous reward of the current time;

S8: when the environment state transitions to the environment state of the next time, taking the environment state information, the first instantaneous reward of the current time and the initial communication frequency of the current time as the communication experience information of the current time and storing it into a preset communication experience data set, taking the environment state information, the second instantaneous reward of the current time and the initial decoy frequency of the current time as the decoy experience information of the current time and storing it into a preset decoy experience data set, and returning to step S1 until the number of communication experience entries in the preset communication experience data set and the number of decoy experience entries in the preset decoy experience data set reach a preset upper limit, then executing step S9;

S9: updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environment state information with the updated deep reinforcement learning network and outputting an optimized communication frequency; updating the deep reinforcement learning network according to the preset decoy experience data set, fitting the environment state information with the updated deep reinforcement learning network and outputting an optimized decoy frequency; and sending the optimized communication frequency and the optimized decoy frequency to the user receiver.
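For concreteness, the time-gated frequency selection of steps S1 to S3 can be sketched in a few lines of Python. Everything below, including the constants F_START, F_END and T_PRESET and the function name, is an illustrative assumption rather than a value taken from the patent: before the preset moment the frequencies are drawn at random from the preset band, and afterwards they come from the user receiver.

```python
import random

# Illustrative constants standing in for the patent's preset values.
F_START, F_END = 100e6, 200e6   # preset start / end frequency (Hz)
T_PRESET = 1000                 # preset moment, modeled here as an iteration index

def select_frequencies(t, receiver_frequencies):
    """Steps S1-S3: random exploration before the preset moment (S2),
    receiver-supplied learned frequencies afterwards (S3)."""
    if t < T_PRESET:                              # S1: compare with the preset moment
        f_comm = random.uniform(F_START, F_END)   # S2: random communication frequency
        f_decoy = random.uniform(F_START, F_END)  # S2: random decoy frequency
        return f_comm, f_decoy
    return receiver_frequencies                   # S3: frequencies from the user receiver
```

Steps S4 to S9 then consume the selected frequencies exactly as described above.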
Another technical solution of the present invention for solving the above technical problem is as follows: a decoy-assisted covert anti-interference device, comprising:

a first judging module, configured to acquire the current time through a user receiver and judge whether the current time is earlier than a preset time; if so, it sends a first signal to the random selection module, and if not, a second signal to the frequency acquisition module;

a random selection module, configured to randomly select, according to the first signal, a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency, and send the randomly selected communication frequency and decoy frequency to the spectrum sample sequence processing module;

a frequency acquisition module, configured to directly acquire, according to the second signal, the communication frequency and the decoy frequency from the user receiver, and send the acquired communication frequency and decoy frequency to the spectrum sample sequence processing module;

a spectrum sample sequence processing module, configured to acquire an interference signal from a jammer, obtain a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and store the spectrum sample sequence into a preset spectrum sample sequence table;

a final judging module, configured to judge again whether the current time is earlier than the preset time; if so, it acquires the next time through the user receiver and judges whether that time is earlier than the preset time, returning to the random selection module if it is and to the frequency acquisition module otherwise; once the current time reaches or is later than the preset time, the module takes the preset spectrum sample sequence table as the environment state information of the current time, fits the environment state information, the preset start frequency and the preset end frequency with a deep reinforcement learning network, outputs the initial communication frequency and the initial decoy frequency of the current time, stores the communication frequency in a preset historical communication frequency sequence table, and stores the decoy frequency in a preset historical decoy frequency sequence table;

a sending module, configured to send the communication frequency and the decoy frequency to the user receiver, the communication frequency being used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency being used by the user receiver to control the user transmitter to transmit a decoy signal;

a fitting processing module, configured to fit, with an interference decision evaluation network, the communication frequencies in the preset historical communication frequency sequence table and the decoy frequencies in the preset historical decoy frequency sequence table respectively together with the environment state information, output the communication evaluation network error value and the decoy evaluation network error value of the current time, calculate the instantaneous reward of the communication evaluation network error value to obtain the first instantaneous reward of the current time, and calculate the instantaneous reward of the decoy evaluation network error value to obtain the second instantaneous reward of the current time;

an environment state transition module, configured to, when the environment state transitions to the environment state of the next time, take the environment state information, the first instantaneous reward of the current time and the initial communication frequency of the current time as the communication experience information of the current time and store it into a preset communication experience data set, take the environment state information, the second instantaneous reward of the current time and the initial decoy frequency of the current time as the decoy experience information of the current time and store it into a preset decoy experience data set, and return to the first judging module until the number of communication experience entries in the preset communication experience data set and the number of decoy experience entries in the preset decoy experience data set reach a preset upper limit, whereupon a third signal is sent to the frequency optimization module;

and a frequency optimization module, configured to receive the third signal, update the deep reinforcement learning network according to the preset communication experience data set, fit the environment state information with the updated network and output an optimized communication frequency, update the deep reinforcement learning network according to the preset decoy experience data set, fit the environment state information with the updated network and output an optimized decoy frequency, and send the optimized communication frequency and the optimized decoy frequency to the user receiver.
Another technical solution of the present invention for solving the above technical problem is as follows: a decoy-assisted covert anti-interference device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the decoy-assisted covert anti-interference method described above.

Another technical solution of the present invention for solving the above technical problem is as follows: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the decoy-assisted covert anti-interference method described above.
The beneficial effects of the invention are as follows: the current time is acquired through the user receiver; when the current time is earlier than the preset time, the communication frequency and the decoy frequency are randomly selected within the range from the preset start frequency to the preset end frequency, accumulating experience and providing data for subsequent processing; when the current time is equal to or later than the preset time, the communication frequency and the decoy frequency are acquired directly from the user receiver. This ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. By sacrificing the decoy signal, covert communication is achieved, anti-interference performance is improved, and leakage of the user's own information is avoided.
Drawings
FIG. 1 is a flowchart of a decoy-assisted covert anti-interference method according to an embodiment of the present invention;

Fig. 2 is a block diagram of a decoy-assisted covert anti-interference device according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic flowchart of a decoy-assisted covert anti-interference method according to an embodiment of the present invention.

As shown in fig. 1, the decoy-assisted covert anti-interference method includes the following steps:
S1: acquiring the current time through a user receiver, and judging whether the current time is earlier than a preset time; if so, executing step S2, and if not, executing step S3;

S2: randomly selecting a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency, and executing step S4;

S3: directly acquiring the communication frequency and the decoy frequency from the user receiver, and executing step S4;

S4: acquiring an interference signal from a jammer, obtaining a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and storing the spectrum sample sequence into a preset spectrum sample sequence table;

S5: judging again whether the current time is earlier than the preset time; if so, acquiring the next time through the user receiver and judging whether that time is earlier than the preset time, returning to step S2 if it is and to step S3 otherwise; once the current time reaches or is later than the preset time, taking the preset spectrum sample sequence table as the environment state information of the current time, fitting the environment state information, the preset start frequency and the preset end frequency with a deep reinforcement learning network, outputting the initial communication frequency and the initial decoy frequency of the current time, storing the communication frequency in a preset historical communication frequency sequence table, and storing the decoy frequency in a preset historical decoy frequency sequence table;

S6: sending the communication frequency and the decoy frequency to the user receiver, the communication frequency being used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency being used by the user receiver to control the user transmitter to transmit a decoy signal;

S7: fitting, with an interference decision evaluation network, the communication frequencies in the preset historical communication frequency sequence table and the decoy frequencies in the preset historical decoy frequency sequence table respectively together with the environment state information, outputting the communication evaluation network error value and the decoy evaluation network error value of the current time, calculating the instantaneous reward of the communication evaluation network error value to obtain the first instantaneous reward of the current time, and calculating the instantaneous reward of the decoy evaluation network error value to obtain the second instantaneous reward of the current time;

S8: when the environment state transitions to the environment state of the next time, taking the environment state information, the first instantaneous reward of the current time and the initial communication frequency of the current time as the communication experience information of the current time and storing it into a preset communication experience data set, taking the environment state information, the second instantaneous reward of the current time and the initial decoy frequency of the current time as the decoy experience information of the current time and storing it into a preset decoy experience data set, and returning to step S1 until the number of communication experience entries in the preset communication experience data set and the number of decoy experience entries in the preset decoy experience data set reach a preset upper limit, then executing step S9;

S9: updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environment state information with the updated deep reinforcement learning network and outputting an optimized communication frequency; updating the deep reinforcement learning network according to the preset decoy experience data set, fitting the environment state information with the updated deep reinforcement learning network and outputting an optimized decoy frequency; and sending the optimized communication frequency and the optimized decoy frequency to the user receiver.
It should be understood that the interference decision evaluation network refers to a GAN, i.e., a generative adversarial network.

It should be appreciated that the first instantaneous reward can be the instantaneous reward of the communication deep reinforcement learning network, and the second instantaneous reward can be the instantaneous reward of the decoy deep reinforcement learning network.

Specifically, in step S2 the obtained communication frequency and decoy frequency are transmitted to the user transmitter, and in step S4 the user receiver receives the interference signal transmitted by the jammer as well as the communication signal and the decoy signal transmitted by the user transmitter, where the communication signal comprises the communication frequency and the communication bandwidth, and the decoy signal comprises the decoy frequency and the decoy bandwidth.

It should be understood that, if the preset time is set to 10:00, then step S2 is executed if the current time is 9:58, and step S3 is executed if the current time is 10:01.

It should be understood that we consider a scenario in which one user and one jammer compete within a communication bandwidth B: one pair consisting of a user transmitter and a user receiver forms the user, and one jammer in the system attempts to jam the user's communication.

It should be understood that the user transmits signals (i.e., the communication signal and the decoy signal); the jammer learns from the sensed environment state, makes a targeted jamming decision and releases the jamming signal; the user receives the communication signal and the decoy signal transmitted by the user transmitter together with the jamming signal, learns from the received environment state information to obtain the corresponding communication frequency and decoy frequency, and passes them to the user receiver, which sends control information to the user transmitter to transmit the corresponding signals.

It should be appreciated that, since the environment state cannot yet extend over the time dimension in the early stage of the confrontation, the user starts by randomly selecting frequencies and accumulating experience until the environment state spans the full time dimension.

Specifically, considering that the decisions of both confronting parties are related to the environment states over a long past period, the environment state is defined as $S_t = \{s_t, s_{t-1}, \dots, s_{t-T+1}\}$, where $s_t$ is a spectrum sample sequence and $T$ denotes the backtracking time length. The user learns from the environment state information $S_t$ and communicates the result to the user transmitter over a control link; a sketch of how such a backtracking window might be kept follows.
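A minimal sketch, assuming a fixed backtracking length $T$ and the newest-first ordering in the definition of $S_t$ above; the class and method names are illustrative:

```python
from collections import deque

class EnvironmentState:
    """Sliding window holding the T most recent spectrum sample sequences,
    so that S_t = {s_t, s_{t-1}, ..., s_{t-T+1}}."""

    def __init__(self, backtrack_length):
        self.window = deque(maxlen=backtrack_length)

    def push(self, spectrum_samples):
        # newest first, matching the ordering of S_t above
        self.window.appendleft(spectrum_samples)

    def ready(self):
        # early-confrontation check: random selection continues until the
        # state spans the full time dimension
        return len(self.window) == self.window.maxlen
```

The `ready` check corresponds to the condition above: frequencies are selected at random until the environment state spans the full time dimension.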
In this embodiment, the current time is acquired through the user receiver; when the current time is earlier than the preset time, the communication frequency and the decoy frequency are randomly selected within the range from the preset start frequency to the preset end frequency, accumulating experience and providing data for subsequent processing; when the current time is equal to or later than the preset time, the communication frequency and the decoy frequency are acquired directly from the user receiver. This ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. By sacrificing the decoy signal, covert communication is achieved, anti-interference performance is improved, and leakage of the user's own information is avoided.
Optionally, as an embodiment of the present invention, the interference signal includes an interference selection frequency and an interference signal bandwidth, and the process in step S4 of obtaining the spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal specifically includes:

calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through a first formula to obtain the spectrum sample sequence, the first formula being:
$$s_t = \{s_{1,t}, s_{2,t}, \dots, s_{n,t}\},$$

where

$$s_{n,t} = \int_{(n-1)\Delta f}^{n\Delta f} H_t(f + f_s)\,\mathrm{d}f,$$

where $H_t(f) = g_u U_t(f) + g_j J_t(f) + n(f)$,

where

$$U_t(f) = \begin{cases} p_u, & f \in \left[f_c^t - \frac{b_u}{2},\, f_c^t + \frac{b_u}{2}\right] \cup \left[f_d^t - \frac{b_u}{2},\, f_d^t + \frac{b_u}{2}\right], \\ 0, & \text{otherwise}, \end{cases}$$

where

$$J_t(f) = \begin{cases} p_j, & f \in \left[f_j^t - \frac{b_j^t}{2},\, f_j^t + \frac{b_j^t}{2}\right], \\ 0, & \text{otherwise}, \end{cases}$$

where

$$a_u^t = \{a_c^t, a_d^t\}, \qquad a_j^t = \{f_j^t, b_j^t\},$$

where $s_t$ is the spectrum sample sequence, $s_{n,t}$ is the $n$-th spectrum sample, $\Delta f$ is the spectral resolution, $f_s$ is the start frequency, $H_t(f)$ is the power spectral density, $g_u$ is the channel gain between the user transmitter and the user receiver, $g_j$ is the channel gain between the jammer and the user receiver, $n(f)$ is the noise, $U_t(f)$ is the user signal, $J_t(f)$ is the interference signal, $a_u^t$ is the user decision, $p_u$ is the power with which the communication signal and the decoy signal are transmitted, $p_j$ is the power of the interference signal, $b_u$ is the spectral bandwidth of the communication signal and the decoy signal, $a_j^t$ is the interference decision, $a_c^t = f_c^t$ is the communication decision at time $t$, $f_c^t$ is the communication frequency at time $t$, $a_d^t = f_d^t$ is the decoy decision at time $t$, $f_d^t$ is the decoy frequency at time $t$, $f_j^t$ is the interference selection frequency, and $b_j^t$ is the interference signal bandwidth.
Understandably, $H_t(f)$ and $H_t(f+f_s)$ in the formula differ merely in the substituted value: $H_t(f)$ substitutes $f$, while $H_t(f+f_s)$ substitutes $f+f_s$.
Specifically, the start and end frequencies of the communication band in which the user and the jammer operate are $f_s$ and $f_e$. The user may select frequencies $f_c$ and $f_d$ as the communication center frequency and the decoy center frequency respectively, where $f_c \in [f_s, f_e]$ and $f_d \in [f_s, f_e]$, and transmits the communication signal and the decoy signal with power $p_u$; the spectral bandwidth of the user signal is $b_u$. The user's decision considers only the change of frequency, comprising the communication frequency and the decoy frequency, i.e.

$$a_u^t = \{a_c^t, a_d^t\}, \quad \text{where } a_c^t = f_c^t \text{ and } a_d^t = f_d^t.$$

The user signal is thus defined as

$$U_t(f) = \begin{cases} p_u, & f \in \left[f_c^t - \frac{b_u}{2},\, f_c^t + \frac{b_u}{2}\right] \cup \left[f_d^t - \frac{b_u}{2},\, f_d^t + \frac{b_u}{2}\right], \\ 0, & \text{otherwise}. \end{cases}$$

The jammer may likewise freely select a frequency $f_j$ within the communication band $[f_s, f_e]$ as the center frequency for transmitting the interference signal (i.e., the interference selection frequency), whose signal bandwidth $b_j \in [b_s, b_h]$ can change according to the sensed state, where $b_s$ and $b_h$ are respectively the minimum and maximum bandwidths of the interference signal. The interference decision comprises the interference selection frequency and the interference signal bandwidth:

$$a_j^t = \{f_j^t, b_j^t\}.$$

The interference signal is thus defined as

$$J_t(f) = \begin{cases} p_j, & f \in \left[f_j^t - \frac{b_j^t}{2},\, f_j^t + \frac{b_j^t}{2}\right], \\ 0, & \text{otherwise}. \end{cases}$$

The sensing device configured at the communicating party can sense the spectrum of the whole communication band in real time, and the PSD of the received signal when the user signal and the interference signal coexist is expressed as

$$H_t(f) = g_u U_t(f) + g_j J_t(f) + n(f),$$

where $g_u$ represents the channel gain between the user transmitter and the user receiver, $g_j$ represents the channel gain from the jammer to the user receiver, and $n(f)$ represents the noise. The discretized spectrum sample is

$$s_{n,t} = \int_{(n-1)\Delta f}^{n\Delta f} H_t(f + f_s)\,\mathrm{d}f,$$

where $\Delta f$ denotes the spectral resolution and $f_s$ the start frequency; the spectrum sample sequence at the user receiving end is $s_t = \{s_{1,t}, s_{2,t}, \dots, s_{n,t}\}$.
In this embodiment, the spectrum sample sequence is obtained by calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through the first formula, providing a basis for the acquisition of subsequent environment state information, so that communication is concealed, anti-interference performance is improved, and leakage of the user's own information is avoided.
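The first formula can also be evaluated numerically. The sketch below approximates each spectrum sample $s_{n,t}$ by integrating the reconstructed PSD $H_t(f+f_s)$ over its $\Delta f$ bin; all default parameter values, and the interference power $p_j$, are illustrative assumptions rather than values from the patent:

```python
import numpy as np

def spectrum_samples(f_c, f_d, f_j, b_j, *, f_s=0.0, delta_f=1e5, n_bins=100,
                     p_u=1.0, p_j=1.0, b_u=2e5, g_u=1.0, g_j=1.0, noise=1e-3):
    """Approximates s_{n,t} = integral of H_t(f + f_s) over the n-th
    Delta-f bin, with H_t(f) = g_u*U_t(f) + g_j*J_t(f) + n(f)."""
    def U(f):   # user PSD: power p_u inside the communication and decoy bands
        return p_u * ((np.abs(f - f_c) <= b_u / 2) | (np.abs(f - f_d) <= b_u / 2))

    def J(f):   # jammer PSD: power p_j inside the jamming band
        return p_j * (np.abs(f - f_j) <= b_j / 2)

    samples = np.empty(n_bins)
    grid = np.linspace(0.0, 1.0, 64)              # quadrature points per bin
    for n in range(n_bins):
        f = f_s + (n + grid) * delta_f            # the n-th frequency bin
        H = g_u * U(f) + g_j * J(f) + noise       # power spectral density H_t
        samples[n] = np.trapz(H, f)               # approximate the bin integral
    return samples

# Example: communication at 3 MHz, decoy at 7 MHz, jammer chasing the decoy.
s_t = spectrum_samples(f_c=3e6, f_d=7e6, f_j=7e6, b_j=4e5)
```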
Optionally, as an embodiment of the present invention, the deep reinforcement learning network includes a communication deep reinforcement learning network and a decoy deep reinforcement learning network, and the process in step S5 of fitting the environment state information, the preset start frequency and the preset end frequency with the deep reinforcement learning network and outputting the initial communication frequency and the initial decoy frequency of the current time specifically includes:

extracting features of the environment state information with the communication deep reinforcement learning network, outputting a plurality of initial communication Q values for the current time, and screening out the maximum communication Q value from the plurality of communication Q values;

selecting a frequency within the range from the preset start frequency to the preset end frequency according to the maximum communication Q value to obtain the initial communication frequency of the current time;

extracting features of the environment state information with the decoy deep reinforcement learning network, outputting a plurality of initial decoy Q values for the current time, and screening out the maximum decoy Q value from the plurality of decoy Q values;

and selecting a frequency within the range from the preset start frequency to the preset end frequency according to the maximum decoy Q value to obtain the initial decoy frequency of the current time.
It should be understood that the communication Q value represents the value of selecting a particular communication decision in the environment state $S_t$, and the decoy Q value represents the value of selecting a particular decoy decision in the environment state $S_t$.
Specifically, the user receives the environment state information $S_t$. According to the communication decision selection probability $\varepsilon_c$, the user randomly selects one of the communication frequencies, or with probability $1-\varepsilon_c$ selects the communication frequency corresponding to the largest communication Q value under the decision in environment state $S_t$:

$$f_c^t = \arg\max_{f_c} Q_c\!\left(S_t, f_c; \theta_c\right).$$

According to the decoy decision selection probability $\varepsilon_d$, the user randomly selects one of the decoy frequencies, or with probability $1-\varepsilon_d$ selects the decoy frequency corresponding to the largest decoy Q value under the decision in environment state $S_t$:

$$f_d^t = \arg\max_{f_d} Q_d\!\left(S_t, f_d; \theta_d\right).$$
In this embodiment, the initial communication frequency and the initial decoy frequency of the current time are output by fitting the environment state information, the preset start frequency and the preset end frequency with the deep reinforcement learning network, so that the frequencies can be updated, ensuring that the user's decoy signal firmly attracts the jamming attack so that the user's communication signal achieves covert communication, avoiding the jamming attack and obtaining information without interference.
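A sketch of this $\varepsilon$-greedy selection rule over a discretized frequency grid; the grid, the Q values and the function name are illustrative assumptions, and the communication and decoy networks would each apply it with their own $\varepsilon_c$ or $\varepsilon_d$:

```python
import numpy as np

def epsilon_greedy_frequency(q_values, freq_grid, epsilon, rng=None):
    """With probability epsilon pick a random frequency in [f_s, f_e],
    otherwise the frequency whose Q value is largest; q_values[i] is the
    Q value the network assigns to freq_grid[i]."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return float(rng.choice(freq_grid))            # explore
    return float(freq_grid[int(np.argmax(q_values))])  # exploit the maximum Q value
```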
Optionally, as an embodiment of the present invention, the interference decision evaluation network includes a communication interference decision evaluation network and a decoy interference decision evaluation network, and the process of step S7 specifically includes:

extracting features of the communication frequencies in the preset historical communication frequency sequence table with the communication interference decision evaluation network, and outputting the communication fitting environment state of the current time;
calculating the communication fitting environment state and the environment state information through a second formula to obtain the communication evaluation network error value of the current time, the second formula being:

$$\delta_c^t = \left\| S_t - \hat{S}_c^t \right\|^2,$$

where $\delta_c^t$ is the communication evaluation network error value, $S_t$ is the environment state information at time $t$, and $\hat{S}_c^t$ is the communication fitting environment state at time $t$;

calculating the instantaneous reward of the communication evaluation network error value through a third formula to obtain the first instantaneous reward of the current time, the third formula being:

$$r_c^t = \delta_c^t,$$

where $\delta_c^t$ is the communication evaluation network error value and $r_c^t$ is the first instantaneous reward;
extracting features of the decoy frequencies in the preset historical decoy frequency sequence table with the decoy interference decision evaluation network, and outputting the decoy fitting environment state of the current time;
calculating the decoy fitting environment state and the environment state information through a fourth formula to obtain the decoy evaluation network error value of the current time, the fourth formula being:

$$\delta_d^t = \left\| S_t - \hat{S}_d^t \right\|^2,$$

where $\delta_d^t$ is the decoy evaluation network error value, $S_t$ is the environment state information at time $t$, and $\hat{S}_d^t$ is the decoy fitting environment state at time $t$;

and calculating the instantaneous reward of the decoy evaluation network error value through a fifth formula to obtain the second instantaneous reward of the current time, the fifth formula being:

$$r_d^t = -\delta_d^t,$$

where $r_d^t$ is the second instantaneous reward and $\delta_d^t$ is the decoy evaluation network error value.
It should be appreciated that the communication interference decision evaluation network and the decoy interference decision evaluation network may both be generative adversarial networks.
It should be appreciated that, at each iteration, the interference decision evaluation networks (i.e., the communication interference decision evaluation network and the decoy interference decision evaluation network) reversely update their respective network parameters $\varphi_c$ and $\varphi_d$ with the communication evaluation network loss value $\delta_c^t$ and the decoy evaluation network loss value $\delta_d^t$.

Specifically, the past communication decision sequence $\{f_c^1, f_c^2, \dots, f_c^{t-1}\}$ (i.e., the preset historical communication frequency sequence table) and the past decoy decision sequence $\{f_d^1, f_d^2, \dots, f_d^{t-1}\}$ (i.e., the preset historical decoy frequency sequence table) are respectively input into their interference decision evaluation networks (i.e., the communication interference decision evaluation network and the decoy interference decision evaluation network) for feature extraction, yielding the output instantaneous environment states $\hat{S}_c^t$ and $\hat{S}_d^t$. The communication evaluation network loss value $\delta_c^t$ and the decoy evaluation network loss value $\delta_d^t$ are then computed between the outputs $\hat{S}_c^t$, $\hat{S}_d^t$ and the instantaneous environment state $S_t$. The first instantaneous reward of the communication deep reinforcement learning network is defined as $r_c^t = \delta_c^t$, and the second instantaneous reward of the decoy deep reinforcement learning network as $r_d^t = -\delta_d^t$.
In this embodiment, the interference decision evaluation network fits the communication frequencies in the preset historical communication frequency sequence table and the decoy frequencies in the preset historical decoy frequency sequence table together with the environment state information, and outputs the first and second instantaneous rewards of the current time. This ensures that the user's decoy signal firmly attracts the jamming attack so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. Covert communication is achieved by sacrificing the decoy signal, improving anti-interference performance while avoiding leakage of the user's own information.
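Since the exact reward formulas are only partially recoverable from the publication, the sketch below implements the reading used in this text: the error value is a squared distance between the fitted and observed environment states, the communication reward grows with that error (unpredictability), and the decoy reward shrinks with it (attractiveness to the jammer). Treat the functional forms, and in particular the signs, as assumptions:

```python
import numpy as np

def evaluation_error(fitted_state, true_state):
    """Squared distance between the evaluation network's fitted environment
    state and the observed one (reconstruction of the second/fourth formulas)."""
    diff = np.asarray(fitted_state) - np.asarray(true_state)
    return float(np.sum(diff ** 2))

def instantaneous_rewards(delta_c, delta_d):
    """Assumed reading of the third/fifth formulas: the communication link is
    rewarded when the jammer's network fails to fit it (covertness), the
    decoy when the jammer fits it well (bait)."""
    r_c = delta_c     # first instantaneous reward
    r_d = -delta_d    # second instantaneous reward
    return r_c, r_d
```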
Optionally, as an embodiment of the present invention, the process of step S8 specifically includes:

when the environment state transitions to the environment state of the next time, executing steps S1 to S5 to obtain the environment state information of the next time;

taking the environment state information of the current time, the first instantaneous reward of the current time, the initial communication frequency of the current time and the environment state information of the next time as the communication experience information, and storing the communication experience information into the preset communication experience data set, which specifically includes:
communication experience data set D for storing communication experience information with upper limit of N c
Defining the communication experience information as
Figure BDA0003074177490000141
Wherein the content of the first and second substances,
Figure BDA0003074177490000142
where t is the time, S t Environmental state at time t, S t+1 Is the ambient state at time t +1,
Figure BDA0003074177490000143
for the decision to communicate at time t,
Figure BDA0003074177490000147
is the communication frequency at the time of the t,
Figure BDA0003074177490000146
a first instantaneous reward for time t;
taking the environmental state information at the current moment, the second instantaneous return at the current moment, the initial trapping frequency at the current moment and the environmental state information at the next moment as trapping experience information, and storing the trapping experience information into a preset trapping experience data set;
when the quantity of the communication experience information stored in the preset communication experience data set and the communication experience information stored in the preset cheating experience data setWhen the number of the cheating experience information reaches the preset upper limit value, extracting the communication experience information from the preset communication experience data set according to an equal probability mode, and updating the weight of the communication deep reinforcement learning network through the extracted communication experience information
Figure BDA0003074177490000144
Extracting decoy experience information from the preset decoy experience data set according to an equal probability mode, and updating the weight value of the decoy deep reinforcement learning network through the extracted decoy experience information
Figure BDA0003074177490000148
It should be understood that the preset upper limit value may be N/2.
It should be understood that the communication experience (i.e., the communication experience information) of the user at time t is recorded
Figure BDA0003074177490000145
It is stored into the data set Dc.
It is to be understood that N is a positive integer greater than 0.
In this embodiment, the environment state information of the current time, the first instantaneous reward of the current time, the initial communication frequency of the current time and the environment state information of the next time are taken as the communication experience information and stored into the preset communication experience data set, ensuring that the user's decoy signal firmly attracts the jamming attack so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. Covert communication is achieved by sacrificing the decoy signal, improving anti-interference performance while avoiding leakage of the user's own information.
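A minimal sketch of the experience data sets $D_c$ and $D_d$ described above, assuming a plain list with capacity $N$ and uniform ("equal probability") sampling; the class and method names are illustrative:

```python
import random
from collections import namedtuple

Experience = namedtuple("Experience", "state action reward next_state")

class ExperienceDataset:
    """Capacity-N data set with equal-probability sampling, standing in for
    the preset communication / decoy experience data sets D_c and D_d."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def store(self, state, action, reward, next_state):
        # e.g. e_c^t = (S_t, a_c^t, r_c^t, S_{t+1})
        self.buffer.append(Experience(state, action, reward, next_state))
        del self.buffer[:-self.capacity]               # keep at most N entries

    def reached_upper_limit(self, preset_upper_limit):
        return len(self.buffer) >= preset_upper_limit  # triggers step S9

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)  # equiprobable extraction
```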
Optionally, as an embodiment of the present invention, the process of taking the environment state information of the current time, the second instantaneous reward of the current time, the initial decoy frequency of the current time and the environment state information of the next time as the decoy experience information and storing it into the preset decoy experience data set specifically includes:

setting a decoy experience data set $D_d$ that stores at most $N$ pieces of decoy experience information;
defining the decoy experience information as

$$e_d^t = \left(S_t, a_d^t, r_d^t, S_{t+1}\right),$$

where $t$ is the time, $S_t$ is the environment state at time $t$, $S_{t+1}$ is the environment state at time $t+1$, $a_d^t = f_d^t$ is the decoy decision at time $t$, $f_d^t$ is the decoy frequency at time $t$, and $r_d^t$ is the second instantaneous reward at time $t$. It should be understood that the decoy experience of the user at time $t$ (i.e., the decoy experience information) is recorded as $e_d^t$ and stored into the data set $D_d$.
It is to be understood that N is a positive integer greater than 0.
In this embodiment, the environment state information of the current time, the second instantaneous reward of the current time, the initial decoy frequency of the current time and the environment state information of the next time are taken as the decoy experience information and stored into the preset decoy experience data set, ensuring that the user's decoy signal firmly attracts the jamming attack so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. Covert communication is achieved by sacrificing the decoy signal, improving anti-interference performance while avoiding leakage of the user's own information.
Optionally, as an embodiment of the present invention, the process of extracting communication experience information from the preset communication experience data set with equal probability and updating the weights $\theta_c$ of the communication deep reinforcement learning network with the extracted communication experience information includes:

extracting communication experience information from the preset communication experience data set with equal probability;
constructing a communication target value from the extracted communication experience information, the communication target value being:

$$y_c^t = r_c^t + \gamma_c \max_{f_c^{t+1}} Q_c\!\left(S_{t+1}, f_c^{t+1}; \theta_c\right),$$

where $y_c^t$ is the communication target value, $r_c^t$ is the first instantaneous reward at time $t$, $\gamma_c$ represents the communication reward attenuation factor, $\max_{f_c^{t+1}} Q_c(S_{t+1}, f_c^{t+1}; \theta_c)$ represents the maximum communication Q value output by the communication deep reinforcement learning network in environment $S_{t+1}$, and $f_c^{t+1}$ is the communication frequency at time $t+1$;
calculating the gradient $\nabla_{\theta_c} L_c$ of the communication deep reinforcement learning network from the sixth formula and the communication target value, the sixth formula being:

$$\nabla_{\theta_c} L_c = \mathbb{E}_{S_t}\!\left[\left(y_c^t - Q_c\!\left(S_t, f_c^t; \theta_c\right)\right)\frac{\partial Q_c\!\left(S_t, f_c^t; \theta_c\right)}{\partial \theta_c}\right],$$

where

$$L_c = \mathbb{E}_{S_t}\!\left[\left(y_c^t - Q_c\!\left(S_t, f_c^t; \theta_c\right)\right)^2\right],$$

where $\partial/\partial\theta_c$ denotes the partial derivative, $f_c^t$ is the communication frequency at time $t$, $L_c$ is the communication error value, $\mathbb{E}$ is the expectation, $S_t$ is the environment state at time $t$, $y_c^t$ is the communication target value, $Q_c(S_t, f_c^t; \theta_c)$ is the communication Q value output by the communication deep reinforcement learning network in environment $S_t$, and $\nabla_{\theta_c}$ denotes the gradient of the communication deep reinforcement learning network;

and updating the gradient of the communication deep reinforcement learning network with a stochastic gradient descent algorithm to obtain the weights $\theta_c$ of the communication deep reinforcement learning network, until all communication experience information in the preset communication experience data set has been extracted.
Specifically, experience (i.e., communication experience information) is randomly extracted with equal probability from the communication experience data set $D_c$, and the target value $y_c^t$ is constructed from the state in that experience together with the corresponding decision and reward, where $\max_{f_c^{t+1}} Q_c(S_{t+1}, f_c^{t+1}; \theta_c)$ denotes the maximum Q value the user can obtain in the remembered environment state $S_{t+1}$ (i.e., the maximum communication Q value output by the communication deep reinforcement learning network in environment $S_{t+1}$). The error between the target value $y_c^t$ and the true value $Q_c(S_t, f_c^t; \theta_c)$ is computed, the gradient $\nabla_{\theta_c} L_c$ is calculated from it as above, and the network weights $\theta_c$ are then updated with a stochastic gradient descent algorithm.
In this embodiment, communication experience information is extracted from the preset communication experience data set with equal probability, and the weights of the communication deep reinforcement learning network are updated with the extracted communication experience information, ensuring that the user's decoy signal firmly attracts the jamming attack so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. Covert communication is achieved by sacrificing the decoy signal, improving anti-interference performance while avoiding leakage of the user's own information.
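A toy sketch of the target-value construction and the stochastic gradient descent step behind the sixth formula, with a deliberately small linear Q-network standing in for the communication deep reinforcement learning network; the decoy network of the seventh formula below is updated identically with $\gamma_d$, $D_d$ and $\theta_d$. All names and sizes are illustrative assumptions:

```python
import numpy as np

class LinearQNet:
    """Tiny linear stand-in for the communication deep reinforcement
    learning network; theta is just a weight matrix."""

    def __init__(self, n_features, n_actions, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = rng.normal(scale=0.01, size=(n_features, n_actions))
        self.lr = lr

    def q_values(self, state):
        return state @ self.theta                    # Q(S_t, .; theta)

    def sgd_step(self, state, action, target):
        """One stochastic-gradient-descent step on (y - Q(S,a;theta))^2;
        for a linear net dQ/dtheta is the state itself."""
        error = target - self.q_values(state)[action]
        self.theta[:, action] += self.lr * error * state  # descend the squared error
        return error

def td_target(reward, next_q_values, gamma=0.9):
    """Communication target value y_c^t = r_c^t + gamma_c * max Q(S_{t+1}, .)."""
    return reward + gamma * float(np.max(next_q_values))
```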
Optionally, as an embodiment of the present invention, the process of extracting decoy experience information from the preset decoy experience data set with equal probability and updating the weights $\theta_d$ of the decoy deep reinforcement learning network with the extracted decoy experience information includes:

extracting decoy experience information from the preset decoy experience data set with equal probability;
constructing a decoy target value from the extracted decoy experience information, the decoy target value being:

$$y_d^t = r_d^t + \gamma_d \max_{f_d^{t+1}} Q_d\!\left(S_{t+1}, f_d^{t+1}; \theta_d\right),$$

where $\gamma_d$ represents the decoy reward attenuation factor, $r_d^t$ is the second instantaneous reward at time $t$, $y_d^t$ is the decoy target value, $\max_{f_d^{t+1}} Q_d(S_{t+1}, f_d^{t+1}; \theta_d)$ represents the maximum decoy Q value output by the decoy deep reinforcement learning network in environment $S_{t+1}$, and $f_d^{t+1}$ is the decoy frequency at time $t+1$;
calculating the gradient $\nabla_{\theta_d} L_d$ of the decoy deep reinforcement learning network from the seventh formula and the decoy target value, the seventh formula being:

$$\nabla_{\theta_d} L_d = \mathbb{E}_{S_t}\!\left[\left(y_d^t - Q_d\!\left(S_t, f_d^t; \theta_d\right)\right)\frac{\partial Q_d\!\left(S_t, f_d^t; \theta_d\right)}{\partial \theta_d}\right],$$

where

$$L_d = \mathbb{E}_{S_t}\!\left[\left(y_d^t - Q_d\!\left(S_t, f_d^t; \theta_d\right)\right)^2\right],$$

where $\partial/\partial\theta_d$ denotes the partial derivative, $y_d^t$ is the decoy target value, $L_d$ is the decoy error value, $\mathbb{E}$ is the expectation, $S_t$ is the environment state at time $t$, $Q_d(S_t, f_d^t; \theta_d)$ is the decoy Q value output by the decoy deep reinforcement learning network in environment $S_t$, and $\nabla_{\theta_d}$ denotes the gradient of the decoy deep reinforcement learning network;

and updating the gradient of the decoy deep reinforcement learning network with a stochastic gradient descent algorithm to obtain the weights $\theta_d$ of the decoy deep reinforcement learning network, until all decoy experience information in the preset decoy experience data set has been extracted.
Specifically, experience (i.e., decoy experience information) is randomly extracted with equal probability from the preset decoy experience data set $D_d$, and the target value $y_d^t$ is constructed from the state in that experience together with the corresponding decision and reward, where $\max_{f_d^{t+1}} Q_d(S_{t+1}, f_d^{t+1}; \theta_d)$ denotes the maximum Q value the user can obtain in the remembered environment state $S_{t+1}$ (i.e., the maximum decoy Q value output by the decoy deep reinforcement learning network in environment $S_{t+1}$). The error between the target value $y_d^t$ and the true value $Q_d(S_t, f_d^t; \theta_d)$ is computed, the gradient $\nabla_{\theta_d} L_d$ is calculated from it as above, and the network weights $\theta_d$ are then updated with a stochastic gradient descent algorithm.
In this embodiment, decoy experience information is extracted from the preset decoy experience data set with equal probability, and the weights of the decoy deep reinforcement learning network are updated with the extracted decoy experience information, ensuring that the user's decoy signal firmly attracts the jamming attack so that the user's communication signal achieves covert communication: the jamming attack is avoided and information is obtained without interference. Covert communication is achieved by sacrificing the decoy signal, improving anti-interference performance while avoiding leakage of the user's own information.
Optionally, as an embodiment of the present invention, in step S9, the process of updating the deep reinforcement learning network according to the preset communication experience data set and, based on the updated network, fitting the environment state information and outputting the optimized communication frequency, and of updating the deep reinforcement learning network according to the preset spoofing experience data set and, based on the updated network, fitting the environment state information and outputting the optimized spoofing frequency, specifically includes:
updating the communication deep reinforcement learning network according to the preset communication experience data set, performing fitting processing on the environment state information based on the updated communication deep reinforcement learning network, and outputting optimized communication frequency;
and updating the spoofing deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated spoofing deep reinforcement learning network, and outputting optimized spoofing frequency.
Specifically, both the optimized communication frequency and the optimized spoofing frequency are obtained through the updated decision selection probabilities: the communication decision selection probability ε_c and the decoy decision selection probability ε_d are each updated according to the formula ε = max(0.01, ε − Δε), where Δε is the attenuation coefficient of the update step.
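As a non-authoritative sketch (the function names and the candidate-frequency count are assumptions), the decayed ε-greedy selection implied by this formula could read:

import numpy as np

rng = np.random.default_rng(1)

def decay_epsilon(eps, delta_eps=0.001):
    # epsilon = max(0.01, epsilon - delta_epsilon)
    return max(0.01, eps - delta_eps)

def select_frequency(q_vals, eps):
    """Greedy choice with probability 1 - eps, uniform exploration otherwise."""
    if rng.random() < eps:
        return int(rng.integers(len(q_vals)))
    return int(np.argmax(q_vals))

eps_c = eps_d = 1.0
eps_c, eps_d = decay_epsilon(eps_c), decay_epsilon(eps_d)
comm_choice = select_frequency(rng.normal(size=16), eps_c)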
In this embodiment, the updated deep reinforcement learning networks fit the environment state information and output the optimized communication frequency and, using the network updated from the preset spoofing experience data set, the optimized spoofing frequency. The user's decoy signal can thus firmly attract the jamming attack so that the user's communication signal achieves covert communication, avoiding the jamming attack and transmitting information free of interference; by sacrificing the decoy signal, covert communication is achieved, the anti-jamming performance is improved, and leakage of the user's own information is avoided.
Optionally, as an embodiment of the present invention, before the method is executed, the user initializes the preset communication experience data set D_c and the preset spoofing experience data set D_d, sets the upper limit of both data sets to N and the upper limit of algorithm iterations to M, initializes the communication decision selection probability ε_c, the decoy decision selection probability ε_d and the reward attenuation coefficients γ_c and γ_d, and sets the communication deep reinforcement learning network parameters ω_c together with the parameters of its interference decision evaluation network, and the spoofing deep reinforcement learning network parameters ω_d together with the parameters of its interference decision evaluation network, to random numbers.
Optionally, as an embodiment of the present invention, the user and the jammer contend over a spectrum bandwidth B = 20 MHz, and both sides can freely select the center frequency at which to transmit; the user signal bandwidth is b_u = 1 MHz with power p_u = 30 dBm; the interference signal bandwidth b_j ∈ [b_s, b_h] can change with the environment state, where b_s = 1 MHz and b_h = 3 MHz, and the interference signal power is p_j = 60 dBm. The user and the jammer each perform full-band sensing every 1 ms and store the sensed spectrum data for 100 ms, i.e., the backtracking length is T = 100 ms. Both sides make a decision every 10 ms: the user applies the method of the present invention to make optimal anti-jamming decisions, while the jammer runs a Q-learning algorithm and learns from the sensed user information to jam as effectively as possible. The user data sets D_c and D_d are initialized with an upper limit N = 1000, the upper limit of algorithm iterations is M = 10000, the strategy selection probabilities are ε_c = ε_d = 1, the reward attenuation coefficients are γ_c = γ_d = 0.8, and the attenuation coefficient of the update step is Δε = 0.001.
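For convenience, the embodiment's parameter values quoted above can be gathered in one place; this sketch merely restates them (the key names are invented for illustration):

SIM_PARAMS = {
    "spectrum_bandwidth_MHz": 20,    # B, the contested band
    "user_bandwidth_MHz": 1,         # b_u, communication and decoy signals
    "user_power_dBm": 30,            # p_u
    "jammer_bandwidth_MHz": (1, 3),  # b_j in [b_s, b_h]
    "jammer_power_dBm": 60,          # p_j
    "sensing_period_ms": 1,          # full-band sensing interval
    "backtrack_length_ms": 100,      # T, stored spectrum history
    "decision_period_ms": 10,        # both sides decide every 10 ms
    "dataset_limit_N": 1000,         # upper limit of D_c and D_d
    "iteration_limit_M": 10000,      # algorithm iteration cap
    "epsilon_init": 1.0,             # eps_c = eps_d
    "gamma": 0.8,                    # gamma_c = gamma_d
    "delta_epsilon": 0.001,          # update-step attenuation coefficient
}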
The final results show that the method of the present invention performs excellently when the user employs it against intelligent jamming with learning ability: the user's decoy decisions firmly attract the jamming attack while the communication decisions remain concealed, which both improves the anti-jamming performance and safeguards the user's information security.
Fig. 2 is a block diagram of a decoy-assisted covert anti-jamming device according to an embodiment of the present invention.
Optionally, as another embodiment of the present invention, as shown in fig. 2, a decoy-assisted covert anti-jamming device includes:
the first judging module is used for acquiring the current moment through the user receiver and judging whether the current moment is earlier than the preset moment, if so, the first signal is sent to the random selection module, and if not, the second signal is sent to the frequency acquisition module;
the random selection module is used for randomly selecting a communication frequency and a decoy frequency within the range from a preset initial frequency to a preset termination frequency according to the first signal and sending the randomly selected communication frequency and the decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency acquisition module is used for directly acquiring the communication frequency and the spoofing frequency from the user receiver according to the second signal and sending the acquired communication frequency and the spoofing frequency to the frequency spectrum sample value sequence processing module;
the frequency spectrum sample value sequence processing module is used for acquiring an interference signal from an interference machine, obtaining a frequency spectrum sample value sequence according to the communication frequency, the decoy frequency and the interference signal, and storing the frequency spectrum sample value sequence into a preset frequency spectrum sample value sequence table;
a final judging module, configured to judge whether the current time is earlier than the preset time again, if so, obtain the next time through a user receiver, and judge whether the next time is earlier than the preset time, if so, return to the random selection module, otherwise, return to the frequency obtaining module, until the current time reaches or is later than the preset time, use the preset frequency spectrum sample value sequence table as environment state information of the current time, perform fitting processing on the environment state information, the preset starting frequency and the preset terminating frequency based on a deep reinforcement learning network, output an initial communication frequency of the current time and an initial spoofing frequency of the current time, store the communication frequency in a preset historical communication frequency sequence table, and store the spoofing frequency in the preset historical spoofing frequency sequence table;
a sending module, configured to send the communication frequency and the spoofing frequency to the user receiver, where the communication frequency is used for the user receiver to control a user transmitter to transmit a communication signal, and the spoofing frequency is used for the user receiver to control the user transmitter to transmit a spoofing signal;
a fitting processing module, configured to perform fitting processing on the communication frequency in the preset historical communication frequency sequence table, the environment state information, and the spoofing frequency in the preset historical spoofing frequency sequence table respectively based on an interference decision evaluation network, output a communication evaluation network error value at the current time and a spoofing evaluation network error value at the current time, and calculate an instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current time, and calculate an instantaneous reward of the spoofing evaluation network error value to obtain a second instantaneous reward at the current time;
the environment state transfer module is used for, when the environment state transfers to the environment state of the next moment, taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it into a preset communication experience data set, taking the environment state information, the second instantaneous reward of the current moment and the initial spoofing frequency of the current moment as the spoofing experience information of the current moment and storing it into a preset spoofing experience data set, and returning to the first judging module until the number of communication experience information items stored in the preset communication experience data set and the number of spoofing experience information items stored in the preset spoofing experience data set both reach a preset upper limit value, whereupon a third signal is sent to the frequency optimization module;
and the frequency optimization module is used for receiving the third signal, updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized communication frequency, updating the deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized spoofing frequency, and sending the optimized communication frequency and the optimized spoofing frequency to the user receiver.
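Purely to illustrate how these modules chain together per decision step, the toy loop below mimics the control flow; all names, the bin-marking spectrum model and the placeholder standing in for the deep-reinforcement-learning fit are assumptions:

import numpy as np

rng = np.random.default_rng(2)

F_START, F_END, N_BINS = 0.0, 20.0, 20   # preset start/stop frequencies in MHz (assumption)
T_PRESET = 5                             # preset moment, in decision steps (assumption)

def random_frequency():
    return float(rng.uniform(F_START, F_END))

def spectrum_samples(f_c, f_d, f_j):
    """Toy stand-in for the first formula: mark the occupied 1 MHz bins."""
    s = np.zeros(N_BINS)
    for f in (f_c, f_d, f_j):
        s[min(int(f), N_BINS - 1)] += 1.0
    return s

spectrum_table, history_c, history_d = [], [], []
for t in range(8):
    if t < T_PRESET:                     # first judging module -> random selection module
        f_c, f_d = random_frequency(), random_frequency()
    else:                                # frequency acquisition / network fitting path
        f_c, f_d = history_c[-1], history_d[-1]   # placeholder for the DRL output
    f_j = random_frequency()             # jammer's choice, as sensed from the channel
    spectrum_table.append(spectrum_samples(f_c, f_d, f_j))   # sample sequence module
    history_c.append(f_c)                # preset historical communication frequency table
    history_d.append(f_d)                # preset historical decoy frequency table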
Optionally, another embodiment of the present invention provides a decoy-assisted covert anti-jamming device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the decoy-assisted covert anti-jamming method described above. The device may be a computer or the like.
Optionally, another embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the decoy-assisted covert anti-jamming method described above.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A decoy-assisted covert anti-jamming method, characterized by comprising the following steps:
S1: acquiring a current moment through a user receiver, and judging whether the current moment is earlier than a preset moment; if so, executing step S2, and if not, executing step S3;
S2: randomly selecting a communication frequency and a decoy frequency within a range from a preset initial frequency to a preset termination frequency, and executing step S4;
S3: directly acquiring a communication frequency and a decoy frequency from the user receiver, and executing step S4;
S4: acquiring an interference signal from an interference machine, obtaining a frequency spectrum sample sequence according to the communication frequency, the decoy frequency and the interference signal, and storing the frequency spectrum sample sequence into a preset frequency spectrum sample sequence table;
S5: judging again whether the current time is earlier than the preset time; if so, acquiring the next time through the user receiver and judging whether that next time is earlier than the preset time, returning to step S2 if it is and to step S3 otherwise, until the current time reaches or is later than the preset time; then taking the preset frequency spectrum sample sequence table as the environment state information of the current time, fitting the environment state information, the preset starting frequency and the preset terminating frequency based on a deep reinforcement learning network, outputting the initial communication frequency of the current time and the initial decoy frequency of the current time, saving the communication frequency into a preset historical communication frequency sequence table, and saving the decoy frequency into a preset historical decoy frequency sequence table;
S6: sending the communication frequency and the decoy frequency to the user receiver, wherein the communication frequency is used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency is used by the user receiver to control the user transmitter to transmit a decoy signal;
S7: respectively fitting the communication frequency in the preset historical communication frequency sequence table, the environment state information, and the decoy frequency in the preset historical decoy frequency sequence table based on an interference decision evaluation network, outputting a communication evaluation network error value of the current time and a decoy evaluation network error value of the current time, calculating the instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward of the current time, and calculating the instantaneous reward of the decoy evaluation network error value to obtain a second instantaneous reward of the current time;
S8: when the environment state transfers to the environment state of the next time, taking the environment state information, the first instantaneous reward of the current time and the initial communication frequency of the current time as the communication experience information of the current time and storing it into a preset communication experience data set, taking the environment state information, the second instantaneous reward of the current time and the initial decoy frequency of the current time as the decoy experience information of the current time and storing it into a preset decoy experience data set, and returning to step S1 until the number of communication experience information items stored in the preset communication experience data set and the number of decoy experience information items stored in the preset decoy experience data set both reach a preset upper limit value, then executing step S9;
S9: updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environment state information based on the updated deep reinforcement learning network and outputting an optimized communication frequency; updating the deep reinforcement learning network according to the preset decoy experience data set, fitting the environment state information based on the updated deep reinforcement learning network and outputting an optimized decoy frequency; and sending the optimized communication frequency and the optimized decoy frequency to the user receiver;
the interference signal includes an interference selection frequency and an interference signal bandwidth, and in step S4, the process of obtaining a spectrum sample sequence according to the communication frequency, the spoofing frequency, and the interference signal specifically includes:
calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through a first formula to obtain the frequency spectrum sample sequence, wherein the first formula is:

s_t = {s_{1,t}, s_{2,t}, …, s_{n,t}},

wherein s_{n,t} = ∫_{f_s+(n−1)Δf}^{f_s+nΔf} H_t(f) df,

wherein H_t(f) = g_u · U_t(f) + g_j · J_t(f) + n(f),

wherein U_t(f) = (p_u/b_u) · [rect((f − f_t^c)/b_u) + rect((f − f_t^d)/b_u)],

wherein J_t(f) = (p_j/b_t^j) · rect((f − f_t^j)/b_t^j), with rect(x) = 1 for |x| ≤ 1/2 and 0 otherwise,

wherein a_t^u = {f_t^c, f_t^d} and a_t^j = {f_t^j, b_t^j},

wherein s_t is the frequency spectrum sample sequence, s_{n,t} is the nth spectrum sample, Δf is the spectral resolution, f_s is the starting frequency, H_t(f) is the power spectral density, g_u is the channel gain between the user transmitter and the user receiver, g_j is the channel gain between the jammer and the user receiver, n(f) is the noise, U_t(f) is the user signal, J_t(f) is the interference signal, a_t^u is the user decision, p_u is the power at which the communication signal and the decoy signal are transmitted, b_u is the spectral bandwidth of the communication signal and the decoy signal, p_j is the interference signal power, a_t^j is the interference decision, f_t^c is the communication decision, i.e. the communication frequency, at time t, f_t^d is the decoy decision, i.e. the decoy frequency, at time t, f_t^j is the frequency selected by the interference, and b_t^j is the interference signal bandwidth.
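For illustration only (not part of the claims), the first formula can be evaluated numerically under the assumption of rectangular power spectral densities; every default value below is a placeholder:

import numpy as np

def rect_psd(f, center, bw, power):
    """Rectangular PSD of a signal of total power `power` spread over bandwidth `bw`."""
    return np.where(np.abs(f - center) <= bw / 2.0, power / bw, 0.0)

def spectrum_sample_sequence(f_c, f_d, f_j, b_j, *, f_s=0.0, delta_f=1.0, n=20,
                             p_u=1.0, p_j=1000.0, g_u=1.0, g_j=1.0, b_u=1.0,
                             noise_psd=1e-9, pts=64):
    """s_t = {s_1,t, ..., s_n,t}; each sample integrates H_t(f) over one bin of width delta_f."""
    samples = []
    for k in range(n):
        f = np.linspace(f_s + k * delta_f, f_s + (k + 1) * delta_f, pts)
        U = rect_psd(f, f_c, b_u, p_u) + rect_psd(f, f_d, b_u, p_u)  # communication + decoy
        J = rect_psd(f, f_j, b_j, p_j)                               # interference signal
        H = g_u * U + g_j * J + noise_psd                            # H_t(f)
        samples.append(float(np.sum(H) * (f[1] - f[0])))             # Riemann-sum integral
    return np.asarray(samples)

s_t = spectrum_sample_sequence(f_c=3.0, f_d=11.0, f_j=11.2, b_j=2.0)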
2. The decoy-assisted covert anti-jamming method according to claim 1, wherein the deep reinforcement learning network comprises a communication deep reinforcement learning network and a decoy deep reinforcement learning network, and the step S5 of fitting the environmental state information, the preset starting frequency and the preset terminating frequency based on the deep reinforcement learning network and outputting the initial communication frequency at the current time and the initial decoy frequency at the current time specifically comprises:
extracting the characteristics of the environment state information based on the communication deep reinforcement learning network, outputting a plurality of initial communication Q values at the current moment, and screening out the maximum communication Q value from the plurality of communication Q values;
selecting a frequency within the range from the preset initial frequency to the preset termination frequency according to the maximum communication Q value to obtain the initial communication frequency at the current moment;
extracting the characteristics of the environment state information based on the decoy deep reinforcement learning network, outputting a plurality of initial decoy Q values at the current moment, and screening out the maximum decoy Q value from the plurality of decoy Q values;
and selecting a frequency within the range from the preset initial frequency to the preset termination frequency according to the maximum decoy Q value to obtain the initial decoy frequency at the current moment.
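Again for illustration only, the mapping from the maximum Q value to a frequency in the preset range might look as follows; the bin-center convention is an assumption:

import numpy as np

def q_to_frequency(q_vals, f_start=0.0, f_stop=20.0):
    """Return the center frequency of the bin whose Q value is maximal."""
    idx = int(np.argmax(q_vals))
    step = (f_stop - f_start) / len(q_vals)
    return f_start + (idx + 0.5) * step

initial_comm_freq = q_to_frequency(np.random.default_rng(3).normal(size=20))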
3. The decoy-assisted covert anti-jamming method according to claim 1, wherein the interference decision evaluation network comprises a communication interference decision evaluation network and a decoy interference decision evaluation network, and the process of step S7 specifically includes:

performing feature extraction on the communication frequency in the preset historical communication frequency sequence table based on the communication interference decision evaluation network, and outputting the communication fitting environment state at the current moment;

calculating the communication fitting environment state and the environment state information through a second formula to obtain the communication evaluation network error value at the current moment, wherein the second formula is:

e_t^c = || S_t − Ŝ_t^c ||,

wherein e_t^c is the communication evaluation network error value, S_t is the environment state information at time t, and Ŝ_t^c is the communication fitting environment state at time t;

calculating the instantaneous reward of the communication evaluation network error value through a third formula to obtain the first instantaneous reward at the current moment, wherein the third formula is:

r_t^c = e_t^c,

wherein e_t^c is the communication evaluation network error value and r_t^c is the first instantaneous reward;

performing feature extraction on the decoy frequency in the preset historical decoy frequency sequence table based on the decoy interference decision evaluation network, and outputting the decoy fitting environment state at the current moment;

calculating the decoy fitting environment state and the environment state information through a fourth formula to obtain the decoy evaluation network error value at the current moment, wherein the fourth formula is:

e_t^d = || S_t − Ŝ_t^d ||,

wherein e_t^d is the decoy evaluation network error value, S_t is the environment state information at time t, and Ŝ_t^d is the decoy fitting environment state at time t;

calculating the instantaneous reward of the decoy evaluation network error value through a fifth formula to obtain the second instantaneous reward at the current moment, wherein the fifth formula is:

r_t^d = −e_t^d,

wherein r_t^d is the second instantaneous reward and e_t^d is the decoy evaluation network error value.
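For illustration only: one consistent reading of the second through fifth formulas is that the communication reward grows with the jammer's fitting error (the communication pattern is hard to predict), while the decoy reward grows as that error shrinks (the decoy pattern is easy to fit and attack). The norm and the signs in this sketch are assumptions, not the claimed formulas:

import numpy as np

def evaluation_error(state, fitted_state):
    """Error between the true environment state and an evaluation network's fit."""
    return float(np.linalg.norm(state - fitted_state))

def instant_rewards(state, fit_c, fit_d):
    e_c = evaluation_error(state, fit_c)   # communication evaluation network error
    e_d = evaluation_error(state, fit_d)   # decoy evaluation network error
    return e_c, -e_d                       # first and second instantaneous rewards

r_c, r_d = instant_rewards(np.ones(4), np.zeros(4), np.ones(4))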
4. The decoy-assisted covert anti-jamming method according to claim 2, wherein the process of step S8 specifically comprises:

when the environment state transfers to the environment state of the next moment, executing steps S1 to S5 so as to obtain the environment state information of the next moment;

taking the environment state information at the current moment, the first instantaneous reward at the current moment, the initial communication frequency at the current moment and the environment state information at the next moment as communication experience information, and storing the communication experience information into a preset communication experience data set, specifically:

setting up a communication experience data set D_c storing at most N pieces of communication experience information;

defining the communication experience information as E_t^c = (S_t, f_t^c, r_t^c, S_{t+1}),

wherein t is the time, S_t is the environment state at time t, S_{t+1} is the environment state at time t+1, f_t^c is the communication decision, i.e. the communication frequency, at time t, and r_t^c is the first instantaneous reward at time t;

taking the environment state information at the current moment, the second instantaneous reward at the current moment, the initial decoy frequency at the current moment and the environment state information at the next moment as decoy experience information, and storing the decoy experience information into a preset decoy experience data set;

when the number of communication experience information items stored in the preset communication experience data set and the number of decoy experience information items stored in the preset decoy experience data set both reach the preset upper limit value, extracting the communication experience information from the preset communication experience data set in an equal-probability manner and updating the weights ω_c of the communication deep reinforcement learning network through the extracted communication experience information, and extracting the decoy experience information from the preset decoy experience data set in an equal-probability manner and updating the weights ω_d of the decoy deep reinforcement learning network through the extracted decoy experience information.
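A minimal sketch of an experience data set with the stated capacity N (the class and method names are invented for illustration):

from collections import deque

N = 1000   # preset upper limit of the experience data sets

class ExperienceSet:
    """Fixed-capacity store of (S_t, frequency decision, reward, S_{t+1}) tuples."""
    def __init__(self, capacity=N):
        self.buf = deque(maxlen=capacity)
    def append(self, s_t, f_t, r_t, s_next):
        self.buf.append((s_t, f_t, r_t, s_next))
    def full(self):
        return len(self.buf) == self.buf.maxlen

D_c, D_d = ExperienceSet(), ExperienceSet()   # communication and decoy sets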
5. The decoy-assisted covert anti-jamming method according to claim 4, wherein the process of taking the environment state information at the current moment, the second instantaneous reward at the current moment, the initial decoy frequency at the current moment and the environment state information at the next moment as the decoy experience information and storing the decoy experience information into a preset decoy experience data set specifically comprises:

setting up a decoy experience data set D_d storing at most N pieces of decoy experience information;

defining the decoy experience information as E_t^d = (S_t, f_t^d, r_t^d, S_{t+1}),

wherein t is the time, S_t is the environment state at time t, S_{t+1} is the environment state at time t+1, f_t^d is the decoy decision, i.e. the decoy frequency, at time t, and r_t^d is the second instantaneous reward at time t.
6. The decoy-assisted covert anti-jamming method according to claim 4, wherein the process of extracting the communication experience information from the preset communication experience data set in an equal-probability manner and updating the weights ω_c of the communication deep reinforcement learning network through the extracted communication experience information comprises the following steps:

respectively extracting the communication experience information from the preset communication experience data set in an equal-probability manner;

constructing a communication target value according to the extracted communication experience information, wherein the communication target value is:

y_t^c = r_t^c + γ_c · max_{f_{t+1}^c} Q_c(S_{t+1}, f_{t+1}^c; ω_c),

wherein y_t^c is the communication target value, r_t^c is the first instantaneous reward at time t, γ_c denotes the communication reward attenuation factor, max_{f_{t+1}^c} Q_c(S_{t+1}, f_{t+1}^c; ω_c) denotes the maximum communication Q value output by the communication deep reinforcement learning network in the environment S_{t+1}, and f_{t+1}^c is the communication frequency at time t+1;

calculating the gradient ∇ω_c of the communication deep reinforcement learning network according to a sixth formula and the communication target value, the sixth formula being:

∇ω_c = E[ δ_t^c · ∂Q_c(S_t, f_t^c; ω_c)/∂ω_c ],

wherein δ_t^c = y_t^c − Q_c(S_t, f_t^c; ω_c),

wherein ∂/∂ω_c denotes taking the partial derivative, f_t^c is the communication frequency at time t, δ_t^c is the communication error value, E is the expectation, S_t is the environment state at time t, y_t^c is the communication target value, Q_c(S_t, f_t^c; ω_c) is the communication Q value output by the communication deep reinforcement learning network in the environment S_t, and ∇ω_c is the gradient of the communication deep reinforcement learning network;

updating the communication deep reinforcement learning network along this gradient by a stochastic gradient descent algorithm to obtain the weights ω_c of the communication deep reinforcement learning network, until all the communication experience information in the preset communication experience data set has been extracted.
7. The decoy-assisted covert anti-jamming method according to claim 5, wherein the process of extracting the decoy experience information from the preset decoy experience data set in an equal-probability manner and updating the weights ω_d of the decoy deep reinforcement learning network through the extracted decoy experience information comprises the following steps:

respectively extracting the decoy experience information from the preset decoy experience data set in an equal-probability manner;

constructing a decoy target value according to the extracted decoy experience information, wherein the decoy target value is:

y_t^d = r_t^d + γ_d · max_{f_{t+1}^d} Q_d(S_{t+1}, f_{t+1}^d; ω_d),

wherein γ_d denotes the decoy reward attenuation factor, r_t^d is the second instantaneous reward at time t, y_t^d is the decoy target value, max_{f_{t+1}^d} Q_d(S_{t+1}, f_{t+1}^d; ω_d) denotes the maximum decoy Q value output by the decoy deep reinforcement learning network in the environment S_{t+1}, and f_{t+1}^d is the decoy frequency at time t+1;

calculating the gradient ∇ω_d of the decoy deep reinforcement learning network according to a seventh formula and the decoy target value, the seventh formula being:

∇ω_d = E[ δ_t^d · ∂Q_d(S_t, f_t^d; ω_d)/∂ω_d ],

wherein δ_t^d = y_t^d − Q_d(S_t, f_t^d; ω_d),

wherein ∂/∂ω_d denotes taking the partial derivative, y_t^d is the decoy target value, δ_t^d is the decoy error value, E is the expectation, S_t is the environment state at time t, Q_d(S_t, f_t^d; ω_d) is the decoy Q value output by the decoy deep reinforcement learning network in the environment S_t, and ∇ω_d is the gradient of the decoy deep reinforcement learning network;

updating the decoy deep reinforcement learning network along this gradient by a stochastic gradient descent algorithm to obtain the weights ω_d of the decoy deep reinforcement learning network, until all the decoy experience information in the preset decoy experience data set has been extracted.
8. A decoy-assisted covert anti-jamming device, comprising:
the first judging module is used for acquiring the current moment through a user receiver and judging whether the current moment is earlier than a preset moment or not, if so, the first signal is sent to the random selection module, and if not, the second signal is sent to the frequency acquisition module;
the random selection module is used for randomly selecting a communication frequency and a decoy frequency within the range from a preset initial frequency to a preset termination frequency according to the first signal and sending the randomly selected communication frequency and the decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency acquisition module is used for directly acquiring the communication frequency and the spoofing frequency from the user receiver according to the second signal and sending the acquired communication frequency and the spoofing frequency to the frequency spectrum sample value sequence processing module;
the frequency spectrum sample value sequence processing module is used for acquiring an interference signal from an interference machine, obtaining a frequency spectrum sample value sequence according to the communication frequency, the decoy frequency and the interference signal, and storing the frequency spectrum sample value sequence into a preset frequency spectrum sample value sequence table;
a final judging module, configured to judge whether the current time is earlier than the preset time again, if so, obtain the next time through a user receiver, and judge whether the next time is earlier than the preset time, if so, return to the random selection module, otherwise, return to the frequency obtaining module, until the current time reaches or is later than the preset time, use the preset frequency spectrum sample value sequence table as environment state information of the current time, perform fitting processing on the environment state information, the preset starting frequency and the preset terminating frequency based on a deep reinforcement learning network, output an initial communication frequency of the current time and an initial spoofing frequency of the current time, store the communication frequency in a preset historical communication frequency sequence table, and store the spoofing frequency in the preset historical spoofing frequency sequence table;
a sending module, configured to send the communication frequency and the spoofing frequency to the user receiver, where the communication frequency is used for the user receiver to control a user transmitter to transmit a communication signal, and the spoofing frequency is used for the user receiver to control the user transmitter to transmit a spoofing signal;
a fitting processing module, configured to perform fitting processing on the communication frequency in the preset historical communication frequency sequence table, the environment state information, and the spoofing frequency in the preset historical spoofing frequency sequence table respectively based on an interference decision evaluation network, output a communication evaluation network error value at the current time and a spoofing evaluation network error value at the current time, and calculate an instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current time, and calculate an instantaneous reward of the spoofing evaluation network error value to obtain a second instantaneous reward at the current time;
the environment state transfer module is used for, when the environment state transfers to the environment state of the next moment, taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it into a preset communication experience data set, taking the environment state information, the second instantaneous reward of the current moment and the initial spoofing frequency of the current moment as the spoofing experience information of the current moment and storing it into a preset spoofing experience data set, and returning to the first judging module until the number of communication experience information items stored in the preset communication experience data set and the number of spoofing experience information items stored in the preset spoofing experience data set both reach a preset upper limit value, whereupon a third signal is sent to the frequency optimization module;
the frequency optimization module is used for receiving the third signal, updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized communication frequency, updating the deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized spoofing frequency, and sending the optimized communication frequency and the optimized spoofing frequency to the user receiver;
the interference signal comprises an interference selection frequency and an interference signal bandwidth, and the spectrum sample sequence processing module is specifically configured to:
calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through a first formula to obtain the frequency spectrum sample sequence, wherein the first formula is:

s_t = {s_{1,t}, s_{2,t}, …, s_{n,t}},

wherein s_{n,t} = ∫_{f_s+(n−1)Δf}^{f_s+nΔf} H_t(f) df,

wherein H_t(f) = g_u · U_t(f) + g_j · J_t(f) + n(f),

wherein U_t(f) = (p_u/b_u) · [rect((f − f_t^c)/b_u) + rect((f − f_t^d)/b_u)],

wherein J_t(f) = (p_j/b_t^j) · rect((f − f_t^j)/b_t^j), with rect(x) = 1 for |x| ≤ 1/2 and 0 otherwise,

wherein a_t^u = {f_t^c, f_t^d} and a_t^j = {f_t^j, b_t^j},

wherein s_t is the frequency spectrum sample sequence, s_{n,t} is the nth spectrum sample, Δf is the spectral resolution, f_s is the starting frequency, H_t(f) is the power spectral density, g_u is the channel gain between the user transmitter and the user receiver, g_j is the channel gain between the jammer and the user receiver, n(f) is the noise, U_t(f) is the user signal, J_t(f) is the interference signal, a_t^u is the user decision, p_u is the power at which the communication signal and the decoy signal are transmitted, b_u is the spectral bandwidth of the communication signal and the decoy signal, p_j is the interference signal power, a_t^j is the interference decision, f_t^c is the communication decision, i.e. the communication frequency, at time t, f_t^d is the decoy decision, i.e. the decoy frequency, at time t, f_t^j is the frequency selected by the interference, and b_t^j is the interference signal bandwidth.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the decoy-assisted covert anti-jamming method of any one of claims 1 to 7.