CN113395129B - Decoy-assisted hidden anti-interference method, device and storage medium - Google Patents
- Publication number
- CN113395129B CN113395129B CN202110547565.1A CN202110547565A CN113395129B CN 113395129 B CN113395129 B CN 113395129B CN 202110547565 A CN202110547565 A CN 202110547565A CN 113395129 B CN113395129 B CN 113395129B
- Authority
- CN
- China
- Prior art keywords
- frequency
- communication
- preset
- decoy
- spoofing
- Prior art date
- Legal status
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/40—Jamming having variable characteristics
- H04K3/42—Jamming having variable characteristics characterized by the control of the jamming frequency or wavelength
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/60—Jamming involving special techniques
- H04K3/65—Jamming involving special techniques using deceptive jamming or spoofing, e.g. transmission of false signals for premature triggering of RCIED, for forced connection or disconnection to/from a network or for generation of dummy target signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/80—Jamming or countermeasure characterized by its function
- H04K3/82—Jamming or countermeasure characterized by its function related to preventing surveillance, interception or detection
- H04K3/825—Jamming or countermeasure characterized by its function related to preventing surveillance, interception or detection by jamming
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
- Noise Elimination (AREA)
Abstract
The invention provides a decoy-assisted covert anti-interference method, a device and a storage medium. The method comprises: acquiring the current moment through a user receiver and judging whether it is earlier than a preset moment; if so, randomly selecting a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency, and if not, directly acquiring the communication frequency and the decoy frequency from the user receiver; and obtaining a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and storing the spectrum sample sequence in a preset spectrum sample sequence table. The invention ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal thereby achieves covert communication: the communication avoids the jamming attack and its information cannot be acquired by the jammer. By sacrificing the decoy signal, covert communication is realized, anti-interference performance is improved, and leakage of the user's own information is avoided.
Description
Technical Field
The invention relates generally to the technical field of communication anti-interference, and in particular to a decoy-assisted covert anti-interference method, device and storage medium.
Background
Wireless networks are developing rapidly, and their share of people's daily life and of military communication grows year by year. Wireless networks bring many benefits to life and work, but the openness of their propagation medium also makes them vulnerable to jamming. With the advance of science and technology, intelligent jamming with learning ability seriously affects wireless communication, and countering it has become one of the popular directions for anti-jamming researchers. At present, most anti-jamming techniques are based on an evasion strategy, i.e., avoiding the jamming attack as far as possible. Although such methods currently perform well, they leak the communication user's own information, and as the jammer keeps learning the user's information, the anti-jamming performance drops markedly.
Some researchers have proposed a covert anti-jamming idea: use ambient signals to conceal the user's own communication signal, so that the user's information cannot be acquired by the jammer and the jammer can neither learn it nor apply targeted jamming. This model, however, must assume that strong ambient signals are always present nearby, i.e., that the jammer has a sensing blind area. In reality, ambient signals that can mask the user's information do not exist at all times, so how to achieve covert anti-jamming without relying on the environment is a significant research direction.
Disclosure of Invention
The invention aims to solve the above technical problems of the prior art by providing a decoy-assisted covert anti-interference method, a device and a storage medium.
The technical scheme of the present invention for solving the above technical problems is as follows: a decoy-assisted covert anti-interference method, comprising the following steps:
S1: acquiring the current moment through a user receiver and judging whether the current moment is earlier than a preset moment; if so, executing step S2, and if not, executing step S3;
S2: randomly selecting a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency, and executing step S4;
S3: directly acquiring the communication frequency and the decoy frequency from the user receiver, and executing step S4;
S4: acquiring an interference signal from a jammer, obtaining a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and storing the spectrum sample sequence in a preset spectrum sample sequence table;
S5: judging again whether the current moment is earlier than the preset moment; if so, acquiring the next moment through the user receiver and judging whether that moment is earlier than the preset moment, returning to step S2 if it is and to step S3 otherwise, until the current moment reaches or is later than the preset moment; then taking the preset spectrum sample sequence table as the environment state information of the current moment, fitting the environment state information, the preset start frequency and the preset end frequency based on a deep reinforcement learning network, outputting the initial communication frequency of the current moment and the initial decoy frequency of the current moment, saving the communication frequency in a preset historical communication frequency sequence table, and saving the decoy frequency in a preset historical decoy frequency sequence table;
S6: sending the communication frequency and the decoy frequency to the user receiver, wherein the user receiver uses the communication frequency to control a user transmitter to transmit a communication signal and uses the decoy frequency to control the user transmitter to transmit a decoy signal;
S7: fitting, based on an interference decision evaluation network, the communication frequency in the preset historical communication frequency sequence table and the decoy frequency in the preset historical decoy frequency sequence table, each together with the environment state information; outputting a communication evaluation network error value of the current moment and a decoy evaluation network error value of the current moment; and calculating the instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward of the current moment, and the instantaneous reward of the decoy evaluation network error value to obtain a second instantaneous reward of the current moment;
S8: when the environment state transfers to the environment state of the next moment, taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it in a preset communication experience data set; taking the environment state information, the second instantaneous reward of the current moment and the initial decoy frequency of the current moment as the decoy experience information of the current moment and storing it in a preset decoy experience data set; and returning to step S1 until the numbers of entries stored in the preset communication experience data set and the preset decoy experience data set reach a preset upper limit, then executing step S9;
S9: updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environment state information based on the updated network and outputting an optimized communication frequency; updating the deep reinforcement learning network according to the preset decoy experience data set, fitting the environment state information based on the updated network and outputting an optimized decoy frequency; and sending the optimized communication frequency and the optimized decoy frequency to the user receiver.
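As a hedged illustration only (function names, frequency ranges and the step-count stand-in for the preset moment are all hypothetical), the warm-up logic of steps S1 to S4 (explore randomly before the preset moment, then take the frequencies held by the receiver, and store each observation in the spectrum sample sequence table) might be sketched as:

```python
import random

F_START, F_END = 100.0, 200.0   # preset start/end frequencies (illustrative)
PRESET_STEP = 50                # the "preset moment", expressed here as a step index

def select_frequencies(t, receiver_freqs):
    """S1-S3: before the preset moment, explore randomly (S2);
    afterwards, take the frequencies held by the user receiver (S3)."""
    if t < PRESET_STEP:
        return random.uniform(F_START, F_END), random.uniform(F_START, F_END)
    return receiver_freqs

def accumulate_spectrum_table(num_steps, receiver_freqs=(150.0, 160.0)):
    """S4: combine each step's communication frequency, decoy frequency and the
    observed jamming frequency into an entry of the preset spectrum sample
    sequence table."""
    table = []
    for t in range(num_steps):
        f_c, f_d = select_frequencies(t, receiver_freqs)
        f_jam = random.uniform(F_START, F_END)   # stand-in for the sensed jammer
        table.append((f_c, f_d, f_jam))
    return table

spectrum_table = accumulate_spectrum_table(60)
```

The random phase before `PRESET_STEP` corresponds to the experience accumulation described in the beneficial effects.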
Another technical solution of the present invention for solving the above technical problems is as follows: a decoy-assisted covert anti-interference device, comprising:
the first judging module is used for acquiring the current moment through a user receiver and judging whether the current moment is earlier than a preset moment; if so, it sends a first signal to the random selection module, and if not, it sends a second signal to the frequency acquisition module;
the random selection module is used for randomly selecting a communication frequency and a decoy frequency within the range from a preset initial frequency to a preset termination frequency according to the first signal and sending the randomly selected communication frequency and the decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency acquisition module is used for directly acquiring communication frequency and decoy frequency from the user receiver according to the second signal and sending the acquired communication frequency and decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency spectrum sample value sequence processing module is used for acquiring an interference signal from an interference machine, obtaining a frequency spectrum sample value sequence according to the communication frequency, the decoy frequency and the interference signal, and storing the frequency spectrum sample value sequence into a preset frequency spectrum sample value sequence table;
a final judging module, configured to judge whether the current time is earlier than the preset time again, if so, obtain the next time through a user receiver, and judge whether the next time is earlier than the preset time, if so, return to the random selection module, otherwise, return to the frequency obtaining module, until the current time reaches or is later than the preset time, use the preset frequency spectrum sample value sequence table as environment state information of the current time, perform fitting processing on the environment state information, the preset starting frequency and the preset terminating frequency based on a deep reinforcement learning network, output an initial communication frequency of the current time and an initial spoofing frequency of the current time, store the communication frequency in a preset historical communication frequency sequence table, and store the spoofing frequency in the preset historical spoofing frequency sequence table;
a sending module, configured to send the communication frequency and the spoofing frequency to the user receiver, where the communication frequency is used for the user receiver to control a user transmitter to transmit a communication signal, and the spoofing frequency is used for the user receiver to control the user transmitter to transmit a spoofing signal;
a fitting processing module, configured to perform fitting processing on the communication frequency in the preset historical communication frequency sequence table, the environment state information, and the spoofing frequency in the preset historical spoofing frequency sequence table respectively based on an interference decision evaluation network, output a communication evaluation network error value at the current time and a spoofing evaluation network error value at the current time, and calculate an instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current time, and calculate an instantaneous reward of the spoofing evaluation network error value to obtain a second instantaneous reward at the current time;
the environment state transfer module is used for, when the environment state transfers to the environment state of the next moment, taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it in a preset communication experience data set; taking the environment state information, the second instantaneous reward of the current moment and the initial decoy frequency of the current moment as the decoy experience information of the current moment and storing it in a preset decoy experience data set; and returning to the first judging module until the numbers of entries stored in the preset communication experience data set and the preset decoy experience data set reach a preset upper limit, then sending a third signal to the frequency optimization module;
and the frequency optimization module is used for receiving the third signal, updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized communication frequency, updating the deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized spoofing frequency, and sending the optimized communication frequency and the optimized spoofing frequency to the user receiver.
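The frequency optimization module's update from the experience data sets is a standard deep-reinforcement-learning step; the text does not spell out the loss, so the sketch below substitutes a tabular Q-learning update over discretized frequency bins purely to illustrate the direction of the optimization (all names and constants are assumptions):

```python
NUM_FREQS = 10           # discretized frequency bins (illustrative)
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

def update_from_experience(q_values, experience):
    """Replay-style update over stored (reward, action) tuples collected as in
    the environment state transfer module; a real embodiment fits a deep
    network, and this tabular form only illustrates the update direction."""
    for reward, action in experience:
        best_next = max(q_values)                 # greedy bootstrap value
        td_target = reward + GAMMA * best_next    # temporal-difference target
        q_values[action] += ALPHA * (td_target - q_values[action])
    return q_values

q = [0.0] * NUM_FREQS
q = update_from_experience(q, [(1.0, 3), (0.5, 3), (-1.0, 7)])
optimized_frequency = q.index(max(q))  # the optimized (communication) frequency bin
```

The same update, run over the decoy experience data set, would yield the optimized decoy frequency.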
Another technical solution of the present invention for solving the above technical problems is as follows: a decoy-assisted covert anti-interference device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program, when executed by the processor, implementing the decoy-assisted covert anti-interference method described above.
Another technical solution of the present invention for solving the above technical problems is as follows: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the decoy-assisted covert anti-interference method described above.
The beneficial effects of the invention are as follows: the current moment is acquired through the user receiver; when it is earlier than the preset moment, a communication frequency and a decoy frequency are randomly selected within the range from the preset start frequency to the preset end frequency, accumulating experience and providing data for subsequent processing; when it is equal to or later than the preset moment, the communication frequency and the decoy frequency are directly acquired from the user receiver. This ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal thereby achieves covert communication, avoiding the jamming attack and preventing the user's information from being acquired by the jammer. By sacrificing the decoy signal, covert communication is realized and anti-interference performance is improved while leakage of the user's own information is avoided.
Drawings
FIG. 1 is a flowchart of a decoy-assisted covert anti-interference method according to an embodiment of the present invention;
fig. 2 is a block diagram of a decoy-assisted covert anti-interference device according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic flowchart of a decoy-assisted covert anti-interference method according to an embodiment of the present invention.
As shown in fig. 1, the decoy-assisted covert anti-interference method comprises steps S1 to S9 as set forth above in the Disclosure of Invention.
It should be understood that the interference decision evaluation network refers to a GAN, i.e., a generative adversarial network.
It should be appreciated that the first instantaneous reward can be an instantaneous reward of the communication deep reinforcement learning network, and the second instantaneous reward can be an instantaneous reward of the decoy deep reinforcement learning network.
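The mapping from evaluation network error values to instantaneous rewards is not given explicitly here; one plausible reading, sketched below with hypothetical names, is that a communication frequency the jammer-mimicking evaluation network predicts poorly (large error) is well hidden and earns a high first reward, while a decoy frequency it predicts well (small error) keeps attracting the jamming and earns a high second reward:

```python
def instantaneous_rewards(comm_error, decoy_error):
    """Hypothetical reward shaping (not stated in the text): the communication
    signal should be hard for the jammer-mimicking evaluation network to
    predict, so a larger communication error value earns a larger first
    reward; the decoy signal should stay predictable enough to keep
    attracting the jamming attack, so a smaller decoy error value earns a
    larger second reward."""
    first_reward = comm_error     # reward unpredictability of the communication
    second_reward = -decoy_error  # reward predictability of the decoy
    return first_reward, second_reward

r1, r2 = instantaneous_rewards(comm_error=0.8, decoy_error=0.1)
```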
Specifically, in step S2 the obtained communication frequency and decoy frequency are sent to the user transmitter, and in step S4 the user receiver receives the interference signal transmitted by the jammer as well as the communication signal and decoy signal transmitted by the user transmitter, where the communication signal comprises the communication frequency and the communication bandwidth, and the decoy signal comprises the decoy frequency and the decoy bandwidth.
It should be understood that if the preset moment is set to 10:00, then step S2 is executed when the current moment is 9:58, and step S3 is executed when the current moment is 10:01.
It should be understood that a scenario is considered in which one user and one jammer compete within a communication band of bandwidth B: one pair consisting of the user transmitter and the user receiver constitutes the user, and there is one jammer in the system that jams the user's communication.
It should be understood that the user transmits signals (i.e., the communication signal and the decoy signal); the jammer learns from the sensed environment state, makes a targeted jamming decision, and releases the jamming signal; the user receives the communication signal and decoy signal transmitted by the user transmitter together with the jamming signal, learns from the received environment state information to obtain the corresponding communication frequency and decoy frequency, and the user receiver then transmits control information to the user transmitter so that it transmits the corresponding signals.
It should be appreciated that, since the environment state cannot yet be extended in the time dimension in the early stage of the confrontation, the user begins by randomly selecting frequencies and accumulating experience until the environment state can be extended in the time dimension.
Specifically, considering that the decisions of both confronting parties are related to the environment state over a long past period, the environment state is defined as S_t = {s_t, s_(t-1), ..., s_(t-T+1)}, where s_t is the spectrum sample sequence and T denotes the backtracking time length. The user learns from the environment state information S_t and communicates the result to the user transmitter over a control link.
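The backtracked state S_t is simply a sliding window over the last T spectrum sample sequences; it might be maintained as follows (class and variable names are illustrative):

```python
from collections import deque

T = 4  # backtracking time length

class EnvironmentState:
    """Keeps the most recent T spectrum sample sequences, newest first,
    mirroring S_t = {s_t, s_(t-1), ..., s_(t-T+1)}."""
    def __init__(self, T):
        self.window = deque(maxlen=T)  # oldest entry drops out automatically

    def push(self, spectrum_samples):
        self.window.appendleft(spectrum_samples)  # s_t goes to the front

    def state(self):
        return list(self.window)

env = EnvironmentState(T)
for t in range(6):            # six time steps; only the last T are kept
    env.push([float(t)])      # stand-in spectrum sample sequence s_t
S_t = env.state()
```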
In this embodiment, the current moment is acquired through the user receiver; when it is earlier than the preset moment, a communication frequency and a decoy frequency are randomly selected within the range from the preset start frequency to the preset end frequency, accumulating experience and providing data for subsequent processing; when it is equal to or later than the preset moment, the communication frequency and the decoy frequency are directly acquired from the user receiver. This ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal achieves covert communication, avoiding the jamming attack and preventing the user's information from being acquired through interference; at the same time, by sacrificing the decoy signal, covert communication is realized, the anti-interference performance is improved, and leakage of the user's own information is avoided.
Optionally, as an embodiment of the present invention, the interference signal comprises an interference selection frequency and an interference signal bandwidth, and obtaining the spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal in step S4 specifically comprises:
calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through a first equation to obtain the spectrum sample sequence, wherein the first equation is as follows:
s_t = {s_(1,t), s_(2,t), ..., s_(n,t)}, with s_(n,t) = ∫ from f_s+(n-1)Δf to f_s+nΔf of H_t(f) df,
wherein H_t(f) = g_u·U_t(f) + g_j·J_t(f) + n(f),
where s_t is the spectrum sample sequence; s_(n,t) is the n-th spectrum sample; Δf is the spectral resolution; f_s is the start frequency; H_t(f) is the power spectral density; g_u is the channel gain between the user transmitter and the user receiver; g_j is the channel gain between the jammer and the user receiver; n(f) is the noise; U_t(f) is the user signal; J_t(f) is the interference signal; a_t^u = {f_c^t, f_d^t} is the user decision; p_u is the power with which the communication signal and the decoy signal are transmitted; b_u is the spectral bandwidth of the communication signal and the decoy signal; a_t^j = {f_j^t, b_j^t} is the interference decision; f_c^t is the communication frequency at time t; f_d^t is the decoy frequency at time t; f_j^t is the interference selection frequency; and b_j^t is the interference signal bandwidth.
Understandably, in the formula, H_t(f) and H_t(f+f_s) differ only in the value substituted for the argument: H_t(f) substitutes f, whereas H_t(f+f_s) substitutes f+f_s.
Specifically, the start and end frequencies of the communication band in which the user and the jammer operate are f_s and f_e. The user can select frequencies f_c and f_d as the communication center frequency and the decoy center frequency respectively, with f_c ∈ [f_s, f_e] and f_d ∈ [f_s, f_e], and transmits the communication signal and the decoy signal with power p_u; the user signal has a spectral bandwidth of b_u. The user's decision considers only the change of frequency, i.e., the communication frequency and the decoy frequency: a_t^u = {f_c^t, f_d^t}, where f_c^t, f_d^t ∈ [f_s, f_e]; the user signal U_t(f) is thus defined as a power spectral density of p_u/b_u on the bands of width b_u centered at f_c^t and f_d^t, and zero elsewhere. The jammer can likewise freely select a frequency f_j within the communication band [f_s, f_e] as the center frequency at which the interference signal is transmitted (i.e., the interference selection frequency), and its signal bandwidth b_j ∈ [b_s, b_h] can change according to the sensed state, where b_s and b_h are the minimum and maximum bandwidths of the interference signal respectively. The interference decision comprises the interference selection frequency and the interference signal bandwidth: a_t^j = {f_j^t, b_j^t}; the interference signal J_t(f) is thus defined as a power spectral density concentrated on the band of width b_j^t centered at f_j^t, and zero elsewhere.
The communicating party is equipped with a sensing device that can sense the spectrum of the entire communication band in real time; considering the coexistence of the user signal and the interference signal, the PSD of the signal at the receiving end is expressed as:
H_t(f) = g_u·U_t(f) + g_j·J_t(f) + n(f)
where g_u denotes the channel gain between the user transmitter and the user receiver, g_j denotes the channel gain from the jammer to the user receiver, and n(f) denotes the noise.
The spectrum sample after discretization is s_(n,t) = ∫ from f_s+(n-1)Δf to f_s+nΔf of H_t(f) df, where Δf denotes the spectral resolution and f_s is the start frequency; the spectrum sample sequence at the user receiving end is s_t = {s_(1,t), s_(2,t), ..., s_(n,t)}.
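A minimal numerical sketch of these discretized samples, assuming rectangular power spectral densities and illustrative values for p_u, b_u, the jammer parameters, channel gains and noise floor (none of which are specified at this point in the text):

```python
def rect_psd(f, center, power, bandwidth):
    """PSD of a signal of total power `power` spread evenly over `bandwidth`."""
    return power / bandwidth if abs(f - center) <= bandwidth / 2 else 0.0

def H(f, f_c, f_d, f_j, p_u=1.0, b_u=2.0, p_j=4.0, b_j=4.0,
      g_u=1.0, g_j=1.0, noise=0.01):
    """H_t(f) = g_u*U_t(f) + g_j*J_t(f) + n(f), with U_t carrying both the
    communication (f_c) and decoy (f_d) components."""
    U = rect_psd(f, f_c, p_u, b_u) + rect_psd(f, f_d, p_u, b_u)
    J = rect_psd(f, f_j, p_j, b_j)
    return g_u * U + g_j * J + noise

def spectrum_samples(f_s, f_e, df, f_c, f_d, f_j, sub=10):
    """Midpoint-rule integration of H over each resolution bin of width df,
    giving s_t = {s_(1,t), ..., s_(n,t)}."""
    n_bins = int(round((f_e - f_s) / df))
    samples = []
    for n in range(n_bins):
        lo = f_s + n * df
        h = df / sub
        s = sum(H(lo + (k + 0.5) * h, f_c, f_d, f_j) for k in range(sub)) * h
        samples.append(s)
    return samples

# decoy and jammer overlap near 6.5; communication sits apart at 2.5
s_t = spectrum_samples(f_s=0.0, f_e=10.0, df=1.0, f_c=2.5, f_d=6.5, f_j=6.5)
```

Bins covered only by noise stay near the noise floor, while the bin where the decoy and the jamming signal overlap carries the largest sample value.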
In this embodiment, the spectrum sample sequence is obtained by calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through the first equation, which provides a basis for acquiring the subsequent environment state information, conceals the communication, improves the anti-interference performance and at the same time avoids leakage of the user's own information.
Optionally, as an embodiment of the present invention, the deep reinforcement learning network comprises a communication deep reinforcement learning network and a decoy deep reinforcement learning network, and the process in step S5 of fitting the environment state information, the preset start frequency and the preset end frequency based on the deep reinforcement learning network and outputting the initial communication frequency and the initial decoy frequency of the current moment specifically comprises:
extracting the characteristics of the environment state information based on the communication deep reinforcement learning network, outputting a plurality of initial communication Q values at the current moment, and screening out the maximum communication Q value from the plurality of communication Q values;
selecting a frequency within the range from the preset initial frequency to the preset termination frequency according to the maximum communication Q value to obtain the initial communication frequency at the current moment;
extracting the characteristics of the environment state information based on the spoofing deep reinforcement learning network, outputting a plurality of initial spoofing Q values at the current moment, and screening out the maximum spoofing Q value from the plurality of spoofing Q values;
and selecting the frequency within the range from the preset initial frequency to the preset termination frequency according to the maximum decoy Q value to obtain the initial decoy frequency at the current moment.
It should be understood that the communication Q value represents the value of selecting a different user decision in the environment state S_t, and the spoofing Q value represents the value of selecting a different spoofing decision in the environment state S_t.
Specifically, the user receives the environment state information S_t. According to the communication decision selection probability ε_c the user randomly selects one of the communication frequencies, or with probability 1 − ε_c selects the communication frequency corresponding to the largest communication Q value among the decisions under the environment state S_t. Likewise, according to the decoy decision selection probability ε_d the user randomly selects one of the decoy frequencies, or with probability 1 − ε_d selects the decoy frequency corresponding to the largest decoy Q value among the decisions under the environment state S_t.
In this embodiment, the initial communication frequency at the current moment and the initial spoofing frequency at the current moment are output by fitting the environment state information, the preset starting frequency and the preset terminating frequency with the deep reinforcement learning network, so that the frequencies can be updated; this ensures that the user decoy signal firmly attracts the jamming attack while the user communication signal achieves covert communication, thereby avoiding the jamming attack and obtaining information without interference.
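The ε-greedy frequency selection described above can be sketched as follows; the function and variable names are illustrative, since the patent does not prescribe an implementation:

```python
import random

def select_frequency(q_values, freqs, epsilon):
    """Epsilon-greedy selection: explore a random frequency with probability
    epsilon, otherwise exploit the frequency with the largest Q value."""
    if random.random() < epsilon:
        return random.choice(freqs)
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return freqs[best]

freqs = [1.0, 2.0, 3.0, 4.0]    # candidate center frequencies (MHz)
q_comm = [0.1, 0.9, 0.3, 0.2]   # communication Q values for state S_t
f_c = select_frequency(q_comm, freqs, epsilon=0.0)  # pure exploitation -> 2.0
```

The same routine serves both networks: it is called once with the communication Q values and ε_c, and once with the decoy Q values and ε_d.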
Optionally, as an embodiment of the present invention, the interference decision evaluation network includes a communication interference decision evaluation network and a spoofed interference decision evaluation network; the process of step S7 specifically includes:
performing feature extraction on the communication frequency in the preset historical communication frequency sequence table based on a communication interference decision evaluation network, and outputting a communication fitting environment state at the current moment;
calculating the communication fitting environment state and the environment state information through a second formula to obtain a communication evaluation network error value at the current moment, wherein the second formula is:

e_t^c = ‖S_t − Ŝ_t^c‖²
where e_t^c is the communication evaluation network error value, S_t is the environment state information at time t, and Ŝ_t^c is the communication fitting environment state at time t;
calculating the instantaneous return of the communication evaluation network error value through a third formula to obtain a first instantaneous return at the current moment, wherein the third formula is as follows:
where e_t^c is the communication evaluation network error value and r_t^1 is the first instantaneous reward;
performing feature extraction on the decoy frequency in the preset historical decoy frequency sequence table based on a decoy interference decision evaluation network, and outputting a decoy fitting environment state at the current moment;
calculating the decoy fitting environment state and the environment state information through a fourth formula to obtain a decoy evaluation network error value at the current moment, wherein the fourth formula is:

e_t^d = ‖S_t − Ŝ_t^d‖²
where e_t^d is the decoy evaluation network error value, S_t is the environment state information at time t, and Ŝ_t^d is the decoy fitting environment state at time t;
calculating the instantaneous return of the cheating evaluation network error value through a fifth formula to obtain a second instantaneous return at the current moment, wherein the fifth formula is as follows:
where r_t^2 is the second instantaneous reward and e_t^d is the decoy evaluation network error value.
It should be appreciated that the communication interference decision evaluation network and the spoofing interference decision evaluation network may both be generative adversarial networks.
It should be appreciated that at each iteration, each interference decision evaluation network (i.e. the communication interference decision evaluation network or the spoofing interference decision evaluation network) uses the communication evaluation network loss value and the spoofing evaluation network loss value respectively to update its own evaluation network parameters by back-propagation.
specifically, past communication decision sequences are compared(i.e., the predetermined historical communication frequency sequence list) and past spoofing decision sequences(namely the preset historical spoofing frequency sequence list) respectively input the respective interference decision correlation evaluation network (namely the communication interference decision evaluation network or the spoofing interference decision evaluation network) for feature extraction to obtain the output instantaneous environment stateAndcomputing outputAndand instantaneous environmental stateEstimate network loss value of communication betweenAnd spoofing the evaluation of network loss valuesDefining the first instantaneous reward of the communication deep reinforcement learning network asThe second instantaneous reward of the decoy deep reinforcement learning network is
In this embodiment, the interference decision evaluation networks fit the communication frequencies in the preset historical communication frequency sequence table and the spoofing frequencies in the preset historical spoofing frequency sequence table against the environment state information, and output the first and second instantaneous rewards at the current moment. This ensures that the user decoy signal firmly attracts the jamming attack while the user communication signal achieves covert communication; by sacrificing the user decoy signal, covert communication is achieved, the anti-jamming performance is improved, and leakage of the user's own information is avoided.
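One plausible reading of the reward construction above — rewarding the communication link when the evaluation network fails to fit it (well hidden) and rewarding the decoy when it is easy to fit (it keeps attracting the jamming attack) — can be sketched as follows. The signs and the scaling factors `alpha`/`beta` are assumptions, since the patent text does not reproduce the third and fifth formulas:

```python
def instantaneous_rewards(err_comm, err_decoy, alpha=1.0, beta=1.0):
    """Hypothetical reward shaping: the communication link earns reward when
    the jammer-side evaluation network fails to fit it (large error keeps the
    communication hidden), while the decoy earns reward when it is easy to
    fit (small error keeps attracting the jamming attack)."""
    r1 = alpha * err_comm    # first instantaneous reward (communication)
    r2 = -beta * err_decoy   # second instantaneous reward (decoy)
    return r1, r2

r1, r2 = instantaneous_rewards(err_comm=0.8, err_decoy=0.1)
```

Under this reading, a highly predictable communication pattern (small err_comm) is penalized with a small r1, pushing the learner toward frequencies the jammer cannot anticipate.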
Optionally, as an embodiment of the present invention, the process of step S8 specifically includes:
when the environmental state is transferred to the environmental state of the next moment, executing the step S1 to the step S5 so as to obtain the environmental state information of the next moment;
taking the environmental state information at the current moment, the first instantaneous report at the current moment, the initial communication frequency at the current moment and the environmental state information at the next moment as communication experience information, and storing the communication experience information into a preset communication experience data set, wherein the communication experience information specifically comprises the following steps:
communication experience data set D for storing communication experience information with upper limit of N c ;
where t is the time, S_t is the environment state at time t, S_{t+1} is the environment state at time t+1, a_t^c is the communication decision at time t, f_t^c is the communication frequency at time t, and r_t^1 is the first instantaneous reward at time t;
taking the environmental state information at the current moment, the second instantaneous return at the current moment, the initial trapping frequency at the current moment and the environmental state information at the next moment as trapping experience information, and storing the trapping experience information into a preset trapping experience data set;
when the quantity of the communication experience information stored in the preset communication experience data set and the quantity of the spoofing experience information stored in the preset spoofing experience data set reach the preset upper limit value, extracting the communication experience information from the preset communication experience data set in an equiprobable manner, and updating the weight θ_c of the communication deep reinforcement learning network through the extracted communication experience information;
extracting the decoy experience information from the preset decoy experience data set in an equiprobable manner, and updating the weight θ_d of the decoy deep reinforcement learning network through the extracted decoy experience information.
It should be understood that the preset upper limit value may be N/2.
It should be understood that the communication experience of the user at time t (i.e. the communication experience information) is recorded as the tuple (S_t, f_t^c, r_t^1, S_{t+1}) and stored into the data set D_c.
It is to be understood that N is a positive integer greater than 0.
In the embodiment, the environmental state information at the current moment, the first instantaneous reward at the current moment, the initial communication frequency at the current moment and the environmental state information at the next moment are taken as the communication experience information, and the communication experience information is stored in the preset communication experience data set, so that the user decoy signal can firmly attract the interference attack, the user communication signal can realize covert communication, the purposes of avoiding the interference attack and obtaining the information without interference are realized, meanwhile, the covert communication is realized by sacrificing the user decoy signal, the anti-interference performance is improved, and the leakage of the self information is avoided.
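The experience storage described above follows the standard replay-buffer pattern; a minimal sketch (class and method names are illustrative, not from the patent) might look like:

```python
from collections import deque
import random

class ExperienceBuffer:
    """Fixed-capacity replay buffer: once the upper limit N is reached the
    oldest experiences are discarded, and sampling is equiprobable."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buf.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

D_c = ExperienceBuffer(capacity=1000)   # communication experience data set
D_c.store("S_t", "f_c", 0.5, "S_t+1")   # (state, frequency, reward, next state)
```

The decoy experience data set D_d would be a second instance of the same buffer, filled with (S_t, f_t^d, r_t^2, S_{t+1}) tuples.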
Optionally, as an embodiment of the present invention, the step of using the environmental status information at the current time, the second instantaneous reward at the current time, the initial spoofing frequency at the current time, and the environmental status information at the next time as the spoofing experience information, and storing the spoofing experience information in a preset spoofing experience data set specifically includes:
setting a spoofing experience data set D_d with an upper limit of N for storing the spoofing experience information;
where t is the time, S_t is the environment state at time t, S_{t+1} is the environment state at time t+1, a_t^d is the decoy decision at time t, f_t^d is the spoofing frequency at time t, and r_t^2 is the second instantaneous reward at time t. It should be understood that the user's spoofing experience at time t (i.e. the spoofing experience information) is recorded as the tuple (S_t, f_t^d, r_t^2, S_{t+1}) and stored in the data set D_d.
It is to be understood that N is a positive integer greater than 0.
In the embodiment, the environmental state information at the current moment, the second instantaneous reward at the current moment, the initial trapping frequency at the current moment and the environmental state information at the next moment are taken as the trapping experience information, and the trapping experience information is stored in the preset trapping experience data set, so that the user trapping signals can firmly attract the interference attack, the user communication signals realize the covert communication, the purposes of avoiding the interference attack and obtaining the information without interference are achieved, meanwhile, the covert communication is achieved by sacrificing the user trapping signals, the anti-interference performance is improved, and meanwhile, the leakage of the self information is avoided.
Optionally, as an embodiment of the present invention, the process of extracting the communication experience information from the preset communication experience data set in an equiprobable manner and updating the weight θ_c of the communication deep reinforcement learning network through the extracted communication experience information comprises the following steps:
respectively extracting communication experience information from the preset communication experience data set according to an equal probability mode;
constructing a communication target value according to the extracted communication experience information, wherein the communication target value is:

y_t^c = r_t^1 + γ_c · max Q(S_{t+1}, f_{t+1}^c; θ_c)
where y_t^c is the communication target value, r_t^1 is the first instantaneous reward at time t, γ_c represents the communication reward attenuation factor, max Q(S_{t+1}, f_{t+1}^c; θ_c) is the maximum communication Q value output by the communication deep reinforcement learning network in the environment S_{t+1}, and f_{t+1}^c is the communication frequency at time t+1;
calculating the gradient ∇θ_c of the communication deep reinforcement learning network according to the sixth formula and the communication target value, wherein the sixth formula is:

∇θ_c = E_{S_t}[ (y_t^c − Q(S_t, f_t^c; θ_c)) · ∂Q(S_t, f_t^c; θ_c)/∂θ_c ]
where ∂/∂θ_c denotes the partial derivative, f_t^c is the communication frequency at time t, y_t^c − Q(S_t, f_t^c; θ_c) is the communication error value, E is the expectation, S_t is the environment state at time t, y_t^c is the communication target value, Q(S_t, f_t^c; θ_c) is the communication Q value output by the communication deep reinforcement learning network in the environment S_t, and ∇θ_c is the gradient of the communication deep reinforcement learning network;
updating the communication deep reinforcement learning network along the gradient by using a stochastic gradient descent algorithm to obtain the weight θ_c of the communication deep reinforcement learning network;
repeating until all the communication experience information in the preset communication experience data set is extracted.
Specifically, an experience (i.e. the communication experience information) is randomly extracted with equal probability from the communication experience data set D_c, and the target value y_t^c is constructed from the state, decision and reward in that experience, where the maximum term denotes the maximum Q value the user can obtain in the remembered environment state S_{t+1} (i.e. the maximum communication Q value output by the communication deep reinforcement learning network in the environment S_{t+1}). The error between the target value y_t^c and the true value Q(S_t, f_t^c; θ_c) is then calculated, and the gradient is computed from it according to the sixth formula.
In the embodiment, the communication experience information is extracted from the preset communication experience data set in an equiprobable manner, the weight of the communication depth reinforcement learning network is updated through the extracted communication experience information, the user decoy signal can firmly attract the interference attack, the user communication signal can realize covert communication through the communication experience information, the purposes of avoiding the interference attack and obtaining the information without interference are achieved, meanwhile, the covert communication is achieved through sacrificing the user decoy signal, the anti-interference performance is improved, and meanwhile, the leakage of the information of the user is avoided.
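The target-value construction and gradient step above follow the standard DQN update; the sketch below shows them for a tabular Q function, where the stochastic-gradient step on the squared error reduces to a TD-error correction (a deliberate simplification of the patent's deep network):

```python
import numpy as np

def dqn_target(r_t, q_next, gamma):
    """Target value y_t = r_t + gamma * max_a Q(S_{t+1}, a)."""
    return r_t + gamma * np.max(q_next)

def td_update(q_row, action_idx, target, lr=0.1):
    """One stochastic-gradient step on the squared error (y - Q)^2; for a
    tabular Q this amounts to adding lr * (target - current estimate)."""
    q_row[action_idx] += lr * (target - q_row[action_idx])
    return q_row

q_next = np.array([0.2, 0.7, 0.4])                 # Q values in state S_{t+1}
y = dqn_target(r_t=1.0, q_next=q_next, gamma=0.8)  # 1.0 + 0.8 * 0.7
q_row = td_update(np.zeros(3), action_idx=1, target=y)
```

The decoy network's update in the following embodiment has the same shape, with r_t^2, γ_d and θ_d substituted for the communication quantities.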
Optionally, as an embodiment of the present invention, the process of extracting the decoy experience information from the preset decoy experience data set in an equiprobable manner and updating the weight θ_d of the decoy deep reinforcement learning network through the extracted decoy experience information comprises the following steps:
respectively extracting the spoofing experience information from the preset spoofing experience data set according to an equal probability mode;
constructing a decoy target value according to the extracted decoy experience information, wherein the decoy target value is:

y_t^d = r_t^2 + γ_d · max Q(S_{t+1}, f_{t+1}^d; θ_d)
where γ_d represents the decoy reward attenuation factor, r_t^2 is the second instantaneous reward at time t, y_t^d is the decoy target value, max Q(S_{t+1}, f_{t+1}^d; θ_d) is the maximum decoy Q value output by the decoy deep reinforcement learning network in the environment S_{t+1}, and f_{t+1}^d is the spoofing frequency at time t+1;
calculating the gradient ∇θ_d of the decoy deep reinforcement learning network according to the seventh formula and the decoy target value, wherein the seventh formula is:

∇θ_d = E_{S_t}[ (y_t^d − Q(S_t, f_t^d; θ_d)) · ∂Q(S_t, f_t^d; θ_d)/∂θ_d ]
where ∂/∂θ_d denotes the partial derivative, y_t^d is the decoy target value, y_t^d − Q(S_t, f_t^d; θ_d) is the decoy error value, E is the expectation, S_t is the environment state at time t, Q(S_t, f_t^d; θ_d) is the decoy Q value output by the decoy deep reinforcement learning network in the environment S_t, and ∇θ_d is the gradient of the decoy deep reinforcement learning network;
updating the decoy deep reinforcement learning network along the gradient by using a stochastic gradient descent algorithm to obtain the weight θ_d of the decoy deep reinforcement learning network;
repeating until all the decoy experience information in the preset decoy experience data set is extracted.
Specifically, an experience (i.e. the decoy experience information) is randomly extracted with equal probability from the preset decoy experience data set D_d, and the target value y_t^d is constructed from the state, decision and reward in that experience, where the maximum term denotes the maximum Q value the user can obtain in the remembered environment state S_{t+1} (i.e. the maximum decoy Q value output by the decoy deep reinforcement learning network in the environment S_{t+1}). The error between the target value y_t^d and the true value Q(S_t, f_t^d; θ_d) is then calculated, and the gradient is computed from it according to the seventh formula.
In the embodiment, the spoofing experience information is extracted from the preset spoofing experience data set in an equiprobable manner, the weight of the spoofing depth reinforcement learning network is updated through the extracted spoofing experience information, so that a user spoofing signal can firmly attract interference attack, the user communication signal can realize covert communication, the purposes of avoiding interference attack and obtaining information without interference are achieved, meanwhile, covert communication is realized through sacrificing the user spoofing signal, the anti-interference performance is improved, and meanwhile, the leakage of self information is avoided.
Optionally, as an embodiment of the present invention, the process in step S9 of updating the deep reinforcement learning network according to the preset communication experience data set and outputting an optimized communication frequency by fitting the environment state information with the updated network, and of updating the deep reinforcement learning network according to the preset spoofing experience data set and outputting an optimized spoofing frequency by fitting the environment state information with the updated network, specifically includes:
updating the communication deep reinforcement learning network according to the preset communication experience data set, performing fitting processing on the environment state information based on the updated communication deep reinforcement learning network, and outputting optimized communication frequency;
and updating the spoofing deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated spoofing deep reinforcement learning network, and outputting optimized spoofing frequency.
Specifically, the optimized communication frequency and the optimized spoofing frequency are both obtained by updating the communication decision selection probability ε_c and the decoy decision selection probability ε_d according to the formula ε = max(0.01, ε − Δε), where Δε is the attenuation coefficient of the update step, and then selecting frequencies with the updated probabilities ε_c and ε_d.
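The probability update ε = max(0.01, ε − Δε) stated above can be written directly (function name is illustrative):

```python
def decay_epsilon(epsilon, delta=0.001, floor=0.01):
    """Selection-probability update from the text: eps = max(0.01, eps - delta)."""
    return max(floor, epsilon - delta)

eps_c = 1.0
for _ in range(2000):   # far more decisions than needed to reach the floor
    eps_c = decay_epsilon(eps_c)
# eps_c has decayed to the exploration floor of 0.01
```

Keeping the floor at 0.01 preserves a small amount of random exploration, so neither the communication nor the decoy frequency pattern ever becomes fully deterministic.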
In the embodiment, the optimized communication frequency is output based on the fitting processing of the updated deep reinforcement learning network on the environmental state information, the deep reinforcement learning network is updated according to the preset spoofing experience data set, the optimized spoofing frequency is output based on the fitting processing of the updated deep reinforcement learning network on the environmental state information, and the user spoofing signal can firmly attract the interference attack, so that the user communication signal realizes covert communication, the purposes of avoiding the interference attack and obtaining the information without interference are achieved, meanwhile, the covert communication is realized by sacrificing the user spoofing signal, the anti-interference performance is improved, and meanwhile, the leakage of self information is avoided.
Optionally, as an embodiment of the present invention, before the method is executed, the user initializes the preset communication experience data set D_c and the preset spoofing experience data set D_d, sets the upper limit of both data sets to N and the upper limit of algorithm iterations to M, initializes the communication decision selection probability ε_c, the decoy decision selection probability ε_d and the reward attenuation coefficients γ_c, γ_d, and sets the communication deep reinforcement learning network parameters and their interference decision evaluation network parameters, as well as the spoofing deep reinforcement learning network parameters and their interference decision evaluation network parameters, to random numbers.
Optionally, as an embodiment of the present invention, the user and the jammer compete within a spectrum bandwidth B = 20 MHz, and both sides can freely select the center frequency at which to transmit signals; the user signal bandwidth is b_u = 1 MHz with power p_u = 30 dBm; the interference signal bandwidth b_j ∈ [b_s, b_h] can change according to the environment state, where b_s = 1 MHz, b_h = 3 MHz, and the interference signal power is p_j = 60 dBm. The user and the jammer perform full-band sensing every 1 ms and store the sensed spectrum data for 100 ms, i.e. the backtracking length is T = 100 ms; both sides make a decision every 10 ms, where the user applies the method of the invention to make optimal anti-jamming decisions, and the jammer uses a Q-learning algorithm to learn the sensed user information and apply the best interference. The user data sets are initialized with an upper limit N = 1000, the upper limit of algorithm iterations is M = 10000, the strategy selection probabilities are ε_c = ε_d = 1, the reward attenuation coefficients are γ_c = γ_d = 0.8, and the attenuation coefficient of the update step is Δε = 0.001.
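For reference, the numerical settings stated in this embodiment can be gathered into a single configuration; the dictionary keys and units are editorial, while the values come from the text:

```python
# Simulation parameters stated in this embodiment of the patent.
SIM = {
    "band_MHz": 20.0,         # contested spectrum bandwidth B
    "b_u_MHz": 1.0,           # user signal bandwidth
    "p_u_dBm": 30,            # user signal power
    "b_s_MHz": 1.0,           # minimum interference bandwidth
    "b_h_MHz": 3.0,           # maximum interference bandwidth
    "p_j_dBm": 60,            # interference signal power
    "sense_period_ms": 1,     # full-band sensing interval
    "history_T_ms": 100,      # spectrum backtracking length T
    "decision_period_ms": 10, # decision interval for both sides
    "N": 1000,                # experience data set upper limit
    "M": 10000,               # algorithm iteration upper limit
    "eps_init": 1.0,          # initial selection probabilities
    "gamma": 0.8,             # reward attenuation coefficients
    "delta_eps": 0.001,       # update-step attenuation coefficient
}
spectra_stored = SIM["history_T_ms"] // SIM["sense_period_ms"]  # 100 spectra
```

With sensing every 1 ms over a 100 ms history, the environment state thus stacks 100 spectrum sample sequences per decision.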
The final result shows that the method of the invention has excellent performance when the user uses the method to resist the intelligent interference with learning ability, the user can make a decoy decision to firmly attract the interference attack, and the communication decision can be concealed by the user, thereby not only improving the anti-interference performance, but also ensuring the information safety of the user.
Fig. 2 is a block diagram of a spoofing-assisted covert interference rejection unit according to an embodiment of the present invention.
Optionally, as another embodiment of the present invention, as shown in fig. 2, a decoy-assisted covert anti-jamming device includes:
the first judging module is used for acquiring the current moment through the user receiver and judging whether the current moment is earlier than the preset moment, if so, the first signal is sent to the random selection module, and if not, the second signal is sent to the frequency acquisition module;
the random selection module is used for randomly selecting a communication frequency and a decoy frequency within the range from a preset initial frequency to a preset termination frequency according to the first signal and sending the randomly selected communication frequency and the decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency acquisition module is used for directly acquiring the communication frequency and the spoofing frequency from the user receiver according to the second signal and sending the acquired communication frequency and the spoofing frequency to the frequency spectrum sample value sequence processing module;
the frequency spectrum sample value sequence processing module is used for acquiring an interference signal from an interference machine, obtaining a frequency spectrum sample value sequence according to the communication frequency, the decoy frequency and the interference signal, and storing the frequency spectrum sample value sequence into a preset frequency spectrum sample value sequence table;
a final judging module, configured to judge whether the current time is earlier than the preset time again, if so, obtain the next time through a user receiver, and judge whether the next time is earlier than the preset time, if so, return to the random selection module, otherwise, return to the frequency obtaining module, until the current time reaches or is later than the preset time, use the preset frequency spectrum sample value sequence table as environment state information of the current time, perform fitting processing on the environment state information, the preset starting frequency and the preset terminating frequency based on a deep reinforcement learning network, output an initial communication frequency of the current time and an initial spoofing frequency of the current time, store the communication frequency in a preset historical communication frequency sequence table, and store the spoofing frequency in the preset historical spoofing frequency sequence table;
a sending module, configured to send the communication frequency and the spoofing frequency to the user receiver, where the communication frequency is used for the user receiver to control a user transmitter to transmit a communication signal, and the spoofing frequency is used for the user receiver to control the user transmitter to transmit a spoofing signal;
a fitting processing module, configured to perform fitting processing on the communication frequency in the preset historical communication frequency sequence table, the environment state information, and the spoofing frequency in the preset historical spoofing frequency sequence table respectively based on an interference decision evaluation network, output a communication evaluation network error value at the current time and a spoofing evaluation network error value at the current time, and calculate an instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current time, and calculate an instantaneous reward of the spoofing evaluation network error value to obtain a second instantaneous reward at the current time;
the environment state transfer module is used for taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment when the environment state is transferred to the environment state of the next moment, storing the communication experience information of the current moment into a preset communication experience data set, taking the environment state information, the second instantaneous reward of the current moment and the initial trapping frequency of the current moment as the trapping experience information of the current moment, storing the trapping experience information of the current moment into a preset trapping experience data set, returning to the execution step S1 until the number of the communication experience information stored in the preset communication experience data set and the number of the trapping experience information stored in the preset trapping experience data set reach a preset upper limit value, and sending a third signal into the frequency optimization module;
and the frequency optimization module is used for receiving the third signal, updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized communication frequency, updating the deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized spoofing frequency, and sending the optimized communication frequency and the optimized spoofing frequency to the user receiver.
Optionally, another embodiment of the present invention provides a spoof-assisted covert interference rejection device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the spoof-assisted covert interference rejection method as described above. The device may be a computer or the like.
Optionally, another embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, which, when executed by a processor, implements the decoy-assisted covert interference rejection method as described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents, improvements, and the like that fall within the spirit and principle of the present invention are intended to be included within its scope.
Claims (9)
1. A decoy-assisted covert anti-jamming method, characterized by comprising the following steps:
S1: acquiring the current moment through a user receiver, and judging whether the current moment is earlier than a preset moment; if so, executing step S2, and if not, executing step S3;
S2: randomly selecting a communication frequency and a decoy frequency within the range from a preset starting frequency to a preset terminating frequency, and executing step S4;
S3: directly acquiring the communication frequency and the decoy frequency from the user receiver, and executing step S4;
S4: acquiring an interference signal from a jammer, obtaining a spectrum sample sequence according to the communication frequency, the decoy frequency, and the interference signal, and storing the spectrum sample sequence into a preset spectrum sample sequence table;
S5: judging again whether the current moment is earlier than the preset moment; if so, acquiring the next moment through the user receiver and judging whether that next moment is earlier than the preset moment, returning to step S2 if it is and to step S3 otherwise; once the current moment reaches or is later than the preset moment, taking the preset spectrum sample sequence table as the environmental state information of the current moment, fitting the environmental state information, the preset starting frequency, and the preset terminating frequency based on a deep reinforcement learning network, outputting the initial communication frequency of the current moment and the initial decoy frequency of the current moment, saving the communication frequency in a preset historical communication frequency sequence table, and saving the decoy frequency in a preset historical decoy frequency sequence table;
S6: sending the communication frequency and the decoy frequency to the user receiver, wherein the communication frequency is used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency is used by the user receiver to control the user transmitter to transmit a decoy signal;
S7: fitting the communication frequency in the preset historical communication frequency sequence table, the environmental state information, and the decoy frequency in the preset historical decoy frequency sequence table respectively, based on an interference decision evaluation network; outputting a communication evaluation network error value at the current moment and a decoy evaluation network error value at the current moment; calculating the instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current moment; and calculating the instantaneous reward of the decoy evaluation network error value to obtain a second instantaneous reward at the current moment;
S8: when the environmental state transitions to the environmental state of the next moment, taking the environmental state information, the first instantaneous reward of the current moment, and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it into a preset communication experience data set; taking the environmental state information, the second instantaneous reward of the current moment, and the initial decoy frequency of the current moment as the decoy experience information of the current moment and storing it into a preset decoy experience data set; returning to step S1 until both the number of communication experience entries stored in the preset communication experience data set and the number of decoy experience entries stored in the preset decoy experience data set reach a preset upper limit, and then executing step S9;
S9: updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated network, and outputting an optimized communication frequency; updating the deep reinforcement learning network according to the preset decoy experience data set, fitting the environmental state information based on the updated network, and outputting an optimized decoy frequency; and sending the optimized communication frequency and the optimized decoy frequency to the user receiver;
wherein the interference signal includes an interference selection frequency and an interference signal bandwidth, and in step S4 the process of obtaining the spectrum sample sequence according to the communication frequency, the decoy frequency, and the interference signal specifically includes:
calculating the communication frequency, the decoy frequency, the interference selection frequency, and the interference signal bandwidth through a first formula to obtain the spectrum sample sequence, the first formula being:
$$s_t = \{s_{1,t}, s_{2,t}, \dots, s_{n,t}\}, \qquad s_{n,t} = \int_{f_s+(n-1)\Delta f}^{f_s+n\Delta f} H_t(f)\,\mathrm{d}f,$$

wherein $H_t(f) = g_u U_t(f) + g_j J_t(f) + n(f)$,

where $s_t$ is the spectrum sample sequence; $s_{n,t}$ is the $n$-th spectrum sample; $\Delta f$ is the spectral resolution; $f_s$ is the starting frequency; $H_t(f)$ is the power spectral density; $g_u$ is the channel gain between the user transmitter and the user receiver; $g_j$ is the channel gain between the jammer and the user receiver; $n(f)$ is the noise; $U_t(f)$ is the user signal; $J_t(f)$ is the interference signal; $a_t^u$ is the user decision; $p_u$ is the power at which the communication signal and the decoy signal are transmitted; $b_u$ is the spectral bandwidth of the communication signal and the decoy signal; $a_t^j$ is the interference decision; $a_t^c$ is the communication decision at time $t$; $f_t^c$ is the communication frequency at time $t$; $a_t^d$ is the decoy decision at time $t$; $f_t^d$ is the decoy frequency at time $t$; $f_t^j$ is the interference selection frequency; and $b_t^j$ is the interference signal bandwidth.
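As an illustrative aside (not part of the claims): the spectrum sampling of the first formula can be sketched numerically. Everything below is a hypothetical construction for illustration — the grid limits, gains, rectangular signal spectra, and all names such as `spectrum_samples` are assumptions, not values from the patent.

```python
import numpy as np

def spectrum_samples(f_c, f_d, f_j, b_j, f_s=0.0, f_e=20.0, delta_f=0.1,
                     p_u=1.0, b_u=0.2, g_u=1.0, g_j=1.0, noise_psd=1e-3):
    """Sketch of the first formula: sample the power spectral density
    H_t(f) = g_u*U_t(f) + g_j*J_t(f) + n(f) on a grid of resolution delta_f.

    f_c, f_d : communication and decoy centre frequencies (user decision)
    f_j, b_j : jammer centre frequency and bandwidth (interference decision)
    """
    f = np.arange(f_s, f_e, delta_f)                 # frequency grid
    U = p_u * ((np.abs(f - f_c) <= b_u / 2)          # communication signal
               | (np.abs(f - f_d) <= b_u / 2))       # decoy signal
    J = (np.abs(f - f_j) <= b_j / 2).astype(float)   # jamming signal
    H = g_u * U + g_j * J + noise_psd                # power spectral density
    return H * delta_f                               # s_{n,t}: power per bin

# Decoy placed on the jammed frequency, communication link elsewhere:
s_t = spectrum_samples(f_c=3.0, f_d=7.0, f_j=7.0, b_j=1.0)
```

In this toy run, the bin around 7 MHz-equivalent carries both decoy and jamming power, while the bin around 3 carries only the communication signal — the kind of spectrum snapshot the method stores in its sample sequence table.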
2. The decoy-assisted covert anti-jamming method according to claim 1, wherein the deep reinforcement learning network comprises a communication deep reinforcement learning network and a decoy deep reinforcement learning network, and the step S5 of fitting the environmental state information, the preset starting frequency and the preset terminating frequency based on the deep reinforcement learning network and outputting the initial communication frequency at the current time and the initial decoy frequency at the current time specifically comprises:
extracting features of the environmental state information based on the communication deep reinforcement learning network, outputting a plurality of initial communication Q values at the current moment, and screening out the maximum communication Q value from the plurality of communication Q values;
selecting a frequency within the range from the preset starting frequency to the preset terminating frequency according to the maximum communication Q value, to obtain the initial communication frequency at the current moment;
extracting features of the environmental state information based on the decoy deep reinforcement learning network, outputting a plurality of initial decoy Q values at the current moment, and screening out the maximum decoy Q value from the plurality of decoy Q values;
and selecting a frequency within the range from the preset starting frequency to the preset terminating frequency according to the maximum decoy Q value, to obtain the initial decoy frequency at the current moment.
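A minimal sketch of the greedy selection in claim 2, assuming the network emits one Q value per candidate frequency on a uniform grid; the function name and the linear index-to-frequency mapping are illustrative assumptions, and exploration (e.g. epsilon-greedy) is omitted.

```python
import numpy as np

def select_frequency(q_values, f_start, f_stop):
    """Pick the index of the maximum Q value and map it onto the
    [f_start, f_stop] range, one candidate frequency per Q output."""
    idx = int(np.argmax(q_values))                       # maximum Q value
    n = len(q_values)
    return f_start + idx * (f_stop - f_start) / (n - 1)  # chosen frequency

comm_q = np.array([0.1, 0.9, 0.3, 0.2])    # e.g. communication Q values
decoy_q = np.array([0.4, 0.2, 0.2, 0.8])   # e.g. decoy Q values
f_c = select_frequency(comm_q, 0.0, 30.0)  # initial communication frequency
f_d = select_frequency(decoy_q, 0.0, 30.0) # initial decoy frequency
```

The same routine serves both networks: each screens out its own maximum Q value and converts it to a frequency in the preset range.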
3. The decoy-assisted covert anti-jamming method according to claim 1, wherein the interference decision evaluation network comprises a communication interference decision evaluation network and a decoy interference decision evaluation network, and the process of step S7 specifically includes:
performing feature extraction on the communication frequency in the preset historical communication frequency sequence table based on a communication interference decision evaluation network, and outputting a communication fitting environment state at the current moment;
calculating the communication fitting environment state and the environmental state information by a second formula to obtain the communication evaluation network error value at the current moment, the second formula being:

$$\delta_t^c = \left\lVert S_t - \hat{S}_t^c \right\rVert^2,$$

where $\delta_t^c$ is the communication evaluation network error value, $S_t$ is the environmental state information at time $t$, and $\hat{S}_t^c$ is the communication fitting environment state at time $t$;
calculating the instantaneous reward from the communication evaluation network error value $\delta_t^c$ through a third formula to obtain the first instantaneous reward $r_t^c$ at the current moment;
performing feature extraction on the decoy frequency in the preset historical decoy frequency sequence table based on a decoy interference decision evaluation network, and outputting a decoy fitting environment state at the current moment;
calculating the decoy fitting environment state and the environmental state information through a fourth formula to obtain the decoy evaluation network error value at the current moment, the fourth formula being:

$$\delta_t^d = \left\lVert S_t - \hat{S}_t^d \right\rVert^2,$$

where $\delta_t^d$ is the decoy evaluation network error value, $S_t$ is the environmental state information at time $t$, and $\hat{S}_t^d$ is the decoy fitting environment state at time $t$;
and calculating the instantaneous reward from the decoy evaluation network error value $\delta_t^d$ through a fifth formula to obtain the second instantaneous reward $r_t^d$ at the current moment.
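The error values of the second and fourth formulas compare the jammer-side fitted state with the true state. The exact reward mappings (third and fifth formulas) are not legible in this text, so the sketch below merely assumes one plausible sign convention — the communication link is rewarded when the jammer's prediction error is large (hard to track), the decoy when it is small (easy to lure) — and that assumption, like all names here, is illustrative only.

```python
import numpy as np

def eval_error(true_state, fitted_state):
    """Second/fourth formula sketch: squared error between the real
    spectrum state S_t and the evaluation network's fitted state."""
    true_state = np.asarray(true_state, dtype=float)
    fitted_state = np.asarray(fitted_state, dtype=float)
    return float(np.sum((true_state - fitted_state) ** 2))

# Hypothetical reward mappings -- NOT the patent's third/fifth formulas:
def comm_reward(err):   # covert link: reward a LARGE jammer prediction error
    return err

def decoy_reward(err):  # decoy link: reward a SMALL jammer prediction error
    return -err

S_t = [0.0, 1.0, 0.0]                      # toy spectrum state
err_c = eval_error(S_t, [0.0, 0.0, 0.0])   # jammer missed the user entirely
r1 = comm_reward(err_c)                    # first instantaneous reward
```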
4. The decoy-assisted covert anti-jamming method according to claim 2, wherein the process of step S8 specifically comprises:
when the environmental state transitions to the environmental state of the next moment, executing steps S1 to S5 to obtain the environmental state information of the next moment;
taking the environmental state information at the current moment, the first instantaneous reward at the current moment, the initial communication frequency at the current moment, and the environmental state information at the next moment as the communication experience information, and storing the communication experience information into the preset communication experience data set, specifically:
constructing a communication experience data set $D_c$ with a storage upper limit of $N$ for storing the communication experience information, each entry being the tuple $(S_t, f_t^c, r_t^c, S_{t+1})$,
where $t$ is the time, $S_t$ is the environmental state at time $t$, $S_{t+1}$ is the environmental state at time $t+1$, $a_t^c$ is the communication decision at time $t$, $f_t^c$ is the communication frequency at time $t$, and $r_t^c$ is the first instantaneous reward at time $t$;
taking the environmental state information at the current moment, the second instantaneous reward at the current moment, the initial decoy frequency at the current moment, and the environmental state information at the next moment as the decoy experience information, and storing the decoy experience information into the preset decoy experience data set;
and when the number of communication experience entries stored in the preset communication experience data set and the number of decoy experience entries stored in the preset decoy experience data set both reach the preset upper limit, extracting communication experience information from the preset communication experience data set with equal probability, and updating the weight $\theta^c$ of the communication deep reinforcement learning network with the extracted communication experience information.
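The experience sets $D_c$ and $D_d$ of claim 4 behave like standard replay buffers: a fixed upper limit, tuples of (state, action frequency, reward, next state), and equal-probability sampling once full. A minimal sketch, with hypothetical class and method names:

```python
import random
from collections import deque

class ExperienceSet:
    """Sketch of the claim-4 experience data sets D_c / D_d: fixed upper
    limit N, tuples (S_t, f_t, r_t, S_{t+1}), equal-probability sampling."""
    def __init__(self, upper_limit):
        self.buf = deque(maxlen=upper_limit)   # oldest entries are dropped

    def store(self, s, f, r, s_next):
        self.buf.append((s, f, r, s_next))

    def full(self):
        return len(self.buf) == self.buf.maxlen

    def sample(self, k):
        return random.sample(self.buf, k)      # uniform, without replacement

D_c = ExperienceSet(upper_limit=3)
for t in range(3):
    D_c.store(s=[t], f=2.0 + t, r=0.5, s_next=[t + 1])
```

One such set would be kept per network (communication and decoy), and training begins only once both report `full()`.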
5. The decoy-assisted covert anti-jamming method according to claim 4, wherein the process of taking the environmental state information at the current moment, the second instantaneous reward at the current moment, the initial decoy frequency at the current moment, and the environmental state information at the next moment as the decoy experience information and storing it in the preset decoy experience data set specifically comprises:
setting up and storing upper limit N pieces of decoy experience informationDecoy experience data set D of d ;
6. The decoy-assisted covert anti-jamming method according to claim 4, wherein the process of extracting the communication experience information from the preset communication experience data set with equal probability and updating the weight $\theta^c$ of the communication deep reinforcement learning network with the extracted communication experience information comprises the following steps:
extracting communication experience information from the preset communication experience data set with equal probability;
constructing a communication target value from the extracted communication experience information, the communication target value being:

$$y_t^c = r_t^c + \gamma_c \max_{f_{t+1}^c} Q\left(S_{t+1}, f_{t+1}^c; \theta^c\right),$$

where $y_t^c$ is the communication target value, $r_t^c$ is the first instantaneous reward at time $t$, $\gamma_c$ is the communication reward attenuation factor, $\max_{f_{t+1}^c} Q(S_{t+1}, f_{t+1}^c; \theta^c)$ is the maximum communication Q value output by the communication deep reinforcement learning network in environment $S_{t+1}$, and $f_{t+1}^c$ is the communication frequency at time $t+1$;
calculating the gradient $\nabla_{\theta^c}$ of the communication deep reinforcement learning network according to the sixth formula and the communication target value, the sixth formula being:

$$\nabla_{\theta^c} L^c = \mathbb{E}\left[\left(y_t^c - Q\left(S_t, f_t^c; \theta^c\right)\right)\nabla_{\theta^c} Q\left(S_t, f_t^c; \theta^c\right)\right],$$

where $\partial/\partial\theta^c$ denotes the partial derivative, $f_t^c$ is the communication frequency at time $t$, $L^c$ is the communication error value, $\mathbb{E}$ is the expectation, $S_t$ is the environmental state at time $t$, $y_t^c$ is the communication target value, $Q(S_t, f_t^c; \theta^c)$ is the communication Q value output by the communication deep reinforcement learning network in environment $S_t$, and $\nabla_{\theta^c}$ is the gradient of the communication deep reinforcement learning network;
updating the communication deep reinforcement learning network along this gradient using a stochastic gradient descent algorithm to obtain the weight $\theta^c$ of the communication deep reinforcement learning network;
and repeating the above until all the communication experience information in the preset communication experience data set has been extracted.
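The target value and gradient step of claim 6 follow the familiar Q-learning pattern. A toy sketch with a linear Q function standing in for the deep network — the parameterisation, learning rate, and all names are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

def q_values(theta, state):
    """Toy linear Q-network: one Q value per candidate frequency (row)."""
    return theta @ state

def dqn_update(theta, s, a, r, s_next, gamma=0.9, lr=0.1):
    """Sketch of claim 6: build the target y = r + gamma * max_a' Q(s', a'),
    then take one stochastic gradient step on the squared error
    (y - Q(s, a))^2 with respect to the weights theta."""
    y = r + gamma * np.max(q_values(theta, s_next))  # communication target
    q_sa = q_values(theta, s)[a]                     # current Q estimate
    td_error = y - q_sa                              # target minus estimate
    grad = np.zeros_like(theta)
    grad[a] = s                                      # dQ(s, a)/dtheta_a = s
    return theta + lr * td_error * grad              # one SGD step

theta = np.zeros((2, 3))            # 2 candidate frequencies x 3 features
s = np.array([1.0, 0.0, 0.0])       # toy spectrum state
theta2 = dqn_update(theta, s, a=0, r=1.0, s_next=s)
```

Claim 7 applies the same update with the decoy reward, attenuation factor, and weights in place of the communication ones.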
7. The decoy-assisted covert anti-jamming method according to claim 5, wherein the process of extracting the decoy experience information from the preset decoy experience data set with equal probability and updating the weight $\theta^d$ of the decoy deep reinforcement learning network with the extracted decoy experience information comprises the following steps:
extracting decoy experience information from the preset decoy experience data set with equal probability;
constructing a decoy target value from the extracted decoy experience information, the decoy target value being:

$$y_t^d = r_t^d + \gamma_d \max_{f_{t+1}^d} Q\left(S_{t+1}, f_{t+1}^d; \theta^d\right),$$

where $\gamma_d$ is the decoy reward attenuation factor, $r_t^d$ is the second instantaneous reward at time $t$, $y_t^d$ is the decoy target value, $\max_{f_{t+1}^d} Q(S_{t+1}, f_{t+1}^d; \theta^d)$ is the maximum decoy Q value output by the decoy deep reinforcement learning network in environment $S_{t+1}$, and $f_{t+1}^d$ is the decoy frequency at time $t+1$;
calculating the gradient $\nabla_{\theta^d}$ of the decoy deep reinforcement learning network according to the seventh formula and the decoy target value, the seventh formula being:

$$\nabla_{\theta^d} L^d = \mathbb{E}\left[\left(y_t^d - Q\left(S_t, f_t^d; \theta^d\right)\right)\nabla_{\theta^d} Q\left(S_t, f_t^d; \theta^d\right)\right],$$

where $\partial/\partial\theta^d$ denotes the partial derivative, $y_t^d$ is the decoy target value, $L^d$ is the decoy error value, $\mathbb{E}$ is the expectation, $S_t$ is the environmental state at time $t$, $Q(S_t, f_t^d; \theta^d)$ is the decoy Q value output by the decoy deep reinforcement learning network in environment $S_t$, and $\nabla_{\theta^d}$ is the gradient of the decoy deep reinforcement learning network;
updating the decoy deep reinforcement learning network along this gradient using a stochastic gradient descent algorithm to obtain the weight $\theta^d$ of the decoy deep reinforcement learning network;
and repeating the above until all the decoy experience information in the preset decoy experience data set has been extracted.
8. A decoy-assisted covert anti-jamming device, characterized by comprising:
a first judging module, configured to acquire the current moment through a user receiver and judge whether the current moment is earlier than a preset moment; if so, send a first signal to the random selection module, and if not, send a second signal to the frequency acquisition module;
the random selection module, configured to randomly select a communication frequency and a decoy frequency within the range from a preset starting frequency to a preset terminating frequency according to the first signal, and send the randomly selected communication frequency and decoy frequency to the spectrum sample sequence processing module;
the frequency acquisition module, configured to directly acquire the communication frequency and the decoy frequency from the user receiver according to the second signal, and send the acquired communication frequency and decoy frequency to the spectrum sample sequence processing module;
the spectrum sample sequence processing module, configured to acquire an interference signal from a jammer, obtain a spectrum sample sequence according to the communication frequency, the decoy frequency, and the interference signal, and store the spectrum sample sequence into a preset spectrum sample sequence table;
a final judging module, configured to judge again whether the current moment is earlier than the preset moment; if so, acquire the next moment through the user receiver and judge whether that next moment is earlier than the preset moment, returning to the random selection module if it is and to the frequency acquisition module otherwise; once the current moment reaches or is later than the preset moment, take the preset spectrum sample sequence table as the environmental state information of the current moment, fit the environmental state information, the preset starting frequency, and the preset terminating frequency based on a deep reinforcement learning network, output the initial communication frequency of the current moment and the initial decoy frequency of the current moment, store the communication frequency in a preset historical communication frequency sequence table, and store the decoy frequency in a preset historical decoy frequency sequence table;
a sending module, configured to send the communication frequency and the decoy frequency to the user receiver, wherein the communication frequency is used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency is used by the user receiver to control the user transmitter to transmit a decoy signal;
a fitting processing module, configured to fit the communication frequency in the preset historical communication frequency sequence table, the environmental state information, and the decoy frequency in the preset historical decoy frequency sequence table respectively, based on an interference decision evaluation network; output a communication evaluation network error value at the current moment and a decoy evaluation network error value at the current moment; calculate the instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current moment; and calculate the instantaneous reward of the decoy evaluation network error value to obtain a second instantaneous reward at the current moment;
an environmental state transition module, configured to, when the environmental state transitions to the environmental state of the next moment, take the environmental state information, the first instantaneous reward of the current moment, and the initial communication frequency of the current moment as the communication experience information of the current moment and store it into a preset communication experience data set; take the environmental state information, the second instantaneous reward of the current moment, and the initial decoy frequency of the current moment as the decoy experience information of the current moment and store it into a preset decoy experience data set; return to the first judging module until both the number of communication experience entries stored in the preset communication experience data set and the number of decoy experience entries stored in the preset decoy experience data set reach a preset upper limit, and then send a third signal to the frequency optimization module;
the frequency optimization module, configured to receive the third signal; update the deep reinforcement learning network according to the preset communication experience data set, fit the environmental state information based on the updated network, and output an optimized communication frequency; update the deep reinforcement learning network according to the preset decoy experience data set, fit the environmental state information based on the updated network, and output an optimized decoy frequency; and send the optimized communication frequency and the optimized decoy frequency to the user receiver;
wherein the interference signal comprises an interference selection frequency and an interference signal bandwidth, and the spectrum sample sequence processing module is specifically configured to:
calculate the communication frequency, the decoy frequency, the interference selection frequency, and the interference signal bandwidth through a first formula to obtain the spectrum sample sequence, the first formula being:
$$s_t = \{s_{1,t}, s_{2,t}, \dots, s_{n,t}\}, \qquad s_{n,t} = \int_{f_s+(n-1)\Delta f}^{f_s+n\Delta f} H_t(f)\,\mathrm{d}f,$$

wherein $H_t(f) = g_u U_t(f) + g_j J_t(f) + n(f)$,

where $s_t$ is the spectrum sample sequence; $s_{n,t}$ is the $n$-th spectrum sample; $\Delta f$ is the spectral resolution; $f_s$ is the starting frequency; $H_t(f)$ is the power spectral density; $g_u$ is the channel gain between the user transmitter and the user receiver; $g_j$ is the channel gain between the jammer and the user receiver; $n(f)$ is the noise; $U_t(f)$ is the user signal; $J_t(f)$ is the interference signal; $a_t^u$ is the user decision; $p_u$ is the power at which the communication signal and the decoy signal are transmitted; $b_u$ is the spectral bandwidth of the communication signal and the decoy signal; $a_t^j$ is the interference decision; $a_t^c$ is the communication decision at time $t$; $f_t^c$ is the communication frequency at time $t$; $a_t^d$ is the decoy decision at time $t$; $f_t^d$ is the decoy frequency at time $t$; $f_t^j$ is the interference selection frequency; and $b_t^j$ is the interference signal bandwidth.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the decoy-assisted covert anti-jamming method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110547565.1A CN113395129B (en) | 2021-05-19 | 2021-05-19 | Decoy-assisted hidden anti-interference method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113395129A CN113395129A (en) | 2021-09-14 |
CN113395129B true CN113395129B (en) | 2023-03-14 |
Family
ID=77618072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110547565.1A Active CN113395129B (en) | 2021-05-19 | 2021-05-19 | Decoy-assisted hidden anti-interference method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113395129B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114172691A (en) * | 2021-11-11 | 2022-03-11 | 南京航空航天大学 | Anti-tracking interference system based on decoy strategy |
CN113890651B (en) * | 2021-11-17 | 2022-08-16 | 北京航空航天大学 | Method for predicting spectrum interference between transmitter and receiver |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111934786A (en) * | 2020-07-30 | 2020-11-13 | 桂林理工大学 | Signal concealment anti-interference method and device based on deep reinforcement learning |
CN111970072A (en) * | 2020-07-01 | 2020-11-20 | 中国人民解放军陆军工程大学 | Deep reinforcement learning-based broadband anti-interference system and anti-interference method |
CN112180331A (en) * | 2020-09-29 | 2021-01-05 | 中国船舶重工集团公司第七二四研究所 | Adaptive radio frequency shielding pulse frequency point strategy scheduling method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200153535A1 (en) * | 2018-11-09 | 2020-05-14 | Bluecom Systems and Consulting LLC | Reinforcement learning based cognitive anti-jamming communications system and method |
2021-05-19: application CN202110547565.1A filed (CN); granted as patent CN113395129B, status Active
Non-Patent Citations (3)
Title |
---|
Fast identification of decoy signals based on correlation-peak count statistics; Liang Xiao et al.; Telecommunication Engineering; 2018-02-28 (No. 02); pp. 1-4 *
Power spectrum analysis of ultra-wideband multiple-access communication signals; Zheng Jiyu et al.; Acta Electronica Sinica; 2003-10-25 (No. 10); pp. 1-3 *
Classification and development of communication electronic jamming; Pang Tianyang et al.; Communications Technology; 2018-10-10 (No. 10); pp. 1-7 *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |