CN113395129B - Decoy-assisted hidden anti-interference method, device and storage medium - Google Patents
- Publication number
- CN113395129B CN113395129B CN202110547565.1A CN202110547565A CN113395129B CN 113395129 B CN113395129 B CN 113395129B CN 202110547565 A CN202110547565 A CN 202110547565A CN 113395129 B CN113395129 B CN 113395129B
- Authority
- CN
- China
- Prior art keywords
- frequency
- communication
- preset
- decoy
- spoofing
- Prior art date
- Legal status
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/40—Jamming having variable characteristics
- H04K3/42—Jamming having variable characteristics characterized by the control of the jamming frequency or wavelength
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/60—Jamming involving special techniques
- H04K3/65—Jamming involving special techniques using deceptive jamming or spoofing, e.g. transmission of false signals for premature triggering of RCIED, for forced connection or disconnection to/from a network or for generation of dummy target signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04K—SECRET COMMUNICATION; JAMMING OF COMMUNICATION
- H04K3/00—Jamming of communication; Counter-measures
- H04K3/80—Jamming or countermeasure characterized by its function
- H04K3/82—Jamming or countermeasure characterized by its function related to preventing surveillance, interception or detection
- H04K3/825—Jamming or countermeasure characterized by its function related to preventing surveillance, interception or detection by jamming
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
- Noise Elimination (AREA)
Abstract
The invention provides a decoy-assisted covert anti-interference method, a device and a storage medium. The method comprises: acquiring the current moment through a user receiver and judging whether it is earlier than a preset moment; if so, randomly selecting a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency, and if not, directly acquiring the communication frequency and the decoy frequency from the user receiver; and obtaining a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and storing the spectrum sample sequence in a preset spectrum sample sequence table. The invention ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal thereby achieves covert communication: the communication avoids the jamming attack and its information cannot be acquired by the jammer. By sacrificing the decoy signal, covert communication is realized, anti-interference performance is improved, and leakage of the user's own information is avoided.
Description
Technical Field
The invention relates generally to the technical field of communication anti-interference, and in particular to a decoy-assisted covert anti-interference method, device and storage medium.
Background
Wireless networks are developing rapidly, and their share of people's daily life and of military communication grows year by year. Wireless networks bring many benefits to life and work, but the openness of their propagation medium also makes them vulnerable to jamming. With the advance of science and technology, intelligent jamming with learning ability seriously affects wireless communication, and countering it has become one of the popular directions for anti-jamming researchers. At present, most anti-jamming techniques are based on an evasion strategy, i.e., avoiding the jamming attack as far as possible. Although such methods currently perform well, they leak the communication user's own information, and as the jammer keeps learning the user's information, the anti-jamming performance drops markedly.
Some researchers have proposed a covert anti-jamming idea: use ambient signals to conceal the user's own communication signal, so that the user's information cannot be acquired by the jammer and the jammer can neither learn it nor apply targeted jamming. This model, however, must assume that strong ambient signals are always present nearby, i.e., that the jammer has a sensing blind area. In reality, ambient signals that can mask the user's information do not exist at all times, so how to achieve covert anti-jamming without relying on the environment is a significant research direction.
Disclosure of Invention
The invention aims to solve the above technical problems of the prior art by providing a decoy-assisted covert anti-interference method, a device and a storage medium.
The technical scheme of the present invention for solving the above technical problems is as follows: a decoy-assisted covert anti-interference method, comprising the following steps:
S1: acquiring the current moment through a user receiver and judging whether the current moment is earlier than a preset moment; if so, executing step S2, and if not, executing step S3;
S2: randomly selecting a communication frequency and a decoy frequency within the range from a preset start frequency to a preset end frequency, and executing step S4;
S3: directly acquiring the communication frequency and the decoy frequency from the user receiver, and executing step S4;
S4: acquiring an interference signal from a jammer, obtaining a spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal, and storing the spectrum sample sequence in a preset spectrum sample sequence table;
S5: judging again whether the current moment is earlier than the preset moment; if so, acquiring the next moment through the user receiver and judging whether that moment is earlier than the preset moment, returning to step S2 if it is and to step S3 otherwise, until the current moment reaches or is later than the preset moment; then taking the preset spectrum sample sequence table as the environment state information of the current moment, fitting the environment state information, the preset start frequency and the preset end frequency based on a deep reinforcement learning network, outputting the initial communication frequency of the current moment and the initial decoy frequency of the current moment, saving the communication frequency in a preset historical communication frequency sequence table, and saving the decoy frequency in a preset historical decoy frequency sequence table;
S6: sending the communication frequency and the decoy frequency to the user receiver, wherein the user receiver uses the communication frequency to control a user transmitter to transmit a communication signal and uses the decoy frequency to control the user transmitter to transmit a decoy signal;
S7: fitting, based on an interference decision evaluation network, the communication frequency in the preset historical communication frequency sequence table and the decoy frequency in the preset historical decoy frequency sequence table, each together with the environment state information; outputting a communication evaluation network error value of the current moment and a decoy evaluation network error value of the current moment; and calculating the instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward of the current moment, and the instantaneous reward of the decoy evaluation network error value to obtain a second instantaneous reward of the current moment;
S8: when the environment state transfers to the environment state of the next moment, taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it in a preset communication experience data set; taking the environment state information, the second instantaneous reward of the current moment and the initial decoy frequency of the current moment as the decoy experience information of the current moment and storing it in a preset decoy experience data set; and returning to step S1 until the numbers of entries stored in the preset communication experience data set and the preset decoy experience data set reach a preset upper limit, then executing step S9;
S9: updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environment state information based on the updated network and outputting an optimized communication frequency; updating the deep reinforcement learning network according to the preset decoy experience data set, fitting the environment state information based on the updated network and outputting an optimized decoy frequency; and sending the optimized communication frequency and the optimized decoy frequency to the user receiver.
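As a hedged illustration only (function names, frequency ranges and the step-count stand-in for the preset moment are all hypothetical), the warm-up logic of steps S1 to S4 (explore randomly before the preset moment, then take the frequencies held by the receiver, and store each observation in the spectrum sample sequence table) might be sketched as:

```python
import random

F_START, F_END = 100.0, 200.0   # preset start/end frequencies (illustrative)
PRESET_STEP = 50                # the "preset moment", expressed here as a step index

def select_frequencies(t, receiver_freqs):
    """S1-S3: before the preset moment, explore randomly (S2);
    afterwards, take the frequencies held by the user receiver (S3)."""
    if t < PRESET_STEP:
        return random.uniform(F_START, F_END), random.uniform(F_START, F_END)
    return receiver_freqs

def accumulate_spectrum_table(num_steps, receiver_freqs=(150.0, 160.0)):
    """S4: combine each step's communication frequency, decoy frequency and the
    observed jamming frequency into an entry of the preset spectrum sample
    sequence table."""
    table = []
    for t in range(num_steps):
        f_c, f_d = select_frequencies(t, receiver_freqs)
        f_jam = random.uniform(F_START, F_END)   # stand-in for the sensed jammer
        table.append((f_c, f_d, f_jam))
    return table

spectrum_table = accumulate_spectrum_table(60)
```

The random phase before `PRESET_STEP` corresponds to the experience accumulation described in the beneficial effects.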
Another technical solution of the present invention for solving the above technical problems is as follows: a decoy-assisted covert anti-interference device, comprising:
the first judging module is used for acquiring the current moment through a user receiver and judging whether the current moment is earlier than a preset moment; if so, it sends a first signal to the random selection module, and if not, it sends a second signal to the frequency acquisition module;
the random selection module is used for randomly selecting a communication frequency and a decoy frequency within the range from a preset initial frequency to a preset termination frequency according to the first signal and sending the randomly selected communication frequency and the decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency acquisition module is used for directly acquiring communication frequency and decoy frequency from the user receiver according to the second signal and sending the acquired communication frequency and decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency spectrum sample value sequence processing module is used for acquiring an interference signal from an interference machine, obtaining a frequency spectrum sample value sequence according to the communication frequency, the decoy frequency and the interference signal, and storing the frequency spectrum sample value sequence into a preset frequency spectrum sample value sequence table;
a final judging module, configured to judge whether the current time is earlier than the preset time again, if so, obtain the next time through a user receiver, and judge whether the next time is earlier than the preset time, if so, return to the random selection module, otherwise, return to the frequency obtaining module, until the current time reaches or is later than the preset time, use the preset frequency spectrum sample value sequence table as environment state information of the current time, perform fitting processing on the environment state information, the preset starting frequency and the preset terminating frequency based on a deep reinforcement learning network, output an initial communication frequency of the current time and an initial spoofing frequency of the current time, store the communication frequency in a preset historical communication frequency sequence table, and store the spoofing frequency in the preset historical spoofing frequency sequence table;
a sending module, configured to send the communication frequency and the spoofing frequency to the user receiver, where the communication frequency is used for the user receiver to control a user transmitter to transmit a communication signal, and the spoofing frequency is used for the user receiver to control the user transmitter to transmit a spoofing signal;
a fitting processing module, configured to perform fitting processing on the communication frequency in the preset historical communication frequency sequence table, the environment state information, and the spoofing frequency in the preset historical spoofing frequency sequence table respectively based on an interference decision evaluation network, output a communication evaluation network error value at the current time and a spoofing evaluation network error value at the current time, and calculate an instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current time, and calculate an instantaneous reward of the spoofing evaluation network error value to obtain a second instantaneous reward at the current time;
the environment state transfer module is used for, when the environment state transfers to the environment state of the next moment, taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it in a preset communication experience data set; taking the environment state information, the second instantaneous reward of the current moment and the initial decoy frequency of the current moment as the decoy experience information of the current moment and storing it in a preset decoy experience data set; and returning to the first judging module until the numbers of entries stored in the preset communication experience data set and the preset decoy experience data set reach a preset upper limit, then sending a third signal to the frequency optimization module;
and the frequency optimization module is used for receiving the third signal, updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized communication frequency, updating the deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized spoofing frequency, and sending the optimized communication frequency and the optimized spoofing frequency to the user receiver.
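The frequency optimization module's update from the experience data sets is a standard deep-reinforcement-learning step; the text does not spell out the loss, so the sketch below substitutes a tabular Q-learning update over discretized frequency bins purely to illustrate the direction of the optimization (all names and constants are assumptions):

```python
NUM_FREQS = 10           # discretized frequency bins (illustrative)
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

def update_from_experience(q_values, experience):
    """Replay-style update over stored (reward, action) tuples collected as in
    the environment state transfer module; a real embodiment fits a deep
    network, and this tabular form only illustrates the update direction."""
    for reward, action in experience:
        best_next = max(q_values)                 # greedy bootstrap value
        td_target = reward + GAMMA * best_next    # temporal-difference target
        q_values[action] += ALPHA * (td_target - q_values[action])
    return q_values

q = [0.0] * NUM_FREQS
q = update_from_experience(q, [(1.0, 3), (0.5, 3), (-1.0, 7)])
optimized_frequency = q.index(max(q))  # the optimized (communication) frequency bin
```

The same update, run over the decoy experience data set, would yield the optimized decoy frequency.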
Another technical solution of the present invention for solving the above technical problems is as follows: a decoy-assisted covert anti-interference device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program, when executed by the processor, implementing the decoy-assisted covert anti-interference method described above.
Another technical solution of the present invention for solving the above technical problems is as follows: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the decoy-assisted covert anti-interference method described above.
The beneficial effects of the invention are as follows: the current moment is acquired through the user receiver; when it is earlier than the preset moment, a communication frequency and a decoy frequency are randomly selected within the range from the preset start frequency to the preset end frequency, accumulating experience and providing data for subsequent processing; when it is equal to or later than the preset moment, the communication frequency and the decoy frequency are directly acquired from the user receiver. This ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal thereby achieves covert communication, avoiding the jamming attack and preventing the user's information from being acquired by the jammer. By sacrificing the decoy signal, covert communication is realized and anti-interference performance is improved while leakage of the user's own information is avoided.
Drawings
FIG. 1 is a flowchart of a decoy-assisted covert anti-interference method according to an embodiment of the present invention;
fig. 2 is a block diagram of a decoy-assisted covert anti-interference device according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic flowchart of a decoy-assisted covert anti-interference method according to an embodiment of the present invention.
As shown in fig. 1, the decoy-assisted covert anti-interference method comprises steps S1 to S9 as set forth above in the Disclosure of Invention.
It should be understood that the interference decision evaluation network refers to a GAN, i.e., a generative adversarial network.
It should be appreciated that the first instantaneous reward can be an instantaneous reward of the communication deep reinforcement learning network, and the second instantaneous reward can be an instantaneous reward of the decoy deep reinforcement learning network.
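The mapping from evaluation network error values to instantaneous rewards is not given explicitly here; one plausible reading, sketched below with hypothetical names, is that a communication frequency the jammer-mimicking evaluation network predicts poorly (large error) is well hidden and earns a high first reward, while a decoy frequency it predicts well (small error) keeps attracting the jamming and earns a high second reward:

```python
def instantaneous_rewards(comm_error, decoy_error):
    """Hypothetical reward shaping (not stated in the text): the communication
    signal should be hard for the jammer-mimicking evaluation network to
    predict, so a larger communication error value earns a larger first
    reward; the decoy signal should stay predictable enough to keep
    attracting the jamming attack, so a smaller decoy error value earns a
    larger second reward."""
    first_reward = comm_error     # reward unpredictability of the communication
    second_reward = -decoy_error  # reward predictability of the decoy
    return first_reward, second_reward

r1, r2 = instantaneous_rewards(comm_error=0.8, decoy_error=0.1)
```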
Specifically, in step S2 the obtained communication frequency and decoy frequency are sent to the user transmitter, and in step S4 the user receiver receives the interference signal transmitted by the jammer as well as the communication signal and decoy signal transmitted by the user transmitter, where the communication signal comprises the communication frequency and the communication bandwidth, and the decoy signal comprises the decoy frequency and the decoy bandwidth.
It should be understood that if the preset moment is set to 10:00, then step S2 is executed when the current moment is 9:58, and step S3 is executed when the current moment is 10:01.
It should be understood that a scenario is considered in which one user and one jammer compete within a communication band of bandwidth B: one pair consisting of the user transmitter and the user receiver constitutes the user, and there is one jammer in the system that jams the user's communication.
It should be understood that the user transmits signals (i.e., the communication signal and the decoy signal); the jammer learns from the sensed environment state, makes a targeted jamming decision, and releases the jamming signal; the user receives the communication signal and decoy signal transmitted by the user transmitter together with the jamming signal, learns from the received environment state information to obtain the corresponding communication frequency and decoy frequency, and the user receiver then transmits control information to the user transmitter so that it transmits the corresponding signals.
It should be appreciated that, since the environment state cannot yet be extended in the time dimension in the early stage of the confrontation, the user begins by randomly selecting frequencies and accumulating experience until the environment state can be extended in the time dimension.
Specifically, considering that the decisions of both confronting parties are related to the environment state over a long past period, the environment state is defined as S_t = {s_t, s_(t-1), ..., s_(t-T+1)}, where s_t is the spectrum sample sequence and T denotes the backtracking time length. The user learns from the environment state information S_t and communicates the result to the user transmitter over a control link.
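The backtracked state S_t is simply a sliding window over the last T spectrum sample sequences; it might be maintained as follows (class and variable names are illustrative):

```python
from collections import deque

T = 4  # backtracking time length

class EnvironmentState:
    """Keeps the most recent T spectrum sample sequences, newest first,
    mirroring S_t = {s_t, s_(t-1), ..., s_(t-T+1)}."""
    def __init__(self, T):
        self.window = deque(maxlen=T)  # oldest entry drops out automatically

    def push(self, spectrum_samples):
        self.window.appendleft(spectrum_samples)  # s_t goes to the front

    def state(self):
        return list(self.window)

env = EnvironmentState(T)
for t in range(6):            # six time steps; only the last T are kept
    env.push([float(t)])      # stand-in spectrum sample sequence s_t
S_t = env.state()
```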
In this embodiment, the current moment is acquired through the user receiver; when it is earlier than the preset moment, a communication frequency and a decoy frequency are randomly selected within the range from the preset start frequency to the preset end frequency, accumulating experience and providing data for subsequent processing; when it is equal to or later than the preset moment, the communication frequency and the decoy frequency are directly acquired from the user receiver. This ensures that the user's decoy signal firmly attracts the jamming attack, so that the user's communication signal achieves covert communication, avoiding the jamming attack and preventing the user's information from being acquired through interference; at the same time, by sacrificing the decoy signal, covert communication is realized, the anti-interference performance is improved, and leakage of the user's own information is avoided.
Optionally, as an embodiment of the present invention, the interference signal comprises an interference selection frequency and an interference signal bandwidth, and obtaining the spectrum sample sequence from the communication frequency, the decoy frequency and the interference signal in step S4 specifically comprises:
calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through a first equation to obtain the spectrum sample sequence, wherein the first equation is as follows:
s_t = {s_(1,t), s_(2,t), ..., s_(n,t)}, with s_(n,t) = ∫ from f_s+(n-1)Δf to f_s+nΔf of H_t(f) df,
wherein H_t(f) = g_u·U_t(f) + g_j·J_t(f) + n(f),
where s_t is the spectrum sample sequence; s_(n,t) is the n-th spectrum sample; Δf is the spectral resolution; f_s is the start frequency; H_t(f) is the power spectral density; g_u is the channel gain between the user transmitter and the user receiver; g_j is the channel gain between the jammer and the user receiver; n(f) is the noise; U_t(f) is the user signal; J_t(f) is the interference signal; a_t^u = {f_c^t, f_d^t} is the user decision; p_u is the power with which the communication signal and the decoy signal are transmitted; b_u is the spectral bandwidth of the communication signal and the decoy signal; a_t^j = {f_j^t, b_j^t} is the interference decision; f_c^t is the communication frequency at time t; f_d^t is the decoy frequency at time t; f_j^t is the interference selection frequency; and b_j^t is the interference signal bandwidth.
Understandably, in the formula, H_t(f) and H_t(f+f_s) differ only in the value substituted for the argument: H_t(f) substitutes f, whereas H_t(f+f_s) substitutes f+f_s.
Specifically, the start and end frequencies of the communication band in which the user and the jammer operate are f_s and f_e. The user can select frequencies f_c and f_d as the communication center frequency and the decoy center frequency respectively, with f_c ∈ [f_s, f_e] and f_d ∈ [f_s, f_e], and transmits the communication signal and the decoy signal with power p_u; the user signal has a spectral bandwidth of b_u. The user's decision considers only the change of frequency, i.e., the communication frequency and the decoy frequency: a_t^u = {f_c^t, f_d^t}, where f_c^t, f_d^t ∈ [f_s, f_e]; the user signal U_t(f) is thus defined as a power spectral density of p_u/b_u on the bands of width b_u centered at f_c^t and f_d^t, and zero elsewhere. The jammer can likewise freely select a frequency f_j within the communication band [f_s, f_e] as the center frequency at which the interference signal is transmitted (i.e., the interference selection frequency), and its signal bandwidth b_j ∈ [b_s, b_h] can change according to the sensed state, where b_s and b_h are the minimum and maximum bandwidths of the interference signal respectively. The interference decision comprises the interference selection frequency and the interference signal bandwidth: a_t^j = {f_j^t, b_j^t}; the interference signal J_t(f) is thus defined as a power spectral density concentrated on the band of width b_j^t centered at f_j^t, and zero elsewhere.
The communicating party is equipped with a sensing device that can sense the spectrum of the entire communication band in real time; considering the coexistence of the user signal and the interference signal, the PSD of the signal at the receiving end is expressed as:
H_t(f) = g_u·U_t(f) + g_j·J_t(f) + n(f)
where g_u denotes the channel gain between the user transmitter and the user receiver, g_j denotes the channel gain from the jammer to the user receiver, and n(f) denotes the noise.
The spectrum sample after discretization is s_(n,t) = ∫ from f_s+(n-1)Δf to f_s+nΔf of H_t(f) df, where Δf denotes the spectral resolution and f_s is the start frequency; the spectrum sample sequence at the user receiving end is s_t = {s_(1,t), s_(2,t), ..., s_(n,t)}.
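A minimal numerical sketch of these discretized samples, assuming rectangular power spectral densities and illustrative values for p_u, b_u, the jammer parameters, channel gains and noise floor (none of which are specified at this point in the text):

```python
def rect_psd(f, center, power, bandwidth):
    """PSD of a signal of total power `power` spread evenly over `bandwidth`."""
    return power / bandwidth if abs(f - center) <= bandwidth / 2 else 0.0

def H(f, f_c, f_d, f_j, p_u=1.0, b_u=2.0, p_j=4.0, b_j=4.0,
      g_u=1.0, g_j=1.0, noise=0.01):
    """H_t(f) = g_u*U_t(f) + g_j*J_t(f) + n(f), with U_t carrying both the
    communication (f_c) and decoy (f_d) components."""
    U = rect_psd(f, f_c, p_u, b_u) + rect_psd(f, f_d, p_u, b_u)
    J = rect_psd(f, f_j, p_j, b_j)
    return g_u * U + g_j * J + noise

def spectrum_samples(f_s, f_e, df, f_c, f_d, f_j, sub=10):
    """Midpoint-rule integration of H over each resolution bin of width df,
    giving s_t = {s_(1,t), ..., s_(n,t)}."""
    n_bins = int(round((f_e - f_s) / df))
    samples = []
    for n in range(n_bins):
        lo = f_s + n * df
        h = df / sub
        s = sum(H(lo + (k + 0.5) * h, f_c, f_d, f_j) for k in range(sub)) * h
        samples.append(s)
    return samples

# decoy and jammer overlap near 6.5; communication sits apart at 2.5
s_t = spectrum_samples(f_s=0.0, f_e=10.0, df=1.0, f_c=2.5, f_d=6.5, f_j=6.5)
```

Bins covered only by noise stay near the noise floor, while the bin where the decoy and the jamming signal overlap carries the largest sample value.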
In this embodiment, the spectrum sample sequence is obtained by calculating the communication frequency, the decoy frequency, the interference selection frequency and the interference signal bandwidth through the first equation, which provides a basis for acquiring the subsequent environment state information, conceals the communication, improves the anti-interference performance and at the same time avoids leakage of the user's own information.
Optionally, as an embodiment of the present invention, the deep reinforcement learning network comprises a communication deep reinforcement learning network and a decoy deep reinforcement learning network, and the process in step S5 of fitting the environment state information, the preset start frequency and the preset end frequency based on the deep reinforcement learning network and outputting the initial communication frequency and the initial decoy frequency of the current moment specifically comprises:
extracting the characteristics of the environment state information based on the communication deep reinforcement learning network, outputting a plurality of initial communication Q values at the current moment, and screening out the maximum communication Q value from the plurality of communication Q values;
selecting a frequency within the range from the preset initial frequency to the preset termination frequency according to the maximum communication Q value to obtain the initial communication frequency at the current moment;
extracting the characteristics of the environment state information based on the spoofing deep reinforcement learning network, outputting a plurality of initial spoofing Q values at the current moment, and screening out the maximum spoofing Q value from the plurality of spoofing Q values;
and selecting the frequency within the range from the preset initial frequency to the preset termination frequency according to the maximum decoy Q value to obtain the initial decoy frequency at the current moment.
It should be understood that the communication Q value represents the value of selecting a different user decision in the environment state S_t, and the spoofing Q value represents the value of selecting a different spoofing decision in the environment state S_t.
Specifically, the user receives the environment state information S_t. According to the communication decision selection probability ε_c the user randomly selects one of the communication frequencies, or with probability 1 − ε_c selects the communication frequency corresponding to the largest communication Q value among the decisions under the environment state S_t. Likewise, according to the decoy decision selection probability ε_d the user randomly selects one of the decoy frequencies, or with probability 1 − ε_d selects the decoy frequency corresponding to the largest decoy Q value among the decisions under the environment state S_t.
In this embodiment, the initial communication frequency at the current moment and the initial spoofing frequency at the current moment are output by fitting the environment state information, the preset starting frequency and the preset terminating frequency with the deep reinforcement learning network, so that the frequencies can be updated; this ensures that the user decoy signal firmly attracts the jamming attack while the user communication signal achieves covert communication, thereby avoiding the jamming attack and obtaining information without interference.
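The ε-greedy frequency selection described above can be sketched as follows; the function and variable names are illustrative, since the patent does not prescribe an implementation:

```python
import random

def select_frequency(q_values, freqs, epsilon):
    """Epsilon-greedy selection: explore a random frequency with probability
    epsilon, otherwise exploit the frequency with the largest Q value."""
    if random.random() < epsilon:
        return random.choice(freqs)
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return freqs[best]

freqs = [1.0, 2.0, 3.0, 4.0]    # candidate center frequencies (MHz)
q_comm = [0.1, 0.9, 0.3, 0.2]   # communication Q values for state S_t
f_c = select_frequency(q_comm, freqs, epsilon=0.0)  # pure exploitation -> 2.0
```

The same routine serves both networks: it is called once with the communication Q values and ε_c, and once with the decoy Q values and ε_d.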
Optionally, as an embodiment of the present invention, the interference decision evaluation network includes a communication interference decision evaluation network and a spoofed interference decision evaluation network; the process of step S7 specifically includes:
performing feature extraction on the communication frequency in the preset historical communication frequency sequence table based on a communication interference decision evaluation network, and outputting a communication fitting environment state at the current moment;
calculating the communication fitting environment state and the environment state information through a second formula to obtain a communication evaluation network error value at the current moment, wherein the second formula is:

e_t^c = ‖S_t − Ŝ_t^c‖²
where e_t^c is the communication evaluation network error value, S_t is the environment state information at time t, and Ŝ_t^c is the communication fitting environment state at time t;
calculating the instantaneous return of the communication evaluation network error value through a third formula to obtain a first instantaneous return at the current moment, wherein the third formula is as follows:
where e_t^c is the communication evaluation network error value and r_t^1 is the first instantaneous reward;
performing feature extraction on the decoy frequency in the preset historical decoy frequency sequence table based on a decoy interference decision evaluation network, and outputting a decoy fitting environment state at the current moment;
calculating the decoy fitting environment state and the environment state information through a fourth formula to obtain a decoy evaluation network error value at the current moment, wherein the fourth formula is:

e_t^d = ‖S_t − Ŝ_t^d‖²
where e_t^d is the decoy evaluation network error value, S_t is the environment state information at time t, and Ŝ_t^d is the decoy fitting environment state at time t;
calculating the instantaneous return of the cheating evaluation network error value through a fifth formula to obtain a second instantaneous return at the current moment, wherein the fifth formula is as follows:
where r_t^2 is the second instantaneous reward and e_t^d is the decoy evaluation network error value.
It should be appreciated that the communication interference decision evaluation network and the spoofing interference decision evaluation network may both be generative adversarial networks.
It should be appreciated that at each iteration, each interference decision evaluation network (i.e. the communication interference decision evaluation network or the spoofing interference decision evaluation network) uses the communication evaluation network loss value and the spoofing evaluation network loss value respectively to update its own evaluation network parameters by back-propagation.
specifically, past communication decision sequences are compared(i.e., the predetermined historical communication frequency sequence list) and past spoofing decision sequences(namely the preset historical spoofing frequency sequence list) respectively input the respective interference decision correlation evaluation network (namely the communication interference decision evaluation network or the spoofing interference decision evaluation network) for feature extraction to obtain the output instantaneous environment stateAndcomputing outputAndand instantaneous environmental stateEstimate network loss value of communication betweenAnd spoofing the evaluation of network loss valuesDefining the first instantaneous reward of the communication deep reinforcement learning network asThe second instantaneous reward of the decoy deep reinforcement learning network is
In this embodiment, the interference decision evaluation networks fit the communication frequencies in the preset historical communication frequency sequence table and the spoofing frequencies in the preset historical spoofing frequency sequence table against the environment state information, and output the first and second instantaneous rewards at the current moment. This ensures that the user decoy signal firmly attracts the jamming attack while the user communication signal achieves covert communication; by sacrificing the user decoy signal, covert communication is achieved, the anti-jamming performance is improved, and leakage of the user's own information is avoided.
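One plausible reading of the reward construction above — rewarding the communication link when the evaluation network fails to fit it (well hidden) and rewarding the decoy when it is easy to fit (it keeps attracting the jamming attack) — can be sketched as follows. The signs and the scaling factors `alpha`/`beta` are assumptions, since the patent text does not reproduce the third and fifth formulas:

```python
def instantaneous_rewards(err_comm, err_decoy, alpha=1.0, beta=1.0):
    """Hypothetical reward shaping: the communication link earns reward when
    the jammer-side evaluation network fails to fit it (large error keeps the
    communication hidden), while the decoy earns reward when it is easy to
    fit (small error keeps attracting the jamming attack)."""
    r1 = alpha * err_comm    # first instantaneous reward (communication)
    r2 = -beta * err_decoy   # second instantaneous reward (decoy)
    return r1, r2

r1, r2 = instantaneous_rewards(err_comm=0.8, err_decoy=0.1)
```

Under this reading, a highly predictable communication pattern (small err_comm) is penalized with a small r1, pushing the learner toward frequencies the jammer cannot anticipate.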
Optionally, as an embodiment of the present invention, the process of step S8 specifically includes:
when the environmental state is transferred to the environmental state of the next moment, executing the step S1 to the step S5 so as to obtain the environmental state information of the next moment;
taking the environmental state information at the current moment, the first instantaneous report at the current moment, the initial communication frequency at the current moment and the environmental state information at the next moment as communication experience information, and storing the communication experience information into a preset communication experience data set, wherein the communication experience information specifically comprises the following steps:
communication experience data set D for storing communication experience information with upper limit of N c ;
where t is the time, S_t is the environment state at time t, S_{t+1} is the environment state at time t+1, a_t^c is the communication decision at time t, f_t^c is the communication frequency at time t, and r_t^1 is the first instantaneous reward at time t;
taking the environmental state information at the current moment, the second instantaneous return at the current moment, the initial trapping frequency at the current moment and the environmental state information at the next moment as trapping experience information, and storing the trapping experience information into a preset trapping experience data set;
when the quantity of the communication experience information stored in the preset communication experience data set and the quantity of the spoofing experience information stored in the preset spoofing experience data set reach the preset upper limit value, extracting the communication experience information from the preset communication experience data set in an equiprobable manner, and updating the weight θ_c of the communication deep reinforcement learning network through the extracted communication experience information;
extracting the decoy experience information from the preset decoy experience data set in an equiprobable manner, and updating the weight θ_d of the decoy deep reinforcement learning network through the extracted decoy experience information.
It should be understood that the preset upper limit value may be N/2.
It should be understood that the communication experience of the user at time t (i.e. the communication experience information) is recorded as the tuple (S_t, f_t^c, r_t^1, S_{t+1}) and stored into the data set D_c.
It is to be understood that N is a positive integer greater than 0.
In the embodiment, the environmental state information at the current moment, the first instantaneous reward at the current moment, the initial communication frequency at the current moment and the environmental state information at the next moment are taken as the communication experience information, and the communication experience information is stored in the preset communication experience data set, so that the user decoy signal can firmly attract the interference attack, the user communication signal can realize covert communication, the purposes of avoiding the interference attack and obtaining the information without interference are realized, meanwhile, the covert communication is realized by sacrificing the user decoy signal, the anti-interference performance is improved, and the leakage of the self information is avoided.
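The experience storage described above follows the standard replay-buffer pattern; a minimal sketch (class and method names are illustrative, not from the patent) might look like:

```python
from collections import deque
import random

class ExperienceBuffer:
    """Fixed-capacity replay buffer: once the upper limit N is reached the
    oldest experiences are discarded, and sampling is equiprobable."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buf.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

D_c = ExperienceBuffer(capacity=1000)   # communication experience data set
D_c.store("S_t", "f_c", 0.5, "S_t+1")   # (state, frequency, reward, next state)
```

The decoy experience data set D_d would be a second instance of the same buffer, filled with (S_t, f_t^d, r_t^2, S_{t+1}) tuples.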
Optionally, as an embodiment of the present invention, the step of using the environmental status information at the current time, the second instantaneous reward at the current time, the initial spoofing frequency at the current time, and the environmental status information at the next time as the spoofing experience information, and storing the spoofing experience information in a preset spoofing experience data set specifically includes:
setting a spoofing experience data set D_d with an upper limit of N for storing the spoofing experience information;
where t is the time, S_t is the environment state at time t, S_{t+1} is the environment state at time t+1, a_t^d is the decoy decision at time t, f_t^d is the spoofing frequency at time t, and r_t^2 is the second instantaneous reward at time t. It should be understood that the user's spoofing experience at time t (i.e. the spoofing experience information) is recorded as the tuple (S_t, f_t^d, r_t^2, S_{t+1}) and stored in the data set D_d.
It is to be understood that N is a positive integer greater than 0.
In the embodiment, the environmental state information at the current moment, the second instantaneous reward at the current moment, the initial trapping frequency at the current moment and the environmental state information at the next moment are taken as the trapping experience information, and the trapping experience information is stored in the preset trapping experience data set, so that the user trapping signals can firmly attract the interference attack, the user communication signals realize the covert communication, the purposes of avoiding the interference attack and obtaining the information without interference are achieved, meanwhile, the covert communication is achieved by sacrificing the user trapping signals, the anti-interference performance is improved, and meanwhile, the leakage of the self information is avoided.
Optionally, as an embodiment of the present invention, the process of extracting the communication experience information from the preset communication experience data set in an equiprobable manner and updating the weight θ_c of the communication deep reinforcement learning network through the extracted communication experience information comprises the following steps:
respectively extracting communication experience information from the preset communication experience data set according to an equal probability mode;
constructing a communication target value according to the extracted communication experience information, wherein the communication target value is:

y_t^c = r_t^1 + γ_c · max Q(S_{t+1}, f_{t+1}^c; θ_c)
where y_t^c is the communication target value, r_t^1 is the first instantaneous reward at time t, γ_c represents the communication reward attenuation factor, max Q(S_{t+1}, f_{t+1}^c; θ_c) is the maximum communication Q value output by the communication deep reinforcement learning network in the environment S_{t+1}, and f_{t+1}^c is the communication frequency at time t+1;
calculating the gradient ∇θ_c of the communication deep reinforcement learning network according to the sixth formula and the communication target value, wherein the sixth formula is:

∇θ_c = E_{S_t}[ (y_t^c − Q(S_t, f_t^c; θ_c)) · ∂Q(S_t, f_t^c; θ_c)/∂θ_c ]
where ∂/∂θ_c denotes the partial derivative, f_t^c is the communication frequency at time t, y_t^c − Q(S_t, f_t^c; θ_c) is the communication error value, E is the expectation, S_t is the environment state at time t, y_t^c is the communication target value, Q(S_t, f_t^c; θ_c) is the communication Q value output by the communication deep reinforcement learning network in the environment S_t, and ∇θ_c is the gradient of the communication deep reinforcement learning network;
updating the communication deep reinforcement learning network along the gradient by using a stochastic gradient descent algorithm to obtain the weight θ_c of the communication deep reinforcement learning network;
repeating until all the communication experience information in the preset communication experience data set is extracted.
Specifically, an experience (i.e. the communication experience information) is randomly extracted with equal probability from the communication experience data set D_c, and the target value y_t^c is constructed from the state, decision and reward in that experience, where the maximum term denotes the maximum Q value the user can obtain in the remembered environment state S_{t+1} (i.e. the maximum communication Q value output by the communication deep reinforcement learning network in the environment S_{t+1}). The error between the target value y_t^c and the true value Q(S_t, f_t^c; θ_c) is then calculated, and the gradient is computed from it according to the sixth formula.
In the embodiment, the communication experience information is extracted from the preset communication experience data set in an equiprobable manner, the weight of the communication depth reinforcement learning network is updated through the extracted communication experience information, the user decoy signal can firmly attract the interference attack, the user communication signal can realize covert communication through the communication experience information, the purposes of avoiding the interference attack and obtaining the information without interference are achieved, meanwhile, the covert communication is achieved through sacrificing the user decoy signal, the anti-interference performance is improved, and meanwhile, the leakage of the information of the user is avoided.
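The target-value construction and gradient step above follow the standard DQN update; the sketch below shows them for a tabular Q function, where the stochastic-gradient step on the squared error reduces to a TD-error correction (a deliberate simplification of the patent's deep network):

```python
import numpy as np

def dqn_target(r_t, q_next, gamma):
    """Target value y_t = r_t + gamma * max_a Q(S_{t+1}, a)."""
    return r_t + gamma * np.max(q_next)

def td_update(q_row, action_idx, target, lr=0.1):
    """One stochastic-gradient step on the squared error (y - Q)^2; for a
    tabular Q this amounts to adding lr * (target - current estimate)."""
    q_row[action_idx] += lr * (target - q_row[action_idx])
    return q_row

q_next = np.array([0.2, 0.7, 0.4])                 # Q values in state S_{t+1}
y = dqn_target(r_t=1.0, q_next=q_next, gamma=0.8)  # 1.0 + 0.8 * 0.7
q_row = td_update(np.zeros(3), action_idx=1, target=y)
```

The decoy network's update in the following embodiment has the same shape, with r_t^2, γ_d and θ_d substituted for the communication quantities.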
Optionally, as an embodiment of the present invention, the process of extracting the decoy experience information from the preset decoy experience data set in an equiprobable manner and updating the weight θ_d of the decoy deep reinforcement learning network through the extracted decoy experience information comprises the following steps:
respectively extracting the spoofing experience information from the preset spoofing experience data set according to an equal probability mode;
constructing a decoy target value according to the extracted decoy experience information, wherein the decoy target value is:

y_t^d = r_t^2 + γ_d · max Q(S_{t+1}, f_{t+1}^d; θ_d)
where γ_d represents the decoy reward attenuation factor, r_t^2 is the second instantaneous reward at time t, y_t^d is the decoy target value, max Q(S_{t+1}, f_{t+1}^d; θ_d) is the maximum decoy Q value output by the decoy deep reinforcement learning network in the environment S_{t+1}, and f_{t+1}^d is the spoofing frequency at time t+1;
calculating the gradient ∇θ_d of the decoy deep reinforcement learning network according to the seventh formula and the decoy target value, wherein the seventh formula is:

∇θ_d = E_{S_t}[ (y_t^d − Q(S_t, f_t^d; θ_d)) · ∂Q(S_t, f_t^d; θ_d)/∂θ_d ]
where ∂/∂θ_d denotes the partial derivative, y_t^d is the decoy target value, y_t^d − Q(S_t, f_t^d; θ_d) is the decoy error value, E is the expectation, S_t is the environment state at time t, Q(S_t, f_t^d; θ_d) is the decoy Q value output by the decoy deep reinforcement learning network in the environment S_t, and ∇θ_d is the gradient of the decoy deep reinforcement learning network;
updating the decoy deep reinforcement learning network along the gradient by using a stochastic gradient descent algorithm to obtain the weight θ_d of the decoy deep reinforcement learning network;
repeating until all the decoy experience information in the preset decoy experience data set is extracted.
Specifically, an experience (i.e. the decoy experience information) is randomly extracted with equal probability from the preset decoy experience data set D_d, and the target value y_t^d is constructed from the state, decision and reward in that experience, where the maximum term denotes the maximum Q value the user can obtain in the remembered environment state S_{t+1} (i.e. the maximum decoy Q value output by the decoy deep reinforcement learning network in the environment S_{t+1}). The error between the target value y_t^d and the true value Q(S_t, f_t^d; θ_d) is then calculated, and the gradient is computed from it according to the seventh formula.
In the embodiment, the spoofing experience information is extracted from the preset spoofing experience data set in an equiprobable manner, the weight of the spoofing depth reinforcement learning network is updated through the extracted spoofing experience information, so that a user spoofing signal can firmly attract interference attack, the user communication signal can realize covert communication, the purposes of avoiding interference attack and obtaining information without interference are achieved, meanwhile, covert communication is realized through sacrificing the user spoofing signal, the anti-interference performance is improved, and meanwhile, the leakage of self information is avoided.
Optionally, as an embodiment of the present invention, the process in step S9 of updating the deep reinforcement learning network according to the preset communication experience data set and outputting an optimized communication frequency by fitting the environment state information with the updated network, and of updating the deep reinforcement learning network according to the preset spoofing experience data set and outputting an optimized spoofing frequency by fitting the environment state information with the updated network, specifically includes:
updating the communication deep reinforcement learning network according to the preset communication experience data set, performing fitting processing on the environment state information based on the updated communication deep reinforcement learning network, and outputting optimized communication frequency;
and updating the spoofing deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated spoofing deep reinforcement learning network, and outputting optimized spoofing frequency.
Specifically, the optimized communication frequency and the optimized spoofing frequency are both obtained by updating the communication decision selection probability ε_c and the decoy decision selection probability ε_d according to the formula ε = max(0.01, ε − Δε), where Δε is the attenuation coefficient of the update step, and then selecting frequencies with the updated probabilities ε_c and ε_d.
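The probability update ε = max(0.01, ε − Δε) stated above can be written directly (function name is illustrative):

```python
def decay_epsilon(epsilon, delta=0.001, floor=0.01):
    """Selection-probability update from the text: eps = max(0.01, eps - delta)."""
    return max(floor, epsilon - delta)

eps_c = 1.0
for _ in range(2000):   # far more decisions than needed to reach the floor
    eps_c = decay_epsilon(eps_c)
# eps_c has decayed to the exploration floor of 0.01
```

Keeping the floor at 0.01 preserves a small amount of random exploration, so neither the communication nor the decoy frequency pattern ever becomes fully deterministic.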
In the embodiment, the optimized communication frequency is output based on the fitting processing of the updated deep reinforcement learning network on the environmental state information, the deep reinforcement learning network is updated according to the preset spoofing experience data set, the optimized spoofing frequency is output based on the fitting processing of the updated deep reinforcement learning network on the environmental state information, and the user spoofing signal can firmly attract the interference attack, so that the user communication signal realizes covert communication, the purposes of avoiding the interference attack and obtaining the information without interference are achieved, meanwhile, the covert communication is realized by sacrificing the user spoofing signal, the anti-interference performance is improved, and meanwhile, the leakage of self information is avoided.
Optionally, as an embodiment of the present invention, before the method is executed, the user initializes the preset communication experience data set D_c and the preset spoofing experience data set D_d, sets the upper limit of both data sets to N and the upper limit of algorithm iterations to M, initializes the communication decision selection probability ε_c, the decoy decision selection probability ε_d and the reward attenuation coefficients γ_c, γ_d, and sets the communication deep reinforcement learning network parameters and their interference decision evaluation network parameters, as well as the spoofing deep reinforcement learning network parameters and their interference decision evaluation network parameters, to random numbers.
Optionally, as an embodiment of the present invention, the user and the jammer compete within a spectrum bandwidth B = 20 MHz, and both sides can freely select the center frequency at which to transmit signals; the user signal bandwidth is b_u = 1 MHz with power p_u = 30 dBm; the interference signal bandwidth b_j ∈ [b_s, b_h] can change according to the environment state, where b_s = 1 MHz, b_h = 3 MHz, and the interference signal power is p_j = 60 dBm. The user and the jammer perform full-band sensing every 1 ms and store the sensed spectrum data for 100 ms, i.e. the backtracking length is T = 100 ms; both sides make a decision every 10 ms, where the user applies the method of the invention to make optimal anti-jamming decisions, and the jammer uses a Q-learning algorithm to learn the sensed user information and apply the best interference. The user data sets are initialized with an upper limit N = 1000, the upper limit of algorithm iterations is M = 10000, the strategy selection probabilities are ε_c = ε_d = 1, the reward attenuation coefficients are γ_c = γ_d = 0.8, and the attenuation coefficient of the update step is Δε = 0.001.
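For reference, the numerical settings stated in this embodiment can be gathered into a single configuration; the dictionary keys and units are editorial, while the values come from the text:

```python
# Simulation parameters stated in this embodiment of the patent.
SIM = {
    "band_MHz": 20.0,         # contested spectrum bandwidth B
    "b_u_MHz": 1.0,           # user signal bandwidth
    "p_u_dBm": 30,            # user signal power
    "b_s_MHz": 1.0,           # minimum interference bandwidth
    "b_h_MHz": 3.0,           # maximum interference bandwidth
    "p_j_dBm": 60,            # interference signal power
    "sense_period_ms": 1,     # full-band sensing interval
    "history_T_ms": 100,      # spectrum backtracking length T
    "decision_period_ms": 10, # decision interval for both sides
    "N": 1000,                # experience data set upper limit
    "M": 10000,               # algorithm iteration upper limit
    "eps_init": 1.0,          # initial selection probabilities
    "gamma": 0.8,             # reward attenuation coefficients
    "delta_eps": 0.001,       # update-step attenuation coefficient
}
spectra_stored = SIM["history_T_ms"] // SIM["sense_period_ms"]  # 100 spectra
```

With sensing every 1 ms over a 100 ms history, the environment state thus stacks 100 spectrum sample sequences per decision.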
The final result shows that the method of the invention has excellent performance when the user uses the method to resist the intelligent interference with learning ability, the user can make a decoy decision to firmly attract the interference attack, and the communication decision can be concealed by the user, thereby not only improving the anti-interference performance, but also ensuring the information safety of the user.
Fig. 2 is a block diagram of a spoofing-assisted covert interference rejection unit according to an embodiment of the present invention.
Optionally, as another embodiment of the present invention, as shown in fig. 2, a decoy-assisted covert anti-jamming device includes:
the first judging module is used for acquiring the current moment through the user receiver and judging whether the current moment is earlier than the preset moment, if so, the first signal is sent to the random selection module, and if not, the second signal is sent to the frequency acquisition module;
the random selection module is used for randomly selecting a communication frequency and a decoy frequency within the range from a preset initial frequency to a preset termination frequency according to the first signal and sending the randomly selected communication frequency and the decoy frequency to the frequency spectrum sample value sequence processing module;
the frequency acquisition module is used for directly acquiring the communication frequency and the spoofing frequency from the user receiver according to the second signal and sending the acquired communication frequency and the spoofing frequency to the frequency spectrum sample value sequence processing module;
the frequency spectrum sample value sequence processing module is used for acquiring an interference signal from an interference machine, obtaining a frequency spectrum sample value sequence according to the communication frequency, the decoy frequency and the interference signal, and storing the frequency spectrum sample value sequence into a preset frequency spectrum sample value sequence table;
a final judging module, configured to judge whether the current time is earlier than the preset time again, if so, obtain the next time through a user receiver, and judge whether the next time is earlier than the preset time, if so, return to the random selection module, otherwise, return to the frequency obtaining module, until the current time reaches or is later than the preset time, use the preset frequency spectrum sample value sequence table as environment state information of the current time, perform fitting processing on the environment state information, the preset starting frequency and the preset terminating frequency based on a deep reinforcement learning network, output an initial communication frequency of the current time and an initial spoofing frequency of the current time, store the communication frequency in a preset historical communication frequency sequence table, and store the spoofing frequency in the preset historical spoofing frequency sequence table;
a sending module, configured to send the communication frequency and the spoofing frequency to the user receiver, where the communication frequency is used for the user receiver to control a user transmitter to transmit a communication signal, and the spoofing frequency is used for the user receiver to control the user transmitter to transmit a spoofing signal;
a fitting processing module, configured to perform fitting processing on the communication frequency in the preset historical communication frequency sequence table, the environment state information, and the spoofing frequency in the preset historical spoofing frequency sequence table respectively based on an interference decision evaluation network, output a communication evaluation network error value at the current time and a spoofing evaluation network error value at the current time, and calculate an instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current time, and calculate an instantaneous reward of the spoofing evaluation network error value to obtain a second instantaneous reward at the current time;
the environment state transfer module is used for taking the environment state information, the first instantaneous reward of the current moment and the initial communication frequency of the current moment as the communication experience information of the current moment when the environment state is transferred to the environment state of the next moment, storing the communication experience information of the current moment into a preset communication experience data set, taking the environment state information, the second instantaneous reward of the current moment and the initial trapping frequency of the current moment as the trapping experience information of the current moment, storing the trapping experience information of the current moment into a preset trapping experience data set, returning to the execution step S1 until the number of the communication experience information stored in the preset communication experience data set and the number of the trapping experience information stored in the preset trapping experience data set reach a preset upper limit value, and sending a third signal into the frequency optimization module;
and the frequency optimization module is used for receiving the third signal, updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized communication frequency, updating the deep reinforcement learning network according to the preset spoofing experience data set, fitting the environmental state information based on the updated deep reinforcement learning network, outputting an optimized spoofing frequency, and sending the optimized communication frequency and the optimized spoofing frequency to the user receiver.
Optionally, another embodiment of the present invention provides a spoof-assisted covert interference rejection device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the spoof-assisted covert interference rejection method as described above. The device may be a computer or the like.
Optionally, another embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, which, when executed by a processor, implements the decoy-assisted covert interference rejection method as described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents, improvements, and the like that fall within the spirit and principle of the present invention are intended to be included within its scope.
Claims (9)
1. A decoy-assisted covert anti-jamming method, characterized by comprising the following steps:
S1: acquiring the current moment through a user receiver, and judging whether the current moment is earlier than a preset moment; if so, executing step S2, and if not, executing step S3;
S2: randomly selecting a communication frequency and a decoy frequency within the range from a preset starting frequency to a preset terminating frequency, and executing step S4;
S3: directly acquiring the communication frequency and the decoy frequency from the user receiver, and executing step S4;
S4: acquiring an interference signal from a jammer, obtaining a spectrum sample sequence according to the communication frequency, the decoy frequency, and the interference signal, and storing the spectrum sample sequence into a preset spectrum sample sequence table;
S5: judging again whether the current moment is earlier than the preset moment; if so, acquiring the next moment through the user receiver and judging whether that next moment is earlier than the preset moment, returning to step S2 if it is and to step S3 otherwise; once the current moment reaches or is later than the preset moment, taking the preset spectrum sample sequence table as the environmental state information of the current moment, fitting the environmental state information, the preset starting frequency, and the preset terminating frequency based on a deep reinforcement learning network, outputting the initial communication frequency of the current moment and the initial decoy frequency of the current moment, saving the communication frequency in a preset historical communication frequency sequence table, and saving the decoy frequency in a preset historical decoy frequency sequence table;
S6: sending the communication frequency and the decoy frequency to the user receiver, wherein the communication frequency is used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency is used by the user receiver to control the user transmitter to transmit a decoy signal;
S7: fitting the communication frequency in the preset historical communication frequency sequence table, the environmental state information, and the decoy frequency in the preset historical decoy frequency sequence table respectively, based on an interference decision evaluation network; outputting a communication evaluation network error value at the current moment and a decoy evaluation network error value at the current moment; calculating the instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current moment; and calculating the instantaneous reward of the decoy evaluation network error value to obtain a second instantaneous reward at the current moment;
S8: when the environmental state transitions to the environmental state of the next moment, taking the environmental state information, the first instantaneous reward of the current moment, and the initial communication frequency of the current moment as the communication experience information of the current moment and storing it into a preset communication experience data set; taking the environmental state information, the second instantaneous reward of the current moment, and the initial decoy frequency of the current moment as the decoy experience information of the current moment and storing it into a preset decoy experience data set; returning to step S1 until both the number of communication experience entries stored in the preset communication experience data set and the number of decoy experience entries stored in the preset decoy experience data set reach a preset upper limit, and then executing step S9;
S9: updating the deep reinforcement learning network according to the preset communication experience data set, fitting the environmental state information based on the updated network, and outputting an optimized communication frequency; updating the deep reinforcement learning network according to the preset decoy experience data set, fitting the environmental state information based on the updated network, and outputting an optimized decoy frequency; and sending the optimized communication frequency and the optimized decoy frequency to the user receiver;
wherein the interference signal includes an interference selection frequency and an interference signal bandwidth, and in step S4 the process of obtaining the spectrum sample sequence according to the communication frequency, the decoy frequency, and the interference signal specifically includes:
calculating the communication frequency, the decoy frequency, the interference selection frequency, and the interference signal bandwidth through a first formula to obtain the spectrum sample sequence, the first formula being:
$$s_t = \{s_{1,t}, s_{2,t}, \dots, s_{n,t}\}, \qquad s_{n,t} = \int_{f_s+(n-1)\Delta f}^{f_s+n\Delta f} H_t(f)\,\mathrm{d}f,$$

wherein $H_t(f) = g_u U_t(f) + g_j J_t(f) + n(f)$,

where $s_t$ is the spectrum sample sequence; $s_{n,t}$ is the $n$-th spectrum sample; $\Delta f$ is the spectral resolution; $f_s$ is the starting frequency; $H_t(f)$ is the power spectral density; $g_u$ is the channel gain between the user transmitter and the user receiver; $g_j$ is the channel gain between the jammer and the user receiver; $n(f)$ is the noise; $U_t(f)$ is the user signal; $J_t(f)$ is the interference signal; $a_t^u$ is the user decision; $p_u$ is the power at which the communication signal and the decoy signal are transmitted; $b_u$ is the spectral bandwidth of the communication signal and the decoy signal; $a_t^j$ is the interference decision; $a_t^c$ is the communication decision at time $t$; $f_t^c$ is the communication frequency at time $t$; $a_t^d$ is the decoy decision at time $t$; $f_t^d$ is the decoy frequency at time $t$; $f_t^j$ is the interference selection frequency; and $b_t^j$ is the interference signal bandwidth.
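As an illustrative aside (not part of the claims): the spectrum sampling of the first formula can be sketched numerically. Everything below is a hypothetical construction for illustration — the grid limits, gains, rectangular signal spectra, and all names such as `spectrum_samples` are assumptions, not values from the patent.

```python
import numpy as np

def spectrum_samples(f_c, f_d, f_j, b_j, f_s=0.0, f_e=20.0, delta_f=0.1,
                     p_u=1.0, b_u=0.2, g_u=1.0, g_j=1.0, noise_psd=1e-3):
    """Sketch of the first formula: sample the power spectral density
    H_t(f) = g_u*U_t(f) + g_j*J_t(f) + n(f) on a grid of resolution delta_f.

    f_c, f_d : communication and decoy centre frequencies (user decision)
    f_j, b_j : jammer centre frequency and bandwidth (interference decision)
    """
    f = np.arange(f_s, f_e, delta_f)                 # frequency grid
    U = p_u * ((np.abs(f - f_c) <= b_u / 2)          # communication signal
               | (np.abs(f - f_d) <= b_u / 2))       # decoy signal
    J = (np.abs(f - f_j) <= b_j / 2).astype(float)   # jamming signal
    H = g_u * U + g_j * J + noise_psd                # power spectral density
    return H * delta_f                               # s_{n,t}: power per bin

# Decoy placed on the jammed frequency, communication link elsewhere:
s_t = spectrum_samples(f_c=3.0, f_d=7.0, f_j=7.0, b_j=1.0)
```

In this toy run, the bin around 7 MHz-equivalent carries both decoy and jamming power, while the bin around 3 carries only the communication signal — the kind of spectrum snapshot the method stores in its sample sequence table.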
2. The decoy-assisted covert anti-jamming method according to claim 1, wherein the deep reinforcement learning network comprises a communication deep reinforcement learning network and a decoy deep reinforcement learning network, and the step S5 of fitting the environmental state information, the preset starting frequency and the preset terminating frequency based on the deep reinforcement learning network and outputting the initial communication frequency at the current time and the initial decoy frequency at the current time specifically comprises:
extracting features of the environmental state information based on the communication deep reinforcement learning network, outputting a plurality of initial communication Q values at the current moment, and screening out the maximum communication Q value from the plurality of communication Q values;
selecting a frequency within the range from the preset starting frequency to the preset terminating frequency according to the maximum communication Q value, to obtain the initial communication frequency at the current moment;
extracting features of the environmental state information based on the decoy deep reinforcement learning network, outputting a plurality of initial decoy Q values at the current moment, and screening out the maximum decoy Q value from the plurality of decoy Q values;
and selecting a frequency within the range from the preset starting frequency to the preset terminating frequency according to the maximum decoy Q value, to obtain the initial decoy frequency at the current moment.
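A minimal sketch of the greedy selection in claim 2, assuming the network emits one Q value per candidate frequency on a uniform grid; the function name and the linear index-to-frequency mapping are illustrative assumptions, and exploration (e.g. epsilon-greedy) is omitted.

```python
import numpy as np

def select_frequency(q_values, f_start, f_stop):
    """Pick the index of the maximum Q value and map it onto the
    [f_start, f_stop] range, one candidate frequency per Q output."""
    idx = int(np.argmax(q_values))                       # maximum Q value
    n = len(q_values)
    return f_start + idx * (f_stop - f_start) / (n - 1)  # chosen frequency

comm_q = np.array([0.1, 0.9, 0.3, 0.2])    # e.g. communication Q values
decoy_q = np.array([0.4, 0.2, 0.2, 0.8])   # e.g. decoy Q values
f_c = select_frequency(comm_q, 0.0, 30.0)  # initial communication frequency
f_d = select_frequency(decoy_q, 0.0, 30.0) # initial decoy frequency
```

The same routine serves both networks: each screens out its own maximum Q value and converts it to a frequency in the preset range.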
3. The decoy-assisted covert anti-jamming method according to claim 1, wherein the interference decision evaluation network comprises a communication interference decision evaluation network and a decoy interference decision evaluation network, and the process of step S7 specifically includes:
performing feature extraction on the communication frequency in the preset historical communication frequency sequence table based on a communication interference decision evaluation network, and outputting a communication fitting environment state at the current moment;
calculating the communication fitting environment state and the environmental state information by a second formula to obtain the communication evaluation network error value at the current moment, the second formula being:

$$\delta_t^c = \left\lVert S_t - \hat{S}_t^c \right\rVert^2,$$

where $\delta_t^c$ is the communication evaluation network error value, $S_t$ is the environmental state information at time $t$, and $\hat{S}_t^c$ is the communication fitting environment state at time $t$;
calculating the instantaneous reward from the communication evaluation network error value $\delta_t^c$ through a third formula to obtain the first instantaneous reward $r_t^c$ at the current moment;
performing feature extraction on the decoy frequency in the preset historical decoy frequency sequence table based on a decoy interference decision evaluation network, and outputting a decoy fitting environment state at the current moment;
calculating the decoy fitting environment state and the environmental state information through a fourth formula to obtain the decoy evaluation network error value at the current moment, the fourth formula being:

$$\delta_t^d = \left\lVert S_t - \hat{S}_t^d \right\rVert^2,$$

where $\delta_t^d$ is the decoy evaluation network error value, $S_t$ is the environmental state information at time $t$, and $\hat{S}_t^d$ is the decoy fitting environment state at time $t$;
and calculating the instantaneous reward from the decoy evaluation network error value $\delta_t^d$ through a fifth formula to obtain the second instantaneous reward $r_t^d$ at the current moment.
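The error values of the second and fourth formulas compare the jammer-side fitted state with the true state. The exact reward mappings (third and fifth formulas) are not legible in this text, so the sketch below merely assumes one plausible sign convention — the communication link is rewarded when the jammer's prediction error is large (hard to track), the decoy when it is small (easy to lure) — and that assumption, like all names here, is illustrative only.

```python
import numpy as np

def eval_error(true_state, fitted_state):
    """Second/fourth formula sketch: squared error between the real
    spectrum state S_t and the evaluation network's fitted state."""
    true_state = np.asarray(true_state, dtype=float)
    fitted_state = np.asarray(fitted_state, dtype=float)
    return float(np.sum((true_state - fitted_state) ** 2))

# Hypothetical reward mappings -- NOT the patent's third/fifth formulas:
def comm_reward(err):   # covert link: reward a LARGE jammer prediction error
    return err

def decoy_reward(err):  # decoy link: reward a SMALL jammer prediction error
    return -err

S_t = [0.0, 1.0, 0.0]                      # toy spectrum state
err_c = eval_error(S_t, [0.0, 0.0, 0.0])   # jammer missed the user entirely
r1 = comm_reward(err_c)                    # first instantaneous reward
```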
4. The decoy-assisted covert anti-jamming method according to claim 2, wherein the process of step S8 specifically comprises:
when the environmental state transitions to the environmental state of the next moment, executing steps S1 to S5 to obtain the environmental state information of the next moment;
taking the environmental state information at the current moment, the first instantaneous reward at the current moment, the initial communication frequency at the current moment, and the environmental state information at the next moment as the communication experience information, and storing the communication experience information into the preset communication experience data set, specifically:
constructing a communication experience data set $D_c$ with a storage upper limit of $N$ for storing the communication experience information, each entry being the tuple $(S_t, f_t^c, r_t^c, S_{t+1})$,
where $t$ is the time, $S_t$ is the environmental state at time $t$, $S_{t+1}$ is the environmental state at time $t+1$, $a_t^c$ is the communication decision at time $t$, $f_t^c$ is the communication frequency at time $t$, and $r_t^c$ is the first instantaneous reward at time $t$;
taking the environmental state information at the current moment, the second instantaneous reward at the current moment, the initial decoy frequency at the current moment, and the environmental state information at the next moment as the decoy experience information, and storing the decoy experience information into the preset decoy experience data set;
and when the number of communication experience entries stored in the preset communication experience data set and the number of decoy experience entries stored in the preset decoy experience data set both reach the preset upper limit, extracting communication experience information from the preset communication experience data set with equal probability, and updating the weight $\theta^c$ of the communication deep reinforcement learning network with the extracted communication experience information.
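The experience sets $D_c$ and $D_d$ of claim 4 behave like standard replay buffers: a fixed upper limit, tuples of (state, action frequency, reward, next state), and equal-probability sampling once full. A minimal sketch, with hypothetical class and method names:

```python
import random
from collections import deque

class ExperienceSet:
    """Sketch of the claim-4 experience data sets D_c / D_d: fixed upper
    limit N, tuples (S_t, f_t, r_t, S_{t+1}), equal-probability sampling."""
    def __init__(self, upper_limit):
        self.buf = deque(maxlen=upper_limit)   # oldest entries are dropped

    def store(self, s, f, r, s_next):
        self.buf.append((s, f, r, s_next))

    def full(self):
        return len(self.buf) == self.buf.maxlen

    def sample(self, k):
        return random.sample(self.buf, k)      # uniform, without replacement

D_c = ExperienceSet(upper_limit=3)
for t in range(3):
    D_c.store(s=[t], f=2.0 + t, r=0.5, s_next=[t + 1])
```

One such set would be kept per network (communication and decoy), and training begins only once both report `full()`.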
5. The decoy-assisted covert anti-jamming method according to claim 4, wherein the process of taking the environmental state information at the current moment, the second instantaneous reward at the current moment, the initial decoy frequency at the current moment, and the environmental state information at the next moment as the decoy experience information and storing it in the preset decoy experience data set specifically comprises:
setting up and storing upper limit N pieces of decoy experience informationDecoy experience data set D of d ;
6. The decoy-assisted covert anti-jamming method according to claim 4, wherein the process of extracting the communication experience information from the preset communication experience data set with equal probability and updating the weight $\theta^c$ of the communication deep reinforcement learning network with the extracted communication experience information comprises the following steps:
extracting communication experience information from the preset communication experience data set with equal probability;
constructing a communication target value from the extracted communication experience information, the communication target value being:

$$y_t^c = r_t^c + \gamma_c \max_{f_{t+1}^c} Q\left(S_{t+1}, f_{t+1}^c; \theta^c\right),$$

where $y_t^c$ is the communication target value, $r_t^c$ is the first instantaneous reward at time $t$, $\gamma_c$ is the communication reward attenuation factor, $\max_{f_{t+1}^c} Q(S_{t+1}, f_{t+1}^c; \theta^c)$ is the maximum communication Q value output by the communication deep reinforcement learning network in environment $S_{t+1}$, and $f_{t+1}^c$ is the communication frequency at time $t+1$;
calculating the gradient $\nabla_{\theta^c}$ of the communication deep reinforcement learning network according to the sixth formula and the communication target value, the sixth formula being:

$$\nabla_{\theta^c} L^c = \mathbb{E}\left[\left(y_t^c - Q\left(S_t, f_t^c; \theta^c\right)\right)\nabla_{\theta^c} Q\left(S_t, f_t^c; \theta^c\right)\right],$$

where $\partial/\partial\theta^c$ denotes the partial derivative, $f_t^c$ is the communication frequency at time $t$, $L^c$ is the communication error value, $\mathbb{E}$ is the expectation, $S_t$ is the environmental state at time $t$, $y_t^c$ is the communication target value, $Q(S_t, f_t^c; \theta^c)$ is the communication Q value output by the communication deep reinforcement learning network in environment $S_t$, and $\nabla_{\theta^c}$ is the gradient of the communication deep reinforcement learning network;
updating the communication deep reinforcement learning network along this gradient using a stochastic gradient descent algorithm to obtain the weight $\theta^c$ of the communication deep reinforcement learning network;
and repeating the above until all the communication experience information in the preset communication experience data set has been extracted.
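The target value and gradient step of claim 6 follow the familiar Q-learning pattern. A toy sketch with a linear Q function standing in for the deep network — the parameterisation, learning rate, and all names are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

def q_values(theta, state):
    """Toy linear Q-network: one Q value per candidate frequency (row)."""
    return theta @ state

def dqn_update(theta, s, a, r, s_next, gamma=0.9, lr=0.1):
    """Sketch of claim 6: build the target y = r + gamma * max_a' Q(s', a'),
    then take one stochastic gradient step on the squared error
    (y - Q(s, a))^2 with respect to the weights theta."""
    y = r + gamma * np.max(q_values(theta, s_next))  # communication target
    q_sa = q_values(theta, s)[a]                     # current Q estimate
    td_error = y - q_sa                              # target minus estimate
    grad = np.zeros_like(theta)
    grad[a] = s                                      # dQ(s, a)/dtheta_a = s
    return theta + lr * td_error * grad              # one SGD step

theta = np.zeros((2, 3))            # 2 candidate frequencies x 3 features
s = np.array([1.0, 0.0, 0.0])       # toy spectrum state
theta2 = dqn_update(theta, s, a=0, r=1.0, s_next=s)
```

Claim 7 applies the same update with the decoy reward, attenuation factor, and weights in place of the communication ones.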
7. The decoy-assisted covert anti-jamming method according to claim 5, wherein the process of extracting the decoy experience information from the preset decoy experience data set with equal probability and updating the weight $\theta^d$ of the decoy deep reinforcement learning network with the extracted decoy experience information comprises the following steps:
extracting decoy experience information from the preset decoy experience data set with equal probability;
constructing a decoy target value from the extracted decoy experience information, the decoy target value being:

$$y_t^d = r_t^d + \gamma_d \max_{f_{t+1}^d} Q\left(S_{t+1}, f_{t+1}^d; \theta^d\right),$$

where $\gamma_d$ is the decoy reward attenuation factor, $r_t^d$ is the second instantaneous reward at time $t$, $y_t^d$ is the decoy target value, $\max_{f_{t+1}^d} Q(S_{t+1}, f_{t+1}^d; \theta^d)$ is the maximum decoy Q value output by the decoy deep reinforcement learning network in environment $S_{t+1}$, and $f_{t+1}^d$ is the decoy frequency at time $t+1$;
calculating the gradient $\nabla_{\theta^d}$ of the decoy deep reinforcement learning network according to the seventh formula and the decoy target value, the seventh formula being:

$$\nabla_{\theta^d} L^d = \mathbb{E}\left[\left(y_t^d - Q\left(S_t, f_t^d; \theta^d\right)\right)\nabla_{\theta^d} Q\left(S_t, f_t^d; \theta^d\right)\right],$$

where $\partial/\partial\theta^d$ denotes the partial derivative, $y_t^d$ is the decoy target value, $L^d$ is the decoy error value, $\mathbb{E}$ is the expectation, $S_t$ is the environmental state at time $t$, $Q(S_t, f_t^d; \theta^d)$ is the decoy Q value output by the decoy deep reinforcement learning network in environment $S_t$, and $\nabla_{\theta^d}$ is the gradient of the decoy deep reinforcement learning network;
updating the decoy deep reinforcement learning network along this gradient using a stochastic gradient descent algorithm to obtain the weight $\theta^d$ of the decoy deep reinforcement learning network;
and repeating the above until all the decoy experience information in the preset decoy experience data set has been extracted.
8. A decoy-assisted covert anti-jamming device, characterized by comprising:
a first judging module, configured to acquire the current moment through a user receiver and judge whether the current moment is earlier than a preset moment; if so, send a first signal to the random selection module, and if not, send a second signal to the frequency acquisition module;
the random selection module, configured to randomly select a communication frequency and a decoy frequency within the range from a preset starting frequency to a preset terminating frequency according to the first signal, and send the randomly selected communication frequency and decoy frequency to the spectrum sample sequence processing module;
the frequency acquisition module, configured to directly acquire the communication frequency and the decoy frequency from the user receiver according to the second signal, and send the acquired communication frequency and decoy frequency to the spectrum sample sequence processing module;
the spectrum sample sequence processing module, configured to acquire an interference signal from a jammer, obtain a spectrum sample sequence according to the communication frequency, the decoy frequency, and the interference signal, and store the spectrum sample sequence into a preset spectrum sample sequence table;
a final judging module, configured to judge again whether the current moment is earlier than the preset moment; if so, acquire the next moment through the user receiver and judge whether that next moment is earlier than the preset moment, returning to the random selection module if it is and to the frequency acquisition module otherwise; once the current moment reaches or is later than the preset moment, take the preset spectrum sample sequence table as the environmental state information of the current moment, fit the environmental state information, the preset starting frequency, and the preset terminating frequency based on a deep reinforcement learning network, output the initial communication frequency of the current moment and the initial decoy frequency of the current moment, store the communication frequency in a preset historical communication frequency sequence table, and store the decoy frequency in a preset historical decoy frequency sequence table;
a sending module, configured to send the communication frequency and the decoy frequency to the user receiver, wherein the communication frequency is used by the user receiver to control a user transmitter to transmit a communication signal, and the decoy frequency is used by the user receiver to control the user transmitter to transmit a decoy signal;
a fitting processing module, configured to fit the communication frequency in the preset historical communication frequency sequence table, the environmental state information, and the decoy frequency in the preset historical decoy frequency sequence table respectively, based on an interference decision evaluation network; output a communication evaluation network error value at the current moment and a decoy evaluation network error value at the current moment; calculate the instantaneous reward of the communication evaluation network error value to obtain a first instantaneous reward at the current moment; and calculate the instantaneous reward of the decoy evaluation network error value to obtain a second instantaneous reward at the current moment;
an environmental state transition module, configured to, when the environmental state transitions to the environmental state of the next moment, take the environmental state information, the first instantaneous reward of the current moment, and the initial communication frequency of the current moment as the communication experience information of the current moment and store it into a preset communication experience data set; take the environmental state information, the second instantaneous reward of the current moment, and the initial decoy frequency of the current moment as the decoy experience information of the current moment and store it into a preset decoy experience data set; return to the first judging module until both the number of communication experience entries stored in the preset communication experience data set and the number of decoy experience entries stored in the preset decoy experience data set reach a preset upper limit, and then send a third signal to the frequency optimization module;
the frequency optimization module, configured to receive the third signal; update the deep reinforcement learning network according to the preset communication experience data set, fit the environmental state information based on the updated network, and output an optimized communication frequency; update the deep reinforcement learning network according to the preset decoy experience data set, fit the environmental state information based on the updated network, and output an optimized decoy frequency; and send the optimized communication frequency and the optimized decoy frequency to the user receiver;
wherein the interference signal comprises an interference selection frequency and an interference signal bandwidth, and the spectrum sample sequence processing module is specifically configured to:
calculate the communication frequency, the decoy frequency, the interference selection frequency, and the interference signal bandwidth through a first formula to obtain the spectrum sample sequence, the first formula being:
$$s_t = \{s_{1,t}, s_{2,t}, \dots, s_{n,t}\}, \qquad s_{n,t} = \int_{f_s+(n-1)\Delta f}^{f_s+n\Delta f} H_t(f)\,\mathrm{d}f,$$

wherein $H_t(f) = g_u U_t(f) + g_j J_t(f) + n(f)$,

where $s_t$ is the spectrum sample sequence; $s_{n,t}$ is the $n$-th spectrum sample; $\Delta f$ is the spectral resolution; $f_s$ is the starting frequency; $H_t(f)$ is the power spectral density; $g_u$ is the channel gain between the user transmitter and the user receiver; $g_j$ is the channel gain between the jammer and the user receiver; $n(f)$ is the noise; $U_t(f)$ is the user signal; $J_t(f)$ is the interference signal; $a_t^u$ is the user decision; $p_u$ is the power at which the communication signal and the decoy signal are transmitted; $b_u$ is the spectral bandwidth of the communication signal and the decoy signal; $a_t^j$ is the interference decision; $a_t^c$ is the communication decision at time $t$; $f_t^c$ is the communication frequency at time $t$; $a_t^d$ is the decoy decision at time $t$; $f_t^d$ is the decoy frequency at time $t$; $f_t^j$ is the interference selection frequency; and $b_t^j$ is the interference signal bandwidth.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the decoy-assisted covert anti-jamming method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110547565.1A CN113395129B (en) | 2021-05-19 | 2021-05-19 | Decoy-assisted hidden anti-interference method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113395129A CN113395129A (en) | 2021-09-14 |
CN113395129B true CN113395129B (en) | 2023-03-14 |
Family
ID=77618072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110547565.1A Active CN113395129B (en) | 2021-05-19 | 2021-05-19 | Decoy-assisted hidden anti-interference method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113395129B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114172691A (en) * | 2021-11-11 | 2022-03-11 | 南京航空航天大学 | Anti-tracking interference system based on decoy strategy |
CN113890651B (en) * | 2021-11-17 | 2022-08-16 | 北京航空航天大学 | Method for predicting spectrum interference between transmitter and receiver |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111934786A (en) * | 2020-07-30 | 2020-11-13 | 桂林理工大学 | Signal concealment anti-interference method and device based on deep reinforcement learning |
CN111970072A (en) * | 2020-07-01 | 2020-11-20 | 中国人民解放军陆军工程大学 | Deep reinforcement learning-based broadband anti-interference system and anti-interference method |
CN112180331A (en) * | 2020-09-29 | 2021-01-05 | 中国船舶重工集团公司第七二四研究所 | Adaptive radio frequency shielding pulse frequency point strategy scheduling method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200153535A1 (en) * | 2018-11-09 | 2020-05-14 | Bluecom Systems and Consulting LLC | Reinforcement learning based cognitive anti-jamming communications system and method |
2021-05-19: application CN202110547565.1A filed (CN); granted as patent CN113395129B, status Active
Non-Patent Citations (3)
Title |
---|
Fast identification of decoy signals based on correlation-peak count statistics; Liang Xiao et al.; Telecommunication Engineering; 2018-02-28 (No. 02); pp. 1-4 *
Power spectrum analysis of ultra-wideband multiple-access communication signals; Zheng Jiyu et al.; Acta Electronica Sinica; 2003-10-25 (No. 10); pp. 1-3 *
Classification and development of communication electronic jamming; Pang Tianyang et al.; Communications Technology; 2018-10-10 (No. 10); pp. 1-7 *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |