CN115150963A - Multi-user scene-oriented distributed interference avoidance method and device - Google Patents

Multi-user scene-oriented distributed interference avoidance method and device

Publication number
CN115150963A
Authority
CN
China
Prior art keywords
utility function
time slot
local
data transmission
legal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211076124.9A
Other languages
Chinese (zh)
Other versions
CN115150963B (en)
Inventor
张姣
雷婵
赵海涛
陈海涛
魏急波
周力
刘月玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202211076124.9A
Publication of CN115150963A
Application granted
Publication of CN115150963B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B17/00 Monitoring; Testing
    • H04B17/30 Monitoring; Testing of propagation channels
    • H04B17/309 Measuring or estimating channel quality parameters
    • H04B17/345 Interference values
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B17/00 Monitoring; Testing
    • H04B17/30 Monitoring; Testing of propagation channels
    • H04B17/382 Monitoring; Testing of propagation channels for resource allocation, admission control or handover

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application relates to a distributed interference avoidance method and device for a multi-user scenario. In the method, each legitimate user obtains local observation information of the current time slot by sensing the electromagnetic environment, and inputs that information together with its action in the previous time slot into an agent network to obtain its local utility function. During training, a central controller with a hybrid network fuses all local utility functions into a joint utility function; the strategies are repeatedly evaluated and adjusted under this centrally assisted training until an optimal joint strategy is obtained, which yields the optimal interference avoidance strategy of each legitimate user. Each legitimate user then makes its communication decisions independently according to its optimal interference avoidance strategy and transmits data. With this method, legitimate users can cope simultaneously with malicious interference from external jammers and mutual interference among internal users without relying on centralized deployment, and normal communication is achieved.

Description

Multi-user-scene-oriented distributed interference avoidance method and device
Technical Field
The application relates to the technical field of intelligent wireless communication, in particular to a distributed interference avoidance method and device for a multi-user scene.
Background
With the continuous development of communication technology, the communication of users in a network is susceptible to external electromagnetic interference; in addition, mutual interference readily occurs when multiple users in the network communicate without negotiation, which disrupts their normal communication. It is therefore imperative that users deployed in a distributed manner be able to identify and avoid electromagnetic interference autonomously, that is, that they have an intelligent interference avoidance capability.
Traditional solutions to electromagnetic interference in multi-user scenarios rely on centralized allocation or on negotiation between users. Centralized allocation has several disadvantages. In a practical scenario there is no guarantee that a base station or similar facility exists to act as the central controller; even if one exists, overall scheduling by the base station is quite complex, and the information exchange between the base station and the users incurs substantial communication and computation cost, which is unsuitable for a resource-limited wireless communication network. It also inevitably introduces a large processing delay, which makes it hard to cope with a harsh, time-varying electromagnetic interference environment. In addition, once the center is destroyed, the users' use of wireless resources becomes disordered, causing interference and communication failure. Negotiation between users also has disadvantages. Negotiation is usually limited to adjacent users within communication range, which increases the overall processing delay of the network; because the network environment is complex, the communication strategy obtained may not be optimal, so interference during user communication cannot be completely eliminated; and potential security problems in the negotiation itself must also be taken into account.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a distributed interference avoidance method and apparatus for a multi-user scenario that enable multiple legitimate users to communicate normally in a harsh and dynamically changing communication environment.
A distributed interference avoidance method for a multi-user scenario, the method comprising:
a legitimate user obtains local observation information of the current time slot by sensing the electromagnetic environment; the local observation information comprises the state information of the legitimate user and the channel usage information in the electromagnetic environment;
each legitimate user inputs the local observation information of the current time slot and the action of the previous time slot into an agent network for calculation to obtain a local utility function of each legitimate user; the action of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback result received for the data transmission;
all legitimate users input their local utility functions into a hybrid network in a central controller for fusion to obtain a joint utility function, and the joint utility function is trained and updated according to a utility function approximation algorithm to obtain an optimal joint utility function; wherein the optimal joint utility function comprises an optimal joint strategy;
and the central controller distributes the optimal joint strategy to each legitimate user to obtain an optimal interference avoidance strategy corresponding to each legitimate user, and the legitimate users autonomously decide on a communication channel according to the optimal interference avoidance strategy to carry out data transmission.
In one embodiment, the step of inputting, by each legitimate user, the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain the local utility function of each legitimate user includes:
and each legal user inputs the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain a local utility function of each legal user, wherein the local utility function comprises a historical observation action set of the intelligent agent network, the action of the current time slot and the intelligent agent network strategy.
In one embodiment, the action input of the previous time slot is obtained by selecting an access channel for data transmission according to local observation information of the previous time slot and making a decision according to a feedback result received by the data transmission, and the action input of the previous time slot includes:
selecting an access channel for data transmission according to the local observation information of the previous time slot to obtain a data transmission result;
analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result;
and making a decision according to the feedback result to obtain the action of the last time slot.
In one embodiment, the data transmission result comprises a data transmission success result and a data transmission failure result;
analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result, wherein the feedback result comprises the following steps:
analyzing the communication condition of the access channel according to the successful data transmission result to obtain a positive feedback result;
and analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result or a non-feedback result.
In one embodiment, analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result or a non-feedback result includes:
analyzing the communication condition of the access channel according to the data transmission failure result, and obtaining a negative feedback result when the communication condition is interfered by other legal users; when the communication condition is interfered by the jammer, a non-feedback result is obtained.
In one embodiment, the making a decision according to the feedback result to obtain the action of the previous time slot includes:
making a decision according to the positive feedback result to obtain the action of the last time slot as selecting an access channel for communication;
and making a decision according to the negative feedback and non-feedback results to obtain the action of the previous time slot, namely selecting other channels for communication.
In one embodiment, the step in which all legitimate users input their local utility functions into a hybrid network in the central controller for fusion to obtain a joint utility function includes:
all legitimate users input their local utility functions into a hybrid network in the central controller for fusion to obtain a joint utility function; the joint utility function comprises the historical observation-action sets of all the agent networks, the action set of the current time slot, and a joint strategy;
and coordinating the relationship between the joint utility function and the local utility function according to the hyper-network in the hybrid network, so that the monotonicity between the joint utility function and the local utility function of each legal user is met.
In one embodiment, coordinating the relationship between the joint utility function and the local utility function according to the hyper-network in the hybrid network, so that the joint utility function and the local utility function of each legitimate user satisfy monotonicity, includes:
inputting the global environment state into the hyper-network in the hybrid network for calculation to obtain a bias parameter and a non-negative weight parameter of the hybrid network;
and coordinating the relationship between the joint utility function and the local utility function according to the bias parameter and the non-negative weight parameter, so that the monotonicity between the joint utility function and the local utility function of each legal user is met.
In one embodiment, training and updating the joint utility function according to a utility function approximation algorithm to obtain an optimal joint utility function includes:
inputting the joint utility function and the target network utility function into a pre-constructed loss function for calculation, and training and updating the joint utility function by minimizing the loss function to obtain an optimal joint utility function, wherein the optimal joint utility function comprises an optimal joint strategy.
A distributed interference avoidance apparatus facing a multi-user scene comprises:
the perception module is used for a legal user to obtain local observation information of the current time slot by perceiving the electromagnetic environment; the local observation information comprises the state information of a legal user and the channel use information in the electromagnetic environment;
the training module is used for each legitimate user to input the local observation information of the current time slot and the action of the previous time slot into the agent network for calculation to obtain a local utility function of each legitimate user; the action of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback result received for the data transmission; all legitimate users input their local utility functions into a hybrid network in the central controller for fusion to obtain a joint utility function, and the joint utility function is trained and updated according to a utility function approximation algorithm to obtain an optimal joint utility function; wherein the optimal joint utility function comprises an optimal joint strategy;
and the interference avoidance module is used for the central controller to distribute the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user, and for the legitimate users to autonomously decide on a communication channel according to the optimal interference avoidance strategy to perform data transmission.
According to the distributed interference avoidance method and device for the multi-user scenario, a central controller with a hybrid network is used during training: each legitimate user passes its perceived local observation information and its action through an agent network to obtain a local utility function, all legitimate users input their local utility functions into the central controller to be fused into a joint utility function, and under centrally assisted training the strategies are repeatedly evaluated and adjusted until the optimal joint strategy, and hence the optimal interference avoidance strategy of every legitimate user, is obtained. In actual execution, a legitimate user no longer relies on the central controller and independently completes its communication decisions and data transmission according to the currently perceived local observation information and its optimal interference avoidance strategy. With this centralized-training, decentralized-execution approach, legitimate users can cope simultaneously with malicious interference from an external jammer and mutual interference among internal users in a harsh and dynamically changing wireless communication environment, without depending on centralized deployment, and normal communication is achieved.
Drawings
FIG. 1 is an application scenario diagram of a distributed interference avoidance method for a multi-user scenario in one embodiment;
FIG. 2 is a flowchart illustrating a distributed interference avoidance method for a multi-user scenario in one embodiment;
FIG. 3 is a schematic diagram of the time slot structure in which a legitimate user and a jammer operate in one embodiment;
FIG. 4 is a block diagram of the proposed method in one embodiment;
FIG. 5 is a graph comparing the interference avoidance performance of the proposed method with Q-learning and the ideal solution in one embodiment;
FIG. 6 is a graph of the interference avoidance performance of the proposed method for different network sizes in one embodiment;
FIG. 7 is a time-frequency diagram of users without training and learning by the proposed method in one embodiment;
FIG. 8 is a time-frequency diagram of users after training and learning by the proposed method in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The distributed interference avoidance method for the multi-user scenario can be applied to the communication network scenario shown in FIG. 1. The communication network contains a jammer and $N$ pairs of legitimate users, each pair comprising a transmitter and a receiver, denoted as $\mathcal{N} = \{1, 2, \ldots, N\}$. There are $K$ available channels in space, denoted as $\mathcal{K} = \{1, 2, \ldots, K\}$; the set of channels the jammer can interfere with is the same as the set of communication channels of the legitimate users.
While legitimate users communicate, the jammer operates continuously and switches among channels in an attempt to interfere with their data transmission. Specifically, the invention assumes that the jammer interferes with only one channel per time slot and that its interference pattern is sweep-frequency interference, which is unknown to the legitimate users. Because the communication network contains multiple pairs of legitimate users, mutual interference among users also occurs during data transmission.
In an embodiment, as shown in fig. 2, a distributed interference avoidance method for a multi-user scenario is provided, and is described by taking an application scenario in fig. 1 as an example, the method includes the following steps:
Step 202, a legitimate user obtains local observation information of the current time slot by sensing the electromagnetic environment; the local observation information comprises the state information of the legitimate user and the channel usage information in the electromagnetic environment.
It can be understood that a legitimate user mainly senses frequency-domain information in the electromagnetic environment in order to determine whether a channel is free, that is, neither interfered with by the jammer nor used by another user. Because of the limitations of user hardware and the complexity of the environment, a user can perceive only local observation information and cannot acquire global information.
Step 204, inputting the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation by each legal user to obtain a local utility function of each legal user; and the action input of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to a feedback result received by the data transmission.
It can be understood that the agent network adopts a deep recurrent Q-network (DRQN) structure, which can use the entire observation-action history to represent the current state and thus copes effectively with the partial observability faced by each legitimate user. Agent networks correspond one-to-one with legitimate users: the input of each agent network is its user's local observation information of the current time slot and its action in the previous time slot, and the output is that user's local utility function.
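The input/output contract of such an agent network can be sketched as follows. This is a minimal illustration, not the patented network: it uses a plain tanh recurrent cell instead of the GRU or LSTM cell a DRQN would normally use, the weights are random and untrained, and the class name `RecurrentAgent`, the layer sizes, and the one-hot encoding of the previous action are all assumptions.

```python
import numpy as np

class RecurrentAgent:
    """Toy recurrent utility network: (observation, previous action) -> one utility per channel."""

    def __init__(self, obs_dim, n_channels, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + n_channels  # observation plus one-hot previous action (assumed encoding)
        self.n_channels = n_channels
        self.W_in = rng.normal(0.0, 0.1, (hidden_dim, in_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.1, (n_channels, hidden_dim))
        self.h = np.zeros(hidden_dim)  # hidden state summarising the observation-action history

    def reset(self):
        """Clear the history at the start of an episode."""
        self.h = np.zeros_like(self.h)

    def step(self, obs, last_action):
        """Consume one slot's local observation and the previous action; return local utilities."""
        one_hot = np.zeros(self.n_channels)
        one_hot[last_action] = 1.0
        x = np.concatenate([np.asarray(obs, dtype=float), one_hot])
        self.h = np.tanh(self.W_in @ x + self.W_h @ self.h)
        return self.W_out @ self.h
```

Because the hidden state carries the history, feeding the same observation twice generally gives different utilities, which is the property that lets the network compensate for purely local observations.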
Step 206, all legitimate users input their local utility functions into a hybrid network in the central controller for fusion to obtain a joint utility function, and the joint utility function is trained and updated according to a utility function approximation algorithm to obtain an optimal joint utility function; wherein the optimal joint utility function comprises an optimal joint strategy.
The hybrid network provided by the invention is a nonlinear network. The local utility functions of all legitimate users are fused by the hybrid network into a joint utility function, which is used to evaluate the quality of the actions of all legitimate users. The joint utility function is trained and updated according to a utility function approximation algorithm to obtain the optimal joint utility function, so as to maximize the multi-user interference avoidance performance.
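For orientation, the temporal-difference loss commonly used to train a QMIX-style joint utility function against a target network is written out below. This is the standard textbook form, given as an assumption rather than the loss defined in this application; the batch size $B$, reward $r_k$, discount factor $\gamma$, network parameters $\theta$, and target-network parameters $\theta^{-}$ are symbols introduced here for illustration.

```latex
\mathcal{L}(\theta) = \sum_{k=1}^{B} \left( y_k - Q_{tot}(\boldsymbol{\tau}_k, \boldsymbol{a}_k, s_k; \theta) \right)^2,
\qquad
y_k = r_k + \gamma \max_{\boldsymbol{a}'} Q_{tot}(\boldsymbol{\tau}'_k, \boldsymbol{a}', s'_k; \theta^{-})
```

Minimizing $\mathcal{L}(\theta)$ moves the joint utility function toward the target value $y_k$ computed by the periodically updated target network.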
Step 208, the central controller distributes the optimal joint strategy to each legitimate user to obtain an optimal interference avoidance strategy corresponding to each legitimate user, and the legitimate users autonomously decide on a communication channel according to the optimal interference avoidance strategy to perform data transmission.
The distributed interference avoidance method for the multi-user scenario is based on multi-agent reinforcement learning (QMIX). The central controller distributes the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user. With these strategies, legitimate users can autonomously learn the interference pattern of the jammer and avoid mutual interference among themselves without a center and without mutual negotiation, so that normal communication is achieved.
In the distributed interference avoidance method for the multi-user scenario, a central controller with a hybrid network is deployed during training: each legitimate user passes its perceived local observation information and its action through an agent network to obtain a local utility function, all legitimate users input their local utility functions into the central controller to be fused into a joint utility function, and under centrally assisted training the strategies are repeatedly evaluated and adjusted until the optimal interference avoidance strategy of each legitimate user is obtained. In actual execution, a legitimate user no longer relies on the central controller and independently completes its communication decisions and data transmission according to the currently perceived local observation information and its optimal interference avoidance strategy. With this centralized-training, decentralized-execution approach, legitimate users can cope simultaneously with malicious interference from an external jammer and mutual interference among internal users in a harsh and dynamically changing wireless communication environment, without depending on centralized deployment, and normal communication is achieved.
In one embodiment, the step of inputting, by each legal user, the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain the local utility function of each legal user includes:
for any agent $i$ among the $N$ legitimate users, the legitimate user inputs the local observation information of the current time slot, $o_i^t$, and the action of the previous time slot, $a_i^{t-1}$, into the corresponding agent network for calculation to obtain its local utility function $Q_i(\tau_i, a_i^t)$, wherein the local utility function comprises the historical observation-action set $\tau_i$ of the agent network, the action $a_i^t$ at the current time slot $t$, and the agent network policy $\pi_i$.
It can be understood that the local utility function is used for evaluating the quality of the actions of the legal users under the current policy, and when the maximum value of each local utility function is obtained, the optimal distributed policy of each legal user is obtained.
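Choosing the action that maximizes a user's local utility can be sketched as below. The function name `select_channel` and the `epsilon` exploration parameter are assumptions added for illustration; the source only states that the maximum of the local utility function gives the user's optimal distributed policy.

```python
import random

def select_channel(local_utilities, epsilon=0.0, rng=random):
    """Return the index of the channel to access, given one user's local utility values.

    With probability `epsilon` (an assumed training-time exploration rate) a random
    channel is tried; otherwise the channel with the highest utility is chosen.
    """
    n_channels = len(local_utilities)
    if rng.random() < epsilon:
        return rng.randrange(n_channels)
    return max(range(n_channels), key=lambda k: local_utilities[k])
```

During decentralized execution one would call it with `epsilon=0.0`, so each user acts greedily on its own utilities without consulting the central controller.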
In one embodiment, the action input of the previous time slot is obtained by selecting an access channel for data transmission according to local observation information of the previous time slot and making a decision according to a feedback result received by the data transmission, and the action input of the previous time slot includes:
selecting an access channel for data transmission according to the local observation information of the last time slot to obtain a data transmission result;
analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result;
and making a decision according to the feedback result to obtain the action of the last time slot.
It can be understood that, as shown in FIG. 3, the jammer operates continuously while legitimate users communicate. The jammer performs sweep-frequency interference over the channels slot by slot: it selects one channel at the start of each time slot and keeps that channel unchanged for the whole slot. If the jammed channel is the communication channel of a legitimate user, the interference succeeds; otherwise, the interference is ineffective.
While the jammer switches among channels to interfere with the data transmission of legitimate users, a user does not know the state of channels it has not accessed; it learns the actual interference on a channel only after transmitting data on it, and it also does not know the decisions of the other users in the network. The user therefore obtains its channel access action by going through sensing, data transmission, learning and analysis, and decision in sequence, so as to avoid accessing a channel that the jammer is interfering with.
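The slot-level interaction described above can be sketched as a small simulation helper. The ascending sweep order, the `start` offset, and the rule that jamming takes priority over a collision on the same channel are assumptions made for illustration; the source specifies only that the jammer sweeps and occupies one channel per slot.

```python
from collections import Counter

def jammed_channel(slot, n_channels, start=0):
    """Channel hit by a sweeping jammer in the given slot (one channel per slot, assumed ascending order)."""
    return (start + slot) % n_channels

def slot_outcomes(choices, jammed):
    """Classify every user's transmission in one slot.

    choices: list of channel indices, one per legitimate user.
    jammed:  index of the channel the jammer occupies in this slot.
    Returns a list containing 'jammed', 'collision' or 'success' for each user.
    """
    load = Counter(choices)
    outcomes = []
    for channel in choices:
        if channel == jammed:
            outcomes.append("jammed")      # assumed to take priority over collisions
        elif load[channel] > 1:
            outcomes.append("collision")   # another legitimate user picked the same channel
        else:
            outcomes.append("success")
    return outcomes
```

For example, with four channels the jammer in slot 5 sits on channel 1, and users choosing channels `[0, 2, 2, 3]` while channel 0 is jammed experience one jammed transmission, two collisions, and one success.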
In one embodiment, the data transmission result comprises a data transmission success result and a data transmission failure result;
analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result, wherein the feedback result comprises the following steps:
analyzing the communication condition of the access channel according to the successful data transmission result to obtain a positive feedback result;
and analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result or a non-feedback result.
In one embodiment, analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result or a non-feedback result includes:
analyzing the communication condition of the access channel according to the data transmission failure result, and obtaining a negative feedback result when the communication is interfered with by other legitimate users; when the communication is interfered with by the jammer, a non-feedback result is obtained.
It can be understood that data transmission either succeeds or fails. Failure is mainly caused by failed channel sensing and access, so that the transmission is interfered with by the jammer or collides with the data transmission of another legitimate user.
It can be understood that the data transmission result yields one of three feedback results: when the data transmission succeeds, the user receives positive ACK feedback, indicating that it was not interfered with; when the user receives negative NACK feedback, it was interfered with by other users during data transmission; when the user receives no feedback, it was maliciously interfered with by the jammer. The latter two cases both indicate a data transmission failure.
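The three feedback cases can be written as a small mapping. The enum and function names are hypothetical, and the choice that jamming masks a simultaneous collision (so that no feedback is received) is an assumption, since the source does not say what happens when both occur in the same slot.

```python
from enum import Enum

class Feedback(Enum):
    ACK = "ack"      # positive feedback: transmission succeeded
    NACK = "nack"    # negative feedback: collided with another legitimate user
    NONE = "none"    # no feedback at all: channel was hit by the jammer

def feedback_for(jammed: bool, collided: bool) -> Feedback:
    """Map what happened on the accessed channel to the feedback the user observes."""
    if jammed:
        return Feedback.NONE   # assumed to dominate when both causes are present
    if collided:
        return Feedback.NACK
    return Feedback.ACK

def transmission_failed(feedback: Feedback) -> bool:
    """NACK and missing feedback both indicate a failed data transmission."""
    return feedback is not Feedback.ACK
```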
In one embodiment, the making a decision according to the feedback result to obtain the action of the previous time slot includes:
making a decision according to the positive feedback result, wherein the action of obtaining the last time slot is to select an access channel for communication;
and making a decision according to the negative feedback and non-feedback results to obtain the action of the previous time slot, namely selecting other channels for communication.
It can be understood that the user identifies the network environment situation according to the received feedback result, and decides how to adjust the interference avoidance strategy of the next time slot.
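As a rule-of-thumb illustration of this decision step, the sketch below keeps the current channel after positive feedback and moves to a different one otherwise. Picking the new channel uniformly at random is a placeholder assumption; in the method itself the learned interference avoidance strategy determines which channel to select. The function name and the string feedback labels are hypothetical.

```python
import random

def next_channel(current, feedback, n_channels, rng=random):
    """Decide the channel for the next slot from the feedback on the current one.

    feedback: "ack" for positive feedback; anything else ("nack" or None)
              is treated as a failed transmission.
    Assumes n_channels >= 2 so that an alternative channel exists.
    """
    if feedback == "ack":
        return current                      # keep the channel that worked
    others = [ch for ch in range(n_channels) if ch != current]
    return rng.choice(others)               # placeholder for the learned policy
```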
In one embodiment, all legal users inputting the local utility function into a hybrid network in the central controller for fusion to obtain a joint utility function includes:
the network framework, composed of the intelligent agent networks deployed in the legal users and the hybrid network and hyper-network deployed in the central controller, is shown in FIG. 4; all legal users input their local utility functions into the hybrid network in the central controller for fusion to obtain a joint utility function, wherein the joint utility function comprises the historical observation-action set of all intelligent agent networks, the action set of the current time slot, and the joint strategy;
coordinating the relationship between the joint utility function and the local utility functions according to the hyper-network in the hybrid network, so that the joint utility function and the local utility function of each legal user satisfy monotonicity, that is, the joint utility function is non-decreasing in each local utility function.
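For readers without access to the formula images, the monotonicity condition can be written as follows. The notation is an assumption borrowed from common value-decomposition practice and is not taken from the patent: Q_tot denotes the joint utility function, Q_n the local utility function of the n-th legal user, τ the observation-action histories, a the actions, and N the number of legal users.

```latex
\frac{\partial Q_{\mathrm{tot}}(\boldsymbol{\tau}, \boldsymbol{a})}{\partial Q_{n}(\tau_{n}, a_{n})} \ge 0,
\qquad \forall\, n \in \{1, \dots, N\}
```

Under this condition, an action that raises any single user's local utility cannot lower the joint utility, which is why each user can later decide locally.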
In one embodiment, coordinating the relationship between the joint utility function and the local utility function according to the hyper-network in the hybrid network, so that the joint utility function and the local utility function of each legal user satisfy monotonicity, includes:
inputting the global environment state into the hyper-network in the hybrid network for calculation to obtain the bias parameter b and the non-negative weight parameter w of the hybrid network;
and coordinating the relationship between the joint utility function and the local utility function according to the bias parameter b and the non-negative weight parameter w, so that the monotonicity between the joint utility function and the local utility function of each legal user is met.
It can be appreciated that the hyper-network is deployed within the hybrid network, and that coordinating the relationship between the joint utility function and the local utility function improves the convergence speed of the hybrid network.
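A minimal NumPy sketch may help show why non-negative weights give monotonicity. It makes several assumptions that are not in the text: the hyper-network is reduced to a single linear map, the hybrid network to a single weighted sum, non-negativity is enforced with an absolute value, and the names `hypernet_params` and `mix` are hypothetical. The networks in the patent may well be deeper.

```python
import numpy as np


def hypernet_params(state, W_w, W_b):
    """Toy linear hyper-network: global state -> (non-negative weights w, bias b)."""
    w = np.abs(W_w @ state)  # absolute value keeps every weight non-negative
    b = float(W_b @ state)   # the bias is unconstrained
    return w, b


def mix(local_q, w, b):
    """Joint utility as a non-negative weighted sum of local utilities plus a bias."""
    return float(w @ local_q + b)


rng = np.random.default_rng(0)
n_users, state_dim = 3, 4
W_w = rng.normal(size=(n_users, state_dim))  # hypothetical hyper-network parameters
W_b = rng.normal(size=state_dim)
state = rng.normal(size=state_dim)           # stand-in for the global environment state

w, b = hypernet_params(state, W_w, W_b)
q = np.array([0.2, -0.1, 0.5])               # local utilities of three users
q_up = q + np.array([0.0, 0.3, 0.0])         # raise only the second user's utility
joint_before, joint_after = mix(q, w, b), mix(q_up, w, b)
```

Because every entry of `w` is non-negative, `joint_after` can never be smaller than `joint_before`, whatever the state and whichever user's local utility is raised.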
In one embodiment, training and updating the joint utility function according to a utility function approximation algorithm to obtain an optimal joint utility function includes:
inputting the joint utility function and the target network utility function into a pre-constructed loss function for calculation, wherein the pre-constructed loss function is defined over the following quantities: the batch size, which is the number of samples drawn during training; the global reward under the i-th sampled batch, which is the sum of the instant rewards of all intelligent agent networks; and the target network utility function, which is interpreted as the utility function value obtained by evaluation when, in a given user state, an action is performed according to the policy in combination with the historical observation-action information, and which is used to ensure the stable operation of the algorithm during training;
training and updating the joint utility function by minimizing the loss function to obtain an optimal joint utility function, wherein the optimal joint utility function comprises an optimal joint strategy.
During execution, the central controller distributes the optimal joint strategy to each legal user to obtain the optimal interference avoidance strategy corresponding to each legal user, and each legal user autonomously decides a communication channel according to the optimal interference avoidance strategy to carry out data transmission.
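The loss function is given only as an image in the source. A form that is consistent with the quantities described (batch size, global reward, target network utility function) is the squared temporal-difference error commonly used in value-decomposition methods. The expression below is a sketch under that assumption; the symbols B (batch size), r_i (global reward), γ (discount factor), θ (parameters of the trained networks) and θ⁻ (parameters of the target network) are illustrative and not taken from the patent.

```latex
L(\theta) = \sum_{i=1}^{B} \left( y_i^{\mathrm{tot}} - Q_{\mathrm{tot}}(\boldsymbol{\tau}, \boldsymbol{a}, s; \theta) \right)^{2},
\qquad
y_i^{\mathrm{tot}} = r_i + \gamma \, Q_{\mathrm{tot}}(\boldsymbol{\tau}', \boldsymbol{a}', s'; \theta^{-})
```

Here a' is the action performed in the next state s' according to the policy. If the policy is greedy, the second term becomes a maximum over a'. Keeping θ⁻ fixed between periodic updates is what gives the training stability that the text attributes to the target network.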
It can be understood that the invention adopts the idea of centralized training and decentralized execution, comprising two stages: offline training and online execution. In the offline training stage, the central controller collects the observation, action and reward information of all agents, trains the optimal interference avoidance strategy, and issues it to the users. The online execution stage does not require the participation of the central controller: each legal user inputs its own perception information locally and, having learned the optimal interference avoidance strategy, autonomously outputs and executes decision information, which ensures the intelligent anti-interference capability of the legal users.
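The online execution stage can be illustrated with a short sketch. It assumes that each user's trained intelligent agent network outputs one local utility value per channel and that the user acts greedily on those values; the function name `select_channel` and the numbers are hypothetical.

```python
import numpy as np


def select_channel(local_q_values: np.ndarray) -> int:
    """Greedy local decision: pick the channel with the highest local utility."""
    return int(np.argmax(local_q_values))


# Hypothetical local utilities of two legal users over six channels.
q_user_1 = np.array([0.1, 0.7, 0.2, 0.0, 0.3, 0.4])
q_user_2 = np.array([0.5, 0.1, 0.2, 0.9, 0.3, 0.0])

# Each user decides from its own values only; no central controller is involved.
choices = [select_channel(q) for q in (q_user_1, q_user_2)]
```

When the joint utility is monotonic in every local utility, these independent greedy choices also maximize the joint utility, so no coordination message is needed at execution time.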
Furthermore, the interference avoidance performance of the method provided by the invention is compared experimentally with that of Q-learning and of an ideal scheme. The experiment considers two pairs of legal users, one jammer and six channels; each iteration comprises 100 rounds of training, and each round comprises 60 time slots of interactive learning. The performance comparison is shown in FIG. 5, where the abscissa is the number of iterations and the ordinate is the normalized reward value. As can be seen from FIG. 5, the performance of the method of the invention rises steadily as training proceeds, then levels off, and converges after about 220 training iterations; the resulting performance is significantly better than that of the centralized Q-learning method, and the converged value closely matches the performance of the ideal scheme. The result shows that the invention provides effective interference avoidance and gives users the ability to cope autonomously with malicious interference from the jammer and with mutual interference among users.
The interference avoidance performance of the method is also verified experimentally at different network scales. The number of channels is again fixed at six, and the number of legal users is two, three and four pairs respectively; the more legal users there are, the larger the network scale and the more complex the communication environment. The verification result is shown in FIG. 6, where the abscissa is the number of iterations and the ordinate is the average global reward value per time slot. As can be seen from FIG. 6, in all three scenarios the performance converges within a limited number of iterations, which demonstrates the effectiveness and applicability of the interference avoidance of the invention across different network scales and communication environment complexities.
In addition, the time-frequency result of users before training with the method of the invention is compared with that of users after training. The time-frequency result before training is shown in FIG. 7, which shows the operation of three pairs of legal users and one jammer on six channels. The abscissa is the tested time slot and the ordinate is the channel ID; the differently colored grid blocks correspond to the channels used by the legal users and the jammer. It can be seen that the jammer carries out sweep-frequency interference in real time and that most legal users cannot communicate normally: they are interfered with by the jammer, by other users, or by both at once. This indicates that, before training with the method of the invention, users are frequently interfered with and their interference avoidance capability is very poor. The time-frequency result after training is shown in FIG. 8. It can be seen that, even though the network environment is complex and changing and the users do not negotiate when accessing channels, with the method of the invention the legal users avoid the jammer's interference entirely on their own, only occasional mutual interference remains, and normal communication is achieved in most cases. This result further confirms the effectiveness of the invention.
It should be understood that, although the steps in the flowchart of FIG. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, a distributed interference avoidance apparatus for a multi-user scenario is provided, including a perception module, a training module and an interference avoidance module, wherein:
the perception module is used for a legal user to obtain local observation information of the current time slot by perceiving the electromagnetic environment; the local observation information comprises the state information of a legal user and the channel use information in the electromagnetic environment.
The training module is used for each legal user to input the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain a local utility function of each legal user; the action input of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback result received for the data transmission; all legal users input the local utility function into a hybrid network in the central controller for fusion to obtain a joint utility function, and the joint utility function is trained and updated according to a utility function approximation algorithm to obtain an optimal joint utility function; wherein the optimal joint utility function comprises an optimal joint strategy.
The interference avoidance module is used for the central controller to distribute the optimal joint strategy to each legal user to obtain the optimal interference avoidance strategy corresponding to each legal user, and for the legal users to autonomously decide a communication channel according to the optimal interference avoidance strategy to perform data transmission.
For the specific limitations of the distributed interference avoidance apparatus for a multi-user scenario, refer to the limitations of the distributed interference avoidance method for a multi-user scenario above; details are not repeated here. Each module in the distributed interference avoidance apparatus may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the corresponding operations.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A distributed interference avoidance method facing a multi-user scene is characterized by comprising the following steps:
a legal user obtains local observation information of the current time slot by sensing an electromagnetic environment; the local observation information comprises the state information of a legal user and the channel use information in the electromagnetic environment;
each legal user inputs the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain a local utility function of each legal user; the action input of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to a feedback result received by the data transmission;
all legal users input the local utility function into a hybrid network in a central controller for fusion to obtain a joint utility function, and the joint utility function is trained and updated according to a utility function approximation algorithm to obtain an optimal joint utility function; wherein the optimal joint utility function comprises an optimal joint strategy;
and the central controller distributes the optimal joint strategy to each legal user to obtain an optimal interference avoidance strategy corresponding to each legal user, and the legal users autonomously decide a communication channel according to the optimal interference avoidance strategy to carry out data transmission.
2. The method of claim 1, wherein each legal user inputting the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain a local utility function of each legal user comprises:
and each legal user inputs the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain a local utility function of each legal user, wherein the local utility function comprises a historical observation action set of the intelligent agent network, the action of the current time slot and the intelligent agent network strategy.
3. The method of claim 1, wherein the action input of the previous time slot is obtained by selecting an access channel for data transmission according to local observation information of the previous time slot and making a decision according to a feedback result received from the data transmission, and the method comprises:
selecting an access channel for data transmission according to the local observation information of the last time slot to obtain a data transmission result;
analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result;
and making a decision according to the feedback result to obtain the action of the last time slot.
4. The method of claim 3, wherein the data transmission result comprises a data transmission success result and a data transmission failure result;
analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result, wherein the feedback result comprises the following steps:
analyzing the communication condition of the access channel according to the successful data transmission result to obtain a positive feedback result;
and analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result and a non-feedback result.
5. The method of claim 4, wherein analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result and a non-feedback result comprises:
analyzing the communication condition of the access channel according to the data transmission failure result, and obtaining a negative feedback result when the communication condition is interfered by other legal users; when the communication condition is interfered by the jammer, a non-feedback result is obtained.
6. The method of claim 5, wherein the act of making a decision based on the feedback result to obtain a last timeslot comprises:
making a decision according to the positive feedback result to obtain the action of the last time slot, namely selecting the access channel for communication;
and making a decision according to the negative feedback and non-feedback results to obtain the action of the last time slot, namely selecting other channels for communication.
7. The method of claim 1, wherein all legal users inputting the local utility function into a hybrid network in a central controller for fusion to obtain a joint utility function comprises:
all legal users input the local utility function into a hybrid network in a central controller for fusion to obtain a joint utility function; the joint utility function comprises the historical observation action sets of all intelligent agent networks, the action set of the current time slot, and the joint strategy;
and coordinating the relationship between the joint utility function and the local utility function according to the hyper-network in the hybrid network, so that the monotonicity between the joint utility function and the local utility function of each legal user is met.
8. The method of claim 7, wherein coordinating the relationship between the joint utility function and the local utility function according to a hyper-network in the hybrid network such that the joint utility function and the local utility function of each legitimate user satisfy monotonicity, comprises:
inputting a global environment state into a hyper-network in the hybrid network for calculation to obtain a bias parameter and a nonnegative weight parameter of the hybrid network;
and coordinating the relationship between the joint utility function and the local utility function according to the bias parameter and the non-negative weight parameter, so that the monotonicity between the joint utility function and the local utility function of each legal user is met.
9. The method of claim 1, wherein training and updating the joint utility function according to a utility function approximation algorithm to obtain an optimal joint utility function comprises:
inputting the joint utility function and a target network utility function into a pre-constructed loss function for calculation, and training and updating the joint utility function by minimizing the loss function to obtain an optimal joint utility function, wherein the optimal joint utility function comprises an optimal joint strategy.
10. A distributed interference avoidance apparatus for a multi-user scenario, the apparatus comprising:
the perception module is used for a legal user to obtain local observation information of the current time slot by perceiving the electromagnetic environment; the local observation information comprises the state information of a legal user and the channel use information in the electromagnetic environment;
the training module is used for each legal user to input the local observation information of the current time slot and the action of the previous time slot into the intelligent agent network for calculation to obtain a local utility function of each legal user; the action input of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to a feedback result received by the data transmission; all legal users input the local utility function into a hybrid network in a central controller for fusion to obtain a joint utility function, and the joint utility function is trained and updated according to a utility function approximation algorithm to obtain an optimal joint utility function; wherein the optimal joint utility function comprises an optimal joint strategy;
and the interference avoidance module is used for the central controller to distribute the optimal joint strategy to each legal user to obtain the optimal interference avoidance strategy corresponding to each legal user, and for the legal users to autonomously decide a communication channel according to the optimal interference avoidance strategy to carry out data transmission.
CN202211076124.9A 2022-09-05 2022-09-05 Multi-user scene-oriented distributed interference avoidance method and device Active CN115150963B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211076124.9A CN115150963B (en) 2022-09-05 2022-09-05 Multi-user scene-oriented distributed interference avoidance method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211076124.9A CN115150963B (en) 2022-09-05 2022-09-05 Multi-user scene-oriented distributed interference avoidance method and device

Publications (2)

Publication Number Publication Date
CN115150963A true CN115150963A (en) 2022-10-04
CN115150963B CN115150963B (en) 2022-11-04

Family

ID=83415990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211076124.9A Active CN115150963B (en) 2022-09-05 2022-09-05 Multi-user scene-oriented distributed interference avoidance method and device

Country Status (1)

Country Link
CN (1) CN115150963B (en)

Also Published As

Publication number Publication date
CN115150963B (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant