CN115150963A

CN115150963A - Multi-user scene-oriented distributed interference avoidance method and device

Info

Publication number: CN115150963A
Application number: CN202211076124.9A
Authority: CN
Inventors: 张姣; 雷婵; 赵海涛; 陈海涛; 魏急波; 周力; 刘月玲
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-09-05
Filing date: 2022-09-05
Publication date: 2022-10-04
Anticipated expiration: 2042-09-05
Also published as: CN115150963B

Abstract

The application relates to a distributed interference avoidance method and device for a multi-user scenario. In the method, each legitimate user obtains local observation information for the current time slot by sensing the electromagnetic environment and inputs this observation, together with its action in the previous time slot, into an agent network to compute its local utility function. During training, a central controller equipped with a mixing network fuses all local utility functions into a joint utility function; strategies are continuously adjusted and evaluated under this centrally assisted training until an optimal joint strategy is obtained, from which the optimal interference avoidance strategy of each legitimate user follows. Each legitimate user then uses its optimal interference avoidance strategy to make communication decisions independently and transmit data. With this method, legitimate users can cope with both malicious interference from external jammers and mutual interference among internal users without relying on centralized deployment, and normal communication is achieved.

Description

Distributed interference avoidance method and device for multi-user scenarios

Technical Field

The application relates to the technical field of intelligent wireless communication, and in particular to a distributed interference avoidance method and device for a multi-user scenario.

Background

With the continuous development of communication technology, communication between users in a network is susceptible to external electromagnetic interference; moreover, when multiple users communicate without negotiation, mutual interference easily occurs and disrupts their normal communication. It is therefore imperative to give users deployed in a distributed manner in a network an intelligent interference avoidance capability, so that they can autonomously identify and avoid electromagnetic interference.

Traditional solutions address electromagnetic interference in multi-user scenarios through either centralized allocation or negotiation between users. Centralized allocation has several drawbacks. In practice, there is no guarantee that a base station or similar facility is available to act as the central controller; even when one exists, its global scheduling is complex, and the information exchange between the base station and the users incurs large communication and computation overhead, which is ill-suited to resource-constrained wireless networks. Centralized allocation also inevitably introduces large processing delays, which makes it hard to cope with a severe, time-varying electromagnetic interference environment. In addition, once the center is destroyed, the users' use of wireless resources becomes disorganized, leading to interference and communication failure. Negotiation between users has its own drawbacks. Negotiation is usually limited to neighboring users within communication range, which increases the overall processing delay of the network; because of the complexity of the network environment, the resulting communication strategy may not be optimal and cannot fully eliminate interference during communication; and the potential security risks of user negotiation must also be taken into account.

Disclosure of Invention

In view of the foregoing, it is necessary to provide a distributed interference avoidance method and apparatus for a multi-user scenario that enables multiple legitimate users to communicate normally in a harsh and dynamically changing communication environment.

A distributed interference avoidance method for a multi-user scenario, the method comprising:

a legitimate user obtains local observation information for the current time slot by sensing the electromagnetic environment, the local observation information comprising the state information of the legitimate user and channel usage information in the electromagnetic environment;

each legitimate user inputs the local observation information of the current time slot and its action in the previous time slot into an agent network to compute the local utility function of that legitimate user, wherein the previous-slot action is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback received for that data transmission;

all legitimate users input their local utility functions into a mixing network in a central controller, which fuses them into a joint utility function, and the joint utility function is trained and updated with a utility function approximation algorithm to obtain an optimal joint utility function, wherein the optimal joint utility function comprises an optimal joint strategy;

and the central controller distributes the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user, and each legitimate user autonomously selects a communication channel according to its optimal interference avoidance strategy for data transmission.

In one embodiment, the step in which each legitimate user inputs the local observation information of the current time slot and the action of the previous time slot into the agent network to compute its local utility function includes:

each legitimate user inputs the local observation information of the current time slot and the action of the previous time slot into the agent network to compute its local utility function, wherein the local utility function comprises the historical observation-action set of the agent network, the action in the current time slot, and the agent network policy.

In one embodiment, obtaining the previous-slot action, by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback received for the data transmission, includes:

selecting an access channel for data transmission according to the local observation information of the previous time slot to obtain a data transmission result;

analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result;

and making a decision according to the feedback result to obtain the action of the previous time slot.

In one embodiment, the data transmission result comprises a data transmission success result and a data transmission failure result;

analyzing the communication condition of the access channel according to the data transmission result to obtain a feedback result includes:

analyzing the communication condition of the access channel according to the successful data transmission result to obtain a positive feedback result;

and analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result or a no-feedback result.

In one embodiment, analyzing the communication condition of the access channel according to the data transmission failure result to obtain a negative feedback result or a no-feedback result includes:

analyzing the communication condition of the access channel according to the data transmission failure result: a negative feedback result is obtained when the transmission was interfered with by other legitimate users, and a no-feedback result is obtained when it was jammed by the jammer.

In one embodiment, the making a decision according to the feedback result to obtain the action of the previous time slot includes:

making a decision according to the positive feedback result, in which case the action obtained for the previous time slot is to keep using the access channel for communication;

and making a decision according to the negative feedback or no-feedback result, in which case the action obtained for the previous time slot is to select another channel for communication.

In one embodiment, the step in which all legitimate users input their local utility functions into a mixing network in the central controller for fusion to obtain a joint utility function includes:

all legitimate users input their local utility functions into the mixing network in the central controller for fusion to obtain a joint utility function, wherein the joint utility function comprises the historical observation-action sets of all agent networks, the action set of the current time slot, and the joint strategy;

and coordinating the relationship between the joint utility function and the local utility functions through a hypernetwork in the mixing network, so that the joint utility function is monotonic in the local utility function of each legitimate user.

In one embodiment, coordinating the relationship between the joint utility function and the local utility functions through a hypernetwork in the mixing network, so that the joint utility function is monotonic in the local utility function of each legitimate user, includes:

inputting the global environment state into the hypernetwork in the mixing network to obtain the bias parameters and the non-negative weight parameters of the mixing network;

and coordinating the relationship between the joint utility function and the local utility functions according to the bias parameters and the non-negative weight parameters, so that the joint utility function is monotonic in the local utility function of each legitimate user.

In one embodiment, training and updating the joint utility function according to a utility function approximation algorithm to obtain an optimal joint utility function includes:

inputting the joint utility function and the target network utility function into a pre-constructed loss function, and training and updating the joint utility function by minimizing the loss function to obtain an optimal joint utility function, wherein the optimal joint utility function comprises an optimal joint strategy.

A distributed interference avoidance apparatus for a multi-user scenario, comprising:

a sensing module, configured for a legitimate user to obtain local observation information for the current time slot by sensing the electromagnetic environment, the local observation information comprising the state information of the legitimate user and channel usage information in the electromagnetic environment;

a training module, configured for each legitimate user to input the local observation information of the current time slot and the action of the previous time slot into an agent network to compute the local utility function of that legitimate user, wherein the previous-slot action is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback received for the data transmission; and configured for all legitimate users to input their local utility functions into a mixing network in the central controller for fusion into a joint utility function, which is trained and updated with a utility function approximation algorithm to obtain an optimal joint utility function, wherein the optimal joint utility function comprises an optimal joint strategy;

and an interference avoidance module, configured for the central controller to distribute the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user, and for the legitimate users to autonomously select communication channels according to their optimal interference avoidance strategies for data transmission.

In the above distributed interference avoidance method and device for a multi-user scenario, a central controller with a mixing network is used during training. Each legitimate user computes a local utility function from its perceived local observation information and actions through its agent network; all legitimate users input their local utility functions into the central controller, where they are fused into a joint utility function; and under centrally assisted training, strategies are continuously adjusted and evaluated until the optimal joint strategy, and hence the optimal interference avoidance strategy of every legitimate user, is obtained. In actual execution, a legitimate user no longer relies on the central controller and independently makes communication decisions for data transmission based on its currently perceived local observation information and its optimal interference avoidance strategy. By adopting this centralized-training, decentralized-execution approach, legitimate users in a severe and dynamically changing real wireless environment can simultaneously cope with malicious interference from external jammers and mutual interference among internal users without relying on centralized deployment, and normal communication is achieved.

Drawings

FIG. 1 is an application scenario diagram of a distributed interference avoidance method for a multi-user scenario in one embodiment;

FIG. 2 is a flowchart of a distributed interference avoidance method for a multi-user scenario in one embodiment;

FIG. 3 is a schematic diagram of the time slot structure in which the legitimate users and the jammer operate in one embodiment;

FIG. 4 is a block diagram of the framework of the method of the present invention in one embodiment;

FIG. 5 is a graph comparing the interference avoidance performance of the proposed method with that of Q-learning and an ideal scheme in one embodiment;

FIG. 6 is a graph of the interference avoidance performance of the proposed method for different network scales in one embodiment;

FIG. 7 is a time-frequency diagram of users that have not been trained with the method of the present invention;

FIG. 8 is a time-frequency diagram of users after training with the method of the present invention in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The distributed interference avoidance method for a multi-user scenario provided by the present application can be applied in the communication network scenario shown in FIG. 1. The communication network contains one jammer and N pairs of legitimate users, each pair consisting of a transmitter and a receiver, indexed 1, 2, ..., N.

There are K available channels in the space, indexed 1, 2, ..., K; the set of channels the jammer can interfere with coincides with the set of communication channels of the legitimate users.

While the legitimate users communicate, the jammer works continuously and switches among channels in an attempt to disrupt their data transmission. Specifically, the invention assumes that the jammer jams only one channel per time slot and uses a sweep-frequency jamming pattern that is unknown to the legitimate users. In addition, because the network contains multiple pairs of legitimate users, mutual interference among users also arises during data transmission.

In one embodiment, as shown in FIG. 2, a distributed interference avoidance method for a multi-user scenario is provided. Taking the application scenario in FIG. 1 as an example, the method includes the following steps:

Step 202: a legitimate user obtains local observation information for the current time slot by sensing the electromagnetic environment; the local observation information comprises the state information of the legitimate user and channel usage information in the electromagnetic environment.

It can be understood that a legitimate user mainly senses frequency-domain information in the electromagnetic environment, so as to determine whether a channel is being jammed by the jammer and whether it is being used by other users. In particular, because of limited user hardware and the complexity of the environment, a user can only sense local observation information and cannot obtain global information.

Step 204: each legitimate user inputs the local observation information of the current time slot and the action of the previous time slot into its agent network to compute its local utility function, wherein the previous-slot action is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback received for that data transmission.

It can be understood that the agent network adopts a deep recurrent Q-network (DRQN) structure, which represents the current state by the entire observation-action history and can therefore effectively handle the partial (local) observability of the legitimate users. Agent networks correspond one-to-one with legitimate users: the input of each agent network is the local observation information of the current time slot and the action of the previous time slot of its legitimate user, and its output is that user's local utility function.
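The recurrent agent network described above can be illustrated with a minimal NumPy sketch. It is not the patent's network, and every concrete choice here is an assumption: the local observation is taken to be a fixed-length vector (here one value per channel), the previous action is one-hot encoded over the channels, a single bias-free GRU cell carries the observation-action history, and a linear head outputs one local utility value per channel. The class name `DRQNAgent` and all sizes are hypothetical.

```python
import numpy as np


class DRQNAgent:
    """Hypothetical DRQN-style agent network: GRU memory plus a linear Q head (sketch only)."""

    def __init__(self, obs_dim, n_channels, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + n_channels  # local observation plus one-hot previous action
        self.n_channels = n_channels
        self.hidden_dim = hidden_dim
        shape = (hidden_dim, in_dim + hidden_dim)
        # GRU weights acting on [input, hidden]; biases omitted for brevity.
        self.W_z = rng.normal(0.0, 0.1, shape)  # update gate
        self.W_r = rng.normal(0.0, 0.1, shape)  # reset gate
        self.W_h = rng.normal(0.0, 0.1, shape)  # candidate hidden state
        self.W_q = rng.normal(0.0, 0.1, (n_channels, hidden_dim))  # local utility head

    def initial_state(self):
        return np.zeros(self.hidden_dim)

    def step(self, obs, prev_action, h):
        """One time slot: returns (local utility per channel, updated history state)."""
        a_prev = np.zeros(self.n_channels)
        a_prev[prev_action] = 1.0
        x = np.concatenate([obs, a_prev])
        xh = np.concatenate([x, h])
        z = 1.0 / (1.0 + np.exp(-(self.W_z @ xh)))  # sigmoid
        r = 1.0 / (1.0 + np.exp(-(self.W_r @ xh)))  # sigmoid
        h_cand = np.tanh(self.W_h @ np.concatenate([x, r * h]))
        h_new = (1.0 - z) * h + z * h_cand  # standard GRU interpolation
        return self.W_q @ h_new, h_new


# Usage: two consecutive slots; the hidden state carries the history forward.
agent = DRQNAgent(obs_dim=6, n_channels=6)
h = agent.initial_state()
q1, h = agent.step(np.zeros(6), prev_action=0, h=h)
q2, h = agent.step(np.ones(6), prev_action=int(np.argmax(q1)), h=h)
```

The recurrent state is what lets a user with only local observations condition on its whole observation-action history instead of a single slot.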

Step 206: all legitimate users input their local utility functions into the mixing network in the central controller for fusion into a joint utility function, and the joint utility function is trained and updated with a utility function approximation algorithm to obtain an optimal joint utility function, wherein the optimal joint utility function comprises an optimal joint strategy.

The mixing network of the invention is a nonlinear network. It fuses the local utility functions of all legitimate users into a joint utility function, which evaluates the quality of the joint actions of all legitimate users; the joint utility function is then trained and updated with a utility function approximation algorithm to obtain the optimal joint utility function, which maximizes multi-user interference avoidance performance.

Step 208: the central controller distributes the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user, and the legitimate users autonomously select communication channels according to their optimal interference avoidance strategies for data transmission.

The distributed interference avoidance method for a multi-user scenario is based on QMIX, a multi-agent reinforcement learning algorithm. The central controller distributes the optimal joint strategy to each legitimate user to obtain its optimal interference avoidance strategy. Using these strategies, legitimate users can autonomously learn the jammer's jamming pattern without a center and without mutual negotiation, and can also avoid mutual interference among themselves, so that normal communication is achieved.

In this distributed interference avoidance method for a multi-user scenario, a central controller with a mixing network is deployed during training. Each legitimate user computes a local utility function from its perceived local observation information and actions through its agent network; all legitimate users input their local utility functions into the central controller, where they are fused into a joint utility function; and strategies are continuously adjusted and evaluated under centrally assisted training until the optimal interference avoidance strategy of each legitimate user is obtained. In actual execution, a legitimate user no longer relies on the central controller and independently makes communication decisions for data transmission based on its currently perceived local observation information and its optimal interference avoidance strategy. With this centralized-training, decentralized-execution approach, legitimate users in a severe and dynamically changing real wireless environment can simultaneously cope with malicious interference from external jammers and mutual interference among internal users without relying on centralized deployment, and normal communication is achieved.

In one embodiment, the step in which each legitimate user inputs the local observation information of the current time slot and the action of the previous time slot into its agent network to compute its local utility function includes:

for any agent, that is, any legitimate user, inputting the local observation information of the current time slot and the action of the previous time slot into the corresponding agent network for calculation to obtain the local utility function of that legitimate user, wherein the local utility function comprises the historical observation-action set of the agent network, the action at the current time slot t, and the agent network policy.

It can be understood that the local utility function evaluates the quality of a legitimate user's actions under the current policy; the optimal distributed policy of each legitimate user is obtained when each local utility function attains its maximum.

selecting an access channel for data transmission according to the local observation information of the previous time slot to obtain a data transmission result;

It can be understood that, as shown in FIG. 3, the jammer operates continuously while the legitimate users communicate. It performs sweep-frequency jamming over the channels slot by slot: at the start of each time slot it selects one channel and keeps jamming that channel for the rest of the slot. If the jammed channel is the communication channel of a legitimate user, the jamming succeeds; otherwise, it has no effect.
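The slot-by-slot sweep jamming described above can be modeled in a few lines. This is a sketch under assumptions: the text says only that the jammer sweeps and picks one channel at the start of each slot, so the cyclic order, the starting channel `start`, the step size `step`, and both function names are hypothetical.

```python
def sweep_jammer_channel(slot, n_channels, start=0, step=1):
    """Channel jammed during `slot` under an assumed cyclic sweep pattern.

    The jammer picks one channel at the start of the slot and holds it
    for the whole slot; `start` and `step` are illustrative parameters.
    """
    return (start + step * slot) % n_channels


def is_jammed(user_channel, slot, n_channels):
    """Jamming succeeds only if the user transmits on the channel being swept."""
    return user_channel == sweep_jammer_channel(slot, n_channels)


# Six channels, as in the experiments: the jammer visits 0, 1, ..., 5, then wraps around.
print([sweep_jammer_channel(t, 6) for t in range(8)])  # [0, 1, 2, 3, 4, 5, 0, 1]
```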

While the jammer switches among channels to disrupt legitimate transmissions, a user does not know the state of channels it has not accessed: it learns the actual interference on a channel only after transmitting data on it, and it also does not know the decisions of other users in the network. The user therefore obtains its channel access action through a sequential process of sensing, data transmission, learning and analysis, and decision-making, so that the channel it accesses is not jammed.

analyzing the communication condition of the access channel according to the data transmission failure result: a negative feedback result is obtained when the transmission was interfered with by other legitimate users, and a no-feedback result is obtained when it was jammed by the jammer.

It can be understood that a data transmission either succeeds or fails. Failure is mainly caused by unsuccessful channel sensing and access, which results either in jamming by the jammer or in collision with the transmissions of other legitimate users.

It can be understood that the data transmission result yields one of three feedback results. When the transmission succeeds, the user receives an ACK (positive feedback), indicating that it was not interfered with. When the user receives a NACK (negative feedback), it was interfered with by other users during data transmission. When the user receives no feedback, it was maliciously jammed by the jammer. The latter two cases both indicate a data transmission failure.
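The mapping from one slot's channel choices to the three feedback results can be sketched as below. The names `Feedback` and `slot_feedback` are hypothetical, and one rule is an assumption not stated in the text: when a transmission is both jammed and in collision with another user, jamming is taken to dominate, so that user receives no feedback.

```python
from collections import Counter
from enum import Enum


class Feedback(Enum):
    ACK = "ack"    # positive feedback: transmission succeeded
    NACK = "nack"  # negative feedback: collided with another legitimate user
    NONE = "none"  # no feedback: jammed by the jammer


def slot_feedback(user_channels, jammed_channel):
    """Feedback received by each legitimate user pair in one time slot.

    user_channels[n] is the channel chosen by user pair n. Assumption:
    jamming takes precedence over collision when both occur.
    """
    counts = Counter(user_channels)
    result = []
    for ch in user_channels:
        if ch == jammed_channel:
            result.append(Feedback.NONE)
        elif counts[ch] > 1:
            result.append(Feedback.NACK)
        else:
            result.append(Feedback.ACK)
    return result


# Three user pairs pick channels 1, 1 and 4 while the jammer sweeps channel 4:
# pairs 0 and 1 collide (NACK) and pair 2 is jammed (no feedback).
feedback = slot_feedback([1, 1, 4], jammed_channel=4)
```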

making a decision according to the positive feedback result, in which case the action obtained for the previous time slot is to keep using the access channel for communication;

It can be understood that the user identifies the state of the network environment from the received feedback and decides how to adjust its interference avoidance strategy for the next time slot.
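The feedback-driven decision rule stated in the embodiment (keep the channel after positive feedback, switch after negative feedback or no feedback) can be sketched as follows. The function name `next_channel` is hypothetical, and so is the way the replacement channel is picked: the text does not say, so a uniform draw among the other channels is assumed here purely for illustration. In the learned method, the choice comes from the trained interference avoidance strategy instead.

```python
import random


def next_channel(current, feedback, n_channels, rng=random):
    """Stay on the channel after an ACK; otherwise move to a different channel.

    Assumption: the new channel is drawn uniformly from the other channels.
    """
    if feedback == "ACK":
        return current
    others = [ch for ch in range(n_channels) if ch != current]
    return rng.choice(others)


rng = random.Random(0)
print(next_channel(3, "ACK", 6, rng))   # 3: keep the channel that worked
print(next_channel(3, "NACK", 6, rng))  # some channel other than 3
```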

The network framework, consisting of the agent networks deployed at the legitimate users and the mixing network and hypernetwork deployed in the central controller, is shown in FIG. 4. All legitimate users input their local utility functions into the mixing network in the central controller, which fuses them into a joint utility function; the joint utility function comprises the historical observation-action sets of all agent networks, the action set of the current time slot, and the joint policy;

the hypernetwork in the mixing network coordinates the relationship between the joint utility function and the local utility functions so that the joint utility function is monotonic in the local utility function of each legitimate user, that is, the partial derivative of the joint utility function with respect to each local utility function is non-negative.

The global environment state is input into the hypernetwork in the mixing network for calculation to obtain the bias parameter b and the non-negative weight parameter w of the mixing network;

and the relationship between the joint utility function and the local utility functions is coordinated according to the bias parameter b and the non-negative weight parameter w, so that the joint utility function is monotonic in the local utility function of each legitimate user.

It can be appreciated that the hypernetwork is deployed within the mixing network; coordinating the relationship between the joint and local utility functions in this way improves the convergence speed of the mixing network.
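A QMIX-style mixing network with hypernetworks can be sketched in NumPy as below. The hypernetworks map the global state to the mixing weights w and biases b; taking the absolute value of the generated weights makes them non-negative, and combined with a non-decreasing activation (ELU) this makes the joint utility non-decreasing in every local utility. The class name `MonotonicMixer`, the layer sizes, and the use of a single linear map per generated parameter are assumptions; QMIX itself uses a small two-layer hypernetwork for the final bias.

```python
import numpy as np


class MonotonicMixer:
    """Hypothetical QMIX-style mixer: hypernetworks turn the global state into w and b."""

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # One linear hypernetwork per generated parameter (a simplification).
        self.H_w1 = rng.normal(0.0, 0.1, (n_agents * embed_dim, state_dim))
        self.H_b1 = rng.normal(0.0, 0.1, (embed_dim, state_dim))
        self.H_w2 = rng.normal(0.0, 0.1, (embed_dim, state_dim))
        self.H_b2 = rng.normal(0.0, 0.1, (1, state_dim))

    def __call__(self, local_qs, state):
        # abs() keeps the mixing weights non-negative; the biases are unconstrained.
        w1 = np.abs(self.H_w1 @ state).reshape(self.n_agents, self.embed_dim)
        b1 = self.H_b1 @ state
        w2 = np.abs(self.H_w2 @ state)
        b2 = (self.H_b2 @ state)[0]
        hidden = local_qs @ w1 + b1
        hidden = np.where(hidden > 0, hidden, np.expm1(hidden))  # ELU, non-decreasing
        return float(hidden @ w2 + b2)


mixer = MonotonicMixer(n_agents=3, state_dim=5)
state = np.ones(5)
local_qs = np.array([0.2, -0.1, 0.5])
q_tot = mixer(local_qs, state)
q_tot_up = mixer(local_qs + np.array([0.3, 0.0, 0.0]), state)  # raise user 0's local utility
```

Raising any single local utility can only raise or keep the joint utility, which is the monotonicity requirement stated above.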

inputting the joint utility function and the target network utility function into a pre-constructed loss function for calculation. The quantities in this loss function are the batch size used for sampling during training; the global reward of the i-th sample in the batch, that is, the sum of the instant rewards of all agent networks; and the target network utility function, which is the utility value obtained by evaluating the action the user performs in its state according to the policy, combined with the historical observation-action information, and which keeps the algorithm stable during training;

training and updating the joint utility function by minimizing the loss function to obtain the optimal joint utility function, which comprises the optimal joint policy;
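A minimal sketch of such a loss, assuming it takes the standard QMIX temporal-difference form: the loss is the sum over the sampled batch of (y_i - Q_tot_i)^2, with target y_i = r_i + gamma * max Q_tot_target(next step), where r_i is the global reward (the sum of all agents' instant rewards). The discount factor `gamma`, the episode-end mask `dones`, and the function name `qmix_td_loss` are assumptions that do not appear in the text.

```python
import numpy as np


def qmix_td_loss(q_tot, rewards, q_tot_target_next, dones, gamma=0.99):
    """Squared TD loss summed over a sampled batch (assumed standard QMIX form).

    q_tot:             joint utility from the online network, shape (B,)
    rewards:           global reward per sample (sum of agents' instant rewards), shape (B,)
    q_tot_target_next: target-network joint utility of the best next joint action, shape (B,)
    dones:             1.0 where the episode ended after this step, else 0.0, shape (B,)
    gamma:             assumed discount factor (not given in the text)
    """
    y = rewards + gamma * (1.0 - dones) * q_tot_target_next  # TD target from the target network
    return float(np.sum((y - q_tot) ** 2))


# Two samples with hypothetical per-agent instant rewards for two user pairs.
agent_rewards = np.array([[1.0, 0.0], [1.0, 1.0]])
rewards = agent_rewards.sum(axis=1)  # global rewards: [1.0, 2.0]
loss = qmix_td_loss(
    q_tot=np.array([0.5, 1.0]),
    rewards=rewards,
    q_tot_target_next=np.array([1.0, 2.0]),
    dones=np.array([0.0, 1.0]),
    gamma=0.5,
)  # 2.0 for these numbers
```

Using a separate, slowly updated target network for y is what gives the training stability the text attributes to the target network utility function.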

During execution, the central controller distributes the optimal joint policy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to that user, and each legitimate user autonomously selects a communication channel according to its optimal interference avoidance strategy for data transmission.

It can be understood that the invention follows the idea of centralized training and decentralized execution, which comprises two stages: offline training and online execution. In the offline training stage, the central controller collects the observation, action and reward information of all agents, performs training, and issues the optimal interference avoidance strategies to the users. In the online execution stage, no central controller is needed: each legitimate user feeds its own sensing information into its learned optimal interference avoidance strategy locally, and autonomously outputs and executes decisions, which gives the legitimate users intelligent anti-jamming capability.
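Decentralized execution works because of the monotonic mixing: if the joint utility is non-decreasing in every local utility, each user maximizing its own local utility also maximizes the joint utility, so no controller is needed at run time. The sketch below checks this numerically. It uses a hypothetical linear mixer with hand-picked, strictly positive weights `W` and bias `B` (so the maximizer is unique) and random local utilities for three users on six channels; all names and values are illustrative assumptions.

```python
import itertools

import numpy as np


def decentralized_actions(local_qs):
    """Execution phase: each user picks the channel with its highest local utility."""
    return tuple(int(np.argmax(q)) for q in local_qs)


def joint_argmax(local_qs, mix):
    """Brute-force argmax of the joint utility over all joint actions (tiny cases only)."""
    best, best_val = None, -np.inf
    for joint in itertools.product(*[range(len(q)) for q in local_qs]):
        val = mix(np.array([q[a] for q, a in zip(local_qs, joint)]))
        if val > best_val:
            best, best_val = joint, val
    return best


# Hypothetical monotonic mixer: strictly positive weights W and a bias B.
W, B = np.array([0.7, 1.3, 0.4]), -0.2


def linear_mix(qs):
    return float(W @ qs + B)


rng = np.random.default_rng(1)
local_qs = [rng.normal(size=6) for _ in range(3)]  # 3 users, 6 channels
assert decentralized_actions(local_qs) == joint_argmax(local_qs, linear_mix)
```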

Furthermore, the interference avoidance performance of the proposed method was experimentally compared with that of Q-learning and an ideal scheme. The experiments considered two pairs of legitimate users, one jammer and six channels; each iteration comprised 100 rounds of training, and each round comprised 60 time slots of interactive learning. The comparison is shown in FIG. 5, where the abscissa is the number of iterations and the ordinate is the normalized reward. As FIG. 5 shows, the performance of the proposed method increases steadily with training, then levels off, and converges after about 220 training iterations. The resulting performance is significantly better than that of the centralized Q-learning method, and the converged value closely matches the performance of the ideal scheme. These results indicate that the invention achieves effective interference avoidance and enables users to autonomously cope with both malicious jamming by the jammer and mutual interference among users.
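The experimental settings reported above can be collected into a small configuration object, which also makes the training budget explicit. The dataclass `ExperimentConfig` and its field names are hypothetical; only the values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Settings of the first experiment; field names are illustrative."""
    n_user_pairs: int = 2
    n_jammers: int = 1
    n_channels: int = 6
    rounds_per_iteration: int = 100
    slots_per_round: int = 60

    def slots_per_iteration(self) -> int:
        return self.rounds_per_iteration * self.slots_per_round


cfg = ExperimentConfig()
print(cfg.slots_per_iteration())        # 6000
print(220 * cfg.slots_per_iteration())  # 1320000, roughly the slots needed to converge
```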

In addition, the interference avoidance performance of the proposed method was verified for different network scales. The number of channels was again fixed at six, and the number of legitimate user pairs was two, three and four, respectively; the more legitimate users, the larger the network and the more complex the communication environment. The results are shown in FIG. 6, where the abscissa is the number of iterations and the ordinate is the average global reward per time slot. FIG. 6 shows that in all three scenarios the performance converges within a limited number of iterations, which demonstrates the effectiveness and applicability of the invention's interference avoidance across network scales and communication environments of different complexity.

In addition, the time-frequency behavior of users before training with the method of the invention is compared with their behavior after training. Fig. 7 shows the time-frequency result for untrained users. It depicts three pairs of legitimate users and one jammer operating over six channels: the abscissa is the test time slot, the ordinate is the channel ID, and differently colored grid cells indicate the channels occupied by the legitimate users and by the jammer. The jammer performs sweep jamming in real time, and most legitimate users cannot communicate normally, because they are interfered with by the jammer, by other users, or by both at once. This shows that, before training, users suffer frequent interference and have very poor interference avoidance capability. Fig. 8 shows the time-frequency result for users after training with the proposed algorithm. Even though the network environment is complex and time-varying and the users do not negotiate with one another when accessing channels, the legitimate users trained by the proposed method avoid the jammer completely and autonomously; only occasional mutual interference remains, and normal communication is achieved in most cases. This result further confirms the effectiveness of the invention.
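To make the outcome categories of figs. 7 and 8 concrete, the following sketch labels each user's result in every slot of a time-frequency grid as a success, jamming by the jammer, mutual interference, or both. It assumes a sweep jammer that moves up one channel per slot and wraps around; the actual sweep pattern is not specified in this section, and all function names are hypothetical.

```python
from collections import Counter

NUM_CHANNELS = 6  # channel count used in the experiments


def sweep_jammer_channel(slot, num_channels=NUM_CHANNELS):
    """Channel hit by a sweep jammer in a given slot.

    Assumption: the jammer steps through the channels one per slot in
    ascending order and wraps around; the real sweep pattern may differ.
    """
    return slot % num_channels


def classify_slot(user_channels, jammed_channel):
    """Label each user pair's outcome in one time slot.

    user_channels[u] is the channel chosen by user pair u. A user is
    'jammed' if it shares the jammer's channel, 'mutual' if another user
    pair picked the same channel, 'both' if both occur, else 'success'.
    """
    occupancy = Counter(user_channels)
    labels = []
    for ch in user_channels:
        jammed = ch == jammed_channel
        collided = occupancy[ch] > 1
        if jammed and collided:
            labels.append("both")
        elif jammed:
            labels.append("jammed")
        elif collided:
            labels.append("mutual")
        else:
            labels.append("success")
    return labels


def time_frequency_grid(schedule):
    """Per-slot outcome labels for a schedule of channel choices.

    schedule[t] lists the channel of every user pair in slot t.
    """
    return [classify_slot(chs, sweep_jammer_channel(t))
            for t, chs in enumerate(schedule)]
```

An untrained schedule in which users stay on fixed, partly shared channels is repeatedly hit both by the sweeping jammer and by collisions, which is the pattern described for fig. 7.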

It should be understood that, although the steps in the flowchart of fig. 1 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be performed at different times, and they need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, a distributed interference avoidance apparatus for a multi-user scenario is provided, comprising a perception module, a training module and an interference avoidance module, wherein:

the perception module is configured for each legitimate user to obtain local observation information for the current time slot by sensing the electromagnetic environment, the local observation information comprising the state information of the legitimate user and the channel usage information in the electromagnetic environment.
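The local observation combines the user's own state with the channel usage it senses. The sketch below shows one possible flat encoding. The feature layout (a one-hot vector of the occupied channel, an acknowledgement flag, and a per-channel busy flag) is an assumption made for illustration, since the patent does not list the exact fields in this section; `build_local_observation` is a hypothetical name.

```python
def build_local_observation(own_channel, ack_received, sensed_busy, num_channels=6):
    """Concatenate a legitimate user's own state with sensed channel usage.

    Hypothetical encoding (the exact feature layout is an assumption):
      - one-hot vector of the channel the user currently occupies,
      - 1.0 if the last transmission was acknowledged, else 0.0,
      - per-channel occupancy sensed from the electromagnetic environment
        (1.0 = energy detected from the jammer or another user).
    """
    if len(sensed_busy) != num_channels:
        raise ValueError("need one occupancy flag per channel")
    own_state = [1.0 if c == own_channel else 0.0 for c in range(num_channels)]
    own_state.append(1.0 if ack_received else 0.0)
    channel_usage = [1.0 if busy else 0.0 for busy in sensed_busy]
    return own_state + channel_usage
```

With six channels this yields a 13-element vector per user and per slot, which can then be fed to the user's agent network.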

The training module is configured for each legitimate user to input the local observation information of the current time slot and the action of the previous time slot into its agent network for calculation, to obtain the local utility function of that legitimate user, where the action of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback result received for that data transmission; and for all legitimate users to input their local utility functions into the mixing network in the central controller for fusion into a joint utility function, which is trained and updated according to a utility function approximation algorithm to obtain the optimal joint utility function, the optimal joint utility function comprising the optimal joint strategy.

The interference avoidance module is configured for the central controller to distribute the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user, and for each legitimate user to autonomously select a communication channel according to its optimal interference avoidance strategy for data transmission.
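In the training module, each user's agent network maps the current local observation and the previous action to a local utility value for every candidate channel. The sketch below uses a single linear layer as a placeholder, because the agent network's architecture is not described in this section; `LinearAgentNetwork` and its parameters are hypothetical, and the weights are random rather than trained.

```python
import random


class LinearAgentNetwork:
    """Toy stand-in for one legitimate user's agent network.

    Maps [local observation, one-hot previous action] to one local utility
    value per channel, i.e. Q_i(o_t, a_{t-1}, a) for every candidate channel a.
    The linear form is an assumption; the real network may differ.
    """

    def __init__(self, obs_dim, num_channels, seed=0):
        rng = random.Random(seed)
        self.num_channels = num_channels
        self.in_dim = obs_dim + num_channels
        # One weight row per output channel, randomly initialized (untrained).
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(self.in_dim)]
                        for _ in range(num_channels)]
        self.bias = [0.0] * num_channels

    def local_utilities(self, observation, prev_action):
        # Encode the previous time slot's action as a one-hot channel vector.
        prev_one_hot = [1.0 if c == prev_action else 0.0
                        for c in range(self.num_channels)]
        x = list(observation) + prev_one_hot
        if len(x) != self.in_dim:
            raise ValueError("observation has the wrong length")
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]
```

Feeding the previous action alongside the observation lets the network condition on where the user transmitted last, which is the input pairing the training module describes.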

For the specific limitations of the distributed interference avoidance apparatus for a multi-user scenario, refer to the limitations of the distributed interference avoidance method for a multi-user scenario described above; they are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be defined by the appended claims.

Claims

1. A distributed interference avoidance method for a multi-user scenario, characterized by comprising the following steps:

and the central controller distributes the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user, and each legitimate user autonomously selects a communication channel according to its optimal interference avoidance strategy for data transmission.
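Once the strategy has been distributed, execution is decentralized: each legitimate user chooses its channel from its own local utilities without exchanging messages with other users. The sketch below assumes greedy selection (the channel with the highest local utility) and lowest-index tie-breaking; both are illustrative assumptions, and `decentralized_channel_choice` is a hypothetical name.

```python
def decentralized_channel_choice(local_utilities_per_user):
    """Each legitimate user independently picks the channel with the highest
    local utility under its learned policy; no negotiation between users.

    local_utilities_per_user[i][a] is user i's utility for channel a.
    max() returns the first maximal index, so ties go to the lowest channel.
    """
    return [max(range(len(q)), key=lambda a: q[a])
            for q in local_utilities_per_user]
```

Because each user only reads its own utility vector, the decision loop needs no central controller at run time, which is the property claimed for the execution phase.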

2. The method of claim 1, wherein each legitimate user inputting the local observation information of the current time slot and the action of the previous time slot into the agent network for calculation to obtain the local utility function of each legitimate user comprises:

3. The method of claim 1, wherein obtaining the action of the previous time slot, by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback result received for the data transmission, comprises:

4. The method of claim 3, wherein the data transmission result comprises a successful data transmission result and a failed data transmission result;

5. The method of claim 4, wherein analyzing the communication condition of the access channel according to the failed data transmission result to obtain a negative feedback result and a no-feedback result comprises:

6. The method of claim 5, wherein making a decision according to the feedback result to obtain the action of the previous time slot comprises:

making a decision according to the positive feedback result to obtain the action of the previous time slot, namely selecting the access channel for communication; and

making a decision according to the negative feedback result and the no-feedback result to obtain the action of the previous time slot, namely selecting another channel for communication.
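The decision rule in claims 4 to 6 maps the transmission feedback to the action that is later fed to the agent network: stay on the access channel after positive feedback, and move to another channel after negative feedback or no feedback. The sketch below encodes that rule. Which other channel is chosen is not specified here, so picking one uniformly at random is an assumption; `Feedback` and `decide_action` are hypothetical names.

```python
import random
from enum import Enum


class Feedback(Enum):
    POSITIVE = "positive"   # data transmission succeeded
    NEGATIVE = "negative"   # transmission failed and negative feedback arrived
    NONE = "none"           # transmission failed and no feedback arrived


def decide_action(access_channel, feedback, num_channels=6, rng=random):
    """Decide the action from the feedback result of a transmission.

    Positive feedback: keep communicating on the current access channel.
    Negative feedback or no feedback: switch to some other channel. The
    uniform random choice among the other channels is an assumption.
    """
    if feedback is Feedback.POSITIVE:
        return access_channel
    others = [c for c in range(num_channels) if c != access_channel]
    return rng.choice(others)
```

Passing a seeded `random.Random` instance as `rng` makes the channel switch reproducible in tests.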

7. The method of claim 1, wherein all legitimate users inputting their local utility functions into the mixing network in the central controller for fusion to obtain the joint utility function comprises:

all legitimate users input their local utility functions into the mixing network in the central controller for fusion to obtain the joint utility function, the joint utility function comprising the historical observation-action sets of all agent networks, the action set of the current time slot, and the joint strategy;

8. The method of claim 7, wherein coordinating the relationship between the joint utility function and the local utility functions by means of a hypernetwork in the mixing network, such that the joint utility function is monotonic in the local utility function of each legitimate user, comprises:

inputting the global environment state into the hypernetwork of the mixing network for calculation to obtain the bias parameters and the non-negative weight parameters of the mixing network;
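Claims 7 and 8 describe a mixing network whose weights and bias are generated from the global state by a hypernetwork, with non-negative weights so that the joint utility is monotonic in every local utility. This structure resembles the QMIX mixing network. The sketch below is a deliberately simplified single-layer version with a linear hypernetwork and absolute values to enforce non-negativity; layer count, activations and all names are assumptions, not the patent's specification.

```python
import random


class MonotonicMixer:
    """Single-layer mixing network whose parameters come from a hypernetwork.

    Simplified sketch: a linear hypernetwork maps the global state s to
    per-agent weights (made non-negative with abs) and a scalar bias, and
    Q_tot = sum_i w_i(s) * Q_i + b(s). Because every w_i >= 0, Q_tot never
    decreases when any local utility Q_i increases (monotonicity).
    """

    def __init__(self, state_dim, num_agents, seed=0):
        rng = random.Random(seed)
        # Hypernetwork parameters: one row per agent weight, one row for the bias.
        self.hyper_w = [[rng.uniform(-1, 1) for _ in range(state_dim)]
                        for _ in range(num_agents)]
        self.hyper_b = [rng.uniform(-1, 1) for _ in range(state_dim)]

    def mixing_params(self, state):
        weights = [abs(sum(h * s for h, s in zip(row, state)))
                   for row in self.hyper_w]
        bias = sum(h * s for h, s in zip(self.hyper_b, state))
        return weights, bias

    def joint_utility(self, local_utilities, state):
        weights, bias = self.mixing_params(state)
        return sum(w * q for w, q in zip(weights, local_utilities)) + bias
```

The bias is left unconstrained because it does not affect monotonicity; only the weights multiplying the local utilities must be non-negative.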

9. The method of claim 1, wherein training and updating the joint utility function according to the utility function approximation algorithm to obtain the optimal joint utility function comprises:

and inputting the joint utility function and the target network utility function into a pre-constructed loss function for calculation, and training and updating the joint utility function by minimizing the loss function to obtain the optimal joint utility function, wherein the optimal joint utility function comprises the optimal joint strategy.
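Claim 9 trains the joint utility function by minimizing a loss that compares it with a target network's utility. The exact loss is not reproduced in this section; the sketch below assumes the standard mean squared temporal-difference form used in value-based multi-agent learning, with an illustrative discount factor. `td_loss` and its arguments are hypothetical.

```python
def td_loss(q_tot, rewards, next_q_tot_target, dones, gamma=0.99):
    """Mean squared temporal-difference loss over a batch of transitions.

    Assumed form (the patent's exact loss is not given here):
        y_k = r_k + gamma * (1 - done_k) * Q_tot_target(s'_k)
        L   = mean_k (y_k - Q_tot(s_k, a_k))^2
    next_q_tot_target holds the target network's joint utility for the
    next state, already maximized over joint actions.
    gamma=0.99 is an illustrative discount factor.
    """
    n = len(q_tot)
    if n == 0 or not (n == len(rewards) == len(next_q_tot_target) == len(dones)):
        raise ValueError("batch fields must be non-empty and equally long")
    total = 0.0
    for q, r, q_next, done in zip(q_tot, rewards, next_q_tot_target, dones):
        # Bootstrap from the target network unless the episode ended.
        target = r + gamma * (0.0 if done else q_next)
        total += (target - q) ** 2
    return total / n
```

Keeping the target network's parameters fixed between periodic updates is what makes the bootstrapped target stable while the joint utility function is being trained.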

10. A distributed interference avoidance apparatus for a multi-user scenario, the apparatus comprising:

the training module is configured for each legitimate user to input the local observation information of the current time slot and the action of the previous time slot into the agent network for calculation to obtain the local utility function of each legitimate user, wherein the action of the previous time slot is obtained by selecting an access channel for data transmission according to the local observation information of the previous time slot and making a decision according to the feedback result received for the data transmission; and for all legitimate users to input their local utility functions into the mixing network in a central controller for fusion to obtain a joint utility function, which is trained and updated according to a utility function approximation algorithm to obtain the optimal joint utility function, wherein the optimal joint utility function comprises the optimal joint strategy;

and the interference avoidance module is configured for the central controller to distribute the optimal joint strategy to each legitimate user to obtain the optimal interference avoidance strategy corresponding to each legitimate user, and for each legitimate user to autonomously select a communication channel according to its optimal interference avoidance strategy for data transmission.