CN113923794A - Distributed dynamic spectrum access method based on multi-agent reinforcement learning - Google Patents

Distributed dynamic spectrum access method based on multi-agent reinforcement learning

Info

Publication number
CN113923794A
Authority
CN
China
Prior art keywords
reinforcement learning
access
spectrum access
cognitive user
cognitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111339165.8A
Other languages
Chinese (zh)
Inventor
周力
谭翔
魏急波
赵海涛
熊俊
高文颖
唐麒
张姣
曹阔
刘潇然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202111339165.8A
Publication of CN113923794A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W74/00 Wireless channel access
    • H04W74/08 Non-scheduled access, e.g. ALOHA
    • H04W74/0833 Random access procedures, e.g. with 4-step access
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W74/00 Wireless channel access
    • H04W74/08 Non-scheduled access, e.g. ALOHA
    • H04W74/0833 Random access procedures, e.g. with 4-step access
    • H04W74/0841 Random access procedures, e.g. with 4-step access with collision treatment
    • H04W74/085 Random access procedures, e.g. with 4-step access with collision treatment collision avoidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a distributed dynamic spectrum access method based on multi-agent reinforcement learning. The method models the multi-user distributed dynamic spectrum access problem as a multi-agent Markov cooperative game and constructs a multi-agent reinforcement learning framework with centralized training and distributed execution. The framework comprises an offline training module and an online execution module: the online execution module performs spectrum access for each cognitive user with the learned access policy, and the offline training module dynamically updates the online execution module according to the cognitive users' spectrum access results. The invention provides a multi-user cooperative spectrum access method that adapts to the communication environment and scales with the network size; it reduces access collisions among cognitive users while avoiding interference to authorized users, thereby maximizing the access success rate of the cognitive users and improving spectrum utilization efficiency.

Description

Distributed dynamic spectrum access method based on multi-agent reinforcement learning
Technical Field
The invention relates to the technical field of wireless communication networks, and in particular to a distributed dynamic spectrum access method and system based on multi-agent reinforcement learning.
Background
In a cognitive wireless network, cognitive users access the spectrum holes of authorized users in an overlay mode for data transmission. Distributed multi-user dynamic spectrum access faces two major challenges. The first is avoiding interference from cognitive users to primary users: when a primary user occupies its authorized spectrum for data transmission, no cognitive user may access the corresponding spectrum. The second is avoiding access collisions among cognitive users: if two or more cognitive users access the same spectrum hole, their data transmissions fail. Because the sensing capability of a single cognitive node is limited, it can observe only partial channel state information. Moreover, owing to hidden nodes, obstructions, and other factors, the sensing information of a cognitive user is incomplete and inaccurate.
Disclosure of Invention
The invention provides a distributed dynamic spectrum access method and system based on multi-agent reinforcement learning, to overcome the defects of the prior art in which a cognitive user accessing a spectrum hole of an authorized user for data transmission interferes with the primary user and collides with other cognitive users, resulting in low communication system throughput.
In order to achieve the above object, the present invention provides a distributed dynamic spectrum access method based on multi-agent reinforcement learning, which comprises the following steps:
modeling a multi-user distributed dynamic spectrum access problem into a multi-agent Markov cooperative game model, and constructing a centralized training and distributed execution multi-agent reinforcement learning framework; the multi-agent reinforcement learning framework comprises an off-line training module and an on-line execution module;
acquiring local spectrum occupancy information according to the cognitive user's own narrowband sensing capability;
performing spectrum access for the cognitive user with the learned access policy, through the trained online execution module, according to the local spectrum occupancy information;
and monitoring the access success rate of the cognitive user in real time; when the success rate falls below a threshold, the offline training module retrains the online execution module so that it adapts to various communication environments.
In order to achieve the above object, the present invention further provides a distributed dynamic spectrum access system based on multi-agent reinforcement learning, including:
the algorithm building module is used for modeling a multi-user distributed dynamic spectrum access problem into a multi-agent Markov cooperative game model and building a centralized training and distributed execution multi-agent reinforcement learning framework; the multi-agent reinforcement learning framework comprises an off-line training module and an on-line execution module;
the spectrum sensing module is used for acquiring local spectrum occupancy information according to the cognitive user's own narrowband sensing capability;
the spectrum access module is used for performing spectrum access for the cognitive user with the learned access policy, through the trained online execution module, according to the local spectrum occupancy information;
and the real-time monitoring module is used for monitoring the access success rate of the cognitive user in real time; when the success rate falls below a threshold, the offline training module retrains the online execution module so that it adapts to various communication environments.
To achieve the above object, the present invention further provides a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.
To achieve the above object, the present invention further proposes a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the invention has the beneficial effects that:
the distributed dynamic spectrum access method based on multi-agent reinforcement learning provided by the invention models a multi-user distributed dynamic spectrum access problem into a multi-agent Markov cooperation game model, and constructs a centralized training and distributed execution multi-agent reinforcement learning framework, wherein the multi-agent reinforcement learning framework comprises an offline training module and an online execution module, the online execution module utilizes a learned access strategy to perform spectrum access of cognitive users, and the offline training module dynamically updates the online execution module according to spectrum access results of the cognitive users. The invention provides a communication environment self-adaptive and network scale extensible multi-user cooperative spectrum access method, which reduces access conflicts among cognitive users when interference on authorized users is avoided, thereby maximizing the access success rate of the cognitive users and improving the utilization efficiency of a spectrum.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a distributed dynamic spectrum access method based on multi-agent reinforcement learning according to the present invention;
FIG. 2 is a diagram of a centralized training, distributed execution multi-agent reinforcement learning framework of the present invention;
FIG. 3 is a schematic diagram of time slot division according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, provided the combination can be realized by those skilled in the art; when technical solutions are contradictory or cannot be realized, such a combination should be deemed not to exist and falls outside the protection scope of the present invention.
The invention provides a distributed dynamic spectrum access method based on multi-agent reinforcement learning, which comprises the following steps as shown in figure 1:
101: modeling a multi-user distributed dynamic spectrum access problem into a multi-agent Markov cooperative game model, and constructing a centralized training and distributed execution multi-agent reinforcement learning framework (as shown in FIG. 2); the multi-agent reinforcement learning framework comprises an off-line training module and an on-line execution module.
102: acquiring local spectrum occupancy information according to the cognitive user's own narrowband sensing capability;
103: performing spectrum access for the cognitive user with the learned access policy, through the trained online execution module, according to the local spectrum occupancy information;
104: monitoring the access success rate of the cognitive user in real time; when the success rate falls below a threshold, the offline training module retrains the online execution module so that it adapts to various communication environments.
The invention models the multi-user distributed dynamic spectrum access problem of a cognitive wireless network as a multi-agent Markov game and, based on this multi-agent Markov cooperative game model, constructs a multi-agent reinforcement learning framework with centralized training and distributed execution. The framework comprises an offline training module and an online execution module: the online execution module performs spectrum access for each cognitive user with the learned access policy, and the offline training module dynamically updates the online execution module according to real-time monitoring results. The invention provides a multi-user cooperative spectrum access method that adapts to the communication environment and scales with the network size; it reduces access collisions among cognitive users while avoiding interference to authorized users, thereby maximizing the access success rate of the cognitive users and improving spectrum utilization efficiency.
In one embodiment, for step 101, the offline training module includes a centralized trainer, which is built on a network edge computing server (e.g., a small cell, a wireless access point, or a drone-assisted edge computing server).
The online execution module comprises a policy network, and the policy network is loaded on the cognitive user side.
The multi-agent reinforcement learning framework is a centralized training and distributed execution multi-agent reinforcement learning framework.
In the next embodiment, for step 101, the offline training module collects the interaction information between the cognitive users and the wireless environment through a common channel, uses the collected interaction information to train a mutually cooperative policy network for each cognitive user, and sends the trained policy network parameters to the corresponding cognitive user through the common channel to update the parameters of that user's policy network.
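As a minimal sketch of this exchange (all names, such as EdgeTrainer and load_parameters, are hypothetical and not taken from the patent), the centralized trainer could pool experience arriving over the common channel and push updated parameters back as follows:

```python
# Hypothetical sketch of the centralized-training / distributed-execution
# exchange; class and method names are illustrative assumptions.

import copy

class EdgeTrainer:
    """Centralized trainer hosted on a network edge computing server."""

    def __init__(self, num_users, make_policy):
        # One mutually cooperative policy network per cognitive user.
        self.policies = [make_policy() for _ in range(num_users)]
        self.replay = []  # pooled (observation, action, reward) experience

    def collect(self, experience):
        # Interaction records arrive from cognitive users over the common channel.
        self.replay.append(experience)

    def train_step(self, update_fn):
        # Train each user's policy on the pooled experience of all users.
        for policy in self.policies:
            update_fn(policy, self.replay)

    def broadcast(self, users):
        # Push trained parameters back over the common channel; afterwards each
        # user executes its policy using only local sensing information.
        for user, policy in zip(users, self.policies):
            user.load_parameters(copy.deepcopy(policy.state_dict()))
```

The point of the split is that global information is needed only at training time; at execution time each cognitive user decides from its own local observations, which is what makes the scheme distributed and scalable.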
In another embodiment, for step 104, monitoring the access success rate of the cognitive user in real time comprises:
401: outputting a reward value of the current spectrum access by utilizing a multi-agent reinforcement learning framework according to the spectrum access condition of the cognitive user;
402: and monitoring the access success rate of the cognitive user in real time according to the reward value.
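As a concrete illustration of step 402, a sliding-window average of per-slot success rates could drive the retraining trigger. The window length, threshold, and class name below are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sliding-window monitor; window size and threshold are assumptions.

from collections import deque

class AccessMonitor:
    def __init__(self, num_users, window=1000, threshold=0.9):
        self.num_users = num_users
        self.rates = deque(maxlen=window)  # recent per-slot success rates
        self.threshold = threshold

    def record(self, reward):
        # The reward is the number of successful accesses in the slot, so
        # dividing by the number of cognitive users gives a per-slot rate.
        self.rates.append(reward / self.num_users)

    def needs_retraining(self):
        # Trigger offline retraining when the windowed average rate drops.
        if not self.rates:
            return False
        return sum(self.rates) / len(self.rates) < self.threshold
```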
In the next embodiment, step 401, outputting the reward value of the current spectrum access by using the multi-agent reinforcement learning framework according to the spectrum access conditions of the cognitive users, comprises:
4011: summing the numbers of successful accesses of all cognitive users to form the utility function of each cognitive user;
4012: establishing a reward function in the multi-agent reinforcement learning framework according to the utility function;
4013: outputting the reward value of the current spectrum access by using the reward function according to the spectrum access conditions of the cognitive users.
Local state information of the wireless channels is acquired according to the limited sensing capability of the cognitive user, forming the observation space for reinforcement learning.
Sensing channels are selected according to the sensing capability of the cognitive user, and an available channel is selected for access, forming the action space for reinforcement learning.
In this embodiment, the available spectrum is divided into K orthogonal subchannels of equal bandwidth, where the subchannel bandwidth is smaller than the coherence bandwidth of the channel;
each subchannel is divided into time slots with the same start and stop times, as shown in FIG. 3; the slot length is shorter than the channel coherence time;
the K orthogonal subchannels are randomly occupied by the corresponding K authorized users, and their idle/occupied states form the state space of the cognitive wireless network, whose size is $2^K$.
The cognitive user is modeled as an agent; the agents cooperatively access available spectrum holes for data transmission according to the channel states they sense.
Due to its limited sensing capability, a cognitive user can select only M of the K subchannels for sensing, so the observation space size of a single cognitive user is:

$\binom{K}{M} \cdot 2^M$

(the choice of M subchannels times their $2^M$ possible idle/occupied states).
the joint observation space of all cognitive users is:
Figure BDA0003351805080000072
One channel sensed to be idle is then selected for access according to the sensing results of the selected M subchannels; the action space size of a single cognitive user is:

$\binom{K}{M} \cdot (M+1)$

(the choice of M subchannels to sense, times accessing one of the M sensed channels or refraining from access).
the joint motion space of all cognitive users is:
Figure BDA0003351805080000074
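The sketch below tallies these sizes under the interpretation used above, namely that an observation is the choice of M sensed subchannels together with their binary states, and an action is the choice of M subchannels to sense plus accessing one of them or refraining; these readings are assumptions rather than the patent's own text:

```python
# Tally of the state/observation/action space sizes under the stated assumptions.

from math import comb

def space_sizes(K, M, N):
    obs_single = comb(K, M) * 2 ** M   # choice of M channels x their binary states
    act_single = comb(K, M) * (M + 1)  # sensing choice x (access one of M, or none)
    return {
        "state": 2 ** K,               # idle/occupied over K subchannels
        "obs_single": obs_single,
        "obs_joint": obs_single ** N,  # joint over N cognitive users
        "act_single": act_single,
        "act_joint": act_single ** N,
    }

print(space_sizes(K=8, M=3, N=4))
```

Even for small K, M, and N the joint spaces grow combinatorially, which is why the patent solves the problem with learned policies rather than exhaustive search.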
In one embodiment, the reward function is:

$r_t = u_t(o_1,\dots,o_N, a_1,\dots,a_N) = \sum_{n=1}^{N} c_n^t$

where $u_t$ denotes the utility function of all cognitive users at time t; $c_n^t$ denotes the number of successful accesses of cognitive user n at time t; $o_n$ denotes the observation of cognitive user n at time t; $a_n$ denotes the access action of cognitive user n at time t; and N denotes the total number of cognitive users.
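The following minimal sketch shows how this shared reward could be computed from the joint outcome of one time slot. The encoding of "no access" as None and the explicit collision test are assumptions consistent with the background section, not the patent's own code:

```python
# Hypothetical per-slot reward computation; "None" denotes no access.

def slot_reward(actions, channel_busy):
    """actions[n]: subchannel chosen by cognitive user n (or None);
    channel_busy[k]: True if the authorized user occupies subchannel k."""
    successes = 0
    for n, ch in enumerate(actions):
        if ch is None:
            continue
        collided = any(m != n and a == ch for m, a in enumerate(actions))
        # An access succeeds only if the primary user is absent and no other
        # cognitive user picked the same subchannel.
        if not channel_busy[ch] and not collided:
            successes += 1
    return successes  # shared cooperative reward r_t = sum_n c_n^t
```

Because every agent receives the same team reward, the game is fully cooperative: an agent that causes a collision lowers its own reward as well.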
In a next embodiment, the policy network is a deep recurrent neural network structure.
In this embodiment, during the training phase, the centralized trainer deployed on the edge server trains the cooperative spectrum access policies offline using the sensing-access experience of each cognitive user; during execution, each cognitive user node performs spectrum access by autonomous decision of its policy network, relying only on local spectrum sensing information. The available channels are divided into time slots of equal length with the same start and stop times, the multi-user cooperative spectrum access problem is modeled as a fully cooperative game, and multi-agent reinforcement learning with centralized training and distributed execution is used to solve for the optimal policy that reaches an equilibrium of the decentralized partially observable Markov game.
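The patent states only that the policy network has a deep recurrent structure; the following PyTorch sketch with a GRU cell and illustrative layer sizes is one plausible realization, not the patent's specified network:

```python
# One plausible recurrent policy network; layer sizes and the GRU cell
# are assumptions, not specified by the patent.

import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, num_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.gru = nn.GRUCell(hidden, hidden)   # carries history across slots
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, obs, h):
        # obs: 1-D tensor with the local sensing result for this slot.
        x = torch.relu(self.encoder(obs))
        h = self.gru(x, h)
        return self.head(h), h

    @torch.no_grad()
    def act(self, obs, h):
        logits, h = self.forward(obs, h)
        return int(torch.argmax(logits, dim=-1)), h  # greedy action at execution
```

A recurrent cell is a natural fit here because each agent observes only part of the channel state, so the hidden state acts as a memory over past sensing results under partial observability.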
The invention also provides a distributed dynamic spectrum access system based on multi-agent reinforcement learning, which comprises:
the algorithm building module is used for modeling a multi-user distributed dynamic spectrum access problem into a multi-agent Markov cooperative game model and building a centralized training and distributed execution multi-agent reinforcement learning framework; the multi-agent reinforcement learning framework comprises an off-line training module and an on-line execution module;
the spectrum sensing module is used for acquiring local spectrum occupancy information according to the cognitive user's own narrowband sensing capability;
the spectrum access module is used for performing spectrum access for the cognitive user with the learned access policy, through the trained online execution module, according to the local spectrum occupancy information;
and the real-time monitoring module is used for monitoring the access success rate of the cognitive user in real time; when the success rate falls below a threshold, the offline training module retrains the online execution module so that it adapts to various communication environments.
The invention further provides a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.
The invention also proposes a computer-readable storage medium on which a computer program is stored which, when executed by a processor, carries out the steps of the method described above.
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All modifications and equivalents made using the contents of the specification and drawings, or applied directly or indirectly in other related technical fields, are likewise included in the protection scope of the present invention.

Claims (10)

1. A distributed dynamic spectrum access method based on multi-agent reinforcement learning is characterized by comprising the following steps:
modeling a multi-user distributed dynamic spectrum access problem into a multi-agent Markov cooperative game model, and constructing a centralized training and distributed execution multi-agent reinforcement learning framework; the multi-agent reinforcement learning framework comprises an off-line training module and an on-line execution module;
acquiring local spectrum occupancy information according to the cognitive user's own narrowband sensing capability;
performing spectrum access for the cognitive user with the learned access policy, through the trained online execution module, according to the local spectrum occupancy information;
and monitoring the access success rate of the cognitive user in real time; when the success rate falls below a threshold, the offline training module retrains the online execution module so that it adapts to various communication environments.
2. The multi-agent reinforcement learning-based distributed dynamic spectrum access method of claim 1, wherein the offline training module comprises a centralized trainer, the centralized trainer being constructed by a network edge computing server;
the online execution module comprises a policy network, and the policy network is loaded on the cognitive user side.
3. The multi-agent reinforcement learning-based distributed dynamic spectrum access method as claimed in claim 2, wherein the offline training module collects interaction information between the cognitive users and the wireless environment through a common channel, uses the collected interaction information to train a mutually cooperative policy network for each cognitive user, and sends the trained policy network parameters to the corresponding cognitive user through the common channel to update the parameters of that user's policy network.
4. The distributed dynamic spectrum access method based on multi-agent reinforcement learning as claimed in claim 1, wherein monitoring access success rate of cognitive users in real time comprises:
outputting a reward value of the current spectrum access by utilizing a multi-agent reinforcement learning framework according to the spectrum access condition of the cognitive user;
and monitoring the access success rate of the cognitive user in real time according to the reward value.
5. The multi-agent reinforcement learning-based distributed dynamic spectrum access method as claimed in claim 4, wherein outputting the reward value of the current spectrum access by using the multi-agent reinforcement learning framework according to the spectrum access condition of the cognitive user comprises:
summing the numbers of successful accesses of all cognitive users to form the utility function of each cognitive user;
establishing a reward function in the multi-agent reinforcement learning framework according to the utility function;
and outputting the reward value of the current spectrum access by using the reward function according to the spectrum access condition of the cognitive user.
6. The multi-agent reinforcement learning-based distributed dynamic spectrum access method of claim 5, wherein the reward function is:
$r_t = u_t(o_1,\dots,o_N, a_1,\dots,a_N) = \sum_{n=1}^{N} c_n^t$

where $u_t$ denotes the utility function of all cognitive users at time t; $c_n^t$ denotes the number of successful accesses of cognitive user n at time t; $o_n$ denotes the observation of cognitive user n at time t; $a_n$ denotes the access action of cognitive user n at time t; and N denotes the total number of cognitive users.
7. The multi-agent reinforcement learning-based distributed dynamic spectrum access method of claim 2, wherein the policy network is a deep recurrent neural network structure.
8. A distributed dynamic spectrum access system based on multi-agent reinforcement learning, comprising:
the algorithm building module is used for modeling a multi-user distributed dynamic spectrum access problem into a multi-agent Markov cooperative game model and building a centralized training and distributed execution multi-agent reinforcement learning framework; the multi-agent reinforcement learning framework comprises an off-line training module and an on-line execution module;
the spectrum sensing module is used for acquiring local spectrum occupancy information according to the cognitive user's own narrowband sensing capability;
the spectrum access module is used for performing spectrum access for the cognitive user with the learned access policy, through the trained online execution module, according to the local spectrum occupancy information;
and the real-time monitoring module is used for monitoring the access success rate of the cognitive user in real time; when the success rate falls below a threshold, the offline training module retrains the online execution module so that it adapts to various communication environments.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111339165.8A 2021-11-12 2021-11-12 Distributed dynamic spectrum access method based on multi-agent reinforcement learning Pending CN113923794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111339165.8A CN113923794A (en) 2021-11-12 2021-11-12 Distributed dynamic spectrum access method based on multi-agent reinforcement learning


Publications (1)

Publication Number Publication Date
CN113923794A (en) 2022-01-11

Family

ID=79246180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111339165.8A Pending CN113923794A (en) 2021-11-12 2021-11-12 Distributed dynamic spectrum access method based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN113923794A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024032228A1 (en) * 2022-08-12 2024-02-15 华为技术有限公司 Reinforcement learning training method and related device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112383922A (en) * 2019-07-07 2021-02-19 东北大学秦皇岛分校 Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN112637965A (en) * 2020-12-30 2021-04-09 上海交通大学 Game-based Q learning competition window adjusting method, system and medium
CN113207127A (en) * 2021-04-27 2021-08-03 重庆邮电大学 Dynamic spectrum access method based on hierarchical deep reinforcement learning in NOMA system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112383922A (en) * 2019-07-07 2021-02-19 东北大学秦皇岛分校 Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN112637965A (en) * 2020-12-30 2021-04-09 上海交通大学 Game-based Q learning competition window adjusting method, system and medium
CN113207127A (en) * 2021-04-27 2021-08-03 重庆邮电大学 Dynamic spectrum access method based on hierarchical deep reinforcement learning in NOMA system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANG TAN et al.: "Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks", EI abstract database, 6 March 2021 (2021-03-06), pages 1-20 *
DONG Chunli; WANG Li: "Research on Spectrum Sensing and Access Algorithms for Cognitive Radio Networks", Wireless Internet Technology, no. 14, 25 July 2016 (2016-07-25) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024032228A1 (en) * 2022-08-12 2024-02-15 华为技术有限公司 Reinforcement learning training method and related device

Similar Documents

Publication Publication Date Title
Zheng et al. Stochastic game-theoretic spectrum access in distributed and dynamic environment
Pasandi et al. Mac protocol design optimization using deep learning
Yu et al. Multiagent learning of coordination in loosely coupled multiagent systems
Bkassiny et al. Distributed Reinforcement Learning based MAC protocols for autonomous cognitive secondary users
CN111262638B (en) Dynamic spectrum access method based on efficient sample learning
KR102206775B1 (en) Method for allocating resource using machine learning in a wireless network and recording medium for performing the method
van Dijk et al. Grounding subgoals in information transitions
CN113923794A (en) Distributed dynamic spectrum access method based on multi-agent reinforcement learning
Wang et al. Decentralized learning for channel allocation in IoT networks over unlicensed bandwidth as a contextual multi-player multi-armed bandit game
CN114615265A (en) Vehicle-mounted task unloading method based on deep reinforcement learning in edge computing environment
CN113613332B (en) Spectrum resource allocation method and system based on cooperative distributed DQN (differential signal quality network) joint simulated annealing algorithm
Luo et al. Evolutionary coalitional games for random access control
Barrachina-Muñoz et al. Multi-armed bandits for spectrum allocation in multi-agent channel bonding WLANs
Zou et al. Multi-agent reinforcement learning enabled link scheduling for next generation internet of things
CN114885422A (en) Dynamic edge computing unloading method based on hybrid access mode in ultra-dense network
CN114173421B (en) LoRa logic channel based on deep reinforcement learning and power distribution method
Bulychev et al. Computing nash equilibrium in wireless ad hoc networks: A simulation-based approach
CN103686755A (en) On-line learning method capable of realizing optimal transmission for cognitive radio
Gafni et al. A distributed stable strategy learning algorithm for multi-user dynamic spectrum access
CN106936611A (en) A kind of method and device for predicting network state
Kovacs et al. Mixed observability Markov decision processes for overall network performance optimization in wireless sensor networks
Cong et al. Double deep recurrent reinforcement learning for centralized dynamic multichannel access
Lu et al. Dynamic channel access via meta-reinforcement learning
Meyer et al. QMA: A Resource-efficient, Q-learning-based Multiple Access Scheme for the IIoT
Cano et al. Wireless optimisation via convex bandits: unlicensed LTE/WiFi coexistence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination