CN113613339A - Channel access method of multi-priority wireless terminal based on deep reinforcement learning - Google Patents
Channel access method of multi-priority wireless terminal based on deep reinforcement learning
- Publication number
- CN113613339A (application CN202110781263.0A)
- Authority
- CN
- China
- Prior art keywords
- network
- channel
- priority
- protocol
- reward
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W74/00—Wireless channel access
- H04W74/04—Scheduled access
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention belongs to the technical field of wireless communication and discloses a channel access method for multi-priority wireless terminals based on deep reinforcement learning, which comprises the following steps: establishing network scenarios with services of different priorities; designing and defining the system model of the protocol, performing state-space modeling and action-space modeling according to the protocol's network scenario, and designing reward functions for different scenarios; defining and establishing the neural network model used by the protocol, and training the network model through experience tuples; and performing performance verification on the trained model through multi-scenario simulation comparison. By using deep reinforcement learning to design the channel access method of the multi-priority-service wireless terminal, the invention is better suited to wireless networks carrying services of different priorities, improves system throughput and the utilization of wireless channel resources, and increases the opportunity for low-priority services to access the channel while reducing the scheduling delay of high-priority services.
Description
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to a channel access method of a multi-priority wireless terminal based on deep reinforcement learning.
Background
Currently, with the rapid development of wireless communication technology, the demand that emerging services such as data transmission and exchange place on wireless channels keeps increasing. In a wireless network, when multiple users contend for the right to use a specific resource at the same time (for example, the right to use a shared channel), each user transmits data packets by acquiring the right to use the channel; information from different users then needs to occupy the channel for transmission, which may cause data packets to collide and communication to fail. To improve communication efficiency, a multiple access protocol must be introduced to determine which user may use the resource and to solve the problem of multiple users sharing the same physical link.
Yiding Yu et al. propose a Deep-reinforcement-Learning-based Multiple Access protocol (DLMA) for wireless heterogeneous networks. Their work proposes different deep-reinforcement-learning-based multiple access algorithms for different optimization targets and compares the algorithms for the different targets through system simulation. The simulation results show that DLMA can achieve the expected targets without knowing the multiple access protocols adopted by the other coexisting networks, thereby improving the proportional fairness and throughput of the system. However, existing channel access methods based on deep reinforcement learning do not consider the different quality-of-service requirements of services with different priorities, and cannot well guarantee the quality-of-service requirements of high-priority services.
Conventional Multiple Access protocols such as Time Division Multiple Access (TDMA), ALOHA protocol, Code Division Multiple Access (CDMA), Carrier Sense Multiple Access (CSMA), etc. all have a problem of low channel utilization. Therefore, the invention mainly aims at the problems of the multiple access protocol, and designs a channel access method of a multi-priority wireless terminal based on deep reinforcement learning in a network scene with multi-priority service, so as to reduce the scheduling delay of high-priority service under the constraint of ensuring better system throughput.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) in a wireless network, when a plurality of users compete for the right to use a specific resource at the same time, the users send data packets by acquiring the right to use a channel, and since information from different users needs to occupy the channel for transmission, collision of the data packets may be caused, thereby causing communication failure.
(2) The existing channel access method based on deep reinforcement learning does not consider the difference of the service quality requirements of different priority services, and can not well ensure the service quality requirements of high priority services.
(3) Traditional multiple access methods for multi-priority services cannot fully utilize wireless channel resources, which wastes resources.
The difficulty and significance for solving the problems and defects are as follows: the invention provides a channel access method of a multi-priority wireless terminal based on deep reinforcement learning, which endows the wireless terminal with learning capability, utilizes a reinforcement learning mechanism to interact with the environment, and can further improve the utilization rate of wireless channel resources; in the design of the reward function, different rewards are set for different priority services, so that the scheduling time delay of high priority services can be reduced, and the opportunity of accessing the low priority services to a channel can be improved.
Disclosure of Invention
Aiming at the problems of existing multiple access protocols, the invention provides a channel access method for multi-priority wireless terminals based on deep reinforcement learning.
The invention is realized in such a way that a channel access method of a multi-priority wireless terminal based on deep reinforcement learning comprises the following steps:
Step one, establishing network scenarios with services of different priorities. Once the network scenario is determined, the user can modify the system model and the neural network model according to the scenario so that they can be deployed in different network scenarios.
Step two, designing and defining the system model of the protocol, performing state-space modeling and action-space modeling according to the protocol's network scenario, and designing reward functions for different scenarios. The state space and action space are fine-tuned for each network scenario, and the form of the reward function is designed and modified around different focal points (the core of reward-function design, e.g., priority), so that the method better adapts to those focal points in different network scenarios and better meets the requirements of actual deployment.
Step three, defining and establishing the neural network model used by the protocol, and training the network model through experience tuples. Different neural network models converge at different speeds during training, and a better neural network model is selected by comparison so that the training process is faster and more accurate.
Step four, performing performance verification on the trained model through multi-scenario simulation comparison. The feasibility and superiority of the invention are verified and explained by comparing, in multiple scenarios, the proposed access protocol with multiple access protocols that do not distinguish priority services.
Further, in step one, the establishing a network scenario with services of different priorities includes:
establishing a transmission network with k priority services, where k > 0. The network scenario comprises a base station, N DRL-MAC nodes (nodes adopting the method of the invention), M Time Division Multiple Access (TDMA) nodes and X q-ALOHA nodes, with N ≥ 1 and M + X ≥ 1, i.e., the scenario contains at least one DRL-MAC node and at least one node using another protocol.
The base station is used for acquiring data from the wireless channel between the nodes and the base station and for transmitting the data. The DRL-MAC node adopts the deep-reinforcement-learning-based multiple access technology: if the node transmits traffic of a given priority, it obtains the transmission result fed back by the base station and receives a different reward according to the priority of the traffic; if the node does not transmit, it listens to the channel and uses the channel observation to determine the transmission status of the other nodes in that time slot. The time division multiple access node adopts the TDMA protocol and transmits traffic in the time slots that are regularly and periodically allocated to it. The q-ALOHA node adopts q-ALOHA and transmits traffic in each time slot with a fixed transmission probability q, according to the q value of the scenario.
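For illustration only, the following Python sketch shows the per-slot transmit behaviour of the two baseline node types described above; the class names, the slot assignment and the value of q are assumptions made for the example and are not fixed by the invention.

```python
import random

class TDMANode:
    """Transmits only in its fixed, periodically allocated time slots."""
    def __init__(self, assigned_slots, frame_length):
        self.assigned_slots = set(assigned_slots)  # e.g. slots {0, 3} within each frame (assumed)
        self.frame_length = frame_length           # e.g. 10 slots per frame (assumed)

    def transmits(self, t):
        # Deterministic: transmit iff the current slot index falls in the allocation
        return (t % self.frame_length) in self.assigned_slots

class QAlohaNode:
    """Transmits in every time slot with a fixed probability q."""
    def __init__(self, q):
        self.q = q  # fixed transmission probability, chosen per scenario

    def transmits(self, t):
        return random.random() < self.q

# Illustrative use: one TDMA node occupying 2 slots of a 10-slot frame and one q-ALOHA node
tdma = TDMANode(assigned_slots=[0, 3], frame_length=10)
aloha = QAlohaNode(q=0.5)
for t in range(5):
    print(t, tdma.transmits(t), aloha.transmits(t))
```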
Further, in the second step, each DRL-MAC node is equivalent to an agent in reinforcement learning; in each time slot, the agent calculates the Q value in the current state by:
a_t = argmax_{a ∈ A_t} q(s_t, a; θ)

where q(s_t, a; θ) is the Q value approximated by the deep neural network model; the action a is selected from the action set according to a greedy strategy to maximize the overall expectation of the reward or to better adapt to a dynamically changing wireless network environment.
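A compact sketch of this greedy selection follows, assuming the Q-network has already produced one Q value per action for the current state; the ε-greedy exploration probability is an assumption for illustration, since the text only specifies that a greedy strategy is used.

```python
import numpy as np

def select_action(q_values, epsilon=0.05):
    """q_values: array of q(s_t, a; theta) for a in {a_0, ..., a_k}.
    With probability epsilon explore uniformly at random; otherwise act greedily."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

# Example with k = 2 priorities: actions are [no access, high-priority access, low-priority access]
print(select_action(np.array([0.1, 0.7, 0.4]), epsilon=0.0))  # prints 1 (high-priority access)
```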
Further, in the second step, the designing and defining the system model of the protocol, performing state space modeling and action space modeling according to the network scene of the protocol, and designing the reward function for different scenes includes:
(1) motion space modeling
The system action set is A_t = {a_0, a_1, a_2, ..., a_k}, where k is the number of priority traffic classes in the network scenario. In time slot t, the DRL-MAC node makes a decision a through the deep neural network to determine whether to access the channel with a data packet in the current time slot; a_0 indicates no channel access, and a_1, ..., a_k indicate accessing the channel with traffic of the corresponding priority;
the resulting channel observation after taking action is ZtThe element belongs to { SUCCESS, COLLISION, IDLENESS }, and channel observation results are obtained by monitoring a channel and are used for forming experience tuples; wherein SUCCESS indicates that the node has accessed the channel and transmitted the data packet successfully; COLLISION means that a plurality of nodes are simultaneously accessed into a channel for transmission, so that COLLISION is caused; idle indicates that the channel is idle, i.e., no node has access to the channel; the Agent determines a channel observation result according to the confirmation signal from the access point and the interception channel;
(2) state space modeling
The state set S_t contains the M most recent historical states to be tracked, and each historical state consists of an action-observation pair (a, Z), of which there are a total of 2k + 3 possible combinations: (a_0, SUCCESS), (a_0, COLLISION), (a_0, IDLENESS), and, for each access action a_i with i = 1, ..., k, (a_i, SUCCESS) and (a_i, COLLISION).
(3) reward function
For network scenarios with different priority services, the reward function always follows the principle that the higher the priority of a service, the larger the reward for its successful transmission and the larger the penalty for its transmission failure. The reward function is set as sum_reward = α × reward + (1 − α) × (delay / T), where α and T are controllable variables: the parameter α adjusts the influence of the delay on the overall reward and is initialized to 1 when this influence is not considered; the parameter T unifies the range of the delay's influence on the reward and is initialized to 50; delay is the time from the generation of a service to its access to the channel, i.e., the scheduling delay; and reward is the per-priority reward value, where r_1, ..., r_k are the rewards for successful transmission of services of the different priorities and r_{-1}, ..., r_{-k} are the penalties for transmission failure of services of the different priorities.
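To make the reward design concrete, a small Python sketch of this calculation follows; the numeric values of the per-priority rewards and penalties are illustrative assumptions, since the invention only fixes their ordering across priorities.

```python
def sum_reward(action, outcome, delay, rewards, penalties, alpha=1.0, T=50):
    """action: 0 = no access, i >= 1 = access with priority-i traffic.
    outcome: 'SUCCESS', 'COLLISION' or 'IDLENESS' observed on the channel.
    rewards[i] / penalties[i]: per-priority reward r_i and penalty r_{-i}.
    alpha, T: controllable variables; with alpha = 1 the delay term is ignored."""
    if action == 0:
        base = 0.0                  # no transmission attempted in this slot
    elif outcome == 'SUCCESS':
        base = rewards[action]      # r_i for a successful priority-i packet
    else:                           # COLLISION: the transmission failed
        base = penalties[action]    # r_{-i}, a negative value
    return alpha * base + (1 - alpha) * (delay / T)

# Illustrative two-priority setting (assumed values): the high priority gets the
# larger reward and the larger penalty, as required by the design principle above.
rewards = {1: 2.0, 2: 1.0}
penalties = {1: -2.0, 2: -1.0}
print(sum_reward(action=1, outcome='SUCCESS', delay=10,
                 rewards=rewards, penalties=penalties, alpha=0.8))
```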
further, in step three, the defining and establishing the neural network model used by the protocol, and training the network model through the experience tuple include:
the DQN is introduced to enable the DRL-MAC node to better learn the next decision of other nodes on the use condition of a channel, the intelligent agent adopts a deep residual error network architecture for training, the deep residual error network is used for approaching a Q value, the current state s is input, an action strategy a is output, and then an experience tuple is formed by combining other information and used for training the deep residual error network.
Further, in step three, the defining and establishing a neural network model used by the protocol, and training the network model through an experience tuple further includes:
(1) initializing an experience pool, setting the capacity of the experience pool, and initializing parameters;
(2) starting from time slot t ═ 0;
(3) transmitting the current state s into the neural network to calculate the Q values of that state; selecting the action to execute through a greedy strategy, and recording the channel observation z and the total reward sum_reward obtained after the action is taken; forming the experience tuple (s, a, r, s') from the obtained state s, the action a taken, the reward r obtained after taking action a, and the next state s' that is reached, and putting it into the experience pool;
(4) if the number of experience tuples generated exceeds the capacity of the experience pool, discarding the experience tuple that entered the pool earliest and putting the newest experience tuple into the pool; otherwise, putting the experience tuples into the pool in order;
(5) if the current time slot t is a multiple of 10, randomly extracting N experience tuples from the experience pool and calculating their y values in turn: y = r + γ · max_{a'} q(s', a'; θ⁻), where r denotes the current reward obtained by taking action a in the current state s, γ ∈ (0, 1) is the discount factor, and max_{a'} q(s', a'; θ⁻) selects the reward of the action with the maximum Q value as the prediction of the future; otherwise, entering step (8);
(6) updating the Q-estimation network parameter θ by using a semi-gradient descent algorithm;
(7) if the current time slot t is a multiple of the parameter F, assigning the Q-estimation network parameter θ to the target network parameter θ⁻; otherwise, entering step (8);
(8) if the time slot t is more than or equal to the set training round, the training process is exited; otherwise, entering the next time slot t ═ t +1, and entering step (3).
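The following Python skeleton summarises steps (1) to (8); it assumes PyTorch-style Q-networks, and env (the simulated multi-node channel), q_net, target_net, optimizer, select_action and update_q_network are placeholders (the last two are sketched elsewhere in this description). The capacity and batch size are assumed example values; the 10-slot training period and the synchronisation parameter F follow the steps above.

```python
from collections import deque
import random

def train_drl_mac(env, q_net, target_net, optimizer, *, capacity=10000,
                  batch_size=32, F=100, total_slots=50000):
    """Skeleton of the training procedure in steps (1)-(8)."""
    pool = deque(maxlen=capacity)        # (1) experience pool; maxlen drops the oldest tuple (4)
    s = env.reset()
    for t in range(total_slots):         # (2) start from slot t = 0; (8) stop after the set rounds
        q_values = q_net(s)              # (3) Q values of the current state
        a = select_action(q_values)      #     greedy / epsilon-greedy choice
        s_next, r, z = env.step(a)       #     observe channel result z and the obtained reward
        pool.append((s, a, r, s_next))   #     store the experience tuple
        if t % 10 == 0 and len(pool) >= batch_size:
            batch = random.sample(list(pool), batch_size)           # (5) random mini-batch
            update_q_network(q_net, target_net, optimizer, batch)   # (5)-(6) y targets + update
        if t % F == 0:
            target_net.load_state_dict(q_net.state_dict())          # (7) theta -> theta^-
        s = s_next                       # move to the next slot t = t + 1
    return q_net
```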
Another object of the present invention is to provide a channel access system of a multi-priority service wireless terminal, in which the channel access method of a multi-priority wireless terminal based on deep reinforcement learning is applied, the channel access system of the multi-priority service wireless terminal including:
the network scene establishing module is used for establishing network scenes with different priority level services;
the system model design module is used for designing and defining a system model of the protocol;
the space modeling module is used for carrying out state space modeling and action space modeling according to the protocol network scene;
the reward function design module is used for designing reward functions aiming at different scenes;
the neural network model establishing module is used for determining and establishing a neural network model used by the protocol;
the network model training module is used for training the network model through experience tuples;
and the performance verification module is used for performing performance verification on the trained model through multi-scene simulation comparison.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
establishing network scenes with different priority services; designing and defining a system model of the protocol, carrying out state space modeling and action space modeling according to the network scene of the protocol, and designing reward functions aiming at different scenes; defining and establishing a neural network model used by the protocol, and training the network model through experience tuples; and performing performance verification on the trained model through multi-scene simulation comparison.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
establishing network scenes with different priority services; designing and defining a system model of the protocol, carrying out state space modeling and action space modeling according to the network scene of the protocol, and designing reward functions aiming at different scenes; defining and establishing a neural network model used by the protocol, and training the network model through experience tuples; and performing performance verification on the trained model through multi-scene simulation comparison.
Another object of the present invention is to provide a wireless communication information data processing terminal for implementing a channel access system of the multi-priority service wireless terminal.
By combining all the technical schemes, the invention has the advantages and positive effects that: the channel Access method (DRL-MAC) of the Deep Reinforcement Learning-based multi-priority service wireless terminal provided by the invention is realized by establishing network scenes with different priority services; designing and defining a system model of the protocol, carrying out state space modeling and action space modeling according to the network scene of the protocol, and designing reward functions aiming at different scenes; defining and establishing a neural network model used by the protocol, and training the network model through experience tuples; and performing performance verification on the trained model through multi-scene simulation comparison to reduce the scheduling delay of the high-priority service under the constraint of ensuring the system throughput. The invention designs the channel access method of the multi-priority service wireless terminal by using deep reinforcement learning, is more suitable for wireless networks with different priority services, improves the throughput of the system and reduces the scheduling delay of high-priority services.
Aiming at wireless networks with different priority services, the invention provides a channel access method of a multi-priority service wireless terminal based on deep reinforcement learning, which endows the wireless terminal with learning capability, utilizes a reinforcement learning mechanism to interact with the environment, and can further improve the utilization rate of wireless channel resources; in the design of the reward function, different rewards are set for different priority services, so that the scheduling time delay of high priority services can be reduced, and the opportunity of accessing the low priority services to a channel can be improved. Compared with a multiple access protocol of a service without priority, the result shows that the channel access method of the multi-priority service wireless terminal based on deep reinforcement learning has better system throughput and scheduling delay of a high-priority service.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a channel access method for a multi-priority service wireless terminal according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a channel access method of a multi-priority service wireless terminal according to an embodiment of the present invention.
Fig. 3 is a block diagram of a channel access system of a multi-priority service wireless terminal according to an embodiment of the present invention;
in the figure: 1. a network scene establishing module; 2. a system model design module; 3. a spatial modeling module; 4. a reward function design module; 5. a neural network model building module; 6. a network model training module; 7. and a performance verification module.
Fig. 4 is a flowchart of deep neural network training based on deep reinforcement learning according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a network model according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of simulation results provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a channel access method for a multi-priority wireless terminal based on deep reinforcement learning, and the following describes the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, a channel access method for a deep reinforcement learning-based multi-priority wireless terminal according to an embodiment of the present invention includes the following steps:
s101, establishing network scenes with different priority services;
s102, designing and defining a system model of the protocol, carrying out state space modeling and action space modeling according to the network scene of the protocol, and designing reward functions aiming at different scenes;
s103, defining and establishing a neural network model used by the protocol, and training the network model through experience tuples;
and S104, performing performance verification on the trained model through multi-scene simulation comparison.
A schematic diagram of a channel access method for a multi-priority wireless terminal based on deep reinforcement learning according to an embodiment of the present invention is shown in fig. 2.
As shown in fig. 3, a channel access system of a multi-priority service wireless terminal according to an embodiment of the present invention includes:
the network scene establishing module 1 is used for establishing network scenes with different priority level services;
a system model design module 2, which is used for designing and defining the system model of the protocol;
the space modeling module 3 is used for carrying out state space modeling and action space modeling according to the protocol network scene;
the reward function design module 4 is used for designing reward functions aiming at different scenes;
a neural network model establishing module 5, which is used for determining and establishing the neural network model used by the protocol;
the network model training module 6 is used for training the network model through experience tuples;
and the performance verification module 7 is used for performing performance verification on the trained model through multi-scene simulation comparison.
The technical solution of the present invention will be further described with reference to the following examples.
As shown in fig. 1, the DRL-MAC protocol provided in the embodiment of the present invention includes the following steps:
(1) establishing a wireless network scene containing a plurality of services with different priorities;
establishing a wireless network with two priority services, wherein the network scenario comprises a base station, N DRL-MAC nodes (nodes adopting the method of the invention), M Time Division Multiple Access (TDMA) nodes and X q-ALOHA nodes, with N ≥ 1 and M + X ≥ 1, i.e., at least one DRL-MAC node and at least one node using another protocol.
The base station acquires data from the wireless channel between the nodes and the base station and transmits the data. The DRL-MAC node adopts the deep-reinforcement-learning-based multiple access technology: if the node transmits traffic of a given priority, it obtains the transmission result fed back by the base station and receives a different reward according to the priority of the traffic; if the node does not transmit, it listens to the channel and uses the channel observation to determine the transmission status of the other nodes in that time slot. The time division multiple access node adopts the TDMA protocol and transmits traffic in the time slots that are regularly and periodically allocated to it. The q-ALOHA node adopts q-ALOHA and transmits traffic in each time slot with a fixed transmission probability q, according to the q value of the scenario.
(2) Designing and defining a system model of the protocol, carrying out state space modeling and action space modeling according to the network scene of the protocol, and designing reward functions aiming at different scenes;
each DRL-MAC node is equivalent to an agent in reinforcement learning; in each time slot, the agent calculates the Q value in the current state by:
a_t = argmax_{a ∈ A_t} q(s_t, a; θ)

where q(s_t, a; θ) is the Q value approximated by the deep neural network model; the action a is selected from the action set according to a greedy strategy to maximize the overall expectation of the reward or to better adapt to a dynamically changing wireless network environment.
(2.1) motion space modeling
For a network scenario with two priority services, the system action set is A_t = {a_0, a_1, a_2}. At time slot t, the DRL-MAC node makes a decision a through the deep neural network to determine whether to access the channel with a data packet in the current time slot, where a_0 indicates no channel access, a_1 indicates accessing the channel with high-priority traffic, and a_2 indicates accessing the channel with low-priority traffic.
The channel observation obtained after taking the action is Z_t ∈ {SUCCESS, COLLISION, IDLENESS}; channel observations are obtained by listening to the channel and are used to form experience tuples. SUCCESS indicates that the node accessed the channel and transmitted the data packet successfully; COLLISION indicates that multiple nodes accessed the channel simultaneously, causing a collision; IDLENESS indicates that the channel is idle, i.e., no node accessed the channel. The Agent determines the channel observation from the acknowledgement signal from the access point (if it transmits) or by listening to the channel (if it waits).
(2.2) State space modeling
The state set S_t contains the M most recent historical states to be tracked, and each historical state consists of an action-observation pair (a, Z). For a network scenario with two priority traffic classes there are seven possible action-observation combinations in total: (a_0, SUCCESS), (a_0, COLLISION), (a_0, IDLENESS), (a_1, SUCCESS), (a_1, COLLISION), (a_2, SUCCESS) and (a_2, COLLISION).
(2.3) reward function
For a network scenario with two priority traffic classes, the reward function always follows the principle that the higher the priority of a service, the larger the reward for its successful transmission and the larger the penalty for its transmission failure. The reward function is set as sum_reward = α × reward + (1 − α) × (delay / T), where α and T are controllable variables: the parameter α adjusts the influence of the delay on the overall reward and is initialized to 1 when this influence is not considered; the parameter T mainly unifies the range of the delay's influence on the reward and is initialized to 50; delay is the time from the generation of a service to its access to the channel, i.e., the scheduling delay; and reward is the per-priority reward value proposed for the network scenario with two priority services.
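As a worked illustration with assumed numbers (the invention does not fix r_1 and r_2 numerically): suppose a successful high-priority transmission earns r_1 = 2, a successful low-priority transmission earns r_2 = 1, T = 50, and the transmitted packet has waited delay = 20 slots. With α = 1 the delay term vanishes and sum_reward equals the per-priority reward; with α = 0.8 the same high-priority success yields sum_reward = 0.8 × 2 + 0.2 × (20 / 50) = 1.68. Lowering α therefore lets the scheduling delay contribute to the reward signal, which is the knob varied in the simulations of fig. 6(f).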
(3) Defining and establishing a neural network model used by the protocol, and training the network model through experience tuples; FIG. 4 is a process of training a deep neural network based on deep reinforcement learning according to the present invention.
DQN is introduced so that the DRL-MAC node can better learn how other nodes will use the channel in the next decision. The agent is trained with a deep residual network architecture: the deep residual network approximates the Q value, takes the current state s as input and outputs an action strategy a, which is then combined with other information to form an experience tuple used to train the deep residual network.
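The exact residual architecture is not specified in the text; the following PyTorch sketch shows one plausible realisation that maps the state history (M action-observation pairs, one-hot encoded over the 2k + 3 symbols) to k + 1 Q values. The layer width and the number of residual blocks are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two fully connected layers with a skip connection (residual learning)."""
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.fc2(self.relu(self.fc1(x))))

class DRLMACQNet(nn.Module):
    """Maps the state (M action-observation pairs, 2k + 3 symbols each) to k + 1 Q values."""
    def __init__(self, history_len_M, k_priorities, width=64, blocks=2):
        super().__init__()
        in_dim = history_len_M * (2 * k_priorities + 3)
        self.input = nn.Linear(in_dim, width)
        self.body = nn.Sequential(*[ResidualBlock(width) for _ in range(blocks)])
        self.head = nn.Linear(width, k_priorities + 1)   # q(s, a) for a_0 ... a_k

    def forward(self, state):
        return self.head(self.body(torch.relu(self.input(state))))

# Example: M = 20 tracked states and two priority classes (7 symbols per state, 3 actions)
net = DRLMACQNet(history_len_M=20, k_priorities=2)
print(net(torch.zeros(1, 20 * 7)).shape)   # torch.Size([1, 3])
```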
And (3.1) initializing the experience pool, setting the capacity of the experience pool and initializing parameters.
(3.2) starting from the time slot t ═ 0;
(3.3) transmitting the current state s into the neural network to calculate the Q values of that state; selecting the action to execute through a greedy strategy, and recording the channel observation z and the total reward sum_reward obtained after the action is taken; forming the experience tuple (s, a, r, s') from the obtained state s, the action a taken, the reward r obtained after taking action a, and the next state s' that is reached, and putting it into the experience pool.
(3.4) if the number of experience tuples generated exceeds the capacity of the experience pool, discarding the experience tuple that entered the pool earliest and putting the newest experience tuple into the pool; otherwise, putting the experience tuples into the pool in order.
(3.5) if the current time slot t is a multiple of 10, randomly extracting N experience tuples from the experience pool and calculating their y values in turn: y = r + γ · max_{a'} q(s', a'; θ⁻), where r represents the current reward obtained by taking action a in the current state s, γ ∈ (0, 1) is the discount factor, and max_{a'} q(s', a'; θ⁻) selects the reward of the action with the maximum Q value as the prediction of the future (a sketch of this computation is given after step (3.8)); otherwise, entering step (3.8).
(3.6) updating the Q-estimation network parameter θ using a semi-gradient descent algorithm.
(3.7) if the current time slot t is a multiple of the parameter F, assigning the Q-estimation network parameter θ to the target network parameter θ⁻; otherwise, entering step (3.8).
(3.8) if the time slot t is more than or equal to the set training round, exiting the training process; otherwise, entering the next time slot t ═ t + 1; and proceeds to step (3.3).
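A minimal sketch of the target computation in step (3.5) and the parameter update in steps (3.6) and (3.7), assuming PyTorch-style Q-networks as in the architecture sketch above; the mean-squared-error loss and the value γ = 0.9 are assumptions, the text only specifying the y target and a semi-gradient update of θ.

```python
import torch
import torch.nn.functional as F

def update_q_network(q_net, target_net, optimizer, batch, gamma=0.9):
    """One semi-gradient step: y = r + gamma * max_a' q(s', a'; theta^-), then theta is
    moved toward minimising (y - q(s, a; theta))^2 while theta^- stays frozen."""
    states      = torch.stack([b[0] for b in batch])            # assumes states are tensors
    actions     = torch.tensor([b[1] for b in batch])
    rewards     = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    next_states = torch.stack([b[3] for b in batch])
    with torch.no_grad():                                       # target network: no gradient flows
        y = rewards + gamma * target_net(next_states).max(dim=1).values
    q_sa = q_net(states).gather(1, actions.view(-1, 1)).squeeze(1)  # q(s, a; theta) for taken actions
    loss = F.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```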
(4) And performing performance verification on the trained model through multi-scene simulation comparison to reduce the scheduling delay of the high-priority service under the constraint of ensuring the system throughput.
Fig. 5 is the network scenario used in the simulation experiments of the invention. The network scenario includes a base station, N DRL-MAC nodes (nodes adopting the invention) with N ≥ 1, M TDMA nodes and X q-ALOHA nodes with M + X ≥ 1, and data packets are transmitted between the nodes and the base station over a shared wireless channel.
The technical effects of the present invention will be described in detail with reference to simulation experiments.
1. Simulation conditions are as follows:
The simulation experiments of the invention were run on a Windows platform with the following main configuration: the CPU is an Intel(R) Core(TM) i7-7500U @ 2.70 GHz; the memory is 8 GB; the operating system is Windows 10; and the simulation software is PyCharm.
2. Simulation content and result analysis:
The simulation experiments are compared against a model-aware node, i.e., a node that knows the multiple access (MAC) mechanisms of the other coexisting nodes and uses this knowledge of the coexisting MAC protocols to derive the optimal MAC protocol for coexistence. The results of the simulation experiments are shown in fig. 6.
Example one: under a transmission network with two types of priority services, a network scene comprises a base station, a DRL-MAC node and a TDMA node; the DRL-MAC node is always in a state with traffic to transmit (saturated traffic scenario).
Fig. 6(a) is a throughput result when a DRL-MAC node coexists with one TDMA node in a saturated traffic scenario, with the goal of achieving system optimal throughput.
Fig. 6(a) shows the throughput results when the number of time slots N occupied by TDMA varies from 2 to 9 with a frame length of 10. The diagonally filled and solidly filled portions of the histogram indicate the throughput of the DRL-MAC node and of the TDMA node, respectively. The circle-marked dashed line is the simulated total throughput, i.e., the total system throughput when the DRL-MAC node and the TDMA node coexist. The diamond-marked dashed line represents the theoretically optimal system throughput verified by the model-aware node. It can be seen from fig. 6(a) that the circle-marked and diamond-marked dashed lines almost coincide. This means that the DRL-MAC node can discover the time slots unused by TDMA through learning, without knowing the protocol used by the other node.
Fig. 6(b) shows results for the same setting (a DRL-MAC node coexisting with one TDMA node in a saturated traffic scenario, with the goal of achieving optimal system throughput), focusing on the access probability of high-priority packets.
Fig. 6(b) shows the access probability of high-priority data packets by the DRL-MAC node and the TDMA node in scenarios that do and do not consider service priority. The square-marked solid line in fig. 6(b) represents the access probability of a high-priority packet in the scenario that considers service priority; the circle-marked solid line represents the access probability of a high-priority packet in the scenario that does not consider service priority. It can be seen from fig. 6(b) that the square-marked line lies above the circle-marked line in most cases. It can be concluded that the DRL-MAC node transmits high-priority services in a more timely way when service priority is considered, thereby ensuring priority communication of high-priority services.
Example two: under a transmission network with two types of priority services, a network scene comprises a base station, a DRL-MAC node and a q-ALOHA node; the DRL-MAC node is always in a state with traffic to transmit (saturated traffic scenario).
Fig. 6(c) is a throughput result when a DRL-MAC node coexists with a q-ALOHA node in a saturated traffic scenario, with the goal of achieving system optimal throughput.
Fig. 6(c) shows the throughput results of q-ALOHA when the access probability q is varied from 0.2 to 0.9 while the q-ALOHA node coexists with the DRL-MAC node in a saturated traffic scenario. The diagonally filled and solidly filled portions in fig. 6(c) represent the throughput of the DRL-MAC node and of the q-ALOHA node, respectively. The circle-marked dashed line represents the total system throughput, i.e., the total throughput simulated when the DRL-MAC node and the q-ALOHA node coexist. The diamond-marked dashed line represents the theoretically optimal system throughput verified by the model-aware node. It can be seen from fig. 6(c) that the circle-marked and diamond-marked dashed lines almost coincide in most cases. This means that the DRL-MAC node can obtain the best throughput by learning a policy, without knowing that the other node is a q-ALOHA node or what its transmission probability q is.
Fig. 6(d) is a throughput result when a DRL-MAC node coexists with one q-ALOHA node in a saturated traffic scenario, with the goal of achieving fair transmission between nodes.
Fig. 6(d) shows the throughput results of q-ALOHA when the access probability q is varied from 0.2 to 0.6 under the proportional fairness objective. The actual throughputs of the q-ALOHA node, the DRL-MAC node and the system are represented by the circle-marked, triangle-marked and square-marked solid lines, respectively, and the theoretically optimal throughputs of the q-ALOHA node, the DRL-MAC node and the system obtained by the model-aware node are represented by the circle-marked, triangle-marked and square-marked dashed lines, respectively. As can be seen from fig. 6(d), under the proportional fairness objective there is only a relatively small error between the actual throughput and the theoretical optimal throughput, which indicates that the DRL-MAC node can achieve the proportional fairness objective by learning a policy, without knowing that the other node is a q-ALOHA node or what its transmission probability q is.
Fig. 6(e) shows results when a DRL-MAC node coexists with a q-ALOHA node in a saturated traffic scenario, with the goal of achieving optimal system throughput, focusing on the access probability of high-priority packets.
Fig. 6(e) simulates the access probability of high-priority packets as the q-ALOHA access probability q is varied from 0.2 to 0.9. The solid line in the figure represents, under the different access probabilities q, the proportion of all channel-access actions of the Agent that access a high-priority packet. As can be seen from fig. 6(e), in the scenario where DRL-MAC and q-ALOHA coexist (i.e., when the environment is not stable), a higher access probability can still be achieved for high-priority traffic than for low-priority traffic.
Example three: under a transmission network with two types of priority services, a network scene comprises a base station, a DRL-MAC node and a TDMA node; the DRL-MAC is in a non-saturated traffic scenario.
The unsaturated service scenario means that a data packet arrives in every time slot, with different arrival rates for the different priorities: the arrival probability of a high-priority packet is defined as 0.3 and that of a low-priority packet as 0.7. Packets of different priorities enter the packet queues of their respective priorities to wait. To avoid a queue being empty when an access action is taken, a number of packets of the corresponding priority are placed in each priority's service queue at initialization (in the invention, the high- and low-priority queues are each pre-loaded with 5 packets), which prevents the accessed service queue from being empty at the very start of training.
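A minimal sketch of this arrival model follows: one packet arrives per slot and is high priority with probability 0.3, otherwise low priority; each queue is pre-loaded with 5 packets, whose generation slot is set to 0 here as a simplifying assumption.

```python
import random
from collections import deque

HIGH_ARRIVAL_P = 0.3            # probability that the packet arriving in a slot is high priority
high_queue = deque([0] * 5)     # each priority queue is pre-loaded with 5 packets
low_queue = deque([0] * 5)      # (generation slot 0 assumed for the pre-loaded packets)

def step_arrivals(t):
    """One packet arrives per slot; its generation slot t is stored so that the
    scheduling delay (current slot minus t) can be computed when it is transmitted."""
    if random.random() < HIGH_ARRIVAL_P:
        high_queue.append(t)
    else:
        low_queue.append(t)
```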
The channel access criterion in the unsaturated service scenario is as follows. When action a_1 is taken, i.e., a low-priority service is accessed to the channel, and a collision occurs during transmission, the data packet is dequeued and discarded. When action a_2 is taken, i.e., a high-priority service is accessed to the channel, and a collision occurs during transmission, the data packet is re-queued and placed at the head of its queue. Because the reward of high-priority services is large, the demand to transmit them is also large; but since the traffic arrival probability is fixed, the Agent may, through learning, take action a_2 (access a high-priority service) when the high-priority service queue is empty. In that case action a_2 is still executed, but a low-priority service is actually taken from the low-priority queue and accessed: if the data packet is transmitted successfully, the reward corresponding to the low-priority service is given; if the transmission fails, the penalty corresponding to the high-priority packet is given and the packet is re-inserted at the head of the low-priority queue, as sketched below.
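The dequeue and re-queue rules above can be summarised in the following sketch, which reuses the high_queue and low_queue of the arrival sketch, follows the action labels of this paragraph (a_1 for low-priority access, a_2 for high-priority access), and assumes the queues are non-empty thanks to the pre-loading described earlier; the rewards and penalties dictionaries (keyed by 'high' / 'low') are assumed illustration values.

```python
def handle_transmission(action, outcome, rewards, penalties):
    """Collision handling for the unsaturated scenario.
    action: 'low' (a_1) or 'high' (a_2); rewards/penalties are keyed by 'high'/'low'.
    Returns the reward assigned for the slot."""
    if action == 'low':
        pkt = low_queue.popleft()
        if outcome == 'SUCCESS':
            return rewards['low']
        return penalties['low']            # a collided low-priority packet is simply dropped
    if high_queue:                         # normal high-priority access
        pkt = high_queue.popleft()
        if outcome == 'SUCCESS':
            return rewards['high']
        high_queue.appendleft(pkt)         # collided high-priority packet re-queued at the head
        return penalties['high']
    # High-priority access chosen while its queue is empty: substitute a low-priority packet.
    pkt = low_queue.popleft()
    if outcome == 'SUCCESS':
        return rewards['low']              # success pays the low-priority reward
    low_queue.appendleft(pkt)              # failure: packet goes back to the head of the low queue
    return penalties['high']               # ... but the high-priority penalty is charged
```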
Fig. 6(f) shows the results obtained in simulation scenarios corresponding to different parameters α under the coexistence of a DRL-MAC node and a TDMA node in an unsaturated service scenario, where the diagram of each simulation result is divided into three parts: system throughput, high priority service access probability, and system latency.
The lower-left corner of each simulation diagram in fig. 6(f) shows the transmission delay of the different priority traffic for the different parameters α: when α decreases from 1 to 0.8, the delay of the high-priority traffic remains almost constant, while the maximum delay of the low-priority traffic decreases gradually from a value close to 80 to a value close to 60. Fig. 6(f) shows that in this scenario, by adjusting the parameter α, the learned strategy still achieves the optimal throughput of the system; and, on the premise that the access probability of high-priority services remains greater than that of low-priority packets, the overall delay of the system is reduced by sacrificing the access probability of some high-priority packets.
In the above embodiments, the implementation may be realized wholly or partially in software, hardware, firmware or any combination thereof. When software is used wholly or partially, the implementation may take the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that includes one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A channel access method of a multi-priority wireless terminal based on deep reinforcement learning is characterized by comprising the following steps:
step one, establishing network scenes with different priority level services;
designing and defining a system model of the protocol, carrying out state space modeling and action space modeling according to the network scene of the protocol, and designing reward functions aiming at different scenes;
step three, defining and establishing a neural network model used by the protocol, and training the network model through experience tuples;
and fourthly, performing performance verification on the trained model through multi-scene simulation comparison.
2. The channel access method for a deep reinforcement learning-based multi-priority wireless terminal as claimed in claim 1, wherein in step one, the establishing of network scenarios with different priority services comprises:
establishing a transmission network with k priority services, where k > 0; the network scenario comprises a base station, N DRL-MAC nodes (nodes adopting the method of the invention), M Time Division Multiple Access (TDMA) nodes and X q-ALOHA nodes, with N ≥ 1 and M + X ≥ 1, i.e., at least one DRL-MAC node and at least one node using another protocol;
the base station is used for acquiring data from a wireless channel between the nodes and the base station and transmitting the data; the DRL-MAC node adopts a multiple access technology based on deep reinforcement learning, if the node sends different priority services, a transmission result fed back by a base station is obtained, and different rewards are obtained according to the different priority services; if the node does not send the service, the channel is intercepted, and the transmission state of other nodes in a certain time slot is obtained through the channel observation result; the time division multiple access node adopts a TDMA protocol and is used for carrying out service transmission according to the time slot which is regularly and periodically occupied and allocated; and the q-ALOHA node adopts q-ALOHA and is used for carrying out service transmission at each time slot with a fixed transmission probability q according to q values under different scenes.
3. The channel access method of multi-priority wireless terminal based on deep reinforcement learning of claim 1, wherein in step two, each DRL-MAC node corresponds to an agent in reinforcement learning; in each time slot, the agent calculates the Q value in the current state by:
a_t = argmax_{a ∈ A_t} q(s_t, a; θ)

where q(s_t, a; θ) is the Q value approximated by the deep neural network model; the action a is selected from the action set according to a greedy strategy to maximize the overall expectation of the reward or to better adapt to a dynamically changing wireless network environment.
4. The method for accessing channels of a multi-priority wireless terminal based on deep reinforcement learning of claim 1, wherein in step two, the designing and defining a system model of the protocol, performing state space modeling and action space modeling according to the network scenario of the protocol, and designing a reward function for different scenarios comprises:
(1) motion space modeling
The system action set is A_t = {a_0, a_1, a_2, ..., a_k}, where k is the number of priority traffic classes in the network scenario; in time slot t, the DRL-MAC node makes a decision a through the deep neural network to determine whether to access the channel with a data packet in the current time slot, where a_0 indicates no channel access and a_1, ..., a_k indicate accessing the channel with traffic of the corresponding priority;
the resulting channel observation after taking action is ZtThe element belongs to { SUCCESS, COLLISION, IDLENESS }, and channel observation results are obtained by monitoring a channel and are used for forming experience tuples; wherein SUCCESS indicates that the node has accessed the channel and transmitted the data packet successfully; COLLISION means that a plurality of nodes are simultaneously accessed into a channel for transmission, so that COLLISION is caused; idle indicates that the channel is idle, i.e., no node has access to the channel; the Agent determines a channel observation result according to the confirmation signal from the access point and the interception channel;
(2) state space modeling
The state set S_t contains the M most recent historical states to be tracked, and each historical state consists of an action-observation pair (a, Z), of which there are a total of 2k + 3 possible combinations: (a_0, SUCCESS), (a_0, COLLISION), (a_0, IDLENESS), and, for each access action a_i with i = 1, ..., k, (a_i, SUCCESS) and (a_i, COLLISION);
(3) reward function
For network scenarios with different priority services, the reward function always follows the principle that the higher the priority of a service, the larger the reward for its successful transmission and the larger the penalty for its transmission failure; the reward function is set as sum_reward = α × reward + (1 − α) × (delay / T), where α and T are controllable variables: the parameter α adjusts the influence of the delay on the overall reward and is initialized to 1 when this influence is not considered; the parameter T unifies the range of the delay's influence on the reward and is initialized to 50; delay is the time from the generation of a service to its access to the channel, i.e., the scheduling delay; and reward is the per-priority reward value, where r_1, ..., r_k are the rewards for successful transmission of services of the different priorities and r_{-1}, ..., r_{-k} are the penalties for transmission failure of services of the different priorities.
5. the method for accessing channels of multi-priority wireless terminals based on deep reinforcement learning of claim 1, wherein in step three, the defining and establishing the neural network model used by the protocol and training the network model through the experience tuples comprise:
DQN is introduced so that the DRL-MAC node can better learn how other nodes will use the channel in the next decision; the agent is trained with a deep residual network architecture: the deep residual network approximates the Q value, takes the current state s as input and outputs an action strategy a, which is then combined with other information to form an experience tuple used to train the deep residual network.
6. The method for accessing channels of multi-priority wireless terminals based on deep reinforcement learning of claim 1, wherein in step three, the neural network model used by the protocol is defined and established, and the network model is trained through experience tuples, further comprising:
(1) initializing an experience pool, setting the capacity of the experience pool, and initializing parameters;
(2) starting from time slot t ═ 0;
(3) transmitting the current state s into the neural network to calculate the Q values of that state; selecting the action to execute through a greedy strategy, and recording the channel observation z and the total reward sum_reward obtained after the action is taken; forming the experience tuple (s, a, r, s') from the obtained state s, the action a taken, the reward r obtained after taking action a, and the next state s' that is reached, and putting it into the experience pool;
(4) if the number of generated experience tuples exceeds the capacity of the experience pool, discarding the experience tuple that entered the pool earliest and inserting the newest experience tuple; otherwise, inserting the experience tuples into the pool in order;
(5) if the current time slot t is a multiple of 10, randomly sampling N experience tuples from the experience pool and computing the target value for each in turn as y = r + γ · max_a' Q(s', a'; θ⁻), where r is the current reward obtained by taking action a in the current state s, γ ∈ (0, 1) is the discount factor, and the max term is the predicted future return of the action with the largest Q value under the target network; otherwise, going to step (8);
(6) updating the Q-estimation network parameters θ using a semi-gradient descent algorithm;
(7) if the current time slot t is a multiple of the parameter F, assigning the Q-estimation network parameters θ to the target network parameters θ⁻; otherwise, going to step (8);
(8) if the time slot t is greater than or equal to the set number of training rounds, exiting the training process; otherwise, advancing to the next time slot t = t + 1 and returning to step (3).
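A minimal sketch of this training loop, under the assumption of an ε-greedy exploration policy and a hypothetical environment interface env.reset()/env.step(), is given below; the pool capacity, batch size N, learning rate, discount factor and target-update period F are placeholder values, and the Q-network is assumed to expose its action count via head.out_features as in the sketch above.

```python
import random
from collections import deque

import torch
import torch.nn as nn


def train_dqn(env, q_net, target_net, *, capacity=10000, batch_size=32,
              gamma=0.9, epsilon=0.1, target_period_F=200,
              total_slots=50000, lr=1e-3):
    """Illustrative DQN training loop following steps (1)-(8) above.

    env is a hypothetical environment exposing reset() -> state and
    step(action) -> (next_state, reward, observation); q_net / target_net are
    the Q-estimation and target networks with identical architecture.
    """
    replay = deque(maxlen=capacity)                      # (1) experience pool with fixed capacity
    optimizer = torch.optim.SGD(q_net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    state = env.reset()

    for t in range(total_slots):                         # (2) time slots t = 0, 1, ...
        # (3) epsilon-greedy action selection from the Q-estimation network.
        if random.random() < epsilon:
            action = random.randrange(q_net.head.out_features)
        else:
            with torch.no_grad():
                action = int(q_net(torch.as_tensor(state).float().unsqueeze(0)).argmax())
        next_state, reward, _obs = env.step(action)
        replay.append((state, action, reward, next_state))   # (4) deque drops the oldest tuple
        state = next_state

        # (5)-(6) every 10 slots, sample N tuples and perform a semi-gradient update.
        if t % 10 == 0 and len(replay) >= batch_size:
            batch = random.sample(list(replay), batch_size)
            s, a, r, s2 = map(lambda x: torch.as_tensor(x).float(), zip(*batch))
            a = a.long()
            with torch.no_grad():
                # y = r + gamma * max_a' Q(s', a'; theta^-), computed by the target network.
                y = r + gamma * target_net(s2).max(dim=1).values
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = loss_fn(q, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # (7) every F slots, copy the Q-estimation parameters into the target network.
        if t % target_period_F == 0:
            target_net.load_state_dict(q_net.state_dict())
    # (8) the loop exits after the configured number of training slots.
```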
7. A channel access system of a multi-priority service wireless terminal for implementing the channel access method of the multi-priority wireless terminal based on deep reinforcement learning of any one of claims 1 to 6, wherein the channel access system of the multi-priority service wireless terminal comprises:
the network scenario establishing module is used for establishing network scenarios with services of different priorities;
the system model design module is used for designing and defining the system model of the protocol;
the space modeling module is used for performing state space modeling and action space modeling according to the network scenario of the protocol;
the reward function design module is used for designing reward functions for the different scenarios;
the neural network model establishing module is used for determining and establishing the neural network model used by the protocol;
the network model training module is used for training the network model through experience tuples;
and the performance verification module is used for verifying the performance of the trained model through multi-scenario simulation comparison.
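Purely as an illustration of how the modules enumerated above could be composed, the following sketch wires hypothetical placeholder callables into a pipeline; none of the function names or bodies are specified by the claims.

```python
# Hypothetical composition of the modules listed above; each callable is a
# placeholder standing in for one module of the claimed system.
from typing import Any, Callable


def run_pipeline(build_scenarios: Callable[[], Any],
                 design_system_model: Callable[[Any], Any],
                 model_spaces: Callable[[Any], Any],
                 design_rewards: Callable[[Any], Any],
                 build_network: Callable[[Any], Any],
                 train_network: Callable[[Any, Any], Any],
                 verify_performance: Callable[[Any, Any], Any]) -> Any:
    scenarios = build_scenarios()                     # network scenario establishing module
    system_model = design_system_model(scenarios)     # system model design module
    spaces = model_spaces(system_model)               # space modeling module
    rewards = design_rewards(scenarios)               # reward function design module
    q_network = build_network(spaces)                 # neural network model establishing module
    trained = train_network(q_network, rewards)       # network model training module
    return verify_performance(trained, scenarios)     # performance verification module
```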
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
establishing network scenarios with services of different priorities; designing and defining a system model of the protocol, performing state space modeling and action space modeling according to the network scenario of the protocol, and designing reward functions for the different scenarios; defining and establishing a neural network model used by the protocol, and training the network model through experience tuples; and verifying the performance of the trained model through multi-scenario simulation comparison.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
establishing network scenarios with services of different priorities; designing and defining a system model of the protocol, performing state space modeling and action space modeling according to the network scenario of the protocol, and designing reward functions for the different scenarios; defining and establishing a neural network model used by the protocol, and training the network model through experience tuples; and verifying the performance of the trained model through multi-scenario simulation comparison.
10. A wireless communication information data processing terminal, characterized in that the wireless communication information data processing terminal is configured to implement the channel access system of the multi-priority service wireless terminal according to claim 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110781263.0A CN113613339B (en) | 2021-07-10 | 2021-07-10 | Channel access method of multi-priority wireless terminal based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113613339A (en) | 2021-11-05 |
CN113613339B (en) | 2023-10-17 |
Family
ID=78304401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110781263.0A Active CN113613339B (en) | 2021-07-10 | 2021-07-10 | Channel access method of multi-priority wireless terminal based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113613339B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190014488A1 (en) * | 2017-07-06 | 2019-01-10 | Futurewei Technologies, Inc. | System and method for deep learning and wireless network optimization using deep learning |
CN110049018A (en) * | 2019-03-25 | 2019-07-23 | 上海交通大学 | SPMA protocol parameter optimization method, system and medium based on enhancing study |
CN110809306A (en) * | 2019-11-04 | 2020-02-18 | 电子科技大学 | Terminal access selection method based on deep reinforcement learning |
CN111628855A (en) * | 2020-05-09 | 2020-09-04 | 中国科学院沈阳自动化研究所 | Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning |
CN111711666A (en) * | 2020-05-27 | 2020-09-25 | 梁宏斌 | Internet of vehicles cloud computing resource optimization method based on reinforcement learning |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
HE JIANG; HAIBO HE: "Q-Learning for Non-Cooperative Channel Access Game of Cognitive Radio Networks", IEEE * |
HUANG Ying; YAN Dingyu; LI Nan: "Q-Learning Optimization Algorithm for Dynamic Spectrum Access", Journal of Xidian University, no. 06 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114024639A (en) * | 2021-11-09 | 2022-02-08 | 重庆邮电大学 | Distributed channel allocation method in wireless multi-hop network |
CN114024639B (en) * | 2021-11-09 | 2024-01-05 | 成都天软信息技术有限公司 | Distributed channel allocation method in wireless multi-hop network |
CN114375022A (en) * | 2022-01-08 | 2022-04-19 | 山东大学 | Leader election method based on multi-agent reinforcement learning in wireless network |
CN114375022B (en) * | 2022-01-08 | 2024-03-12 | 山东大学 | Channel preemption method based on multi-agent reinforcement learning in wireless network |
CN114826986A (en) * | 2022-03-30 | 2022-07-29 | 西安电子科技大学 | Performance analysis method for ALOHA protocol of priority frameless structure |
CN114826986B (en) * | 2022-03-30 | 2023-11-03 | 西安电子科技大学 | Performance analysis method for ALOHA protocol with priority frameless structure |
CN114938530A (en) * | 2022-06-10 | 2022-08-23 | 电子科技大学 | Wireless ad hoc network intelligent networking method based on deep reinforcement learning |
CN114938530B (en) * | 2022-06-10 | 2023-03-21 | 电子科技大学 | Wireless ad hoc network intelligent networking method based on deep reinforcement learning |
CN115134060A (en) * | 2022-06-20 | 2022-09-30 | 京东科技控股股份有限公司 | Data transmission method and device, electronic equipment and computer readable medium |
CN115315020A (en) * | 2022-08-08 | 2022-11-08 | 重庆邮电大学 | Intelligent CSMA/CA (Carrier sense multiple Access/Carrier aggregation) backoff method based on IEEE (institute of Electrical and electronics Engineers) 802.15.4 protocol of differentiated services |
CN115767785A (en) * | 2022-10-22 | 2023-03-07 | 西安电子科技大学 | MAC protocol switching method based on deep reinforcement learning in self-organizing network |
CN115767785B (en) * | 2022-10-22 | 2024-02-27 | 西安电子科技大学 | MAC protocol switching method based on deep reinforcement learning in self-organizing network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113613339B (en) | Channel access method of multi-priority wireless terminal based on deep reinforcement learning | |
EP3637708B1 (en) | Network congestion processing method, device, and system | |
Nguyen et al. | RTEthernet: Real‐time communication for manufacturing cyberphysical systems | |
Kong et al. | Performance analysis of IEEE 802.11 e contention-based channel access | |
CN107040948B (en) | CSMA/CA optimization method based on priority | |
CN111294775B (en) | Resource allocation method based on H2H dynamic characteristics in large-scale MTC and H2H coexistence scene | |
US20090129404A1 (en) | Differentiation for bandwidth request contention | |
CN111836370B (en) | Resource reservation method and equipment based on competition | |
CN114339660A (en) | Random access method for unmanned aerial vehicle cluster | |
Hu et al. | Performance and reliability analysis of prioritized safety messages broadcasting in DSRC with hidden terminals | |
CN111601398B (en) | Ad hoc network medium access control method based on reinforcement learning | |
Shoaei et al. | Reconfigurable and traffic-aware MAC design for virtualized wireless networks via reinforcement learning | |
AlQahtani | Performance analysis of cognitive‐based radio resource allocation in multi‐channel LTE‐A networks with M2M/H2H coexistence | |
JP2022553601A (en) | Transmission contention resolution, apparatus, terminal and medium | |
CN113056010A (en) | Reserved time slot distribution method based on LoRa network | |
Ahmed et al. | A QoS-aware scheduling with node grouping for IEEE 802.11 ah | |
Bankov et al. | Approach to real-time communications in Wi-Fi networks | |
Wang et al. | TCP throughput enhancement for cognitive radio networks through lower-layer configurations | |
CN114845338A (en) | Random back-off method for user access | |
Wang et al. | Optimization on information freshness for multi‐access users with energy harvesting cognitive radio networks | |
Kim et al. | Dynamic Transmission and Delay Optimization Random Access for Reduced Power Consumption | |
Raeis et al. | Distributed fair scheduling for information exchange in multi-agent systems | |
Zhai et al. | Large-Scale Micro-Power Sensors Access Scheme Based on Hybrid Mode in IoT Enabled Smart Grid | |
Zazhigina et al. | Analytical study of Restricted Access Window with short slots for fast and reliable data delivery from energy-harvesting sensors | |
Liu et al. | A novel artificial intelligence based wireless local area network channel access control scheme for low latency e‐health applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||