CN111245541B - Channel multiple access method based on reinforcement learning - Google Patents

Info

Publication number
CN111245541B
CN111245541B · CN202010154072.7A · CN202010154072A
Authority
CN
China
Prior art keywords
action
window
contention
channel
competition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010154072.7A
Other languages
Chinese (zh)
Other versions
CN111245541A (en)
Inventor
雷建军
黎露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202010154072.7A priority Critical patent/CN111245541B/en
Publication of CN111245541A publication Critical patent/CN111245541A/en
Application granted granted Critical
Publication of CN111245541B publication Critical patent/CN111245541B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W74/00Wireless channel access
    • H04W74/08Non-scheduled access, e.g. ALOHA
    • H04W74/0833Random access procedures, e.g. with 4-step access
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/382Monitoring; Testing of propagation channels for resource allocation, admission control or handover

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Electromagnetism (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a channel multiple access method based on reinforcement learning, which comprises: modeling the adjustment of the contention window in the channel access process as a Markov decision process; selecting a contention window action with an ε-greedy strategy in the current action adjustment period, whereby the AP selects the optimal contention window for the current state from the contention window set; each STA using the optimal contention window broadcast by the AP to generate an OBO backoff value and back off until contention ends; the AP allocating resource units (RUs) to the successfully contending stations through a trigger frame, the stations transmitting data on their respectively allocated RUs; after the action adjustment period ends, calculating a system performance index as the reward; updating the action value function according to the reward obtained from the current performance index; and repeating the above process to continuously optimize the contention window. The invention can improve system throughput and fairness and reduce data transmission delay.

Description

Channel multiple access method based on reinforcement learning
Technical Field
The invention relates to the field of wireless local area networks (WLANs), and in particular to a channel multiple access method based on reinforcement learning, mainly applied to IEEE 802.11ax high-density network environments.
Background
In recent years, with the rapid development of intelligent terminal devices and mobile Internet of Things services, demands on wireless traffic and service quality keep increasing. Wireless local area networks and cellular networks are the main networks carrying wireless services thanks to their high speed, flexible deployment and low cost. In the past, standardization work for WLANs focused primarily on improving link throughput rather than on efficiently utilizing spectrum resources and improving user experience, and the design of MAC algorithms did not improve significantly. However, with the wide deployment of WLANs, some fundamental technical challenges arise, especially in dense network environments. In these environments, the heavy collisions caused by channel contention can severely degrade network performance, failing to provide users with sufficient bandwidth and a good experience. In 2014, the IEEE standards committee approved the establishment of the 802.11ax task group. 802.11ax aims to provide a mode of operation for stations deployed in dense scenarios with at least a four-fold improvement in the average throughput per STA.
In high-density scenarios, conventional MAC protocols suffer from a high collision rate, severe interference and low channel utilization, and cannot support the diverse Quality of Service (QoS) requirements of future wireless services; nor do they provide an efficient contention window backoff mechanism. The multiple access mechanism of IEEE 802.11ax can reduce collisions to a certain extent and improve channel utilization, but several problems remain unsolved: on the one hand, in high-density scenarios, as the number of stations grows sharply, the current multiple access mechanism still cannot effectively avoid collisions and interference, and MAC-layer performance degrades severely; on the other hand, the environment faced during network operation is extremely complex, and current MAC algorithms based on traditional communication theory can neither allocate resources dynamically nor efficiently learn from historical experience.
Disclosure of Invention
In order to solve the above problems, the present invention aims to provide a channel multiple access method based on reinforcement learning, which models the adjustment of the contention window in channel access as a Markov decision process and improves system performance through a reinforcement learning algorithm.
In order to achieve this purpose, the invention mainly adopts the following processes:

In the channel access process, each station generates an OBO backoff value from the uniform contention window broadcast by the AP and backs off. After backoff ends, the station randomly selects an RU on which to send a Buffer Status Report (BSR) to contend for channel resources; if the contention ending condition is met, the contention ends; otherwise, the remaining stations continue to contend for RUs. After the contention ends, the AP allocates different RUs to the successfully contending stations through a trigger frame, and the stations transmit data on their respectively allocated RUs.

In the learning process, after each action adjustment period, the AP calculates the reward for the current state according to the network performance of the previous period, updates the value function of the reinforcement learning model, and reselects a contention window to broadcast to the STAs. The above process is repeated every action adjustment period to optimize the contention window.
Specifically, the invention provides a channel multiple access method based on reinforcement learning, which is particularly suitable for the 802.11ax standard, and the method comprises the following steps:
step 1) modeling the adjustment of the contention window in the channel access process as a Markov decision process;
step 2) selecting a contention window action with an ε-greedy strategy in the current action adjustment period; the AP selects the optimal contention window for the current state from the contention window set;
step 3) each STA uses the optimal contention window broadcast by the AP to generate an OBO backoff value and backs off until contention ends;
step 4) the AP allocates resource units (RUs) to the successfully contending stations through a trigger frame, and the stations transmit data on their respectively allocated RUs; judging whether the current action adjustment period has ended, and if so, entering step 5); otherwise, returning to step 3);
step 5) after the action adjustment period ends, calculating a system performance index as the reward;
step 6) updating the action value function according to the reward obtained from the current performance index; judging whether the termination condition is met, and if not, entering the next action adjustment period and returning to step 2) to continue optimizing the contention window; otherwise, terminating the flow.
The invention has the beneficial effects that:
the invention provides a channel multiple access method based on reinforcement learning. Based on the standard back-off mechanism of IEEE 802.11ax, a reinforced learning algorithm is used for dynamically adjusting the contention window. The method realizes the further control of the station competition channel, thereby achieving the effects of improving the throughput and fairness of the system and reducing the time delay.
Drawings
FIG. 1 is a block diagram of a channel multiple access architecture based on reinforcement learning according to the present invention;
FIG. 2 is a diagram of a model for reinforcement learning according to the present invention;
FIG. 3 is a flow chart of the channel multiple access method based on reinforcement learning according to the present invention;
FIG. 4 is a flowchart illustrating AP learning according to the present invention;
fig. 5 is a flowchart of channel access by an STA in the present invention;
fig. 6 is a timing diagram of channel access by STAs in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them.
In an embodiment, as shown in fig. 1, this embodiment provides a framework of channel multiple access based on reinforcement learning: the AP adjusts the contention window CW by learning and evaluation, and each STA obtains the contention window CW from an acknowledgement frame sent by the AP and uses it to contend for channel resources; after the STAs have used the contention window to contend for channel resources multiple times, network feedback is formed and sent to the AP.
In an embodiment, as shown in fig. 2, this embodiment provides the model with which the AP performs reinforcement learning. In the reinforcement learning model, the AP is set as the agent, and the environment state S of the AP is the current contention window; the allowed actions A are increasing, decreasing or keeping the current contention window; the reward is an important performance index of the network, such as throughput, delay, fairness, etc.
In one embodiment, as shown in fig. 3, the present embodiment provides a channel multiple access method based on reinforcement learning, the method including:
step 1) modeling the adjustment of the contention window in the channel access process as a Markov decision process;
step 2) selecting a contention window action with an ε-greedy strategy in the current action adjustment period; the AP selects the optimal contention window for the current state from the contention window set;
step 3) each STA uses the optimal contention window broadcast by the AP to generate an OBO backoff value and backs off until contention ends;
step 4) the AP allocates resource units (RUs) to the successfully contending stations through a trigger frame, and the stations transmit data on their respectively allocated RUs; judging whether the current action adjustment period has ended, and if so, entering step 5); otherwise, returning to step 3);
step 5) after the action adjustment period ends, calculating a system performance index as the reward;
step 6) updating the action value function according to the reward obtained from the current performance index; judging whether the termination condition is met, and if not, entering the next action adjustment period and returning to step 2) to continue optimizing the contention window; otherwise, terminating the flow.
In one embodiment, the model of the Markov decision process in step 1) comprises:

S = {s_1, s_2, …, s_n}, s_t ∈ CW
A = {a_1, a_2, …, a_n}, a_t ∈ {−1, 0, 1}
s_t = CW_curr
s_{t+1} = CW_next
CW_next = CW_curr/η if a_t = −1; CW_curr if a_t = 0; η·CW_curr if a_t = 1    (1)

wherein S represents the state space, i.e. the set of all contention windows a station can select; s_t represents the contention window at time t; A represents the action space, i.e. scaling or holding the current contention window; a_t = −1 represents taking the contention-window-decrease action at time t; a_t = 0 represents keeping the contention window unchanged at time t; a_t = 1 represents taking the contention-window-increase action at time t; η represents the adjustment factor; CW_curr represents the contention window of the current state; and CW_next represents the contention window of the next state.
In a preferred embodiment, the next state is determined by the contention window at the next time instant. η may be taken as 2, or another value may be selected according to the actual situation. Through the relation in (1), the contention window can be scaled or kept unchanged, and a given action leads to exactly one next state, so the transition probability is p(s_{t+1} | s_t, a_t) = 1. The minimum and maximum contention windows are CW_min = 15 and CW_max = 1023, respectively, and can be adjusted according to actual conditions.
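A minimal sketch of this deterministic transition, assuming η = 2 and the CW_min = 15 / CW_max = 1023 bounds given above (the function and constant names are illustrative, not taken from the patent):

```python
CW_MIN, CW_MAX = 15, 1023  # bounds from the embodiment; adjustable in practice
ETA = 2                    # adjustment factor eta; other values may be chosen

def next_contention_window(cw_curr: int, action: int) -> int:
    """Apply action a_t in {-1, 0, 1} to the current contention window.

    -1 shrinks the window by eta, 0 keeps it, 1 grows it by eta; the result
    is clamped to [CW_MIN, CW_MAX], so the transition is deterministic:
    p(s_{t+1} | s_t, a_t) = 1.
    """
    if action == -1:
        cw_next = cw_curr // ETA
    elif action == 0:
        cw_next = cw_curr
    elif action == 1:
        cw_next = cw_curr * ETA
    else:
        raise ValueError("action must be -1, 0 or 1")
    return max(CW_MIN, min(CW_MAX, cw_next))
```

The patent does not state whether the scaled window is re-aligned to values of the form 2^k − 1; the sketch uses plain integer scaling with clamping.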
In one embodiment, the system adopts the value function update of the Q-learning algorithm; the action value function is not updated when the system runs for the first time, and the following formulas are used from the second run onwards. The action value function update comprises:
q(s, a) ← q(s, a) + α[U − q(s, a)]    (2)
U ← R + γ max_{a′ ∈ A(s′)} q_π(s′, a′)    (3)
wherein q(s, a) represents the value of taking contention window action a in state s; α is the learning rate and γ is the discount factor; R represents the performance index reward; U is the temporal-difference target, representing the predicted actual return; and q_π(s′, a′) represents the value of taking action a′ in the next state s′ under policy π. Of course, other reinforcement learning algorithms may also be used with the present invention.
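A hedged sketch of the update in formulas (2) and (3), using a plain dictionary as the Q-table (the names `update_q` and `q_table` are illustrative assumptions, and the α and γ values are placeholders):

```python
ALPHA = 0.1   # learning rate alpha (illustrative value)
GAMMA = 0.9   # discount factor gamma (illustrative value)

def update_q(q_table, state, action, reward, next_state, actions=(-1, 0, 1)):
    """One Q-learning step per formulas (2)-(3):
    U = R + gamma * max_{a'} q(s', a'), then q(s, a) += alpha * (U - q(s, a)).
    States are contention windows; actions are -1 (shrink), 0 (keep), 1 (grow)."""
    u = reward + GAMMA * max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + ALPHA * (u - old)
```

Unvisited state-action pairs default to 0, matching the initial action value q(s, a) = 0 of claim 3.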
In one embodiment, the system uses an ε-greedy strategy to select an action; the action the AP selects actually refers to a contention window. That is, through reinforcement learning the AP selects the optimal contention window CW for the current state and broadcasts it to the STAs. The system may broadcast the CW piggybacked on a beacon frame the first time, and a non-first broadcast contention window may be carried by adding a CW field to the acknowledgement frame MBA, so that the CW is broadcast while the AP acknowledges the data frame. The ε-greedy action selection formula is as follows:
π(a|s) = 1 − ε + ε/|A(s)|, if a = argmax_{a′} q_π(s, a′); ε/|A(s)|, otherwise    (4)
wherein π(a|s) means that the AP agent selects the action with the currently maximal value with probability 1 − ε, and randomly selects an action from all actions with probability ε; |A(s)| represents the number of selectable actions under the contention window of state s; and q_π(s, a) represents the value function under policy π, i.e. the value of action a selected by policy π in the current state s.
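A minimal sketch of the ε-greedy selection in formula (4); the exploration rate and the tie-breaking rule (ties go to the first maximal action) are illustrative assumptions:

```python
import random

EPSILON = 0.1  # exploration probability epsilon (illustrative value)

def select_action(q_table, state, actions=(-1, 0, 1)):
    """epsilon-greedy: with probability epsilon explore uniformly over all
    actions; otherwise exploit the action with the largest Q-value in `state`."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```

Note that `random.choice` may also return the greedy action, which matches the 1 − ε + ε/|A(s)| probability assigned to it in formula (4).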
In an embodiment, after the channel access method of the present invention is used and the system has run one action adjustment period, performance indexes such as system throughput, delay and fairness within that period can be counted; data may be transmitted many times within one action adjustment period. From these performance indexes a reward R can be calculated. The reward calculation comprises:
R = p(t)    (5)

[Equation (6): definition of Throughput_i, the system throughput of the i-th period]
[Equation (7): definition of DelayTime_i, the average delay of the i-th period]
wherein p(t) is an important performance index of the network, including throughput, delay and/or fairness; Throughput_i represents the system throughput of the i-th period; and DelayTime_i represents the average delay of the i-th period.
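The exact expressions behind equations (6) and (7) are not reproduced in this text, so the following is only a plausible sketch of turning per-period statistics into the reward, using throughput as p(t) in line with the preferred embodiment of step S25 below; all field and function names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class PeriodStats:
    """Statistics the AP could collect over one action adjustment period."""
    bits_delivered: float   # total payload bits successfully transmitted
    duration_s: float       # length of the action adjustment period in seconds
    total_delay_s: float    # summed delay of all delivered frames
    delivered_frames: int   # number of frames delivered in the period

def compute_reward(stats: PeriodStats) -> float:
    """Reward R = p(t); here p(t) is taken as the period throughput (bit/s).
    DelayTime_i (average delay) or a fairness index could be used instead."""
    return stats.bits_delivered / stats.duration_s

def average_delay(stats: PeriodStats) -> float:
    """DelayTime_i as a simple mean over delivered frames (assumed definition)."""
    return stats.total_delay_s / max(stats.delivered_frames, 1)
```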
In another embodiment, the reinforcement learning-based channel multiple access method may further include:
In the learning process, after each action adjustment period, the AP calculates the reward for the current state according to the network performance of the previous period, updates the value function of the reinforcement learning model, and reselects a contention window to broadcast to the STAs. The above process is repeated every action adjustment period to optimize the contention window.
In the channel access process, each station generates an OBO backoff value from the uniform contention window broadcast by the AP and backs off. After backoff ends, the station randomly selects one RU on which to send a BSR to contend for channel resources; if the contention ending condition is met, the contention ends; otherwise, the remaining stations continue to contend for RUs. After the contention ends, the AP allocates different RUs to the successfully contending stations through a trigger frame, and the stations transmit data on their respectively allocated RUs.
The learning process may refer to fig. 4, and may also include:
Step S21, initializing parameters and establishing a Markov decision process with the AP as the agent and the current contention window as the environment state;
The environment state S of the AP is the current contention window; the allowed actions are increasing, decreasing or keeping the current contention window unchanged; the reward is an important network performance index, such as throughput, delay, etc.
Step S22, updating the action value function;
The action value function records historical experience and is used to adjust later contention windows.
Step S23, selecting an action with the ε-greedy strategy;
This trades off exploration against exploitation. The action performed by the AP is actually scaling the current contention window or keeping it unchanged.
Step S24, the STA contends for the channel and transmits data;
alternatively, the process of step S24 may be a channel access process, and reference may be made to the above-described embodiment.
Step S25, obtaining reward, and counting some performance indexes of the last action adjusting period as reward;
as a preferred embodiment, the present embodiment prioritizes throughput as a reward.
Step S26, updating the action value function: the system updates the action value function according to the reward obtained in the current action adjustment period. It is then judged whether the system satisfies the termination condition; if so, the flow terminates; otherwise, execution continues from step S21.
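Putting steps S21–S26 together, a hedged end-to-end sketch of the AP learning loop, reusing the helper functions from the sketches above; `run_access_period` is a stand-in for the channel access and statistics collection of step S24 and is not defined by the patent:

```python
def run_access_period(cw: int) -> "PeriodStats":
    """Placeholder for step S24: broadcast `cw`, let STAs back off and contend
    for RUs, carry data for one action adjustment period, and return the
    collected statistics. A real implementation would drive the OFDMA MAC here."""
    raise NotImplementedError("stand-in; see the contention sketch further below")

def ap_learning_loop(num_periods: int, cw_init: int = 15):
    """Sketch of steps S21-S26: select a contention window action, broadcast the
    resulting CW, observe one action adjustment period, update the Q-table."""
    q_table = {}                 # step S21: initialize the action value function
    state = cw_init              # environment state = current contention window
    for _ in range(num_periods):
        action = select_action(q_table, state)            # step S23: epsilon-greedy
        next_state = next_contention_window(state, action)
        stats = run_access_period(next_state)             # step S24: contention + data
        reward = compute_reward(stats)                     # step S25: per-period reward
        update_q(q_table, state, action, reward, next_state)  # steps S26/S22
        state = next_state
```

The fixed number of periods stands in for the termination condition of step S26, which the patent leaves to the implementation.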
The channel access procedure may refer to fig. 5, and may also include:
in step S11, after acquiring the contention window CW from the acknowledgement frame sent by the AP, the STA randomly selects a backoff value from [0, CW ], and records it as an OBO. If the channel is idle, the total RU number is subtracted from the OBO in each backoff process until the OBO is less than or equal to 0.
In step S12, when the OBO of the STA is less than or equal to 0, the STA gets a chance to contend for the channel resource. The STA randomly selects one RU to transmit a BSR. In order to ensure the service quality of the high-priority STA, the high-priority STA is allowed to obtain two continuous chances of competing for channel resources after one backoff is finished; while low priority STAs have only one chance to contend for channel resources.
In step S13, while the STA competes for the RU, the AP counts the number of STAs and the number of contention rounds that successfully competed. If the number of STAs for which the contention succeeds is greater than or equal to the total number of RUs or the number of contention rounds is greater than the maximum number of contention rounds, the contention ends. After the competition is finished, the AP sends a trigger frame, and each STA which successfully competes for RUs is allocated with one RU.
In step S14, the STA transmits data using the RU allocated by the AP, and after receiving the acknowledgement frame, the STA re-executes step S11.
In another embodiment, the channel access procedure may further include:
Step S111, after obtaining the contention window CW from the acknowledgement frame sent by the AP, the STA randomly selects a backoff value from [0, CW] and records it as OBO. If the channel remains idle for one DIFS frame interval, the STA starts to back off; if the channel is busy because BSR frames are being transmitted, backoff starts after waiting one MIFS frame interval. The total number of RUs is subtracted from the OBO in each backoff step until OBO ≤ 0.
In step S112, when the OBO of the STA is less than or equal to 0, the STA obtains an opportunity to contend for channel resources. The STA randomly selects one sub-channel on which to transmit the BSR to contend for the channel; each RU can be regarded as one sub-channel. To guarantee the service quality of high-priority STAs, a high-priority STA is allowed two consecutive opportunities to contend for channel resources after one backoff finishes, and if either of the two contentions succeeds, the STA is considered to have successfully contended for the channel; a low-priority STA has only one opportunity. The system mainly divides the traffic into two types: high-priority video stations and low-priority background stations.
In step S113, while the STAs contend for RUs, the AP counts the number of successfully contending STAs and the number of contention rounds. After the channel has been idle for one DIFS frame interval, each idle slot and each BSR transmission is counted as one contention round.
For example, fig. 6 is a timing diagram of STAs contending for the channel and transmitting data in the system of the present invention; the number of contention rounds in fig. 6 is 5. If the number of successfully contending STAs is greater than or equal to the total number of RUs, or the number of contention rounds exceeds the maximum number of contention rounds, the contention ends. After the contention ends, the AP sends a trigger frame and allocates one RU to each STA that successfully contended for an RU. If the number of successful STAs is larger than the total number of RUs, the excess STAs are randomly marked as having failed, until the number of successfully contending STAs equals the total number of RUs; one RU is then allocated to each of these STAs.
Step S114, after receiving the trigger frame TF, the STAs that won channel resources transmit data using the RUs allocated by the AP; after receiving the acknowledgement frame MBA, the flow returns to step S111, and STAs that have data to transmit can perform channel access again.
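A simplified, hedged sketch of one contention phase as described in steps S111–S113, at slot granularity; the priority handling, round counting and tie resolution are simplified assumptions, and the function name is illustrative:

```python
import random

def contend_for_rus(cw: int, num_stations: int, num_rus: int, max_rounds: int):
    """Every station draws OBO from [0, CW] and decrements it by the number of
    available RUs each backoff step; when OBO <= 0 it sends a BSR on a random RU.
    A station wins an RU only if no other station picked the same RU that round.
    Contention ends when enough stations have succeeded or max_rounds is reached."""
    obo = {sta: random.randint(0, cw) for sta in range(num_stations)}
    winners = set()
    for _ in range(max_rounds):
        if len(winners) >= num_rus:
            break
        choices = {}  # RU index -> stations that transmitted a BSR on it this round
        for sta in list(obo):
            obo[sta] -= num_rus
            if obo[sta] <= 0:
                choices.setdefault(random.randrange(num_rus), []).append(sta)
                del obo[sta]
        for ru, stas in choices.items():
            if len(stas) == 1:        # no collision on this RU: contention success
                winners.add(stas[0])
            else:                      # collision: the involved stations back off again
                for sta in stas:
                    obo[sta] = random.randint(0, cw)
    return winners  # the AP would now allocate one RU to each winner via a trigger frame
```

For example, `contend_for_rus(cw=15, num_stations=20, num_rus=9, max_rounds=5)` returns the set of stations that would be served in the following trigger frame.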
In a preferred embodiment, video traffic has high priority and background traffic has low priority.
In one embodiment, the optimal contention window broadcast by the AP comprises: the AP broadcasts the contention window CW piggybacked on a beacon frame, and a non-first-time broadcast contention window is carried by adding a CW field to the acknowledgement frame MBA, so that the CW is broadcast while the AP acknowledges the data frame; of course, as shown in fig. 6, the first broadcast may also be sent with MBA frames in all sub-channels.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A channel multiple access method based on reinforcement learning, the method comprising:
step 1) modeling the adjustment of the contention window in the channel access process as a Markov decision process;
step 2) selecting a contention window action with an ε-greedy strategy in the current action adjustment period; the AP selects the optimal contention window for the current state from the contention window set;
the formula adopted by the ε-greedy strategy to select the contention window action comprises:
π(a|s) = 1 − ε + ε/|A(s)|, if a = argmax_{a′} q_π(s, a′); ε/|A(s)|, otherwise
wherein π(a|s) means that the AP agent selects the action with the currently maximal value with probability 1 − ε, and randomly selects an action from all actions with probability ε; |A(s)| represents the number of selectable actions under the contention window of state s; q_π(s, a) represents the value of taking action a under policy π;
step 3) each STA uses the optimal contention window broadcast by the AP to generate an OBO backoff value and backs off until contention ends;
the backoff contention process comprises:
step 31) after obtaining the optimal contention window CW from the acknowledgement frame sent by the AP, the STA randomly selects a backoff value from [0, CW] and records it as OBO; if the channel is idle, the total number of RUs is subtracted from the OBO in each backoff step until OBO ≤ 0;
step 32) when the OBO of the STA is less than or equal to 0, the STA obtains an opportunity to contend for channel resources; the STA randomly selects one RU on which to send a BSR frame;
step 33) the AP counts the number of successfully contending STAs and the number of contention rounds; if the number of successfully contending STAs is greater than or equal to the total number of RUs, or the number of contention rounds exceeds the maximum number of contention rounds, the contention ends;
step 4) the AP allocates resource units (RUs) to the successfully contending stations through a trigger frame, and the stations transmit data on their respectively allocated RUs; judging whether the current action adjustment period has ended, and if so, entering step 5); otherwise, returning to step 3);
step 5) after the action adjustment period ends, calculating a system performance index as the reward;
step 6) updating the action value function according to the reward obtained from the current performance index; judging whether the termination condition is met, and if not, entering the next action adjustment period and returning to step 2) to continue optimizing the contention window; otherwise, terminating the flow.
2. The channel multiple access method based on reinforcement learning of claim 1, wherein the model of the Markov decision process in step 1) comprises:
S = {s_1, s_2, …, s_n}, s_t ∈ CW
A = {a_1, a_2, …, a_n}, a_t ∈ {−1, 0, 1}
s_t = CW_curr
s_{t+1} = CW_next
CW_next = CW_curr/η if a_t = −1; CW_curr if a_t = 0; η·CW_curr if a_t = 1
wherein S represents the state space, i.e. the set of all contention windows a station can select; s_t represents the contention window at time t; A represents the action space, i.e. scaling or holding the current contention window; a_t = −1 represents taking the contention-window-decrease action at time t; a_t = 0 represents keeping the contention window unchanged at time t; a_t = 1 represents taking the contention-window-increase action at time t; η represents the adjustment factor; CW_curr represents the contention window of the current state; and CW_next represents the contention window of the next state.
3. The channel multiple access method based on reinforcement learning of claim 1, wherein the initial action value in step 2) is q(s, a) = 0.
4. The channel multiple access method based on reinforcement learning of claim 1, wherein step 32) comprises: in order to guarantee the service quality of high-priority STAs, a high-priority STA is allowed two consecutive opportunities to contend for channel resources after one backoff is completed, while a low-priority STA has only one opportunity to contend for channel resources.
5. The channel multiple access method based on reinforcement learning of claim 1, wherein the optimal contention window broadcast by the AP comprises: the AP broadcasts the contention window CW piggybacked on a beacon frame, and a non-first-time broadcast contention window is carried by adding a CW field to the acknowledgement frame (MBA); the CW is broadcast while the AP acknowledges the data frame.
6. The channel multiple access method based on reinforcement learning of claim 1, wherein the calculation formula of the reward of the performance index in the step 5) comprises:
R = p(t)
[Equation: definition of Throughput_i, the system throughput of the i-th period]
[Equation: definition of DelayTime_i, the average delay of the i-th period]
wherein p(t) is an important performance index of the network, including any one or more of throughput, delay and fairness; Throughput_i represents the system throughput of the i-th period; and DelayTime_i represents the average delay of the i-th period.
7. The channel multiple access method based on reinforcement learning of claim 1, wherein the calculation formula of the action value function in step 6) comprises:
q(s, a) ← q(s, a) + α[U − q(s, a)]
U ← R + γ max_{a′ ∈ A(s′)} q_π(s′, a′)
wherein q(s, a) represents the value of taking contention window action a in state s; α is the learning rate and γ is the discount factor; R represents the performance index reward; U is the temporal-difference target, representing the predicted actual return; and q_π(s′, a′) represents the value of selecting action a′ in the next state s′ under policy π.
CN202010154072.7A 2020-03-07 2020-03-07 Channel multiple access method based on reinforcement learning Active CN111245541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010154072.7A CN111245541B (en) 2020-03-07 2020-03-07 Channel multiple access method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010154072.7A CN111245541B (en) 2020-03-07 2020-03-07 Channel multiple access method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111245541A CN111245541A (en) 2020-06-05
CN111245541B true CN111245541B (en) 2021-11-16

Family

ID=70876879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010154072.7A Active CN111245541B (en) 2020-03-07 2020-03-07 Channel multiple access method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN111245541B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112492656B (en) * 2020-11-25 2022-08-05 重庆邮电大学 Wireless network access point switching method based on reinforcement learning
CN112584541A (en) * 2020-11-28 2021-03-30 重庆邮电大学 Greedy algorithm based wireless network multichannel multiple access method
CN112566161B (en) * 2020-12-02 2022-07-15 温州职业技术学院 WLAN target wake-up time scheduling method under deterministic channel access condition
CN115315020A (en) * 2022-08-08 2022-11-08 重庆邮电大学 Intelligent CSMA/CA (Carrier sense multiple Access/Carrier aggregation) backoff method based on IEEE (institute of Electrical and electronics Engineers) 802.15.4 protocol of differentiated services

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110336620A (en) * 2019-07-16 2019-10-15 沈阳理工大学 A kind of QL-UACW back-off method based on MAC layer fair exchange protocols

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100285327B1 (en) * 1996-12-02 2001-04-02 박종섭 Method for testing power control guide function of base station transceiver subsystem in mobile telecommunication system
CN102256262B (en) * 2011-07-14 2013-09-25 南京邮电大学 Multi-user dynamic spectrum accessing method based on distributed independent learning
CN109639377B (en) * 2018-12-13 2021-03-23 西安电子科技大学 Spectrum resource management method based on deep reinforcement learning
CN110035559B (en) * 2019-04-25 2023-03-10 重庆邮电大学 Intelligent competition window size selection method based on chaotic Q-learning algorithm

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110336620A (en) * 2019-07-16 2019-10-15 沈阳理工大学 A kind of QL-UACW back-off method based on MAC layer fair exchange protocols

Also Published As

Publication number Publication date
CN111245541A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN111245541B (en) Channel multiple access method based on reinforcement learning
US20060215686A1 (en) Communication method for accessing wireless medium under enhanced distributed channel access
US20060062189A1 (en) Wireless transceiver, circuit module, and method for setting channel access time
CN111163491B (en) Fine-grained statistical priority multiple access method with high channel utilization rate
CN104936303A (en) Carrier sensing threshold and competition window combined control method
CN111328052B (en) Channel resource allocation method in high-density wireless network
Uwai et al. Adaptive backoff mechanism for OFDMA random access with finite service period in IEEE802. 11ax
CN109257830B (en) QoS-based vehicle-mounted network self-adaptive back-off method
Syed et al. Delay analysis of IEEE 802.11 e EDCA with enhanced QoS for delay sensitive applications
Zhang et al. Performance analysis of reservation and contention-based hybrid MAC for wireless networks
Huang et al. Detailed analysis for IEEE 802.11 e EDCA in non-saturated conditions-Frame-transmission-cycle approach
CN116489813A (en) Self-adaptive conflict back-off method and system suitable for Lora-Mesh network
Achary et al. Performance enhancement of IEEE 802.1 le WLAN by dynamic adaptive contention window
Gopinath et al. Channel status based contention algorithm for non-safety applications in IEEE802. 11p vehicular network
CN106937326B (en) Method for coordinating transmission among base stations and first base station
CN115022978A (en) Wireless network uplink scheduling method based on self-adaptive grouping and reinforcement learning
KR100853695B1 (en) Wireless lan apparatus based on multiple queues
WO2016155218A1 (en) Method and device for sending wireless frames
CN112584541A (en) Greedy algorithm based wireless network multichannel multiple access method
Xu et al. Time-Triggered Reservation for Cooperative Random Access in Wireless LANs
Ojeda-Guerra et al. Adaptive tuning mechanism for EDCA in IEEE 802.11 e wireless LANs
CN111263463A (en) IEEE802-11ax QoS channel access control method based on service priority
Lv et al. Dynamic polling sequence arrangement for low-latency wireless LAN
Liu et al. DRL-based channel access in NR unlicensed spectrum for downlink URLLC
CN117241409B (en) Multi-type terminal random access competition solving method based on near-end policy optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant