CN108924944A - Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence - Google Patents


Info

Publication number
CN108924944A
Authority
CN
China
Prior art keywords
value
base station
LAA
small base station
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810797200.2A
Other languages
Chinese (zh)
Other versions
CN108924944B (en)
Inventor
裴二荣
江军杰
李露
程巍
李海星
马玉鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201810797200.2A priority Critical patent/CN108924944B/en
Publication of CN108924944A publication Critical patent/CN108924944A/en
Application granted granted Critical
Publication of CN108924944B publication Critical patent/CN108924944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W74/00 Wireless channel access
    • H04W74/08 Non-scheduled access, e.g. ALOHA
    • H04W74/0808 Non-scheduled access, e.g. ALOHA using carrier sensing, e.g. carrier sense multiple access [CSMA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present invention relates to a Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence, belonging to the field of communication technology, and comprising the steps of: 1. setting the state set and action set of the LAA small base station; 2. initializing the state and action Q-values of the LAA small base station; 3. calculating the initial state value of the LAA small base station; 4. computing a Logistic chaotic map sequence according to the formula and mapping it into the action value set of the LAA small base station to randomly select an action a_t(i); 5. after executing action a_t(i), obtaining the environment reward value r_t and entering the next state s_{t+1}; 6. updating the action Q-value function of the LAA small base station; 7. letting t ← t+1 and repeating steps 4 to 6 until the target is reached. Under the condition of guaranteeing user fairness, the present invention can improve the spectrum utilization of the channel while expanding the system capacity of next-generation communication systems, providing users with better quality of service and enhancing user experience.

Description

Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence
Technical field
The invention belongs to the field of communication technology and relates to a Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence.
Background technique
Wireless mobile communication, owing to its popularity and the convenience of its access, plays an increasingly important role in future information communication systems. With the rapid development of the mobile Internet and Internet-of-Things services, the surge in mobile data traffic has led to a shortage of licensed spectrum, so operators hope to exploit unlicensed spectrum to supplement the licensed bands. LTE-U (LTE-Unlicensed), proposed by 3GPP and also referred to as Licensed Assisted Access (LAA), aims to apply LTE technology to unlicensed bands (e.g., around 5 GHz) while preserving the original LTE protocol specification as far as possible, by deploying small base stations in the unlicensed bands and allowing LTE to operate cooperatively across unlicensed and licensed bands through carrier aggregation, so as to increase cellular system capacity and improve the spectrum utilization of the unlicensed bands.
At present there are mainly two coexistence schemes for LTE and WiFi in unlicensed bands: duty-cycle muting (DCM) and LAA. DCM, the first version of LTE-U, was initially proposed by Ericsson and Qualcomm in 2013. This scheme shares unlicensed spectrum with WiFi by periodically muting LTE for a period of time; it does not require Listen Before Talk (LBT), and because it does not modify the LTE protocol it is easy to deploy, but at present it is used only in China, India, South Korea and the United States. The LTE LAA scheme was first proposed at the Sophia Antipolis meeting in France in June 2014. This scheme seeks a long-term global solution; its key feature is that LTE must assess the channel condition before accessing unlicensed spectrum, i.e., the Clear Channel Assessment (CCA) process of the LBT mechanism. This mechanism therefore requires modifications to the LTE protocol stack and support from equipment vendors. Telecommunication organizations such as 3GPP and ETSI are actively formulating standards for the LBT coexistence mechanism. We study the LBT-based coexistence mechanism between LTE and WiFi networks, i.e., the LAA mechanism. Owing to concerns and worries about the performance of the LBT-based LAA mechanism, some researchers have evaluated this coexistence mechanism. Studies have found that the contention window value of the LBT mechanism has a great influence on the performance of the coexistence system: a good backoff mechanism can generate a reasonable contention window value according to the actual load in the network, thereby improving the spectrum utilization of the channel and giving users a better experience.
Currently, existing backoff mechanisms, such as binary exponential backoff and fixed contention window backoff, lack a dynamic learning process and cannot flexibly adjust system parameters according to the real-time scenario, which objectively limits the improvement of the channel spectrum utilization of the coexistence system.
Therefore, designing a good backoff mechanism that can generate a reasonable contention window value for the real-time network load, service type and so on will help improve the spectrum utilization of the channel while expanding the system capacity of next-generation communication systems, providing users with better quality of service and thus enhancing user experience.
Summary of the invention
In view of this, the purpose of the present invention is to provide a Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence. Through the Q-learning algorithm, the LAA small base station can flexibly adjust the contention window value of the LBT mechanism by which it coexists with the WiFi system, according to factors such as the real-time network load and service type, maximizing the overall system throughput and improving the spectrum utilization of the coexistence system under the condition of guaranteeing fairness between LTE and WiFi users, thereby enhancing user experience. The method is concise and efficient, and at the same time has a degree of portability.
In order to achieve the above objectives, the present invention provides the following technical solutions:
The Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence comprises the following steps:
S1: setting the state set and action set of the LAA small base station;
S2: at time t = 0, initializing the state and action Q-values of the LAA small base station to 0;
S3: calculating the state value of the initial state s_t of the LAA small base station;
S4: computing a Logistic chaotic map sequence according to the formula, then mapping the sequence into the action value set of the LAA small base station and randomly selecting an action a_t(i);
S5: after executing action a_t(i), the system obtains the environment reward value r_t according to the formula and then enters the next state s_{t+1};
S6: updating the action Q-value function of the LAA small base station according to the formula;
S7: letting t ← t + 1 and repeating steps S4 to S6 until the target state is reached.
Further, in step S1, the state set of the LAA small base station is expressed as the combination of system throughput and fairness, i.e., s_t = {R_t, F_t}, where R_t denotes the total system throughput obtained in the unlicensed band at time t, i.e., the sum of the LAA and WiFi user throughputs, and F_t denotes the fairness function over the per-user averages. The fairness function is defined as the Jain fairness index over the per-user average throughputs:

F_t = [R_t(s,l)/nl + R_t(s,w)/nw]² / (2·([R_t(s,l)/nl]² + [R_t(s,w)/nw]²))

where R_t(s,l) and R_t(s,w) denote the LAA and WiFi user throughputs, nl denotes the number of LAA small base stations, and nw denotes the number of WiFi users. According to predefined throughput and fairness thresholds, the LAA small base station is divided into four states: low throughput and low fairness, low throughput and high fairness, high throughput and low fairness, and high throughput and high fairness, i.e.,

state 1: R_t < R_t° and F_t < F_t°;  state 2: R_t < R_t° and F_t ≥ F_t°;
state 3: R_t ≥ R_t° and F_t < F_t°;  state 4: R_t ≥ R_t° and F_t ≥ F_t°

where R_t° and F_t° respectively denote the thresholds of throughput and fairness.
For the action set, the contention window value is taken as the action of the LAA small base station, and according to the Markov process over the finite action set, any action of the LAA small base station at time t is defined to satisfy 16 ≤ a_t(i) ≤ 128.
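As a concrete illustration of the step S1 definitions, the following minimal Python sketch builds the fairness function and the four-state discretization; the Jain-style fairness form follows the reconstruction above, and the particular spacing of the action set inside [16, 128] is an illustrative assumption not fixed by the patent.

```python
# Minimal sketch of step S1: fairness, state discretization, action set.
# The Jain-style fairness form and the action spacing are assumptions
# for illustration; the patent's equation is given only as an image.

def fairness(r_laa: float, r_wifi: float, nl: int, nw: int) -> float:
    """Jain fairness index over per-user average throughputs; 1 = perfectly fair."""
    x = (r_laa / nl, r_wifi / nw)
    return (x[0] + x[1]) ** 2 / (2.0 * (x[0] ** 2 + x[1] ** 2))

def discretize_state(r_total: float, f: float,
                     r_thresh: float, f_thresh: float) -> int:
    """Map (R_t, F_t) to one of four states:
    0 low R/low F, 1 low R/high F, 2 high R/low F, 3 high R/high F."""
    return 2 * int(r_total >= r_thresh) + int(f >= f_thresh)

# Action set: contention window values (unit: time slots) within [16, 128].
ACTIONS = [16, 32, 48, 64, 80, 96, 112, 128]
```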
Further, in step S2, the state and action Q-values of the LAA small base station are set as zero matrices. For the LAA small base station, the solution target of the Markov decision process is to find an optimal policy π* such that the value V(s) of every state s simultaneously reaches its maximum; the state value function is expressed as follows:

V^π(s_t) = r(s_t, a_t) + γ·Σ_{s_{t+1}} p(s_{t+1}|s_t, a_t)·V^π(s_{t+1})

where r(s_t, a_t) denotes the reward value the LAA small base station obtains from the environment, and p(s_{t+1}|s_t, a_t) denotes the probability that the LAA small base station transfers to state s_{t+1} after selecting action a_t while in state s_t.
Further, in step S4, the target of the LAA small base station is to obtain a higher reward value, and chaotic motion, which has ergodicity, regularity and randomness, is introduced as an optimization mechanism.

There are three common mapping systems in chaos theory: the Logistic map, the Chebyshev map and the Henon map. The equation of the Logistic map is expressed as:

z_{k+1} = μ·z_k·(1 - z_k)

where 0 ≤ μ ≤ 4 is called the bifurcation parameter; when μ ∈ [3.5699456…, 4] the Logistic map works in the chaotic state, and μ = 4 is taken here; k denotes the iteration number; z is called the chaos variable, and the chaotic domain is (0, 1).
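To make the chaotic selection of step S4 concrete, the following sketch iterates the Logistic map with μ = 4 and maps each chaos variable onto an action index; the uniform-binning map from z to an index is an assumption for illustration, since the patent's mapping formula is not reproduced here.

```python
# Sketch of step S4: generate a Logistic chaotic sequence (mu = 4) and
# map each chaos variable z in (0, 1) onto the action set to pick a
# contention-window action a_t(i). Uniform binning is an assumption.

MU = 4.0  # bifurcation parameter in the chaotic regime

def logistic_next(z: float) -> float:
    """One iteration of z_{k+1} = mu * z_k * (1 - z_k)."""
    return MU * z * (1.0 - z)

def chaotic_action_index(z: float, num_actions: int) -> int:
    """Map the chaos variable z in (0, 1) to an action index."""
    return min(int(z * num_actions), num_actions - 1)

# Usage: draw five chaotic action selections from an 8-action set.
z = 0.3141  # any start in (0, 1) away from the fixed points 0 and 0.75
for _ in range(5):
    z = logistic_next(z)
    print(f"z = {z:.4f} -> action index {chaotic_action_index(z, 8)}")
```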
Further, in step S5, the LAA small base station obtains a reward value from the environment after executing the selected action. The reward value function is defined in terms of a weight factor ε with 0 < ε < 1, the minimum required threshold R_t° of the coexistence system throughput, and the minimum required threshold F_t° of the coexistence system fairness function.
Further, in step S6, the LAA small base station needs to update the Q matrix after obtaining the reward value from the environment. The update formula is:

Q_{t+1}(s_t, a_t) = (1 - α)·Q_t(s_t, a_t) + α·[r_t + γ·max_a Q_t(s_{t+1}, a)]

where α denotes the learning rate with 0 < α < 1, and γ denotes the discount factor with 0 ≤ γ < 1.
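A minimal sketch of the step S6 update follows, using the standard Q-learning rule reconstructed above; the Q table shape (4 states × k actions) follows the state and action definitions of steps S1 and S2.

```python
# Sketch of step S6: update one entry of the 4 x k Q matrix with the
# standard Q-learning rule reconstructed above.

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.8):
    """Q_{t+1}(s,a) = (1-alpha)*Q_t(s,a) + alpha*(r + gamma*max_a' Q_t(s',a'))."""
    Q[s][a] = (1.0 - alpha) * Q[s][a] + alpha * (r + gamma * max(Q[s_next]))

# Usage: a zero-initialized Q matrix for 4 states and 8 actions (step S2).
Q = [[0.0] * 8 for _ in range(4)]
q_update(Q, s=0, a=3, r=1.0, s_next=3)
```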
The beneficial effects of the present invention are as follows: the contention window value of the LBT-based coexistence mechanism of LTE and WiFi in the unlicensed band is dynamically optimized through the Q-learning algorithm. Compared with traditional backoff algorithms, the Q-learning-based method in the present invention can dynamically optimize the contention window value of LTE and WiFi coexisting in the unlicensed band, and the LAA small base station can flexibly adjust the contention window value according to the real-time scenario of the network. The process is shown in Fig. 2: first, the LAA small base station, in some state, selects and executes an action according to the current environment via the chaos-based Logistic map; then it observes the environment to obtain the reward value, updates the Q function value according to the formula, determines the action of the next state based on the current Q function value, and repeats the above actions until convergence. Under the condition of guaranteeing user fairness, the present invention can improve the spectrum utilization of the channel while expanding the system capacity of next-generation communication systems, providing users with better quality of service and enhancing user experience.
Brief description of the drawings
In order to make the purpose, technical solution and beneficial effects of the present invention clearer, the present invention provides the following drawings for explanation:
Fig. 1 is a flow diagram of the Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence according to an embodiment of the present invention;
Fig. 2 is the interaction process model between Q-learning and the environment according to an embodiment of the present invention;
Fig. 3 is the network model diagram of LTE and WiFi coexistence according to an embodiment of the present invention.
Specific embodiment
A preferred embodiment of the present invention will be described in detail below in conjunction with the drawings.
Aiming at the LBT-based coexistence problem of LTE and WiFi in the unlicensed band (5 GHz), the present invention proposes a Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence. Compared with traditional backoff algorithms, the Q-learning-based method in the present invention can dynamically optimize the contention window value of LTE and WiFi coexisting in the unlicensed band, and the LAA small base station can flexibly adjust the contention window value according to the real-time scenario of the network. The process is shown in Fig. 2: first, the LAA small base station, in some state, selects and executes an action according to the current environment via the chaos-based Logistic map; then it observes the environment to obtain the reward value, updates the Q function value according to the formula, determines the action of the next state based on the current Q function value, and repeats the above actions until convergence.
Consider a coexistence scenario with multiple LAA small base stations and multiple WiFi access points (APs); the network model is shown in Fig. 3. Since LAA small base stations can operate on multiple unlicensed bands, and we are primarily concerned with the coexistence performance of LAA, the considered scene can be simplified into a simpler coexistence scenario in which multiple LAA small base stations and one WiFi AP share a specific unlicensed channel. Assume that in the considered coexistence scenario there are nl LAA small base stations and one WiFi AP serving nw users, where the network access of the WiFi users follows the IEEE 802.11 standard.
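For experimentation with the simplified scenario just described, a toy environment stub such as the following can stand in for the real channel; its throughput numbers and the deferral behavior are placeholders, not the patent's analytical model.

```python
# Hypothetical coexistence-environment stub: nl LAA small base stations
# and one WiFi AP with nw users on one unlicensed channel. The throughput
# model is a toy placeholder in which a larger LAA contention window
# defers more airtime to WiFi; it is not the patent's channel model.
import random

class CoexistenceEnv:
    def __init__(self, nl: int = 4, nw: int = 10):
        self.nl, self.nw = nl, nw

    def step(self, cw: int) -> tuple[float, float]:
        """Apply an LAA contention window cw (slots) and return toy
        per-system throughputs (R_laa, R_wifi) in Mbps."""
        defer = cw / 128.0                      # normalized backoff pressure
        r_laa = (1.0 - 0.5 * defer) * self.nl * random.uniform(8.0, 12.0)
        r_wifi = (0.5 + 0.5 * defer) * self.nw * random.uniform(1.0, 2.0)
        return r_laa, r_wifi
```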
As shown in Fig. 1, the method for dynamically optimizing the contention window value of the LBT-based coexistence mechanism of LTE and WiFi in the unlicensed band comprises the following steps:
100: setting the state set and action set of the LAA small base station;
200: at time t = 0, initializing the state and action Q-values of the LAA small base station to 0;
300: calculating the state value of the initial state s_t of the LAA small base station;
400: computing a Logistic chaotic map sequence according to the formula, then mapping the sequence into the action value set of the LAA small base station and randomly selecting an action a_t(i);
500: after executing action a_t(i), the system obtains the environment reward value r_t according to the formula and then enters the next state s_{t+1};
600: updating the action Q-value function of the LAA small base station according to the formula;
700: letting t ← t + 1 and repeating steps 400 to 600 until the target state is reached (a minimal end-to-end sketch of this loop follows this list).
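Under the assumptions of the preceding snippets, a minimal end-to-end sketch of steps 100 through 700 looks as follows; the state observation and reward here are stand-ins so the loop runs on its own, pure chaotic exploration is used for brevity, and α = 0.5, γ = 0.8 follow the values given later in this embodiment.

```python
# End-to-end sketch of steps 100-700: chaotic (Logistic) action selection,
# stand-in state observation and reward, and the standard Q update with
# alpha = 0.5, gamma = 0.8 as in the embodiment. Self-contained toy code.
import random

ACTIONS = [16, 32, 48, 64, 80, 96, 112, 128]   # step 100: CWs (slots)
Q = [[0.0] * len(ACTIONS) for _ in range(4)]   # step 200: zero Q matrix
ALPHA, GAMMA, MU = 0.5, 0.8, 4.0

def observe_state() -> int:
    # Stand-in for steps 300/500: measure R_t and F_t, compare with the
    # thresholds, and return one of the four states; randomized here.
    return random.randrange(4)

def reward_of(state: int) -> float:
    # Stand-in reward: only high-throughput/high-fairness (state 3) pays.
    return 1.0 if state == 3 else -0.1

z, s = 0.3141, observe_state()                 # initial chaos variable, state
for t in range(10_000):                        # step 700: iterate to target
    z = MU * z * (1.0 - z)                     # step 400: Logistic map
    a = min(int(z * len(ACTIONS)), len(ACTIONS) - 1)
    cw = ACTIONS[a]                            # execute a_t(i): apply this CW
    s_next = observe_state()                   # step 500: next state
    r = reward_of(s_next)                      # step 500: environment reward
    Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * max(Q[s_next]))
    s = s_next                                 # step 600 done; advance time
```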
The Q-learning algorithm is a kind of reinforcement learning that determines the optimal decision strategy algorithmically, and can be considered a method of asynchronous dynamic programming. During the iteration of the Q-learning algorithm, the state set is defined as S; if the decision time is t, then s_t ∈ S indicates that the state of the LAA small base station at time t is s_t. Meanwhile, the finite set of actions that the LAA small base station may execute is defined as A, and a_t ∈ A denotes the action of the LAA small base station at time t. The reward function r(s_t, a_t) denotes the reward value obtained from the environment after the LAA small base station, in state s_t, executes action a_t; the base station then transfers from state s_t to s_{t+1}, and at the next decision time t + 1 the function Q_t is updated. The Q-learning algorithm is in fact a variant of the Markov decision process (Markov Decision Processes, MDP).
In the coexistence network, LAA small base station users coexist harmoniously with WiFi users in the unlicensed band. Based on the working principle of the Q-learning algorithm, the state set is expressed as follows:

s_t = {R_t, F_t}

where R_t denotes the total system throughput obtained in the unlicensed band at time t, i.e., R_t = R_t(s,l) + R_t(s,w), and F_t denotes the fairness function over the per-user averages; the fairness function is defined as the Jain fairness index:

F_t = [R_t(s,l)/nl + R_t(s,w)/nw]² / (2·([R_t(s,l)/nl]² + [R_t(s,w)/nw]²))

where R_t(s,l) (resp. R_t(s,w)) denotes the LAA (resp. WiFi) user throughput; the closer the value of F_t is to 1, the fairer the system. According to the predefined thresholds R_t° and F_t°, the LAA small base station is divided into four states: low throughput and low fairness, low throughput and high fairness, high throughput and low fairness, and high throughput and high fairness. The elements of the state set S are therefore expressed as follows:

state 1: R_t < R_t° and F_t < F_t°;  state 2: R_t < R_t° and F_t ≥ F_t°;
state 3: R_t ≥ R_t° and F_t < F_t°;  state 4: R_t ≥ R_t° and F_t ≥ F_t°.
Taking the contention window value as the action set, the action set of the LAA small base station is A = {a(1), a(2), ..., a(k)}, whose unit is the number of time slots. According to the Markov process over the finite action set, the action of any LAA small base station at time t is defined to satisfy 16 ≤ a_t(i) ≤ 128.
The task facing the LAA small base station is to determine an optimal policy so as to maximize the reward obtained. The LAA small base station can make the best decision on the next state/action according to its current state and observation of the environment. The accumulated discounted reward value function of state s_t can be expressed as:

V^π(s_t) = r(s_t, a_t) + γ·Σ_{s_{t+1}} p(s_{t+1}|s_t, a_t)·V^π(s_{t+1})

where r(s_t, a_t) denotes the instant reward obtained when the LAA small base station selects action a_t in state s_t; γ denotes the discount factor with 0 ≤ γ < 1, and a discount factor tending to 0 indicates that the LAA small base station mainly considers the instant reward; p(s_{t+1}|s_t, a_t) denotes the probability of transferring from state s_t to s_{t+1} when the LAA small base station selects action a_t. The target of solving the MDP is to find an optimal policy π* such that the value V(s) of every state s reaches its maximum simultaneously. According to the Bellman principle, at least one optimal policy π* can be obtained when the total discounted expected reward of the LAA small base station is maximized, such that:

V*(s_t) = max_{a_t} [ r(s_t, a_t) + γ·Σ_{s_{t+1}} p(s_{t+1}|s_t, a_t)·V*(s_{t+1}) ]

where V*(s_t) denotes the maximum discounted accumulated reward value obtained by the LAA small base station starting from state s_t and following the optimal policy π*. A given policy π is a function mapping the state space to the action space, i.e., π: s_t → a_t. The optimal policy can therefore be expressed in the following form:

π*(s_t) = arg max_{a_t} [ r(s_t, a_t) + γ·Σ_{s_{t+1}} p(s_{t+1}|s_t, a_t)·V*(s_{t+1}) ]
The target of the LAA small base station is to obtain a higher reward value, so in each state it will select the action with the higher Q value. However, in the initial stage of learning, state-action experience is scarce and the Q values cannot accurately represent the correct reinforcement values; in general, always taking the action with the highest Q value leads the LAA small base station along the same path, unable to explore other, better values, so it is easily trapped in a local optimum. To overcome this disadvantage, the LAA small base station must select actions with some randomness. Therefore chaotic motion, which has ergodicity, regularity and randomness, is introduced as an optimization mechanism to reduce the possibility that the action selection strategy of the LAA small base station falls into a locally optimal solution.
Chaos systems mainly include three maps: the Logistic map, the Chebyshev map and the Henon map. For the Logistic map system, the equation is expressed as:

z_{k+1} = μ·z_k·(1 - z_k)

where 0 ≤ μ ≤ 4 is called the bifurcation parameter, k denotes the iteration number, and z is called the chaos variable, whose chaotic domain is (0, 1). When μ ∈ [3.5699456…, 4] the Logistic map works in the chaotic state; that is to say, the sequence generated under the Logistic map is aperiodic and non-convergent. The chaotic motion exhibited by a chaos system appears random and complex, but in fact contains inherent regularities.
Based on the reward value function, the LAA small base station iterates its selection strategy with high throughput and high fairness as the target. The reward value function that the LAA small base station obtains from the environment is defined in terms of a weight factor ε with 0 < ε < 1 (a smaller ε indicates that the Q-learning process leans more toward the reward obtained from the fairness factor), the minimum required threshold R_t° of the coexistence system throughput, and the minimum required threshold F_t° of the coexistence system fairness function. It can be seen from the reward value function expression that r_t is a bounded function, and according to the Watkins convergence conditions the Q-learning process is known to converge. Considering both the throughput performance of the whole network and the network fairness factor, the reward value function drives the fairness function value as close to 1 as possible under the condition that the system throughput is higher than the minimum throughput threshold.
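Since the explicit reward expression appears only as an image in the filing, the following is one plausible ε-weighted form consistent with the properties described here (bounded, penalizing throughput below R_t° or fairness below F_t°, and rewarding fairness close to 1); the patent's exact expression may differ.

```python
# One plausible (assumed) reward shaping: an epsilon-weighted, bounded
# combination of a throughput term and a fairness term with minimum
# thresholds r0 and f0. A smaller eps leans toward the fairness reward,
# matching the role of eps described in the text. Not the exact formula.

def reward(r_total: float, f: float, r0: float, f0: float,
           eps: float = 0.5) -> float:
    if r_total < r0 or f < f0:
        return -1.0                              # violating either minimum
    tput_term = min(r_total / r0 - 1.0, 1.0)     # bounded throughput gain
    fair_term = 1.0 - abs(1.0 - f)               # closer to 1 is fairer
    return eps * tput_term + (1.0 - eps) * fair_term
```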
Based on the policy π of the LAA small base station in the Q-learning algorithm, the Q value function is calculated recursively at each time t as follows:

Q^π(s_t, a_t) = r(s_t, a_t) + γ·Σ_{s_{t+1}} p(s_{t+1}|s_t, a_t)·V^π(s_{t+1})

Clearly, the Q value denotes the expected discounted reward obtained by following policy π and executing action a_t when the LAA small base station is in state s_t. The aim is therefore to evaluate the Q value under the optimal policy π*. From the above formula, the relationship between the state value function and the action value function is as follows:

V*(s_t) = max_{a_t} Q*(s_t, a_t)
However, in an uncertain environment the above Q value function holds only under the optimal policy; that is, the value of the Q value function learned by Q-learning varies (or does not converge) under non-optimal policies. The corrected calculation formula of the Q value function is therefore as follows:

Q_{t+1}(s_t, a_t) = (1 - α)·Q_t(s_t, a_t) + α·[r_t + γ·max_a Q_t(s_{t+1}, a)]

where α denotes the learning rate with 0 < α < 1: the larger the learning rate, the less of the previous training effect is retained. If each state-action pair can be visited repeatedly and the learning rate declines according to a suitable schedule, then for any finite MDP the Q-learning algorithm converges to the optimal policy. The learning rate and the discount factor jointly adjust the update of the Q matrix and thus influence the learning performance of the Q-learning algorithm; here α takes the value 0.5 and γ takes the value 0.8.
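The embodiment fixes α = 0.5, but the convergence remark above calls for a declining schedule; one standard choice satisfying the usual conditions (Σα = ∞, Σα² < ∞) is a per-pair visit-count schedule, sketched below as an assumption rather than part of the patent.

```python
# A standard declining learning-rate schedule (assumed, not from the
# patent): alpha = 1 / (1 + visits(s, a)) per state-action pair, which
# satisfies sum(alpha) = inf and sum(alpha^2) < inf for convergence.
from collections import defaultdict

visits: defaultdict[tuple[int, int], int] = defaultdict(int)

def alpha_for(s: int, a: int) -> float:
    visits[(s, a)] += 1
    return 1.0 / (1.0 + visits[(s, a)])
```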
Finally, it is noted that the above preferred embodiment is only used to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail through the above preferred embodiment, those skilled in the art should understand that various changes can be made to it in form and detail without departing from the scope defined by the claims of the present invention.

Claims (6)

1. A Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence, characterized in that it comprises the following steps:
S1: setting the state set and action set of the LAA small base station;
S2: at time t = 0, initializing the state and action Q-values of the LAA small base station to 0;
S3: calculating the state value of the initial state s_t of the LAA small base station;
S4: computing a Logistic chaotic map sequence according to the formula, then mapping the sequence into the action value set of the LAA small base station and randomly selecting an action a_t(i);
S5: after executing action a_t(i), the system obtains the environment reward value r_t according to the formula and then enters the next state s_{t+1};
S6: updating the action Q-value function of the LAA small base station according to the formula;
S7: letting t ← t + 1 and repeating steps S4 to S6 until the target state is reached.
2. The Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence according to claim 1, characterized in that: in step S1, the state set of the LAA small base station is expressed as the combination of system throughput and fairness, i.e., s_t = {R_t, F_t}, where R_t denotes the total system throughput obtained in the unlicensed band at time t, i.e., the sum of the LAA and WiFi user throughputs, and F_t denotes the fairness function over the per-user averages, defined as the Jain fairness index:

F_t = [R_t(s,l)/nl + R_t(s,w)/nw]² / (2·([R_t(s,l)/nl]² + [R_t(s,w)/nw]²))

where R_t(s,l) and R_t(s,w) denote the LAA and WiFi user throughputs, nl denotes the number of LAA small base stations, and nw denotes the number of WiFi users; according to the predefined throughput and fairness thresholds R_t° and F_t°, the LAA small base station is divided into four states: low throughput and low fairness, low throughput and high fairness, high throughput and low fairness, and high throughput and high fairness; for the action set, the contention window value is taken as the action of the LAA small base station, and according to the Markov process over the finite action set, any action of the LAA small base station at time t is defined to satisfy 16 ≤ a_t(i) ≤ 128.
3. The Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence according to claim 2, characterized in that: in step S2, the state and action Q-values of the LAA small base station are set as zero matrices; for the LAA small base station, the solution target of the Markov decision process is to find an optimal policy π* such that the value V(s) of every state s simultaneously reaches its maximum, the state value function being expressed as follows:

V^π(s_t) = r(s_t, a_t) + γ·Σ_{s_{t+1}} p(s_{t+1}|s_t, a_t)·V^π(s_{t+1})

where r(s_t, a_t) denotes the reward value the LAA small base station obtains from the environment, and p(s_{t+1}|s_t, a_t) denotes the probability that the LAA small base station transfers to state s_{t+1} after selecting action a_t while in state s_t.
4. The Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence according to claim 3, characterized in that: in step S4, the Logistic map in chaotic motion is used as an optimization mechanism to select the action a_t(i); the equation of the Logistic map system is:

z_{k+1} = μ·z_k·(1 - z_k)

where 0 ≤ μ ≤ 4 is called the bifurcation parameter, μ = 4 is taken here, k denotes the iteration number, z is called the chaos variable, and the chaotic domain is (0, 1).
5. The Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence according to claim 4, characterized in that: in step S5, the LAA small base station obtains a reward value from the environment after executing the selected action, the reward value function being defined in terms of a weight factor ε with 0 < ε < 1, the minimum required threshold R_t° of the coexistence system throughput, and the minimum required threshold F_t° of the coexistence system fairness function.
6. The Q-learning-based dynamic optimization method for the contention window value in LTE and WiFi coexistence according to claim 5, characterized in that: in step S6, the LAA small base station needs to update the Q matrix after obtaining the reward value from the environment, the update formula being:

Q_{t+1}(s_t, a_t) = (1 - α)·Q_t(s_t, a_t) + α·[r_t + γ·max_a Q_t(s_{t+1}, a)]

where α denotes the learning rate with 0 < α < 1, and γ denotes the discount factor with 0 ≤ γ < 1.
CN201810797200.2A 2018-07-19 2018-07-19 LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm Active CN108924944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810797200.2A CN108924944B (en) 2018-07-19 2018-07-19 LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810797200.2A CN108924944B (en) 2018-07-19 2018-07-19 LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm

Publications (2)

Publication Number Publication Date
CN108924944A true CN108924944A (en) 2018-11-30
CN108924944B CN108924944B (en) 2021-09-14

Family

ID=64414708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810797200.2A Active CN108924944B (en) 2018-07-19 2018-07-19 LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm

Country Status (1)

Country Link
CN (1) CN108924944B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766951A (en) * 2019-01-18 2019-05-17 重庆邮电大学 A kind of WiFi gesture identification based on time-frequency statistical property
CN109862567A (en) * 2019-03-28 2019-06-07 电子科技大学 A kind of method of cell mobile communication systems access unlicensed spectrum
CN110035559A (en) * 2019-04-25 2019-07-19 重庆邮电大学 A kind of contention window size intelligent selecting method based on chaos Q- learning algorithm
CN110336620A (en) * 2019-07-16 2019-10-15 沈阳理工大学 A kind of QL-UACW back-off method based on MAC layer fair exchange protocols
CN110933723A (en) * 2019-11-21 2020-03-27 普联技术有限公司 Roaming switching control method and device and wireless AP
CN111163531A (en) * 2019-12-16 2020-05-15 北京理工大学 Unauthorized spectrum duty ratio coexistence method based on DDPG
CN111342920A (en) * 2020-01-10 2020-06-26 重庆邮电大学 Channel selection method based on Q learning
CN113316156A (en) * 2021-05-26 2021-08-27 重庆邮电大学 Intelligent coexistence method on unlicensed frequency band
CN113946428A (en) * 2021-11-02 2022-01-18 Oppo广东移动通信有限公司 Processor dynamic control method, electronic equipment and storage medium
CN115134026A (en) * 2022-06-29 2022-09-30 重庆邮电大学 Intelligent unlicensed spectrum access method based on mean field

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105306176A (en) * 2015-11-13 2016-02-03 南京邮电大学 Realization method for Q learning based vehicle-mounted network media access control (MAC) protocol
CN106332094A (en) * 2016-09-19 2017-01-11 重庆邮电大学 Q algorithm-based dynamic duty ratio coexistence method for LTE-U and Wi-Fi systems in unauthorized frequency band
CN107094321A (en) * 2017-03-31 2017-08-25 南京邮电大学 A kind of vehicle-carrying communication MAC layer channel access method learnt based on multiple agent Q
CN107426772A (en) * 2017-07-04 2017-12-01 北京邮电大学 A kind of dynamic contention window method of adjustment, device and equipment based on Q study

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105306176A (en) * 2015-11-13 2016-02-03 南京邮电大学 Realization method for Q learning based vehicle-mounted network media access control (MAC) protocol
CN106332094A (en) * 2016-09-19 2017-01-11 重庆邮电大学 Q algorithm-based dynamic duty ratio coexistence method for LTE-U and Wi-Fi systems in unauthorized frequency band
CN107094321A (en) * 2017-03-31 2017-08-25 南京邮电大学 A kind of vehicle-carrying communication MAC layer channel access method learnt based on multiple agent Q
CN107426772A (en) * 2017-07-04 2017-12-01 北京邮电大学 A kind of dynamic contention window method of adjustment, device and equipment based on Q study

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CELIMUGE WU et al.: "A MAC Protocol for Delay-sensitive VANET Applications With Self-learning Contention Scheme", The 11th Annual IEEE CCNC - Smart Spaces and Wireless Networks *
ZHAO HAI-TAO et al.: "Research on Q-learning based Channel Access Control Algorithm for Internet of Vehicles", 2016 International Computer Symposium *
杜艾芊 et al.: "Research on Q-learning-based Channel Access Technology in Vehicular Communication", 《计算机技术与发展》 (Computer Technology and Development) *
罗颖 et al.: "Research on TCP Enhancement Algorithms Based on Ad Hoc Networks", 《通信与网络》 (Communications and Network) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766951A (en) * 2019-01-18 2019-05-17 重庆邮电大学 A kind of WiFi gesture identification based on time-frequency statistical property
CN109862567A (en) * 2019-03-28 2019-06-07 电子科技大学 A kind of method of cell mobile communication systems access unlicensed spectrum
CN109862567B (en) * 2019-03-28 2019-12-27 电子科技大学 Method for accessing non-authorized frequency spectrum of cellular mobile communication system
CN110035559A (en) * 2019-04-25 2019-07-19 重庆邮电大学 A kind of contention window size intelligent selecting method based on chaos Q- learning algorithm
CN110035559B (en) * 2019-04-25 2023-03-10 重庆邮电大学 Intelligent competition window size selection method based on chaotic Q-learning algorithm
CN110336620B (en) * 2019-07-16 2021-05-07 沈阳理工大学 QL-UACW backoff method based on MAC layer fair access
CN110336620A (en) * 2019-07-16 2019-10-15 沈阳理工大学 A kind of QL-UACW back-off method based on MAC layer fair exchange protocols
CN110933723A (en) * 2019-11-21 2020-03-27 普联技术有限公司 Roaming switching control method and device and wireless AP
CN110933723B (en) * 2019-11-21 2022-01-04 普联技术有限公司 Roaming switching control method and device and wireless AP
CN111163531A (en) * 2019-12-16 2020-05-15 北京理工大学 Unauthorized spectrum duty ratio coexistence method based on DDPG
CN111163531B (en) * 2019-12-16 2021-07-13 北京理工大学 Unauthorized spectrum duty ratio coexistence method based on DDPG
CN111342920A (en) * 2020-01-10 2020-06-26 重庆邮电大学 Channel selection method based on Q learning
CN111342920B (en) * 2020-01-10 2021-11-02 重庆邮电大学 Channel selection method based on Q learning
CN113316156A (en) * 2021-05-26 2021-08-27 重庆邮电大学 Intelligent coexistence method on unlicensed frequency band
CN113946428A (en) * 2021-11-02 2022-01-18 Oppo广东移动通信有限公司 Processor dynamic control method, electronic equipment and storage medium
CN113946428B (en) * 2021-11-02 2024-06-07 Oppo广东移动通信有限公司 Processor dynamic control method, electronic equipment and storage medium
CN115134026A (en) * 2022-06-29 2022-09-30 重庆邮电大学 Intelligent unlicensed spectrum access method based on mean field
CN115134026B (en) * 2022-06-29 2024-01-02 绍兴市上虞区舜兴电力有限公司 Intelligent unlicensed spectrum access method based on average field

Also Published As

Publication number Publication date
CN108924944B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN108924944A (en) The dynamic optimization method of contention window value coexists in LTE and WiFi based on Q-learning algorithm
Nassar et al. Reinforcement learning for adaptive resource allocation in fog RAN for IoT with heterogeneous latency requirements
Wydmański et al. Contention window optimization in IEEE 802.11ax networks with deep reinforcement learning
Guo et al. Multi-agent reinforcement learning-based distributed channel access for next generation wireless networks
Kaur et al. Energy-efficient resource allocation in cognitive radio networks under cooperative multi-agent model-free reinforcement learning schemes
CN110035559B (en) Intelligent competition window size selection method based on chaotic Q-learning algorithm
CN105359605B (en) The system and method for the primary resource block allocation plan based on greedy algorithm of cellular network with self-organizing link terminal
CN110519849B (en) Communication and computing resource joint allocation method for mobile edge computing
CN105873214A (en) Resource allocation method of D2D communication system based on genetic algorithm
Balakrishnan et al. Deep reinforcement learning based traffic-and channel-aware OFDMA resource allocation
Sohaib et al. Dynamic multichannel access via multi-agent reinforcement learning: Throughput and fairness guarantees
Barrachina-Muñoz et al. Multi-armed bandits for spectrum allocation in multi-agent channel bonding WLANs
Casasole et al. Qcell: Self-optimization of softwarized 5g networks through deep q-learning
Li et al. A distributed ADMM approach with decomposition-coordination for mobile data offloading
Perlaza et al. On the base station selection and base station sharing in self-configuring networks
Iturria-Rivera et al. Cooperate or not Cooperate: Transfer Learning with Multi-Armed Bandit for Spatial Reuse in Wi-Fi
Anderson et al. Optimization decomposition for scheduling and system configuration in wireless networks
Rivero-Angeles et al. Differentiated backoff strategies for prioritized random access delay in multiservice cellular networks
Zhou et al. Context-aware learning-based resource allocation for ubiquitous power IoT
Zou et al. Resource multi-objective mapping algorithm based on virtualized network functions: RMMA
CN107995034B (en) Energy and service cooperation method for dense cellular network
CN110035539A (en) One kind being based on the matched resource optimal distribution method of correlated equilibrium regret value and device
Kosek-Szott et al. Improving IEEE 802.11ax UORA performance: Comparison of reinforcement learning and heuristic approaches
Hosey et al. Q-learning for cognitive radios
Bikov et al. Smart concurrent learning scheme for 5G network: QoS-aware radio resource allocation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant