CN108924944A - Dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm - Google Patents
Dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm Download PDF Info
- Publication number
- CN108924944A · CN201810797200.2A · CN201810797200A
- Authority
- CN
- China
- Prior art keywords
- value
- base station
- laa
- small base
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W74/00—Wireless channel access
- H04W74/08—Non-scheduled access, e.g. ALOHA
- H04W74/0808—Non-scheduled access, e.g. ALOHA using carrier sensing, e.g. carrier sense multiple access [CSMA]
Abstract
The present invention relates to a dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm, belonging to the field of communication technology, and including the steps: 1. set the state set and action set of the LAA small base station; 2. initialize the state and action Q values of the LAA small base station; 3. calculate the initial state value of the LAA small base station; 4. calculate the Logistic chaotic map sequence according to the formula, map it into the action-value set of the LAA small base station, and randomly select an action at(i); 5. after executing action at(i), obtain the environment reward value rt and enter the next state st+1; 6. update the action Q-value function of the LAA small base station; 7. let t ← t+1 and repeat steps 4–6 until the target is reached. Under the condition of guaranteeing user fairness, the present invention can improve the spectrum efficiency of the channel while expanding the system capacity of next-generation communication systems, providing users with better quality of service and improving user experience.
Description
Technical field
The invention belongs to the field of communication technology and relates to a dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm.
Background art
With its ubiquity and convenient access, wireless mobile communication plays an increasingly important role in future information and communication systems. With the rapid development of the mobile Internet and Internet-of-Things services, mobile data traffic has surged, leading to a shortage of licensed spectrum; operators therefore hope to exploit unlicensed spectrum to supplement the licensed spectrum. LTE-U (LTE-Unlicensed), proposed by 3GPP and also referred to as Licensed Assisted Access (LAA), aims to apply LTE technology to unlicensed bands (e.g. near 5 GHz) while maintaining the original LTE protocol specification as far as possible: small base stations are deployed in the unlicensed band, and LTE is allowed to operate cooperatively across the unlicensed and licensed bands through carrier aggregation, so as to increase cellular system capacity and improve the spectrum efficiency of the unlicensed band.
At present there are mainly two coexistence schemes for LTE and WiFi in the unlicensed band: duty-cycle muting (Duty Cycle Muting, DCM) and LAA. DCM was the first version of LTE-U, initially proposed by Ericsson and Qualcomm in 2013. This scheme shares the unlicensed spectrum with WiFi by keeping LTE periodically silent for a period of time; it does not require Listen Before Talk (LBT), and it is easy to deploy because no modification of the LTE protocol is needed, but at present it is used only in China, India, South Korea and the United States. The LAA scheme for LTE was first proposed at the Sophia Antipolis meeting in France in June 2014. This scheme seeks a long-term global solution; its key feature is that LTE must assess the channel condition before accessing the unlicensed spectrum, i.e. the Clear Channel Assessment (CCA) process of the LBT mechanism. This mechanism therefore requires modifications to the LTE protocol stack and support from equipment vendors. Telecommunication organizations such as 3GPP and ETSI are actively formulating standards for the LBT coexistence mechanism. We study the LBT-based coexistence mechanism between LTE and WiFi networks, i.e. the LAA mechanism. Owing to concerns about the performance of LBT-based LAA, some researchers have evaluated the performance of this coexistence mechanism. Studies have found that the contention window value of the LBT mechanism has a great influence on the performance of the coexistence system: a good backoff mechanism can generate a reasonable contention window value according to the actual load in the network, thereby improving the spectrum efficiency of the channel and giving users a better experience.
At present, existing backoff mechanisms, such as binary exponential backoff and fixed-contention-window backoff, lack a dynamic learning process and cannot flexibly adjust system parameters according to the real-time scenario, which objectively limits the improvement of the spectrum efficiency of the coexistence system's channel.
Therefore, designing a good backoff mechanism that can generate a reasonable contention window value according to the real-time network load, service type, and so on will help improve the spectrum efficiency of the channel while expanding the system capacity of next-generation communication systems, providing users with better quality of service and improving user experience.
Summary of the invention
In view of this, the purpose of the present invention is to provide a dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm. Through the Q-learning algorithm, the LAA small base station can flexibly adjust the contention window value of the LBT mechanism under which it coexists with the WiFi system according to factors such as the real-time network load and service type, maximizing the overall system throughput and improving the spectrum efficiency of the coexistence system under the condition of guaranteeing fairness between LTE and WiFi users, thereby improving user experience. The method is concise and efficient, and at the same time has good portability.
In order to achieve the above objectives, the present invention provides the following technical solutions:
The dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm includes the following steps:
S1: set the state set and action set of the LAA small base station;
S2: at time t=0, initialize the state and action Q values of the LAA small base station to 0;
S3: calculate the state value of the initial state st of the LAA small base station;
S4: calculate the Logistic chaotic map sequence according to the formula, then map the sequence into the action-value set of the LAA small base station and randomly select an action at(i);
S5: after executing action at(i), the system obtains the environment reward value rt according to the formula and then enters the next state st+1;
S6: update the action Q-value function of the LAA small base station according to the formula;
S7: let t ← t+1 and repeat steps S4–S6 until the target state is reached.
Further, in step S1, the state set of the LAA small base station is expressed as the combination of system throughput and fairness, i.e. st={Rt,Ft}, where Rt denotes the total system throughput obtained in the unlicensed band at time t, i.e. the sum of LAA and WiFi user throughput, and Ft denotes the fairness function on the average throughputs, where Rt(s,l) and Rt(s,w) denote the LAA and WiFi user throughput, nl denotes the number of LAA small base stations, and nw denotes the number of WiFi users. According to the predefined throughput and fairness thresholds, the LAA small base station is divided into four states: low throughput with low fairness, low throughput with high fairness, high throughput with low fairness, and high throughput with high fairness. For the action set, the contention window value is taken as the action of the LAA small base station, and according to the Markov process with a finite action set, the action of the LAA small base station at any time t satisfies 16 ≤ at(i) ≤ 128.
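As an illustration, the four-state classification above can be computed from measured throughputs. The sketch below is hedged: the patent's exact fairness formula appears only as an image in the source, so Jain's fairness index over the per-user average throughputs is assumed here, and the threshold values are hypothetical.

```python
def jain_fairness(avg_laa: float, avg_wifi: float) -> float:
    """Assumed form of the fairness function Ft: Jain's index over the
    per-user average LAA and WiFi throughputs; equals 1 when perfectly fair."""
    den = 2.0 * (avg_laa ** 2 + avg_wifi ** 2)
    return (avg_laa + avg_wifi) ** 2 / den if den > 0 else 1.0

def classify_state(R_t: float, F_t: float, R_th: float, F_th: float) -> str:
    """Map (throughput Rt, fairness Ft) to one of the four predefined states
    using hypothetical thresholds R_th and F_th."""
    key = (R_t >= R_th, F_t >= F_th)
    return {(False, False): "low-throughput/low-fairness",
            (False, True):  "low-throughput/high-fairness",
            (True,  False): "high-throughput/low-fairness",
            (True,  True):  "high-throughput/high-fairness"}[key]
```

The closer the index is to 1, the fairer the split, matching the description of Ft later in the text.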
Further, in step S2, the state and action Q values of the LAA small base station are set to a zero matrix. For the LAA small base station, the solution target of the Markov decision process is to find an optimal policy π* such that the value V(s) of every state s simultaneously reaches its maximum; the state-value function is expressed as

Vπ(st) = r(st,at) + γ·Σ p(st+1|st,at)·Vπ(st+1)

where γ is the discount factor, r(st,at) denotes the reward value the LAA small base station obtains from the environment, and p(st+1|st,at) denotes the probability that the LAA small base station transfers to state st+1 after selecting action at in state st.
Further, in step S4, the target of the LAA small base station is to obtain a higher reward value, so chaotic motion, which is ergodic, regular and random in nature, is introduced as an optimization mechanism.
Three mapping systems are common in chaotic systems: the Logistic map, the Chebyshev map and the Henon map. The equation of the Logistic map is expressed as
zk+1=μzk(1−zk)
where 0 ≤ μ ≤ 4 is called the bifurcation parameter; when μ ∈ [3.5699456…, 4] the Logistic map works in a chaotic state, and μ = 4 is taken here; k denotes the iteration number, z is called the chaos variable, and the chaotic domain is (0,1).
Further, in step S5, after executing the selected action, the LAA small base station obtains a reward value from the environment. The reward value function is defined in terms of a weight factor ε with 0 < ε < 1, the minimum required threshold of the coexistence system throughput, and the minimum required threshold Ft° of the coexistence system fairness function.
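The reward formula itself is given only as an image in the source; purely as a hedged sketch, one bounded reward consistent with the description (weight factor ε trading off throughput against fairness relative to the two minimum thresholds) could look like:

```python
def reward(R_t: float, F_t: float, R_min: float, F_min: float,
           eps: float = 0.3) -> float:
    """Hypothetical bounded reward rt: +1/-1 components for meeting or missing
    the minimum throughput and fairness thresholds, weighted by eps; a smaller
    eps favors the fairness component, matching the text's description of ε."""
    r_thr = 1.0 if R_t >= R_min else -1.0
    r_fair = 1.0 if F_t >= F_min else -1.0
    return eps * r_thr + (1.0 - eps) * r_fair
```

Because rt stays within [−1, 1] it is a bounded function, which is what the Watkins convergence condition cited later requires.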
Further, in step S6, after obtaining the reward value from the environment, the LAA small base station needs to update the Q matrix; the update formula is

Qt+1(st,at) = (1−α)·Qt(st,at) + α·[rt + γ·max over a′ of Qt(st+1,a′)]

where α denotes the learning rate with 0 < α < 1 and γ denotes the discount factor with 0 ≤ γ < 1.
The beneficial effects of the present invention are: the contention window value of the LBT-based coexistence mechanism of LTE and WiFi in the unlicensed band is dynamically optimized through the Q-learning algorithm. Compared with traditional backoff algorithms, the Q-learning-based method of the present invention can dynamically optimize the contention window value under which LTE and WiFi coexist in the unlicensed band, and the LAA small base station can flexibly adjust the contention window value according to the real-time scenario of the network. As shown in Fig. 2, the LAA small base station, in some state, first selects and executes an action according to the current environment via the chaos-based Logistic map; it then observes the environment to obtain the reward value, updates the Q-value function according to the formula, and determines the action of the next state based on the current Q-value function, repeating the above steps until convergence. Under the condition of guaranteeing user fairness, the present invention can improve the spectrum efficiency of the channel while expanding the system capacity of next-generation communication systems, providing users with better quality of service and improving user experience.
Detailed description of the invention
In order to make the purpose, technical solution and beneficial effects of the present invention clearer, the present invention provides the following drawings for explanation:
Fig. 1 is a schematic flowchart of the dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm according to an embodiment of the present invention;
Fig. 2 is the interaction process model between Q-learning and the environment according to an embodiment of the present invention;
Fig. 3 is the network model diagram of LTE and WiFi coexistence according to an embodiment of the present invention.
Specific embodiment
A preferred embodiment of the present invention will be described in detail below in conjunction with the drawings.
Aiming at the LBT-based coexistence problem of LTE and WiFi in the unlicensed band (5 GHz), the present invention proposes a dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm. Compared with traditional backoff algorithms, the Q-learning algorithm in the present invention can dynamically optimize the contention window value under which LTE and WiFi coexist in the unlicensed band, and the LAA small base station can flexibly adjust the contention window value according to the real-time scenario of the network. As shown in Fig. 2, the LAA small base station, in some state, first selects and executes an action according to the current environment via the chaos-based Logistic map; it then observes the environment to obtain the reward value, updates the Q-value function according to the formula, determines the action of the next state based on the current Q-value function, and repeats the above steps until convergence.
Consider a coexistence scenario with multiple LAA small base stations and multiple WiFi access points (APs); the network model is shown in Fig. 3. Since an LAA small base station can operate on multiple unlicensed bands, and the main concern is the coexistence performance of LAA, the scenario under consideration can be simplified into a simpler one: on one specific unlicensed channel there are multiple LAA small base stations and one WiFi AP. Assume that the coexistence scenario under consideration contains nl LAA small base stations and one WiFi AP with nw users, where the network access of the WiFi users follows the IEEE 802.11 standard.
As shown in Fig. 1, the method for dynamically optimizing the contention window value of the LBT-based coexistence mechanism of LTE and WiFi in the unlicensed band includes the following steps:
100: set the state set and action set of the LAA small base station;
200: at time t=0, initialize the state and action Q values of the LAA small base station to 0;
300: calculate the state value of the initial state st of the LAA small base station;
400: calculate the Logistic chaotic map sequence according to the formula, then map the sequence into the action-value set of the LAA small base station and randomly select an action at(i);
500: after executing action at(i), the system obtains the environment reward value rt according to the formula and then enters the next state st+1;
600: update the action Q-value function of the LAA small base station according to the formula;
700: let t ← t+1 and repeat steps 400–600 until the target state is reached.
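The numbered steps above can be sketched end to end as follows. This is a minimal sketch, not the patented implementation: the environment is a stub (a real LAA small base station would measure throughput and fairness after contending with the chosen window), and the four-value action set {16, 32, 64, 128} is an assumed discretization of the 16–128 slot range.

```python
import random

CW_ACTIONS = [16, 32, 64, 128]   # assumed discretization of 16 <= at(i) <= 128
N_STATES = 4                     # the four throughput/fairness states (step 100)
ALPHA, GAMMA, MU = 0.5, 0.8, 4.0

Q = [[0.0] * len(CW_ACTIONS) for _ in range(N_STATES)]  # step 200: zero Q matrix
z = 0.37                         # logistic chaos variable, arbitrary value in (0, 1)

def env_step(state: int, cw: int):
    """Stub environment: returns (reward, next_state)."""
    return (1.0 if cw >= 32 else -1.0), random.randrange(N_STATES)

state = 0                        # step 300: initial state
for t in range(100):             # steps 400-700
    z = MU * z * (1 - z)                                   # step 400: logistic map
    a = min(int(z * len(CW_ACTIONS)), len(CW_ACTIONS) - 1) # map z to an action index
    r, nxt = env_step(state, CW_ACTIONS[a])                # step 500: reward, next state
    Q[state][a] = (1 - ALPHA) * Q[state][a] + ALPHA * (r + GAMMA * max(Q[nxt]))  # step 600
    state = nxt                                            # step 700
```

After the loop, Q holds the learned state–action values from which the contention window for each state is read off.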
The Q-learning algorithm is a reinforcement learning method that determines an optimal decision policy algorithmically and can be regarded as a form of asynchronous dynamic programming. During Q-learning iteration, the state set is defined as S; if the decision time is t, then st ∈ S denotes the state of the LAA small base station at time t. Likewise, the finite set of actions that the LAA small base station may execute is defined as A, and at ∈ A denotes its action at time t. The reward function r(st,at) denotes the reward value the LAA small base station obtains from the environment after executing action at in state st; the station then transfers from state st to st+1, and the Qt function is updated at the next decision time t+1. The Q-learning algorithm is in fact a solution method for Markov Decision Processes (MDPs).
In the coexistence network, the LAA small base station users coexist harmoniously with the WiFi users in the unlicensed band. Based on the working principle of the Q-learning algorithm, the state set is expressed as follows:
st={Rt,Ft}
where Rt denotes the total system throughput obtained in the unlicensed band at time t, i.e. Rt=Rt(s,l)+Rt(s,w), and Ft denotes the fairness function on the average throughputs, where Rt(s,l) (Rt(s,w)) denotes the LAA (WiFi) user throughput; the closer the value of Ft is to 1, the fairer the system. According to the predefined throughput and fairness thresholds, the LAA small base station is divided into four states: low throughput with low fairness, low throughput with high fairness, high throughput with low fairness, and high throughput with high fairness; the elements of the state set S are enumerated accordingly.
Taking the contention window value as the action, the action set of the LAA small base station is A={a(1),a(2),…,a(k)}, whose unit is the number of time slots. According to the Markov process with a finite action set, the action of the LAA small base station at any time t is defined to satisfy 16 ≤ at(i) ≤ 128.
The task facing the LAA small base station is to determine an optimal policy that maximizes the reward obtained. The LAA small base station can observe the environment according to its current state and make the best decision for the next state/action. The accumulated discounted reward value function of state st can be expressed as

Vπ(st) = r(st,at) + γ·Σ p(st+1|st,at)·Vπ(st+1)

where r(st,at) denotes the instant reward obtained when the LAA small base station selects action at in state st, γ denotes the discount factor with 0 ≤ γ < 1 (a discount factor close to 0 means the LAA small base station mainly considers immediate rewards), and p(st+1|st,at) denotes the probability that the LAA small base station transfers from state st to st+1 when selecting action at. The target of solving the MDP is to find an optimal policy π* such that the value V(s) of every state s simultaneously reaches its maximum. According to Bellman's principle of optimality, at least one optimal policy π* can be obtained when the total expected discounted reward of the LAA small base station is maximal, so that

V*(st) = max over at of [r(st,at) + γ·Σ p(st+1|st,at)·V*(st+1)]

where V*(st) denotes the maximum accumulated discounted reward obtained by the LAA small base station starting from state st and following the optimal policy π*. A given policy π is a function mapping the state space to the action space, i.e. π: st→at. Therefore the optimal policy can be expressed in the following form:
π*(st) = arg max over at of Q*(st,at)
The target of the LAA small base station is to obtain a higher reward value; therefore, in each state it will select the action with the higher Q value. However, in the initial stage of learning, state–action experience is scarce and the Q values cannot accurately represent the correct reinforcement values; in general, always taking the action with the highest Q value causes the LAA small base station to move along the same path and fail to explore other, better values, so it easily falls into a local optimum. To overcome this shortcoming, the LAA small base station must select actions with some randomness; chaotic motion, which is ergodic, regular and random in nature, is therefore introduced as an optimization mechanism to reduce the possibility that the action selection strategy of the LAA small base station falls into a locally optimal solution.
Common chaotic systems include the Logistic map, the Chebyshev map and the Henon map. For the Logistic mapping system, the equation is expressed as
zk+1=μzk(1−zk)
where 0 ≤ μ ≤ 4 is called the bifurcation parameter, k denotes the iteration number, z is called the chaos variable, and the chaotic domain is (0,1). When μ ∈ [3.5699456…, 4], the Logistic map works in a chaotic state; that is, the sequence generated under the Logistic map is aperiodic and non-convergent. The chaotic motion exhibited by a chaotic system appears to be randomly complex, but in fact it has inherent regularity.
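A short sketch of the map's behavior at μ = 4 follows; the initial value 0.37 is an arbitrary choice inside the chaotic domain, not a value from the patent.

```python
def logistic_sequence(z0: float, mu: float = 4.0, n: int = 20) -> list:
    """Iterate the logistic map z_{k+1} = mu * z_k * (1 - z_k) n times.
    For mu = 4 and a generic z0 in (0, 1), the iterates are aperiodic and
    non-convergent while remaining inside the chaotic domain."""
    zs, z = [], z0
    for _ in range(n):
        z = mu * z * (1 - z)
        zs.append(z)
    return zs
```

Mapping each iterate z into an index int(z * k) of the k-element action set then gives the chaos-driven random action selection used in step S4.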
Based on the reward value function, the LAA small base station iterates its selection strategy with high throughput and high fairness as the target. In the reward value function that the LAA small base station obtains from the environment, ε denotes a weight factor with 0 < ε < 1, where a smaller ε indicates that the Q-learning process favors the reward obtained from the fairness factor; the definition also involves the minimum required threshold of the coexistence system throughput and Ft°, the minimum required threshold of the coexistence system fairness function. It can be seen from the reward value function expression that rt is a bounded function, so by Watkins' convergence conditions the Q-learning process converges. Taking into account the throughput performance of the whole network and the network fairness factor, the reward value function makes the fairness function value as close to 1 as possible under the condition that the system throughput is higher than the minimum throughput threshold.
In the Q-learning algorithm, based on the policy π of the LAA small base station, the Q-value function is computed recursively at each time t:

Qπ(st,at) = r(st,at) + γ·Σ p(st+1|st,at)·Vπ(st+1)

Clearly, the Q value denotes the expected discounted reward obtained when the LAA small base station, in state st, follows policy π and executes action at. The aim is therefore to evaluate the Q value under the optimal policy π*. From the above formula, the relationship between the state-value function and the action-value function is as follows:

V*(st) = max over at of Q*(st,at)

However, in an uncertain environment the above Q-value function holds only under the optimal policy; under a non-optimal policy, the value learned by Q-learning varies (or does not converge). The corrected Q-value function is therefore calculated as follows:

Qt+1(st,at) = (1−α)·Qt(st,at) + α·[rt + γ·max over a′ of Qt(st+1,a′)]

where α denotes the learning rate with 0 < α < 1; a larger learning rate means less of the earlier training effect is retained. If every state–action pair can be visited repeatedly and the learning rate decreases according to a suitable schedule, the Q-learning algorithm converges to the optimal policy for any finite MDP. The learning rate and the discount factor jointly regulate the update of the Q matrix and thus affect the learning performance of the Q-learning algorithm; here α is set to 0.5 and γ to 0.8.
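The corrected update rule can be exercised directly. As a sketch (with α = 0.5 and γ = 0.8 as chosen in the text, and a hand-picked transition rather than real channel measurements), repeating the same transition shows Q(s,a) contracting toward its fixed point r + γ·max Q(s′), the behavior the Watkins condition guarantees:

```python
ALPHA, GAMMA = 0.5, 0.8   # learning rate and discount factor from the text

def q_update(Q, s, a, r, s_next):
    """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * max(Q[s_next]))

# Repeating one transition (s=0, a=0, r=1, s'=1) drives Q[0][0] toward the
# fixed point 1 + 0.8 * max(Q[1]) = 1.0, since row Q[1] is never updated.
Q = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(50):
    q_update(Q, 0, 0, 1.0, 1)
```

With α = 0.5 the error halves on every repetition, so after 50 updates Q[0][0] sits at the fixed point to within floating-point precision.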
Finally, it is stated that the above preferred embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail through the above preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made to it without departing from the scope defined by the claims of the present invention.
Claims (6)
1. A dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm, characterized by including the following steps:
S1: set the state set and action set of the LAA small base station;
S2: at time t=0, initialize the state and action Q values of the LAA small base station to 0;
S3: calculate the state value of the initial state st of the LAA small base station;
S4: calculate the Logistic chaotic map sequence according to the formula, then map the sequence into the action-value set of the LAA small base station and randomly select an action at(i);
S5: after executing action at(i), the system obtains the environment reward value rt according to the formula and then enters the next state st+1;
S6: update the action Q-value function of the LAA small base station according to the formula;
S7: let t ← t+1 and repeat steps S4–S6 until the target state is reached.
2. The dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm according to claim 1, characterized in that: in step S1, the state set of the LAA small base station is expressed as the combination of system throughput and fairness, i.e. st={Rt,Ft}, where Rt denotes the total system throughput obtained in the unlicensed band at time t, i.e. the sum of LAA and WiFi user throughput, and Ft denotes the fairness function on the average throughputs, where Rt(s,l) and Rt(s,w) denote the LAA and WiFi user throughput, nl denotes the number of LAA small base stations, and nw denotes the number of WiFi users; according to the predefined throughput and fairness thresholds, the LAA small base station is divided into four states: low throughput with low fairness, low throughput with high fairness, high throughput with low fairness, and high throughput with high fairness; for the action set, the contention window value is taken as the action of the LAA small base station, and according to the Markov process with a finite action set, the action of the LAA small base station at any time t satisfies 16 ≤ at(i) ≤ 128.
3. The dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm according to claim 2, characterized in that: in step S2, the state and action Q values of the LAA small base station are set to a zero matrix; for the LAA small base station, the solution target of the Markov decision process is to find an optimal policy π* such that the value V(s) of every state s simultaneously reaches its maximum, the state-value function being expressed as

Vπ(st) = r(st,at) + γ·Σ p(st+1|st,at)·Vπ(st+1)

where γ is the discount factor, r(st,at) denotes the reward value the LAA small base station obtains from the environment, and p(st+1|st,at) denotes the probability that the LAA small base station transfers to state st+1 after selecting action at in state st.
4. The dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm according to claim 3, characterized in that: in step S4, the Logistic map in chaotic motion is used as an optimization mechanism to select the action at(i); the equation of the Logistic mapping system is
zk+1=μzk(1−zk)
where 0 ≤ μ ≤ 4 is called the bifurcation parameter and μ=4 is taken here, k denotes the iteration number, z is called the chaos variable, and the chaotic domain is (0,1).
5. The dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm according to claim 4, characterized in that: in step S5, after executing the selected action, the LAA small base station obtains a reward value from the environment; the reward value function is defined in terms of a weight factor ε with 0 < ε < 1, the minimum required threshold of the coexistence system throughput, and the minimum required threshold Ft° of the coexistence system fairness function.
6. The dynamic optimization method for the contention window value in LTE and WiFi coexistence based on a Q-learning algorithm according to claim 5, characterized in that: in step S6, after obtaining the reward value from the environment, the LAA small base station needs to update the Q matrix; the update formula is

Qt+1(st,at) = (1−α)·Qt(st,at) + α·[rt + γ·max over a′ of Qt(st+1,a′)]

where α denotes the learning rate with 0 < α < 1 and γ denotes the discount factor with 0 ≤ γ < 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810797200.2A CN108924944B (en) | 2018-07-19 | 2018-07-19 | LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810797200.2A CN108924944B (en) | 2018-07-19 | 2018-07-19 | LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108924944A true CN108924944A (en) | 2018-11-30 |
CN108924944B CN108924944B (en) | 2021-09-14 |
Family
ID=64414708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810797200.2A Active CN108924944B (en) | 2018-07-19 | 2018-07-19 | LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108924944B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105306176A (en) * | 2015-11-13 | 2016-02-03 | 南京邮电大学 | Realization method for Q learning based vehicle-mounted network media access control (MAC) protocol |
CN106332094A (en) * | 2016-09-19 | 2017-01-11 | 重庆邮电大学 | Q algorithm-based dynamic duty ratio coexistence method for LTE-U and Wi-Fi systems in unauthorized frequency band |
CN107094321A (en) * | 2017-03-31 | 2017-08-25 | 南京邮电大学 | A kind of vehicle-carrying communication MAC layer channel access method learnt based on multiple agent Q |
CN107426772A (en) * | 2017-07-04 | 2017-12-01 | 北京邮电大学 | A kind of dynamic contention window method of adjustment, device and equipment based on Q study |
Non-Patent Citations (4)
Title |
---|
CELIMUGE WU,ETC: "A MAC Protocol for Delay-sensitive VANET Applications With Self-learning Contention Scheme", 《THE 11TH ANNUAL IEEE CCNC - SMART SPACES AND WIRELESS NETWORKS》 * |
ZHAO HAI-TAO,ETC: "Research on Q-learning based Channel Access Control Algorithm for Internet of Vehicles", 《2016 INTERNATIONAL COMPUTER SYMPOSIUM》 * |
DU AIQIAN ET AL.: "Research on Q-learning-based Channel Access Technology in Vehicular Communication", 《COMPUTER TECHNOLOGY AND DEVELOPMENT》 *
LUO YING ET AL.: "Research on TCP Enhancement Algorithms Based on Ad Hoc Networks", 《COMMUNICATION & NETWORK》 *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109766951A (en) * | 2019-01-18 | 2019-05-17 | 重庆邮电大学 | WiFi gesture recognition method based on time-frequency statistical properties
CN109862567A (en) * | 2019-03-28 | 2019-06-07 | 电子科技大学 | Method for a cellular mobile communication system to access unlicensed spectrum
CN109862567B (en) * | 2019-03-28 | 2019-12-27 | 电子科技大学 | Method for accessing non-authorized frequency spectrum of cellular mobile communication system |
CN110035559A (en) * | 2019-04-25 | 2019-07-19 | 重庆邮电大学 | Intelligent contention window size selection method based on chaotic Q-learning algorithm
CN110035559B (en) * | 2019-04-25 | 2023-03-10 | 重庆邮电大学 | Intelligent competition window size selection method based on chaotic Q-learning algorithm |
CN110336620B (en) * | 2019-07-16 | 2021-05-07 | 沈阳理工大学 | QL-UACW backoff method based on MAC layer fair access |
CN110336620A (en) * | 2019-07-16 | 2019-10-15 | 沈阳理工大学 | QL-UACW backoff method based on MAC-layer fair access
CN110933723A (en) * | 2019-11-21 | 2020-03-27 | 普联技术有限公司 | Roaming switching control method and device and wireless AP |
CN110933723B (en) * | 2019-11-21 | 2022-01-04 | 普联技术有限公司 | Roaming switching control method and device and wireless AP |
CN111163531A (en) * | 2019-12-16 | 2020-05-15 | 北京理工大学 | Unauthorized spectrum duty ratio coexistence method based on DDPG |
CN111163531B (en) * | 2019-12-16 | 2021-07-13 | 北京理工大学 | Unauthorized spectrum duty ratio coexistence method based on DDPG |
CN111342920A (en) * | 2020-01-10 | 2020-06-26 | 重庆邮电大学 | Channel selection method based on Q learning |
CN111342920B (en) * | 2020-01-10 | 2021-11-02 | 重庆邮电大学 | Channel selection method based on Q learning |
CN113316156A (en) * | 2021-05-26 | 2021-08-27 | 重庆邮电大学 | Intelligent coexistence method on unlicensed frequency band |
CN113946428A (en) * | 2021-11-02 | 2022-01-18 | Oppo广东移动通信有限公司 | Processor dynamic control method, electronic equipment and storage medium |
CN113946428B (en) * | 2021-11-02 | 2024-06-07 | Oppo广东移动通信有限公司 | Processor dynamic control method, electronic equipment and storage medium |
CN115134026A (en) * | 2022-06-29 | 2022-09-30 | 重庆邮电大学 | Intelligent unlicensed spectrum access method based on mean field |
CN115134026B (en) * | 2022-06-29 | 2024-01-02 | 绍兴市上虞区舜兴电力有限公司 | Intelligent unlicensed spectrum access method based on average field |
Also Published As
Publication number | Publication date |
---|---|
CN108924944B (en) | 2021-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108924944A (en) | The dynamic optimization method of contention window value coexists in LTE and WiFi based on Q-learning algorithm | |
Nassar et al. | Reinforcement learning for adaptive resource allocation in fog RAN for IoT with heterogeneous latency requirements | |
Wydmański et al. | Contention window optimization in IEEE 802.11 ax networks with deep reinforcement learning | |
Guo et al. | Multi-agent reinforcement learning-based distributed channel access for next generation wireless networks | |
Kaur et al. | Energy-efficient resource allocation in cognitive radio networks under cooperative multi-agent model-free reinforcement learning schemes | |
CN110035559B (en) | Intelligent competition window size selection method based on chaotic Q-learning algorithm | |
CN105359605B (en) | The system and method for the primary resource block allocation plan based on greedy algorithm of cellular network with self-organizing link terminal | |
CN110519849B (en) | Communication and computing resource joint allocation method for mobile edge computing | |
CN105873214A (en) | Resource allocation method of D2D communication system based on genetic algorithm | |
Balakrishnan et al. | Deep reinforcement learning based traffic-and channel-aware OFDMA resource allocation | |
Sohaib et al. | Dynamic multichannel access via multi-agent reinforcement learning: Throughput and fairness guarantees | |
Barrachina-Muñoz et al. | Multi-armed bandits for spectrum allocation in multi-agent channel bonding WLANs | |
Casasole et al. | Qcell: Self-optimization of softwarized 5g networks through deep q-learning | |
Li et al. | A distributed ADMM approach with decomposition-coordination for mobile data offloading | |
Perlaza et al. | On the base station selection and base station sharing in self-configuring networks | |
Iturria-Rivera et al. | Cooperate or not Cooperate: Transfer Learning with Multi-Armed Bandit for Spatial Reuse in Wi-Fi | |
Anderson et al. | Optimization decomposition for scheduling and system configuration in wireless networks | |
Rivero-Angeles et al. | Differentiated backoff strategies for prioritized random access delay in multiservice cellular networks | |
Zhou et al. | Context-aware learning-based resource allocation for ubiquitous power IoT | |
Zou et al. | Resource multi-objective mapping algorithm based on virtualized network functions: RMMA | |
CN107995034B (en) | Energy and service cooperation method for dense cellular network | |
CN110035539A (en) | One kind being based on the matched resource optimal distribution method of correlated equilibrium regret value and device | |
Kosek-Szott et al. | Improving IEEE 802.11 ax UORA performance: Comparison of reinforcement learning and heuristic approaches | |
Hosey et al. | Q-learning for cognitive radios | |
Bikov et al. | Smart concurrent learning scheme for 5G network: QoS-aware radio resource allocation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||