CN111246502B - Energy threshold dynamic optimization method based on Q learning - Google Patents


Info

Publication number
CN111246502B
CN111246502B
Authority
CN
China
Prior art keywords
throughput
fairness
action
laa
learning
Prior art date
Legal status
Active
Application number
CN202010021376.6A
Other languages
Chinese (zh)
Other versions
CN111246502A (en)
Inventor
裴二荣
鹿逊
刘珊
易鑫
周礼能
张茹
王振民
朱冰冰
杨光财
荆玉琪
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010021376.6A
Publication of CN111246502A
Application granted
Publication of CN111246502B
Legal status: Active


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00: Supervisory, monitoring or testing arrangements
    • H04W 24/02: Arrangements for optimising operational condition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00: Network traffic management; Network resource management
    • H04W 28/02: Traffic management, e.g. flow control or congestion control
    • H04W 28/06: Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a Q-learning-based dynamic energy threshold optimization method, belonging to the technical field of communications. With the Q-learning-based method, the agent is trained repeatedly and then selects the optimal action according to the converged Q table to reach the target state. The algorithm can dynamically adjust the LAA energy detection threshold in real time according to the external environment, and ensures that the coexistence system stays in a state of high throughput and high fairness.

Description

Energy threshold dynamic optimization method based on Q learning
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a dynamic energy threshold optimization method based on Q learning.
Background
With the rapid development of mobile communication technology, the number of intelligent terminals has grown explosively, and spectrum resources have become increasingly scarce. Spectrum resources can be divided into licensed and unlicensed spectrum; the licensed spectrum suitable for communication transmission is becoming ever scarcer and more crowded, and simply improving spectrum utilization is not enough to relieve the shortage of spectrum resources. The 3GPP standardization body has proposed the use of unlicensed spectrum as an effective solution to the growing traffic demand.
Currently available unlicensed frequency bands include the 2.4 GHz industrial, scientific and medical (ISM) band and the 5 GHz U-NII (Unlicensed National Information Infrastructure) band. These bands already host mature access technologies such as Wi-Fi, Bluetooth, radar, and D2D (Device-to-Device), of which Wi-Fi is the most prominent. Besides complying with the usage limits on unlicensed bands in different countries and regions, the main problem for LTE is how to ensure harmonious coexistence with Wi-Fi under fair use of the band resources.
LTE and Wi-Fi are two distinct wireless communication technologies, and the differences between their protocols cause negative effects when the networks converge directly. To deploy LTE systems in unlicensed frequency bands, the 3GPP standardization organization specified the Licensed-Assisted Access (LAA) technology in LTE Release 13. To achieve harmonious coexistence of LTE and Wi-Fi, the LAA scheme changes the LTE access mechanism and adopts a Listen Before Talk (LBT) access mechanism, which requires every LTE device to detect the current channel state before accessing the channel and to compete with Wi-Fi devices for the channel; this scheme requires changing the LTE access protocol. The core of the LBT mechanism is Clear Channel Assessment (CCA), which senses the channel using Energy Detection (ED). In the LAA scheme, an LTE device detects the current channel state before access: if the channel is detected as busy, the device waits for the other devices to finish transmitting and looks for an opportunity to transmit; if the channel is detected as idle, the device immediately accesses the channel and transmits data. Energy detection is a simple and effective way to judge the channel state. Whether an LTE device accesses the channel for data transmission thus depends on the energy detection result, and the energy threshold of the LAA scheme directly affects that result.
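As a concrete illustration of this energy-detection step, here is a minimal sketch; the -72 dBm figure is the commonly cited 3GPP reference threshold for a 20 MHz channel, and the function and values are illustrative assumptions, not text from the patent:

```python
def channel_is_idle(sensed_energy_dbm: float, threshold_dbm: float) -> bool:
    """Clear Channel Assessment by energy detection: the channel is
    declared idle only when the sensed energy is below the threshold."""
    return sensed_energy_dbm < threshold_dbm

# A lower threshold makes LAA defer to Wi-Fi more often; a higher
# threshold lets LAA transmit more aggressively.
print(channel_is_idle(-80.0, -72.0))  # True: channel idle, LAA may transmit
print(channel_is_idle(-60.0, -72.0))  # False: channel busy, LAA defers
```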
The reference threshold specified by the 3GPP standardization organization is a fixed value that does not take the real-time network environment into account. A dynamic energy threshold optimization scheme that considers the real-time network environment is therefore designed.
Disclosure of Invention
In view of this, the present invention provides a Q learning-based energy threshold dynamic optimization method, which is used to solve the fairness problem when two different networks coexist in an unlicensed frequency band.
In order to achieve the purpose, the invention provides the following technical scheme:
the energy threshold dynamic optimization method based on Q learning comprises the following steps:
S1: set the LAA SBSs action set A = {a_1, a_2, ..., a_t} and the state set S = {s_1, s_2, ..., s_t}, initialize the Q matrix as an all-zero matrix, and let the LAA SBSs randomly select an initial state;
S2: the LAA SBSs select an action a_t according to an ε-greedy selection strategy;
S3: according to action a_t, calculate the throughput and fairness coefficient of the coexistence system corresponding to the currently selected action, and obtain the reward r(s_t, a_t) of the currently selected action a_t;
S4: update the Q table according to the Q-learning update formula, and let the LAA SBSs enter the next state;
S5: repeat from step S2 until the next state reaches the target state (a minimal sketch of this loop is given below).
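For illustration only, the following is a minimal Python sketch of steps S1 to S5. The candidate thresholds, the coexistence simulator evaluate(), the state thresholds, and the reward form are assumptions made for this sketch, not values disclosed in the patent:

```python
import random

import numpy as np

ACTIONS = [-82.0, -77.0, -72.0, -67.0, -62.0]  # assumed candidate thresholds (dBm)
R_BAR, F_BAR = 50.0, 0.8                       # assumed throughput/fairness thresholds
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1          # learning rate, discount, exploration
TARGET = 3                                     # state HH: high throughput, high fairness

def evaluate(threshold_dbm):
    """Stand-in for the Markov-chain coexistence model: returns a noisy
    (throughput, fairness) pair that peaks near a middle threshold."""
    base = 100.0 - 4.0 * abs(threshold_dbm + 72.0)
    throughput = base + random.uniform(-5.0, 5.0)
    fairness = min(1.0, max(0.0, base / 100.0 + random.uniform(-0.05, 0.05)))
    return throughput, fairness

def state_of(throughput, fairness):
    """Map (R_t, F_t) to one of the four states LL=0, LH=1, HL=2, HH=3."""
    return 2 * (throughput >= R_BAR) + (fairness >= F_BAR)

Q = np.zeros((4, len(ACTIONS)))                # S1: all-zero Q matrix
for episode in range(200):                     # repeated training of the agent
    s = random.randrange(4)                    # S1: random initial state
    for _ in range(100):                       # safety cap on iterations
        if random.random() < EPSILON:          # S2: epsilon-greedy selection
            a = random.randrange(len(ACTIONS))
        else:
            a = int(np.argmax(Q[s]))
        R, F = evaluate(ACTIONS[a])            # S3: throughput and fairness
        r = R if F >= F_BAR else 0.0           # S3: assumed reward form
        s_next = state_of(R, F)
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])  # S4: update
        s = s_next
        if s == TARGET:                        # S5: stop at the target state
            break

best = {state: ACTIONS[int(np.argmax(Q[state]))] for state in range(4)}
print("Best threshold per state:", best)
```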
further, in step S1, for action set a ═ a1,a2...atIn which each action atValues representing different energy thresholds, S ═ S for the set of states1,s2...stEach state stAre all composed of throughput and fairness coefficients, i.e. st={Rt,Ft};
Further, in step S2, an action is chosen using an ε-greedy action selection policy. Unlike a purely random selection strategy, which wastes iterations on repeatedly selected actions, or a purely greedy selection strategy, which can fall into a local optimum, the ε-greedy action selection policy combines the two and can select actions efficiently and accurately.
Further, in step S3, action a_t is selected using the ε-greedy selection strategy, and a_t is then used to calculate the corresponding throughput R_t and fairness coefficient F_t, i.e., to determine the state s_t = {R_t, F_t} corresponding to the current action. The throughput R_t in state s_t is the sum of the LAA system throughput and the Wi-Fi system throughput, and the coexistence-system throughput is obtained from a Markov chain model. The fairness coefficient F_t in state s_t represents the fairness of the coexistence system and is defined as:

$$F_t = \frac{\left(\frac{R_l}{n_l} + \frac{R_w}{n_w}\right)^2}{2\left[\left(\frac{R_l}{n_l}\right)^2 + \left(\frac{R_w}{n_w}\right)^2\right]}$$
where R_l and R_w denote the LAA and Wi-Fi throughputs, and n_l and n_w denote the numbers of LAA SBS and Wi-Fi AP devices, respectively; the closer the fairness coefficient F_t is to 1, the fairer the coexistence system. Based on throughput and fairness, the states can therefore be divided into four classes: low throughput with low fairness, low throughput with high fairness, high throughput with low fairness, and high throughput with high fairness. High throughput with high fairness is the target state of the LAA SBSs, and the four states are defined as follows:

$$s_t = \begin{cases} s^{LL}, & R_t < \bar{R},\ F_t < \bar{F} \\ s^{LH}, & R_t < \bar{R},\ F_t \ge \bar{F} \\ s^{HL}, & R_t \ge \bar{R},\ F_t < \bar{F} \\ s^{HH}, & R_t \ge \bar{R},\ F_t \ge \bar{F} \end{cases}$$

where $\bar{R}$ and $\bar{F}$ denote the throughput threshold and the fairness threshold, respectively. Further, in step S3, after the selected action a_t is completed, the reward r(s_t, a_t) is obtained according to the currently selected action. The reward function is defined as:
r(s_t, a_t) = [piecewise function of the throughput R_t and the minimum fairness factors F_1°, F_2°; given only as an equation image in the original]

where F_1° and F_2° are the defined minimum fairness factors; the currently selected action receives a reward only when the throughput and fairness coefficient corresponding to action a_t satisfy the specified conditions.
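Because the reward equation survives only as an image, the sketch below shows one plausible tiered reading consistent with the surrounding text (a reward is granted only when the fairness conditions are met); the tier boundaries and weights are assumptions:

```python
def reward(throughput: float, fairness: float, f1: float = 0.9, f2: float = 0.7) -> float:
    """Assumed tiered reward: full reward above the stricter fairness
    floor f1, a reduced reward between f2 and f1, nothing below f2."""
    if fairness >= f1:
        return throughput
    if fairness >= f2:
        return 0.5 * throughput
    return 0.0

print(reward(80.0, 0.95))  # 80.0: both fairness conditions satisfied
print(reward(80.0, 0.60))  # 0.0: fairness too low, action not rewarded
```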
Further, in step S4, the Q table is updated according to the Q-learning update formula:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where α denotes the learning rate with 0 < α < 1, and γ denotes the discount factor with 0 ≤ γ < 1.
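A one-step numeric illustration of this update (the values are arbitrary examples):

```python
alpha, gamma = 0.5, 0.9          # learning rate and discount factor
q_sa = 2.0                       # current Q(s_t, a_t)
reward, max_q_next = 1.0, 4.0    # r(s_t, a_t) and max_a Q(s_{t+1}, a)

q_sa += alpha * (reward + gamma * max_q_next - q_sa)  # temporal-difference step
print(q_sa)  # 2.0 + 0.5 * (1.0 + 3.6 - 2.0) = 3.3
```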
Further, in step S5, for the Q learning described herein, one iteration is counted as complete only when the current state reaches the target state, i.e., when the current state of the LAA SBSs reaches high throughput with high fairness.
The invention has the following beneficial effects: the energy threshold of LTE-LAA on the unlicensed frequency band is dynamically optimized by the Q-learning algorithm, which maximizes the fairness of the coexistence system while guaranteeing its throughput, and provides a reference for the harmonious coexistence of other heterogeneous networks.
Drawings
To make the object, technical scheme, and beneficial effects of the invention clearer, the following drawings are provided for explanation:
FIG. 1 is a Q learning framework diagram;
FIG. 2 is a diagram of a network model for coexistence of LTE and Wi-Fi.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Aiming at the coexistence problem of LTE and Wi-Fi on unlicensed frequency bands, the invention provides a Q-learning-based dynamic energy threshold optimization method. In contrast to the fixed reference threshold scheme in the 3GPP standardization documents, the invention dynamically optimizes the energy threshold that the LAA SBSs use for channel detection by means of a Q-learning algorithm, so the LAA SBSs can adjust the energy threshold according to the real-time network environment. As shown in fig. 1, the LAA SBSs act as the agent: in a given state, the agent selects an action according to the ε-greedy action selection policy, obtains a reward from the environment, and updates the Q table according to the Q-learning update formula; these steps are repeated until convergence.
In the coexistence scenario there are multiple LAA SBSs and multiple Wi-Fi APs; the network model is shown in fig. 2. We consider only the transmission of data traffic in the downlink, so the LAA SBSs and the Wi-Fi APs each perform channel detection. In fig. 2, the solid black line and the dashed line represent the licensed spectrum and the unlicensed spectrum, respectively, and we consider data transmission only on the unlicensed spectrum; the dashed red line indicates that the Wi-Fi APs broadcast information such as the throughput of the current access point at each decision time, and the LAA SBSs can parse the received broadcast information.
In Q learning, we treat the LAA SBSs in the heterogeneous network as the agent. At a given time, the agent observes the state of its environment and takes an action; at each decision time t, the agent takes the action that maximizes the reward at the next time t + 1. In Q learning, the learned Q value is updated with the immediate reward and the discounted future reward, and is stored in a two-dimensional Q table.
In the heterogeneous network, LAA devices coexist with Wi-Fi users on the unlicensed band. Based on the working principle of Q learning, the action set and state set are defined as A = {a_1, a_2, ..., a_t} and S = {s_1, s_2, ..., s_t}, where each element of A represents a different energy threshold for detecting the state of the unlicensed channel, and each element of S is a parameter pair consisting of a throughput and a fairness coefficient, i.e., s_t = {R_t, F_t}. The throughput R_t in state s_t is the sum of the LAA system throughput and the Wi-Fi system throughput, and the coexistence-system throughput is obtained from a Markov chain model. The fairness coefficient F_t in state s_t represents the fairness of the coexistence system and is defined as:
$$F_t = \frac{\left(\frac{R_l}{n_l} + \frac{R_w}{n_w}\right)^2}{2\left[\left(\frac{R_l}{n_l}\right)^2 + \left(\frac{R_w}{n_w}\right)^2\right]}$$
where R_l and R_w denote the LAA and Wi-Fi throughputs, and n_l and n_w denote the numbers of LAA SBS and Wi-Fi AP devices, respectively; the closer the fairness coefficient F_t is to 1, the fairer the coexistence system. Based on throughput and fairness, the states can therefore be divided into four classes: low throughput with low fairness, low throughput with high fairness, high throughput with low fairness, and high throughput with high fairness. High throughput with high fairness is the target state of the LAA SBSs, and the states are defined as:
$$s_t = \begin{cases} s^{LL}, & R_t < \bar{R},\ F_t < \bar{F} \\ s^{LH}, & R_t < \bar{R},\ F_t \ge \bar{F} \\ s^{HL}, & R_t \ge \bar{R},\ F_t < \bar{F} \\ s^{HH}, & R_t \ge \bar{R},\ F_t \ge \bar{F} \end{cases}$$

where $\bar{R}$ and $\bar{F}$ denote the throughput threshold and the fairness threshold, respectively.
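A short sketch of the fairness coefficient, assuming the Jain-style per-device form reconstructed above:

```python
def fairness(r_laa: float, r_wifi: float, n_laa: int, n_wifi: int) -> float:
    """Fairness over per-device throughputs: 1.0 is perfectly fair and
    0.5 is the minimum when exactly two systems share the band."""
    x_l, x_w = r_laa / n_laa, r_wifi / n_wifi
    return (x_l + x_w) ** 2 / (2 * (x_l ** 2 + x_w ** 2))

print(fairness(60.0, 60.0, 4, 4))  # 1.0: equal per-device throughput
print(fairness(90.0, 10.0, 4, 4))  # ~0.61: LAA dominates the band
```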
On the action selection policy, the algorithm uses an ε-greedy strategy to select actions. Unlike a purely random selection strategy or a purely greedy selection strategy, the ε-greedy action selection strategy combines the two and can select actions efficiently and accurately. It is defined as:
$$a_t = \begin{cases} \arg\max_{a \in A} Q(s_t, a), & \text{with probability } 1 - \varepsilon \\ \text{a random action in } A, & \text{with probability } \varepsilon \end{cases}$$
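A minimal sketch of this selection rule:

```python
import random

def epsilon_greedy(q_row, epsilon=0.1):
    """With probability epsilon explore a random action; otherwise
    exploit the action with the highest Q value in the current state."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

print(epsilon_greedy([0.0, 1.2, 0.3]))  # usually 1; occasionally a random index
```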
According to the ε-greedy selection policy, executing action a_t yields the reward r(s_t, a_t). The reward function is defined as:
r(s_t, a_t) = [piecewise function of the throughput R_t and the minimum fairness factors F_1°, F_2°; given only as an equation image in the original]
where F_1° and F_2° are the prescribed minimum fairness factors; the selected action receives a reward only when the throughput and fairness coefficient corresponding to action a_t satisfy the specified conditions. The Q value is updated according to the update formula:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
where 0 < α < 1 and 0 ≤ γ < 1. Setting α = 1 would discard previously learned experience and replace it with the latest estimated reward; the larger γ is, the more weight the agent places on future rewards.
Finally, for the Q-learning algorithm described herein, an iteration is completed only when the current state reaches the target state, i.e., when the current state of the LAA SBSs reaches high throughput with high fairness.
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.

Claims (3)

1. A dynamic energy threshold optimization method based on Q learning is characterized in that: the method comprises the following steps:
S1: set the LAA SBSs action set A = {a_1, a_2, ..., a_t}, in which each action a_t represents a different energy threshold value, and set the state set S = {s_1, s_2, ..., s_t}, in which each state s_t is composed of a throughput and a fairness coefficient, i.e., s_t = {R_t, F_t}; initialize the Q matrix as an all-zero matrix, and let the LAA SBSs randomly select an initial state;
S2: the LAA SBSs select an action a_t according to an ε-greedy selection strategy; by combining exploration and exploitation, the ε-greedy action selection strategy can select actions efficiently and accurately;
S3: according to action a_t, calculate the throughput and fairness coefficient of the coexistence system corresponding to the currently selected action, and obtain the reward r(s_t, a_t) of the currently selected action a_t: after selecting action a_t with the ε-greedy selection strategy, use a_t to calculate the corresponding throughput R_t and fairness coefficient F_t, i.e., determine the state s_t = {R_t, F_t} corresponding to the current action; the throughput R_t in state s_t is the sum of the LAA system throughput and the Wi-Fi system throughput, the coexistence-system throughput being calculated from a Markov chain model; the fairness coefficient F_t in state s_t represents the fairness of the coexistence system and is defined as:

$$F_t = \frac{\left(\frac{R_l}{n_l} + \frac{R_w}{n_w}\right)^2}{2\left[\left(\frac{R_l}{n_l}\right)^2 + \left(\frac{R_w}{n_w}\right)^2\right]}$$
where R_l and R_w denote the LAA and Wi-Fi throughputs, and n_l and n_w denote the numbers of LAA SBS and Wi-Fi AP devices, respectively; the closer the fairness coefficient F_t is to 1, the fairer the coexistence system; the states are divided into four classes according to throughput and fairness, namely low throughput with low fairness, low throughput with high fairness, high throughput with low fairness, and high throughput with high fairness, where high throughput with high fairness is the target state of the LAA SBSs; when the selected action a_t is completed, the reward r(s_t, a_t) is obtained according to the currently selected action, the reward function being defined as:

r(s_t, a_t) = [piecewise function of the throughput R_t and the minimum fairness factors F_1°, F_2°; given only as an equation image in the original]

where F_1° and F_2° are the defined minimum fairness factors; the currently selected action is rewarded only when the throughput and fairness coefficient corresponding to action a_t satisfy the specified conditions;
S4: update the Q table according to the Q-learning update formula, and let the LAA SBSs enter the next state;
S5: repeat step S2 and the following steps until the Q table converges and training is complete.
2. The Q-learning-based dynamic energy threshold optimization method of claim 1, characterized in that: in step S4, the Q table is updated according to the Q-learning update formula

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where α denotes the learning rate with 0 < α < 1, and γ denotes the discount factor with 0 ≤ γ < 1.
3. The Q-learning-based dynamic energy threshold optimization method of claim 1, characterized in that: in step S5, for the Q learning described herein, one iteration is counted as complete only when the current state reaches the target state, i.e., when the current state of the LAA SBSs reaches high throughput with high fairness; step S2 and the following steps are repeated until the Q table converges and training is complete.
CN202010021376.6A 2020-01-09 2020-01-09 Energy threshold dynamic optimization method based on Q learning Active CN111246502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010021376.6A CN111246502B (en) 2020-01-09 2020-01-09 Energy threshold dynamic optimization method based on Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010021376.6A CN111246502B (en) 2020-01-09 2020-01-09 Energy threshold dynamic optimization method based on Q learning

Publications (2)

Publication Number Publication Date
CN111246502A CN111246502A (en) 2020-06-05
CN111246502B (en) 2022-04-29

Family

ID=70878159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010021376.6A Active CN111246502B (en) 2020-01-09 2020-01-09 Energy threshold dynamic optimization method based on Q learning

Country Status (1)

Country Link
CN (1) CN111246502B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113316156B (en) * 2021-05-26 2022-07-12 重庆邮电大学 Intelligent coexistence method on unlicensed frequency band

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106412931A (en) * 2016-12-16 2017-02-15 重庆邮电大学 LTE-U idle channel evaluation method based on multi-slot fusion mechanism
CN108093412A (en) * 2018-01-18 2018-05-29 重庆邮电大学 For the LTE-U based on LAT under multi-operator scenario and WiFi coexistence methods
CN109951864A (en) * 2019-03-28 2019-06-28 重庆邮电大学 The system performance analysis method coexisted based on the imperfect spectrum detection of LAA and WiFi
CN110035559A (en) * 2019-04-25 2019-07-19 重庆邮电大学 A kind of contention window size intelligent selecting method based on chaos Q- learning algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9769746B2 (en) * 2014-06-10 2017-09-19 Newracom, Inc. Operation method of station in wireless local area network
US10069575B1 (en) * 2017-03-01 2018-09-04 Alcatel Lucent Dynamic interference suppression for massive multiple-input-multiple-output (MIMO) in unlicensed frequency bands


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"TDoc_List_RAN2#96_110217".《3GPP tsg_ran\WG2_RL2》.2017,全文. *
Research on data forwarding models in the Internet of Things environment; Li Jirui, Li Xiaoyong, Gao Yali, Gao Yunquan, Fang Binxing; Journal of Software; 2017-10-31; full text *

Also Published As

Publication number Publication date
CN111246502A (en) 2020-06-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant