CN111246502A - Energy threshold dynamic optimization method based on Q learning - Google Patents
Info
- Publication number
- CN111246502A (application number CN202010021376.6A)
- Authority
- CN
- China
- Prior art keywords
- action
- throughput
- fairness
- learning
- laa
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/06—Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention relates to a Q-learning-based dynamic energy threshold optimization method and belongs to the technical field of communication. The agent is trained repeatedly with a Q-learning-based method, and the optimal action is then selected from the converged Q table to reach the target state. The algorithm dynamically adjusts the LAA energy detection threshold in real time according to the external environment, and the Q-learning-based algorithm keeps the coexistence system in a state of high throughput and high fairness.
Description
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a dynamic energy threshold optimization method based on Q learning.
Background
With the rapid development of mobile communication technology, the number of intelligent terminals of all kinds has grown explosively, and spectrum resources have become increasingly scarce as that number grows. Spectrum resources can be divided into licensed and unlicensed spectrum; the licensed spectrum suitable for communication transmission is becoming ever scarcer and more crowded, and simply improving spectrum utilization is not enough to relieve the shortage. The 3GPP standardization body has therefore proposed that utilizing unlicensed spectrum is an effective way to cope with the growing traffic.
Currently available unlicensed frequency bands include the 2.4 GHz industrial, scientific and medical (ISM) band and the 5 GHz U-NII (Unlicensed National Information Infrastructure) band. Several mature access technologies already operate in these bands, such as Wi-Fi, Bluetooth, radar, and D2D (Device-to-Device), the most prominent being Wi-Fi. Besides complying with the usage restrictions on unlicensed bands in different countries and regions, the main problem for LTE is how to ensure harmonious coexistence with Wi-Fi while using the band resources fairly.
The LTE and Wi-Fi technologies are two distinct wireless communication technologies, and the differences between their protocols cause negative effects when the networks are merged directly. To deploy LTE systems in unlicensed bands, the 3GPP standardization organization specified a Licensed-Assisted Access (LAA) technology in LTE Release 13. To address the harmonious coexistence of LTE and Wi-Fi, the LAA scheme changes the LTE access mechanism and adopts a Listen Before Talk (LBT) access mechanism. This mechanism requires every LTE device to detect the current channel state before accessing the channel, so LTE devices must compete with Wi-Fi devices for the channel, which requires a change to the LTE access protocol. The core of the LBT mechanism is Clear Channel Assessment (CCA), which senses the channel using Energy Detection (ED). In the LAA scheme, an LTE device detects the current channel state before accessing the channel: if the channel is detected as busy, the device waits for other devices to finish transmitting and looks for an opportunity to transmit; if the channel is detected as idle, the device immediately accesses the channel and transmits data. Energy detection is a simple and effective way to judge the channel state. Whether an LTE device accesses the channel for data transmission depends on the energy detection result, and the energy threshold of the LAA scheme directly affects that result.
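To make the energy-detection decision concrete, the following minimal Python sketch illustrates the CCA check an LAA device could perform before channel access; the function name and the example threshold/measurement values are illustrative assumptions, not values from the 3GPP specification.

```python
def channel_is_idle(measured_energy_dbm: float, energy_threshold_dbm: float) -> bool:
    """Clear Channel Assessment by energy detection: the channel is declared
    idle only if the sensed energy stays below the configured threshold."""
    return measured_energy_dbm < energy_threshold_dbm

# Example: with a -72 dBm threshold, a -80 dBm measurement lets the LAA device
# transmit, while a -65 dBm measurement forces it to defer and keep sensing.
print(channel_is_idle(-80.0, -72.0))  # True  -> access the channel and transmit
print(channel_is_idle(-65.0, -72.0))  # False -> wait for the channel to become free
```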
The reference threshold value specified by the 3GPP standardization organization is a fixed value, and does not take into consideration the real-time network environment and the like. Therefore, a dynamic energy threshold optimization scheme considering a real-time network environment is designed.
Disclosure of Invention
In view of this, the present invention provides a Q learning-based energy threshold dynamic optimization method, which is used to solve the fairness problem when two different networks coexist in an unlicensed frequency band.
In order to achieve the purpose, the invention provides the following technical scheme:
the energy threshold dynamic optimization method based on Q learning comprises the following steps:
s1: setting LAA SBSs action set A ═ { a ═ a1,a2...atAnd state set S ═ S1,s2...stInitializing a Q matrix to be a zero-order matrix, and randomly selecting an initial state by LAA SBSs;
s2: the LAA SBSs select an action a according to an epsilon-greedy selection strategyt;
S3: according to action atCalculating the throughput and fairness coefficient of the coexisting system corresponding to the currently selected action, and obtaining the currently selected action atIs given a prize of r(s)t,at);
S4: updating the Q table according to a Q table updating formula of Q learning, and enabling LAA SBSs to enter the next state;
s5: repeatedly executing step S2 and the following steps until the next state reaches the target state;
further, in step S1, for action set a ═ a1,a2...atIn which each action atValues representing different energy thresholds, S ═ S for the set of states1,s2...stEach state stAre all composed of throughput and fairness coefficients, i.e. st={Rt,Ft};
Further, in step S2, an action is chosen using an ε-greedy action selection policy. Unlike a purely random selection strategy or a purely greedy selection strategy, it avoids the excessive number of iterations caused by repeatedly selecting actions at random and the local optimum into which a greedy strategy may fall. By combining random selection with greedy selection, the ε-greedy strategy selects actions efficiently and accurately.
Further, in step S3, action a_t is selected using the ε-greedy selection strategy, and the corresponding throughput R_t and fairness coefficient F_t are calculated from action a_t, i.e. the state s_t = {R_t, F_t} corresponding to the current action is confirmed. For state s_t, the throughput R_t is the sum of the throughput of the LAA system and the throughput of the Wi-Fi system, and the throughput of the coexistence system is obtained with reference to a Markov chain model. For state s_t, the fairness coefficient F_t characterizes the fairness of the coexistence system, where R_l and R_w denote the LAA and Wi-Fi throughputs and n_l and n_w denote the numbers of LAA SBSs and Wi-Fi APs, respectively; the closer the fairness coefficient F_t is to 1, the fairer the coexistence system. Therefore, according to throughput and fairness, the states can be divided into four states: low throughput and low fairness, low throughput and high fairness, high throughput and low fairness, and high throughput and high fairness. High throughput and high fairness is the target state of the LAA SBSs; the four states are distinguished by a throughput threshold and a fairness threshold.
Further, in step S3, when the selection of action a_t is completed, the reward r(s_t, a_t) is obtained according to the currently selected action. The reward function grants a reward only when the throughput and fairness coefficients corresponding to action a_t meet prescribed conditions, where F1 and F2 are the prescribed minimum fairness coefficients.
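As an illustration of the state classification and reward just described, the following minimal Python sketch maps a (throughput, fairness) pair to one of the four states and assigns a binary reward; the threshold values R_th, F_th, F_min and the exact reward rule are illustrative assumptions rather than the values prescribed by the invention.

```python
def classify_state(R_t: float, F_t: float, R_th: float = 100.0, F_th: float = 0.8) -> str:
    """Map a (throughput, fairness) pair to one of the four coexistence states.
    R_th and F_th are assumed thresholds separating 'high' from 'low'."""
    r = "high_R" if R_t >= R_th else "low_R"
    f = "high_F" if F_t >= F_th else "low_F"
    return f"{r}_{f}"

def reward(R_t: float, F_t: float, R_th: float = 100.0, F_min: float = 0.8) -> float:
    """Assumed binary reward: the selected action is rewarded only when both the
    throughput and the fairness coefficient meet their prescribed conditions."""
    return 1.0 if (R_t >= R_th and F_t >= F_min) else 0.0

print(classify_state(120.0, 0.95))  # 'high_R_high_F' -> the target state of the LAA SBSs
```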
Further, in step S4, the Q table is updated according to the Q-learning update formula:
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
where α denotes the learning rate with 0 < α < 1, and γ denotes the discount factor with 0 ≤ γ < 1.
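As a concrete illustration of this update, the short Python sketch below performs a single Q-table update with made-up numbers; the state names, threshold-labelled actions, learning rate, discount factor, and reward value are all assumptions chosen only for the arithmetic.

```python
# One Q-table update with illustrative numbers (not values from the patent).
alpha, gamma = 0.5, 0.9          # learning rate and discount factor
Q = {("low_R_low_F", "thr_-72dBm"): 0.2,
     ("high_R_high_F", "thr_-72dBm"): 1.0,
     ("high_R_high_F", "thr_-62dBm"): 0.4}

s_t, a_t, s_next, r = "low_R_low_F", "thr_-72dBm", "high_R_high_F", 1.0
best_next = max(Q[(s_next, a)] for a in ("thr_-72dBm", "thr_-62dBm"))
Q[(s_t, a_t)] += alpha * (r + gamma * best_next - Q[(s_t, a_t)])
print(Q[(s_t, a_t)])   # new Q ≈ 1.05 = 0.2 + 0.5*(1.0 + 0.9*1.0 - 0.2)
```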
Further, in step S5, for the Q learning described herein, one iteration is considered complete only when the current state reaches the target state, i.e., when the current state of the LAA SBSs reaches high throughput and high fairness.
The invention has the following beneficial effects: the energy threshold of LTE-LAA on the unlicensed band is dynamically optimized through the Q-learning algorithm, which maximizes the fairness of the coexistence system while maintaining its throughput, and the method also serves as a reference for the harmonious coexistence of other heterogeneous networks.
Drawings
In order to make the object, technical solution and beneficial effects of the invention clearer, the following drawings are provided for explanation:
FIG. 1 is a Q learning framework diagram;
FIG. 2 is a diagram of a network model for coexistence of LTE and Wi-Fi.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention provides a Q-learning-based dynamic energy threshold optimization method for the coexistence problem of LTE and Wi-Fi on unlicensed bands. In contrast to the fixed reference threshold scheme in the 3GPP standardization documents, the invention dynamically optimizes the energy threshold that the LAA SBSs use for channel detection, based on a Q-learning algorithm, so that the LAA SBSs can adjust the energy threshold according to the real-time network environment. As shown in fig. 1, the LAA SBSs act as the agent: in a given state, an action is selected according to the ε-greedy action selection policy, a reward is obtained from the environment, the Q table is updated according to the Q-learning update formula, and these steps are repeated until convergence.
In the coexistence scenario there are multiple LAA SBSs and multiple Wi-Fi APs; the network model is shown in fig. 2. Only downlink data traffic is considered, so the LAA SBSs and the Wi-Fi APs each perform channel detection. In fig. 2, the solid black line and the dashed line represent the licensed and unlicensed spectrum respectively; only data transmission on the unlicensed spectrum is considered. The dashed red line indicates that the Wi-Fi APs broadcast information such as the throughput of the current access point at each decision time, and the LAA SBSs can analyze the received broadcast information.
In Q learning, the LAA SBSs in the heterogeneous network are treated as the agent. At each decision time t, the agent observes the state of its environment and takes an appropriate action so as to maximize the reward at the next time t + 1. In Q learning, the learned Q value is updated from the immediate reward and the discounted future reward, and is stored in a two-dimensional Q table.
In a heterogeneous network, LAA devices coexist with Wi-Fi users on unlicensed bands. Based on the working principle of Q learning, the action and state sets are defined as A = {a_1, a_2, ..., a_t} and S = {s_1, s_2, ..., s_t}. Each element in the set A represents a different energy threshold used to detect the state of the unlicensed channel, and each element in the set S is a parameter pair consisting of a throughput and a fairness coefficient, i.e. s_t = {R_t, F_t}. For state s_t, the throughput R_t is the sum of the throughput of the LAA system and the throughput of the Wi-Fi system, and the throughput of the coexistence system is obtained with reference to a Markov chain model. For state s_t, the fairness coefficient F_t characterizes the fairness of the coexistence system, where R_l and R_w denote the LAA and Wi-Fi throughputs and n_l and n_w denote the numbers of LAA SBSs and Wi-Fi APs, respectively; the closer the fairness coefficient F_t is to 1, the fairer the coexistence system. Therefore, according to throughput and fairness, the states can be divided into four states: low throughput and low fairness, low throughput and high fairness, high throughput and low fairness, and high throughput and high fairness, where high throughput and high fairness is the target state of the LAA SBSs.
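A minimal sketch of one plausible fairness coefficient follows, assuming a Jain-style index over the per-device throughputs R_l/n_l and R_w/n_w, which approaches 1 when the two systems obtain equal per-device throughput; this particular formula is an assumption for illustration, not necessarily the one defined by the invention.

```python
def fairness_coefficient(R_l: float, R_w: float, n_l: int, n_w: int) -> float:
    """Assumed fairness coefficient: Jain's index over the per-device throughputs
    of the LAA SBSs (R_l / n_l) and the Wi-Fi APs (R_w / n_w); it equals 1 when
    both systems obtain the same per-device throughput."""
    x_l, x_w = R_l / n_l, R_w / n_w
    return (x_l + x_w) ** 2 / (2 * (x_l ** 2 + x_w ** 2))

# Equal per-device shares give perfect fairness; a skewed split lowers the index.
print(fairness_coefficient(60.0, 40.0, 3, 2))   # 1.0  (20 per device on each side)
print(fairness_coefficient(90.0, 10.0, 3, 2))   # ~0.66 (LAA devices get 6x the Wi-Fi share)
```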
For the action selection strategy, the algorithm uses an ε-greedy policy. Unlike a purely random or purely greedy strategy, the ε-greedy policy combines the two and can select actions efficiently and accurately: with probability ε the agent explores by picking a random action from A, and with probability 1 − ε it exploits by picking the action with the largest Q value in the current state, a_t = argmax_a Q(s_t, a).
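A minimal sketch of this ε-greedy selection, assuming the Q table is stored as a NumPy array indexed by state and action (the table dimensions here are illustrative):

```python
import random
import numpy as np

def epsilon_greedy(Q: np.ndarray, state: int, epsilon: float = 0.1) -> int:
    """With probability epsilon explore a random energy-threshold action;
    otherwise exploit the action with the highest Q value in the current state."""
    if random.random() < epsilon:
        return random.randrange(Q.shape[1])   # exploration
    return int(np.argmax(Q[state]))           # exploitation

Q = np.zeros((4, 8))   # e.g. 4 coexistence states x 8 candidate energy thresholds
print(epsilon_greedy(Q, state=0, epsilon=0.1))
```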
According to the ε-greedy selection policy, performing action a_t yields the reward r(s_t, a_t). The reward function grants a reward only when the throughput and fairness coefficients corresponding to action a_t meet prescribed conditions, where F1 and F2 are the prescribed minimum fairness coefficients. The Q value is then updated according to the update formula
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
where 0 < α < 1 and 0 ≤ γ < 1. If α = 1, the previously learned experience is ignored and replaced entirely by the latest estimated reward, and the larger γ is, the more weight the agent places on future rewards.
Finally, for the Q-learning algorithm described herein, an iteration is only completed when the current state reaches the target state, i.e., when the current state of the LAA SBSs reaches high throughput and high fairness.
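Putting the pieces together, the following end-to-end sketch shows the structure of steps S1–S5; the environment model (how a chosen energy threshold maps to the next throughput/fairness state), the binary reward, and all numeric parameters are assumptions made only to show the control flow, not the invention's actual coexistence model.

```python
import random
import numpy as np

N_STATES, N_ACTIONS = 4, 8            # four coexistence states, candidate energy thresholds
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
TARGET = 3                            # index of the "high throughput, high fairness" state

def step(action: int) -> tuple[int, float]:
    """Assumed stand-in for the real environment: it should return the next state
    (derived from the measured throughput and fairness) and the reward."""
    next_state = random.randrange(N_STATES)          # placeholder dynamics
    r = 1.0 if next_state == TARGET else 0.0         # assumed binary reward
    return next_state, r

Q = np.zeros((N_STATES, N_ACTIONS))                  # S1: zero Q matrix
for episode in range(500):
    state = random.randrange(N_STATES)               # S1: random initial state
    while state != TARGET:                           # S5: stop at the target state
        if random.random() < EPSILON:                # S2: epsilon-greedy selection
            action = random.randrange(N_ACTIONS)
        else:
            action = int(np.argmax(Q[state]))
        next_state, r = step(action)                 # S3: observe throughput/fairness, reward
        Q[state, action] += ALPHA * (                # S4: Q-table update
            r + GAMMA * np.max(Q[next_state]) - Q[state, action])
        state = next_state
# After convergence, the greedy action per state gives the energy threshold to use.
```

In a real deployment the `step` function would be replaced by measurements of the coexistence system, i.e. the throughput obtained from the Markov chain model and the information broadcast by the Wi-Fi APs.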
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.
Claims (6)
1. A dynamic energy threshold optimization method based on Q learning is characterized in that: the method comprises the following steps:
s1: setting LAA SBSs action set A ═ { a ═ a1,a2...atAnd state set S ═ S1,s2...stInitializing a Q matrix to be a zero-order matrix, and randomly selecting an initial state by LAA SBSs;
s2: the LAA SBSs select an action a according to an epsilon-greedy selection strategyt;
S3: according to action atCalculating the throughput and fairness coefficient of the coexisting system corresponding to the currently selected action, and obtaining the currently selected action atIs given a prize of r(s)t,at);
S4: updating the Q table according to a Q table updating formula of Q learning, and enabling LAA SBSs to enter the next state;
s5: and repeatedly executing the step S2 and the following steps until the Q table convergence completes the training.
2. The Q-learning-based energy threshold dynamic optimization method according to claim 1, wherein: in step S1, in the action set A = {a_1, a_2, ..., a_t}, each action a_t represents a different energy threshold value; in the state set S = {s_1, s_2, ..., s_t}, each state s_t is composed of a throughput and a fairness coefficient, i.e. s_t = {R_t, F_t}.
3. The Q-learning-based energy threshold dynamic optimization method according to claim 2, wherein: in step S2, an action is chosen using an ε-greedy action selection policy, which combines random selection with greedy selection so that actions can be selected efficiently and accurately.
4. The Q-learning-based energy threshold dynamic optimization method according to claim 3, wherein: in step S3, action a_t is selected using the ε-greedy selection strategy, and the corresponding throughput R_t and fairness coefficient F_t are calculated from action a_t, i.e. the state s_t = {R_t, F_t} corresponding to the current action is confirmed; for state s_t, the throughput R_t is the sum of the throughput of the LAA system and the throughput of the Wi-Fi system, and the throughput of the coexistence system is obtained with reference to a Markov chain model; for state s_t, the fairness coefficient F_t characterizes the fairness of the coexistence system, where R_l and R_w denote the LAA and Wi-Fi throughputs and n_l and n_w denote the numbers of LAA SBSs and Wi-Fi APs, respectively, and the closer the fairness coefficient F_t is to 1, the fairer the coexistence system; therefore, according to throughput and fairness, the states can be divided into four states: low throughput and low fairness, low throughput and high fairness, high throughput and low fairness, and high throughput and high fairness, where high throughput and high fairness is the target state of the LAA SBSs; further, when the selection of action a_t is completed, the reward r(s_t, a_t) is obtained according to the currently selected action, and the reward function grants a reward only when the throughput and fairness coefficients corresponding to action a_t meet prescribed conditions, where F1 and F2 are the prescribed minimum fairness coefficients.
6. The Q-learning-based energy threshold dynamic optimization method according to claim 5, wherein: in step S5, an iteration is considered complete only when the current state reaches the target state, i.e. when the current state of the LAA SBSs reaches high throughput and high fairness; step S2 and the following steps are repeated until the Q table converges, completing the training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010021376.6A CN111246502B (en) | 2020-01-09 | 2020-01-09 | Energy threshold dynamic optimization method based on Q learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111246502A (en) | 2020-06-05
CN111246502B (en) | 2022-04-29
Family
ID=70878159
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010021376.6A (Active) | Energy threshold dynamic optimization method based on Q learning | 2020-01-09 | 2020-01-09
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111246502B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150358904A1 (en) * | 2014-06-10 | 2015-12-10 | Newracom Inc. | Operation method of station in wireless local area network |
CN106412931A (en) * | 2016-12-16 | 2017-02-15 | 重庆邮电大学 | LTE-U idle channel evaluation method based on multi-slot fusion mechanism |
US20180254838A1 (en) * | 2017-03-01 | 2018-09-06 | Alcatel Lucent | Dynamic interference suppression for massive multiple-input-multiple-output (mimo) in unlicensed frequency bands |
CN108093412A (en) * | 2018-01-18 | 2018-05-29 | 重庆邮电大学 | For the LTE-U based on LAT under multi-operator scenario and WiFi coexistence methods |
CN109951864A (en) * | 2019-03-28 | 2019-06-28 | 重庆邮电大学 | The system performance analysis method coexisted based on the imperfect spectrum detection of LAA and WiFi |
CN110035559A (en) * | 2019-04-25 | 2019-07-19 | 重庆邮电大学 | A kind of contention window size intelligent selecting method based on chaos Q- learning algorithm |
Non-Patent Citations (2)
Title |
---|
""TDoc_List_RAN2#96_110217"", 《3GPP TSG_RAN\WG2_RL2》 * |
李继蕊,李小勇,高雅丽,高云全,方滨兴: "物联网环境下数据转发模型研究", 《软件学报》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113316156A (en) * | 2021-05-26 | 2021-08-27 | 重庆邮电大学 | Intelligent coexistence method on unlicensed frequency band |
CN114374977A (en) * | 2022-01-13 | 2022-04-19 | 重庆邮电大学 | Coexistence method based on Q learning under non-cooperation |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |