CN107426772B - Dynamic competition window adjusting method, device and equipment based on Q learning - Google Patents

Info

Publication number
CN107426772B
Authority
CN
China
Prior art keywords
throughput
annealing temperature
value
updated
contention window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710537493.6A
Other languages
Chinese (zh)
Other versions
CN107426772A (en)
Inventor
田辉
闫晓婧
秦城
范绍帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201710537493.6A
Publication of CN107426772A
Application granted
Publication of CN107426772B
Current legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/08 Load balancing or load distribution
    • H04W28/10 Flow control between communication endpoints
    • H04W74/00 Wireless channel access, e.g. scheduled or random access
    • H04W74/08 Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access]
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/06 Testing, supervising or monitoring using simulated traffic
    • H04W24/08 Testing, supervising or monitoring using real traffic

Abstract

Embodiments of the invention provide a Q-learning-based method, apparatus, and device for dynamically adjusting a contention window. The method comprises the following steps: A. initializing channel access parameters and an initial annealing temperature; B. transmitting a data packet and acquiring a first throughput of the data packet transmission under the initial contention window size; C. under the first throughput, using a simulated annealing algorithm to obtain a first contention window size; D. acquiring a second throughput of the data packet transmission under the first contention window size; E. updating the Q-value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain an updated annealing temperature. While the updated annealing temperature is higher than the minimum threshold, the updated annealing temperature is taken as the initial annealing temperature and steps B to E are executed again, repeatedly updating the Q-value table and the annealing temperature; once the updated annealing temperature is lower than or equal to the minimum threshold, updating of the Q-value table stops and the optimal contention window is obtained.

Description

Dynamic competition window adjusting method, device and equipment based on Q learning
Technical Field
The present invention relates to the field of wireless communication technologies, and in particular, to a method, an apparatus, and a device for adjusting a dynamic contention window based on Q learning.
Background
With the spread of mobile terminal devices, the traffic carried by cellular mobile networks has increased substantially. To meet the diverse service demands of users, cellular mobile networks need more spectrum resources to keep pace with this rapid traffic growth. Because the frequency resources of the licensed bands cannot satisfy these demands, operators have gradually begun to use unlicensed bands for auxiliary access. A Licensed-Assisted Access (LAA) network gives operators a good way to use unlicensed bands. To ensure fair coexistence between LAA and other systems in the unlicensed spectrum, network equipment must use energy detection to determine whether a channel is occupied, and an eNB base station must continuously adjust the size of its contention window to ensure that data is transmitted successfully and correctly. Meanwhile, wireless local area network technology (Wi-Fi for short) can also carry data, and with the spread of Wi-Fi coverage, users can save on mobile data costs while still meeting their service requirements at any time.
The LAA system has a relatively strong protocol mechanism, with feedback mechanisms such as hybrid automatic repeat request (HARQ) and Channel State Information (CSI). Consequently, when the LAA system coexists with another network, for example Wi-Fi, the channel access probability of the LAA system is higher than that of Wi-Fi and the throughput of the LAA channel is higher. The large volume of data carried on the LAA channel easily overloads the channel and causes strong interference to the other wireless access technologies that coexist with the LAA network.
Disclosure of Invention
Embodiments of the present invention provide a Q-learning-based method, apparatus, and device for dynamically adjusting a contention window, so as to adjust the contention window size effectively and dynamically, limit the throughput of the eNB base station of the LAA, and reduce the burden of data transmission. The specific technical scheme is as follows:
An embodiment of the invention provides a Q-learning-based dynamic contention window adjusting method, which is applied to an LAA base station and comprises the following steps:
step A, initializing channel access parameters and an initial annealing temperature; the channel access parameters include: initial contention window size, Q value table, target throughput;
step B, transmitting a data packet and acquiring a first throughput of the data packet transmission under the initial contention window size;
step C, under the first throughput, using a simulated annealing algorithm to obtain a first contention window size;
step D, acquiring a second throughput of the data packet transmission under the first contention window size;
step E, updating the Q-value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain an updated annealing temperature;
and when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature and executing steps B to E again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, whereupon updating of the Q-value table stops and the optimal contention window is obtained.
Specifically, acquiring the second throughput of the data packet transmission under the first contention window size includes:
acquiring, under the first contention window size, the second throughput of the data packet transmission channel through a Markov probability model.
Specifically, updating the Q-value table from the second throughput and the target throughput according to a preset formula includes:
calculating a cost value of the data packet transmission; the cost value is the absolute value of the difference between the second throughput and the target throughput;
according to the cost value, under the first contention window size, updating the Q-value table according to the preset formula to obtain an updated Q-value table, where the preset formula includes $Q(s_1, a_1) + \alpha\,[c + \gamma \min_{a_2} Q(s_2, a_2) - Q(s_1, a_1)]$, where $c = |s_2 - s_0|$, $\alpha$ and $\gamma$ are constants between 0 and 1, $s_0$ represents the target throughput, $s_1$ represents the first throughput, $s_2$ represents the second throughput, $a_1$ represents the first contention window size value, and $a_2$ represents the second contention window size value.
Specifically, the preset condition is that the absolute value of the difference between the second throughput and the target throughput is smaller than a preset threshold.
An embodiment of the invention provides a Q-learning-based dynamic contention window adjusting apparatus, which comprises:
an initialization module, configured to initialize channel access parameters and an initial annealing temperature; the channel access parameters include: initial contention window size, Q value table, target throughput;
a transmission module, configured to transmit a data packet and obtain a first throughput of the data packet transmission under the initial contention window size;
a calculation module, configured to use a simulated annealing algorithm under the first throughput to obtain a first contention window size;
an obtaining module, configured to obtain a second throughput of the data packet transmission under the first contention window size;
an updating module, configured to update the Q-value table from the second throughput and the target throughput according to a preset formula, and to update the initial annealing temperature under a preset condition to obtain an updated annealing temperature;
and a loop module, configured to, when the updated annealing temperature is higher than the minimum threshold, take the updated annealing temperature as the initial annealing temperature and run the transmission module through the updating module again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, whereupon updating of the Q-value table stops and the optimal contention window is obtained.
Specifically, the obtaining module is configured to obtain, through a Markov probability model, the second throughput of the data packet transmission channel under the first contention window size.
Specifically, the update module includes:
a first update sub-module, configured to calculate a cost value of the data packet transmission; the cost value is the absolute value of the difference between the second throughput and the target throughput;
and a second update sub-module, configured to update the Q-value table according to the preset formula under the first contention window size according to the cost value, so as to obtain an updated Q-value table, where the preset formula includes $Q(s_1, a_1) + \alpha\,[c + \gamma \min_{a_2} Q(s_2, a_2) - Q(s_1, a_1)]$, where $c = |s_2 - s_0|$, $\alpha$ and $\gamma$ are constants between 0 and 1, $s_0$ represents the target throughput, $s_1$ represents the first throughput, $s_2$ represents the second throughput, $a_1$ represents the first contention window size value, and $a_2$ represents the second contention window size value.
Specifically, the loop module is configured to update the initial annealing temperature when the absolute value of the difference between the second throughput and the target throughput is smaller than a preset threshold.
An embodiment of the invention provides an electronic device, which comprises a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method steps of adjusting the dynamic contention window based on Q learning as described above when executing the program stored in the memory.
An embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when being executed by a processor, the computer program implements the method steps for adjusting the dynamic contention window based on Q learning as described above.
The embodiments of the invention provide a Q-learning-based dynamic contention window adjusting method, apparatus, and device. The method comprises the following steps: step A, initializing channel access parameters and an initial annealing temperature, where the channel access parameters include: initial contention window size, Q value table, target throughput; step B, under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of the data packet transmission; step C, under the first throughput, using a simulated annealing algorithm to obtain a first contention window size; step D, acquiring a second throughput of the data packet transmission under the first contention window size; step E, updating the Q-value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain an updated annealing temperature; and when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature and executing steps B to E again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, whereupon updating of the Q-value table stops and the optimal contention window is obtained. The embodiment limits the throughput of the LAA channel by setting a target throughput. After the initial annealing temperature is initialized, the annealing temperature is recalculated for each data transmission while the Q-value table is updated; when the annealing temperature is less than or equal to the minimum threshold, updating of the Q-value table stops and the contention window given by the resulting Q-value table is taken as the optimal contention window.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a dynamic contention window adjustment method based on Q learning according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating the sum of throughputs of the LAA and the WiFi systems under different adjustment methods according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a dynamic contention window adjusting apparatus based on Q learning according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an update module according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to effectively limit the throughput of the eNB base station of the LAA, reduce the burden of data transmission, and achieve effective dynamic adjustment of the size of the contention window, embodiments of the present invention provide a dynamic contention window adjustment method, apparatus, and device based on Q learning, which are described in detail below.
Fig. 1 is a flowchart of a dynamic contention window adjustment method based on Q learning according to an embodiment of the present invention.
Step 101, initializing channel access parameters and an initial annealing temperature.
The method provided by this embodiment is applied to an eNB base station of an LAA network. The base station prepares to transmit a data packet under a wireless communication standard and, before transmitting it, initializes the channel access parameters and the initial annealing temperature. The channel access parameters include: initial contention window size, Q value table, target throughput.
The initial contention window size is the contention window size that the base station uses when it transmits the data packet for the first time. In this embodiment it is initialized to 2.
The Q-value table is a parameter of Q-learning, a reinforcement learning method that can be used to find the optimal action-selection strategy for any given Markov decision process. Q-learning is defined in terms of states and actions, and the Q-value table is a matrix in which each row represents a current state of the eNB base station and each column represents an action that the base station may take to reach the next state. In this embodiment, the Q-value table is initialized as a matrix of 5 rows and 15 columns whose elements are all 0.
The target throughput bounds the LAA throughput: setting it keeps the LAA throughput from growing too large, which limits the channel access probability of the LAA and reduces the burden of data transmission.
The initial annealing temperature is a parameter of the simulated annealing algorithm, which can be used to solve optimization problems. To obtain the optimal contention window, an initial annealing temperature is set and then lowered step by step until it reaches a minimum threshold, here set to 1. In this step, the initial annealing temperature is initialized to 1000 ℃.
Step 102, transmitting the data packet and obtaining a first throughput of data packet transmission under the size of the initial contention window.
The LAA base station transmits the data packet under the initial contention window size, and the base station server obtains the first throughput of the current data packet transmission. After the channel access parameters are initialized, each LAA base station has an initial contention strategy, which includes the throughput of the current contention window, that is, the throughput corresponding to the initial contention window. The first throughput of the LAA base station's data transmission channel under the initial contention window size may be obtained by dividing the number of bits in the transmitted data packet by the difference between the receiving time and the transmitting time.
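As a minimal sketch of this calculation, the throughput can be computed as bits divided by elapsed time. The function name and the example numbers are illustrative assumptions, not values from the patent.

```python
def packet_throughput(bits_transmitted, send_time_s, receive_time_s):
    """Return throughput in bits per second: bits / (receive time - send time)."""
    elapsed = receive_time_s - send_time_s
    if elapsed <= 0:
        raise ValueError("receive time must be later than send time")
    return bits_transmitted / elapsed


# Hypothetical example: a 12,000-bit packet sent at t = 0 s and received 0.5 ms later.
example_bps = packet_throughput(12_000, 0.0, 0.0005)
```

The same helper would apply unchanged to the second throughput measured later under the first contention window size.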
Step 103, under the first throughput, a simulated annealing algorithm is adopted to obtain the size of the first contention window.
After the first throughput under the initial contention window size is obtained, the current state of the base station is determined. Denoting the current first throughput as s1, the base station randomly selects a contention window size a1 and, from the Q-value table, also selects the contention window size a2 that has the minimum Q value under s1. If Q(s1, a1) is less than Q(s1, a2), the base station takes the randomly selected size a1 as the first contention window size. If Q(s1, a1) is greater than or equal to Q(s1, a2), the base station takes a1 as the first contention window size with a probability given by an exponential function whose exponent is the quotient of the difference between the two Q values and the current annealing temperature; otherwise, it takes a2 as the first contention window size.
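The selection rule above can be sketched as follows. The text does not give the sign of the exponent, so this sketch assumes the usual Metropolis form exp(-(Q(s1, a1) - Q(s1, a2)) / T); the function name and the `rng` parameter are likewise illustrative assumptions.

```python
import math
import random


def select_contention_window(q_row, temperature, rng=random):
    """Pick a column index from one row of the Q-value table.

    q_row holds the Q values (costs) of every contention window size
    for the current state; a lower Q value is better.
    """
    random_col = rng.randrange(len(q_row))                        # candidate a1
    greedy_col = min(range(len(q_row)), key=q_row.__getitem__)    # candidate a2
    if q_row[random_col] < q_row[greedy_col]:
        return random_col
    # Assumption: Metropolis-style acceptance probability exp(-gap / T).
    gap = q_row[random_col] - q_row[greedy_col]
    if rng.random() < math.exp(-gap / temperature):
        return random_col
    return greedy_col
```

At a high temperature the acceptance probability is close to 1, so the random candidate is usually kept and the table is explored; as the temperature falls, the minimum-Q candidate is chosen more and more often.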
Step 104, acquiring a second throughput of the data packet transmission under the first contention window size.
The LAA base station transmits the data packet under the first contention window size, and the base station server obtains the second throughput of the current data packet transmission. The second throughput of the LAA base station's data transmission channel under the first contention window size may be obtained by dividing the number of bits in the transmitted data packet by the difference between the receiving time and the transmitting time.
A specific method for obtaining the second throughput of the data packet transmission channel under the first contention window size is to obtain it through a Markov probability model. The LAA base station continues to transmit data packets under the first contention window size; using the Markov probability model, the current channel access probability is calculated from the transition probabilities between the different states of the model and the state transition equation, and the second throughput is obtained from the channel access probability.
Step 105, updating the Q-value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain the updated annealing temperature.
After the second throughput is obtained, the values of the elements in the Q-value table are updated from the second throughput and the target throughput according to the preset formula. Suppose that under the first contention window size a1 the first throughput is s1 and the obtained second throughput is s2, and that the second contention window size selected according to s2 is a2. According to s2, a2, and the target throughput s0, the Q-value table is updated by the preset formula to $Q(s_1, a_1) + \alpha\,[c + \gamma \min_{a_2} Q(s_2, a_2) - Q(s_1, a_1)]$, where $c = |s_2 - s_0|$; $\alpha$ is a parameter between 0 and 1 that represents the learning rate and controls how fast the Q value is updated; $\gamma$ is a parameter between 0 and 1 that represents the influence of the current Q value on the future; and $\min_{a_2} Q(s_2, a_2)$ denotes the minimum Q value under s2, which is attained at contention window size a2.
A specific method for updating the Q-value table from the second throughput and the target throughput according to the preset formula comprises the following steps: calculating a cost value of the data packet transmission, the cost value being the absolute value of the difference between the second throughput and the target throughput; and, according to the cost value, updating the Q-value table according to the preset formula under the first contention window size to obtain an updated Q-value table. In other words, the absolute value of the difference between the obtained second throughput and the target throughput is taken as the cost value c and substituted directly into the update under the first contention window size. Updating the Q-value table with this cost value reflects more intuitively the gap between the throughput of each data transmission and the target throughput.
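A minimal sketch of this update, with the Q-value table held as a list of rows. The function name, the row and column indices, and the default values of alpha and gamma are illustrative assumptions; the source only states that both constants lie between 0 and 1.

```python
def update_q_value(q_table, state, action, next_state,
                   second_throughput, target_throughput,
                   alpha=0.5, gamma=0.9):
    """Apply Q(s1,a1) <- Q(s1,a1) + alpha*[c + gamma*min Q(s2,.) - Q(s1,a1)] in place."""
    cost = abs(second_throughput - target_throughput)   # c = |s2 - s0|
    best_next = min(q_table[next_state])                # minimum Q value under s2
    old_value = q_table[state][action]
    q_table[state][action] = old_value + alpha * (cost + gamma * best_next - old_value)
    return q_table[state][action]


# Hypothetical example on a 5 x 15 table of zeros.
q_table = [[0.0] * 15 for _ in range(5)]
first_update = update_q_value(q_table, 0, 3, 2, 48.0, 45.0)
```

Because the cost is a distance from the target throughput, lower Q values are better here, which is why the update takes a minimum rather than the maximum used in reward-based Q-learning.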
In the embodiment of the present invention, the Q-value table is a matrix with 5 rows and 15 columns. Each row represents a throughput range: the first row is s1, the second row is s2, the third row is s3, the fourth row is s4, and the fifth row is s5, where s1 is less than 30 Mbps, s2 is from 30 Mbps to 40 Mbps, s3 is from 40 Mbps to 50 Mbps, s4 is from 50 Mbps to 60 Mbps, and s5 is greater than or equal to 60 Mbps. Each column represents a contention window size, and the first through fifteenth columns correspond to sizes 2 through 16, respectively.
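The table layout can be sketched as below. The text does not say which interval a boundary value such as 40 belongs to, so treating each lower bound as inclusive (consistent with the stated "greater than or equal to 60" for the fifth row) is an assumption, as are all names.

```python
# Columns 1..15 of the Q-value table correspond to contention window sizes 2..16.
WINDOW_SIZES = list(range(2, 17))
# Boundaries between the five throughput rows, in the document's throughput unit.
# Assumption: a value equal to a boundary belongs to the higher row.
STATE_BOUNDS = [30, 40, 50, 60]


def state_index(throughput):
    """Return the zero-based row index (0..4) for a throughput value."""
    return sum(throughput >= bound for bound in STATE_BOUNDS)


def new_q_table():
    """Return a 5 x 15 Q-value table with every element set to 0."""
    return [[0.0] * len(WINDOW_SIZES) for _ in range(len(STATE_BOUNDS) + 1)]
```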
Step 106, when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature and executing steps 102 to 105 again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, whereupon updating of the Q-value table stops and the optimal contention window is obtained.
After the updated Q-value table is obtained, if the current throughput lies in the target throughput interval, the current annealing temperature is lowered to $T_2 = \beta T_1$ ($0 < \beta < 1$), where $T_1$ is the current annealing temperature and $T_2$ is the lowered annealing temperature. $T_2$ is compared with the minimum temperature threshold: when $T_2$ is greater than the minimum threshold, steps 102 to 105 are executed again and the Q-value table is updated repeatedly; once the updated annealing temperature is less than or equal to the minimum threshold, updating of the Q-value table stops. The state of the base station is then determined from the current throughput: for example, when the throughput is larger, the LAA base station is transmitting more data, and when the throughput is smaller, it is transmitting less. In the corresponding state, that is, in the corresponding row of the Q-value table, the smallest Q value is found; the contention window corresponding to that Q value is the optimal contention window.
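The cooling rule and the final lookup can be sketched as two small helpers. The names and the default beta = 0.9 are illustrative assumptions; the source only requires beta to lie strictly between 0 and 1.

```python
def lower_temperature(temperature, beta=0.9):
    """Return T2 = beta * T1, with 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return beta * temperature


def optimal_contention_window(q_row, window_sizes):
    """Return the window size whose Q value is smallest in the given row."""
    best_column = min(range(len(q_row)), key=q_row.__getitem__)
    return window_sizes[best_column]
```

Here `q_row` is the row of the Q-value table that matches the base station's current throughput state; when several entries tie for the minimum, this sketch returns the first of them.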
A specific method for updating the initial annealing temperature under the preset condition to obtain the updated annealing temperature is as follows. The preset condition is that the absolute value of the difference between the second throughput and the target throughput is smaller than a preset threshold; when it holds, the LAA base station calculates the updated annealing temperature using the simulated annealing algorithm and updates the annealing temperature. In this embodiment, the preset threshold may be set to 5 and the target throughput is 45 Mbps, so the annealing temperature can be updated when the second throughput is between 40 Mbps and 50 Mbps. After the Q-value table is updated, the current annealing temperature is calculated by the simulated annealing algorithm; when the updated annealing temperature is greater than the minimum threshold, it is used as the initial annealing temperature and the calculation continues until the updated annealing temperature is less than or equal to the minimum threshold, at which point updating of the Q-value table stops. In this embodiment, the minimum threshold may be set to 1 ℃ and the initial annealing temperature is set to 1000 ℃, so the Q-value table keeps being updated until the annealing temperature is less than or equal to 1 ℃.
When the updated annealing temperature is less than or equal to the minimum threshold, updating of the Q-value table stops and, with the current throughput of the base station in the target throughput interval, the contention window with the minimum Q value in the corresponding row of the Q-value table is determined as the optimal contention window.
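The preset condition and the stopping rule can be sketched as follows, using the embodiment's values (target 45, preset threshold 5, initial temperature 1000, minimum threshold 1). The function names and the cooling factor beta are illustrative assumptions, since the source gives no value for beta.

```python
def update_temperature(temperature, second_throughput,
                       target=45.0, threshold=5.0, beta=0.9):
    """Cool only when |second throughput - target| is below the preset threshold."""
    if abs(second_throughput - target) < threshold:
        return beta * temperature
    return temperature


def cooling_steps(t_initial=1000.0, t_min=1.0, beta=0.9):
    """Count how many coolings bring the temperature to t_min or below."""
    steps = 0
    temperature = t_initial
    while temperature > t_min:
        temperature *= beta
        steps += 1
    return steps
```

With a hypothetical beta of 0.5, for instance, ten coolings take the temperature from 1000 to just under 1. Rounds whose throughput misses the target interval leave the temperature unchanged, so the number of transmission rounds can be much larger than the number of coolings.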
The Q-learning-based dynamic contention window adjusting method provided by the embodiment of the invention comprises the following steps: step 101, initializing channel access parameters and an initial annealing temperature, where the channel access parameters include: initial contention window size, Q value table, target throughput; step 102, under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of the data packet transmission; step 103, under the first throughput, using a simulated annealing algorithm to obtain a first contention window size; step 104, acquiring a second throughput of the data packet transmission under the first contention window size; step 105, updating the Q-value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain the updated annealing temperature; and step 106, when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature and executing steps 102 to 105 again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, whereupon updating of the Q-value table stops and the optimal contention window is obtained. The embodiment limits the throughput of the LAA channel by setting a target throughput. After the initial annealing temperature is initialized, the annealing temperature is recalculated for each data transmission while the Q-value table is updated; when the annealing temperature is less than or equal to the minimum threshold, updating of the Q-value table stops and the contention window given by the resulting Q-value table is taken as the optimal contention window.
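Steps 101 to 106 can be put together in one hedged sketch. Several things here are assumptions rather than part of the patent: the caller-supplied `measure_throughput` function (a stand-in for real transmission or for the Markov model), the values of alpha, gamma and beta, the Metropolis-style acceptance probability, the inclusive lower bounds of the throughput rows, and the `max_rounds` safety limit.

```python
import math
import random

WINDOW_SIZES = list(range(2, 17))        # columns: contention window sizes 2..16
STATE_BOUNDS = [30.0, 40.0, 50.0, 60.0]  # row boundaries (lower bound inclusive: assumption)


def state_of(throughput):
    """Map a throughput value to a row index 0..4 of the Q-value table."""
    return sum(throughput >= bound for bound in STATE_BOUNDS)


def adjust_contention_window(measure_throughput, target=45.0, threshold=5.0,
                             t_initial=1000.0, t_min=1.0, alpha=0.5, gamma=0.9,
                             beta=0.9, max_rounds=10_000, rng=None):
    """Repeat steps 102-105 until the temperature reaches t_min (or max_rounds).

    Returns (optimal window size, Q-value table, final temperature).
    """
    rng = rng or random.Random()
    # Step 101: 5 x 15 table of zeros, initial window size 2, initial temperature.
    q_table = [[0.0] * len(WINDOW_SIZES) for _ in range(len(STATE_BOUNDS) + 1)]
    column = 0
    temperature = t_initial
    current = measure_throughput(WINDOW_SIZES[column])
    for _ in range(max_rounds):          # safety limit, not part of the source
        if temperature <= t_min:
            break
        # Step 102: first throughput under the current window size.
        current = measure_throughput(WINDOW_SIZES[column])
        row = q_table[state_of(current)]
        # Step 103: simulated-annealing choice between a random and a minimum-Q column.
        random_col = rng.randrange(len(row))
        greedy_col = min(range(len(row)), key=row.__getitem__)
        gap = row[random_col] - row[greedy_col]
        if gap < 0 or rng.random() < math.exp(-gap / temperature):
            column = random_col
        else:
            column = greedy_col
        # Step 104: second throughput under the newly chosen window size.
        current = measure_throughput(WINDOW_SIZES[column])
        # Step 105: Q-value update with cost c = |s2 - s0|, then conditional cooling.
        cost = abs(current - target)
        best_next = min(q_table[state_of(current)])
        row[column] += alpha * (cost + gamma * best_next - row[column])
        if cost < threshold:
            temperature *= beta
    # Step 106: the minimum Q value in the row of the current state gives the result.
    final_row = q_table[state_of(current)]
    best_col = min(range(len(final_row)), key=final_row.__getitem__)
    return WINDOW_SIZES[best_col], q_table, temperature
```

A call such as `adjust_contention_window(lambda cw: 60.0 - 2.0 * cw, rng=random.Random(0))` exercises the loop with a purely hypothetical throughput model; it says nothing about real LAA behaviour. Entries that were never visited keep their initial value of 0, so with few rounds the final lookup can land on an unexplored window size.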
To further illustrate the effectiveness of the method provided by the embodiment of the present invention, take the coexistence of the LAA and WiFi systems as an example. Fig. 2 shows a schematic diagram of the sum of the throughputs of the LAA and WiFi systems under different adjustment methods. The abscissa represents the throughput of the LAA and WiFi systems, and the ordinate represents the cumulative distribution function (CDF), that is, a probability value. In the figure, curve 201 represents the sum of the throughputs of the LAA and WiFi systems when the static back-off adjustment method is applied, curve 202 represents the sum when the exponential back-off adjustment method is applied, and curve 203 represents the sum when the method provided by the embodiment of the present invention is applied. As can be seen from Fig. 2, the sum of the throughputs of the LAA and WiFi systems is largest with the method of the present invention, which indicates that the WiFi throughput increases when the LAA throughput decreases, so that the combined throughput of the two increases.
Fig. 2 illustrates that, by combining the reinforcement-learning-based Q-learning algorithm with the simulated annealing algorithm in a dynamic contention window adjustment method, and by setting the target throughput and the initial annealing temperature of the LAA base station, the eNB contention window size is adjusted dynamically. The throughput of the LAA base station is thereby limited within a certain range, more channel access opportunities are left to the WiFi system, and the coexistence fairness of the LAA and WiFi systems is improved.
The embodiment of the invention provides a dynamic competition window adjusting device based on Q learning, and the structural schematic diagram of the device is shown in figure 3 and comprises the following components:
an initialization module 301, configured to initialize a channel access parameter and an initial annealing temperature; the channel access parameters include: initial contention window size, Q value table, target throughput;
a transmission module 302, configured to transmit a data packet and obtain a first throughput of data packet transmission in the initial contention window size;
a calculating module 303, configured to adopt a simulated annealing algorithm at the first throughput to obtain a first contention window size;
an obtaining module 304, configured to obtain a second throughput of data packet transmission in the first contention window size;
an updating module 305, configured to update the Q value table according to a preset formula for the second throughput and the target throughput, and update the initial annealing temperature by using a preset condition, so as to obtain an updated annealing temperature;
and a circulation module 306, configured to, when the updated annealing temperature is greater than the minimum threshold, use the updated annealing temperature as the initial annealing temperature, execute the transmission module to the update module, repeatedly update the Q-value table and obtain the updated annealing temperature, and stop updating the Q-value table until the updated annealing temperature is less than or equal to the minimum threshold, to obtain the optimal contention window.
Specifically, the obtaining module 304 is specifically configured to obtain, through a markov probability model, a second throughput of the data packet transmission channel under the first contention window size.
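The text does not detail the Markov probability model at this point. Purely as a hypothetical stand-in, the sketch below uses a simplified slotted contention model with a constant contention window, in which a station whose backoff counter is drawn uniformly from 0 to W - 1 attempts transmission in a slot with probability 2 / (W + 1). The function names, the per-slot success probability used as a throughput proxy, and the assumption that all other stations share one window size are assumptions of this example, not the model of the embodiment.

```python
def attempt_probability(cw):
    """Per-slot attempt probability for a constant contention window cw."""
    return 2.0 / (cw + 1)


def success_probability(cw_laa, cw_others, n_others):
    """Probability that the LAA node transmits alone in a given slot.

    Used here only as a rough proxy for the LAA throughput; n_others is the
    number of competing stations, all assumed to use window size cw_others.
    """
    tau_laa = attempt_probability(cw_laa)
    tau_other = attempt_probability(cw_others)
    return tau_laa * (1.0 - tau_other) ** n_others


# A larger LAA window lowers the LAA share of successful slots.
small_window = success_probability(16, 16, 5)
large_window = success_probability(64, 16, 5)
```

Under this proxy, enlarging the LAA contention window reduces the LAA node's access probability, which is the lever the method uses to keep the LAA throughput near the target.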
Specifically, as shown in the structural diagram of fig. 4, the update module 305 includes:
a first update submodule 401, configured to calculate a cost value for packet transmission; the cost value is the absolute value of the difference between the second throughput and the target throughput;
a second update submodule 402, specifically configured to update the Q-value table according to a preset formula under the first contention window size according to the cost value, so as to obtain an updated Q-value table, where the preset formula includes Q(s1, a1) + α[c + γ min Q(s2, a2) - Q(s1, a1)], where c = |s2 - s0|, α and γ are constants between 0 and 1, s0 represents the target throughput, s1 represents the first throughput, s2 represents the second throughput, a1 represents the first contention window size value, and a2 represents the second contention window size value under the second throughput s2.
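As a worked illustration of a single table update, the sketch below evaluates the preset formula in the form stated in claim 1, Q(s1, a1) + α[c + γ min Q(s2, a2) - Q(s1, a1)] with c = |s2 - s0|. All numeric values (throughputs, old Q value, α, γ) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def q_update(q_old, cost, min_q_next, alpha, gamma):
    """One update of a single Q-value table entry using the preset formula."""
    return q_old + alpha * (cost + gamma * min_q_next - q_old)


s0 = 20.0              # target throughput (hypothetical, e.g., Mbps)
s2 = 23.0              # second throughput measured under window a1
cost = abs(s2 - s0)    # c = |s2 - s0| = 3.0

# Old entry Q(s1, a1) = 1.0; smallest entry in the row of state s2 is 0.5.
q_new = q_update(q_old=1.0, cost=cost, min_q_next=0.5, alpha=0.5, gamma=0.9)
# 1.0 + 0.5 * (3.0 + 0.9 * 0.5 - 1.0) is approximately 2.225
```

The entry grows when the measured throughput is far from the target, so window sizes that keep the throughput close to the target end up with the smallest Q values.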
The dynamic competition window adjusting device based on Q learning provided by the embodiment of the invention comprises: initializing channel access parameters and initial annealing temperature; wherein, the channel access parameters include: initial contention window size, Q value table, target throughput; under the size of an initial contention window, the LAA base station transmits a data packet and acquires a first throughput of data packet transmission; under the first throughput, the LAA base station adopts a simulated annealing algorithm to obtain the size of a first contention window; acquiring a second throughput of data packet transmission under the size of the first contention window; updating the Q value table according to a preset formula for the second throughput and the target throughput, and updating the initial annealing temperature by using preset conditions to obtain the updated annealing temperature; and when the updated annealing temperature is higher than the minimum threshold value, taking the updated annealing temperature as the initial annealing temperature, executing the steps, repeatedly updating the Q value table and obtaining the updated annealing temperature, and stopping updating the Q value table until the updated annealing temperature is lower than or equal to the minimum threshold value, thus obtaining the optimal competition window. The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput, initializes the initial annealing temperature, continuously calculates the annealing temperature of each data transmission when continuously updating the Q value table, stops updating the Q value table when the annealing temperature is less than or equal to the minimum threshold value, and takes the competition window of the obtained Q value table as the optimal competition window.
An embodiment of the present invention provides an electronic device, as shown in fig. 5, including a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
the processor is used for realizing the following method steps of adjusting the dynamic competition window based on the Q learning when the program stored in the memory is executed:
step A, initializing channel access parameters and initial annealing temperature; wherein the channel access parameters include: initial contention window size, Q value table, target throughput;
step B, transmitting the data packet and acquiring a first throughput of data packet transmission under the size of the initial contention window;
step C, under the first throughput, adopting a simulated annealing algorithm to obtain the size of a first contention window;
step D, acquiring a second throughput of data packet transmission under the size of the first contention window;
step E, updating the Q value table according to the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature to obtain the updated annealing temperature;
and when the updated annealing temperature is higher than the minimum threshold value, executing the steps B to E, repeatedly updating the Q value table until the updated annealing temperature is lower than or equal to the minimum threshold value, and stopping updating the Q value table to obtain the optimal competition window.
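The temperature update in step E and the stopping test that follows it can be sketched as below. The preset condition follows the claims (the absolute difference between the second throughput and the target throughput is smaller than a preset threshold); the geometric cooling factor and the function names are assumptions made for this example, since the text does not state how the temperature is lowered.

```python
def update_temperature(temperature, s2, s0, threshold, cooling=0.9):
    """Lower the temperature only when |s2 - s0| is below the threshold.

    The multiplicative cooling factor is an assumed schedule.
    """
    if abs(s2 - s0) < threshold:
        return temperature * cooling
    return temperature


def should_continue(temperature, t_min):
    """Steps B to E repeat only while the temperature exceeds the minimum."""
    return temperature > t_min


# Hypothetical values: target 20.0, threshold 2.0.
cooled = update_temperature(1.0, s2=21.0, s0=20.0, threshold=2.0)     # 0.9
unchanged = update_temperature(1.0, s2=25.0, s0=20.0, threshold=2.0)  # 1.0
```

Tying the cooling to the preset condition means exploration of window sizes continues at the same temperature until the throughput has come close to the target.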
The electronic device provided by the embodiment of the invention limits the throughput of LAA channel transmission and reduces the transmission load through the following steps: initializing channel access parameters and an initial annealing temperature, wherein the channel access parameters include an initial contention window size, a Q value table, and a target throughput; under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of the data packet transmission; under the first throughput, the LAA base station adopts a simulated annealing algorithm to obtain a first contention window size; acquiring a second throughput of the data packet transmission under the first contention window size; updating the Q value table according to a preset formula using the second throughput and the target throughput, and updating the initial annealing temperature under a preset condition to obtain an updated annealing temperature; and when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature, executing the above steps again, and repeatedly updating the Q value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, at which point updating of the Q value table stops and the optimal contention window is obtained.
The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput, initializes the initial annealing temperature, continuously calculates the annealing temperature of each data transmission when continuously updating the Q value table, stops updating the Q value table when the annealing temperature is less than or equal to the minimum threshold value, and takes the competition window of the obtained Q value table as the optimal competition window.
An embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for adjusting a dynamic contention window based on Q learning as above is implemented.
The computer-readable storage medium provided by the embodiment of the invention comprises: initializing channel access parameters and initial annealing temperature; wherein, the channel access parameters include: initial contention window size, Q value table, target throughput; under the size of an initial contention window, the LAA base station transmits a data packet and acquires a first throughput of data packet transmission; under the first throughput, adopting a simulated annealing algorithm to obtain the size of a first competition window; acquiring a second throughput of data packet transmission under the size of the first contention window; updating the Q value table according to a preset formula for the second throughput and the target throughput, and updating the initial annealing temperature by using preset conditions to obtain the updated annealing temperature; and when the updated annealing temperature is higher than the minimum threshold value, taking the updated annealing temperature as the initial annealing temperature, executing the steps, repeatedly updating the Q value table and obtaining the updated annealing temperature, and stopping updating the Q value table until the updated annealing temperature is lower than or equal to the minimum threshold value, thus obtaining the optimal competition window. The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput, initializes the initial annealing temperature, continuously calculates the annealing temperature of each data transmission when continuously updating the Q value table, stops updating the Q value table when the annealing temperature is less than or equal to the minimum threshold value, and takes the competition window of the obtained Q value table as the optimal competition window.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It should be noted that the apparatus, the electronic device, and the storage medium according to the embodiments of the present invention are respectively an apparatus, an electronic device, and a storage medium to which the dynamic contention window adjustment method based on Q learning is applied, and all embodiments of the dynamic contention window adjustment method based on Q learning are applicable to the apparatus, the electronic device, and the storage medium, and can achieve the same or similar beneficial effects.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A dynamic contention window adjustment method based on Q learning is applied to an LAA base station, and the method comprises the following steps:
step A, initializing channel access parameters and initial annealing temperature; the channel access parameters include: an initial contention window size, a Q-value table, a target throughput, the initial annealing temperature being a parameter of a simulated annealing algorithm;
step B, transmitting a data packet and acquiring a first throughput of data packet transmission under the size of the initial contention window;
step C, under the first throughput, adopting a simulated annealing algorithm to obtain the size of a first contention window;
step D, acquiring a second throughput of the data packet transmission under the size of the first contention window;
step E, updating the Q-value table according to a preset formula for the second throughput and the target throughput, and updating the initial annealing temperature by using a preset condition, so as to obtain an updated annealing temperature, where the preset formula includes Q(s1, a1) + α[c + γ min Q(s2, a2) - Q(s1, a1)], where c = |s2 - s0|, α and γ are constants between 0 and 1, s0 represents the target throughput, s1 represents the first throughput, s2 represents the second throughput, a1 represents the first contention window size value, a2 represents the second contention window size value, and the preset condition includes that an absolute value of a difference between the second throughput and the target throughput is smaller than a preset threshold;
and when the updated annealing temperature is higher than the minimum threshold value, taking the updated annealing temperature as the initial annealing temperature, executing the steps B to E, repeatedly updating the Q value table and obtaining the updated annealing temperature, and stopping updating the Q value table until the updated annealing temperature is lower than or equal to the minimum threshold value, thus obtaining the optimal competition window.
2. The method of claim 1, wherein obtaining the second throughput of the data packet transmission channel at the first contention window size comprises:
and under the size of the first competition window, acquiring the second throughput of the data packet transmission channel through a Markov probability model.
3. The method of claim 2, wherein said updating said Q-value table with said second throughput and said target throughput according to a predetermined formula comprises:
calculating a cost value for transmission of the data packets; the cost value is an absolute value of a difference between the second throughput and the target throughput;
and updating the Q value table according to a preset formula under the size of the first competition window according to the cost value to obtain an updated Q value table.
4. An apparatus for adjusting a dynamic contention window based on Q learning, the apparatus comprising:
the initialization module is used for initializing channel access parameters and initial annealing temperature; the channel access parameters include: an initial contention window size, a Q-value table, a target throughput, the initial annealing temperature being a parameter of a simulated annealing algorithm;
a transmission module, configured to transmit a data packet and obtain a first throughput of the data packet transmission in the size of the initial contention window;
the calculation module is used for adopting a simulated annealing algorithm under the first throughput to obtain the size of a first competition window;
an obtaining module, configured to obtain a second throughput of the data packet transmission in the size of the first contention window;
an updating module, configured to update the Q-value table according to a preset formula for the second throughput and the target throughput, and update the initial annealing temperature by using a preset condition, so as to obtain an updated annealing temperature, where the preset formula includes Q(s1, a1) + α[c + γ min Q(s2, a2) - Q(s1, a1)], where c = |s2 - s0|, α and γ are constants between 0 and 1, s0 represents a target throughput, s1 represents a first throughput, s2 represents a second throughput, a1 represents a first contention window size value, a2 represents a second contention window size value, and the preset condition includes that an absolute value of a difference between the second throughput and the target throughput is smaller than a preset threshold;
and the circulation module is used for taking the updated annealing temperature as the initial annealing temperature when the updated annealing temperature is higher than the minimum threshold value, executing the transmission module to the updating module, repeatedly updating the Q value table and obtaining the updated annealing temperature, and stopping updating the Q value table until the updated annealing temperature is lower than or equal to the minimum threshold value to obtain the optimal competition window.
5. The apparatus of claim 4, wherein the obtaining module is specifically configured to obtain the second throughput of the data packet transmission channel via a Markov probability model at the first contention window size.
6. The apparatus of claim 5, wherein the update module comprises:
a first update sub-module, specifically configured to calculate a cost value of the packet transmission; the cost value is an absolute value of a difference between the second throughput and the target throughput;
and the second updating submodule is specifically configured to update the Q value table according to a preset formula under the size of the first contention window according to the cost value, so as to obtain an updated Q value table.
7. The apparatus of claim 6, wherein the cycling module is specifically configured to update the initial annealing temperature when an absolute value of a difference between the second throughput and a target throughput is less than a preset threshold.
8. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-3.
9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-3.
CN201710537493.6A 2017-07-04 2017-07-04 Dynamic competition window adjusting method, device and equipment based on Q learning Active CN107426772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710537493.6A CN107426772B (en) 2017-07-04 2017-07-04 Dynamic competition window adjusting method, device and equipment based on Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710537493.6A CN107426772B (en) 2017-07-04 2017-07-04 Dynamic competition window adjusting method, device and equipment based on Q learning

Publications (2)

Publication Number Publication Date
CN107426772A CN107426772A (en) 2017-12-01
CN107426772B true CN107426772B (en) 2020-01-03

Family

ID=60426803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710537493.6A Active CN107426772B (en) 2017-07-04 2017-07-04 Dynamic competition window adjusting method, device and equipment based on Q learning

Country Status (1)

Country Link
CN (1) CN107426772B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108924944B (en) * 2018-07-19 2021-09-14 重庆邮电大学 LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm
CN109445903B (en) * 2018-09-12 2022-03-29 华南理工大学 Cloud computing energy-saving scheduling implementation method based on QoS feature discovery
CN109803338B (en) * 2019-02-12 2021-03-12 南京邮电大学 Dual-connection base station selection method based on regret degree
CN111637444B (en) * 2020-06-05 2021-10-22 沈阳航空航天大学 Nuclear power steam generator water level control method based on Q learning
CN112637965B (en) * 2020-12-30 2022-06-10 上海交通大学 Game-based Q learning competition window adjusting method, system and medium
CN113316156B (en) * 2021-05-26 2022-07-12 重庆邮电大学 Intelligent coexistence method on unlicensed frequency band

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101466111A (en) * 2009-01-13 2009-06-24 中国人民解放军理工大学通信工程学院 Dynamic spectrum access method based on policy planning constrain Q study
CN105338652A (en) * 2015-09-25 2016-02-17 宇龙计算机通信科技(深圳)有限公司 Channel detection method based on contention window and device thereof
CN105636233A (en) * 2015-12-11 2016-06-01 山东闻远通信技术有限公司 LBT (Listen Before Talk) mechanism which synchronously takes uplink and downlink into consideration in LAA (Licensed-Assisted Access) system
CN106332094A (en) * 2016-09-19 2017-01-11 重庆邮电大学 Q algorithm-based dynamic duty ratio coexistence method for LTE-U and Wi-Fi systems in unauthorized frequency band
CN106656430A (en) * 2015-10-28 2017-05-10 中兴通讯股份有限公司 Listen before talk (LBT) parameter processing method, contention window adjusting method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9412075B2 (en) * 2013-08-23 2016-08-09 Vmware, Inc. Automated scaling of multi-tier applications using reinforced learning

Also Published As

Publication number Publication date
CN107426772A (en) 2017-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant