CN107426772B

CN107426772B - Dynamic competition window adjusting method, device and equipment based on Q learning

Info

Publication number: CN107426772B
Application number: CN201710537493.6A
Authority: CN
Inventors: 田辉; 闫晓婧; 秦城; 范绍帅
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2017-07-04
Filing date: 2017-07-04
Publication date: 2020-01-03
Anticipated expiration: 2037-07-04
Also published as: CN107426772A

Abstract

The embodiment of the invention provides a dynamic contention window adjusting method, device and equipment based on Q learning, wherein the method comprises the following steps: A. initializing channel access parameters and an initial annealing temperature; B. transmitting a data packet and acquiring a first throughput of data packet transmission under the initial contention window size; C. under the first throughput, adopting a simulated annealing algorithm to obtain a first contention window size; D. acquiring a second throughput of data packet transmission under the first contention window size; E. updating the Q value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain an updated annealing temperature. When the updated annealing temperature is higher than the minimum threshold, the updated annealing temperature is taken as the initial annealing temperature and steps B to E are executed again, repeatedly updating the Q value table and the annealing temperature; when the updated annealing temperature is lower than or equal to the minimum threshold, updating of the Q value table stops and the optimal contention window is obtained.

Description

Dynamic competition window adjusting method, device and equipment based on Q learning

Technical Field

The present invention relates to the field of wireless communication technologies, and in particular, to a method, an apparatus, and a device for adjusting a dynamic contention window based on Q learning.

Background

With the popularization of mobile terminal devices, the amount of traffic carried by cellular mobile networks has increased substantially. To meet the diverse service demands of users, cellular mobile networks require more spectrum resources to accommodate the rapid growth of mobile network traffic. The spectrum resources of the licensed band alone cannot meet these demands, and operators have gradually started to use the unlicensed band for auxiliary access. A Licensed-Assisted Access (LAA) network provides a good way for operators to utilize unlicensed frequency bands. To ensure fair coexistence of LAA with other systems in the unlicensed spectrum, network equipment needs to use energy detection to determine whether a channel is occupied, and an eNB base station needs to continuously adjust the size of its contention window to ensure successful and correct transmission of data. Meanwhile, wireless local area network technology (Wi-Fi for short) can also carry data transmission; with the popularization of Wi-Fi, a user can save on mobile data costs while still meeting various service requirements at any time.

The LAA system has a relatively robust protocol mechanism, including feedback mechanisms such as hybrid automatic repeat request (HARQ) and Channel State Information (CSI for short). Consequently, when the LAA system coexists with another network such as Wi-Fi, the channel access probability of the LAA system is higher than that of Wi-Fi and the throughput of the LAA system is higher. The large volume of data transmitted by the LAA system easily overloads the channel and generates strong interference to the other wireless access technologies coexisting with the LAA network.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, and a device for adjusting a dynamic contention window based on Q learning, so as to effectively and dynamically adjust the size of the contention window, effectively limit the throughput of an eNB base station of an LAA, and reduce the burden of data transmission. The specific technical scheme is as follows:

the embodiment of the invention provides a dynamic contention window adjusting method based on Q learning, which is applied to an LAA base station and comprises the following steps:

step A, initializing channel access parameters and initial annealing temperature; the channel access parameters include: initial contention window size, Q value table, target throughput;

step B, transmitting a data packet and acquiring a first throughput of data packet transmission under the size of the initial contention window;

step C, under the first throughput, adopting a simulated annealing algorithm to obtain a first contention window size;

step D, acquiring a second throughput of the data packet transmission under the size of the first contention window;

step E, updating the Q value table according to a preset formula by the second throughput and the target throughput, and updating the initial annealing temperature by using preset conditions to obtain an updated annealing temperature;

and when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature and executing steps B to E again, repeatedly updating the Q value table and obtaining the updated annealing temperature; when the updated annealing temperature is lower than or equal to the minimum threshold, stopping updating the Q value table, thus obtaining the optimal contention window.

Specifically, the obtaining the second throughput of the data packet transmission channel under the size of the first contention window includes:

and under the size of the first competition window, acquiring the second throughput of the data packet transmission channel through a Markov probability model.

Specifically, the updating the Q-value table according to the second throughput and the target throughput by a preset formula includes:

calculating a cost value for transmission of the data packets; the cost value is an absolute value of a difference between the second throughput and the target throughput;

according to the cost value, under the first contention window size, updating the Q-value table according to a preset formula to obtain an updated Q-value table, where the preset formula includes Q(s1, a1) + α[c + γ min Q(s2, a2) − Q(s1, a1)], where c = |s2 − s0|, α and γ are constants between 0 and 1, s0 represents the target throughput, s1 represents the first throughput, s2 represents the second throughput, a1 represents the first contention window size value, and a2 represents the second contention window size value.

Specifically, the preset conditions include: and the absolute value of the difference value between the second throughput and the target throughput is less than a preset threshold value.

The embodiment of the invention provides a dynamic competition window adjusting device based on Q learning, which comprises:

the initialization module is used for initializing channel access parameters and initial annealing temperature; the channel access parameters include: initial contention window size, Q value table, target throughput;

a transmission module, configured to transmit a data packet and obtain a first throughput of the data packet transmission in the size of the initial contention window;

the calculation module is used for adopting a simulated annealing algorithm under the first throughput to obtain the size of a first competition window;

an obtaining module, configured to obtain a second throughput of the data packet transmission in the size of the first contention window;

an updating module, configured to update the Q-value table according to a preset formula for the second throughput and the target throughput, and update the initial annealing temperature by using a preset condition, to obtain an updated annealing temperature;

and the loop module is used for taking the updated annealing temperature as the initial annealing temperature when the updated annealing temperature is higher than the minimum threshold, and executing the transmission module through the updating module again, repeatedly updating the Q value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, whereupon updating of the Q value table stops and the optimal contention window is obtained.

Specifically, the obtaining module is specifically configured to obtain, through a markov probability model, a second throughput of the data packet transmission channel in the size of the first contention window.

Specifically, the update module includes:

a first update sub-module, specifically configured to calculate a cost value of the packet transmission; the cost value is an absolute value of a difference between the second throughput and the target throughput;

and a second updating sub-module, configured to update the Q-value table according to a preset formula under the first contention window size according to the cost value, so as to obtain an updated Q-value table, where the preset formula includes Q(s1, a1) + α[c + γ min Q(s2, a2) − Q(s1, a1)], where c = |s2 − s0|, α and γ are constants between 0 and 1, s0 represents the target throughput, s1 represents the first throughput, s2 represents the second throughput, a1 represents the first contention window size value, and a2 represents the second contention window size value.

Specifically, the loop module is specifically configured to update the initial annealing temperature when an absolute value of a difference between the second throughput and a target throughput is smaller than a preset threshold.

The embodiment of the invention provides electronic equipment, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory finish mutual communication through the communication bus;

the memory is used for storing a computer program;

the processor is configured to implement the method steps of adjusting the dynamic contention window based on Q learning as described above when executing the program stored in the memory.

An embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when being executed by a processor, the computer program implements the method steps for adjusting the dynamic contention window based on Q learning as described above.

The embodiment of the invention provides a dynamic contention window adjusting method, device and equipment based on Q learning, the method comprising the following steps: step A, initializing channel access parameters and an initial annealing temperature, wherein the channel access parameters include an initial contention window size, a Q value table and a target throughput; step B, under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of data packet transmission; step C, under the first throughput, adopting a simulated annealing algorithm to obtain a first contention window size; step D, acquiring a second throughput of data packet transmission under the first contention window size; step E, updating the Q value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain an updated annealing temperature. When the updated annealing temperature is higher than the minimum threshold, the updated annealing temperature is taken as the initial annealing temperature and steps B to E are executed again, repeatedly updating the Q value table and obtaining the updated annealing temperature; when the updated annealing temperature is lower than or equal to the minimum threshold, updating of the Q value table stops and the optimal contention window is obtained. The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput. It initializes the initial annealing temperature, recalculates the annealing temperature for each data transmission while the Q value table is being updated, stops updating the Q value table when the annealing temperature is less than or equal to the minimum threshold, and takes the contention window given by the resulting Q value table as the optimal contention window.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a dynamic contention window adjustment method based on Q learning according to an embodiment of the present invention;

Fig. 2 is a schematic diagram illustrating the sum of throughputs of the LAA and the WiFi systems under different adjustment methods according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a dynamic contention window adjusting apparatus based on Q learning according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an update module according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to effectively limit the throughput of the eNB base station of the LAA, reduce the burden of data transmission, and achieve effective dynamic adjustment of the size of the contention window, embodiments of the present invention provide a dynamic contention window adjustment method, apparatus, and device based on Q learning, which are described in detail below.

Fig. 1 is a flowchart of a dynamic contention window adjustment method based on Q learning according to an embodiment of the present invention.

Step 101, initializing channel access parameters and an initial annealing temperature.

The method provided by the embodiment of the invention is applied to an eNB base station of LAA, the base station prepares to transmit a data packet under a wireless communication standard, and initializes the channel access parameter and the initial annealing temperature before transmitting the data packet. Wherein, the channel access parameters include: initial contention window size, Q value table, target throughput.

The initial contention window size is a contention window size adopted when the base station transmits the data packet for the first time, and in this embodiment, the initial contention window size value is initialized to 2, that is, the contention window size value adopted when the base station transmits the data packet for the first time is 2.

The Q-value table is a parameter of the Q-learning process. Q learning is a reinforcement learning method that can be used to find the optimal action selection strategy for any given Markov decision process, and it is defined in terms of states and actions. The Q-value table is a matrix in which each row represents a current state of the eNB base station and each column represents an action the eNB base station may take to reach the next state. In this embodiment, the Q-value table is initialized to a matrix of 5 rows and 15 columns in which every element is 0.

The target throughput bounds the LAA throughput. Setting a target throughput keeps the throughput of the LAA from growing too large, which limits the channel access probability of the LAA and reduces the burden of data transmission.

The initial annealing temperature is a parameter of the simulated annealing algorithm, which can be used to solve optimization problems. To obtain the optimal contention window, an initial annealing temperature is set and then continuously reduced until it reaches a minimum threshold, here set to 1, at which point the optimal contention window is obtained. In this step, the initial annealing temperature is initialized to 1000 ℃.
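The initialization of step 101 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the names `ChannelAccessParams` and `initialize` are hypothetical, and the column range 2 to 16 and the 45 target value are taken from the example values given later in this description.

```python
from dataclasses import dataclass, field

NUM_STATES = 5                    # rows: throughput intervals
CW_SIZES = list(range(2, 17))     # columns: contention window sizes 2..16 (15 values)


def _zero_q_table():
    # Build independent rows so that updating one row never aliases another.
    return [[0.0] * len(CW_SIZES) for _ in range(NUM_STATES)]


@dataclass
class ChannelAccessParams:
    """Hypothetical container for the channel access parameters of step 101."""
    initial_cw: int = 2                 # initial contention window size
    target_throughput: float = 45.0     # example target used in this embodiment
    q_table: list = field(default_factory=_zero_q_table)


def initialize(initial_temperature=1000.0):
    """Return freshly initialized parameters and the initial annealing temperature."""
    return ChannelAccessParams(), initial_temperature
```

Calling `initialize()` yields a 5 by 15 table of zeros, an initial contention window of 2, and a temperature of 1000.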

Step 102, transmitting the data packet and obtaining a first throughput of data packet transmission under the size of the initial contention window.

The LAA base station transmits the data packet under the initial contention window size, and the base station server obtains the first throughput of the current data packet transmission. After the channel access parameters are initialized, each LAA base station has an initial contention strategy, which includes the throughput under the current contention window, namely the throughput corresponding to the initial contention window. The first throughput of the LAA base station data transmission channel under the initial contention window size may be obtained by dividing the number of bits in the transmitted data packet by the difference between the receiving time and the transmitting time.
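The throughput measurement described above amounts to a single division. The sketch below assumes times in seconds and reports the result in Mbps; the function name and the unit choice are illustrative assumptions, not part of the embodiment.

```python
def measured_throughput_mbps(num_bits, send_time_s, receive_time_s):
    """Throughput = bits transmitted / (receiving time - transmitting time), in Mbps."""
    elapsed = receive_time_s - send_time_s
    if elapsed <= 0:
        raise ValueError("receiving time must be later than transmitting time")
    return num_bits / elapsed / 1e6   # convert bit/s to Mbit/s
```

For example, 12,000,000 bits received 0.25 s after transmission correspond to 48 Mbps.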

Step 103, under the first throughput, a simulated annealing algorithm is adopted to obtain the size of the first contention window.

After the first throughput under the initial contention window size is obtained, the current state of the base station is determined. Given the current first throughput, denoted s1, the base station randomly selects a contention window size a1 and, at the same time, selects from the Q value table the contention window size a2 with the minimum Q value under s1. If Q(s1, a1) is less than Q(s1, a2), the base station takes the randomly selected contention window size a1 as the first contention window size. If Q(s1, a1) is greater than or equal to Q(s1, a2), the base station takes a1 as the first contention window size with a probability given by the exponential of the quotient of the difference between the two Q values and the current annealing temperature; otherwise, the base station takes a2 as the first contention window size.
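The selection rule of step 103 can be sketched as below. The function name is hypothetical, and the sign of the exponent is an assumption: the acceptance probability is taken as exp((Q(s1, a2) − Q(s1, a1)) / T) so that it never exceeds 1.

```python
import math
import random


def select_contention_window(q_row, cw_sizes, temperature, rng=random):
    """Choose a contention window for the current state row by simulated annealing.

    q_row holds the Q values of the current state, one per entry of cw_sizes.
    """
    greedy_idx = min(range(len(q_row)), key=q_row.__getitem__)   # a2: minimum Q value
    random_idx = rng.randrange(len(q_row))                       # a1: random candidate
    q_rand, q_greedy = q_row[random_idx], q_row[greedy_idx]
    if q_rand < q_greedy:
        return cw_sizes[random_idx]
    # Assumed sign convention: a worse candidate is accepted less often as T falls.
    accept_prob = math.exp((q_greedy - q_rand) / temperature)
    if rng.random() < accept_prob:
        return cw_sizes[random_idx]
    return cw_sizes[greedy_idx]
```

At a high temperature nearly every random candidate is accepted, so the base station explores; as the temperature approaches the minimum threshold the choice becomes almost purely greedy.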

Step 104, acquiring a second throughput of the data packet transmission under the first contention window size.

The LAA base station transmits the data packet under the first contention window size, and the base station server obtains the second throughput of the current data packet transmission. The second throughput of the LAA base station data transmission channel under the first contention window size can be obtained by dividing the number of bits in the transmitted data packet by the difference between the receiving time and the transmitting time.

A specific method for obtaining the second throughput of the data packet transmission channel under the first contention window size is to obtain it through a Markov probability model. The LAA base station continues to transmit data packets under the first contention window size; the current channel access probability is calculated from the transition probabilities among the different states of the Markov probability model and its state transition equations, and the second throughput is obtained from the channel access probability.

Step 105, updating the Q value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain the updated annealing temperature.

After the second throughput is obtained, the values of the elements in the Q value table are updated from the second throughput and the target throughput according to a preset formula. Assume that under the first contention window size a1 the first throughput is s1 and the obtained second throughput is s2, and that the second contention window size determined from s2 is a2. From s2, a2 and the target throughput s0, the Q value table is updated according to the preset formula Q(s1, a1) + α[c + γ min Q(s2, a2) − Q(s1, a1)], where c = |s2 − s0|. Here α is a parameter between 0 and 1 representing the learning rate, whose size controls how quickly the Q value is updated; γ is a parameter between 0 and 1 representing the influence of the current Q value on the future; and min Q(s2, a2) denotes the minimum Q value under the second throughput s2, which is attained at contention window size a2.

A specific method for updating the Q value table from the second throughput and the target throughput according to the preset formula is as follows: calculate a cost value for the data packet transmission, the cost value being the absolute value of the difference between the second throughput and the target throughput; then, according to the cost value, update the Q value table according to the preset formula under the first contention window size to obtain the updated Q value table. That is, the absolute value of the difference between the obtained second throughput and the target throughput is substituted directly into the formula as the cost value, denoted c. Updating the Q value table with reference to the cost value more intuitively reflects the difference between the throughput of each data transmission and the target throughput.
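The cost-based update can be sketched as follows. The sketch assumes the usual Q-learning reading in which the value of the preset formula is written back into Q(s1, a1); the function name, the row and column indexing, and the default values of α and γ are illustrative assumptions.

```python
def update_q(q_table, state, action, next_state,
             second_throughput, target_throughput, alpha=0.5, gamma=0.9):
    """Apply Q(s1,a1) + alpha * [c + gamma * min Q(s2, .) - Q(s1,a1)] in place.

    state / next_state are row indices for s1 / s2; action is the column index of a1.
    """
    cost = abs(second_throughput - target_throughput)   # c = |s2 - s0|
    best_next = min(q_table[next_state])                 # min over a2 of Q(s2, a2)
    q_old = q_table[state][action]
    q_table[state][action] = q_old + alpha * (cost + gamma * best_next - q_old)
    return q_table[state][action]
```

With an all-zero table, a second throughput of 48 against a target of 45 gives c = 3, so the first update with α = 0.5 sets the entry to 1.5.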

In the embodiment of the present invention, the Q-value table is a matrix with 5 rows and 15 columns. Each row represents a throughput interval: the first row is s1, the second row is s2, the third row is s3, the fourth row is s4, and the fifth row is s5, where s1 is less than 30 Mbps, s2 is from 30 Mbps to 40 Mbps, s3 is from 40 Mbps to 50 Mbps, s4 is from 50 Mbps to 60 Mbps, and s5 is greater than or equal to 60 Mbps. Each column represents a contention window size, and the first through fifteenth columns correspond to sizes 2 through 16, respectively.
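The row and column layout just described can be expressed as two small lookups. In this sketch the function names are hypothetical, and each interval is assumed to include its lower bound; the text fixes this only for the 30 and 60 boundaries.

```python
import bisect

STATE_BOUNDS = [30, 40, 50, 60]    # boundaries between the five throughput rows
CW_SIZES = list(range(2, 17))      # column 0 is size 2, column 14 is size 16


def throughput_to_row(throughput):
    """Map a throughput to its Q-table row index (0 for s1 up to 4 for s5)."""
    return bisect.bisect_right(STATE_BOUNDS, throughput)


def column_to_cw(column):
    """Map a Q-table column index to its contention window size."""
    return CW_SIZES[column]


def cw_to_column(cw):
    """Map a contention window size back to its Q-table column index."""
    return CW_SIZES.index(cw)
```

A throughput of 45 therefore falls in the third row (index 2), and the initial contention window size 2 is the first column (index 0).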

Step 106, when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature and executing steps 102 to 105 again, repeatedly updating the Q value table and obtaining the updated annealing temperature; when the updated annealing temperature is lower than or equal to the minimum threshold, stopping updating the Q value table, so as to obtain the optimal contention window.

After the updated Q-value table is obtained, if the current throughput is within the interval of the target throughput, the current annealing temperature is decreased to T2 = β·T1 (0 < β < 1), where T1 is the current annealing temperature and T2 is the decreased annealing temperature. T2 is then compared with the minimum threshold. When T2 is greater than the minimum threshold, steps 102 to 105 are executed again and the Q-value table is repeatedly updated; when the updated annealing temperature is less than or equal to the minimum threshold, updating of the Q-value table stops. The state of the base station is then determined from the current throughput: for example, when the throughput is larger, the LAA base station transmits a larger amount of data, and when the throughput is smaller, the LAA base station transmits a smaller amount of data. In the corresponding state, that is, in the corresponding row of the Q-value table, the smallest Q value is found, and the contention window corresponding to that smallest Q value is the optimal contention window.
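The cooling rule and the final lookup can be sketched as two short functions. The names are hypothetical and β = 0.9 is only an example value; the description requires only 0 < β < 1.

```python
CW_SIZES = list(range(2, 17))   # contention window sizes for columns 0..14


def cool(temperature, beta=0.9):
    """Return T2 = beta * T1 for a cooling factor 0 < beta < 1."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    return beta * temperature


def optimal_contention_window(q_table, state_row, cw_sizes=CW_SIZES):
    """Return the window size whose Q value is smallest in the given state row."""
    row = q_table[state_row]
    best_col = min(range(len(row)), key=row.__getitem__)
    return cw_sizes[best_col]
```

Once the temperature has fallen to the minimum threshold, `optimal_contention_window` is called once with the row of the current throughput state.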

A specific method for updating the initial annealing temperature under the preset condition to obtain the updated annealing temperature is as follows. The preset condition is that the absolute value of the difference between the second throughput and the target throughput is smaller than a preset threshold; when it holds, the LAA base station calculates the updated annealing temperature using the simulated annealing algorithm. In this embodiment, the preset threshold may be set to 5 and the target throughput is 45 Mbps, so the annealing temperature can be updated when the second throughput is between 40 Mbps and 50 Mbps. After the Q-value table is updated, the current annealing temperature is calculated by the simulated annealing algorithm. When the updated annealing temperature is greater than the minimum threshold, it is taken as the initial annealing temperature and the calculation continues; when the updated annealing temperature is less than or equal to the minimum threshold, updating of the Q-value table stops. In this embodiment, the minimum threshold may be set to 1 ℃ and the initial annealing temperature is set to 1000 ℃, so the Q-value table keeps being updated until the annealing temperature is less than or equal to 1 ℃.
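To make the numbers of this embodiment concrete, the sketch below checks the preset condition and counts how many cooling updates take the temperature from 1000 down to the minimum threshold of 1. The function names and the cooling factor β = 0.9 are assumptions for illustration; the description does not fix β.

```python
def should_cool(second_throughput, target_throughput=45.0, preset_threshold=5.0):
    """Preset condition: |s2 - s0| is strictly smaller than the preset threshold."""
    return abs(second_throughput - target_throughput) < preset_threshold


def cooling_steps(initial_temperature=1000.0, min_threshold=1.0, beta=0.9):
    """Count the cooling updates needed until the temperature is <= min_threshold."""
    steps = 0
    temperature = initial_temperature
    while temperature > min_threshold:
        temperature *= beta
        steps += 1
    return steps
```

Under the assumed β = 0.9, 66 cooling updates are needed, so at least 66 transmissions must land within 5 of the target throughput before the table stops being updated.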

When the updated annealing temperature is less than or equal to the minimum threshold, updating of the Q value table stops. With the current throughput of the base station within the interval of the target throughput, the contention window with the minimum Q value in the corresponding row of the Q value table is determined to be the optimal contention window.

The dynamic contention window adjusting method based on Q learning provided by the embodiment of the invention comprises the following steps: step 101, initializing channel access parameters and an initial annealing temperature, wherein the channel access parameters include an initial contention window size, a Q value table and a target throughput; step 102, under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of data packet transmission; step 103, under the first throughput, adopting a simulated annealing algorithm to obtain a first contention window size; step 104, acquiring a second throughput of data packet transmission under the first contention window size; step 105, updating the Q value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature under a preset condition to obtain an updated annealing temperature. When the updated annealing temperature is higher than the minimum threshold, the updated annealing temperature is taken as the initial annealing temperature and steps 102 to 105 are executed again, repeatedly updating the Q value table and obtaining the updated annealing temperature; when the updated annealing temperature is lower than or equal to the minimum threshold, updating of the Q value table stops and the optimal contention window is obtained. The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput. It initializes the initial annealing temperature, recalculates the annealing temperature for each data transmission while the Q value table is being updated, stops updating the Q value table when the annealing temperature is less than or equal to the minimum threshold, and takes the contention window given by the resulting Q value table as the optimal contention window.
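Steps 101 to 106 can be put together in a short end-to-end sketch. It is illustrative only: `toy_throughput` is a hypothetical, deterministic stand-in for the real channel (it is not the Markov probability model of the embodiment), the values of α, γ and β are example choices, and `max_iters` is a safety bound added for the sketch.

```python
import bisect
import math
import random

CW_SIZES = list(range(2, 17))      # columns: contention window sizes 2..16
STATE_BOUNDS = [30, 40, 50, 60]    # row boundaries for the five throughput states


def toy_throughput(cw):
    """Hypothetical channel (not from the source): a larger window lowers throughput."""
    return max(0.0, 75.0 - 4.0 * cw)


def row_of(throughput):
    return bisect.bisect_right(STATE_BOUNDS, throughput)


def select_column(q_row, temperature, rng):
    """Step 103: accept a random column with probability exp(-delta / T), else greedy."""
    greedy = min(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]     # never negative, greedy is the minimum
    if rng.random() < math.exp(-delta / temperature):
        return candidate
    return greedy


def run(target=45.0, threshold=5.0, t_init=1000.0, t_min=1.0,
        alpha=0.5, gamma=0.9, beta=0.9, seed=0, max_iters=100_000):
    rng = random.Random(seed)
    q = [[0.0] * len(CW_SIZES) for _ in range(5)]         # step 101
    temperature, cw = t_init, 2
    for _ in range(max_iters):
        if temperature <= t_min:                          # step 106 stop condition
            break
        s1 = toy_throughput(cw)                           # step 102
        row = row_of(s1)
        col = select_column(q[row], temperature, rng)     # step 103
        cw = CW_SIZES[col]
        s2 = toy_throughput(cw)                           # step 104
        cost = abs(s2 - target)                           # step 105: c = |s2 - s0|
        q[row][col] += alpha * (cost + gamma * min(q[row_of(s2)]) - q[row][col])
        if cost < threshold:                              # preset condition
            temperature *= beta
    final_row = q[row_of(toy_throughput(cw))]
    best_col = min(range(len(final_row)), key=final_row.__getitem__)
    return CW_SIZES[best_col], temperature, q
```

Calling `run()` returns the selected contention window, the final temperature and the learned table. Because the channel model is invented for the sketch, the returned window says nothing about real LAA behavior; the point is only the control flow of the loop.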

To further illustrate the effectiveness of the method provided by the embodiment of the present invention, the coexistence of the LAA and WiFi systems is taken as an example. Fig. 2 shows the sum of the throughputs of the LAA and WiFi systems under different adjustment methods. The abscissa represents the sum of the throughputs of the LAA and WiFi systems, and the ordinate represents the cumulative distribution function (CDF), that is, a probability value. In the figure, curve 201 represents the sum of the throughputs of the LAA and WiFi systems when the static back-off adjustment method is applied, curve 202 represents the sum when the exponential back-off adjustment method is applied, and curve 203 represents the sum when the method provided by the embodiment of the present invention is applied. As can be seen from Fig. 2, the sum of the throughputs of the LAA and WiFi systems is largest under the method of the present invention, which indicates that the WiFi throughput increases when the LAA throughput is reduced, so that the combined throughput of the two increases.

Fig. 2 illustrates that, by combining the reinforcement-learning-based Q learning algorithm with the simulated annealing algorithm, the dynamic contention window adjustment method dynamically adjusts the eNB contention window size through the target throughput and the initial annealing temperature set for the LAA base station, so that the throughput of the LAA base station is limited within a certain range, more channel access opportunities are provided to the WiFi system, and the coexistence fairness of the LAA and WiFi systems is improved.

The embodiment of the invention provides a dynamic contention window adjusting device based on Q learning. A schematic structural diagram of the device is shown in Fig. 3, and the device comprises:

an initialization module 301, configured to initialize a channel access parameter and an initial annealing temperature; the channel access parameters include: initial contention window size, Q value table, target throughput;

a transmission module 302, configured to transmit a data packet and obtain a first throughput of data packet transmission in the initial contention window size;

a calculating module 303, configured to adopt a simulated annealing algorithm at the first throughput to obtain a first contention window size;

an obtaining module 304, configured to obtain a second throughput of data packet transmission in the first contention window size;

an updating module 305, configured to update the Q value table according to a preset formula for the second throughput and the target throughput, and update the initial annealing temperature by using a preset condition, so as to obtain an updated annealing temperature;

and a circulation module 306, configured to, when the updated annealing temperature is greater than the minimum threshold, use the updated annealing temperature as the initial annealing temperature and re-run the transmission module 302 through the updating module 305, repeatedly updating the Q-value table and obtaining the updated annealing temperature, and to stop updating the Q-value table once the updated annealing temperature is less than or equal to the minimum threshold, thereby obtaining the optimal contention window.
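The cooperation of modules 301 to 306 can be sketched as a single loop. The following Python sketch is illustrative only and is not the implementation of the embodiment: the candidate window list `CW_SIZES`, the stand-in function `toy_throughput`, the Metropolis-style acceptance rule in `sa_select`, the geometric cooling factor `cooling`, the tolerance `tol`, and the safety cap `max_rounds` are all assumptions introduced for illustration. The preset condition is taken to be that the absolute difference between the second throughput and the target throughput is below a threshold.

```python
import math
import random

CW_SIZES = [16, 32, 64, 128, 256]  # hypothetical candidate contention windows


def toy_throughput(cw):
    """Hypothetical stand-in for a measured throughput level (larger CW, lower throughput)."""
    return 1600 // cw


def sa_select(q, state, temperature, rng):
    """Assumed simulated-annealing choice: accept a random window with
    probability exp(-delta / T), otherwise keep the lowest-cost window."""
    greedy = min(CW_SIZES, key=lambda a: q.get((state, a), 0.0))
    candidate = rng.choice(CW_SIZES)
    delta = q.get((state, candidate), 0.0) - q.get((state, greedy), 0.0)  # always >= 0
    if rng.random() < math.exp(-delta / temperature):
        return candidate
    return greedy


def adjust_contention_window(target, cw0=16, t0=10.0, t_min=0.1, cooling=0.9,
                             tol=15, alpha=0.5, gamma=0.9,
                             max_rounds=10_000, seed=0):
    rng = random.Random(seed)
    q, cw, temperature = {}, cw0, t0                 # initialization module
    for _ in range(max_rounds):                      # circulation module
        if temperature <= t_min:
            break
        s1 = toy_throughput(cw)                      # transmission module: first throughput
        a1 = sa_select(q, s1, temperature, rng)      # calculating module: first contention window
        s2 = toy_throughput(a1)                      # obtaining module: second throughput
        cost = abs(s2 - target)                      # updating module: cost value
        best_next = min(q.get((s2, a), 0.0) for a in CW_SIZES)
        old = q.get((s1, a1), 0.0)
        q[(s1, a1)] = old + alpha * (cost + gamma * best_next - old)
        if cost < tol:                               # assumed preset condition
            temperature *= cooling                   # assumed geometric cooling
        cw = a1
    state = toy_throughput(cw)
    best = min(CW_SIZES, key=lambda a: q.get((state, a), 0.0))
    return best, temperature, q


best_cw, final_temperature, table = adjust_contention_window(target=25)
print(best_cw, final_temperature, len(table))
```

In this sketch the Q-value table is a dictionary keyed by (throughput level, contention window), and unvisited entries default to zero, so windows that have not yet been tried look cheap and are explored early.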

Specifically, the obtaining module 304 is configured to obtain, through a Markov probability model, a second throughput of the data packet transmission channel under the first contention window size.
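The text here does not state which Markov probability model is used. As one illustration of how a throughput can be derived from a contention window size, the following sketch uses a simplified form of Bianchi's saturation analysis of IEEE 802.11 DCF, assuming `n_nodes` identical saturated nodes that all use a fixed contention window `cw` without exponential doubling, for which the per-slot attempt probability is 2 / (cw + 1). The timing parameters are hypothetical, and the model is a substitute chosen for illustration, not necessarily the model of the embodiment.

```python
def saturation_throughput(cw, n_nodes, payload_us=1000.0, slot_us=9.0,
                          success_us=1100.0, collision_us=1100.0):
    """Fraction of channel time carrying payload under the simplified model.

    All durations are in microseconds and are hypothetical values.
    """
    tau = 2.0 / (cw + 1)                         # attempt probability per slot
    p_tr = 1.0 - (1.0 - tau) ** n_nodes          # at least one node transmits
    p_s = n_nodes * tau * (1.0 - tau) ** (n_nodes - 1) / p_tr  # success given a transmission
    mean_slot = ((1.0 - p_tr) * slot_us          # idle slot
                 + p_tr * p_s * success_us       # successful transmission
                 + p_tr * (1.0 - p_s) * collision_us)  # collision
    return p_tr * p_s * payload_us / mean_slot


for cw in (16, 32, 64, 128):
    print(cw, round(saturation_throughput(cw, n_nodes=10), 3))
```

Under these assumptions, with ten contending nodes a larger window reduces collisions and raises the aggregate throughput, which is the kind of dependence on the contention window size that the adjustment method exploits.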

Specifically, a structural diagram of the updating module 305 is shown in fig. 4, and the updating module includes:

a first update submodule 401, configured to calculate a cost value for packet transmission; the cost value is the absolute value of the difference between the second throughput and the target throughput;

a second updating submodule 402, configured to update the Q-value table according to the cost value and a preset formula under the first contention window size, so as to obtain an updated Q-value table, where the preset formula includes $Q(s_2, a_2) + \alpha\,[\,c + \gamma \min_{a_3} Q(s_2, a_3) - Q(s_2, a_2)\,]$, where $c = |s_2 - s_0|$ is the cost value, $\alpha$ and $\gamma$ are constants between 0 and 1, $s_0$ represents the target throughput, $s_1$ represents the first throughput, $a_2$ represents the first contention window size value, $s_2$ represents the second throughput, and $a_3$ represents the second contention window size value under the second throughput $s_2$.
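The update performed by submodules 401 and 402 has the shape of a cost-minimizing Q-learning step. The sketch below is a minimal illustration under stated assumptions: throughputs are quantized to integer levels so that they can index the table, the table is a dictionary with a default of zero, and the names `q_update`, `state`, `action`, and `next_state` are hypothetical and deliberately generic rather than tied to the subscripts used in the formula.

```python
def q_update(q, state, action, next_state, target, actions, alpha=0.5, gamma=0.9):
    """One cost-minimizing Q-learning step on a dictionary-backed Q-value table."""
    cost = abs(next_state - target)  # cost value: |second throughput - target throughput|
    best_next = min(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (cost + gamma * best_next - old)
    return q[(state, action)]


q = {}
actions = [16, 32, 64]  # hypothetical contention window sizes
first = q_update(q, state=3, action=32, next_state=5, target=4, actions=actions)
second = q_update(q, state=3, action=32, next_state=5, target=4, actions=actions)
print(first, second)  # 0.5 0.75
```

With an empty table, the first step gives 0 + 0.5 × (1 + 0.9 × 0 − 0) = 0.5, and repeating the same transition gives 0.5 + 0.5 × (1 + 0 − 0.5) = 0.75, so the entry moves toward the observed cost by a fraction α each time.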

The dynamic contention window adjusting device based on Q learning provided by the embodiment of the invention operates as follows: channel access parameters and an initial annealing temperature are initialized, where the channel access parameters include an initial contention window size, a Q-value table, and a target throughput; under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of data packet transmission; under the first throughput, the LAA base station adopts a simulated annealing algorithm to obtain a first contention window size; a second throughput of data packet transmission under the first contention window size is acquired; the Q-value table is updated according to a preset formula from the second throughput and the target throughput, and the initial annealing temperature is updated by using a preset condition to obtain an updated annealing temperature; when the updated annealing temperature is higher than the minimum threshold, the updated annealing temperature is taken as the initial annealing temperature and the above steps are executed again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, at which point updating of the Q-value table stops and the optimal contention window is obtained. The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput, initializes the initial annealing temperature, recalculates the annealing temperature for each data transmission while continuously updating the Q-value table, stops updating the Q-value table when the annealing temperature is less than or equal to the minimum threshold, and takes the contention window given by the resulting Q-value table as the optimal contention window.

An embodiment of the present invention provides an electronic device, as shown in fig. 5, including a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor, configured to implement the following steps of the Q-learning-based dynamic contention window adjustment method when executing the program stored in the memory:

step A, initializing channel access parameters and initial annealing temperature; wherein the channel access parameters include: initial contention window size, Q value table, target throughput;

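step B, transmitting the data packet and acquiring a first throughput of data packet transmission under the initial contention window size;

step C, under the first throughput, adopting a simulated annealing algorithm to obtain a first contention window size;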

step D, acquiring a second throughput of data packet transmission under the size of the first contention window;

step E, updating the Q-value table from the second throughput and the target throughput according to a preset formula, and updating the initial annealing temperature to obtain the updated annealing temperature;

and when the updated annealing temperature is higher than the minimum threshold, executing steps B to E again and repeatedly updating the Q-value table, until the updated annealing temperature is lower than or equal to the minimum threshold, at which point updating of the Q-value table stops and the optimal contention window is obtained.

The electronic device provided by the embodiment of the invention limits the throughput of LAA channel transmission and reduces the transmission load through the following steps: channel access parameters and an initial annealing temperature are initialized, where the channel access parameters include an initial contention window size, a Q-value table, and a target throughput; under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of data packet transmission; under the first throughput, the LAA base station adopts a simulated annealing algorithm to obtain a first contention window size; a second throughput of data packet transmission under the first contention window size is acquired; the Q-value table is updated according to a preset formula from the second throughput and the target throughput, and the initial annealing temperature is updated by using a preset condition to obtain an updated annealing temperature; when the updated annealing temperature is higher than the minimum threshold, the updated annealing temperature is taken as the initial annealing temperature and the above steps are executed again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, at which point updating of the Q-value table stops and the optimal contention window is obtained.
The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput, initializes the initial annealing temperature, recalculates the annealing temperature for each data transmission while continuously updating the Q-value table, stops updating the Q-value table when the annealing temperature is less than or equal to the minimum threshold, and takes the contention window given by the resulting Q-value table as the optimal contention window.
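The stopping rule above depends on how quickly the annealing temperature falls, which this passage does not specify. As a hedged illustration, assuming a geometric schedule in which each temperature update multiplies the temperature by a constant factor `cooling` between 0 and 1 (an assumption, not a rule stated here), the number of temperature updates needed before the Q-value table stops being updated can be counted as follows. The function name and all parameter values are hypothetical.

```python
def cooling_steps(t0, t_min, cooling):
    """Count temperature updates until the temperature is at or below t_min,
    assuming the hypothetical schedule T <- cooling * T."""
    if not 0.0 < cooling < 1.0 or t0 <= 0.0 or t_min <= 0.0:
        raise ValueError("need 0 < cooling < 1 and positive temperatures")
    steps, temperature = 0, t0
    while temperature > t_min:
        temperature *= cooling
        steps += 1
    return steps


print(cooling_steps(8.0, 1.0, 0.5))   # 3
print(cooling_steps(10.0, 0.1, 0.9))  # 44
```

Under this assumed schedule the count equals the smallest integer k with t0 × cooling^k ≤ t_min, so a higher initial temperature or a cooling factor closer to 1 lengthens the exploration phase before the optimal contention window is read from the Q-value table.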

An embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for adjusting a dynamic contention window based on Q learning as above is implemented.

The computer program stored in the computer-readable storage medium provided by the embodiment of the invention implements the following steps: channel access parameters and an initial annealing temperature are initialized, where the channel access parameters include an initial contention window size, a Q-value table, and a target throughput; under the initial contention window size, the LAA base station transmits a data packet and acquires a first throughput of data packet transmission; under the first throughput, a simulated annealing algorithm is adopted to obtain a first contention window size; a second throughput of data packet transmission under the first contention window size is acquired; the Q-value table is updated according to a preset formula from the second throughput and the target throughput, and the initial annealing temperature is updated by using a preset condition to obtain an updated annealing temperature; when the updated annealing temperature is higher than the minimum threshold, the updated annealing temperature is taken as the initial annealing temperature and the above steps are executed again, repeatedly updating the Q-value table and obtaining the updated annealing temperature, until the updated annealing temperature is lower than or equal to the minimum threshold, at which point updating of the Q-value table stops and the optimal contention window is obtained. The embodiment of the invention limits the throughput of the LAA channel by setting the target throughput, initializes the initial annealing temperature, recalculates the annealing temperature for each data transmission while continuously updating the Q-value table, stops updating the Q-value table when the annealing temperature is less than or equal to the minimum threshold, and takes the contention window given by the resulting Q-value table as the optimal contention window.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

It should be noted that the apparatus, the electronic device, and the storage medium according to the embodiments of the present invention are respectively an apparatus, an electronic device, and a storage medium to which the dynamic contention window adjustment method based on Q learning is applied, and all embodiments of the dynamic contention window adjustment method based on Q learning are applicable to the apparatus, the electronic device, and the storage medium, and can achieve the same or similar beneficial effects.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A dynamic contention window adjustment method based on Q learning, applied to an LAA base station, the method comprising the following steps:

step A, initializing channel access parameters and an initial annealing temperature; the channel access parameters include: an initial contention window size, a Q-value table, a target throughput, the initial annealing temperature being a parameter of a simulated annealing algorithm;

step E, updating the Q-value table according to a preset formula for the second throughput and the target throughput, and updating the initial annealing temperature by using a preset condition, so as to obtain an updated annealing temperature, where the preset formula includes $Q(s_1, a_1) + \alpha\,[\,c + \gamma \min_{a_2} Q(s_2, a_2) - Q(s_1, a_1)\,]$, where $c = |s_2 - s_0|$, $\alpha$ and $\gamma$ are constants between 0 and 1, $s_0$ represents the target throughput, $s_1$ represents the first throughput, $s_2$ represents the second throughput, $a_1$ represents the first contention window size value, $a_2$ represents the second contention window size value, and the preset condition includes that an absolute value of a difference between the second throughput and the target throughput is smaller than a preset threshold;

2. The method of claim 1, wherein obtaining the second throughput of the data packet transmission channel at the first contention window size comprises:

3. The method of claim 2, wherein said updating said Q-value table with said second throughput and said target throughput according to a predetermined formula comprises:

and updating the Q-value table according to the cost value and a preset formula under the first contention window size, to obtain an updated Q-value table.

4. An apparatus for adjusting a dynamic contention window based on Q learning, the apparatus comprising:

the initialization module is used for initializing channel access parameters and an initial annealing temperature; the channel access parameters include: an initial contention window size, a Q-value table, a target throughput, the initial annealing temperature being a parameter of a simulated annealing algorithm;

an updating module, configured to update the Q-value table according to a preset formula for the second throughput and the target throughput, and update the initial annealing temperature by using a preset condition, so as to obtain an updated annealing temperature, where the preset formula includes $Q(s_1, a_1) + \alpha\,[\,c + \gamma \min_{a_2} Q(s_2, a_2) - Q(s_1, a_1)\,]$, where $c = |s_2 - s_0|$, $\alpha$ and $\gamma$ are constants between 0 and 1, $s_0$ represents a target throughput, $s_1$ represents a first throughput, $s_2$ represents a second throughput, $a_1$ represents a first contention window size value, $a_2$ represents a second contention window size value, and the preset condition includes that an absolute value of a difference between the second throughput and the target throughput is smaller than a preset threshold;

and the circulation module is used for, when the updated annealing temperature is higher than the minimum threshold, taking the updated annealing temperature as the initial annealing temperature and re-running the transmission module through the updating module, repeatedly updating the Q-value table and obtaining the updated annealing temperature, and stopping updating the Q-value table once the updated annealing temperature is lower than or equal to the minimum threshold, to obtain the optimal contention window.

5. The apparatus of claim 4, wherein the obtaining module is specifically configured to obtain the second throughput of the data packet transmission channel via a Markov probability model at the first contention window size.

6. The apparatus of claim 5, wherein the update module comprises:

and the second updating submodule is specifically configured to update the Q value table according to a preset formula under the size of the first contention window according to the cost value, so as to obtain an updated Q value table.

7. The apparatus of claim 6, wherein the circulation module is specifically configured to update the initial annealing temperature when an absolute value of a difference between the second throughput and the target throughput is less than a preset threshold.

8. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;

the memory is used for storing a computer program;

the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-3.

9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-3.