CN115134026B - Intelligent unlicensed spectrum access method based on average field - Google Patents

Intelligent unlicensed spectrum access method based on average field

Info

Publication number
CN115134026B
Authority
CN
China
Prior art keywords
agent
action
network
intelligent
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210746022.7A
Other languages
Chinese (zh)
Other versions
CN115134026A (en)
Inventor
裴二荣
黄一格
宋珈锐
陶凯
徐成义
刘浔翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaoxing City Shangyu District Shunxing Electric Power Co ltd
Shenzhen Hongyue Information Technology Co ltd
State Grid Zhejiang Electric Power Co Ltd Shaoxing Shangyu District Power Supply Co
State Grid Zhejiang Electric Power Co Ltd Yuyao Power Supply Co
Original Assignee
State Grid Zhejiang Electric Power Co Ltd Shaoxing Shangyu District Power Supply Co
State Grid Zhejiang Electric Power Co Ltd Yuyao Power Supply Co
Shaoxing City Shangyu District Shunxing Electric Power Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Zhejiang Electric Power Co Ltd Shaoxing Shangyu District Power Supply Co, State Grid Zhejiang Electric Power Co Ltd Yuyao Power Supply Co, Shaoxing City Shangyu District Shunxing Electric Power Co ltd filed Critical State Grid Zhejiang Electric Power Co Ltd Shaoxing Shangyu District Power Supply Co
Priority to CN202210746022.7A priority Critical patent/CN115134026B/en
Publication of CN115134026A publication Critical patent/CN115134026A/en
Application granted granted Critical
Publication of CN115134026B publication Critical patent/CN115134026B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/382Monitoring; Testing of propagation channels for resource allocation, admission control or handover
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W74/00Wireless channel access
    • H04W74/08Non-scheduled access, e.g. ALOHA
    • H04W74/0808Non-scheduled access, e.g. ALOHA using carrier sensing, e.g. carrier sense multiple access [CSMA]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W74/00Wireless channel access
    • H04W74/08Non-scheduled access, e.g. ALOHA
    • H04W74/0833Random access procedures, e.g. with 4-step access
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to an intelligent unlicensed spectrum access method based on an average field, and belongs to the field of wireless communication. The method comprises the following steps: S1: initialize environment parameters and agent parameters; S2: initialize the state of each agent and the experience replay buffer RB; S3: generate an action a_t according to the Boltzmann policy; S4: perform action a_t in the subsequent β_E execution cycles, receive the environmental feedback r_t, and update the state to s_{t+1}; S5: exchange information among the nodes; S6: store the transition sample into the experience replay buffer RB; S7: randomly extract H transition samples from the experience replay buffer RB to update the Q-network; S8: after training terminates, each agent obtains an optimal access strategy. In the invention, the small base station serves as the subject of learning and decision-making; it can acquire global network information through information exchange and learns the optimal access action policy, thereby maximizing the total throughput and fairness of the coexistence network.

Description

Intelligent unlicensed spectrum access method based on average field
Technical Field
The invention belongs to the field of wireless communication, and relates to an intelligent unlicensed spectrum access method based on an average field.
Background
With the dramatic increase in the number of mobile devices and in data traffic, the demand for high capacity and high data rates has grown substantially. The Cisco index predicts that by 2022 the average number of mobile-connected devices per person will reach 3.6, and global mobile data traffic will increase seven-fold. However, the licensed spectrum resources available to mobile network operators remain very limited. This has shifted attention from licensed bands to unlicensed bands as well, since an abundance of spectrum is available in the unlicensed bands.
The unlicensed spectrum currently available at low frequencies includes the 2.4GHz, 5GHz and 6GHz bands. The 2.4GHz band was the first commercial unlicensed band released by the FCC and is also the most heavily used unlicensed shared band today. Existing wireless technologies operating at 2.4GHz include IEEE 802.11, ZigBee, Bluetooth and cordless phones, among others. Compared with the 2.4GHz band, the 5GHz band offers larger spectrum bandwidth and suffers less interference, and most systems in this band run on IEEE 802.11 technology, so most coexistence schemes proposed by researchers consider deployment in the 5GHz band. At 6GHz, 5G NR-U is expected to coexist with IEEE 802.11ax/be-based systems to meet the high capacity requirements of future mobile networks. In summary, from the past to the future, these unlicensed bands are used mainly by WiFi technology. Therefore, in order to use cellular technology in the same unlicensed bands as WiFi networks, it is necessary to propose an access scheme that coexists with WiFi networks smoothly and efficiently.
In the unlicensed bands, the dominant WiFi networks use the CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) protocol to access channels through random contention, and packets that collide are retransmitted according to the binary exponential backoff rule. In contrast, cellular systems perform data transmission using a centralized scheduling mechanism, in which the transmission opportunities of the user equipment are decided by the base station. Because the cellular system and the WiFi network use different transmission mechanisms, offloading cellular data transmissions to the unlicensed bands may significantly degrade the performance of the WiFi network. To date, existing coexistence schemes, including LTE-LAA, LTE-U and ABS, have achieved very limited performance on key QoS indicators, especially when many wireless nodes coexist. In future wireless communication systems, researchers expect artificial intelligence to play a key role in unlicensed spectrum access, i.e., adjusting the access behavior of base stations online according to the conditions of the coexistence network so as to keep key performance indicators at a high level.
In summary, it is significant to provide an intelligent spectrum access scheme for the cellular system and the WiFi coexistence network in the unlicensed frequency band.
Disclosure of Invention
In view of this, the invention provides an intelligent unlicensed spectrum access method based on an average field. In this method, each small base station (SBS) serves as an agent and, according to the current coexistence network state, determines its own access time and the transmission duration after access through mutual cooperation, so that the total throughput and fairness of the coexistence network are jointly maximized and the quality of service of users in the network is guaranteed.
In order to achieve the above purpose, the present invention provides the following technical solutions:
an intelligent unlicensed spectrum access method based on an average field is characterized in that: the method comprises the following steps:
S1: initializing environmental parameters and agent parameters;
S2: initializing the state of each agent and the experience replay buffer RB;
S3: generating an action a_t according to the Boltzmann policy;
S4: performing action a_t in the subsequent β_E execution cycles, receiving the environmental feedback r_t, and updating the state to s_{t+1};
S5: exchanging information among the nodes;
S6: storing the transition sample into the experience replay buffer RB;
S7: randomly extracting H transition samples from the experience replay buffer RB to update the Q-network;
S8: after training is terminated, each agent obtains an optimal access strategy.
Further, in step S1, the environment parameters include the backoff parameters of the WiFi access points (WAPs) and the time parameters of the proposed access framework. Specifically, a WAP accesses the channel randomly using the binary-exponential-backoff CSMA/CA protocol, and the CSMA/CA backoff parameters to be set include the initial window size CW, the WiFi access point packet length T_W and the maximum backoff order m. We further propose an unlicensed spectrum access framework that provides a theoretical basis for the interaction between agents and the environment and for the optimization goal. In this framework, the time parameters that need to be set include β_E, β_SF and β_S. The agent parameters include the initial temperature T_0 of the Boltzmann action-selection policy, the size of the experience replay buffer RB, and the neural-network training parameters in the agent.
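For illustration only, a minimal initialization sketch in Python is given below; all names and default values (cw_init, t_w, beta_e, etc.) are hypothetical placeholders and not values fixed by the claimed method:

```python
from dataclasses import dataclass

@dataclass
class EnvParams:
    # CSMA/CA backoff parameters of the WiFi access points (WAPs)
    cw_init: int = 16           # initial contention window CW
    t_w: int = 1                # WiFi packet length T_W, in slots T_S (units assumed)
    max_backoff_order: int = 6  # maximum backoff order m
    # time parameters of the proposed access framework
    beta_e: int = 10            # execution cycles per feedback period, T_F = beta_e * T_E
    beta_sf: int = 10           # subframes per execution cycle,       T_E = beta_sf * T_SF
    beta_s: int = 10            # slots per subframe,                  T_SF = beta_s * T_S

@dataclass
class AgentParams:
    t0: float = 1.0             # initial Boltzmann temperature T_0
    rb_size: int = 10000        # capacity of the experience replay buffer RB
    batch_size: int = 32        # mini-batch size H
    gamma: float = 0.9          # discount factor
    lr: float = 1e-3            # learning rate of the Q-network
```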
Further, in step S2, the state s_t of each agent needs to be initialized before the formal training process begins. The state s_t is a global variable defined as:
where f_t is the fairness index of the coexistence network at time t, expressed as follows:
where K denotes the number of WAPs in the coexistence network, N denotes the number of SBSs in the coexistence network, and the remaining entries denote the throughput of the i-th WAP and the i-th SBS over the time interval (t - T_F, t], defined as:
where T_F denotes the length of the feedback period and the two quantities denote the packet length or frame length successfully transmitted by the i-th WAP and the i-th SBS within the current feedback period; throughput therefore means the ratio of the successfully transmitted packet or frame length to T_F within the feedback period T_F. In addition, each agent contains an experience replay buffer RB for storing past experience samples used to train the Q-network. RB is a queue-type memory of finite size, which we need to preset.
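As a hedged illustration of how these state quantities could be computed, the sketch below derives per-node throughput as the fraction of the feedback period occupied by successful transmissions and uses Jain's fairness index, which is consistent with the stated properties (f_t in [0, 1], and f_t = 1 for equal shares); the exact formula in the original figures may differ:

```python
def node_throughput(successful_tx_time: float, t_f: float) -> float:
    """Throughput of one WAP/SBS: share of the feedback period T_F
    occupied by successfully transmitted packets or frames."""
    return successful_tx_time / t_f

def fairness_index(throughputs: list[float]) -> float:
    """Jain's fairness index over all K WAPs and N SBSs (assumed form)."""
    total = sum(throughputs)
    if total == 0.0:
        return 1.0  # no traffic: treat as trivially fair
    return total ** 2 / (len(throughputs) * sum(u * u for u in throughputs))

# Example: three nodes with equal shares give f_t = 1.0
print(fairness_index([0.2, 0.2, 0.2]))
```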
Further, in step S3, the agent selects the action to be performed next according to the current state s_t and the Boltzmann policy. The Boltzmann policy expression is:
where Q(s, a) denotes the action-value function, i.e., the value of selecting action a in the current state s, used to measure the quality of the action; T denotes the current temperature, and T_0 and N denote the initial temperature and the number of times the corresponding action has been selected, respectively. T gradually decreases as training iterates: at the beginning of training the agent has not explored the environment sufficiently and therefore tends to execute actions randomly to explore it; as the number of training iterations increases, the agent tends to make action selections using the learned knowledge. Further, the action is defined as:
a_t = [AT_t, TX_t]
where AT_t ∈ {0, T_SF, 2T_SF, …, N·T_SF} denotes the access time, an integer multiple of the SBS basic transmission unit, the subframe T_SF, and TX_t ∈ {T_SF, 2T_SF, …, M·T_SF} denotes the transmission duration after access, also an integer multiple of the subframe T_SF. The agent needs to learn a control policy that indicates, in the current state, when to access the channel and how long to transmit after access.
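A minimal sketch of Boltzmann (softmax) action selection with a decaying temperature is shown below; the decay schedule T = T_0 / (1 + visit count) is an assumption used only for illustration, since the patent's temperature formula appears in a figure:

```python
import numpy as np

def boltzmann_action(q_values: np.ndarray, visit_counts: np.ndarray, t0: float) -> int:
    """Pick an action index with probability proportional to exp(Q(s,a)/T).

    q_values:     Q(s, a) for every discrete action a = [AT, TX]
    visit_counts: how often each action has been selected so far
    t0:           initial temperature T_0
    """
    temperature = t0 / (1.0 + visit_counts)   # assumed decay: more visits, lower T
    logits = q_values / np.maximum(temperature, 1e-8)
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(np.random.choice(len(q_values), p=probs))
```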
Further, in step S4, each agent performs the selected action in the subsequent β_E execution cycles T_E. This allows the agent to observe the environment dynamics, i.e., changes in traffic patterns, on a larger time scale, so that the computed r_t and s_{t+1} evaluate the action value Q(s_t, a_t) more accurately and learning converges faster. The reward value r_t is expressed as:
where the total throughput of the coexistence network is defined as:
The reward definition follows our objective of maximizing the total throughput of the coexistence network while ensuring fairness: unilaterally increasing either throughput or fairness yields only a small reward, and a large reward is obtained only when throughput and fairness increase simultaneously.
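Purely as an illustration of a reward shape consistent with this description (the exact expression appears only in the original figures, so this product form is an assumption), one could write:

```latex
r_t = f_t \cdot U_t, \qquad
U_t = \sum_{k=1}^{K} \hat U^{W}_{k,t} + \sum_{n=1}^{N} \hat U^{S}_{n,t}
```

where U_t denotes the total throughput; such a multiplicative coupling stays small when either factor is small and becomes large only when fairness and throughput grow together.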
Further, in step S5, the nodes exchange information in the last execution cycle of each feedback period to obtain the throughput and action information of the other nodes; the former is used to compute the total throughput, fairness and reward, and the latter to compute the mean action in the average-field theory. Specifically, the average-field theory is used to ensure convergence of the reinforcement learning algorithm in the multi-agent environment and to reduce the computational complexity. We begin with the Q-function decomposition:
where N(k) is the index set of the neighboring agents of agent k, whose size N_k = |N(k)| depends on the application setting, and a denotes the joint action of all agents. By the above formula, the Q-function measuring the value of the joint action for the k-th agent is approximated by the average of its pairwise interactions with neighboring agents. The mean action of the neighbors of the k-th agent can then be computed as:
where the mean action can be understood as the empirical distribution of the actions of the neighboring agents. In addition, for a neighboring agent j, its action a_j can be expressed as the mean action plus a perturbation term δ_{j,k}:
By the Taylor expansion, when Q_k(s, a_k, a_j) is twice differentiable with respect to a_j, Q_k(s, a) can be expressed as:
where R_k(a_j) is the remainder of the Taylor polynomial; in fact, R_k(a_j) is a small perturbation term that can be shown to be close to 0 under certain conditions and can therefore be omitted. At this point we have decomposed the Q-function of agent k into a function of the state, the agent's own action, and the mean action of its neighbors. Note that the dimension of the joint action in the Q-function no longer grows exponentially with the number of agents but is independent of it, which greatly reduces the computational complexity of the algorithm. Furthermore, we only need to consider the neighboring agents' actions at the current time t, not their historical behavior. The update rule of the approximated average-field Q-function is as follows:
where the average-field value function is calculated as follows:
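For readability, the update rule and value function described above can be written in the standard mean-field Q-learning form; this is a reconstruction under that assumption, and the exact notation in the patent's figures may differ:

```latex
Q^{k}_{t+1}(s, a^{k}, \bar a^{k})
  = (1-\alpha)\, Q^{k}_{t}(s, a^{k}, \bar a^{k})
  + \alpha \left[ r^{k}_{t} + \gamma\, v^{k}_{t}(s') \right],
\qquad
v^{k}_{t}(s') = \sum_{a^{k}} \pi^{k}_{t}\!\left(a^{k} \mid s', \bar a^{k}\right)
                Q^{k}_{t}\!\left(s', a^{k}, \bar a^{k}\right)
```

where α is the learning rate, γ the discount factor, ā^k the mean action of the neighbors of agent k, π^k the Boltzmann policy, and v^k the average-field value function.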
further, in step S6, the agent will interact with the environment once to generate a transfer sample Stored to RB. If the RB is full, the old sample is popped from the head of the queue according to the queue property, and the new sample is added from the tail of the queue.
Further, in step S7, the agent randomly draws a mini-batch of H samples from RB and updates the Q-network weights by applying gradient descent to the loss function. The loss function is defined based on the mean squared error:
where y_j denotes the target value, computed with the target Q-network Q′(·) and defined as:
At each iteration, the loss function is minimized by the gradient descent algorithm, the Q-network weights are updated, and the agent's policy is improved.
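Assuming a standard DQN-style update with an online Q-network and a target Q-network (the network interface and names below are placeholders, not the patent's, and a plain max over next-state actions is used instead of the average-field value function for brevity), one training step could look like:

```python
import torch
import torch.nn.functional as F

def train_step(q_net, target_net, optimizer, batch, gamma: float):
    """One gradient-descent step on the mean-squared-error loss.

    batch: tensors (states, action_indices, rewards, next_states); in the
    average-field variant the neighbors' mean action would be concatenated
    to the state (omitted here for brevity).
    """
    states, actions, rewards, next_states = batch

    # Q(s_j, a_j) of the actions actually taken
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # target y_j = r_j + gamma * max_a' Q'(s_{j+1}, a') from the target network
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        y = rewards + gamma * q_next

    loss = F.mse_loss(q_pred, y)   # mean squared error over the batch of H samples
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```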
Further, in step S8, when the number of training iterations reaches the preset value, each agent has learned an optimal solution, i.e., an optimal access scheme.
The invention has the following beneficial effects: with the proposed average-field-based intelligent unlicensed spectrum access method, multiple small base stations can adjust their access actions in real time according to the state of the coexistence network, ultimately maximizing the total throughput while ensuring fairness, improving the spectrum utilization of the coexistence network, and guaranteeing the quality of service of users.
Drawings
In order to better and clearly understand the objects, technical solutions and advantageous effects of the present invention, the present invention is illustrated in the following drawings:
FIG. 1 is a flowchart of a deep reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of reinforcement learning interaction according to an embodiment of the present invention;
FIG. 3 is a diagram of an SBS/WAP coexistence network model according to the embodiment of the present invention;
fig. 4 is an unlicensed spectrum access framework according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Aiming at the coexistence problem in a 5GHz unlicensed-spectrum SBS/WAP coexistence network, the invention provides an intelligent unlicensed spectrum access method based on an average field. In contrast to conventional coexistence mechanisms, each SBS in the present invention can adaptively adjust its access opportunity and transmission duration through mutual cooperation; the process is shown in fig. 1. The SBS first selects an access action a_t through the Boltzmann policy, then performs the action on the environment and obtains the feedback r_t and s_{t+1}, and then learns and improves its policy using past experience, finally learning an optimal access policy through many training iterations. The above process corresponds to the interaction of multiple agents with the environment under a cooperative setting, as shown in fig. 2.
The coexistence network we consider contains N SBSs and K WAPs; the system model is shown in fig. 3. The SBSs access the channel using our proposed algorithm, while the WAPs access the channel using the CSMA/CA protocol. Each WAP employs identical (homogeneous) backoff parameters, because the unfairness caused by heterogeneous backoff parameters is not within the scope of the present invention.
To describe the interaction of each node with the environment and to state the optimization objective clearly, we set up an unlicensed spectrum access framework to model the coexistence network environment, as shown in fig. 4. In this access framework, time resources are divided into two levels at different time scales. The upper level describes information exchange and agent decision-making on a large time scale. Specifically, the time resource is divided into several feedback periods T_F, and each feedback period contains several execution cycles T_E. Information is exchanged between SBSs and WAPs with period T_F; each SBS generates an action according to its own policy and the received information, and then performs the action in the execution cycles contained in the current feedback period. At the lower level, an execution cycle T_E is further divided into smaller, finer-grained units to accommodate the protocols of the SBSs and WAPs. Specifically, T_E is divided into several subframes T_SF, the basic scheduling unit of the SBSs, and T_SF is further divided into several time slots T_S, the basic unit of the packet length transmitted by the WAPs. In summary, T_F = β_E·T_E, T_E = β_SF·T_SF and T_SF = β_S·T_S, where β_E, β_SF and β_S are integers.
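As a numerical illustration with assumed values β_S = 20, β_SF = 10, β_E = 10 and a slot length T_S = 9 µs (all hypothetical, chosen only to make the hierarchy concrete):

```latex
T_{SF} = \beta_S T_S = 20 \times 9\,\mu\mathrm{s} = 180\,\mu\mathrm{s},\quad
T_E = \beta_{SF} T_{SF} = 10 \times 180\,\mu\mathrm{s} = 1.8\,\mathrm{ms},\quad
T_F = \beta_E T_E = 10 \times 1.8\,\mathrm{ms} = 18\,\mathrm{ms}
```

The actual parameter values are design choices and are not fixed by the method.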
Our goal is to let each small base station learn an access policy and, in each feedback period, generate and execute access actions according to the state of the coexistence network, so as to maximize the total throughput and fairness of the coexistence network. The total throughput is defined as:
where the two terms denote the throughput of the i-th WAP and the i-th SBS over the time interval (t - T_F, t], defined as:
Fairness is defined as:
where f_t ∈ [0, 1]; the greater the fairness index, the more evenly the time resources are allocated. When all nodes obtain the same share of time resources, the whole coexistence network reaches the fairest state, f_t = 1.
In the present algorithm, the state of an agent is defined as:
It can be seen that the state is defined as the combination of the fairness index and the throughput of each node, which is highly correlated with the main objectives of the algorithm, i.e., throughput and fairness, and indicates well whether the wireless channel is fully and evenly utilized. Next, the actions of the agent are defined as:
a_t = [AT_t, TX_t]
where AT_t ∈ {0, T_SF, 2T_SF, …, N·T_SF} denotes the access time, an integer multiple of the SBS basic transmission unit, the subframe T_SF, and TX_t ∈ {T_SF, 2T_SF, …, M·T_SF} denotes the transmission duration after access, also an integer multiple of the subframe T_SF. The agent uses the Boltzmann policy when selecting actions:
Finally, the reward of the agent is defined as:
where f_t and the second term denote the fairness index and the total throughput of the coexistence network, respectively.
As the number of SBSs (agents) in the coexistence network increases, the size of the state space grows and the computational complexity of the algorithm increases exponentially. Directly extending single-agent deep Q-learning to a multi-agent environment leads to excessive computational complexity and cannot guarantee algorithm convergence, so we use the average-field theory to overcome these challenges. Through information exchange, each agent receives the actions of the other agents, and the mean action of the neighbors of the k-th agent can be computed as:
Subsequently, the transition sample generated by one interaction with the environment is stored into RB. The agent periodically draws a batch of samples from the experience replay buffer to train the Q-network, thereby improving its access policy. Through continuous training, each agent can learn an optimal policy.
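Putting steps S3 to S7 together, a high-level sketch of each agent's training loop is given below; the agent and environment interfaces (select_action, step, exchange_information, update_q_network) are illustrative assumptions that reuse the helpers sketched above and are not part of the patent:

```python
def train(agent, env, episodes: int, beta_e: int, h: int):
    """High-level loop per SBS agent: select, execute, exchange, store, learn."""
    state = env.reset()
    for episode in range(episodes):
        # S3: Boltzmann action selection
        action = agent.select_action(state)
        # S4: execute the action over beta_E execution cycles and observe feedback
        reward, next_state = env.step(action, cycles=beta_e)
        # S5: exchange throughput/action information and compute the mean action
        mean_action = agent.exchange_information(env.neighbors())
        # S6: store the transition sample in the replay buffer RB
        agent.replay_buffer.store((state, action, reward, next_state, mean_action))
        # S7: draw H samples and update the Q-network by gradient descent
        if len(agent.replay_buffer.buffer) >= h:
            agent.update_q_network(agent.replay_buffer.sample(h))
        state = next_state
    # S8: after training terminates, the learned policy is the access strategy
```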
Finally, it is noted that the above-mentioned preferred embodiments are only intended to illustrate rather than limit the invention, and that, although the invention has been described in detail by means of the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention as defined by the appended claims.

Claims (4)

1. An intelligent unlicensed spectrum access method based on an average field is characterized in that: the method comprises the following steps:
S1: initializing environmental parameters and agent parameters;
S2: initializing the state of each agent and the experience replay buffer RB;
S3: generating an action a_t according to the Boltzmann policy;
S4: performing action a_t in the subsequent β_E execution cycles, receiving the environmental feedback r_t, and updating the state to s_{t+1};
S5: exchanging information among the nodes;
S6: storing the transition sample into the experience replay buffer RB;
S7: randomly extracting H transition samples from the experience replay buffer RB to update the Q-network;
S8: after training is terminated, each agent obtains an optimal access strategy;
in step S1, the environment is defined as the external entity with which the agent interacts, so the environment parameters include the backoff parameters of the WiFi access points and the time parameters of the proposed access framework; in the access framework, the time resources are divided into two levels at different time scales; the upper level describes information exchange and agent decision-making on a large time scale; specifically, the time resource is divided into several feedback periods T_F, each feedback period contains several execution cycles T_E, information is exchanged between SBSs and WAPs with period T_F, and each SBS generates an action according to its own policy and the received information and then performs the action in the execution cycles contained in the current feedback period; at the lower level, an execution cycle T_E is further divided into smaller, finer-grained units to accommodate the protocols of the SBSs and WAPs; specifically, T_E is divided into several subframes T_SF, the basic scheduling unit of the SBSs, and T_SF is further divided into several time slots T_S, the basic unit of the packet length transmitted by the WAPs, with T_F = β_E·T_E, T_E = β_SF·T_SF and T_SF = β_S·T_S, where β_E, β_SF and β_S are integers; specifically, the backoff parameters to be set include the initial window size CW, the WiFi access point packet length T_W and the maximum backoff order m, and the time parameters to be set include β_E, β_SF and β_S; the agent parameters include the initial temperature T_0 of the Boltzmann policy, the size of the experience replay buffer and the neural-network training parameters in the agent; in step S2, the state s_t of each agent needs to be initialized before the formal training process begins; the state s_t is a global variable defined as:
where f_t is the fairness index of the coexistence network at time t, expressed as follows:
where K denotes the number of WAPs in the coexistence network, N denotes the number of SBSs in the coexistence network, and the remaining entries denote the throughput of the i-th WAP and the i-th SBS over the time interval (t - T_F, t], defined as the packet length or frame length successfully transmitted by the i-th WAP and the i-th SBS within the current feedback period divided by T_F, i.e., throughput means the ratio of the successfully transmitted packet or frame length to T_F within the feedback period T_F; in addition, each agent contains an experience replay buffer RB for storing past experience samples used to train the Q-network, where RB is a queue-type memory of finite size which needs to be preset; in step S3, the agent selects the action to be performed next according to the current state s_t and the Boltzmann policy; the Boltzmann policy expression is:
where Q(s, a) denotes the action-value function, i.e., the value of selecting action a in the current state s, used to measure the quality of the action; T denotes the current temperature, and T_0 and N denote the initial temperature and the number of times the corresponding action has been selected, respectively; T gradually decreases as training iterates, reflecting that at the beginning of training the agent has not explored the environment sufficiently and therefore tends to execute actions randomly to explore it, while as the number of training iterations increases, the agent tends to make action selections using the learned knowledge; further, the action is defined as:
a_t = [AT_t, TX_t]
where AT_t ∈ {0, T_SF, 2T_SF, ..., N·T_SF} denotes the access time, an integer multiple of the SBS basic transmission unit, the subframe T_SF, and TX_t ∈ {T_SF, 2T_SF, ..., M·T_SF} denotes the transmission duration after access, also an integer multiple of the subframe T_SF; the agent needs to learn a control policy that indicates, in the current state, when to access the channel and how long to transmit after access; in step S4, each agent performs the selected action in the subsequent β_E execution cycles T_E, which allows the agent to observe the environment dynamics, i.e., changes in traffic patterns, on a larger time scale, so that the computed r_t and s_{t+1} evaluate the action value Q(s_t, a_t) more accurately and learning converges faster; the reward value r_t is expressed as:
where the total throughput of the coexistence network is defined as:
the reward definition follows our objective of maximizing the total throughput of the coexistence network while ensuring fairness: unilaterally increasing either throughput or fairness yields only a small reward, and a large reward is obtained only when throughput and fairness increase simultaneously; in step S5, the nodes exchange information in the last execution cycle of each feedback period to obtain the throughput and action information of the other nodes, using the former to compute the total throughput, fairness and reward, and the latter to compute the mean action in the average-field theory; specifically, the average-field theory is used to ensure convergence of the reinforcement learning algorithm in the multi-agent environment and to reduce the computational complexity; we begin with the Q-function decomposition:
where N(k) is the index set of the neighboring agents of agent k, whose size N_k = |N(k)| depends on the application setting, and a denotes the joint action of all agents; by the above formula, the Q-function measuring the value of the joint action for the k-th agent is approximated by the average of its pairwise interactions with neighboring agents; the mean action of the neighbors of the k-th agent can then be computed as:
where the mean action can be understood as the empirical distribution of the actions of the neighboring agents; in addition, for a neighboring agent j, its action a_j can be expressed as the mean action plus a perturbation term δ_{j,k}:
by the Taylor expansion, when Q_k(s, a_k, a_j) is twice differentiable with respect to a_j, Q_k(s, a) can be expressed as:
where it is known that Σ_j δ_{j,k} = 0 and R_k(a_j) is the remainder of the Taylor polynomial; in fact, R_k(a_j) is a small perturbation term that can be shown to be close to 0 under certain conditions, and can therefore be omitted; at this point we have decomposed the Q-function of agent k into a function of the state, the agent's own action and the mean action of its neighbors; note that the dimension of the joint action in the Q-function no longer grows exponentially with the number of agents but is independent of it, which greatly reduces the computational complexity of the algorithm; in addition, we only need to consider the neighboring agents' actions at the current time t, not their historical behavior; the update rule of the approximated average-field Q-function is as follows:
where the average-field value function is calculated as follows:
2. The intelligent unlicensed spectrum access method based on an average field according to claim 1, wherein: in step S6, the agent stores the transition sample generated by one interaction with the environment into RB; if RB is full, the oldest sample is popped from the head of the queue and the new sample is appended at the tail, following the queue property.
3. The intelligent unlicensed spectrum access method based on an average field according to claim 1, wherein: in step S7, the agent randomly draws a mini-batch of H samples from RB and updates the Q-network weights by applying gradient descent to the loss function; the loss function is defined based on the mean squared error:
where y_j denotes the target value, computed with the target Q-network Q′(·) and defined as:
at each iteration, the loss function is minimized by the gradient descent algorithm, the Q-network weights are updated, and the agent's policy is improved.
4. The intelligent unlicensed spectrum access method based on an average field according to claim 1, wherein: in step S8, when the number of training iterations reaches the preset value, each agent has learned an optimal solution, i.e., an optimal access scheme.
CN202210746022.7A 2022-06-29 2022-06-29 Intelligent unlicensed spectrum access method based on average field Active CN115134026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210746022.7A CN115134026B (en) 2022-06-29 2022-06-29 Intelligent unlicensed spectrum access method based on average field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210746022.7A CN115134026B (en) 2022-06-29 2022-06-29 Intelligent unlicensed spectrum access method based on average field

Publications (2)

Publication Number Publication Date
CN115134026A CN115134026A (en) 2022-09-30
CN115134026B (en) 2024-01-02

Family

ID=83379784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210746022.7A Active CN115134026B (en) 2022-06-29 2022-06-29 Intelligent unlicensed spectrum access method based on average field

Country Status (1)

Country Link
CN (1) CN115134026B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101726521B1 (en) * 2015-12-30 2017-04-12 숭실대학교산학협력단 D2D communication system that use the non-licensed band to the auxiliary band and Method of D2D system thereof
CN108924944A (en) * 2018-07-19 2018-11-30 重庆邮电大学 The dynamic optimization method of contention window value coexists in LTE and WiFi based on Q-learning algorithm
CN113316174A (en) * 2021-05-26 2021-08-27 重庆邮电大学 Intelligent access method for unlicensed spectrum
CN114363908A (en) * 2022-01-13 2022-04-15 重庆邮电大学 A2C-based unlicensed spectrum resource sharing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10405192B2 (en) * 2018-01-15 2019-09-03 Charter Communications Operating, Llc Methods and apparatus for allocation and reconciliation of quasi-licensed wireless spectrum across multiple entities

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101726521B1 (en) * 2015-12-30 2017-04-12 숭실대학교산학협력단 D2D communication system that use the non-licensed band to the auxiliary band and Method of D2D system thereof
CN108924944A (en) * 2018-07-19 2018-11-30 重庆邮电大学 The dynamic optimization method of contention window value coexists in LTE and WiFi based on Q-learning algorithm
CN113316174A (en) * 2021-05-26 2021-08-27 重庆邮电大学 Intelligent access method for unlicensed spectrum
CN114363908A (en) * 2022-01-13 2022-04-15 重庆邮电大学 A2C-based unlicensed spectrum resource sharing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Chaotic Q-learning-Based Licensed Assisted Access Scheme Over the Unlicensed Spectrum;Errong Pei et al.;《IEEE Transactions on Vehicular Technology》;第68卷(第10期);全文 *
A Deep Reinforcement Learning Based Spectrum Access Scheme in Unlicensed Bands;Errong Pei et al.;《2021 IEEE International Conference on Communications Workshops (ICC Workshops)》;全文 *
Research on the Development of Dynamic Spectrum Management Technology; Tian Jiaqiang et al.; 《通信技术》 (Communication Technology), No. 4; full text *

Also Published As

Publication number Publication date
CN115134026A (en) 2022-09-30

Similar Documents

Publication Publication Date Title
Tan et al. Intelligent sharing for LTE and WiFi systems in unlicensed bands: A deep reinforcement learning approach
Wang et al. Price-based spectrum management in cognitive radio networks
Song et al. Stochastic channel selection in cognitive radio networks
Dakdouk et al. Reinforcement learning techniques for optimized channel hopping in IEEE 802.15. 4-TSCH networks
CN111050413B (en) Unmanned aerial vehicle CSMA access method based on adaptive adjustment strategy
CN113316154A (en) Authorized and unauthorized D2D communication resource joint intelligent distribution method
CN110035559B (en) Intelligent competition window size selection method based on chaotic Q-learning algorithm
Ilahi et al. LoRaDRL: Deep reinforcement learning based adaptive PHY layer transmission parameters selection for LoRaWAN
CN113316174B (en) Intelligent access method for unlicensed spectrum
Choe et al. A robust channel access using cooperative reinforcement learning for congested vehicular networks
CN111163531B (en) Unauthorized spectrum duty ratio coexistence method based on DDPG
Barrachina-Muñoz et al. Multi-armed bandits for spectrum allocation in multi-agent channel bonding WLANs
CN114501667A (en) Multi-channel access modeling and distributed implementation method considering service priority
Mazandarani et al. Self-sustaining multiple access with continual deep reinforcement learning for dynamic metaverse applications
CN115134026B (en) Intelligent unlicensed spectrum access method based on average field
Ojo et al. Throughput maximization scheduling algorithm in TSCH networks with deadline constraints
AlQwider et al. Deep Q-network for 5G NR downlink scheduling
Pei et al. A deep reinforcement learning based spectrum access scheme in unlicensed bands
Burgueño et al. Distributed deep reinforcement learning resource allocation scheme for industry 4.0 device-to-device scenarios
Pei et al. Intelligent Access to Unlicensed Spectrum: A Mean Field Based Deep Reinforcement Learning Approach
CN113316156B (en) Intelligent coexistence method on unlicensed frequency band
Song et al. Adaptive Generalized Proportional Fair Scheduling with Deep Reinforcement Learning
CN108768602B (en) Method for selecting authorized user to feed back CSI (channel state information) in independent unlicensed frequency band cellular mobile communication system
Sirhan et al. Cognitive Radio Resource Scheduling using Multi agent QLearning for LTE
Cruz et al. Reinforcement Learning-based Wi-Fi Contention Window Optimization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20231212

Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.

Address before: 400065 No. 2, Chongwen Road, Nan'an District, Chongqing

Applicant before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

Effective date of registration: 20231212

Address after: No. 291, West Section of Renmin Avenue, Cao'e Street, Shangyu District, Shaoxing City, Zhejiang Province, 312300

Applicant after: SHAOXING CITY SHANGYU DISTRICT SHUNXING ELECTRIC POWER CO.,LTD.

Applicant after: State Grid Zhejiang Electric Power Co., Ltd. Shaoxing Shangyu district power supply Co.

Applicant after: State Grid Zhejiang Electric Power Co., Ltd. Yuyao Power Supply Co.

Address before: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant before: Shenzhen Hongyue Information Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant