CN115134026A - Intelligent unlicensed spectrum access method based on mean field - Google Patents

Intelligent unlicensed spectrum access method based on mean field

Info

Publication number
CN115134026A
Authority
CN
China
Prior art keywords: agent, action, intelligent, network, method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210746022.7A
Other languages
Chinese (zh)
Other versions
CN115134026B (en)
Inventor
裴二荣
黄一格
宋珈锐
陶凯
徐成义
刘浔翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaoxing City Shangyu District Shunxing Electric Power Co ltd
Shenzhen Hongyue Information Technology Co ltd
State Grid Zhejiang Electric Power Co Ltd Shaoxing Shangyu District Power Supply Co
State Grid Zhejiang Electric Power Co Ltd Yuyao Power Supply Co
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202210746022.7A priority Critical patent/CN115134026B/en
Publication of CN115134026A publication Critical patent/CN115134026A/en
Application granted granted Critical
Publication of CN115134026B publication Critical patent/CN115134026B/en
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 17/00 Monitoring; Testing
    • H04B 17/30 Monitoring; Testing of propagation channels
    • H04B 17/382 Monitoring; Testing of propagation channels for resource allocation, admission control or handover
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/02 Arrangements for optimising operational condition
    • H04W 74/00 Wireless channel access
    • H04W 74/08 Non-scheduled access, e.g. ALOHA
    • H04W 74/0808 Non-scheduled access using carrier sensing, e.g. carrier sense multiple access [CSMA]
    • H04W 74/0833 Random access procedures, e.g. with 4-step access
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to an intelligent unlicensed spectrum access method based on mean field theory, belonging to the field of wireless communication. The method comprises the following steps: S1: initialize the environment parameters and agent parameters; S2: initialize each agent's state and experience replay buffer RB; S3: generate an action $a_t$ according to the Boltzmann policy; S4: execute action $a_t$ over the subsequent $\beta_E$ execution periods, receive the environmental feedback $r_t$, and update the state to $s_{t+1}$; S5: exchange information among all nodes; S6: store the transition sample $(s_t, a_t, \bar{a}_t, r_t, s_{t+1})$ into the experience replay buffer RB; S7: randomly draw H transition samples from RB to update the Q-network; S8: terminate training, with each agent having obtained its optimal access strategy. In the invention, each small base station acts as the learning and decision-making entity; it can obtain global network information through information exchange and learn an optimal access action policy, so as to jointly maximize the total throughput and fairness of the coexisting network.

Description

Intelligent unlicensed spectrum access method based on mean field
Technical Field
The invention belongs to the field of wireless communication and relates to an intelligent unlicensed spectrum access method based on mean field theory.
Background
With the dramatic increase in the number of mobile devices and in data traffic, the demand for high capacity and high data rates has grown rapidly. Cisco's Visual Networking Index predicts that by 2022 there will be 3.6 mobile-connected devices per person and that global mobile data traffic will have grown sevenfold. However, the licensed spectrum resources available to mobile network operators remain quite limited. As a result, attention has shifted from licensed to unlicensed spectrum bands, where plenty of spectrum is available.
The low-frequency unlicensed spectrum currently comprises the 2.4GHz, 5GHz, and 6GHz bands. The 2.4GHz band was the first commercial unlicensed band released by the FCC and is still the most heavily used shared unlicensed band; existing wireless technologies operating at 2.4GHz include IEEE 802.11, ZigBee, Bluetooth, and cordless telephony, among others. Compared with the 2.4GHz band, the 5GHz band offers larger spectrum bandwidth and suffers less interference, and most of it operates on IEEE 802.11 technology, so most coexistence schemes proposed by researchers are designed for deployment in the 5GHz band. At 6GHz, 5G NR-U is expected to coexist with IEEE 802.11ax/be-based systems to meet the high capacity demands of future mobile networks. In short, from the past to the foreseeable future, all of this unlicensed spectrum is used primarily by WiFi technology. Therefore, for cellular technology to operate in the same unlicensed frequency bands as WiFi networks, an access scheme that coexists fairly and efficiently with WiFi networks must be proposed.
In the unlicensed bands, the dominant WiFi networks adopt a protocol based on CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance): they access the channel through random contention and retransmit collided packets according to the binary exponential backoff rule. In contrast, cellular systems perform data transmission using a centralized scheduling mechanism, in which the transmission opportunities of user equipment are decided by the base station. Because cellular systems and WiFi networks implement different transmission mechanisms, cellular data transmissions offloaded to unlicensed bands may significantly degrade the performance of WiFi networks. So far, the existing coexistence schemes, including LTE-LAA, LTE-U, and ABS, achieve very limited performance on key QoS indicators, especially when many wireless nodes coexist. In future wireless communication systems, researchers expect artificial intelligence to play a key role in unlicensed spectrum access, i.e., the access mode of a base station is adjusted online according to the state of the coexisting network, keeping the key performance indicators at a high level.
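For illustration only (not part of the patent text), the binary exponential backoff rule that governs WiFi retransmissions can be sketched in a few lines of Python; the function and parameter names are our own:

```python
import random

def csma_ca_backoff(cw_min: int, max_backoff_order: int, collisions: int) -> int:
    """Draw a random backoff counter after `collisions` consecutive collisions.

    Binary exponential backoff: the contention window doubles with every
    collision, up to 2^m * CW_min, where m is the maximum backoff order.
    """
    order = min(collisions, max_backoff_order)
    cw = cw_min * (2 ** order)           # current contention window
    return random.randint(0, cw - 1)     # slots to wait before (re)transmitting

# Example: initial window 16, maximum backoff order m = 6, third collision
print(csma_ca_backoff(cw_min=16, max_backoff_order=6, collisions=3))
```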
In summary, it is of great significance to provide an intelligent spectrum access scheme for cellular systems coexisting with WiFi networks in the unlicensed frequency bands.
Disclosure of Invention
In view of this, the present invention provides an intelligent unlicensed spectrum access method based on mean field theory. In this method, each small base station (SBS) acts as an agent and, by cooperating with the others, determines its own access time and post-access transmission duration according to the current state of the coexisting network, so as to jointly maximize the total throughput and fairness of the coexisting network and guarantee the quality of service of the users in the network.
In order to achieve the purpose, the invention provides the following technical scheme:
An intelligent unlicensed spectrum access method based on mean field theory, comprising the following steps:
S1: initialize the environment parameters and agent parameters;
S2: initialize each agent's state and experience replay buffer RB;
S3: generate an action $a_t$ according to the Boltzmann policy;
S4: execute action $a_t$ over the subsequent $\beta_E$ execution periods, receive the environmental feedback $r_t$, and update the state to $s_{t+1}$;
S5: exchange information among all nodes;
S6: store the transition sample $(s_t, a_t, \bar{a}_t, r_t, s_{t+1})$ into the experience replay buffer RB;
S7: randomly draw H transition samples from RB to update the Q-network;
S8: terminate training; each agent has obtained its optimal access strategy.
Further, in step S1, the environment parameters include the backoff parameters of the WiFi access points (WAPs) and the time parameters of the proposed access framework. Specifically, each WAP accesses the channel randomly using the CSMA/CA protocol with binary exponential backoff; the CSMA/CA backoff parameters to be set include the initial window size CW, the WiFi packet length $T_W$, and the maximum backoff order m. In addition, an unlicensed spectrum access framework is proposed, which provides the theoretical basis for the agent-environment interaction and for the optimization objective; in this framework, the time parameters to be set include $\beta_E$, $\beta_{SF}$ and $\beta_S$. The agent parameters include the initial temperature $T_0$ of the Boltzmann action-selection policy, the size of the experience replay buffer RB, and the neural-network training parameters of each agent.
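A minimal sketch of the step-S1 initialization in Python; all names and default values below are illustrative assumptions, not values fixed by the patent:

```python
from dataclasses import dataclass

@dataclass
class EnvParams:
    """CSMA/CA backoff parameters of the WAPs and timing of the access framework."""
    cw: int = 16            # initial contention window size CW
    t_w: int = 1            # WiFi packet length T_W (in slots T_S)
    m: int = 6              # maximum backoff order
    beta_e: int = 10        # execution periods per feedback period: T_F = beta_e * T_E
    beta_sf: int = 10       # subframes per execution period:       T_E = beta_sf * T_SF
    beta_s: int = 9         # slots per subframe:                   T_SF = beta_s * T_S

@dataclass
class AgentParams:
    """Learning parameters of each SBS agent."""
    t0: float = 1.0         # initial Boltzmann temperature T_0
    rb_size: int = 10_000   # capacity of the experience replay buffer RB
    lr: float = 1e-3        # Q-network learning rate
    gamma: float = 0.99     # discount factor
    batch_size: int = 32    # minibatch size H
```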
Further, in step S2, before the formal training process begins, the state $s_t$ of each agent needs to be initialized. The state $s_t$ is a global variable defined as

$$s_t = \left[f_t,\; \phi_t^{W,1},\ldots,\phi_t^{W,K},\; \phi_t^{S,1},\ldots,\phi_t^{S,N}\right]$$

where $f_t$ is the fairness index of the coexisting network at time t, expressed as

$$f_t = \frac{\left(\sum_{k=1}^{K}\phi_t^{W,k} + \sum_{i=1}^{N}\phi_t^{S,i}\right)^2}{(K+N)\left(\sum_{k=1}^{K}\left(\phi_t^{W,k}\right)^2 + \sum_{i=1}^{N}\left(\phi_t^{S,i}\right)^2\right)}$$

where K is the number of WAPs in the coexisting network, N is the number of SBSs, and $\phi_t^{W,k}$ and $\phi_t^{S,i}$ denote the throughput of the k-th WAP and the i-th SBS over the interval $t' \in (t-T_F, t]$, defined as

$$\phi_t^{W,k} = \frac{\sum T_{succ}^{W,k}}{T_F},\qquad \phi_t^{S,i} = \frac{\sum T_{succ}^{S,i}}{T_F}$$

where $T_F$ denotes the length of the feedback period, and $\sum T_{succ}^{W,k}$ and $\sum T_{succ}^{S,i}$ denote the total packet or frame length successfully transmitted by the k-th WAP and the i-th SBS within the current feedback period. Throughput is therefore the ratio of the length of successfully transmitted packets or frames to the feedback period $T_F$. In addition, each agent contains an experience replay buffer RB that stores past experience samples for training the Q-network. RB is a queue-type memory of limited size, which must be set in advance.
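The state construction of step S2 can be illustrated as follows; this is a sketch under the notation above, with hypothetical helper names:

```python
import numpy as np

def jain_fairness(throughputs: np.ndarray) -> float:
    """Jain's fairness index over the per-node throughputs (K WAPs + N SBSs)."""
    s = throughputs.sum()
    return float(s ** 2 / (len(throughputs) * (throughputs ** 2).sum())) if s > 0 else 0.0

def build_state(wap_tp: np.ndarray, sbs_tp: np.ndarray) -> np.ndarray:
    """Global state s_t = [f_t, per-node throughputs] over the last feedback period."""
    tp = np.concatenate([wap_tp, sbs_tp])
    return np.concatenate([[jain_fairness(tp)], tp])

# Example: 2 WAPs and 2 SBSs; throughput = fraction of T_F successfully used
s_t = build_state(np.array([0.2, 0.25]), np.array([0.3, 0.15]))
```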
Further, in step S3, the agent selects the next action to perform according to the current state $s_t$ and the Boltzmann policy. The Boltzmann policy is expressed as

$$\pi(a \mid s) = \frac{\exp\!\left(Q(s,a)/T\right)}{\sum_{a'}\exp\!\left(Q(s,a')/T\right)}$$

where $Q(s,a)$ is the action-value function, i.e. the value of selecting action a in the current state s, used to measure the quality of the action; T denotes the current temperature, and $T_0$ and N denote the initial temperature and the number of times the corresponding action has been selected, respectively. T decreases gradually over the training iterations: at the very beginning of training the agent has explored the environment insufficiently, so it tends to execute random actions to explore; as training proceeds, the agent increasingly tends to select actions using the knowledge it has learned. Further, the action is defined as

$$a_t = [AT_t,\; TX_t]$$

where $AT_t \in \{0, T_{SF}, 2T_{SF}, \ldots, N T_{SF}\}$ denotes the access time, an integer multiple of the SBS basic transmission unit, the subframe $T_{SF}$, and $TX_t \in \{T_{SF}, 2T_{SF}, \ldots, M T_{SF}\}$ denotes the transmission duration after access, also an integer multiple of $T_{SF}$. The agent needs to learn a control policy that indicates, in the current state, when to access the channel and for how long to transmit after access.
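A sketch of Boltzmann action selection in Python. The annealing schedule $T = T_0/\sqrt{n+1}$ below is one plausible rule consistent with the statement that T decays from $T_0$ as selections accumulate; the patent does not fix the exact formula:

```python
import numpy as np

def boltzmann_action(q_values: np.ndarray, t0: float, n_selections: int) -> int:
    """Sample an action index from the Boltzmann (softmax) distribution over Q-values.

    Assumed annealing: T = T_0 / sqrt(n + 1), so early training is near-random
    (exploration) and later training concentrates on high-value actions.
    """
    temp = t0 / np.sqrt(n_selections + 1.0)
    logits = q_values / temp
    logits -= logits.max()                  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum()
    return int(np.random.choice(len(q_values), p=probs))
```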
Further, in step S4, each agent repeatedly executes its action over the following $\beta_E$ execution periods $T_E$, which allows the agent to observe the environment dynamics, i.e. the changes in traffic patterns, on a larger time scale; the resulting $r_t$ and $s_{t+1}$ make the evaluation of the action value $Q(s_t, a_t)$ more accurate and the learning convergence faster. The reward $r_t$ is expressed as

$$r_t = f_t \cdot \phi_t^{\mathrm{total}}$$

where the total throughput of the coexisting network $\phi_t^{\mathrm{total}}$ is defined as

$$\phi_t^{\mathrm{total}} = \sum_{k=1}^{K}\phi_t^{W,k} + \sum_{i=1}^{N}\phi_t^{S,i}$$

The definition of the reward follows the objective pursued here, i.e. maximizing the total throughput of the coexisting network while ensuring fairness: unilaterally increasing throughput or fairness brings only a small reward, and only a simultaneous increase in both yields a large reward.
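Under the multiplicative reading of the reward reconstructed above (our interpretation of the description, with hypothetical function names), the computation is straightforward:

```python
import numpy as np

def total_throughput(wap_tp: np.ndarray, sbs_tp: np.ndarray) -> float:
    """phi_t^total: sum of all per-node throughputs in the coexisting network."""
    return float(wap_tp.sum() + sbs_tp.sum())

def reward(fairness: float, phi_total: float) -> float:
    """r_t = f_t * phi_t^total: large only when throughput AND fairness are
    high simultaneously, matching the stated design of the reward."""
    return fairness * phi_total
```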
Further, in step S5, the nodes exchange information in the last execution period of each feedback period to obtain the throughput and action information of the other nodes; the former is used to compute the total throughput, fairness, and rewards, while the latter is used to compute the mean action in mean field theory. Specifically, mean field theory is used to guarantee the convergence of the reinforcement learning algorithm and to reduce the computational complexity in a multi-agent environment. The derivation starts from the decomposition of the Q-function:

$$Q^k(s, \boldsymbol{a}) = \frac{1}{N_k}\sum_{j\in\mathcal{N}(k)} Q^k\!\left(s, a^k, a^j\right)$$

where $\mathcal{N}(k)$ is the index set of the neighboring agents of agent k, whose size $N_k = |\mathcal{N}(k)|$ depends on the application setting, and $\boldsymbol{a}$ denotes the joint action of all agents. The equation above states that the Q-function of the k-th agent, which measures the value of the joint action, is approximated by the average value of its pairwise interactions with neighboring agents. The mean action of the neighbors of the k-th agent is then

$$\bar{a}^k = \frac{1}{N_k}\sum_{j\in\mathcal{N}(k)} a^j$$

which can be understood as the empirical distribution of the neighboring agents' actions. Furthermore, for a neighboring agent j, its action $a^j$ can be expressed as the mean action $\bar{a}^k$ plus a perturbation term $\delta^{j,k}$:

$$a^j = \bar{a}^k + \delta^{j,k}$$

By Taylor's theorem, when $Q^k(s, a^k, a^j)$ is twice differentiable with respect to $a^j$, $Q^k(s, \boldsymbol{a})$ can be expressed as

$$Q^k(s,\boldsymbol{a}) = \frac{1}{N_k}\sum_{j\in\mathcal{N}(k)}\Big[Q^k\!\left(s,a^k,\bar a^k\right) + \nabla_{\bar a^k}Q^k\!\left(s,a^k,\bar a^k\right)\cdot\delta^{j,k} + \tfrac{1}{2}\,\delta^{j,k}\cdot\nabla^2_{\tilde a^{j,k}}Q^k\!\left(s,a^k,\tilde a^{j,k}\right)\cdot\delta^{j,k}\Big]$$

$$= Q^k\!\left(s,a^k,\bar a^k\right) + \nabla_{\bar a^k}Q^k\!\left(s,a^k,\bar a^k\right)\cdot\Big[\tfrac{1}{N_k}\sum_{j}\delta^{j,k}\Big] + \tfrac{1}{2N_k}\sum_{j} R^k_{s,a^k}\!\left(a^j\right) \approx Q^k\!\left(s,a^k,\bar a^k\right)$$

where $\sum_j \delta^{j,k} = 0$, and $R^k_{s,a^k}(a^j) = \delta^{j,k}\cdot\nabla^2_{\tilde a^{j,k}}Q^k(s,a^k,\tilde a^{j,k})\cdot\delta^{j,k}$ is the Taylor remainder, with $\tilde a^{j,k} = \bar a^k + \epsilon^{j,k}\,\delta^{j,k}$ for some $\epsilon^{j,k}\in[0,1]$. In fact, $R^k_{s,a^k}(a^j)$ can be shown, under mild conditions, to be a small perturbation term close to 0 and can be neglected. The Q-function of agent k has thus been decomposed as

$$Q^k(s, \boldsymbol{a}) \approx Q^k\!\left(s, a^k, \bar a^k\right)$$

Note that the joint-action dimension in the Q-function no longer grows exponentially with the number of agents, which greatly reduces the computational complexity of the algorithm regardless of the number of agents. Moreover, only the neighboring agents' actions at the current time t need to be considered, not their historical behavior. The update rule of the approximate mean field Q-function is

$$Q^k_{t+1}\!\left(s, a^k, \bar a^k\right) = (1-\alpha)\,Q^k_t\!\left(s, a^k, \bar a^k\right) + \alpha\left[r^k + \gamma\, v^{MF}_t\!\left(s'\right)\right]$$

where the mean field value function $v^{MF}_t(s')$ is computed as

$$v^{MF}_t\!\left(s'\right) = \sum_{a^k}\pi^k_t\!\left(a^k \mid s', \bar a^k\right) Q^k_t\!\left(s', a^k, \bar a^k\right)$$
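The mean-action and mean-field target computations can be sketched as follows, directly following the update rule above; the array shapes and function names are our assumptions:

```python
import numpy as np

def mean_action(neighbor_actions: list[int], num_actions: int) -> np.ndarray:
    """Mean action bar_a^k: the empirical distribution (average one-hot
    encoding) of the neighboring agents' discrete actions."""
    bar_a = np.zeros(num_actions)
    for a in neighbor_actions:
        bar_a[a] += 1.0
    return bar_a / max(len(neighbor_actions), 1)

def mean_field_value(q_next: np.ndarray, pi_next: np.ndarray) -> float:
    """v^MF(s') = sum_a pi(a | s', bar_a) * Q(s', a, bar_a)."""
    return float((pi_next * q_next).sum())

def td_target(r: float, gamma: float, q_next: np.ndarray, pi_next: np.ndarray) -> float:
    """Target of the mean-field update: r + gamma * v^MF(s')."""
    return r + gamma * mean_field_value(q_next, pi_next)
```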
further, in step S6, the agent will generate a transfer sample of the interaction with the environment
Figure BDA0003719383890000059
Figure BDA00037193838900000510
And storing to the RB. If the RB is full, the old sample is popped out from the head of the queue according to the property of the queue, and the new sample is added from the tail of the queue.
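A minimal FIFO replay buffer matching the queue behavior described in step S6 (a sketch; the tuple layout assumes the transition sample reconstructed above):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO experience replay: the oldest sample is evicted from
    the head of the queue when the buffer is full."""
    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)   # deque drops the oldest item when full

    def push(self, s, a, bar_a, r, s_next):
        self.buffer.append((s, a, bar_a, r, s_next))

    def sample(self, h: int):
        """Uniformly draw a minibatch of H transition samples."""
        return random.sample(list(self.buffer), min(h, len(self.buffer)))
```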
Further, in step S7, the agent randomly draws a minibatch of H samples from RB and updates the Q-network weights by applying gradient descent to the loss function, defined as the mean squared error:

$$L\!\left(\theta^k\right) = \frac{1}{H}\sum_{j=1}^{H}\left(y_j - Q\!\left(s_j, a^k_j, \bar a^k_j;\, \theta^k\right)\right)^2$$

where $y_j$ is the target value, computed with the participation of the target Q-network $Q'(\cdot)$ and defined as

$$y_j = r_j + \gamma\, v^{MF}\!\left(s_{j+1};\, \theta'^{\,k}\right)$$

In each iteration, minimizing the loss function via the gradient descent algorithm updates the weights of the Q-network and improves the agent's policy.
Further, in step S8, when the number of training iterations reaches the preset number, each agent has learned an optimal solution, i.e. an optimal access scheme.
The invention has the following beneficial effects: multiple small base stations can adjust their access actions in real time according to the state of the coexisting network, finally achieving the goals of maximizing total throughput while ensuring fairness, improving the spectrum utilization of the coexisting network, and guaranteeing the users' quality of service.
Drawings
To make the objectives, technical solutions, and beneficial effects of the invention clearer, the following drawings are provided for illustration:
FIG. 1 is a flowchart of a deep reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of reinforcement learning interaction according to an embodiment of the present invention;
fig. 3 is a diagram illustrating an SBS/WAP coexistence network model according to an embodiment of the present invention;
fig. 4 is an unlicensed spectrum access framework according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention provides an intelligent unlicensed spectrum access method based on mean field theory, aimed at the coexistence problem in an SBS/WAP coexistence network on the 5GHz unlicensed spectrum. Compared with conventional coexistence mechanisms, in the present invention the SBSs can adaptively adjust their access timing and transmission duration by cooperating with each other; the procedure is shown in fig. 1. An SBS selects the access action $a_t$ via the Boltzmann policy, then performs the action on the environment and obtains the feedback $r_t$ and $s_{t+1}$; it then learns and improves its policy using past experience, and finally learns an optimal access policy through many training iterations. This process matches the interaction of multiple agents with an environment in a cooperative setting, as shown in fig. 2.
The coexistence network considered here has N SBSs and K WAPs, and the system model is shown in fig. 3. The SBSs access the channel using the proposed algorithm, while the WAPs access the channel using the CSMA/CA protocol. All WAPs employ identical backoff parameters, since unfairness caused by heterogeneous backoff parameters is outside the scope of the present invention.
In order to describe the interaction of each node with the environment and to make the optimization objective explicit, an unlicensed spectrum access framework is established to model the coexistence network environment, as shown in fig. 4. In the access framework, the time resource is divided into two layers at different time scales. The upper layer describes the process of information exchange and agent decision-making on a large time scale. Specifically, the time resource is divided into several feedback periods $T_F$, and each feedback period further comprises several execution periods $T_E$. Information is exchanged between the SBSs and WAPs with period $T_F$; each SBS generates an action according to its own policy and the received information, and then performs that action in the execution periods contained in the current feedback period. In the lower layer, the execution period $T_E$ is further divided into smaller granularities to accommodate the protocols of the SBSs and WAPs. Specifically, $T_E$ is divided into several subframes $T_{SF}$, the basic scheduling unit of the SBSs, and $T_{SF}$ is further divided into several time slots $T_S$, the basic unit of the WAPs' transmitted packet length. In summary, $T_F = \beta_E T_E$, $T_E = \beta_{SF} T_{SF}$, and $T_{SF} = \beta_S T_S$, where $\beta_E$, $\beta_{SF}$ and $\beta_S$ are integers.
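A small numerical sketch of the two-level time hierarchy; the slot length and β values below are hypothetical, chosen only to show the arithmetic $T_F = \beta_E \beta_{SF} \beta_S T_S$:

```python
def frame_hierarchy(t_s_us: float, beta_s: int, beta_sf: int, beta_e: int):
    """Compute T_SF, T_E, T_F (in microseconds) from the slot length T_S
    and the integer ratios: T_SF = beta_S*T_S, T_E = beta_SF*T_SF, T_F = beta_E*T_E."""
    t_sf = beta_s * t_s_us
    t_e = beta_sf * t_sf
    t_f = beta_e * t_e
    return t_sf, t_e, t_f

# Hypothetical numbers: 9 us WiFi slot, beta_S = 56, beta_SF = 10, beta_E = 10
print(frame_hierarchy(9.0, 56, 10, 10))   # -> (504.0, 5040.0, 50400.0)
```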
The goal is to let each small base station learn an access policy and, at each feedback period, generate and execute an access action according to the coexisting network state, thereby maximizing the total throughput and fairness of the coexisting network. The total throughput is defined as

$$\phi_t^{\mathrm{total}} = \sum_{k=1}^{K}\phi_t^{W,k} + \sum_{i=1}^{N}\phi_t^{S,i}$$

where $\phi_t^{W,k}$ and $\phi_t^{S,i}$ denote the throughput of the k-th WAP and the i-th SBS over the interval $t' \in (t-T_F, t]$, defined as

$$\phi_t^{W,k} = \frac{\sum T_{succ}^{W,k}}{T_F},\qquad \phi_t^{S,i} = \frac{\sum T_{succ}^{S,i}}{T_F}$$

Fairness is defined as

$$f_t = \frac{\left(\sum_{k=1}^{K}\phi_t^{W,k} + \sum_{i=1}^{N}\phi_t^{S,i}\right)^2}{(K+N)\left(\sum_{k=1}^{K}\left(\phi_t^{W,k}\right)^2 + \sum_{i=1}^{N}\left(\phi_t^{S,i}\right)^2\right)}$$

where $f_t \in [0,1]$: the larger the fairness index, the more balanced the time-resource allocation. When all nodes obtain an equal share of the time resource, the coexisting network reaches the fairest state, $f_t = 1$.
In the present algorithm, the state of an agent is defined as

$$s_t = \left[f_t,\; \phi_t^{W,1},\ldots,\phi_t^{W,K},\; \phi_t^{S,1},\ldots,\phi_t^{S,N}\right]$$

The state is thus a combination of the fairness index and the throughput of each node, which is highly correlated with the main objectives of the algorithm, i.e. throughput and fairness, and indicates well whether the wireless channel is fully and evenly utilized. Second, the action of an agent is defined as

$$a_t = [AT_t,\; TX_t]$$

where $AT_t \in \{0, T_{SF}, 2T_{SF}, \ldots, N T_{SF}\}$ denotes the access time and $TX_t \in \{T_{SF}, 2T_{SF}, \ldots, M T_{SF}\}$ denotes the transmission duration after access, both integer multiples of the SBS basic transmission unit, the subframe $T_{SF}$. The agent selects actions with the Boltzmann policy:

$$\pi(a \mid s) = \frac{\exp\!\left(Q(s,a)/T\right)}{\sum_{a'}\exp\!\left(Q(s,a')/T\right)}$$

Finally, the reward of an agent is defined as

$$r_t = f_t \cdot \phi_t^{\mathrm{total}}$$

where $f_t$ and $\phi_t^{\mathrm{total}}$ denote the fairness and the total throughput of the coexisting network, respectively.
As the number of SBSs (agents) in the coexisting network increases, the state space grows and the computational complexity of the algorithm increases exponentially. Directly extending single-agent deep Q-learning to a multi-agent environment leads to excessive computational complexity and cannot guarantee algorithm convergence, so mean field theory is used to overcome these challenges. Through information exchange, each agent receives the actions of the other agents, and the mean action of the neighbors of the k-th agent is

$$\bar{a}^k = \frac{1}{N_k}\sum_{j\in\mathcal{N}(k)} a^j$$

Subsequently, the transition sample $(s_t, a_t, \bar a_t, r_t, s_{t+1})$ generated by one interaction with the environment is stored into RB. The agent periodically draws a batch of samples from the experience replay buffer to train the Q-network and improve its access policy. Through continued training, each agent can learn an optimal policy.
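Tying steps S1–S8 together, a pseudocode-level training loop might look as follows; every interface here (env.reset, env.step, the agent methods) is a hypothetical placeholder, not an API defined by the patent, and the helper objects are those sketched above:

```python
def train(env, agents, episodes: int, beta_e: int):
    """Sketch of the overall loop: S2 reset, S3 select, S4 execute over beta_E
    execution periods, S5 exchange, S6 store, S7 learn, S8 return policies."""
    for ep in range(episodes):
        states = env.reset()                                               # S2
        for t in range(env.horizon):
            actions = [ag.select_action(s) for ag, s in zip(agents, states)]   # S3
            rewards, next_states = env.step(actions, repeat=beta_e)            # S4
            bar_actions = [ag.average_neighbor_action(actions) for ag in agents]  # S5
            for ag, s, a, ba, r, s2 in zip(agents, states, actions,
                                           bar_actions, rewards, next_states):
                ag.rb.push(s, a, ba, r, s2)                                # S6
                ag.learn()                                                 # S7
            states = next_states
    return [ag.policy() for ag in agents]                                  # S8
```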
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, while the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.

Claims (9)

1. An intelligent unlicensed spectrum access method based on mean field theory, characterized in that the method comprises the following steps:
S1: initialize the environment parameters and agent parameters;
S2: initialize each agent's state and experience replay buffer RB;
S3: generate an action $a_t$ according to the Boltzmann policy;
S4: execute action $a_t$ over the subsequent $\beta_E$ execution periods, receive the environmental feedback $r_t$, and update the state to $s_{t+1}$;
S5: exchange information among all nodes;
S6: store the transition sample $(s_t, a_t, \bar{a}_t, r_t, s_{t+1})$ into the experience replay buffer RB;
S7: randomly draw H transition samples from RB to update the Q-network;
S8: terminate training; each agent has obtained its optimal access strategy.
2. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S1, the environment is defined as the external entity with which the agent interacts, so the environment parameters include the backoff parameters of the WiFi access points and the time parameters of the proposed access framework. Specifically, the backoff parameters to be set include the initial window size CW, the WiFi packet length $T_W$, and the maximum backoff order m; the time parameters to be set include $\beta_E$, $\beta_{SF}$ and $\beta_S$. The agent parameters include the initial temperature $T_0$ of the Boltzmann policy, the size of the experience replay buffer, and the neural-network training parameters of each agent.
3. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S2, before the formal training process begins, the state $s_t$ of each agent needs to be initialized. The state $s_t$ is a global variable defined as

$$s_t = \left[f_t,\; \phi_t^{W,1},\ldots,\phi_t^{W,K},\; \phi_t^{S,1},\ldots,\phi_t^{S,N}\right]$$

where $f_t$ is the fairness index of the coexisting network at time t, expressed as

$$f_t = \frac{\left(\sum_{k=1}^{K}\phi_t^{W,k} + \sum_{i=1}^{N}\phi_t^{S,i}\right)^2}{(K+N)\left(\sum_{k=1}^{K}\left(\phi_t^{W,k}\right)^2 + \sum_{i=1}^{N}\left(\phi_t^{S,i}\right)^2\right)}$$

where K is the number of WAPs in the coexisting network, N is the number of SBSs, and $\phi_t^{W,k}$ and $\phi_t^{S,i}$ denote the throughput of the k-th WAP and the i-th SBS over the interval $t' \in (t-T_F, t]$, defined as

$$\phi_t^{W,k} = \frac{\sum T_{succ}^{W,k}}{T_F},\qquad \phi_t^{S,i} = \frac{\sum T_{succ}^{S,i}}{T_F}$$

where $T_F$ denotes the length of the feedback period, and $\sum T_{succ}^{W,k}$ and $\sum T_{succ}^{S,i}$ denote the total packet or frame length successfully transmitted by the k-th WAP and the i-th SBS within the current feedback period; throughput is therefore the ratio of the length of successfully transmitted packets or frames to the feedback period $T_F$. In addition, each agent contains an experience replay buffer RB that stores past experience samples for training the Q-network; RB is a queue-type memory of limited size, which must be set in advance.
4. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S3, the agent selects the next action to perform according to the current state $s_t$ and the Boltzmann policy. The Boltzmann policy is expressed as

$$\pi(a \mid s) = \frac{\exp\!\left(Q(s,a)/T\right)}{\sum_{a'}\exp\!\left(Q(s,a')/T\right)}$$

where $Q(s,a)$ is the action-value function, i.e. the value of selecting action a in the current state s, used to measure the quality of the action; T denotes the current temperature, and $T_0$ and N denote the initial temperature and the number of times the corresponding action has been selected, respectively. T decreases gradually over the training iterations: at the very beginning of training the agent has explored the environment insufficiently, so it tends to execute random actions to explore; as training proceeds, the agent increasingly tends to select actions using the knowledge it has learned. Further, the action is defined as

$$a_t = [AT_t,\; TX_t]$$

where $AT_t \in \{0, T_{SF}, 2T_{SF}, \ldots, N T_{SF}\}$ denotes the access time, an integer multiple of the SBS basic transmission unit, the subframe $T_{SF}$, and $TX_t \in \{T_{SF}, 2T_{SF}, \ldots, M T_{SF}\}$ denotes the transmission duration after access, also an integer multiple of $T_{SF}$. The agent needs to learn a control policy that indicates, in the current state, when to access the channel and for how long to transmit after access.
5. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S4, each agent repeatedly executes its action over the following $\beta_E$ execution periods $T_E$, which allows the agent to observe the environment dynamics, i.e. the changes in traffic patterns, on a larger time scale; the resulting $r_t$ and $s_{t+1}$ make the evaluation of the action value $Q(s_t, a_t)$ more accurate and the learning convergence faster. The reward $r_t$ is expressed as

$$r_t = f_t \cdot \phi_t^{\mathrm{total}}$$

where the total throughput of the coexisting network $\phi_t^{\mathrm{total}}$ is defined as

$$\phi_t^{\mathrm{total}} = \sum_{k=1}^{K}\phi_t^{W,k} + \sum_{i=1}^{N}\phi_t^{S,i}$$

The definition of the reward follows the objective pursued here, i.e. maximizing the total throughput of the coexisting network while ensuring fairness: unilaterally increasing throughput or fairness brings only a small reward, and only a simultaneous increase in both yields a large reward.
6. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S5, the nodes exchange information in the last execution period of each feedback period to obtain the throughput and action information of the other nodes; the former is used to compute the total throughput, fairness, and rewards, while the latter is used to compute the mean action in mean field theory. Specifically, mean field theory is used to guarantee the convergence of the reinforcement learning algorithm and to reduce the computational complexity in a multi-agent environment. The derivation starts from the decomposition of the Q-function:

$$Q^k(s, \boldsymbol{a}) = \frac{1}{N_k}\sum_{j\in\mathcal{N}(k)} Q^k\!\left(s, a^k, a^j\right)$$

where $\mathcal{N}(k)$ is the index set of the neighboring agents of agent k, whose size $N_k = |\mathcal{N}(k)|$ depends on the application setting, and $\boldsymbol{a}$ denotes the joint action of all agents. The equation above states that the Q-function of the k-th agent, which measures the value of the joint action, is approximated by the average value of its pairwise interactions with neighboring agents. The mean action of the neighbors of the k-th agent is then

$$\bar{a}^k = \frac{1}{N_k}\sum_{j\in\mathcal{N}(k)} a^j$$

which can be understood as the empirical distribution of the neighboring agents' actions. Furthermore, for a neighboring agent j, its action $a^j$ can be expressed as the mean action $\bar{a}^k$ plus a perturbation term $\delta^{j,k}$:

$$a^j = \bar{a}^k + \delta^{j,k}$$

By Taylor's theorem, when $Q^k(s, a^k, a^j)$ is twice differentiable with respect to $a^j$, $Q^k(s, \boldsymbol{a})$ can be expressed as

$$Q^k(s,\boldsymbol{a}) = \frac{1}{N_k}\sum_{j\in\mathcal{N}(k)}\Big[Q^k\!\left(s,a^k,\bar a^k\right) + \nabla_{\bar a^k}Q^k\!\left(s,a^k,\bar a^k\right)\cdot\delta^{j,k} + \tfrac{1}{2}\,\delta^{j,k}\cdot\nabla^2_{\tilde a^{j,k}}Q^k\!\left(s,a^k,\tilde a^{j,k}\right)\cdot\delta^{j,k}\Big] \approx Q^k\!\left(s,a^k,\bar a^k\right)$$

where $\sum_j \delta^{j,k} = 0$, and $R^k_{s,a^k}(a^j) = \delta^{j,k}\cdot\nabla^2_{\tilde a^{j,k}}Q^k(s,a^k,\tilde a^{j,k})\cdot\delta^{j,k}$ is the Taylor remainder. In fact, $R^k_{s,a^k}(a^j)$ can be shown, under mild conditions, to be a small perturbation term close to 0 and can be neglected. The Q-function of agent k has thus been decomposed as

$$Q^k(s, \boldsymbol{a}) \approx Q^k\!\left(s, a^k, \bar a^k\right)$$

Note that the joint-action dimension in the Q-function no longer grows exponentially with the number of agents, which greatly reduces the computational complexity of the algorithm regardless of the number of agents; moreover, only the neighboring agents' actions at the current time t need to be considered, not their historical behavior. The update rule of the approximate mean field Q-function is

$$Q^k_{t+1}\!\left(s, a^k, \bar a^k\right) = (1-\alpha)\,Q^k_t\!\left(s, a^k, \bar a^k\right) + \alpha\left[r^k + \gamma\, v^{MF}_t\!\left(s'\right)\right]$$

where the mean field value function $v^{MF}_t(s')$ is computed as

$$v^{MF}_t\!\left(s'\right) = \sum_{a^k}\pi^k_t\!\left(a^k \mid s', \bar a^k\right) Q^k_t\!\left(s', a^k, \bar a^k\right)$$
7. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S6, the agent stores the transition sample $(s_t, a_t, \bar a_t, r_t, s_{t+1})$ generated by its interaction with the environment into RB. If RB is full, the oldest sample is popped from the head of the queue, following the queue discipline, and the new sample is appended at the tail.
8. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S7, the agent randomly draws a minibatch of H samples from RB and updates the Q-network weights by applying gradient descent to the loss function, defined as the mean squared error:

$$L\!\left(\theta^k\right) = \frac{1}{H}\sum_{j=1}^{H}\left(y_j - Q\!\left(s_j, a^k_j, \bar a^k_j;\, \theta^k\right)\right)^2$$

where $y_j$ is the target value, computed with the participation of the target Q-network $Q'(\cdot)$ and defined as

$$y_j = r_j + \gamma\, v^{MF}\!\left(s_{j+1};\, \theta'^{\,k}\right)$$

In each iteration, minimizing the loss function via the gradient descent algorithm updates the weights of the Q-network and improves the agent's policy.
9. The intelligent unlicensed spectrum access method based on mean field theory according to claim 1, characterized in that: in step S8, when the number of training iterations reaches the preset number, each agent has learned an optimal solution, i.e. an optimal access scheme.
CN202210746022.7A 2022-06-29 2022-06-29 Intelligent unlicensed spectrum access method based on mean field Active CN115134026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210746022.7A CN115134026B (en) 2022-06-29 2022-06-29 Intelligent unlicensed spectrum access method based on mean field


Publications (2)

Publication Number Publication Date
CN115134026A true CN115134026A (en) 2022-09-30
CN115134026B CN115134026B (en) 2024-01-02

Family

ID=83379784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210746022.7A Active CN115134026B (en) 2022-06-29 2022-06-29 Intelligent unlicensed spectrum access method based on average field

Country Status (1)

Country Link
CN (1) CN115134026B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101726521B1 (en) * 2015-12-30 2017-04-12 숭실대학교산학협력단 D2D communication system that use the non-licensed band to the auxiliary band and Method of D2D system thereof
US20190223025A1 (en) * 2018-01-15 2019-07-18 Charter Communications Operating, Llc Methods and apparatus for allocation and reconciliation of quasi-licensed wireless spectrum across multiple entities
CN108924944A (en) * 2018-07-19 2018-11-30 重庆邮电大学 The dynamic optimization method of contention window value coexists in LTE and WiFi based on Q-learning algorithm
CN113316174A (en) * 2021-05-26 2021-08-27 重庆邮电大学 Intelligent access method for unlicensed spectrum
CN114363908A (en) * 2022-01-13 2022-04-15 重庆邮电大学 A2C-based unlicensed spectrum resource sharing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ERRONG PEI ET AL.: "A Chaotic Q-learning-Based Licensed Assisted Access Scheme Over the Unlicensed Spectrum", IEEE Transactions on Vehicular Technology, vol. 68, no. 10, XP011751151, DOI: 10.1109/TVT.2019.2936510
ERRONG PEI ET AL.: "A Deep Reinforcement Learning Based Spectrum Access Scheme in Unlicensed Bands", 2021 IEEE International Conference on Communications Workshops (ICC Workshops)
TIAN JIAQIANG ET AL.: "Research on the Development of Dynamic Spectrum Management Technology", Communications Technology, no. 4

Also Published As

Publication number Publication date
CN115134026B (en) 2024-01-02

Similar Documents

Publication Publication Date Title
Dakdouk et al. Reinforcement learning techniques for optimized channel hopping in IEEE 802.15.4-TSCH networks
WO2012072445A1 (en) Method and apparatus of communications
CN111050413B (en) Unmanned aerial vehicle CSMA access method based on adaptive adjustment strategy
CN113316174B (en) Intelligent access method for unlicensed spectrum
CN110035559B (en) Intelligent competition window size selection method based on chaotic Q-learning algorithm
CN113316154A (en) Authorized and unauthorized D2D communication resource joint intelligent distribution method
Ilahi et al. LoRaDRL: Deep reinforcement learning based adaptive PHY layer transmission parameters selection for LoRaWAN
Elsayed et al. Deep reinforcement learning for reducing latency in mission critical services
Bi et al. Deep reinforcement learning based power allocation for D2D network
Elsayed et al. Deep Q-learning for low-latency tactile applications: Microgrid communications
Sande et al. Access and radio resource management for IAB networks using deep reinforcement learning
Cao et al. Deep reinforcement learning mac for backscatter communications relying on Wi-Fi architecture
CN114501667A (en) Multi-channel access modeling and distributed implementation method considering service priority
Mazandarani et al. Self-sustaining multiple access with continual deep reinforcement learning for dynamic metaverse applications
Iturria-Rivera et al. Cooperate or not Cooperate: Transfer Learning with Multi-Armed Bandit for Spatial Reuse in Wi-Fi
Karmakar et al. SmartBond: A deep probabilistic machinery for smart channel bonding in IEEE 802.11 ac
CN115134026B (en) Intelligent unlicensed spectrum access method based on average field
Pei et al. A deep reinforcement learning based spectrum access scheme in unlicensed bands
CN113316156B (en) Intelligent coexistence method on unlicensed frequency band
Burgueño et al. Distributed deep reinforcement learning resource allocation scheme for industry 4.0 device-to-device scenarios
Pei et al. Intelligent Access to Unlicensed Spectrum: A Mean Field Based Deep Reinforcement Learning Approach
Cruz et al. Reinforcement Learning-based Wi-Fi Contention Window Optimization
Xu et al. Fair coexistence in unlicensed band for next generation multiple access: The art of learning
Lau et al. ADFPA–A Deep Reinforcement Learning-based Flow Priority Allocation Scheme for Throughput Optimization in FANETs
da SJ Cruz et al. Decentralized Deep Reinforcement Learning Approach for Channel Access Optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231212

Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.

Address before: 400065 No. 2, Chongwen Road, Nan'an District, Chongqing

Applicant before: Chongqing University of Posts and Telecommunications

Effective date of registration: 20231212

Address after: No. 291, West Section of Renmin Avenue, Cao'e Street, Shangyu District, Shaoxing City, Zhejiang Province, 312300

Applicant after: SHAOXING CITY SHANGYU DISTRICT SHUNXING ELECTRIC POWER CO.,LTD.

Applicant after: State Grid Zhejiang Electric Power Co., Ltd. Shaoxing Shangyu district power supply Co.

Applicant after: State Grid Zhejiang Electric Power Co., Ltd. Yuyao Power Supply Co.

Address before: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant before: Shenzhen Hongyue Information Technology Co.,Ltd.

GR01 Patent grant
GR01 Patent grant