CN113163479A - Cellular Internet of things uplink resource allocation method and electronic equipment - Google Patents
- Publication number
- CN113163479A (application CN202110164357.3A)
- Authority
- CN
- China
- Prior art keywords
- node
- agent
- strategy
- representing
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
- H04W72/0473—Wireless resource allocation based on the type of the allocated resource the resource being transmission power
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/06—TPC algorithms
- H04W52/14—Separate analysis of uplink or downlink
- H04W52/146—Uplink power control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/53—Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/56—Allocation or scheduling criteria for wireless resources based on priority criteria
Abstract
One or more embodiments of the present specification provide a cellular internet of things uplink resource allocation method and an electronic device. The method includes: treating each edge node and each direct-transmission node of the cellular internet of things as an agent; each agent selects an action $a_i$ from its action space $A_i$ using an exploration-exploitation strategy according to the current system state and executes it; a reward value for each agent is calculated from the executed action $a_i$ through a reward function; the agent's Q function for the current system state is updated according to its Q function, and the agent moves from the current system state into the next system state; the average estimation strategy and estimation strategy for the agent performing action $a_i$ are determined based on the agent's estimation strategy and average estimation strategy; once a preset number of iterations is reached, the uplink resources of the cellular internet of things are allocated according to the agent's optimal estimation strategy. The method provided by the disclosure can achieve effective allocation of cellular internet-of-things uplink resources.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of wireless communication technologies, and in particular, to an uplink resource allocation method for a cellular internet of things and an electronic device.
Background
As one of the three major 5G application scenarios, massive machine type communication (mMTC) is intended to provide connectivity for large-scale internet of things (IoT) devices. mMTC supports more than one million device connections with various QoS requirements per square kilometer, which creates opportunities for the Internet of Everything while posing new challenges for spectrum utilization, transmission delay, data throughput, and so on. Non-orthogonal multiple access (NOMA) is considered a key technology for effectively addressing these challenges. Compared with traditional orthogonal multiple access, NOMA allocates the limited resources non-orthogonally among devices in the power and code domains, improving spectrum efficiency, reducing access delay and signaling overhead, and offering clear advantages in supporting massive connectivity. The basic idea of NOMA is to use non-orthogonal transmission at the transmitting end, actively introducing interference information, and to demodulate at the receiving end using the successive interference cancellation (SIC) technique. SIC substantially improves spectrum efficiency and effectively enhances uplink and downlink network capacity. Given these unique advantages, NOMA has been incorporated by 3GPP into the technical part of the 5G mMTC standard, and resource management in NOMA is a hot research topic in the wireless communication field.
At present, because the performance of internet-of-things devices in large-scale cellular IoT application scenarios is generally poor, the successive interference cancellation (SIC) procedure required by NOMA transmission cannot be completed, and such devices cannot communicate effectively with the forwarding relay nodes and the base station; meanwhile, NOMA spectrum-resource sharing produces complicated interference situations. As a result, the uplink resources of the cellular internet of things cannot be allocated effectively.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a cellular internet of things uplink resource allocation method and an electronic device, so as to solve the problem that effective resource allocation cannot be performed on cellular internet of things uplink resources.
In view of the foregoing, one or more embodiments of the present specification provide a method for allocating uplink resources of a cellular internet of things, including:
taking each edge node and each direct-transmission node of the cellular internet of things as an agent, and performing the following operations for each agent until a preset number of iterations is reached:
the agent selects an action $a_i$ from its action space $A_i$ using an exploration-exploitation strategy according to its current system state, and performs the action $a_i$;
according to the executed action $a_i$, calculating a reward value for each agent through a reward function; and
updating the Q function of the agent for the current system state according to the agent's Q function, and moving the agent from the current system state into the next system state;
determining, based on the agent's estimation strategy and average estimation strategy, the average estimation strategy and estimation strategy for the agent performing the action $a_i$; and
in response to determining that the estimation-strategy value for the agent performing the action $a_i$ is greater than the average estimation-strategy value, adjusting the current estimation strategy using the learning rate $\delta_w$; otherwise adjusting the current estimation strategy using the learning rate $\delta_l$, where $\delta_l > \delta_w$;
obtaining the optimal estimation strategy once the above operations performed by the agents reach the preset number of iterations;
and allocating the uplink resources of the cellular internet of things according to the optimal estimation strategy.
Further, before taking each edge node and each direct-transmission node of the cellular internet of things as an agent and performing the above operations until the preset number of iterations is reached, the method further includes:
recording the initial Q-function value of each agent as 0, and determining a counter $X_i(S)$ for recording the number of occurrences of the system state $S$, an initial estimation strategy $\pi(S, a_i)$ of the agent, and an average estimation strategy $\bar{\pi}(S, a_i)$, where the initial estimation strategy and the initial average estimation strategy are both the uniform distribution $\pi(S, a_i) = \bar{\pi}(S, a_i) = 1/A_i(S)$, with $A_i(S)$ the number of actions available to agent $i$ in system state $S$.
Further, the system state $S$ is composed of the state $s_w$ of each direct-transmission node and the state $s_n$ of each edge node, where $S = \{s_w, s_n, w \in W, n \in N\}$;
in particular, the state $s_w$ of a direct-transmission node comprises the node's channel allocation coefficient $\lambda_{w,c}$, and the state $s_n$ of an edge node $n$ comprises the node's channel allocation coefficient $\eta_{n,r,c}$ and transmission power control coefficient $\theta_n$, where $\lambda_{w,c} \in \{0,1\}$, $s_w = \{\lambda_{w,c}, w \in W, c \in C\}$, $\eta_{n,r,c} \in \{0,1\}$, $\theta_n \in \{0.0, 0.2, 0.4, 0.6, 0.8, 1.0\}$, and $s_n = \{\eta_{n,r,c}, \theta_n, n \in N, r \in R, c \in C\}$.
Further, the reward function is denoted $rew(S, a_i)$. If the agent is an edge node $n$, the reward function is computed as $rew(S, a_i) = \tau_n$, the received signal-to-noise ratio at the base station of the signal sent by edge node $n$;
if the agent is a direct-transmission node $w$, the reward function is computed as $rew(S, a_i) = \tau_w$, the received signal-to-noise ratio at the base station of the signal sent by direct-transmission node $w$.
further, the method for determining the Q function of the agent in the current system state comprises:
denoting the Q function as $Q_i(S, a_i)$, it is updated as $Q_i(S, a_i) \leftarrow (1 - \delta_q)\, Q_i(S, a_i) + \delta_q \big[ rew(S, a_i) + \beta \max_{a_i'} Q_i(S', a_i') \big]$,
where $\delta_q$ represents the Q-function learning rate, $\beta$ represents the cumulative-reward discount coefficient, and $S'$ and $a_i'$ are respectively the next system state reached and the action performed there.
Further, the exploration-exploitation strategy is specifically the greedy strategy $\epsilon$-greedy, calculated as follows:
the probability that agent $i$ selects action $a_i$ given system state $S$ is denoted $p(a_i \mid S)$ and computed as $p(a_i \mid S) = 1 - \epsilon + \epsilon / A_i(S)$ if $a_i = \arg\max_a Q_i(S, a)$, and $p(a_i \mid S) = \epsilon / A_i(S)$ otherwise,
where $\epsilon$ represents the action selection probability with $0 < \epsilon < 1$, $Q_i(S, a_i)$ denotes the Q function, and $A_i(S)$ represents the number of actions that agent $i$ can perform in system state $S$.
Further, the average estimation strategy for the agent performing action $a_i$ is computed as $\bar{\pi}_i(S, a_i) \leftarrow \bar{\pi}_i(S, a_i) + \frac{1}{X_i(S)} \big( \pi_i(S, a_i) - \bar{\pi}_i(S, a_i) \big)$;
the estimation strategy for the agent performing action $a_i$ is computed as $\pi_i(S, a_i) \leftarrow \pi_i(S, a_i) + \Delta_{S, a_i}$,
where $\Delta_{S, a_i}$ represents the update step of the estimation strategy, computed as $\Delta_{S, a_i} = -\delta_{S, a_i}$ if $a_i \neq \arg\max_{a'} Q_i(S, a')$ and $\Delta_{S, a_i} = \sum_{a' \neq a_i} \delta_{S, a'}$ otherwise, with $\delta_{S, a_i} = \min\big( \pi_i(S, a_i), \frac{\delta}{A_i(S) - 1} \big)$,
where $\delta$ is the learning rate, taking a different value in the following two cases: $\delta = \delta_w$ if $\sum_a \pi_i(S, a) Q_i(S, a) > \sum_a \bar{\pi}_i(S, a) Q_i(S, a)$, and $\delta = \delta_l$ otherwise.
further, the method also comprises the following steps: determining a signal transmission model for communication among the edge node, the direct transfer node, the relay node and the base station based on a non-orthogonal multiple access (NOMA) technology and an Open Mobile Alliance (OMA) technology, wherein the signal transmission model specifically comprises:
determining N edge nodes, R relay nodes, W direct transmission nodes, and C channels under the base station, where N is {1,2,3, …, N }, R is {1,2,3, …, R }, W is {1,2,3, …, W }, and C is {1,2,3, …, C };
the relay node receives the signals sent by the edge nodes through the NOMA technology to obtain a first signal $y_r$, computed as $y_r = \sum_{n \in N} \sum_{c \in C} \eta_{n,r,c} H_{n,r} \sqrt{\theta_n P_n}\, S_n + \xi$,
where $H_{n,r}$ represents the channel gain from edge node $n$ to relay node $r$, $\theta_n$ represents the transmission power control coefficient of edge node $n$, $P_n$ represents the maximum transmit power of edge node $n$, $S_n$ represents the signal from edge node $n$, $\eta_{n,r,c}$ denotes the channel allocation coefficient, $\xi$ denotes an additive white Gaussian noise signal with $\xi \sim \mathcal{CN}(0, \sigma^2)$, and $\sigma^2$ represents the additive white Gaussian noise power, with $n \in N$ and $r \in R$;
further, $H_{n,r}$ is computed as $H_{n,r} = h_{n,r}\, d_{n,r}^{-\lambda/2}$,
where $h_{n,r}$ represents the small-scale fading of the channel from edge node $n$ to relay node $r$ and satisfies the Gaussian distribution $h_{n,r} \sim \mathcal{CN}(0, 1)$, $d_{n,r}$ denotes the distance from edge node $n$ to relay node $r$, and $\lambda$ is the path loss exponent;
the base station receives the first signal forwarded by the relay nodes through the OMA technology and the signals sent by the direct-transmission nodes through the NOMA technology, and decodes them by the successive interference cancellation (SIC) technique to obtain a second signal $y_{BS}$, computed as $y_{BS} = \sum_{w \in W} \sum_{c \in C} \lambda_{w,c} H_{w,BS} \sqrt{P_w}\, S_w + \sum_{r \in R} \mu_r H_{r,BS}\, y_r + \xi$,
where $H_{w,BS}$ represents the channel gain from direct-transmission node $w$ to the base station, $H_{r,BS}$ represents the channel gain from relay node $r$ to the base station, $P_w$ represents the transmission power of the direct-transmission node, $S_w$ represents the signal from the direct-transmission node, $\lambda_{w,c}$ denotes the channel allocation coefficient, and $\mu_r$ is the relay gain factor;
$H_{w,BS}$ is computed as $H_{w,BS} = h_{w,BS}\, d_{w,BS}^{-\lambda/2}$,
where $h_{w,BS}$ represents the small-scale fading of the channel from direct-transmission node $w$ to the base station and satisfies the Gaussian distribution $h_{w,BS} \sim \mathcal{CN}(0, 1)$, and $d_{w,BS}$ represents the distance from direct-transmission node $w$ to the base station;
$H_{r,BS}$ is computed as $H_{r,BS} = h_{r,BS}\, d_{r,BS}^{-\lambda/2}$,
where $h_{r,BS}$ represents the small-scale fading of the channel from relay node $r$ to the base station and satisfies the Gaussian distribution $h_{r,BS} \sim \mathcal{CN}(0, 1)$, and $d_{r,BS}$ represents the distance from relay node $r$ to the base station;
based on Shannon's theorem, calculating the rate $R_{sum}$ at which the base station receives the second signal, computed as $R_{sum} = \sum_{n \in N} B \log_2(1 + \tau_n) + \sum_{w \in W} B \log_2(1 + \tau_w)$,
where $B$ denotes the channel bandwidth, $\tau_n$ represents the received signal-to-noise ratio at the base station of the signal sent by edge node $n$ and amplified and forwarded by relay node $r$ on channel $c$, and $\tau_w$ represents the received signal-to-noise ratio at the base station of the signal transmitted by direct-transmission node $w$ over channel $c$;
in particular, $\tau_n$ is computed as $\tau_n = \dfrac{\theta_n P_n |H_{n,r}|^2}{\sum_{i \in N,\, i \neq n,\, \theta_i P_i < \theta_n P_n} \theta_i P_i |H_{i,r}|^2 + \sigma^2}$,
where $H_{i,r}$ denotes the channel gain from edge node $i$ to relay node $r$, $\theta_i$ represents the transmission power control coefficient of edge node $i$, $P_i$ represents the maximum transmit power of edge node $i$, and $\sigma^2$ is the additive white Gaussian noise power, with $i \in N$, $i \neq n$, and $\theta_i P_i < \theta_n P_n$;
$\tau_w$ is computed analogously as $\tau_w = \dfrac{P_w |H_{w,BS}|^2}{\sum_{j \in W,\, j \neq w} \lambda_{j,c} P_j |H_{j,BS}|^2 + \sigma^2}$.
further, the method also includes, afterwards:
limiting the transmission power of the edge nodes multiplexing the same channel, specifically comprising:
when $\eta_{n,r,c} = 1$, satisfying $\sum_{i \in N:\, \eta_{i,r,c} = 1} \theta_i P_i \leq Ptot_n$,
where $Ptot_n$ is the transmission power threshold, $i \neq n$, and $\theta_i P_i < \theta_n P_n$;
determining that each transmission link meets the system QoS requirement, specifically satisfying $\tau_n \geq \tau_o$ for all $n \in N$ and $\tau_w \geq \tau_o$ for all $w \in W$,
where $\tau_o$ represents the minimum value of the received signal-to-noise ratio;
limiting each edge node, direct-transmission node and relay node to be allocated only one channel, specifically satisfying $\sum_{r \in R} \sum_{c \in C} \eta_{n,r,c} \leq 1$ for all $n \in N$ and $\sum_{c \in C} \lambda_{w,c} \leq 1$ for all $w \in W$;
limiting the number of edge nodes that access each channel, specifically satisfying $\sum_{n \in N} \eta_{n,r,c} \leq q_{max}$ for all $r \in R$ and $c \in C$,
where $q_{max}$ represents the maximum number of edge nodes allowed to access each channel.
Based on the same inventive concept, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method as described in any one of the above items when executing the program.
As can be seen from the above description, in one or more embodiments of the present disclosure, each edge node and each direct-transmission node is regarded as an agent, and each agent performs its own action according to the state of the whole system. When the reward obtained by an agent is worse than expected, it quickly adjusts its learning rate to adapt to the policy changes of the other agents; when the reward obtained is better than expected, the agent learns cautiously, giving the other agents time to adapt to its policy change. Finally, each agent converges to its optimal estimation strategy, and resources are allocated to each edge node and direct-transmission node based on the optimal estimation strategies.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
Fig. 1 is a flowchart of a method for allocating uplink resources in a cellular internet of things according to one or more embodiments of the present disclosure;
FIG. 2 is a flow diagram of determining a signal transmission model in accordance with one or more embodiments of the present disclosure;
FIG. 3 is a flow diagram of optimizing a signal transmission model in accordance with one or more embodiments of the present disclosure;
fig. 4 is a schematic structural diagram of a cellular internet of things uplink resource allocation device according to one or more embodiments of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items.
As described in the background section, the existing NOMA-based cellular internet-of-things application scenario cannot efficiently allocate uplink resources. In implementing the present disclosure, the applicant found that, because of the poor performance of internet-of-things devices, the forwarding relay nodes and the base station cannot communicate effectively; meanwhile, NOMA spectrum-resource sharing produces complicated interference situations, so that uplink resources ultimately cannot be allocated effectively.
Hereinafter, the technical means of the present disclosure will be described in further detail with reference to specific examples.
In the multi-agent reinforcement learning algorithm WoLF-PHC (Win or Learn Fast - Policy Hill Climbing), WoLF means that parameters are adjusted only slowly when the behavior of an agent is better than expected, and the speed of parameter adjustment is increased when the behavior is worse than expected. PHC is a learning algorithm for a single agent in a stationary environment: through reinforcement learning, it increases the selection probability of the action that maximizes the cumulative expected reward, and finally converges to the optimal strategy.
Under the same base station of the same cell, edge nodes denote edge terminal node devices, direct-transmission nodes denote direct-transmission terminal node devices, and relay nodes denote relay-forwarding node devices. The relay nodes and direct-transmission nodes have good channel conditions and can communicate directly with the base station, while edge nodes with poor channel conditions in the cell cannot communicate directly with the base station and must communicate with it through a relay node in amplify-and-forward mode.
Referring to fig. 1, an uplink resource allocation method for a cellular internet of things according to an embodiment of the present specification includes the following steps:
step S101: taking each edge node and each direct transmission node of the cellular Internet of things as an agent, and executing the following operations of step S102-step S104 for each agent until the preset iteration number is reached.
Before this step, the initial Q-function value of each agent is recorded as 0, and a counter $X_i(S)$ for recording the number of occurrences of the system state $S$, an initial estimation strategy $\pi(S, a_i)$, and an average estimation strategy $\bar{\pi}(S, a_i)$ are determined, where the initial estimation strategy and the initial average estimation strategy are both the uniform distribution $\pi(S, a_i) = \bar{\pi}(S, a_i) = 1/A_i(S)$. The estimation strategy represents the probability of selecting each action in a given system state, and the average estimation strategy is the yardstick against which the estimation strategy is measured, so that the estimation strategy moves toward the optimal estimation strategy.
Here $a_i$ represents an action performed by the agent from its action space $A_i$, and the system state $S$ is composed of the states $s_w$ of the direct-transmission nodes and the states $s_n$ of the edge nodes, expressed as $S = \{s_w, s_n, w \in W, n \in N\}$.
Further, the state $s_w$ of a direct-transmission node comprises the node's channel allocation coefficient $\lambda_{w,c}$, and the state $s_n$ of an edge node comprises the node's channel allocation coefficient $\eta_{n,r,c}$ and transmission power control coefficient $\theta_n$, where:
$\lambda_{w,c} \in \{0, 1\}$
$s_w = \{\lambda_{w,c}, w \in W, c \in C\}$
$\eta_{n,r,c} \in \{0, 1\}$
$\theta_n \in \{0.0, 0.2, 0.4, 0.6, 0.8, 1.0\}$
$s_n = \{\eta_{n,r,c}, \theta_n, n \in N, r \in R, c \in C\}$
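As an illustration (not part of the patent), the per-node state variables above can be enumerated in a short sketch; the node, relay and channel counts below are hypothetical example values:

```python
import itertools

# The quantized power-control coefficients theta_n from the text.
THETA = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

def direct_node_states(num_channels):
    """All states s_w of a direct-transmission node: one binary
    channel-allocation coefficient lambda_{w,c} per channel."""
    return list(itertools.product([0, 1], repeat=num_channels))

def edge_node_states(num_relays, num_channels):
    """All states s_n of an edge node: a binary eta_{n,r,c} per
    (relay, channel) pair plus one power coefficient theta_n."""
    eta = itertools.product([0, 1], repeat=num_relays * num_channels)
    return [(e, t) for e in eta for t in THETA]

print(len(direct_node_states(2)))    # 2^2 = 4 states
print(len(edge_node_states(1, 2)))   # 2^2 * 6 = 24 states
```

The combinatorial growth of these per-node states is why the text later stresses convergence within a bounded number of iterations.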
step S102: the agent selects an action space A by adopting an exploration-utilization strategy according to the current system state of the agentiAction a iniAnd executed.
In this step, an action space AiComprises thatThe following actions: adjusting signal transmission channels, adjusting connected relay nodes, and adjusting transmission power control. For example, there is an agent i, action ai∈AiIf the agent i directly transmits the node, lambda needs to be adjustedw,cIf the agent is an edge node, adjusting the channel allocation coefficient etan,r,cAnd a transmission power control coefficient thetanAnd (4) finishing.
The exploration-exploitation strategy chosen is the greedy strategy ($\epsilon$-greedy), and the action $a_i$ in action space $A_i$ is selected as follows:
the probability that agent $i$ selects action $a_i$ given system state $S$ is denoted $p(a_i \mid S)$ and computed as $p(a_i \mid S) = 1 - \epsilon + \epsilon / A_i(S)$ if $a_i = \arg\max_a Q_i(S, a)$, and $p(a_i \mid S) = \epsilon / A_i(S)$ otherwise,
where $\epsilon$ represents the action selection probability with $0 < \epsilon < 1$, $Q_i(S, a_i)$ denotes the Q function, and $A_i(S)$ represents the number of actions that agent $i$ can perform in system state $S$.
That is, with probability $\epsilon$ ($0 < \epsilon < 1$) agent $i$ selects an action uniformly at random from the action space $A_i$ in system state $S$, and otherwise selects the action with the largest Q value.
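A minimal sketch of this exploration-exploitation step (the action names and dict-of-Q-values interface are illustrative assumptions, not the patent's):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action from
    A_i(S); otherwise pick the action maximizing Q_i(S, a_i)."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)            # explore
    return max(actions, key=q_values.get)     # exploit

# Hypothetical Q values for one agent in one system state.
q = {"switch_channel": 0.3, "switch_relay": 0.9, "adjust_power": 0.1}
print(epsilon_greedy(q, epsilon=0.0))  # epsilon=0 is purely greedy -> switch_relay
```

A small fixed $\epsilon$ keeps the agents sampling alternative channel/relay/power choices even after a good strategy emerges.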
Step S103: calculating the reward value of each agent through the reward function according to the executed action $a_i$; updating the Q function of the agent for the current system state; and moving the agent from the current system state into the next system state.
In this step, after each agent has performed its action, the system calculates the agent's corresponding reward value, taking the received signal-to-noise ratio of the transmitted signal at the base station as the agent's reward. Specifically, the reward function is denoted $rew(S, a_i)$; if the agent is an edge node $n$, $rew(S, a_i) = \tau_n$;
if the agent is a direct-transmission node $w$, $rew(S, a_i) = \tau_w$.
It will be appreciated that the greater the received signal-to-noise ratio, the greater the reward value received by the agent. Each agent only needs to observe the state at the current moment, without observing the actions executed by the other agents or the reward values they obtain, and takes its own action, which influences the system so that it enters a new system state at the next moment.
The agent then updates its Q function $Q_i(S, a_i)$, specifically as $Q_i(S, a_i) \leftarrow (1 - \delta_q)\, Q_i(S, a_i) + \delta_q \big[ rew(S, a_i) + \beta \max_{a_i'} Q_i(S', a_i') \big]$,
where $\delta_q$ represents the Q-function learning rate, $\beta$ represents the cumulative-reward discount coefficient, and $S'$ and $a_i'$ are respectively the system state reached and the action performed at the next moment.
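The update can be sketched as follows (a plain-dict Q table; variable names are illustrative, not from the patent):

```python
def q_update(q, state, action, reward, next_state, delta_q, beta, actions):
    """One step of the update above:
    Q(S,a) <- (1-delta_q)*Q(S,a) + delta_q*(rew + beta*max_a' Q(S',a'))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = (1 - delta_q) * old + delta_q * (reward + beta * best_next)
    return q[(state, action)]

q = {}
v = q_update(q, "S0", "a0", reward=1.0, next_state="S1",
             delta_q=0.5, beta=0.9, actions=["a0", "a1"])
print(v)  # 0.5 * (1.0 + 0.9 * 0) = 0.5
```

Note each agent keys its table only by the observed system state and its own action, matching the text's point that agents need not observe each other's actions or rewards.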
Step S104: determining, based on the agent's estimation strategy and average estimation strategy, the average estimation strategy and estimation strategy for the agent performing action $a_i$; and, in response to determining that the estimation-strategy value for performing action $a_i$ is greater than the average estimation-strategy value, adjusting the current estimation strategy using the learning rate $\delta_w$, otherwise adjusting it using the learning rate $\delta_l$, where $\delta_l > \delta_w$.
In this step, the average estimation strategy for the currently executed action $a_i$ is updated as $\bar{\pi}_i(S, a_i) \leftarrow \bar{\pi}_i(S, a_i) + \frac{1}{X_i(S)} \big( \pi_i(S, a_i) - \bar{\pi}_i(S, a_i) \big)$.
Further, the estimation strategy for the currently executed action $a_i$ is updated as $\pi_i(S, a_i) \leftarrow \pi_i(S, a_i) + \Delta_{S, a_i}$,
where $\Delta_{S, a_i}$ represents the update step of the estimation strategy: $\Delta_{S, a_i} = -\delta_{S, a_i}$ if $a_i \neq \arg\max_{a'} Q_i(S, a')$ and $\Delta_{S, a_i} = \sum_{a' \neq a_i} \delta_{S, a'}$ otherwise, with $\delta_{S, a_i} = \min\big( \pi_i(S, a_i), \frac{\delta}{A_i(S) - 1} \big)$,
where $\delta$ is the learning rate, taking a different value in the following two cases: $\delta = \delta_w$ if $\sum_a \pi_i(S, a) Q_i(S, a) > \sum_a \bar{\pi}_i(S, a) Q_i(S, a)$, and $\delta = \delta_l$ otherwise.
In particular, the estimation strategy $\pi_i(S, a_i)$ of agent $i$ is compared with its average estimation strategy $\bar{\pi}_i(S, a_i)$: if $\sum_a \pi_i(S, a) Q_i(S, a) > \sum_a \bar{\pi}_i(S, a) Q_i(S, a)$, the estimation strategy is considered better, and otherwise the average estimation strategy is considered better. If the current action $a_i$ is not the one maximizing the Q value, $\Delta_{S, a_i}$ is negative, and otherwise $\Delta_{S, a_i}$ is positive, thereby increasing the selection probability of the action that maximizes the Q value.
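A compact sketch of one such WoLF-PHC strategy update for a single agent (plain dicts keyed by (state, action); the container layout is an assumption for illustration):

```python
def wolf_phc_update(pi, pi_bar, q, counts, state, actions, delta_w, delta_l):
    """Update pi_bar toward pi, choose delta_w ("winning") or
    delta_l ("losing"), then shift probability mass in pi toward
    the greedy action. counts tracks X_i(S), visits per state."""
    counts[state] = counts.get(state, 0) + 1
    for a in actions:  # average-estimation-strategy update
        pi_bar[(state, a)] += (pi[(state, a)] - pi_bar[(state, a)]) / counts[state]
    # "winning" if the estimation strategy outscores the average one
    win = (sum(pi[(state, a)] * q.get((state, a), 0.0) for a in actions) >
           sum(pi_bar[(state, a)] * q.get((state, a), 0.0) for a in actions))
    delta = delta_w if win else delta_l
    greedy = max(actions, key=lambda a: q.get((state, a), 0.0))
    moved = 0.0
    for a in actions:  # policy-hill-climbing step, capped so pi stays valid
        if a != greedy:
            step = min(pi[(state, a)], delta / (len(actions) - 1))
            pi[(state, a)] -= step
            moved += step
    pi[(state, greedy)] += moved

pi = {("S", "a"): 0.5, ("S", "b"): 0.5}
pi_bar = dict(pi)
q = {("S", "a"): 0.0, ("S", "b"): 1.0}
wolf_phc_update(pi, pi_bar, q, {}, "S", ["a", "b"], delta_w=0.05, delta_l=0.2)
print(pi[("S", "b")])  # probability of the greedy action grows (0.5 -> 0.7)
```

Because $\delta_l > \delta_w$, a "losing" agent moves its strategy quickly while a "winning" agent moves cautiously, which is the convergence mechanism the text describes.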
Step S105: when the number of occurrences of the system state reaches the preset number of iterations, the optimal estimation strategy of each agent is obtained, and the cellular internet-of-things uplink resources are allocated according to the optimal estimation strategy.
In summary, when the estimation strategy is better, the learning rate of the estimation-strategy update becomes slower; when the average estimation strategy is better, the learning rate of the estimation-strategy update becomes faster. That is, when the agent's behavior is better than expected, parameters are adjusted slowly via $\delta_w$; when the agent's behavior is worse than expected, parameters are adjusted quickly via $\delta_l$.
Therefore, the method provided by this embodiment is an online reinforcement-learning uplink resource allocation scheme. Considering the complex interference caused by NOMA spectrum-resource sharing, in real cellular internet-of-things communication the computational complexity grows rapidly as the number of terminal devices increases. The multi-agent reinforcement learning model, however, lets the system converge to a stable resource allocation scheme within the specified number of iterations. The method can therefore achieve effective allocation of cellular internet-of-things uplink resources.
It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities.
As an optional embodiment, the method further includes, before step S101: determining a signal transmission model for communication among the edge nodes, the direct-transmission nodes, the relay nodes and the base station based on the NOMA and OMA technologies.
With reference to fig. 2, determining the signal transmission model specifically includes:
step S201: n edge nodes, R relay nodes, W direct transfer nodes and C channels under a base station are determined.
In this step, $N = \{1, 2, 3, \ldots, N\}$, $R = \{1, 2, 3, \ldots, R\}$, $W = \{1, 2, 3, \ldots, W\}$, and $C = \{1, 2, 3, \ldots, C\}$.
Step S202: the relay node receives the signals sent by the edge nodes through the NOMA technology to obtain a first signal $y_r$.
In this step, the first signal $y_r$ is computed as $y_r = \sum_{n \in N} \sum_{c \in C} \eta_{n,r,c} H_{n,r} \sqrt{\theta_n P_n}\, S_n + \xi$,
where $H_{n,r}$ represents the channel gain from edge node $n$ to relay node $r$, $\theta_n$ represents the transmission power control coefficient of edge node $n$, $P_n$ represents the maximum transmit power of edge node $n$, $S_n$ represents the signal from edge node $n$, $\eta_{n,r,c}$ denotes the channel allocation coefficient, $\xi$ denotes an additive white Gaussian noise signal with $\xi \sim \mathcal{CN}(0, \sigma^2)$, and $\sigma^2$ represents the additive white Gaussian noise power, with $n \in N$ and $r \in R$.
Further, H_{n,r} is calculated as follows:
H_{n,r} = h_{n,r} · d_{n,r}^(−λ/2)
wherein h_{n,r} represents the small-scale fading of the channel from edge node n to relay node r and satisfies the Gaussian distribution h_{n,r} ~ CN(0, 1), d_{n,r} denotes the distance from edge node n to relay node r, and λ is the path loss exponent.
The edge nodes communicate with the base station over two hops. In the first hop, edge nodes multiplexing the same subchannel transmit their signals to the relay node in NOMA mode and perform NOMA power control during transmission. The relay operates in amplify-and-forward (AF) mode, and the superimposed signals are finally demodulated at the base station by the SIC technology.
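As an illustrative aid (not part of the patent disclosure), the first hop described above can be sketched in Python. The function names, the transmit amplitude √(θ_n·P_n), and the CN(0, 1) small-scale fading model are assumptions drawn from the definitions of H_{n,r}, θ_n, P_n, S_n and η_{n,r,c} above:

```python
import math
import random

def channel_gain(d, lam, rng):
    """Sketch of H_{n,r}: small-scale Rayleigh fading h ~ CN(0, 1)
    scaled by the large-scale path-loss term d^(-lam/2)."""
    h = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
    return h * d ** (-lam / 2)

def relay_received_signal(signals, gains, theta, p_max, eta, noise):
    """First-hop NOMA superposition at the relay:
    y_r = sum_n eta_{n,r,c} * sqrt(theta_n * P_n) * H_{n,r} * S_n + xi."""
    y = noise
    for s, h, th, p, e in zip(signals, gains, theta, p_max, eta):
        y += e * math.sqrt(th * p) * h * s
    return y
```

With η_{n,r,c} = 0 a node's contribution vanishes, so only the edge nodes actually assigned to subchannel c are superimposed at the relay.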
Step S203: the base station receives the first signal forwarded by the relay node through the OMA technology and the signals sent by the direct transfer nodes through the NOMA technology to obtain a second signal y_BS.
In this step, the second signal y_BS is calculated as follows:
y_BS = Σ_{w∈W} λ_{w,c} · √(P_w) · H_{w,BS} · S_w + Σ_{r∈R} μ_r · H_{r,BS} · y_r + ξ
wherein H_{w,BS} represents the channel gain from direct transfer node w to the base station, H_{r,BS} represents the channel gain from relay node r to the base station, P_w represents the transmission power of the direct transfer node, S_w represents the signal from the direct transfer node, λ_{w,c} denotes the channel allocation coefficient, and μ_r is the relay gain factor.
H_{w,BS} is calculated as follows:
H_{w,BS} = h_{w,BS} · d_{w,BS}^(−λ/2)
wherein h_{w,BS} represents the small-scale fading of the channel from direct transfer node w to the base station and satisfies the Gaussian distribution h_{w,BS} ~ CN(0, 1), and d_{w,BS} represents the distance from direct transfer node w to the base station.
H_{r,BS} is calculated as follows:
H_{r,BS} = h_{r,BS} · d_{r,BS}^(−λ/2)
wherein h_{r,BS} represents the small-scale fading of the channel from relay node r to the base station and satisfies the Gaussian distribution h_{r,BS} ~ CN(0, 1), and d_{r,BS} represents the distance from relay node r to the base station.
Further, if channel c is assigned to direct transfer node w for transmitting signals to the base station, then λ_{w,c} = 1; otherwise, λ_{w,c} = 0.
It can be understood that the second hop refers to the transmission from the relay node to the base station. Considering the performance limitations of the relay, the second hop transmits the signal in OMA mode directly using AF: the relay node merely receives the signal from the edge nodes, amplifies it, and forwards it to the base station without performing any decoding operation on the signal; the SIC decoding operation is carried out at the base station.
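The AF second hop can be sketched similarly. This is an illustrative sketch, not the patented implementation: the amplify-and-forward gain factor μ_r is assumed (hypothetically) to rescale the relay's received power to the relay's own power budget, and the function names are invented for illustration:

```python
import math

def af_relay_gain(received_power, relay_power):
    """Hypothetical AF gain factor mu_r: scales the relay's received
    signal so it is retransmitted at the relay's power budget."""
    return math.sqrt(relay_power / received_power)

def bs_received_signal(direct_signals, direct_gains, direct_powers, lam_alloc,
                       relay_signals, relay_gains, mu, noise):
    """Second signal at the base station:
    y_BS = sum_w lam_{w,c} * sqrt(P_w) * H_{w,BS} * S_w
         + sum_r mu_r * H_{r,BS} * y_r + xi."""
    y = noise
    for s, h, p, lam in zip(direct_signals, direct_gains, direct_powers, lam_alloc):
        y += lam * math.sqrt(p) * h * s
    for yr, h, m in zip(relay_signals, relay_gains, mu):
        y += m * h * yr
    return y
```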
Step S204: based on Shannon's theorem, the receiving rate R_sum at which the base station receives the second signal is calculated.
In this step, the receiving rate R_sum is calculated as follows:
R_sum = B · Σ_{n∈N} log₂(1 + τ_n) + B · Σ_{w∈W} log₂(1 + τ_w)
wherein B denotes the channel bandwidth, τ_n denotes the received signal-to-noise ratio at the base station of the signal sent by edge node n and amplified and forwarded by relay node r on channel c, and τ_w denotes the received signal-to-noise ratio at the base station of the signal transmitted by direct transfer node w over channel c;
In particular, τ_n is calculated as follows:
τ_n = θ_n P_n |H_{n,r}|² / ( Σ_{i∈N, i≠n, θ_i P_i < θ_n P_n} θ_i P_i |H_{i,r}|² + σ² )
wherein H_{i,r} denotes the channel gain from edge node i to relay node r, θ_i represents the transmission power control coefficient of edge node i, P_i represents the maximum transmit power of edge node i, σ² is the additive white Gaussian noise power, i ∈ N, i ≠ n, and θ_i P_i < θ_n P_n.
H_{i,r} is calculated in the same way as H_{n,r} above, and is not described herein again.
τ_w is calculated in the same manner from the transmission power P_w of direct transfer node w, its channel gain H_{w,BS}, and the noise power σ².
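A hedged sketch of the rate computation follows. The SIC-based SINR below assumes that, after successive interference cancellation, only co-channel signals with lower received power remain as interference, which is consistent with the ordering θ_i·P_i < θ_n·P_n above; the function names are illustrative, not from the patent:

```python
import math

def sic_sinr(powers_rx, idx, noise_power):
    """SINR of node idx under SIC: stronger signals are assumed decoded
    and cancelled first, so only co-channel nodes with lower received
    power remain as interference."""
    interference = sum(p for j, p in enumerate(powers_rx)
                       if j != idx and p < powers_rx[idx])
    return powers_rx[idx] / (interference + noise_power)

def sum_rate(bandwidth, snrs):
    """Shannon sum rate R_sum = B * sum(log2(1 + tau))."""
    return bandwidth * sum(math.log2(1.0 + t) for t in snrs)
```

For example, with received powers [4, 1] and unit noise, the stronger node sees the weaker one as interference (SINR 4/2 = 2), while the weaker node is decoded after cancellation (SINR 1/1 = 1).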
as an alternative embodiment, in conjunction with fig. 3, step S204 may further include the following steps:
step S301: the transmission power of edge nodes multiplexing the same channel is limited.
The method specifically comprises the following steps:
When η_{n,r,c} = 1, the following condition is satisfied:
θ_n P_n − Σ_{i∈N, i≠n, θ_i P_i < θ_n P_n} θ_i P_i ≥ P_totn
wherein P_totn is the threshold value of the transmission power, i ≠ n, and θ_i P_i < θ_n P_n.
That is, the power of edge node n minus the sum of the powers of all edge nodes whose power is smaller than that of edge node n must be larger than the transmission power threshold P_totn; the threshold P_totn can be set according to actual conditions.
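The power-gap condition of step S301 can be checked directly. The following sketch is illustrative (the list of effective powers θ_i·P_i and the function name are assumptions) and mirrors the verbal description above:

```python
def satisfies_power_gap(powers, n, p_tot):
    """Check theta_n*P_n - sum of all smaller co-channel powers >= p_tot,
    i.e. the SIC decodability margin described in step S301.

    `powers` holds the effective transmit powers theta_i*P_i of the edge
    nodes multiplexing the same channel; `n` indexes the node under test.
    """
    weaker = sum(p for i, p in enumerate(powers) if i != n and p < powers[n])
    return powers[n] - weaker >= p_tot
```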
Step S302: it is determined that each transmission link satisfies system quality of service (QoS) requirements.
In this step, the conditions to be satisfied are as follows: τ_n ≥ τ_o and τ_w ≥ τ_o,
wherein τ_o represents the minimum value of the received signal-to-noise ratio.
It can be understood that each transmission link satisfies the QoS requirement of the system when the above condition holds; the value of τ_o can be set according to the actual situation and is not specifically limited herein.
Step S303: and limiting each edge node, the direct transmission node and the relay node to be allocated with only one channel.
In this step, the conditions to be satisfied are as follows: Σ_{r∈R} Σ_{c∈C} η_{n,r,c} ≤ 1 for each edge node n, and Σ_{c∈C} λ_{w,c} ≤ 1 for each direct transfer node w.
step S304: limiting the number of access edge nodes per channel.
In this step, the condition to be satisfied is as follows: Σ_{n∈N} η_{n,r,c} ≤ q_max,
wherein q_max represents the maximum number of edge nodes allowed to access each channel.
This embodiment performs system optimization for the hybrid transmission system model and ensures that the base station can successfully decode the received signal using the SIC technology.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above description describes certain embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to any of the above embodiments, one or more embodiments of the present specification further provide a cellular internet of things uplink resource allocation device.
Referring to fig. 4, the uplink resource allocation apparatus for cellular internet of things includes:
the estimation strategy iteration module 401: the method comprises the following steps that each edge node and each direct transfer node of the cellular Internet of things are used as agents, and the following operations are executed on the agents until the preset iteration number is reached: the agent selects an action space A by adopting an exploration-utilization strategy according to the current system state of the agentiAction a iniAnd performing the action ai(ii) a According to the action a executediCalculating a reward value for each of the agents by a reward function; determining a Q function of the intelligent agent in the current system state according to the Q function of the intelligent agent, and enabling the intelligent agent to enter the next system state from the current system state; determining that the agent performs the action a based on an estimation strategy, an average estimation strategy, of the agentiA temporal average estimation strategy and an estimation strategy; and performing the action a in response to determining that the agent performs the action aiThe estimated strategy value is larger than the average estimated strategy value, and the learning rate delta is usedwAdjusting the current estimation strategy, otherwise using the learning rate deltalAdjusting the current estimation strategy, where δl>δw(ii) a And the above operations executed by the intelligent agent reach the preset iteration times to obtain the optimal estimation strategy.
The uplink resource allocation module 402: configured to allocate the uplink resources of the cellular Internet of Things according to the optimal estimation strategy.
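The two-learning-rate update performed by module 401 follows the win-or-learn-fast pattern: the strategy is adjusted slowly (δ_w) when the agent's estimation strategy outperforms its average estimation strategy, and quickly (δ_l) otherwise. The sketch below is a simplified single-agent illustration, not the patented implementation; the temporal-difference Q update, the greedy projection of the strategy, and all parameter values are assumptions:

```python
def wolf_phc_step(Q, pi, pi_avg, counts, state, action, reward, next_state,
                  alpha=0.1, gamma=0.9, delta_w=0.02, delta_l=0.08):
    """One illustrative iteration of the module-401 update.

    Q, pi, pi_avg are dicts keyed by state, mapping actions to values /
    probabilities; counts[S] plays the role of the state counter X_i(S).
    Assumes each state offers at least two actions.
    """
    actions = list(pi[state])
    # Temporal-difference update of the Q function.
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    # Incrementally update the average estimation strategy.
    counts[state] += 1
    for a in actions:
        pi_avg[state][a] += (pi[state][a] - pi_avg[state][a]) / counts[state]
    # Winning (value under pi exceeds value under pi_avg) -> slow rate.
    v_pi = sum(pi[state][a] * Q[state][a] for a in actions)
    v_avg = sum(pi_avg[state][a] * Q[state][a] for a in actions)
    delta = delta_w if v_pi > v_avg else delta_l
    # Move the estimation strategy toward the greedy action, then renormalize.
    greedy = max(actions, key=lambda a: Q[state][a])
    for a in actions:
        step = delta if a == greedy else -delta / (len(actions) - 1)
        pi[state][a] = min(1.0, max(0.0, pi[state][a] + step))
    total = sum(pi[state].values())
    for a in actions:
        pi[state][a] /= total
    return Q, pi
```

Iterating this step over all agents until the preset iteration count yields the converged estimation strategy from which the uplink resources are allocated.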
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features of the above embodiments or of different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (10)
1. A cellular Internet of things uplink resource allocation method is characterized by comprising the following steps:
taking each edge node and each direct transmission node of the cellular Internet of Things as an agent, and performing the following operations on the agent until a preset number of iterations is reached:
the agent selects an action a_i from an action space A_i by adopting an exploration-utilization strategy according to its current system state, and performs the action a_i;
according to the performed action a_i, calculating a reward value of each agent through a reward function; and
determining the Q function of the agent in the current system state according to the Q function of the agent, and enabling the agent to enter the next system state from the current system state;
determining, based on the estimation strategy and the average estimation strategy of the agent, the average estimation strategy and the estimation strategy when the agent performs the action a_i; and
in response to determining that the estimation strategy value when the agent performs the action a_i is larger than the average estimation strategy value, adjusting the current estimation strategy using the learning rate δ_w; otherwise, adjusting the current estimation strategy using the learning rate δ_l, wherein δ_l > δ_w;
when the above operations performed by the agent reach the preset number of iterations, obtaining the optimal estimation strategy;
and according to the optimal estimation strategy, performing resource allocation on the uplink resources of the cellular Internet of things.
2. The method according to claim 1, wherein before taking each edge node and each direct transfer node of the cellular Internet of Things as an agent and performing the operations on the agent until the preset number of iterations is reached, the method further comprises:
recording the initial Q function value of the agent as 0, and determining a counter X_i(S) for recording the number of occurrences of the system state S, an initial estimation strategy π(S, a_i) of the agent, and an average estimation strategy π̄(S, a_i), wherein the initial estimation strategy and the initial average estimation strategy are both initialized as π(S, a_i) = π̄(S, a_i) = 1/A_i(S).
3. The method of claim 2, wherein the system state S is composed of the state s_w of the direct transfer node and the state s_n of the edge node, where S = {s_w, s_n, w ∈ W, n ∈ N};
In particular, the state s_w of the direct transfer node comprises the channel allocation coefficient λ_{w,c} of the direct transfer node, and the state s_n of the edge node comprises the channel allocation coefficient η_{n,r,c} of edge node n and the transmission power control coefficient θ_n, wherein λ_{w,c} ∈ {0, 1}, s_w = {λ_{w,c}, w ∈ W, c ∈ C}, η_{n,r,c} ∈ {0, 1}, θ_n ∈ {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}, and s_n = {η_{n,r,c}, θ_n, n ∈ N, r ∈ R, c ∈ C}.
5. The method of claim 4, wherein the Q function of the agent in the current system state is determined as follows:
recording the Q function as Q_i(S, a_i),
6. The method according to claim 5, wherein the exploration-utilization strategy is specifically a greedy strategy ε-greedy, which is calculated as follows:
the probability that agent i selects the action a_i in the system state S is denoted as p(a_i | S), and p(a_i | S) is calculated as follows:
wherein ε represents the action selection probability, 0 < ε < 1, Q_i(S, a_i) denotes the Q function, and A_i(S) represents the number of actions that agent i can perform in the system state S.
7. The method of claim 6, wherein the average estimation strategy when the agent performs the action a_i is calculated as follows:
the estimation strategy when the agent performs the action a_i is calculated as follows:
wherein the update step size of the estimation strategy is calculated as follows:
wherein δ is the learning rate, and δ takes a different value depending on the following two cases:
8. The method of claim 2, further comprising, before the method: determining a signal transmission model for communication among the edge nodes, the direct transfer nodes, the relay nodes and the base station based on the non-orthogonal multiple access (NOMA) technology and the orthogonal multiple access (OMA) technology, wherein the signal transmission model specifically comprises:
determining N edge nodes, R relay nodes, W direct transmission nodes, and C channels under the base station, where N = {1, 2, 3, …, N}, R = {1, 2, 3, …, R}, W = {1, 2, 3, …, W}, and C = {1, 2, 3, …, C};
the relay node receives the signal sent by the edge node through the NOMA technology to obtain a first signal y_r, the first signal y_r being calculated as follows:
y_r = Σ_{n∈N} η_{n,r,c} · √(θ_n P_n) · H_{n,r} · S_n + ξ
wherein H_{n,r} represents the channel gain from edge node n to relay node r, θ_n represents the transmission power control coefficient of edge node n, P_n represents the maximum transmit power of edge node n, S_n represents the signal from edge node n, η_{n,r,c} represents the channel allocation coefficient, ξ represents an additive white Gaussian noise signal with power σ², n ∈ N, and r ∈ R;
further, H_{n,r} is calculated as follows:
H_{n,r} = h_{n,r} · d_{n,r}^(−λ/2)
wherein h_{n,r} represents the small-scale fading of the channel from edge node n to relay node r and satisfies the Gaussian distribution h_{n,r} ~ CN(0, 1), d_{n,r} denotes the distance from edge node n to relay node r, and λ is the path loss exponent;
the base station receives the first signal sent by the relay node through the OMA technology and the signal sent by the direct transfer node through the NOMA technology, and obtains a second signal y_BS by decoding with the successive interference cancellation (SIC) technology, the second signal y_BS being calculated as follows:
y_BS = Σ_{w∈W} λ_{w,c} · √(P_w) · H_{w,BS} · S_w + Σ_{r∈R} μ_r · H_{r,BS} · y_r + ξ
wherein H_{w,BS} represents the channel gain from direct transfer node w to the base station, H_{r,BS} represents the channel gain from relay node r to the base station, P_w represents the transmission power of the direct transfer node, S_w represents the signal from the direct transfer node, λ_{w,c} denotes the channel allocation coefficient, and μ_r is the relay gain factor;
H_{w,BS} is calculated as follows:
H_{w,BS} = h_{w,BS} · d_{w,BS}^(−λ/2)
wherein h_{w,BS} represents the small-scale fading of the channel from direct transfer node w to the base station and satisfies the Gaussian distribution h_{w,BS} ~ CN(0, 1), and d_{w,BS} represents the distance from direct transfer node w to the base station;
H_{r,BS} is calculated as follows:
H_{r,BS} = h_{r,BS} · d_{r,BS}^(−λ/2)
wherein h_{r,BS} represents the small-scale fading of the channel from relay node r to the base station and satisfies the Gaussian distribution h_{r,BS} ~ CN(0, 1), and d_{r,BS} represents the distance from relay node r to the base station;
based on Shannon's theorem, calculating the receiving rate R_sum at which the base station receives the second signal, the receiving rate R_sum being calculated as follows:
R_sum = B · Σ_{n∈N} log₂(1 + τ_n) + B · Σ_{w∈W} log₂(1 + τ_w)
wherein B denotes the channel bandwidth, τ_n denotes the received signal-to-noise ratio at the base station of the signal sent by edge node n and amplified and forwarded by relay node r on channel c, and τ_w denotes the received signal-to-noise ratio at the base station of the signal transmitted by direct transfer node w over channel c;
in particular, τ_n is calculated as follows:
τ_n = θ_n P_n |H_{n,r}|² / ( Σ_{i∈N, i≠n, θ_i P_i < θ_n P_n} θ_i P_i |H_{i,r}|² + σ² )
wherein H_{i,r} denotes the channel gain from edge node i to relay node r, θ_i represents the transmission power control coefficient of edge node i, P_i represents the maximum transmit power of edge node i, σ² is the additive white Gaussian noise power, i ∈ N, i ≠ n, and θ_i P_i < θ_n P_n;
τ_w is calculated in the same manner from the transmission power P_w of direct transfer node w, its channel gain H_{w,BS}, and the noise power σ².
9. the method of claim 8, further comprising, after the method:
limiting the transmission power of the edge nodes multiplexing the same channel, specifically comprising:
when η_{n,r,c} = 1, satisfying
θ_n P_n − Σ_{i∈N, i≠n, θ_i P_i < θ_n P_n} θ_i P_i ≥ P_totn
wherein P_totn is the threshold value of the transmission power, i ≠ n, and θ_i P_i < θ_n P_n;
Determining that each transmission link meets the QoS requirement of the system, and specifically meeting the following conditions:
wherein, tauoA minimum value representing a received signal-to-noise ratio;
limiting each edge node, direct transfer node and relay node to be allocated only one channel, specifically satisfying: Σ_{r∈R} Σ_{c∈C} η_{n,r,c} ≤ 1 and Σ_{c∈C} λ_{w,c} ≤ 1;
limiting the number of edge nodes accessing each channel, specifically satisfying: Σ_{n∈N} η_{n,r,c} ≤ q_max,
wherein q_max represents the maximum number of edge nodes allowed to access each channel.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 9 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110164357.3A CN113163479A (en) | 2021-02-05 | 2021-02-05 | Cellular Internet of things uplink resource allocation method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113163479A true CN113163479A (en) | 2021-07-23 |
Family
ID=76882780
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110164357.3A Pending CN113163479A (en) | 2021-02-05 | 2021-02-05 | Cellular Internet of things uplink resource allocation method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113163479A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114339788A (en) * | 2022-01-06 | 2022-04-12 | 中山大学 | Multi-agent ad hoc network planning method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190156197A1 (en) * | 2017-11-22 | 2019-05-23 | International Business Machines Corporation | Method for adaptive exploration to accelerate deep reinforcement learning |
CN110418416A (en) * | 2019-07-26 | 2019-11-05 | 东南大学 | Resource allocation methods based on multiple agent intensified learning in mobile edge calculations system |
CN111385894A (en) * | 2020-03-17 | 2020-07-07 | 全球能源互联网研究院有限公司 | Transmission mode selection method and device based on online reinforcement learning |
CN111695690A (en) * | 2020-07-30 | 2020-09-22 | 航天欧华信息技术有限公司 | Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114339788A (en) * | 2022-01-06 | 2022-04-12 | 中山大学 | Multi-agent ad hoc network planning method and system |
CN114339788B (en) * | 2022-01-06 | 2023-11-17 | 中山大学 | Multi-agent ad hoc network planning method and system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210723 |