CN115915454A - SWIPT-assisted downlink resource allocation method and device - Google Patents


Info

Publication number: CN115915454A
Authority: CN (China)
Prior art keywords: user equipment; resource allocation; tolerant; machine user; machine
Legal status: Pending (Google Patents notes that the listed status is an assumption, not a legal conclusion)
Application number: CN202211225933.1A
Other languages: Chinese (zh)
Inventors: 陈硕, 卫醒, 李学华
Current assignee: Beijing Information Science and Technology University
Original assignee: Beijing Information Science and Technology University
Application filed by Beijing Information Science and Technology University
Priority to CN202211225933.1A
Publication of CN115915454A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The application discloses a SWIPT-assisted downlink resource allocation method and device. The method comprises: obtaining a state observation of the current environment state of a communication link between tolerant machine user equipment and a machine equipment gateway; based on that state observation, selecting a resource allocation strategy for the tolerant machine user equipment by using a resource allocation model constructed based on a neural network; and allocating downlink resources to the tolerant machine user equipment based on the selected resource allocation strategy, wherein the tolerant machine user equipment is machine user equipment that must complete the transmission of a periodically generated payload. The method and device solve the technical problems that energy-constrained machine user equipment may incur excessive energy consumption when establishing a communication link and that resources are unevenly distributed among the various Internet of Things devices; in particular, machine user equipment with low QoS (quality of service) requirements may receive poor resource allocations and therefore perform poorly.

Description

SWIPT-assisted downlink resource allocation method and device
Technical Field
The present application relates to the field of communications, and in particular, to a method and an apparatus for SWIPT-assisted downlink resource allocation.
Background
The rapid development of Internet of Things technology has given rise to data-driven fields such as the industrial Internet, intelligent transportation systems and smart cities, which are gradually driving the digitalization of human society.
Machine-to-machine (M2M) communication is a major driver of digitalization because it requires no direct human intervention and provides near-real-time wireless connectivity. With the exponential growth in the number of networked machine devices, M2M communication will open new possibilities for citizen services in home, entertainment, healthcare, work and other scenarios, and is expected to become a key enabler of the vision of interconnecting everything.
Unlike conventional human-to-human (H2H) communication, M2M communication services have distinctive characteristics in network energy efficiency, data volume and, for some mission-critical services, high-reliability transmission. Low-power wide-area networks have been developed within Internet of Things technology specifically for M2M communication and can provide long-range transmission at extremely low power consumption, but they offer only very low transmission rates and can hardly meet the QoS requirements of most critical M2M services. Cellular networks are therefore considered key to deploying M2M communication, because their larger bandwidth and greater scalability can accommodate many types of service access and provide high data rates. At the same time, cellular networks themselves are too power-hungry and costly for many M2M applications.
To mitigate these drawbacks of cellular networks for M2M communication, simultaneous wireless information and power transfer (SWIPT) is a promising technology that can greatly improve the energy efficiency of M2M devices in a cellular network where H2H and M2M communication coexist. Conventional energy harvesting techniques let a mobile device or base station draw energy from natural sources such as wind or sunlight, but their efficiency is severely constrained by geographic location and weather. SWIPT is a radio-frequency (RF) energy harvesting technology that gives devices a relatively controllable and stable energy supply. In general, it can convert interference signals into electrical energy: a strong-interference environment yields more harvested energy while reducing system throughput. Furthermore, SWIPT-assisted H2H/M2M coexisting cellular networks face many challenges in meeting users' diverse quality of service (QoS) requirements, such as intra-tier and inter-tier interference control, spectrum subband selection, and the trade-off between information transmission rate and harvested energy.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a SWIPT-assisted downlink resource allocation method and device, so as to at least solve the technical problem that machine user equipment receives poor resource allocations and therefore performs poorly.
According to an aspect of the embodiments of the present application, a SWIPT-assisted downlink resource allocation method is provided, including: obtaining a state observation of the current environment state of a communication link between tolerant machine user equipment and a machine equipment gateway; based on the state observation of the current environment state, selecting a resource allocation strategy for the tolerant machine user equipment by using a resource allocation model constructed based on a neural network; and allocating downlink resources to the tolerant machine user equipment based on the selected resource allocation strategy, wherein the tolerant machine user equipment is machine user equipment that must complete the transmission of a periodically generated payload.
According to another aspect of the embodiments of the present application, a SWIPT-assisted downlink resource allocation apparatus is also provided, including: an acquisition module configured to acquire a state observation of the current environment state of a communication link between tolerant machine user equipment and a machine equipment gateway; a selection module configured to select, based on the state observation of the current environment state, a resource allocation strategy for the tolerant machine user equipment by using a resource allocation model constructed based on a neural network; and an allocation module configured to allocate downlink resources to the tolerant machine user equipment based on the selected resource allocation strategy, wherein the tolerant machine user equipment is machine user equipment that must complete the transmission of a periodically generated payload.
In the embodiments of the application, a resource allocation strategy is selected for the tolerant machine user equipment by a neural-network-based resource allocation model using the state observation of the current environment state, which solves the technical problem that machine user equipment performs poorly because of poor resource allocation.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a method for SWIPT-assisted downlink resource allocation according to an embodiment of the present application;
fig. 2 is a flowchart of a method for constructing a downlink resource allocation model according to an embodiment of the present application;
fig. 3 is a flowchart of another method for SWIPT-assisted downlink resource allocation according to an embodiment of the present application;
fig. 4 is a flowchart of another method for SWIPT-assisted downlink resource allocation according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a SWIPT-assisted downlink resource allocation apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a SWIPT-assisted downlink resource allocation system according to an embodiment of the present application;
fig. 7 is a diagram comparing how the total energy efficiency of the M2M devices varies with the number of M2M devices according to an embodiment of the present application;
fig. 8 is a schematic diagram comparing how the QoS requirement satisfaction rate of H2H users varies with the number of tolerant M2M devices according to an embodiment of the present application;
fig. 9 is a schematic diagram comparing how the outage probability of the critical M2M links varies with the number of tolerant M2M devices according to an embodiment of the present application;
fig. 10 is a schematic diagram comparing how the payload transmission completion rate varies with the number of tolerant M2M devices according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present application, there is provided a SWIPT-assisted downlink resource allocation method, as shown in fig. 1, the method includes:
step S102, obtaining a state observation of the current environment state of the communication link between the tolerant machine user equipment and the machine equipment gateway.
Before this state observation is obtained, the machine user equipment is divided into tolerant machine user equipment and critical machine user equipment according to its service type. Critical machine user equipment is machine user equipment whose transmission reliability requirement is higher than a preset requirement threshold, and tolerant machine user equipment is machine user equipment that must complete the transmission of periodically generated payloads.
In addition to the conventional H2H (human-to-human) user equipment (also referred to as human user equipment or H2H users), this embodiment further divides machine-type communication devices (MTCDs) into tolerant machine user equipment (also referred to as tolerant M2M devices or tolerant M2M users) and critical machine user equipment (also referred to as key or critical M2M devices or users). Different resource allocation strategies can then be selected according to the traffic characteristics of the different machine user equipment, so that resources are allocated to them more reasonably.
In one example, the machine user equipment may be equipped with simultaneous wireless information and power transfer (SWIPT) capability, which enables each MTCD to harvest energy from the radio-frequency environment while decoding information at the same time.
Step S104, based on the state observation of the current environment state, selecting a resource allocation strategy for the tolerant machine user equipment by using a resource allocation model constructed based on a neural network.
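How a SWIPT receiver can harvest energy and decode at once is easiest to see with a power-splitting (PS) model. The sketch below assumes the common convention in which a fraction rho of the received RF power feeds a linear energy harvester with efficiency eta and the remaining 1 - rho feeds the information decoder. The function name, the linear harvester model and this convention for rho are illustrative assumptions, not details given in the application.

```python
import math


def ps_swipt_split(p_rx_w: float, rho: float, noise_w: float, eta: float = 0.6):
    """Split received RF power between energy harvesting (EH) and information decoding (ID).

    Assumed model (not taken from the application): a fraction `rho` of the
    received power feeds a linear harvester with efficiency `eta`, and the
    remaining `1 - rho` feeds the decoder, whose noise power is `noise_w`.
    Returns (harvested power in W, spectral efficiency in bit/s/Hz).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("power split ratio rho must lie in [0, 1]")
    harvested_w = eta * rho * p_rx_w              # power converted to energy
    snr = (1.0 - rho) * p_rx_w / noise_w          # SNR at the information decoder
    return harvested_w, math.log2(1.0 + snr)      # Shannon spectral efficiency


# A larger rho harvests more energy but leaves less power for decoding.
for rho in (0.2, 0.5, 0.8):
    energy, rate = ps_swipt_split(p_rx_w=1e-6, rho=rho, noise_w=1e-9)
    print(f"rho={rho:.1f}  harvested={energy:.2e} W  rate={rate:.2f} bit/s/Hz")
```

Sweeping rho traces the trade-off between information rate and harvested energy described in the Background.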
In one example, the resource allocation model may be constructed based on the following method, as shown in fig. 2:
step S1042, a state function is constructed.
The state function is a collection of state observations. For example, it may be constructed from the channel gain of the communication link on each spectral subband, the interference power experienced by the link on each spectral subband, the payload residual and transmission time residual of the link, the current iteration number, and a greedy factor representing the current environment exploration rate of the link.
In step S1044, an action function is constructed.
The action function is a set of downlink spectrum resources, transmit power levels and power split ratios.
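Because the action function combines three discrete choices, a Q-network can expose one output per combination. The sketch below enumerates such an action set, assuming K spectrum subbands, L transmit power levels and Z power split ratios; the uniform quantization rules are hypothetical, chosen only for illustration.

```python
from itertools import product


def build_action_space(num_subbands: int, p_max_w: float,
                       num_power_levels: int, num_split_levels: int):
    """Enumerate all (subband index, transmit power, power split ratio) actions.

    Hypothetical quantization, not specified in the application:
      - power levels  p_max_w * l / L  for l = 1..L
      - split ratios  z / (Z + 1)      for z = 1..Z (strictly between 0 and 1)
    Position i in the returned list corresponds to output i of the Q-network.
    """
    powers = [p_max_w * l / num_power_levels for l in range(1, num_power_levels + 1)]
    splits = [z / (num_split_levels + 1) for z in range(1, num_split_levels + 1)]
    return list(product(range(num_subbands), powers, splits))


actions = build_action_space(num_subbands=3, p_max_w=0.2,
                             num_power_levels=4, num_split_levels=3)
print(len(actions))   # 36 = 3 subbands * 4 power levels * 3 split ratios
print(actions[0])     # (0, 0.05, 0.25)
```

Flattening the product this way keeps the output layer size at K * L * Z, which grows quickly; a coarse quantization keeps the agent's action space tractable.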
In step S1046, a reward function is constructed.
A reward function is constructed based on a balance of resource allocation optimization objectives and QoS constraints.
In some examples, the reward function may be constructed as follows.
First, a reward penalty term contributed by the tolerant machine user equipment is determined from its payload residual and transmission time residual; next, a reward penalty term contributed by the critical machine user equipment is determined from its outage probability; then, a reward penalty term for the human user equipment is determined from its signal-to-interference-plus-noise ratio (SINR); finally, the reward function is constructed from the total energy efficiency of the communication links of all machine user equipment together with these three penalty terms.
In this embodiment, the QoS constraints are each modeled and solved explicitly: the requirement that the SINR of H2H user equipment meets its threshold, the outage probability of the critical M2M communication links, and the payload transmission probability of the M2M communication links. The corresponding reward penalty terms are derived from them, which makes the reward value more accurate and further improves the rationality of the resource allocation.
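The reward described above can be sketched as the total M2M energy efficiency minus three penalty terms. The application names only the inputs of each term, so the penalty shapes and the weights `w_tol`, `w_crit` and `w_h2h` below are hypothetical.

```python
def shared_reward(total_ee, payload_left_bits, time_left_ms, payload_size_bits,
                  outage_probs, outage_max, h2h_sinrs, sinr_min,
                  w_tol=1.0, w_crit=1.0, w_h2h=1.0):
    """Total M2M energy efficiency minus three QoS penalty terms.

    Hypothetical penalty shapes and weights:
      - tolerant links: undelivered payload fraction of each link whose time
        residual has run out,
      - critical links: amount by which each outage probability exceeds outage_max,
      - H2H users: number of users whose SINR (linear) is not above sinr_min.
    """
    pen_tol = sum(v / payload_size_bits
                  for v, t in zip(payload_left_bits, time_left_ms)
                  if t <= 0 and v > 0)
    pen_crit = sum(max(0.0, p - outage_max) for p in outage_probs)
    pen_h2h = sum(1 for s in h2h_sinrs if s <= sinr_min)
    return total_ee - w_tol * pen_tol - w_crit * pen_crit - w_h2h * pen_h2h


r = shared_reward(total_ee=10.0, payload_left_bits=[0, 500], time_left_ms=[3, 0],
                  payload_size_bits=1000, outage_probs=[0.01, 0.05], outage_max=0.02,
                  h2h_sinrs=[5.0, 1.0], sinr_min=2.0)
print(round(r, 2))   # 8.47 = 10 - 0.5 - 0.03 - 1
```

When every constraint is met, all penalties vanish and the reward reduces to the energy-efficiency objective alone.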
In other examples, the reward function may be constructed as follows.
First, quality of service (QoS) constraints are set. For example, the QoS constraint of the human user equipment is that its SINR exceeds a set minimum threshold; the QoS constraint of the tolerant machine user equipment is that the success rate of transmitting a payload of preset size V within the time constraint T exceeds a set success-rate threshold; and the QoS constraint of the critical machine user equipment is that its outage probability does not exceed a set outage threshold.
Next, the total energy efficiency (EE) achieved by the tolerant and critical machine user equipment is taken as the resource allocation optimization objective. In this embodiment, spectrum, transmit power and power splitting (PS) ratio are allocated jointly, with the goal of maximizing the EE of the M2M communication links subject to the users' QoS constraints.
Finally, a reward function is constructed based on the resource allocation optimization objective and the QoS constraint condition.
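The optimization objective can be illustrated with one common energy-efficiency definition: total achievable rate divided by total consumed power, in bit/J. Both this definition and the fixed circuit-power term `p_circuit_w` are assumptions; the application does not state its EE expression in this section, and a SWIPT-specific variant might also credit the harvested energy.

```python
import math


def total_energy_efficiency(bandwidth_hz, snrs, tx_powers_w, p_circuit_w=0.01):
    """Total EE of the M2M links in bit/J: sum of Shannon rates / total consumed power.

    Assumed definition: each link consumes its transmit power plus a fixed
    circuit power `p_circuit_w`; harvested SWIPT energy is not credited here.
    """
    if len(snrs) != len(tx_powers_w):
        raise ValueError("need one SNR/SINR and one transmit power per link")
    sum_rate_bps = sum(bandwidth_hz * math.log2(1.0 + s) for s in snrs)
    sum_power_w = sum(tx_powers_w) + p_circuit_w * len(tx_powers_w)
    return sum_rate_bps / sum_power_w


# Two links on 1 MHz subbands: rates 2 Mbit/s and 1 Mbit/s, 0.32 W in total.
print(total_energy_efficiency(1e6, snrs=[3.0, 1.0], tx_powers_w=[0.1, 0.2]))  # about 9.375e6 bit/J
```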
Step S1048, building an experience replay pool, where the experience replay pool is used to store data and provide data for training the neural network.
An experience replay pool is constructed to store training data for the resource allocation model; the training data comprise the state observations at the current and next time, the selected resource allocation strategy, and the reward value. The neural network comprises a training network and a target network: in each iteration, the training network is trained by stochastic gradient descent on data randomly sampled from the experience replay pool, while the target network is a fixed neural network whose parameters are periodically overwritten with the current parameters of the training network.
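A minimal sketch of the replay pool and the periodic target-network update, assuming experiences are stored as (S_t, A_t, R_{t+1}, S_{t+1}) tuples. The capacity, the seed, the list-of-floats parameter representation and the fixed synchronization period are illustrative assumptions.

```python
import random
from collections import deque, namedtuple

# One stored experience: (S_t, A_t, R_{t+1}, S_{t+1}).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayPool:
    """Fixed-capacity experience replay pool with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation in the data.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(step, sync_period, train_params, target_params):
    """Every `sync_period` steps, overwrite the target parameters with the training ones."""
    if step % sync_period == 0:
        return list(train_params)   # copy, so later training updates do not leak in
    return target_params


pool = ReplayPool(capacity=3, seed=0)
for t in range(5):
    pool.push(state=[t], action=t, reward=float(t), next_state=[t + 1])
print(len(pool), [tr.action for tr in pool.buffer])   # 3 [2, 3, 4]
```

Keeping the target parameters fixed between synchronizations gives the training network a stable regression target.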
Step S1049, building and training a resource allocation model based on the state function, the action function, the reward function, and the experience replay pool.
Step S106, allocating downlink resources to the tolerant machine user equipment based on the selected resource allocation strategy.
After the state observation value of the current environment state is obtained or after downlink resources are allocated, data related to the allocated resources can be put into an experience replay pool corresponding to each communication link to be used as training data.
For example, the total energy efficiency of all machine user equipment and the QoS levels of each machine user equipment and human user equipment at the current moment are calculated. These values, together with the interference power on each spectral subband of the communication link, the selected resource allocation strategy, the state observation, the reward output by the reward function, and the prior information received by the machine equipment gateway, are placed into the experience replay pool of each communication link as training data for the resource allocation model.
This embodiment addresses resource allocation in a SWIPT-assisted H2H/M2M (machine-to-machine) coexisting cellular network. By allocating appropriate resources (spectrum, transmit power and power splitting (PS) ratio) to tolerant machine user equipment, it improves energy efficiency (EE) while guaranteeing QoS requirements.
This embodiment achieves a stable training process through behavior-state tracking, and the shared reward function enables the agents to cooperate in a distributed manner. In addition, this embodiment gives mathematically exact expressions for the QoS constraints, which reduces the computational complexity of the reward function.
Furthermore, to support different QoS requirements, this embodiment models the EE optimization problem of multiple M2M communication links as a multi-agent problem, in which multiple tolerant M2M communication links share the spectrum occupied by the critical M2M communication links and the H2H communication links. By designing the state function, action function and reward function, the embodiment also enables the agents to adapt to a dynamic network environment and thereby optimize the objective.
Example 2
According to an embodiment of the present application, there is further provided a SWIPT-assisted QoS-based downlink resource allocation method, as shown in fig. 3, the method includes:
step S302, initializing a resource allocation system.
Based on the pre-allocated spectrum subbands and transmit powers, the channel conditions of the spectrum subbands occupied by the H2H users and the critical M2M users are obtained, together with prior information such as the SINR or the outage probability. More specifically, this initialization determines the initial state of each tolerant M2M link at time t.
Step S304, building a training neural network and a target neural network for each agent.
Each tolerant M2M link is taken as an agent, and a training neural network $Q(s, a; \omega)$ and a target neural network $Q^{-}(s, a; \omega^{-})$ are built for each agent, where $\omega$ and $\omega^{-}$ denote the parameters of the training network and the target network, respectively. The input layer of each network receives $s$, the state observation of the tolerant link, and each output of the output layer gives the estimated reward value, called the Q value, for one action selection $a$.
An experience replay pool $D$ is established for each agent to store the state observations $S_t$ and $S_{t+1}$ of its own link at the current time $t$ and the next time $t+1$, the action selection $A_t$ at time $t$, and the reward $R_{t+1}$ received at the next time; the tuple $(S_t, A_t, R_{t+1}, S_{t+1})$ denotes the data stored in $D$ at time $t$.
Step S306, designing a state function, an action function and a reward function for the agent.
Each agent establishes a link communication energy efficiency optimization model based on the prior information and on the action selections it makes from its local state observation. Specifically, constraints on the transmit power and the power split ratio are established, together with the QoS constraints.
The QoS constraint is modeled as follows according to the actual characteristics of the service: H2H traffic is generally traffic based on voice and internet communications, which puts a high demand on data transmission rate, and therefore the QoS constraint of H2H traffic is modeled as an SINR value greater than a set minimum threshold; most M2M services are event-driven, and their traffic is typically generated periodically at different frequencies according to the requirements of the M2M services themselves, so that the QoS constraint of the tolerant M2M services is modeled as the transmission success rate of data packets of size V within the time constraint T; furthermore, for critical M2M services, it tends to have strict latency and transmission reliability requirements, and therefore the QoS constraint of the critical M2M services is modeled as the outage probability not higher than the set outage threshold.
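The three QoS constraints modeled above can be sketched as simple predicates. Assumptions: SINRs and thresholds are linear (not dB), each time slot has a constant achievable rate, and the tolerant success rate is estimated empirically from past payload periods; all function names are hypothetical.

```python
from typing import Sequence


def h2h_qos_ok(sinr: float, sinr_min: float) -> bool:
    """H2H user: SINR (linear) must be greater than the minimum threshold."""
    return sinr > sinr_min


def payload_delivered(rates_bps: Sequence[float], slot_s: float,
                      payload_bits: float, deadline_slots: int) -> bool:
    """True if the bits sent within `deadline_slots` slots reach the payload size V."""
    sent_bits = 0.0
    for rate in rates_bps[:deadline_slots]:
        sent_bits += rate * slot_s
        if sent_bits >= payload_bits:
            return True
    return False


def tolerant_qos_ok(period_outcomes: Sequence[bool], success_min: float) -> bool:
    """Tolerant M2M: empirical success rate over past periods must exceed the threshold."""
    if not period_outcomes:
        return False
    return sum(period_outcomes) / len(period_outcomes) > success_min


def critical_qos_ok(outage_prob: float, outage_max: float) -> bool:
    """Critical M2M: outage probability must not exceed the outage threshold."""
    return outage_prob <= outage_max


ok = payload_delivered([1e6, 1e6, 1e6], slot_s=1e-3, payload_bits=2500, deadline_slots=3)
print(ok)   # True: 1000 + 1000 + 1000 bits >= 2500 bits within T = 3 slots
```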
Step S308, training the training network Q and distributing network resources.
Based on the link communication energy efficiency optimization model, each tolerant M2M link repeatedly samples data at random from the large amount of data stored in its experience replay pool to train its training network Q. Stochastic gradient descent reduces the error between the Q value estimated by the training network and the target value computed with the target network. Once the resource allocation model has been trained to convergence, it can produce a resource allocation scheme that satisfies the above constraints and the QoS constraints.
In addition, to stabilize the training environment, this embodiment adopts a behavior-tracking method that maps high-dimensional policy information onto low-dimensional indicators, so that each tolerant M2M link can identify the training state under which an experience sampled at random from the replay pool was collected. Moreover, to improve overall energy efficiency, the reward function is shared in this embodiment, encouraging the tolerant M2M links to explore the environment cooperatively and to make decisions that maximize overall performance.
In one embodiment, resource allocation proceeds as follows. Based on the received prior information, the machine equipment gateway selects the spectrum subband with the best channel condition for multiplexing and allocates an appropriate transmit power and power split ratio to the tolerant machine devices on its link. At each time step (1 ms), the channel conditions of the spectrum subbands in the network change because of channel fading, and the QoS indicators of every user type are affected by the resource allocation choices made by each tolerant M2M link at that step. Each tolerant M2M link judges the quality of its current resource allocation choice from the updated QoS indicators, namely the SINR of the H2H users, the outage probability of the critical M2M users, the payload and transmission time residuals of the tolerant M2M users, and the achievable energy efficiency of all M2M users. It then stores the current channel condition, QoS indicators, received interference, resource selection and achievable energy efficiency in its experience replay pool as training data for its training network Q.
The spectrum subband selection and the transmit power and power split ratio allocation of the tolerant M2M links are implemented in a distributed manner. At each time t, each tolerant M2M link selects the spectrum subband with the best channel condition and a suitable transmit power and power split ratio according to the prior information and its own training network, communicates using this resource allocation, and then observes the overall energy efficiency of all M2M devices and the QoS indicators of the other users in the network, which together form the reward. All tolerant M2M links in the network share the same reward function, so the proposed method encourages cooperation among all links toward maximizing the overall M2M energy efficiency.
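The training step can be made concrete with one minibatch of stochastic gradient descent on the squared temporal-difference (TD) error, as in standard deep Q-learning. In this sketch a linear Q-function (one weight vector per action) stands in for the neural network so the example stays dependency-free; the discount factor `gamma` and learning rate `lr` are assumed values.

```python
def q_values(weights, state):
    """Linear stand-in for the Q-network: Q(s, a) = w_a . s for each action a."""
    return [sum(w * x for w, x in zip(w_a, state)) for w_a in weights]


def dqn_sgd_step(train_w, target_w, batch, gamma=0.9, lr=0.01):
    """One pass of SGD over a minibatch of (state, action, reward, next_state) tuples.

    The TD target y = r + gamma * max_a' Q_target(s', a') uses the frozen
    target weights; only the training weights are updated (in place).
    """
    for state, action, reward, next_state in batch:
        y = reward + gamma * max(q_values(target_w, next_state))
        td_error = y - q_values(train_w, state)[action]
        # The gradient of 0.5 * td_error**2 w.r.t. w_action is -td_error * state.
        train_w[action] = [w + lr * td_error * x
                           for w, x in zip(train_w[action], state)]
    return train_w


train_w = [[0.0, 0.0], [0.0, 0.0]]          # 2 actions, 2 state features
target_w = [[1.0, 0.0], [0.0, 1.0]]
batch = [([1.0, 0.0], 0, 1.0, [0.0, 2.0])]  # TD target y = 1 + 0.9 * 2 = 2.8
for _ in range(1000):
    dqn_sgd_step(train_w, target_w, batch)
print(q_values(train_w, [1.0, 0.0])[0])     # approaches the TD target 2.8
```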
The resource allocation task further includes the following. At each time t, tolerant M2M link m,n updates its available environment data set according to the prior information received by machine equipment gateway (MTCG) m in its cluster, and its state set is designed as:
$s_{m,n}^{t} = \{G_{m,n}^{t}, I_{m,n}^{t}, V_{m,n}, T_{m,n}, \{QoS_h\}_{h \in H}, \{QoS_s\}_{s \in S}, e, \epsilon\}$
where $G_{m,n}^{t}$ contains the channel gain of link m,n on each spectral subband; $I_{m,n}^{t}$ describes the interference power experienced by link m,n on each spectral subband; $V_{m,n}$ and $T_{m,n}$ denote the payload residual and the transmission time residual of link m,n, respectively; and $\{QoS_h\}_{h \in H}$ and $\{QoS_s\}_{s \in S}$ are binary variables representing the QoS indicators of the H2H users and the critical M2M users, respectively. If the QoS constraints of all H2H users are satisfied, $\{QoS_h\}_{h \in H} = 1$, otherwise 0; $\{QoS_s\}_{s \in S}$ uses the same kind of binary variable to indicate whether the QoS constraint of the relevant critical M2M link is satisfied.
In addition, e denotes the current iteration number and ε is the greedy factor, representing the current environment exploration rate of link m,n. The embodiments of the present application add these two items to the state set to track the environment state associated with iteration number e and exploration probability ε during training. During training, the resource allocation strategy of each tolerant M2M link changes as the strategies of the other links change, so from the perspective of a single link the network environment is constantly changing. Moreover, because experiences are drawn from the experience replay pool at random, the training data sampled at the current time do not necessarily reflect the current network environment and may be outdated, so it is difficult for a link to know the state and strategy selections under which a given experience was generated. The value functions that truly represent the links' strategy selections are high-dimensional neural network parameters and cannot serve as experience data for the links to learn from. The iteration number e and the greedy factor ε are therefore added to the experience data as a low-dimensional mapping of these high-dimensional parameters, so that a tolerant M2M link can track the environment state when facing the sampled experience data, which stabilizes training.
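A sketch of how one tolerant link might flatten this state set into a feature vector for its Q-network. Appending (e, ε) as a low-dimensional summary of the other agents' evolving policies closely resembles the "fingerprint" technique used in multi-agent deep reinforcement learning. The normalizations by payload size, deadline and total number of episodes are assumptions.

```python
def build_state(subband_gains, subband_interference_w, payload_left_bits,
                time_left_ms, all_h2h_qos_ok, critical_qos_ok, episode, epsilon,
                payload_size_bits, deadline_ms, num_episodes):
    """Flatten tolerant link (m, n)'s local observation into one feature vector.

    Layout: per-subband channel gains, per-subband interference powers,
    normalized payload and time residuals, the two binary QoS indicators,
    and the (e, epsilon) fingerprint. The normalizations are assumptions.
    """
    return (
        list(subband_gains)
        + list(subband_interference_w)
        + [payload_left_bits / payload_size_bits,   # V_{m,n} as a fraction of the payload
           time_left_ms / deadline_ms,              # T_{m,n} as a fraction of the deadline
           int(all_h2h_qos_ok),                     # {QoS_h}: 1 if every H2H user is satisfied
           int(critical_qos_ok),                    # {QoS_s}: 1 if the critical link is satisfied
           episode / num_episodes,                  # iteration number e, scaled to [0, 1]
           epsilon]                                 # current exploration rate
    )


s = build_state([0.5, 0.2], [1e-9, 2e-9], 500, 5, True, False,
                episode=10, epsilon=0.3, payload_size_bits=1000,
                deadline_ms=10, num_episodes=100)
print(len(s), s[4:])   # 10 [0.5, 0.5, 1, 0, 0.1, 0.3]
```

Because e and ε change slowly and monotonically during training, two experiences with similar fingerprints were most likely collected under similar policies of the other links.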
According to this environment information, tolerant M2M link m,n follows an ε-greedy strategy at time t: with probability $\epsilon$ it randomly selects a spectrum subband, a transmit power and a power split ratio, and with probability $1-\epsilon$ it selects the optimal action given by its current training network. The action (resource) set is defined as follows:
[Action-set equation image not reproduced]
where the first component denotes the spectral subband selection, the second and third components denote the transmit power and power split ratio selections, respectively, a further symbol denotes the maximum transmit power, and L and Z denote the numbers of division levels of the transmit power and the power split ratio, respectively.
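The ε-greedy rule can be sketched as follows; the returned value is an index into the discrete action set. The linear decay schedule for ε is an assumption, since the application does not state how the greedy factor evolves.

```python
import random


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon return a random action index, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))                # explore
    return max(range(len(q)), key=q.__getitem__)    # exploit: argmax of the Q values


def linear_epsilon(episode, num_episodes, eps_start=1.0, eps_end=0.02):
    """Hypothetical schedule: decay epsilon linearly from eps_start to eps_end."""
    frac = min(episode / max(num_episodes, 1), 1.0)
    return eps_start + frac * (eps_end - eps_start)


rng = random.Random(42)
q = [0.1, 0.9, 0.3]                                  # Q values of three candidate actions
print(epsilon_greedy(q, epsilon=0.0, rng=rng))      # 1 (pure exploitation)
print(linear_epsilon(50, 100))                      # halfway between 1.0 and 0.02
```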
In the present embodiment, it is assumed that the set of spectral subbands available in the network is indexed by $K = H \cup S$, where $H = \{1, 2, \dots, H\}$ denotes the human user equipment (HUE) and $S = \{1, 2, \dots, S\}$ denotes the critical machine user equipment (CMTCD). Furthermore, $M = \{1, 2, \dots, M\}$ denotes the MTCGs, and the tolerant machine user equipment (TMTCD) managed by each MTCG is denoted by $N = \{1, 2, \dots, N\}$. All TMTCDs and CMTCDs are equipped with SWIPT and form the set $D = S \cup N$.
Tolerant M2M link (m, n) establishes a communication link according to the action $\mathcal{A}_{m,n}^t$ selected at time t. Depending on the spectral sub-band selected, the SINR of the users with high QoS requirements is calculated in the following two cases:
[Formula: SINR $\gamma_h^k$ of H2H user h on spectral sub-band k]
[Formula: SINR $\gamma_s^k$ of critical M2M user s on spectral sub-band k]
In the above formulas, the first term of the denominator is the interference on spectral sub-band k received by the H2H user or the critical M2M user. It consists of the inter-cluster and intra-cluster interference caused by tolerant M2M links multiplexing the k-th sub-band, where $\delta_{m,n}^k$ is a binary variable indicating the occupancy of the sub-band. The second term, $\sigma^2$, represents the additive white Gaussian noise power. The remaining symbols are defined as follows:
- $P_{BS,h}$ is the transmit power allocated by the base station to H2H user h;
- $P_{m,s}$ is the transmit power allocated by machine equipment gateway m to critical M2M user s;
- $\rho_{m,s}$ is the power split ratio of critical M2M user s served by gateway m;
- $g_{BS,h}^k$ is the channel gain between the base station and H2H user h;
- $g_{m,h}^k$ is the channel gain between machine equipment gateway m and H2H user h;
- $g_{m,s}^k$ is the channel gain between machine equipment gateway m and critical M2M user s in the same cluster.

After the SINRs of the H2H users and critical M2M users are calculated, the following formulas are used:
$\gamma_h^k \ge \gamma_h^{\mathrm{th}}, \quad h \in \mathcal{H}$
$\Pr\{\gamma_s^k \le \gamma_s^{\mathrm{th}}\} \le p^{\max}, \quad s \in \mathcal{S}$

These formulas determine whether the QoS requirements of the H2H users and the critical M2M users, respectively, are guaranteed. Here $\gamma_h^{\mathrm{th}}$ and $\gamma_s^{\mathrm{th}}$ are the SINR thresholds of H2H traffic and critical M2M traffic, respectively, and $p^{\max}$ is the maximum tolerable outage probability. If the QoS constraint of H2H user h is satisfied, the environment state information is $\mathrm{QoS}_h = 1$; otherwise $\mathrm{QoS}_h = 0$. Likewise, $\{\mathrm{QoS}_s\}_{s \in \mathcal{S}}$ indicates the QoS satisfaction of the critical M2M users with the same kind of indicator.
For the tolerant M2M link itself, the SINR is given by:
[Formula: SINR $\gamma_{m,n}^k$ of tolerant M2M link (m, n) on spectral sub-band k]

Here $I_{m,n}^k$ is the interference suffered by tolerant M2M link (m, n). Depending on the occupancy of the multiplexed spectral sub-band, it takes one of two forms:
[Formula: $I_{m,n}^k$, case 1]
[Formula: $I_{m,n}^k$, case 2]

The symbols are defined as follows:
- $P_{m,s}$ is the transmit power allocated by the machine equipment gateway to the critical M2M user;
- $g_{m,n}^k$ is the channel gain of tolerant M2M link (m, n) on spectral sub-band k;
- $P_{m'}^k$ is the transmit power of machine equipment gateway m' on spectral sub-band k;
- $g_{m',n}^k$ is the channel gain between machine equipment gateway m' and tolerant M2M device n on spectral sub-band k;
- $\delta_{m,n}^k$ is the spectral sub-band allocation indicator: $\delta_{m,n}^k = 1$ means tolerant M2M link (m, n) communicates on sub-band k, and $\delta_{m,n}^k = 0$ means it does not;
- $P_{m,n}$ is the transmit power allocated by machine equipment gateway m to tolerant M2M user n;
- M is the total number of machine equipment gateways;
- N is the total number of tolerant M2M devices within the coverage of one machine equipment gateway.
Tolerant M2M services usually handle data packets that are generated periodically, at frequencies set by the service's own characteristics. The QoS of tolerant M2M link (m, n) is therefore modeled as the success probability of completing a packet transmission within the time budget T:
$\Pr\left\{ \sum_{t=1}^{T} C_{m,n}^k[t] \ge \frac{V}{\Delta_T} \right\}$

where
$C_{m,n}^k[t] = B \log_2\left(1 + \gamma_{m,n}^k[t]\right)$
is the information transmission rate of link (m, n) on spectral sub-band k at time t. B is the bandwidth of each spectral sub-band, V is the periodically generated M2M payload in bits, and $\Delta_T$ is the channel coherence time.
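The tolerant-link QoS definition can be evaluated for a single episode as sketched below. The sketch assumes each time slot lasts one channel coherence time $\Delta_T$, so the condition $\sum_t C[t] \ge V/\Delta_T$ becomes $\sum_t C[t]\,\Delta_T \ge V$. The example inputs are hypothetical:
- the per-slot SINR values;
- the 1 MHz sub-band bandwidth (4 MHz split over 4 sub-bands);
- $\Delta_T$ = 1 ms.

```python
import math
from typing import Sequence


def shannon_rate(bandwidth_hz: float, sinr_linear: float) -> float:
    """Rate C = B * log2(1 + SINR) in bit/s."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


def delivered_within_budget(sinr_per_slot: Sequence[float], bandwidth_hz: float,
                            payload_bits: float, slot_s: float) -> bool:
    """Return True if sum_t C[t] * slot_s >= payload_bits over the given slots.

    The number of slots plays the role of the time budget T (in slots); this is
    equivalent to sum_t C[t] >= V / Delta_T when each slot lasts Delta_T.
    """
    sent_bits = 0.0
    for sinr in sinr_per_slot:
        sent_bits += shannon_rate(bandwidth_hz, sinr) * slot_s
        if sent_bits >= payload_bits:
            return True  # payload delivered before the budget ran out
    return False


# V = 3 x 1024 bytes, T = 100 ms split into 100 slots of 1 ms (hypothetical slot length).
V_BITS = 3 * 1024 * 8
print(delivered_within_budget([1.0] * 100, 1e6, V_BITS, 1e-3))  # True
print(delivered_within_budget([0.1] * 100, 1e6, V_BITS, 1e-3))  # False
```

Repeating this check over many independent episodes and averaging the outcomes gives an empirical estimate of the success probability defined above.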
Denote all M2M devices by $\mathcal{D} = \mathcal{S} \cup \mathcal{N}$. The EE of all SWIPT-assisted M2M devices can then be expressed as the ratio of spectral efficiency to energy consumption:
[Formula: overall energy efficiency of all M2M devices]

where:
[Formula: spectral efficiency of all M2M devices]
represents the achievable spectral efficiency of all M2M devices;
[Formula: energy consumption of all M2M devices]
represents the energy consumption of all M2M devices, where $P_C$ is the power consumption of all M2M links; and
[Formula: energy harvested by M2M device d]
represents the energy harvested by the M2M device, where $\theta \in [0, 1]$ is the energy conversion efficiency.
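A toy computation of the EE ratio is sketched below. The exact spectral-efficiency, consumption and harvesting expressions appear only as formulas in the original, so this sketch assumes a common linear model:
- SE = Σ log2(1 + γ_d);
- consumption = Σ P_d + P_C;
- harvested power θ(1 − ρ_d)·P_rx,d, subtracted from consumption.

The `M2MDevice` fields and function name are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class M2MDevice:
    sinr: float          # linear SINR after power splitting
    tx_power_w: float    # transmit power spent on this device's link (W)
    rx_power_w: float    # total received RF power at the device (W)
    split_ratio: float   # rho: fraction of received power routed to information decoding


def energy_efficiency(devices: Iterable[M2MDevice], circuit_power_w: float,
                      theta: float) -> float:
    """EE = spectral efficiency / (consumed power - harvested power), assumed linear model."""
    devs = list(devices)
    se = sum(math.log2(1.0 + d.sinr) for d in devs)  # bit/s/Hz
    harvested = sum(theta * (1.0 - d.split_ratio) * d.rx_power_w for d in devs)
    consumed = sum(d.tx_power_w for d in devs) + circuit_power_w
    net = consumed - harvested
    if net <= 0:
        raise ValueError("harvested power exceeds consumption in this toy model")
    return se / net


devices = [M2MDevice(sinr=3.0, tx_power_w=0.5, rx_power_w=0.2, split_ratio=0.5)] * 2
print(energy_efficiency(devices, circuit_power_w=0.1, theta=0.5))  # approximately 4.0
```

Setting ρ = 1 removes the harvested term. This is one way to see why SWIPT can only help EE in such a model when the harvested power is non-negligible.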
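In summary, the M2M link energy efficiency optimization model of the present application is as follows:
$\max_{\{\rho, P, \delta\}} \ \mathrm{EE}$
s.t. (1a) $\gamma_h^k \ge \gamma_h^{\mathrm{th}}, \ \forall h \in \mathcal{H}$
(1b) $\Pr\{\gamma_s^k \le \gamma_s^{\mathrm{th}}\} \le p^{\max}, \ \forall s \in \mathcal{S}$
(1c) [QoS constraint of the tolerant M2M links on the packet delivery success probability within T]
(1d) $0 \le \rho_d \le 1, \ \forall d \in \mathcal{D}$
(1e) $\delta_{m,n}^k \in \{0, 1\}$
(1f) $0 \le P_{m,n} \le P^{\max}$
(1g) $\sum_{k \in \mathcal{K}} \delta_{m,n}^k \le 1$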
Here ρ, P and δ are the power split ratio, transmit power allocation and spectral sub-band allocation strategies, respectively. The constraints mean the following:
- (1a), (1b) and (1c) are the QoS constraints of the H2H users, critical M2M users and tolerant M2M users, respectively;
- (1d) specifies that the power split ratio of an M2M device applying SWIPT does not exceed 1;
- in (1e), $\delta_{m,n}^k$ is a binary variable equal to 1 if tolerant M2M link (m, n) is assigned spectral sub-band k and 0 otherwise;
- (1f) sets the upper bound $P^{\max}$ on the transmit power of a tolerant M2M link;
- (1g) constrains each tolerant M2M link to select at most one spectral sub-band for communication.
The channel gain models of the communication links and interfering links (where g denotes channel gain) are summarized as follows.
Consider M2M communication link (m, n), formed between machine equipment gateway m and tolerant machine device n in the same cluster. Its channel gain on spectral sub-band k is calculated as
$g_{m,n}^k = X_{m,n}\,\beta_{m,n}\,\left|h_{m,n}^k\right|^2$

where $X_{m,n}$ and $\beta_{m,n}$ are the path loss and shadow fading, respectively, together known as large-scale fading. Both terms are frequency independent and remain unchanged over a relatively long time. $\left|h_{m,n}^k\right|^2$ is the frequency-dependent fast fading.
The following channel gains on spectral sub-band k are calculated in the same way, each as the product of large-scale fading and fast fading:
$g_{m,s}^k = X_{m,s}\beta_{m,s}|h_{m,s}^k|^2$: the gain of M2M link (m, s) between machine equipment gateway m and critical machine device s in the same cluster;
$g_{BS,h}^k = X_{BS,h}\beta_{BS,h}|h_{BS,h}^k|^2$: the gain of H2H link (BS, h) between base station BS and human user equipment h;
$g_{BS,n}^k = X_{BS,n}\beta_{BS,n}|h_{BS,n}^k|^2$: the interference channel gain between base station BS and tolerant M2M link (m, n);
$g_{m',n}^k = X_{m',n}\beta_{m',n}|h_{m',n}^k|^2$: the inter-cluster interference channel gain between machine equipment gateway m' and tolerant M2M link (m, n);
$g_{m',s}^k = X_{m',s}\beta_{m',s}|h_{m',s}^k|^2$: the inter-cluster interference channel gain between machine equipment gateway m' and critical M2M link (m, s);
$g_{m,h}^k = X_{m,h}\beta_{m,h}|h_{m,h}^k|^2$: the interference channel gain between machine equipment gateway m and human user equipment h.
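The channel gain decomposition can be sampled as sketched below. The sketch uses the simulation settings given later in this description:
- path loss 128 + 37.6 log10 d, with d in km;
- 8 dB log-normal shadowing;
- Rayleigh fast fading.

The large-scale part is drawn once per link and reused across sub-bands, while the fast-fading factor |h|² ~ Exp(1) is drawn per sub-band. This split is an illustrative assumption, consistent with the statement that large-scale fading is frequency independent. The function names are hypothetical.

```python
import math
import random
from typing import List


def path_loss_db(distance_m: float) -> float:
    """Path loss 128 + 37.6 * log10(d), with d converted from meters to km."""
    return 128.0 + 37.6 * math.log10(distance_m / 1000.0)


def large_scale_gain(distance_m: float, rng: random.Random,
                     shadow_std_db: float = 8.0) -> float:
    """X * beta: path loss combined with log-normal shadowing (linear scale)."""
    gain_db = -path_loss_db(distance_m) + rng.gauss(0.0, shadow_std_db)
    return 10.0 ** (gain_db / 10.0)


def subband_gains(large_scale: float, num_subbands: int,
                  rng: random.Random) -> List[float]:
    """g^k = X * beta * |h^k|^2, with |h^k|^2 ~ Exp(1) drawn independently per sub-band."""
    return [large_scale * rng.expovariate(1.0) for _ in range(num_subbands)]


rng = random.Random(0)
ls = large_scale_gain(30.0, rng)   # a 30 m intra-cluster link (gateway coverage radius)
print(subband_gains(ls, 4, rng))   # four per-sub-band gains sharing the same large-scale part
```
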
The overall EE and the QoS constraints are incorporated into the reward function design. This steers the tolerant M2M links toward maximizing overall energy efficiency during training while guaranteeing the QoS constraints of all types of users.
The SINR of an H2H user is easy to obtain from its received power and interference at each time t. The QoS constraints of the M2M users, however, are expressed as probabilities. Estimating them directly would require each agent to randomly generate a large number of simulated channels at every time step, which consumes substantial computing resources and slows the convergence of the algorithm. Two measures are therefore taken.
First, $U_n$ denotes the remaining transmission time of tolerant M2M user n while its payload has not been fully transmitted. Once the transmission is completed, $U_n$ is set to a constant value. At time t, the reward/penalty term associated with the tolerant M2M users is then set as:
[Formula: reward/penalty term of the tolerant M2M users at time t]
Through this conversion, the agents account for both the remaining transmission time and the payload transmission rate during training.
Second, the exact outage probability of the critical M2M devices is obtained by theoretical analysis and mathematical transformation. When the s-th critical M2M device shares the k-th spectral sub-band with tolerant M2M devices in other clusters, the outage probability can be rewritten as:
[Formula: outage probability of the s-th critical M2M device on spectral sub-band k]

The rewrite relies on the following result. Suppose $z_1, \ldots, z_n$ are independent exponentially distributed random variables with given means. Then:
[Formula: result for independent exponential random variables]
where c is a positive constant. Based on the Rayleigh fading characteristics and this result, the outage probability can be rewritten as:
[Formula: closed-form outage probability expressed as a product over the interfering links]
Here the interfering links are indexed by m or m' according to the interference source: m denotes intra-cluster interference and m' denotes inter-cluster interference. The index set may also contain both m and m' at the same time, in which case the received interference includes both intra-cluster and inter-cluster components. When the interference is purely inter-cluster, only the $\prod_n$ term of the product in the above expression is kept. Accordingly, the interference experienced by the s-th CMTCD on spectral sub-band k is expressed as:
[Formula: interference experienced by the s-th CMTCD on spectral sub-band k]
When the index set contains both m and m', computing the outage probability requires multiplying all the terms of the above expression, which incurs a large computational overhead.
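The standard Rayleigh-fading result shows why a closed form avoids Monte Carlo simulation. Let the desired and interfering received powers be independent exponential random variables with means $a_0$ and $a_i$. Then $\Pr\{\mathrm{SINR} \ge \gamma_{th}\} = e^{-\gamma_{th}\sigma^2/a_0}\prod_i 1/(1 + \gamma_{th} a_i/a_0)$, which follows from the moment generating function of the exponential distribution.

The sketch below compares that closed form with brute-force sampling. It assumes $\mathrm{SINR} = a_0 z_0 / (\sum_i a_i z_i + \sigma^2)$ without a power-split factor, so it illustrates the technique rather than reproducing the application's exact expression. Each additional interfering link adds one factor to the product. The function names are hypothetical.

```python
import math
import random
from typing import Sequence


def outage_closed_form(a0: float, interferers: Sequence[float],
                       noise: float, gamma_th: float) -> float:
    """Pr{a0*z0 / (sum_i a_i*z_i + noise) < gamma_th}, with z_i ~ Exp(1) i.i.d.

    Uses E[exp(-s*Y)] = 1 / (1 + s*a) for Y exponential with mean a.
    """
    success = math.exp(-gamma_th * noise / a0)
    for ai in interferers:
        success *= 1.0 / (1.0 + gamma_th * ai / a0)  # one factor per interfering link
    return 1.0 - success


def outage_monte_carlo(a0: float, interferers: Sequence[float], noise: float,
                       gamma_th: float, trials: int, rng: random.Random) -> float:
    """Empirical outage probability from randomly generated Rayleigh channels."""
    fails = 0
    for _ in range(trials):
        signal = a0 * rng.expovariate(1.0)
        interference = sum(ai * rng.expovariate(1.0) for ai in interferers)
        if signal / (interference + noise) < gamma_th:
            fails += 1
    return fails / trials


# Mean powers normalized to the noise power; gamma_th = 5 dB.
args = (100.0, [1.0, 0.5], 1.0, 10 ** 0.5)
print(outage_closed_form(*args))                                   # about 0.075
print(outage_monte_carlo(*args, trials=100_000, rng=random.Random(42)))
```
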
To further speed up convergence, the above expression is converted into the following form:
[Formula: reduced-complexity form of the outage probability]

With this formula, the exact outage probability of each critical M2M device can be obtained with less computing resource consumption. So that each agent can further judge the quality of the resource allocation strategy it selects during training, an indicator is defined for the QoS satisfaction of the critical M2M devices in time slot t:
[Formula: QoS satisfaction indicator of the critical M2M devices in time slot t]
where $p_s$ is the outage probability obtained from the expression derived above. Similarly, an indicator is defined for the QoS satisfaction of the H2H users in time slot t:
[Formula: QoS satisfaction indicator of the H2H users in time slot t]
In summary, the present application sets the global reward function as:
[Formula: global reward combining the overall EE and the μ-weighted penalty terms]

Here μ is a positive real number that balances reward and penalty. Because all agents share the same reward function, they cooperate to explore the unknown environment and learn a resource allocation strategy that maximizes overall energy efficiency. At the same time, the penalty terms ensure that each agent accounts for the QoS constraints of all types of users when allocating resources.
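A minimal sketch of a shared reward of this kind is given below. It does not reproduce the application's exact reward formula. Instead it assumes the form r = EE − μ·Σ penalties, with nonnegative penalty values supplied by the caller (for example, derived from the H2H, critical and tolerant QoS indicators above). Every agent receives the same scalar.

```python
from typing import Sequence


def shared_reward(total_ee: float, penalties: Sequence[float], mu: float) -> float:
    """Shared reward r = EE - mu * sum(penalties) (assumed form, for illustration).

    total_ee: overall energy efficiency of all M2M devices in the slot.
    penalties: nonnegative QoS penalty values (H2H, critical M2M, tolerant M2M).
    mu: positive weight balancing reward and penalty.
    """
    if mu <= 0:
        raise ValueError("mu must be a positive real number")
    if any(p < 0 for p in penalties):
        raise ValueError("penalties are expected to be nonnegative")
    return total_ee - mu * sum(penalties)


# Every agent (tolerant M2M link) receives the same scalar, which encourages cooperation.
reward = shared_reward(total_ee=10.0, penalties=[50.0, 0.0], mu=1 / 50)
agent_rewards = {link: reward for link in [(1, 1), (1, 2), (2, 1)]}
print(agent_rewards)
```
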
After each iteration, every agent randomly draws a mini-batch of data from its experience replay pool to train the training network Q. The training network parameters ω are updated by stochastic gradient descent to reduce the error with respect to the target network, whose parameters are $\omega^-$. After a certain number of iterations, the training network parameters are copied to the target network.
The target network value is expressed as:
$Q^- = R_t + \gamma \max_{a'} Q(S_{t+1}, a'; \omega^-)$

Here $0 \le \gamma \le 1$ is the discount factor. Its value determines how much weight the current reward/penalty receives during iteration: the smaller the value, the greater the emphasis on the current reward/penalty, and vice versa. $R_t$ is the reward obtained at time t, $S_{t+1}$ is the state observation at the next time step, a' is the optimal action selected in state $S_{t+1}$, and $\omega^-$ are the target network parameters.
In this embodiment, the loss function is expressed as:
$L(\omega) = \mathbb{E}\left[\left(Q^- - Q(S_t, A_t; \omega)\right)^2\right]$

Here L(ω) is the loss function, $\mathbb{E}$ is the mathematical expectation, Q is the value function of the training network, $S_t$ is the state observation at time t, $A_t$ is the action selected in state $S_t$, and ω are the parameters of the training network.
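The target and loss above can be computed for a sampled mini-batch as sketched below. The Q-functions are modeled as plain callables that return one value per action; they stand in for the training and target networks. The transition tuple layout and the function names are assumptions. No terminal-state handling is included, since the resource allocation task is treated here as continuing.

```python
from typing import Callable, List, Sequence, Tuple

QFunction = Callable[[Sequence[float]], List[float]]              # state -> Q-value per action
Transition = Tuple[Sequence[float], int, float, Sequence[float]]  # (S_t, A_t, R_t, S_{t+1})


def td_targets(batch: Sequence[Transition], target_q: QFunction, gamma: float) -> List[float]:
    """Q^- = R_t + gamma * max_a' Q(S_{t+1}, a'; omega^-) for each transition."""
    return [r + gamma * max(target_q(s_next)) for (_, _, r, s_next) in batch]


def td_loss(batch: Sequence[Transition], train_q: QFunction,
            target_q: QFunction, gamma: float) -> float:
    """Mean squared error between the targets and Q(S_t, A_t; omega)."""
    targets = td_targets(batch, target_q, gamma)
    errors = [(y - train_q(s)[a]) ** 2 for y, (s, a, _, _) in zip(targets, batch)]
    return sum(errors) / len(errors)


batch = [((0.0,), 1, 1.0, (1.0,))]
print(td_loss(batch, train_q=lambda s: [0.0, 1.0], target_q=lambda s: [2.0, 4.0], gamma=0.5))  # 4.0
```

In a full implementation, the gradient of this loss with respect to ω would drive the stochastic gradient step, while ω⁻ stays fixed between synchronizations.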
In this embodiment, the M2M devices in the H2H/M2M coexisting cellular network are equipped with SWIPT. Under the complex interference environment caused by spectrum sharing, spectrum, transmit power and power split ratio resources are allocated to the tolerant M2M links. This achieves high EE while guaranteeing the QoS requirements of all types of users.
In addition, this embodiment considers a SWIPT-assisted H2H/M2M coexisting cellular network that carries M2M services alongside conventional H2H services. The M2M services are further divided into critical services and tolerant services. M2M devices are organized in clusters and establish communication links with their local machine equipment gateway. All M2M devices are equipped with SWIPT, so they can perform energy harvesting and information decoding simultaneously. OFDM spectral sub-bands are pre-allocated to the H2H users and critical M2M users in the network, and the tolerant M2M users multiplex these same spectrum resources.
In the resource allocation method provided by the application, the energy efficiency optimization problem is modeled as a multi-agent problem, in which each tolerant M2M link serves as an agent. A design for the system state, action and reward function is provided. The state set is designed around behavior tracking, and the reward function is shared. Together they let the agents, while interacting with the unknown network environment, efficiently select appropriate spectral sub-bands, transmit powers and power split ratios from the action set in a distributed, cooperative manner, thereby optimizing the overall M2M energy efficiency. In addition, the QoS requirement (QoS constraint) of each type of user is expressed by an exact mathematical formula, which further reduces the computational complexity of model training.
Example 3
According to the embodiment of the present application, another SWIPT-assisted QoS-based downlink resource allocation method is further provided, as shown in fig. 4, the method includes:
Step S402: initialization.
The resource allocation system is initialized. Based on the pre-allocated spectral sub-bands and transmit powers, prior information is obtained, such as the channel conditions of the spectral sub-bands occupied by the H2H links and critical M2M links and their SINR or outage probability.
Step S404: build the neural networks and establish an experience replay pool for each agent.
Each tolerant M2M link is regarded as an agent. A training neural network and a target neural network are built, and an experience replay pool is established for each agent.
In the embodiment of the application, a resource allocation problem is modeled as a multi-agent intelligent decision problem, wherein each tolerant M2M link is considered as an agent for interacting with an unknown network environment and optimizing a resource allocation strategy. Next, a training network, a target network, and an experience replay pool are constructed for each agent using a resource allocation method, wherein the experience replay pool stores data used to train the training network.
Step S406: design the state function, action function and reward function for the agent.
Each agent establishes a link communication energy efficiency optimization model based on the prior information and the action it selects according to its local state observation. The model returns data such as the M2M system energy efficiency, the various QoS indicators and the interference the agent suffers, and the agent stores these data in its experience replay pool.
Designing the state function, the action function, and the reward function for the agent includes the steps of:
S3-1: design a trajectory tracking method in the state set that maps high-dimensional information to low-dimensional quantities. The iteration number e and the greedy factor ε are added to the state set. When drawing training data, the agent can then track the environment state associated with that iteration number and exploration probability, which addresses the non-stationarity of the training environment caused by the multi-agent setting.
S3-2: the agent establishes a link communication energy efficiency optimization model according to the state observation and the selected action. This includes formulating the constraints on transmit power and power split ratio and the QoS requirements. Following the actual characteristics of each service, the QoS requirements are modeled as follows:
- H2H services are typically voice and Internet services that demand a high data rate, so their QoS requirement is modeled as an SINR greater than a set minimum threshold.
- Most M2M services are event-driven, and their traffic is typically generated periodically at frequencies set by the service's own needs. The QoS requirement of tolerant M2M services is therefore modeled as the success rate of transmitting a data packet of size V within the time constraint T.
- Critical M2M services usually have strict latency and transmission reliability requirements, so their QoS requirement is modeled as an outage probability not higher than a set threshold.
S3-3: the reward function is designed as a combination of a main reward term and penalty terms, and all agents share the same reward function. This encourages the agents to cooperatively explore the optimal resource allocation strategy while accounting for the QoS requirements of all types of users. In addition, to reduce computing resource consumption and accelerate convergence, the QoS requirements in each penalty term of the reward function are given exact mathematical expressions.
S3-4: the link optimization model calculates the total EE of the M2M devices in the network and the QoS levels of all types of users in the current time slot. The following are stored together in the experience replay pool as training data for the training network:
- the calculated EE and QoS levels;
- the interference suffered by the tolerant M2M link;
- the selected action;
- the state observation;
- the obtained reward;
- the prior information received by the machine equipment gateway.
Step S408: train the resource allocation model and allocate resources.
Based on the link communication energy efficiency optimization model and the experience replay pool, the agent repeatedly draws random data from the large amount of data stored in the pool to train the training network Q. Stochastic gradient descent reduces the error between the training network's estimates and the values given by the target network. After model training converges, the agent can produce a resource allocation scheme that satisfies the constraints and QoS requirements.
Training on mini-batches drawn at random from the experience replay pool lets the network learn from current and past experience simultaneously and effectively removes data correlation. The target network parameters are updated asynchronously and stay fixed between updates, which makes the algorithm's updates more stable.
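The replay pool and the periodic (asynchronous) target update can be sketched as below. The capacity, the synchronization period and the representation of network parameters as a dictionary are hypothetical choices for illustration.

```python
import copy
import random
from collections import deque
from typing import Any, Deque, Dict, List


class ReplayPool:
    """Fixed-capacity experience replay pool for one agent (one tolerant M2M link)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buffer: Deque[Any] = deque(maxlen=capacity)  # oldest entries are dropped first
        self._rng = random.Random(seed)

    def push(self, transition: Any) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        # Uniform sampling without replacement breaks the temporal correlation of the data.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


def maybe_sync_target(iteration: int, period: int, train_params: Dict[str, Any],
                      target_params: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the training-network parameters to the target network every `period` iterations."""
    if iteration % period == 0:
        return copy.deepcopy(train_params)  # deep copy: the target stays fixed until the next sync
    return target_params


pool = ReplayPool(capacity=3)
for i in range(5):
    pool.push(i)
print(len(pool), sorted(pool.sample(3)))  # 3 [2, 3, 4]
```
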
SWIPT has two receiver designs. "Time switching" alternates over time between information decoding and energy harvesting. "Power splitting" divides the received signal power into an information decoding part and an energy harvesting part according to a power split ratio. This embodiment adopts the power splitting design, so power split ratio allocation is included in the action set of the resource allocation method. In other embodiments, time switching may be used, in which case the power split ratio allocation is replaced by time allocation.
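The two receiver designs can be made concrete with a linear energy-harvesting model. This model is an assumption of the sketch; practical harvesters are nonlinear.
- With power splitting, ρ·P_rx goes to information decoding and θ(1 − ρ)·P_rx is harvested.
- With time switching, a fraction α of the slot is used for decoding and the remaining (1 − α) for harvesting.

The symbol α and the function names are hypothetical.

```python
from typing import Tuple


def power_splitting(rx_power_w: float, rho: float, theta: float) -> Tuple[float, float]:
    """Return (power to information decoding in W, harvested power in W)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("power split ratio must lie in [0, 1]")
    return rho * rx_power_w, theta * (1.0 - rho) * rx_power_w


def time_switching(rx_power_w: float, alpha: float, theta: float,
                   slot_s: float) -> Tuple[float, float]:
    """Return (decoding time in s, harvested energy in J) for one slot."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("time fraction must lie in [0, 1]")
    return alpha * slot_s, theta * (1.0 - alpha) * slot_s * rx_power_w


# With the simulated split ratio rho = 0.8, 80 % of the received power feeds the decoder.
print(power_splitting(1.0, rho=0.8, theta=0.5))  # approximately (0.8, 0.1)
```

Power splitting keeps the full slot available for decoding and trades received power instead of time. This is why the action set in this embodiment carries a power split ratio rather than a time fraction.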
In prior-art H2H/M2M coexisting cellular networks, the network structure is complex and resource allocation tasks vary. Conventional resource allocation schemes therefore perform poorly and ignore the interference caused by the diverse service types of machine user equipment and by spectrum sharing. Moreover, network models that consider energy harvesting (EH) do not use SWIPT to obtain stable and reliable energy. In practical applications, the prior art does not model the QoS requirements of each type of user accurately and mathematically, which often leads to poor performance for the user equipment with lower QoS requirements. In addition, traditional algorithms have high computational complexity and perform poorly on optimization problems with nonlinear constraints, and centralized training of intelligent algorithms converges slowly.
Compared with existing intelligent resource allocation methods, this embodiment introduces SWIPT-assisted M2M devices into the network structure so that they obtain stable and reliable energy from the radio frequency environment.
In addition, the resource allocation method adopts multi-agent cooperation with distributed execution. With multi-agent distributed execution, each agent can take its own conditions into account and make the most suitable resource allocation strategy. A strategy made by single-agent centralized execution, by contrast, is tailored to the network as a whole rather than to individual agents. This embodiment therefore greatly reduces the service load of the base station and can update the resource allocation strategy of every agent in each iteration. It achieves better performance while greatly improving training convergence speed.
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Example 4
According to the embodiment of the present application, there is also provided a SWIPT-assisted QoS-based downlink resource allocation apparatus, as shown in fig. 5, the apparatus includes an obtaining module 542, a selecting module 544 and an allocating module 546.
The machine user equipment is divided into tolerant machine user equipment and critical machine user equipment based on its service type. Critical machine user equipment is machine user equipment whose transmission reliability requirement is higher than a preset requirement threshold. Tolerant machine user equipment is machine user equipment that needs to complete the transmission of periodically generated payloads.
The acquisition module 542 is configured to acquire a state observation of a current environmental state of a communication link between a tolerant machine user device and a machine device gateway.
The selection module 544 is configured to select a resource allocation policy for the tolerant machine user device based on the state observation of the current environmental state using a resource allocation model constructed based on a neural network.
The allocating module 546 allocates downlink resources to the tolerant machine user equipment based on the selected resource allocation policy.
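The three modules can be mirrored in a small class, sketched below. The sketch assumes the trained resource allocation model is available as a callable that maps a state observation to an allocation. The class and field names are hypothetical and only illustrate the data flow obtain → select → allocate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class Allocation:
    subband: int        # spectral sub-band index k
    tx_power_w: float   # transmit power for the tolerant link
    split_ratio: float  # power split ratio rho


@dataclass
class DownlinkResourceAllocator:
    """Hypothetical mirror of the obtaining (542), selecting (544) and allocating (546) modules."""
    observe: Callable[[int], Sequence[float]]        # device id -> state observation
    policy: Callable[[Sequence[float]], Allocation]  # trained model: state -> allocation
    allocations: Dict[int, Allocation] = field(default_factory=dict)

    def obtain(self, device_id: int) -> Sequence[float]:
        """Obtaining module: state observation of the tolerant device's link."""
        return self.observe(device_id)

    def select(self, state: Sequence[float]) -> Allocation:
        """Selecting module: query the resource allocation model."""
        return self.policy(state)

    def allocate(self, device_id: int) -> Allocation:
        """Allocating module: record the downlink resources assigned to the device."""
        allocation = self.select(self.obtain(device_id))
        self.allocations[device_id] = allocation
        return allocation


allocator = DownlinkResourceAllocator(
    observe=lambda d: [float(d)],
    policy=lambda s: Allocation(subband=int(s[0]) % 4, tx_power_w=0.01, split_ratio=0.8),
)
print(allocator.allocate(5))
```
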
The downlink resource allocation apparatus in this embodiment can implement the downlink resource allocation method in the foregoing embodiment, and therefore, details are not described here.
Example 5
An embodiment of the present application provides a downlink resource allocation system assisted by SWIPT and based on QoS, as shown in fig. 6, including: a Base Station (BS) 52, a Human User Equipment (HUE) 60, a machine equipment gateway (MTCG) 54, machine user equipment, wherein the machine user equipment includes a critical machine user equipment (CMTCD) 56 and a tolerant machine user equipment (TMTCD) 58.
The machine user equipment is equipped with SWIPT. Embedding SWIPT in M2M communication gives the machine user equipment the ability to obtain stable and reliable energy from the radio frequency environment while performing information decoding.
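The spectral sub-bands are indexed as $\mathcal{K} = \mathcal{H} \cup \mathcal{S}$, meaning that each orthogonal spectral sub-band is pre-allocated to one human user equipment or one critical machine user equipment. Here $\mathcal{H} = \{1, 2, \ldots, H\}$ denotes the human user equipment and $\mathcal{S} = \{1, 2, \ldots, S\}$ denotes the critical machine user equipment. The machine equipment gateways (clusters) are denoted by $\mathcal{M} = \{1, 2, \ldots, M\}$, and the tolerant machine user equipment in each cluster is denoted by $\mathcal{N} = \{1, 2, \ldots, N\}$.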
The base station 52 is configured to establish an H2H communication link with the human ue 60, pre-allocate a spectrum subband and a transmit power, and broadcast a priori information such as channel gain information and QoS index of all H2H links to the machine equipment gateway 54 in a coverage area.
The machine device gateway 54 is configured to form a cluster with machine user devices, establish multiple M2M communication links, and pre-allocate a spectrum sub-band, a transmission power, and a power split ratio to the critical M2M link, where one machine device gateway and multiple machine user devices form a cluster.
A machine equipment gateway (MTCG) 54 may be the downlink resource allocation apparatus in the foregoing embodiment, and may implement the downlink resource allocation method in the foregoing embodiment, and therefore, details are not described here again.
Example 6
Embodiments of the present application also provide a storage medium configured to store program codes for executing the downlink resource allocation method in the above embodiments.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and various media capable of storing program codes.
Simulation test
The simulation settings are as follows.

Network layout:
- The coverage radius of the base station is 500 m, and that of each machine equipment gateway is 30 m.
- There are 2 H2H user equipments and 2 machine equipment gateways.
- Each gateway's coverage contains one critical M2M device, and the number of tolerant M2M devices in each coverage area increases from 0 to 5.

Channel model:
- The system bandwidth is 4 MHz.
- Path loss is modeled as $128 + 37.6\log_{10} d$, with d in km.
- Shadow fading is log-normal with a standard deviation of 8 dB.
- Fast fading is Rayleigh.
- The noise power $\sigma^2$ is −114 dBm.

Powers and QoS thresholds:
- The transmit power $P_{BS,h}$ allocated by the base station to the H2H user equipment is 30 dBm.
- The transmit power $P_{m,s}$ of the machine equipment gateway to the critical M2M user equipment is 23 dBm, with power split ratio $\rho_{m,s}$ = 0.8.
- The maximum transmit power $P^{\max}$ allocated to tolerant M2M user equipment is 15 dBm.
- The transmit power and power split ratio quantization levels L and Z are 10 and 5, respectively.
- The H2H SINR threshold $\gamma_h^{\mathrm{th}}$ is 7 dB, the critical M2M SINR threshold $\gamma_s^{\mathrm{th}}$ is 5 dB, and the outage probability $p^{\max}$ is at most 0.01.
- The tolerant M2M payload has transmission time constraint T = 100 ms and size V = 3 × 1024 bytes.
- Because of the practical limits of finite-precision digital signal processing, each user's received SINR is capped at 30 dB.

Algorithm parameters:
- Discount factor γ = 0.5, constant μ = 1/50, and 3000 iterations.
- The greedy factor ε decays linearly from 1 to 0.01 over the first 80% of the iterations.
- The neural network of each tolerant M2M link has 3 fully connected hidden layers. Each layer has as many neurons as there are selectable actions, i.e., K × L × Z = 200.
- ReLU is the activation function, and the RMSProp optimizer updates the network parameters with a learning rate of 0.001.
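The simulation settings imply a few derived quantities that can be checked mechanically, as in the sketch below:
- the number of sub-bands K = H + S = 4 (one per H2H user and per critical device);
- the action count K × L × Z = 200;
- the linear ε schedule from 1 to 0.01 over the first 80% of 3000 iterations.

The dataclass and its field names are hypothetical; the values are those listed above.

```python
from dataclasses import dataclass


def dbm_to_watt(dbm: float) -> float:
    """Convert a power in dBm to watts."""
    return 10.0 ** (dbm / 10.0) / 1000.0


@dataclass(frozen=True)
class SimConfig:
    num_h2h: int = 2
    num_gateways: int = 2
    critical_per_gateway: int = 1
    power_levels: int = 10        # L
    split_levels: int = 5         # Z
    p_max_dbm: float = 15.0
    noise_dbm: float = -114.0
    iterations: int = 3000
    eps_start: float = 1.0
    eps_end: float = 0.01
    decay_fraction: float = 0.8

    @property
    def num_subbands(self) -> int:
        # K = H ∪ S: one orthogonal sub-band per H2H user and per critical device
        return self.num_h2h + self.num_gateways * self.critical_per_gateway

    @property
    def num_actions(self) -> int:
        return self.num_subbands * self.power_levels * self.split_levels

    def epsilon(self, e: int) -> float:
        """Linear decay from eps_start to eps_end over the first decay_fraction of iterations."""
        decay_steps = self.decay_fraction * self.iterations
        if e >= decay_steps:
            return self.eps_end
        return self.eps_start - (self.eps_start - self.eps_end) * e / decay_steps


cfg = SimConfig()
print(cfg.num_actions, cfg.epsilon(0), cfg.epsilon(2400))  # 200 1.0 0.01
print(dbm_to_watt(cfg.p_max_dbm))  # about 0.0316 W
```
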
Reflecting its characteristics, the spectral sub-band, transmit power and power split ratio allocation method provided by the application is named the multi-agent SWIPT-assisted adaptive spectrum-power-ratio allocation method (MA-SWIPT-ASPRA). It is compared with three efficient intelligent resource allocation methods:
(1) an algorithm that differs from the disclosed method only in that the simultaneous, distributed resource allocation decisions of multiple M2M links are replaced by asynchronous resource allocation decisions made by a single M2M link;
(2) the adaptive spectrum-power-ratio allocation method without SWIPT assistance (MA-Non-SWIPT-ASPRA), which only removes the SWIPT function from the disclosed method;
(3) the most classical reinforcement-learning-based intelligent resource allocation scheme, namely the Q-learning-based SWIPT-assisted spectrum-power-ratio allocation method (QL-SWIPT-ASPRA).
Fig. 7 shows how the total energy efficiency of the M2M devices changes as the number of M2M devices accessing the system increases. As the figure shows, the total energy efficiency first increases and then decreases as the number of M2M devices grows, for the following reasons:
When only the critical M2M devices are present in the network, i.e., when there are 2 M2M devices, the SWIPT and SWIPT-free schemes achieve almost the same total energy efficiency. At this point there is no spectrum-sharing interference in the network, and the critical M2M devices, being close to the machine device gateway, enjoy strong channel gains and hence very high SINR values. In the SWIPT schemes, the critical M2M devices therefore devote most of the received power to information decoding and only a small part to energy harvesting; because the harvested energy is orders of magnitude smaller, it makes almost no difference to energy efficiency.
As the number of M2M devices in the network increases from 2 to 6, the total energy efficiency of every scheme increases and peaks at 6 devices, because spectrum reuse improves spectrum efficiency and hence network energy efficiency. With 6 M2M devices, the 4 tolerant M2M links reuse the spectrum occupied by the 2 H2H links and the 2 critical M2M links, so every scheme can assign each tolerant M2M link its own already-occupied spectrum sub-band without creating additional intra-cluster interference. Spectrum efficiency is therefore maximized at this point, and the maximum energy efficiency is achieved.
As the number of M2M devices increases from 6 to 12, the energy efficiency of every scheme decreases, because excessive spectrum reuse creates both intra-cluster and inter-cluster interference: interfering links become ubiquitous, power consumption rises sharply, and energy efficiency deteriorates.
Comparing the schemes with and without SWIPT, the energy efficiency of the SWIPT-free scheme drops rapidly once the number of M2M devices exceeds 6, whereas the SWIPT schemes maintain relatively high energy efficiency until the number exceeds 8. Compared with the SWIPT-free scheme, the SWIPT schemes sacrifice a little spectrum efficiency but convert a large amount of interference power into harvested energy; by balancing spectrum efficiency against power consumption, they perform better in environments with a certain level of interference. The degradation becomes more gradual as M2M devices keep being added, because spectrum efficiency has already been severely impaired, leaving progressively less room for further decline.
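To make this trade-off concrete, the sketch below uses the widely used power-splitting (PS) SWIPT receiver model: a fraction ρ of the received power goes to information decoding and 1 − ρ to energy harvesting, so interference that lowers the SINR still contributes harvested power. The application does not state its exact expressions here, so the formulas, the function name `ps_swipt_link` and every default parameter (bandwidth, noise powers, harvesting efficiency η, circuit power, amplifier efficiency) are assumptions; only the 30 dB SINR cap comes from the simulation setup.

```python
import math

SINR_CAP = 10 ** (30 / 10)  # 30 dB cap on received SINR (from the setup)


def ps_swipt_link(p_tx, gain, interference, rho,
                  bandwidth_hz=180e3,  # hypothetical sub-band bandwidth (Hz)
                  noise_ant=1e-13,     # hypothetical antenna noise power (W)
                  noise_conv=1e-13,    # hypothetical conversion noise power (W)
                  eta=0.5,             # hypothetical RF-to-DC efficiency
                  circuit_w=0.1,       # hypothetical circuit power (W)
                  pa_eff=0.35):        # hypothetical power-amplifier efficiency
    """Return (SINR, rate in bit/s, harvested power in W, energy efficiency in bit/J)."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    signal = p_tx * gain
    # Only the fraction rho of signal, interference and antenna noise reaches
    # the decoder; conversion noise is added after the power splitter.
    sinr = rho * signal / (rho * (interference + noise_ant) + noise_conv)
    sinr = min(sinr, SINR_CAP)
    rate = bandwidth_hz * math.log2(1.0 + sinr)
    # The remaining 1 - rho of signal AND interference power is harvested.
    harvested = eta * (1.0 - rho) * (signal + interference)
    net_power = p_tx / pa_eff + circuit_w - harvested
    if net_power <= 0:
        raise ValueError("harvested power exceeds consumption; check parameters")
    return sinr, rate, harvested, rate / net_power


for rho in (1.0, 0.9, 0.5):
    print(rho, ps_swipt_link(0.1, 1e-9, 1e-10, rho))
```

Setting ρ = 1 reproduces a SWIPT-free receiver (nothing harvested), which is a convenient baseline when sweeping ρ under growing interference.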
Among all schemes, the resource allocation scheme provided by the application achieves the highest energy efficiency, and its energy efficiency barely decreases when the number of M2M devices increases from 6 to 8, indicating a better balance between spectrum efficiency and energy consumption. Through the multi-agent setting, each agent's resource allocation strategy fully accounts for that agent's own conditions at each moment and is therefore best suited to it. The single-agent scheme with centralized training performs worse because its allocation decisions are more generic, applied to every agent alike, and to some extent ignore each agent's own conditions at each moment. The Q-learning scheme performs worse because the network environment is complex and the state and action sets are huge, so the traditional table-lookup approach of reinforcement learning is computationally inefficient and may miss well-performing resource allocation schemes during the iterations.
Fig. 8 shows how the QoS requirement satisfaction rate of H2H users changes as the number of tolerant M2M devices accessing the system increases. As the number of tolerant M2M devices grows, the H2H QoS satisfaction rate decreases: more tolerant M2M devices means more links reusing the spectrum occupied by the H2H links, which raises the interference power at the H2H receivers and lowers their SINR. In addition, the H2H QoS satisfaction rate of the SWIPT-free scheme reacts more strongly to the growing number of tolerant M2M devices, degrading rapidly even when only a few M2M devices are present. There are two reasons for this. First, in the SWIPT-free scheme the tolerant M2M links prefer to reuse the spectrum of the H2H links: the tolerant M2M receivers are mostly far from the base station and receive little interference from it, so they can achieve high energy efficiency, and high energy efficiency is also achieved when no other user reuses the sub-band occupied by a critical M2M link within the coverage of the machine device gateway. Second, in the SWIPT schemes the tolerant M2M links prefer to share spectrum with the critical M2M links, because SWIPT converts the received interference into harvested energy and thereby improves energy efficiency. Among all schemes, the resource allocation method provided by the application achieves the best H2H QoS satisfaction rate, showing that it performs better at guaranteeing users' QoS requirements.
Fig. 9 shows how the outage probability of the critical M2M links changes as the number of tolerant M2M devices accessing the system increases. As the number of tolerant M2M devices grows, the outage probability of the critical M2M links rises, because more spectrum access brings a higher probability of outage. The SWIPT-free scheme performs best here: besides the spectrum reuse preference described in Fig. 2, the critical M2M devices in the SWIPT-free scheme use all received power for information decoding, so they obtain higher SINR values and hence a lower outage probability. The SWIPT schemes, by contrast, sacrifice some link transmission reliability to achieve higher energy efficiency. Among the SWIPT schemes, the resource allocation method provided by the application performs best, demonstrating its high reliability.
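The H2H QoS satisfaction rate (Fig. 8) and the critical-link outage probability (Fig. 9) are empirical frequencies. A minimal sketch of how they might be estimated from SINR samples follows; pooling samples across users and time slots, the strict inequalities, the function names and the sample values are assumptions, while the 0.01 outage limit comes from the simulation setup.

```python
OUTAGE_LIMIT = 0.01  # outage limit for critical M2M links (from the setup)


def h2h_satisfaction_rate(sinr_db_samples, threshold_db):
    """Fraction of H2H observations whose SINR exceeds the QoS threshold."""
    if not sinr_db_samples:
        raise ValueError("need at least one sample")
    return sum(s > threshold_db for s in sinr_db_samples) / len(sinr_db_samples)


def outage_probability(sinr_db_samples, threshold_db):
    """Fraction of critical-link observations whose SINR falls below the threshold."""
    if not sinr_db_samples:
        raise ValueError("need at least one sample")
    return sum(s < threshold_db for s in sinr_db_samples) / len(sinr_db_samples)


def outage_constraint_met(sinr_db_samples, threshold_db):
    """Check the critical-link QoS constraint: outage probability <= 0.01."""
    return outage_probability(sinr_db_samples, threshold_db) <= OUTAGE_LIMIT


# Hypothetical samples (dB) and thresholds, for illustration only.
print(h2h_satisfaction_rate([3.0, 6.0, 8.0, 4.0], 5.0))
print(outage_constraint_met([1.0] + [10.0] * 99, 5.0))
```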
Fig. 4 shows how the payload transmission success rate of the tolerant M2M users changes as the number of tolerant M2M devices accessing the system increases. As the number of tolerant M2M devices grows, the payload transmission success rate decreases, because excessive spectrum reuse causes more interference and power consumption, reducing each link's capacity for delivering its payload. The SWIPT-free scheme performs worst because, without energy harvesting, each tolerant M2M link can maintain high energy efficiency only by selecting a low transmission power, which at the same time reduces its capacity. With energy harvesting, each tolerant M2M link is more willing to accept some interference in exchange for higher energy efficiency, so it selects a higher transmission power and its link capacity increases accordingly. The resource allocation method provided by the application performs best among all schemes, further demonstrating its effectiveness.
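The payload transmission success rate counts how often a tolerant M2M link delivers its V = 3 × 1024 byte payload within T = 100 ms. The sketch below assumes a 1 ms decision slot with a constant rate inside each slot; the slot length, the helper names and the per-slot rate inputs are hypothetical, and only V and T come from the setup.

```python
PAYLOAD_BITS = 3 * 1024 * 8  # V = 3 x 1024 bytes (from the setup)
DEADLINE_S = 0.1             # T = 100 ms (from the setup)
SLOT_S = 0.001               # hypothetical 1 ms decision slot


def payload_delivered(rates_bps):
    """True if the per-slot rates drain the payload before the deadline."""
    remaining = PAYLOAD_BITS
    n_slots = round(DEADLINE_S / SLOT_S)
    for rate in rates_bps[:n_slots]:  # slots after T do not count
        remaining -= rate * SLOT_S
        if remaining <= 0:
            return True
    return False


def success_rate(episodes):
    """Fraction of episodes (lists of per-slot rates) that meet the deadline."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(payload_delivered(r) for r in episodes) / len(episodes)


# Hypothetical episodes: a 1 Mbit/s link and a 200 kbit/s link.
print(success_rate([[1e6] * 100, [2e5] * 100]))
```

The remaining payload and remaining time tracked here are the same quantities that enter each agent's state observation.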
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A SWIPT-assisted downlink resource allocation method, characterized by comprising the following steps:
obtaining a state observation of the current environment state of a communication link between tolerant machine user equipment and a machine device gateway;
selecting, based on the state observation of the current environment state, a resource allocation policy for the tolerant machine user equipment by using a resource allocation model constructed based on a neural network;
allocating downlink resources to the tolerant machine user equipment based on the selected resource allocation policy;
wherein the tolerant machine user equipment is machine user equipment that needs to complete the transmission task of a periodically generated payload.
2. The method of claim 1, wherein before obtaining the state observation of the current environment state of the communication link between the tolerant machine user equipment and the machine device gateway, the method further comprises:
dividing machine user equipment into tolerant machine user equipment and critical machine user equipment based on the service type of the machine user equipment;
wherein the critical machine user equipment is machine user equipment whose transmission reliability requirement is higher than a preset requirement threshold.
3. The method of claim 2, wherein the resource allocation model is constructed based on the following method:
constructing a state function, wherein the state function is a set of state observations;
constructing an action function, wherein the action function is a set of downlink spectrum resources, a transmission power level and a power split ratio;
constructing a reward function based on a balance of resource allocation optimization objectives and QoS constraints;
and constructing the resource allocation model based on the state function, the action function and the reward function.
4. The method of claim 3, wherein constructing the state function comprises: constructing the state function based on the channel gain of the communication link on each spectrum sub-band, the interference power of the communication link on each spectrum sub-band, the remaining payload and remaining transmission time of the communication link, the current iteration number, and a greedy factor representing the current environment exploration rate of the communication link.
5. The method of claim 3, wherein constructing the reward function comprises:
determining a reward function penalty term of the tolerant machine user equipment based on the remaining payload and remaining transmission time of the tolerant machine user equipment;
determining a reward function penalty term of the critical machine user equipment based on the outage probability of the critical machine user equipment;
determining a reward function penalty term of the human user equipment based on the signal-to-interference-plus-noise ratio of the human user equipment;
constructing the reward function based on the total energy efficiency of the communication links of all machine user equipment, the penalty term of the tolerant machine user equipment, the penalty term of the critical machine user equipment, and the penalty term of the human user equipment.
6. The method of claim 4, wherein constructing the reward function based on the balance between the resource allocation optimization objective and the QoS constraints comprises:
setting QoS constraints;
taking the total energy efficiency achieved by the tolerant machine user equipment and the critical machine user equipment as the resource allocation optimization objective;
constructing the reward function based on the resource allocation optimization objective and the QoS constraints.
7. The method of claim 6, wherein setting the QoS constraints comprises:
setting the QoS constraint of the human user equipment to be that its signal-to-interference-plus-noise ratio (SINR) is greater than a set minimum threshold;
setting the QoS constraint of the tolerant machine user equipment to be that the success rate of transmitting a payload of preset size V within the time constraint T is higher than a set success-rate threshold;
setting the QoS constraint of the critical machine user equipment to be that its outage probability is not higher than a set outage threshold.
8. The method of claim 3, wherein after obtaining the state observation of the current environment state of the communication link between the tolerant machine user equipment and the machine device gateway, the method further comprises:
constructing an experience replay pool for storing training data for training the resource allocation model, wherein the training data comprise the state observations at the current time and the next time, the reward value, and the selected resource allocation policy;
the neural network comprises a training network and a target network; in each iteration, the training network is trained by stochastic gradient descent using data randomly sampled from the experience replay pool; the target network is a neural network whose parameters are held fixed and are updated to the current training network parameters at intervals.
9. The method of any one of claims 2 to 8, wherein the machine user equipment is provided with simultaneous wireless information and power transfer (SWIPT), enabling the machine user equipment to harvest energy from the radio-frequency environment while decoding information.
10. A SWIPT-assisted downlink resource allocation device, characterized by comprising:
an acquisition module configured to acquire a state observation of the current environment state of a communication link between tolerant machine user equipment and a machine device gateway;
a selection module configured to select, based on the state observation of the current environment state, a resource allocation policy for the tolerant machine user equipment by using a resource allocation model constructed based on a neural network;
an allocation module configured to allocate downlink resources to the tolerant machine user equipment based on the selected resource allocation policy;
wherein the tolerant machine user equipment is machine user equipment that needs to complete the transmission task of a periodically generated payload.
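Claims 4 and 8 describe a fairly standard deep Q-network pipeline: a flat observation vector, an experience replay pool sampled at random, and a target network refreshed periodically from the training network. The following stdlib-only sketch illustrates those three pieces under stated assumptions: it omits the neural network itself, represents network parameters as a plain dictionary, and all names (`build_state`, `Transition`, `ReplayPool`, `TargetSync`), the replay capacity and the sync period are hypothetical rather than taken from the application.

```python
import copy
import random
from collections import deque
from dataclasses import dataclass


def build_state(channel_gains, interference, remaining_load, remaining_time,
                iteration, epsilon):
    """Flatten the observation items listed in claim 4 into one vector.

    channel_gains / interference: one value per spectrum sub-band.
    """
    return [*channel_gains, *interference,
            remaining_load, remaining_time, float(iteration), epsilon]


@dataclass
class Transition:
    state: list       # observation at the current time
    action: int       # index of the selected resource allocation policy
    reward: float
    next_state: list  # observation at the next time


class ReplayPool:
    """Fixed-size experience replay pool; the oldest transitions are evicted first."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> list:
        # Uniform random mini-batch for one stochastic gradient step.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


class TargetSync:
    """Hold target parameters fixed, refreshing them from the training
    parameters every `period` training steps."""

    def __init__(self, train_params: dict, period: int):
        self.train_params = train_params
        self.target_params = copy.deepcopy(train_params)
        self.period = period
        self.steps = 0

    def step(self) -> None:
        self.steps += 1
        if self.steps % self.period == 0:
            self.target_params = copy.deepcopy(self.train_params)
```

Deep-copying the parameters is what keeps the target network fixed between synchronizations, so the bootstrapped Q-targets do not shift with every gradient step.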
CN202211225933.1A 2022-10-09 2022-10-09 SWIPT-assisted downlink resource allocation method and device Pending CN115915454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211225933.1A CN115915454A (en) 2022-10-09 2022-10-09 SWIPT-assisted downlink resource allocation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211225933.1A CN115915454A (en) 2022-10-09 2022-10-09 SWIPT-assisted downlink resource allocation method and device

Publications (1)

Publication Number Publication Date
CN115915454A true CN115915454A (en) 2023-04-04

Family

ID=86492636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211225933.1A Pending CN115915454A (en) 2022-10-09 2022-10-09 SWIPT-assisted downlink resource allocation method and device

Country Status (1)

Country Link
CN (1) CN115915454A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078236A (en) * 2023-10-18 2023-11-17 广东工业大学 Intelligent maintenance method and device for complex equipment, electronic equipment and storage medium
CN117078236B (en) * 2023-10-18 2024-02-02 广东工业大学 Intelligent maintenance method and device for complex equipment, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination