CN116321390A - Power control method, device and equipment - Google Patents

Power control method, device and equipment

Info

Publication number
CN116321390A
CN116321390A
Authority
CN
China
Prior art keywords
network model
secondary user
power
value
interference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310581580.7A
Other languages
Chinese (zh)
Inventor
冯建武
袁伟
常诚
付晨
朱超
毕韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Starpoint Technology Co ltd
Original Assignee
Beijing Starpoint Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Starpoint Technology Co ltd filed Critical Beijing Starpoint Technology Co ltd
Priority to CN202310581580.7A priority Critical patent/CN116321390A/en
Publication of CN116321390A publication Critical patent/CN116321390A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/24TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/241TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/26TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
    • H04W52/265TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the quality of service QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/30TPC using constraints in the total amount of available transmission power
    • H04W52/36TPC using constraints in the total amount of available transmission power with a discrete range or set of values, e.g. step size, ramping or offsets
    • H04W52/367Power values between minimum and maximum limits, e.g. dynamic range
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides a power control method, device and equipment, applied to the field of communication technology. The method comprises the following steps: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information. The first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot. In this scheme, both the successful access rate of the SU and the throughput of the SU can be improved; moreover, the SU only needs to perceive the current environment state and does not need to know the power control strategy of the PU, which better matches actual scenarios and gives the scheme a wider application range.

Description

Power control method, device and equipment
Technical Field
The present invention relates to the field of wireless communications technologies, and in particular, to a power control method, apparatus, and device.
Background
In recent years, with the rapid development of 5G and the Internet of Things, the number of wireless users has increased dramatically and people's demands on wireless communication keep growing, making spectrum resources ever more important. The shortage of radio spectrum resources, set against the surge in demand for them, constitutes a major contradiction in the current development of radio communication. Although research on millimeter-wave communication keeps increasing, millimeter-wave links are easily blocked and have small coverage, and deploying millimeter-wave communication requires a large number of base stations, which is costly and difficult to popularize in the short term. Therefore, improving the spectrum utilization of the middle and low frequency bands is important for the development of 5G and the Internet of Things.
In order to improve the spectrum utilization of the middle and low frequency bands, dynamic spectrum access (Dynamic Spectrum Access, DSA) technology has been proposed to alleviate the current spectrum situation. DSA technology enables Secondary Users (SU) in a Cognitive Radio Network (CRN) to use the licensed frequency band of a Primary User (PU) without interfering with the PU. DSA mainly comprises two spectrum access modes, Overlay and Underlay: in the Overlay mode, the SU opportunistically accesses a frequency band currently unoccupied by the PU; in the Underlay mode, the SU and the PU use the same frequency band at the same time. Because the Underlay access mode has the SU and the PU sharing the same frequency band simultaneously, the success rate of the SU accessing the channel is low.
Disclosure of Invention
The invention provides a power control method, device and equipment, which are used to overcome the defect of the low success rate of SU channel access in the prior art and to realize a power control method capable of improving the success rate and throughput of SU channel access.
The invention provides a power control method, which comprises the following steps:
acquiring environmental state information;
obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
According to the power control method provided by the present invention, before obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information, the method further includes:
step a1, acquiring the environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-and-noise ratios of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received signal-to-interference-and-noise ratios of the primary user and the secondary user;
increasing t by a preset step size, and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
According to the power control method provided by the invention, the obtaining the reward value of time slot t according to the received signal-to-interference-and-noise ratios of the primary user and the secondary user comprises:
determining the reward value of time slot t as the received signal-to-interference-and-noise ratio of the secondary user in time slot t+1 when the received signal-to-interference-and-noise ratio of the primary user is greater than or equal to a first threshold and the received signal-to-interference-and-noise ratio of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received signal-to-interference-and-noise ratio of the primary user is smaller than the first threshold and/or the received signal-to-interference-and-noise ratio of the secondary user is smaller than the second threshold.
According to the power control method provided by the invention, the acquiring the training data of the secondary user comprises:
acquiring the training data of the secondary user when a first condition and/or a second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ denotes the time slot and $T_1$ denotes the first slot interval for updating parameters;
the second condition includes $t \bmod T_2 = 0$, where $t$ denotes the time slot and $T_2$ denotes the second slot interval for updating parameters.
According to the power control method provided by the invention, the environmental state information includes the received power acquired from sensors in the network environment.
According to the power control method provided by the invention, the received signal-to-interference-and-noise ratios of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^t = \frac{p_i^t h_{i,i}}{p_j^t h_{j,i} + \sigma_i^2}, \quad i, j \in \{1, 2\},\ j \neq i$$

where, when $i = 1$, $\gamma_1^t$ represents the received signal-to-interference-and-noise ratio of the primary user, and when $i = 2$, $\gamma_2^t$ represents the received signal-to-interference-and-noise ratio of the secondary user; $p_i^t$ represents the transmit power of user $i$ in time slot $t$; $h_{j,i}$ represents the channel gain from transmitter $j$ to receiver $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
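For illustration only, a minimal Python sketch of this SINR computation follows; the array layout and the example numbers are assumptions of the sketch, not values taken from the disclosure.

```python
import numpy as np

def received_sinr(p: np.ndarray, h: np.ndarray, sigma2: np.ndarray) -> np.ndarray:
    """Received SINR of the PU (i=0) and the SU (i=1) in an underlay pair.

    p:      length-2 vector, p[i] = transmit power of user i
    h:      2x2 matrix, h[j, i] = channel gain from transmitter j to receiver i
    sigma2: length-2 vector, sigma2[i] = noise power at receiver i
    """
    gamma = np.empty(2)
    for i in range(2):
        j = 1 - i  # the other user's transmitter is the interferer
        gamma[i] = p[i] * h[i, i] / (p[j] * h[j, i] + sigma2[i])
    return gamma

# Hypothetical example: PU transmitting at 1.0 W, SU at 0.1 W
print(received_sinr(np.array([1.0, 0.1]),
                    np.array([[1.0, 0.05],
                              [0.08, 1.0]]),
                    np.array([1e-3, 1e-3])))
```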
According to the power control method provided by the invention, the first network model and the second network model each comprise two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
The invention also provides a power control device, comprising:
the acquisition module is used for acquiring environmental state information;
the processing module is used for obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a power control method as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a power control method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a power control method as described in any of the above.
According to the power control method, device and equipment provided by the invention, the power value of the secondary user access channel is obtained by using the trained first network model and the environmental state information obtained through interaction with the environment; the first network model is obtained by training based on the second network model and training data; the training data include the environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot. A more suitable power value can thus be obtained with the first network model based on the current environmental state information, so that the success rate of the SU accessing the channel is higher and the throughput is higher.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings in the following description show some embodiments of the invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a first schematic flow chart of the power control method provided by the present invention;
FIG. 2 is a system architecture diagram of the power control method provided by the present invention;
FIG. 3 is a second schematic flow chart of the power control method provided by the present invention;
FIG. 4 is a first schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 5 is a second schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 6 is a third schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 7 is a fourth schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 8 is a schematic structural diagram of the power control device provided by the present invention;
FIG. 9 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, an application scenario according to an embodiment of the present invention is described:
the secondary user SU in the cognitive radio network CRN uses the licensed frequency band of the PU without interfering with the primary user PU. Particularly, a power control scheme is realized aiming at an Underlay spectrum access mode, namely that the same frequency band is used at the same time for SU and PU, so that the success rate and throughput of SU access channels are improved.
In the embodiment of the present invention, the first network model and the second network model may be established based on a deep reinforcement learning algorithm, for example the Actor-Critic algorithm.
The following describes the technical solution of the embodiment of the present invention in detail with reference to fig. 1 to 9. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a schematic flow chart of a power control method provided by the invention. As shown in fig. 1, the method provided in this embodiment includes:
Step 101, acquiring environmental state information;
Step 102, obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
In particular, the secondary user SU obtains the environmental state information by sensing the network environment; optionally, the environmental state information includes the received power acquired from sensors in the network environment.
Alternatively, the first network model and the second network model may be built based on a neural network model.
The power value output by the first network model is typically the power value with the highest probability. The goal of the first network model is to maximize the obtainable reward value; optionally, the second network model is used to evaluate the output of the first network model, so that the output of the first network model becomes more accurate.
The SU continuously acquires environmental state information through interaction with the network environment, inputs it into the first network model, obtains the corresponding reward based on the power value output by the first network model, and thereby continuously trains the first network model and the second network model. The power value obtained from the trained first network model greatly improves the success rate and throughput of channel access and can satisfy the quality of service of the SU.
According to the method of the embodiment, the power value of the secondary user access channel is obtained by using the trained first network model and the environmental state information obtained through interaction with the environment; the first network model is obtained by training based on the second network model and training data; the training data include the environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot. A more suitable power value can thus be obtained with the first network model based on the current environmental state information, so that the success rate of the SU accessing the channel is higher and the throughput is higher.
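As a minimal sketch of the inference step just described, assuming a trained PyTorch actor that maps the sensed state to selection probabilities over a discrete set of power levels (the names `actor` and `power_levels` are placeholders of this sketch, not part of the disclosure):

```python
import torch

def select_access_power(actor: torch.nn.Module,
                        state: torch.Tensor,
                        power_levels: torch.Tensor) -> float:
    """Feed the sensed environment state to the trained first network
    model and return the power value with the highest probability."""
    with torch.no_grad():
        probs = actor(state)       # probability of each candidate power value
        idx = torch.argmax(probs)  # greedy choice at inference time
    return float(power_levels[idx])
```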
Optionally, the model training may be implemented as follows:
step a1, acquiring the environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-and-noise ratios of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received signal-to-interference-and-noise ratios of the primary user and the secondary user;
increasing t by a preset step size, and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
Specifically, steps a1 to a4 constitute the process of acquiring the training data, which proceeds as follows.

Initialization parameters are set in the cognitive radio system: the current slot number $t = 0$ and the current round number $T = 0$; the PU transmit power $p_1$ and the SU transmit power $p_2$; the global maximum number of rounds $T_{\max}$; the first slot interval $T_1$ for updating parameters; and the second slot interval $T_2$, i.e., the maximum slot interval of a single round.
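For concreteness, this initialization might be collected in a configuration object as sketched below; every default value is a hypothetical placeholder for the symbols above, not a value from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    t: int = 0            # current slot counter
    T: int = 0            # current round counter
    p_pu: float = 1.0     # PU transmit power p1 (placeholder)
    p_su: float = 0.1     # initial SU transmit power p2 (placeholder)
    T_max: int = 500      # global maximum number of rounds
    T1: int = 10          # first slot interval for parameter updates
    T2: int = 200         # maximum slot interval of a single round
```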
The SU obtains the environmental state information by interacting with the sensors in the environment, denoted for example by $s^t = \left[o_1^t, o_2^t, \dots, o_N^t\right]^{\mathrm{T}}$, where $o_n^t$ represents the power received by the $n$-th sensor in time slot $t$:

$$o_n^t = p_1^t g_{1,n} + p_2^t g_{2,n} + \sigma_n^t$$

where $p_1^t$ and $p_2^t$ respectively represent the powers of the PU transmitter and the SU transmitter in time slot $t$; $g_{1,n}$ and $g_{2,n}$ respectively represent the path losses from the PU transmitter and the SU transmitter to sensor $n$, computed from the distances $d_{1,n}$ and $d_{2,n}$ between the respective transmitter and sensor $n$; and $\sigma_n^t$ represents the noise power at the $n$-th sensor in time slot $t$, e.g., a zero-mean Gaussian random variable with variance $\sigma^2$.
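A sketch of this observation model in Python follows; since the exact path-loss law is not reproduced in this text, an inverse-square law is assumed purely for illustration, and all numeric arguments are placeholders.

```python
import numpy as np

def sensor_observations(p_pu: float, p_su: float,
                        d_pu: np.ndarray, d_su: np.ndarray,
                        noise_var: float, rng: np.random.Generator) -> np.ndarray:
    """Power received by each sensor in one slot: PU contribution plus
    SU contribution plus Gaussian noise. The d**-2 path loss below is
    an assumption of this sketch, not the patent's exact law."""
    g_pu = d_pu ** -2.0                 # path loss PU transmitter -> sensors
    g_su = d_su ** -2.0                 # path loss SU transmitter -> sensors
    noise = rng.normal(0.0, np.sqrt(noise_var), size=d_pu.shape)
    return p_pu * g_pu + p_su * g_su + noise

rng = np.random.default_rng(0)
obs = sensor_observations(1.0, 0.1,
                          np.array([50.0, 80.0]), np.array([30.0, 60.0]),
                          1e-9, rng)
```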
The environmental state information perceived by the SU is input into the first network model to obtain the power value of the secondary user in time slot t. For example, the output of the first network model is the probability of each action (i.e., each candidate power value) being selected, and one power value is chosen according to the magnitudes of these probabilities as the SU's power value $p_2^t$ for time slot $t$, i.e., the power value of the SU access channel. Optionally, the first network model may also directly output the selected power value. Alternatively, for example in a continuous power space, the first network model may determine the power value of the SU access channel from the mean and variance of a probability distribution.
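During training, this probability-based selection can be sketched as sampling from a categorical distribution (PyTorch shown; the discrete `power_levels` grid is a placeholder of this sketch):

```python
import torch
from torch.distributions import Categorical

def sample_power(probs: torch.Tensor, power_levels: torch.Tensor):
    """Sample a power level in proportion to the actor's output
    probabilities; the log-probability is kept for the later
    policy-gradient update."""
    dist = Categorical(probs=probs)
    idx = dist.sample()
    return power_levels[idx], dist.log_prob(idx)
```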
According to the SU's power value $p_2^t$ in time slot $t$, the received signal-to-interference-and-noise ratios $\gamma_1^t$ and $\gamma_2^t$ of the PU and the SU are calculated, and the reward value of time slot $t$ is then determined from them: for example, if the received signal-to-interference-and-noise ratios of the SU and the PU are large or satisfy a given condition, the reward value is large; otherwise the reward value is small or zero.
The environmental state information is then updated to $s^{t+1} = \left[o_1^{t+1}, o_2^{t+1}, \dots, o_N^{t+1}\right]^{\mathrm{T}}$ and $t \leftarrow t + 1$, i.e., $t$ is increased by the preset step size, which is 1. Here $s^{t+1}$ denotes the environmental state of time slot $t+1$, $o_n^{t+1}$ denotes the received power of the $n$-th sensor in time slot $t+1$, boldface denotes a vector or matrix, and $[\cdot]^{\mathrm{T}}$ denotes the transpose of a vector or matrix. Steps a1-a4 are repeated until the loop-end condition is met, yielding the training data.
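Putting steps a1-a4 together, the data-collection loop might look as follows; `env` is a hypothetical wrapper exposing `sense()` and `step()`, `sample_power` is the sampling helper sketched above, and the mod-interval reading of the two loop-end conditions is an assumption carried over from earlier.

```python
def collect_training_data(env, actor, power_levels, T1: int, T2: int):
    """Interact with the environment slot by slot (steps a1-a4) and
    return the collected transitions once the first or second
    condition is met."""
    data = []
    t = 0
    state = env.sense()                                         # step a1
    while True:
        power, logp = sample_power(actor(state), power_levels)  # step a2
        reward, next_state = env.step(power)                    # steps a3-a4
        data.append((state, power, logp, reward, next_state))
        t += 1                                                  # preset step size of 1
        if t % T1 == 0 or t % T2 == 0:                          # first / second condition
            return data
        state = next_state
```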
Alternatively, in the system shown in fig. 2, the received signal-to-interference-and-noise ratios of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^t = \frac{p_i^t h_{i,i}}{p_j^t h_{j,i} + \sigma_i^2}, \quad i, j \in \{1, 2\},\ j \neq i$$

where, when $i = 1$, $\gamma_1^t$ represents the received signal-to-interference-and-noise ratio of the primary user, and when $i = 2$, $\gamma_2^t$ represents the received signal-to-interference-and-noise ratio of the secondary user; $p_i^t$ represents the transmit power of user $i$ in time slot $t$; $h_{j,i}$ represents the channel gain from transmitter $j$ to receiver $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
In the above embodiment, according to whether the power value of the SU access channel simultaneously satisfies the QoS (e.g., the received signal-to-interference-and-noise ratio) of the SU and the PU, the corresponding reward value is fed back to the SU. The first network model obtained by training can therefore improve the channel access success rate and throughput of the SU without needing to know the power control policy of the PU, so the application range is wider.
Optionally, the training data of the secondary user are acquired, i.e., the loop ends, when the first condition and/or the second condition is met.

The first condition includes $t \bmod T_1 = 0$, where $t$ denotes the time slot and $T_1$ denotes the first slot interval for updating parameters; the second condition includes $t \bmod T_2 = 0$, where $t$ denotes the time slot and $T_2$ denotes the second slot interval for updating parameters.
Specifically, it is judged whether $t \bmod T_1 = 0$ or $t \bmod T_2 = 0$ holds. If $t \bmod T_1 = 0$, the training time interval, i.e., the first slot interval, has been reached, and the first network model and the second network model are trained with the acquired training data. If $t \bmod T_2 = 0$, the current round has reached the maximum slot interval, i.e., the second slot interval, and the first network model and the second network model are likewise trained with the acquired training data. If neither condition holds, execution returns to step a1 and training data continue to be collected.
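This branch can be summarized by the small helper below, a sketch under the mod-interval reading of the two conditions assumed above:

```python
def loop_action(t: int, T1: int, T2: int) -> str:
    """Decide what to do at slot t in the data-collection loop."""
    if t % T1 == 0:
        return "train"                 # training interval reached
    if t % T2 == 0:
        return "train_and_end_round"   # round reached its maximum slot interval
    return "collect"                   # keep gathering data (back to step a1)
```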
In this embodiment, setting the loop-end conditions can improve the training efficiency of the model, and the implementation is simple.
Alternatively, step a4 may be implemented as follows:
determining the reward value of time slot t as the received signal-to-interference-and-noise ratio of the secondary user in time slot t+1 when the received signal-to-interference-and-noise ratio of the primary user is greater than or equal to a first threshold and the received signal-to-interference-and-noise ratio of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received signal-to-interference-and-noise ratio of the primary user is smaller than the first threshold and/or the received signal-to-interference-and-noise ratio of the secondary user is smaller than the second threshold.
Alternatively, the first threshold and the second threshold may be the same or different.
For example, in time slot $t$, if the receivers of the PU and the SU both satisfy $\gamma_i^t \geq \gamma_i^{\mathrm{th}}$, $i \in \{1, 2\}$, where $i = 1$ indicates the PU and $i = 2$ indicates the SU, the SU is said to successfully access the channel in time slot $t$, and the reward value obtained by the SU for time slot $t$ is $r^t = \gamma_2^{t+1}$, indicating that the SU and the PU simultaneously satisfy normal communication; otherwise the SU fails to access the channel in time slot $t$, and the reward value obtained by the SU for time slot $t$ is $r^t = 0$, indicating that at least one of the SU and the PU does not satisfy normal communication.
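A direct transcription of this reward rule reads as follows (a sketch; the argument names are placeholders):

```python
def slot_reward(gamma_pu_t: float, gamma_su_t: float, gamma_su_next: float,
                th_pu: float, th_su: float) -> float:
    """Reward of slot t: the SU's received SINR in slot t+1 when both
    QoS thresholds are met in slot t, and zero otherwise."""
    if gamma_pu_t >= th_pu and gamma_su_t >= th_su:
        return gamma_su_next   # successful access
    return 0.0                 # at least one QoS requirement unmet
```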
Optionally, the first network model and the second network model may be implemented by an actor network and a critic network, respectively. The first network model has two fully connected layers: the input of the 1st fully connected layer is the model input, and its output is passed through a ReLU activation function and fed into the 2nd fully connected layer, the first layer having, for example, 200 neurons; the output of the 2nd fully connected layer is passed through a softmax activation function to form the output of the first network model, the second layer having, for example, 8 neurons. The second network model likewise has two fully connected layers: the input of the 1st fully connected layer is the model input, and its output is passed through a ReLU activation function and fed into the 2nd fully connected layer, the first layer having, for example, 100 neurons; the output of the 2nd fully connected layer forms the output of the second network model, the second layer having, for example, 1 neuron.
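The two networks described above can be sketched in PyTorch as follows; the layer sizes follow the example values in the text, while the plain linear scalar head of the critic is a standard choice assumed by this sketch.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """First network model: a 200-neuron ReLU layer followed by an
    8-way softmax over the candidate power values."""
    def __init__(self, state_dim: int, n_powers: int = 8):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 200)
        self.fc2 = nn.Linear(200, n_powers)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.fc2(torch.relu(self.fc1(s))), dim=-1)

class Critic(nn.Module):
    """Second network model: a 100-neuron ReLU layer followed by a
    single-neuron output that scores the state."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 100)
        self.fc2 = nn.Linear(100, 1)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(s)))
```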
Illustratively, as shown in FIG. 3, the method includes the following steps:
Step 1: initializing parameters such as the slot counter $t$, the round counter $T$, etc.
Step 2: the SU perceives the environmental state of time slot $t$, i.e., obtains the environmental state information $s^t = \left[o_1^t, o_2^t, \dots, o_N^t\right]^{\mathrm{T}}$ by interacting with the sensors in the environment, where $o_n^t$ denotes the power received by the $n$-th sensor in time slot $t$.
Step 3: the environmental state is input into the first network model to obtain a power value; that is, the environmental state perceived by the SU is input into the first network model, whose output gives $p_2^t$, the power value of the SU access channel.
Step 4: judging whether the signal-to-interference-and-noise ratios of the PU and the SU are greater than or equal to the preset thresholds.

Specifically, according to the SU's power value $p_2^t$ in time slot $t$, the received signal-to-interference-and-noise ratios $\gamma_1^t$ and $\gamma_2^t$ of the SU and the PU are calculated. If the receivers of the PU and the SU both satisfy $\gamma_i^t \geq \gamma_i^{\mathrm{th}}$, $i \in \{1, 2\}$, the SU is said to successfully access the channel in this time slot, and step 6 is executed; otherwise the SU fails to access the channel in time slot $t$, and step 5 is executed.
Step 5: the reward value is zero.

The reward value obtained by the SU for time slot $t$ is $r^t = 0$, indicating that at least one of the SU and the PU does not satisfy normal communication; step 7 is then executed.
Step 6: the reward value is the signal-to-interference-and-noise ratio obtained by the SU in time slot $t+1$.

The reward value obtained by the SU for time slot $t$ is $r^t = \gamma_2^{t+1}$, indicating that the SU and the PU simultaneously satisfy normal communication, where $\gamma_2^{t+1}$ denotes the signal-to-interference-and-noise ratio (throughput) obtained by the SU in time slot $t+1$; step 7 is then executed.
Step 7: updating the environmental state, the updated state being $s^{t+1} = \left[o_1^{t+1}, o_2^{t+1}, \dots, o_N^{t+1}\right]^{\mathrm{T}}$, and at the same time setting $t \leftarrow t + 1$; here $s^{t+1}$ denotes the environmental state of time slot $t+1$ and $o_n^{t+1}$ denotes the received power of the $n$-th sensor in time slot $t+1$.
Step 8: judging whether the first condition and/or the second condition is met; if the first condition is met, the training time interval has been reached, and step 9 is executed; if the second condition is met, the current round has reached its maximum, and step 9 is executed; if neither condition is met, the process returns to step 3.
Step 9: training the models and updating their parameters.
The first network model and the second network model are trained and their parameters are updated, after which step 10 is executed.
Step 10: judging whether the second condition is met; if so, the current round ends and step 11 is executed next; if not, the current round is not finished and the process returns to step 3.
Step 11: $T \leftarrow T + 1$, i.e., the SU increments the number of environment interaction rounds by one, and step 12 is then executed.
Step 12: judging whether $T$ has reached the global maximum, i.e., whether $T \geq T_{\max}$ holds; if so, the maximum number of rounds has been reached and the power control strategy adjustment ends; if not, the maximum number of rounds has not been reached and the process returns to step 3.
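For step 9, the description names the Actor-Critic family but does not spell out the loss functions here; the sketch below uses a standard one-step advantage actor-critic update as one plausible instantiation, with the discount factor and loss forms being assumptions of this sketch.

```python
import torch

def actor_critic_update(actor_opt, critic_opt, critic,
                        states, logps, rewards, next_states,
                        discount: float = 0.95) -> None:
    """One parameter update over a batch of collected transitions.
    `logps` are the action log-probabilities saved during collection
    (they carry the actor's computation graph)."""
    values = critic(states).squeeze(-1)
    with torch.no_grad():
        targets = rewards + discount * critic(next_states).squeeze(-1)
    advantage = targets - values
    critic_loss = advantage.pow(2).mean()              # fit V(s) to the TD target
    actor_loss = -(logps * advantage.detach()).mean()  # policy gradient with baseline
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```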
As shown in fig. 4 and fig. 5, curve M represents the success rate and throughput obtained by the method of the embodiment of the present invention, and curve N represents the success rate and throughput obtained by a deep Q-network algorithm. As can be seen from fig. 4 and fig. 5, the method of the embodiment of the present invention improves not only the success rate of the SU but also the throughput of the SU.
As shown in fig. 6 and fig. 7, curve M1 represents the success rate and throughput obtained by the method of the embodiment of the present invention in a continuous power space, and curve N represents the success rate and throughput obtained in a discrete power space. As can be seen from fig. 6 and fig. 7, the method of the embodiment of the present invention can be used not only in a discrete power space but also in a continuous power space, improving the throughput of the SU while guaranteeing the successful access rate of the SU.
In summary, in the method of the embodiment of the present invention, the SU perceives the environmental state; the SU then outputs, through the first network model and according to the currently perceived environmental state, a power value, namely the power value for accessing the channel in the next time slot; then, according to whether the power value of the SU access channel simultaneously satisfies the QoS of the SU and the PU, feedback is given and the environment enters the next state. These steps are executed repeatedly, i.e., the SU continuously interacts with the environment and finally learns a suitable power control scheme.
The power control device provided by the invention is described below, and the power control device described below and the power control method described above can be referred to correspondingly.
Fig. 8 is a schematic structural diagram of a power control device provided by the present invention. As shown in fig. 8, the power control apparatus provided in this embodiment includes:
an acquisition module 110, configured to acquire environmental status information;
a processing module 120, configured to obtain the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
Optionally, the processing module 120 is further configured to perform the following steps:
step a1, acquiring the environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-and-noise ratios of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received signal-to-interference-and-noise ratios of the primary user and the secondary user;
increasing t by a preset step size, and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
Optionally, the processing module 120 is specifically configured to:
determining the reward value of time slot t as the received signal-to-interference-and-noise ratio of the secondary user in time slot t+1 when the received signal-to-interference-and-noise ratio of the primary user is greater than or equal to a first threshold and the received signal-to-interference-and-noise ratio of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received signal-to-interference-and-noise ratio of the primary user is smaller than the first threshold and/or the received signal-to-interference-and-noise ratio of the secondary user is smaller than the second threshold.
Optionally, the processing module 120 is specifically configured to:

acquire the training data of the secondary user when a first condition and/or a second condition is met;

the first condition includes $t \bmod T_1 = 0$, where $t$ denotes the time slot and $T_1$ denotes the first slot interval for updating parameters; the second condition includes $t \bmod T_2 = 0$, where $t$ denotes the time slot and $T_2$ denotes the second slot interval for updating parameters.
Optionally, the environmental state information includes the received power acquired from sensors in the network environment.
Optionally, the received signal-to-interference-and-noise ratios of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^t = \frac{p_i^t h_{i,i}}{p_j^t h_{j,i} + \sigma_i^2}, \quad i, j \in \{1, 2\},\ j \neq i$$

where, when $i = 1$, $\gamma_1^t$ represents the received signal-to-interference-and-noise ratio of the primary user, and when $i = 2$, $\gamma_2^t$ represents the received signal-to-interference-and-noise ratio of the secondary user; $p_i^t$ represents the transmit power of user $i$ in time slot $t$; $h_{j,i}$ represents the channel gain from transmitter $j$ to receiver $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
Optionally, the first network model and the second network model each include two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
The device of the embodiment of the present invention is configured to perform the method of any of the foregoing method embodiments, and its implementation principle and technical effects are similar, and are not described in detail herein.
Fig. 9 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 9, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a power control method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
Further, the logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of performing the power control method provided by the methods described above, the method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the power control method provided by the above methods, the method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of power control, comprising:
acquiring environmental state information;
obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
2. The power control method according to claim 1, wherein before obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information, the method further comprises:
step a1, acquiring the environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-and-noise ratios of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received signal-to-interference-and-noise ratios of the primary user and the secondary user;
increasing t by a preset step size, and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
3. The power control method according to claim 2, wherein the obtaining the reward value of time slot t according to the received signal-to-interference-and-noise ratios of the primary user and the secondary user comprises:
determining the reward value of time slot t as the received signal-to-interference-and-noise ratio of the secondary user in time slot t+1 when the received signal-to-interference-and-noise ratio of the primary user is greater than or equal to a first threshold and the received signal-to-interference-and-noise ratio of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received signal-to-interference-and-noise ratio of the primary user is smaller than the first threshold and/or the received signal-to-interference-and-noise ratio of the secondary user is smaller than the second threshold.
4. The power control method of claim 2, wherein the obtaining training data of the secondary user comprises:
acquiring the training data of the secondary user when a first condition and/or a second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ denotes the time slot and $T_1$ denotes the first slot interval for updating parameters;
the second condition includes $t \bmod T_2 = 0$, where $t$ denotes the time slot and $T_2$ denotes the second slot interval for updating parameters.
5. The method of any of claims 1-4, wherein the environmental state information includes the received power acquired from sensors in a network environment.
6. The power control method according to any one of claims 2-4, wherein the received signal-to-interference-and-noise ratios of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^t = \frac{p_i^t h_{i,i}}{p_j^t h_{j,i} + \sigma_i^2}, \quad i, j \in \{1, 2\},\ j \neq i$$

wherein, when $i = 1$, $\gamma_1^t$ represents the received signal-to-interference-and-noise ratio of the primary user, and when $i = 2$, $\gamma_2^t$ represents the received signal-to-interference-and-noise ratio of the secondary user; $p_i^t$ represents the transmit power of user $i$ in time slot $t$; $h_{j,i}$ represents the channel gain from transmitter $j$ to receiver $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
7. The method of any of claims 1-4, wherein the first network model and the second network model each include two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
8. A power control apparatus, comprising:
the acquisition module is used for acquiring environmental state information;
the processing module is used for obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data include: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each of said time slots is determined based on the power value of that time slot.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the power control method of any one of claims 1 to 6 when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the power control method according to any one of claims 1 to 6.
CN202310581580.7A 2023-05-23 2023-05-23 Power control method, device and equipment Pending CN116321390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310581580.7A CN116321390A (en) 2023-05-23 2023-05-23 Power control method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310581580.7A CN116321390A (en) 2023-05-23 2023-05-23 Power control method, device and equipment

Publications (1)

Publication Number Publication Date
CN116321390A true CN116321390A (en) 2023-06-23

Family

ID=86830903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310581580.7A Pending CN116321390A (en) 2023-05-23 2023-05-23 Power control method, device and equipment

Country Status (1)

Country Link
CN (1) CN116321390A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112203345A (en) * 2020-09-29 2021-01-08 东南大学 D2D communication energy efficiency maximization power distribution method based on deep neural network
US20210135703A1 (en) * 2019-10-30 2021-05-06 CCDC Army Research Laboratory Method and system for optimizing transceiver spectrum sharing
CN113225794A (en) * 2021-04-29 2021-08-06 成都中科微信息技术研究院有限公司 Full-duplex cognitive communication power control method based on deep reinforcement learning
CN113438723A (en) * 2021-06-23 2021-09-24 广东工业大学 Competitive depth Q network power control method with high reward punishment
CN113747386A (en) * 2021-08-16 2021-12-03 四川九洲空管科技有限责任公司 Intelligent power control method in cognitive radio network spectrum sharing
CN113766620A (en) * 2021-07-15 2021-12-07 吉林化工学院 Power control method and device of cognitive radio network

Similar Documents

Publication Publication Date Title
Liang et al. Spectrum sharing in vehicular networks based on multi-agent reinforcement learning
Nasir et al. Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks
Davaslioglu et al. DeepWiFi: Cognitive WiFi with deep learning
CN109274456B (en) Incomplete information intelligent anti-interference method based on reinforcement learning
Safari et al. Deep UL2DL: Data-driven channel knowledge transfer from uplink to downlink
CN113423110B (en) Multi-user multi-channel dynamic spectrum access method based on deep reinforcement learning
Kim et al. Autonomous power allocation based on distributed deep learning for device-to-device communication underlaying cellular network
CN114285444A (en) Power optimization method for large-scale de-cellular MIMO system
Safari et al. Deep UL2DL: Channel knowledge transfer from uplink to downlink
Nikoloska et al. Modular meta-learning for power control via random edge graph neural networks
Yu et al. Deep Reinforcement Learning-Based NOMA-Aided Slotted ALOHA for LEO Satellite IoT Networks
US20220123966A1 (en) Data-driven probabilistic modeling of wireless channels using conditional variational auto-encoders
CN113038612B (en) Cognitive radio power control method based on deep learning
CN108566227B (en) Multi-user detection method
CN113194031A (en) User clustering method and system combining interference suppression in fog wireless access network
CN116321390A (en) Power control method, device and equipment
Huang et al. Fast spectrum sharing in vehicular networks: A meta reinforcement learning approach
Chen et al. Adaptive repetition scheme with machine learning for 3GPP NB-IoT
CN114501353B (en) Communication information sending and receiving method and communication equipment
Ali et al. Contextual bandit learning for machine type communications in the null space of multi-antenna systems
CN110492956B (en) Error compensation multi-user detection method and device for MUSA (multiple input multiple output) system
Zou et al. Joint user activity and data detection in grant-free NOMA using generative neural networks
CN108513328B (en) Robust sharing access method and device for partially overlapped channels of mobile communication equipment
Prabakaran et al. An improved deep learning framework for enhancing mimo-Noma system performance
TWI806707B (en) Communication method and communication device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230623