CN116321390A - Power control method, device and equipment - Google Patents
Power control method, device and equipment
- Publication number
- CN116321390A (application number CN202310581580.7A)
- Authority
- CN
- China
- Prior art keywords
- network model
- secondary user
- power
- value
- interference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/24—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
- H04W52/241—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/26—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
- H04W52/265—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the quality of service QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/30—TPC using constraints in the total amount of available transmission power
- H04W52/36—TPC using constraints in the total amount of available transmission power with a discrete range or set of values, e.g. step size, ramping or offsets
- H04W52/367—Power values between minimum and maximum limits, e.g. dynamic range
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention provides a power control method, device and equipment, applied to the technical field of communications. The method comprises the following steps: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information. The first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot. In this scheme, the success rate of SU channel access and the throughput of the SU can both be improved; moreover, only the SU is required to perceive the current environment state, and the power control strategy of the PU does not need to be known, which better matches actual scenarios and gives the method a wider application range.
Description
Technical Field
The present invention relates to the field of wireless communications technologies, and in particular, to a power control method, apparatus, and device.
Background
In recent years, with the rapid development of 5G and the Internet of Things, the number of wireless users has increased dramatically and people's demand for wireless communication keeps growing, so spectrum resources have become ever more important. The shortage of radio spectrum resources, together with the sharply growing demand for them, has become a major contradiction in the current development of radio communications. Although research on millimeter-wave communication is increasing, millimeter-wave links are easily blocked and have small coverage, and deploying millimeter-wave communication requires a large number of base stations, which is costly and difficult to popularize in the short term. Therefore, improving the spectrum utilization of the middle and low frequency bands is important for the development of 5G and the Internet of Things.
In order to improve the spectrum utilization of the middle and low frequency bands, Dynamic Spectrum Access (DSA) technology has been proposed to alleviate the current spectrum situation. DSA technology enables a Secondary User (SU) in a Cognitive Radio Network (CRN) to use the licensed frequency band of a Primary User (PU) without interfering with the PU. DSA mainly comprises two spectrum access modes, Overlay and Underlay: in the Overlay spectrum access mode, the SU opportunistically accesses a frequency band currently unoccupied by the PU, while in the Underlay spectrum access mode, the SU and the PU use the same frequency band at the same time. Since the Underlay access mode requires the SU and the PU to share the same frequency band simultaneously, the success rate of the SU accessing the channel is low.
Disclosure of Invention
The invention provides a power control method, device and equipment to overcome the defect of the prior art that the success rate of SU channel access is low, and to realize a power control method capable of improving the success rate and throughput of SU channel access.
The invention provides a power control method, which comprises the following steps:
acquiring environmental state information;
obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
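For illustration only, the following is a minimal sketch of this inference step, assuming a discrete power space, a state vector sensed from five sensors, and stand-in weights for the trained first network model (all names and values here are assumptions, not part of the original disclosure):

```python
import numpy as np

# Hypothetical trained first network model: maps the sensed state to a
# probability distribution over a discrete set of candidate transmit powers.
def first_network_model(state, W, b):
    logits = state @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax over candidate powers

power_levels = np.linspace(0.1, 1.0, 8)     # 8 candidate power values (assumed)
state = np.random.rand(5)                   # received powers from 5 sensors
W, b = np.random.randn(5, 8), np.zeros(8)   # stands in for trained parameters
probs = first_network_model(state, W, b)
p_su = power_levels[int(np.argmax(probs))]  # power value of the SU access channel
```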
According to the power control method provided by the present invention, before obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information, the method further includes:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-plus-noise ratios (SINRs) of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
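As a sketch of steps a1-a4, the collection loop below records one (state, power, reward) triple per time slot until an update interval is reached; `sense`, `actor_power`, `sinr_of` and `reward_of` are assumed callables supplied by the environment and the models, and `T1` is the first slot interval:

```python
def collect_training_data(sense, actor_power, sinr_of, reward_of, T1):
    """Steps a1-a4: roll the environment forward slot by slot, recording
    (state, power, reward) triples for later training of the models."""
    data, t = [], 0
    while True:
        s_t = sense(t)                        # a1: environmental state of slot t
        p_t = actor_power(s_t)                # a2: power from first network model
        sinr_pu, sinr_su = sinr_of(p_t, t)    # a3: received SINRs of PU and SU
        r_t = reward_of(sinr_pu, sinr_su, t)  # a4: reward value of slot t
        data.append((s_t, p_t, r_t))
        t += 1                                # increase t by the preset step (1)
        if t % T1 == 0:                       # update interval reached
            return data
```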
According to the power control method provided by the invention, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user includes:
determining the reward value of time slot t as the received SINR of the secondary user in time slot t+1 when the received SINR of the primary user is greater than or equal to a first threshold and the received SINR of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received SINR of the primary user is less than the first threshold and/or the received SINR of the secondary user is less than the second threshold.
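A direct sketch of this reward rule, with the two thresholds passed in as parameters (variable names are illustrative):

```python
def reward_value(gamma_pu_t, gamma_su_t, gamma_su_next, threshold1, threshold2):
    """Slot-t reward: the SU's received SINR at slot t+1 if both QoS
    thresholds are met in slot t, and zero otherwise."""
    if gamma_pu_t >= threshold1 and gamma_su_t >= threshold2:
        return gamma_su_next
    return 0.0
```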
According to the power control method provided by the invention, obtaining the training data of the secondary user includes:
acquiring the training data of the secondary user when the first condition and/or the second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents a first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents a second slot interval for updating parameters.
According to the power control method provided by the invention, the environmental state information includes received power values acquired from sensors in the network environment.
According to the power control method provided by the invention, the received SINRs of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ (the case $i=1$) represents the received SINR of the primary user and $\gamma_2^{(t)}$ (the case $i=2$) represents the received SINR of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
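A numeric sketch of this SINR formula for the two-user case (the powers, gains and noise values below are arbitrary assumptions for illustration):

```python
import numpy as np

def received_sinr(p, h, sigma2):
    """gamma_i = p_i*h[i,i] / (sigma2_i + sum_{j!=i} p_j*h[j,i]);
    index 0 is the primary user, index 1 the secondary user."""
    n = len(p)
    gamma = np.empty(n)
    for i in range(n):
        interference = sum(p[j] * h[j, i] for j in range(n) if j != i)
        gamma[i] = p[i] * h[i, i] / (sigma2[i] + interference)
    return gamma

p = np.array([1.0, 0.5])           # PU and SU transmit powers (assumed)
h = np.array([[0.8, 0.1],          # h[j, i]: gain, transmitter j -> receiver i
              [0.2, 0.9]])
sigma2 = np.array([1e-3, 1e-3])    # receiver noise powers (assumed)
gamma_pu, gamma_su = received_sinr(p, h, sigma2)
```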
According to the power control method provided by the invention, the first network model and the second network model each include two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
The invention also provides a power control device, comprising:
the acquisition module is used for acquiring environmental state information;
the processing module is used for obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a power control method as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a power control method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a power control method as described in any of the above.
According to the power control method, device and equipment provided by the invention, the power value of the secondary user access channel is obtained by using the trained first network model and the environmental state information obtained through interaction with the environment. The first network model is obtained by training based on the second network model and training data, where the training data includes the environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot. A more suitable power value can therefore be obtained from the first network model based on the current environmental state information, so that the SU accesses the channel with a higher success rate and achieves higher throughput.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings required in the description of the embodiments or of the prior art are briefly introduced below. The drawings in the following description are obviously some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a first schematic flow chart of the power control method provided by the present invention;
FIG. 2 is a system architecture diagram of the power control method provided by the present invention;
FIG. 3 is a second schematic flow chart of the power control method provided by the present invention;
FIG. 4 is a first schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 5 is a second schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 6 is a third schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 7 is a fourth schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 8 is a schematic diagram of a power control device according to the present invention;
fig. 9 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, an application scenario according to an embodiment of the present invention is described:
the secondary user SU in the cognitive radio network CRN uses the licensed frequency band of the PU without interfering with the primary user PU. Particularly, a power control scheme is realized aiming at an Underlay spectrum access mode, namely that the same frequency band is used at the same time for SU and PU, so that the success rate and throughput of SU access channels are improved.
In the embodiment of the present invention, the first network model and the second network model may be established based on a deep reinforcement learning algorithm, for example the Actor-Critic algorithm.
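The patent names the Actor-Critic algorithm without fixing its exact update rule; a standard one-step temporal-difference variant with a linear critic and a softmax actor would look as follows (purely an illustrative assumption, not the disclosed parameterization):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def actor_critic_step(theta, w, s, a, r, s_next,
                      alpha=1e-2, beta=1e-2, gamma=0.9):
    """One-step TD Actor-Critic: the critic V(s) = w.s evaluates the
    actor's action via the TD error, which scales both updates."""
    delta = r + gamma * (w @ s_next) - (w @ s)  # TD error (critic's evaluation)
    w = w + beta * delta * s                    # critic (second network) update
    probs = softmax(theta @ s)                  # actor policy over actions
    grad_log = -np.outer(probs, s)              # grad of log pi(a|s) wrt theta
    grad_log[a] += s
    theta = theta + alpha * delta * grad_log    # actor (first network) update
    return theta, w
```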
The following describes the technical solution of the embodiment of the present invention in detail with reference to fig. 1 to 9. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a schematic flow chart of a power control method provided by the invention. As shown in fig. 1, the method provided in this embodiment includes:
102, obtaining a power value of the secondary user access channel by using the trained first network model and environment state information; the first network model is obtained based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, power values corresponding to the time slots and rewarding values of the time slots; the prize value for each slot is determined based on the power value for each slot.
In particular, the secondary user SU obtains the environmental state information by sensing the network environment; optionally, the environmental state information includes received power values acquired from sensors in the network environment.
Alternatively, the first network model and the second network model may be built based on a neural network model.
The power value output by the first network model is typically the power value with the highest probability. The goal of the first network model is to maximize the obtainable reward value; optionally, the second network model is used to evaluate the output result of the first network model, making the output result of the first network model more accurate.
The SU continuously acquires environmental state information through interaction with the network environment, inputs it into the first network model, and obtains the corresponding reward based on the power value output by the first network model; in this way, the first network model and the second network model are trained continuously. The power value obtained from the trained first network model greatly improves the success rate and throughput of channel access and can satisfy the quality of service (QoS) of the SU.
In the method of this embodiment, the power value of the secondary user access channel is obtained by using the trained first network model and the environmental state information obtained through interaction with the environment; the first network model is obtained by training based on the second network model and training data, where the training data includes the environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot. A more suitable power value can therefore be obtained from the first network model based on the current environmental state information, so that the SU accesses the channel with a higher success rate and achieves higher throughput.
Optionally, training the model may be implemented by the following steps:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received SINRs of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
Specifically, steps a1 to a4 constitute the process of acquiring the training data, which is as follows:
setting initialization parameters in a cognitive radio systemPU transmit power->And SU transmit power->Further, a global maximum round number +.>First slot interval of update parameter +.>Second time slot interval, +.>,/>Maximum slot spacing for a single round; wherein-> and />Respectively representing the current number of slots and the current number of rounds.
The SU obtains the environmental state information by interacting with sensors in the environment, represented for example by $o_k^{(t)}$, the power received by the $k$-th sensor in time slot $t$:

$$o_k^{(t)} = p_1^{(t)} l_{1,k} + p_2^{(t)} l_{2,k} + n_k^{(t)}$$

where $p_1^{(t)}$ and $p_2^{(t)}$ respectively represent the power of the PU transmitter and the SU transmitter in time slot $t$; $l_{1,k}$ and $l_{2,k}$ represent the path loss from the PU transmitter and the SU transmitter to sensor $k$, computed as $l_{1,k} = d_{1,k}^{-\alpha}$ and $l_{2,k} = d_{2,k}^{-\alpha}$, with $d_{1,k}$ and $d_{2,k}$ representing the distances from the PU transmitter and the SU transmitter to sensor $k$; and $n_k^{(t)}$ represents the noise at the $k$-th sensor in time slot $t$, e.g., a zero-mean Gaussian random variable with variance $\sigma^2$.
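A sketch of generating one such observation vector under these assumptions (the distances, powers and path-loss exponent $\alpha$ below are illustrative values, not disclosed ones):

```python
import numpy as np

def sense(p_pu, p_su, d_pu, d_su, alpha=2.0, sigma=0.01,
          rng=np.random.default_rng()):
    """o_k = p1 * d_{1,k}^-alpha + p2 * d_{2,k}^-alpha + n_k, with
    n_k zero-mean Gaussian noise, one entry per sensor k."""
    loss_pu = d_pu ** (-alpha)        # path loss, PU transmitter -> sensor k
    loss_su = d_su ** (-alpha)        # path loss, SU transmitter -> sensor k
    noise = rng.normal(0.0, sigma, size=d_pu.shape)
    return p_pu * loss_pu + p_su * loss_su + noise

d_pu = np.array([10.0, 20.0, 15.0])   # distances to three sensors (assumed)
d_su = np.array([12.0, 8.0, 25.0])
state = sense(1.0, 0.5, d_pu, d_su)   # environmental state s^(t)
```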
The environmental state information perceived by the SU is input into the first network model to obtain the power value of the secondary user in time slot $t$. For example, the output of the first network model is the probability of each action (i.e., each candidate power value) being selected, and one of the power values is selected according to the magnitude of the probability values as the power value $p_2^{(t)}$ of the SU in time slot $t$, i.e., the power value of the SU access channel; optionally, the first network model may also directly output the selected power value. Alternatively, for example in a continuous power space, the first network model may determine the power value of the SU access channel based on the mean and variance of a probability distribution.
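Both output conventions mentioned here can be sketched side by side; the probability vector, power levels, mean and variance below are placeholder values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete power space: sample a power level from the actor's probabilities
# (taking the argmax would instead always pick the most probable level).
probs = np.array([0.05, 0.10, 0.40, 0.20, 0.10, 0.05, 0.05, 0.05])
levels = np.linspace(0.1, 1.0, 8)
p_discrete = rng.choice(levels, p=probs)

# Continuous power space: the actor outputs the mean and variance of a
# Gaussian, and the power is drawn from it, clipped to the feasible range.
mu, var = 0.6, 0.02
p_continuous = float(np.clip(rng.normal(mu, np.sqrt(var)), 0.0, 1.0))
```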
According to the power value $p_2^{(t)}$ of the SU in time slot $t$, the received SINRs $\gamma_1^{(t)}$ and $\gamma_2^{(t)}$ of the PU and the SU are calculated, and the reward value of time slot $t$ is then determined according to these received SINRs. For example, if the received SINRs of the SU and the PU are large or satisfy a certain condition, the reward value is large; otherwise the reward value is small or zero.
The environmental state information is then updated to $s^{(t+1)} = [o_1^{(t+1)}, o_2^{(t+1)}, \ldots, o_K^{(t+1)}]^{\top}$, and $t \leftarrow t + 1$ is updated at the same time, i.e., $t$ is increased by the preset step length, which is 1. Here $s^{(t+1)}$ represents the environmental state of time slot $t+1$, $o_k^{(t+1)}$ represents the received power of the $k$-th sensor in time slot $t+1$, bold symbols denote vectors or matrices, and $(\cdot)^{\top}$ represents the transpose of a vector or matrix. Steps a1-a4 are repeated until the loop-end condition is met, and the training data are obtained.
Optionally, the received SINRs of the primary user and the secondary user shown in fig. 2 are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ ($i=1$) represents the received SINR of the primary user and $\gamma_2^{(t)}$ ($i=2$) represents the received SINR of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
In the above embodiment, according to whether the power value of the SU access channel simultaneously satisfies the QoS (e.g., the received SINR requirements) of the SU and the PU, the corresponding reward value is fed back to the SU. The first network model obtained by training can therefore improve the channel access success rate and throughput of the SU, and the power control policy of the PU does not need to be known, so the application range is wider.
Optionally, the training data of the secondary user are obtained, i.e., the loop ends, when the first condition and/or the second condition is met.
The first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents the first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents the second slot interval for updating parameters.
Specifically, it is judged whether $t \bmod T_1 = 0$ and whether $t \geq T_2$ hold. If $t \bmod T_1 = 0$, the training time interval, namely the first slot interval, has been reached, and the first network model and the second network model are trained with the acquired training data; if $t \geq T_2$, the current round has reached the maximum time interval, namely the second slot interval, and the first network model and the second network model are likewise trained with the acquired training data; if neither condition holds, execution returns to step a1 and the training data continue to be collected.
In this embodiment, setting the loop-end conditions can improve the training efficiency of the model, and the implementation is simple.
Alternatively, step a4 may be implemented as follows:
determining the reward value of time slot t as the received SINR of the secondary user in time slot t+1 when the received SINR of the primary user is greater than or equal to a first threshold and the received SINR of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received SINR of the primary user is less than the first threshold and/or the received SINR of the secondary user is less than the second threshold.
Alternatively, the first threshold and the second threshold may be the same or different.
For example, in time slot $t$, if the receivers of the PU and the SU both satisfy $\gamma_i^{(t)} \geq \gamma_i^{\mathrm{th}}$, $i \in \{1, 2\}$, where $i=1$ denotes the PU and $i=2$ denotes the SU, then the SU is said to successfully access the channel in time slot $t$, and the reward value obtained by the SU in time slot $t$ is $r^{(t)} = \gamma_2^{(t+1)}$, indicating that the SU and the PU satisfy normal communication at the same time;
otherwise, the SU fails to access the channel in time slot $t$, and the reward value obtained by the SU in time slot $t$ is $r^{(t)} = 0$, indicating that at least one of the SU and the PU does not satisfy normal communication.
In the above embodiment, according to whether the power value of the SU access channel simultaneously satisfies the QoS (e.g., the received SINR requirements) of the SU and the PU, the corresponding reward value is fed back to the SU, so that the first network model obtained by training can improve the channel access success rate and throughput of the SU without knowing the power control policy of the PU, giving the method a wider application range.
Optionally, the first network model and the second network model may be implemented by an actor network and a critic network respectively. The first network model has two fully connected layers: the input of the 1st fully connected layer is the model input, and its output passes through a ReLU activation function before entering the 2nd fully connected layer (for example, the first layer has 200 neurons); the output of the 2nd fully connected layer passes through a softmax activation function and forms the output of the first network model (for example, the second layer has 8 neurons). The second network model likewise has two fully connected layers: the input of the 1st fully connected layer is the model input, and its output passes through a ReLU activation function before entering the 2nd fully connected layer (for example, the first layer has 100 neurons); the output of the 2nd fully connected layer is the output of the second network model (for example, the second layer has 1 neuron).
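With the layer sizes of this example (200-8 for the actor, 100-1 for the critic), the two networks could be written as below. The input dimension of 5 sensors and the use of PyTorch are assumptions, and a plain linear value head is used for the critic, since a softmax over a single neuron would be constant:

```python
import torch.nn as nn

NUM_SENSORS, NUM_POWER_LEVELS = 5, 8   # assumed input/output sizes

# First network model (actor): state -> probabilities over power values.
actor = nn.Sequential(
    nn.Linear(NUM_SENSORS, 200), nn.ReLU(),
    nn.Linear(200, NUM_POWER_LEVELS), nn.Softmax(dim=-1),
)

# Second network model (critic): state -> scalar evaluation of the actor.
critic = nn.Sequential(
    nn.Linear(NUM_SENSORS, 100), nn.ReLU(),
    nn.Linear(100, 1),
)
```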
Illustratively, as shown in FIG. 3, the method includes the steps of:
Step 1: initialize the parameters of the cognitive radio system, as described above.
Step 2: the SU perceives the environmental state of time slot $t$, i.e., obtains the environmental state information $s^{(t)} = [o_1^{(t)}, \ldots, o_K^{(t)}]^{\top}$ by interacting with the sensors in the environment, where $o_k^{(t)}$ represents the power received by the $k$-th sensor in time slot $t$.
Step 3: input the environmental state into the first network model to obtain a power value; that is, the environmental state perceived by the SU is input into the first network model, and the output of the first network model yields $p_2^{(t)}$, i.e., the power value of the SU access channel.
Step 4: judge whether the SINRs of the PU and the SU are greater than or equal to preset thresholds.
Specifically, according to the power value $p_2^{(t)}$ of the SU in time slot $t$, the received SINRs $\gamma_1^{(t)}$ and $\gamma_2^{(t)}$ of the SU and the PU are calculated. If the receivers of the PU and the SU both satisfy $\gamma_i^{(t)} \geq \gamma_i^{\mathrm{th}}$, $i \in \{1, 2\}$, the SU is said to successfully access the channel in time slot $t$, and step 6 is executed; otherwise the SU fails to access the channel in time slot $t$, and step 5 is executed.
Step 5: the reward value is zero;
the reward value obtained by the SU in time slot $t$ is $r^{(t)} = 0$, indicating that at least one of the SU and the PU does not satisfy normal communication; step 7 is then executed.
Step 6: the reward value obtained by the SU in time slot $t$ is $r^{(t)} = \gamma_2^{(t+1)}$, indicating that the SU and the PU satisfy normal communication at the same time, where $\gamma_2^{(t+1)}$ represents the SINR (throughput) obtained by the SU in time slot $t+1$; step 7 is then executed.
Step 7: update the environmental state to obtain the updated state $s^{(t+1)} = [o_1^{(t+1)}, \ldots, o_K^{(t+1)}]^{\top}$, and at the same time $t \leftarrow t+1$, where $s^{(t+1)}$ represents the environmental state of time slot $t+1$ and $o_k^{(t+1)}$ represents the received power of the $k$-th sensor in time slot $t+1$.
Step 8: judge whether the first condition and/or the second condition is met. If the first condition is met, the training time interval has been reached, and step 9 is executed; if the second condition is met, the current round has reached its maximum, and step 9 is executed; if neither condition is met, return to step 3.
Step 9: train the models and update the model parameters;
the first network model and the second network model are trained and their parameters are updated, and then step 10 is executed.
Step 10: judge whether the second condition is met; if yes, the current round ends, and step 11 is executed next; if not, the current round has not ended, and step 3 is executed.
Step 11: $e \leftarrow e + 1$, i.e., the SU increases the number of environment-interaction rounds by one, and then step 12 is executed.
Step 12: determine whether the round number has reached the global maximum, i.e., whether $e \geq E_{\max}$ holds; if so, the maximum number of rounds has been reached, and the power control strategy adjustment ends; if the condition is not satisfied, the maximum number of rounds has not been reached, and execution returns to step 3.
As shown in fig. 4 and fig. 5, curve M represents the success rate and throughput obtained by the method of the embodiment of the present invention, and curve N represents the success rate and throughput obtained by a deep Q-network algorithm. From fig. 4 and fig. 5, the method of the embodiment of the present invention can improve not only the success rate of the SU but also the throughput of the SU.
As shown in fig. 6 and fig. 7, curve M1 represents the success rate and throughput obtained by the method of the embodiment of the present invention in a continuous power space, and curve N represents the success rate and throughput obtained by the method in a discrete power space. From fig. 6 and fig. 7, the method of the embodiment of the present invention can be used not only in a discrete power space but also in a continuous power space, and can improve the throughput of the SU while guaranteeing the SU's successful access rate.
In summary, in the method of the embodiment of the present invention, the SU perceives the environmental state; the SU then outputs, through the first network model and according to the currently perceived environmental state, a power value, namely the power value for accessing the channel in the next time slot; then, according to whether the power value of the SU access channel simultaneously satisfies the QoS of the SU and the PU, feedback is given and the environment enters the next state. These steps are executed repeatedly, i.e., the SU continuously interacts with the environment and finally learns a suitable power control scheme.
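The whole procedure of steps 2-12 can be condensed into an outer training skeleton like the one below, where `env`, `actor_power` and `update_models` are assumed callables and $T_1$, $T_2$, $E_{\max}$ are the intervals and round limit defined earlier:

```python
def train(env, actor_power, update_models, T1, T2, E_max):
    """Skeleton of Fig. 3: per round, sense -> act -> reward, training the
    two network models every T1 slots, until E_max rounds have run."""
    for e in range(E_max):                 # steps 11-12: round counter vs max
        s, t, batch = env.reset(), 0, []
        while t < T2:                      # step 10: end of the current round
            p = actor_power(s)             # step 3: power from the actor
            s_next, r = env.step(p)        # steps 4-7: SINR check and reward
            batch.append((s, p, r))
            s, t = s_next, t + 1
            if t % T1 == 0 or t >= T2:     # step 8: first/second condition
                update_models(batch)       # step 9: train actor and critic
                batch = []
```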
The power control device provided by the invention is described below, and the power control device described below and the power control method described above can be referred to correspondingly.
Fig. 8 is a schematic structural diagram of a power control device provided by the present invention. As shown in fig. 8, the power control apparatus provided in this embodiment includes:
an acquisition module 110, configured to acquire environmental status information;
a processing module 120, configured to obtain the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
Optionally, the processing module 120 is further configured to perform the following steps:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received SINRs of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
Optionally, the processing module 120 is specifically configured to:
determining the reward value of time slot t as the received SINR of the secondary user in time slot t+1 when the received SINR of the primary user is greater than or equal to a first threshold and the received SINR of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received SINR of the primary user is less than the first threshold and/or the received SINR of the secondary user is less than the second threshold.
Optionally, the processing module 120 is specifically configured to:
acquiring the training data of the secondary user when the first condition and/or the second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents a first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents a second slot interval for updating parameters.
Optionally, the environmental state information includes received power values acquired from sensors in the network environment.
Optionally, the received SINRs of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ ($i=1$) represents the received SINR of the primary user and $\gamma_2^{(t)}$ ($i=2$) represents the received SINR of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
Optionally, the first network model and the second network model each include two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
The device of the embodiment of the present invention is configured to perform the method of any of the foregoing method embodiments, and its implementation principle and technical effects are similar, and are not described in detail herein.
Fig. 9 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 9, the electronic device may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, where the processor 810, the communication interface 820 and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a power control method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
Furthermore, the above logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention, in essence or as the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program storable on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can perform the power control method provided by the methods described above, the method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the power control method provided by the methods described above, the method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of power control, comprising:
acquiring environmental state information;
obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
2. The power control method according to claim 1, wherein before obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information, the method further comprises:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-plus-noise ratios of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received signal-to-interference-plus-noise ratios of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
3. The power control method according to claim 2, wherein obtaining the reward value of time slot t according to the received signal-to-interference-plus-noise ratios of the primary user and the secondary user comprises:
determining the reward value of time slot t as the received signal-to-interference-plus-noise ratio of the secondary user in time slot t+1 when the received signal-to-interference-plus-noise ratio of the primary user is greater than or equal to a first threshold and the received signal-to-interference-plus-noise ratio of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received signal-to-interference-plus-noise ratio of the primary user is less than the first threshold and/or the received signal-to-interference-plus-noise ratio of the secondary user is less than the second threshold.
4. The power control method according to claim 2, wherein obtaining the training data of the secondary user comprises:
acquiring the training data of the secondary user when the first condition and/or the second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents a first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents a second slot interval for updating parameters.
5. The power control method according to any one of claims 1-4, wherein the environmental state information comprises received power values acquired from sensors in a network environment.
6. The power control method according to any one of claims 2-4, wherein the received signal-to-interference-plus-noise ratios of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ ($i=1$) represents the received signal-to-interference-plus-noise ratio of the primary user and $\gamma_2^{(t)}$ ($i=2$) represents that of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
7. The power control method according to any one of claims 1-4, wherein the first network model and the second network model each comprise two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
8. A power control apparatus, comprising:
the acquisition module is used for acquiring environmental state information;
the processing module is used for obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the power control method of any one of claims 1 to 6.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the power control method according to any one of claims 1 to 6.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310581580.7A | 2023-05-23 | 2023-05-23 | Power control method, device and equipment |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310581580.7A | 2023-05-23 | 2023-05-23 | Power control method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116321390A true CN116321390A (en) | 2023-06-23 |
Family
ID=86830903
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310581580.7A Pending CN116321390A (en) | 2023-05-23 | 2023-05-23 | Power control method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116321390A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112203345A (en) * | 2020-09-29 | 2021-01-08 | 东南大学 | D2D communication energy efficiency maximization power distribution method based on deep neural network |
US20210135703A1 (en) * | 2019-10-30 | 2021-05-06 | CCDC Army Research Laboratory | Method and system for optimizing transceiver spectrum sharing |
CN113225794A (en) * | 2021-04-29 | 2021-08-06 | 成都中科微信息技术研究院有限公司 | Full-duplex cognitive communication power control method based on deep reinforcement learning |
CN113438723A (en) * | 2021-06-23 | 2021-09-24 | 广东工业大学 | Competitive depth Q network power control method with high reward punishment |
CN113747386A (en) * | 2021-08-16 | 2021-12-03 | 四川九洲空管科技有限责任公司 | Intelligent power control method in cognitive radio network spectrum sharing |
CN113766620A (en) * | 2021-07-15 | 2021-12-07 | 吉林化工学院 | Power control method and device of cognitive radio network |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20230623 |