CN116321390A - Power control method, device and equipment - Google Patents
Power control method, device and equipment
- Publication number
- CN116321390A (application number CN202310581580.7A)
- Authority
- CN
- China
- Prior art keywords
- network model
- secondary user
- power
- value
- interference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/24—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
- H04W52/241—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/26—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
- H04W52/265—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the quality of service QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/30—TPC using constraints in the total amount of available transmission power
- H04W52/36—TPC using constraints in the total amount of available transmission power with a discrete range or set of values, e.g. step size, ramping or offsets
- H04W52/367—Power values between minimum and maximum limits, e.g. dynamic range
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention provides a power control method, device and equipment, applied to the technical field of communications. The method comprises the following steps: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information. The first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot. In this scheme, the success rate of SU channel access and the throughput of the SU can both be improved; moreover, only the SU is required to perceive the current environment state, and the power control strategy of the PU does not need to be known, which better matches actual scenarios and gives the method a wider application range.
Description
Technical Field
The present invention relates to the field of wireless communications technologies, and in particular, to a power control method, apparatus, and device.
Background
In recent years, with the rapid development of 5G and the Internet of Things, the number of wireless users has increased dramatically and people's demand for wireless communication keeps growing, so spectrum resources have become ever more important. The shortage of radio spectrum resources, together with the sharply growing demand for them, has become a major contradiction in the current development of radio communications. Although research on millimeter-wave communication is increasing, millimeter-wave links are easily blocked and have small coverage, and deploying millimeter-wave communication requires a large number of base stations, which is costly and difficult to popularize in the short term. Therefore, improving the spectrum utilization of the middle and low frequency bands is important for the development of 5G and the Internet of Things.
In order to improve the spectrum utilization of the middle and low frequency bands, Dynamic Spectrum Access (DSA) technology has been proposed to alleviate the current spectrum situation. DSA technology enables a Secondary User (SU) in a Cognitive Radio Network (CRN) to use the licensed frequency band of a Primary User (PU) without interfering with the PU. DSA mainly comprises two spectrum access modes, Overlay and Underlay: in the Overlay spectrum access mode, the SU opportunistically accesses a frequency band currently unoccupied by the PU, while in the Underlay spectrum access mode, the SU and the PU use the same frequency band at the same time. Since the Underlay access mode requires the SU and the PU to share the same frequency band simultaneously, the success rate of the SU accessing the channel is low.
Disclosure of Invention
The invention provides a power control method, device and equipment to overcome the defect of the prior art that the success rate of SU channel access is low, and to realize a power control method capable of improving the success rate and throughput of SU channel access.
The invention provides a power control method, which comprises the following steps:
acquiring environmental state information;
obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
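For illustration only, the following is a minimal sketch of this inference step, assuming a discrete power space, a state vector sensed from five sensors, and stand-in weights for the trained first network model (all names and values here are assumptions, not part of the original disclosure):

```python
import numpy as np

# Hypothetical trained first network model: maps the sensed state to a
# probability distribution over a discrete set of candidate transmit powers.
def first_network_model(state, W, b):
    logits = state @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax over candidate powers

power_levels = np.linspace(0.1, 1.0, 8)     # 8 candidate power values (assumed)
state = np.random.rand(5)                   # received powers from 5 sensors
W, b = np.random.randn(5, 8), np.zeros(8)   # stands in for trained parameters
probs = first_network_model(state, W, b)
p_su = power_levels[int(np.argmax(probs))]  # power value of the SU access channel
```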
According to the power control method provided by the present invention, before obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information, the method further includes:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-plus-noise ratios (SINRs) of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
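As a sketch of steps a1-a4, the collection loop below records one (state, power, reward) triple per time slot until an update interval is reached; `sense`, `actor_power`, `sinr_of` and `reward_of` are assumed callables supplied by the environment and the models, and `T1` is the first slot interval:

```python
def collect_training_data(sense, actor_power, sinr_of, reward_of, T1):
    """Steps a1-a4: roll the environment forward slot by slot, recording
    (state, power, reward) triples for later training of the models."""
    data, t = [], 0
    while True:
        s_t = sense(t)                        # a1: environmental state of slot t
        p_t = actor_power(s_t)                # a2: power from first network model
        sinr_pu, sinr_su = sinr_of(p_t, t)    # a3: received SINRs of PU and SU
        r_t = reward_of(sinr_pu, sinr_su, t)  # a4: reward value of slot t
        data.append((s_t, p_t, r_t))
        t += 1                                # increase t by the preset step (1)
        if t % T1 == 0:                       # update interval reached
            return data
```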
According to the power control method provided by the invention, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user includes:
determining the reward value of time slot t as the received SINR of the secondary user in time slot t+1 when the received SINR of the primary user is greater than or equal to a first threshold and the received SINR of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received SINR of the primary user is less than the first threshold and/or the received SINR of the secondary user is less than the second threshold.
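A direct sketch of this reward rule, with the two thresholds passed in as parameters (variable names are illustrative):

```python
def reward_value(gamma_pu_t, gamma_su_t, gamma_su_next, threshold1, threshold2):
    """Slot-t reward: the SU's received SINR at slot t+1 if both QoS
    thresholds are met in slot t, and zero otherwise."""
    if gamma_pu_t >= threshold1 and gamma_su_t >= threshold2:
        return gamma_su_next
    return 0.0
```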
According to the power control method provided by the invention, obtaining the training data of the secondary user includes:
acquiring the training data of the secondary user when the first condition and/or the second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents a first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents a second slot interval for updating parameters.
According to the power control method provided by the invention, the environmental state information includes received power values acquired from sensors in the network environment.
According to the power control method provided by the invention, the received SINRs of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ (the case $i=1$) represents the received SINR of the primary user and $\gamma_2^{(t)}$ (the case $i=2$) represents the received SINR of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
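A numeric sketch of this SINR formula for the two-user case (the powers, gains and noise values below are arbitrary assumptions for illustration):

```python
import numpy as np

def received_sinr(p, h, sigma2):
    """gamma_i = p_i*h[i,i] / (sigma2_i + sum_{j!=i} p_j*h[j,i]);
    index 0 is the primary user, index 1 the secondary user."""
    n = len(p)
    gamma = np.empty(n)
    for i in range(n):
        interference = sum(p[j] * h[j, i] for j in range(n) if j != i)
        gamma[i] = p[i] * h[i, i] / (sigma2[i] + interference)
    return gamma

p = np.array([1.0, 0.5])           # PU and SU transmit powers (assumed)
h = np.array([[0.8, 0.1],          # h[j, i]: gain, transmitter j -> receiver i
              [0.2, 0.9]])
sigma2 = np.array([1e-3, 1e-3])    # receiver noise powers (assumed)
gamma_pu, gamma_su = received_sinr(p, h, sigma2)
```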
According to the power control method provided by the invention, the first network model and the second network model each include two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
The invention also provides a power control device, comprising:
the acquisition module is used for acquiring environmental state information;
the processing module is used for obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a power control method as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a power control method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a power control method as described in any of the above.
According to the power control method, device and equipment provided by the invention, the power value of the secondary user access channel is obtained by using the trained first network model and the environmental state information obtained through interaction with the environment. The first network model is obtained by training based on the second network model and training data, where the training data includes the environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot. A more suitable power value can therefore be obtained from the first network model based on the current environmental state information, so that the SU accesses the channel with a higher success rate and achieves higher throughput.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings required in the description of the embodiments or of the prior art are briefly introduced below. The drawings in the following description are obviously some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a first schematic flow chart of the power control method provided by the present invention;
FIG. 2 is a system architecture diagram of the power control method provided by the present invention;
FIG. 3 is a second schematic flow chart of the power control method provided by the present invention;
FIG. 4 is a first schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 5 is a second schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 6 is a third schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 7 is a fourth schematic diagram of simulation results of the power control method provided by the present invention;
FIG. 8 is a schematic diagram of a power control device according to the present invention;
fig. 9 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, an application scenario according to an embodiment of the present invention is described:
the secondary user SU in the cognitive radio network CRN uses the licensed frequency band of the PU without interfering with the primary user PU. Particularly, a power control scheme is realized aiming at an Underlay spectrum access mode, namely that the same frequency band is used at the same time for SU and PU, so that the success rate and throughput of SU access channels are improved.
In the embodiment of the present invention, the first network model and the second network model may be established based on a deep reinforcement learning algorithm, for example the Actor-Critic algorithm.
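The patent names the Actor-Critic algorithm without fixing its exact update rule; a standard one-step temporal-difference variant with a linear critic and a softmax actor would look as follows (purely an illustrative assumption, not the disclosed parameterization):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def actor_critic_step(theta, w, s, a, r, s_next,
                      alpha=1e-2, beta=1e-2, gamma=0.9):
    """One-step TD Actor-Critic: the critic V(s) = w.s evaluates the
    actor's action via the TD error, which scales both updates."""
    delta = r + gamma * (w @ s_next) - (w @ s)  # TD error (critic's evaluation)
    w = w + beta * delta * s                    # critic (second network) update
    probs = softmax(theta @ s)                  # actor policy over actions
    grad_log = -np.outer(probs, s)              # grad of log pi(a|s) wrt theta
    grad_log[a] += s
    theta = theta + alpha * delta * grad_log    # actor (first network) update
    return theta, w
```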
The following describes the technical solution of the embodiment of the present invention in detail with reference to fig. 1 to 9. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a schematic flow chart of a power control method provided by the invention. As shown in fig. 1, the method provided in this embodiment includes:
102, obtaining a power value of the secondary user access channel by using the trained first network model and environment state information; the first network model is obtained based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, power values corresponding to the time slots and rewarding values of the time slots; the prize value for each slot is determined based on the power value for each slot.
In particular, the secondary user SU obtains the environmental state information by sensing the network environment; optionally, the environmental state information includes received power values acquired from sensors in the network environment.
Alternatively, the first network model and the second network model may be built based on a neural network model.
The power value output by the first network model is typically the power value with the highest probability. The goal of the first network model is to maximize the obtainable reward value; optionally, the second network model is used to evaluate the output result of the first network model, making the output result of the first network model more accurate.
The SU continuously acquires environmental state information through interaction with the network environment, inputs it into the first network model, and obtains the corresponding reward based on the power value output by the first network model; in this way, the first network model and the second network model are trained continuously. The power value obtained from the trained first network model greatly improves the success rate and throughput of channel access and can satisfy the quality of service (QoS) of the SU.
In the method of this embodiment, the power value of the secondary user access channel is obtained by using the trained first network model and the environmental state information obtained through interaction with the environment; the first network model is obtained by training based on the second network model and training data, where the training data includes the environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot. A more suitable power value can therefore be obtained from the first network model based on the current environmental state information, so that the SU accesses the channel with a higher success rate and achieves higher throughput.
Optionally, training the model may be implemented by the following steps:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received SINRs of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
Specifically, steps a1 to a4 constitute the process of acquiring the training data, which is as follows:
setting initialization parameters in a cognitive radio systemPU transmit power->And SU transmit power->Further, a global maximum round number +.>First slot interval of update parameter +.>Second time slot interval, +.>,/>Maximum slot spacing for a single round; wherein-> and />Respectively representing the current number of slots and the current number of rounds.
The SU obtains the environmental state information by interacting with sensors in the environment, represented for example by $o_k^{(t)}$, the power received by the $k$-th sensor in time slot $t$:

$$o_k^{(t)} = p_1^{(t)} l_{1,k} + p_2^{(t)} l_{2,k} + n_k^{(t)}$$

where $p_1^{(t)}$ and $p_2^{(t)}$ respectively represent the power of the PU transmitter and the SU transmitter in time slot $t$; $l_{1,k}$ and $l_{2,k}$ represent the path loss from the PU transmitter and the SU transmitter to sensor $k$, computed as $l_{1,k} = d_{1,k}^{-\alpha}$ and $l_{2,k} = d_{2,k}^{-\alpha}$, with $d_{1,k}$ and $d_{2,k}$ representing the distances from the PU transmitter and the SU transmitter to sensor $k$; and $n_k^{(t)}$ represents the noise at the $k$-th sensor in time slot $t$, e.g., a zero-mean Gaussian random variable with variance $\sigma^2$.
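A sketch of generating one such observation vector under these assumptions (the distances, powers and path-loss exponent $\alpha$ below are illustrative values, not disclosed ones):

```python
import numpy as np

def sense(p_pu, p_su, d_pu, d_su, alpha=2.0, sigma=0.01,
          rng=np.random.default_rng()):
    """o_k = p1 * d_{1,k}^-alpha + p2 * d_{2,k}^-alpha + n_k, with
    n_k zero-mean Gaussian noise, one entry per sensor k."""
    loss_pu = d_pu ** (-alpha)        # path loss, PU transmitter -> sensor k
    loss_su = d_su ** (-alpha)        # path loss, SU transmitter -> sensor k
    noise = rng.normal(0.0, sigma, size=d_pu.shape)
    return p_pu * loss_pu + p_su * loss_su + noise

d_pu = np.array([10.0, 20.0, 15.0])   # distances to three sensors (assumed)
d_su = np.array([12.0, 8.0, 25.0])
state = sense(1.0, 0.5, d_pu, d_su)   # environmental state s^(t)
```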
The environmental state information perceived by the SU is input into the first network model to obtain the power value of the secondary user in time slot $t$. For example, the output of the first network model is the probability of each action (i.e., each candidate power value) being selected, and one of the power values is selected according to the magnitude of the probability values as the power value $p_2^{(t)}$ of the SU in time slot $t$, i.e., the power value of the SU access channel; optionally, the first network model may also directly output the selected power value. Alternatively, for example in a continuous power space, the first network model may determine the power value of the SU access channel based on the mean and variance of a probability distribution.
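Both output conventions mentioned here can be sketched side by side; the probability vector, power levels, mean and variance below are placeholder values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete power space: sample a power level from the actor's probabilities
# (taking the argmax would instead always pick the most probable level).
probs = np.array([0.05, 0.10, 0.40, 0.20, 0.10, 0.05, 0.05, 0.05])
levels = np.linspace(0.1, 1.0, 8)
p_discrete = rng.choice(levels, p=probs)

# Continuous power space: the actor outputs the mean and variance of a
# Gaussian, and the power is drawn from it, clipped to the feasible range.
mu, var = 0.6, 0.02
p_continuous = float(np.clip(rng.normal(mu, np.sqrt(var)), 0.0, 1.0))
```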
According to the power value $p_2^{(t)}$ of the SU in time slot $t$, the received SINRs $\gamma_1^{(t)}$ and $\gamma_2^{(t)}$ of the PU and the SU are calculated, and the reward value of time slot $t$ is then determined according to these received SINRs. For example, if the received SINRs of the SU and the PU are large or satisfy a certain condition, the reward value is large; otherwise the reward value is small or zero.
The environmental state information is then updated to $s^{(t+1)} = [o_1^{(t+1)}, o_2^{(t+1)}, \ldots, o_K^{(t+1)}]^{\top}$, and $t \leftarrow t + 1$ is updated at the same time, i.e., $t$ is increased by the preset step length, which is 1. Here $s^{(t+1)}$ represents the environmental state of time slot $t+1$, $o_k^{(t+1)}$ represents the received power of the $k$-th sensor in time slot $t+1$, bold symbols denote vectors or matrices, and $(\cdot)^{\top}$ represents the transpose of a vector or matrix. Steps a1-a4 are repeated until the loop-end condition is met, and the training data are obtained.
Optionally, the received SINRs of the primary user and the secondary user shown in fig. 2 are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ ($i=1$) represents the received SINR of the primary user and $\gamma_2^{(t)}$ ($i=2$) represents the received SINR of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
In the above embodiment, according to whether the power value of the SU access channel simultaneously satisfies the QoS (e.g., the received SINR requirements) of the SU and the PU, the corresponding reward value is fed back to the SU. The first network model obtained by training can therefore improve the channel access success rate and throughput of the SU, and the power control policy of the PU does not need to be known, so the application range is wider.
Optionally, the training data of the secondary user are obtained, i.e., the loop ends, when the first condition and/or the second condition is met.
The first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents the first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents the second slot interval for updating parameters.
Specifically, it is judged whether $t \bmod T_1 = 0$ and whether $t \geq T_2$ hold. If $t \bmod T_1 = 0$, the training time interval, namely the first slot interval, has been reached, and the first network model and the second network model are trained with the acquired training data; if $t \geq T_2$, the current round has reached the maximum time interval, namely the second slot interval, and the first network model and the second network model are likewise trained with the acquired training data; if neither condition holds, execution returns to step a1 and the training data continue to be collected.
In this embodiment, setting the loop-end conditions can improve the training efficiency of the model, and the implementation is simple.
Alternatively, step a4 may be implemented as follows:
determining the reward value of time slot t as the received SINR of the secondary user in time slot t+1 when the received SINR of the primary user is greater than or equal to a first threshold and the received SINR of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received SINR of the primary user is less than the first threshold and/or the received SINR of the secondary user is less than the second threshold.
Alternatively, the first threshold and the second threshold may be the same or different.
For example, in time slot $t$, if the receivers of the PU and the SU both satisfy $\gamma_i^{(t)} \geq \gamma_i^{\mathrm{th}}$, $i \in \{1, 2\}$, where $i=1$ denotes the PU and $i=2$ denotes the SU, then the SU is said to successfully access the channel in time slot $t$, and the reward value obtained by the SU in time slot $t$ is $r^{(t)} = \gamma_2^{(t+1)}$, indicating that the SU and the PU satisfy normal communication at the same time;
otherwise, the SU fails to access the channel in time slot $t$, and the reward value obtained by the SU in time slot $t$ is $r^{(t)} = 0$, indicating that at least one of the SU and the PU does not satisfy normal communication.
In the above embodiment, according to whether the power value of the SU access channel simultaneously satisfies the QoS (e.g., the received SINR requirements) of the SU and the PU, the corresponding reward value is fed back to the SU, so that the first network model obtained by training can improve the channel access success rate and throughput of the SU without knowing the power control policy of the PU, giving the method a wider application range.
Optionally, the first network model and the second network model may be implemented by an actor network and a critic network respectively. The first network model has two fully connected layers: the input of the 1st fully connected layer is the model input, and its output passes through a ReLU activation function before entering the 2nd fully connected layer (for example, the first layer has 200 neurons); the output of the 2nd fully connected layer passes through a softmax activation function and forms the output of the first network model (for example, the second layer has 8 neurons). The second network model likewise has two fully connected layers: the input of the 1st fully connected layer is the model input, and its output passes through a ReLU activation function before entering the 2nd fully connected layer (for example, the first layer has 100 neurons); the output of the 2nd fully connected layer is the output of the second network model (for example, the second layer has 1 neuron).
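With the layer sizes of this example (200-8 for the actor, 100-1 for the critic), the two networks could be written as below. The input dimension of 5 sensors and the use of PyTorch are assumptions, and a plain linear value head is used for the critic, since a softmax over a single neuron would be constant:

```python
import torch.nn as nn

NUM_SENSORS, NUM_POWER_LEVELS = 5, 8   # assumed input/output sizes

# First network model (actor): state -> probabilities over power values.
actor = nn.Sequential(
    nn.Linear(NUM_SENSORS, 200), nn.ReLU(),
    nn.Linear(200, NUM_POWER_LEVELS), nn.Softmax(dim=-1),
)

# Second network model (critic): state -> scalar evaluation of the actor.
critic = nn.Sequential(
    nn.Linear(NUM_SENSORS, 100), nn.ReLU(),
    nn.Linear(100, 1),
)
```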
Illustratively, as shown in FIG. 3, the method includes the steps of:
Step 1: initialize the parameters of the cognitive radio system, as described above.
Step 2: the SU perceives the environmental state of time slot $t$, i.e., obtains the environmental state information $s^{(t)} = [o_1^{(t)}, \ldots, o_K^{(t)}]^{\top}$ by interacting with the sensors in the environment, where $o_k^{(t)}$ represents the power received by the $k$-th sensor in time slot $t$.
Step 3: input the environmental state into the first network model to obtain a power value; that is, the environmental state perceived by the SU is input into the first network model, and the output of the first network model yields $p_2^{(t)}$, i.e., the power value of the SU access channel.
Step 4: judge whether the SINRs of the PU and the SU are greater than or equal to preset thresholds.
Specifically, according to the power value $p_2^{(t)}$ of the SU in time slot $t$, the received SINRs $\gamma_1^{(t)}$ and $\gamma_2^{(t)}$ of the SU and the PU are calculated. If the receivers of the PU and the SU both satisfy $\gamma_i^{(t)} \geq \gamma_i^{\mathrm{th}}$, $i \in \{1, 2\}$, the SU is said to successfully access the channel in time slot $t$, and step 6 is executed; otherwise the SU fails to access the channel in time slot $t$, and step 5 is executed.
Step 5: the reward value is zero;
the reward value obtained by the SU in time slot $t$ is $r^{(t)} = 0$, indicating that at least one of the SU and the PU does not satisfy normal communication; step 7 is then executed.
Step 6: the reward value obtained by the SU in time slot $t$ is $r^{(t)} = \gamma_2^{(t+1)}$, indicating that the SU and the PU satisfy normal communication at the same time, where $\gamma_2^{(t+1)}$ represents the SINR (throughput) obtained by the SU in time slot $t+1$; step 7 is then executed.
Step 7: update the environmental state to obtain the updated state $s^{(t+1)} = [o_1^{(t+1)}, \ldots, o_K^{(t+1)}]^{\top}$, and at the same time $t \leftarrow t+1$, where $s^{(t+1)}$ represents the environmental state of time slot $t+1$ and $o_k^{(t+1)}$ represents the received power of the $k$-th sensor in time slot $t+1$.
Step 8: judge whether the first condition and/or the second condition is met. If the first condition is met, the training time interval has been reached, and step 9 is executed; if the second condition is met, the current round has reached its maximum, and step 9 is executed; if neither condition is met, return to step 3.
Step 9: train the models and update the model parameters;
the first network model and the second network model are trained and their parameters are updated, and then step 10 is executed.
Step 10: judge whether the second condition is met; if yes, the current round ends, and step 11 is executed next; if not, the current round has not ended, and step 3 is executed.
Step 11: $e \leftarrow e + 1$, i.e., the SU increases the number of environment-interaction rounds by one, and then step 12 is executed.
Step 12: determine whether the round number has reached the global maximum, i.e., whether $e \geq E_{\max}$ holds; if so, the maximum number of rounds has been reached, and the power control strategy adjustment ends; if the condition is not satisfied, the maximum number of rounds has not been reached, and execution returns to step 3.
As shown in fig. 4 and fig. 5, curve M represents the success rate and throughput obtained by the method of the embodiment of the present invention, and curve N represents the success rate and throughput obtained by a deep Q-network algorithm. From fig. 4 and fig. 5, the method of the embodiment of the present invention can improve not only the success rate of the SU but also the throughput of the SU.
As shown in fig. 6 and fig. 7, curve M1 represents the success rate and throughput obtained by the method of the embodiment of the present invention in a continuous power space, and curve N represents the success rate and throughput obtained by the method in a discrete power space. From fig. 6 and fig. 7, the method of the embodiment of the present invention can be used not only in a discrete power space but also in a continuous power space, and can improve the throughput of the SU while guaranteeing the SU's successful access rate.
In summary, in the method of the embodiment of the present invention, the SU perceives the environmental state; the SU then outputs, through the first network model and according to the currently perceived environmental state, a power value, namely the power value for accessing the channel in the next time slot; then, according to whether the power value of the SU access channel simultaneously satisfies the QoS of the SU and the PU, feedback is given and the environment enters the next state. These steps are executed repeatedly, i.e., the SU continuously interacts with the environment and finally learns a suitable power control scheme.
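The whole procedure of steps 2-12 can be condensed into an outer training skeleton like the one below, where `env`, `actor_power` and `update_models` are assumed callables and $T_1$, $T_2$, $E_{\max}$ are the intervals and round limit defined earlier:

```python
def train(env, actor_power, update_models, T1, T2, E_max):
    """Skeleton of Fig. 3: per round, sense -> act -> reward, training the
    two network models every T1 slots, until E_max rounds have run."""
    for e in range(E_max):                 # steps 11-12: round counter vs max
        s, t, batch = env.reset(), 0, []
        while t < T2:                      # step 10: end of the current round
            p = actor_power(s)             # step 3: power from the actor
            s_next, r = env.step(p)        # steps 4-7: SINR check and reward
            batch.append((s, p, r))
            s, t = s_next, t + 1
            if t % T1 == 0 or t >= T2:     # step 8: first/second condition
                update_models(batch)       # step 9: train actor and critic
                batch = []
```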
The power control device provided by the invention is described below, and the power control device described below and the power control method described above can be referred to correspondingly.
Fig. 8 is a schematic structural diagram of a power control device provided by the present invention. As shown in fig. 8, the power control apparatus provided in this embodiment includes:
an acquisition module 110, configured to acquire environmental status information;
a processing module 120, configured to obtain the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
Optionally, the processing module 120 is further configured to perform the following steps:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received SINRs of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received SINRs of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
Optionally, the processing module 120 is specifically configured to:
determining the reward value of time slot t as the received SINR of the secondary user in time slot t+1 when the received SINR of the primary user is greater than or equal to a first threshold and the received SINR of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received SINR of the primary user is less than the first threshold and/or the received SINR of the secondary user is less than the second threshold.
Optionally, the processing module 120 is specifically configured to:
acquiring the training data of the secondary user when the first condition and/or the second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents a first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents a second slot interval for updating parameters.
Optionally, the environmental state information includes received power values acquired from sensors in the network environment.
Optionally, the received SINRs of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ ($i=1$) represents the received SINR of the primary user and $\gamma_2^{(t)}$ ($i=2$) represents the received SINR of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
Optionally, the first network model and the second network model each include two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
The device of the embodiment of the present invention is configured to perform the method of any of the foregoing method embodiments, and its implementation principle and technical effects are similar, and are not described in detail herein.
Fig. 9 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 9, the electronic device may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, where the processor 810, the communication interface 820 and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a power control method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
Furthermore, the above logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention, in essence or as the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program storable on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can perform the power control method provided by the methods described above, the method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the power control method provided by the methods described above, the method comprising: acquiring environmental state information; obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of power control, comprising:
acquiring environmental state information;
obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
2. The power control method according to claim 1, wherein before obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information, the method further comprises:
step a1, acquiring environmental state information of the secondary user in time slot t, where t is an integer greater than or equal to 0;
step a2, obtaining the power value of the secondary user in time slot t according to the environmental state information of time slot t and the first network model;
step a3, determining the received signal-to-interference-plus-noise ratios of the primary user and the secondary user according to the power value of the secondary user in time slot t;
step a4, obtaining the reward value of time slot t according to the received signal-to-interference-plus-noise ratios of the primary user and the secondary user;
increasing t by a preset step length and repeatedly executing steps a1-a4 until the training data of the secondary user are acquired;
and training the first network model according to the training data of the secondary user and the second network model to obtain the trained first network model.
3. The power control method according to claim 2, wherein obtaining the reward value of time slot t according to the received signal-to-interference-plus-noise ratios of the primary user and the secondary user comprises:
determining the reward value of time slot t as the received signal-to-interference-plus-noise ratio of the secondary user in time slot t+1 when the received signal-to-interference-plus-noise ratio of the primary user is greater than or equal to a first threshold and the received signal-to-interference-plus-noise ratio of the secondary user is greater than or equal to a second threshold;
and determining the reward value of time slot t as zero when the received signal-to-interference-plus-noise ratio of the primary user is less than the first threshold and/or the received signal-to-interference-plus-noise ratio of the secondary user is less than the second threshold.
4. The power control method according to claim 2, wherein obtaining the training data of the secondary user comprises:
acquiring the training data of the secondary user when the first condition and/or the second condition is met;
the first condition includes $t \bmod T_1 = 0$, where $t$ represents the time slot and $T_1$ represents a first slot interval for updating parameters;
the second condition includes $t \geq T_2$, where $t$ is the time slot and $T_2$ represents a second slot interval for updating parameters.
5. The power control method according to any one of claims 1-4, wherein the environmental state information comprises received power values acquired from sensors in a network environment.
6. The power control method according to any one of claims 2-4, wherein the received signal-to-interference-plus-noise ratios of the primary user and the secondary user are expressed by the following formula:

$$\gamma_i^{(t)} = \frac{p_i^{(t)} h_{i,i}}{\sigma_i^2 + \sum_{j \neq i} p_j^{(t)} h_{j,i}}, \quad i \in \{1, 2\}$$

where $\gamma_1^{(t)}$ ($i=1$) represents the received signal-to-interference-plus-noise ratio of the primary user and $\gamma_2^{(t)}$ ($i=2$) represents that of the secondary user; $p_j^{(t)}$ represents the transmit power of user $j$ in time slot $t$; $h_{j,i}$ represents the channel gain from the transmitter of user $j$ to the receiver of user $i$; and $\sigma_i^2$ represents the noise power of receiver $i$.
7. The power control method according to any one of claims 1-4, wherein the first network model and the second network model each comprise two fully connected layers, and the numbers of neurons in the fully connected layers of the first network model and the second network model are different.
8. A power control apparatus, comprising:
the acquisition module is used for acquiring environmental state information;
the processing module is used for obtaining the power value of the secondary user access channel by using the trained first network model and the environmental state information; the first network model is obtained by training based on the second network model and training data; the training data includes: environmental state information of a plurality of time slots, the power value corresponding to each time slot, and the reward value of each time slot; the reward value of each time slot is determined based on the power value of that time slot.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the power control method of any one of claims 1 to 6.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the power control method according to any one of claims 1 to 6.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310581580.7A | 2023-05-23 | 2023-05-23 | Power control method, device and equipment |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310581580.7A | 2023-05-23 | 2023-05-23 | Power control method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116321390A true CN116321390A (en) | 2023-06-23 |
Family
ID=86830903
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310581580.7A Pending CN116321390A (en) | 2023-05-23 | 2023-05-23 | Power control method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116321390A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112203345A (en) * | 2020-09-29 | 2021-01-08 | 东南大学 | D2D communication energy efficiency maximization power distribution method based on deep neural network |
US20210135703A1 (en) * | 2019-10-30 | 2021-05-06 | CCDC Army Research Laboratory | Method and system for optimizing transceiver spectrum sharing |
CN113225794A (en) * | 2021-04-29 | 2021-08-06 | 成都中科微信息技术研究院有限公司 | Full-duplex cognitive communication power control method based on deep reinforcement learning |
CN113438723A (en) * | 2021-06-23 | 2021-09-24 | 广东工业大学 | Competitive depth Q network power control method with high reward punishment |
CN113747386A (en) * | 2021-08-16 | 2021-12-03 | 四川九洲空管科技有限责任公司 | Intelligent power control method in cognitive radio network spectrum sharing |
CN113766620A (en) * | 2021-07-15 | 2021-12-07 | 吉林化工学院 | Power control method and device of cognitive radio network |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20230623 |