CN112954814A - Channel quality access method in cognitive radio - Google Patents
Channel quality access method in cognitive radio
- Publication number
- CN112954814A CN202110107271.7A
- Authority
- CN
- China
- Prior art keywords
- network
- channel
- secondary user
- global
- actor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W74/00—Wireless channel access, e.g. scheduled or random access
- H04W74/08—Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access]
- H04W74/0808—Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access] using carrier sensing, e.g. as in CSMA
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/309—Measuring or estimating channel quality parameters
- H04B17/336—Signal-to-interference ratio [SIR] or carrier-to-interference ratio [CIR]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/382—Monitoring; Testing of propagation channels for resource allocation, admission control or handover
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention provides a channel quality access method in cognitive radio. Each local network has an actor network and a critic network: the actor network selects channels and interacts with the environment to collect interaction information, while the critic network evaluates the quality of the actor network's channel-selection strategy. The local networks do not apply gradient updates themselves; instead they accumulate gradients and transmit them to the global network. The global network does not interact with the environment: it collects the gradients from the local networks, performs the gradient updates, and transmits the updated network parameters back to the local networks. By jointly considering channel quality and idle probability, the secondary user can effectively avoid accessing inferior channels, and the success rate of accesses that satisfy the quality-of-service requirement is greatly improved.
Description
(I) technical field
The invention belongs to the technical field of communication, in particular cognitive radio, and relates to a channel quality access method in cognitive radio.
(II) background of the invention
With the popularization of 4G/5G networks, mobile devices are proliferating, fields such as cloud computing, the Internet of Things and artificial intelligence are flourishing, and new communication services emerge constantly. Wireless spectrum, the foundation on which all of these services operate, has become increasingly scarce under the existing spectrum planning and management. The existing spectrum allocation model is exclusive: even when a licensed user does not use its allocated frequency band, other users cannot use it. Cognitive radio uses licensed bands through dynamic spectrum access, offering a new approach to improving spectrum utilization without causing harmful interference to licensed (primary) users. The channel a secondary user chooses to sense and access directly affects its sensing delay and transmission performance, making channel selection one of the key factors in improving the performance of a cognitive radio system and an urgent research topic.
Existing channel access algorithms adopt sequential detection access: a sensing order is determined before sensing, and channels are then sensed in that order. The sensing order is designed from prior information about the channel environment, such as channel idle probabilities, primary-user occupation patterns and channel signal-to-noise ratios. Although sequential detection access is simple to design, it requires most of this environment knowledge a priori, which is difficult to obtain in a practical environment. The performance of sequential detection is also easily degraded by "poor channels" in the environment: a channel may be idle most of the time yet have a low signal-to-noise ratio, or have a high signal-to-noise ratio yet be frequently occupied by the primary user. A sequential detection algorithm based on signal-to-noise ratio tends to select channels with high SNR but frequent primary-user occupation, yielding a low sensing-access success rate; a sequential detection algorithm based on channel idle probability tends to select channels with high idleness but low SNR, so the secondary user fails its quality-of-service requirement and obtains low throughput.
Deep reinforcement learning has achieved remarkable success in video games, robotics, Go and other fields, and can learn by interacting with the environment even when most prior information about the environment is unavailable, enabling intelligent decisions. The invention introduces the asynchronous advantage actor-critic (A3C) network from deep reinforcement learning into cognitive radio, so that the secondary user can intelligently select a channel satisfying its own quality-of-service requirement for sensing access even when most prior information about the channel environment is unknown.
Disclosure of the invention
The invention aims to provide a channel access method that overcomes the drawback that sequential detection algorithms are easily disturbed by low-quality channels in the environment, and that intelligently selects a channel satisfying the secondary user's own quality-of-service requirement for sensing access without most of the channel environment's prior information.
The purpose of the invention is realized as follows:
1.1, initializing the actor network and critic network parameters in the global network, and copying the global network parameters to the local networks;
1.2, under the local network, the secondary user selects a channel to access according to the observation matrix formed from observation information and the current strategy; the secondary user senses the selected channel for access and obtains an instant return according to the channel state;
1.3, after a number of iterations, computing the gradients of the local actor network and the local critic network respectively, transmitting the gradients to the global network, and resetting the gradients of the local actor and critic networks;
1.4, updating the global actor network according to the actor network update function, updating the global critic network according to the critic network update function, and copying the updated global network parameters to the local networks;
1.5, repeating steps 1.2 to 1.4 until the specified number of cycles is completed, yielding the trained neural network model.
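Steps 1.1 to 1.5 can be sketched as a minimal training loop. This is an illustrative Python sketch, not the patent's implementation: the networks are reduced to bare parameter/gradient holders, and `collect_gradients` stands in for the environment interaction of steps 1.2 and 1.3.

```python
import numpy as np

class Net:
    """Bare parameter/gradient holder standing in for an actor or critic network."""
    def __init__(self, shape, seed=0):
        self.params = np.random.default_rng(seed).normal(0, 0.1, shape)
        self.grads = np.zeros(shape)

def train(global_nets, local_nets, collect_gradients, n_cycles, lr=0.01):
    """global_nets/local_nets are (actor, critic) pairs; collect_gradients(local_nets)
    performs the rollout of steps 1.2-1.3 and fills the local .grads fields."""
    for _ in range(n_cycles):
        # step 1.1 (and the tail of step 1.4): copy global parameters to the local nets
        for g, l in zip(global_nets, local_nets):
            l.params = g.params.copy()
        # steps 1.2-1.3: interact with the environment, accumulate local gradients
        collect_gradients(local_nets)
        # steps 1.3-1.4: ship gradients to the global nets, update, reset local grads
        for g, l in zip(global_nets, local_nets):
            g.params -= lr * l.grads
            l.grads = np.zeros_like(l.grads)
    return global_nets
```

In the full A3C scheme several such local workers run asynchronously against the same global networks; the sketch shows a single worker for clarity.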
The asynchronous advantage actor-critic (A3C) network comprises two major parts: a global network and local networks. The global and local networks have the same neural-network structure: the actor network has one hidden layer of 200 neurons with a rectified-linear (ReLU) activation function, and the critic network likewise has one hidden layer of 200 neurons with a ReLU activation function.
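The structure described above can be sketched as follows. The 200-neuron ReLU hidden layer is taken from the text; the input dimension, weight initialization and the use of a softmax output for the actor are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    z = np.exp(x - x.max())   # subtract max for numerical stability
    return z / z.sum()

class ActorCritic:
    """One hidden layer of 200 ReLU neurons for each network, as specified above;
    input dimension and weight scale are illustrative assumptions."""
    def __init__(self, n_inputs, n_actions, n_hidden=200, seed=0):
        rng = np.random.default_rng(seed)
        # actor: flattened observation matrix -> action-probability distribution
        self.Wa1 = rng.normal(0, 0.05, (n_inputs, n_hidden))
        self.Wa2 = rng.normal(0, 0.05, (n_hidden, n_actions))
        # critic: flattened observation matrix -> scalar state-value estimate
        self.Wc1 = rng.normal(0, 0.05, (n_inputs, n_hidden))
        self.Wc2 = rng.normal(0, 0.05, (n_hidden, 1))

    def actor(self, obs):
        return softmax(relu(obs @ self.Wa1) @ self.Wa2)

    def critic(self, obs):
        return float((relu(obs @ self.Wc1) @ self.Wc2)[0])
```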
The local networks included in the invention mainly comprise the following: each local network interacts with the environment independently, so each local network has its own actor network and critic network; the local actor networks each interact independently with the channel environment, the critic network evaluates the actor network's action strategy, and all local networks share exactly the same structure.
The observation matrix is defined as follows. The secondary user can only observe the states of the channels it selects to sense; its observation in the t-th time slot is:
O_t = [o_{1,t}, o_{2,t}, ..., o_{N,t}]
After a temporary memory mechanism is introduced, the secondary user stores the observations of the previous M steps. These M observations form the observation matrix, which at time t can be expressed as:
S_t = [O_{t-1}, O_{t-2}, O_{t-3}, ..., O_{t-M}]
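The M-step memory can be sketched as below. The stacking of the last M observations into S_t follows the text; the numeric encoding (+1 = sensed idle, -1 = sensed busy, 0 = not sensed) is an assumption for illustration.

```python
from collections import deque
import numpy as np

class ObservationMemory:
    """Stores the last M per-slot observations O_t and stacks them into S_t.
    Encoding is an assumption: +1 = sensed idle, -1 = sensed busy, 0 = not sensed."""
    def __init__(self, n_channels, m_steps):
        self.n = n_channels
        # start with M all-zero observations so S_t always has shape (M, N)
        self.buf = deque((np.zeros(n_channels) for _ in range(m_steps)),
                         maxlen=m_steps)

    def record(self, sensed_idle, sensed_busy):
        o_t = np.zeros(self.n)
        o_t[list(sensed_idle)] = 1.0
        o_t[list(sensed_busy)] = -1.0
        self.buf.appendleft(o_t)        # newest first: [O_{t-1}, ..., O_{t-M}]

    def matrix(self):
        return np.vstack(self.buf)      # shape (M, N)
```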
the interactive return function comprises the following main contents: the secondary user selects to sense that the accessed channel is idle and meets the self service quality requirement, so that the decision is correct, and positive feedback is obtained; if the channel selected and sensed by the secondary user is occupied by the primary user, the decision is wrong, and a negative feedback punishment is received. Considering that channels meeting the service quality requirement of the secondary user are all in a busy state in a certain period of time, the channel selected and sensed by the secondary user is set to be an idle channel although the channel does not meet the service quality requirement, and a small positive feedback can still be obtained.
DiRepresenting the obtained throughput of the ith channel, with η being the throughput threshold of the secondary user. (D)i- η)/η is the ratio of the throughput obtained for the ith channel to the threshold η difference, mainly to guide the secondary user to select the more excellent channel.
The global network included in the invention mainly comprises the following: the global network does not interact with the environment; its main job is to collect the gradient data of each local network, update the network with that gradient data, and transmit the updated network parameters to each local network.
The update function for the global actor network is:
where θ represents the parameters of the global actor network; A(s, a) is the advantage function, measuring how good an action is in a given environment state; H(π_θ(s)) is the policy entropy, used to increase the secondary user's exploration; and β is the policy-entropy weight, controlling the degree of exploration.
The update function for the global critic network is:
where μ represents the parameters of the global critic network; r is the instant reward obtained by the secondary user; γ is the discount factor; and λ is the learning rate of the critic network.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention jointly considers the signal-to-noise ratio and the idle probability of each channel, can effectively avoid poor channels in the environment, and effectively improves the secondary user's success rate of accessing high-quality channels;
2. the return function is designed to encourage the secondary user to access a better channel while still meeting the QoS requirement, guiding the secondary user toward better decisions;
3. even with most of the environment's prior information missing, the method approaches the access success rate of algorithms with fully known prior information, and exceeds the access success rate of algorithms with partially known prior information when the number of sensing operations is small.
(IV) description of the drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 shows the number of selections of different channels in each cycle;
FIG. 3 compares the access success rate of the invention with sequential sensing under different amounts of known prior information.
(V) detailed description of the preferred embodiments
The following detailed description is made with reference to the accompanying drawings and specific examples:
the final objective of the algorithm of the invention is that the secondary user can intelligently select an idle channel which accords with the self service quality according to the learned channel access strategy for perception access, and abstract the idle channel into reinforcement learning, namely the strategy adopted by the intelligent agent can maximize the accumulated return. The communication of the user in a single circulation can also be carried out infinitely along with the time, the accumulated return tends to be infinite, and the quality of the strategy cannot be effectively evaluated. Thus defining the number of slots in a single iteration as T. The above problem can be expressed as the following formula:
where r_{i,t} denotes the instantaneous reward obtained by selecting the i-th channel at time t.
The invention assumes there are N channels and one secondary user in the environment; the states of all N channels are time-varying and depend only on primary-user occupation. The secondary user can sense n (n << N) channels in one time slot, and in the t-th time slot the environment information observable by the secondary user is:
O_t = [o_{1,t}, o_{2,t}, ..., o_{N,t}]  (2)
where o_{i,t} denotes the secondary user's observation of the i-th channel at time t:
where x_{i,t} is the channel state of the i-th channel at time t. After a temporary memory mechanism is introduced, the secondary user stores the observations of the previous M steps. These M observations form the observation matrix, which at time t can be expressed as:
S_t = [O_{t-1}, O_{t-2}, O_{t-3}, ..., O_{t-M}]  (4)
After sensing the n channels, the secondary user selects the one that best satisfies its own QoS requirement for access. When n channels are selected for sensing, the number of elements in the action set is the binomial coefficient C(N, n) = N!/(n!(N − n)!):
For example, if there are 5 channels in the environment and two are sensed in a single time slot, the action set is A = {(1,2), (1,3), (1,4), ..., (4,5)}. If only one channel can be sensed in a single time slot, the action set is simply the set of channel indices:
A={1,2,3,...,N} (6)
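The action set can be enumerated directly; a short sketch:

```python
from itertools import combinations
from math import comb

def action_set(n_channels, n_sense):
    """Enumerate all actions: each action picks n_sense of the N channels
    (1-based indices) to sense in a single time slot."""
    return list(combinations(range(1, n_channels + 1), n_sense))
```

For n_sense = 1 this reduces to the channel indices {1, ..., N} of equation (6), and for N = 5, n = 2 it yields the ten pairs (1,2) through (4,5) listed above.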
if the secondary user selects to sense that the accessed channel is idle and meets the self service quality requirement, the decision is correct, and positive feedback is obtained; if the channel selected and sensed by the secondary user is occupied by the primary user, the decision is wrong, and a negative feedback punishment is received. Considering that channels meeting the service quality requirement of the secondary user are all in a busy state in a certain period of time, the channel selected and sensed by the secondary user is set to be an idle channel although the channel does not meet the service quality requirement, and a small positive feedback can still be obtained. The reward function may be represented by the following equation:
the quality of service requirement of the secondary user is determined by the throughput, the quality of service being determined only if the obtained throughput of the access channel is above a threshold requirementThe amount is qualified. DiRepresenting the obtained throughput of the ith channel, with η being the throughput threshold of the secondary user. (D)i- η)/η is the ratio of the throughput obtained for the ith channel to the threshold η difference, mainly to guide the secondary user to select the more excellent channel.
The asynchronous advantage actor-critic network is divided into local networks and a global network. Each local network interacts with the environment independently, so each has its own actor network and critic network: the local actor networks each interact independently with the channel environment, the critic network evaluates the actor network's action strategy, and all local networks share exactly the same structure. The global network does not interact with the environment; its main job is to collect the gradient data of each local network, update the network with that gradient data, and transmit the updated network parameters back to each local network.
The actor network in a local network interacts with the environment and selects actions; its main task is strategy learning, performing gradient calculation directly on the policy:
where J(θ) is the objective function of the policy network; π_θ(s, a) is the probability of selecting action a in state s when the network parameters are θ; d(s) is the distribution of states collected in this interaction; and the final term represents the immediate reward obtained by selecting action a in state s.
The local critic network mainly estimates the state value, evaluates the quality of the actor network's action strategy, and guides the actor network's update through the advantage function. The advantage function measures how much better some action a is than the average in state s. The asynchronous advantage actor-critic network uses multi-step sampling to accelerate convergence:
where V(s) is the value of state s, which can be estimated by the critic network. Combining with equation (9), the policy-gradient calculation of equation (8) becomes:
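The multi-step advantage estimate is not reproduced in the extracted text; the standard A3C form, which this passage appears to describe, is A(s_t, a_t) = Σ_{i=0}^{k-1} γ^i r_{t+i} + γ^k V(s_{t+k}) − V(s_t). A sketch under that assumption:

```python
def n_step_advantage(rewards, v_start, v_end, gamma):
    """A(s_t,a_t) = sum_i gamma^i * r_{t+i} + gamma^k * V(s_{t+k}) - V(s_t),
    with k = len(rewards). This is the standard A3C multi-step estimate, an
    assumption, since the patent's equation (9) is not reproduced in the text."""
    ret = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return ret + (gamma ** len(rewards)) * v_end - v_start
```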
the global network does not interact with the environment, and the method mainly works by collecting gradient data of each network, updating the network through the gradient data and transmitting updated network parameters to each local network. The structure of the global network also remains consistent with the local network due to the mutual communication of parameters and gradients. The actor network in the global network is also responsible for updating the action strategy, and the gradient update can be expressed as:
where θ represents a parameter of the global actor network; a (s, a) represents a merit function representing the degree of superiority and inferiority of the operation in the environmental state; h (Pi)θ'(s)) is a policy entropy for increasing exploratory power of previous users; beta represents a policy entropy weight for controlling the degree of exploration. After the dominance function is introduced, the global network critics network improves the fitting accuracy of the value function by minimizing the square of the dominance function, and the gradient update of the global network critics network can be expressed as:
where μ represents the parameters of the global critic network; r is the instant reward obtained by the secondary user; γ is the discount factor; and λ is the learning rate of the critic network.
Simulation parameter settings for the simulation example: the simulation parameters are divided into system environment parameters and neural network parameters. System environment parameters: the environment contains N = 10 independent channels, each of which may be occupied by a primary user with occupation probability P_busy ∈ (0, 1); channel signal-to-noise ratios range over [−10, 10] dB. In the simulation experiments, the SNRs of the 10 channels are set to SNR = [−10, −8, −9, −5, −3, 0, 4, 5, 7, 10] dB, with corresponding occupation probabilities P_busy = [0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9]. Neural network parameters: the actor and critic structures of the local and global networks are identical; the actor network has one hidden layer of 200 neurons with a ReLU activation function, and its output layer directly outputs the action-selection probability distribution. The critic network also has one hidden layer of 200 neurons with a ReLU activation function, and its output layer outputs the state-value estimate. The learning rate of the critic network should be no smaller than that of the actor network; the invention sets the critic-network learning rate Lr_c = 0.001 and the actor-network learning rate Lr_a = 0.0001. The access success rate is defined as the probability that the secondary user successfully accesses an idle channel that meets the quality-of-service requirement.
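The channel environment above can be sketched as follows, using the SNR and occupation-probability vectors from the simulation settings; modeling each slot as an independent Bernoulli draw per channel is an assumption consistent with "the channel state is only related to the occupation of the primary user".

```python
import numpy as np

# channel parameters from the simulation settings above
SNR_DB = np.array([-10, -8, -9, -5, -3, 0, 4, 5, 7, 10], dtype=float)
P_BUSY = np.array([0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9])

def step_channels(rng):
    """Draw one time slot of independent channel states; True means idle."""
    return rng.random(P_BUSY.size) >= P_BUSY

def mean_idle_fraction(n_slots=20000, seed=0):
    """Empirical idle fraction per channel; should approach 1 - P_BUSY."""
    rng = np.random.default_rng(seed)
    idle = np.array([step_channels(rng) for _ in range(n_slots)])
    return idle.mean(axis=0)
```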
As shown in FIG. 2, there are 3 channels in the environment that meet the QoS requirement; the figure shows how many times the secondary user selects each of them for sensing access when one channel is sensed per time slot. Early in the iterations, exploration dominates and the three channels are selected almost equally often. As the iterations progress, although the 10th channel has a high signal-to-noise ratio, it is frequently occupied by the primary user, so its selection count steadily declines: through learning, the secondary user considers channel access over a longer horizon and effectively avoids the poor channel. The other two channels meeting the QoS requirement are selected increasingly often because their primary users occupy them less frequently. Moreover, owing to the design of the return function, the secondary user is biased toward the 9th channel, whose primary-user occupation probability is not large, showing that the return function can guide the secondary user toward better decisions.
As shown in FIG. 3, with 3 channels in the environment meeting the QoS requirement, the invention is compared against sequential sensing with different amounts of known prior information under different numbers of sensing operations. The fully-known-information algorithm assumes the secondary user knows the signal-to-noise ratio of every channel and the primary-user occupation probability of each channel, and orders the sensing sequence by the product of the signal-to-noise ratio and the primary-user idle probability, SNR·(1 − P_busy). As the figure shows, because of the nature of sequential sensing, the fully-known algorithm always senses a fixed channel, so with a single sensing operation its access success rate depends entirely on the first channel in its order, whereas the sensing access algorithm of the invention can intelligently select a suitable channel for access without being restricted to a sequential sensing order.
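The fully-known baseline's sensing order can be sketched as below. Ranking by SNR·(1 − P_busy) follows the text; using the dB value directly as the quality score (rather than converting to a linear scale) is an assumption.

```python
import numpy as np

SNR_DB = np.array([-10, -8, -9, -5, -3, 0, 4, 5, 7, 10], dtype=float)
P_BUSY = np.array([0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9])

def known_info_order():
    """Sequential-sensing order of the fully-known baseline: rank channels by
    SNR * (1 - P_busy), best first, returning 1-based channel indices."""
    score = SNR_DB * (1.0 - P_BUSY)
    return list(np.argsort(-score) + 1)
```

With the simulation parameters above this ranks channel 9 first (7 × 0.6 = 4.2), consistent with FIG. 2's observation that the learned policy also favors the 9th channel.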
The technical solution of the present invention is not limited to the embodiments described above; in application, the invention extends to other modifications, variations, applications and embodiments, and all such modifications, variations, applications and embodiments are considered to be within the spirit and scope of the present invention.
Claims (6)
1. A channel quality access method in cognitive radio is characterized in that: the method comprises the following steps:
1.1, initializing the actor network and critic network parameters in the global network, and copying the global network parameters to the local networks;
1.2, under the local network, the secondary user selects a channel to access according to the observation matrix formed from observation information and the current strategy; the secondary user senses the selected channel for access and obtains an instant return according to the channel state;
1.3, after a number of iterations, computing the gradients of the local actor network and the local critic network respectively, transmitting the gradients to the global network, and resetting the gradients of the local actor and critic networks;
1.4, updating the global actor network according to the actor network update function, updating the global critic network according to the critic network update function, and copying the updated global network parameters to the local networks;
1.5, repeating steps 1.2 to 1.4 until the specified number of cycles is completed, yielding the trained neural network model.
2. The method of claim 1, wherein the method comprises: a plurality of accessible channels exist in the environment, and the secondary user quickly finds and accesses a channel that meets its own quality-of-service requirement.
3. The method of claim 1, wherein the method comprises: in step 1.1, the neural networks of the global network and the local networks have the same structure: the actor network has one hidden layer of 200 neurons with a ReLU activation function, and the critic network has one hidden layer of 200 neurons with a ReLU activation function.
4. The method of claim 1, wherein the method comprises: in step 1.2, each local network interacts with the environment independently, with its own independent actor network and critic network; the local actor networks each interact independently with the channel environment, the critic network evaluates the actor network's action strategy, and all local networks share exactly the same structure.
5. The method of claim 1, wherein the method comprises: for the observation matrix in step 1.2, the secondary user can only observe the states of the channels it selects to sense, and its observation in the t-th time slot is:
O_t = [o_{1,t}, o_{2,t}, ..., o_{N,t}]
after a temporary memory mechanism is introduced, the secondary user stores the observations of the previous M steps; these M observations form the observation matrix, which at time t can be expressed as:
S_t = [O_{t-1}, O_{t-2}, O_{t-3}, ..., O_{t-M}]
obtaining a return after interacting with the environment, wherein the return function is as follows:
if the channel the secondary user selects, senses and accesses is idle and satisfies its quality-of-service requirement, the decision is correct and positive feedback is obtained; if the selected and sensed channel is occupied by the primary user, the decision is wrong and a negative-feedback penalty is received; considering that all channels satisfying the secondary user's quality-of-service requirement may be busy during some period, a channel that is idle but does not satisfy the quality-of-service requirement still yields a small positive feedback. D_i denotes the throughput obtained on the i-th channel, η is the secondary user's throughput threshold, and (D_i − η)/η is the normalized margin by which the throughput of the i-th channel exceeds the threshold η, serving mainly to guide the secondary user toward better channels.
6. The method of claim 1, wherein the method comprises: in step 1.4, the update function of the global actor network is:
where θ represents the parameters of the global actor network, A(s, a) is the advantage function measuring how good an action is in a given environment state, and H(π_θ(s)) is the policy entropy used to increase the secondary user's exploration;
the update function for the global critic network is:
where μ represents the parameters of the global critic network, r is the instant reward obtained by the secondary user, γ is the discount factor, and λ is the learning rate of the critic network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110107271.7A CN112954814B (en) | 2021-01-27 | 2021-01-27 | Channel quality access method in cognitive radio |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112954814A true CN112954814A (en) | 2021-06-11 |
CN112954814B CN112954814B (en) | 2022-05-20 |
Family
ID=76237380
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110107271.7A Active CN112954814B (en) | 2021-01-27 | 2021-01-27 | Channel quality access method in cognitive radio |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112954814B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108471619A (en) * | 2018-03-22 | 2018-08-31 | 中南大学 | Channel selection method for cognitive wireless sensor networks |
CN109089307A (en) * | 2018-07-19 | 2018-12-25 | 浙江工业大学 | Throughput maximization method for an energy-harvesting wireless relay network based on the asynchronous advantage actor-critic algorithm |
CN109379752A (en) * | 2018-09-10 | 2019-02-22 | 中国移动通信集团江苏有限公司 | Optimization method, device, equipment and the medium of Massive MIMO |
WO2020152389A1 (en) * | 2019-01-22 | 2020-07-30 | Nokia Solutions And Networks Oy | Machine learning for a communication network |
CN110190918A (en) * | 2019-04-25 | 2019-08-30 | 广西大学 | Cognition wireless sensor network frequency spectrum access method based on depth Q study |
CN110492955A (en) * | 2019-08-19 | 2019-11-22 | 上海应用技术大学 | Spectrum prediction switching method based on transfer learning strategy |
CN110691422A (en) * | 2019-10-06 | 2020-01-14 | 湖北工业大学 | Multi-channel intelligent access method based on deep reinforcement learning |
CN111262638A (en) * | 2020-01-17 | 2020-06-09 | 合肥工业大学 | Dynamic spectrum access method based on efficient sample learning |
CN112188503A (en) * | 2020-09-30 | 2021-01-05 | 南京爱而赢科技有限公司 | Dynamic multichannel access method based on deep reinforcement learning and applied to cellular network |
Non-Patent Citations (3)
Title |
---|
P. YANG ET AL: "Dynamic Spectrum Access in Cognitive Radio Networks Using Deep Reinforcement Learning and Evolutionary Game", 《2018 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC)》 * |
Z. SHI, X. XIE AND H. LU: "Deep Reinforcement Learning Based Intelligent User Selection in Massive MIMO Underlay Cognitive Radios", 《IEEE ACCESS》 * |
GUO, BINGJIE: "Research on multi-channel dynamic spectrum access algorithms in cognitive radio systems", 《信息科技辑》 (INFORMATION SCIENCE AND TECHNOLOGY SERIES) * |
Also Published As
Publication number | Publication date |
---|---|
CN112954814B (en) | 2022-05-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112134916B (en) | Cloud edge collaborative computing migration method based on deep reinforcement learning | |
Wang et al. | A survey on applications of model-free strategy learning in cognitive wireless networks | |
CN107690176B (en) | Network selection method based on Q learning algorithm | |
CN109474980A | A wireless network resource allocation method based on deep reinforcement learning | |
CN112367132B (en) | Power distribution algorithm in cognitive radio based on reinforcement learning solution | |
CN111262638B (en) | Dynamic spectrum access method based on efficient sample learning | |
CN112188503B (en) | Dynamic multichannel access method based on deep reinforcement learning and applied to cellular network | |
CN113596785B (en) | D2D-NOMA communication system resource allocation method based on deep Q network | |
CN112492691A | Downlink NOMA power allocation method based on deep deterministic policy gradient | |
CN108833227A | A smart home communication optimization scheduling system and method based on edge computing | |
Rao et al. | Network selection in heterogeneous environment: A step toward always best connected and served | |
Zhang et al. | Endogenous security-aware resource management for digital twin and 6G edge intelligence integrated smart park | |
CN103249050B | Multi-scale spectrum access method based on service demand | |
CN114126021A (en) | Green cognitive radio power distribution method based on deep reinforcement learning | |
CN114051252A (en) | Multi-user intelligent transmitting power control method in wireless access network | |
CN112954814B (en) | Channel quality access method in cognitive radio | |
Mishra et al. | Raddpg: Resource allocation in cognitive radio with deep reinforcement learning | |
Prasad et al. | Intelligent spectrum sharing and sensing in cognitive radio network by using AROA (adaptive rider optimization algorithm) | |
CN106131920A | A heterogeneous network selection method based on multiple attributes and queuing theory | |
CN113395757B (en) | Deep reinforcement learning cognitive network power control method based on improved return function | |
CN115250156A (en) | Wireless network multichannel frequency spectrum access method based on federal learning | |
CN114980254B | Dynamic multichannel access method and device based on a dueling deep recurrent Q-network | |
Han et al. | Distributed hierarchical game-based algorithm for downlink power allocation in OFDMA femtocell networks | |
Sun et al. | EWA Selection strategy with channel handoff scheme in cognitive radio | |
CN112383965B (en) | Cognitive radio power distribution method based on DRQN and multi-sensor model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||