CN112954814A - Channel quality access method in cognitive radio - Google Patents


Publication number
CN112954814A
Authority
CN
China
Prior art keywords
network
channel
secondary user
global
actor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110107271.7A
Other languages
Chinese (zh)
Other versions
CN112954814B (en)
Inventor
叶方
张音捷
李一兵
孙骞
田园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN202110107271.7A
Publication of CN112954814A
Application granted
Publication of CN112954814B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W74/00 Wireless channel access, e.g. scheduled or random access
    • H04W74/08 Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access]
    • H04W74/0808 Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access] using carrier sensing, e.g. as in CSMA
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B17/00 Monitoring; Testing
    • H04B17/30 Monitoring; Testing of propagation channels
    • H04B17/309 Measuring or estimating channel quality parameters
    • H04B17/336 Signal-to-interference ratio [SIR] or carrier-to-interference ratio [CIR]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B17/00 Monitoring; Testing
    • H04B17/30 Monitoring; Testing of propagation channels
    • H04B17/382 Monitoring; Testing of propagation channels for resource allocation, admission control or handover
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides a channel quality access method in cognitive radio. Each local network contains an actor network and a critic network: the actor network is responsible for channel selection and interacts with the environment to collect interaction information, and the critic network evaluates the quality of the actor network's channel selection strategy. The local networks do not apply gradient updates themselves; they accumulate gradients and transmit them to the global network. The global network does not interact with the environment; it collects the gradients gathered by the local networks, performs the gradient update, and transmits the updated network parameters back to the local networks. The invention jointly considers channel quality and idle probability, so the secondary user can effectively avoid accessing inferior channels, and the success rate of accesses that meet the quality of service requirement is greatly improved.

Description

Channel quality access method in cognitive radio
(I) Technical field
The invention belongs to the technical field of communication, specifically cognitive radio communication technology, and in particular relates to a channel quality access method in cognitive radio.
(II) Background of the invention
With the popularization of 4G/5G networks, the number of mobile devices keeps increasing, and fields such as cloud computing, the Internet of Things and artificial intelligence have emerged, so new communication services appear continuously. However, the wireless spectrum, which is the basis on which all types of communication services operate, has become increasingly scarce under the existing spectrum planning and management. The existing spectrum allocation mode is exclusive: even if an authorized user does not use its allocated frequency band, other users cannot use it. Cognitive radio uses authorized frequency bands through dynamic spectrum access and provides a new scheme for improving spectrum utilization without causing harmful interference to authorized users (primary users). The channel that a secondary user senses and accesses directly affects the secondary user's sensing delay, transmission performance and other aspects. It is one of the key factors in improving the performance of a cognitive radio system, and research on it is urgently needed.
Existing channel access algorithms adopt sequential detection access: a sensing order is determined before sensing, and channels are sensed in that order. Sequential detection access designs the channel sensing and access order from known prior information about the channel environment, such as the channel idle probability, the primary user's occupation pattern and the channel signal-to-noise ratio. Although sequential detection access is simple to design, it requires most of the prior information about the environment, which is difficult to obtain in a practical environment. The performance of sequential detection algorithms is also easily degraded by inferior channels in the environment: channels that are often idle but have a low signal-to-noise ratio, or channels that have a high signal-to-noise ratio but are frequently occupied by the primary user. A sequential detection algorithm based on the signal-to-noise ratio tends to select a channel with a high signal-to-noise ratio that is frequently occupied by the primary user, so the sensing access success rate is low. A sequential detection algorithm based on the channel idle probability tends to select a channel that is often idle but has a low signal-to-noise ratio, so the secondary user's quality of service requirement is not met and the throughput obtained by the secondary user is low.
Deep reinforcement learning has achieved remarkable success in fields such as video games, robotics and Go. It can learn by interacting with the environment even when most prior information about the environment is missing, and thereby make intelligent decisions. The invention introduces the asynchronous advantage actor-critic (A3C) network from deep reinforcement learning into cognitive radio, so that the secondary user can intelligently select a channel that meets its own quality of service requirement for sensing and access when most prior information about the channel environment is unknown.
(III) Disclosure of the invention
The invention aims to overcome the defect that sequential detection algorithms are easily disturbed by inferior channels in the environment, and to provide a method by which the secondary user intelligently selects a channel that meets its own quality of service requirement for sensing and access when most prior information about the channel environment is unknown.
The purpose of the invention is realized as follows:
1.1, initializing the actor network and critic network parameters in the global network, and assigning the global network parameters to the local networks;
1.2, in each local network, the secondary user selects a channel to access according to the current strategy and an observation matrix formed from observation information; the secondary user senses and accesses the selected channel and obtains an instant return according to the channel state;
1.3, after a number of iterations, calculating the gradients of the local actor network and the local critic network respectively, transmitting the gradients to the global network, and resetting the gradients of the local actor network and critic network;
1.4, updating the global actor network according to the actor network update function, updating the global critic network according to the critic network update function, and assigning the updated global network parameters to the local networks;
1.5, repeating steps 1.2 to 1.4 until all cycles are finished, which yields the complete neural network model.
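The parameter flow of steps 1.1 to 1.5 can be sketched as follows. This is only an illustration under loud assumptions: a scalar quadratic loss stands in for the actor and critic losses, the two workers run one after the other rather than asynchronously, and the names `local_gradient`, `train` and `worker_data` are hypothetical.

```python
# Toy illustration of the parameter flow in steps 1.1-1.5.
# ASSUMPTION: the loss 0.5 * (w - y)^2 replaces the real actor/critic losses,
# and workers are visited sequentially instead of running asynchronously.

def local_gradient(w, samples):
    """Accumulate the gradient of 0.5 * (w - y)^2 over a batch of samples."""
    return sum(w - y for y in samples)


def train(num_cycles=50, lr=0.1):
    global_w = 0.0  # step 1.1: initialise the global parameter
    # Hypothetical experience collected by two local networks.
    worker_data = [[2.0, 3.0, 4.0], [2.5, 3.0, 3.5]]
    for _ in range(num_cycles):  # step 1.5: repeat steps 1.2-1.4
        for samples in worker_data:
            local_w = global_w  # local network receives the global parameters
            grad = local_gradient(local_w, samples)  # steps 1.2-1.3: accumulate
            global_w -= lr * grad  # step 1.4: global update from pushed gradient
            grad = 0.0  # step 1.3: reset the local gradient
    return global_w


final_w = train()
```

In this toy setting the global parameter converges to 3.0, the minimiser of the stand-in loss; the point of the sketch is only that the local copies never update themselves and always restart from the latest global parameters.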
The asynchronous advantage actor-critic network is divided into two major parts, a global network and local networks. The global and local networks have the same neural network structure: the actor network has one hidden layer with 200 neurons and a rectified linear unit (ReLU) activation function, and the critic network likewise has one hidden layer with 200 neurons and a ReLU activation function.
The local networks are organized as follows: each local network interacts with the environment independently, so each local network has its own actor network and critic network. The local actor networks each interact independently with the channel environment, the critic network evaluates the action strategy of the actor network, and all local networks have exactly the same structure.
The observation matrix is constructed as follows. The secondary user can only observe the state of the channel it has selected for sensing, and the observation information of the secondary user in the t-th time slot is:
O_t = [o_{1,t}, o_{2,t}, ..., o_{N,t}]
After a temporary memory mechanism is introduced, the secondary user can store the observation information of the previous M steps. The M steps of observation information form an observation matrix, and the observation matrix at time t can be expressed as:
S_t = [O_{t-1}, O_{t-2}, O_{t-3}, ..., O_{t-M}]
The interaction return function is defined as follows. If the channel that the secondary user selects to sense and access is idle and meets its own quality of service requirement, the decision is correct and positive feedback is obtained. If the selected channel is occupied by the primary user, the decision is wrong and a negative feedback penalty is received. Because all channels that meet the secondary user's quality of service requirement may be busy during a certain period, a small positive feedback is still given when the selected channel is idle but does not meet the quality of service requirement.
Figure BDA0002918007300000021
D_i represents the throughput obtained on the i-th channel, and η is the throughput threshold of the secondary user. (D_i - η)/η is the ratio of the difference between the throughput obtained on the i-th channel and the threshold η to the threshold itself; its main purpose is to guide the secondary user toward better channels.
The global network works as follows: it does not interact with the environment. Its main task is to collect the gradient data of each local network, update the network with that gradient data, and transmit the updated network parameters to each local network.
The update function for the global actor network is:
Figure BDA0002918007300000031
where θ represents the parameters of the global actor network; A(s, a) represents the advantage function, which indicates how good the action is in the given environment state; H(π_θ'(s)) is the policy entropy, used to increase the exploration ability of the secondary user; and β represents the policy entropy weight, which controls the degree of exploration.
The update function for the global critic network is:
Figure BDA0002918007300000032
where μ represents a parameter of the global critic network; r represents the instant reward obtained by the secondary user; gamma is a discount factor; λ is the learning rate of the critic network.
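The role of the policy entropy term H can be illustrated with a short sketch. It assumes the usual Shannon entropy of the action distribution, which the text does not spell out; the function name `policy_entropy` and the two example distributions are hypothetical.

```python
import math


def policy_entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * ln pi(a) of an action distribution.

    ASSUMPTION: natural logarithm; zero-probability actions contribute nothing.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A policy that explores all 10 channels equally versus one that always
# picks the same channel.
uniform = [0.1] * 10
greedy = [1.0] + [0.0] * 9
```

The uniform policy has the largest entropy and the deterministic policy has zero entropy, so weighting the entropy by β in the actor update rewards policies that keep exploring.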
Compared with the prior art, the invention has the following beneficial effects:
1. the invention jointly considers the signal-to-noise ratio and the idle probability of each channel, so it can effectively avoid inferior channels in the environment and improve the success rate with which the secondary user accesses high-quality channels;
2. the return function of the invention encourages the secondary user to access a better channel once the QoS requirement is met, so it guides the secondary user toward better decisions;
3. even though most prior information about the environment is missing, the access success rate of the method is close to that of algorithms that know the prior information, and when the number of sensing attempts is small it is higher than that of algorithms that know only part of the prior information.
(IV) Description of the drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 shows the number of times different channels are selected in each cycle;
FIG. 3 compares the access success rate of the present invention with that of sequential sensing algorithms that know different amounts of prior information.
(V) Detailed description of the preferred embodiments
The following detailed description is made with reference to the accompanying drawings and specific examples:
The final objective of the algorithm is that the secondary user intelligently selects, according to the learned channel access strategy, an idle channel that satisfies its own quality of service for sensing and access. In reinforcement learning terms, the strategy adopted by the agent should maximize the cumulative return. If the user's communication within a single cycle could continue indefinitely, the cumulative return would tend to infinity and the quality of the strategy could not be evaluated effectively. The number of time slots in a single iteration is therefore limited to T, and the problem can be expressed as the following formula:
Figure BDA0002918007300000033
where r_{i,t} indicates the instantaneous reward obtained by selecting the i-th channel at time t.
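Read together with the surrounding definitions, the finite-horizon objective plausibly has the following shape. This is a hedged sketch, not the original equation (1): the policy symbol π and the expectation are assumptions introduced here, while T and r_{i,t} are taken from the text.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} r_{i,t} \right]
```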
The invention assumes that N channels and one secondary user exist in the environment, that the states of the N channels are all time-varying, and that the channel state depends only on the occupation by the primary user. The secondary user can sense n (n << N) channels in a time slot. In the t-th time slot, the environment information that the secondary user can observe is:
O_t = [o_{1,t}, o_{2,t}, ..., o_{N,t}] (2)
where o_{i,t} represents the secondary user's observation information for the i-th channel at time t:
Figure BDA0002918007300000041
where x_{i,t} is the channel state of the i-th channel at time t. After a temporary memory mechanism is introduced, the secondary user can store the observation information of the previous M steps. The M steps of observation information form an observation matrix, and the observation matrix at time t can be expressed as:
S_t = [O_{t-1}, O_{t-2}, O_{t-3}, ..., O_{t-M}] (4)
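The temporary memory mechanism can be sketched with a fixed-length queue. The coding of a single observation is an assumption made for illustration (+1 for a sensed idle channel, -1 for a sensed busy channel, 0 for channels that were not sensed), as are the sizes N = 5 and M = 3 and the helper names `observe` and `push`.

```python
from collections import deque

N, M = 5, 3  # hypothetical number of channels and memory depth


def observe(sensed_channel, channel_idle):
    """Build O_t for one sensed channel (0-based index).

    ASSUMPTION: +1 = sensed idle, -1 = sensed busy, 0 = not sensed.
    """
    obs = [0] * N
    obs[sensed_channel] = 1 if channel_idle else -1
    return obs


# Memory starts as M all-zero observations (nothing sensed yet).
memory = deque([[0] * N for _ in range(M)], maxlen=M)


def push(obs):
    """Store the newest observation and return S_t, newest row first."""
    memory.appendleft(obs)  # maxlen drops the oldest row from the right end
    return [list(row) for row in memory]


push(observe(0, True))
push(observe(2, False))
S = push(observe(4, True))
```

After three pushes, `S` holds the three most recent observations in the order O_{t-1}, O_{t-2}, O_{t-3}; a fourth push would discard the oldest row.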
After sensing the n channels, the secondary user accesses the one that best meets its own QoS requirement. When n channels are selected for sensing, the number of elements of the action set is:
Figure BDA0002918007300000042
For example, if 5 channels exist in the environment and two channels are selected for sensing in a single time slot, the action set is A = {(1,2), (1,3), (1,4), ..., (4,5)}. If only one channel can be sensed in a single time slot, the action set is simply the set of channels existing in the environment:
A = {1, 2, 3, ..., N} (6)
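The action set can be enumerated directly. The sketch below assumes, consistent with the 5-channel example, that an action is an unordered choice of n distinct channels, so the set has C(N, n) elements; the function name `action_set` is hypothetical.

```python
from itertools import combinations
from math import comb


def action_set(num_channels, num_sensed):
    """All unordered choices of num_sensed channels out of num_channels (1-based)."""
    return list(combinations(range(1, num_channels + 1), num_sensed))


actions = action_set(5, 2)      # the example with 5 channels, 2 sensed per slot
single = action_set(5, 1)       # one channel sensed per slot
```

For 5 channels and 2 sensed channels this yields the 10 pairs from (1, 2) to (4, 5), and for a single sensed channel it reduces to one action per channel.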
If the channel that the secondary user selects to sense and access is idle and meets its own quality of service requirement, the decision is correct and positive feedback is obtained. If the selected channel is occupied by the primary user, the decision is wrong and a negative feedback penalty is received. Because all channels that meet the secondary user's quality of service requirement may be busy during a certain period, a small positive feedback is still given when the selected channel is idle but does not meet the quality of service requirement. The reward function may be represented by the following equation:
Figure BDA0002918007300000043
The quality of service requirement of the secondary user is determined by the throughput: the quality of service is qualified only if the throughput obtained on the accessed channel is above the threshold. D_i represents the throughput obtained on the i-th channel, and η is the throughput threshold of the secondary user. (D_i - η)/η is the ratio of the difference between the throughput obtained on the i-th channel and the threshold η to the threshold itself; its main purpose is to guide the secondary user toward better channels.
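The three cases of the reward function can be sketched as below. Only the case structure and the term (D_i - η)/η come from the text; the base reward of 1.0, the small bonus of 0.1, the penalty of -1.0 and the way the ratio is added are assumptions chosen for illustration, and the function name `reward` is hypothetical.

```python
def reward(channel_idle, throughput, eta, small_bonus=0.1, penalty=-1.0):
    """Three-case reward for one sensed-and-accessed channel.

    ASSUMPTION: the numeric constants and the additive use of
    (throughput - eta) / eta are illustrative, not the patented values.
    """
    if not channel_idle:
        return penalty  # channel occupied by the primary user: wrong decision
    if throughput >= eta:
        # Idle and QoS satisfied: positive reward that grows with the margin.
        return 1.0 + (throughput - eta) / eta
    return small_bonus  # idle but below the QoS threshold: small positive reward
```

With this shape, two idle channels that both satisfy the threshold are still ranked by throughput, which is the guiding effect the text attributes to the ratio term.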
The asynchronous advantage actor-critic network is divided into local networks and a global network. Each local network interacts with the environment independently, so each local network has its own actor network and critic network. The local actor networks each interact independently with the channel environment, the critic network evaluates the action strategy of the actor network, and all local networks have exactly the same structure. The global network does not interact with the environment; its main task is to collect the gradient data of each local network, update the network with that gradient data, and transmit the updated network parameters to each local network.
The actor network in the local network interacts with the environment and selects actions. Its main task is policy learning, and the gradient is computed directly on the policy:
Figure BDA0002918007300000044
where J(θ) represents the objective function of the policy network; π_θ(s, a) represents the probability of selecting action a in state s when the network parameters are θ; d(s) represents the number of states collected for this interaction; and
Figure BDA0002918007300000055
represents the immediate reward obtained by selecting action a in state s.
The local critic network is mainly used to estimate the state value, evaluate the quality of the actor network's action strategy, and guide the actor network update through the advantage function. The advantage function measures how much better a given action a is than the average in state s. Multi-step sampling is employed in the asynchronous advantage actor-critic network to accelerate convergence:
Figure BDA0002918007300000051
where V(s) represents the value of state s, which can be estimated by the critic network. Combined with equation (9), the policy gradient calculation of equation (8) becomes:
Figure BDA0002918007300000052
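Multi-step sampling of the advantage can be sketched as follows. The sketch assumes the common n-step form, in which the discounted rewards of a rollout are bootstrapped with the critic's value of the state reached at the end; the function name `n_step_advantages` and the default discount factor are hypothetical.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.9):
    """Multi-step advantage estimates for one rollout.

    ASSUMPTION: A_t = r_t + gamma*r_{t+1} + ... + gamma^k * V(s_end) - V(s_t),
    where V(s_end) is the critic's value of the state after the last step.
    rewards[t] and values[t] refer to step t of the rollout.
    """
    advantages = [0.0] * len(rewards)
    ret = bootstrap_value  # discounted return, accumulated backwards in time
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages


adv = n_step_advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], 0.0, gamma=0.5)
```

Working backwards through the rollout means every step reuses the return already accumulated for the later steps, so the whole batch costs one pass.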
The global network does not interact with the environment; its main task is to collect the gradient data of each local network, update the network with that gradient data, and transmit the updated network parameters to each local network. Because parameters and gradients are exchanged between them, the structure of the global network is kept identical to that of the local networks. The actor network in the global network is also responsible for updating the action strategy, and its gradient update can be expressed as:
Figure BDA0002918007300000053
where θ represents the parameters of the global actor network; A(s, a) represents the advantage function, which indicates how good the action is in the given environment state; H(π_θ'(s)) is the policy entropy, used to increase the exploration ability of the secondary user; and β represents the policy entropy weight, which controls the degree of exploration. After the advantage function is introduced, the global critic network improves the fitting accuracy of the value function by minimizing the square of the advantage function, and its gradient update can be expressed as:
Figure BDA0002918007300000054
where μ represents a parameter of the global critic network; r represents the instant reward obtained by the secondary user; gamma is a discount factor; λ is the learning rate of the critic network.
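For orientation, the update rules in the commonly used A3C formulation have the shape below. This is a sketch under stated assumptions rather than a reproduction of equations (11) and (12): the actor step size α and the next state s' are symbols introduced here, and θ' and μ' are assumed to denote the local actor and critic parameters.

```latex
\theta \leftarrow \theta + \alpha \left[ \nabla_{\theta'} \log \pi_{\theta'}(s,a)\, A(s,a)
        + \beta\, \nabla_{\theta'} H\!\left(\pi_{\theta'}(s)\right) \right]

\mu \leftarrow \mu - \lambda\, \frac{\partial \left( r + \gamma V_{\mu'}(s') - V_{\mu'}(s) \right)^{2}}{\partial \mu'}
```

The first rule raises the probability of actions with positive advantage while the entropy term keeps the policy exploratory; the second rule moves the critic toward the one-step bootstrapped target, matching the description of minimizing the squared advantage.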
Simulation parameter settings of the simulation example: the simulation parameters are divided into system environment parameters and neural network parameters. The system environment parameters are as follows. There are N = 10 independent channels in the environment, each of which may be occupied by a primary user with occupation probability P_busy in (0, 1), and the channel signal-to-noise ratio ranges over [-10, 10] dB. In the simulation experiments, the signal-to-noise ratios of the 10 channels are set to SNR = [-10, -8, -9, -5, -3, 0, 4, 5, 7, 10] dB, with corresponding occupation probabilities P_busy = [0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9]. The neural network parameters are as follows. The actor and critic networks of the local networks and the global network have the same structure. The actor network has one hidden layer with 200 neurons and a rectified linear unit (ReLU) activation function; its output layer directly outputs the action selection probability distribution. The critic network also has one hidden layer with 200 neurons and a ReLU activation function; its output layer outputs the value estimate. The learning rate of the critic network must be greater than or equal to that of the actor network; the invention sets the critic learning rate to Lr_c = 0.001 and the actor learning rate to Lr_a = 0.0001. The access success rate is defined as the probability that the secondary user successfully accesses an idle channel that satisfies the quality of service.
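The channel environment described by these parameters can be mocked up in a few lines. The sketch assumes that each channel is occupied independently in every time slot with its listed probability, which is a simplification the text does not state; the names `sample_channel_states` and `idle_fraction`, the seed and the number of slots are hypothetical.

```python
import random

# Simulation parameters taken from the text.
SNR_DB = [-10, -8, -9, -5, -3, 0, 4, 5, 7, 10]
P_BUSY = [0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9]


def sample_channel_states(rng):
    """Return one slot of channel states, True meaning idle.

    ASSUMPTION: independent Bernoulli occupancy per channel and per slot.
    """
    return [rng.random() >= p for p in P_BUSY]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
slots = [sample_channel_states(rng) for _ in range(5000)]
idle_fraction = [
    sum(slot[i] for slot in slots) / len(slots) for i in range(len(P_BUSY))
]
```

Over many slots the empirical idle fraction of each channel approaches 1 - P_busy, so the 10th channel, despite having the highest signal-to-noise ratio, is idle only about one slot in ten.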
As shown in FIG. 2, three channels in the environment meet the QoS requirement, and the figure shows how many times the secondary user selects each of these three channels for sensing and access when it senses once per time slot. At the start of the iterations, because of exploration, the three channels are selected almost the same number of times. As the iterations progress, the 10th channel is selected less and less: although its signal-to-noise ratio is high, it is frequently occupied by the primary user. The secondary user learns to consider channel access over a longer horizon and can therefore effectively avoid inferior channels. The other two channels that meet the QoS requirement are gradually selected more often because their primary users occupy them less frequently. Meanwhile, because of the design of the return function, the secondary user prefers the 9th channel when the occupation probability of the primary user is not high, which shows that the return function guides the secondary user toward better decisions.
As shown in FIG. 3, with three channels in the environment meeting the QoS requirement, the invention is compared with sequential sensing algorithms that know different amounts of prior information, in terms of access success rate for different numbers of sensing attempts. The fully-known sensing algorithm assumes that the secondary user knows the signal-to-noise ratio of every channel and the primary-user occupation probability of each channel, and orders the channels for sequential sensing by the product of the signal-to-noise ratio and the idle probability, SNR·(1 - P_busy). The figure shows that, because of the nature of sequential sensing, fully-known sensing always senses a fixed channel, so with a single sensing attempt its access success rate depends entirely on the first sensed channel. The sensing access algorithm provided by the invention is not restricted to a fixed sensing order and can intelligently select a suitable channel for access.
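The sensing order of the fully-known baseline can be reproduced from the simulation parameters. The sketch assumes that the SNR values in dB are multiplied by the idle probability exactly as written in the text, without converting to linear scale; the function name `sensing_order` is hypothetical.

```python
# Simulation parameters taken from the text (channels are numbered 1..10).
SNR_DB = [-10, -8, -9, -5, -3, 0, 4, 5, 7, 10]
P_BUSY = [0.1, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4, 0.4, 0.9]


def sensing_order(snr, p_busy):
    """Rank channels by SNR * (1 - P_busy), best first.

    ASSUMPTION: the SNR is used in dB as listed, not in linear scale.
    """
    scores = [s * (1.0 - p) for s, p in zip(snr, p_busy)]
    channels = range(1, len(snr) + 1)
    return sorted(channels, key=lambda ch: scores[ch - 1], reverse=True)


order = sensing_order(SNR_DB, P_BUSY)
```

Under this assumption the baseline always senses the 9th channel first, which is why its success rate with a single sensing attempt is tied to that one channel's occupancy.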
In summary, the invention provides a channel quality access method in cognitive radio. Each local network contains an actor network and a critic network: the actor network is responsible for channel selection and interacts with the environment to collect interaction information, and the critic network evaluates the quality of the actor network's channel selection strategy. The local networks do not apply gradient updates themselves; they accumulate gradients and transmit them to the global network. The global network does not interact with the environment; it collects the gradients gathered by the local networks, performs the gradient update, and transmits the updated network parameters back to the local networks. The invention jointly considers channel quality and idle probability, so the secondary user can effectively avoid accessing inferior channels, and the success rate of accesses that meet the quality of service requirement is greatly improved.
The technical solution of the present invention is not limited to the technical method, and the present invention can be extended to other modifications, variations, applications and embodiments in application, and all such modifications, variations, applications, embodiments are considered to be within the spirit and teaching scope of the present invention.

Claims (6)

1. A channel quality access method in cognitive radio, characterized in that the method comprises the following steps:
1.1, initializing the actor network and critic network parameters in the global network, and assigning the global network parameters to the local networks;
1.2, in each local network, the secondary user selects a channel to access according to the current strategy and an observation matrix formed from observation information; the secondary user senses and accesses the selected channel and obtains an instant return according to the channel state;
1.3, after a number of iterations, calculating the gradients of the local actor network and the local critic network respectively, transmitting the gradients to the global network, and resetting the gradients of the local actor network and critic network;
1.4, updating the global actor network according to the actor network update function, updating the global critic network according to the critic network update function, and assigning the updated global network parameters to the local networks;
1.5, repeating steps 1.2 to 1.4 until all cycles are finished, which yields the complete neural network model.
2. The method of claim 1, characterized in that: a plurality of accessible channels exist in the environment, and the secondary user quickly finds and accesses a channel that meets its own quality of service requirement.
3. The method of claim 1, characterized in that: in step 1.1, the neural networks of the global network and the local networks have the same structure, wherein the actor network has one hidden layer with 200 neurons and a rectified linear unit (ReLU) activation function, and the critic network has one hidden layer with 200 neurons and a ReLU activation function.
4. The method of claim 1, characterized in that: in step 1.2, each local network interacts with the environment independently and has its own actor network and critic network; the local actor networks each interact independently with the channel environment, the critic network evaluates the action strategy of the actor network, and all local networks have exactly the same structure.
5. The method of claim 1, characterized in that: for the observation matrix in step 1.2, the secondary user can only observe the state of the channel selected for sensing, and the observation information of the secondary user in the t-th time slot is:
O_t = [o_{1,t}, o_{2,t}, ..., o_{N,t}]
after a temporary memory mechanism is introduced, the secondary user stores the observation information of the previous M steps, the M steps of observation information form an observation matrix, and the observation matrix at time t can be expressed as:
S_t = [O_{t-1}, O_{t-2}, O_{t-3}, ..., O_{t-M}]
a return is obtained after interacting with the environment, and the return function is:
Figure FDA0002918007290000011
if the channel that the secondary user selects to sense and access is idle and meets its own quality of service requirement, the decision is correct and positive feedback is obtained; if the selected channel is occupied by the primary user, the decision is wrong and a negative feedback penalty is received; because all channels that meet the secondary user's quality of service requirement may be busy during a certain period, a small positive feedback is still given when the selected channel is idle but does not meet the quality of service requirement; D_i represents the throughput obtained on the i-th channel, η is the throughput threshold of the secondary user, and (D_i - η)/η is the ratio of the difference between the throughput obtained on the i-th channel and the threshold η to the threshold itself, whose main purpose is to guide the secondary user toward better channels.
6. The method of claim 1, characterized in that: in step 1.3, the update function of the global actor network is:
Figure FDA0002918007290000021
where θ represents the parameters of the global actor network, A(s, a) represents the advantage function, which indicates how good the action is in the given environment state, and H(π_θ'(s)) is the policy entropy, used to increase the exploration ability of the secondary user;
the update function for the global critic network is:
Figure FDA0002918007290000022
where μ denotes the parameters of the global critic network, r is the immediate return obtained by the secondary user, γ is the discount factor, and λ is the learning rate of the critic network.
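The quantities named in claim 6 can be made concrete with a small Python sketch. Both update functions appear only as images in the source, so the forms below follow the commonly used asynchronous advantage actor-critic formulation and are an assumption, not the patent's exact expressions; the one-step advantage estimate, the entropy weight `beta`, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn actor logits into a channel-selection distribution pi(.|s)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(probs):
    """H(pi(s)) = -sum_a pi(a|s) * log pi(a|s); larger means more exploration."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def td_advantage(r, v_s, v_next, gamma):
    """One-step advantage estimate A(s, a) = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s


def actor_objective(probs, action, advantage, beta):
    """Quantity whose gradient w.r.t. the actor parameters is accumulated:
    log pi(a|s) * A(s, a) + beta * H(pi(s))."""
    return math.log(probs[action]) * advantage + beta * policy_entropy(probs)


def critic_loss(advantage):
    """Squared temporal-difference error minimised by the critic."""
    return advantage ** 2


def critic_step(v_s, advantage, lam):
    """One gradient-descent step on the critic loss with respect to V(s),
    holding the target r + gamma * V(s') fixed; lam is the learning rate."""
    return v_s + lam * 2.0 * advantage
```

With two equally likely channels the entropy equals ln 2, its maximum for two actions, which is the situation the entropy term rewards early in training.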
CN202110107271.7A 2021-01-27 2021-01-27 Channel quality access method in cognitive radio Active CN112954814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110107271.7A CN112954814B (en) 2021-01-27 2021-01-27 Channel quality access method in cognitive radio

Publications (2)

Publication Number Publication Date
CN112954814A true CN112954814A (en) 2021-06-11
CN112954814B CN112954814B (en) 2022-05-20

Family

ID=76237380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110107271.7A Active CN112954814B (en) 2021-01-27 2021-01-27 Channel quality access method in cognitive radio

Country Status (1)

Country Link
CN (1) CN112954814B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108471619A (en) * 2018-03-22 2018-08-31 中南大学 Channel selection method for cognitive wireless sensor networks
CN109089307A (en) * 2018-07-19 2018-12-25 浙江工业大学 Throughput maximization method for energy-harvesting wireless relay networks based on the asynchronous advantage actor-critic algorithm
CN109379752A (en) * 2018-09-10 2019-02-22 中国移动通信集团江苏有限公司 Optimization method, device, equipment and medium for Massive MIMO
CN110190918A (en) * 2019-04-25 2019-08-30 广西大学 Spectrum access method for cognitive wireless sensor networks based on deep Q-learning
CN110492955A (en) * 2019-08-19 2019-11-22 上海应用技术大学 Spectrum prediction switching method based on transfer learning strategy
CN110691422A (en) * 2019-10-06 2020-01-14 湖北工业大学 Multi-channel intelligent access method based on deep reinforcement learning
CN111262638A (en) * 2020-01-17 2020-06-09 合肥工业大学 Dynamic spectrum access method based on efficient sample learning
WO2020152389A1 (en) * 2019-01-22 2020-07-30 Nokia Solutions And Networks Oy Machine learning for a communication network
CN112188503A (en) * 2020-09-30 2021-01-05 南京爱而赢科技有限公司 Dynamic multichannel access method based on deep reinforcement learning and applied to cellular network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
P. YANG ET AL: "Dynamic Spectrum Access in Cognitive Radio Networks Using Deep Reinforcement Learning and Evolutionary Game", 《2018 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC)》 *
Z. SHI, X. XIE AND H. LU: "Deep Reinforcement Learning Based Intelligent User Selection in Massive MIMO Underlay Cognitive Radios", 《IEEE ACCESS》 *
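郭冰洁: "Research on multi-channel dynamic spectrum access algorithms in cognitive radio systems", 《Information Science and Technology Series》 *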

Also Published As

Publication number Publication date
CN112954814B (en) 2022-05-20

Similar Documents

Publication Publication Date Title
CN112134916B (en) Cloud edge collaborative computing migration method based on deep reinforcement learning
Wang et al. A survey on applications of model-free strategy learning in cognitive wireless networks
CN107690176B (en) Network selection method based on Q learning algorithm
CN109474980A (en) Wireless network resource allocation method based on deep reinforcement learning
CN112367132B (en) Power allocation algorithm in cognitive radio based on reinforcement learning
CN111262638B (en) Dynamic spectrum access method based on efficient sample learning
CN112188503B (en) Dynamic multichannel access method based on deep reinforcement learning and applied to cellular network
CN113596785B (en) D2D-NOMA communication system resource allocation method based on deep Q network
CN112492691A (en) Downlink NOMA power allocation method based on deep deterministic policy gradient
CN108833227A (en) Smart home communication optimization scheduling system and method based on edge computing
Rao et al. Network selection in heterogeneous environment: A step toward always best connected and served
Zhang et al. Endogenous security-aware resource management for digital twin and 6G edge intelligence integrated smart park
CN103249050B (en) Based on the multiple dimensioned frequency spectrum access method of business demand
CN114126021A (en) Green cognitive radio power allocation method based on deep reinforcement learning
CN114051252A (en) Multi-user intelligent transmitting power control method in wireless access network
CN112954814B (en) Channel quality access method in cognitive radio
Mishra et al. Raddpg: Resource allocation in cognitive radio with deep reinforcement learning
Prasad et al. Intelligent spectrum sharing and sensing in cognitive radio network by using AROA (adaptive rider optimization algorithm)
CN106131920A (en) Heterogeneous network selection method based on multiple attributes and queuing theory
CN113395757B (en) Deep reinforcement learning cognitive network power control method based on improved return function
CN115250156A (en) Wireless network multichannel spectrum access method based on federated learning
CN114980254B (en) Dynamic multichannel access method and device based on dueling deep recurrent Q network
Han et al. Distributed hierarchical game-based algorithm for downlink power allocation in OFDMA femtocell networks
Sun et al. EWA Selection strategy with channel handoff scheme in cognitive radio
CN112383965B (en) Cognitive radio power allocation method based on DRQN and multi-sensor model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant