CN111726811A - Slice resource allocation method and system for cognitive wireless network - Google Patents
- Publication number
- CN111726811A (application number CN202010457568.1A)
- Authority
- CN
- China
- Prior art keywords
- resource allocation
- user
- ultra
- reinforcement learning
- slice resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04W16/04—Network planning: traffic adaptive resource partitioning
- H04W16/22—Network planning: traffic simulation tools or models
- H04B17/336—Measuring or estimating channel quality parameters: signal-to-interference ratio [SIR] or carrier-to-interference ratio [CIR]
- H04B17/382—Monitoring or testing of propagation channels for resource allocation, admission control or handover
Abstract
The embodiment of the invention provides a slice resource allocation method and system for a cognitive wireless network. The method comprises the following steps: establishing a cognitive wireless network slice resource allocation model based on an enhanced mobile broadband (eMBB) slice and an ultra-reliable low-latency communication (URLLC) slice; and performing deep reinforcement learning on the model based on an Actor-Critic deep reinforcement learning algorithm to obtain the optimal slice resource allocation solution. The Actor-Critic algorithm defines a user state and an action from the current time step to the next, and constructs a system reward function from the state and the action. By combining the network slicing technology with the Actor-Critic deep reinforcement learning algorithm in cognitive network resource allocation, resources are allocated optimally under limited spectrum resources and limited transmit power, so that system throughput is maximized.
Description
Technical Field
The invention relates to the technical field of wireless communication, in particular to a slice resource allocation method and system for a cognitive wireless network.
Background
With the rapid development of wireless communication technology, the use of wireless devices (e.g., vehicles, mobile phones, tablets, and various wireless sensors) has grown rapidly, driving the development of fifth-generation (5G) wireless communication. In 5G wireless networks, data rates are expected to be ten times higher than current rates, and strong connectivity with 100% coverage is expected to provide better quality of service and user experience. In practice, however, spectrum resources are limited, and spectrum usage is regulated for security and stability considerations. Spectrum access rights are typically granted to licensed users, and unlicensed users are not allowed to send and receive data over licensed regions of the spectrum. A contradiction therefore arises between limited spectrum resources and the growing number of users, and how to allocate resources intelligently in the cognitive wireless network has become a research hotspot.
In cognitive wireless networks, unauthorized users (secondary users) are allowed to communicate within a licensed region of the spectrum as long as that portion of the spectrum is not being used by authorized users (primary users). Network slicing is one of the key features of the 5G network: in essence, an operator's physical network is divided into multiple virtual networks, each tailored to different service requirements such as latency, security, bandwidth, and reliability, so that different application scenarios can be served flexibly. The 5G network accordingly provides three application domains, enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive Internet of Things, which meet different communication characteristics and communication requirements.
In addition, reinforcement learning is a branch of artificial intelligence that refers to a class of problems, and methods for solving them, in which learning proceeds continuously from interaction. A reinforcement learning problem can be described as an agent continuously learning from interactions with the environment to accomplish a particular goal (e.g., to achieve a maximum reward value). Reinforcement learning algorithms are now widely applied in fields such as games, communications, and medicine. Common reinforcement learning methods fall into two categories, model-based and model-free; given the complexity of communication scenarios, model-free reinforcement learning is generally adopted. Model-free methods are further subdivided as follows: value-function-based methods, including dynamic programming, the Monte Carlo method, temporal-difference learning, Q-learning, and deep Q-learning; and policy-function-based methods, including the REINFORCE algorithm and REINFORCE with a baseline. In general, value-function-based methods such as Q-learning can over-estimate during policy updates, which affects convergence, but they handle discrete problems well; policy-function-based methods are more stable during policy updates, but because the solution space of the policy function is larger, sufficient sampling is difficult, the variance is larger, and they easily converge to a locally optimal solution.
Disclosure of Invention
The embodiment of the invention provides a slice resource allocation method and system for a cognitive wireless network, to solve, or at least partially solve, the problems in the prior art.
In a first aspect, an embodiment of the present invention provides a slice resource allocation method for a cognitive wireless network, including:
establishing a cognitive wireless network slice resource allocation model based on an enhanced mobile broadband (eMBB) slice and an ultra-reliable low-latency communication (URLLC) slice;
performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain the optimal slice resource allocation solution; the Actor-Critic algorithm defines a user state and an action from the current time step to the next, and a system reward function is constructed from the user state and the action.
Further, before performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on the Actor-Critic deep reinforcement learning algorithm to obtain the optimal slice resource allocation solution, the method further includes:
obtaining a fully connected neural network model to construct the Actor-Critic deep reinforcement learning algorithm network.
Further, establishing the cognitive wireless network slice resource allocation model based on the eMBB slice and the URLLC slice specifically includes:
defining the throughput of an eMBB primary user and the outage probability of a URLLC primary user;
and defining a system optimization objective and system constraints based on the primary-user throughput and the primary-user outage probability, with maximum system throughput as the goal, to construct the cognitive wireless network slice resource allocation model.
Further, defining the throughput of the eMBB primary user and the outage probability of the URLLC primary user further includes:
obtaining the primary-user throughput from the bandwidth of any eMBB primary user and that user's signal-to-interference-plus-noise ratio (SINR) on any channel, where the SINR is obtained from the channel gain from the eMBB primary user's transmitter to the primary-user receiver, the cross channel gain between the secondary user's transmitter and the primary-user receiver, the transmit power of the eMBB primary-user transmitter on the channel, and the transmit power of the eMBB secondary-user transmitter on the channel;
and obtaining the primary-user outage probability from the delay of any URLLC user, that user's maximum tolerable delay, and the maximum data arrival rate.
Further, defining the system optimization objective and system constraints based on the primary-user throughput and outage probability, with maximum system throughput as the goal, to construct the cognitive wireless network slice resource allocation model specifically includes:
taking maximization of the sum of the throughputs of all secondary users in the system as the system optimization objective;
requiring that the rate of any eMBB user be no lower than a first preset value;
requiring that the probability that any URLLC user fails to meet its low-delay requirement be smaller than a second preset value;
requiring that one secondary user occupy only one channel;
and requiring that the secondary-user transmitter power not exceed a third preset value.
Further, performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on the Actor-Critic deep reinforcement learning algorithm to obtain the optimal slice resource allocation solution, where the algorithm defines a user state and an action from the current time step to the next and constructs a system reward function from them, specifically includes:
treating all secondary users as a single agent and defining the signal-to-interference-plus-noise-ratio state function of all primary users at any time;
obtaining, from the SINR state function, the agent's action function from the current time step to the next, the action comprising a representation of the sub-carrier occupied by each secondary user at any time and a representation of each secondary user's power state at any time;
and setting the agent's reward function to the sum of the throughputs of all secondary users, the value of the reward function being determined by whether the eMBB users meet the rate constraint and whether the URLLC users meet the power constraint.
Further, obtaining the fully connected neural network model to construct the Actor-Critic deep reinforcement learning algorithm network specifically includes:
acquiring a three-layer linear neural network in which the number of input-layer neurons is a first preset parameter, the number of neurons in the middle hidden layer is a second preset parameter, the input layer and the middle hidden layer use ReLU as the activation function, the number of output-layer neurons is a third preset parameter, and the output layer uses sigmoid and softmax as activation functions;
and constructing an Actor network and a Critic network, respectively, based on the three-layer linear neural network.
In a second aspect, an embodiment of the present invention provides a slice resource allocation system for a cognitive wireless network, including:
the construction module is used for establishing a cognitive wireless network slice resource allocation model based on an enhanced mobile broadband (eMBB) slice and an ultra-reliable low-latency communication (URLLC) slice;
the solving module is used for performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain the optimal slice resource allocation solution; the Actor-Critic algorithm defines a user state and an action from the current time step to the next, and a system reward function is constructed from the user state and the action.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
the chip resource allocation method for the cognitive wireless network comprises the following steps of a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of any one of the chip resource allocation methods for the cognitive wireless network.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of any one of the above slice resource allocation methods for a cognitive wireless network.
According to the slice resource allocation method and system for a cognitive wireless network provided by the embodiments of the invention, the slicing technology and the Actor-Critic deep reinforcement learning algorithm are combined in cognitive network resource allocation, and resources are allocated optimally under limited spectrum resources and limited transmit power, so that system throughput is maximized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a slice resource allocation method for a cognitive wireless network according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an Actor and critical network provided in an embodiment of the present invention;
fig. 3 is a structural diagram of a slice resource allocation system for a cognitive wireless network according to an embodiment of the present invention;
fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the defects of the prior art, the embodiment of the invention provides a slice resource allocation method for a cognitive wireless network that realizes joint power and channel allocation for secondary users while guaranteeing primary-user service, maximizing the total throughput of all secondary users in the system.
Fig. 1 is a flowchart of a slice resource allocation method for a cognitive radio network according to an embodiment of the present invention, as shown in fig. 1, including:
s1, establishing a cognitive radio network slice resource allocation model based on the enhanced mobile broadband slice and the ultra-high reliable ultra-low time delay communication slice;
s2, performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm comprises a user state and an action from the current moment to the next moment, and a system reward function is constructed by the user state and the action.
Specifically, resource allocation for two 5G application scenarios, eMBB and URLLC, is considered, and the network resources are sliced accordingly. The resource allocation problem is mapped onto a general reinforcement learning model, and a cognitive wireless network slice resource allocation model is established, including the system's optimization objective and constraints. A deep reinforcement learning resource allocation method based on Actor-Critic, the CNAC algorithm, is then proposed together with a reward-setting mechanism: the constraints and the optimization objective of the slice resource allocation model are folded into a single reward function, and the optimal resource allocation of the cognitive system is obtained by solving this learning problem.
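As a minimal sketch of the mapping just described (illustrative Python; the patent itself gives no code, and the environment, actor, and critic interfaces here are assumptions), the agent-environment loop underlying the CNAC algorithm can be written as follows, with the Critic's one-step TD error evaluating the Actor's joint channel/power decision:

```python
def td_error(reward, value_s, value_s_next, gamma=0.99):
    """One-step temporal-difference error used by the Critic:
    delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_s_next - value_s

def run_episode(env, actor, critic, gamma=0.99, steps=10):
    """Illustrative interaction loop: the agent (all secondary users
    jointly) observes the primary users' SINRs, picks a channel/power
    action, and receives the sum secondary-user throughput as reward."""
    s = env.reset()
    deltas = []
    for _ in range(steps):
        a = actor(s)                 # joint channel + power choice
        s_next, r = env.step(a)      # reward folds in the constraints
        deltas.append(td_error(r, critic(s), critic(s_next), gamma))
        s = s_next                   # deltas would drive both updates
    return deltas
```

In a full implementation the TD errors returned here would drive the Critic's value update and, as the advantage signal, the Actor's policy-gradient update.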
In the embodiment of the invention, the slicing technology and the Actor-Critic deep reinforcement learning algorithm are combined in cognitive network resource allocation, and resources are allocated optimally under limited spectrum resources and limited transmit power, so that system throughput is maximized.
Based on the above embodiment, the method further includes, before step S2:
and obtaining a fully connected neural network model to construct the Actor-Critic deep reinforcement learning algorithm network.
Specifically, the Actor-Critic based deep reinforcement learning algorithm adopts a neural network structure and comprises two networks, an Actor network and a Critic network, which share the same structure.
Based on any of the above embodiments, step S1 of the method specifically includes:
defining the throughput of an eMBB primary user and the outage probability of a URLLC primary user;
and defining a system optimization objective and system constraints based on the primary-user throughput and the primary-user outage probability, with maximum system throughput as the goal, to construct the cognitive wireless network slice resource allocation model.
Defining the throughput of the eMBB primary user and the outage probability of the URLLC primary user further includes:
obtaining the primary-user throughput from the bandwidth of any eMBB primary user and that user's signal-to-interference-plus-noise ratio (SINR) on any channel, where the SINR is obtained from the channel gain from the eMBB primary user's transmitter to the primary-user receiver, the cross channel gain between the secondary user's transmitter and the primary-user receiver, the transmit power of the eMBB primary-user transmitter on the channel, and the transmit power of the eMBB secondary-user transmitter on the channel;
and obtaining the primary-user outage probability from the delay of any URLLC user, that user's maximum tolerable delay, and the maximum data arrival rate.
Defining the system optimization objective and system constraints based on the primary-user throughput and the primary-user outage probability, with maximum system throughput as the goal, to construct the cognitive wireless network slice resource allocation model specifically includes the following steps:
taking maximization of the sum of the throughputs of all secondary users in the system as the system optimization objective;
requiring that the rate of any eMBB user be no lower than a first preset value;
requiring that the probability that any URLLC user fails to meet its low-delay requirement be smaller than a second preset value;
requiring that one secondary user occupy only one channel;
and requiring that the secondary-user transmitter power not exceed a third preset value.
Specifically, a cognitive wireless network slice resource allocation model is first established in which enhanced mobile broadband (eMBB) slice users and ultra-reliable low-latency communication (URLLC) slice users are considered.
The throughput of eMBB primary user m is defined to satisfy

c_{m,k}(t) ≥ μ_0, m ∈ M_1

where c_{m,k}(t), the data transmission rate of primary user m on channel k, is

c_{m,k}(t) = B_m log_2(1 + SINR_{m,k}(t)), SINR_{m,k}(t) = g_{m,k} p_m(k) / (Σ_{n} g_{nm,k} p_{n,k}(t) + σ²)

Here g_{m,k} and g_{nm,k} denote the channel gain from the transmitter of primary user m to the receiver of primary user m and the cross channel gain between secondary user n and primary user m on channel k, respectively; p_m(k) is the transmit power of the primary-user transmitter on the k-th channel; p_{n,k}(t) is the transmit power of the secondary-user transmitter on the k-th channel; B_m is the bandwidth of user m; B is the bandwidth of the whole cognitive system; σ² is the noise power; and μ_0 is the minimum throughput requirement of user m.
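The rate and SINR definitions above can be evaluated numerically as in the following sketch (illustrative Python; the function and argument names, and the noise value in the example, are assumptions, not taken from the patent):

```python
import math

def primary_sinr(g_direct, p_primary, cross_gains, secondary_powers, noise=1e-9):
    """SINR of a primary user on one channel: desired received power over
    interference from the secondary transmitters plus noise power."""
    interference = sum(g * p for g, p in zip(cross_gains, secondary_powers))
    return (g_direct * p_primary) / (interference + noise)

def shannon_rate(bandwidth_hz, sinr):
    """Shannon rate B_m * log2(1 + SINR), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + sinr)
```

For example, with unit gain and power, no interferers, and unit noise, shannon_rate(1.0, primary_sinr(1.0, 1.0, [], [], noise=1.0)) evaluates to 1 bit/s.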
For URLLC slice users, assuming that the packet arrival process of slice-2 users can be represented by an M/M/1/∞ queueing system and that the packet length follows an exponential distribution, the outage probability of primary user m is

P_m^{out} = Pr{d_m > d_{m,β}}, m ∈ M_2

where d_m is the delay of user m, d_{m,β} is the maximum tolerable delay of user m, and r_m is the maximum data arrival rate.
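Under the M/M/1 assumption the sojourn (delay) time is exponentially distributed with rate μ − λ, which yields one common closed form for this outage probability; the patent text does not reproduce its exact expression, so the formula below is an assumption consistent with that queueing model:

```python
import math

def outage_probability(service_rate, arrival_rate, d_max):
    """P(d_m > d_max) for an M/M/1 queue: the sojourn time is exponential
    with rate (mu - lambda), so the tail is exp(-(mu - lambda) * d_max).
    An unstable queue (mu <= lambda) is treated as certain outage."""
    if service_rate <= arrival_rate:
        return 1.0
    return math.exp(-(service_rate - arrival_rate) * d_max)
```

The service rate μ would be derived from the user's allocated rate and the mean packet length, and λ from the data arrival rate r_m.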
Further, with maximum system throughput as the goal, the proposed optimization objective and constraints are:

max Σ_{n∈N} Σ_{k} c_{n,k}(t)
s.t. C_1: c_{m,k}(t) ≥ μ_0, ∀m ∈ M_1
     C_2: Pr{d_m > d_{m,β}} ≤ τ, ∀m ∈ M_2
     C_3: each channel is occupied by at most one secondary user
     C_4: 0 ≤ p_{n,k}(t) ≤ p_max

Here constraint C_1 states that the rate of an eMBB slice user may not fall below μ_0; C_2 states that the probability of a URLLC slice user failing to meet its low-delay requirement must be below the threshold τ; C_3 states that one channel can be occupied by at most one secondary user; and C_4 is the transmit power constraint of the secondary-user transmitters.
Based on any of the above embodiments, step S2 in the method specifically includes:
treating all secondary users as a single agent and defining the signal-to-interference-plus-noise-ratio state function of all primary users at any time;
obtaining, from the SINR state function, the agent's action function from the current time step to the next, the action comprising a representation of the sub-carrier occupied by each secondary user at any time and a representation of each secondary user's power state at any time;
and setting the agent's reward function to the sum of the throughputs of all secondary users, the value of the reward function being determined by whether the eMBB users meet the rate constraint and whether the URLLC users meet the power constraint.
Specifically, on the basis of the foregoing embodiments, a deep reinforcement learning resource allocation method based on Actor-Critic, the CNAC algorithm, is proposed. All secondary users are regarded as one agent, and the SINRs of all primary users at time t form the state, denoted s_t and expressed as:

s_t = {SINR_1(t), SINR_2(t), ..., SINR_M(t)}

The action taken by the agent in moving from s_t to s_{t+1} is represented by two components: the first indicates which sub-carrier each secondary user occupies at time t, and the second represents each secondary user's transmit power at time t.
Since the objective is to maximize the throughput of the cognitive system, and the two slices have different service requirements, i.e., different constraints, the agent's reward function r(s_t, a_t) is set, following the Lagrangian duality idea of folding the constraints into the objective, to the sum of the throughputs of all secondary users: if the eMBB users meet the rate constraint and the URLLC users meet the power constraint, the reward is the sum of the secondary users' throughputs; when an eMBB user does not meet the rate constraint, the reward is set to 0; and when a URLLC user does not meet its constraint, the reward is likewise set to 0.
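This reward rule can be sketched as follows (illustrative Python; the function and argument names are assumptions introduced for clarity):

```python
def cnac_reward(secondary_throughputs, embb_rates, urllc_constraints_met, mu0):
    """Reward = sum of secondary-user throughputs when every eMBB user
    meets its rate floor mu0 and every URLLC user meets its constraint;
    otherwise 0, so the constraints are folded into the reward."""
    if any(rate < mu0 for rate in embb_rates):
        return 0.0
    if not all(urllc_constraints_met):
        return 0.0
    return float(sum(secondary_throughputs))
```

A zero reward for any violated constraint steers the agent toward allocations that satisfy all constraints while maximizing total secondary throughput.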
Based on any of the above embodiments, obtaining the fully connected neural network model to construct the Actor-Critic deep reinforcement learning algorithm network specifically includes:
acquiring a three-layer linear neural network in which the number of input-layer neurons is a first preset parameter, the number of neurons in the middle hidden layer is a second preset parameter, the input layer and the middle hidden layer use ReLU as the activation function, the number of output-layer neurons is a third preset parameter, and the output layer uses sigmoid and softmax as activation functions;
and constructing an Actor network and a Critic network, respectively, based on the three-layer linear neural network.
Specifically, the Actor-Critic deep reinforcement learning algorithm network includes two networks, an Actor network and a Critic network, as shown in Fig. 2. The Actor is based on a policy algorithm and is responsible for making decisions; the Critic evaluates the Actor's decisions, generating a TD error from the state, action, and reward, which then guides the Actor's subsequent decisions.
It can be understood that the Actor network and the Critic network adopt the same neural network structure. The neural network part of the CNAC algorithm uses a three-layer linear neural network as its main body: the input layer has 16 neurons with ReLU activation, the middle hidden layer has 30 neurons with ReLU activation, and the output layer has 12 neurons using the two activation functions sigmoid and softmax.
The neural network adopts the dropout technique, which improves the generalization ability of the network, reduces its variance, and prevents overfitting. To speed up training, the Adam optimizer is adopted in the network's back-propagation process.
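The 16-30-12 structure described above can be sketched as a plain forward pass (NumPy used for illustration; the weight initialization, the omission of dropout and the Adam update, and the assumption that the softmax channel head and sigmoid power head are read from the same 12 outputs are all choices made here, not details given in the patent):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class CNACNet:
    """Three-layer fully connected body shared (in structure) by the
    Actor and Critic networks: 16 -> 30 -> 12 neurons, ReLU between
    layers, sigmoid/softmax applied at the output layer."""
    def __init__(self, n_in=16, n_hidden=30, n_out=12, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = relu(x @ self.w1 + self.b1)
        out = h @ self.w2 + self.b2
        # softmax head for the channel choice, sigmoid head for power
        return softmax(out), sigmoid(out)
```

The Actor and Critic would each instantiate this structure with their own weights, the Critic replacing the output heads with a scalar value estimate.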
Based on any of the above embodiments, simulation experiments were performed on the embodiment of the invention, with the DQN (Deep Q-Network) algorithm used for comparison. The results show that the proposed CNAC algorithm converges faster and achieves better stability and a lower outage rate.
Fig. 3 is a structural diagram of a slice resource allocation system for a cognitive radio network according to an embodiment of the present invention, as shown in fig. 3, including: a construction module 31 and a solution module 32; wherein:
the construction module 31 is used for establishing a cognitive radio network slice resource allocation model based on the enhanced mobile broadband slice and the ultra-high reliable ultra-low time delay communication slice; the solving module 32 is used for carrying out deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm comprises a user state and an action from the current moment to the next moment, and a system reward function is constructed by the user state and the action.
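The reward construction described above (a system reward function built from the user state and action) can be sketched as follows, taking the reward to be the sum of secondary-user throughputs when the eMBB rate constraint and the URLLC power constraint both hold, as claim 6 describes; the penalty value for a constraint violation is an illustrative assumption.

```python
def slice_reward(secondary_throughputs, embb_rate_ok, urllc_power_ok, penalty=-1.0):
    """System reward for the agent: the sum of all secondary-user throughputs
    when the eMBB rate constraint and the URLLC power constraint are both
    satisfied, otherwise a penalty (the penalty value is an assumption)."""
    if embb_rate_ok and urllc_power_ok:
        return sum(secondary_throughputs)
    return penalty

r = slice_reward([1.5, 2.0], embb_rate_ok=True, urllc_power_ok=True)  # 3.5
```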
The system provided by the embodiment of the present invention is used for executing the corresponding method described above; its specific implementation is consistent with that of the method, and the related algorithm flow is the same as that of the corresponding method, which is not described herein again.
In the embodiment of the invention, the slicing technique and the Actor-Critic reinforcement learning algorithm are combined in cognitive network resource allocation, and resources are optimally allocated under the conditions of limited spectrum resources and limited transmitting power, so as to maximize the system throughput.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor)410, a communication Interface 420, a memory (memory)430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the following method: establishing a cognitive wireless network slice resource allocation model based on the enhanced mobile broadband slice and the ultra-high reliable ultra-low time delay communication slice; performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm comprises a user state and an action from the current moment to the next moment, and a system reward function is constructed by the user state and the action.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, performs the method provided in the foregoing embodiments, for example including: establishing a cognitive wireless network slice resource allocation model based on the enhanced mobile broadband slice and the ultra-high reliable ultra-low time delay communication slice; performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm comprises a user state and an action from the current moment to the next moment, and a system reward function is constructed by the user state and the action.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A slice resource allocation method for a cognitive wireless network, comprising:
establishing a cognitive wireless network slice resource allocation model based on the enhanced mobile broadband slice and the ultra-high reliable ultra-low time delay communication slice;
performing deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm comprises a user state and an action from the current moment to the next moment, and a system reward function is constructed by the user state and the action.
2. The slice resource allocation method for the cognitive wireless network according to claim 1, wherein deep reinforcement learning is performed on the cognitive wireless network slice resource allocation model based on the Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm includes defining a user state and an action from a current time to a next time, and constructing a system reward function from the user state and the action, and the Actor-Critic deep reinforcement learning algorithm further includes:
and obtaining a fully-connected neural network model to construct an Actor-Critic deep reinforcement learning algorithm network.
3. The slice resource allocation method for the cognitive radio network according to claim 1 or 2, wherein the establishing of the cognitive radio network slice resource allocation model based on the enhanced mobile broadband slice and the ultra-high-reliability ultra-low-latency communication slice specifically comprises:
defining the throughput of a primary user of the enhanced mobile broadband and the interruption probability of a primary user of the ultra-high reliable ultra-low time delay communication;
and defining a system optimization target and a system constraint condition based on the primary user throughput and the primary user interruption probability, with the maximum system throughput as the target, and constructing the cognitive wireless network slice resource allocation model.
4. The slice resource allocation method for cognitive wireless networks according to claim 3, wherein the defining of the throughput of the primary user of the enhanced mobile broadband and the interruption probability of the primary user of the ultra-high reliable ultra-low latency communication further comprises:
the primary user throughput is obtained by the bandwidth of any enhanced mobile broadband primary user and the signal-to-interference-and-noise ratio of that primary user on any channel, wherein the signal-to-interference-and-noise ratio is obtained by the channel gain from any enhanced mobile broadband primary user transmitter to a primary user receiver, the channel gain from any enhanced mobile broadband primary user transmitter to a secondary user receiver, the transmitting power of any enhanced mobile broadband primary user transmitter on any channel and the transmitting power of any enhanced mobile broadband secondary user transmitter on any channel;
the primary user interruption probability is obtained by the delay time of any ultra-high reliable ultra-low delay communication user, the maximum delay time of any ultra-high reliable ultra-low delay communication user and the maximum data arrival rate.
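The throughput quantity in claim 4 can be sketched as follows, assuming the conventional Shannon-capacity form B·log2(1 + SINR) for the primary user throughput (the claim names the inputs but does not spell out the formula); the gains, powers and noise value below are illustrative.

```python
import math

def sinr(g_desired, p_primary, g_interf, p_secondary, noise):
    """Signal-to-interference-and-noise ratio on one channel (linear units):
    desired received power over secondary-user interference plus noise."""
    return (g_desired * p_primary) / (g_interf * p_secondary + noise)

def primary_throughput(bandwidth_hz, sinr_linear):
    """Assumed Shannon-capacity form: B * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)

gamma = sinr(g_desired=0.8, p_primary=1.0, g_interf=0.1, p_secondary=1.0, noise=0.1)  # 4.0
rate = primary_throughput(1e6, gamma)  # about 2.32 Mbit/s over a 1 MHz channel
```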
5. The slice resource allocation method for the cognitive radio network as claimed in claim 3, wherein the step of defining a system optimization objective and a system constraint condition based on the throughput of the primary user and the interruption probability of the primary user with the maximum system throughput as an objective comprises the steps of:
maximizing the sum of the throughputs of all secondary users in the system as the system optimization goal;
defining the rate of any enhanced mobile broadband user not to be lower than a first preset value;
defining the probability that any ultra-high reliable ultra-low delay communication user does not meet low delay to be smaller than a second preset value;
defining that one secondary user can only occupy one channel;
defining that the secondary user transmitter power does not exceed a third preset value.
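The constraint conditions enumerated above can be sketched as a feasibility test over one candidate allocation. The function name and thresholds are illustrative; encoding the allocation as one channel index per secondary user enforces the one-channel-per-user constraint by construction, so it is not re-checked.

```python
def allocation_feasible(embb_rates, urllc_violation_probs, powers,
                        rate_min, prob_max, power_max):
    """Feasibility test for a candidate allocation under the constraints above:
    eMBB rate floor, URLLC delay-violation probability cap, and the secondary
    transmitter power cap (all thresholds are illustrative assumptions)."""
    return (all(r >= rate_min for r in embb_rates)                # eMBB rate not below floor
            and all(p < prob_max for p in urllc_violation_probs)  # URLLC violation prob. cap
            and all(p <= power_max for p in powers))              # transmitter power cap

ok = allocation_feasible([2.0, 2.5], [0.001], [0.5, 0.8],
                         rate_min=1.0, prob_max=0.01, power_max=1.0)  # True
```

An optimizer (here, the Actor-Critic agent) would keep only feasible allocations and score them by the sum of secondary-user throughputs.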
6. The slice resource allocation method for the cognitive wireless network according to claim 1, wherein deep reinforcement learning is performed on the cognitive wireless network slice resource allocation model based on the Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm includes defining a user state and an action from a current time to a next time, and constructing a system reward function from the user state and the action, which specifically includes:
defining all secondary users as intelligent agents and signal-to-interference-and-noise ratio state functions of all primary users at any moment;
based on the SINR state function, obtaining an action function of the intelligent agent from the current moment to the next moment, wherein the action function comprises the state representation of the subcarrier occupied by a secondary user at any moment and the power state representation of the secondary user at any moment;
and setting the reward function of the intelligent agent as the sum of the throughputs of all secondary users, and obtaining the result of the reward function according to whether the enhanced mobile broadband user meets a rate constraint condition and whether the ultra-high-reliability ultra-low time delay communication user meets a power constraint condition.
7. The slice resource allocation method for the cognitive wireless network according to claim 2, wherein the obtaining of the fully-connected neural network model to construct an Actor-Critic deep reinforcement learning algorithm network specifically comprises:
acquiring a three-layer linear neural network, wherein the number of neurons of an input layer is a first preset parameter, the number of neurons of a middle hidden layer is a second preset parameter, the input layer and the middle hidden layer adopt ReLU as an activation function, the number of neurons of an output layer is a third preset parameter, and the output layer adopts sigmoid and softmax as activation functions;
and respectively constructing an Actor network and a Critic network based on the three-layer linear neural network.
8. A system for slice resource allocation for cognitive wireless networks, comprising:
the construction module is used for establishing a cognitive wireless network slice resource allocation model based on the enhanced mobile broadband slice and the ultra-high-reliability ultra-low time delay communication slice;
the solving module is used for carrying out deep reinforcement learning on the cognitive wireless network slice resource allocation model based on an Actor-Critic deep reinforcement learning algorithm to obtain an optimal slice resource allocation solution; the Actor-Critic deep reinforcement learning algorithm comprises a user state and an action from the current moment to the next moment, and a system reward function is constructed by the user state and the action.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the slice resource allocation method for cognitive wireless networks according to any of claims 1 to 7.
10. A non-transitory computer readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, implements the steps of the slice resource allocation method for cognitive wireless networks according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010457568.1A CN111726811B (en) | 2020-05-26 | 2020-05-26 | Slice resource allocation method and system for cognitive wireless network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111726811A true CN111726811A (en) | 2020-09-29 |
CN111726811B CN111726811B (en) | 2023-11-14 |
Family
ID=72565084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010457568.1A Active CN111726811B (en) | 2020-05-26 | 2020-05-26 | Slice resource allocation method and system for cognitive wireless network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111726811B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112272410A (en) * | 2020-10-22 | 2021-01-26 | 北京邮电大学 | Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network |
CN112367628A (en) * | 2020-11-12 | 2021-02-12 | 广东电网有限责任公司 | Intelligent network slice instantiation method and system of power Internet of things |
CN112911715A (en) * | 2021-02-03 | 2021-06-04 | 南京南瑞信息通信科技有限公司 | Power distribution method and device for maximizing throughput in virtual wireless network |
CN112991384A (en) * | 2021-01-27 | 2021-06-18 | 西安电子科技大学 | DDPG-based intelligent cognitive management method for emission resources |
CN113163451A (en) * | 2021-04-23 | 2021-07-23 | 中山大学 | D2D communication network slice distribution method based on deep reinforcement learning |
CN113395757A (en) * | 2021-06-10 | 2021-09-14 | 中国人民解放军空军通信士官学校 | Deep reinforcement learning cognitive network power control method based on improved return function |
CN113438723A (en) * | 2021-06-23 | 2021-09-24 | 广东工业大学 | Competitive depth Q network power control method with high reward punishment |
CN114374608A (en) * | 2020-10-15 | 2022-04-19 | 中国移动通信集团浙江有限公司 | Slice instance backup task scheduling method and device and electronic equipment |
CN114520772A (en) * | 2022-01-19 | 2022-05-20 | 广州杰赛科技股份有限公司 | 5G slice resource scheduling method |
WO2023109007A1 (en) * | 2021-12-17 | 2023-06-22 | 北京邮电大学 | Time domain resource configuration method and apparatus, electronic device, and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109862610A (en) * | 2019-01-08 | 2019-06-07 | 华中科技大学 | A kind of D2D subscriber resource distribution method based on deeply study DDPG algorithm |
US20190230046A1 (en) * | 2018-01-19 | 2019-07-25 | Ciena Corporation | Autonomic resource partitions for adaptive networks |
US20190268894A1 (en) * | 2018-02-28 | 2019-08-29 | Korea Advanced Institute Of Science And Technology | Resource allocation method and apparatus for wireless backhaul network based on reinforcement learning |
US10405193B1 (en) * | 2018-06-28 | 2019-09-03 | At&T Intellectual Property I, L.P. | Dynamic radio access network and intelligent service delivery for multi-carrier access for 5G or other next generation network |
CN110381541A (en) * | 2019-05-28 | 2019-10-25 | 中国电力科学研究院有限公司 | A kind of smart grid slice distribution method and device based on intensified learning |
CN110519783A (en) * | 2019-09-26 | 2019-11-29 | 东华大学 | 5G network based on enhancing study is sliced resource allocation methods |
WO2020049181A1 (en) * | 2018-09-07 | 2020-03-12 | NEC Laboratories Europe GmbH | System and method for network automation in slice-based network using reinforcement learning |
Non-Patent Citations (2)
Title |
---|
RONGPENG LI ET AL: "Deep Reinforcement Learning for Resource Management in Network Slicing", Retrieved from the Internet <URL:https://doi.org/10.48550/arXiv.1805.06591> *
SUN Sanshan: "Economic Theory Based Resource Allocation in Next-Generation Wireless Networks", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 01, pages 88 - 109 *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114374608A (en) * | 2020-10-15 | 2022-04-19 | 中国移动通信集团浙江有限公司 | Slice instance backup task scheduling method and device and electronic equipment |
CN114374608B (en) * | 2020-10-15 | 2023-08-15 | 中国移动通信集团浙江有限公司 | Slice instance backup task scheduling method and device and electronic equipment |
CN112272410A (en) * | 2020-10-22 | 2021-01-26 | 北京邮电大学 | Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network |
CN112272410B (en) * | 2020-10-22 | 2022-04-19 | 北京邮电大学 | Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network |
CN112367628A (en) * | 2020-11-12 | 2021-02-12 | 广东电网有限责任公司 | Intelligent network slice instantiation method and system of power Internet of things |
CN112367628B (en) * | 2020-11-12 | 2024-01-23 | 广东电网有限责任公司 | Intelligent network slice instantiation method and system of electric power Internet of things |
CN112991384B (en) * | 2021-01-27 | 2023-04-18 | 西安电子科技大学 | DDPG-based intelligent cognitive management method for emission resources |
CN112991384A (en) * | 2021-01-27 | 2021-06-18 | 西安电子科技大学 | DDPG-based intelligent cognitive management method for emission resources |
CN112911715A (en) * | 2021-02-03 | 2021-06-04 | 南京南瑞信息通信科技有限公司 | Power distribution method and device for maximizing throughput in virtual wireless network |
CN112911715B (en) * | 2021-02-03 | 2024-02-13 | 南京南瑞信息通信科技有限公司 | Method and device for distributing power with maximized throughput in virtual wireless network |
CN113163451A (en) * | 2021-04-23 | 2021-07-23 | 中山大学 | D2D communication network slice distribution method based on deep reinforcement learning |
CN113163451B (en) * | 2021-04-23 | 2022-08-02 | 中山大学 | D2D communication network slice distribution method based on deep reinforcement learning |
CN113395757B (en) * | 2021-06-10 | 2023-06-30 | 中国人民解放军空军通信士官学校 | Deep reinforcement learning cognitive network power control method based on improved return function |
CN113395757A (en) * | 2021-06-10 | 2021-09-14 | 中国人民解放军空军通信士官学校 | Deep reinforcement learning cognitive network power control method based on improved return function |
CN113438723A (en) * | 2021-06-23 | 2021-09-24 | 广东工业大学 | Competitive depth Q network power control method with high reward punishment |
WO2023109007A1 (en) * | 2021-12-17 | 2023-06-22 | 北京邮电大学 | Time domain resource configuration method and apparatus, electronic device, and storage medium |
CN114520772A (en) * | 2022-01-19 | 2022-05-20 | 广州杰赛科技股份有限公司 | 5G slice resource scheduling method |
CN114520772B (en) * | 2022-01-19 | 2023-11-14 | 广州杰赛科技股份有限公司 | 5G slice resource scheduling method |
Also Published As
Publication number | Publication date |
---|---|
CN111726811B (en) | 2023-11-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111726811A (en) | Slice resource allocation method and system for cognitive wireless network | |
CN111901392B (en) | Mobile edge computing-oriented content deployment and distribution method and system | |
Abbasi et al. | Intelligent workload allocation in IoT–Fog–cloud architecture towards mobile edge computing | |
CN110267338B (en) | Joint resource allocation and power control method in D2D communication | |
CN111310932A (en) | Method, device and equipment for optimizing horizontal federated learning system and readable storage medium | |
CN112995951B (en) | 5G Internet of vehicles V2V resource allocation method adopting depth certainty strategy gradient algorithm | |
Li | Multi-agent Q-learning of channel selection in multi-user cognitive radio systems: A two by two case | |
CN110839184B (en) | Method and device for adjusting bandwidth of mobile fronthaul optical network based on flow prediction | |
CN116541106B (en) | Computing task unloading method, computing device and storage medium | |
Yu et al. | Collaborative computation offloading for multi-access edge computing | |
Zhou et al. | Dynamic channel allocation for multi-UAVs: A deep reinforcement learning approach | |
CN113411826B (en) | Edge network equipment caching method based on attention mechanism reinforcement learning | |
Yu et al. | User-centric heterogeneous-action deep reinforcement learning for virtual reality in the metaverse over wireless networks | |
CN114828018A (en) | Multi-user mobile edge computing unloading method based on depth certainty strategy gradient | |
Jiao et al. | Deep reinforcement learning-based optimization for RIS-based UAV-NOMA downlink networks | |
Jere et al. | Distributed learning meets 6G: A communication and computing perspective | |
CN114095940A (en) | Slice resource allocation method and equipment for hybrid access cognitive wireless network | |
CN113286374A (en) | Scheduling method, training method of scheduling algorithm, related system and storage medium | |
CN112906745B (en) | Integrity intelligent network training method based on edge cooperation | |
Zheng et al. | Mobility-Aware Split-Federated With Transfer Learning for Vehicular Semantic Communication Networks | |
Maksymyuk et al. | Artificial intelligence based 5G coverage design and optimization using deep generative adversarial neural networks | |
CN111669758B (en) | Satellite unmanned aerial vehicle converged network resource allocation method and device | |
CN110519664B (en) | Configuration method and device of transceiver in software defined optical network | |
CN111741050A (en) | Data distribution method and system based on roadside unit | |
CN115481752B (en) | Model training method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||