CN109561504A - Resource multiplexing method of URLLC and eMBB based on deep reinforcement learning - Google Patents
Resource multiplexing method of URLLC and eMBB based on deep reinforcement learning
- Publication number: CN109561504A
- Application number: CN201811383001.3A
- Authority: CN (China)
- Prior art keywords: urllc, embb, mini-slot, data packet
- Legal status: Granted
Classifications
- H04W 72/0446: wireless resource allocation based on the type of the allocated resource; resources in time domain, e.g. slots or frames
- H04W 72/0453: wireless resource allocation based on the type of the allocated resource; resources in frequency domain, e.g. a carrier in FDMA
- H04W 72/0473: wireless resource allocation based on the type of the allocated resource; the resource being transmission power
- H04W 72/53: allocation or scheduling criteria for wireless resources based on regulatory allocation policies
- Y02D 30/70: reducing energy consumption in wireless communication networks
Abstract
The invention discloses a resource multiplexing method of URLLC and eMBB based on deep reinforcement learning: collecting the data packet information, channel information and queue information of URLLC and eMBB over M mini-slots as training data; establishing a URLLC and eMBB resource multiplexing model based on deep reinforcement learning, and training the model parameters with the training data; performing performance evaluation on the trained model until the performance requirement is met; collecting the URLLC and eMBB data packet information, channel information and queue information of the current mini-slot, and inputting the collected information into the trained model to obtain a resource multiplexing decision result; and, according to the resource multiplexing decision result, performing resource allocation on the eMBB and URLLC data packets of the current mini-slot. The method achieves reasonable allocation and utilization of time-frequency resources and power while satisfying the transmission requirements of eMBB and URLLC data packets.
Description
Technical Field
The invention relates to the technical field of wireless communication, in particular to a resource multiplexing method of URLLC and eMBB based on deep reinforcement learning.
Background
In order to meet the requirements of different service scenarios on delay, reliability, mobility and the like in the future, the ITU formally defined the three major scenarios of the future 5G network in 2015: enhanced mobile broadband (eMBB), massive machine type communication (mMTC), and ultra-reliable low latency (uRLLC). The eMBB scenario builds on the existing mobile broadband service scenario, further improving user experience and pursuing the ultimate communication experience between people. mMTC and uRLLC are both application scenarios of the Internet of Things, but with different emphases: mMTC mainly concerns information interaction between people and things, while uRLLC mainly reflects the communication requirements between things. One of the important objectives of the 5G NR (New Radio, new air interface) design is to enable services of different models in the three scenarios to be effectively multiplexed on the same frequency band.
The URLLC/eMBB scenario is currently the scenario most urgently needed by 5G NR: the eMBB service is taken as the basic requirement, and the URLLC service should coexist with the eMBB service while preserving the spectrum efficiency of the eMBB service as much as possible. To meet the low-delay requirement of URLLC, one approach is to use a 60 kHz subcarrier spacing, which shortens the slot length to 1/4 of that of LTE; to reduce the transmission time further, URLLC takes 4 symbols as one micro slot (mini-slot), reducing the TTI length to 1/14 of LTE's. To save resources and improve spectrum efficiency, the base station may allocate resources already assigned to the eMBB service to randomly arriving URLLC traffic. This dynamic resource multiplexing method avoids resource waste to the maximum extent, but it can also cause demodulation failure of eMBB service data and trigger additional HARQ feedback. Therefore, how to allocate the eMBB and URLLC services within limited resources and achieve efficient utilization of those resources is an urgent problem to be solved.
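These factors can be verified from the standard numerology (a worked check added for clarity; the 15 kHz, 14-symbols-per-millisecond LTE baseline is general knowledge rather than text from the patent):

```latex
T_{\mathrm{sym}}^{60\,\mathrm{kHz}}=\frac{15}{60}\cdot\frac{1\,\mathrm{ms}}{14}\approx 17.9\,\mu\mathrm{s},\qquad
14\,T_{\mathrm{sym}}^{60\,\mathrm{kHz}}=0.25\,\mathrm{ms}=\tfrac{1}{4}\cdot 1\,\mathrm{ms},\qquad
4\,T_{\mathrm{sym}}^{60\,\mathrm{kHz}}\approx 71.4\,\mu\mathrm{s}=\tfrac{1}{14}\cdot 1\,\mathrm{ms}.
```

That is, a full 14-symbol slot at 60 kHz lasts 1/4 of the LTE 1 ms TTI, and a 4-symbol mini-slot lasts 1/14 of it.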
Disclosure of Invention
The invention aims to provide a resource multiplexing method of URLLC and eMBB based on deep reinforcement learning, which can realize reasonable allocation and utilization of time-frequency resources and power while meeting the transmission requirements of eMBB and URLLC data packets.
In order to achieve the above object, the present invention provides a resource multiplexing method for URLLC and eMBB based on deep reinforcement learning, which includes:
collecting data packet information, channel information and queue information of URLLC and eMBB of M micro slots (mini-slots) as training data, M being a natural number;
establishing a URLLC and eMBB resource multiplexing model based on deep reinforcement learning, and training model parameters by using the training data;
performing performance evaluation on the trained model until the performance requirement is met;
collecting current mini-slot URLLC and eMBB data packet information, channel information and queue information, inputting the collected information into the trained model, and obtaining a resource multiplexing decision result;
and according to the resource multiplexing decision result, carrying out resource allocation on the eMBB and URLLC data packets of the current mini-slot.
In summary, the invention is a resource multiplexing method of URLLC and eMBB based on deep reinforcement learning: the eMBB and URLLC data packet information, channel information and queue information are trained by the deep reinforcement learning method to obtain a decision result for the resources multiplexed by the eMBB and URLLC data packets, and the multiplexed resources are then allocated reasonably according to the decision result, effectively solving the problem of wasted power and time-frequency resources.
Drawings
Fig. 1 is a schematic diagram of a frame structure and a multiplexing mode for multiplexing eMBB and URLLC time-frequency resources according to the present invention.
Fig. 2 is a flowchart illustrating a resource multiplexing method of URLLC and eMBB based on deep reinforcement learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The core idea of the invention is as follows: first, the data packet information, channel information and queue information of URLLC and eMBB are collected as training data; then a URLLC and eMBB resource multiplexing model based on deep reinforcement learning is established, and the training data are used to train and update the model parameters θ. The obtained deep-reinforcement-learning URLLC and eMBB resource multiplexing model is evaluated: if the URLLC reliability requirement is met and the eMBB data packets have a low retransmission rate, the training process is finished; if the performance requirements cannot be met, the model continues to be trained until the loss function converges. The URLLC and eMBB data packet information, channel information and queue information of the current mini-slot are then collected and input into the trained deep reinforcement learning model to obtain a resource multiplexing decision result. Finally, resources are allocated to the eMBB and URLLC data packets according to the decision result, so that the limited multiplexed resources are used efficiently and the problem of wasted power and time-frequency resources is effectively solved.
Referring to fig. 1, a frame structure and a multiplexing method for multiplexing the eMBB and the URLLC according to the present invention are specifically described.
Specifically, in order to meet the low-delay requirement of URLLC, a 60 kHz subcarrier spacing is used to shorten the slot length to 1/4 of that of LTE; to reduce the transmission time further, URLLC takes 4 symbols as one mini-slot, 1/14 of one LTE TTI length, and transmits with one mini-slot as one TTI. To save resources and improve spectrum efficiency, the base station may allocate resources already assigned to the eMBB service to randomly arriving URLLC traffic. A dynamic scheduling method is adopted: a downlink DCI signaling PI (Pre-Indication) is configured to immediately inform the user of the eMBB service data preempted by the URLLC service data, and the system informs the eMBB user through RRC sublayer signaling to periodically detect the PI, so that the preempted eMBB resources are demodulated correctly. Full utilization of the time-frequency resources is thus realized.
Fig. 2 is a flowchart illustrating a URLLC and eMBB resource multiplexing method based on deep reinforcement learning according to the present invention.
Step 1, collecting data packet information, channel information and queue information of URLLC and eMBB of M micro slots (mini-slots) as training data, M being a natural number;
step 101, taking the kth mini-slot in M as an example, obtaining downlink channel gain g of different subcarriers through Channel Quality Indicator (CQI) information periodically uploaded by UEk=[g1,g2,…,gi]Wherein i is the number of sub-carriers in the mini-slot; and obtaining eMBB data packet bit numberBit number R of URLLC data packetk UReMBB packet queue lengthURLLC packet queue length Qk UR,k∈M;
Step 102, packaging the obtained information into a state vector $s_k=[R_k^{eM},R_k^{UR},g_k,Q_k^{eM},Q_k^{UR}]$ as training data.
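Purely as an illustration (not part of the patented method), a minimal sketch of assembling this state vector; the function name, argument names and the flattening of $g_k$ into the vector are assumptions:

```python
import numpy as np

def build_state(r_embb_bits, r_urllc_bits, gains, q_embb, q_urllc):
    """Pack the k-th mini-slot observations into the state vector
    s_k = [R_k^eM, R_k^UR, g_k, Q_k^eM, Q_k^UR] as one flat array."""
    return np.concatenate((
        [r_embb_bits, r_urllc_bits],     # packet sizes (bits) of eMBB / URLLC
        np.asarray(gains, dtype=float),  # per-subcarrier downlink gains g_1..g_i
        [q_embb, q_urllc],               # current queue lengths
    ))

# Example: a mini-slot with 4 subcarriers
s_k = build_state(1200, 160, gains=[0.9, 0.7, 1.1, 0.8], q_embb=3, q_urllc=1)
```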
Step 2, establishing a URLLC and eMBB resource multiplexing model based on deep reinforcement learning, and training model parameters by using the training data;
step 201, establishing a resource multiplexing model of URLLC and eMBB based on deep reinforcement learning, which comprises the following specific steps:
(1) setting an action vector $a=[P_{eM},P_{UR},n_{eM},n_{UR}]$, where $P_{eM}$ denotes the transmit power allocated to the eMBB data packet in the current mini-slot transmission time, $P_{UR}$ denotes the transmit power allocated to the URLLC data packet in the current mini-slot transmission time, $n_{eM}$ denotes the number of subcarriers allocated to the eMBB data packet in the current mini-slot transmission time, and $n_{UR}$ denotes the number of subcarriers allocated to the URLLC data packet in the current mini-slot transmission time; and initializing the eMBB packet queue length $Q_{eM}$ and the URLLC packet queue length $Q_{UR}$ to zero;
(2) constructing two identical neural networks, eval and next, wherein the eval neural network is used for obtaining the action value function Q of the current state and selecting the action vector a, and the next neural network calculates the target action value function $Q_{target}$ by selecting the largest action value of the next state, $\max_a Q'$, and is used for completing the update of the eval neural network parameters;
(3) setting the parameters of the eval neural network, $C=[n,n_h,n_{in},n_{out},\theta,activate]$: n denotes the number of hidden layers of the neural network; $n_h=[n_{h1},n_{h2},\dots,n_{hn}]$ denotes the number of neurons in each hidden layer; $n_{in}$ denotes the number of input-layer neurons and equals the length of the state vector s; $n_{out}$ denotes the number of output-layer neurons and equals the number of possible values of the action vector a; $\theta=[weight,bias]$, where weight denotes the weights and is randomly initialized in the range 0 to w, and bias denotes the biases and is initialized to b; activate denotes the activation function, for which the ReLU is adopted;
(4) initializing the next neural network with the parameters C.
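For illustration only, a minimal PyTorch sketch of the eval/next network pair described in (2) to (4); the layer sizes and the deep-copy initialization are assumptions, not the patent's specification:

```python
import copy
import torch.nn as nn

def make_q_network(n_in, hidden, n_out):
    """Fully connected Q-network: n_in state features -> one Q value per action."""
    layers, prev = [], n_in
    for nh in hidden:                               # n_h = [n_h1, ..., n_hn]
        layers += [nn.Linear(prev, nh), nn.ReLU()]  # activate = ReLU
        prev = nh
    layers.append(nn.Linear(prev, n_out))           # one output per possible action a
    return nn.Sequential(*layers)

n_in, hidden, n_out = 8, [64, 64], 16               # illustrative sizes only
eval_net = make_q_network(n_in, hidden, n_out)
next_net = copy.deepcopy(eval_net)                  # next net starts with the same parameters C
```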
Step 202, the method for training the model parameters by using the training data includes:
A. (1) inputting the state vector of the kth mini-slot, $s_k=[R_k^{eM},R_k^{UR},g_k,Q_k^{eM},Q_k^{UR}]$, into the eval neural network;
(2) selecting the action vector $a_k$;
Specifically, there are two ways to select the action vector $a_k$. One is to set a probability $\varepsilon_a$ and, with probability $\varepsilon_a$, randomly select an action $a_k$ from the action pool, where $\varepsilon_a$ is a very small probability value.
Alternatively, with probability $(1-\varepsilon_a)$, select from the eval neural network the action $a_k$ that satisfies $a_k=\arg\max_a Q(s_k,a;\theta)$. Since the action $a_k$ has a number of possible values, the $Q(s_k,a_k;\theta)$ value corresponding to each value is computed, and the $a_k$ corresponding to the largest $Q(s_k,a_k;\theta)$ value is then selected. The detailed calculation of the $Q(s_k,a_k;\theta)$ value is shown in (3) below.
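A minimal sketch of this ε-greedy selection, assuming a discrete action pool indexed 0..n_actions-1 (the indexing and the function signature are illustrative):

```python
import random
import torch

def select_action(eval_net, s_k, epsilon_a, n_actions):
    """Epsilon-greedy: random action with prob. eps_a, else argmax_a Q(s_k, a; theta)."""
    if random.random() < epsilon_a:
        return random.randrange(n_actions)       # explore: random a_k from the pool
    with torch.no_grad():
        q_values = eval_net(torch.as_tensor(s_k, dtype=torch.float32))
        return int(q_values.argmax())            # exploit: largest Q(s_k, a_k; theta)
```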
(3) according to the action vector $a_k$, calculating the obtained reward $r_k$ and the action value function Q;
(3.1) according to the action vector $a_k$, calculating the obtained reward $r_k$; the specific steps are as follows:
according to the selected action $a_k=[P_k^{eM},P_k^{UR},n_k^{eM},n_k^{UR}]$, the signal-to-noise ratio corresponding to the ith subcarrier transmitting only URLLC data can be calculated; for the ith subcarrier, if only eMBB data is transmitted, the corresponding signal-to-noise ratio is obtained in the same way; and if multiplexed data is transmitted on the ith subcarrier, the corresponding signal-to-noise ratio is obtained accordingly.
From the selected signal-to-noise ratio, the error rate of URLLC data packet transmission on the ith subcarrier is obtained, where $Q_{gauss}$ denotes the Gaussian Q-function and V denotes the channel dispersion. The signal-to-noise ratio used here is selected according to whether the ith subcarrier transmits only URLLC data packets or transmits multiplexed data.
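The patent's own expressions here are images in the original and are not recoverable from this text; purely for orientation, a standard finite-blocklength approximation consistent with the quantities named above (Gaussian Q-function, channel dispersion V) would read as follows, where the blocklength n, per-subcarrier rate $r_i$ and SNR $\gamma_i$ are assumed symbols:

```latex
\varepsilon_i^{UR}\approx Q_{gauss}\!\left(\frac{n\,C(\gamma_i)-n\,r_i}{\sqrt{n\,V(\gamma_i)}}\right),\qquad
C(\gamma)=\log_2(1+\gamma),\qquad
V(\gamma)=\Bigl(1-\frac{1}{(1+\gamma)^2}\Bigr)\log_2^2 e .
```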
According to the per-subcarrier error rates, the transmission error rate $\varepsilon_k^{UR}$ of the kth mini-slot URLLC data packet is obtained, and the transmission rate of the kth mini-slot URLLC data packet on the ith subcarrier is obtained.
From the transmission rate and error rate, the throughput of the URLLC data packets in the current mini-slot is obtained, where T denotes the time-domain length of one mini-slot.
According to this throughput and $s_k$, the number of bits of the URLLC data packets discarded in the kth mini-slot is obtained, the maximum URLLC packet queue length being set to $H_{UR}$.
Similarly, the throughput of the eMBB data packets in the current mini-slot is obtained, where $n_k$ denotes the number of subcarriers occupied by the multiplexed eMBB and URLLC transmissions and the noise is Gaussian.
According to this throughput and $s_k$, the number of bits of the eMBB data packets discarded in the kth mini-slot is obtained, the maximum eMBB packet queue length being set to $H_{eM}$.
According to $\varepsilon_k^{UR}$, $a_k$, the throughputs and the numbers of discarded bits, the reward $r_k$ is obtained as a weighted combination whose weights $\omega_1$ to $\omega_5$ are all constants.
In the reward, the URLLC packet transmission error rate enters through an indicator obtained by comparing it with $\varepsilon_{error}$: the indicator takes one value when the transmission error rate of the URLLC data packets in the kth mini-slot is greater than $\varepsilon_{error}$ and the other value when it is less. In the context of the invention, $\varepsilon_{error}$ is $10^{-5}$.
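For illustration only, a sketch of this reward computation under assumed conventions (the signs, the unit weights, and the indicator that penalizes error rates above $\varepsilon_{error}$ are all assumptions; the patent's own expression is not recoverable from this text):

```python
def reward(eps_ur, throughput_ur, throughput_em, dropped_ur_bits, dropped_em_bits,
           weights=(1.0, 1.0, 1.0, 1.0, 1.0), eps_error=1e-5):
    """Weighted reward r_k combining the five quantities named in the patent.
    The weight values w1..w5 and the indicator convention are assumptions."""
    w1, w2, w3, w4, w5 = weights
    indicator = 1.0 if eps_ur > eps_error else 0.0   # penalize eps_k^UR > eps_error
    return (w1 * throughput_ur + w2 * throughput_em
            - w3 * dropped_ur_bits - w4 * dropped_em_bits
            - w5 * indicator)

# Example use with illustrative per-mini-slot values
r_k = reward(eps_ur=2e-6, throughput_ur=160.0, throughput_em=1200.0,
             dropped_ur_bits=0.0, dropped_em_bits=40.0)
```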
(3.2) according to the Bellman equation, in state $s_k$ and conditional on taking action $a_k$, the reward $r_k$ obtained by taking action $a_k$ is added to the expected Q value of the next state, giving the action value function $Q(s_k,a_k)=r_k+\lambda\max_a Q(s_{k+1},a)$, where $\lambda$ is the loss factor.
Since the Q value of the current state depends on the Q value of the next state, an iterative approach can be taken to solve this Markov decision problem via the Bellman equation.
(4) obtaining the next arriving state vector $s_{k+1}$;
Specifically, $s_{k+1}$ in this step can be obtained in the same way as $s_k$ in step 1, which is not repeated here.
(5) storing $(s_k,a_k,r_k,s_{k+1})$ as a sample;
typically, a plurality of samples will be stored in a memory unit for subsequent training of the model.
(6) inputting $s_{k+1}$ into the next neural network to obtain the maximum action value $\max_a Q'$;
(7) according to $\max_a Q'$ and $r_k$, obtaining $Q_{target}=r_k+\gamma\max_a Q'(s_{k+1},a;\theta')$, where $\gamma$ denotes a discount factor and $\theta'$ is the parameter of the current next neural network;
(8) randomly taking F samples from the memory unit and obtaining the $Q_{target}$ and the action value function Q of each sample, F being a natural number;
(9) substituting the $Q_{target}$ and the action value function Q of each sample to obtain the loss function $Loss(\theta)$, where $\theta$ is the parameter of the current eval neural network;
(10) using the gradient descent method to calculate the gradient of $Loss(\theta)$, and updating the parameter $\theta$ of the eval neural network along the direction of steepest descent;
B. taking different values of k and repeating step A, the parameters of the next neural network being updated once every I updates of the eval neural network parameters so that $\theta'=\theta$, I being a natural number greater than 1;
C. taking different values of k, repeating A to B, and continuing to train the model until the loss function converges. A condensed sketch of one training iteration is given below.
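For illustration only, a condensed sketch of steps (6) to (10) and B, i.e. one DQN update with experience replay and periodic target-network synchronization; the mean-squared form of the loss, the memory holding tuples of tensors, and all sizes are assumptions:

```python
import random
import torch
import torch.nn.functional as F_nn

def dqn_update(eval_net, next_net, memory, optimizer, step,
               batch_size=32, gamma=0.9, sync_every=100):
    """One update of theta from batch_size (= F) random samples (s_k, a_k, r_k, s_k+1),
    where memory is a list of tuples of tensors."""
    batch = random.sample(memory, batch_size)
    s, a, r, s_next = (torch.stack(t) for t in zip(*batch))
    with torch.no_grad():                       # next net: Q_target = r + gamma * max_a Q'
        q_target = r + gamma * next_net(s_next).max(dim=1).values
    q = eval_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)  # Q(s_k, a_k; theta)
    loss = F_nn.mse_loss(q, q_target)           # Loss(theta), assumed mean-squared error
    optimizer.zero_grad()
    loss.backward()                             # gradient descent on theta
    optimizer.step()
    if step % sync_every == 0:                  # every I updates: theta' <- theta
        next_net.load_state_dict(eval_net.state_dict())
    return float(loss)
```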
Step 3, performing performance evaluation on the trained model until the performance requirement is met;
(1) inputting the obtained training data $s_k=[R_k^{eM},R_k^{UR},g_k,Q_k^{eM},Q_k^{UR}]$ into the trained model to obtain $a_k=[P_k^{eM},P_k^{UR},n_k^{eM},n_k^{UR}]$, $k\in M$;
(2) counting the numbers of eMBB and URLLC data packets sent by the base station in a predetermined time period, denoted $p_{EM}$ and $p_{UR}$ respectively, and obtaining the numbers of URLLC and eMBB data packet transmission errors in that period, $p_{ur}$ and $p_{em}$, from the information reported by the UE to the base station; according to $p_{UR}$ and $p_{ur}$, obtaining the URLLC transmission error rate $p_e$; according to $p_{EM}$ and $p_{em}$, obtaining the eMBB retransmission rate $p_{re}$;
(3) judging $p_e$ and $p_{re}$: if $p_e<k_e$ is satisfied, $k_e$ expressing the URLLC data packet transmission error rate requirement in the specific scenario, and $p_{re}<k_{re}$ is satisfied, $k_{re}$ expressing the eMBB data packet retransmission rate requirement in the specific scenario, the performance evaluation process is completed; otherwise, continuing to train the model until the performance requirement is met.
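A small sketch of this check; the rate definitions $p_e=p_{ur}/p_{UR}$ and $p_{re}=p_{em}/p_{EM}$ follow the natural reading of step (2) but, like the threshold values, are assumptions:

```python
def performance_ok(p_UR_sent, p_ur_err, p_EM_sent, p_em_err,
                   k_e=1e-5, k_re=0.1):
    """True if the URLLC error rate and eMBB retransmission rate meet the targets."""
    p_e = p_ur_err / p_UR_sent     # URLLC transmission error rate
    p_re = p_em_err / p_EM_sent    # eMBB retransmission rate
    return p_e < k_e and p_re < k_re

# Example: 10^6 URLLC packets with 5 errors, 10^5 eMBB packets with 800 retransmissions
assert performance_ok(1_000_000, 5, 100_000, 800)
```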
Step 4, collecting URLLC and eMBB data packet information, channel information and queue information of the current mini-slot, inputting the collected information into the trained model, and obtaining a resource multiplexing decision result;
Specifically, the collected data of the current mini-slot, $s=[R_{eM},R_{UR},g,Q_{eM},Q_{UR}]$, is input into the trained model to obtain $a=[P_{eM},P_{UR},n_{eM},n_{UR}]$. The acquisition of s follows step 1 and is not repeated here.
Step 5, performing resource allocation on the eMBB and URLLC data packets of the current mini-slot according to the resource multiplexing decision result.
Specifically, according to the obtained resource multiplexing decision result of the current mini-slot, $a=[P_{eM},P_{UR},n_{eM},n_{UR}]$, the radio network controller RNC indicates, through the radio resource control RRC sublayer, the powers $P_{UR}$ and $P_{eM}$ allocated to the URLLC and eMBB data packets and the numbers of subcarriers $n_{UR}$ and $n_{eM}$ allocated to the URLLC and eMBB data packets, and indicates the location information of the allocated subcarriers.
Further, the system informs the eMBB user in real time of the information that the eMBB is preempted by the URLLC (namely, the location information of the subcarriers multiplexed by the eMBB and the URLLC) by configuring a downlink DCI signaling PI (Pre-Indication), and informs the eMBB user through RRC sublayer signaling to periodically detect the PI, completing correct demodulation of the preempted eMBB resources. As can be seen from the frame structure of fig. 1, each mini-slot includes 4 symbol lengths in the time domain. As can be seen from the time-frequency resource multiplexing manner in fig. 1, the light-colored pattern marks the subcarrier positions on each mini-slot where only eMBB data is transmitted, and the dark-colored pattern marks the subcarrier positions on each mini-slot where eMBB and URLLC are multiplexed. Reasonable allocation of the URLLC and eMBB data packet services over time-frequency resources and power is thus realized, and the limited multiplexed resources are used efficiently.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A resource multiplexing method of ultra-reliable low-delay URLLC and enhanced mobile broadband eMBB based on deep reinforcement learning is characterized by comprising the following steps:
collecting data packet information, channel information and queue information of URLLC and eMBB of M micro slots (mini-slots) as training data, M being a natural number;
establishing a URLLC and eMBB resource multiplexing model based on deep reinforcement learning, and training model parameters by using the training data;
performing performance evaluation on the trained model until the performance requirement is met;
collecting current mini-slot URLLC and eMBB data packet information, channel information and queue information, inputting the collected information into the trained model, and obtaining a resource multiplexing decision result;
and according to the resource multiplexing decision result, carrying out resource allocation on the eMBB and URLLC data packets of the current mini-slot.
2. The method of claim 1, wherein the collecting packet information, channel information, and queue information for M mini-slot URLLC and eMBB as training data comprises:
for the kth mini-slot of the M, acquiring the downlink channel gains of different subcarriers $g_k=[g_1,g_2,\dots,g_i]$, where i is the number of subcarriers in the mini-slot; and obtaining the eMBB data packet bit number $R_k^{eM}$, the URLLC data packet bit number $R_k^{UR}$, the eMBB packet queue length $Q_k^{eM}$, and the URLLC packet queue length $Q_k^{UR}$, $k\in M$;
packaging the obtained information into a state vector $s_k=[R_k^{eM},R_k^{UR},g_k,Q_k^{eM},Q_k^{UR}]$ as training data.
3. The method of claim 2, wherein the establishing the deep reinforcement learning based URLLC and eMBB resource reuse model comprises:
setting an action vector $a=[P_{eM},P_{UR},n_{eM},n_{UR}]$, where $P_{eM}$ denotes the transmit power allocated to the eMBB data packet in the current mini-slot transmission time, $P_{UR}$ denotes the transmit power allocated to the URLLC data packet in the current mini-slot transmission time, $n_{eM}$ denotes the number of subcarriers allocated to the eMBB data packet in the current mini-slot transmission time, and $n_{UR}$ denotes the number of subcarriers allocated to the URLLC data packet in the current mini-slot transmission time, and initializing the eMBB packet queue length $Q_{eM}$ and the URLLC packet queue length $Q_{UR}$ to zero;
constructing two identical neural networks, eval and next, wherein the eval neural network is used for obtaining the action value function Q of the current state and selecting the action vector a, and the next neural network calculates the target action value function $Q_{target}$ by selecting the largest action value of the next state, $\max_a Q'$, and is used for completing the update of the eval neural network parameters;
setting the parameters of the eval neural network, $C=[n,n_h,n_{in},n_{out},\theta,activate]$, wherein n denotes the number of hidden layers of the neural network, $n_h=[n_{h1},n_{h2},\dots,n_{hn}]$ denotes the number of neurons in each hidden layer, $n_{in}$ denotes the number of input-layer neurons and equals the length of the state vector s, $n_{out}$ denotes the number of output-layer neurons and equals the number of possible values of the action vector a, $\theta=[weight,bias]$, where weight denotes the weights and is randomly initialized in the range 0 to w and bias denotes the biases and is initialized to b, and activate denotes the activation function, for which the ReLU is adopted; and
initializing the next neural network with the parameters C.
4. The method of claim 3, wherein the method of training model parameters using the training data comprises:
A. inputting the state vector $s_k=[R_k^{eM},R_k^{UR},g_k,Q_k^{eM},Q_k^{UR}]$ of the kth mini-slot into the eval neural network;
selecting the action vector $a_k$;
according to the action vector $a_k$, calculating the obtained reward $r_k$ and the action value function Q;
obtaining the next arriving state vector $s_{k+1}$;
storing $(s_k,a_k,r_k,s_{k+1})$ as a sample;
inputting $s_{k+1}$ into the next neural network to obtain the maximum action value $\max_a Q'$;
according to $\max_a Q'$ and $r_k$, obtaining $Q_{target}=r_k+\gamma\max_a Q'(s_{k+1},a;\theta')$, where $\gamma$ denotes a discount factor and $\theta'$ is the parameter of the current next neural network;
randomly taking out F samples and obtaining the $Q_{target}$ and the action value function Q of each sample, F being a natural number;
substituting the $Q_{target}$ and the action value function Q of each sample to obtain the loss function $Loss(\theta)$, where $\theta$ is the parameter of the current eval neural network;
calculating the gradient of $Loss(\theta)$ by the gradient descent method, and updating the parameter $\theta$ of the eval neural network along the direction of steepest descent;
B. taking different values of k and repeating step A, the parameters of the next neural network being updated once every I updates of the eval neural network parameters so that $\theta'=\theta$, I being a natural number greater than 1;
C. taking different values of k, repeating A to B, and continuing to train the model until the loss function converges.
5. The method of claim 4, wherein the selecting of the action vector $a_k$ comprises:
setting a probability $\varepsilon_a$ and, with probability $\varepsilon_a$, randomly selecting an action $a_k$ from the action pool, or, with probability $(1-\varepsilon_a)$, selecting from the eval neural network the action $a_k$ that satisfies $a_k=\arg\max_a Q(s_k,a;\theta)$.
6. The method of claim 4, wherein the calculating, according to the action vector $a_k$, of the obtained reward $r_k$ comprises:
according to $a_k=[P_k^{eM},P_k^{UR},n_k^{eM},n_k^{UR}]$, obtaining the signal-to-noise ratio corresponding to the transmission of the URLLC data packet on the ith subcarrier;
according to $a_k=[P_k^{eM},P_k^{UR},n_k^{eM},n_k^{UR}]$ and that signal-to-noise ratio, obtaining the error rate of the transmission of the kth mini-slot URLLC data packet on the ith subcarrier, wherein $Q_{gauss}$ denotes the Gaussian Q-function and V denotes the channel dispersion;
according to the per-subcarrier error rates, obtaining the transmission error rate $\varepsilon_k^{UR}$ of the kth mini-slot URLLC data packet, and obtaining the transmission rate of the kth mini-slot URLLC data packet on the ith subcarrier;
according to the transmission rate and the error rate, obtaining the throughput of the URLLC data packets in the current mini-slot, wherein T denotes the time-domain length of one mini-slot;
according to this throughput and $s_k$, obtaining the number of bits of the URLLC data packets discarded in the kth mini-slot, the maximum URLLC packet queue length being set to $H_{UR}$;
likewise obtaining the throughput of the eMBB data packets in the current mini-slot, wherein $n_k$ denotes the number of subcarriers occupied by the multiplexed eMBB and URLLC transmissions and the noise is Gaussian;
according to this throughput and $s_k$, obtaining the number of bits of the eMBB data packets discarded in the kth mini-slot, the maximum eMBB packet queue length being set to $H_{eM}$;
according to $\varepsilon_k^{UR}$, $a_k$, the throughputs and the numbers of discarded bits, obtaining the reward $r_k$, the weights $\omega_1$ to $\omega_5$ of which are all constants.
7. The method of claim 6, wherein, according to the Bellman equation, in state $s_k$ and conditional on taking action $a_k$, the reward $r_k$ obtained by taking action $a_k$ is added to the expected Q value of the next state, giving the action value function $Q(s_k,a_k)=r_k+\lambda\max_a Q(s_{k+1},a)$, where $\lambda$ is the loss factor.
8. The method of claim 7, wherein performing a performance assessment on the trained model until a performance requirement is met comprises:
inputting the obtained training data $s_k=[R_k^{eM},R_k^{UR},g_k,Q_k^{eM},Q_k^{UR}]$ into the trained model to obtain $a_k=[P_k^{eM},P_k^{UR},n_k^{eM},n_k^{UR}]$, $k\in M$;
counting the numbers of eMBB and URLLC data packets sent by the base station in a predetermined time period, denoted $p_{EM}$ and $p_{UR}$ respectively, and obtaining the numbers of URLLC and eMBB data packet transmission errors in that period, $p_{ur}$ and $p_{em}$, from the information reported by the UE to the base station; according to $p_{UR}$ and $p_{ur}$, obtaining the URLLC transmission error rate $p_e$; according to $p_{EM}$ and $p_{em}$, obtaining the eMBB retransmission rate $p_{re}$;
judging $p_e$ and $p_{re}$: if $p_e<k_e$ is satisfied, $k_e$ expressing the URLLC data packet transmission error rate requirement in the specific scenario, and $p_{re}<k_{re}$ is satisfied, $k_{re}$ expressing the eMBB data packet retransmission rate requirement in the specific scenario, the performance evaluation process is completed; otherwise, continuing to train the model until the performance requirement is met.
9. The method of claim 7, wherein the collecting URLLC and eMBB packet information, channel information, and queue information for a current mini-slot, inputting the collected information into the trained model, and obtaining a resource reuse decision result comprises:
collecting the data of the current mini-slot, $s=[R_{eM},R_{UR},g,Q_{eM},Q_{UR}]$, and inputting it into the trained model to obtain $a=[P_{eM},P_{UR},n_{eM},n_{UR}]$.
10. The method of claim 9, wherein the allocating resources for eMBB and URLLC packets of a current mini-slot according to the resource multiplexing decision result comprises:
according to the obtained resource multiplexing decision result of the current mini-slot, $a=[P_{eM},P_{UR},n_{eM},n_{UR}]$, the radio network controller RNC indicates, through the radio resource control RRC sublayer, the powers $P_{UR}$ and $P_{eM}$ allocated to the URLLC and eMBB data packets and the numbers of subcarriers $n_{UR}$ and $n_{eM}$ allocated to the URLLC and eMBB data packets, and indicates the location information of the allocated subcarriers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811383001.3A | 2018-11-20 | 2018-11-20 | URLLC and eMBB resource multiplexing method based on deep reinforcement learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811383001.3A | 2018-11-20 | 2018-11-20 | URLLC and eMBB resource multiplexing method based on deep reinforcement learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN109561504A | 2019-04-02
CN109561504B | 2020-09-01
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811383001.3A | URLLC and eMBB resource multiplexing method based on deep reinforcement learning | 2018-11-20 | 2018-11-20
Country Status (1)
Country | Link |
---|---|
CN | CN109561504B
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108633004A (en) * | 2017-03-17 | 2018-10-09 | 工业和信息化部电信研究院 | URLLC business occupies eMBB service resources and indicates channel indicating means |
CN108811115A (en) * | 2017-05-05 | 2018-11-13 | 北京展讯高科通信技术有限公司 | EMBB business datums seize processing method, device, base station and user equipment |
CN108632861A (en) * | 2018-04-17 | 2018-10-09 | 浙江工业大学 | A kind of mobile edge calculations shunting decision-making technique based on deeply study |
CN108712755A (en) * | 2018-05-18 | 2018-10-26 | 浙江工业大学 | A kind of nonopiate access uplink transmission time optimization method based on deeply study |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111182644B (en) * | 2019-12-24 | 2022-02-08 | 北京邮电大学 | Joint retransmission URLLC resource scheduling method based on deep reinforcement learning |
CN111182644A (en) * | 2019-12-24 | 2020-05-19 | 北京邮电大学 | Joint retransmission URLLC resource scheduling method based on deep reinforcement learning |
CN111556572A (en) * | 2020-04-21 | 2020-08-18 | 北京邮电大学 | Spectrum resource and computing resource joint allocation method based on reinforcement learning |
CN113099460A (en) * | 2021-03-10 | 2021-07-09 | 西安交通大学 | Reservation-based URLLC (Universal resource reservation control) hybrid multiple access transmission optimization method and system during eMBB (enhanced multimedia broadcast/multicast service) coexistence |
CN113453236A (en) * | 2021-06-25 | 2021-09-28 | 西南科技大学 | Frequency resource allocation method for URLLC and eMBB mixed service |
CN113453236B (en) * | 2021-06-25 | 2022-06-21 | 西南科技大学 | Frequency resource allocation method for URLLC and eMBB mixed service |
CN113747450A (en) * | 2021-07-27 | 2021-12-03 | 清华大学 | Service deployment method and device in mobile network and electronic equipment |
CN113691350A (en) * | 2021-08-13 | 2021-11-23 | 北京遥感设备研究所 | eMBB and URLLC joint scheduling method and system |
CN113691350B (en) * | 2021-08-13 | 2023-06-20 | 北京遥感设备研究所 | Combined scheduling method and system of eMBB and URLLC |
CN114143816A (en) * | 2021-12-20 | 2022-03-04 | 国网河南省电力公司信息通信公司 | Dynamic 5G network resource scheduling method based on power service quality guarantee |
CN115439479A (en) * | 2022-11-09 | 2022-12-06 | 北京航空航天大学 | Academic image multiplexing detection method based on reinforcement learning |
CN115439479B (en) * | 2022-11-09 | 2023-02-03 | 北京航空航天大学 | Academic image multiplexing detection method based on reinforcement learning |
CN116234047A (en) * | 2023-03-16 | 2023-06-06 | 华能伊敏煤电有限责任公司 | Mixed service intelligent resource scheduling method based on reinforcement learning algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN109561504B | 2020-09-01
Similar Documents
Publication | Title
---|---
CN109561504B | URLLC and eMBB resource multiplexing method based on deep reinforcement learning
Liang et al. | Deep-learning-based wireless resource allocation with application to vehicular networks
CN111182644B | Joint retransmission URLLC resource scheduling method based on deep reinforcement learning
Liu et al. | A cross-layer scheduling algorithm with QoS support in wireless networks
CN113498076A | O-RAN-based performance optimization configuration method and device
CN114762295A | Machine learning architecture for broadcast and multicast communications
Haque et al. | A survey of scheduling in 5G URLLC and outlook for emerging 6G systems
Angri et al. | Exponential MLWDF (EXP-MLWDF) downlink scheduling algorithm evaluated in LTE for high mobility and dense area scenario
KR20080070387A | Apparatus and method for scheduling in broadband wireless access system
Sopin et al. | LTE network model with signals and random resource requirements
CN112153744B | Physical layer security resource allocation method in ICV network
Saggese et al. | Deep Reinforcement Learning for URLLC data management on top of scheduled eMBB traffic
CN112838911B | Method and apparatus in a node used for wireless communication
Chehri et al. | Real-time multiuser scheduling based on end-user requirement using big data analytics
Ganjalizadeh et al. | Interplay between distributed AI workflow and URLLC
CN112203351A | Method and apparatus in a node used for wireless communication
CN114650606A | Communication equipment, media access control layer architecture and implementation method thereof
KR20240141724A | Higher MAC-A (analog media access control) layer functions for analog transmission protocol stack
Asheralieva et al. | A two-step resource allocation procedure for LTE-based cognitive radio network
US11777866B2 | Systems and methods for intelligent throughput distribution amongst applications of a User Equipment
Ye et al. | Video streaming analysis in Vienna LTE system level simulator
Elmosilhy et al. | Joint Q-Learning based resource allocation and multi-numerology B5G network slicing exploiting LWA technology
US12127014B2 | Protocol stack for analog communication in split architecture network for machine learning (ML) functions
US20230262483A1 | Protocol stack for analog communication in split architecture network for machine learning (ML) functions
US20230239881A1 | Lower analog media access control (MAC-A) layer and physical layer (PHY-A) functions for analog transmission protocol stack
Legal Events
Date | Code | Title
---|---|---
 | PB01 | Publication
 | SE01 | Entry into force of request for substantive examination
 | GR01 | Patent grant