CN111182644A - Joint retransmission URLLC resource scheduling method based on deep reinforcement learning - Google Patents
Joint retransmission URLLC resource scheduling method based on deep reinforcement learning
- Publication number
- CN111182644A CN111182644A CN201911348750.7A CN201911348750A CN111182644A CN 111182644 A CN111182644 A CN 111182644A CN 201911348750 A CN201911348750 A CN 201911348750A CN 111182644 A CN111182644 A CN 111182644A
- Authority
- CN
- China
- Prior art keywords
- urllc
- resource scheduling
- slot
- reinforcement learning
- mini
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/535—Allocation or scheduling criteria for wireless resources based on resource usage policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/12—Wireless traffic scheduling
- H04W72/1263—Mapping of traffic onto schedule, e.g. scheduled allocation or multiplexing of flows
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/56—Allocation or scheduling criteria for wireless resources based on priority criteria
- H04W72/566—Allocation or scheduling criteria for wireless resources based on priority criteria of the information or information source or recipient
- H04W72/569—Allocation or scheduling criteria for wireless resources based on priority criteria of the information or information source or recipient of the traffic information
Abstract
The invention discloses a joint retransmission URLLC resource scheduling method based on deep reinforcement learning, which comprises the following steps: collecting URLLC data packet information and channel information as training data; establishing a joint retransmission URLLC resource scheduling decision model based on deep reinforcement learning and training the model parameters with the training data; performing performance evaluation on the obtained model until the performance requirements are met; collecting the URLLC data packet information and channel information of the current mini-slot; inputting the obtained information into the trained model to obtain a resource scheduling decision result; and performing resource allocation for the URLLC data packet according to the decision result. The method trains on URLLC data packet information and channel state information with deep reinforcement learning to obtain a URLLC scheduling decision, allocates scheduling resources reasonably according to that decision, and alleviates the waste of power and time-frequency resources while meeting the URLLC transmission requirements.
Description
Technical Field
The invention relates to the field of wireless communication, in particular to a combined retransmission URLLC resource scheduling method based on deep reinforcement learning.
Background
To meet the requirements of different service scenarios on delay, reliability, mobility and the like, in 2015 the ITU formally defined three major scenarios for the future 5G network: enhanced mobile broadband (eMBB), massive machine-type communication (mMTC), and ultra-reliable low-latency communication (URLLC). The eMBB scenario builds on the existing mobile broadband service scenario and pursues a consistent person-to-person communication experience by further improving user-experienced performance. mMTC and URLLC are both application scenarios of the Internet of Things, but with different emphases: mMTC mainly concerns information interaction between people and things, while URLLC mainly reflects communication requirements between things.
In the prior art, URLLC, with its low-delay and high-reliability transmission requirements, is widely applied in emerging fields such as remote control and intelligent driving, and is a key direction of 5G research; the study of URLLC scenario services is therefore a current hot topic. To meet the low-delay requirement of URLLC, one approach adopts a 60 kHz subcarrier spacing to shorten the slot length to 1/4 of that of LTE; to further reduce the slot length, URLLC adopts 4 symbols as a mini-slot, reducing the slot length to 1/14 of that of LTE. However, once the mini-slot transmission mode is adopted, demodulation failure of URLLC service data brings a large retransmission overhead, which challenges the low-delay requirement of URLLC.
For example, the invention patent with chinese patent publication No. CN109561504A discloses a resource multiplexing method of URLLC and eMBB based on deep reinforcement learning, which collects data packet information, channel information and queue information of M mini-slot URLLC and eMBB as training data; establishing a URLLC and eMB resource multiplexing model based on deep reinforcement learning, and training model parameters by using training data; performing performance evaluation on the trained model until the performance requirement is met; collecting current mini-slot URLLC and eMBB data packet information, channel information and queue information, inputting the collected information into a trained model, and obtaining a resource multiplexing decision result; and according to the resource multiplexing decision result, performing resource allocation on the eMBB and URLLC data packets of the current mini-slot. The reasonable distribution and utilization of time-frequency resources and power under the transmission requirements of eMBB and URLLC data packets can be met.
The prior art has at least the following problem: if the joint transmission mode of multiple redundant copies is adopted, the limited time-frequency resources are seriously wasted. How to schedule URLLC services within limited resources, meeting the URLLC transmission requirements while realizing efficient utilization of resources, is therefore an urgent problem for which no effective solution has yet been proposed.
Disclosure of Invention
The invention aims to provide a combined retransmission URLLC resource scheduling method based on deep reinforcement learning aiming at the defects of the prior art.
The method comprises the following operation steps:
step 1.1, obtaining the downlink channel gain g_k of the current mini-slot through the Channel Quality Indication (CQI) information periodically uploaded by the UE (User Equipment);
step 1.2, the base station packages the URLLC service in the service queue, generates the data packet sent by the k-th mini-slot URLLC service, and obtains the bit number N_k of the URLLC data packet;
step 1.3, packaging the obtained information into the state information s_k = [g_k, N_k, Q_k^1, ..., Q_k^M], wherein Q_k^M represents the M-th queue length of the URLLC data packet of the k-th mini-slot;
step 2, establishing a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and training model parameters by using training data, wherein the method specifically comprises the following steps:
step 2.1, a neural network in a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning is constructed and initialized, and the specific steps are as follows:
step 2.1.1, setting the action vector space a = [bool, R_1, R_2, ..., R_M], wherein bool represents the transmission mode of the URLLC service in the current mini-slot (1 represents redundancy-version transmission, 0 represents single-link transmission), and R_M represents the bit number of the M-th queue processed by the current mini-slot;
step 2.1.2, constructing two identical neural networks, eval and next, wherein the eval neural network is used for obtaining the action valuation function Q of the current state and selecting the action a, and the next neural network selects the maximum action valuation function argmax_a Q' of the next state to calculate the target action valuation function Q_target, which is used for updating the eval neural network parameters;
step 2.1.3, setting the parameters of the eval neural network as C = [n, n_h, n_in, n_out, theta, bias, activate], wherein n denotes the number of hidden layers of the neural network, n_h = [n_h1, n_h2, ..., n_hn] denotes the number of neurons contained in each hidden layer, n_in is the number of input-layer neurons and equals the length of the state vector s, n_out denotes the number of output-layer neurons and equals the number of all possible values of the action vector a, theta denotes the weights and is randomly initialized in [0, w], bias denotes the bias and is initialized to b, and activate denotes the activation function, for which a ReLU (Rectified Linear Unit) is adopted;
step 2.1.4, initializing the next neural network parameters as C' = C;
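A minimal sketch of steps 2.1.1–2.1.4 in Python with NumPy: building the eval network from the parameter tuple C = [n, n_h, n_in, n_out, theta, bias, activate] and copying its parameters to the next network. The layer sizes and the uniform [0, w] weight initialization below are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def init_network(n, n_h, n_in, n_out, w=0.1, b=0.0, seed=0):
    """Build weight/bias lists for a fully connected Q-network.
    Layer sizes: n_in -> n_h[0] -> ... -> n_h[n-1] -> n_out."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + list(n_h) + [n_out]
    weights = [rng.uniform(0.0, w, size=(sizes[i], sizes[i + 1]))
               for i in range(len(sizes) - 1)]
    biases = [np.full(sizes[i + 1], b) for i in range(len(sizes) - 1)]
    return {"theta": weights, "bias": biases}

def relu(x):
    return np.maximum(x, 0.0)

def forward(net, s):
    """Q-values for state s: ReLU hidden layers, linear output."""
    h = np.asarray(s, dtype=float)
    for W, bvec in zip(net["theta"][:-1], net["bias"][:-1]):
        h = relu(h @ W + bvec)
    return h @ net["theta"][-1] + net["bias"][-1]

# Steps 2.1.3/2.1.4: build the eval net, then copy C' = C to the next net.
eval_net = init_network(n=2, n_h=[16, 16], n_in=6, n_out=8)
next_net = {"theta": [W.copy() for W in eval_net["theta"]],
            "bias": [bvec.copy() for bvec in eval_net["bias"]]}
```

Here n_in would equal the length of the state vector s_k and n_out the number of possible action vectors a; both are made-up sizes for illustration.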
step 2.2, inputting the training data into the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning and training the model parameters; taking the data of the k-th mini-slot in the training data as an example, the specific steps are as follows:
step 2.2.1, inputting the data s_k of the k-th mini-slot into the eval neural network of the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and calculating the i-th queue length of the URLLC according to the following formula (1):
in the formula: z represents the corresponding mini-slot number between retransmission intervals;
step 2.2.2, setting the probability epsilon_a; with probability epsilon_a, randomly selecting an action a_k from the action pool, and with probability (1 - epsilon_a), selecting the action a_k = argmax_a Q(s, a; theta) from the eval neural network;
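The epsilon-greedy rule of step 2.2.2 can be sketched as follows; the Q-values and the epsilon_a settings are hypothetical:

```python
import numpy as np

def epsilon_greedy(q_values, eps_a, rng):
    """With probability eps_a pick a random action index from the pool,
    otherwise pick argmax_a Q(s, a; theta)."""
    if rng.random() < eps_a:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(42)
q = np.array([0.1, 0.7, 0.3])          # hypothetical Q-values for 3 actions
greedy = epsilon_greedy(q, eps_a=0.0, rng=rng)   # eps_a = 0 always exploits
```

In practice epsilon_a is usually annealed toward zero over training so that exploration dominates early and exploitation dominates late.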
step 2.2.3, calculating the reward r_k obtained by the action a_k selected in step 2.2.2 and the next state s_{k+1} reached; according to the selected action a_k = [bool_k, R_1^k, R_2^k, ..., R_M^k], calculating the signal-to-noise ratio of the k-th mini-slot according to the following formula (2):
in the formula: sigma^2 represents the Gaussian noise power, and p_k represents the power allocated for the k-th mini-slot;
when bool is 0, single-link transmission is used, and there is:
when bool is 1, the redundancy-copy transmission mode is adopted, and in this case, there is:
for URLLC traffic, its transmission rate is calculated according to the following equation (3):
calculating the transmission error rate of the URLLC data at the kth mini-slot according to the following formula (4):
the queue length of URLLC is calculated according to equation (5) below:
wherein: z represents the corresponding mini-slot number between retransmission intervals;
calculating the time required for the current arrival service to be transmitted according to the following formula (6):
wherein: count (x) represents the number of retransmissions required when x is zero;
the reward obtained by the action a_k is calculated according to the following equation (7):
wherein: p represents the transmitting power of URLLC, Q represents the queue length of URLLC, flag represents the retransmission, and omega1、ω2、ω3、 ω4Are constants, and the motion estimation function Q is calculated from Bellman Equation (Bellman Equation):
Q(sk,ak)=E[rk+1+λrk+2+λ2rk+3+...|sk,ak]
=E[rk+λQ(sk+1,ak+1)|sk,ak],
namely: the current Q value equals to take action akThe prize r earnedkAdding the Q value of the next state, and calculating the parameter value of the next state according to the formula
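For a one-step DQN update, the Bellman equation above reduces to the target Q_target = r_k + lambda * max_a' Q'(s_{k+1}, a'). A small sketch (the reward and Q'-values are made up for illustration):

```python
import numpy as np

def q_target(r_k, q_next, lam, terminal=False):
    """Q_target = r_k + lambda * max_a' Q'(s_{k+1}, a');
    no bootstrapping when s_{k+1} is terminal."""
    if terminal:
        return r_k
    return r_k + lam * float(np.max(q_next))

# Hypothetical reward and next-state Q'-values from the next network.
target = q_target(r_k=1.0, q_next=np.array([0.2, 0.5]), lam=0.9)  # 1.45
```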
step 2.2.4, storing the (s_k, a_k, r_k, s_{k+1}) obtained in step 2.2.3 into the memory unit D for the next training of the model;
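The memory unit D of step 2.2.4 is, in DQN terms, an experience-replay buffer; a minimal sketch, with the capacity and transition contents chosen only for illustration:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size memory unit D storing (s_k, a_k, r_k, s_{k+1}) transitions."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, f):
        # A random minibatch of F transitions breaks sample correlation.
        return random.sample(list(self.buffer), f)

memory = ReplayMemory(capacity=1000)
for k in range(5):
    memory.store((f"s{k}", k % 2, float(k), f"s{k+1}"))  # dummy transitions
batch = memory.sample(3)
```

The bounded deque silently discards the oldest transitions once the capacity is reached, matching the usual fixed-size replay design.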
step 2.2.5, randomly taking F samples from the memory unit D, and inputting s_{k+1} into the next neural network to obtain the maximum action valuation function argmax_a Q';
step 2.2.6, calculating the loss function according to the following equation (8):
Loss = (Q_target - Q(s_k, a_k; theta))^2
wherein: Q represents the action valuation function, theta represents the weight parameters of the current neural network, and gamma represents the discount factor;
step 2.2.7, updating eval neural network weight parameters by adopting a gradient descent method, and calculating the gradient according to the following formula (9):
according to the calculated gradient, selecting the direction with the fastest gradient decrease to update the weight parameter theta;
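A toy illustration of steps 2.2.6–2.2.7: the squared TD error of equation (8) and one gradient-descent step on theta. A scalar linear model stands in for the eval network here, so all numbers are purely illustrative:

```python
def loss_and_grad(q_target, q_pred, theta, s_feature):
    """Squared TD error and its gradient w.r.t. theta for a linear
    stand-in model q_pred = theta * s_feature."""
    err = q_target - q_pred
    loss = err ** 2
    grad = -2.0 * err * s_feature   # dLoss/dtheta
    return loss, grad

theta = 0.5
s_feature = 2.0
q_pred = theta * s_feature                     # 1.0
loss, grad = loss_and_grad(1.8, q_pred, theta, s_feature)
theta = theta - 0.1 * grad                     # one gradient-descent step
```

With a real multi-layer network the same update is applied per weight via backpropagation; the learning rate 0.1 is arbitrary.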
step 2.3, every I iterations, setting theta' = theta to copy the eval neural network parameters into the next neural network, thereby updating the next neural network;
step 2.4, repeating the steps 2.2-2.3 to train the model continuously until the loss function is converged;
step 3, performing performance evaluation on the obtained combined retransmission URLLC resource scheduling decision model of deep reinforcement learning until the performance requirements are met; the specific steps are as follows:
step 3.1, continuing to use the data obtained in step 1 as the state vector s_k and inputting it into the trained DQN (Deep Q-Network) model obtained in step 2 to obtain the resource scheduling decision result;
step 3.2, performing resource allocation on the URLLC data packet according to the decision result obtained in step 3.1; when the allocation result satisfies that the transmission delay of the URLLC is less than p_l and the transmission error rate is less than p_e, the performance evaluation process is completed and step 4 is performed; if the requirements are not met, returning to step 2 and continuing to train the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning until the performance requirements are met;
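The evaluation gate of step 3.2 can be sketched as a simple predicate; the threshold values for p_l and p_e below are hypothetical, not ones fixed by the patent:

```python
def meets_urllc_requirements(delay, error_rate, p_l, p_e):
    """Step 3.2 gate: pass only if delay < p_l and error rate < p_e."""
    return delay < p_l and error_rate < p_e

# Hypothetical 1 ms delay budget and 1e-5 error-rate budget.
ok = meets_urllc_requirements(delay=0.5e-3, error_rate=1e-6,
                              p_l=1e-3, p_e=1e-5)       # passes
bad = meets_urllc_requirements(delay=2e-3, error_rate=1e-6,
                               p_l=1e-3, p_e=1e-5)      # fails on delay
```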
step 4, collecting the URLLC data packet information and channel information of the current mini-slot; the specific steps are as follows:
step 4.1, acquiring the size of a data packet encapsulated by the current mini-slot base station for the incoming URLLC service from the base station side;
step 4.2, acquiring the current mini-slot downlink channel gain g through the CQI information periodically uploaded by the UE;
step 5, inputting the URLLC data packet information and the channel information obtained in step 4 into the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning: the obtained current state information and queue length information are combined into the state vector s_k and input into the trained combined retransmission URLLC resource scheduling decision model to obtain the resource scheduling decision result;
step 6, performing resource allocation for the URLLC data packet according to the resource scheduling decision result; the specific steps are as follows:
step 6.1, according to the URLLC resource scheduling result obtained in step 5, the RNC (Radio Network Controller) indicates, through the Radio Resource Control (RRC) sublayer, the power allocated to the URLLC and the number of transmission bits allocated to the URLLC data packet;
step 6.2, the single-link or multi-link transmission mode to be adopted by the current mini-slot of the URLLC is notified immediately by configuring the downlink DCI signaling PI (Pre-Indication), thereby realizing reasonable allocation of the time-frequency resources and power of the URLLC data packet service and efficient utilization of the limited time-frequency resources.
Compared with the prior art, the method has the following remarkable advantages:
1. The URLLC data packet information and channel state information are trained by the deep reinforcement learning method to obtain the resource scheduling decision result, thereby realizing reasonable allocation and utilization of time-frequency resources and power while meeting the transmission requirements of the URLLC data packet.
2. By transmitting in a multi-queue mode, transmission priorities can be set for different queues, the retransmission queues can be transmitted more flexibly, and on-demand transmission is realized to reduce the retransmission delay.
Drawings
Fig. 1 is a flowchart of a URLLC resource scheduling method based on deep reinforcement learning in the present invention;
fig. 2 is a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Referring to fig. 1, the method comprises the steps of:
specifically, the base station obtains the bit number of a URLLC data packet arriving at M mini-slots and the gain of a corresponding channel, and takes the data packet information and channel information of the kth mini-slot as training data, and the specific steps are as follows:
step 1.1, acquiring the downlink channel gain g_k of the current mini-slot through the CQI information periodically uploaded by the UE;
step 1.2, the base station packages the URLLC service in the service queue, generates the data packet sent by the k-th mini-slot URLLC service, and obtains the bit number N_k of the URLLC data packet;
step 1.3, packaging the obtained information into the state information s_k = [g_k, N_k, Q_k^1, ..., Q_k^M], wherein Q_k^M represents the M-th queue length of the URLLC data packet of the k-th mini-slot;
step 2, establishing a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and training model parameters by using training data, wherein the method specifically comprises the following steps:
step 2.1, a neural network in a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning is constructed and initialized, and the specific steps are as follows:
step 2.1.1, setting the action vector space a = [bool, R_1, R_2, ..., R_M], wherein bool represents the transmission mode of the URLLC service in the current mini-slot (1 represents redundancy-version transmission, 0 represents single-link transmission), and R_M represents the bit number of the M-th queue processed by the current mini-slot;
step 2.1.2, constructing two identical neural networks, eval and next, wherein the eval neural network is used for obtaining the action valuation function Q of the current state and selecting the action a, and the next neural network selects the maximum action valuation function argmax_a Q' of the next state to calculate the target action valuation function Q_target, which is used for updating the eval neural network parameters;
step 2.1.3, setting the parameters of the eval neural network as C = [n, n_h, n_in, n_out, theta, bias, activate], wherein n denotes the number of hidden layers of the neural network, n_h = [n_h1, n_h2, ..., n_hn] denotes the number of neurons contained in each hidden layer, n_in is the number of input-layer neurons and equals the length of the state vector s, n_out denotes the number of output-layer neurons and equals the number of all possible values of the action vector a, theta denotes the weights and is randomly initialized in [0, w], bias denotes the bias and is initialized to b, and activate denotes the activation function, for which a ReLU (Rectified Linear Unit) is adopted;
step 2.1.4, initializing the next neural network parameters as C' = C;
step 2.2, inputting the training data into the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning and training the model parameters; taking the data of the k-th mini-slot in the training data as an example, the training process is as follows:
step 2.2.1, inputting the data s_k of the k-th mini-slot into the eval neural network of the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and calculating the i-th queue length of the URLLC according to the following formula (1):
in the formula: z represents the corresponding mini-slot number between retransmission intervals;
step 2.2.2, setting the probability epsilon_a; with probability epsilon_a, randomly selecting an action a_k from the action pool, and with probability (1 - epsilon_a), selecting the action a_k = argmax_a Q(s, a; theta) from the eval neural network;
step 2.2.3, calculating the reward r_k obtained by the action a_k selected in step 2.2.2 and the next state s_{k+1} reached; specifically, according to the selected action a_k = [bool_k, R_1^k, R_2^k, ..., R_M^k], calculating the signal-to-noise ratio of the k-th mini-slot according to the following formula (2):
in the formula: sigma^2 represents the Gaussian noise power, and p_k represents the power allocated for the k-th mini-slot;
when bool is 0, single-link transmission is used, and there is:
when bool is 1, the redundancy-copy transmission mode is adopted, and in this case, there is:
for URLLC traffic, its transmission rate is calculated according to the following equation (3):
calculating the transmission error rate of the URLLC data at the kth mini-slot according to the following formula (4):
the queue length of URLLC is calculated according to equation (5) below:
wherein: z represents the corresponding number of mini-slots between retransmission intervals,
calculating the time required for the current arrival service to be transmitted according to the following formula (6):
wherein: count (x) indicates the number of retransmissions required when x is zero,
therefore, the reward obtained by the action a_k is calculated according to the following formula (7):
wherein: p represents the transmitting power of URLLC, Q represents the queue length of URLLC, flag represents the retransmission, and omega1、ω2、ω3、 ω4All are constants, and according to the Bellman equation, the motion estimation function Q is calculated:
Q(sk,ak)=E[rk+1+λrk+2+λ2rk+3+...|sk,ak]
=E[rk+λQ(sk+1,ak+1)|sk,ak],
i.e. the current Q value equals to take action akThe prize r earnedkAdding the Q value of the next state, and calculating the parameter value of the next state according to the formula
step 2.2.4, storing the (s_k, a_k, r_k, s_{k+1}) obtained in step 2.2.3 into the memory unit D for the next training of the model;
step 2.2.5, in order to solve the problems of correlation and non-stationary distribution between samples, randomly taking F samples from the memory unit D and inputting s_{k+1} into the next neural network to obtain the maximum action valuation function argmax_a Q';
step 2.2.6, the loss function is calculated according to the following equation (8):
Loss = (Q_target - Q(s_k, a_k; theta))^2
wherein: Q represents the action valuation function, theta represents the weight parameters of the current neural network, and gamma represents the discount factor;
step 2.2.7, updating eval neural network weight parameters by adopting a gradient descent method, and calculating the gradient according to the following formula (9):
according to the calculated gradient, selecting the direction with the fastest gradient decrease to update the weight parameter theta;
step 2.3, every I iterations, setting theta' = theta to copy the eval neural network parameters into the next neural network, thereby updating the next neural network;
step 2.4, repeating the steps 2.2-2.3 to train the model continuously until the loss function is converged;
step 3, performing performance evaluation on the obtained combined retransmission URLLC resource scheduling decision model of deep reinforcement learning until the performance requirements are met; the specific steps are as follows:
step 3.1, continuing to use the data obtained in step 1 as the state vector s_k and inputting it into the trained DQN model obtained in step 2 to obtain the resource scheduling decision result;
step 3.2, performing resource allocation on the URLLC data packet according to the decision result obtained in step 3.1; when the allocation result satisfies that the transmission delay of the URLLC is less than p_l and the transmission error rate is less than p_e, the performance evaluation process is completed and step 4 is performed; if the requirements are not met, returning to step 2 and continuing to train the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning until the performance requirements are met;
step 4, collecting the URLLC data packet information and channel information of the current mini-slot; the specific steps are as follows:
step 4.1, acquiring the size of a data packet encapsulated by the current mini-slot base station for the incoming URLLC service from the base station side;
step 4.2, acquiring the current mini-slot downlink channel gain g through the CQI information periodically uploaded by the UE;
step 5, inputting the URLLC data packet information and the channel information obtained in the step 4 into a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning to obtain a resource scheduling decision result;
in particular, the obtained current state information and queue length information are combined into the state vector s_k and input into the trained combined retransmission URLLC resource scheduling decision model to obtain the resource scheduling decision result;
step 6, performing resource allocation for the URLLC data packet according to the resource scheduling decision result; the specific steps are as follows:
step 6.1, according to the URLLC resource scheduling result obtained in step 5, the RNC indicates, through the RRC sublayer, the power allocated to the URLLC and the number of transmission bits allocated to the URLLC data packet;
step 6.2, the single-link or multi-link transmission mode to be adopted by the current mini-slot of the URLLC is notified by configuring the downlink DCI signaling PI, thereby realizing reasonable allocation of the time-frequency resources and power of the URLLC data packet service and efficient utilization of the limited time-frequency resources.
Referring to fig. 2, a deep reinforcement learning-based joint retransmission URLLC resource scheduling decision model proposed in the present invention is specifically described.
Specifically, in order to meet the low-delay requirement of URLLC, a 60 kHz subcarrier spacing is used to shorten the slot length to 1/4 of that of LTE; to further reduce the slot length, URLLC adopts 4 symbols as a mini-slot, reducing the slot length to 1/14 of one LTE TTI, and uses one mini-slot as one TTI for transmission. M queues are constructed: a newly arriving URLLC service first enters queue 1 for transmission, where it is transmitted successfully with probability P_{1,s} and fails with probability P_{1,f}, in which case it proceeds to queue 2, and so on, until after M-1 failures it reaches queue M and is discarded. Transmitting in a multi-queue mode allows transmission priorities to be set for different queues, makes retransmission queues more flexible, and realizes on-demand transmission to reduce the retransmission delay.
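The multi-queue retransmission scheme above can be sketched as a small simulation with M = 4 queues; the per-queue success probabilities stand in for P_{i,s} and are purely hypothetical:

```python
import random

def transmit_with_queues(m, p_success, rng):
    """A packet enters queue 1; each failure moves it to the next queue,
    and after M-1 failures it reaches queue M and is discarded.
    p_success[i] is the hypothetical success probability of queue i+1."""
    for queue in range(1, m):                 # queues 1 .. M-1 attempt transmission
        if rng.random() < p_success[queue - 1]:
            return ("delivered", queue)
    return ("discarded", m)                   # queue M: drop the packet

rng = random.Random(7)
outcomes = [transmit_with_queues(4, [0.9, 0.9, 0.9], rng) for _ in range(1000)]
delivered = sum(1 for outcome, _ in outcomes if outcome == "delivered")
```

With three 0.9-probability attempts, the overall loss probability is 0.1^3 = 0.001, illustrating how chained retransmission queues trade extra delay for reliability.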
The above description is only for the preferred embodiment of the present invention and should not be construed as limiting the present invention, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the present invention, and any modifications, equivalents, improvements, etc. made therein are intended to be included within the scope of the appended claims.
Claims (6)
1. A URLLC resource scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
step 1, collecting data packet information and channel information of URLLC as training data, acquiring the bit number of URLLC data packets arriving at M mini-slots and the gain of a corresponding channel by a base station, and taking the data packet information and the channel information of the kth mini-slot as the training data;
step 2, establishing a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and training model parameters by using training data;
step 3, performing performance evaluation on the obtained combined retransmission URLLC resource scheduling decision model of the deep reinforcement learning until the performance requirement is met;
step 4, collecting URLLC data packet information and channel information of the current mini-slot;
step 5, inputting the URLLC data packet information and the channel information obtained in step 4 into the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning: the obtained current state information and queue length information are combined into the state vector s_k and input into the trained combined retransmission URLLC resource scheduling decision model to obtain the resource scheduling decision result;
step 6, performing resource allocation for the URLLC data packet according to the resource scheduling decision result.
2. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 1, the following steps are included:
step 1.1, acquiring the downlink channel gain g_k of the current mini-slot through the CQI information periodically uploaded by the UE;
step 1.2, the base station packages the URLLC service in the service queue, generates the data packet sent by the k-th mini-slot URLLC service, and obtains the bit number N_k of the URLLC data packet;
3. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 2, the following steps are included:
step 2.1, a neural network in a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning is constructed and initialized, and the specific steps are as follows:
step 2.1.1, setting the action vector space a = [bool, R_1, R_2, ..., R_M], wherein bool represents the transmission mode of the URLLC service in the current mini-slot, 1 representing redundancy-version transmission and 0 representing single-link transmission, and R_M represents the number of bits of the Mth queue processed by the current mini-slot;
step 2.1.2, constructing two identical neural networks, eval and next, wherein the eval neural network is used to obtain the action evaluation function Q of the current state and to select the action a, and the next neural network selects the maximum action evaluation function of the next state, argmax_a Q', to calculate the target action evaluation function Q_target, which is used to complete the update of the eval neural network parameters;
step 2.1.3, setting the parameters of the eval neural network as C = [n, n_h, n_in, n_out, θ, bias, activate], wherein n denotes the number of hidden layers of the neural network, n_h = [n_h1, n_h2, ..., n_hn] denotes the number of neurons contained in each hidden layer, n_in is the number of input-layer neurons and equals the length of the state vector s, n_out denotes the number of output-layer neurons and equals the number of all possible values of the action vector a, θ denotes the weights and is randomly initialized in [0, w], bias denotes the biases and is initialized to b, and activate denotes the activation function, for which ReLU is adopted;
step 2.1.4, initializing the next neural network parameters C' = C;
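Steps 2.1.2 to 2.1.4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the layer sizes, the initialization bounds w and b, and all function names are assumptions.

```python
import numpy as np

def build_network(n_in, hidden, n_out, w=0.1, b=0.01, rng=None):
    """Build a ReLU-MLP parameter set C: weights uniform in [0, w), biases = b."""
    rng = rng or np.random.default_rng(0)
    sizes = [n_in] + list(hidden) + [n_out]
    return {
        "W": [rng.uniform(0, w, (sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)],
        "b": [np.full(sizes[i + 1], b) for i in range(len(sizes) - 1)],
    }

def copy_network(net):
    """Step 2.1.4: initialize the next (target) network as an exact copy, C' = C."""
    return {"W": [w.copy() for w in net["W"]], "b": [bb.copy() for bb in net["b"]]}

def forward(net, s):
    """Q-values for state vector s: ReLU on hidden layers, linear output layer."""
    x = np.asarray(s, dtype=float)
    for i, (W, bb) in enumerate(zip(net["W"], net["b"])):
        x = x @ W + bb
        if i < len(net["W"]) - 1:  # ReLU on all layers except the output
            x = np.maximum(x, 0.0)
    return x

# Illustrative sizes: state vector of length 4, two hidden layers, 8 actions.
eval_net = build_network(n_in=4, hidden=[16, 16], n_out=8)
next_net = copy_network(eval_net)
```

The eval network produces Q(s, a; θ) for action selection; the next network, refreshed only periodically (step 2.3), supplies the target values.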
step 2.2, inputting the training data into the joint retransmission URLLC resource scheduling decision model based on deep reinforcement learning and training the model parameters; taking the data of the kth mini-slot in the training data as an example, the specific steps are as follows:
step 2.2.1, inputting the data of the kth mini-slot into the eval neural network of the joint retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and calculating the ith queue length of the URLLC according to the following formula (1):
in the formula, z represents the number of mini-slots corresponding to the retransmission interval;
step 2.2.2, setting the probability ε_a: with probability ε_a, randomly selecting an action a_k from the action pool, and with probability (1 − ε_a), selecting from the eval neural network the action a_k satisfying argmax_a Q(s, a; θ);
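The ε-greedy selection of step 2.2.2 can be sketched as follows; the function name and the use of a numpy random generator are illustrative assumptions.

```python
import numpy as np

def epsilon_greedy(q_values, eps, rng):
    """Step 2.2.2: with probability eps pick a random action index from the
    action pool, otherwise pick argmax_a Q(s, a; theta)."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

During training, eps is typically decayed over time so that exploration gives way to exploitation, although the patent text does not specify a schedule.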
step 2.2.3, calculating the reward r_k obtained by the action a_k selected in step 2.2.2 and the next state s_{k+1} reached; according to the selected action a_k = [bool_k, R_1^k, R_2^k, ..., R_M^k], calculating the signal-to-noise ratio of the kth mini-slot according to the following formula (2):
in the formula, σ² represents the Gaussian noise power, and p_k represents the power allocated for the kth mini-slot;
when bool = 0, single-link transmission is adopted, and there is:
when bool = 1, the redundancy-version (duplicate) transmission mode is adopted, and there is:
for the URLLC service, its transmission rate is calculated according to the following formula (3):
calculating the transmission error rate of the URLLC data at the kth mini-slot according to the following formula (4):
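The equation images for formulas (2) and (3) are not reproduced in this text, so the sketch below substitutes standard stand-ins: SNR = p_k·g_k/σ², assumed to double under redundancy-version (dual-link) transmission, and the Shannon rate R = B·log2(1 + SNR). The patent's actual formulas may differ.

```python
import math

def snr(p_k, g_k, noise_power, dual_link=False):
    """Assumed form of formula (2): received SNR for the kth mini-slot.
    With bool = 1 (redundancy version) the two links are assumed to combine,
    doubling the effective SNR."""
    s = p_k * g_k / noise_power
    return 2 * s if dual_link else s

def rate_bps(bandwidth_hz, snr_linear):
    """Assumed form of formula (3): Shannon rate in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)
```

For short URLLC packets, real systems would use a finite-blocklength correction to the Shannon rate when computing the error rate of formula (4); that correction is omitted here.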
the queue length of URLLC is calculated according to equation (5) below:
wherein z represents the number of mini-slots corresponding to the retransmission interval;
calculating the time required for the current arrival service to be transmitted according to the following formula (6):
wherein count(x) represents the number of retransmissions required when x is zero;
the reward obtained by the action a_k is calculated according to the following formula (7):
wherein P represents the transmitting power of the URLLC, Q represents the queue length of the URLLC, flag indicates whether retransmission occurs, and ω_1, ω_2, ω_3, ω_4 are all constants; according to the Bellman equation, the action evaluation function Q is calculated:
Q(s_k, a_k) = E[r_{k+1} + λ·r_{k+2} + λ²·r_{k+3} + ... | s_k, a_k]
            = E[r_k + λ·Q(s_{k+1}, a_{k+1}) | s_k, a_k]
that is, the current Q value equals the reward r_k obtained by taking action a_k plus the discounted Q value of the next state, and the Q value of the next state is calculated according to the same formula;
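The Bellman recursion above can be checked numerically: the discounted return of a reward sequence equals the first reward plus λ times the discounted return from the next state onward. The reward values below are arbitrary test data, not from the patent.

```python
# Arbitrary reward sequence r_k, r_{k+1}, ... and discount factor lambda.
rewards = [1.0, 0.5, 0.25, 0.0]
lam = 0.9

def discounted_return(rs, lam):
    """Sum of lam**i * r_i over the sequence: the left-hand side of the Bellman form."""
    total = 0.0
    for i, r in enumerate(rs):
        total += (lam ** i) * r
    return total

# Direct discounted sum vs. one step of the Bellman recursion:
q_k = discounted_return(rewards, lam)
q_k_recursive = rewards[0] + lam * discounted_return(rewards[1:], lam)
```

Both evaluations give 1 + 0.9·0.5 + 0.81·0.25 = 1.6525, confirming the one-step decomposition used to define Q(s_k, a_k).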
step 2.2.4, storing the tuple (s_k, a_k, r_k, s_{k+1}) obtained in step 2.2.3 into the memory unit D for the subsequent training of the model;
step 2.2.5, in order to solve the problems of correlation and non-stationary distribution between samples, randomly taking F samples out of the memory unit D, and inputting s_{k+1} into the next neural network to obtain the maximum action evaluation function;
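Steps 2.2.4 and 2.2.5 amount to a standard experience-replay memory; a minimal sketch follows, in which the capacity and the class name are illustrative assumptions.

```python
import random
from collections import deque

class ReplayMemory:
    """Memory unit D: stores (s_k, a_k, r_k, s_{k+1}) transitions and samples
    F of them uniformly at random, breaking the temporal correlation between
    consecutive mini-slots."""

    def __init__(self, capacity=10000):
        # deque with maxlen silently evicts the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, f):
        # F transitions drawn without replacement.
        return random.sample(self.buffer, f)
```

Uniform sampling from D is what decorrelates the minibatches; more elaborate schemes such as prioritized replay exist but are not described in the patent text.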
Step 2.2.6, the loss function is calculated according to the following equation (8):
wherein: q represents an action valuation function, theta represents a weight parameter of the current neural network, and gamma represents a discount factor;
step 2.2.7, updating eval neural network weight parameters by adopting a gradient descent method, and calculating the gradient according to the following formula (9):
according to the calculated gradient, updating the weight parameters θ along the direction of steepest descent;
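Steps 2.2.6 and 2.2.7 correspond to the standard DQN loss and gradient step. Since the images for formulas (8) and (9) are not reproduced here, the sketch below assumes the usual form L(θ) = E[(r + γ·max_a' Q(s', a'; θ') − Q(s, a; θ))²], and uses a linear Q(s, a; θ) = θ[a]·s purely for illustration.

```python
import numpy as np

def q_values(theta, s):
    """Illustrative linear Q-function: one row of theta per action."""
    return theta @ s

def dqn_loss_and_grad(theta, theta_next, batch, gamma):
    """Assumed formulas (8)-(9): mean squared TD error over a minibatch of F
    transitions, and its gradient with respect to theta."""
    grad = np.zeros_like(theta)
    loss = 0.0
    for s, a, r, s_next in batch:
        # Q_target from the next (target) network: r + gamma * max_a' Q'(s', a')
        target = r + gamma * np.max(q_values(theta_next, s_next))
        err = q_values(theta, s)[a] - target  # TD error
        loss += err ** 2
        grad[a] += 2 * err * s                # d(err^2)/d(theta[a])
    n = len(batch)
    return loss / n, grad / n
```

One gradient-descent step, theta ← theta − lr·grad, reduces the loss on the same batch, which is the update of step 2.2.7.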
step 2.3, every I iterations, setting θ' = θ so as to update the next neural network with the eval neural network parameters;
and 2.4, repeating steps 2.2 to 2.3 to train the model continuously until the loss function converges.
4. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 3, the method further includes the following steps:
step 3.1, continuing to use the data obtained in step 1 as the state vector and inputting it into the trained DQN model obtained in step 2 to obtain the resource scheduling decision result;
step 3.2, performing resource allocation on the URLLC data packet according to the decision result obtained in step 3.1; when the allocation result satisfies that the transmission delay of the URLLC is less than p_l and the transmission error rate is less than p_e, the performance requirement is met and step 4 is carried out; otherwise, returning to step 2 and continuing the training of the joint retransmission URLLC resource scheduling decision model based on deep reinforcement learning until the performance requirement is met.
5. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 4, the following steps are included:
step 4.1, acquiring the size of a data packet encapsulated by the current mini-slot base station for the incoming URLLC service from the base station side;
and 4.2, acquiring the current mini-slot downlink channel gain g through the CQI information periodically uploaded by the UE.
6. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 6, the following steps are included:
step 6.1, according to the URLLC resource scheduling result obtained in step 5, the RNC indicates, through the RRC sublayer, the power allocated to the URLLC and the number of transmission bits allocated to the URLLC data packet;
and 6.2, by configuring the downlink DCI signaling PI, immediately informing whether the current mini-slot of the URLLC needs to adopt the single-link or the multi-link transmission mode, thereby realizing the allocation of time-frequency domain resources and power for the URLLC data packet service.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911348750.7A CN111182644B (en) | 2019-12-24 | 2019-12-24 | Joint retransmission URLLC resource scheduling method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111182644A true CN111182644A (en) | 2020-05-19 |
CN111182644B CN111182644B (en) | 2022-02-08 |
Family
ID=70657926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911348750.7A Active CN111182644B (en) | 2019-12-24 | 2019-12-24 | Joint retransmission URLLC resource scheduling method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111182644B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112188539A (en) * | 2020-10-10 | 2021-01-05 | 南京理工大学 | Interference cancellation scheduling code design method based on deep reinforcement learning |
CN112261725A (en) * | 2020-10-23 | 2021-01-22 | 安徽理工大学 | Data packet transmission intelligent decision method based on deep reinforcement learning |
CN112511276A (en) * | 2020-11-24 | 2021-03-16 | 广州技象科技有限公司 | Data processing method and device |
CN112508172A (en) * | 2020-11-23 | 2021-03-16 | 北京邮电大学 | Space flight measurement and control adaptive modulation method based on Q learning and SRNN model |
CN112584361A (en) * | 2020-12-09 | 2021-03-30 | 齐鲁工业大学 | Resource scheduling method and device based on deep reinforcement learning in M2M communication |
CN113316259A (en) * | 2021-06-29 | 2021-08-27 | 北京科技大学 | Method and device for scheduling downlink wireless resources supporting AI engine |
CN114340017A (en) * | 2022-03-17 | 2022-04-12 | 山东科技大学 | Heterogeneous network resource slicing method with eMBB and URLLC mixed service |
CN116234047A (en) * | 2023-03-16 | 2023-06-06 | 华能伊敏煤电有限责任公司 | Mixed service intelligent resource scheduling method based on reinforcement learning algorithm |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108391143A (en) * | 2018-04-24 | 2018-08-10 | 南京邮电大学 | A kind of wireless network transmission of video self-adaptation control method based on Q study |
CN109257429A (en) * | 2018-09-25 | 2019-01-22 | 南京大学 | A kind of calculating unloading dispatching method based on deeply study |
CN109561504A (en) * | 2018-11-20 | 2019-04-02 | 北京邮电大学 | A kind of resource multiplexing method of URLLC and eMBB based on deeply study |
CN109873869A (en) * | 2019-03-05 | 2019-06-11 | 东南大学 | A kind of edge cache method based on intensified learning in mist wireless access network |
CN109951897A (en) * | 2019-03-08 | 2019-06-28 | 东华大学 | A kind of MEC discharging method under energy consumption and deferred constraint |
CN110035478A (en) * | 2019-04-18 | 2019-07-19 | 北京邮电大学 | A kind of dynamic multi-channel cut-in method under high-speed mobile scene |
US20190325304A1 (en) * | 2018-04-24 | 2019-10-24 | EMC IP Holding Company LLC | Deep Reinforcement Learning for Workflow Optimization |
US20190356446A1 (en) * | 2017-01-06 | 2019-11-21 | Electronics And Telecommunications Research Institute | Uplink control information transmission method and device |
Non-Patent Citations (3)
Title |
---|
H. Farès et al.: "Two-level HARQ for turbo coded cooperation: System retransmission gain and optimal time allocation", 《2012 IEEE Wireless Communications and Networking Conference (WCNC)》 *
Wu Dapeng: "Energy-saving mechanism with uplink data frame aggregation for converged optical and wireless access networks", 《Journal of Electronics & Information Technology》 *
Liao Xiaomin et al.: "Deep reinforcement learning based resource allocation algorithm for cellular networks", 《Journal on Communications》 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111182644B (en) | Joint retransmission URLLC resource scheduling method based on deep reinforcement learning | |
CN109561504B (en) | URLLC and eMBB resource multiplexing method based on deep reinforcement learning | |
Liu et al. | A cross-layer scheduling algorithm with QoS support in wireless networks | |
US9509478B2 (en) | Method and apparatus for data and control multiplexing | |
JP5259596B2 (en) | Recovery from resource mismatch in wireless communication systems | |
Liu et al. | Cross-layer scheduling with prescribed QoS guarantees in adaptive wireless networks | |
CN103209494B (en) | A kind of real-time video traffic resource allocation methods based on importance labelling | |
CN101895926A (en) | Method and apparatus for scheduling in a wireless network | |
CN112838911B (en) | Method and apparatus in a node used for wireless communication | |
CN114867030A (en) | Double-time-scale intelligent wireless access network slicing method | |
Chung et al. | A-MPDU using fragmented MPDUs for IEEE 802.11 ac MU-MIMO WLANs | |
Kallel et al. | A flexible numerology configuration for efficient resource allocation in 3GPP V2X 5G new radio | |
US20230239881A1 (en) | Lower analog media access control (mac-a) layer and physical layer (phy-a) functions for analog transmission protocol stack | |
Asheralieva et al. | A two-step resource allocation procedure for LTE-based cognitive radio network | |
Noh et al. | Application-level qos and qoe assessment of a cross-layer packet scheduling scheme for audio-video transmission over error-prone ieee 802.11 e hcca wireless lans | |
KR20050083085A (en) | Apparatus and method for scheduling traffic data in mobile communication system using the orthogonal frequency division multiple access | |
Jiang et al. | The design of transport block-based ROHC U-mode for LTE multicast | |
WO2022017127A1 (en) | Method and apparatus used in user equipment and base station for wireless communication | |
CN112688763B (en) | Method and apparatus in a node used for wireless communication | |
WO2023077757A1 (en) | Method and apparatus used in node for wireless communication | |
WO2023179470A1 (en) | Method and apparatus used in node for wireless communication | |
WO2023193672A1 (en) | Method and apparatus for wireless communication | |
KR101958069B1 (en) | Method, apparatus and system for transmitting svc video data in radio communication environment | |
CN115941122A (en) | Method and apparatus for wireless communication | |
CN117479225A (en) | Method and apparatus for wireless communication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||