CN111182644A - Joint retransmission URLLC resource scheduling method based on deep reinforcement learning - Google Patents

Joint retransmission URLLC resource scheduling method based on deep reinforcement learning

Info

Publication number
CN111182644A
Authority
CN
China
Prior art keywords
urllc
resource scheduling
slot
reinforcement learning
mini
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911348750.7A
Other languages
Chinese (zh)
Other versions
CN111182644B (en)
Inventor
赵中原
李阳
高慧慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201911348750.7A
Publication of CN111182644A
Application granted
Publication of CN111182644B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/535 Allocation or scheduling criteria for wireless resources based on resource usage policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/12 Wireless traffic scheduling
    • H04W72/1263 Mapping of traffic onto schedule, e.g. scheduled allocation or multiplexing of flows
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/56 Allocation or scheduling criteria for wireless resources based on priority criteria
    • H04W72/566 Allocation or scheduling criteria for wireless resources based on priority criteria of the information or information source or recipient
    • H04W72/569 Allocation or scheduling criteria for wireless resources based on priority criteria of the information or information source or recipient of the traffic information

Abstract

The invention discloses a combined retransmission URLLC resource scheduling method based on deep reinforcement learning, which comprises the following steps: collecting data packet information and channel information of URLLC as training data; establishing a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and training model parameters by using training data; performing performance evaluation on the obtained URLLC resource scheduling decision model of deep reinforcement learning until performance requirements are met; collecting URLLC data packet information and channel information of the current mini-slot; inputting the obtained information into a URLLC resource scheduling decision model based on deep reinforcement learning to obtain a resource scheduling decision result; and according to the resource scheduling decision result, carrying out resource allocation on the URLLC data packet. The method trains the URLLC data packet information and the channel state information based on the deep reinforcement learning method to obtain a URLLC data packet scheduling resource decision result, reasonably distributes scheduling resources according to the decision result, and solves the problem of power and time-frequency resource waste on the basis of meeting the URLLC transmission requirement.

Description

Joint retransmission URLLC resource scheduling method based on deep reinforcement learning
Technical Field
The invention relates to the field of wireless communication, in particular to a combined retransmission URLLC resource scheduling method based on deep reinforcement learning.
Background
In order to meet the requirements of different scenario services on delay, reliability, mobility and so on in the future, the ITU in 2015 formally defined three major scenarios for the future 5G network: enhanced mobile broadband (eMBB), massive machine-type communication (mMTC), and ultra-reliable and low-latency communication (URLLC). The eMBB scenario builds on existing mobile broadband services and mainly pursues further improvements in user experience and the ultimate person-to-person communication experience. mMTC and URLLC are both Internet-of-Things application scenarios, but with different emphases: mMTC mainly concerns information interaction between people and things, while URLLC mainly reflects the communication requirements between things.
In the prior art, URLLC is widely applied in emerging fields such as remote control and smart driving thanks to its low-delay, high-reliability transmission requirements, and it is a key direction of 5G research, so URLLC scenario services are a current hot topic. To meet the low-delay requirement of URLLC, one approach is to adopt a 60 kHz subcarrier spacing, which reduces the slot length to 1/4 of that of LTE; to shorten it further, URLLC adopts 4 symbols as a mini-slot, reducing the transmission length to 1/14 of the LTE slot. However, once the mini-slot transmission mode is adopted, a failed demodulation of URLLC service data also brings a large latency overhead, which challenges the low-delay requirement of URLLC.
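As a rough arithmetic check on the 1/4 and 1/14 figures above, a small sketch (assuming the usual 14 OFDM symbols per slot and ignoring cyclic-prefix details) gives:

```python
# Back-of-the-envelope numerology arithmetic; figures are approximate.
lte_slot_ms = 1.0                       # LTE TTI: 1 ms slot at 15 kHz subcarrier spacing
slot_60khz_ms = lte_slot_ms * 15 / 60   # 60 kHz spacing -> 0.25 ms, i.e. 1/4 of LTE
mini_slot_ms = slot_60khz_ms * 4 / 14   # 4-symbol mini-slot out of 14 symbols
print(slot_60khz_ms, round(mini_slot_ms, 4))   # 0.25  0.0714  (about 1/14 of 1 ms)
```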
For example, the invention patent with Chinese patent publication No. CN109561504A discloses a resource multiplexing method of URLLC and eMBB based on deep reinforcement learning, which collects data packet information, channel information and queue information of M mini-slots of URLLC and eMBB as training data; establishes a URLLC and eMBB resource multiplexing model based on deep reinforcement learning and trains the model parameters with the training data; evaluates the performance of the trained model until the performance requirement is met; collects the current mini-slot URLLC and eMBB data packet information, channel information and queue information and inputs the collected information into the trained model to obtain a resource multiplexing decision result; and, according to the resource multiplexing decision result, allocates resources to the eMBB and URLLC data packets of the current mini-slot. This enables reasonable allocation and utilization of time-frequency resources and power while meeting the transmission requirements of the eMBB and URLLC data packets.
The prior art has at least the following problems:
However, if the joint transmission mode with multiple redundant copies is adopted, the limited time-frequency resources are severely wasted. How to schedule URLLC services within limited resources, so that the URLLC transmission requirements are met while the resources are used efficiently, is therefore an urgent problem for which no effective solution has yet been proposed.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a combined retransmission URLLC resource scheduling method based on deep reinforcement learning.
The method comprises the following operation steps:
step 1, collecting data packet information and channel information of URLLC (ultra-reliable low-latency communication) as training data: the base station obtains the number of bits of the URLLC data packets arriving in M mini-slots and the corresponding channel gains, and takes the data packet information and channel information of the kth mini-slot as training data; the specific steps are as follows:
step 1.1, obtaining the downlink channel gain g_k of the current mini-slot through the Channel Quality Indicator (CQI) information periodically uploaded by the UE (User Equipment);
Step 1.2, the base station packages the URLLC service in the service queue, generates the data packet sent by the kth mini-slot URLLC service, and obtains the bit number N_k of the URLLC data packet;
Step 1.3, packaging the obtained information into the state information s_k = [g_k, N_k, Q_k^1, Q_k^2, ..., Q_k^M], where Q_k^M represents the Mth queue length of the URLLC data packets of the kth mini-slot;
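Purely as an illustration of this packaging step, a sketch is given below; the function name, array types and the example numbers are assumptions, not values taken from the patent:

```python
import numpy as np

def build_state(g_k: float, n_k: int, queue_lengths: np.ndarray) -> np.ndarray:
    """Assemble s_k = [g_k, N_k, Q_k^1, ..., Q_k^M] for one mini-slot.

    g_k: downlink channel gain reported via CQI for the current mini-slot
    n_k: number of bits in the URLLC packet generated for this mini-slot
    queue_lengths: current lengths of the M (re)transmission queues
    """
    return np.concatenate(([g_k, float(n_k)], queue_lengths)).astype(np.float32)

# Example with M = 3 queues: a fresh 256-bit packet sits in queue 1.
s_k = build_state(g_k=0.8, n_k=256, queue_lengths=np.array([256.0, 0.0, 0.0]))
```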
step 2, establishing a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and training model parameters by using training data, wherein the method specifically comprises the following steps:
step 2.1, a neural network in a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning is constructed and initialized, and the specific steps are as follows:
step 2.1.1, set the action vector space a = [bool, R_1, R_2, ..., R_M], where bool represents the transmission mode of the URLLC service in the current mini-slot (1 means redundancy-version, i.e. duplicate, transmission, 0 means single-link transmission), and R_M represents the number of bits of the Mth queue processed in the current mini-slot;
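Because step 2.1.3 below equates n_out with the number of possible values of a, it can help to see how this vector action space flattens into a set of discrete actions; the bit budgets and queue count in the sketch below are purely hypothetical:

```python
from itertools import product

BIT_CHOICES = (0, 128, 256)   # assumed R_m values (bits handled per mini-slot)
M = 3                         # assumed number of queues

# Enumerate a = [bool, R_1, ..., R_M]; the index into ACTIONS is what the DQN outputs.
ACTIONS = [(b,) + r for b in (0, 1) for r in product(BIT_CHOICES, repeat=M)]
print(len(ACTIONS))           # 2 * 3**M = 54 discrete actions
```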
step 2.1.2, constructing two identical neural networks, eval and next: the eval network is used to obtain the action-value function Q of the current state and to select the action a, while the next network takes the maximal action value of the next state, max_a' Q', to compute the target action value Q_target, which is then used to update the parameters of the eval network;
step 2.1.3, setting the eval network parameters C = [n, n_h, n_in, n_out, θ, bias, activate], where n denotes the number of hidden layers of the neural network, n_h = [n_h1, n_h2, ..., n_hn] denotes the number of neurons in each hidden layer, n_in is the number of input-layer neurons and equals the length of the state vector s, n_out is the number of output-layer neurons and equals the number of possible values of the action vector a, θ denotes the weights and is randomly initialized in the range 0 to w, bias denotes the biases and is initialized to b, and activate denotes the activation function, for which the ReLU (Rectified Linear Unit) is adopted;
step 2.1.4, initializing the next neural network parameters C' = C;
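A minimal sketch of steps 2.1.2 to 2.1.4 using PyTorch is shown below; the hidden-layer widths and the input/output dimensions are illustrative assumptions rather than values fixed by the patent:

```python
import copy
import torch.nn as nn

def make_q_network(n_in: int, n_out: int, n_h=(64, 64)) -> nn.Sequential:
    """Fully connected Q-network with ReLU activations, as described in step 2.1.3."""
    layers, prev = [], n_in
    for width in n_h:
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, n_out))     # one Q-value per discrete action
    return nn.Sequential(*layers)

n_state, n_actions = 5, 54                    # assumed: len(s_k) with M = 3, and |A| from above
eval_net = make_q_network(n_state, n_actions) # "eval" network: scores states and picks actions
next_net = copy.deepcopy(eval_net)            # "next" (target) network, initialized with C' = C
```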
step 2.2, inputting data in the training data into a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, training model parameters, taking the data of the kth mini-slot in the training data as an example, and specifically comprising the following steps:
step 2.2.1, inputting the data s_k of the kth mini-slot into the eval neural network of the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and calculating the ith queue length of URLLC according to formula (1) (given as an image in the original document), where z represents the number of mini-slots between retransmission intervals;
step 2.2.2, setting a probability ε_a: with probability ε_a, select a random action a_k from the action space, and with probability (1 − ε_a), select from the eval neural network the action a_k satisfying argmax_a Q(s, a; θ);
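One possible reading of this ε-greedy rule, sketched with the eval network defined above (the tensor handling is an assumption):

```python
import random
import torch

def select_action(eval_net, s_k, epsilon: float) -> int:
    """Step 2.2.2: with probability epsilon pick a random action index,
    otherwise return argmax_a Q(s, a; theta) from the eval network."""
    n_actions = eval_net[-1].out_features
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = eval_net(torch.as_tensor(s_k, dtype=torch.float32))
    return int(q_values.argmax().item())
```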
Step 2.2.3, calculating the reward r_k obtained by the action a_k selected in step 2.2.2 and the next state s_{k+1} that is reached: according to the selected action a_k = [bool_k, R_1^k, R_2^k, ..., R_M^k], the signal-to-noise ratio of the kth mini-slot is calculated according to formula (2) (given as an image in the original document), in which σ² represents the Gaussian noise power and P_k represents the power allocated to the kth slot; when bool = 0, single-link transmission is used and the signal-to-noise ratio takes the single-link form of formula (2); when bool = 1, the duplicate (copy) transmission mode is adopted and the signal-to-noise ratio takes the corresponding multi-copy form;
for URLLC traffic, its transmission rate is calculated according to the following equation (3):
Figure BDA0002334118290000042
wherein:
Figure BDA0002334118290000049
indicating channel separation;
calculating the transmission error rate of the URLLC data at the kth mini-slot according to the following formula (4):
Figure BDA0002334118290000043
the queue length of URLLC is calculated according to equation (5) below:
Figure BDA0002334118290000044
Figure BDA0002334118290000045
wherein: z represents the corresponding mini-slot number between retransmission intervals;
calculating the time required for the current arrival service to be transmitted according to the following formula (6):
Figure BDA0002334118290000046
wherein: count (x) represents the number of retransmissions required when x is zero;
the reward obtained by action a_k is calculated according to formula (7) (given as an image in the original document), where P represents the URLLC transmit power, Q represents the URLLC queue length, flag indicates a retransmission, and ω_1, ω_2, ω_3, ω_4 are constants; the action-value function Q is calculated from the Bellman equation:
Q(s_k, a_k) = E[r_{k+1} + λ r_{k+2} + λ² r_{k+3} + ... | s_k, a_k]
            = E[r_k + λ Q(s_{k+1}, a_{k+1}) | s_k, a_k],
namely, the current Q value equals the reward r_k obtained by taking action a_k plus the discounted Q value of the next state; the parameters of the next state are then calculated according to the corresponding formula (given as an image in the original document);
Step 2.2.4, converting(s) obtained in step 2.2.3k,ak,rk,sk+1) Storing the data into a memory unit D for the next training of the model;
step 2.2.5, randomly taking F samples from the memory unit D and inputting s_{k+1} into the next neural network to obtain the maximal action value max_{a_{k+1}} Q′;
Step 2.2.6, the loss function is calculated according to the following equation (8):
Loss=(Qtarget-Q(sk,ak;θ))2
Figure BDA0002334118290000051
wherein: q represents an action valuation function, theta represents a weight parameter of the current neural network, and gamma represents a discount factor;
step 2.2.7, updating eval neural network weight parameters by adopting a gradient descent method, and calculating the gradient according to the following formula (9):
∂Loss/∂θ = −2 (Q_target − Q(s_k, a_k; θ)) ∂Q(s_k, a_k; θ)/∂θ
according to the calculated gradient, the direction of steepest descent is selected to update the weight parameter θ;
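Steps 2.2.5 to 2.2.7 together amount to one standard DQN gradient step; the sketch below assumes the sampled batch has already been stacked into tensors and is not claimed to match the patent's exact hyper-parameters:

```python
import torch
import torch.nn.functional as F

def dqn_update(eval_net, next_net, optimizer, batch, gamma: float) -> float:
    """One update: Q_target = r + gamma * max_a' Q'(s', a') from the "next"
    network (formula (8)), squared loss against Q(s, a; theta), then a
    gradient-descent step on the eval network (formula (9))."""
    s, a, r, s_next = batch                         # float, long, float, float tensors
    q_sa = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_target = r + gamma * next_net(s_next).max(dim=1).values
    loss = F.mse_loss(q_sa, q_target)
    optimizer.zero_grad()
    loss.backward()                                 # gradient of the squared loss
    optimizer.step()
    return loss.item()
```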
step 2.3, every I iterations, setting θ′ = θ, i.e. copying the parameters of the eval neural network into the next neural network to update it;
step 2.4, repeating steps 2.2-2.3 to train the model continuously until the loss function converges;
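Putting steps 2.2 to 2.3 together, the outer loop of step 2.4 might look as follows, reusing the select_action, ReplayMemory and dqn_update sketches above; the environment interface (reset/step), the to_tensors helper, and all hyper-parameter values are assumptions made only to keep the sketch self-contained:

```python
def train(env, eval_net, next_net, memory, optimizer, to_tensors,
          episodes=1000, epsilon=0.1, gamma=0.9, f=32, sync_every=100):
    """Repeat steps 2.2-2.3 (step 2.4); an explicit convergence test on the
    loss is omitted here for brevity."""
    step = 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = select_action(eval_net, s, epsilon)            # step 2.2.2
            s_next, r, done = env.step(a)                      # step 2.2.3 (reward, next state)
            memory.push(s, a, r, s_next)                       # step 2.2.4
            if len(memory) >= f:
                dqn_update(eval_net, next_net, optimizer,
                           to_tensors(memory.sample(f)), gamma)  # steps 2.2.5-2.2.7
            if step % sync_every == 0:                         # step 2.3: theta' = theta
                next_net.load_state_dict(eval_net.state_dict())
            s, step = s_next, step + 1
```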
and 3, performing performance evaluation on the obtained combined retransmission URLLC resource scheduling decision model for deep reinforcement learning until the performance requirement is met, wherein the specific steps are as follows:
step 3.1, continuing to use the data obtained in step 1 as the state vector s_k = [g_k, N_k, Q_k^1, ..., Q_k^M] and inputting it into the trained DQN (Deep Q-Network) model obtained in step 2 to obtain the resource scheduling decision result;
step 3.2, allocating resources to the URLLC data packet according to the decision result obtained in step 3.1; when the allocation result satisfies that the transmission delay of the URLLC is less than p_l and the transmission error rate is less than p_e, the performance evaluation process is complete and step 4 is performed; if the requirements are not met, return to step 2 and continue training the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning until the performance requirements are met;
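The acceptance test of step 3.2 reduces to two threshold checks, sketched below with assumed example thresholds:

```python
def meets_urllc_requirements(delays, error_rates, p_l: float, p_e: float) -> bool:
    """Return True only if every scheduled packet's transmission delay stays
    below p_l and its transmission error rate stays below p_e (step 3.2)."""
    return all(d < p_l for d in delays) and all(e < p_e for e in error_rates)

# Example thresholds in the spirit of URLLC targets (assumed values):
print(meets_urllc_requirements([0.4e-3, 0.7e-3], [1e-6, 4e-6], p_l=1e-3, p_e=1e-5))
```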
and 4, collecting URLLC data packet information and channel information of the current mini-slot, and specifically comprising the following steps of:
step 4.1, acquiring the size of a data packet encapsulated by the current mini-slot base station for the incoming URLLC service from the base station side;
step 4.2, acquiring the current mini-slot downlink channel gain g through the CQI information periodically uploaded by the UE;
step 5, inputting the URLLC data packet information and channel information obtained in step 4 into the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning to obtain the resource scheduling decision result: the obtained current state information and queue length information are combined into a state vector s_k = [g_k, N_k, Q_k^1, ..., Q_k^M], which is input into the trained combined retransmission URLLC resource scheduling decision model to obtain the resource scheduling decision result;
and 6, performing resource allocation on the URLLC data packet according to the resource scheduling decision result, and specifically comprising the following steps:
step 6.1, according to the URLLC resource scheduling result obtained in step 5, the RNC (Radio Network Controller) indicates, through the Radio Resource Control (RRC) sublayer, the power allocated to the URLLC and the number of transmission bits allocated to the URLLC data packet;
step 6.2, by configuring a pre-indication (PI) in the downlink DCI (Downlink Control Information) signalling, the single-link or multi-link transmission mode to be adopted by the current URLLC mini-slot is instantly indicated, thereby achieving reasonable allocation of time-frequency resources and power for the URLLC data packet service and efficient utilization of the limited time-frequency resources.
Compared with the prior art, the method has the following remarkable advantages:
1. The URLLC data packet information and channel state information are trained with a deep reinforcement learning method to obtain the resource scheduling decision result, so that time-frequency resources and power are reasonably allocated and utilized while the transmission requirements of the URLLC data packets are met.
2. Transmission with multiple queues allows different transmission priorities to be set for different queues, so retransmission queues can be transmitted more flexibly and on demand, reducing the retransmission delay.
Drawings
Fig. 1 is a flowchart of a URLLC resource scheduling method based on deep reinforcement learning in the present invention;
fig. 2 is a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Referring to fig. 1, the method comprises the steps of:
step 1, collecting data packet information and channel information of URLLC as training data:
specifically, the base station obtains the bit number of a URLLC data packet arriving at M mini-slots and the gain of a corresponding channel, and takes the data packet information and channel information of the kth mini-slot as training data, and the specific steps are as follows:
step 1.1, acquiring the current mini-slot downlink channel gain g_k through the CQI information periodically uploaded by the UE;
Step 1.2, the base station packages the URLLC service in the service queue, generates the data packet sent by the kth mini-slot URLLC service, and obtains the bit number N_k of the URLLC data packet;
Step 1.3, packaging the obtained information into state information
Figure BDA0002334118290000061
Wherein
Figure BDA0002334118290000062
The Mth queue length of the URLLC data packet of the kth mini-slot is represented;
step 2, establishing a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and training model parameters by using training data, wherein the method specifically comprises the following steps:
step 2.1, a neural network in a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning is constructed and initialized, and the specific steps are as follows:
step 2.1.1, set the action vector space a = [bool, R_1, R_2, ..., R_M], where bool represents the transmission mode of the URLLC service in the current mini-slot (1 means redundancy-version, i.e. duplicate, transmission, 0 means single-link transmission), and R_M represents the number of bits of the Mth queue processed in the current mini-slot;
step 2.1.2, constructing two identical neural networks, eval and next: the eval network is used to obtain the action-value function Q of the current state and to select the action a, while the next network takes the maximal action value of the next state, max_a' Q', to compute the target action value Q_target, which is then used to update the parameters of the eval network;
step 2.1.3, setting the eval network parameters C = [n, n_h, n_in, n_out, θ, bias, activate], where n denotes the number of hidden layers of the neural network, n_h = [n_h1, n_h2, ..., n_hn] denotes the number of neurons in each hidden layer, n_in is the number of input-layer neurons and equals the length of the state vector s, n_out is the number of output-layer neurons and equals the number of possible values of the action vector a, θ denotes the weights and is randomly initialized in the range 0 to w, bias denotes the biases and is initialized to b, and activate denotes the activation function, for which the ReLU (Rectified Linear Unit) is adopted;
step 2.1.4, initializing the next neural network parameters C' = C;
step 2.2, inputting data in the training data into a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, training model parameters, taking the data of the kth mini-slot in the training data as an example, and specifically training the following process:
step 2.2.1, inputting the data s_k of the kth mini-slot into the eval neural network of the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and calculating the ith queue length of URLLC according to formula (1) (given as an image in the original document), where z represents the number of mini-slots between retransmission intervals;
step 2.2.2, setting a probability ε_a: with probability ε_a, select a random action a_k from the action space, and with probability (1 − ε_a), select from the eval neural network the action a_k satisfying argmax_a Q(s, a; θ);
Step 2.2.3, calculating the reward r_k obtained by the action a_k selected in step 2.2.2 and the next state s_{k+1} that is reached: according to the selected action a_k = [bool_k, R_1^k, R_2^k, ..., R_M^k], the signal-to-noise ratio of the kth mini-slot is calculated according to formula (2) (given as an image in the original document), in which σ² represents the Gaussian noise power and P_k represents the power allocated to the kth slot; when bool = 0, single-link transmission is used and the signal-to-noise ratio takes the single-link form of formula (2); when bool = 1, the duplicate (copy) transmission mode is adopted and the signal-to-noise ratio takes the corresponding multi-copy form;
for URLLC traffic, its transmission rate is calculated according to the following equation (3):
Figure BDA0002334118290000086
wherein:
Figure BDA00023341182900000812
it is indicated that the channel is separated,
calculating the transmission error rate of the URLLC data at the kth mini-slot according to the following formula (4):
Figure BDA0002334118290000087
the queue length of URLLC is calculated according to equation (5) below:
Figure BDA0002334118290000088
Figure BDA0002334118290000089
wherein: z represents the corresponding number of mini-slots between retransmission intervals,
calculating the time required for the current arrival service to be transmitted according to the following formula (6):
Figure BDA00023341182900000810
wherein: count (x) indicates the number of retransmissions required when x is zero,
therefore, the reward obtained by action a_k is calculated according to formula (7) (given as an image in the original document), where P represents the URLLC transmit power, Q represents the URLLC queue length, flag indicates a retransmission, and ω_1, ω_2, ω_3, ω_4 are constants; according to the Bellman equation, the action-value function Q is calculated as:
Q(s_k, a_k) = E[r_{k+1} + λ r_{k+2} + λ² r_{k+3} + ... | s_k, a_k]
            = E[r_k + λ Q(s_{k+1}, a_{k+1}) | s_k, a_k],
i.e. the current Q value equals the reward r_k obtained by taking action a_k plus the discounted Q value of the next state; the parameters of the next state are then calculated according to the corresponding formula (given as an image in the original document);
Step 2.2.4, converting(s) obtained in step 2.2.3k,ak,rk,sk+1) Storing the data into a memory unit D for the next training of the model;
step 2.2.5, in order to break the correlation and the non-stationary distribution between samples, F samples are randomly taken from the memory unit D, and s_{k+1} is input into the next neural network to obtain the maximal action value max_{a_{k+1}} Q′;
step 2.2.6, calculating the loss function according to formula (8):
Loss = (Q_target − Q(s_k, a_k; θ))², with Q_target = r_k + γ max_{a_{k+1}} Q′(s_{k+1}, a_{k+1}; θ′),
where Q represents the action-value function, θ represents the weight parameters of the current (eval) neural network, θ′ those of the next network, and γ represents the discount factor;
step 2.2.7, updating eval neural network weight parameters by adopting a gradient descent method, and calculating the gradient according to the following formula (9):
∂Loss/∂θ = −2 (Q_target − Q(s_k, a_k; θ)) ∂Q(s_k, a_k; θ)/∂θ
according to the calculated gradient, the direction of steepest descent is selected to update the weight parameter θ;
step 2.3, every I iterations, setting θ′ = θ, i.e. copying the parameters of the eval neural network into the next neural network to update it;
step 2.4, repeating steps 2.2-2.3 to train the model continuously until the loss function converges;
and 3, performing performance evaluation on the obtained combined retransmission URLLC resource scheduling decision model for deep reinforcement learning until the performance requirement is met, wherein the specific steps are as follows:
step 3.1, continuing to use the data obtained in step 1 as the state vector s_k = [g_k, N_k, Q_k^1, ..., Q_k^M] and inputting it into the trained DQN model obtained in step 2 to obtain the resource scheduling decision result;
step 3.2, allocating resources to the URLLC data packet according to the decision result obtained in step 3.1; when the allocation result satisfies that the transmission delay of the URLLC is less than p_l and the transmission error rate is less than p_e, the performance evaluation process is complete and step 4 is performed; if the requirements are not met, return to step 2 and continue training the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning until the performance requirements are met;
and 4, collecting URLLC data packet information and channel information of the current mini-slot, and specifically comprising the following steps of:
step 4.1, acquiring the size of a data packet encapsulated by the current mini-slot base station for the incoming URLLC service from the base station side;
step 4.2, acquiring the current mini-slot downlink channel gain g through the CQI information periodically uploaded by the UE;
step 5, inputting the URLLC data packet information and the channel information obtained in the step 4 into a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning to obtain a resource scheduling decision result;
in particular, the obtained current state information and queue length information are combined into a state vector s_k = [g_k, N_k, Q_k^1, ..., Q_k^M], which is input into the trained combined retransmission URLLC resource scheduling decision model to obtain the resource scheduling decision result;
and 6, performing resource allocation on the URLLC data packet according to the resource scheduling decision result, and specifically comprising the following steps:
step 6.1, according to the URLLC resource scheduling result obtained in step 5, the RNC indicates, through the RRC sublayer, the power allocated to the URLLC and the number of transmission bits allocated to the URLLC data packet;
step 6.2, by configuring the pre-indication (PI) in the downlink DCI signalling, the single-link or multi-link transmission mode to be adopted by the current URLLC mini-slot is indicated, thereby achieving reasonable allocation of time-frequency resources and power for the URLLC data packet service and efficient utilization of the limited time-frequency resources.
Referring to fig. 2, a deep reinforcement learning-based joint retransmission URLLC resource scheduling decision model proposed in the present invention is specifically described.
Specifically, in order to meet the low-delay requirement of URLLC, a 60 kHz subcarrier spacing is used to reduce the slot length to 1/4 of that of LTE; to shorten it further, URLLC uses 4 symbols as a mini-slot, reducing the transmission length to 1/14 of one LTE TTI, and each mini-slot is used as one TTI for transmission. M queues are constructed: a newly arriving URLLC service first enters queue 1 for transmission, where it is transmitted successfully with probability P_{1,s} and fails with probability P_{1,f}, in which case it moves to queue 2; this continues until, after M-1 failures, the packet reaches queue M and is discarded. Transmitting with multiple queues allows different transmission priorities to be set for different queues, so retransmission queues can be transmitted more flexibly and on demand, reducing the retransmission delay.
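A toy simulation of this M-queue retransmission chain is sketched below; the success probabilities and the exact drop rule (discard once queue M is reached) are assumptions made only for illustration:

```python
import random

def simulate_queues(p_success, num_packets=10_000, seed=0):
    """Each packet starts in queue 1; a failed transmission pushes it to the
    next queue, and after M-1 failures it reaches queue M and is discarded."""
    random.seed(seed)
    delivered_at, dropped = [0] * len(p_success), 0
    for _ in range(num_packets):
        for i, p in enumerate(p_success):       # p_success[i] = P_{i+1,s}
            if random.random() < p:
                delivered_at[i] += 1            # success in queue i+1
                break
        else:
            dropped += 1                        # reached queue M: discarded
    return delivered_at, dropped

print(simulate_queues([0.9, 0.95, 0.99]))       # M = 4 queues, the last one only discards
```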
The above description is only for the preferred embodiment of the present invention and should not be construed as limiting the present invention, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the present invention, and any modifications, equivalents, improvements, etc. made therein are intended to be included within the scope of the appended claims.

Claims (6)

1. A URLLC resource scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
step 1, collecting data packet information and channel information of URLLC as training data, acquiring the bit number of URLLC data packets arriving at M mini-slots and the gain of a corresponding channel by a base station, and taking the data packet information and the channel information of the kth mini-slot as the training data;
step 2, establishing a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and training model parameters by using training data;
step 3, performing performance evaluation on the obtained combined retransmission URLLC resource scheduling decision model of the deep reinforcement learning until the performance requirement is met;
step 4, collecting URLLC data packet information and channel information of the current mini-slot;
step 5, inputting the URLLC data packet information and channel information obtained in step 4 into the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning to obtain the resource scheduling decision result: the obtained current state information and queue length information are combined into a state vector s_k = [g_k, N_k, Q_k^1, ..., Q_k^M], which is input into the trained combined retransmission URLLC resource scheduling decision model to obtain the resource scheduling decision result;
and 6, performing resource allocation on the URLLC data packet according to the resource scheduling decision result.
2. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 1, the following steps are included:
step 1.1, acquiring the current mini-slot downlink channel gain g_k through the CQI information periodically uploaded by the UE;
Step 1.2, the base station packages the URLLC service in the service queue, generates the data packet sent by the kth mini-slot URLLC service, and obtains the bit number N_k of the URLLC data packet;
Step 1.3, packaging the obtained information into state information
Figure FDA0002334118280000012
Wherein
Figure FDA0002334118280000013
The Mth queue length of the k mini-slot URLLC packet is represented.
3. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 2, the following steps are included:
step 2.1, a neural network in a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning is constructed and initialized, and the specific steps are as follows:
step 2.1.1, set the action vector space a = [bool, R_1, R_2, ..., R_M], where bool represents the transmission mode of the URLLC service in the current mini-slot (1 means redundancy-version, i.e. duplicate, transmission, 0 means single-link transmission), and R_M represents the number of bits of the Mth queue processed in the current mini-slot;
step 2.1.2, constructing two identical neural networks, eval and next: the eval network is used to obtain the action-value function Q of the current state and to select the action a, while the next network takes the maximal action value of the next state, max_a' Q', to compute the target action value Q_target, which is then used to update the parameters of the eval network;
step 2.1.3, setting the eval network parameters C = [n, n_h, n_in, n_out, θ, bias, activate], where n denotes the number of hidden layers of the neural network, n_h = [n_h1, n_h2, ..., n_hn] denotes the number of neurons in each hidden layer, n_in is the number of input-layer neurons and equals the length of the state vector s, n_out is the number of output-layer neurons and equals the number of possible values of the action vector a, θ denotes the weights and is randomly initialized in the range 0 to w, bias denotes the biases and is initialized to b, and activate denotes the activation function, for which ReLU is adopted;
step 2.1.4, initializing the next neural network parameters C' = C;
step 2.2, inputting data in the training data into a combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, training model parameters, taking the data of the kth mini-slot in the training data as an example, and specifically comprising the following steps:
step 2.2.1, inputting the data s_k of the kth mini-slot into the eval neural network of the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning, and calculating the ith queue length of URLLC according to formula (1) (given as an image in the original document), where z represents the number of mini-slots between retransmission intervals;
step 2.2.2, setting a probability ε_a: with probability ε_a, select a random action a_k from the action space, and with probability (1 − ε_a), select from the eval neural network the action a_k satisfying argmax_a Q(s, a; θ);
Step 2.2.3, calculating the reward r_k obtained by the action a_k selected in step 2.2.2 and the next state s_{k+1} that is reached: according to the selected action a_k = [bool_k, R_1^k, R_2^k, ..., R_M^k], the signal-to-noise ratio of the kth mini-slot is calculated according to formula (2) (given as an image in the original document), in which σ² represents the Gaussian noise power and P_k represents the power allocated to the kth slot; when bool = 0, single-link transmission is used and the signal-to-noise ratio takes the single-link form of formula (2); when bool = 1, the duplicate (copy) transmission mode is adopted and the signal-to-noise ratio takes the corresponding multi-copy form;
for URLLC traffic, its transmission rate is calculated according to the following equation (3):
Figure FDA0002334118280000034
wherein:
Figure FDA0002334118280000035
indicating channel separation;
calculating the transmission error rate of the URLLC data at the kth mini-slot according to the following formula (4):
Figure FDA0002334118280000036
the queue length of URLLC is calculated according to equation (5) below:
Figure FDA0002334118280000037
Figure FDA0002334118280000038
wherein: z represents the corresponding mini-slot number between retransmission intervals;
calculating the time required for the current arrival service to be transmitted according to the following formula (6):
Figure FDA0002334118280000039
wherein: count (x) represents the number of retransmissions required when x is zero;
the reward obtained by action a_k is calculated according to formula (7) (given as an image in the original document), where P represents the URLLC transmit power, Q represents the URLLC queue length, flag indicates a retransmission, and ω_1, ω_2, ω_3, ω_4 are constants; according to the Bellman equation, the action-value function Q is calculated as:
Q(s_k, a_k) = E[r_{k+1} + λ r_{k+2} + λ² r_{k+3} + ... | s_k, a_k]
            = E[r_k + λ Q(s_{k+1}, a_{k+1}) | s_k, a_k],
i.e. the current Q value equals the reward r_k obtained by taking action a_k plus the discounted Q value of the next state; the parameters of the next state are then calculated according to the corresponding formula (given as an image in the original document);
Step 2.2.4, converting(s) obtained in step 2.2.3k,ak,rk,sk+1) Storing the data into a memory unit D for the next training of the model;
step 2.2.5, in order to break the correlation and the non-stationary distribution between samples, F samples are randomly taken from the memory unit D, and s_{k+1} is input into the next neural network to obtain the maximal action value max_{a_{k+1}} Q′;
Step 2.2.6, the loss function is calculated according to the following equation (8):
Figure FDA0002334118280000042
wherein: q represents an action valuation function, theta represents a weight parameter of the current neural network, and gamma represents a discount factor;
step 2.2.7, updating eval neural network weight parameters by adopting a gradient descent method, and calculating the gradient according to the following formula (9):
∂Loss/∂θ = −2 (Q_target − Q(s_k, a_k; θ)) ∂Q(s_k, a_k; θ)/∂θ
according to the calculated gradient, the direction of steepest descent is selected to update the weight parameter θ;
step 2.3, every I iterations, setting θ′ = θ, i.e. copying the parameters of the eval neural network into the next neural network to update it;
step 2.4, repeating steps 2.2-2.3 to train the model continuously until the loss function converges.
4. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 3, the method further includes the following steps:
step 3.1, continuing to use the data obtained in step 1 as the state vector s_k = [g_k, N_k, Q_k^1, ..., Q_k^M] and inputting it into the trained DQN model obtained in step 2 to obtain the resource scheduling decision result;
step 3.2, allocating resources to the URLLC data packet according to the decision result obtained in step 3.1; when the allocation result satisfies that the transmission delay of the URLLC is less than p_l and the transmission error rate is less than p_e, the performance evaluation is complete and step 4 is performed; if the performance requirement is not met, return to step 2 and continue training the combined retransmission URLLC resource scheduling decision model based on deep reinforcement learning until the performance requirement is met.
5. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 4, the following steps are included:
step 4.1, acquiring the size of a data packet encapsulated by the current mini-slot base station for the incoming URLLC service from the base station side;
and 4.2, acquiring the current mini-slot downlink channel gain g through the CQI information periodically uploaded by the UE.
6. The URLLC resource scheduling method based on deep reinforcement learning of claim 1, characterized in that in step 6, the following steps are included:
step 6.1, according to the URLLC resource scheduling result obtained in step 5, the RNC indicates, through the RRC sublayer, the power allocated to the URLLC and the number of transmission bits allocated to the URLLC data packet;
step 6.2, by configuring the pre-indication (PI) in the downlink DCI signalling, instantly indicating whether the current URLLC mini-slot is to adopt the single-link or the multi-link transmission mode, thereby allocating the time-frequency domain resources and power of the URLLC data packet service.
CN201911348750.7A 2019-12-24 2019-12-24 Joint retransmission URLLC resource scheduling method based on deep reinforcement learning Active CN111182644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911348750.7A CN111182644B (en) 2019-12-24 2019-12-24 Joint retransmission URLLC resource scheduling method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911348750.7A CN111182644B (en) 2019-12-24 2019-12-24 Joint retransmission URLLC resource scheduling method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111182644A true CN111182644A (en) 2020-05-19
CN111182644B CN111182644B (en) 2022-02-08

Family

ID=70657926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911348750.7A Active CN111182644B (en) 2019-12-24 2019-12-24 Joint retransmission URLLC resource scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111182644B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190356446A1 (en) * 2017-01-06 2019-11-21 Electronics And Telecommunications Research Institute Uplink control information transmission method and device
CN108391143A (en) * 2018-04-24 2018-08-10 南京邮电大学 A kind of wireless network transmission of video self-adaptation control method based on Q study
US20190325304A1 (en) * 2018-04-24 2019-10-24 EMC IP Holding Company LLC Deep Reinforcement Learning for Workflow Optimization
CN109257429A (en) * 2018-09-25 2019-01-22 南京大学 A kind of calculating unloading dispatching method based on deeply study
CN109561504A (en) * 2018-11-20 2019-04-02 北京邮电大学 A kind of resource multiplexing method of URLLC and eMBB based on deeply study
CN109873869A (en) * 2019-03-05 2019-06-11 东南大学 A kind of edge cache method based on intensified learning in mist wireless access network
CN109951897A (en) * 2019-03-08 2019-06-28 东华大学 A kind of MEC discharging method under energy consumption and deferred constraint
CN110035478A (en) * 2019-04-18 2019-07-19 北京邮电大学 A kind of dynamic multi-channel cut-in method under high-speed mobile scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
H. Farès et al.: "Two-level HARQ for turbo coded cooperation: System retransmission gain and optimal time allocation", 2012 IEEE Wireless Communications and Networking Conference (WCNC), 2012 *
吴大鹏 (Wu Dapeng): "带有上行数据帧聚合的光无线融合接入网络节能机制" (Energy-saving mechanism for fiber-wireless integrated access networks with uplink data frame aggregation), Journal of Electronics & Information Technology (电子与信息学报) *
廖晓闽 等 (Liao Xiaomin et al.): "基于深度强化学习的蜂窝网资源分配算法" (Deep reinforcement learning based resource allocation algorithm for cellular networks), Journal on Communications (通信学报) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112188539A (en) * 2020-10-10 2021-01-05 南京理工大学 Interference cancellation scheduling code design method based on deep reinforcement learning
CN112261725A (en) * 2020-10-23 2021-01-22 安徽理工大学 Data packet transmission intelligent decision method based on deep reinforcement learning
CN112261725B (en) * 2020-10-23 2022-03-18 安徽理工大学 Data packet transmission intelligent decision method based on deep reinforcement learning
CN112508172A (en) * 2020-11-23 2021-03-16 北京邮电大学 Space flight measurement and control adaptive modulation method based on Q learning and SRNN model
CN112511276A (en) * 2020-11-24 2021-03-16 广州技象科技有限公司 Data processing method and device
CN112584361A (en) * 2020-12-09 2021-03-30 齐鲁工业大学 Resource scheduling method and device based on deep reinforcement learning in M2M communication
CN112584361B (en) * 2020-12-09 2021-09-07 齐鲁工业大学 Resource scheduling method and device based on deep reinforcement learning in M2M communication
CN113316259A (en) * 2021-06-29 2021-08-27 北京科技大学 Method and device for scheduling downlink wireless resources supporting AI engine
CN114340017A (en) * 2022-03-17 2022-04-12 山东科技大学 Heterogeneous network resource slicing method with eMBB and URLLC mixed service
CN114340017B (en) * 2022-03-17 2022-06-07 山东科技大学 Heterogeneous network resource slicing method with eMBB and URLLC mixed service
CN116234047A (en) * 2023-03-16 2023-06-06 华能伊敏煤电有限责任公司 Mixed service intelligent resource scheduling method based on reinforcement learning algorithm

Also Published As

Publication number Publication date
CN111182644B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN111182644B (en) Joint retransmission URLLC resource scheduling method based on deep reinforcement learning
CN109561504B (en) URLLC and eMMC resource multiplexing method based on deep reinforcement learning
Liu et al. A cross-layer scheduling algorithm with QoS support in wireless networks
US9509478B2 (en) Method and apparatus for data and control multiplexing
JP5259596B2 (en) Recovery from resource mismatch in wireless communication systems
Liu et al. Cross-layer scheduling with prescribed QoS guarantees in adaptive wireless networks
CN103209494B (en) A kind of real-time video traffic resource allocation methods based on importance labelling
CN101895926A (en) Method and apparatus for scheduling in a wireless network
CN112838911B (en) Method and apparatus in a node used for wireless communication
CN114867030A (en) Double-time-scale intelligent wireless access network slicing method
Chung et al. A-MPDU using fragmented MPDUs for IEEE 802.11 ac MU-MIMO WLANs
Kallel et al. A flexible numerology configuration for efficient resource allocation in 3GPP V2X 5G new radio
US20230239881A1 (en) Lower analog media access control (mac-a) layer and physical layer (phy-a) functions for analog transmission protocol stack
Asheralieva et al. A two-step resource allocation procedure for LTE-based cognitive radio network
Noh et al. Application-level qos and qoe assessment of a cross-layer packet scheduling scheme for audio-video transmission over error-prone ieee 802.11 e hcca wireless lans
KR20050083085A (en) Apparatus and method for scheduling traffic data in mobile communication system using the orthogonal frequency division multiple access
Jiang et al. The design of transport block-based ROHC U-mode for LTE multicast
WO2022017127A1 (en) Method and apparatus used in user equipment and base station for wireless communication
CN112688763B (en) Method and apparatus in a node used for wireless communication
WO2023077757A1 (en) Method and apparatus used in node for wireless communication
WO2023179470A1 (en) Method and apparatus used in node for wireless communication
WO2023193672A1 (en) Method and apparatus for wireless communication
KR101958069B1 (en) Method, apparatus and system for transmitting svc video data in radio communication environment
CN115941122A (en) Method and apparatus for wireless communication
CN117479225A (en) Method and apparatus for wireless communication

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant