CN111866807A - Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning - Google Patents

Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning

Info

Publication number
CN111866807A
CN111866807A
Authority
CN
China
Prior art keywords
vehicle
unloading
rsu
task
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010571179.1A
Other languages
Chinese (zh)
Other versions
CN111866807B (en)
Inventor
李致远
彭二帅
潘森杉
毕俊蕾
张威威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202010571179.1A
Publication of CN111866807A
Application granted
Publication of CN111866807B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30: Services specially adapted for particular environments, situations or purposes
    • H04W4/40: Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W4/44: Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/02: Arrangements for optimising operational condition
    • H04W24/06: Testing, supervising or monitoring using simulated traffic
    • H04W28/00: Network traffic management; Network resource management
    • H04W28/02: Traffic management, e.g. flow control or congestion control
    • H04W28/08: Load balancing or load distribution
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning, which comprises the following steps: 1. obtain information such as the RSUs accessible to the vehicle and the vehicle task information; 2. divide the unloading time slots of the vehicle-mounted tasks according to the RSU information from step 1; 3. convert the vehicle-mounted task unloading time slot decision method into a mathematical problem; 4. solve the mathematical problem of step 3 with a deep reinforcement learning method; 5. deploy the algorithm to the SDN controller. The invention can make full use of the unloading time slot to reduce the network transmission delay caused by unloading. The unloading time slot decision method fully considers factors including the relative position of the vehicle and the RSU, the number of vehicles connected to the RSU, and the number of vehicle-mounted tasks the RSU must receive, and can effectively reduce the unloading delay of vehicle tasks.

Description

Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning
Technical Field
The invention belongs to the field of vehicle-mounted mobile edge computing and relates to a vehicle-mounted task unloading time slot decision method. It is suitable for small-cell base station environments and is particularly suitable for load balancing among small-cell base stations in a local area network.
Background
With the rapid development of Internet of Things technology, Mobile Edge Computing (MEC) has become an important component of it. Users can access mobile edge computing through wireless access points such as base stations and Road Side Units (RSUs). MEC can provide computing, storage and other resources to users. These capabilities find wide application in vehicular networks: Vehicle Edge Computing (VEC) is a network model that has developed in recent years.
Applications in the vehicular network can make vehicle travel more convenient and safer. With the continuous development of vehicle applications, applications that require strong computing power and a large amount of storage space, such as real-time road analysis, automatic driving and virtual reality, are becoming more numerous, and the data content that needs to be transmitted is also growing. Current mainstream research on vehicle task unloading focuses on the allocation of computing resources. Most vehicle-mounted task unloading time slot decisions are made by random selection, which cannot make full use of the unloading time slot to reduce the network transmission delay caused by unloading. Factors influencing the task unloading time slot include the relative position of the current vehicle and the RSU, the number of vehicles connected to the current RSU, and the number of vehicle-mounted tasks the current RSU must receive.
In view of the above, it is desirable to provide a vehicle-mounted task unloading time slot decision method that can cope with the unloading of vehicle-mounted tasks while taking the various influencing factors into account.
Disclosure of Invention
Aiming at the above problems, the invention provides a software-defined vehicle-mounted task unloading time slot decision method based on deep reinforcement learning. Through a Software-Defined Network (SDN), the method obtains global state perception data of the network, such as the number of vehicles connected to each RSU, the load state of the MEC (Mobile Edge Computing) server and the network delay of each RSU, and on this basis constructs an adaptive optimization decision combined with a deep reinforcement learning model, giving suggestions for local unloading, global unloading and the optimal vehicle-mounted task unloading time slot, so as to solve the problem of excessive delay caused by the unloading of vehicle-mounted tasks. The method comprises the following steps:
step 1, obtaining information: the set r of RSUs accessible to the vehicle, the vehicle tasks Q requesting unloading in the RSU area, and the network bandwidth b of the RSUs;
step 2, dividing unloading time slots of the vehicle-mounted tasks according to the RSU information in the step 1;
step 3, modeling the vehicle-mounted task unloading time slot decision method;
step 4, solving the model expression in the step 3 by using a deep reinforcement learning method;
step 5, deploying the algorithm to the SDN controller.
Further, the information in step 1 includes:
① the unloading tasks in the RSU area are denoted Q = {Q_1, …, Q_i, …, Q_n}, where Q_i represents the task of the i-th vehicle;
② the sizes of the vehicle-mounted tasks are denoted M = {M_1, …, M_i, …, M_n}, where M_i represents the size of Q_i;
③ the delay constraints of the vehicle-mounted tasks are denoted T = {T_1, …, T_i, …, T_n}, where T_i is the delay constraint of Q_i;
④ the set of RSUs accessible to vehicles is defined as r = {R_1, …, R_i, …, R_n}, where R_i represents the i-th RSU;
⑤ the number of vehicle-mounted tasks admitted by each RSU is rA = {R_1A, …, R_iA, …, R_nA}, where R_iA represents the number of vehicle-mounted tasks admitted by the i-th RSU;
⑥ the bandwidths of the RSUs are denoted B = {B_1, …, B_i, …, B_n}, where B_i represents the network bandwidth of R_i.
further, the unloading time slot dividing method of the vehicle-mounted task in the step 2 comprises the following steps:
step 2.1, collecting link bandwidth of RSU, and recording as W; collecting the average signal power of the RSU, and recording the average signal power as P; collecting noise work of RSUThe rate is marked as N; recording link loss power of RSU and vehicle as Lp
Step 2.2, the transmission rate v of the vehicle and the RSU can be expressed as:
Figure BDA0002549632880000021
wherein [ L ]P]D is the distance of the vehicle from the RSU in km and f is the signal frequency of the RSU in MHz at 32.45+20lgd +20 lgf.
Step 2.3, the transmission delay of the onboard task with size M can be expressed as:
Figure BDA0002549632880000022
Step 2.4, because the network transmission delay is influenced by the relative distance between the vehicle and the RSU, dividing the coverage area of each RSU into n task unloading time slots Gap1,…,Gapi,…GapnWhere any slot is denoted by g, g ∈ [ Gap [ ]1,…,Gapi,…Gapn]. For convenience of calculation and description, the transmission rates in the same region are assumed to be the same. For convenience of calculation, the RSU is used as a ground vertical point, and g is the distance between the unloading time slot and the vertical point. Then
Figure BDA0002549632880000023
Wherein high is the vertical height of the RSU and the ground;
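By way of illustration only (the following sketch is not part of the patent text), formulas (1)-(3) can be exercised in Python; the helper name slot_delay and every numeric value below are hypothetical placeholders:

```python
import math

def slot_delay(M_bits, W_hz, P_watt, N_watt, g_km, high_km, f_mhz):
    """Delay of unloading an M-bit task from slot distance g, per formulas (1)-(3)."""
    d = math.sqrt(g_km ** 2 + high_km ** 2)                        # formula (3)
    loss_db = 32.45 + 20 * math.log10(d) + 20 * math.log10(f_mhz)  # [L_p] in dB
    L_p = 10 ** (loss_db / 10)                                     # linear link loss
    v = W_hz * math.log2(1 + P_watt / (N_watt * L_p))              # formula (1)
    return M_bits / v                                              # formula (2)

# Hypothetical example: 8-Mbit task, 10-MHz link, RSU 10 m high, 5.9-GHz band
print(slot_delay(M_bits=8e6, W_hz=10e6, P_watt=1.0,
                 N_watt=1e-9, g_km=0.05, high_km=0.01, f_mhz=5900))
```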
further, the method for modeling the vehicle-mounted task unloading time slot decision method in the step 3 comprises the following steps:
step 3.1, define the offload slot decision as L ═ L1,…,Li,…,Ln},LiThe unloading time slot decision is obtained by representing the unloading task selection positions of the ith vehicle, wherein the combination of the unloading task selection positions of all vehicles is the unloading time slot decision;
and 3.2, determining the unloading decision of a single task. Single slot decision L for unloading certain vehicle-mounted taskiI.e. selection of an offload slot g, i.e. pair
Figure BDA0002549632880000031
Must have Li∈[Gap1,…,Gapi,…Gapn]
And 3.3, as can be seen from the formulas (1) and (2), the transmission delay of the vehicle-mounted task is determined by the bandwidth b of the RSU, the unloading time slot decision l and the size m of the vehicle-mounted task, and the transmission delay of the vehicle-mounted task can be rewritten as:
Figure BDA0002549632880000032
the link bandwidth W of the RSU is replaced by the bandwidth b of the RSU by the expression (3); representing the relative distance between the vehicle and the RSU;
And 3.4, rewriting the transmission delay of the vehicle-mounted task again by the formula (3) as follows:
Figure BDA0002549632880000033
wherein [ L ] isP]=32.45+20lgd+20lgf;
Step 3.5, converting the decision method of the unloading time slot of the vehicle-mounted task into a solving formula (5), Di(b,l,Mi) Indicating the transmission delay of the ith onboard task.
Figure BDA0002549632880000034
Wherein, z represents whether to unload the task, z is 1 represents unloading the task, and z is 0 represents unloading the task; MAXrARepresents the maximum value of rA; the value of rA is influenced by the decision of unloading time slot of the vehicle-mounted task, and rA is less than or equal to MAXrAIndicating that rA cannot exceed the maximum number of on-board tasks to be accessed.
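Continuing the earlier sketch (still illustrative, not from the patent), objective (5) can be evaluated for one candidate slot decision as follows; total_unload_delay and its argument names are invented for the example and reuse the hypothetical slot_delay helper defined above:

```python
def total_unload_delay(decision, tasks_mbits, b_hz, max_rA, **link):
    """Objective (5): summed delay of the unloaded tasks for one slot decision.

    decision: slot distance l_i (km) per vehicle, or None to keep the task local (z=0).
    Returns float('inf') when the admission constraint rA <= MAX_rA is violated.
    """
    rA = sum(1 for l in decision if l is not None)
    if rA > max_rA:                          # constraint rA <= MAX_rA
        return float('inf')
    total = 0.0
    for l, m in zip(decision, tasks_mbits):
        if l is not None:                    # z = 1: this task is unloaded
            total += slot_delay(M_bits=m * 1e6, W_hz=b_hz, g_km=l, **link)
    return total

# Hypothetical: three vehicles, the third keeps its task local (z = 0)
link = dict(P_watt=1.0, N_watt=1e-9, high_km=0.01, f_mhz=5900)
print(total_unload_delay([0.05, 0.10, None], [8, 4, 16], b_hz=10e6, max_rA=2, **link))
```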
Further, the specific steps of solving the formula (5) by using the deep reinforcement learning method in the step 4 are as follows:
step 4.1, build Markov State space
S={t,rV,rD,rA}
Wherein the various parameters are specified below:
① the delay constraints of the vehicle-mounted tasks are denoted t = {T_1, …, T_i, …, T_n}, where T_i is the delay constraint of task Q_i;
② the set of RSUs accessible to vehicles is defined as r = {R_1, …, R_i, …, R_n}; any unloading time slot of each RSU in r is denoted by g, g ∈ [Gap_1, …, Gap_i, …, Gap_n]. Since the unloading rates of vehicle tasks differ across unloading time slots, the set of unloading rates of all unloading time slots in r is expressed as rV = {R_1G_1V, …, R_iG_jV, …, R_nG_nV}, where R_iG_jV represents the transmission rate of the j-th unloading time slot of the i-th RSU;
③ the transmission delays of the vehicle-mounted tasks in each unloading time slot of each RSU in r are expressed as rD = {R_1G_1D, …, R_iG_jD, …, R_nG_nD}, where R_iG_jD represents the transmission delay of a vehicle-mounted task in the j-th unloading time slot of the i-th RSU;
④ the number of vehicle-mounted tasks admitted by each RSU is rA = {R_1A, …, R_iA, …, R_nA}.
Step 4.2, build the Markov action space
A = {(a, b) | a ∈ [1, n] ∩ N+, b ∈ [1, n] ∩ N+}
where the various parameters are specified below:
① a represents the RSU accessed by the vehicle when the vehicle-mounted unloading task is executed;
② b represents the unloading time slot of the RSU accessed by the vehicle when the vehicle-mounted unloading task is executed;
③ N+ represents the positive integers.
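As a concrete (and purely illustrative) reading of the state and action spaces of steps 4.1-4.2, one possible encoding in Python is sketched below; all field names and shapes are assumptions:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class OffloadState:
    """Markov state S = {t, rV, rD, rA} from step 4.1 (one possible flattening)."""
    t:  np.ndarray   # delay constraints of the tasks, shape (n_tasks,)
    rV: np.ndarray   # transmission rate of each slot of each RSU, shape (n_rsu, n_slots)
    rD: np.ndarray   # transmission delay in each slot of each RSU, shape (n_rsu, n_slots)
    rA: np.ndarray   # number of tasks already admitted per RSU, shape (n_rsu,)

    def vector(self) -> np.ndarray:
        """Flatten into the feature vector phi(S) consumed by the Actor network."""
        return np.concatenate([self.t, self.rV.ravel(), self.rD.ravel(), self.rA])

# An action is the pair (a, b): the index of the chosen RSU and of its unloading
# time slot, both drawn from [1, n] ∩ N+ as in step 4.2.
```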
Step 4.3, establish the Markov reward function reward:
reward=(η)×base+(2(η)-1)×delay(rD,t)+access(rA)
wherein the various parameters are specified below:
① (η) is a step function:
(η) = 1 if the vehicle-mounted task is unloaded successfully, and (η) = 0 if the unloading fails;
when (η) = 1 the unloading of the vehicle-mounted task succeeded, and when (η) = 0 it failed;
② base is a constant representing the basic reward; (η) × base means that the basic reward is obtained when the unloading of the vehicle-mounted task succeeds and is not obtained when it fails;
③ delay(rD, t) represents the reward or penalty obtained for performing a vehicle-mounted unloading task:
delay(rD, t) = Rward × (rD - t)
where rD represents the time taken to unload the vehicle-mounted task and t represents its unloading time constraint; a reward is obtained when unloading is completed within the constraint time t, otherwise a penalty is obtained; Rward is the reward or penalty value;
④ access(rA) is used to judge whether the current RSU can admit more vehicle-mounted tasks:
access(rA) = 0 if rA ≤ MAX_rA; otherwise access(rA) cancels the other terms so that reward = 0
where MAX_rA represents the maximum number of vehicle-mounted tasks the current RSU can admit. When more vehicle-mounted tasks can be admitted, i.e. rA ≤ MAX_rA, access(rA) has no influence on the reward function reward; when rA > MAX_rA, access(rA) forces reward to equal 0, i.e. no reward is obtained.
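A minimal Python sketch of this reward function follows; it is illustrative only, the values of base and Rward are assumptions, and the over-capacity branch mirrors the rule above that access(rA) forces reward to 0:

```python
def reward_fn(rD: float, t: float, rA: int, max_rA: int,
              base: float = 10.0, rward: float = 1.0) -> float:
    """reward = (eta)*base + (2*(eta)-1)*delay(rD,t) + access(rA), per step 4.3."""
    eta = 1 if rD <= t else 0            # step function: 1 = unloading succeeded
    delay_term = rward * (rD - t)        # delay(rD, t), following the formula literally
    r = eta * base + (2 * eta - 1) * delay_term
    if rA > max_rA:                      # access(rA): RSU over capacity, no reward
        return 0.0
    return r

print(reward_fn(rD=0.2, t=0.5, rA=3, max_rA=5))   # 9.7 under these assumed values
```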
Step 4.4, according to the Markov model in the step 4.1-4.3, using DDPG-HER algorithm to solve the optimal unloading time slot decision, and the concrete solving steps are as follows:
step 4.4.1, establishing an Actor current network, an Actor target network, a Critic current network and a Critic target network, wherein the descriptions of the four networks are as follows:
① the parameter of the Actor current network is θ; θ also refers to the neural network itself, which is responsible for updating the parameter θ and for generating the current action A according to the current state S. Action A acts on the current state S, which represents the set of information such as the unloading time slot decision currently being made for a certain vehicle, the position of the vehicle and which decisions have already been made. This produces a state S' and a reward R, the reward R being obtained from the reward function reward;
② the parameter of the Actor target network is θ'; θ' also refers to a neural network, responsible for selecting an action A' from the experience replay pool and updating θ';
③ the parameter of the Critic current network is ω; ω also refers to a neural network, responsible for calculating the current Q value, which measures the quality of the selected action. Note: this Q value is different from Q_i, which earlier denoted the task of the i-th vehicle;
④ the parameter of the Critic target network is ω'; ω' also refers to a neural network, responsible for calculating the target Q value, i.e. Q'.
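For readers who want a concrete picture of the four networks, a minimal PyTorch sketch is given below; the patent does not specify layer sizes, activations or the action encoding, so everything here is an assumption:

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state vector phi(S) to an action (a, b) in [0, 1]^2 (scaled to indices later)."""
    def __init__(self, state_dim: int, action_dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Sigmoid())
    def forward(self, s): return self.net(s)

class Critic(nn.Module):
    """Estimates Q(S, A) for a state-action pair."""
    def __init__(self, state_dim: int, action_dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
    def forward(self, s, a): return self.net(torch.cat([s, a], dim=-1))

# The target networks theta' and omega' start as copies of the current networks
actor, critic = Actor(state_dim=32), Critic(state_dim=32)
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
```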
Step 4.4.2, train the Actor current network, the Actor target network, the Critic current network and the Critic target network. The specific steps are as follows:
step 4.4.2.1, first obtaining an initialization state S, and the Actor current network generating action A according to the state S;
step 4.4.2.2, calculate the reward R according to state S and action a, and get the next state S';
step 4.4.2.3, storing { S, A, S' } in an experience playback pool;
step 4.4.2.4, recording the current state as S';
step 4.4.2.5, calculating the current Q value and the target Q value;
step 4.4.2.6, updating the Critic current network parameter omega;
step 4.4.2.7, updating the current network parameter theta of the Actor;
in step 4.4.2.8, if the current state S' is the termination state, the iteration is completed, otherwise go to step 4.4.2.2.
Step 4.4.3, the optimal unloading time slot is obtained from the trained network.
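The training loop of step 4.4.2 could look like the following simplified DDPG update, continuing the PyTorch sketch above; the hindsight-relabelling part of DDPG-HER is omitted for brevity, and all hyperparameters are assumptions:

```python
import random
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.005
opt_actor  = torch.optim.Adam(actor.parameters(),  lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
replay = []   # experience replay pool of (S, A, R, S', done) tensor tuples

def train_step(batch_size: int = 64):
    """One pass over steps 4.4.2.5-4.4.2.7 on a sampled mini-batch (replay must be pre-filled)."""
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    # Step 4.4.2.5: current Q value and target Q value y = R + gamma * Q'(S', A', omega')
    with torch.no_grad():
        y = r + GAMMA * (1 - done) * critic_target(s2, actor_target(s2)).squeeze(-1)
    # Step 4.4.2.6: update the Critic current network omega towards y
    critic_loss = F.mse_loss(critic(s, a).squeeze(-1), y)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
    # Step 4.4.2.7: update the Actor current network theta using the Critic's judgement
    actor_loss = -critic(s, actor(s)).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    # Soft-update the target networks theta' and omega'
    for tgt, src in ((actor_target, actor), (critic_target, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```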
Further, the specific method for deploying the algorithm to the SDN controller in step 5 is as follows:
and after the DDPG-HER algorithm training is completed, saving the current network of the Actor and deploying the current network to the SDN controller. When unloading is required, the SDN controller determines the optimal unloading time slot for the vehicle-mounted task according to the current state information of the network and the nodes.
The invention has the beneficial effects that:
aiming at the defects of the prior art, the invention divides the coverage area of the RSU into a plurality of intervals to accurately select the unloading time slot, calculates the optimal unloading decision by reasonable analysis and modeling and simultaneously using a DDP-HER algorithm, and reduces the network delay caused by unloading of the vehicle task.
Drawings
FIG. 1 is a flowchart of an on-board task offload slot decision process
FIG. 2 is a flow chart of the DDPG-HER algorithm
Detailed Description
The invention will be further explained with reference to the drawings.
The present invention is further described below. It should be noted that this embodiment is based on the present technical solution and gives detailed implementation procedures and steps, but the scope of protection of the present invention is not limited to this embodiment.
As shown in FIG. 1, assume that vehicle i is now ready to unload its vehicle-mounted task Q_i. The specific implementation flow of the invention is then as follows:
(1) An SDN controller is used. Information in each local area network, including the set r of RSUs, the vehicle tasks Q requesting unloading in the RSU area and the network bandwidth b, is aggregated into the SDN controller whenever it changes. When vehicle i prepares to unload its vehicle-mounted task Q_i, its request information is sent to the SDN controller;
(2) According to the divided unloading time slots, vehicle i, preparing to unload its vehicle-mounted task Q_i, uses formula (4)
D_i(b, l, M_i) = M_i / (b · log2(1 + P / (N · L_p)))
to calculate the unloading delay produced by unloading in the different time slots;
(3) The SDN controller aggregates the unloading task Q_i of vehicle i and the unloading tasks of the other vehicles and, according to formula (5)
min Σ_{i=1..n} z · D_i(b, l, M_i)    s.t.  z ∈ {0, 1},  rA ≤ MAX_rA
converts the vehicle-mounted task unloading time slot decision method into solving the above expression;
(4) the above expression is solved using the DDPG-HER algorithm. The method comprises the following specific steps:
1. The initialization state S, i.e. the state of each RSU and the completion status of all vehicle tasks, is obtained first. The Actor current network generates an action A according to the state S, where A is the unloading time slot selected for the task of a certain vehicle. The specific method is: compute the feature vector φ(S) of state S and take the action
A = π_θ(φ(S)) + 𝒩
where π_θ denotes the strategy (action) generated by the neural network θ (the Actor current network), which selects a time slot for unloading the vehicle task according to information such as the current state of the RSUs, and 𝒩 represents exploration noise;
2. The reward R is calculated from the current state S and the action A, and a new state S' is generated. After the time slot for unloading the vehicle task is selected, the state of each RSU and the completion status of all vehicle tasks change, and the new state is defined as S';
3. {S, A, S'} is stored in the experience replay pool, the aim being to train the neural network better. The Actor target network θ' selects an action A' according to S' in the experience pool;
4. recording the current state as S';
5. Calculate the current Q value and the target Q value:
y = R + γ · Q'(S', A', ω')
Q(S, A, ω) is the current Q value, computed by inputting the state S and the action A into the Critic current network ω; y is the target Q value, where Q'(S', A', ω') is calculated on the same principle as Q(S, A, ω); γ is the discount factor.
6. Update the Critic current network ω using the current Q value and the target Q value:
ω ← ω + (y - Q(S, A, ω))
y represents a more accurate Q value, and ω + (y - Q(S, A, ω)) means that the Critic current network ω updates itself using the error between y and its own estimate.
The Critic current network ω then helps the Actor current network θ to update:
θ ← θ - TD(S, A, ω)
TD(S, A, ω) represents the error, calculated by ω, between the action A selected in state S and the optimal action; θ - TD(S, A, ω) means that the Actor current network θ eliminates this error.
If the current state S' is the termination state, the iteration is finished and the Actor current network makes the optimal unloading time slot decision; otherwise, go to step 2.
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning is characterized by comprising the following steps:
step 1, obtaining information: the set r of RSUs accessible to the vehicle, the vehicle tasks Q requesting unloading in the RSU area, and the network bandwidth b of the RSUs;
step 2, dividing unloading time slots of the vehicle-mounted tasks according to the RSU information in the step 1;
step 3, modeling the vehicle-mounted task unloading time slot decision method;
and 4, solving the model expression in the step 3 by using a deep reinforcement learning method.
2. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 1, wherein the information in the step 1 specifically comprises:
① the unloading tasks in the RSU area, denoted Q = {Q_1, …, Q_i, …, Q_n}, where Q_i represents the task of the i-th vehicle;
② the sizes of the vehicle-mounted tasks, denoted M = {M_1, …, M_i, …, M_n}, where M_i represents the size of Q_i;
③ the delay constraints of the vehicle-mounted tasks, denoted t = {T_1, …, T_i, …, T_n}, where T_i is the delay constraint of Q_i;
④ the set of RSUs accessible to vehicles, denoted r = {R_1, …, R_i, …, R_n};
⑤ the number of vehicle-mounted tasks admitted by each RSU, denoted rA = {R_1A, …, R_iA, …, R_nA};
⑥ the bandwidths of the RSUs, denoted B = {B_1, …, B_i, …, B_n}, where B_i represents the network bandwidth of R_i.
3. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 1, wherein the unloading time slot dividing method for the vehicle-mounted task in the step 2 is as follows:
Step 2.1, collect the link bandwidth of the RSU, denoted W; collect the average signal power of the RSU, denoted P; collect the noise power of the RSU, denoted N; denote the link loss between the RSU and the vehicle as L_p;
Step 2.2, the transmission rate v between the vehicle and the RSU can be expressed as:
v = W · log2(1 + P / (N · L_p))    (1)
where [L_p] = 32.45 + 20 lg d + 20 lg f, d is the distance between the vehicle and the RSU, and f is the signal frequency of the RSU;
Step 2.3, the transmission delay of a vehicle-mounted task of size M can be expressed as:
D = M / v    (2)
Step 2.4, according to the influence of the relative distance between the vehicle and the RSU on the network delay, the coverage area of each RSU is divided into n task unloading time slots Gap_1, …, Gap_i, …, Gap_n, where any slot is denoted by g, g ∈ [Gap_1, …, Gap_i, …, Gap_n].
4. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 3, wherein the method for modeling the vehicle-mounted task unloading time slot decision method in the step 3 is as follows:
Step 3.1, define the unloading decision as L = {L_1, …, L_i, …, L_n}, where L_i indicates the position at which the i-th vehicle selects to unload its task; taking the point on the ground directly below the RSU as the vertical point, g represents the distance between the unloading time slot and this vertical point; then
d = sqrt(g² + high²)    (3)
where high is the vertical height of the RSU above the ground;
Step 3.2, determine the unloading decision of a single task: the unloading time slot decision L_i of a vehicle-mounted task is the selection of an unloading time slot g, i.e. for ∀ L_i ∈ L there must be L_i ∈ [Gap_1, …, Gap_i, …, Gap_n];
Step 3.3, the transmission delay of the vehicle-mounted task can be determined by the bandwidth b of the RSU, the unloading time slot decision l and the size m of the vehicle-mounted task, so the transmission delay of the vehicle-mounted task can be rewritten as:
D = D(b, l, m)
where the link bandwidth W of the RSU is replaced by the bandwidth b of the RSU, and the relative distance between the vehicle and the RSU is represented by the decision l;
Step 3.4, by formula (3), the transmission delay of the vehicle-mounted task is rewritten again as:
D(b, l, M) = M / (b · log2(1 + P / (N · L_p)))    (4)
where L_p = 32.45 + 20 lg l (km) + 20 lg f (MHz);
Step 3.5, the vehicle-mounted task unloading time slot decision method is converted into solving formula (5), where D_i(b, l, M_i) denotes the transmission delay of the i-th vehicle-mounted task:
min Σ_{i=1..n} z · D_i(b, l, M_i)    s.t.  z ∈ {0, 1},  rA ≤ MAX_rA    (5)
where MAX_rA represents the maximum value of rA; the value of rA is influenced by the vehicle-mounted task unloading time slot decision, and rA ≤ MAX_rA means that rA cannot exceed the maximum number of vehicle-mounted tasks that can be admitted.
5. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning of claim 4 is characterized in that the specific steps of solving the formula (5) by using the deep reinforcement learning method in the step 4 are as follows:
step 4.1, build Markov State space
S={t,rV,rD,rA}
Wherein the various parameters are specified below:
① the delay constraints of the vehicle-mounted tasks are denoted t = {T_1, …, T_i, …, T_n}, where T_i is the delay constraint of task Q_i;
② the set of RSUs accessible to vehicles is defined as r = {R_1, …, R_i, …, R_n}; any unloading time slot of each RSU in r is denoted by g, g ∈ [Gap_1, …, Gap_i, …, Gap_n]; since the unloading rates of vehicle tasks differ across unloading time slots, the set of unloading rates of all unloading time slots in r is denoted rV = {R_1G_1V, …, R_iG_jV, …, R_nG_nV}, where R_iG_jV represents the transmission rate of the j-th unloading time slot of the i-th RSU;
③ the transmission delays of the vehicle-mounted tasks in each unloading time slot of each RSU in r are expressed as rD = {R_1G_1D, …, R_iG_jD, …, R_nG_nD}, where R_iG_jD represents the transmission delay of a vehicle-mounted task in the j-th unloading time slot of the i-th RSU;
④ the number of vehicle-mounted tasks admitted by each RSU is rA = {R_1A, …, R_iA, …, R_nA};
Step 4.2, build the Markov action space
A = {(a, b) | a ∈ [1, n] ∩ N+, b ∈ [1, n] ∩ N+}
where the various parameters are specified below:
① a represents the RSU accessed by the vehicle when the vehicle-mounted unloading task is executed;
② b represents the unloading time slot of the RSU accessed by the vehicle when the vehicle-mounted unloading task is executed;
③ N+ represents the positive integers;
and 4.3, establishing a Markov reward function reward:
reward=(η)×base+(2(η)-1)×delay(rD,t)+access(rA)
wherein the various parameters are specified below:
(η) is a step function:
(η) = 1 if the vehicle-mounted task is unloaded successfully, and (η) = 0 if the unloading fails;
when (η) = 1 the unloading of the vehicle-mounted task succeeded, and when (η) = 0 it failed; base is a constant representing the basic reward, and (η) × base indicates that the basic reward is obtained when the unloading of the vehicle-mounted task succeeds and is not obtained when it fails;
delay(rD, t) represents the reward or penalty obtained for performing a vehicle-mounted unloading task:
delay(rD, t) = Rward × (rD - t)
where rD represents the time taken to unload the vehicle-mounted task and t represents its unloading time constraint; a reward is obtained when unloading is completed within the constraint time t, otherwise a penalty is obtained; Rward is the reward or penalty value;
access(rA) is used to judge whether the current RSU can admit more vehicle-mounted tasks:
access(rA) = 0 if rA ≤ MAX_rA; otherwise access(rA) cancels the other terms so that reward = 0;
where MAX_rA represents the maximum number of vehicle-mounted tasks the current RSU can admit; when more vehicle-mounted tasks can be admitted, i.e. rA ≤ MAX_rA, access(rA) has no influence on the reward function reward; when rA > MAX_rA, access(rA) forces reward to equal 0, i.e. no reward is obtained;
and 4.4, solving the optimal unloading time slot by using a DDPG-HER algorithm according to the Markov model in the step 4.1-4.3.
6. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 5, wherein the specific implementation of the step 4.4 comprises the following steps:
step 4.4.1, establishing an Actor current network, an Actor target network, a Critic current network and a Critic target network, wherein the descriptions of the four networks are as follows:
① the parameter of the Actor current network is θ; θ also refers to the neural network itself, responsible for updating the parameter θ and for generating the current action A according to the current state S; the action A acts on the current state S to produce a state S' and a reward R, the reward R being obtained from the reward function reward;
② the parameter of the Actor target network is θ'; θ' also refers to a neural network, responsible for selecting an action A' from the experience replay pool and updating θ';
③ the parameter of the Critic current network is ω; ω also refers to a neural network, responsible for calculating the current Q value, which measures the quality of the selected action;
④ the parameter of the Critic target network is ω'; ω' also refers to a neural network, responsible for calculating the target Q value, i.e. Q';
step 4.4.2, training an Actor current network, an Actor target network, a Critic current network and a Critic target network, and specifically comprises the following steps:
step 4.4.2.1, first obtaining an initialization state S, and the Actor current network generating action A according to the state S;
step 4.4.2.2, calculate the reward R according to state S and action a, and get the next state S';
step 4.4.2.3, storing { S, A, S' } in an experience playback pool;
step 4.4.2.4, recording the current state as S';
step 4.4.2.5, calculating the current Q value and the target Q value;
Step 4.4.2.6, updating the Critic current network parameter omega;
step 4.4.2.7, updating the current network parameters of the Actor;
step 4.4.2.8, if the current state S' is the termination state, the iteration is finished, otherwise go to step 4.4.2.2;
and 4.4.3, calculating the optimal unloading time slot by the trained network.
7. The deep reinforcement learning-based software defined vehicle-mounted task fine-grained unloading method according to claim 1, further comprising a step 5 of deploying an algorithm to an SDN controller.
8. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 7, wherein the specific method in the step 5 is as follows:
and after the DDPG-HER algorithm training is completed, saving the current network of the Actor and deploying the current network to the SDN controller. When unloading is required, the SDN controller determines the optimal unloading time slot for the vehicle-mounted task according to the current state information of the network and the nodes.
CN202010571179.1A 2020-06-22 2020-06-22 Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning Active CN111866807B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010571179.1A CN111866807B (en) 2020-06-22 2020-06-22 Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010571179.1A CN111866807B (en) 2020-06-22 2020-06-22 Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111866807A true CN111866807A (en) 2020-10-30
CN111866807B CN111866807B (en) 2022-10-28

Family

ID=72987863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010571179.1A Active CN111866807B (en) 2020-06-22 2020-06-22 Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111866807B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112714178A (en) * 2020-12-25 2021-04-27 北京信息科技大学 Task unloading method and device based on vehicle-mounted edge calculation
CN113422795A (en) * 2021-05-06 2021-09-21 江苏大学 Vehicle-mounted edge task centralized scheduling and resource allocation joint optimization method based on deep reinforcement learning
CN113645273A (en) * 2021-07-06 2021-11-12 南京邮电大学 Internet of vehicles task unloading method based on service priority
CN114116047A (en) * 2021-11-09 2022-03-01 吉林大学 V2I unloading method for vehicle-mounted computation-intensive application based on reinforcement learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109067842A (en) * 2018-07-06 2018-12-21 电子科技大学 Calculating task discharging method towards car networking
CN109257429A (en) * 2018-09-25 2019-01-22 南京大学 A kind of calculating unloading dispatching method based on deeply study
CN109756378A (en) * 2019-01-12 2019-05-14 大连理工大学 A kind of intelligence computation discharging method under In-vehicle networking
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning
CN110798842A (en) * 2019-01-31 2020-02-14 湖北工业大学 Heterogeneous cellular network flow unloading method based on multi-user deep reinforcement learning
CN110891253A (en) * 2019-10-14 2020-03-17 江苏大学 Community popularity-based vehicle-mounted delay tolerant network routing method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109067842A (en) * 2018-07-06 2018-12-21 电子科技大学 Calculating task discharging method towards car networking
CN109257429A (en) * 2018-09-25 2019-01-22 南京大学 A kind of calculating unloading dispatching method based on deeply study
CN109756378A (en) * 2019-01-12 2019-05-14 大连理工大学 A kind of intelligence computation discharging method under In-vehicle networking
CN110798842A (en) * 2019-01-31 2020-02-14 湖北工业大学 Heterogeneous cellular network flow unloading method based on multi-user deep reinforcement learning
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning
CN110891253A (en) * 2019-10-14 2020-03-17 江苏大学 Community popularity-based vehicle-mounted delay tolerant network routing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PENG, Ershuai: "Research and System Implementation of Optimal Control and Scheduling of Vehicle-Mounted Edge Resources Based on Load Prediction", China Master's Theses Full-text Database *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112714178A (en) * 2020-12-25 2021-04-27 北京信息科技大学 Task unloading method and device based on vehicle-mounted edge calculation
CN112714178B (en) * 2020-12-25 2023-05-12 北京信息科技大学 Task unloading method and device based on vehicle-mounted edge calculation
CN113422795A (en) * 2021-05-06 2021-09-21 江苏大学 Vehicle-mounted edge task centralized scheduling and resource allocation joint optimization method based on deep reinforcement learning
CN113645273A (en) * 2021-07-06 2021-11-12 南京邮电大学 Internet of vehicles task unloading method based on service priority
CN114116047A (en) * 2021-11-09 2022-03-01 吉林大学 V2I unloading method for vehicle-mounted computation-intensive application based on reinforcement learning
CN114116047B (en) * 2021-11-09 2023-11-03 吉林大学 V2I unloading method for vehicle-mounted computation intensive application based on reinforcement learning

Also Published As

Publication number Publication date
CN111866807B (en) 2022-10-28

Similar Documents

Publication Publication Date Title
CN111866807B (en) Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning
CN110213796B (en) Intelligent resource allocation method in Internet of vehicles
CN110427690A (en) A kind of method and device generating ATO rate curve based on global particle swarm algorithm
CN114650567A (en) Unmanned aerial vehicle-assisted V2I network task unloading method
CN113904948A (en) 5G network bandwidth prediction system and method based on cross-layer multi-dimensional parameters
CN113687875A (en) Vehicle task unloading method and device in Internet of vehicles
CN114374741A (en) Dynamic grouping internet-of-vehicle caching method based on reinforcement learning under MEC environment
CN112598146A (en) Method and device for determining parking position, electronic equipment and readable storage medium
CN115376031A (en) Road unmanned aerial vehicle routing inspection data processing method based on federal adaptive learning
CN113422795B (en) Vehicle-mounted edge task centralized scheduling and resource allocation joint optimization method based on deep reinforcement learning
CN114339842A (en) Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster under time-varying scene based on deep reinforcement learning
Cui et al. Model-free based automated trajectory optimization for UAVs toward data transmission
CN113709249A (en) Safe balanced unloading method and system for driving assisting service
CN103906077A (en) Road side unit placement method based on affinity propagation algorithm
CN116301038A (en) Unmanned aerial vehicle power transmission line autonomous inspection method based on track optimization planning
CN114520991B (en) Unmanned aerial vehicle cluster-based edge network self-adaptive deployment method
Yu et al. Real-time holding control for transfer synchronization via robust multiagent reinforcement learning
CN115550357A (en) Multi-agent multi-task cooperative unloading method
CN114169463A (en) Autonomous prediction lane information model training method and device
CN108985658B (en) Internet of vehicles collaborative downloading method based on fuzzy judgment and client expectation
US20230351205A1 (en) Scheduling for federated learning
CN114328547A (en) Vehicle-mounted high-definition map data source selection method and device
CN113194444B (en) Communication computing resource optimization method, device, system and storage medium
CN113815647B (en) Vehicle speed planning method, device, equipment and medium
CN115002725A (en) Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant