CN111866807A - Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning - Google Patents
- Publication number
- CN111866807A (application CN202010571179.1A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- unloading
- rsu
- task
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
- H04W4/44—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/06—Testing, supervising or monitoring using simulated traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/08—Load balancing or load distribution
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning, which comprises the following steps: 1. obtain information such as the RSUs the vehicle can access and the vehicle's tasks; 2. divide the unloading time slots of the vehicle-mounted tasks according to the RSU information from step 1; 3. formulate the unloading time slot decision as a mathematical problem; 4. solve the mathematical problem of step 3 with a deep reinforcement learning method; 5. deploy the algorithm to the SDN controller. The invention makes full use of the unloading time slot to reduce the network transmission delay caused by unloading. The decision method fully considers factors including the relative positions of the vehicles and RSUs, the number of vehicles connected to each RSU, and the number of vehicle-mounted tasks each RSU must receive, and can effectively reduce the unloading delay of vehicle tasks.
Description
Technical Field
The invention belongs to the field of vehicle-mounted mobile edge computing and relates to a vehicle-mounted task unloading time slot decision method. It is suitable for small-cell base station environments, and in particular for load balancing among small-cell base stations in a local area network.
Background
With the rapid development of Internet of Things technology, Mobile Edge Computing (MEC) has become one of its important components. Users may access mobile edge computing through wireless access points such as base stations and Road Side Units (RSUs). MEC provides computing, storage, and other resources to users. These capabilities find wide application in vehicle networks: Vehicular Edge Computing (VEC) is a network model that has emerged in recent years.
Applications in the vehicle network can make vehicle travel more convenient and safer. As vehicle applications continue to develop, applications that demand strong computing power and large amounts of storage, such as real-time road analysis, automatic driving, and virtual reality, are becoming more common, and the volume of data to be transmitted keeps growing. Mainstream research on vehicle task unloading has focused on the allocation of computing resources, while the unloading time slot is mostly selected at random, which fails to exploit the time slot to reduce the network transmission delay caused by unloading. Factors influencing the task unloading time slot include the relative position of the vehicle and the RSU, the number of vehicles connected to the RSU, and the number of vehicle-mounted tasks the RSU must receive.
In view of the above, it is desirable to provide a method for deciding an unloading time slot of a vehicle-mounted task, which can cope with the unloading situation of the vehicle-mounted task and can consider various influence factors.
Disclosure of Invention
Aiming at these problems, the invention provides a software-defined vehicle-mounted task unloading time slot decision method based on deep reinforcement learning. A Software-Defined Network (SDN) controller collects global state data of the network, such as the number of vehicles connected to each RSU, the load state of the Mobile Edge Computing (MEC) server, and the network delay of each RSU. On this basis, a deep reinforcement learning model builds an adaptive optimization decision that recommends local unloading, global unloading, and the optimal vehicle-mounted task unloading time slot, so as to solve the problem of excessive delay caused by unloading vehicle-mounted tasks. The method comprises the following steps:
Step 1, obtain information: the set r of RSUs accessible to the vehicle, the vehicle tasks Q requesting unloading in the RSU area, and the network bandwidth b of each RSU;
Step 2, divide the unloading time slots of the vehicle-mounted tasks according to the RSU information in step 1;
Step 3, model the vehicle-mounted task unloading time slot decision method;
Step 4, solve the model expression of step 3 using a deep reinforcement learning method;
Step 5, deploy the algorithm to the SDN controller.
Further, the information in step 1 includes:
① the unloading tasks in the RSU region, denoted Q = {Q1, …, Qi, …, Qn}, where Qi represents the task of the ith vehicle;
② the size of the vehicle-mounted tasks, denoted M = {M1, …, Mi, …, Mn}, where Mi represents the size of Qi;
③ the time delay constraints of the vehicle-mounted tasks, denoted T = {T1, …, Ti, …, Tn}, where Ti is the delay constraint of Qi;
④ the RSU set accessible to the vehicle, defined as r = {R1, …, Ri, …, Rn}, where Ri represents the ith RSU;
⑤ the number of vehicle-mounted tasks connected to each RSU, rA = {R1A, …, RiA, …, RnA}, where RiA represents the number of vehicle-mounted tasks connected to the ith RSU;
⑥ the bandwidth of the RSUs, denoted B = {B1, …, Bi, …, Bn}, where Bi represents the network bandwidth of Ri.
further, the unloading time slot dividing method of the vehicle-mounted task in the step 2 comprises the following steps:
Step 2.1, collect the link bandwidth of the RSU, denoted W; collect the average signal power of the RSU, denoted P; collect the noise power of the RSU, denoted N; record the link loss power between the RSU and the vehicle as Lp;
Step 2.2, the transmission rate v of the vehicle and the RSU can be expressed as:
wherein [Lp] = 32.45 + 20 lg d + 20 lg f, d is the distance of the vehicle from the RSU in km, and f is the signal frequency of the RSU in MHz.
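The rate equation itself is an image that is not reproduced in this text, so the sketch below assumes the usual Shannon-capacity form v = W·log2(1 + P/(N·[Lp])) together with the stated path-loss model; the form of the rate and all parameter values are assumptions, not the patent's verbatim formula:

```python
import math

def path_loss_db(d_km, f_mhz):
    # Path loss [Lp] = 32.45 + 20*lg(d) + 20*lg(f) from step 2.2,
    # with d in km and f in MHz.
    return 32.45 + 20 * math.log10(d_km) + 20 * math.log10(f_mhz)

def transmission_rate(w_hz, p_watt, n_watt, d_km, f_mhz):
    # Hypothetical Shannon-capacity form of the rate v; the patent's
    # equation image is missing, so this reconstruction is an assumption.
    lp_linear = 10 ** (path_loss_db(d_km, f_mhz) / 10)  # dB -> linear
    snr = p_watt / (n_watt * lp_linear)
    return w_hz * math.log2(1 + snr)
```

As expected under this model, the rate falls as the vehicle-to-RSU distance d grows and rises with the link bandwidth W, which is what makes the choice of unloading time slot matter.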
Step 2.3, the transmission delay of the onboard task with size M can be expressed as:
Step 2.4, because the network transmission delay is influenced by the relative distance between the vehicle and the RSU, the coverage area of each RSU is divided into n task unloading time slots Gap1, …, Gapi, …, Gapn, where any slot is denoted by g, g ∈ [Gap1, …, Gapi, …, Gapn]. For convenience of calculation and description, the transmission rates within the same slot are assumed to be the same. Taking the RSU's ground vertical point as reference, g is the distance between the unloading time slot and that vertical point. Then
wherein high is the vertical height of the RSU above the ground;
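The slot geometry of step 2.4 can be sketched as follows. The relation d = sqrt(g² + high²) is an assumption reconstructed from "RSU as a ground vertical point" and the height term `high`, since the equation image is not reproduced; slot widths and midpoints are illustrative choices:

```python
import math

def slot_distances(coverage_radius_m, n_slots, high_m):
    # Divide the RSU coverage into n unloading time slots Gap_1..Gap_n
    # and return, for each slot, the vehicle-to-RSU distance in km.
    # d = sqrt(g^2 + high^2) is assumed from the ground-vertical-point
    # description; g is taken at each slot's midpoint on the ground.
    width = coverage_radius_m / n_slots
    out = []
    for i in range(n_slots):
        g = (i + 0.5) * width                 # slot midpoint distance g
        d_m = math.sqrt(g ** 2 + high_m ** 2) # slant distance to antenna
        out.append(d_m / 1000.0)              # km, matching the loss model
    return out
```

Slots nearer the RSU yield shorter distances and hence lower path loss, which is why the slot decision affects the unloading delay.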
further, the method for modeling the vehicle-mounted task unloading time slot decision method in the step 3 comprises the following steps:
Step 3.1, define the unloading time slot decision as L = {L1, …, Li, …, Ln}, where Li represents the position selected by the ith vehicle for unloading its task; the combination of the positions selected by all vehicles is the unloading time slot decision;
Step 3.2, determine the unloading decision of a single task. The time slot decision Li for unloading a vehicle-mounted task is the selection of an unloading time slot g, i.e., for any Li there must be Li ∈ [Gap1, …, Gapi, …, Gapn].
Step 3.3, as can be seen from formulas (1) and (2), the transmission delay of the vehicle-mounted task is determined by the bandwidth b of the RSU, the unloading time slot decision l, and the size m of the vehicle-mounted task, so the transmission delay of the vehicle-mounted task can be rewritten as:
In formula (3), the link bandwidth W of the RSU is replaced by the bandwidth b of the RSU, and l represents the relative distance between the vehicle and the RSU;
Step 3.4, from formula (3), the transmission delay of the vehicle-mounted task is rewritten again as:
wherein [Lp] = 32.45 + 20 lg l + 20 lg f;
Step 3.5, the vehicle-mounted task unloading time slot decision method is converted into solving formula (5), where Di(b, l, Mi) denotes the transmission delay of the ith vehicle-mounted task.
wherein z indicates whether the task is unloaded: z = 1 means the task is unloaded, and z = 0 means it is not; MAXrA represents the maximum value of rA; the value of rA is influenced by the unloading time slot decision, and rA ≤ MAXrA means that rA cannot exceed the maximum number of vehicle-mounted tasks that can be accepted.
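Formula (5) itself is an image not reproduced in this text, but the constraint structure described above (minimize total transmission delay Di(b, l, Mi) subject to rA ≤ MAXrA per RSU) can be sketched as a cost evaluation over one candidate time slot decision; `delay_fn` and the other names are hypothetical stand-ins:

```python
def total_delay(decisions, delay_fn, max_ra):
    # decisions: list of (rsu_id, slot) pairs, one per vehicle-mounted task.
    # delay_fn(rsu_id, slot): transmission delay D_i for that placement.
    # Returns the total delay, or float('inf') when any RSU would accept
    # more than max_ra tasks (the rA <= MAXrA constraint of formula (5)).
    counts = {}
    total = 0.0
    for rsu, slot in decisions:
        counts[rsu] = counts.get(rsu, 0) + 1
        if counts[rsu] > max_ra:
            return float('inf')  # constraint violated: decision infeasible
        total += delay_fn(rsu, slot)
    return total
```

An exhaustive search over such decisions grows exponentially with the number of vehicles, which motivates the deep reinforcement learning solution of step 4.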
Further, the specific steps of solving the formula (5) by using the deep reinforcement learning method in the step 4 are as follows:
step 4.1, build Markov State space
S={t,rV,rD,rA}
Wherein the various parameters are specified below:
① the time delay constraints of the vehicle-mounted tasks, denoted t = {T1, …, Ti, …, Tn}, where Ti is the delay constraint of task Qi;
② the RSU set accessible to the vehicle, defined as r = {R1, …, Ri, …, Rn}; any unloading time slot of each RSU in r is denoted by g, g ∈ [Gap1, …, Gapi, …, Gapn]. Because the unloading rates of vehicle tasks differ between unloading time slots, the set of unloading rates of all unloading time slots in r is expressed as rV = {R1G1V, …, RiGjV, …, RnGnV}, where RiGjV represents the transmission rate of the jth unloading time slot of the ith RSU;
③ the transmission delay of the vehicle-mounted task in each unloading time slot of each RSU in r, expressed as rD = {R1G1D, …, RiGjD, …, RnGnD}, where RiGjD represents the transmission delay of the vehicle-mounted task in the jth unloading time slot of the ith RSU;
④ the number of vehicle-mounted tasks connected to each RSU, rA = {R1A, …, RiA, …, RnA}.
Step 4.2, build Markov motion space
A = {(a, b) | a ∈ [1, n] ∩ N+, b ∈ [1, n] ∩ N+}
Wherein the various parameters are specified below:
a represents an RSU accessed by a vehicle when unloading a vehicle-mounted task is executed;
b represents the unloading time slot of the RSU accessed by the vehicle when the unloading vehicle-mounted task is executed;
③ N+ represents the set of positive integers.
Step 4.3, establish the Markov reward function reward:
reward=(η)×base+(2(η)-1)×delay(rD,t)+access(rA)
wherein the various parameters are specified below:
(η) is a step function
(η) = 1 indicates that unloading of the vehicle-mounted task succeeded; (η) = 0 indicates that it failed.
base is a constant representing the basic reward; (η) × base means the basic reward is obtained when unloading of the vehicle-mounted task succeeds and is not obtained when it fails;
delay (rD, t) represents the reward or penalty gained for performing a vehicle unloading task
delay(rD)=Rward×(rD-t)
wherein rD represents the time taken to unload the vehicle-mounted task and t represents its unloading time constraint. When unloading is completed within the constraint time t, a reward is obtained; otherwise a penalty is obtained. Rward is the reward or penalty value;
access(rA) is used to judge whether the current RSU can accept more vehicle-mounted tasks.
MAXrA represents the maximum number of vehicle-mounted tasks the current RSU can accept. When more vehicle-mounted tasks can be accepted, i.e., rA ≤ MAXrA, access(rA) has no influence on the reward function reward; when rA > MAXrA, access(rA) causes reward to equal 0, i.e., no reward is obtained.
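A minimal sketch of this reward function, following the formula reward = (η)×base + (2(η)−1)×delay(rD, t) + access(rA) literally; the values of `base` and `rward` are illustrative, and the sign convention keeps delay(rD) = Rward × (rD − t) exactly as written in the patent:

```python
def reward(succeeded, rD, t, rA, max_ra, base=10.0, rward=1.0):
    # Markov reward of step 4.3.
    # eta: success step function; delay_term follows delay(rD) = Rward*(rD - t)
    # (negative when unloading finishes within the deadline t);
    # access() zeroes the reward when the RSU is over capacity.
    eta = 1 if succeeded else 0
    delay_term = rward * (rD - t)
    r = eta * base + (2 * eta - 1) * delay_term
    if rA > max_ra:               # access(rA): over-capacity -> no reward
        return 0.0
    return r
```

For example, a successful unload finishing 5 time units before its deadline with a free RSU earns base plus the deadline margin, while any placement on an over-capacity RSU earns nothing regardless of timing.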
Step 4.4, according to the Markov model of steps 4.1-4.3, solve the optimal unloading time slot decision using the DDPG-HER algorithm. The concrete solving steps are as follows:
step 4.4.1, establishing an Actor current network, an Actor target network, a Critic current network and a Critic target network, wherein the descriptions of the four networks are as follows:
The parameter of the Actor current network is θ; θ also refers to the neural network itself, which is responsible for updating θ and generating the current action A from the current state S. Here S represents information such as the unloading time slot decision being made for a certain vehicle, the vehicle's position, and which decisions have already been made. Applying action A to state S produces the next state S' and a reward R, obtained from the reward function reward;
the Actor target network has parameters theta ', theta' also refers to a neural network and is responsible for selecting an action A 'from the experience playback pool and updating theta';
The parameter of the Critic current network is ω, which also refers to a neural network responsible for calculating the current Q value; the Q value measures the quality of the selected action. Note: this Q value is different from Qi, which earlier denoted the task of the ith vehicle;
and the parameter of the Critic target network is omega ', also refers to a neural network and is responsible for calculating a target Q value, namely Q'.
Step 4.4.2, train the Actor current network, Actor target network, Critic current network, and Critic target network. The specific steps are as follows:
step 4.4.2.1, first obtaining an initialization state S, and the Actor current network generating action A according to the state S;
step 4.4.2.2, calculate the reward R according to state S and action a, and get the next state S';
step 4.4.2.3, storing { S, A, S' } in an experience playback pool;
step 4.4.2.4, recording the current state as S';
step 4.4.2.5, calculating the current Q value and the target Q value;
step 4.4.2.6, updating the Critic current network parameter omega;
step 4.4.2.7, updating the current network parameter theta of the Actor;
in step 4.4.2.8, if the current state S' is the termination state, the iteration is completed, otherwise go to step 4.4.2.2.
Step 4.4.3, the trained network yields the optimal unloading time slot.
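The four-network loop of steps 4.4.2.1-4.4.2.8 can be sketched structurally. This is not the patent's implementation: the networks are single scalar weights rather than deep neural networks, the HER relabeling step is omitted, and all update rules are simplified stand-ins for the gradient updates; every name here is illustrative:

```python
import random

class DDPGSketch:
    def __init__(self, gamma=0.9, tau=0.1, lr=0.01):
        self.theta = 0.0    # Actor current network (parameters theta)
        self.theta_t = 0.0  # Actor target network (theta')
        self.omega = 0.0    # Critic current network (omega)
        self.omega_t = 0.0  # Critic target network (omega')
        self.replay = []    # experience replay pool
        self.gamma, self.tau, self.lr = gamma, tau, lr

    def act(self, s):
        # Step 4.4.2.1: Actor current network generates action A from state S.
        return self.theta * s

    def q(self, s, a, w):
        # Critic Q(S, A, omega): quality of action A in state S.
        return w * (s + a)

    def step(self, s, a, r, s2):
        self.replay.append((s, a, r, s2))          # step 4.4.2.3: store experience
        s, a, r, s2 = random.choice(self.replay)   # sample from replay pool
        a2 = self.theta_t * s2                     # target action A' via theta'
        y = r + self.gamma * self.q(s2, a2, self.omega_t)  # target Q value
        td = y - self.q(s, a, self.omega)          # TD error against current Q
        self.omega += self.lr * td * (s + a)       # step 4.4.2.6: update omega
        self.theta += self.lr * self.omega * s     # step 4.4.2.7: update theta
        # soft-update the target networks toward the current ones
        self.omega_t += self.tau * (self.omega - self.omega_t)
        self.theta_t += self.tau * (self.theta - self.theta_t)
        return td
```

The outer loop (steps 4.4.2.1 to 4.4.2.8) would repeatedly call `act` and `step` until the terminal state is reached, at which point the Actor current network encodes the slot decision policy.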
Further, the specific method for deploying the algorithm to the SDN controller in step 5 is as follows:
After the DDPG-HER algorithm training is completed, the Actor current network is saved and deployed to the SDN controller. When unloading is required, the SDN controller decides the optimal unloading time slot for the vehicle-mounted task according to the current state information of the network and its nodes.
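A hypothetical sketch of the controller-side decision in step 5: the trained Actor is reduced to a callable that scores candidate slots from the current network state, and the controller picks the best-scoring slot. All names (`actor`, `decide_slot`) are illustrative, not the patent's API:

```python
def decide_slot(actor, state, n_slots):
    # The SDN controller holds only the trained Actor network and maps
    # the current state (RSU loads, vehicle position, ...) to an
    # unloading time slot index in [1, n]. `actor(state, g)` is any
    # callable returning a raw score for slot g.
    scores = [actor(state, g) for g in range(1, n_slots + 1)]
    best = max(range(n_slots), key=lambda i: scores[i])
    return best + 1  # 1-indexed slot Gap_g
```

Keeping only the Actor at the controller is the natural design here: inference needs no Critic, replay pool, or target networks, so the deployed artifact stays small.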
The invention has the beneficial effects that:
Aiming at the defects of the prior art, the invention divides the coverage area of the RSU into multiple intervals to select the unloading time slot precisely, computes the optimal unloading decision through reasonable analysis and modeling together with the DDPG-HER algorithm, and reduces the network delay caused by unloading vehicle tasks.
Drawings
FIG. 1 is a flowchart of an on-board task offload slot decision process
FIG. 2 DDPG-HER Algorithm flow chart
Detailed Description
The invention will be further explained with reference to the drawings.
The present invention is further described below. It should be noted that this embodiment is based on the technology described above and gives detailed implementation procedures and steps, but the scope of the invention is not limited to this embodiment.
As shown in FIG. 1, assume that vehicle i is ready to unload its vehicle-mounted task Qi. The specific implementation flow of the invention is then as follows:
(1) An SDN controller is used. Information in each local area network, including the set r of RSUs, the vehicle tasks Q requesting unloading in the RSU area, and the network bandwidth b, is aggregated into the SDN controller whenever it changes. When vehicle i prepares to unload vehicle-mounted task Qi, its request information is sent to the SDN controller;
(2) According to the divided unloading time slots, vehicle i, preparing to unload vehicle-mounted task Qi, uses the formula
to calculate the unloading delay produced by unloading in different time slots;
(3) The SDN controller aggregates the unloading task Qi of vehicle i together with the unloading tasks of other vehicles and, according to the formula,
converts the vehicle-mounted task unloading time slot decision method into solving the above expression;
(4) the above expression is solved using the DDPG-HER algorithm. The method comprises the following specific steps:
1. First obtain the initialization state S, i.e., the state of each RSU and the completion status of all vehicle tasks. The Actor current network generates an action A from state S, where A is the unloading time slot selected for a certain vehicle's task. Specifically, compute the feature vector φ(S) of state S; the action is A = πθ(φ(S)) + 𝒩, where πθ denotes the strategy (action) generated by the neural network θ (the Actor current network), which selects a time slot for unloading the vehicle task according to information such as the current state of each RSU, and 𝒩 represents exploration noise;
2. the reward R is calculated from the current state S and the action a, and a new state S' is generated. After a certain time slot for unloading the vehicle task is selected, the state of each RSU and the completion condition of all vehicle tasks are changed, and the new state is defined as S';
3. Store {S, A, S'} in the experience replay pool; the aim is to train the neural networks better. The Actor target network θ' selects an action A' according to S' from the experience pool;
4. recording the current state as S';
5. calculating the current Q value and the target Q value
Q(S, A, ω) is the current Q value, computed by feeding state S and action A into the Critic current network ω; Q'(S', A', ω') is the target Q value; y is the target Q value, where Q'(S', A', ω') is calculated on the same principle as Q(S, A, ω); γ is the discount factor.
6. Update the Critic current network ω using the current Q value and the target Q value:
ω←ω+(y-Q(S,A,ω))
y represents a more accurate Q value, and ω ← ω + (y − Q(S, A, ω)) means the Critic current network ω updates itself using the Q-value error.
The Critic current network ω then helps the Actor current network θ to update:
θ←θ-TD(S,A,ω)
TD(S, A, ω) represents the error, computed by ω, between the action A selected in state S and the optimal action; θ ← θ − TD(S, A, ω) means the Actor current network θ eliminates this error.
If the current state S' is the termination state, the iteration ends and the Actor current network can make the optimal unloading time slot decision; otherwise, go to step 2.
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.
Claims (8)
1. A software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning is characterized by comprising the following steps:
Step 1, obtain information: the set r of RSUs accessible to the vehicle, the vehicle tasks Q requesting unloading in the RSU area, and the network bandwidth b of each RSU;
step 2, dividing unloading time slots of the vehicle-mounted tasks according to the RSU information in the step 1;
step 3, modeling the vehicle-mounted task unloading time slot decision method;
Step 4, solve the model expression of step 3 using a deep reinforcement learning method.
2. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 1, wherein the information in the step 1 specifically comprises:
① the unloading tasks in the RSU region, denoted Q = {Q1, …, Qi, …, Qn}, where Qi represents the task of the ith vehicle;
② the size of the vehicle-mounted tasks, denoted M = {M1, …, Mi, …, Mn}, where Mi represents the size of Qi;
③ t = {T1, …, Ti, …, Tn}, where Ti is the delay constraint of Qi;
④ the RSU set accessible to the vehicle, r = {R1, …, Ri, …, Rn};
⑤ the number of vehicle-mounted tasks connected to each RSU, denoted rA = {R1A, …, RiA, …, RnA};
⑥ the bandwidth of the RSUs, denoted B = {B1, …, Bi, …, Bn}, where Bi represents the network bandwidth of Ri.
3. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 1, wherein the unloading time slot dividing method for the vehicle-mounted task in the step 2 is as follows:
Step 2.1, collect the link bandwidth of the RSU, denoted W; collect the average signal power of the RSU, denoted P; collect the noise power of the RSU, denoted N; record the link loss power between the RSU and the vehicle as Lp;
Step 2.2, the transmission rate v of the vehicle and the RSU can be expressed as:
wherein [Lp] = 32.45 + 20 lg d + 20 lg f, d is the distance between the vehicle and the RSU, and f is the signal frequency of the RSU;
step 2.3, the transmission delay of the onboard task with size M can be expressed as:
Step 2.4, according to the influence of the relative distance between the vehicle and the RSU on the network delay, divide the coverage area of each RSU into n task unloading time slots Gap1, …, Gapi, …, Gapn, where any slot is denoted by g, g ∈ [Gap1, …, Gapi, …, Gapn].
4. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 3, wherein the method for modeling the vehicle-mounted task unloading time slot decision method in the step 3 is as follows:
Step 3.1, define the unloading decision as L = {L1, …, Li, …, Ln}, where Li indicates the position the ith vehicle selects for its unloading task. Taking the RSU's ground vertical point as reference, g represents the distance between the unloading time slot and that vertical point. Then
wherein high is the vertical height of the RSU above the ground;
Step 3.2, determine the unloading decision of a single task: the unloading time slot decision Li of a vehicle-mounted task is the selection of an unloading time slot g, i.e., for any Li there must be Li ∈ [Gap1, …, Gapi, …, Gapn];
Step 3.3, the transmission delay of the vehicle-mounted task is determined by the bandwidth b of the RSU, the unloading time slot decision l, and the size m of the vehicle-mounted task, so the transmission delay of the vehicle-mounted task can be rewritten as:
In formula (3), the link bandwidth W of the RSU is replaced by the bandwidth b of the RSU, and the relative distance between the vehicle and the RSU is represented by the decision l;
and 3.4, rewriting the transmission delay of the vehicle-mounted task again by the formula (3) as follows:
wherein Lp = 32.45 + 20 lg l (km) + 20 lg f (MHz);
Step 3.5, the vehicle-mounted task unloading time slot decision method is converted into solving formula (5), where Di(b, l, Mi) denotes the transmission delay of the ith vehicle-mounted task.
wherein MAXrA represents the maximum value of rA; the value of rA is influenced by the unloading time slot decision, and rA ≤ MAXrA means that rA cannot exceed the maximum number of vehicle-mounted tasks that can be accepted.
5. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning of claim 4 is characterized in that the specific steps of solving the formula (5) by using the deep reinforcement learning method in the step 4 are as follows:
step 4.1, build Markov State space
S={t,rV,rD,rA}
Wherein the various parameters are specified below:
① the time delay constraints of the vehicle-mounted tasks, denoted t = {T1, …, Ti, …, Tn}, where Ti is the delay constraint of task Qi;
② the RSU set accessible to the vehicle, defined as r = {R1, …, Ri, …, Rn}; any unloading time slot of each RSU in r is denoted by g, g ∈ [Gap1, …, Gapi, …, Gapn]. Because the unloading rates of vehicle tasks differ between unloading time slots, the set of unloading rates of all unloading time slots in r is denoted rV = {R1G1V, …, RiGjV, …, RnGnV}, where RiGjV represents the transmission rate of the jth unloading time slot of the ith RSU;
③ the transmission delay of the vehicle-mounted task in each unloading time slot of each RSU in r, expressed as rD = {R1G1D, …, RiGjD, …, RnGnD}, where RiGjD represents the transmission delay of the vehicle-mounted task in the jth unloading time slot of the ith RSU;
④ the number of vehicle-mounted tasks connected to each RSU, rA = {R1A, …, RiA, …, RnA};
Step 4.2, build Markov motion space
A = {(a, b) | a ∈ [1, n] ∩ N+, b ∈ [1, n] ∩ N+}
Wherein the various parameters are specified below:
a represents an RSU accessed by a vehicle when unloading a vehicle-mounted task is executed;
b represents the unloading time slot of the RSU accessed by the vehicle when the unloading vehicle-mounted task is executed;
③ N+ represents the set of positive integers;
Step 4.3, establish the Markov reward function reward:
reward=(η)×base+(2(η)-1)×delay(rD,t)+access(rA)
wherein the various parameters are specified below:
(η) is a step function
(η) = 1 indicates that unloading of the vehicle-mounted task succeeded and (η) = 0 indicates that it failed; base is a constant representing the basic reward, so (η) × base means the basic reward is obtained when unloading succeeds and is not obtained when unloading fails;
Delay (rD, t) represents the reward or penalty gained for performing a vehicle unloading task
delay(rD)=Rward×(rD-t)
rD represents the time taken to unload the vehicle-mounted task and t its unloading time constraint; a reward is obtained when unloading completes within the constraint time t, otherwise a penalty is obtained; Rward is the reward or penalty value;
access(rA) is used to judge whether the current RSU can accept more vehicle-mounted tasks.
MAXrA represents the maximum number of vehicle-mounted tasks the current RSU can accept; when more can be accepted, i.e., rA ≤ MAXrA, access(rA) has no influence on the reward function reward; when rA > MAXrA, access(rA) causes reward to equal 0, i.e., no reward is obtained;
Step 4.4, solve the optimal unloading time slot using the DDPG-HER algorithm according to the Markov model of steps 4.1-4.3.
6. The software-defined vehicle-mounted task fine-grained unloading method based on deep reinforcement learning according to claim 5, wherein the specific implementation of the step 4.4 comprises the following steps:
step 4.4.1, establishing an Actor current network, an Actor target network, a criticic current network and a criticic target network, wherein the descriptions of the four networks are as follows:
The parameter of the Actor current network is θ; θ also refers to the neural network itself, responsible for updating θ and generating the current action A from the current state S; applying action A to state S produces the state S' and a reward R, obtained from the reward function reward;
the Actor target network has parameters theta ', theta' also refers to a neural network and is responsible for selecting an action A 'from the experience playback pool and updating theta';
the parameter of the Critic current network is omega, which also refers to a neural network and is responsible for calculating the current Q value, and the Q value is used for measuring the quality of the selection action;
the parameter of the Critic target network is omega ', also refers to a neural network and is responsible for calculating a target Q value, namely Q';
step 4.4.2, training an Actor current network, an Actor target network, a Critic current network and a Critic target network, and specifically comprises the following steps:
step 4.4.2.1, first obtaining an initialization state S, and the Actor current network generating action A according to the state S;
step 4.4.2.2, calculate the reward R according to state S and action a, and get the next state S';
step 4.4.2.3, storing { S, A, S' } in an experience playback pool;
step 4.4.2.4, recording the current state as S';
step 4.4.2.5, calculating the current Q value and the target Q value;
Step 4.4.2.6, updating the Critic current network parameter omega;
step 4.4.2.7, updating the current network parameters of the Actor;
step 4.4.2.8, if the current state S' is the termination state, the iteration is finished, otherwise go to step 4.4.2.2;
Step 4.4.3, the trained network calculates the optimal unloading time slot.
7. The deep reinforcement learning-based software-defined vehicle-mounted task fine-grained unloading method according to claim 1, further comprising step 5: deploying the algorithm to an SDN controller.
8. The deep reinforcement learning-based software-defined vehicle-mounted task fine-grained unloading method according to claim 7, wherein step 5 is specifically as follows:
after training of the DDPG-HER algorithm is completed, the Actor current network is saved and deployed to the SDN controller; when unloading is required, the SDN controller determines the optimal unloading time slot for the vehicle-mounted task according to the current state information of the network and the nodes.
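The deployment step can be sketched as follows. The JSON serialization, the stand-in weight list, the `controller_decide` helper and the per-slot state vectors are illustrative assumptions, not the patent's actual controller interface; only the trained Actor current network is persisted and reloaded on the controller side:

```python
import json
import os
import tempfile

# Stand-in for the trained Actor current network's weights.
trained_theta = [0.2, 0.5, 0.3]

# Persist the Actor current network after training completes.
path = os.path.join(tempfile.gettempdir(), "actor_current.json")
with open(path, "w") as f:
    json.dump(trained_theta, f)

def controller_decide(slot_states, weight_file):
    """SDN controller side: load the saved Actor, score the state vector of
    each candidate unloading time slot, and return the best slot index."""
    with open(weight_file) as f:
        theta = json.load(f)
    scores = [sum(w * x for w, x in zip(theta, st)) for st in slot_states]
    return max(range(len(scores)), key=scores.__getitem__)
```

Deploying only the Actor keeps inference on the controller lightweight: the Critic networks are needed during training but not for choosing a time slot at run time.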
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010571179.1A CN111866807B (en) | 2020-06-22 | 2020-06-22 | Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111866807A true CN111866807A (en) | 2020-10-30 |
CN111866807B CN111866807B (en) | 2022-10-28 |
Family
ID=72987863
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010571179.1A Active CN111866807B (en) | 2020-06-22 | 2020-06-22 | Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111866807B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109067842A (en) * | 2018-07-06 | 2018-12-21 | 电子科技大学 | Computing task offloading method for the Internet of Vehicles
CN109257429A (en) * | 2018-09-25 | 2019-01-22 | 南京大学 | Computation offloading scheduling method based on deep reinforcement learning
CN109756378A (en) * | 2019-01-12 | 2019-05-14 | 大连理工大学 | Intelligent computation offloading method for in-vehicle networks
CN110557769A (en) * | 2019-09-12 | 2019-12-10 | 南京邮电大学 | C-RAN computation offloading and resource allocation method based on deep reinforcement learning
CN110798842A (en) * | 2019-01-31 | 2020-02-14 | 湖北工业大学 | Heterogeneous cellular network traffic offloading method based on multi-user deep reinforcement learning
CN110891253A (en) * | 2019-10-14 | 2020-03-17 | 江苏大学 | Community popularity-based vehicle-mounted delay-tolerant network routing method
Non-Patent Citations (1)
Title |
---|
彭二帅 (PENG Ershuai): "Research and System Implementation of Optimal Control and Scheduling of Vehicle-mounted Edge Resources Based on Load Prediction", China Master's Theses Full-text Database * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112714178A (en) * | 2020-12-25 | 2021-04-27 | 北京信息科技大学 | Task unloading method and device based on vehicle-mounted edge calculation |
CN112714178B (en) * | 2020-12-25 | 2023-05-12 | 北京信息科技大学 | Task unloading method and device based on vehicle-mounted edge calculation |
CN113422795A (en) * | 2021-05-06 | 2021-09-21 | 江苏大学 | Vehicle-mounted edge task centralized scheduling and resource allocation joint optimization method based on deep reinforcement learning |
CN113645273A (en) * | 2021-07-06 | 2021-11-12 | 南京邮电大学 | Internet of vehicles task unloading method based on service priority |
CN114116047A (en) * | 2021-11-09 | 2022-03-01 | 吉林大学 | V2I unloading method for vehicle-mounted computation-intensive application based on reinforcement learning |
CN114116047B (en) * | 2021-11-09 | 2023-11-03 | 吉林大学 | V2I unloading method for vehicle-mounted computation intensive application based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN111866807B (en) | 2022-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111866807B (en) | Software definition vehicle-mounted task fine-grained unloading method based on deep reinforcement learning | |
CN110213796B (en) | Intelligent resource allocation method in Internet of vehicles | |
CN110427690A (en) | A kind of method and device generating ATO rate curve based on global particle swarm algorithm | |
CN114650567A (en) | Unmanned aerial vehicle-assisted V2I network task unloading method | |
CN113904948A (en) | 5G network bandwidth prediction system and method based on cross-layer multi-dimensional parameters | |
CN113687875A (en) | Vehicle task unloading method and device in Internet of vehicles | |
CN114374741A (en) | Dynamic grouping internet-of-vehicle caching method based on reinforcement learning under MEC environment | |
CN112598146A (en) | Method and device for determining parking position, electronic equipment and readable storage medium | |
CN115376031A (en) | Road unmanned aerial vehicle routing inspection data processing method based on federal adaptive learning | |
CN113422795B (en) | Vehicle-mounted edge task centralized scheduling and resource allocation joint optimization method based on deep reinforcement learning | |
CN114339842A (en) | Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster under time-varying scene based on deep reinforcement learning | |
Cui et al. | Model-free based automated trajectory optimization for UAVs toward data transmission | |
CN113709249A (en) | Safe balanced unloading method and system for driving assisting service | |
CN103906077A (en) | Road side unit placement method based on affinity propagation algorithm | |
CN116301038A (en) | Unmanned aerial vehicle power transmission line autonomous inspection method based on track optimization planning | |
CN114520991B (en) | Unmanned aerial vehicle cluster-based edge network self-adaptive deployment method | |
Yu et al. | Real-time holding control for transfer synchronization via robust multiagent reinforcement learning | |
CN115550357A (en) | Multi-agent multi-task cooperative unloading method | |
CN114169463A (en) | Autonomous prediction lane information model training method and device | |
CN108985658B (en) | Internet of vehicles collaborative downloading method based on fuzzy judgment and client expectation | |
US20230351205A1 (en) | Scheduling for federated learning | |
CN114328547A (en) | Vehicle-mounted high-definition map data source selection method and device | |
CN113194444B (en) | Communication computing resource optimization method, device, system and storage medium | |
CN113815647B (en) | Vehicle speed planning method, device, equipment and medium | |
CN115002725A (en) | Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||