CN114625504A - Internet of vehicles edge computing service migration method based on deep reinforcement learning - Google Patents

Internet of vehicles edge computing service migration method based on deep reinforcement learning Download PDF

Info

Publication number
CN114625504A
Authority
CN
China
Prior art keywords
vehicle
migration
vehicles
complexity
sample
Prior art date
Legal status
Pending
Application number
CN202210232318.7A
Other languages
Chinese (zh)
Inventor
肖春来
刘迪
赵洪祥
张德干
张捷
张婷
王法玉
陈洪涛
朴铭杰
高星江
李荭娜
李思强
Current Assignee
Huadian Heavy Machinery Co ltd
Original Assignee
Tianjin University of Technology
Priority date
Filing date
Publication date
Application filed by Tianjin University of Technology filed Critical Tianjin University of Technology
Priority to CN202210232318.7A priority Critical patent/CN114625504A/en
Publication of CN114625504A publication Critical patent/CN114625504A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4812 Task transfer initiation or dispatching by interrupt, e.g. masked
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Abstract

An Internet of Vehicles edge computing service migration method based on deep reinforcement learning is disclosed. Mobile Edge Computing (MEC) is one of the key technologies for reducing the network delay of vehicles; because vehicles move, the services they request have to be migrated frequently between different MEC servers to guarantee their strict quality-of-service requirements. However, because vehicle movement is uncertain, frequent migration adds cost and time delay, so designing a good migration method is very challenging. The method minimizes the completion time of service migration while satisfying the migration cost. Deep reinforcement learning is used to construct an improved deep deterministic policy gradient algorithm in the Internet of Vehicles to optimize the cost and time delay of vehicle task migration. Meanwhile, a centralized-training, distributed-execution method is used to solve the high-dimensionality problem that arises during vehicle task migration in the Internet of Vehicles.

Description

Internet of vehicles edge computing service migration method based on deep reinforcement learning
Technical Field
The invention belongs to the field of the Internet of Things, and particularly relates to an Internet of Vehicles edge computing service migration method based on deep reinforcement learning.
Background
Mobile Edge Computing (MEC) is a promising technology for accommodating the explosive growth of delay-sensitive and computation-intensive mobile applications such as augmented reality (AR), real-time video processing, and the Internet of Vehicles, whose requirements the limited resources of mobile devices can hardly meet; it has therefore attracted a great deal of research interest in academia and industry in recent years. Unlike traditional cloud computing, MEC deploys computing and storage resources at the network edge, close to mobile users. Because services are provided by a nearby edge server rather than a remote cloud, service response latency can be significantly reduced.
However, because user mobility is unpredictable, reducing transmission delay and maintaining a good user quality of experience (QoE) requires far more than deploying mobile applications and services at the network edge. When a user is far from the edge server on which its service is deployed, a large service response delay or even a service interruption may occur. To ensure the continuity of the service, an effective service migration policy is needed to decide when and where to migrate the service.
There is currently little work on distributed task migration in MEC. The traditional approach migrates tasks by predicting the user's location, but vehicle mobility in the Internet of Vehicles is hard to predict. Other methods apply the deep Q-network (DQN) to task migration; DQN handles complex state spaces well, but it cannot meet the task-migration requirements of multi-user edge computing, because as the number of users grows the dimensions of the system state space and behavior space grow exponentially. In the multi-user Internet of Vehicles scenario, combining the states of all vehicles into one global state makes the multi-user environment unstable and ignores the influence the vehicles have on one another. Designing an efficient migration policy that minimizes migration cost and time delay in such a multi-user distributed environment is therefore challenging. The invention accordingly proposes an adaptive-weight deep deterministic policy gradient algorithm for the multi-user Internet of Vehicles scenario and, on the basis of this algorithm, adopts a centralized offline-training, distributed-execution method to solve the task migration problem in the Internet of Vehicles.
Disclosure of Invention
The invention aims to solve the high-dimensionality problem that arises during vehicle task migration in the Internet of Vehicles, and provides an Internet of Vehicles edge computing service migration method based on deep reinforcement learning. The method studies the service migration problem of vehicles in the Internet of Vehicles in a dynamic environment and minimizes the completion time of service migration while satisfying the migration cost. Deep reinforcement learning is used to construct an improved deep deterministic policy gradient algorithm in the Internet of Vehicles to optimize the cost and time delay of vehicle task migration, and a centralized-training, distributed-execution method is used to solve the high-dimensionality problem during vehicle task migration in the Internet of Vehicles.
Technical scheme of the invention
An Internet of Vehicles edge computing service migration method based on deep reinforcement learning mainly comprises the following steps:
1, establishing a system model:
1.1, establishing a return delay model;
the system comprises M {1,2, 3., M } mobile edge servers, N {1,2, 3., N } mobile vehicles, wherein the mobile vehicles change from one time slot to the next time slot according to a Markov model, the invention considers that the length of each time slot of a time slot model T {1,2, 3., T } is epsilon, the time slot model is regarded as a continuous time sampling, the time intervals between the sampling are equal, and due to the mobility of the vehicles, the service must be migrated across the edge servers in order to ensure the continuity of the service. The virtualization of the container is utilized to manage the computing tasks in the edge server, so that the flexible scheduling of the vehicle computing tasks is realized,
Figure BDA0003538927640000031
indicating whether vehicle m is connected to mobile edge server n at time t,
Figure BDA0003538927640000032
service Task representing vehicle n at time tnWhether it is executed on top of the mobile edge server m;
due to the limited computing resources of the MEC servers, when the computing load of the local MEC server of the mobile vehicle is high, the computing tasks of the vehicle are transmitted to the MEC servers with less computing tasks nearby through the backhaul link, and the transmission delay between the MEC servers is calculated by using cn/CmWherein c isnRepresents the size of n input data of the vehicle, and CmThen the output link bandwidth of MEC edge server m is represented so the backhaul delay of the vehicle can be represented as
Figure BDA0003538927640000033
Figure BDA0003538927640000034
In the above formula, λ represents a positive coefficient, d (m)1,m2) Representing edge servers m1And edge server m2The number of hops in between.
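For readability, a plausible form of the backhaul delay that is consistent with the quantities defined above is given below; the exact expression appears only as an image in the original, so this reconstruction is an assumption:

\[ T_n^{\mathrm{back}}(t) = \lambda \, d(m_1, m_2) \, \frac{c_n}{C_m} \]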
1.2, establishing a communication delay model;
Good wireless-communication quality improves the efficiency of service migration, and that quality can be improved through spectrum resource management, so allocating an appropriate amount of spectrum resources to each vehicle is very important. S_m denotes the spectrum resources available to mobile edge server m, and all vehicles connected to m share these resources; the invention uses spe_{n,m}(t) to denote the spectrum proportion allocated by MEC server m to vehicle n at time t. Since the returned data is relatively small and negligible, the transmission delay of the returned result is not considered. According to Shannon's theorem, the data transmission rate between vehicle n and edge server m is expressed by a formula (image in the original) in which P_n is the transmission power of vehicle n, G_{n,m}(t) is the channel gain between vehicle n and edge server m at time t, and the remaining term is the white-noise power; the transmission delay of the input data is then expressed by a second formula (image in the original).
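A hedged reconstruction of the two formulas from the quantities named above (σ² is an assumed symbol for the white-noise power, and R_{n,m} and T^tran are assumed names for the rate and the input-transmission delay):

\[ R_{n,m}(t) = spe_{n,m}(t)\, S_m \log_2\!\left(1 + \frac{P_n\, G_{n,m}(t)}{\sigma^2}\right), \qquad T_{n,m}^{\mathrm{tran}}(t) = \frac{c_n}{R_{n,m}(t)} \]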
1.3, establishing a calculation delay model;
All vehicles within the coverage of an MEC server share its computing resources, which helps the vehicles handle their offloaded tasks. F_m denotes the computing capacity of MEC server m, and φ_n(t) denotes the CPU cycles required by Task_n at time t. The time Task_n requires to complete on MEC server m is therefore expressed by a formula (image in the original) whose remaining term indicates how many tasks are being executed on MEC server m. The formula shows that the execution delay of the MEC server increases in proportion to the number of executing tasks, so the computing resources of the target MEC server also need to be considered when performing service migration of the vehicle.
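A plausible form of the computation delay that matches the statement that execution delay grows in proportion to the number of executing tasks (K_m(t) is an assumed symbol for that number; the original gives the formula only as an image):

\[ T_{n,m}^{\mathrm{comp}}(t) = \frac{\phi_n(t)\, K_m(t)}{F_m} \]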
1.4, establishing a migration cost model;
To satisfy vehicle service continuity, service migration between multiple MEC servers is required, and migrating across servers incurs an additional migration cost. Assume vehicle n migrates all of its offloaded tasks from MEC server m1 to m2; the cost of vehicle n migrating Task_n from m1 to m2 at time t is expressed by a formula (image in the original) in which χ is a positive coefficient and |o_n| is the image size of vehicle n's offloaded task.
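A hedged reconstruction of the per-slot migration cost, consistent with the later statement that the cost is zero when the serving MEC server does not change and proportional to the task image size otherwise (the coefficient is written χ here and α in the problem description; whether the hop count also enters is unknown from the text):

\[ \mathrm{Cost}_n(t) = \begin{cases} \chi\, |o_n|, & \text{if the serving MEC server changes at } t \\ 0, & \text{otherwise} \end{cases} \]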
1.5, description of the problem;
For a moving vehicle n, the task completion time T_n includes the computation delay, the backhaul delay, and the communication delay (the sum is expressed by a formula given as an image). The total migration cost of vehicle n follows from the cost model: according to formula (5), the migration cost is 0 when the MEC server is not changed and α|o_n| otherwise, so summing over time gives the total migration cost of vehicle n (formula image). Migration decisions are made at each time period, and each period has a migration Cost budget Cost_b, so the migration cost budget of the entire system is expressed by summing the per-period budgets (formula image). On the premise of meeting the migration cost budget, the average delay of the system is minimized through learning, and the optimization formula is expressed by an equation given as an image in the original.
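Putting the pieces together, a hedged reconstruction of the optimization problem using only the quantities defined above (T_n(t) per-slot completion time, Cost_n(t) per-slot migration cost, Cost_b per-period budget) is:

\[ \min \; \frac{1}{N T} \sum_{t=1}^{T} \sum_{n=1}^{N} T_n(t) \quad \text{s.t.} \quad \sum_{n=1}^{N} \mathrm{Cost}_n(t) \le \mathrm{Cost}_b, \;\; \forall t \]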
2, adaptive-weight deep deterministic policy gradient algorithm:
2.1, improving the deep deterministic policy gradient algorithm;
The deep deterministic policy gradient algorithm adopts an experience replay mechanism, which satisfies the assumption that samples are independently distributed and allows fast convergence; however, it draws samples from the replay storage at random, ignoring the different importance of each sample, so the sampling efficiency is not high. A prioritized experience replay mechanism was proposed later, which evaluates the importance of a sample by the absolute value of its TD error, but a large TD error disturbs the training result. Samples of low complexity contribute little to the learning of the neural network, while training samples of high complexity are difficult for the neural network to understand in the early learning stage. Therefore, each state sample in the replay storage is assigned a priority weight, the sampling probabilities are set according to the assigned priority weights, and an adaptive-weight experience replay mechanism is proposed.
The complexity CF(s_i) of sample i mainly comprises an importance function RF(r_i, DE_i) of the sample's return value and a use-frequency function SUF(num_i) of the sample.
The importance of the sample return value is expressed as:
RF(r_i, DE_i) = |DE_i| * RW(r_i) + α (10)
In the above formula, DE_i = Q(s_i, a_i; θ_c) − (r_i + μQ′(s′_i, a′_i; θ_c′)) denotes the TD error, and Q(s_i, a_i; θ_c) is the value of the Critic component's evaluate-network. α denotes a small positive number, which prevents a sample from never being selected when the time difference is 0. RW(r_i) denotes the weight of the corresponding reward (formula given as an image in the original); for stability, r_i ∈ [−1, 1] and RW(r_i) > 0 are set.
To prevent overfitting, a function of the number of times a sample has been used is added: as the number of times a sample is used increases, the probability that it is selected next becomes lower. SUF(num_i) is expressed by a formula (image in the original) in which num_i denotes the number of times the replay-storage sample s_i has been used and p, q are constants greater than 0. The complexity function is then expressed by a formula (image in the original) containing a hyper-parameter that weights the two terms, and the sampling probability of a sample is computed from the sample complexity defined by the invention (formula image).
In that formula, Ψ ∈ [0, 1] denotes an exponential random factor: Ψ = 1 corresponds to priority sampling and Ψ = 0 to uniform sampling, so the factor balances priority sampling against uniform sampling and thereby prevents overfitting. Because directly sampling the replay storage would introduce a distribution error, the invention uses the importance sampling weight w_i (formula image) to correct this deviation and applies a normalization operation to reduce the TD error; in that formula, D denotes the replay storage capacity and β denotes the compensation coefficient.
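The following is a minimal Python sketch of the adaptive-weight experience replay described above. Since the patent gives the corresponding formulas only as images, the concrete forms of RW(r), SUF(num) and of their combination into CF(s_i), as well as all parameter values, are assumptions made for illustration; the sampling probability and importance-sampling weight follow the standard prioritized-replay pattern that the text describes.

```python
import numpy as np

class AdaptiveWeightReplay:
    def __init__(self, capacity, alpha_eps=0.01, p=1.0, q=1.0, psi=0.6, beta=0.4):
        self.capacity = capacity
        self.storage = []            # transitions, e.g. (s, a, r, s')
        self.complexity = []         # CF(s_i) for each stored sample
        self.num_used = []           # how many times each sample has been drawn
        self.alpha_eps = alpha_eps   # the small positive alpha of eq. (10)
        self.p, self.q = p, q        # constants of the use-frequency function
        self.psi = psi               # exponential random factor (1: priority, 0: uniform)
        self.beta = beta             # compensation coefficient of the IS weight

    def add(self, transition):
        if len(self.storage) >= self.capacity:           # drop the oldest sample when full
            self.storage.pop(0); self.complexity.pop(0); self.num_used.pop(0)
        self.complexity.append(max(self.complexity, default=1.0))  # new samples get max priority
        self.storage.append(transition)
        self.num_used.append(0)

    def sample(self, batch_size):
        cf = np.asarray(self.complexity) ** self.psi
        probs = cf / cf.sum()                             # adaptive sampling probability
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        w = (len(self.storage) * probs[idx]) ** (-self.beta)   # importance-sampling weights
        w = w / w.max()                                   # normalisation against the largest weight
        for i in idx:
            self.num_used[i] += 1
        return [self.storage[i] for i in idx], idx, w

    def update_complexity(self, idx, td_errors, rewards):
        for i, de, r in zip(idx, td_errors, rewards):
            rw = 1.0 + abs(r)                             # assumed reward weight RW(r) > 0
            rf = abs(de) * rw + self.alpha_eps            # importance of the return value, eq. (10)
            suf = 1.0 / (self.p + self.q * self.num_used[i])  # assumed use-frequency function
            self.complexity[i] = rf * suf                 # assumed combination into CF(s_i)
```

A buffer of this kind drops in wherever DDPG's uniform replay buffer would sit; only sample() and update_complexity() differ from the standard mechanism.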
2.2, establishing a self-adaptive weight deep learning framework;
The framework of centralized training and distributed execution is applied to the proposed AWDDPG (Adaptive-Weight Deep Deterministic Policy Gradient) algorithm. During the offline centralized training phase, the observation states and behaviors of the other vehicles are saved in the experience replay buffer in addition to the local observation state; combining the behaviors and observation states increases the number of training samples generated in each phase and also increases the cooperative communication between agents. When the network parameters (given as formula images in the original) are updated, the Actor component evaluates them against the samples acquired with the adaptive weights. After obtaining the global information, each moving vehicle learns its own state-behavior value function; meanwhile, after the behaviors of the other vehicles have been learned, they are held fixed during the offline training stage, which effectively handles the influence of the other vehicles' behaviors on the environment. In the decision phase, because the Actor only needs the local observation state (formula image), the vehicle can select an action without knowing the information of other vehicles.
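As a minimal sketch of the information flow implied by this framework (module names and tensor layouts are assumptions, not the patent's notation): the centralized critic consumes every vehicle's observation and behavior during offline training, while each vehicle's Actor acts on its local observation alone at execution time.

```python
# Hedged sketch of centralized training / distributed execution; the modules and
# tensor shapes are illustrative assumptions.
import torch

def centralized_critic_input(all_obs, all_acts):
    # During offline training the critic sees every vehicle's observation and behavior.
    return torch.cat(list(all_obs) + list(all_acts), dim=-1)

def distributed_action(actor_n, local_obs_n):
    # During execution, vehicle n selects its action from its local observation only.
    with torch.no_grad():
        return actor_n(local_obs_n)
```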
3, a computing service migration method based on deep reinforcement learning:
3.1, description of migration method steps;
First, the relevant parameters of the algorithm are input, including the batch size, replay storage size, discount factor, soft-update coefficient, exponent, hyper-parameters, and the use count and complexity of the samples; the parameters are initialized and data warm-up is performed. The following steps are then executed in a loop. The state is initialized; each vehicle selects and executes its action, receives the reward obtained after the action is executed, and obtains the next state. Then, for each vehicle in turn, the sample is stored in the replay storage and the related parameters are set; samples are adaptively selected according to the formulas above, and the time-difference error and importance sampling weight are computed; the weights are updated and the complexity is recalculated; the evaluate-network parameters of the Critic are updated by minimizing the loss function; the evaluate-network parameters of the Actor are updated by minimizing the policy objective equation; finally, the target-network parameters of the Critic and the Actor are updated. After the loop ends, the method ends.
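Below is a hedged Python sketch of this training loop, reusing the AdaptiveWeightReplay sketch given earlier; the environment interface (reset/step), the actor and critic modules, their constructor arguments, and all hyper-parameter values are assumptions made for illustration, and only the ordering of the steps follows the description above.

```python
import torch

def soft_update(target, source, tau):
    # Soft (Polyak) update of target-network parameters.
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)

def train_awddpg(env, actors, actor_targets, critic, critic_target, buffer,
                 episodes=500, steps=200, batch_size=64, gamma=0.95, tau=0.01):
    n = len(actors)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-4) for a in actors]
    for _ in range(episodes):
        obs = env.reset()                                      # list of per-vehicle observations
        for _ in range(steps):
            with torch.no_grad():
                acts = [a(o) for a, o in zip(actors, obs)]     # each vehicle acts on its local state
            next_obs, reward, done = env.step(acts)            # execute the migration actions
            buffer.add((torch.cat(obs), torch.cat(acts), float(reward), torch.cat(next_obs)))
            obs = next_obs
            if len(buffer.storage) >= batch_size:
                batch, idx, w = buffer.sample(batch_size)      # adaptive-weight sample selection
                s = torch.stack([b[0] for b in batch]);  a = torch.stack([b[1] for b in batch])
                r = torch.tensor([b[2] for b in batch]); s2 = torch.stack([b[3] for b in batch])
                with torch.no_grad():                          # target value from the target networks
                    a2 = torch.cat([at(o) for at, o in
                                    zip(actor_targets, torch.chunk(s2, n, dim=1))], dim=1)
                    y = r.unsqueeze(1) + gamma * critic_target(torch.cat([s2, a2], dim=1))
                td = y - critic(torch.cat([s, a], dim=1))      # time-difference error
                w = torch.as_tensor(w, dtype=torch.float32).unsqueeze(1)
                critic_loss = (w * td.pow(2)).mean()           # importance-weighted Critic loss
                critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
                buffer.update_complexity(idx, td.detach().squeeze(1).tolist(), r.tolist())
                for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
                    a_i = list(torch.chunk(a, n, dim=1))       # substitute vehicle i's fresh action
                    a_i[i] = actor(torch.chunk(s, n, dim=1)[i])
                    actor_loss = -critic(torch.cat([s, torch.cat(a_i, dim=1)], dim=1)).mean()
                    opt.zero_grad(); actor_loss.backward(); opt.step()
                soft_update(critic_target, critic, tau)        # target-network updates
                for at, ac in zip(actor_targets, actors):
                    soft_update(at, ac, tau)
            if done:
                break
```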
3.2, complexity analysis;
The number of moving vehicles N and the batch size K are the main factors determining the time complexity of adaptive-weight sampling, so the time complexity of adaptive-weight sampling is denoted O(NK). Since the time complexity of offline training is proportional to the size of the training data and the training time, only the time complexity of execution needs to be considered. The complexity of execution is mainly determined by the structure of the neural network, the size of the state space and the size of the action space; without considering the neural-network structure, the computational complexity is O(|A| × |S|), where |A| denotes the number of behaviors and |S| the number of states. After a DNN is added, the environment and parameter settings of the system have a great influence on the computational complexity and make it difficult to estimate, so the complexity of the AWDDPG algorithm can be expressed as O(NK + |A| × |S|).
The invention has the advantages and positive effects that:
Mobile edge computing is one of the key technologies for reducing the delay of vehicle networks, and because vehicles move, the services they request should be migrated frequently between different servers to guarantee their strict quality-of-service requirements. However, because vehicle movement is uncertain, frequent migration adds cost and time delay, so designing a good migration method is very challenging. The invention minimizes the completion time of service migration while satisfying the migration cost. Deep reinforcement learning is used to construct an improved deep deterministic policy gradient algorithm in the Internet of Vehicles to optimize the cost and time delay of vehicle task migration. Meanwhile, a centralized-training, distributed-execution method is used to solve the high-dimensionality problem during vehicle task migration in the Internet of Vehicles.
Drawings
FIG. 1 is a reward diagram for the AWDDPG and DDPG algorithms;
FIG. 2 is a graph of the loss function of the AWDDPG algorithm;
FIG. 3 is a graph of average completion times for different input data sizes;
FIG. 4 is a graph of average completion times for different numbers of vehicles;
FIG. 5 is a graph of average completion times for different numbers of mobile edge servers;
FIG. 6 is a graph of average completion times for different migration cost budgets;
FIG. 7 is a graph of average migration costs for different input data sizes;
FIG. 8 is a graph of the migration resource ratios for different numbers of vehicles;
FIG. 9 is a flowchart of a method for migrating the edge computing service of the Internet of vehicles based on deep reinforcement learning according to the present invention.
Detailed Description
Example 1:
In the experiments, a large number of simulations were carried out with Matlab 2018a to verify the performance of the AWDDPG distributed task migration algorithm in the Internet of Vehicles. The robustness of the algorithm under different parameters was tested through experiments, and the proposed algorithm was compared with other algorithms to prove its effectiveness.
Referring to fig. 9, the method for migrating the edge computing service in the internet of vehicles based on deep reinforcement learning mainly includes the following key steps:
1, establishing a system model:
1.1, establishing a return delay model;
The system comprises M = {1, 2, 3, ..., M} mobile edge servers and N = {1, 2, 3, ..., N} mobile vehicles, and each mobile vehicle changes from one time slot to the next according to a Markov model. The invention considers that each slot of the time-slot model T = {1, 2, 3, ..., T} has length ε, so the model can be regarded as sampling continuous time at equal intervals. Because of vehicle mobility, services have to be migrated across edge servers to ensure service continuity. The computing tasks in the edge server are managed through container virtualization, so flexible scheduling of the vehicle computing tasks is achieved. A binary indicator (formula shown as an image in the original) records whether vehicle n is connected to mobile edge server m at time t, and a second indicator (formula image) records whether the service Task_n of vehicle n is executed on mobile edge server m at time t.
Because the computing resources of the MEC server are limited, when the computing load of the mobile vehicle's local MEC server is high, the computing tasks of the vehicle can be transmitted over the backhaul link to a nearby MEC server with fewer computing tasks. The transmission delay between MEC servers can be taken as c_n/C_m, where c_n is the size of vehicle n's input data and C_m is the output link bandwidth of MEC edge server m. The backhaul delay of the vehicle can then be expressed by a formula (image in the original) in which λ is a positive coefficient and d(m1, m2) is the number of hops between edge server m1 and edge server m2.
1.2, establishing a communication delay model;
Good wireless-communication quality improves the efficiency of service migration, and that quality can be improved through spectrum resource management, so allocating an appropriate amount of spectrum resources to each vehicle is very important. S_m denotes the spectrum resources available to mobile edge server m, and all vehicles connected to m share these resources; the invention uses spe_{n,m}(t) to denote the spectrum proportion allocated by MEC server m to vehicle n at time t. Since the returned data is relatively small and negligible, the transmission delay of the returned result is not considered. According to Shannon's theorem, the data transmission rate between vehicle n and edge server m can be expressed by a formula (image in the original) in which P_n is the transmission power of vehicle n, G_{n,m}(t) is the channel gain between vehicle n and edge server m at time t, and the remaining term is the white-noise power; the transmission delay of the input data can then be expressed by a second formula (image in the original).
1.3, establishing a calculation delay model;
All vehicles within the coverage of the MEC server share its computing resources, which helps the vehicles handle their offloaded tasks. F_m denotes the computing capacity of MEC server m, and φ_n(t) denotes the CPU cycles required by Task_n at time t. Therefore, the time Task_n requires to complete on MEC server m can be expressed by a formula (image in the original) whose remaining term indicates how many tasks are executing on MEC server m. As the formula shows, the execution delay of the MEC server increases in proportion to the number of executing tasks, so the computing resources of the target MEC server also need to be considered when performing service migration of the vehicle.
1.4, establishing a migration cost model;
In order to satisfy the continuity of vehicle service, service migration between multiple MEC servers is required, and migrating across servers incurs an additional migration cost. Assume vehicle n migrates all of its offloaded tasks from MEC server m1 to m2; the cost of vehicle n migrating Task_n from m1 to m2 at time t is expressed by a formula (image in the original) in which χ is a positive coefficient and |o_n| is the image size of vehicle n's offloaded task.
1.5, description of the problem;
For a moving vehicle n, the task completion time T_n includes the computation, backhaul and communication delays, and can be expressed by a formula given as an image. The total migration cost of vehicle n follows from the cost model: according to formula (5), the migration cost is 0 when the MEC server is not changed and α|o_n| otherwise, so the total migration cost of vehicle n can be derived by summation (formula image). Migration decisions are made at each time period, and each period has a migration Cost budget Cost_b, so the migration cost budget of the entire system can be expressed by summing the per-period budgets (formula image). On the premise of meeting the migration cost budget, the average delay of the system can be minimized through learning, and the optimization formula can be expressed by an equation given as an image in the original.
2, adaptive-weight deep deterministic policy gradient algorithm:
2.1, improving the deep deterministic policy gradient algorithm;
The deep deterministic policy gradient algorithm adopts an experience replay mechanism, which satisfies the assumption that samples are independently distributed and allows fast convergence; however, it draws samples from the replay storage at random, ignoring the different importance of each sample, so the sampling efficiency is not high. A prioritized experience replay mechanism was proposed later, which evaluates the importance of a sample by the absolute value of its TD error, but a large TD error disturbs the training result. Samples of low complexity contribute little to the learning of the neural network, while training samples of high complexity are difficult for the neural network to understand in the early stage of learning. Therefore, each state sample in the replay storage is assigned a priority weight, the sampling probabilities are set according to the assigned priority weights, and an adaptive-weight experience replay mechanism is provided.
The complexity CF(s_i) of sample i mainly comprises an importance function RF(r_i, DE_i) of the sample's return value and a use-frequency function SUF(num_i) of the sample.
The importance of the sample return value is expressed as:
RF(r_i, DE_i) = |DE_i| * RW(r_i) + α (10)
In the above formula, DE_i = Q(s_i, a_i; θ_c) − (r_i + μQ′(s′_i, a′_i; θ_c′)) denotes the TD error, and Q(s_i, a_i; θ_c) is the value of the Critic component's evaluate-network. α denotes a small positive number, which can prevent a sample from never being selected when the time difference is 0. RW(r_i) denotes the weight of the corresponding reward (formula given as an image in the original); for stability, r_i ∈ [−1, 1] and RW(r_i) > 0 are set.
To prevent overfitting, a function of the number of times a sample has been used is added: as the number of times a sample is used increases, the probability that it is selected next becomes lower. SUF(num_i) can be expressed by a formula (image in the original) in which num_i denotes the number of times the replay-storage sample s_i has been used and p and q are constants greater than 0. The complexity function can then be expressed by a formula (image in the original) containing a hyper-parameter that weights the two terms, and the sampling probability of a sample can be determined by the sample complexity defined by the invention (formula image).
In that formula, Ψ ∈ [0, 1] denotes an exponential random factor: Ψ = 1 corresponds to priority sampling and Ψ = 0 to uniform sampling, so the exponential random factor ensures a balance between priority sampling and uniform sampling and thereby prevents overfitting. Because directly sampling the replay storage would introduce a distribution error, the invention uses the importance sampling weight w_i (formula image) to correct this deviation and applies a normalization operation to reduce the TD error; in that formula, D denotes the replay storage capacity and β denotes the compensation coefficient.
2.2, establishing a self-adaptive weight deep learning framework;
The framework of centralized training and distributed execution is applied to the proposed AWDDPG (Adaptive-Weight Deep Deterministic Policy Gradient) algorithm. During the offline centralized training phase, the observation states and behaviors of the other vehicles are saved in the experience replay cache in addition to the local observation state. Combining the behaviors and observation states increases the number of training samples generated in each phase and also increases the cooperative communication between agents. When the network parameters (given as formula images in the original) are updated, the Actor component evaluates them against the samples collected with the adaptive weights. Having obtained the global information, each moving vehicle can learn its own state-behavior value function. Meanwhile, once the behaviors of the other vehicles have been learned, they are held stationary during the offline training phase, so the influence of other vehicles' behaviors on the environment can be effectively handled. In the decision phase, because the Actor only needs the local observation state (formula image), the vehicle can select an action without knowing the information of other vehicles.
3, a computing service migration method based on deep reinforcement learning:
3.1, description of migration method steps;
First, the relevant parameters of the algorithm are input, such as the batch size, replay storage size, discount factor, soft-update coefficient, exponent, hyper-parameters, and the use count and complexity of the samples. The parameters are then initialized and data warm-up is performed. The following steps are then executed in a loop. The state is first initialized; each vehicle then selects its action, executes the corresponding action, receives the reward obtained after the action is executed, and obtains the next state. Next, the following steps are executed in turn for each vehicle: the sample is stored in the replay storage and the related parameters are set; samples are adaptively selected according to the formulas, and the time-difference error and importance sampling weight are calculated; the weights are then updated and the complexity is calculated; the evaluate-network parameters of the Critic are updated by minimizing the loss function; the evaluate-network parameters of the Actor are updated by minimizing the policy objective equation; finally, the target-network parameters of the Critic and the Actor are updated. After the loop is over, the method ends.
3.2, complexity analysis;
The number N of moving vehicles and the batch size K are the main factors determining the time complexity of adaptive-weight sampling, so the time complexity of adaptive-weight sampling can be expressed as O(NK). Since the time complexity of offline training is proportional to the size of the training data and the training time, only the time complexity of execution needs to be considered. The complexity of execution is mainly determined by the structure of the neural network, the size of the state space and the size of the action space. Without considering the neural-network structure, the computational complexity is O(|A| × |S|), where |A| denotes the number of behaviors and |S| the number of states. After a DNN is added, the environment and parameter settings of the system have a great influence on the computational complexity and make it difficult to estimate. Therefore, the complexity of the AWDDPG algorithm can be expressed as O(NK + |A| × |S|).
Simulation experiment:
Consider vehicles moving randomly within the coverage of multiple MEC servers, where each vehicle's trajectory follows a random walk model. Each vehicle has its own computation-intensive and delay-sensitive task, and this task is offloaded to an MEC server for execution. The invention uses the hold-out method to separate training data from validation data in a 4:1 ratio, and the two sets are completely independent. For each vehicle, its Critic component is given 4 fully connected hidden layers with [2048, 1024, 512, 256] neurons. For the Actor component, the invention deploys 2 fully connected hidden layers with neuron counts [1024, 512] and [512, 256], the output layer of the Actor component is activated by a tanh function, and the neurons of the other layers are activated with the ReLU function (a hedged code sketch of this architecture is given after the parameter tables). The specific experimental parameter settings are shown in Tables 1 and 2.
TABLE 1 Experimental parameters
(Table 1 is reproduced as an image in the original publication.)
TABLE 2 AWDDPG parameter settings
(Table 2 is reproduced as an image in the original publication.)
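A hedged PyTorch sketch of the per-vehicle networks described above follows. The observation and action dimensions are assumptions, and reading the Actor's "[1024, 512] and [512, 256]" as the (in, out) shapes of its two fully connected hidden layers is an interpretation of the text rather than a confirmed design.

```python
import torch.nn as nn

class CriticNet(nn.Module):
    # Four fully connected hidden layers with [2048, 1024, 512, 256] neurons,
    # ReLU activations, and a scalar Q-value output.
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 2048), nn.ReLU(),
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1))
    def forward(self, joint_obs_and_actions):
        return self.net(joint_obs_and_actions)

class ActorNet(nn.Module):
    # Tanh-activated output layer as stated in the text; the other layers use ReLU.
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 1024), nn.ReLU(),   # assumed input projection
            nn.Linear(1024, 512), nn.ReLU(),       # hidden layer 1: [1024, 512]
            nn.Linear(512, 256), nn.ReLU(),        # hidden layer 2: [512, 256]
            nn.Linear(256, act_dim), nn.Tanh())
    def forward(self, local_obs):
        return self.net(local_obs)
```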
Fig. 1 shows the convergence of the proposed AWDDPG algorithm, and Fig. 2 shows the rewards harvested by the AWDDPG and DDPG algorithms during the centralized training phase. Because the DDPG algorithm uses the standard, unimproved experience replay mechanism and ignores useful training samples, AWDDPG can choose better training samples in different training phases, which makes the algorithm converge faster. It can also be seen from Fig. 1 that, after a period of training, once Critic and Actor adjust the evaluate-network and target-network parameters to gradually approach the optimal policy, the AWDDPG algorithm converges in a short time and reaches a higher and more stable level. Fig. 2 shows the difference between the value estimated by the Critic and the actual reward value; it can be seen that the Critic's estimate gets closer to the true value as the number of iterations increases.
The performance of the proposed AWDDPG distributed task migration algorithm is verified by comparison with other algorithms. Other algorithms include DDPG, Extensive Service Migration (ESM), Always Migration (AM), counterfactual multi-agent (COMA), and Never Migration (NM).
First, the average completion time of the six algorithms is compared with the other variables fixed, for different input data sizes, different numbers of vehicles, and different numbers of MEC servers. As shown in Fig. 3, the average completion time increases as the size of the input data increases, mainly because a larger input task adds to the computation delay of the vehicle's offloaded task. The average completion time of the AWDDPG algorithm is the lowest of all the algorithms. The average completion times of the AM and NM algorithms are high: the AM algorithm performs service migration whenever the vehicle leaves the coverage of the edge server, which causes frequent service migration, and as the input data becomes larger the migration frequency also rises, so the average completion time gradually increases. The delay of AWDDPG is lower than that of the DDPG algorithm because the invention improves on DDPG. For the NM algorithm, once an edge server is initially selected, several vehicles may select the same MEC server for service and never migrate, so the server's resource utilization is low, which increases the average completion time of the system. The ESM algorithm targets a single-agent scenario and does not perform well in a multi-user scenario; its average completion time clearly increases as the input data size increases. COMA uses the Actor-Critic algorithm and adopts centralized training with distributed execution, but it ignores the experience replay mechanism.
The AWDDPG proposed by the invention adds an experience replay mechanism on the basis of COMA, which reduces the correlation between samples, and designs an adaptive-weight sampling method to increase sampling efficiency, which greatly improves the convergence speed and stability of the algorithm, so the average completion time of the AWDDPG algorithm is the lowest. The analysis of Fig. 4 is similar to that of Fig. 3. Fig. 5 shows that the average completion time of all algorithms decreases as the number of MEC servers increases, because the resources available to the vehicles also grow with the number of MEC servers; the average completion time of the proposed AWDDPG algorithm is again the lowest. As can be seen in Fig. 6, when the migration cost budget of each phase is shifted from low to high, five of the algorithms benefit: as the migration cost budget increases, the average completion time of the computing task decreases. Since the NM algorithm performs no service migration, its average completion time does not change. These experiments show that the AWDDPG algorithm performs better on the average completion time metric.
Next, the performance of the AWDDPG algorithm is verified through the average migration cost. Fig. 7 shows that the average migration cost of five of the algorithms increases as the size of the input data increases, so the migration cost is considered to be related mainly to the image size of the migrated data. Because the AM algorithm performs service migration every time, its migration cost grows in proportion to the size of the input data. The NM algorithm never migrates and therefore incurs no migration cost. Compared with the ESM, COMA, and DDPG algorithms, the AWDDPG algorithm proposed by the invention finds a better migration strategy, so its migration cost is the lowest.
Finally, the performance of the AWDDPG algorithm is verified through the migration resource occupancy ratio. As can be seen from Fig. 8, for different numbers of vehicles the migration resource occupancy ratios of the five algorithms each stabilize at a certain value. Because the AM algorithm performs service migration every time, its migration resource usage is the largest. The NM algorithm never migrates, so it has no migration resource occupancy. Compared with the ESM, COMA, and DDPG algorithms, the AWDDPG algorithm proposed by the invention finds a better migration strategy, so its migration resource occupancy ratio is the lowest.

Claims (10)

1. A method for migrating Internet of vehicles edge computing services based on deep reinforcement learning is characterized by comprising the following steps:
1, establishing a system model:
1.1, establishing a return delay model;
1.2, establishing a communication delay model;
1.3, establishing a calculation delay model;
1.4, establishing a migration cost model;
1.5, description of the problem;
2, adaptive-weight deep deterministic policy gradient algorithm:
2.1, improving the deep deterministic policy gradient algorithm;
2.2, establishing a self-adaptive weight deep learning framework;
and 3, a computing service migration method based on deep reinforcement learning:
3.1, description of migration method steps;
and 3.2, complexity analysis.
2. The Internet of Vehicles edge computing service migration method based on deep reinforcement learning according to claim 1, wherein the establishing of the backhaul delay model in step 1.1 is as follows: the system includes M = {1, 2, 3, ..., M} mobile edge servers and N = {1, 2, 3, ..., N} mobile vehicles, the mobile vehicles change from one time slot to the next according to a Markov model, the length of each time slot of the time-slot model T = {1, 2, 3, ..., T} is considered to be ε, the time-slot model is regarded as sampling continuous time at equal intervals, and container virtualization is used to manage the computing tasks in the edge servers so as to realize flexible scheduling of the vehicle computing tasks; a binary indicator (formula image) indicates whether vehicle n is connected to mobile edge server m at time t, and a second indicator (formula image) indicates whether the service Task_n of vehicle n is executed on mobile edge server m at time t; when the computing load of the local MEC server of the mobile vehicle is high, the computing tasks of the vehicle are transmitted through the backhaul link to a nearby MEC server with fewer computing tasks, the transmission delay between MEC servers is taken as c_n/C_m, where c_n denotes the size of vehicle n's input data and C_m denotes the output link bandwidth of MEC edge server m, and the backhaul delay of the vehicle is expressed by a formula (image in the original) in which λ denotes a positive coefficient and d(m1, m2) denotes the number of hops between edge server m1 and edge server m2.
3. The Internet of Vehicles edge computing service migration method based on deep reinforcement learning according to claim 1, wherein the method for establishing the communication delay model in step 1.2 is as follows: S_m denotes the spectrum resources available to mobile edge server m, all vehicles connected to m share the spectrum resources, and spe_{n,m}(t) denotes the spectrum proportion allocated by MEC server m to vehicle n at time t; since the returned data is small, the transmission delay of the returned result is not considered; according to Shannon's theorem, the data transmission rate between vehicle n and edge server m is expressed by a formula (image in the original) in which P_n denotes the transmission power of vehicle n, G_{n,m}(t) denotes the channel gain between vehicle n and edge server m at time t, and the remaining term denotes the white-noise power, and the transmission delay of the input data is expressed by a second formula (image in the original).
4. the deep reinforcement learning-based migration method for edge computing services in internet of vehicles according to claim 1, wherein the method for establishing the computation delay model in step 1.3 is that all vehicles within the coverage of the MEC server share the computing resources to assist the vehicles in handling the unloading tasks of the vehicles, FmUsed to represent the computing power of MEC server m, phin(t) denotes the Task at time tnRequired CPU cycles, therefore, TasknThe required time to complete on MEC server m is expressed as:
Figure FDA0003538927630000031
in the above formula
Figure FDA0003538927630000032
Indicating how many tasks are being executed on the MEC server m, as seen from the above equation, the execution delay of the MEC server increases in proportion to the number of executing tasks, so that the computing resources of the target MEC server also need to be considered when performing service migration of the vehicle.
5. The Internet of Vehicles edge computing service migration method based on deep reinforcement learning according to claim 1, wherein the method for establishing the migration cost model in step 1.4 is as follows: to satisfy the continuity of vehicle service, service migration needs to be performed between multiple MEC servers; assuming that vehicle n migrates all of its offloaded tasks from MEC server m1 to m2, the cost of vehicle n migrating Task_n from m1 to m2 at time t is expressed by a formula (image in the original) in which χ is a positive coefficient and |o_n| denotes the image size of vehicle n's offloaded task.
6. the deep reinforcement learning-based IOV edge computing service migration method according to claim 1, wherein the problem of step 1.5 is described as follows, for a moving vehicle n, the task completion time TnIncluding computation, backhaul and communication delays, expressed as:
Figure FDA0003538927630000036
Figure FDA0003538927630000037
representing the total migration cost of the vehicle n, according to the formula (5), the migration cost is 0 when the MEC server is not changed, otherwise the migration cost is α | onL, so get the total migration cost of vehicle n:
Figure FDA0003538927630000038
migration decisions are made at each time period, while a migration Cost budget Cost is made at each time periodbAnd therefore migration cost budget for the entire system
Figure FDA0003538927630000039
Expressed as:
Figure FDA0003538927630000041
on the premise of meeting the migration cost budget, the average delay of the system is minimized through learning, and the optimization formula is expressed as follows:
Figure FDA0003538927630000042
7. the method for migrating the edge computing services in the internet of vehicles based on deep reinforcement learning as claimed in claim 1, wherein the improved deep deterministic gradient algorithm in step 2.1 is that each state sample in the replay storage is assigned a priority weight, their sampling probability is set according to the assigned priority weight, an adaptive weight empirical replay mechanism is proposed,
complexity of sample i CF(s)i) Which mainly comprises the importance function RF (r) of the sample return valuei,DEi) And a use frequency function SUF (num) on the samplei),
The importance of the sample return value is expressed as:
RF(ri,DEi)=|DEi|*RW(ri)+α (10)
in the above formula, DEi=Q(si,ai;θc)-(ri+μQ'(s'i,a'i;θc') Denotes TD error,Q(si,ai;θc) Is the value of critical component evaluate-network, alpha represents a small positive number, preventing the situation of not being able to sample when the time difference is 0, RW (r)i) Representing the weight of the corresponding reward, r being set for stabilityi∈[-1,1]While RW (r)i)>0,
Figure FDA0003538927630000043
In order to prevent the over-fitting phenomenon, a function related to the number of times of using the sample is added, and as the number of times of using the sample increases, the probability that the sample is selected next becomes lower, SUF (num)i) Expressed as:
Figure FDA0003538927630000051
in the above formula, numiIndicating playback of stored samples siP, q are constants greater than 0, so the complexity function is expressed as:
Figure FDA0003538927630000052
in the above formula, the first and second carbon atoms are,
Figure FDA0003538927630000057
representing a hyper-parameter, calculating a sampling probability of a sample by a defined sample complexity:
Figure FDA0003538927630000053
in the above formula, [ phi ] E [0,1 ∈]Denotes an exponential random factor, Ψ ═ 1 denotes priority sampling, Ψ ═ 0 denotes uniform sampling, and the exponential random factor ensures a balance of priority sampling and uniform sampling, thereby preventing overfittingOccurrence of a phenomenon, since a distribution error occurs if samples in the playback memory are directly sampled, importance sampling weight w is usediTo correct for this deviation, a normalization operation is used to reduce the TD error,
Figure FDA0003538927630000054
in the above equation, D represents the playback storage capacity, and β represents the compensation coefficient.
8. The Internet of Vehicles edge computing service migration method based on deep reinforcement learning according to claim 1, wherein the establishing of the adaptive-weight deep learning architecture in step 2.2 is as follows: the framework of centralized training and distributed execution is applied to the proposed AWDDPG algorithm; in the offline centralized training phase, the observation states and behaviors of other vehicles are saved in the experience replay buffer in addition to the local observation state, and combining the behaviors and observation states increases the number of training samples generated in each phase while also increasing the cooperative communication between agents; when the network parameters (formula images in the original) are updated, the Actor component evaluates them against the samples acquired with the adaptive weights; after obtaining the global information, each moving vehicle learns its own state-behavior value function, and after the behaviors of other vehicles have been learned, each moving vehicle is held fixed in the offline training phase so as to effectively handle the influence of other vehicles' behaviors on the environment; in the decision phase, because the Actor only needs the local observation state (formula image), the vehicle can select an action without knowing the information of other vehicles.
9. The Internet of Vehicles edge computing service migration method based on deep reinforcement learning according to claim 1, wherein the migration method steps of step 3.1 are described as follows: first, the relevant parameters of the algorithm are input, including the batch size, replay storage size, discount factor, soft-update coefficient, exponent, hyper-parameters, and the use count and complexity of the samples; the parameters are initialized and data warm-up is performed; the following steps are then executed in a loop: the state is initialized, each vehicle selects and executes its action, receives the reward obtained after the action is executed and obtains the next state; then, for each vehicle in turn, the sample is stored in the replay storage and the related parameters are set, samples are adaptively selected according to the formula, the time-difference error and importance sampling weight are calculated, the weights are updated and the complexity is calculated, the evaluate-network parameter of the Critic is updated by minimizing the loss function, the evaluate-network parameter of the Actor is updated by minimizing the policy objective equation, and finally the target-network parameters of the Critic and the Actor are updated; after the loop ends, the method ends.
10. The Internet of Vehicles edge computing service migration method based on deep reinforcement learning according to claim 1, wherein the complexity analysis of step 3.2 is as follows: the number of moving vehicles N and the batch size K are the main factors determining the time complexity of adaptive-weight sampling, which is denoted O(NK); since the time complexity of offline training is proportional to the size of the training data and the training time, only the time complexity of execution needs to be considered; the complexity of execution is mainly determined by the structure of the neural network, the size of the state space and the size of the action space, and without considering the neural-network structure the computational complexity is O(|A| × |S|), where |A| denotes the number of behaviors and |S| denotes the number of states; after a DNN is added, the environment and parameter settings of the system have a great influence on the computational complexity and it is difficult to estimate, so the complexity of the AWDDPG algorithm is expressed as O(NK + |A| × |S|).
CN202210232318.7A 2022-03-09 2022-03-09 Internet of vehicles edge computing service migration method based on deep reinforcement learning Pending CN114625504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210232318.7A CN114625504A (en) 2022-03-09 2022-03-09 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210232318.7A CN114625504A (en) 2022-03-09 2022-03-09 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114625504A true CN114625504A (en) 2022-06-14

Family

ID=81899365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210232318.7A Pending CN114625504A (en) 2022-03-09 2022-03-09 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114625504A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134242A (en) * 2022-06-27 2022-09-30 天津理工大学 Vehicle-mounted computing task unloading method based on deep reinforcement learning strategy
CN115134242B (en) * 2022-06-27 2023-08-22 天津理工大学 Vehicle-mounted computing task unloading method based on deep reinforcement learning strategy
CN115550944A (en) * 2022-08-18 2022-12-30 重庆大学 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles
CN115550944B (en) * 2022-08-18 2024-02-27 重庆大学 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles
CN115934192A (en) * 2022-12-07 2023-04-07 江苏信息职业技术学院 B5G/6G network-oriented vehicle networking multi-type task cooperative unloading method
CN115934192B (en) * 2022-12-07 2024-03-26 江苏信息职业技术学院 B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method
CN116016514A (en) * 2022-12-28 2023-04-25 北京工业大学 Intelligent self-adaptive arrangement method for edge computing service
CN116016514B (en) * 2022-12-28 2024-04-19 北京工业大学 Intelligent self-adaptive arrangement method for edge computing service

Similar Documents

Publication Publication Date Title
CN114625504A (en) Internet of vehicles edge computing service migration method based on deep reinforcement learning
CN113573324B (en) Cooperative task unloading and resource allocation combined optimization method in industrial Internet of things
CN112380008B (en) Multi-user fine-grained task unloading scheduling method for mobile edge computing application
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
CN113568675A (en) Internet of vehicles edge calculation task unloading method based on layered reinforcement learning
CN114285853B (en) Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things
CN112272390B (en) Processing method and system for task unloading and bandwidth allocation based on physical layer
CN113364859B (en) MEC-oriented joint computing resource allocation and unloading decision optimization method in Internet of vehicles
CN113973113B (en) Distributed service migration method for mobile edge computing
CN111132074A (en) Multi-access edge computing unloading and frame time slot resource allocation method in Internet of vehicles environment
CN111970154B (en) Unloading decision and resource allocation method based on deep reinforcement learning and convex optimization
Fragkos et al. Artificial intelligence enabled distributed edge computing for Internet of Things applications
CN114205353B (en) Calculation unloading method based on hybrid action space reinforcement learning algorithm
CN116233926A (en) Task unloading and service cache joint optimization method based on mobile edge calculation
CN116489712B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN115134778A (en) Internet of vehicles calculation unloading method based on multi-user game and federal learning
Jeong et al. Deep reinforcement learning-based task offloading decision in the time varying channel
CN114449584A (en) Distributed computing unloading method and device based on deep reinforcement learning
CN112689296B (en) Edge calculation and cache method and system in heterogeneous IoT network
CN111930435B (en) Task unloading decision method based on PD-BPSO technology
Chu et al. Multiuser computing offload algorithm based on mobile edge computing in the internet of things environment
Hossain et al. Edge orchestration based computation peer offloading in MEC-enabled networks: a fuzzy logic approach
CN114942799B (en) Workflow scheduling method based on reinforcement learning in cloud edge environment
CN116137724A (en) Task unloading and resource allocation method based on mobile edge calculation
Xu et al. Decentralized multi-agent reinforcement learning for task offloading under uncertainty

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220713

Address after: 300409 No.8 Jingshun Road, Beichen Science Park, Beichen District, Tianjin

Applicant after: HUADIAN HEAVY MACHINERY Co.,Ltd.

Address before: 300384 No. 391 Binshui West Road, Xiqing District, Tianjin

Applicant before: TIANJIN University OF TECHNOLOGY