CN113543074A

CN113543074A - Joint computing migration and resource allocation method based on vehicle-road cloud cooperation

Info

Publication number: CN113543074A
Application number: CN202110659780.0A
Authority: CN
Inventors: 王书墨; 柴新越; 彭昱捷; 王合伟; 宋晓勤; 程梦倩; 陈权
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2021-06-15
Filing date: 2021-06-15
Publication date: 2021-10-22
Anticipated expiration: 2041-06-15
Also published as: CN113543074B

Abstract

The invention discloses a vehicle-road cloud cooperation-based joint computation migration and resource allocation method.A mobile edge device is deployed at the road side, an optimal vehicle user wireless access mode, channel allocation and transmitting power joint optimization strategy is obtained by utilizing a deep reinforcement learning optimization strategy, and a vehicle user minimizes system time delay by selecting a proper wireless access mode, transmitting power and channel under the condition of meeting the service quality requirement of a computation migration link. The method can effectively solve the problem of joint optimization of a vehicle selection wireless access mode, user channel allocation and power selection by using a depth certainty strategy gradient algorithm, and can stably perform in the optimization of a series of continuous action spaces.

Description

Joint computing migration and resource allocation method based on vehicle-road cloud cooperation

Technical Field

The invention relates to the technical field of resource allocation methods of the Internet of vehicles, in particular to a joint computing migration and resource allocation method based on vehicle road cloud cooperation.

Background

With the development of artificial intelligence, computationally intensive applications such as augmented reality, autopilot, speech recognition and natural language processing are emerging. Completing these applications based on strict delay constraints typically consumes a significant amount of computing resources. Therefore, the limited computing power of the mobile end (mobile phone, vehicle, etc.) is often difficult to meet the computing requirements of the application. In order for mobile terminals to also benefit from these applications, migrating computing tasks to computationally powerful computing nodes has become an effective way to resolve conflicts between resource intensive applications and resource constrained mobile terminals. Such as tasks that may be migrated from the vehicle to a roadside or remote cloud server. Existing migration schemes can be divided into three major categories, namely cloud computing assisted computing migration, edge/fog computing supported migration and cloud edge/fog end coordinated joint migration schemes.

With the rapid development of the internet of vehicles, the greatly increased data calculation amount of the vehicle-mounted application brings challenges to limited vehicle calculation resources, however, time delay is easily caused by remote deployment of the cloud server, and the requirements of the internet of vehicles on low time delay and high reliability cannot be met. The mobile edge computing provides computing service at a position close to the user, so that delay fluctuation caused by remote cloud computing is compensated, and the service quality of the user can be effectively improved. In the prior art, a cooperation method based on MEC and cloud computing is provided by a learner, and is used for automobile migration service in a vehicle-mounted network, jointly optimizing computation migration decision and computation resource allocation, and solving in a game theory mode.

However, the computing resources and storage capacity of the edge node are limited, and migrating the computing task to the roadside infrastructure for processing increases the burden of the communication link, which brings great challenges to the limited communication resources.

Therefore, the invention provides a combined computing migration and resource allocation method based on vehicle-road cloud cooperation, which aims at the scene of cooperative computing migration of mobile edge equipment and a central cloud server, takes the maximization of system time delay as an optimization target of computing migration and resource allocation, and simultaneously obtains good balance between complexity and performance.

Disclosure of Invention

The invention aims to solve the technical problem of the background technology and provides a joint computing migration resource allocation method based on vehicle-road cloud cooperation. The method can realize the combined calculation migration and resource allocation with minimized system time delay under the condition of ensuring that each calculation migration link meets the requirement of service quality.

The invention adopts the following technical scheme for solving the technical problems:

under the condition of considering the service quality requirement of each calculation migration link, the purpose of minimizing the system delay is achieved by reasonably and efficiently calculating migration and resource allocation. The method is characterized in that mobile edge computing is adopted, mobile edge equipment is deployed on the road side and cooperates with a central cloud server as two wireless access modes, and a binary migration scheme is adopted for computing tasks. And (3) adopting a distributed resource allocation method, regarding each vehicle link as an intelligent agent, and selecting a wireless access mode, a channel and transmission power based on the instantaneous state information. By establishing the Deep reinforcement learning model, the Deep reinforcement learning model is optimized by using a Deep Deterministic Policy Gradient algorithm (DDPG). And obtaining an optimal vehicle wireless access mode, user transmitting power and a channel allocation strategy according to the optimized deep reinforcement learning model. The invention is realized by the following technical scheme: a joint computing migration and resource allocation method based on vehicle-road cloud cooperation comprises the following steps:

the method comprises the following steps that (1) a vehicle-mounted network model based on vehicle-road-cloud cooperation is built, and a vehicle has a unique wireless access mode, namely the vehicle can be accessed to a cloud or a mobile edge device;

step (2) establishing a joint calculation migration and resource allocation model comprising N user migration J tasks;

step (3) acquiring resource set mu of peripheral mobile edge equipment during each vehicle migration task_mResource set σ of data center cloud server_cAnd task information t_n,j；

Step (4) a distributed resource allocation method is adopted, and a deep reinforcement learning model is constructed with the aim of minimizing system delay through reasonable and efficient calculation migration and resource allocation under the condition that the service quality requirement of each calculation migration link is considered;

step (5) considering the joint optimization problem in the continuous action space, and optimizing a deep reinforcement learning model by using a DDPG algorithm comprising three aspects of deep learning fitting, soft updating and playback mechanism;

and (6) obtaining an optimal vehicle user wireless access mode, transmitting power and a channel allocation strategy according to the optimized deep reinforcement learning model.

Further, a vehicle-mounted network model based on vehicle-road-cloud coordination is constructed in the step (1), and the vehicle has a unique wireless access mode, that is, the vehicle can access to a cloud or a mobile edge device, specifically as follows:

as shown in fig. 2, a vehicle-mounted network model based on vehicle-road-cloud coordination is constructed. In this model, the vehicle has a unique wireless access mode-the vehicle can access the cloud or mobile edge devices. The combination of cloud computing and mobile edge computing makes up for the problem of limited computing and storage capacity of mobile edge devices. The network model includes macrocells, roadside units (RSUs), and vehicles equipped with computing servers for performing compute-intensive tasks.

1) Macro-cell: the data center cloud server is equipped, so that the computing power and the storage resource are strong. But it is managed by the mobile operator, extra charges are charged through the macrocell migration message, and its bandwidth is becoming increasingly saturated.

2) Roadside units: which is installed at the roadside and may provide wireless communication to connect vehicles. The MEC server may be deployed in the RSU, has a certain computing capability and storage capability, and may provide services for various heterogeneous network devices and locally process migration data.

3) Vehicle: with the development of sensors and communication technologies, nearby vehicles can communicate in an end-to-end manner. Vehicles are primarily used to collect data information, such as traffic jams, traffic accidents, or road damage, and then package the information into messages, sending application requests to a server to process and manage the data.

The vehicle may upload the generated message through the macrocell or the RSU. The delay incurred by forwarding the message through the macro cell is almost negligible but adds additional cost, such as some fees charged by the mobile operator. The message is passed to the appropriate RSU. Although the message migration of the RSU is free, it causes additional delay, and the computational resources of the RSU may not be sufficient to meet the needs of the vehicle.

Further, the step (2) of establishing a communication model and a calculation model including N user migration J tasks, and further establishing a joint calculation migration and resource allocation model includes the following specific steps:

and (2.1) establishing a communication model, wherein two radio accesses of macro cells and RoadSide Units (RSUs) have different frequencies. Suppose that M RSUs are installed at two sides of a road, use

The available bandwidth of the RSU is divided into I channels, denoted by I ═ 1,2

And (4) showing. There are N users, J, which need to be migrated, denoted by N ═ 1, 2.

By P_m，nThe SINR of the nth user to the mth edge device is calculated to obtain the SINR of the nth user served by the mth edge device:

wherein h is_m，nRepresents the subchannel gain, δ, from the nth user to the mth edge device²Representing the noise power, p_n[i]E {0, 1} indicates whether the ith channel is used by the nth user, p_n[i]1 denotes that user n uses the i channel, ρ_n[i]0 means that user n does not use the i channel.

The transmission rate of the nth user to the mth edge device can be expressed as:

R_m，n＝ω_m，n log₂(1+Г_m，n)

ω_m，nis the bandwidth occupied by the edge device m serving the user n, and the total achievable transmission rate is the sum of the transmission rates of all concurrent transmission links.

Similarly, if the nth user directly transmits information to the macrocell via the k channel, the transmission rate can be expressed as:

wherein ω is_k，nBandwidth and signal-to-noise ratio occupied by n users transmitting information to the macrocellular network through k channels, b_k，nE {0, 1} indicates whether the kth channel is used by the nth user;

step (2.2) of establishing a calculation model, wherein each MEC has a resource set mu_m＝{Q_m，R_m，f_m，E_mIn which Q_mRepresenting the maximum computational resource, R, of the MEC server_mRepresenting the maximum available storage capacity of the MEC server, f_mRepresenting the computing power of the MEC server, E_mRepresenting the energy consumed by the MEC server CPU each week it runs.

For a data center cloud server, there is σ_c＝{f_c，E_c}，f_cRepresenting the computing power of a central cloud server, E_cRepresenting the energy consumed by the central cloud server CPU each revolution. The central cloud server has enough storage resources and computing resources to ensure that the tasks can be normally executed.

For a user in the macro cell coverage, its task can be represented as/_n，j＝{d_n，j，t_n，j，c_n，j，qn_，j，n∈N，j∈J}，d_n，jIndicates the size, t, of the task j to be migrated by user n_n，jRepresenting the maximum tolerable delay of user n for task j, c_n，jShowing the number of CPU cycles required for user n to complete task j. q. q.s_n，jRepresenting the computing resources needed by user n to compute task j.

The uploading time delay of the nth user for uploading the jth task to the mth edge server is as follows:

the computing time delay of the mth edge server for processing the jth task is

The uploading time delay of the jth task uploaded to the data center cloud server by the nth user through the k channel is

Because the data center cloud server has powerful computing power, the time for processing tasks on the data center cloud is negligible. After the task is processed on the server, the final calculation result is returned to the user, and many researches show that the data volume of the downloaded final calculation result is very small compared with the uploaded data volume, so that the time delay of issuing can be ignored;

step (2.3), establishing optimization problem expression of joint calculation migration and resource allocation

The goal is to minimize the total system delay by combining how to select the server, subchannel and power allocation to access for each task of the user, subject to computational resource constraints, limited transmission power and quality of service requirements at the user's receiving end.

For any task t_n，jTime delay of

In order to achieve a delay in the transmission,

to calculate the time delay. Therefore, the system migration and the system time delay for computing J tasks of N users are as follows:

the specific optimization problem is as follows:

the objective function of which represents the time delay of the whole system. Constraint C1 indicates that n users can only select one access mode to access to the mobile edge device or the cloud device when performing the j-th computing task they generate. Constraint C2 represents that the edge device has limited computational resources and that the amount of computation allocated to the edge device cannot exceed its computational resources. Constraint C3 represents the requirement of satisfying the qos requirement regardless of whether edge device migration or central cloud server migration is selected, and the qos requirement is expressed by a minimum signal-to-noise ratio. Constraint C4 represents that the allocated transmission power for each user uploading a task is limited by the maximum transmission power that the user can reach.

Further step (3) of acquiring all the mobile edge device storage and calculation resource occupation conditions and all the task information during each vehicle migration task, specifically including the resource set mu of each MEC_mResource set σ of data center cloud server_cInformation t about the task to be performed_n，j

μ_m＝{Q_m，R_m，f_m，E_mIn which Q_mRepresenting the maximum computational resource, R, of the MEC server_mRepresenting the maximum available storage capacity of the MEC server, f_mRepresenting the computing power of the MEC server, E_mRepresenting the energy consumed by the MEC server CPU each week it runs.

σ_c＝{f_c，E_c}，f_cRepresenting the computing power of a central cloud server, E_cRepresenting the energy consumed by the central cloud server CPU each revolution. The central cloud server has enough storage resources and computing resources to ensure that the tasks can be normally executed.

t_n，j＝{d_n，j，t_n，j，c_n，j，q_n，j，n∈N，j∈J}，d_n，jIndicates the size, t, of the task j to be migrated by user n_n，jRepresenting the maximum tolerable delay of user n for task j, c_n，jRepresenting the number of CPU cycles required for user n to complete task j. q. q.s_n，jRepresenting the computing resources needed by user n to compute task j.

Further, the step (4) of constructing a deep reinforcement learning model with the goal of minimizing system delay through reasonable and efficient computation migration and resource allocation by using a distributed resource allocation method under the condition of considering the service quality requirement of each computation migration link includes the following specific steps:

step (4.1), in particular defining the state space S as information relating to the computation migration and resource allocation, comprising the set of resources μ for each MEC_mResource set σ of data center cloud server_cInformation t about the task to be performed_n，jI.e. by

s_t＝{μ_m，σ_c，t_n，j}

Regarding each vehicle as an agent, each vehicle is based on the current state s_tSelecting a wireless access mode, a sub-channel and transmission power by the E.S;

step (4.2), defining action space A as wireless access mode, transmitting power and selected channel, and expressing as

Wireless access mode λ: each vehicle has M +1 task migration decisions, let lambda_n＝{0，1，2，..，M}，λ_nWhere M denotes that vehicle n chooses to migrate its computational tasks to an edge server executing on RSU M, and λ_nIf the vehicle n is determined to migrate the computing task to the data center cloud server to be executed, the vehicle n is 0;

and a sub-channel C: selecting an MEC for task migration, wherein I sub-channels can be selected under the available bandwidth of an RSU, and I is {1, 2.., I }; selecting a data center cloud server for task migration, wherein K sub-channels can be selected under the available bandwidth of the macro cell

Transmission power P: the vehicle n selects different transmitting power when uploading the j task, but is limited by the maximum transmitting power;

and (4.3) defining a reward function R, wherein the goal of joint calculation migration and resource allocation is to select a wireless access mode, a sub-channel and transmission power by a vehicle, and the goal of minimizing system delay by reasonably and efficiently calculating migration and resource allocation is achieved under the condition of meeting the service quality requirement of each calculation migration link. The reward function can thus be expressed as:

in order to obtain a good return for a long period of time, both the pre-ocular return and the future return should be considered. Thus, the main goal of reinforcement learning is to find a strategy to maximize the expected cumulative discount return,

wherein β ∈ [0,1] is a discount factor;

step (4.4), according to the established S, A and R, a deep reinforcement learning model is established on the basis of Q learning, and the function Q (S) is evaluated_t，a_t) Represents the slave state s_tPerforming action a_tThe resulting discount reward, the Q-value update function is:

wherein r is_tIs an instant reward function, gamma is a discount factor, s_tMobile edge server, cloud server and existing task state information, s, for vehicle acquisition at time t_t+1Indicating that the vehicle is performing a_tIn the latter state, A is action a_tThe formed motion space.

Further, the step (5) of considering the joint optimization problem in the continuous motion space and optimizing the deep reinforcement learning model by using the DDPG algorithm including three aspects of deep learning fitting, soft updating and playback mechanism includes the following specific steps:

step (5.1), initializing the number P of training rounds;

step (5.2), initializing a time step t in the P round;

step (5.3), the online Actor policy network inputsState s_tOutput action a_tAnd obtain an instant prize r_tWhile going to the next state s_t+1Thereby obtaining training data(s)_t，a_t，r_t，s_t+1)；

Step (5.4), training data(s)_t，a_t，r_t，s_t+1) Storing the experience in an experience playback pool;

step (5.5), randomly sampling m training data(s) from the experience replay pool_t，a_t，r_t，s_t+1) Forming a data set, and sending the data set to an online Actor policy network, an online Critic evaluation network, a target Actor policy network and a target Critic evaluation network;

step (5.6), defining loss function of online Critic evaluation network

Updating all parameters theta of the Critic current network through gradient back propagation of the neural network;

step (5.7), defining sampling strategy gradient of on-line Actor strategy network

Updating all parameters delta of the current network of the Actor through the gradient back propagation of the neural network;

step (5.8), the online training times reach the target network updating frequency, and the target network parameters delta 'and theta' are respectively updated according to the online network parameters delta and theta;

step (5.9), judging whether t is less than K, wherein K is the total time step in the p round, if yes, t is t +1, entering step 5c, and otherwise, entering step 5 j;

and (5.10) judging whether p is less than I, setting a threshold value for the training round number by I, entering the step 5b if p is p +1, and otherwise, finishing the optimization to obtain the optimized deep reinforcement learning model.

Further, the step (6) of obtaining an optimal vehicle user wireless access mode, transmission power and channel allocation strategy according to the optimized deep reinforcement learning model includes the following specific steps:

step (6.1), utilizing the deep reinforcement learning model trained by the DDPG algorithm to input the state information s of the system at a certain moment_t；

Step (6.2) of outputting the optimal action strategy

Obtaining the optimal vehicle wireless access mode

Allocation channel

And user transmit power

Compared with the prior art, the invention adopting the technical scheme has the following technical effects:

1. the mobile edge equipment is deployed at the road side, and an optimal vehicle user wireless access mode, channel allocation and transmitting power combined optimization strategy is obtained by utilizing a deep reinforcement learning optimization strategy;

2. the vehicle user minimizes the system time delay by selecting a proper wireless access mode, transmitting power and distributing channels under the constraint of meeting the link service quality;

3. the invention uses DDPG algorithm to effectively solve the joint optimization problem of wireless access mode, channel allocation and power selection of vehicle-road cloud cooperation, and can stably represent in the optimization of a series of continuous action spaces;

4. under the conditions of ensuring reasonable resource allocation, calculating the service quality requirement of a migration link and low calculation complexity, the combined calculation migration and resource allocation method based on vehicle-road cloud cooperation provided by the invention is superior in the aspect of minimizing system time delay.

Drawings

Fig. 1 is a flowchart of a joint computing migration and resource allocation method based on vehicle-road cloud coordination.

FIG. 2 is a schematic diagram of a joint computing migration and resource allocation model based on vehicle-road cloud cooperation.

Fig. 3 is a schematic diagram of edge computing and cloud computing collaboration.

Detailed Description

The technical scheme of the invention is further explained in detail by combining the attached drawings:

the invention discloses a joint computing migration and resource allocation method based on vehicle-road cloud cooperation, which has the core idea that: the mobile edge equipment is deployed on the road side, each vehicle is regarded as an intelligent agent by adopting a distributed resource allocation method, and the deep reinforcement learning model is optimized by establishing the deep reinforcement learning model and utilizing a DDPG algorithm. And obtaining an optimal vehicle wireless access mode, user transmitting power and a channel allocation strategy according to the optimized deep reinforcement learning model.

The present invention is described in further detail below.

the vehicle-road-cloud cooperation-based vehicle-mounted network model is constructed in the step (1), the vehicle has a unique wireless access mode, namely the vehicle can access the cloud or the mobile edge device, and the method specifically comprises the following steps:

Step (2) establishing a communication model and a calculation model comprising n user migration J tasks, and further establishing a joint calculation migration and resource allocation model;

the establishment of the communication model and the calculation model including the migration of J tasks by the N users and the establishment of the joint calculation migration and resource allocation model comprises the following specific steps:

and (2.1) establishing a communication model, wherein the two radio accesses of the macro cell and the RSU have different frequencies. Suppose that M RSUs are installed at two sides of a road, use

And (4) showing. J computing tasks of N users need to be migrated and are respectively usedN ═ 1, 2., N } and J ═ 1, 2., J } represent.

By P_m，nThe transmitting power from the nth user to the mth edge device is shown, and the signal-to-noise ratio SINR of the nth user served by the mth edge device can be obtained by calculation

R_m，n＝ω_m，n log₂(1+Г_m，n) (2)

wherein ω is_k，nBandwidth and signal-to-noise ratio occupied by n users transmitting information to the macrocellular network through k channels, b_k，nE {0, 1} indicates whether the kth channel is used by the nth user.

Step (2.2), establishing a calculation model, wherein each MEC has a resource set mu_m＝{Q_m，R_m，f_m，E_mIn which Q_mRepresenting the maximum computational resource, R, of the MEC server_mRepresenting the maximum available storage capacity of an MEC server，f_mRepresenting the computing power of the MEC server, E_mRepresenting the energy consumed by the MEC server CPU each week it runs.

For a user in the macro cell coverage, its task can be represented as/_n，j＝{d_n，j，t_n，j，c_n，j，q_n，j，n∈N，j∈J}，d_n，jIndicates the size, t, of the task j to be migrated by user n_n，jRepresenting the maximum tolerable delay of user n for task j, c_n，jRepresenting the number of CPU cycles required for user n to complete task j. q. q.s_n，jRepresenting the computing resources needed by user n to compute task j

the computing time delay for the mth edge server to process the jth task is as follows:

the uploading time delay of the jth task uploaded to the data center cloud server by the nth user through the k channel is as follows:

because the data center cloud server has powerful computing power, the time for processing tasks on the data center cloud is negligible. After the task is processed on the server, the final calculation result is returned to the user, and many researches show that the data volume of the downloaded final calculation result is very small compared with the data volume of the uploaded final calculation result, so that the time delay of the issuing can be ignored.

The goal is to minimize the total system delay by combining how to select the server, subchannel and power allocation to access for each task of the user, subject to computational resource constraints, transmission power limitations and QoS requirements at the user's receiving end.

For any task t_n，jTime delay of

The time delay of the transmission is delayed,

the specific optimization problem is as follows:

Step (3) acquiring resource set mu of peripheral mobile edge equipment during each vehicle migration task_mResource set σ of data center cloud server_cAnd task information t_n，j；

Acquiring the occupation situation of all mobile edge devices and computing resources and all task information during the task migration of each vehicle, wherein the occupation situation specifically comprises a resource set mu of each MEC_mResource set σ of data center cloud server_cInformation t about the task to be performed_n，j

The distributed resource allocation method adopted in the step (4) aims to construct a deep reinforcement learning model by reasonably and efficiently calculating migration and resource allocation to minimize system delay under the condition of considering the service quality requirement of each calculation migration link, and comprises the following specific steps:

step (ii) of(4.1) defining the State space S specifically as information related to the computation migration and resource allocation, including the resource set μ of each MEC_mResource set σ of data center cloud server_cInformation t about the task to be performed_n，jI.e. by

s_t＝{μ_m，σ_c，t_n，j} (9)

Wireless access mode λ: each vehicle has M +1 task migration decisions, let lambda_n＝{0，1，2，..，M}，λ_nWhere M denotes that vehicle n chooses to migrate its computational tasks to an edge server executing on RSU M, and λ_nAnd 0, the vehicle n decides to migrate the computing task to the data center cloud server for execution.

And a sub-channel C: selecting an MEC for task migration, wherein I sub-channels can be selected under the available bandwidth of an RSU, and I is {1, 2.., I }; selecting a data center cloud server for task migration, wherein M sub-channels can be selected under the available bandwidth of a macro cell

Transmission power P: the vehicle n selects a different transmit power when uploading the j task, but is limited by its maximum transmit power.

wherein β ∈ [0,1] is a discount factor;

Considering the joint optimization problem in the continuous motion space in the step (5), and optimizing a deep reinforcement learning model by using a DDPG algorithm comprising deep learning fitting, soft updating and a playback mechanism;

deep learning fitting refers to the DDPG algorithm fitting a deterministic strategy a ═ μ (s | θ) and an action value function Q (s, a | δ) using deep neural networks with parameters θ and δ, respectively, based on the Actor-critical framework.

The soft update means that the parameters of the action value network are frequently updated in a gradient manner and are used for calculating the gradient of the policy network, so that the learning process of the action value network is likely to be unstable, and therefore, the network is updated by adopting a soft update manner.

Respectively creating an online network and a target network for the strategy network and the action value network:

the network is continuously updated by gradient descent in the training process, and the updating mode of the target network is as follows

θ′＝τθ+(1-τ)θ (13)

δ′＝τδ+(1-τ)δ (14)

The empirical playback mechanism means that the sample data of state transition generated when the sample data interacts with the environment has time sequence relevance, and the deviation of action value function fitting is easily caused. Therefore, by using an experience playback mechanism of the deep Q learning algorithm, the collected samples are firstly put into a sample pool, and then a small batch of samples are randomly selected from the sample pool to be used for training the network. The processing removes the correlation and the dependency among samples, solves the problems of the correlation among data and the non-static distribution of the correlation among data, and enables the algorithm to be easier to converge. The DDPG algorithm which comprises three aspects of deep learning fitting, soft updating and playback mechanism and is used for optimizing the deep reinforcement learning model in the step (5) comprises the following steps:

step (5.1), initializing the number P of training rounds;

step (5.2), initializing a time step t in the P round;

step (5.3), the online Actor strategy network inputs the state s according to the input state_tOutput action a_tAnd obtain an instant prize r_tWhile going to the next state s_t+1Thereby obtaining training data(s)_t，a_t，r_t，s_t+1)；

step (5.6), set Q estimate to

y_i＝r_i+γQ′(S_i+1，μ′(s_i+1|θ′)|δ′) (15)

Defining the loss function of an online Critic evaluation network as

step (5.7), defining the given sampling strategy gradient of the online Actor strategy network as

step (5.8), the online training frequency reaches the target network updating frequency, and the target network parameters delta 'and theta' are respectively updated according to the online network parameters delta and theta;

And (6) obtaining an optimal vehicle user wireless access mode, transmitting power and channel allocation strategy according to the optimized deep reinforcement learning model, and comprising the following steps:

and (6.1). The deep reinforcement learning model trained by the DDPG algorithm is utilized to input the state information s of the system at a certain moment_k(t)；

Step (6.2) of outputting the optimal action strategy

Obtaining the optimal vehicle wireless access mode

Allocation channel

And user transmit power

In fig. 1, a flow of a joint computing migration and resource allocation method based on vehicle-road cloud cooperation is described, where a mobile edge device is deployed on the road side, cooperates with a cloud server for computing, and obtains an optimal user wireless access mode, channel allocation and transmission power joint optimization strategy by using a DDPG optimization deep reinforcement learning model.

In fig. 2, a joint computation migration and resource allocation model based on DDPG is depicted, with mobile edge devices deployed at the roadside.

In fig. 3, a scenario in which a mobile edge device cooperates with a cloud server is described. Cloud computing and mobile edge cloud computing should be coordinated and supplemented with each other, so that mobile user requirements can be better met.

Based on the description of the present invention, it should be apparent to those skilled in the art that the joint computation migration and resource allocation method based on vehicle-road cloud coordination of the present invention can reduce the system delay and ensure the system performance.

The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only illustrative of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A joint computing migration and resource allocation method based on vehicle-road cloud cooperation is characterized by comprising the following steps:

2. The joint computing migration and resource allocation method based on vehicle-road cloud coordination according to claim 1, wherein the step (4) comprises the following specific steps:

step (4.1), in particular defining the state space S as information relating to the computation migration and resource allocation, comprising the set of resources μ for each MEC_mResource set σ of data center cloud server_cInformation t about the task to be performed_n,jNamely:

s_t＝{μ_m,σ_c,t_n,j}

μ_m＝{Q_m,R_m,f_m,E_min which Q_mRepresenting the maximum computational resource, R, of the MEC server_mRepresenting the maximum available storage capacity of the MEC server, f_mRepresenting the computing power of the MEC server, E_mRepresenting the energy consumed by the CPU of the MEC server for each operation week;

σ_c＝{f_c,E_c}，f_crepresenting the computing power of a central cloud server, E_cRepresenting the energy consumed by the CPU of the central cloud server for each operation week; the central cloud server has enough storage resources and computing resources to ensure that tasks can be normally executed;

t_n,j＝{d_n,j,t_n,j,c_n,j,q_n,j,n∈N,j∈J}，d_n,jindicates the size, t, of the task j to be migrated by user n_n,jRepresenting the maximum tolerable delay of user n for task j, c_n,jShowing the number of CPU cycles required for user n to complete task j; q. q.s_n,jRepresenting the computing resources needed by the user n to compute the task j;

step (4.2), defining the action space A as a wireless access mode, transmitting power and a selected channel, and expressing as follows:

wireless access mode λ: each vehicle has M +1 task migration decisions, let lambda_n＝{0，1,2,...,M}，λ_nThat is, M denotes that vehicle n chooses to migrate its computational tasks to an edge server on roadside unit M for execution, and λ_nIf the vehicle n is determined to migrate the computing task to the data center cloud server to be executed, the vehicle n is 0;

and a sub-channel C: selecting an MEC for task migration, wherein I sub-channels can be selected under the available bandwidth of a roadside unit, and I is {1, 2. Selecting a data center cloud server for task migration, wherein K sub-channels can be selected from K {1, 2.., K } under the available bandwidth of a macro cell;

transmission power P: the vehicle n selects different transmitting power when uploading the j task, but is limited by the set maximum transmitting power;

step (4.3), a reward function R is defined, the goal of joint calculation migration and resource allocation is that a vehicle selects a wireless access mode, a sub-channel and transmission power, and under the condition of meeting the service quality requirement of each calculation migration link, the goal of minimizing system delay by reasonably and efficiently calculating migration and resource allocation is achieved; the reward function can thus be expressed as:

in order to obtain good long-term returns, both the pre-ocular return and the future return should be considered; thus, the main goal of reinforcement learning is to find a strategy to maximize the expected cumulative discount return,

wherein β ∈ [0,1] is a discount factor;

step (4.4), according to the established S, A and R, a deep reinforcement learning model is established on the basis of Q learning, and the function Q (S) is evaluated_t,a_t) Represents the slave state s_tPerforming action a_tThe resulting discount reward, the Q-value update function is: