CN115460710A - Intelligent calculation unloading method in vehicle edge calculation scene based on deep reinforcement learning - Google Patents
Intelligent calculation unloading method in vehicle edge calculation scene based on deep reinforcement learning
- Publication number
- CN115460710A (application number CN202211048897.6A)
- Authority
- CN
- China
- Prior art keywords
- decision
- vehicle
- unloading
- vehicle terminal
- computing
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
Abstract
The invention provides an intelligent computation offloading method for vehicle edge computing scenarios based on deep reinforcement learning. Using an online deep-reinforcement-learning framework, it jointly optimizes, according to the time-varying nature of the wireless channel, each vehicle terminal's offloading decision, the vehicle's local computing capability, the vehicle's data transmission power, the channel time-slot resource allocation decision, and the edge server's computing resource allocation decision, thereby minimizing the computation delay of the system and obtaining the optimal offloading decision. Compared with traditional heuristic algorithms, the method adopts deep reinforcement learning, combining the strong representational power of deep learning with the autonomous learning ability of reinforcement learning, and can automatically update the offloading policy in a highly dynamic Internet-of-Vehicles environment. The method converges quickly to the optimal offloading policy under time-varying wireless channels; when the weight of a vehicle terminal changes, the offloading policy is adjusted automatically and quickly converges to a new optimum, showing strong robustness.
Description
Technical Field
The invention relates to the field of mobile edge computing, and in particular to an intelligent online offloading method for vehicle edge computing scenarios based on deep reinforcement learning.
Background
In recent years, with the continuous development of Internet-of-Vehicles technology and the steady growth of vehicle ownership, a large number of vehicular applications and multimedia services have emerged. These impose higher requirements on service quality, user experience, and system overhead, and demand more resources such as computing capacity and energy. The computing resources and energy storage on board a vehicle cannot adequately handle the computation-intensive, delay-sensitive tasks that are widespread in today's new vehicular applications. As a way to address this shortage of computing resources, computation offloading has become a hot research topic in the Internet of Vehicles.
Computation offloading means transmitting, i.e., "offloading," computing tasks to a server with idle resources for execution and transmitting the results back, thereby alleviating the shortage of local computing resources. Cloud computing is currently a relatively mature offloading approach, in which computing tasks are offloaded to the cloud for execution; however, it is unsuitable for vehicular scenarios because the transmission delay between the cloud and the vehicle is too long.
Mobile Edge Computing (MEC) addresses this by offloading computing tasks to edge servers closer to the vehicle. Combining MEC with the Internet of Vehicles in a vehicular scenario yields Vehicle Edge Computing (VEC), a currently promising and effective way to improve vehicular application performance: it can significantly reduce the delay and energy consumption a vehicle terminal incurs when executing computing tasks. Nevertheless, because a vehicle's battery life and capacity are limited, it is difficult to guarantee the performance of vehicle-mounted applications over long periods.
Wireless Power Transfer (WPT) is the process of delivering energy from an energy source to an electrical load wirelessly rather than over conventional wires. The energy source transfers energy to wireless devices over the air, ensuring they have sufficient energy to handle various tasks. Recent studies have demonstrated the feasibility of wireless energy transmission.
Because vehicle battery life and energy are limited, the invention adds wireless energy transfer: the mobile edge server transmits energy to the vehicles, further reducing their energy consumption. Combining WPT with the VEC network lets the edge server replenish vehicle energy wirelessly, guaranteeing and improving vehicle-mounted application performance and user service experience; this yields wireless-powered mobile edge computing. In a wireless fading environment with multiple users, a major challenge is to jointly optimize each user's computation mode (offloaded or local computation) and the radio resource allocation. Because of the binary offloading variables, such problems are typically modeled as Mixed Integer Programming (MIP) problems. Solving the MIP problem with traditional branch-and-bound or dynamic programming incurs extremely high computational complexity and cannot be applied in environments that change in real time; heuristic local search and convex relaxation reduce the complexity, but both require many iterations to reach a satisfactory local optimum and are unsuitable for making real-time offloading decisions over fast-fading channels.
Disclosure of Invention
The invention provides an intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, targeting a wireless-powered mobile edge computing network comprising an edge server and a plurality of vehicle terminals.
The technical scheme of the invention is as follows:
The intelligent computation offloading method in the vehicle edge computing scenario based on deep reinforcement learning comprises the following steps:
Step 1: based on the wireless channel gain h_i in the current time frame, generate a relaxed offloading decision set x_t with a deep neural network;
Step 2: quantize the relaxed offloading decision set x_t generated in step 1 into K binary offloading decisions with an order-preserving quantization method;
Step 3: substitute each quantized binary offloading decision x_k into problem P1:
s.t. C1: x_i ∈ {0,1}
and constraints C2 through C5, where N is the number of vehicle terminals; x_i is the offloading action of vehicle terminal i (i = 1, …, N), with x_i = 1 denoting that the computing task of vehicle terminal i is offloaded to the edge server and x_i = 0 denoting that it is executed locally; f_M is the total computing resource owned by the edge server, and f_i is the computing resource the edge server allocates to vehicle terminal i; the edge computing total delay and the local computing delay of vehicle terminal i enter the objective weighted by w_i, the weighting factor of the task computation delay of vehicle terminal i; a is the fraction of channel time during which the edge server transmits energy to the vehicle terminals, and τ_i is the fraction of channel time allocated to vehicle terminal i;
Transform problem P1 into the resource allocation sub-problem P2:
s.t. C2, C3, C4, C5
where φ is the number of CPU cycles required to process 1 bit of task data; D_i is the data volume of the computing task on vehicle terminal i; μ is the energy harvesting efficiency; P is the transmit power of the edge server; h_i is the wireless channel gain of vehicle terminal i in the current time frame; k_i is the computing energy-efficiency coefficient of vehicle terminal i; β_i is the uplink transmission overhead coefficient of vehicle terminal i; X_0 and X_1 denote the sets of vehicles adopting local computing and offloaded computing, respectively; W is the bandwidth of the wireless channel; σ² is the Gaussian noise spectral density of the wireless channel;
Step 4: decompose problem P2 into two sub-problems, P3 and P4:
P3 (time-slot allocation): s.t. C4, C5
P4 (computing-resource allocation): s.t. C2, C3
Solve problem P3 to obtain the time-slot allocation {a, τ}; solve problem P4 to obtain the computing-resource allocation f;
Step 5: for each binary offloading decision, substitute the solutions of problems P3 and P4 back into problem P1 and compute the resulting system delay; among all binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision;
Step 6: store the obtained optimal offloading decision together with the wireless channel gain h_i in a memory as experience-labeled data;
Step 7: every δ time frames, randomly select a data sample from the memory, train the deep neural network, and update its parameter θ; then return to step 1 until the method terminates.
Further, the edge computing total delay of vehicle terminal i is the sum of its uplink data transmission delay and the execution delay required by the edge server to process the task transmitted by vehicle terminal i.
Further, in step 2, the process of quantizing the relaxed offloading decision x_t into K binary offloading decisions is as follows:
For a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed decision x_t.
For k = 1, the first binary offloading decision is x_{1,i} = 1 if x_{t,i} > 0.5 and x_{1,i} = 0 otherwise, where x_{t,i} is the i-th element of x_t.
Order the elements of x_t by their distance from 0.5, denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k sets x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
Further, in step 4, the interior-point method is used to solve problem P3, obtaining the time-slot allocation {a, τ}; the Lagrangian dual method is used to solve problem P4, obtaining the computing-resource allocation f.
Advantageous effects
Compared with traditional heuristic algorithms, the method adopts deep reinforcement learning, combining the strong representational power of deep learning with the autonomous learning ability of reinforcement learning, and can automatically update the offloading policy in a highly dynamic Internet-of-Vehicles environment.
The invention converges quickly to the optimal offloading policy under a time-varying wireless channel; when the weight of a vehicle terminal changes, it automatically adjusts the offloading policy and quickly converges to a new optimal policy, showing strong robustness.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1: a simplified model diagram of problem P1;
FIG. 2: a framework diagram of the method;
FIG. 3: the training loss curve of the neural network;
FIG. 4: the gain ratio curve;
FIG. 5: the impact of the learning rate on the gain ratio;
FIG. 6: the impact of the batch size on the gain ratio;
FIG. 7: the impact of the memory size on the gain ratio;
FIG. 8: the impact of the training interval on the gain ratio;
FIG. 9: training loss variation when the weights are modified;
FIG. 10: gain ratio variation when the weights are modified.
Detailed Description
The invention provides an intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, in which the edge server broadcasts energy to the vehicle terminals, and the vehicle terminals harvest this energy to execute their tasks. The invention adopts binary offloading: a task is either computed locally at the vehicle terminal or offloaded in full to the edge server for computation. According to the time-varying conditions of the wireless channel, the system computation delay is minimized by jointly optimizing the vehicle terminals' offloading decisions, their local computing capability and data transmission power, the channel time-slot allocation decision, and the computing-resource allocation decision on the edge server. Here, vehicle local computing capability refers to the vehicle's ability to process computing tasks locally; vehicle data transmission power refers to the power the vehicle uses when transmitting task data to the edge server. The channel time-slot resource allocation decision arises because the edge server occupies channel resources when transmitting energy to each vehicle, and vehicles adopting edge computing occupy channel resources when transmitting task data to the edge server; since channel resources are limited, they must be allocated appropriately to minimize delay. The edge-server computing-resource allocation decision arises because the edge server must allocate computing capacity to each vehicle adopting edge computing; since the edge server's total computing capacity is limited, it must be allocated appropriately to minimize delay.
The problem can be modeled as a mixed-integer program, but because of the combinatorial nature of multi-user computation-mode selection and its strong coupling with resource allocation, traditional numerical optimization methods cannot solve it quickly within the channel coherence time. The proposed online offloading algorithm based on deep reinforcement learning jointly optimizes five variables: the offloading decision, the vehicle's local computing capability, the vehicle's data transmission power, the channel time-slot resource allocation decision, and the edge-server computing-resource allocation decision. It obtains the optimal offloading decision without solving a combinatorial optimization, greatly reducing computational complexity; it can obtain the optimal offloading policy in a short time and minimizes the total system delay.
Parameter table
The invention provides an intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, implemented by performing the following steps in each time frame:
Step 1: based on the wireless channel gain h_i in the current time frame, generate a relaxed offloading decision set x_t with a deep neural network;
Even when not fully trained, the deep neural network can generate relaxed offloading decisions from which quantization and computation yield the optimal offloading decision. Each obtained optimal offloading decision is combined with the wireless channel gain of the current time frame into a data sample and stored; a sample is periodically drawn from these to train the neural network, so that better offloading decisions are produced next time.
The offloading decision is a parameter representing the offloading action of a vehicle terminal, with values 0 and 1 denoting local computation and offloaded computation, respectively. The parameter x_i denotes the offloading decision of the i-th vehicle terminal in the current time frame. The parameter x_t denotes the relaxed offloading decision set generated by the neural network in each time frame; unlike x_i, which only takes the value 0 or 1, x_t is a set of N elements, each with a value between 0 and 1.
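As an illustrative sketch (not the invention's exact network), the following NumPy forward pass maps channel gains to a relaxed offloading decision x_t in (0,1)^N, using ReLU hidden layers and a Sigmoid output as in the algorithm description; the terminal count N, the 120/80 layer sizes (taken from the simulation settings), and the random weights are assumptions standing in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10  # illustrative number of vehicle terminals

# 120/80 hidden layers as in the simulation section; random weights stand in
# for a trained DNN.
sizes = [N, 120, 80, N]
params = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def relaxed_offload_decision(h, params):
    """Map channel gains h to a relaxed offloading decision x_t in (0,1)^N."""
    a = h
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        # ReLU on hidden layers, Sigmoid on the output layer
        a = np.maximum(z, 0) if i < len(params) - 1 else 1 / (1 + np.exp(-z))
    return a

h = rng.uniform(0.1, 1.0, N)              # channel gains in the current time frame
x_t = relaxed_offload_decision(h, params)
```

Because the output layer is a Sigmoid, every element of x_t lies strictly between 0 and 1, as the relaxed decision requires.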
Step 2: quantize the relaxed offloading decision set x_t generated in step 1 into K binary offloading decisions:
Specifically, for a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed decision x_t, as shown below. Each x_k is likewise a set of N elements, each taking the value 0 or 1 and representing the offloading decision of the corresponding vehicle terminal.
For k = 1, the first binary offloading decision is x_{1,i} = 1 if x_{t,i} > 0.5 and x_{1,i} = 0 otherwise, where x_{t,i} is the i-th element of the set x_t.
Order the elements of x_t by their distance from 0.5 (that is, by the distance between each entry of x_t and the value 0.5), denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k sets x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
Step 3: substitute each quantized binary offloading decision x_k into problem P1:
s.t. C1: x_i ∈ {0,1}
Problem P1 is thereby transformed into the resource allocation sub-problem P2:
s.t. C2, C3, C4, C5
where X_0 and X_1 denote the sets of vehicle terminals adopting local computing and offloaded computing, respectively; that is, X_0 collects the vehicle terminals whose offloading decision in the set takes the value 0 (local computing), and X_1 collects those whose offloading decision takes the value 1 (offloaded computing).
Step 4: decompose problem P2 into two sub-problems, P3 and P4:
P3 (time-slot allocation): s.t. C4, C5
P4 (computing-resource allocation): s.t. C2, C3
Solve problem P3 with the interior-point method to obtain the time-slot allocation {a, τ}; solve problem P4 with the Lagrangian dual method to obtain the computing-resource allocation f.
Step 5: for each binary offloading decision, substitute the results obtained by solving problems P3 and P4 back into problem P1 and compute the resulting system delay; among all the binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision x*.
Once the offloading decision of every vehicle terminal is fixed, problem P1 reduces to P2; this transformation is applied directly to each of the K quantized offloading decisions in turn, and the best one is finally selected among them. Each candidate's optimum is obtained by decomposing P2 into P3 and P4, and the solution is then substituted back into P1 to evaluate its delay.
"All binary offloading decisions" here refers to the K offloading decisions the neural network produces in each time frame from the then-current wireless channel gains. The parameter symbols in the problems above do not all refer to a single vehicle terminal; they describe the offloading operation of all vehicle terminals. Likewise, neither the time-slot allocation nor the computing-resource allocation targets one particular vehicle terminal; each comprises the allocation policy for all vehicle terminals that adopt offloaded computation.
Step 6: store the obtained optimal offloading decision x* and the wireless channel gain h_i in the memory as experience-labeled data.
Step 7: every δ time frames, randomly select data samples (experience-labeled data previously stored in the memory) from the memory to train the deep neural network, and update the DNN parameters θ using the Adam algorithm.
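A minimal sketch of this store-and-train cycle (steps 6 and 7) follows. The replay memory with overwrite-oldest behavior and the cross-entropy loss mirror the detailed description later in the text, and the memory/batch/interval sizes come from the simulation settings; the network itself and the Adam step are elided, and the random decisions are placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
MEMORY_SIZE, BATCH, DELTA = 1024, 128, 10  # sizes taken from the simulation section
N = 10                                     # illustrative number of vehicle terminals

class ReplayMemory:
    """Fixed-capacity memory; the oldest entry is overwritten once full."""
    def __init__(self, capacity):
        self.capacity, self.data, self.ptr = capacity, [], 0

    def store(self, h, x_star):
        if len(self.data) < self.capacity:
            self.data.append((h, x_star))
        else:
            self.data[self.ptr] = (h, x_star)   # replace the oldest sample
        self.ptr = (self.ptr + 1) % self.capacity

    def sample(self, batch):
        idx = rng.choice(len(self.data), size=min(batch, len(self.data)),
                         replace=False)
        hs, xs = zip(*(self.data[i] for i in idx))
        return np.array(hs), np.array(xs)

def cross_entropy(pred, target, eps=1e-9):
    """Loss minimized when the Adam step updates the DNN parameters theta."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

mem = ReplayMemory(MEMORY_SIZE)
for frame in range(50):
    h = rng.uniform(0.1, 1.0, N)
    x_star = rng.integers(0, 2, N)          # stand-in for the selected optimal decision
    mem.store(h, x_star)
    if frame % DELTA == 0:
        hs, xs = mem.sample(BATCH)
        # ...one Adam step on the DNN with loss = cross_entropy(dnn(hs), xs)
```

Sampling uniformly from the memory is the experience replay technique the text describes; it decorrelates consecutive time frames before each gradient step.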
Algorithm pseudo code:
the following description is presented in conjunction with specific embodiments which are meant to be illustrative, but not limiting, of the invention.
The computation model of this embodiment consists of two parts: local computation and edge computation. In the local computation mode, the computing task is executed directly on the vehicle terminal, and the delay is the local computing delay of vehicle terminal i. In the edge computation mode, the computing task is offloaded by the vehicle terminal to the edge server for execution; the delay comprises the uplink data transmission delay of vehicle terminal i, the execution delay required by the edge server to process the task transmitted by vehicle terminal i, and the result-return delay of the edge server. The delay generated in the edge server's result-return stage is ignored, so the edge computing total delay of vehicle terminal i is the sum of the uplink transmission delay and the edge execution delay.
the invention optimizes the unloading decision x of each intelligent vehicle terminal i in a combined manner i E.g., {0,1}, channel slot allocation decision { a, τ i Computation resource allocation decision f of edge server i Local computing power f of vehicle terminal i l And data transmission power P i To minimize the system utility function T (x, a, τ, f) l P), an optimization problem P0 of minimum delay is established, which is expressed as follows:
s.t.C1:x i ∈{0,1}
wherein: w is a i Calculating a weight factor, w, of the time delay for the i task of the vehicle terminal i The relative size of the vehicle unit (A) reflects the relative priority of different vehicles, namely reflects the sensitivity of calculation tasks of different vehicles to time delay, and the larger the weight factor is, the more sensitive the calculation tasks of the vehicle unit to the time delay is. f. of M Is the maximum computing power of the edge server.
For problem P0, first examine the impact of the local computing capability f_i^l and the data transmission power P_i of vehicle terminal i on the optimization:
the transmission delay generated when the energy received from the edge server is totally used for local calculationCan be minimized. At this time, the time delay is calculated locallyComprises the following steps:
When the energy received from the edge server is used entirely for data transmission, the resulting transmission delay is minimized. The minimum transmission delay is:
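The formula images for these minimum delays are not reproduced in this text. As a hedged reconstruction for the local-computation case, assuming the standard wireless-powered MEC model in which the harvested energy is E_i = μ a P h_i and running the CPU at frequency f costs k_i f² joules per cycle, the minimum local computing delay follows as:

```latex
% Assumption: harvested energy E_i = \mu a P h_i; executing \phi D_i cycles at
% frequency f_i^l consumes k_i (f_i^l)^2 \phi D_i joules.
E_i = \mu a P h_i, \qquad
k_i \bigl(f_i^l\bigr)^2 \phi D_i \le E_i
\;\Longrightarrow\;
f_i^l = \sqrt{\frac{E_i}{k_i \phi D_i}}, \qquad
t_i^l = \frac{\phi D_i}{f_i^l}
      = \phi D_i \sqrt{\frac{k_i \phi D_i}{\mu a P h_i}} .
```

Spending the entire harvested energy on computation maximizes the feasible f_i^l, which is why the delay above is the minimum claimed in the text; the symbols E_i, f_i^l, and t_i^l are introduced here for illustration only.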
After the local computing capability f_i^l and data transmission power P_i of vehicle terminal i are fixed at their optimal values, consider the utility function T(x, a, τ, f):
Problem P0 can then be transformed into problem P1:
s.t. C1: x_i ∈ {0,1}
The algorithm consists of three alternating phases: offloading decision generation, channel and computing resource allocation, and offloading policy update. The algorithm framework is shown in FIG. 2.
Offloading decision generation phase: the wireless channel gains h_i observed within the time frame are taken as input, and a DNN establishes the mapping to the relaxed offloading decision x_t as output. The ReLU function serves as the activation function of the hidden layers and the Sigmoid function as the activation function of the output layer, so that each element of the output relaxed offloading decision satisfies x_{t,i} ∈ (0,1). The relaxed offloading decision x_t is quantized with an order-preserving quantization method: for a given 1 ≤ K ≤ N + 1, a set of K quantized offloading decisions {x_k} is generated from x_t, as follows:
For k = 1, the first binary offloading decision is x_{1,i} = 1 if x_{t,i} > 0.5 and x_{1,i} = 0 otherwise.
Order the elements of x_t by their distance from 0.5, denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k sets x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
Channel and computing resource allocation phase: the problem is decomposed into the two sub-problems of offloading decision and resource allocation, as shown in FIG. 1.
The resource allocation sub-problem P2 is expressed as:
s.t. C2, C3, C4, C5
It is further decomposed into two sub-problems, solving the transmission delay and solving the computation delay, expressed respectively as:
P3: s.t. C4, C5
P4: s.t. C2, C3
For the transmission-delay sub-problem P3, the time-slot allocation {a, τ} is solved with the interior-point method; for the computation-delay sub-problem P4, the computing-resource allocation f is solved with the Lagrangian dual method.
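If the edge execution delay of terminal i takes the form w_i φ D_i / f_i (an assumed model consistent with the symbols defined for P2; the patent's exact objective images are not reproduced here), the Lagrangian dual of P4 admits a closed form: stationarity gives f_i proportional to √(w_i φ D_i), scaled so the budget f_M is used with equality. A hypothetical sketch:

```python
import numpy as np

def allocate_compute(w, D, phi, f_M):
    """Closed-form stationary point of the Lagrangian of P4 (assumed model):
    minimize sum_i w_i * phi * D_i / f_i  subject to  sum_i f_i <= f_M, f_i >= 0.
    Setting d/df_i = -w_i*phi*D_i/f_i**2 + lam = 0 gives f_i = sqrt(w_i*phi*D_i/lam);
    the multiplier lam is eliminated by enforcing sum_i f_i = f_M."""
    c = np.sqrt(np.asarray(w, float) * phi * np.asarray(D, float))
    return f_M * c / c.sum()

# Illustrative values: weights follow the 1 / 1.5 rule, D_i and f_M from the table.
w = np.array([1.0, 1.5, 1.0])
D = np.array([5e6, 5e6, 5e6])
f = allocate_compute(w, D, phi=100, f_M=8e9)
```

By the Cauchy-Schwarz inequality this allocation is the exact minimizer for this objective form, so no iterative dual search is needed; the higher-weighted terminal receives proportionally more computing resource.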
After the optimal channel time-slot allocation and edge-server computing-resource allocation have been solved, the channel time-slot allocation decisions {a, τ_i} and edge-server computing-resource allocation decisions f_i computed for the K offloading decisions are substituted into problem P1 to compute the total delay; the minimum total delay is selected as the optimal computation delay, and the corresponding offloading decision is the optimal offloading decision.
Offloading policy update phase: an initially empty memory of limited capacity is maintained. At the end of each time frame, the optimal offloading decision x* and the corresponding wireless channel gain h_i are stored, and the DNN is trained with the experience-labeled data in the memory. When the memory is full, the oldest data are replaced with the newest to keep the training samples reliable. Using an experience replay technique, a set of data samples is randomly selected from the memory for training, and the Adam algorithm updates the DNN parameters θ by minimizing the cross-entropy loss. After enough experience-labeled data have been collected, the DNN is trained only once every δ time frames.
The invention obtains simulation results by running simulations with TensorFlow 1.0 in Python; see FIGS. 3 to 8.
We use a Powercast TX91501-3W transmitter with P = 3 W as the energy transmitter of the edge server, and a P2110 harvester as the energy receiver of each vehicle terminal. Denote by d_i the distance from the i-th vehicle terminal to the edge server; the d_i are uniformly distributed over (2.5, 5.2) meters, i.e., the vehicle terminals are uniformly distributed in a ring centered on the edge server with inner and outer radii of 2.5 and 5.2 meters, ensuring that the edge server's access point covers all vehicle terminals. Without loss of generality, we assume the wireless channel gain remains constant within one time frame and changes independently between time frames. The weighting factor of vehicle terminal i is assigned by the following rule: if the index of vehicle terminal i is odd, w_i is set to 1, otherwise to 1.5. The neural network consists of one input layer, two hidden layers, and one output layer, where the first and second hidden layers contain 120 and 80 neurons, respectively. The training interval is set to δ = 10, the training batch size to |T| = 128, the memory size to 1024, and the learning rate of the Adam optimizer to 0.01. Other parameter settings are shown in the following table:
Simulation parameter | Setting
---|---
Bandwidth W | 2 MHz
Computing task data size D_i | 5×10^6 bytes
Channel noise power σ² | 1.0×10^-8
Edge server computing power f_M | 8×10^9 cycles/s
Edge server energy transmission power P | 0.5 W
CPU cycles required per bit φ | 100
Transmission overhead coefficient β_i | 4
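The terminal placement and weight rule of this setup can be sketched as follows (illustrative; the function and field names are ours, and the deterministic seed is only for reproducibility of the sketch):

```python
import random

def make_terminals(n, rng=random.Random(0)):
    """Generate n vehicle terminals: distance d_i ~ U(2.5, 5.2) m from the
    edge server, and weight w_i = 1 for odd indices, 1.5 for even (1-based),
    as in the simulation description."""
    terminals = []
    for i in range(1, n + 1):
        d_i = rng.uniform(2.5, 5.2)          # uniform in the ring's radii
        w_i = 1.0 if i % 2 == 1 else 1.5     # parity-based weight rule
        terminals.append({"id": i, "d": d_i, "w": w_i})
    return terminals

terms = make_terminals(4)
```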
All simulation experiments were run on a computer equipped with an Intel Pentium CPU G860.
FIG. 3 shows the convergence of the neural network training loss. As the number of training steps increases, the training loss gradually decreases and converges to a small value; with further training, it remains essentially stable. The algorithm can therefore be considered to converge rapidly in a vehicular environment with time-varying channel conditions, allowing offloading decisions to be made quickly.
Define the optimal system delay obtained by enumerating all binary offloading decisions with a greedy algorithm, and let T*(h, x) be the delay obtained by the proposed algorithm. Define the gain ratio as the ratio of the greedy-enumerated optimal delay to T*(h, x); it measures how effectively the proposed algorithm finds the optimal delay. The evolution of the gain ratio is shown in FIG. 4. At the start of training, the gain ratio is about 0.9, so the delay obtained by the proposed algorithm still differs noticeably from the optimal delay computed by the greedy algorithm. As training proceeds, the gain ratio approaches 1, meaning the gap between the delays obtained by the two algorithms keeps shrinking. After 2100 training steps, the gain ratio essentially stays at 1 and no longer changes; at that point the difference between the solutions of the two algorithms is negligible, and the proposed algorithm can be considered to have converged to the optimal solution.
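For one time frame, the gain ratio is simply the ratio of the two delays (a hypothetical helper, not from the patent, shown only to make the metric concrete):

```python
def gain_ratio(t_enum, t_alg):
    """Ratio of the greedy-enumerated optimal delay t_enum to the delay
    t_alg achieved by the learned policy. Since t_alg >= t_enum, the
    ratio is at most 1 and approaches 1 as the policy converges."""
    return t_enum / t_alg

# Early in training: optimal delay 0.9 s vs. policy delay 1.0 s.
r = gain_ratio(0.9, 1.0)
```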
FIGS. 5 to 8 show the effects of the Adam optimizer learning rate, the batch size, the memory size, and the neural network training interval on the convergence and effectiveness of the present invention.
The robustness of the invention is analyzed in a scenario with alternating weights. At the beginning of the simulation experiment, the weights of all vehicle terminals are determined by the parity of their indices: for odd-numbered vehicle terminals the weight is set to 1, and for even-numbered ones to 1.5. When the number of training steps reaches 2500, the weights of the odd- and even-numbered vehicle terminals are swapped; finally, at 3500 training steps, the weights are restored to their original state. The optimal delay under each weight assignment is obtained by the greedy algorithm to compute the gain ratio. The training loss and the gain ratio are shown in FIGS. 9 and 10. Despite the weight changes, the training loss and the gain ratio are not affected, and a solution close to the optimal one can still be obtained. This shows that the invention can quickly and automatically adjust the offloading policy to the changed weights and eventually converge to a new optimal offloading decision.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.
Claims (4)
1. An intelligent computation offloading method in a vehicle edge computing scenario based on deep reinforcement learning, characterized in that the method comprises the following steps:
step 1: based on the wireless channel gain h_i in the current time frame, generate a relaxed offloading decision x_t through a deep neural network;
step 2: quantize the relaxed offloading decision x_t generated in step 1 into K binary offloading decisions by the order-preserving quantization method;
step 3: substitute each quantized binary offloading decision x_k into problem P1:
s.t. C1: x_i ∈ {0, 1}
where N is the number of vehicle terminals; x_i is the offloading action of vehicle terminal i (i = 1, …, N), with x_i = 1 indicating that the computing task of vehicle terminal i is offloaded to the edge server and x_i = 0 indicating that it is executed locally; f_M is the total computing resource of the edge server; f_i is the computing resource allocated by the edge server to vehicle terminal i; T_i^c is the total edge computing delay of vehicle terminal i; w_i is the weight factor of the task delay of vehicle terminal i; T_i^l is the local computing delay of vehicle terminal i; a is the fraction of the time frame during which the edge server transmits energy to the vehicle terminals; and τ_i is the fraction of the time frame allocated to the channel of vehicle terminal i;
transform problem P1 into the resource allocation subproblem P2:
s.t. C2, C3, C4, C5
where φ is the number of CPU cycles required to process 1 bit of task data; D_i is the data size of the computing task on vehicle terminal i; μ is the energy harvesting efficiency; P is the transmission power of the edge server; h_i is the wireless channel gain in the i-th time period; k_i is the computing energy efficiency coefficient of vehicle terminal i; β_i is the transmission overhead coefficient of the uplink of vehicle terminal i; X_0 and X_1 denote the sets of vehicles performing local computing and offloaded computing, respectively; W is the bandwidth of the wireless channel; and σ² is the Gaussian noise spectral density of the wireless channel;
step 4: decompose problem P2 into two subproblems, P3 and P4:
s.t. C4, C5
s.t. C2, C3
solve problem P3 to obtain the time-slot allocation {a, τ}; solve problem P4 to obtain the computing resource allocation f;
step 5: for each binary offloading decision, substitute the solutions of problems P3 and P4 into problem P1 and compute the resulting system delay; among all binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision;
step 6: store the obtained optimal offloading decision together with the wireless channel gain h_i in a memory as experience-label data;
step 7: every δ time frames, randomly select data samples from the memory, train the deep neural network, and update its parameters θ; then return to step 1 until the method terminates.
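The per-frame loop of steps 1–7 can be sketched end to end (illustrative: stub functions stand in for the DNN, the quantizer, and the solvers of P3/P4; none of the names come from the patent, and a real system would update the DNN parameters θ where noted):

```python
import random

def dnn_relaxed_decision(h, n):
    """Step 1 (stub DNN): map channel gain h to a relaxed decision in [0,1]^n."""
    random.seed(int(h * 1e6))            # deterministic stand-in for a forward pass
    return [random.random() for _ in range(n)]

def quantize(x_t, k):
    """Step 2 (stub): produce up to k binary candidates from the relaxed decision."""
    return [[1 if v > 0.5 else 0 for v in x_t]][:k]

def system_delay(h, x_k):
    """Steps 3-4 (stub): pretend delay; a real system solves P1 via P3/P4 here."""
    return sum(x_k) / (h + 1e-9) + (len(x_k) - sum(x_k))

def run_frames(channel_gains, n=4, k=1, delta=2):
    memory = []                          # step 6: experience-label memory
    for t, h in enumerate(channel_gains):
        x_t = dnn_relaxed_decision(h, n)
        candidates = quantize(x_t, k)
        best = min(candidates, key=lambda x: system_delay(h, x))  # step 5
        memory.append((h, best))
        if t % delta == 0 and memory:    # step 7: periodic training
            batch = random.sample(memory, min(2, len(memory)))
            _ = batch                    # a real system would run an Adam step on θ
    return memory

mem = run_frames([0.5, 1.0, 1.5, 2.0])
```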
2. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein the total edge computing delay T_i^c of vehicle terminal i is T_i^c = T_i^r + T_i^es, where T_i^r is the uplink data transmission delay of vehicle terminal i and T_i^es is the execution delay required by the edge server to execute the task transmitted by vehicle terminal i.
3. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein in step 2, the process of quantizing the relaxed offloading decision x_t into K binary offloading decisions is:
for a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed offloading decision x_t;
the first binary offloading decision x_{1,i}, generated when k = 1, is:
x_{1,i} = 1 if x_{t,i} > 0.5, and x_{1,i} = 0 if x_{t,i} ≤ 0.5,
where x_{t,i} is the i-th element of the offloading decision x_t;
sort the elements of x_t by their distance from 0.5, denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(i)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t; based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k is:
x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
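The order-preserving quantization of this claim can be sketched in Python (illustrative; the function name is ours, and the tie-breaking rule for elements equal to the threshold follows the standard order-preserving quantization construction):

```python
def order_preserving_quantize(x_t, K):
    """Generate K binary offloading decisions from a relaxed decision x_t.

    Decision 1 thresholds each element at 0.5; decision k (k >= 2)
    thresholds at the (k-1)-th order statistic of x_t when the elements
    are ranked by their distance to 0.5.
    """
    n = len(x_t)
    assert 1 <= K <= n + 1
    # Order statistics: elements of x_t sorted by |x - 0.5|, ascending.
    ordered = sorted(x_t, key=lambda v: abs(v - 0.5))
    decisions = [[1 if v > 0.5 else 0 for v in x_t]]        # x_1
    for k in range(2, K + 1):
        thr = ordered[k - 2]                                 # x_{t,(k-1)}
        x_k = [1 if (v > thr or (v == thr and thr <= 0.5)) else 0
               for v in x_t]
        decisions.append(x_k)
    return decisions

# For x_t = [0.2, 0.8, 0.6] and K = 2:
ds = order_preserving_quantize([0.2, 0.8, 0.6], K=2)
```

Because consecutive thresholds move outward from 0.5, the K candidates flip at most one additional element at a time, keeping the candidate set small yet diverse.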
4. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein in step 4, problem P3 is solved with the interior point method to obtain the time-slot allocation {a, τ}, and problem P4 is solved with the Lagrangian dual method to obtain the computing resource allocation f.
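As an illustration of the Lagrangian dual step, suppose (an assumption on our part, since the patent's P4 objective is given only as an image; this form is common in edge-computing resource allocation) that P4 is min Σ_{i∈X_1} w_i φ D_i / f_i subject to Σ f_i ≤ f_M and f_i ≥ 0. The KKT stationarity condition −w_i φ D_i / f_i² + λ = 0 then yields the closed-form allocation f_i ∝ √(w_i D_i):

```python
from math import sqrt, isclose

def allocate_compute(weights, data_sizes, f_M):
    """Closed-form KKT solution of the assumed P4:
    f_i* = f_M * sqrt(w_i * D_i) / sum_j sqrt(w_j * D_j).
    (The cycles-per-bit factor φ cancels in the normalization.)"""
    s = [sqrt(w * d) for w, d in zip(weights, data_sizes)]
    total = sum(s)
    return [f_M * si / total for si in s]

# Two offloading terminals with the simulation's weights and task size.
f = allocate_compute([1.0, 1.5], [5e6, 5e6], 8e9)
```

The heavier-weighted terminal receives proportionally more of the edge server's cycles, which is the qualitative behavior one expects from the dual solution regardless of the exact constants.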
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211048897.6A CN115460710B (en) | 2022-08-30 | 2022-08-30 | Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115460710A true CN115460710A (en) | 2022-12-09 |
CN115460710B CN115460710B (en) | 2024-08-23 |
Family
ID=84300120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211048897.6A Active CN115460710B (en) | 2022-08-30 | 2022-08-30 | Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115460710B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116166444A (en) * | 2023-04-26 | 2023-05-26 | 南京邮电大学 | Collaborative reasoning method oriented to deep learning hierarchical model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20210063990A (en) * | 2019-11-25 | 2021-06-02 | 경희대학교 산학협력단 | Method of machine learning based unmanned aerial vehicle mobile edge server collabrative task matching and offloading |
CN113296845A (en) * | 2021-06-03 | 2021-08-24 | 南京邮电大学 | Multi-cell task unloading algorithm based on deep reinforcement learning in edge computing environment |
CN114025359A (en) * | 2021-11-01 | 2022-02-08 | 湖南大学 | Resource allocation and computation unloading method, system, device and medium based on deep reinforcement learning |
US20220210686A1 (en) * | 2020-07-15 | 2022-06-30 | Nantong University | Energy-efficient optimized computing offloading method for vehicular edge computing network and system thereof |
Non-Patent Citations (4)
Title |
---|
MINGHUI MIN; LIANG XIAO; YE CHEN; PENG CHENG; DI WU; WEIHUA ZHUANG: "Learning-based computation offloading for IoT devices with energy harvesting", IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 1 January 2019 (2019-01-01) * |
MOLIN LI; TONG CHEN; JIAXIN ZENG; XIAOBO ZHOU; KEQIU LI; HENG QI: "D2D-Assisted Computation Offloading for Mobile Edge Computing Systems with Energy Harvesting", 2019 20TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 12 March 2020 (2020-03-12) * |
ZHANG ZEWEI, LI TAOSHEN, YANG LINFENG: "Multi-objective optimized SWIPT-MEC hierarchical task offloading architecture and optimization algorithm", JOURNAL OF GUANGXI UNIVERSITY, 30 April 2022 (2022-04-30) *
WANG YANTING: "Research on joint scheduling strategies for heterogeneous resources in mobile edge computing", INFORMATION SCIENCE AND TECHNOLOGY SERIES, 15 February 2020 (2020-02-15) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111245651B (en) | Task unloading method based on power control and resource allocation | |
Bi et al. | Lyapunov-guided deep reinforcement learning for stable online computation offloading in mobile-edge computing networks | |
CN110928654B (en) | Distributed online task unloading scheduling method in edge computing system | |
CN111556461A (en) | Vehicle-mounted edge network task distribution and unloading method based on deep Q network | |
CN112105062B (en) | Mobile edge computing network energy consumption minimization strategy method under time-sensitive condition | |
Yao et al. | Caching in dynamic IoT networks by deep reinforcement learning | |
Xie et al. | Backscatter-assisted computation offloading for energy harvesting IoT devices via policy-based deep reinforcement learning | |
CN114650228B (en) | Federal learning scheduling method based on calculation unloading in heterogeneous network | |
CN110233755B (en) | Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things | |
CN113727362B (en) | Unloading strategy method of wireless power supply system based on deep reinforcement learning | |
CN114554495B (en) | Federal learning-oriented user scheduling and resource allocation method | |
Bi et al. | Stable online computation offloading via lyapunov-guided deep reinforcement learning | |
CN115460710B (en) | Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning | |
CN113821346B (en) | Edge computing unloading and resource management method based on deep reinforcement learning | |
He et al. | Computation offloading and resource allocation based on DT-MEC-assisted federated learning framework | |
Huang et al. | Performance optimization for energy-efficient industrial Internet of Things based on ambient backscatter communication: An A3C-FL approach | |
Han et al. | Multi-step reinforcement learning-based offloading for vehicle edge computing | |
Jiao et al. | Deep reinforcement learning for time-energy tradeoff online offloading in MEC-enabled industrial internet of things | |
CN114521023A (en) | SWIPT-assisted NOMA-MEC system resource allocation modeling method | |
CN116341679A (en) | Design method of federal edge learning scheduling strategy with high aging | |
CN115914230A (en) | Adaptive mobile edge computing unloading and resource allocation method | |
CN113157344B (en) | DRL-based energy consumption perception task unloading method in mobile edge computing environment | |
Gao et al. | Deep reinforcement learning-based computation offloading and optimal resource allocation in industrial Internet of Things with NOMA | |
Wang et al. | Adaptive compute offloading algorithm for metasystem based on deep reinforcement learning | |
CN114281527A (en) | Low-complexity mobile edge computing resource allocation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||