CN115460710A - Intelligent computation offloading method in a vehicular edge computing scenario based on deep reinforcement learning - Google Patents


Info

Publication number
CN115460710A
Authority
CN
China
Prior art keywords
decision
vehicle
unloading
vehicle terminal
computing
Prior art date
Legal status
Granted
Application number
CN202211048897.6A
Other languages
Chinese (zh)
Other versions
CN115460710B (en)
Inventor
汪彦婷
钱卓
何立军
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211048897.6A priority Critical patent/CN115460710B/en
Publication of CN115460710A publication Critical patent/CN115460710A/en
Application granted granted Critical
Publication of CN115460710B publication Critical patent/CN115460710B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides an intelligent computation offloading method for vehicular edge computing scenarios based on deep reinforcement learning. Using an online deep-reinforcement-learning framework, the method jointly optimizes, according to the time-varying wireless channel, each vehicle terminal's offloading decision, local computing capability and data transmission power, together with the channel time-slot allocation decision and the edge server's computing resource allocation decision, so as to minimize the system's computation delay and obtain the optimal offloading decision. Compared with traditional heuristic algorithms, the method combines the representational power of deep learning with the autonomous learning ability of reinforcement learning, and can automatically update the offloading policy in the highly dynamic Internet-of-Vehicles environment. It converges quickly to the optimal offloading policy under time-varying wireless channels; when the weights of the vehicle terminals change, it automatically adjusts the offloading policy and rapidly converges to a new optimal policy, showing strong robustness.

Description

Intelligent computation offloading method in a vehicular edge computing scenario based on deep reinforcement learning
Technical Field
The invention relates to the field of mobile edge computing, and in particular to an intelligent online offloading method for vehicular edge computing scenarios based on deep reinforcement learning.
Background
In recent years, with the continuous development of Internet-of-Vehicles technology and the growing number of vehicles, a large number of vehicular applications and multimedia services have emerged, raising the requirements on service quality, user experience and system overhead, as well as the demand for computing capacity and energy. The computing resources and stored energy available on a vehicle cannot adequately handle the computation-intensive, delay-sensitive tasks that are widespread in today's vehicular applications. As a remedy for this shortage of computing resources, computation offloading has become a hot research topic in the Internet of Vehicles.
Computation offloading means transmitting, i.e. "offloading", computing tasks to a server with idle resources, which executes them and returns the results, thereby alleviating the shortage of computing resources. Cloud computing is currently a relatively mature offloading approach, in which computing tasks are offloaded to the cloud, but it is unsuitable for vehicular scenarios because the transmission delay between the cloud and the vehicle is too long.
Mobile edge computing addresses this by offloading the computing task to an edge server closer to the vehicle. Applying it to vehicular scenarios, i.e. combining mobile edge computing (MEC) with the Internet of Vehicles, yields Vehicle Edge Computing (VEC). VEC is currently regarded as an effective way to improve the performance of vehicular applications: it can significantly reduce the delay and energy consumption incurred by a vehicle terminal when executing computing tasks. Nevertheless, because vehicle battery life and capacity are limited, it is difficult to sustain the performance of on-board applications over long periods.
Wireless Power Transfer (WPT) refers to delivering energy from an energy source to an electrical load wirelessly rather than over conventional wires. The source transfers energy to wireless devices over the air, ensuring that they have sufficient energy to handle their tasks. Recent studies have demonstrated the feasibility of wireless energy transmission technology.
Because vehicle battery life and energy are limited, the invention incorporates wireless energy transfer, letting the mobile edge server deliver energy to the vehicles and further reducing vehicle energy consumption. Combining WPT with the VEC network allows the edge server to replenish vehicle energy wirelessly, guaranteeing and improving the performance of on-board applications and the user's service experience; this yields the wirelessly powered mobile edge computing paradigm. In a wireless fading environment with multiple users, a major challenge is to jointly optimize each user's computing mode (offload or local computation) and the radio resource allocation. Because of the binary offloading variables, such problems are usually modeled as Mixed Integer Programming (MIP) problems. Traditional branch-and-bound and dynamic programming can solve MIP problems, but their computational complexity is extremely high and they cannot be applied in environments that change in real time; heuristic local search and convex relaxation reduce the complexity, but both require many iterations to reach a satisfactory local optimum and are therefore unsuitable for real-time offloading decisions over fast-fading channels.
Disclosure of Invention
The invention provides an intelligent computation offloading method for vehicular edge computing scenarios based on deep reinforcement learning, targeting a wirelessly powered mobile edge computing network consisting of an edge server and several vehicle terminals.
The technical scheme of the invention is as follows:
The intelligent computation offloading method in the vehicular edge computing scenario based on deep reinforcement learning comprises the following steps:
Step 1: based on the wireless channel gains h_i in the current time frame, generate a relaxed offloading decision set x_t with a deep neural network;
Step 2: quantize the relaxed offloading decision set x_t generated in step 1 into K binary offloading decisions with an order-preserving quantization method;
Step 3: substitute each quantized binary offloading decision x_k into problem P1:
P1: min over (x, a, τ, f) of Σ_{i=1..N} w_i [(1 − x_i) t_i^L + x_i t_i^E]
s.t. C1: x_i ∈ {0,1}
C2: Σ_{i∈X_1} f_i ≤ f_M
C3: f_i ≥ 0 for all i ∈ X_1
C4: a + Σ_{i=1..N} τ_i ≤ 1
C5: a ≥ 0, τ_i ≥ 0 for all i
where N is the number of vehicle terminals; x_i is the offloading action of vehicle terminal i (i = 1, …, N), with x_i = 1 denoting that the computing task of vehicle terminal i is offloaded to the edge server and x_i = 0 denoting that it is executed locally; f_M is the total computing resource owned by the edge server, and f_i is the computing resource the edge server allocates to vehicle terminal i; t_i^E is the total edge-computing delay of vehicle terminal i; w_i is the weighting factor of the task computation delay of vehicle terminal i; t_i^L is the local computing delay of vehicle terminal i; a is the fraction of the time frame during which the edge server transmits energy to the vehicle terminals, and τ_i is the channel time fraction of vehicle terminal i;
problem P1 is then transformed into the resource allocation sub-problem P2:
P2: min over (a, τ, f) of T(x, a, τ, f) s.t. C2, C3, C4, C5 [formula image not reproduced]
where φ is the number of CPU cycles required to process one bit of task data; D_i is the data volume of the computing task on vehicle terminal i; μ is the energy harvesting efficiency; P is the transmit power of the edge server; h_i is the wireless channel gain of vehicle terminal i in the current time frame; k_i is the computation energy-efficiency coefficient of vehicle terminal i; β_i is the uplink transmission overhead coefficient of vehicle terminal i; X_0 and X_1 denote the sets of vehicles performing local computing and offloaded computing, respectively; W is the bandwidth of the wireless channel; σ² is the Gaussian noise spectral density of the wireless channel;
Step 4: decompose problem P2 into two sub-problems P3 and P4:
P3: the time-slot allocation sub-problem over {a, τ}, s.t. C4, C5 [formula image not reproduced]
P4: the computing resource allocation sub-problem over f, s.t. C2, C3 [formula image not reproduced]
solve problem P3 to obtain the time-slot allocation {a, τ}; solve problem P4 to obtain the computing resource allocation f;
Step 5: for each binary offloading decision, substitute the results of solving problems P3 and P4 back into problem P1 and evaluate the resulting system delay; among all binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision;
Step 6: store the obtained optimal offloading decision together with the wireless channel gains h_i in a memory as empirically labeled data;
Step 7: every δ time frames, randomly select a batch of data samples from the memory, train the deep neural network and update its parameters θ, then return to step 1 until the method terminates.
Further, the total edge-computing delay of vehicle terminal i is
t_i^E = t_i^U + t_i^C
where t_i^U is the uplink data transmission delay of vehicle terminal i and t_i^C is the execution delay required by the edge server to execute the task transmitted by vehicle terminal i.
Further, in step 2, the relaxed offloading decision x_t is quantized into K binary offloading decisions as follows:
for a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed decision x_t;
for k = 1, the first binary offloading decision has entries
x_{1,i} = 1 if x_{t,i} > 0.5, and x_{1,i} = 0 otherwise,
where x_{t,i} is the i-th element of the relaxed offloading decision x_t;
the elements of x_t are ordered by their distance to 0.5, written |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(i)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t; based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k has entries
x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
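The order-preserving quantization step can be sketched as a short function. This is an illustration only: the tie-breaking at the order statistic x_{t,(k-1)} follows the rule as reconstructed above, not necessarily the patent's exact implementation.

```python
import numpy as np

def order_preserving_quantize(x_t, K):
    """Quantize a relaxed offloading decision x_t in (0,1)^N into K binary
    candidate decisions. Candidate 1 thresholds every entry at 0.5;
    candidate k (k >= 2) thresholds at the entry of x_t that is (k-1)-th
    closest to 0.5, keeping ties when that entry is <= 0.5."""
    x_t = np.asarray(x_t, dtype=float)
    N = len(x_t)
    assert 1 <= K <= N + 1
    order = np.argsort(np.abs(x_t - 0.5))   # indices of x_{t,(1)}, ..., x_{t,(N)}
    candidates = [(x_t > 0.5).astype(int)]  # k = 1: threshold at 0.5
    for k in range(2, K + 1):
        th = x_t[order[k - 2]]              # the order statistic x_{t,(k-1)}
        if th <= 0.5:
            candidates.append((x_t >= th).astype(int))
        else:
            candidates.append((x_t > th).astype(int))
    return candidates
```

For example, with x_t = [0.2, 0.6, 0.9] and K = 3 this yields the candidates [0, 1, 1], [0, 0, 1] and [1, 1, 1].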
further, in step 4, solving the problem P3 by using an interior point method to obtain time slot allocation { a, τ }; the lagrangian dual method is used to solve the problem P4 resulting in the computational resource allocation f.
Advantageous effects
Compared with traditional heuristic algorithms, the method adopts deep reinforcement learning, combining the representational power of deep learning with the autonomous learning ability of reinforcement learning, and can automatically update the offloading policy in the highly dynamic Internet-of-Vehicles environment.
The invention converges quickly to the optimal offloading policy under time-varying wireless channels; when the weights of the vehicle terminals change, it automatically adjusts the offloading policy and rapidly converges to a new optimal policy, showing strong robustness.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1: a simplified model diagram of problem P1;
FIG. 2: a framework diagram of the method;
FIG. 3: the training loss curve of the neural network;
FIG. 4: the gain ratio curve;
FIG. 5: the impact of the learning rate on the gain ratio;
FIG. 6: the impact of the batch size on the gain ratio;
FIG. 7: the impact of the memory size on the gain ratio;
FIG. 8: the impact of the training interval on the gain ratio;
FIG. 9: training loss when the weights are modified;
FIG. 10: gain ratio when the weights are modified.
Detailed Description
The invention provides an intelligent computation offloading method in a vehicular edge computing scenario based on deep reinforcement learning. The edge server broadcasts energy to the vehicle terminals, which harvest it and use it to execute tasks. The invention adopts binary offloading: each task is either computed locally at the vehicle terminal or offloaded to the edge server. According to the time-varying state of the wireless channel, the method minimizes the system computation delay by jointly optimizing the offloading decision of each vehicle terminal, its local computing capability and data transmission power, the channel time-slot allocation decision, and the computing resource allocation decision on the edge server. Here, the vehicle's local computing capability is its ability to process computing tasks locally; the vehicle's data transmission power is the power used when transmitting task data to the edge server; the channel time-slot allocation decision arises because both the edge server's energy transmission to the vehicles and the vehicles' task-data transmission to the edge server occupy channel resources, and since channel resources are limited they must be allocated sensibly to minimize delay; the edge server's computing resource allocation decision arises because, for the vehicles that offload, the edge server must allocate computing capacity to each of them, and since its total computing capacity is limited it must likewise be allocated sensibly to minimize delay.
The problem can be modeled as a mixed integer program, but because of the combinatorial nature of the multi-user computing-mode selection and its strong coupling with resource allocation, traditional numerical optimization methods cannot solve it quickly within the channel coherence time. The proposed deep-reinforcement-learning-based online offloading algorithm jointly optimizes five variables: the offloading decision, the vehicle's local computing capability, the vehicle's data transmission power, the channel time-slot allocation decision, and the edge server's computing resource allocation decision. It obtains the optimal offloading decision without solving a combinatorial optimization, greatly reduces the computational complexity, finds the optimal offloading policy in a short time, and minimizes the total system delay.
Parameter table: [the parameter table images are not reproduced; the symbols are defined where they first appear in the text]
The intelligent computation offloading method in the vehicular edge computing scenario based on deep reinforcement learning performs the following steps in each time frame:
Step 1: based on the wireless channel gains h_i in the current time frame, generate a relaxed offloading decision set x_t with a deep neural network.
Even before it is fully trained, the deep neural network can generate relaxed offloading decisions from which quantization yields an optimal offloading decision. Each optimal offloading decision is stored together with the wireless channel gains of the current time frame as a data sample; one batch of samples is drawn at a time to train the neural network, so that better offloading decisions are produced next time.
The offloading decision is a parameter representing the offloading action of a vehicle terminal, with the values 0 and 1 denoting local computing and offloaded computing, respectively. The parameter x_i denotes the offloading decision of the i-th vehicle terminal in the current time frame. The parameter x_t denotes the relaxed offloading decision set generated by the neural network in each time frame; unlike x_i, which only takes the value 0 or 1, x_t is a set of N elements, each valued between 0 and 1.
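Step 1's mapping from channel gains to a relaxed decision in (0,1)^N can be illustrated with a minimal NumPy network. The layer sizes and random weights here are placeholders, not the patent's trained DNN; only the ReLU-hidden, Sigmoid-output structure follows the description.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                            # number of vehicle terminals

def relaxed_offloading_dnn(h, params):
    """Map channel gains h (length N) to a relaxed offloading decision
    x_t in (0,1)^N: one ReLU hidden layer and a Sigmoid output layer."""
    W1, b1, W2, b2 = params
    hidden = np.maximum(0.0, h @ W1 + b1)        # ReLU activation
    logits = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))         # Sigmoid keeps output in (0,1)

params = (rng.normal(size=(N, 16)), np.zeros(16),
          rng.normal(size=(16, N)), np.zeros(N))
x_t = relaxed_offloading_dnn(rng.uniform(0.1, 1.0, size=N), params)
```

Every element of x_t lies strictly between 0 and 1, which is what the subsequent order-preserving quantization step expects.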
Step 2: quantize the relaxed offloading decision set x_t generated in step 1 into K binary offloading decisions with the order-preserving quantization method.
Specifically, for a given 1 ≤ K ≤ N + 1, generate a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed decision x_t, as follows. Each x_k is likewise a set of N elements, each valued 0 or 1 and representing the offloading decision of the corresponding vehicle terminal.
For k = 1, the first binary offloading decision has entries
x_{1,i} = 1 if x_{t,i} > 0.5, and x_{1,i} = 0 otherwise,
where x_{t,i} is the i-th element of the set x_t.
The elements of x_t are ordered by their distance to 0.5 (that is, by |x_{t,i} − 0.5|), written |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(i)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k has entries
x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
Step 3: substitute each quantized binary offloading decision x_k into problem P1:
P1: min over (x, a, τ, f) of Σ_{i=1..N} w_i [(1 − x_i) t_i^L + x_i t_i^E]
s.t. C1: x_i ∈ {0,1}
C2: Σ_{i∈X_1} f_i ≤ f_M
C3: f_i ≥ 0 for all i ∈ X_1
C4: a + Σ_{i=1..N} τ_i ≤ 1
C5: a ≥ 0, τ_i ≥ 0 for all i
Problem P1 is then transformed into the resource allocation sub-problem P2:
P2: min over (a, τ, f) of T(x, a, τ, f) s.t. C2, C3, C4, C5 [formula image not reproduced]
where X_0 and X_1 denote the sets of vehicles performing local computing and offloaded computing, respectively: X_0 is the set of vehicle terminals whose offloading decision in the decision set takes the value 0 (local computing), and X_1 is the set of vehicle terminals whose offloading decision takes the value 1 (offloaded computing).
Step 4: decompose problem P2 into two sub-problems P3 and P4:
P3: the time-slot allocation sub-problem over {a, τ}, s.t. C4, C5 [formula image not reproduced]
P4: the computing resource allocation sub-problem over f, s.t. C2, C3 [formula image not reproduced]
Solve problem P3 with the interior point method to obtain the time-slot allocation {a, τ}; solve problem P4 with the Lagrangian dual method to obtain the computing resource allocation f.
Step 5: for each binary offloading decision, substitute the results of solving problems P3 and P4 back into problem P1 and evaluate the resulting system delay; among all binary offloading decisions, select the one with the minimum system delay as the optimal offloading decision.
Once the offloading decision of every vehicle terminal is fixed, problem P1 reduces to P2; this reduction is applied directly to each of the K quantized offloading decisions, and the best of the K is selected at the end. The optimal delay is obtained by decomposing P2 into P3 and P4 and substituting their solutions back into P1.
"All binary offloading decisions" here means the K offloading decisions that the neural network produces from the current wireless channel gains in each time frame. The symbols in the problems above do not refer to a single vehicle terminal but describe the offloading actions of all vehicle terminals; likewise, the time-slot allocation and the computing resource allocation are not for a single terminal but are policy sets covering all terminals that offload their computation.
Step 6: store the obtained optimal offloading decision and the wireless channel gains h_i in the memory as empirically labeled data.
Step 7: every δ time frames, randomly select a batch of data samples (empirically labeled data previously stored in the memory) and train the deep neural network, updating the DNN parameters θ with the Adam algorithm.
Algorithm pseudo code:
[the algorithm pseudocode images are not reproduced]
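In place of the unreproduced pseudocode, the online loop of steps 1 to 7 can be sketched as follows. The DNN forward pass, the delay evaluation (the P3/P4 solvers) and the training step are stubs; only the control flow mirrors the description above.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(1)
N, K, DELTA, MEMORY_SIZE = 4, 5, 10, 1024

def dnn_forward(h):                              # step 1 (stub: untrained DNN)
    return 1.0 / (1.0 + np.exp(-rng.normal(size=h.shape)))

def quantize(x_t, K):                            # step 2: order-preserving quantization
    order = np.argsort(np.abs(x_t - 0.5))
    cands = [(x_t > 0.5).astype(int)]
    for k in range(2, K + 1):
        th = x_t[order[k - 2]]
        cands.append((x_t >= th).astype(int) if th <= 0.5
                     else (x_t > th).astype(int))
    return cands

def system_delay(h, x):                          # steps 3-5 (stub for P3/P4 solving)
    return float(np.sum(np.where(x == 1, 1.0 / h, 2.0 / h)))

memory = deque(maxlen=MEMORY_SIZE)               # step 6: bounded replay memory
for frame in range(1, 31):
    h = rng.uniform(0.1, 1.0, size=N)            # observe channel gains
    x_t = dnn_forward(h)
    best = min(quantize(x_t, K), key=lambda x: system_delay(h, x))
    memory.append((h, best))                     # store (h, x*) as labeled data
    if frame % DELTA == 0:
        batch = [memory[i] for i in rng.integers(len(memory), size=8)]
        # step 7: train the DNN on the sampled (h, x*) pairs (omitted here)
```

The loop runs one decision per time frame; replacing the stubs with a real network and the P3/P4 solvers gives the full method.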
the following description is presented in conjunction with specific embodiments which are meant to be illustrative, but not limiting, of the invention.
The computation model of this embodiment consists of two parts: local computing and edge computing. In the local computing mode, the computing task is computed directly on the vehicle terminal, and the delay is the local computing delay t_i^L of vehicle terminal i.
In the edge computing mode, the computing task is offloaded by the vehicle terminal to the edge server for computing; the delay consists of the uplink data transmission delay of vehicle terminal i, the execution delay required by the edge server to execute the task transmitted by vehicle terminal i, and the delay of returning the result data from the edge server. The delay generated in the result-return stage is ignored, so the total edge-computing delay of vehicle terminal i is
t_i^E = t_i^U + t_i^C.
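Under the stated model, the two delays can be written as small helper functions. The uplink rate r_uplink is taken as a given input because its expression is in an unreproduced image; all quantities are assumed to be in consistent units.

```python
def local_delay(D, phi, f_local):
    """Local mode: all phi*D CPU cycles of the task run on the vehicle
    terminal at frequency f_local (cycles per second)."""
    return phi * D / f_local

def edge_delay(D, phi, r_uplink, f_edge):
    """Edge mode: uplink transmission at rate r_uplink plus execution on
    the edge server; the result-return delay is ignored, as stated above."""
    return D / r_uplink + phi * D / f_edge
```

For example, with D = 5e6, phi = 100, r_uplink = 5e7 and f_edge = 8e9, edge_delay returns 0.1625, versus 0.5 for local_delay at f_local = 1e9: offloading wins whenever the uplink is fast enough to offset the transmission cost.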
the invention optimizes the unloading decision x of each intelligent vehicle terminal i in a combined manner i E.g., {0,1}, channel slot allocation decision { a, τ i Computation resource allocation decision f of edge server i Local computing power f of vehicle terminal i l And data transmission power P i To minimize the system utility function T (x, a, τ, f) l P), an optimization problem P0 of minimum delay is established, which is expressed as follows:
Figure BDA0003822960630000124
P0:
Figure BDA0003822960630000125
s.t.C1:x i ∈{0,1}
C2:
Figure BDA0003822960630000126
C3:
Figure BDA0003822960630000127
C4:
Figure BDA0003822960630000128
C5:
Figure BDA0003822960630000129
C6:
Figure BDA00038229606300001210
C7:
Figure BDA00038229606300001211
where w_i is the weighting factor of the task computation delay of vehicle terminal i; the relative sizes of the w_i reflect the relative priorities of the vehicles, that is, how sensitive their computing tasks are to delay: the larger the weighting factor, the more delay-sensitive the vehicle's computing task. f_M is the maximum computing capacity of the edge server.
For problem P0, first examine the influence of the local computing capability f_i^l and the data transmission power P_i of vehicle terminal i on the optimization.
When the energy received from the edge server is used entirely for local computing, the local computing delay t_i^L is minimized [formula image not reproduced].
When the energy received from the edge server is used entirely for data transmission, the transmission delay t_i^U is minimized; the minimum transmission delay is [formula image not reproduced].
With the local computing capability f_i^l and the data transmission power P_i of vehicle terminal i fixed at these optimal values, consider the utility function T(x, a, τ, f) [formula image not reproduced].
Problem P0 can then be transformed into problem P1:
P1: min over (x, a, τ, f) of Σ_{i=1..N} w_i [(1 − x_i) t_i^L + x_i t_i^E]
s.t. C1: x_i ∈ {0,1}
C2: Σ_{i∈X_1} f_i ≤ f_M
C3: f_i ≥ 0 for all i ∈ X_1
C4: a + Σ_{i=1..N} τ_i ≤ 1
C5: a ≥ 0, τ_i ≥ 0 for all i
The algorithm consists of three alternating stages: offloading decision generation, channel and computing resource allocation, and offloading policy update. The algorithm framework is shown in fig. 2.
Offloading decision generation stage: the wireless channel gains h_i observed within the time frame are taken as the input, and a DNN establishes the mapping to the relaxed offloading decision x_t as the output. The ReLU function is used as the activation function of the hidden layers of the neural network and the Sigmoid function as the activation function of the output layer, so that every element of the output relaxed offloading decision satisfies x_{t,i} ∈ (0,1). The relaxed offloading decision x_t is quantized with the order-preserving quantization method: for a given 1 ≤ K ≤ N + 1, a set of K quantized offloading decisions {x_k} is generated from x_t, as follows.
The first binary offloading decision has entries x_{1,i} = 1 if x_{t,i} > 0.5 and x_{1,i} = 0 otherwise.
The elements of x_t are ordered by their distance to 0.5, written |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(i)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t. Based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k has entries x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5; otherwise x_{k,i} = 0.
channel and computing resource allocation phase: the problem is decomposed into two sub-problems of offloading decision and resource allocation. As shown in fig. 1.
The resource allocation subproblem P2 has the expression:
P2:
Figure BDA0003822960630000143
s.t.C2、C3、C4、C5
the method is decomposed into two sub-problems of transmission delay solving and calculation delay solving, which are respectively expressed as:
P3:
Figure BDA0003822960630000144
s.t.C4、C5
P4:
Figure BDA0003822960630000151
s.t.C2、C3
for the transmission delay subproblem P3, solving the time slot allocation { a, tau } by using an interior point method; for the computation delay subproblem P4, the computation resource allocation f is solved using the lagrange dual method.
After solving the optimal channel time slot allocation decision and the edge server computing resource allocation decision, the channel time slot allocation decision { a, tau calculated by K unloading decisions is calculated i } and edge server computing resource allocation decision f i Substituting the problem P1 to calculate the total time delay, selecting the minimum total time delay as the optimal calculation time delay, and the corresponding unloading decision is the optimal unloading decision.
And (3) an unloading strategy updating stage: maintaining an initial empty memory with limited capacity, and making an optimal unloading decision x at the end of each time frame i Corresponding wireless channel gain h i The DNNs are trained using empirical label data stored in memory. When the memory space is full, the oldest data in the memory is replaced with the newest data to ensure that the data samples used for training are reliable. A set of data samples is randomly selected from memory for training using an empirical playback technique. Using AdaThe m-algorithm updates the DNN parameter θ by minimizing the cross-entropy loss. After enough empirical label data has been collected, the DNN is trained only once every δ time frames.
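The bounded memory and the loss being minimized can be sketched directly; the binary cross-entropy form below is the standard one implied by the text, not a formula quoted from the patent.

```python
import numpy as np
from collections import deque

class ReplayMemory:
    """Bounded memory as described above: when full, the oldest (h, x*)
    pair is overwritten; training batches are drawn uniformly at random."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)    # deque drops the oldest entry
    def store(self, h, x_star):
        self.buf.append((h, x_star))
    def sample(self, batch_size, rng):
        idx = rng.choice(len(self.buf), size=min(batch_size, len(self.buf)),
                         replace=False)
        return [self.buf[i] for i in idx]

def cross_entropy_loss(x_pred, x_star, eps=1e-12):
    """Binary cross-entropy between the DNN's relaxed output x_pred and the
    stored optimal decision x_star; the text names this loss together with
    the Adam optimizer for the parameter update."""
    x_pred = np.clip(x_pred, eps, 1 - eps)
    return float(-np.mean(x_star * np.log(x_pred)
                          + (1 - x_star) * np.log(1 - x_pred)))
```

Because the deque evicts its oldest element automatically, the "replace the oldest data with the newest" behavior comes for free.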
The simulation results are obtained with TensorFlow 1.0 in Python; see fig. 3 to fig. 8.
We use a Powercast TX91501-3W transmitter with P = 3 W as the energy transmitter of the edge server, and a P2110 harvester as the energy receiver of each vehicle terminal. Denote by d_i the distance from the i-th vehicle terminal to the edge server; the d_i are uniformly distributed over (2.5, 5.2) meters, i.e. the vehicle terminals are uniformly distributed in an annulus centered on the edge server with inner and outer radii of 2.5 and 5.2 meters, which ensures that the edge server's access point covers all vehicle terminals. Without loss of generality, we assume the wireless channel gain remains constant within one time frame and changes independently between different time frames. The weighting factor of vehicle terminal i is set by the following rule: if the index i is odd, w_i is set to 1, otherwise to 1.5. The neural network consists of one input layer, two hidden layers and one output layer, where the first and second hidden layers contain 120 and 80 hidden neurons, respectively. The training interval is δ = 10, the training batch size is |T| = 128, the memory size is 1024, and the learning rate of the Adam optimizer is set to 0.01. The other parameter settings are shown in the following table:
Simulation parameter                        Setting
Bandwidth W                                 2 MHz
Computing task data volume D_i              5×10^6 bytes
Channel noise power σ^2                     1.0×10^-8
Edge server computing power f_M             8×10^9 cycles/s
Edge server energy transmission power P     0.5 W
Number of cycles per bit φ                  100
Transmission overhead coefficient β_i       4
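Under these settings, the terminal placement, the parity-based weights, and the 120/80-neuron network described above can be sketched as follows; this is a toy NumPy sketch, and the weight initialization, activations, and input scaling are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                   # number of vehicle terminals (assumed)

# Terminals are uniformly placed in an annulus of radii 2.5 m to 5.2 m
# around the edge server, i.e. d_i ~ Uniform(2.5, 5.2).
d = rng.uniform(2.5, 5.2, size=N)

# Weight rule from the description: odd-numbered terminals get 1, even 1.5
# (terminals numbered 1..N).
w = np.where((np.arange(1, N + 1) % 2) == 1, 1.0, 1.5)

# Layer sizes of the DNN: input N, two hidden layers (120 and 80), output N.
layer_sizes = [N, 120, 80, N]
params = [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(h):
    """Relaxed offloading decision: ReLU hidden layers, sigmoid output."""
    a = h
    for i, (W_, b) in enumerate(params):
        z = a @ W_ + b
        a = np.maximum(z, 0) if i < len(params) - 1 else 1 / (1 + np.exp(-z))
    return a

x_relaxed = forward(d / d.max())         # toy input scaled into [0, 1]
```

The sigmoid output keeps every entry of the relaxed decision strictly between 0 and 1, as required by the subsequent quantization step.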
All simulation experiments were run on a computer equipped with an Intel Pentium G860 CPU.
Fig. 3 shows the convergence of the training loss of the neural network. As the number of training iterations increases, the training loss gradually decreases and converges to a small value; with further training it remains essentially stable. The algorithm can therefore be considered to converge rapidly in a vehicular environment with time-varying channel conditions and to make offloading decisions quickly.
Define
T^opt(h) = min_{x ∈ {0,1}^N} T*(h, x),
the optimal system delay obtained by enumerating all binary offloading decisions with a greedy algorithm, where T*(h, x) is the optimal delay obtained by the proposed algorithm. Define the gain ratio
Q = T^opt(h) / T*(h, x),
which indicates how effectively the proposed algorithm finds the optimal delay. The evolution of the gain ratio is shown in fig. 4. At the start of training the gain ratio is about 0.9, i.e., the delay obtained by the proposed algorithm still differs noticeably from the optimal delay computed by the greedy enumeration. As the number of training iterations increases, the gain ratio approaches 1, meaning that the gap between the optimal delays obtained by the two methods keeps shrinking. After 2100 training iterations the gain ratio essentially stays at 1 and no longer changes; the difference between the two solutions is then negligible, and the proposed algorithm can be considered to have converged to the optimal solution.
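The gain ratio can be illustrated numerically for a small N by enumerating all 2^N binary decisions; the delay function below is a stand-in of our own, since the true T*(h, x) comes from solving the resource-allocation sub-problems:

```python
import itertools

import numpy as np

def system_delay(h, x):
    # Stand-in for T*(h, x): any nonnegative delay function works for the demo.
    x = np.asarray(x, dtype=float)
    local = np.sum((1 - x) * 1.0 / h)           # pretend local delay ~ 1/h_i
    edge = np.sum(x * 0.3 / h) + 0.2 * x.sum()  # pretend edge delay
    return local + edge

def optimal_delay(h):
    """Enumerate all binary offloading decisions (the greedy baseline's role)."""
    n = len(h)
    return min(system_delay(h, x) for x in itertools.product((0, 1), repeat=n))

h = np.array([0.8, 1.2, 0.5, 1.0])
x_alg = (h < 1.0).astype(int)                   # some algorithm's decision
gain_ratio = optimal_delay(h) / system_delay(h, x_alg)
# gain_ratio <= 1 by construction, and equals 1 exactly when x_alg
# attains the enumerated optimum
```

Because the numerator is the minimum over all decisions, the ratio can never exceed 1, which matches the curves in fig. 4 approaching 1 from below.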
Figs. 5 to 8 show the effects of the learning rate of the Adam optimizer, the batch size, the memory size, and the neural network training interval on the convergence and effectiveness of the present invention.
The robustness of the invention is analyzed in a scenario with alternating weights. At the beginning of the simulation experiment, the weights of all vehicle terminals are determined by the parity of their numbers: the weight of odd-numbered vehicle terminals is set to 1 and that of even-numbered terminals to 1.5. When the number of training iterations reaches 2500, the weights of the odd- and even-numbered vehicle terminals are exchanged; finally, when the number of iterations reaches 3500, the weights are restored to their original values. The optimal delay under each weight assignment is obtained by the greedy algorithm in order to compute the gain ratio. The training loss and the gain ratio are shown in figs. 9 and 10. Despite the changes in the weights, the training loss and the gain ratio are hardly affected, and a solution very close to the optimum is still obtained. This shows that the present invention can quickly and automatically adjust the offloading policy according to the changed weights and eventually converge to a new optimal offloading decision.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (4)

1. An intelligent computing unloading method in a vehicle edge computing scene based on deep reinforcement learning is characterized in that: the method comprises the following steps:
Step 1: based on the wireless channel gain h_i in the time frame, generating a relaxed offloading decision x_t through a deep neural network;
Step 2: quantizing the relaxed offloading decision x_t generated in step 1 into K binary offloading decisions by an order-preserving quantization method;
Step 3: substituting each binary offloading decision x_k obtained by quantization into problem P1:
P1: min over {x, a, τ, f} of Σ_{i=1}^{N} w_i [ (1 − x_i) T_i^l + x_i T_i^c ]
s.t. C1: x_i ∈ {0, 1}, i = 1, …, N
C2: f_i ≥ 0, i = 1, …, N
C3: Σ_{i=1}^{N} x_i f_i ≤ f_M
C4: a + Σ_{i=1}^{N} τ_i ≤ 1
C5: a ≥ 0, τ_i ≥ 0, i = 1, …, N
where N is the number of vehicle terminals; x_i is the offloading action of vehicle terminal i (i = 1, …, N): x_i = 1 denotes that the computing task of vehicle terminal i is offloaded to the edge server, and x_i = 0 denotes that it is executed locally; f_M is the total computing resource owned by the edge server; f_i is the computing resource allocated by the edge server to vehicle terminal i; T_i^c is the total edge-computing delay of vehicle terminal i; w_i is the weight factor of the computing delay of the task of vehicle terminal i; T_i^l is the local computing delay of vehicle terminal i; a is the fraction of the time frame occupied by energy transmission from the edge server to the vehicle terminals; and τ_i is the fraction of the time frame occupied by the transmission channel of vehicle terminal i;
transforming problem P1 into the resource allocation sub-problem P2:
P2: T*(h, x) = min over {a, τ, f} of Σ_{i∈X_0} w_i T_i^l + Σ_{i∈X_1} w_i T_i^c
s.t. C2, C3, C4, C5
where φ is the number of CPU cycles required to process 1 bit of task data; D_i is the data volume of the computing task on vehicle terminal i; μ is the energy harvesting efficiency; P is the transmit power of the edge server; h_i is the wireless channel gain in the i-th time frame; k_i is the computation energy efficiency coefficient of vehicle terminal i; β_i is the transmission overhead coefficient of the uplink of vehicle terminal i; X_0 and X_1 denote the sets of vehicles performing local computing and offloading computing, respectively; W is the bandwidth of the wireless channel; and σ^2 is the Gaussian noise power of the wireless channel;
Step 4: decomposing problem P2 into two sub-problems P3 and P4:
P3: min over {a, τ} of Σ_{i∈X_0} w_i T_i^l + Σ_{i∈X_1} w_i T_i^r
s.t. C4, C5
P4: min over f of Σ_{i∈X_1} w_i φ D_i / f_i
s.t. C2, C3
solving problem P3 to obtain the time slot allocation {a, τ}; solving problem P4 to obtain the computing resource allocation f;
Step 5: for each binary offloading decision, substituting the results obtained by solving problems P3 and P4 into problem P1 and computing the resulting system delay; among all binary offloading decisions, selecting the one with the minimum system delay as the optimal offloading decision;
Step 6: storing the obtained optimal offloading decision together with the wireless channel gain h_i as experience label data in a memory;
Step 7: every δ time frames, randomly selecting data samples from the memory, training the deep neural network, and updating the parameters θ of the deep neural network; then returning to step 1 until the method terminates.
2. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein: the total edge-computing delay T_i^c of vehicle terminal i is T_i^c = T_i^r + T_i^es, where T_i^r is the uplink data transmission delay of vehicle terminal i and T_i^es is the execution delay required by the edge server to execute the task transmitted by vehicle terminal i.
3. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein: in step 2, the process of quantizing the relaxed offloading decision x_t into K binary offloading decisions is:
for a given K with 1 ≤ K ≤ N + 1, generating a set of K quantized offloading decisions {x_k} (k = 1, …, K) from the relaxed offloading decision x_t;
the first binary offloading decision x_1, generated when k = 1, is:
x_{1,i} = 1 if x_{t,i} > 0.5, and x_{1,i} = 0 if x_{t,i} ≤ 0.5;
where x_{t,i} is the i-th element of the relaxed offloading decision x_t;
the elements of x_t are sorted by their distance to 0.5, denoted |x_{t,(1)} − 0.5| ≤ |x_{t,(2)} − 0.5| ≤ … ≤ |x_{t,(N)} − 0.5|, where x_{t,(i)} is the i-th order statistic of x_t; based on x_{t,(k−1)}, the k-th (k = 2, …, K) binary offloading decision x_k is:
x_{k,i} = 1 if x_{t,i} > x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} ≤ 0.5;
x_{k,i} = 0 if x_{t,i} < x_{t,(k−1)}, or if x_{t,i} = x_{t,(k−1)} and x_{t,(k−1)} > 0.5.
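One possible reading of the order-preserving quantization of claim 3 can be sketched as follows; the tie-handling (an element equal to the (k−1)-th order statistic maps to 1 when that statistic is at most 0.5, and to 0 otherwise) and all names are our assumptions:

```python
import numpy as np

def order_preserving_quantize(x_t, K):
    """Generate K binary decisions from a relaxed decision x_t in [0,1]^N."""
    x_t = np.asarray(x_t, dtype=float)
    N = len(x_t)
    K = min(K, N + 1)                            # claim 3 requires K <= N + 1
    decisions = [(x_t > 0.5).astype(int)]        # k = 1: threshold at 0.5
    # Order statistics of x_t by distance to 0.5.
    order = np.argsort(np.abs(x_t - 0.5), kind="stable")
    for k in range(2, K + 1):
        t = x_t[order[k - 2]]                    # x_{t,(k-1)}
        if t <= 0.5:
            xk = (x_t >= t).astype(int)          # the tie element maps to 1
        else:
            xk = (x_t > t).astype(int)           # the tie element maps to 0
        decisions.append(xk)
    return decisions

cands = order_preserving_quantize([0.9, 0.4, 0.6, 0.1], K=4)
# cands[0] is [1, 0, 1, 0]; each later candidate flips the threshold to the
# next order statistic, so the K candidates stay "ordered" around 0.5
```

For the input above, the four candidates are [1,0,1,0], [1,1,1,0], [1,0,0,0], and [0,0,0,0].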
4. The intelligent computing offloading method in a deep reinforcement learning based vehicle edge computing scenario of claim 1, wherein: in step 4, problem P3 is solved with an interior point method to obtain the time slot allocation {a, τ}, and problem P4 is solved with the Lagrangian dual method to obtain the computing resource allocation f.
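If the execution delay has the form T_i^es = φ·D_i/f_i, which is consistent with the symbols of claim 1 but not stated explicitly, then the Lagrangian dual of a sub-problem of the form min Σ_{i∈X_1} w_i φ D_i / f_i subject to Σ f_i ≤ f_M, f_i ≥ 0 admits the closed form f_i = f_M·√(w_i φ D_i)/Σ_j √(w_j φ D_j). This derivation is our reading, not taken from the patent; the sketch below checks it numerically against random feasible allocations:

```python
import numpy as np

phi = 100.0                          # cycles per bit (from the parameter table)
D = np.array([5e6, 5e6, 5e6])        # task data volumes of offloading vehicles
w = np.array([1.0, 1.5, 1.0])        # delay weight factors (parity rule)
f_M = 8e9                            # total edge computing resource (cycles/s)

c = w * phi * D                      # objective is sum(c_i / f_i)

# Stationarity of the Lagrangian c_i/f_i + lam*f_i gives f_i = sqrt(c_i/lam);
# enforcing sum(f_i) = f_M yields f_i proportional to sqrt(c_i).
f_star = f_M * np.sqrt(c) / np.sqrt(c).sum()

def objective(f):
    return np.sum(c / f)

# Random feasible allocations (summing to f_M) should never beat f_star,
# since the problem is convex and f_star satisfies the KKT conditions.
rng = np.random.default_rng(1)
best_random = min(
    objective(f_M * p / p.sum())
    for p in rng.uniform(0.1, 1.0, size=(2000, 3))
)
```

Here `objective(f_star) <= best_random` holds for every random draw, consistent with f_star being the global optimum of this convex sub-problem.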
CN202211048897.6A 2022-08-30 2022-08-30 Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning Active CN115460710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211048897.6A CN115460710B (en) 2022-08-30 2022-08-30 Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning


Publications (2)

Publication Number Publication Date
CN115460710A true CN115460710A (en) 2022-12-09
CN115460710B CN115460710B (en) 2024-08-23

Family

ID=84300120


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166444A (en) * 2023-04-26 2023-05-26 南京邮电大学 Collaborative reasoning method oriented to deep learning hierarchical model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210063990A (en) * 2019-11-25 2021-06-02 경희대학교 산학협력단 Method of machine learning based unmanned aerial vehicle mobile edge server collabrative task matching and offloading
CN113296845A (en) * 2021-06-03 2021-08-24 南京邮电大学 Multi-cell task unloading algorithm based on deep reinforcement learning in edge computing environment
CN114025359A (en) * 2021-11-01 2022-02-08 湖南大学 Resource allocation and computation unloading method, system, device and medium based on deep reinforcement learning
US20220210686A1 (en) * 2020-07-15 2022-06-30 Nantong University Energy-efficient optimized computing offloading method for vehicular edge computing network and system thereof


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MINGHUI MIN; LIANG XIAO; YE CHEN; PENG CHENG; DI WU; WEIHUA ZHUANG: "Learning-based computation offloading for IoT devices with energy harvesting", IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 1 January 2019 (2019-01-01) *
MOLIN LI; TONG CHEN; JIAXIN ZENG; XIAOBO ZHOU; KEQIU LI; HENG QI: "D2D-Assisted Computation Offloading for Mobile Edge Computing Systems with Energy Harvesting", 2019 20TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 12 March 2020 (2020-03-12) *
ZHANG ZEWEI; LI TAOSHEN; YANG LINFENG: "Multi-objective optimized SWIPT-MEC hierarchical task offloading architecture and optimization algorithm", Journal of Guangxi University, 30 April 2022 (2022-04-30) *
WANG YANTING: "Research on joint scheduling strategies for heterogeneous resources in mobile edge computing", Information Science and Technology, 15 February 2020 (2020-02-15) *



Similar Documents

Publication Publication Date Title
CN111245651B (en) Task unloading method based on power control and resource allocation
Bi et al. Lyapunov-guided deep reinforcement learning for stable online computation offloading in mobile-edge computing networks
CN110928654B (en) Distributed online task unloading scheduling method in edge computing system
CN111556461A (en) Vehicle-mounted edge network task distribution and unloading method based on deep Q network
CN112105062B (en) Mobile edge computing network energy consumption minimization strategy method under time-sensitive condition
Yao et al. Caching in dynamic IoT networks by deep reinforcement learning
Xie et al. Backscatter-assisted computation offloading for energy harvesting IoT devices via policy-based deep reinforcement learning
CN114650228B (en) Federal learning scheduling method based on calculation unloading in heterogeneous network
CN110233755B (en) Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things
CN113727362B (en) Unloading strategy method of wireless power supply system based on deep reinforcement learning
CN114554495B (en) Federal learning-oriented user scheduling and resource allocation method
Bi et al. Stable online computation offloading via lyapunov-guided deep reinforcement learning
CN115460710B (en) Intelligent computing unloading method in vehicle edge computing scene based on deep reinforcement learning
CN113821346B (en) Edge computing unloading and resource management method based on deep reinforcement learning
He et al. Computation offloading and resource allocation based on DT-MEC-assisted federated learning framework
Huang et al. Performance optimization for energy-efficient industrial Internet of Things based on ambient backscatter communication: An A3C-FL approach
Han et al. Multi-step reinforcement learning-based offloading for vehicle edge computing
Jiao et al. Deep reinforcement learning for time-energy tradeoff online offloading in MEC-enabled industrial internet of things
CN114521023A (en) SWIPT-assisted NOMA-MEC system resource allocation modeling method
CN116341679A (en) Design method of federal edge learning scheduling strategy with high aging
CN115914230A (en) Adaptive mobile edge computing unloading and resource allocation method
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Gao et al. Deep reinforcement learning-based computation offloading and optimal resource allocation in industrial Internet of Things with NOMA
Wang et al. Adaptive compute offloading algorithm for metasystem based on deep reinforcement learning
CN114281527A (en) Low-complexity mobile edge computing resource allocation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant